From: Darryl Neil Veitch (d.veitch@ee.mu.oz.au)
Date: Mon Jan 22 2001 - 18:57:34 PST
Dear NLANR PMA team,
thanks for including us.
> o focus of your research in the area of passive measurement analysis
> o trace durations and trace schedules
Our work in this area is, I think, different in focus to that of many others.
We are first of all focused on very high resolution measurements, for two reasons:
- a main interest is the scaling (fractal) behaviour of traffic. To investigate this
well we need lots of scales present, which means lots of data. As we are also interested
in measuring and modelling the fine time-scale features of data, very long coarse
measurements alone are not enough: we need a resolution down to close to the
transmission unit.
- our modelling approach is to examine many different representations of the same data,
for example time-series such as the number of bytes (packets) in consecutive bins,
both for undifferentiated traffic (say all bytes) as well as for higher level quantities
such as the number of new TCP connections in a bin, and the inter-arrival times of
such. We also want to go deeper and understand the burstiness structure of
packets inside TCP connections. This requires having raw data level information,
so that all these, often non-orthogonal, but nonetheless different points of view of
the same trace at different levels of detail can be reconstructed, and new ones
produced as needed.
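
As a rough illustration of the point above, that several different views can be rebuilt from one raw trace, here is a minimal Python sketch. The record layout (timestamp, size, new-connection flag), the helper name bin_series and the sample values are assumptions made for illustration only; they do not correspond to any actual trace format.

```python
def bin_series(times, weights, bin_width, t0):
    """Sum `weights` into consecutive bins of `bin_width` seconds starting at t0.

    Assumes every time is >= t0. Bins with no events are reported as 0.
    """
    if not times:
        return []
    idx = [int((t - t0) // bin_width) for t in times]
    series = [0] * (max(idx) + 1)
    for k, w in zip(idx, weights):
        series[k] += w
    return series


# Hypothetical raw records: (timestamp in s, size in bytes, starts a new TCP connection?)
packets = [(0.0, 40, True), (0.2, 1500, False), (0.5, 1500, False),
           (1.3, 40, True), (3.1, 576, False)]

times = [t for t, _, _ in packets]
t0 = min(times)

# View 1 and 2: undifferentiated traffic, bytes and packets per 1 s bin.
bytes_per_bin = bin_series(times, [s for _, s, _ in packets], 1.0, t0)
packets_per_bin = bin_series(times, [1] * len(packets), 1.0, t0)

# View 3 and 4: a higher level view, new connections per bin and their inter-arrival times.
conn_times = sorted(t for t, _, new in packets if new)
new_conns_per_bin = bin_series(conn_times, [1] * len(conn_times), 1.0, t0)
conn_interarrivals = [b - a for a, b in zip(conn_times, conn_times[1:])]
```

All four series come from the same list of raw records, which is why the raw data needs to be kept: a new view is just another pass over it.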
So, we definitely need long traces with very complete information.
Typically a set of such traces at a single point over an extended period, such as the Auckland II
traces, suit us well. They are detailed enough and long enough to test things such as
reproducibility of observations and worth of models against such causes of non-stationarity
as diurnal variation, day of week variation, and `extraordinary events'. At this level of
modelling, such a set can keep us busy for a long time. The next set of traces could
be months later, in order to test the applicability of results with respect to
`traffic growth and evolution' type of non-stationarities. A central issue here is,
yes of course things change over time (like diurnal load), but perhaps other quantities
do not, after normalising for the changing load or perhaps even if not. For example many
scaling features do not vary with load. You want a trace to be long enough so that
`stationary' subsets can be found, which simplifies analysis and modelling, but these
subsets may or may not correspond to periods of approximately constant load.
In summary, our preference for trace size is: as detailed as possible, then as long as possible!
Say a couple of weeks of collection, then do it again 6 months later. Would you have the
time and cycles to do it any more often anyway?
Historic data is very valuable in that a large set of traces like this takes a long time to
analyse, and there is a lot of value from having several groups study the same data, so that
more can be discovered and a common language and understanding built up. Witness the
value from the `Bellcore Ethernet' data sets which became de facto standards for the analysis
of long range dependence in traffic.
More generally (and appropriately given that we are interested in time-scale analysis),
I see trace collection as an exercise on multiple time scales. Clearly some will
want coarse scale measurements (and therefore over long periods), and others fine scale
measurements (over not too long a period by practical
necessity...). The answer would seem to be a set of trace sets, with collection
rate (in sets per year at a given level of detail+length) inversely proportional to
storage size of the set (or perhaps at some other power-law frequency).
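
To make the proportionality concrete, here is a small Python sketch. The idea of a fixed yearly storage budget, the function name collection_rates and the example sizes are assumptions added for illustration, not part of any agreed collection plan.

```python
def collection_rates(set_sizes_gb, budget_gb_per_year, exponent=1.0):
    """Sets per year for each kind of trace set.

    The rate is proportional to size ** (-exponent), then scaled so that the
    total storage collected per year equals the (assumed) budget.
    exponent = 1.0 is the plain inverse-proportional rule; other values give
    other power laws.
    """
    weights = [size ** -exponent for size in set_sizes_gb]
    yearly_cost = sum(w * size for w, size in zip(weights, set_sizes_gb))
    scale = budget_gb_per_year / yearly_cost
    return [scale * w for w in weights]


# Hypothetical set sizes: coarse (1 GB), medium (10 GB), fine (100 GB).
rates = collection_rates([1, 10, 100], budget_gb_per_year=300)
```

With exponent 1 every kind of set ends up using the same storage per year, so the most detailed sets are simply collected least often.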
> o monitor placement strategies
For the work described above, an entry point link where both directions of TCP connections
must pass suits us well, although in many ways we could do the same things with only
one way information. The exception is TCP dynamics where half of the story would be
missing if only one direction were available. So we definitely do want at least
some entry/choke point type links monitored.
We are also interested in active measurement. There, to validate ideas on the nature
of cross traffic on a route, and even to capture a picture of an entire route AS the
probe traffic traverses it, it would be wonderful to have an entire route instrumented
in a synchronised way (although very high resolution synchronisation would not be essential),
with the probe streams being identifiable.
> o trace postprocessing and WWW publishing
Because of the detailed nature of the statistical analysis, and the need to analyse data in
an investigative way with a lot of feedback between the data and the analysis, it is
impractical for NLANR to do processing for us. However the following points are important:
-- some higher level `what is this trace like' information on the web is very useful,
so that appropriate traces can be identified for more attention.
I am not very familiar with how this is done at NLANR at present, but
for example plots like those for the Auckland II site at WAND are very helpful.
It would be useful to us to complement these a little to include some statistics on scaling,
and on marginal distributions. Such additions would not cost much in memory, as the
extra data is O(1) in the length of the time series. (We could help with this.)
-- to go further, well, inevitably people want to do different things,
so flexibility is required. Rather than
trying to have a data presentation engine be all things to all people, an alternative
would be an option to do some processing on site, perhaps even with a general purpose tool
such as Matlab available to do tailored analysis, with Matlab functions and user supplied
functions available. To keep the cost of this under control,
it could perhaps only be allowed on subsets of moderate length.
-- On-line real-time analysis can get around the problem of resolution versus trace length,
at the cost of not being able to return to the data, and needing the underlying
assumptions for the analysis to be checked automatically on-line as well, which implies
considerable maturity of the analysis methods.
A beginning could be made in this area, with results published on the web
as they come in at some suitable time scale. This could also be migrated into the hardware,
as could filtering (yes I know, easy to say!).
Obvious targets are windowed means, windowed histograms for marginal
distribution (windowed because of non-stationarity), and some scaling exponents
using wavelet analysis (we have already done real-time versions of this at OC3).
Of course the problem here is that one could discuss forever exactly what to measure,
but why not just begin simply and evolve it? The basic stream selection and time series
generation aspects would be generic. We could perhaps collaborate on this; we have been
thinking of doing it for Gigabit Ethernet at EMUlab.
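
The `obvious targets' named above can be sketched in a few lines of Python. This is only an off-line illustration under stated assumptions: it uses the Haar wavelet and an unweighted least-squares fit for simplicity, it is not the real-time OC3 implementation mentioned above, and all function names are hypothetical.

```python
import math
import random


def windowed_means(series, window):
    """Mean of each consecutive non-overlapping window (a trailing partial window is dropped)."""
    return [sum(series[i:i + window]) / window
            for i in range(0, len(series) - window + 1, window)]


def windowed_histograms(series, window, edges):
    """Per-window counts in bins [edges[j], edges[j+1]); values outside the edges are ignored."""
    out = []
    for i in range(0, len(series) - window + 1, window):
        counts = [0] * (len(edges) - 1)
        for x in series[i:i + window]:
            for j in range(len(edges) - 1):
                if edges[j] <= x < edges[j + 1]:
                    counts[j] += 1
                    break
        out.append(counts)
    return out


def haar_logscale(series, max_octave):
    """Return [(j, log2 of the mean squared Haar detail coefficient at octave j)]."""
    approx = list(series)
    points = []
    for j in range(1, max_octave + 1):
        n = len(approx) // 2
        if n < 2:
            break
        details = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(n)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(n)]
        energy = sum(d * d for d in details) / n
        if energy > 0:
            points.append((j, math.log2(energy)))
    return points


def slope(points):
    """Ordinary least-squares slope of y against x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return (sum((x - mx) * (y - my) for x, y in points)
            / sum((x - mx) ** 2 for x, _ in points))


# Synthetic uncorrelated noise stands in for a binned traffic series here;
# for such input the fitted scaling exponent should come out close to 0.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
alpha = slope(haar_logscale(noise, 6))
```

For a stationary long-range dependent series the slope alpha relates to the Hurst parameter by H = (1 + alpha) / 2. A careful estimator would also weight the octaves by their number of coefficients and choose the fitting range from the data, which this sketch leaves out.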
Although the role of trace archiving is here to stay, I think it is time, given the
increasing storage problems with faster links, to at last move on-line measurement into reality.
Joerg wrote:
An option being considered is to let different monitors run different
trace strategies. It is also possible to "profile" measurement points,
where we do not keep the data, but some graphs on important parameters,
which allow you to check how a particular sample relates to the activity
of the rest of the day. Instead of pictures, we could keep high-level
samples about certain parameters (for instance packets/sec, megabits/sec,
flows/sec in samples of one second). This vastly decreases the amount
of information stored, but still provides a hook to evaluate the bias
for a detailed sample in correlation to a complete (long) trace.
I think this idea is good and fits in well with the on-line philosophy. On-line can do
'several things all the time', presented as data and/or web plots, and archive traces
can keep 'everything' for selected durations on different time scales.
> o trace scenarios (router instrumentation, cross-US, transatlantic, ...)
These would be interesting links.
> o trace variety (LAN views, WAN access view, backbone view)
We are more interested in the backbone view for the scaling work, and all of them
for the active work.
Darryl
+----------------------------+--------------------------------------------+
| Darryl Veitch | Email: d.veitch@ee.mu.oz.au |
| EMUlab | Telephone: |
| Department of Electrical | Direct- +61 3 8344 9196 |
| & Electronic Engineering, | Inquiries- +61 3 8344 9204 |
| University of Melbourne | |
| | Fax: +61 3 8344 9188 |
| Victoria 3010 | |
| Australia | Web: http://www.emulab.ee.mu.oz.au/~darryl |
+----------------------------+--------------------------------------------+
This archive was generated by hypermail 2b30 : Thu Sep 27 2001 - 16:24:41 PDT