From: Neil Spring (nspring@cs.washington.edu)
Date: Sat Jan 20 2001 - 19:15:27 PST
> Ok. Longer traces. I think we have the necessary cycles to do
> filtering on traces, the more since it reduces disk IO, which
> I see becoming scarce. I wonder whether it is practical from
> an analysis point of view.
Yes, the disk IO tradeoff makes sampling more attractive
from a technical point of view. I am also curious to hear
what sort of analyses this scheme might prevent.
> Did you consider an Auckland or NZIX trace yet? How long do you
> think a trace like this would "last", in terms of time required
> for analysis until you need the next data set?
I hadn't looked in detail at either set of traces; now that
I have, they seem very well suited to the sorts of things
I'm interested in. I haven't yet adapted my libpcap-based
tools to parse any of the other formats (distracted by other
work), so I can't say for certain.
> > Fewer, longer traces would seem fine to me. I don't really
> > care about daily usage patterns: only enough samples to
> > convince me that what I saw was not an effect of being
> > run at 5am would be sufficient.
>
> To make it extreme, we would capture a single 24-hour trace. Ignoring
> disk space constraints for a moment, let's look at what impact this has
> on frequency. At the moment, monitors capture eight 90-second samples
> per day. This is about 12 minutes per day, or about 6 hours per month.
> To not increase data volume, we could only capture a one-day trace
> every four months, that is, three per year. I can see such sampling
> as being biased by chance.
You're exactly right.
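The arithmetic in the quoted paragraph can be checked with a short sketch. It assumes a 30-day month and that data volume scales with capture time (that is, a roughly constant traffic rate); the variable names are illustrative only.

```python
# Capture budget under the current scheme (figures from the quoted message).
SAMPLES_PER_DAY = 8
SAMPLE_SECONDS = 90
DAYS_PER_MONTH = 30  # assumption: a 30-day month

seconds_per_day = SAMPLES_PER_DAY * SAMPLE_SECONDS        # 720 seconds
minutes_per_day = seconds_per_day / 60                    # 12 minutes
hours_per_month = minutes_per_day * DAYS_PER_MONTH / 60   # 6 hours

# Spending the same budget on single 24-hour traces instead
# (assumption: volume is proportional to capture time).
months_between_full_day_traces = 24 / hours_per_month     # 4 months
full_day_traces_per_year = 12 / months_between_full_day_traces  # 3 per year

print(minutes_per_day, hours_per_month,
      months_between_full_day_traces, full_day_traces_per_year)
```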
I articulated my message poorly, perhaps relying on Stephen
Donnell's earlier message about flow length to advocate
longer traces:
SD> Is 90 seconds sufficient for looking at flows? I assume there will be
SD> many complete http sessions/flows in 90s, but what about ftp, nntp,
SD> streaming media etc? Are people concerned about this?
If the interest is in having an archive of traffic to see
how it changes over time, then biases from short trace
durations would apply to both current and archived traffic.
This probably wouldn't hurt too much.
As long as it is clear that one shouldn't use these 90
second traces to draw conclusions about the duration of
streaming media flows, or FTP connections through dial-up
modems, 90 seconds would be fine.
It's not clear there is a trace duration that would
service both camps, and 90 seconds seems like a reasonable
compromise.
> An option being considered is to let different monitors run different
> trace strategies. It is also possible to "profile" measurement points,
> where we do not keep the data, but some graphs on important parameters,
> which allow you to check how a particular sample relates to the activity
> of the rest of the day. Instead of pictures, we could keep high-level
> samples about certain parameters (for instance packets/sec, megabits/sec,
> flows/sec in samples of one second). This vastly decreases the amount
> of information stored, but still provides a hook to evaluate the bias
> for a detailed sample in correlation to a complete (long) trace.
That would be remarkable, addressing both issues.
I'm curious what sort of design you can come up with,
and hope that it will be simple enough.
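As a rough illustration of the per-second summaries described in the quoted text, here is a minimal sketch. It is not the project's design: the input shape (timestamp, length in bytes, flow key) and the function name `profile_per_second` are hypothetical, and "flows/sec" is interpreted here as the number of distinct flows seen in each second.

```python
from collections import defaultdict

def profile_per_second(packets):
    """Summarize packets into one-second bins.

    packets: iterable of (timestamp_seconds, length_bytes, flow_key) tuples
    (a hypothetical record layout, not a real trace format).
    """
    pkt_count = defaultdict(int)
    byte_count = defaultdict(int)
    flows = defaultdict(set)
    for ts, length, flow_key in packets:
        sec = int(ts)  # bin by whole second
        pkt_count[sec] += 1
        byte_count[sec] += length
        flows[sec].add(flow_key)
    return {
        sec: {
            "packets": pkt_count[sec],
            "megabits": byte_count[sec] * 8 / 1e6,
            "flows": len(flows[sec]),  # distinct flows active in this second
        }
        for sec in sorted(pkt_count)
    }

# Example with made-up packets.
example = [(0.1, 1500, "a"), (0.9, 500, "b"), (1.2, 1000, "a")]
profile = profile_per_second(example)
```

Each one-second record is a few numbers, so a full day is 86,400 small records per monitor, which is far less than the raw packet headers.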
> > I don't know, but the thought of having automatically
> > generated graphs for the raw traces is interesting.
> > You wouldn't be able to automatically generate some of the
> > graphs from http://www.caida.org/outreach/papers/Inet98/?
> > Not serious, just a wish-list item. I'm curious how figure
> > 8 changes over time.
>
> I think we are willing to go to great lengths to match the needs for traces;
> however, we may have to prioritize and consider how long it takes to
> implement certain features. Having an extensive set of pictures on very
> few selected traces sounds like the best option to me at the moment.
What is the goal of presenting the graphs alongside the
traces? Is the intent to advertise the trace in some way?
Or to present sample analysis so that I can verify the
function of my tools? Or are you going to do the simple
traffic analysis "for us" so that we can use current data
in our research without writing our own scripts?
It would be exciting to build a framework where traffic
analysis papers became dynamic: that their graphs were
updated as new data supplied by your project was generated.
If the conclusions noted in the paper from earlier data
were no longer justifiable, they would be removed or
redlined.
Now that I look at the graphs presented for the NZIX
traces, I have some idea of the context of your question.
If it's not impossible, splitting the TCP traffic into the
top four or five protocols by port might be interesting.
Since I don't know the goals of including the graphs,
I'm not sure what to suggest.
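For what it's worth, a minimal sketch of the by-port split might look like the following. The input shape (source port, destination port, length in bytes) and the name `top_tcp_ports` are hypothetical, and treating the lower-numbered port as the service port is only a heuristic I am assuming here.

```python
from collections import Counter

def top_tcp_ports(packets, n=5):
    """Return the n busiest service ports by bytes, plus the remaining bytes.

    packets: iterable of (src_port, dst_port, length_bytes) for TCP packets
    (a hypothetical record layout).
    """
    bytes_by_port = Counter()
    for src_port, dst_port, length in packets:
        # Heuristic (assumption): the lower port number is the service port.
        bytes_by_port[min(src_port, dst_port)] += length
    top = bytes_by_port.most_common(n)
    other = sum(bytes_by_port.values()) - sum(b for _, b in top)
    return top, other

# Example with made-up packets: HTTP (80), FTP (21), NNTP (119).
example_tcp = [(34567, 80, 1000), (80, 34567, 4000),
               (40000, 21, 500), (119, 41000, 300)]
top, other = top_tcp_ports(example_tcp, n=2)
```

Everything outside the top n is lumped into a single "other" figure so the graph still accounts for all TCP bytes.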
-neil