From: k claffy (kc@ipn.caida.org)
Date: Mon Jan 22 2001 - 10:10:39 PST
On Sat, Jan 20, 2001 at 07:15:27PM -0800, Neil Spring wrote:
> Ok. Longer traces. I think we have the necessary cycles to do
> filtering on traces, the more since it reduces disk IO, which
> I see becoming scarce. I wonder whether it is practical from
> an analysis point of view.
Yes, the disk IO tradeoff makes sampling more attractive
from a technical point of view. I am also curious to hear
what sort of analyses this scheme might prevent.
any intra-flow characteristics,
e.g., iat burstiness.
also reckon harder to piece together things
like the UNC-CH paper did
http://www.cs.unc.edu/~jeffay/papers/SIGMETRICS-01.pdf
we have no decent paper on the effects of
sampling in characterizing various
traffic characteristics (i didn't like the
1993 sigcomm one but i'm not objective)
someone should write this one,
is anyone interested in that?
we have the data just no cycles to work on it.
it is a badly needed paper, whoever writes it
makes a well-recognized contribution in the field,
and i am more than happy to help,
but can't own the project right now.
> Did you consider an Auckland or NZIX trace yet ? How long do you
> think would a trace like this "last", in terms of time required
> for analysis until you need the next data set ?
I hadn't looked in detail at either set of traces; now that
I have, they seem very well suited to the sorts of things
I'm interested in. I haven't yet adapted my libpcap-based
tools to parse any of the other formats (distracted by other
work), so I can't say for certain.
if you do port those tools, can we put
them on the website?
SD> Is 90 seconds sufficient for looking at flows? I assume there will be
joerg: no.
SD> many complete http sessions/flows in 90s, but what about ftp, nntp,
SD> streaming media etc? Are people concerned about this?
If the interest is in having an archive of traffic to see
how it changes over time, then biases from short trace
durations would apply to both current and archived traffic.
This probably wouldn't hurt too much.
yeah, if your goal is biased data,
we're definitely on the right track. <s>
As long as it is clear that one shouldn't use these 90
second traces to draw conclusions about the duration of
streaming media flows, or FTP connections through dial-up
modems, 90 seconds would be fine.
It's not clear there is a trace duration that would
service both camps, and 90 seconds seems like a reasonable
compromise.
not reasonable to most folks i hear complaints from.
> An option being considered is to let different monitors run different
> trace strategies. It is also possible to "profile" measurement points,
> where we do not keep the data, but some graphs on important parameters,
> which allow you to check how a particular sample relates to the activity
> of the rest of the day. Instead of pictures, we could keep high-level
> samples about certain parameters (for instance packets/sec, megabits/sec,
> flows/sec in samples of one second). This vastly decreases the amount
> of information stored, but still provides a hook to evaluate the bias
> for a detailed sample in correlation to a complete (long) trace.
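A minimal sketch of the per-second profile described above, to make the idea concrete. It assumes packets have already been decoded into (timestamp, length, flow key) tuples; the function name, the tuple layout, and the reading of "flows/sec" as flows first seen in each second are all illustrative assumptions, not anything specified in this thread.

```python
from collections import defaultdict


def profile_per_second(packets):
    """Summarize a trace as one small record per second.

    packets: iterable of (timestamp_seconds, length_bytes, flow_key)
    tuples. This layout is an assumption made for the sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # second -> [packet count, byte count]
    new_flows = defaultdict(int)          # second -> flows first seen then
    seen = set()

    for ts, length, flow in packets:
        sec = int(ts)
        totals[sec][0] += 1
        totals[sec][1] += length
        if flow not in seen:
            seen.add(flow)
            new_flows[sec] += 1

    # Only seconds that contain at least one packet appear in the result.
    return {
        sec: {
            "packets": pkts,
            "megabits": nbytes * 8 / 1e6,
            "new_flows": new_flows.get(sec, 0),
        }
        for sec, (pkts, nbytes) in sorted(totals.items())
    }
```

Comparing these per-second records for the window covered by a detailed sample against the records for the rest of the day is one way to judge how representative that sample is.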
That would be remarkable, addressing both issues.
I'm curious what sort of design you can come up with,
and hope that it will be simple enough.
neil i think the idea is that we're all supposed
to help him, not just make him propose something
we'll all hate :)
(seriously, your comments are Extremely useful,
if we had this feedback from everyone we'd
move a lot faster i reckon -- )
> > I don't know, but the thought of having automatically
> > generated graphs for the raw traces is interesting.
> > You wouldn't be able to automatically generate some of the
> > graphs from http://www.caida.org/outreach/papers/Inet98/?
> > Not serious, just a wish-list item. I'm curious how figure
> > 8 changes over time.
>
> I think we are willing to go to great lengths to match the needs for traces,
> however, we may have to prioritize and consider how long it takes to
> implement certain features. Having an extensive set of pictures on very
> few selected traces sounds like the best option to me at the moment.
What is the goal of presenting the graphs alongside the
traces? Is the intent to advertise the trace in some way?
Or to present sample analysis so that I can verify the
function of my tools? Or are you going to do the simple
traffic analysis "for us" so that we can use current data
in our research without writing our own scripts?
all of those are options, yes
It would be exciting to build a framework where traffic
analysis papers became dynamic: that their graphs were
updated as new data supplied by your project was generated.
If the conclusions noted in the paper from earlier data
were no longer justifiable, they would be removed or
redlined.
but that's the real idea, yes.
(from my perspective, anyway)
Now that I look at the graphs presented for the nzix
traces, I have some idea of the context of your question.
If it's not impossible, splitting the TCP traffic into the
top four or five protocols by port might be interesting.
Since I don't know the goals of including the graphs,
I'm not sure what to suggest.
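A rough sketch of the by-port split suggested above, assuming TCP packets have already been reduced to (source port, destination port, length) tuples. Treating the lower-numbered port as the service port is a common heuristic and an assumption here; the function name and tuple layout are likewise hypothetical.

```python
from collections import Counter


def top_tcp_ports(packets, n=5):
    """Return the n busiest ports by byte count, plus the remaining bytes.

    packets: iterable of (src_port, dst_port, length_bytes) tuples.
    Assumption: the lower-numbered port identifies the service.
    """
    bytes_by_port = Counter()
    for src_port, dst_port, length in packets:
        bytes_by_port[min(src_port, dst_port)] += length

    top = bytes_by_port.most_common(n)
    # Everything outside the top n is lumped into a single "other" total.
    other = sum(bytes_by_port.values()) - sum(count for _, count in top)
    return top, other
```

Calling top_tcp_ports(packets, n=4) would give the four largest ports and one "other" bucket, which maps directly onto a stacked graph.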
more later,
k