Trigger metrics not matching up with built-in metrics

triggers
metrics

#1

Version 5.2.2

I am aiming to make a dashboard that shows our overall network throughput (bps), then a breakdown of the throughput, then top client/server throughput on select protocols.

The first two are easily done with a graph of Capture: Network Bytes. The third one seems impossible with the built-in metrics (for example, there is no Capture: Network Bytes: CIFS: by Client IP).

I found the trigger “WAN Opt Analysis - Network App bytes” on our appliance, which seems to attempt what I want, so I borrowed heavily from that trigger to create my own. The problem is that my trigger metrics are not even close to the built-in metrics.

In this picture, the first graph is the built-in metric (Capture: Network Bytes: CIFS), the second graph is my custom metric (code to follow), and the third graph is the built-in NAS metric (All Activity: NAS Read/Write).

I only included the third graph because I noticed that it and my custom metric strangely correlate, although my custom metric is maybe 10-20 Mbps higher.

This is the trigger code. The trigger is assigned to All, and it has one event, FLOW_TICK:

// Runs on FLOW_TICK: classify the flow, then record count and byte metrics.
var app = undefined;
var clientIP = Flow.client.ipaddr;
var bytes = Flow.server.bytes + Flow.client.bytes; // both directions of the flow
var protocol = Flow.l7proto.toLowerCase();
var ipproto = Flow.ipproto.toLowerCase();
var port = Flow.server.port;

// Capture traffic for L7 protocols
if (protocol == 'cifs' || protocol == 'ica' || protocol == 'nfs' || protocol == 'tds') {
    app = protocol;
}
else if (protocol.indexOf('http') > -1) {
    app = 'http';
}
else if (protocol.indexOf('ssl') > -1) {
    app = 'ssl';
}
// Fall back to the server port, then to the IP protocol
else if (port == 10565 || port == 10566) {
    app = 'netapp';
}
else if (ipproto == 'tcp' || ipproto == 'udp') {
    app = ipproto;
}

// Record totals per app, plus a per-client-IP breakdown of each
if (app) {
    Network.metricAddCount('network-app-count-' + app, 1, true);
    Network.metricAddCount('network-app-byte-' + app, bytes, true);
    Network.metricAddDetailCount('network-app-count-detail-' + app, clientIP, 1, true);
    Network.metricAddDetailCount('network-app-byte-detail-' + app, clientIP, bytes, true);
}
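
To rule out the classification logic as a source of the mismatch, the if/else chain can be pulled into a plain function and checked outside the appliance. This is only a sketch: classifyApp is a hypothetical helper, not part of the trigger API, and the sample protocol strings are made up for illustration.

```javascript
// Hypothetical helper that mirrors the if/else chain in the trigger above.
// It takes plain values so it can run in any JavaScript engine.
function classifyApp(l7proto, ipproto, port) {
    var protocol = String(l7proto || '').toLowerCase();
    var transport = String(ipproto || '').toLowerCase();

    if (['cifs', 'ica', 'nfs', 'tds'].indexOf(protocol) > -1) {
        return protocol;
    }
    if (protocol.indexOf('http') > -1) {
        return 'http';
    }
    if (protocol.indexOf('ssl') > -1) {
        return 'ssl';
    }
    if (port === 10565 || port === 10566) {
        return 'netapp';
    }
    if (transport === 'tcp' || transport === 'udp') {
        return transport;
    }
    return undefined; // flow is not counted
}

// Sample inputs are illustrative, not captured from a real appliance.
console.log(classifyApp('CIFS', 'TCP', 445));        // cifs
console.log(classifyApp('HTTP-AMF', 'TCP', 80));     // http
console.log(classifyApp('tcp:10565', 'TCP', 10565)); // netapp
console.log(classifyApp(null, 'ICMP', 0));           // undefined
```

Note that order matters: a flow is matched on its L7 protocol before its port, so an HTTP flow to port 10565 lands in 'http', not 'netapp'.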

I would love to nail this down. Please help!


#2

This is an interesting challenge. I don’t see any errors in your trigger. However, I can tell you that Capture --> Network Bytes shows L2-L4 bytes, whereas Flow.server.bytes is L4 only.

That by itself doesn’t really explain the magnitude of the discrepancy. However, the L4 count is taken after all filtering and de-duplication have run, whereas the Capture metric includes everything that was received. If you go to System Health --> Capture Throughput, is any traffic being tagged as duplicate?
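
As a rough check on why header bytes alone are unlikely to account for a large gap, here is a back-of-the-envelope sketch. It assumes plain Ethernet, IPv4, and TCP with minimum-size headers (no VLAN tag, no options); how the appliance itself counts L2 bytes is not confirmed here.

```javascript
// Minimum header sizes in bytes (assumed: no VLAN tag, no IP or TCP options).
var ETHERNET_HEADER = 14;
var IPV4_HEADER = 20;
var TCP_HEADER = 20;

// Returns the fraction of a frame that is header rather than L4 payload.
function headerOverhead(payloadBytes) {
    var headers = ETHERNET_HEADER + IPV4_HEADER + TCP_HEADER;
    return headers / (payloadBytes + headers);
}

console.log((headerOverhead(1460) * 100).toFixed(1) + '%'); // 3.6% for a full-size segment
console.log((headerOverhead(100) * 100).toFixed(1) + '%');  // 35.1% for a small segment
```

For bulk transfers such as file shares, most segments are near full size, so the L2-vs-L4 difference would be a few percent, which points toward duplicates or filtering for anything larger.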

Also, I’m not sure what kind of box you have or how much throughput you are receiving, but running FLOW_TICK on “All Devices” could be expensive. I would also check whether you are seeing any trigger drops, which would effectively cut your metrics.


#3

Knowing that the Capture network shows all traffic, including duplicates, seems to be key. I assumed it would not.

This is a snapshot of the System Health: Capture Throughput

You can see the majority of the traffic is duplicate traffic.

I have played with using l2bytes instead of bytes, but as you might already guess, it did not get me even close to the Capture metrics.

Trigger load sits at around 12-20%, with no drops, so I am not worried about performance at the moment.

Here are my next questions:

  1. If Capture shows duplicate traffic, is there a similar item to use that does not show duplicate traffic (maybe “All Activity”)?
  2. If I use l2bytes in the trigger, is that before or after filtering and de-duplication?

#4

The L4 bytes (i.e. Flow.server.bytes) are what you would use to get clean traffic. For a comparison, you may want to look at All Activity --> Network Bytes, but I’m not 100% sure.

L2 bytes are handled differently in triggers. If you are executing on FLOW_TICK, the L2 bytes counter is reset on every TICK/TURN, whereas the L4 bytes counter is not reset until the flow is closed. From what I was told, any L2 byte count is taken before dedupe, since we put everything together in the state table at L4, and that is where we can clean up the flow.
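
If the L4 counter really is cumulative for the life of the flow, as described above, then adding its raw value on every tick would count earlier bytes again. That behavior is not confirmed in this thread, so treat the following as a sketch under that assumption. deltaSinceLastTick is a hypothetical helper, and a plain object stands in for whatever per-flow state the trigger has available.

```javascript
// Hypothetical helper: turns a cumulative counter into a per-tick delta.
// `store` is a plain object here so the sketch runs outside the appliance.
function deltaSinceLastTick(store, key, current) {
    var previous = store[key] || 0;
    store[key] = current;
    // A reading lower than the previous one is treated as a counter reset.
    return current >= previous ? current - previous : current;
}

var flowState = {};
var readings = [1000, 2500, 2500, 4000]; // made-up cumulative byte counts
var deltas = readings.map(function (value) {
    return deltaSinceLastTick(flowState, 'bytes', value);
});
console.log(deltas); // [ 1000, 1500, 0, 1500 ]
```

Summing the deltas gives 4000, the final counter value, whereas summing the raw readings would give 10000.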


#5

I am still digging around to get a dashboard of overall network traffic.

The graphs on Settings: System Health seem like a good reference point.

Looking at Settings: System Health: Incoming Throughput Breakdown vs. Dashboard: Capture: Network Bytes (Average Rate), I believe that the Capture device does not show duplicate traffic, and in fact, shows only analyzed traffic.

Can you confirm this?