Using MetricRecord in Triggers

I recently tackled an interesting support case:

I created a trigger which sends data to an HTTP endpoint through Open Data Stream, but I am having trouble replicating a specific metric: Network - Bytes In/Out by IP. Currently I am running the trigger on FLOW_TICK and TCP_CLOSE, but it isn’t the same as what I see on the L2 page for the group for which I have the trigger running.

Trying to recreate the Network Bytes In/Out metric in a trigger can be a daunting task. This is mostly due to the nature of triggers: triggers fire on specific events, but the Network Bytes In/Out metrics don’t have an event to trigger on. We calculate these metrics as totals over each metric cycle (30-second, 5-minute, and 1-hour).
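To make "a total for each metric cycle" concrete, here is a simplified model of the bookkeeping a FLOW_TICK/TCP_CLOSE trigger would have to reproduce on its own: per-event byte counts rolled up into fixed 30-second buckets keyed by cycle start and peer IP. This is a plain JavaScript sketch with made-up sample events and hypothetical helpers (`cycleStart`, `rollUp`). It is not how the appliance actually computes the metric; it only illustrates why no single event carries the final number.

```javascript
// Simplified, hypothetical model: roll per-event byte counts up into
// fixed-length metric cycles, keyed by cycle start time and peer IP.
var CYCLE_MS = 30 * 1000; // 30-second metric cycle

// Align a millisecond timestamp to the start of its cycle.
function cycleStart(timeMs) {
    return Math.floor(timeMs / CYCLE_MS) * CYCLE_MS;
}

// Sum bytes per IP within each cycle: { cycleStart: { ip: bytes } }
function rollUp(events) {
    var totals = {};
    for (var i = 0; i < events.length; i++) {
        var ev = events[i];
        var start = cycleStart(ev.time);
        if (!totals[start]) {
            totals[start] = {};
        }
        totals[start][ev.ip] = (totals[start][ev.ip] || 0) + ev.bytes;
    }
    return totals;
}

// Made-up sample events: the first two fall in one cycle, the third in the next.
var events = [
    { time: 1489090981000, ip: "10.0.0.1", bytes: 100 },
    { time: 1489091005000, ip: "10.0.0.1", bytes: 50 },
    { time: 1489091012000, ip: "10.0.0.1", bytes: 25 }
];
console.log(JSON.stringify(rollUp(events)));
```

Only once a cycle has closed is its total known, which is why reading the already-computed value is easier than rebuilding it event by event.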

Instead of calculating this metric from scratch, we can use a trigger to look up the metric data we’ve already calculated. This trigger will fire on METRIC_RECORD_COMMIT: we’ll look up the value in the metric cycle store immediately before we commit those metrics to disk, and then send that data to our HTTP endpoint. We call these ‘bridge triggers’; instead of firing on events in the live data capture, they fire on an event in the datastore (the bridge). These can be very tricky, but I hope to shed some light on them with an example here.

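Triggers on METRIC_RECORD_COMMIT require that we define which metric types we want to look at. The metric catalog provides this information: Network Bytes In by IP lives in the bytes_in field of the extrahop.device.net_detail metric type.

We first need to configure our trigger’s advanced options to incorporate this information: set the metric cycle to 30sec and the metric types to extrahop.device.net_detail.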

Since we’re interested in the 30-second metric cycle for a Device metric type, METRIC_RECORD_COMMIT should fire at least once every 30 seconds for every device on your EDA (ExtraHop Discover Appliance) for which we’re committing data. Each time the trigger fires, it exposes a MetricRecord object that contains information about the device it’s firing on (MetricRecord.object) and all of the metrics being committed for that object (MetricRecord.fields). The metric catalog says we care about “bytes_in” for this particular metric, so our trigger code will look specifically at MetricRecord.fields.bytes_in.

MetricRecord.fields.bytes_in is a Topnset object, which contains a collection of metrics grouped by key. Topnset objects provide methods to query for specific keys, and they also have an entries property: a JavaScript Array of all metrics in the Topnset. Each element of entries is an object with a key property and a value property, and the array holds at most N elements, where N is currently hardcoded to 1000. (You can’t get more than 1000 currently. Sorry.)

For each element of MetricRecord.fields.bytes_in.entries, the value property is the value of the metric: here, the count of bytes for that key in this metric cycle. The key property is another object with type and value properties. The type tells you what kind of object the value is (for example, int, string, or ipaddr), and the value is the value of the key itself. For ipaddr keys like these, the trigger below reads the address from key.value.addr. (This is all described in our Topnset documentation.)
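Because entries is an ordinary JavaScript Array, you can sort and slice it like any other array. The sketch below runs in plain JavaScript against a hand-built stand-in for the entries array, with sample values borrowed from the Runtime Log output later in this post; `topK`, `possiblyTruncated`, and `TOPN_LIMIT` are hypothetical names. It returns the k largest entries and flags when the array has reached the 1000-entry cap, which is a hint (not proof) that some keys may have been left out.

```javascript
// Mirrors the 1000-entry cap described above (hypothetical constant).
var TOPN_LIMIT = 1000;

// Return the k entries with the largest values, without modifying the input.
function topK(entries, k) {
    var copy = entries.slice();
    copy.sort(function (a, b) { return b.value - a.value; });
    return copy.slice(0, k);
}

// An array already at the cap may not include every key.
function possiblyTruncated(entries) {
    return entries.length >= TOPN_LIMIT;
}

// Hand-built stand-in for MetricRecord.fields.bytes_in.entries.
var entries = [
    { key: { type: "ipaddr", value: { addr: "192.168.254.20" } }, value: 104 },
    { key: { type: "ipaddr", value: { addr: "192.168.254.253" } }, value: 39744 },
    { key: { type: "ipaddr", value: { addr: "136.146.208.117" } }, value: 4867 }
];

var top2 = topK(entries, 2);
for (var i = 0; i < top2.length; i++) {
    console.log(top2[i].key.value.addr + ": " + top2[i].value);
}
console.log("possibly truncated: " + possiblyTruncated(entries));
```

Copying before sorting keeps the original order intact, in case other code in the same trigger iterates over entries afterwards.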

What we want is the MetricRecord.fields.bytes_in.entries[x].key.value and the MetricRecord.fields.bytes_in.entries[x].value for all x in entries[].
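In plain JavaScript terms, "for all x in entries[]" is a simple loop that collects key/value pairs. The sketch below uses a hand-built stand-in for the entries array and hypothetical helpers (`keyToString`, `toPairs`). It assumes, as the trigger code in this post does, that an ipaddr key carries its address in key.value.addr; for other key types it falls back to key.value itself. The keyless entry in the sample mirrors the `if (entries[e].key)` guard in the trigger.

```javascript
// Turn a Topnset key into a printable string.
// Assumption: ipaddr keys hold the address in key.value.addr (as the trigger
// in this post uses it); other key types are printed as-is.
function keyToString(key) {
    if (key.type === "ipaddr" && key.value && key.value.addr !== undefined) {
        return String(key.value.addr);
    }
    return String(key.value);
}

// Collect [key, value] pairs for every entry that has a key.
function toPairs(entries) {
    var pairs = [];
    for (var x = 0; x < entries.length; x++) {
        if (entries[x].key) {
            pairs.push([keyToString(entries[x].key), entries[x].value]);
        }
    }
    return pairs;
}

// Hand-built stand-in for MetricRecord.fields.bytes_in.entries.
var entries = [
    { key: { type: "ipaddr", value: { addr: "52.45.155.59" } }, value: 1432 },
    { key: { type: "string", value: "example" }, value: 7 },
    { key: null, value: 3 }
];
console.log(JSON.stringify(toPairs(entries)));
```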

As mentioned earlier, METRIC_RECORD_COMMIT will fire at least once every metric cycle for every device. If you want to restrict your output to certain devices, you will have to look at MetricRecord.object — for ‘extrahop.device.net_detail’, the object will be a Device object. Our trigger API documentation provides full details on how to use the Device object to get the information you want.
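If you later want to report on several devices instead of one, a lookup table of addresses avoids a chain of comparisons. This is a plain JavaScript sketch: devices are modeled as objects whose ipaddrs array holds strings, and `WATCHED` and `isWatched` are hypothetical names. In a real trigger, ipaddrs holds IPAddress objects, so you would need to convert each one to a string (or keep using equals()) before the lookup; check the Device and IPAddress sections of the trigger API documentation for the exact conversion.

```javascript
// Hypothetical set of addresses to report on.
var WATCHED = {
    "192.168.5.5": true,
    "192.168.5.6": true
};

// Return true if any of the device's addresses is in WATCHED.
// Assumption: ipaddrs holds plain strings here; a real Device object holds
// IPAddress objects, which would need converting to strings first.
function isWatched(device) {
    var addrs = device.ipaddrs || [];
    for (var i = 0; i < addrs.length; i++) {
        if (Object.prototype.hasOwnProperty.call(WATCHED, addrs[i])) {
            return true;
        }
    }
    return false;
}

console.log(isWatched({ ipaddrs: ["192.168.5.5"] })); // true
console.log(isWatched({ ipaddrs: ["10.1.1.1"] }));    // false
console.log(isWatched({ ipaddrs: [] }));              // false
```

Checking every address, rather than only ipaddrs[0], covers devices that report more than one IP.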

With all of this information in hand, I’m ready to tie it all together into a trigger:

// Event: METRIC_RECORD_COMMIT
// Metric Cycle: 30sec
// Metric Types: extrahop.device.net_detail

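var thisIP = MetricRecord.object.ipaddrs[0]; // first IP address of this record's device
var myIP = new IPAddress("192.168.5.5");    // my laptop's IP address

if (thisIP && thisIP.equals(myIP)) { // only run for my laptop, instead of all devices

    // Header: metric type (MetricRecord.id) and record time (MetricRecord.time)
    var output = MetricRecord.id + "\n" + MetricRecord.time + "\n";

    // Guard against a record that carries no bytes_in field
    var bytesIn = MetricRecord.fields.bytes_in;
    var entries = bytesIn ? bytesIn.entries : [];

    for (var e = 0; e < entries.length; e++) {
        // Skip entries without a key; for ipaddr keys the address is in key.value.addr
        if (entries[e].key) {
            output += entries[e].key.value.addr + ": " + entries[e].value + "\n";
        }
    }
    debug(output);
}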

Here is the Runtime Log output for one of the 30-second metric cycles in my lab:

Thu Mar 09 12:22:30
extrahop.device.net_detail
1489090980000
128.121.22.158: 52
192.168.254.20: 104
208.79.254.254: 2640
216.58.193.110: 1289
173.194.202.125: 422
173.192.82.194: 405
136.146.208.117: 4867
216.58.193.101: 3001
192.168.254.253: 39744
52.45.155.59: 1432
74.125.28.189: 1177
149.154.164.3: 1586

Thu Mar 09 12:22:30
extrahop.device.net_detail
1489090980000
192.168.254.254: 2640

Thu Mar 09 12:22:30
extrahop.device.net_detail
1489090980000
192.168.254.20: 104
192.168.254.253: 19872

Thu Mar 09 12:22:30
extrahop.device.net_detail
1489090980000
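The third line of each block is MetricRecord.time. Read as milliseconds since the Unix epoch, 1489090980000 matches the Thursday, March 9 date in the log, and it falls exactly on a 30-second boundary. Here is a plain JavaScript sketch for turning it into a readable timestamp before you send it anywhere; `describeCycle` is a hypothetical helper, and the millisecond interpretation is an assumption based on this output.

```javascript
// Hypothetical helper: describe a MetricRecord.time value, assumed to be
// milliseconds since the Unix epoch.
var CYCLE_MS = 30 * 1000;

function describeCycle(timeMs) {
    return {
        iso: new Date(timeMs).toISOString(),  // UTC, ISO 8601
        alignedTo30s: timeMs % CYCLE_MS === 0 // on a 30-second boundary?
    };
}

var info = describeCycle(1489090980000);
console.log(info.iso);          // 2017-03-09T20:23:00.000Z
console.log(info.alignedTo30s); // true
```

toISOString() always reports UTC, so the result differs from the local-time stamps the Runtime Log prints.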

You’ll notice that we recorded several METRIC_RECORD_COMMIT events for this one time period; this is simply due to idiosyncrasies in how we store and manage data. But here we are: the Network Bytes In by IP for my laptop for this 30-second metric cycle. This data should be identical to what I would see in the ExtraHop Discover Appliance’s web UI if I were to look at my laptop’s L2 metrics for this 30-second slice of time. We can expand from here and do much more with this data: look at additional devices, look at 5-minute or 1-hour metrics instead of 30-second ones, send the data off-box to an Open Data Stream endpoint, and so on.
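For the off-box case, the per-IP lines are easier to consume as JSON than as the newline-separated debug text above. This plain JavaScript sketch only builds the payload string. It does not send anything, because the Open Data Stream call depends on how your HTTP target is configured (see the trigger API documentation). `buildPayload` and the stand-in record are hypothetical, with values borrowed from the log above, and the sketch assumes ipaddr keys carry the address in key.value.addr, as the trigger in this post does.

```javascript
// Hypothetical helper: turn one record's bytes_in entries into a JSON string.
// Assumes ipaddr keys carry the address in key.value.addr, as in the trigger above.
function buildPayload(record) {
    var entries = (record.fields.bytes_in && record.fields.bytes_in.entries) || [];
    var bytesIn = {};
    for (var i = 0; i < entries.length; i++) {
        if (entries[i].key) {
            bytesIn[entries[i].key.value.addr] = entries[i].value;
        }
    }
    return JSON.stringify({
        metric: record.id,
        time: record.time,
        bytes_in: bytesIn
    });
}

// Stand-in record shaped like the one the trigger sees.
var record = {
    id: "extrahop.device.net_detail",
    time: 1489090980000,
    fields: {
        bytes_in: {
            entries: [
                { key: { type: "ipaddr", value: { addr: "192.168.254.20" } }, value: 104 },
                { key: { type: "ipaddr", value: { addr: "192.168.254.253" } }, value: 19872 }
            ]
        }
    }
};
console.log(buildPayload(record));
```

Since a single cycle can produce several records, including the metric type and time in every payload lets the receiving end decide how to line them up.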

A normal capture trigger should take care of 99% of use cases on the appliance, since most of what we care about is looking at wire data as it comes in and analyzing it. However, if you want access to some of our Network metrics or other metrics that don’t have a clear capture-level event to trigger on, your best bet may be a bridge trigger that looks at the wire data we’ve already processed.

N.B. I glossed over some of the legwork in writing this post: how do I know that MetricRecord.fields.bytes_in is a Topnset? Part of it is super-secret knowledge about how we store certain data in our datastore. Part of it is extra research with the trigger editor, the Runtime Log, and copious application of debug() statements. debug() is your best friend in writing complicated triggers. Here, debug(MetricRecord.fields.bytes_in) told me what I needed to know: "[object Topnset]".
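The "[object Topnset]" text is what an object looks like when converted to a string: it names the type but hides the contents. Serializing a single entry with JSON.stringify is one way to see the key/type/value layout instead. The sketch below runs in plain JavaScript, with console.log standing in for debug() and a hypothetical `FakeTopnset` that only imitates the real object's string form; whether JSON.stringify produces useful output for the appliance's native objects is something to confirm in the Runtime Log.

```javascript
// Hypothetical stand-in that prints as "[object Topnset]" when stringified.
function FakeTopnset(entries) {
    this.entries = entries;
}
FakeTopnset.prototype[Symbol.toStringTag] = "Topnset";

var t = new FakeTopnset([
    { key: { type: "ipaddr", value: { addr: "149.154.164.3" } }, value: 1586 }
]);

// String conversion only reveals the type name.
console.log(String(t)); // [object Topnset]

// Serializing one entry reveals its structure.
console.log(JSON.stringify(t.entries[0]));
```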

Hi, glad to see this information can be of use to other people as well. I just had one follow-up question regarding the Topnset object. Is there any chance that the hardcoded 1000 will be changed in the near future? We want to be able to see more than just the first 1000 records. If not, we may have to go back to querying the REST API.