Tip of the Week: Passing Data Between Triggers and Trigger Events

Getting Started

This TotW is all about sharing data between trigger events. It’s a super useful
piece of the ExtraHop trigger writer’s toolkit, and while there are some
subtleties to it, it’s really not that bad.

First, let me try to explain why you’d ever want to do that. When a trigger
fires, it provides context for the data that is being chewed on - things like
the IP addresses of the endpoints, TCP port numbers, maybe the HTTP URI or other
things that ExtraHop has been able to extract out of the data. Sometimes a
piece of information available in one event is wanted (but not immediately
available) in another. Good example: maybe in DB_RESPONSE you want access to
what the SQL query was from the request. Or in a more extreme case, maybe you
are processing a database request and you actually want something from a
related HTTP connection.

See what I’m getting at now? You have data in one trigger and you want it to be
available in another, and this is really cramping your style.

Well, there happen to be two different techniques that can help us avoid
harshing our mellow. They’re known conversationally as the “flow store” and the
“session table”. I’m gonna cover what both of those are, how they’re different,
when to use them, and how to write the code for both. It’s going to be great,
trust me.

The Flow Store

Let’s start with the first example I gave above: how to provide the SQL
statement from a DB query to the DB_RESPONSE event. This is a great time to make
use of the flow store. Why? Well, this is important: the request and response
use the same TCP connection (the same “flow”), so the Javascript interpreter
that runs the trigger code can pass data from one event to the other in a very
lightweight and performant manner.

So here’s a rule: If you want to pass data between one event and another on the
same flow of data (think a TCP connection, a stream of related UDP datagrams,
etc), use the flow store. It’ll be efficient both in terms of performance and
resource utilization, and best of all it’s super flexible.

That last point deserves an explanation all its own, but it’s best to actually
show how it works now. Here are a few examples:

Flow.store.thing = "some value I need";

This creates a new key in the flow store named “thing” whose value is a string.
Here’s another, more complicated example:

Flow.store.my_stuff = { foo: 16, sql: DB.statement, bar: [ "baz" ] };

Here’s what that does - it dynamically adds a new property to the Flow.store
object, and the value of that property can be any legal Javascript value - so
that could be a simple value like a number or a string, but it can also (like in
this example) be a full fledged object that’s arbitrarily complicated. This
example assigns an object with three keys (foo, sql, and bar) and each of the
keys has a value (the number 16, a copy of the DB.statement variable, and an
array containing a string). If I had wanted to I could have nested more objects
within it, or arrays of arrays, functions, etc - any legal Javascript. Now, most
of the time, you’re just going to want to stash away specific simple things, but
don’t forget that it can hold whatever you want organized however you want it.
It’s truly super flexible.

So once you add it to the flow store, how do you retrieve it when you want it?
Like this:

var thing = Flow.store.thing;
var my_var = Flow.store.my_stuff;

thing and my_var are now variables that contain whatever was previously
assigned to those properties of the flow store for that connection.

Remember, the data is only available to triggers that are acting on the same
flow of packets as the one where it was set. Each flow gets its own flow store
with its own variables. So the same trigger can fire for multiple flows, and
each flow can have a variable set with the same name, each with its own value,
and they won’t conflict or otherwise step on each other.
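
To make the request/response pattern concrete, here’s a minimal sketch that can
be run outside an ExtraHop system. The makeFlow() and runTrigger() helpers and
the sample SQL strings are hypothetical stand-ins invented for this
illustration; on a real system the trigger is handed the Flow and DB objects,
and only the Flow.store and DB.statement usage comes from the examples above.

```javascript
// Hypothetical stand-in: each flow gets its own empty store object.
function makeFlow() {
    return { store: {} };
}

// Hypothetical stand-in for a trigger assigned to both DB events.
// event, Flow and DB are passed in only so the sketch runs anywhere.
function runTrigger(event, Flow, DB) {
    if (event === "DB_REQUEST") {
        // Stash the statement on this flow's store for the response to use.
        Flow.store.sql = DB.statement;
        return null;
    }
    if (event === "DB_RESPONSE") {
        // Same flow, so the value set during the request is visible here.
        return Flow.store.sql;
    }
    return null;
}

// Two separate connections using the same property name.
var flowA = makeFlow();
var flowB = makeFlow();

runTrigger("DB_REQUEST", flowA, { statement: "SELECT * FROM users" });
runTrigger("DB_REQUEST", flowB, { statement: "SELECT * FROM orders" });

var seenOnA = runTrigger("DB_RESPONSE", flowA, {});
var seenOnB = runTrigger("DB_RESPONSE", flowB, {});

console.log(seenOnA);
console.log(seenOnB);
```

Each flow hands back its own statement even though both used the property name
sql, which is the per-flow isolation described above.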

Cleaning Up

One last point before we move on: you don’t need to clean up after yourself in
most cases, because once the flow is done being processed its flow store is
automatically freed up. It can still be a very good idea to do so anyway, and
there are two reasons why:

First, for long-lived connections, a value set in the flow store earlier could
mistakenly be used later. Think about a TCP connection that is used for many
requests and responses. On one request you add a value to the flow store, and
then use it in the response without cleaning up. Then on another request (same
TCP connection, so same flow) no value is added to the flow store, but in its
response the trigger tries to grab it anyway, gets the value left over from the
first request, and doesn’t know any better. These can be maddening bugs to track
down! To avoid this, if you’re sure that you’re done with a piece of data on the
flow store, do one of two things with it:

  1. Set it to null

     Flow.store.my_stuff = null;
    
  2. Delete it from the store object like so:

     delete Flow.store.my_stuff;
    

Why list two methods? Well, there’s some Javascript voodoo involved in answering
this, but it’s generally more performant to assign null to the property. So
there’s that - but the reason to delete it is that it is cleaner, because it
actually gets rid of the property such that it’ll no longer be defined
afterwards. Thus, it’s a tradeoff between running faster and being easier to
work with - if you set something to null the property still exists and attempts
to access it will result in, well, null (but null is something!). But if you
delete it, the system will act like it was never there to begin with. Pick one
and do it.
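
Here’s a small sketch of how the two cleanup styles differ when the value is
read back later. It uses a plain object as a stand-in for Flow.store (an
assumption made so the snippet runs anywhere); the property names a and b are
made up for the illustration.

```javascript
// Plain object standing in for Flow.store (assumption: it behaves like one).
var store = {};

store.a = "first";
store.b = "second";

store.a = null;   // option 1: the property remains, holding null
delete store.b;   // option 2: the property is gone entirely

var aStillThere = "a" in store;   // the key is still present
var bStillThere = "b" in store;   // the key no longer exists
var aValue = store.a;             // null
var bValue = store.b;             // undefined

// A loose comparison with null catches both cases at once.
var aIsEmpty = (store.a == null);
var bIsEmpty = (store.b == null);

console.log(aStillThere, bStillThere, aValue, bValue, aIsEmpty, bIsEmpty);
```

Whichever style you pick, a check like Flow.store.thing == null in the reading
trigger treats “set to null” and “deleted” the same way.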

However, be careful that you don’t delete/nullify something that another trigger
acting on the same event might expect to be there. If you have this situation
(multiple triggers acting on the same flows using the same flow store data),
make sure that the trigger that does the delete/nullify operation gets run last
by setting its priority appropriately in the trigger’s definition.

Secondly, aside from the subtle bugs that not cleaning up can introduce, entries
in the Flow store do take up RAM, and it doesn’t grow on trees. On a busy box
running lots of triggers, cleaning up after yourself can honestly make a big
difference in how well the system scales.

So, another rule: Delete/nullify data from the flow store when you’re finished
with it. Do this out of habit even if you don’t think you need to do so.
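
To see why the habit matters, here’s a sketch of the stale-value bug on a
long-lived connection, followed by the read-then-clean-up version that avoids
it. The Flow object and the onRequest(), onResponseSloppy() and
onResponseTidy() functions are hypothetical stand-ins for trigger code, written
so the snippet runs outside an ExtraHop system.

```javascript
// Plain object standing in for one long-lived flow (hypothetical stand-in).
var Flow = { store: {} };

// Request handler: only some requests have something worth stashing.
function onRequest(statement) {
    if (statement) {
        Flow.store.sql = statement;
    }
}

// Response handler without cleanup: whatever is in the store gets used.
function onResponseSloppy() {
    return Flow.store.sql;
}

// Response handler with cleanup: read the value, then remove it.
function onResponseTidy() {
    var sql = Flow.store.sql;
    delete Flow.store.sql;
    return sql;
}

// Without cleanup, the second response sees the first request's value.
onRequest("SELECT 1");
var first = onResponseSloppy();
onRequest(null);                  // nothing stored this time
var stale = onResponseSloppy();   // leftover from the first request

// With cleanup, the second response correctly finds nothing.
Flow.store = {};
onRequest("SELECT 1");
var firstTidy = onResponseTidy();
onRequest(null);
var secondTidy = onResponseTidy();

console.log(first, stale, firstTidy, secondTidy);
```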

Now that you’re an expert in using the Flow store, let’s move on.

The Session Table

The flow store is useful for passing data between events operating on the same
flow, so you can probably guess that the session table is for passing data
between events on unrelated flows. Such a guess is pretty accurate.

I mentioned a scenario earlier of wanting to get at a piece of data (say,
an HTTP URI) from a web request in a related database event. In the HTTP_REQUEST
event, you’d have something that looks like this:

Session.add("the_uri", HTTP.uri);

To retrieve the value in a DB_RESPONSE event, you’d have code like this:

var uri = Session.lookup("the_uri");
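
For experimenting with this pattern away from an ExtraHop system, here’s a
sketch that uses a Map-backed stand-in for the Session object. The stand-in, the
sample URI, and the choice to return null for a missing key are all assumptions
of this sketch; the real Session methods take more arguments and have their own
rules, so check the Trigger API docs before relying on any of this behavior.

```javascript
// Hypothetical Map-backed stand-in for the Session object, so the sketch
// runs outside an ExtraHop system. Only add() and lookup() are mimicked.
var table = new Map();
var Session = {
    // Simplified: just stores the value. The real add() accepts options and
    // has its own rules for existing keys (see the Trigger API docs).
    add: function (key, value) {
        table.set(key, value);
    },
    // Assumption of this sketch: a missing key yields null.
    lookup: function (key) {
        return table.has(key) ? table.get(key) : null;
    }
};

// What the HTTP_REQUEST trigger would do (HTTP is a stand-in too).
var HTTP = { uri: "/checkout?cart=42" };
Session.add("the_uri", HTTP.uri);

// What the DB_RESPONSE trigger would do, on a completely different flow.
var uri = Session.lookup("the_uri");
if (uri !== null) {
    console.log("DB response related to " + uri);
}

// Always allow for the entry being absent (never set, or already removed).
var missing = Session.lookup("no_such_key");
```

The habit worth copying is the check before use: unlike the flow store, the
entry may have been removed by the time the second trigger goes looking for it.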

Now, a few things to be aware of when using the session table…

First, the session table is different from the flow store in that the session
table is a key/value store and will not accept rich Javascript objects as
values. As a workaround, you can (usually) call JSON.stringify() on an object to
serialize it as a string and then JSON.parse() to turn it back into an object -
but be aware that doing so is not a lightweight operation. Stick to simple
string values if you can, preferring multiple session table entries over JSON
strings of compound values.
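
Here’s a sketch of both approaches side by side. JSON.stringify() and
JSON.parse() are standard Javascript; the Map-backed Session stand-in, the key
names, and the sample object are hypothetical and only exist so the snippet can
run outside an ExtraHop system.

```javascript
// Hypothetical Map-backed stand-in for the session table (string values only).
var table = new Map();
var Session = {
    add: function (key, value) { table.set(key, String(value)); },
    lookup: function (key) { return table.has(key) ? table.get(key) : null; }
};

var details = { uri: "/login", status: 200, tags: ["web", "auth"] };

// Option 1: serialize the whole object into one entry.
Session.add("req_details", JSON.stringify(details));
var raw = Session.lookup("req_details");
var restored = raw === null ? null : JSON.parse(raw);

// Option 2 (what the text above recommends): one simple entry per value.
Session.add("req_uri", details.uri);
Session.add("req_status", details.status);
var status = Number(Session.lookup("req_status"));

// Caveat: JSON cannot carry functions, so they are silently dropped.
var lossy = JSON.parse(JSON.stringify({ n: 1, f: function () {} }));

console.log(restored.tags[1], status, Object.keys(lossy));
```

The last line is one reason for the “(usually)” above: anything JSON can’t
represent, such as a function, doesn’t survive the round trip.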

Secondly, the keys in the session table are globally unique and globally
available such that if multiple triggers try to use the same key for different
things, you’ll have issues. This is very unlike the flow store where each
connection gets its own sandbox to play in - the session table is one big
playground with all triggers playing in it at the same time.

There are different methods (add(), replace(), update(), etc) that can help to
handle this situation, but in the common cases you can probably avoid the
problem by making the key unique: build it by combining a prefix for what you’re
doing with attributes of what is being stored. Things like session IDs and
usernames come in handy here (an example would be a prefix of “session_” with a
session ID pulled out of a cookie appended).

So there’s a good hint: when putting something into the session table, consider using a
prefix along with a piece of data that will be unique to your use to avoid
conflicts - unless you really know nothing else will use that key.
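
As a sketch of that hint, the snippet below builds keys from a prefix plus an
identifier. The sessionKey() helper, the prefixes, and the sample values are
hypothetical, and a plain Map stands in for the single shared session table so
that the snippet runs anywhere.

```javascript
// Hypothetical helper: build a session table key from a purpose-specific
// prefix and an identifier that is unique to the thing being stored.
function sessionKey(prefix, id) {
    return prefix + "_" + id;
}

// Map standing in for the one global session table shared by all triggers.
var table = new Map();

// Trigger A tracks URIs per web session; trigger B tracks users per session.
var sessionId = "abc123";   // e.g. pulled out of a cookie
table.set(sessionKey("uri", sessionId), "/account");
table.set(sessionKey("user", sessionId), "alice");

// Different prefixes mean neither trigger clobbers the other's entry.
var uriEntry = table.get(sessionKey("uri", sessionId));
var userEntry = table.get(sessionKey("user", sessionId));

// A bare, shared key is not so lucky: the second write wins.
table.set("value", "/account");
table.set("value", "alice");
var clobbered = table.get("value");

console.log(uriEntry, userEntry, clobbered);
```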

Third, session table entries exist until there’s no more room (32768
entries by default at the time of writing this) or they time out, and when
they’re removed you have the option of raising a trigger event to act on that
happening. That’s nothing like the flow store! See the trigger API docs and the
SESSION_EXPIRED event for a full explanation.

Fourth, and I apologize for getting into what might be an implementation detail,
but the session table actually involves a lookup to something outside of the
Javascript interpreter - and that doesn’t come for free. This is related to why
it can’t store Javascript objects as already mentioned, but I bring it up here
because of performance. While the session table is still quite fast, it’s going
to be a bit slower than the flow store. Don’t abuse it.

I mentioned that there are multiple methods available for interacting with the
session table in different ways. There are some subtleties that I won’t attempt
to cover here, so please refer to the Trigger API documentation for the full
details of how to use them (they’re documented as part of the Session object).

Lastly, it’s actually possible to access the session table externally which
allows one to feed the session table from outside of triggers - but that’s a
topic for a future TotW. In the meantime, let your brain chew on the
possibilities there for a minute - they’re infinite.

TLDR

It’s absolutely possible to share data between different triggers and their
events. Use the flow store to pass data between an earlier event and a later one
operating on the same connection. The flow store can hold any legal Javascript
value, but clean up after yourself when you’re done with it. Use the session
table for everything else, but be aware that it won’t accept complex objects for
values, and that there is just one session table for the entire system.