Humans & Robots (or... Uncovering Hidden Automated Workloads)



Servers that are designed to serve humans don’t always serve just humans. If you look closely, as ExtraHop enables you to do, you’ll also find a lot of automated traffic hitting a server. Robot traffic, I’ll call it, to keep with the theme of the eye-catching title.

The robots usually aren’t the evil kind. Sometimes they’re just reaching out to a web API to pull in a recent stock price or temperature to keep a web browser tab fresh. Or they may be issuing synthetic transactions to keep an eye on the availability of a key web page.

But they might be evil robots, draining away server performance, or worse, sensitive data, in a slow, trickling sort of way.

Wouldn’t it be cool to get an accounting of all the robo-transactions playing out in your environment, whether good or evil?

ExtraHop can help.

First, it’s trivial to expose the robots with the ExtraHop browser UI if you have a time of day when there’s no human-driven activity. For example, if your transactions-per-second graph looks like this…

… then one click in the ExtraHop UI will list all the URLs or clients that make up that constant workload.

But… what if you have a large, variable and continual human component like in the graph below, and you suspect there’s a robot component hidden deep underneath? How do you excavate and shine a light on those robot transactions?

One answer: A script like this built on the ExtraHop Python API.

Recall that one of the key things the ExtraHop Python API exposes is the historical data in the ExtraHop datastore. The premise of the script here is simple: extract the details of URLs accessed during five recent time intervals, then compare the number of times each URL is accessed across those intervals. If the count is exactly the same 5 out of 5 times (or even 4 out of 5 times), there’s a good chance that it’s a robot at work. (The number 5 is configurable inside the script, BTW.)
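The comparison step can be sketched in a few lines of plain Python. This is a minimal sketch, not the actual script: it assumes you’ve already queried the datastore and built one `{url: request_count}` dict per interval (the `find_robots` name and the sample data are mine, for illustration only).

```python
from collections import Counter

def find_robots(interval_counts, min_matches=4):
    """Given one {url: request_count} dict per time interval, return
    URLs whose request count repeats exactly in at least min_matches
    intervals -- a strong hint of scheduled, non-human traffic."""
    candidates = {}
    all_urls = set().union(*interval_counts)
    for url in all_urls:
        counts = [ic.get(url, 0) for ic in interval_counts]
        # Find the count value that recurs most often across intervals
        count, matches = Counter(counts).most_common(1)[0]
        if matches >= min_matches and count > 0:
            candidates[url] = (matches, count)
    return candidates

# Illustrative data: one URL is hit exactly 18 times in every interval,
# while another varies the way human-driven traffic does.
intervals = [
    {"www.example.com/post/blog/news/network": 18, "a.example.com/x": 7},
    {"www.example.com/post/blog/news/network": 18, "a.example.com/x": 31},
    {"www.example.com/post/blog/news/network": 18, "a.example.com/x": 2},
    {"www.example.com/post/blog/news/network": 18, "a.example.com/x": 9},
    {"www.example.com/post/blog/news/network": 18, "a.example.com/x": 12},
]
for url, (matches, count) in find_robots(intervals).items():
    print(f"({matches} out of {len(intervals)} times) {count} requests for {url}")
```

The steady 18-requests-per-interval URL is flagged; the wildly varying one is not.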

Here’s some sample output when the script was applied to a router device serving Internet-bound web traffic:

Establishing session with ExtraHop device @ 10.10.9.8… connected.

Checking 14:03:00 to 14:33:00
Checking 13:33:00 to 14:03:00
Checking 13:03:00 to 13:33:00
Checking 12:33:00 to 13:03:00
Checking 12:03:00 to 12:33:00

(5 out of 5 times) 18 requests for www.example.com/post/blog/news/network
(5 out of 5 times) 15 requests for msh.example.com/utils/get
…

The example script can also be edited to focus on other resources the robots might be interested in, say database calls or DNS lookups.
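Switching targets mostly means changing the metric query the script sends to the ExtraHop system. The sketch below shows the idea as a table of query templates; the specific `metric_category` and `metric_specs` names are my assumptions for illustration, so check the metric catalog on your own ExtraHop system before borrowing them.

```python
import json

# Hypothetical query templates -- the category and spec names here are
# illustrative assumptions, not verified metric names.
TARGETS = {
    "urls": {"metric_category": "http_server", "metric_specs": [{"name": "uri"}]},
    "db":   {"metric_category": "db_server",   "metric_specs": [{"name": "statement"}]},
    "dns":  {"metric_category": "dns_server",  "metric_specs": [{"name": "host_query"}]},
}

def build_query(target, object_ids, from_ms, until_ms, cycle="30sec"):
    """Assemble a JSON metrics-query body for one time interval.

    Times are millisecond timestamps; object_ids identifies the device(s),
    such as the router in the sample output above."""
    body = {
        "object_type": "device",
        "object_ids": object_ids,
        "cycle": cycle,
        "from": from_ms,
        "until": until_ms,
    }
    body.update(TARGETS[target])
    return json.dumps(body)
```

With this shape, pointing the robot hunt at DNS instead of URLs is a one-word change to the `target` argument; the interval-comparison logic stays the same.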

See the comments in the script for how to adjust parameters. And please feel free to post any improvements you might make to the script. With my Python skills, there are certainly many to be had!