Logging to MongoDB, Part III

In Part II of this mini-series, we logged some JSON-RPC-style payload data straight into MongoDB.

Two questions come up:

  1. Why?
  2. What can we now do with this?

Let’s answer via example. I’ll log into my MongoDB virtual machine and type this:

mongo

I end up at the MongoDB shell prompt:

>

Let’s see if our ‘valves’ DB got created:

> show dbs;

insert1	0.0625GB
local	(empty)
logs	0.0625GB
syslog	0.0625GB
test	0.0625GB
valves	0.0625GB
web	0.0625GB

Sweet, it’s there. Let’s start exploring our data.

> use valves
switched to db valves
> show collections;
metrics
system.indexes

How many entries do we have?

> db.metrics.count()
541

Cool, we’ve got some data to play with. Let’s see how a record looks:

> db.metrics.findOne()
{
	"_id" : ObjectId("534b054c4c446270780a06f3"),
	"location" : "newyork:14",
	"metric" : "pressure",
	"value" : 15
}

Let’s look at the pressure readings and leave out the clutter of the ‘_id’ field (the second argument to find() picks which fields come back). The shell prints the first 20:

> db.metrics.find({ metric: "pressure"},{_id:0})
{ "location" : "newyork:14", "metric" : "pressure", "value" : 15 }
{ "location" : "detroit:7", "metric" : "pressure", "value" : 14 }
{ "location" : "seattle:2", "metric" : "pressure", "value" : 10 }
{ "location" : "tampa:13", "metric" : "pressure", "value" : 10 }
{ "location" : "reno:15", "metric" : "pressure", "value" : 11 }
{ "location" : "reno:12", "metric" : "pressure", "value" : 17 }
{ "location" : "losangeles:3", "metric" : "pressure", "value" : 14 }
{ "location" : "boston:8", "metric" : "pressure", "value" : 18 }
{ "location" : "tampa:3", "metric" : "pressure", "value" : 15 }
{ "location" : "sanfrancisco:8", "metric" : "pressure", "value" : 17 }
{ "location" : "birmingham:11", "metric" : "pressure", "value" : 17 }
{ "location" : "boston:7", "metric" : "pressure", "value" : 12 }
{ "location" : "birmingham:4", "metric" : "pressure", "value" : 17 }
{ "location" : "atlanta:4", "metric" : "pressure", "value" : 13 }
{ "location" : "detroit:9", "metric" : "pressure", "value" : 18 }
{ "location" : "austin:11", "metric" : "pressure", "value" : 16 }
{ "location" : "reno:8", "metric" : "pressure", "value" : 16 }
{ "location" : "boston:7", "metric" : "pressure", "value" : 18 }
{ "location" : "reno:12", "metric" : "pressure", "value" : 15 }
{ "location" : "denver:8", "metric" : "pressure", "value" : 10 }
has more

The ‘has more’ at the bottom means there are more results than the shell prints in one batch (typing ‘it’ shows the next batch). Hmm. Let’s narrow it down to Tampa’s pressure:

// Notice the 'location:/tampa/' bit below. It knows regex!

> db.metrics.find({ location: /tampa/, metric: "pressure"},{_id:0})
{ "location" : "tampa:13", "metric" : "pressure", "value" : 10 }
{ "location" : "tampa:3", "metric" : "pressure", "value" : 15 }
{ "location" : "tampa:8", "metric" : "pressure", "value" : 16 }
{ "location" : "tampa:3", "metric" : "pressure", "value" : 13 }
{ "location" : "tampa:2", "metric" : "pressure", "value" : 18 }
{ "location" : "tampa:1", "metric" : "pressure", "value" : 12 }
{ "location" : "tampa:3", "metric" : "pressure", "value" : 18 }
{ "location" : "tampa:10", "metric" : "pressure", "value" : 16 }
{ "location" : "tampa:13", "metric" : "pressure", "value" : 13 }
{ "location" : "tampa:6", "metric" : "pressure", "value" : 15 }

I’d like to see the readings for valve #13 in Tampa now:

// Regex is now /tampa:13/
> db.metrics.find({ location: /tampa:13/, metric: "pressure"},{_id:0})
{ "location" : "tampa:13", "metric" : "pressure", "value" : 10 }
{ "location" : "tampa:13", "metric" : "pressure", "value" : 13 }
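One caveat about the regex above: /tampa:13/ is unanchored, so it would also match a location like "tampa:130" if one ever showed up. Here is a minimal sketch of a helper that builds a stricter filter. The name locationFilter is made up for this post, and it assumes locations always look like "city:valve", as they do in the sample data.

```javascript
// Hypothetical helper (not part of MongoDB): build a filter on 'location'.
// Assumes locations are always formatted as "city:valve".
function locationFilter(city, valve) {
  if (valve !== undefined && valve !== null) {
    // One specific valve: an exact string match needs no regex at all.
    return { location: city + ":" + valve };
  }
  // Whole city: escape regex metacharacters, then anchor at the start
  // and require the ':' separator so "tampa" cannot match "newtampa:3".
  var escaped = city.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return { location: new RegExp("^" + escaped + ":") };
}

var oneValve = locationFilter("tampa", 13); // { location: "tampa:13" }
var wholeCity = locationFilter("tampa");    // { location: /^tampa:/ }
```

Either result is a plain object, so it can be merged with other criteria (such as metric: "pressure") and handed to db.metrics.find() in the shell.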

I’ll pause here. Hopefully you can see WHY we’d do this: now we can easily mine relevant data, right from the wire. Payload data is almost never logged like this, so it’s quite untapped. The ExtraHop does it in a single spot, without custom app code, etc. It Just Works.

What’s cool about MongoDB is that it stores JSON-like documents, so the objects our trigger produces go in as they are. Querying it is basically just asking for JavaScript-y things based on their properties.
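To make the "JavaScript-y" part concrete, here is a toy sketch in plain JavaScript (runnable in Node.js, no database needed) of how a query document lines up with a record. The matches function is my own illustration and only understands equality, regular expressions, and $gt. It is not how MongoDB works internally.

```javascript
// Toy matcher: handles only equality, regular expressions, and $gt.
// An illustration of the idea, not MongoDB's implementation.
function matches(doc, query) {
  return Object.keys(query).every(function (field) {
    var condition = query[field];
    var value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === "string" && condition.test(value);
    }
    if (condition !== null && typeof condition === "object" && "$gt" in condition) {
      return value > condition.$gt;
    }
    return value === condition;
  });
}

// Sample documents copied from the output earlier in this post.
var docs = [
  { location: "newyork:14", metric: "pressure", value: 15 },
  { location: "tampa:13", metric: "pressure", value: 10 },
  { location: "reno:12", metric: "pressure", value: 17 }
];

// Same shape of query you would pass to db.metrics.find().
var highPressure = docs.filter(function (doc) {
  return matches(doc, { metric: "pressure", value: { $gt: 15 } });
});
console.log(highPressure);
```

Every field in the query has to match for a record to be returned, which is why adding more properties to the query narrows the results.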

For example: which specific valves do we have in New York?

> db.metrics.distinct('location', {location:/newyork:/})
[
	"newyork:14",
	"newyork:4",
	"newyork:13",
	"newyork:5",
	"newyork:7",
	"newyork:9",
	"newyork:2",
	"newyork:3",
	"newyork:10",
	"newyork:12",
	"newyork:15"
]

Which valves have a pressure > 15? These could point to a problem:

> db.metrics.find({metric:"pressure", value: {$gt: 15}}, {_id:0, location:1, value:1})
{ "location" : "reno:12", "value" : 17 }
{ "location" : "boston:8", "value" : 18 }
{ "location" : "sanfrancisco:8", "value" : 17 }
{ "location" : "birmingham:11", "value" : 17 }
{ "location" : "birmingham:4", "value" : 17 }
{ "location" : "detroit:9", "value" : 18 }
{ "location" : "austin:11", "value" : 16 }
{ "location" : "reno:8", "value" : 16 }
{ "location" : "boston:7", "value" : 18 }
{ "location" : "nashville:13", "value" : 17 }
{ "location" : "tampa:8", "value" : 16 }
{ "location" : "austin:2", "value" : 18 }
{ "location" : "newyork:14", "value" : 18 }
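Queries like this return one line per reading, so the same valve can show up several times. Here is a minimal sketch, in plain JavaScript, of boiling a list of readings down to the highest value seen per valve. The function name peakByLocation is hypothetical, and the sample readings are copied from the Tampa output earlier in this post.

```javascript
// Hypothetical helper: highest 'value' seen for each 'location'.
function peakByLocation(readings) {
  var peaks = {};
  readings.forEach(function (reading) {
    var seen = Object.prototype.hasOwnProperty.call(peaks, reading.location);
    if (!seen || reading.value > peaks[reading.location]) {
      peaks[reading.location] = reading.value;
    }
  });
  return peaks;
}

// Sample readings copied from the Tampa output above.
var tampa = [
  { location: "tampa:13", value: 10 },
  { location: "tampa:3", value: 15 },
  { location: "tampa:3", value: 13 },
  { location: "tampa:3", value: 18 },
  { location: "tampa:13", value: 13 }
];

var peaks = peakByLocation(tampa);
console.log(peaks); // { 'tampa:13': 13, 'tampa:3': 18 }
```

In the shell, the input could come from something like db.metrics.find({metric: "pressure"}).toArray(). For a larger collection, grouping on the server with the aggregation framework would likely be the better choice.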

We’re just scratching the surface here. Hopefully this piques your interest in what’s possible.

What we’ve done is pretty dramatic, even with a tiny example like this: we’ve stashed Wire Data of interest into a big-data store (MongoDB) that we can now index, query, and search however we want.

Feel free to comment with any questions, and thanks for sticking with me on this long series!