Automated Traffic Mirroring in AWS

Other Posts in This Series

  1. Automated Traffic Mirroring in AWS (you are here)
  2. Automated Traffic Mirroring in AWS at Scale
  3. Automated Traffic Mirroring in AWS with Full Coverage
  4. Automated Traffic Mirroring in AWS with Tags

Overview
This is what we’re going to build over the course of this series:

Starting with "Why?"
I’m always trying to keep our equation for data value in mind, especially as an ex-software engineer. These days it’s just too easy to create something that seems valuable to me (i.e., fun!) but ultimately fails to increase velocity, reduce friction, or spread valuable intelligence to more humans.


While most of us can likely agree that network traffic is both the richest source of information available in the digital world and the absolute ground source of truth, this first-class data source has historically lacked widespread adoption for three reasons: it arrives at an extremely high velocity, it involves significant friction to acquire in most cases, and there just aren’t that many humans who can understand the resulting data without significant processing and analysis being done first.

The ExtraHop Reveal(x) product already solves the velocity factor by handling ludicrous amounts of network traffic, and serves the human factor by extracting intelligence and presenting it in a way that makes sense. However, until now we’ve been at the mercy of the data center plumbers to have any meaningful amount of access to this data source… and in a traditional data center that hasn’t changed much unless you’re using one of the well-known packet brokers out there and you have a mature tap aggregation infrastructure in place. Lucky!

Okay, enough preamble… so by now you know where I’m headed. With Amazon, Google, and Microsoft all announcing support for traffic mirroring in their respective clouds, this data source can be tapped with an amazingly low friction factor: ZERO.

So let’s put that into our equation for data value and smoke build it.

Summary
This post is the first in a series focused on eliminating this friction factor in the cloud, using Amazon Web Services (AWS) and the automated management of their VPC traffic mirror sessions with a cloud-native serverless approach. In this tutorial we will use a single Lambda function to catch the EC2 RunInstances operation, exposed as an event in CloudWatch via CloudTrail, and create a new Traffic Mirror session for each attached network interface (ENI) within ~5 seconds.

This solution relies on the services listed below, so before you begin, ensure your user (or the credentials you’ll be using via the CLI) has permissions for each of them:

  • CloudTrail
  • IAM
  • CloudWatch Events
  • Lambda

This example only provides coverage for new instances. Subsequent posts in this series will build on it, first by decoupling for scale and cost, then by adding coverage for other events: the StartInstances operation for existing stopped instances that are started, the AttachNetworkInterface operation for cases where one or more ENIs are attached to running instances, and even the DeleteTrafficMirrorSession operation to prevent accidental or malicious removal of traffic mirror sessions. Finally, the series will wrap up with an example that adds a bit of orchestration using tags.

So let’s get started …

Prerequisite: Create a Trail in CloudTrail
While highly unlikely, if you have not yet created a Trail at the Account level or Organization level in CloudTrail, you will need to do this to enable the events we’re going to hook in CloudWatch with our Lambda. While it’s possible to skip this step and simply hook EC2 State Changes, I’ve found this method provides both the most real-time results (~5 seconds) and a much more granular set of events we can hook and handle in one place, as you’ll see in subsequent posts in this series.
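
If you are not sure whether a trail is already logging, a check along these lines may help. It is only a sketch: it assumes the cloudtrail argument is an AWS SDK v2 CloudTrail client (or anything exposing the same describeTrails and getTrailStatus methods), hasActiveTrail is a made-up helper name, and the stand-in client with its ARN exists purely so the snippet runs without credentials.

```javascript
// Resolves true when at least one trail visible in this region is logging.
// [cloudtrail] is ASSUMED to be an AWS SDK v2 CloudTrail client, or any object
// exposing the same describeTrails/getTrailStatus methods.
async function hasActiveTrail (cloudtrail)
{
	const {trailList} = await cloudtrail.describeTrails({}).promise()

	for (const trail of trailList || [])
	{
		// Use the ARN so trails homed in another region can be checked too
		const status = await cloudtrail.getTrailStatus({Name: trail.TrailARN}).promise()
		if (status.IsLogging) {return true}
	}

	return false
}

// Stand-in client (hypothetical ARN) so the sketch runs without AWS credentials
const fakeCloudTrail = {
	describeTrails: () => ({promise: async () => ({
		trailList: [{TrailARN: 'arn:aws:cloudtrail:us-west-2:012345678901:trail/example'}]
	})}),
	getTrailStatus: () => ({promise: async () => ({IsLogging: true})})
}

hasActiveTrail(fakeCloudTrail).then(active => console.log(`Trail logging: ${active}`))
```

Against a real account you would pass new AWS.CloudTrail() from the aws-sdk package instead of the stand-in.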

Step 1: Create an Execution Role in IAM
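First we need to create a new execution role for our Lambda function. At a minimum, this role needs the following policy attached. Replace the placeholder account ID 012345678901 with your own, and note that the log ARNs only match functions whose names begin with rx-vpctm-: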

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:ec2:*:012345678901:traffic-mirror-session/*",
        "arn:aws:logs:*:012345678901:log-group:/aws/lambda/rx-vpctm-*:log-stream:*",
        "arn:aws:logs:*:012345678901:log-group:/aws/lambda/rx-vpctm-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeTrafficMirrorFilters",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeTrafficMirrorTargets",
        "ec2:CreateTrafficMirrorSession",
        "logs:CreateLogGroup"
      ],
      "Resource": "*"
    }
  ]
}
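
The role also needs a trust relationship that lets the Lambda service assume it, which the console sets up for you when you pick Lambda as the trusted service. As a rough sketch, the trust policy looks like the object below. The logArns helper is hypothetical and only illustrates how the two log ARNs are put together for a given account ID and function-name prefix.

```javascript
// Trust policy that allows the Lambda service to assume the execution role
const trustPolicy = {
	Version: '2012-10-17',
	Statement: [{
		Effect: 'Allow',
		Principal: {Service: 'lambda.amazonaws.com'},
		Action: 'sts:AssumeRole'
	}]
}

// Hypothetical helper: rebuild the two log ARNs for an account ID and a
// function-name prefix (Lambda logs to the group /aws/lambda/<function name>)
function logArns (accountId, prefix)
{
	const group = `arn:aws:logs:*:${accountId}:log-group:/aws/lambda/${prefix}*`
	return [`${group}:log-stream:*`, group]
}

console.log(JSON.stringify(trustPolicy, null, 2))
console.log(logArns('012345678901', 'rx-vpctm-'))
```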

Step 2: Create a new Function in Lambda
For now our Lambda function only needs to handle a single operation: RunInstances. This is actually the easiest operation to handle because all of the information we need about the EC2 instance is included in the initial event variable passed to Lambda.
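
To make that concrete, here is a heavily trimmed sketch of the event shape the function reads. The field names are the ones the handler accesses (plus instanceId for context); every ID and value is a made-up placeholder, summarize is a hypothetical helper, and real events carry many more fields.

```javascript
// Trimmed, hypothetical RunInstances event: only a handful of fields are
// shown, and every ID/value here is a made-up placeholder.
const sampleEvent = {
	detail: {
		eventName: 'RunInstances',
		responseElements: {
			instancesSet: {
				items: [{
					instanceId: 'i-0123456789abcdef0',
					instanceType: 'm5.large',
					placement: {availabilityZone: 'us-west-2a'},
					networkInterfaceSet: {
						items: [{
							networkInterfaceId: 'eni-0123456789abcdef0',
							subnetId: 'subnet-0123456789abcdef0',
							vpcId: 'vpc-0123456789abcdef0'
						}]
					},
					tagSet: {items: [{key: 'Name', value: 'web-01'}]}
				}]
			}
		}
	}
}

// Pull out the pieces that matter for mirroring
function summarize (event)
{
	return event.detail.responseElements.instancesSet.items.map(instance => ({
		type: instance.instanceType,
		az: instance.placement.availabilityZone,
		enis: instance.networkInterfaceSet.items.map(nic => nic.networkInterfaceId)
	}))
}

console.log(JSON.stringify(summarize(sampleEvent)))
```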

We will be adding to this function in other posts in this series, so it includes a few Helper functions, and the actions have been split out into Worker functions using Promises to keep everything as readable and asynchronous as possible. This will become much more important, and seem less over-engineered, once we’re handling multiple operations.

IMPORTANT: When creating your Lambda function, be sure to choose the role you created in the previous step as the Execution Role instead of having a new basic execution role created for you, which is the default.

Once created, populate your function with the following code:

'use strict';

const AWS = require('aws-sdk')
const EC2 = new AWS.EC2()

exports.handler = (event, context, callback) =>
{
	// Safety Net for unsupported events
	if (event.detail.eventName !== 'RunInstances') {return context.fail(`Unsupported Event [eventName]: ${event.detail.eventName}`)}

	// Occasionally the event will be received so fast that the response from
	// the API call to create the instance(s) has not been sent back to the
	// caller yet! We can wait for the next one, so just FAIL for now.
  if (! event.detail.responseElements) {return context.fail(`API Request Only [responseElements]`)}

	// Make things a little more readable
  let instances = event.detail.responseElements.instancesSet.items

	// Verify we have at least one Nitro instance to work with
	instances = instances.filter(isNitro)
  if (! instances.length) {return context.fail(`No instances to work on [isNitro]`)}

	// Create a TrafficMirrorSession for all [instances]
  Promise.all(instances.map(createTrafficMirrorSession))

  // All done ...
  .then(() => context.succeed(`VPC Traffic Mirror Sessions Enabled for (${instances.length}) instance(s)`))

  // Awwww maaaannnn ....
  .catch(error => {
    console.error(error)
    context.fail(`Unexpected Error: ${JSON.stringify(error.errorMessage || error)}`)
  })
}

/* ========================================================================== >>
   WORKER FUNCTIONS
============================================================================= */
function createTrafficMirrorSession (instance)
{
	// Setup REQUIRED utility variables
	instance.az = instance.placement.availabilityZone
	instance.nics = getNics(instance.networkInterfaceSet.items)
	if (! instance.nics || ! instance.nics.length) {return}

	// Setup OPTIONAL utility variables ([tagSet] may be absent on untagged instances)
	instance.tags = getTags(instance.tagSet && instance.tagSet.items)

	// Get all available traffic mirror [targets] in this REGION
	return new Promise((go, stop) =>
	{
		EC2.describeTrafficMirrorTargets(null, (err, data) =>
		{
			if (err) {return stop(err)}

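			// Convert [TrafficMirrorTargets] into a hash table of [targets] by ENI,
			// skipping targets without an ENI (e.g. Network Load Balancer targets)
			let targets = {}
			data.TrafficMirrorTargets.filter(t => t.NetworkInterfaceId).forEach(t => (targets[t.NetworkInterfaceId] = {
				'id': t.TrafficMirrorTargetId
			}))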

			go(targets)
		})
	})

	// Get full detail for each [target] ENI
	.then(targets => new Promise((go, stop) =>
	{
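		// An empty ID list would make the call below describe every ENI in the region
		if (! Object.keys(targets).length) {return stop('No Traffic Mirror Targets Available [region]')}

		EC2.describeNetworkInterfaces({NetworkInterfaceIds:Object.keys(targets)}, (err, data) =>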
		{
			if (err) {return stop(err)}

			// Add applicable ENI properties to [targets] from [data], and filter out
			// [targets] not in the same [AvailabilityZone] as the [instance]
			for (const eni of data.NetworkInterfaces)
			{
				if (eni.AvailabilityZone !== instance.az)
				{
					delete targets[eni.NetworkInterfaceId]
					continue
				}

				targets[eni.NetworkInterfaceId] = {
					'id': targets[eni.NetworkInterfaceId].id,
					'subnet': eni.SubnetId,
					'vpc': eni.VpcId
				}
			}

			// Convert [targets] hash table to an Array of [targets]
			targets = Object.values(targets)
			if (! targets.length) {return stop(`No Traffic Mirror Targets Available [availabilityZone]: ${instance.az}`)}

			// Set the [target] for each [nic] on the [instance]
			instance.nics = instance.nics.map(nic =>
			{
				nic.target = false

				// Determine if a [target] exists in the same [subnet]
				for (const target of targets)
				{
					// Stop instantly: a local target is always ideal
					if (target.subnet === nic.subnet)
					{
						nic.target = target.id
						break
					}

					// Set the [target] based on VPC until something better is found
					if (target.vpc === nic.vpc) {nic.target = target.id}
				}

				// No [targets] in local [subnet] or [vpc], use first found in this AZ
				if (! nic.target) {nic.target = targets[0].id}

				return nic
			})

			go()
		})
	}))

	// Get the traffic mirror [filters] for this REGION
	.then(() => new Promise ((go, stop) =>
	{
		EC2.describeTrafficMirrorFilters(null, (err, data) =>
		{
			if (err) {return stop(err)}

			if (! data.TrafficMirrorFilters || ! data.TrafficMirrorFilters.length)
			{return stop(`No Traffic Mirror Filters Available [availabilityZone]: ${instance.az}`)}

			// Take the first [TrafficMirrorFilters] entry in this region
			go(data.TrafficMirrorFilters[0].TrafficMirrorFilterId)
		})
	}))

  // Create TrafficMirrorSession with [filter] for [nics] attached to [instance]
	.then(filter => Promise.all(instance.nics.map(nic => new Promise((go, stop) =>
  {
    let params = {
      NetworkInterfaceId: nic.id,
      SessionNumber: 1,
      TrafficMirrorFilterId: filter,
      TrafficMirrorTargetId: nic.target,
      Description: 'Automated Mirror Session'
    }

    if (instance.tags.Name)
    {
      params.TagSpecifications = [
        {ResourceType:'traffic-mirror-session', Tags:[{Key:'Name', Value:instance.tags.Name}]}
      ]
    }

    EC2.createTrafficMirrorSession(params, (err, data) => (err ? stop(err.message) : go()))
  }))))
}

/* ========================================================================== >>
   HELPER FUNCTIONS
============================================================================= */

// The [hypervisor] property on the [instance] object is not consistent, so
// it's necessary to check the instance family before making a final decision.
// NOTE: this family list is not exhaustive; extend it to cover the types you use.
function isNitro (instance)
{
	if (instance.hypervisor === 'nitro') {return true}
  let nitro = ['a1','t3','t3a','m6g','m5','m5a','m5n']
  let family = instance.instanceType.split('.')[0]
  return (nitro.indexOf(family) !== -1)
}

// Collapses a [tagSet] into a proper hash table of [tags]
function getTags (tagSet)
{
  if (tagSet === undefined || ! tagSet.length) {return false}
  let tags = {}
  tagSet.forEach(tag => tags[tag.key] = tag.value)
  return tags
}

// Converts a [networkInterfaceSet] into a simple dictionary of [nics]
function getNics (networkInterfaceSet)
{
  if (! networkInterfaceSet || ! networkInterfaceSet.length) {return false}
  return networkInterfaceSet.map(nic => ({
    'id': nic.networkInterfaceId || nic.NetworkInterfaceId,
    'subnet': nic.subnetId || nic.SubnetId,
    'vpc': nic.vpcId || nic.VpcId,
    'owner': nic.ownerId || nic.OwnerId,
    'status': nic.status || nic.Status
  }))
}

So what’s really happening here …
The code is heavily commented, so I’d ask that you give it a read; you’ll be surprised how simple this all really is. However, here’s a tl;dr as well:

  1. When the Lambda function receives the event, a few sanity checks are performed, such as ensuring we have at least one Nitro-based instance to work with.
  2. Each instance is then asynchronously passed to the createTrafficMirrorSession function.
    • A few light operations are performed to make the tags and nics (ENIs) easier to work with.
    • EC2.describeTrafficMirrorTargets: returns all available traffic mirror targets for the region where the instance was launched.
    • EC2.describeNetworkInterfaces: provides full detail for the ENI of each target interface so we know which subnet, VPC, and AZ each target lives in. At this point any targets outside the AZ where the instance was launched are discarded to ensure traffic is not mirrored to a target outside the local AZ.
    • Each ENI on the instance is then assigned a target in the following order of preference. If no targets are found that match these criteria, an error is recorded, which could be monitored by a CloudWatch alarm if desired:
      1. target is in the same subnet
      2. target is in the same VPC
      3. target is in the same AZ
    • EC2.describeTrafficMirrorFilters: returns all available filters and blindly takes the first one returned, assuming for this tutorial that there is only one filter defined per region. See Notes & Caveats section below for an explanation of how this could be improved.
    • EC2.createTrafficMirrorSession: creates a new traffic mirror session for each ENI attached to the instance, pointing to the best/closest target found for each individual source ENI.
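
The order of preference above can be restated as a small standalone function. This is a sketch rather than the code from the Lambda itself: pickTarget is a hypothetical name, the target list is assumed to be pre-filtered to the instance’s AZ, and the IDs are invented.

```javascript
// Sketch of the preference order: subnet first, then VPC, then anything
// left in the AZ. [targets] is ASSUMED to be pre-filtered to the instance's AZ.
function pickTarget (nic, targets)
{
	const inSubnet = targets.find(t => t.subnet === nic.subnet)
	if (inSubnet) {return inSubnet.id}

	const inVpc = targets.find(t => t.vpc === nic.vpc)
	if (inVpc) {return inVpc.id}

	// Nothing local: fall back to the first target in the AZ, if any
	return targets.length ? targets[0].id : null
}

// Hypothetical IDs for illustration only
const targets = [
	{id: 'tmt-aaa', subnet: 'subnet-1', vpc: 'vpc-1'},
	{id: 'tmt-bbb', subnet: 'subnet-2', vpc: 'vpc-2'}
]

console.log(pickTarget({subnet: 'subnet-2', vpc: 'vpc-2'}, targets)) // tmt-bbb (same subnet)
console.log(pickTarget({subnet: 'subnet-9', vpc: 'vpc-1'}, targets)) // tmt-aaa (same VPC)
console.log(pickTarget({subnet: 'subnet-9', vpc: 'vpc-9'}, targets)) // tmt-aaa (AZ fallback)
```

One small difference from the loop in the Lambda: when several targets share the ENI’s VPC, this sketch keeps the first match, whereas the loop keeps the last one it sees.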

Step 3: Add a CloudWatch Events Rule
The easiest way to do this is straight from the Lambda function editor in the AWS console by clicking the Add Trigger button near the top of the screen in the Designer section and choosing CloudWatch Events as the source. This will automatically associate the Lambda to the event rule.


This can also be done from the CloudWatch Events console, or of course through the API. Using any of those methods, you can simply copy/paste the following Event Pattern document to create your rule:

{
  "source": [
    "aws.ec2"
  ],
  "detail-type": [
    "AWS API Call via CloudTrail"
  ],
  "detail": {
    "eventSource": [
      "ec2.amazonaws.com"
    ],
    "eventName": [
      "RunInstances"
    ]
  }
}
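
To get a feel for what this pattern lets through, here is a simplified stand-in for the matching step. It only handles exact-value matching (arrays list the allowed values, nested objects are checked recursively) and is not the real rule engine; matches is a hypothetical helper and the two events are trimmed examples.

```javascript
// Simplified matcher: every key in the pattern must be present in the event,
// arrays list the allowed values, and nested objects are checked recursively.
// This is NOT the real rule engine, just an illustration of exact matching.
function matches (pattern, event)
{
	return Object.keys(pattern).every(key =>
	{
		const expected = pattern[key]
		const actual = event[key]
		if (Array.isArray(expected)) {return expected.includes(actual)}
		return typeof actual === 'object' && actual !== null && matches(expected, actual)
	})
}

const pattern = {
	'source': ['aws.ec2'],
	'detail-type': ['AWS API Call via CloudTrail'],
	'detail': {
		'eventSource': ['ec2.amazonaws.com'],
		'eventName': ['RunInstances']
	}
}

// Trimmed example events
const runEvent = {
	'source': 'aws.ec2',
	'detail-type': 'AWS API Call via CloudTrail',
	'detail': {eventSource: 'ec2.amazonaws.com', eventName: 'RunInstances'}
}
const stopEvent = {
	'source': 'aws.ec2',
	'detail-type': 'AWS API Call via CloudTrail',
	'detail': {eventSource: 'ec2.amazonaws.com', eventName: 'StopInstances'}
}

console.log(matches(pattern, runEvent))  // true
console.log(matches(pattern, stopEvent)) // false
```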

Conclusion, for now …
Congratulations! We’ve only just begun with this project and you already have a fully automated serverless VPC Traffic Mirror Manager. It can handle a single new instance with one ENI, or multiple instances each with multiple ENIs, mirroring them automatically within seconds of creation to the closest/best traffic mirror target available in the local availability zone.

The next post in this series will show you how to decouple by moving the createTrafficMirrorSession function into its own Lambda function that can be called asynchronously to fan out the work horizontally. This will keep costs low in highly dynamic environments and allow the solution to scale to hundreds or thousands of instances, each with one or more ENIs, all being operated on simultaneously within seconds to ensure you don’t miss a single packet!

Notes & Caveats Thus Far …

  • VPC Traffic Mirroring is only available for EC2 Instances running on the Nitro hypervisor. In most cases switching your instances to Nitro is as easy as: stop, switch instance type, start. In all cases that I’m aware of, Nitro is faster and cheaper, so this is a great time to switch!
  • Rules in CloudWatch Events and Functions in Lambda work only in the Region in which they are created. If you configure CloudTrail to track API calls in multiple Regions, and you want this to work in each of those Regions, you must create a separate CloudWatch Events Rule and Lambda Function in each Region.
  • Scoping the automation
    • Tags: Scope the automation with a tag on your EC2 instances (e.g. Mirror: True) to limit which instances are automatically managed, and/or use a tag to direct the function to mirror matching instances to a specific mirror target.
    • Geography: By default the Lambda function will attempt to find a traffic mirror target in the same Subnet. If none are present, the scope widens to any target in the same VPC, and then to the Availability Zone. It is not advisable to use a traffic mirror target outside the AZ of the source ENI, for both cost and sanity purposes.
  • This example does not support more than one Traffic Mirror Filter, for instance filters applied conditionally depending on the sensitivity of the traffic being mirrored. However, something as simple as matching tags could be used to pick the correct mirror filter for the instance or ENI being mirrored (e.g. an instance tagged “Mirror: ALL” would be paired with the filter tagged “Mirror: ALL”, an instance tagged “Mirror: SOME” with the filter tagged “Mirror: SOME”, and so on). This would be trivial to implement, yet it would significantly increase the orchestration capabilities.
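
As a rough illustration of the tag ideas in these notes, the sketch below pairs an instance with the filter carrying the same Mirror tag value, and skips instances that have no Mirror tag at all. The Mirror key, the toTagMap and pickFilter helpers, and the filter IDs are all assumptions made for the example; only the TrafficMirrorFilterId and Tags field names follow the shape returned by describeTrafficMirrorFilters.

```javascript
// Collapse AWS-style [{Key, Value}] tags into a plain object
function toTagMap (tags)
{
	const map = {}
	for (const tag of tags || []) {map[tag.Key] = tag.Value}
	return map
}

// Return the ID of the filter whose Mirror tag equals the instance's Mirror
// tag, or null when the instance is untagged or nothing matches.
// [instanceTags] is a plain {key: value} object such as getTags() produces.
function pickFilter (instanceTags, filters)
{
	const wanted = instanceTags.Mirror
	if (! wanted) {return null}

	const match = filters.find(f => toTagMap(f.Tags).Mirror === wanted)
	return match ? match.TrafficMirrorFilterId : null
}

// Hypothetical data shaped like describeTrafficMirrorFilters output
const filters = [
	{TrafficMirrorFilterId: 'tmf-all', Tags: [{Key: 'Mirror', Value: 'ALL'}]},
	{TrafficMirrorFilterId: 'tmf-some', Tags: [{Key: 'Mirror', Value: 'SOME'}]}
]

console.log(pickFilter({Mirror: 'SOME'}, filters)) // tmf-some
console.log(pickFilter({Name: 'web-01'}, filters)) // null
```

Returning null for untagged instances gives the opt-in scoping described above: the caller simply skips any instance for which no filter is found.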