We recently had the opportunity to present a webinar with Eric Sharpsten, CTO for Lockheed Martin's Federal Health practice. This division of Lockheed Martin serves federal agencies involved in healthcare, one of which serves 125 million beneficiaries. These environments are extremely large, complex, and dynamic.
In the 18 months since Lockheed Martin implemented ExtraHop, Eric's organization has already seen tremendous success in providing their federal customers with vastly improved insight into application performance, infrastructure optimization, and security.
You can view the webinar above or read the full transcript below. Be sure to listen to Eric as he relates how Lockheed Martin uses ExtraHop to do the following:
- Automatically discover device dependencies and communications to validate documentation (at 21:55 in the webinar)
- Gain assurance during a datacenter migration project by baselining application performance (at 24:00 in the webinar)
- Support cybersecurity efforts with real-time visualization of geographic origination of requests (at 24:55 in the webinar)
- Stop the blame game between teams and speed resolution when conducting root-cause analysis (at 26:30 in the webinar)
- Enable proactive measures in their network operations center (at 30:15 in the webinar)
- Discover an overlooked startup script that was causing 20 million database errors each week (at 31:00 in the webinar)
- Report application performance SLAs to customers (at 33:50 in the webinar)
- Understand the strengths and weaknesses of wire data compared with other sources of data, such as machine data and agent data (at 55:00 in the webinar)
Steve LeSueur: Hello, everyone, and welcome to our webcast, "Situational Awareness for IT Operations and Cyber Security," sponsored by ExtraHop. I am Steve LeSueur, a contributing editor for Custom Media with the 1105 Public Sector Media Group, publisher of FCW, GCN, Washington Technology, and Defense Systems magazines. I will be your moderator today.
That brings us to our webcast on improving situational awareness of your agency security. To a large degree, maintaining security depends on maintaining visibility into your network. I am very pleased today that we are going to examine how next generation wire data analytics can improve visibility and situational awareness into what's going on within your networks and systems.
Our first speaker will be Erik Giesa, who is the Senior Vice President for Marketing and Business Development at ExtraHop. After Erik, we will hear from Eric Sharpsten, who is the Chief Technology Officer for Lockheed Martin Federal Health. Our speakers have prepared a superb program today. Without further ado, let's turn the time over to them. Erik Giesa, I believe you're going to start us off today. The time is now yours.
Erik Giesa: Thank you, Steve. The agenda, today, we're going to focus mostly on what Eric Sharpsten, the CTO of Lockheed, has to say about his experiences and the approach that they've taken to really get control of a lot of the complexity and dynamism that they face in servicing all their federal customers.
To lay the groundwork: first and foremost, most of you on the line are probably painfully aware of the need for comprehensive, real-time situational awareness.
There are some new concepts, finally, within the IT construct and in IT operations that are really proving out the promise of what can be and what should be, in terms of giving you the visibility and insight required to put your organization in a better position for not only proactive monitoring and insight, but even new ways to get better security analytics.
That's why we unified those two topics today. Then Eric is going to go through some real-world examples of how they've applied ExtraHop within the context of their environment and realized several different benefits along the way, and even some potential ways they might use this in the future.
With that, you are all aware of the causes of lacking situational awareness. There are initiatives that we face every day, whether it's the move to the cloud, data center migration, or moving composite workloads from on premises to the cloud and virtualization. Add the constant threats that are out there, and managing the environment becomes very, very difficult. The effects impact everything from your overall mission readiness to fulfilling the obligations of the mission of your agency.
That has collateral effects, in terms of impact to end user experience, inefficient war room sessions as you're trying to isolate causes, or even plan. Forget about effective capacity planning, or even getting proactive when faced with all of these challenges.
If you were to sum it up, it really comes down to this: we face (and this will not go away) intense complexity, scale, and dynamism. It's just the nature of IT, and so the question becomes, "How do we get control of this?" Given the fact that we will always have new initiatives, we will always have new threats, and we will have new technologies that are going to make us more productive in achieving our agency's mission, how do we lay down a framework, an architecture, that supports that, so that it minimizes the complexity and gives you back control?
Well, I'll tell you this. The traditional approach has failed. I would hazard a guess, and I speak and I'm on the road with a lot of different customers, that most of you attempt to manage your environment in this fashion: data silos. This is not your fault. This is the fault of vendors. It's a perspective that only really benefits vendors: to convince you that you need an isolated database monitoring solution, or "Oh, you're moving to the cloud. You need a separate monitoring solution for that. Oh, and then for security forensic purposes, you need a separate packet capture system and the network and the application performance monitor," ad nauseam.
I bet that most of you have probably 12 to 15, maybe even two dozen, different tools used by different IT domains within your organization. None of these are integrated. None of them work together. This is the antithesis of real-time situational awareness. These are data islands, so we're drowning in data but have very little insight into the environment. A better model is a data-driven model. It's something that organizations like ExtraHop, Splunk, and some other forward-thinking companies are starting to adopt, and that Gartner is backing and refers to as IT operations analytics.
We have to throw out the legacy acronyms that really mean nothing today. Application performance monitoring, for one, or network performance monitoring.
An application is not an isolated construct. It's dependent upon the underlying network, the infrastructure, DNS, LDAP, storage. It's not some executable running on a server. It's the entire application delivery chain.
The question becomes, how do you provide visibility across that delivery chain? If you think about it, there really are only four sources of data to get that visibility, and we've categorized them here. It's a very, very proven taxonomy.
Wire data: this is what ExtraHop does and I'll get a little bit more into this. One of the key things is the characteristics. Instead of talking about, "Do you give application performance monitoring?" we should be asking, "What view does the dataset give in relation to application performance? What view does the dataset give in relation to network performance?" This then can affect our workflow.
When we look at this there are certain characteristics. Wire data is the observed behavior of all transactions between all systems that are connected on your network. Think of this as constant surveillance of your environment. What it does not do: it does not see what is going on in the host itself. If it is an activity that does not hit the wire, a wire data solution does not see it.
That's why we're approaching this from an architectural perspective. That's the role of machine data. Machine data is system self reported information. It's like a patient telling a doctor how they're feeling. It's valuable information, but it's limited in its insight. It can tell you only what some individual in the past has determined was appropriate to log or what was appropriate to instrument and publish in the SNMP. It's all the information that a machine produces. That machine and its information is going to be only as good as the individual who determined what should be logged.
Again, it's a valuable dataset but it's one perspective. The third dataset is agent. This is where you would deploy an agent on a system and you can't deploy an agent everywhere. But this is where you can get host based and custom instrumented information from that host. Machine and agent have a lot in common, because they are both host based. This is why they're complementary to a wire data perspective.
The fourth and final one is probe. This is periodic or scripted behavior. It could be an external monitor that does a periodic check. Viable for service checks and up down behavior, but rather limited in its scope, in terms of understanding and creating situational awareness. It could be used for an early warning system.
We have customers doing this today. What we say, "OK. No more data islands." These datasets belong together in a common data store. This borrows from the principles of Big Data. This does not take a lot of effort, time or money. In fact, it's incredibly operationally efficient. This approach, using something like Elasticsearch or MongoDB. You could even use something, like Loggly, Sumo Logic or Splunk, on the back end of that common data store.
Point is, unifying these datasets allows us to correlate across them, and drive new and interesting business and operational insights, as well as security insights.
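To make the idea of correlating across datasets concrete, here is a minimal Python sketch. It is not how ExtraHop, Splunk, or any product named in the webinar works; the `Event` record, the `correlate` function, the host names, and the minute-level timestamps are all hypothetical, chosen only to show events from the four sources landing in one place and being grouped by system and time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    source: str   # "wire", "machine", "agent", or "probe"
    host: str     # system the event is about (hypothetical names below)
    minute: int   # timestamp truncated to the minute (hypothetical unit)
    message: str


def correlate(events):
    """Group events by (host, minute) and keep only the groups in which
    more than one kind of data source reported something."""
    buckets = defaultdict(list)
    for event in events:
        buckets[(event.host, event.minute)].append(event)
    return {
        key: group
        for key, group in buckets.items()
        if len({event.source for event in group}) > 1
    }


# A tiny stand-in for the "common data store": one list holding all sources.
events = [
    Event("wire", "db01", 100, "spike in database login errors"),
    Event("machine", "db01", 100, "auth failure logged"),
    Event("probe", "web01", 100, "service check OK"),
    Event("agent", "db01", 101, "CPU at 40%"),
]

correlated = correlate(events)
for (host, minute), group in correlated.items():
    print(host, minute, sorted(event.source for event in group))
```

In this toy data only `db01` at minute 100 is reported by two different sources, so that is the one moment the sketch surfaces for a closer look.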
With that, I'm going to explain a little bit more about wire data, because it is a unique concept. You all probably are familiar with the concept of wire protocols. HTTP is a wire protocol. FTP is a wire protocol, SMTP, et cetera. Wire data is the ability not only to monitor and measure the performance of those wire protocols, but also to extract the information from the payloads, and to measure and visualize that as well.
It's a whole process of taking unstructured data, which are packets, and turning them into structured wire data, so that they can be interrogated. The value of this is that it opens up a world of information, a veritable goldmine that can now be tapped. What previously was impossible or impractical now becomes practical.
What you see here is a sample dataset of the types of things that now can be measured without requiring agents, or telling an application to log that information non invasively, and from a global surveillance type of perspective. It's putting control back in your hands.
What's really powerful, as you can imagine, is that this has broad implications, because we can survey in real time, from that first client DNS request to the last byte served out of storage and everything in between, even down to the file access level, and answer the questions of who, what, when, and how much.
It opens up a world of possibility, on who could benefit from this. With that, I'm going to turn it over to Eric Sharpsten, to talk about this new paradigm, and how it can benefit and be leveraged across teams within an organization. Eric?
Eric Sharpsten: Erik, thanks a lot. As Erik just discussed, there's lots of data that's being collected and stored inside of the ExtraHop framework. We have lots of different needs here. There's a need for providing that data to individuals in different roles. Chief information officers, my customers, my senior executives, they want to know how well an application is performing. Is it up? Is it down?
The developers are going to want to know many details of how well certain elements of that application are performing. Often it's from one release to the next release. We typically snap baselines. We'll be able to tell them, "You're performing better, or you're performing worse, based upon those changes that we're delivering against those baselines."
It's a really good touch point, where I'd say, "When we do the release, do we really want to keep it, or do we want to roll back? How good do we feel about that?" We talked a little bit about how it's not just the data that you see today that we're catching off the wire, but it's the fact that I can go back and I can look at multiple releases. There's a bit of a historic analysis to that. I can even do trending with it.
It's very valuable to, say, storage admins, who need to trend their storage, or network admins who are looking at bandwidth on major lines that they're bringing in. For our database administrators, it's how well a database is performing based on this set of updates, or this upgrade of a database per se. The other thing is, as we all know, when an application or system has a problem, the infrastructure is guilty until proven innocent. This provides us with a really good basis on which to say, "Here's where the problem really is." Because we can allow our administrators or developers role-based access to the information, they have real-time information that they can go and query and say, "When the root cause analysis is going on, this is how I get at my data." It gives us really good information to feed back to them, to support DevOps and agile development. It's been a very useful tool for us. We're just scratching the surface with it now.
Erik Giesa: Excellent. Thank you, Eric. That's a great segue, because when you think of...This is why the old classifications of application performance monitoring, infrastructure performance monitoring, virtual monitoring, all these things, no longer apply in this complex, dynamic and mission critical world that we live in. What we need to do is, begin unifying those efforts and views.
How do we bring these together and apply that dataset in ways that maybe you didn't anticipate before? I'll give you some examples. The way we codify this is into four primary, call them, "use categories." The whole objective, first and foremost: MIT Sloan School of Management wrote this absolutely stunning treatise on the myth of IT business alignment, or the traps of IT and business alignment.
We all want that. We all want IT aligned with our mission objectives. The whole premise behind the research that MIT did is that alignment before IT is effective can be disastrous. You can't start aligning until you've covered the basics. That's that first bucket here. What you heard Eric describe was moving from a reactive state to a proactive state, where you've got the situational awareness to understand the relationships between all systems, as well as the people managing those systems. He talked about DevOps, baselining, and trending. Having that continuous visibility applied allows you to get ahead of the curve and anticipate and address issues before they become problems. With the same dataset, however, if you've got that historical trending, you can apply it to answering questions. "Do we need this storage? Do we need that storage upgrade? Why? Oh, 20 percent of the files in use represent 80 percent of the traffic. What if we move the other storage off to tier 2? We could save money." It's applying the dataset in that fashion that gets you to this next phase.
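The storage question in that example ("20 percent of the files represent 80 percent of the traffic") comes down to simple arithmetic over per-file traffic counts. The Python sketch below shows one way to compute it, assuming you already have bytes transferred per file; the function name `hot_file_share`, the file names, and the numbers are invented for illustration and are not taken from the webinar.

```python
def hot_file_share(bytes_by_file, top_fraction=0.2):
    """Return the fraction of total traffic generated by the busiest files.

    bytes_by_file maps a file name to bytes transferred over some period.
    top_fraction is the share of files treated as "hot" (0.2 = top 20%).
    """
    if not bytes_by_file:
        return 0.0
    ranked = sorted(bytes_by_file.values(), reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Hypothetical per-file traffic for ten files (units are arbitrary).
traffic = {
    "a.dat": 500, "b.dat": 300, "c.dat": 50, "d.dat": 40, "e.dat": 30,
    "f.dat": 25, "g.dat": 20, "h.dat": 15, "i.dat": 10, "j.dat": 10,
}

share = hot_file_share(traffic)
print(f"Top 20% of files carry {share:.0%} of the traffic")
```

If the busiest fifth of the files carries most of the traffic, as in this made-up data, the remaining files are the candidates for the cheaper tier the speaker mentions.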
The third one is, because we are continually on, having constant surveillance, understanding the context of, "Who's a client? Where are they coming from? What are they doing? How much? How often? When? And what?" becomes a very powerful addition to the security ops team and security analytics overall.
The other cool thing about this is it starts to bring security awareness and responsibility to everybody in the organization, not just the security team.
Then finally, there's real time operational analytics that can be had. If you think about it, regardless of the initiative, the technology, and the change underneath, whatever's going on, whether you're virtualizing, you're doing private cloud, public cloud, a hybrid, you're adopting SDN or network virtualization, the one constant between, regardless of what happens, everything transacts on the wire.
If you have a means to mine and measure that, and by mining, I mean even extracting things from the payload, you can get to a state even of real time business or mission analytics.
With that, I'm going to turn it over now, the rest of the presentation, to Eric, because one thing that you've got to be aware of is when any organization, when there's a new technology out there that's pioneering like wire data analytics, it's the early adopters that see the potential and apply it, and start seeing great things happen.
Lockheed is one of those organizations. They've been around a long time, but it's refreshing to see that regardless of their size, they really practice what they preach in regards to innovation. With that, Eric, I'll turn it over to you.
Eric Sharpsten: Thanks, Erik. This is where I really get to get excited about things, because we're only starting to scratch the surface with utilizing the ExtraHop toolset. I'm going to take some time to share some use cases, I guess is the best way to put it, with how we've done it. One of the first things we realized we could do was automatic discovery. We basically came in and pointed at one element of the network, and the tool just built out the network from there based upon the data flow traffic. It really is an extra hop to a hop to a hop.
It works with both physical and virtual networks, and you should probably know a little bit about what we do. My customer has a host of PII and PHI data, and we serve some 125 million beneficiaries throughout the greater United States, and throughout the world in some cases.
We have a highly virtualized environment. I'd say it's probably about 90 percent virtualized, and it is great for updating or validating your documentation. We know that we constantly go through changes where we're even changing configuration settings on databases and firewalls, so not necessarily just code releases, but we're actually changing how the systems are talking to one another. We all want to make sure we get our documentation done. We've got release management practices, but things get missed. Sometimes it's nice to go out and take a look at this, and we'll say, "Gee, you'll see that on this chart right here...," and this is just a chart that ExtraHop provided us, so this isn't one of ours.
You see in the upper right hand side, there're three servers, or three elements hanging out there. Sometimes the things get discovered, and we don't know how to put them together, or they're misconfigured, and so that's a really good indicator of how things are misconfigured. You can go over it, and you say, "Well, this doesn't make sense," and you might find some problems and even resolve some issues before they become issues.
We're also using this in another way. Right now, we're in the midst of migrating a very large data center to two smaller data centers that are COCOs. The large one is a GOCO (government owned, contractor operated), as opposed to contractor owned, contractor operated.
What we're doing is snapping performance characteristics, or baselines, of each one of these application sets before we move them over, and demonstrating that they operate as well, if not better, in the new location as they did in the old location.
You could do the same thing for moving to a cloud data center. Many people don't understand or fear that maybe, that it won't operate as well in the cloud for security reasons or for scale reasons or what have you, so this gives you a good way to validate that.
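As a rough illustration of what comparing against a snapped baseline can look like, here is a small Python sketch. It assumes you can export response-time samples (in milliseconds) for an application before and after a move; the function names, the 10 percent tolerance, and the sample values are all hypothetical and do not describe Lockheed Martin's actual process or any ExtraHop feature.

```python
import math
from statistics import median


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def compare_to_baseline(baseline_ms, current_ms, tolerance=0.10):
    """Compare median and 95th-percentile response times.

    A metric is "ok" if the new value is no more than `tolerance`
    (10% by default, an arbitrary choice) worse than the baseline.
    """
    metrics = {"median": median, "p95": lambda s: percentile(s, 95)}
    report = {}
    for name, fn in metrics.items():
        before, after = fn(baseline_ms), fn(current_ms)
        report[name] = {
            "before": before,
            "after": after,
            "ok": after <= before * (1 + tolerance),
        }
    return report


# Hypothetical response times (ms) in the old and new data centers.
old_site = [100, 110, 120, 130, 200]
new_site = [90, 105, 115, 125, 210]

report = compare_to_baseline(old_site, new_site)
for name, row in report.items():
    print(name, row)
```

Looking at a tail percentile as well as the median matters here, because a migration can leave typical response times unchanged while making the slowest requests noticeably worse.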
The next topic I wanted to bring up was support for cyber security. We are really just scratching the surface on this. We have a very large set of security based tools. One of the things that we learned that we could do with this is just give us some visualization, so in that we're looking at the inbound firewalls on all of our systems.
One of the things we noticed, when we brought our geographic map up, was pings that we were getting from China, North Korea, and Russia. Because we're a government organization, we're constantly under attack by these external entities, both domestic and foreign, but these were things that were just glaring right out there at us.
What we were able to do was inform the SOC of this, which is actually managed by the customer, not by us directly. It provides them with another set of information that they can incorporate into their feeds and gives them better insight into what's happening with respect to cyber security. With that, I know we have an engagement with the [indecipherable 0:26:05] office, who we don't directly work for, but they're asking us to help them get a better understanding of the situational awareness that they're not necessarily able to see with their tools.
I'm going to talk a little bit more about another incident later, on another slide. The next thing I wanted to talk about was troubleshooting and root cause analysis. This is one of our best areas, because our customer, when Erik was talking earlier about the different classes of tools, has been heavily reliant on agent-based tools: the ITCAMs, the CAs, the New Relics of the world.
The problem with those tools that we've found is twofold. One is they imply or impart a penalty on the systems, and there are some of our systems that are so tightly managed that we cannot put an agent based system on there to capture that data.
The second item that we had was that whenever we want to make any changes to that agent based data or update or install, we have to ask the developers to give us a system for four or five hours to make the upgrade. The schedules are so tight here that the developers are either unable or unwilling to provide a downtime for us to be able to upgrade those systems. It becomes a bit of a scheduling challenge.
A third item is that I only see what I instrument, so if I have roughly half the infrastructure instrumented, and that being mainly the servers, I don't see the other data. I don't see the data from the firewall. I don't see the data from the load balancers. I don't see the data from the MQ brokers.
With this, because I'm on the network, I see everything that traverses the network, and I can actually crack the packets open and look and say, "Oh, you've got this 400 error," and I could tell the developer, "The reason why you get this 400 error is because you don't have the JPEG that you're looking to build this Web page with. It can't find it, so you've got a mismatch. Either you didn't deliver the JPEG, or you gave the wrong directory for the JPEG to be included in."
Oftentimes we can get down and figure out almost exactly what line the error was on. The other thing with respect to troubleshooting, and we talked about this, is that with ExtraHop everybody's looking at the same data.
When we get on root cause analysis calls, and I will tell you in full disclosure, we use two wire data collection tools, of which this is one. The other one looks at my infrastructure. Through those tools, we can get everybody to say, "OK, at this time, I know what happened. I know what happened to this database. I know what happened on this [indecipherable 0:29:20]. I know what happened on this server."
In the past, what they all did is say, "At time x, what's happening on the main switch? OK, what's happening on this port on this switch? What are the data...?" and it would take us forever to work through these scenarios.
Because each discipline is looking at their own elements, I've got a database admin looking at his database server, but he can also see the feeds coming in off of his network. I've got a WAS server admin looking at what's happening on the JVM stacks at the exact same time.
They're both looking at it from their own perspective, but they're both looking at the exact same data, rather than having to piece something together from one spot, so this has really been a really effective toolset for us.
The other thing is that because we've got access to all this data, we have actually built triggers in to enable our NCC, which is our network control center (many of you probably call these NOCs), to take proactive measures on items.
Whenever we see something going astray based on a trigger that was placed there by a WAS admin, a database admin, or a storage admin, we're able to either take action to eliminate the severity ahead of time, proactively, or we're able to dispatch an individual from those teams if they haven't given us a specific script to go ahead and take proactive measures on that.
One of the other things that I'll talk about here is a database example, and I alluded to this earlier in the discovery. When we got our proof of concept with ExtraHop, the first thing we did was put it on our network and point it at one of our databases on a small application. We collected data for a week. We got that back, and we said, "Oh, this is interesting." We've got this one database server, and it had 800 million transactions in a week. We had about 21 million errors, and we said, "Well, that's interesting. Let's drill into that." We looked at that, and we said, "Well, about 20.5 million errors were attributable to a single IP address."
We said, "Whoa, that's interesting. Let's drill down further." Out of that, about 19.8 million were bad username password calls. We did some investigation and found out who was giving a bad username password call, and it turns out that our electronic file transfer server, grand server, had been decommissioned. We had actually changed that over to a new application. What happened was, we shut it down and we removed the application from the database. We subsequently went through a password aging, where we changed all our passwords after so many days.
Whenever the password aging happened, we never updated gen transfer, because it wasn't supposed to be there. Yet it got left in a startup script, and when we had our maintenance cycle, we powered that gen transfer server back up, our virtual server back up. It was just pounding away at the database server. In this case, it was a very innocent issue; it wasn't anybody attacking us. But it was chewing up five percent of the total base of that server. It wasn't impacting operations, but we were wasting a lot of resources.
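The drill-down Eric describes (total errors, then errors per client IP, then error types for the noisiest client) is essentially two rounds of counting. The Python sketch below shows that shape on a handful of invented records; the field names `client_ip` and `error`, the addresses, and the `drill_down` function are assumptions for illustration, not the ExtraHop data model or its API.

```python
from collections import Counter


def drill_down(error_records):
    """Find the client responsible for the most errors, then break that
    client's errors down by type."""
    by_client = Counter(record["client_ip"] for record in error_records)
    top_ip, top_count = by_client.most_common(1)[0]
    by_type = Counter(
        record["error"]
        for record in error_records
        if record["client_ip"] == top_ip
    )
    return top_ip, top_count, by_type.most_common()


# Hypothetical database error records observed on the wire.
records = (
    [{"client_ip": "10.0.0.5", "error": "invalid username/password"}] * 5
    + [{"client_ip": "10.0.0.5", "error": "timeout"}]
    + [{"client_ip": "10.0.0.9", "error": "timeout"}]
)

top_ip, top_count, breakdown = drill_down(records)
print(top_ip, top_count)
for error, count in breakdown:
    print(f"  {error}: {count}")
```

On real volumes (millions of errors a week, as in the story) the same grouping would be done inside the data store rather than in memory, but the logic is the same: one dominant client and one dominant error type point straight at a misconfigured system.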
There were some configuration changes that were made, and we thought we had the documentation all updated. Guess what. We missed this one script file. Great lesson. That was actually the first lesson that we learned from it. Everybody really took a lot away from it. On application performance tracking, I talked a little bit about that. We're actually utilizing this in one other program to feed data into a business analytics engine. We're producing all of our SLAs with this and reporting to our customer on a monthly basis, and we're also using a visualization tool. We're taking our infrastructure reporting and putting it together with our application performance, and we're reporting that side by side to our customer.
Our customer, when they saw this, none of our competitors are providing this kind of information and this kind of transparency to them. This really has improved our standing with our customer. They're actually using this to push our competitors to be more forthcoming in their information.
Erik Giesa: I think it would be really helpful. Would you mind letting the audience know? It's OK if you can't if it's private. What are you using for that common data store? You can mention the other vendor, too, because that's the whole point, is that we fundamentally believe, and I know you think this way very clearly, no one vendor can provide all the data you need. The point is the system should be open and you should be able to do the data integration. Could you talk to that?
Eric Sharpsten: Yeah. I can, a little bit. The other vendor we're using, for layers one through three, is SevOne. I'm getting my JMX data, my NetFlows, and my SNMP trapping data from there. SevOne has its own database, as does ExtraHop, but to do our SLA reporting, because I need a lot of data collected over a long historical period, we're using APIs that are built into ExtraHop to put data into Cognos. Cognos happens to be our analytics engine; you could be using MicroStrategy apps or what have you. We're able to generate our SLA report from that. This is back to the point that Erik had talked about. There's lots of data out there. We need to be able to utilize data in a more effective manner. On the presentation side, the real-time presentation of data, I'm using another tool, a visualization engine, as the best way to present it. That's from Edge Technologies. Once again, that is fed by the same data: they're pulling the data directly from ExtraHop. We're leveraging the rules that are sitting inside of ExtraHop to see how well an application is or is not performing. The Edge tool is just presenting that. We're able to slice and dice the data differently for different customers.
I have dashboards for executives. I have dashboards for system owners. I have a dashboard for the data center owner. Our customer actually offers IT services to their own customers within their organization, and they were able to take breakouts. For the financial applications, we're able to break out all the financial applications, put them together in a dashboard, and say, "Here's how well they're performing. Is it green? Is it red? How red or how green is it?" It's almost in a dial kind of a format.
They want it all to be green, but they don't want to be all the way to the left in green, because that means they're probably spending too many resources. What's the balance? We've been working with the customers to make sure that we hit the right balance to use the systems effectively. Erik, does that cover it for you?
Erik Giesa: Yeah. I think the audience will really appreciate that, because it highlights the point. You guys chose specific tools because you are using them and they are good for you. The point is, it provides a lot of flexibility. Almost any type of visualization or data store can be used, whatever is appropriate for them. What I love is that you guys were already thinking this way prescriptively in building an IT operations analytics architecture.
Without having Gartner or a vendor tell you to, you just organically approached it this way, which is so innovative, I think. The results are pretty powerful and speak for themselves. Thank you for that.
Eric Sharpsten: The next slide that I just brought up is just a follow-on, where we're talking about the real-time situational awareness, that transparency that builds rapport with the customer, with the database administrators, with the system developers. It's been a hard road to get the system developers to trust us, because there's always that tendency to point fingers. We brought them in and said, "Look, here's all this data. What do you want instrumented? We can give you better information." Realizing that it's wire data, it's not impacting their performance as much. We're winning them over a little bit at a time.
The other thing is that, we talked about periodic data reviews, and I went back to that...Let's go back to that database where we had the username password pairs. First thing they did was, they said, "Security team, how come you're not seeing this?" What happens is, the security team uses a lot of log based analytic rules. Actually, we're utilizing Splunk. What happens, they base lined it after this situation has occurred. It looks normal to Splunk. They're not going through and looking for errors, yet it wasn't until we had ExtraHop on the system that we're looking and going, "Wow, look at all these errors, because it actually comes up as, 'Error is being kicked on the server.'"
They give us that better insight. We said, "If I just had SevOne, I wouldn't be able to see these things, because SevOne just tells me the health of the infrastructure. It doesn't tell me the health of the application. That's what I get from ExtraHop."
I will note, in full disclosure, that I have a fairly significant, very robust, large-scale monitoring platform in here. There are probably three of them in the world, and I've got one of them. I did not have this kind of visibility before I put these two things in.
When we put in SevOne, it helped, but when we put in ExtraHop and started feeding information to our customer, our customer said, "Whoa. Wait a minute. I have got too much actionable data here. I have got to change my processes on the way I declare severities and how I filter and allow that information." To me, that's a huge success. We just hit one out of the park with my customer.
Unless there are questions, Erik, I'll turn it back to you.
Erik Giesa: I'll just sum up for the audience on this webcast. First and foremost, some prescriptive items. Don't ever believe a vendor if they tell you they're the only tool or only platform you will ever need. It's simply not true. That's a legacy approach, which is what Eric was alluding to when he said he had one of the largest monitoring environments in the world. The framework approach doesn't work. There is no single pane of glass. In fact, most of our customers refer to that as a myth, or as the "single glass of pain."
What it's all about is data integration and workflows. You heard Eric talk about that in regard to changing the processes. This doesn't have to be a painful process. Once you get some of those quick wins, the constituency starts to see the value, and you start to see this organic and natural coordination between teams, which creates better situational awareness when you've got people working together and collaborating well. The first and foremost point: to get to an IT operations analytics stance, you've got to move away from a tool-centric viewpoint of the world and toward a data-driven viewpoint.
If you look at the way that Eric and his team have quantified things, SevOne really is a machine data platform, because it's getting the flow data that's produced by the switches and routers in the environment.
It's getting SNMP. Again, that's machine generated. It's available on the wire, but it's machine generated. They're using Splunk for the logs. That's machine generated, and they do have some agents.
They're using ExtraHop for the wire data and then bringing at least a subset of these datasets together into a common platform for correlation and application across teams and to their customer base. Again, this doesn't have to be hard. The old way was, you would implement a framework, and it might take three years and hundreds of consultants and a lot of cost.
Lockheed has been a customer of ours for about 18 months. This has been a continuous evolution over time. I'm excited to work with them moving forward and exploring more and more use cases and how we can help provide greater and greater situational awareness for the organization.
Clearly, they've already pioneered this approach, and they're well on their way to driving operational efficiencies and achieving these main objectives like we've never seen before. It's really encouraging. With that, I'm going to turn it over to Steve, our moderator, who's going to open this up to Q&A for the next 15 minutes or so. Thank you.
Steve: Thank you. Thank you, Erik. Before our two Erics begin answering questions, I'll again remind our audience that if you would like to submit a question, just look for the Ask a Question console in the lower left-hand corner of your screen, type your question in the box, and then hit the Submit button.
We've already gotten some good questions for both our speakers. Erik Giesa, let's start this question with you. How does ExtraHop handle encrypted traffic?
Erik Giesa: That's a good question. It's one of the things we're very proud of. Our pedigree comes from F5 Networks. Our two founders were the chief architects on version nine of the BIG-IP platform. I came from product there for 12 years. We know encryption well and we know scale very well. On our platform, we can decrypt at wire speed up to 40 gigabits per second and over 70,000 transactions per second. In order to do that, we just need the private key.
Again, like Eric Sharpsten had said, we're not invasive, we don't perturb the environment. All we need is a SPAN or a network tap, so we get a copy of the traffic. If we get a copy of that traffic and we have the private key, we can decrypt it at line rate. Another alternative, too, especially if you're using perfect forward secrecy or something like that: you could have a proxy that would be doing the decryption and then mirroring the traffic to us as well. There are lots of options, but we're fairly unique in our ability to scale and do decryption so that we can see that traffic.
Steve: Now, Eric Sharpsten, a question that's come in for you. The questioner asked, "How did Lockheed Martin select and then also justify ExtraHop?"
Eric Sharpsten: We have a pretty objective process, because we find that our people tend to get pretty married to one company or another. What we end up doing is, we set up our requirements, we do RFIs to several companies at once, and then we'll score them, scale our scores, and come up with a technical approach on that. Then we'll do a proof of concept based on the top two, or possibly three, but we like to keep it to two.
One of the things that drove us to ExtraHop was, one, its small size. It is extraordinarily agile. They are extraordinarily responsive, to the point that they sat down with us and helped us get going. It took us, I'd say, about a month before we could actually run with it ourselves, and two months before we were starting to get really good data. Within three months, we were actually utilizing it to draft and record SLAs.
I think for all those reasons, that's why we went with ExtraHop as opposed to some of our traditional tool sets.
Steve: Thank you. Let's see. We've got another question in here. Back to Erik Giesa, the questioner asked, "From the context of cyber security, specifically, what does ExtraHop provide? For example, monitor, log, prescriptive action, et cetera?"
Erik Giesa: That's a great question. First and foremost, it starts with this: imagine that any activity on your network is measured and recorded, and that activity has a lot of context associated with it. What was the request, the transaction? Where did it come from? Who? What did it get the response from? Which system? Which element? What was exchanged in that transaction?
It's monitoring, obviously, at its heart, but it's very contextually rich in terms of what it's monitoring. The way people are using us, it's fairly simple to set a policy, for example, that would say, "We have a collection of databases here, and we want to make sure that all transactions from these databases are encrypted in flight, and that they have the correct ciphers and certificates associated with them, and key strength," so that we can make sure we're complying with our stated security or encryption goals. This is just one example.
For the same database, you might also say, "By the way, no single client should ever be getting more than one megabyte of data from this database. If we see anything more than that, that warrants investigation." Then we look at this a little further, and you could set policy to say, "Also, we want to know every user." Take the example that Eric gave earlier. That was an innocent mistake. It was the authentication errors, generated by what was thought to be a decommissioned server, that were picked up.
We could provide who's logging in, admin, root, et cetera, and which client is accessing this. You've got the situational awareness that not only are we encrypting everything in flight, but the data leaving this fits within the parameters of what is acceptable. Knowing which clients are consuming this is very valuable.
The other thing, too, and this is out there: if we had been in place, this Snowden episode might not have happened. We see all users accessing files. Again, this is passive; this is mining the wire.
You could also set a policy on your filers in your storage systems to say, "Boy, notify me if any one user is accessing more than a hundred files in a 12-hour period, because that's highly unusual," or do the same by volume of data, or by location.
Unfortunately, our technology is so novel that we don't fit into a specific predefined category of security. The closest we come, or the way we can describe this, is continuous behavioral analysis and surveillance, network and application surveillance.
Eric Sharpsten: I'd like to add to that. This is happening in real time, as opposed to many of our log analytics, where you're waiting for your logs to get updated or pushed to you. It depends on how long the cycles are. If you've got a SOC or a NOC watching something, they're going to see it as it happens.
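The file-access policy Erik Giesa describes above (notify when one user touches more than a hundred files in a 12-hour period) can be sketched as a sliding-window counter. This is only an illustrative sketch, not ExtraHop's trigger API or implementation: the class name `FileAccessPolicy`, its `observe` method, and the event fields are hypothetical, and it assumes file-access events arrive in timestamp order as they are seen on the wire.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class FileAccessPolicy:
    """Hypothetical policy: flag a user who accesses more than
    `max_files` distinct files within a sliding time window."""

    def __init__(self, window=timedelta(hours=12), max_files=100):
        self.window = window
        self.max_files = max_files
        # user -> deque of (timestamp, path), oldest first
        self._events = defaultdict(deque)

    def observe(self, user, path, ts):
        """Record one file access; return True if the user is now over the limit.

        Assumes events for a given user arrive in non-decreasing time order.
        """
        events = self._events[user]
        events.append((ts, path))
        # Drop accesses that have aged out of the window.
        cutoff = ts - self.window
        while events and events[0][0] <= cutoff:
            events.popleft()
        # Count distinct files still inside the window.
        distinct_files = {p for _, p in events}
        return len(distinct_files) > self.max_files


if __name__ == "__main__":
    policy = FileAccessPolicy()
    start = datetime(2015, 1, 1)
    for i in range(101):
        alert = policy.observe("jsmith", f"/share/doc{i}.txt", start + timedelta(minutes=i))
    print("alert raised:", alert)
```

Because the check runs on each observed access rather than on a batch of logs, an alert would fire as the threshold is crossed, which is the real-time property Eric Sharpsten points out. The same shape would work for the one-megabyte-per-client rule by summing bytes instead of counting files.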
Steve: Thank you. Eric Sharpsten, since we have you on the line, there's a question here for you. The questioner asks, "What is the scale of Lockheed's ExtraHop deployment?"
Eric Sharpsten: I can only speak to my one program here, because we're kind of set aside from the rest of the Lockheed Martin network, but I can tell you that there are other organizations that are employing it, too. Right now, I talked about the fact that we had one data center, moving down to two.
I have three main data centers where I'm collecting data. I'm aggregating that data across the three data centers, so as I have a presentation layer in one, an application database layer in the other, I can track the application all the way through, and also understand how the line speeds are impacting it. I also have my disaster recovery site implemented.
I have one test bed that we've implemented, so over 18 months, we've invested quite a bit, and relied quite heavily on the ExtraHop infrastructure.
Steve: Thank you. This question I'll throw back to Erik Giesa, but Eric Sharpsten, feel free to jump in if you want to add anything to his answer, because the question is this: "What are the strengths and weaknesses of wire data compared to other sources of data?"
Erik Giesa: That's good, because this gets to holding vendors accountable, and trying to cut through the noise and the oftentimes over-positioning of products. ExtraHop's platform won't see activity that doesn't leave a host. This is where machine data, and even sometimes agent data, is very valuable.
For example, if a configuration change is made on that system, that's typically a logged event. That doesn't hit the wire. We would see the administrator, or somebody, connecting to the system. The fact that they were inside and making some config change, we would not see.
There are also some application errors or system-level errors that might only be logged. That's rare. The basic premise: if it never leaves the host, we don't see it. As Eric Sharpsten was saying, and I'll let him talk to this, that's why they use SevOne. CPU, memory, and I/O information is usually resident on the host and published via something like SNMP. We stop if it doesn't hit the wire.
Eric Sharpsten: That's exactly why I've not gone to one tool that does everything. What we really focused on is, what is a small set of tools that accomplish everything that we need to do, coexist inside of our environment, and complement each other? That's why I've gone to my wire data tools. To your point, we know that a configuration change was made on that system, because I can utilize my performance log analyzer to understand, "This is what's happening on that system." You may see the effects of that change as a result of what ExtraHop is tracking in and out of that server.
Erik Giesa: Correct.
Eric Sharpsten: This looks different now. It looks different around this time. Now, I'll go into my SevOne Performance Log Analyzer and say, "Jim Smith signed on at this point in time. He made some changes." Maybe that wasn't so good, and we can go back and make that correction.
Erik Giesa: Correct.
Eric Sharpsten: From the strength side, and I've talked about this a little bit, I see everything. With specific probes, I only see where I have probes. With my agent data, I only see where I have agents. I've got to make sure my agents are up, because my agents crash on us a lot. I don't have that problem with wire data.
Steve: Terrific. We're just about out of time, but I see one question that might be easy to answer. I'll get this in. Erik Giesa, the questioner asks, "Does ExtraHop work in the cloud?"
Erik Giesa: Yes, we do. We've got virtual appliances for Hyper-V. We have virtual instances that run in AWS. We're adding more as we go. That's a really important point, because the last thing you want to do is have a different operational model and workflow for the cloud than you have on premises. That goes back to that tool-centric view that drives a lack of situational awareness.
Eric Sharpsten: If I can just add to that. Our customer came to us and said, "We're looking at doing a deployment in the cloud. We are looking at AWS." There's a host of tools inside of AWS to do performance monitoring, et cetera. One of the things that we found lacking was the ability to do application performance monitoring. They're always working on those tool sets.
The fact that we're able to drop a virtual appliance from ExtraHop into our plan really dovetails with everything that we're doing today, and it feeds into our network control center. We're able to leverage all of the current infrastructure we had in place for monitoring, and our practices too. It is very viable inside of the AWS cloud.
Steve: Terrific. Thank you very much. I'm sorry to say that's all the time we have today. It's been a great presentation by both our Erics, Erik Giesa and Eric Sharpsten. I also want to give a special thanks to our sponsor, ExtraHop, for our program today. Thank you very much for attending today's webcast. This concludes our presentation.
This is a companion discussion topic for the original entry at https://www.extrahop.com/blog/2015/how-lockheed-martin-uses-wire-data-for-situational-awareness/