Getting started with ExtraHop: my experience


#1

Someone asked me about ExtraHop the other day: how do you start, and where do you start? I had been telling them that we have made at least 30 changes based on information gathered from ExtraHop and have at least another 30 in the pipeline. They were amazed at how much we had changed.
I will say that the first week we had ExtraHop in place, it was a little overwhelming. Granted, we should have been ahead of the game, as we had Riverbed Netshark deployed and had been capturing packets and finding problems with it on a regular basis. But with ExtraHop, all I can say is DATA overload. I sat there and looked at all the dashboards and all the bundles (yes, I loaded a lot of them; if it sounded cool, I plugged it in). But with all those dashboards, where do I start? Where do I go to solve my problem?
Understand that I forced our hand somewhat. Our team had been using Packet Analyzer and Netshark from Riverbed to solve problems all the time. Being the all-in person I am, I ripped all of that out, put ExtraHop in, and pointed them to it. So now we had to figure out how to solve problems using a different tool. There was truly a learning curve. As a matter of fact, some people might have been a little mad at me; I noticed they went several weeks without logging into it. But finally the problems came up, and they had to log in to solve them.
However, we did not spend the money we did JUST to solve problems when they came up. We wanted to use it to make our environment better. So I poked around a bit to figure out where to start. I started by digging into things I have knowledge of. One of the first things I would ask myself was: why did someone want this metric on this dashboard? That started me thinking about how I could use it.
So my first big step was taking a hard look at DNS. DNS is something I think I understand, and something I think I can fix. So why not use it to help me understand what ExtraHop can do for me? I began looking at queries and asking why I was seeing them. Why did ISATAP show up as a top query? Why did WPAD show up as a top query? Why did I have 100K failed queries an hour for some odd alphabetic string? Why did I have DNS timeouts? What were the timeouts? Are timeouts bad?
Asking those questions and digging into the data, we found several issues.
• Some of our DNS requests for critical resources were timing out. That is not good. We worked to fix that and moved on. We could also watch to see whether the change actually worked.
o This actually improved the performance of our applications.
• We found that some app was calling Chrome every 30 seconds, and Chrome was making 15 DNS queries to non-existent DNS names. Multiply that by over 100 computers and you have thousands of unnecessary requests.
• We found over 400 devices with misconfigured DNS servers. As a result, they were querying non-existent domain servers, and the requests had to be dropped at the firewall.
• We also found several DNS queries we decided to black hole.
• We found WPAD queries happening all the time, even though we thought we had applied a GPO to stop this. Same for ISATAP. If you do not understand why this concerns us, do a little digging on the risk. We have stopped at least 80% of our WPAD queries.
• With the above work we were able to move from 90K queries a second (across several Domain Controllers) to less than 7K queries a second.
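
For anyone who wants to try the same kind of digging outside the dashboards, here is a rough sketch in Python of the questions we kept asking: what are the top queries, which names keep failing, and which clients are still asking for WPAD or ISATAP. The record layout (client, qname, rcode) and the sample data are made up for illustration. This is not an ExtraHop export format or API, just an assumption about what a list of DNS transactions might look like once you have it in hand.

```python
from collections import Counter

# Hypothetical record layout: one dict per DNS transaction.
# Field names and values are assumptions for illustration only.
SAMPLE_RECORDS = [
    {"client": "10.0.0.5", "qname": "wpad.corp.example", "rcode": "NXDOMAIN"},
    {"client": "10.0.0.5", "qname": "isatap.corp.example", "rcode": "NXDOMAIN"},
    {"client": "10.0.0.6", "qname": "wpad.corp.example", "rcode": "NXDOMAIN"},
    {"client": "10.0.0.7", "qname": "app.corp.example", "rcode": "NOERROR"},
    {"client": "10.0.0.7", "qname": "qzxkvbnm.corp.example", "rcode": "NXDOMAIN"},
]

# Leftmost labels we want to chase down to specific clients.
WATCHED_LABELS = {"wpad", "isatap"}


def first_label(qname):
    """Return the leftmost label of a DNS name, lowercased."""
    return qname.rstrip(".").split(".")[0].lower()


def summarize(records):
    """Tally top queries, failed queries, and clients asking for watched names."""
    top_queries = Counter(r["qname"].lower() for r in records)
    failed = Counter(
        r["qname"].lower() for r in records if r["rcode"] != "NOERROR"
    )
    watched_clients = {}
    for r in records:
        label = first_label(r["qname"])
        if label in WATCHED_LABELS:
            watched_clients.setdefault(label, set()).add(r["client"])
    return top_queries, failed, watched_clients


if __name__ == "__main__":
    top, failed, watched = summarize(SAMPLE_RECORDS)
    print("Top queries:", top.most_common(3))
    print("Failed queries:", failed.most_common(3))
    for label, clients in sorted(watched.items()):
        print(label, "requested by", sorted(clients))
```

The per-client sets are the useful part for us: a list of machines still asking for WPAD is a list of machines where the GPO did not land.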

So that started to give me an idea of how to move around in ExtraHop. Then I started looking at how I could use it elsewhere. This all may end up in a blog at some point, but I thought I would toss it out here as well.


#2

Thanks for sharing this! We’re thrilled you’ve had such a positive experience, and can’t wait to hear what you do next. If you have any questions, you know where to find us.


#3

Have you ever been in a meeting where the boss/CIO/CEO asks a question and you sit there wondering how on earth you will answer it? Or maybe how on earth you will even have the data? Well, those questions happen a lot for me. A couple of weeks ago the question came up about one of our vendors. They had sent a note that they were changing to TLS 1.2 and that all other versions would be blocked. So there I sat in a meeting with the CIO, and the question came up: are we ready? Now understand, this is business impacting for us in a significant way. Normally all I could say was that we had tested it and verified that a couple of machines worked with their test environment.
But when you have over 1000 PCs, that answer just does not seem relevant or even confident. In some cases our CIO would walk away expecting an outage, as testing often seems to pass but then fails when it comes to production. However, on this day, as the question was asked, I started thinking. My boss, the VP, was looking at me, since many seem to think I understand TLS and certificates better than I do. And the room, filled with 30 or so people, came to that dead silence as people patiently waited for the response. Hmm, how should I answer this? Then I remembered I had ExtraHop. I wondered whether I could report on this quickly enough. I began banging at my keyboard and looking at different dashboards. Before long I had records up on the screen from each store location showing that all connections to that site for the last 4 hours were already TLS 1.2.
That was a win if I ever had one. Keep in mind that at that point we had only had ExtraHop for a little over a month, but it was starting to pay off.
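
The check itself is simple once the records are in front of you. As a rough sketch in Python: group the connections to the vendor by store location and list any client that negotiated something other than TLS 1.2. The record fields (location, client, version), the version strings, and the sample data are all assumptions for illustration, not what ExtraHop actually returns.

```python
REQUIRED_VERSION = "TLSv1.2"

# Hypothetical connection records for one vendor site over a time window.
# Field names and version strings are assumptions for illustration only.
SAMPLE_CONNECTIONS = [
    {"location": "store-001", "client": "10.1.0.10", "version": "TLSv1.2"},
    {"location": "store-001", "client": "10.1.0.11", "version": "TLSv1.2"},
    {"location": "store-002", "client": "10.2.0.10", "version": "TLSv1.0"},
    {"location": "store-002", "client": "10.2.0.11", "version": "TLSv1.2"},
]


def clients_off_version(connections, required=REQUIRED_VERSION):
    """Map each location to the sorted clients NOT using the required version.

    Locations with no offenders are kept with an empty list, so the report
    shows every store that was actually seen in the data.
    """
    offenders = {}
    for conn in connections:
        bucket = offenders.setdefault(conn["location"], set())
        if conn["version"] != required:
            bucket.add(conn["client"])
    return {loc: sorted(clients) for loc, clients in offenders.items()}


def is_ready(connections, required=REQUIRED_VERSION):
    """True only if every observed connection used the required version."""
    report = clients_off_version(connections, required)
    return all(not clients for clients in report.values())


if __name__ == "__main__":
    for location, clients in sorted(clients_off_version(SAMPLE_CONNECTIONS).items()):
        status = "OK" if not clients else "needs work: " + ", ".join(clients)
        print(location, status)
    print("Ready for the vendor change:", is_ready(SAMPLE_CONNECTIONS))
```

One caveat: a report like this only covers machines that actually connected during the window, so a store that was closed or a PC that was off will not show up either way.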


#4

Great story! Our IT Director has told me that he’s often remembered, “Oh yeah, we have ExtraHop!” and how it’s helped answer ad hoc questions like yours.

For example, after the Heartbleed bug came out, he used our historical ExtraHop data (we store it going back years) to see if we had incoming SSL heartbeat requests, which are the vector for exploiting that bug. We could confidently assume that no one had stolen information from us prior to the vulnerability being announced.
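
To give an idea of what a heartbeat request looks like on the wire, here is a small Python sketch. It is not how ExtraHop detects them; it only parses a single plaintext TLS record (encrypted heartbeats cannot be read this way) using the layout from RFC 6520: content type 24, then a one-byte message type, a two-byte claimed payload length, the payload, and at least 16 bytes of padding. A request that claims more payload than the record can hold is the Heartbleed pattern. The function name and return format are my own.

```python
import struct

TLS_CONTENT_TYPE_HEARTBEAT = 24  # TLS record content type for heartbeat (RFC 6520)
HEARTBEAT_REQUEST = 1            # heartbeat message type: request
MIN_PADDING = 16                 # minimum padding required by RFC 6520


def inspect_heartbeat(record):
    """Inspect one plaintext TLS record given as bytes.

    Returns None if it is not a parseable heartbeat request, otherwise a
    dict with the claimed payload length and whether that claim is larger
    than the record can actually hold.
    """
    if len(record) < 5:
        return None
    # Record header: content type (1 byte), version (2), length (2).
    content_type, _version, length = struct.unpack("!BHH", record[:5])
    if content_type != TLS_CONTENT_TYPE_HEARTBEAT:
        return None
    body = record[5:5 + length]
    if len(body) < 3:
        return None
    # Heartbeat header: message type (1 byte), claimed payload length (2).
    msg_type, claimed = struct.unpack("!BH", body[:3])
    if msg_type != HEARTBEAT_REQUEST:
        return None
    oversized = claimed + 3 + MIN_PADDING > length
    return {"claimed_payload": claimed, "record_length": length, "oversized": oversized}


if __name__ == "__main__":
    # Request claiming 16384 bytes of payload inside a 3-byte record.
    bad = bytes.fromhex("1803020003014000")
    # Well-formed request: 4-byte payload plus 16 bytes of padding.
    good = bytes.fromhex("1803020017") + b"\x01\x00\x04ping" + bytes(16)
    print(inspect_heartbeat(bad))
    print(inspect_heartbeat(good))
```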


#5

I have experienced this kind of situation before. Since I’m a consultant for clients who use or will use ExtraHop, the questions are usually thrown at me. What I did was try to explain with simple cases like HTTP: which HTTP metrics do we need to take a closer look at? For example, HTTP 5xx errors, or a comparison of HTTP requests to HTTP responses. If the request and response counts differ too much, the web servers might be having issues. Or we can look at metrics like TCP retransmission timeouts (RTOs) or high latency, start there, and then look at metrics like zero windows in/out or TCP resets.
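
As a rough illustration of the request/response comparison, here is a small Python sketch. The counts would come from whatever metrics you export; the function name and both thresholds (a 5% gap between requests and responses, 1% of responses being 5xx) are assumptions I picked for the example, not ExtraHop defaults, so tune them to your own baseline.

```python
def triage_http(requests, responses, errors_5xx,
                max_gap_ratio=0.05, max_5xx_ratio=0.01):
    """Return a list of findings for one server over one time window.

    requests, responses, and errors_5xx are plain counts. The two ratio
    thresholds are illustrative assumptions, not vendor defaults.
    """
    findings = []
    if requests == 0:
        return findings  # nothing to compare in this window

    # Share of requests that never got a response.
    gap_ratio = (requests - responses) / requests
    if gap_ratio > max_gap_ratio:
        findings.append(
            "{:.1%} of requests had no response".format(gap_ratio)
        )

    # Share of responses that were server errors.
    if responses > 0 and errors_5xx / responses > max_5xx_ratio:
        findings.append(
            "{:.1%} of responses were 5xx errors".format(errors_5xx / responses)
        )
    return findings


if __name__ == "__main__":
    print(triage_http(1000, 990, 2))   # healthy window
    print(triage_http(1000, 800, 50))  # missing responses and many 5xx
```

If a server shows findings here, that is the point where I would move on to the TCP metrics (RTOs, zero windows, resets) for the same server and time window.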

There are many case studies provided by ExtraHop on their website. At first, I started from there, and then I took the time to watch the tutorial videos they provide, which certainly helped a lot. You can find them on YouTube or under Training Videos.

CMIIW. Cheers… (sorry for the bad english)


#6

So let me add to this. We have been able to make real-time changes and prove whether they work or don’t work thanks to ExtraHop, both for applications and for the network stack. This tool has been invaluable for us, and we use it on a daily basis, both for troubleshooting and for proactive decision making. We are always questioning the metrics and trying to learn more about ways to use this tool, from server performance issues to security-related risks. The data is fantastic.
We are now building far more dynamic dashboards so we can selectively pick the applications we want to show.
And there is so much more we are doing. The PFS decryption is awesome, and the ability to make custom metrics to track is a great feature as well.
We are well over 100 things that this tool has helped us change in our environment. I can’t say that for many other tools we own.