Someone was asking about Extrahop the other day: how do you start, and where do you start? I told them that we have made at least 30 changes based on information gathered from Extrahop and have at least another 30 in the pipeline. They were amazed at how much we had changed.
I will say that the first week we had Extrahop in place was a little overwhelming. Granted, we should have been ahead of the game, since we had Riverbed Netshark deployed and had been capturing packets and finding problems with it on a regular basis. But with Extrahop, all I can say is DATA overload. I sat there and looked at all the dashboards and all the bundles (yes, I loaded a lot of them; if it sounded cool, I plugged it in). But with all those dashboards, where do I start? Where do I go to solve my problem?
Understand that I forced our hand, somewhat. Our team had been using Packet Analyzer and Netshark from Riverbed to solve problems all the time. Being the all-in person I am, I ripped all of that out, put Extrahop in, and pointed them to it. So now we had to figure out how to solve problems using a different tool. There was truly a learning curve. In fact, some people might have been a little mad at me; I noticed several weeks went by without them logging into it. But eventually problems came up, and they had to log in to solve them.
However, we did not spend the money we did JUST to solve problems as they came up. We wanted to use it to make our environment better. So I poked around a bit to figure out where to start. I started by digging into things I already had knowledge of. One of the first questions I would ask myself was why someone wanted this metric on this dashboard. That started me thinking about how I could use it.
So my first big step was taking a hard look at DNS. DNS is something I think I understand, and something I think I can fix. So why not use it to help me understand what Extrahop can do for me? I began looking at queries and asking why I was seeing them. Why did ISATAP show up as a top query? Why did WPAD show up as a top query? Why did I have 100K failed queries an hour for some odd alphabetic string? Why did I have DNS timeouts? What were the timeouts? Are timeouts bad?
Asking those questions and digging into the data, we found several issues:
• Some of our DNS requests for critical resources were timing out. That is not good. We worked to fix that and moved on, and we could watch the data to confirm the change actually worked.
o This actually improved the performance of our applications.
• We found that some app was calling Chrome every 30 seconds, and Chrome was making 15 DNS queries to non-existent names each time. Multiply that across more than 100 computers and you have 1,500 or more unnecessary requests every 30 seconds, or at least 180,000 an hour.
• We found over 400 devices with misconfigured DNS servers. They were querying DNS servers that did not exist, so the requests had to be dropped at the firewall.
• We also found several DNS queries we decided to black-hole.
• We found WPAD queries happening all the time, even though we thought we had applied a GPO to stop them. Same for ISATAP. If you do not understand why this concerns us, do a little digging on the risk. We have stopped at least 80% of our WPAD queries.
• With the above work, we were able to go from 90K queries a second (across several domain controllers) to fewer than 7K queries a second, a reduction of more than 92%.
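If you want to try a similar first pass on DNS data outside the tool, here is a minimal Python sketch. It assumes you can export query records as (name, response code) pairs from whatever DNS logging you have; the input format, the bucket names, and the "7 to 15 lowercase letters" rule for random-looking names are hypothetical choices for illustration, not anything Extrahop provides. It buckets WPAD and ISATAP lookups, flags failed lookups whose first label looks like a random alphabetic string, and counts how often each failing name was asked for.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical heuristic: a first label made only of 7-15 lowercase letters
# that fails to resolve is treated as "random-looking". Real hostnames can
# match this too, so treat hits as leads to investigate, not proof.
RANDOM_LABEL = re.compile(r"[a-z]{7,15}")

# Auto-discovery names worth watching (the WPAD/ISATAP case).
DISCOVERY_NAMES = ("wpad", "isatap")


def classify(name: str, rcode: str) -> str:
    """Bucket a single DNS query by its first label and response code."""
    first = name.lower().rstrip(".").split(".")[0]
    if first in DISCOVERY_NAMES:
        return first
    if rcode == "NXDOMAIN" and RANDOM_LABEL.fullmatch(first):
        return "random_nxdomain"
    if rcode == "NXDOMAIN":
        return "other_nxdomain"
    return "ok"


def summarize(records: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Count queries per bucket, plus how often each failing name was queried."""
    buckets: Counter = Counter()
    failing: Counter = Counter()
    for name, rcode in records:
        buckets[classify(name, rcode)] += 1
        if rcode == "NXDOMAIN":
            failing[name.lower().rstrip(".")] += 1
    return buckets, failing


if __name__ == "__main__":
    # Made-up sample records; replace with your own export.
    sample = [
        ("wpad.corp.example.com", "NXDOMAIN"),
        ("ISATAP.corp.example.com.", "NXDOMAIN"),
        ("qhzkxvbwtr.corp.example.com", "NXDOMAIN"),
        ("printer-01.corp.example.com", "NXDOMAIN"),
        ("intranet.corp.example.com", "NOERROR"),
    ]
    buckets, failing = summarize(sample)
    print(buckets)
    print(failing.most_common(3))
```

The sketch buckets on the first label only because a client's DNS search suffix is often appended to short names before they reach the server, so `wpad` and `wpad.corp.example.com` land in the same bucket.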
That started to give me an idea of how to move around in Extrahop. Then I started looking at how I could use it elsewhere. This may all end up in a blog post at some point, but I thought I would toss it out here as well.