Extremely slow upgrade from 8.5 to 8.6 (10 hours+ in one case)

We have a physical ExtraHop Discover unit and three virtual Explore units (running as VMs on ESXi), which I recently upgraded from 8.5 to 8.6. The previous two upgrades had gone fine, but this one didn’t go so smoothly. The Discover unit was very quick and went fine, and the second Explore unit took around half an hour, but the first Explore unit took just under ten and a half hours and the third took over 11 hours. The entire time I was waiting, the browser showed ‘Uploading…6%’ and ticked up very slowly. I didn’t want to cancel the operation in case it caused issues. I did email ExtraHop at the time, but there’s been no response.

Incredibly, after all that time they did update successfully, and there have been no issues since, although the Discover unit is a bit slower to come up the first time. I’m wondering if there’s anything I can check on the Explore units to stop this happening again. I followed the steps I could find on this site: I disabled record ingest on the Explore units, upgraded the Discover unit first and then the Explore units, and made sure all the Explore units were on the same firmware. I haven’t restarted the devices themselves, cleared the cache, or anything like that.

One concern I have is the disk usage on the Explore units which is as follows:
Explore Unit 1 - 509GB/718GB (71% used)
Explore Unit 2 - 609GB/718GB (85% used)
Explore Unit 3 - 523GB/718GB (73% used)

Replication is set to level 1, and unit 2, which was by far the quickest to upgrade, has the highest disk usage. I’ve had a hunt through the documentation, but I can’t find what should be done long term to manage disk capacity, since it can’t just keep climbing.
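For anyone who wants to keep an eye on these numbers over time, here is a minimal sketch that recomputes the percentages from the figures above. The figures are typed in by hand from this post; nothing here queries the appliances, and the variable and function names are just illustrative.

```python
# Disk figures copied by hand from the post above: (used GB, total GB).
units = {
    "Explore Unit 1": (509, 718),
    "Explore Unit 2": (609, 718),
    "Explore Unit 3": (523, 718),
}


def percent_used(used_gb, total_gb):
    """Return disk usage as a whole-number percentage."""
    return round(100 * used_gb / total_gb)


# Map each unit name to its usage percentage.
usage = {name: percent_used(used, total) for name, (used, total) in units.items()}

for name, pct in usage.items():
    print(f"{name}: {pct}% used")
```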

Any recommendations to check would be appreciated.

The EXAs will automatically delete old records as the disks fill to capacity. Please contact ExtraHop Support to troubleshoot the slow upgrade. Not sure why it would have taken so long for some nodes but not others.

I notice you said these are virtual Explore units. Did you check the vCenter information to see what other systems are on the same VM host? Or are the hosts dedicated to the Explore units?

As was suggested, check with Support; there may still be logs around that point to something odd happening.

I have one virtual Explore unit, and we also have 3 physical Explore units. I still have some non-Reveal(x) systems that needed somewhere to put their data, so they go to the virtual Explore unit.

I also noticed my last upgrade took longer than I expected, but not more than 1 hour.

The ESXi hosts are not dedicated to the Explore units, but they’re extremely powerful servers (lots of cores and RAM), so each one runs a good number of VMs with plenty of CPU and memory to spare. There have been no performance problems on any of the other VMs on that cluster.

I did contact Support on the night I was having problems but still haven’t received any response, which is why I came here instead.

I’ve now done the upgrade from 8.6 to 8.7 and it took a few minutes, so I’m wondering whether I should restart the Explore units before starting future updates.

Hello JClagga,

Sorry to learn about the problem you are experiencing with your upgrade. We do indeed recommend a system restart before the upgrade. Nonetheless, I have asked one of our Support Engineers to contact you to help with this or any other issues you may be experiencing.

In future and for a quick resolution, please endeavor to create a Support case so one of our Engineers can readily assist you.

Colin O
ExtraHop Support

@jmclagga I can confirm this behavior. The solution is to stop record ingest on the Explore cluster and reboot the node you want to install the firmware on. Wait for the node to recover from the reboot, then install the firmware. Once you have upgraded the whole cluster, you can re-enable record ingest. If you reboot each node first, the upgrade will take minutes.
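The order of operations described above can be written out as a checklist generator. This is only a sketch of the sequence from this post: the node names are made up, the steps are plain text for a human to follow, and it does not call any ExtraHop API or tool.

```python
def upgrade_plan(nodes):
    """Return the ordered manual steps for upgrading an Explore cluster.

    Ingest is disabled once for the whole cluster, each node is rebooted
    and allowed to recover before its firmware is installed, and ingest
    is re-enabled only after every node has been upgraded.
    """
    steps = ["disable record ingest on the cluster"]
    for node in nodes:
        steps.append(f"reboot {node}")
        steps.append(f"wait for {node} to recover")
        steps.append(f"install firmware on {node}")
    steps.append("re-enable record ingest on the cluster")
    return steps


# Hypothetical node names, used purely for illustration.
plan = upgrade_plan(["explore-1", "explore-2", "explore-3"])

for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```

Printing the list before a maintenance window gives a simple checklist to tick off node by node.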

Hey there, thanks for the reply. I rebooted the Explore units before updating to both 8.7 and 8.8, and that has worked perfectly, with the units taking almost no time to update.