There's a cute infographic floating around that estimates that humanity creates 2.5 quintillion bytes of data every day. That estimate is based on a lot of assumptions and napkin math, but it is fun to think about. We could get into a debate about what data creation really means (is data only considered to have been "created" if it is saved and somehow retrievable?), but let's not.
Instead, let's scope this idea down a little, do some napkin math of our own, and try to figure out how much data is created by one particularly data-intensive event: The Super Bowl.
For the purposes of this thought experiment, we'll need to identify our data sources. What counts as data "created by the Super Bowl"?
Any football game generates a lot of data because of the sheer number of people involved, and the number of digital transactions any one of those people participates in. Think ticket scans, credit card swipes, social media posts and so on.
Wired magazine put out an excellent article recently about the level of surveillance that goes into keeping the Super Bowl a safe place to be. The gist is that every person who walks within miles of the event has electronic eyes on them, and likely a digital dossier of their communications and behavior is being created and saved. So that's more data creation.
Every NFL player's uniform is equipped with tiny sensors that track their every move and transmit it via RFID to receivers around the stadium that then beam it to a broadcast truck and up into the cloud so it can be picked apart and analyzed for insights. Data, data, data!
On the sidelines of every NFL game there are dozens of customized Microsoft Surfaces, which are used by players, coaches and other non-player members of an NFL team to review plays and other game-related info nearly instantaneously after it occurs.
So. Much. Data.
All of that is nothing compared to one data type that accounts for a huge, huge volume of data generated at every Super Bowl: high-definition video.
I say any video recorded of the actual event should count. That means every smartphone in the audience, every TV camera, replay camera, security camera, every GoPro-on-a-drone (authorized or not). Let's restrict this data source to include only video recorded at or around the actual game, and exclude people who DVR the event at home. Most of the cameras at the game are probably attached to smartphones that are also sending and receiving texts, picture messages, location data, SnapChats … you get it.
We'll also exclude all the data that goes into making the Super Bowl happen, such as the market research that drives the advertising, and the intense planning and communication to decide where it will be and how it will happen. We won't count the data that fuels the economic calculations every city performs to figure out, "Is it actually a good idea to host the Super Bowl?" (tl;dr: No)
For this thought experiment, we'll only include data that is actually created (i.e. saved, written to disk) during or directly and only because of the Super Bowl.
The Super Bowl itself is four hours long. That's 3.5 hours of game time with ads peppered throughout, and a 30-minute halftime show.
The average price for a Super Bowl 50 ticket was $3,667. (Not super relevant to this post. Just mind-boggling.)
Average attendance at a Super Bowl is 77,987 people.
Those are numbers we basically know. From here on out, we're estimating, and in some cases speculating wildly. Enjoy the ride.
Data Source #1: Fan-Generated Video
I think we can safely assume that almost everyone at the game has a smartphone, and is going to actively use its camera at some point. Let's say that 80% of those people are going to shoot a video during the game itself (not including everyone who records the halftime show), and that the average length of those videos is two minutes.
Eighty percent of 77,987 is 62,389 (rounded down). So if that number of people with smartphones take two minutes of high definition video apiece, that's 124,778 minutes, or about 2,079 hours of video.
Let's speculate that an hour of video shot on a smartphone takes up 7.8 GB. That assumes 1080p resolution at 30 frames per second, which is standard for the iPhone 6s. That means 2,079 hours equals about 16.2 TB of fan-generated video coming out of the Super Bowl's game time. Still not counting the halftime show. Kind of a lot … but not really.
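For anyone who wants to poke at the assumptions, the game-time arithmetic above can be sketched in a few lines of Python. The function and constant names are my own labels, the 7.8 GB per hour rate is the speculative figure from the paragraph above, and 1 TB is treated as 1,000 GB.

```python
# Napkin math for fan-shot video during game time.
# Every input is one of the article's guesses, not a measurement.
ATTENDANCE = 77_987          # average Super Bowl attendance
GB_PER_HOUR_1080P = 7.8      # assumed smartphone rate at 1080p / 30 fps
GB_PER_TB = 1_000            # assumption: decimal terabytes


def fan_video_tb(attendance, percent_filming, minutes_each):
    """Return (people, hours, terabytes) for one slice of fan video."""
    people = attendance * percent_filming // 100   # rounded down
    hours = people * minutes_each / 60
    terabytes = hours * GB_PER_HOUR_1080P / GB_PER_TB
    return people, hours, terabytes


people, hours, terabytes = fan_video_tb(ATTENDANCE, 80, 2)
print(f"{people:,} people, {hours:,.1f} hours, {terabytes:.1f} TB")
```

Change the 80 or the 2 to see how quickly the total moves. The result is only as good as those two guesses.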
Let's talk halftime. It's a 30-minute show featuring multiple huge celebrities. I would estimate that 30% of the audience films the entire thing on their smartphone.
That's 23,396 people recording 30 minutes of footage each, so 701,880 minutes, or 11,698 hours of footage. That's 91.2 TB and change, bringing our total to about 107.4 TB of fan-generated video data. Now we're getting somewhere, but the accuracy of this estimate is going down. We're making some estimates of numbers that would be extremely hard to actually measure or confirm.
And we haven't even talked about professional cameras yet. Let's say there are 50 professional videographers at the event, working for news outlets, the NFL itself, the individual teams, or whoever. My gut says this is a low estimate, but while we're guessing wildly, let's stay conservative.
If these videographers or their employers are up to snuff, they'll be recording in 4K at 60 fps, which takes ~45 GB per hour of footage. Say 50 cameras record the entire four-hour Super Bowl extravaganza at that resolution. That's another 9 TB.
Current total: ~116.4 TB.
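Here is a rough tally of the three video line items so far, as a Python sketch. The table layout and variable names are mine, 1 TB is treated as 1,000 GB, and each line item is rounded to one decimal place before summing, the way you would on a napkin. With these inputs the line items come to 16.2 TB, 91.2 TB and 9 TB, or about 116.4 TB in all.

```python
# Running total of video created at the game, using the article's guesses.
ATTENDANCE = 77_987
GB_PER_TB = 1_000                      # assumption: decimal terabytes

# (label, number of cameras, hours recorded per camera, GB per hour)
sources = [
    ("fans, game time", ATTENDANCE * 80 // 100, 2 / 60, 7.8),
    ("fans, halftime", ATTENDANCE * 30 // 100, 30 / 60, 7.8),
    ("professionals", 50, 4, 45),
]

total_tb = 0.0
for label, cameras, hours_each, gb_per_hour in sources:
    # Round each line item to one decimal, napkin style, before adding it up.
    tb = round(cameras * hours_each * gb_per_hour / GB_PER_TB, 1)
    total_tb += tb
    print(f"{label:<16} {tb:6.1f} TB")
print(f"{'total':<16} {total_tb:6.1f} TB")
```

Nearly 80% of the total comes from the halftime row, which also rests on the shakiest guess (30% of the crowd filming all 30 minutes).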
Data Source #2: The Network
So far we have mostly talked about data being locally generated and stored on cameras with attached storage. What's also happening is that a huge percentage of this video, particularly the professional footage, is getting beamed out via wired or wireless connections, at which point it is backed up, and passed to video editors to slice up, add animations, and in a few cases, broadcast live, seconds (or less) after the original light hit the camera lens.
In fact, almost all the data that is created at the Super Bowl is going to cross the network at some point. How do all the coaches' Microsoft Surfaces get those instant replays? How do viewers at home get treated to dozens of camera angles on any given play? How does Chad send his "Hi Mom" SnapChat with Cam Newton dabbing tears from his eyes in the background? The answer to all of these is probably the same: over the network. (Even Chad's SnapChat probably went out over Super Bowl 50's free, open WiFi).
So let's go ahead and double our estimate of the data generated in the form of professional footage, because all of it is getting wired somewhere and backed up immediately, and probably backed up more than once.
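The doubling is simple enough to do in your head, but here is a small Python sketch in case you want to try more than one backup copy. The constant names are mine, COPIES = 2 is the "original plus one backup" assumption from the paragraph above, and 1 TB is treated as 1,000 GB.

```python
# Professional footage, counted once on camera and once for a networked copy.
CAMERAS = 50
HOURS_EACH = 4
GB_PER_HOUR_4K60 = 45      # the article's rough figure for 4K at 60 fps
COPIES = 2                 # original plus one backup; there are probably more
GB_PER_TB = 1_000          # assumption: decimal terabytes

on_camera_tb = CAMERAS * HOURS_EACH * GB_PER_HOUR_4K60 / GB_PER_TB
with_backups_tb = on_camera_tb * COPIES
print(f"on camera: {on_camera_tb:.0f} TB, with backups: {with_backups_tb:.0f} TB")
```

That turns the 9 TB of professional footage into 18 TB, and every extra backup adds another 9 TB.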
Besides video, there are other sources of data that we can barely begin to quantify in this thought experiment. Every single one of these network transactions creates a record of itself, and those records are observed, and most of them saved. This is because of the value inherent in that data: the concealed insights into how to win at football, how to sell thirty-second adverts for a million dollars, and how to secure a WiFi network accessible to 80,000 people.
Do you have any idea how much data a single credit card transaction creates, or how many credit card swipes there are at the Super Bowl? I don't! What about the behavioral data that advertisers are scooping up about every single attendee, and every person out there watching the event on TV? How much space does that take? Where does it end up? Is it accurate?
Trying to estimate how much data the Super Bowl creates means correlating an unfathomable number of factors and a crazy amount of data. The people collecting all of it are trying to do two things with it:
- Teams can get real-time insights that could influence the outcome of the actual game.
- Sponsors and other commercial entities can get ongoing insights that impact how much money they make at every future Super Bowl.
So How Much Data Is Created By The Super Bowl?
Unknowable. Based on the fact that our estimates got us up over 116 TB of video alone, I'm going to ballpark the entire event at several petabytes.
This is a company blog. You guys knew I was working an angle, right?
There is literally only one platform in the world that can analyze the volume of data created by the Super Bowl and extract useful insights from it in real time. That's ExtraHop. That's because the vast majority of the data we've discussed travels over the network, and thus can be observed by our platform.
With ExtraHop installed at a financial company, for example, we could potentially log every credit card transaction and tell you which card type was used the most, and how many failed transactions there were compared to successful ones. If we were analyzing ESPN's data center communications, we could tell which moments in the game were most frequently live streamed or replayed by audience members, and which plays were correlated with the most tweets. These are just some examples of what you can do with wire data.
Try the free ExtraHop demo to see what I mean.
This is a companion discussion topic for the original entry at https://www.extrahop.com/community/blog/2016/how-much-data-does-the-superbowl-create/