Normally I'd be more than happy to hate on averages. A single calculation is rarely the best way to summarize any and all sets of data: a simple mean may be sufficient in some situations, but it's always a good idea to have other options available. Averages only summarize certain types of data sets well, so the more versatile you can be in summarizing a data set, the more efficiently you'll understand its overall trends.
The new weighting model options available for trend lines in the 3.10 firmware are perfect for doing just this. A variety of weighting models can summarize the metrics on custom pages. I've outlined what's going on mathematically for each model below, along with some of their strengths and weaknesses, but I'm looking for suggestions of common use cases for each model. Where can you utilize a linear regression? What's a good situation to measure standard deviation? If you can think of a situation where any of the following may be useful, please leave a comment, or let me know if I can clarify anything else.
TL;DR - Mathematical explanations of what's going on regarding the trending options in the 3.10+ release are outlined below. Please feel free to contribute common use cases for each scenario, or let me know if you have any questions.
Mean - the "average" that is intended to summarize the set of data points. There are a few options on how to calculate this...
Linear Average - same as the arithmetic mean: add all the values up and divide by the number of metrics in the set. Not useful in data sets with large variance.
Single Exponential - accounts for averages over a period of time, and makes the newer averages more "significant" in the calculation of the mean. Good in cases when data is unpredictable, and you want newer trends to be prominent.*
Double Exponential - looks at previous rate changes, trends those rates, and predicts what the most likely rate change for the data set is. Good to use when you want to be warned that something is "spinning" out of control.*
- “The most recent value is weighted at x times the oldest value” option lets you configure how much influence past observations have. (Larger values for x make past metrics less significant.)
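The two exponential options above can be sketched with the textbook smoothing formulas. This is a minimal illustration, not the appliance's exact implementation, and the `alpha`/`beta` smoothing factors (each in (0, 1]; higher values weight recent points more heavily) are assumed parameter names:

```python
def single_exponential(values, alpha=0.5):
    """Smoothed series: s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def double_exponential(values, alpha=0.5, beta=0.5):
    """Holt's method: track a level AND a trend, so rate changes are modeled too."""
    level, trend = values[0], values[1] - values[0]
    out = [level]
    for x in values[1:]:
        prev_level = level
        # blend the new observation with where the trend predicted we'd be
        level = alpha * x + (1 - alpha) * (level + trend)
        # update the trend estimate from the change in level
        trend = beta * (level - prev_level) + (1 - beta) * trend
        out.append(level)
    return out
```

Note that the double exponential version tracks a steadily increasing series exactly, which is why it's the one that can warn you when something is "spinning" out of control.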
Percentile - shows where a portion (percentage) of all metrics in the data set lie
Percentile (value) - answers the question: what value are x% of all metrics below?
Min Value - minimum (least) value of metrics
Max Value - maximum (greatest) value of metrics
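A percentile can be computed a few different ways; here's a sketch using the simple "nearest-rank" method (other interpolation schemes give slightly different answers, and I'm not claiming this matches the firmware's exact calculation):

```python
def percentile(values, pct):
    """Return the value below which roughly pct% of the metrics fall (nearest-rank)."""
    ordered = sorted(values)
    # index of the smallest value covering pct% of the set
    rank = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[rank]
```

Min and max are just the 0th and 100th percentiles under this definition.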
Regression - finds the "line of best fit" and answers the question: given the data I've observed, what trends exist?
Linear - finds the trending line (y=mx+b) that has the least amount of "error" in comparison to the actual, observed values (sum of least squares). This method results in portions of straight lines, which is good to use when you expect consistency in data (metrics that are constantly increasing/decreasing/not changing).
2nd Degree Polynomial - Same idea as a linear regression, but this regression measures and graphs the rate of change (the trending line represents the degree to which metrics are increasing/decreasing). This results in a parabolic (i.e. curved) function, which allows for more variation in trending and can "bend"/accommodate more to whatever trend is being observed.*
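For the curious, here's the linear (sum of least squares) fit written out by hand so you can see where the "least error" line comes from. A 2nd-degree fit follows the same recipe with an extra x² term (e.g. `numpy.polyfit(x, y, 2)` does it for you):

```python
def linear_fit(xs, ys):
    """Return (m, b) for the line y = m*x + b minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x) -- the closed-form least-squares solution
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x
    return m, b
```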
Standard Deviation - describes how "concentrated" the data is. The "type" option picks how the standard deviation is calculated, and the "normalization" option can rescale the standard deviation to a standard scale. Note that technically this measurement is only "valid" on normally distributed data sets, but it can still be used anywhere to give you a general idea of what the data looks like.
Population Based - use if you "have" all the data you want to analyze in the trend, or if the data you're feeding to the ExtraHop is all you care about.
Sample Based - this option makes an estimate about everything else going on based on the data you're looking at. Utilize if you want to analyze your data as a portion of the environment, and use the results to make a statement regarding the "bigger picture."
Absolute - graphs the standard deviation as calculated
Relative to Mean - also known as the coefficient of variation, this is calculated as the standard deviation divided by the mean. This measurement is unitless (variability expressed as a fraction of the mean), whereas "absolute" is measured in the metric's own units. "Relative to mean" is a good option if you want to compare standard deviations across different environments, where the averages/means may not compare nicely.
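The population/sample distinction and the "relative to mean" normalization boil down to this (the only difference between the two types is dividing by n versus n-1, known as Bessel's correction):

```python
import math

def std_dev(values, sample=False):
    """Population or sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # sample-based divides by n-1 to estimate the wider population
    # from the data you happen to have
    var = sum((x - mean) ** 2 for x in values) / (n - 1 if sample else n)
    return math.sqrt(var)

def relative_std_dev(values, sample=False):
    """Coefficient of variation: standard deviation divided by the mean."""
    return std_dev(values, sample) / (sum(values) / len(values))
```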
- Super simple - this one just graphs a straight line at the specified value.
Time Delta - this is not mathematically challenging, but can still give a lot of insight. If you want to compare current metrics to those seen in a previous time window, use this option. The comparison window is based on the time window you have set on the main “trend line” tab. This configuration will have a major impact on how to interpret these metrics...
- Same hour/minute of day - This configuration will answer the question, "What was the value of this metric at this same hour/minute of the day exactly x days ago?" Very good for keeping track of long-term trends.
- Hour/minute rolling average - This configuration will answer the question, "What was the value of this metric exactly x hours/minutes ago?" This metric will not present valuable information with traffic that changes day-by-day, but is very good at finding changes that occur rapidly in the environment.
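Both options are just a lookback at a different offset. As a hypothetical sketch (assuming one sample per hour in a plain list), the only difference between them is how far back you index:

```python
def time_delta(series, idx, offset):
    """Difference between a sample and the one `offset` samples earlier."""
    return series[idx] - series[idx - offset]

# "same hour of day, exactly 1 day ago" on hourly samples -> offset = 24
# "rolling, exactly 3 hours ago"                          -> offset = 3
```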
Trimean - finds the median (the value at 50% of all metrics, known as Q2) and two quartiles (the values at 25% and 75% of all metrics, known as Q1 and Q3 respectively). Trimean = (Q1 + (2*Q2) + Q3) / 4.
Trivia - Trimean is best (in comparison to quadmean, quintmean, etc) because three measurement points are the "most efficient" at measuring without having diminishing returns with regards to accuracy. The more you know!
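The formula above is easy to sketch. Quartile conventions vary between implementations, so treat the exact index math here (a simple nearest-rank pick) as an assumption:

```python
def trimean(values):
    """Tukey's trimean: (Q1 + 2*Q2 + Q3) / 4."""
    ordered = sorted(values)
    n = len(ordered)
    def q(pct):  # nearest-rank quantile; exact convention is an assumption
        return ordered[min(n - 1, int(pct * n))]
    return (q(0.25) + 2 * q(0.50) + q(0.75)) / 4
```

Because it's built from quartiles rather than raw values, the trimean barely moves when an extreme outlier appears at either end.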
Winsorized Mean - Useful when you want to eliminate the effect of outliers. Depending on the threshold, you replace the highest and lowest values of the dataset with their closest "neighbors" and then calculate the mean on the new data set. Threshold options are as follows:
- 10th/90th percentile - Replace each data point in the lowest 10% of metrics with the value of the data point just above the 10th percentile, and replace each data point in the highest 10% of metrics with the value of the data point just below the 90th percentile.
- 5th/95th percentile - Replace each data point in the lowest 5% of metrics with the value of the data point just above the 5th percentile, and replace each data point in the highest 5% of metrics with the value of the data point just below the 95th percentile.
- 25th/75th percentile - Replace each data point in the lowest 25% of metrics with the value of the data point just above the 25th percentile, and replace each data point in the highest 25% of metrics with the value of the data point just below the 75th percentile.
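Putting the winsorizing procedure into code makes the "replace with the closest neighbor" step concrete. This is a minimal sketch; exactly how the cut-off index is computed at each threshold is an assumption on my part:

```python
def winsorized_mean(values, pct=0.10):
    """Clamp the lowest/highest pct of points to their nearest surviving
    neighbors, then take a plain mean."""
    ordered = sorted(values)
    k = int(len(ordered) * pct)  # number of points to replace on each end
    low, high = ordered[k], ordered[-k - 1]
    clamped = [min(max(x, low), high) for x in ordered]
    return sum(clamped) / len(clamped)
```

For example, `winsorized_mean([1, 2, 3, 4, 1000], pct=0.25)` clamps the 1000 down to 4 before averaging, so one wild outlier no longer drags the mean off into space.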
* Note - any type of exponential (2nd degree) trending option will require three data points to graph.
(Math-y explanation - a trend is composed of two points. So in order to compare trends and graph them, you'll need at least two trends to look at (kind of like how you need at least two points to make a line). Put two trends back to back, and you'll have a minimum of three points since you can use the center point as the end of one trend, and the beginning of the next.)