Using the Right Mean for Meaningful Performance Analysis

Performance analytics is a field that deals with huge discrete data sets that need to be grouped, organized, and aggregated before they can be understood. Synthetic and real user monitoring are the two most popular techniques for evaluating website performance; both use historical data sets to do so.

In web performance analytics, the preferred approach is to summarize the discrete data set under observation with a statistic that describes its central tendency (a measure of central location). This statistic can then be used to evaluate and analyze the data. These data sets contain a very large number of data points, which need to be aggregated using different statistical approaches.

With so many statistical metrics available, the big question is how to determine the right one for a given data set. Mean, median, and geometric mean are all valid measures of central tendency, but under different conditions some are more appropriate than others.

This article discusses different statistical approaches used in the world of web performance evaluation and the methods preferred in different contexts of performance analysis using real-world performance data.

Common Statistical Metrics

  • Arithmetic Mean (Average)

The average is used to describe a single central value in a large set of discrete data. It is equal to the sum of all data points divided by the number of items:

$$a = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where ‘n’ represents the number of data samples and ‘x’ the value of each sample.

  • Median

Median is the middle value of a data set that has been arranged in order of magnitude. Let us consider the set of data points [12, 31, 44, 47, 22, 18, 60, 75, 80]. To get the median, the data points must first be sorted in ascending order.

12, 18, 22, 31, 44, 47, 60, 75, 80

The median for the above data set is 44. With an odd number of items n, the median is the item at position (n+1)/2, here the 5th of 9 items. With an even number of items, the median is the average of the items at positions n/2 and n/2 + 1.

  • Geometric Mean

Geometric mean is the positive nth root of the product of n positive values. For a data set X containing n data points, the formula is

$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

  • Standard Deviation

Standard deviation measures the extent to which the data samples vary around the center. The formula to calculate the standard deviation for a set of data samples is

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - a)^2}$$

where ‘a’ denotes the average of ‘n’ data samples of value ‘x’.
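As a rough illustration, the four metrics described above can be written out in a few lines of Python. The function names are illustrative and not taken from any particular monitoring tool. The standard deviation divides by n (the population form), which is an assumption; some tools divide by n - 1 instead.

```python
import math

def arithmetic_mean(xs):
    # Sum of all data points divided by the number of items.
    return sum(xs) / len(xs)

def median(xs):
    # Middle item of the sorted data; for an even count,
    # the average of the two middle items.
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def geometric_mean(xs):
    # nth root of the product of n positive values, computed via
    # logarithms so that large products do not overflow.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def std_dev(xs):
    # Population standard deviation (divides by n): an assumption.
    a = arithmetic_mean(xs)
    return math.sqrt(sum((x - a) ** 2 for x in xs) / len(xs))

# Data set used in the median example.
data = [12, 31, 44, 47, 22, 18, 60, 75, 80]
print("average:", round(arithmetic_mean(data), 2))
print("median:", median(data))
print("geometric mean:", round(geometric_mean(data), 2))
print("standard deviation:", round(std_dev(data), 2))
```

In practice, Python's built-in statistics module provides equivalent functions; the hand-written versions are shown only to mirror the formulas.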

Determining the Right Statistical Approach

The two graphs below illustrate the different data distributions we come across in web performance monitoring. Using the formulae explained above, we have derived the average, median and the geometric mean of the webpage load time for website A and B.

[Figure: Webpage load time, Website A]

[Figure: Webpage load time, Website B]

Let us discuss a few use cases to understand how different statistical metrics are applicable in different scenarios.

USE CASE 1

G1 – Scatter plot showing the webpage load time data set

G2 – Histogram showing the distribution of the data

Graphs G1 and G2 plot webpage load time data. The uneven distribution of the data points in the scatter plot and histogram shows how inconsistent the load time is.

The histogram (G2) has a long trailing end: a noticeable number of data points lie well above the bulk of the values, so the distribution is skewed towards higher load times.

What would be a good statistical metric in such cases? Before answering this, let us take an example. Consider the following data set:

Data Set = [4,4.3,5,6.5,6.8,7,7.2,20,30]

If we use the median, it gives a value of 6.8. But the data set stretches towards a higher range, with 30 being the highest value, and the median ignores how far those high values lie from the rest. So, taking the median in cases with large outliers is not an accurate estimate of the page load time. Median should be used for data sets with few outliers and values that are concentrated towards the center of the distribution.

Now let us take the average for this same data set. This gives us a value of about 10.1, which is higher than seven of the nine data points because it is pulled towards the outlier values. Once again, the average is not an accurate measure of web page load time here.

Since neither the median nor the average describes this data set well, let us consider the geometric mean. We get a value of about 7.9 using the geometric mean; this value stays close to the bulk of the data and is less skewed by the extreme values than the average.

In this use case, we have determined the geometric mean as the most accurate statistical method to analyze the data.
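The three measures for this data set can be checked with Python's standard statistics module. This is only a quick verification sketch; the variable names are illustrative, and statistics.geometric_mean requires Python 3.8 or later.

```python
import statistics

# Data set from Use Case 1 (page load times).
load_times = [4, 4.3, 5, 6.5, 6.8, 7, 7.2, 20, 30]

median = statistics.median(load_times)
average = statistics.mean(load_times)
geo_mean = statistics.geometric_mean(load_times)  # Python 3.8+

print(f"median:         {median:.1f}")
print(f"average:        {average:.1f}")
print(f"geometric mean: {geo_mean:.1f}")
```

The average lands well above the median because the two large values (20 and 30) pull it upward, while the geometric mean sits between the two.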

USE CASE 2

G3 – Scatter plot showing the webpage load time data set

G4 – Histogram showing the distribution of the data

In the graphs above (G3 and G4), most of the data points are close to each other, with a higher population in the center of the distribution. The differences between the data points are much smaller than in the previous scenario. This indicates a consistent page load time across different test runs.

Using the average or median to evaluate the central tendency would be more accurate in this case: there are not many outliers, so the average is not skewed towards outlier values.
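To illustrate the point, the sketch below uses a small set of hypothetical load times (made up for this example, not taken from the graphs) that are tightly clustered. With data like this, the average, median, and geometric mean nearly coincide, so any of them is a reasonable summary.

```python
import statistics

# Hypothetical load times in seconds for a consistent page;
# these values are invented for illustration only.
steady = [18.0, 18.2, 17.9, 18.1, 18.3, 18.0, 17.8, 18.2, 18.1]

average = statistics.mean(steady)
median = statistics.median(steady)
geo_mean = statistics.geometric_mean(steady)  # Python 3.8+

# Largest gap between any two of the three measures.
spread = max(average, median, geo_mean) - min(average, median, geo_mean)

print(f"average={average:.2f} median={median:.2f} geometric mean={geo_mean:.2f}")
print(f"largest gap between measures: {spread:.3f} s")
```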

USE CASE 3

[Figure: Website A]

[Figure: Website B]

 

The above data distributions show the webpage load time for two different websites. In performance analysis, we need to evaluate the consistency of a webpage. If page performance is highly volatile, we should be able to measure how far the outliers lie from the central value.

In this case, the standard deviations are 9.1 and 1.7 seconds for websites A and B respectively, while the medians are 26.6 and 18.1 seconds. Adding one standard deviation to the median gives roughly 36 seconds for website A (26.6 + 9.1) and 20 seconds for website B (18.1 + 1.7). This suggests that website A had a high number of data points at 36 seconds or more, and website B had a high number of data points at 20 seconds or more.

To find out what percentage of the data lies above the median plus one standard deviation, we can use the cumulative distribution graph.

[Figure: cumulative distribution, Website A]

[Figure: cumulative distribution, Website B]

From the cumulative distribution graphs shown above, we can see that website A had almost 20% of its data points above the median plus one standard deviation, whereas website B had 10% of its data above that value.

Standard deviation can be used in performance analysis to evaluate how far the data points lie from the central value of the distribution and how consistent they are.
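The same reading can be taken directly from raw samples instead of a cumulative distribution graph. The sketch below is one possible way to do it, assuming the population standard deviation and a hypothetical helper name (share_above_threshold); the sample values are invented for illustration and are not the data behind the graphs.

```python
import statistics

def share_above_threshold(samples):
    """Return (threshold, share), where threshold is the median plus one
    standard deviation and share is the fraction of samples above it."""
    # pstdev is the population standard deviation (divides by n): an assumption.
    threshold = statistics.median(samples) + statistics.pstdev(samples)
    above = sum(1 for s in samples if s > threshold)
    return threshold, above / len(samples)

# Hypothetical load times in seconds, with one slow outlier.
samples = [10, 12, 11, 13, 12, 11, 30, 12, 11, 12]

threshold, share = share_above_threshold(samples)
print(f"median + SD = {threshold:.1f} s")
print(f"share of samples above it: {share:.0%}")
```

A larger share above the threshold points to a less consistent page, which is the comparison made between website A and website B.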

 

Median and average are applicable when the data points are concentrated towards the center of the distribution. On the other hand, if more data points are spread out towards the tail of the distribution and the differences between data points are large, then the geometric mean would be a better choice. Standard deviation should be used to understand how far the data points vary from the central value and to gauge the consistency of the site's performance.

 

The post Using the Right Mean for Meaningful Performance Analysis appeared first on Catchpoint's Blog - Web Performance Monitoring.


More Stories By Mehdi Daoudi

Catchpoint radically transforms the way businesses manage, monitor, and test the performance of online applications. Truly understand and improve user experience with clear visibility into complex, distributed online systems.

Founded in 2008 by four DoubleClick / Google executives with a passion for speed, reliability and overall better online experiences, Catchpoint has now become the most innovative provider of web performance testing and monitoring solutions. We are a team with expertise in designing, building, operating, scaling and monitoring highly transactional Internet services used by thousands of companies and impacting the experience of millions of users. Catchpoint is funded by top-tier venture capital firm, Battery Ventures, which has invested in category leaders such as Akamai, Omniture (Adobe Systems), Optimizely, Tealium, BazaarVoice, Marketo and many more.
