A Bird’s Eye View Via Boxplot

The impact of website/app performance on the bottom line of an Internet firm is undisputed (refer to our earlier blog for further discussion on the subject). Over the years, the industry has stopped treating performance as an afterthought and made it a top priority. That said, performance analysis is easier said than done. For instance, let’s carry out a comparative performance analysis, measured via, say, webpage response time, of some of the leading airlines. The plot below shows a week-long snapshot where the aforementioned metric was sampled every 5 minutes (the data was extracted via the Catchpoint portal).

[Figure: webpage response time of the airlines over one week, sampled every 5 minutes]

With increasing maturity of tooling, data collection has become a commodity today. However, any meaningful analysis, even visual analysis, of the plot above is not practical. One may wonder what would happen if one were to relax the sampling rate to contain the “too much data” problem.

[Figure: the same week of webpage response times, sampled every 15 minutes]

The plot above corresponds to the same time period, but with a sampling period of 15 minutes. The overlap between the time series is still too heavy, which makes it very hard to derive any material insights. How about relaxing the sampling further?

[Figure: the same week of webpage response times, sampled every 30 minutes]

The plot above corresponds to the same time period, but with a sampling period of 30 minutes. From this plot we note that, on average, Alaska Airlines has the best performance and Virgin has the worst. Having said that, it is difficult to assess from the plot how often each airline experiences a performance hiccup. Concretely, diving deeper to figure out how often one’s website experiences a webpage response time of, say, >3 seconds might lead to a useful discovery regarding user churn. To this end, a common method is to analyze the probability density distribution of the metric of interest, as exemplified by the plot below (note that the plot below corresponds to the data set sampled every 5 minutes).

[Figure: probability density distributions of webpage response time per airline, data sampled every 5 minutes]

A lot of valuable insight can be extracted from the plot above based on the following:

  • Relative location of the peaks of each distribution
  • The spread (an indicator of variance) of the distribution
  • The fatness of the tails: this sheds light on the extent to which the user base is being impacted by, in the current context, outlier webpage response times
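The question raised earlier, how often the webpage response time exceeds, say, 3 seconds, can also be answered directly from the samples. The following is a minimal sketch; the function name `fraction_exceeding` and the sample values are hypothetical and are not the airline data shown in the plots.

```python
def fraction_exceeding(samples, threshold):
    """Return the fraction of samples strictly greater than threshold."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(1 for s in samples if s > threshold) / len(samples)


# Hypothetical response times in seconds (illustrative only).
response_times = [1.2, 0.9, 3.4, 1.1, 2.8, 4.0, 1.0, 1.3, 0.8, 3.1]

share_slow = fraction_exceeding(response_times, 3.0)
print(f"{share_slow:.0%} of samples exceeded 3 seconds")
```

A fatter right tail in the density plot corresponds to a larger value of this fraction for a given threshold.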

Still, the probability density distribution is not conducive to comparing key statistics such as, but not limited to, the median, the first and third quartiles, and the density of outliers. The boxplot, proposed over four decades ago (see [1] and [7]), is tailor-made for this. An example illustration of a boxplot is shown below.

[Figure: example illustration of a boxplot]

A boxplot is made up of five components that are carefully chosen to give a robust summary of the distribution of a dataset:

  1. The median
  2. The upper and lower quartiles (fourths), commonly referred to as “hinges”
  3. The data values adjacent to the upper and lower fences, which lie 1.5 times the inter-quartile range (IQR) from the hinges
  4. Two whiskers that connect the hinges to these adjacent values
  5. Anomalies (outliers), which are data points that lie beyond the fences
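The five components listed above can be computed in a few lines. The sketch below uses only the Python standard library. It assumes that the hinges can be approximated by the quartiles returned by `statistics.quantiles` with the inclusive method (Tukey's original hinges can differ slightly for small samples). The function name `boxplot_summary` and the sample values are hypothetical.

```python
from statistics import median, quantiles


def boxplot_summary(values, k=1.5):
    """Compute the components of a boxplot for a list of at least two numbers."""
    xs = sorted(values)
    # Assumption: quartiles stand in for Tukey's hinges.
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Adjacent values: the most extreme data points still inside the fences.
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "median": median(xs),
        "hinges": (q1, q3),
        "fences": (lower_fence, upper_fence),
        "adjacent": (inside[0], inside[-1]),
        "outliers": [x for x in xs if x < lower_fence or x > upper_fence],
    }


# Hypothetical response times in seconds (illustrative only).
summary = boxplot_summary([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 6.0])
for name, value in summary.items():
    print(name, value)
```

In this sample, the single slow response (6.0 seconds) falls beyond the upper fence and is flagged, while the whiskers stop at the adjacent values 1.0 and 1.7.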

The boxplot for the data set sampled every 5 minutes is shown below:

[Figure: boxplots of webpage response time per airline, data sampled every 5 minutes]

From the plot above, it is straightforward to compare the various descriptive statistics of webpage response time across different airlines. For instance, although both Southwest and United have a lower median than Delta, the latter has a lower spread (= IQR = height of the box) than the former two. In a similar vein, we note that not only does Virgin have the highest median webpage response time, it also has the highest IQR. This clearly does not speak well of the experience of Virgin’s (potential) customers.

One of the common use cases of boxplots is to detect anomalies. Although robust anomaly detection is subject to a multitude of factors, boxplots serve as a first-cut means to filter out potential anomalies. In the case of a standard normal distribution, 0.35% of the data points along each tail are deemed anomalous (see below).

[Figure: boxplot fences overlaid on a standard normal distribution]
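The 0.35% figure can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to place the upper fence at 1.5 times the IQR above the third quartile of a standard normal distribution, and then measures the probability mass beyond it. The variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Quartiles of the standard normal distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# Upper fence as defined for the boxplot: 1.5 * IQR beyond the upper quartile.
upper_fence = q3 + 1.5 * iqr

# Probability mass beyond the upper fence; the lower tail is the same by symmetry.
tail = 1.0 - std_normal.cdf(upper_fence)
print(f"upper fence = {upper_fence:.3f}, tail = {tail:.2%}")
```

The upper fence lands at roughly 2.7 standard deviations, leaving about 0.35% of the mass in each tail, or about 0.7% in total.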

The limitations of boxplot are that it is primarily suited to:

  • Almost symmetric data
  • Approximately mesokurtic distribution, i.e., distributions with zero excess kurtosis

The above two assumptions do not hold in general for real world data. This is exemplified by the plot of the probability density distribution above.
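One quick way to check whether a data set meets these two assumptions is to compute its sample skewness and excess kurtosis. The sketch below uses the simple moment-based (population) estimators; other estimators with small-sample corrections exist. The function name and the sample values are hypothetical.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = fmean(values)
    deviations = [x - mean for x in values]
    m2 = fmean([d ** 2 for d in deviations])  # second central moment
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = fmean([d ** 3 for d in deviations])  # third central moment
    m4 = fmean([d ** 4 for d in deviations])  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical right-skewed response times in seconds (illustrative only).
skew, kurt = skewness_and_excess_kurtosis([1.0, 1.0, 1.0, 2.0, 10.0])
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

Values far from zero for either statistic suggest that the standard 1.5 times IQR fences will flag too many or too few points as anomalous.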

One way to address the former, i.e., asymmetry, is to use medcouple – a robust metric to measure skewness of a univariate distribution. Using medcouple (MC), the whiskers of the boxplot are redefined as follows:

[Figure: whisker definitions redefined using the medcouple (MC)]
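The sketch below illustrates the idea, assuming the fence definitions of the adjusted boxplot in [6]: for MC ≥ 0 the fences are Q1 − 1.5·e^(−4·MC)·IQR and Q3 + 1.5·e^(3·MC)·IQR, and for MC < 0 they are Q1 − 1.5·e^(−3·MC)·IQR and Q3 + 1.5·e^(4·MC)·IQR. The medcouple here is a naive O(n²) version that skips pairs tied at the median, which is a simplification of the full definition. The function names and sample values are hypothetical.

```python
from math import exp
from statistics import median, quantiles


def medcouple(values):
    """Naive O(n^2) medcouple; pairs tied at the median are skipped (simplification)."""
    xs = sorted(values)
    m = median(xs)
    lower = [x for x in xs if x <= m]
    upper = [x for x in xs if x >= m]
    kernel = [
        ((xj - m) - (m - xi)) / (xj - xi)
        for xi in lower
        for xj in upper
        if xj != xi
    ]
    if not kernel:
        raise ValueError("all values are identical")
    return median(kernel)


def adjusted_fences(values, k=1.5):
    """Skewness-adjusted fences, assuming the definition in [6]."""
    q1, _, q3 = quantiles(sorted(values), n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(values)
    if mc >= 0:
        return q1 - k * exp(-4 * mc) * iqr, q3 + k * exp(3 * mc) * iqr
    return q1 - k * exp(-3 * mc) * iqr, q3 + k * exp(4 * mc) * iqr


# Hypothetical right-skewed sample (illustrative only).
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
print("MC =", medcouple(sample))
print("adjusted fences =", adjusted_fences(sample))
```

For symmetric data the medcouple is zero and the fences reduce to the standard ones. For right-skewed data the upper fence moves outward, so fewer points in the long tail are flagged as anomalies.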

A number of techniques have been proposed, see [6, 8, 9], to adapt the boxplot to different characteristics of the underlying distribution. Likewise, several variations of boxplots have been proposed, see [2]; for instance, varying the width of the box based on the sample size. In a similar vein, the addition of other graphical elements to display distributional features like kurtosis [3], skewness and multimodality [4], and mean and standard error [5] has been proposed. A user of Catchpoint can plug in a boxplot plotting library of their choice in a straightforward fashion (refer to our earlier blog for this).

By: Arun Kejariwal, Ryan Pellette, and Mehdi Daoudi

 

Readings

[1] “Exploratory Data Analysis”, by J. W. Tukey, Addison–Wesley, 1977.

[2] “Variations of Box Plots”, by R. McGill, J. W. Tukey and W. A. Larsen, 1978.

[3] “Shape-finder box plots”, by M. Aslam and A. Khurshid, 1991.

[4] “Can the box plot be improved?”, by C. Choonpradub and D. McNeil, 2005.

[5] “The shifting boxplot”, by F. Marmolejo-Ramos and T. Tian, 2010.

[6] “An adjusted boxplot for skewed distributions”, by M. Hubert and E. Vandervieren, 2008.

[7] “40 years of Boxplots”, by H. Wickham and L. Stryjewski, 2011. http://vita.had.co.nz/papers/boxplots.pdf

[8] “A generalized boxplot for skewed and heavy-tailed distributions”, by C. Bruffaerts, V. Verardi and C. Vermandele, 2014.

[9] “A Generalized Boxplot for Skewed and Heavy-tailed Distributions implemented in Stata”, by V. Verardi. http://www.stata.com/meeting/uk14/abstracts/materials/uk14_verardi.pdf

The post A Bird’s Eye View Via Boxplot appeared first on Catchpoint's Blog.


