Welcome!

Related Topics: Linux Containers, Java IoT, Mobile IoT, Microservices Expo, IBM Cloud, Containers Expo Blog, @CloudExpo, @ThingsExpo, @DevOpsSummit

Linux Containers: Article

Top @Docker Metrics | @DevOpsSummit #DevOps #APM #ML #Monitoring

Each container runs a single process, has its own environment, utilizes virtual networks or has many methods of managing storage

Container Monitoring: Top Docker Metrics to Watch
By Stefan Thies

Monitoring of Docker environments is challenging. Why? Because each container typically runs a single process, has its own environment, utilizes virtual networks, or has various methods of managing storage. Traditional monitoring solutions take metrics from each server and applications they run. These servers and applications running on them are typically very static, with very long uptimes. Docker deployments are different: a set of containers may run many applications, all sharing the resources of one or more underlying hosts. It's not uncommon for Docker servers to run thousands of short-term containers (e.g., for batch jobs) while a set of permanent services runs in parallel. Traditional monitoring tools not used to such dynamic environments are not suited for such deployments. On the other hand, some modern monitoring solutions (e.g. SPM from Sematext) were built with such dynamic systems in mind and even have out of the box reporting for docker monitoring. Moreover, container resource sharing calls for stricter enforcement of resource usage limits, an additional issue you must watch carefully. To make appropriate adjustments for resource quotas you need good visibility into any limits containers have reached or errors they have caused. We recommend using alerts according to defined limits; this way you can adjust limits or resource usage even before errors start happening.


Docker Containers != VMs or Servers. Forget your grandpa's old monitoring. Use monitoring designed for Docker.

SPM Docker Metrics Overview

Note: All images in this post are from Sematext's SPM Performance Monitoring tool and its Docker monitoring integration.

Watch Resources of Your Docker Hosts

Host CPU
Understanding the CPU utilization of hosts and containers helps one optimize the resource usage of Docker hosts. The container CPU usage can be throttled in order to avoid a single busy container slowing down other containers by taking away all available CPU resources. Throttling the CPU time is a good way to ensure a minimum of processing power for essential services - it's like the good old nice levels in Unix/Linux.

When the resource usage is optimized, a high CPU utilization might actually be expected and even desired, and alerts might make sense only for when CPU utilisation drops (service outages) or increases for a longer period over some max limit (e.g. 85%).


An overutilized Docker host is a sign of trouble.
An underutilized host is a sign you are wasting money.


Host Memory
The total memory used in each Docker host is important to know for the current operations and for capacity planning. Dynamic cluster managers like Docker Swarm use the total memory available on the host and the requested memory for containers to decide on which host a new container should ideally be launched. Deployments might fail if a cluster manager is unable to find a host with sufficient resources for the container. That's why it is important to know the host memory usage and the memory limits of containers. Adjusting the capacity of new cluster nodes according to the footprint of Docker applications could help optimize the resource usage.


No, Linux didn't eat your RAM.
But when buffered || cached memory goes to 0 it's time to expand the cluster


Host Disk Space
Docker images and containers consume additional disk space. For example, an application image might include a Linux operating system and might have a size of 150-700 MB depending on the size of the base image and installed tools in the container. Persistent Docker volumes consume disk space on the host as well. In our experience watching the disk space and using cleanup tools is essential for continuous operations of Docker hosts.


Good kids clean up their rooms.
Good Docker ops clean up their disks by removing unused containers & images.


DIsk Space on Docker Nodes
Disk space usage on Docker hosts

Because disk space is very critical it makes sense to define alerts for disk space utilization to serve as early warnings and provide enough time to clean up disks or add additional volumes. For example, SPM automatically sets alert rules for disk space usage for you, so you don't have to remember to do it.

A good practice is to run tasks to clean up the disk by removing unused containers and images frequently.

Total Number of Running Containers
The current and historical number of containers is an interesting metric for many reasons. For example, it is very handy during deployments and updates to check that everything is running like before.

When cluster managers like Docker Swarm, Mesos, Kubernetes, CoreOS/Fleet automatically schedule containers to run on different hosts using different scheduling policies, the number of containers running on each host can help one verify the activated scheduling policies. A stacked bar chart displaying the number of containers on each host and the total number of containers provides a quick visualization of how the cluster manager distributed the containers across the available hosts.

Docker Container Count

Container counts per Docker host over time


Use anomaly detection, not threshold-based alerts
to catch sudden container migrations that mean trouble.


This metric can have different "patterns" depending on the use case. For example, batch jobs running in containers vs. long running services commonly result in different container count patterns. A batch job typically starts a container on demand, or starts it periodically, and the container with that job terminates after a relatively short time. In such a scenario one might see a big variation in the number of containers running resulting in a "spiky" container count metric. On the other hand, long running services such as web servers or databases typically run until they get re-deployed during software updates. Although scaling mechanisms might increase or decrease the number of containers depending on load, traffic, and other factors, the container count metric will typically be relatively steady because in such cases containers are often added and removed more gradually. Because of that, there is no general pattern we could use for a default Docker alert rule on the number of running containers.

Nevertheless, alerts based on anomaly detection, which detect sudden changes in the number of the containers in total (or for specific hosts) in a short time window, can be very handy for most of the use cases. The simple threshold-based alerts make sense only when the maximum or minimum number of running containers is known, and in dynamic environments that scale up and down based on external factors, this is often not the case.

Container Metrics
Container metrics are basically the same metrics available for every Linux process, but include limits set via cgroups by Docker, such as limits for CPU or memory usage. Please note that sophisticated monitoring solutions like SPM for Docker are able to aggregate Container Metrics on different levels like Docker Hosts/Cluster Nodes, Image Name or ID and Container Name or ID. Having the ability to do that makes it easy to track resources usage by hosts, application types (image names) or specific containers. In the following examples we might use aggregations on various levels.


Use modern Docker monitoring solutions
to slice & dice by host, node, image or container.
You'll need that.


Container CPU - Throttled CPU Time
One of the most basic bits of information is information about how much CPU is being consumed by all containers, images, or by specific containers. A great advantage of using Docker is the capability to limit CPU utilisation by containers. Of course, you can't tune and optimize something if you don't measure it, so monitoring such limits is the prerequisite. Observing the total time that a container's CPU usage was throttled provides the information one needs to adjust the setting for CPU shares in Docker. Please note that CPU time is throttled only when the host CPU usage is maxed out. As long as the host has spare CPU cycles available for Docker it will not throttle containers' CPU usage. Therefore, the throttled CPU is typically zero and a spike of this metric is a typically a good indication of one or more containers needing more CPU power than the host can provide.

Docker Container CPU usage

Container CPU usage and throttled CPU time

The following screenshot shows containers with 5% CPU quota using the command "docker run -cpu-quota=5000 nginx", we see clearly how the throttled CPU grows until it reaches around 5%, enforced by the Docker engine.

CPU-throttled-time

Container CPU usage and throttled CPU time with CPU quota of 5%

Container Memory - Fail Counters
It is a good practice to set memory limits for containers. Doing that helps avoid a memory-hungry container taking all available memory and starving all other containers on the same server. Runtime constraints on resources can be defined in the Docker run command. For example, "-m 300M" sets the memory limit for the container to 300 MB. Docker exposes a metric called container memory fail counters. This counter is increased each time memory allocation fails - that is, each time the pre-set memory limit is hit. Thus, spikes in this metric indicate one or more containers needing more memory than was allocated. If the process in the container terminates because of this error, we might also see out of memory events from Docker.


Docker Memory Fail Counters tell you when containers need more memory.
Alerts are your friends.


A spike in memory fail counters is a critical event and putting alerts on the memory fail counter is very helpful to detect wrong settings for the memory limits or to discover containers that try to consume more memory than expected.

image03

Container Memory Usage
Different applications have different memory footprints. Knowing the memory footprint of the application containers is important for having a stable environment. Container memory limits ensure that applications perform well, without using too much memory, which could affect other containers on the same host. The best practice is to tune memory setting in a few iterations:

  1. Monitor memory usage of the application container
  2. Set memory limits according to the observations
  3. Continue monitoring of memory, memory fail counters, and Out-Of-Memory events. If OOM events happen, the container memory limits may need to be increased, or debugging is required to find the reason for the high memory consumptions.

Docker Container Memory Usage

Container memory usage

Container Swap
Like the memory of any other process, a container's memory could be swapped to disk. For applications like Elasticsearch or Solr one often finds instructions to deactivate swap on the Linux host - but if you run such applications on Docker it might be sufficient just to set "-memory-swap=-1" in the Docker run command!


Don't like to see your container swapping?
Use -memory-swap=-1 in the Docker run command and be done with it!


Docker Container Swap

Container swap, memory pages, and swap rate

Container Disk I/O
In Docker multiple applications use the same resources concurrently. Thus, watching the disk I/O helps one define limits for specific applications and give higher throughput to critical applications like data stores or web servers, while throttling disk I/O for batch operations. For example, the command docker run -it -device-write-bps /dev/sda:1mb mybatchjob would limit the container disk writes to a maximum of 1 MB/s.

Docker Container I/O

Container I/O throughput


To limit a Docker container from eating all your disk IO use
e.g. -device-write-bps /dev/sda:1mb


Container Network Metrics
Networking for containers can be very challenging. By default all containers share a network, or containers might be linked together to share a separated network on the same host. However, when it comes to networking between containers running on different hosts an overlay network is required, or containers could share the host network. Having many options for network configurations means there are many possible causes of network errors.

Moreover, not only errors or dropped packets are important to watch out for. Today, most of the applications are deeply dependent on network communication. Throughput of virtual networks could be a bottleneck especially for containers like load balancers. In addition, the network traffic might be a good indicator how much applications are used by clients and sometimes you might see high spikes, which could indicate denial of service attacks, load tests, or a failure in client apps. So watch the network traffic - it is a useful metric in many cases.

Docker Container Network I/o

Network traffic and transmission rates

Summary
There you have it - the top Docker metrics to watch. Staying focused on these top metrics and corresponding analysis will help you stay on the road while driving towards successful Docker deployments on many platforms such as Docker Swarm, Docker Cloud, Docker Datacenter or any other platform supporting Docker containers. If you'd like to learn even more about Docker Monitoring and Logging stay tuned on sematext.com/blog or follow @sematext.

The World's Largest "Cloud Digital Transformation" Event

@CloudExpo / @ThingsExpo 2017 New York 
(June 6-8, 2017, Javits Center, Manhattan)

@CloudExpo / @ThingsExpo 2017 Silicon Valley
(Oct. 31 - Nov. 2, 2017, Santa Clara Convention Center, CA)

Full Conference Registration Gold Pass and Exhibit Hall ▸ Here

Register For @CloudExpo ▸ Here via EventBrite

Register For @ThingsExpo ▸ Here via EventBrite

Register For @DevOpsSummit ▸ Here via EventBrite

Sponsorship Opportunities

Sponsors of Cloud Expo @ThingsExpo will benefit from unmatched branding, profile building and lead generation opportunities through:

  • Featured on-site presentation and ongoing on-demand webcast exposure to a captive audience of industry decision-makers
  • Showcase exhibition during our new extended dedicated expo hours
  • Breakout Session Priority scheduling for Sponsors that have been guaranteed a 35 minute technical session
  • Online targeted advertising in SYS-CON's i-Technology Publications
  • Capitalize on our Comprehensive Marketing efforts leading up to the show with print mailings, e-newsletters and extensive online media coverage
  • Unprecedented Marketing Coverage: Editorial Coverage on ITweetup to over 100,000 plus followers, press releases sent on major wire services to over 500 industry analysts

For more information on sponsorship, exhibit, and keynote opportunities, contact Carmen Gonzalez (@GonzalezCarmen) today by email at events (at) sys-con.com, or by phone 201 802-3021.

Secrets of Sponsors and Exhibitors ▸ Here
Secrets of Cloud Expo Speakers ▸ Here

All major researchers estimate there will be tens of billions devices - computers, smartphones, tablets, and sensors - connected to the Internet by 2020. This number will continue to grow at a rapid pace for the next several decades.

With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend @CloudExpo@ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA. Learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.

Track 1. FinTech
Track 2. Enterprise Cloud | Digital Transformation
Track 3. DevOps, Containers & Microservices 
Track 4. Big Data | Analytics
Track 5. Industrial IoT
Track 6. IoT Dev & Deploy | Mobility
Track 7. APIs | Cloud Security
Track 8. AI | ML | DL | Cognitive Computing

Delegates to Cloud Expo @ThingsExpo will be able to attend 8 simultaneous, information-packed education tracks.

There are over 120 breakout sessions in all, with Keynotes, General Sessions, and Power Panels adding to three days of incredibly rich presentations and content.

Join Cloud Expo @ThingsExpo conference chair Roger Strukhoff (@IoT2040), June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA for three days of intense Enterprise Cloud and 'Digital Transformation' discussion and focus, including Big Data's indispensable role in IoT, Smart Grids and (IIoT) Industrial Internet of Things, Wearables and Consumer IoT, as well as (new) Digital Transformation in Vertical Markets.

Financial Technology - or FinTech - Is Now Part of the @CloudExpo Program!

Accordingly, attendees at the upcoming 20th Cloud Expo @ThingsExpo June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA will find fresh new content in a new track called FinTech, which will incorporate machine learning, artificial intelligence, deep learning, and blockchain into one track.

Financial enterprises in New York City, London, Singapore, and other world financial capitals are embracing a new generation of smart, automated FinTech that eliminates many cumbersome, slow, and expensive intermediate processes from their businesses.

FinTech brings efficiency as well as the ability to deliver new services and a much improved customer experience throughout the global financial services industry. FinTech is a natural fit with cloud computing, as new services are quickly developed, deployed, and scaled on public, private, and hybrid clouds.

More than US$20 billion in venture capital is being invested in FinTech this year. @CloudExpo is pleased to bring you the latest FinTech developments as an integral part of our program, starting at the 20th International Cloud Expo June 6-8, 2017 in New York City and October 31 - November 2, 2017 in Silicon Valley.

@CloudExpo is accepting submissions for this new track, so please visit www.CloudComputingExpo.com for the latest information.

Speaking Opportunities

The upcoming 20th International @CloudExpo@ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA announces that its Call For Papers for speaking opportunities is open.

Submit your speaking proposal today! ▸ Here

Our Top 100 Sponsors and the Leading "Digital Transformation" Companies

(ISC)2, 24Notion (Bronze Sponsor), 910Telecom, Accelertite (Gold Sponsor), Addteq, Adobe (Bronze Sponsor), Aeroybyte, Alert Logic, Anexia, AppNeta, Avere Systems, BMC Software (Silver Sponsor), Bsquare Corporation (Silver Sponsor), BZ Media (Media Sponsor), Catchpoint Systems (Silver Sponsor), CDS Global Cloud, Cemware, Chetu Inc., China Unicom, Cloud Raxak, CloudBerry (Media Sponsor), Cloudbric, Coalfire Systems, CollabNet, Inc. (Silver Sponsor), Column Technologies, Commvault (Bronze Sponsor), Connect2.me, ContentMX (Bronze Sponsor), CrowdReviews (Media Sponsor) CyberTrend (Media Sponsor), DataCenterDynamics (Media Sponsor), Delaplex, DICE (Bronze Sponsor), EastBanc Technologies, eCube Systems, Embotics, Enzu Inc., Ericsson (Gold Sponsor), FalconStor, Formation Data Systems, Fusion, Hanu Software, HGST, Inc. (Bronze Sponsor), Hitrons Solutions, IBM BlueBox, IBM Bluemix, IBM Cloud (Platinum Sponsor), IBM Cloud Data Services/Cloudant (Platinum Sponsor), IBM DevOps (Platinum Sponsor), iDevices, Industrial Internet of Things Consortium (Association Sponsor), Impinger Technologies, Interface Masters, Intel (Keynote Sponsor), Interoute (Bronze Sponsor), IQP Corporation, Isomorphic Software, Japan IoT Consortium, Kintone Corporation (Bronze Sponsor), LeaseWeb USA, LinearHub, MangoApps, MathFreeOn, Men & Mice, MobiDev, New Relic, Inc. (Bronze Sponsor), New York Times, Niagara Networks, Numerex, NVIDIA Corporation (AI Session Sponsor), Object Management Group (Association Sponsor), On The Avenue Marketing, Oracle MySQL, Peak10, Inc., Penta Security, Plasma Corporation, Pulzze Systems, Pythian (Bronze Sponsor), Cosmos, RackN, ReadyTalk (Silver Sponsor), Roma Software, Roundee.io, Secure Channels Inc., SD Times (Media Sponsor), SoftLayer (Platinum Sponsor), SoftNet Solutions, Solinea Inc., SpeedyCloud, SSLGURU LLC, StarNet, Stratoscale, Streamliner, SuperAdmins, TechTarget (Media Sponsor), TelecomReseller (Media Sponsor), Tintri (Welcome Reception Sponsor), TMCnet (Media Sponsor), Transparent Cloud Computing Consortium, Veeam, Venafi, Violin Memory, VAI Software, Zerto

About SYS-CON Media & Events
SYS-CON Media (www.sys-con.com) has since 1994 been connecting technology companies and customers through a comprehensive content stream - featuring over forty focused subject areas, from Cloud Computing to Web Security - interwoven with market-leading full-scale conferences produced by SYS-CON Events. The company's internationally recognized brands include among others Cloud Expo® (@CloudExpo), Big Data Expo® (@BigDataExpo), DevOps Summit (@DevOpsSummit), @ThingsExpo® (@ThingsExpo), Containers Expo (@ContainersExpo) and Microservices Expo (@MicroservicesE).

Cloud Expo®, Big Data Expo® and @ThingsExpo® are registered trademarks of Cloud Expo, Inc., a SYS-CON Events company.

More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Latest Stories
The cloud promises new levels of agility and cost-savings for Big Data, data warehousing and analytics. But it’s challenging to understand all the options – from IaaS and PaaS to newer services like HaaS (Hadoop as a Service) and BDaaS (Big Data as a Service). In her session at @BigDataExpo at @ThingsExpo, Hannah Smalltree, a director at Cazena, provided an educational overview of emerging “as-a-service” options for Big Data in the cloud. This is critical background for IT and data professionals...
@GonzalezCarmen has been ranked the Number One Influencer and @ThingsExpo has been named the Number One Brand in the “M2M 2016: Top 100 Influencers and Brands” by Onalytica. Onalytica analyzed tweets over the last 6 months mentioning the keywords M2M OR “Machine to Machine.” They then identified the top 100 most influential brands and individuals leading the discussion on Twitter.
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
What happens when the different parts of a vehicle become smarter than the vehicle itself? As we move toward the era of smart everything, hundreds of entities in a vehicle that communicate with each other, the vehicle and external systems create a need for identity orchestration so that all entities work as a conglomerate. Much like an orchestra without a conductor, without the ability to secure, control, and connect the link between a vehicle’s head unit, devices, and systems and to manage the ...
More and more brands have jumped on the IoT bandwagon. We have an excess of wearables – activity trackers, smartwatches, smart glasses and sneakers, and more that track seemingly endless datapoints. However, most consumers have no idea what “IoT” means. Creating more wearables that track data shouldn't be the aim of brands; delivering meaningful, tangible relevance to their users should be. We're in a period in which the IoT pendulum is still swinging. Initially, it swung toward "smart for smar...
"We are an all-flash array storage provider but our focus has been on VM-aware storage specifically for virtualized applications," stated Dhiraj Sehgal of Tintri in this SYS-CON.tv interview at 19th Cloud Expo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
In an era of historic innovation fueled by unprecedented access to data and technology, the low cost and risk of entering new markets has leveled the playing field for business. Today, any ambitious innovator can easily introduce a new application or product that can reinvent business models and transform the client experience. In their Day 2 Keynote at 19th Cloud Expo, Mercer Rowe, IBM Vice President of Strategic Alliances, and Raejeanne Skillern, Intel Vice President of Data Center Group and G...
"We are a modern development application platform and we have a suite of products that allow you to application release automation, we do version control, and we do application life cycle management," explained Flint Brenton, CEO of CollabNet, in this SYS-CON.tv interview at DevOps at 19th Cloud Expo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
Information technology is an industry that has always experienced change, and the dramatic change sweeping across the industry today could not be truthfully described as the first time we've seen such widespread change impacting customer investments. However, the rate of the change, and the potential outcomes from today's digital transformation has the distinct potential to separate the industry into two camps: Organizations that see the change coming, embrace it, and successful leverage it; and...
In IT, we sometimes coin terms for things before we know exactly what they are and how they’ll be used. The resulting terms may capture a common set of aspirations and goals – as “cloud” did broadly for on-demand, self-service, and flexible computing. But such a term can also lump together diverse and even competing practices, technologies, and priorities to the point where important distinctions are glossed over and lost.
All clouds are not equal. To succeed in a DevOps context, organizations should plan to develop/deploy apps across a choice of on-premise and public clouds simultaneously depending on the business needs. This is where the concept of the Lean Cloud comes in - resting on the idea that you often need to relocate your app modules over their life cycles for both innovation and operational efficiency in the cloud. In his session at @DevOpsSummit at19th Cloud Expo, Valentin (Val) Bercovici, CTO of Soli...
Without a clear strategy for cost control and an architecture designed with cloud services in mind, costs and operational performance can quickly get out of control. To avoid multiple architectural redesigns requires extensive thought and planning. Boundary (now part of BMC) launched a new public-facing multi-tenant high resolution monitoring service on Amazon AWS two years ago, facing challenges and learning best practices in the early days of the new service. In his session at 19th Cloud Exp...
Predictive analytics tools monitor, report, and troubleshoot in order to make proactive decisions about the health, performance, and utilization of storage. Most enterprises combine cloud and on-premise storage, resulting in blended environments of physical, virtual, cloud, and other platforms, which justifies more sophisticated storage analytics. In his session at 18th Cloud Expo, Peter McCallum, Vice President of Datacenter Solutions at FalconStor, discussed using predictive analytics to mon...
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place June 6-8, 2017, at the Javits Center in New York City, New York, is co-located with 20th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry p...
Keeping pace with advancements in software delivery processes and tooling is taxing even for the most proficient organizations. Point tools, platforms, open source and the increasing adoption of private and public cloud services requires strong engineering rigor – all in the face of developer demands to use the tools of choice. As Agile has settled in as a mainstream practice, now DevOps has emerged as the next wave to improve software delivery speed and output. To make DevOps work, organization...