Welcome!

Blog Feed Post

Node.Js server monitoring, part 2

node-js-monitorLast time we mentioned two fundamental principles while monitoring any object:

1. The monitor should collect as much important information as possible that will allow to accurately evaluate the health state of an object.
2. The monitor should have little to no effect on the activity of the object.

Sure, these two principles work against each other in most of cases, but with Node.js they work together quite nicely because Node.js is based on event-driven technology and doesn’t use the traditional threads-driven approach. This technology allows to register many listeners for one event and process them in parallel almost independently. To avoid even a small effect on the production server, it was decided to separate the monitor into two parts – the first is the javascript module-plugin that listens to all server events and accumulates necessary information and the second is the Linux shell script that periodically runs the monitor-plugin by using the REST technique for collecting, processing, and sending information to the Monitis main server.

Normally, it is necessary to add couple lines in your existing Node.js server code to activate the monitor-plugin:

var monitor = require('monitor');// insert monitor module-plugin
....

var server = … // the definition of current Node.js server

….

monitor.Monitor(server); //add server to monitor

Now the monitor will begin collecting and measuring data. The monitor-plugin has an embedded simple HTTP server that sends accumulated data by request and should correspond (in current implementation) to the following pattern

http://127.0.0.1:10010/node_monitor?action=getdata&access_code=

where
10010 – the listen port of monitor-plugin (configurable)
‘node_monitor’ – the pathname keyword
‘action=getdata’ – command for getting collected data
‘access_code’ – the specially generated access code that changes for every session

Please notice that monitis-plugin server (in current implementation) listens the localhost only. This and usage of the security access code for every session gives enough security while monitoring. More detailed information can be found along with implemented code in the github repository.

Server monitoring metrics

There are a largely standard set of metrics which can be used to monitor the underlying health of any server.

* CPU Usage describes the level of utilisation of the system CPU(s) and is usually broken down into three states.
o IO Wait – indicates the proportion of CPU cycles spent waiting for IO (disk or network) events. If you experience large IO Wait proportions, it can indicate that your disks are causing a performance bottleneck.
o System – indicates the proportion of CPU cycles spent performing kernel-level processing. Generally you will find only a small proportion of your CPU cycles are spent on system tasks, Hence if you see spikes it could indicate a problem.
o User – indicates the proportion of CPU cycles spent performing user instigated processing. This is where you should see the bulk of your CPU cycles consumed; it includes activities such as web serving, application execution, and every other process not owned by the kernel.
o Idle – indicates the spare CPU capacity you have – all the cycles where the CPU is, quite literally, doing nothing.
* Load Average is a metric that indicates the level of load that a server is under at a given point in time. Usually evaluated as number of requests per second.
* RAM usage by server is broken down usually into the following parts.
o Free – the amount of unallocated RAM available. Linux systems tend to keep this as low as possible; and do not free up the system’s physical RAM until it is requested by another process.
o Inactive – RAM that is in-use for buffers and page caching, but hasn’t been used recently so will be reclaimed first for use by a running process.
o Active – RAM that has been used recently and will not be reclaimed unless we have insufficient Inactive RAM to claim from. In Linux systems this is generally the one to keep an eye on. Sudden, rapid increases signal a memory hungry process that will soon cause VM swapping to occur.
* The server uptime is a metric showing the elapsed time since the last reboot. Non-linear behavior of the server uptime line indicates that the server was rebooted somehow.
* Throughput – the amount of data traffic passing through the servers’ network interface is fundamentally important. It is usually broken down into inbound and outbound throughput and normally measured as average values for some period by kbit or kbyte per sec.
* Server Response time is defined as the duration from receiving a request to sending a response. Normally, it should not exceed reasonable timeout (usually this depends on the complexity of processing) but should be as little as possible. Usually, the average and peak response time is evaluated for some time period.
* The count of successfully processed requests is evaluated as the percentage of responses with 2xx status codes (success) to requests during observing time. This value should be as close as possible to 100%.

We have used part of these metrics and added some specific statistics that are important for our task (e.g. client platform, detailed info for response codes, etc.)

Test results

The results below were obtained on a Node.js server equipped with the monitor described above on a Debian6-x64. The server listens on HTTP (81) and HTTPS (443) ports and does not have a large load.


By double-clicking on a line you can view a specific part of the monitoring data.

The data can be shown in graphical view too.



In conclusion, the monitoring system has successfully tracked the metrics and found the Node.Js server to be in a good health state.

Share Now:del.icio.usDiggFacebookLinkedInBlinkListDZoneGoogle BookmarksRedditStumbleUponTwitterRSS

Read the original blog entry...

More Stories By Hovhannes Avoyan

Hovhannes Avoyan is the CEO of PicsArt, Inc.,

Latest Stories
SYS-CON Events announced today that CAST Software will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CAST was founded more than 25 years ago to make the invisible visible. Built around the idea that even the best analytics on the market still leave blind spots for technical teams looking to deliver better software and prevent outages, CAST provides the software intelligence that matter ...
The next XaaS is CICDaaS. Why? Because CICD saves developers a huge amount of time. CD is an especially great option for projects that require multiple and frequent contributions to be integrated. But… securing CICD best practices is an emerging, essential, yet little understood practice for DevOps teams and their Cloud Service Providers. The only way to get CICD to work in a highly secure environment takes collaboration, patience and persistence. Building CICD in the cloud requires rigorous ar...
SYS-CON Events announced today that Daiya Industry will exhibit at the Japanese Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Ruby Development Inc. builds new services in short period of time and provides a continuous support of those services based on Ruby on Rails. For more information, please visit https://github.com/RubyDevInc.
When it comes to cloud computing, the ability to turn massive amounts of compute cores on and off on demand sounds attractive to IT staff, who need to manage peaks and valleys in user activity. With cloud bursting, the majority of the data can stay on premises while tapping into compute from public cloud providers, reducing risk and minimizing need to move large files. In his session at 18th Cloud Expo, Scott Jeschonek, Director of Product Management at Avere Systems, discussed the IT and busine...
As businesses evolve, they need technology that is simple to help them succeed today and flexible enough to help them build for tomorrow. Chrome is fit for the workplace of the future — providing a secure, consistent user experience across a range of devices that can be used anywhere. In her session at 21st Cloud Expo, Vidya Nagarajan, a Senior Product Manager at Google, will take a look at various options as to how ChromeOS can be leveraged to interact with people on the devices, and formats th...
First generation hyperconverged solutions have taken the data center by storm, rapidly proliferating in pockets everywhere to provide further consolidation of floor space and workloads. These first generation solutions are not without challenges, however. In his session at 21st Cloud Expo, Wes Talbert, a Principal Architect and results-driven enterprise sales leader at NetApp, will discuss how the HCI solution of tomorrow will integrate with the public cloud to deliver a quality hybrid cloud e...
SYS-CON Events announced today that Yuasa System will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Yuasa System is introducing a multi-purpose endurance testing system for flexible displays, OLED devices, flexible substrates, flat cables, and films in smartphones, wearables, automobiles, and healthcare.
Companies are harnessing data in ways we once associated with science fiction. Analysts have access to a plethora of visualization and reporting tools, but considering the vast amount of data businesses collect and limitations of CPUs, end users are forced to design their structures and systems with limitations. Until now. As the cloud toolkit to analyze data has evolved, GPUs have stepped in to massively parallel SQL, visualization and machine learning.
Is advanced scheduling in Kubernetes achievable? Yes, however, how do you properly accommodate every real-life scenario that a Kubernetes user might encounter? How do you leverage advanced scheduling techniques to shape and describe each scenario in easy-to-use rules and configurations? In his session at @DevOpsSummit at 21st Cloud Expo, Oleg Chunikhin, CTO at Kublr, will answer these questions and demonstrate techniques for implementing advanced scheduling. For example, using spot instances ...
The session is centered around the tracing of systems on cloud using technologies like ebpf. The goal is to talk about what this technology is all about and what purpose it serves. In his session at 21st Cloud Expo, Shashank Jain, Development Architect at SAP, will touch upon concepts of observability in the cloud and also some of the challenges we have. Generally most cloud-based monitoring tools capture details at a very granular level. To troubleshoot problems this might not be good enough.
DevOps is under attack because developers don’t want to mess with infrastructure. They will happily own their code into production, but want to use platforms instead of raw automation. That’s changing the landscape that we understand as DevOps with both architecture concepts (CloudNative) and process redefinition (SRE). Rob Hirschfeld’s recent work in Kubernetes operations has led to the conclusion that containers and related platforms have changed the way we should be thinking about DevOps and...
SYS-CON Events announced today that Taica will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Taica manufacturers Alpha-GEL brand silicone components and materials, which maintain outstanding performance over a wide temperature range -40C to +200C. For more information, visit http://www.taica.co.jp/english/.
When it comes to cloud computing, the ability to turn massive amounts of compute cores on and off on demand sounds attractive to IT staff, who need to manage peaks and valleys in user activity. With cloud bursting, the majority of the data can stay on premises while tapping into compute from public cloud providers, reducing risk and minimizing need to move large files. In his session at 18th Cloud Expo, Scott Jeschonek, Director of Product Management at Avere Systems, discussed the IT and busine...
We all know that end users experience the Internet primarily with mobile devices. From an app development perspective, we know that successfully responding to the needs of mobile customers depends on rapid DevOps – failing fast, in short, until the right solution evolves in your customers' relationship to your business. Whether you’re decomposing an SOA monolith, or developing a new application cloud natively, it’s not a question of using microservices – not doing so will be a path to eventual b...
Enterprises have taken advantage of IoT to achieve important revenue and cost advantages. What is less apparent is how incumbent enterprises operating at scale have, following success with IoT, built analytic, operations management and software development capabilities – ranging from autonomous vehicles to manageable robotics installations. They have embraced these capabilities as if they were Silicon Valley startups. As a result, many firms employ new business models that place enormous impor...