Welcome!

Article

Cassandra Nodetool Internals

Let's go inside the cassandra nodetool utility.

Relational database management systems are the most commonly used system to store and use data, but for extremely large amounts of data, this kind of system doesn’t scale up properly.

The concept of "NoSQL"(Not Only SQL) has been spreading due to the growing demand for relational database alternatives. The biggest motivation behind NoSQL is scalability. NoSQL solutions can offer a way to store and use extremely large amounts of data, but with less overhead, less work, better performance, and less downtime.

Apache Cassandra implements the “NoSQL” concept. It was developed at Facebook to power their Inbox Search feature, and it became an Apache open source project. Twitter, Digg, Reddit and quite a few others started using it invo

Cassandra exposes a number of management operations via Java Management Extensions (JMX). Java Management Extensions (JMX) is a Java technology that supplies tools for managing and monitoring Java applications and services. Any statistic or operation that a Java application has exposed as an MBean can then be monitored or manipulated using JMX.

In this article the goal is to go inside Cassandra nodetool, and see how easy it would be to extend it or build a Cassandra monitoring UI. For that we use JArchitect to understand how the nodetool works internally.

The nodetool utility is a command line interface for Cassandra. You can use it to help manage a cluster. It’s used like this nodetool -h HOSTNAME [-p JMX_PORT] COMMAND
Here are some available commands:

  ring                   - Print informations on the token ring
  join                   - Join the ring
  info                   - Print node informations (uptime, load, ...)
  cfstats                - Print statistics on column families
  clearsnapshot          - Remove all existing snapshots
  version                - Print cassandra version
  tpstats                - Print usage statistics of thread pools
  drain                  - Drain the node (stop accepting writes and flush all column families)

The nodetool classes exist in the org.apache.cassandra.tools package, and the entry class is NodeCmd.

NodeCmd

Let’s search for methods invoked from the main method from the NodeCmd class by executing the following CQLinq query:

from m in Methods where m.IsUsedBy ("org.apache.cassandra.tools.NodeCmd.main(String[])")
select new { m, m.NbBCInstructions }

cassandra1

To treat commands, the main method uses the Apache Commons CLI library which provides an API for parsing command line options. It's also able to print help messages of all the options available.

For each command the NodeCmd switch to the appropriate method to do the job. And the NodeCmd collaborates with the NodeProbe class.

NodeProbe

Let’s discover how NodeProbe achieve its task, for that we can search for all types used by it.

from t in Types where t.IsUsedBy ("org.apache.cassandra.tools.NodeProbe")
select new { t }
 

cassandra2

The NodeProbe class uses mainly the management beans; it acts as a facade and redirects each command to the appropriate JMX bean.

Here is the list of all Cassandra JMX beans:

from t in Types
where t.NameLike (@"mbean\i") && t.IsInterface 
select  t

cassandra3

Many managed beans are available which gives the possibility to create tools exploiting their capabilities, to help administrators monitor and manage the Cassandra cluster.

The idea of such tools is to interact with the JMX beans and invoke some of their methods, for that we need to create a JMX proxy by invoking JMX.newMBeanProxy.

Let’s search for methods which create a JMX proxy.

from m in Methods where m.IsUsing ("javax.management.JMX.newMBeanProxy(MBeanServerConnection,ObjectName,Class)")
select new { m, m.NbBCInstructions }

cassandra4

The NodeProbe which acts as a facade create these proxies, we can take as example the connect method which is invoked to create these proxies, and discover some methods invoked by it.

cassandra5

After the creation of proxies, all the commands will be just a redirection, for example let’s search for methods used by NodeProbe.forceRemoveCompletion.

from m in Methods where m.IsUsedBy ("org.apache.cassandra.tools.NodeProbe.forceRemoveCompletion()")
select new { m, m.NbBCInstructions }
 

cassandra6

Only the StorageServiceMBean.forceRemoveCompletion is used, all the logic of the treatment is in the server side. TheStorageServiceMBean is implemented by the StorageService class and here are all methods used by the StoageService. forceRemoveCompletion method:

cassandra7

Possible Design improvement

We discovered that NodeProbe is just a facade to the JMX beans, but what about NodeCmd class, it uses any JMX beans directly?

As shown before NodeCmd not create any JMX proxy, the creation of all the proxies is achieved by the NodeProbe. So we can conclude that all JMX beans invoking are from the NodeProbe class, and the responsibility of the NodeCmd class is just to treat the command line and ask the NodeProbe class to do the job. And to check that let’s search for JMX beans used directly by the NodeCmd class.

from t in Types where t.IsUsedBy ("org.apache.cassandra.tools.NodeCmd") && t.NameLike (@"mbean\i") 
select  t

cassandra8

The NodeCmd uses also some JMX beans like EndpointSnitchInfoMBean, and here are all the methods using this bean.

from m in Methods where m.IsUsing ("org.apache.cassandra.locator.EndpointSnitchInfoMBean")
select new { m, m.NbBCInstructions }

cassandra9

NodeCmd and NodeProbe use it, the NodeCmd doesn’t have the proxy but ask it from the NodeProbe class as shown in this dependency graph:

cassandra10

Maybe it will be better to refactor the nodetool and let only NodeProbe redirect commands to the JMX proxies, and acts as the only facade to the management capabilities, and the responsibility of the NodeCmd will be only the parsing of the command line and redirect to the NodeProbe class. Only a few commands are treated directly from the NodeCmd class and delegate this task to the NodeProbe is not a difficult task.

Conclusion
Cassandra expose many management capabilities thought JMX beans, and not all of them are treated by the nodetool utility, it’s very easy to add other commands to it, we can just take as example the already existing commands and add your own. Building administrator tools is also easy, because all the logic is in the Cassandra server side, you have just to develop a nice GUI and interact with the JMX beans. If you want to develop your own administration tool, the nodetool is a good beginning; understanding how it works will facilitate a lot your task.

More Stories By Lahlali Issam

Lahlali Issam Lead Developer at JavaDepend, a tool to manage and understand complex Java code. With JavaDepend, software quality can be measured using Code Metrics, visualized using Graphs and Treemaps, and queried using CQL language, a SQL like to query the code base.

Latest Stories
SYS-CON Events announced today that Linux Academy, the foremost online Linux and cloud training platform and community, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Linux Academy was founded on the belief that providing high-quality, in-depth training should be available at an affordable price. Industry leaders in quality training, provided services, and student certification passes, its goal is to c...
With the proliferation of both SQL and NoSQL databases, organizations can now target specific fit-for-purpose database tools for their different application needs regarding scalability, ease of use, ACID support, etc. Platform as a Service offerings make this even easier now, enabling developers to roll out their own database infrastructure in minutes with minimal management overhead. However, this same amount of flexibility also comes with the challenges of picking the right tool, on the right ...
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at 20th Cloud Expo, Ed Featherston, director/senior enterprise architect at Collaborative Consulting, will discuss the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
910Telecom exhibited at the 19th International Cloud Expo, which took place at the Santa Clara Convention Center in Santa Clara, CA, in November 2016. Housed in the classic Denver Gas & Electric Building, 910 15th St., 910Telecom is a carrier-neutral telecom hotel located in the heart of Denver. Adjacent to CenturyLink, AT&T, and Denver Main, 910Telecom offers connectivity to all major carriers, Internet service providers, Internet backbones and exchanges.
Cognitive Computing is becoming the foundation for a new generation of solutions that have the potential to transform business. Unlike traditional approaches to building solutions, a cognitive computing approach allows the data to help determine the way applications are designed. This contrasts with conventional software development that begins with defining logic based on the current way a business operates. In her session at 18th Cloud Expo, Judith S. Hurwitz, President and CEO of Hurwitz & ...
China Unicom exhibit at the 19th International Cloud Expo, which took place at the Santa Clara Convention Center in Santa Clara, CA, in November 2016. China United Network Communications Group Co. Ltd ("China Unicom") was officially established in 2009 on the basis of the merger of former China Netcom and former China Unicom. China Unicom mainly operates a full range of telecommunications services including mobile broadband (GSM, WCDMA, LTE FDD, TD-LTE), fixed-line broadband, ICT, data communica...
Zerto exhibited at SYS-CON's 18th International Cloud Expo®, which took place at the Javits Center in New York City, NY, in June 2016. Zerto is committed to keeping enterprise and cloud IT running 24/7 by providing innovative, simple, reliable and scalable business continuity software solutions. Through the Zerto Cloud Continuity Platform™, organizations can seamlessly move and protect virtualized workloads between public, private and hybrid clouds. The company’s flagship product, Zerto Virtual...
As businesses adopt functionalities in cloud computing, it’s imperative that IT operations consistently ensure cloud systems work correctly – all of the time, and to their best capabilities. In his session at @BigDataExpo, Bernd Harzog, CEO and founder of OpsDataStore, will present an industry answer to the common question, “Are you running IT operations as efficiently and as cost effectively as you need to?” He will expound on the industry issues he frequently came up against as an analyst, and...
All clouds are not equal. To succeed in a DevOps context, organizations should plan to develop/deploy apps across a choice of on-premise and public clouds simultaneously depending on the business needs. This is where the concept of the Lean Cloud comes in - resting on the idea that you often need to relocate your app modules over their life cycles for both innovation and operational efficiency in the cloud. In his session at @DevOpsSummit at19th Cloud Expo, Valentin (Val) Bercovici, CTO of Soli...
"We're bringing out a new application monitoring system to the DevOps space. It manages large enterprise applications that are distributed throughout a node in many enterprises and we manage them as one collective," explained Kevin Barnes, President of eCube Systems, in this SYS-CON.tv interview at DevOps at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
SYS-CON Events announced today that CA Technologies has been named "Platinum Sponsor" of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, New York, and 21st International Cloud Expo, which will take place in November in Silicon Valley, California.
SYS-CON Events announced today that delaPlex will exhibit at SYS-CON's @CloudExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. delaPlex pioneered Software Development as a Service (SDaaS), which provides scalable resources to build, test, and deploy software. It’s a fast and more reliable way to develop a new product or expand your in-house team.
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place June 6-8, 2017, at the Javits Center in New York City, New York, is co-located with 20th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry p...
Extreme Computing is the ability to leverage highly performant infrastructure and software to accelerate Big Data, machine learning, HPC, and Enterprise applications. High IOPS Storage, low-latency networks, in-memory databases, GPUs and other parallel accelerators are being used to achieve faster results and help businesses make better decisions. In his session at 18th Cloud Expo, Michael O'Neill, Strategic Business Development at NVIDIA, focused on some of the unique ways extreme computing is...
The explosion of new web/cloud/IoT-based applications and the data they generate are transforming our world right before our eyes. In this rush to adopt these new technologies, organizations are often ignoring fundamental questions concerning who owns the data and failing to ask for permission to conduct invasive surveillance of their customers. Organizations that are not transparent about how their systems gather data telemetry without offering shared data ownership risk product rejection, regu...