Cassandra Nodetool Internals

Let's go inside the Cassandra nodetool utility.

Relational database management systems are the most commonly used systems for storing and using data, but with extremely large amounts of data they do not scale up well.

The concept of "NoSQL" (Not Only SQL) has been spreading because of the growing demand for alternatives to relational databases. The biggest motivation behind NoSQL is scalability: NoSQL solutions can offer a way to store and use extremely large amounts of data with less overhead, less work, better performance, and less downtime.

Apache Cassandra implements the “NoSQL” concept. It was developed at Facebook to power its Inbox Search feature and then became an Apache open source project. Twitter, Digg, Reddit and quite a few others started using it.

Cassandra exposes a number of management operations via Java Management Extensions (JMX), a Java technology that supplies tools for managing and monitoring Java applications and services. Any statistic or operation that a Java application exposes as an MBean can then be monitored or manipulated using JMX.
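To make the JMX vocabulary concrete, here is a minimal, self-contained sketch that registers a standard MBean on the JVM's platform MBeanServer, then invokes an operation and reads an attribute through the generic MBeanServer API, addressing them by name. The CounterMBean interface and the example:type=Counter object name are hypothetical, made up for illustration; they are not Cassandra beans.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxBasics {
    // Standard MBean convention: class Counter implements an interface named CounterMBean.
    // JMX requires the MBean interface to be public; nesting it keeps the sketch in one file.
    public interface CounterMBean {
        long getCount();   // exposed as the read-only attribute "Count"
        void increment();  // exposed as the operation "increment"
    }

    static class Counter implements CounterMBean {
        private long count;
        @Override public synchronized long getCount() { return count; }
        @Override public synchronized void increment() { count++; }
    }

    static long incrementTwiceAndRead() throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example:type=Counter"); // hypothetical name
        server.registerMBean(new Counter(), name);
        try {
            // Operations and attributes are addressed by name, not by Java type.
            server.invoke(name, "increment", new Object[0], new String[0]);
            server.invoke(name, "increment", new Object[0], new String[0]);
            return (Long) server.getAttribute(name, "Count");
        } finally {
            server.unregisterMBean(name);
        }
    }
}
```

A monitoring tool works the same way, except that it talks to a remote MBean server instead of the local one.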

In this article, we look inside Cassandra's nodetool to see how easy it would be to extend it or to build a Cassandra monitoring UI. To do so, we use JArchitect to analyze how nodetool works internally.

The nodetool utility is a command-line interface for Cassandra that helps you manage a cluster. It is invoked like this:

  nodetool -h HOSTNAME [-p JMX_PORT] COMMAND

Here are some of the available commands:

  ring                   - Print information about the token ring
  join                   - Join the ring
  info                   - Print node information (uptime, load, ...)
  cfstats                - Print statistics on column families
  clearsnapshot          - Remove all existing snapshots
  version                - Print Cassandra version
  tpstats                - Print usage statistics of thread pools
  drain                  - Drain the node (stop accepting writes and flush all column families)

The nodetool classes exist in the org.apache.cassandra.tools package, and the entry class is NodeCmd.

NodeCmd

Let’s search for the methods invoked by the main method of the NodeCmd class by executing the following CQLinq query:

from m in Methods where m.IsUsedBy ("org.apache.cassandra.tools.NodeCmd.main(String[])")
select new { m, m.NbBCInstructions }

[Figure: methods invoked by NodeCmd.main]

To process commands, the main method uses the Apache Commons CLI library, which provides an API for parsing command-line options and can also print a help message listing all the available options.

For each command, NodeCmd switches to the appropriate method to do the job, and that method collaborates with the NodeProbe class.
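The dispatch described above can be pictured with the simplified sketch below. It is not NodeCmd's actual code: the Probe interface and its two methods are hypothetical stand-ins for NodeProbe, the output strings are illustrative, and a plain string switch replaces Apache Commons CLI so that the example needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class NodeCmdSketch {
    // Hypothetical stand-in for NodeProbe: the object that actually talks to JMX.
    interface Probe {
        String getReleaseVersion();
        void drain();
    }

    private final Probe probe;
    private final List<String> output = new ArrayList<>();

    NodeCmdSketch(Probe probe) {
        this.probe = probe;
    }

    // One branch per command; each branch only delegates to the probe and records output.
    void run(String command) {
        switch (command) {
            case "version":
                output.add("ReleaseVersion: " + probe.getReleaseVersion());
                break;
            case "drain":
                probe.drain();
                break;
            default:
                throw new IllegalArgumentException("Unrecognized command: " + command);
        }
    }

    List<String> output() {
        return output;
    }
}
```

Keeping each branch this thin is what makes the command-line class easy to extend: a new command is one more case that forwards to the probe.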

NodeProbe

To discover how NodeProbe achieves its task, we can search for all the types it uses:

from t in Types where t.IsUsedBy ("org.apache.cassandra.tools.NodeProbe")
select new { t }
 

[Figure: types used by NodeProbe]

NodeProbe mainly uses the management beans: it acts as a facade that redirects each command to the appropriate JMX bean.

Here is the list of all the JMX bean interfaces in Cassandra:

from t in Types
where t.NameLike (@"mbean\i") && t.IsInterface 
select  t

[Figure: Cassandra MBean interfaces]

Many managed beans are available, which makes it possible to build tools that exploit their capabilities to help administrators monitor and manage a Cassandra cluster.
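The CQLinq query above finds the MBean interfaces statically, in the code base. At runtime, a tool can instead discover the registered beans by querying an MBeanServerConnection with an ObjectName pattern. The sketch below is generic; against a Cassandra node you might pass a pattern such as org.apache.cassandra.db:* (an assumption about the domain name, worth checking with a JMX console), while the test queries the JVM's own java.lang domain so it runs anywhere.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class MBeanLister {
    // Returns the canonical names of all registered MBeans matching a pattern,
    // sorted so the result is stable (e.g. for a UI tree or a CLI listing).
    static Set<String> list(MBeanServerConnection conn, String pattern)
            throws IOException, MalformedObjectNameException {
        Set<String> names = new TreeSet<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }
}
```

The platform MBeanServer is itself an MBeanServerConnection, so the same method works locally and, through a JMX connector, against a remote node.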

Such tools interact with the JMX beans and invoke some of their methods; to do that, we need to create a JMX proxy by calling JMX.newMBeanProxy.
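Here is a minimal sketch of the proxy idea that runs in a single JVM: a bean is registered locally, and JMX.newMBeanProxy(connection, name, interfaceClass) returns an object implementing the interface whose calls are forwarded to the MBean server as attribute reads, attribute writes and operation invocations. The StatusMBean interface and the example:type=Status name are hypothetical; in nodetool the connection would be a remote one and the interface one of Cassandra's, such as StorageServiceMBean.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class ProxyDemo {
    // Hypothetical management interface; JMX requires it to be public.
    public interface StatusMBean {
        String getState();        // attribute "State"
        void setState(String s);  // makes "State" writable
    }

    static class Status implements StatusMBean {
        private volatile String state = "NORMAL";
        @Override public String getState() { return state; }
        @Override public void setState(String s) { state = s; }
    }

    static String roundTrip() throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example:type=Status"); // hypothetical name
        Status bean = new Status();
        server.registerMBean(bean, name);
        try {
            // The proxy turns typed Java calls into setAttribute/getAttribute requests.
            StatusMBean proxy = JMX.newMBeanProxy(server, name, StatusMBean.class);
            proxy.setState("DRAINED");
            // The write went through JMX to the real bean, and reads come back the same way.
            return bean.getState() + "/" + proxy.getState();
        } finally {
            server.unregisterMBean(name);
        }
    }
}
```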

Let’s search for methods which create a JMX proxy.

from m in Methods where m.IsUsing ("javax.management.JMX.newMBeanProxy(MBeanServerConnection,ObjectName,Class)")
select new { m, m.NbBCInstructions }

[Figure: methods calling JMX.newMBeanProxy]

NodeProbe, acting as the facade, creates these proxies. As an example, we can take the connect method, which is invoked to create them, and look at some of the methods it calls:

[Figure: methods invoked by NodeProbe.connect]
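A connect method of this kind typically opens a JMXConnector to the node and then builds one proxy per bean. The sketch below is hedged: the service:jmx:rmi:///jndi/rmi://host:port/jmxrmi URL is the standard form for the JDK's RMI connector, the object name org.apache.cassandra.db:type=StorageService is assumed to be that of the StorageService bean, and StorageServiceLike is a hypothetical two-method interface standing in for StorageServiceMBean. NodeProbe's actual fields and the full set of proxies it creates are not reproduced here.

```java
import java.io.IOException;
import javax.management.JMX;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class ProbeConnection implements AutoCloseable {
    // Hypothetical subset of a storage-service management interface.
    public interface StorageServiceLike {
        String getReleaseVersion();
        void forceRemoveCompletion();
    }

    // Standard URL form for the JDK's RMI connector, looked up through the RMI registry.
    static JMXServiceURL serviceUrl(String host, int port) throws IOException {
        return new JMXServiceURL(
            String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
    }

    private final JMXConnector connector;
    final StorageServiceLike storageService;

    ProbeConnection(String host, int port) throws IOException, MalformedObjectNameException {
        // Assumed object name of Cassandra's StorageService bean.
        ObjectName ssName = new ObjectName("org.apache.cassandra.db:type=StorageService");
        JMXConnector c = JMXConnectorFactory.connect(serviceUrl(host, port), null);
        try {
            MBeanServerConnection conn = c.getMBeanServerConnection();
            storageService = JMX.newMBeanProxy(conn, ssName, StorageServiceLike.class);
        } catch (IOException e) {
            c.close(); // do not leak the connection if setup fails
            throw e;
        }
        connector = c;
    }

    @Override
    public void close() throws IOException {
        connector.close();
    }
}
```

For example, new ProbeConnection("127.0.0.1", 7199) would target a local node whose JMX port is 7199, Cassandra's usual default; adjust it to your configuration.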

Once the proxies are created, each command is just a redirection. For example, let’s search for the methods used by NodeProbe.forceRemoveCompletion:

from m in Methods where m.IsUsedBy ("org.apache.cassandra.tools.NodeProbe.forceRemoveCompletion()")
select new { m, m.NbBCInstructions }
 

[Figure: methods used by NodeProbe.forceRemoveCompletion]

Only StorageServiceMBean.forceRemoveCompletion is used; all the processing logic is on the server side. The StorageServiceMBean interface is implemented by the StorageService class, and here are all the methods used by the StorageService.forceRemoveCompletion method:

[Figure: methods used by StorageService.forceRemoveCompletion]
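In other words, the client side of such a command is a one-line forwarder. The sketch below illustrates that division of labour with hypothetical types: ProbeFacade plays NodeProbe's role and only delegates, while RemovalService stands in for the JMX proxy of the server-side bean that holds the real logic. In the test, a recording fake replaces the proxy so the example runs without a cluster.

```java
class ProbeFacade {
    // Hypothetical management interface; in nodetool this role is played by
    // a JMX proxy for StorageServiceMBean.
    interface RemovalService {
        String getRemovalStatus();
        void forceRemoveCompletion();
    }

    private final RemovalService storageService;

    ProbeFacade(RemovalService storageService) {
        this.storageService = storageService;
    }

    // Pure redirection: no client-side logic, the server does the work.
    void forceRemoveCompletion() {
        storageService.forceRemoveCompletion();
    }

    String getRemovalStatus() {
        return storageService.getRemovalStatus();
    }
}
```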

Possible Design Improvement

We have seen that NodeProbe is just a facade to the JMX beans, but what about the NodeCmd class? Does it use any JMX beans directly?

As shown before, NodeCmd does not create any JMX proxy; all the proxies are created by NodeProbe. We might therefore conclude that all JMX bean invocations come from the NodeProbe class, and that the only responsibility of NodeCmd is to process the command line and ask NodeProbe to do the job. To check this, let’s search for the JMX beans used directly by the NodeCmd class:

from t in Types where t.IsUsedBy ("org.apache.cassandra.tools.NodeCmd") && t.NameLike (@"mbean\i") 
select  t

[Figure: JMX beans used directly by NodeCmd]

However, NodeCmd also uses some JMX beans, such as EndpointSnitchInfoMBean. Here are all the methods using this bean:

from m in Methods where m.IsUsing ("org.apache.cassandra.locator.EndpointSnitchInfoMBean")
select new { m, m.NbBCInstructions }

[Figure: methods using EndpointSnitchInfoMBean]

Both NodeCmd and NodeProbe use it. NodeCmd does not create the proxy itself but asks NodeProbe for it, as shown in this dependency graph:

[Figure: dependency graph between NodeCmd, NodeProbe and EndpointSnitchInfoMBean]

It might be better to refactor nodetool so that only NodeProbe redirects commands to the JMX proxies and acts as the sole facade to the management capabilities, leaving NodeCmd responsible only for parsing the command line and delegating to NodeProbe. Only a few commands are handled directly by NodeCmd, so delegating them to NodeProbe would not be difficult.
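A minimal sketch of that refactoring, under the assumption that NodeCmd today obtains a snitch-info proxy from NodeProbe and calls it itself: the probe instead gains narrow methods (getDatacenter and getRack, names chosen for illustration) that keep the proxy private, so the command-line side depends only on the probe. RefactoredProbe, RefactoredCmd and SnitchInfo are hypothetical names; SnitchInfo is a reduced stand-in for EndpointSnitchInfoMBean.

```java
class RefactoredProbe {
    // Hypothetical reduction of a snitch-info management interface.
    interface SnitchInfo {
        String getDatacenter(String host);
        String getRack(String host);
    }

    private final SnitchInfo snitchProxy; // stays private: callers never see JMX types

    RefactoredProbe(SnitchInfo snitchProxy) {
        this.snitchProxy = snitchProxy;
    }

    String getDatacenter(String host) { return snitchProxy.getDatacenter(host); }

    String getRack(String host) { return snitchProxy.getRack(host); }
}

// The command-line side formats output and talks only to the probe.
class RefactoredCmd {
    private final RefactoredProbe probe;

    RefactoredCmd(RefactoredProbe probe) {
        this.probe = probe;
    }

    String describeEndpoint(String host) {
        return host + " " + probe.getDatacenter(host) + "/" + probe.getRack(host);
    }
}
```

With this split, swapping the JMX transport or mocking it in tests touches only the probe, which is the point of having a single facade.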

Conclusion
Cassandra exposes many management capabilities through JMX beans, and not all of them are covered by the nodetool utility. It is easy to add new commands: you can take the existing commands as examples and add your own. Building administration tools is also easy, because all the logic is on the Cassandra server side; you just have to develop a nice GUI that interacts with the JMX beans. If you want to develop your own administration tool, nodetool is a good starting point, and understanding how it works will make your task much easier.
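As a starting point for such a GUI, a generic tool does not even need the MBean interfaces on its classpath: MBeanServerConnection.getMBeanInfo describes a bean's attributes and operations at runtime, which is enough to render them dynamically. The sketch below is a generic helper, not part of nodetool; the test points it at the JVM's own java.lang:type=Runtime bean so it runs without Cassandra.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

class BeanInspector {
    // Names of the readable attributes, e.g. to build one table row per attribute.
    static List<String> readableAttributes(MBeanServerConnection conn, ObjectName name)
            throws JMException, IOException {
        List<String> result = new ArrayList<>();
        for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
            if (attr.isReadable()) {
                result.add(attr.getName());
            }
        }
        return result;
    }

    // Operation names, e.g. to render one button or menu item per operation.
    static List<String> operationNames(MBeanServerConnection conn, ObjectName name)
            throws JMException, IOException {
        List<String> result = new ArrayList<>();
        for (MBeanOperationInfo op : conn.getMBeanInfo(name).getOperations()) {
            result.add(op.getName());
        }
        return result;
    }
}
```

Values can then be fetched with getAttribute and operations triggered with invoke, using the names returned here.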

More Stories By Lahlali Issam

Lahlali Issam is Lead Developer at JavaDepend, a tool to manage and understand complex Java code. With JavaDepend, software quality can be measured using code metrics, visualized using graphs and treemaps, and queried using CQL, a SQL-like language for querying the code base.
