Solr New Metrics API: Quick look at Solr 6.4

As you know, at Sematext we looooove logs and metrics and we enjoy playing with them on a daily basis. We have our Logsene, which is all about logs and their analysis. We have our SPM, which can monitor half of the world when it comes to performance metrics, give you a perfect view of class level performance with transaction tracing, and give you a perfect view of your environment with the components view. And that’s not all.

Because we enjoy working with metrics, we decided to take a look at the shiny, new Solr metrics API that came out with the recently released Solr 6.4. It comes with the promise of providing a convenient method to fetch data in various formats. It also makes it possible to report metrics to external systems. As metrics aficionados and Solr users ourselves, we are thrilled to have this at our disposal and jumped on the opportunity to play around with Solr’s new metrics goodness. Let’s start with some history on how it used to be in “the good old days”.

Back in the day…

A long, long time ago, in the Apache country ruled by king Solr and queen Lucene there was a messenger called JMX. Ok, wrong story. To make a long story short – until now the only way to fetch metrics from Solr was to use JMX. If you wanted to see what was happening inside Solr you had to either use the Solr admin panel and look up what you were interested in, or use a Java tool that could read those metrics. Actually, any tool that is able to read data from JMX, for example jconsole, which ships with the JDK, can be used to fetch metrics from Solr.

This gives us some visibility, but there are issues with such an approach. The main problem is that we don’t have a full view of the historical data. We see cumulative statistics or changes since the time we started looking at the metrics, but if we want to see what was happening one hour ago, we can’t, or at least not out of the box.

Of course, those obstacles can be overcome by using monitoring tools for Solr. We can either use something open source like Ganglia and configure everything from scratch using information available out in the wild places of the Internet or we can use something that does everything for us and we only need to install an agent, as is the case with SPM from Sematext.

However, if you were allergic to any kind of UI that helps you understand your metrics, you were forced to use JMX and tools that could fetch metrics from it. Until now.

Metrics in Solr – the new kid on the block

So what can we do now? Well, with SOLR-4735 Solr metrics evolved. Metrics are no longer limited to being exposed for polling via JMX. Instead, Solr can now also send metrics to external destinations and to files, and it exposes a nice REST API and an extended set of metrics. What’s more, the extended metrics are not only about Solr components: you can also retrieve JVM level information or Jetty and HTTP related statistics. Sounds fun if you ask me!

However, this is not all. Solr can now export metrics to other systems, and that functionality is available out of the box. Yes, apart from the standard JMX reporting, we can save metrics to a log file using the SLF4J reporter or even send metrics to Ganglia or Graphite, without any special plugins or additions.

Finally, one last thing – the API. Once you know what you want to fetch, it is as easy as pointing your browser or curl command to /admin/metrics and providing the type and group of metrics you would like to retrieve; Solr will return all the requested information.

Next, let’s dig a bit deeper and see what we need to do to get all of that configured and working.

Types of meters

The improvements around Solr metrics are based on Dropwizard Metrics – a Java library supporting easy metrics reporting and different kinds of meters. The ones used by Solr are:

  • counters
  • gauges
  • histograms
  • meters
  • timers

The counters are there to measure simple counts – for example the number of queries processed by a request handler. The gauges allow one to check the current value of something – for example the length of a queue or the number of pending documents. Histograms let Solr calculate the approximate distribution of certain values and provide statistics around them. For example, a histogram can be used to give us percentiles for query response times. Meters are there to provide rates of events, like the average number of query requests per second over periods of 1, 5 and 15 minutes. Finally, the timers are there to measure the duration of certain events and allow for both counts and histograms.
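
To make the differences between these meter types concrete, here is a toy Python sketch. It is not how Dropwizard Metrics or Solr implement them: all class names are made up for illustration, the histogram keeps every value instead of sampling, and the meter reports only a mean rate rather than 1, 5 and 15 minute moving averages.

```python
import math
import time


class Counter:
    """A simple count that changes only when we increment it."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n


class Gauge:
    """Reads the current value from a callback every time it is asked."""

    def __init__(self, read_fn):
        self._read_fn = read_fn

    @property
    def value(self):
        return self._read_fn()


class Histogram:
    """Keeps every observed value (an assumption; real libraries sample)."""

    def __init__(self):
        self.values = []

    def update(self, v):
        self.values.append(v)

    def percentile(self, p):
        if not self.values:
            raise ValueError("no values recorded yet")
        # Nearest-rank percentile over the sorted observations.
        ordered = sorted(self.values)
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]


class Meter:
    """Counts events and reports the mean number of events per second."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.count = 0

    def mark(self, n=1):
        self.count += n

    def mean_rate(self):
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


class Timer:
    """Combines a meter of events with a histogram of their durations."""

    def __init__(self, clock=time.monotonic):
        self.meter = Meter(clock)
        self.histogram = Histogram()

    def record(self, duration_ms):
        self.meter.mark()
        self.histogram.update(duration_ms)
```

The injectable clock is only there so the rate calculation can be checked without waiting for real time to pass.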

Solr metric groups

Solr uses the above mentioned standardized meters to gather data from its various components, categorizing the metrics into the following groups:

  • Core information (solr.core.<collection_name> group, one per core)
  • HTTP information (solr.http group)
  • Jetty information (solr.jetty group)
  • JVM information (solr.jvm group)
  • Solr node information (solr.node group)

More metrics are coming in the future, like shard leader metrics, cluster node metrics and aggregated per-core metrics. The groups listed above, however, are the ones available in Solr 6.4.

So what can we get from those metrics? The core information group gives us metrics for request handlers (counters, timers, errors, timeouts, etc.), index related events (meters and gauges for merges, both current and historical, and deleted documents information), and Lucene Directory I/O information (read and write meters and histograms, split per file type, like doc values or terms dictionary). In the future, we will also see shard replication, transaction log and cache information.

The HTTP group provides information regarding available, open and pending connections for the shard handler and the update handler.

The Jetty related group gives us a view of a few very important statistics related to the servlet container in which Solr runs. We can look at thread and thread pool usage, connection and request timers, and meters for various HTTP response codes, like the number of requests with a 200 response code, the number of 400s or 500s and so on.

The very interesting JVM group gives us low level information on what is happening inside Solr and inside the JVM itself. We have the direct and mapped buffer pools information, class loading and unloading related information, garbage collection meters and timers, operating system memory, CPU time, file descriptors count, system load and swap usage (btw, remember that the JVM hates having its memory swapped; it reacts with poor performance).

Finally, the node related group provides information related to the Solr node, such as the number of cores on the node (loaded, lazy or unloaded), information about request handlers, collections, configuration sets and so on.

Solr Metrics API

One of the possible methods of retrieving Solr metrics is using the Metrics API. The admin/metrics endpoint provides access to all the metrics that we can get. We just send an HTTP GET request to that endpoint and let Solr know what we are interested in via the available parameters. The parameters of the endpoint are as follows:

  • group – comma separated names of the metric groups we are interested in, or all if we are interested in all of them, for example node,jvm.
  • type – comma separated types of metrics that we would like to retrieve, or all if we are interested in all types. Possible values are the types of meters that we already mentioned: counter, gauge, histogram, meter and timer.
  • prefix – the filter that will display metrics starting with the specified prefix. For example saying prefix=INDEX will return only indexing specific metrics.
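
As a minimal sketch of how these parameters combine into a request, the Python helper below builds the URL and, optionally, fetches it. The function names are hypothetical, and the base URL is assumed to be the local http://localhost:8983/solr instance used in the examples in this post.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_metrics_url(base_url, group="all", type_="all", prefix=None, wt=None):
    """Build a request URL for the admin/metrics endpoint."""
    params = {"group": group, "type": type_}
    if prefix:
        params["prefix"] = prefix
    if wt:
        params["wt"] = wt
    # safe="," keeps comma separated lists such as "node,jvm" readable.
    return base_url.rstrip("/") + "/admin/metrics?" + urlencode(params, safe=",")


def fetch_metrics(url, timeout=10):
    """Send the HTTP GET request and return the raw response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For instance, build_metrics_url("http://localhost:8983/solr", group="node,jvm", type_="counter") asks only for the counters of the node and JVM groups.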

Let’s now look at some example responses from Solr, so we know what to expect.

For example, to retrieve minor merge related metrics we could run:

http://localhost:8983/solr/admin/metrics?group=all&type=all&prefix=INDEX.merge.minor

The response looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="metrics">
  <lst name="solr.jvm"/>
  <lst name="solr.jetty"/>
  <lst name="solr.node"/>
  <lst name="solr.core.data.shard1.replica1">
    <lst name="INDEX.merge.minor">
      <long name="count">2</long>
      <double name="meanRate">0.022729587366906834</double>
      <double name="1minRate">0.018095662944349354</double>
      <double name="5minRate">0.00588664962556148</double>
      <double name="15minRate">0.0021316664456582147</double>
      <double name="min_ms">1132.643211</double>
      <double name="max_ms">1415.985689</double>
      <double name="mean_ms">1253.221722818028</double>
      <double name="median_ms">1132.643211</double>
      <double name="stddev_ms">140.09224396740174</double>
      <double name="p75_ms">1415.985689</double>
      <double name="p95_ms">1415.985689</double>
      <double name="p99_ms">1415.985689</double>
      <double name="p999_ms">1415.985689</double>
    </lst>
    <lst name="INDEX.merge.minor.running">
      <int name="value">0</int>
    </lst>
    <lst name="INDEX.merge.minor.running.docs">
      <long name="value">0</long>
    </lst>
    <lst name="INDEX.merge.minor.running.segments">
      <int name="value">0</int>
    </lst>
  </lst>
</lst>
</response>
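
If you want to consume such a response from a script, a sketch along the following lines could turn it into nested dictionaries. It assumes the XML layout shown above (a metrics list holding one list per registry, each holding one list per metric); the function names are made up for this example.

```python
import xml.etree.ElementTree as ET


def _to_value(field):
    """Convert a leaf element to int or float based on its tag."""
    if field.tag in ("int", "long"):
        return int(field.text)
    if field.tag in ("float", "double"):
        return float(field.text)
    return field.text


def parse_metrics_xml(xml_text):
    """Return {registry: {metric: {field: value}}} from an XML response."""
    root = ET.fromstring(xml_text)
    metrics = root.find("./lst[@name='metrics']")
    result = {}
    if metrics is None:
        return result
    for registry in metrics.findall("lst"):
        result[registry.get("name")] = {
            metric.get("name"): {
                field.get("name"): _to_value(field) for field in metric
            }
            for metric in registry.findall("lst")
        }
    return result
```

Registries without matching metrics, like solr.jvm in the response above, simply come back as empty dictionaries.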

We can also look at directory related metrics like this:

http://localhost:8983/solr/admin/metrics?group=core&prefix=DIRECTORY&wt=json

The response, this time in JSON (because of wt=json in the request), should have a structure that looks as follows:

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "metrics":[
    "solr.core.gettingstarted.shard1.replica1",[
      "DIRECTORY.total.reads",[
        "count",44502443,
        "meanRate",266358.17622526025,
        "1minRate",111103.14910078928,
        "5minRate",101317.5852718284,
        "15minRate",43604.83723238706],
      "DIRECTORY.total.writes",[
        "count",28269374,
        "meanRate",169199.22568953407,
        "1minRate",65274.18238346925,
        "5minRate",63257.474658024876,
        "15minRate",27508.27281761715]]]}

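Note that the JSON above encodes each map as a flat list of alternating names and values. A small Python sketch, assuming exactly that layout, can fold those lists back into dictionaries; the function names are hypothetical, and the even-length, string-keyed check is only a heuristic for recognizing such lists.

```python
import json


def named_list_to_dict(value):
    """Recursively turn ["k1", v1, "k2", v2, ...] lists into dictionaries."""
    if isinstance(value, list):
        keys, vals = value[0::2], value[1::2]
        # Heuristic: alternating string keys and values means a named list.
        if len(keys) == len(vals) and all(isinstance(k, str) for k in keys):
            return {k: named_list_to_dict(v) for k, v in zip(keys, vals)}
        return [named_list_to_dict(v) for v in value]
    if isinstance(value, dict):
        return {k: named_list_to_dict(v) for k, v in value.items()}
    return value


def metrics_from_json(text):
    """Parse a metrics response and return its metrics section as dicts."""
    payload = json.loads(text)
    return named_list_to_dict(payload.get("metrics", []))
```

With the response above, the result can be read as metrics["solr.core.gettingstarted.shard1.replica1"]["DIRECTORY.total.reads"]["count"].
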
Reporters

Finally, all the metrics that we discussed so far can be reported automatically by Solr. In Solr 6.4 the following out of the box reporters are available:

  • JMX reporter (class org.apache.solr.metrics.reporters.SolrJmxReporter)
  • SLF4J reporter (class org.apache.solr.metrics.reporters.SolrSlf4jReporter)
  • Graphite reporter (class org.apache.solr.metrics.reporters.SolrGraphiteReporter)
  • Ganglia reporter (class org.apache.solr.metrics.reporters.SolrGangliaReporter)

So instead of retrieving the data on demand we can just let Solr report to an external place or a log file. We include the appropriate configuration in the solr.xml file (in the metrics section) and that is all that needs to be done.

For example, to include all the Solr node query related metrics on the console output of Solr we would include the following fragment in the solr.xml file:

<solr>
 <metrics>
  <reporter name="queryNodeReporter" registry="solr.node" class="org.apache.solr.metrics.reporters.SolrSlf4jReporter">
   <int name="period">60</int>
   <str name="prefix">example_query_stats</str>
   <str name="logger">CONSOLE</str>
   <str name="filter">QUERY</str>
  </reporter> 
 </metrics> 
</solr>

The above configuration tells Solr that we want a new reporter called queryNodeReporter to be registered (this name needs to be unique) and that we want it to look at the Solr node information (the solr.node registry) and report it using the SolrSlf4jReporter, that is, through the logging framework. The data will be written to the logger named CONSOLE every minute (period is expressed in seconds) and will include only query related metrics (filter value of QUERY). The information written to the console looks as follows:

2017-01-14 20:23:30.940 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.availableConnections, value=0
2017-01-14 20:23:30.940 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.leasedConnections, value=0
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.maxConnections, value=10000
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.pendingConnections, value=0
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=COUNTER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.running, count=0
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=METER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.completed, count=0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=METER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.submitted, count=0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=TIMER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.duration, count=0, min=0.0, max=0.0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second, duration_unit=milliseconds
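
Log lines in this shape are easy to post-process. The following Python sketch parses one line into a dictionary; it assumes the exact layout shown above (logger name, then type=..., then comma separated key=value pairs), and the function names are made up for illustration.

```python
import re

# Matches "<logger> type=<TYPE>, name=...": the tail of a reporter log line.
LINE_RE = re.compile(r"\b(?P<logger>\S+) type=(?P<type>[A-Z]+), (?P<fields>name=.*)$")


def _coerce(value):
    """Return an int or float when the text looks numeric, else the text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_reporter_line(line):
    """Parse one reporter log line; return None for unrelated lines."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    record = {"logger": match.group("logger"), "type": match.group("type")}
    for pair in match.group("fields").split(", "):
        key, _, value = pair.partition("=")
        record[key] = _coerce(value)
    return record
```

Returning None for non-matching lines lets you run this over a whole Solr log and keep only the metric records.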

Of course, we can have numerous reporters defined in solr.xml and mix reporter types. For example, we can have the node query metrics written to the log and the metrics of the data collection cores sent to Graphite at the same time:

<solr>
 <metrics>
  <reporter name="queryNodeReporter" registry="solr.node" class="org.apache.solr.metrics.reporters.SolrSlf4jReporter">
   <int name="period">60</int>
   <str name="prefix">example_query_stats</str>
   <str name="logger">CONSOLE</str>
   <str name="filter">QUERY</str>
  </reporter>
  <reporter name="exampleGraphiteReporter" group="solr.core.data" class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
   <str name="host">graphite.my.host.com</str>
   <int name="port">9999</int>
   <int name="period">60</int>
  </reporter> 
 </metrics> 
</solr>
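
Since reporter names need to be unique, it can be handy to sanity check the configuration before restarting Solr. The Python sketch below is an assumption-laden helper, not part of Solr: it reads a solr.xml shaped like the fragments above, lists the configured reporters and flags duplicate names. The function names and the "target" key are invented for this example.

```python
import collections
import xml.etree.ElementTree as ET


def list_reporters(solr_xml_text):
    """Return one dictionary per <reporter> element in the metrics section."""
    root = ET.fromstring(solr_xml_text)
    reporters = []
    for node in root.findall("./metrics/reporter"):
        reporters.append({
            "name": node.get("name"),
            "class": node.get("class"),
            # The fragments above use either a registry or a group attribute.
            "target": node.get("registry") or node.get("group"),
            "settings": {child.get("name"): child.text for child in node},
        })
    return reporters


def duplicate_names(reporters):
    """Return the sorted reporter names that occur more than once."""
    counts = collections.Counter(r["name"] for r in reporters)
    return sorted(name for name, n in counts.items() if n > 1)
```

Running duplicate_names(list_reporters(text)) on the configuration above should give an empty list, because both reporter names are distinct.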

Which Solr metrics reporter to use

Now that we know that Solr 6.4 can report all these new metrics, you’ll need to decide where to write them. If you already have Ganglia or Graphite in your organization and you use them for monitoring other services, you may choose to continue using them.

While Ganglia and Graphite were sysadmins’ darlings roughly a decade ago, they have been showing their age. Thus, if you don’t use Ganglia or Graphite today, I strongly suggest you don’t set them up just to have Solr monitoring. There are better options. Similarly, if you need to set up any infrastructure or extra tooling just so you can monitor your Solr clusters, you may want to reconsider. That tends to be too costly in terms of people’s time and/or infrastructure, although initially it almost never feels that way.

You may want to use Sematext SPM for Solr. It’s simple to install (you just need an agent to collect and ship metrics), comes pre-configured with various out of the box dashboards and alerts and, in short, lets you troubleshoot Solr performance issues very quickly. What’s more, SPM integrates with Logsene, a hosted log management and analysis service, which makes it even easier to have a single pane of glass for all your operations data: not just Solr and not just metrics, but also metrics for other parts of your infrastructure and other logs. SPM will soon be able to collect and show all the additional metrics introduced with Solr 6.4, so watch out for an announcement from us.

Where to go from here?

In this post we covered the basics of the new Solr metrics. For more information about Solr metrics visit the Solr CWiki pages.

If you are into metrics and monitoring like we are, you will like that the new Solr exposes more of its metrics. If you only need this information from time to time for troubleshooting, then the new API will come in handy for ad-hoc calls. For production monitoring you should use a good Solr monitoring solution that lets you not only look at the numbers and pretty charts, but also trigger alerts and correlate metrics with logs coming from different sources, so that you can find, identify, troubleshoot, and fix problems faster.

To learn more about Solr ops, including its monitoring and metrics, consider Sematext’s Solr training, specifically the Intermediate Solr class that covers Solr monitoring.

More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.