Solr New Metrics API: Quick look at Solr 6.4

As you know, at Sematext we looooove logs and metrics and we enjoy playing with them on a daily basis. We have Logsene, which is all about logs and their analysis. We have SPM, which can monitor half of the world when it comes to performance metrics, give you a detailed view of class-level performance with transaction tracing, and give you a clear view of your environment with the components view. And that’s not all.

Because we enjoy working with metrics, we decided to take a look at the shiny, new Solr metrics API that came out with the recently released Solr 6.4. It comes with the promise of providing a convenient method to fetch data in various formats. It also makes it possible to report metrics to external systems. As metrics aficionados and Solr users ourselves, we were thrilled to have this at our disposal and jumped at the opportunity to play around with Solr’s new metrics goodness. Let’s start with some history on how it used to be in “the good old days”.

Back in the day…

A long, long time ago, in the Apache country ruled by king Solr and queen Lucene, there was a messenger called JMX. Ok, wrong story. To make a long story short – until now, the only way to fetch metrics from Solr was to use JMX. If you wanted to see what was happening inside Solr, you had to either use the Solr admin panel and look at whatever you were interested in, or use a Java tool that could read those metrics. Actually, any tool that can read data from JMX – for example jconsole, which ships with the JDK – can be used to fetch metrics from Solr.

This gives us some visibility, but there are issues with such an approach. The main problem is that we don’t have a full view of the historical data. We see cumulative statistics or changes since the time we started looking at the metrics, but if we want to see what was happening one hour ago, we can’t do that – at least not out of the box.

Of course, those obstacles can be overcome by using monitoring tools for Solr. We can either use something open source like Ganglia and configure everything from scratch using information available out in the wild places of the Internet, or we can use something that does everything for us and only requires installing an agent, as is the case with SPM from Sematext.

However, if you were allergic to any kind of UI that helps you understand your metrics, you were stuck with JMX and the tools that could fetch metrics from it. Until now.

Metrics in Solr – the new kid on the block

So what can we do now? Well, with SOLR-4735, Solr metrics evolved. Metrics are no longer limited to being exposed for polling via JMX. Instead, Solr can now also send metrics to external destinations or to a file, and it exposes a nice REST API and an extended set of metrics. What’s more, the extended metrics are no longer only about Solr components: you can also retrieve JVM-level information or Jetty and HTTP related statistics. Sounds fun if you ask me!

However, this is not all. Solr can now also export metrics to other systems, and that functionality is available out of the box. Yes, apart from the standard JMX reporting, we can save metrics to a log file using the SLF4J reporter or even send them to Ganglia or Graphite – all without any special plugins or additions.

Finally, one last thing – the API. Once you know what you want to fetch, it is as easy as pointing your browser or curl command to /admin/metrics, providing the type and group of metrics you would like to retrieve, and Solr will return all the requested information.
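
For example, assuming a local Solr instance running on the default port, a minimal call retrieving all JVM-level gauges could look like this:

curl "http://localhost:8983/solr/admin/metrics?group=jvm&type=gauge"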

Next, let’s dig a bit deeper and see what we need to do to make all of that work.

Types of meters

The improvements around Solr metrics are based on Dropwizard Metrics – a Java library that makes metrics reporting easy and supports different kinds of meters. The ones used by Solr are:

  • counters
  • gauges
  • histograms
  • meters
  • timers

Counters measure simple counts – for example, the number of queries processed by a request handler. Gauges expose the current value of something – for example, the length of a queue or the number of pending documents. Histograms let Solr calculate the approximate distribution of certain values and provide statistics around them; for example, a histogram can give us percentiles for query response times. Meters provide rates of events, like the average number of query requests per second over periods of 1, 5, and 15 minutes. Finally, timers measure the duration of certain events and combine both counts and histograms.
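
Since these meter types come straight from the Dropwizard Metrics library, a minimal standalone sketch can illustrate what each of them does. Note that the metric names below are made up for illustration – Solr registers its own internally:

import com.codahale.metrics.*;

public class MeterTypesDemo {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();

        // Counter: a simple count, e.g. queries processed by a request handler
        Counter queries = registry.counter("demo.queries");
        queries.inc();

        // Gauge: reports a current value on demand, e.g. the length of a queue
        registry.register("demo.queueSize", (Gauge<Integer>) () -> 42);

        // Histogram: approximate distribution of values, with percentiles
        Histogram responseTimes = registry.histogram("demo.responseTimes");
        responseTimes.update(125);

        // Meter: event rates – mean and 1/5/15-minute, e.g. requests per second
        Meter requests = registry.meter("demo.requests");
        requests.mark();

        // Timer: durations of events, combining counts, rates and a histogram
        Timer timer = registry.timer("demo.requestTime");
        try (Timer.Context ignored = timer.time()) {
            // ... the work being measured goes here ...
        }
    }
}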

Solr metric groups

Solr uses the above mentioned standardized meters to gather data from its various components, categorizing them into the following groups:

  • Core information (solr.core.<collection_name> group, each for one core)
  • HTTP information (solr.http group)
  • Jetty information (solr.jetty group)
  • JVM information (solr.jvm group)
  • Solr node information (solr.node group)

More metrics are coming in the future, such as shard leader metrics, cluster node metrics, and aggregated per-core metrics. However, these are not yet available in Solr 6.4.

So what can we get from those metrics? The core information group gives us metrics for request handlers (counters, timers, errors, timeouts, etc.), index related events (meters and gauges for merges, both current and historical, plus deleted documents information), and Lucene Directory I/O (read and write meters, and histograms split per file type – like doc values or the terms dictionary). In the future, we will also see shard replication, transaction log, and cache information.

The HTTP group provides information regarding available, open, and pending connections for the shard handler and the update handler.

The Jetty related group gives us a view of a few very important statistics about the servlet container in which Solr runs. We can look at thread and thread pool usage, connection and request timers, and meters for various HTTP response codes – like the number of requests that returned a 200 response code, the number of 400s or 500s, and so on.

The very interesting JVM group gives us low-level information on what is happening inside Solr and inside the JVM itself. We get direct and mapped buffer pool information, class loading and unloading information, garbage collection meters and timers, operating system memory, CPU time, file descriptor counts, system load, and swap usage (by the way, remember that the JVM hates having its memory swapped – it reacts with poor performance).

Finally, the node related group provides information about the Solr node, such as the number of cores loaded on the node (including lazy and unloaded ones), information about request handlers, collections, configuration sets, and so on.

Solr Metrics API

One of the possible methods of retrieving Solr metrics is the Metrics API. The /admin/metrics endpoint provides access to all the metrics we can get. We just send an HTTP GET request to that endpoint and tell Solr what we are interested in via the available parameters:

  • group – comma separated metric group names we are interested in, or the value all if we are interested in all of them; for example node,jvm.
  • type – comma separated types of metrics we would like to retrieve, or all if we are interested in all types. Possible values are the same as the meter types we already mentioned: counter, gauge, histogram, meter, and timer.
  • prefix – a filter that returns only metrics whose names start with the specified prefix. For example, prefix=INDEX will return only indexing specific metrics.
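
These parameters can be combined. For example, to fetch only the counters and timers from the node and JVM groups, in JSON (again assuming a local Solr instance on the default port), we could run:

curl "http://localhost:8983/solr/admin/metrics?group=node,jvm&type=counter,timer&wt=json"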

Let’s now look at some example responses from Solr, so we know what to expect.

For example, to retrieve minor merge related metrics we could run:

http://localhost:8983/solr/admin/metrics?group=all&type=all&prefix=INDEX.merge.minor

The response looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="metrics">
  <lst name="solr.jvm"/>
  <lst name="solr.jetty"/>
  <lst name="solr.node"/>
  <lst name="solr.core.data.shard1.replica1">
    <lst name="INDEX.merge.minor">
      <long name="count">2</long>
      <double name="meanRate">0.022729587366906834</double>
      <double name="1minRate">0.018095662944349354</double>
      <double name="5minRate">0.00588664962556148</double>
      <double name="15minRate">0.0021316664456582147</double>
      <double name="min_ms">1132.643211</double>
      <double name="max_ms">1415.985689</double>
      <double name="mean_ms">1253.221722818028</double>
      <double name="median_ms">1132.643211</double>
      <double name="stddev_ms">140.09224396740174</double>
      <double name="p75_ms">1415.985689</double>
      <double name="p95_ms">1415.985689</double>
      <double name="p99_ms">1415.985689</double>
      <double name="p999_ms">1415.985689</double>
    </lst>
    <lst name="INDEX.merge.minor.running">
      <int name="value">0</int>
    </lst>
    <lst name="INDEX.merge.minor.running.docs">
      <long name="value">0</long>
    </lst>
    <lst name="INDEX.merge.minor.running.segments">
      <int name="value">0</int>
    </lst>
  </lst>
</lst>
</response>

We can also look at directory related metrics like this:

http://localhost:8983/solr/admin/metrics?group=core&prefix=DIRECTORY&wt=json

The response, this time in JSON (because of wt=json in the request), should have a structure that looks as follows:

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "metrics":[
    "solr.core.gettingstarted.shard1.replica1",[
      "DIRECTORY.total.reads",[
        "count",44502443,
        "meanRate",266358.17622526025,
        "1minRate",111103.14910078928,
        "5minRate",101317.5852718284,
        "15minRate",43604.83723238706],
      "DIRECTORY.total.writes",[
        "count",28269374,
        "meanRate",169199.22568953407,
        "1minRate",65274.18238346925,
        "5minRate",63257.474658024876,
        "15minRate",27508.27281761715]]]}

Reporters

Finally, all the metrics that we discussed so far can be reported automatically by Solr. In Solr 6.4, the following out of the box reporters are available:

  • JMX reporter (class org.apache.solr.metrics.reporters.SolrJmxReporter)
  • SLF4J reporter (class org.apache.solr.metrics.reporters.SolrSlf4jReporter)
  • Graphite reporter (class org.apache.solr.metrics.reporters.SolrGraphiteReporter)
  • Ganglia reporter (class org.apache.solr.metrics.reporters.SolrGangliaReporter)

So instead of retrieving the data on demand, we can just let Solr report it to an external system or a log file. We include the appropriate configuration in the solr.xml file (in the metrics section) and that is all that needs to be done.

For example, to report all the query related metrics of the Solr node to Solr’s console log, we would include the following fragment in the solr.xml file:

<solr>
 <metrics>
  <reporter name="queryNodeReporter" registry="solr.node" class="org.apache.solr.metrics.reporters.SolrSlf4jReporter">
   <int name="period">60</int>
   <str name="prefix">example_query_stats</str>
   <str name="logger">CONSOLE</str>
   <str name="filter">QUERY</str>
  </reporter> 
 </metrics> 
</solr>

The above configuration tells Solr that we want a new reporter called queryNodeReporter to be registered (this name needs to be unique) and that we want it to report the node-level metrics (the solr.node registry) using the SolrSlf4jReporter, which writes them to a log. The data will be written to the CONSOLE logger every minute (the period is given in seconds) and will include only query related metrics (the filter value of QUERY). The information written to the console looks as follows:

2017-01-14 20:23:30.940 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.availableConnections, value=0
2017-01-14 20:23:30.940 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.leasedConnections, value=0
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.maxConnections, value=10000
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.pendingConnections, value=0
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=COUNTER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.running, count=0
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=METER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.completed, count=0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=METER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.submitted, count=0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=TIMER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.duration, count=0, min=0.0, max=0.0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second, duration_unit=milliseconds

Of course, we can define numerous reporters in solr.xml and mix reporter types. For example, we can have the metrics of the data collection cores reported to Graphite while the node-level query metrics go to a log file at the same time:

<solr>
 <metrics>
  <reporter name="queryNodeReporter" registry="solr.node" class="org.apache.solr.metrics.reporters.SolrSlf4jReporter">
   <int name="period">60</int>
   <str name="prefix">example_query_stats</str>
   <str name="logger">CONSOLE</str>
   <str name="filter">QUERY</str>
  </reporter>
  <reporter name="exampleGraphiteReporter" group="solr.core.data" class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
   <str name="host">graphite.my.host.com</str>
   <int name="port">9999</int>
   <int name="period">60</int>
  </reporter> 
 </metrics> 
</solr>

Which Solr metrics reporter to use

Now that we know that Solr 6.4 can report all these new metrics, you’ll need to decide where to send them. If you already have Ganglia or Graphite in your organization and you use them for monitoring other services, you may choose to continue using them.

While Ganglia and Graphite were sysadmins’ darlings roughly a decade ago, they have been showing their age. Thus, if you don’t use Ganglia or Graphite today, I strongly suggest you don’t set them up just to have Solr monitoring. There are better options. Similarly, if you need to set up any infrastructure or extra tooling just so you can monitor your Solr clusters, you may want to reconsider. That tends to be too costly in terms of people’s time and/or infrastructure, even though it rarely feels that way initially. You may want to use Sematext SPM for Solr instead. It is simple to install (you just need an agent to collect and ship metrics), comes pre-configured with various out of the box dashboards and alerts and, in short, lets you troubleshoot Solr performance issues very quickly. What’s more, SPM integrates with Logsene, a hosted log management and analysis service, which makes it even easier to have a single pane of glass for all your operations data – not just Solr, and not just metrics, but also metrics for other parts of your infrastructure and all your logs. SPM will soon be able to collect and show all the additional metrics introduced with Solr 6.4, so watch for an announcement from us.

Where to go from here?

In this post we covered the basics of the new Solr metrics. For more information about Solr metrics, visit the Solr CWiki pages.

If you are into metrics and monitoring like we are, you will like that Solr now exposes more of its metrics. If you only need this information from time to time for troubleshooting, the new API will come in handy for ad-hoc calls. For production monitoring, you should use a good Solr monitoring solution that lets you not only look at the numbers and pretty charts, but also trigger alerts and correlate metrics with logs coming from different sources, so that you can find, identify, troubleshoot, and fix problems faster.

To learn more about Solr ops, including its monitoring and metrics, consider Sematext’s Solr training, specifically the Intermediate Solr class that covers Solr monitoring.
