Solr New Metrics API: Quick look at Solr 6.4

As you know, at Sematext we looooove logs and metrics and we enjoy playing with them on a daily basis. We have our Logsene, which is all about logs and their analysis. We have our SPM, which can monitor half of the world when it comes to performance metrics, give you a detailed view of class-level performance with transaction tracing, and a complete view of your environment with the components view. And that’s not all.

Because we enjoy working with metrics, we decided to take a look at the shiny, new Solr metrics API that came out with the recently released Solr 6.4. It comes with the promise of providing a convenient way to fetch data in various formats. It also makes it possible to report metrics to external systems. As metrics aficionados and Solr users ourselves, we are thrilled to have this at our disposal and jumped on the opportunity to play around with Solr’s new metrics goodness. Let’s start with some history on how it used to be in “the good old days”.

Back in the day…

A long, long time ago, in the Apache country ruled by king Solr and queen Lucene, there was a messenger called JMX. Ok, wrong story. To make the long story short – until now, the only thing you could do to fetch metrics from Solr was to use JMX. If you wanted to see what was happening inside Solr, you had to either use the Solr admin panel and look at what you were interested in, or use a Java tool that could read those metrics. Actually, any tool that is able to read data from JMX, for example jconsole, which ships with the JDK, can be used to fetch metrics from Solr.


This gives us some visibility, but there are issues with such an approach. The main problem is that we don’t have a full view of the historical data. We see cumulative statistics or changes since the time we started looking at the metrics, but if we wanted to see what was happening one hour ago, we couldn’t do that, at least not out of the box.

Of course, those obstacles can be overcome by using monitoring tools for Solr. We can either use something open source like Ganglia and configure everything from scratch using information available out in the wild places of the Internet or we can use something that does everything for us and we only need to install an agent, as is the case with SPM from Sematext.


However, if you are allergic to any kind of UI that helps you understand your metrics, you were stuck with JMX and tools that could fetch metrics from it. Until now.

Metrics in Solr – the new kid on the block

So what can we do now? Well, with SOLR-4735 Solr metrics evolved. Metrics are no longer limited to being exposed for polling via JMX. Instead, Solr can now also push metrics to external destinations or to a file, and it exposes a nice REST API and an extended set of metrics. What’s more, the extended metrics are no longer only about Solr components – you can also retrieve JVM-level information or Jetty and HTTP related statistics. Sounds fun if you ask me!

However, this is not all. Solr can now export metrics to other systems, and that functionality is available out of the box. Yes, apart from the standard JMX reporting, we can write metrics to a log file using the SLF4J reporter or even send metrics to Ganglia or Graphite. All of that is available out of the box, without any special plugins or additions.

Finally, one last thing – the API. Once you know what you want to fetch, it is as easy as pointing your browser or curl command to /admin/metrics, providing the type and group of metrics you would like to retrieve, and Solr will return all the requested information.  

Next, let’s dig a bit deeper and see what we need to do to configure all of that and make it work.

Types of meters

The improvements around Solr metrics are based on DropWizard metrics – a Java library supporting easy metrics reporting and different kinds of meters. The ones used by Solr are:

  • counters
  • gauges
  • histograms
  • meters
  • timers

The counters are there to measure simple counts – for example, the number of queries processed by a request handler. The gauges let us check the current value of something – for example, the length of a queue or the number of pending documents. Histograms let Solr calculate the approximate distribution of certain values and provide statistics around them – for example, a histogram can give us percentiles for query response times. Meters provide rates of events, like the average number of query requests per second over 1, 5 and 15 minute periods. Finally, the timers measure the duration of certain events, combining a rate meter with a duration histogram.
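
To make these meter types a bit more concrete, here is a minimal, hypothetical sketch that uses the DropWizard metrics library directly – the same library Solr builds on. The metric names are invented for illustration and are not actual Solr metric names:

import com.codahale.metrics.Counter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class MeterTypesSketch {
  public static void main(String[] args) throws Exception {
    MetricRegistry registry = new MetricRegistry();

    // counter – a simple count, e.g. the number of queries a handler processed
    Counter queries = registry.counter("handler.requests");
    queries.inc();

    // gauge – the current value of something, e.g. the number of pending documents
    registry.register("updateHandler.pendingDocs", (Gauge<Integer>) () -> 42);

    // histogram – approximate distribution of values, e.g. response sizes
    Histogram sizes = registry.histogram("response.sizes");
    sizes.update(512);

    // meter – rate of events with 1/5/15 minute moving averages
    Meter rate = registry.meter("handler.requestRate");
    rate.mark();

    // timer – duration of events: a rate meter plus a duration histogram
    Timer timer = registry.timer("handler.requestTimes");
    try (Timer.Context ctx = timer.time()) {
      Thread.sleep(10); // the work being timed
    }

    System.out.println("p99 duration (ns): " + timer.getSnapshot().get99thPercentile());
  }
}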

Solr metric groups

Solr uses the above-mentioned meter types to gather data from its various components, categorizing the metrics into the following groups:

  • Core information (solr.core.<collection_name> group, each for one core)
  • HTTP information (solr.http group)
  • Jetty information (solr.jetty group)
  • JVM information (solr.jvm group)
  • Solr node information (solr.node group)

More metrics are coming in the future, like shard leader metrics, cluster node metrics and aggregated per-core metrics; however, these are not yet available in Solr 6.4.

So what can we get from those metrics? The core information group gives us metrics about request handlers (counters, timers, errors, timeouts, etc.), index-related events (meters and gauges for merges, both current and historical, plus deleted documents information), and Lucene Directory I/O (read and write meters and histograms, split per file type – like doc values or the terms dictionary). In the future, we will also see shard replication, transaction log and cache information here.

The HTTP group provides information regarding available, open and pending connections for the shard handler and the update handler.

The Jetty related group gives us a view of a few very important statistics related to the servlet container in which Solr runs. We can look at thread and thread pool usage, connection and request timers, and meters for various HTTP response codes – like the number of requests with a 200 response code, the number of 400s or 500s, and so on.

The very interesting JVM group gives us low-level information on what is happening inside Solr and inside the JVM itself. We get direct and mapped buffer pool information, class loading and unloading information, garbage collection meters and timers, operating system memory, CPU time, file descriptor counts, system load and swap usage (by the way, remember that the JVM hates having its memory swapped – it reacts with poor performance).

Finally, the node related group provides information about the Solr node itself, such as the number of cores that are loaded on the node (as well as lazy and unloaded ones), information about request handlers, collections, configuration sets and so on.

Solr Metrics API

One of the possible methods of retrieving Solr metrics is the Metrics API. The admin/metrics endpoint provides access to all the metrics that we can get. We just send an HTTP GET request to that endpoint and let Solr know what we are interested in via the available parameters (a combined example follows the list):

  • group – comma-separated metric group names we are interested in, or all if we are interested in all of them, for example node,jvm.
  • type – comma-separated types of metrics that we would like to retrieve, or all if we are interested in all types. Possible values are the same as the meter types we already mentioned: counter, gauge, histogram, meter and timer.
  • prefix – a filter that returns only metrics whose names start with the specified prefix. For example, prefix=INDEX will return only indexing-specific metrics.
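
For instance, to fetch only counters and gauges from the JVM and node groups, the request could look like this (the parameter names are the ones listed above; this particular combination is just an illustration):

http://localhost:8983/solr/admin/metrics?group=jvm,node&type=counter,gauge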

Let’s now look at some example responses from Solr, so we know what to expect.

For example, to retrieve minor merge related metrics we could run:

http://localhost:8983/solr/admin/metrics?group=all&type=all&prefix=INDEX.merge.minor

The response looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="metrics">
  <lst name="solr.jvm"/>
  <lst name="solr.jetty"/>
  <lst name="solr.node"/>
  <lst name="solr.core.data.shard1.replica1">
    <lst name="INDEX.merge.minor">
      <long name="count">2</long>
      <double name="meanRate">0.022729587366906834</double>
      <double name="1minRate">0.018095662944349354</double>
      <double name="5minRate">0.00588664962556148</double>
      <double name="15minRate">0.0021316664456582147</double>
      <double name="min_ms">1132.643211</double>
      <double name="max_ms">1415.985689</double>
      <double name="mean_ms">1253.221722818028</double>
      <double name="median_ms">1132.643211</double>
      <double name="stddev_ms">140.09224396740174</double>
      <double name="p75_ms">1415.985689</double>
      <double name="p95_ms">1415.985689</double>
      <double name="p99_ms">1415.985689</double>
      <double name="p999_ms">1415.985689</double>
    </lst>
    <lst name="INDEX.merge.minor.running">
      <int name="value">0</int>
    </lst>
    <lst name="INDEX.merge.minor.running.docs">
      <long name="value">0</long>
    </lst>
    <lst name="INDEX.merge.minor.running.segments">
      <int name="value">0</int>
    </lst>
  </lst>
</lst>
</response>

We can also look at directory related metrics like this:

http://localhost:8983/solr/admin/metrics?group=core&prefix=DIRECTORY&wt=json

The response, this time in JSON (because of wt=json in the request) should have the structure that looks as follows:

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "metrics":[
    "solr.core.gettingstarted.shard1.replica1",[
      "DIRECTORY.total.reads",[
        "count",44502443,
        "meanRate",266358.17622526025,
        "1minRate",111103.14910078928,
        "5minRate",101317.5852718284,
        "15minRate",43604.83723238706],
      "DIRECTORY.total.writes",[
        "count",28269374,
        "meanRate",169199.22568953407,
        "1minRate",65274.18238346925,
        "5minRate",63257.474658024876,
        "15minRate",27508.27281761715]]]}

Reporters

Finally, all the metrics that we discussed so far can be reported automatically by Solr. In Solr 6.4 the following out of the box reporters are available:

  • JMX reporter (class org.apache.solr.metrics.reporters.SolrJmxReporter)
  • SLF4J reporter (class org.apache.solr.metrics.reporters.SolrSlf4jReporter)
  • Graphite reporter (class org.apache.solr.metrics.reporters.SolrGraphiteReporter)
  • Ganglia reporter (class org.apache.solr.metrics.reporters.SolrGangliaReporter)

So instead of retrieving the data on demand, we can just let Solr report it to an external system or a log file. We include the appropriate configuration in the solr.xml file (in the metrics section) and that is all that needs to be done.

For example, to write all the query-related Solr node metrics to Solr’s console output, we would include the following fragment in the solr.xml file:

<solr>
 <metrics>
  <reporter name="queryNodeReporter" registry="solr.node" class="org.apache.solr.metrics.reporters.SolrSlf4jReporter">
   <int name="period">60</int>
   <str name="prefix">example_query_stats</str>
   <str name="logger">CONSOLE</str>
   <str name="filter">QUERY</str>
  </reporter> 
 </metrics> 
</solr>

The above configuration tells Solr that we want a new reporter called queryNodeReporter to be registered (this name needs to be unique), that it should look at the solr.node registry, and that it should report using the SolrSlf4jReporter, so the metrics end up in a log. The data will be written to the console every minute (the period, in seconds) and will include only query-related metrics (the filter value of QUERY). The information written to the console looks as follows:

2017-01-14 20:23:30.940 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.availableConnections, value=0
2017-01-14 20:23:30.940 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.leasedConnections, value=0
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.maxConnections, value=10000
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=GAUGE, name=QUERY.httpShardHandler.pendingConnections, value=0
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=COUNTER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.running, count=0
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=METER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.completed, count=0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=METER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.submitted, count=0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second
2017-01-14 20:23:30.941 INFO  (metrics-logger-reporter-1-thread-1) [   ] CONSOLE type=TIMER, name=QUERY.httpShardHandler.threadPool.httpShardExecutor.duration, count=0, min=0.0, max=0.0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second, duration_unit=milliseconds

Of course, we can have numerous reporters defined in solr.xml and we can mix reporter types. For example, we can have the query-related node metrics written to the log and, at the same time, the core metrics for the data collection sent to Graphite:

<solr>
 <metrics>
  <reporter name="queryNodeReporter" registry="solr.node" class="org.apache.solr.metrics.reporters.SolrSlf4jReporter">
   <int name="period">60</int>
   <str name="prefix">example_query_stats</str>
   <str name="logger">CONSOLE</str>
   <str name="filter">QUERY</str>
  </reporter>
  <reporter name="exampleGraphiteReporter" group="solr.core.data" class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
   <str name="host">graphite.my.host.com</str>
   <int name="port">9999</int>
   <int name="period">60</int>
  </reporter> 
 </metrics> 
</solr>

Which Solr metrics reporter to use

Now that we know that Solr 6.4 can report all these new metrics, you’ll need to decide where to send them. If you already have Ganglia or Graphite in your organization and you use them for monitoring other services, you may choose to continue using them.

While Ganglia and Graphite were sysadmins’ darlings roughly a decade ago, they have been showing their age. Thus, if you don’t use Ganglia or Graphite today, I strongly suggest you don’t set them up just to have Solr monitoring. There are better options. Similarly, if you need to set up any infrastructure or extra tooling just so you can monitor your Solr clusters, you may want to reconsider. That tends to be too costly in terms of people’s time and/or infrastructure, although initially it almost never feels that way.

You may want to use Sematext SPM for Solr instead. It’s simple to install (you just need an agent to collect and ship metrics), pre-configured with various out of the box dashboards and alerts and, in short, it lets you troubleshoot Solr performance issues very quickly. What’s more, SPM integrates with Logsene, a hosted log management and analysis service, which makes it even easier to have a single pane of glass for all your operations data – not just Solr, not just metrics, but also metrics for other parts of your infrastructure and other logs. SPM will soon be able to collect and show all the additional metrics introduced with Solr 6.4, so watch for an announcement from us.

Where to go from here?

In this post we covered the basics of the new Solr metrics. For more information about Solr metrics, visit the Solr CWiki pages.

If you are into metrics and monitoring like we are, you will like that the new Solr exposes more of its metrics. If you only need this information from time to time for troubleshooting, the new API will come in handy for ad-hoc calls. For production monitoring, you should use a good Solr monitoring solution that lets you not only look at the numbers and pretty charts, but also trigger alerts and correlate metrics with logs coming from different sources, so that you can find, identify, troubleshoot, and fix problems faster.

To learn more about Solr ops, including its monitoring and metrics, consider Sematext’s Solr training, specifically the Intermediate Solr class that covers Solr monitoring.
