Solr V2 API – Quick Look

We are all used to the Solr API that has been present in Solr from its beginnings. We send the data using the HTTP protocol, we include all parameters in the URL itself, and we are bound to that. Some people loved this, some not so much. Starting with Solr 6.5 we now have a new, self-documenting API called v2. Let’s look at this new API, how to use it, and how it differs from the old-fashioned Solr API.



Introducing the New Solr API

Let’s just immediately start working with the new API.  It’s probably the best way to learn about it.  Here’s the most basic request we can execute against the new Solr API:

$ curl http://localhost:8983/v2

The first thing you’ll notice is that the new API is not available under the usual Solr context – there is no /solr in the URL. Instead, we talk to it using the /v2 URI path. This lets a single Solr instance expose two separate sets of APIs and leaves room for new APIs introduced in the future. The response of the above call looks as follows:

{"responseHeader":{"status":0,"QTime":0},"collections":["gettingstarted"]} 

As we can see, the new API returns the same old standard response header and the list of collections that are present in the cluster. The call to the old API to get this same info looks like this:

$ curl 'http://localhost:8983/solr/admin/collections?action=LIST'

This time, the response is returned as XML, but the information is the same:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><arr name="collections"><str>gettingstarted</str></arr>
</response>

Of course, in both cases we can pretty-print the results by adding indent=true to the request, like this:

$ curl 'http://localhost:8983/v2?indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "collections":["gettingstarted"]}

We can also change the response type when using the old API, so that the returned response is very similar:

$ curl 'http://localhost:8983/solr/admin/collections?action=LIST&wt=json&indent=true'
 {
   "responseHeader":{
     "status":0,
     "QTime":0},
   "collections":["gettingstarted"]}
 
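The difference between the two calls is mostly in where the operation lives: in the old API it is a query parameter (action=LIST), while in v2 it is the path itself. A minimal Python sketch of that contrast, assuming a local Solr at localhost:8983 as in the examples above; the function names are hypothetical helpers made up for this illustration, and the code only builds URLs without contacting Solr:

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance, as in the curl examples above.
BASE = "http://localhost:8983"


def legacy_list_collections_url(base=BASE, **params):
    """Old-style API: the operation is itself a query parameter (action=LIST)."""
    query = {"action": "LIST", **params}
    return f"{base}/solr/admin/collections?{urlencode(query)}"


def v2_list_collections_url(base=BASE, **params):
    """v2 API: the path identifies the resource; query parameters are optional."""
    url = f"{base}/v2"
    return f"{url}?{urlencode(params)}" if params else url


# Print the same URLs that the curl commands above use.
print(legacy_list_collections_url(wt="json", indent="true"))
print(v2_list_collections_url(indent="true"))
```

Either URL can then be passed to curl or to any HTTP client.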

So, why is that different?

First things first – the new API is self-documenting. That means that we can get the list of information and options we have when using the new API. By adding the _introspect endpoint to any v2 API call we can get the list of possible operations for that endpoint. For example:

$ curl 'http://localhost:8983/v2/collections/_introspect?indent=true'

Or even better, we can use c instead of collections to shorten the call to look as follows:

$ curl 'http://localhost:8983/v2/c/_introspect?indent=true'

The response returned by Solr is rather large, so we’ll just show a portion of that, but you can see that the API contains not only the response with the data we are looking for, but also some additional descriptions which make the API self-documenting:

{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "spec":[{
      "documentation":"https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api6",
      "description":"Deletes a collection.",
      "methods":["DELETE"],
      "url":{"paths":["/collections/{collection}",
          "/c/{collection}"]}},
    {
      "documentation":"https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1",
      "description":"Create collections and collection aliases, backup or restore collections, and delete collections and aliases.",
      "methods":["POST"],
      "url":{"paths":["/collections",
          "/c"]},
.
.
.

As you can already tell, the new v2 API is more modern and most of the parameters are sent in the request body instead of the URI. Once the new v2 API covers all the functionality of the old API, SolrJ and the Solr admin will start using the new API, and after that it is expected that the old API will be deprecated and then removed. Because of that it might be a good idea to start getting used to the new API right away, so you have an easier learning curve and faster adoption when you finally decide to move to the new way of talking to Solr.

V2 Solr API Capabilities

The response returned by the commands that we’ve seen above is large, so I encourage you to check the response yourself. What I would like to do is provide you with a brief description of what can be done using the v2 API:

  • Creating, deleting and managing collections
  • Creating aliases, backing up and restoring collections
  • Sending data
  • Updating collection configuration
  • Managing schema and managed resources
  • Using request handlers – for example running search requests
  • Adding and removing replicas
  • Managing cores
  • Performing overseer operations
  • Managing node roles
  • Setting cluster properties
  • Uploading and downloading blobs and metadata

As you can see we can already do lots of things with the new API and because the API is self-documenting we can quickly, without searching for the documentation, see how to work with it. For example, if we wanted to see what we can do with shards, we could run a command like this (we’ll use one of the out of the box collections that come with Solr called gettingstarted):

$ curl 'localhost:8983/v2/c/gettingstarted/shards/_introspect?indent=true'

The response shows us what we can do with the “/shards” API:

{
  "spec":[{
      "documentation":"https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7",
      "description":"Deletes a shard by unloading all replicas of the shard, removing it from clusterstate.json, and by default deleting the instanceDir and dataDir. Only inactive shards or those which have no range for custom sharding will be deleted.",
      "methods":["DELETE"],
      "url":{
        "paths":["/collections/{collection}/shards/{shard}",
          "/c/{collection}/shards/{shard}"],
        "params":{
          "deleteInstanceDir":{
            "type":"boolean",
            "description":"By default Solr will delete the entire instanceDir of each replica that is deleted. Set this to false to prevent the instance directory from being deleted."},
          "deleteDataDir":{
            "type":"boolean",
            "description":"y default Solr will delete the dataDir of each replica that is deleted. Set this to false to prevent the data directory from being deleted."},
          "async":{
            "type":"string",
            "description":"Defines a request ID that can be used to track this action after it's submitted. The action will be processed asynchronously when this is defined. This command can be long-running, so running it asynchronously is recommended."}}}},
    {
      "documentation":"https://cwiki.apache.org/confluence/display/solr/Collections+API",
      "description":"Allows you to create a shard, split an existing shard or add a new replica.",
      "methods":["POST"],
      "url":{"paths":["/collections/{collection}/shards",
          "/c/{collection}/shards"]},
      "commands":{
        "split":{
          "type":"object",
          "documentation":"https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3",
          "description":"Splits an existing shard into two or more new shards. During this action, the existing shard will continue to contain the original data, but new data will be routed to the new shards once the split is complete. New shards will have as many replicas as the existing shards. A soft commit will be done automatically. An explicit commit request is not required because the index is automatically saved to disk during the split operation. New shards will use the original shard name as the basis for their names, adding an underscore and a number to differentiate the new shard. For example, 'shard1' would become 'shard1_0' and 'shard1_1'. Note that this operation can take a long time to complete.",
          "properties":{
            "shard":{
              "type":"string",
              "description":"The name of the shard to be split."},
            "ranges":{
              "description":"A comma-separated list of hexadecimal hash ranges that will be used to split the shard into new shards containing each defined range, e.g. ranges=0-1f4,1f5-3e8,3e9-5dc. This is the only option that allows splitting a single shard into more than 2 additional shards. If neither this parameter nor splitKey are defined, the shard will be split into two equal new shards.",
              "type":"string"},
            "splitKey":{
              "description":"A route key to use for splitting the index. If this is defined, the shard parameter is not required because the route key will identify the correct shard. A route key that spans more than a single shard is not supported. If neither this parameter nor ranges are defined, the shard will be split into two equal new shards.",
              "type":"string"},
            "coreProperties":{
              "type":"object",
              "documentation":"https://cwiki.apache.org/confluence/display/solr/Defining+core.properties",
              "description":"Allows adding core.properties for the collection. Some examples of core properties you may want to modify include the config set, the node name, the data directory, among others.",
              "additionalProperties":true},
            "async":{
              "type":"string",
              "description":"Defines a request ID that can be used to track this action after it's submitted. The action will be processed asynchronously when this is defined. This command can be long-running, so running it asynchronously is recommended."}}},
        "create":{
          "type":"object",
          "properties":{
            "nodeSet":{
              "description":"Defines nodes to spread the new collection across. If not provided, the collection will be spread across all live Solr nodes. The names to use are the 'node_name', which can be found by a request to the cluster/nodes endpoint.",
              "type":"array",
              "items":{"type":"string"}},
            "shard":{
              "description":"The name of the shard to be created.",
              "type":"string"},
            "coreProperties":{
              "type":"object",
              "documentation":"https://cwiki.apache.org/confluence/display/solr/Defining+core.properties",
              "description":"Allows adding core.properties for the collection. Some examples of core properties you may want to modify include the config set, the node name, the data directory, among others.",
              "additionalProperties":true},
            "async":{
              "type":"string",
              "description":"Defines a request ID that can be used to track this action after it's submitted. The action will be processed asynchronously when this is defined."}},
          "required":["shard"]},
        "add-replica":{
          "documentation":"https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica",
          "description":"",
          "type":"object",
          "properties":{
            "shard":{
              "type":"string",
              "description":"The name of the shard in which this replica should be created. If this parameter is not specified, then '_route_' must be defined."},
            "_route_":{
              "type":"string",
              "description":"If the exact shard name is not known, users may pass the _route_ value and the system would identify the name of the shard. Ignored if the shard param is also specified. If the 'shard' parameter is also defined, this parameter will be ignored."},
            "node":{
              "type":"string",
              "description":"The name of the node where the replica should be created."},
            "instanceDir":{
              "type":"string",
              "description":"An optional custom instanceDir for this replica."},
            "dataDir":{
              "type":"string",
              "description":"An optional custom directory used to store index data for this replica."},
            "coreProperties":{
              "type":"object",
              "documentation":"https://cwiki.apache.org/confluence/display/solr/Defining+core.properties",
              "description":"Allows adding core.properties for the collection. Some examples of core properties you may want to modify include the config set and the node name, among others.",
              "additionalProperties":true},
            "async":{
              "type":"string",
              "description":"Defines a request ID that can be used to track this action after it's submitted. The action will be processed asynchronously when this is defined."}},
          "required":["shard"]}}},
    {
      "documentation":"https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1",
      "description":"Lists all collections, with details on shards and replicas in each collection.",
      "methods":["GET"],
      "url":{"paths":["/collections/{collection}",
          "/c/{collection}",
          "/collections/{collection}/shards",
          "/c/{collection}/shards",
          "/collections/{collection}/shards/{shard}",
          "/c/{collection}/shards/{shard}",
          "/collections/{collection}/shards/{shard}/{replica}",
          "/c/{collection}/shards/{shard}/{replica}"]}}],
  "WARNING":"This response format is experimental.  It is likely to change in the future.",
  "WARNING":"This response format is experimental.  It is likely to change in the future.",
  "WARNING":"This response format is experimental.  It is likely to change in the future.",
  "availableSubPaths":{
    "/c/gettingstarted/shards/{shard}/{replica}":["DELETE",
      "GET"],
    "/c/gettingstarted/shards/{shard}":["DELETE",
      "POST",
      "GET"]}}

As you can see, the API provides us with all the information about itself that we need – the HTTP verbs that we can use, the parameters that can be present, and finally their descriptions, so that we know what each parameter is all about. We can also get information about a given command and/or HTTP verb, for example:

$ curl 'http://localhost:8983/v2/c/gettingstarted/shards/shard2/_introspect?method=DELETE&indent=true'
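
Because the introspection output is plain JSON, it can also be consumed by a script. The following is a minimal Python sketch, assuming the response has the same shape as the ones shown above (a "spec" list whose entries carry "methods" and "url.paths"). The helper names are hypothetical, the sample data is a trimmed-down copy of the /c/_introspect excerpt shown earlier, and nothing is sent to Solr:

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance, as in the curl examples above.
BASE = "http://localhost:8983/v2"


def introspect_url(path, method=None, base=BASE):
    """Build an _introspect URL for a v2 path, optionally limited to one HTTP verb."""
    params = {}
    if method:
        params["method"] = method
    params["indent"] = "true"
    return f"{base}/{path.strip('/')}/_introspect?{urlencode(params)}"


def summarize_spec(response):
    """Map each HTTP method to the list of URL paths that accept it."""
    summary = {}
    for entry in response.get("spec", []):
        paths = entry.get("url", {}).get("paths", [])
        for method in entry.get("methods", []):
            summary.setdefault(method, []).extend(paths)
    return summary


# Trimmed-down excerpt of the /c/_introspect response shown earlier.
sample = {
    "spec": [
        {"description": "Deletes a collection.",
         "methods": ["DELETE"],
         "url": {"paths": ["/collections/{collection}", "/c/{collection}"]}},
        {"methods": ["POST"],
         "url": {"paths": ["/collections", "/c"]}},
    ]
}

print(introspect_url("c/gettingstarted/shards/shard2", method="DELETE"))
print(summarize_spec(sample))
```

Since the response itself warns that the format is experimental, a script like this should treat every key as optional, which is why the sketch uses .get() with defaults throughout.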

Judging from the response further above we could, for example, delete a replica by running the following command:

$ curl -XDELETE 'localhost:8983/v2/c/gettingstarted/shards/shard2/core_node3'

The response to the last command would look as follows:

{"responseHeader":{"status":0,"QTime":278},"success":{"192.168.1.15:7574_solr":{"responseHeader":{"status":0,"QTime":69}}}}

This means that the replica of shard2 has been removed, which can also be checked via the Solr admin panel:

[Image: Solr V2 - Solr admin panel]
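
The same replica deletion can be expressed outside of curl. Here is a minimal sketch using Python's standard urllib, assuming the local Solr instance and the collection, shard and replica names from the example above. The function name is a hypothetical helper, and the request is only constructed, not sent:

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local Solr instance, as in the curl examples above.
BASE = "http://localhost:8983/v2"


def delete_replica_request(collection, shard, replica, base=BASE):
    """Build (but do not send) a DELETE for /c/{collection}/shards/{shard}/{replica}."""
    # Percent-encode each path segment so unusual names cannot alter the path.
    segments = [quote(part, safe="") for part in (collection, shard, replica)]
    url = f"{base}/c/{segments[0]}/shards/{segments[1]}/{segments[2]}"
    return Request(url, method="DELETE")


req = delete_replica_request("gettingstarted", "shard2", "core_node3")
print(req.get_method(), req.full_url)

# To actually run it against a live Solr node, pass the request to
# urllib.request.urlopen(req) and read the JSON response.
```

Keeping construction separate from sending makes it easy to review a destructive call before it reaches the cluster.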

We can also add replicas using the new API, and this operation is a good illustration of how to pass parameters with the request. Let’s add a replica to shard2 by using the following command:

$ curl -XPOST 'localhost:8983/v2/c/gettingstarted/shards/' -H 'Content-type:application/json' -d '{
 "add-replica" : {
  "shard" : "shard2",
  "node" : "192.168.1.15:7574_solr"
 }
}'

We added the header identifying the content type of the body and we provided the add-replica command along with two parameters – shard and node. The shard parameter specifies which part of the collection we are interested in and the node parameter tells Solr on which Solr instance the replica should be created. Please note that the node name is not just the IP address; it also includes the port and the usual _solr suffix.

The response would look as follows:

{"responseHeader":{"status":0,"QTime":1329},"success":{"192.168.1.15:7574_solr":{"responseHeader":{"status":0,"QTime":1318},"core":"gettingstarted_shard2_replica2"}}}

And would result in a new replica being added:

[Image: Solr V2]
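
The add-replica call shows the general v2 pattern: the command name is the top-level key of a JSON body and its parameters are nested beneath it. A minimal Python sketch of that pattern, assuming the same local Solr instance and node name as above. Both function names are hypothetical helpers, the sample response is the one shown above, and the request is built without being sent:

```python
import json
from urllib.request import Request

# Assumption: a local Solr instance, as in the curl examples above.
BASE = "http://localhost:8983/v2"


def add_replica_request(collection, shard, node, base=BASE):
    """Build (but do not send) a POST carrying the add-replica command as JSON."""
    body = {"add-replica": {"shard": shard, "node": node}}
    return Request(
        f"{base}/c/{collection}/shards",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def created_cores(response):
    """Return {node: core name} for every node under 'success' that reports a core."""
    return {
        node: info["core"]
        for node, info in response.get("success", {}).items()
        if "core" in info
    }


req = add_replica_request("gettingstarted", "shard2", "192.168.1.15:7574_solr")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# The response shown above, parsed to find the newly created core.
sample_response = json.loads(
    '{"responseHeader":{"status":0,"QTime":1329},'
    '"success":{"192.168.1.15:7574_solr":{"responseHeader":{"status":0,"QTime":1318},'
    '"core":"gettingstarted_shard2_replica2"}}}'
)
print(created_cores(sample_response))
```

Other commands listed by _introspect, such as split or create, should fit the same shape by swapping the top-level key and its properties, although that is an inference from the spec shown earlier rather than something tested here.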

What’s Next

The API we just introduced is still a work in progress. A few things are still missing, but the v2 API is fairly new, so we can expect lots of changes in the next few Solr versions.

Want to learn more about Solr? Subscribe to our blog or follow @sematext. If you need any help with Solr / SolrCloud – don’t forget that we provide Solr Consulting, Solr Production Support, and offer Solr Training!


More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.
