Welcome!

Blog Feed Post

Migrating to SolrCloud from Solr Master-Slave

Nowadays there are more and more organizations searching for fault-tolerant and highly available solutions for various parts of their infrastructure, including search, which evolved from merely a “nice to have” feature to the first class citizen and a “must have” element.

Apache Solr is a mature search solution that has been available for over a decade now.  Its traditional master-slave deployment has been available since 2006, while the fully distributed deployment known as SolrCloud has been available for only a few years now. Thus, naturally, many organizations are in the process of migrating from Solr master-slave to SolrCloud, or are at least thinking about the move. In this article, we will give you an overview of what’s needed to be done for the migration to SolrCloud to be as smooth as it can be.

Step One: Hardware

The first thing that you need to think about when migrating from old master-slave environment is the hardware. You may wonder why the hardware should be different from what you have right now. There are a few reasons:

  • You want to keep the current production while migrating to new solution to avoid service interruption. Doing that will also let you test the new SolrCloud cluster and rollback to the old Solr master-slave if something goes awry. It is also a good idea to run various performance tests before going to production, so having a separate hardware for the SolrCloud cluster may be a very good idea if you can afford it.
  • You will want to prepare the infrastructure for your new search setup so that it can handle the load for the next N months or even a few years, so you don’t have to change the architecture or infrastructure again in the near future.
  • If your data is changing rapidly, with SolrCloud you will index the data to all nodes at the same time, not only to the master servers. This may require more resources on the nodes. More disk I/O will be needed. Because of constant data indexing, data will also be indexed on the replicas, so you need to account for that.
  • If you plan to have replicas, keep in mind they will do the same operations as the leader shards, so having more replicas means not only more storage requirements, but also more resources like CPU and memory.
  • Finally, you will need ZooKeeper ensemble to be working. SolrCloud uses ZooKeeper to make itself fully configurable. We talk about this in the next point.

Keep all of the above in mind when choosing your final hardware. If you don’t, you’ll run into situations where you will need to adjust your provisioned machines in the near future, and that requires the thing we tend to have the least of – time.

Step Two: Solr Setup

Setting up SolrCloud is a bit different than setting up Solr master-slave architecture. First of all, you will need a working Apache ZooKeeper (http://zookeeper.apache.org/) ensemble. SolrCloud uses ZooKeeper to store collections configuration, collections state, to keep track of nodes, for leader election, and so on. In general ZooKeeper is critical for SolrCloud – it’s like a heart of a SolrCloud cluster. When Zookeeper is not available no indexing operation will be successful, some queries may be – up to a point, where something happens to the cluster.

solrcloud-architecturehttps://sematext.com/wp-content/uploads/2016/12/SolrCloud-architecture-3... 300w, https://sematext.com/wp-content/uploads/2016/12/SolrCloud-architecture-7... 768w, https://sematext.com/wp-content/uploads/2016/12/SolrCloud-architecture.png 1116w" sizes="(max-width: 648px) 100vw, 648px" />

In order to setup a highly available and fault-tolerant ZooKeeper ensemble you need at least 3 instances. That allows for a single instance of ZooKeeper to go down and have the ensemble running. The basic idea is that you need to have at least 50% + 1 nodes to be operational in ZooKeeper ensemble for it to be running properly. So when you have 3 nodes, you need at least 2, when you have 5 nodes you need at least 3 and so on.

It is also a very good idea to point all SolrCloud instances to all  ZooKeeper nodes, not just one of them.. That will mean that the ZK_HOST property in your solr.in.sh will look something like this:

ZK_HOST=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181

Of course, you may wonder why we need a standalone ZooKeeper instance when SolrCloud provides embedded ZooKeeper version when run with -c switch without ZK_HOST specified (or without -z switch). At the time of this writing the embedded ZooKeeper has not been designed for production deployments in mind. One of the reasons is that it can’t be used in a distributed mode and having a single Zookeeper instance running inside the same JVM as SolrCloud node is asking for trouble. Imagine JVM going out of memory or SolrCloud node being restarted – the embedded ZooKeeper would go down as well, which means that the cluster would loose its heart for some time. This is something we want to avoid.

Step Three: Migrating the Configuration

The next step in your migration from Solr master-slave to SolrCloud will be preparation of the configuration files. There are at least two files – the schema.xml and solrconfig.xml that you need to take care of.

Note that there can also be additional files that might be required or may need to be removed, depending on the configuration changes. We think that removing all the unneeded configuration files is a good idea because that avoids confusion in the future.

One big thing to remember – if you are migrating to Solr 6, Java 8 is a must. Since support for Java 7 has ended, you should use a later version of Java everywhere. 

Read the original blog entry...

More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.

Latest Stories
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
We build IoT infrastructure products - when you have to integrate different devices, different systems and cloud you have to build an application to do that but we eliminate the need to build an application. Our products can integrate any device, any system, any cloud regardless of protocol," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA
SYS-CON Events announced today that Telecom Reseller has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
SYS-CON Events announced today that SourceForge has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. SourceForge is the largest, most trusted destination for Open Source Software development, collaboration, discovery and download on the web serving over 32 million viewers, 150 million downloads and over 460,000 active development projects each and every month.
Multiple data types are pouring into IoT deployments. Data is coming in small packages as well as enormous files and data streams of many sizes. Widespread use of mobile devices adds to the total. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists looked at the tools and environments that are being put to use in IoT deployments, as well as the team skills a modern enterprise IT shop needs to keep things running, get a handle on all this data, and deliver...
"We do one of the best file systems in the world. We learned how to deal with Big Data many years ago and we implemented this knowledge into our software," explained Jakub Ratajczak, Business Development Manager at MooseFS, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"Tintri focuses on the Ops side of the DevOps, which basically is pushing more and more of the accessibility of the infrastructure to the developers and trying to get behind the scenes," explained Dhiraj Sehgal of Tintri in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Cloud applications are seeing a deluge of requests to support the exploding advanced analytics market. “Open analytics” is the emerging strategy to deliver that data through an open data access layer, in the cloud, to be directly consumed by external analytics tools and popular programming languages. An increasing number of data engineers and data scientists use a variety of platforms and advanced analytics languages such as SAS, R, Python and Java, as well as frameworks such as Hadoop and Spark...
You know you need the cloud, but you’re hesitant to simply dump everything at Amazon since you know that not all workloads are suitable for cloud. You know that you want the kind of ease of use and scalability that you get with public cloud, but your applications are architected in a way that makes the public cloud a non-starter. You’re looking at private cloud solutions based on hyperconverged infrastructure, but you’re concerned with the limits inherent in those technologies.
The current age of digital transformation means that IT organizations must adapt their toolset to cover all digital experiences, beyond just the end users’. Today’s businesses can no longer focus solely on the digital interactions they manage with employees or customers; they must now contend with non-traditional factors. Whether it's the power of brand to make or break a company, the need to monitor across all locations 24/7, or the ability to proactively resolve issues, companies must adapt to...
Both SaaS vendors and SaaS buyers are going “all-in” to hyperscale IaaS platforms such as AWS, which is disrupting the SaaS value proposition. Why should the enterprise SaaS consumer pay for the SaaS service if their data is resident in adjacent AWS S3 buckets? If both SaaS sellers and buyers are using the same cloud tools, automation and pay-per-transaction model offered by IaaS platforms, then why not host the “shrink-wrapped” software in the customers’ cloud? Further, serverless computing, cl...
You know you need the cloud, but you’re hesitant to simply dump everything at Amazon since you know that not all workloads are suitable for cloud. You know that you want the kind of ease of use and scalability that you get with public cloud, but your applications are architected in a way that makes the public cloud a non-starter. You’re looking at private cloud solutions based on hyperconverged infrastructure, but you’re concerned with the limits inherent in those technologies.
SYS-CON Events announced today that Enzu will exhibit at SYS-CON's 21st Int\ernational Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Enzu’s mission is to be the leading provider of enterprise cloud solutions worldwide. Enzu enables online businesses to use its IT infrastructure to their competitive advantage. By offering a suite of proven hosting and management services, Enzu wants companies to focus on the core of their ...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
In his session at @ThingsExpo, Eric Lachapelle, CEO of the Professional Evaluation and Certification Board (PECB), provided an overview of various initiatives to certify the security of connected devices and future trends in ensuring public trust of IoT. Eric Lachapelle is the Chief Executive Officer of the Professional Evaluation and Certification Board (PECB), an international certification body. His role is to help companies and individuals to achieve professional, accredited and worldwide re...