Migrating to SolrCloud from Solr Master-Slave

More and more organizations are looking for fault-tolerant, highly available solutions for every part of their infrastructure, including search, which has evolved from a “nice to have” feature into a first-class citizen and a “must have” element.

Apache Solr is a mature search solution that has been available for over a decade now. Its traditional master-slave deployment has been available since 2006, while the fully distributed deployment known as SolrCloud has been available for only a few years. Naturally, many organizations are in the process of migrating from Solr master-slave to SolrCloud, or are at least thinking about the move. In this article, we give you an overview of what needs to be done to make the migration to SolrCloud as smooth as possible.

Step One: Hardware

The first thing you need to think about when migrating from an old master-slave environment is the hardware. You may wonder why the hardware should be different from what you have right now. There are a few reasons:

  • You want to keep the current production environment running while migrating to the new solution, to avoid service interruption. That also lets you test the new SolrCloud cluster and roll back to the old Solr master-slave setup if something goes awry. It is also a good idea to run performance tests before going to production, so separate hardware for the SolrCloud cluster is a very good idea if you can afford it.
  • You will want to prepare the infrastructure for your new search setup to handle the load for the next N months or even a few years, so you don’t have to change the architecture or infrastructure again in the near future.
  • If your data changes rapidly, keep in mind that with SolrCloud the data is indexed on all nodes at the same time, not only on the master servers. This may require more resources on each node, including more disk I/O. Because indexing is continuous, every document is indexed on the replicas as well, so account for that load.
  • If you plan to have replicas, keep in mind that they perform the same operations as the shard leaders, so more replicas mean not only more storage but also more CPU and memory.
  • Finally, you will need a working ZooKeeper ensemble. SolrCloud uses ZooKeeper to store its configuration and cluster state; we discuss this in the next step.
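To make the sizing points above concrete, here is a rough back-of-the-envelope sketch for per-node disk space. It assumes the index splits evenly across shards and that shard copies end up spread evenly across nodes. The function name and the `merge_headroom` factor (extra room for Lucene segment merges, defaulting to a hypothetical 2.0) are illustrative assumptions, not Solr settings. Note that in SolrCloud the `replicationFactor` counts every copy of a shard, leader included.

```python
import math


def estimate_disk_per_node_gb(
    index_size_gb: float,
    num_shards: int,
    replication_factor: int,
    num_nodes: int,
    merge_headroom: float = 2.0,  # hypothetical safety factor; tune for your data
) -> float:
    """Rough per-node disk estimate for one SolrCloud collection.

    Assumes an even split across shards and an even spread of shard
    copies across nodes. This is a sketch, not a Solr formula.
    """
    if min(num_shards, replication_factor, num_nodes) < 1:
        raise ValueError("shards, replication factor and nodes must be >= 1")
    shard_size_gb = index_size_gb / num_shards
    # replicationFactor counts every copy of a shard, leader included.
    total_copies = num_shards * replication_factor
    # The busiest node holds ceil(total_copies / num_nodes) shard copies.
    copies_on_busiest_node = math.ceil(total_copies / num_nodes)
    return shard_size_gb * copies_on_busiest_node * merge_headroom
```

For example, a 100 GB index split into 4 shards with a replication factor of 2 on 4 nodes gives `estimate_disk_per_node_gb(100, 4, 2, 4)`, or 100.0 GB per node. Every extra copy also indexes documents itself, so indexing CPU and disk I/O grow with the replication factor in much the same way.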

Keep all of the above in mind when choosing your final hardware. Otherwise you may soon need to adjust your provisioned machines, and that requires the thing we tend to have the least of: time.

Step Two: Solr Setup

Setting up SolrCloud is a bit different from setting up a Solr master-slave architecture. First of all, you will need a working Apache ZooKeeper (http://zookeeper.apache.org/) ensemble. SolrCloud uses ZooKeeper to store collection configurations and collection state, to keep track of nodes, for leader election, and so on. In general, ZooKeeper is critical for SolrCloud: it is the heart of a SolrCloud cluster. When ZooKeeper is not available, no indexing operation will succeed. Some queries may still succeed, but only until something changes in the cluster, such as a node going down.

[Figure: SolrCloud architecture]

To set up a highly available and fault-tolerant ZooKeeper ensemble, you need at least 3 instances. That allows one ZooKeeper instance to go down while the ensemble keeps running. The basic rule is that a majority of the ensemble, more than half of its nodes, must be operational for ZooKeeper to work properly. So with 3 nodes you need at least 2, with 5 nodes you need at least 3, and so on.
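The majority rule is simple integer arithmetic. The sketch below, with hypothetical helper names, computes the quorum for an ensemble of n servers as floor(n/2) + 1 and how many server failures the ensemble can survive:

```python
def quorum_size(ensemble_size: int) -> int:
    """Servers that must be up for the ensemble to work: a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a quorum is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

Running it shows why ensembles are usually sized with an odd number of servers: 4 servers need a quorum of 3 and still survive only 1 failure, the same as 3 servers.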

It is also a very good idea to point every SolrCloud instance to all ZooKeeper nodes, not just one of them, so that a Solr node can still reach the ensemble when a single ZooKeeper server is down. That means the ZK_HOST property in your solr.in.sh will look something like this:

ZK_HOST=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
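If you generate solr.in.sh from an inventory of hosts, a small helper like the hypothetical `build_zk_host` below can assemble the value and refuse to emit a single-server connection string. It is a sketch: it assumes the default ZooKeeper client port 2181 when none is given, handles only hostnames and IPv4 addresses, and treats the three-server minimum as a policy choice rather than something Solr enforces.

```python
from typing import Sequence


def build_zk_host(servers: Sequence[str], default_port: int = 2181,
                  min_servers: int = 3) -> str:
    """Join ZooKeeper servers into a comma-separated ZK_HOST value.

    Entries may be 'host' or 'host:port'; IPv6 literals are not handled.
    """
    entries = []
    for server in servers:
        host, sep, port = server.strip().partition(":")
        if not host:
            raise ValueError(f"missing host in {server!r}")
        entries.append(f"{host}:{int(port) if sep else default_port}")
    # Listing every ensemble member lets the ZooKeeper client switch to
    # another server when the one it is connected to goes down.
    if len(set(entries)) < min_servers:
        raise ValueError(f"expected at least {min_servers} distinct ZooKeeper servers")
    return ",".join(entries)


# Prints: ZK_HOST=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
print("ZK_HOST=" + build_zk_host(["10.0.0.1", "10.0.0.2:2181", "10.0.0.3"]))
```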

Of course, you may wonder why we need a standalone ZooKeeper ensemble when SolrCloud provides an embedded ZooKeeper, started when Solr is run with the -c switch and without ZK_HOST specified (or without the -z switch). At the time of this writing, the embedded ZooKeeper was not designed with production deployments in mind. One reason is that it can’t be used in a distributed mode, and having a single ZooKeeper instance running inside the same JVM as a SolrCloud node is asking for trouble. Imagine the JVM running out of memory or the SolrCloud node being restarted: the embedded ZooKeeper would go down as well, which means the cluster would lose its heart for some time. This is something we want to avoid.

Step Three: Migrating the Configuration

The next step in your migration from Solr master-slave to SolrCloud is preparing the configuration files. There are at least two files you need to take care of: schema.xml and solrconfig.xml.

Note that additional files may also be required, or may need to be removed, depending on the configuration changes. We think removing all unneeded configuration files is a good idea, because it avoids confusion in the future.
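Part of this review can be automated. The sketch below checks a few requirements that the Solr Reference Guide lists for SolrCloud: a `_version_` field in the schema, an `<updateLog>` inside `<updateHandler>`, and no explicit `master` or `slave` sections on the ReplicationHandler, which would interfere with SolrCloud's own replication. The function names are hypothetical and the checks are deliberately incomplete; verify them against the reference guide for your Solr version.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_schema(schema_xml: str) -> List[str]:
    """Report SolrCloud-related problems found in a schema.xml document."""
    root = ET.fromstring(schema_xml)
    problems = []
    # iter() finds <field> elements whether or not they sit inside <fields>.
    field_names = {field.get("name") for field in root.iter("field")}
    if "_version_" not in field_names:
        problems.append("schema.xml: missing _version_ field")
    return problems


def check_solrconfig(solrconfig_xml: str) -> List[str]:
    """Report SolrCloud-related problems found in a solrconfig.xml document."""
    root = ET.fromstring(solrconfig_xml)
    problems = []
    if root.find(".//updateHandler/updateLog") is None:
        problems.append("solrconfig.xml: no <updateLog> inside <updateHandler>")
    for handler in root.iter("requestHandler"):
        if not handler.get("class", "").endswith("ReplicationHandler"):
            continue
        # Leftover master-slave settings should be removed for SolrCloud.
        for section in handler.findall("lst"):
            if section.get("name") in ("master", "slave"):
                problems.append(
                    f"solrconfig.xml: remove the '{section.get('name')}' section "
                    f"from {handler.get('name')}"
                )
    return problems
```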

One big thing to remember: if you are migrating to Solr 6, Java 8 is a must. Since support for Java 7 has ended, you should use a later version of Java everywhere.
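A pre-flight check on every node can catch an old JVM before Solr is installed. This sketch parses the output of `java -version`, which the JDK writes to standard error, and handles both the legacy `1.8.0_121` numbering used up to Java 8 and the `11.0.2` style used from Java 9 on. The helper names are hypothetical.

```python
import re


def java_major_version(version: str) -> int:
    """Major Java version from strings like '1.8.0_121' (Java 8) or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    # Java 8 and earlier report '1.<major>'; Java 9+ put the major version first.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def version_from_java_output(text: str) -> str:
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', text)
    if match is None:
        raise ValueError("no version string found")
    return match.group(1)


def meets_solr6_requirement(java_version_output: str) -> bool:
    """True if the reported JVM is Java 8 or newer."""
    return java_major_version(version_from_java_output(java_version_output)) >= 8
```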

More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.