Sematext Solr AutoComplete: Introduction and Howto

Sematext Solr AutoComplete is an open-source Solr add-on that provides suggest-as-you-type functionality. In this post we’ll explain how to install it, load suggestions into the autocomplete collection/core, and run queries to get those suggestions back.

Why Sematext Solr AutoComplete?

Before we start, you might wonder how Sematext Solr AutoComplete differs from Solr’s Suggesters. The most important advantages of AutoComplete are:

  • query flexibility. For example, with built-in suggesters you can choose an implementation that allows for fuzzy matches (vashin can return washington) or one that matches infixes (wash can return the washington times), but you can’t have both. AutoComplete can do both at once (vashin can return the washington times)
  • ranking flexibility. Besides static boosts, you can boost based on word order (washington ti can return the washington times above time in washington) or completed words (new can return new york above newton). You can also group suggestions based on a field, for example to rank sponsored suggestions higher
  • it comes with a few tools that help you load suggestions into the collection/core used for autocomplete. You can load suggestions from a file, another index or via the DataImportHandler
  • it comes with a GUI component that uses AJAX to query the backend and can be attached to an HTML search form

Solr’s built-in suggesters are easier to maintain when it comes to upgrades and are potentially faster, depending on the selected implementation and number of suggestions that have to be queried. We suggest checking them out as well as Sematext Solr AutoComplete so you can choose what’s best for your use-case. In general, AutoComplete helps when you need more control over your suggestions, especially since it makes this customization easier via import tools and GUI code.

Installation

First, you’ll need to clone the AutoComplete repository and package it. At the time of this writing, the latest supported Solr version is 6.3. Make sure you have Java 8 and Maven installed, then run:

git clone https://github.com/sematext/solr-autocomplete.git
cd solr-autocomplete
mvn clean package

Once the build process is done, copy the AutoComplete jar to Solr’s installation:

cp target/st-AutoComplete-1.6.6.3.1-SNAPSHOT.jar /opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/

AutoComplete depends on Sematext Solr ReSearcher’s core jar. ReSearcher is another Solr add-on that complements and extends Solr’s built-in spellcheckers, much like AutoComplete does for Solr’s suggesters. We’ll explain ReSearcher in another post, but for now let’s treat it as a dependency:

git clone https://github.com/sematext/solr-researcher
cd solr-researcher/core
mvn clean package
cp target/st-ReSearcher-core-1.12.6.3.1-SNAPSHOT.jar /opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/

With all the jars in place, start Solr. Here it’s in Cloud mode, but it works with Master-Slave Solr as well:

/opt/solr-6.3.0/bin/solr start -c

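Create an AutoComplete collection (or core, if you’re not running SolrCloud). Here we’ll name it autocomplete, but it can be anything. The configuration path below is relative to the solr-autocomplete checkout, so run the command from that directory: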

/opt/solr-6.3.0/bin/solr create -c autocomplete -d ./solr/collection1/conf/

Index and query suggestions

Everything is now ready to load some suggestions. AutoComplete comes with a few example files that we can load with the FileLoader tool. In production, it’s likely that the tools you use for manual or automatic curation of suggestions will output to a file, so you may end up using FileLoader as more than just a test script:

cat example/exampledocs/just-phrases.txt | java -cp "/opt/solr-6.3.0/dist/*:/opt/solr-6.3.0/server/lib/ext/*:/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/*" com.sematext.autocomplete.loader.FileLoader http://localhost:8983/solr/autocomplete

To ask for a suggestion, we’ll just run a query on the autocomplete collection with the prefix, while specifying the dismax_ac query handler:

curl 'localhost:8983/solr/autocomplete/select?q=new&qt=dismax_ac&indent=true'
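If you’d rather call the same endpoint from application code, the request can be sketched in Python as shown here. This is a minimal sketch, not part of AutoComplete: suggest_url and fetch_suggestions are hypothetical helper names, and the code assumes Solr’s standard JSON response layout (wt=json, with documents under response.docs) and that each suggestion is stored in a field called phrase. Adjust both to match your schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr listens on localhost:8983 and the collection is named
# "autocomplete", as in the commands in this post.
BASE_URL = "http://localhost:8983/solr/autocomplete"


def suggest_url(prefix, base_url=BASE_URL, **options):
    """Build the /select URL for a suggestion query (hypothetical helper)."""
    params = {"q": prefix, "qt": "dismax_ac"}
    params.update(options)  # extra request parameters, e.g. wt="json"
    return f"{base_url}/select?{urlencode(params)}"


def fetch_suggestions(prefix, field="phrase"):
    """Return the suggested phrases for a prefix.

    Assumes the standard Solr JSON layout (response -> docs) and a
    suggestion field called "phrase"; both may differ in your setup.
    """
    with urlopen(suggest_url(prefix, wt="json")) as resp:
        body = json.load(resp)
    return [doc[field] for doc in body["response"]["docs"] if field in doc]
```

Calling suggest_url("new", indent="true") builds the same request as the curl command above, and fetch_suggestions("new") needs a running Solr with suggestions loaded.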

Tweaking queries and ranking

From the previous query, you’ll normally get back newton and new york. You can boost completed words (i.e. the new york suggestion) by setting ac_matchFullWords:

curl 'localhost:8983/solr/autocomplete/select?q=new&qt=dismax_ac&ac_matchFullWords=true&indent=true'

Typos can be tolerated too, via ac_spellcheck:

curl 'localhost:8983/solr/autocomplete/select?q=nee&qt=dismax_ac&ac_spellcheck=true&indent=true'

Lastly, let’s make some suggestions sponsored:

$ cat example/exampledocs/phrases-sponsored.txt
phrase:First Item Here is_sponsored:false
phrase:Second Item Here is_sponsored:true
phrase:Here Item is_sponsored:false
$ cat example/exampledocs/phrases-sponsored.txt | java -cp "/opt/solr-6.3.0/dist/*:/opt/solr-6.3.0/server/lib/ext/*:/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/*" com.sematext.autocomplete.loader.FileLoader http://localhost:8983/solr/autocomplete

Now we can group them so that sponsored items come first:

curl 'localhost:8983/solr/autocomplete/select?q=ite&qt=dismax_ac&ac_grouping_field=is_sponsored&indent=true'
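To make the intended effect of grouping concrete, here is a small client-side illustration in Python of "sponsored first" ordering over the three example lines. It is only a sketch of the idea, not how AutoComplete implements grouping: the real ordering is computed by Solr, and the order inside each group depends on its scoring. parse_line and sponsored_first are hypothetical names, and the line format is assumed to be exactly as printed in the listing above.

```python
import re

# Assumption: each line looks like the listing in this post,
# "phrase:<text> is_sponsored:<true|false>".
LINE = re.compile(r"^phrase:(?P<phrase>.+?)\s+is_sponsored:(?P<sponsored>true|false)$")

EXAMPLE_LINES = [
    "phrase:First Item Here is_sponsored:false",
    "phrase:Second Item Here is_sponsored:true",
    "phrase:Here Item is_sponsored:false",
]


def parse_line(line):
    """Split one example line into (phrase, is_sponsored)."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    return match.group("phrase"), match.group("sponsored") == "true"


def sponsored_first(lines):
    """Order phrases so sponsored ones come first.

    sorted() is stable, so phrases keep their original relative order
    within the sponsored and non-sponsored groups.
    """
    parsed = [parse_line(line) for line in lines]
    ordered = sorted(parsed, key=lambda item: not item[1])
    return [phrase for phrase, _ in ordered]
```

With the example lines, sponsored_first(EXAMPLE_LINES) puts "Second Item Here" ahead of the two non-sponsored phrases.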

Next steps

Once you get AutoComplete’s queries working as you wish, you can build the front end with the HTML and JS examples that ship with AutoComplete. More details can be found in the GitHub README, and the result should be similar to the autocomplete on our search sites: search-lucene.com, search-hadoop.com and search-devops.com.

 


More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.
