Sematext Solr AutoComplete: Introduction and Howto

Sematext Solr AutoComplete is an open-source Solr add-on that provides suggest-as-you-type functionality. In this post we’ll explain how to install it, how to load the autocomplete collection/core with suggestions and how to run queries to get those suggestions back.

Why Sematext Solr AutoComplete?

Before we start, you might wonder how Sematext Solr AutoComplete is different from Solr’s Suggesters. The main advantages of AutoComplete are:

  • query flexibility. For example, with the built-in suggesters you can choose an implementation that allows fuzzy matches (vashin can return washington) or one that matches infixes (wash can return the washington times), but you can’t have both. AutoComplete can do both (vashin can return the washington times)
  • ranking flexibility. Besides static boosts, you can boost based on word order (washington ti can return the washington times above time in washington) or on completed words (new can return new york above newton). You can also group suggestions based on a field, for example to rank sponsored suggestions higher
  • it comes with a few tools that help you load suggestions into the collection/core used for autocomplete. You can load suggestions from a file, from another index or via the DataImportHandler
  • it comes with a GUI component: an AJAX widget that can be attached to an HTML search form to query the backend

Solr’s built-in suggesters are easier to maintain across upgrades and are potentially faster, depending on the selected implementation and the number of suggestions that have to be queried. We suggest evaluating them alongside Sematext Solr AutoComplete so you can choose what’s best for your use case. In general, AutoComplete helps when you need more control over how suggestions are matched and ranked, and its import tools and GUI code make that customization easier.

Installation

First, you’ll need to clone the AutoComplete repository and package it. At the time of this writing, the latest supported Solr version is 6.3. Make sure you have Java 8 and Maven installed, then run:

git clone https://github.com/sematext/solr-autocomplete.git
cd solr-autocomplete
mvn clean package

Once the build is done, copy the AutoComplete jar into Solr’s webapp lib directory:

cp target/st-AutoComplete-1.6.6.3.1-SNAPSHOT.jar /opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/

AutoComplete depends on the core jar of Sematext Solr ReSearcher. ReSearcher is another Solr add-on that complements and extends Solr’s built-in spellcheckers, much like AutoComplete does for Solr’s suggesters. We’ll explain ReSearcher in another post; for now, let’s treat it as a dependency:

cd ..
git clone https://github.com/sematext/solr-researcher
cd solr-researcher/core
mvn clean package
cp target/st-ReSearcher-core-1.12.6.3.1-SNAPSHOT.jar /opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/
cd ../../solr-autocomplete
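
Before starting Solr, it can help to confirm that both jars actually landed in the lib directory. Here is a minimal bash sketch; check_ac_jars is a hypothetical helper name, and its glob patterns simply match the jar names used above regardless of the version suffix.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: confirm both add-on jars were copied into a
# Solr webapp lib directory. The default path mirrors the Solr 6.3.0
# layout used in this post.
check_ac_jars() {
  local lib_dir="${1:-/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib}"
  local pattern missing=0
  for pattern in 'st-AutoComplete-*.jar' 'st-ReSearcher-core-*.jar'; do
    # compgen -G lists paths matching the glob and fails when none match.
    if ! compgen -G "$lib_dir/$pattern" > /dev/null; then
      echo "missing: $lib_dir/$pattern" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_ac_jars with no argument checks the default Solr 6.3.0 path; a non-zero exit status means at least one jar is missing.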

With both jars in place, start Solr. Here we start it in SolrCloud mode (-c), but AutoComplete works with master-slave Solr as well:

/opt/solr-6.3.0/bin/solr start -c

Create an AutoComplete collection (or a core, if you’re not running SolrCloud) from the solr-autocomplete directory, since the configuration path below is relative to it. Here we’ll name it autocomplete, but any name works:

/opt/solr-6.3.0/bin/solr create -c autocomplete -d ./solr/collection1/conf/

Index and query suggestions

Everything is now ready to load some suggestions. AutoComplete comes with a few example files that we can load with the FileLoader tool. In production, it’s likely that the tools you use for manual or automatic curation of suggestions will output to a file, so you may end up using FileLoader as more than just a test script:

cat example/exampledocs/just-phrases.txt | java -cp "/opt/solr-6.3.0/dist/*:/opt/solr-6.3.0/server/lib/ext/*:/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/*" com.sematext.autocomplete.loader.FileLoader http://localhost:8983/solr/autocomplete
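
Both loader commands in this post repeat the same long classpath, so a small wrapper may save some typing. This is a hedged bash sketch: load_suggestions and SOLR_INSTALL_DIR are hypothetical names. It quotes the classpath so the shell passes the * entries through untouched and the JVM expands them itself. Like the cat pipeline above, it feeds the file to FileLoader on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the FileLoader invocation shown above.
# SOLR_INSTALL_DIR is an assumed variable pointing at the Solr 6.3.0 install.
SOLR_INSTALL_DIR="${SOLR_INSTALL_DIR:-/opt/solr-6.3.0}"

# load_suggestions FILE [COLLECTION_URL]
load_suggestions() {
  local file="$1"
  local url="${2:-http://localhost:8983/solr/autocomplete}"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # Quoted so the shell never glob-expands the "*" entries; the JVM
  # expands a "dir/*" classpath entry to every jar in that directory.
  local cp="$SOLR_INSTALL_DIR/dist/*:$SOLR_INSTALL_DIR/server/lib/ext/*:$SOLR_INSTALL_DIR/server/solr-webapp/webapp/WEB-INF/lib/*"
  # FileLoader reads suggestions from stdin, as in the cat pipeline above.
  java -cp "$cp" com.sematext.autocomplete.loader.FileLoader "$url" < "$file"
}
```

With this in place, the command above becomes load_suggestions example/exampledocs/just-phrases.txt.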

To get suggestions, run a query for the typed prefix against the autocomplete collection, selecting the dismax_ac query handler via the qt parameter:

curl 'localhost:8983/solr/autocomplete/select?q=new&qt=dismax_ac&indent=true'
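
The prefix must be URL-encoded, which matters once it contains spaces, as with washington ti. Here is a minimal sketch, assuming bash and curl: the hypothetical suggest function wraps the dismax_ac query and lets curl’s -G and --data-urlencode options build the query string. SOLR_AC_URL is an assumed variable. Extra ac_* parameters, such as those in the next section, can be passed as NAME=VALUE arguments.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the dismax_ac query shown above.
# SOLR_AC_URL is an assumed variable; it defaults to the local collection.
SOLR_AC_URL="${SOLR_AC_URL:-http://localhost:8983/solr/autocomplete/select}"

# suggest PREFIX [NAME=VALUE ...]
# -G sends the --data-urlencode pairs as a GET query string, so a prefix
# with spaces (e.g. "washington ti") is percent-encoded correctly.
suggest() {
  local prefix="$1"
  shift
  local args=(-sS -G "$SOLR_AC_URL"
    --data-urlencode "q=${prefix}"
    --data-urlencode "qt=dismax_ac"
    --data-urlencode "indent=true")
  local param
  for param in "$@"; do
    args+=(--data-urlencode "$param")
  done
  curl "${args[@]}"
}
```

For example, suggest 'washington ti' or suggest new ac_matchFullWords=true.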

Tweaking queries and ranking

The previous query will typically return both newton and new york. To boost suggestions where the prefix is a completed word (here, new york), set ac_matchFullWords:

curl 'localhost:8983/solr/autocomplete/select?q=new&qt=dismax_ac&ac_matchFullWords=true&indent=true'

Typos can be tolerated too, by enabling ac_spellcheck, so that a misspelled prefix such as nee still returns suggestions:

curl 'localhost:8983/solr/autocomplete/select?q=nee&qt=dismax_ac&ac_spellcheck=true&indent=true'

Lastly, let’s load some suggestions that carry an is_sponsored flag:

$ cat example/exampledocs/phrases-sponsored.txt
phrase:First Item Here is_sponsored:false
phrase:Second Item Here is_sponsored:true
phrase:Here Item is_sponsored:false
$ cat example/exampledocs/phrases-sponsored.txt | java -cp "/opt/solr-6.3.0/dist/*:/opt/solr-6.3.0/server/lib/ext/*:/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/*" com.sematext.autocomplete.loader.FileLoader http://localhost:8983/solr/autocomplete
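
If your curation tools produce a plain list of phrases with a sponsored flag, a small conversion step can emit the layout shown in phrases-sponsored.txt. This is a hedged bash sketch: to_sponsored_format is a hypothetical name, and the tab-separated input is an assumption. It only mirrors the three example lines above, not any documented FileLoader grammar.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn tab-separated "phrase<TAB>true|false" lines into
# the "phrase:... is_sponsored:..." layout of phrases-sponsored.txt.
# A missing flag defaults to false; blank lines are skipped.
to_sponsored_format() {
  local phrase sponsored
  # "|| [ -n "$phrase" ]" keeps a last line that lacks a trailing newline.
  while IFS=$'\t' read -r phrase sponsored || [ -n "$phrase" ]; do
    [ -z "$phrase" ] && continue
    printf 'phrase:%s is_sponsored:%s\n' "$phrase" "${sponsored:-false}"
  done
}
```

For example, to_sponsored_format < my-phrases.tsv > phrases-sponsored.txt, where my-phrases.tsv is a hypothetical input file.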

Now we can group results on the is_sponsored field via ac_grouping_field, so that sponsored items come first:

curl 'localhost:8983/solr/autocomplete/select?q=ite&qt=dismax_ac&ac_grouping_field=is_sponsored&indent=true'


Next steps

Once AutoComplete returns the suggestions you want, you can use the HTML and JS examples that come with it to build the front end. More details can be found in the GitHub README, and the result should be similar to the autocomplete on our search sites: search-lucene.com, search-hadoop.com and search-devops.com.

More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.
