Upgrading from Apache Solr 1.4 to 3.x

Dec 7, 2011   //   by Daniel Kranowski

If your IT organization has used Apache Solr for a while to manage enterprise search, perhaps you’ve noticed that the Solr download page offers version 1.4.1, last updated in 2010, and several 3.x versions, most recently 3.5.0. So you might be wondering: should we upgrade from 1.4 to 3.x? What happened to 2.0? Is Solr 3.x a radical departure from 1.4, and how much trouble would the upgrade be?

Let’s get the missing 2.0 out of the way first. The Solr codebase merged with Lucene, its underlying search technology; the two Apache projects are one now, and they share not just common code modules but also a version number. Solr skipped 2.x entirely and jumped from 1.4.1 to 3.1.0 to match Lucene. This table shows how Lucene and Solr version numbers were made to converge starting at 3.1.0:

Release date   9/15/2008   11/9/2009   6/24/2010   3/30/2011   6/3/2011   7/29/2011   9/9/2011   11/25/2011
Lucene         2.4.0       2.9.1       2.9.3       3.1.0       3.2.0      3.3.0       3.4.0      3.5.0
Solr           1.3.0       1.4.0       1.4.1       3.1.0       3.2.0      3.3.0       3.4.0      3.5.0

Future version numbers will also stay in sync.

New features in Solr 3.x

There are indeed many compelling new features in the latest Solr. If you upgrade from Solr 1.4 to Solr 3.5, you’ll have all this available to your enterprise search:

Geolocation. Also called Spatial Search, this feature weights search results by their proximity to a given latitude and longitude. Now your mobile users can get results tailored to their specific location.

Suggester. This could also be called Autocompletion. Like the popular Google Suggest, as you start typing it suggests search terms so you don’t have to type the whole thing. For example as you type “med” it could prompt you for “medical”, “medium”, and “medicine”.

UIMA integration. UIMA (Unstructured Information Management Architecture) is an emerging open-source framework for identifying meaningful concepts buried in your non-relational data (like Solr documents). You could take advantage of Solr’s integration with UIMA’s OpenCalais component to extract conceptual entities like Persons, Companies, Acquisitions, Mergers, or whatever you dream up.

Smaller memory footprint. Solr has benefited from many optimizations on a per-feature basis, but the recent term index enhancement has reduced RAM usage by 3-5x across the board. Now you can host a massive Solr index on much cheaper hardware.

Field collapsing and result grouping. If your Solr data has duplicate records, you can clean up the search results by enabling the new grouping feature. It tells you how many duplicates were collapsed/grouped.

Range facets. Organizes search results into numeric ranges, for example price ranges $0-$49.99, $50-$99.99, $100+. Very handy for a retail e-commerce site. Previously you could only get one range at a time.

Extended dismax parser. Allows the untrained user to type in simple search queries, while expert users can type into the same search bar using more elaborate syntax. On a clothing retail e-commerce site, a regular user could search for “running shoes”, while the power user searches for “(running AND shoes) OR (breathable AND lightweight)”.

XSLT stylesheets. Yet another way to get documents into Solr: configure the server with an XSLT stylesheet, and then you can post your own custom XML and the server will transform it into a Solr document.

There’s more, but those are the highlights.
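
Several of the features above are switched on per request through query parameters. The Python sketch below builds example request URLs for spatial search, result grouping, range facets and the extended dismax parser. The endpoint and all field names (`store_location`, `product_id`, `price`, `name`, `description`) are assumptions for illustration, not taken from any real schema. The parameter names are the ones I understand Solr 3.x to document (result grouping arrived in 3.3), so check them against the reference for the release you actually deploy.

```python
from urllib.parse import urlencode

# Assumed endpoint; replace with your own Solr host and handler.
SOLR_SELECT = "http://localhost:8983/solr/select"


def select_url(params):
    """Return a GET URL for the select handler with the given parameters."""
    return SOLR_SELECT + "?" + urlencode(params)


# Spatial search: only documents within 5 km of the point, nearest first.
# "store_location" is a hypothetical latitude/longitude field.
spatial = {
    "q": "*:*",
    "fq": "{!geofilt}",
    "sfield": "store_location",
    "pt": "45.52,-122.68",
    "d": "5",
    "sort": "geodist() asc",
}

# Result grouping: collapse documents that share a "product_id" value
# (hypothetical field) and return one document per group.
grouping = {
    "q": "running shoes",
    "group": "true",
    "group.field": "product_id",
    "group.limit": "1",
}

# Range facets: $50-wide price buckets from 0 to 100, plus a count of
# everything beyond the end of the range ("after").
range_facets = {
    "q": "running shoes",
    "facet": "true",
    "facet.range": "price",
    "facet.range.start": "0",
    "facet.range.end": "100",
    "facet.range.gap": "50",
    "facet.range.other": "after",
}

# Extended dismax: the same search box accepts plain words or boolean syntax.
# "name" and "description" are hypothetical fields to search across.
edismax = {
    "q": "(running AND shoes) OR (breathable AND lightweight)",
    "defType": "edismax",
    "qf": "name description",
}

if __name__ == "__main__":
    examples = [("spatial", spatial), ("grouping", grouping),
                ("range facets", range_facets), ("edismax", edismax)]
    for label, params in examples:
        print(label, "->", select_url(params))
```

Keeping each feature's parameters in its own dictionary makes it easy to try one feature at a time against a test server before combining them in a single request.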

Investment in the upgrade

Upgrading from Solr 1.4 to 3.x will take some effort in development and test. It is nowhere near the level of effort it takes an IT organization to implement Solr for the first time, but it is not necessarily something to do over a weekend either. The release notes for Solr and Lucene list certain items where backwards compatibility was not maintained, so Java developers will notice these changes in their SolrJ code and in the two primary configuration files, solrconfig.xml and schema.xml. The compatibility issues include the removal of obsolete API methods, Java classes being renamed or moved around, Java objects changing their type, and new default settings for Solr behavior such as merge policy and norms calculations. You can’t do an in-place live upgrade either: you’ll have to clear out and regenerate your Solr records, because the internal binary format is different. These kinds of changes are par for the course with any kind of open source upgrade.

If you just want the benefit of bugfixes and performance enhancements without making use of any other new features, that’s the extent of your development effort. The compatibility issues described above are really quite minor and you’ll get to keep the majority of your existing investment in Solr development. It’s not as if Solr has completely changed and all the old rules are out the window – quite the opposite. The change from “1.4 to 3.x” sounds a lot more stark than it actually is.

Keep in mind your testing effort though. If a search for “running shoes” used to give 500 results, make sure you still get the same 500 results when searching on Solr 3.x. Due to changes in the search engine internals, the results could come back in a different order too. You should be in a good position to minimize the testing effort if you have already built up a thorough suite of automated tests for your Solr design.
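
One way to automate that check is to run the same query against the old and new servers, collect the document IDs from each response, and compare the two lists. The sketch below is a minimal Python illustration rather than a full test harness. `compare_results` and its report keys are hypothetical names, and fetching the IDs from each server is left out.

```python
def compare_results(old_ids, new_ids):
    """Compare document IDs returned by the old and new Solr servers.

    Reports which IDs disappeared, which are new, and whether the
    two result lists came back in exactly the same order.
    """
    old_set, new_set = set(old_ids), set(new_ids)
    return {
        "missing": sorted(old_set - new_set),   # in 1.4 results only
        "added": sorted(new_set - old_set),     # in 3.x results only
        "same_order": list(old_ids) == list(new_ids),
    }


if __name__ == "__main__":
    # Made-up IDs standing in for the results of one query on each server.
    solr_14 = ["doc1", "doc2", "doc3"]
    solr_3x = ["doc2", "doc1", "doc3"]
    print(compare_results(solr_14, solr_3x))
```

Reporting order separately from membership lets a test fail hard on missing documents while treating a changed ranking as something to review by hand.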

As a word of caution: if you’re going to upgrade to 3.x, avoid the 3.1 release. Version 3.1 has two serious bugs: a memory trap in the spellcheck component which could take down your whole Solr server if there are too many misspelled words (SOLR-2462), and the risk of silently corrupting the index if the Solr server goes down unexpectedly (LUCENE-3418). These issues were fixed in releases 3.3 and 3.4 respectively. The most sensible option for an IT organization that wants to upgrade now is release 3.5, which is the latest stable release, because it includes those bugfixes and also the across-the-board memory optimization on the term index, which will save you money immediately by allowing you to deploy Solr on cheaper hardware.
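
The fix versions above can be turned into a quick check on a candidate release. This is only a sketch: `open_issues` is a hypothetical helper, it knows about the two issues named in this article and nothing else, and it assumes that every 3.x release before an issue's fix version is affected.

```python
# Fix versions as stated above: SOLR-2462 in 3.3, LUCENE-3418 in 3.4.
FIXED_IN = {
    "SOLR-2462": (3, 3),
    "LUCENE-3418": (3, 4),
}


def open_issues(version):
    """Return the issues from FIXED_IN still open in a 3.x version like '3.1.0'.

    Assumption: any 3.x release older than the fix version is affected.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if major != 3:
        raise ValueError("this sketch only covers 3.x releases")
    return sorted(issue for issue, fixed in FIXED_IN.items()
                  if (major, minor) < fixed)


if __name__ == "__main__":
    for candidate in ["3.1.0", "3.3.0", "3.5.0"]:
        print(candidate, open_issues(candidate))
```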

Finally, know that Solr 4.0 is actively under development, with a push toward more cloud-computing support. There is currently no release date whatsoever for 4.0, so don’t try to plan around it. Looking at the JIRA page for Solr 4.0, it appears that the eventual 4.0 release will continue to strive for backward compatibility to the same extent that 3.x has done. Other than the new requirement to use JDK 1.6, when Solr 4.0 becomes available the upgrade path should be similar to today’s upgrade path to 3.x.

5 Comments

  • I am migrating from 1.3 and would like to know which current stable release I should be looking at for the migration: 3.6.1, 3.6.2 or 3.5?

  • Hi Sujatha,

    The current recommended release for your migration is Solr 3.6.1. 3.6.0 was another big step up from 3.5, with lots of new features and tons of bug fixes, and 3.6.1 fixed a few more bugs. There is no 3.6.2. 4.0.0-BETA is out now, which is exciting, but of course it is beta.

    Daniel

  • Thanks. The link below mentions 3.6.2 though

    https://issues.apache.org/jira/browse/SOLR

  • Yes it does! 3.6.2 is unreleased, so technically you could get it as a nightly build.

  • Hi,

    We are going to upgrade our current version of Solr 1.4 to the latest 3.6 soon. I am just wondering if anyone has tried that and if there are any significant problems?

    Thanks for sharing this good article!
