Ord’s Blog
  • Local Solr

    Posted on June 12th, 2009 Ord 3 comments

    For search in our Open Business Information Directory project we have been using a Solr server.  Based on the Apache Lucene library, it consistently works well for the kinds of searches we need to do.  Since many applications of business data involve local searches, we looked at ways to implement distance algorithms.  For a quick solution, we turned to LocalLucene/LocalSolr.  This package adds distance search to Solr, so that we can send queries that match records within a certain radius of a point.  LocalLucene is available from SourceForge.
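    To make the idea of a radius search concrete, here is a minimal sketch of the underlying distance check in plain Python.  It is not LocalSolr's implementation, which works inside the index; the haversine formula, the 6371 km mean Earth radius, and the `lat`/`lon` field names are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(records, lat, lon, radius_km):
    """Return the records whose coordinates lie within radius_km of (lat, lon).

    Records without coordinates are skipped, so they never show up
    in a proximity result.
    """
    hits = []
    for record in records:
        rlat, rlon = record.get("lat"), record.get("lon")
        if rlat is None or rlon is None:
            continue  # hypothetical field names; ungeocoded record
        if haversine_km(lat, lon, rlat, rlon) <= radius_km:
            hits.append(record)
    return hits
```

    A brute-force filter like this scans every record, which is exactly the cost a search-side solution is meant to avoid on an index with over a million entries.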

    Getting started takes some doing.  First of all, we need to geocode our records.  It is worth noting that not every record needs to be geocoded: ones that aren’t coded just won’t appear in proximity searches.  To get up and running quickly, we used a postal code database to assign a rough location to the records that weren’t already geocoded.
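    The postal code fallback might look roughly like the sketch below.  The record fields (`lat`, `lon`, `postal_code`), the `geocode_source` marker, and the shape of the lookup table are hypothetical; a real postal code database would be loaded from a file or a table rather than a dictionary literal.

```python
def normalize_postal_code(code):
    """Strip whitespace and upper-case, so 'a1a 1a1' and 'A1A1A1' match."""
    return "".join(code.split()).upper()


def geocode_record(record, postal_centroids):
    """Return a copy of record with rough coordinates filled in when possible.

    postal_centroids maps a normalized postal code to a (lat, lon) tuple.
    Records that already have coordinates are returned unchanged, and
    records with no usable postal code stay ungeocoded.
    """
    if record.get("lat") is not None and record.get("lon") is not None:
        return record
    code = record.get("postal_code")
    if code:
        centroid = postal_centroids.get(normalize_postal_code(code))
        if centroid is not None:
            lat, lon = centroid
            # Mark the source so rough locations can be told apart later.
            return {**record, "lat": lat, "lon": lon,
                    "geocode_source": "postal_code"}
    return record
```

    For example, with a placeholder table such as `{"A1A1A1": (47.0, -52.0)}` (made-up values), a record carrying `"postal_code": "a1a 1a1"` picks up those coordinates, while a record with an unknown code is left as it was.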

    For the changes to the Solr installation, I referred to the helpful tutorial at GISSearch.com.  Once the changes had been made, I reindexed our records.  This was the longest part of the process: even though I only processed the Canadian records for this test, there are still over a million to go through.  If the server hadn’t been in use, I could have shut it down, deleted the indexes and rebuilt them to save some time.

    Along the way, a few problems came up.  First, a version built from the latest sources didn’t work, so I had to revert to some earlier stable versions.  GISSearch offers an example package that includes a compiled Solr that works, so that is a good place to start if you are having issues there.  The other big problem was a bug in the phps output writer that prevented the searches from running.  Switching to xml or json output solves that.
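    As an illustration of the output-writer workaround, the sketch below forces Solr's standard `wt` parameter to `json` when building a query URL and reads document ids out of a JSON response body.  The base URL, the `id` field, and the helper names are assumptions, and any LocalSolr-specific parameters would simply be passed through in `params` as your setup requires.

```python
import json
from urllib.parse import urlencode

SAFE_WRITERS = ("json", "xml")  # the two writers that worked for us


def build_select_url(base_url, params, writer="json"):
    """Build a Solr select URL, overriding any wt value in params."""
    if writer not in SAFE_WRITERS:
        raise ValueError("writer must be one of %r" % (SAFE_WRITERS,))
    query = dict(params)
    query["wt"] = writer  # replaces e.g. wt=phps if the caller passed it
    return base_url.rstrip("/") + "/select?" + urlencode(query)


def doc_ids(json_body):
    """Extract the 'id' field of each document from a Solr JSON response."""
    data = json.loads(json_body)
    return [doc.get("id") for doc in data["response"]["docs"]]
```

    Calling `build_select_url("http://localhost:8983/solr", {"q": "name:bakery", "wt": "phps"})` produces a URL that asks for JSON regardless of the writer the caller requested.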

    Using LocalSolr instead of writing our own solution has saved a lot of development time.  We still need to do some performance testing to see how it holds up under heavy usage, but so far it looks like a dedicated server for geo searching will be able to keep up with the load.