What is geospatial search?

Many applications combine location data with text data. This is often called spatial search or geospatial search. Most of these applications need to do several things:

  1. Represent spatial data in the index
  2. Filter by some spatial concept such as a bounding box or other shape
  3. Sort by distance
  4. Score/boost by distance

Lucene 4 has a new spatial module that replaces the older one described below. The Solr adapters for it are documented here: SolrAdaptersForLuceneSpatial4. The rest of this document is about the still-supported older approach.

If you haven’t already, download Solr, start the example server, and index the example data as shown in the Solr tutorial. With the Solr server running, you should be able to run the example requests and see real responses.

GEOSPATIAL SEARCH

In the example data, certain documents have a field called “store” (with a fieldType named “location” implemented via LatLonType). Some of the points in the example data are:

<field name="store">45.17614,-93.87341</field>  <!-- Buffalo store -->
<field name="store">40.7143,-74.006</field>     <!-- NYC store -->
<field name="store">37.7752,-122.4232</field>   <!-- San Francisco store -->

Schema Configuration

This requires a location field type in schema.xml:

  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

and also a dynamic field matching that suffix, which stores the individual coordinates of each point:

  <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>

    geofilt – The distance filter

    Now let’s assume that we are at 45.15,-93.85 (which happens to be 3.437 km from the Buffalo store). We can use a geofilt filter to find all products (documents in our index) with the field store within 5km of our position:
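A request for this filter might be built as in the sketch below. The host, port, and /solr/select path are assumptions about a default local example server. The geofilt parser and its sfield, pt, and d (radius in km) parameters are the ones this section describes.

```python
from urllib.parse import urlencode

# Assumption: the example server listens on the default local address.
SOLR_SELECT = "http://localhost:8983/solr/select"

def geofilt_url(sfield, lat, lon, d_km):
    """Build a select URL that keeps only documents within d_km of (lat, lon)."""
    params = {
        "q": "*:*",                             # match everything, then filter
        "fq": "{!geofilt sfield=%s}" % sfield,  # spatial filter on the given field
        "pt": "%s,%s" % (lat, lon),             # centre point as "lat,lon"
        "d": str(d_km),                         # radius in kilometres
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(geofilt_url("store", 45.15, -93.85, 5))
```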

    Sure enough, we find 8 products at the Buffalo store:

    ...
      "response":{"numFound":8,"start":0,"docs":[
          {
            "name":"Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133",
            "store":"45.17614,-93.87341"},
          {
            "name":"Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300",
            "store":"45.17614,-93.87341"},
    ...
    
    

    Spatial Query Parameters

    The main spatial queries, geofilt, bbox, and geodist, default to reading ordinary request parameters, so any of pt, sfield, and d can be factored out and specified only once per request (even if multiple spatial queries are used).

    Examples:
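As a rough illustration, the two parameter sets below should describe the same filter. The first passes everything as local params of the filter. The second factors sfield, pt, and d (the distance parameter) out into ordinary request parameters, where a geodist() sort can reuse them. The function names are hypothetical helpers, not part of Solr.

```python
from urllib.parse import urlencode

def inline_style():
    # Everything is passed as local params of the single filter.
    return {"q": "*:*", "fq": "{!geofilt sfield=store pt=45.15,-93.85 d=5}"}

def factored_style():
    # sfield, pt and d are plain request parameters shared by every
    # spatial query in the request (here: the filter and the sort).
    return {
        "q": "*:*",
        "fq": "{!geofilt}",
        "sfield": "store",
        "pt": "45.15,-93.85",
        "d": "5",
        "sort": "geodist() asc",
    }

for build in (inline_style, factored_style):
    print(urlencode(build()))
```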

    bbox – Bounding-box filter

    Exact distance calculations can be somewhat expensive and it can often make sense to use a quick approximation instead. The bbox filter is guaranteed to encompass all of the points of interest, but it may also include other points that are slightly outside of the required distance. For our standard LatLonType, this is implemented as a bounding box – a box made up of a range of latitudes and longitudes that encompasses the circle of radius d (i.e. it will select the same or slightly more documents than geofilt will).

    The parameters are exactly the same as geofilt, so the following request will still match everything in the Buffalo store:
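Because the parameters match, switching between the two filters only changes the parser name. A minimal sketch, with spatial_filter_params as a hypothetical helper:

```python
from urllib.parse import urlencode

def spatial_filter_params(parser, d_km):
    """Return request parameters for a geofilt or bbox filter around our position."""
    if parser not in ("geofilt", "bbox"):
        raise ValueError("parser must be 'geofilt' or 'bbox'")
    return {
        "q": "*:*",
        "fq": "{!%s sfield=store}" % parser,
        "pt": "45.15,-93.85",
        "d": str(d_km),
    }

print(urlencode(spatial_filter_params("bbox", 5)))
```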

    Because the bounding box is less selective, changing the distance to 3 km still includes the Buffalo store (which is actually 3.437 km away). With the more accurate geofilt at 3 km, these documents would not match. The bounding box still makes sense in many scenarios, especially if you are sorting by some other criterion anyway, or by distance itself.
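The difference can be checked outside Solr with a small worked example. This sketch assumes a spherical Earth with a mean radius of 6371 km and a simplified box computation. Solr's own arithmetic may differ in detail, so treat it as an illustration of the idea rather than a reimplementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def in_bounding_box(lat, lon, clat, clon, d_km):
    """True if (lat, lon) lies in a lat/lon box enclosing the circle of radius d_km."""
    dlat = math.degrees(d_km / EARTH_RADIUS_KM)
    dlon = dlat / math.cos(math.radians(clat))  # longitude degrees shrink with latitude
    return abs(lat - clat) <= dlat and abs(lon - clon) <= dlon

here = (45.15, -93.85)
buffalo = (45.17614, -93.87341)
dist = haversine_km(*here, *buffalo)
print(round(dist, 3), dist <= 3, in_bounding_box(*buffalo, *here, 3))
```

With these inputs the store falls inside the 3 km box but outside the 3 km circle, which is the behaviour described above.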

    Since the LatLonType field also supports field queries and range queries, one can manually create their own bounding box rather than using bbox:
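A hand-built box is a range query from the lower-left corner to the upper-right corner, each written as lat,lon. In this sketch the corner values are illustrative and box_range_filter is a hypothetical helper.

```python
from urllib.parse import urlencode

def box_range_filter(field, south, west, north, east):
    """Range query from the lower-left corner to the upper-right corner."""
    return "%s:[%s,%s TO %s,%s]" % (field, south, west, north, east)

# Illustrative box that contains the Buffalo store.
params = {"q": "*:*", "fq": box_range_filter("store", 45, -94, 46, -93)}
print(urlencode(params))
```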

    geodist – The distance function

    The geodist(param1,param2,param3) function supports (optional) parameters:

    • param1: the sfield
    • param2: the latitude (pt)
    • param3: the longitude (pt)

    geodist is a function query that yields the calculated distance. This gives the flexibility to do a number of interesting things, such as sorting by the distance (Solr can sort by any function query), or combining the distance with the relevancy score, such as boosting by the inverse of the distance.

    Here’s an example of sorting by distance ascending:
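A minimal sketch of such a request's parameters, assuming the store field and our position from the earlier examples (sort_by_distance is a hypothetical helper):

```python
from urllib.parse import urlencode

def sort_by_distance(sfield, pt, ascending=True):
    """Parameters that order all documents by distance from pt."""
    direction = "asc" if ascending else "desc"
    return {
        "q": "*:*",
        "sfield": sfield,   # field holding each document's location
        "pt": pt,           # our position as "lat,lon"
        "sort": "geodist() " + direction,
    }

print(urlencode(sort_by_distance("store", "45.15,-93.85")))
```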

    Or you could use the distance function as the main query (or part of it) to get the distance as the document score:
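One way to express this is with the {!func} parser, sketched below. The fl value is an assumption, chosen only to make the score visible in the response.

```python
from urllib.parse import urlencode

# {!func} makes the main query a function query, so each document's
# score is the value of geodist(), i.e. its distance in km.
params = {
    "q": "{!func}geodist()",
    "sfield": "store",
    "pt": "45.15,-93.85",
    "fl": "name,store,score",
    "sort": "score asc",   # smallest distance first
}
print(urlencode(params))
```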

    The geodist function can have the points specified as function arguments, or can default to looking at the pt and sfield global request parameters.

    Or you could combine geodist() with geofilt (or bbox) to limit the results and sort them by distance (50km):
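The combination might look like the following sketch. The filter and the sort share sfield, pt, and d, so each is given once. The exact flag is a hypothetical convenience for switching to bbox.

```python
from urllib.parse import urlencode

def nearby_sorted(sfield, pt, d_km, exact=True):
    """Filter to d_km around pt, then sort the survivors by distance."""
    parser = "geofilt" if exact else "bbox"
    return {
        "q": "*:*",
        "fq": "{!%s}" % parser,
        "sfield": sfield,
        "pt": pt,
        "d": str(d_km),
        "sort": "geodist() asc",
    }

print(urlencode(nearby_sorted("store", "45.15,-93.85", 50)))
```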

    This returns, as the score, the smaller of the distances to two points that the user wants to search near (Denver and San Francisco):
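A sketch of such a query, passing each point as explicit geodist arguments and wrapping the two calls in min(). It assumes a Solr version whose min() function accepts two function arguments. The city coordinates are approximate and for illustration only, and closest_of_two is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: approximate city-centre coordinates, for illustration only.
SAN_FRANCISCO = (37.77, -122.42)
DENVER = (39.74, -104.99)

def closest_of_two(sfield, a, b):
    """Function query whose value is the distance to the nearer of points a and b."""
    dists = ["geodist(%s,%s,%s)" % (sfield, lat, lon) for lat, lon in (a, b)]
    return "{!func}min(%s)" % ",".join(dists)

params = {
    "q": closest_of_two("store", SAN_FRANCISCO, DENVER),
    "sort": "score asc",   # nearest to either city first
}
print(urlencode(params))
```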

    Or

    In order to return the number of results that match using a facet:

    Returning the distance

    (Solr 4.0 and later)

    You can use the pseudo-field feature to return the distance along with the stored fields of each document by adding fl=geodist() to the request. Use an alias like fl=dist:geodist() to make the distance come back in the dist pseudo-field instead. Here is an example of sorting by distance ascending and returning the distance for each document in dist.
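A minimal sketch of those parameters, assuming the name and store fields from the example data:

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "sfield": "store",
    "pt": "45.15,-93.85",
    "sort": "geodist() asc",
    # "dist:" is the alias; without it the pseudo-field is named "geodist()".
    "fl": "name,store,dist:geodist()",
}
print(urlencode(params))
```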

    As a temporary workaround for older Solr versions, it’s possible to obtain distances by using geodist or geofilt as the only scoring part of the main query.

    Other Use Cases

    How to combine with a sub-query to expand results

    It is possible to combine the spatial filter with other criteria using an OR clause. Here is an example that returns documents in Jacksonville, FL, or within 50 km of 45.15,-93.85:
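One way to write this is to embed the spatial filter as a nested query inside an ordinary boolean filter, as sketched here. The state and city field names are assumptions about the schema, and in_city_or_nearby is a hypothetical helper.

```python
from urllib.parse import urlencode

def in_city_or_nearby(state, city, pt, d_km):
    """Match documents in the given city OR within d_km of pt."""
    text_clause = '(state:"%s" AND city:"%s")' % (state, city)
    geo_clause = '_query_:"{!geofilt}"'   # nested query reusing sfield/pt/d
    return {
        "q": "*:*",
        "fq": text_clause + " OR " + geo_clause,
        "sfield": "store",
        "pt": pt,
        "d": str(d_km),
    }

print(urlencode(in_city_or_nearby("FL", "Jacksonville", "45.15,-93.85", 50)))
```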

    Note: you can’t try this example with the example schema since the “state” and “city” fields haven’t been defined.

    How to facet by distance

    Faceting by distance can be done using the frange QParser. It is currently somewhat inefficient (frange is slower than geofilt), but it is likely to be fine in most situations.
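A sketch of distance faceting with one facet.query per distance band. The band boundaries are illustrative, and distance_facets is a hypothetical helper.

```python
from urllib.parse import urlencode

def distance_facets(buckets):
    """One facet.query per (lower, upper) distance bucket, bounds in km."""
    return [("facet.query", "{!frange l=%s u=%s}geodist()" % (lo, hi))
            for lo, hi in buckets]

# A list of pairs (not a dict) lets facet.query appear more than once.
params = [
    ("q", "*:*"),
    ("sfield", "store"),
    ("pt", "45.15,-93.85"),
    ("facet", "true"),
] + distance_facets([(0, 5), (5.001, 3000)])   # bucket bounds are illustrative
print(urlencode(params))
```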

    How to boost closest results

    It is also possible to boost the scores of the closest results by factoring a distance function into the score of your main query…
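One common shape for such a boost is the inverse of the distance, for example through the recip function used as a dismax boost function. The sketch below is an assumption about how that could be wired up. The constants 2, 200, and 20 are illustrative, and q.alt stands in for a real user query.

```python
from urllib.parse import urlencode

def recip(x, m, a, b):
    """Solr's recip(x, m, a, b) function: a / (m*x + b)."""
    return a / (m * x + b)

# Boost values at a few distances (km): larger for closer documents.
for km in (0, 5, 50):
    print(km, round(recip(km, 2, 200, 20), 2))

params = {
    "defType": "dismax",
    "q.alt": "*:*",                       # stand-in for a real user query
    "bf": "recip(geodist(),2,200,20)",    # additive boost that decays with distance
    "sfield": "store",
    "pt": "45.15,-93.85",
    "fq": "{!geofilt}",
    "d": "50",
}
print(urlencode(params))
```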

    muthali ganesh
