<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ODM Technology &#187; OBID</title>
	<atom:link href="http://blog.odmtech.com/tag/obid/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.odmtech.com</link>
	<description>Ord's Blog</description>
	<lastBuildDate>Sat, 04 Jul 2009 15:22:00 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Local Solr</title>
		<link>http://blog.odmtech.com/2009/06/12/local-solr/</link>
		<comments>http://blog.odmtech.com/2009/06/12/local-solr/#comments</comments>
		<pubDate>Fri, 12 Jun 2009 16:15:20 +0000</pubDate>
		<dc:creator>Ord</dc:creator>
				<category><![CDATA[OBID]]></category>
		<category><![CDATA[geocoding]]></category>
		<category><![CDATA[gis]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://blog.odmtech.com/?p=94</guid>
		<description><![CDATA[For searching at our Open Business Information Directory project we have been using a Solr server.  Based on the Apache Lucene library, it consistently works well for the type of searches we need to do.  Since a lot of the applications for business data will be local searches, we looked at ways to implement distance [...]]]></description>
			<content:encoded><![CDATA[<p>For searching at our <a href="http://obid.org">Open Business Information Directory</a> project we have been using a <a title="Solr" href="http://lucene.apache.org/solr/">Solr</a> server.  Based on the Apache Lucene library, it consistently works well for the type of searches we need to do.  Since a lot of the applications for business data will be local searches, we looked at ways to implement distance algorithms.  For a quick solution, we turned to LocalLucene/LocalSolr.  This package adds distance search to Solr, so that we can send queries that match only records within a certain radius of a point.  LocalLucene is available from <a href="http://sourceforge.net/projects/locallucene/">SourceForge</a>.</p>
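<p>To make the idea of a radius search concrete, here is a minimal sketch of the calculation involved.  It is not how LocalSolr works internally; it simply filters an array of already geocoded records by great-circle distance using the haversine formula.  The function names and the latitude/longitude field names are assumptions made for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Great-circle distance in kilometres between two points (haversine formula).
function haversineKm($lat1, $lon1, $lat2, $lon2) {
    $earthRadiusKm = 6371.0; // mean Earth radius
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = pow(sin($dLat / 2), 2)
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * pow(sin($dLon / 2), 2);
    // min() guards against rounding pushing the value just above 1
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// Keep only the records that lie within $radiusKm of the given point.
// Records without coordinates are skipped (assumed field names).
function withinRadius($records, $lat, $lon, $radiusKm) {
    $matches = array();
    foreach ($records as $record) {
        if (!isset($record['latitude'], $record['longitude'])) {
            continue;
        }
        $d = haversineKm($lat, $lon, $record['latitude'], $record['longitude']);
        if ($d &lt;= $radiusKm) {
            $matches[] = $record;
        }
    }
    return $matches;
}</pre></div></div>

<p>Checking every record like this is fine for a small set, but it would be slow across a million records, which is part of the reason to lean on a package like LocalSolr.</p>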
<p>Getting started takes some doing.  First of all, we need to perform geocoding of  our records.  It is worth noting that not every record needs to be geocoded &#8211; ones that aren&#8217;t coded just won&#8217;t appear in proximity searches.  To get up and running quickly, we used a postal code database to get a rough location of the records that weren&#8217;t already geocoded.</p>
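<p>The postal code fallback can be sketched roughly as follows.  This is an illustration rather than the code we ran: the $postalCentroids table (postal code mapped to a latitude/longitude pair) and the 'postalcode' marker stored in geo-service are hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Fill in rough coordinates from a postal code table when a record has none.
// $postalCentroids is a hypothetical lookup: postal code =&gt; array(lat, lon).
function fillFromPostalCode($record, $postalCentroids) {
    if (isset($record['latitude'], $record['longitude'])) {
        return $record; // already geocoded, keep the more precise coordinates
    }
    // Normalize "m5v 2t6" to "M5V2T6" before the lookup
    $code = isset($record['postalcode'])
        ? strtoupper(str_replace(' ', '', $record['postalcode']))
        : '';
    if ($code !== '' &amp;&amp; isset($postalCentroids[$code])) {
        list($lat, $lon) = $postalCentroids[$code];
        $record['latitude'] = $lat;
        $record['longitude'] = $lon;
        $record['geo-service'] = 'postalcode'; // flag the location as approximate
    }
    return $record; // unchanged if no match, so it stays out of proximity searches
}</pre></div></div>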
<p>For the changes to the Solr installation, I referred to the helpful tutorial at <a href="http://www.gissearch.com/localsolr">GISSearch.com</a>.  Once the changes had been made, I reindexed our records.  This was the longest part of the process &#8211; even though I only processed the Canadian records for this test, there are still over a million to go through.  If the server hadn&#8217;t been in use, I could have shut it down, deleted the indexes and rebuilt them to save some time.</p>
<p>Along the way, a few problems came up.  The first was that a version built from the latest sources didn&#8217;t work, so I had to revert to earlier stable versions.  GISSearch offers an example package with a compiled Solr that works, so that is a good place to start if you are having issues there.  The other big problem was a bug in the phps output writer that prevented the searches from running.  Switching to xml or json output solves that.</p>
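<p>Switching the output format comes down to the wt parameter on the request.  The sketch below builds such a URL; the host, port and path, as well as the qt, lat, long and radius parameter names, are assumptions here and should be checked against the tutorial for your LocalSolr version.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Build a query URL that asks Solr for JSON instead of phps output.
// Assumed: the base URL and the qt/lat/long/radius parameter names.
function localSolrUrl($query, $lat, $lon, $radius, $base = 'http://localhost:8983/solr/select') {
    $params = array(
        'q'      =&gt; $query,
        'qt'     =&gt; 'geo',    // assumed name of the geo request handler
        'lat'    =&gt; $lat,
        'long'   =&gt; $lon,
        'radius' =&gt; $radius,
        'wt'     =&gt; 'json',   // response writer: json (or xml) rather than phps
    );
    // Pass the separator explicitly so a php.ini override cannot change it
    return $base . '?' . http_build_query($params, '', '&amp;');
}

// Example use (performs a network request, so it is left commented out):
// $response = json_decode(file_get_contents(localSolrUrl('pizza', 43.65, -79.38, 10)), true);</pre></div></div>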
<p>Using Local Solr instead of writing our own solutions has saved a lot of development time.  We still need to do some performance testing to see how it will hold up under heavy usage, but so far it looks like with a dedicated server for geo searching we will be able to keep up with the loads.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.odmtech.com/2009/06/12/local-solr/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Geocoding in PHP with Google&#8217;s API</title>
		<link>http://blog.odmtech.com/2009/02/23/geocoding-in-php-with-googles-api/</link>
		<comments>http://blog.odmtech.com/2009/02/23/geocoding-in-php-with-googles-api/#comments</comments>
		<pubDate>Tue, 24 Feb 2009 04:54:14 +0000</pubDate>
		<dc:creator>Ord</dc:creator>
				<category><![CDATA[OBID]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[geocode]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://blog.odmtech.com/?p=30</guid>
		<description><![CDATA[Today&#8217;s project was geocoding entries in a database of businesses, where we have the street address, city and postal code, but need latitude and longitude for mapping purposes.  There are a number of web sites that will perform this service for a fee, but it can also be done without charge using either the Google [...]]]></description>
			<content:encoded><![CDATA[<p>Today&#8217;s project was geocoding entries in a database of businesses, where we have the street address, city and postal code, but need latitude and longitude for mapping purposes.  There are a number of web sites that will perform this service for a fee, but it can also be done without charge using either the <a href="http://code.google.com/apis/maps/">Google Maps API</a> or <a href="http://dev.live.com/VirtualEarth/">Microsoft Virtual Earth API</a>.  Both of these APIs limit the number of queries per day that can be looked up without charge.  Both also have terms that limit what can be done with the data, the Microsoft ones being much more restrictive.</p>
<p>Since the Google API is easier to use, I started with that one.  The function is written in PHP so that it will be easy to call in response to searches from a web site if need be.  The function presented here uses only Google&#8217;s map API; it will be expanded to add a Virtual Earth version, so that the caller can decide which service to use.</p>
<p>In my case, I only want to do lookups on records that actually have street addresses.  I am not interested in getting coordinates for records that only specify a city, or maybe a PO box, which is why the test for $record['street'] is there.  The Google API allows a country bias value (the gl parameter) &#8211; this helps to resolve ambiguities in the address.  This value is the top level domain for the country, which is <em>usually</em> the country code, but not always.  Here I just want to make sure that Canadian records are recognized, hence the test on the country code.  If this routine needed to deal with all countries, we would have to look up a table converting country codes to TLDs, or else just assign the country code to the gl value and put in tests for the exceptions.  The sensor=false tells the API that this request is not coming from a device that can determine its own position.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #990000;">define</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;GOOGLE_KEY&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;yourapikeyforgoogle&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// geocode the supplied record using $service</span>
<span style="color: #000000; font-weight: bold;">function</span> geoCode<span style="color: #009900;">&#40;</span><span style="color: #000088;">$record</span><span style="color: #339933;">,</span><span style="color: #000088;">$service</span><span style="color: #339933;">=</span><span style="color: #0000ff;">&quot;Google&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$service</span><span style="color: #339933;">==</span><span style="color: #0000ff;">&quot;Google&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'street'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
            <span style="color: #000088;">$address</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'street'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #000088;">$address</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">&quot;, &quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'city'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #000088;">$address</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">&quot;, &quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'state'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #000088;">$address</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">&quot;, &quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'postalcode'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #000088;">$url</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;http://maps.google.com/maps/geo?q=&quot;</span><span style="color: #339933;">.</span><span style="color: #990000;">urlencode</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$address</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #000088;">$url</span><span style="color: #339933;">.=</span> <span style="color: #0000ff;">&quot;&amp;output=csv&amp;oe=utf8&amp;sensor=false&amp;key=&quot;</span><span style="color: #339933;">.</span>GOOGLE_KEY<span style="color: #339933;">;</span>
            <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'countrycode'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">==</span><span style="color: #0000ff;">&quot;CA&quot;</span><span style="color: #009900;">&#41;</span> <span style="color: #000088;">$url</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">&quot;&amp;gl=ca&quot;</span><span style="color: #339933;">;</span>
            <span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #990000;">file_get_contents</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
            <span style="color: #000088;">$parts</span> <span style="color: #339933;">=</span> <span style="color: #990000;">explode</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;,&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$result</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #990000;">list</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$status</span><span style="color: #339933;">,</span><span style="color: #000088;">$accuracy</span><span style="color: #339933;">,</span><span style="color: #000088;">$latitude</span><span style="color: #339933;">,</span><span style="color: #000088;">$longitude</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">=</span><span style="color: #000088;">$parts</span><span style="color: #339933;">;</span>
            <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$status</span><span style="color: #339933;">==</span><span style="color: #0000ff;">&quot;200&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
                <span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'latitude'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">=</span><span style="color: #000088;">$latitude</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'longitude'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">=</span> <span style="color: #000088;">$longitude</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'geo-accuracy'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">=</span><span style="color: #000088;">$accuracy</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$record</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'geo-service'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">=</span><span style="color: #0000ff;">&quot;Google&quot;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #000088;">$record</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Google allows a maximum of 15000 queries per day from a single IP.  To geocode my records, I use a cron job that runs once per minute and looks up 10 records at a time, with a 1 second interval between lookups.  That gives 14400 per day, slightly below the limit.</p>
<p>For either the Google Maps or the Microsoft Virtual Earth API you will need a key.  Google will give you one instantly when you <a href="http://code.google.com/apis/maps/signup.html">sign up</a>, and Microsoft will let you request an <a href="https://mappoint-css.live.com/MwsSignup/">Evaluation Developer Account</a>, a slightly longer process.</p>
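<p>The body of such a cron job might look something like this.  It is a sketch, not the script I actually run: geocodeBatch and lookupsPerDay are hypothetical names, and how the pending records are loaded and saved is left out.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Geocode up to $batchSize records, pausing between lookups.
// $geocoder is any callable that takes a record and returns the updated
// record, for example 'geoCode' from the listing above.
function geocodeBatch($records, $geocoder, $batchSize = 10, $pauseSeconds = 1) {
    $done = array();
    $first = true;
    foreach (array_slice($records, 0, $batchSize) as $record) {
        if (!$first &amp;&amp; $pauseSeconds &gt; 0) {
            sleep($pauseSeconds); // space the requests out
        }
        $first = false;
        $done[] = call_user_func($geocoder, $record);
    }
    return $done;
}

// Lookups per day when one batch runs every minute.
function lookupsPerDay($batchSize = 10) {
    return $batchSize * 60 * 24; // 10 * 1440 = 14400
}</pre></div></div>

<p>With a batch of 10 the daily total stays under the 15000 limit; a batch of 11 would come to 15840 and go over it.</p>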
]]></content:encoded>
			<wfw:commentRss>http://blog.odmtech.com/2009/02/23/geocoding-in-php-with-googles-api/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>US Zip Codes Database</title>
		<link>http://blog.odmtech.com/2009/02/17/us-zip-codes-database/</link>
		<comments>http://blog.odmtech.com/2009/02/17/us-zip-codes-database/#comments</comments>
		<pubDate>Tue, 17 Feb 2009 22:06:00 +0000</pubDate>
		<dc:creator>Ord</dc:creator>
				<category><![CDATA[OBID]]></category>
		<category><![CDATA[database]]></category>

		<guid isPermaLink="false">http://blog.odmtech.com/?p=9</guid>
		<description><![CDATA[More and more, businesses have &#8220;store locators&#8221; on their web sites.  Most of them want either a State and City, or a Zip Code.  These are handy for finding the location of the nearest Starbucks, but they really suck when I want to get a list of all the ones in Alaska.  Considering that I [...]]]></description>
			<content:encoded><![CDATA[<p>More and more, businesses have &#8220;store locators&#8221; on their web sites.  Most of them want either a State and City, or a Zip Code.  These are handy for finding the location of the nearest Starbucks, but they really suck when I want to get a list of all the ones in Alaska.  Considering that I will want to repeat this for many different chains, I need a solution that will let me automate it.</p>
<p>Using a list of all zip codes I could just brute force my way through it, querying every code in order.  With over 40000 codes in the US, this wouldn&#8217;t be too efficient &#8211; something like half a day at 1 query / second.  If there are 100 national chains that I wanted to do this for, it could take months.  It could be shortened by running several in parallel, maybe even using extra machines &#8211; but it&#8217;s still not a very elegant way of doing it.</p>
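<p>The arithmetic behind those estimates, as a quick sketch.  The 40000 figure is the rounded count from above and the function name is just for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Seconds needed to query every zip code once per chain at a fixed rate.
function bruteForceSeconds($zipCount, $chains = 1, $queriesPerSecond = 1.0) {
    return $zipCount * $chains / $queriesPerSecond;
}

$oneChainHours     = bruteForceSeconds(40000) / 3600;        // about 11.1 hours
$hundredChainsDays = bruteForceSeconds(40000, 100) / 86400;  // about 46.3 days</pre></div></div>

<p>Running four of these in parallel would bring the 100 chain case down to roughly 11.6 days, which is better but still a lot of wasted queries.</p>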
<p>Instead, knowing the coordinates of each zip code center and the radius of the search I am using, it should be possible to build a list of zip codes that covers the entire area with as little overlap as possible.  Since I can&#8217;t be sure that the coordinates in my list of zip codes will be exactly the same as the coordinates used by the store locators, I&#8217;ll have to allow some overlap.</p>
<p>For the zip code database, I am using one from <a href="http://www.free-zipcodes.com">http://www.free-zipcodes.com</a> &#8211; this is a 2006 database, and while it&#8217;s not as up to date as some of the commercial ones, it should be fine for my purposes.  I find it strange that the US post office doesn&#8217;t have a zip code database for download.  There doesn&#8217;t even seem to be anything convenient to crawl to build one.</p>
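<p>One simple way to build such a list is a greedy pass over the zip codes, sketched below.  This is only one possible approach and the names are hypothetical: $zips maps each zip code to its center coordinates, and $marginKm is the overlap allowance.  Note that it guarantees every zip code center is within reach of some query, which is an approximation of covering the whole area.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Great-circle distance in kilometres (haversine formula).
function distanceKm($lat1, $lon1, $lat2, $lon2) {
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = pow(sin($dLat / 2), 2)
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * pow(sin($dLon / 2), 2);
    return 2 * 6371.0 * asin(min(1.0, sqrt($a)));
}

// Greedily pick zip codes to query so that every zip center lies within
// ($radiusKm - $marginKm) of at least one picked center.
// $zips: zip code =&gt; array(lat, lon). Returns the zip codes to query.
function pickQueryZips($zips, $radiusKm, $marginKm = 0.0) {
    $reach = $radiusKm - $marginKm; // shrink the radius to allow for overlap
    $covered = array();
    $picked = array();
    foreach ($zips as $zip =&gt; $center) {
        if (isset($covered[$zip])) {
            continue; // an earlier query already reaches this zip code
        }
        $picked[] = (string) $zip; // numeric keys come back as integers
        foreach ($zips as $other =&gt; $otherCenter) {
            if (!isset($covered[$other]) &amp;&amp;
                distanceKm($center[0], $center[1], $otherCenter[0], $otherCenter[1]) &lt;= $reach) {
                $covered[$other] = true;
            }
        }
    }
    return $picked;
}</pre></div></div>

<p>The pass compares each picked zip code against every other one, so it is slow for 40000 codes, but it only has to run once for a given search radius and the resulting list can be reused for every chain with that radius.</p>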
<p>Future versions could use data from the 2000 US census to predict which zip codes are more likely to have businesses near them, so that the searches could start with the highest density areas first and then fill in the other spots later.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.odmtech.com/2009/02/17/us-zip-codes-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
