7 February, 2003: Where did I put...?


Judge [to counsel]: Would you consider a region of the globe defined by a latitude and longitude to constitute a `place'?

Counsel: That, your honour, would be a matter of degree.

(Attributed, I think, to Henry Campbell-Bannerman, but I can't find the reference.)

Lots of people seem to have been thinking about geographical web searches. It won't surprise you to discover that I think they're all basically wrong. The various approaches they've been using fall into two rough categories:

  1. do some mapping-the-internet thing in an attempt to figure out the physical location of particular IP addresses;
  2. define a metadata standard so that you can put information about location into your web page.

Of these, the first is obviously bollocks. Apart from the fact that it'll never work, it doesn't even achieve what you want, since the purpose of geographical search is to find information about your area rather than just a list of the web servers which happen to be located there. About the only people likely to be interested in this are burglars and those setting up new colocation facilities.

The second is slightly more promising, but the problem is that it requires explicit support from search-engine operators before it can be used. And, of course, nobody will use it unless the search engines support it. Not good in the short term, though I expect that in the long term it will be used.

My idea is as follows: use (``leverage'') existing search engines by encoding geographical locations in text. Now, we can probably assume that people searching for web sites by location are going to want `pages in the vicinity of ...', so that's the only type of search we need to target.

It used to be the case that web search engines could query on word prefixes like `geog*', which would then match `geography', `geographic', etc. This suggests a really simple implementation: we write down the latitude and longitude of a location in some base which can easily be expressed in characters the search engines index (presumably the letters of the Latin alphabet), and we write down the location as (first digit of latitude), (first digit of longitude), (second digit of latitude), (second digit of longitude), ....

Then a user who wanted to search for pages which advertised a location in some vicinity could just put in as much precision as she wanted, append a wildcard and do a completely normal web search. (Note that there are some problems here; in particular, we search for things within boxes defined by constant latitude and longitude, whereas ideally we'd want to be able to search for `things within 100km of ...'. But I don't think that's a tremendous limitation.)
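A rough sketch of the interleaving, to make the prefix trick concrete. Nothing here is a worked-out encoding: base 26 with lowercase letters, the rescaling of latitude and longitude to fractions of the globe, and the names `digits` and `encode` are all assumptions made purely for illustration.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: base 26, one letter per digit


def digits(fraction, n):
    """Return the first n base-26 digits of a fraction in [0, 1) as letters."""
    out = []
    for _ in range(n):
        fraction *= len(ALPHABET)
        d = int(fraction)
        out.append(ALPHABET[d])
        fraction -= d
    return out


def encode(lat, lon, n=4):
    """Interleave n latitude digits with n longitude digits.

    Assumed normalisation: latitude -90..90 and longitude -180..180 are
    rescaled to fractions of the globe before the digits are taken.
    """
    # Clamp so that lat = 90 does not produce a digit outside the alphabet.
    lat_frac = min(max((lat + 90.0) / 180.0, 0.0), 1.0 - 1e-12)
    lon_frac = ((lon + 180.0) % 360.0) / 360.0
    pairs = zip(digits(lat_frac, n), digits(lon_frac, n))
    return "".join(a + b for a, b in pairs)


if __name__ == "__main__":
    code = encode(52.2, 0.12)
    print(code)              # full-precision word to put in the page
    print(code[:4] + "*")    # coarser box: the wildcard query a user would type
```

Every two characters kept from the front of the word shrink the box by a factor of 26 in each direction, so truncating the word and appending a wildcard is exactly the `put in as much precision as she wanted' step.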

Unfortunately, Google doesn't support prefix searches. So to use this idea we'd have to encode the latitude/longitude information in words, and then search for a phrase which begins with the most-significant few digits of the position. There's a limitation here, which is that Google doesn't allow you more than ten search terms, but if we make the first word represent the first 1°×1° box (so that it can take 360 × 180 = 64,800 values -- and need only be four letters long, though in practice we'd probably want to prefix it `lll' or something to distinguish them from real words), then we've already narrowed things down to the nearest (approximately) one hundred kilometres, so the search-terms limit oughtn't to be too onerous.
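For the first word of such a phrase, the arithmetic goes through: 26^4 = 456,976, comfortably more than the 64,800 one-degree boxes, whereas three letters (17,576) would not be enough. A minimal sketch follows, in which the row-major numbering of the boxes and the name `box_word` are assumptions for illustration; only the `lll' prefix and the four-letter length come from the paragraph above.

```python
import math
import string

ALPHABET = string.ascii_lowercase
PREFIX = "lll"   # marker suggested above to keep the words distinct from real ones
WORD_LEN = 4     # 26**4 = 456,976 >= 360 * 180 = 64,800


def box_word(lat, lon):
    """Name the one-degree box containing (lat, lon) as 'lll' plus four letters.

    Assumed numbering: boxes are counted row by row, starting from the
    south-west corner of the globe at (-90, -180).
    """
    row = min(int(math.floor(lat)) + 90, 179)    # 0..179; the pole joins the top row
    col = (int(math.floor(lon)) + 180) % 360     # 0..359; longitude wraps around
    index = row * 360 + col                      # 0..64,799
    letters = []
    for _ in range(WORD_LEN):
        index, remainder = divmod(index, len(ALPHABET))
        letters.append(ALPHABET[remainder])
    return PREFIX + "".join(reversed(letters))


if __name__ == "__main__":
    print(box_word(52.2, 0.12))      # every page in this box carries the same word
    print(box_word(-90.0, -180.0))   # the first box: lllaaaa
```

Any further words in the phrase would carry the finer digits; how to spell those is left open here, since that is precisely the encoding choice not made above.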

'Course, I don't think that the readership of my ``'blog'' is large enough for it to be worth working this idea out in any more detail (in particular, choosing an encoding). And if I did, I expect the XML crowd would whine at me for not making it XML and standards compliant and all that shite.


Copyright (c) 2003 Chris Lightfoot; available under a Creative Commons License.