Geolocation Matching Based on Latitudes and Longitudes in R

Today I was faced with an interesting challenge: finding the time zone for each user, given only a latitude and longitude. Thanks to R's extensive package library, the solution was not hard at all.

First, I found the list of cities with populations greater than 1000 on the GeoNames.org website. The list is quite big, almost 150K cities. Computing the distance to every city for each of the 48K users that had location information would have meant roughly 7 billion distance calculations, which would have taken far too long.
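A brute-force approach compares every user against every city, so the work grows with the product of the two table sizes. Here is a minimal sketch of what that looks like on made-up points. The `cities` and `users` data frames are hypothetical stand-ins, not the real data.

```r
# Hypothetical toy data standing in for the real city and user tables
set.seed(1)
cities <- data.frame(latitude = runif(200, -60, 60),
                     longitude = runif(200, -180, 180))
users <- data.frame(latitude = runif(50, -60, 60),
                    longitude = runif(50, -180, 180))

# For each user, compute the squared distance (in degrees) to every city
# and keep the row index of the closest one
brute_idx <- vapply(seq_len(nrow(users)), function(i) {
  d2 <- (cities$latitude - users$latitude[i])^2 +
    (cities$longitude - users$longitude[i])^2
  which.min(d2)
}, integer(1))

# Number of distance evaluations: cities x users
n_evals <- nrow(cities) * nrow(users)
```

This is fine for a few hundred points, but the same loop over the full tables is what I wanted to avoid.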

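Luckily, there are tree-based nearest-neighbor algorithms that make quick work of this kind of problem. I used the RANN package in R, which builds a search tree over the cities so that each lookup only has to visit a small part of the list. The package is very intuitive to use. Here is the code I used.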

# Import library
library(RANN)

# Read in geolocation data (tab-separated, no header, no quoting)
geoloc <- read.table("data/cities1000.txt", sep="\t", quote="", comment.char="", stringsAsFactors=FALSE)
# Column names, assuming the standard 19-column GeoNames dump layout
colnames(geoloc) <- c("id", "name", "asciiname", "alternatenames", "latitude", "longitude", "feature_class", "feature_code", "country_code", "cc2", "admin1_code", "admin2_code", "admin3_code", "admin4_code", "pop", "elevation", "dem", "tz", "ModDate")
# Subset the useful part
geoloc1 <- unique(geoloc[, c("latitude", "longitude", "tz")])
# Create a list of locations from our own data (raw) to match against the geolocation data
test <- na.exclude(unique(raw[, c("latitude", "longitude")]))
# Get the nearest neighbor of each location
neighbor <- nn2(geoloc1[, 1:2], test, k=1, treetype="bd")
# nn.idx refers to rows of geoloc1 (the data passed to nn2), not geoloc
test <- cbind(test, tz=geoloc1[neighbor$nn.idx[, 1], "tz"])
# Attach the time zone to every row of the original data
raw1 <- merge(raw, test, all.x=TRUE, by=c("latitude", "longitude"))

The code runs incredibly fast; I had time zone data for every user in no time.
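One caveat worth noting: nn2 measures straight-line (Euclidean) distance, so feeding it raw latitude and longitude treats degrees as flat coordinates. That is usually fine for picking a nearby city, but it can go wrong near the poles and across the 180° meridian. A possible refinement, which is not what the code above does, is to convert each point to x, y, z coordinates on a unit sphere first. The nearest neighbor by straight-line distance in 3D is then also the nearest by great-circle distance. Here is a minimal sketch with made-up points. The `to_xyz` helper, the toy `cities` table and the zone labels are all hypothetical.

```r
library(RANN)

# Hypothetical helper: latitude/longitude in degrees -> x, y, z on the unit sphere
to_xyz <- function(lat, lon) {
  lat <- lat * pi / 180
  lon <- lon * pi / 180
  cbind(x = cos(lat) * cos(lon),
        y = cos(lat) * sin(lon),
        z = sin(lat))
}

# Two made-up cities on either side of the 180° meridian, with placeholder labels
cities <- data.frame(latitude = c(-17, -17),
                     longitude = c(-179.9, 170),
                     tz = c("zone_A", "zone_B"),
                     stringsAsFactors = FALSE)
# A made-up user just west of the 180° meridian
user <- data.frame(latitude = -17, longitude = 179.9)

# Search in raw degrees: -179.9 looks about 360 degrees away from 179.9
flat <- nn2(as.matrix(cities[, c("latitude", "longitude")]),
            as.matrix(user), k = 1)

# Search on the unit sphere: the two points are neighbors again
sphere <- nn2(to_xyz(cities$latitude, cities$longitude),
              to_xyz(user$latitude, user$longitude), k = 1)

cities$tz[flat$nn.idx[1, 1]]    # "zone_B"
cities$tz[sphere$nn.idx[1, 1]]  # "zone_A"

# Chord length on the unit sphere -> great-circle distance in km
# (assuming a mean Earth radius of 6371 km)
km <- 2 * 6371 * asin(sphere$nn.dists[1, 1] / 2)
```

For users far from the poles and the date line, both searches will usually pick the same city, so whether the extra conversion is worth it depends on where your users are.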

 
