IBM algorithm allows location of users from Twitter history

26 Mar 2014

IBM is working on a system that would allow people's location to be traced from their Twitter history.

That may not sound like a big deal. Twitter offers location sharing, but many users leave it switched off, so there is often no direct way to tell where someone was when they tweeted.

Researchers at IBM have now developed an algorithm that could place a person by looking at 200 of their tweets, even when those tweets carry no location information.

In a paper titled Home Location Identification of Twitter Users, written by IBM's Jalal Mahmud, Jeffrey Nichols and Clemens Drews and published on the online repository arXiv, the researchers claim that the algorithm infers the home location of tweeters using time zones and a combination of statistical and heuristic classifiers. They say this represents a fresh approach.

"Unlike existing approaches, our algorithm uses an ensemble of statistical and heuristic classifiers to predict locations and makes use of a geographic gazetteer dictionary to identify place-name entities," they said.

"We find that a hierarchical classification approach, where time zone, state or geographic region is predicted first and city is predicted next, can improve prediction accuracy. We have also analysed movement variations of Twitter users, built a classifier to predict whether a user was travelling in a certain period of time and use that to further improve the location detection accuracy."

"The benefit of developing these algorithms is two-fold," explains the research paper. "First, the output can be used to create location-based visualisations and applications on top of Twitter… Second, our examination of the discriminative features used by our algorithms suggests strategies for users to employ if they wish to micro-blog publicly but not inadvertently reveal their location."

The system can estimate a user's location at different levels of granularity: city, state, time zone or geographic region. It combines statistical analysis and heuristic indicators with a geographic gazetteer (a dictionary of place names) to make its predictions.
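
To make the idea of an ensemble concrete, here is a minimal sketch in Python of how votes from a heuristic classifier and a statistical classifier might be combined. It is an illustration only, not the researchers' implementation: the function names, the tiny word lists and the weights are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical mini-gazetteer and word list, invented for this example.
GAZETTEER = {"austin": "Austin", "seattle": "Seattle"}
LOCAL_WORDS = {"longhorns": "Austin", "seahawks": "Seattle"}


def _words(tweets):
    """Yield lower-cased words with surrounding punctuation stripped."""
    for tweet in tweets:
        for word in tweet.lower().split():
            yield word.strip(".,!?#@")


def gazetteer_vote(tweets):
    """Heuristic stand-in: vote for the first gazetteer city mentioned."""
    for word in _words(tweets):
        if word in GAZETTEER:
            return GAZETTEER[word]
    return None


def word_stat_vote(tweets):
    """Statistical stand-in: vote for the city whose local words occur most."""
    counts = Counter(LOCAL_WORDS[w] for w in _words(tweets) if w in LOCAL_WORDS)
    return counts.most_common(1)[0][0] if counts else None


# (classifier, weight) pairs; the weights are assumptions, not measured values.
CLASSIFIERS = [(gazetteer_vote, 1.0), (word_stat_vote, 1.5)]


def ensemble_predict(tweets, classifiers):
    """Sum the weights behind each predicted city and return the heaviest."""
    totals = defaultdict(float)
    for classify, weight in classifiers:
        guess = classify(tweets)
        if guess is not None:
            totals[guess] += weight
    return max(totals, key=totals.get) if totals else None


tweets = ["Go Seahawks!", "Seahawks again", "Layover in Austin"]
print(ensemble_predict(tweets, CLASSIFIERS))  # Seattle
```

In this toy case the single mention of Austin is outvoted by the more heavily weighted word-based classifier, which is the kind of disagreement an ensemble is meant to settle.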

The researchers use a hierarchical approach to maximise accuracy: a user's time zone, state or geographic region is determined first, followed by the city.
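
The hierarchical idea can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions, not the algorithm from the paper: it uses a hypothetical four-entry gazetteer and plain mention counts, picking the most-mentioned state first and only then the most-mentioned city within that state.

```python
from collections import Counter

# Hypothetical toy gazetteer: place name -> (city, state). Not from the paper.
GAZETTEER = {
    "austin": ("Austin", "Texas"),
    "dallas": ("Dallas", "Texas"),
    "houston": ("Houston", "Texas"),
    "portland": ("Portland", "Oregon"),
}


def place_mentions(tweets):
    """Return the (city, state) pairs whose names appear as words in the tweets."""
    mentions = []
    for tweet in tweets:
        for word in tweet.lower().split():
            entry = GAZETTEER.get(word.strip(".,!?#@"))
            if entry is not None:
                mentions.append(entry)
    return mentions


def predict_home(tweets):
    """Pick the most-mentioned state first, then the top city inside it."""
    mentions = place_mentions(tweets)
    if not mentions:
        return None
    state = Counter(s for _, s in mentions).most_common(1)[0][0]
    city = Counter(c for c, s in mentions if s == state).most_common(1)[0][0]
    return city, state


tweets = [
    "Stuck in traffic in Austin again",
    "Dallas game tonight!",
    "Best tacos in Austin.",
    "Flying to Houston tomorrow",
    "Portland is rainy",
    "Portland coffee though",
    "Miss you Portland!",
]
print(predict_home(tweets))  # ('Austin', 'Texas')
```

Here Portland is the single most-mentioned city (three mentions), but Texas cities are mentioned four times in total, so predicting the state first leads to Austin rather than Portland. A flat city-level count would have chosen differently.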

On the basis of experimental testing, the team believes that the algorithm outperforms the best existing algorithms for predicting the home location of Twitter users.