Computing Veracity – the Fourth Challenge of Big Data

Country-level geolocation of tweets

A new paper, written in collaboration between the University of Warwick and the University of St. Andrews, has been accepted for publication in IEEE Transactions on Knowledge and Data Engineering. It explores how to determine the country from which a tweet was posted.

This study explores how tweets from an unfiltered stream can be effectively classified by country in real time. The work differs from previous research on tweet geolocation in two main respects: (1) it processes every tweet observed in a stream, whereas previous work was limited to tweets originating from a specific country or written in a single language, and (2) it is designed to run in real time, i.e., it relies solely on features inherent in a tweet and therefore excludes other widely used features such as a user’s social network, which cannot realistically be collected while processing a stream of tweets.
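To make "features inherent in a tweet" concrete, here is a minimal Python sketch of pulling such features out of a single tweet object as it arrives. The field names follow the classic Twitter JSON layout and are an assumption on our part, as is the hypothetical helper name `tweet_features`; this post does not list the exact features the paper uses.

```python
def tweet_features(tweet):
    """Collect only the fields carried by the tweet object itself.

    Hypothetical helper: the field names are assumed, not taken from the paper.
    Nothing here needs an extra lookup (e.g. a follower graph), so it can be
    applied to each tweet as it is read from a stream.
    """
    user = tweet.get("user") or {}
    return {
        "text": tweet.get("text", ""),
        "tweet_lang": tweet.get("lang"),
        "user_lang": user.get("lang"),
        "user_location": user.get("location"),
        "time_zone": user.get("time_zone"),
    }


# Illustrative input, not real data.
example = {
    "text": "Stuck in traffic again",
    "lang": "en",
    "user": {"location": "Coventry", "time_zone": "London", "lang": "en"},
}
features = tweet_features(example)
```

Each value can then be turned into classifier input (for instance, word counts for the text fields) without waiting on any data beyond the tweet.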

The task is difficult because the classifier has to distinguish between a large number of countries using the very limited data available in each tweet. Even so, the study shows that macro-F1 scores of over 85% can be achieved with a properly designed classifier.
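For readers unfamiliar with the metric, macro-F1 is the unweighted mean of the per-class F1 scores, so a country with few tweets counts as much as one with many. Below is a minimal Python sketch of the calculation. The function name and the sample country codes are illustrative and not taken from the paper, and averaging over every label seen in either list is our assumption.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only.
y_true = ["GB", "GB", "US", "FR"]
y_pred = ["GB", "US", "US", "FR"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))  # 0.778
```

In this toy example GB and US each score 2/3 and FR scores 1, giving a macro-F1 of 7/9.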

The classifier has multiple real-world applications in data mining and analytics. In the context of PHEME, it can be used to identify trustworthy information sources posting from the country where a breaking news event is developing, which could ease the task of finding eyewitnesses.

The paper can be found here: author pre-print & publisher’s site
