Paper at WebScience 2011 — Twitter hashtags: Joint Translation and Clustering

Simon Carter, Manos Tsagkias, and Wouter Weerkamp

University of Amsterdam
09 April 2011
Keywords: paper, web science, machine translation

Abstract

The popularity of microblogging platforms, such as Twitter, renders them valuable real-time information resources for tracking various aspects of worldwide events, e.g., earthquakes and political elections. Such events are usually characterized in microblog posts via the use of hashtags (#). As microbloggers come from different backgrounds and express themselves in different languages, we witness different “translations” of hashtags that nonetheless refer to the same event. Language-dependent variants of hashtags can lead to issues in content analysis. In this paper, we propose a method for translating hashtags, which builds on methods from information retrieval. The method is independent of both source and target language. Our method is desirable, either instead of or complementary to direct translation of the hashtag, for three reasons. First, we return a list of hashtags on the same topic, which takes into account the plurality and variability of hashtags used by microbloggers for assigning posts to a topic. Second, our framework accounts for the problem that microbloggers in different languages will refer to the same topic using different tokens. Finally, our method does not require special preprocessing of hashtags, reducing barriers to real-world implementation. We present proof-of-concept results for the given Spanish hashtag #33mineros.

References

[1] Simon Carter, Manos Tsagkias, and Wouter Weerkamp. 2011. Twitter hashtags: Joint Translation and Clustering. In Web Science 2011.