Paper at SIGIR 2008 — Term Clouds as Surrogates for User Generated Speech

Manos Tsagkias, Martha Larson, and Maarten de Rijke

University of Amsterdam
20 June 2008
Keywords: paper, speech, information retrieval, sigir

Abstract

User-generated spoken audio remains a challenge for Automatic Speech Recognition (ASR) technology, and content-based audio surrogates derived from ASR transcripts must be robust to recognition errors. An investigation of term clouds as surrogates for podcasts demonstrates that term clouds derived from ASR transcripts closely approximate term clouds derived from human-generated transcripts across a range of cloud sizes. A user study confirms that ASR term clouds are viable surrogates for depicting the content of podcasts.
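The abstract compares term clouds built from ASR transcripts with clouds built from human-generated transcripts at several cloud sizes. The sketch below illustrates that kind of comparison; it is not the method used in the paper. It assumes, hypothetically, that a cloud is the top-N most frequent terms after stopword removal, and that agreement is the fraction of the human-transcript cloud's terms that also appear in the ASR cloud. The function names, stopword list, and toy transcripts are invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for", "we"}

# Invented toy transcripts: the mock ASR output splits "podcast" and mishears "history".
HUMAN = "the podcast discusses jazz music and jazz history in new orleans music clubs"
ASR = "the pod cast discusses jazz music and jazz his story in new orleans music clubs"


def term_cloud(transcript: str, size: int) -> list[str]:
    """Hypothetical cloud construction: the `size` most frequent non-stopword terms."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Sort by descending frequency; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:size]]


def cloud_overlap(reference: list[str], candidate: list[str]) -> float:
    """Fraction of terms in the reference cloud that also appear in the candidate cloud."""
    if not reference:
        return 0.0
    return len(set(reference) & set(candidate)) / len(reference)


if __name__ == "__main__":
    for size in (2, 4, 8):
        overlap = cloud_overlap(term_cloud(HUMAN, size), term_cloud(ASR, size))
        print(size, overlap)  # prints 2 1.0, then 4 0.75, then 8 0.75
```

In this toy case the most frequent terms ("jazz", "music") survive the recognition errors, so the smallest cloud matches exactly, while larger clouds start to include misrecognized low-frequency terms such as "cast" and "his". The alphabetical tie-breaking is an arbitrary choice and can itself change which single-occurrence terms enter a cloud.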

References

[1] Manos Tsagkias, Martha Larson, and Maarten de Rijke. 2008. Term clouds as surrogates for user generated speech. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '08). Association for Computing Machinery, New York, NY, USA, 773–774.