Paper at WWW 2009: The University of Amsterdam at the Web People Search Benchmark (WePS)

Krisztian Balog, Jiyin He, Katja Hofmann, Valentin Jijkoun, Christof Monz, Manos Tsagkias, Wouter Weerkamp, and Maarten de Rijke

University of Amsterdam
21 February 2009
Keywords: paper, www, entity search, semantic search, information retrieval

Abstract

In this paper we describe our participation [1] in the Second Web People Search workshop (WePS2) and detail our approaches. For the clustering task, our focus was on replicating the lessons learned at WePS1 on the data set made available for WePS2, and on experimenting with a voting-based combination of clustering methods. We found that the clustering methods display the same overall behavior on the WePS1 and WePS2 data sets, and that a hierarchical clustering approach delivers the best performance, even outperforming the voting-based combinations. For attribute extraction, we explored pattern matching with manually and automatically constructed patterns. Manual patterns were constructed from expert knowledge and an analysis of sample data. Automatic pattern construction extracts the textual and syntactic context around training samples and selects the patterns that are expected to perform well according to a leave-one-out evaluation. Experimental results show that manually constructed patterns are very effective for obtaining high recall. For automatically constructed patterns, performance varied widely depending on the attribute type. Larger amounts of training data may help improve these approaches in the future.
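The abstract does not say how the votes of the individual clustering methods are combined. One common scheme, assumed here and not taken from the paper, is pairwise majority voting: two documents share a cluster if more than half of the base clusterings put them together, and the agreed pairs are then merged transitively. The function name `combine_by_vote` and the strict-majority threshold are hypothetical choices for this sketch.

```python
from itertools import combinations


def combine_by_vote(clusterings: list[list[int]]) -> list[int]:
    """Hypothetical voting-based combination of base clusterings.

    Each clustering is a list of cluster labels, one per document.
    Two documents are linked if a strict majority of the base
    clusterings co-cluster them; linked pairs are merged
    transitively with a union-find structure.
    """
    n_docs = len(clusterings[0])
    parent = list(range(n_docs))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(n_docs), 2):
        # Count how many base clusterings put documents a and b together.
        votes = sum(labels[a] == labels[b] for labels in clusterings)
        if 2 * votes > len(clusterings):
            parent[find(a)] = find(b)

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    relabel: dict[int, int] = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n_docs)]


# Three hypothetical base clusterings of five documents sharing one name.
base = [
    [0, 0, 1, 1, 2],  # e.g. labels from a hierarchical clusterer
    [0, 0, 1, 2, 2],
    [0, 1, 1, 1, 2],
]
print(combine_by_vote(base))  # → [0, 0, 1, 1, 2]
```

Because majority agreement between pairs is not transitive, the union-find step can join documents that no majority linked directly; a stricter variant would only accept a merged cluster if most of its pairs win the vote.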

References

[1] Krisztian Balog, Jiyin He, Katja Hofmann, Valentin Jijkoun, Christof Monz, Manos Tsagkias, Wouter Weerkamp, and Maarten de Rijke. The University of Amsterdam at WePS2. In 2nd Web People Search Evaluation Workshop (WePS 2009). 18th WWW Conference, March 2009.