Paper at WWW 2009 — The University of Amsterdam at Web People Search Benchmark (WePS)

Abstract

In this paper we describe our participation [1] in the Second Web People Search workshop (WePS2) and detail our approaches. For the clustering task, our focus was on replicating the lessons learned at WePS1 on the data set made available as part of WePS2 and on experimenting with a voting-based combination of clustering methods. We found that clustering methods display the same overall behavior on the WePS1 and WePS2 data sets and that a hierarchical clustering approach delivers the best performance, even outperforming voting-based combinations. For attribute extraction, we explore approaches using pattern matching with manually and automatically constructed patterns. Manual patterns were constructed using expert knowledge and following analysis of sample data. Automatic pattern construction extracts textual and syntactic context around training samples and selects patterns which are expected to perform well based on leave-one-out evaluation. Experimental results show that manually constructed patterns are very effective for obtaining high recall. For automatically extracted patterns, performance varied widely depending on the attribute type. Larger amounts of training data may help improve these approaches in the future.
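To make the idea of pattern-based attribute extraction concrete, here is a minimal sketch in Python. It is not the method from the paper: the attribute names, the regular expressions, and the function `extract_attributes` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical hand-written patterns, one per attribute type.
# These are illustrative assumptions, not the patterns used in the paper.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "date_of_birth": re.compile(r"\bborn (?:on )?(\d{1,2} [A-Z][a-z]+ \d{4})"),
    "affiliation": re.compile(
        r"\bworks (?:at|for) ((?:[A-Z][\w&-]*)(?: (?:of|[A-Z][\w&-]*))*)"
    ),
}


def extract_attributes(text):
    """Return a dict mapping attribute name to the list of values found in text."""
    found = {}
    for name, pattern in PATTERNS.items():
        values = []
        for match in pattern.finditer(text):
            # Use the capturing group when the pattern has one, else the whole match.
            values.append(match.group(1) if pattern.groups else match.group(0))
        if values:
            found[name] = values
    return found


if __name__ == "__main__":
    page = (
        "Jane Doe was born on 3 March 1970 and works at University of Amsterdam. "
        "Contact: jane.doe@example.org"
    )
    print(extract_attributes(page))
```

Hand-written patterns of this kind are simple to extend per attribute type. The automatic variant described in the abstract would instead derive candidate patterns from the context around training samples and keep those that score well under leave-one-out evaluation, a step this sketch does not cover.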

References

[1] Krisztian Balog, Jiyin He, Katja Hofmann, Valentin Jijkoun, Christof Monz, Manos Tsagkias, Wouter Weerkamp, and Maarten de Rijke. The University of Amsterdam at WePS2. In 2nd Web People Search Evaluation Workshop (WePS 2009), 18th WWW Conference, March 2009.

We describe the participation of the University of Amsterdam’s ILPS group in the blog, web, entity, and relevance feedback tracks at TREC 2009. Our main preliminary conclusions are as follows. For the Blog track we find that for top stories identification a blogs to news approach outperforms a simple news to blogs approach. This is interesting, as the blogs to news approach starts with no input except for a date, whereas the news to blogs approach also has news headlines as input. For the web track, we find that spam is an important issue in the ad hoc task and that Wikipedia-based heuristic optimization approaches help to boost retrieval performance, which is assumed to potentially reduce the spam in top-ranked documents. As for the diversity task, we explored different methods. Initial results show that clustering and a topic model-based approach have similar performance, and both perform better than a query log based approach. Our performance in the Entity track was downright

24.48% similar — Paper at TREC 2009 — The University of Amsterdam at TREC 2009
Figure 1. “Detecting Algorithmic Discrimination” by Carlos Castillo, presented at DIR 2016. Delft, the Netherlands.

20.29% similar — Attending DIR 2016
Recent years have witnessed a persistent interest in generating pseudo test collections, both for training and evaluation purposes. We describe a method for generating queries and relevance judgments for microblog search in an unsupervised way. Our starting point is this intuition: tweets with a hashtag are relevant to the topic covered by the hashtag and hence to a suitable query derived from the hashtag. Our baseline method selects all commonly used hashtags, and all associated tweets as relevance judgments; we then generate a query from these tweets. Next, we generate a timestamp for each query, allowing us to use temporal information in the training process. We then enrich the generation process with knowledge derived from an editorial test collection for microblog search.
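The baseline described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `min_tweets` threshold for "commonly used" hashtags, and the query generation rule (most frequent terms, ties broken alphabetically) are stand-ins invented here, not the paper's actual procedure. Timestamps and the enrichment step are left out.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
TOKEN = re.compile(r"[a-z0-9]+")


def terms(text):
    """Lowercase the text, drop hashtags, and return alphanumeric tokens."""
    return TOKEN.findall(HASHTAG.sub(" ", text.lower()))


def build_pseudo_collection(tweets, min_tweets=2, query_len=2):
    """Build {hashtag: {"query": [...], "relevant": [...]}} from {tweet_id: text}.

    min_tweets and query_len are hypothetical parameters for this sketch.
    """
    # Step 1: tweets carrying a hashtag are treated as relevant to it.
    relevant = {}
    for tweet_id, text in tweets.items():
        for tag in set(HASHTAG.findall(text.lower())):
            relevant.setdefault(tag, []).append(tweet_id)

    collection = {}
    for tag, ids in relevant.items():
        # Step 2: keep only commonly used hashtags.
        if len(ids) < min_tweets:
            continue
        # Step 3: generate a query from the associated tweets
        # (assumed rule: most frequent terms, alphabetical tie-break).
        counts = Counter(t for i in ids for t in terms(tweets[i]))
        ranked = sorted(counts, key=lambda t: (-counts[t], t))
        collection[tag] = {"query": ranked[:query_len], "relevant": sorted(ids)}
    return collection


if __name__ == "__main__":
    sample = {
        1: "Earthquake hits Chile #quake",
        2: "Strong earthquake near Chile coast #quake",
        3: "Lunch time #food",
    }
    print(build_pseudo_collection(sample))
```

A frequency-based query is the simplest possible choice. A more discriminative term weighting could be swapped in at step 3 without changing the rest of the sketch.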

19.79% similar — Paper at SIGIR 2013 — Pseudo Test Collections for Training and Tuning Microblog Rankers
Starting my talk at ECIR 2018 Industry track. Courtesy of Gabriela Kazai.

The European Conference on Information Retrieval (ECIR) is an annual European scientific conference around the topics of search engines, recommender systems, text analytics, user modeling, and evaluation. This year ECIR was held in Grenoble, France. With more than 250 attendees and 4 days packed with tutorials, workshops, research, and industry talks, it was a great place to get updated on the latest and greatest in search engines.

The theme of this year’s Industry Day was to bring lessons learned from industry to academia. What are the differences between developing a research algorithm and bringing it into practice? These lessons could inspire our fellow academics and inform them about the challenges practitioners face when bringing these algorithms to production.

19.28% similar — Talk at ECIR 2018 Industry day
References

[1] David Graus, Daan Odijk, Manos Tsagkias, Wouter Weerkamp, and Maarten de Rijke. 2014. Semanticizing search engine queries: the University of Amsterdam at the ERD 2014 challenge. In Proceedings of the first international workshop on Entity recognition & disambiguation (ERD '14). Association for Computing Machinery, New York, NY, USA, 69–74.

19.18% similar — Paper at ERD 2014 — Semanticizing Search Engine Queries
For our experimental evaluation, we use data from Twitter, Digg, Delicious, the New York Times Community, Wikipedia, and the blogosphere to generate query models. We show that different query models, based on different data sources, provide complementary information and manage to retrieve different social media utterances from our target index. As a consequence, data fusion methods manage to significantly boost retrieval performance over individual approaches. Our graph-based term selection method is shown to help improve both effectiveness and efficiency.

17.79% similar — Paper at WSDM 2011 — Linking online news and social media
Abstract

Podcasts display an unevenness characteristic of domains dominated by user generated content, resulting in potentially radical variation of the user preference they enjoy. We report on work that uses easily extractable surface features of podcasts in order to achieve solid performance on two podcast preference prediction tasks: classification of preferred vs. non-preferred podcasts and ranking podcasts by level of preference. We identify features with good discriminative potential by carrying out manual data analysis, resulting in a refinement of the indicators of an existent podcast preference framework. Our preference prediction is useful for topic-independent ranking of podcasts, and can be used to support download suggestion or collection browsing.

15.05% similar — Paper at ECIR 2009 — Exploiting Surface Features for the Prediction of Podcast Preference

Paper at WWW 2009 — The University of Amsterdam at Web People Search Benchmark (WePS)

Krisztian Balog, Jiyin He, Katja Hofmann, Valentin Jijkoun, Christof Monz, Manos Tsagkias, Wouter Weerkamp, and Maarten de Rijke

University of Amsterdam

21 February 2009

Keywords: paper, www, entity search, semantic search, information retrieval
