About

Manos Tsagkias

05 December 2024

Short bio

I design scalable systems for speech, search, recommendation, and predictive analytics, combining theory with elegant engineering to enhance user experiences. With a Ph.D. in Machine Learning and a strong foundation in Physics, I have founded three companies: MyYard, the first cloud-based ERP system for the waste management industry; 904Labs, the world’s first self-learning product search engine offered as a service; and Solumbro, which introduced the first solar-powered umbrella with a virtual assistant. As an academic, my research has earned an H-index of 20, with over 55 published papers. I have co-supervised a Ph.D. thesis and guided more than 10 master’s theses. Currently, I’m an R&D Engineer at Apple, tackling virtual assistants at the challenging intersection of speech and search.

Highlights

904Labs self-learning search engine

904Labs’ self-learning search engine (904SLS) revolutionized e-commerce and content search as the first commercially available self-learning search engine. By optimizing search results in near real-time based on user behavior, 904SLS increased conversion rates and consistently achieved a 30% revenue uplift for live e-commerce shops. Offering comprehensive search capabilities from query understanding to result re-ranking, 904SLS integrated seamlessly with Lucene-based infrastructures without requiring re-indexing. The team’s exceptional engineering tackled challenges in machine learning, scalability, and domain-specific needs, delivering a state-of-the-art system that outperformed traditional learning-to-rank solutions.

Streamwatchr

Streamwatchr (2013–2016), developed at the University of Amsterdam, used machine learning and named entity extraction to track real-time global music listening behavior from tweets. Offering features like Top-100 charts, interactive maps, and a “radio” stream mode powered by a dynamic recommender system, it analyzed over 438 million tweets, identifying 660,941 artists and linking them to MusicBrainz and YouTube videos. Streamwatchr’s innovative engineering and real-time analytics, built with Python and MongoDB, provided unique insights into music trends and earned it a spot in the Dutch delegation at South by Southwest (SXSW). Read more about Streamwatchr.

Predicting IMDB movie ratings

In collaboration with master’s students Andrei Oghina and Mathias Breuss, I supervised a hands-on project exploring whether a movie’s IMDB rating could be predicted from social media activity before its release. By analyzing tweets about movies and their YouTube trailers, the team extracted features to train a rating classifier. The model’s predictions fell within ±0.25 of the actual rating, for example predicting 7.75 or 8.25 for a movie with an average IMDB rating of 8. This groundbreaking work, published at ECIR 2012, earned the Best Paper Award for its novel approach to forecasting movie success. Read more about the paper.

Ventures

904Labs was an Amsterdam-based artificial intelligence company that offered the first commercially available self-learning search engine. I co-founded 904Labs in 2014 and led the company until late 2019. I set the business strategy and development, as well as the research agenda and the engineering roadmap for 904Labs’ core search and recommendation algorithms.

Solumbro was an Athens-based Internet-of-Things company that designed and manufactured smart outdoor parasols. Solumbro was founded in 2015 and operated until late 2019. It was fully bootstrapped and reached a fully working prototype, which made the news. I advised Solumbro on machine learning, big data, and software infrastructure.

MyYard (WasteLogics since 2013) is a U.K.-based software company that offers an enterprise resource planning (ERP) suite for waste management. I co-founded MyYard in 2005, and I was involved in designing and engineering the MyYard software until 2007. MyYard was the first of its kind to be offered in the cloud with a pay-as-you-go business model, two business properties that still keep it ahead of the curve.

Research

My research spans a broad spectrum of information access challenges, from predictive analytics to entity linking, online learning to rank, and voice search. Over the years, I’ve explored these topics at the University of Amsterdam, 904Labs, and Apple, resulting in 55+ publications and an H-index of 20. I’ve also contributed to the academic community as a program committee member for top-tier conferences and journals, and supervised numerous Ph.D., M.Sc., and B.Sc. theses, with some earning national and international recognition.

Community engagement and academic collaboration have been pivotal, with active roles in conferences like SIGIR, WWW, and WSDM, among others. My Ph.D. thesis, Mining Social Media: Tracking Content and Predicting Behavior, delves into tracking news content in social media and modeling user behavior. This work, alongside my industry research, continues to shape my understanding and application of machine learning and information retrieval to solve real-world problems.

Research Interests

I have worked on a wide spectrum of information access problems ranging from predictive analytics and content tracking to entity linking and online learning to rank. Most of my research was conducted between 2007 and 2014 at Information and Language Processing Systems (ILPS) at the University of Amsterdam, and it continues today at Apple. The table below gives an overview of my research interests over the years:

Time span   Research topic
2024–       Apple Intelligence, evaluation, training
2020–2024   Voice search, voice editing, evaluation
2014–2019   Product search, online learning to rank, evaluation
2012–2014   Entity linking, summarization, recommender systems
2008–2012   Predictive analytics, social media, content tracking
2007–2008   Speech recognition in user generated content

As of September 6, 2021, I have published 55 papers with a total of 1,829 citations; my H-index is 20. You can access my publications via my Google Scholar profile.

Member of Ph.D. Examining Committees

Year  My Role     Ph.D. candidate        Research topic
2021  Examiner    Seyyed Hadi Hashemi    Modeling Users Interacting with Smart Devices
2017  Copromotor  David Graus            Entities of Interest
2014  Examiner    Damiano Spina Valenti  Entity-based Filtering and Topic Detection for Online Reputation Monitoring in Twitter

Student Supervision

I am happy to have worked with more than a dozen Ph.D., M.Sc., and B.Sc. students on a variety of research topics for their theses. Some of these master’s theses have led to publications, some have won national thesis awards, and others have won best paper or best poster awards.

Year  Student and research topic
2021  M.Sc. Param Popat on application-specific language models
2020  M.Sc. Sashank Gondala on error-driven pruning of language models
      B.Sc. Sahas Dendukuri on acoustic embeddings
2017  Ph.D. David Graus on Entities of Interest
2014  B.Sc. Thijs van der Velden on real-time music charts on Twitter
      M.Sc. Bart Eijk on recommendation ensembles for music discovery
      M.Sc. Varvara Tzika on retrieval of classifieds in auction sites
      M.Sc. Guido van Bruggen on analyzing real-time context of TV broadcasts
      M.Sc. Selvi Ratnasingam on information diffusion across languages in news and social media
      M.Sc. Nikos Voskarides on learning entity relations in big data
2013  M.Sc. Kerim Meijer on evaluating performance of distributed search systems
      M.Sc. Andrei Oghina on recommending content on news sites
      M.Sc. Mathias Breuss on recommending content on Twitter
2012  M.Sc. Mark Bakker on predicting movie awards using Twitter
      B.Sc. Gijs van der Voort on classifying tweets as reviews
2011  B.Sc. Philo Kamenade on information propagation in Twitter
2010  M.Sc. Kamran Massoudi on microblog retrieval

Ph.D. Thesis

From the back cover of my thesis, Mining Social Media: Tracking Content and Predicting Behavior:

The advent of social media has established a symbiotic relationship between social media and online news. This relationship can be leveraged for tracking news content, and predicting behavior with tangible real-world applications, e.g., online reputation management, ad pricing, news ranking, and media analysis. In this thesis we focus on tracking news content in social media, and predicting user behavior.

In the first part, we develop methods for tracking content which build upon, and extend practices in Information Retrieval. We begin with discovering social media posts that discuss a news article yet they do not provide a hyperlink to it. Our methods model news articles using several channels of information, either endogenous or exogenous to the article. These models are then used to query an index of social media posts. During this process we found that the query models are close in size to the documents to be retrieved, violating a standard assumption of language modeling. We correct for this discrepancy by introducing two hypergeometric language models for modeling both queries, and documents to be retrieved.

In the second part, we focus on predicting behavior. First we look at predicting listeners’ preference in spoken user generated content, namely, podcasts. Then, we predict popularity of news articles from several news agents in terms of the volume of comments they receive. We develop models for predicting the popularity of an article for both before and after it is published. Finally, we look at a different aspect of news impact: how reading a news article affects future user browsing behavior. In each setting, we find patterns that characterize the underlying behavior and extract features that we then use to establish models for predicting online behavior.

I defended my Ph.D. thesis in December 2012. I worked on it at the University of Amsterdam, under the supervision of Prof. Dr. Maarten de Rijke. Maarten has been a great supervisor and an exemplary researcher. Without his guidance, the people from ILPS, and my co-authors, this thesis would not have been possible. Thank you!

Download a copy (PDF, 3.7MB)

Community Service

I have served as a program committee member for the following venues.

Year  Venue
2021  Special Interest Group on Information Retrieval (SIGIR)
      International Conference on Information and Knowledge Management (CIKM)
      SIGIR Workshop on eCommerce (SIGIReCom)
      Multimedia Systems Journal (MMSJ)
      Journal of Information Retrieval (IRJ)
2020  Web Search and Data Mining (WSDM)
      SIGIR Workshop on eCommerce (SIGIReCom)
      Transactions on Information Systems (TOIS)
      Special Interest Group on Information Retrieval (SIGIR)
      European Conference on Information Retrieval (ECIR)
2019  Special Interest Group on Information Retrieval (SIGIR)
      SIGIR Workshop on eCommerce (SIGIReCom)
      Transactions on Information Systems (TOIS)
      World Wide Web conference (WWW)
      Web Search and Data Mining (WSDM) – Outstanding PC Member Award, Session chair on Graphs
2018  Special Interest Group on Information Retrieval (SIGIR)
      The Dutch-Belgian Information Retrieval Workshop (DIR)
      World Wide Web conference (WWW)
2017  Special Interest Group on Information Retrieval (SIGIR)
      Transactions on Information Systems (TOIS)
2016  Special Interest Group on Information Retrieval (SIGIR)
      Transactions on Information Systems (TOIS)
2015  International Conference on Knowledge Discovery and Information Retrieval (KDIR)
      International Conference on Information and Knowledge Management (CIKM)
      Information Processing & Management (IPM)
      Web Search and Data Mining (WSDM)
      Artificial Intelligence (AIRE)
      Special Interest Group on Information Retrieval (SIGIR)
      Transactions on Information Systems (TOIS)
2014  Workshop on Machine Learning for Predictive Models Information Processing & Management (IPM)
      Special Interest Group on Information Retrieval (SIGIR)
      Information Retrieval Facility Conference (IRFC)
      Workshop on Social Multimedia and Storytelling Neurocomputing Journal
      Information Retrieval Journal (IRJ)
      Social News On the Web
      IEEE’s Transactions on Knowledge and Data Engineering
      European Conference on Information Retrieval (ECIR)
2013  Journal of Internet Services and Applications
      International Joint Conference on Natural Language Processing
      International Workshop and Challenge on News Recommender Systems
      Information Retrieval Facility Conference (IRFC)
      Information Processing & Management (IPM)
      Special Interest Group on Information Retrieval (SIGIR)
      The Dutch-Belgian Information Retrieval Workshop (DIR)
      Transactions on Information Systems (TOIS)
      European Conference on Information Retrieval (ECIR)
      Transactions on the Web (ToW)
      Journal of the Association for Information Science and Technology (JASIST)
2012  The Dutch-Belgian Information Retrieval Workshop (DIR)
      European Conference on Information Retrieval (ECIR)
2011  Journal of Computer Assisted Learning (JCAL)
2009  Symposium on String Processing and Information Retrieval (SPIRE)