I design scalable systems for speech, search, recommendation, and predictive analytics, combining theory with elegant engineering to enhance user experiences. With a Ph.D. in Machine Learning and a strong foundation in Physics, I have founded three companies: MyYard, the first cloud-based ERP system for the waste management industry; 904Labs, the world’s first self-learning product search engine offered as a service; and Solumbro, which introduced the first solar-powered umbrella with a virtual assistant. As an academic, my research has earned an H-index of 20, with over 55 published papers. I have co-supervised a Ph.D. thesis and guided more than 10 master’s theses. Currently, I’m an R&D Engineer at Apple, tackling virtual assistants at the challenging intersection of speech and search.
904Labs’ self-learning search engine (904SLS) revolutionized e-commerce and content search as the first commercially available self-learning search engine. By optimizing search results in near real-time based on user behavior, 904SLS increased conversion rates and consistently achieved a 30% revenue uplift for live e-commerce shops. Offering comprehensive search capabilities from query understanding to result re-ranking, 904SLS integrates seamlessly with Lucene-based infrastructures without requiring re-indexing. The team’s exceptional engineering tackled challenges in machine learning, scalability, and domain-specific needs, delivering a state-of-the-art system that outperformed traditional learning-to-rank solutions.
Streamwatchr (2013–2016), developed at the University of Amsterdam, used machine learning and named entity extraction to track real-time global music listening behavior from tweets. Offering features like Top-100 charts, interactive maps, and a “radio” stream mode powered by a dynamic recommender system, it analyzed over 438 million tweets, identifying 660,941 artists and linking them to MusicBrainz and YouTube videos. Streamwatchr’s innovative engineering and real-time analytics, built with Python and MongoDB, provided unique insights into music trends and earned it a spot in the Dutch delegation at South by Southwest (SXSW). Read more about Streamwatchr.
In collaboration with master’s students Andrei Oghina and Mathias Breuss, I supervised a hands-on project exploring whether a movie’s IMDB rating could be predicted from social media activity before its release. By analyzing tweets about movies and their YouTube trailers, the team extracted features to train a rating classifier. The model achieved an accuracy of ±0.25 rating points, predicting ratings between 7.75 and 8.25 for movies with an average IMDB rating of 8. This work, published at ECIR 2012, earned the Best Paper Award for its novel approach to forecasting movie success. Read more about the paper.
904Labs was an Amsterdam-based artificial intelligence company that offered the first commercially available self-learning search engine. I co-founded 904Labs in 2014 and led the company until late 2019. I set the business strategy and development, as well as the research agenda and the engineering roadmap for 904Labs’ core search and recommendation algorithms.
Solumbro was an Athens-based Internet-of-Things company that designed and manufactured smart outdoor parasols. Solumbro was founded in 2015 and operated until late 2019. Solumbro was fully bootstrapped and got to a fully-working prototype, which made the news. I was an advisor at Solumbro in machine learning, big data, and software infrastructure.
MyYard (WasteLogics since 2013) is a U.K.-based software company that offers an enterprise resource planning (ERP) suite for waste management. I co-founded MyYard in 2005, and I was involved in designing and engineering the MyYard software until 2007. MyYard was the first of its kind to be offered in the cloud with a pay-as-you-go business model, business properties that still keep it ahead of the curve.
My research spans a broad spectrum of information access challenges, from predictive analytics to entity linking, online learning to rank, and voice search. Over the years, I’ve explored these topics at the University of Amsterdam, 904Labs, and Apple, resulting in 55+ publications and an H-index of 20. I’ve also contributed to the academic community as a program committee member for top-tier conferences and journals, and supervised numerous Ph.D., M.Sc., and B.Sc. theses, with some earning national and international recognition.
Community engagement and academic collaboration have been pivotal, with active roles in conferences like SIGIR, WWW, and WSDM, among others. My Ph.D. thesis, Mining Social Media: Tracking Content and Predicting Behavior, delves into tracking news content in social media and modeling user behavior. This work, alongside my industry research, continues to shape my understanding and application of machine learning and information retrieval to solve real-world problems.
I have worked on a wide spectrum of information access problems ranging from predictive analytics and content tracking to entity linking and online learning to rank. Most of my research has been conducted during 2007–2014 at Information and Language Processing Systems (ILPS) at the University of Amsterdam and it continues today at Apple. In the table below, I list an overview of my research interests over the years:
Time span | Research topic |
---|---|
2024– | Apple Intelligence, evaluation, training |
2020–2024 | Voice search, voice editing, evaluation |
2014–2019 | Product search, online learning to rank, evaluation |
2012–2014 | Entity linking, summarization, recommender systems |
2008–2012 | Predictive analytics, social media, content tracking |
2007–2008 | Speech recognition in user generated content |
As of September 6, 2021, I have published 55 papers with a total of 1,829 citations; my H-index is 20. You can access my publications via my Google Scholar profile.
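For readers unfamiliar with the metric, the H-index is the largest number h such that h papers have at least h citations each. A minimal Python sketch of that definition follows; the function name and the citation counts in the usage lines are hypothetical and are not my actual publication data.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Hypothetical citation counts, for illustration only.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

Sorting first keeps the check to a single pass: once a paper's citation count drops below its rank, every later paper fails the test too.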
Year | My Role | Ph.D. candidate | Research topic |
---|---|---|---|
2021 | Examiner | Seyyed Hadi Hashemi | Modeling Users Interacting with Smart Devices |
2017 | Copromotor | David Graus | Entities of Interest |
2014 | Examiner | Damiano Spina Valenti | Entity-based Filtering and Topic Detection for Online Reputation Monitoring in Twitter |
I am happy to have worked with more than a dozen Ph.D., M.Sc., and B.Sc. students on a variety of research topics for their theses. Some of these master’s theses have led to a publication, some have won national thesis awards, and others have won best paper or poster awards.
Year | Student and research topic |
---|---|
2021 | M.Sc. Param Popat on application-specific language models |
2020 | M.Sc. Sashank Gondala on error-driven pruning of language models |
2020 | B.Sc. Sahas Dendukuri on acoustic embeddings |
2017 | Ph.D. David Graus on Entities of Interest |
2014 | B.Sc. Thijs van der Velden on real-time music charts on Twitter |
2014 | M.Sc. Bart Eijk on recommendation ensembles for music discovery |
2014 | M.Sc. Varvara Tzika on retrieval of classifieds in auction sites |
2014 | M.Sc. Guido van Bruggen on analyzing real-time context of TV broadcasts |
2014 | M.Sc. Selvi Ratnasingam on information diffusion across languages in news and social media |
2014 | M.Sc. Nikos Voskarides on learning entity relations in big data |
2013 | M.Sc. Kerim Meijer on evaluating performance of distributed search systems |
2013 | M.Sc. Andrei Oghina on recommending content on news sites |
2013 | M.Sc. Mathias Breuss on recommending content on Twitter |
2012 | M.Sc. Mark Bakker on predicting movie Awards using Twitter |
2012 | B.Sc. Gijs van der Voort on classifying tweets as reviews |
2011 | B.Sc. Philo Kamenade on information propagation in Twitter |
2010 | M.Sc. Kamran Massoudi on microblog retrieval |
From the back cover of my thesis, Mining Social Media: Tracking Content and Predicting Behavior:
The advent of social media has established a symbiotic relationship between social media and online news. This relationship can be leveraged for tracking news content, and predicting behavior with tangible real-world applications, e.g., online reputation management, ad pricing, news ranking, and media analysis. In this thesis we focus on tracking news content in social media, and predicting user behavior.
In the first part, we develop methods for tracking content which build upon, and extend practices in Information Retrieval. We begin with discovering social media posts that discuss a news article yet they do not provide a hyperlink to it. Our methods model news articles using several channels of information, either endogenous or exogenous to the article. These models are then used to query an index of social media posts. During this process we found that the query models are close in size to the documents to be retrieved, violating a standard assumption of language modeling. We correct for this discrepancy by introducing two hypergeometric language models for modeling both queries, and documents to be retrieved.
In the second part, we focus on predicting behavior. First we look at predicting listeners’ preference in spoken user generated content, namely, podcasts. Then, we predict popularity of news articles from several news agents in terms of the volume of comments they receive. We develop models for predicting the popularity of an article for both before and after it is published. Finally, we look at a different aspect of news impact: how reading a news article affects future user browsing behavior. In each setting, we find patterns that characterize the underlying behavior and extract features that we then use to establish models for predicting online behavior.
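A hypergeometric distribution models sampling without replacement, so one way to read the hypergeometric language models mentioned in the first part is to score a document by the probability of drawing the query's terms from the document's own bag of terms. The Python sketch below uses the plain multivariate hypergeometric likelihood to illustrate that idea. It is a simplified illustration under that assumption, not the thesis's actual models, which are not reproduced here; the function name and the toy data are hypothetical.

```python
from collections import Counter
from math import comb, exp, inf, log


def log_hypergeometric_likelihood(query_terms, doc_terms):
    """Log-probability of drawing the query from the document without replacement.

    Uses the multivariate hypergeometric distribution:
        P(q | d) = prod_w C(d_w, q_w) / C(|d|, |q|)
    where d_w and q_w are the counts of term w in the document and the query.
    """
    q = Counter(query_terms)
    d = Counter(doc_terms)
    n_q = sum(q.values())
    n_d = sum(d.values())
    if n_q > n_d:
        # Cannot draw more terms than the document contains.
        return -inf
    log_p = -log(comb(n_d, n_q))
    for term, k in q.items():
        ways = comb(d[term], k)  # comb returns 0 when k exceeds the document count
        if ways == 0:
            # No smoothing in this sketch: an unseen term gives probability zero.
            return -inf
        log_p += log(ways)
    return log_p


# Toy example: a four-term document and a two-term query.
doc = ["a", "a", "b", "c"]
query = ["a", "b"]
print(exp(log_hypergeometric_likelihood(query, doc)))
```

Because the draw is without replacement, the likelihood stays well defined even when the query is nearly as long as the document, which is the situation described above. A practical retrieval model would also need some form of smoothing for query terms that are missing from the document, which this sketch leaves out.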
I defended my Ph.D. thesis in December 2012. I worked on it at the University of Amsterdam, under the supervision of Prof. Dr. Maarten de Rijke. Maarten has been a great supervisor and an exemplary researcher. Without his guidance and the people from ILPS and my co-authors, this thesis would not have been possible. Thank you!
I have served as a program committee member for the following venues.
Year | Venue |
---|---|
2021 | Special Interest Group on Information Retrieval (SIGIR) |
2021 | International Conference on Information and Knowledge Management (CIKM) |
2021 | SIGIR Workshop on eCommerce (SIGIReCom) |
2021 | Multimedia Systems Journal (MMSJ) |
2021 | Journal of Information Retrieval (IRJ) |
2020 | Web Search and Data Mining (WSDM) |
2020 | SIGIR Workshop on eCommerce (SIGIReCom) |
2020 | Transactions on Information Systems (TOIS) |
2020 | Special Interest Group on Information Retrieval (SIGIR) |
2020 | European Conference on Information Retrieval (ECIR) |
2019 | Special Interest Group on Information Retrieval (SIGIR) |
2019 | SIGIR Workshop on eCommerce (SIGIReCom) |
2019 | Transactions on Information Systems (TOIS) |
2019 | World Wide Web conference (WWW) |
2019 | Web Search and Data Mining (WSDM): Outstanding PC Member Award, Session chair on Graphs |
2018 | Special Interest Group on Information Retrieval (SIGIR) |
2018 | The Dutch-Belgian Information Retrieval Workshop (DIR) |
2018 | World Wide Web conference (WWW) |
2017 | Special Interest Group on Information Retrieval (SIGIR) |
2017 | Transactions on Information Systems (TOIS) |
2016 | Special Interest Group on Information Retrieval (SIGIR) |
2016 | Transactions on Information Systems (TOIS) |
2015 | International Conference on Knowledge Discovery and Information Retrieval (KDIR) |
2015 | International Conference on Information and Knowledge Management (CIKM) |
2015 | Information Processing & Management (IPM) |
2015 | Web Search and Data Mining (WSDM) |
2015 | Artificial Intelligence (AIRE) |
2015 | Special Interest Group on Information Retrieval (SIGIR) |
2015 | Transactions on Information Systems (TOIS) |
2014 | Workshop on Machine Learning for Predictive Models |
2014 | Information Processing & Management (IPM) |
2014 | Special Interest Group on Information Retrieval (SIGIR) |
2014 | Information Retrieval Facility Conference (IRFC) |
2014 | Workshop on Social Multimedia and Storytelling |
2014 | Neurocomputing Journal |
2014 | Information Retrieval Journal (IRJ) |
2014 | Social News On the Web |
2014 | IEEE’s Transactions on Knowledge and Data Engineering |
2014 | European Conference on Information Retrieval (ECIR) |
2013 | Journal of Internet Services and Applications |
2013 | International Joint Conference on Natural Language Processing |
2013 | International Workshop and Challenge on News Recommender Systems |
2013 | Information Retrieval Facility Conference (IRFC) |
2013 | Information Processing & Management (IPM) |
2013 | Special Interest Group on Information Retrieval (SIGIR) |
2013 | The Dutch-Belgian Information Retrieval Workshop (DIR) |
2013 | Transactions on Information Systems (TOIS) |
2013 | European Conference on Information Retrieval (ECIR) |
2013 | Transactions on the Web (ToW) |
2013 | Journal of the Association for Information Science and Technology (JASIST) |
2012 | The Dutch-Belgian Information Retrieval Workshop (DIR) |
2012 | European Conference on Information Retrieval (ECIR) |
2011 | Journal of Computer Assisted Learning (JCAL) |
2009 | Symposium on String Processing and Information Retrieval (SPIRE) |