Full Paper at ICASSP 2021 with the Apple Siri Team

Happy to share yet another publication with the Siri Speech team at Apple, this time led by Sashank Gondala, who interned with us last year. Our full paper “Error-driven Pruning of Language Models for Virtual Assistants” has been accepted at ICASSP 2021.

Language models (LMs) for virtual assistants (VAs) are typically trained on large amounts of data, resulting in prohibitively large models which require excessive memory and/or cannot be used to serve user requests in real-time. Entropy pruning results in smaller models but with significant degradation of effectiveness in the tail of the user request distribution. We customize entropy pruning by allowing for a keep list of infrequent n-grams that require a more relaxed pruning threshold, and propose three methods to construct the keep list. Each method has its own advantages and disadvantages with respect to LM size, ASR accuracy and cost of constructing the keep list. Our best LM gives 8% average Word Error Rate (WER) reduction on a targeted test set, but is 3 times larger than the baseline. We also propose discriminative methods to reduce the size of the LM while retaining the majority of the WER gains achieved by the largest LM.
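
To make the keep-list idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the per-n-gram `scores` are a stand-in for the entropy-pruning criterion, and the function name, thresholds, and toy values are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

NGram = Tuple[str, ...]


def prune_with_keep_list(
    scores: Dict[NGram, float],
    keep_list: Set[NGram],
    threshold: float,
    relaxed_threshold: float,
) -> Dict[NGram, float]:
    """Return the n-grams that survive pruning.

    `scores` is a stand-in for the pruning criterion (in entropy pruning,
    the increase in relative entropy caused by removing the n-gram).
    An n-gram is dropped when its score is below the threshold that applies
    to it; n-grams on the keep list are judged against the lower, relaxed one.
    """
    kept: Dict[NGram, float] = {}
    for ngram, score in scores.items():
        limit = relaxed_threshold if ngram in keep_list else threshold
        if score >= limit:
            kept[ngram] = score
    return kept


# Toy values, invented for illustration only.
scores = {
    ("play", "music"): 5e-6,
    ("call", "mom"): 2e-6,
    ("play", "zydeco"): 3e-8,  # infrequent n-gram placed on the keep list
    ("the", "of"): 1e-9,
}
keep_list = {("play", "zydeco")}

pruned = prune_with_keep_list(
    scores, keep_list, threshold=1e-7, relaxed_threshold=1e-8
)
plain = prune_with_keep_list(
    scores, set(), threshold=1e-7, relaxed_threshold=1e-8
)
```

With these toy numbers, the keep-list n-gram survives in `pruned` but is removed in `plain`, which is the size-versus-tail-accuracy trade-off the abstract describes.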


Paper on eCommerce at SIGIR Forum

As eCommerce is becoming increasingly important, we recruited a small team of researchers from major and not-so-major players in the field to share views and experiences on theoretical and practical challenges and future directions in eCommerce search and recommendations. The result of our effort, “Challenges and Research Opportunities in eCommerce Search and Recommendations”, was published in the June 2020 issue of SIGIR Forum.

With the rapid adoption of online shopping, academic research in the eCommerce domain has gained traction. However, significant research challenges remain, spanning from classic eCommerce search problems such as matching textual queries to multi-modal documents and ranking optimization for two-sided marketplaces to human-computer interaction and recommender systems for discovery and browsing. These research areas are important for understanding customer behavior, driving engagement, and improving product discoverability and conversion. In this article we identify the challenges and highlight research opportunities to improve the eCommerce customer experience.

Short Paper with the Apple Siri Team at SIGIR 2020

Happy to share my first publication with the Siri Speech team at Apple. Our short paper “Predicting Entity Popularity to Improve Spoken Entity Recognition by Virtual Assistants” has been accepted at SIGIR 2020.

We focus on improving the effectiveness of a Virtual Assistant (VA) in recognizing emerging entities in spoken queries. We introduce a method that uses historical user interactions to forecast which entities will gain in popularity and become trending, and it subsequently integrates the predictions within the Automated Speech Recognition (ASR) component of the VA. Experiments show that our proposed approach results in a 20% relative reduction in errors on emerging entity name utterances without degrading the overall recognition quality of the system.

904Labs: Goodbye and thanks for all the clicks!

The end of 2019 will mark the fifth anniversary of 904Labs. Wouter and I founded 904Labs in late 2014 and embarked on the adventure of becoming entrepreneurs while holding tight to our traits as scientists. We took courses in entrepreneurship, joined our university’s startup incubator, poured all the money we had saved up to that point into the company, and set sail to bring state-of-the-art search to the masses.

A lot has happened in these five years. We met incredible people at all levels, from local startups to worldwide multinationals. Each one helped us in their own way, from validating our ideas and shaping our product, A.I. for Search, to running pilots. We received support from our university, the University of Amsterdam, through grants and loans, which helped us get through the development phase of our product and bring it to market. Within a year of starting, we grew from just two founders to a team of six, and our code base went from a bunch of scripts to a full-blown, distributed, horizontally scalable, multi-tenant architecture. In our second year we ran our first pilot, with massive success: A.I. for Search managed to increase search-driven revenue by 38%! The same year we landed our first big customer, and the year after a second, bigger one. By 2019, A.I. for Search had served searches from more than 15 countries and in an equal number of languages. It has battled and won in more than 10 A/B tests, in all of which it proved to substantially increase search-driven revenue. Tech-wise, 904Labs has been a success story, really.

Business-wise, the story is slightly different. Wouter and I, along with our business developers and salespeople, found it hard to sell A.I. for Search. Customers were excited about the tech and its potential, but the deals were not closing. We banged our heads against the wall for a long time but didn’t manage to work out why the deals were not closing, at least not early enough. Three years in and without a good customer base, we hit a financial storm. We had to let all of our people go, and our offices as well. Despite the financial stress, we decided to keep the company running while hoping for better days; we minimized all possible expenses to buy as much time as we could until new deals came in, which would allow us to reboot the company. Two years on, we hadn’t managed to close new deals, but the epiphany we had been looking for since our early days struck us: deals weren’t closing because we had been targeting a very narrow and difficult customer segment, which was insufficient to sustain the company growth we needed and wished for. The realization hit us hard. We spent a lot of time thinking about alternatives but couldn’t find a satisfying one. At the beginning of 2019, one of our two customers left, and their departure marked the beginning of winding down 904Labs.

We decided that, despite the difficult customer segment we had chosen to target, 904Labs’ tech was still magical and had to remain with the world no matter what the future of 904Labs as a company would be. We reached out to anyone and any entity for whom we believed our tech would be useful: our professional network, potential customers, competitors. We offered a one-time license to our source code. After taking hold of the code, one would be able to do whatever they wanted with it. In the one-time license fee we included the latest version of our code base, our infrastructure, documentation, our processes, and, of course, onsite training and support. We’ve been actively talking with a few companies now, and they will bring in A.I. for Search either to complement and strengthen their offering or to use it internally in different projects. The next few quarters may bring the company of 904Labs to an end, but its tech will survive, and it will keep helping people find what they’re looking for by learning from one click at a time, as it used to.

As to what Wouter and I are up to next, it has been a tough decision but we’re on good and exciting new paths. Wouter has joined Zeta-Alpha-Vector, a Dutch deep learning research company based in Amsterdam, and I’m joining Apple in California to work with the Siri Understanding team. While at 904Labs, we both learned so much: building and running a business, developing a product, managing engineering and sales teams. It’s been an invaluable experience. 904Labs gifted us the knowledge of what it takes to build a product in the real world, which is very different from what we had in mind when we left academia. By now we know that a product needs to be fun from a tech/scientific perspective, but it is equally important that it satisfies customers’ needs and solves specific problems, all while being bound by limited resources, be it time, people, or budget. Finding the right balance across all three dimensions is key to product success. The lesson may have come late for 904Labs, but we are excited to be able to apply this hard-earned knowledge, as well as earlier and newly acquired skills, to new environments and challenges!

Before waving goodbye, we would like to take a moment and extend a big thank you to everyone who has been involved with 904Labs, directly or indirectly: our customers, engineers, salespeople, university, mentors, shareholders, network, and last, but not least, our families and friends who’ve been very supportive throughout this entire journey. It’s been a fun ride but not always easy. We wouldn’t have made it so far without their support. Thank you!

Thoughts on the future of Search in E-commerce

I was invited to give a talk at the SIGIR Workshop on eCommerce 2019 but, unfortunately, I am not attending SIGIR this year. Instead, I wrote down some thoughts on interesting problems, ideas and challenges in the e-commerce domain. Here they are:

  1. The vocabulary gap is still an open problem. People refer to products in ways that differ from how products are described in the catalogue. Neural networks, learning to rank, and query intent engines are all important. Our experience at 904Labs has shown that a query intent engine that boosts specific product categories given a query lifts our learning to rank system by more than 16% in additional revenue. These results suggest that effective initial ranking is very important for effective learning to rank. To this end, we foresee an increasing interest in methods that can re-rank the entire collection and not only the top-N documents. This is particularly important for larger shops with large inventories (hundreds of thousands of items or more), where a query can return hundreds of items and only the top few are re-ranked.
  2. E-commerce search is as much about exploration as it is about finding the best match. From our experience at 904Labs, we see a large fraction of queries revolve around categories, or combinations of categories, e.g., “red shoes”, “kitchen tables”, “dvd players”, “ebooks for 12 years old”. These types of queries go beyond typical search and require understanding the query and generating a list of recommended relevant items. This proposition is supported by the surprising effectiveness of sorting by popularity; the most popular items for a query are potentially good candidates for the “recommendation list” that the user is looking for. Coming back to query understanding for this type of exploratory query, one would think that natural language processing can help here, but in the e-commerce setting queries are very short and any language analysis falls short. An open question here is: how can we go from these broad queries to a good set of recommendations? A natural way is to devise a hybrid system of search and recommendations: we fire the query to a search system and then take the first few items as seeds for a recommender system that retrieves similar items. Another is a system that transforms a query into an image (think AttnGAN or similar) and each product into an image, and then ranks documents by their image similarity to the query’s. The image representation may constrain the latent space, abstract away the language of the query and that of the document, and capture semantics that are otherwise difficult to encode in textual form. The image representation of queries and documents also offers explainability for the system’s rankings, which is becoming increasingly important in machine learning-based systems.
  3. Evaluation metrics in e-commerce. There are several directions here that need more work. First, the e-commerce setting is a conjunction of exploratory search, typical search, and recommendations. Using the standard IR precision and recall measures for e-commerce may not tell us the entire story of how happy a system makes its users. We need to discover the aspects of a system that make users happy in order to design one or more metrics for evaluating search and recommendation systems. These metrics should correlate well not only with revenue but also with customer loyalty (measured in returning customers and in shortening the time between returns). Such a metric (or set of metrics) will then allow us to run offline experiments and make predictions on revenue, which is the main KPI on which production systems are evaluated in most e-commerce businesses.
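
The hybrid search-plus-recommendations idea from point 2 can be sketched roughly as follows. This is a toy illustration, not a production design: the catalogue, the term-overlap search, and the Jaccard-based recommender are all assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List, Set

# Toy catalogue: product id -> descriptive tags (invented data).
CATALOGUE: Dict[str, Set[str]] = {
    "p1": {"red", "shoes", "sneakers"},
    "p2": {"red", "shoes", "boots"},
    "p3": {"blue", "sneakers", "shoes"},
    "p4": {"red", "dress"},
    "p5": {"kitchen", "table"},
}


def search(query: str, k: int) -> List[str]:
    """Rank products by the number of query terms they match; keep the top k."""
    terms = set(query.lower().split())
    matches = []
    for pid, tags in CATALOGUE.items():
        overlap = len(terms & tags)
        if overlap > 0:
            matches.append((overlap, pid))
    # Highest overlap first; ties broken by product id for determinism.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [pid for _, pid in matches[:k]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two tag sets."""
    return len(a & b) / len(a | b)


def recommend(seeds: List[str], n: int) -> List[str]:
    """Return up to n non-seed products most similar to any of the seeds."""
    candidates = []
    for pid, tags in CATALOGUE.items():
        if pid in seeds:
            continue
        similarity = max(jaccard(tags, CATALOGUE[s]) for s in seeds)
        if similarity > 0:
            candidates.append((similarity, pid))
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [pid for _, pid in candidates[:n]]


def hybrid(query: str, k_seeds: int = 2, n_recs: int = 3) -> List[str]:
    """Search first, then expand the top hits with similar items."""
    seeds = search(query, k_seeds)
    if not seeds:
        return []
    return seeds + recommend(seeds, n_recs)


results = hybrid("red shoes")
```

The design choice worth noting is that the recommender never sees the query text; it only sees the seed items, so the quality of the first few search results determines the quality of the whole list.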

Extra tip: we found that boolean scoring may be on par with, or even outperform, tf.idf or BM25 scoring in the e-commerce domain; it’s worth checking its effectiveness on your own data 😉
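
If you want to try this comparison on your own data, a minimal sketch could look like the one below. It assumes that "boolean scoring" means counting how many distinct query terms a document matches (the post does not define it further), and it uses the common Okapi BM25 formula with a smoothed idf; the toy corpus is invented.

```python
import math
from collections import Counter
from typing import List


def boolean_score(query_terms: List[str], doc_terms: List[str]) -> int:
    """Number of distinct query terms present in the document."""
    return len(set(query_terms) & set(doc_terms))


def bm25_score(
    query_terms: List[str],
    doc_terms: List[str],
    corpus: List[List[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Okapi BM25 score of one document against the query."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        df = sum(1 for d in corpus if term in d)
        if df == 0 or tf[term] == 0:
            continue
        # Smoothed idf, always positive.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        length_norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


# Toy product titles, already tokenized (invented data).
corpus = [
    ["red", "shoes"],
    ["red", "red", "red", "shoes", "sale"],  # keyword-stuffed title
    ["blue", "shoes"],
]
query = ["red", "shoes"]

boolean_scores = [boolean_score(query, doc) for doc in corpus]
bm25_scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

In this toy corpus the first two titles tie under boolean scoring but not under BM25, because BM25 reacts to term repetition and title length. Which behavior works better is an empirical question for your own catalogue and click data.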

A.I. Expo Europe 2018

This year we were invited to be part of A.I. Expo Europe, held in Amsterdam. The organizers were very kind to offer us a free booth in the Startup zone of the exhibition. The event took place on 27 and 28 June 2018, and it was packed with talks and visitors. There were three tracks: A.I., IoT, and Blockchain, with the latter two dominating the exhibitors list.

It was an intense two days, with lots of people dropping by 904Labs’ booth and asking about what we do and how 904Labs search technology can help them in their settings. Although the average visitor was not looking for onsite search, we had good chats with a bunch of Dutch and international companies, small, medium, and big, on how onsite search can help their endeavors. We talked about how search technology sits underneath many applications, including chatbots and self-service applications. No matter the application, good search can help increase revenue in the setting of selling products, or reduce costs in the setting of self-service, as visitors are better able to find the information they’re looking for without calling the company’s call centre.

Besides core business, it was nice to see other startups and non-startups from Amsterdam and catch up with old faces and meet new ones. Overall, it was a fun experience. A big thanks to the A.I. Expo Europe organizers for inviting us over!

904Labs Query Intent Engine just got an upgrade and the results look great

Today at 904Labs Search we’ve been having fun with the next version of our Query Intent/Understanding engine.

If you don’t know what a Query Intent/Understanding engine is, here’s some background; otherwise keep reading.

Our Query Intent/Understanding engine is close to one year old, and it’s pretty cool: it can quickly bootstrap a knowledge graph and continuously update it straight from our customer’s data. Since its introduction in September 2017, it has been tested in many different domains and languages and has shown substantial revenue uplift across customers.

In the last couple of weeks, our team has been busy tweaking our engine and pushing it even further. Our Query Intent/Understanding engine was already doing a good job mapping free-text queries to product attributes such as category or brand. However, we had never tested going deeper into particular product attributes, such as the product weight or even wattage. Our engineers managed to extend the logic and the scalability of the algorithm to support very fine-grained suggestions, if such fine-grained information exists in the index. It is quite a feat and I’m very proud of our team!

On top of the engineering feat, the algorithm itself is pretty lean and has quite a few distinctive features that are hard to find in other engines: it does not rely on external resources (read Wikipedia), nor does it need periodic re-training. The only thing a customer has to do is plug our system into an Apache Solr/Elasticsearch index, and the 904Labs Query Intent engine is built incrementally, one click/add-to-basket/purchase at a time. Oh, and it doesn’t write anything to your index. Magic!

Here are a few interesting examples that came out of today’s experimentation:

(The examples are in Dutch; I have tried to translate the queries into English. Field attributes and values may be partially redacted for anonymity.)

First some relatively simple cases:

  • Query: “lampen” (lamps) translates into Solr’s directive: “category:Lampen”
  • Query: “Tent” => “category:Tenten”. Note how the system has learned the mapping from singular to plural without the use of manual synonyms.
  • Query: “Stoel kussens” (chair pillows) => “category:Kussens” (pillows). Here the system mapped a more specific query to a broader category due to the taxonomy structure; there was no leaf in the taxonomy for chair pillows. After pushing all types of pillow higher, pillows for chairs are pushed further up downstream in the learning to rank process.

Some more difficult cases:

  • Query: “Bed 160×200” => “size:160×200 cm” and “size:160 x 200 cm”. Note how the system has learned the discrepancy in attribute values and suggests both; in the first one there are no spaces in the representation of the dimension, in the second one there are.
  • Query: “stoelen” (chairs) => “category:… en eetkamerstoelen” (kitchens and dining chairs). Here the system attempts to infer the intended category for a broad query such as “chairs”. Although there is a category labeled “chairs”, the system decides to suggest a particular leaf within the taxonomy branch of chairs.
  • Query: “tuinset” (garden set) => “number_of_persons:12”. The system implicitly defines the size and dimensions of the garden set from historical user behavior. Given the lack of a user profile, suggestions like this aim at capturing the interests of the average user of the shop.

And finally, a few of my favorites, where the system attempts to infer several attributes of products from broad queries:

  • Query: “prieel” (garden house) => “material:XXX, color:Beige, weight:4.9 kg”. Here the system maps a broad query to a distinct set of attributes, i.e., the material, the color, and even the weight of the garden house. Again, with no user profile provided, these suggestions attempt to capture the interests of the average user of the shop.
  • Query: “eenpersoons bed” (one person bed) => “size:90 x 200 cm”. The system maps the query to exact dimensions for the bed!
  • Query: “XXX lounge” (brand redacted) => “model:White XXX”. Here the system manages to map a brand and type of products from this brand to a specific product model from this brand. Again, without user profiling. Quite impressive!
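
To give a feel for how mappings like the ones above could be learned one interaction at a time, here is a deliberately simple sketch. It is not the 904Labs algorithm; the class name, the `min_share` threshold, and the toy events are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class ClickIntentModel:
    """Counts which attribute values users interact with for each query."""

    def __init__(self) -> None:
        self._counts: Dict[str, Counter] = defaultdict(Counter)
        self._events: Counter = Counter()

    def observe(self, query: str, product_attrs: Dict[str, str]) -> None:
        """Update the counts from one click, add-to-basket, or purchase."""
        key = query.strip().lower()
        self._events[key] += 1
        for field, value in product_attrs.items():
            pair: Tuple[str, str] = (field, value)
            self._counts[key][pair] += 1

    def suggest(self, query: str, min_share: float = 0.6) -> List[str]:
        """Return field:value directives seen in at least min_share of events."""
        key = query.strip().lower()
        total = self._events[key]
        if total == 0:
            return []
        return sorted(
            f"{field}:{value}"
            for (field, value), n in self._counts[key].items()
            if n / total >= min_share
        )


# Toy interaction log (invented data).
model = ClickIntentModel()
model.observe("tent", {"category": "Tenten", "color": "Groen"})
model.observe("Tent", {"category": "Tenten", "color": "Blauw"})
model.observe("tent", {"category": "Tenten", "color": "Grijs"})

directives = model.suggest("tent")
```

Because the model only counts what users actually interacted with, it needs no manual synonyms or annotations. A real system would also have to handle sparse queries, noisy clicks, and values that need quoting in the search engine's query syntax.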

I hope you also find what the new version of the 904Labs Query Intent/Understanding engine can do exciting. You can have this technology working for you too, straight away, by getting in touch with us. Plus, unlike other search engine providers, you keep your index, and we don’t touch it, so we don’t lock you in. The first month is free, and you can cancel monthly. Get in touch!

What is a Query Intent/Understanding engine?

In the past year, at 904Labs A.I. for Search, we’ve put a lot of effort into optimizing and pushing further the technology behind query intent/understanding engines. If you are in the search engine/information retrieval community, you know the term and how challenging the problem is. But if you are not familiar with the problem, it is hard to grasp its importance and challenges. Here’s my attempt to explain it in simple words.

A Query Intent/Understanding engine aims at identifying the intent of a user’s query and ultimately at translating it to a set of search directives.

Consider a user coming to your e-shop and typing in the search box: “red shoes”. The query may look pretty straightforward to you when it comes to what products to show to the user but, for a machine, it’s pretty hard to figure out what the person meant.

To get into the machine’s shoes (pun intended), imagine for a moment that you were born and raised in a warehouse, which is isolated from the rest of the world but has all the inventory of an e-shop. You’ve never left the warehouse, so you don’t know anything about the world. It’s quite a grim world, but bear with me. One day someone slips a message under the door of the warehouse with the words “red shoes”. Now your task is to select a set of products that satisfy that person’s information need. Obviously you know nothing about that mysterious person, let alone their preferences, and you barely understand the language of the message. Perhaps you are able to recognize characters and words and match them to words found on the labels of products, but that’s pretty much it.

In such a surreal world, you can imagine that it is quite difficult, even for you, a human, to select a set of products that relate to “red shoes”. The challenge is that much of the important contextual information that we pick up from our constant interactions with the real world is missing in this artificial setting: we don’t know whether the user is a he or a she (and therefore we don’t know if we should pick male or female type of shoes), we don’t know what the “hot” color of the season is, nor the most popular brand, nor whether the person is looking for sneakers or boots or high heels. There are lots of unknowns.

A Query Intent/Understanding engine is an algorithm that tries to make sense of the world for machines, or other entities, which are locked up in a warehouse similar to the one we described above and have no access to contextual information. A Query Intent/Understanding algorithm aims at mapping free text (a user query) to a series of directives (rules) that, when applied, yield a useful set of products for the user. In our “red shoes” example, we are looking for directives that look like the following: “filter on products that have attribute:red and category:shoes”. At first glance the mapping looks simple, but as you’ve seen by now from our warehouse setting, it can be quite daunting.
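
As a rough illustration of what such a mapping produces, consider the sketch below. It hard-codes a tiny lookup from known attribute values to index fields, which is exactly the knowledge a real engine has to learn; the field names `color` and `category` and the lookup table are assumptions for the example.

```python
from typing import Dict, List

# Hypothetical lookup: known attribute value -> index field that holds it.
VALUE_TO_FIELD: Dict[str, str] = {
    "red": "color",
    "blue": "color",
    "shoes": "category",
    "boots": "category",
}


def query_to_directives(query: str) -> List[str]:
    """Map each recognized query token to a field:value directive."""
    directives = []
    for token in query.lower().split():
        field = VALUE_TO_FIELD.get(token)
        if field is not None:
            directives.append(f"{field}:{token}")
    return directives


def to_filter_query(directives: List[str]) -> str:
    """Join directives into one Lucene-style conjunctive filter string."""
    return " AND ".join(directives)


directives = query_to_directives("red shoes")
filter_query = to_filter_query(directives)
```

The hard part, of course, is that nobody hands the machine this lookup table: tokens can be ambiguous, misspelled, or absent from the catalogue, which is why the mapping is daunting in practice.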

Researchers and practitioners have been working on this problem for quite some time and progress has been made; however, the community still has some way to go before solving it. At the core of current solutions there is a lot of complex technology, such as neural networks that power NLP (Natural Language Processing) tools and other pipelines that require lots of human annotations (read expensive). These approaches yield relatively good accuracy. However, putting these systems into production may still be a challenging engineering problem, from setting up the data pipelines to scaling up, retraining, and monitoring system effectiveness and efficiency.

At 904Labs we’ve developed a Query Intent/Understanding engine that hooks up to an e-shop’s data, bootstraps a knowledge graph, and learns the mapping to directives automatically, i.e., without supervision (read without human annotations). It requires neither retraining nor external dependencies, and you can use it straight away, today, on your own Apache Solr or Elasticsearch index (from which we only read and never write, so we never lock you in). If that sounds appealing, get in touch for a one-month free trial!

Talk at TECH Talks Amsterdam

On 31 May, I was invited to give a talk at TECH Talks in Amsterdam. TECH Talks is a new meetup organized by Techloop, a new way for matching jobs and talent in IT. My talk revolved around how we built an e-commerce focused search engine using machine learning (or, as most people know it, A.I.). The meetup got off to a great start, with more than 200 people registered and more than 100 showing up; the room at TQ was at maximum capacity! The audience was diverse, with a nice mix of frontend and backend, senior and junior engineers, and also people from other disciplines who are interested in keeping up with the latest developments in IT and e-commerce. Techloop and TQ were great organizers, providing a great atmosphere (pizza and beer included) that facilitated great chats and networking afterwards.

You can see the video of the talks here (mine is the first after the introduction by Techloop):


The slides of my talk are here:

Talk at ECIR 2018 Industry day

The European Conference on Information Retrieval (ECIR) is an annual European scientific conference around the topics of search engines, recommender systems, text analytics, user modeling, and evaluation. This year ECIR was held in Grenoble, France. With more than 250 attendees and 4 days packed with tutorials, workshops, research, and industry talks, it was a great place to be to get updated with the latest and greatest about search engines.

The theme of this year’s Industry Day was to bring lessons learned from industry to academia. What are the differences between developing a research algorithm and bringing it into practice? These lessons could inspire our fellow academics and inform them about the challenges practitioners face when bringing these algorithms to production.

In my talk I touched upon a bunch of challenges we’ve faced at 904Labs and how we went about solving them. Open questions revolve around online learning to rank algorithms, delayed feedback, the design of new metrics to avoid embarrassing results, and the importance of investing in an evaluation platform. At 904Labs we have come a long way and we have working answers for these questions. However, as these questions are particularly difficult to answer, I’ve invited people to drop me a note with ideas, if they are interested. I’ve already got some nice feedback, and I hope to see more research in these areas in the near future!

Below are some tweets from my talk.