What’s up?

904Labs Query Intent Engine just got an upgrade and the results look great

Today at 904Labs Search we’ve been having fun with the next version of our Query Intent/Understanding engine.

If you don’t know what a Query Intent/Understanding engine is, here’s some background; otherwise keep reading.

Our Query Intent/Understanding engine is close to one year old, and it’s pretty cool: it can quickly bootstrap a knowledge graph and continuously update it straight from our customers’ data. Since its introduction in September 2017, it has been tested in many different domains and languages and has shown substantial revenue uplift across customers.

In the last couple of weeks, our team has been busy tweaking our engine and pushing it even further. Our Query Intent/Understanding engine was already doing a good job of mapping free-text queries to product attributes such as category or brand. However, we had never tested going deeper into particular product attributes, such as product weight or even wattage. Our engineers managed to extend the logic and the scalability of the algorithm to support very fine-grained suggestions, if such fine-grained information exists in the index. It is quite a feat and I’m very proud of our team!

On top of the engineering feat, the algorithm itself is pretty lean and has quite a few distinctive features that are hard to find in other engines: it does not rely on external resources (read: Wikipedia), nor does it need periodic retraining. The only thing a customer has to do is plug our system into an Apache Solr or Elasticsearch index, and the 904Labs Query Intent engine is built incrementally, one click/add-to-basket/purchase at a time. Oh, and it doesn’t write anything to your index. Magic!
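To give a feel for what "built incrementally, one click/add-to-basket/purchase at a time" can mean, here is a minimal sketch. It is not our production algorithm: the class name `IntentCounts`, the event weights, and the example attribute values are all made up for illustration. It simply keeps a running score per (query, attribute:value) pair and updates it on every event, so nothing ever needs to be retrained from scratch.

```python
from collections import Counter, defaultdict

# Hypothetical weights: a purchase is assumed to say more about intent than a click.
EVENT_WEIGHTS = {"click": 1.0, "add_to_basket": 2.0, "purchase": 4.0}


class IntentCounts:
    """Toy incremental model: running scores of attribute values per query."""

    def __init__(self):
        self._scores = defaultdict(Counter)

    def update(self, query, event_type, product_attributes):
        """Fold a single user interaction into the running scores."""
        weight = EVENT_WEIGHTS[event_type]
        key = query.strip().lower()
        for field, value in product_attributes.items():
            self._scores[key][(field, value)] += weight

    def suggest(self, query, top_n=1):
        """Return the highest-scoring directives seen so far for this query."""
        key = query.strip().lower()
        ranked = self._scores.get(key, Counter()).most_common(top_n)
        return [f"{field}:{value}" for (field, value), _ in ranked]


# Made-up interaction log for the query "tent".
model = IntentCounts()
model.update("Tent", "click", {"category": "Tenten"})
model.update("tent", "purchase", {"category": "Tenten", "color": "Groen"})
model.update("tent", "click", {"category": "Slaapzakken"})

print(model.suggest("tent"))
```

In this toy log, "category:Tenten" collects the highest score for the query "tent", so it is the first suggestion. A real engine has to deal with noise, sparsity, and scale, which this sketch ignores.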

Here are a few interesting examples that came out of today’s experimentation:

(the examples are in Dutch; I’ve tried to translate the queries into English. Field attributes and values may be partially redacted for anonymity)

First some relatively simple cases:

  • Query: “lampen” (lamps) translates into Solr’s directive: “category:Lampen”
  • Query: “Tent” => “category:Tenten”. Note how the system has learned the mapping from singular to plural without the use of manual synonyms.
  • Query: “Stoel kussens” (chair pillows) => “category:Kussens” (pillows). Here the system mapped a more specific query to a broader category because of the taxonomy structure: there was no leaf in the taxonomy for chair pillows. Once all types of pillows have been pushed higher, pillows for chairs are pushed further up downstream, in the learning to rank process.

Some more difficult cases:

  • Query: “Bed 160×200” => “size:160×200 cm” and “size:160 x 200 cm”. Note how the system has learned the discrepancy in attribute values and suggests both; in the first one there are no spaces in the representation of the dimension, in the second one there are.
  • Query: “stoelen” (chairs) => “category:… en eetkamerstoelen” (kitchens and dining chairs). Here the system attempts to infer the intended category for a broad query such as “chairs”. Although there is a category labeled “chairs”, the system decides to suggest a particular leaf within the taxonomy branch of chairs.
  • Query: “tuinset” (garden set) => “number_of_persons:12”. The system implicitly defines the size and dimensions of the garden set from historical user behavior. Given the lack of a user profile, suggestions like this aim at capturing the interests of the average user of the shop.
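As an illustration of how suggestions like these could be handed to Solr, here is a small sketch that turns (field, value) pairs into a query string in Solr's standard (Lucene) syntax. The helper `to_filter_query` is hypothetical, and the choice to OR values of the same field and AND across fields is an assumption made for this example; the same clauses could just as well be used for boosting instead of filtering.

```python
from collections import defaultdict


def to_filter_query(directives):
    """Combine (field, value) pairs into one Lucene-syntax query string.

    Assumption for this sketch: values of the same field are alternatives (OR),
    while different fields must all match (AND).
    """
    by_field = defaultdict(list)
    for field, value in directives:
        # Quote each value as a phrase, escaping backslashes and double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        by_field[field].append(f'"{escaped}"')

    clauses = []
    for field, values in by_field.items():
        if len(values) == 1:
            clauses.append(f"{field}:{values[0]}")
        else:
            clauses.append(f"{field}:(" + " OR ".join(values) + ")")
    return " AND ".join(clauses)


# The two spellings of the same bed size from the example above.
bed_query = to_filter_query([("size", "160×200 cm"), ("size", "160 x 200 cm")])
print(bed_query)  # size:("160×200 cm" OR "160 x 200 cm")
```

Suggesting both spellings means products are matched regardless of how the dimension happens to be written in the index.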

And finally, a few of my favorites, where the system attempts to infer several attributes of products from broad queries:

  • Query: “prieel” (garden house) => “material:XXX, color:Beige, weight:4.9 kg”. Here the system maps a broad query to a distinct set of attributes, i.e., the material, the color, and even the weight of the garden house. Again, with no user profile provided, these suggestions attempt to capture the interests of the average user of the shop.
  • Query: “eenpersoons bed” (one-person bed) => “size:90 x 200 cm”. The system maps the query to exact dimensions for the bed!
  • Query: “XXX lounge” (brand redacted) => “model:White XXX”. Here the system manages to map a brand and type of products from this brand to a specific product model from this brand. Again, without user profiling. Quite impressive!

I hope you find what the new version of the 904Labs Query Intent/Understanding engine can do as exciting as I do. You can have this technology working for you too, straight away, by getting in touch with us. Plus, unlike other search engine providers, we let you keep your index and never touch it, so we don’t lock you in. The first month is free, and you can cancel on a monthly basis. Get in touch!


What is a Query Intent/Understanding engine?

In the past year, at 904Labs A.I. for Search, we’ve put a lot of effort into optimizing and pushing further the technology behind query intent/understanding engines. If you are in the search engine/information retrieval community, you know the term and how challenging the problem is. But if you are not familiar with the problem, it is hard to grasp its importance and challenges. Here’s my attempt to explain it in simple words.

A Query Intent/Understanding engine aims at identifying the intent of a user’s query and ultimately at translating it to a set of search directives.

Consider a user coming to your e-shop and typing in the search box: “red shoes”. The query may look pretty straightforward to you when it comes to what products to show to the user but, for a machine, it’s pretty hard to figure out what the person meant.

To get into the machine’s shoes (pun intended), imagine for a moment that you were born and raised in a warehouse, which is isolated from the rest of the world but holds all the inventory of an e-shop. You’ve never left the warehouse, so you don’t know anything about the world. It’s quite a grim world, but bear with me. One day someone slips a message under the door of the warehouse with the words “red shoes”. Now your task is to select a set of products that satisfy that person’s information need. Obviously you know nothing about that mysterious person, let alone their preferences, and you barely understand the language of the message; perhaps you are able to recognize characters and words and match them to words found on the labels of products, but that’s pretty much it.

In such a surreal world, you can imagine that it is quite difficult, even for you, a human, to select a set of products that relate to “red shoes”. The challenge is that much of the important contextual information that we pick up from our constant interactions with the real world is missing in this artificial setting: we don’t know whether the user is a he or a she (and therefore we don’t know whether we should pick men’s or women’s shoes), we don’t know what the “hot” color of the season is, nor the most popular brand, nor whether the person is looking for sneakers or boots or high heels. There are lots of unknowns.

A Query Intent/Understanding engine is an algorithm that tries to make sense of the world for machines, or other entities, that are locked up in a warehouse like the one described above and have no access to contextual information. A Query Intent/Understanding algorithm aims at mapping free text (a user query) to a series of directives (rules) that, when applied, yield a useful set of products for the user. In our “red shoes” example, we are looking for directives that look like the following: “filter on products that have attribute:red and category:shoes”. At first glance the mapping looks simple, but as you’ve seen by now from our warehouse setting, it can be quite daunting.
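To make the warehouse picture concrete, here is a sketch of the most naive mapping possible: matching the words of the query against the labels on the products. The tiny catalog and the function names are invented for illustration (and I use a `color` field where the text above says `attribute`); it is the baseline that a real query intent engine has to improve on, not how ours works.

```python
def build_vocabulary(products):
    """Map every attribute value found on a product label to the fields it appears in."""
    vocab = {}
    for product in products:
        for field, value in product.items():
            vocab.setdefault(value.lower(), set()).add(field)
    return vocab


def naive_directives(query, vocab):
    """Turn each query word that literally matches a label into a field:value directive."""
    directives = []
    for token in query.lower().split():
        for field in sorted(vocab.get(token, ())):
            directives.append(f"{field}:{token}")
    return directives


# A made-up warehouse with three products.
warehouse = [
    {"category": "shoes", "color": "red"},
    {"category": "shoes", "color": "blue"},
    {"category": "bags", "color": "red"},
]
vocab = build_vocabulary(warehouse)

print(naive_directives("red shoes", vocab))  # ['color:red', 'category:shoes']
print(naive_directives("sneakers", vocab))   # []
```

Literal matching handles “red shoes” only because both words happen to appear on the labels. A query like “sneakers” returns nothing, which is exactly the gap that context and user behavior have to fill.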

Researchers and practitioners have been working on this problem for quite some time and progress has been made; however, the community still has some way to go before solving it. At the core of current solutions there is a lot of complex technology, such as neural networks that power NLP (Natural Language Processing) tools and other pipelines that require lots of human annotations (read: expensive). These approaches yield relatively good accuracy; however, putting these systems into production may still be a challenging engineering problem, from setting up the data pipelines to scaling up, retraining, and monitoring system effectiveness and efficiency.

At 904Labs we’ve developed a Query Intent/Understanding engine that hooks into an e-shop’s data, bootstraps a knowledge graph, and learns the mapping to directives automatically, i.e., without supervision (read: without human annotations). It requires neither retraining nor external dependencies, and you can use it straight away, today, on your own Apache Solr or Elasticsearch index (from which we only read and never write, so we never lock you in). If that sounds appealing, get in touch for a one-month free trial!

Talk at TECH Talks Amsterdam

On 31 May, I was invited to give a talk at TECH Talks in Amsterdam. TECH Talks is a new meetup organized by Techloop.io, a new way of matching jobs and talent in IT. My talk revolved around how we built an e-commerce focused search engine using machine learning (or, as most people know it, A.I.). The meetup got off to a great start, with more than 200 people registered and more than 100 showing up; the room at TQ was at maximum capacity! The audience was diverse, with a nice mix of frontend, backend, senior, and junior engineers, and also people from other disciplines who are interested in keeping up with the latest developments in IT and e-commerce. Techloop and TQ were great organizers, providing a great atmosphere (pizza and beer included) that made for great chats and networking afterwards.

You can see the video of the talks here (mine is the first after the introduction by Techloop):


The slides of my talk are here:

Talk at ECIR 2018 Industry day

The European Conference on Information Retrieval (ECIR) is an annual European scientific conference around the topics of search engines, recommender systems, text analytics, user modeling, and evaluation. This year ECIR was held in Grenoble, France. With more than 250 attendees and 4 days packed with tutorials, workshops, research, and industry talks, it was a great place to get updated on the latest and greatest about search engines.

The theme of this year’s Industry Day was to bring lessons learned from industry to academia. What are the differences between developing a research algorithm and bringing it into practice? These lessons could inspire our fellow academics and inform them about the challenges practitioners face when bringing these algorithms to production.

In my talk I touched upon a bunch of challenges we’ve faced at 904Labs and how we went about solving them. Open questions revolve around online learning to rank algorithms, delayed feedback, the design of new metrics to avoid embarrassing results, and the importance of investing in an evaluation platform. At 904Labs we have come a long way and we have working answers to these questions; however, as they are particularly difficult to answer, I’ve invited people to drop me a note with ideas if they are interested. I’ve already received some nice feedback, and I hope to see more research in these areas in the near future!

Below are some tweets from my talk.

Talk at Frankfurt Data Science meetup

I was happy to be invited to the Frankfurt Data Science meetup to talk about data science, meetups, and how to build a data science-oriented startup. The event was held at Frankfurt School on March 1st, 2018. In my talk I gave an overview of the meetup scene in Amsterdam, briefly presented Amsterdam Data Science and its activities, and then shared my experiences of founding and developing 904Labs, before delving into one of my favorite topics: machine learning and search.

It was a packed room, with more than 200 people registered for the event, and the talk was broadcast live on YouTube. There will soon be a video, which I will share here. The audience came from diverse backgrounds, and I enjoyed the interactions very much. I was happily surprised by the professional organization of the event, and by the ambition to make Frankfurt one of the leading centres of data science in Europe. All the best to the organizers, and I hope to see more collaboration on data science between Frankfurt and Amsterdam in the near future!

The talk is online on YouTube (thank you Frankfurt Data Science for making this happen!):

Talk at A.I. for Commerce meetup

Emakina, one of the largest web agencies in Europe, works with internationally acclaimed brands on their branding and electronic presence. To keep their customers ahead of the curve, Emakina has recently started a series of meetups where experts in a wide range of fields come and talk about the latest developments in their field. The last meetup was held last Thursday, 8 February 2018, on the topic “A.I. for Commerce”. Three talks were scheduled: one from Emakina, one from 904Labs, and one from Salesforce.

In our talk, I described the importance of search in e-commerce by giving examples of failed searches in a number of settings, from finding advertised items using onsite search to mobile search. I followed with why people choose Amazon to start their product search and highlighted that 54% of them choose Amazon because of its great search functionality, which makes it reason number five for people to go to Amazon. Then, I explained why optimizing the ranking manually is close to impossible for humans by laying out the insane number of options available and enumerating the search space (which can run to millions of millions of choices). With this as a foundation, I talked about machine learning and the particular type of machine learning that we use at 904Labs for optimizing search rankings in real time. I followed up by describing our query intent engine, which is powered by 904Sense, and our automatic synonym extraction engine. In my conclusions, I reiterated that A.I. for Search can boost search-driven revenue by 30% and that search is becoming part of platforms. From this angle, it is important for online retailers to test the claims of their vendors by doing A/B tests before they opt for a solution.

Amsterdam City A.I. Event

I was happy to be invited to the Amsterdam City A.I. Event on December 11, 2017. My talk revolved around our experiences at 904Labs in building an A.I.-focused company. It was fun to be among A.I. enthusiasts and to see that people identified with our experiences, which means that we are on the right track!

The talk is online on YouTube (clicking the link will start at my talk):

Below are some tweets from the event:

Another historical moment for 904Labs

I’m very happy to share news on the latest success of 904Labs A.I. for Search with our new customer, eci.nl.

eci is one of the largest online book specialists in the Netherlands and Belgium, with an estimated yearly revenue of 15m euro for the Dutch part. For their site search, eci relied on a manually optimized Apache Solr, integrated into Intershop.

During a three-week A/B test we’ve shown a 38% improvement in revenue and a 34% increase in conversion rate compared to the in-house Apache Solr search engine. This shows once more that adding A.I. to your search engine really makes a difference! Read the full story here.

Keynote speech at AI2Future

AI2Future is a local Croatian initiative for spreading awareness of the importance of Artificial Intelligence and for grounding its use. The conference brings together researchers, A.I. startups, and large corporates from all around Croatia to share experiences and learn from each other about what A.I. can and cannot do.

I was invited to give one of the two keynote talks. The first keynote focused on conversational agents and natural language processing, and mine followed up with insights into how search powers a large spectrum of applications, from search to question answering and recommendations. I focused on the work we do at 904Labs and illustrated the principles of online learning to rank through real-world examples from our experiences with our customers.

I got many questions after the talk, which is a sign that the audience understood the topic and was intrigued by what A.I. can do for search. Some of the questions carried over into the breaks that followed, and some of them led to follow-up meetings in the next days.

I very much enjoyed the conference, and I hope that the organizers will follow up with another edition next year. I believe that we do need more of this type of initiative to spread the word on what A.I. is and what it can and cannot do, so that organizations can form a better picture of how to use it to solve the challenges they face.

A historical moment for 904Labs

I’m happy to share some great results with regard to the utility of self-learning search and revenue on e-commerce sites. 904Labs self-learning search improves revenue by 36% when compared to a highly, but manually, tuned Apache Solr search engine. Good job 904Labs team!

Read the full blog post at 904Labs, here: https://www.904labs.com/en/self-learning-search-improves-revenue-for-e-commerce