I was invited to give a talk at the SIGIR Workshop On eCommerce 2019 but, unfortunately, I am not attending SIGIR this year. Instead, I wrote down some thoughts on interesting problems, ideas, and challenges in the e-commerce domain. Here they are:
- The vocabulary gap is still an open problem. People refer to products differently from how products are described in the catalogue. Neural networks, learning to rank, and query intent engines are all important here. Our experience at 904Labs has shown that a query intent engine that boosts specific product categories for a given query improves our learning to rank system by more than 16% in additional revenue. These results suggest that an effective initial ranking is very important for effective learning to rank. To this end, we foresee an increasing interest in methods that can re-rank the entire collection and not only the top-N documents. This is particularly important for larger shops with large inventories (hundreds of thousands of items or more), where a query can return hundreds of items and only the top few are re-ranked.
- E-commerce search is as much about exploration as it is about finding the best match. From our experience at 904Labs, a large fraction of queries revolve around categories or combinations of categories, e.g., “red shoes”, “kitchen tables”, “dvd players”, “ebooks for 12 years old”. These queries go beyond typical search: they require understanding the query and generating a list of recommended relevant items. This proposition is supported by the surprising effectiveness of sorting by popularity; the most popular items for a query are potentially good candidates for the “recommendation list” that the user is looking for. Back to understanding this type of exploratory query: one would think that natural language processing can help here, but in the e-commerce setting queries are very short and language analysis tends to fall short. An open question is how we can go from these broad queries to a good set of recommendations. A natural way is to devise a hybrid system of search and recommendations: we fire the query at a search system and then use the first few items as seeds to a recommender system to get similar items. Another is a system that transforms the query into an image (think AttnGAN or similar) and each product into an image, and then ranks documents by their image similarity to the query’s image. The image representation may constrain the latent space, abstract away the language of the query and that of the document, and capture semantics that are otherwise difficult to encode in textual form. It also offers explainability for the system’s rankings, which is becoming increasingly important in machine learning-based systems.
- Evaluation metrics in e-commerce. There are several directions here that need more work. First, the e-commerce setting is a conjunction of exploratory search, typical search, and recommendations. The standard IR precision and recall measures may not tell us the entire story of how happy a system makes its users. We need to discover which aspects of a system make users happy in order to design one or more metrics for evaluating search and recommendation systems. These metrics should correlate well not only with revenue but also with customer loyalty (measured in returning customers and in shorter times between return visits). Such a metric (or set of metrics) would then allow us to run offline experiments and make predictions about revenue, which is the main KPI on which systems are evaluated in production at most e-commerce businesses.
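
To make the hybrid search-and-recommendation idea from the second point a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy `CATALOGUE`, the term-overlap `search` function, and the cosine-similarity `recommend` function are simple stand-ins, not how any production system (ours included) works.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy catalogue: product id -> title.
CATALOGUE = {
    "p1": "red leather shoes",
    "p2": "red running shoes",
    "p3": "blue running shoes",
    "p4": "red dress",
    "p5": "crimson sneakers running",
    "p6": "kitchen table oak",
}


def tokens(text):
    return text.lower().split()


def search(query, catalogue, k=2):
    """Stand-in search step: rank by number of distinct query terms matched."""
    q = set(tokens(query))
    scored = [(len(q & set(tokens(title))), pid) for pid, title in catalogue.items()]
    scored = [(s, pid) for s, pid in scored if s > 0]
    scored.sort(key=lambda x: (-x[0], x[1]))  # best score first, ties by id
    return [pid for _, pid in scored[:k]]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(seeds, catalogue, n=3):
    """Stand-in recommender: items most similar to the combined seed titles."""
    profile = Counter()
    for pid in seeds:
        profile.update(tokens(catalogue[pid]))
    scored = [
        (cosine(profile, Counter(tokens(title))), pid)
        for pid, title in catalogue.items()
        if pid not in seeds
    ]
    scored = [(s, pid) for s, pid in scored if s > 0]
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [pid for _, pid in scored[:n]]


def hybrid(query, catalogue, k=2, n=3):
    """Fire the query at search, then expand the top-k hits via the recommender."""
    seeds = search(query, catalogue, k)
    return seeds + recommend(seeds, catalogue, n)


print(hybrid("red shoes", CATALOGUE))
```

In this toy setup, “crimson sneakers running” shares no term with the query “red shoes”, yet it still ends up in the list because it is similar to the seed items. In a real system the seeds would come from your actual search engine and the similarity from a proper recommender (for example, one based on co-purchases).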
Extra tip: we found that boolean scoring may be on par with or outperform tf.idf or BM25 scoring in the e-commerce domain. It’s worth checking its effectiveness on your own data 😉
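
If you want to try this on your own data, the sketch below shows the two scoring styles side by side. It rests on some assumptions: “boolean scoring” is taken here to mean that each matching query term counts once, regardless of how often it occurs; the BM25 function is a textbook version with a commonly used smoothed idf and default-looking `k1` and `b` values; and `DOCS` is a made-up set of product titles.

```python
import math
from collections import Counter

# Hypothetical product titles; d2 repeats a keyword several times.
DOCS = {
    "d1": "iphone case",
    "d2": "iphone case case case cheap case",
    "d3": "samsung charger",
}


def tokenize(text):
    return text.lower().split()


def boolean_score(query, doc):
    """Number of distinct query terms present in the doc; term frequency is ignored."""
    return len(set(tokenize(query)) & set(tokenize(doc)))


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Textbook BM25 over a small in-memory collection (illustrative only)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


query = "iphone case"
print({doc_id: boolean_score(query, text) for doc_id, text in DOCS.items()})
print(bm25_scores(query, DOCS))
```

With boolean scoring, the two titles that contain both query terms get the same score, so other signals (popularity, learning to rank) decide their order. BM25 separates them based on term frequency and title length, which may or may not be what you want for short product titles.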