Two-tower Model


Long time no see - it has been a month since I started working at Quora. I joined as an ML Engineer on the Distribution team, which works closely on content recommendations. Speaking of recommendations, I would like to introduce the two-tower model [1], which is part of the (almost) industry-standard architecture for recommendation systems.

Overview

Modern recommendation systems consist of two main pillars: (i) retrieval and (ii) ranking. The first chooses which items could be shown to the user, and the second orders them by some scoring criterion. Since there are tons of items (at least for the major services), ranking all of them is unreasonable, and showing every retrieved item simultaneously is an equally absurd idea. Hence, splitting the whole system into two parts is a natural consequence of the system design. Please let me know if you come up with a more clever idea :)
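
To make the split concrete, here is a rough sketch of the two-stage pipeline. The names `retrieval_model.top_k` and `ranking_model.score` are hypothetical placeholders for this post, not any particular library's API:

```python
# Rough sketch of a retrieve-then-rank pipeline. The models and lookups here
# are hypothetical placeholders, not a real API.

def recommend(user_id, catalog, retrieval_model, ranking_model,
              k_retrieve=500, k_show=20):
    # Stage 1: retrieval narrows millions of items down to a few hundred candidates.
    candidates = retrieval_model.top_k(user_id, catalog, k=k_retrieve)

    # Stage 2: ranking scores only those candidates with a heavier model
    # and picks the handful that will actually be shown.
    scored = [(item, ranking_model.score(user_id, item)) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:k_show]]
```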

What is Two-tower model?

So the two-tower model is used for item retrieval. It apparently has two of something, so what are those two things? Recall the objective of a recommendation system: showing relevant *items* to each *user*. Those two italicized terms are the names of the two towers. Each tower encodes its input into a vector, and the dot product of the user vector and the item vector gives the match score. Below is a great visualisation from Google.

[Figure] Visualisation of the two-tower model. Originally from here.

We can know many things about items, and, if users consent to share their information, about users as well. Besides the object IDs, we have basic statistics, history, tags, followed entities, activities, and much more. The Features boxes in the image above contain exactly this kind of information. Since the definition of each tower is quite flexible, what goes into it is totally up to the designer who knows the specifics of the data. FYI: Quora also keeps improving its features to make recommendations better 😎.
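
To make the picture concrete, here is a minimal sketch in PyTorch. The tower sizes, feature dimensions, and the choice of a plain MLP are my own assumptions for illustration, not Quora's or Google's actual setup:

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """A small MLP that encodes a feature vector into a fixed-size embedding."""
    def __init__(self, input_dim: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        out = self.net(features)
        # L2-normalize so the dot product behaves like a cosine similarity.
        return nn.functional.normalize(out, dim=-1)

class TwoTowerModel(nn.Module):
    def __init__(self, user_feat_dim: int, item_feat_dim: int, embed_dim: int = 64):
        super().__init__()
        self.user_tower = Tower(user_feat_dim, embed_dim)
        self.item_tower = Tower(item_feat_dim, embed_dim)

    def forward(self, user_features: torch.Tensor, item_features: torch.Tensor) -> torch.Tensor:
        u = self.user_tower(user_features)   # (batch, embed_dim)
        v = self.item_tower(item_features)   # (batch, embed_dim)
        return (u * v).sum(dim=-1)           # dot product = match score

# Example: score a batch of 4 user-item pairs with made-up feature sizes.
model = TwoTowerModel(user_feat_dim=32, item_feat_dim=48)
scores = model(torch.randn(4, 32), torch.randn(4, 48))
```

A nice property of this shape: the item tower's embeddings can be precomputed and stored in an approximate-nearest-neighbour index, which is what makes the architecture practical for retrieval over huge catalogs.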

How to train the model?

I guess this is not restricted to the two-tower model, but Gaurav [2] gives a nice explanation of how to train such a recommendation model. We basically want higher scores for highly related user-item pairs, and lower scores for pairs that don't seem to match. From the interaction history, we need to collect positive examples and negative examples to train the model in a balanced way.

Positive examples are exactly what you're thinking of. Depending on the target metric, any positive action between a user and an item can be a positive example. Not only clicked items, but also items that got enough reading time or were upvoted (sorry for the Quora-specific examples) can all be in this group.
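
As a toy illustration, labeling positives from an interaction log might look like this. The event names and the 30-second reading threshold are made up for this example:

```python
# Toy illustration of collecting positive user-item pairs from an event log.
# The action names and the 30-second threshold are made-up examples.
events = [
    {"user": "u1", "item": "q101", "action": "click", "read_seconds": 45},
    {"user": "u1", "item": "q102", "action": "upvote", "read_seconds": 120},
    {"user": "u2", "item": "q101", "action": "impression", "read_seconds": 2},
]

def is_positive(event) -> bool:
    return event["action"] == "upvote" or (
        event["action"] == "click" and event["read_seconds"] >= 30
    )

positive_pairs = [(e["user"], e["item"]) for e in events if is_positive(e)]
# [('u1', 'q101'), ('u1', 'q102')]
```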

Negative examples, however, are quite the opposite. We need this pool of data since we should not recommend irrelevant items to a user. If we train only on positive examples, the model will happily recommend any item, since it never learns which pairs are less related. There are two main approaches to composing negative examples: candidate sampling and negative impressions. The former is randomly picked items the user has not interacted with, and the latter is items that were shown to the user by the system yet not chosen. Gaurav says:

Typically the first approach, negative sampling works better in the initial stages. Once the model has already reached a high level of recall, a bit of negative impression based training might further improve the model. Just using impressions might not work.

Sleek strategy, isn’t it?
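
As a rough illustration of the candidate-sampling side, one common trick is to treat the other items in the same training batch as negatives and train with a softmax over the batch (in-batch sampled softmax, loosely in the spirit of [1]). The batch size, optimizer, and loss details below are my own simplified assumptions, reusing the TwoTowerModel sketch from above:

```python
import torch
import torch.nn.functional as F

def in_batch_softmax_loss(model, user_features, pos_item_features):
    """Row i is a positive (user_i, item_i) pair; every other item in the
    batch acts as a sampled negative for user_i."""
    u = model.user_tower(user_features)        # (batch, embed_dim)
    v = model.item_tower(pos_item_features)    # (batch, embed_dim)
    logits = u @ v.T                           # (batch, batch) score matrix
    labels = torch.arange(u.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)     # diagonal entries are the positives

# One hypothetical training step with made-up batch and feature sizes.
model = TwoTowerModel(user_feat_dim=32, item_feat_dim=48)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
loss = in_batch_softmax_loss(model, torch.randn(256, 32), torch.randn(256, 48))
loss.backward()
optimizer.step()
```

Negative impressions could then be mixed in later, for example as explicitly labeled negative rows trained with a binary loss, which matches the two-stage recipe in the quote above.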

Afterword

The two-tower model is quite a broad concept and can be applied to many different systems. Next, I'll try to go through the recommendation systems that some major companies use. Until then!

  1. Yi, Xinyang, et al. "Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations." RecSys, 2019.

  2. Chakravorty, Gaurav. "Personalized recommendations - IV (two tower models for retrieval)." https://www.linkedin.com/pulse/personalized-recommendations-iv-two-tower-models-gaurav-chakravorty/