Is it possible to build a Generic Recommender System With Deep Learning and NLP?

Berk Gökden
Berk Gökden’s adventures
4 min read · Mar 15, 2020


This question has been on my mind for a couple of years: can we predict the next event in a sequence when the types of events are not known or labeled beforehand?

Promise: In this post, I will design a simple recommender system that uses a Keras LSTM model with the TensorFlow multilingual Universal Sentence Encoder as the embedding, recommending the next two products from the previous three products in a transaction.

First of all, to use deep learning, I needed a way to encode events as vectors. I started with Word2Vec, averaging word vectors, and then discovered Doc2Vec from Gensim. The problem with both is that training a model takes time and requires significant cleaning of the input data. Then the transformers arrived: pretrained models like BERT and GPT-2 revolutionized the NLP world. Still, I will use the TensorFlow Universal Sentence Encoder Multilingual Large model, since it is easier to use for my demo case.

The TensorFlow Universal Sentence Encoder Multilingual Large model turns a sentence into a vector of 512 floats. I won't go into the details of this model in this post.

To map vectors back to products, I will use a simple nearest-neighbor classifier. Since this is a small dataset, the NearestCentroid classifier from the scikit-learn library is enough. I will not use a deep learning model for text generation, because we recommend from a small fixed catalog and free text generation is a harder problem. For larger datasets, I have built a feature store called Veri.
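The decoding step can be sketched like this, with toy 4-dimensional vectors standing in for the real 512-dimensional embeddings:

```python
# Map predicted vectors back to product names with scikit-learn's
# NearestCentroid. Toy 4-dim vectors stand in for the 512-dim embeddings.
import numpy as np
from sklearn.neighbors import NearestCentroid

# Toy embeddings: a couple of vectors per product name.
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.1, 0.9, 0.0]])
y = np.array(["RIBBON", "RIBBON", "COOKIE CUTTER", "COOKIE CUTTER"])

clf = NearestCentroid()
clf.fit(X, y)

# A vector predicted by the LSTM is decoded to the closest product centroid.
predicted_vector = np.array([[0.95, 0.05, 0.0, 0.0]])
print(clf.predict(predicted_vector))  # ['RIBBON']
```

NearestCentroid collapses each product to a single centroid, which is cheap and works well enough when each product forms its own tight cluster in embedding space.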

In the first phase, I will prepare the data and convert everything into vector space with the TensorFlow Universal Sentence Encoder. I will use an E-Commerce Dataset from Kaggle that contains actual transactions from a UK retailer. It has two problems: the purchases are not really time-series, and the descriptions are not sentences, just product names. If I can find a better dataset, I will republish new results.

First, I read the data with pandas and prepared both the data for the nearest-neighbor classifier and the transaction data. The transaction data takes the form of 3 inputs and 2 outputs, where each element is a 512-float vector. As a side note, many implementations use the embedding as the first layer of their model, but I keep the embedding as a separate step so these operations stay decoupled. I grouped transactions into groups of 5 and padded shorter groups with empty strings. Since these transactions are not time-series, I created permutations of each group and took 3-element subsets; this is necessary when feeding unordered data into an LSTM. A more general convolutional network would fit this case better, but I want to develop a generic model that works with both ordered and unordered data (website funnel data, for example, is usually ordered, even though this dataset is not). I stored the prepared data as NumPy arrays to be used for training the model. The code snippet is as follows:

Data preparation code in python
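A condensed sketch of the grouping and permutation step described above, on a tiny hand-made frame; the column names (`InvoiceNo`, `Description`) follow the Kaggle e-commerce dataset, and the permutation cap is an assumption to keep the combinatorics in check:

```python
# Group each invoice's product descriptions into windows of 5, pad short
# groups with empty strings, then split permutations into 3 inputs / 2 outputs.
from itertools import permutations

import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": ["536365"] * 3 + ["536366"] * 2,
    "Description": ["RIBBON", "COOKIE CUTTER", "ROSE BOX",
                    "ALARM CLOCK", "STORAGE TIN"],
})

samples = []
for _, group in df.groupby("InvoiceNo"):
    items = list(group["Description"])[:5]
    items += [""] * (5 - len(items))  # pad each group to exactly 5 products
    # Purchases are unordered, so enumerate permutations; cap the count
    # per group, since 5! = 120 permutations per invoice adds up quickly.
    for perm in list(permutations(items))[:10]:
        samples.append((list(perm[:3]), list(perm[3:])))  # 3 in, 2 out

inputs, outputs = zip(*samples)
print(len(samples), inputs[0], outputs[0])
```

Each string triple and pair would then be run through the sentence encoder, giving 3×512 input and 2×512 output arrays saved as NumPy files.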

In the second part, I load the data from the first part and train a multivariate multi-step encoder-decoder LSTM model with Keras, where the input is 3×512 floats and the output is 2×512 floats. Normally this model should be tuned with a hyperparameter search, but I picked reasonable values by hand, guided by my obsession with symmetry. I have read Jason Brownlee's LSTM book and used an example from Machine Learning Mastery as the base for this model.

Train Keras model
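The encoder-decoder shape can be sketched as follows; the layer sizes are illustrative stand-ins, not the tuned values:

```python
# Multivariate multi-step encoder-decoder LSTM in Keras, following the
# Machine Learning Mastery pattern: 3 vectors of 512 floats in,
# 2 vectors of 512 floats out. Layer sizes are illustrative, not tuned.
import numpy as np
from tensorflow.keras.layers import LSTM, Dense, RepeatVector, TimeDistributed
from tensorflow.keras.models import Sequential

n_in, n_out, n_features = 3, 2, 512

model = Sequential([
    LSTM(256, input_shape=(n_in, n_features)),  # encoder: 3 steps -> state
    RepeatVector(n_out),                        # repeat state per output step
    LSTM(256, return_sequences=True),           # decoder: 2 output steps
    TimeDistributed(Dense(n_features)),         # one 512-float vector per step
])
model.compile(optimizer="adam", loss="mse")

# Smoke test on random data shaped like the embedded transactions.
X = np.random.rand(8, n_in, n_features)
y = np.random.rand(8, n_out, n_features)
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X, verbose=0).shape)  # (8, 2, 512)
```

The mean-squared-error loss treats the problem as regression in embedding space, which is why the nearest-centroid decoding step is needed afterwards.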

In the third part, I will test the model with some existing data. I will reuse some code from the first part and call the model to make predictions.

Test the model using existing transactions
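The test step just wires the three earlier pieces together: embed three product names, predict two vectors, and decode them back to names. A self-contained sketch, where the toy stand-in functions are placeholders for the real encoder, model, and classifier:

```python
# Pipeline for one recommendation: texts -> vectors -> prediction -> names.
import numpy as np

def recommend(products, embed, predict, decode):
    """Recommend two product names to follow the three given ones."""
    x = np.asarray(embed(products)).reshape(1, len(products), -1)
    vectors = predict(x)[0]        # shape (2, n_features)
    return list(decode(vectors))   # two product names

# Toy stand-ins (4-dim instead of 512) so the pipeline runs end to end:
toy_embed = lambda texts: np.ones((len(texts), 4))
toy_predict = lambda x: np.zeros((1, 2, 4))
toy_decode = lambda vecs: ["STORAGE TIN VINTAGE LEAF"] * len(vecs)

recs = recommend(["ribbon", "cookie cutter", "rose box"],
                 toy_embed, toy_predict, toy_decode)
print(recs)  # two recommended product names
```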

Results:

Results of 20 randomly chosen transactions.

You can see that the results are mostly logically related to the input. The product named “STORAGE TIN VINTAGE LEAF” is recommended multiple times; I believe this product has its own cluster, so anything that lands in that area ends up mapped to it. A recommender system doesn't need to be perfect, so these results look good enough for so little development effort.

The advantage of this model is that it can be used for more generic input:

Example with existing products:
['BABY BOOM RIBBONS ', 'GINGERBREAD MAN COOKIE CUTTER', 'ROSE COTTAGE KEEPSAKE BOX '] => ['ALARM CLOCK BAKELIKE IVORY']
Example with non-existing products:
['ribbon', 'cookie cutter', 'rose box'] => ['DOORKNOB CRACKED GLAZE IVORY', 'ALARM CLOCK BAKELIKE IVORY']

Even when a user hasn't bought any products yet, it is possible to create a recommendation based on the webpage they are viewing or information about the user gathered from other sources.

This kind of model can be saved and reused for other e-commerce platforms with little to no retraining, and development is easy and straightforward. These examples are unfortunately not complete, and the data is not properly formatted; it would work much better with a time-series funnel dataset.

Please add your thoughts as a comment.

The code can be found in my GitHub repo:

I will continue in my next blog post with how to pack and serve this project as a Docker image.
