K-means clustering algorithm
On the left, we see unclustered data. Consider this to be the initial state of all the parameters of every product that the user has looked at so far. This may be a set of a hundred products or simply one. A person may be able to pick out certain groups among these parameters visually, but a computer cannot. The algorithm starts by simply picking a few points to act as means. These can be found at the centre of each coloured cluster, as seen on the right.
Three clusters obtained using the k-means algorithm
From each mean point, the distance to every data point is calculated, and each data point is grouped with the mean closest to it, forming a small cluster. It is quite possible that during the formation of such a cluster the mean turns out not to be a good fit, so it may be moved; in some variants a mean may even be removed or a new one added. After multiple iterations we obtain the clusters on the right. This data set could indicate which products are the most popular and thus the most likely to be recommended to all customers. This way, a cold start problem can be worked around.
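The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration of the basic algorithm only, assuming two-dimensional points, a fixed number of means and made-up data; the names `assign`, `centre`, `kmeans` and `products` are chosen for this example and do not come from any particular recommender system. It only ever moves the means and does not add or remove any.

```python
import math
import random


def assign(points, means):
    """Group every point with the mean it is closest to."""
    clusters = [[] for _ in means]
    for point in points:
        nearest = min(range(len(means)), key=lambda i: math.dist(point, means[i]))
        clusters[nearest].append(point)
    return clusters


def centre(cluster):
    """Return the average position of the points in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def kmeans(points, k, iterations=20, seed=0):
    """Basic k-means: assign points to means, move the means, repeat."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        clusters = assign(points, means)
        # Move each mean to the centre of its cluster.
        # A mean that attracted no points stays where it is.
        new_means = [centre(c) if c else m for m, c in zip(means, clusters)]
        if new_means == means:  # nothing moved, so the clusters are stable
            break
        means = new_means
    return means, assign(points, means)


# Made-up data: two hypothetical parameters per viewed product.
products = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
means, clusters = kmeans(products, k=3)
for mean, cluster in zip(means, clusters):
    print(mean, cluster)
```

Because the starting means are picked at random, a different `seed` can lead to a different grouping, which is one reason real systems usually run the algorithm several times and keep the best result.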
Purchase history affects recommendations more than anything
A combination of many algorithms
A recommender system is by no means simple; it has many different algorithms working towards producing just one result. A well-known system that used a great many algorithms was Cinematch, the one used by Netflix back in 2007. Over 170 algorithms were used for each individual result. However, this was not enough, and Netflix offered a prize of $1,000,000 to anyone who could improve the accuracy of its system by 10%. This may seem to be a minor improvement, but that 10% was worth more than enough to make good the $1,000,000 prize money being given away. Going to a meta level increases the accuracy of a system: if you start describing each parameter with another parameter, another dimension comes into the picture. “User Generated Tags”, a feature implemented by Steam, the online game distribution platform, is one such thing. It is a form of collaborative filtering where the user inputs are arbitrary. These tags may be used by other customers, who will then be recommended all the games that carry the tags they used.
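To make the tag idea concrete, here is a small sketch of matching games to a customer by shared tags. It is only an illustration under stated assumptions: the catalogue `GAME_TAGS`, the function `recommend_by_tags` and the scoring rule (count the shared tags) are all hypothetical, and nothing here reflects how Steam actually implements the feature.

```python
# Hypothetical catalogue: game title -> tags that users have attached to it.
GAME_TAGS = {
    "Star Trader": {"space", "strategy", "trading"},
    "Dungeon Dash": {"roguelike", "pixel art", "fantasy"},
    "Orbit Wars": {"space", "strategy", "multiplayer"},
    "Farm Days": {"casual", "farming"},
}


def recommend_by_tags(user_tags, catalogue, owned=()):
    """Rank games by how many tags they share with the tags a customer has used."""
    wanted = set(user_tags)
    scores = {}
    for title, tags in catalogue.items():
        shared = len(wanted & tags)
        # Skip games with no tag in common and games the customer already owns.
        if shared and title not in owned:
            scores[title] = shared
    # Most shared tags first; ties are broken alphabetically by title.
    return sorted(scores, key=lambda title: (-scores[title], title))


print(recommend_by_tags({"space", "strategy"}, GAME_TAGS, owned={"Star Trader"}))
```

Counting shared tags is the simplest possible score. A real system would likely weight a tag by how many users applied it, so that a tag added once by a single user counts for less than one applied by thousands.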
How valuable are they?
The answer to this question depends on many factors. Primarily, it depends on how big your inventory is. Next comes breaking the products down into categories and describing them with as many parameters as possible, all while keeping the parameters consistent. For example, if you describe the screen of one phone using “surface area” and that of another using “diagonal length”, you have no way of relating the two. You also need a big enough inventory to start with, because each individual customer is only interested in a small fraction of the products that a store offers. Without that, a recommender system ends up giving lower returns in the long run.
The multiple ways an engine figures out what you like
However, if none of these issues plague your enterprise, you might see a surge of up to 60% in sales. Netflix revealed that about 60% of its rentals come from the recommendations given by Cinematch. Adding a social aspect helps tremendously, since people tend to value the opinions of people they know and trust. This is one of the reasons why things go viral on social networks. Tapping the power of a social network becomes a huge driving factor.
Make your own recommender system
While you can find recommender systems available in the form of plugins, they only allow tags that suit their filters. This restricts the system and will most probably affect your sales. So it is best to create your own recommender system for your website, customised for its environment. There are plenty of open-source recommender projects and libraries that can be used to create your own system, and here are a few that should help you get started.
LensKit
It is a framework for creating a collaborative filtering recommender system. It also provides the infrastructure for you to compare your recommender system with others that have used the same algorithm.
http://dgit.in/Lenskit
Crab
Another framework, one that enables both user-based and item-based collaborative filtering. It is based on Python and integrates seamlessly with popular scientific Python libraries.
http://dgit.in/CrabRecommend
MyMediaLite
This is a library of multiple algorithms that you can use to build a recommender system. It implements rating prediction based on collaborative filtering.
http://dgit.in/MyMediaLite
Recommenderlab
This is a testing environment that allows you to prototype and test your system. It provides feedback in the form of the accuracy of each recommendation.
http://dgit.in/RecommenderLab
Neo4j
Primarily a graph database, Neo4j is helpful for new users to get familiar with the functioning of a recommendation engine. You can view each query executed as a graph. Here is a demonstration video wherein Neo4j is used to create a recommendation engine.
http://dgit.in/Neo4jDemo