When we shop, listen to music, or watch movies online, the site recommends products, songs, or films based on our purchase, listening, or viewing history. So how do these recommender systems work? The goal is to recommend things a shopper is interested in, so the first question is how to determine what interests the shopper. This is judged from the shopper's records, on the assumption that the shopper is interested in the items he buys. Recommending items that are similar to the items the shopper has bought is item-based recommendation. In addition, similar people tend to share the same interests, so we can find similar shoppers from their purchase records and recommend the items those similar shoppers bought; this is user-based recommendation.
Here is a simple, classic algorithm for measuring user similarity: the Euclidean distance algorithm.
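Euclidean distance is the straight-line distance between two points. For two points $(x_1, y_1)$ and $(x_2, y_2)$ in the plane, it is calculated as follows:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
Extended to n-dimensional space, the distance between two points $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$ is:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$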
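As a minimal sketch of the n-dimensional formula, the following Python function computes the distance between two points given as sequences of numbers. The function name `euclidean_distance` and the sample points are illustrative assumptions, not part of the original post.

```python
from math import sqrt

def euclidean_distance(p, q):
    """Distance between two points given as equal-length sequences of numbers."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square the difference in each dimension, add them up, take the root
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# 2-D case: the classic 3-4-5 right triangle
print(euclidean_distance((0, 0), (3, 4)))              # 5.0
# 4-D case: the points differ in one coordinate only
print(euclidean_distance((1, 0, 1, 1), (0, 0, 1, 1)))  # 1.0
```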
Now suppose there are two shoppers, X and Y. For each item, if both have purchased it or neither has, their distance on that item is 0; if only one of them has purchased it, the distance is 1. Combining the per-item distances over all items (the square root of the sum of their squares) gives the distance between X and Y.
This can be done with the following program:
#!/usr/bin/python
from math import sqrt

# Data: 1 = purchased, 0 = not purchased
x_user = {'Pen': 1, 'Pencil': 0, 'Knife': 1, 'Notebook': 1, 'Book': 0}
y_user = {'Pen': 0, 'Pencil': 0, 'Knife': 1, 'Notebook': 1, 'Book': 0}

# Calculate distance
def distance(person1, person2):
    # Sum the squared per-item differences, then take the square root
    dis = sum([pow(person1[item] - person2[item], 2) for item in person1])
    return sqrt(dis)

print(distance(x_user, y_user))
The distance starts at 0, which is also its minimum value. In the data, 0 means the item was not purchased and 1 means it was purchased. By this calculation, the distance between X and Y is 1.
This calculation has a problem: with different numbers of items, the calculated distances do not fall in the same range, so the result needs to be normalized into a similarity score in the range 0-1, where a larger value means greater similarity. With $d$ as the Euclidean distance, the similarity formula is:
$$\text{similarity} = \frac{1}{1 + d}$$
This way, the similarity decreases as the distance increases and always stays within the range 0-1 (it equals 1 only when the distance is 0).
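To see how this mapping behaves, here is a small sketch that converts a few distances into similarity scores. The helper name `similarity_from_distance` and the sample distances are assumptions chosen for illustration.

```python
def similarity_from_distance(distance):
    """Map a non-negative distance to a similarity score in (0, 1]."""
    # 1.0 keeps the division floating-point even for integer distances
    return 1.0 / (1 + distance)

# The score is 1 at distance 0 and shrinks toward 0 as the distance grows
for d in (0, 1, 3, 9):
    print(d, similarity_from_distance(d))
```

Whatever the number of items, the score cannot leave the interval (0, 1], which is what makes scores for different shopper pairs comparable.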
The revised code is as follows:
#!/usr/bin/python
from math import sqrt

# Data: 1 = purchased, 0 = not purchased
x_user = {'Pen': 1, 'Pencil': 0, 'Knife': 1, 'Notebook': 1, 'Book': 0}
y_user = {'Pen': 0, 'Pencil': 0, 'Knife': 1, 'Notebook': 1, 'Book': 0}

# Calculate similarity from the Euclidean distance
def similarity(person1, person2):
    # Sum the squared per-item differences
    dis = sum([pow(person1[item] - person2[item], 2) for item in person1])
    # Map the distance into the range 0-1
    return 1 / (1 + sqrt(dis))

print(similarity(x_user, y_user))
Comments:
pow(x, y): returns x raised to the power y; here it is used to square a value
sqrt(x): returns the square root of a non-negative real number
Dictionary: dict (similar to map in the C++ standard library)
dict = {'ob1': 'computer', 'ob2': 'mouse', 'ob3': 'printer'}
Each element is a pair with two parts, a key and a value. A key must be of a hashable type, typically an integer or a string; a value can be of any type.
Keys are unique: if the same key is assigned more than once, the dictionary keeps only the last assigned value.
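A short sketch of the "last assigned value wins" behaviour; the dictionary names and contents below are hypothetical and only serve as an illustration.

```python
# Hypothetical data: the key 'ob1' appears twice in the literal
stock = {'ob1': 'computer', 'ob2': 'mouse', 'ob1': 'printer'}
print(stock['ob1'])  # printer (the later value replaced the earlier one)
print(len(stock))    # 2 (only two distinct keys exist)

# Assigning to an existing key overwrites its value
stock['ob2'] = 'keyboard'
print(stock['ob2'])  # keyboard

# Tuples are hashable too, so they can also serve as keys
labels = {(0, 0): 'origin', 1: 'one', 'name': 'example'}
print(labels[(0, 0)])  # origin
```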
[process(item) for item in some_list_or_tuple if condition]
A statement of this kind, called a list comprehension, produces a new list.
For example, to get a list in which every item of an existing list is doubled:
>>> L = [1, 2, 3, 4]
>>> L
[1, 2, 3, 4]
>>> L2 = [i * 2 for i in L]
>>> L2
[2, 4, 6, 8]
If you want to add a condition so that items failing it are left out of the result, for example doubling only the items greater than 2, you can write
>>> L3 = [i * 2 for i in L if i > 2]
>>> L3
[6, 8]
sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2) for item in prefs[person1] if item in prefs[person2]])
With the above in mind, this expression is easy to understand.
The argument to sum() is a list: [pow(prefs[person1][item] - prefs[person2][item], 2) for item in prefs[person1] if item in prefs[person2]]
For each item of the original collection (prefs[person1]) that satisfies the condition if item in prefs[person2], pow(prefs[person1][item] - prefs[person2][item], 2) is evaluated, and the results form the new list.
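The expression can be tried on a small nested dictionary. This sketch assumes that `prefs` maps each person to a dictionary of items, which is what the indexing in the expression implies; the sample data is hypothetical.

```python
# Hypothetical data: prefs maps each person to {item: value}
prefs = {
    'x_user': {'Pen': 1, 'Knife': 1, 'Notebook': 1},
    'y_user': {'Pen': 0, 'Knife': 1, 'Book': 0},
}
person1, person2 = 'x_user', 'y_user'

# Only items present in BOTH dictionaries contribute (here: Pen and Knife);
# Notebook and Book are skipped by the "if item in prefs[person2]" condition
sum_of_squares = sum(
    pow(prefs[person1][item] - prefs[person2][item], 2)
    for item in prefs[person1]
    if item in prefs[person2]
)
print(sum_of_squares)  # 1
```

Unlike the program earlier in the post, the condition makes the calculation safe when the two people do not have records for the same set of items.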
------------------------------------------
With the revised program, the calculated similarity between X and Y is 0.5.
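The user-based recommendation described at the beginning can be sketched on top of this similarity score: find the most similar shopper, then suggest what that shopper bought and the target shopper has not. The third shopper `z_user` and the helper names `most_similar` and `recommend` are assumptions added for illustration; this is one simple way to do it, not a complete recommender.

```python
from math import sqrt

# Hypothetical purchase records: 1 = purchased, 0 = not purchased
purchases = {
    'x_user': {'Pen': 1, 'Pencil': 0, 'Knife': 1, 'Notebook': 1, 'Book': 0},
    'y_user': {'Pen': 0, 'Pencil': 0, 'Knife': 1, 'Notebook': 1, 'Book': 0},
    'z_user': {'Pen': 0, 'Pencil': 1, 'Knife': 0, 'Notebook': 0, 'Book': 1},
}

def similarity(a, b):
    """Euclidean-distance-based similarity in the range 0-1."""
    dis = sum((a[item] - b[item]) ** 2 for item in a)
    return 1 / (1 + sqrt(dis))

def most_similar(user):
    """Name of the other shopper with the highest similarity to user."""
    others = [name for name in purchases if name != user]
    return max(others, key=lambda name: similarity(purchases[user], purchases[name]))

def recommend(user):
    """Items the most similar shopper bought that user has not bought."""
    neighbour = most_similar(user)
    return sorted(item for item, bought in purchases[neighbour].items()
                  if bought and not purchases[user][item])

print(most_similar('y_user'))  # x_user
print(recommend('y_user'))     # ['Pen']
```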
The above is just a simple implementation of the Euclidean distance method. Its accuracy is not very high, and there are several areas to improve:
1. The per-item value can be more precise: instead of just 0 or 1, it can be a rating between 0 and 5.
2. Different items can carry different weights. For example, some items are liked by many people and others by very few, so weights based on an item's popularity can improve accuracy.
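Both improvements can be sketched together. The ratings, the weights, and the choice to give a popular item a smaller weight are all assumptions made for this example; the post does not specify how the weights should be derived.

```python
from math import sqrt

# Hypothetical ratings on a 0-5 scale instead of 0/1 purchase flags
x_ratings = {'Pen': 5, 'Pencil': 1, 'Knife': 4}
y_ratings = {'Pen': 2, 'Pencil': 1, 'Knife': 4}

# Hypothetical weights: Pen is assumed to be popular, so it counts less
weights = {'Pen': 0.25, 'Pencil': 1.0, 'Knife': 1.0}

def weighted_similarity(a, b, weights):
    """Similarity from a weighted Euclidean distance over shared items."""
    dis = sum(weights[item] * (a[item] - b[item]) ** 2
              for item in a if item in b)
    return 1 / (1 + sqrt(dis))

# Unweighted, the Pen gap of 3 gives 1 / (1 + 3) = 0.25;
# with weight 0.25 the distance shrinks to 1.5 and the similarity rises
print(weighted_similarity(x_ratings, y_ratings, weights))
```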
References: http://blog.chinaunix.net/uid-21718047-id-3220140.html
and "Programming Collective Intelligence"