# Euclidean Distance and the Pearson Correlation Coefficient

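In two dimensions, the Euclidean distance between the points $(x_1, y_1)$ and $(x_2, y_2)$ is

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

In three dimensions:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

In $n$ dimensions:

$$d = \sqrt{\sum_{i=1}^{n} (x_{i1} - x_{i2})^2}$$

where $x_{i1}$ is the $i$-th coordinate of the first point and $x_{i2}$ is the $i$-th coordinate of the second point.

More formally, $n$-dimensional Euclidean space is a set of points, each of which can be written as $x = (x(1), x(2), \ldots, x(n))$, where each $x(i)$ ($i = 1, 2, \ldots, n$) is a real number called the $i$-th coordinate of $x$. The distance $d(x, y)$ between two points $x$ and $y = (y(1), y(2), \ldots, y(n))$ is defined by the formula above, with $x(i)$ and $y(i)$ in place of $x_{i1}$ and $x_{i2}$.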
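
The n-dimensional formula translates almost word for word into code. Here is a minimal sketch, assuming Python 3 and points given as equal-length sequences of numbers; the function name `euclidean_distance` is my own and does not come from the code later in this post.

```python
from math import sqrt

def euclidean_distance(p, q):
    """Distance between two points given as equal-length coordinate sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared per-coordinate differences, then take the square root
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))              # 2-D: 5.0
print(euclidean_distance((1, 2, 3), (1, 2, 3)))        # identical 3-D points: 0.0
print(euclidean_distance((1, 0, 0, 0), (0, 1, 0, 0)))  # 4-D: sqrt(2)
```

The same function covers the 2-D and 3-D cases, since they are just the n-dimensional formula with n = 2 and n = 3.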

**Pearson correlation coefficient**

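The Pearson correlation coefficient R takes values between -1 and +1. If R > 0, the two variables are positively correlated: as one increases, the other tends to increase as well. If R < 0, they are negatively correlated: as one increases, the other tends to decrease. The larger the absolute value of R, the stronger the correlation. Note that correlation does not imply any causal relationship. If R = 0, the two variables are not linearly correlated, but they may still be related in some other way (for example, along a curve).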
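
The three cases are easy to see on toy data. The sketch below assumes Python 3 and uses the textbook mean-centered definition of the coefficient; the helper `pearson` and the sample lists are made up for illustration and are separate from the `sim_pearson` function later in this post.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length number sequences (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how the two variables move together around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the spread of each variable around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    den = sqrt(var_x * var_y)
    return cov / den if den != 0 else 0

xs = [1, 2, 3, 4, 5]
print(pearson(xs, [3, 5, 7, 9, 11]))  # rising line y = 2x + 1: 1.0
print(pearson(xs, [11, 9, 7, 5, 3]))  # falling line y = 13 - 2x: -1.0

# y = x^2 on a range symmetric around 0: y is fully determined by x, yet R is 0
curve_x = [-2, -1, 0, 1, 2]
print(pearson(curve_x, [x ** 2 for x in curve_x]))  # 0.0
```

The last case is the "related along a curve" situation: the relationship is perfect, but it is not linear, so the coefficient does not pick it up.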

    # A dictionary of movie critics and their ratings of a small set of movies
    critics = {
        'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                      'Just My Luck': 3.0, 'Superman Returns': 3.5,
                      'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
        'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                         'Just My Luck': 1.5, 'Superman Returns': 5.0,
                         'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
        'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                             'Superman Returns': 3.5, 'The Night Listener': 4.0},
        'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                         'The Night Listener': 4.5, 'Superman Returns': 4.0,
                         'You, Me and Dupree': 2.5},
        'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                         'Just My Luck': 2.0, 'Superman Returns': 3.0,
                         'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
        'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                          'The Night Listener': 3.0, 'Superman Returns': 5.0,
                          'You, Me and Dupree': 3.5},
        'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
                 'Superman Returns': 4.0},
    }

    from math import sqrt

    # Returns a distance-based similarity score for person1 and person2
    def sim_distance(prefs, person1, person2):
        # Get the list of shared items
        si = {}
        for item in prefs[person1]:
            if item in prefs[person2]:
                si[item] = 1

        # If they have no ratings in common, return 0
        if len(si) == 0:
            return 0

        # Add up the squares of all the differences over the shared items
        sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                              for item in si])

        # Map the distance into (0, 1]: identical ratings give 1
        return 1 / (1 + sqrt(sum_of_squares))

With the code saved as `recommendation.py` and loaded in the interactive interpreter:

    <module 'recommendation' from 'recommendation.py'>
    >>> recommendation.sim_distance(recommendation.critics, 'Lisa Rose', 'Gene Seymour')
    0.29429805508554946
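
To see where that number comes from, the calculation can be traced step by step. This is a minimal sketch, assuming Python 3, with the two critics' ratings copied by hand from `critics`; the variable names (`lisa`, `gene`, `squared_diffs`) are only for illustration.

```python
from math import sqrt

# Ratings copied from the critics dictionary for the two critics being compared
lisa = {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
        'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0}
gene = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
        'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}

# Squared rating difference for every movie both critics rated
squared_diffs = {title: (lisa[title] - gene[title]) ** 2
                 for title in lisa if title in gene}
for title, value in squared_diffs.items():
    print(title, value)

sum_of_squares = sum(squared_diffs.values())  # 0.25 + 0 + 2.25 + 2.25 + 1.0 + 0 = 5.75
distance = sqrt(sum_of_squares)               # Euclidean distance in "movie space"
similarity = 1 / (1 + distance)               # same mapping as sim_distance
print(sum_of_squares, distance, similarity)
```

Each movie is one dimension, so the two critics are six-dimensional points that are about 2.4 apart; `1 / (1 + distance)` turns that into the similarity of roughly 0.294 shown above.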

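OK, to me the string of decimals in the output above is quite exciting: back when I was writing a desktop client for Douban FM and analyzing its API, I kept running into decimals just like this.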

    from math import sqrt

    # Returns the Pearson correlation coefficient for p1 and p2
    def sim_pearson(prefs, p1, p2):
        # Get the list of mutually rated items
        si = {}
        for item in prefs[p1]:
            if item in prefs[p2]:
                si[item] = 1

        # If there are no ratings in common, return 0
        if len(si) == 0:
            return 0

        # Number of shared items
        n = len(si)

        # Sums of all the preferences
        sum1 = sum([prefs[p1][it] for it in si])
        sum2 = sum([prefs[p2][it] for it in si])

        # Sums of the squares
        sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
        sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])

        # Sum of the products
        pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])

        # Calculate r (Pearson score)
        num = pSum - (sum1 * sum2 / n)
        den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))

        # A zero denominator means one critic gave every shared item the same rating
        if den == 0:
            return 0

        r = num / den
        return r
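
For reference, the quantities computed in `sim_pearson` match the usual computational form of the Pearson coefficient. Writing $x_i$ and $y_i$ for the two critics' ratings of the $i$-th shared item (this notation is mine, not part of the code), and $n$ for the number of shared items:

$$r = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right)\left(\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}\right)}}$$

Here $\sum x_i y_i$ is `pSum`, $\sum x_i$ and $\sum y_i$ are `sum1` and `sum2`, and $\sum x_i^2$ and $\sum y_i^2$ are `sum1Sq` and `sum2Sq`. Working it through by hand for Lisa Rose and Gene Seymour, with $n = 6$, $\sum x_i = 18$, $\sum y_i = 19.5$, $\sum x_i^2 = 55$, $\sum y_i^2 = 69.75$ and $\sum x_i y_i = 59.5$:

$$r = \frac{59.5 - \dfrac{18 \times 19.5}{6}}{\sqrt{\left(55 - \dfrac{18^2}{6}\right)\left(69.75 - \dfrac{19.5^2}{6}\right)}} = \frac{59.5 - 58.5}{\sqrt{(55 - 54)(69.75 - 63.375)}} = \frac{1}{\sqrt{6.375}} \approx 0.396$$

So the two critics are only weakly positively correlated, even though their ratings agree exactly on two of the six movies.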