
Coursera Recommender Systems Specialization: Assignment 1

1.Mean Rating: Calculate the mean rating for each movie, order with the highest rating listed first, and submit the top three (along with the mean scores for the top two).

import pandas as pd
import numpy as np 

calculationCSV = pd.read_csv(r'D:\BaiduNetdiskDownload\Recommender Systems 专项课程\1、Introduction to Recommender Systems\03_non-personalized-and-stereotype-based-recommenders\05_module-assessments\HW1-data.csv')

print(calculationCSV.shape)

print(list(calculationCSV.columns))

calculation = calculationCSV.drop(['User','Gender (1 =F, 0=M)'], axis=1)
print(calculation)

#mean rating
mean = calculation.mean()

print(mean.sort_values(axis = 0, ascending=False).head(3))
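One detail worth checking is how missing ratings are handled. The sketch below uses a small made-up table (the movie names and values are hypothetical, not the assignment data) to show that `DataFrame.mean()` skips NaN, so each mean is taken only over the users who rated that movie.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are movies, NaN = not rated.
toy = pd.DataFrame({
    "Movie A": [5.0, 4.0, np.nan],
    "Movie B": [3.0, np.nan, np.nan],
    "Movie C": [4.0, 4.0, 4.0],
})

# mean() works column by column and skips NaN by default,
# so unrated cells do not pull a movie's average down.
toy_mean = toy.mean().sort_values(ascending=False)
print(toy_mean)
```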


2.Rating Count (popularity): Count the number of ratings for each movie, order with the most number of ratings first, and submit the top three (along with the counts for the top two).

#rating count
count = calculation.count()

print(count.sort_values(axis = 0, ascending=False).head(3))


3.% of ratings 4+ (liking): Calculate the percentage of ratings for each movie that are 4 or higher. Order with the highest percentage first, and submit the top three (along with the percentage for the top two). Notice that the three different measures of “best” reflect different priorities and give different results; this should help you see why you need to be thoughtful about what metrics you use.

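#% of ratings 4+
def ifgreaterthan4(x):
    # 1 if the rating is 4 or higher, else 0 (a missing rating, NaN, also gives 0)
    return 1 if x >= 4 else 0

cols = calculation.columns
print(cols)

# flag the 4+ ratings in a separate frame so the original ratings stay intact
high = calculation.apply(lambda col: col.apply(ifgreaterthan4))

highCount = high.sum()
print(highCount)

# divide by the number of ratings each movie actually received
liking = highCount / calculation.count()
print(liking.sort_values(ascending=False).head(3))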
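To see how the three measures of "best" can disagree, here is a minimal sketch on invented data. The movie names and ratings are hypothetical and chosen so that each metric picks a different winner.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings (not the assignment data); NaN = not rated.
toy = pd.DataFrame({
    "Niche": [5.0, 5.0, 3.0, np.nan, np.nan],
    "Popular": [4.0, 3.0, 4.0, 3.0, 4.0],
    "Solid": [4.0, 4.0, 4.0, 4.0, np.nan],
})

summary = pd.DataFrame({
    "mean": toy.mean(),                             # NaN is skipped
    "count": toy.count(),                           # non-missing ratings only
    "share_4plus": (toy >= 4).sum() / toy.count(),  # NaN >= 4 evaluates to False
})
print(summary)

# The top movie under each metric
print(summary.idxmax())
```

In this toy table the highest mean, the highest count, and the highest share of 4+ ratings each belong to a different movie, which is the point the assignment text makes about choosing metrics carefully.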


4.Top movies for someone who has seen Toy Story: Calculate movies that most often occur with Movie #1: Toy Story, using the (x+y)/x method described in class. In other words, for each movie, calculate the percentage of Toy Story raters who also rated that movie. Order with the highest percentage first, and submit the top 3 (along with the correlations for the top two). Note, you will have ties - to break the ties, use the lowest-numbered movie as the higher-ranked one. In other words, if Movies 541 and 318 are tied, then 318 gets the higher rank.

#top movies for those who have seen Toy Story
newCalculation = calculationCSV.drop(['User','Gender (1 =F, 0=M)'], axis=1)

# keep only the users who rated Toy Story
newCalculation = newCalculation.dropna(subset=['1: Toy Story (1995)'])

nC = newCalculation.drop(['1: Toy Story (1995)'], axis=1)

# share of Toy Story raters who also rated each movie
# (ties still have to be broken by the lowest movie number, as the task states)
cooccurrence = nC.count() / len(newCalculation)
print(cooccurrence.sort_values(ascending=False).head(5))
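The tie-breaking rule can also be applied in code rather than by eye. The following is only a sketch on made-up data: the column names imitate the "<id>: <title>" headers of the CSV, the titles other than Toy Story are hypothetical, and it assumes the movie number can be read from the text before the first colon.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the header format mirrors the assignment file.
toy = pd.DataFrame({
    "1: Toy Story (1995)": [5.0, 4.0, np.nan, 3.0],
    "541: Hypothetical B": [4.0, np.nan, 5.0, 2.0],
    "318: Hypothetical C": [np.nan, 5.0, 4.0, 4.0],
    "260: Hypothetical D": [5.0, 4.0, 3.0, 4.0],
})
target = "1: Toy Story (1995)"

# Users who rated Toy Story, and the share of them who rated each other movie
raters = toy[toy[target].notna()]
share = raters.drop(columns=target).count() / len(raters)

# Sort by share (descending), then by movie number (ascending) to break ties
ranked = pd.DataFrame({"movie": share.index, "share": share.values})
ranked["movie_id"] = ranked["movie"].str.split(":").str[0].astype(int)
ranked = ranked.sort_values(["share", "movie_id"], ascending=[False, True])

top = ranked["movie"].tolist()
print(ranked)
```

Here movies 318 and 541 are both rated by two of the three Toy Story raters, and the secondary sort key puts 318 first.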


5.Correlation with Toy Story: Calculate the correlation between the vectors of ratings for Toy Story and each other movie. You can use the built-in CORREL() function. Order by the highest correlation (positive only) and submit the top 3 along with the correlation values for the top 2. Notice the differences between co-occurrence and correlation; these metrics are showing different types of relationships.

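# correlation
toyStoryCorr = newCalculation.corr()['1: Toy Story (1995)']
# drop Toy Story's correlation with itself (always 1) before ranking
print(toyStoryCorr.drop('1: Toy Story (1995)').sort_values(ascending=False).head(3))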
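For intuition on what the correlation measures, here is a small sketch with invented ratings (Movie X, Y, and Z are hypothetical). It uses `DataFrame.corrwith`, which, like `corr()`, computes the Pearson correlation only over the users who rated both movies, and then keeps the positive values as the task requires.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings; NaN = not rated.
toy = pd.DataFrame({
    "1: Toy Story (1995)": [5.0, 4.0, 3.0, 2.0, np.nan],
    "Movie X": [5.0, 4.0, 3.0, np.nan, 1.0],  # same taste as Toy Story raters
    "Movie Y": [1.0, 2.0, 3.0, 4.0, 5.0],     # opposite taste
    "Movie Z": [4.0, 5.0, 2.0, 3.0, 3.0],     # loosely similar taste
})
target = "1: Toy Story (1995)"

# Correlate every other movie's rating vector with Toy Story's
corr = toy.drop(columns=target).corrwith(toy[target])

# Keep positive correlations only, highest first
positive = corr[corr > 0].sort_values(ascending=False)
print(positive)
```

Unlike the co-occurrence share, this ignores how many people rated both movies and looks only at whether their ratings move together.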


6.Mean rating difference by gender: First, recompute the mean rating for each movie separately for males and for females. And calculate the overall mean rating (across all ratings) for males and females. Submit the two movies that have the greatest differences (one where men are most above women, and one where women are most above men) along with the differences in average. Also submit the difference in overall rating averages (female average - male average).

# mean rating difference by gender
# the Gender column is coded 1 = female, 0 = male
# (assuming pandas read it as numbers, so compare against 1 and 0, not '1' and '0')
cal = calculationCSV.drop(['User'], axis=1)

# .copy() gives independent frames, so later edits do not touch cal
male = cal[cal['Gender (1 =F, 0=M)'] == 0].copy()
print(male)

# per-movie mean rating among male users
mean_male = male.mean()
print(mean_male.sort_values(ascending=False))

female = cal[cal['Gender (1 =F, 0=M)'] == 1].copy()
print(female)

# per-movie mean rating among female users
mean_female = female.mean()
print(mean_female.sort_values(ascending=False))

#submit two movies that have the greatest differences
# per-movie difference in mean rating (female - male), leaving out the Gender column
difference = (mean_female - mean_male).drop('Gender (1 =F, 0=M)', errors='ignore')
print(difference.idxmax(), difference.max())  # women most above men
print(difference.idxmin(), difference.min())  # men most above women


#compute overall female average - male average
# average over every individual rating (NaN ignored), not over the per-movie means
avg_male = np.nanmean(male.drop(['Gender (1 =F, 0=M)'], axis=1).to_numpy())
print(avg_male)
avg_female = np.nanmean(female.drop(['Gender (1 =F, 0=M)'], axis=1).to_numpy())
print(avg_female)
print(avg_female - avg_male)
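The task asks for the overall mean "across all ratings", which is not necessarily the same number as the average of the per-movie means. The sketch below, on invented data with a hypothetical `gender` column (1 = F, 0 = M), shows a case where the two differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN = not rated.
toy = pd.DataFrame({
    "gender": [1, 1, 0, 0],
    "Movie A": [5.0, 4.0, 2.0, 3.0],
    "Movie B": [np.nan, 2.0, 4.0, np.nan],
})

# Per-movie mean rating for each gender, then the female - male gap per movie
per_movie = toy.groupby("gender").mean()
gap = per_movie.loc[1] - per_movie.loc[0]

# Overall mean across all individual ratings in each group
long = toy.melt(id_vars="gender", value_name="rating").dropna(subset=["rating"])
overall = long.groupby("gender")["rating"].mean()

# Mean of the per-movie means, for comparison
mean_of_means = per_movie.mean(axis=1)

print(gap)
print(overall.loc[1] - overall.loc[0])
print(mean_of_means.loc[1] - mean_of_means.loc[0])
```

In this toy table the mean of means is identical for both groups, while the mean over all ratings is higher for the female group, because Movie A has more ratings than Movie B.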


7.Next, compute the % of ratings 4+ separately for males and females. You’ll be asked to submit two movies as above (largest difference in each direction). And again you’ll indicate whether men or women are more likely to rate movies 4 stars or above.

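#4+ for male and female
# leave the Gender column out so it is not treated as a rating
male_ratings = male.drop(['Gender (1 =F, 0=M)'], axis=1)
female_ratings = female.drop(['Gender (1 =F, 0=M)'], axis=1)

# share of each group's ratings that are 4 or higher
# count() ignores missing ratings, so the denominator is the number of actual ratings
liking_male = (male_ratings >= 4).sum() / male_ratings.count()
print(liking_male.sort_values(ascending=False))

liking_female = (female_ratings >= 4).sum() / female_ratings.count()
print(liking_female.sort_values(ascending=False))

# largest difference in each direction (female - male)
liking_difference = liking_female - liking_male
print(liking_difference.idxmax(), liking_difference.max())
print(liking_difference.idxmin(), liking_difference.min())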