02_Movie Recommendation (ContentBased)_User Profile
2022-07-17 05:04:00 【Big data达闻西】
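Content-based movie recommendation: user profiles
Steps for building a user profile:
- Using the user's rating history together with the item profiles, take the profile tags of every movie the user has watched and assign them back to the user as initial tags
- Count how often each tag occurs across the user's watched movies, compute a weight for each initial tag from that count, sort by weight, and keep the TOP-N as the user's final profile tags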
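The two steps above can be sketched on toy data. This is only a minimal illustration under stated assumptions, not the article's implementation: the `movie_tags` dictionary, the watched list, and the `build_profile` helper are all made up for the example, and each weight is assumed to be the tag's count divided by the count of the most frequent tag.

```python
from collections import Counter

# Hypothetical item profiles: movieId -> list of profile tags
movie_tags = {
    1: ["comedy", "children"],
    2: ["comedy", "fantasy"],
    3: ["comedy", "children"],
}

def build_profile(watched_ids, movie_tags, top_n=2):
    """Count the tags of the watched movies and keep the top_n as (tag, weight)."""
    counter = Counter()
    for mid in watched_ids:
        # Step 1: assign each watched movie's tags back to the user
        counter.update(movie_tags.get(mid, []))
    top = counter.most_common(top_n)
    if not top:
        # A user with no matched tags gets an empty profile
        return []
    max_count = top[0][1]
    # Step 2: weight = tag count relative to the most frequent tag
    return [(tag, round(count / max_count, 4)) for tag, count in top]

profile = build_profile([1, 2, 3], movie_tags)
print(profile)
```

Here "comedy" appears three times and "children" twice, so "comedy" gets weight 1.0 and "children" gets a proportionally smaller weight.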
Building the user profile
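import pandas as pd
import numpy as np
from gensim.models import TfidfModel
from functools import reduce
import collections
from pprint import pprint

# ......

'''
Building the user profile:
1. Extract each user's watch list.
2. Match keywords to the user from the watch list and the movie profiles, and count keyword frequencies.
3. Sort by frequency and keep at most the TOP-K words as the user's tags (K is 50 here, matching most_common(50) below).
'''
def create_user_profile():
    # Read only the first two columns of ratings.csv (userId, movieId)
    watch_record = pd.read_csv("datasets/ml-latest-small/ratings.csv", usecols=range(2),
                               dtype={"userId": np.int32, "movieId": np.int32})
    # One row per user, holding the list of movieIds that user has rated
    watch_record = watch_record.groupby("userId").agg(list)
    # print(watch_record)

    movie_dataset = get_movie_dataset()
    movie_profile = create_movie_profile(movie_dataset)

    user_profile = {}
    for uid, mids in watch_record.itertuples():
        # Profiles of the movies this user has watched
        record_movie_profile = movie_profile.loc[list(mids)]
        # Concatenate the keyword lists of all watched movies and count each keyword
        counter = collections.Counter(reduce(lambda x, y: list(x) + list(y), record_movie_profile["profile"].values))
        # Interest words: the 50 most frequent keywords
        interest_words = counter.most_common(50)
        # Normalize each count by the highest count so that weights fall in (0, 1]
        maxcount = interest_words[0][1]
        interest_words = [(w, round(c / maxcount, 4)) for w, c in interest_words]
        user_profile[uid] = interest_words
    return user_profile

user_profile = create_user_profile()
pprint(user_profile)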
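One detail in the loop above: the `reduce(lambda x, y: list(x)+list(y), ...)` call builds a new list at every step, which can get slow for users with many watched movies. A possible alternative, sketched here with made-up keyword lists rather than the real `movie_profile`, is to flatten with `itertools.chain.from_iterable`. The resulting counts should be identical.

```python
import collections
from functools import reduce
from itertools import chain

# Hypothetical stand-in for the per-movie keyword lists of one user
profiles = [["comedy", "children"], ["comedy", "fantasy"], ["drama"]]

# Original approach: repeated list concatenation
by_reduce = collections.Counter(reduce(lambda x, y: list(x) + list(y), profiles))

# Alternative: iterate over all keywords lazily, without intermediate lists
by_chain = collections.Counter(chain.from_iterable(profiles))

print(by_reduce == by_chain)
```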