Cloud-based recommendation system

This project is based on cloud services to create data lake, ETL process, train and deploy learning model to implement a recommendation system.

Purpose

One Web app can return if the consumer will buy the product or not when providing user ID and corresponding product SKU.

Services

This project will use services:

AWS: lambda function, Step functions, Glue (job,notebook,crawler), Athena, SNS, S3, Sagemaker, IAM, Dynamodb, API Gateway.

Confluent cloud (kafka) for streaming data.

Project description

Create a bucket on S3 as the storage location of the data lake, store the raw data in the bucket (raw data zone), and then return the data after ETL to the same bucket (curated zone).
Preview the data, determine the data is useful and meaningful for our project. Use AWS Glue crawler to grab corresponding data catalog (in created database and generated table info). Use Athena to do SQL query. This like Apache Hive, it does not change raw data, but do operations above the raw data.
Create and store stream data. Create a kafka topic on Clonfluent cloud and set schema registry for the corresponding stream data, schema sets as confluent_cloud_kafka-->confluent_kafka_topic_schema.json. Set the kafka producer as confluent_cloud_kafka-->confluent_kafka_producer_lambda.py to push stream data to corresponding kafka topic in different partitions (because this project does not have exact source giving real stream data, we produce stream data manually). Set the consumer (confluent connector with AWS lambda) as confluent_cloud_kafka-->confluent_kafka_consumer_lambda.py to poll the stream data in kafka topic and store them in Dynamodb table.
ETL process. Use lambda function to do data transformation operations based on SQL, corresponding scripts in file lambda_functions(ETL). Create Glue job to integrate new dataset and store in curated zone in data lake, scripts is in glue_job-->glue_job_ETL.py. Use step fuctions to orchestrate ETL workflow based on above lambda functions, ASL script is in step_function(workflow)-->step_functions_for_curated.json.

This part is based on spark, and it is similar with the project in repo: https://github.com/Yi-Ding111/spark-ETL-based-databricks-aws.
Train learning model (XGBoost). Use sagemaker notebook instance to do some kinds more operations like: EDA and feature engineering, use XGBoost framework to train the data, adjust parameters and try different attributes combinations to find the best one. Scripts is in sagemaker-->xgboost_deploy_sagemaker.ipynb.
Deploy learning model. Get deploy endpoint after machine learning. Create lambda function to invoke the sagemaker endpoint to use the trained model, scripts is in sagemaker-->endpoint_interact_lambda.py. Let the lambda function integrate with API gatway (proxy integration) as the backend. Deploy the API gatewat and use the invoked URL for web applications to do interactions.
Store the application output. Use SNS to publish the output to lambda and update the information into Dynamodb table, scripts is in sagemaker-->prediction_store_dynamodb.py

Acknowledgement

This project is completed with the guidance from Leo Lee (JR academy)

Author: YI DING, Leo Lee

Created at: Dec 2021

Contact: [email protected]

Cloud-based recommendation system

Related tags

Overview

Cloud-based recommendation system

Purpose

Services

Project description

Acknowledgement

Owner

Yi Ding

RecList is an open source library providing behavioral, "black-box" testing for recommender systems.

Accuracy-Diversity Trade-off in Recommender Systems via Graph Convolutions

The source code for "Global Context Enhanced Graph Neural Network for Session-based Recommendation".

Deep recommender models using PyTorch.

Codes for AAAI'21 paper 'Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation'

基于个性化推荐的音乐播放系统

Incorporating User Micro-behaviors and Item Knowledge 59 60 3 into Multi-task Learning for Session-based Recommendation

A tensorflow implementation of the RecoGCN model in a CIKM'19 paper, titled with "Relation-Aware Graph Convolutional Networks for Agent-Initiated Social E-Commerce Recommendation".

Fast Python Collaborative Filtering for Implicit Feedback Datasets

Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Code for ICML2019 Paper "Compositional Invariance Constraints for Graph Embeddings"

Code for KHGT model, AAAI2021

EXEMPLO DE SISTEMA ESPECIALISTA PARA RECOMENDAR SERIADOS EM PYTHON

A TensorFlow recommendation algorithm and framework in Python.

Graph Neural Network based Social Recommendation Model. SIGIR2019.

Implementation of a hadoop based movie recommendation system

Elliot is a comprehensive recommendation framework that analyzes the recommendation problem from the researcher's perspective.

Knowledge-aware Coupled Graph Neural Network for Social Recommendation

Spotify API Recommnder System