Explainable Mutual Fund Recommendation
Data
See 'DATA_DESCRIPTION.md' for more detail.
Recommender System Methods
Baseline
- Collaborative Filtering
- PersonFreq
- PersonVolume
Stable
- LightFM Meta
- LightFM PureCF
- LightFM Hybrid
Advanced
- DGL
- GCN
Part I: Fund Recommendation
Training
Supported models
- Heuristic
- LightFM (CF/Hybrid/Meta)
- SMORe
# Process 3 models in parallel
bash run_all.sh
Arguments
You can also tune the detailed parameter settings of each method in the training pipeline.
# Commonly used arguments
--model
--model_type
--model_hidden_dimension
--evaluation_metrics
--use_heuristic
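Flags such as `--evaluation_metrics` and `--use_heuristic` are passed more than once in the training commands in this README. One way such repeatable flags can be collected is with argparse's `action="append"`. The sketch below is only an illustration under that assumption, not the actual parser in `train.py`; the default values and the metric names `metric_a` and `metric_b` are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser; this is NOT the parser used by train.py."""
    parser = argparse.ArgumentParser(description="Hypothetical training arguments")
    parser.add_argument("--model", type=str, default="LightFM")
    parser.add_argument("--model_type", type=str, default="cf")
    parser.add_argument("--model_hidden_dimension", type=int, default=64)
    # action="append" collects every occurrence of the flag into a list.
    parser.add_argument("--evaluation_metrics", action="append", default=[])
    parser.add_argument("--use_heuristic", action="append", default=[])
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
example_args = build_parser().parse_args([
    "--model", "LightFM",
    "--model_type", "cf",
    "--evaluation_metrics", "metric_a",   # placeholder metric name
    "--evaluation_metrics", "metric_b",   # placeholder metric name
    "--use_heuristic", "frequency",
    "--use_heuristic", "volume",
])
print(example_args.evaluation_metrics)
print(example_args.use_heuristic)
```

With this pattern, repeating a flag extends the list rather than overwriting the previous value.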
For example, to train LightFM with the pure-CF method:
EPOCHS=10
EMBED_SIZE=64
DATE=20181231
python3 train.py \
--path_transaction data/${DATE}/transaction_train.csv \
--path_transaction_eval data/${DATE}/transaction_eval.csv \
--path_user data/${DATE}/customer.csv \
--path_item data/${DATE}/product.csv \
--model 'LightFM' \
--model_path 'models/lightfm' \
--model_type 'cf' \
--model_hidden_dimension ${EMBED_SIZE} \
--model_max_neg_sample 100 \
--model_loss 'warp' \
--training_do_evaluation \
--training_verbose \
--training_num_epochs ${EPOCHS} \
--training_eval_per_epochs 1 \
--evaluation_diff \
--evaluation_regular \
--evaluation_metrics '[email protected]' \
--evaluation_metrics '[email protected]' \
--evaluation_metrics '[email protected]' \
--evaluation_metrics '[email protected]' \
--use_heuristic 'frequency' \
--use_heuristic 'volume' \
--evaluation_results_csv results/lightfm_cf_evaluation_${DATE}.csv \
--evaluation_rec_detail_report results/lightfm_cf_rec_detail_${DATE}.tsv \
> logs/lightfm_cf_exp_${DATE}.log
As another example, to train SMORe:
python3 train.py \
--path_transaction data/${DATE}/transaction_train.csv \
--path_transaction_eval data/${DATE}/transaction_eval.csv \
--path_user data/${DATE}/customer.csv \
--path_item data/${DATE}/product.csv \
--model 'SMORe' \
--model_path 'models/smore' \
--model_hidden_dimension ${EMBED_SIZE} \
--model_max_neg_sample 100 \
--model_loss 'warp' \
--training_do_ \
--training_verbose \
--training_num_epochs ${EPOCHS} \
--training_eval_per_epochs 1 \
--evaluation_diff \
--evaluation_regular \
--evaluation_metrics '[email protected]' \
--evaluation_metrics '[email protected]' \
--evaluation_metrics '[email protected]' \
--evaluation_metrics '[email protected]' \
--evaluation_results_csv results/smore_evaluation_${DATE}.csv \
--evaluation_rec_detail_report results/smore_rec_detail_${DATE}.tsv \
> logs/smore_exp_${DATE}.log
Evaluation
The evaluation pipeline requires a prediction rec file in the format shown below:
# prediction rec file
\t
\t
\t
\t
\t
CFDAXWccjJPoVInuiF0mMg== AG25 EXPLOIT SOLO 0 2
CFDAXWccjJPoVInuiF0mMg== XXXX EXPLOIT SOLO 0 1
CFDAXWccjJPoVInuiF0mMg== JJ15 EXPLOIT REGULAR 0 2
CFDAXWccjJPoVInuiF0mMg== XXXX EXPLOIT REGULAR 0 1
CFDAwH4y/ssuYSedFy8UMw== CC89 EXPLOIT REGULAR 0 2
CFDAwH4y/ssuYSedFy8UMw== XXXX EXPLOIT REGULAR 0 1
CFDA9UDJnLAm4/0txbPMVQ== AP06 EXPLORE NA 0 2
CFDA9UDJnLAm4/0txbPMVQ== XXXX EXPLORE NA 0 1
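To make the layout of the prediction rec file concrete, here is a minimal sketch of a reader for it. It assumes six tab-separated fields per row, as the sample suggests. The field names (`user_id`, `item_id`, `behavior`, `schedule`, `flag`, `value`) and the helper functions are hypothetical labels chosen for illustration and are not taken from the project code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PredRow(NamedTuple):
    # Hypothetical names for the six columns of a prediction rec row.
    user_id: str
    item_id: str
    behavior: str   # EXPLOIT or EXPLORE in the sample
    schedule: str   # SOLO, REGULAR, or NA in the sample
    flag: str       # fifth column, 0 in the sample (meaning not documented here)
    value: float    # sixth column, 2 or 1 in the sample


def parse_pred_rec(lines: Iterable[str]) -> List[PredRow]:
    """Parse tab-separated rows, skipping blank lines and '#' comments."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 fields, got {len(fields)}")
        user_id, item_id, behavior, schedule, flag, value = fields
        rows.append(PredRow(user_id, item_id, behavior, schedule, flag, float(value)))
    return rows


def group_items(rows: Iterable[PredRow]) -> Dict[Tuple[str, str, str], List[str]]:
    """Collect item ids per (user, behavior, schedule), keeping file order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row.user_id, row.behavior, row.schedule)].append(row.item_id)
    return dict(grouped)


sample_lines = [
    "# prediction rec file",
    "\t".join(["CFDAXWccjJPoVInuiF0mMg==", "AG25", "EXPLOIT", "SOLO", "0", "2"]),
    "\t".join(["CFDAXWccjJPoVInuiF0mMg==", "XXXX", "EXPLOIT", "SOLO", "0", "1"]),
    "\t".join(["CFDA9UDJnLAm4/0txbPMVQ==", "AP06", "EXPLORE", "NA", "0", "2"]),
    "\t".join(["CFDA9UDJnLAm4/0txbPMVQ==", "XXXX", "EXPLORE", "NA", "0", "1"]),
]
pred_rows = parse_pred_rec(sample_lines)
pred_groups = group_items(pred_rows)
for key, items in pred_groups.items():
    print(key, items)
```

Rejecting rows with the wrong number of fields early makes a malformed rec file fail at load time rather than during metric computation.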
You can then run the evaluation pipeline directly:
bash rec_convert_eval.sh
The evaluation pipeline requires the ground-truth interactions in '.rec' format. For example:
# truth rec file
\t
\t
\t
\t
CFDAXWccjJPoVInuiF0mMg== AG25 EXPLOIT SOLO 1.0
CFDAXWccjJPoVInuiF0mMg== JJ15 EXPLOIT REGULAR 1.0
CFDAwH4y/ssuYSedFy8UMw== CC89 EXPLOIT REGULAR 1.0
CFDA9UDJnLAm4/0txbPMVQ== AP06 EXPLORE NA 1.0
The following command converts the evaluation transactions (including the preprocessing pipeline) and saves the resulting rec file to the path given by '--path_transaction_truth':
DATE=20181231
python3 convert_to_rec.py \
--path_transaction data/${DATE}/transaction_train.csv \
--path_transaction_eval data/${DATE}/transaction_eval.csv \
--path_user data/${DATE}/customer.csv \
--path_item data/${DATE}/product.csv \
--path_transaction_truth rec/${DATE}.eval.truth.rec
Then evaluate with 'rec_eval.py':
DATE=20181231
python3 rec_eval.py \
-truth rec/${DATE}.eval.truth.rec \
-pred rec/pred.rec \
-metric '[email protected]' \
-metric '[email protected]' \
-metric '[email protected]' \
-metric '[email protected]'
The output looks like this:
TRUTH REC FILE EXISTED: 'rec/20181231.eval.truth.rec'
EvalDict({
SUBSET USERS EXAMPLES
* EXPLORE 2305 2826
* EXPLOIT 33355 62403
* REGULAR 31763 59054
* SOLO 2747 3349
})
==============================
[email protected] on EXPLORE 0.0001
[email protected] on EXPLORE 0.0004
[email protected] on EXPLORE 0.0004
[email protected] on EXPLORE 0.0004
[email protected] on EXPLOIT 0.0000
[email protected] on EXPLOIT 0.0001
[email protected] on EXPLOIT 0.0001
[email protected] on EXPLOIT 0.0001
[email protected] on REGULAR 0.0000
[email protected] on REGULAR 0.0001
[email protected] on REGULAR 0.0001
[email protected] on REGULAR 0.0001
[email protected] on SOLO 0.0001
[email protected] on SOLO 0.0004
[email protected] on SOLO 0.0004
[email protected] on SOLO 0.0004
==============================
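The subset table in the output above lists a user count and an example count per tag. Its EXPLOIT example count equals the REGULAR and SOLO counts combined (59054 + 3349 = 62403), while the user counts do not add up the same way, which is consistent with one user appearing in both subsets. A minimal sketch of such a tally follows, assuming each truth row contributes to its third-column tag and, unless it is `NA`, to its fourth-column tag. The function name and the tuple layout are hypothetical, and this is not the code in `rec_eval.py`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

TruthRow = Tuple[str, str, str, str]  # (user_id, item_id, behavior, schedule), hypothetical names


def subset_sizes(rows: Iterable[TruthRow]) -> Dict[str, Tuple[int, int]]:
    """Return {tag: (distinct users, number of rows)} for every tag seen."""
    users = defaultdict(set)
    examples = defaultdict(int)
    for user_id, _item_id, behavior, schedule in rows:
        tags = [behavior]
        if schedule != "NA":          # NA rows carry no SOLO/REGULAR tag
            tags.append(schedule)
        for tag in tags:
            users[tag].add(user_id)
            examples[tag] += 1
    return {tag: (len(users[tag]), examples[tag]) for tag in examples}


# The four truth rows from the sample truth rec file in this README.
truth_rows = [
    ("CFDAXWccjJPoVInuiF0mMg==", "AG25", "EXPLOIT", "SOLO"),
    ("CFDAXWccjJPoVInuiF0mMg==", "JJ15", "EXPLOIT", "REGULAR"),
    ("CFDAwH4y/ssuYSedFy8UMw==", "CC89", "EXPLOIT", "REGULAR"),
    ("CFDA9UDJnLAm4/0txbPMVQ==", "AP06", "EXPLORE", "NA"),
]
sizes = subset_sizes(truth_rows)
for tag, (n_users, n_examples) in sorted(sizes.items()):
    print(f"* {tag}\t{n_users}\t{n_examples}")
```

On the four sample rows, the first user is counted once under EXPLOIT but appears in both SOLO and REGULAR, mirroring the overlap seen in the real output.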
Results
Methods | [email protected] | [email protected] | [email protected] | [email protected] |
---|---|---|---|---|
Collaborative Filtering | - | - | - | - |
PersonFreq | - | - | - | - |
PersonVolume | - | - | - | - |
LightFM Meta | - | - | - | - |
LightFM PureCF | - | - | - | - |
LightFM Hybrid | 0.000 | 0.000 | 0.000 | 0.000 |
DGL | - | - | - | - |
GCN | - | - | - | - |