(C) Copyright 2014-2016, Allison J.B. Chaney
This software is distributed under the MIT license. See LICENSE.txt for details.
- `conf`: contains a base configure file for running LibRec to do model comparisons
- `scripts`: bash and python scripts for data processing and running experiments
- `src`: C++ source code
- `dat`: example data
- `README.md`: this file
The input format for data is tab-separated files with integer values:
user id item id rating
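As a quick sanity check on the format, the following minimal sketch parses tab-separated integer triples. The function name `read_ratings` and the sample values are illustrative only and are not part of this repository.

```python
import csv
import io


def read_ratings(handle):
    """Parse tab-separated (user id, item id, rating) integer triples."""
    reader = csv.reader(handle, delimiter="\t")
    # Skip blank lines; int() raises ValueError on any non-integer field.
    return [tuple(int(field) for field in row) for row in reader if row]


# Hypothetical sample data standing in for a real ratings file.
sample = io.StringIO("1\t10\t5\n2\t10\t3\n")
print(read_ratings(sample))
```

Because `int()` raises on non-integer fields, a malformed file fails loudly rather than being read silently.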
The ratings should be separated into training, testing, and validation data; scripts/process_data.py
helps divide data into these different sets. This script also culls the user network such that only
connections that have at least one item in common are included.
python process_data.py [ratings-file] [network-file] [output-dir]
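The network culling described above can be sketched as follows. This is not the code in `process_data.py`; the function name `cull_network` and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def cull_network(ratings, edges):
    """Keep only connections whose two users share at least one item."""
    items_by_user = defaultdict(set)
    for user, item, _rating in ratings:
        items_by_user[user].add(item)
    # An edge survives if the two users' item sets intersect.
    return [(u, v) for u, v in edges if items_by_user[u] & items_by_user[v]]


# Hypothetical data: users 1 and 2 both rated item 10; user 3 did not.
ratings = [(1, 10, 5), (2, 10, 3), (3, 11, 4)]
edges = [(1, 2), (1, 3)]
kept = cull_network(ratings, edges)
print(kept)
```

Here only the connection between users 1 and 2 is kept, since they are the only pair with an item in common.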
Alternatively, data with time information (as shown below) can be processed with
process_time_data.py which takes the same arguments as process_data.py. This
will split the data according to time; ratings are implicit and therefore binary.
user id item id unix time
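A time-based split can be sketched as below. This is only an illustration of the idea, not the logic of `process_time_data.py`: the function name `split_by_time`, the 80/10/10 fractions, and the ordering of the validation and test sets are all assumptions.

```python
def split_by_time(events, train_frac=0.8, valid_frac=0.1):
    """Split (user, item, unix_time) events chronologically.

    Returns (train, validation, test) lists of (user, item, 1) triples;
    the rating is always 1 because the feedback is implicit.
    """
    ordered = sorted(events, key=lambda event: event[2])
    n_train = int(len(ordered) * train_frac)
    n_valid = int(len(ordered) * valid_frac)

    def to_binary(rows):
        return [(user, item, 1) for user, item, _time in rows]

    train = to_binary(ordered[:n_train])
    valid = to_binary(ordered[n_train:n_train + n_valid])
    test = to_binary(ordered[n_train + n_valid:])
    return train, valid, test


# Hypothetical events: ten users, one item, arbitrary timestamps.
events = [(user, 100, 1_400_000_000 + 60 * user) for user in range(10)]
train, valid, test = split_by_time(events)
print(len(train), len(valid), len(test))
```

Splitting on time means the model is always evaluated on events that occur after those it was trained on.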
- Clone the repo: `git clone https://github.com/ajbc/spf.git`
- Navigate to the `spf/src` directory
- Compile with `make`
- Run the executable, e.g.: `./spf --data ~/my-data/ --out my-fit`
| Option | Arguments | Help | Default |
|---|---|---|---|
| help | | print help information | |
| verbose | | print extra information while running | off |
| out | dir | save directory, required | |
| data | dir | data directory, required | |
| svi | | use stochastic VI (instead of batch VI) | off for < 10M ratings in training |
| batch | | use batch VI (instead of SVI) | on for < 10M ratings in training |
| a_theta | a | shape hyperparameter to theta (user preferences) | 0.3 |
| b_theta | b | rate hyperparameter to theta (user preferences) | 0.3 |
| a_beta | a | shape hyperparameter to beta (item attributes) | 0.3 |
| b_beta | b | rate hyperparameter to beta (item attributes) | 0.3 |
| a_tau | a | shape hyperparameter to tau (user influence) | 2 |
| b_tau | b | rate hyperparameter to tau (user influence) | 5 |
| a_delta | a | shape hyperparameter to delta (item bias) | 0.3 |
| b_delta | b | rate hyperparameter to delta (item bias) | 0.3 |
| social-only | | only consider social aspect of factorization (SF) | include factors |
| factor-only | | only consider general factors (no social; PF) | include social |
| bias | | include a bias term for each item | no bias |
| binary | | assume ratings are binary | integer |
| directed | | assume network is directed | undirected |
| seed | seed | the random seed | time |
| save_freq | f | the saving frequency. A negative value means intermediate results are not saved. | 20 |
| eval_freq | f | the intermediate evaluation frequency. A negative value means intermediate results are not evaluated. | -1 |
| conv_freq | f | the convergence check frequency | 10 |
| max_iter | max | the max number of iterations | 300 |
| min_iter | min | the min number of iterations | 30 |
| converge | c | the change in rating log likelihood required for convergence | 1e-6 |
| final_pass | | do a final pass on all users and items | no final pass |
| sample | sample_size | the stochastic sample size | 1000 |
| svi_delay | tau | SVI delay >= 0 to down-weight early samples | 1024 |
| svi_forget | kappa | SVI forgetting rate (0.5,1] | 0.75 |
| K | K | the number of general factors | 100 |
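To give a feel for how `svi_delay` (tau) and `svi_forget` (kappa) interact, here is a minimal sketch of the step-size schedule commonly used in stochastic variational inference, rho_t = (t + tau)^(-kappa). That `spf` uses exactly this form is an assumption; the function name `svi_step_size` is hypothetical, and the defaults are taken from the table above.

```python
def svi_step_size(iteration, delay=1024.0, forget=0.75):
    """Step size rho_t = (t + delay) ** (-forget) for iteration t >= 0.

    A larger delay down-weights early samples; forget must lie in
    (0.5, 1] for the usual convergence conditions to hold.
    """
    if delay < 0:
        raise ValueError("delay must be >= 0")
    if not 0.5 < forget <= 1.0:
        raise ValueError("forget must be in (0.5, 1]")
    return (iteration + delay) ** (-forget)


# Step sizes shrink as iterations progress.
for t in (0, 100, 1000):
    print(t, svi_step_size(t))
```

With the default delay of 1024, even the first step size is small, so no single early sample dominates the global parameters.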
- Download and compile code for comparison models: `cd scripts/; ./setup.sh; cd ..`
- Kick off fits for multiple models with the script (from the `scripts` directory): `./study [data-dir] [output-dir] [K] [directed/undirected]`