duxuhao/China_Cup

China_Cup

For China Cup.
The share of users missing from each file is as follows:

Train

  • overdue:
    original shape: (55596, 2)
    transform shape: (55596, 2)
    unique user number: 55596
    0.0 % users are missing

  • user_info:
    original shape: (55596, 6)
    transform shape: (55596, 6)
    unique user number: 55596
    0.0 % users are missing

  • loan_time:
    original shape: (55596, 2)
    transform shape: (55596, 2)
    unique user number: 55596
    0.0 % users are missing

  • browse_history:
    original shape: (22919547, 4)
    transform shape: (22919547, 4)
    unique user number: 47330
    14.87 % users are missing

  • bill_detail:
    original shape: (2338118, 15)
    transform shape: (2338118, 15)
    unique user number: 53174
    4.36 % users are missing

  • bank_detail:
    original shape: (6070197, 5)
    transform shape: (6070197, 5)
    unique user number: 9294
    83.28 % users are missing

Test

  • usersID:
    original shape: (13899, 1)
    transform shape: (13899, 1)
    unique user number: 13899
    0.0 % users are missing

  • user_info:
    original shape: (13899, 6)
    transform shape: (13899, 6)
    unique user number: 13899
    0.0 % users are missing

  • loan_time:
    original shape: (13899, 2)
    transform shape: (13899, 2)
    unique user number: 13899
    0.0 % users are missing

  • browse_history:
    original shape: (5476055, 4)
    transform shape: (5476055, 4)
    unique user number: 11997
    13.68 % users are missing

  • bill_detail:
    original shape: (414895, 15)
    transform shape: (414895, 15)
    unique user number: 13643
    1.84 % users are missing

  • bank_detail:
    original shape: (376409, 5)
    transform shape: (376409, 5)
    unique user number: 709
    94.9 % users are missing
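
The percentages above are consistent with one minus the ratio of unique users in a file to the total number of users in that split. A minimal sketch of such a report, assuming pandas and a hypothetical `user_id` column name (the repository's actual column names are not shown here):

```python
import pandas as pd


def missing_pct(n_present: int, n_total: int) -> float:
    """Percentage of users that have no record in a file."""
    return round(100.0 * (1 - n_present / n_total), 2)


def coverage_report(table: pd.DataFrame, all_users, id_col: str = "user_id") -> dict:
    """Summarize one file: shape, users present, and share of users missing.

    `id_col` is a hypothetical column name used for illustration.
    """
    known = set(all_users)
    present = len(known & set(table[id_col]))
    return {
        "shape": table.shape,
        "unique_users": present,
        "missing_pct": missing_pct(present, len(known)),
    }


# Toy data: 4 known users, only 3 of them appear in the browse table.
all_users = pd.Series([1, 2, 3, 4], name="user_id")
browse = pd.DataFrame({"user_id": [1, 1, 2, 3, 3, 3],
                       "activity": [5, 7, 5, 2, 2, 9]})
report = coverage_report(browse, all_users)
print(report)
```

Plugging in the counts listed above reproduces the reported figures, for example `missing_pct(47330, 55596)` for the training browse_history file.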

Features of the data

  • sample imbalance
    the sampling rate of each class can be adjusted so that the training set gets close to the online sample balance
  • browse_history
    a user can perform multiple activities at the same time, and the file records each user's activities at different times. Many features can be derived from this file, such as how many times the user browsed and how many times the user took each activity or activity label.
  • bill info
    a customer can have credit cards from different banks; there are 29 banks in total.
  • the activity numbers have no physical meaning; this is an assumption based on the total number of times each activity is taken.
  • the activity label numbers have no physical meaning; this is an assumption based on the total number of times each activity label is taken.
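
One way to adjust the sampling rate mentioned under "sample imbalance" is to downsample the majority class until a chosen positive rate is reached. The sketch below is only an illustration: the `label` column name, the 0/1 encoding, and the target rate are assumptions, and the README does not state which resampling method was actually used.

```python
import pandas as pd


def resample_to_rate(df: pd.DataFrame, label_col: str = "label",
                     target_pos_rate: float = 0.5, seed: int = 0) -> pd.DataFrame:
    """Downsample negatives so that positives make up `target_pos_rate` of rows.

    Assumes a binary 0/1 label; `label_col` is a hypothetical column name.
    """
    pos = df[df[label_col] == 1]
    neg = df[df[label_col] == 0]
    # Negatives needed so that len(pos) / (len(pos) + n_neg) == target_pos_rate.
    n_neg = int(round(len(pos) * (1 - target_pos_rate) / target_pos_rate))
    n_neg = min(n_neg, len(neg))  # cannot draw more negatives than exist
    sampled = pd.concat([pos, neg.sample(n=n_neg, random_state=seed)])
    # Shuffle so the classes are not grouped together.
    return sampled.sample(frac=1.0, random_state=seed).reset_index(drop=True)


# Toy data: 2 positives and 8 negatives, resampled to a 25% positive rate.
toy = pd.DataFrame({"user_id": range(10), "label": [1, 1] + [0] * 8})
balanced = resample_to_rate(toy, target_pos_rate=0.25)
print(balanced["label"].value_counts())
```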

Extract Features (5 + 1 + 1050 + 2551 = 3607)

Basic information (5):

  • gender, occupation, education, marriage, residence

Loan time (1):

  • loan time

Browse activity (1050):

  • the total number of times the customer browses (1)
  • the time difference between the customer's first and last browses (1)
  • the customer's browse frequency (1)
  • total_times/frequency/average/min/max for each single activity label the customer takes (11 different types * 5 = 55)
  • total_times/frequency for each single activity the customer takes (216 different types * 2 = 432)
  • the average/min/max/count of the number of activities the customer takes in one browse (4)
  • the average/min/max/count of the number of activity labels the customer takes in one browse (4)
  • total_times/frequency for each single activity+label combination the customer takes (276 different types * 2 = 552)
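
A few of the browse features listed above can be sketched with pandas group-by operations. The column names (`user_id`, `time`, `activity`, `label`) are hypothetical, and "frequency" is assumed here to mean events per unit of elapsed time; the README does not define it.

```python
import pandas as pd

# Toy browse log; column names are assumptions for illustration.
browse = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2],
    "time":     [100, 100, 160, 50, 50],
    "activity": [7, 9, 7, 7, 3],
    "label":    [1, 2, 1, 1, 4],
})

g = browse.groupby("user_id")
basic = pd.DataFrame({
    "total_browse": g.size(),                        # total number of records
    "time_span": g["time"].max() - g["time"].min(),  # first-to-last difference
})
# Assumed definition of frequency; the +1 avoids division by zero.
basic["browse_freq"] = basic["total_browse"] / (basic["time_span"] + 1)

# Total times each activity is taken, one column per activity id.
per_activity = pd.crosstab(browse["user_id"], browse["activity"]).add_prefix("activity_")

# Number of activities in one browse (same user, same timestamp),
# then average/min/max/count per user.
per_browse = browse.groupby(["user_id", "time"]).size()
per_browse_stats = (per_browse.groupby(level="user_id")
                    .agg(["mean", "min", "max", "count"])
                    .add_prefix("acts_per_browse_"))

features = basic.join(per_activity).join(per_browse_stats)
print(features)
```

The per-label and activity+label counts would follow the same `crosstab` pattern on the `label` column or on a combined key.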

Bill information (2551)

  • the total number of bills the customer has (1)
  • the average/min/max/sum/cov/std of each bill detail over all of the customer's bills (14 * 6 = 84)
  • the number of credit cards the customer has (1)
  • the total number of bills the customer has at each bank (29)
  • the average/min/max/sum/cov/std of each bill detail for the customer, computed per bank (14 * 6 * 29 = 2436)
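
The bill aggregations above can be sketched as a per-user group-by followed by a per-user, per-bank group-by that is pivoted into wide columns. The column names below are hypothetical, only two of the 14 detail columns are shown, the credit card count is approximated by the number of distinct banks (an assumption), and the "cov" statistic is left out because its exact definition is not stated.

```python
import pandas as pd

# Toy bill table; column names are assumptions for illustration.
bills = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "bank_id": [10, 10, 11, 10],
    "last_bill_amount": [100.0, 300.0, 50.0, 80.0],
    "credit_limit": [1000.0, 1000.0, 500.0, 2000.0],
})
detail_cols = ["last_bill_amount", "credit_limit"]
stats = ["mean", "min", "max", "sum", "std"]


def flatten(df: pd.DataFrame) -> pd.DataFrame:
    """Join MultiIndex column levels into single names like 'credit_limit_max'."""
    out = df.copy()
    out.columns = ["_".join(str(part) for part in col) for col in out.columns]
    return out


# Statistics of each detail column over all of a user's bills.
overall = flatten(bills.groupby("user_id")[detail_cols].agg(stats))
overall["n_bills"] = bills.groupby("user_id").size()
overall["n_banks"] = bills.groupby("user_id")["bank_id"].nunique()

# Number of bills per bank, one column per bank id.
bills_per_bank = pd.crosstab(bills["user_id"], bills["bank_id"]).add_prefix("n_bills_bank_")

# The same statistics computed per bank, then spread into one column per bank.
per_bank = flatten(
    bills.groupby(["user_id", "bank_id"])[detail_cols].agg(stats).unstack("bank_id")
)

features = overall.join(bills_per_bank).join(per_bank)
print(features.T)
```

Users with no bills at a given bank get NaN in that bank's columns, which is one reason the per-bank block grows to thousands of sparse features.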

Bank detail

  • too much information is missing (most users have no bank_detail records); to be considered later
