
r2rahul/fama_industry


Tabular (Tidy) Data of the Fama-French 49-Industry to SIC Classification

If you are looking for the Fama-French 49-industry to SIC mapping in tabular format, the file SIC-Fama Data is ready to use. If you need further customization or want to work with other Fama-French industry files, the code can serve as an example for that use case. In particular, the regular expressions in the code should be useful for adapting the analysis to other Fama-French industry files. The Code section contains further details.

Introduction

The code converts the Fama-French 49-industry classification into tidy data, so that it can be integrated into an analysis pipeline following the tidy data principles [1]. The main output of the code is the file SIC-Fama Data.

Data Dictionary

The table below provides the data dictionary for the output file SIC-Fama Data.

Column           Description
sic_code         4-digit SIC code
sic_group_name   SIC description
fama_sic_start   Start of the Fama SIC range (4-digit code)
fama_sic_end     End of the Fama SIC range (4-digit code)
sic_fama_desc    SIC industry name as given in the Fama data
fama_ind49       Fama industry code (1 to 49)
fama_grpdesc     Fama industry description
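The columns fama_sic_start and fama_sic_end suggest that each Fama SIC range is expanded into one row per 4-digit sic_code. The base-R sketch below illustrates that expansion under this assumption. The input values and the helper expand_range() are hypothetical, storing codes as zero-padded strings is an assumption, and the repository's own tidyverse code may do this differently.

```r
# Hypothetical Fama ranges (illustrative values, not taken from the real file)
fama_ranges <- data.frame(
  fama_ind49     = c(1L, 1L),
  fama_sic_start = c(100L, 200L),
  fama_sic_end   = c(103L, 201L)
)

# Hypothetical helper: one data frame per range, one row per SIC code in it
expand_range <- function(ind, start, end) {
  data.frame(
    sic_code       = seq(start, end),
    fama_sic_start = start,
    fama_sic_end   = end,
    fama_ind49     = ind
  )
}

# Map() applies expand_range() to each range (no for loop);
# do.call(rbind, ...) stacks the per-range data frames
sic_long <- do.call(rbind, Map(
  expand_range,
  fama_ranges$fama_ind49,
  fama_ranges$fama_sic_start,
  fama_ranges$fama_sic_end
))

# Assumption: codes are stored as zero-padded 4-digit strings, e.g. 100 -> "0100"
sic_long$sic_code <- sprintf("%04d", sic_long$sic_code)
print(sic_long)
```

Map() keeps the expansion in the functional style the Code section describes, and the range columns are carried along so every expanded row can still be traced back to its Fama range.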

Code

The code can serve as a reference for tidying other industry data, such as the Fama-French 10-industry classification, and should reduce the effort involved. The code follows a functional programming style, so it uses no for or while loops. It also uses the targets package to execute the task pipeline [2].

The code below checks for missing libraries and installs them:

list_packages <- c("targets", "tarchetypes", "tidyverse", "glue",
  "readr", "stringr", "ggthemes", "here", "qs", "rvest")
# Packages from list_packages that are not installed yet
missing_packages <- setdiff(list_packages, rownames(installed.packages()))
if (length(missing_packages) > 0) {
  install.packages(missing_packages)
}

Then run the pipeline from the R console:

library(targets)
tar_make()
# or run the project's script
source("run.R")

The two most important code files are 02_getdata.R and 03_prepdata.R:

  • 02_getdata.R contains code to pull information from URLs, including the Fama and French website.

  • 03_prepdata.R contains code to extract information from the Fama-French text file and convert the data into tabular (tidy) format.

The code in these two files can serve as a template for tidying other Fama-French industry text files.
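The regular-expression approach used for extracting information from the text file can be sketched in base R as follows. This is a minimal sketch, assuming the Fama-French file lists each industry on a header line (number, short name, description) followed by indented SIC-range lines of the form start-end description. The sample lines, the patterns header_re and range_re, the helper captures(), and the mapping of captured groups onto the data-dictionary columns are illustrative assumptions, not the code in 03_prepdata.R.

```r
# Illustrative lines in the layout assumed for the Fama-French SIC file
fama_lines <- c(
  " 1 Agric  Agriculture",
  "          0100-0199 Agricultural production - crops",
  "          0200-0299 Agricultural production - livestock",
  " 2 Food   Food Products",
  "          2000-2009 Food and kindred products"
)

# Hypothetical patterns: header = number, short name, description;
# range = two 4-digit SIC codes joined by "-", then a description
header_re <- "^\\s*(\\d+)\\s+(\\S+)\\s+(.+)$"
range_re  <- "^\\s*(\\d{4})-(\\d{4})\\s*(.*)$"

is_header <- grepl(header_re, fama_lines, perl = TRUE)
is_range  <- grepl(range_re, fama_lines, perl = TRUE)

# cumsum() labels every line with the number of headers seen so far,
# i.e. the industry block it belongs to (no explicit loop needed)
block_id <- cumsum(is_header)

# Hypothetical helper: capture group i (element 1 is the full match)
captures <- function(lines, pattern, i) {
  vapply(regmatches(lines, regexec(pattern, lines, perl = TRUE)),
         function(m) m[i], character(1))
}

hdr_lines <- fama_lines[is_header]
industries <- data.frame(
  block_id     = block_id[is_header],
  fama_ind49   = as.integer(captures(hdr_lines, header_re, 2)),
  fama_grpdesc = captures(hdr_lines, header_re, 4)
)

rng_lines <- fama_lines[is_range]
ranges <- data.frame(
  block_id       = block_id[is_range],
  fama_sic_start = captures(rng_lines, range_re, 2),
  fama_sic_end   = captures(rng_lines, range_re, 3),
  sic_fama_desc  = captures(rng_lines, range_re, 4)
)

# Join each SIC range to its industry header, then drop the helper key
fama_tidy <- merge(industries, ranges, by = "block_id")
fama_tidy$block_id <- NULL
print(fama_tidy)
```

Tagging lines with cumsum() over the header flags carries each industry's number and description down to the range lines beneath it, which is what lets the whole file be reshaped without a loop. Adapting the sketch to another Fama-French file would mainly mean adjusting the two patterns.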

The SIC classification is based on the following public information:

References

  1. Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10
  2. Landau, W. M. (2021). The targets R package: a dynamic Make-like function-oriented pipeline toolkit for reproducibility and high-performance computing. Journal of Open Source Software, 6(57), 2959. https://doi.org/10.21105/joss.02959

About

Convert Fama-French 49 Industry Classification text file into Tabular Format
