If you are looking for the Fama 49 industry classification to SIC mapping in tabular format, the file SIC-Fama Data is ready to use. If you need further customization, or want to process other Fama industry files, the code may serve as an example. In particular, the regular expressions used in the code should be useful when adapting the analysis to other Fama industry files. The code section contains further details.
The code converts the Fama and French 49 industry classification into tidy data, so that it can be integrated into an analysis pipeline that follows tidy data principles [1]. The main output of the code is the file SIC-Fama Data.
The table below provides the data dictionary for the output file SIC-Fama Data:

| Column | Description |
|---|---|
| sic_code | Four-digit SIC code |
| sic_group_name | SIC description |
| fama_sic_start | Start of the Fama SIC range (four-digit code) |
| fama_sic_end | End of the Fama SIC range (four-digit code) |
| sic_fama_desc | SIC industry name as given in the Fama data |
| fama_ind49 | Fama industry code |
| fama_grpdesc | Fama industry description |
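
As a rough illustration of how the output file might be used, the sketch below attaches the Fama industry code to a table of firms by matching on `sic_code`. The toy rows, the `firms` table, and the choice to store SIC codes as zero-padded strings are assumptions made for this example; check the actual file for the real values and column types.

```r
# Toy stand-in for the SIC-Fama Data file; values are illustrative only.
# Assumption: SIC codes are stored as zero-padded four-character strings.
sic_fama <- data.frame(
  sic_code     = c("0111", "2011"),
  fama_ind49   = c(1L, 2L),
  fama_grpdesc = c("Agriculture", "Food Products"),
  stringsAsFactors = FALSE
)

# Hypothetical firm table that carries a SIC code per firm
firms <- data.frame(
  firm     = c("A", "B", "C"),
  sic_code = c("2011", "0111", "9999"),
  stringsAsFactors = FALSE
)

# match() returns the row position in sic_fama for each firm's SIC code,
# or NA when the code is not present in the mapping
idx <- match(firms$sic_code, sic_fama$sic_code)
firms$fama_ind49   <- sic_fama$fama_ind49[idx]
firms$fama_grpdesc <- sic_fama$fama_grpdesc[idx]

print(firms)
```

Using `match()` keeps the firm table in its original order and leaves unmatched SIC codes as `NA`, which makes gaps in the mapping easy to spot.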
The code can also serve as a reference for tidying other industry data, such as the Fama 10 industry classification, and should reduce the effort involved. The code follows a functional style, so `while` and `for` loops are not used. It also uses the targets package to execute the task pipeline [2].
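
For readers new to targets, the following is a minimal sketch of what a `_targets.R` pipeline definition can look like. It is not the pipeline from this repository: the file path, the target names, and the two helper functions are hypothetical and only illustrate the general shape.

```r
# _targets.R: a hypothetical, minimal pipeline definition
library(targets)

# Hypothetical helpers, written as plain functions (no loops)
read_fama_lines  <- function(path) readLines(path)
drop_blank_lines <- function(lines) lines[nzchar(trimws(lines))]

list(
  # format = "file" makes targets track the file's contents, so downstream
  # targets rerun only when the file changes (the path is a placeholder)
  tar_target(fama_file, "data/fama49.txt", format = "file"),
  tar_target(fama_lines, read_fama_lines(fama_file)),
  tar_target(fama_clean, drop_blank_lines(fama_lines))
)
```

With a file like this in the project root, `targets::tar_make()` builds the targets in dependency order, and `targets::tar_read(fama_clean)` returns a stored result.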
The code below checks for missing packages and installs them:

    # Packages required by the pipeline
    list_packages <- c("targets", "tarchetypes", "tidyverse", "glue",
                       "readr", "stringr", "ggthemes", "here", "qs", "rvest")
    # Install only the packages that are not already present
    missing_packages <- setdiff(list_packages, rownames(installed.packages()))
    if (length(missing_packages) > 0) {
      install.packages(missing_packages)
    }

After that, run the pipeline from the R console:

    targets::tar_make()
    # or
    source("run.R")

The two most important code files are `02_getdata.R` and `03_prepdata.R`:

- `02_getdata.R` contains code to pull information from URLs and the Fama and French website.
- `03_prepdata.R` contains code to extract information from the Fama text file and convert the data into a tabular (tidy) format.

The code in these two files can serve as a template for tidying other Fama and French industry text files.
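
To give a flavour of the regular-expression approach, here is a minimal base R sketch that turns a few lines of an industry definition text into a table. It is not the code from `03_prepdata.R`. The sample lines, the assumed layout (a header line with code, short name and description, followed by indented `start-end description` lines), and the names `header_rx`, `range_rx` and `parse_fama_lines` are all assumptions made for this example.

```r
# Sample input in the assumed layout; not read from the real Fama file
fama_lines <- c(
  " 1 Agric  Agriculture",
  "          0100-0199 Agricultural production - crops",
  "          0200-0299 Agricultural production - livestock",
  "",
  " 2 Food   Food Products",
  "          2000-2009 Food and kindred products"
)

# Header line: industry code, short name, long description
header_rx <- "^\\s*(\\d+)\\s+(\\S+)\\s+(.*)$"
# Range line: four-digit start, four-digit end, optional description
range_rx  <- "^\\s*(\\d{4})-(\\d{4})\\s*(.*)$"

parse_fama_lines <- function(lines) {
  is_range  <- grepl(range_rx, lines)
  is_header <- grepl(header_rx, lines) & !is_range

  # One matrix row per matching line: full match, then the capture groups
  headers <- do.call(rbind, regmatches(lines[is_header],
                                       regexec(header_rx, lines[is_header])))
  ranges  <- do.call(rbind, regmatches(lines[is_range],
                                       regexec(range_rx, lines[is_range])))

  # For each range line, the position of the most recent header above it
  # (assumes every range line is preceded by a header)
  owner <- cumsum(is_header)[is_range]

  data.frame(
    fama_sic_start = ranges[, 2],
    fama_sic_end   = ranges[, 3],
    sic_fama_desc  = trimws(ranges[, 4]),
    fama_ind49     = as.integer(headers[owner, 2]),
    fama_grpdesc   = trimws(headers[owner, 4]),
    stringsAsFactors = FALSE
  )
}

fama_tidy <- parse_fama_lines(fama_lines)
print(fama_tidy)
```

The `cumsum()` step carries each header down to the range lines beneath it without an explicit loop. Adapting the sketch to another industry file would mainly mean adjusting the two patterns.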
The SIC classification is based on the following public sources:
- OSHA SIC Reference. I used web-scraped data from the GitHub project OSHA SIC Manual Scrapper.
- SEC SIC list
- Wikipedia SIC list

References:
- [1] Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10
- [2] Landau, W. M. (2021). The targets R package: a dynamic Make-like function-oriented pipeline toolkit for reproducibility and high-performance computing. Journal of Open Source Software, 6(57), 2959. https://doi.org/10.21105/joss.02959