Description
I often receive data that contain identical or effectively identical columns.
As an example, I may have one column giving a human-readable medical laboratory test name (like "Cholesterol, LDL") and another column giving a more computer-friendly standardized test identifier (a LOINC code, such as 12345). It would be helpful to have a reporter function that indicates which columns map one-to-one to each other. The interface could look like:
find_mapping_cols(x)
where it would return a list with elements that are character vectors of columns that map to each other:
foo <- data.frame(
  Lab_Test_Long=c("Cholesterol, LDL", "Cholesterol, LDL", "Glucose"),
  Lab_Test_Short=c("CLDL", "CLDL", "GLUC"),
  LOINC=c(12345, 12345, 54321),
  Person=c("Sam", "Bill", "Sam"),
  stringsAsFactors=FALSE
)
find_mapping_cols(foo)
That would return a list that looks like:
list(c("Lab_Test_Long", "Lab_Test_Short", "LOINC"), "Person")
It's up to the user what they would want to do with that list (because which column of the set is most useful to keep is not obvious).
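To make the request concrete, here is a minimal sketch of how such a function might work. It is not an existing implementation; `find_mapping_cols` is only the name proposed above, and the approach is an assumption: two columns are treated as mapping to each other when their values are in one-to-one correspondence, which is checked by recoding every column to integers in order of first appearance and grouping columns whose codes match exactly. `NA` is treated as an ordinary value.

```r
# Hypothetical sketch of the proposed find_mapping_cols(); base R only.
find_mapping_cols <- function(x) {
  stopifnot(is.data.frame(x))
  # Recode each column by order of first appearance, e.g.
  # c("a", "a", "b") and c(10, 10, 20) both become c(1, 1, 2).
  codes <- lapply(x, function(col) match(col, unique(col)))
  # Collapse each code vector into a single string key for grouping.
  keys <- vapply(codes, paste, character(1), collapse = ",", USE.NAMES = FALSE)
  # Group column names by key, keeping the order in which keys first appear.
  groups <- split(names(x), factor(keys, levels = unique(keys)))
  unname(groups)
}
```

Columns that end up in a group of length one (like `Person` in the example) do not map to any other column. Building a string key per column costs memory on very long data; a real implementation might hash the code vectors or compare them pairwise instead.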