1.4 Loading a Delimited Text Data File

1.4.1 Problem

You want to load data from a delimited text file.

1.4.2 Solution

The most common delimited format is comma-separated values (CSV). To load a CSV file, use read.csv():

data <- read.csv("datafile.csv")
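To try read.csv() without an existing datafile.csv, you can first write a small file to a temporary location. This is a minimal sketch. The file contents and the variable name csv_file are made up for illustration.

```r
# Write a small CSV file to a temporary location so the example is self-contained
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "x,y",
  "1,a",
  "2,b",
  "3,c"
), csv_file)

# By default (header = TRUE), the first row supplies the column names
data <- read.csv(csv_file)
str(data)
```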

Alternatively, you can use the read_csv() function (note the underscore instead of the period) from the readr package. This function is significantly faster than read.csv(), and it returns a tibble, a modern variant of the data frame, rather than a plain data frame.
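A minimal sketch of the readr route, assuming the readr package is installed. The show_col_types argument, which silences the column-type message read_csv() otherwise prints, assumes readr 2.0 or later; the temporary file and its contents are illustrative only.

```r
# Assumes readr is installed, e.g. via install.packages("readr")
library(readr)

# Illustrative data written to a temporary file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b"), csv_file)

# show_col_types = FALSE suppresses the column specification message (readr >= 2.0)
data <- read_csv(csv_file, show_col_types = FALSE)

# The result is a tibble, which is also a data frame
class(data)
```

Because a tibble inherits from data.frame, most code written for read.csv() results keeps working; string columns come back as character vectors, never as factors.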

1.4.3 Discussion

Since data files have many different formats, there are many options for loading them. For example, if the data file does not have headers in the first row:

data <- read.csv("datafile.csv", header = FALSE)

The resulting data frame will have columns named V1, V2, and so on, and you will probably want to rename them manually:

# Manually assign the header names
names(data) <- c("Column1", "Column2", "Column3")
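If you already know the column names, you can also supply them while loading with the col.names argument, which read.csv() passes through to read.table(). A minimal sketch using a made-up headerless file:

```r
# A headerless file: every line is data
csv_file <- tempfile(fileext = ".csv")
writeLines(c("1,a,TRUE", "2,b,FALSE"), csv_file)

# Without names supplied, columns are called V1, V2, V3
default_names <- names(read.csv(csv_file, header = FALSE))

# col.names assigns the names at load time instead of afterward
data <- read.csv(csv_file, header = FALSE,
                 col.names = c("Column1", "Column2", "Column3"))
names(data)
```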

You can set the delimiter with the sep argument. For a space-delimited file, use sep = " "; for a tab-delimited file, use sep = "\t", as in:

data <- read.csv("datafile.csv", sep = "\t")
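Base R also provides read.delim(), a wrapper whose defaults are header = TRUE and sep = "\t". The sketch below, using an illustrative temporary file, checks that it gives the same result as read.csv() with sep = "\t". Note that a value containing a space stays in one field because only the tab acts as a delimiter.

```r
# "\t" inside an R string is a tab character
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("x\ty", "1\tred apple", "2\tpear"), tsv_file)

data_csv <- read.csv(tsv_file, sep = "\t")
data_delim <- read.delim(tsv_file)

# Both calls end up in read.table() with the same settings
identical(data_csv, data_delim)
```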

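In versions of R before 4.0.0, read.csv() treats strings in the data as factors by default; starting with R 4.0.0, the default is stringsAsFactors = FALSE. Suppose this is your data file, and you read it in using read.csv() in an older version of R: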

"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21

The resulting data frame will store First and Last as factors, though in this case it makes more sense to treat them as strings (character vectors, in R terminology). To keep them as strings, use stringsAsFactors = FALSE, which works in any version of R. If any columns should be treated as factors, you can then convert them individually:

data <- read.csv("datafile.csv", stringsAsFactors = FALSE)

# Convert to factor
data$Sex <- factor(data$Sex)
str(data)
#> 'data.frame': 3 obs. of 4 variables:
#> $ First : chr "Currer" "Dr." ""
#> $ Last : chr "Bell" "Seuss" "Student"
#> $ Sex : Factor w/ 2 levels "F","M": 1 2 NA
#> $ Number: int 2 49 21

Alternatively, you can load the file with stringsAsFactors = TRUE and then convert individual columns from factors back to character vectors with as.character().
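A minimal sketch of that alternative, writing the example data file from above to a temporary location so the code runs on its own. Only First and Last are converted; Sex stays a factor.

```r
# Recreate the example data file in a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  '"First","Last","Sex","Number"',
  '"Currer","Bell","F",2',
  '"Dr.","Seuss","M",49',
  '"","Student",NA,21'
), csv_file)

# Load every string column as a factor
data <- read.csv(csv_file, stringsAsFactors = TRUE)

# Convert the name columns back to character vectors
data$First <- as.character(data$First)
data$Last <- as.character(data$Last)
str(data)
```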

1.4.4 See Also

read.csv() is a convenience wrapper function around read.table(). If you need more control over the input, see ?read.table.
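To see what the wrapper does, the sketch below reads an illustrative file both ways. The read.table() arguments shown are the CSV-oriented defaults that read.csv() passes along; changing any of them gives you the extra control mentioned above.

```r
csv_file <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b"), csv_file)

via_wrapper <- read.csv(csv_file)

# The same call spelled out with read.table() and read.csv()'s defaults
via_table <- read.table(csv_file, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

identical(via_wrapper, via_table)
```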