28 Mar 2012

Applying Same Changes to Multiple Dataframes

How to apply the same changes to several dataframes and
save them to CSV:

# a dataframe
a <- data.frame(x = 1:3, y = 4:6)

# make a list of several dataframes, then apply function (change column names, e.g.):
my.list <- list(a, a)
my.list <- lapply(my.list, function(x) {names(x) <- c("a", "b") ; return(x)})

# save dfs to csv with similar lapply-call:
n <- 1:length(my.list)
lapply(n, function(ni) {
               write.table(file = paste(ni, ".csv", sep = ""), 
               my.list[ni], sep = ";", row.names = F)
               }
       )

Edit:

I'll extend this to a script that reads several files from a directory, applies changes to the files in the same fashion and finally saves files back to the directory (as HERE)

# clean up
rm(list = ls())
setwd(tempdir())
unlink(dir(tempdir()))

# create some files in tempdir:
a <- data.frame(x = 1:3, y = 4:6)
b <- data.frame(x = 10:13, y = 14:15)
write.csv(a, "file1.csv", row.names = F)
write.csv(b, "file2.csv", row.names = F)

# now read all files to list:
mycsv = dir(pattern=".csv")

n <- length(mycsv)
mylist <- vector("list", n)

for(i in 1:n) mylist[[i]] <- read.csv(mycsv[i])

# now change something in all dfs in list:
mylist <- lapply(mylist, function(x) {names(x) <- c("a", "b") ; return(x)})

# then save back dfs:# then save back dfs:
for(i in 1:n) {
   write.csv(file = sub(".csv", "_new.csv", mycsv[i]),
             mylist[i], row.names = F)
}

5 comments :

  1. The second example is much more complex than the same code without lapply (and the funny construction of the file path). You can write it more simply as
    for (ni in 1:length(my.list))
    write.table(file = paste(ni, ".csv", sep = ""),
    my.list[ni], sep = ";", row.names = F)

    ReplyDelete
    Replies
    1. You're right - the c(1:length(my.list))[ni] is stupid! (I've changed that)
      When it comes to lapply, I suppose it will be faster than a for loop (?).

      Delete
    2. Even if lapply() is faster, you will never notice any difference when the body of the loop is writing a file.

      Delete
  2. Hi, this is a long time after the original post but it is the most understandable example of looping through .csv files and writing them back out again I have seen. I am wanting to extend this to retain the original .csv file name when I write back out with new text on the end.

    For example I have .csv files stored in a large directory all with unique names but they have components that are standard: wgtn_01_01.csv denotes place followed by numbers that identify an individual and then an event attached to that individual.

    When I read all of these in and write the .csv files back out I would like to retain the wgtn_01_01 but add on the end of it text in the order of:
    write.csv(file = paste("original_name","new_text",".csv", sep = "")

    Any help you can provide would be much appreciated and save me a lot of time!

    ReplyDelete
    Replies
    1. I changed the original code - now it is naming the new files in a manner as you proposed;)

      Delete