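I am trying to read a csv file into R. The problem is that the file has 2 delimiters, and I don't know how to read it as a 3-column data frame, i.e. first, second and year. Here is a sample of what the file looks like: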
[Alin Deutsch, Mary F. Fernandez, 1998],
[Alin Deutsch, Daniela Florescu, 1998],
I have tried the fread() function with sep="[" and sep2=",", but it does not work; R just reads each row into a single column. Thanks.
You can read the file with sep="," and then remove the extra brackets:
df <- read.csv(file = textConnection("[Alin Deutsch, Mary F. Fernandez, 1998],
[Alin Deutsch, Daniela Florescu, 1998],"), stringsAsFactors = FALSE, header = FALSE)
df <- df[, -4]                    # the trailing comma creates an empty 4th column
df$V1 <- gsub("\\[", "", df$V1)   # drop the opening bracket
df$V3 <- gsub("\\]", "", df$V3)   # drop the closing bracket
names(df) <- c("first", "second", "year")
df
This yields:
         first             second  year
1 Alin Deutsch  Mary F. Fernandez  1998
2 Alin Deutsch   Daniela Florescu  1998
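1) read.table/sub. Read the file using sep = "," and comment.char = "]". This splits the fields and discards the trailing ] together with everything after it. We can then remove the [ from the first column directly using sub: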
Lines <- "[Alin Deutsch, Mary F. Fernandez, 1998],
[Alin Deutsch, Daniela Florescu, 1998],"

DF <- read.table(text = Lines, sep = ",", comment.char = "]", as.is = TRUE,
  strip.white = TRUE, # might not need this one
  col.names = c("Name1", "Name2", "Year"))
DF <- transform(DF, Name1 = sub("[", "", Name1, fixed = TRUE))
giving:
> DF
         Name1             Name2 Year
1 Alin Deutsch Mary F. Fernandez 1998
2 Alin Deutsch  Daniela Florescu 1998
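2) read.pattern. Another possibility is to use read.pattern from the gsubfn package. The pattern assumes that each line starts with [, contains three commas, and has a ] immediately before the last comma. This matches the sample in the question; if the real file differs, the regular expression will need to be changed.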
library(gsubfn)

read.pattern(text = Lines, pattern = ".(.*?),(.*?),(.*?).,", as.is = TRUE,
  strip.white = TRUE, # might not need this one
  col.names = c("Name1", "Name2", "Year"))
giving the same result.
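To see what the three capture groups in that pattern pick out, here is a minimal base R sketch that applies the same regular expression to a single sample line. The variable names (line, pattern, m, fields) are only illustrative, and regexec/regmatches stand in here for what read.pattern does internally, which is an assumption rather than a description of the package's implementation.

```r
# One sample line in the format shown in the question
line <- "[Alin Deutsch, Mary F. Fernandez, 1998],"

# Same pattern as above: skip "[", capture three lazily matched fields,
# then skip "]" and the trailing comma
pattern <- ".(.*?),(.*?),(.*?).,"

# regexec returns the full match followed by the capture groups
m <- regmatches(line, regexec(pattern, line, perl = TRUE))[[1]]

# Drop the full match and trim the leading spaces left after each comma
fields <- trimws(m[-1])
fields
```

If a line does not fit the assumed shape, m comes back empty, which is a quick way to spot rows that would need a different regular expression.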