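I have a CSV file in which each row defines a room in a given building. Along with the room, each row has a floor field. What I want to extract is all floors in all buildings.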
My file looks like this...
"u_floor","u_room","name"
0,"00BDF","AIRPORT TEST "
0,0,"BRICKER HALL, JOHN W "
0,3,"BRICKER HALL, JOHN W "
0,5,"BRICKER HALL, JOHN W "
0,6,"BRICKER HALL, JOHN W "
0,7,"BRICKER HALL, JOHN W "
0,8,"BRICKER HALL, JOHN W "
0,9,"BRICKER HALL, JOHN W "
0,19,"BRICKER HALL, JOHN W "
0,20,"BRICKER HALL, JOHN W "
0,21,"BRICKER HALL, JOHN W "
0,25,"BRICKER HALL, JOHN W "
0,27,"BRICKER HALL, JOHN W "
0,29,"BRICKER HALL, JOHN W "
0,35,"BRICKER HALL, JOHN W "
0,45,"BRICKER HALL, JOHN W "
0,59,"BRICKER HALL, JOHN W "
0,60,"BRICKER HALL, JOHN W "
0,61,"BRICKER HALL, JOHN W "
0,63,"BRICKER HALL, JOHN W "
0,"0006M","BRICKER HALL, JOHN W "
0,"0008A","BRICKER HALL, JOHN W "
0,"0008B","BRICKER HALL, JOHN W "
0,"0008C","BRICKER HALL, JOHN W "
0,"0008D","BRICKER HALL, JOHN W "
0,"0008E","BRICKER HALL, JOHN W "
0,"0008F","BRICKER HALL, JOHN W "
0,"0008G","BRICKER HALL, JOHN W "
0,"0008H","BRICKER HALL, JOHN W "
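What I want is all the floors in all the buildings.
I'm using cat, awk, sort and uniq to build this list, but I'm running into a problem with the "," inside the building name field, such as "BRICKER HALL, JOHN W ", and it is throwing off my entire CSV generation.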
cat Buildings.csv | awk -F, '{print $1","$2}' | sort | uniq > Floors.csv
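How can I get awk to split on the comma but ignore commas between the quotes of a field? Alternatively, does someone have a better solution?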
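To make the problem concrete, here is a small sketch of what a plain comma split does to one of the sample rows. The helper names `count_fields` and `third_field` are made up for this illustration and are not part of the original command.

```shell
# Count the fields that a plain comma split produces for one CSV line.
count_fields() {
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# Show the third field as seen by the same naive split.
third_field() {
    printf '%s\n' "$1" | awk -F, '{ print $3 }'
}

line='0,3,"BRICKER HALL, JOHN W "'
count_fields "$line"   # prints 4, although the row has only 3 columns
third_field "$line"    # prints "BRICKER HALL (the name is cut at its comma)
```

Because the quoted name is cut in two, any field after the first comma inside quotes ends up shifted by one position.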
Based on the answer that pointed me to an awk CSV parser, I was able to get a solution:
cat Buildings.csv | awk -f csv.awk | awk -F" -> 2|" '{print $2}' | awk -F"|" '{print $2","$3}' | sort | uniq > floors.csv
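We run the csv awk program first, and from there I split on " -> 2|", which is based on the output format of the csv awk program. Printing $2 prints only the CSV-parsed content, because the program prints the original line followed by " -> #", where # is the number of fields parsed from the CSV (i.e. the columns). From there I can split this awk CSV result on "|", which has replaced the commas. Then sort, uniq, pipe the output to a file, and done!
Thanks for the help.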
gawk -vFPAT='[^,]*|"[^"]*"' '{print $1 "," $3}' | sort | uniq
This uses a great GNU Awk 4 extension, FPAT, which lets you define a field pattern instead of a field-separator pattern. It does wonders for CSV. (docs)
ETA (thanks mitchus): To remove the surrounding quotes, use gsub("^\"|\"$","",$3); if there are more fields than just $3 to process that way, just loop through them.
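Note that this simple approach is not tolerant of malformed input, nor of some possible special characters between quotes; covering all of those would go beyond the scope of a neat one-liner.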
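Putting the FPAT pattern and the quote-stripping loop together might look like the sketch below. It assumes gawk 4 or later is installed; the function name `unique_floors`, the header skip, the trimming of trailing spaces, and the `;` output separator (chosen so that the comma inside the name stays unambiguous) are additions for illustration and not part of the one-liner above.

```shell
# Sketch: unique (floor, name) pairs with the surrounding quotes removed.
# Requires gawk 4+ because FPAT is a GNU Awk extension.
unique_floors() {
    gawk -v FPAT='[^,]*|"[^"]*"' -v OFS=';' '
        NR > 1 {                        # skip the header row
            for (i = 1; i <= NF; i++) {
                gsub(/^"|"$/, "", $i)   # drop the surrounding quotes
                gsub(/ +$/, "", $i)     # drop trailing padding (an extra step)
            }
            print $1, $3                # floor and name, joined by OFS
        }' "$1" | sort -u
}
```

It would be called as `unique_floors Buildings.csv > floors.csv`. The same caveats apply: escaped quotes or line breaks inside a quoted field are not handled.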
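The extra output you're getting from csv.awk comes from its demo code. You are meant to use the functions in the script to do the parsing and then produce the output you want yourself.
At the end of csv.awk is the { ... } block that demonstrates one of the functions. That is the code that outputs the -> 2|.
Instead of most of that, just call the parsing function and print csv[1], csv[2].
That part of the code would then look like this: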
{
    num_fields = parse_csv($0, csv, ",", "\"", "\"", "\\n", 1);
    if (num_fields < 0) {
        printf "ERROR: %s (%d) -> %s\n", csverr, num_fields, $0;
    } else {
#        printf "%s -> ", $0;
#        printf "%s", num_fields;
#        for (i = 0;i < num_fields;i++) {
#            printf "|%s", csv[i];
#        }
#        printf "|\n";
        print csv[1], csv[2]
    }
}
Save it as your_script (for example).
Run chmod +x your_script.
Also, the cat is unnecessary, and you can use sort -u instead of sort | uniq.
Your command would then look like this:
./your_script Buildings.csv | sort -u > floors.csv
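As a quick check of the sort -u suggestion, the sketch below runs both forms on a tiny made-up sample. The helper names `dedupe_two_steps` and `dedupe_one_step` and the sample values are hypothetical.

```shell
# Both helpers read lines on stdin and print the sorted, de-duplicated result.
dedupe_two_steps() { sort | uniq; }
dedupe_one_step()  { sort -u; }

# Made-up sample with one duplicate line.
sample='0,5
0,3
0,5'

printf '%s\n' "$sample" | dedupe_two_steps
printf '%s\n' "$sample" | dedupe_one_step
```

Both calls print the same two lines, `0,3` and `0,5`, so the single-process form is a drop-in replacement here.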
My workaround is to strip the commas out of the quoted fields in the CSV using:
decommaize () {
    cat "$1" \
        | sed 's/"[^"]*"/"((&))"/g' \
        | sed 's/\(\"((\"\)\([^",]*\)\(,\)\([^",]*\)\(\"))\"\)/"\2\4"/g' \
        | sed 's/"(("/"/g' \
        | sed 's/"))"/"/g' > "$2"
}
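That is, first wrap every quoted string so that its opening quote becomes "((" and its closing quote becomes "))", then replace "(("whatever,whatever"))" with "whateverwhatever", then change all remaining instances of "((" and "))" back to ". Note that the second expression only matches a quoted field that contains exactly one comma.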
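A short run on two of the sample rows shows the effect. This is only a sketch: the function is repeated so the snippet runs on its own, and the temporary files and the final awk call are additions for illustration. It was reasoned through with GNU sed in mind; other sed implementations may treat the `\"` escapes differently.

```shell
# Same pipeline as the function above, repeated so this sketch is self-contained.
decommaize () {
    cat "$1" \
        | sed 's/"[^"]*"/"((&))"/g' \
        | sed 's/\(\"((\"\)\([^",]*\)\(,\)\([^",]*\)\(\"))\"\)/"\2\4"/g' \
        | sed 's/"(("/"/g' \
        | sed 's/"))"/"/g' > "$2"
}

# Hypothetical two-row sample written to a temporary file.
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' '0,"00BDF","AIRPORT TEST "' '0,3,"BRICKER HALL, JOHN W "' > "$src"

decommaize "$src" "$dst"
cat "$dst"
# 0,"00BDF","AIRPORT TEST "
# 0,3,"BRICKER HALL JOHN W "

# With the embedded comma gone, a plain comma split sees three fields per row.
awk -F, '{ print $1 "," $3 }' "$dst" | sort -u
# 0,"AIRPORT TEST "
# 0,"BRICKER HALL JOHN W "

rm -f "$src" "$dst"
```

The trade-off is that the building names are altered (the comma is dropped), so this suits cases where the exact name text does not need to survive.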