我有一个包含120万条文本记录的csv文件.字母数字字段用引号括起来,日期/时间或数字字段不是.
例如"Fred","Smith",01/07/1967,2,"7,The High Street","Anytown","Anycounty","LS1 7AA"
我想做的是在Excel中编写一些VBA(或多或少是我可以使用的唯一可用的工具),它通过记录读取CSV记录,执行检查(如同在最后一个字段中发生的那样,邮政编码)然后将1.2m记录的一小部分输出到新的输出文件.
我理解如何打开这两个文件,读取记录,做我需要做的数据并写出来(我只输出带有前缀表示异常类型的输入记录)
我不知道的是如何正确解析VBA中的CSV.我无法进行简单的文本扫描并搜索逗号,因为文本有时会有逗号(因此文本字段为文本分隔的原因)
是否有一个很棒的命令可以让我从记录中的第n个字段快速获取数据?
我想要的是s_work = field(s_input_record,5)其中5是我的CSV中的字段编号....
非常感谢,C
以下代码应该可以解决问题.我没有在我面前使用Excel,所以我没有测试它,但概念是合理的.
如果这最终太慢,我们可以考虑提高效率的方法.
Sub SelectSomeRecords() Dim testLine As String Open inputFileName For Input As #1 Open outputFileName For Output As #2 While Not EOF(1) Line Input #1, testLine If RecordIsInteresting(testLine) Then Print #2, testLine End If Wend Close #1 Close #2 End Sub Function RecordIsInteresting(recordLine As String) As Boolean Dim lineItems(1 to 8) As String GetRecordItems(lineItems(), recordLine) ''// do your custom checking here: RecordIsInteresting = lineItems(8) = "LS1 7AA" End Function Sub GetRecordItems(items() As String, recordLine as String) Dim finishString as Boolean Dim itemString as String Dim itemIndex as Integer Dim charIndex as Long Dim inQuote as Boolean Dim testChar as String inQuote = False charIndex = 1 itemIndex = 1 itemString = "" finishString = False While charIndex <= Len(recordLine) testChar = Mid$(recordLine, charIndex, 1) finishString = False If inQuote Then If testChar = Chr$(34) Then inQuote = False finishString = True charIndex = charIndex + 1 ''// ignore the next comma Else itemString = itemString + testChar End If Else If testChar = Chr$(34) Then inQuote = True ElseIf testChar = "," Then finishString = True Else itemString = itemString + testChar End If End If If finishString Then items(itemIndex) = itemString itemString = "" itemIndex = itemIndex + 1 End If charIndex = charIndex + 1 Wend End Sub
VBScript怎么样,虽然这也适用于Excel:
Set cn = CreateObject("ADODB.Connection") 'Note HDR=Yes, that is, first row contains field names ' 'and FMT delimted, ie CSV ' strCon="Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Docs\;" _ & "Extended Properties=""text;HDR=Yes;FMT=Delimited"";" cn.open strcon 'You would not need delimiters ('') if last field is numeric: ' strSQL="SELECT FieldName1, FieldName2 INTO New.csv FROM Old.csv " _ & " WHERE LastFieldName='SomeTextValue'" 'Creates new csv file cn.Execute strSQL