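I'm using a library to parse iCalendar files, but I don't understand the regular expression it uses to split a property.
iCalendar properties come in 3 different styles: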
BEGIN:VEVENT
DTSTART;VALUE=DATE:20080402
RRULE:FREQ=YEARLY;WKST=MO
The library uses this regular expression, which I'd like to understand:
var matches:Array = data.match(/(.+?)(;(.*?)=(.*?)((,(.*?)=(.*?))*?))?:(.*)$/);
p.name = matches[1];
p.value = matches[9];
p.paramString = matches[2];
Thanks.
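That's a terrible regular expression! .* and .*? mean "match anything", as much of it as possible (greedy) or as little as possible (lazy). These should only be used as a last resort. Used carelessly, they lead to catastrophic backtracking when the regex cannot match the input text. All you need to understand about this regular expression is that you don't want to write regexes like it.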
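To make the greedy/lazy distinction concrete, here is a minimal sketch in Python (chosen only because it is easy to run; the two patterns behave the same way in ActionScript). The sample line is a made-up one with two colons, not taken from the question.

```python
import re

# Hypothetical sample line containing two colons.
line = "URL:http://example.com/a"

# Greedy: .* first grabs the whole line, then gives characters back
# until a colon can match, so it stops at the LAST colon.
greedy = re.match(r"(.*):", line)

# Lazy: .*? grows one character at a time and stops at the FIRST colon.
lazy = re.match(r"(.*?):", line)

print(greedy.group(1))  # URL:http
print(lazy.group(1))    # URL
```

Both versions have to try many positions before they settle, which is why stacking several of them in one pattern gets expensive when the overall match fails.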
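Let me show how I would approach this problem. Apparently, the iCalendar file format is line-based. Each line has a property and a value, separated by a colon. The property can have parameters, which are separated from it by a semicolon. This means that the property cannot contain line breaks, semicolons, or colons; the optional parameters cannot contain line breaks or colons; and the value cannot contain line breaks. Knowing this, we can write an efficient regular expression that uses negated character classes: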
([^\r\n;:]+)(;[^\r\n:]+)?:(.+)
Or in ActionScript:
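var matches:Array = data.match(/([^\r\n;:]+)(;[^\r\n:]+)?:(.+)/);
p.name = matches[1];        // group 1: property name
p.value = matches[3];       // group 3: everything after the colon
p.paramString = matches[2]; // group 2: optional parameters, including the leading semicolon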
As explained by RegexBuddy:
Match the regular expression below and capture its match into backreference number 1 «([^\r\n;:]+)»
   Match a single character NOT present in the list below «[^\r\n;:]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A carriage return character «\r»
      A line feed character «\n»
      One of the characters “;:” «;:»
Match the regular expression below and capture its match into backreference number 2 «(;[^\r\n:]+)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the character “;” literally «;»
   Match a single character NOT present in the list below «[^\r\n:]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A carriage return character «\r»
      A line feed character «\n»
      The character “:” «:»
Match the character “:” literally «:»
Match the regular expression below and capture its match into backreference number 3 «(.+)»
   Match any single character that is not a line break character «.+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
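To see what the three groups capture, here is a rough sketch that runs the same pattern over the three sample lines from the question. It is written in Python rather than ActionScript so it can be run as-is; the helper name split_property is made up for this illustration, and Python reports an unmatched optional group as None.

```python
import re

# Same pattern as above: name, optional ";params", then ":" and the value.
PROPERTY_RE = re.compile(r"([^\r\n;:]+)(;[^\r\n:]+)?:(.+)")

def split_property(line):
    """Return (name, param_string, value), or None if the line does not match.

    Hypothetical helper for illustration only, not part of any library.
    """
    m = PROPERTY_RE.match(line)
    if m is None:
        return None
    return m.groups()

samples = [
    "BEGIN:VEVENT",
    "DTSTART;VALUE=DATE:20080402",
    "RRULE:FREQ=YEARLY;WKST=MO",
]

for line in samples:
    print(split_property(line))
# ('BEGIN', None, 'VEVENT')
# ('DTSTART', ';VALUE=DATE', '20080402')
# ('RRULE', None, 'FREQ=YEARLY;WKST=MO')
```

Note that in the RRULE line the semicolon comes after the colon, so it ends up in the value rather than in the parameter string, which matches the three styles described in the question.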