I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.
Example 1:
echo '"HAL," noted Frank, "said that everything was going extremely well."' | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"HAL," "said that everything was going extremely well."
Example 2:
cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"EULA" "Software" "Workstation Computer" "Device" "DRM"
etc.
(link to the corresponding text).
I like this one:
perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'
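It is a bit verbose, but it handles escaped quotes and backtracking better than the simplest implementation. Spelled out with the /x modifier, it reads: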
my $re = qr{
    "                # Begin with a literal quote
    (
      (?>            # prevent backtracking once the alternation has been
                     # satisfied. It either agrees or it does not. This expression
                     # only needs one direction, or we fail out of the branch
          [^"\\]     # a character that is not a dquote or a backslash
        | \\+        # OR if a backslash, then one or more backslashes followed by
          [^"]       # something that is not a quote
        | \\         # OR again a backslash
          (?>\\\\)*  # followed by any number of *pairs* of backslashes (as units)
          "          # and a quote
      )*             # any number of such qualifying phrases
    )                # all batched up together
    "                # Ended by a literal quote
}x;
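For comparison, here is a minimal Python sketch of the same escape-aware idea. It assumes Python 3's `re` module and swaps the atomic group for the common `(?:[^"\\]|\\.)*` formulation, in which a backslash always consumes the character after it; `extract_quoted` and `QUOTED` are hypothetical names chosen for illustration.

```python
import re

# Opening quote, then any run of: a character that is neither a quote nor a
# backslash, OR a backslash together with whatever character follows it.
# Because every backslash swallows the next character, \" never closes the
# quotation, while \\" is an escaped backslash followed by a closing quote.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_quoted(text):
    """Return the contents of every double-quoted span, without the quotes."""
    return QUOTED.findall(text)


if __name__ == "__main__":
    sample = r'"HAL," noted Frank, "said \"everything\" was going well."'
    for quote in extract_quoted(sample):
        print(quote)
```

Like the Perl one-liner, this prints the text between the quotes rather than the quotes themselves.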
If you don't need that much power, say the text is probably just dialogue rather than structured quoting, then
/"([^"]*)"/
may work as well as anything else.
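The simple pattern translates directly to Python's `re.findall`, which returns the captured group for each match. A minimal sketch (`simple_quotes` is a hypothetical name) that also shows where the pattern falls short once escaped quotes appear:

```python
import re


def simple_quotes(text):
    """Return the text between each pair of double quotes (no escape handling)."""
    # [^"]* runs up to the very next quote character, escaped or not.
    return re.findall(r'"([^"]*)"', text)


if __name__ == "__main__":
    print(simple_quotes('"EULA" and "Software"'))  # ['EULA', 'Software']
    # An escaped quote ends the match early and throws off the pairing:
    print(simple_quotes(r'"a \"b\" c"'))  # ['a \\', ' c']
```

For plain dialogue this is usually enough; the second call shows why the longer pattern is worth it when the text contains backslash-escaped quotes.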
If you have nested quotes, no regular-expression solution will work, but for your examples this approach works well:
$ echo \"HAL,\" noted Frank, \"said that everything was going extremely well\" | perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"HAL,"
"said that everything was going extremely well"
$ cat eula.txt | perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"EULA"
"online"
"Software"
"Workstation Computer"
"Device"
"multiplexing"
"DRM"
"Secure Content"
"DRM Software"
"Secure Content Owners"
"DRM Upgrades"
"WMFSDK"
"Not For Resale"
"NFR,"
"Academic Edition"
"AE,"
"Qualified Educational User."
"Exclusion of Incidental, Consequential and Certain Other Damages"
"Restricted Rights"
"Exclusion des dommages accessoires, indirects et de certains autres dommages"
"Consumer rights"
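The same non-greedy approach in Python, as a sketch: like `perl -n`, it scans one line at a time, so a quotation that spans a line break is not found. `quotes_per_line` and `NON_GREEDY` are hypothetical names; the function keeps the surrounding quotes, matching the `(".*?")` capture in the Perl version.

```python
import re

# ".*?" stops at the first closing quote after each opening quote.
NON_GREEDY = re.compile(r'".*?"')


def quotes_per_line(lines):
    """Yield each quoted span, quotes included, scanning line by line."""
    for line in lines:
        for match in NON_GREEDY.finditer(line):
            yield match.group(0)


if __name__ == "__main__":
    text = '"HAL," noted Frank,\n"said that everything was going extremely well"'
    for quote in quotes_per_line(text.splitlines()):
        print(quote)
```

To mirror the piped examples, pass `sys.stdin` (after `import sys`) instead of `text.splitlines()`; each line's trailing newline does not affect the match because `.` does not match `\n` by default.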