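Whenever I want to replace a piece of text, I always end up doing something like this:

    "(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)"

and then concatenating the start group with the new data for replace, followed by the end group.
Is there a better way to do this?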
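For concreteness, the approach described in the question might look roughly like the sketch below. The sample text, the literal sub-patterns, and the names `PATTERN` and `replace_middle` are assumptions made up for illustration; they are not from the original post.

```python
import re

# Hypothetical pattern: three named groups, of which only the middle one
# is supposed to change.
PATTERN = re.compile(r"(?P<start>start )(?P<replace>foo)(?P<end> end)")


def replace_middle(text, new_data):
    """Rebuild each match from its start and end groups around new_data."""
    def rebuild(match):
        # The start and end groups were consumed by the match, so they
        # have to be stitched back in by hand.
        return match.group("start") + new_data + match.group("end")

    return PATTERN.sub(rebuild, text)


print(replace_middle("start foo end", "replaced"))
```

The manual stitching in `rebuild` is the part the question is trying to get rid of.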
    >>> import re
    >>> s = "start foo end"
    >>> s = re.sub("foo", "replaced", s)
    >>> s
    'start replaced end'
    >>> s = re.sub("(?<= )(.+)(?= )", lambda m: "can use a callable for the %s text too" % m.group(1), s)
    >>> s
    'start can use a callable for the replaced text too end'
    >>> help(re.sub)
    Help on function sub in module re:

    sub(pattern, repl, string, count=0)
        Return the string obtained by replacing the leftmost
        non-overlapping occurrences of the pattern in string by the
        replacement repl.  repl can be either a string or a callable;
        if a callable, it's passed the match object and must return
        a replacement string to be used.
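Look at the lookaheads (?=...) and lookbehinds (?<=...) in the Python re documentation. I'm pretty sure they are what you want. They match text, but they do not "consume" the part of the string they match, so that text is not replaced.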
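A minimal sketch of what "not consuming" means in practice, assuming the sample string `"start foo end"` (chosen here for illustration): the lookbehind and lookahead constrain where the match may occur, but only the text between them is part of the match and therefore the only text replaced.

```python
import re

text = "start foo end"

# The lookbehind and lookahead act as conditions on the surrounding text.
match = re.search(r"(?<=start )foo(?= end)", text)
print(match.group(0))  # the matched text excludes "start " and " end"
print(match.span())    # indices of "foo" alone

# Because the surrounding text is not consumed, it survives the substitution
# without having to be re-inserted.
result = re.sub(r"(?<=start )foo(?= end)", "replaced", text)
print(result)
```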
The short version is that you cannot use variable-width patterns in lookbehinds with Python's re module. There is no way to change this:
    >>> import re
    >>> re.sub("(?<=foo)bar(?=baz)", "quux", "foobarbaz")
    'fooquuxbaz'
    >>> re.sub("(?<=fo+)bar(?=baz)", "quux", "foobarbaz")
    Traceback (most recent call last):
      File "", line 1, in
        re.sub("(?<=fo+)bar(?=baz)", "quux", string)
      File "C:\Development\Python25\lib\re.py", line 150, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "C:\Development\Python25\lib\re.py", line 241, in _compile
        raise error, v # invalid expression
    error: look-behind requires fixed-width pattern
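The same restriction can be checked programmatically. The sketch below assumes a small helper named `compiles` (a hypothetical name, not part of the re module) that reports whether a pattern is accepted. It also shows that the limit applies to lookbehinds only; a variable-width lookahead is fine.

```python
import re


def compiles(pattern):
    """Return True if the re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised, among other cases, for a variable-width look-behind.
        return False


print(compiles(r"(?<=foo)bar(?=baz)"))   # fixed-width lookbehind
print(compiles(r"(?<=fo+)bar(?=baz)"))   # variable-width lookbehind
print(compiles(r"(?<=foo)bar(?=ba+z)"))  # variable-width lookahead
```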
This means you will have to work around it, and the simplest workaround is very similar to what you are doing now:
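    >>> re.sub("(fo+)bar(?=baz)", "\\1quux", "foobarbaz")
    'fooquuxbaz'
    >>>
    >>> # If you need to turn this into a callable function:
    >>> def replace(start, replace, end, replacement, search):
    ...     # Capture the start text so it can be put back; the end text is only looked ahead at.
    ...     pattern = "(" + re.escape(start) + ")" + re.escape(replace) + "(?=" + re.escape(end) + ")"
    ...     # Double any backslashes so the replacement is inserted literally;
    ...     # \g<1> stays unambiguous even if the replacement starts with a digit.
    ...     return re.sub(pattern, "\\g<1>" + replacement.replace("\\", "\\\\"), search)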
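A variant of the same workaround, sketched under the assumption that the prefix should be a real regular expression (so it can be variable-width) rather than a literal string. The name `replace_after_prefix` and its parameters are hypothetical. Passing a callable as the replacement sidesteps template escaping, since the returned string is inserted as-is.

```python
import re


def replace_after_prefix(prefix_pattern, target, end, replacement, text):
    """Replace target with replacement where it follows prefix_pattern
    and is followed by end. prefix_pattern is a regex; target and end
    are treated as literal text."""
    pattern = (
        "(" + prefix_pattern + ")"    # captured so it can be re-inserted
        + re.escape(target)           # the text that gets replaced
        + "(?=" + re.escape(end) + ")"  # checked but not consumed
    )
    # group(1) is the outer prefix group because its parenthesis opens first.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text)


print(replace_after_prefix("fo+", "bar", "baz", "quux", "fooobarbaz"))
```

With a string replacement, text such as `\1` or a leading digit would be interpreted by the replacement template; with the callable it is copied through unchanged.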
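This does not have the elegance of the lookbehind solution, but it is still a very clear, straightforward one-liner. And if you look at what an expert has to say on the matter (he is talking about JavaScript, which lacked lookbehinds entirely when he wrote it, but many of the principles are the same), you will see that his simplest solution looks a lot like this one.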