我正在写一个简单的字符串解析器,它允许类似regexp的量词.输入字符串可能如下所示:
s = "x y{1,2} z"
我的解析器函数将此字符串转换为元组列表:
list_of_tuples = [("x", 1, 1), ("y", 1, 2), ("z", 1, 1)]
现在,棘手的一点是我需要一个由量化指定的所有有效组合的列表.组合都必须具有相同数量的元素,并且该值None
用于填充.对于给定的示例,预期输出为
[["x", "y", None, "z"], ["x", "y", "y", "z"]]
我确实有一个可行的解决方案,但我对它并不满意:它使用两个嵌套for
循环,我发现代码有点模糊,所以有一些尴尬和笨拙的事情:
import itertools
def permute_input(lot):
outer = []
# is there something that replaces these nested loops?
for val, start, end in lot:
inner = []
# For each tuple, create a list of constant length
# Each element contains a different number of
# repetitions of the value of the tuple, padded
# by the value None if needed.
for i in range(start, end + 1):
x = [val] * i + [None] * (end - i)
inner.append(x)
outer.append(inner)
# Outer is now a list of lists.
final = []
# use itertools.product to combine the elements in the
# list of lists:
for combination in itertools.product(*outer):
# flatten the elements in the current combination,
# and append them to the final list:
final.append([x for x
in itertools.chain.from_iterable(combination)])
return final
print(permute_input([("x", 1, 1), ("y", 1, 2), ("z", 1, 1)]))
[['x', 'y', None, 'z'], ['x', 'y', 'y', 'z']]
我怀疑这样做有一种更优雅的方式,可能隐藏在itertools
模块的某个地方?
解决该问题的另一种方法是使用pyparsing
此示例正则表达式解析器,它将正则表达式扩展为可能的匹配字符串.对于您的x y{1,2} z
示例字符串,它将生成两个可能的字符串来扩展量词:
$ python -i regex_invert.py >>> s = "x y{1,2} z" >>> for item in invert(s): ... print(item) ... x y z x yy z
重复本身同时支持开放范围和封闭范围,并定义为:
repetition = ( (lbrace + Word(nums).setResultsName("count") + rbrace) | (lbrace + Word(nums).setResultsName("minCount") + "," + Word(nums).setResultsName("maxCount") + rbrace) | oneOf(list("*+?")) )
为了得到期望的结果,我们应该修改从recurseList
生成器和返回列表而不是字符串中产生结果的方式:
for s in elist[0].makeGenerator()(): for s2 in recurseList(elist[1:]): yield [s] + [s2] # instead of yield s + s2
然后,我们只需要展平结果:
$ ipython3 -i regex_invert.py In [1]: import collections In [2]: def flatten(l): ...: for el in l: ...: if isinstance(el, collections.Iterable) and not isinstance(el, (str, bytes)): ...: yield from flatten(el) ...: else: ...: yield el ...: In [3]: s = "x y{1,2} z" In [4]: for option in invert(s): ...: print(list(flatten(option))) ...: ['x', ' ', 'y', None, ' ', 'z'] ['x', ' ', 'y', 'y', ' ', 'z']
然后,如果需要,您可以过滤空白字符:
In [5]: for option in invert(s): ...: print([item for item in flatten(option) if item != ' ']) ...: ['x', 'y', None, 'z'] ['x', 'y', 'y', 'z']