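I want to filter a list of strings in Python using a regular expression. In the following case, I want to keep only the files with a ".npy" extension.
This code does not work: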
    import re

    files = [
        '/a/b/c/la_seg_x005_y003.png',
        '/a/b/c/la_seg_x005_y003.npy',
        '/a/b/c/la_seg_x004_y003.png',
        '/a/b/c/la_seg_x004_y003.npy',
        '/a/b/c/la_seg_x003_y003.png',
        '/a/b/c/la_seg_x003_y003.npy',
    ]

    regex = re.compile(r'_x\d+_y\d+\.npy')
    selected_files = filter(regex.match, files)
    print(selected_files)
The same regular expression works for me in Ruby:
selected = files.select { |f| f =~ /_x\d+_y\d+\.npy/ }
What is wrong with the Python code?
selected_files = filter(regex.match, files)
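re.match('regex') is roughly equivalent to re.search('^regex'). It is the regular-expression counterpart of text.startswith('regex'): it only checks whether the string starts with a match for the pattern. Every path here starts with '/a/b/c/', so nothing matches.
So use re.search() instead: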
    import re

    files = [
        '/a/b/c/la_seg_x005_y003.png',
        '/a/b/c/la_seg_x005_y003.npy',
        '/a/b/c/la_seg_x004_y003.png',
        '/a/b/c/la_seg_x004_y003.npy',
        '/a/b/c/la_seg_x003_y003.png',
        '/a/b/c/la_seg_x003_y003.npy',
    ]

    regex = re.compile(r'_x\d+_y\d+\.npy')
    # The list() call is only needed in Python 3, where filter() returns
    # a lazy iterator instead of a list.
    selected_files = list(filter(regex.search, files))
    print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']
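To make the difference concrete, here is a minimal sketch that runs the same compiled pattern through match(), search() and fullmatch() on one of the paths from the question. The variable names are only illustrative.

```python
import re

path = '/a/b/c/la_seg_x005_y003.npy'
pattern = re.compile(r'_x\d+_y\d+\.npy')

# match() only tries at position 0, and the path begins with '/a/b/c/...'
match_result = pattern.match(path)

# search() scans forward and finds the pattern further into the string
search_result = pattern.search(path)

# fullmatch() requires the pattern to cover the entire string
fullmatch_result = pattern.fullmatch(path)

print(match_result is None)      # True
print(search_result.group())     # _x005_y003.npy
print(fullmatch_result is None)  # True
```

Only search() finds the file here, which is why swapping it in for match() fixes the filter.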
If you just want all the .npy files, simply use str.endswith():
    files = [
        '/a/b/c/la_seg_x005_y003.png',
        '/a/b/c/la_seg_x005_y003.npy',
        '/a/b/c/la_seg_x004_y003.png',
        '/a/b/c/la_seg_x004_y003.npy',
        '/a/b/c/la_seg_x003_y003.png',
        '/a/b/c/la_seg_x003_y003.npy',
    ]

    selected_files = list(filter(lambda x: x.endswith('.npy'), files))
    print(selected_files)
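As a possible variation on the extension check, a list comprehension combined with pathlib avoids both filter() and the lambda. This is only a sketch: it assumes POSIX-style paths like the ones in the question and uses a shortened copy of the file list.

```python
from pathlib import PurePosixPath

# Shortened copy of the list from the question, for illustration only
files = [
    '/a/b/c/la_seg_x005_y003.png',
    '/a/b/c/la_seg_x005_y003.npy',
    '/a/b/c/la_seg_x004_y003.png',
    '/a/b/c/la_seg_x004_y003.npy',
]

# .suffix is the final extension of the last path component, dot included
selected_files = [f for f in files if PurePosixPath(f).suffix == '.npy']
print(selected_files)
# ['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy']
```

Comparing the suffix only looks at the file name, so a directory whose name happens to end in ".npy" earlier in the path does not affect the result.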
Just use search: match only matches at the beginning of the string, whereas search looks for a match anywhere in the string.
    import re

    files = [
        '/a/b/c/la_seg_x005_y003.png',
        '/a/b/c/la_seg_x005_y003.npy',
        '/a/b/c/la_seg_x004_y003.png',
        '/a/b/c/la_seg_x004_y003.npy',
        '/a/b/c/la_seg_x003_y003.png',
        '/a/b/c/la_seg_x003_y003.npy',
    ]

    regex = re.compile(r'_x\d+_y\d+\.npy')
    # list() is needed in Python 3 so that print() shows the items
    # rather than a filter object.
    selected_files = list(filter(regex.search, files))
    print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']
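One caveat with search(): the pattern as written is not anchored at the end, so it would also accept a name that merely contains "_x…_y….npy" somewhere. The sketch below adds a hypothetical ".npy.bak" file (not in the original list) to show the effect of appending `$` to the pattern.

```python
import re

files = [
    '/a/b/c/la_seg_x005_y003.npy',
    '/a/b/c/la_seg_x005_y003.npy.bak',  # hypothetical file, for illustration
    '/a/b/c/la_seg_x004_y003.png',
]

unanchored = re.compile(r'_x\d+_y\d+\.npy')
anchored = re.compile(r'_x\d+_y\d+\.npy$')  # '$' ties the match to the end

loose = [f for f in files if unanchored.search(f)]
strict = [f for f in files if anchored.search(f)]

print(loose)   # ['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x005_y003.npy.bak']
print(strict)  # ['/a/b/c/la_seg_x005_y003.npy']
```

Whether the anchor is needed depends on what else can appear in the directory. For the list in the question both patterns select the same three files.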