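I have a string of arbitrary length, and starting at position p0 I need to find the first occurrence of one of three 3-letter patterns.
Assume the string contains only letters. Starting at position p0, I need to step forward one triplet at a time and count the triplets until the first occurrence of 'aaa', 'bbb', or 'ccc'.
Is this even possible with a regex?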
```perl
$string =~ /^             # from the start of the string
            (?:.{$p0})    # skip (don't capture) the first p0 characters
            (?:...)*?     # skip 3 characters at a time,
                          # as few times as possible (non-greedy)
            (aaa|bbb|ccc) # capture aaa or bbb or ccc as the first group
           /x;
```
(This assumes p0 is 0-based.)
Of course, using substr on the string to do the skipping may be more efficient:
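A self-contained sketch of the regex above, assuming the made-up sample string below. The question also asks for the triplet count, so the sketch reads Perl's `@-` array: `$-[1]` is the offset where capture group 1 starts, so `($-[1] - $p0) / 3` is the number of triplets skipped before the match. The sub name `count_triplets_regex` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (pattern, triplets skipped),
# or an empty list if no aligned aaa/bbb/ccc exists.
sub count_triplets_regex {
    my ($string, $p0) = @_;
    return unless $string =~ /^
        (?:.{$p0})      # skip the first p0 characters
        (?:...)*?       # then whole triplets, as few as possible
        (aaa|bbb|ccc)   # the first aligned aaa, bbb or ccc
    /x;
    # $-[1] is the offset at which capture group 1 started
    return ($1, ($-[1] - $p0) / 3);
}

my ($found, $skipped) = count_triplets_regex("alsdhfaaasccclaaaagalkfgblkgbklfs", 9);
print "Found $found after $skipped triplets\n" if defined $found;
```

With this sample string the unaligned `ccc` at offset 10 and `aaa` at offset 14 are skipped, and the aligned `aaa` at offset 15 is reported after 2 triplets.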
```perl
substr($string, $p0) =~ /^(?:...)*?(aaa|bbb|ccc)/;
```
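With `substr`, match offsets are relative to the substring, so `$-[1] / 3` is the triplet count directly. A minimal sketch, assuming the same made-up sample string; `count_triplets_substr` is a hypothetical name. `substr` warns and returns undef when the offset lies past the end of the string, so the sketch checks that first.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (pattern, triplets skipped),
# or an empty list if no aligned aaa/bbb/ccc exists.
sub count_triplets_substr {
    my ($string, $p0) = @_;
    return if $p0 > length $string;   # substr would be out of range
    my $tail = substr($string, $p0);  # everything from p0 onwards
    return unless $tail =~ /^(?:...)*?(aaa|bbb|ccc)/;
    return ($1, $-[1] / 3);           # offsets are relative to $tail
}

my ($found, $skipped) = count_triplets_substr("alsdhfaaasccclaaaagalkfgblkgbklfs", 9);
print "Found $found after $skipped triplets\n" if defined $found;
```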
Moritz says this might be faster than a regex. Even if it's a little slower, it's easier to understand at 5 AM. :)
```perl
#             0123456789.123456789.123456789.
my $string = "alsdhfaaasccclaaaagalkfgblkgbklfs";

my $pos    = 9;
my $length = 3;
my $regex  = qr/^(aaa|bbb|ccc)/;

while ( $pos < length $string ) {
    print "Checking $pos\n";

    if ( substr( $string, $pos, $length ) =~ /$regex/ ) {
        print "Found $1 at $pos\n";
        last;
    }

    $pos += $length;
}
```
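The same loop can be wrapped in a sub that returns the match and its position instead of printing, so the triplet count becomes `($pos - $p0) / 3`. This is a sketch: `find_triplet_loop` is a hypothetical name, and checking each 3-character chunk with a hash lookup instead of the regex is a swapped-in variation, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical lookup table for the three patterns.
my %wanted = map { $_ => 1 } qw(aaa bbb ccc);

# Walks the string in steps of 3 from $p0; returns (pattern, position)
# of the first aligned aaa/bbb/ccc, or an empty list if there is none.
sub find_triplet_loop {
    my ($string, $p0) = @_;
    for ( my $pos = $p0; $pos + 3 <= length $string; $pos += 3 ) {
        my $chunk = substr( $string, $pos, 3 );
        return ( $chunk, $pos ) if $wanted{$chunk};
    }
    return;
}

my $p0 = 9;
my ($found, $pos) = find_triplet_loop("alsdhfaaasccclaaaagalkfgblkgbklfs", $p0);
if ( defined $found ) {
    printf "Found %s at %d after %d triplets\n", $found, $pos, ( $pos - $p0 ) / 3;
}
```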
A regex can't really do the counting by itself, but you can do something like this:
```perl
pos($string) = $start_from;
$string =~ m/\G             # anchor to previous pos()
             ((?:...)*?)    # capture everything up to the match
             (aaa|bbb|ccc)
            /xs or die "No match";
my $result = length($1) / 3;
```
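A runnable version of the `\G` approach, assuming `$start_from` holds the 0-based start offset and reusing the made-up sample string. Assigning to `pos` makes `\G` anchor the match at that offset, and `length($1) / 3` counts the triplets skipped before the pattern. The sub name `count_triplets_pos` is hypothetical, and it returns an empty list instead of dying when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (pattern, triplets skipped),
# or an empty list if no aligned aaa/bbb/ccc exists.
sub count_triplets_pos {
    my ($string, $start_from) = @_;
    pos($string) = $start_from;     # \G will anchor here
    $string =~ m/\G
                 ((?:...)*?)        # the triplets skipped
                 (aaa|bbb|ccc)      # the first aligned match
                /xs or return;
    return ($2, length($1) / 3);
}

my ($found, $skipped) = count_triplets_pos("alsdhfaaasccclaaaagalkfgblkgbklfs", 9);
print "Found $found after $skipped triplets\n" if defined $found;
```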
But I think it would be a bit faster to use substr() and unpack() to split the string into triplets and walk over them in a for loop.
(Edit: it's length(), not lenght() ;-)
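A sketch of the substr()/unpack() idea: `unpack '(a3)*'` splits the tail of the string into 3-character chunks, and the loop index is the triplet count. The template `a3` keeps each chunk unchanged (`A3` would strip trailing spaces). `count_triplets_unpack` is a hypothetical name, and no speed comparison is claimed here; benchmark before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: splits the string from $p0 on into 3-character
# chunks and walks them. Returns (pattern, triplets skipped),
# or an empty list if no aligned aaa/bbb/ccc exists.
sub count_triplets_unpack {
    my ($string, $p0) = @_;
    return if $p0 > length $string;
    my @triplets = unpack '(a3)*', substr($string, $p0);
    for my $i (0 .. $#triplets) {
        my $t = $triplets[$i];
        return ($t, $i) if $t eq 'aaa' || $t eq 'bbb' || $t eq 'ccc';
    }
    return;
}

my ($found, $skipped) = count_triplets_unpack("alsdhfaaasccclaaaagalkfgblkgbklfs", 9);
print "Found $found after $skipped triplets\n" if defined $found;
```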