regex - Python re.finditer match.groups() does not contain all groups from match
I am trying to use a regex in Python to find and print all matching lines from a multiline search. The text I am searching through may have the example structure below:
aaa
abc1
abc2
abc3
aaa
abc1
abc2
abc3
abc4
abc
aaa
abc1
aaa
From that I want to retrieve the abc*s that occur at least once and are preceded by aaa.
The problem is that, despite the group catching what I want:
match = <_sre.sre_match object; span=(19, 38), match='aaa\nabc2\nabc3\nabc4\n'>
... I can access only the last match of the group:
match groups = ('aaa\n', 'abc4\n')
Below is the example code I use for this problem.
#! python
import sys
import re
import os

string = "aaa\nabc1\nabc2\nabc3\naaa\nabc1\nabc2\nabc3\nabc4\nabc\naaa\nabc1\naaa\n"
print(string)

p_matches = []
p_matches.append(re.compile('(aaa\n)(abc[0-9]\n){1,}'))

matches = re.finditer(p_matches[0], string)
for match in matches:
    strout = ''
    gr_iter = 0
    print("match = " + str(match))
    print("match groups = " + str(match.groups()))
    for group in match.groups():
        gr_iter += 1
        sys.stdout.write("test group:" + str(gr_iter) + "\t" + group)  # test output
        if group is not None:
            if group != '':
                strout += '"' + group.replace("\n", "", 1) + '"' + '\n'
    sys.stdout.write("\ncomplete result:\n" + strout + "====\n")
Here is your regular expression:
(aaa\r\n)(abc[0-9]\r\n){1,}
Your goal is to capture all abc#s that follow aaa. As you can see in the Debuggex demo, all the abc#s are indeed being matched (they're highlighted in yellow). However, since only the "what is being repeated" part, abc[0-9]\r\n, is being captured (is inside parentheses), and its quantifier, {1,}, is not being captured, all matches except the final one are discarded. To get them, you must also capture the quantifier:
aaa\r\n((?:abc[0-9]\r\n){1,})
I've placed the "what is being repeated" part (abc[0-9]\r\n) into a non-capturing group. (I've also stopped capturing aaa, as you don't seem to need it.)
The captured text can then be split on the newline, which will give you all the pieces you want.
(Note that \n by itself doesn't work in Debuggex. It requires \r\n.)
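In Python, where the string from the question uses plain \n line endings, capturing the whole repetition and then splitting it might look like the sketch below. The names text, pattern and results are only illustrative, and {1,} is written as the equivalent +.

```python
import re

# Sample text from the question (plain \n line endings).
text = "aaa\nabc1\nabc2\nabc3\naaa\nabc1\nabc2\nabc3\nabc4\nabc\naaa\nabc1\naaa\n"

# The outer group captures the entire repetition; the repeated part
# itself sits in a non-capturing group (?:...).
pattern = re.compile(r"aaa\n((?:abc[0-9]\n)+)")

# One list of abc# lines per aaa block that has at least one abc# after it.
results = [m.group(1).splitlines() for m in pattern.finditer(text)]

for pieces in results:
    print(pieces)
```

The bare abc line and the final aaa contribute nothing, because neither satisfies abc[0-9] after an aaa.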
This is a workaround. Not many regular expression flavors offer the capability of iterating through repeating captures (which ones...?). A more normal approach is to loop through and process each match as it is found. Here's an example from Java:
import java.util.regex.*;

public class RepeatingCaptureGroupsDemo {
    public static void main(String[] args) {
        String input = "i have cat, dog better.";
        Pattern p = Pattern.compile("(mouse|cat|dog|wolf|bear|human)");
        Matcher m = p.matcher(input);
        // Each call to find() advances to the next match in the input.
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
output:
cat
dog
(From http://ocpsoft.org/opensource/guide-to-regular-expressions-in-java-part-1/, about 1/4 of the way down)
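The same loop-over-each-match idea can be applied to the question's Python data with two nested re.finditer loops: an outer one that finds each aaa block and an inner one that walks the abc# items inside it. This is only a sketch; the names block_re, item_re and found are made up for illustration.

```python
import re

# Sample text from the question (plain \n line endings).
text = "aaa\nabc1\nabc2\nabc3\naaa\nabc1\nabc2\nabc3\nabc4\nabc\naaa\nabc1\naaa\n"

# Outer pattern: an aaa line followed by one or more abc# lines.
block_re = re.compile(r"aaa\n(?:abc[0-9]\n)+")
# Inner pattern: a single abc# item.
item_re = re.compile(r"abc[0-9]")

found = []
for block in block_re.finditer(text):
    # Process every abc# inside the current block as it is found.
    for item in item_re.finditer(block.group()):
        found.append(item.group())
        print(item.group())
```

No repeated capture group is needed here, because each abc# is its own match in the inner loop.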
Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference. The links in this answer come from it.