regex - Python re.finditer match.groups() does not contain all groups from match -


I am trying to use a regex in Python to find and print all matching lines from a multiline search. The text that I am searching through may have the example structure below:

    aaa
    abc1
    abc2
    abc3
    aaa
    abc1
    abc2
    abc3
    abc4
    abc
    aaa
    abc1
    aaa

From this I want to retrieve all the abc*s that occur at least once and are preceded by aaa.

The problem is that, despite the group catching what I want:

    match = <_sre.SRE_Match object; span=(19, 38), match='aaa\nabc2\nabc3\nabc4\n'>

... I can access only the last match of the group:

match groups = ('aaa\n', 'abc4\n') 

Below is the example code that I use for this problem.

    #! python
    import sys
    import re
    import os

    string = "aaa\nabc1\nabc2\nabc3\naaa\nabc1\nabc2\nabc3\nabc4\nabc\naaa\nabc1\naaa\n"
    print(string)

    p_matches = []
    p_matches.append( (re.compile('(aaa\n)(abc[0-9]\n){1,}')) )
    #
    matches = re.finditer(p_matches[0],string)

    for match in matches:
        strout = ''
        gr_iter=0
        print("match = "+str(match))
        print("match groups = "+str(match.groups()))
        # groups() holds one entry per capturing group, not one per repetition
        for group in match.groups():
            gr_iter+=1
            sys.stdout.write("test group:"+str(gr_iter)+"\t"+group) # test output
            if group is not None:
                if group != '':
                    strout+= '"'+group.replace("\n","",1)+'"'+'\n'
        sys.stdout.write("\ncomplete result:\n"+strout+"====\n")

Here is your regular expression:

(aaa\r\n)(abc[0-9]\r\n){1,} 

debuggex demo

Your goal is to capture all the abc#s that follow aaa. As you can see in the Debuggex demo, all the abc#s are indeed being matched (they're highlighted in yellow). However, since only the "what is being repeated" part

abc[0-9]\r\n 

is being captured (is inside the parentheses), and its quantifier,

{1,} 

is not being captured, each repetition overwrites the previous capture, which causes all matches except the final one to be discarded. To get them all, you must also capture the quantifier:

aaa\r\n((?:abc[0-9]\r\n){1,}) 

debuggex demo

I've placed the "what is being repeated" part (abc[0-9]\r\n) into a non-capturing group. (I've also stopped capturing aaa, as you don't seem to need it.)
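To make the difference concrete, here is a minimal sketch that runs both patterns over the sample string from the question. It assumes plain \n line endings (as in the question's string) rather than the \r\n that Debuggex needs, and the variable names are my own.

```python
import re

# Sample text from the question, with "\n" line endings.
text = "aaa\nabc1\nabc2\nabc3\naaa\nabc1\nabc2\nabc3\nabc4\nabc\naaa\nabc1\naaa\n"

# Original pattern: the repeated group is overwritten on every repetition,
# so only the last abc# of each block survives.
original = re.compile(r"(aaa\n)(abc[0-9]\n){1,}")

# Revised pattern: the capturing group wraps the whole repetition,
# and the repeated part itself is non-capturing.
revised = re.compile(r"aaa\n((?:abc[0-9]\n){1,})")

original_groups = [m.groups() for m in original.finditer(text)]
revised_groups = [m.groups() for m in revised.finditer(text)]

print(original_groups)
print(revised_groups)
```

With the original pattern each match reports only its final abc# line, while the revised pattern returns every abc# line of the block as a single string.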

The captured text can then be split on the newline, which will give you all the pieces you wish.

(Note that \n by itself doesn't work in Debuggex. It requires \r\n.)
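As a rough sketch of the splitting step, assuming the same sample string and \n line endings as in the question (the names `pattern` and `blocks` are only illustrative), `str.splitlines()` breaks the captured text into its lines without leaving a trailing empty piece:

```python
import re

# Sample text from the question, with "\n" line endings.
text = "aaa\nabc1\nabc2\nabc3\naaa\nabc1\nabc2\nabc3\nabc4\nabc\naaa\nabc1\naaa\n"

# Group 1 captures the whole run of abc# lines that follows "aaa".
pattern = re.compile(r"aaa\n((?:abc[0-9]\n){1,})")

# One list of abc# entries per "aaa" block.
blocks = [m.group(1).splitlines() for m in pattern.finditer(text)]

for block in blocks:
    print(block)
```

I'd use `splitlines()` here rather than `split("\n")` because the captured text ends with a newline, and `split("\n")` would add an empty string at the end of each list.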


This is a workaround. Not many regular expression flavors offer the capability of iterating through repeating captures (which ones...?). A more normal approach is to loop through and process each match as it is found. Here's an example in Java:

    import java.util.regex.*;

    public class RepeatingCaptureGroupsDemo {
       public static void main(String[] args) {
          String input = "I have a cat, but I like my dog better.";

          Pattern p = Pattern.compile("(mouse|cat|dog|wolf|bear|human)");
          Matcher m = p.matcher(input);

          // Each call to find() advances to the next match in the input.
          while (m.find()) {
             System.out.println(m.group());
          }
       }
    }

Output:

    cat
    dog

(From http://ocpsoft.org/opensource/guide-to-regular-expressions-in-java-part-1/, about 1/4 of the way down)
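The same loop-per-match idea could look roughly like this in Python for the question's data. This is only a sketch under my own assumptions: it uses \n line endings, and the two-pattern split (`block_pattern` to find each aaa block, `item_pattern` to walk the abc# entries inside it) is one possible arrangement, not something taken from the question.

```python
import re

# Sample text from the question, with "\n" line endings.
text = "aaa\nabc1\nabc2\nabc3\naaa\nabc1\nabc2\nabc3\nabc4\nabc\naaa\nabc1\naaa\n"

# Outer pattern: an "aaa" line followed by one or more abc# lines.
block_pattern = re.compile(r"aaa\n(?:abc[0-9]\n)+")
# Inner pattern: a single abc# entry.
item_pattern = re.compile(r"abc[0-9]")

results = []
for block in block_pattern.finditer(text):
    # Process each abc# as it is found, instead of relying on a repeated capture group.
    for item in item_pattern.finditer(block.group()):
        results.append(item.group())

print(results)
```

The bare "abc" line and the final "aaa" with nothing after it are skipped, since neither satisfies the outer pattern.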


Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference. The links in this answer come from it.

