Grab blocks of text

A place to try and solve your RegEx problems.
Brig
Posts: 2
Joined: Fri Dec 12, 2014 9:40 pm

Grab blocks of text

Post by Brig »

Howdy:

If I have files with SGML tags and text like this--

<para>Text and more text.</para>
<block>
<tag1>
<tag2>
<tag3>text</tag3>
</tag2>
</tag1>
</block>
<para>More text.</para>

--and I want to extract all the "blocks"--that is, every instance of <block>...</block> (and everything between those tags)--why doesn't the expression <block>.*</block> work? ("Dot matches newline" is checked.) If there is more than one block in a file, this grabs everything from the first <block> to the very last </block> in the file. What expression should I use to match each discrete <block> separately?

Many thanks
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Grab blocks of text

Post by DigitalVolcano »

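You need to make the asterisk non-greedy by putting a question mark after it. A greedy .* matches as much as it can, so it runs on to the last </block> in the file. The non-greedy .*? matches as little as it can, so each match stops at the first </block> that follows its <block>.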

Code:

<block>.*?</block>
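
To see the difference side by side, here is a minimal sketch in Python (not the tool used in this thread, just a convenient way to test the pattern). The sample text is made up for illustration, and re.DOTALL is assumed to behave like the "Dot matches newline" checkbox.

```python
import re

# Hypothetical sample input modeled on the question: two separate <block> elements.
sample = """<para>Text and more text.</para>
<block>
<tag1>text one</tag1>
</block>
<para>More text.</para>
<block>
<tag1>text two</tag1>
</block>
<para>Final text.</para>
"""

# re.DOTALL lets "." match newlines, like the "Dot matches newline" option.
# Greedy: .* takes as much as possible, so it spans first <block> to last </block>.
greedy = re.findall(r"<block>.*</block>", sample, flags=re.DOTALL)

# Non-greedy: .*? takes as little as possible, so each match ends at the nearest </block>.
lazy = re.findall(r"<block>.*?</block>", sample, flags=re.DOTALL)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

One caveat: if a <block> can contain another <block> inside it, the non-greedy pattern will stop at the inner </block>, so this only works for blocks that are not nested in themselves.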
Brig
Posts: 2
Joined: Fri Dec 12, 2014 9:40 pm

Re: Grab blocks of text

Post by Brig »

Beautiful! Thanks a million. So simple.