Grab blocks of text

Posted: Fri Dec 12, 2014 9:57 pm
by Brig
Howdy:

If I have files with SGML tags and text like this--

<para>Text and more text.</para>
<block>
<tag1>
<tag2>
<tag3>text</tag3>
</tag2>
</tag1>
</block>
<para>More text.</para>

--and I want to extract all the "blocks"--that is, every instance of <block>...</block> (and everything between those tags)--why doesn't the expression <block>.*</block> work? ("Dot matches newline" is checked.) If there is more than one "block" in a file, this grabs everything from the first <block> to the very last </block> in the file. What expression should I use to limit each match to a single, discrete <block>?

Many thanks

Re: Grab blocks of text

Posted: Sun Dec 14, 2014 2:40 pm
by DigitalVolcano
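You need to make the regular expression asterisk non-greedy (i.e. it will match as little text as possible, so each match ends at the first </block> that follows its <block> instead of running on to the last one in the file.)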

Code:

<block>.*?</block>
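
To see the difference side by side, here is a rough sketch in Python. Python's re module is only a stand-in for whatever tool you are using, the sample text is made up to resemble the file in the question, and re.DOTALL is assumed to behave like the "Dot matches newline" checkbox.

```python
import re

# Hypothetical sample, modelled on the question: two separate <block> sections.
sample = """<para>Text and more text.</para>
<block>
<tag1>first</tag1>
</block>
<para>More text.</para>
<block>
<tag1>second</tag1>
</block>
<para>Last paragraph.</para>
"""

# re.DOTALL lets "." match newlines (assumed equivalent of "Dot matches newline").
greedy = re.findall(r"<block>.*</block>", sample, flags=re.DOTALL)
lazy = re.findall(r"<block>.*?</block>", sample, flags=re.DOTALL)

print(len(greedy))  # 1: one match running from the first <block> to the last </block>
print(len(lazy))    # 2: one match per block

for block in lazy:
    print(block)
    print("-----")
```

The greedy pattern swallows the <para> that sits between the two blocks, while the non-greedy one returns each block on its own.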

Re: Grab blocks of text

Posted: Mon Dec 15, 2014 3:41 pm
by Brig
Beautiful! Thanks a million. So simple.