
Find 2 uses of <tagA>, without <tagB> between them

Posted: Wed Apr 01, 2015 2:10 pm
by my99
Hi,

Some SGML files use revision tagging as in the following example:
This is a sentence <revst>and this part has been revised<revend>.
As they are not part of a named contextual tag pair (eg. <change></change>), this can cause problems with them becoming incorrectly nested inside each other.

So I'm trying to come up with a neat string that will detect any instances of:

- A <revst> soon followed by another <revst> (ie. before the first has been resolved by a <revend>)
- A <revend> soon followed by another <revend> (ie. before a <revst> has triggered the next pair)

Between the repeated <revst> or <revend>, anything else might occur (including other tagging and Newlines), although there are no other tags whose name begins with "rev".

I'm sure I'm on the cusp of something using lookaheads, but it's starting to hurt. :)

Thanks for any help.

Re: Find 2 uses of <tagA>, without <tagB> between them

Posted: Mon Jul 20, 2015 12:46 pm
by silentguy
I know this is an old topic, but maybe you are still looking?

Code:

(<rev\w+>)(?=((?!<rev\w+>)[\d\D])*\1)
should do what you want. Not sure if it's efficient. On long texts it potentially has to look at the whole text...
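
For anyone who wants to try the pattern outside an editor, here is a minimal sketch in Python, with the expression spelled out in verbose mode so each part carries a comment. The names DOUBLED_REV_TAG and find_doubled_tags are made up for illustration, and the inner group is written as non-capturing, which should not change what gets matched. It assumes Python's re module; other regex engines may behave differently.

```python
import re

# Same expression as above, split across lines with re.VERBOSE.
DOUBLED_REV_TAG = re.compile(r"""
    (<rev\w+>)            # group 1: a revision tag, e.g. <revst> or <revend>
    (?=                   # lookahead: inspect what follows without consuming it
        (?:
            (?!<rev\w+>)  # the next character must not start another rev tag
            [\d\D]        # any single character, including a newline
        )*                # repeat up to the next rev tag (or end of text)
        \1                # that next rev tag must be the same as group 1
    )
""", re.VERBOSE)


def find_doubled_tags(text):
    """Return (offset, tag) for each rev tag whose next rev tag is identical."""
    return [(m.start(), m.group(1)) for m in DOUBLED_REV_TAG.finditer(text)]


if __name__ == "__main__":
    good = "This is a sentence <revst>and this part has been revised<revend>."
    bad = "a <revst>b <revst>c<revend> d<revend>."
    print(find_doubled_tags(good))
    print(find_doubled_tags(bad))
```

Only the first tag of each doubled pair is reported, because the lookahead checks the second one without consuming it.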

Re: Find 2 uses of <tagA>, without <tagB> between them

Posted: Tue Sep 01, 2015 2:15 pm
by my99
Thanks for your response silentguy - I just did a quick test and it does indeed seem to be working.

The question is - how? :) Would it be possible to take me through the logic please?

If so, I could potentially adapt this technique to solve other problems.

Thanks again for your efforts.