Find 2 uses of <tagA>, without <tagB> between them

A place to try and solve your RegEx problems.
my99
Posts: 3
Joined: Wed Apr 01, 2015 1:56 pm

Find 2 uses of <tagA>, without <tagB> between them

Post by my99 »

Hi,

Some SGML files use revision tagging as in the following example:
This is a sentence <revst>and this part has been revised<revend>.
Because these markers are not a named contextual tag pair (e.g. <change></change>), they can end up incorrectly nested inside each other, which causes problems.

So I'm trying to come up with a neat string that will detect any instances of:

- A <revst> followed by another <revst> (i.e. before the first has been closed by a <revend>)
- A <revend> followed by another <revend> (i.e. before a <revst> has opened the next pair)

Between the repeated <revst> or <revend>, anything else might occur (including other tagging and Newlines), although there are no other tags whose name begins with "rev".

I'm sure I'm on the cusp of something using lookaheads, but it's starting to hurt. :)

Thanks for any help.
silentguy
Posts: 6
Joined: Fri Dec 12, 2014 10:21 am

Re: Find 2 uses of <tagA>, without <tagB> between them

Post by silentguy »

I know this is an old topic, but maybe you are still looking?


(<rev\w+>)(?=((?!<rev\w+>)[\d\D])*\1)
This should do what you want. I'm not sure it's efficient, though: on long texts it potentially has to look at the whole text...
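
For anyone wanting to try the pattern or follow its logic, here is a minimal sketch in Python. The thread does not say which regex engine is in use, so Python's re module is an assumption, and the names NESTED_REV and find_repeated_rev_tags are made up for illustration. The pattern is the one above, spread out in verbose mode with a comment per piece; the only change is that the inner group is non-capturing, which should not affect what it matches.

```python
import re

# The pattern from the post, written in verbose mode so each piece can be
# annotated. Assumption: Python's re engine; the inner group is made
# non-capturing here, which is a small deviation from the original.
NESTED_REV = re.compile(r"""
    (<rev\w+>)            # group 1: a rev tag such as <revst> or <revend>
    (?=                   # lookahead: inspect what follows, consume nothing
        (?:
            (?!<rev\w+>)  # stop as soon as another rev tag starts here
            [\d\D]        # otherwise accept any character, newlines included
        )*
        \1                # the tag that stopped the scan must equal group 1
    )
""", re.VERBOSE)


def find_repeated_rev_tags(text):
    """Return (offset, tag) for every rev tag whose next rev tag is the same."""
    return [(m.start(1), m.group(1)) for m in NESTED_REV.finditer(text)]


good = "This is a sentence <revst>and this part has been revised<revend>."
bad_start = "One <revst>two\n<revst>three<revend>."
bad_end = "One <revst>two<revend>three\n<revend>."

print(find_repeated_rev_tags(good))       # []
print(find_repeated_rev_tags(bad_start))  # [(4, '<revst>')]
print(find_repeated_rev_tags(bad_end))    # [(14, '<revend>')]
```

Each match is reported at the first tag of the offending pair. The lookahead only walks as far as the next rev tag (or the end of the text if there is none), so the cost depends on how far apart the rev tags are.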
my99
Posts: 3
Joined: Wed Apr 01, 2015 1:56 pm

Re: Find 2 uses of <tagA>, without <tagB> between them

Post by my99 »

Thanks for your response silentguy - I just did a quick test and it does indeed seem to be working.

The question is - how? :) Would it be possible to take me through the logic please?

If so, I could potentially adapt this technique to solve other problems.

Thanks again for your efforts.