Aggressive or conservative?

A place to try and solve your RegEx problems.
Ggtours

Aggressive or conservative?

Post by Ggtours »

consider the following text sample:

printf("Glad you reached this normal spot\n");

searching: printf.*"

matches: printf("Glad you reached this normal spot\n" which is quite aggressive, but why not...

and searching: printf.*?"

matches only: printf("

Interesting, but I only found out what this question mark does by chance; I could not find it in the documentation or anywhere else.

Could you advise where to find more on this syntax and any similar constructions?

thank you
Fool4UAnyway

Re: Greedy or non-greedy?

Post by Fool4UAnyway »

If you want to match everything up to and including only the first quotation mark, you can describe this as accepting any run of characters that are not quotation marks, followed by the quotation mark itself (or simply "the next character", as it can only be a quotation mark).

Find:
printf[^"]*"

Using a dot would make sense if the next character could be any of a set. It keeps the regex short.

Find:
printf[^'`"]*.
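To compare the searches discussed so far side by side, here is a minimal sketch. It assumes Python's re module, which is my choice for illustration; the thread does not say which regex engine is in use, and other engines may differ in details.

```python
import re

# The sample line from the first post; the raw string keeps \n as two characters.
text = r'printf("Glad you reached this normal spot\n");'

# Greedy: .* runs to the end of the line, then backs up to the LAST quotation mark.
greedy = re.search(r'printf.*"', text).group()

# Lazy: .*? stops as soon as a quotation mark can follow, i.e. at the FIRST one.
lazy = re.search(r'printf.*?"', text).group()

# Negated class: [^"]* cannot cross a quotation mark, so it also stops at the first one.
negated = re.search(r'printf[^"]*"', text).group()

# Negated class over several quote characters, with a dot standing in for the terminator.
any_quote = re.search('printf[^\'`"]*.', text).group()

print(greedy)
print(lazy, negated, any_quote)
```

On this sample, the last three searches all select the same text, while the greedy one runs on to the final quotation mark.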
Ggtours

Re: Aggressive or conservative?

Post by Ggtours »

Thank you for this interesting answer. However, it does not appear to work if you want to catch everything up to a word rather than a single character.

Consider:

<<item value="first">>Contents of first item<</item>>

Applying the regex <<item.*>> selects everything. And by the way, I couldn't figure out how to select just: <<item value="first">>

But by pure chance, I found that: <<item.*?>>
just does the job of selecting: <<item value="first">>

How come, then? What is the role of the question mark (?) in this regex?

Thank you.
zzzK

Re: Aggressive or conservative?

Post by zzzK »

Question mark (?) matches zero or one occurrences.
Ggtours

Re: Aggressive or conservative?

Post by Ggtours »

After a star? Does it mean they can be nested? How? What does .* mean as an occurrence then?
Fool4UAnyway

Re: greedy or non-greedy?

Post by Fool4UAnyway »

Ggtours wrote: But by pure chance, I found that: <<item.*?>>
just does the job of selecting: <<item value="first">>
It seems the ? question mark after the * asterisk is just to indicate the non-greedy match mode: be satisfied as soon as possible.

The effect is just the same as the method I described before.
You could have applied that to this particular search as well.

Find:
<<item[^>]*>>
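A small sketch of both approaches on the sample line, again assuming Python's re module purely for illustration. The second input, text_with_gt, is a hypothetical variation that is not from the thread; it is there to show one case where the two approaches stop giving the same result.

```python
import re

text = '<<item value="first">>Contents of first item<</item>>'

# Greedy: .* grabs as much as possible, so the match ends at the LAST >>.
greedy = re.search(r'<<item.*>>', text).group()

# Lazy: .*? grabs as little as possible, so the match ends at the FIRST >>.
lazy = re.search(r'<<item.*?>>', text).group()

# Negated class: [^>]* cannot cross any > character at all.
negated = re.search(r'<<item[^>]*>>', text).group()

print(greedy == text)   # the whole line is selected
print(lazy)
print(negated)

# Hypothetical input with a single > inside the attribute value.
text_with_gt = '<<item value="a>b">>rest'

lazy_gt = re.search(r'<<item.*?>>', text_with_gt)
negated_gt = re.search(r'<<item[^>]*>>', text_with_gt)

print(lazy_gt.group())  # lazy still reaches the first >>
print(negated_gt)       # the negated class is blocked by the lone >
```

So the two searches agree on the sample from this thread, but the negated class is the stricter of the two: it refuses to pass over a single > on the way to the closing >>.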
Fool4UAnyway

Re: "nesting" regular expressions?

Post by Fool4UAnyway »

zzzK wrote: Question mark (?) matches zero or one occurrences.
Ggtours wrote: After a star? Does it mean they can be nested? How? What does .* mean as an occurrence then?

Yes, you can apply some nesting-like techniques to regular expressions.

.* means what it says: accept any character (.), any number of times in a row (*), without requiring even one to appear. In effect, this means: "accept anything".

You can apply the ? question mark to (a part of) a regular expression by putting parentheses around the specific part and then putting the ? question mark behind the closing parenthesis:

(aab+c*)? means: match (two a characters, then any string of b characters requiring at least one, and then any string of c characters without requiring any to appear) only once or not at all.

So (.*)? would mean "anything at all" once or not at all, which in effect is no different from just .*.

(aab+c*){2,4} would mean: match at least two and up to four times (repeated) the regular expression as described above between parentheses.
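The two grouped expressions above can be tried out with a short sketch like the following. It assumes Python's re module; the helper name whole is made up for this example, and fullmatch is used so that each test string has to be matched from start to end.

```python
import re

optional = re.compile(r'(aab+c*)?')      # the group once, or not at all
repeated = re.compile(r'(aab+c*){2,4}')  # the group two to four times in a row

def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

print(whole(optional, ''))          # zero occurrences are accepted
print(whole(optional, 'aabbbcc'))   # one occurrence is accepted
print(whole(optional, 'aabaab'))    # two occurrences are too many for ?

print(whole(repeated, 'aab'))       # one occurrence is too few for {2,4}
print(whole(repeated, 'aabaabcc'))  # two occurrences
print(whole(repeated, 'aab' * 5))   # five occurrences are too many
```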

You can find out more about regular expressions on http://www.regular-expression.info.
Ggtours

Re: Aggressive or conservative?

Post by Ggtours »

thanks a lot Fool4UAnyway

I think the reason the question mark (?) turns on non-greedy mode could be explained by its equivalence to: {0,1}

I also found your proposal to use: <<item[^>]*>>
as a means to be non greedy in searching: >>
very inspirational too
I will use these constructs more, particularly in multiline searches where '.' also matches end-of-line characters and greedy mode should often be avoided, for example when closing patterns are not unique.

by the way, the answer to my question sits here from your provided link (in http://www.regular-expressions.info/reference.html): *?
means a lazy star! lazy as non greedy!
sorry I asked
but now we can better figure out why

one more comment though, your expression: (aab+c*)?
in addition to creating a group that does what you described, it also creates a capture and a back reference
i saw (in http://www.regular-expressions.info/refadv.html) you can avoid this capture/back reference using: (?:aab+c*)?
and this time the first question mark has a different meaning! ;)
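For anyone who wants to see the difference, here is a minimal sketch, assuming Python's re module (my choice for illustration, not necessarily the engine discussed here). The compiled pattern's groups attribute reports how many capturing groups it contains.

```python
import re

capturing = re.compile(r'(aab+c*)?')
non_capturing = re.compile(r'(?:aab+c*)?')

# Number of capturing groups in each pattern.
print(capturing.groups)
print(non_capturing.groups)

# Both patterns match the same text...
m_cap = capturing.match('aabb')
m_non = non_capturing.match('aabb')
print(m_cap.group(0), m_non.group(0))

# ...but only the capturing version remembers what the parentheses matched.
print(m_cap.group(1))
print(m_non.groups())
```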

kind regards
Fool4UAnyway

Re: grouped expressions and back-references

Post by Fool4UAnyway »

Ggtours wrote: one more comment though, your expression: (aab+c*)?
in addition to creating a group that does what you described, it also creates a capture and a back reference
i saw (in http://www.regular-expressions.info/refadv.html) you can avoid this capture/back reference using: (?:aab+c*)?
and this time the first question mark has a different meaning!
As far as I understand, the complete (part of the total) regular expression is (..)? and this expression by itself is not a group because of the functional ? character.

If (..) were to be treated as a back-reference, how would this be satisfied? The regex engine could never replace this, for instance, if the expression wasn't satisfied at all: it is not required to appear just because of the ? following it.

If you wanted to back-reference this part of the total regular expression, you would have to write ((..)?). The outer pair of parentheses is used to create a back reference group, if no (further) functional character(s) follow(s) it. You could then re-use (replace) this part which would contain either (..) or nothing at all.

The link you provided just deals with (..) parts. It does not state anything about (..)?, for instance.
silentguy

Re: grouped expressions and back-references

Post by silentguy »

Fool4UAnyway wrote: As far as I understand, the complete (part of the total) regular expression is (..)? and this expression by itself is not a group because of the functional ? character.

If (..) were to be treated as a back-reference, how would this be satisfied? The regex engine could never replace this, for instance, if the expression wasn't satisfied at all: it is not required to appear just because of the ? following it.

If you wanted to back-reference this part of the total regular expression, you would have to write ((..)?). The outer pair of parentheses is used to create a back reference group, if no (further) functional character(s) follow(s) it. You could then re-use (replace) this part which would contain either (..) or nothing at all.

The link you provided just deals with (..) parts. It does not state anything about (..)?, for instance.

Every pair of parentheses you use creates a group and a possible backreference, unless it starts with a ?, which gives it a special meaning. This is something most people forget when using them just to group things together for modifiers etc. What actually ends up in this group can be pretty chaotic, as you just noticed. This can get even weirder if you write something like "(.)*". Just have a look at http://www.regular-expressions.info/brackets.html, sections "Backreferences to Failed Groups" and "Repetition and Backreferences".
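A rough sketch of both effects, assuming Python's re module. This is an assumption on my part, and the behaviour of backreferences to failed groups in particular varies between regex flavors, so treat the results as specific to Python.

```python
import re

# (a)? is skipped here, so group 1 never takes part in the match.
skipped = re.match(r'(a)?b', 'b')
print(skipped.group(0), skipped.group(1))

# In Python, a backreference to a group that did not take part fails to match.
failed_ref = re.match(r'(a)?b\1', 'b')
print(failed_ref)

# Quantifier outside the group: the group is overwritten on every repetition,
# so only the last repetition is kept.
repeated = re.match(r'(.)*', 'abc')
print(repeated.group(0), repeated.group(1))

# Quantifier inside the group: the group holds the whole run.
whole_run = re.match(r'(.*)', 'abc')
print(whole_run.group(1))
```

In other words, (.)* and (.*) match the same text overall, but what ends up in group 1 is quite different.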