Mark leading / trailing empty lines

A place to try and solve your RegEx problems.
Juergen

Mark leading / trailing empty lines

Post by Juergen »

Let's say I have some text similar to this, where each asterisk (*) stands for an empty line:
-------------
*
*
text
*
text
*
*
text
*
-------------

and I want to mark the leading and trailing empty lines for a double check (and possible later deletion), but NOT any empty line(s) in the text body. That is, I want to end up with:
-------------
-check-
-check-
text
*
text
*
*
text
-check-
-------------

What makes the first and last empty line(s) unique and what regex would find one, or the other, or both?
Fool4UAnyway

Re: Mark leading / trailing empty lines

Post by Fool4UAnyway »

I believe in Text Crawler 2 you can use the ^ and $ characters for this, as they mean start of the file and end of the file.

Leading blank lines

Find:
^(\r\n)+
Replace by:
{leave empty}

This captures all newline characters from the start, as long as there is no (other) text on a line.
To delete, replace by nothing.

Trailing blank lines:

Find:
(\r\n){2,}$
Replace by:
\r\n

This captures all line endings at the end of the file, requiring at least one blank line among them.
To delete the blank lines, replace the match with a single \r\n newline sequence, so that the last line that does contain text is still terminated.
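
For what it's worth, the two replacements above could be sketched in JavaScript roughly like this. It is only a sketch: the sample string and the variable names are made up, and it assumes \r\n line endings and a regex without the m flag, so that ^ and $ refer to the whole string rather than to each line.

```javascript
// Made-up sample: two leading blank lines, blank lines in the body, one trailing blank line.
const sample = "\r\n\r\ntext\r\n\r\ntext\r\n\r\n\r\ntext\r\n\r\n";

// Leading blank lines: without the m flag, ^ matches only at the start of the string.
const noLeading = sample.replace(/^(\r\n)+/, "");

// Trailing blank lines: without the m flag, $ matches only at the very end of the string.
// The match is replaced by one \r\n so the last text line keeps its line ending.
const trimmed = noLeading.replace(/(\r\n){2,}$/, "\r\n");

// JSON.stringify makes the \r\n sequences visible in the output.
console.log(JSON.stringify(trimmed));
```

The blank lines between the text lines are left alone here because neither pattern can match in the middle of the string.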
Guest

Re: Mark leading / trailing empty lines

Post by Guest »

I wish it were that easy.
I can try it only on TC v1.1.4, as I don't have .NET, but I assume the two versions are pretty much the same in this regard.

You missed the requirement that blank line(s) in the text are to be left alone.

^(\r\n)+
catches all blank line(s) (except the last, of course)
and
(\r\n){2,}$
catches only double (or more) blank lines, not the first (as expected) but not the last (a single blank line) either.
And
(\r\n){1,}$ or (\r\n)+$
catches everything.
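
The effect described here can be reproduced in JavaScript, where the m flag plays the role of a "match at every line" mode. This is a sketch with a made-up sample string, not a statement about how TextCrawler works internally.

```javascript
// Made-up sample: two leading blank lines, blank lines in the body, one trailing blank line.
const sample = "\r\n\r\ntext\r\n\r\ntext\r\n\r\n\r\ntext\r\n\r\n";

// With the m flag, ^ matches at the start of every line,
// so every run of blank lines is found, not just the leading one.
const perLine = sample.match(/^(\r\n)+/gm);

// Without the m flag, ^ matches only at the start of the string,
// so only the leading run is found.
const wholeString = sample.match(/^(\r\n)+/g);

console.log(perLine.length, wholeString.length);
```

If a tester reports matches inside the text body for ^(\r\n)+, that suggests its anchors are working per line, as in the first call.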
silentguy

Re: Mark leading / trailing empty lines

Post by silentguy »

If it matches all blank lines, you are using the wrong mode. There is an "anchors match lines" mode and an "anchors match file" mode.
If you replace ^ with \A and $ with \z, it should work just fine, no matter which mode you are in. \z has the added advantage that it matches only at the very end of the file, whereas $ (and \Z) also match just before a line break that ends the file.

Therefore "(\A(\r\n)+|(\r\n){2,}\z)" should do what you want, regardless of modes etc.
You could also use "(\A(\r\n)+|(\r\n)+\z)"; it all depends on how you define an empty line. With the second one, the file ends right after the last line with text; with the first one, it ends at the beginning of the first empty line.
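
For engines without \A and \z, such as JavaScript, the same idea can be approximated with lookarounds. The sketch below rests on two assumptions: the engine supports lookbehind (ES2018 or later), and the names START, END, atLeastTwo and atLeastOne are invented for the example. It shows one concrete difference between the two variants, on input that ends with a single line break.

```javascript
// Emulated anchors (assumption: lookbehind is available, ES2018+).
const START = "(?<![\\s\\S])"; // like \A: no character before this point
const END = "(?![\\s\\S])";    // like \z: no character after this point

// The m flag is set on purpose, to show that the emulated anchors ignore it.
const atLeastTwo = new RegExp(`${START}(?:\\r\\n)+|(?:\\r\\n){2,}${END}`, "gm");
const atLeastOne = new RegExp(`${START}(?:\\r\\n)+|(?:\\r\\n)+${END}`, "gm");

// One leading blank line, one line of text, and only the final line break.
const input = "\r\ntext\r\n";

// The {2,} variant finds only the leading blank line.
const strictMatches = input.match(atLeastTwo);

// The + variant also finds the line break that ends the last text line.
const looseMatches = input.match(atLeastOne);

console.log(JSON.stringify(strictMatches));
console.log(JSON.stringify(looseMatches));
```

Which variant is the right one depends on whether the line break after the last text line should count as part of that line or as an empty line.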
Juergen

Re: Mark leading / trailing empty lines

Post by Juergen »

Thanks for your answer, but I was trying it with the regex tester, and on v1.1.4 there are no flags to set, at least none that I can find. None of your suggestions seems to give the right answer. They all match the blank line(s) in the text, too.
For this purpose, I consider blank lines to be lines with nothing in them, except maybe \r\n.

I should have said this before, but forgot: the final resting place for the regex, once I had it figured out, was going to be a js/html file, which has the advantage that I can run a loop and/or do intermediate replacements.

I am not a programmer, but as far as I can see, ^ and $ apply to each line, and js does not do \A or \z.
silentguy

Re: Mark leading / trailing empty lines

Post by silentguy »

Okay, I don't have 1.1.5 at hand; in 2 the option is called "Multi-Line Anchors" and can be found at the top of the regexp tester and to the right of the normal replace.
Oh, JavaScript uses the ECMAScript regex flavor, which does not support \A, \Z or \z. The JavaScript regexp syntax does support modes, e.g. /test/g searches for "test" globally and /test/i ignores case, but the documentation I checked does not say anything about multiline mode. The only reference I found states that you specifically have to add the /m mode if you want ^ and $ to match at every line instead of only at the start and end of the file: http://www.regular-expressions.info/javascript.html
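
Since the regex is meant to end up in a js/html file, here is a minimal sketch of how the marking from the first post might look there. The function name markOuterBlankLines and the default marker are hypothetical, and the sketch assumes \r\n line endings and regexes without the m flag, so that ^ and $ refer to the whole string.

```javascript
// Hypothetical helper: replaces each leading and trailing empty line with a marker.
function markOuterBlankLines(text, marker = "-check-") {
  // Turns a run of n line breaks into n marked lines.
  const mark = (run) => run.replace(/\r\n/g, marker + "\r\n");

  return text
    // No m flag: ^ is the start of the whole string, so only the leading run matches.
    .replace(/^(?:\r\n)+/, (run) => mark(run))
    // No m flag: $ is the end of the whole string. The first \r\n ends the last
    // text line and is kept; only the line breaks after it are marked.
    .replace(/(\r\n)((?:\r\n)+)$/, (whole, eol, run) => eol + mark(run));
}

// Made-up sample matching the layout in the first post.
const sample = "\r\n\r\ntext\r\n\r\ntext\r\n\r\n\r\ntext\r\n\r\n";
const marked = markOuterBlankLines(sample);

// Printing line by line shows the markers; body blank lines print as empty lines.
console.log(marked.split("\r\n").join("\n"));
```

Doing the leading and trailing runs in two separate replace calls avoids needing one replacement string that fits both alternatives of a combined pattern.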