Regex replace results in strange characters

Tool for Search and Replace across multiple files.
Juergen


Post by Juergen »

I have the following situation I cannot solve. I have:
Line1












Line2

I want to end up with:
Line1

Line2
That is, I want to collapse two or more empty lines down to just one.

I tried this RegEx expression:
(\r\n *){3,}
and replace with:
\r\n\n
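The effect of this pattern and replacement can be reproduced outside TextCrawler to see exactly which characters end up in the file. A minimal sketch, assuming Python's `re` module behaves like TextCrawler's engine for this simple pattern and that the file uses Windows CR LF line endings. The `fixed` replacement is only a suggestion (a full CR LF pair for both breaks), not something tried in this thread:

```python
import re

# Sample input: "Line1", twelve empty lines, "Line2", with Windows CR LF line endings.
text = "Line1\r\n" + "\r\n" * 12 + "Line2\r\n"

# The pattern from the post: three or more line breaks, each optionally
# followed by spaces, i.e. two or more empty lines.
pattern = re.compile(r"(\r\n *){3,}")

# Replacement as posted: CR LF followed by a bare LF.
original = pattern.sub("\r\n\n", text)

# Suggested alternative (assumption): two complete CR LF pairs.
fixed = pattern.sub("\r\n\r\n", text)

print(repr(original))  # 'Line1\r\n\nLine2\r\n'
print(repr(fixed))     # 'Line1\r\n\r\nLine2\r\n'
```

All 13 line breaks between the two text lines are matched in one go, so the second line break in the result is whatever the replacement string says it is.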

In the Regular Expression Tester it looks as expected. When I apply it to a file, it also looks OK in the TextCrawler preview window, and opening the file in Word looks as expected too. However, opening it in an editor like (Windows) Notepad (or TEDNotepad) results in:
Line1

Line2

Further investigation with other text editors (PSPad, KDiff3, WinMerge, Crimson Editor, Plato3, TextPad, RJ TextEd, UnicEdit, WordPad), which all display it correctly, tells me I have to show graphically what happens.
With (Windows) Notepad (or TEDNotepad) it looks like:
Line1[**][*]
Line2
Here [**] stands for two rectangles that nevertheless seem to count as only one character, and [*] stands for a single rectangle, also one character. They are probably the [\r\n] and [\n] from the replacement expression.
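To make the invisible characters visible without depending on how a particular editor draws them, a small script can print every CR and LF byte as a marker. `show_breaks` is a hypothetical helper, not part of TextCrawler, and it assumes plain ASCII text:

```python
def show_breaks(data: bytes) -> str:
    """Render CR (0x0D) and LF (0x0A) bytes as visible markers (assumes ASCII text)."""
    out = []
    for b in data:  # iterating over bytes yields integers
        if b == 0x0D:
            out.append("[CR]")
        elif b == 0x0A:
            out.append("[LF]")
        else:
            out.append(chr(b))
    return "".join(out)

# Result of the posted replacement on CR LF input: a CR LF pair followed by a bare LF.
print(show_breaks(b"Line1\r\n\nLine2"))  # Line1[CR][LF][LF]Line2
```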

Obviously I want it to look "right" in all cases. Can you help me?

Juergen

P.S. If this is confusing, since this forum window also shows it correctly, I can send screen dumps as PDF files if you tell me how.
DV

Post by DV »

This seems to work OK for me; it looks good in normal Notepad too. I'm not sure about having two \n's in the replace expression, though.
Perhaps the file you are working on has some Unicode in it? TC (TextCrawler) doesn't handle Unicode text files properly.

Juergen

Post by Juergen »

I drafted the post in Notepad and copy/pasted it into the forum. When I copy/paste the post back out into Notepad, it looks OK for me too. Apparently the copy/paste fixes it.
I can send a PDF and the .txt file if you give me an email address I can attach files to.
(The two \n's are left over from trying "this, that and something else".)
DV

Post by DV »

Sure, send to software (at) digitalvolcano.co.uk.


Juergen

Post by Juergen »

I was sick for a few days, so I could not get back immediately. To narrow down the problem, I made new, minimal files and had trouble reproducing it. Investigating further, all the way down to the hex codes, I found the solution.

And, for the curious-minded: Notepad is fussy about how a new line is coded (when it is inserted via regex). <CR><NL> (\r\n) is OK; <NL><CR> (\n\r) is not. Who would have thought? Just like a manual typewriter.
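Before opening a processed file in Notepad, one way to check it against this rule is to list every line break that is not a CR LF pair. A minimal sketch under the rule stated in the post (only CR LF is safe); `bad_breaks` is a hypothetical helper and assumes Python 3.9 or later:

```python
import re

# Alternation order matters: CR LF is tried first, so a correct pair is never
# split into two lone characters. A reversed LF CR shows up as two separate hits.
BREAK = re.compile(rb"\r\n|\r|\n")

def bad_breaks(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, bytes) for every line break that is not CR LF."""
    return [(m.start(), m.group()) for m in BREAK.finditer(data) if m.group() != b"\r\n"]

print(bad_breaks(b"Line1\r\n\nLine2"))  # [(7, b'\n')]
print(bad_breaks(b"Line1\n\rLine2"))    # [(5, b'\n'), (6, b'\r')]
```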

During copy/paste, other things happen.
This hex sequence:
41 0a 0D 0A 0d 42 0D 0A 0D 0A 43
changes to:
41 0d 0a 0D 0A 0d 0a 42 0D 0A 0D 0A 43

So apparently a lone 0a or 0d is completed to 0D 0A. In this example that happens twice, turning two returns into three.
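The completion rule can be checked against the hex example. A minimal sketch, assuming the paste treats a CR LF pair, a lone CR, and a lone LF each as one line break and writes CR LF for each; this models the observed bytes and is not documented clipboard behavior. `normalize_to_crlf` is a hypothetical name:

```python
import re

def normalize_to_crlf(data: bytes) -> bytes:
    """Replace every CR LF, lone CR, and lone LF with CR LF (assumed model of the paste)."""
    # CR LF is listed first so an existing pair is kept as one break.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)

before = bytes.fromhex("41 0a 0d 0a 0d 42 0d 0a 0d 0a 43")
after = normalize_to_crlf(before)
print(after.hex(" "))  # 41 0d 0a 0d 0a 0d 0a 42 0d 0a 0d 0a 43
```

Under this model the 0a after 41 and the 0d before 42 are each completed to 0d 0a, which reproduces the "changes to" bytes from the dump. The same function could also serve as a cleanup step before opening a file in Notepad.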