Dialog Fixing.

A place to try and solve your RegEx problems.
Cerest
Posts: 2
Joined: Sun Sep 11, 2011 9:53 pm

Dialog Fixing.

Post by Cerest »

Hello,

I'm trying to find a function that will find the following.

Find: "Text[. ! ?][linebreak][linebreak]Text[. ! ?]"
Replace: "Text[. ! ?]Text[. ! ?]"

Everything seems straightforward, but the Text part of it is confusing. These text sections vary in length and in the characters they contain (numbers, letters and hyphens). The [. ! ?] stands for the acceptable characters at the end of a sentence.
Fool4UAnyway

Re: Dialog Fixing - the Text part

Post by Fool4UAnyway »

These text sections are of varying lengths,
varying characters (numbers and letters and hyphens).
You might just specify these characters as acceptable ones:
[a-zA-Z0-9\-;:]

You may specify a minimal number, if you like:
* for really any number, also none of them
+ to require at least one
{10,} to require at least ten with no maximum

So, the Text part could be:
[a-zA-Z0-9\-;:]+

This requires at least one acceptable character, but any more of them are accepted as well.
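As a rough illustration of the three quantifiers above, here is a minimal sketch using Python's re module (an assumption: the thread does not name a regex engine, and the sample strings are made up). It only checks whether a whole string is accepted by the Text class with each quantifier.

```python
import re

# The "Text" character class from the post above.
TEXT = r"[a-zA-Z0-9\-;:]"

# * accepts any number of characters, including none at all.
star_accepts_empty = re.fullmatch(TEXT + "*", "") is not None

# + requires at least one character, so the empty string is rejected.
plus_accepts_empty = re.fullmatch(TEXT + "+", "") is not None

# {10,} requires at least ten characters with no maximum.
ten_accepts_short = re.fullmatch(TEXT + "{10,}", "abc-123") is not None        # 7 characters
ten_accepts_long = re.fullmatch(TEXT + "{10,}", "abc-123;xyz:9") is not None   # 13 characters

print(star_accepts_empty, plus_accepts_empty, ten_accepts_short, ten_accepts_long)
```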

[. ! ?]

You might use [.!?] without any space character specified, or [.!? ] with the space character specified only once.

You could also use these to specify what you accept as Text: any character that does not match any of the characters at the end of a sentence.

[a-zA-Z0-9\-;:]+

might as well be specified as

[^.!?]+

^ at the start of a character class means "not", so this excludes only the end-of-sentence characters. The space character is still accepted, and in most regex flavors so are the linebreak characters; use [^.!?\r\n]+ if the Text must stay on one line. In effect, this accepts any other character that you might not have listed in the other Text regex part, which helps because it may be hard to list all characters that could be part of Text (" ~ ' / etc.).
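A small sketch of the negated class, again assuming Python's re module and a made-up sample line. In Python a negated class also matches line break characters, so the second pattern adds \r\n to the excluded set to keep the match on one line; other engines may behave differently.

```python
import re

# Hypothetical sample: characters such as ' ~ / are not in the explicit class,
# but the negated class accepts them.
line = "It's ~ok~ / fine\r\nnext."

# Everything up to the first end-of-sentence character, line break included.
across_lines = re.match(r"[^.!?]+", line).group()

# Same idea, but line break characters are excluded as well.
single_line = re.match(r"[^.!?\r\n]+", line).group()

print(repr(across_lines))
print(repr(single_line))
```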
Fool4UAnyway

Re: Dialog Fixing - the Text part

Post by Fool4UAnyway »

You could capture the found texts by surrounding them with parentheses.
([a-zA-Z0-9\-;:]+)\r\n([a-zA-Z0-9\-;:]+)

You can then replace these parts by referring to their sequential number:
$1$2
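The capture-and-replace step can be tried out with a short Python sketch. Note the assumptions: Python's re module writes backreferences in the replacement as \1 and \2 rather than $1 and $2, and the sample text is invented.

```python
import re

# Two captured Text parts separated by a Windows line break.
pattern = r"([a-zA-Z0-9\-;:]+)\r\n([a-zA-Z0-9\-;:]+)"

text = "first-part\r\nsecond-part"

# \1\2 in Python corresponds to $1$2 in the replacement shown above.
result = re.sub(pattern, r"\1\2", text)

print(repr(result))
```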

But if all you want to do is remove the linebreaks, you could search only for the newline characters and replace them with nothing, effectively removing them.

Find:
\r\n

Replace by:
{leave empty}
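In Python, the same find-and-remove looks roughly like the sketch below (sample strings are made up). The second call uses \r?\n, which is an addition not in the original post, for text that has Unix line endings instead of \r\n.

```python
import re

text = "line one\r\nline two\r\nline three"

# Replace every Windows line break with nothing.
joined = re.sub(r"\r\n", "", text)

# Assumption: if the text uses Unix line endings, make the \r optional.
unix_text = "line one\nline two"
joined_any = re.sub(r"\r?\n", "", unix_text)

print(repr(joined))
print(repr(joined_any))
```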

If there are any linebreaks that you do NOT want to be removed, you may specify any expression before or after the newline characters to include only the lines that you want to affect, or exclude any that you want to leave as they are.

For example, if a line already ends with an end-of-sentence character, you could exclude such lines as shown below.

Find:
([^.!?] *)\r\n

Replace by:
$1

This will exclude lines that end directly in a . ! or ? from being affected. Note that a space character also matches [^.!?], so a line that ends in . ! or ? followed by trailing white space will still be matched.

You have to capture the character(s) matched before the linebreak so that you can put them back in the replacement (without the newline characters). Otherwise, they would be lost too.
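A minimal Python sketch of this selective removal, assuming the re module (where $1 is written \1) and an invented three-line sample. Only the line break after the unfinished line is removed; the one after "up." stays.

```python
import re

# First line does not end a sentence; second line does.
text = "This sentence is \r\nbroken up.\r\nNext one."

# Capture the last non-sentence-ending character plus any trailing spaces,
# and put them back without the line break.
result = re.sub(r"([^.!?] *)\r\n", r"\1", text)

print(repr(result))
```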

You could remove the trailing (white) space characters by slightly changing this regular expression.

Find:
([^.!?])[ \t]*\r\n

Replace by:
$1

The white space characters are no longer part of the captured text, so any run of space or tab characters (\t) at the end of the line is removed as well.
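The trailing-white-space variant behaves as in the following sketch (Python's re module and the sample text are assumptions). Because the spaces and tabs sit outside the capture group, the two lines are joined with nothing between them; put a space in the replacement if the words should stay separated.

```python
import re

# Trailing space, tab and space before the first line break.
text = "This sentence is \t \r\nbroken up.\r\n"

# Only the last non-sentence-ending character is captured and kept.
result = re.sub(r"([^.!?])[ \t]*\r\n", r"\1", text)

print(repr(result))
```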
Cerest

Re: Dialog Fixing.

Post by Cerest »

Thank you,

This was what I used.


(“[a-zA-Z0-9\-;: ’.?,‘]+)\r\n\r\n([a-zA-Z0-9\-;: ’.?!,‘]+”)


$1$2
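For anyone who wants to try this final pattern, here is a hedged Python sketch. The dialog line is a made-up example, and Python's \1\2 stands in for $1$2. As posted, the first character class does not contain !, so a first half that includes ! would not match.

```python
import re

# The pattern exactly as posted above, including the curly quotes.
pattern = r"(“[a-zA-Z0-9\-;: ’.?,‘]+)\r\n\r\n([a-zA-Z0-9\-;: ’.?!,‘]+”)"

# Hypothetical dialog split by an empty line.
text = "“Hello there,\r\n\r\nhow are you?”"

# Join the two halves of the quoted dialog.
result = re.sub(pattern, r"\1\2", text)

print(result)
```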