What is wrong with this regex?

A place to try and solve your RegEx problems.

Post by mashhood »

I used this regex to split a text into lines with a given number of words. Here I wanted an odd number of words (5) per line.

Here it is: (([^ ]+ ?){4})
Replaced by: $1\r\n
Now I get lines with an odd number (5) of words, as I wanted. Look:
میرا نام مشہود ھے۔اپ
کا نام کیا ھے۔کل
ملیں گے۔تم میرے بھای
ھو۔چل یار بس
کر۔
(this is Arabic character text)

You can see that the last line got 4 words, while each of the lines above got 5 words.
Why does the last line have 4 words instead of 5, with the 5th word pushed to the next line?
Kindly help.
Thanks in advance for your help.

Post by mashhood »

Sorry about the text above. It was in Arabic, but the forum does not support Unicode, so here is an English text to illustrate my problem.

For example, the text looks like this:

word word word word word
word word word word word
word word word word word
word word word word word
word word word word
word
In my Arabic text the last line got 4 words and the 5th word went to the next line, while all the lines above got exactly 5 words each, as I wanted. So what is wrong? Please guide me.
My regex was: (([^ ]+ ?){4})

Replaced by: $1\r\n

Thanks in advance for your help.

Post by Fool4UAnyway »

I totally missed this one, here. So here's the solution.

>> "
i am using your regex in TextCrawler 2.*
regex is as u told me on help forum (notepad++ forum)

>>>>(([^ ]+ ?){4}) and replaced by>>>$1\r\n

issue is if i put 5or 4 it get 5 words per line and 4 per line respectively but if i put 7 or 9 it still split text 5 words per line.

kindly guide

not just a formality but i am really thankful for you previous and future help.
yours Mashhood
" <<

My Arabic reading skills are not that good, and TextCrawler 2 doesn't show the text in Unicode either, so I decided to work with the text of your mail instead.

I tried to get 7 words per line, but I didn't succeed at first. I noticed that on the Matches (Text) tab page of the Regular Expression Tester some (sequences of) words were skipped. This turned out to be caused by multiple consecutive space characters occurring around those words.

So I improved my regex as below:

(([^ \r\n]+ *){7})

_ *_ (a space followed by an asterisk) means: any run of consecutive space characters, including none at all.
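
To see the effect of the extra spaces outside TextCrawler, here is a small sketch using Python's re module. Python's engine is not the one TextCrawler uses, so details may differ; the sample text, the group size of 3 and the helper name groups_of_words are made up for illustration.

```python
import re

# Single-letter "words" keep the trace simple: a one-character word
# cannot be split into several pieces by backtracking.
text = "a b  c d e f"  # note the double space after "b"

original = re.compile(r"(([^ ]+ ?){3})")      # at most one space after each word
improved = re.compile(r"(([^ \r\n]+ *){3})")  # any run of spaces after each word


def groups_of_words(pattern, s):
    """Return the text of every non-overlapping match of pattern in s."""
    return [m.group(1) for m in pattern.finditer(s)]


print(groups_of_words(original, text))  # ['c d e ']
print(groups_of_words(improved, text))  # ['a b  c ', 'd e f']
```

With the original pattern the words "a" and "b" are never matched, because the second space stops the third repetition; the improved pattern consumes both spaces and groups all six words.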

I also added the newline characters \r and \n to the excluded characters, because TextCrawler 2 looks for matches across line boundaries. Without that exclusion, a newline was inserted after every run of 7 words counted globally, which meant some lines were broken after only their first word.

Here's the correct Full Text with Replacements:

i am using your regex in TextCrawler
2.*
regex is as u told me on
help forum (notepad++ forum)


>>>>(([^ ]+ ?){4}) and replaced by>>>$1\r\n


issue is if i put 5or 4
it get 5 words per line and
4 per line respectively but if i
put 7 or 9 it still split
text 5 words per line.


kindly guide


not just a formality but i am
really thankful for you previous and future
help.
yours Mashhood

As you can see, there are never more than 7 words on a line and, where applicable, the counting starts anew with each real new line of the input text. Try the regex without \r\n to see the difference: I think you will also prefer the "absolute" (per line) count to the "global" count.
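
The difference between the two counts can be sketched in Python as well. This is only an approximation of what TextCrawler does: it uses Python's re module, a group size of 3 instead of 7, \n instead of \r\n in the replacement, and a made-up sample text and helper name (break_lines).

```python
import re

text = "a b\nc d e f"  # two input lines: 2 words, then 4 words

# Without \r\n in the class, a newline is just another non-space
# character, so "b\nc" is counted as one word ("global" count).
global_count = re.compile(r"(([^ ]+ *){3})")

# With \r\n excluded, a newline ends the current word and the current
# run, so counting restarts on every input line ("per line" count).
per_line_count = re.compile(r"(([^ \r\n]+ *){3})")


def break_lines(pattern, s):
    """Append a newline after every match, like replacing with $1 plus a newline."""
    return pattern.sub(r"\1\n", s)


print(repr(break_lines(global_count, text)))    # 'a b\nc d \ne f'
print(repr(break_lines(per_line_count, text)))  # 'a b\nc d e \nf'
```

In the global version the second input line is cut after "c d", because "a" and "b\nc" already used up two of the three slots. In the per-line version the short first line is left alone and the second line gets its full three words.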

Notice also that you cannot combine multiple lines to form strings of 7 words if the original lines each contain fewer than 7 words. You could, however, first join all the individual lines into one long line, if you want this, and then apply the word break regex to the result.
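
A minimal sketch of that two-step approach in Python, assuming it is acceptable to collapse every run of whitespace into a single space while joining. The function name rewrap and the sample text are hypothetical, and Python's re module stands in for TextCrawler.

```python
import re


def rewrap(text, n):
    """Join all lines into one, then break after every n words."""
    # str.split() with no argument splits on any whitespace, including
    # newlines, so this puts all words on a single line.
    joined = " ".join(text.split())
    pattern = re.compile(r"(([^ \r\n]+ *){%d})" % n)
    return pattern.sub(r"\1\n", joined)


# Six words spread unevenly over three lines; six is an exact
# multiple of 3, so there is no shorter remainder at the end.
text = "one two\nthree four five\nsix"
print(repr(rewrap(text, 3)))  # 'one two three \nfour five six\n'
```

The trailing space before each inserted newline comes from the space run being part of the match, just as in the replacements above.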

In Notepad++'s regular expression mode you cannot search across newline characters. There you already had to put the * asterisk after the word-breaking space character, because the ? question mark is not implemented in Notepad++'s (Scintilla's) regular expression engine.

I understand this solved your problem, at least, for now.