is this possible?

A place to try and solve your RegEx problems.

Post by mashhood »

Thanks a lot in advance.
I am doing a project in which I use both Arabic and English text in the same text file. Let me give you an example:

>>>my name is &#1575;&#1587;&#1604;&#1605; and i belong to &#1593;&#1585;&#1576;&#1740; &#1586;&#1576;&#1575;&#1606; Arabic language and i am very &#1605;&#1587;&#1585;&#1608;&#1585; happy.<<<

The text above uses both English and Arabic words.

My issue is that I first want to highlight only the Arabic words in my text file (make them bold or colored, or change their font), and then highlight the English words and do the same operations on them.


Is it possible to do this with a regular expression?
Kindly guide me.

I would be thankful.


Post by Frank »

I'm not sure if mashhood is still around, but it can be done with lookarounds (zero-width assertions):
http://www.regular-expressions.info/lookaround.html
These are implemented in TextCrawler. The lookbehinds below contain a * quantifier, so they are variable-length, which many other RegEx tools do not support. A big deal!

The following may have bugs, but it's what I came up with while playing around...

Bold Arabic:

replace (?<!&#\d\d\d\d;[ ,.\-?!"]*)(&#\d\d\d\d;)
"htmlNo not preceded by (htmlNo or htmlNo+separators)"
by <b>$1

replace (&#\d\d\d\d;)(?![ ,.\-?!"]*&#\d\d\d\d;)
"htmlNo not followed by (htmlNo or separators+htmlNo)"
by $1</b>
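
For engines without variable-length lookbehind, roughly the same result can be had by matching a whole run of entities in one go instead of marking its two ends separately. Below is a minimal Python sketch of that alternative, not TextCrawler syntax. The names ENTITY, SEP, ARABIC_RUN and bold_arabic are made up for illustration, and it assumes, like the patterns above, that every Arabic character is written as a four-digit &#NNNN; entity.

```python
import re

# One Arabic character written as a four-digit numeric entity (assumption).
ENTITY = r"&#\d{4};"
# Same separator set as in the patterns above.
SEP = r'[ ,.\-?!"]*'
# An entity, followed by any number of (separators + entity) groups.
ARABIC_RUN = re.compile(ENTITY + "(?:" + SEP + ENTITY + ")*")


def bold_arabic(text):
    """Wrap each run of entities (with inner separators) in <b>...</b>."""
    return ARABIC_RUN.sub(lambda m: "<b>" + m.group(0) + "</b>", text)


if __name__ == "__main__":
    print(bold_arabic("my name is &#1575;&#1587;&#1604;&#1605; and"))
    # prints: my name is <b>&#1575;&#1587;&#1604;&#1605;</b> and
```

Because the run has to end on an entity, trailing separators stay outside the bold tags, which is what the two replacements above aim for as well.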


Bold English:

replace (&#\d\d\d\d;[ ,.\-?!"]*)(?![ ,.\-?!"]*&#\d\d\d\d;)
"htmlNo + separators not followed by (separators+)htmlNo"
by $1<b>

replace (?<!&#\d\d\d\d;[ ,.\-?!"]*)([ ,.\-?!"]*&#\d\d\d\d;)
"separators + htmlNo not preceded by (htmlNo or htmlNo+separators)"
by </b>$1

replace ^(\s*)([^\s&]|&(?!#\d\d\d\d;))
"first non-separator at the beginning that is not the start of a htmlNo"
by $1<b>$2

replace ^(\s*)(\S)
"first non-space at the beginning" (could be from a redundant </b>)
by $1<b>$2

replace (\S)(\s*)$
"last non-space at the end" (could be from a redundant <b>)
by $1</b>$2
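
The English case can also be sketched without lookarounds by splitting the text on the entity runs and wrapping whatever lies in between. This is a rough Python sketch of that alternative, not a translation of the replacements above. The names ENTITY, SEP, ARABIC_RUN and bold_english are hypothetical, and it again assumes four-digit &#NNNN; entities. Unlike the patterns above, it only keeps surrounding whitespace outside the tags, so punctuation next to an English word ends up inside the bold.

```python
import re

ENTITY = r"&#\d{4};"
SEP = r'[ ,.\-?!"]*'
# The outer capturing group makes re.split keep the Arabic runs in its result.
ARABIC_RUN = re.compile("(" + ENTITY + "(?:" + SEP + ENTITY + ")*)")


def bold_english(text):
    """Wrap every stretch of text between entity runs in <b>...</b>."""
    pieces = ARABIC_RUN.split(text)  # even index: other text, odd index: Arabic run
    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1 or not piece.strip():
            out.append(piece)  # Arabic run, or empty/whitespace-only: leave as is
            continue
        lead = piece[:len(piece) - len(piece.lstrip())]   # leading whitespace
        trail = piece[len(piece.rstrip()):]               # trailing whitespace
        out.append(lead + "<b>" + piece.strip() + "</b>" + trail)
    return "".join(out)


if __name__ == "__main__":
    print(bold_english("i am &#1605;&#1587; happy."))
    # prints: <b>i am</b> &#1605;&#1587; <b>happy.</b>
```

Doing it in one pass avoids the redundant <b> and </b> tags at the start and end that the last replacements above have to deal with.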