is this possible?

Post by **mashhood** » Sat Nov 20, 2010 1:38 am

Thanks a lot in advance.
I am working on a project in which my text file mixes Arabic and English text. Let me give you an example:

>>>my name is اسلم and i belong to عربی زبان Arabic language and i am very مسرور happy.<<<

The text above uses both English and Arabic words.

My issue is this: first I want to highlight only the Arabic words in my text file (make them bold or colored, or change their font), and then highlight the English words and apply the same operations to them.

Is it possible to do this with a regular expression?
Kindly guide me.

I would be thankful.

Post by **Frank** » Sun Apr 17, 2011 9:48 pm

I'm not sure if mashhood is still around, but it can be done with zero-width lookaround assertions:
http://www.regular-expressions.info/lookaround.html
These are implemented in TextCrawler but not in many other RegEx tools! A big deal!

The following may have bugs, but this is what I came up with while playing around...

Bold Arabic:

replace (?<!&#\d\d\d\d;[ ,.\-?!"]*)(&#\d\d\d\d;)
"htmlNo not preceded by (htmlNo or htmlNo+separators)"
by $1

replace (&#\d\d\d\d;)(?![ ,.\-?!"]*&#\d\d\d\d;)
"htmlNo not followed by (htmlNo or separators+htmlNo)"
by $1
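
For anyone scripting this outside TextCrawler, here is a minimal Python sketch of the same idea. It is not a translation of the two patterns above: Python's built-in `re` module rejects variable-length lookbehind such as `(?<!...[ ,.\-?!"]*)`, so the sketch swaps in a different technique and matches each whole run of htmlNo entities (with separators between them) in one go. The names `ARABIC_RUN` and `bold_arabic` are made up for illustration, and wrapping in `<b>...</b>` is an assumption about the markup you want.

```python
import re

# One htmlNo, e.g. "&#1575;" (four-digit numeric HTML entity, as in the patterns above).
ENTITY = r"&#\d{4};"
# Same separator class as above: space , . - ? ! "
SEPARATORS = r'[ ,.\-?!"]*'

# A run = one entity followed by any number of (separators + entity).
# Trailing separators after the last entity are NOT part of the run.
ARABIC_RUN = re.compile(rf"{ENTITY}(?:{SEPARATORS}{ENTITY})*")


def bold_arabic(text: str) -> str:
    """Wrap every run of entities in <b>...</b> (assumed markup)."""
    return ARABIC_RUN.sub(lambda m: f"<b>{m.group(0)}</b>", text)


if __name__ == "__main__":
    sample = "my name is &#1575;&#1587;&#1604;&#1605; and i am happy."
    print(bold_arabic(sample))
```

Matching the whole run at once avoids the need for a separate "start of run" and "end of run" replacement, at the cost of not being usable as a plain search-and-replace pair.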

Bold English:

replace (&#\d\d\d\d;[ ,.\-?!"]*)(?![ ,.\-?!"]*&#\d\d\d\d;)
"htmlNo + separators not followed by (separators+)htmlNo"
by $1

replace (?<!&#\d\d\d\d;[ ,.\-?!"]*)([ ,.\-?!"]*&#\d\d\d\d;)
"separators + htmlNo not preceded by (htmlNo or htmlNo+separators)"
by $1

replace ^(\s*)([^\s&]|&(?!#\d\d\d\d;))
"first non-separator at the beginning that is not the start of a htmlNo"
by $1$2

replace ^(\s*)(\S)
"first non-space at the beginning" (could be from a redundant )
by $1$2

replace (\S)(\s*)$
"last non-space at the end" (could be from a redundant )
by $1$2
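
The "Bold English" case can be sketched in Python as the complement of the entity runs: split the text on those runs and wrap whatever lies between them. Again, this is only a sketch under assumptions, not the lookaround method itself. The helper name `bold_english` is hypothetical, `<b>...</b>` is an assumed markup choice, and leaving surrounding whitespace outside the tags is a design choice of this sketch.

```python
import re

# Same run definition as for the Arabic case, but with ONE capturing group
# so that re.split() keeps the runs in its result (at odd indices).
ARABIC_RUN = re.compile(r'(&#\d{4};(?:[ ,.\-?!"]*&#\d{4};)*)')


def bold_english(text: str) -> str:
    """Wrap every stretch between entity runs in <b>...</b> (assumed markup)."""
    parts = ARABIC_RUN.split(text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1 or not part.strip():
            # Odd index: an entity run. Empty or whitespace-only: nothing to bold.
            out.append(part)
            continue
        core = part.strip()
        lead = part[: len(part) - len(part.lstrip())]   # leading whitespace
        trail = part[len(part.rstrip()):]               # trailing whitespace
        out.append(f"{lead}<b>{core}</b>{trail}")
    return "".join(out)


if __name__ == "__main__":
    print(bold_english("my name is &#1575;&#1587; and i am happy."))
```

Skipping empty and whitespace-only pieces means text that begins or ends with an entity run does not pick up an empty bold pair.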