
First attempt doesn't seem to work

Posted: Tue May 28, 2013 7:09 pm
by SafeTex
Hello

I'm not very good with regular expressions, but someone gave me one that works in a program called Xbench:

<([[:digit:][:letter:]]+_)+[[:digit:][:letter:]]+>

It should find reference numbers like

ABC_123_DEFG-4567

etc of any length

The problem is that when I try to run it in TextCrawler (TC), which I'm not really familiar with either, the search only takes a few seconds (instead of a few hours) on a VERY big txt file and there are 0 matches (Xbench finds just over 121,000).

The reason I can't use Xbench is that it doesn't have an extract function, and I need my matches extracted to a txt file.

I've tried testing the expression in TC's tester, but that doesn't seem to work for me either.

The problem might not be the regex but that I'm using TC wrongly (I've indicated the path, put the expression in the box and asked it to 'extract').

Can anyone help, please?

Thanks in advance

Re: First attempt doesn't seem to work

Posted: Wed May 29, 2013 7:27 pm
by SafeTex
Hello again

No hero to come to help me? :(

Ok, let's start with something simple

Why does [:digit:] find all the letters D, I, G, I, T in my file and not the digits 0-9? (No surprise then that my regular expression does not work :D)

What flavour of regular expressions should I try to use in TextCrawler?
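
One possible explanation, sketched here with Python's re module as a stand-in (an assumption; TextCrawler's own engine may behave differently): an engine without POSIX class support reads [:digit:] as an ordinary set containing the characters ':', 'd', 'i', 'g' and 't', so it matches those letters rather than digits.

```python
import re

sample = "digit: 2013"

# Python's re has no POSIX classes, so this is just the set {':', 'd', 'i', 'g', 't'}.
posix_style = re.findall(r"[:digit:]", sample)

# An explicit range names the digits directly.
explicit_range = re.findall(r"[0-9]", sample)

print(posix_style)     # the letters and the colon, no digits
print(explicit_range)  # only the digits
```

With a case-insensitive search, the same set would also pick up the capital letters D, I, G and T.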

Is there a manual?

Is there anyone there please?

Regards

Re: First attempt doesn't seem to work

Posted: Sat Jun 01, 2013 10:33 pm
by Steve_L
Hi SafeTex... I am new to regular expressions and TextCrawler too... but did you try this?

\D+\d+\D+\d+

\D matches a character that is not a digit
\D+ matches one or more characters that are not digits

\d matches any single digit
\d+ matches one or more digits
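
To see how the four tokens above combine, here is a small sketch using Python's re module (an assumption for illustration; it is not TextCrawler, and the sample text is made up). Because \D also matches spaces, underscores and hyphens, the match can swallow text around the reference number.

```python
import re

# Made-up sample line containing the reference number from the first post.
text = "ref ABC_123_DEFG-4567 end"

# non-digits, digits, non-digits, digits
matches = re.findall(r"\D+\d+\D+\d+", text)

print(matches)  # ['ref ABC_123_DEFG-4567']
```

The leading "ref " is part of the match, so this pattern would probably need tightening before the results are extracted.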

Re: First attempt doesn't seem to work

Posted: Wed Jun 05, 2013 8:45 pm
by SafeTex
Hello Steve L

Thanks for the possible answer, but I eventually figured out for myself how to simplify the original expression for TextCrawler.

I used [0-9] because the engine does not seem to recognise [:digit:], or any other named set for that matter.
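
The simplified expression itself is not shown in this thread. As a rough sketch only, this is what a version with explicit ranges might look like in Python's re module, together with the extraction to a txt file. The assumptions: "<" and ">" in the original are word boundaries (written \b here), [:letter:] becomes A-Za-z, and the function name and file paths are hypothetical.

```python
import re

# Assumed translation of the original: explicit ranges instead of named sets,
# \b instead of "<" and ">", and a non-capturing group for the repeated part.
REFERENCE = re.compile(r"\b(?:[0-9A-Za-z]+_)+[0-9A-Za-z]+\b")


def extract_references(in_path, out_path):
    """Write every match found in in_path to out_path, one per line.

    Returns the number of matches written.
    """
    count = 0
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        # Read line by line so a very big file is never held in memory at once.
        for line in src:
            for match in REFERENCE.finditer(line):
                dst.write(match.group(0) + "\n")
                count += 1
    return count
```

Note that, like the original, this pattern has no hyphen in it, so on ABC_123_DEFG-4567 it would stop at ABC_123_DEFG.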

Regards

SafeTex