Removing numbers from a list of URLs

Posted: Tue Oct 19, 2010 9:55 pm
by Fool4UAnyway
Find:
^\d+ :

Replace: (leave empty)

^ ____ Matches only at the start of a line
\d ___ Matches a digit, 0 to 9
+ ____ Requires at least one occurrence and accepts any number of consecutive ones
\d+ __ So this matches a complete (integer) number
_____ A space character comes before the colon
: ____ The colon itself
_____ Don't forget the second space character, after the colon
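
For anyone who wants to try the pattern outside TextCrawler, here is a minimal sketch in Python's re module. Python is my assumption here, not something used in this thread, and the sample lines are made up to fit the "number : URL" layout the pattern implies. strip_prefix is a hypothetical helper name.

```python
import re

# Same pattern as above, written in verbose mode so each piece can be commented.
# In verbose mode a literal space has to be written as [ ].
NUMBER_PREFIX = re.compile(r"""
    ^       # start of the string (each line is handled separately below)
    \d+     # one or more digits
    [ ]     # the space before the colon
    :       # the colon itself
    [ ]     # the second space, after the colon
""", re.VERBOSE)

def strip_prefix(line):
    """Remove a leading 'number : ' prefix from a single line, if present."""
    return NUMBER_PREFIX.sub("", line)

# Made-up sample lines; the last one is missing the second space on purpose.
lines = [
    "1 : http://example.com/a",
    "25 : http://example.com/b",
    "100 :http://example.com/c",
]
cleaned = [strip_prefix(line) for line in lines]
```

The last sample line is left unchanged, which shows why the second space matters: the whole prefix has to match or nothing is removed.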

You might as well use a shortcut.

Find:
^[^h]+

Replace: (leave empty)

[] ____ groups characters to match
[^] ___ groups characters not to match
[^h] __ accept any character that is not h
[^h]+ _ accept any string of consecutive characters that are not h, i.e. accept anything up to the first h, or just anything if there is no h at all
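
The shortcut can be sketched the same way, again in Python's re module rather than TextCrawler (my assumption), with made-up sample lines and a hypothetical helper name, strip_to_first_h. It also shows the catch mentioned above: a line with no h in it is wiped completely.

```python
import re

def strip_to_first_h(line):
    """Remove everything from the start of the line up to the first 'h'."""
    return re.sub(r"^[^h]+", "", line)

# Typical case: the prefix "7 : " is removed, the URL stays.
with_url = strip_to_first_h("7 : http://example.com/page")

# No "h" anywhere in the line: the pattern swallows the whole line.
no_h = strip_to_first_h("7 : ftp://example.com/file")

# Line already starts with "h": nothing matches, the line is unchanged.
starts_with_h = strip_to_first_h("http://example.com/page")
```

So the shortcut is only safe when every line really contains an http URL.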

Posted: Tue Oct 19, 2010 10:05 pm
by Devin
Wow thanks! I was reading some tutorials on reg ex but had a hard time understanding it. Thanks for breaking it down for me. Bookmarked!

Posted: Tue Oct 19, 2010 10:49 pm
by Devin
For some reason I can't get it to work right. It's only deleting the first number.

Would you mind downloading the text file and trying it for yourself? I uploaded it to RapidShare here: http://rapidshare.com/files/426054731/filter-test.txt

Posted: Tue Oct 19, 2010 11:08 pm
by Fool4UAnyway
What regex are you using?

"^\d+ : "
"^[^h]+"

You can try in the Regex Tester

Posted: Tue Oct 19, 2010 11:14 pm
by Devin
I've tried both regexes and they don't seem to work for some reason. I have no idea why; it's only working for the first line in my .txt file. All the other lines are in the exact same format, and the file goes to 100 lines. No idea why it's only picking up the first. That's why I was asking if you could grab that .txt from RapidShare and test it yourself. I'm not sure if the formatting is weird in the .txt or what.

Posted: Tue Oct 19, 2010 11:26 pm
by Fool4UAnyway
Works for me. File opened in Reg Ex Tester, used both regexes.

Try changing them a little and see what happens.
Perhaps try without the leading ^ first, or just \d, \d+ or [^h]+.

Posted: Tue Oct 19, 2010 11:29 pm
by Devin
Ok it worked without using the ^

Woo hoo thanks. Big help!

Posted: Tue Oct 19, 2010 11:30 pm
by Fool4UAnyway
Try to get anything working that starts with ^: try ^. and see what happens, then ^.{5}, and also try .{10}$

Posted: Tue Oct 19, 2010 11:38 pm
by Fool4UAnyway
My fault, I am still using Text Crawler 1.1.4.

Text Crawler 2.0 has a new .NET Regex Engine:
http://www.digitalvolcano.co.uk/content ... ler/manual

Position Matching:

^ Only match the beginning of a file.
$ Only match the ending of a file.

\b Matches any word boundary
\B Matches any non-word boundary
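
That would explain why only the first number was deleted: if ^ matches only at the beginning of the file, the pattern can match only once. Here is a small sketch of the difference in Python's re module, where ^ also matches only at the very start of the text unless multiline mode is switched on. The sample text is made up. (?m) is the inline multiline option, which .NET regexes support as well; whether TextCrawler 2.0 accepts it in the Find box is an assumption I have not checked.

```python
import re

# Made-up sample in the "number : URL" layout, three lines in one string.
text = (
    "1 : http://a.example\n"
    "2 : http://b.example\n"
    "3 : http://c.example"
)

# Default mode: ^ matches only at the start of the whole text,
# so only the first prefix is removed.
file_anchored = re.sub(r"^\d+ : ", "", text)

# Multiline mode via the inline option (?m): ^ matches at the start
# of every line, so every prefix is removed.
line_anchored = re.sub(r"(?m)^\d+ : ", "", text)
```

Dropping the ^ altogether, as was done earlier in this thread, also works for this file, but it would then remove a "number : " sequence anywhere in a line, not just at the start.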