Removing numbers from a list of URLs

Fool4UAnyway · Post by **Fool4UAnyway** » Tue Oct 19, 2010 9:55 pm

Find:
^\d+ :

Replace: (leave empty)

^ ____ You want to match at the start of a line only
\d ___ Matches a single digit, 0 to 9
+ ____ Requires at least one occurrence and accepts any number of consecutive ones
\d+ __ So this matches a complete (integer) number
" " __ A space character before the colon
: ____ The colon itself
" " __ Don't forget the second space character, after the colon
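
For anyone who wants to try the same pattern outside Text Crawler, here is a minimal sketch in Python. The sample lines are made up (the real file is not shown in this thread), and Python's `re` module is only a stand-in for Text Crawler's engine. The `re.MULTILINE` flag is what lets `^` match at the start of every line.

```python
import re

# Hypothetical sample data; the actual file from the thread is not available.
text = (
    "1 : http://example.com/a\n"
    "22 : http://example.com/b\n"
    "333 : http://example.com/c"
)

# ^    start of a line (because of re.MULTILINE)
# \d+  one or more digits
# " : " a space, a colon, and a second space
pattern = re.compile(r"^\d+ : ", re.MULTILINE)

# Replace every match with an empty string, i.e. delete it.
cleaned = pattern.sub("", text)
print(cleaned)
```

Each line should come out as the bare URL, with the number and the " : " separator removed.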

You might as well use a shortcut.

Find:
^[^h]+

Replace: (leave empty)

[] ____ groups characters to match
[^] ___ groups characters not to match
[^h] __ accept any character that is not h
[^h]+ _ accept any string of consecutive characters that are not h, i.e. accept anything up to the first h, or just anything if there is no h at all
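
The shortcut can be sketched the same way in Python, again with made-up sample lines and with `re` standing in for Text Crawler's engine. One assumption to keep in mind: in Python, `[^h]` also matches a newline, so a line without any h would be swallowed together with the start of the next line.

```python
import re

# Hypothetical sample data; every line is assumed to contain "http".
text = "1 : http://example.com/a\n22 : http://example.com/b"

# ^      start of a line (because of re.MULTILINE)
# [^h]+  one or more characters that are not "h",
#        so the match stops right before the "h" of "http"
shortcut = re.compile(r"^[^h]+", re.MULTILINE)

cleaned = shortcut.sub("", text)
print(cleaned)
```

This only works because nothing before the URL contains an h; a prefix such as "host 1 : " would stop the match too early.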

Devin · Post by **Devin** » Tue Oct 19, 2010 10:05 pm

Wow thanks! I was reading some tutorials on reg ex but had a hard time understanding it. Thanks for breaking it down for me. Bookmarked!

Devin · Post by **Devin** » Tue Oct 19, 2010 10:49 pm

For some reason I can't get it to work right. It's only deleting the first number.

Would you mind downloading the text file and trying it for yourself? I uploaded to rapidshare here: http://rapidshare.com/files/426054731/filter-test.txt

Fool4UAnyway · Post by **Fool4UAnyway** » Tue Oct 19, 2010 11:08 pm

What regex are you using?

"^\d+ : "
"^[^h]+"

You can try them in the Regex Tester.

Devin · Post by **Devin** » Tue Oct 19, 2010 11:14 pm

I've tried both regexes and they don't seem to work for some reason. I have no idea why; it's only working for the first line in my .txt file. All the other lines are in the exact same format, and the file goes to 100 lines. No idea why it's only picking the first. That's why I was asking if you could grab that .txt from rapidshare and test it yourself. I'm not sure if the formatting is weird in the .txt or what.

Fool4UAnyway · Post by **Fool4UAnyway** » Tue Oct 19, 2010 11:26 pm

Works for me. File opened in Reg Ex Tester, used both regexes.

Try changing them a little and see what happens.
Perhaps try without the leading ^ first, or just \d, \d+ or [^h]+.

Devin · Post by **Devin** » Tue Oct 19, 2010 11:29 pm

Ok it worked without using the ^

Woo hoo thanks. Big help!

Fool4UAnyway · Post by **Fool4UAnyway** » Tue Oct 19, 2010 11:30 pm

Try to get anything working that starts with ^: try ^. and see what happens, or ^.{5}, and also try .{10}$

Fool4UAnyway · Post by **Fool4UAnyway** » Tue Oct 19, 2010 11:38 pm

My fault, I am still using Text Crawler 1.1.4.

Text Crawler 2.0 has a new .NET Regex Engine:
http://www.digitalvolcano.co.uk/content ... ler/manual

Position Matching:

^ Only match the beginning of a file.
$ Only match the ending of a file.

\b Matches any word boundary
\B Matches any non-word boundary
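
The anchor behaviour quoted above would explain why only the first number was deleted. Here is a rough illustration in Python, not in Text Crawler's .NET engine: by default Python's `^` also matches only at the very start of the text, and a multiline flag switches it to per-line matching. Whether Text Crawler 2.0 exposes a similar option is not stated in this thread, and the sample text is made up.

```python
import re

# Hypothetical two-line sample.
text = "1 : http://example.com/a\n2 : http://example.com/b"

# Default mode: ^ matches only at the very start of the whole text.
default_hits = re.findall(r"^\d+ : ", text)

# Multiline mode: ^ matches at the start of every line.
multiline_hits = re.findall(r"^\d+ : ", text, flags=re.MULTILINE)

# No anchor at all: matches wherever "digits, space, colon, space" occurs.
unanchored_hits = re.findall(r"\d+ : ", text)

print(default_hits)     # only the first line's prefix
print(multiline_hits)   # the prefix of every line
print(unanchored_hits)  # same as multiline for this sample
```

Dropping the `^` works for this kind of list because the "number, space, colon, space" sequence is not expected to appear inside the URLs themselves.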