
How to extract all URLs from a text file?

Posted: Sun Jan 03, 2016 11:01 am
by rsdguru
Hi, I'm Peter and a new member here.

Who can help me out?

I need to extract all the URLs from a plain text file into a new file, one URL per line.
The URLs may come in all sorts of formats...

(Is this possible with the free version of TextCrawler?)

thx in advance

Re: How to extract all URLs from a text file?

Posted: Tue Jan 05, 2016 1:57 pm
by DigitalVolcano
You need to use a regular expression to find the URLs.

Such as:


\b((ftp|https?)://[-\w]+(\.\w[-\w]*)+|(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\.)+(?:com\b|edu\b|biz\b|gov\b|in(?:t|fo)\b|mil\b|net\b|org\b|[a-z][a-z]\b))(\:\d+)?(/[^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*(?:[.!,?]+[^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+)*)?
(This may not be perfect for your needs)


You can then click Extract. Unfortunately, the Extract function is only available in the Pro version.
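For anyone without the Pro version, the same idea (match URLs with a regular expression, then write each match on its own line) can be done with a short script. This is a minimal sketch in Python using a simpler pattern than the one above, so it will catch fewer edge cases; the filenames are placeholders.

```python
import re

# Simplified URL pattern: scheme-prefixed URLs (http/https/ftp) plus bare
# "www." hosts. Less thorough than the regex in the post above, but easy
# to read and adapt.
URL_RE = re.compile(
    r"\b(?:(?:https?|ftp)://|www\.)[^\s<>\"'()\[\]{}]+",
    re.IGNORECASE,
)

def extract_urls(text):
    """Return every URL found in the text, in order of appearance."""
    return URL_RE.findall(text)

def extract_urls_to_file(src_path, dst_path):
    """Read src_path and write one URL per line to dst_path.

    Both paths are placeholder arguments, e.g. "input.txt" / "urls.txt".
    """
    with open(src_path, encoding="utf-8") as src:
        urls = extract_urls(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(urls) + "\n")
```

A trailing punctuation mark (e.g. a URL at the end of a sentence) will be swallowed by this simple pattern, which is exactly the kind of case the longer regex above handles more carefully.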