How to Replace " " by "_" only in Links

Tool for Search and Replace across multiple files.

Post by malba »

Hello,

I need to update my website: we have to replace " " with "_" in all links of all the HTML pages of our website.

Do you know if there is a way in TextCrawler to replace " " with "_" only in the href of the links in the HTML pages of my website?

I tried a few things, but they always find every " " in the HTML pages, whereas we need to find " " only in the href attribute of the links.

Thank you very much.

Post by malba »

I forgot to give an example to make it clearer...

If we have an HTML page with this content:

...
<a href="folder one/file name one.htm">FILE ONE</a>
...

we need it to become:

...
<a href="folder_one/file_name_one.htm">FILE ONE</a>
...

and this for all links in all pages of our website.

Thank you very much.

Post by DV »

This works for your example (in Regular Expression mode)

Find: href=(.*?) (.*?) (.*?) (.*?)">
Replace: href=$1_$2_$3_$4">

You might have to vary the number of (.*?) groups and $ backreferences depending on the number of spaces. Someone who knows more than I do could probably design a regex that handles any number of spaces...
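
To see what this fixed-group pattern does outside TextCrawler, here is a minimal Python sketch (Python writes the $1 backreference as \1; dv_replace is just a made-up helper name). It assumes a single replace-all pass over the text. Because .*? is allowed to run past the closing quote, the pattern can also change spaces outside the href when a line holds more than one link, as the second call shows.

```python
import re

# DV's Find pattern: three spaces split the text into four lazy groups.
# Group 1 also captures the opening quote of the attribute value.
DV_FIND = re.compile(r'href=(.*?) (.*?) (.*?) (.*?)">')
# DV's Replace pattern, with $1..$4 written as Python's \1..\4.
DV_REPLACE = r'href=\1_\2_\3_\4">'


def dv_replace(html):
    """Apply DV's find/replace once to the whole text (hypothetical helper)."""
    return DV_FIND.sub(DV_REPLACE, html)


# The example from the question (three spaces in the href) works.
print(dv_replace('<a href="folder one/file name one.htm">FILE ONE</a>'))
# <a href="folder_one/file_name_one.htm">FILE ONE</a>

# With two links on one line, the lazy groups cross the closing quote,
# so spaces outside the hrefs are replaced as well.
print(dv_replace('<a href="a b">x</a> <a href="c d">y</a>'))
# <a href="a_b">x</a>_<a_href="c d">y</a>
```

Restricting the groups to [^"] (no double quotes) keeps them inside the attribute value.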

Post by Fool4UAnyway »

It would be nice to be able to process a match and then replace it with the processed result...

Is it possible to recursively group expressions?

I'm thinking of something like this for the Find:
href=(([^ ]*) )*">

But how would you then change the space characters in between?
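
Processing each match and then substituting the processed text is what a replacement callback does in scripting languages. A minimal Python sketch, assuming every href value is written in double quotes as lowercase href="..." (single-quoted, unquoted or uppercase attributes are not handled); underscore_hrefs is a hypothetical name:

```python
import re

# Captures the value of each href="..." attribute (everything up to the next double quote).
HREF_VALUE = re.compile(r'href="([^"]*)"')


def underscore_hrefs(html):
    """Replace every space inside href="..." values with '_', leaving all other text alone."""

    def repl(match):
        # Process the captured value first, then build the replacement from the result.
        return 'href="' + match.group(1).replace(" ", "_") + '"'

    # re.sub calls repl once per match, so any number of spaces is handled in one pass.
    return HREF_VALUE.sub(repl, html)


print(underscore_hrefs('<a href="folder one/file name one.htm">FILE ONE</a>'))
# <a href="folder_one/file_name_one.htm">FILE ONE</a>
```

Because the callback sees the whole value, the number of spaces does not have to be known in advance, and the link text ("FILE ONE") is left untouched.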

Post by malba »

Hello,

Thank you very much for the reply.

I had to make a small change to the regular expression, because it matched the whole code of a link and not just the href attribute...

In case it helps in the future, the regular expression I used was:

Find...... href="([^"]*?) ([^"]*?)"
Replace... href="$1_$2"

The search and replace has to be run as many times as necessary, because each run replaces only the first space it finds in each href; if there are more spaces, I have to rerun the search and replace.
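
The "rerun until done" step can be automated if you script it. A minimal Python sketch using the pattern above ($1 and $2 become \1 and \2 in Python); replace_until_stable is a hypothetical helper that stands in for rerunning the replacement by hand:

```python
import re

# The Find pattern above: the lazy first group stops at the first space inside the quotes,
# and [^"] keeps both groups inside the attribute value.
FIND = re.compile(r'href="([^"]*?) ([^"]*?)"')
# The Replace pattern above, with $1 and $2 written as \1 and \2.
REPLACE = r'href="\1_\2"'


def replace_until_stable(html):
    """Rerun the one-space-per-href replacement until nothing matches.

    Returns the final text and the number of passes that changed something.
    """
    passes = 0
    while True:
        html, count = FIND.subn(REPLACE, html)
        if count == 0:
            return html, passes
        passes += 1


print(replace_until_stable('<a href="folder one/file name one.htm">FILE ONE</a>'))
# ('<a href="folder_one/file_name_one.htm">FILE ONE</a>', 3)
```

Each pass fixes one space in every href at once, so the number of passes equals the largest number of spaces in any single href.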

Thank you very much for everything. I'm saving a lot of time renaming files and changing links automatically.

Post by Fool4UAnyway »

OK, it works, but I wouldn't call it very convenient to have to rerun the replacement several times.

You can speed things up by allowing the space character to be matched several times in one pass, which reduces the number of runs you have to perform.

Trying this regular expression:
<a href="([^ "]*) +([^ "]+) *([^ "]*) *([^ "]*) *(([^ "]* *)*)">

Replacing with this:
<a href="$1$2$3$4$5">

Processing this text:

<a href="justatightlink">
<a href="alinkwith onespacecharacter">
<a href="alink withtwo spaces">
<a href="a link with too much spaces">
<a href="a link with a couple of spaces">

matches these:

1) <a href="alinkwith onespacecharacter">
2) <a href="alink withtwo spaces">
3) <a href="a link with too much spaces">
4) <a href="a link with a couple of spaces">

and results in this changed text:

<a href="justatightlink">
<a href="alinkwithonespacecharacter">
<a href="alinkwithtwospaces">
<a href="alinkwithtoomuch spaces">
<a href="alinkwithacouple of spaces">

Up to 4 space characters are removed in one pass, so for this text you would have to run the replacement twice.

You can insert the term " *([^ "]*)" into the regular expression as many times as you like, just before the final (([^ "]* *)*) group, as long as you also append the same number of $ backreferences ($6, $7, etc.) to the replacement expression.
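
A minimal Python sketch that builds this pattern with a configurable number of extra terms and reproduces the example above. It assumes the extra terms go just before the catch-all group, so the catch-all always has the highest backreference number used in the replacement; build_find and build_replace are made-up names, and $n is written as \g<n> in Python.

```python
import re

# The Find pattern above, split into pieces so the middle term can be repeated.
HEAD = r'<a href="([^ "]*) +([^ "]+) *([^ "]*) *([^ "]*)'
EXTRA_TERM = r' *([^ "]*)'        # the term that may be inserted again
CATCH_ALL = r' *(([^ "]* *)*)">'  # final group: whatever is left, spaces included


def build_find(extra_terms=0):
    # Insert EXTRA_TERM extra_terms times before the catch-all group.
    return re.compile(HEAD + EXTRA_TERM * extra_terms + CATCH_ALL)


def build_replace(extra_terms=0):
    # <a href="$1$2...$n"> with n = 5 + extra_terms (the catch-all group).
    refs = "".join(f"\\g<{i}>" for i in range(1, 6 + extra_terms))
    return '<a href="' + refs + '">'


text = "\n".join([
    '<a href="justatightlink">',
    '<a href="alinkwith onespacecharacter">',
    '<a href="alink withtwo spaces">',
    '<a href="a link with too much spaces">',
    '<a href="a link with a couple of spaces">',
])

once = build_find().sub(build_replace(), text)
print(once)
```

With one extra term, "a link with too much spaces" loses all five of its spaces in a single pass.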

Post by Fool4UAnyway »

OK, to replace the space characters with underscores instead, you would have to insert a _ between each pair of $ backreferences:

<a href="$1_$2_$3_$4_$5">

<a href="$1_$2_$3_$4_$5_$6_$7">

etc.
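
One thing to watch with the underscore version, at least in Python's re module: groups 3, 4 and 5 are empty when a link has fewer than four spaces, but the underscores between them are still inserted, so "alinkwith onespacecharacter" becomes "alinkwith_onespacecharacter___". A minimal sketch that reproduces this and, as one possible workaround, uses a replacement function that only joins the groups that captured something (naive_underscores and joined_underscores are made-up names; a run of several spaces still collapses into one underscore, as with the original pattern):

```python
import re

# The 4-words-per-pass Find pattern from the previous post, with catch-all group 5.
FIND = re.compile(r'<a href="([^ "]*) +([^ "]+) *([^ "]*) *([^ "]*) *(([^ "]* *)*)">')


def naive_underscores(html):
    # Literal translation of <a href="$1_$2_$3_$4_$5"> into Python's \g<n> syntax.
    return FIND.sub(r'<a href="\g<1>_\g<2>_\g<3>_\g<4>_\g<5>">', html)


def joined_underscores(html):
    def repl(match):
        # Groups 1 and 2 always take part (group 1 is empty only for a leading space);
        # groups 3-5 are joined only if they captured something.
        parts = [match.group(1), match.group(2)]
        parts += [g for g in match.group(3, 4, 5) if g]
        return '<a href="' + "_".join(parts) + '">'

    return FIND.sub(repl, html)


print(naive_underscores('<a href="alinkwith onespacecharacter">'))
# <a href="alinkwith_onespacecharacter___">
print(joined_underscores('<a href="alinkwith onespacecharacter">'))
# <a href="alinkwith_onespacecharacter">
```

As before, links with more than four spaces keep the rest (in group 5) for the next pass.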