Hi,
I have a directory of HTML pages (each page is the full description of an item).
Several of these pages contain expressions of the following shape :
<li><font size="2">xxxxxxxxxx</font></li>
where xxxxxxxxxx is variable text, 5 to 100 characters long.
I would like to delete <font size="2"> and </font>
So my goal is to have, after deletion, the following expression :
<li>xxxxxxxxxx</li>
Does anyone know how to do this with Text Crawler ?
Thank you in advance for any help in this matter.
Patrick
How to delete a part of an expression ?
Re: How to delete a part of an expression ?
You could use back- and forward references, or whatever they are called.
I do not know their syntax by heart.
But it is not strictly necessary to use those.
You can just search for an expression and replace it with any part(s) of that same expression.
If you are looking for excessive whitespace and want to shrink it to only one space character (or tab) you could search for \s+ and replace it by " " (without quotes, or by \t for a tab).
You can do the same for the text you want to keep.
You will have to "store" it in a variable (capture group).
You can do this by using parentheses around the expression you would like to keep.
You can refer to this capture group with $ followed by its position number in the full search expression.
A simple example: I have a word between two numbers and I want to keep only the word.
Find:
\d+( [^ ]*) \d+
Replace by:
$1
The (first and only) captured group includes a single space character so that the kept text stays separated from any preceding text after the replacement.
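For anyone who wants to try these two replacements outside Text Crawler, here is a minimal sketch in Python. It assumes Python's re module, where the reference to the first capture group is written \1 in the replacement (Text Crawler uses $1, as described above). The function names and sample strings are made up for illustration.

```python
import re

def keep_text_between_numbers(line: str) -> str:
    # \d+        the leading number
    # ( [^ ]*)   group 1: one space plus a run of non-space characters
    #  \d+       a space and the trailing number
    # Python spells the group reference \1; Text Crawler spells it $1.
    return re.sub(r"\d+( [^ ]*) \d+", r"\1", line)

def collapse_whitespace(line: str) -> str:
    # Any run of whitespace (spaces, tabs, newlines) becomes one space.
    return re.sub(r"\s+", " ", line)

print(keep_text_between_numbers("abc12 widget 34"))  # abc widget
print(collapse_whitespace("a   b\t\tc"))             # a b c
```

Note that [^ ]* stops at the first space, so this pattern keeps only a single word between the two numbers.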
Re: How to delete a part of an expression ?
Hi,
My goal was to :
Find :
<li><font size="2">xxxxxxxxxx</font></li>
(where xxxxxxxxxx is a variable text having a length from 5 to 100 characters)
Replace by
<li>xxxxxxxxxx</li>
I managed to reach my goal through Notepad++ as follows :
Find :
<li><font size="2">([^<]*)</font></li>
Replace
<li>\1</li>
But when using Text Crawler :
Finding the expression works
But the replacement gives :
<li>\1</li>
instead of
<li>xxxxxxxxxx</li>
Does anyone know what the replacement expression would be in Text Crawler ?
Thank you in advance for any help.
Patrick
Re: How to delete a part of an expression ?
Yup, in Text Crawler the placeholder sign is $ instead of \, see my example above.
So you would replace your matches by:
<li>$1</li>
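As a quick way to check that find/replace pair on a sample line, here is a small Python sketch. One assumption to keep in mind: Python's re module writes the group reference as \1, whereas Text Crawler wants $1. The function name and the sample item text are hypothetical.

```python
import re

# The find expression from this thread; group 1 is the text inside the font tag.
LI_FONT = re.compile(r'<li><font size="2">([^<]*)</font></li>')

def strip_font_in_li(html: str) -> str:
    # Rebuild the list item around the captured text only.
    return LI_FONT.sub(r"<li>\1</li>", html)

print(strip_font_in_li('<li><font size="2">Blue cotton shirt</font></li>'))
# <li>Blue cotton shirt</li>
```

Because the captured part is [^<]*, a list item that contains another tag inside the font element is left untouched.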
If your goal is to delete (all) <font> opening and closing tags (only), you could also try this.
Find:
</?font( size="d+")?>
Replace by: (leave empty)
Re: How to delete a part of an expression ?
Find:
</?font( size="\d+")?>
Replace by: (leave empty)
The digit(s) need \d, of course, not a bare d as in my previous post.
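To see what this pattern removes and what it leaves alone, here is a short Python sketch. It assumes Python's re module and an empty replacement string; the function name and sample lines are invented for the example.

```python
import re

# </?        optional slash: matches both opening and closing tags
# font       the tag name
# ( size="\d+")?   optional size attribute holding one or more digits
FONT_TAG = re.compile(r'</?font( size="\d+")?>')

def remove_font_tags(html: str) -> str:
    # Replace every match with nothing, i.e. delete it.
    return FONT_TAG.sub("", html)

print(remove_font_tags('<li><font size="2">Red mug</font></li>'))
# <li>Red mug</li>
print(remove_font_tags('<p><font>plain</font></p>'))
# <p>plain</p>
```

An opening tag with any other attribute, for example a color, does not match this pattern, so its closing </font> would be deleted while the opening tag stays. Check your pages for such cases first.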
Re: How to delete a part of an expression ?
Hi,
Thank you very much for your help.
The following works very well :
replacing
<li><font size="2">([^<]*)</font></li>
by
<li>$1</li>
I have another task to do :
I have a directory of HTML pages (each page is the full description of an item).
Several of these pages contain expressions of the following shape :
class="auto-stylexx" (with a space at the beginning of the expression),
where xx can vary from 1 to 30.
My goal is to replace
class="auto-stylexx" (there is a space before class="auto-stylexx")
by
nothing
In other words, my goal is to delete all such expressions
What would be the regex to do that ?
Thank you in advance for any help in this matter.
Patrick
Re: How to delete a part of an expression ?
Just some quick guidance: search for the constant text literally and write specific regex "code" for the variable part.
You can:
- look for any text between "": match anything up to and including the second ".
- specify the length of the text between "": use {shortest,longest} (no space after the comma) for the variable part
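The two options above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses Python's re module, the pattern and function names are made up, and \d{1,2} accepts any one- or two-digit number (0 to 99), which is slightly wider than the 1 to 30 range mentioned in the question.

```python
import re

# Option 1: constant prefix, then anything up to and including the second quote.
# This removes every class attribute, not only the auto-style ones.
ANY_CLASS = re.compile(r' class="[^"]*"')

# Option 2: constant text plus a variable part of bounded length (1 to 2 digits).
AUTO_STYLE = re.compile(r' class="auto-style\d{1,2}"')

def remove_any_class(html: str) -> str:
    return ANY_CLASS.sub("", html)

def remove_auto_style(html: str) -> str:
    return AUTO_STYLE.sub("", html)

sample = '<td class="auto-style7">A</td><p class="note">B</p>'
print(remove_auto_style(sample))  # <td>A</td><p class="note">B</p>
print(remove_any_class(sample))   # <td>A</td><p>B</p>
```

Both patterns start with the leading space, so no stray space is left inside the tag. In Text Crawler the same find expressions should work with an empty replacement.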