How to delete a part of an expression ?

Tool for Search and Replace across multiple files.
patrmich
Posts: 18
Joined: Fri Jul 01, 2011 9:01 am


Post by patrmich »

Hi,

I have a directory of HTML pages (each page holds the full description of one item).

Several of these pages contain expressions of the following form:
<li><font size="2">xxxxxxxxxx</font></li>

where xxxxxxxxxx is variable text between 5 and 100 characters long.

I would like to delete <font size="2"> and </font>, so that after deletion the expression becomes:
<li>xxxxxxxxxx</li>

Does anyone know how to do this with Text Crawler?

Thank you in advance for any help in this matter.

Patrick
Fool4UAnyway

Re: How to delete a part of an expression ?

Post by Fool4UAnyway »

You could use back- and forward references, or whatever they are called.
I do not know their syntax by heart.

But you do not necessarily need them.

You can simply search for an expression and replace it with any part(s) of that same expression.

If you are looking for excessive whitespace and want to shrink it to a single space character (or tab), you could search for \s+ and replace it with " " (without the quotes), or with \t for a tab.
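As a minimal sketch of that whitespace replacement, here it is in Python's `re` module (used only as a stand-in for Text Crawler; the sample text is made up):

```python
import re

text = "one   two\t\tthree\n four"
# \s+ matches any run of spaces, tabs or newlines; each run becomes one space
collapsed = re.sub(r"\s+", " ", text)
print(collapsed)  # one two three four
```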

You can do the same for the text you want to keep.
You have to "store" it in a variable (a capture group).
You do this by putting parentheses around the part of the expression you want to keep.
You can then refer to this capture group with $ followed by its number, i.e. its position (counting opening parentheses from the left) in the full search expression.

A simple example: I have some text between two numbers and want to keep only the text.

Find:
\d+( [^ ]*) \d+

Replace by:
$1

The (first and only) capture group includes a single space character to keep the text separated from any preceding text after replacing.
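The same find/replace pair can be sketched in Python's `re` module (a stand-in, not Text Crawler). One difference worth knowing: Python writes the group reference as `\1` (or `\g<1>`) where Text Crawler uses `$1`. The sample string is invented:

```python
import re

text = "item7 apples 42"
# Group 1 captures the leading space plus the word between the two numbers
result = re.sub(r"\d+( [^ ]*) \d+", r"\1", text)
print(result)  # item apples
```

Because `[^ ]*` cannot cross a space, this only handles a single word between the numbers; multi-word text would need something like `( .*?)` instead.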
patrmich

Re: How to delete a part of an expression ?

Post by patrmich »

Hi,

My goal was to:
Find:
<li><font size="2">xxxxxxxxxx</font></li>
(where xxxxxxxxxx is variable text between 5 and 100 characters long)
Replace with:
<li>xxxxxxxxxx</li>

I managed to reach my goal in Notepad++ as follows:
Find:
<li><font size="2">([^<]*)</font></li>
Replace with:
<li>\1</li>

But in Text Crawler, finding the expression works,
but the replacement gives
<li>\1</li>
instead of
<li>xxxxxxxxxx</li>

Does anyone know what the replacement expression would be in Text Crawler?

Thank you in advance for any help.

Patrick
Fool4UAnyway

Re: How to delete a part of an expression ?

Post by Fool4UAnyway »

Yup, in Text Crawler the placeholder sign is $ instead of \; see my example above.

So you would replace your matches with:
<li>$1</li>
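For comparison, here is the same replacement as a Python sketch (Python's `re.sub` spells the reference `\1`, like Notepad++, rather than Text Crawler's `$1`). The HTML snippet is a made-up sample:

```python
import re

html = '<ul>\n<li><font size="2">Stainless steel body</font></li>\n<li><font size="2">Weight: 2 kg</font></li>\n</ul>'
pattern = r'<li><font size="2">([^<]*)</font></li>'
# [^<]* stops at the next '<', so group 1 holds only the plain text inside the tags
cleaned = re.sub(pattern, r"<li>\1</li>", html)
print(cleaned)
```

Since `[^<]*` cannot contain `<`, a list item with another tag nested inside the font element (e.g. `<b>`) will not match and is left unchanged.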

If your goal is to delete (all) <font> opening and closing tags (only), you could also try this.

Find:
</?font( size="d+")?>

Replace by: (leave empty)
Fool4UAnyway

Re: How to delete a part of an expression ?

Post by Fool4UAnyway »

Find:
</?font( size="\d+")?>

Replace by: (leave empty)

\d for the digit(s), of course.
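A short Python sketch of that tag-stripping pattern (again using `re` as a stand-in for Text Crawler, with an invented sample):

```python
import re

html = '<p><font size="3">Title</font> and <font>plain</font></p>'
# Matches <font>, </font> and <font size="N">; the optional group covers the size attribute
stripped = re.sub(r'</?font( size="\d+")?>', "", html)
print(stripped)  # <p>Title and plain</p>
```

Note that a tag carrying any other attribute, such as `<font color="red">`, does not match this pattern and is kept.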
patrmich

Re: How to delete a part of an expression ?

Post by patrmich »

Hi,

Thank you very much for your help.
The following works very well:
replacing
<li><font size="2">([^<]*)</font></li>
by
<li>$1</li>

I have another task for the same directory of HTML pages (each page holds the full description of one item).

Several pages contain expressions of the following form:
class="auto-stylexx" (with a space at the beginning of the expression)
where xx can vary from 1 to 30.

My goal is to replace
class="auto-stylexx" (there is a space before class="auto-stylexx")
with
nothing.
In other words, I want to delete all such expressions.

What regular expression would do that?

Thank you in advance for any help in this matter.

Patrick
Fool4UAnyway

Re: How to delete a part of an expression ?

Post by Fool4UAnyway »

Just some quick guidance: search for the text that is constant, and write specific regex "code" only for the variable part.

You can:
- look for any text between the double quotes: match anything up to and including the closing ".
- specify the length of the text between the double quotes: use {shortest,longest} for the variable part.
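Following that guidance, one possible pattern is ` class="auto-style\d{1,2}"` (leading space included, replacement left empty). This is an assumption on my part, not something tested in Text Crawler; the Python sketch below uses an invented HTML sample:

```python
import re

html = '<td class="auto-style7">A</td><p class="auto-style23">B</p><div class="main">C</div>'
# Leading space + constant text + one or two digits for xx (1 to 30) + closing quote
pattern = r' class="auto-style\d{1,2}"'
cleaned = re.sub(pattern, "", html)
print(cleaned)  # <td>A</td><p>B</p><div class="main">C</div>
```

`\d{1,2}` would also accept 31 to 99, which is harmless here since xx only runs to 30. The first suggestion above would look like ` class="auto-style[^"]*"` instead.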