Extracting Text from files

Tool for Search and Replace across multiple files.
Shawn


Post by Shawn »

How can I set a starting point and an ending point and grab everything in between? Example: <noscript> Wanted Information </noscript> ...

Thanks again for making such a great program.
Shawn

Post by Shawn »

Well, I'm guessing this is something that can't be done with Text Crawler. I'll try to find a different solution or program. I have over 1,000,000 files to deal with, so I'll try a macro recorder...

Good Day to you all...
dv2

Post by dv2 »

You can do this with TC - sorry for the slow response!

In 'Regular expressions' mode try
<noscript>[\s\S]*</noscript>

Then 'extract'. This will grab everything between (and including) the tags. You can always remove the tags afterwards with another search/replace op.
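
For anyone who wants to try the pattern outside TC first, here is a minimal sketch using Python's `re` module (not TextCrawler itself; the sample string is made up). One thing it shows: `[\s\S]*` is greedy, so when a file contains more than one `<noscript>` block, the match runs from the first opening tag to the last closing tag. The lazy form `*?` stops at the nearest closing tag. Whether TC's regex engine accepts `*?` is an assumption to check in the program.

```python
import re

# Made-up sample with two <noscript> blocks, for illustration only.
sample = "junk<noscript>first\nrecipe</noscript>more junk<noscript>second</noscript>tail"

# Greedy, as posted: spans from the first opening tag to the LAST closing tag.
greedy = re.findall(r"<noscript>[\s\S]*</noscript>", sample)

# Lazy variant: stops at the nearest closing tag, so each block is matched separately.
lazy = re.findall(r"<noscript>[\s\S]*?</noscript>", sample)

# A capture group returns only the text between the tags, without a second pass.
inner = re.findall(r"<noscript>([\s\S]*?)</noscript>", sample)

print(len(greedy), len(lazy), inner)
```

With a single block per file, the greedy and lazy forms give the same result.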
Shawn

Post by Shawn »

My apologies for my impatience. I just wanted to get the files done as soon as possible, because the data were recipes that are going to be used for a community kitchen program (to get people to eat different meals without busting a budget).

Thanks again.

Fool4UAnyway

Post by Fool4UAnyway »

A non-regex solution (for the macro) could be like this:

1. Search for the opening tag and put a linebreak character _before_ it.
2. Search for the closing tag.
3. Remove any linebreak characters between the two tags. Perform a replace on a selection from the closing tag to the opening tag.
4. Add a linebreak character _after_ the closing tag.

Now, all wanted information should be on individual lines, starting with the opening tag and ending with the closing tag.

5. Sort the lines (hey, there are a million...)
6. Keep the lines starting with the opening tag. They are sorted then.

If you do not want the lines to be sorted, you could perform a regex replace emptying all lines that do not contain the noscript tag.
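
Steps 1 to 4 above can be sketched without any regex, for example in Python with plain string searches. This is only an illustration of the idea, not a macro for any particular recorder: the function name `extract_blocks` and the sample text are hypothetical, and it skips the sort in steps 5 and 6 by collecting the blocks directly, so they stay in file order.

```python
def extract_blocks(text, open_tag="<noscript>", close_tag="</noscript>"):
    """Return each open_tag...close_tag block as one line, in file order."""
    lines = []
    pos = 0
    while True:
        # Step 1: find the next opening tag.
        start = text.find(open_tag, pos)
        if start == -1:
            break
        # Step 2: find the closing tag that follows it.
        end = text.find(close_tag, start + len(open_tag))
        if end == -1:
            break  # unmatched opening tag: stop rather than guess
        end += len(close_tag)
        # Step 3: remove line breaks between the two tags.
        block = text[start:end].replace("\r", "").replace("\n", "")
        # Steps 1 and 4: each block ends up on its own line.
        lines.append(block)
        pos = end
    return lines


sample = "x\n<noscript>a\nb</noscript>\njunk\n<noscript>c</noscript>"
result = extract_blocks(sample)
print("\n".join(result))
```

Removing the line breaks outright joins the text on either side of them; replacing them with a space instead may be the better choice for prose.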
Shawn

Post by Shawn »

@Fool4UAnyway

Thanks for the macro tip. However, placing all the items on one line wouldn't work with this table, as the items are all in order. I just had WAY too many junk lines in the files due to a crappy export in the first place.

P.S.: The database is done. Thank you all for your time.
Ray

Post by Ray »

I'm using TC 2.0. I've got a bunch of ANSI text files that all begin with "FROM: [username]," where username varies. For example, one is an email whose first line is "FROM: Joe." I want to extract just the username. When I use "FROM: [\s\S]*\n" in Regular Expression mode and then click Find, it highlights the entire contents of the file in yellow. Isn't \n the right way to indicate that I want everything up to the next newline? Same thing if I use \r instead. ... Looks like a great program, my first time using ... wish me luck!
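
A likely explanation, sketched in Python's `re` module rather than in TC (so TC 2.0's engine may behave differently): `\s` matches `\r` and `\n` as well as spaces, so `[\s\S]*` matches every character, and because `*` is greedy it keeps going to the last newline in the file. A class that excludes line breaks, `[^\r\n]*`, stays on the first line. The sample email below is made up.

```python
import re

# Made-up sample file with Windows line endings.
sample = "FROM: Joe\r\nTO: Ann\r\n\r\nHello,\r\nsee you soon.\r\n"

# As posted: the greedy [\s\S]* runs to the LAST newline, i.e. the whole file here.
greedy = re.search(r"FROM: [\s\S]*\n", sample).group(0)

# Alternative: [^\r\n]* cannot cross a line break, so the match ends with line one.
match = re.search(r"FROM: ([^\r\n]*)", sample)
username = match.group(1)

print(repr(username))
```

The capture group holds just the username; in a search/replace tool the same idea would be to match `FROM: ([^\r\n]*)` and keep group 1, assuming the tool supports capture groups.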