Hello, I have a folder called "Movie", and underneath it I have category folders like "Comedy", "Drama", etc. Inside those folders I have individual movie folders like "Caddy Shack".
So the directory structure is: Movie\Comedy\Caddy Shack\*.*
Inside each individual movie folder is an XML file with a ton of info in it. The main content I want to modify is the "Genre". The reason is that most movies have several genres, but the main genre is not always listed 1st or 2nd, and that position is what my media player reads to categorize the movie.
So the section in each *.xml file looks exactly like this (sometimes there are up to 5 or 6 genres):
<genre>
<name>Comedy</name>
<name>Action</name>
</genre>
What I want to do is automatically look for the <genre> heading in each file inside each category directory (e.g. Comedy, including all main and sub folders) and replace everything in between <genre> and </genre> with 1 entry, which for our example above could be <name>Comedy</name>
The genre I want to have as the 1 entry may or may not already be there. Everything else between the genre start and end tags should be gone except the genre specified in TextCrawler. In addition, the <genre> start tag is not always on the same line in every file, depending on how much data comes before it.
How can I set up TextCrawler to do this?
Thanks!
Find & Replace All Genre in XML file
- DigitalVolcano
- Site Admin
Re: Find & Replace All Genre in XML file
I think this is what you want.
In the Regular Expression tab, with 'dot matches newline' selected.
Regex:
<genre>.*?</genre>
Replace:
<genre><name>Comedy</name></genre>
BEFORE:
<genre>
<name>Comedy</name>
<name>Action</name>
</genre>
<test>
blah
</test>
<genre>
<name>bfabfababf</name>
</genre>
AFTER:
<genre>
<name>Comedy</name>
</genre>
<test>
blah
</test>
<genre>
<name>Comedy</name>
</genre>
Re: Find & Replace All Genre in XML file
Thank you for your quick reply! I am still a bit confused by the BEFORE/AFTER output example, because the XML files in each of my sub-directories all seem to be organized the same way (the number of entries per tag varies, but the layout does not).
1. I say this because I'm not sure why there are 2 <genre>/</genre> tag pairs in each example, as my files have only 1 at the top and 1 at the bottom.
2. Also, what do the 2 <test>/</test> tags in the example do? Does TextCrawler insert those, or is that just showing that other content may be in there?
3. Also as your example also shows <name>Comedy</name> twice is that to mean that the current way of doing it will
Thx again!
Barry
Re: Find & Replace All Genre in XML file
Hello,
OK, I tested it, and after also adding *.xml to the file filter under "input" and entering your syntax, I got great results. So thank you! I still have a couple of questions:
1. Is it OK that the open and close tags are all on the same line? Or is there a way to force TextCrawler to place a carriage return at a specific point? The output I got was like this: <genre><name>Comedy</name></genre>
All XML files I have were organized like this:
<genre>
<name>Comedy</name>
</genre>
2. Is there a way to leave the other genre entries in place and just put my genre in the #1 spot, and possibly delete any subsequent duplicate genre entries?
For example, if I add <name>Comedy</name>
and the current file looks like this:
BEFORE
<genre>
<name>Action</name>
<name>Comedy</name>
<name>Sci-Fi</name>
</genre>
AFTER
<genre>
<name>Comedy</name>
<name>Action</name>
<name>Sci-Fi</name>
</genre>
3. Also, is there a way to specify 2 or more genres in the replacement?
Thanks Again!!
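The three questions above (line breaks in the output, moving a genre to the #1 spot while dropping duplicates, and specifying more than one genre) can be sketched together in Python. This is only an illustration under stated assumptions, not a TextCrawler feature: the function name `promote_genres` is made up, the output uses plain `\n` line breaks with no indentation, and duplicate detection is an exact, case-sensitive match.

```python
import re

# Capture the inside of each <genre> block, and each <name> entry within it.
GENRE_BLOCK = re.compile(r"<genre>(.*?)</genre>", re.DOTALL)
NAME_ENTRY = re.compile(r"<name>(.*?)</name>", re.DOTALL)


def promote_genres(xml_text: str, preferred: list[str]) -> str:
    """Put the preferred genres first, keep the others in order, drop duplicates."""

    def rebuild(match: re.Match) -> str:
        existing = [name.strip() for name in NAME_ENTRY.findall(match.group(1))]
        ordered = []
        for name in preferred + existing:
            if name not in ordered:  # keep only the first occurrence
                ordered.append(name)
        # One tag per line, matching the layout of the original files.
        lines = ["<genre>"] + [f"<name>{name}</name>" for name in ordered] + ["</genre>"]
        return "\n".join(lines)

    return GENRE_BLOCK.sub(rebuild, xml_text)
```

Calling `promote_genres(text, ["Comedy"])` on the BEFORE example reproduces the AFTER example, and passing a longer list such as `["Comedy", "Drama"]` puts several genres at the top in that order. Files saved with Windows line endings may want `"\r\n".join(lines)` instead.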