Extract pages

Juergen

Post by Juergen »

I have a text file, a book in which each page is separated by a common line. What RegEx should I use if I wanted to extract only certain pages?

The common line is, let's say:
--Page xx--
where xx stands for the page number.

A sample file, typically, would be:
...
--Page 11--
some text 11
more text 12
--Page 12--
some text 21
more text 22
--Page 13--
some text 31
more text 32
--Page 14--
some text 41
more text 42
--Page 15--
some text 51
more text 52
--Page 16--
some text 61
more text 62
...

Let's say I want to extract pages 12, 14, and 15, to end up with:
--Page 12--
some text 21
more text 22
--Page 14--
some text 41
more text 42
--Page 15--
some text 51
more text 52

What RegEx would do the trick? What would the RegEx look like if I wanted all pages except pages 12, 14, and 15?

Thanks for any help.
Juergen
DV

Post by DV »

Page 12[\s\S]*Page 15 would grab everything from the page 12 header through the page 15 header, i.e. pages 12-14 plus the start of page 15.

All pages but 12-15 might involve a negated character class - I haven't managed to work it out yet...
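
For the "all pages except" case, one approach that avoids a negated character class is to match each unwanted page as a whole block and delete it. The following is only a minimal sketch, assuming Python's re module, "\n" line endings, and headers that sit alone on a line exactly as "--Page N--"; the function name drop_pages and the sample text are made up for illustration.

```python
import re

# Hypothetical sample, modeled on the file layout described in the question.
SAMPLE = (
    "--Page 11--\nsome text 11\nmore text 12\n"
    "--Page 12--\nsome text 21\nmore text 22\n"
    "--Page 13--\nsome text 31\nmore text 32\n"
    "--Page 14--\nsome text 41\nmore text 42\n"
    "--Page 15--\nsome text 51\nmore text 52\n"
    "--Page 16--\nsome text 61\nmore text 62\n"
)


def drop_pages(text, pages):
    """Return text with the listed pages (header and body) removed."""
    unwanted = "|".join(str(p) for p in pages)
    pattern = re.compile(
        rf"^--Page (?:{unwanted})--\n"  # header line of a page to drop
        r"[\s\S]*?"                     # the page body, as little as possible
        r"(?=^--Page \d+--$|\Z)",       # stop before the next header or at the end
        re.MULTILINE,
    )
    return pattern.sub("", text)


print(drop_pages(SAMPLE, [12, 14, 15]))
```

The lookahead (?=...) only checks that the next header follows without consuming it, so two unwanted pages in a row (14 and 15 here) are both removed.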
Juergen

Post by Juergen »

Thanks for your effort, but not quite what I had in mind. That's the reason I gave a consecutive and non-consecutive page range. I don't want page 13, but do want page 15.

Before asking for help I had this regex:
(--Page )([12|14|15])([\s\S]*?)(--Page)
but that got me page 16 instead of 15.
Something needs to be done about the last group, because the same --Page line has to serve twice: once as the stop for page 14 and then as the start of page 15.
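
Two things in the pattern above stand out. [12|14|15] is a character class, so it matches a single character (1, 2, 4, 5 or |) rather than one of the three page numbers; a group such as (?:12|14|15) expresses the alternation. And the final (--Page) consumes the next header, so a wanted page that directly follows another wanted page can no longer be matched. A lookahead avoids that. Below is a minimal sketch, assuming Python's re module, "\n" line endings, and headers that sit alone on a line; extract_pages and the sample text are hypothetical names for illustration.

```python
import re

# Hypothetical sample, modeled on the file layout described in the question.
SAMPLE = (
    "--Page 11--\nsome text 11\nmore text 12\n"
    "--Page 12--\nsome text 21\nmore text 22\n"
    "--Page 13--\nsome text 31\nmore text 32\n"
    "--Page 14--\nsome text 41\nmore text 42\n"
    "--Page 15--\nsome text 51\nmore text 52\n"
    "--Page 16--\nsome text 61\nmore text 62\n"
)


def extract_pages(text, pages):
    """Return only the listed pages (header and body), in file order."""
    wanted = "|".join(str(p) for p in pages)
    pattern = re.compile(
        rf"^--Page (?:{wanted})--\n"  # header line of a wanted page
        r"[\s\S]*?"                   # the page body, as little as possible
        r"(?=^--Page \d+--$|\Z)",     # peek at the next header, do not consume it
        re.MULTILINE,
    )
    return "".join(m.group(0) for m in pattern.finditer(text))


print(extract_pages(SAMPLE, [12, 14, 15]))
```

Because the next header is only looked at, not consumed, the header of page 15 is still available as the start of the next match after page 14.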