Creating Extracts with REGEX

By Oli on Saturday, 03rd February 2007.

If you build your own content management system, as I have for ThePCSpy, chances are that somewhere along the line you will want to create extracts, so that you can show the introductory paragraph of a piece outside its original context.

One example is an RSS feed where you want to give your users a sneak-peek of the full thing. Another is sending trackbacks or pingbacks to other blogs when you mention them.

Using Regular Expressions, we're going to take a chunk of the following text that we can use as we like:

<strong>Hello!</strong> My name is <em>Oli</em> and <br />
I like programming <em>Regular Expressions</em>.

Step 1 - Nuke the HTML

This is a simple REGEX Replace that matches anything that could be construed as an HTML tag:

<[^>]*>

And just replace all matches of that with an empty string.
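
As a rough sketch of this step, here is how the replace might look in Python with the standard re module. The function name strip_tags is my own for illustration, and the pattern is the same blunt one as above, so it is not a real HTML parser and will also eat any stray "<...>" in your text.

```python
import re

# The pattern from the article: "<", anything that is not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(html: str) -> str:
    """Replace everything that looks like an HTML tag with an empty string."""
    return TAG_PATTERN.sub("", html)


print(strip_tags("<strong>Hello!</strong> My name is <em>Oli</em>"))
# prints: Hello! My name is Oli
```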

This could be expanded to rip out header tags together with their contents, so that if your text starts with a header, the header doesn't end up in the extract.
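
One possible way to do that, sketched in Python: remove each h1 to h6 element whole before stripping the remaining tags. The header pattern and the function name are assumptions of mine rather than anything from the CMS itself, and the pattern assumes headers are properly closed and not nested.

```python
import re

# Assumed pattern: an opening <h1>..<h6> tag, its contents, and the matching
# closing tag. DOTALL lets the contents span several lines.
HEADER_PATTERN = re.compile(
    r"<h([1-6])\b[^>]*>.*?</h\1\s*>", re.IGNORECASE | re.DOTALL
)
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_headers_then_tags(html: str) -> str:
    """Drop whole header elements first, then any remaining tags."""
    without_headers = HEADER_PATTERN.sub("", html)
    return TAG_PATTERN.sub("", without_headers).strip()


print(strip_headers_then_tags("<h2>About me</h2><p>My name is <em>Oli</em>.</p>"))
# prints: My name is Oli.
```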

Step 2 - Extract your chunk

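For this example we're going to extract the first 35 characters of our HTML-less text using another simple REGEX. Bear in mind that the dot does not match a line break by default, so collapse any line breaks into spaces first (or switch on your engine's single-line mode):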

^(.{0,35})

Outputs:

Hello! My name is Oli and I like pr
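
Here is a small Python sketch of that hard cut. The name hard_cut and the whitespace-collapsing line are my own additions: the dot does not match a newline by default, so the line break left behind by the br tag is squashed into a single space before matching.

```python
import re


def hard_cut(text: str, limit: int = 35) -> str:
    """Return at most `limit` characters from the start of the text."""
    # Collapse runs of whitespace (including newlines) into single spaces.
    flat = re.sub(r"\s+", " ", text).strip()
    # Builds the pattern ^(.{0,35}) when limit is 35.
    match = re.match(rf"^(.{{0,{limit}}})", flat)
    return match.group(1)


stripped = "Hello! My name is Oli and \nI like programming Regular Expressions."
print(hard_cut(stripped))
# prints: Hello! My name is Oli and I like pr
```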

The problem is that we've cut a word in half. That just looks silly and it's not going to make any sense to people. By specifying that we'd like the first 35 characters plus all the characters up until the next whitespace, we ensure that no words are broken:

^(.{0,35}[^\s]*)

And there we have it. From one long string we now output:

Hello! My name is Oli and I like programming
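
Putting the final pattern to work in Python might look something like the sketch below. The function name make_extract and its limit parameter are assumptions for illustration, and whitespace is collapsed first so that the dot can run across what used to be line breaks.

```python
import re


def make_extract(text: str, limit: int = 35) -> str:
    """Take the first `limit` characters, then run on to the end of the word."""
    # Collapse runs of whitespace (including newlines) into single spaces.
    flat = re.sub(r"\s+", " ", text).strip()
    # Builds the pattern ^(.{0,35}[^\s]*) when limit is 35.
    match = re.match(rf"^(.{{0,{limit}}}[^\s]*)", flat)
    return match.group(1) if match else ""


stripped = "Hello! My name is Oli and \nI like programming Regular Expressions."
print(make_extract(stripped))
# prints: Hello! My name is Oli and I like programming
```

Text shorter than the limit comes back unchanged, so the same call works for very short posts too.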

Written by Oli on Saturday, 03 February 2007. Tagged with regex, programming.
