Getting the substrings between given delimiters

One of the most common, and most overused, types of operation a web developer needs to perform is no doubt string manipulation. Very often strings need to be trimmed and chopped, cleaned of unwanted characters, split into little chunks, glued back in a different order, encoded and decoded back and forth, searched through, reversed, converted or printed, and the list never ends.

Fortunately for PHP lovers, these things are not as complicated as they seem, because PHP ships with a great set of string manipulation functions. Most things you would want to do to a string (no matter how kinky you are) can be done with one standard function or a simple combination of them. Of course, the right combination might not be obvious to all of us.

Today I’d like to share a little piece of code that helped me quickly find and extract all the substrings placed between given delimiters.

First I needed to build a function to return the first occurrence of a string placed between given separators and starting from a specific offset:

function get_between($haystack, $start_needle, $stop_needle, $init_pos)
	{
	//position of the first delimiter
	$start_pos = strpos($haystack, $start_needle, $init_pos);
	//strpos returns false when nothing is found; 0 is a valid position
	if ($start_pos === false)
		return '';

	//position of the wanted string
	$between_pos = $start_pos + strlen($start_needle);

	//position of the second delimiter
	$stop_pos = strpos($haystack, $stop_needle, $between_pos);
	if ($stop_pos === false)
		return '';

	//string we wanted
	return substr($haystack, $between_pos, $stop_pos - $between_pos);
	}
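To make the position arithmetic easier to follow, here is a small trace of the same steps on a made-up string. It is only a sketch: the sample text and the variable names $first and $second are assumptions for illustration, and it calls strpos and substr directly instead of the function.

```php
<?php
// Sample input invented for this illustration.
$haystack = '[img]a.png[/img] text [img]b.png[/img]';
$start_needle = '[img]';
$stop_needle = '[/img]';

// First pass, searching from offset 0.
$start_pos = strpos($haystack, $start_needle, 0);           // 0
$between_pos = $start_pos + strlen($start_needle);          // 5
$stop_pos = strpos($haystack, $stop_needle, $between_pos);  // 10
$first = substr($haystack, $between_pos, $stop_pos - $between_pos);

// Second pass, searching from just after the first closing delimiter.
$init_pos = $stop_pos + strlen($stop_needle);               // 16
$start_pos = strpos($haystack, $start_needle, $init_pos);   // 22
$between_pos = $start_pos + strlen($start_needle);          // 27
$stop_pos = strpos($haystack, $stop_needle, $between_pos);  // 32
$second = substr($haystack, $between_pos, $stop_pos - $between_pos);

echo $first, "\n";   // a.png
echo $second, "\n";  // b.png
```

Note that the first delimiter sits at position 0 here, so a found/not-found test has to compare the result of strpos against false rather than against 0.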

And now I’ll show you how to use it to extract, for example, all images in a forum post placed between [img] and [/img] delimiters, provided all tags are correctly nested with no overlapping. You could also use it to extract all URLs from a text by taking whatever sits between <a href=" and the next double quote. The following example is crude but very easy to understand.

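$data = file_get_contents('data.txt');
$init_pos = 0;
$wanted_strings = array();
//loop for as long as another opening delimiter can be found
while (($start_pos = strpos($data, '[img]', $init_pos)) !== false)
	{
	$token = get_between($data, '[img]', '[/img]', $init_pos);
	//move past this opening delimiter and the raw (untrimmed) token
	$init_pos = $start_pos + strlen('[img]') + strlen($token);
	$token = trim($token);
	if (strlen($token) > 0)
		$wanted_strings[] = $token;
	}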

You may need to adapt it to fit your specific needs, and you should add at least some basic validation.
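As one possible way of adapting it, the loop and the extraction can be folded into a single function with a few basic checks. This is a sketch under assumptions, not a definitive implementation: the name get_all_between and the sample strings are made up for this example, and it stops at the first opening delimiter that has no closing one.

```php
<?php
// Hypothetical helper (not from the original code): returns every
// trimmed, non-empty substring found between the two delimiters.
function get_all_between($haystack, $start_needle, $stop_needle)
{
    $found = array();

    // Basic validation: only search strings, and never with an empty needle.
    if (!is_string($haystack) || $start_needle === '' || $stop_needle === '') {
        return $found;
    }

    $pos = 0;
    while (($start_pos = strpos($haystack, $start_needle, $pos)) !== false) {
        $between_pos = $start_pos + strlen($start_needle);
        $stop_pos = strpos($haystack, $stop_needle, $between_pos);
        if ($stop_pos === false) {
            break; // opening delimiter without a closing one: stop here
        }

        $token = trim(substr($haystack, $between_pos, $stop_pos - $between_pos));
        if ($token !== '') {
            $found[] = $token;
        }

        // Continue right after the closing delimiter.
        $pos = $stop_pos + strlen($stop_needle);
    }

    return $found;
}

// Sample inputs invented for this illustration.
$post = 'Look: [img] a.png [/img] and [img]b.png[/img] and [img]broken';
$images = get_all_between($post, '[img]', '[/img]');
print_r($images); // a.png, b.png

$html = '<a href="http://example.com/">one</a> <a href="/two">two</a>';
$urls = get_all_between($html, '<a href="', '"');
print_r($urls);   // http://example.com/, /two
```

Because the search position always moves past the closing delimiter, the loop cannot revisit the same match, and empty pairs such as [img][/img] are skipped instead of ending the search.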

In conclusion, I suspect we’ll be spending a noticeable amount of time talking about string manipulation.
