Comparing preg_replace and str_ireplace (Rude Word(s) Filter)

Filtering user input is unfortunately something that needs to be done these days, possibly even more so when the content comes from another website or service (such as Twitter or Facebook). Swearing is quite a serious area, as you really don’t want to offend visitors to your site by displaying a live thread, comment, ‘tweet’ or similar containing offensive language (it tends to go down badly). On that note, I set out today to add a live Twitter feed to my current project (www.wsaf.co.uk), but decided that once I had pulled the data from Twitter I should probably remove any swearing before displaying it on the site!

As the site is written in PHP, funnily enough the filter was going to be too. As I like to make my code as fast as possible, I decided to test the speed of two functions that could be used for this purpose:

preg_replace();
//and...
str_ireplace();


With both of these functions you can pass either a single word (as a string) and its replacement, or an array of words and an array of corresponding replacements. For preg_replace() each word has to be written as a regular expression pattern, for example '/word/i'.
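
As a rough sketch of the two calling styles (the sample text and word lists here are made up for illustration, not taken from my test):

```php
<?php
// Single word: both calls replace one search term with one replacement.
$text = 'An Apple a day';

// str_ireplace() takes the plain word and matches it case-insensitively.
$a = str_ireplace('apple', '*****', $text);

// preg_replace() takes a delimited pattern; the trailing "i" makes it
// case-insensitive.
$b = preg_replace('/apple/i', '*****', $text);

// Arrays: each search term is paired with the replacement at the same index.
$words        = array('apple', 'day');
$patterns     = array('/apple/i', '/day/i');
$replacements = array('*****', '***');

$c = str_ireplace($words, $replacements, $text);
$d = preg_replace($patterns, $replacements, $text);

echo $a, "\n", $b, "\n", $c, "\n", $d, "\n";
```

Both functions work through the array in order, so each search term is applied to the result of the previous replacement.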

To time each function I used microtime(true), which returns the current Unix timestamp as a floating point number of seconds, accurate to the microsecond.
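
A single measurement this small can be quite noisy, so one option is to repeat the call many times and average. The helper below is only a sketch of that idea: average_time() and the sample text are hypothetical names of my own, not part of the test I actually ran.

```php
<?php
/**
 * Hypothetical helper: runs $fn $iterations times and returns the
 * average number of seconds per call.
 */
function average_time(callable $fn, $iterations = 1000)
{
    $start = microtime(true); // float seconds since the Unix epoch
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    $end = microtime(true);

    return ($end - $start) / $iterations;
}

// Made-up sample text, roughly the size of a couple of tweets.
$text = str_repeat('some sample text with an apple in it ', 7);

$avg = average_time(function () use ($text) {
    str_ireplace('apple', '*****', $text);
});

printf("str_ireplace average = %.9f seconds\n", $avg);
```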

So below is my test code for str_ireplace():

$start = microtime(true); // get the start time
$replacements = array(); 
 
// I want to replace each banned word with the equivalent number of stars,
// so apple would become: *****
foreach ($this->rude_words as $word) {
    array_push($replacements, str_repeat('*', strlen($word)));
}
 
// Now let's do the replacing. $this->rude_words is an array of the
// words that we want to filter out
$str = str_ireplace($this->rude_words, $replacements, $str);
 
// get the end time
$end = microtime(true);
// print how long the whole process has taken
echo "str_ireplace Time taken = " . ($end - $start);

And below is the code for testing preg_replace():

$start = microtime(true);
$replacements = array();
 
foreach ($this->rude_words as $word) {
    // For this test each entry in $this->rude_words is a Perl-style regular
    // expression rather than a plain word: the word has a '/' on either side
    // plus the letter 'i' after the second '/' to make it case insensitive.
    // Those 3 extra characters are why the star string is strlen($word) - 3.
    array_push($replacements, str_repeat('*', strlen($word) - 3));
}
 
$res = preg_replace($this->rude_words, $replacements, $originalStr);
$end = microtime(true);
echo "preg Time taken = " . ($end - $start);
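
The preg test relies on the word list already being wrapped as patterns. One way to build such a list from plain words might look like the sketch below; the word list is a placeholder, and using preg_quote() is my suggestion here rather than something the test above did.

```php
<?php
// Plain banned words (placeholder values for illustration).
$words = array('apple', 'pear');

$patterns     = array();
$replacements = array();

foreach ($words as $word) {
    // preg_quote() escapes regex metacharacters, including the "/"
    // delimiter passed as its second argument.
    $patterns[] = '/' . preg_quote($word, '/') . '/i';
    // Measure the plain word, so no "- 3" adjustment is needed.
    $replacements[] = str_repeat('*', strlen($word));
}

$result = preg_replace($patterns, $replacements, 'An APPLE and a Pear');
echo $result, "\n";
```

Building the star strings from the plain words avoids having to subtract the delimiter and modifier characters afterwards.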

OK, so let’s get some results and see which is faster. For my test I decided to pass in a string of 52 words, half of them swear words: a total of 259 characters. As tweets can only be 140 characters, this small string should suffice.

Results from my development PC (running Windows XP with Apache):

str_ireplace Time taken = 0.000275850296021
preg Time taken = 0.000153064727783
preg won!

I ran this test a load of times and pretty much every time preg was faster!
Right, so let’s try this on a Linux box:

Results taken from a Linux machine (Debian with Apache):

str_ireplace Time taken = 8.48770141602E-5
preg Time taken = 8.70227813721E-5
str_ireplace won!

Um… A slightly different story here then. Once again I ran the test 30-odd times and yes, str_ireplace seemed faster!

To me it looks like, for small strings with a smallish number of replacements (about 20 words), the two functions are fairly equal. The PHP manual on php.net recommends using str_ireplace() over preg_replace() with the i modifier when you don’t need fancy replacing rules. But to be honest, I’m not sure it would make much difference for filtering tweets.

I decided to go with using str_ireplace(), below is the final class I created:

class Rude_Words {
    private $rude_words = array();
    private $replacements;
 
    private function set_up_rude_array() {
        $i = 0;
        $this->rude_words[$i++] = 'Enter';
        $this->rude_words[$i++] = 'banned';
        $this->rude_words[$i++] = 'words';
        $this->rude_words[$i++] = 'here';
    }
 
    public function __construct() {
        $this->set_up_rude_array();
        $this->replacements = array();
 
        foreach ($this->rude_words as $word) {
            array_push($this->replacements, str_repeat('*', strlen($word)));
        }
    }
 
    public function filter_string($str) {
        return str_ireplace($this->rude_words, $this->replacements, $str);
    }
}
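
To show the filter in use, here is a small self-contained sketch. Word_Filter is a hypothetical variant of the class above that takes its word list through the constructor, purely so the example can run on its own; the input string is made up.

```php
<?php
// Hypothetical variant of Rude_Words: the word list is passed in
// rather than hard-coded, to keep this example self-contained.
class Word_Filter {
    private $words;
    private $replacements = array();

    public function __construct(array $words) {
        $this->words = $words;
        foreach ($words as $word) {
            // One star per character of the banned word.
            $this->replacements[] = str_repeat('*', strlen($word));
        }
    }

    public function filter_string($str) {
        // Case-insensitive replacement of every banned word.
        return str_ireplace($this->words, $this->replacements, $str);
    }
}

$filter = new Word_Filter(array('Enter', 'banned', 'words', 'here'));
$clean  = $filter->filter_string('ENTER your Banned words here, please');
echo $clean, "\n";
```

One thing to keep in mind is that str_ireplace() matches substrings, so with 'here' in the list the word 'there' would come out as 't****'.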