Continuing the topic of working with UTF-8 encoded strings, here are two more functions, utf8_strpos and utf8_substr_count, that work without the Multibyte String (mbstring) extension:
function utf8_strpos($haystack, $needle, $offset = 0)
{
    # skip the first $offset characters (not bytes), if an offset is given
    $offset = max(0, (int) $offset);
    if ($offset > 0)
    {
        preg_match('/^.{' . $offset . '}(.*)/us', $haystack, $dummy);
        $haystack = (isset($dummy[1])) ? $dummy[1] : '';
    }
    if ($haystack == '') return false;
    # byte position of the needle in the remaining string
    $p = strpos($haystack, $needle);
    if ($p === false) return false;
    $r = $offset;
    $i = 0;
    # convert the byte position into a character position
    while ($i < $p)
    {
        if (ord($haystack[$i]) < 128)
        {
            # ASCII character: exactly one byte
            $i = $i + 1;
        }
        else
        {
            # lead byte of a multibyte character: the number of
            # leading 1 bits equals the character's length in bytes
            $bvalue = decbin(ord($haystack[$i]));
            $i = $i + strlen(preg_replace('/^(1+)(.+)$/', '\1', $bvalue));
        }
        $r++;
    }
    return $r;
}
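The loop in utf8_strpos turns a byte offset into a character offset by reading the length of each character from its lead byte. As a rough sketch of the same idea using bit masks instead of decbin and a regular expression (the helper names utf8_lead_byte_length and utf8_byte_to_char_pos are hypothetical and not part of the functions above; the input is assumed to be valid UTF-8):

```php
<?php
// Hypothetical helper: length in bytes of the UTF-8 character
// that starts with the lead byte $byte.
function utf8_lead_byte_length($byte)
{
    $o = ord($byte);
    if ($o < 0x80) return 1;            // 0xxxxxxx: ASCII
    if (($o & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($o & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($o & 0xF8) === 0xF0) return 4; // 11110xxx
    return 1; // continuation or invalid byte: step one byte (assumption)
}

// Hypothetical helper: convert a byte offset into a character offset.
// Assumes $str is valid UTF-8 and $bytePos lies on a character boundary.
function utf8_byte_to_char_pos($str, $bytePos)
{
    $chars = 0;
    for ($i = 0; $i < $bytePos; $chars++) {
        $i += utf8_lead_byte_length($str[$i]);
    }
    return $chars;
}

$s = "caf\xC3\xA9 au lait";   // "café au lait"; the é takes two bytes
$bytePos = strpos($s, 'au');  // byte offset 6
echo utf8_byte_to_char_pos($s, $bytePos), "\n"; // character offset 5
```

The byte offset and the character offset differ by one here because the single two-byte character é precedes the match.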
function utf8_substr_count($h, $n)
{
    # escape regex metacharacters in $n (and the "/" delimiter)
    $n = preg_quote($n, '/');
    # collect all non-overlapping matches
    preg_match_all('/' . $n . '/u', $h, $dummy);
    return count($dummy[0]);
}
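To illustrate why utf8_substr_count passes the needle through preg_quote before building the pattern, here is a small sketch that counts the same needle with and without quoting. The sample text and variable names are made up for this illustration.

```php
<?php
$text   = "1+1=2, 1+1=2, 111=2";
$needle = "1+1";

// Unquoted: "+" is a quantifier, so the pattern means
// "one or more 1s followed by a 1" and only "111" matches.
$raw = preg_match_all('/' . $needle . '/u', $text, $m1);

// Quoted: "+" is escaped and matched literally,
// so both occurrences of "1+1" are found.
$quoted = preg_match_all('/' . preg_quote($needle, '/') . '/u', $text, $m2);

echo $raw, " ", $quoted, "\n"; // prints "1 2"
```

Without quoting, any needle containing characters such as ".", "+", "?" or "(" would be interpreted as a pattern instead of literal text, and the count would be wrong or the pattern would fail to compile.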
See also: PHP & UTF-8. Chapter 1.
03/19/2006.