PHP & UTF-8. Chapter 2

Continuing the topic of working with UTF-8 encoded strings, we consider two more functions, utf8_strpos and utf8_substr_count, which work without the Multibyte String Functions (mbstring) extension:

function utf8_strpos($haystack, $needle, $offset = 0)
{
    # normalize the offset: an integer, never negative
    $offset = max(0, (int) $offset);

    # skip the first $offset characters (not bytes) of the haystack
    if ($offset > 0)
    {
        preg_match('/^.{' . $offset . '}(.*)/us', $haystack, $dummy);
        $haystack = (isset($dummy[1])) ? $dummy[1] : '';
    }

    # byte position of the needle in the remaining haystack
    if ($haystack == '') return false;
    $p = strpos($haystack, $needle);
    if ($p === false) return false;

    # convert the byte position to a character position
    $r = $offset;
    $i = 0;
    while ($i < $p)
    {
        if (ord($haystack[$i]) < 128)
        {
            # ASCII character: exactly one byte
            $i = $i + 1;
        }
        else
        {
            # multibyte character: the number of leading 1 bits
            # in its first byte equals its length in bytes
            $bvalue = decbin(ord($haystack[$i]));
            $i = $i + strlen(preg_replace('/^(1+)(.+)$/', '\1', $bvalue));
        }
        $r++;
    }
    return $r;
}
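
The loop in utf8_strpos turns the byte position returned by strpos into a character position by reading each character's length from its first byte. The sketch below illustrates the same idea with bit masks instead of decbin and a regular expression. The helper name utf8_lead_byte_length is hypothetical and is not part of the code above, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not from the article): returns how many bytes a
// UTF-8 character occupies, judging only by its first byte.
// Assumes valid UTF-8 input and that $byte is the first byte of a character.
function utf8_lead_byte_length($byte)
{
    $b = ord($byte);
    if ($b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if (($b & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($b & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($b & 0xF8) === 0xF0) return 4; // 11110xxx
    return 1; // continuation or invalid byte: just step over it
}

// Byte position versus character position in a Cyrillic string.
$s = "привет мир";
$bytePos = strpos($s, "мир"); // counted in bytes

// Walk the bytes before the match, one whole character per step.
$charPos = 0;
for ($i = 0; $i < $bytePos; $i += utf8_lead_byte_length($s[$i])) {
    $charPos++;
}
```

Here each Cyrillic letter takes two bytes and the space takes one, so $bytePos is 13 while $charPos is 7, which is the value utf8_strpos is meant to return for the same arguments.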

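function utf8_substr_count($h, $n)
{
    # an empty needle would match at every position
    if ($n === '') return 0;

    # escape $n so it is matched literally in the regular expression
    $n = preg_quote($n, '/');

    # count all non-overlapping matches
    $count = preg_match_all('/' . $n . '/u', $h, $dummy);
    return ($count === false) ? 0 : $count;
}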
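
As a small illustration of why utf8_substr_count passes the needle through preg_quote, the sketch below counts the same needle with and without escaping. The sample strings and variable names are made up for this example.

```php
<?php
// Illustration only: the effect of preg_quote on a needle that
// contains a regular expression metacharacter.
$haystack = "a.b axb a.b";
$needle   = "a.b";

// Unquoted: "." acts as a wildcard, so "axb" is counted as well.
$unquoted = preg_match_all('/' . $needle . '/u', $haystack, $m1);

// Quoted: "." is escaped and only the literal "a.b" matches.
$quoted = preg_match_all('/' . preg_quote($needle, '/') . '/u', $haystack, $m2);

// Cyrillic needle: the /u modifier makes PCRE treat both strings as UTF-8.
$cyr = preg_match_all('/' . preg_quote("на", '/') . '/u', "нанана", $m3);
```

Without escaping the count is 3, with escaping it is 2, and the Cyrillic needle is found 3 times. Matches are non-overlapping, the same way the built-in substr_count counts them.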

See also: PHP & UTF-8. Chapter 1.

Nikolay I. Yarovoy,
03/19/2006.
