PHP & UTF-8. Chapter 1

When developing multilingual websites, it is more convenient and preferable to encode HTML pages in UTF-8. UTF-8 supports all or nearly all existing languages: ASCII characters (Latin letters, digits and special symbols) are encoded in one byte, while characters of national alphabets take several bytes (two to four). In other words, UTF-8 is a variable-length encoding, and the physical length of a character depends on the character itself. This is why programming multilingual sites sometimes runs into problems.
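To see the variable length in practice, the minimal sketch below prints how many bytes UTF-8 spends on a few sample characters. The samples are arbitrary illustrations, and the `"\u{...}"` escape syntax assumes PHP 7.0 or later (it keeps the example independent of the source file's own encoding).

```php
<?php
// strlen() counts bytes, so it shows how many bytes UTF-8 uses per character.
// "\u{...}" escapes (PHP 7.0+) produce the UTF-8 bytes of the given code point.
$samples = [
    'Latin A'      => "A",
    'Cyrillic Zhe' => "\u{0416}",
    'Euro sign'    => "\u{20AC}",
    'Emoji'        => "\u{1F600}",
];
foreach ($samples as $name => $ch) {
    printf("%-12s %d byte(s): %s\n", $name, strlen($ch), bin2hex($ch));
}
```

The four characters take 1, 2, 3 and 4 bytes respectively (hex `41`, `d096`, `e282ac`, `f09f9880`).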

For example, PHP's strlen and substr functions return incorrect results when a string contains characters of a national alphabet, because they are designed for single-byte encodings and count bytes rather than characters. PHP does have functions such as mb_strlen and mb_substr, which are designed specifically for multibyte strings. However, support for the Multibyte String Functions is disabled in PHP by default, which automatically limits the choice of hosting for the site being developed. Moreover, the supported languages are chosen when the mbstring module is enabled, so the language you need may not be on the list.
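The problem is easy to reproduce. The sketch below uses the Russian word "Привет" as a hypothetical sample string (written with escapes, so PHP 7.0+ is assumed). It shows strlen() counting bytes, substr() cutting a character in half, and mb_strlen() giving the right answer only when the mbstring extension happens to be loaded.

```php
<?php
// "Привет" ("hello" in Russian): 6 Cyrillic letters, 2 bytes each in UTF-8.
$s = "\u{041F}\u{0440}\u{0438}\u{0432}\u{0435}\u{0442}";

echo strlen($s), "\n";             // 12: bytes, not the 6 characters a reader sees

// substr() cuts at a byte offset, here through the middle of the second letter.
$cut = substr($s, 0, 3);
echo bin2hex($cut), "\n";          // d09fd1: "П" plus half of "р"

// With the /u modifier, preg_match() returns false for malformed UTF-8,
// which makes it a convenient validity check.
var_dump(preg_match('//u', $cut)); // bool(false)

// mb_strlen() counts characters, but only if the mbstring extension is loaded.
if (function_exists('mb_strlen')) {
    echo mb_strlen($s, 'UTF-8'), "\n"; // 6
}
```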

However, there is another, more convenient and flexible solution. The PCRE functions handle UTF-8 correctly when the /u modifier is used, so they can be used to write the functions utf8_strlen and utf8_substr:

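// Character count of a UTF-8 string: the /u modifier makes PCRE work in UTF-8,
// and /s lets "." match newlines too, so every character is counted.
function utf8_strlen($s)
{
    return preg_match_all('/./us', $s, $tmp);
}

// UTF-8 counterpart of substr(); $len = null means "up to the end of the string".
// Returns false if $offset lies beyond the end of $s.
function utf8_substr($s, $offset, $len = null)
{
    $total = utf8_strlen($s);
    if ($offset < 0) $offset = max(0, $total + $offset);     // count from the end
    if ($len !== null) {
        if ($len < 0) $len = max(0, $total - $offset + $len); // drop chars at the end
        $len = min($len, max(0, $total - $offset));          // never run past the end
        preg_match('/^.{' . $offset . '}(.{0,' . $len . '})/us', $s, $tmp);
    } else {
        preg_match('/^.{' . $offset . '}(.*)/us', $s, $tmp);
    }
    return isset($tmp[1]) ? $tmp[1] : false;
}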
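A related PCRE-based variant is sketched below as an alternative, not as part of the original method. Patterns like `.{N}` are built from the offset, and PCRE rejects repeat counts above 65535, so very long strings can break them. Splitting the string into an array of characters with preg_split() avoids building quantifiers at all. The helper names utf8_chars and utf8_substr_split are hypothetical.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
// The empty pattern with /u matches between characters; PREG_SPLIT_NO_EMPTY
// drops the empty pieces at the start and end. Returns false on invalid UTF-8.
function utf8_chars($s)
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical substr() counterpart built on array_slice(), which treats
// negative offsets and lengths the same way substr() does.
function utf8_substr_split($s, $offset, $len = null)
{
    $chars = utf8_chars($s);
    if ($chars === false) {
        return false; // malformed UTF-8 input
    }
    return implode('', array_slice($chars, $offset, $len));
}

$s = "\u{041F}\u{0440}\u{0438}\u{0432}\u{0435}\u{0442}"; // "Привет"
echo count(utf8_chars($s)), "\n";        // 6
echo utf8_substr_split($s, 1, 3), "\n";  // рив
echo utf8_substr_split($s, -2), "\n";    // ет
```

The trade-off is memory: the whole string is expanded into an array, one element per character. Unlike utf8_substr, an offset past the end yields an empty string rather than false.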

See also: PHP & UTF-8. Chapter 2.


Nikolay I. Yarovoy


© 2021 ControlStyle, web site development. All rights reserved.