PHP Classes

utf-8 strings

Subject: utf-8 strings
Summary: doesn't work correctly
Messages: 4
Author: Maier Karl
Date: 2016-04-17 12:39:07
 

  1. utf-8 strings
Maier Karl - 2016-04-17 12:39:08
strtolower doesn't work with UTF-8 strings.
For example: German characters such as ä, ö, ü ...

  2. Re: utf-8 strings
Lionel F. Lebeau - 2016-04-17 15:00:19 - In reply to message 1 from Maier Karl
You are right, Karl.
I modified the two filter functions to take it into account.
I just had to replace strtolower with mb_strtolower.

  3. Re: utf-8 strings
Maier Karl - 2016-04-17 17:22:05 - In reply to message 2 from Lionel F. Lebeau
I found out that mb_strtolower takes a lot of time, so the class's performance drops noticeably.

In my script I use the following:

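if (mb_detect_encoding($ele) === 'UTF-8') {
    // Detected as UTF-8: use the slower, encoding-aware function.
    $ele = mb_strtolower($ele, 'UTF-8');
} else {
    // Anything not detected as UTF-8 (for example plain ASCII): use the fast byte-wise function.
    $ele = strtolower($ele);
}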

The best way, I think, is to write an internal function like the one above and then call it in array_map.
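
A minimal sketch of that idea, assuming words arrive as an array of strings that are either plain ASCII or UTF-8. The function name lowercase_word is hypothetical and not part of the class; the explicit detection order ['ASCII', 'UTF-8'] is also an assumption, chosen so that pure ASCII words can take the fast path.

```php
<?php
// Hypothetical helper (not part of the original class): lowercase one word,
// using mb_strtolower only when the word is detected as UTF-8.
function lowercase_word(string $word): string
{
    // Strict detection against an explicit candidate list (an assumption of this sketch).
    if (mb_detect_encoding($word, ['ASCII', 'UTF-8'], true) === 'UTF-8') {
        return mb_strtolower($word, 'UTF-8');
    }
    // Everything else falls back to the byte-wise function.
    return strtolower($word);
}

// Apply the helper to every element, as suggested for the array_map call.
$words = ['ABC', 'ÄÖÜ', 'Straße'];
$lowered = array_map('lowercase_word', $words);
```

This sketch only distinguishes ASCII from UTF-8; text in any other encoding would go through strtolower unchanged in its non-ASCII bytes.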

Best regards

  4. Re: utf-8 strings
Lionel F. Lebeau - 2016-04-17 22:42:04 - In reply to message 3 from Maier Karl
What if the text is not encoded in UTF-8 but in another multibyte encoding?
For example, I often have to work with Japanese texts encoded in EUC-JP or Shift JIS.
So I think that using mb_strtolower(), although slower, is better than having to detect the encoding ^_^

But, of course, if you know that your multibyte text will only ever be UTF-8, you can replace the array_map() call ^_^
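
To illustrate the point about other encodings: when the encoding of the input is already known (from file metadata, for instance), it can be passed to mb_strtolower() explicitly instead of being detected. This is only a sketch. The function name lowercase_words and its $encoding parameter are hypothetical, and ISO-8859-1 is used in the usage lines purely as an easy-to-check stand-in for encodings such as EUC-JP.

```php
<?php
// Hypothetical helper: lowercase every word using an encoding stated by the caller.
function lowercase_words(array $words, string $encoding): array
{
    return array_map(
        static function (string $word) use ($encoding): string {
            // mb_strtolower accepts the encoding as its second argument.
            return mb_strtolower($word, $encoding);
        },
        $words
    );
}

// Usage: build a non-UTF-8 input, lowercase it, then convert back for display.
$latin1 = mb_convert_encoding('ÄPFEL', 'ISO-8859-1', 'UTF-8');
$lower  = lowercase_words([$latin1], 'ISO-8859-1');
$shown  = mb_convert_encoding($lower[0], 'UTF-8', 'ISO-8859-1');
```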

I use a much more complicated class to index chapters published on my main website. Some have more than 10000 words. It is launched by a cron job because it can take a long time (there are always several chapters to index).
In fact, if it is very slow, you could split the big text into smaller chunks and launch two or more subroutines in parallel.
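
The splitting step could look roughly like the sketch below. It assumes UTF-8 text separated by whitespace; the function name split_into_chunks and the chunk size are hypothetical. It only covers the splitting, since how the chunks are then processed in parallel (separate cron jobs, separate processes) depends on the setup.

```php
<?php
// Hypothetical helper: split a text into pieces of at most $wordsPerChunk words.
function split_into_chunks(string $text, int $wordsPerChunk): array
{
    // Split on runs of whitespace; the /u modifier assumes valid UTF-8 input.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return []; // input was not valid UTF-8
    }

    $chunks = [];
    // array_chunk needs a size of at least 1.
    foreach (array_chunk($words, max(1, $wordsPerChunk)) as $group) {
        $chunks[] = implode(' ', $group);
    }
    return $chunks;
}

// Usage: each chunk could then be handed to its own indexing run.
$chunks = split_into_chunks('a b c d e', 2);
```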