Perl keeps track of whether a string holds decoded characters or just raw bytes. If you grab UTF-8 data out of a database or from HTTP parameters, Perl usually has no way of knowing that it is UTF-8, so it treats every byte as a separate character and gets things like length() wrong. This function concatenates the strings you pass it and returns the result marked as UTF-8:
sub mark_utf8 { pack "U0C*", unpack "C*", join('',@_); }
no subject
Date: 2009-04-17 10:53 am (UTC)
use Encode;
sub mark_utf8 { return decode("UTF-8", shift); }
Going through Encode is safer because Perl's internal "utf8" encoding differs from standard "UTF-8" in subtle ways. (I don't know all the differences, but one is that Perl's version is more lax about what it accepts.) The Encode module knows how to handle these differences, so asking it for "UTF-8" gets you the standard encoding.
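To make the strict versus lax distinction a bit more concrete, here is a small sketch. It assumes a reasonably recent Encode (the one bundled with Perl 5.8.7 or later); the surrogate byte sequence is just an illustrative input I picked, not something from the post above.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Encode registers two different encodings under these two spellings.
my $strict_name = find_encoding("UTF-8")->name;   # the strict, standard one
my $lax_name    = find_encoding("utf8")->name;    # Perl's lax internal one
print "UTF-8 resolves to: $strict_name\n";
print "utf8  resolves to: $lax_name\n";

# Illustrative input (an assumption for this demo): the bytes a UTF-16
# surrogate (U+D800) would have if someone encoded it as UTF-8.
# Standard UTF-8 forbids this sequence.
my $surrogate_bytes = "\xED\xA0\x80";

# Decode a copy with FB_CROAK so that bad input dies instead of being
# silently replaced with substitution characters.
my $strict_ok = eval {
    my $copy = $surrogate_bytes;
    decode("UTF-8", $copy, Encode::FB_CROAK);
    1;
} ? 1 : 0;
print "strict decode accepted the surrogate: ",
    ($strict_ok ? "yes" : "no"), "\n";
```

Without a check argument, decode() does not die on bad input; it substitutes replacement characters instead, so the one-liner above is forgiving by default.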
Similarly, to unmark:
use Encode;
sub unmark_utf8 { return encode("UTF-8", shift); }
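A quick round trip shows what the two helpers actually change. This is only a sketch: the sample string is made up for the demo, and the subs are copied from the snippets above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Same helpers as in the comment above.
sub mark_utf8   { return decode("UTF-8", shift); }
sub unmark_utf8 { return encode("UTF-8", shift); }

# Made-up sample: "caf" followed by the two UTF-8 bytes for e-acute,
# as it might arrive from a database or an HTTP parameter.
my $bytes = "caf\xC3\xA9";

my $chars = mark_utf8($bytes);     # now a character string
my $again = unmark_utf8($chars);   # back to raw UTF-8 bytes

# length() counts bytes before decoding and characters after.
printf "bytes: %d, chars: %d, round trip ok: %s\n",
    length($bytes), length($chars),
    ($again eq $bytes ? "yes" : "no");
```

Note that this version of mark_utf8 only looks at its first argument, whereas the pack-based one in the post joins everything it is given.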
(no subject)
From: