[personal profile] marnanel posting in [community profile] perl
Perl keeps track of whether the data in a string is stored as UTF-8 characters or as plain bytes. If you grab UTF-8 stuff out of a database or from HTTP parameters, Perl doesn't know it's UTF-8, so it treats it as bytes and gets it wrong: "é", for example, comes out as two characters instead of one. This function returns the strings you pass it, concatenated and marked as UTF-8:

# Reinterpret the bytes of the concatenated arguments as UTF-8 characters.
sub mark_utf8 { return pack "U0C*", unpack "C*", join('', @_); }
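As a quick illustration, here is a sketch that runs the function on a sample input. It assumes the data arrived as the raw UTF-8 bytes for "café" (`\xC3\xA9` is the two-byte encoding of "é"); `utf8::is_utf8` is Perl's built-in check for the internal flag, and the variable names are just for the example.

```perl
use strict;
use warnings;

# The function from the post: concatenate the arguments and
# reinterpret the resulting bytes as UTF-8 characters.
sub mark_utf8 { return pack "U0C*", unpack "C*", join('', @_); }

# Raw bytes, e.g. as read from a database: "caf" followed by the
# two-byte UTF-8 sequence for "é".
my $bytes = "caf\xC3\xA9";

my $chars = mark_utf8($bytes);

printf "before: length %d, flagged %s\n",
    length($bytes), utf8::is_utf8($bytes) ? "yes" : "no";
printf "after:  length %d, flagged %s\n",
    length($chars), utf8::is_utf8($chars) ? "yes" : "no";
```

Before marking, `length` counts 5 bytes; afterwards it counts 4 characters and the flag is set.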

Date: 2009-04-17 10:53 am (UTC)
From: [personal profile] sophie
Actually, the better way to do this, on Perl v5.8 or higher, is:

use Encode;

sub mark_utf8 { return decode("UTF-8", join('', @_)); }


This is because Perl's internal "utf8" encoding differs from standard "UTF-8" in subtle ways. (I don't know all the differences, but the main one is that Perl's "utf8" is lax: it accepts sequences that strict UTF-8 forbids, such as encoded surrogates and code points above U+10FFFF.) The Encode module's "UTF-8" encoding follows the strict rules, so it handles these cases correctly.
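A minimal sketch of that strictness, assuming a sample string with a stray `\xFF` byte (a byte that never occurs in well-formed UTF-8). With the default check, Encode's "UTF-8" decoder substitutes U+FFFD REPLACEMENT CHARACTER; passing `Encode::FB_CROAK` makes it die instead. `LEAVE_SRC` is added because, with a check flag set, `decode` may otherwise modify its input in place. The exact wording of the error message varies between Encode versions.

```perl
use strict;
use warnings;
use Encode;

# A byte that can never appear in well-formed UTF-8.
my $bad = "ok \xFF";

# Default behaviour: malformed bytes become U+FFFD REPLACEMENT CHARACTER.
my $lenient = decode("UTF-8", $bad);
print index($lenient, "\x{FFFD}") >= 0 ? "replaced\n" : "not replaced\n";

# With FB_CROAK, decode() dies on malformed input instead;
# LEAVE_SRC stops it from modifying $copy in place.
my $copy = $bad;
my $ok = eval {
    decode("UTF-8", $copy, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
print $ok ? "accepted\n" : "rejected: $@";
```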

Similarly, to unmark (turn characters back into UTF-8 bytes):

use Encode;

sub unmark_utf8 { return encode("UTF-8", shift); }
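To check that the two helpers undo each other, here is a small round-trip sketch. The sample text is again an assumed "café"; unlike the one-argument version in the comment, this `mark_utf8` joins all its arguments, matching the pack one-liner in the post.

```perl
use strict;
use warnings;
use Encode;

# Decode UTF-8 bytes into characters (concatenating all arguments).
sub mark_utf8   { return decode("UTF-8", join('', @_)); }

# Encode characters back into UTF-8 bytes.
sub unmark_utf8 { return encode("UTF-8", shift); }

my $chars = "caf\x{e9}";            # 4 characters
my $bytes = unmark_utf8($chars);     # 5 bytes: "caf\xC3\xA9"

printf "%d bytes, flagged %s\n",
    length($bytes), utf8::is_utf8($bytes) ? "yes" : "no";

# Decoding the pieces separately-joined gives the same characters back.
my $back = mark_utf8("caf", "\xC3\xA9");
print $back eq $chars ? "round trip ok\n" : "round trip failed\n";
```

The encoded result is a plain byte string (flag off), which is what you want to hand to a database driver or an HTTP response body that expects bytes.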
