Text::Iconv - Perl interface to iconv codeset conversion function
Text::Iconv - Perl interface to iconv() codeset conversion function
use Text::Iconv;
$converter = Text::Iconv->new("fromcode", "tocode");
$converted = $converter->convert("Text to convert");
The Text::Iconv module provides a Perl interface to the iconv()
function as defined by the Single UNIX Specification.
The convert() method converts the encoding of characters in the input
string from the fromcode codeset to the tocode codeset, and returns
the result.
Settings of fromcode and tocode and their permitted combinations
are implementation-dependent. Valid values are specified in the
system documentation; the iconv(1) utility should also provide a -l
option that lists all supported codesets.
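As an illustration of the description above, here is a minimal sketch of a conversion from ISO-8859-1 to UTF-8. The two codeset names are assumptions: they are widely supported, but check the output of iconv -l on your system. The sketch also assumes that convert() returns the result as a plain byte string.

```perl
use strict;
use warnings;
use Text::Iconv;

# Codeset names are assumptions; verify them with `iconv -l`.
my $converter = Text::Iconv->new("ISO-8859-1", "UTF-8");

# "caf\xE9" is the word "cafe" with an acute accent on the final e,
# encoded as four ISO-8859-1 bytes.
my $latin1 = "caf\xE9";
my $utf8   = $converter->convert($latin1);

# In UTF-8 the accented character occupies two bytes (0xC3 0xA9),
# so the converted string is one byte longer than the input.
printf "%d bytes in, %d bytes out\n", length($latin1), length($utf8);
```

If the codeset pair is not supported, new() raises an exception rather than returning, so wrap the constructor in eval {} when the names come from user input.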
Text::Iconv objects also provide the following methods:
retval() returns the return value of the underlying iconv() function
for the last conversion; according to the Single UNIX Specification,
this value indicates "the number of non-identical conversions
performed." Note, however, that iconv implementations vary widely in
the interpretation of this specification.
This method can be called after calling convert(), e.g.:
$result = $converter->convert("lorem ipsum dolor sit amet");
$retval = $converter->retval;
When called before the first call to convert(), or if an error occurred
during the conversion, retval() returns undef.
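The following sketch shows how the undef cases described above might be handled in practice. The codeset names are assumptions, and the meaning of a defined return value depends on the iconv implementation, so the sketch only reports it.

```perl
use strict;
use warnings;
use Text::Iconv;

# Codeset names are assumptions; verify them with `iconv -l`.
my $converter = Text::Iconv->new("ISO-8859-1", "UTF-8");

# No conversion has been performed yet, so retval() is undef here.
my $before = $converter->retval;
print "before convert(): ", (defined $before ? $before : "undef"), "\n";

my $result = $converter->convert("lorem ipsum dolor sit amet");
my $retval = $converter->retval;

if (defined $result && defined $retval) {
    # The interpretation of this number is implementation-specific.
    print "converted; iconv() returned $retval\n";
} else {
    print "conversion failed; retval() is undef\n";
}
```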
get_attr(): This method is only available with GNU libiconv, otherwise
it throws an exception. The get_attr() method allows you to query
various attributes which influence the behavior of convert(). The
currently supported attributes are trivialp, transliterate, and
discard_ilseq, e.g.:
$state = $converter->get_attr("transliterate");
See iconvctl(3) for details. To ensure portability to other iconv
implementations you should first check for the availability of this
method using eval {}, e.g.:
eval { $conv->get_attr("trivialp") };
if ($@) {
  # get_attr() is not available
}
else {
  # get_attr() is available
}
This method should be considered experimental.
set_attr(): This method is only available with GNU libiconv, otherwise
it throws an exception. The set_attr() method allows you to set
various attributes which influence the behavior of convert(). The
currently supported attributes are transliterate and
discard_ilseq, e.g.:
$state = $converter->set_attr("transliterate" => 1);
See iconvctl(3) for details. To ensure portability to other iconv
implementations you should first check for the availability of this
method using eval {}, cf. the description of get_attr() above.
This method should be considered experimental.
If the conversion can't be initialized, an exception is raised (using croak()).
Text::Iconv provides a class attribute raise_error and a
corresponding class method for setting and getting its value. The
handling of errors during conversion depends on the setting of this
attribute. If raise_error is set to a true value, an exception is
raised; otherwise, the convert() method only returns undef. By
default raise_error is false. Example usage:
Text::Iconv->raise_error(1);     # Conversion errors raise exceptions
Text::Iconv->raise_error(0);     # Conversion errors return undef
$a = Text::Iconv->raise_error(); # Get current setting
As an experimental feature, Text::Iconv also provides an instance attribute raise_error and a corresponding method for setting and getting its value. If raise_error is undef, the class-wide settings apply. If raise_error is 1 or 0 (true or false), the object settings override the class-wide settings.
Consult iconv(3) for details on errors that might occur.
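To make the error handling described above concrete, here is a minimal sketch that turns on the class-wide raise_error attribute and catches the resulting exception with eval {}. The codeset names and the invalid input are assumptions chosen for illustration; the exact error message depends on the iconv implementation, so the sketch only prints it.

```perl
use strict;
use warnings;
use Text::Iconv;

Text::Iconv->raise_error(1);    # conversion errors now raise exceptions

# Codeset names are assumptions; verify them with `iconv -l`.
my $converter = Text::Iconv->new("UTF-8", "ISO-8859-1");

# The byte 0xFF never occurs in valid UTF-8, so iconv() should
# reject this input as an illegal sequence.
my $converted = eval { $converter->convert("abc\xFF") };
my $error     = $@;

if (!defined $converted) {
    print "conversion failed: ", ($error || "unknown error\n");
}

Text::Iconv->raise_error(0);    # restore the default behavior
```

With raise_error left at its default false value, the same call would return undef without setting $@, so the caller would test only the return value.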
Converting undef, e.g.,
$converted = $converter->convert(undef);
always returns undef. This is not considered an error.
The supported codesets, their names, the supported conversions, and the quality of the conversions are all system-dependent.
[ Added for Windows users (jl_morel@bribes.org)
Here is the list of the names of the supported encodings for this binary distro. The names are printed in upper case, separated by whitespace, and alias names of an encoding are listed on the same line as the encoding itself.
ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII UTF-8 ISO-10646-UCS-2 UCS-2 CSUNICODE UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11 UCS-2LE UNICODELITTLE ISO-10646-UCS-4 UCS-4 CSUCS4 UCS-4BE UCS-4LE UTF-16 UTF-16BE UTF-16LE UTF-32 UTF-32BE UTF-32LE UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7 UCS-2-INTERNAL UCS-2-SWAPPED UCS-4-INTERNAL UCS-4-SWAPPED C99 JAVA CP819 IBM819 ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN1 ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2 ISO_8859-2:1987 L2 LATIN2 CSISOLATIN2 ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3 ISO_8859-3:1988 L3 LATIN3 CSISOLATIN3 ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4 ISO_8859-4:1988 L4 LATIN4 CSISOLATIN4 CYRILLIC ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5 ISO_8859-5:1988 CSISOLATINCYRILLIC ARABIC ASMO-708 ECMA-114 ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6 ISO_8859-6:1987 CSISOLATINARABIC ECMA-118 ELOT_928 GREEK GREEK8 ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7 ISO_8859-7:1987 CSISOLATINGREEK HEBREW ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8 ISO_8859-8:1988 CSISOLATINHEBREW ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9 ISO_8859-9:1989 L5 LATIN5 CSISOLATIN5 ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6 CSISOLATIN6 ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 L7 LATIN7 ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8 ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 LATIN-9 ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10 KOI8-R CSKOI8R KOI8-U KOI8-RU CP1250 MS-EE WINDOWS-1250 CP1251 MS-CYRL WINDOWS-1251 CP1252 MS-ANSI WINDOWS-1252 CP1253 MS-GREEK WINDOWS-1253 CP1254 MS-TURK WINDOWS-1254 CP1255 MS-HEBR WINDOWS-1255 CP1256 MS-ARAB WINDOWS-1256 CP1257 WINBALTRIM WINDOWS-1257 CP1258 WINDOWS-1258 850 CP850 IBM850 CSPC850MULTILINGUAL 862 CP862 IBM862 CSPC862LATINHEBREW 866 CP866 IBM866 CSIBM866 MAC MACINTOSH MACROMAN CSMACINTOSH 
MACCENTRALEUROPE MACICELAND MACCROATIAN MACROMANIA MACCYRILLIC MACUKRAINE MACGREEK MACTURKISH MACHEBREW MACARABIC MACTHAI HP-ROMAN8 R8 ROMAN8 CSHPROMAN8 NEXTSTEP ARMSCII-8 GEORGIAN-ACADEMY GEORGIAN-PS KOI8-T MULELAO-1 CP1133 IBM-CP1133 ISO-IR-166 TIS-620 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 TIS620.2533-1 CP874 WINDOWS-874 VISCII VISCII1.1-1 CSVISCII TCVN TCVN-5712 TCVN5712-1 TCVN5712-1:1993 ISO-IR-14 ISO646-JP JIS_C6220-1969-RO JP CSISO14JISC6220RO JISX0201-1976 JIS_X0201 X0201 CSHALFWIDTHKATAKANA ISO-IR-87 JIS0208 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 X0208 CSISO87JISX0208 ISO-IR-159 JIS_X0212 JIS_X0212-1990 JIS_X0212.1990-0 X0212 CSISO159JISX02121990 CN GB_1988-80 ISO-IR-57 ISO646-CN CSISO57GB1988 CHINESE GB_2312-80 ISO-IR-58 CSISO58GB231280 CN-GB-ISOIR165 ISO-IR-165 ISO-IR-149 KOREAN KSC_5601 KS_C_5601-1987 KS_C_5601-1989 CSKSC56011987 EUC-JP EUCJP EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE CSEUCPKDFMTJAPANESE MS_KANJI SHIFT-JIS SHIFT_JIS SJIS CSSHIFTJIS CP932 ISO-2022-JP CSISO2022JP ISO-2022-JP-1 ISO-2022-JP-2 CSISO2022JP2 CN-GB EUC-CN EUCCN GB2312 CSGB2312 CP936 GBK MS936 WINDOWS-936 GB18030 ISO-2022-CN CSISO2022CN ISO-2022-CN-EXT HZ HZ-GB-2312 EUC-TW EUCTW CSEUCTW BIG-5 BIG-FIVE BIG5 BIGFIVE CN-BIG5 CSBIG5 CP950 BIG5-HKSCS BIG5HKSCS EUC-KR EUCKR CSEUCKR CP949 UHC CP1361 JOHAB ISO-2022-KR CSISO2022KR CP856 CP922 CP943 CP1046 CP1124 CP1129 CP1161 IBM-1161 IBM1161 CSIBM1161 CP1162 IBM-1162 IBM1162 CSIBM1162 CP1163 IBM-1163 IBM1163 CSIBM1163 DEC-KANJI DEC-HANYU 437 CP437 IBM437 CSPC8CODEPAGE437 CP737 CP775 IBM775 CSPC775BALTIC 852 CP852 IBM852 CSPCP852 CP853 855 CP855 IBM855 CSIBM855 857 CP857 IBM857 CSIBM857 CP858 860 CP860 IBM860 CSIBM860 861 CP-IS CP861 IBM861 CSIBM861 863 CP863 IBM863 CSIBM863 CP864 IBM864 CSIBM864 865 CP865 IBM865 CSIBM865 869 CP-GR CP869 IBM869 CSIBM869 CP1125 EUC-JISX0213 SHIFT_JISX0213 ISO-2022-JP-3 ISO-IR-230 TDS565 RISCOS-LATIN1 ]
Michael Piotrowski <mxp@dynalabs.de>
iconv(1), iconv(3)