IANA and JDK-specific character encoding names

by Anton Tagunov

First of all, IANA keeps a registry of character set (character encoding) names. It can be accessed here.

Second, the JDK docs include a list of the character encodings supported by the JDK. For JDK 1.3 this list can be accessed here, or in your own copy of the JDK documentation at docs/guide/intl/encoding.doc.html.

(As this doc says, these encodings are supported if you have jre/lib/i18n.jar in your distribution.)

The list identifies each supported encoding by its canonical name.

The canonical name of a character encoding is the value returned by the getEncoding method of the java.io.InputStreamReader and java.io.OutputStreamWriter classes. Character sets in JDK 1.3 may have aliases; if a java.io.InputStreamReader or java.io.OutputStreamWriter is given an alias at creation time, its getEncoding method still returns the canonical name.
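The behaviour of getEncoding described above can be tried with a small program. This is only a sketch: the class name CanonicalNameDemo and the helper canonicalName are made up for illustration, and the code uses try-with-resources, so it assumes a much newer JDK than 1.3. What it prints depends on the runtime you use.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the JDK.
class CanonicalNameDemo {

    // Creates a reader with the requested encoding name (canonical name or
    // alias) and returns whatever getEncoding() reports for it.
    public static String canonicalName(String requestedName) {
        try (InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), requestedName)) {
            // Must be called while the reader is still open.
            return reader.getEncoding();
        } catch (IOException e) {
            // Covers UnsupportedEncodingException (unknown name) as well.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An alias and the canonical name should report the same value.
        System.out.println(canonicalName("UTF-8"));
        System.out.println(canonicalName("UTF8"));
    }
}
```

If the runtime treats UTF-8 as an alias of UTF8, both lines of output are the same name.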

Canonical names are not always the same as the IANA preferred names for the character encodings, but the IANA preferred names are recognised as aliases of the canonical names. For example:

IANA preferred name    JDK canonical name
UTF-8                  UTF8
ISO-8859-1             ISO8859_1
Shift_JIS              SJIS
KOI8-R                 KOI8_R

It may be worth noting that the Xerces sources contain a special utility function that performs conversions of the form UTF-8 -> UTF8, and there is some wording there about JDK 1.1.

So we can suspect that JDK 1.1 did not support the IANA preferred names as aliases for all the character encodings that it supported.
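A conversion of the UTF-8 -> UTF8 kind can be sketched as a plain lookup table. This is not the Xerces code. The class name IanaToJdkName and the method toJdkName are hypothetical, the table holds only the four pairs listed earlier, and falling back to the input for unknown names is an assumption made for the sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, written for illustration only.
class IanaToJdkName {

    // Keys are IANA preferred names in upper case; values are JDK canonical names.
    private static final Map<String, String> IANA_TO_JDK = new HashMap<>();
    static {
        IANA_TO_JDK.put("UTF-8", "UTF8");
        IANA_TO_JDK.put("ISO-8859-1", "ISO8859_1");
        IANA_TO_JDK.put("SHIFT_JIS", "SJIS");
        IANA_TO_JDK.put("KOI8-R", "KOI8_R");
    }

    // Returns the JDK canonical name for a known IANA name (compared
    // case-insensitively), or the input unchanged if the name is not in the table.
    public static String toJdkName(String ianaName) {
        String jdkName = IANA_TO_JDK.get(ianaName.toUpperCase(Locale.ROOT));
        return jdkName != null ? jdkName : ianaName;
    }

    public static void main(String[] args) {
        System.out.println(toJdkName("UTF-8"));
        System.out.println(toJdkName("Shift_JIS"));
    }
}
```

A caller on a JDK that lacks the IANA aliases would pass the converted name, rather than the IANA name, to the java.io.InputStreamReader or java.io.OutputStreamWriter constructor.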