What's That Noise?! [Ian Kallen's Weblog]

All | LAMP | Music | Java | Ruby | The Agilist | Musings | Commute | Ball
Main | Next day (Jan 9, 2006) »

20060108 Sunday January 08, 2006

Character Set Encoding Detection in Java

The levers and dials of character set encoding can be overwhelming, just looking at the matrix supported by J2SE 1.4.2 gives me vertigo. Java's encoding conversion support is simple enough, if not garrulous:

String iso88591String = request.getParameter("q");
String utf8String = new String(iso88591String.getBytes("UTF-8"));
But what do you do if you don't know what encoding you're dealing with to begin with? It looks as though there are a couple of ways to do it: The docs for Java 1.5.0 look similar but I'm still using Java 1.4.2 (old habits die hard).

( Jan 08 2006, 04:42:53 PM PST ) Permalink