What's That Noise?! [Ian Kallen's Weblog]

All | LAMP | Music | Java | Ruby | The Agilist | Musings | Commute | Ball
Main | Next day (Nov 8, 2004) »

20041107 Sunday November 07, 2004

Tomcat's "Content-type" header parsing busted? One bit fun this week was trying to figure out why some XML output I was working was mangling characters. I thought I was doing all of the right things as far as handling the data goes. Well, I think I was but Tomcat 5.0.28 wasn't.

I poked around the Jakarta bug database and the only mention I could find that close was PR 31442, which described having this

<%@ page language="java" contentType="text/html; charset=UTF-8" %>
<%@page pageEncoding="UTF-8"%>
and saying that the text was coming back ISO8859-1 when the page is requested as a GET but not as a POST. Well, someone from the Jakarta project marked the bug INVALID glibly saying to ask on the user's mailing list and look at the Connector configuration because it's not a bug. WTF? Are you kidding?

Now I looked around in the Connector stanza's that come in the server.xml and see no mention of encoding configuration attributes. I've got a real simple test case.

<% response.setContentType("text/xml"); %>
triggers no funny encoding behavior, I get the data out as good old utf8 just as I wanted but if I did this
<% response.setContentType("text/xml; charset=UTF-8"); %>
....kablooey! Mangled encoding! That's just wrong. And if it's not wrong, I think it warrants a better answer than RTFM on the Connectors.

And the problem may not just be isolated to JSP handling. Judging from other reports that are turning up in Google's index pertaining to SetCharacterEncodingFilter, it's affecting the filter implemetation as well.

( Nov 07 2004, 02:46:58 AM PST ) Permalink