It seemed pretty straightforward going into the project that I'm working on:
For instance, if Tomcat is serving on the HTTP tier, edit server.xml and make sure the URIEncoding attribute (absent by default) is set for the connector.

<Connector port="8080" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" debug="0" connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8" />

The same holds true for letting Apache do the HTTP dirty work and connecting with mod_jk:

<Connector port="8009" enableLookups="false" redirectPort="8443" debug="0" URIEncoding="UTF-8" protocol="AJP/1.3" />

And, by the way, if you have static content served by an Apache server, you probably want this as well:

AddDefaultCharset utf-8
Manually dealing with the ASCII escaping is a nuisance. If the conversion can't be transparent, at least automate it.
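As a rough sketch of what automating it could look like, assuming the escaping in question is the backslash-u form that Java .properties files expect (the kind of output the native2ascii tool in older JDKs produces), something along these lines would do. The class name AsciiEscaper and its method are made up for illustration.

```java
final class AsciiEscaper {

    /**
     * Returns the input with every char outside 7-bit ASCII replaced by a
     * backslash-u escape made of four lowercase hex digits.
     * Works on UTF-16 code units, so a surrogate pair becomes two escapes.
     */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII passes through untouched
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal below is "cafe" with an acute accent on the last letter.
        System.out.println(escape("caf\u00e9"));
    }
}
```

Hooking something like this into the build (or just running the JDK tool from an Ant target) means the UTF-8 source files stay the ones you edit, and the escaped copies are generated.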
In the case of MySQL, changing the JDBC URLs from this

jdbc:mysql://localhost/fubar

to this

jdbc:mysql://localhost/fubar?useUnicode=true&characterEncoding=UTF-8

made a world of difference.
If the response charset isn't declared as UTF-8 when the content really is UTF-8, you could be confusing the client. This can be set in a servlet, in a JSP and, IIRC, struts-config.xml allows you to set it declaratively. You want to set the Content-Type before writing to the response object's PrintWriter. Apparently, if you have multibyte characters in your JSP page components, you need to set the pageEncoding as well, i.e. in the JSP file itself, something like this:

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>

Though my whole motivation for using Java on this project was to have page components contain only markup and display code; all of the language is abstracted. Anyway, I'm preferring Velocity over JSP these days.
In my experience, doing this

request.setCharacterEncoding("UTF-8");

before getting the parameter values is not reliable (could be Tomcat bugs though). However, this appears to be a fairly standard idiom:

String formValue = new String(request.getParameter("formParam").getBytes("ISO8859_1") /* bytes */, "UTF-8");
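To see why that idiom works, here is a small standalone sketch with no servlet container involved. It assumes the container decoded the UTF-8 bytes as ISO-8859-1, which is what the idiom itself implies. The class ParamRedecoder and its method are hypothetical names, and the StandardCharsets constants need Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

final class ParamRedecoder {

    /**
     * Re-interprets a string that was decoded as ISO-8859-1 although the
     * underlying bytes were UTF-8. ISO-8859-1 maps every byte to exactly
     * one char, so the original bytes can be recovered without loss.
     */
    static String redecode(String misdecoded) {
        if (misdecoded == null) {
            return null; // a missing request parameter stays missing
        }
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "na\u00efve";
        // Simulate the assumed container behaviour: UTF-8 bytes read as ISO-8859-1.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(redecode(garbled).equals(original)); // prints true
    }
}
```

The catch is that this only repairs values that really were mis-decoded. If the container already handed you correct UTF-8, running the value through it again will mangle any non-ASCII characters, so it pays to apply it in exactly one place.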