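I've been experimenting with various bits of Java code, trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output identical to JavaScript's encodeURIComponent function.
My torture test string is: "A" B ± "
If I enter the following JavaScript statement in Firebug:
    encodeURIComponent('"A" B ± "');
Then I get:
    "%22A%22%20B%20%C2%B1%20%22"
Here's my little test Java program: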
    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class EncodingTest
    {
      public static void main(String[] args) throws UnsupportedEncodingException
      {
        String s = "\"A\" B ± \"";
        System.out.println("URLEncoder.encode returns " + URLEncoder.encode(s, "UTF-8"));
        System.out.println("getBytes returns " + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
      }
    }
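The program outputs:
    URLEncoder.encode returns %22A%22+B+%C2%B1+%22
    getBytes returns "A" B Â± "
Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?
EDIT: I'm using Java 1.4, moving to Java 5 shortly.
This is the class I came up with in the end: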
    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    import java.net.URLEncoder;

    /**
     * Utility class for JavaScript compatible UTF-8 encoding and decoding.
     *
     * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
     * @author John Topley
     */
    public class EncodingUtil
    {
      /**
       * Decodes the passed UTF-8 String using an algorithm that's compatible with
       * JavaScript's <code>decodeURIComponent</code> function. Returns
       * <code>null</code> if the String is <code>null</code>.
       *
       * @param s The UTF-8 encoded String to be decoded
       * @return the decoded String
       */
      public static String decodeURIComponent(String s)
      {
        if (s == null)
        {
          return null;
        }

        String result = null;

        try
        {
          result = URLDecoder.decode(s, "UTF-8");
        }

        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
          result = s;
        }

        return result;
      }

      /**
       * Encodes the passed String as UTF-8 using an algorithm that's compatible
       * with JavaScript's <code>encodeURIComponent</code> function. Returns
       * <code>null</code> if the String is <code>null</code>.
       *
       * @param s The String to be encoded
       * @return the encoded String
       */
      public static String encodeURIComponent(String s)
      {
        // URLEncoder.encode() throws a NullPointerException for null input,
        // so return early to honour the documented contract.
        if (s == null)
        {
          return null;
        }

        String result = null;

        try
        {
          result = URLEncoder.encode(s, "UTF-8")
                             .replaceAll("\\+", "%20")
                             .replaceAll("\\%21", "!")
                             .replaceAll("\\%27", "'")
                             .replaceAll("\\%28", "(")
                             .replaceAll("\\%29", ")")
                             .replaceAll("\\%7E", "~");
        }

        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
          result = s;
        }

        return result;
      }

      /**
       * Private constructor to prevent this class from being instantiated.
       */
      private EncodingUtil()
      {
        super();
      }
    }
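Looking at the implementation differences, I see that:
MDC on encodeURIComponent():
    literal characters (regex representation): [-a-zA-Z0-9._*~'()!]
Java 1.5.0 documentation on URLEncoder:
    literal characters (regex representation): [-a-zA-Z0-9._*]
    the space character " " is converted into a plus sign "+".
So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:
    replace all occurrences of "+" with "%20"
    replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts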
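The two post-processing steps above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the class name ComponentEncoderSketch and the method name encode are made up for the example, and it assumes Java 5 or later because it relies on the literal String.replace(CharSequence, CharSequence) overload instead of regular expressions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch class; the name is not from the original posts.
public class ComponentEncoderSketch {

    /** Encodes with URLEncoder, then undoes the differences listed above. */
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    // URLEncoder writes a space as "+"; a literal "+" in the
                    // input has already become %2B, so this only hits spaces.
                    .replace("+", "%20")
                    // Restore the characters encodeURIComponent leaves alone.
                    .replace("%21", "!")
                    .replace("%27", "'")
                    .replace("%28", "(")
                    .replace("%29", ")")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00B1 is the plus-minus sign from the torture test string.
        System.out.println(encode("\"A\" B \u00B1 \"")); // %22A%22%20B%20%C2%B1%20%22
        System.out.println(encode("it's (ok)! ~*"));      // it's%20(ok)!%20~*
    }
}
```

The replacements are safe to apply to the encoded text because a literal "%" in the input is itself escaped to "%25", so a sequence such as "%21" in the output can only have come from an encoded "!".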
Using the JavaScript engine that is shipped with Java 6:
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;

    public class Wow
    {
        public static void main(String[] args) throws Exception
        {
            ScriptEngineManager factory = new ScriptEngineManager();
            // getEngineByName() returns null if no JavaScript engine is available.
            ScriptEngine engine = factory.getEngineByName("JavaScript");
            engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
        }
    }
Output: %22A%22%20B%20%c2%b1%20%22
The case is different, but it's closer to what you want.
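If the lower-case hex digits matter, one possible way to normalise them is to upper-case every percent escape after the fact. The sketch below is only an illustration under that assumption; the class name EscapeCaseSketch and the method upperCaseEscapes are invented for the example and are not part of any library.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the name is not from the original posts.
public class EscapeCaseSketch {

    // A percent sign followed by exactly two hex digits.
    private static final Pattern ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    /** Upper-cases the hex digits of every %xx escape, leaving other text alone. */
    public static String upperCaseEscapes(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The match holds only '%' and hex digits, so it contains no
            // characters that are special in a replacement string.
            m.appendReplacement(out, m.group().toUpperCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseEscapes("%22A%22%20B%20%c2%b1%20%22"));
        // prints %22A%22%20B%20%C2%B1%20%22
    }
}
```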
I use java.net.URI#getRawPath(), e.g.
    String s = "a+b c.html";
    String fixed = new URI(null, null, s, null).getRawPath();
The value of fixed will be a+b%20c.html, which is what you want.
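Post-processing the output of URLEncoder.encode() treats pluses differently: URLEncoder escapes a literal "+" as %2B before the replacement runs. For example
    URLEncoder.encode("a+b c.html", "UTF-8").replaceAll("\\+", "%20");
will give you a%2Bb%20c.html, which still decodes to a+b c.html, but the plus no longer appears literally as it does with getRawPath().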
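To make the behaviour of the URI approach concrete, here is a small sketch. The class name RawPathSketch and the method rawPath are made up for the example. It assumes the input is a relative path and simply wraps the constructor call shown above. Note that java.net.URI applies the quoting rules for a path, so characters such as "+", "&" and "=" stay literal, whereas encodeURIComponent would escape them.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; the name is not from the original posts.
public class RawPathSketch {

    /** Quotes a path the way java.net.URI does and returns the raw result. */
    public static String rawPath(String path) {
        try {
            // Arguments are scheme, host, path, fragment; only the path is set.
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rawPath("a+b c.html")); // a+b%20c.html
        System.out.println(rawPath("a&b=c d"));    // a&b=c%20d
    }
}
```

This makes getRawPath() a reasonable fit for path segments, but not a drop-in replacement when the output has to match encodeURIComponent exactly, for instance in query string values.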