new String(byte[]): Charset or charset name?

Today I was investigating benchmark results for a simple Web server based on Grizzly 2.0. For some reason it showed a noticeable performance gap compared to the 1.9.x implementation. I suspected something was happening on the HTTP side, because the Grizzly 2.0 core shows equal or better results in other tests.

After spending some time, I realized that the problem is caused by a simple String constructor, new String(byte[] buffer, Charset charset), which is used in Grizzly 2.0. Grizzly 1.9.x uses new String(byte[] buffer, String charsetName). I had assumed that passing a Charset to the String constructor would be an optimization, since it skips the charset lookup. That turned out to be only half true. Although the lookup is skipped, the new String(byte[], Charset) constructor takes a completely different execution path from the charsetName constructor. Here is the code where each one ends up:

new String(byte[] buffer, String charsetName) path:

    static char[] decode(String charsetName, byte[] ba, int off, int len)
        throws UnsupportedEncodingException
    {
        StringDecoder sd = (StringDecoder)deref(decoder);
        String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
        if ((sd == null) || !(csn.equals(sd.requestedCharsetName())
                              || csn.equals(sd.charsetName()))) {
            sd = null;
            try {
                Charset cs = lookupCharset(csn);
                if (cs != null)
                    sd = new StringDecoder(cs, csn);
            } catch (IllegalCharsetNameException x) {}
            if (sd == null)
                throw new UnsupportedEncodingException(csn);
            set(decoder, sd);
        }
        return sd.decode(ba, off, len);
    }

new String(byte[] buffer, Charset charset) path:

    static char[] decode(Charset cs, byte[] ba, int off, int len) {
        StringDecoder sd = new StringDecoder(cs, cs.name());
        byte[] b = Arrays.copyOf(ba, ba.length);
        return sd.decode(b, off, len);
    }

Byte copying? Why?

From the javadoc I understood that new String(byte[], Charset) has a new(?) feature: "This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement string". Maybe that is the reason?
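To make the quoted javadoc sentence concrete, here is a minimal sketch of what "always replaces malformed input" looks like in practice, contrasted with a CharsetDecoder that is configured to report errors instead. The class and method names (MalformedInputDemo, decodeLenient, isWellFormed) are made up for illustration, and UTF-8 is just an assumed example charset; none of this is taken from Grizzly or from the JDK internals shown above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MalformedInputDemo {

    // The Charset-based constructor never fails: malformed bytes are
    // replaced with the charset's replacement string (U+FFFD for UTF-8).
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A CharsetDecoder set to REPORT throws on malformed input instead,
    // which lets the caller detect bad data.
    static boolean isWellFormed(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0xFF can never appear in valid UTF-8.
        byte[] bad = { 'a', (byte) 0xFF, 'b' };
        System.out.println(decodeLenient(bad).equals("a\uFFFDb"));
        System.out.println(isWellFormed(bad));
    }
}
```

Note that this only illustrates the documented replacement behavior; it does not by itself explain the extra array copy.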

Anyway, the difference I observe in my profiler between new String(byte[] buffer, Charset charset) and new String(byte[] buffer, String charsetName) is striking:

The constructor with the Charset is 7x slower!

Of course, the difference may vary with the byte[] length; in my case the length is 11.
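For anyone who wants to check this on their own JVM, below is a rough timing sketch, not the profiler setup used above. The class name StringCtorTiming, the "Hello World" payload (chosen only because it is 11 bytes long), the ISO-8859-1 charset and the iteration counts are all assumptions for illustration. A naive loop like this is sensitive to JIT warm-up and to the JDK version, so treat the numbers as a hint rather than a measurement; a harness such as JMH would be more trustworthy.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringCtorTiming {

    // Hypothetical 11-byte payload, matching the length mentioned in the post.
    static final byte[] PAYLOAD = "Hello World".getBytes(StandardCharsets.ISO_8859_1);

    // Written on every iteration so the JIT cannot discard the decoding work.
    static volatile int sink;

    // Elapsed nanoseconds for `iterations` calls of new String(byte[], String).
    static long timeByName(byte[] data, String charsetName, int iterations) {
        long start = System.nanoTime();
        try {
            for (int i = 0; i < iterations; i++) {
                sink = new String(data, charsetName).length();
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return System.nanoTime() - start;
    }

    // Elapsed nanoseconds for `iterations` calls of new String(byte[], Charset).
    static long timeByCharset(byte[] data, Charset charset, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = new String(data, charset).length();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int warmup = 200_000;
        int runs = 2_000_000;
        // Warm up both paths before measuring.
        timeByName(PAYLOAD, "ISO-8859-1", warmup);
        timeByCharset(PAYLOAD, StandardCharsets.ISO_8859_1, warmup);

        long byName = timeByName(PAYLOAD, "ISO-8859-1", runs);
        long byCharset = timeByCharset(PAYLOAD, StandardCharsets.ISO_8859_1, runs);
        System.out.println("charsetName: " + byName / runs + " ns/op");
        System.out.println("Charset:     " + byCharset / runs + " ns/op");
    }
}
```

Varying the payload size in this sketch is an easy way to see how much the result depends on the byte[] length.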

About

oleksiys