java - Wrong Content-Length for response text with umlaut
There is a problem associated with an umlaut. The description is returned on request:
@RequestMapping(value = "/description", method = RequestMethod.POST, consumes = "application/json", produces = "text/plain;charset=utf-8")
@ResponseBody
private String getDescription() {
    return "ärchik";
}
On the frontend, response.responseText is missing the last letter: response.responseText = "ärchi"
I found that the problem is a wrong Content-Length: 7. If I set Content-Length: 8, it works and returns the full description "ärchik".
But I do not understand why 8?
"ärchik".getBytes("UTF-8").length = 7
Response headers:
Cache-Control: must-revalidate
Content-Length: 7
Content-Type: text/plain;charset=utf-8
Date: Mon, 14 Apr 2014 09:08:26 GMT
Server: Apache-Coyote/1.1
I'm turning the core of my comment into an answer, since it seems to be on the right track.
The reason the string is 1 byte longer than expected is that the 'ä' got encoded as 3 bytes, not two. This can happen if one uses not the precomposed codepoint U+00E4 (UTF-8: C3 A4) but instead the letter 'a' (which is a simple ASCII letter at U+0061) followed by a combining diaeresis U+0308, encoded as 61 CC 88. There are several normal forms for Unicode, and the longer encoding is the result of a conversion to NFD.
Looking at your own answer, it seems that you did such a normalization, but at that point the content length had already been determined from the un-normalized (or perhaps NFC-normalized) string.
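The byte counts mentioned above can be checked with a small standalone program. This is only a sketch: the class name UmlautLength and the helper utf8Length are made up for illustration, and it assumes the extra byte really does come from NFD decomposition, which the headers alone do not prove.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class UmlautLength {

    // Hypothetical helper: number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // A Unicode escape keeps the example independent of the source file encoding.
        // This is the precomposed form: U+00E4 followed by "rchik".
        String precomposed = "\u00E4rchik";

        // NFD splits U+00E4 into 'a' (U+0061) plus the combining diaeresis (U+0308).
        String decomposed = Normalizer.normalize(precomposed, Normalizer.Form.NFD);

        System.out.println(utf8Length(precomposed)); // 7 (C3 A4 + 5 ASCII bytes)
        System.out.println(utf8Length(decomposed));  // 8 (61 CC 88 + 5 ASCII bytes)

        // Normalizing back to NFC restores the shorter encoding.
        String recomposed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(utf8Length(recomposed));  // 7
    }
}
```

If the body is normalized somewhere between the controller and the socket, a Content-Length computed before that step would be off by exactly this one byte, which matches the truncated "ärchi" seen on the frontend.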