Supplementary Characters in Java | Why Java uses 2 bytes for a char
Supplementary characters are characters in the Unicode standard whose code points are above U+FFFF and which therefore cannot be represented as single 16-bit entities such as the char data type in the Java programming language.
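To make the definition concrete, here is a minimal sketch that asks the standard java.lang.Character class whether a code point is supplementary and how many char values it needs. The class name SupplementaryCheck and its helper methods are made up for this illustration; only the Character methods are part of the platform (available since J2SE 5.0).

```java
// Illustrative sketch: the class and helper names are hypothetical.
class SupplementaryCheck {

    // True if the code point lies above U+FFFF, i.e. outside the 16-bit range.
    static boolean isSupplementary(int codePoint) {
        return Character.isSupplementaryCodePoint(codePoint);
    }

    // Number of 16-bit char values needed to store the code point (1 or 2).
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int latinA = 0x0041;   // U+0041 LATIN CAPITAL LETTER A, fits in one char
        int cjkExtB = 0x20000; // U+20000, a CJK ideograph above U+FFFF

        System.out.println(isSupplementary(latinA));  // false
        System.out.println(isSupplementary(cjkExtB)); // true
        System.out.println(charsNeeded(latinA));      // 1
        System.out.println(charsNeeded(cjkExtB));     // 2
    }
}
```

Note that the code point is held in an int, because a single char cannot store a value above U+FFFF.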
Such characters are rare, but some are used in Chinese and Japanese personal names, so support for them is commonly required for government applications in East Asian countries. The Java platform was therefore enhanced to enable processing of supplementary characters.
The Unicode standard has been extended beyond 16 bits to allow up to 1,112,064 characters. Characters that go beyond the original 16-bit limit are called supplementary characters.
Version 5.0 of J2SE is required to support version 4.0 of the Unicode standard, so it has to support supplementary characters.
The Java platform therefore not only has to support supplementary characters, but also has to make it easy for applications to do the same. This is a challenge because supplementary characters break a fundamental assumption of the Java programming language, namely that one char holds one character, and might otherwise require a fundamental change in the programming model.
The introduction of supplementary characters unfortunately makes the character model quite a bit more complicated. Where in the past we could simply talk about "characters" and, in a Unicode-based environment such as the Java platform, assume that a character has 16 bits, we now need more terminology.
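The extra terminology usually comes down to three terms: a "code point" (the Unicode number of a character), a "code unit" (one 16-bit char value), and a "surrogate pair" (the two code units that together encode one supplementary code point). The sketch below shows how they differ for a short string. The class and method names are invented for this example, and String.codePoints() assumes Java 8 or later.

```java
// Illustrative sketch: class and method names are hypothetical.
class CodeUnitsVsCodePoints {

    // Builds "A" followed by the supplementary character U+20000.
    static String sample() {
        return new StringBuilder().append('A').appendCodePoint(0x20000).toString();
    }

    // Number of 16-bit code units (char values) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = sample();

        System.out.println(codeUnits(s));  // 3: 'A' plus a surrogate pair
        System.out.println(codePoints(s)); // 2: U+0041 and U+20000

        // The supplementary character occupies two char values.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true

        // Iterating by code point (Java 8+) sees each character once.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

So length() and charAt() count and index code units, not characters. Code that must handle supplementary characters should prefer the code point-based methods.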
Background: history of Unicode and supplementary characters
Unicode was originally designed as a fixed-width 16-bit character encoding, and Java supports Unicode.
The primitive data type char in the Java programming language was intended to take advantage of this design, which is why Java reserves 2 bytes to store a char.
However, it turned out that the 65,536 characters possible in a 16-bit encoding are not sufficient to represent all characters that are or have been used on planet Earth.
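The numbers above can be read straight off the constants in java.lang.Character. This is only a small sketch, and the class and method names are made up for the illustration.

```java
// Illustrative sketch: class and method names are hypothetical.
class CharWidth {

    // Width of a char in bits.
    static int bitsPerChar() {
        return Character.SIZE;
    }

    // Width of a char in bytes, derived from the bit width.
    static int bytesPerChar() {
        return Character.SIZE / Byte.SIZE;
    }

    // How many distinct values a single char can hold.
    static int distinctCharValues() {
        return Character.MAX_VALUE + 1;
    }

    public static void main(String[] args) {
        System.out.println(bitsPerChar());        // 16
        System.out.println(bytesPerChar());       // 2
        System.out.println(distinctCharValues()); // 65536

        // The largest Unicode code point, U+10FFFF, does not fit in a char.
        System.out.println(Character.MAX_CODE_POINT);                       // 1114111
        System.out.println(Character.MAX_CODE_POINT > Character.MAX_VALUE); // true
    }
}
```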
Identifiers in the Java Programming Language
The Java Language Specification specifies that all Unicode letters and digits can be used in identifiers. Many supplementary characters are letters or digits, and so the Java Language Specification was updated to refer to new code point-based methods to define the legal characters in identifiers. The javac compiler and other tools that need to detect identifiers were changed to use these new methods.
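As a rough illustration of what "code point-based methods" means here, the sketch below checks whether a string is made up of legal identifier characters using Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int). The class IdentifierCheck is hypothetical and is not how javac itself is implemented. It also ignores reserved words, so it would accept a keyword such as "class".

```java
// Illustrative sketch: a hypothetical helper, not the compiler's own logic.
class IdentifierCheck {

    // Checks identifier characters one code point at a time.
    // Does not reject keywords or literals such as "class" or "true".
    static boolean hasValidIdentifierChars(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Advance by 1 or 2 chars depending on the code point just read.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // U+20000 is a supplementary CJK ideograph, which counts as a letter.
        String supplementary = new StringBuilder().appendCodePoint(0x20000).append("x").toString();

        System.out.println(hasValidIdentifierChars("name1"));       // true
        System.out.println(hasValidIdentifierChars("1name"));       // false
        System.out.println(hasValidIdentifierChars(supplementary)); // true

        // The older char-based method sees only half of the surrogate pair.
        System.out.println(Character.isJavaIdentifierStart(supplementary.charAt(0)));      // false
        System.out.println(Character.isJavaIdentifierStart(supplementary.codePointAt(0))); // true
    }
}
```

The last two lines show why the char-based overloads are not enough: a lone surrogate is not a letter, while the full code point is.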