Unicode System

Unicode is a universal international standard character encoding that is capable of representing most of the world's written languages.

Why does Java use the Unicode system?

Before Unicode, there were many character encoding standards, such as:
  • ASCII (American Standard Code for Information Interchange) for the United States.
  • ISO 8859-1 for Western European languages.
  • KOI-8 for Russian.
  • GB18030 and BIG-5 for Chinese, and so on.
This caused two problems:
  1. A particular code value corresponds to different letters in different encoding standards.
  2. The encodings for languages with large character sets have variable length. Some common characters are encoded as a single byte, while others require two or more bytes.
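The first problem can be sketched in Java by decoding the same byte value under two of the legacy encodings listed above. This is a minimal illustration, not part of the original text: the class and method names are hypothetical, and the charset name "KOI8-R" is an assumption for the KOI-8 variant. Java only guarantees a small set of charsets (including ISO-8859-1), so the sketch checks whether KOI8-R is available before using it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodePageDemo {
    // Decode one byte under the given charset and return the resulting Unicode code point.
    static int decodeToCodePoint(int byteValue, Charset charset) {
        String decoded = new String(new byte[] { (byte) byteValue }, charset);
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        int value = 0xC1; // one code value, two different letters

        // ISO 8859-1: 0xC1 is LATIN CAPITAL LETTER A WITH ACUTE (U+00C1).
        System.out.printf("ISO-8859-1: U+%04X%n",
                decodeToCodePoint(value, StandardCharsets.ISO_8859_1));

        // KOI8-R (assumed available on this JVM): 0xC1 is CYRILLIC SMALL LETTER A (U+0430).
        if (Charset.isSupported("KOI8-R")) {
            System.out.printf("KOI8-R:     U+%04X%n",
                    decodeToCodePoint(value, Charset.forName("KOI8-R")));
        }
    }
}
```

Without knowing which encoding produced the byte, a program cannot tell which letter was meant.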
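To solve these problems, a new standard was developed: the Unicode system.
Unicode was originally designed as a 16-bit (2-byte) encoding, so Java also uses 2 bytes for its char type. Unicode has since grown beyond 16 bits (code points now run up to U+10FFFF), so a Java char holds one UTF-16 code unit, and characters above \uFFFF are stored as a pair of chars.
Lowest char value: \u0000
Highest char value: \uFFFF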
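The range of Java's char type can be checked with the constants on java.lang.Character. The sketch below is illustrative (the class and helper names are hypothetical); it also shows that a character outside the 16-bit range, here U+1F600, occupies two char values in a String.

```java
class CharRangeDemo {
    // Build a String from a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println((int) Character.MIN_VALUE); // 0, i.e. \u0000
        System.out.println((int) Character.MAX_VALUE); // 65535, i.e. \uFFFF
        System.out.println(Character.BYTES);           // 2 bytes per char

        // A code point above \uFFFF needs two chars (a surrogate pair).
        String supplementary = fromCodePoint(0x1F600);
        System.out.println(supplementary.length());                                   // 2
        System.out.println(supplementary.codePointCount(0, supplementary.length())); // 1
    }
}
```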

  1. Unicode provides a unique number for every character, no matter what the platform, program, or language.

    Basically, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers.
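    As a small illustration of characters being stored as numbers, the following Java sketch prints the Unicode number (code point) of a few characters. The class and method names are hypothetical, and the sample characters are written as escapes so the source file's own encoding does not matter.

```java
class CodePointDemo {
    // Format the code point of the first character of s in U+XXXX notation.
    static String describe(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(describe("A"));      // U+0041 (Latin capital A, decimal 65)
        System.out.println(describe("\u044F")); // U+044F (Cyrillic small letter ya)
        System.out.println(describe("\u4E2D")); // U+4E2D (a CJK ideograph)
    }
}
```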

     The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun, Sybase, Unisys. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many more products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.

    Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.

    Unicode can be implemented by different character encodings. The most common are UTF-8 and UTF-16.

    UTF-8 = Unicode Transformation Format, 8-bit. It encodes each character in one to four 8-bit bytes.
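    To see how these encodings store the same characters differently, the sketch below counts the bytes each sample string takes in UTF-8 and UTF-16. It is an illustration only: the class and method names are hypothetical, and UTF-16BE is chosen so that no byte-order mark is added to the output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingLengthDemo {
    // Number of bytes needed to store s in the given encoding.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // U+0041, U+00E9, U+4E2D, and U+1F600 (written as a surrogate pair).
        String[] samples = { "A", "\u00E9", "\u4E2D", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d bytes  UTF-16: %d bytes%n",
                    s.codePointAt(0),
                    byteLength(s, StandardCharsets.UTF_8),
                    byteLength(s, StandardCharsets.UTF_16BE));
        }
    }
}
```

    The four samples take 1, 2, 3, and 4 bytes in UTF-8, and 2, 2, 2, and 4 bytes in UTF-16.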
