Product Code Database
Example Keywords: tablet computers -glove $15
   » » Wiki: Utf-ebcdic
Tag Wiki 'Utf-ebcdic'.
Tag

UTF-EBCDIC is a character encoding capable of encoding all 1,112,064 valid character in using 1 to 5 (in contrast to a maximum of 4 for UTF-8). It is meant to be -friendly, so that legacy EBCDIC applications on mainframes may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for existing -based systems. Details on UTF-EBCDIC are defined in Unicode Technical Report #16.

To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first (creating what the specification calls an I8 sequence). The main difference between this encoding and UTF-8 is that it allows Unicode code points through (the C1 control codes) to be represented as a single byte and therefore later mapped to corresponding EBCDIC control codes. In order to achieve this, UTF-8-Mod uses instead of as the format for trailing bytes in a multi-byte sequence. As this can only hold 5 bits rather than 6, the UTF-8-Mod encoding of codepoints above are larger than the UTF-8 encoding.

The UTF-8-Mod transformation leaves the data in an ASCII-based format (for example, "A" is still encoded as ), so each byte is fed through a reversible (one-to-one) lookup table to produce the final UTF-EBCDIC encoding. For example, in this table maps to ; thus the UTF-EBCDIC encoding of (Unicode's "A") is (EBCDIC's "A").

UTF-EBCDIC is rarely used, even on the EBCDIC-based mainframes for which it was designed. EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support. For example, IBM Db2, , PL/I, Java and the toolkit support UTF-16 on IBM mainframes.


Codepage layout
There are 160 characters with single-byte encodings in UTF-EBCDIC (compared to 128 in UTF-8). As can be seen, the single-byte portion is similar to IBM-1047 instead of IBM-37 due to the location of the square brackets. 37 has at hex BA and BB instead of at hex AD and BD respectively.


Oracle UTFE
Oracle UTFE is a Unicode 3.0 UTF-8 variation, similar to the CESU-8 variant of UTF-8, where supplementary characters are encoded as two 4-byte characters rather than a single 4- or 5-byte character. It is used only on EBCDIC platforms.


See also
  • UTF-1
  • UTF-8
  • BOCU-1


External links

Page 1 of 1
1
Page 1 of 1
1

Account

Social:
Pages:  ..   .. 
Items:  .. 

Navigation

General: Atom Feed Atom Feed  .. 
Help:  ..   .. 
Category:  ..   .. 
Media:  ..   .. 
Posts:  ..   ..   .. 

Statistics

Page:  .. 
Summary:  .. 
1 Tags
10/10 Page Rank
5 Page Refs