8-Bit Single-Byte Coded Graphic Character Sets: Latin/Cyrillic Alphabet
3rd Edition - December 1999
Phone: +41 22 849.60.00 - Fax: +41 22 849.60.01 - URL: http://www.ecma.ch - Internet: helpdesk@ecma.ch
Standard ECMA-113
Brief History
The adoption of ECMA-6 (ISO/IEC 646) as the agreed international 7-bit code for information interchange had led to
the development of many national, international and application-oriented versions of this code.
These versions had a number of limitations generally inherent to the size of the code:
− they did not provide all graphic characters which were needed;
− for some characters, especially for accented letters, it was necessary to resort to BACKSPACE sequences, which
created problems when processing data containing such composite characters;
− interchange among different versions was practically limited to the 82 common graphic characters.
With the advent of 8-bit coding it was possible to increase the number of graphic characters. ISO/IEC 6937, for
example, provided a character set covering the requirements of most languages based on the Latin alphabet. This
character set, although well suited for text communication, was difficult to use for processing as some graphic
characters were represented by one and others by two bit combinations.
Thus the need was recognized for coded graphic character sets, each of which:
− is the same for all users of a given area,
− provides single-byte coding of all graphic characters, thus permitting easy processing,
− takes into account character sets used in the industry.
In 1982 the urgency of the need for an 8-bit single-byte coded character set was recognized in ECMA as well as in
ANSI/X3L2 and numerous working papers were exchanged between the two groups. In February 1984 ECMA TC1
submitted to ISO/TC97/SC2 a proposal for such a coded character set. At its meeting of April 1984 SC2 decided to
submit to TC97 a proposal for a new item of work for this topic. Technical discussions during and after this meeting
led TC1 to adopt the coding scheme proposed by X3L2. International Standard ISO/IEC 8859-1 is based on this joint
ANSI/ECMA proposal. ECMA published its corresponding Standard ECMA-94 in March 1985.
After this first publication, the work of ECMA TC1 on further coded graphic character sets has led to the following
results:
i. A first Edition, dated June 1986, of a Standard for a Latin/Cyrillic coded graphic character set.
ii. The second Edition of Standard ECMA-94, dated June 1986, comprising four coded graphic character sets for
the Latin script, identified as Latin Alphabets No. 1 to No. 4. These alphabets have a number of characters in
common, in particular those allocated to columns 02 to 07. They have all been submitted to ISO/IEC JTC 1 - the
successor of ISO/TC97 - and are the subject of ISO/IEC 8859, Parts 1 to 4.
iii. A series of ECMA Standards for coded graphic character sets comprising those characters of the Latin Alphabets
allocated to columns 02 to 07 and characters of another script for multiple-language applications. These
Standards ECMA-114, ECMA-118 and ECMA-121 cover the Arabic, Greek and Hebrew scripts, respectively.
They have been submitted to JTC 1 for further processing as ISO/IEC standards and have been published as Part
6, Part 7 and Part 8, respectively, of ISO/IEC 8859.
The 2nd Edition of Standard ECMA-113 superseded the first edition, which was based on the 1974 version
of GOST Standard 19768. That standard was revised in 1987. As a consequence the 2nd Edition was prepared in
co-operation with Russian experts and was brought into complete agreement with the corresponding GOST standard. The
corresponding International Standard, ISO/IEC 8859-5:1988, is technically identical with the 2nd Edition of
ECMA-113.
In 1999 the 2nd Edition of ISO/IEC 8859-5 was published as a technical revision of the 1st Edition of this
International Standard. The 3rd Edition of ECMA-113 has been made technically identical with the 2nd Edition of
ISO/IEC 8859-5.
This 3rd Edition of Standard ECMA-113 was adopted by the ECMA General Assembly of December 1999.
Table of contents
1 Scope
2 Conformance
2.1 Conformance of information interchange
2.2 Conformance of devices
2.2.1 Device description
2.2.2 Originating devices
2.2.3 Receiving devices
3 References
4 Definitions
4.1 bit combination
4.2 byte
4.3 character
4.4 code table
4.5 coded character set; code
4.6 coded-character-data-element (CC-data-element)
4.7 graphic character
4.8 graphic symbol
4.9 position
Annex B - Main differences between the second edition and this third edition of ECMA-113
Annex C - Bibliography
2 Conformance
2.1 Conformance of information interchange
A coded-character-data-element (CC-data-element) within coded information for interchange is in
conformance with this ECMA Standard if all the coded representations of graphic characters within that
CC-data-element conform to the requirements of clause 6.
2.2 Conformance of devices
A device is in conformance with this ECMA Standard if it conforms to the requirements of 2.2.1, and either
or both of 2.2.2 and 2.2.3. A claim of conformance shall identify the document which contains the
description specified in 2.2.1.
2.2.1 Device description
A device that conforms to this ECMA Standard shall be the subject of a description that identifies the means
by which the user may supply characters to the device, or may recognize them when they are made
available to the user, as specified respectively in 2.2.2 and 2.2.3.
2.2.2 Originating devices
An originating device shall allow its user to supply any sequence of characters from those specified in
clause 6, and shall be capable of transmitting their coded representations within a CC-data-element.
2.2.3 Receiving devices
A receiving device shall be capable of receiving and interpreting any coded representations of characters
that are within a CC-data-element, and that conform to clause 6, and shall make the corresponding
characters available to its user in such a way that the user can identify them from among those specified
there, and can distinguish them from each other.
3 References
ECMA-35 Code Extension Techniques
ECMA-43 8-Bit Coded Character Set Structure and Rules
ECMA-48 Control Functions for Coded Character Sets
ECMA-94 8-Bit Single-Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4
ECMA-114 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Arabic Alphabet
ECMA-118 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Greek Alphabet
ECMA-121 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Hebrew Alphabet
ECMA-128 8-Bit Single-Byte Coded Graphic Character Sets - Latin Alphabet No. 5
ECMA-144 8-Bit Single-Byte Coded Graphic Character Sets - Latin Alphabet No. 6
4 Definitions
For the purpose of this Standard the following definitions apply.
4.1 bit combination
An ordered set of bits used for the representation of characters.
4.2 byte
A bit string that is operated upon as a unit.
4.3 character
A member of a set of elements used for the organization, control, or representation of data.
4.4 code table
A table showing the characters allocated to each bit combination in a code.
4.5 coded character set; code
A set of unambiguous rules that establishes a character set and the one-to-one relationship between the
characters of the set and their bit combinations.
4.6 coded-character-data-element (CC-data-element)
An element of interchanged information that is specified to consist of a sequence of coded representations
of characters, in accordance with one or more identified standards for coded character sets.
4.7 graphic character
A character, other than a control function, that has a visual representation normally hand-written, printed or
displayed, and that has a coded representation consisting of one or more bit combinations.
NOTE
In this Standard a single bit combination is used to represent each character.
4.8 graphic symbol
A visual representation of a graphic character or of a control function.
4.9 position
That part of a code table identified by its column and row co-ordinates.
The bit combinations may be interpreted to represent numbers in binary notation by attributing the
following weights to the individual bits:
Bit     b8   b7   b6   b5   b4   b3   b2   b1
Weight  128  64   32   16   8    4    2    1
Using these weights, the bit combinations are identified by notations of the form xx/yy, where xx and yy
are numbers in the range 00 to 15. The correspondence between the notations of the form xx/yy and the bit
combinations consisting of the bits b8 to b1 is as follows:
− xx is the number represented by b8, b7, b6 and b5 where these bits are given the weights 8, 4, 2, and 1,
respectively.
− yy is the number represented by b4, b3, b2 and b1 where these bits are given the weights 8, 4, 2, and 1,
respectively.
The bit combinations are also identified by notations of the form hk, where h and k are numbers in the
range 0 to F in hexadecimal notation. The number h is the same as the number xx described above, and the
number k the same as the number yy described above.
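The two notations can be illustrated with a short Python sketch. The function name `notations` is hypothetical and not part of this Standard; the arithmetic follows the bit weights given above.

```python
from typing import Tuple


def notations(byte_value: int) -> Tuple[str, str]:
    """Return the xx/yy and hk notations of one 8-bit combination."""
    if not 0 <= byte_value <= 255:
        raise ValueError("an 8-bit combination must be in the range 0 to 255")
    xx = byte_value >> 4    # bits b8 to b5, given the weights 8, 4, 2, 1
    yy = byte_value & 0x0F  # bits b4 to b1, given the weights 8, 4, 2, 1
    return f"{xx:02d}/{yy:02d}", f"{xx:X}{yy:X}"


print(notations(0xAD))  # ('10/13', 'AD')
print(notations(0x20))  # ('02/00', '20')
```

The decimal form pads each number to two digits (00 to 15), while the hexadecimal form uses a single digit (0 to F) for each half.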
5.2 Layout of the code table
An 8-bit code table consists of 256 positions arranged in 16 columns and 16 rows. The columns and the
rows are numbered 00 to 15. In hexadecimal notation the columns and the rows are numbered 0 to F.
The code table positions are identified by notations of the form xx/yy, where xx is the column number and
yy is the row number. The column and row numbers are shown at the top and left edges of the table,
respectively. The code table positions are also identified by notations of the form hk, where h is the column
number and k is the row number in hexadecimal notation. The column and row numbers are shown at the
bottom and right edges of the table, respectively.
The positions of the code table are in one-to-one correspondence with the bit combinations of the code. The
notation of a code table position, of the form xx/yy, or of the form hk, is the same as that of the
corresponding bit combination.
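The one-to-one correspondence between positions and bit combinations can be sketched in Python as follows. The helper names `position_to_byte` and `bit_string` are illustrative assumptions, not terms defined by this Standard.

```python
def position_to_byte(position: str) -> int:
    """Convert a code table position written as 'xx/yy' to its 8-bit value."""
    column_text, row_text = position.split("/")
    column, row = int(column_text), int(row_text)
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must both be in the range 00 to 15")
    # The column supplies bits b8 to b5 and the row supplies bits b4 to b1.
    return column * 16 + row


def bit_string(value: int) -> str:
    """Write an 8-bit value as the bits b8 to b1, highest-order bit first."""
    return format(value, "08b")


value = position_to_byte("10/13")
print(hex(value), bit_string(value))  # 0xad 10101101

# Every one of the 16 x 16 positions maps to a distinct bit combination.
all_values = {
    position_to_byte(f"{c:02d}/{r:02d}") for c in range(16) for r in range(16)
}
print(len(all_values))  # 256
```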
5.3 Names and meanings
This ECMA Standard assigns a unique name and a unique identifier to each graphic character. These names
and identifiers have been taken from ISO/IEC 10646-1. This ECMA Standard also specifies an acronym for
each of the characters SPACE, NO-BREAK SPACE and SOFT HYPHEN. For acronyms only Latin capital
letters A to Z are used. It is intended that the acronyms be retained in all translations of the text.
Except for SPACE (SP), NO-BREAK SPACE (NBSP) and SOFT HYPHEN (SHY), this ECMA Standard
does not define and does not restrict the meanings of graphic characters.
This ECMA Standard specifies a graphic symbol for each graphic character. This symbol is shown in the
corresponding position of the code table. However, this Standard does not specify a particular style or font
design for imaging graphic characters.
5.3.1 SPACE (SP)
A graphic character the visual representation of which consists of the absence of a graphic symbol.
5.3.2 NO-BREAK SPACE (NBSP)
A graphic character the visual representation of which consists of the absence of a graphic symbol, for
use when a line break is to be prevented in the text as presented.
5.3.3 SOFT HYPHEN (SHY)
A graphic character that is imaged by a graphic symbol identical with, or similar to, that representing
HYPHEN, for use when a line break has been established within a word.
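As an illustration, the three characters with defined meanings can be looked up by name in Python. This sketch assumes that Python's `iso8859_5` codec is an acceptable stand-in for this Standard, on the basis that ECMA-113 is technically identical with ISO/IEC 8859-5, and that the positions are 02/00, 10/00 and 10/13 as read from the code table. The dictionary `SPECIAL` and the function `character_name` are hypothetical names.

```python
import unicodedata

# Acronym -> bit combination (assumed positions 02/00, 10/00 and 10/13).
SPECIAL = {"SP": 0x20, "NBSP": 0xA0, "SHY": 0xAD}


def character_name(acronym: str) -> str:
    """Decode the single byte for an acronym and return its character name."""
    character = bytes([SPECIAL[acronym]]).decode("iso8859_5")
    return unicodedata.name(character)


for acronym in SPECIAL:
    print(acronym, "=", character_name(acronym))
# SP = SPACE
# NBSP = NO-BREAK SPACE
# SHY = SOFT HYPHEN
```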
Table 1 - columns: Bit combination, Hex, Identifier, Name (the table rows are not present in this copy)
Code table. Only the entries that survive in this copy are shown; a blank cell does not imply an unallocated position.
b4 b3 b2 b1  Row  00   01   02   03   04   05   06   07   08   09   10   11   12   13   14   15   Hex
0  0  0  0   00             SP   0         P         p              NBSP                          0
0  0  0  1   01                  1    A    Q    a    q                                            1
0  0  1  0   02                  2    B    R    b    r                                            2
0  0  1  1   03                  3    C    S    c    s                                            3
0  1  0  0   04                  4    D    T    d    t                                            4
0  1  0  1   05                  5    E    U    e    u                                            5
0  1  1  0   06                  6    F    V    f    v                                            6
0  1  1  1   07                  7    G    W    g    w                                            7
1  0  0  0   08                  8    H    X    h    x                                            8
1  0  0  1   09                  9    I    Y    i    y                                            9
1  0  1  0   10                       J    Z    j    z                                            A
1  0  1  1   11                       K         k                                                 B
1  1  0  0   12                       L         l                                                 C
1  1  0  1   13                       M         m                   SHY                           D
1  1  1  0   14                       N         n                                                 E
1  1  1  1   15                       O    _    o                                                 F
Hex               0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
Annex A
(informative)
Coverage of languages
NOTES
1. The list of languages in table A.1 is not exhaustive. It shows the languages that are included in the Scope
clause of each of the ECMA Standards for the Latin alphabets.
2. For writing French, three characters (Œ, œ, Ÿ) not specified in Latin alphabets No. 1, 3 and 5, are also
needed.
3. The various Sámi languages use partly differing orthographies. The character sets in Latin alphabets No.
4 and No. 6 cover the requirements of the Sámi languages most commonly used in Finland, Norway and
Sweden. For the Skolt Sámi language used in Finland and Norway additional characters are needed.
4. There are several official written languages outside Europe that are covered by Latin alphabet No. 1.
Examples are Indonesian/Malay, Tagalog (Philippines), Swahili, Afrikaans.
5. Use of Latin alphabet No. 3 for Turkish is deprecated.
Annex B
(informative)
Main differences between the second edition and this third edition of ECMA-113
B.1 The names of the graphic characters have been amended where necessary to align them with the names of the
characters adopted for all standards on coded character sets developed under the responsibility of ISO/IEC
JTC 1. For each character the short identifiers specified in ISO/IEC 10646-1, Amendment 9, have been added
to table 1.
B.2 The new style of conformance clause, adopted for all standards on coded character sets, has been introduced.
B.3 Object identifiers conforming to Abstract Syntax Notation One are specified in annex D for the character set,
and the corresponding coded representations of this ECMA Standard.
Registration numbers from the International register of coded character sets to be used with escape sequences
have been included as an additional method of identifying the coded character set of this ECMA Standard.
B.4 A new annex A has been added that identifies the coverage of languages by the Standards for the Latin
alphabets.
B.5 Various editorial adjustments and clarifications have been made to the text of the Standard. The hexadecimal
equivalents of the bit combinations have been added to tables 1 and 2.
B.6 Annex C, Bibliography, and annex D, Identification according to ISO/IEC 8824-1, have been added.
Annex C
(informative)
Bibliography
ECMA-48 Control Functions for Coded Character Sets, 5th Edition (June 1991)
ISO/IEC 10367:1991 Information technology - Standardized coded graphic character sets for use in 8-bit codes
ISO/IEC 10646-1:1993 Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1:
Architecture and Basic Multilingual Plane
ISO International register of coded character sets to be used with escape sequences.
CEN/CENELEC IT/PT004, Report from the project team on Definition of a Cyrillic primary set of graphic characters
(CEN, Brussels, July 1992)
Annex D
(informative)
Identification according to ISO/IEC 8824-1
In the terminology of ISO/IEC 8824-1 the character set of ISO/IEC 8859-5 (ECMA-113) and the
corresponding coded representations are distinct, and are known as the "character abstract syntax" and the "character
transfer syntax", respectively.
When the identification methods of ISO/IEC 8824-1 are used, ISO/IEC 8859-5 shall be identified by the following
object identifiers:
− character set
{iso standard 8859 5 abstract-syntax (1)}
− coded representations
{iso standard 8859 5 transfer-syntax (0)}
The corresponding object descriptors shall be:
− character set "ISO 8859 part 5 repertoire"
− coded representations "ISO 8859 part 5 code".
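For readers who work with numeric object identifiers, the two values above can be written in dotted form. The following Python sketch assumes the conventional ASN.1 arc values iso(1) and standard(0), which are not restated in this annex; the names `ARC_VALUES` and `dotted_oid` are hypothetical.

```python
# Assumed numeric values of the named top-level arcs: iso(1), standard(0).
ARC_VALUES = {"iso": 1, "standard": 0}


def dotted_oid(*arcs) -> str:
    """Join arcs into dotted notation, replacing named arcs by their numbers."""
    return ".".join(str(ARC_VALUES.get(arc, arc)) for arc in arcs)


# {iso standard 8859 5 abstract-syntax (1)}
CHARACTER_SET_OID = dotted_oid("iso", "standard", 8859, 5, 1)
# {iso standard 8859 5 transfer-syntax (0)}
CODED_REPRESENTATIONS_OID = dotted_oid("iso", "standard", 8859, 5, 0)

print(CHARACTER_SET_OID)          # 1.0.8859.5.1
print(CODED_REPRESENTATIONS_OID)  # 1.0.8859.5.0
```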
Free printed copies can be ordered from:
ECMA
114 Rue du Rhône
CH-1204 Geneva
Switzerland
Fax: +41 22 849.60.01
Internet: documents@ecma.ch
Files of this Standard can be freely downloaded from the ECMA web site (www.ecma.ch). This site gives full
information on ECMA, ECMA activities, ECMA Standards and Technical Reports.