This article is from the Mongolia FAQ, by Oliver Corff with numerous contributions by others.
Unlike the American ASCII code, the Chinese GuoBiao code or the
Japanese JIS code, there is not yet a national code system for
Mongolian writing, whether in its Classical or its Cyrillic form. As
a consequence, no international standards organization (like ISO)
could adopt a national standard and turn it into an international
one.
The problems we find in this field are of a complex nature and
frequently have strong mutual dependencies.
Let's look at Cyrillic encoding first. It is not far-fetched to
suggest using an existing Cyrillic encoding scheme for Mongolian, but
not even such a simple idea is without its traps. There is more than
one Cyrillic encoding, and some encodings are incomplete: they do not
include the Cyrillic yo (ё). In addition, these tables (or code pages)
usually have no space to accommodate the additional Mongolian vowel
symbols ö and ü (Cyrillic ө and ү), which must then be placed
somewhere outside the natural order of the alphabet. Several modified
code pages of this type exist; implementations available are
mentioned below.
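The coverage problem can be checked directly. The following is a minimal sketch, assuming Python's standard codecs for four widely used 8-bit Cyrillic code pages; the choice of code pages and the names `CODECS`, `LETTERS`, `can_encode` and `coverage` are illustrative assumptions, not part of this FAQ. These four code pages are expected to contain ё but neither ө nor ү; older or reduced Cyrillic tables may lack ё as well.

```python
# Legacy 8-bit Cyrillic code pages available as standard Python codecs.
# (Assumption: these four stand in for "existing Cyrillic encodings".)
CODECS = ["koi8_r", "cp1251", "iso8859_5", "cp866"]

# Letters Mongolian Cyrillic needs beyond the basic Russian set.
LETTERS = {"yo": "ё", "oe": "ө", "ue": "ү"}


def can_encode(char: str, codec: str) -> bool:
    """Return True if the code page has a slot for this character."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage() -> dict:
    """Map each codec to a {letter name: available?} table."""
    return {
        codec: {name: can_encode(ch, codec) for name, ch in LETTERS.items()}
        for codec in CODECS
    }


if __name__ == "__main__":
    for codec, row in coverage().items():
        print(codec, row)
```

A modified code page of the kind mentioned above would have to find room for the two missing vowels in positions the original table reserves for something else.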
With Classical writing, the situation is even more complicated. For
much of its history there has been no single, commonly acknowledged
Classical Mongolian alphabet (or cagaan tolgoï); differences can be
observed in the number of letters, the sorting order and the treatment
of ambiguous letters which have more than one reading for a given
shape, like t/d. The situation is further complicated by the fact that
one given letter may assume numerous different shapes depending on its
position within the word. The designer of an encoding scheme has to
decide whether only canonical letters (the ones under which one would
try to find a word in a dictionary) are to be included or whether all
shape variants should be included as well.
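To make the trade-off concrete: if only canonical letters are encoded, the software must pick the shape variant itself when displaying a word. The sketch below shows the simplest version of that step, assuming that the shape depends only on whether a letter stands alone, at the beginning, in the middle or at the end of a word. The function name `positional_forms` and the use of Latin transliteration letters as stand-ins for canonical letters are illustrative assumptions; real Classical Mongolian shaping needs more context than this.

```python
from typing import List, Tuple


def positional_forms(word: str) -> List[Tuple[str, str]]:
    """Pair each canonical letter with the positional form it needs.

    Assumption: the form is determined by position alone
    (isolated / initial / medial / final).
    """
    last = len(word) - 1
    forms = []
    for index, letter in enumerate(word):
        if last == 0:
            position = "isolated"
        elif index == 0:
            position = "initial"
        elif index == last:
            position = "final"
        else:
            position = "medial"
        forms.append((letter, position))
    return forms


if __name__ == "__main__":
    # Transliterated stand-in for a word stored as canonical letters.
    for letter, position in positional_forms("mongol"):
        print(letter, position)
```

An encoding that includes all shape variants moves this decision from the rendering software into the stored text, at the cost of more code points per letter.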
The next problem arises from computer technology itself. The eight
bit (one byte) code space of commonly used systems cannot hold more
than 256 characters, of which 128 have been defined already. If both
Cyrillic and Classical writing are to be accommodated in one common
code space, this is only possible at the cost of sharing common letter
shapes between Latin and Cyrillic characters. There is no other choice
if one wants to avoid switching code pages within one document.
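The arithmetic behind this constraint can be sketched as follows. The figure of 35 letters for the modern Mongolian Cyrillic alphabet is an assumption added here, not a number given in this FAQ, and the number of code points needed for Classical writing is deliberately left as a parameter, since it depends on the design decisions discussed above. The names `free_slots` and `fits` are hypothetical.

```python
CODE_SPACE = 256        # one byte
ALREADY_DEFINED = 128   # positions already taken, as stated in the text
CYRILLIC_LETTERS = 35   # assumption: modern Mongolian Cyrillic alphabet


def free_slots(shared_with_latin: int = 0) -> int:
    """Positions left for Classical writing after upper- and lower-case
    Cyrillic, minus any Cyrillic shapes borrowed from Latin letters."""
    cyrillic_needed = 2 * CYRILLIC_LETTERS - shared_with_latin
    return CODE_SPACE - ALREADY_DEFINED - cyrillic_needed


def fits(classical_code_points: int, shared_with_latin: int = 0) -> bool:
    """True if the Classical repertoire fits into the remaining space."""
    return classical_code_points <= free_slots(shared_with_latin)


if __name__ == "__main__":
    print(free_slots())                      # no shapes shared
    print(free_slots(shared_with_latin=10))  # ten shapes shared
```

Under these assumptions every Cyrillic shape that is shared with a Latin letter frees one position for a Classical letter or shape variant.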
Another problem intimately related to writing is the field of
transcriptions and transliterations. The design of rules for
transliterating Classical or Cyrillic Mongolian has many consequences
for data exchange, automatic text processing, the building of library
catalogues, etc. Some popular systems (e.g. the so-called Petersburg
transliteration) use characters which are not readily available on
today's computers, while those that make do with reduced character
sets are sometimes unpopular.
Only in recent years (more or less starting with the UNESCO conference
on the Computerization of Mongolian script in Ulaanbaatar in August
1992) has there been a genuine international effort to solve these
problems and to come up with an encoding scheme that will be accepted
world-wide. The Mongolian National Institute for Standardization and
Metrology (MNISM), the Chinese National Standard Bureau, the
standards bodies of other countries, ISO and Unicode have all held
regular meetings over the last few years in order to define a standard.
So far, no final agreement exists, and there is no software package
which could serve as a demonstrator for this future standard. All
available software either defines its own code page or relies on ASCII
representations of Mongolian which are then converted into Mongolian