5.2.0 Coding Hungarian accents: Introduction & section overview




This article is from the Hungarian FAQ, by Zoli Fekete fekete@bc.edu with numerous contributions by others.

During the evolution of teletypes and computers, two character tables
survived, acquiring major importance in later computer systems. One is
EBCDIC, primarily used in ancient IBM mainframes. The other one, ASCII,
can be considered today's ubiquitous standard in computing worldwide.
The rest of this section therefore concentrates on ASCII and, very
unfairly, ignores EBCDIC, since none of the accent conversion programs
supports either that code table or the CMS environment.

Since the language of computing has been English from the beginning,
the original ASCII table was limited to the characters used in English:
letters of the Latin alphabet, a few punctuation marks and some other
special symbols. Since the number of all these characters, plus the
unprintable "control" characters (located in the first 32 positions of
the ASCII table, responsible for different control functions) doesn't
exceed 128, the 'brilliant' idea of representing the ASCII table
in 7 bits spread like wildfire all over the computer world. No wonder
that most of the Internet mailers and Usenet hubs are also set up to
forward documents in 7-bit ASCII only. (Read the rest of the section
carefully to learn how to overcome these problems.) As computing and
word processing spread to the rest of the world, there was
an increasing demand to represent national characters as well. (A
good example is Hungarian. The extra consonants [nonexistent in
English] are written by merely juxtaposing 2 (or 3 in the case of dzs)
regular Latin characters, so there is no problem here. However, the
special vowels of the language are denoted by applying different
accents to the Latin 'base vowel', introducing new characters, the
so-called accented vowels.) The obvious idea is to place these national
characters and other fancy symbols at codes 128 to 255, still
remaining within the byte limit. Different character sets have been
created by defining purpose- or language-specific characters for the
upper half of the table, while keeping the 7-bit ASCII codes unchanged.
(Note: Some character sets also re-use codes between 0 and 31, the
domain of the ASCII control characters, keeping only some, or none, of
them. Using these codes, however, is pretty difficult, device- and
implementation-dependent, etc. Therefore it wouldn't be wise to put
accented characters here, and fortunately none of the sets listed below
actually does so.) Hopefully Unicode will ultimately stop this
confusion, but until then there's a long, long way to go.
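
To make the layout described above concrete, here is a minimal Python sketch
that encodes a Hungarian word in an 8-bit character set and reports which
half of the table each letter falls into. The choice of ISO 8859-2 (Latin-2)
is an assumption made for this example only; it is one 8-bit set that covers
the Hungarian accented vowels. The helper name split_by_range is hypothetical.

```python
# Sketch: where Hungarian letters land in an 8-bit character set.
# Assumption: ISO 8859-2 (Latin-2) is used as the example set.

def split_by_range(text, codec="iso8859_2"):
    """Return (low, high): (char, code) pairs below 128, and the rest."""
    low, high = [], []
    for ch in text:
        code = ch.encode(codec)[0]  # one byte per character in an 8-bit set
        (low if code < 128 else high).append((ch, code))
    return low, high

# "t\u0171z\u0151" is the word "tuzo" with its two accented vowels
# (u and o with double acute), written with escapes so that this
# listing itself stays 7-bit clean.
low, high = split_by_range("t\u0171z\u0151")
for ch, code in low + high:
    half = "7-bit ASCII" if code < 128 else "upper half (128-255)"
    print(ascii(ch), code, half)
```

The unaccented letters keep their ASCII codes, while the accented vowels get
codes above 127; only the latter are at risk on a 7-bit-only mail path.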

At this point let's clarify the terminology:

.. ASCII (also 7-bit or plain ASCII) data:
Usually text (but not necessarily, see 5.2.5.1.), containing only 7-bit
ASCII characters, including the control ones.
.. 8-bit (extended) ASCII data:
Text containing the uniform 7-bit ASCII characters, plus special
characters (with code greater than 127) according to one of the 8-bit
character sets.
.. Binary data:
Non-text data (executables, pictures, etc.) containing any 8-bit value.
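
The first two definitions can be told apart mechanically by looking at byte
values, as the following short Python sketch suggests. The function names
is_seven_bit and high_bytes are hypothetical, and ISO 8859-2 is again only an
assumed example of an 8-bit character set.

```python
# Sketch: checking whether data is 7-bit ASCII in the sense defined above.

def is_seven_bit(data: bytes) -> bool:
    """True if every byte is in the range 0-127 (control codes included)."""
    return all(b < 128 for b in data)

def high_bytes(data: bytes):
    """Return (offset, value) for each byte with a code greater than 127."""
    return [(i, b) for i, b in enumerate(data) if b > 127]

plain = b"arviz\r\n"                             # accents dropped, 7-bit only
latin2 = "\u00e1rv\u00edz".encode("iso8859_2")   # a-acute and i-acute kept

print(is_seven_bit(plain))    # True
print(is_seven_bit(latin2))   # False
print(high_bytes(latin2))     # offsets and codes of the accented vowels
```

Note that a byte-value check like this cannot separate 8-bit text from binary
data: both may contain any value above 127, and the difference lies only in
how the bytes are meant to be interpreted.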

The different kludges accepted by Internet users to denote accented
vowels in 7-bit ASCII are described in 5.2.1. The most important
extended ASCII character sets are introduced in 5.2.2. Section 5.2.3
shows the accented character representations used by high-level
formatting languages. The correct ways of transferring files between
word processors [on the Net] are detailed in 5.2.4. If the data to be
transferred is not 7-bit ASCII, 5.2.5 tells you what to do. Last, but
not least, 5.2.6
introduces the programs in the HIX archives (and mentions some others)
that address the problem of conversion between the various types of
accent representation.

 
