This article is from the Esperanto
FAQ, by Mike Urban email@example.com and Yves Bellefeuille
firstname.lastname@example.org with numerous contributions by others.
15. How can I represent these characters in E-mail or on Usenet? (Esperanto)
Accented characters are not included in standard, 7-bit ASCII. Since
only 7-bit ASCII can be reliably transmitted over the net, this leads to
problems when trying to use Esperanto in E-mail and Usenet news. These
problems are not unique to Esperanto; all languages with accents have
them.
Two approaches are possible: using ASCII to represent the accented
characters, or using 8-bit codes and sending them somehow over the net.
Using Standard ASCII:
There are two major work-arounds to represent Esperanto's accented
letters using standard 7-bit ASCII: using the letter "h" to represent
the circumflex, and using the letter "x" to represent all accents.
Accent:             ^   ^   ^   ^   ^   -
Esperanto letter:   c   g   h   j   s   u
"h" method:         ch  gh  hh  jh  sh  u
"x" method:         cx  gx  hx  jx  sx  ux
The "h" method is canonical in Esperanto since the "Fundamento de
Esperanto", which forms the basis of the language, expressly provides
for it. Note that "u with breve" is represented by "u" alone, not "uh".
The "x" method is a recent coinage and first appeared among computer
users; it is used only on the Net.
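The two conventions described above can be sketched in a few lines of Python. This is only an illustration: the names BASE and to_ascii are made up for this example and do not come from any standard tool.

```python
# Accented Esperanto letters and the base letters they are built on.
BASE = {"ĉ": "c", "ĝ": "g", "ĥ": "h", "ĵ": "j", "ŝ": "s", "ŭ": "u"}


def to_ascii(text, method="x"):
    """Rewrite accented Esperanto letters using the "h" or "x" method."""
    if method not in ("h", "x"):
        raise ValueError("method must be 'h' or 'x'")
    out = []
    for ch in text:
        base = BASE.get(ch.lower())
        if base is None:
            out.append(ch)  # not an accented letter: copy unchanged
            continue
        if method == "x":
            suffix = "x"  # every accent becomes "x"
        else:
            # "h" method: circumflex becomes "h", u-breve is written as plain "u"
            suffix = "" if base == "u" else "h"
        if ch.isupper():
            base = base.upper()
        out.append(base + suffix)
    return "".join(out)


print(to_ascii("ĉiuĵaŭde", "h"))
print(to_ascii("ĉiuĵaŭde", "x"))
```

With the "h" method the word comes out as "chiujhaude"; with the "x" method it comes out as "cxiujxauxde".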
The following arguments are made in favour of the "x" method:
- The "h" method is ambiguous. Is the letter "h" really supposed to be
there, or is it supposed to represent an accent? The letter "x" doesn't
exist in Esperanto, so there is no ambiguity: any "x" in an Esperanto
text must represent an accent. Rebuttal: This kind of confusion never
happens in practice. "Flughaveno" can only be the Esperanto word for
"airport", since "flug^aveno" isn't a word.
- The "x" method is more suitable for machine treatment of text
(sorting, indexing, etc.). In Esperanto, letters with accents are
different from letters without accents: the alphabet is A, B, C, C^, D,
etc. Since "x" is very close to the end of the alphabet, sorting
algorithms will almost always put the accented letters in their proper
alphabetical order. Rebuttal: These are highly specialized needs.
People who must make their texts machine-treatable can use whatever
method suits their requirements, but this is irrelevant for the vast
majority of Esperanto speakers.
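The sorting argument can be checked with a small Python sketch. The word list, the from_x helper and the esperanto_key function are assumptions made for this illustration; the alphabet string is the standard 28-letter Esperanto alphabet.

```python
# The Esperanto alphabet in its proper order (28 letters).
ALPHABET = "abcĉdefgĝhĥijĵklmnoprsŝtuŭvz"

X_TO_UNICODE = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}


def from_x(text):
    """Decode lowercase "x"-method text; unambiguous because Esperanto has no "x"."""
    for digraph, letter in X_TO_UNICODE.items():
        text = text.replace(digraph, letter)
    return text


def esperanto_key(word):
    """Sort key that follows the real Esperanto alphabetical order."""
    return [ALPHABET.index(ch) for ch in word]


words_x = ["cxevalo", "cigno", "cerbo", "sxipo", "suno", "gxojo", "gusto", "cxar"]

# Plain byte-wise sort of the "x"-method spellings.
plain_order = sorted(words_x)

# Sort according to the true alphabet, after decoding the accents.
proper_order = sorted(words_x, key=lambda w: esperanto_key(from_x(w)))

print(plain_order == proper_order)
```

For this word list the two orders agree. They can differ when an unaccented letter is followed by "z", because "z" sorts after "x" in ASCII; this is why the claim is "almost always" rather than "always".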
The "x" method was very popular in the early years of the net, but the
"h" method has clearly been gaining ground recently, as more "ordinary"
Esperantists (as opposed to professional computer users, etc.) have
started using the net. Either method may be used with confidence.
The "x" method is perhaps more suitable for beginners, since it removes
all ambiguity, so that a beginner won't try to look up "flug^aveno" in
a dictionary.
Other methods are also used, such as typing a circumflex accent (^)
before or after the accented letter, but these are rarer.
These work-arounds should only be used when one is restricted to 7-bit
ASCII. It is wrong to use them when the real characters are available.
All word processing programs can handle the accented letters correctly;
most typewriters (especially electronic typewriters) can also do so. It
is also wrong to use these work-arounds when hand-writing.
Using 8-bit Codes:
Esperanto is covered by the 8-bit encoding known as Latin-3 (ISO
8859-3:1988). Since 8-bit codes usually cannot be reliably transmitted
over the net, some "data massaging" is necessary.
For E-mail, a standard known as MIME (Multipurpose Internet Mail
Extensions) converts 8-bit characters to 7-bit ASCII for transmission,
and converts the message back to 8 bits upon reception. Many E-mail
programs can do this conversion automatically; however, users with shell
accounts (especially students) often cannot see MIME messages properly.
For this reason, one should ensure that the recipient's system supports
MIME before sending messages in this format.
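As a rough illustration of the MIME round trip described above, the following Python sketch uses the standard library's email package to wrap a Latin-3 text in the quoted-printable transfer encoding and to decode it again. The sample sentence and the subject line are invented for the example, and real mail programs may choose a different transfer encoding.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

text = "Ĉu vi ŝatas eĥoŝanĝon ĉiuĵaŭde?\n"

# Sender side: declare the charset and ask for a 7-bit-safe transfer encoding.
msg = EmailMessage()
msg["Subject"] = "Saluton"
msg.set_content(text, charset="iso-8859-3", cte="quoted-printable")

# What actually travels over the net: plain 7-bit ASCII.
wire = msg.as_bytes()
print(wire.decode("ascii"))

# Recipient side: a MIME-aware reader restores the 8-bit characters.
received = message_from_bytes(wire, policy=policy.default)
decoded = received.get_content()
print(decoded)
```

A reader that does not understand MIME would show the user the encoded body as it appears on the wire, with sequences such as "=E6" in place of the accented letters.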
The use of MIME in Usenet is neither specifically permitted nor
expressly prohibited. Most newsreaders can't handle postings in MIME, so
it is best not to use it in Usenet.
Some users post messages in soc.culture.esperanto and other Usenet
groups using "raw" Latin-3 codes, without attempting to "protect" them
with a 7-bit encoding. This has led to some heated discussions between
those who say that they can receive the original 8-bit Latin-3 codes,
and those who say that they often (or always) receive gibberish.
Even if the codes are transmitted properly, they can only be viewed as
Esperanto characters if a Latin-3 font is used; users whose language
requires the use of an incompatible 8-bit font (e.g. Russian and
Japanese) will have problems viewing these characters in any event.
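The font problem can be imitated in Python by decoding the same Latin-3 bytes with different 8-bit character sets. This is only a sketch: Latin-1 and KOI8-R are chosen here as examples of what a Western European or Russian reader's system might assume.

```python
text = "ĉiuĵaŭde"

# The bytes a poster using "raw" Latin-3 would send: one byte per letter.
raw = text.encode("iso-8859-3")

# The same bytes as seen through three different 8-bit character sets.
as_latin3 = raw.decode("iso-8859-3")  # the intended Esperanto letters
as_latin1 = raw.decode("iso-8859-1")  # Western European reading
as_koi8r = raw.decode("koi8-r")       # Russian reading

print(as_latin3)
print(as_latin1)
print(as_koi8r)
```

Only the first line shows the intended word. The ASCII letters survive in all three readings, but the accented letters turn into unrelated symbols or Cyrillic letters.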
Esperanto's accented characters are covered by the incipient "wide
character" standard Unicode (ISO 10646-1:1993), so these problems will
be solved if and when Unicode is widely adopted and implemented. Unicode
is a widely endorsed 16-bit character code covering all languages,
including non-alphabetic languages such as Chinese and Japanese.
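For reference, the short Python sketch below lists the Unicode code points of Esperanto's accented letters. The variable names are made up for the example; the code points and character names come from Python's built-in Unicode tables.

```python
import unicodedata

letters = "ĈĉĜĝĤĥĴĵŜŝŬŭ"

# Map each accented letter to its Unicode code point.
code_points = {ch: ord(ch) for ch in letters}

for ch, cp in code_points.items():
    print(f"U+{cp:04X}  {unicodedata.name(ch)}")

# Each of these letters fits in a single 16-bit unit.
fits_in_16_bits = all(cp <= 0xFFFF for cp in code_points.values())
print(fits_in_16_bits)
```

All twelve letters have code points between U+0108 and U+016D.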
For everyday use, it is probably best to use either the "h" method or
the "x" method, both for E-mail and for Usenet news. These methods are
widely used and recognized, and both work well in practice.
If one is sure that the recipient can handle MIME messages, then this
format can be used for E-mail.
No satisfactory 8-bit solution exists today for Usenet. Either the "h"
method or the "x" method should be used for Usenet news.