This article is from the Nordic countries FAQ, by Antti Lahelma and Johan Olofsson, with numerous contributions by others.
The "mother" of all modern character sets for computers is the
original ASCII character set, now renamed to US-ASCII. (ASCII =
"American Standard Code for Information Interchange"). This is a 7-bit
set containing the characters needed to write American English without
accents or special letters, and little more. No "foreign" letters are
included.
Various notations and standards exist for representing extra
characters, among them digraphs, LaTeX, ISO-646, ISO-8859-1, and the
IBM codepages 437, 850, and 865. All of these, except the IBM
codepages, are usually considered acceptable on soc.culture.nordic, in
e-mail, and on the Internet in general.
Digraphs are two-character combinations used for simplicity, and are
often the most universally understood notation on soc.culture.nordic.
However, when writing to non-Nordics, one should be careful to explain
that these are digraphs, not two separate characters. Also, some
information may get lost by using digraphs, since a filtering program
cannot determine whether a given pair is really a digraph or two
separate characters.
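The ambiguity can be illustrated with a small Python sketch. The
mapping table and the function name expand_digraphs are made up for
this illustration and cover only three lowercase digraphs; no actual
filtering program is claimed to work this way.

```python
# Hypothetical digraph table, for illustration only (lowercase forms).
DIGRAPHS = {"ae": "æ", "oe": "ø", "aa": "å"}

def expand_digraphs(text: str) -> str:
    """Naively replace every digraph with the letter it may stand for."""
    for digraph, letter in DIGRAPHS.items():
        text = text.replace(digraph, letter)
    return text

# "blaabaer" is the Danish/Norwegian word for blueberry written with
# digraphs, so the expansion is correct here.
print(expand_digraphs("blaabaer"))   # blåbær

# The name "Michael" contains a genuine "a" followed by "e", and the
# naive filter mangles it.
print(expand_digraphs("Michael"))    # Michæl
```

A filter has no way of telling the two cases apart without knowing the
language and the word.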
LaTeX notation comes from the typesetting program of the same name,
where a sequence starting with '\' is replaced by a given character.
For instance, the a-ring is written as "\aa" or "{\aa}" in LaTeX.
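As a rough sketch of how such sequences can be turned back into
letters, the Python fragment below replaces a handful of them. The
table LATEX_TO_CHAR and the function latex_to_text are hypothetical
names chosen for this example, and only the braced forms of a few
letters are handled; a real LaTeX parser does much more.

```python
# Hypothetical lookup table covering only a few of the sequences.
LATEX_TO_CHAR = {
    r"{\AA}": "Å", r"{\aa}": "å",
    r"{\AE}": "Æ", r"{\ae}": "æ",
    r'\"{A}': "Ä", r'\"{a}': "ä",
    r'\"{O}': "Ö", r'\"{o}': "ö",
}

def latex_to_text(text: str) -> str:
    """Replace the known LaTeX sequences with the letters they denote."""
    for sequence, letter in LATEX_TO_CHAR.items():
        text = text.replace(sequence, letter)
    return text

print(latex_to_text(r"{\AA}lesund"))     # Ålesund
print(latex_to_text(r'Link\"{o}ping'))   # Linköping
```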
ISO-646 (really ISO-646-NO and ISO-646-SE) refers to 7-bit sets
similar to US-ASCII, but with national characters substituted in place
of the following characters: {, |, }, [, \, ]. This is the oldest of
the "true representation" standards mentioned here; it was used in
e.g. the Nordic versions of the CP/M operating system, prior to
MS-DOS. Today, it is mostly used in Sweden and Finland (although the
ordering of the letters, for the sake of compatibility with the
Danish/Norwegian/German equivalents, is not correct in these
languages).
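The substitution can be sketched in Python as a simple translation of
the six code positions named above. The names ISO646_SE, ISO646_NO and
decode_iso646 are invented for this example, and only the six
positions listed in this article are handled; the national standards
themselves may reassign further positions.

```python
# The six US-ASCII characters whose positions carry national letters.
ASCII_POSITIONS = "[\\]{|}"

# Assignments as given in this article's table (illustrative names).
ISO646_SE = str.maketrans(ASCII_POSITIONS, "ÄÖÅäöå")
ISO646_NO = str.maketrans(ASCII_POSITIONS, "ÆØÅæøå")

def decode_iso646(data: bytes, table) -> str:
    """Interpret 7-bit data using one of the national tables above."""
    return data.decode("ascii").translate(table)

# The same byte '|' becomes a different letter in each variant.
print(decode_iso646(b"Malm|", ISO646_SE))        # Malmö
print(decode_iso646(b"Troms|", ISO646_NO))       # Tromsø

# Program text using brackets and braces is garbled as a side effect.
print(decode_iso646(b"x[i] = {1}", ISO646_SE))   # xÄiÅ = ä1å
```

Since the bytes themselves carry no hint of which variant was meant,
the reader has to know it in advance.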
ISO-8859-1, also called ISO Latin-1, is the first of several 8-bit
character sets described in the International Organization for
Standardization's standard 8859. (ISO publishes international
standards in many other fields as well.) This set includes all
characters needed for all West European languages, except Sámi and
Esperanto. Latin-1 is a superset of US-ASCII, hence all ASCII
characters maintain their original position in this set. Rather than
trying to accommodate positioning in any specific language, the
letters in ISO-8859-1 are ordered according to the alphabetical
position of their US-ASCII lookalikes. Latin-1 is supported through
modern standardizations like MIME (RFC 1521).
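These properties can be checked with Python's built-in "latin-1"
codec. The helper fits_latin1 is a name made up for this sketch, and
the Northern Sámi letter c-caron is used as one example of a character
that falls outside the set.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# ASCII characters keep their codes; the a-ring takes one 8-bit code.
print("Åland".encode("latin-1"))                            # b'\xc5land'
print("land".encode("latin-1") == "land".encode("ascii"))   # True

# Sorting by code value follows the ASCII lookalikes (A, A, O), not the
# Swedish alphabet, which ends with Å, Ä, Ö in that order.
print(sorted(["Å", "Ä", "Ö"]))                              # ['Ä', 'Å', 'Ö']

# A Sámi letter such as c-caron is not part of the set.
print(fits_latin1("č"))                                     # False
```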
The IBM codepages 437, 850, 861 and 865 are used on personal computers
in "text" mode, and are also the default set in many MS-Windows®
communication programs. Created by IBM ("Big Blue"), they were meant
to provide text-based PC programs with a means to create low-cost
graphics, and the addition of extra characters came as a nice side
effect. (Certain Nordic characters were not represented in the
original codepage 437, with the consequence that in Iceland, Denmark
and Norway, computers would occasionally be sold with cp 861 or 865 in
the hardware. Today, alternative codepages can be downloaded to the
video card via software.) The Danish/Norwegian character o-slash is
not represented in cp 437, and in 850/861/865 the lowercase letter is
positioned at the dangerous code 155 (9B hex) -- "Upper Escape".
Certain terminal types will interpret this code as the initial
character of an escape command, and may e.g. clear the screen
depending on the next letter. Further, these codepages are
incompatible with the established 8-bit standard Latin-1, and should
be avoided.
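The position of the lowercase o-slash can be inspected with the
codepage codecs that ship with Python ("cp437", "cp850", "cp861" and
"cp865"). The helper name oslash_byte is invented for this sketch.

```python
import unicodedata

def oslash_byte(codepage: str):
    """Return the byte for lowercase o-slash in a codepage, or None."""
    try:
        return "ø".encode(codepage)
    except UnicodeEncodeError:
        return None

for cp in ("cp437", "cp850", "cp861", "cp865"):
    print(cp, oslash_byte(cp))
# cp437 None
# cp850 b'\x9b'
# cp861 b'\x9b'
# cp865 b'\x9b'

# Read as Latin-1, byte 0x9B is a control character rather than a letter.
as_latin1 = b"\x9b".decode("latin-1")
print(ord(as_latin1), unicodedata.category(as_latin1))   # 155 Cc
```

A receiver that assumes Latin-1 therefore sees a control code where
the sender typed a letter.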
The various notations of the Nordic graphemes follow:
Letter        Digraph  LaTeX    ISO-646  ISO-8859-1  HTML      Octal  Char
__________________________________________________________________________
a acute       A'       \'{A}    -        alt-0193    &Aacute;  \301   Á
              a'       \'{a}    -        alt-0225    &aacute;  \341   á
eth           TH                -        alt-0208    &ETH;     \320   Ð
              th                -        alt-0240    &eth;     \360   ð
e acute       E'       \'{E}    -        alt-0201    &Eacute;  \311   É
              e'       \'{e}    -        alt-0233    &eacute;  \351   é
i acute       I'       \'{I}    -        alt-0205    &Iacute;  \315   Í
              i'       \'{i}    -        alt-0237    &iacute;  \355   í
o acute       O'       \'{O}    -        alt-0211    &Oacute;  \323   Ó
              o'       \'{o}    -        alt-0243    &oacute;  \363   ó
u acute       U'       \'{U}    -        alt-0218    &Uacute;  \332   Ú
              u'       \'{u}    -        alt-0250    &uacute;  \372   ú
y acute       Y'       \'{Y}    -        alt-0221    &Yacute;  \335   Ý
              y'       \'{y}    -        alt-0253    &yacute;  \375   ý
thorn         TH                -        alt-0222    &THORN;   \336   Þ
              th                -        alt-0254    &thorn;   \376   þ
u diaeresis   U"       \"{U}    ^        alt-0220    &Uuml;    \334   Ü
              u"       \"{u}    ~        alt-0252    &uuml;    \374   ü
ae            AE       {\AE}    [        alt-0198    &AElig;   \306   Æ
              ae       {\ae}    {        alt-0230    &aelig;   \346   æ
o-slash       OE       {\O}     \        alt-0216    &Oslash;  \330   Ø
              oe       {\o}     |        alt-0248    &oslash;  \370   ø
a-ring        AA       {\AA}    ]        alt-0197    &Aring;   \305   Å
              aa       {\aa}    }        alt-0229    &aring;   \345   å
a diaeresis   A"       \"{A}    [        alt-0196    &Auml;    \304   Ä
              a"       \"{a}    {        alt-0228    &auml;    \344   ä
o diaeresis   O"       \"{O}    \        alt-0214    &Ouml;    \326   Ö
              o"       \"{o}    |        alt-0246    &ouml;    \366   ö
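The numeric columns of the table all derive from the same Latin-1 code
value, which the following Python sketch makes explicit. The function
name table_row is made up for this example; the HTML entity names come
from Python's standard html.entities module.

```python
from html.entities import codepoint2name

def table_row(char: str):
    """Return (alt code, HTML entity, octal escape) for a Latin-1 letter."""
    code = char.encode("latin-1")[0]          # single byte, 0..255
    return (f"alt-{code:04d}",                # decimal, zero-padded
            f"&{codepoint2name[code]};",      # named HTML entity
            f"\\{code:03o}")                  # octal escape

print(*table_row("Á"))   # alt-0193 &Aacute; \301
print(*table_row("ø"))   # alt-0248 &oslash; \370
```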
 