Previous: , Up: Internationalization   [Contents][Index]


15.2 @documentencoding enc: Set Input Encoding

The @documentencoding command declares the encoding of the input document, and can also affect the encoding of the output. Write it near the beginning of the file, on a line by itself, followed by a valid encoding name:

@documentencoding enc
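For illustration, here is a minimal sketch of how the beginning of a hypothetical manual written in UTF-8 might look. The file name sample.info and the title are placeholders, not taken from any real manual:

\input texinfo   @c -*-texinfo-*-
@setfilename sample.info
@documentencoding UTF-8
@settitle Sample Manual

Putting the command among the first lines of the file keeps the declaration ahead of any non-ASCII text in the document.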

Texinfo supports these encodings:

US-ASCII

This has no particular effect, but it’s included for completeness.

UTF-8

The encoding of the full Unicode character set, in which each character is expressed as a sequence of one to four 8-bit bytes.

ISO-8859-1
ISO-8859-15
ISO-8859-2

These specify the standard encodings for Western European languages (the first two) and Central and Eastern European languages (the third). ISO 8859-15 replaces some little-used characters from ISO 8859-1 (e.g., the precomposed fractions) with more commonly needed ones, such as the Euro symbol (€).

A full description of the encodings is beyond our scope here; one useful reference is http://czyborra.com/charsets/iso8859.html.

koi8-r

This is the commonly used encoding for the Russian language.

koi8-u

This is the commonly used encoding for the Ukrainian language.

Specifying an encoding enc has the following effects:

In Info output, a so-called ‘Local Variables’ section (see File Variables in The GNU Emacs Manual) that names enc is written at the end of the file. This allows Info readers to set the encoding appropriately. It looks like this:

Local Variables:
coding: enc
End:

Also, in Info and plain text output, unless the option --disable-encoding is given to makeinfo, accent constructs and special characters, such as @'e, are output as the actual 8-bit or UTF-8 character in the given encoding where possible.
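As a rough illustration, suppose a manual that declares @documentencoding UTF-8 contains this made-up input line:

The caf@'e opens at noon.

In the Info output, the accent construct would then appear as the single precomposed character:

The café opens at noon.

With --disable-encoding, makeinfo instead writes an ASCII approximation of the accented letter (for @'e, typically the letter e followed by an apostrophe).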

In HTML output, a ‘<meta>’ tag specifying enc is written in the ‘<head>’ section of the HTML. Web servers and browsers can use this information to display the page in the correct encoding, if the system supports it. It looks like this:

<meta http-equiv="Content-Type" content="text/html; charset=enc">

In XML and DocBook output, UTF-8 is always used for the output, according to the conventions of those formats.

In TeX output, characters supported by the standard Computer Modern fonts are typeset with those fonts. For example, accented letters are constructed from a base letter and a separate accent rather than taken from precomposed glyphs. Using a character that is not available generates a warning message, as does specifying an encoding that is not implemented.

Although modern TeX systems support nearly every script in use in the world, this wide-ranging support is not available in texinfo.tex, and it’s not feasible to duplicate or incorporate all that effort. (Our plan to support other scripts is to create a LaTeX back-end to texi2any, where the support is already present.)

For maximum portability of Texinfo documents across the many different user environments in the world, we recommend sticking to 7-bit ASCII in the input unless your particular manual needs a substantial amount of non-ASCII text, e.g., because it is written in German. You can use the @U command to insert an occasional character that ASCII lacks (see Inserting Unicode).
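For instance, a manual kept in plain ASCII can still include the Euro sign or a German place name. The sentence below is a made-up example that combines @U with the ordinary umlaut accent command:

Prices at the M@"unchen office are given in euros (@U{20AC}).

Here @U{20AC} stands for the Unicode character U+20AC (€) and @"u produces ü, so the source file itself stays within 7-bit ASCII.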

