Emacs Online Documentation

define-coding-system

define-coding-system is a compiled Lisp function in `mule.el'.

(define-coding-system NAME DOCSTRING &rest PROPS)

Define NAME (a symbol) as a coding system with DOCSTRING and attributes.
The remaining arguments must come in pairs ATTRIBUTE VALUE. ATTRIBUTE
may be any symbol.

The following attributes have special meanings. Those labeled as
"(required)" should not be omitted.

`:mnemonic' (required)

VALUE is a character to display on mode line for the coding system.

`:coding-type' (required)

VALUE must be one of `charset', `utf-8', `utf-16', `iso-2022',
`emacs-mule', `shift-jis', `ccl', `raw-text', `undecided'.

`:eol-type'

VALUE is the EOL (end-of-line) format of the coding system. It must be
one of `unix', `dos', `mac'. The symbol `unix' means Unix-like EOL
(i.e. single LF), `dos' means DOS-like EOL (i.e. sequence of CR LF),
and `mac' means Mac-like EOL (i.e. single CR). If omitted, Emacs
detects the EOL format automatically when decoding.

`:charset-list'

VALUE must be a list of charsets supported by the coding system. On
encoding by the coding system, if a character belongs to multiple
charsets in the list, a charset that comes earlier in the list is
selected. If `:coding-type' is `iso-2022', VALUE may be `iso-2022',
which indicates that the coding system supports all ISO-2022 based
charsets. If `:coding-type' is `emacs-mule', VALUE may be
`emacs-mule', which indicates that the coding system supports all
charsets that have the `:emacs-mule-id' property.

`:ascii-compatible-p'

If VALUE is non-nil, the coding system decodes all 7-bit bytes into
the corresponding ASCII characters, and encodes all ASCII characters
back to the corresponding 7-bit bytes. VALUE defaults to nil.

`:decode-translation-table'

VALUE must be a translation table to use on decoding.

`:encode-translation-table'

VALUE must be a translation table to use on encoding.

`:post-read-conversion'

VALUE must be a function to call after some text is inserted and
decoded by the coding system itself and before any functions in
`after-insert-functions' are called. This function is passed one
argument; the number of characters in the text to convert, with
point at the start of the text. The function should leave point
the same, and return the new character count.

`:pre-write-conversion'

VALUE must be a function to call after all functions in
`write-region-annotate-functions' and `buffer-file-format' are
called, and before the text is encoded by the coding system
itself. This function should convert the whole text in the
current buffer. For backward compatibility, this function is
passed two arguments which can be ignored.

`:default-char'

VALUE must be a character. On encoding, a character not supported by
the coding system is replaced with VALUE.

`:for-unibyte'

VALUE non-nil means that visiting a file with the coding system
results in a unibyte buffer.

`:mime-charset'

VALUE must be a symbol whose name is that of a MIME charset converted
to lower case.

`:mime-text-unsuitable'

VALUE non-nil means the `:mime-charset' property names a charset which
is unsuitable for the top-level media type "text".

`:flags'

VALUE must be a list of symbols that control the ISO-2022 converter.
Each must be a member of the list `coding-system-iso-2022-flags'
(which see). This attribute is meaningful only when `:coding-type'
is `iso-2022'.

`:designation'

VALUE must be a vector [G0-USAGE G1-USAGE G2-USAGE G3-USAGE].
GN-USAGE specifies the usage of graphic register GN as follows.

If it is nil, no charset can be designated to GN.

If it is a charset, the charset is initially designated to GN, and
never used by the other charsets.

If it is a list, the elements must be charsets, nil, 94, or 96. GN
can be used by all the listed charsets. If the list contains 94, any
iso-2022 charset whose code-space ranges are 94 long can be designated
to GN. If the list contains 96, any charsets whose whose ranges are
96 long can be designated to GN. If the first element is a charset,
that charset is initially designated to GN.

This attribute is meaningful only when `:coding-type' is `iso-2022'.

`:bom'

This attributes specifies whether the coding system uses a `byte order
mark'. VALUE must be nil, t, or cons of coding systems whose
`:coding-type' is `utf-16' or `utf-8'.

If the value is nil, on decoding, don't treat the first two-byte as
BOM, and on encoding, don't produce BOM bytes.

If the value is t, on decoding, skip the first two-byte as BOM, and on
encoding, produce BOM bytes according to the value of `:endian'.

If the value is cons, on decoding, check the first two-byte. If they
are 0xFE 0xFF, use the car part coding system of the value. If they
are 0xFF 0xFE, use the cdr part coding system of the value.
Otherwise, treat them as bytes for a normal character. On encoding,
produce BOM bytes according to the value of `:endian'.

This attribute is meaningful only when `:coding-type' is `utf-16' or
`utf-8'.

`:endian'

VALUE must be `big' or `little' specifying big-endian and
little-endian respectively. The default value is `big'.

This attribute is meaningful only when `:coding-type' is `utf-16'.

`:ccl-decoder'

VALUE is a symbol representing the registered CCL program used for
decoding. This attribute is meaningful only when `:coding-type' is
`ccl'.

`:ccl-encoder'

VALUE is a symbol representing the registered CCL program used for
encoding. This attribute is meaningful only when `:coding-type' is
`ccl'.

`:inhibit-null-byte-detection'

VALUE non-nil means Emacs ignore null bytes on code detection.
See the variable `inhibit-null-byte-detection'. This attribute
is meaningful only when `:coding-type' is `undecided'.

`:inhibit-iso-escape-detection'

VALUE non-nil means Emacs ignores ISO-2022 escape sequences on
code detection. See the variable `inhibit-iso-escape-detection'.
This attribute is meaningful only when `:coding-type' is
`undecided'.

`:prefer-utf-8'

VALUE non-nil means Emacs prefers UTF-8 on code detection for
non-ASCII files. This attribute is meaningful only when
`:coding-type' is `undecided'.