The Standard ML Basis Library


The MultiByte structure

The optional MultiByte structure supports multibyte-encoded strings through functions that convert multibyte strings to and from wide strings and wide characters.

Although the interface supports stateful multibyte encodings, an implementation may support only stateless multibyte encodings. Such an implementation may raise Invalid when any function that takes a state argument is called in a locale that uses a stateful encoding.


Synopsis

signature MULTIBYTE
structure MultiByte : MULTIBYTE

Interface

type state
exception Invalid
val initial : state
val mbStringToWide : Word8Vector.vector -> WideString.string
val wideStringToMB : WideString.string -> Word8Vector.vector
val mbCharSize : (state * substring) -> (state * substring * int)
val mbCharToWide : (state * substring) -> (state * substring * WideString.char)
val mbSubstringToWide : (state * substring) -> (state * WideString.string)
val wideCharToMB : (state * WideString.Char.char) -> (state * string)
val wideSubstringToMB : (state * WideSubstring.substring) -> (state * string)
val wideCharToChar : WideString.char -> char option
val collate : (WideSubstring.substring * WideSubstring.substring) -> order

Description

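type state
represents the conversion (shift) state of a multibyte encoding. It is threaded through the functions below that take a state argument, and plays the role that mbstate_t plays in ISO C.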

exception Invalid
indicates an attempt was made to convert from an invalid multibyte encoding, or from a multibyte encoding with no corresponding encoding in the target type.

initial
the initial state for multibyte conversions; valid for all locales.

mbStringToWide s
converts the multibyte-encoded string s to the corresponding wide string. Conversion always begins in the initial state. This function corresponds to the ANSI C function mbstowcs.

wideStringToMB s
converts the wide string s to its corresponding multibyte encoding. Conversion always begins in the initial state. This function corresponds to the ANSI C function wcstombs.
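As an illustration of the C side of this correspondence only, the following sketch round-trips a string through mbstowcs and wcstombs. It is written in C rather than ML, and the helper names mb_to_wide and wide_to_mb are hypothetical. Returning NULL is used here as a rough stand-in for raising Invalid. Unless the program calls setlocale, the conversion runs in the default "C" locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated multibyte string to a
   freshly allocated wide string. Returns NULL on an invalid sequence
   or on allocation failure. The caller frees the result. */
wchar_t *mb_to_wide(const char *s)
{
    assert(s != NULL);
    size_t cap = strlen(s) + 1;   /* k bytes decode to at most k wide chars */
    wchar_t *ws = malloc(cap * sizeof *ws);
    if (ws == NULL)
        return NULL;
    if (mbstowcs(ws, s, cap) == (size_t)-1) {
        free(ws);
        return NULL;
    }
    return ws;
}

/* Hypothetical helper: the reverse direction, using wcstombs. */
char *wide_to_mb(const wchar_t *ws)
{
    assert(ws != NULL);
    /* Each wide char (and the terminator) needs at most MB_CUR_MAX bytes. */
    size_t cap = (wcslen(ws) + 1) * MB_CUR_MAX;
    char *s = malloc(cap);
    if (s == NULL)
        return NULL;
    if (wcstombs(s, ws, cap) == (size_t)-1) {
        free(s);
        return NULL;
    }
    return s;
}
```

Both helpers always start from the initial conversion state, as the two ML functions above are specified to do.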

mbCharSize (st, s)
returns (st', s', n) where n is the number of bytes used by the first multibyte character in s. The remainder of s after the first n bytes is s'; the new state after the first multibyte character is st'. This function corresponds to the ISO C function mbrlen.
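A minimal C sketch of the same stepping pattern, assuming the ISO C function mbrlen: the loop repeatedly measures the first character, then advances past it while carrying the state forward. The helper name mb_count_chars is hypothetical, and stopping at an embedded NUL is a simplification made for this sketch.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the multibyte characters in the first len
   bytes of s, threading an mbstate_t from one call to the next.
   Returns (size_t)-1 on an invalid or truncated sequence. */
size_t mb_count_chars(const char *s, size_t len)
{
    assert(s != NULL);
    mbstate_t st;
    memset(&st, 0, sizeof st);          /* zeroed state = initial state */
    size_t count = 0;
    while (len > 0) {
        size_t n = mbrlen(s, len, &st); /* bytes in the first character */
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;
        if (n == 0)                     /* NUL character: stop counting */
            break;
        s += n;                         /* remainder of the input */
        len -= n;
        count++;
    }
    return count;
}
```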

mbCharToWide (st, s)
returns (st', s', wc) where wc is the wide-character representation of the first multibyte character in s. The remainder of s after the first multibyte character is s'; the new state after the first multibyte character is st'. This function corresponds to the ISO C function mbrtowc.
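The result shape (st', s', wc) can be mimicked in C, as a sketch only, by updating the state in place and returning a pointer to the remainder. The helper names mb_initial and mb_decode_first are hypothetical. The sketch assumes a decoded NUL occupies one byte, which holds for stateless encodings.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: a zeroed mbstate_t, which ISO C defines as the
   initial conversion state. */
mbstate_t mb_initial(void)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);
    return st;
}

/* Hypothetical helper: decode one character from the len bytes at s.
   On success, stores the wide character in *wc, updates *st, and
   returns a pointer to the remaining input. Returns NULL if the input
   is invalid, incomplete, or empty. */
const char *mb_decode_first(mbstate_t *st, const char *s, size_t len,
                            wchar_t *wc)
{
    assert(st != NULL && s != NULL && wc != NULL);
    size_t n = mbrtowc(wc, s, len, st);
    if (n == (size_t)-1 || n == (size_t)-2)
        return NULL;
    if (n == 0)          /* decoded L'\0'; assumed to occupy one byte */
        n = 1;
    return s + n;
}
```

Calling mb_decode_first repeatedly on the returned pointer, with the same state object, walks the input one character at a time.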

mbSubstringToWide (st, s)
returns (st', ws) where ws is the wide string corresponding to the multibyte-encoded substring s, with conversion beginning in state st. The new state st' is the state after the conversion. This function corresponds to the ISO C function mbsrtowcs.

wideCharToMB (st, wc)
returns (st', s) where s is the multibyte encoding corresponding to the wide character wc. The new state st' is the state after conversion of the wide character. This function corresponds to the ISO C function wcrtomb.

wideSubstringToMB (st, ws)
returns (st', s) where s is the multibyte encoding corresponding to the wide substring ws. The new state st' is the state after conversion of the wide substring. This function corresponds to the ISO C function wcsrtombs.
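For the encoding direction, a hedged C sketch: the loop below encodes a run of wide characters one at a time with wcrtomb, carrying the state across calls, which is roughly what a single wcsrtombs call does for a whole string. The helper name wide_encode and its buffer-capacity convention are assumptions made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: encode the n wide characters at ws into out
   (capacity cap bytes), threading *st across calls to wcrtomb.
   Returns the number of bytes written, or (size_t)-1 if a character
   has no encoding or the output buffer is too small. */
size_t wide_encode(mbstate_t *st, const wchar_t *ws, size_t n,
                   char *out, size_t cap)
{
    assert(st != NULL && ws != NULL && out != NULL);
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        char buf[MB_LEN_MAX];            /* room for any one character */
        size_t k = wcrtomb(buf, ws[i], st);
        if (k == (size_t)-1 || k > cap - used)
            return (size_t)-1;
        memcpy(out + used, buf, k);
        used += k;
    }
    return used;
}
```

The output is not NUL-terminated; like an ML string, its length is carried separately.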

wideCharToChar wc
returns SOME c where c is a single byte character corresponding to wc, if such a mapping exists. Returns NONE if the wide character wc cannot be converted. This function corresponds to the ISO C function wctob.
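The option result can be sketched in C with wctob, which returns EOF when no single-byte representation exists. The helper name wide_char_to_char is hypothetical: a return value of 1 plays the role of SOME c and 0 plays the role of NONE.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */
#include <wchar.h>

/* Hypothetical helper: if wc has a single-byte representation in the
   current locale, store it in *c and return 1; otherwise return 0. */
int wide_char_to_char(wchar_t wc, char *c)
{
    assert(c != NULL);
    int b = wctob((wint_t)wc);
    if (b == EOF)
        return 0;
    *c = (char)b;
    return 1;
}
```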

collate (s, t)
returns the order (LESS, EQUAL, or GREATER) of s and t, using the collating order of the current locale. This function corresponds to the ANSI C function strcoll.
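A small C sketch of a three-way, locale-aware comparison. Because the ML function takes wide substrings, the sketch uses wcscoll, the wide-character counterpart of strcoll; that substitution is a choice made for this example. The helper name wide_collate is hypothetical, and the results -1, 0, and 1 stand in for LESS, EQUAL, and GREATER.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: compare two NUL-terminated wide strings using
   the current locale's collating order, normalised to -1, 0, or 1. */
int wide_collate(const wchar_t *s, const wchar_t *t)
{
    assert(s != NULL && t != NULL);
    int r = wcscoll(s, t);
    return (r > 0) - (r < 0);
}
```

In the default "C" locale this ordering coincides with a plain code-by-code comparison; a program would call setlocale first to obtain a language-specific order.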


See Also

WideChar, WideString, WideSubstring, Locale


Last Modified January 21, 1997
Comments to John Reppy.
Copyright © 1997 Bell Labs, Lucent Technologies