The Standard ML Basis Library

The `SUBSTRING` signature

The SUBSTRING signature specifies manipulations on an abstract representation of a piece of a string. A substring value can be modeled as a triple (s, i, n), where s is the underlying string, i is the starting index, and n is the size of the substring, with the constraint that 0 <= i <= i + n <= size s.

The substring type and its attendant functions provide a convenient abstraction for performing a variety of common analyses of strings, such as finding the leftmost occurrence, if any, of a character in a string. In addition, using the substring functions avoids much of the copying and bounds checking that occurs if similar operations are implemented solely in terms of strings.

The SUBSTRING signature is matched by two structures, the required Substring and the optional WideSubstring. The former is a companion structure to the Char and String structures, which are based on the extended ASCII 8-bit character set. The structure WideSubstring is related in the same way to the structures WideChar and WideString, which are based on characters of some size greater than or equal to 8 bits. In particular, substructure Substring.String is identical to the structure String and, when WideSubstring is defined, the substructure WideSubstring.String is identical to WideString.

Synopsis

signature SUBSTRING structure Substring : SUBSTRING structure WideSubstring : SUBSTRING

Interface

structure String : STRING type substring val base : substring -> (String.string * int * int) val string : substring -> String.string val extract : (String.string * int * int option) -> substring val substring : (String.string * int * int) -> substring val all : String.string -> substring val isEmpty : substring -> bool val getc : substring -> (String.Char.char * substring) option val first : substring -> String.Char.char option val triml : int -> substring -> substring val trimr : int -> substring -> substring val slice : (substring * int * int option) -> substring val sub : (substring * int) -> char val size : substring -> int val concat : substring list -> String.string val explode : substring -> String.Char.char list val isPrefix : String.string -> substring -> bool val compare : (substring * substring) -> order val collate : ((String.Char.char * String.Char.char) -> order) -> (substring * substring) -> order val splitl : (String.Char.char -> bool) -> substring -> (substring * substring) val splitr : (String.Char.char -> bool) -> substring -> (substring * substring) val splitAt : (substring * int) -> (substring * substring) val dropl : (String.Char.char -> bool) -> substring -> substring val dropr : (String.Char.char -> bool) -> substring -> substring val takel : (String.Char.char -> bool) -> substring -> substring val taker : (String.Char.char -> bool) -> substring -> substring val position : String.string -> substring -> (substring * substring) val span : (substring * substring) -> substring val translate : (String.Char.char -> String.string) -> substring -> String.string val tokens : (String.Char.char -> bool) -> substring -> substring list val fields : (String.Char.char -> bool) -> substring -> substring list val foldl : ((String.Char.char * 'a) -> 'a) -> 'a -> substring -> 'a val foldr : ((String.Char.char * 'a) -> 'a) -> 'a -> substring -> 'a val app : (String.Char.char -> unit) -> substring -> unit

Description

structure String

type substring

base s

returns a triple (s, i, n) representing the concrete representation of the substring. s is the underlying string, i is the starting index, and n is the size of the substring.

string s

creates a string value corresponding to the substring. It is equivalent to String.substring o base.

extract (s, i, NONE)

extract (s, i, SOME j)

substring (s, i, j)

return substrings of s. The first returns the substring of s from the ith character to the end of the string, i.e., the string s[i..size s-1]. This raises Subscript if i < 0 or size s < i. The second form returns the substring of length j starting at index i, i.e., the string s[i..i+j-1]. It raises Subscript if i < 0 or j < 0 or size s < i + j. Note that, if defined, extract returns the empty string when i = size s.

The third form returns the substring of length j starting at index i, i.e., the string s[i..i+j-1]. This is equivalent to extract(s, i, SOME j).

We require that base o substring be the identity function on valid arguments. Conversely, substring o base is the identity function on all arguments.

Implementation note:

Note that implementations of this function must perform bounds checking in such a way that the Overflow exception is not raised.

all s

creates a substring representing the entire string s. It is equivalent to substring(s, 0, String.size s).

isEmpty s

returns true if the substring has size 0.

getc s

returns the first character in the substring and the rest of the substring, or NONE if the substring is empty.

first s

returns the first character in the substring, or NONE if the substring is empty.

triml k s

trimr k s

These functions remove k characters from the left (respectively, right) of the substring s. If k is greater than the length of the substring, an empty substring is returned. Specifically, for substring ss = substring(s, i, j) and k > j, we have:

          triml k ss = substring(s, i+j, 0)
          trimr k ss = substring(s, i, 0)

The exception Subscript is raised if k < 0.

slice (s, i, SOME m)

slice (s, i, NONE)

These return a substring of s starting at the ith character. In the former case, the length of the resulting substring is m. Otherwise, the length is size s - i. To be valid, the arguments in the first case must satisfy 0 <= i, 0 <= m and i + m <= n, where n is the length of s. In the second case, the arguments must satisfy 0 <= i and i <= n. If the arguments are not valid, the exception Subscript is raised.

sub (s, i)

is equivalent to String.sub(string s, i). The exception Subscript is raised unless i is non-negative and less than the size of s.

size s

returns the size of the substring. This is equivalent to #3 o base and String.size o string.

concat l

generates a string that is the concatenation of the substrings. This is equivalent to String.concat o (List.map string).

explode s

returns the list of characters composing the substring. This is equivalent to String.explode (string s).

isPrefix s ss

returns true if the string s is a prefix of the substring ss.

compare (s, t)

compares the two substrings lexicographically using the default character comparison function. This is equivalent to

          String.compare (string s, string t)

collate f (s, t)

compares the two substrings lexicographically using the character comparison function f. This is equivalent to

         String.collate f (string s, string t)

splitl f s

splitr f s

These functions scan the substring from left to right (respectively, right to left) looking for the first character that does not satisfy the given predicate. They return the split of the substring into the span up to that character and the rest. The first result is the left side of the split, and the second is the right side. For example, if the characters a and c satisfy the predicate, but character X does not, then these functions work as follows on the substring aaaXbbbbXccc:

            splitl   :           aaa         XbbbbXccc
            splitr   :           aaaXbbbbX   ccc

splitAt (s, i)

returns the pair of substring (ss, ss'), where ss contains the first i characters of s and ss' contains the rest, assuming 0 <= i <= size s. Otherwise, it raises Subscript.

dropl p s

dropr p s

takel p s

taker p s

These routines scan the substring s for the first character not satisfying the predicate p. The functions dropl and takel scan left to right (i.e., increasing character indices), while dropr and taker scan from the right. The drop functions drop the maximal substring satisfying the predicate, while the take functions take the maximal substring. These can be defined in terms of the split operations:

          takel p s = #1(splitl p s)
          dropl p s = #2(splitl p s)
          taker p s = #2(splitr p s)
          dropr p s = #1(splitr p s)

position s ss

splits the substring ss into a pair (pref, suff) of substrings, where suff is the longest suffix of ss that has s as a prefix and pref is the prefix of ss preceding suff. More precisely, let m be the size of s and let ss correspond to the substring (s', i, n). If there is a least index k such that s = s'[k..k+m-1], then suff corresponds to (s', k, n+i-k) and pref corresponds to (s', i, k-i). If there is no such k, then suff is the empty substring corresponding to (s', i+n, 0) and pref corresponds to (s', i, n), i.e., all of ss.

span (ss, ss')

produces a substring composed of a prefix ss, suffix ss', plus all intermediate characters in the underlying string. Raises Span if ss and ss' are not substrings of the same underlying string or if ss is not to the left of ss'. More precisely, if we have

	        val (s, i, n) = base ss
	        val (s', i', n') = base ss'

then span returns substring(s, i, (i'+n')-i) unless s <> s' or i'+n' < i, in which case it raises Span.

This function allows one to scan for a substring using multiple pieces and then coalescing the pieces. For example, given a URL string such as

          "http://www.research.att.com/jhr/sml/basis"

to scan the protocol and host ("http://www.research.att.com"), one could write:

          fun protoAndHost url = let
                fun notc (c : char) = fn c' => c <> c'
                val (proto,rest) = splitl (notc #":") (all url)
                val host = takel (notc #"/") (trim  3 rest)
                in
                  span (proto, host)
                end

Implementation note:

When applied to substrings derived from the identical base string, the string equality test should be constant time. This can be achieved by first doing a pointer test and, if that fails, then checking the strings character by character.

translate f s

applies f to every character of s, from left to right, and returns the concatenation of the results. This is equivalent to String.concat(List.map f (explode s)).

tokens p s

fields p s

These functions decompose a substring into a list of tokens or fields from left to right. A token is a non-empty maximal substring not containing any delimiter. A field is a (possibly empty) maximal substring of s not containing any delimiter. In both cases, a delimiter is a character satisfying predicate p.

Two tokens may be separated by more than one delimiter, whereas two fields are separated by exactly one delimiter. For example, if the only delimiter is the character #"|", then the substring "|abc||def" contains two tokens "abc" and "def", whereas it contains the four fields "", "abc", "" and "def".

foldl f a s

foldr f a s

fold the function f over the substring s, starting with the value a, from left to right and from right to left, respectively. They are the analogues of the identically named functions in List. In particular, they are respectively equivalent to:

 
          List.foldl f a (explode s)
          List.foldr f a (explode s)

app f s

applies f to each character of s from left to right. It is equivalent to List.app f (explode s).

The Standard ML Basis Library

The SUBSTRING signature

Synopsis

Interface

Description

See Also

The `SUBSTRING` signature