Solving the String Statistics Problem in Time O(n log n)
Gerth S. Brodal, Rune B. Lyngsø, Anna Östlin, and Christian N. S. Pedersen
To appear in Proc. 29th International Colloquium on Automata,
Languages, and Programming, 2002.
Abstract
The string statistics problem consists of preprocessing a string of
length n such that given a query pattern of length m,
the maximum number of non-overlapping occurrences of the query pattern
in the string can be reported efficiently. Apostolico and Preparata
introduced the minimal augmented suffix tree (MAST) as a data
structure for the string statistics problem, and showed how to
construct the MAST in time O(nlog2 n)
and how it supports queries in time O(m) for constant
sized alphabets. A subsequent theorem by Fraenkel and Simpson stating
that a string has at most a linear number of distinct squares implies
that the MAST requires space O(n). In this paper we
improve the construction time for the MAST to O(nlog
n) by extending the algorithm of Apostolico and Preparata to
exploit properties of efficient joining and splitting of search trees
together with a refined analysis.