Apostolico–Giancarlo algorithm

Optimization of Boyer–Moore string-search algorithm

This article includes a list of references, related reading, or external links, but its sources remain unclear because it lacks inline citations. Please help improve this article by introducing more precise citations. (October 2023) (Learn how and when to remove this message)

In computer science, the Apostolico–Giancarlo algorithm is a variant of the Boyer–Moore string-search algorithm, the basic application of which is searching for occurrences of a pattern $P$ in a text $T$ . As with other comparison-based string searches, this is done by aligning $P$ to a certain index of $T$ and checking whether a match occurs at that index. $P$ is then shifted relative to $T$ according to the rules of the Boyer–Moore algorithm, and the process repeats until the end of $T$ has been reached. Application of the Boyer–Moore shift rules often results in large chunks of the text being skipped entirely.

With regard to the shift operation, Apostolico–Giancarlo is exactly equivalent in functionality to Boyer–Moore. The utility of Apostolico–Giancarlo is to speed up the match-checking operation at any index. With Boyer–Moore, finding an occurrence of $P$ in $T$ requires that all $n$ characters of $P$ be explicitly matched. For certain patterns and texts, this is very inefficient – a simple example is when both pattern and text consist of the same repeated character, in which case Boyer–Moore runs in $O(nm)$ , where $m$ is the length in characters of $T$ . Apostolico–Giancarlo speeds this up by recording the number of characters matched at the alignments of $T$ in a table, which is combined with data gathered during the pre-processing of $P$ to avoid redundant equality checking for sequences of characters that are known to match. It can be seen as a generalization of the Galil rule.

References

Apostolico, Alberto; Giancarlo, Raffaele (1986). "The Boyer–Moore–Galil String Searching Strategies Revisited". SIAM Journal on Computing. 15: 98–105. doi:10.1137/0215007.
Crochemore, Maxime; Lecroq, Thierry (1997). "Tight bounds on the complexity of the Apostolico-Giancarlo algorithm" (PDF). Information Processing Letters. 63 (4): 195–203. doi:10.1016/S0020-0190(97)00107-5.
Crochemore, M.; Rytter, W. (1994). Text Algorithms. Oxford University Press.
Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press. ISBN 0-521-58519-8.
Lecroq, T. (1992). Recherches de Mots (Ph. D. Thesis). University of Orléans.
Lecroq, Thierry (1995). "Experimental results on string matching algorithms". Software: Practice and Experience. 25 (7): 727–765. doi:10.1002/spe.4380250703. S2CID 15253073.

v t e Strings
String metric	Approximate string matching Bitap algorithm Damerau–Levenshtein distance Edit distance Gestalt pattern matching Hamming distance Jaro–Winkler distance Lee distance Levenshtein automaton Levenshtein distance Wagner–Fischer algorithm
String-searching algorithm	Apostolico–Giancarlo algorithm Boyer–Moore string-search algorithm Boyer–Moore–Horspool algorithm Knuth–Morris–Pratt algorithm Rabin–Karp algorithm Raita algorithm Trigram search Two-way string-matching algorithm Zhu–Takaoka string matching algorithm
Multiple string searching	Aho–Corasick Commentz-Walter algorithm
Regular expression	Comparison of regular-expression engines Regular grammar Thompson's construction Nondeterministic finite automaton
Sequence alignment	BLAST Hirschberg's algorithm Needleman–Wunsch algorithm Smith–Waterman algorithm
Data structure	DAFSA Substring index Suffix array Suffix automaton Suffix tree Compressed suffix array LCP array FM-index Generalized suffix tree Rope Ternary search tree Trie
Other	Parsing Pattern matching Compressed pattern matching Longest common subsequence Longest common substring Sequential pattern mining Sorting String rewriting systems String operations

Category:

String matching algorithms