Compact bases formed by motifs called ''irredundant'' and capable of generating all other motifs in a sequence have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence s of n characters drawn from an alphabet @S, the problem of extracting such a base from s had been previously solved in time O(n^2lognlog|@S|) and O(|@S|n^2log^2nloglogn), respectively, using the FFT-based string searching by Fischer and Paterson. More recently, a solution on binary strings taking time O(n^2) without resorting to the FFT was also proposed. In the present paper, we considered the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in a previous work in time O(n^3). A much faster incremental algorithm is described here, which takes time O(n^2logn) for binary strings. Although this algorithm does not make use of the FFT, its performance is comparable to the one exhibited by the previous FFT-based algorithms involving the computation of only one base. The implicit representation of a single base requires O(n) space, whence for finite alphabets the proposed solution is within a logn factor from optimality.

Incremental discovery of the irredundant motif bases for all suffixes of a string in O(n2logn) time.

APOSTOLICO, ALBERTO;
2008

Abstract

Compact bases formed by motifs called ''irredundant'' and capable of generating all other motifs in a sequence have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence s of n characters drawn from an alphabet @S, the problem of extracting such a base from s had been previously solved in time O(n^2lognlog|@S|) and O(|@S|n^2log^2nloglogn), respectively, using the FFT-based string searching by Fischer and Paterson. More recently, a solution on binary strings taking time O(n^2) without resorting to the FFT was also proposed. In the present paper, we considered the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in a previous work in time O(n^3). A much faster incremental algorithm is described here, which takes time O(n^2logn) for binary strings. Although this algorithm does not make use of the FFT, its performance is comparable to the one exhibited by the previous FFT-based algorithms involving the computation of only one base. The implicit representation of a single base requires O(n) space, whence for finite alphabets the proposed solution is within a logn factor from optimality.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/2268855
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 11
  • ???jsp.display-item.citation.isi??? 11
  • OpenAlex ND
social impact