The string correction factor is the term by which the probability of a word w needs to be multiplied in order to account for character changes or “errors” occurring in at most k arbitrary positions in that word. The behavior of this factor, as a function of k and of the word length, has implications on the number of candidates that need to be considered and weighted when looking for subwords of a sequence that present unusually recurrent replicas within some bounded number of mismatches. Specifically, it is seen that over intervals of mono- or bi- tonicity for the correction factor, only some of the candidates need be considered. This mitigates the computation and leads to tables of over- represented words that are more compact to represent and inspect. In recent work, expectation and score monotonicity has been established for a number of cases of interest, under i.i.d. probabilistic assumptions. The present paper reviews the cases of bi-tonic behavior for the correction factor, concentrating on the instance in which the question is still open.
On the Monotonicity of the String Correction Factor for Words with Mismatches
PIZZI, CINZIA
2006
Abstract
The string correction factor is the term by which the probability of a word w needs to be multiplied in order to account for character changes or “errors” occurring in at most k arbitrary positions in that word. The behavior of this factor, as a function of k and of the word length, has implications on the number of candidates that need to be considered and weighted when looking for subwords of a sequence that present unusually recurrent replicas within some bounded number of mismatches. Specifically, it is seen that over intervals of mono- or bi- tonicity for the correction factor, only some of the candidates need be considered. This mitigates the computation and leads to tables of over- represented words that are more compact to represent and inspect. In recent work, expectation and score monotonicity has been established for a number of cases of interest, under i.i.d. probabilistic assumptions. The present paper reviews the cases of bi-tonic behavior for the correction factor, concentrating on the instance in which the question is still open.Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.