- Integration of CiNii Books functions for fiscal year 2025 has completed
- Trial version of CiNii Research Knowledge Graph Search feature is available on CiNii Labs
- 【Updated on November 26, 2025】Regarding the recording of “Research Data” and “Evidence Data”
- Incorporated Jxiv preprints from JaLC and adding coverage from NDL Search
Extracting best consensus motifs from positive and negative examples
Bibliographic Information
- Published
- 1996-01-01
- DOI
-
- 10.1007/3-540-60922-9_19
- Publisher
- Springer Berlin Heidelberg
Description
We define the best consensus motif (BCM) problem motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet Σ is a family Ω of subsets of Σ*. A motif π of type Ω is a string π=π1 ⋯ πn of motif components, each of which stands for an element in Ω. The BCM problem for Ω is, given a yes-no sample S=(α(1),β(1),..., (α(m),β(m))} of pairs of strings in Σ* with α(i) ≠β(i) for 1 ≤ i ≤ m, to find a motif π of type Ω that maximizes the number of good pairs in S, where (α(i), β(i)) is good for π if π accepts α(i) and rejects β(i) We prove that the BCM problem is NP-complete even for a very simple type Ω1=2 ∑ −{θ}, which is used, in practice, for describing protein motifs in the PROSITE database. We also show that the NP-completeness of the problem does not change for the type Ω∞=Ω1∪ {Σ+}∪{Σ[i,j]¦1≤i≤ j}, where Σ[i,j] is the set of strings over Σ of length between i and j Furthermore, for the BCM problem for Ω1 we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis shows an explicit approximation ratio of the algorithm.

