Induction of Directed Acyclic Word Graph in a Bioinformatics Task

Wojciech Wieczorek, Olgierd Unold
The 12th International Conference on Grammatical Inference, PMLR 34:207-217, 2014.

Abstract

In this paper a new algorithm for the induction of a Directed Acyclic Word Graph (DAWG) is proposed. A DAWG can serve as a very efficient data structure for lexicon representation and fast string matching, and have a variety of applications. Similar structures are being investigated in the theory of formal languages and grammatical inference, namely deterministic and nondeterministic finite automata (DFA and NFA, respectively). Since a DAWG is acyclic the proposed method is suited for problems where the target language does not necessarily have to be infinite. The experiments have been performed for a dataset from the domain of bioinformatics, and our results are compared with those obtained using the current state-of-the-art methods in heuristic DFA induction.

Cite this Paper


BibTeX
@InProceedings{pmlr-v34-wieczorek14a, title = {Induction of Directed Acyclic Word Graph in a Bioinformatics Task}, author = {Wieczorek, Wojciech and Unold, Olgierd}, booktitle = {The 12th International Conference on Grammatical Inference}, pages = {207--217}, year = {2014}, editor = {Clark, Alexander and Kanazawa, Makoto and Yoshinaka, Ryo}, volume = {34}, series = {Proceedings of Machine Learning Research}, address = {Kyoto, Japan}, month = {17--19 Sep}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v34/wieczorek14a.pdf}, url = {https://proceedings.mlr.press/v34/wieczorek14a.html}, abstract = {In this paper a new algorithm for the induction of a Directed Acyclic Word Graph (DAWG) is proposed. A DAWG can serve as a very efficient data structure for lexicon representation and fast string matching, and have a variety of applications. Similar structures are being investigated in the theory of formal languages and grammatical inference, namely deterministic and nondeterministic finite automata (DFA and NFA, respectively). Since a DAWG is acyclic the proposed method is suited for problems where the target language does not necessarily have to be infinite. The experiments have been performed for a dataset from the domain of bioinformatics, and our results are compared with those obtained using the current state-of-the-art methods in heuristic DFA induction.} }
Endnote
%0 Conference Paper %T Induction of Directed Acyclic Word Graph in a Bioinformatics Task %A Wojciech Wieczorek %A Olgierd Unold %B The 12th International Conference on Grammatical Inference %C Proceedings of Machine Learning Research %D 2014 %E Alexander Clark %E Makoto Kanazawa %E Ryo Yoshinaka %F pmlr-v34-wieczorek14a %I PMLR %P 207--217 %U https://proceedings.mlr.press/v34/wieczorek14a.html %V 34 %X In this paper a new algorithm for the induction of a Directed Acyclic Word Graph (DAWG) is proposed. A DAWG can serve as a very efficient data structure for lexicon representation and fast string matching, and have a variety of applications. Similar structures are being investigated in the theory of formal languages and grammatical inference, namely deterministic and nondeterministic finite automata (DFA and NFA, respectively). Since a DAWG is acyclic the proposed method is suited for problems where the target language does not necessarily have to be infinite. The experiments have been performed for a dataset from the domain of bioinformatics, and our results are compared with those obtained using the current state-of-the-art methods in heuristic DFA induction.
RIS
TY - CPAPER TI - Induction of Directed Acyclic Word Graph in a Bioinformatics Task AU - Wojciech Wieczorek AU - Olgierd Unold BT - The 12th International Conference on Grammatical Inference DA - 2014/08/30 ED - Alexander Clark ED - Makoto Kanazawa ED - Ryo Yoshinaka ID - pmlr-v34-wieczorek14a PB - PMLR DP - Proceedings of Machine Learning Research VL - 34 SP - 207 EP - 217 L1 - http://proceedings.mlr.press/v34/wieczorek14a.pdf UR - https://proceedings.mlr.press/v34/wieczorek14a.html AB - In this paper a new algorithm for the induction of a Directed Acyclic Word Graph (DAWG) is proposed. A DAWG can serve as a very efficient data structure for lexicon representation and fast string matching, and have a variety of applications. Similar structures are being investigated in the theory of formal languages and grammatical inference, namely deterministic and nondeterministic finite automata (DFA and NFA, respectively). Since a DAWG is acyclic the proposed method is suited for problems where the target language does not necessarily have to be infinite. The experiments have been performed for a dataset from the domain of bioinformatics, and our results are compared with those obtained using the current state-of-the-art methods in heuristic DFA induction. ER -
APA
Wieczorek, W. & Unold, O.. (2014). Induction of Directed Acyclic Word Graph in a Bioinformatics Task. The 12th International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 34:207-217 Available from https://proceedings.mlr.press/v34/wieczorek14a.html.

Related Material