[Dejean(2002)] presented a top-down rule induction system, called ALLiS, for learning linguistic structures. The initial system is enhanced with additional mechanisms to deal with noisy data. The author identifies two types of difficulties: significant noise in the data and the presence of linguistically motivated exceptions. Because these exceptions are linguistically motivated, they cannot simply be discarded as noise. To address these problems, a refinement algorithm is introduced that learns exceptions for each induced rule. A second improvement incorporates linguistically motivated prior knowledge to improve the efficiency and accuracy of the system.
The experimental results clearly demonstrate that the two mechanisms yield significant improvement. The refinement mechanism rests on the assumption that the errors in the data exhibit some regularity, so that systematically searching for exceptions improves the induced rules. With prior knowledge, only the context of a single element needs to be taken into account; the search space shrinks accordingly, resulting in a significant reduction in learning time. Compared with TBL, a well-known transformation-based learning system, ALLiS needs fewer rules and avoids a number of the classification errors that TBL produces.
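The rule-plus-exception idea can be illustrated with a toy sketch. This is not the actual ALLiS algorithm: the data representation (POS tag, left-context tag, chunk label), the majority-vote base rules, and the context-keyed exception table are all hypothetical simplifications chosen only to show how a refinement pass can attach learned exceptions to each induced rule.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (POS tag, left-context tag, chunk label).
examples = [
    ("NN", "DT", "I-NP"),
    ("NN", "DT", "I-NP"),
    ("NN", "VB", "B-NP"),
]

def induce_rules(data):
    """Toy stand-in for top-down induction: one default label per tag,
    chosen by majority vote over the training data."""
    by_tag = defaultdict(Counter)
    for tag, _context, label in data:
        by_tag[tag][label] += 1
    return {tag: counts.most_common(1)[0][0] for tag, counts in by_tag.items()}

def refine(rules, data):
    """Refinement step: for each rule, record the contexts in which it errs
    as exceptions mapping context -> corrected label."""
    exceptions = defaultdict(dict)
    for tag, context, label in data:
        if rules[tag] != label:
            exceptions[tag][context] = label
    return exceptions

def predict(rules, exceptions, tag, context):
    """Apply the default rule unless a learned exception fires for this context."""
    return exceptions.get(tag, {}).get(context, rules[tag])

rules = induce_rules(examples)
exceptions = refine(rules, examples)
```

Here the base rule tags "NN" as "I-NP" (the majority label), while the refinement pass learns that after a "VB" context the corrected label "B-NP" applies, so the regularity in the errors is captured as an exception rather than lost as noise.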
The incorporation of linguistically motivated prior knowledge in a learning-based system is an interesting addition, and as pointed out in the paper, the question arises whether such background information would be useful in other systems. In any case, it is clear that additional mechanisms are necessary to deal with the noise and exceptions present in natural language data for tasks such as shallow parsing.