However, not all natural language processing (NLP) applications require a complete syntactic analysis. A full parse often provides more information than is needed, and sometimes less. For example, in Information Retrieval it may be enough to find simple NPs (noun phrases) and VPs (verb phrases). In Information Extraction, Summary Generation, and Question Answering, we are especially interested in specific syntactico-semantic relations such as agent, object, location, and time (basically, who did what to whom, when, where, and why), rather than in elaborate configurational syntactic analyses.
Partial or shallow parsing, the task of recovering only a limited amount of syntactic information from natural language sentences, has proved to be a useful technology for written and spoken language domains. For example, within the Verbmobil project, shallow parsers were used to add robustness to a large speech-to-speech translation system [Wahlster(2000)]. Shallow parsers are also typically used to reduce the search space for full-blown, `deep' parsers [Collins(1996)]. Yet another application of shallow parsing is question answering on the World Wide Web, where there is a need to efficiently process large quantities of (potentially) ill-formed documents [Buchholz and Daelemans(2001), Srihari and Li(1999)]. More generally, it is useful for text mining applications, for example in biology [Sekimizu et al.(1998)].
[Abney(1991)] is credited with being the first to argue for the relevance of shallow parsing, both from the point of view of psycholinguistic evidence and from the point of view of practical applications. His own approach used hand-crafted cascaded finite-state transducers to produce a shallow parse.
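To make the idea concrete, here is a minimal sketch of such a cascade, implemented as successive regular-expression rewrites over part-of-speech tag sequences. This is our illustration, not Abney's actual system: the tag inventory, the patterns, and the function names are simplifying assumptions.

import re

# Each level rewrites the output of the previous one; later levels can
# therefore build on the chunks found earlier (hence `cascaded').
LEVELS = [
    ("NP", re.compile(r"(DT )?(JJ )*(NNS? )+")),            # det? adj* noun+
    ("PP", re.compile(r"IN NP ")),                          # preposition + NP
    ("VP", re.compile(r"(MD )?(VB[DGNPZ]? )+(NP |PP )*")),  # verb group + arguments
]

def cascade(pos_tags):
    # Represent the sentence as a space-delimited string of Penn
    # Treebank POS tags and rewrite it level by level.
    s = " ".join(pos_tags) + " "
    for label, pattern in LEVELS:
        s = pattern.sub(label + " ", s)
    return s.split()

# `The old man ate an apple' -> DT JJ NN VBD DT NN
print(cascade(["DT", "JJ", "NN", "VBD", "DT", "NN"]))
# -> ['NP', 'VP']  (the object NP is absorbed into the VP at the last level)

A real system would keep the word/label alignment and use far richer patterns, but the control structure, a pipeline of finite-state recognizers, is the same.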
Typical modules within a shallow parser architecture include the following: a part-of-speech tagger, a chunker that finds the boundaries of non-recursive phrases such as NPs and VPs, and a component that assigns grammatical relations (such as subject and object) between chunks.
Because shallow parsers have to deal with natural languages in their entirety, they are large, and frequently contain thousands of rules (or rule analogues). For example, a rule might state that determiners (words such as `the') are good predictors of noun phrases. These rule sets also tend to be largely `soft', in that exceptions abound. Continuing with our example, in the phrase

...fatalities on non-interstate roads were about the same

the word `the' is instead within the adjectival phrase `about the same'. This example was taken from the Parsed Wall Street Journal [Marcus et al.(1993)].
Building shallow parsers by hand is therefore a labour-intensive task. Unsurprisingly, shallow parsers are now usually built automatically, using techniques originating in the machine learning (or statistical) community.
The work by [Ramshaw and Marcus(1995)] proved to be an important source of inspiration for this work. By formulating the task of NP chunking as a tagging task, a large number of machine learning techniques suddenly became available to solve the problem. In this approach, each word is associated with one of three tags: I (for a word inside an NP), O (for a word outside any NP), and B (for the first word of an NP that immediately follows another NP). The classification task can easily be extended to other types of chunks and, with some effort, even to finding relations [Buchholz et al.(1999)]. For an extension of an HMM approach from tagging to chunking, see [Skut and Brants(1998)].
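To illustrate the encoding, the following sketch (a hypothetical helper, not Ramshaw and Marcus's implementation) converts NP chunk spans into the three tags. Note that B is only needed where two NPs are adjacent; everywhere else, I and O suffice to recover the chunk boundaries.

def chunks_to_iob(n_words, np_spans):
    # n_words: sentence length; np_spans: sorted, non-overlapping
    # (start, end) word indices, with end exclusive.
    tags = ["O"] * n_words
    prev_end = -1
    for start, end in np_spans:
        for i in range(start, end):
            tags[i] = "I"
        # B marks a word that starts an NP directly after another NP;
        # a plain I here would merge the two chunks into one.
        if start == prev_end:
            tags[start] = "B"
        prev_end = end
    return tags

# `He reckons the current account deficit will narrow .'
print(chunks_to_iob(9, [(0, 1), (2, 6)]))
# -> ['I', 'O', 'I', 'I', 'I', 'I', 'O', 'O', 'O']
print(chunks_to_iob(4, [(0, 2), (2, 4)]))   # two adjacent NPs
# -> ['I', 'I', 'B', 'I']

Once chunking is phrased this way, any classifier that can tag words, from decision trees to memory-based learners to HMMs, can in principle be used to chunk.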
Readers are encouraged to visit the Computational Natural Language Learning (CoNLL) shared task websites,

http://lcg-www.uia.ac.be/conll2000/chunking/
http://lcg-www.uia.ac.be/conll2001/clauses/

for background reading, data sets, and the results of more than 20 shallow parsing systems.
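As a point of reference for the data sets behind these links, the CoNLL-2000 chunking data use a simple column format: one word per line, followed by its part-of-speech tag and a chunk tag, with sentences separated by blank lines. The chunk tags are a variant of the scheme above in which B-X marks the first word of a chunk of type X and I-X marks the remaining words. A fragment looks like this:

He      PRP  B-NP
reckons VBZ  B-VP
the     DT   B-NP
current JJ   I-NP
account NN   I-NP
deficit NN   I-NP
will    MD   B-VP
narrow  VB   I-VP
.       .    O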
Applying learning techniques is, however, not necessarily straightforward.
Shallow parsing, like much of natural language processing, is therefore a challenging domain for machine learning research.
Note that shallow parsing does not refer to a single technique. It is better understood as a family of related methods, all of which attempt to recover some syntactic information at the possible expense of ignoring the rest.