LingDoc Transform Short Description:
LingDoc is a set of tools for Document and Language Engineering.
LingDoc Transform Long Description:
LingDoc Transform is a general-purpose tool for the analysis of natural language texts, for XML up-conversion and for information extraction. It has been used for tasks such as text preprocessing, natural language parsing, grapheme-to-phoneme conversion, classification of Latin words and conversion of a Russian-Dutch dictionary into a Dutch-Russian one.
For XML up-conversion, the structure of the source document is specified by a formal grammar, akin to the document schema of the target document. In the first stage, errors and inconsistencies in the source document are detected. In the second stage, a correct target document is created.
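The two-stage idea can be illustrated with a minimal Python sketch. This is not LingDoc Transform's grammar formalism or API; the source layout (a TITLE line, an AUTHOR line, then paragraphs), the rule TAP+, the element names and all function names are assumptions made up for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical source grammar: each line is classified as T(itle), A(uthor)
# or P(aragraph); a valid document is one T, one A, then one or more P.
LINE_TYPES = [
    ("T", re.compile(r"^TITLE:\s*(.+)$")),
    ("A", re.compile(r"^AUTHOR:\s*(.+)$")),
    ("P", re.compile(r"^(.+)$")),
]
DOCUMENT_RULE = re.compile(r"TAP+")
ELEMENT_NAMES = {"T": "title", "A": "author", "P": "para"}


def classify(line):
    """Return (line type, payload text) for one non-empty source line."""
    for tag, pattern in LINE_TYPES:
        match = pattern.match(line)
        if match:
            return tag, match.group(1).strip()
    return "?", line


def check(lines):
    """Stage 1: return a list of error messages, empty if the source conforms."""
    tags = "".join(classify(line)[0] for line in lines)
    if DOCUMENT_RULE.fullmatch(tags):
        return []
    return [f"line sequence {tags!r} does not match rule {DOCUMENT_RULE.pattern!r}"]


def convert(lines):
    """Stage 2: build the target XML from input that passed stage 1."""
    root = ET.Element("article")
    for line in lines:
        tag, text = classify(line)
        ET.SubElement(root, ELEMENT_NAMES[tag]).text = text
    return ET.tostring(root, encoding="unicode")


def up_convert(source):
    """Run both stages; raise ValueError if stage 1 reports errors."""
    lines = [line for line in source.splitlines() if line.strip()]
    errors = check(lines)
    if errors:
        raise ValueError("; ".join(errors))
    return convert(lines)
```

Keeping the check separate from the conversion means that a malformed source is reported before any target output is produced.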
Transform has two modes: one for pattern-matching and -replacement and one for parsing and transduction according to grammars and lexicons. It is based on unique extensions to the theory of formal grammars and automata.
The grammar formalism supports regular expressions on the character and on the rule level, Chomsky type 0 rules, attributed variables, pattern rewriting rules and programming statements.
Cascaded grammars may be used in a multi-stage/pipelined process, e.g. for the integration of linguistic layers and for the modularization of large transduction grammars.
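As a rough illustration of cascading, the Python sketch below chains small rewriting stages so that the output of one stage is the input of the next. It uses plain regular-expression substitution rather than Transform's grammars, and the rules (a toy normalization and a toy grapheme-to-phoneme mapping) and all names are hypothetical.

```python
import re
from functools import reduce


def make_stage(rules):
    """Build a stage that applies ordered (pattern, replacement) rules."""
    compiled = [(re.compile(pattern), replacement) for pattern, replacement in rules]

    def stage(text):
        for pattern, replacement in compiled:
            text = pattern.sub(replacement, text)
        return text

    return stage


def cascade(*stages):
    """Compose stages left to right into a single transducer."""
    return lambda text: reduce(lambda acc, stage: stage(acc), stages, text)


# Stage 1: collapse runs of whitespace, then trim the ends.
normalize = make_stage([(r"\s+", " "), (r"^ | $", "")])
# Stage 2: case folding.
lowercase = str.lower
# Stage 3: toy grapheme-to-phoneme rules, not linguistically accurate.
to_phonemes = make_stage([(r"ph", "f"), (r"ck", "k"), (r"sh", "S")])

pipeline = cascade(normalize, lowercase, to_phonemes)
```

Each stage stays small and can be tested on its own, which is the modularization benefit described above.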
The generated programs have unlimited look-ahead and the on-line property, handle all kinds of ambiguities and report errors as soon as possible.
Transform provides a head start for the development of solutions, be it for individual developers, service providers or system integrators. The source code and documentation may be obtained for free from www.palstar.nl.