The Open Source Lexical Information Network
Description of the OSLIN system
The Paradigm System
One of the core types of information in OSLIN is the inflection of all the words. Although the other modules provide invaluable information, there are many applications for which the only thing needed is a full-form lexicon. The inflected forms are stored in a table of wordforms, where each word form is stored with an identifier of the form it is (for instance, the masculine singular form of the adjective), its orthography, and the lexical entry in the lemmalist it belongs to.
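As a rough illustration of the wordforms table described above, the following sketch models one row per word form and builds a full-form lookup from it. The field names (`form_id`, `orthography`, `lemma_id`), the form codes such as `"ms"`, and the helper `build_fullform_index` are assumptions made for this example and are not taken from the actual OSLIN schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordForm:
    form_id: str      # which form this is, e.g. "ms" = masculine singular (hypothetical code)
    orthography: str  # the written form
    lemma_id: int     # key of the lexical entry in the lemmalist (hypothetical key)


# Toy rows for the Catalan adjective "petit", assumed to be lemma 1.
WORDFORMS = [
    WordForm("ms", "petit", 1),
    WordForm("fs", "petita", 1),
    WordForm("mp", "petits", 1),
    WordForm("fp", "petites", 1),
]


def build_fullform_index(rows):
    """Map each orthographic form to the (lemma_id, form_id) pairs it realises."""
    index = defaultdict(list)
    for row in rows:
        index[row.orthography].append((row.lemma_id, row.form_id))
    return dict(index)


index = build_fullform_index(WORDFORMS)
print(index["petites"])
```

An application that only needs a full-form lexicon could work from an index of this kind without touching any other module.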
To facilitate the maintenance of the inflectional information, OSLIN uses a paradigm system. For each language, the paradigm system defines how words of different classes inflect, in terms of a transformation from the citation form to the various inflected forms. The transformation is done in two steps: first the root is deduced from the citation form, and then prefixes and suffixes are added to the root. This system is rich enough to provide inflection for a wide range of languages, including Catalan and Russian, although it does not suffice for all the languages of the world: for Dutch and Arabic, for instance, the system has to be augmented specifically.
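The two-step transformation can be sketched as follows. This is a minimal illustration that assumes the root is obtained by stripping a fixed ending from the citation form; the real OSLIN rules may be richer, and the function names `derive_root` and `add_affixes` are invented for this example.

```python
def derive_root(citation: str, ending: str) -> str:
    """Step 1: strip the citation-form ending to obtain the root."""
    if not citation.endswith(ending):
        raise ValueError(f"{citation!r} does not end in {ending!r}")
    return citation[: len(citation) - len(ending)]


def add_affixes(root: str, prefix: str = "", suffix: str = "") -> str:
    """Step 2: attach a prefix and a suffix to the root."""
    return prefix + root + suffix


# Present indicative of the Catalan verb "cantar" (toy example).
root = derive_root("cantar", "ar")
forms = [add_affixes(root, suffix=s) for s in ("o", "es", "a", "em", "eu", "en")]
print(root, forms)
```

A scheme of this shape only strips and concatenates, so it cannot express changes inside the root, which is consistent with the remark above that some languages need the system to be augmented.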
The database of paradigms stores each paradigm with a name (number), an example word to easily refer to it, the rule for obtaining the root from the citation form, and the prefixes and suffixes to add to obtain the inflected forms. Each lexical entry in the lemmalist is then assigned a paradigm, which implicitly defines all its inflected forms. This paradigm number is then used to explicitly generate all the individual word forms stored in the wordforms table.
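A paradigm record and the generation of wordforms rows from it might look like the sketch below. The `Paradigm` fields, the form codes, and `generate_wordforms` are hypothetical names chosen for illustration, and the root rule is again simplified to "strip a fixed ending".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paradigm:
    number: int           # name (number) of the paradigm
    example: str          # example word used to refer to the paradigm
    strip_ending: str     # simplified root rule: ending removed from the citation form
    affixes: tuple        # (form_id, prefix, suffix) triples, one per inflected form


# Hypothetical paradigm 1: Catalan adjectives inflecting like "petit".
ADJ_PETIT = Paradigm(
    number=1,
    example="petit",
    strip_ending="",
    affixes=(("ms", "", ""), ("fs", "", "a"), ("mp", "", "s"), ("fp", "", "es")),
)


def generate_wordforms(lemma_id: int, citation: str, paradigm: Paradigm):
    """Expand one lemma into explicit (form_id, orthography, lemma_id) rows."""
    if not citation.endswith(paradigm.strip_ending):
        raise ValueError(f"{citation!r} does not fit paradigm {paradigm.number}")
    root = citation[: len(citation) - len(paradigm.strip_ending)]
    return [(form_id, prefix + root + suffix, lemma_id)
            for form_id, prefix, suffix in paradigm.affixes]


# A second lemma assigned to the same paradigm (lemma id 2 is made up).
rows = generate_wordforms(2, "fort", ADJ_PETIT)
print(rows)
```

Because the rows are generated from the paradigm, correcting a paradigm once and regenerating would fix every lemma assigned to it.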
In principle, the paradigm system has no function other than the easy management of inflected forms: using it makes the correction and creation of inflected forms easier and more reliable. However, the paradigms can be displayed on the portal as well, providing information about other words that inflect in the same way as a given word. The system is also useful as a linguistic resource, since it provides easily accessible data about the relative frequency of the different inflectional paradigms in the language.
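Both secondary uses mentioned above follow directly from the lemma-to-paradigm assignment. The sketch below uses a made-up assignment table with invented paradigm numbers; `same_paradigm` and `paradigm_frequencies` are illustrative helpers, not OSLIN functions.

```python
from collections import Counter

# Toy assignment of lemmas to paradigm numbers (all values invented).
LEMMA_PARADIGM = {"petit": 1, "fort": 1, "cantar": 2, "parlar": 2, "estimar": 2}


def same_paradigm(word, assignments):
    """Other lemmas that inflect in the same way as the given word."""
    target = assignments[word]
    return sorted(w for w, p in assignments.items() if p == target and w != word)


def paradigm_frequencies(assignments):
    """Relative frequency of each paradigm among the lemmas."""
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return {paradigm: n / total for paradigm, n in counts.items()}


neighbours = same_paradigm("cantar", LEMMA_PARADIGM)
freqs = paradigm_frequencies(LEMMA_PARADIGM)
print(neighbours, freqs)
```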