
Linguistic Tools

A variety of natural language processing tools for 9 European languages.

 Part of Speech Disambiguation (Tagging)

The general purpose of a part-of-speech tagger is to associate each word in a text with its morphosyntactic category (represented by a tag), as in the following example:


This+PRON  is+VAUX_3SG  a+DET  sentence+NOUN_SG  .+SENT
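The tagged output above can be read back into (word, tag) pairs with a few lines of Python. This is only a minimal sketch: the function name `parse_tagged` is hypothetical, and it assumes that every item has the form word+TAG and that items are separated by whitespace, as in the example line.

```python
def parse_tagged(line):
    """Split whitespace-separated 'word+TAG' items into (word, tag) pairs.

    Assumption: every item contains at least one '+'.
    """
    pairs = []
    for item in line.split():
        # rpartition splits on the LAST '+', so a word that itself
        # contains '+' keeps it on the word side.
        word, _, tag = item.rpartition("+")
        pairs.append((word, tag))
    return pairs


tagged = "This+PRON  is+VAUX_3SG  a+DET  sentence+NOUN_SG  .+SENT"
pairs = parse_tagged(tagged)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```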

The process of tagging consists of three steps:

  1. tokenization: break a text into tokens

  2. lexical lookup: provide all potential tags for each token

  3. disambiguation: assign to each token a single tag
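The data flow through these three steps can be sketched in Python as follows. Everything here is a toy stand-in chosen for illustration: the regular-expression tokenizer, the small `LEXICON` dictionary (including the extra `VERB` reading and the `UNKNOWN` fallback), and the "take the first reading" disambiguator are assumptions, not the actual components of the tagger described on this page.

```python
import re

# Hypothetical toy lexicon: lower-cased token -> all potential tags.
LEXICON = {
    "this": ["PRON", "DET"],
    "is": ["VAUX_3SG"],
    "a": ["DET"],
    "sentence": ["NOUN_SG", "VERB"],
    ".": ["SENT"],
}


def tokenize(text):
    """Step 1: break a text into tokens (word runs or single punctuation marks)."""
    return re.findall(r"\w+|[^\w\s]", text)


def lookup(tokens):
    """Step 2: provide all potential tags for each token."""
    return [(tok, LEXICON.get(tok.lower(), ["UNKNOWN"])) for tok in tokens]


def disambiguate(readings):
    """Step 3: assign a single tag to each token.

    Placeholder only: picks the first listed reading and ignores context.
    """
    return [(tok, tags[0]) for tok, tags in readings]


tokens = tokenize("This is a sentence.")
readings = lookup(tokens)
result = disambiguate(readings)
print(" ".join(f"{tok}+{tag}" for tok, tag in result))
```

Keeping the steps as separate functions mirrors the description above: each stage consumes the previous stage's output, so any one of them could be swapped for a more realistic component without touching the others.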

Each step is performed by an application program that uses language-specific data:

  • The tokenization step uses a finite-state transducer to insert token boundaries around simple words (or multi-word expressions), punctuation, numbers, etc.

  • Lexical lookup requires a morphological analyser to associate each token with one or more readings. Unknown words are handled by a guesser which provides potential part-of-speech categories based on affix patterns.

  • Disambiguation is done with a statistical method (a Hidden Markov Model).
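To illustrate how a Hidden Markov Model can choose one tag per token, here is a minimal Viterbi decoding sketch in Python. It is a generic textbook formulation, not the Xerox implementation: the function `viterbi`, the `FLOOR` fallback for unseen events, the extra `VERB` reading, and all probability values are made up for the example. The candidate tags per token are assumed to come from the lexical lookup step.

```python
import math

FLOOR = 1e-6  # assumed fallback probability for events missing from a table


def _logp(table, key):
    return math.log(table.get(key, FLOOR))


def viterbi(candidates, start, trans, emit):
    """Return the most probable tag sequence as (token, tag) pairs.

    candidates: list of (token, [possible tags]) from lexical lookup
    start[tag], trans[(prev_tag, tag)], emit[(tag, token)]: probabilities
    """
    # best[i][tag] = (log-probability of best path ending in tag, previous tag)
    tok0, tags0 = candidates[0]
    best = [{t: (_logp(start, t) + _logp(emit, (t, tok0)), None) for t in tags0}]
    for i in range(1, len(candidates)):
        tok, tags = candidates[i]
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + _logp(trans, (p, t)), p) for p in best[i - 1]
            )
            column[t] = (score + _logp(emit, (t, tok)), prev)
        best.append(column)

    # Backtrack from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(candidates) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    path.reverse()
    return [(tok, t) for (tok, _), t in zip(candidates, path)]


# Made-up model parameters, for illustration only.
start = {"PRON": 0.3, "DET": 0.3}
trans = {
    ("PRON", "VAUX_3SG"): 0.6, ("DET", "VAUX_3SG"): 0.05,
    ("VAUX_3SG", "DET"): 0.5,
    ("DET", "NOUN_SG"): 0.7, ("DET", "VERB"): 0.01,
    ("NOUN_SG", "SENT"): 0.4, ("VERB", "SENT"): 0.2,
}
emit = {
    ("PRON", "This"): 0.2, ("DET", "This"): 0.2,
    ("VAUX_3SG", "is"): 0.9, ("DET", "a"): 0.5,
    ("NOUN_SG", "sentence"): 0.01, ("VERB", "sentence"): 0.005,
    ("SENT", "."): 0.9,
}
candidates = [
    ("This", ["PRON", "DET"]), ("is", ["VAUX_3SG"]), ("a", ["DET"]),
    ("sentence", ["NOUN_SG", "VERB"]), (".", ["SENT"]),
]
tagged = viterbi(candidates, start, trans, emit)
print(" ".join(f"{tok}+{tag}" for tok, tag in tagged))
```

With these toy numbers, the ambiguous token "This" resolves to PRON because a pronoun is far more likely than a determiner to precede the auxiliary, even though both readings have the same emission probability. Working in log space avoids numeric underflow on longer sentences.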

Using the Xerox HMM training tools, we have developed part-of-speech disambiguators for various languages including Czech, English, French, German, Greek, Hungarian, Italian, Polish and Russian.

We have two Part of Speech Tagging demos: a standard one and one that performs the tagging in real time. Both use the following tag sets:

