
Linguistic Tools

A variety of natural language processing tools for 9 European languages.


Tokenization is a natural language processing component that breaks a text into tokens.

The tokenization step uses a finite-state transducer to insert token boundaries around simple words (or multi-word expressions), punctuation, numbers, etc.
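The boundary-insertion idea can be approximated with a short sketch. The regular expression below is an illustrative stand-in for the finite-state transducer, not the actual component; all names are hypothetical.

```python
import re

# Illustrative stand-in for the finite-state transducer: each alternative
# corresponds to a token class the transducer recognizes.
TOKEN_PATTERN = re.compile(
    r"\d+(?:[.,]\d+)*"   # numbers, including decimals like 3.14 or 1,000.50
    r"|\w+(?:-\w+)*"     # simple words, including hyphenated forms
    r"|[^\w\s]",         # punctuation marks as single-character tokens
    re.UNICODE,
)

def tokenize(text):
    """Insert token boundaries around words, numbers and punctuation."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Dr. Smith paid 1,000.50 euros!"))
# ['Dr', '.', 'Smith', 'paid', '1,000.50', 'euros', '!']
```

A real transducer would also handle language-specific cases (abbreviations such as "Dr.", clitics, multi-word expressions), which this sketch deliberately omits.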

We have developed a tokenizer for nine European languages: Czech, English, French, German, Greek, Hungarian, Italian, Polish and Russian.


