Linguistic Tools

A variety of natural language processing tools for 9 European languages.

Tokenization

Tokenization is a natural language processing component that breaks a text into tokens, such as words, numbers, and punctuation marks.

The tokenization step uses a finite-state transducer to insert token boundaries around simple words (or multi-word expressions), punctuation, numbers, etc.
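The service's actual transducer rules and multi-word lexicon are not shown here, so the following is a minimal illustrative sketch in Python: a regular expression (a convenient stand-in for finite-state machinery) marks token boundaries around numbers, simple words, and punctuation, and a hypothetical multi-word table merges expressions that should stay as one token.

```python
import re

# Illustrative token pattern (assumption: the real FST rules are richer).
# Each alternative marks one token class; boundaries fall between matches.
TOKEN_PATTERN = re.compile(
    r"\d+(?:[.,]\d+)*"   # numbers, incl. decimal/thousands separators
    r"|\w+(?:-\w+)*"     # simple words, incl. hyphenated forms
    r"|[^\w\s]"          # any single punctuation mark
)

# Hypothetical multi-word expressions to keep as single tokens.
MWE = {("New", "York"), ("a", "priori")}

def tokenize(text: str) -> list[str]:
    """Split text into tokens, then merge known multi-word expressions."""
    tokens = TOKEN_PATTERN.findall(text)
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in MWE:
            merged.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(tokenize("He moved to New York in 1999, paying $1,200.50!"))
# ['He', 'moved', 'to', 'New York', 'in', '1999', ',',
#  'paying', '$', '1,200.50', '!']
```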

We have developed tokenizers for nine languages: Czech, English, French, German, Greek, Hungarian, Italian, Polish, and Russian.

 
