Published in: Mathematica Programming

Working with Linguistic Data in Mathematica

There are many interesting functions in Mathematica for working with language data, not just in English but in many other languages too.

The DictionaryLookup[] function is a good starting point. Let us see what languages are supported as part of dictionary lookup:

That is a good collection. It is nice to see that our National language Hindi is supported. Shall we try to get a glimpse of the words in the Hindi dictionary?

You can also see that 15983 Hindi words are available in the dictionary.

In contrast, the English dictionary contains a lot more number of words.

The documentation says that all forms of a word are included in the dictionary, i.e., no word stemming is done by default. So you will find the words “eat”, “eats” and “eating” in the dictionary.

Let us look for all English words that start with “lo” and end in “y”.

Another interesting function is Transliterate[]. You can use it to transliterate a string to plain ASCII or from one script to another.

WordTranslation[] is a very useful function. It gives word translations across many languages.

It is interesting to see that multiple Indian languages are supported by this function!

The function LanguageIdentify[] takes a string and guesses the corresponding language.

The last function we will look at is LanguageData[]. It takes a language and a property and returns the corresponding property value for that language.

Obviously, Wolfram guys have to work more on improving the Tamil script! But it is a good start.

From a developer’s perspective, it is indeed admirable that Mathematica attempts to support such diversity!

That is all for today. Hope you found the discussion interesting and useful. Thanks for your time!

Rangarajan Krishnamoorthy on Programming and Other Topics

Rangarajan Krishnamoorthy on Programming and Other Topics

Working with Linguistic Data in Mathematica

Leave a comment Cancel reply

Share this:

Leave a comment Cancel reply