Mathematica 12 was released a few days ago, more than a year after version 11.3 came out in March 2018. The long wait appears justified, since the new release boasts numerous improvements and new features across many areas. You may want to read this blog post by Stephen Wolfram.
In the area of Natural Language Processing, one function that appeals to me is TextContents. It lets us extract meaningful information, such as entities, numbers and locations, from a given piece of text. A list of all the things that we can look for in the text is given here.
In today’s post, I will run through some examples of this function.
Instead of using a toy two-line text as the example, I decided to use a longer one. So, I put together some information about Florence Nightingale (sorry, I do not remember the source). Here is the text.
First, we import the text from the file.
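A minimal sketch of this step, assuming the article is saved as a plain-text file named `florence.txt` in the current working directory (the file name is hypothetical). Calling `TextContents` with only the text returns a `Dataset` of everything it recognizes, which is a quick way to see what it finds before narrowing the search.

```mathematica
(* import the whole article as a single string; "florence.txt" is a hypothetical file name *)
text = Import["florence.txt", "Text"];

(* with no form specified, TextContents returns a Dataset of all recognized contents *)
all = TextContents[text]
```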
Interestingly, we can even ask for syntactic elements appearing in the text. Below, I ask for all prepositional phrases in the example text:
The result above shows the first 20 matches out of 28. It looks good! Because the system downloads some resources the first time these functions are evaluated, the first evaluation can take several seconds. I think there is an option to pre-download these resources locally so that later evaluations are faster.
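For comparison, here is a sketch of the same kind of query with `TextCases`, which returns a plain list of matching strings rather than a `Dataset`. The stand-in sentence is my own (swap in the imported article), and the form name `"PrepositionalPhrase"` is my assumption based on how TextCases names its grammatical forms; check the documentation for the exact spelling.

```mathematica
(* short stand-in for the article; replace with the imported text *)
text = "At night, Florence walked around the hospital with a lamp in her hand.";

(* every prepositional phrase as a string; the form name is assumed *)
phrases = TextCases[text, "PrepositionalPhrase"]
```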
Next, let us look for any text fragments that express positive sentiment:
Looking closely at the result, it is not clear to me why the 4th element, “At night, Florence walked around the hospital”, is considered positive. It seems neutral to me (walking around the hospital at night is routine for someone who works there), though I suppose this is subjective.
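One way to probe a borderline case like this is to ask the built-in `"Sentiment"` classifier for its class probabilities. Note that `Classify["Sentiment"]` is a separate, general-purpose classifier; I am not assuming it is the model TextContents uses internally, so treat it only as a second opinion.

```mathematica
sentence = "At night, Florence walked around the hospital";

(* association of sentiment class -> probability, instead of a single label *)
probs = Classify["Sentiment", sentence, "Probabilities"]
```

If the positive and neutral probabilities are close, that would support the impression that the sentence sits near the boundary.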
What about negative sentiment?
This looks OK to me.
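To see positive, negative and neutral sentences side by side, a rough alternative is to split the text into sentences and bucket each one with the same built-in `"Sentiment"` classifier. This is a swapped-in technique, not how TextContents works; the stand-in text is my own short example.

```mathematica
(* stand-in for the article; replace with the imported text *)
text = "Florence Nightingale was born in Florence, Italy, in 1820. The soldiers suffered terribly from disease. At night, Florence walked around the hospital.";

(* split into sentences, then group each sentence under its predicted class *)
bySentiment = GroupBy[TextSentences[text], Classify["Sentiment"]]
```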
We can specify multiple features in the same expression. Below, I am asking for all occurrences of “Date”, “Location” and “Number” appearing in the sample text.
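A sketch of a multi-feature query with `TextCases`, assuming (from my reading of its documentation, not verified here) that it accepts a list of forms and returns an association keyed by form. The stand-in sentences are my own.

```mathematica
(* stand-in for the article; replace with the imported text *)
text = "Florence Nightingale was born in Florence, Italy, on 12 May 1820. In 1854 she took 38 nurses to Scutari.";

(* one call, several forms: the result groups matches by form *)
found = TextCases[text, {"Date", "Location", "Number"}]
```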
If you look carefully, you will see that the system has wrongly mapped the person “Florence” to the city “Florence” in some places. It looks like more training is needed!
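To pinpoint which mentions were resolved to a city, one can ask `TextCases` for both the matched substring and its interpretation. A sketch, assuming the `"String"` and `"Interpretation"` properties of the `form -> props` syntax; the stand-in sentences are my own and deliberately use "Florence" as both a city and a person.

```mathematica
(* stand-in text mixing the city and the person *)
text = "Florence Nightingale was born in Florence, Italy. At night, Florence walked around the hospital.";

(* pair each matched substring with the Entity it was interpreted as *)
cities = TextCases[text, "City" -> {"String", "Interpretation"}]
```

Any pair whose string is the person's name but whose interpretation is `Entity["City", ...]` is an instance of the mix-up described above.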
We can use the “Containing[]” function to extract sentences that contain certain features, for example, pronouns.
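A minimal sketch of such a query: `Containing[outer, inner]` describes an `outer` unit that contains at least one `inner` unit, so the expression below asks for sentences containing a pronoun. The stand-in sentences are my own.

```mathematica
(* stand-in for the article; replace with the imported text *)
text = "Florence Nightingale was born in 1820. She trained as a nurse in Germany. Her work in Scutari made her famous.";

(* sentences that contain at least one pronoun *)
withPronouns = TextCases[text, Containing["Sentence", "Pronoun"]]
```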
We can also use the “Entity[]” function to look for specific entities as given here:
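A sketch of an entity-specific search. Rather than typing a canonical entity name by hand, it first resolves the name with `Interpreter["City"]`. Passing the resulting `Entity` directly as the `TextCases` form is my assumption about the syntax, based on the post's mention of `Entity[]`; the stand-in sentences are my own.

```mathematica
(* stand-in for the article; replace with the imported text *)
text = "Florence Nightingale was born in Florence, Italy. She later lived and worked in London.";

(* resolve the name to its canonical Entity instead of hand-typing it *)
london = Interpreter["City"]["London"];

(* mentions of that specific entity (Entity-as-form is assumed here) *)
mentions = TextCases[text, london]
```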
Quite interesting! Definitely a useful function in the area of NLP. One of the remarkable things about Mathematica is that it exposes functionality at the right level of abstraction, which makes it much easier to use.

One feature I was eagerly anticipating in this release is support for Text Summarization. I am a bit disappointed it is not there, but I expect it will be included in a future release, since summarization is a hot topic today.
You can download the sample text file as well as the Mathematica Notebook.
Have a great weekend!