I talked about detecting Emotion from text in the last two articles. Another popular text analysis service is Text Summarization.
There are two approaches for summarization:
- Extractive summarization
- Abstractive summarization
In the first approach, “Extractive Summarization”, the system extracts key sentences from the given text and puts them together to form a summary. The summary contains no words or phrases that are not already in the source text. This is the most widely used approach today. An algorithm called “TextRank” is the inspiration for many such implementations. You can read more about this approach here.
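To make the extractive idea concrete, here is a minimal sketch in the spirit of TextRank: sentences are nodes in a graph, edges are weighted by word overlap, and a PageRank-style iteration scores each sentence. This is only an illustration and not the implementation behind any of the services discussed in this article. The function names, the naive sentence splitter, the damping factor of 0.85 and the fixed number of iterations are all my assumptions.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared-word count, normalised by the log of each sentence's vocabulary size."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids log(1) == 0 in the denominator
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def summarize(text, n_sentences=5, damping=0.85, iterations=50):
    """Return the n_sentences highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= n_sentences:
        return sentences

    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank-style power iteration over the sentence graph.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(article_text, n_sentences=5)` returns a list of five sentences copied verbatim from the input, which is exactly why an extractive summary never contains new words.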
The second approach, “Abstractive Summarization”, attempts to “understand” what is being discussed in the given text in order to generate a summary that usually includes new words and phrases. Remember, we humans are good at summarizing a given text “in our own words”. This is a challenging research problem, and it is receiving a lot of attention at the moment.
Today, I am going to take a sample text, run it through three different summarization APIs, and look at the output each one generates. The three API services I am going to use are:
- MeaningCloud (MeaningCloud.com)
- Aylien (Aylien.com)
- Deep AI (Deepai.org)
I wrote an earlier article describing MeaningCloud’s Parsing API. This Madrid-based company offers a variety of text analysis APIs.
The input for summarization is an article about Florence Nightingale, taken from this page. I copied the text into Notepad (the Windows app) and removed a couple of non-ASCII characters so that the final text contains only ASCII characters (this is not a requirement for generating a summary, though).
MeaningCloud.com
Working with MeaningCloud was easy. I logged into my account and pasted the input text into the “Test Console” for the Text Summarization API. The nice thing is that there is an option to specify the number of sentences in the generated summary, and I chose the default (5 sentences). Here is the generated summary:
Nightingale was helped to understand statistics by her country’s leading expert on public statistics,. Cholera is caused by a bacterium spread by people drinking water contaminated by sewage. Florence Nightingale was born into an upper class British family in 1820 in Florence, Tuscany, Italy. In 1907, Florence Nightingale became the first woman to be awarded the Order of Merit by King Edward VII. In London and a Florence Nightingale museum, also in London.
Just to make it interesting, I ran the summarizer again, with the option to emit 10 sentences this time. Here is the result:
Florence Nightingale, OM (12 May 1820 – 13 August 1910), was an English nurse. Florence Nightingale was a wonderful woman. Nightingale was helped to understand statistics by her country’s leading expert on public statistics,. Cholera is caused by a bacterium spread by people drinking water contaminated by sewage. Florence Nightingale was born into an upper class British family in 1820 in Florence, Tuscany, Italy. In 1854 when the Crimean War began, Florence was working in Harley Street in London. Hospital in Istanbul where the injured soldiers were sent, Florence realized that soldiers died more often. In 1907, Florence Nightingale became the first woman to be awarded the Order of Merit by King Edward VII. Nightingale died in 1910 in London. In London and a Florence Nightingale museum, also in London.
Aylien.com
Just as I did with MeaningCloud, I logged into my Aylien account and submitted the article. In this case, I had to submit the URL of the article rather than paste the text. I chose the default summary size of 5 sentences, and here is the output:
Florence Nightingale, OM (12 May 1820 – 13 August 1910), was an English nurse.
In 1854 when the Crimean War began, Florence was working in Harley Street in London.
At the hospital in Istanbul where the injured soldiers were sent, Florence realized that soldiers died more often from diseases like cholera than from their injuries in war.
She used her knowledge of maths and statistics to show the British government that providing better conditions for sick and injured soldiers would help them win the war.
There are many statues of her in Britain, including one in Waterloo Place in London and a Florence Nightingale museum, also in London.
When I chose 10 sentences as the summary size, this is what I got:
Florence Nightingale, OM (12 May 1820 – 13 August 1910), was an English nurse.
She became a leader of the team of nurses who helped wounded soldiers during the Crimean War.
Florence Nightingale was a wonderful woman who fought the odds of not living a life expected by her family.
Florence Nightingale was born into an upper class British family in 1820 in Florence, Tuscany, Italy.
In 1854 when the Crimean War began, Florence was working in Harley Street in London.
At the hospital in Istanbul where the injured soldiers were sent, Florence realized that soldiers died more often from diseases like cholera than from their injuries in war.
She used her knowledge of maths and statistics to show the British government that providing better conditions for sick and injured soldiers would help them win the war.
Nightingale believed religion helped provide people with the fortitude for arduous good work.
In 1907, Florence Nightingale became the first woman to be awarded the Order of Merit by King Edward VII.
There are many statues of her in Britain, including one in Waterloo Place in London and a Florence Nightingale museum, also in London.
Deepai.org
In the case of Deep AI, there is no option to control the number of sentences in the generated summary. The documentation says that the summary is about 20 percent of the original text. I logged into my account and pasted the input text in the test console. Here is the generated summary:
Florence Nightingale, OM (12 May 1820 – 13 August 1910), was an English nurse.
helped wounded soldiers during the Crimean War.
Florence Nightingale was a wonderful woman
Nightingale was helped to understand statistics by her country’s leading expert on public statistics,
Unfortunately, both she and Farr believed the disease was caused by foul air: this was called the miasma theory.
Florence Nightingale was born into an upper class British family in 1820 in Florence, Tuscany, Italy.
In 1854 when the Crimean War began, Florence was working in Harley Street in London.
hospital in Istanbul where the injured soldiers were sent, Florence realized that soldiers died more often
show the British government that providing better conditions for sick and injured soldiers would help them
There is a syndrome named after her called “Florence Nightingale Syndrome”.
Nightingale believed religion helped provide people with
In 1907, Florence Nightingale became the first woman to be awarded the Order of Merit by King Edward VII.
Nightingale died in 1910 in London.
in London and a Florence Nightingale museum, also in London.
The summary has 14 sentences.
Observations
Let us discard the 5-sentence summaries from MeaningCloud and Aylien and consider only the 10-sentence versions. That way, the output from all three API services is of “similar” size.
You can detect substantial similarity in the summaries generated by MeaningCloud and Deep AI. In fact, except for the sentence “Cholera is caused by a bacterium spread by people drinking water contaminated by sewage.”, every sentence in the summary of MeaningCloud is also in the summary of Deep AI. Of course, Deep AI has a few extra sentences (total size is 14) as I pointed out earlier.
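This kind of comparison is easy to check mechanically. The sketch below uses only a four-sentence subset of each summary, copied from the output above, and ignores case and leading or trailing punctuation when matching. The helper names `normalize` and `unmatched` are mine, and the matching rule is an assumption about what should count as “the same sentence”.

```python
import string


def normalize(sentence):
    """Lowercase, collapse whitespace, and drop leading/trailing punctuation."""
    collapsed = " ".join(sentence.lower().split())
    return collapsed.strip(string.punctuation + " ")


def unmatched(summary_a, summary_b):
    """Return the sentences of summary_a that have no normalized match in summary_b."""
    seen = {normalize(s) for s in summary_b}
    return [s for s in summary_a if normalize(s) not in seen]


# A subset of the two summaries shown earlier (not the full output).
meaningcloud = [
    "Florence Nightingale was a wonderful woman.",
    "Nightingale was helped to understand statistics by her country’s leading expert on public statistics,.",
    "Cholera is caused by a bacterium spread by people drinking water contaminated by sewage.",
    "Nightingale died in 1910 in London.",
]
deep_ai = [
    "Florence Nightingale was a wonderful woman",
    "Nightingale was helped to understand statistics by her country’s leading expert on public statistics,",
    "Unfortunately, both she and Farr believed the disease was caused by foul air: this was called the miasma theory.",
    "Nightingale died in 1910 in London.",
]

# Only the cholera sentence is reported as missing from the Deep AI subset.
print(unmatched(meaningcloud, deep_ai))
```

Running the same function with the arguments swapped lists the sentences that appear only in the Deep AI summary.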
What is interesting is that some of the sentences in both the summaries are truncated versions of the original sentences. For example, the actual sentence
“Nightingale was helped to understand statistics by her country’s leading expert on public statistics, William Farr.”
has been truncated to:
“Nightingale was helped to understand statistics by her country’s leading expert on public statistics,.”
As another example, the original sentence
“She became a leader of the team of nurses who helped wounded soldiers during the Crimean War.”
has become:
“helped wounded soldiers during the Crimean War.”
It is not clear why this happens.
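While I cannot explain the truncation, it is at least easy to detect. The sketch below flags a summary sentence when, after ignoring case and surrounding punctuation, it is a strict fragment of one of the original sentences. The names `clean` and `find_truncations` and the substring test itself are my assumptions; the example sentences are the ones quoted above.

```python
import string


def clean(sentence):
    """Lowercase, collapse whitespace, and trim punctuation from both ends."""
    collapsed = " ".join(sentence.lower().split())
    return collapsed.strip(string.punctuation + " ")


def find_truncations(original_sentences, summary_sentences):
    """Return (summary_sentence, original_sentence) pairs where the former is a strict fragment."""
    pairs = []
    for summ in summary_sentences:
        core = clean(summ)
        for orig in original_sentences:
            full = clean(orig)
            if core and core != full and core in full:
                pairs.append((summ, orig))
                break  # one matching original is enough
    return pairs


original = [
    "Nightingale was helped to understand statistics by her country’s leading expert on public statistics, William Farr.",
    "She became a leader of the team of nurses who helped wounded soldiers during the Crimean War.",
    "Nightingale died in 1910 in London.",
]
summary = [
    "Nightingale was helped to understand statistics by her country’s leading expert on public statistics,.",
    "helped wounded soldiers during the Crimean War.",
    "Nightingale died in 1910 in London.",
]

for fragment, source in find_truncations(original, summary):
    print(f"TRUNCATED: {fragment!r}\n     FROM: {source!r}")
```

The first two summary sentences are flagged, while the third is left alone because it matches its original in full.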
In terms of the overall output, I prefer the summary produced by Aylien. It appears a bit more coherent than the other two.
What if we asked two people to produce an extractive summary manually? Would their output match that of Aylien? My guess is that there is likely to be variation even among human-generated summaries, because identifying the “important” sentences in a given piece of text is somewhat subjective. Moreover, even if two people chose the same set of sentences, they might arrange them in a slightly different order.
In the meantime, while we are eagerly waiting for good quality “abstractive summarization” implementations, we have to make do with “extractive summarization”.
Have a nice weekend!