Experimenting with Text Simplification

Written on January 19, 2020 in Natural Language Processing

After my last book review, I decided to check out a few websites that claim to simplify English text and/or measure its readability. In today’s post, I am sharing the results of my experiment.

www.simplish.org

This site has some interesting functionality. It offers spell checking, grammar checking, and text simplification, and can even provide a summary of the given text.

I gave the following text for simplification (taken from here):

Long sentences (full of description and unnecessary circumlocution), while capable of containing more information than a much shorter sentence, have a tendency to indulge in bad habits like the passive voice, and oftimes the reader, upon reaching the end of a monstrously long and verbose sentence, doesn’t remember what the sentence was supposed to be about.

Here is the simplified text from the system:

Long groups of words making sense (full of account and unnecessary long way round), while able of having in it more news given than a much shorter group of words making sense, have a tendency to give way to in bad dresses of religion like the actionless voice, and oftimes the reader, upon getting to the end of a cruelly long and full of words group of words making sense, doesn’t have in mind what the group of words making sense was took as probable to be about.

How is that? Forget simplification; the resulting text is not even grammatical!

I tried a simpler sentence next:

I deem it an honour to be bestowed this coveted title upon me.

This is what I got back in return:

I be of opinion (of) it a great respect to be given this desired sign of position upon me.

Not quite inspiring. 

I then asked the system to check the grammar of the above text. No problems reported.

My next example was about the correct use of the phrase “cope with”. I have seen people wrongly use the phrase as “cope up with”.

For example, the sentence “He could not cope with the job stress” is sometimes written as “He could not cope up with the job stress”.

When I asked the system to check the grammar of the text “He could not cope up with the job stress”, I got the same text back, suggesting that it was grammatically correct.

I definitely expected something better from the system.
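
For this particular error, even a single hand-written pattern rule would be enough to flag it. Here is a minimal sketch in Python, assuming a hypothetical rule table (the names RULES and check are mine). This is not how www.simplish.org works, and real grammar checkers rely on far richer analysis than one regular expression.

```python
import re

# Hypothetical rule table: (pattern, replacement). Only one rule is shown.
RULES = [
    # "cope up with", "coping up with", etc. -> drop the stray "up".
    (re.compile(r"\b(cop(?:e|es|ed|ing)) up with\b", re.IGNORECASE), r"\1 with"),
]

def check(text: str) -> str:
    """Apply every rule in turn and return the corrected text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(check("He could not cope up with the job stress"))
# prints: He could not cope with the job stress
```

A rule like this catches only the exact phrasing it was written for, which is one reason hand-built rule sets tend to miss errors.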

www.rewordify.com

I tried the same two examples as before.

Example-1:

Long sentences (full of description and unnecessary circumlocution), while capable of containing more information than a much shorter sentence, have a tendency to indulge in bad habits like the passive voice, and oftimes the reader, upon reaching the end of a monstrously long and verbose sentence, doesn’t remember what the sentence was supposed to be about.

Here is the output from the system:

Long sentences (full of description and unnecessary (the use of a lot of words, but not in a clear way)), while capable of containing more information than a much shorter sentence, have a habit/desire to do bad habits like the (when you write (for example) that a ball was thrown by a child, rather than a child threw a ball), and oftimes the reader, upon reaching the end of a terribly long and talkative (or rambling) sentence, doesn’t remember what the sentence was supposed to be about.

Example-2:

I deem it an honour to be bestowed this coveted title upon me.

Result:

I think of/consider it a honor to be given this jealously desired title upon me.

Not much different from www.simplish.org. I am not impressed.

www.readable.com

This site does not provide a text simplification service, but computes the “readability” of the given text. I had to sign up for the basic plan, which comes with a 7-day free trial.

I entered the following text used earlier:

Long sentences (full of description and unnecessary circumlocution), while capable of containing more information than a much shorter sentence, have a tendency to indulge in bad habits like the passive voice, and oftimes the reader, upon reaching the end of a monstrously long and verbose sentence, doesn’t remember what the sentence was supposed to be about.

I got the following metrics:

SMOG Index 22.9
Automated Readability Index 31.4
FORCAST Grade Level 11.4
Powers Sumner Kearl Grade 10.1
Rix Readability 13
Readable Rating E
Flesch Reading Ease 3.5
CEFR Level B2
IELTS Level 5-6
Spache Score 10.2
New Dale-Chall Score 7
Lensear Write 53.6

There are some more metrics, such as sentences with more than 30 syllables, words having more than 12 letters, and so on, but those are not important for our discussion here. The essential point is that this site provides various measures of the readability of the given text, which can be useful if we know how to write text that satisfies these metrics.

Here is an example of a highly readable text:

Yoga is good for the body and mind.

Here are the scores:

SMOG Index 3.1
Automated Readability Index -1.5
FORCAST Grade Level 8.8
Powers Sumner Kearl Grade 3.8
Rix Readability 1
Readable Rating A
Flesch Reading Ease 93.0
CEFR Level A2
IELTS Level 2-4
Spache Score 3.7
New Dale-Chall Score 2.4
Lensear Write 87.5

The above sentence gets a grade of “A”, whereas the earlier one got a grade of “E”.
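
The negative Automated Readability Index may look odd, but it follows from how the index is defined. Here is a minimal sketch in Python, assuming the standard published formula (4.71 × characters per word + 0.5 × words per sentence − 21.43) and a naive tokenizer of my own choosing. How www.readable.com actually implements it is not known and may differ.

```python
import re

def automated_readability_index(text: str) -> float:
    """Standard ARI formula; the tokenization below is a naive assumption."""
    # Sentences: split on '.', '!' or '?' and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Characters: letters and digits inside words (punctuation excluded).
    characters = sum(len(w) for w in words)
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)

print(round(automated_readability_index("Yoga is good for the body and mind."), 1))
# prints: -1.5
```

A short sentence made of short words keeps the first two terms below the constant 21.43, which is why the index goes negative.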

Here is an example of an ungrammatical, nonsense text:

A that it they he she.

Can you guess the scores I got?

SMOG Index 3.1
Automated Readability Index -5.9
FORCAST Grade Level 5.0
Powers Sumner Kearl Grade 3.6
Rix Readability 1
Readable Rating A
Flesch Reading Ease 116.1
CEFR Level A2
IELTS Level 2-4
Spache Score 1.4
New Dale-Chall Score 0.3
Lensear Write 150

Isn’t it weird that this also gets a “Grade A”? Clearly, we have to use these metrics with care and not depend on them entirely.
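
The nonsense sentence scores so well because formulas such as Flesch Reading Ease only count words, sentences, and syllables. They never look at grammar or meaning. Here is a minimal sketch in Python, assuming the standard published formula and a naive vowel-group syllable counter (the counter is my assumption; the site's actual implementation is not known).

```python
import re

def count_syllables(word: str) -> int:
    """Naive heuristic (an assumption): one syllable per group of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent when the word has another vowel group.
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))

for sample in ("Yoga is good for the body and mind.", "A that it they he she."):
    print(f"{flesch_reading_ease(sample):.1f}  {sample}")
```

With this counter, the two sentences come out at 93.0 and 116.1, the same values reported above. The meaningless sentence scores higher simply because it is shorter and every word has one syllable.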

Well, I decided to stop my experimentation at this point. I guess there are some more sites that help assess readability and/or simplify a given text.

As a general remark on readability, technical articles with domain-dependent terms and definitions might be difficult for readers from other disciplines. For example, someone from the software domain might find an article on biology difficult to understand (and vice versa).

What about syntactically correct, but semantically absurd sentences: “The milk drank the cat”?

Or what about ambiguous sentences: “I saw a man on the hill with a telescope”?

Coming up with a measure of readability based on so many factors is definitely hard, but even harder is taking a sentence and simplifying it. No wonder, then, that text simplification as a research topic is enormously challenging!

Have a great week ahead!
