Title: Automatic Text Simplification
Author: Horacio Saggion
Publisher: Morgan & Claypool Publishers
Year: 2017
Automatic Text Simplification has been an active area of NLP research for over 20 years. The idea is to transform a given text T1 into a text T2 such that T2 is easier to read and understand than T1, while conveying the same information as T1. This is different from Text Paraphrasing, which does not guarantee that the target text is easier to understand than the original. Both are also different from Text Summarization, which aims to shorten a text by keeping only its most important content.
The first two chapters lay the foundation for the later ones. After introducing the idea of text simplification, the author touches upon the notion of text readability and discusses a few classic readability formulas, including the Flesch Reading Ease score and the SMOG readability score. He then outlines more recent approaches, including language models for readability assessment, as well as treating readability assessment as a classification problem.
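To make the readability formulas concrete, here is a minimal Python sketch of the Flesch Reading Ease score, using the standard published formula 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The function names, the regex-based sentence splitter, and the syllable counter are my own assumptions for illustration and are not taken from the book; the syllable counter in particular is a crude vowel-group heuristic, so scores will only be approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not a linguistic algorithm):
    count runs of vowels and discount a likely silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    # Naive sentence split on terminal punctuation; abbreviations will confuse it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    easy = "The cat sat on the mat."
    hard = ("Comprehensive readability evaluation necessitates "
            "sophisticated computational methodologies.")
    print(round(flesch_reading_ease(easy), 3))
    print(flesch_reading_ease(easy) > flesch_reading_ease(hard))
```

Short sentences made of short words score high, while long polysyllabic sentences score low or even negative, which is the intuition behind most of the classic formulas.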
The third chapter focuses on lexical simplification. The idea here is to replace difficult words in the text with easier-to-understand words or phrases that mean the same thing. One of the early approaches used WordNet together with word frequency counts to find an appropriate replacement for a difficult word. Later approaches made use of corpora such as the Simple English Wikipedia and the regular English Wikipedia, combined with machine learning techniques, to arrive at suitable lexical substitution rules. An interesting category of simplification involves handling numerical expressions. There is also a brief discussion of LexSiS, a lexical simplification system for Spanish.
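The frequency-based substitution idea can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `SYNONYMS` and `FREQ` tables below are tiny hand-made stand-ins (hypothetical data, not from the book) for what a real system would obtain from WordNet and a large corpus, and the sketch ignores word sense and context entirely.

```python
import re

# Hypothetical synonym sets; a real system might look these up in WordNet.
SYNONYMS = {
    "utilize": ["use", "employ"],
    "purchase": ["buy", "acquire"],
}

# Hypothetical corpus frequency counts; higher is assumed to mean "simpler".
FREQ = {
    "use": 900, "employ": 120, "utilize": 40,
    "buy": 700, "purchase": 150, "acquire": 90,
}


def simplify_word(word: str) -> str:
    """Return the most frequent candidate among the word and its synonyms."""
    lower = word.lower()
    candidates = [lower] + SYNONYMS.get(lower, [])
    best = max(candidates, key=lambda w: FREQ.get(w, 0))
    if best == lower:
        return word  # nothing simpler found; keep the original spelling
    return best.capitalize() if word[0].isupper() else best


def simplify(text: str) -> str:
    """Replace each alphabetic token with its simplest known synonym."""
    return re.sub(r"[A-Za-z]+", lambda m: simplify_word(m.group(0)), text)


if __name__ == "__main__":
    print(simplify("We utilize tools to purchase goods."))
```

Picking the most frequent synonym without checking the context can easily change the meaning of a sentence, which is one reason the later corpus-based and machine-learned approaches were developed.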
The next chapter is on syntactic simplification, a much more challenging task. Whereas lexical simplification ignores the grammatical structure of the sentences in the text, syntactic simplification attempts to replace complicated syntactic constructions (relative clauses, subordination, etc.) with simpler ones. One system discussed in this chapter uses a Java-based pattern matching engine, working on the dependency parse structure, to identify sentence fragments that require transformation. A variant of this approach, also rule-based, uses event extraction to determine the key elements of a sentence and then applies a generation step to transform the resulting structure.
Chapter 5 talks about applying machine learning techniques to discover text simplification rules from corpora of original and simplified text. One approach treats simplification as monolingual machine translation, using a well-established statistical framework. Another interesting approach applies a sequence of pre-defined operations, namely splitting, dropping, reordering, and substitution, to the input parse tree to produce the simplified text. When multiple such sequences of transformations exist, the system tries to find the best one. While both of the above are purely syntax-based systems, other researchers have added semantic constraints to improve the quality of simplification.
Chapter 6 briefly discusses three text simplification systems: PSET (English), Simplext (Spanish) and PorSimples (Brazilian Portuguese).
What are the applications of automatic text simplification? Chapter 7 addresses this important question. One major use case is assisting people with special needs, such as those with dyslexia or Autism Spectrum Disorder (ASD). The other is as an NLP facilitator, where simplification is performed prior to other steps such as parsing, information extraction, or summarization.
For those interested in building text simplification systems, Chapter 8 provides a fairly comprehensive list of resources, including datasets and tools.
One thing I like very much about this book is its extensive bibliography, running to 23 pages! This will be immensely useful to those who would like to pursue further research in this area. Since the book provides a good overview of the field of automatic text simplification, I strongly recommend it to everyone interested in this area.
Have a wonderful New Year 2020!