Sentences in English can be classified into the following common types:
– Simple sentence (“I am drinking coffee”)
– Compound sentence (“He came home with his school friend and they had an enjoyable evening”)
– Complex sentence (“Whenever my dog barks, I give him some biscuit”)
– Imperative sentence (“Please keep quiet”)
– Interrogative sentence (“Where did I park my car?”)
As the example sentences show, each of the above types has a distinct grammatical structure. The question now is: given a sentence, how easy is it to identify its type? I guess it is not too hard for humans (note that I am not saying it is easy!). My interest, however, is in identifying the type programmatically.
Before I get into the details, I would like to refer to an article I wrote over a year ago, “Automatically Converting Active Voice to Passive Voice and Vice Versa”. Both that article and the idea I am discussing today use a traditional parsing technique: defining a grammar and parsing with a DCG (Definite Clause Grammar). I am not using any Machine Learning algorithms in my implementation. I like this approach because I can learn so much in the process. Helping me here is “iLexicon”, a fairly comprehensive lexicon I have built over the years. It provides the parser with part-of-speech information as well as the other constraints required for detecting valid English sentences.
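To give a feel for the grammar-plus-lexicon approach, here is a minimal sketch in Python rather than Prolog. It is not the actual implementation: the tiny `LEXICON`, the three grammar rules and all function names are assumptions made up for illustration. Each rule is a generator that yields every position at which it can finish, which roughly imitates the backtracking a DCG gets for free in Prolog.

```python
# Toy lexicon (hypothetical, far smaller than a real one):
# each word maps to the set of parts of speech it may take.
LEXICON = {
    "i": {"pronoun"},
    "the": {"determiner"},
    "my": {"determiner"},
    "dog": {"noun"},
    "coffee": {"noun"},
    "barks": {"verb", "noun"},
    "drink": {"verb", "noun"},
}

def word(pos):
    """Rule that consumes one token if the lexicon allows part of speech `pos`."""
    def rule(tokens, i):
        if i < len(tokens) and pos in LEXICON.get(tokens[i], set()):
            yield i + 1
    return rule

def seq(*rules):
    """Apply rules one after another, like a DCG body `a, b, c`."""
    def rule(tokens, i):
        positions = [i]
        for r in rules:
            positions = [j for p in positions for j in r(tokens, p)]
        yield from positions
    return rule

def alt(*rules):
    """Try each alternative, like several DCG clauses for one nonterminal."""
    def rule(tokens, i):
        for r in rules:
            yield from r(tokens, i)
    return rule

# A deliberately tiny grammar for simple declarative sentences.
noun_phrase = alt(word("pronoun"), seq(word("determiner"), word("noun")), word("noun"))
verb_phrase = alt(seq(word("verb"), noun_phrase), word("verb"))
sentence = seq(noun_phrase, verb_phrase)

def recognises(rule, text):
    """True if `rule` can consume every token of `text`."""
    tokens = text.lower().split()
    return any(end == len(tokens) for end in rule(tokens, 0))
```

With this sketch, `recognises(sentence, "The dog barks")` succeeds while `recognises(sentence, "dog the barks")` fails, and the same `noun_phrase` rule can be used on its own to accept a fragment such as "my dog".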
The top-level logic is contained in the following predicates:
The predicate “sentence_type(Sentence)” takes a sentence as input and prints the type of the sentence. To make the idea even more interesting, I am allowing the possibility of detecting the type of sentence fragments as well. This is handled by the predicate “sentence_part(SentencePart)”. This will be clear when you see some examples later.
The finer details of mapping a valid parsed syntactic structure into the corresponding type are handled by a few auxiliary predicates:
The extreme case of a “sentence fragment” being just a single word is also taken into account. In this case, the predicate will print its possible parts of speech.
From the auxiliary predicates listed above, you can see which sentence types, sentence fragments and parts of speech are supported.
Let us take a look at some actual outputs. The first set shows the categorization of complete sentences.
Although both “compound” and “complex” sentences contain more than one verb, a compound sentence may be thought of as two independent clauses joined together, whereas a complex sentence has a main clause and a dependent clause. The program has correctly identified the types. What is missing above is the “Interrogative” type. The following shows that the program is able to detect different kinds of interrogative sentences:
Note that not all interrogative sentences are of the “WH” type (see my earlier article on WH-Questions). An interrogative sentence can also start with words such as “Can”, “Are”, “May”, “Would”, etc. I am not showing the other cases, for want of space.
The next set of examples pertains to sentence fragments: phrases and clauses.
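The surface cues mentioned so far (a WH word or an auxiliary at the start of a question, a conjunction joining or subordinating clauses) can be illustrated with a rough keyword heuristic in Python. This is only a sketch and not how the program works, since the program parses the sentence with a grammar. The word lists and the function name `guess_sentence_type` are assumptions invented for this example.

```python
# Hypothetical cue lists, deliberately incomplete.
WH_WORDS = {"who", "whom", "whose", "what", "which", "when", "where", "why", "how"}
AUXILIARIES = {"can", "could", "may", "might", "shall", "should", "will", "would",
               "must", "do", "does", "did", "am", "is", "are", "was", "were",
               "have", "has", "had"}
SUBORDINATORS = {"whenever", "when", "because", "although", "if", "while",
                 "since", "unless"}
COORDINATORS = {"and", "but", "or", "so", "yet"}
IMPERATIVE_OPENERS = {"please", "keep", "stop", "open", "close"}

def guess_sentence_type(sentence):
    """Guess the sentence type from surface cues only (no parsing)."""
    text = sentence.strip()
    # Lower-case each word and strip surrounding punctuation.
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return "unknown"
    first = words[0]
    # Questions: WH word or auxiliary up front, question mark at the end.
    if text.endswith("?") and (first in WH_WORDS or first in AUXILIARIES):
        return "interrogative"
    if first in IMPERATIVE_OPENERS:
        return "imperative"
    # A subordinating conjunction suggests a dependent clause.
    if any(w in SUBORDINATORS for w in words):
        return "complex"
    # A coordinating conjunction after the first word suggests two clauses.
    if any(w in COORDINATORS for w in words[1:]):
        return "compound"
    return "simple"
```

On the five example sentences at the top of this article the heuristic gives the expected labels, but it is easy to fool: “I like tea and coffee” would be called compound because it cannot tell whether “and” joins two clauses or two nouns. That limitation is exactly why a real grammar is needed.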
What about single words? The program can handle that as well:
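In essence, the single-word case reduces to a lexicon lookup that may return more than one answer. A minimal Python sketch, using a made-up four-word lexicon (`TOY_LEXICON` and `parts_of_speech` are hypothetical names, and this is not the program's actual output):

```python
# Made-up mini lexicon: each word maps to every part of speech it can take.
TOY_LEXICON = {
    "park": ["noun", "verb"],
    "quiet": ["adjective", "noun", "verb"],
    "whenever": ["conjunction", "adverb"],
    "coffee": ["noun"],
}

def parts_of_speech(word):
    """Return the possible parts of speech of a single word (empty list if unknown)."""
    return TOY_LEXICON.get(word.strip().lower(), [])
```

A word such as “park” comes back with both “noun” and “verb”, which is why a single word can only be given its possible parts of speech rather than one definite answer.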
The program is fairly good as it is, but I know I still have to account for more complex sentence structures, so this remains an ongoing project. The project uses SICStus Prolog 4.6.0 and runs on Windows 10 (64-bit).
I hope to share more such experiments in the near future.
Have a nice weekend and a great week ahead!