{"id":4265,"date":"2026-03-22T09:35:05","date_gmt":"2026-03-22T04:05:05","guid":{"rendered":"https:\/\/www.rangakrish.com\/?p=4265"},"modified":"2026-03-22T09:35:05","modified_gmt":"2026-03-22T04:05:05","slug":"counting-sentences-an-implementation-in-c20","status":"publish","type":"post","link":"https:\/\/www.rangakrish.com\/index.php\/2026\/03\/22\/counting-sentences-an-implementation-in-c20\/","title":{"rendered":"Counting Sentences: An Implementation in C++20"},"content":{"rendered":"<p>Counting the number of sentences in a given paragraph appears rather simple on the surface &#8211; look for the common punctuation marks: <em><strong>\u201c.?!\u201d<\/strong><\/em>. Only when you dig deeper do you realize that it is not that simple. For example, consider this text: <em><strong>\u201cPeter met Dr.James at 3 p.m.\u201d<\/strong><\/em> How many sentences does this have? Not three, just one! Sentence counting is hard because the most common delimiter, the period, plays multiple roles. It appears in abbreviations, decimal numbers, email addresses, URLs, initials, and ellipses, so understanding the context is essential.<\/p>\n<p>Before I forget, I must also point out that a sentence might not have any terminator at all. Consider this: <em><strong>\u201cThe dog ran after the cat. The cat climbed on to the wall\u201d<\/strong><\/em>. 
Here the second sentence does not have any terminator, but we know that it has ended because the text has ended.<\/p>\n<p>What are some approaches we can follow for sentence boundary detection?<\/p>\n<p style=\"padding-left: 40px;\">1) <em><strong>Rule-based systems:<\/strong><\/em> We hand-craft a set of rules that classify each period as a boundary or non-boundary and chain these rules into a processing pipeline.<\/p>\n<p style=\"padding-left: 40px;\">2) <em><strong>Machine-Learning classifiers:<\/strong><\/em> We can train an unsupervised model that learns which tokens are abbreviations based on statistical cues. This can give good accuracy but can fail in noisy contexts.<\/p>\n<p style=\"padding-left: 40px;\">3) <em><strong>Neural models:<\/strong><\/em> These can yield near-human accuracy, but might require heavy computational resources for training.<\/p>\n<p>I decided to give this problem to <em><strong>Claude<\/strong><\/em>, specifically the <em><strong>\u201cOpus 4.6 Extended\u201d<\/strong><\/em> model. I asked it to generate <em><strong>C++20<\/strong> <\/em>code. The good news is that it did so quite fast and even included test cases for validation!<\/p>\n<p>Here are the cases it has handled:<\/p>\n<table width=\"624.0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td valign=\"top\"><b>#<\/b><\/td>\n<td valign=\"top\"><b>Rule<\/b><\/td>\n<td valign=\"top\"><b>Description<\/b><\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">1<\/td>\n<td valign=\"top\">Sentence terminators<\/td>\n<td valign=\"top\">Sentences end with\u00a0 .\u00a0 !\u00a0 or\u00a0 ?<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">2<\/td>\n<td valign=\"top\">Ellipsis<\/td>\n<td valign=\"top\">\u2018&#8230;\u2019 and \u2018\u2026\u2019 do NOT terminate a sentence<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">3<\/td>\n<td valign=\"top\">Abbreviations<\/td>\n<td valign=\"top\">Mr., Dr., U.S., p.m., etc. 
do NOT end a sentence<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">4<\/td>\n<td valign=\"top\">Decimal numbers<\/td>\n<td valign=\"top\">3.14, $1,200.50 \u2014 dots inside numbers are ignored<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">5<\/td>\n<td valign=\"top\">Initials<\/td>\n<td valign=\"top\">J. K. Rowling \u2014 single-letter dots are not boundaries<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">6<\/td>\n<td valign=\"top\">Quoted endings<\/td>\n<td valign=\"top\">He said, &#8220;Go!&#8221; \u2014 terminators inside quotes still count<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">7<\/td>\n<td valign=\"top\">Multiple terminators<\/td>\n<td valign=\"top\">?!\u00a0 or\u00a0 !!!\u00a0 collapse to a single sentence end<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">8<\/td>\n<td valign=\"top\">URLs and emails<\/td>\n<td valign=\"top\">Dots in http:\/\/x.com or a@b.com are ignored<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Here are the regular expressions it has generated and used for the common situations:<\/p>\n<figure id=\"attachment_4267\" aria-describedby=\"caption-attachment-4267\" style=\"width: 600px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/regex.png?ssl=1\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" data-attachment-id=\"4267\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2026\/03\/22\/counting-sentences-an-implementation-in-c20\/regex\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/regex.png\" data-orig-size=\"1190,362\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Regular Expressions\" data-image-description=\"&lt;p&gt;Regular Expressions&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Regular Expressions&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/regex-1024x312.png\" class=\"wp-image-4267\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/regex.png?resize=600%2C183&#038;ssl=1\" alt=\"Regular Expressions\" width=\"600\" height=\"183\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/regex.png?resize=300%2C91&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/regex.png?resize=1024%2C312&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/regex.png?w=1190&amp;ssl=1 1190w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><figcaption id=\"caption-attachment-4267\" class=\"wp-caption-text\"><strong>Regular Expressions<\/strong><\/figcaption><\/figure>\n<p>The generated code was compiled thus:<\/p>\n<div>\n<div style=\"padding-left: 40px;\"><span style=\"color: #0000ff;\">g++ -std=c++20 -O2 -o sentence_counter sentence_counter.cpp<\/span><\/div>\n<\/div>\n<p>Here is the program output:<\/p>\n<figure id=\"attachment_4268\" aria-describedby=\"caption-attachment-4268\" style=\"width: 600px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/output.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"4268\" 
data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2026\/03\/22\/counting-sentences-an-implementation-in-c20\/output-15\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/output.png\" data-orig-size=\"1184,1078\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Program Output\" data-image-description=\"&lt;p&gt;Program Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Program Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/output-1024x932.png\" class=\"wp-image-4268\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/output.png?resize=600%2C546&#038;ssl=1\" alt=\"Program Output\" width=\"600\" height=\"546\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/output.png?resize=300%2C273&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/output.png?resize=1024%2C932&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2026\/03\/output.png?w=1184&amp;ssl=1 1184w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><figcaption id=\"caption-attachment-4268\" class=\"wp-caption-text\"><strong>Program Output<\/strong><\/figcaption><\/figure>\n<p>Does the code cover all possible cases? It is reasonable to say it covers most of the edge cases. Nevertheless it is quite interesting that <em><strong>Claude<\/strong><\/em> was able to generate decent quality code, and that too without any errors. 
Personally, between <em><strong>ChatGPT<\/strong><\/em> and <em><strong>Claude<\/strong><\/em>, I prefer the latter. Of course, we need to carefully review the generated code (and logic) before integrating it into any project.<\/p>\n<p>You can download the code <a href=\"https:\/\/www.rangakrish.com\/downloads\/sentence_counter.cpp\">here<\/a>.<\/p>\n<p>Have a wonderful weekend.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Counting the number of sentences in a given paragraph appears rather simple on the surface &#8211; look for the common punctuation marks: \u201c.?!\u201d. Only when you dig deeper, you will know that it is really not that simple. For example, consider this text: \u201cPeter met Dr.James at 3 p.m.\u201d How many sentences does this have? [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[49,17],"tags":[371,454,455],"class_list":["post-4265","post","type-post","status-publish","format-standard","hentry","category-c","category-programming","tag-c20","tag-claude","tag-sentence-counting"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9OLnF-16N","jetpack-related-posts":[{"id":2467,"url":"https:\/\/www.rangakrish.com\/index.php\/2021\
/07\/04\/identifying-sentence-types-automatically\/","url_meta":{"origin":4265,"position":0},"title":"Identifying Sentence Types Automatically","author":"admin","date":"July 4, 2021","format":false,"excerpt":"Sentences in English can be classified into the following common types: - Simple sentence (\"I am drinking coffee\") - Compound sentence (\"He came home with his school friend and they had an enjoyable evening\") - Complex sentence (\"Whenever my dog barks, I give him some biscuit\") - Imperative sentence (\"Please\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Top-level Predicates","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2021\/07\/code-300x233.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":2483,"url":"https:\/\/www.rangakrish.com\/index.php\/2021\/07\/18\/sentence-negation\/","url_meta":{"origin":4265,"position":1},"title":"Sentence Negation","author":"admin","date":"July 18, 2021","format":false,"excerpt":"In the last article, I talked about determining sentence types automatically. Another interesting task is to generate the \"negation\" of a given sentence. 
Example-1: Sentence => \"My teacher lives nearby\" Negation => \"My teacher does not live nearby\" Example-2: Sentence => \"She did not like that speech\" Negation => \"She\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Parse Tree","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2021\/07\/parsetree-300x24.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2021\/07\/parsetree-300x24.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2021\/07\/parsetree-300x24.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":1870,"url":"https:\/\/www.rangakrish.com\/index.php\/2020\/01\/19\/experimenting-with-text-simplification\/","url_meta":{"origin":4265,"position":2},"title":"Experimenting with Text Simplification","author":"admin","date":"January 19, 2020","format":false,"excerpt":"After my last book review, I decided to check out a few websites that claim to simplify English text and\/or help compute the measure of readability. In today\u2019s post, I am sharing the results of my experiment. www.simplish.org This site has some interesting functionality. 
It does spelling check, grammar check,\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":912,"url":"https:\/\/www.rangakrish.com\/index.php\/2018\/04\/22\/question-answering-using-dependency-trees\/","url_meta":{"origin":4265,"position":3},"title":"Question Answering\u00a0Using Dependency Trees","author":"admin","date":"April 22, 2018","format":false,"excerpt":"A few weeks ago I had written about my brief experiment with Mathematica's new feature, which provides answers to questions based on given text. After that post, I spent some time thinking about how to implement something similar. In today's post, I want to show you what I have been\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"Dependency Tree","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/04\/Deptree-example.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":534,"url":"https:\/\/www.rangakrish.com\/index.php\/2017\/05\/22\/definite-clause-grammars-dcg-in-lisp\/","url_meta":{"origin":4265,"position":4},"title":"Definite Clause Grammars (DCG) in Lisp","author":"admin","date":"May 22, 2017","format":false,"excerpt":"Definite Clause Grammars (DCG) are an elegant formalism for specifying context free grammars, and part of their popularity is due to their support in the Prolog language. Most books on Natural Language processing usually include a brief coverage of DCGs, even though Natural languages are not context-free. 
Because of the\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"DCG Grammar","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2017\/05\/DCG-Grammar.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2017\/05\/DCG-Grammar.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2017\/05\/DCG-Grammar.png?resize=525%2C300 1.5x"},"classes":[]},{"id":1444,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/10\/coreference-resolution-in-stanford-corenlp\/","url_meta":{"origin":4265,"position":5},"title":"Coreference Resolution in Stanford CoreNLP","author":"admin","date":"February 10, 2019","format":false,"excerpt":"In the last article, I showed how we can use the neuralcoref\u00a0library along with spaCy\u00a0to do coreference resolution (examples involved anaphoric references). In today's article, I want to try the same (well, almost) examples in Stanford CoreNLP engine and see how they compare. 
Since CoreNLP is a Java implementation, I\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Comparison Table","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Table.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Table.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Table.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Table.jpg?resize=700%2C400&ssl=1 2x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/4265","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/comments?post=4265"}],"version-history":[{"count":7,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/4265\/revisions"}],"predecessor-version":[{"id":4274,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/4265\/revisions\/4274"}],"wp:attachment":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/media?parent=4265"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/categories?post=4265"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/tags?post=4265"}],"curies":[{"name":"wp","href":"https:\/\/api.w.or
g\/{rel}","templated":true}]}}