{"id":1386,"date":"2019-01-13T14:12:49","date_gmt":"2019-01-13T08:42:49","guid":{"rendered":"https:\/\/www.rangakrish.com\/?p=1386"},"modified":"2019-01-13T14:12:49","modified_gmt":"2019-01-13T08:42:49","slug":"chunking-in-opennlp","status":"publish","type":"post","link":"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/","title":{"rendered":"Chunking in OpenNLP"},"content":{"rendered":"<p>In my previous <a href=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/\" target=\"_blank\" rel=\"noopener\"><em><strong>post<\/strong><\/em><\/a>, I showed how to parse sentences using <a href=\"https:\/\/opennlp.apache.org\" target=\"_blank\" rel=\"noopener\"><em><strong>OpenNLP<\/strong><\/em><\/a>. Another useful feature supported by OpenNLP is <em><strong>\u201cchunking\u201d<\/strong><\/em>. That is the subject of today\u2019s article.<\/p>\n<p>Chunking stands between part-of-speech tagging and a full parse in terms of the information it captures. POS tagging assigns a part of speech to each individual token in a sentence. So, in the sentence <em><strong>\u201cPeter likes sweets\u201d<\/strong><\/em>, the POS tags are:<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">Peter =&gt; NNP<\/span><\/p>\n<p><span style=\"color: #0000ff;\">likes =&gt; VBZ<\/span><\/p>\n<p><span style=\"color: #0000ff;\">sweets =&gt; NNS<\/span><\/p><\/blockquote>\n<p>The tagging is based on the <a href=\"https:\/\/www.ling.upenn.edu\/courses\/Fall_2003\/ling001\/penn_treebank_pos.html\" target=\"_blank\" rel=\"noopener\"><em><strong>Penn Treebank scheme<\/strong><\/em><\/a>.<\/p>\n<p>The constituency parser operates at the other extreme. 
It tries to assign a structure to the complete sentence by recursively assigning a structure to its constituent parts. We saw this in the <a href=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/\" target=\"_blank\" rel=\"noopener\"><em><strong>last article<\/strong><\/em><\/a>.<\/p>\n<p>A full parse is significantly more expensive than POS tagging alone, because the parser has to build a tree over the whole sentence instead of labelling each token. Sometimes we are interested only in the smaller structures contained in the larger parse tree, for example, <em><strong>Verb Phrase<\/strong><\/em>, <em><strong>Adjective Phrase<\/strong><\/em>, <em><strong>Noun Phrase<\/strong><\/em>, and so on. Identifying these phrases, each of which usually (not always) spans more than one token in the given text, is called <em><strong>\u201cchunking\u201d<\/strong><\/em>. A classic application is <em><strong>NER<\/strong><\/em> (Named Entity Recognition), where we are interested in specific <em><strong>Noun Phrases<\/strong><\/em>.<\/p>\n<p>OK. Let us see how to use the chunker in <em><strong>OpenNLP<\/strong><\/em>. 
I have written a simple class called <em><strong>\u201cOpenNLPChunkerExample\u201d<\/strong><\/em> to illustrate the essential features (you can download the source from <a href=\"http:\/\/www.rangakrish.com\/downloads\/OpenNLPChunkerExample.java\" target=\"_blank\" rel=\"noopener\"><em><strong>here<\/strong><\/em><\/a>).<\/p>\n<p>The code fragment below gets the chunked tags and prints them along with the corresponding word.<\/p>\n<figure id=\"attachment_1389\" aria-describedby=\"caption-attachment-1389\" style=\"width: 533px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-2.png?ssl=1\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" data-attachment-id=\"1389\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/example1-12\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-2.png\" data-orig-size=\"533,177\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Printing Chunked Tags\" data-image-description=\"&lt;p&gt;Printing Chunked Tags&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Printing Chunked Tags&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-2.png\" class=\"size-full wp-image-1389\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-2.png?resize=533%2C177&#038;ssl=1\" alt=\"Printing Chunked Tags\" width=\"533\" height=\"177\" 
srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-2.png?w=533&amp;ssl=1 533w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-2.png?resize=300%2C100&amp;ssl=1 300w\" sizes=\"(max-width: 533px) 100vw, 533px\" \/><\/a><figcaption id=\"caption-attachment-1389\" class=\"wp-caption-text\"><strong>Printing Chunked Tags<\/strong><\/figcaption><\/figure>\n<p>The output from the program is:<\/p>\n<figure id=\"attachment_1390\" aria-describedby=\"caption-attachment-1390\" style=\"width: 120px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1-1.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1390\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/output1-2\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1-1.png\" data-orig-size=\"120,141\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1-1.png\" class=\"size-full wp-image-1390\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1-1.png?resize=120%2C141&#038;ssl=1\" alt=\"Output\" width=\"120\" height=\"141\" \/><\/a><figcaption id=\"caption-attachment-1390\" 
class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>The tagging produced by the chunker follows the <em><strong>\u201cIOB\u201d<\/strong><\/em> tagging <a href=\"https:\/\/en.wikipedia.org\/wiki\/Inside\u2013outside\u2013beginning_(tagging)\" target=\"_blank\" rel=\"noopener\"><em><strong>scheme<\/strong><\/em><\/a>. Here,<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">B = Beginning of a chunk<\/span><\/p>\n<p><span style=\"color: #0000ff;\">I = Inside a chunk<\/span><\/p>\n<p><span style=\"color: #0000ff;\">O = Outside any chunk<\/span><\/p><\/blockquote>\n<p>The B and I tags carry the chunk type as a suffix (for example, B-NP and I-NP). From the output above, we can see that the words <em><strong>\u201cThe pretty cat\u201d<\/strong><\/em> form a single <em><strong>NP<\/strong><\/em> chunk, the word <em><strong>\u201cchased\u201d<\/strong><\/em> forms a <em><strong>VP<\/strong><\/em> chunk all by itself, and the words <em><strong>\u201cthe ugly rat\u201d<\/strong><\/em> constitute another <em><strong>NP<\/strong><\/em> chunk. The final <em><strong>\u201c.\u201d<\/strong><\/em>\u00a0is not part of any chunk.<\/p>\n<p>To improve readability, we can write a convenience function that groups the words belonging to each chunk. 
Here is the code:<\/p>\n<figure id=\"attachment_1391\" aria-describedby=\"caption-attachment-1391\" style=\"width: 553px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/GroupChunks.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1391\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/groupchunks\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/GroupChunks.png\" data-orig-size=\"553,417\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Chunked Phrases\" data-image-description=\"&lt;p&gt;Chunked Phrases&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Chunked Phrases&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/GroupChunks.png\" class=\"size-full wp-image-1391\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/GroupChunks.png?resize=553%2C417&#038;ssl=1\" alt=\"Chunked Phrases\" width=\"553\" height=\"417\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/GroupChunks.png?w=553&amp;ssl=1 553w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/GroupChunks.png?resize=300%2C226&amp;ssl=1 300w\" sizes=\"(max-width: 553px) 100vw, 553px\" \/><\/a><figcaption id=\"caption-attachment-1391\" class=\"wp-caption-text\"><strong>Function to Group Words in Chunk<\/strong><\/figcaption><\/figure>\n<p>The function returns a <em><strong>Span[].<\/strong><\/em> The updated 
<em><strong>\u201cmain\u201d<\/strong><\/em> that uses this function and prints the chunks is:<\/p>\n<figure id=\"attachment_1392\" aria-describedby=\"caption-attachment-1392\" style=\"width: 546px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2-1.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1392\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/example2-8\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2-1.png\" data-orig-size=\"546,312\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Printing Grouped Words\" data-image-description=\"&lt;p&gt;Printing Grouped Words&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Printing Grouped Words&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2-1.png\" class=\"size-full wp-image-1392\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2-1.png?resize=546%2C312&#038;ssl=1\" alt=\"Printing Grouped Words\" width=\"546\" height=\"312\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2-1.png?w=546&amp;ssl=1 546w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2-1.png?resize=300%2C171&amp;ssl=1 300w\" sizes=\"(max-width: 546px) 100vw, 546px\" \/><\/a><figcaption id=\"caption-attachment-1392\" class=\"wp-caption-text\"><strong>Printing Grouped 
Chunks<\/strong><\/figcaption><\/figure>\n<p>The corresponding output is:<\/p>\n<figure id=\"attachment_1393\" aria-describedby=\"caption-attachment-1393\" style=\"width: 165px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2-1.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1393\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/output2-2\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2-1.png\" data-orig-size=\"165,199\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Grouped Chunks\" data-image-description=\"&lt;p&gt;Grouped Chunks&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Grouped Chunks&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2-1.png\" class=\"size-full wp-image-1393\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2-1.png?resize=165%2C199&#038;ssl=1\" alt=\"Grouped Chunks\" width=\"165\" height=\"199\" \/><\/a><figcaption id=\"caption-attachment-1393\" class=\"wp-caption-text\"><strong>Grouped Chunks<\/strong><\/figcaption><\/figure>\n<p>We can even get the probability associated with each chunked tag. 
Here is the final version that prints this information:<\/p>\n<figure id=\"attachment_1394\" aria-describedby=\"caption-attachment-1394\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3-1.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1394\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/example3-7\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3-1.png\" data-orig-size=\"640,346\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Printing Chunked Tags with Probability\" data-image-description=\"&lt;p&gt;Printing Chunked Tags with Probability&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Printing Chunked Tags with Probability&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3-1.png\" class=\"size-full wp-image-1394\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3-1.png?resize=640%2C346&#038;ssl=1\" alt=\"Printing Chunked Tags with Probability\" width=\"640\" height=\"346\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3-1.png?w=640&amp;ssl=1 640w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3-1.png?resize=300%2C162&amp;ssl=1 300w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"caption-attachment-1394\" class=\"wp-caption-text\"><strong>Printing 
Chunked Tags with Probability<\/strong><\/figcaption><\/figure>\n<p>Here is the corresponding output:<\/p>\n<figure id=\"attachment_1395\" aria-describedby=\"caption-attachment-1395\" style=\"width: 265px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3-1.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1395\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/output3-2\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3-1.png\" data-orig-size=\"265,193\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Tags with Probability\" data-image-description=\"&lt;p&gt;Tags with Probability&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Tags with Probability&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3-1.png\" class=\"size-full wp-image-1395\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3-1.png?resize=265%2C193&#038;ssl=1\" alt=\"Tags with Probability\" width=\"265\" height=\"193\" \/><\/a><figcaption id=\"caption-attachment-1395\" class=\"wp-caption-text\"><strong>Tags with Probability<\/strong><\/figcaption><\/figure>\n<p>Before concluding, let us print the chunks for another sentence: <em><strong>\u201cIt is very beautiful.\u201d<\/strong><\/em><\/p>\n<figure id=\"attachment_1396\" aria-describedby=\"caption-attachment-1396\" style=\"width: 294px\" class=\"wp-caption 
aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output4.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1396\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/output4\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output4.png\" data-orig-size=\"294,140\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Another Example\" data-image-description=\"&lt;p&gt;Another Example&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Another Example&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output4.png\" class=\"size-full wp-image-1396\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output4.png?resize=294%2C140&#038;ssl=1\" alt=\"Another Example\" width=\"294\" height=\"140\" \/><\/a><figcaption id=\"caption-attachment-1396\" class=\"wp-caption-text\"><strong>Another Example<\/strong><\/figcaption><\/figure>\n<p>You can see that we now have an <em><strong>Adjective Phrase<\/strong><\/em> (<em><strong>ADJP<\/strong><\/em>):\u00a0<em><strong>\u201cvery beautiful\u201d<\/strong><\/em>.<\/p>\n<p>Python\u2019s <a href=\"http:\/\/www.nltk.org\" target=\"_blank\" rel=\"noopener\"><em><strong>NLTK<\/strong><\/em><\/a>, another popular NLP toolkit, also supports chunking. What I like about NLTK is that it allows us to define a <em><strong>\u201cchunking grammar\u201d<\/strong><\/em> to customize our chunking logic. 
This can prove useful in some cases. Take a look at NLTK when you get time.<\/p>\n<p>You can download my Java program from <a href=\"http:\/\/www.rangakrish.com\/downloads\/OpenNLPChunkerExample.java\" target=\"_blank\" rel=\"noopener\"><em><strong>here<\/strong><\/em><\/a>.<\/p>\n<p>Have a nice weekend and a great week ahead!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my previous post, I showed how to parse sentences using OpenNLP. Another useful feature supported by OpenNLP is &#8220;chunking\u201d. That is the subject of today\u2019s article. Chunking stands between part-of-speech tagging and full parse in terms of the information it captures. POS tagging assigns part of speech to individual tokens in a sentence. So, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[107,17],"tags":[181,69,174,182,179],"class_list":["post-1386","post","type-post","status-publish","format-standard","hentry","category-natural-language-processing","category-programming","tag-chunking","tag-java","tag-natural-language-processing","tag-nltk","tag-opennlp"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9OLnF-mm","jetpack-related-posts":[{"id":1368,"url":"https:
\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/","url_meta":{"origin":1386,"position":0},"title":"Parsing Text with Apache OpenNLP","author":"admin","date":"January 8, 2019","format":false,"excerpt":"In my earlier posts I have written about parsing text using spaCy\u00a0and MeaningCloud's parsing API. For today's article, I decided to take a look at OpenNLP, an open-source ML-based Java toolkit for parsing natural language text. OpenNLP is a fairly mature library and has been around since 2004 (source: Wikipedia).\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Parse Tree","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree3.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1399,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/20\/named-entity-recognition-ner-with-opennlp\/","url_meta":{"origin":1386,"position":1},"title":"Named Entity Recognition (NER) with OpenNLP","author":"admin","date":"January 20, 2019","format":false,"excerpt":"In the earlier two articles, we looked at Sentence Parsing\u00a0and Chunking\u00a0as supported in OpenNLP. In today's article, let us explore Named Entity Recognition, also known as NER. 
NER is a technique to identify special categories of noun phrases such as people, places, companies, money, etc., present in the given text.\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Wrapper Classes","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Wrappers.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1792,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/11\/23\/using-augmented-transition-networks-atn-for-information-extraction\/","url_meta":{"origin":1386,"position":2},"title":"Using Augmented Transition Networks (ATN) for Information Extraction","author":"admin","date":"November 23, 2019","format":false,"excerpt":"After Wood\u2019s paper [1], Augmented Transition Networks\u00a0(ATN) became popular in the 1970s, for parsing text. An ATN is a generalized transition network with two major enhancements: Support for recursive transitions, including jumping to other ATNs Performing arbitrary actions when edges are traversed Remembering state through the use of registers See\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"ATN for Modality","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/11\/modality.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/11\/modality.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/11\/modality.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":912,"url":"https:\/\/www.rangakrish.com\/index.php\/2018\/04\/22\/question-answering-using-dependency-trees\/","url_meta":{"origin":1386,"position":3},"title":"Question 
Answering\u00a0Using Dependency Trees","author":"admin","date":"April 22, 2018","format":false,"excerpt":"A few weeks ago I had written about my brief experiment with Mathematica's new feature, which provides answers to questions based on given text. After that post, I spent some time thinking about how to implement something similar. In today's post, I want to show you what I have been\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"Dependency Tree","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/04\/Deptree-example.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1817,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/12\/08\/using-definite-clause-grammars-dcg-for-information-extraction\/","url_meta":{"origin":1386,"position":4},"title":"Using Definite Clause Grammars (DCG) for Information Extraction","author":"admin","date":"December 8, 2019","format":false,"excerpt":"In the previous article, I showed how we can use ATNs for extracting key information from natural language text. I also pointed out in that article that Definite Clause Grammars (DCG) are a more compact formalism for doing this. That will be the focus of today's article. 
For a nice\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Processing the Text","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/12\/Processing-file-code.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/12\/Processing-file-code.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/12\/Processing-file-code.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":2483,"url":"https:\/\/www.rangakrish.com\/index.php\/2021\/07\/18\/sentence-negation\/","url_meta":{"origin":1386,"position":5},"title":"Sentence Negation","author":"admin","date":"July 18, 2021","format":false,"excerpt":"In the last article, I talked about determining sentence types automatically. Another interesting task is to generate the \"negation\" of a given sentence. 
Example-1: Sentence => \"My teacher lives nearby\" Negation => \"My teacher does not live nearby\" Example-2: Sentence => \"She did not like that speech\" Negation => \"She\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Parse Tree","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2021\/07\/parsetree-300x24.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2021\/07\/parsetree-300x24.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2021\/07\/parsetree-300x24.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/1386","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/comments?post=1386"}],"version-history":[{"count":0,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/1386\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/media?parent=1386"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/categories?post=1386"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/tags?post=1386"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}