{"id":1368,"date":"2019-01-08T09:19:10","date_gmt":"2019-01-08T03:49:10","guid":{"rendered":"https:\/\/www.rangakrish.com\/?p=1368"},"modified":"2019-01-08T10:51:48","modified_gmt":"2019-01-08T05:21:48","slug":"parsing-text-with-apache-opennlp","status":"publish","type":"post","link":"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/","title":{"rendered":"Parsing Text with Apache OpenNLP"},"content":{"rendered":"<p>In my earlier posts I have written about parsing text using <a href=\"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/16\/dependency-graph-to-rdf\/\" target=\"_blank\" rel=\"noopener\"><em><strong>spaCy<\/strong><\/em><\/a>\u00a0and <a href=\"https:\/\/www.rangakrish.com\/index.php\/2018\/12\/09\/parsing-text-with-meaningclouds-text-analytics-api\/\" target=\"_blank\" rel=\"noopener\"><em><strong>MeaningCloud&#8217;s parsing API<\/strong><\/em><\/a>. For today&#8217;s article, I decided to take a look at <a href=\"https:\/\/opennlp.apache.org\" target=\"_blank\" rel=\"noopener\"><em><strong>OpenNLP<\/strong><\/em><\/a>, an open-source ML-based Java toolkit for parsing natural language text.<\/p>\n<p><em><strong>OpenNLP<\/strong><\/em> is a fairly mature library and has been around since 2004 (source: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Apache_OpenNLP\" target=\"_blank\" rel=\"noopener\"><em><strong>Wikipedia<\/strong><\/em><\/a>). It is actively maintained and developed, the current version being 1.9.1. It supports all the standard tasks expected of such a toolkit, namely, <em><strong>language detection, document categorization, lemmatization, tokenization, part-of-speech tagging, chunking, parsing, named-entity recognition, <\/strong><\/em>and<em><strong> coreference resolution<\/strong><\/em>.<\/p>\n<p>In this article, my focus is on the parser alone. 
I will try to write about the other components in future articles.<\/p>\n<p>The parser is a <em><strong>constituency parser<\/strong><\/em> and not a dependency parser. It is thus similar to <em><strong>MeaningCloud<\/strong><\/em> but different from <em><strong>spaCy<\/strong><\/em>.<\/p>\n<p>Getting started is quite simple. I downloaded the binaries from <a href=\"https:\/\/opennlp.apache.org\/download.html\" target=\"_blank\" rel=\"noopener\"><em><strong>here.<\/strong><\/em><\/a><\/p>\n<p>I also downloaded two <em><strong>models<\/strong><\/em>: <em><strong>en-parser-chunking.bin<\/strong><\/em> and <em><strong>en-sent.bin<\/strong><\/em> from <a href=\"http:\/\/opennlp.sourceforge.net\/models-1.5\/\" target=\"_blank\" rel=\"noopener\">here.<\/a><\/p>\n<p>The former is for the parser (chunking parser) and the latter is for sentence detection.<\/p>\n<p>The parser handles one sentence at a time, so we need the sentence detector whenever the text to be parsed is made up of multiple sentences.<\/p>\n<p>I then configured my <em><strong>Libraries<\/strong><\/em> setting in <a href=\"https:\/\/www.jetbrains.com\/idea\/\" target=\"_blank\" rel=\"noopener\"><em><strong>IntelliJ IDEA<\/strong><\/em><\/a>\u00a0to point to the downloaded <em><strong>lib<\/strong><\/em> directory. That&#8217;s pretty much it.<\/p>\n<p>One interesting aspect of the parser is that you can specify how many parses it should return. 
Since the parser is ML-based, it can typically arrive at multiple <em><strong>potential<\/strong><\/em> parse trees for the same sentence, each tree being associated with a probability number. This number is a log probability, so it is zero or negative, and values closer to zero indicate more likely parses (the official documentation defines it as the <em><strong>&#8220;<\/strong><strong>log of the product of the probability associated with all the decisions which formed this constituent&#8221;<\/strong><\/em>).<\/p>\n<p>I wrote a simple wrapper class for testing the parser functionality. You can download the source from <a href=\"http:\/\/www.rangakrish.com\/downloads\/OpenNLPTester.java\" target=\"_blank\" rel=\"noopener\"><em><strong>here<\/strong><\/em><\/a>.<\/p>\n<p>To start with, let us print the parse tree for the sentence <em><strong>\u201cJohn loves Mary.\u201d<\/strong><\/em>, asking for just one parse tree.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_1371\" aria-describedby=\"caption-attachment-1371\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-1.png?ssl=1\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" data-attachment-id=\"1371\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/example1-11\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-1.png\" data-orig-size=\"666,325\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Single Parse Tree\" data-image-description=\"&lt;p&gt;Single 
Parse Tree&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Single Parse Tree&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-1.png\" class=\"wp-image-1371\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-1.png?resize=650%2C317&#038;ssl=1\" alt=\"Single Parse Tree\" width=\"650\" height=\"317\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-1.png?w=666&amp;ssl=1 666w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example1-1.png?resize=300%2C146&amp;ssl=1 300w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><\/a><figcaption id=\"caption-attachment-1371\" class=\"wp-caption-text\"><strong>Single Parse Tree<\/strong><\/figcaption><\/figure>\n<p>Here is the output:<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_1372\" aria-describedby=\"caption-attachment-1372\" style=\"width: 624px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1372\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/output1\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1.png\" data-orig-size=\"624,20\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" 
data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1.png\" class=\"size-full wp-image-1372\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1.png?resize=624%2C20&#038;ssl=1\" alt=\"Output\" width=\"624\" height=\"20\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1.png?w=624&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output1.png?resize=300%2C10&amp;ssl=1 300w\" sizes=\"(max-width: 624px) 100vw, 624px\" \/><\/a><figcaption id=\"caption-attachment-1372\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p><span style=\"font-size: 16px;\">When rendered as a tree, it looks like this (I used this <a href=\"https:\/\/bagnalla.github.io\/sexp-trees\/\" target=\"_blank\" rel=\"noopener\"><em><strong>site<\/strong><\/em><\/a>\u00a0<\/span><span style=\"font-size: 16px;\">to render the s-expression):<\/span><\/p>\n<figure id=\"attachment_1373\" aria-describedby=\"caption-attachment-1373\" style=\"width: 396px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree1.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1373\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/tree1\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree1.png\" data-orig-size=\"396,310\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" 
data-image-title=\"Parse Tree\" data-image-description=\"&lt;p&gt;Parse Tree&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Parse Tree&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree1.png\" class=\"size-full wp-image-1373\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree1.png?resize=396%2C310&#038;ssl=1\" alt=\"Parse Tree\" width=\"396\" height=\"310\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree1.png?w=396&amp;ssl=1 396w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree1.png?resize=300%2C235&amp;ssl=1 300w\" sizes=\"(max-width: 396px) 100vw, 396px\" \/><\/a><figcaption id=\"caption-attachment-1373\" class=\"wp-caption-text\"><strong>Parse Tree<\/strong><\/figcaption><\/figure>\n<p>This looks fine. Note the probability number. I assume it indicates that what we have got is the most probable parse.<\/p>\n<p>Next, let us ask for 10 possible parses for the same sentence:<\/p>\n<figure id=\"attachment_1374\" aria-describedby=\"caption-attachment-1374\" style=\"width: 651px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1374\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/example2-7\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2.png\" data-orig-size=\"680,326\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Asking for 10 Parses\" data-image-description=\"&lt;p&gt;Asking for 10 Parses&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Asking for 10 Parses&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2.png\" class=\"wp-image-1374\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2.png?resize=651%2C312&#038;ssl=1\" alt=\"Asking for 10 Parses\" width=\"651\" height=\"312\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2.png?w=680&amp;ssl=1 680w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example2.png?resize=300%2C144&amp;ssl=1 300w\" sizes=\"(max-width: 651px) 100vw, 651px\" \/><\/a><figcaption id=\"caption-attachment-1374\" class=\"wp-caption-text\"><strong>Asking for 10 Parses<\/strong><\/figcaption><\/figure>\n<p>This is the output:<\/p>\n<figure id=\"attachment_1375\" aria-describedby=\"caption-attachment-1375\" style=\"width: 657px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1375\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/output2\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2.png\" data-orig-size=\"657,174\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2.png\" class=\"size-full wp-image-1375\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2.png?resize=657%2C174&#038;ssl=1\" alt=\"Output\" width=\"657\" height=\"174\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2.png?w=657&amp;ssl=1 657w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output2.png?resize=300%2C79&amp;ssl=1 300w\" sizes=\"(max-width: 657px) 100vw, 657px\" \/><\/a><figcaption id=\"caption-attachment-1375\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>I have printed the parse trees in <em><strong>descending<\/strong><\/em> order of probability, so the top one is the best parse in this batch. Notice that the top parse is not the same as the one we got earlier (when we asked for just one parse tree) and has a lower probability number than the earlier one. 
It is not clear why this is the case.<\/p>\n<p>In order not to clutter the screen, I am showing below the visual tree representation for the top two parses:<\/p>\n<figure id=\"attachment_1377\" aria-describedby=\"caption-attachment-1377\" style=\"width: 402px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2a.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1377\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/tree2a\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2a.png\" data-orig-size=\"402,332\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Tree1\" data-image-description=\"&lt;p&gt;Tree1&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Tree1&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2a.png\" class=\"size-full wp-image-1377\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2a.png?resize=402%2C332&#038;ssl=1\" alt=\"Tree1\" width=\"402\" height=\"332\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2a.png?w=402&amp;ssl=1 402w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2a.png?resize=300%2C248&amp;ssl=1 300w\" sizes=\"(max-width: 402px) 100vw, 402px\" \/><\/a><figcaption id=\"caption-attachment-1377\" class=\"wp-caption-text\"><strong>First Tree<\/strong><\/figcaption><\/figure>\n<figure 
id=\"attachment_1378\" aria-describedby=\"caption-attachment-1378\" style=\"width: 379px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2b.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1378\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/tree2b\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2b.png\" data-orig-size=\"379,257\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Tree2\" data-image-description=\"&lt;p&gt;The Second Tree&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;The Second Tree&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2b.png\" class=\"size-full wp-image-1378\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2b.png?resize=379%2C257&#038;ssl=1\" alt=\"The Second Tree\" width=\"379\" height=\"257\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2b.png?w=379&amp;ssl=1 379w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree2b.png?resize=300%2C203&amp;ssl=1 300w\" sizes=\"(max-width: 379px) 100vw, 379px\" \/><\/a><figcaption id=\"caption-attachment-1378\" class=\"wp-caption-text\"><strong>The Second Tree<\/strong><\/figcaption><\/figure>\n<p>They do not look correct.<\/p>\n<p>Just to understand what is going on, let us ask for 25 parses this time:<\/p>\n<figure id=\"attachment_1379\" 
aria-describedby=\"caption-attachment-1379\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1379\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/example3-6\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3.png\" data-orig-size=\"671,324\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Asking for 25 Parses\" data-image-description=\"&lt;p&gt;Asking for 25 Parses&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Asking for 25 Parses&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3.png\" class=\"wp-image-1379\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3.png?resize=650%2C314&#038;ssl=1\" alt=\"Asking for 25 Parses\" width=\"650\" height=\"314\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3.png?w=671&amp;ssl=1 671w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3.png?resize=300%2C145&amp;ssl=1 300w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><\/a><figcaption id=\"caption-attachment-1379\" class=\"wp-caption-text\"><strong>Asking for 25 Parses<\/strong><\/figcaption><\/figure>\n<p>Here is what we get this time:<\/p>\n<figure id=\"attachment_1380\" aria-describedby=\"caption-attachment-1380\" style=\"width: 
671px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1380\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/output3\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3.png\" data-orig-size=\"671,364\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output (25 Parses)\" data-image-description=\"&lt;p&gt;Output (25 Parses)&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output (25 Parses)&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3.png\" class=\"size-full wp-image-1380\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3.png?resize=671%2C364&#038;ssl=1\" alt=\"Output (25 Parses)\" width=\"671\" height=\"364\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3.png?w=671&amp;ssl=1 671w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Output3.png?resize=300%2C163&amp;ssl=1 300w\" sizes=\"(max-width: 671px) 100vw, 671px\" \/><\/a><figcaption id=\"caption-attachment-1380\" class=\"wp-caption-text\"><strong>Output (25 Parses)<\/strong><\/figcaption><\/figure>\n<p>This time, it is interesting that the top parse tree corresponds to the one we got when we asked for the single parse case. 
The probability number also matches.<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p>As a final confirmation, take a look at the visual representation of the top parse:<\/p>\n<figure id=\"attachment_1381\" aria-describedby=\"caption-attachment-1381\" style=\"width: 392px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree3.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1381\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/tree3\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree3.png\" data-orig-size=\"392,309\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Tree3\" data-image-description=\"&lt;p&gt;Parse Tree&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Parse Tree&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree3.png\" class=\"size-full wp-image-1381\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree3.png?resize=392%2C309&#038;ssl=1\" alt=\"Parse Tree\" width=\"392\" height=\"309\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree3.png?w=392&amp;ssl=1 392w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree3.png?resize=300%2C236&amp;ssl=1 300w\" sizes=\"(max-width: 392px) 100vw, 392px\" \/><\/a><figcaption id=\"caption-attachment-1381\" class=\"wp-caption-text\"><strong>Parse 
Tree<\/strong><\/figcaption><\/figure>\n<p>The trees match, right? It appears to me that we are better off asking for just one parse tree unless we are doing a study that compares the various outputs.<\/p>\n<p>The result of parsing a sentence is a <em><strong>Parse<\/strong><\/em> array containing as many parses as were requested. The <em><strong>Parse<\/strong><\/em> class has several methods to access and manipulate the components of the parse structure, such as the child elements. See the <a href=\"https:\/\/opennlp.apache.org\/docs\/1.9.1\/apidocs\/opennlp-tools\/index.html\" target=\"_blank\" rel=\"noopener\"><em><strong>documentation<\/strong><\/em><\/a> for more details.<\/p>\n<p>You can play around with my example <a href=\"http:\/\/www.rangakrish.com\/downloads\/OpenNLPTester.java\" target=\"_blank\" rel=\"noopener\"><em><strong>code<\/strong><\/em><\/a>\u00a0to get a feel for the parser functionality.<\/p>\n<p>Have a great day and see you soon!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my earlier posts I have written about parsing text using spaCy\u00a0and MeaningCloud&#8217;s parsing API. For today&#8217;s article, I decided to take a look at OpenNLP, an open-source ML-based Java toolkit for parsing natural language text. OpenNLP is a fairly mature library and has been around since 2004 (source: Wikipedia). 
It is actively maintained and [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[107,17],"tags":[180,174,179],"class_list":["post-1368","post","type-post","status-publish","format-standard","hentry","category-natural-language-processing","category-programming","tag-constituency-parsing","tag-natural-language-processing","tag-opennlp"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9OLnF-m4","jetpack-related-posts":[{"id":1399,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/20\/named-entity-recognition-ner-with-opennlp\/","url_meta":{"origin":1368,"position":0},"title":"Named Entity Recognition (NER) with OpenNLP","author":"admin","date":"January 20, 2019","format":false,"excerpt":"In the earlier two articles, we looked at Sentence Parsing\u00a0and Chunking\u00a0as supported in OpenNLP. In today's article, let us explore Named Entity Recognition, also known as NER. 
NER is a technique to identify special categories of noun phrases such as people, places, companies, money, etc., present in the given text.\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Wrapper Classes","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Wrappers.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1386,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/13\/chunking-in-opennlp\/","url_meta":{"origin":1368,"position":1},"title":"Chunking in OpenNLP","author":"admin","date":"January 13, 2019","format":false,"excerpt":"In my previous post, I showed how to parse sentences using OpenNLP. Another useful feature supported by OpenNLP is \"chunking\u201d. That is the subject of today\u2019s article. Chunking stands between part-of-speech tagging and full parse in terms of the information it captures. 
POS tagging assigns part of speech to individual\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Printing Chunked Tags with Probability","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3-1.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3-1.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Example3-1.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":1792,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/11\/23\/using-augmented-transition-networks-atn-for-information-extraction\/","url_meta":{"origin":1368,"position":2},"title":"Using Augmented Transition Networks (ATN) for Information Extraction","author":"admin","date":"November 23, 2019","format":false,"excerpt":"After Wood\u2019s paper [1], Augmented Transition Networks\u00a0(ATN) became popular in the 1970s, for parsing text. 
An ATN is a generalized transition network with two major enhancements: Support for recursive transitions, including jumping to other ATNs Performing arbitrary actions when edges are traversed Remembering state through the use of registers See\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"ATN for Modality","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/11\/modality.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/11\/modality.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/11\/modality.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":1285,"url":"https:\/\/www.rangakrish.com\/index.php\/2018\/12\/09\/parsing-text-with-meaningclouds-text-analytics-api\/","url_meta":{"origin":1368,"position":3},"title":"Parsing Text with MeaningCloud&#8217;s Text Analytics API","author":"admin","date":"December 9, 2018","format":false,"excerpt":"There is wide-spread interest in Natural Language Processing (NLP) today, and there are several API services available to cater to this demand. See this article for a fairly detailed list of services. All of them support multiple languages, including English. 
Today, I am going to share my experience in working\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"Get Words Function","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/12\/Get-words.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/12\/Get-words.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/12\/Get-words.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":1817,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/12\/08\/using-definite-clause-grammars-dcg-for-information-extraction\/","url_meta":{"origin":1368,"position":4},"title":"Using Definite Clause Grammars (DCG) for Information Extraction","author":"admin","date":"December 8, 2019","format":false,"excerpt":"In the previous article, I showed how we can use ATNs for extracting key information from natural language text. I also pointed out in that article that Definite Clause Grammars (DCG) are a more compact formalism for doing this. That will be the focus of today's article. 
For a nice\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Processing the Text","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/12\/Processing-file-code.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/12\/Processing-file-code.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/12\/Processing-file-code.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":2366,"url":"https:\/\/www.rangakrish.com\/index.php\/2021\/03\/28\/implementing-ilexicon-using-litedb\/","url_meta":{"origin":1368,"position":5},"title":"Implementing iLexicon using LiteDB","author":"admin","date":"March 28, 2021","format":false,"excerpt":"iLexicon is an \"intelligent\" dictionary that can be used to build Natural Language applications. I have two implementations, one in Lisp and another in Prolog. Both implementations are memory-based, in order to speed up performance. I have written several articles referencing it, for example see this. 
\u00a0 LiteDB is a\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Sample Commands","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2021\/03\/Session1.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/1368","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/comments?post=1368"}],"version-history":[{"count":0,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/1368\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/media?parent=1368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/categories?post=1368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/tags?post=1368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}