{"id":1427,"date":"2019-02-03T16:00:20","date_gmt":"2019-02-03T10:30:20","guid":{"rendered":"https:\/\/www.rangakrish.com\/?p=1427"},"modified":"2019-02-03T16:03:20","modified_gmt":"2019-02-03T10:33:20","slug":"coreference-resolution-using-spacy","status":"publish","type":"post","link":"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/","title":{"rendered":"Coreference Resolution Using spaCy"},"content":{"rendered":"<p>According to the <a href=\"https:\/\/nlp.stanford.edu\/projects\/coref.shtml\" target=\"_blank\" rel=\"noopener\">Stanford NLP Group<\/a>, <em><strong>&#8220;Coreference resolution is the task of finding all expressions that refer to the same entity in a text&#8221;<\/strong><\/em>. You can also read this <a href=\"https:\/\/en.wikipedia.org\/wiki\/Coreference\" target=\"_blank\" rel=\"noopener\">Wikipedia page<\/a>.<\/p>\n<p>For example, in the sentence <em><strong>&#8220;Tom dropped the glass jar by accident and broke it&#8221;<\/strong><\/em>, what does <em><strong>&#8220;it&#8221;<\/strong><\/em> refer to? I am sure you will immediately say that <em><strong>&#8220;it&#8221;<\/strong><\/em> refers to <em><strong>&#8220;the glass jar&#8221;<\/strong><\/em>. This is a simple example of coreference resolution.<\/p>\n<p>Resolution can be much trickier in other cases, but humans usually have no difficulty in resolving coreferences. In today&#8217;s article, I want to take a look at the <em><strong>&#8220;<a href=\"https:\/\/github.com\/huggingface\/neuralcoref\" target=\"_blank\" rel=\"noopener\">neuralcoref<\/a>&#8221;<\/strong><\/em>\u00a0Python library that is integrated into <a href=\"https:\/\/spacy.io\" target=\"_blank\" rel=\"noopener\"><em><strong>spaCy&#8217;s<\/strong><\/em><\/a> NLP pipeline and hence seamlessly extends <em><strong>spaCy<\/strong><\/em>. 
You may also want to read this <a href=\"https:\/\/medium.com\/huggingface\/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30\" target=\"_blank\" rel=\"noopener\"><em><strong>article<\/strong><\/em><\/a>.<\/p>\n<p>Installing the library is simple. Just follow the instructions given <a href=\"https:\/\/github.com\/huggingface\/neuralcoref\" target=\"_blank\" rel=\"noopener\"><em><strong>here<\/strong><\/em><\/a>. I have heard that it does not work well with some versions of <em><strong>spaCy<\/strong><\/em>, but my version of <em><strong>spaCy<\/strong><\/em> (version 2.0.13) had no compatibility issues.<\/p>\n<p>We need to import <em><strong>spaCy<\/strong><\/em> and load the relevant coreference model. I chose the <em><strong>&#8220;large&#8221;<\/strong><\/em> model just to be safe.<\/p>\n<figure id=\"attachment_1428\" aria-describedby=\"caption-attachment-1428\" style=\"width: 277px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Loading-Model.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1428\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/loading-model\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Loading-Model.jpg\" data-orig-size=\"277,51\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549180375&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Loading the Coreference Model\" data-image-description=\"&lt;p&gt;Loading the Coreference Model&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Loading the 
Coreference Model&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Loading-Model.jpg\" class=\"size-full wp-image-1428\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Loading-Model.jpg?resize=277%2C51&#038;ssl=1\" alt=\"Loading the Coreference Model\" width=\"277\" height=\"51\" \/><\/a><figcaption id=\"caption-attachment-1428\" class=\"wp-caption-text\"><strong>Loading the Coreference Model<\/strong><\/figcaption><\/figure>\n<p>I wrote two simple functions, one to print all the coreference <em><strong>&#8220;mentions&#8221;<\/strong><\/em> in the document, and another to print the resolved pronoun references.<\/p>\n<figure id=\"attachment_1429\" aria-describedby=\"caption-attachment-1429\" style=\"width: 499px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Functions.jpg?ssl=1\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" data-attachment-id=\"1429\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/functions\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Functions.jpg\" data-orig-size=\"499,228\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549180442&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Functions\" data-image-description=\"&lt;p&gt;Functions&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Functions&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Functions.jpg\" class=\"size-full 
wp-image-1429\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Functions.jpg?resize=499%2C228&#038;ssl=1\" alt=\"Functions\" width=\"499\" height=\"228\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Functions.jpg?w=499&amp;ssl=1 499w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Functions.jpg?resize=300%2C137&amp;ssl=1 300w\" sizes=\"(max-width: 499px) 100vw, 499px\" \/><\/a><figcaption id=\"caption-attachment-1429\" class=\"wp-caption-text\"><em><strong>Functions<\/strong><\/em><\/figcaption><\/figure>\n<p>The full source code is available <a href=\"http:\/\/www.rangakrish.com\/downloads\/CorefExample.py\" target=\"_blank\" rel=\"noopener\"><em><strong>here<\/strong><\/em><\/a>.<\/p>\n<p>Let us start with a simple example:<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">&#8220;<b>My sister has a dog and she loves him.&#8221;<\/b><\/span><\/p><\/blockquote>\n<p>Let us see how the library resolves the references:<\/p>\n<figure id=\"attachment_1430\" aria-describedby=\"caption-attachment-1430\" style=\"width: 441px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output1.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1430\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/output1-4\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output1.jpg\" data-orig-size=\"441,169\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549122637&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output1\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output1.jpg\" class=\"size-full wp-image-1430\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output1.jpg?resize=441%2C169&#038;ssl=1\" alt=\"Output\" width=\"441\" height=\"169\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output1.jpg?w=441&amp;ssl=1 441w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output1.jpg?resize=300%2C115&amp;ssl=1 300w\" sizes=\"(max-width: 441px) 100vw, 441px\" \/><\/a><figcaption id=\"caption-attachment-1430\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>That is good. The pronouns are mapped correctly, as we expect.<\/p>\n<p>Let us extend the above example with another sentence:<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">&#8220;<b>My sister has a dog and she loves him. He is cute.&#8221;<\/b><\/span><\/p><\/blockquote>\n<p>No major challenge here. 
Let us see what the library does:<\/p>\n<figure id=\"attachment_1431\" aria-describedby=\"caption-attachment-1431\" style=\"width: 511px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output2.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1431\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/output2-4\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output2.jpg\" data-orig-size=\"511,181\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549123103&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output2.jpg\" class=\"size-full wp-image-1431\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output2.jpg?resize=511%2C181&#038;ssl=1\" alt=\"Output\" width=\"511\" height=\"181\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output2.jpg?w=511&amp;ssl=1 511w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output2.jpg?resize=300%2C106&amp;ssl=1 300w\" sizes=\"(max-width: 511px) 100vw, 511px\" \/><\/a><figcaption id=\"caption-attachment-1431\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>Great. 
The references are correct.<\/p>\n<p>In the above examples, we implicitly assumed that the dog is male. What if we use the pronoun <em><strong>&#8220;her&#8221;<\/strong><\/em> instead of <em><strong>&#8220;him&#8221;<\/strong><\/em>, implying the dog is a female?<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">&#8220;<b>My sister has a dog and she loves her.&#8221;<\/b><\/span><\/p><\/blockquote>\n<p>This is the output we get in this case:<\/p>\n<figure id=\"attachment_1433\" aria-describedby=\"caption-attachment-1433\" style=\"width: 449px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output3.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1433\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/output3-4\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output3.jpg\" data-orig-size=\"449,154\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549123217&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output3.jpg\" class=\"size-full wp-image-1433\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output3.jpg?resize=449%2C154&#038;ssl=1\" alt=\"Output\" width=\"449\" height=\"154\" 
srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output3.jpg?w=449&amp;ssl=1 449w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output3.jpg?resize=300%2C103&amp;ssl=1 300w\" sizes=\"(max-width: 449px) 100vw, 449px\" \/><\/a><figcaption id=\"caption-attachment-1433\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>Clearly, the mapping of <em><strong>&#8220;her&#8221;<\/strong><\/em> to <em><strong>&#8220;My sister&#8221;<\/strong><\/em> is not what we expect. Could the confusion arise because both candidate referents are female? Let us try a modified example:<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">&#8220;<b>My brother has a dog and he loves her.&#8221;<\/b><\/span><\/p><\/blockquote>\n<p>In this case, since <em><strong>&#8220;brother&#8221;<\/strong><\/em> refers to a male, we expect that <em><strong>&#8220;her&#8221;<\/strong><\/em> should be easily resolved to the only other candidate, <em><strong>&#8220;dog&#8221;<\/strong><\/em>. 
Let us see:<\/p>\n<figure id=\"attachment_1434\" aria-describedby=\"caption-attachment-1434\" style=\"width: 441px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output4.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1434\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/output4-2\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output4.jpg\" data-orig-size=\"441,153\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549123324&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output4.jpg\" class=\"size-full wp-image-1434\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output4.jpg?resize=441%2C153&#038;ssl=1\" alt=\"Output\" width=\"441\" height=\"153\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output4.jpg?w=441&amp;ssl=1 441w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output4.jpg?resize=300%2C104&amp;ssl=1 300w\" sizes=\"(max-width: 441px) 100vw, 441px\" \/><\/a><figcaption id=\"caption-attachment-1434\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>Strange! 
Even in this case, the library incorrectly maps <em><strong>&#8220;her&#8221;<\/strong><\/em> to <em><strong>&#8220;My brother&#8221;<\/strong><\/em>. Such errors could be due to this particular trained model, or, more fundamentally, because the machine learning approach does not lead to <em><strong>&#8220;understanding&#8221;<\/strong><\/em> in the way we humans understand text.<\/p>\n<p>Let us try another example:<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">&#8220;<b>Mary and Julie are sisters. They love chocolates.<\/b>&#8221;<\/span><\/p><\/blockquote>\n<p>In this case, we expect <em><strong>&#8220;they&#8221;<\/strong><\/em> to refer to both <em><strong>&#8220;Mary and Julie&#8221;<\/strong><\/em> and not just one of them. Here is the output:<\/p>\n<figure id=\"attachment_1435\" aria-describedby=\"caption-attachment-1435\" style=\"width: 502px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output5.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1435\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/output5\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output5.jpg\" data-orig-size=\"502,136\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549124965&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output5.jpg\" class=\"size-full 
wp-image-1435\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output5.jpg?resize=502%2C136&#038;ssl=1\" alt=\"Output\" width=\"502\" height=\"136\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output5.jpg?w=502&amp;ssl=1 502w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output5.jpg?resize=300%2C81&amp;ssl=1 300w\" sizes=\"(max-width: 502px) 100vw, 502px\" \/><\/a><figcaption id=\"caption-attachment-1435\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>That is nice; it works as expected. Let us introduce a twist in this pattern:<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">&#8220;<b>John and Mary are neighbours. She admires him because he works hard.<\/b>&#8221;<\/span><\/p><\/blockquote>\n<p>We know that <em><strong>&#8220;John&#8221;<\/strong><\/em> is male and <em><strong>&#8220;Mary&#8221;<\/strong><\/em> is female, so <em><strong>&#8220;She&#8221;<\/strong><\/em> should map to <em><strong>&#8220;Mary&#8221;<\/strong><\/em> and both <em><strong>&#8220;him&#8221;<\/strong><\/em> and <em><strong>&#8220;he&#8221;<\/strong><\/em> should point to <em><strong>&#8220;John&#8221;<\/strong><\/em>. 
How does the library handle this?<\/p>\n<figure id=\"attachment_1436\" aria-describedby=\"caption-attachment-1436\" style=\"width: 647px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output6.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1436\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/output6\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output6.jpg\" data-orig-size=\"647,180\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549125355&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output6.jpg\" class=\"size-full wp-image-1436\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output6.jpg?resize=647%2C180&#038;ssl=1\" alt=\"Output\" width=\"647\" height=\"180\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output6.jpg?w=647&amp;ssl=1 647w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output6.jpg?resize=300%2C83&amp;ssl=1 300w\" sizes=\"(max-width: 647px) 100vw, 647px\" \/><\/a><figcaption id=\"caption-attachment-1436\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>That is a pleasant surprise! The system resolved the pronouns correctly.<\/p>\n<p>Wait. 
What if we use abstract names such as <em><strong>&#8220;X&#8221;<\/strong><\/em> and <em><strong>&#8220;Y&#8221;<\/strong><\/em> instead of <em><strong>&#8220;John&#8221;<\/strong><\/em> and <em><strong>&#8220;Mary&#8221;<\/strong><\/em>?<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">&#8220;<b>X and Y are neighbours. She admires him because he works hard.<\/b>&#8221;<\/span><\/p><\/blockquote>\n<p>This is what the library does in this case:<\/p>\n<figure id=\"attachment_1437\" aria-describedby=\"caption-attachment-1437\" style=\"width: 600px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output7.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1437\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/output7\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output7.jpg\" data-orig-size=\"600,152\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549125512&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output7.jpg\" class=\"size-full wp-image-1437\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output7.jpg?resize=600%2C152&#038;ssl=1\" alt=\"Output\" width=\"600\" height=\"152\" 
srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output7.jpg?w=600&amp;ssl=1 600w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output7.jpg?resize=300%2C76&amp;ssl=1 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><figcaption id=\"caption-attachment-1437\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>Let us think for a minute here. How would we, as humans, have handled this case? I feel it is acceptable if <em><strong>&#8220;She&#8221;<\/strong><\/em> maps to <em><strong>&#8220;X&#8221;<\/strong><\/em> and both <em><strong>&#8220;him&#8221;<\/strong><\/em> and <em><strong>&#8220;he&#8221;<\/strong><\/em> map to <em><strong>&#8220;Y&#8221;<\/strong><\/em>, or the other way around. But the way the library has resolved the references stumps me!<\/p>\n<p>Let us try one last example:<\/p>\n<blockquote><p><span style=\"color: #0000ff;\">&#8220;<b>The dog chased the cat. But it escaped.&#8221;<\/b><\/span><\/p><\/blockquote>\n<p>This is what we get:<\/p>\n<figure id=\"attachment_1438\" aria-describedby=\"caption-attachment-1438\" style=\"width: 440px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output8.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1438\" data-permalink=\"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/03\/coreference-resolution-using-spacy\/output8\/\" data-orig-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output8.jpg\" data-orig-size=\"440,132\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;Admin&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1549125736&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Output\" data-image-description=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-image-caption=\"&lt;p&gt;Output&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output8.jpg\" class=\"size-full wp-image-1438\" src=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output8.jpg?resize=440%2C132&#038;ssl=1\" alt=\"Output\" width=\"440\" height=\"132\" srcset=\"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output8.jpg?w=440&amp;ssl=1 440w, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Output8.jpg?resize=300%2C90&amp;ssl=1 300w\" sizes=\"(max-width: 440px) 100vw, 440px\" \/><\/a><figcaption id=\"caption-attachment-1438\" class=\"wp-caption-text\"><strong>Output<\/strong><\/figcaption><\/figure>\n<p>I think we, as humans, would have mapped <em><strong>&#8220;it&#8221;<\/strong><\/em> to the <em><strong>&#8220;cat&#8221;<\/strong><\/em>\u00a0without much effort, because we understand the context. 
Erroneous mappings from the system appear unavoidable, given that the system does not <em><strong>&#8220;understand&#8221;<\/strong><\/em> the meaning of the sentence.\u00a0As I mentioned earlier in the article, <em><strong>coreference resolution<\/strong><\/em> is a complex task, and I expect that the\u00a0<em><strong>neuralcoref<\/strong><\/em> library and other similar systems will become better in due course.<\/p>\n<p>In case you haven&#8217;t noticed, all the examples I have considered for this article involve one type of coreference called\u00a0<em><strong>&#8220;anaphora&#8221;<\/strong><\/em>. There are other <a href=\"https:\/\/en.wikipedia.org\/wiki\/Coreference\" target=\"_blank\" rel=\"noopener\"><em><strong>types<\/strong><\/em><\/a>, which can be even more difficult to handle.<\/p>\n<p>It would be interesting to compare the performance of other libraries such as <a href=\"https:\/\/opennlp.apache.org\" target=\"_blank\" rel=\"noopener\"><em><strong>OpenNLP<\/strong><\/em><\/a>\u00a0and <a href=\"https:\/\/nlp.stanford.edu\/software\/lex-parser.shtml\" target=\"_blank\" rel=\"noopener\"><em><strong>Stanford Parser<\/strong><\/em><\/a>\u00a0on the same set of examples. Well, that is for another article.<\/p>\n<p>You can download my Python code from <a href=\"http:\/\/www.rangakrish.com\/downloads\/CorefExample.py\" target=\"_blank\" rel=\"noopener\"><em><strong>here<\/strong><\/em><\/a>. Have a nice weekend!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>According to Stanford NLP Group, &#8220;Coreference resolution is the task of finding all expressions that refer to the same entity in a text&#8221;.\u00a0 You can also read this Wikipedia page. For example, in the sentence &#8220;Tom dropped the glass jar by accident and broke it&#8221;, what does &#8220;it&#8221; refer to? 
I am sure you will [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[78,107,17,103],"tags":[188,186,185,187],"class_list":["post-1427","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-natural-language-processing","category-programming","category-python","tag-anaphora","tag-coreference-resolution","tag-neuralcoref","tag-spacy"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9OLnF-n1","jetpack-related-posts":[{"id":1444,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/02\/10\/coreference-resolution-in-stanford-corenlp\/","url_meta":{"origin":1427,"position":0},"title":"Coreference Resolution in Stanford CoreNLP","author":"admin","date":"February 10, 2019","format":false,"excerpt":"In the last article, I showed how we can use the neuralcoref\u00a0library along with spaCy\u00a0to do coreference resolution (examples involved anaphoric references). In today's article, I want to try the same (well, almost) examples in Stanford CoreNLP engine and see how they compare. 
Since CoreNLP is a Java implementation, I\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Comparison Table","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Table.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Table.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Table.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/02\/Table.jpg?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":912,"url":"https:\/\/www.rangakrish.com\/index.php\/2018\/04\/22\/question-answering-using-dependency-trees\/","url_meta":{"origin":1427,"position":1},"title":"Question Answering\u00a0Using Dependency Trees","author":"admin","date":"April 22, 2018","format":false,"excerpt":"A few weeks ago I had written about my brief experiment with Mathematica's new feature, which provides answers to questions based on given text. After that post, I spent some time thinking about how to implement something similar. 
In today's post, I want to show you what I have been\u2026","rel":"","context":"In &quot;LISP&quot;","block_context":{"text":"LISP","link":"https:\/\/www.rangakrish.com\/index.php\/category\/lisp\/"},"img":{"alt_text":"Dependency Tree","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/04\/Deptree-example.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1368,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/01\/08\/parsing-text-with-apache-opennlp\/","url_meta":{"origin":1427,"position":2},"title":"Parsing Text with Apache OpenNLP","author":"admin","date":"January 8, 2019","format":false,"excerpt":"In my earlier posts I have written about parsing text using spaCy\u00a0and MeaningCloud's parsing API. For today's article, I decided to take a look at OpenNLP, an open-source ML-based Java toolkit for parsing natural language text. OpenNLP is a fairly mature library and has been around since 2004 (source: Wikipedia).\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Parse Tree","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/01\/Tree3.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":3312,"url":"https:\/\/www.rangakrish.com\/index.php\/2024\/01\/28\/the-hy-programming-language\/","url_meta":{"origin":1427,"position":3},"title":"The Hy Programming Language","author":"admin","date":"January 28, 2024","format":false,"excerpt":"In an earlier article\u00a0I had explained how to execute Python code from within Common Lisp using \u201cCLPython\u201d package. In contrast to that approach, \u201cHy\u201d\u00a0is a Lisp-style language (not compatible with Common Lisp) that is embedded in Python and hence provides seamless interoperability with Python code. 
Installation is straightforward (it is\u2026","rel":"","context":"In &quot;Hy Language&quot;","block_context":{"text":"Hy Language","link":"https:\/\/www.rangakrish.com\/index.php\/category\/hy-language\/"},"img":{"alt_text":"Hy REPL","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2024\/01\/console-300x148.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1640,"url":"https:\/\/www.rangakrish.com\/index.php\/2019\/07\/11\/information-extraction-using-spacys-pattern-matcher\/","url_meta":{"origin":1427,"position":4},"title":"Information Extraction Using spaCy\u2019s Pattern Matcher","author":"admin","date":"July 11, 2019","format":false,"excerpt":"In the previous article, I explored the Deep Categorization capabilities of MeaningCloud. We saw how a powerful rule-based pattern matching language allowed us to map fragments of unstructured text to custom categories. In today\u2019s post, I want to go through spaCy\u2019s\u00a0pattern matching capabilities. 
The version I am using is 2.0.13.\u2026","rel":"","context":"In &quot;Homeopathy&quot;","block_context":{"text":"Homeopathy","link":"https:\/\/www.rangakrish.com\/index.php\/category\/homeopathy\/"},"img":{"alt_text":"Output","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/07\/Output.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/07\/Output.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2019\/07\/Output.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":1068,"url":"https:\/\/www.rangakrish.com\/index.php\/2018\/09\/16\/dependency-graph-to-rdf\/","url_meta":{"origin":1427,"position":5},"title":"Dependency Graph to RDF","author":"admin","date":"September 16, 2018","format":false,"excerpt":"Dependency parsing is widely used these days, and many NLP tools give a dependency graph as the parsed representation of the input text. See for example, SpacY and TextRazor.\u00a0 The following is the dependency tree corresponding to the sentence Mary is drinking cold water: The above tree was generated using\u2026","rel":"","context":"In &quot;Natural Language Processing&quot;","block_context":{"text":"Natural Language Processing","link":"https:\/\/www.rangakrish.com\/index.php\/category\/natural-language-processing\/"},"img":{"alt_text":"Dependency Graph","src":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/DepGraph.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/DepGraph.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/www.rangakrish.com\/wp-content\/uploads\/2018\/09\/DepGraph.png?resize=525%2C300 
1.5x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/1427","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/comments?post=1427"}],"version-history":[{"count":0,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/posts\/1427\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/media?parent=1427"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/categories?post=1427"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rangakrish.com\/index.php\/wp-json\/wp\/v2\/tags?post=1427"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}