https://en.wikipedia.org/w/index.php?action=history&feed=atom&title=Yarowsky_algorithmYarowsky algorithm - Revision history2025-05-30T08:49:41ZRevision history for this page on the wikiMediaWiki 1.45.0-wmf.3https://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=1136088487&oldid=prevOAbot: Open access bot: doi added to citation with #oabot.2023-01-28T18:00:12Z<p><a href="/wiki/Wikipedia:OABOT" class="mw-redirect" title="Wikipedia:OABOT">Open access bot</a>: doi added to citation with #oabot.</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 18:00, 28 January 2023</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 42:</td>
<td colspan="2" class="diff-lineno">Line 42:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>| access-date = 1 November 2022</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>| access-date = 1 November 2022</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| doi-access= free</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>}}</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>}}</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<!-- diff cache key enwiki:diff:1.41:old-1124443008:rev-1136088487:wikidiff2=table:1.14.1:ff290eae -->
</table>OAbothttps://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=1124443008&oldid=prev209.64.107.75: /* Application */2022-11-28T23:31:50Z<p><span class="autocomment">Application</span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 23:31, 28 November 2022</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 11:</td>
<td colspan="2" class="diff-lineno">Line 11:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[decision list]] algorithm is then used to identify other reliable collocations. This training algorithm calculates the probability Pr(Sense&nbsp;|&nbsp;Collocation), and the decision list is ranked by the log-likelihood ratio:</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[decision list]] algorithm is then used to identify other reliable collocations. This training algorithm calculates the probability Pr(Sense&nbsp;|&nbsp;Collocation), and the decision list is ranked by the log-likelihood ratio:</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>: <math> \log \left( \frac{\Pr(\text{Sense}_A <del style="font-weight: bold; text-decoration: none;">|</del> \text{Collocation}_i)}{ \Pr( \text{Sense}_B <del style="font-weight: bold; text-decoration: none;">|</del> \text{Collocation}_i)} \right) </math></div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>: <math> \log \left( \frac{\Pr(\text{Sense}_A <ins style="font-weight: bold; text-decoration: none;">\mid</ins> \text{Collocation}_i)}{ \Pr( \text{Sense}_B <ins style="font-weight: bold; text-decoration: none;">\mid</ins> \text{Collocation}_i)} \right) </math></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[smoothing]] algorithm will then be used to avoid 0 values. The decision-list algorithm resolves many problems in a large set of non-independent evidence source by using only the most reliable piece of evidence rather than the whole matching collocation set.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[smoothing]] algorithm will then be used to avoid 0 values. The decision-list algorithm resolves many problems in a large set of non-independent evidence source by using only the most reliable piece of evidence rather than the whole matching collocation set.</div></td>
</tr>
</table>209.64.107.75https://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=1119431151&oldid=prevAryamccarthy: Use template for citation; update categories2022-11-01T15:12:13Z<p>Use template for citation; update categories</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:12, 1 November 2022</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 29:</td>
<td colspan="2" class="diff-lineno">Line 29:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==References==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==References==</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* {{cite journal</div></td>
</tr>
<tr>
<td class="diff-marker"><a class="mw-diff-movedpara-left" title="Paragraph was moved. Click to jump to new location." href="#movedpara_3_5_rhs">⚫</a></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_2_0_lhs"></a><del style="font-weight: bold; text-decoration: none;">*</del> <del style="font-weight: bold; text-decoration: none;">Yarowsky,</del> <del style="font-weight: bold; text-decoration: none;">D.</del> <del style="font-weight: bold; text-decoration: none;">"Unsupervised</del> <del style="font-weight: bold; text-decoration: none;">Word</del> <del style="font-weight: bold; text-decoration: none;">Sense</del> <del style="font-weight: bold; text-decoration: none;">Disambiguation Rivaling Supervised Methods". ''</del>Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics<del style="font-weight: bold; text-decoration: none;">''. Cambridge, MA, pp.&nbsp;189&ndash;196, 1995.</del></div></td>
<td colspan="2" class="diff-empty diff-side-added"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| last1 = Yarowsky</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| first1 = David</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| date = 1995</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| title = Unsupervised Word Sense Disambiguation Rivaling Supervised Methods</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| url = https://aclanthology.org/P95-1026/</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker"><a class="mw-diff-movedpara-right" title="Paragraph was moved. Click to jump to old location." href="#movedpara_2_0_lhs">⚫</a></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_3_5_rhs"></a><ins style="font-weight: bold; text-decoration: none;">|</ins> <ins style="font-weight: bold; text-decoration: none;">journal</ins> <ins style="font-weight: bold; text-decoration: none;">=</ins> Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| pages = 189–196</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| doi = 10.3115/981658.981684</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| location = Cambridge, MA</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| publisher = Association for Computational Linguistics</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>| access-date = 1 November 2022</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>}}</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Corpus linguistics]]</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Corpus linguistics]]</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Word-sense disambiguation]]</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Word-sense disambiguation]]</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Natural language processing]]</div></td>
</tr>
</table>Aryamccarthyhttps://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=1066550404&oldid=prevPrimeBOT: Task 24: remove a maintenance template following a TFD2022-01-18T22:29:09Z<p><a href="/wiki/User:PrimeBOT/24" class="mw-redirect" title="User:PrimeBOT/24">Task 24</a>: remove a maintenance template following <a href="/wiki/Wikipedia:Templates_for_discussion/Log/2021_February_9" title="Wikipedia:Templates for discussion/Log/2021 February 9">a TFD</a></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 22:29, 18 January 2022</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{short description|Method in computational linguistics}}</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{short description|Method in computational linguistics}}</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker"><a class="mw-diff-movedpara-right" title="Paragraph was moved. Click to jump to old location." href="#movedpara_3_1_lhs">⚫</a></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_1_0_rhs"></a>In [[computational linguistics]] the '''Yarowsky algorithm''' is an [[unsupervised learning]] [[algorithm]] for [[word sense disambiguation]] that uses the "one sense per [[collocation]]" and the "one sense per discourse" properties of [[natural language|human languages]] for word sense disambiguation. From observation, words tend to exhibit only one sense in most given discourse and in a given collocation.</div></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>{{Expert-subject|linguistics|date=January 2009}}</div></td>
<td colspan="2" class="diff-empty diff-side-added"></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td colspan="2" class="diff-empty diff-side-added"></td>
</tr>
<tr>
<td class="diff-marker"><a class="mw-diff-movedpara-left" title="Paragraph was moved. Click to jump to new location." href="#movedpara_1_0_rhs">⚫</a></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_3_1_lhs"></a>In [[computational linguistics]] the '''Yarowsky algorithm''' is an [[unsupervised learning]] [[algorithm]] for [[word sense disambiguation]] that uses the "one sense per [[collocation]]" and the "one sense per discourse" properties of [[natural language|human languages]] for word sense disambiguation. From observation, words tend to exhibit only one sense in most given discourse and in a given collocation.<del style="font-weight: bold; text-decoration: none;"> </del></div></td>
<td colspan="2" class="diff-empty diff-side-added"></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Application ==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Application ==</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 17:</td>
<td colspan="2" class="diff-lineno">Line 15:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[smoothing]] algorithm will then be used to avoid 0 values. The decision-list algorithm resolves many problems in a large set of non-independent evidence source by using only the most reliable piece of evidence rather than the whole matching collocation set.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[smoothing]] algorithm will then be used to avoid 0 values. The decision-list algorithm resolves many problems in a large set of non-independent evidence source by using only the most reliable piece of evidence rather than the whole matching collocation set.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The new resulting classifier will then be applied to the whole sample set. Add those examples in the [[errors and residuals in statistics|residual]] that are tagged as A or B with probability above a reasonable threshold to the seed sets. The decision-list algorithm and the above adding step are applied [[iteration|iteratively]]. As more newly-learned collocations are added to the seed sets, the sense A or sense B set will grow, and the original residual will shrink. However, these collocations stay in the seed sets only if their probability of classification remains above the threshold, otherwise they are returned to the residual for later classification. At the end of each iteration, the "one sense per discourse" property can be used to help preventing initially mistagged collocates and hence improving the purity of the seed sets.<del style="font-weight: bold; text-decoration: none;"> </del></div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The new resulting classifier will then be applied to the whole sample set. Add those examples in the [[errors and residuals in statistics|residual]] that are tagged as A or B with probability above a reasonable threshold to the seed sets. The decision-list algorithm and the above adding step are applied [[iteration|iteratively]]. As more newly-learned collocations are added to the seed sets, the sense A or sense B set will grow, and the original residual will shrink. However, these collocations stay in the seed sets only if their probability of classification remains above the threshold, otherwise they are returned to the residual for later classification. At the end of each iteration, the "one sense per discourse" property can be used to help preventing initially mistagged collocates and hence improving the purity of the seed sets.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In order to avoid strong collocates becoming indicators for the wrong class, the class-inclusion threshold needs to be randomly altered. For the same purpose, after intermediate convergence the algorithm will also need to increase the width of the context window.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In order to avoid strong collocates becoming indicators for the wrong class, the class-inclusion threshold needs to be randomly altered. For the same purpose, after intermediate convergence the algorithm will also need to increase the width of the context window.</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 31:</td>
<td colspan="2" class="diff-lineno">Line 29:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==References==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==References==</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* Yarowsky, D. "Unsupervised Word Sense Disambiguation Rivaling Supervised Methods". ''Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics''. Cambridge, MA, pp.<del style="font-weight: bold; text-decoration: none;"> </del>189&ndash;196, 1995.</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* Yarowsky, D. "Unsupervised Word Sense Disambiguation Rivaling Supervised Methods". ''Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics''. Cambridge, MA, pp.<ins style="font-weight: bold; text-decoration: none;">&nbsp;</ins>189&ndash;196, 1995.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Corpus linguistics]]</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Corpus linguistics]]</div></td>
</tr>
</table>PrimeBOThttps://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=974046729&oldid=prevGVnayR: +short description2020-08-20T19:47:49Z<p>+short description</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 19:47, 20 August 2020</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>{{short description|Method in computational linguistics}}</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{Expert-subject|linguistics|date=January 2009}}</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{Expert-subject|linguistics|date=January 2009}}</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
</table>GVnayRhttps://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=469053915&oldid=prevCydebot: Robot - Speedily moving category Word sense disambiguation to :Category:Word-sense disambiguation per CFDS.2012-01-02T04:00:25Z<p>Robot - Speedily moving category Word sense disambiguation to <a href="/wiki/Category:Word-sense_disambiguation" title="Category:Word-sense disambiguation">Category:Word-sense disambiguation</a> per <a href="/wiki/Wikipedia:CFDS" class="mw-redirect" title="Wikipedia:CFDS">CFDS</a>.</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 04:00, 2 January 2012</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 32:</td>
<td colspan="2" class="diff-lineno">Line 32:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Corpus linguistics]]</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Corpus linguistics]]</div></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Word<del style="font-weight: bold; text-decoration: none;"> </del>sense disambiguation]]</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Word<ins style="font-weight: bold; text-decoration: none;">-</ins>sense disambiguation]]</div></td>
</tr>
</table>Cydebothttps://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=462909906&oldid=prevR'n'B: Fix links to disambiguation page Predicate2011-11-28T15:41:22Z<p>Fix <a href="/wiki/Wikipedia:DPL" class="mw-redirect" title="Wikipedia:DPL">links</a> to <a href="/wiki/Wikipedia:D" class="mw-redirect" title="Wikipedia:D">disambiguation</a> page <a href="/wiki/Predicate" title="Predicate">Predicate</a></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:41, 28 November 2011</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 7:</td>
<td colspan="2" class="diff-lineno">Line 7:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The algorithm starts with a large, untagged [[text corpus|corpus]], in which it identifies examples of the given [[polysemy|polysemous]] word, and stores all the relevant [[Sentence (linguistics)|sentence]]s as lines. For instance, Yarowsky uses the word "plant" in his 1995 paper to demonstrate the algorithm. If it is assumed that there are two possible senses of the word, the next step is to identify a small number of seed collocations representative of each sense, give each sense a label (i.e. sense A and B), then assign the appropriate label to all training examples containing the seed collocations. In this case, the words "life" and "manufacturing" are chosen as initial seed collocations for senses A and B respectively. The residual examples (85%&ndash;98% according to Yarowsky) remain untagged.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The algorithm starts with a large, untagged [[text corpus|corpus]], in which it identifies examples of the given [[polysemy|polysemous]] word, and stores all the relevant [[Sentence (linguistics)|sentence]]s as lines. For instance, Yarowsky uses the word "plant" in his 1995 paper to demonstrate the algorithm. If it is assumed that there are two possible senses of the word, the next step is to identify a small number of seed collocations representative of each sense, give each sense a label (i.e. sense A and B), then assign the appropriate label to all training examples containing the seed collocations. In this case, the words "life" and "manufacturing" are chosen as initial seed collocations for senses A and B respectively. The residual examples (85%&ndash;98% according to Yarowsky) remain untagged.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The algorithm should initially choose seed collocations representative that will distinguish sense A and B accurately and productively. This can be done by selecting seed words from a [[dictionary]]’s entry for that sense. The collocations tend to have stronger effect if they are adjacent to the target word, the effect weakens with distance. According to the criteria given in Yarowsky (1993), seed words that appear in the most reliable collocational relationships with the target word will be selected. The effect is much stronger for words in a [[predicate]]-argument relationship than for arbitrary associations at the same distance to the target word, and is much stronger for collocations with content words than with function words. Having said this, a collocation word can have several collocational relationships with the target word throughout the corpus. This could give the word different rankings or even different classifications. Alternatively, it can be done by identifying a single defining collocate for each class, and using for seeds only those contexts containing one of these defining words. A publicly available database [[WordNet]] can be used as an automatic source for such defining terms. In addition, words that occur near the target word in great frequency can be selected as seed collocations representative. This approach is not fully automatic, a human judge must decide which word will be selected for each target word’s sense, the outputs will be reliable indicators of the senses.</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The algorithm should initially choose seed collocations representative that will distinguish sense A and B accurately and productively. This can be done by selecting seed words from a [[dictionary]]’s entry for that sense. The collocations tend to have stronger effect if they are adjacent to the target word, the effect weakens with distance. According to the criteria given in Yarowsky (1993), seed words that appear in the most reliable collocational relationships with the target word will be selected. The effect is much stronger for words in a [[<ins style="font-weight: bold; text-decoration: none;">wikt:predicate|</ins>predicate]]-argument relationship than for arbitrary associations at the same distance to the target word, and is much stronger for collocations with content words than with function words. Having said this, a collocation word can have several collocational relationships with the target word throughout the corpus. This could give the word different rankings or even different classifications. Alternatively, it can be done by identifying a single defining collocate for each class, and using for seeds only those contexts containing one of these defining words. A publicly available database [[WordNet]] can be used as an automatic source for such defining terms. In addition, words that occur near the target word in great frequency can be selected as seed collocations representative. This approach is not fully automatic, a human judge must decide which word will be selected for each target word’s sense, the outputs will be reliable indicators of the senses.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[decision list]] algorithm is then used to identify other reliable collocations. This training algorithm calculates the probability Pr(Sense&nbsp;|&nbsp;Collocation), and the decision list is ranked by the log-likelihood ratio:</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[decision list]] algorithm is then used to identify other reliable collocations. This training algorithm calculates the probability Pr(Sense&nbsp;|&nbsp;Collocation), and the decision list is ranked by the log-likelihood ratio:</div></td>
</tr>
</table>R'n'Bhttps://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=434386812&oldid=prevMieaou at 10:27, 15 June 20112011-06-15T10:27:42Z<p></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 10:27, 15 June 2011</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 9:</td>
<td colspan="2" class="diff-lineno">Line 9:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The algorithm should initially choose seed collocations representative that will distinguish sense A and B accurately and productively. This can be done by selecting seed words from a [[dictionary]]’s entry for that sense. The collocations tend to have stronger effect if they are adjacent to the target word, the effect weakens with distance. According to the criteria given in Yarowsky (1993), seed words that appear in the most reliable collocational relationships with the target word will be selected. The effect is much stronger for words in a [[predicate]]-argument relationship than for arbitrary associations at the same distance to the target word, and is much stronger for collocations with content words than with function words. Having said this, a collocation word can have several collocational relationships with the target word throughout the corpus. This could give the word different rankings or even different classifications. Alternatively, it can be done by identifying a single defining collocate for each class, and using for seeds only those contexts containing one of these defining words. A publicly available database [[WordNet]] can be used as an automatic source for such defining terms. In addition, words that occur near the target word in great frequency can be selected as seed collocations representative. This approach is not fully automatic, a human judge must decide which word will be selected for each target word’s sense, the outputs will be reliable indicators of the senses.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The algorithm should initially choose seed collocations representative that will distinguish sense A and B accurately and productively. This can be done by selecting seed words from a [[dictionary]]’s entry for that sense. The collocations tend to have stronger effect if they are adjacent to the target word, the effect weakens with distance. According to the criteria given in Yarowsky (1993), seed words that appear in the most reliable collocational relationships with the target word will be selected. The effect is much stronger for words in a [[predicate]]-argument relationship than for arbitrary associations at the same distance to the target word, and is much stronger for collocations with content words than with function words. Having said this, a collocation word can have several collocational relationships with the target word throughout the corpus. This could give the word different rankings or even different classifications. Alternatively, it can be done by identifying a single defining collocate for each class, and using for seeds only those contexts containing one of these defining words. A publicly available database [[WordNet]] can be used as an automatic source for such defining terms. In addition, words that occur near the target word in great frequency can be selected as seed collocations representative. This approach is not fully automatic, a human judge must decide which word will be selected for each target word’s sense, the outputs will be reliable indicators of the senses.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>A decision<del style="font-weight: bold; text-decoration: none;">-</del>list algorithm is then used to identify other reliable collocations. This training algorithm calculates the probability Pr(Sense&nbsp;|&nbsp;Collocation), and the decision list is ranked by the log-likelihood ratio:</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>A <ins style="font-weight: bold; text-decoration: none;">[[</ins>decision<ins style="font-weight: bold; text-decoration: none;"> </ins>list<ins style="font-weight: bold; text-decoration: none;">]]</ins> algorithm is then used to identify other reliable collocations. This training algorithm calculates the probability Pr(Sense&nbsp;|&nbsp;Collocation), and the decision list is ranked by the log-likelihood ratio:</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: <math> \log \left( \frac{\Pr(\text{Sense}_A | \text{Collocation}_i)}{ \Pr( \text{Sense}_B | \text{Collocation}_i)} \right) </math></div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: <math> \log \left( \frac{\Pr(\text{Sense}_A | \text{Collocation}_i)}{ \Pr( \text{Sense}_B | \text{Collocation}_i)} \right) </math></div></td>
</tr>
</table>Mieaouhttps://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=427637809&oldid=prevMichael Hardy: dab2011-05-05T20:12:13Z<p>dab</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 20:12, 5 May 2011</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 15:</td>
<td colspan="2" class="diff-lineno">Line 15:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[smoothing]] algorithm will then be used to avoid 0 values. The decision-list algorithm resolves many problems in a large set of non-independent evidence source by using only the most reliable piece of evidence rather than the whole matching collocation set.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A [[smoothing]] algorithm will then be used to avoid 0 values. The decision-list algorithm resolves many problems in a large set of non-independent evidence source by using only the most reliable piece of evidence rather than the whole matching collocation set.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The new resulting classifier will then be applied to the whole sample set. Add those examples in the [[residual]] that are tagged as A or B with probability above a reasonable threshold to the seed sets. The decision-list algorithm and the above adding step are applied [[iteration|iteratively]]. As more newly-learned collocations are added to the seed sets, the sense A or sense B set will grow, and the original residual will shrink. However, these collocations stay in the seed sets only if their probability of classification remains above the threshold, otherwise they are returned to the residual for later classification. At the end of each iteration, the "one sense per discourse" property can be used to help preventing initially mistagged collocates and hence improving the purity of the seed sets. </div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The new resulting classifier will then be applied to the whole sample set. Add those examples in the [[<ins style="font-weight: bold; text-decoration: none;">errors and residuals in statistics|</ins>residual]] that are tagged as A or B with probability above a reasonable threshold to the seed sets. The decision-list algorithm and the above adding step are applied [[iteration|iteratively]]. As more newly-learned collocations are added to the seed sets, the sense A or sense B set will grow, and the original residual will shrink. However, these collocations stay in the seed sets only if their probability of classification remains above the threshold, otherwise they are returned to the residual for later classification. At the end of each iteration, the "one sense per discourse" property can be used to help preventing initially mistagged collocates and hence improving the purity of the seed sets. </div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In order to avoid strong collocates becoming indicators for the wrong class, the class-inclusion threshold needs to be randomly altered. For the same purpose, after intermediate convergence the algorithm will also need to increase the width of the context window.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In order to avoid strong collocates becoming indicators for the wrong class, the class-inclusion threshold needs to be randomly altered. For the same purpose, after intermediate convergence the algorithm will also need to increase the width of the context window.</div></td>
</tr>
</table>Michael Hardyhttps://en.wikipedia.org/w/index.php?title=Yarowsky_algorithm&diff=386462165&oldid=prevThe Interior: /* Application */ Repairing links to disambiguation pages - You can help!2010-09-23T03:14:48Z<p><span class="autocomment">Application: </span> Repairing links to disambiguation pages - <a href="/wiki/Wikipedia:Disambiguation_pages_with_links" title="Wikipedia:Disambiguation pages with links">You can help!</a></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 03:14, 23 September 2010</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 5:</td>
<td colspan="2" class="diff-lineno">Line 5:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Application ==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Application ==</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The algorithm starts with a large, untagged [[corpus]], in which it identifies examples of the given [[polysemy|polysemous]] word, and stores all the relevant [[Sentence (linguistics)|sentence]]s as lines. For instance, Yarowsky uses the word "plant" in his 1995 paper to demonstrate the algorithm. If it is assumed that there are two possible senses of the word, the next step is to identify a small number of seed collocations representative of each sense, give each sense a label (i.e. sense A and B), then assign the appropriate label to all training examples containing the seed collocations. In this case, the words "life" and "manufacturing" are chosen as initial seed collocations for senses A and B respectively. The residual examples (85%&ndash;98% according to Yarowsky) remain untagged.</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The algorithm starts with a large, untagged [[<ins style="font-weight: bold; text-decoration: none;">text corpus|</ins>corpus]], in which it identifies examples of the given [[polysemy|polysemous]] word, and stores all the relevant [[Sentence (linguistics)|sentence]]s as lines. For instance, Yarowsky uses the word "plant" in his 1995 paper to demonstrate the algorithm. If it is assumed that there are two possible senses of the word, the next step is to identify a small number of seed collocations representative of each sense, give each sense a label (i.e. sense A and B), then assign the appropriate label to all training examples containing the seed collocations. In this case, the words "life" and "manufacturing" are chosen as initial seed collocations for senses A and B respectively. The residual examples (85%&ndash;98% according to Yarowsky) remain untagged.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The algorithm should initially choose seed collocations representative that will distinguish sense A and B accurately and productively. This can be done by selecting seed words from a [[dictionary]]’s entry for that sense. The collocations tend to have stronger effect if they are adjacent to the target word, the effect weakens with distance. According to the criteria given in Yarowsky (1993), seed words that appear in the most reliable collocational relationships with the target word will be selected. The effect is much stronger for words in a [[predicate]]-argument relationship than for arbitrary associations at the same distance to the target word, and is much stronger for collocations with content words than with function words. Having said this, a collocation word can have several collocational relationships with the target word throughout the corpus. This could give the word different rankings or even different classifications. Alternatively, it can be done by identifying a single defining collocate for each class, and using for seeds only those contexts containing one of these defining words. A publicly available database [[WordNet]] can be used as an automatic source for such defining terms. In addition, words that occur near the target word in great frequency can be selected as seed collocations representative. This approach is not fully automatic, a human judge must decide which word will be selected for each target word’s sense, the outputs will be reliable indicators of the senses.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The algorithm should initially choose seed collocations representative that will distinguish sense A and B accurately and productively. This can be done by selecting seed words from a [[dictionary]]’s entry for that sense. The collocations tend to have stronger effect if they are adjacent to the target word, the effect weakens with distance. According to the criteria given in Yarowsky (1993), seed words that appear in the most reliable collocational relationships with the target word will be selected. The effect is much stronger for words in a [[predicate]]-argument relationship than for arbitrary associations at the same distance to the target word, and is much stronger for collocations with content words than with function words. Having said this, a collocation word can have several collocational relationships with the target word throughout the corpus. This could give the word different rankings or even different classifications. Alternatively, it can be done by identifying a single defining collocate for each class, and using for seeds only those contexts containing one of these defining words. A publicly available database [[WordNet]] can be used as an automatic source for such defining terms. In addition, words that occur near the target word in great frequency can be selected as seed collocations representative. This approach is not fully automatic, a human judge must decide which word will be selected for each target word’s sense, the outputs will be reliable indicators of the senses.</div></td>
</tr>
</table>The Interior