A '''cache language model''' is a type of statistical [[language model]]. Such models occur in the [[natural language processing]] subfield of [[computer science]] and assign [[probability|probabilities]] to given sequences of words by means of a [[probability distribution]]. Statistical language models are key components of [[speech recognition]] systems and of many [[machine translation]] systems: they tell such systems which possible output word sequences are probable and which are improbable. The particular characteristic of a cache language model is that it contains a [[Cache (computing)|cache component]] and assigns relatively high probabilities to words or word sequences that occur elsewhere in a given text. The primary, but by no means sole, use of cache language models is in speech recognition systems.{{Citation needed|date=September 2011}}

To understand why it is a good idea for a statistical language model to contain a cache component, one might consider someone who is dictating a letter about elephants to a speech recognition system. Standard (non-cache) [[N-gram]] language models will assign a very low probability to the word "elephant" because it is a very rare word in [[English language|English]]. If the speech recognition system does not contain a cache component, the person dictating the letter may be annoyed: each time the word "elephant" is spoken, another sequence of words with a higher probability according to the N-gram language model may be recognized instead (e.g., "tell a plan"). These erroneous sequences have to be deleted manually and replaced in the text by "elephant" each time "elephant" is spoken. If the system has a cache language model, "elephant" will still probably be misrecognized the first time it is spoken and will have to be entered into the text manually; however, from this point on the system is aware that "elephant" is likely to occur again – the estimated probability of occurrence of "elephant" has been increased, making it more likely that, if it is spoken, it will be recognized correctly. Once "elephant" has occurred several times, the system is likely to recognize it correctly every time it is spoken until the letter has been completely dictated. This increase in the probability assigned to the occurrence of "elephant" is an example of a consequence of [[machine learning]] and more specifically of [[pattern recognition]].

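The following is a minimal sketch of this mechanism, assuming the simplest possible formulation: a fixed background distribution interpolated with a unigram cache of recently observed words. The class name, cache size and interpolation weight are illustrative choices rather than those of any system cited in this article, and a practical recognizer would interpolate with a conditional N-gram probability and typically decay cache counts over time.

<syntaxhighlight lang="python">
from collections import Counter, deque

class CacheUnigramLM:
    """Toy cache language model: a fixed background distribution interpolated
    with a unigram cache of the most recently observed words."""

    def __init__(self, background, cache_size=200, lam=0.9):
        self.background = background           # word -> P_background(word)
        self.lam = lam                          # weight on the background model
        self.cache = deque(maxlen=cache_size)   # most recently observed words
        self.counts = Counter()

    def observe(self, word):
        """Add a recognized word to the cache, evicting the oldest if full."""
        if len(self.cache) == self.cache.maxlen:
            self.counts[self.cache[0]] -= 1     # this word is about to be evicted
        self.cache.append(word)
        self.counts[word] += 1

    def prob(self, word):
        """P(word) = lam * P_background(word) + (1 - lam) * P_cache(word)."""
        p_bg = self.background.get(word, 1e-8)
        p_cache = self.counts[word] / len(self.cache) if self.cache else 0.0
        return self.lam * p_bg + (1.0 - self.lam) * p_cache

# "elephant" is rare in the background model, so it starts out very unlikely...
lm = CacheUnigramLM({"elephant": 1e-6, "tell": 0.01, "a": 0.06, "plan": 0.001})
print(lm.prob("elephant"))                      # about 1e-6
# ...but after it has been dictated (and corrected) a couple of times,
# the cache component raises its probability by several orders of magnitude.
for w in ["the", "elephant", "ate", "the", "elephant", "grass"]:
    lm.observe(w)
print(lm.prob("elephant"))                      # about 0.03
</syntaxhighlight>
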
There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text, subsequent instances of it would be assigned a higher probability).{{Citation needed|date=September 2011}}

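Such a variant could, for example, maintain a second cache over recently seen word pairs. The sketch below is again only illustrative (hypothetical class and parameter names, no count decay) and is not the formulation of any specific published system.

<syntaxhighlight lang="python">
from collections import Counter, deque

class BigramCache:
    """Toy multi-word cache: tracks recently seen word pairs such as
    ("San", "Francisco") and scores how likely a word is to follow the
    previous one, given the text produced so far."""

    def __init__(self, cache_size=200):
        self.pairs = deque(maxlen=cache_size)
        self.counts = Counter()
        self.prev = None

    def observe(self, word):
        if self.prev is not None:
            if len(self.pairs) == self.pairs.maxlen:
                self.counts[self.pairs[0]] -= 1   # oldest pair is evicted
            self.pairs.append((self.prev, word))
            self.counts[(self.prev, word)] += 1
        self.prev = word

    def prob(self, word, prev):
        """P_cache(word | prev): fraction of recent occurrences of prev
        that were followed by word."""
        total = sum(c for (p, _), c in self.counts.items() if p == prev)
        return self.counts[(prev, word)] / total if total else 0.0
</syntaxhighlight>

As with the single-word cache, this conditional cache probability would be interpolated with the background N-gram probability rather than used on its own.
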
The cache language model was first proposed in a paper published in 1990,<ref>{{cite journal | last1=Kuhn | first1=R. | last2=De Mori | first2=R. | author-link2=Renato de Mori | title=A Cache-Based Natural Language Model for Speech Recognition | journal=[[IEEE Transactions on Pattern Analysis and Machine Intelligence]] | date=June 1990 | volume=12 | issue=6 | pages=570–583 | url=http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | issn=1939-3539 | doi=10.1109/34.56193 | access-date=2011-09-24 | archive-url=https://web.archive.org/web/20110901154408/http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | archive-date=2011-09-01 | url-status=dead }} ([https://www.computer.org/csdl/trans/tp/1990/06/i0570-abs.html Abstract])</ref> after which the [[IBM]] speech-recognition group experimented with the concept. The group found that implementing a form of cache language model yielded a 24% drop in [[Word error rate|word-error rates]] once the first few hundred words of a document had been dictated.<ref>{{cite journal | url=http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | title=A Dynamic Language Model for Speech Recognition | author=F. Jelinek |author2=B. Merialdo |author3=S. Roukos |author4=M. Strauss |name-list-style=amp | journal=The Journal of the Acoustical Society of America | year=1991 | volume=98 | issue=2 | pages=293–295 | doi=10.3115/112405.112464 | s2cid=11601499 | archive-url=https://web.archive.org/web/20060614121245/http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | archive-date=June 14, 2006 | url-status=dead}} Conference: Speech and Natural Language, Proceedings of a Workshop held at Pacific Grove, California, USA, February 19–22, 1991.</ref> A detailed survey of language modeling techniques concluded that the cache language model was one of the few new techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium [[Training set|training data]] sizes".<ref>{{cite book |author=Joshua T. Goodman |year=2001 |title=A Bit of Progress in Language Modeling: Extended Version |publisher=Microsoft Research |location=Redmond, WA (US) |id=Technical report MSR-TR-2001-72 |arxiv=cs/0108005v1 |bibcode=2001cs........8005G}}</ref>

The development of the cache language model has generated considerable interest among those concerned with [[computational linguistics]] in general and [[statistical natural language processing]] in particular: recently, there has been interest in applying the cache language model in the field of statistical machine translation.<ref>{{cite conference | url=http://www.aclweb.org/anthology/W/W10/W10-2602.pdf | title=Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache | author=Tiedemann, Jorg | conference=Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010 | year=2010 | pages=8–15 | location=Uppsala, Sweden | publisher=Association for Computational Linguistics}}</ref>

The success of the cache language model in improving [[word prediction]] rests on the human tendency to use words in a "bursty" fashion: when one is discussing a certain topic in a certain context, the frequency with which one uses certain words will be quite different from their frequencies when one is discussing other topics in other contexts. The traditional N-gram language models, which rely entirely on information from a very small number (four, three, or two) of words preceding the word to which a probability is to be assigned, do not adequately model this "burstiness".{{Citation needed|date=September 2011}}

Recently, the cache language model concept – originally conceived for the N-gram statistical language model paradigm – has been adapted for use in the neural paradigm. For instance, recent work on continuous cache language models in the [[recurrent neural network]] (RNN) setting has applied the cache concept to much larger contexts than before, yielding significant reductions in perplexity.<ref>{{cite conference |url=https://dl.acm.org/citation.cfm?id=3295353 |title=Unbounded cache model for online language modeling with open vocabulary |author=Edouard Grave |author2=Moustapha Cisse |author3=Armand Joulin |book-title=NIPS'17 Proceedings of the 31st International Conference on Neural Information Processing Systems |year=2017 |pages=6044–6054 |location=Long Beach, California |publisher=Association for Computing Machinery |isbn=978-1-5108-6096-4}}</ref> Another recent line of research involves incorporating a cache component in a [[Feedforward neural network|feed-forward]] neural language model (FN-LM) to achieve rapid domain adaptation.<ref>{{cite conference |title=i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models |author=Karel Benes |author2=Santosh Kesiraju |author3=Lukas Burget |s2cid=52192034 |conference=Interspeech 2018 |year=2018 |pages=3383–3387 |location=Hyderabad, India |publisher=Interspeech |doi=10.21437/Interspeech.2018-1070}}</ref>

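The sketch below illustrates the idea behind such a continuous cache, loosely following this line of work: past hidden states of the RNN act as keys, the words that followed them act as values, and a softmax-like score over the stored states yields a cache distribution that is interpolated with the ordinary RNN prediction. The function names, the flatness parameter ''theta'' and the interpolation weight are illustrative assumptions; the unbounded cache in the paper cited above additionally relies on approximate nearest-neighbour search so that very large numbers of past states can be stored.

<syntaxhighlight lang="python">
import numpy as np

def cache_distribution(h_t, past_states, past_words, vocab_size, theta=0.3):
    """Continuous-cache distribution over the vocabulary (illustrative sketch).

    h_t         -- current RNN hidden state, shape (d,)
    past_states -- previously stored hidden states, shape (t, d)
    past_words  -- id of the word that followed each stored state, length t
    """
    scores = np.exp(theta * past_states.dot(h_t))   # similarity to each stored state
    p = np.zeros(vocab_size)
    for word_id, score in zip(past_words, scores):
        p[word_id] += score                          # mass flows to the word that followed
    total = p.sum()
    return p / total if total > 0.0 else p

def predict(p_rnn, p_cache, lam=0.9):
    """Final next-word distribution: interpolate the RNN softmax with the cache."""
    return lam * p_rnn + (1.0 - lam) * p_cache
</syntaxhighlight>
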
The primary, but by no means sole, use of cache language models is in speech recognition systems.{{Citation needed|date=September 2011}}</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> </table> DarklitShadow https://en.wikipedia.org/w/index.php?title=Cache_language_model&diff=994022028&oldid=prev Monkbot: Task 18 (cosmetic): eval 7 templates: del empty params (1×); hyphenate params (4×); 2020-12-13T19:02:59Z <p><a href="/wiki/User:Monkbot/task_18" class="mw-redirect" title="User:Monkbot/task 18">Task 18 (cosmetic)</a>: eval 7 templates: del empty params (1×); hyphenate params (4×);</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 19:02, 13 December 2020</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 5:</td> <td colspan="2" class="diff-lineno">Line 5:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text subsequent instances of it would be assigned a higher probability).</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text subsequent instances of it would be assigned a higher probability).</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: 
pre-wrap;"><div>The cache language model was first proposed in a paper published in 1990,&lt;ref&gt;{{cite journal | last1=Kuhn | first1=R. | last2=De Mori | first2=R. | <del style="font-weight: bold; text-decoration: none;">authorlink1= | authorlink2</del>=Renato de Mori | title=A Cache-Based Natural Language Model for Speech Recognition | journal=[[IEEE Transactions on Pattern Analysis and Machine Intelligence]] | date=June 1990 | volume=12 | issue=6 | pages=570–583 | url=http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | issn=1939-3539 | doi=10.1109/34.56193 | access-date=2011-09-24 | archive-url=https://web.archive.org/web/20110901154408/http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | archive-date=2011-09-01 | url-status=dead }} ([https://www.computer.org/csdl/trans/tp/1990/06/i0570-abs.html Abstract])&lt;/ref&gt; after which the [[IBM]] speech-recognition group experimented with the concept. The group found that implementation of a form of cache language model yielded a 24% drop in [[Word error rate|word-error rates]] once the first few hundred words of a document had been dictated.&lt;ref&gt;{{cite journal | url=http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | title=A Dynamic Language Model for Speech Recognition | author=F. Jelinek|author2=B. Merialdo|author3=S. Roukos|author4=M. Strauss|name-list-style=amp | journal=The Journal of the Acoustical Society of America | year=1991 | volume=98 | issue=2 | pages=293–295 | doi=10.3115/112405.112464 | s2cid=11601499 | <del style="font-weight: bold; text-decoration: none;">archiveurl</del>=https://web.archive.org/web/20060614121245/http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | <del style="font-weight: bold; text-decoration: none;">archivedate</del>=June 14, 2006 | url-status=dead}} Conference: Speech and Natural Language, Proceedings of a Workshop held at Pacific Grove, California, USA, February 19–22, 1999.&lt;/ref&gt; A detailed survey of language modeling techniques concluded that the cache language model was one of the few new language modeling techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium [[Training set|training data]] sizes".&lt;ref&gt;{{cite book|author=Joshua T. Goodman|year=2001|title=A Bit of Progress in Language Modeling: Extended Version|publisher=Microsoft Research|location=Redmond, WA (US)|id=Technical report MSR-TR-2001-72|arxiv=cs/0108005v1 |bibcode=2001cs........8005G}}&lt;/ref&gt;</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The cache language model was first proposed in a paper published in 1990,&lt;ref&gt;{{cite journal | last1=Kuhn | first1=R. | last2=De Mori | first2=R. 
| <ins style="font-weight: bold; text-decoration: none;">author-link2</ins>=Renato de Mori | title=A Cache-Based Natural Language Model for Speech Recognition | journal=[[IEEE Transactions on Pattern Analysis and Machine Intelligence]] | date=June 1990 | volume=12 | issue=6 | pages=570–583 | url=http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | issn=1939-3539 | doi=10.1109/34.56193 | access-date=2011-09-24 | archive-url=https://web.archive.org/web/20110901154408/http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | archive-date=2011-09-01 | url-status=dead }} ([https://www.computer.org/csdl/trans/tp/1990/06/i0570-abs.html Abstract])&lt;/ref&gt; after which the [[IBM]] speech-recognition group experimented with the concept. The group found that implementation of a form of cache language model yielded a 24% drop in [[Word error rate|word-error rates]] once the first few hundred words of a document had been dictated.&lt;ref&gt;{{cite journal | url=http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | title=A Dynamic Language Model for Speech Recognition | author=F. Jelinek|author2=B. Merialdo|author3=S. Roukos|author4=M. Strauss|name-list-style=amp | journal=The Journal of the Acoustical Society of America | year=1991 | volume=98 | issue=2 | pages=293–295 | doi=10.3115/112405.112464 | s2cid=11601499 | <ins style="font-weight: bold; text-decoration: none;">archive-url</ins>=https://web.archive.org/web/20060614121245/http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | <ins style="font-weight: bold; text-decoration: none;">archive-date</ins>=June 14, 2006 | url-status=dead}} Conference: Speech and Natural Language, Proceedings of a Workshop held at Pacific Grove, California, USA, February 19–22, 1999.&lt;/ref&gt; A detailed survey of language modeling techniques concluded that the cache language model was one of the few new language modeling techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium [[Training set|training data]] sizes".&lt;ref&gt;{{cite book|author=Joshua T. 
Goodman|year=2001|title=A Bit of Progress in Language Modeling: Extended Version|publisher=Microsoft Research|location=Redmond, WA (US)|id=Technical report MSR-TR-2001-72|arxiv=cs/0108005v1 |bibcode=2001cs........8005G}}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The development of the cache language model has generated considerable interest among those concerned with [[computational linguistics]] in general and [[statistical natural language processing]] in particular: recently there has been interest in applying the cache language model in the field of statistical machine translation.&lt;ref&gt;{{cite conference | url=http://www.aclweb.org/anthology/W/W10/W10-2602.pdf | title=Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache | author=Tiedemann, Jorg | conference=Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010 | year=2010 | pages=8–15 | location=Uppsala, Sweden | publisher=Association for Computational Linguistics}}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The development of the cache language model has generated considerable interest among those concerned with [[computational linguistics]] in general and [[statistical natural language processing]] in particular: recently there has been interest in applying the cache language model in the field of statistical machine translation.&lt;ref&gt;{{cite conference | url=http://www.aclweb.org/anthology/W/W10/W10-2602.pdf | title=Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache | author=Tiedemann, Jorg | conference=Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010 | year=2010 | pages=8–15 | location=Uppsala, Sweden | publisher=Association for Computational Linguistics}}&lt;/ref&gt;</div></td> </tr> <tr> <td colspan="2" class="diff-lineno">Line 26:</td> <td colspan="2" class="diff-lineno">Line 26:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: 
#f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Further reading ==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Further reading ==</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>*{{cite book | last=Jelinek | first=Frederick | <del style="font-weight: bold; text-decoration: none;">authorlink</del>=Frederick Jelinek | title=Statistical Methods for Speech Recognition | publisher=[[The MIT Press]] | url=http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=7447 | year=1997 | isbn=0-262-10066-5 | access-date=2011-09-24 | archive-url=https://web.archive.org/web/20110805015427/http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=7447 | archive-date=2011-08-05 | url-status=dead }}</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>*{{cite book | last=Jelinek | first=Frederick | <ins style="font-weight: bold; text-decoration: none;">author-link</ins>=Frederick Jelinek | title=Statistical Methods for Speech Recognition | publisher=[[The MIT Press]] | url=http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=7447 | year=1997 | isbn=0-262-10066-5 | access-date=2011-09-24 | archive-url=https://web.archive.org/web/20110805015427/http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=7447 | archive-date=2011-08-05 | url-status=dead }}</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Language modeling]]</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[[Category:Language modeling]]</div></td> </tr> </table> Monkbot https://en.wikipedia.org/w/index.php?title=Cache_language_model&diff=984443672&oldid=prev Monkbot: /* top */Task 17 (BRFA trial): replace deprecated: |last-author-amp= (1× replaced; usage: 1 of 4); 2020-10-20T03:36:25Z <p><span class="autocomment">top: </span><a href="/wiki/User:Monkbot/task_17:_remove_replace_deprecated_last-author-amp_params" 
title="User:Monkbot/task 17: remove replace deprecated last-author-amp params">Task 17</a> (<a href="/wiki/Wikipedia:Bots/Requests_for_approval/Monkbot_17" title="Wikipedia:Bots/Requests for approval/Monkbot 17">BRFA</a> trial): replace deprecated: |last-author-amp= (1× replaced; usage: 1 of 4);</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 03:36, 20 October 2020</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 5:</td> <td colspan="2" class="diff-lineno">Line 5:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text subsequent instances of it would be assigned a higher probability).</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text subsequent instances of it would be assigned a higher probability).</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The cache language model was first proposed in a paper published in 1990,&lt;ref&gt;{{cite journal | last1=Kuhn | first1=R. | last2=De Mori | first2=R. 
| authorlink1= | authorlink2=Renato de Mori | title=A Cache-Based Natural Language Model for Speech Recognition | journal=[[IEEE Transactions on Pattern Analysis and Machine Intelligence]] | date=June 1990 | volume=12 | issue=6 | pages=570–583 | url=http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | issn=1939-3539 | doi=10.1109/34.56193 | access-date=2011-09-24 | archive-url=https://web.archive.org/web/20110901154408/http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | archive-date=2011-09-01 | url-status=dead }} ([https://www.computer.org/csdl/trans/tp/1990/06/i0570-abs.html Abstract])&lt;/ref&gt; after which the [[IBM]] speech-recognition group experimented with the concept. The group found that implementation of a form of cache language model yielded a 24% drop in [[Word error rate|word-error rates]] once the first few hundred words of a document had been dictated.&lt;ref&gt;{{cite journal | url=http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | title=A Dynamic Language Model for Speech Recognition | author=F. Jelinek|author2=B. Merialdo|author3=S. Roukos|author4=M. Strauss|<del style="font-weight: bold; text-decoration: none;">lastauthoramp</del>=<del style="font-weight: bold; text-decoration: none;">y</del> | journal=The Journal of the Acoustical Society of America | year=1991 | volume=98 | issue=2 | pages=293–295 | doi=10.3115/112405.112464 | s2cid=11601499 | archiveurl=https://web.archive.org/web/20060614121245/http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | archivedate=June 14, 2006 | url-status=dead}} Conference: Speech and Natural Language, Proceedings of a Workshop held at Pacific Grove, California, USA, February 19–22, 1999.&lt;/ref&gt; A detailed survey of language modeling techniques concluded that the cache language model was one of the few new language modeling techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium [[Training set|training data]] sizes".&lt;ref&gt;{{cite book|author=Joshua T. Goodman|year=2001|title=A Bit of Progress in Language Modeling: Extended Version|publisher=Microsoft Research|location=Redmond, WA (US)|id=Technical report MSR-TR-2001-72|arxiv=cs/0108005v1 |bibcode=2001cs........8005G}}&lt;/ref&gt;</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The cache language model was first proposed in a paper published in 1990,&lt;ref&gt;{{cite journal | last1=Kuhn | first1=R. | last2=De Mori | first2=R. | authorlink1= | authorlink2=Renato de Mori | title=A Cache-Based Natural Language Model for Speech Recognition | journal=[[IEEE Transactions on Pattern Analysis and Machine Intelligence]] | date=June 1990 | volume=12 | issue=6 | pages=570–583 | url=http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | issn=1939-3539 | doi=10.1109/34.56193 | access-date=2011-09-24 | archive-url=https://web.archive.org/web/20110901154408/http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | archive-date=2011-09-01 | url-status=dead }} ([https://www.computer.org/csdl/trans/tp/1990/06/i0570-abs.html Abstract])&lt;/ref&gt; after which the [[IBM]] speech-recognition group experimented with the concept. 
The group found that implementation of a form of cache language model yielded a 24% drop in [[Word error rate|word-error rates]] once the first few hundred words of a document had been dictated.&lt;ref&gt;{{cite journal | url=http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | title=A Dynamic Language Model for Speech Recognition | author=F. Jelinek|author2=B. Merialdo|author3=S. Roukos|author4=M. Strauss|<ins style="font-weight: bold; text-decoration: none;">name-list-style</ins>=<ins style="font-weight: bold; text-decoration: none;">amp</ins> | journal=The Journal of the Acoustical Society of America | year=1991 | volume=98 | issue=2 | pages=293–295 | doi=10.3115/112405.112464 | s2cid=11601499 | archiveurl=https://web.archive.org/web/20060614121245/http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | archivedate=June 14, 2006 | url-status=dead}} Conference: Speech and Natural Language, Proceedings of a Workshop held at Pacific Grove, California, USA, February 19–22, 1999.&lt;/ref&gt; A detailed survey of language modeling techniques concluded that the cache language model was one of the few new language modeling techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium [[Training set|training data]] sizes".&lt;ref&gt;{{cite book|author=Joshua T. Goodman|year=2001|title=A Bit of Progress in Language Modeling: Extended Version|publisher=Microsoft Research|location=Redmond, WA (US)|id=Technical report MSR-TR-2001-72|arxiv=cs/0108005v1 |bibcode=2001cs........8005G}}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The development of the cache language model has generated considerable interest among those concerned with [[computational linguistics]] in general and [[statistical natural language processing]] in particular: recently there has been interest in applying the cache language model in the field of statistical machine translation.&lt;ref&gt;{{cite conference | url=http://www.aclweb.org/anthology/W/W10/W10-2602.pdf | title=Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache | author=Tiedemann, Jorg | conference=Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010 | year=2010 | pages=8–15 | location=Uppsala, Sweden | publisher=Association for Computational Linguistics}}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The development of the cache language model has generated considerable interest among those concerned 
with [[computational linguistics]] in general and [[statistical natural language processing]] in particular: recently there has been interest in applying the cache language model in the field of statistical machine translation.&lt;ref&gt;{{cite conference | url=http://www.aclweb.org/anthology/W/W10/W10-2602.pdf | title=Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache | author=Tiedemann, Jorg | conference=Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010 | year=2010 | pages=8–15 | location=Uppsala, Sweden | publisher=Association for Computational Linguistics}}&lt;/ref&gt;</div></td> </tr> </table> Monkbot https://en.wikipedia.org/w/index.php?title=Cache_language_model&diff=981775146&oldid=prev WikiCleanerBot: v2.03b - Bot T20 CW#61 - WP:WCW project (Reference before punctuation) 2020-10-04T10:57:33Z <p>v2.03b - <a href="/wiki/User:WikiCleanerBot#T20" title="User:WikiCleanerBot">Bot T20 CW#61</a> - <a href="/wiki/Wikipedia:WCW" class="mw-redirect" title="Wikipedia:WCW">WP:WCW</a> project (Reference before punctuation)</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 10:57, 4 October 2020</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 12:</td> <td colspan="2" class="diff-lineno">Line 12:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Recently, the cache language model concept - originally conceived for the N-gram statistical language model paradigm - has been adapted for use in the neural paradigm. For instance, recent work on continuous cache language models in the [[recurrent neural network]] (RNN) setting has applied the cache concept to much larger contexts than before, yielding significant reductions in perplexity</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Recently, the cache language model concept - originally conceived for the N-gram statistical language model paradigm - has been adapted for use in the neural paradigm. 
For instance, recent work on continuous cache language models in the [[recurrent neural network]] (RNN) setting has applied the cache concept to much larger contexts than before, yielding significant reductions in perplexity</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>&lt;ref&gt;{{cite conference |url=https://dl.acm.org/citation.cfm?id=3295353 |title=Unbounded cache model for online language modeling with open vocabulary | author=Edouard Grave |author2= Moustapha Cisse |author3=Armand Joulin |book-title=NIPS'17 Proceedings of the 31st International Conference on Neural Information Processing Systems |year=2017 |pages=6044–6054 |location=Long Beach, California |publisher=Association for Computing Machinery |isbn=978-1-5108-6096-4}}&lt;/ref&gt;<del style="font-weight: bold; text-decoration: none;">.</del> Another recent line of research involves incorporating a cache component in a [[Feedforward neural network|feed-forward]] neural language model (FN-LM) to achieve rapid domain adaptation </div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">.</ins>&lt;ref&gt;{{cite conference |url=https://dl.acm.org/citation.cfm?id=3295353 |title=Unbounded cache model for online language modeling with open vocabulary | author=Edouard Grave |author2= Moustapha Cisse |author3=Armand Joulin |book-title=NIPS'17 Proceedings of the 31st International Conference on Neural Information Processing Systems |year=2017 |pages=6044–6054 |location=Long Beach, California |publisher=Association for Computing Machinery |isbn=978-1-5108-6096-4}}&lt;/ref&gt; Another recent line of research involves incorporating a cache component in a [[Feedforward neural network|feed-forward]] neural language model (FN-LM) to achieve rapid domain adaptation </div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>&lt;ref&gt;{{cite conference | title=i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models | author=Karel Benes|author2=Santosh Kesiraju |author3=Lukas Burget | s2cid=52192034 | conference=Interspeech 2018 | year=2018 | pages=3383–3387 | location=Hyderabad, India | publisher=Interspeech| doi=10.21437/Interspeech.2018-1070 }}&lt;/ref&gt;<del style="font-weight: bold; text-decoration: none;">.</del></div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">.</ins>&lt;ref&gt;{{cite conference | title=i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models | author=Karel Benes|author2=Santosh Kesiraju |author3=Lukas Burget | s2cid=52192034 | conference=Interspeech 2018 | year=2018 | pages=3383–3387 | location=Hyderabad, India | publisher=Interspeech| 
doi=10.21437/Interspeech.2018-1070 }}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==See also==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==See also==</div></td> </tr> </table> WikiCleanerBot https://en.wikipedia.org/w/index.php?title=Cache_language_model&diff=975866970&oldid=prev Citation bot: Add: s2cid. | You can use this bot yourself. Report bugs here. | Suggested by Amigao | Category:Computational linguistics | via #UCB_Category 2020-08-30T21:04:54Z <p>Add: s2cid. | You can <a href="/wiki/Wikipedia:UCB" class="mw-redirect" title="Wikipedia:UCB">use this bot</a> yourself. <a href="/wiki/Wikipedia:DBUG" class="mw-redirect" title="Wikipedia:DBUG">Report bugs here</a>. | Suggested by Amigao | <a href="/wiki/Category:Computational_linguistics" title="Category:Computational linguistics">Category:Computational linguistics</a> | via #UCB_Category</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 21:04, 30 August 2020</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 5:</td> <td colspan="2" class="diff-lineno">Line 5:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text subsequent instances of it would be assigned a higher probability).</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text subsequent instances of it would be assigned a higher probability).</div></td> </tr> 
<tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The cache language model was first proposed in a paper published in 1990,&lt;ref&gt;{{cite journal | last1=Kuhn | first1=R. | last2=De Mori | first2=R. | authorlink1= | authorlink2=Renato de Mori | title=A Cache-Based Natural Language Model for Speech Recognition | journal=[[IEEE Transactions on Pattern Analysis and Machine Intelligence]] | date=June 1990 | volume=12 | issue=6 | pages=570–583 | url=http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | issn=1939-3539 | doi=10.1109/34.56193 | access-date=2011-09-24 | archive-url=https://web.archive.org/web/20110901154408/http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | archive-date=2011-09-01 | url-status=dead }} ([https://www.computer.org/csdl/trans/tp/1990/06/i0570-abs.html Abstract])&lt;/ref&gt; after which the [[IBM]] speech-recognition group experimented with the concept. The group found that implementation of a form of cache language model yielded a 24% drop in [[Word error rate|word-error rates]] once the first few hundred words of a document had been dictated.&lt;ref&gt;{{cite journal | url=http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | title=A Dynamic Language Model for Speech Recognition | author=F. Jelinek|author2=B. Merialdo|author3=S. Roukos|author4=M. Strauss|lastauthoramp=y | journal=The Journal of the Acoustical Society of America | year=1991 | volume=98 | issue=2 | pages=293–295 | doi=10.3115/112405.112464 | archiveurl=https://web.archive.org/web/20060614121245/http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | archivedate=June 14, 2006 | url-status=dead}} Conference: Speech and Natural Language, Proceedings of a Workshop held at Pacific Grove, California, USA, February 19–22, 1999.&lt;/ref&gt; A detailed survey of language modeling techniques concluded that the cache language model was one of the few new language modeling techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium [[Training set|training data]] sizes".&lt;ref&gt;{{cite book|author=Joshua T. Goodman|year=2001|title=A Bit of Progress in Language Modeling: Extended Version|publisher=Microsoft Research|location=Redmond, WA (US)|id=Technical report MSR-TR-2001-72|arxiv=cs/0108005v1 |bibcode=2001cs........8005G}}&lt;/ref&gt;</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The cache language model was first proposed in a paper published in 1990,&lt;ref&gt;{{cite journal | last1=Kuhn | first1=R. | last2=De Mori | first2=R. 
| authorlink1= | authorlink2=Renato de Mori | title=A Cache-Based Natural Language Model for Speech Recognition | journal=[[IEEE Transactions on Pattern Analysis and Machine Intelligence]] | date=June 1990 | volume=12 | issue=6 | pages=570–583 | url=http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | issn=1939-3539 | doi=10.1109/34.56193 | access-date=2011-09-24 | archive-url=https://web.archive.org/web/20110901154408/http://visgraph.cs.ust.hk/biometrics/Papers/Voice/pami1990-06-01.pdf | archive-date=2011-09-01 | url-status=dead }} ([https://www.computer.org/csdl/trans/tp/1990/06/i0570-abs.html Abstract])&lt;/ref&gt; after which the [[IBM]] speech-recognition group experimented with the concept. The group found that implementation of a form of cache language model yielded a 24% drop in [[Word error rate|word-error rates]] once the first few hundred words of a document had been dictated.&lt;ref&gt;{{cite journal | url=http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | title=A Dynamic Language Model for Speech Recognition | author=F. Jelinek|author2=B. Merialdo|author3=S. Roukos|author4=M. Strauss|lastauthoramp=y | journal=The Journal of the Acoustical Society of America | year=1991 | volume=98 | issue=2 | pages=293–295 | doi=10.3115/112405.112464<ins style="font-weight: bold; text-decoration: none;"> | s2cid=11601499</ins> | archiveurl=https://web.archive.org/web/20060614121245/http://acl.ldc.upenn.edu/H/H91/H91-1057.pdf | archivedate=June 14, 2006 | url-status=dead}} Conference: Speech and Natural Language, Proceedings of a Workshop held at Pacific Grove, California, USA, February 19–22, 1999.&lt;/ref&gt; A detailed survey of language modeling techniques concluded that the cache language model was one of the few new language modeling techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium [[Training set|training data]] sizes".&lt;ref&gt;{{cite book|author=Joshua T. 
Goodman|year=2001|title=A Bit of Progress in Language Modeling: Extended Version|publisher=Microsoft Research|location=Redmond, WA (US)|id=Technical report MSR-TR-2001-72|arxiv=cs/0108005v1 |bibcode=2001cs........8005G}}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The development of the cache language model has generated considerable interest among those concerned with [[computational linguistics]] in general and [[statistical natural language processing]] in particular: recently there has been interest in applying the cache language model in the field of statistical machine translation.&lt;ref&gt;{{cite conference | url=http://www.aclweb.org/anthology/W/W10/W10-2602.pdf | title=Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache | author=Tiedemann, Jorg | conference=Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010 | year=2010 | pages=8–15 | location=Uppsala, Sweden | publisher=Association for Computational Linguistics}}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The development of the cache language model has generated considerable interest among those concerned with [[computational linguistics]] in general and [[statistical natural language processing]] in particular: recently there has been interest in applying the cache language model in the field of statistical machine translation.&lt;ref&gt;{{cite conference | url=http://www.aclweb.org/anthology/W/W10/W10-2602.pdf | title=Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache | author=Tiedemann, Jorg | conference=Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010 | year=2010 | pages=8–15 | location=Uppsala, Sweden | publisher=Association for Computational Linguistics}}&lt;/ref&gt;</div></td> </tr> </table> Citation bot
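The article text diffed in the entries above describes the core mechanism of a cache language model: a static N-gram estimate is interpolated with a cache of recently observed words so that "bursty" repetitions (e.g. "San Francisco" recurring within a document) receive higher probability. As a purely illustrative aid, here is a minimal Python sketch of that interpolation; the bigram back-off, add-one smoothing, cache size, and interpolation weight are assumptions made for the sketch, not the formulation of the 1990 Kuhn and De Mori paper or of any other work cited in the diffs.

from collections import Counter, deque

class CacheBigramLM:
    """Toy bigram model interpolated with a unigram cache of recent words.
    Illustrative sketch only; hyper-parameters are arbitrary."""

    def __init__(self, corpus, vocab, cache_size=200, lam=0.8):
        self.vocab = set(vocab)
        self.lam = lam                          # weight on the static bigram estimate
        self.cache = deque(maxlen=cache_size)   # recently dictated/decoded words
        self.bigrams = Counter(zip(corpus, corpus[1:]))
        self.unigrams = Counter(corpus)

    def p_bigram(self, prev, word):
        # add-one smoothed static bigram probability
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.vocab))

    def p_cache(self, word):
        # relative frequency of the word in the recent-history cache;
        # variants instead weight each cached occurrence by recency
        # (e.g. an exponentially decaying weight), as in the cache models
        # cited above for statistical machine translation
        return self.cache.count(word) / len(self.cache) if self.cache else 0.0

    def prob(self, prev, word):
        # linear interpolation of the static and cache components
        return self.lam * self.p_bigram(prev, word) + (1 - self.lam) * self.p_cache(word)

    def observe(self, word):
        self.cache.append(word)

Once a document's first mentions of a topic word have been passed to observe(), prob() for later occurrences of that word rises relative to the static bigram estimate alone, which is the behaviour the diffed text attributes to cache models.

The diffed sentence on continuous cache language models in the recurrent neural network setting suggests a neural analogue: recent hidden states and the words that followed them are stored, and a cache distribution built from similarity to the current hidden state is mixed with the model's softmax over the vocabulary. The sketch below is one plausible reading of that idea under assumed hyper-parameters (theta for similarity sharpness, lam for the interpolation weight); it is not a reproduction of the cited Grave et al. formulation.

import numpy as np

def continuous_cache_prob(h_t, vocab_probs, cache_states, cache_words, vocab_size,
                          theta=0.3, lam=0.9):
    """Mix a model's softmax distribution with a cache distribution built from
    dot-product similarity to recently stored hidden states (illustrative sketch)."""
    p = lam * np.asarray(vocab_probs, dtype=float)
    if cache_states:
        sims = np.array([theta * np.dot(h_t, h_i) for h_i in cache_states])
        weights = np.exp(sims - sims.max())          # softmax over cached positions
        weights /= weights.sum()
        cache_dist = np.zeros(vocab_size)
        for w, wt in zip(cache_words, weights):
            cache_dist[w] += wt                      # mass flows to recently seen word ids
        p = p + (1 - lam) * cache_dist
    return p / p.sum()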