Large language model - Revision history
https://en.wikipedia.org/w/index.php?action=history&feed=atom&title=Large_language_model
Revision history for this page on the wiki. Feed generated 2025-06-06T01:39:54Z by MediaWiki 1.45.0-wmf.4.

Revision as of 13:02, 5 June 2025 (2025-06-05T13:02:43Z) by Alenoach
https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1294079396&oldid=prev
Summary: Undid revision 1294041586 by 2A00:23C6:54FE:FB01:39CF:EA01:B980:C39D (talk): "machine learning model" is more precise than "computational model", and "designed to mimic the human ability to ..." isn't fully accurate

Diff against previous revision, Line 5:
− A '''large language model''' ('''LLM''') is a [[Model#Conceptual model|computational model]] designed to mimic the human ability to [[Natural language generation|generate]] and [[natural language processing|process natural language]]. LLMs are [[Language model|language models]] with many parameters, based on [[machine learning]], and are trained with [[self-supervised learning]] on a vast amount of text.
+ A '''large language model''' ('''LLM''') is a [[machine learning]] model designed for [[natural language processing]] tasks, especially [[Natural language generation|language generation]]. LLMs are [[Language model|language models]] with many parameters, and are trained with [[self-supervised learning]] on a vast amount of text.

Unchanged context (identical in both revisions):

The largest and most capable LLMs are [[Generative pre-trained transformer|generative pretrained transformers]] (GPTs), which are largely used in [[Generative artificial intelligence|generative]] [[Chatbot|chatbots]] such as [[ChatGPT]] or [[Gemini (chatbot)|Gemini]]. LLMs can be [[Fine-tuning (deep learning)|fine-tuned]] for specific tasks or guided by [[prompt engineering]].<ref name="few-shot-learners">{{cite journal |last1=Brown |first1=Tom B. |last2=Mann |first2=Benjamin |last3=Ryder |first3=Nick |last4=Subbiah |first4=Melanie |last5=Kaplan |first5=Jared |last6=Dhariwal |first6=Prafulla |last7=Neelakantan |first7=Arvind |last8=Shyam |first8=Pranav |last9=Sastry |first9=Girish |last10=Askell |first10=Amanda |last11=Agarwal |first11=Sandhini |last12=Herbert-Voss |first12=Ariel |last13=Krueger |first13=Gretchen |last14=Henighan |first14=Tom |last15=Child |first15=Rewon |last16=Ramesh |first16=Aditya |last17=Ziegler |first17=Daniel M. |last18=Wu |first18=Jeffrey |last19=Winter |first19=Clemens |last20=Hesse |first20=Christopher |last21=Chen |first21=Mark |last22=Sigler |first22=Eric |last23=Litwin |first23=Mateusz |last24=Gray |first24=Scott |last25=Chess |first25=Benjamin |last26=Clark |first26=Jack |last27=Berner |first27=Christopher |last28=McCandlish |first28=Sam |last29=Radford |first29=Alec |last30=Sutskever |first30=Ilya |last31=Amodei |first31=Dario |date=Dec 2020 |editor1-last=Larochelle |editor1-first=H. |editor2-last=Ranzato |editor2-first=M. |editor3-last=Hadsell |editor3-first=R. |editor4-last=Balcan |editor4-first=M.F. |editor5-last=Lin |editor5-first=H. |title=Language Models are Few-Shot Learners |url=https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=33 |pages=1877–1901 |access-date=2023-03-14 |archive-date=2023-11-17 |archive-url=https://web.archive.org/web/20231117204007/https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf |url-status=live }}</ref> These models acquire [[Predictive learning|predictive power]] regarding [[syntax]], [[semantics]], and [[ontology (information science)|ontologies]]<ref>{{cite conference |url=https://2024.eswc-conferences.org/wp-content/uploads/2024/05/77770034.pdf |title=NeOn-GPT: A Large Language Model-Powered Pipeline for Ontology Learning |last1=Fathallah |first1=Nadeen |last2=Das |first2=Arunav |last3=De Giorgis |first3=Stefano |last4=Poltronieri |first4=Andrea |last5=Haase |first5=Peter |last6=Kovriguina |first6=Liubov |date=2024-05-26 |location=Hersonissos, Greece |conference=Extended Semantic Web Conference 2024}}</ref> inherent in human [[Text corpus|language corpora]], but they also inherit inaccuracies and [[Algorithmic bias|biases]] present in the [[Training, validation, and test data sets|data]] they are trained in.<ref name="Manning-2022">{{cite journal |last=Manning |first=Christopher D. |author-link=Christopher D. Manning |year=2022 |title=Human Language Understanding & Reasoning |url=https://www.amacad.org/publication/human-language-understanding-reasoning |journal=Daedalus |volume=151 |issue=2 |pages=127–138 |doi=10.1162/daed_a_01905 |s2cid=248377870 |doi-access=free |access-date=2023-03-09 |archive-date=2023-11-17 |archive-url=https://web.archive.org/web/20231117205531/https://www.amacad.org/publication/human-language-understanding-reasoning |url-status=live }}</ref>

Revision as of 06:12, 5 June 2025 (2025-06-05T06:12:25Z) by 2A00:23C6:54FE:FB01:39CF:EA01:B980:C39D
https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1294041586&oldid=prev
Summary: As per request, I have reworded the first sentence for the layman to understand. Further work is needed: explaining the difference between a computer program, a computer algorithm, and a computer model.

Diff against previous revision, Line 5 (the exact inverse of the revert above):
− A '''large language model''' ('''LLM''') is a [[machine learning]] model designed for [[natural language processing]] tasks, especially [[Natural language generation|language generation]]. LLMs are [[Language model|language models]] with many parameters, and are trained with [[self-supervised learning]] on a vast amount of text.
+ A '''large language model''' ('''LLM''') is a [[Model#Conceptual model|computational model]] designed to mimic the human ability to [[Natural language generation|generate]] and [[natural language processing|process natural language]]. LLMs are [[Language model|language models]] with many parameters, based on [[machine learning]], and are trained with [[self-supervised learning]] on a vast amount of text.

Unchanged context: the same paragraph on GPTs, chatbots, fine-tuning, and inherited biases as in the revision above.

Revision as of 20:59, 4 June 2025 (2025-06-04T20:59:30Z) by Citation bot
https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1293979801&oldid=prev
Summary: Altered title. Add: pages, issue, volume. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox2 | #UCB_webform_linked 287/904
Diff against previous revision, Line 120 ("Tool use" section). The only change is typographic: the curly apostrophe in the cited VentureBeat title ("that’s") became a straight apostrophe ("that's"). The paragraph otherwise reads, in both revisions:

Tool use is a mechanism that enables LLMs to interact with external systems, applications, or data sources. It can allow for example to fetch real-time information from an API or to execute code.<ref>{{Cite web |last=Dickson |first=Ben |date=2025-04-02 |title=The tool integration problem that's holding back enterprise AI (and how CoTools solves it) |url=https://venturebeat.com/ai/the-tool-integration-problem-thats-holding-back-enterprise-ai-and-how-cotools-solves-it/ |access-date=2025-05-26 |website=VentureBeat |language=en-US}}</ref> Generally, in order to get an LLM to use tools, one must fine-tune it for tool use. If the number of tools is finite, then fine-tuning may be done just once. If the number of tools can grow arbitrarily, as with online [[API]] services, then the LLM can be fine-tuned to be able to read API documentation and call API correctly.<ref name="lLrda">{{Cite arXiv |eprint=2303.16434 |class=cs.AI |first1=Yaobo |last1=Liang |first2=Chenfei |last2=Wu |title=TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs |date=2023-03-01 |last3=Song |first3=Ting |last4=Wu |first4=Wenshan |last5=Xia |first5=Yan |last6=Liu |first6=Yu |last7=Ou |first7=Yang |last8=Lu |first8=Shuai |last9=Ji |first9=Lei |last10=Mao |first10=Shaoguang |last11=Wang |first11=Yun |last12=Shou |first12=Linjun |last13=Gong |first13=Ming |last14=Duan |first14=Nan}}</ref><ref name="4Xzrs">{{Cite arXiv |last1=Patil |first1=Shishir G. |last2=Zhang |first2=Tianjun |last3=Wang |first3=Xin |last4=Gonzalez |first4=Joseph E. |date=2023-05-01 |title=Gorilla: Large Language Model Connected with Massive APIs |class=cs.CL |eprint=2305.15334}}</ref>
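For orientation, the loop that paragraph describes can be sketched in a few lines of Python. This is an illustration only, not taken from the article or the cited papers: call_llm, the tool registry, and the JSON call format are hypothetical stand-ins for a tool-use fine-tuned model and its calling convention.

import json

# Hypothetical finite tool registry; each tool wraps an external system.
def get_weather(city: str) -> str:
    # Stand-in for fetching real-time information from an API.
    return f"18 degrees C and sunny in {city}"

TOOLS = {"get_weather": get_weather}

def call_llm(prompt: str) -> str:
    # Stand-in for a tool-use fine-tuned LLM: given the prompt and tool
    # descriptions, it would emit a structured tool call like this one.
    return json.dumps({"tool": "get_weather", "arguments": {"city": "Paris"}})

def answer_with_tools(user_prompt: str) -> str:
    call = json.loads(call_llm(user_prompt))           # parse the model's tool call
    result = TOOLS[call["tool"]](**call["arguments"])  # execute against the external system
    # A real loop would feed the result back to the model for a final answer;
    # returning it directly keeps the sketch self-contained.
    return result

print(answer_with_tools("What's the weather in Paris right now?"))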
Unchanged context (identical in both revisions):

[[Retrieval-augmented generation]] (RAG) is another approach that enhances LLMs by integrating them with [[document retrieval]] systems. Given a query, a document retriever is called to retrieve the most relevant documents. This is usually done by encoding the query and the documents into vectors, then finding the documents with vectors (usually stored in a [[vector database]]) most similar to the vector of the query. The LLM then generates an output based on both the query and context included from the retrieved documents.<ref name="BUZBP">{{Cite journal |last1=Lewis |first1=Patrick |last2=Perez |first2=Ethan |last3=Piktus |first3=Aleksandra |last4=Petroni |first4=Fabio |last5=Karpukhin |first5=Vladimir |last6=Goyal |first6=Naman |last7=Küttler |first7=Heinrich |last8=Lewis |first8=Mike |last9=Yih |first9=Wen-tau |last10=Rocktäschel |first10=Tim |last11=Riedel |first11=Sebastian |last12=Kiela |first12=Douwe |date=2020 |title=Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks |url=https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=33 |pages=9459–9474 |arxiv=2005.11401 |access-date=2023-06-12 |archive-date=2023-06-12 |archive-url=https://web.archive.org/web/20230612171229/https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html |url-status=live }}</ref>
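A minimal sketch of that retrieve-then-generate flow, again illustrative rather than any system's actual implementation: the bag-of-words "embedding" stands in for a learned text encoder, and the sorted-by-similarity lookup stands in for a vector database query.

import math
from collections import Counter

DOCUMENTS = [
    "The Sharks have advanced to the Stanley Cup finals once, losing in 2016.",
    "Retrieval-augmented generation pairs a language model with a retriever.",
    "Large language models are trained on vast amounts of text.",
]

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; real systems use a learned encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Stand-in for a vector-database similarity search.
    q = embed(query)
    return sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

query = "When did the Sharks reach the Stanley Cup finals?"
context = "\n".join(retrieve(query))
# The LLM would then generate an answer from the query plus retrieved context:
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)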
Second hunk, Line 268. Unchanged context (identical in both revisions):

A question answering benchmark is termed "open book" if the model's prompt includes text from which the expected answer can be derived (for example, the previous question could be combined with text that includes the sentence "The Sharks have advanced to the Stanley Cup finals once, losing to the Pittsburgh Penguins in 2016."<ref name="boolq" />). Otherwise, the task is considered "closed book", and the model must draw solely on its training.<ref name="survey">{{cite arXiv |eprint=2303.18223 |class=cs.CL |author1=Wayne Xin Zhao |first2=Kun |last2=Zhou |title=A Survey of Large Language Models |last3=Li |first3=Junyi |last4=Tang |first4=Tianyi |last5=Wang |first5=Xiaolei |last6=Hou |first6=Yupeng |last7=Min |first7=Yingqian |last8=Zhang |first8=Beichen |last9=Zhang |first9=Junjie |last10=Dong |first10=Zican |last11=Du |first11=Yifan |last12=Yang |first12=Chen |last13=Chen |first13=Yushuo |last14=Chen |first14=Zhipeng |last15=Jiang |first15=Jinhao |last16=Ren |first16=Ruiyang |last17=Li |first17=Yifan |last18=Tang |first18=Xinyu |last19=Liu |first19=Zikang |last20=Liu |first20=Peiyu |last21=Nie |first21=Jian-Yun |last22=Wen |first22=Ji-Rong |year=2023}}</ref> Examples include GLUE, SuperGLUE, [[MMLU]], BIG-bench, HELM, and [[HLE (Humanity's Last Exam)]].<ref name="Huyen" /><ref name="survey" />
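To make the distinction concrete, the two prompt formats can be contrasted directly, reusing the example sentence from the paragraph above. This is an illustrative sketch, not the actual format of any named benchmark.

question = "Have the San Jose Sharks won the Stanley Cup?"
passage = ("The Sharks have advanced to the Stanley Cup finals once, "
           "losing to the Pittsburgh Penguins in 2016.")

# Closed book: the model must answer from its training data alone.
closed_book_prompt = f"Question: {question}\nAnswer:"

# Open book: the prompt supplies text from which the answer can be derived.
open_book_prompt = f"Passage: {passage}\n\nQuestion: {question}\nAnswer:"

print(closed_book_prompt)
print(open_book_prompt)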
Changed line: the Parity Benchmark citation gained |volume=5 |issue=3 |pages=3087–3101 (the addition noted in the bot's summary); the paragraph is otherwise identical in both revisions:

LLM bias may be assessed through benchmarks such as CrowS-Pairs (Crowdsourced Stereotype Pairs),<ref>{{cite conference |author=Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R. |date=November 2020 |title=CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models |url=https://aclanthology.org/2020.emnlp-main.154/ |publisher=Association for Computational Linguistics |pages=1953–1967 |arxiv=2010.00133 |doi=10.18653/v1/2020.emnlp-main.154 |editor=Webber, Bonnie and Cohn, Trevor and He, Yulan and Liu, Yang |book-title=Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)}}</ref> StereoSet,<ref>{{cite conference |author=Nadeem, Moin and Bethke, Anna and Reddy, Siva |date=August 2021 |title=StereoSet: Measuring stereotypical bias in pretrained language models |url=https://aclanthology.org/2021.acl-long.416/ |publisher=Association for Computational Linguistics |pages=5356–5371 |arxiv=2004.09456 |doi=10.18653/v1/2021.acl-long.416 |editor=Zong, Chengqing and Xia, Fei and Li, Wenjie and Navigli, Roberto |book-title=Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)}}</ref> and Parity Benchmark.<ref>{{cite journal |author=Simpson, Shmona and Nukpezah, Jonathan and Kie Brooks and Pandya, Raaghav |date=17 December 2024 |title=Parity benchmark for measuring bias in LLMs |journal=AI and Ethics |volume=5 |issue=3 |pages=3087–3101 |publisher=Springer |doi=10.1007/s43681-024-00613-4 |doi-access=free}}</ref>
Unchanged context (identical in both revisions):

Fact-checking and misinformation detection benchmarks are available. A 2023 study compared the fact-checking accuracy of LLMs including ChatGPT 3.5 and 4.0, Bard, and Bing AI against independent fact-checkers such as PolitiFact and Snopes. The results demonstrated moderate proficiency, with GPT-4 achieving the highest accuracy at 71%, lagging behind human fact-checkers.<ref>{{Cite book |last=Caramancion |first=Kevin Matthe |url=https://ieeexplore.ieee.org/document/10520446 |title=2023 IEEE Future Networks World Forum (FNWF) |date=2023-11-13 |publisher=IEEE |isbn=979-8-3503-2458-7 |pages=1–6 |chapter=News Verifiers Showdown: A Comparative Performance Evaluation of ChatGPT 3.5, ChatGPT 4.0, Bing AI, and Bard in News Fact-Checking |doi=10.1109/FNWF58287.2023.10520446 |arxiv=2306.17176}}</ref>

Revision as of 03:28, 2 June 2025 (2025-06-02T03:28:22Z) by OAbot
https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1293507989&oldid=prev
Summary: Open access bot: doi updated in citation with #oabot.

Diff against previous revision, Line 202 (a list of tasks). The Pilehvar & Camacho-Collados citation in the word-in-context item gained |doi-access=free; the surrounding items and the rest of the line are identical in both revisions:

* decoding the [[International Phonetic Alphabet]]
* unscrambling a word's letters
* disambiguating word-in-context datasets<ref name="emergentpaper" /><ref name="57FEA">{{Cite journal |last1=Pilehvar |first1=Mohammad Taher |last2=Camacho-Collados |first2=Jose |title=Proceedings of the 2019 Conference of the North |date=June 2019 |url=https://aclanthology.org/N19-1128 |journal=Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) |location=Minneapolis, Minnesota |publisher=Association for Computational Linguistics |pages=1267–1273 |doi=10.18653/v1/N19-1128 |s2cid=102353817 |access-date=2023-06-27 |archive-date=2023-06-27 |archive-url=https://web.archive.org/web/20230627202732/https://aclanthology.org/N19-1128/ |url-status=live |url-access=subscription |doi-access=free }}</ref><ref name="TEIkA">{{Cite web |title=WiC: The Word-in-Context Dataset |url=https://pilehvar.github.io/wic/ |access-date=2023-06-27 |website=pilehvar.github.io |archive-date=2023-06-27 |archive-url=https://web.archive.org/web/20230627202725/https://pilehvar.github.io/wic/ |url-status=live }}</ref>
* converting spatial words
* [[cardinal direction]]s (for example, replying "northeast" in response to a 3x3 grid of 8 zeros and a 1 in the top-right), color terms represented in text.<ref name="zgy1i">{{Cite journal |last1=Patel |first1=Roma |last2=Pavlick |first2=Ellie |date=2021-10-06 |title=Mapping Language Models to Grounded Conceptual Spaces |url=https://openreview.net/forum?id=gJcEM8sxHK |journal=ICLR |access-date=2023-06-27 |archive-date=2023-06-24 |archive-url=https://web.archive.org/web/20230624191940/https://openreview.net/forum?id=gJcEM8sxHK |url-status=live }}</ref>

Second hunk, Line 298 ("Wider impact" section). The "Prepare for truly useful large language models" citation gained |doi-access=free; the paragraph is otherwise identical in both revisions:
It is a rather safe bet that they will change many industries over time."&lt;ref name="ZDTUM"&gt;{{cite journal |date=7 March 2023 |title=Prepare for truly useful large language models |journal=Nature Biomedical Engineering |volume=7 |issue=2 |pages=85–86 |doi=10.1038/s41551-023-01012-6 |pmid=36882584 |s2cid=257403466}}&lt;/ref&gt; [[Goldman Sachs]] suggested in 2023 that generative language AI could increase global GDP by 7% in the next ten years, and could expose to automation 300 million jobs globally.&lt;ref name="81w7x"&gt;{{cite news |date=7 May 2023 |title=Your job is (probably) safe from artificial intelligence |newspaper=The Economist |url=https://www.economist.com/finance-and-economics/2023/05/07/your-job-is-probably-safe-from-artificial-intelligence |access-date=18 June 2023 |archive-date=17 June 2023 |archive-url=https://web.archive.org/web/20230617225618/https://www.economist.com/finance-and-economics/2023/05/07/your-job-is-probably-safe-from-artificial-intelligence |url-status=live }}&lt;/ref&gt;&lt;ref name="zIM6Y"&gt;{{cite web |title=Generative AI Could Raise Global GDP by 7% |url=https://www.goldmansachs.com/intelligence/pages/generative-ai-could-raise-global-gdp-by-7-percent.html |access-date=18 June 2023 |website=Goldman Sachs |archive-date=18 June 2023 |archive-url=https://web.archive.org/web/20230618013836/https://www.goldmansachs.com/intelligence/pages/generative-ai-could-raise-global-gdp-by-7-percent.html |url-status=live }}&lt;/ref&gt; Brinkmann et al. (2023)&lt;ref&gt;{{Cite journal |last1=Brinkmann |first1=Levin |last2=Baumann |first2=Fabian |last3=Bonnefon |first3=Jean-François |last4=Derex |first4=Maxime |last5=Müller |first5=Thomas F. |last6=Nussberger |first6=Anne-Marie |last7=Czaplicka |first7=Agnieszka |last8=Acerbi |first8=Alberto |last9=Griffiths |first9=Thomas L. |last10=Henrich |first10=Joseph |last11=Leibo |first11=Joel Z. |last12=McElreath |first12=Richard |last13=Oudeyer |first13=Pierre-Yves |last14=Stray |first14=Jonathan |last15=Rahwan |first15=Iyad |date=2023-11-20 |title=Machine culture |url=https://www.nature.com/articles/s41562-023-01742-2 |journal=Nature Human Behaviour |language=en |volume=7 |issue=11 |pages=1855–1868 |doi=10.1038/s41562-023-01742-2 |pmid=37985914 |issn=2397-3374|arxiv=2311.11388 }}&lt;/ref&gt; also argue that LLMs are transforming processes of [[cultural evolution]] by shaping processes of variation, transmission, and selection.</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In 2023, ''[[Nature Biomedical Engineering]]'' wrote that "it is no longer possible to accurately distinguish" human-written text from text created by large language models, and that "It is all but certain that general-purpose large language models will rapidly proliferate... 
It is a rather safe bet that they will change many industries over time."&lt;ref name="ZDTUM"&gt;{{cite journal |date=7 March 2023 |title=Prepare for truly useful large language models |journal=Nature Biomedical Engineering |volume=7 |issue=2 |pages=85–86 |doi=10.1038/s41551-023-01012-6 |pmid=36882584 |s2cid=257403466<ins style="font-weight: bold; text-decoration: none;">|doi-access=free </ins>}}&lt;/ref&gt; [[Goldman Sachs]] suggested in 2023 that generative language AI could increase global GDP by 7% in the next ten years, and could expose 300 million jobs globally to automation.&lt;ref name="81w7x"&gt;{{cite news |date=7 May 2023 |title=Your job is (probably) safe from artificial intelligence |newspaper=The Economist |url=https://www.economist.com/finance-and-economics/2023/05/07/your-job-is-probably-safe-from-artificial-intelligence |access-date=18 June 2023 |archive-date=17 June 2023 |archive-url=https://web.archive.org/web/20230617225618/https://www.economist.com/finance-and-economics/2023/05/07/your-job-is-probably-safe-from-artificial-intelligence |url-status=live }}&lt;/ref&gt;&lt;ref name="zIM6Y"&gt;{{cite web |title=Generative AI Could Raise Global GDP by 7% |url=https://www.goldmansachs.com/intelligence/pages/generative-ai-could-raise-global-gdp-by-7-percent.html |access-date=18 June 2023 |website=Goldman Sachs |archive-date=18 June 2023 |archive-url=https://web.archive.org/web/20230618013836/https://www.goldmansachs.com/intelligence/pages/generative-ai-could-raise-global-gdp-by-7-percent.html |url-status=live }}&lt;/ref&gt; Brinkmann et al. (2023)&lt;ref&gt;{{Cite journal |last1=Brinkmann |first1=Levin |last2=Baumann |first2=Fabian |last3=Bonnefon |first3=Jean-François |last4=Derex |first4=Maxime |last5=Müller |first5=Thomas F. |last6=Nussberger |first6=Anne-Marie |last7=Czaplicka |first7=Agnieszka |last8=Acerbi |first8=Alberto |last9=Griffiths |first9=Thomas L. |last10=Henrich |first10=Joseph |last11=Leibo |first11=Joel Z. 
|last12=McElreath |first12=Richard |last13=Oudeyer |first13=Pierre-Yves |last14=Stray |first14=Jonathan |last15=Rahwan |first15=Iyad |date=2023-11-20 |title=Machine culture |url=https://www.nature.com/articles/s41562-023-01742-2 |journal=Nature Human Behaviour |language=en |volume=7 |issue=11 |pages=1855–1868 |doi=10.1038/s41562-023-01742-2 |pmid=37985914 |issn=2397-3374|arxiv=2311.11388 }}&lt;/ref&gt; also argue that LLMs are transforming processes of [[cultural evolution]] by shaping processes of variation, transmission, and selection.</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>=== Memorization and copyright ===</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>=== Memorization and copyright ===</div></td> </tr> </table> OAbot https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1293483271&oldid=prev MrGoodEgg: /* History */ 2025-06-02T00:05:39Z <p><span class="autocomment">History</span></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 00:05, 2 June 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 17:</td> <td colspan="2" class="diff-lineno">Line 17:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>At the 2017 [[NeurIPS]] conference, Google researchers introduced the transformer architecture in their landmark paper "[[Attention Is All You Need]]". This paper's goal was to improve upon 2014 seq2seq technology,&lt;ref&gt;{{cite journal |last1=Vaswani |first1=Ashish |author1-link=Ashish Vaswani |last2=Shazeer |first2=Noam |last3=Parmar |first3=Niki |last4=Uszkoreit |first4=Jakob |last5=Jones |first5=Llion |last6=Gomez |first6=Aidan N |author6-link=Aidan Gomez |last7=Kaiser |first7=Łukasz |last8=Polosukhin |first8=Illia |title=Attention is All you Need |journal=Advances in Neural Information Processing Systems |date=2017 |volume=30 |url=https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf |publisher=Curran Associates, Inc. 
|access-date=2024-01-21 |archive-date=2024-02-21 |archive-url=https://web.archive.org/web/20240221141113/https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf |url-status=live }}&lt;/ref&gt; and was based mainly on the [[attention (machine learning)|attention]] mechanism developed by Bahdanau et al. in 2014.&lt;ref&gt;{{cite arXiv |last1=Bahdanau |first1=Dzmitry |last2=Cho |first2=Kyunghyun |last3=Bengio |first3=Yoshua |title=Neural Machine Translation by Jointly Learning to Align and Translate |date=2014 |class=cs.CL |eprint=1409.0473}}&lt;/ref&gt; The following year in 2018, [[BERT (language model)|BERT]] was introduced and quickly became "ubiquitous".&lt;ref&gt;{{Cite journal|last1=Rogers|first1=Anna|last2=Kovaleva|first2=Olga|last3=Rumshisky|first3=Anna|date=2020|title=A Primer in BERTology: What We Know About How BERT Works|url=https://aclanthology.org/2020.tacl-1.54|journal=Transactions of the Association for Computational Linguistics|volume=8|pages=842–866|doi=10.1162/tacl_a_00349|arxiv=2002.12327|s2cid=211532403|access-date=2024-01-21|archive-date=2022-04-03|archive-url=https://web.archive.org/web/20220403103310/https://aclanthology.org/2020.tacl-1.54/|url-status=live}}&lt;/ref&gt; Though the original transformer has both encoder and decoder blocks, BERT is an encoder-only model. Academic and research usage of BERT began to decline in 2023, following rapid improvements in the abilities of decoder-only models (such as GPT) to solve tasks via [[Prompt engineering|prompting]].&lt;ref name="auto"&gt;{{Cite book|last1=Movva|first1=Rajiv|last2=Balachandar|first2=Sidhika|last3=Peng|first3=Kenny|last4=Agostini|first4=Gabriel|last5=Garg|first5=Nikhil|last6=Pierson|first6=Emma|chapter=Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers |date=2024|title=Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)|chapter-url=https://aclanthology.org/2024.naacl-long.67|volume=|pages=1223–1243|doi=10.18653/v1/2024.naacl-long.67|arxiv=2307.10700 |access-date=2024-12-08}}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>At the 2017 [[NeurIPS]] conference, Google researchers introduced the transformer architecture in their landmark paper "[[Attention Is All You Need]]". This paper's goal was to improve upon 2014 seq2seq technology,&lt;ref&gt;{{cite journal |last1=Vaswani |first1=Ashish |author1-link=Ashish Vaswani |last2=Shazeer |first2=Noam |last3=Parmar |first3=Niki |last4=Uszkoreit |first4=Jakob |last5=Jones |first5=Llion |last6=Gomez |first6=Aidan N |author6-link=Aidan Gomez |last7=Kaiser |first7=Łukasz |last8=Polosukhin |first8=Illia |title=Attention is All you Need |journal=Advances in Neural Information Processing Systems |date=2017 |volume=30 |url=https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf |publisher=Curran Associates, Inc. 
|access-date=2024-01-21 |archive-date=2024-02-21 |archive-url=https://web.archive.org/web/20240221141113/https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf |url-status=live }}&lt;/ref&gt; and was based mainly on the [[attention (machine learning)|attention]] mechanism developed by Bahdanau et al. in 2014.&lt;ref&gt;{{cite arXiv |last1=Bahdanau |first1=Dzmitry |last2=Cho |first2=Kyunghyun |last3=Bengio |first3=Yoshua |title=Neural Machine Translation by Jointly Learning to Align and Translate |date=2014 |class=cs.CL |eprint=1409.0473}}&lt;/ref&gt; The following year in 2018, [[BERT (language model)|BERT]] was introduced and quickly became "ubiquitous".&lt;ref&gt;{{Cite journal|last1=Rogers|first1=Anna|last2=Kovaleva|first2=Olga|last3=Rumshisky|first3=Anna|date=2020|title=A Primer in BERTology: What We Know About How BERT Works|url=https://aclanthology.org/2020.tacl-1.54|journal=Transactions of the Association for Computational Linguistics|volume=8|pages=842–866|doi=10.1162/tacl_a_00349|arxiv=2002.12327|s2cid=211532403|access-date=2024-01-21|archive-date=2022-04-03|archive-url=https://web.archive.org/web/20220403103310/https://aclanthology.org/2020.tacl-1.54/|url-status=live}}&lt;/ref&gt; Though the original transformer has both encoder and decoder blocks, BERT is an encoder-only model. Academic and research usage of BERT began to decline in 2023, following rapid improvements in the abilities of decoder-only models (such as GPT) to solve tasks via [[Prompt engineering|prompting]].&lt;ref name="auto"&gt;{{Cite book|last1=Movva|first1=Rajiv|last2=Balachandar|first2=Sidhika|last3=Peng|first3=Kenny|last4=Agostini|first4=Gabriel|last5=Garg|first5=Nikhil|last6=Pierson|first6=Emma|chapter=Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers |date=2024|title=Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)|chapter-url=https://aclanthology.org/2024.naacl-long.67|volume=|pages=1223–1243|doi=10.18653/v1/2024.naacl-long.67|arxiv=2307.10700 |access-date=2024-12-08}}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Although decoder-only [[GPT-1]] was introduced in 2018, it was [[GPT-2]] in 2019 that caught widespread attention because [[OpenAI]] at first deemed it too powerful to release publicly, out of fear of malicious use.&lt;ref&gt;{{cite web |url=https://www.theguardian.com/technology/2019/feb/14/elon-musk-backed-ai-writes-convincing-news-fiction |title=New AI fake text generator may be too dangerous to release, say creators |last=Hern |first=Alex |work=[[The Guardian]] |date=14 February 2019 |access-date=20 January 2024 |archive-date=14 February 2019 
|archive-url=https://web.archive.org/web/20190214173112/https://www.theguardian.com/technology/2019/feb/14/elon-musk-backed-ai-writes-convincing-news-fiction |url-status=live }}&lt;/ref&gt; [[GPT-3]] in 2020 went a step further and {{as of|<del style="font-weight: bold; text-decoration: none;">2024</del>|lc=y}} is available only via [[Web API|API]] with no option to download the model for local execution. But it was the 2022 consumer-facing browser-based [[ChatGPT]] that captured the imagination of the general public and caused some media hype and online buzz.&lt;ref&gt;{{cite web |url=https://www.euronews.com/next/2023/11/30/chatgpt-a-year-on-3-ways-the-ai-chatbot-has-completely-changed-the-world-in-12-months |title=ChatGPT a year on: 3 ways the AI chatbot has completely changed the world in 12 months |author=&lt;!--Not stated--&gt; |date=November 30, 2023 |publisher=[[Euronews]] |access-date=January 20, 2024 |archive-date=January 14, 2024 |archive-url=https://web.archive.org/web/20240114025250/https://www.euronews.com/next/2023/11/30/chatgpt-a-year-on-3-ways-the-ai-chatbot-has-completely-changed-the-world-in-12-months |url-status=live }}&lt;/ref&gt; The 2023 [[GPT-4]] was praised for its increased accuracy and as a "holy grail" for its [[Multimodal learning|multimodal]] capabilities.&lt;ref&gt;{{cite web |url=https://www.technologyreview.com/2023/03/14/1069823/gpt-4-is-bigger-and-better-chatgpt-openai/ |title=GPT-4 is bigger and better than ChatGPT—but OpenAI won't say why |last=Heaven |first=Will |date=March 14, 2023 |publisher=[[MIT Technology Review]] |access-date=January 20, 2024 |archive-date=March 17, 2023 |archive-url=https://web.archive.org/web/20230317224201/https://www.technologyreview.com/2023/03/14/1069823/gpt-4-is-bigger-and-better-chatgpt-openai/ |url-status=live }}&lt;/ref&gt; OpenAI did not reveal the high-level architecture and the number of [[Parameter#Artificial intelligence|parameters]] of GPT-4. The release of ChatGPT led to an uptick in LLM usage across several research subfields of computer science, including robotics, software engineering, and societal impact work.&lt;ref name="auto"/&gt; In 2024, OpenAI released the reasoning model [[OpenAI o1]], which generates long chains of thought before returning a final answer.</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Although decoder-only [[GPT-1]] was introduced in 2018, it was [[GPT-2]] in 2019 that caught widespread attention because [[OpenAI]] at first deemed it too powerful to release publicly, out of fear of malicious use.&lt;ref&gt;{{cite web |url=https://www.theguardian.com/technology/2019/feb/14/elon-musk-backed-ai-writes-convincing-news-fiction |title=New AI fake text generator may be too dangerous to release, say creators |last=Hern |first=Alex |work=[[The Guardian]] |date=14 February 2019 |access-date=20 January 2024 |archive-date=14 February 2019 |archive-url=https://web.archive.org/web/20190214173112/https://www.theguardian.com/technology/2019/feb/14/elon-musk-backed-ai-writes-convincing-news-fiction |url-status=live }}&lt;/ref&gt; [[GPT-3]] in 2020 went a step further and {{as of|<ins style="font-weight: bold; text-decoration: none;">2025</ins>|lc=y}} is available only via [[Web API|API]] with no option to download the model for local execution. 
But it was the 2022 consumer-facing browser-based [[ChatGPT]] that captured the imagination of the general public and caused some media hype and online buzz.&lt;ref&gt;{{cite web |url=https://www.euronews.com/next/2023/11/30/chatgpt-a-year-on-3-ways-the-ai-chatbot-has-completely-changed-the-world-in-12-months |title=ChatGPT a year on: 3 ways the AI chatbot has completely changed the world in 12 months |author=&lt;!--Not stated--&gt; |date=November 30, 2023 |publisher=[[Euronews]] |access-date=January 20, 2024 |archive-date=January 14, 2024 |archive-url=https://web.archive.org/web/20240114025250/https://www.euronews.com/next/2023/11/30/chatgpt-a-year-on-3-ways-the-ai-chatbot-has-completely-changed-the-world-in-12-months |url-status=live }}&lt;/ref&gt; The 2023 [[GPT-4]] was praised for its increased accuracy and as a "holy grail" for its [[Multimodal learning|multimodal]] capabilities.&lt;ref&gt;{{cite web |url=https://www.technologyreview.com/2023/03/14/1069823/gpt-4-is-bigger-and-better-chatgpt-openai/ |title=GPT-4 is bigger and better than ChatGPT—but OpenAI won't say why |last=Heaven |first=Will |date=March 14, 2023 |publisher=[[MIT Technology Review]] |access-date=January 20, 2024 |archive-date=March 17, 2023 |archive-url=https://web.archive.org/web/20230317224201/https://www.technologyreview.com/2023/03/14/1069823/gpt-4-is-bigger-and-better-chatgpt-openai/ |url-status=live }}&lt;/ref&gt; OpenAI did not reveal the high-level architecture and the number of [[Parameter#Artificial intelligence|parameters]] of GPT-4. The release of ChatGPT led to an uptick in LLM usage across several research subfields of computer science, including robotics, software engineering, and societal impact work.&lt;ref name="auto"/&gt; In 2024, OpenAI released the reasoning model [[OpenAI o1]], which generates long chains of thought before returning a final answer.</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Competing language models have for the most part been attempting to equal the GPT series, at least in terms of number of parameters.&lt;ref&gt;{{cite web 
|url=https://ourworldindata.org/grapher/artificial-intelligence-parameter-count?time=2017-09-05..latest |title=Parameters in notable artificial intelligence systems |author=&lt;!--Not stated--&gt; |date=November 30, 2023 |website=ourworldindata.org |access-date=January 20, 2024}}&lt;/ref&gt;</div></td> </tr> </table> MrGoodEgg https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1293177504&oldid=prev 75.84.221.176: Grammar 2025-05-31T05:10:48Z <p>Grammar</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 05:10, 31 May 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 75:</td> <td colspan="2" class="diff-lineno">Line 75:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Training and architecture ==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Training and architecture ==</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>{{See also|Fine-tuning (machine learning)}}<del style="font-weight: bold; text-decoration: none;">A</del> LLM is a type of [[foundation model]] (large X model) trained on language.&lt;ref&gt;{{Cite web |last= |first= |date=2025-02-05 |title=Foundation Models And LLMs: 19 Real-World, Practical Use Cases |url=https://www.forbes.com/councils/forbestechcouncil/2025/02/05/foundation-models-and-llms-19-real-world-practical-use-cases/ |access-date=2025-05-26 |website=Forbes |language=en}}&lt;/ref&gt; LLMs can be trained in different ways. 
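A minimal sketch of the most common such scheme, self-supervised next-token prediction, is given below; PyTorch is assumed, and the toy vocabulary, model and data are hypothetical stand-ins for a deep causal transformer and a web-scale text corpus. &lt;syntaxhighlight lang="python"&gt;
# Sketch of self-supervised next-token-prediction pretraining: the model is
# trained to predict token t+1 from the tokens up to t with a cross-entropy
# loss. Sizes and data are toy placeholders, not a real configuration.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32                   # hypothetical toy sizes
embed = nn.Embedding(vocab_size, d_model)       # token embeddings
lm_head = nn.Linear(d_model, vocab_size)        # logits over the vocabulary

tokens = torch.randint(0, vocab_size, (1, 16))  # one "document" of 16 token ids
hidden = embed(tokens[:, :-1])                  # states for positions 0..14
logits = lm_head(hidden)                        # predictions for positions 1..15
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),             # flatten to (15, vocab_size)
    tokens[:, 1:].reshape(-1),                  # the observed next tokens
)
loss.backward()                                 # one self-supervised training step
# A real GPT-style model inserts a deep causal transformer between the
# embedding layer and the language-model head; the objective is the same.
&lt;/syntaxhighlight&gt;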
In particular, GPT models are first pretrained to predict the next word on a large amount of data, before being fine-tuned.&lt;ref&gt;{{Cite web |title=7 Steps to Mastering Large Language Model Fine-tuning |url=https://www.kdnuggets.com/7-steps-to-mastering-large-language-model-fine-tuning |access-date=2025-05-26 |website=KDnuggets |language=en-US}}&lt;/ref&gt;</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>{{See also|Fine-tuning (machine learning)}}<ins style="font-weight: bold; text-decoration: none;">An</ins> LLM is a type of [[foundation model]] (large X model) trained on language.&lt;ref&gt;{{Cite web |last= |first= |date=2025-02-05 |title=Foundation Models And LLMs: 19 Real-World, Practical Use Cases |url=https://www.forbes.com/councils/forbestechcouncil/2025/02/05/foundation-models-and-llms-19-real-world-practical-use-cases/ |access-date=2025-05-26 |website=Forbes |language=en}}&lt;/ref&gt; LLMs can be trained in different ways. In particular, GPT models are first pretrained to predict the next word on a large amount of data, before being fine-tuned.&lt;ref&gt;{{Cite web |title=7 Steps to Mastering Large Language Model Fine-tuning |url=https://www.kdnuggets.com/7-steps-to-mastering-large-language-model-fine-tuning |access-date=2025-05-26 |website=KDnuggets |language=en-US}}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Reinforcement learning from human feedback===</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>===Reinforcement learning from human feedback===</div></td> </tr> </table> 75.84.221.176 https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1293097720&oldid=prev Alenoach: Reworked the section "Multimodality" 2025-05-30T17:35:29Z <p>Reworked the section &quot;Multimodality&quot;</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 17:35, 30 May 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 149:</td> <td colspan="2" class="diff-lineno">Line 149:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: 
#f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Multimodality ==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Multimodality ==</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{See also|Multimodal learning}}</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{See also|Multimodal learning}}</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Multimodality<del style="font-weight: bold; text-decoration: none;"> (LxM)</del> means <del style="font-weight: bold; text-decoration: none;">"</del>having <del style="font-weight: bold; text-decoration: none;">several</del> modalities<del style="font-weight: bold; text-decoration: none;">"</del>, <del style="font-weight: bold; text-decoration: none;">and</del> a [[Modality (human–computer interaction)|<del style="font-weight: bold; text-decoration: none;">"</del>modality<del style="font-weight: bold; text-decoration: none;">"</del>]] refers to a type of input or output, such as video, image, audio, text, [[proprioception]], etc.&lt;ref&gt;{{Cite journal |last1=Kiros |first1=Ryan |last2=Salakhutdinov |first2=Ruslan |last3=Zemel |first3=Rich |date=2014-06-18 |title=Multimodal Neural Language Models |url=https://proceedings.mlr.press/v32/kiros14.html |journal=Proceedings of the 31st International Conference on Machine Learning |publisher=PMLR |pages=595–603 |access-date=2023-07-02 |archive-date=2023-07-02 |archive-url=https://web.archive.org/web/20230702195952/https://proceedings.mlr.press/v32/kiros14.html |url-status=live }}&lt;/ref&gt; <del style="font-weight: bold; text-decoration: none;">There</del> <del style="font-weight: bold; text-decoration: none;">have</del> <del style="font-weight: bold; text-decoration: none;">been</del> <del style="font-weight: bold; text-decoration: none;">many</del> <del style="font-weight: bold; text-decoration: none;">AI</del> <del style="font-weight: bold; text-decoration: none;">models</del> <del style="font-weight: bold; text-decoration: none;">trained</del> <del style="font-weight: bold; text-decoration: none;">specifically</del> <del style="font-weight: bold; text-decoration: none;">to</del> <del style="font-weight: bold; text-decoration: none;">ingest</del> <del style="font-weight: bold; text-decoration: none;">one</del> <del style="font-weight: bold; text-decoration: none;">modality</del> and <del style="font-weight: bold; text-decoration: none;">output</del> <del style="font-weight: bold; text-decoration: none;">another modality, such as</del> [[<del style="font-weight: bold; text-decoration: none;">AlexNet]]</del> <del 
style="font-weight: bold; text-decoration: none;">for image</del> <del style="font-weight: bold; text-decoration: none;">to label,</del>&lt;ref&gt;{{Cite <del style="font-weight: bold; text-decoration: none;">journal</del> |<del style="font-weight: bold; text-decoration: none;">last1</del>=<del style="font-weight: bold; text-decoration: none;">Krizhevsky</del> |first1=<del style="font-weight: bold; text-decoration: none;">Alex</del> |<del style="font-weight: bold; text-decoration: none;">last2</del>=<del style="font-weight: bold; text-decoration: none;">Sutskever</del> |first2=<del style="font-weight: bold; text-decoration: none;">Ilya</del> |last3=<del style="font-weight: bold; text-decoration: none;">Hinton</del> |first3=<del style="font-weight: bold; text-decoration: none;">Geoffrey</del> <del style="font-weight: bold; text-decoration: none;">E</del> |<del style="font-weight: bold; text-decoration: none;">date</del>=<del style="font-weight: bold; text-decoration: none;">2012</del> |<del style="font-weight: bold; text-decoration: none;">title</del>=<del style="font-weight: bold; text-decoration: none;">ImageNet</del> <del style="font-weight: bold; text-decoration: none;">Classification</del> <del style="font-weight: bold; text-decoration: none;">with</del> <del style="font-weight: bold; text-decoration: none;">Deep</del> <del style="font-weight: bold; text-decoration: none;">Convolutional</del> <del style="font-weight: bold; text-decoration: none;">Neural</del> <del style="font-weight: bold; text-decoration: none;">Networks</del> |<del style="font-weight: bold; text-decoration: none;">url</del>=<del style="font-weight: bold; text-decoration: none;">https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html</del> |<del style="font-weight: bold; text-decoration: none;">journal</del>=<del style="font-weight: bold; text-decoration: none;">Advances</del> <del style="font-weight: bold; text-decoration: none;">in</del> <del style="font-weight: bold; text-decoration: none;">Neural</del> <del style="font-weight: bold; text-decoration: none;">Information</del> <del style="font-weight: bold; text-decoration: none;">Processing</del> <del style="font-weight: bold; text-decoration: none;">Systems</del> |<del style="font-weight: bold; text-decoration: none;">publisher</del>=<del style="font-weight: bold; text-decoration: none;">Curran</del> <del style="font-weight: bold; text-decoration: none;">Associates,</del> <del style="font-weight: bold; text-decoration: none;">Inc.</del> |<del style="font-weight: bold; text-decoration: none;">volume</del>=<del style="font-weight: bold; text-decoration: none;">25</del> |<del style="font-weight: bold; text-decoration: none;">access-date</del>=<del style="font-weight: bold; text-decoration: none;">2023-07-02</del> |<del style="font-weight: bold; text-decoration: none;">archive-date</del>=<del style="font-weight: bold; text-decoration: none;">2023-07-02</del> |<del style="font-weight: bold; text-decoration: none;">archive-url</del>=<del style="font-weight: bold; text-decoration: none;">https://web.archive.org/web/20230702195952/https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html</del> |<del style="font-weight: bold; text-decoration: none;">url-status</del>=<del style="font-weight: bold; text-decoration: none;">live</del> }}&lt;/ref&gt; [[<del style="font-weight: bold; text-decoration: none;">visual question answering</del>]] <del style="font-weight: bold; text-decoration: none;">for</del> 
<del style="font-weight: bold; text-decoration: none;">image-text</del> to <del style="font-weight: bold; text-decoration: none;">text</del>,&lt;ref&gt;{{Cite <del style="font-weight: bold; text-decoration: none;">journal</del> |<del style="font-weight: bold; text-decoration: none;">last1</del>=<del style="font-weight: bold; text-decoration: none;">Antol</del> |first1=<del style="font-weight: bold; text-decoration: none;">Stanislaw</del> |<del style="font-weight: bold; text-decoration: none;">last2</del>=<del style="font-weight: bold; text-decoration: none;">Agrawal</del> |first2=<del style="font-weight: bold; text-decoration: none;">Aishwarya</del> |last3=<del style="font-weight: bold; text-decoration: none;">Lu</del> |first3=<del style="font-weight: bold; text-decoration: none;">Jiasen</del> |last4=<del style="font-weight: bold; text-decoration: none;">Mitchell</del> |first4=<del style="font-weight: bold; text-decoration: none;">Margaret</del> |<del style="font-weight: bold; text-decoration: none;">last5</del>=<del style="font-weight: bold; text-decoration: none;">Batra</del> |<del style="font-weight: bold; text-decoration: none;">first5</del>=<del style="font-weight: bold; text-decoration: none;">Dhruv</del> |<del style="font-weight: bold; text-decoration: none;">last6</del>=<del style="font-weight: bold; text-decoration: none;">Zitnick</del> |<del style="font-weight: bold; text-decoration: none;">first6</del>=<del style="font-weight: bold; text-decoration: none;">C.</del> <del style="font-weight: bold; text-decoration: none;">Lawrence</del> |<del style="font-weight: bold; text-decoration: none;">last7</del>=<del style="font-weight: bold; text-decoration: none;">Parikh</del> |<del style="font-weight: bold; text-decoration: none;">first7</del>=<del style="font-weight: bold; text-decoration: none;">Devi</del> |date=<del style="font-weight: bold; text-decoration: none;">2015</del> |title=<del style="font-weight: bold; text-decoration: none;">VQA:</del> <del style="font-weight: bold; text-decoration: none;">Visual</del> <del style="font-weight: bold; text-decoration: none;">Question</del> <del style="font-weight: bold; text-decoration: none;">Answering</del> |url=https://<del style="font-weight: bold; text-decoration: none;">openaccess</del>.<del style="font-weight: bold; text-decoration: none;">thecvf</del>.com/<del style="font-weight: bold; text-decoration: none;">content_iccv_2015</del>/<del style="font-weight: bold; text-decoration: none;">html</del>/<del style="font-weight: bold; text-decoration: none;">Antol_VQA_Visual_Question_ICCV_2015_paper</del>.<del style="font-weight: bold; text-decoration: none;">html</del> |<del style="font-weight: bold; text-decoration: none;">journal</del>=<del style="font-weight: bold; text-decoration: none;">ICCV</del> |<del style="font-weight: bold; text-decoration: none;">pages</del>=<del style="font-weight: bold; text-decoration: none;">2425–2433</del> |<del style="font-weight: bold; text-decoration: none;">access-</del>date=<del style="font-weight: bold; text-decoration: none;">2023</del>-<del style="font-weight: bold; text-decoration: none;">07</del>-<del style="font-weight: bold; text-decoration: none;">02</del> |<del style="font-weight: bold; text-decoration: none;">archive-date</del>=<del style="font-weight: bold; text-decoration: none;">2023-07-02</del> |<del style="font-weight: bold; text-decoration: none;">archive-</del>url=https://<del style="font-weight: bold; text-decoration: none;">web</del>.<del style="font-weight: bold; text-decoration: 
none;">archive</del>.<del style="font-weight: bold; text-decoration: none;">org</del>/<del style="font-weight: bold; text-decoration: none;">web</del>/<del style="font-weight: bold; text-decoration: none;">20230702195952/https://openaccess.thecvf.com/content_iccv_2015/html/Antol_VQA_Visual_Question_ICCV_2015_paper.html</del> |<del style="font-weight: bold; text-decoration: none;">url</del>-<del style="font-weight: bold; text-decoration: none;">status</del>=<del style="font-weight: bold; text-decoration: none;">live</del> }}&lt;/ref&gt; <del style="font-weight: bold; text-decoration: none;">and [[speech recognition]] for speech to text.</del></div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Multimodality means having <ins style="font-weight: bold; text-decoration: none;">multiple</ins> modalities, <ins style="font-weight: bold; text-decoration: none;">where</ins> a <ins style="font-weight: bold; text-decoration: none;">"</ins>[[Modality (human–computer interaction)|modality]]<ins style="font-weight: bold; text-decoration: none;">"</ins> refers to a type of input or output, such as video, image, audio, text, [[proprioception]], etc.&lt;ref&gt;{{Cite journal |last1=Kiros |first1=Ryan |last2=Salakhutdinov |first2=Ruslan |last3=Zemel |first3=Rich |date=2014-06-18 |title=Multimodal Neural Language Models |url=https://proceedings.mlr.press/v32/kiros14.html |journal=Proceedings of the 31st International Conference on Machine Learning |publisher=PMLR |pages=595–603 |access-date=2023-07-02 |archive-date=2023-07-02 |archive-url=https://web.archive.org/web/20230702195952/https://proceedings.mlr.press/v32/kiros14.html |url-status=live }}&lt;/ref&gt; <ins style="font-weight: bold; text-decoration: none;">For</ins> <ins style="font-weight: bold; text-decoration: none;">example,</ins> <ins style="font-weight: bold; text-decoration: none;">[[Pathways</ins> <ins style="font-weight: bold; text-decoration: none;">Language</ins> <ins style="font-weight: bold; text-decoration: none;">Model|Google</ins> <ins style="font-weight: bold; text-decoration: none;">PaLM]]</ins> <ins style="font-weight: bold; text-decoration: none;">model</ins> <ins style="font-weight: bold; text-decoration: none;">was</ins> <ins style="font-weight: bold; text-decoration: none;">fine-tuned</ins> <ins style="font-weight: bold; text-decoration: none;">into</ins> <ins style="font-weight: bold; text-decoration: none;">a</ins> <ins style="font-weight: bold; text-decoration: none;">multimodal model</ins> and <ins style="font-weight: bold; text-decoration: none;">applied</ins> <ins style="font-weight: bold; text-decoration: none;">to</ins> [[<ins style="font-weight: bold; text-decoration: none;">Robot</ins> <ins style="font-weight: bold; text-decoration: none;">control|robotic</ins> <ins style="font-weight: bold; text-decoration: none;">control]].</ins>&lt;ref&gt;{{Cite <ins style="font-weight: bold; text-decoration: none;">arXiv</ins> |<ins style="font-weight: bold; text-decoration: none;">eprint</ins>=<ins style="font-weight: bold; text-decoration: none;">2303.03378 |class=cs.LG</ins> |first1=<ins style="font-weight: bold; text-decoration: none;">Danny</ins> |<ins style="font-weight: bold; text-decoration: none;">last1</ins>=<ins style="font-weight: bold; text-decoration: none;">Driess</ins> |first2=<ins style="font-weight: bold; text-decoration: 
none;">Fei |last2=Xia |title=PaLM-E: An Embodied Multimodal Language Model |date=2023-03-01</ins> |last3=<ins style="font-weight: bold; text-decoration: none;">Sajjadi</ins> |first3=<ins style="font-weight: bold; text-decoration: none;">Mehdi</ins> <ins style="font-weight: bold; text-decoration: none;">S. M.</ins> |<ins style="font-weight: bold; text-decoration: none;">last4</ins>=<ins style="font-weight: bold; text-decoration: none;">Lynch</ins> |<ins style="font-weight: bold; text-decoration: none;">first4</ins>=<ins style="font-weight: bold; text-decoration: none;">Corey</ins> <ins style="font-weight: bold; text-decoration: none;">|last5=Chowdhery</ins> <ins style="font-weight: bold; text-decoration: none;">|first5=Aakanksha</ins> <ins style="font-weight: bold; text-decoration: none;">|last6=Ichter</ins> <ins style="font-weight: bold; text-decoration: none;">|first6=Brian</ins> <ins style="font-weight: bold; text-decoration: none;">|last7=Wahid</ins> <ins style="font-weight: bold; text-decoration: none;">|first7=Ayzaan</ins> |<ins style="font-weight: bold; text-decoration: none;">last8</ins>=<ins style="font-weight: bold; text-decoration: none;">Tompson</ins> |<ins style="font-weight: bold; text-decoration: none;">first8</ins>=<ins style="font-weight: bold; text-decoration: none;">Jonathan</ins> <ins style="font-weight: bold; text-decoration: none;">|last9=Vuong</ins> <ins style="font-weight: bold; text-decoration: none;">|first9=Quan</ins> <ins style="font-weight: bold; text-decoration: none;">|last10=Yu</ins> <ins style="font-weight: bold; text-decoration: none;">|first10=Tianhe</ins> <ins style="font-weight: bold; text-decoration: none;">|last11=Huang</ins> |<ins style="font-weight: bold; text-decoration: none;">first11</ins>=<ins style="font-weight: bold; text-decoration: none;">Wenlong</ins> <ins style="font-weight: bold; text-decoration: none;">|last12=Chebotar</ins> <ins style="font-weight: bold; text-decoration: none;">|first12=Yevgen</ins> |<ins style="font-weight: bold; text-decoration: none;">last13</ins>=<ins style="font-weight: bold; text-decoration: none;">Sermanet</ins> |<ins style="font-weight: bold; text-decoration: none;">first13</ins>=<ins style="font-weight: bold; text-decoration: none;">Pierre</ins> |<ins style="font-weight: bold; text-decoration: none;">last14</ins>=<ins style="font-weight: bold; text-decoration: none;">Duckworth</ins> |<ins style="font-weight: bold; text-decoration: none;">first14</ins>=<ins style="font-weight: bold; text-decoration: none;">Daniel</ins> |<ins style="font-weight: bold; text-decoration: none;">last15</ins>=<ins style="font-weight: bold; text-decoration: none;">Levine</ins> <ins style="font-weight: bold; text-decoration: none;">|first15=Sergey</ins>}}&lt;/ref&gt; [[<ins style="font-weight: bold; text-decoration: none;">LLaMA</ins>]] <ins style="font-weight: bold; text-decoration: none;">models</ins> <ins style="font-weight: bold; text-decoration: none;">have also been turned multimodal using the tokenization method,</ins> to <ins style="font-weight: bold; text-decoration: none;">allow image inputs</ins>,&lt;ref&gt;{{Cite <ins style="font-weight: bold; text-decoration: none;">arXiv</ins> |<ins style="font-weight: bold; text-decoration: none;">eprint</ins>=<ins style="font-weight: bold; text-decoration: none;">2304.08485 |class=cs.CV</ins> |first1=<ins style="font-weight: bold; text-decoration: none;">Haotian</ins> |<ins style="font-weight: bold; text-decoration: none;">last1</ins>=<ins style="font-weight: bold; text-decoration: 
none;">Liu</ins> |first2=<ins style="font-weight: bold; text-decoration: none;">Chunyuan |last2=Li |title=Visual Instruction Tuning |date=2023-04-01</ins> |last3=<ins style="font-weight: bold; text-decoration: none;">Wu</ins> |first3=<ins style="font-weight: bold; text-decoration: none;">Qingyang</ins> |last4=<ins style="font-weight: bold; text-decoration: none;">Lee</ins> |first4=<ins style="font-weight: bold; text-decoration: none;">Yong Jae}}&lt;/ref&gt; and video inputs.&lt;ref&gt;{{Cite arXiv</ins> |<ins style="font-weight: bold; text-decoration: none;">eprint</ins>=<ins style="font-weight: bold; text-decoration: none;">2306.02858</ins> |<ins style="font-weight: bold; text-decoration: none;">class</ins>=<ins style="font-weight: bold; text-decoration: none;">cs.CL</ins> |<ins style="font-weight: bold; text-decoration: none;">first1</ins>=<ins style="font-weight: bold; text-decoration: none;">Hang</ins> |<ins style="font-weight: bold; text-decoration: none;">last1</ins>=<ins style="font-weight: bold; text-decoration: none;">Zhang</ins> <ins style="font-weight: bold; text-decoration: none;">|first2=Xin</ins> |<ins style="font-weight: bold; text-decoration: none;">last2</ins>=<ins style="font-weight: bold; text-decoration: none;">Li</ins> |<ins style="font-weight: bold; text-decoration: none;">title</ins>=<ins style="font-weight: bold; text-decoration: none;">Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding</ins> |date=<ins style="font-weight: bold; text-decoration: none;">2023-06-01 |last3=Bing |first3=Lidong}}&lt;/ref&gt; [[GPT-4o]] can process and generate text, audio and images.&lt;ref&gt;{{Cite news |date=2024-05-13</ins> |title=<ins style="font-weight: bold; text-decoration: none;">OpenAI</ins> <ins style="font-weight: bold; text-decoration: none;">says</ins> <ins style="font-weight: bold; text-decoration: none;">natively</ins> <ins style="font-weight: bold; text-decoration: none;">multimodal GPT-4o eats text, visuals, sound – and emits the same</ins> |url=https://<ins style="font-weight: bold; text-decoration: none;">www</ins>.<ins style="font-weight: bold; text-decoration: none;">theregister</ins>.com/<ins style="font-weight: bold; text-decoration: none;">2024</ins>/<ins style="font-weight: bold; text-decoration: none;">05</ins>/<ins style="font-weight: bold; text-decoration: none;">13/openai_gpt4o/ |work=The Register}}&lt;/ref&gt; Such models are sometimes called large multimodal models (LMMs)</ins>.<ins style="font-weight: bold; text-decoration: none;">&lt;ref&gt;{{Cite web</ins> |<ins style="font-weight: bold; text-decoration: none;">last</ins>=<ins style="font-weight: bold; text-decoration: none;">Zia</ins> |<ins style="font-weight: bold; text-decoration: none;">first</ins>=<ins style="font-weight: bold; text-decoration: none;">Dr Tehseen</ins> |date=<ins style="font-weight: bold; text-decoration: none;">2024</ins>-<ins style="font-weight: bold; text-decoration: none;">01</ins>-<ins style="font-weight: bold; text-decoration: none;">08</ins> |<ins style="font-weight: bold; text-decoration: none;">title</ins>=<ins style="font-weight: bold; text-decoration: none;">Unveiling of Large Multimodal Models: Shaping the Landscape of Language Models in 2024</ins> |url=https://<ins style="font-weight: bold; text-decoration: none;">www</ins>.<ins style="font-weight: bold; text-decoration: none;">unite</ins>.<ins style="font-weight: bold; text-decoration: none;">ai</ins>/<ins style="font-weight: bold; text-decoration: 
none;">unveiling-of-large-multimodal-models-shaping-the-landscape-of-language-models-in-2024</ins>/ |<ins style="font-weight: bold; text-decoration: none;">access</ins>-<ins style="font-weight: bold; text-decoration: none;">date</ins>=<ins style="font-weight: bold; text-decoration: none;">2025-05-30</ins> <ins style="font-weight: bold; text-decoration: none;">|website=Unite.AI |language=en-US</ins>}}&lt;/ref&gt; </div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>A common method to create multimodal models out of an LLM is to "tokenize" the output of a trained encoder. Concretely, one can construct an LLM that can understand images as follows: take a trained LLM, and take a trained image encoder &lt;math&gt;E&lt;/math&gt;. Make a small multilayered perceptron &lt;math&gt;f&lt;/math&gt;, so that for any image &lt;math&gt;y&lt;/math&gt;, the post-processed vector &lt;math&gt;f(E(y))&lt;/math&gt; has the same dimensions as an encoded token. That is an "image token". Then, one can interleave text tokens and image tokens. The compound model is then fine-tuned on an image-text dataset. This basic construction can be applied with more sophistication to improve the model. The image encoder may be frozen to improve stability.&lt;ref&gt;{{Cite arXiv |last1=Li |first1=Junnan |last2=Li |first2=Dongxu |last3=Savarese |first3=Silvio |last4=Hoi |first4=Steven |date=2023-01-01 |title=BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |class=cs.CV |eprint=2301.12597 }}&lt;/ref&gt;</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>A common method to create multimodal models out of an LLM is to "tokenize" the output of a trained encoder. Concretely, one can construct an LLM that can understand images as follows: take a trained LLM, and take a trained image encoder &lt;math&gt;E&lt;/math&gt;. Make a small multilayered perceptron &lt;math&gt;f&lt;/math&gt;, so that for any image &lt;math&gt;y&lt;/math&gt;, the post-processed vector &lt;math&gt;f(E(y))&lt;/math&gt; has the same dimensions as an encoded token. That is an "image token". Then, one can interleave text tokens and image tokens. The compound model is then fine-tuned on an image-text dataset. This basic construction can be applied with more sophistication to improve the model. 
The image encoder may be frozen to improve stability.&lt;ref&gt;{{Cite arXiv |last1=Li |first1=Junnan |last2=Li |first2=Dongxu |last3=Savarese |first3=Silvio |last4=Hoi |first4=Steven |date=2023-01-01 |title=BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |class=cs.CV |eprint=2301.12597 <ins style="font-weight: bold; text-decoration: none;">}}&lt;/ref&gt; In 2022, the model Flamingo demonstrated the effectiveness of the tokenization method, fine-tuning a pretrained language model together with a pretrained image encoder to perform better on visual question answering than models trained from scratch.&lt;ref&gt;{{Cite journal |last1=Alayrac |first1=Jean-Baptiste |last2=Donahue |first2=Jeff |last3=Luc |first3=Pauline |last4=Miech |first4=Antoine |last5=Barr |first5=Iain |last6=Hasson |first6=Yana |last7=Lenc |first7=Karel |last8=Mensch |first8=Arthur |last9=Millican |first9=Katherine |last10=Reynolds |first10=Malcolm |last11=Ring |first11=Roman |last12=Rutherford |first12=Eliza |last13=Cabi |first13=Serkan |last14=Han |first14=Tengda |last15=Gong |first15=Zhitao |date=2022-12-06 |title=Flamingo: a Visual Language Model for Few-Shot Learning |url=https://proceedings.neurips.cc/paper_files/paper/2022/hash/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html |url-status=live |journal=Advances in Neural Information Processing Systems |volume=35 |pages=23716–23736 |arxiv=2204.14198 |archive-url=https://web.archive.org/web/20230702195951/https://proceedings.neurips.cc/paper_files/paper/2022/hash/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html |archive-date=2023-07-02 |access-date=2023-07-02</ins>}}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Flamingo demonstrated the effectiveness of the tokenization method, finetuning a pair of pretrained language model and image encoder to perform better on visual question answering than models trained from scratch.&lt;ref&gt;{{Cite journal |last1=Alayrac |first1=Jean-Baptiste |last2=Donahue |first2=Jeff |last3=Luc |first3=Pauline |last4=Miech |first4=Antoine |last5=Barr |first5=Iain |last6=Hasson |first6=Yana |last7=Lenc |first7=Karel |last8=Mensch |first8=Arthur |last9=Millican |first9=Katherine |last10=Reynolds |first10=Malcolm |last11=Ring |first11=Roman |last12=Rutherford |first12=Eliza |last13=Cabi |first13=Serkan |last14=Han |first14=Tengda |last15=Gong |first15=Zhitao |date=2022-12-06 |title=Flamingo: a Visual Language Model for Few-Shot Learning |url=https://proceedings.neurips.cc/paper_files/paper/2022/hash/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |volume=35 |pages=23716–23736 |arxiv=2204.14198 |access-date=2023-07-02 |archive-date=2023-07-02 |archive-url=https://web.archive.org/web/20230702195951/https://proceedings.neurips.cc/paper_files/paper/2022/hash/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html |url-status=live }}&lt;/ref&gt; [[Pathways Language Model|Google PaLM]] model was 
fine-tuned into a multimodal model PaLM-E using the tokenization method, and applied to robotic control.&lt;ref&gt;{{Cite arXiv |last1=Driess |first1=Danny |last2=Xia |first2=Fei |last3=Sajjadi |first3=Mehdi S. M. |last4=Lynch |first4=Corey |last5=Chowdhery |first5=Aakanksha |last6=Ichter |first6=Brian |last7=Wahid |first7=Ayzaan |last8=Tompson |first8=Jonathan |last9=Vuong |first9=Quan |last10=Yu |first10=Tianhe |last11=Huang |first11=Wenlong |last12=Chebotar |first12=Yevgen |last13=Sermanet |first13=Pierre |last14=Duckworth |first14=Daniel |last15=Levine |first15=Sergey |date=2023-03-01 |title=PaLM-E: An Embodied Multimodal Language Model |class=cs.LG |eprint=2303.03378 }}&lt;/ref&gt; [[LLaMA]] models have also been turned multimodal using the tokenization method, to allow image inputs,&lt;ref&gt;{{Cite arXiv|last1=Liu |first1=Haotian |last2=Li |first2=Chunyuan |last3=Wu |first3=Qingyang |last4=Lee |first4=Yong Jae |date=2023-04-01 |title=Visual Instruction Tuning |class=cs.CV |eprint=2304.08485 }}&lt;/ref&gt; and video inputs.&lt;ref&gt;{{Cite arXiv|last1=Zhang |first1=Hang |last2=Li |first2=Xin |last3=Bing |first3=Lidong |date=2023-06-01 |title=Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding |class=cs.CL |eprint=2306.02858 }}&lt;/ref&gt;</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[GPT-4]] can use both text and image as inputs&lt;ref&gt;{{Cite arXiv |eprint=2303.08774 |class=cs.CL |last=OpenAI |title=GPT-4 Technical Report |date=2023-03-27}}&lt;/ref&gt; (although the vision component was not released to the public until GPT-4V&lt;ref&gt;{{Cite web |last=OpenAI |date=September 25, 2023 |title=GPT-4V(ision) System Card |url=https://cdn.openai.com/papers/GPTV_System_Card.pdf}}&lt;/ref&gt;); [[Google DeepMind]]'s [[Gemini (language model)|Gemini]] is also multimodal.&lt;ref&gt;{{Citation |last=Pichai |first=Sundar |title=Google Keynote (Google I/O '23) |date=10 May 2023 |url=https://www.youtube.com/watch?v=cNfINi5CNbY&amp;t=931s |access-date=2023-07-02 |at=timestamp 15:31 }}&lt;/ref&gt; &lt;!-- update this in 2024 --&gt; Mistral introduced its own multimodal Pixtral 12B model in September 2024.&lt;ref&gt;{{cite web |last1=Wiggers |first1=Kyle |title=Mistral releases Pixtral 12B, its first multimodal model |url=https://techcrunch.com/2024/09/11/mistral-releases-pixtral-its-first-multimodal-model/?utm_medium=aisecret.us&amp;utm_source=aisecret.us&amp;utm_campaign=aisecret.us |website=TechCrunch |access-date=14 September 2024 |date=11 September 2024}}&lt;/ref&gt;</div></td> <td colspan="2" class="diff-empty diff-side-added"></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; 
font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Reasoning ==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Reasoning ==</div></td> </tr> </table> Alenoach https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1293084568&oldid=prev Alenoach: more relevant link in this context 2025-05-30T15:50:20Z <p>more relevant link in this context</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:50, 30 May 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 115:</td> <td colspan="2" class="diff-lineno">Line 115:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>As technology advanced, large sums have been invested in increasingly large models. For example, training of the GPT-2 (i.e. a 1.5-billion-parameter model) in 2019 cost $50,000, while training of the PaLM (i.e. a 540-billion-parameter model) in 2022 cost $8 million, and Megatron-Turing NLG 530B (in 2021) cost around $11 million.&lt;ref&gt;{{Citation |last1=Maslej |first1=Nestor |title=Artificial Intelligence Index Report 2023 |date=2023-10-05 |arxiv=2310.03715 |last2=Fattorini |first2=Loredana |last3=Brynjolfsson |first3=Erik |last4=Etchemendy |first4=John |last5=Ligett |first5=Katrina |last6=Lyons |first6=Terah |last7=Manyika |first7=James |last8=Ngo |first8=Helen |last9=Niebles |first9=Juan Carlos}}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>As technology advanced, large sums have been invested in increasingly large models. For example, training of the GPT-2 (i.e. a 1.5-billion-parameter model) in 2019 cost $50,000, while training of the PaLM (i.e.
a 540-billion-parameter model) in 2022 cost $8 million, and Megatron-Turing NLG 530B (in 2021) cost around $11 million.&lt;ref&gt;{{Citation |last1=Maslej |first1=Nestor |title=Artificial Intelligence Index Report 2023 |date=2023-10-05 |arxiv=2310.03715 |last2=Fattorini |first2=Loredana |last3=Brynjolfsson |first3=Erik |last4=Etchemendy |first4=John |last5=Ligett |first5=Katrina |last6=Lyons |first6=Terah |last7=Manyika |first7=James |last8=Ngo |first8=Helen |last9=Niebles |first9=Juan Carlos}}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>For Transformer-based LLMs, training cost is much higher than [[<del style="font-weight: bold; text-decoration: none;">Inference</del>|inference]] cost. It costs 6 [[FLOPS|FLOPs]] per parameter to train on one token, whereas it costs 1 to 2 FLOPs per parameter to infer on one token.&lt;ref name="kaplan-scaling"&gt;Section 2.1 and Table 1,</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>For Transformer-based LLMs, training cost is much higher than [[<ins style="font-weight: bold; text-decoration: none;">Statistical inference</ins>|inference]] cost. It costs 6 [[FLOPS|FLOPs]] per parameter to train on one token, whereas it costs 1 to 2 FLOPs per parameter to infer on one token.&lt;ref name="kaplan-scaling"&gt;Section 2.1 and Table 1,</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{Cite arXiv |eprint=2001.08361 |class=cs.LG |first1=Jared |last1=Kaplan |first2=Sam |last2=McCandlish |title=Scaling Laws for Neural Language Models |last3=Henighan |first3=Tom |last4=Brown |first4=Tom B.
|last5=Chess |first5=Benjamin |last6=Child |first6=Rewon |last7=Gray |first7=Scott |last8=Radford |first8=Alec |last9=Wu |first9=Jeffrey |last10=Amodei |first10=Dario |year=2020}}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{Cite arXiv |eprint=2001.08361 |class=cs.LG |first1=Jared |last1=Kaplan |first2=Sam |last2=McCandlish |title=Scaling Laws for Neural Language Models |last3=Henighan |first3=Tom |last4=Brown |first4=Tom B. |last5=Chess |first5=Benjamin |last6=Child |first6=Rewon |last7=Gray |first7=Scott |last8=Radford |first8=Alec |last9=Wu |first9=Jeffrey |last10=Amodei |first10=Dario |year=2020}}&lt;/ref&gt;</div></td> </tr> </table> Alenoach
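To make the rule of thumb in the diff above concrete: at roughly 6 FLOPs per parameter per training token versus 1 to 2 FLOPs per parameter per generated token, training dominates inference by many orders of magnitude for any single generation. The Python sketch below illustrates the arithmetic; the parameter count mirrors the 540-billion-parameter PaLM example, while the token counts are illustrative assumptions rather than figures taken from the cited sources.

# Back-of-the-envelope compute estimates from the rule of thumb above:
# ~6 FLOPs per parameter per training token, ~1-2 FLOPs per parameter
# per token at inference (Kaplan et al. 2020, Section 2.1 and Table 1).
N_PARAMS = 540e9       # a 540-billion-parameter model, as in the PaLM example
TRAIN_TOKENS = 780e9   # assumed size of the training corpus (illustrative)
GEN_TOKENS = 1_000     # a single 1,000-token generation at inference

train_flops = 6 * N_PARAMS * TRAIN_TOKENS  # ~2.5e24 FLOPs in total
infer_flops = 2 * N_PARAMS * GEN_TOKENS    # ~1.1e15 FLOPs, upper bound

print(f"training:  ~{train_flops:.1e} FLOPs")
print(f"inference: ~{infer_flops:.1e} FLOPs per {GEN_TOKENS} generated tokens")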
https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1293074874&oldid=prev D4n2016: /* Training cost */ added wikilink for first "inference" mentioning in article 2025-05-30T14:40:12Z <p><span class="autocomment">Training cost: </span> added wikilink for first &quot;inference&quot; mentioning in article</p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 14:40, 30 May 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 115:</td> <td colspan="2" class="diff-lineno">Line 115:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>As technology advanced, large sums have been invested in increasingly large models. For example, training of the GPT-2 (i.e. a 1.5-billion-parameter model) in 2019 cost $50,000, while training of the PaLM (i.e. a 540-billion-parameter model) in 2022 cost $8 million, and Megatron-Turing NLG 530B (in 2021) cost around $11 million.&lt;ref&gt;{{Citation |last1=Maslej |first1=Nestor |title=Artificial Intelligence Index Report 2023 |date=2023-10-05 |arxiv=2310.03715 |last2=Fattorini |first2=Loredana |last3=Brynjolfsson |first3=Erik |last4=Etchemendy |first4=John |last5=Ligett |first5=Katrina |last6=Lyons |first6=Terah |last7=Manyika |first7=James |last8=Ngo |first8=Helen |last9=Niebles |first9=Juan Carlos}}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>As technology advanced, large sums have been invested in increasingly large models. For example, training of the GPT-2 (i.e. a 1.5-billion-parameter model) in 2019 cost $50,000, while training of the PaLM (i.e. a 540-billion-parameter model) in 2022 cost $8 million, and Megatron-Turing NLG 530B (in 2021) cost around $11 million.&lt;ref&gt;{{Citation |last1=Maslej |first1=Nestor |title=Artificial Intelligence Index Report 2023 |date=2023-10-05 |arxiv=2310.03715 |last2=Fattorini |first2=Loredana |last3=Brynjolfsson |first3=Erik |last4=Etchemendy |first4=John |last5=Ligett |first5=Katrina |last6=Lyons |first6=Terah |last7=Manyika |first7=James |last8=Ngo |first8=Helen |last9=Niebles |first9=Juan Carlos}}&lt;/ref&gt;</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>For Transformer-based LLMs, training cost is much higher than inference cost. It costs 6 [[FLOPS|FLOPs]] per parameter to train on one token, whereas it costs 1 to 2 FLOPs per parameter to infer on one token.&lt;ref name="kaplan-scaling"&gt;Section 2.1 and Table 1,</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>For Transformer-based LLMs, training cost is much higher than <ins style="font-weight: bold; text-decoration: none;">[[Inference|</ins>inference<ins style="font-weight: bold; text-decoration: none;">]]</ins> cost. It costs 6 [[FLOPS|FLOPs]] per parameter to train on one token, whereas it costs 1 to 2 FLOPs per parameter to infer on one token.&lt;ref name="kaplan-scaling"&gt;Section 2.1 and Table 1,</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{Cite arXiv |eprint=2001.08361 |class=cs.LG |first1=Jared |last1=Kaplan |first2=Sam |last2=McCandlish |title=Scaling Laws for Neural Language Models |last3=Henighan |first3=Tom |last4=Brown |first4=Tom B.
|last5=Chess |first5=Benjamin |last6=Child |first6=Rewon |last7=Gray |first7=Scott |last8=Radford |first8=Alec |last9=Wu |first9=Jeffrey |last10=Amodei |first10=Dario |year=2020}}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{Cite arXiv |eprint=2001.08361 |class=cs.LG |first1=Jared |last1=Kaplan |first2=Sam |last2=McCandlish |title=Scaling Laws for Neural Language Models |last3=Henighan |first3=Tom |last4=Brown |first4=Tom B. |last5=Chess |first5=Benjamin |last6=Child |first6=Rewon |last7=Gray |first7=Scott |last8=Radford |first8=Alec |last9=Wu |first9=Jeffrey |last10=Amodei |first10=Dario |year=2020}}&lt;/ref&gt;</div></td> </tr> </table> D4n2016 https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=1293058878&oldid=prev 91.194.221.232: /* Multimodality */ 2025-05-30T12:25:22Z <p><span class="autocomment">Multimodality</span></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 12:25, 30 May 2025</td> </tr><tr> <td colspan="2" class="diff-lineno">Line 149:</td> <td colspan="2" class="diff-lineno">Line 149:</td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Multimodality ==</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Multimodality ==</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{See also|Multimodal learning}}</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>{{See also|Multimodal learning}}</div></td> </tr> <tr> <td class="diff-marker" data-marker="−"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Multimodality means "having several modalities", and a [[Modality (human–computer interaction)|"modality"]] refers to a type of input or output, such as video, image, audio, text, [[proprioception]], etc.&lt;ref&gt;{{Cite journal |last1=Kiros |first1=Ryan |last2=Salakhutdinov |first2=Ruslan |last3=Zemel |first3=Rich |date=2014-06-18 |title=Multimodal Neural Language Models |url=https://proceedings.mlr.press/v32/kiros14.html |journal=Proceedings of the 31st International Conference on 
Machine Learning |publisher=PMLR |pages=595–603 |access-date=2023-07-02 |archive-date=2023-07-02 |archive-url=https://web.archive.org/web/20230702195952/https://proceedings.mlr.press/v32/kiros14.html |url-status=live }}&lt;/ref&gt; There have been many AI models trained specifically to ingest one modality and output another modality, such as [[AlexNet]] for image to label,&lt;ref&gt;{{Cite journal |last1=Krizhevsky |first1=Alex |last2=Sutskever |first2=Ilya |last3=Hinton |first3=Geoffrey E |date=2012 |title=ImageNet Classification with Deep Convolutional Neural Networks |url=https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=25 |access-date=2023-07-02 |archive-date=2023-07-02 |archive-url=https://web.archive.org/web/20230702195952/https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html |url-status=live }}&lt;/ref&gt; [[visual question answering]] for image-text to text,&lt;ref&gt;{{Cite journal |last1=Antol |first1=Stanislaw |last2=Agrawal |first2=Aishwarya |last3=Lu |first3=Jiasen |last4=Mitchell |first4=Margaret |last5=Batra |first5=Dhruv |last6=Zitnick |first6=C. Lawrence |last7=Parikh |first7=Devi |date=2015 |title=VQA: Visual Question Answering |url=https://openaccess.thecvf.com/content_iccv_2015/html/Antol_VQA_Visual_Question_ICCV_2015_paper.html |journal=ICCV |pages=2425–2433 |access-date=2023-07-02 |archive-date=2023-07-02 |archive-url=https://web.archive.org/web/20230702195952/https://openaccess.thecvf.com/content_iccv_2015/html/Antol_VQA_Visual_Question_ICCV_2015_paper.html |url-status=live }}&lt;/ref&gt; and [[speech recognition]] for speech to text.</div></td> <td class="diff-marker" data-marker="+"></td> <td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Multimodality<ins style="font-weight: bold; text-decoration: none;"> (LxM)</ins> means "having several modalities", and a [[Modality (human–computer interaction)|"modality"]] refers to a type of input or output, such as video, image, audio, text, [[proprioception]], etc.&lt;ref&gt;{{Cite journal |last1=Kiros |first1=Ryan |last2=Salakhutdinov |first2=Ruslan |last3=Zemel |first3=Rich |date=2014-06-18 |title=Multimodal Neural Language Models |url=https://proceedings.mlr.press/v32/kiros14.html |journal=Proceedings of the 31st International Conference on Machine Learning |publisher=PMLR |pages=595–603 |access-date=2023-07-02 |archive-date=2023-07-02 |archive-url=https://web.archive.org/web/20230702195952/https://proceedings.mlr.press/v32/kiros14.html |url-status=live }}&lt;/ref&gt; There have been many AI models trained specifically to ingest one modality and output another modality, such as [[AlexNet]] for image to label,&lt;ref&gt;{{Cite journal |last1=Krizhevsky |first1=Alex |last2=Sutskever |first2=Ilya |last3=Hinton |first3=Geoffrey E |date=2012 |title=ImageNet Classification with Deep Convolutional Neural Networks |url=https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. 
|volume=25 |access-date=2023-07-02 |archive-date=2023-07-02 |archive-url=https://web.archive.org/web/20230702195952/https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html |url-status=live }}&lt;/ref&gt; [[visual question answering]] for image-text to text,&lt;ref&gt;{{Cite journal |last1=Antol |first1=Stanislaw |last2=Agrawal |first2=Aishwarya |last3=Lu |first3=Jiasen |last4=Mitchell |first4=Margaret |last5=Batra |first5=Dhruv |last6=Zitnick |first6=C. Lawrence |last7=Parikh |first7=Devi |date=2015 |title=VQA: Visual Question Answering |url=https://openaccess.thecvf.com/content_iccv_2015/html/Antol_VQA_Visual_Question_ICCV_2015_paper.html |journal=ICCV |pages=2425–2433 |access-date=2023-07-02 |archive-date=2023-07-02 |archive-url=https://web.archive.org/web/20230702195952/https://openaccess.thecvf.com/content_iccv_2015/html/Antol_VQA_Visual_Question_ICCV_2015_paper.html |url-status=live }}&lt;/ref&gt; and [[speech recognition]] for speech to text.</div></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td> </tr> <tr> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A common method to create multimodal models out of an LLM is to "tokenize" the output of a trained encoder. Concretely, one can construct an LLM that can understand images as follows: take a trained LLM, and take a trained image encoder &lt;math&gt;E&lt;/math&gt;. Make a small multilayered perceptron &lt;math&gt;f&lt;/math&gt;, so that for any image &lt;math&gt;y&lt;/math&gt;, the post-processed vector &lt;math&gt;f(E(y))&lt;/math&gt; has the same dimensions as an encoded token. That is an "image token". Then, one can interleave text tokens and image tokens. The compound model is then fine-tuned on an image-text dataset. This basic construction can be applied with more sophistication to improve the model. The image encoder may be frozen to improve stability.&lt;ref&gt;{{Cite arXiv |last1=Li |first1=Junnan |last2=Li |first2=Dongxu |last3=Savarese |first3=Silvio |last4=Hoi |first4=Steven |date=2023-01-01 |title=BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |class=cs.CV |eprint=2301.12597 }}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A common method to create multimodal models out of an LLM is to "tokenize" the output of a trained encoder. Concretely, one can construct an LLM that can understand images as follows: take a trained LLM, and take a trained image encoder &lt;math&gt;E&lt;/math&gt;. 
Make a small multilayer perceptron &lt;math&gt;f&lt;/math&gt;, so that for any image &lt;math&gt;y&lt;/math&gt;, the post-processed vector &lt;math&gt;f(E(y))&lt;/math&gt; has the same dimensions as an encoded token. That is an "image token". Then, one can interleave text tokens and image tokens. The compound model is then fine-tuned on an image-text dataset. This basic construction can be refined in various ways to improve the model. The image encoder may be frozen to improve stability.&lt;ref&gt;{{Cite arXiv |last1=Li |first1=Junnan |last2=Li |first2=Dongxu |last3=Savarese |first3=Silvio |last4=Hoi |first4=Steven |date=2023-01-01 |title=BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |class=cs.CV |eprint=2301.12597 }}&lt;/ref&gt;</div></td> <td class="diff-marker"></td> <td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A common method to create multimodal models out of an LLM is to "tokenize" the output of a trained encoder. Concretely, one can construct an LLM that can understand images as follows: take a trained LLM, and take a trained image encoder &lt;math&gt;E&lt;/math&gt;. Make a small multilayer perceptron &lt;math&gt;f&lt;/math&gt;, so that for any image &lt;math&gt;y&lt;/math&gt;, the post-processed vector &lt;math&gt;f(E(y))&lt;/math&gt; has the same dimensions as an encoded token. That is an "image token". Then, one can interleave text tokens and image tokens. The compound model is then fine-tuned on an image-text dataset. This basic construction can be refined in various ways to improve the model. The image encoder may be frozen to improve stability.&lt;ref&gt;{{Cite arXiv |last1=Li |first1=Junnan |last2=Li |first2=Dongxu |last3=Savarese |first3=Silvio |last4=Hoi |first4=Steven |date=2023-01-01 |title=BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |class=cs.CV |eprint=2301.12597 }}&lt;/ref&gt;</div></td> </tr> </table> 91.194.221.232
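The "image token" construction described in the Multimodality paragraph above translates almost line for line into code. Below is a minimal PyTorch sketch; the class name, feature dimensions, and number of image tokens are illustrative assumptions, not the architecture of BLIP-2, Flamingo, or any other specific model.

import torch
import torch.nn as nn

class ImageTokenizer(nn.Module):
    """Maps an image to a short sequence of "image tokens" that live in the
    same embedding space as the LLM's text tokens: f(E(y))."""
    def __init__(self, encoder: nn.Module, enc_dim: int, llm_dim: int, n_tokens: int = 8):
        super().__init__()
        self.encoder = encoder                 # trained image encoder E
        for p in self.encoder.parameters():    # frozen, to improve stability
            p.requires_grad = False
        self.n_tokens, self.llm_dim = n_tokens, llm_dim
        self.proj = nn.Sequential(             # small multilayer perceptron f
            nn.Linear(enc_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, n_tokens * llm_dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.encoder(image)        # E(y), shape (batch, enc_dim)
        tokens = self.proj(feats)              # f(E(y)), shape (batch, n_tokens * llm_dim)
        # One row per image token, each the same width as a text-token embedding,
        # so image tokens can be interleaved with text tokens before the LLM:
        return tokens.view(-1, self.n_tokens, self.llm_dim)

The resulting image tokens can then be interleaved with ordinary text-token embeddings, for example torch.cat([image_tokens, text_embeddings], dim=1), and the compound model fine-tuned on image-text pairs while the encoder stays frozen.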