https://en.wikipedia.org/w/index.php?action=history&feed=atom&title=Variational_autoencoder
Variational autoencoder - Revision history
2025-06-01T21:52:47Z
Revision history for this page on the wiki
MediaWiki 1.45.0-wmf.3
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1292165124&oldid=prev
OAbot: Open access bot: url-access updated in citation with #oabot.
2025-05-25T14:55:18Z
<p><a href="/wiki/Wikipedia:OABOT" class="mw-redirect" title="Wikipedia:OABOT">Open access bot</a>: url-access updated in citation with #oabot.</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 14:55, 25 May 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 12:</td>
<td colspan="2" class="diff-lineno">Line 12:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Thus, the encoder maps each point (such as an image) from a large complex dataset into a distribution within the latent space, rather than to a single point in that space. The decoder has the opposite function, which is to map from the latent space to the input space, again according to a distribution (although in practice, noise is rarely added during the decoding stage). By mapping a point to a distribution instead of a single point, the network can avoid overfitting the training data. Both networks are typically trained together with the usage of the [[#Reparameterization|reparameterization trick]], although the variance of the noise model can be learned separately.{{cn|date=June 2024}}</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Thus, the encoder maps each point (such as an image) from a large complex dataset into a distribution within the latent space, rather than to a single point in that space. The decoder has the opposite function, which is to map from the latent space to the input space, again according to a distribution (although in practice, noise is rarely added during the decoding stage). By mapping a point to a distribution instead of a single point, the network can avoid overfitting the training data. Both networks are typically trained together with the usage of the [[#Reparameterization|reparameterization trick]], although the variance of the noise model can be learned separately.{{cn|date=June 2024}}</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Although this type of model was initially designed for [[unsupervised learning]],<ref>{{cite arXiv |last1=Dilokthanakul |first1=Nat |last2=Mediano |first2=Pedro A. M. |last3=Garnelo |first3=Marta |last4=Lee |first4=Matthew C. H. |last5=Salimbeni |first5=Hugh |last6=Arulkumaran |first6=Kai |last7=Shanahan |first7=Murray |title=Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders |date=2017-01-13 |class=cs.LG |eprint=1611.02648}}</ref><ref>{{cite book |last1=Hsu |first1=Wei-Ning |last2=Zhang |first2=Yu |last3=Glass |first3=James |title=2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |chapter=Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation |date=December 2017 |pages=16–23 |doi=10.1109/ASRU.2017.8268911 |arxiv=1707.06265 |isbn=978-1-5090-4788-8 |s2cid=22681625 |chapter-url=https://ieeexplore.ieee.org/document/8268911}}</ref> its effectiveness has been proven for [[semi-supervised learning]]<ref>{{cite book |last1=Ehsan Abbasnejad |first1=M. |last2=Dick |first2=Anthony |last3=van den Hengel |first3=Anton |title=Infinite Variational Autoencoder for Semi-Supervised Learning |date=2017 |pages=5888–5897 |url=https://openaccess.thecvf.com/content_cvpr_2017/html/Abbasnejad_Infinite_Variational_Autoencoder_CVPR_2017_paper.html}}</ref><ref>{{cite journal |last1=Xu |first1=Weidi |last2=Sun |first2=Haoze |last3=Deng |first3=Chao |last4=Tan |first4=Ying |title=Variational Autoencoder for Semi-Supervised Text Classification |journal=Proceedings of the AAAI Conference on Artificial Intelligence |date=2017-02-12 |volume=31 |issue=1 |doi=10.1609/aaai.v31i1.10966 |s2cid=2060721 |url=https://ojs.aaai.org/index.php/AAAI/article/view/10966 |language=en|doi-access=free }}</ref> and [[supervised learning]].<ref>{{cite journal |last1=Kameoka |first1=Hirokazu |last2=Li |first2=Li |last3=Inoue |first3=Shota |last4=Makino |first4=Shoji |title=Supervised Determined Source Separation with Multichannel Variational Autoencoder |journal=Neural Computation |date=2019-09-01 |volume=31 |issue=9 |pages=1891–1914 |doi=10.1162/neco_a_01217 |pmid=31335290 |s2cid=198168155 |url=https://direct.mit.edu/neco/article/31/9/1891/8494/Supervised-Determined-Source-Separation-with}}</ref></div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Although this type of model was initially designed for [[unsupervised learning]],<ref>{{cite arXiv |last1=Dilokthanakul |first1=Nat |last2=Mediano |first2=Pedro A. M. |last3=Garnelo |first3=Marta |last4=Lee |first4=Matthew C. H. |last5=Salimbeni |first5=Hugh |last6=Arulkumaran |first6=Kai |last7=Shanahan |first7=Murray |title=Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders |date=2017-01-13 |class=cs.LG |eprint=1611.02648}}</ref><ref>{{cite book |last1=Hsu |first1=Wei-Ning |last2=Zhang |first2=Yu |last3=Glass |first3=James |title=2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |chapter=Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation |date=December 2017 |pages=16–23 |doi=10.1109/ASRU.2017.8268911 |arxiv=1707.06265 |isbn=978-1-5090-4788-8 |s2cid=22681625 |chapter-url=https://ieeexplore.ieee.org/document/8268911}}</ref> its effectiveness has been proven for [[semi-supervised learning]]<ref>{{cite book |last1=Ehsan Abbasnejad |first1=M. |last2=Dick |first2=Anthony |last3=van den Hengel |first3=Anton |title=Infinite Variational Autoencoder for Semi-Supervised Learning |date=2017 |pages=5888–5897 |url=https://openaccess.thecvf.com/content_cvpr_2017/html/Abbasnejad_Infinite_Variational_Autoencoder_CVPR_2017_paper.html}}</ref><ref>{{cite journal |last1=Xu |first1=Weidi |last2=Sun |first2=Haoze |last3=Deng |first3=Chao |last4=Tan |first4=Ying |title=Variational Autoencoder for Semi-Supervised Text Classification |journal=Proceedings of the AAAI Conference on Artificial Intelligence |date=2017-02-12 |volume=31 |issue=1 |doi=10.1609/aaai.v31i1.10966 |s2cid=2060721 |url=https://ojs.aaai.org/index.php/AAAI/article/view/10966 |language=en|doi-access=free }}</ref> and [[supervised learning]].<ref>{{cite journal |last1=Kameoka |first1=Hirokazu |last2=Li |first2=Li |last3=Inoue |first3=Shota |last4=Makino |first4=Shoji |title=Supervised Determined Source Separation with Multichannel Variational Autoencoder |journal=Neural Computation |date=2019-09-01 |volume=31 |issue=9 |pages=1891–1914 |doi=10.1162/neco_a_01217 |pmid=31335290 |s2cid=198168155 |url=https://direct.mit.edu/neco/article/31/9/1891/8494/Supervised-Determined-Source-Separation-with<ins style="font-weight: bold; text-decoration: none;">|url-access=subscription </ins>}}</ref></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Overview of architecture and operation ==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Overview of architecture and operation ==</div></td>
</tr>
<tr>
<td colspan="2" class="diff-lineno">Line 97:</td>
<td colspan="2" class="diff-lineno">Line 97:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Some structures directly deal with the quality of the generated samples<ref>{{Cite arXiv|last1=Dai|first1=Bin|last2=Wipf|first2=David|date=2019-10-30|title=Diagnosing and Enhancing VAE Models|class=cs.LG|eprint=1903.05789}}</ref><ref>{{Cite arXiv|last1=Dorta|first1=Garoe|last2=Vicente|first2=Sara|last3=Agapito|first3=Lourdes|last4=Campbell|first4=Neill D. F.|last5=Simpson|first5=Ivor|date=2018-07-31|title=Training VAEs Under Structured Residuals|class=stat.ML|eprint=1804.01050}}</ref> or implement more than one latent space to further improve the representation learning.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Some structures directly deal with the quality of the generated samples<ref>{{Cite arXiv|last1=Dai|first1=Bin|last2=Wipf|first2=David|date=2019-10-30|title=Diagnosing and Enhancing VAE Models|class=cs.LG|eprint=1903.05789}}</ref><ref>{{Cite arXiv|last1=Dorta|first1=Garoe|last2=Vicente|first2=Sara|last3=Agapito|first3=Lourdes|last4=Campbell|first4=Neill D. F.|last5=Simpson|first5=Ivor|date=2018-07-31|title=Training VAEs Under Structured Residuals|class=stat.ML|eprint=1804.01050}}</ref> or implement more than one latent space to further improve the representation learning.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Some architectures mix VAE and [[generative adversarial network]]s to obtain hybrid models.<ref>{{Cite journal|last1=Larsen|first1=Anders Boesen Lindbo|last2=Sønderby|first2=Søren Kaae|last3=Larochelle|first3=Hugo|last4=Winther|first4=Ole|date=2016-06-11|title=Autoencoding beyond pixels using a learned similarity metric|url=http://proceedings.mlr.press/v48/larsen16.html|journal=International Conference on Machine Learning|language=en|publisher=PMLR|pages=1558–1566|arxiv=1512.09300}}</ref><ref>{{cite arXiv|last1=Bao|first1=Jianmin|last2=Chen|first2=Dong|last3=Wen|first3=Fang|last4=Li|first4=Houqiang|last5=Hua|first5=Gang|date=2017|title=CVAE-GAN: Fine-Grained Image Generation Through Asymmetric Training|pages=2745–2754|class=cs.CV|eprint=1703.10155}}</ref><ref>{{Cite journal|last1=Gao|first1=Rui|last2=Hou|first2=Xingsong|last3=Qin|first3=Jie|last4=Chen|first4=Jiaxin|last5=Liu|first5=Li|last6=Zhu|first6=Fan|last7=Zhang|first7=Zhao|last8=Shao|first8=Ling|date=2020|title=Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning|url=https://ieeexplore.ieee.org/document/8957359|journal=IEEE Transactions on Image Processing|volume=29|pages=3665–3680|doi=10.1109/TIP.2020.2964429|pmid=31940538|bibcode=2020ITIP...29.3665G|s2cid=210334032|issn=1941-0042}}</ref></div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Some architectures mix VAE and [[generative adversarial network]]s to obtain hybrid models.<ref>{{Cite journal|last1=Larsen|first1=Anders Boesen Lindbo|last2=Sønderby|first2=Søren Kaae|last3=Larochelle|first3=Hugo|last4=Winther|first4=Ole|date=2016-06-11|title=Autoencoding beyond pixels using a learned similarity metric|url=http://proceedings.mlr.press/v48/larsen16.html|journal=International Conference on Machine Learning|language=en|publisher=PMLR|pages=1558–1566|arxiv=1512.09300}}</ref><ref>{{cite arXiv|last1=Bao|first1=Jianmin|last2=Chen|first2=Dong|last3=Wen|first3=Fang|last4=Li|first4=Houqiang|last5=Hua|first5=Gang|date=2017|title=CVAE-GAN: Fine-Grained Image Generation Through Asymmetric Training|pages=2745–2754|class=cs.CV|eprint=1703.10155}}</ref><ref>{{Cite journal|last1=Gao|first1=Rui|last2=Hou|first2=Xingsong|last3=Qin|first3=Jie|last4=Chen|first4=Jiaxin|last5=Liu|first5=Li|last6=Zhu|first6=Fan|last7=Zhang|first7=Zhao|last8=Shao|first8=Ling|date=2020|title=Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning|url=https://ieeexplore.ieee.org/document/8957359|journal=IEEE Transactions on Image Processing|volume=29|pages=3665–3680|doi=10.1109/TIP.2020.2964429|pmid=31940538|bibcode=2020ITIP...29.3665G|s2cid=210334032|issn=1941-0042<ins style="font-weight: bold; text-decoration: none;">|url-access=subscription</ins>}}</ref></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is not necessary to use gradients to update the encoder. In fact, the encoder is not necessary for the generative model. <ref>{{cite book | last1=Drefs | first1=J. | last2=Guiraud | first2=E. | last3=Panagiotou | first3=F. | last4=Lücke | first4=J. | chapter=Direct evolutionary optimization of variational autoencoders with binary latents | title=Joint European Conference on Machine Learning and Knowledge Discovery in Databases | series=Lecture Notes in Computer Science | pages=357–372 | year=2023 | volume=13715 | publisher=Springer Nature Switzerland | doi=10.1007/978-3-031-26409-2_22 | isbn=978-3-031-26408-5 | chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-26409-2_22 }}</ref></div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>It is not necessary to use gradients to update the encoder. In fact, the encoder is not necessary for the generative model. <ref>{{cite book | last1=Drefs | first1=J. | last2=Guiraud | first2=E. | last3=Panagiotou | first3=F. | last4=Lücke | first4=J. | chapter=Direct evolutionary optimization of variational autoencoders with binary latents | title=Joint European Conference on Machine Learning and Knowledge Discovery in Databases | series=Lecture Notes in Computer Science | pages=357–372 | year=2023 | volume=13715 | publisher=Springer Nature Switzerland | doi=10.1007/978-3-031-26409-2_22 | isbn=978-3-031-26408-5 | chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-26409-2_22 }}</ref></div></td>
</tr>
</table>
OAbot
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1288065409&oldid=prev
Herbmuell: /* Evidence lower bound (ELBO) */ x|z to be consistent with z|x just below.
2025-04-30T05:53:06Z
<p><span class="autocomment">Evidence lower bound (ELBO): </span> x|z to be consistent with z|x just below.</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 05:53, 30 April 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 69:</td>
<td colspan="2" class="diff-lineno">Line 69:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= \ln p_\theta(x) - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot | x})) </math>Maximizing the ELBO<math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg max} \, L_{\theta,\phi}(x) </math>is equivalent to simultaneously maximizing <math>\ln p_\theta(x) </math> and minimizing <math> D_{KL}(q_\phi({z| x})\parallel p_\theta({z| x})) </math>. That is, maximizing the log-likelihood of the observed data, and minimizing the divergence of the approximate posterior <math>q_\phi(\cdot | x) </math> from the exact posterior <math>p_\theta(\cdot | x) </math>.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= \ln p_\theta(x) - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot | x})) </math>Maximizing the ELBO<math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg max} \, L_{\theta,\phi}(x) </math>is equivalent to simultaneously maximizing <math>\ln p_\theta(x) </math> and minimizing <math> D_{KL}(q_\phi({z| x})\parallel p_\theta({z| x})) </math>. That is, maximizing the log-likelihood of the observed data, and minimizing the divergence of the approximate posterior <math>q_\phi(\cdot | x) </math> from the exact posterior <math>p_\theta(\cdot | x) </math>.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The form given is not very convenient for maximization, but the following, equivalent form, is:<math display="block">L_{\theta,\phi}(x) = \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln p_\theta(x|z)\right] - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta(\cdot)) </math>where <math>\ln p_\theta(x|z)</math> is implemented as <math>-\frac{1}{2}\| x - D_\theta(z)\|^2_2</math>, since that is, up to an additive constant, what <math>x \sim \mathcal N(D_\theta(z), I)</math> yields. That is, we model the distribution of <math>x</math> conditional on <math>z</math> to be a Gaussian distribution centered on <math>D_\theta(z)</math>. The distribution of <math>q_\phi(z |x)</math> and <math>p_\theta(z)</math> are often also chosen to be Gaussians as <math>z|x \sim \mathcal N(E_\phi(x), \sigma_\phi(x)^2I)</math> and <math>z \sim \mathcal N(0, I)</math>, with which we obtain by the formula for [[Kullback–Leibler divergence#Multivariate normal distributions|KL divergence of Gaussians]]:<math display="block">L_{\theta,\phi}(x) = -\frac 12\mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \|x - D_\theta(z)\|_2^2\right] - \frac 12 \left( N\sigma_\phi(x)^2 + \|E_\phi(x)\|_2^2 - 2N\ln\sigma_\phi(x) \right) + Const </math>Here <math> N </math> is the dimension of <math> z </math>. For a more detailed derivation and more interpretations of ELBO and its maximization, see [[Evidence lower bound|its main page]].</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The form given is not very convenient for maximization, but the following, equivalent form, is:<math display="block">L_{\theta,\phi}(x) = \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln p_\theta(x|z)\right] - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta(\cdot)) </math>where <math>\ln p_\theta(x|z)</math> is implemented as <math>-\frac{1}{2}\| x - D_\theta(z)\|^2_2</math>, since that is, up to an additive constant, what <math>x<ins style="font-weight: bold; text-decoration: none;">|z</ins> \sim \mathcal N(D_\theta(z), I)</math> yields. That is, we model the distribution of <math>x</math> conditional on <math>z</math> to be a Gaussian distribution centered on <math>D_\theta(z)</math>. The distribution of <math>q_\phi(z |x)</math> and <math>p_\theta(z)</math> are often also chosen to be Gaussians as <math>z|x \sim \mathcal N(E_\phi(x), \sigma_\phi(x)^2I)</math> and <math>z \sim \mathcal N(0, I)</math>, with which we obtain by the formula for [[Kullback–Leibler divergence#Multivariate normal distributions|KL divergence of Gaussians]]:<math display="block">L_{\theta,\phi}(x) = -\frac 12\mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \|x - D_\theta(z)\|_2^2\right] - \frac 12 \left( N\sigma_\phi(x)^2 + \|E_\phi(x)\|_2^2 - 2N\ln\sigma_\phi(x) \right) + Const </math>Here <math> N </math> is the dimension of <math> z </math>. For a more detailed derivation and more interpretations of ELBO and its maximization, see [[Evidence lower bound|its main page]].</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Reparameterization ==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Reparameterization ==</div></td>
</tr>
</table>
Herbmuell
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1286071809&oldid=prev
82.102.110.228: Stochastic gradient descend has nothing to do with taking expectations. Undid revision 1280088605 by G.S.Ray (talk)
2025-04-17T15:18:08Z
<p>Stochastic gradient descend has nothing to do with taking expectations. Undid revision <a href="/wiki/Special:Diff/1280088605" title="Special:Diff/1280088605">1280088605</a> by <a href="/wiki/Special:Contributions/G.S.Ray" title="Special:Contributions/G.S.Ray">G.S.Ray</a> (<a href="/w/index.php?title=User_talk:G.S.Ray&action=edit&redlink=1" class="new" title="User talk:G.S.Ray (page does not exist)">talk</a>)</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:18, 17 April 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 110:</td>
<td colspan="2" class="diff-lineno">Line 110:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>We obtain the final formula for the loss:</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>We obtain the final formula for the loss:</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><math display="block"> L_{\theta,\phi} = \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><math display="block"> L_{\theta,\phi} = \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>+d \left( \mu(dz), E_\phi \sharp \mathbb{P}^{real} \right)^2</math></div></td>
</tr>
<tr>
<td class="diff-marker"><a class="mw-diff-movedpara-left" title="Paragraph was moved. Click to jump to new location." href="#movedpara_3_1_rhs">⚫</a></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_2_0_lhs"></a><del style="font-weight: bold; text-decoration: none;">+d</del> <del style="font-weight: bold; text-decoration: none;">\left(</del> <del style="font-weight: bold; text-decoration: none;">\mu(dz),</del> <del style="font-weight: bold; text-decoration: none;">E_\phi \sharp \mathbb{P}^{real} \right)^2</math></del><math>d</math> <del style="font-weight: bold; text-decoration: none;">must</del> <del style="font-weight: bold; text-decoration: none;">have certain</del> properties <del style="font-weight: bold; text-decoration: none;">depending</del> <del style="font-weight: bold; text-decoration: none;">on</del> <del style="font-weight: bold; text-decoration: none;">the</del> <del style="font-weight: bold; text-decoration: none;">type of algorithm used</del> to <del style="font-weight: bold; text-decoration: none;">mimize</del> <del style="font-weight: bold; text-decoration: none;">this</del> <del style="font-weight: bold; text-decoration: none;">loss</del> <del style="font-weight: bold; text-decoration: none;">function.</del> <del style="font-weight: bold; text-decoration: none;">For</del> <del style="font-weight: bold; text-decoration: none;">example,</del> <del style="font-weight: bold; text-decoration: none;">it</del> <del style="font-weight: bold; text-decoration: none;">has</del> <del style="font-weight: bold; text-decoration: none;">to</del> <del style="font-weight: bold; text-decoration: none;">be</del> <del style="font-weight: bold; text-decoration: none;">expressable as an expectation if it</del> <del style="font-weight: bold; text-decoration: none;">is</del> to be optimized by<del style="font-weight: bold; text-decoration: none;"> a</del> [[Stochastic gradient descent|stochastic optimization <del style="font-weight: bold; text-decoration: none;">algorithm</del>]]. Several distances can be chosen and this <del style="font-weight: bold; text-decoration: none;">has given</del> rise to several flavors of VAEs:</div></td>
<td colspan="2" class="diff-empty diff-side-added"></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker"><a class="mw-diff-movedpara-right" title="Paragraph was moved. Click to jump to old location." href="#movedpara_2_0_lhs">⚫</a></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_3_1_rhs"></a><ins style="font-weight: bold; text-decoration: none;">The</ins> <ins style="font-weight: bold; text-decoration: none;">statistical</ins> <ins style="font-weight: bold; text-decoration: none;">distance</ins> <math>d</math> <ins style="font-weight: bold; text-decoration: none;">requires</ins> <ins style="font-weight: bold; text-decoration: none;">special</ins> properties<ins style="font-weight: bold; text-decoration: none;">,</ins> <ins style="font-weight: bold; text-decoration: none;">for</ins> <ins style="font-weight: bold; text-decoration: none;">instance</ins> <ins style="font-weight: bold; text-decoration: none;">it</ins> <ins style="font-weight: bold; text-decoration: none;">has</ins> to <ins style="font-weight: bold; text-decoration: none;">be</ins> <ins style="font-weight: bold; text-decoration: none;">posses</ins> <ins style="font-weight: bold; text-decoration: none;">a</ins> <ins style="font-weight: bold; text-decoration: none;">formula</ins> <ins style="font-weight: bold; text-decoration: none;">as</ins> <ins style="font-weight: bold; text-decoration: none;">expectation</ins> <ins style="font-weight: bold; text-decoration: none;">because</ins> <ins style="font-weight: bold; text-decoration: none;">the</ins> <ins style="font-weight: bold; text-decoration: none;">loss</ins> <ins style="font-weight: bold; text-decoration: none;">function</ins> <ins style="font-weight: bold; text-decoration: none;">will</ins> <ins style="font-weight: bold; text-decoration: none;">need</ins> to be optimized by [[Stochastic gradient descent|stochastic optimization <ins style="font-weight: bold; text-decoration: none;">algorithms</ins>]]. Several distances can be chosen and this <ins style="font-weight: bold; text-decoration: none;">gave</ins> rise to several flavors of VAEs:</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the sliced Wasserstein distance used by S Kolouri, et al. in their VAE<ref>{{Cite conference |last1=Kolouri |first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref></div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the sliced Wasserstein distance used by S Kolouri, et al. in their VAE<ref>{{Cite conference |last1=Kolouri |first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the [[energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref></div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the [[energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref></div></td>
</tr>
</table>
82.102.110.228
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1285405588&oldid=prev
213.196.211.24: Fix formatting.
2025-04-13T14:42:43Z
<p>Fix formatting.</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 14:42, 13 April 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 21:</td>
<td colspan="2" class="diff-lineno">Line 21:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>To optimize this model, one needs to know two terms: the "reconstruction error", and the [[Kullback–Leibler divergence]] (KL-D). Both terms are derived from the free energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data, here referred to as p-distribution. For example, a standard VAE task such as IMAGENET is typically assumed to have a gaussianly distributed noise; however, tasks such as binarized MNIST require a Bernoulli noise. The KL-D from the free energy expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution, which unfortunately can result in mode-seeking behaviour. The "reconstruction" term is the remainder of the free energy expression, and requires a sampling approximation to compute its expectation value.<ref name="Kingma2013">{{cite arXiv |last1=Kingma |first1=Diederik P. |last2=Welling |first2=Max |title=Auto-Encoding Variational Bayes |date=2013-12-20 |class=stat.ML |eprint=1312.6114}}</ref> </div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>To optimize this model, one needs to know two terms: the "reconstruction error", and the [[Kullback–Leibler divergence]] (KL-D). Both terms are derived from the free energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data, here referred to as p-distribution. For example, a standard VAE task such as IMAGENET is typically assumed to have a gaussianly distributed noise; however, tasks such as binarized MNIST require a Bernoulli noise. The KL-D from the free energy expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution, which unfortunately can result in mode-seeking behaviour. The "reconstruction" term is the remainder of the free energy expression, and requires a sampling approximation to compute its expectation value.<ref name="Kingma2013">{{cite arXiv |last1=Kingma |first1=Diederik P. |last2=Welling |first2=Max |title=Auto-Encoding Variational Bayes |date=2013-12-20 |class=stat.ML |eprint=1312.6114}}</ref> </div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>More recent approaches replace [[Kullback–Leibler divergence]] (KL-D) with [[Statistical distance|various statistical distances]], see [[#Statistical distance VAE variants|<del style="font-weight: bold; text-decoration: none;">see section </del>"Statistical distance VAE variants" below<del style="font-weight: bold; text-decoration: none;">.]]</del>.</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>More recent approaches replace [[Kullback–Leibler divergence]] (KL-D) with [[Statistical distance|various statistical distances]], see [[#Statistical distance VAE variants|"Statistical distance VAE variants"<ins style="font-weight: bold; text-decoration: none;">]]</ins> below.</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Formulation ==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Formulation ==</div></td>
</tr>
</table>
213.196.211.24
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1281286425&oldid=prev
AndrewGibbs7: Punctuation error - full stop should be a comma.
2025-03-19T12:50:22Z
<p>Punctuation error - full stop should be a comma.</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 12:50, 19 March 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 103:</td>
<td colspan="2" class="diff-lineno">Line 103:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Statistical distance VAE variants==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Statistical distance VAE variants==</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>After the initial work of Diederik P. Kingma and [[Max Welling]]<del style="font-weight: bold; text-decoration: none;">.</del><ref>{{Cite arXiv |eprint=1312.6114 |class=stat.ML |first1=Diederik P. |last1=Kingma |first2=Max |last2=Welling |title=Auto-Encoding Variational Bayes |date=2022-12-10}}</ref> several procedures were </div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>After the initial work of Diederik P. Kingma and [[Max Welling]]<ins style="font-weight: bold; text-decoration: none;">,</ins><ref>{{Cite arXiv |eprint=1312.6114 |class=stat.ML |first1=Diederik P. |last1=Kingma |first2=Max |last2=Welling |title=Auto-Encoding Variational Bayes |date=2022-12-10}}</ref> several procedures were </div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>proposed to formulate in a more abstract way the operation of the VAE. In these approaches the loss function is composed of two parts : </div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>proposed to formulate in a more abstract way the operation of the VAE. In these approaches the loss function is composed of two parts : </div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the usual reconstruction error part which seeks to ensure that the encoder-then-decoder mapping <math>x \mapsto D_\theta(E_\psi(x))</math> is as close to the identity map as possible; the sampling is done at run time from the empirical distribution <math>\mathbb{P}^{real}</math> of objects available (e.g., for MNIST or IMAGENET this will be the empirical probability law of all images in the dataset). This gives the term: <math> \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</math>.</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the usual reconstruction error part which seeks to ensure that the encoder-then-decoder mapping <math>x \mapsto D_\theta(E_\psi(x))</math> is as close to the identity map as possible; the sampling is done at run time from the empirical distribution <math>\mathbb{P}^{real}</math> of objects available (e.g., for MNIST or IMAGENET this will be the empirical probability law of all images in the dataset). This gives the term: <math> \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</math>.</div></td>
</tr>
</table>
AndrewGibbs7
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1280088605&oldid=prev
G.S.Ray: Cleaned up phrasing and made clearer (I hope) the link between the statistical distance and the type of optimization algorithm.
2025-03-12T12:30:49Z
<p>Cleaned up phrasing and made clearer (I hope) the link between the statistical distance and the type of optimization algorithm.</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 12:30, 12 March 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 110:</td>
<td colspan="2" class="diff-lineno">Line 110:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>We obtain the final formula for the loss:</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>We obtain the final formula for the loss:</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><math display="block"> L_{\theta,\phi} = \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><math display="block"> L_{\theta,\phi} = \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</div></td>
</tr>
<tr>
<td colspan="2" class="diff-empty diff-side-deleted"></td>
<td class="diff-marker"><a class="mw-diff-movedpara-right" title="Paragraph was moved. Click to jump to old location." href="#movedpara_3_1_lhs">⚫</a></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_1_0_rhs"></a><ins style="font-weight: bold; text-decoration: none;">+d</ins> <ins style="font-weight: bold; text-decoration: none;">\left(</ins> <ins style="font-weight: bold; text-decoration: none;">\mu(dz),</ins> <ins style="font-weight: bold; text-decoration: none;">E_\phi \sharp \mathbb{P}^{real} \right)^2</math></ins><math>d</math> <ins style="font-weight: bold; text-decoration: none;">must</ins> <ins style="font-weight: bold; text-decoration: none;">have certain</ins> properties <ins style="font-weight: bold; text-decoration: none;">depending</ins> <ins style="font-weight: bold; text-decoration: none;">on</ins> <ins style="font-weight: bold; text-decoration: none;">the</ins> <ins style="font-weight: bold; text-decoration: none;">type of algorithm used</ins> to <ins style="font-weight: bold; text-decoration: none;">mimize</ins> <ins style="font-weight: bold; text-decoration: none;">this</ins> <ins style="font-weight: bold; text-decoration: none;">loss</ins> <ins style="font-weight: bold; text-decoration: none;">function.</ins> <ins style="font-weight: bold; text-decoration: none;">For</ins> <ins style="font-weight: bold; text-decoration: none;">example,</ins> <ins style="font-weight: bold; text-decoration: none;">it</ins> <ins style="font-weight: bold; text-decoration: none;">has</ins> <ins style="font-weight: bold; text-decoration: none;">to</ins> <ins style="font-weight: bold; text-decoration: none;">be</ins> <ins style="font-weight: bold; text-decoration: none;">expressable as an expectation if it</ins> <ins style="font-weight: bold; text-decoration: none;">is</ins> to be optimized by<ins style="font-weight: bold; text-decoration: none;"> a</ins> [[Stochastic gradient descent|stochastic optimization <ins style="font-weight: bold; text-decoration: none;">algorithm</ins>]]. Several distances can be chosen and this <ins style="font-weight: bold; text-decoration: none;">has given</ins> rise to several flavors of VAEs:</div></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>+d \left( \mu(dz), E_\phi \sharp \mathbb{P}^{real} \right)^2</math></div></td>
<td colspan="2" class="diff-empty diff-side-added"></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td colspan="2" class="diff-empty diff-side-added"></td>
</tr>
<tr>
<td class="diff-marker"><a class="mw-diff-movedpara-left" title="Paragraph was moved. Click to jump to new location." href="#movedpara_1_0_rhs">⚫</a></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><a name="movedpara_3_1_lhs"></a><del style="font-weight: bold; text-decoration: none;">The</del> <del style="font-weight: bold; text-decoration: none;">statistical</del> <del style="font-weight: bold; text-decoration: none;">distance</del> <math>d</math> <del style="font-weight: bold; text-decoration: none;">requires</del> <del style="font-weight: bold; text-decoration: none;">special</del> properties<del style="font-weight: bold; text-decoration: none;">,</del> <del style="font-weight: bold; text-decoration: none;">for</del> <del style="font-weight: bold; text-decoration: none;">instance</del> <del style="font-weight: bold; text-decoration: none;">it</del> <del style="font-weight: bold; text-decoration: none;">has</del> to <del style="font-weight: bold; text-decoration: none;">be</del> <del style="font-weight: bold; text-decoration: none;">posses</del> <del style="font-weight: bold; text-decoration: none;">a</del> <del style="font-weight: bold; text-decoration: none;">formula</del> <del style="font-weight: bold; text-decoration: none;">as</del> <del style="font-weight: bold; text-decoration: none;">expectation</del> <del style="font-weight: bold; text-decoration: none;">because</del> <del style="font-weight: bold; text-decoration: none;">the</del> <del style="font-weight: bold; text-decoration: none;">loss</del> <del style="font-weight: bold; text-decoration: none;">function</del> <del style="font-weight: bold; text-decoration: none;">will</del> <del style="font-weight: bold; text-decoration: none;">need</del> to be optimized by [[Stochastic gradient descent|stochastic optimization <del style="font-weight: bold; text-decoration: none;">algorithms</del>]]. Several distances can be chosen and this <del style="font-weight: bold; text-decoration: none;">gave</del> rise to several flavors of VAEs:</div></td>
<td colspan="2" class="diff-empty diff-side-added"></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the sliced Wasserstein distance used by S Kolouri, et al. in their VAE<ref>{{Cite conference |last1=Kolouri |first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref></div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the sliced Wasserstein distance used by S Kolouri, et al. in their VAE<ref>{{Cite conference |last1=Kolouri |first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the [[energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref></div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the [[energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref></div></td>
</tr>
</table>
G.S.Ray
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1278307551&oldid=prev
Arjayay: Sp
2025-03-01T17:53:00Z
<p>Sp</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 17:53, 1 March 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 112:</td>
<td colspan="2" class="diff-lineno">Line 112:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>+d \left( \mu(dz), E_\phi \sharp \mathbb{P}^{real} \right)^2</math></div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>+d \left( \mu(dz), E_\phi \sharp \mathbb{P}^{real} \right)^2</math></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The statistical distance <math>d</math> requires special properties, for instance <del style="font-weight: bold; text-decoration: none;">is</del> has to be posses a formula as expectation because the loss function will need to be optimized by [[Stochastic gradient descent|stochastic optimization algorithms]]. Several distances can be chosen and this gave rise to several flavors of VAEs:</div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The statistical distance <math>d</math> requires special properties, for instance <ins style="font-weight: bold; text-decoration: none;">it</ins> has to be posses a formula as expectation because the loss function will need to be optimized by [[Stochastic gradient descent|stochastic optimization algorithms]]. Several distances can be chosen and this gave rise to several flavors of VAEs:</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the sliced Wasserstein distance used by S Kolouri, et al. in their VAE<ref>{{Cite conference |last1=Kolouri |first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref></div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the sliced Wasserstein distance used by S Kolouri, et al. in their VAE<ref>{{Cite conference |last1=Kolouri |first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the [[energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref></div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* the [[energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref></div></td>
</tr>
</table>
Arjayay
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1269809754&oldid=prev
Bearcat at 14:31, 16 January 2025
2025-01-16T14:31:27Z
<p></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 14:31, 16 January 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 86:</td>
<td colspan="2" class="diff-lineno">Line 86:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>\mathbb {E}_{\epsilon}\left[ \nabla_\phi \ln {\frac {p_{\theta }(x, \mu_\phi(x) + L_\phi(x)\epsilon)}{q_{\phi }(\mu_\phi(x) + L_\phi(x)\epsilon | x)}}\right] </math>and so we obtained an unbiased estimator of the gradient, allowing [[stochastic gradient descent]].</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>\mathbb {E}_{\epsilon}\left[ \nabla_\phi \ln {\frac {p_{\theta }(x, \mu_\phi(x) + L_\phi(x)\epsilon)}{q_{\phi }(\mu_\phi(x) + L_\phi(x)\epsilon | x)}}\right] </math>and so we obtained an unbiased estimator of the gradient, allowing [[stochastic gradient descent]].</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Since we reparametrized <math>z</math>, we need to find <math>q_\phi(z|x)</math>. Let <math>q_0</math> be the probability density function for <math>\epsilon</math>, then {{clarify |reason=The following calculations might have mistakes.|date=October 2023<del style="font-weight: bold; text-decoration: none;">; I interchanged z and epsilon in the definition of the Jacobian below </del>}}<math display="block">\ln q_\phi(z | x) = \ln q_0 (\epsilon) - \ln|\det(\partial_\epsilon z)|</math>where <math>\partial_\epsilon z</math> is the Jacobian matrix of <math>z</math> with respect to <math>\epsilon</math>. Since <math>z = \mu_\phi(x) + L_\phi(x)\epsilon </math>, this is <math display="block">\ln q_\phi(z | x) = -\frac 12 \|\epsilon\|^2 - \ln|\det L_\phi(x)| - \frac n2 \ln(2\pi)</math></div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Since we reparametrized <math>z</math>, we need to find <math>q_\phi(z|x)</math>. Let <math>q_0</math> be the probability density function for <math>\epsilon</math>, then {{clarify |reason=The following calculations might have mistakes.|date=October 2023}}<math display="block">\ln q_\phi(z | x) = \ln q_0 (\epsilon) - \ln|\det(\partial_\epsilon z)|</math>where <math>\partial_\epsilon z</math> is the Jacobian matrix of <math>z</math> with respect to <math>\epsilon</math>. Since <math>z = \mu_\phi(x) + L_\phi(x)\epsilon </math>, this is <math display="block">\ln q_\phi(z | x) = -\frac 12 \|\epsilon\|^2 - \ln|\det L_\phi(x)| - \frac n2 \ln(2\pi)</math></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Variations ==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Variations ==</div></td>
</tr>
</table>
Bearcat
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1269604440&oldid=prev
AnomieBOT: Dating maintenance tags: {{Clarify}}
2025-01-15T13:38:02Z
<p>Dating maintenance tags: {{Clarify}}</p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 13:38, 15 January 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 86:</td>
<td colspan="2" class="diff-lineno">Line 86:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>\mathbb {E}_{\epsilon}\left[ \nabla_\phi \ln {\frac {p_{\theta }(x, \mu_\phi(x) + L_\phi(x)\epsilon)}{q_{\phi }(\mu_\phi(x) + L_\phi(x)\epsilon | x)}}\right] </math>and so we obtained an unbiased estimator of the gradient, allowing [[stochastic gradient descent]].</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>\mathbb {E}_{\epsilon}\left[ \nabla_\phi \ln {\frac {p_{\theta }(x, \mu_\phi(x) + L_\phi(x)\epsilon)}{q_{\phi }(\mu_\phi(x) + L_\phi(x)\epsilon | x)}}\right] </math>and so we obtained an unbiased estimator of the gradient, allowing [[stochastic gradient descent]].</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Since we reparametrized <math>z</math>, we need to find <math>q_\phi(z|x)</math>. Let <math>q_0</math> be the probability density function for <math>\epsilon</math>, then {{clarify |reason=The following calculations might have mistakes.|date=October 2023; I interchanged z and epsilon in the definition of the Jacobian below <del style="font-weight: bold; text-decoration: none;">| date January 2025</del>}}<math display="block">\ln q_\phi(z | x) = \ln q_0 (\epsilon) - \ln|\det(\partial_\epsilon z)|</math>where <math>\partial_\epsilon z</math> is the Jacobian matrix of <math>z</math> with respect to <math>\epsilon</math>. Since <math>z = \mu_\phi(x) + L_\phi(x)\epsilon </math>, this is <math display="block">\ln q_\phi(z | x) = -\frac 12 \|\epsilon\|^2 - \ln|\det L_\phi(x)| - \frac n2 \ln(2\pi)</math></div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Since we reparametrized <math>z</math>, we need to find <math>q_\phi(z|x)</math>. Let <math>q_0</math> be the probability density function for <math>\epsilon</math>, then {{clarify |reason=The following calculations might have mistakes.|date=October 2023; I interchanged z and epsilon in the definition of the Jacobian below }}<math display="block">\ln q_\phi(z | x) = \ln q_0 (\epsilon) - \ln|\det(\partial_\epsilon z)|</math>where <math>\partial_\epsilon z</math> is the Jacobian matrix of <math>z</math> with respect to <math>\epsilon</math>. Since <math>z = \mu_\phi(x) + L_\phi(x)\epsilon </math>, this is <math display="block">\ln q_\phi(z | x) = -\frac 12 \|\epsilon\|^2 - \ln|\det L_\phi(x)| - \frac n2 \ln(2\pi)</math></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Variations ==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Variations ==</div></td>
</tr>
</table>
AnomieBOT
https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&diff=1269588695&oldid=prev
2A0C:5BC0:40:1058:9E7B:EFFF:FE25:5BDB: /* Reparameterization */
2025-01-15T11:36:49Z
<p><span class="autocomment">Reparameterization</span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Previous revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 11:36, 15 January 2025</td>
</tr><tr>
<td colspan="2" class="diff-lineno">Line 86:</td>
<td colspan="2" class="diff-lineno">Line 86:</td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>\mathbb {E}_{\epsilon}\left[ \nabla_\phi \ln {\frac {p_{\theta }(x, \mu_\phi(x) + L_\phi(x)\epsilon)}{q_{\phi }(\mu_\phi(x) + L_\phi(x)\epsilon | x)}}\right] </math>and so we obtained an unbiased estimator of the gradient, allowing [[stochastic gradient descent]].</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>\mathbb {E}_{\epsilon}\left[ \nabla_\phi \ln {\frac {p_{\theta }(x, \mu_\phi(x) + L_\phi(x)\epsilon)}{q_{\phi }(\mu_\phi(x) + L_\phi(x)\epsilon | x)}}\right] </math>and so we obtained an unbiased estimator of the gradient, allowing [[stochastic gradient descent]].</div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker" data-marker="−"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Since we reparametrized <math>z</math>, we need to find <math>q_\phi(z|x)</math>. Let <math>q_0</math> be the probability density function for <math>\epsilon</math>, then {{clarify |reason=The following calculations might have mistakes.|date=October 2023}}<math display="block">\ln q_\phi(z | x) = \ln q_0 (\epsilon) - \ln|\det(\partial_\epsilon z)|</math>where <math>\partial_\epsilon z</math> is the Jacobian matrix of <math><del style="font-weight: bold; text-decoration: none;">\epsilon</del></math> with respect to <math><del style="font-weight: bold; text-decoration: none;">z</del></math>. Since <math>z = \mu_\phi(x) + L_\phi(x)\epsilon </math>, this is <math display="block">\ln q_\phi(z | x) = -\frac 12 \|\epsilon\|^2 - \ln|\det L_\phi(x)| - \frac n2 \ln(2\pi)</math></div></td>
<td class="diff-marker" data-marker="+"></td>
<td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Since we reparametrized <math>z</math>, we need to find <math>q_\phi(z|x)</math>. Let <math>q_0</math> be the probability density function for <math>\epsilon</math>, then {{clarify |reason=The following calculations might have mistakes.|date=October 2023<ins style="font-weight: bold; text-decoration: none;">; I interchanged z and epsilon in the definition of the Jacobian below | date January 2025</ins>}}<math display="block">\ln q_\phi(z | x) = \ln q_0 (\epsilon) - \ln|\det(\partial_\epsilon z)|</math>where <math>\partial_\epsilon z</math> is the Jacobian matrix of <math><ins style="font-weight: bold; text-decoration: none;">z</ins></math> with respect to <math><ins style="font-weight: bold; text-decoration: none;">\epsilon</ins></math>. Since <math>z = \mu_\phi(x) + L_\phi(x)\epsilon </math>, this is <math display="block">\ln q_\phi(z | x) = -\frac 12 \|\epsilon\|^2 - \ln|\det L_\phi(x)| - \frac n2 \ln(2\pi)</math></div></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br /></td>
</tr>
<tr>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Variations ==</div></td>
<td class="diff-marker"></td>
<td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Variations ==</div></td>
</tr>
</table>
2A0C:5BC0:40:1058:9E7B:EFFF:FE25:5BDB