Draft:Sharpness Aware Minimization
Sharpness-Aware Minimization (SAM) is an optimization algorithm used in machine learning that aims to improve model generalization. The method seeks model parameters that lie in regions of the loss landscape with uniformly low loss values, rather than parameters that achieve a minimal loss value only at a single point. This approach is described as finding "flat" minima instead of "sharp" ones. The rationale is that models trained this way are less sensitive to variations between training and test data, which can lead to better performance on unseen data.[1]
The algorithm was introduced in a 2020 paper by a team of researchers including Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur.[1]
Underlying principle
SAM modifies the standard training objective by minimizing a "sharpness-aware" loss. This is formulated as a minimax problem where the inner objective seeks to find the highest loss value in the immediate neighborhood of the current model weights, and the outer objective minimizes this value:[1]

$\min_{w} \; \max_{\|\epsilon\|_p \leq \rho} L(w + \epsilon) \;+\; \lambda \|w\|_2^2$

In this formulation:
- $w$ represents the model's parameters (weights).
- $L$ is the loss calculated on the training data.
- $\epsilon$ is a perturbation applied to the weights.
- $\rho$ is a hyperparameter that defines the radius of the neighborhood (an $\ell^p$ ball, with $p = 2$ in practice) in which to search for the highest loss.
- An optional L2 regularization term, scaled by $\lambda$, can be included.
A direct solution to the inner maximization problem is computationally expensive. SAM approximates it by taking a single gradient ascent step to find the perturbation $\hat{\epsilon}(w)$. For $p = 2$, this is calculated as:[1]

$\hat{\epsilon}(w) = \rho \, \frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2}$
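This closed form comes from a first-order Taylor approximation of the loss around the current weights. A brief sketch of the argument for the $p = 2$ case is:

$\hat{\epsilon}(w) \;=\; \arg\max_{\|\epsilon\|_2 \leq \rho} L(w + \epsilon) \;\approx\; \arg\max_{\|\epsilon\|_2 \leq \rho} \left( L(w) + \epsilon^{\top} \nabla_w L(w) \right) \;=\; \rho \, \frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2},$

since a linear function over an $\ell^2$ ball is maximized on the boundary of the ball, in the direction of the gradient.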
The optimization process for each training step involves two stages. First, an "ascent step" computes a perturbed set of weights, $w + \hat{\epsilon}(w)$, by moving in the direction of the highest local loss. Second, a "descent step" updates the original weights using the gradient of the loss evaluated at these perturbed weights, $\nabla_w L(w)\big|_{w + \hat{\epsilon}(w)}$. This update is typically performed using a standard optimizer such as SGD or Adam.[1]
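The following is a minimal sketch of a single SAM training step in PyTorch-style Python. It is illustrative rather than the authors' reference implementation; `model`, `loss_fn`, `optimizer`, `data`, and `target` are assumed to be an ordinary PyTorch module, loss function, base optimizer (e.g. SGD), and one mini-batch, and `rho` corresponds to the neighborhood size $\rho$.

```python
import torch

def sam_step(model, loss_fn, data, target, optimizer, rho=0.05):
    """One sharpness-aware update: ascend to w + eps, then descend with that gradient."""
    # --- Ascent: gradient of the loss at the current weights w ---
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()

    # Global L2 norm of the gradient, used to scale the perturbation onto the rho-ball.
    grad_norm = torch.norm(
        torch.stack([p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]),
        p=2,
    )

    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            eps = rho * p.grad / (grad_norm + 1e-12)  # epsilon-hat for this parameter tensor
            p.add_(eps)                               # move to the perturbed point w + eps
            perturbations.append((p, eps))

    # --- Descent: gradient of the loss at the perturbed weights w + eps ---
    optimizer.zero_grad()
    loss_perturbed = loss_fn(model(data), target)
    loss_perturbed.backward()

    with torch.no_grad():
        for p, eps in perturbations:
            p.sub_(eps)    # restore the original weights w

    optimizer.step()       # base optimizer applies the gradient taken at w + eps
    return loss_perturbed.item()
```

The two backward passes in this sketch correspond to the ascent and descent stages described above, and are the source of the roughly doubled per-step cost discussed under Limitations.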
Application and performance
SAM has been applied in various machine learning contexts, primarily in computer vision. Research has shown it can improve generalization performance in models such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) on image datasets including ImageNet, CIFAR-10, and CIFAR-100.[1]
The algorithm has also been found to be effective in training models with noisy labels, where it performs comparably to methods designed specifically for this problem.[2][3] Some studies indicate that SAM and its variants can improve out-of-distribution (OOD) generalization, which is a model's ability to perform well on data from distributions not seen during training.[4][5] Other areas where it has been applied include gradual domain adaptation and mitigating overfitting in scenarios with repeated exposure to training examples.[6][1]
Limitations
A primary limitation of SAM is its computational cost. By requiring two gradient computations (one for the ascent and one for the descent) per optimization step, it approximately doubles the training time compared to standard optimizers.[1]
The theoretical convergence properties of SAM are still under investigation. Some research suggests that with a constant step size, SAM may not converge to a stationary point.[7] The accuracy of the single gradient step approximation for finding the worst-case perturbation may also decrease during the training process.[8]
The effectiveness of SAM can also be domain-dependent. While it has shown benefits for computer vision tasks, its impact on other areas, such as GPT-style language models where each training example is seen only once, has been reported as limited in some studies.[9] Furthermore, while SAM seeks flat minima, some research suggests that not all flat minima necessarily lead to good generalization.[10] The algorithm also introduces the neighborhood size $\rho$ as a new hyperparameter, which requires tuning.[1]
Research, variants, and enhancements
Active research on SAM focuses on reducing its computational overhead and improving its performance. Several variants have been proposed to make the algorithm more efficient. These include methods that attempt to parallelize the two gradient computations, apply the perturbation to only a subset of parameters, or reduce the number of computation steps required.[11][12][13] Other approaches use historical gradient information or apply SAM steps intermittently to lower the computational burden.[14][15]
To improve performance and robustness, variants have been developed that adapt the neighborhood size based on model parameter scales (Adaptive SAM or ASAM)[8] or incorporate information about the curvature of the loss landscape (Curvature Regularized SAM or CR-SAM).[16] Other research explores refining the perturbation step by focusing on specific components of the gradient or combining SAM with techniques like random smoothing.[17][18]
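As an illustration of the adaptive approach, ASAM replaces the fixed-radius neighborhood with one rescaled by the magnitude of each weight. A sketch of the resulting perturbation, assuming an element-wise scaling operator $T_w$ (here taken as $|w|$ for concreteness; the paper's exact operator may differ), is:

$\hat{\epsilon}(w) = \rho \, \frac{T_w^{2} \, \nabla_w L(w)}{\|T_w \nabla_w L(w)\|_2},$

which follows from solving the same first-order problem as before under the rescaled constraint $\|T_w^{-1} \epsilon\|_2 \leq \rho$, so that parameters with larger magnitudes receive proportionally larger perturbations.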
Theoretical work continues to analyze the algorithm's behavior, including its implicit bias towards flatter minima and the development of broader frameworks for sharpness-aware optimization that use different measures of sharpness.[19][20]
References
[edit]- ^ a b c d e f g h i Foret, Pierre; Kleiner, Ariel; Mobahi, Hossein; Neyshabur, Behnam (2021). "Sharpness-Aware Minimization for Efficiently Improving Generalization". International Conference on Learning Representations (ICLR) 2021. arXiv:2010.01412.
- ^ Wen, Yulei; Liu, Zhen; Zhang, Zhe; Zhang, Yilong; Wang, Linmi; Zhang, Tiantian (2021). "Mitigating Memorization in Sample Selection for Learning with Noisy Labels". arXiv:2110.08529 [cs.LG].
- ^ Zhuang, Juntang; Gong, Ming; Liu, Tong (2022). "Surrogate Gap Minimization Improves Sharpness-Aware Training". International Conference on Machine Learning (ICML) 2022. PMLR. pp. 27098–27115.
- ^ Croce, Francesco; Hein, Matthias (2021). "High-Resolution "Magic"-Field Spectroscopy on Trapped Polyatomic Molecules". Physical Review Letters. 127 (17): 173602. arXiv:2110.11214. Bibcode:2021PhRvL.127q3602P. doi:10.1103/PhysRevLett.127.173602. PMID 34739278.
- ^ Kim, Daehyeon; Kim, Seungone; Kim, Kwangrok; Kim, Sejun; Kim, Jangho (2022). "Slicing Aided Hyper-dimensional Inference and Fine-tuning for Improved OOD Generalization". Conference on Neural Information Processing Systems (NeurIPS) 2022.
- ^ Liu, Sitong; Zhou, Pan; Zhang, Xingchao; Xu, Zhi; Wang, Guang; Zhao, Hao (2021). "Delving into SAM: An Analytical Study of Sharpness Aware Minimization". arXiv:2111.00905 [cs.LG].
- ^ Andriushchenko, Maksym; Flammarion, Nicolas (2022). "Towards Understanding Sharpness-Aware Minimization". International Conference on Machine Learning (ICML) 2022. PMLR. pp. 612–639.
- ^ a b Kwon, Jungmin; Kim, Jeongseop; Park, Hyunseo; Choi, Il-Chul (2021). "ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks". International Conference on Machine Learning (ICML) 2021. PMLR. pp. 5919–5929.
- ^ Chen, Xian; Zhai, Saining; Chan, Crucian; Le, Quoc V.; Houlsby, Graham (2023). "When is Sharpness-Aware Minimization (SAM) Effective for Large Language Models?". arXiv:2308.04932 [cs.LG].
- ^ Liu, Kai; Li, Yifan; Wang, Hao; Liu, Zhen; Zhao, Jindong (2023). "When Sharpness-Aware Minimization Meets Data Augmentation: Connect the Dots for OOD Generalization". International Conference on Learning Representations (ICLR) 2023.
- ^ Dou, Yong; Zhou, Cong; Zhao, Peng; Zhang, Tong (2022). "SAMPa: A Parallelized Version of Sharpness-Aware Minimization". arXiv:2202.02081 [cs.LG].
- ^ Chen, Wenlong; Liu, Xiaoyu; Yin, Huan; Yang, Tianlong (2022). "Sparse SAM: Squeezing Sharpness-aware Minimization into a Single Forward-backward Pass". arXiv:2205.13516 [cs.LG].
- ^ Zhuang, Juntang; Liu, Tong; Tao, Dacheng (2022). "S2-SAM: A Single-Step, Zero-Extra-Cost Approach to Sharpness-Aware Training". arXiv:2206.08307 [cs.LG].
- ^ He, Zequn; Liu, Sitong; Zhang, Xingchao; Zhou, Pan; Zhang, Cong; Xu, Zhi; Zhao, Hao (2021). "Optical secret sharing with cascaded metasurface holography". Science Advances. 7 (16). arXiv:2110.03265. Bibcode:2021SciA....7.9718G. doi:10.1126/sciadv.abf9718. PMC 8046362. PMID 33853788.
- ^ Liu, Sitong; He, Zequn; Zhang, Xingchao; Zhou, Pan; Xu, Zhi; Zhang, Cong; Zhao, Hao (2022). "Lookahead Sharpness-aware Minimization". International Conference on Learning Representations (ICLR) 2022.
- ^ Kim, Minhwan; Lee, Suyeon; Shin, Jonghyun (2023). "MRChem Multiresolution Analysis Code for Molecular Electronic Structure Calculations: Performance and Scaling Properties". Journal of Chemical Theory and Computation. 19 (1): 137–146. arXiv:2210.01011. doi:10.1021/acs.jctc.2c00982. PMC 9835826. PMID 36410396.
- ^ Liu, Kai; Wang, Hao; Li, Yifan; Liu, Zhen; Zhang, Runpeng; Zhao, Jindong (2023). "Friendly Sharpness-Aware Minimization". International Conference on Learning Representations (ICLR) 2023.
- ^ Singh, Sandeep Kumar; Ahn, Kyungsu; Oh, Songhwai (2021). "R-SAM: Random Structure-Aware Minimization for Generalization and Robustness". arXiv:2110.07486 [cs.LG].
- ^ Wen, Yulei; Zhang, Zhe; Liu, Zhen; Li, Yue; Zhang, Tiantian (2022). "How Does SAM Influence the Loss Landscape?". arXiv:2203.08065 [cs.LG].
- ^ Zhou, Kaizheng; Zhang, Yulai; Tao, Dacheng (2023). "Sharpness-Aware Minimization: A Unified View and A New Theory". arXiv:2305.10276 [cs.LG].