TabPFN

From Wikipedia, the free encyclopedia
TabPFN
Developer(s): Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, Frank Hutter, Leo Grinsztajn, Klemens Flöge, Oscar Key, and Sauraj Gambhir[1]
Initial release: September 16, 2023[2][3]
Written in: Python[3]
Operating system: Linux, macOS, Microsoft Windows[3]
Type: Machine learning
License: Apache License 2.0
Website: github.com/PriorLabs/TabPFN

TabPFN (Tabular Prior-data Fitted Network) is a machine learning model that uses a transformer architecture for supervised classification and regression tasks on small to medium-sized tabular datasets, e.g., up to 10,000 samples.[1]

Overview

TabPFN was first developed in 2022; TabPFN v2 was published in 2025 in Nature by Hollmann and co-authors.[1] The source code is published on GitHub under a modified Apache License and on PyPI.[4]

TabPFN v1 was introduced in a 2022 pre-print and presented at ICLR 2023.[2] Prior Labs, founded in 2024, aims to commercialize TabPFN.[5]

TabPFN supports classification, regression and generative tasks,[1] and its TabPFN-TS extension adds time series forecasting.[6]

Training

TabPFN is an instance of a prior-data fitted network:[10] a transformer pre-trained on synthetic tabular datasets rather than fitted to each new dataset, so it does not require extensive hyperparameter optimization.[2][7][11]

TabPFN addresses long-standing challenges in modeling tabular data, a domain in which tree-based models have historically outperformed deep learning.[8][9]

Because it is pre-trained only once, TabPFN can process a new dataset in a single forward pass, adapting to the input without retraining.[2] The model's transformer encoder processes features and labels by alternating attention across rows and columns, capturing relationships within the data.[7] TabPFN v2, an updated version, handles numerical and categorical features as well as missing values, and supports tasks such as regression and synthetic data generation.[1]
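The alternating attention over rows and columns can be sketched in a few lines of NumPy. This is a simplified, single-head illustration of the idea, not the actual TabPFN implementation; all shapes and names here are invented for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the last two axes.
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

# A table embedded as (rows, columns, embedding dim):
# 8 samples, 5 features, 16-dimensional cell embeddings.
rng = np.random.default_rng(0)
table = rng.normal(size=(8, 5, 16))

# Attention across columns: each cell attends to the other
# features of the same sample (row).
across_columns = attention(table, table, table)

# Attention across rows: transpose so each cell attends to the
# same feature in the other samples, then transpose back.
t = np.swapaxes(across_columns, 0, 1)          # (columns, rows, dim)
across_rows = np.swapaxes(attention(t, t, t), 0, 1)

print(across_rows.shape)  # (8, 5, 16)
```

Alternating the two directions lets information flow both between the features of one sample and between samples sharing a feature, which is how the encoder can relate the labeled training rows to the unlabeled query rows in one pass.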

TabPFN's pre-training exclusively uses synthetically generated datasets, avoiding benchmark contamination and the costs of curating real-world data.[2] TabPFN v2 was pre-trained on approximately 130 million such datasets, each serving as a "meta-datapoint".[1]

The synthetic datasets are primarily drawn from a prior distribution embodying causal reasoning principles, using Structural Causal Models (SCMs) or Bayesian Neural Networks (BNNs). Random inputs are passed through these models to generate outputs, with a bias towards simpler causal structures. The process generates diverse datasets that simulate real-world imperfections like missing values, imbalanced data and noise. During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is executed by running a neural network forward pass.[1]
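A minimal sketch of this data-generating process, using only NumPy, might look as follows. It is a toy stand-in for the prior described above, not TabPFN's actual sampler: the DAG structure, the `tanh` link functions, and the 5% missingness rate are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm_dataset(n_samples=100, n_nodes=6):
    """Draw one synthetic dataset from a random structural causal
    model: nodes are ordered (a simple DAG), and each node is a
    noisy nonlinear function of the nodes before it."""
    values = np.zeros((n_samples, n_nodes))
    for j in range(n_nodes):
        noise = rng.normal(size=n_samples)
        if j == 0:
            values[:, j] = noise                      # root cause: pure noise
        else:
            w = rng.normal(size=j)                    # random causal weights
            values[:, j] = np.tanh(values[:, :j] @ w) + 0.1 * noise
    # One node becomes the prediction target; the rest are features.
    target_col = rng.integers(n_nodes)
    y = values[:, target_col]
    X = np.delete(values, target_col, axis=1)
    # Simulate real-world imperfections: randomly missing cells.
    X[rng.random(X.shape) < 0.05] = np.nan
    return X, y

X, y = sample_scm_dataset()
print(X.shape, y.shape)  # (100, 5) (100,)
```

During pre-training, each dataset drawn this way is split into "training" rows with visible targets and "query" rows whose targets are masked, and the network is trained to predict the masked values.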

Performance

TabPFN v2 has been found to outperform tuned tree-based models like XGBoost or CatBoost in accuracy and speed on small tabular datasets.[2][failed verification] For one application, it matched the accuracy of CatBoost with less training data.[12][failed verification] According to the Nature commentary, traditional models may be more efficient on larger datasets.[7][failed verification] The original v1 release of TabPFN imposed restrictions on multi-class classification tasks, a shortcoming that v2 partially addresses.[2][failed verification]

Understanding and explaining the behavior and performance of TabPFN is an active area of research.[5][failed verification]

Other research and applications

TabPFN has been applied in domains such as time series forecasting,[6] chemoproteomics,[13] insurance risk classification,[14] medical diagnostics,[15][16][17][18] metagenomics,[19] wildfire propagation modeling,[20] and others.[21]


References

  1. ^ a b c d e f g h Hollmann, N.; Müller, S.; Purucker, L. (2025). "Accurate predictions on small data with a tabular foundation model". Nature. 637 (8045): 319–326. Bibcode:2025Natur.637..319H. doi:10.1038/s41586-024-08328-6. PMC 11711098. PMID 39780007.
  2. ^ a b c d e f g Hollmann, Noah (2023). TabPFN: A transformer that solves small tabular classification problems in a second. International Conference on Learning Representations (ICLR).
  3. ^ a b c "tabpfn". Python Package Index (PyPI). https://pypi.org/project/tabpfn/
  4. ^ PriorLabs/TabPFN, Prior Labs, 2025-06-22, retrieved 2025-06-23
  5. ^ a b Kahn, Jeremy (5 February 2025). "AI has struggled to analyze tables and spreadsheets. This German startup thinks its breakthrough is about to change that". Fortune.
  6. ^ a b "TabPFN Time Series". GitHub.
  7. ^ a b c McElfresh, Duncan C. (8 January 2025). "The AI tool that can interpret any spreadsheet instantly". Nature. 637 (8045): 274–275. Bibcode:2025Natur.637..274M. doi:10.1038/d41586-024-03852-x. PMID 39780000.
  8. ^ Shwartz-Ziv, Ravid; Armon, Amitai (2022). "Tabular data: Deep learning is not all you need". Information Fusion. 81: 84–90. arXiv:2106.03253. doi:10.1016/j.inffus.2021.11.011.
  9. ^ Grinsztajn, Léo; Oyallon, Edouard; Varoquaux, Gaël (2022). Why do tree-based models still outperform deep learning on typical tabular data?. Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS '22). pp. 507–520.
  10. ^ Müller, Samuel (2022). Transformers can do Bayesian inference. International Conference on Learning Representations (ICLR).
  11. ^ McCarter, Calvin (May 7, 2024). "What exactly has TabPFN learned to do? | ICLR Blogposts 2024". iclr-blogposts.github.io. Retrieved 2025-06-22.
  12. ^ Bender, C.; Vestergaard, P.; Cichosz, S.L. (2025). "The History, Evolution and Future of Continuous Glucose Monitoring (CGM)". Diabetology. 6 (3): 17. doi:10.3390/diabetology6030017.
  13. ^ Offensperger, Fabian; Tin, Gary; Duran-Frigola, Miquel; Hahn, Elisa; Dobner, Sarah; Ende, Christopher W. am; Strohbach, Joseph W.; Rukavina, Andrea; Brennsteiner, Vincenth; Ogilvie, Kevin; Marella, Nara; Kladnik, Katharina; Ciuffa, Rodolfo; Majmudar, Jaimeen D.; Field, S. Denise; Bensimon, Ariel; Ferrari, Luca; Ferrada, Evandro; Ng, Amanda; Zhang, Zhechun; Degliesposti, Gianluca; Boeszoermenyi, Andras; Martens, Sascha; Stanton, Robert; Müller, André C.; Hannich, J. Thomas; Hepworth, David; Superti-Furga, Giulio; Kubicek, Stefan; Schenone, Monica; Winter, Georg E. (26 April 2024). "Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells". Science. 384 (6694): eadk5864. Bibcode:2024Sci...384k5864O. doi:10.1126/science.adk5864. PMID 38662832.
  14. ^ Chu, Jasmin Z. K.; Than, Joel C. M.; Jo, Hudyjaya Siswoyo (2024). "Deep Learning for Cross-Selling Health Insurance Classification". 2024 International Conference on Green Energy, Computing and Sustainable Technology (GECOST). pp. 453–457. doi:10.1109/GECOST60902.2024.10475046. ISBN 979-8-3503-5790-5.
  15. ^ Alzakari, Sarah A.; Aldrees, Asma; Umer, Muhammad; Cascone, Lucia; Innab, Nisreen; Ashraf, Imran (December 2024). "Artificial intelligence-driven predictive framework for early detection of still birth". SLAS Technology. 29 (6): 100203. doi:10.1016/j.slast.2024.100203. PMID 39424101.
  16. ^ El-Melegy, Moumen; Mamdouh, Ahmed; Ali, Samia; Badawy, Mohamed; El-Ghar, Mohamed Abou; Alghamdi, Norah Saleh; El-Baz, Ayman (21 June 2024). "Prostate Cancer Diagnosis via Visual Representation of Tabular Data and Deep Transfer Learning". Bioengineering. 11 (7): 635. doi:10.3390/bioengineering11070635. PMC 11274351. PMID 39061717.
  17. ^ Karabacak, Mert; Schupper, Alexander; Carr, Matthew; Margetis, Konstantinos (August 2024). "A machine learning-based approach for individualized prediction of short-term outcomes after anterior cervical corpectomy". Asian Spine Journal. 18 (4): 541–549. doi:10.31616/asj.2024.0048. PMC 11366553. PMID 39113482.
  18. ^ Liu, Yanqing; Su, Zhenyi; Tavana, Omid; Gu, Wei (June 2024). "Understanding the complexity of p53 in a new era of tumor suppression". Cancer Cell. 42 (6): 946–967. doi:10.1016/j.ccell.2024.04.009. PMC 11190820. PMID 38729160.
  19. ^ Perciballi, Giulia; Granese, Federica; Fall, Ahmad; Zehraoui, Farida; Prifti, Edi; Zucker, Jean-Daniel (10 October 2024). Adapting TabPFN for Zero-Inflated Metagenomic Data. Table Representation Learning Workshop at NeurIPS 2024.
  20. ^ Khanmohammadi, Sadegh; Cruz, Miguel G.; Perrakis, Daniel D.B.; Alexander, Martin E.; Arashpour, Mehrdad (September 2024). "Using AutoML and generative AI to predict the type of wildfire propagation in Canadian conifer forests". Ecological Informatics. 82: 102711. doi:10.1016/j.ecoinf.2024.102711.
  21. ^ Peña-Asensio, Eloy; Trigo-Rodríguez, Josep M.; Sort, Jordi; Ibáñez-Insa, Jordi; Rimola, Albert (September 2024). "Machine learning applications on lunar meteorite minerals: From classification to mechanical properties prediction". International Journal of Mining Science and Technology. 34 (9): 1283–1292. Bibcode:2024IJMST..34.1283P. doi:10.1016/j.ijmst.2024.08.001.