TabPFN (Tabular Prior-data Fitted Network) is a machine learning model proposed in 2022 that uses a transformer architecture. It is intended for supervised classification and regression on small- to medium-sized tabular datasets, e.g., up to 10,000 samples.
Prior Labs, founded in 2024, aims to commercialize TabPFN.
TabPFN v2 was pre-trained on approximately 130 million synthetic datasets. These datasets are generated using structural causal models or Bayesian neural networks; the generation process can include simulating missing values, imbalanced data, and noise. Random inputs are passed through these models to produce outputs, with a bias towards simpler causal structures. During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is executed by a single neural-network forward pass. At inference time, a new dataset is likewise processed in one forward pass without retraining. The model's transformer encoder processes features and labels by alternating attention across rows and columns. TabPFN v2 handles numerical and categorical features and missing values, and supports tasks such as regression and synthetic data generation.
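The synthetic-data prior described above can be illustrated with a toy sketch: draw a random causal graph, propagate random inputs through noisy mechanisms, and mask one column as the target. This is only a minimal stand-in for the much richer generative priors used in TabPFN's actual pre-training; all names and parameters here are illustrative.

```python
import random


def sample_scm_dataset(n_features=4, n_rows=8, seed=0):
    """Draw one toy synthetic dataset from a random structural causal model.

    Illustrative stand-in only; TabPFN's real prior is far more elaborate.
    """
    rng = random.Random(seed)
    # Random DAG: node j may depend on any earlier node i < j (topological order).
    parents = [[i for i in range(j) if rng.random() < 0.5] for j in range(n_features)]
    # Random linear weights play the role of the causal mechanisms.
    weights = {(i, j): rng.gauss(0, 1) for j in range(n_features) for i in parents[j]}

    rows = []
    for _ in range(n_rows):
        x = [0.0] * n_features
        for j in range(n_features):
            # Each variable is a noisy function of its parents.
            x[j] = sum(weights[(i, j)] * x[i] for i in parents[j]) + rng.gauss(0, 0.1)
        rows.append(x)

    # Mask one column as the target; threshold it to get classification labels.
    target_col = rng.randrange(n_features)
    X = [[v for k, v in enumerate(r) if k != target_col] for r in rows]
    y = [int(r[target_col] > 0) for r in rows]
    return X, y


X, y = sample_scm_dataset()
```

Pre-training then amounts to sampling millions of such (X, y) tables and asking the transformer to predict the masked targets of held-out rows.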
Because TabPFN comes pre-trained, it does not require the costly hyperparameter optimization typical of other deep learning methods.