Self-supervised learning - Misplaced Pages

(Redirected from Contrastive learning) Machine learning paradigm

Machine learning and data mining
Part of a series on
Paradigms Supervised learning Unsupervised learning Semi-supervised learning Self-supervised learning Reinforcement learning Meta-learning Online learning Batch learning Curriculum learning Rule-based learning Neuro-symbolic AI Neuromorphic engineering Quantum machine learning
Problems Classification Generative modeling Regression Clustering Dimensionality reduction Density estimation Anomaly detection Data cleaning AutoML Association rules Semantic analysis Structured prediction Feature engineering Feature learning Learning to rank Grammar induction Ontology learning Multimodal learning
Supervised learning (classification • regression) Apprenticeship learning Decision trees Ensembles Bagging Boosting Random forest k-NN Linear regression Naive Bayes Artificial neural networks Logistic regression Perceptron Relevance vector machine (RVM) Support vector machine (SVM)
Clustering BIRCH CURE Hierarchical k-means Fuzzy Expectation–maximization (EM) DBSCAN OPTICS Mean shift
Dimensionality reduction Factor analysis CCA ICA LDA NMF PCA PGD t-SNE SDL
Structured prediction Graphical models Bayes net Conditional random field Hidden Markov
Anomaly detection RANSAC k-NN Local outlier factor Isolation forest
Artificial neural network Autoencoder Deep learning Feedforward neural network Recurrent neural network LSTM GRU ESN reservoir computing Boltzmann machine Restricted GAN Diffusion model SOM Convolutional neural network U-Net LeNet AlexNet DeepDream Neural radiance field Transformer Vision Mamba Spiking neural network Memtransistor Electrochemical RAM (ECRAM)
Reinforcement learning Q-learning SARSA Temporal difference (TD) Multi-agent Self-play
Learning with humans Active learning Crowdsourcing Human-in-the-loop RLHF
Model diagnostics Coefficient of determination Confusion matrix Learning curve ROC curve
Mathematical foundations Kernel machines Bias–variance tradeoff Computational learning theory Empirical risk minimization Occam learning PAC learning Statistical learning VC theory
Journals and conferences ECML PKDD NeurIPS ICML ICLR IJCAI ML JMLR
Related articles Glossary of artificial intelligence List of datasets for machine-learning research List of datasets in computer vision and image processing Outline of machine learning
v t e

Self-supervised learning (SSL) is a paradigm in machine learning where a model is trained on a task using the data itself to generate supervisory signals, rather than relying on externally-provided labels. In the context of neural networks, self-supervised learning aims to leverage inherent structures or relationships within the input data to create meaningful training signals. SSL tasks are designed so that solving them requires capturing essential features or relationships in the data. The input data is typically augmented or transformed in a way that creates pairs of related samples, where one sample serves as the input, and the other is used to formulate the supervisory signal. This augmentation can involve introducing noise, cropping, rotation, or other transformations. Self-supervised learning more closely imitates the way humans learn to classify objects.

During SSL, the model learns in two steps. First, the task is solved based on an auxiliary or pretext classification task using pseudo-labels, which help to initialize the model parameters. Next, the actual task is performed with supervised or unsupervised learning.

Self-supervised learning has produced promising results in recent years, and has found practical application in fields such as audio processing, and is being used by Facebook and others for speech recognition.

Types

Autoassociative self-supervised learning

Autoassociative self-supervised learning is a specific category of self-supervised learning where a neural network is trained to reproduce or reconstruct its own input data. In other words, the model is tasked with learning a representation of the data that captures its essential features or structure, allowing it to regenerate the original input.

The term "autoassociative" comes from the fact that the model is essentially associating the input data with itself. This is often achieved using autoencoders, which are a type of neural network architecture used for representation learning. Autoencoders consist of an encoder network that maps the input data to a lower-dimensional representation (latent space), and a decoder network that reconstructs the input from this representation.

The training process involves presenting the model with input data and requiring it to reconstruct the same data as closely as possible. The loss function used during training typically penalizes the difference between the original input and the reconstructed output (e.g. mean squared error). By minimizing this reconstruction error, the autoencoder learns a meaningful representation of the data in its latent space.

Contrastive self-supervised learning

For a binary classification task, training data can be divided into positive examples and negative examples. Positive examples are those that match the target. For example, if training a classifier to identify birds, the positive training data would include images that contain birds. Negative examples would be images that do not. Contrastive self-supervised learning uses both positive and negative examples. The loss function in contrastive learning is used to minimize the distance between positive sample pairs, while maximizing the distance between negative sample pairs.

An early example uses a pair of 1-dimensional convolutional neural networks to process a pair of images and maximize their agreement.

Contrastive Language-Image Pre-training (CLIP) allows joint pretraining of a text encoder and an image encoder, such that a matching image-text pair have image encoding vector and text encoding vector that span a small angle (having a large cosine similarity).

InfoNCE (Noise-Contrastive Estimation) is a method to optimize two models jointly, based on Noise Contrastive Estimation (NCE). Given a set $X=\left\{x_{1},\ldots x_{N}\right\}$ of $N$ random samples containing one positive sample from $p\left(x_{t+k}\mid c_{t}\right)$ and $N-1$ negative samples from the 'proposal' distribution $p\left(x_{t+k}\right)$ , it minimizes the following loss function:

${\mathcal {L}}_{\mathrm {N} }=-\mathbb {E} _{X}\left$

Non-contrastive self-supervised learning

Non-contrastive self-supervised learning (NCSSL) uses only positive examples. Counterintuitively, NCSSL converges on a useful local minimum rather than reaching a trivial solution, with zero loss. For the example of binary classification, it would trivially learn to classify each example as positive. Effective NCSSL requires an extra predictor on the online side that does not back-propagate on the target side.

Comparison with other forms of machine learning

SSL belongs to supervised learning methods insofar as the goal is to generate a classified output from the input. At the same time, however, it does not require the explicit use of labeled input-output pairs. Instead, correlations, metadata embedded in the data, or domain knowledge present in the input are implicitly and autonomously extracted from the data. These supervisory signals, generated from the data, can then be used for training.

SSL is similar to unsupervised learning in that it does not require labels in the sample data. Unlike unsupervised learning, however, learning is not done using inherent data structures.

Semi-supervised learning combines supervised and unsupervised learning, requiring only a small portion of the learning data be labeled.

In transfer learning a model designed for one task is reused on a different task.

Training an autoencoder intrinsically constitutes a self-supervised process, because the output pattern needs to become an optimal reconstruction of the input pattern itself. However, in current jargon, the term 'self-supervised' has become associated with classification tasks that are based on a pretext-task training setup. This involves the (human) design of such pretext task(s), unlike the case of fully self-contained autoencoder training.

In reinforcement learning, self-supervising learning from a combination of losses can create abstract representations where only the most important information about the state are kept in a compressed way.

Examples

Self-supervised learning is particularly suitable for speech recognition. For example, Facebook developed wav2vec, a self-supervised algorithm, to perform speech recognition using two deep convolutional neural networks that build on each other.

Google's Bidirectional Encoder Representations from Transformers (BERT) model is used to better understand the context of search queries.

OpenAI's GPT-3 is an autoregressive language model that can be used in language processing. It can be used to translate texts or answer questions, among other things.

Bootstrap Your Own Latent (BYOL) is a NCSSL that produced excellent results on ImageNet and on transfer and semi-supervised benchmarks.

The Yarowsky algorithm is an example of self-supervised learning in natural language processing. From a small number of labeled examples, it learns to predict which word sense of a polysemous word is being used at a given point in text.

DirectPred is a NCSSL that directly sets the predictor weights instead of learning it via typical gradient descent.

Self-GenomeNet is an example of self-supervised learning in genomics.

References

^ Bouchard, Louis (25 November 2020). "What is Self-Supervised Learning? | Will machines ever be able to learn like humans?". Medium. Retrieved 9 June 2021.
Doersch, Carl; Zisserman, Andrew (October 2017). "Multi-task Self-Supervised Visual Learning". 2017 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 2070–2079. arXiv:1708.07860. doi:10.1109/iccv.2017.226. ISBN 978-1-5386-1032-9. S2CID 473729.
^ Beyer, Lucas; Zhai, Xiaohua; Oliver, Avital; Kolesnikov, Alexander (October 2019). "S4L: Self-Supervised Semi-Supervised Learning". 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. pp. 1476–1485. arXiv:1905.03670. doi:10.1109/iccv.2019.00156. ISBN 978-1-7281-4803-8. S2CID 167209887.
Doersch, Carl; Gupta, Abhinav; Efros, Alexei A. (December 2015). "Unsupervised Visual Representation Learning by Context Prediction". 2015 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 1422–1430. arXiv:1505.05192. doi:10.1109/iccv.2015.167. ISBN 978-1-4673-8391-2. S2CID 9062671.
Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo (April 2018). "Fast and robust segmentation of white blood cell images by self-supervised learning". Micron. 107: 55–71. doi:10.1016/j.micron.2018.01.010. ISSN 0968-4328. PMID 29425969. S2CID 3796689.
Gidaris, Spyros; Bursuc, Andrei; Komodakis, Nikos; Perez, Patrick Perez; Cord, Matthieu (October 2019). "Boosting Few-Shot Visual Learning with Self-Supervision". 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. pp. 8058–8067. arXiv:1906.05186. doi:10.1109/iccv.2019.00815. ISBN 978-1-7281-4803-8. S2CID 186206588.
^ "Wav2vec: State-of-the-art speech recognition through self-supervision". ai.facebook.com. Retrieved 9 June 2021.
^ Kramer, Mark A. (1991). "Nonlinear principal component analysis using autoassociative neural networks" (PDF). AIChE Journal. 37 (2): 233–243. Bibcode:1991AIChE..37..233K. doi:10.1002/aic.690370209.
^ "Demystifying a key self-supervised learning technique: Non-contrastive learning". ai.facebook.com. Retrieved 5 October 2021.
Becker, Suzanna; Hinton, Geoffrey E. (January 1992). "Self-organizing neural network that discovers surfaces in random-dot stereograms". Nature. 355 (6356): 161–163. Bibcode:1992Natur.355..161B. doi:10.1038/355161a0. ISSN 1476-4687. PMID 1729650.
Oord, Aaron van den; Li, Yazhe; Vinyals, Oriol (22 January 2019), Representation Learning with Contrastive Predictive Coding, arXiv:1807.03748
Gutmann, Michael; Hyvärinen, Aapo (31 March 2010). "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models". Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings: 297–304.
Littwin, Etai; Wolf, Lior (June 2016). "The Multiverse Loss for Robust Transfer Learning". 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 3957–3966. arXiv:1511.09033. doi:10.1109/cvpr.2016.429. ISBN 978-1-4673-8851-1. S2CID 6517610.
Francois-Lavet, Vincent; Bengio, Yoshua; Precup, Doina; Pineau, Joelle (2019). "Combined Reinforcement Learning via Abstract Representations". Proceedings of the AAAI Conference on Artificial Intelligence. arXiv:1809.04506.
"Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. 2 November 2018. Retrieved 9 June 2021.
Wilcox, Ethan; Qian, Peng; Futrell, Richard; Kohita, Ryosuke; Levy, Roger; Ballesteros, Miguel (2020). "Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models". Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics. pp. 4640–4652. arXiv:2010.05725. doi:10.18653/v1/2020.emnlp-main.375. S2CID 222291675.
Grill, Jean-Bastien; Strub, Florian; Altché, Florent; Tallec, Corentin; Richemond, Pierre H.; Buchatskaya, Elena; Doersch, Carl; Pires, Bernardo Avila; Guo, Zhaohan Daniel; Azar, Mohammad Gheshlaghi; Piot, Bilal (10 September 2020). "Bootstrap your own latent: A new approach to self-supervised Learning". arXiv:2006.07733 .
Gündüz, Hüseyin Anil; Binder, Martin; To, Xiao-Yin; Mreches, René; Bischl, Bernd; McHardy, Alice C.; Münch, Philipp C.; Rezaei, Mina (11 September 2023). "A self-supervised deep learning method for data-efficient training in genomics". Communications Biology. 6 (1): 928. doi:10.1038/s42003-023-05310-2. ISSN 2399-3642. PMC 10495322. PMID 37696966.

External links

Doersch, Carl; Zisserman, Andrew (October 2017). "Multi-task Self-Supervised Visual Learning". 2017 IEEE International Conference on Computer Vision (ICCV). pp. 2070–2079. arXiv:1708.07860. doi:10.1109/ICCV.2017.226. ISBN 978-1-5386-1032-9. S2CID 473729.
Doersch, Carl; Gupta, Abhinav; Efros, Alexei A. (December 2015). "Unsupervised Visual Representation Learning by Context Prediction". 2015 IEEE International Conference on Computer Vision (ICCV). pp. 1422–1430. arXiv:1505.05192. doi:10.1109/ICCV.2015.167. ISBN 978-1-4673-8391-2. S2CID 9062671.
Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo (1 April 2018). "Fast and robust segmentation of white blood cell images by self-supervised learning". Micron. 107: 55–71. doi:10.1016/j.micron.2018.01.010. ISSN 0968-4328. PMID 29425969. S2CID 3796689.
Yarowsky, David (1995). "Unsupervised Word Sense Disambiguation Rivaling Supervised Methods". Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Cambridge, MA: Association for Computational Linguistics: 189–196. doi:10.3115/981658.981684. Retrieved 1 November 2022.

Artificial intelligence

Concepts

Applications

Implementations

Audio–visual	AlexNet WaveNet Human image synthesis HWR OCR Speech synthesis ElevenLabs Speech recognition Whisper Facial recognition AlphaFold Text-to-image models Aurora DALL-E Flux Ideogram Imagen Midjourney Stable Diffusion Text-to-video models Dream Machine Gen-3 Alpha Hailuo AI Kling Sora Veo Music generation Suno AI Udio
Text	Word2vec Seq2seq GloVe BERT T5 Llama Chinchilla AI PaLM GPT 1 2 3 J ChatGPT 4 4o o1 o3 Claude Gemini chatbot Grok LaMDA BLOOM Project Debater IBM Watson IBM Watsonx Granite PanGu-Σ
Decisional	AlphaGo AlphaZero OpenAI Five Self-driving car MuZero Action selection AutoGPT Robot control

People

Architectures

Portals
- Technology
Categories
- Artificial neural networks
- Machine learning
List
- Companies
- Projects

Categories: