Author(s): Luhui Hu
Originally published on Towards AI.
Introduction: The Rise of Self-Supervised Learning
In recent years, self-supervised learning (SSL) has emerged as a key paradigm in machine learning, enabling models to learn from unlabeled data by generating their own supervisory signals. This approach significantly reduces dependence on large labeled datasets, accelerating progress across various AI domains.
Understanding Self-Supervised Learning
SSL is a form of unsupervised learning in which the system learns to understand and interpret data by teaching itself. Unlike supervised learning, which relies on labeled datasets, SSL algorithms generate their own labels from the input data, allowing models to exploit the inherent structure of the data and learn useful representations without human-provided labels.
A Brief History of SSL
The concept of SSL has roots in the early days of machine learning. In 2006, Geoffrey Hinton introduced the idea of pre-training neural networks with unsupervised learning, laying the groundwork for SSL. However, it was not until the 2010s that SSL gained significant traction, with the development of models such as Word2Vec and BERT in natural language processing and SimCLR and MoCo in computer vision.
Basic techniques in SSL
1. Contrastive learning
Contrastive learning involves learning representations by comparing similar and dissimilar pairs of data. The model is trained to pull similar data points closer together in the representation space while pushing dissimilar ones apart. This technique has played a key role in computer vision tasks.
2. Masked modeling
Masked modeling, popularized by models such as BERT, involves masking parts of the input data and training the model to predict the missing parts. This approach helps the model understand context and relationships within the data.
3. Predictive learning
In predictive learning, the model is trained to predict future data points from previous inputs. This technique is widely used in time-series analysis and reinforcement learning.
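To make this concrete, here is a minimal, hypothetical sketch of predictive self-supervised learning in PyTorch: a small GRU is trained to predict the next value of a toy time series, so the supervisory signal comes entirely from the data itself. The model, data, and hyperparameters below are illustrative assumptions, not taken from any particular system.

```python
# Minimal sketch of predictive self-supervised learning on a toy time series.
# The "labels" are simply the next observations, so no human annotation is needed.
import torch
import torch.nn as nn

class NextStepPredictor(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                 # x: (batch, time, 1)
        h, _ = self.rnn(x)
        return self.head(h)               # prediction of the next step at each position

model = NextStepPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy data: noisy sine waves; inputs are steps 0..T-2, targets are steps 1..T-1.
t = torch.linspace(0, 6.28, 101)
series = torch.sin(t).repeat(16, 1).unsqueeze(-1) + 0.05 * torch.randn(16, 101, 1)
inputs, targets = series[:, :-1], series[:, 1:]

for step in range(200):
    pred = model(inputs)
    loss = nn.functional.mse_loss(pred, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```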
Inside SSL: Techniques and Architectures
Modern SSL is increasingly sophisticated in how well models exploit the structure within unlabeled data. Below are the most influential techniques and their underlying architectures.
1. Contrastive learning
Basic idea: Learn representations by pulling similar pairs close together and pushing dissimilar pairs apart.
Notable models:
- SimCLR (Simple Framework for Contrastive Learning of Representations)
Uses data augmentations (e.g. cropping, color jittering) to generate positive pairs from the same image. Trained with a contrastive loss (NT-Xent).
- MoCo (Momentum Contrast)
Introduces a dynamic memory bank and a momentum encoder to build consistent representations across mini-batches.
Architecture:
- Backbone encoder (e.g. ResNet)
- Projection head (MLP)
- Contrastive loss (InfoNCE or NT-Xent)
Used in: computer vision pretraining (ResNet/ViT), robotics perception modules.
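A minimal sketch of the contrastive setup described above, assuming a ResNet-18 backbone from torchvision, an MLP projection head, and the NT-Xent loss over two views of the same batch; the "augmentations" here are simplified noise stand-ins:

```python
# Illustrative SimCLR-style setup: backbone encoder, MLP projection head, NT-Xent loss.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SimCLRModel(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()                    # keep the 512-d features
        self.backbone = backbone
        self.projector = nn.Sequential(                # MLP projection head
            nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, proj_dim))

    def forward(self, x):
        return self.projector(self.backbone(x))

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss: each view's positive is the other view of the same image."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)         # (2N, D)
    sim = z @ z.t() / temperature                              # cosine similarities
    n = z1.size(0)
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

model = SimCLRModel()
images = torch.randn(8, 3, 224, 224)       # stand-in batch
view1 = images + 0.1 * torch.randn_like(images)   # simplified "augmentations"
view2 = images + 0.1 * torch.randn_like(images)
loss = nt_xent(model(view1), model(view2))
loss.backward()
```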
2. Masked autoencoding (MAE, BERT, BEiT)
Basic idea: Mask parts of the input and train the model to reconstruct them.
Notable models:
- BERT (NLP)
Predicts masked tokens using Transformer-based language modeling.
- MAE (Masked Autoencoder for vision)
Masks 75% of image patches and reconstructs the original image from the visible ones.
- BEiT (Bidirectional Encoder representation from Image Transformers)
Combines masked modeling with image tokens for vision tasks.
Architecture:
- Transformer encoder
- Masking module
- Reconstruction decoder
Used in: the GPT family, multimodal encoders (PaLM-E, Flamingo), FSD planning modules.
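Below is a hedged, minimal sketch of the masked-modeling objective: a small Transformer encoder reconstructs randomly masked tokens, BERT-style. Vocabulary size, masking ratio, and model dimensions are illustrative assumptions, not values from any of the models named above.

```python
# Minimal masked-modeling sketch: mask random tokens, predict them from context.
import torch
import torch.nn as nn

vocab_size, d_model, mask_ratio = 1000, 64, 0.15
MASK_ID = 0                                        # reserve token id 0 as [MASK]

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2)
decoder = nn.Linear(d_model, vocab_size)           # reconstruction head

optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(encoder.parameters()) + list(decoder.parameters()),
    lr=1e-3)

tokens = torch.randint(1, vocab_size, (8, 32))     # toy batch of token sequences
mask = torch.rand(tokens.shape) < mask_ratio       # choose positions to mask
corrupted = tokens.masked_fill(mask, MASK_ID)

logits = decoder(encoder(embed(corrupted)))        # predict a token at every position
loss = nn.functional.cross_entropy(                # but score only the masked ones
    logits[mask], tokens[mask])
loss.backward()
optimizer.step()
```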
3. Bootstrap Your Own Latent (BYOL, DINO)
Basic idea: Learn representations without negative samples by aligning the outputs of two networks, one of which is a moving average of the other.
Notable models:
- BYOL (DeepMind)
Uses an online network and a slowly updated target network to align feature projections.
- DINO
Builds attention maps that capture object-level information without supervision.
Architecture:
- Two encoders (online and target)
- Projection heads and an MLP predictor
- No contrastive loss, only a similarity-matching objective
Used in: spatial awareness and object-centric learning in world models.
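A minimal sketch of the bootstrap idea, using small MLPs as stand-ins for the real backbones: the online branch (encoder plus predictor) is trained to match the target branch, and the target weights are an exponential moving average (EMA) of the online weights. The network sizes and EMA rate are simplified assumptions.

```python
# BYOL-style bootstrap: no negative pairs, only similarity to an EMA target network.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, hidden, out_dim):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

online_encoder = mlp(128, 256, 64)              # stand-in for a ResNet/ViT backbone + head
online_predictor = mlp(64, 128, 64)             # extra predictor only on the online branch
target_encoder = copy.deepcopy(online_encoder)  # target starts as a copy of the online net
for p in target_encoder.parameters():
    p.requires_grad_(False)                     # target is updated by EMA, not gradients

optimizer = torch.optim.Adam(
    list(online_encoder.parameters()) + list(online_predictor.parameters()), lr=1e-3)

x = torch.randn(32, 128)                        # toy features standing in for images
view1 = x + 0.1 * torch.randn_like(x)           # two "augmented views"
view2 = x + 0.1 * torch.randn_like(x)

pred = F.normalize(online_predictor(online_encoder(view1)), dim=1)
with torch.no_grad():
    target = F.normalize(target_encoder(view2), dim=1)

loss = 2 - 2 * (pred * target).sum(dim=1).mean()   # equivalent to negative cosine similarity
optimizer.zero_grad()
loss.backward()
optimizer.step()

# EMA update of the target network toward the online network.
tau = 0.99
with torch.no_grad():
    for p_t, p_o in zip(target_encoder.parameters(), online_encoder.parameters()):
        p_t.mul_(tau).add_((1 - tau) * p_o)
```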
4. Predictive coding and latent dynamics (world models)
Basic idea: Learn a compact representation of the world that can predict future latent states.
Notable models:
- DreamerV3
Combines a VAE-based encoder with a recurrent dynamics model and reinforcement learning.
- Meta's world model
Uses predictive learning and energy-based representations for autonomous interaction.
Architecture:
- Encoder + latent dynamics model (RNN/Transformer)
- Reward/value prediction heads
- Optional policy head (for RL-based agents)
Used in: generalist agents, robotics, simulation-based planning (e.g. NVIDIA Cosmos, π0.5).
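A rough sketch of predictive latent dynamics in the spirit of these models (not reproducing any of them): an encoder maps observations to latents, and a recurrent model predicts the next latent from the current latent and action. Reward and policy heads are omitted; all dimensions and the training signal are assumptions made for illustration.

```python
# Latent dynamics sketch: predict the next latent state from latent + action.
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 64, 4, 32

encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
dynamics = nn.GRUCell(input_size=latent_dim + act_dim, hidden_size=latent_dim)

params = list(encoder.parameters()) + list(dynamics.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

# Toy trajectory: (batch, time, obs) observations and (batch, time, act) actions.
obs = torch.randn(16, 10, obs_dim)
actions = torch.randn(16, 10, act_dim)

latents = encoder(obs)                           # encode every observation
h = torch.zeros(16, latent_dim)                  # recurrent latent state
loss = 0.0
for t in range(9):
    h = dynamics(torch.cat([latents[:, t], actions[:, t]], dim=1), h)
    # Predict the next latent; the target is the encoder's latent for the next frame.
    loss = loss + nn.functional.mse_loss(h, latents[:, t + 1].detach())

optimizer.zero_grad()
loss.backward()
optimizer.step()
```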
5. Multimodal alignment (CLIP, Flamingo, Helix)
Basic idea: Align visual and text modalities using contrastive or masked modeling.
Notable models:
- CLIP (OpenAI)
Trained to match image-text pairs using a contrastive loss.
- Flamingo (DeepMind), Helix (Figure AI)
Extend alignment to vision-language-action (VLA) reasoning and real-time interaction.
Architecture:
- Vision encoder (ViT or CNN)
- Language encoder (Transformer)
- Joint training with contrastive or cross-attention heads
Used in: humanoid robotics, FSD scene-text grounding, household agents.
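A minimal sketch of CLIP-style contrastive alignment with tiny stand-in encoders (the real models use a ViT/CNN and a Transformer, as listed above): matching image-caption pairs sit on the diagonal of the similarity matrix, and a symmetric cross-entropy pulls them together while pushing mismatched pairs apart.

```python
# CLIP-style alignment sketch: contrastive matching of paired image and text embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim = 64
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, embed_dim))  # toy "vision encoder"
text_encoder = nn.EmbeddingBag(num_embeddings=1000, embedding_dim=embed_dim)    # toy "text encoder"

optimizer = torch.optim.Adam(
    list(image_encoder.parameters()) + list(text_encoder.parameters()), lr=1e-3)

images = torch.randn(8, 3, 32, 32)                 # toy image batch
captions = torch.randint(0, 1000, (8, 12))         # toy tokenized captions, paired by index

img_emb = F.normalize(image_encoder(images), dim=1)
txt_emb = F.normalize(text_encoder(captions), dim=1)

logits = img_emb @ txt_emb.t() / 0.07              # temperature-scaled cosine similarities
targets = torch.arange(8)                          # the i-th caption matches the i-th image
loss = (F.cross_entropy(logits, targets) +         # symmetric contrastive loss
        F.cross_entropy(logits.t(), targets)) / 2
loss.backward()
optimizer.step()
```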
SSL in Foundation Models and Robotics
GPT-4o and GPT-4
- Guided by causal (rather than masked) language modeling, which is a form of SSL that predicts future tokens.
- Use multimodal alignment objectives in GPT-4o to integrate vision, audio, and text in a unified architecture.
- Instruction tuning builds on this SSL pretraining to improve generalization.
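As a toy illustration of the causal language-modeling objective named above: shift the token sequence by one position so that each next token becomes the training target. The tiny GRU here is only a stand-in for a large decoder-only Transformer; all sizes and data are assumptions.

```python
# Causal (next-token) language-modeling sketch: the labels are the input shifted by one.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)
lm = nn.GRU(d_model, d_model, batch_first=True)   # placeholder for a causal Transformer
head = nn.Linear(d_model, vocab_size)

optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(lm.parameters()) + list(head.parameters()), lr=1e-3)

tokens = torch.randint(0, vocab_size, (8, 33))    # toy token sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # next-token targets come for free

hidden, _ = lm(embed(inputs))
logits = head(hidden)                             # (batch, time, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```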
Vision-Language-Action Models (RT-2, Helix, OpenVLA)
- Start from CLIP-style pretraining for visual grounding.
- Use behavioral cloning on top of self-supervised encoders.
- Often add cross-attention layers trained with next-action prediction and masked sensor modeling (a minimal sketch follows below).
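A hedged sketch of the pattern just described, not of any specific model's architecture: instruction tokens cross-attend to visual features, and an action head is trained by behavior cloning (next-action prediction). Every module and tensor here is a hypothetical stand-in.

```python
# Toy VLA-style head: language features cross-attend to vision features, then predict an action.
import torch
import torch.nn as nn

d_model, action_dim = 64, 7
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)
action_head = nn.Linear(d_model, action_dim)

optimizer = torch.optim.Adam(
    list(cross_attn.parameters()) + list(action_head.parameters()), lr=1e-3)

vision_feats = torch.randn(8, 196, d_model)       # e.g. ViT patch features (stand-in)
text_feats = torch.randn(8, 20, d_model)          # e.g. instruction token features (stand-in)
expert_actions = torch.randn(8, action_dim)       # demonstrated actions to imitate

fused, _ = cross_attn(query=text_feats, key=vision_feats, value=vision_feats)
pooled = fused.mean(dim=1)                        # pool over instruction tokens
pred_action = action_head(pooled)                 # next-action prediction (behavior cloning)
loss = nn.functional.mse_loss(pred_action, expert_actions)
loss.backward()
optimizer.step()
```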
World models (π0.5, Cosmos, Meta WM)
- Train with self-supervised latent forecasting, often using:
- Visual encoders (ViT/ResNet)
- Temporal models based on Transformers or RNNs
- Multi-task heads (reward, next frame, mask reconstruction)
- Example: Cosmos Reason1 combines perception with simulation using a self-supervised physics tokenizer.
Tesla FSD (V13+)
- Uses self-supervised components such as:
- 3D trajectories derived from video data
- Masked and autoregressive video prediction for modeling driving behavior
- Multimodal sensor fusion (LiDAR-free) with SSL on video-to-action pipelines
Tesla's AI stack continues to shift from supervised logic blocks toward unified, end-to-end self-supervised driving models.
SSL applications
Natural language processing (NLP)
SSL has revolutionized NLP by enabling models to learn from huge amounts of unlabeled text. Models such as BERT and GPT have achieved state-of-the-art results across a wide range of NLP tasks.
Computer vision
In computer vision, SSL techniques have been used to pretrain models on large unlabeled datasets, leading to better performance in tasks such as image classification, object detection, and segmentation.
Robotics
SSL allows robots to learn from their interactions with the environment without explicit supervision, increasing their adaptability and autonomy.
Healthcare
In medical imaging, SSL helps learn representations from unlabeled scans, supporting disease diagnosis and treatment planning.
Advantages of SSL
- Reduced dependence on labeled data: SSL minimizes the need for large labeled datasets, which are often expensive and time-consuming to create.
- Improved generalization: Models trained with SSL often generalize better to new tasks and domains.
- Scalability: SSL allows the use of huge amounts of unlabeled data, facilitating large-scale model training.
Challenges in SSL
- Designing effective pretext tasks: Creating tasks that lead to meaningful representations is non-trivial and often domain-specific.
- Computational resources: Training large SSL models requires significant compute.
- Evaluation metrics: Assessing the quality of learned representations without labeled data remains a challenge.
The Future of SSL
As SSL evolves, it is expected to play a key role in the development of general artificial intelligence (GAI). Future directions include:
- Integration with reinforcement learning: Combining SSL with reinforcement learning can lead to more efficient learning in dynamic environments.
- Multimodal learning: SSL will facilitate learning across multiple data modalities, such as text, images, and audio, leading to more comprehensive AI systems.
- Continual learning: SSL may enable continual learning from streaming data without forgetting prior knowledge.
Conclusion
Self-supervised learning has emerged as a transformative approach in machine learning, enabling models to learn effectively from unlabeled data. Its applications span diverse domains, and its potential continues to grow as research progresses. As we move toward more generalized AI systems, SSL will undoubtedly play a central role in shaping the future of artificial intelligence.
Published via Towards AI