Author(s): Luhui Hu
Originally published on Towards AI.
Introduction: The Rise of Self-Supervised Learning
In recent years, self-supervised learning (SSL) has emerged as a key paradigm in machine learning, enabling models to learn from unlabeled data by generating their own supervisory signals. This approach significantly reduces dependence on large labeled datasets, accelerating progress across various AI domains.
Understanding Self-Supervised Learning
SSL is a form of unsupervised learning in which the system learns to understand and interpret data by teaching itself. Unlike supervised learning, which relies on labeled datasets, SSL algorithms generate their own labels from the input data, allowing models to exploit the inherent structure of the data and learn useful representations without human-provided labels.
A Brief History of SSL
The concept of SSL has roots in the early days of machine learning. In 2006, Geoffrey Hinton introduced the idea of pre-training neural networks with unsupervised learning, laying the groundwork for SSL. However, it was not until the 2010s that SSL gained significant traction, with the development of models such as Word2Vec and BERT in natural language processing and SimCLR and MoCo in computer vision.
Basic techniques in SSL
1. Contrastive learning
Contrastive learning involves learning representations by comparing similar and dissimilar pairs of data. The model is trained to pull similar data points closer together in the representation space while pushing dissimilar ones apart. This technique has played a key role in computer vision tasks.
2. Masked modeling
Masked modeling, popularized by models such as BERT, involves masking parts of the input data and training the model to predict the missing parts. This approach helps the model understand context and relationships within the data.
3. Predictive learning
In predictive learning, the model is trained to predict future data points from previous inputs. This technique is widely used in time-series analysis and reinforcement learning.
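To make this concrete, here is a minimal, hypothetical sketch of predictive self-supervised learning in PyTorch: a small GRU is trained to predict the next value of a toy time series, so the supervisory signal comes entirely from the data itself. The model, data, and hyperparameters below are illustrative assumptions, not taken from any particular system.

```python
# Minimal sketch of predictive self-supervised learning on a toy time series.
# The "labels" are simply the next observations, so no human annotation is needed.
import torch
import torch.nn as nn

class NextStepPredictor(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                 # x: (batch, time, 1)
        h, _ = self.rnn(x)
        return self.head(h)               # prediction of the next step at each position

model = NextStepPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy data: noisy sine waves; inputs are steps 0..T-2, targets are steps 1..T-1.
t = torch.linspace(0, 6.28, 101)
series = torch.sin(t).repeat(16, 1).unsqueeze(-1) + 0.05 * torch.randn(16, 101, 1)
inputs, targets = series[:, :-1], series[:, 1:]

for step in range(200):
    pred = model(inputs)
    loss = nn.functional.mse_loss(pred, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```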
Inside SSL: Techniques and Architectures
Modern SSL is increasingly sophisticated in how well models exploit the structure within unlabeled data. Below are the most influential techniques and their underlying architectures.
1. Contrastive learning
Basic idea: Learn representations by pulling similar pairs close together and pushing dissimilar pairs apart.
Notable models:
- SimCLR (Simple Framework for Contrastive Learning of Representations)
Uses data augmentations (e.g. cropping, color jittering) to generate positive pairs from the same image. Trained with a contrastive loss (NT-Xent).
- MoCo (Momentum Contrast)
Introduces a dynamic memory bank and a momentum encoder to build consistent representations across mini-batches.
Architecture:
- Backbone encoder (e.g. ResNet)
- Projection head (MLP)
- Contrastive loss (InfoNCE or NT-Xent)
Used in: computer vision pretraining (ResNet/ViT), robotics perception modules.
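A minimal sketch of the contrastive setup described above, assuming a ResNet-18 backbone from torchvision, an MLP projection head, and the NT-Xent loss over two views of the same batch; the "augmentations" here are simplified noise stand-ins:

```python
# Illustrative SimCLR-style setup: backbone encoder, MLP projection head, NT-Xent loss.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SimCLRModel(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()                    # keep the 512-d features
        self.backbone = backbone
        self.projector = nn.Sequential(                # MLP projection head
            nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, proj_dim))

    def forward(self, x):
        return self.projector(self.backbone(x))

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss: each view's positive is the other view of the same image."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)         # (2N, D)
    sim = z @ z.t() / temperature                              # cosine similarities
    n = z1.size(0)
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

model = SimCLRModel()
images = torch.randn(8, 3, 224, 224)       # stand-in batch
view1 = images + 0.1 * torch.randn_like(images)   # simplified "augmentations"
view2 = images + 0.1 * torch.randn_like(images)
loss = nt_xent(model(view1), model(view2))
loss.backward()
```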
2. Masked autoencoding (MAE, BERT, BEiT)
Basic idea: Mask parts of the input and train the model to reconstruct them.
Notable models:
- BERT (NLP)
Predicts masked tokens using Transformer-based language modeling.
- MAE (Masked Autoencoder for vision)
Masks 75% of image patches and reconstructs the original image from the visible ones.
- BEiT (Bidirectional Encoder representation from Image Transformers)
Combines masked modeling with image tokens for vision tasks.
Architecture:
- Transformer encoder
- Masking module
- Reconstruction decoder
Used in: the GPT family, multimodal encoders (PaLM-E, Flamingo), FSD planning modules.
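Below is a hedged, minimal sketch of the masked-modeling objective: a small Transformer encoder reconstructs randomly masked tokens, BERT-style. Vocabulary size, masking ratio, and model dimensions are illustrative assumptions, not values from any of the models named above.

```python
# Minimal masked-modeling sketch: mask random tokens, predict them from context.
import torch
import torch.nn as nn

vocab_size, d_model, mask_ratio = 1000, 64, 0.15
MASK_ID = 0                                        # reserve token id 0 as [MASK]

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2)
decoder = nn.Linear(d_model, vocab_size)           # reconstruction head

optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(encoder.parameters()) + list(decoder.parameters()),
    lr=1e-3)

tokens = torch.randint(1, vocab_size, (8, 32))     # toy batch of token sequences
mask = torch.rand(tokens.shape) < mask_ratio       # choose positions to mask
corrupted = tokens.masked_fill(mask, MASK_ID)

logits = decoder(encoder(embed(corrupted)))        # predict a token at every position
loss = nn.functional.cross_entropy(                # but score only the masked ones
    logits[mask], tokens[mask])
loss.backward()
optimizer.step()
```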
3. Bootstrap Your Own Latent (BYOL, DINO)
Basic idea: Learn representations without negative samples by aligning the outputs of two networks, one of which is a moving average of the other.
Notable models:
- BYOL (DeepMind)
Uses an online network and a slowly updated target network to align feature projections.
- DINO
Builds attention maps that capture object-level information without supervision.
Architecture:
- Two encoders (online and target)
- Projection heads and an MLP predictor
- No contrastive loss, only a similarity-matching objective
Used in: spatial awareness and object-centric learning in world models.
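A minimal sketch of the bootstrap idea, using small MLPs as stand-ins for the real backbones: the online branch (encoder plus predictor) is trained to match the target branch, and the target weights are an exponential moving average (EMA) of the online weights. The network sizes and EMA rate are simplified assumptions.

```python
# BYOL-style bootstrap: no negative pairs, only similarity to an EMA target network.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, hidden, out_dim):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

online_encoder = mlp(128, 256, 64)              # stand-in for a ResNet/ViT backbone + head
online_predictor = mlp(64, 128, 64)             # extra predictor only on the online branch
target_encoder = copy.deepcopy(online_encoder)  # target starts as a copy of the online net
for p in target_encoder.parameters():
    p.requires_grad_(False)                     # target is updated by EMA, not gradients

optimizer = torch.optim.Adam(
    list(online_encoder.parameters()) + list(online_predictor.parameters()), lr=1e-3)

x = torch.randn(32, 128)                        # toy features standing in for images
view1 = x + 0.1 * torch.randn_like(x)           # two "augmented views"
view2 = x + 0.1 * torch.randn_like(x)

pred = F.normalize(online_predictor(online_encoder(view1)), dim=1)
with torch.no_grad():
    target = F.normalize(target_encoder(view2), dim=1)

loss = 2 - 2 * (pred * target).sum(dim=1).mean()   # equivalent to negative cosine similarity
optimizer.zero_grad()
loss.backward()
optimizer.step()

# EMA update of the target network toward the online network.
tau = 0.99
with torch.no_grad():
    for p_t, p_o in zip(target_encoder.parameters(), online_encoder.parameters()):
        p_t.mul_(tau).add_((1 - tau) * p_o)
```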
4. Predictive coding and latent dynamics (world models)
Basic idea: Learn a compact representation of the world that can predict future latent states.
Notable models:
- DreamerV3
Combines a VAE-based encoder with a recurrent dynamics model and reinforcement learning.
- Meta's world model
Uses predictive learning and energy-based representations for autonomous interaction.
Architecture:
- Encoder + latent dynamics model (RNN/Transformer)
- Reward/value prediction heads
- Optional policy head (for RL-based agents)
Used in: generalist agents, robotics, simulation-based planning (e.g. NVIDIA Cosmos, π0.5).
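A rough sketch of predictive latent dynamics in the spirit of these models (not reproducing any of them): an encoder maps observations to latents, and a recurrent model predicts the next latent from the current latent and action. Reward and policy heads are omitted; all dimensions and the training signal are assumptions made for illustration.

```python
# Latent dynamics sketch: predict the next latent state from latent + action.
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 64, 4, 32

encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
dynamics = nn.GRUCell(input_size=latent_dim + act_dim, hidden_size=latent_dim)

params = list(encoder.parameters()) + list(dynamics.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

# Toy trajectory: (batch, time, obs) observations and (batch, time, act) actions.
obs = torch.randn(16, 10, obs_dim)
actions = torch.randn(16, 10, act_dim)

latents = encoder(obs)                           # encode every observation
h = torch.zeros(16, latent_dim)                  # recurrent latent state
loss = 0.0
for t in range(9):
    h = dynamics(torch.cat([latents[:, t], actions[:, t]], dim=1), h)
    # Predict the next latent; the target is the encoder's latent for the next frame.
    loss = loss + nn.functional.mse_loss(h, latents[:, t + 1].detach())

optimizer.zero_grad()
loss.backward()
optimizer.step()
```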
5. Multimodal alignment (CLIP, Flamingo, Helix)
Basic idea: Align visual and text modalities using contrastive or masked modeling.
Notable models:
- CLIP (OpenAI)
Trained to match image-text pairs using a contrastive loss.
- Flamingo (DeepMind), Helix (Figure AI)
Extend alignment to vision-language-action (VLA) reasoning and real-time interaction.
Architecture:
- Vision encoder (ViT or CNN)
- Language encoder (Transformer)
- Joint training with contrastive or cross-attention heads
Used in: humanoid robotics, FSD scene-text grounding, household agents.
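A minimal sketch of CLIP-style contrastive alignment with tiny stand-in encoders (the real models use a ViT/CNN and a Transformer, as listed above): matching image-caption pairs sit on the diagonal of the similarity matrix, and a symmetric cross-entropy pulls them together while pushing mismatched pairs apart.

```python
# CLIP-style alignment sketch: contrastive matching of paired image and text embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim = 64
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, embed_dim))  # toy "vision encoder"
text_encoder = nn.EmbeddingBag(num_embeddings=1000, embedding_dim=embed_dim)    # toy "text encoder"

optimizer = torch.optim.Adam(
    list(image_encoder.parameters()) + list(text_encoder.parameters()), lr=1e-3)

images = torch.randn(8, 3, 32, 32)                 # toy image batch
captions = torch.randint(0, 1000, (8, 12))         # toy tokenized captions, paired by index

img_emb = F.normalize(image_encoder(images), dim=1)
txt_emb = F.normalize(text_encoder(captions), dim=1)

logits = img_emb @ txt_emb.t() / 0.07              # temperature-scaled cosine similarities
targets = torch.arange(8)                          # the i-th caption matches the i-th image
loss = (F.cross_entropy(logits, targets) +         # symmetric contrastive loss
        F.cross_entropy(logits.t(), targets)) / 2
loss.backward()
optimizer.step()
```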
SSL in Foundation Models and Robotics
GPT-4o and GPT-4
- Guided by causal (rather than masked) language modeling, which is a form of SSL that predicts future tokens.
- Use multimodal alignment objectives in GPT-4o to integrate vision, audio, and text in a unified architecture.
- Instruction tuning builds on this SSL pretraining to improve generalization.
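As a toy illustration of the causal language-modeling objective named above: shift the token sequence by one position so that each next token becomes the training target. The tiny GRU here is only a stand-in for a large decoder-only Transformer; all sizes and data are assumptions.

```python
# Causal (next-token) language-modeling sketch: the labels are the input shifted by one.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)
lm = nn.GRU(d_model, d_model, batch_first=True)   # placeholder for a causal Transformer
head = nn.Linear(d_model, vocab_size)

optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(lm.parameters()) + list(head.parameters()), lr=1e-3)

tokens = torch.randint(0, vocab_size, (8, 33))    # toy token sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # next-token targets come for free

hidden, _ = lm(embed(inputs))
logits = head(hidden)                             # (batch, time, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```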
Vision-Language-Action Models (RT-2, Helix, OpenVLA)
- Start from CLIP-style pretraining for visual grounding.
- Use behavioral cloning on top of self-supervised encoders.
- Often add cross-attention layers trained with next-action prediction and masked sensor modeling (a minimal sketch follows below).
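A hedged sketch of the pattern just described, not of any specific model's architecture: instruction tokens cross-attend to visual features, and an action head is trained by behavior cloning (next-action prediction). Every module and tensor here is a hypothetical stand-in.

```python
# Toy VLA-style head: language features cross-attend to vision features, then predict an action.
import torch
import torch.nn as nn

d_model, action_dim = 64, 7
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)
action_head = nn.Linear(d_model, action_dim)

optimizer = torch.optim.Adam(
    list(cross_attn.parameters()) + list(action_head.parameters()), lr=1e-3)

vision_feats = torch.randn(8, 196, d_model)       # e.g. ViT patch features (stand-in)
text_feats = torch.randn(8, 20, d_model)          # e.g. instruction token features (stand-in)
expert_actions = torch.randn(8, action_dim)       # demonstrated actions to imitate

fused, _ = cross_attn(query=text_feats, key=vision_feats, value=vision_feats)
pooled = fused.mean(dim=1)                        # pool over instruction tokens
pred_action = action_head(pooled)                 # next-action prediction (behavior cloning)
loss = nn.functional.mse_loss(pred_action, expert_actions)
loss.backward()
optimizer.step()
```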
World models (π0.5, Cosmos, Meta WM)
- Train with self-supervised latent forecasting, often using:
- Visual encoders (ViT/ResNet)
- Temporal models based on Transformers or RNNs
- Multi-task heads (reward, next frame, mask reconstruction)
- Example: Cosmos Reason1 combines perception with simulation using a self-supervised physics tokenizer.
Tesla FSD (V13+)
- Uses self-supervised components such as:
- 3D trajectories derived from video data
- Masked and autoregressive video prediction for modeling driving behavior
- Multimodal sensor fusion (LiDAR-free) with SSL on video-to-action pipelines
Tesla's AI stack continues to shift from supervised logic blocks toward unified, end-to-end self-supervised driving models.
SSL applications
Natural language processing (NLP)
SSL has revolutionized NLP by enabling models to learn from huge amounts of unlabeled text. Models such as BERT and GPT have achieved state-of-the-art results across a wide range of NLP tasks.
Computer vision
In computer vision, SSL techniques have been used to pretrain models on large unlabeled datasets, leading to better performance in tasks such as image classification, object detection, and segmentation.
Robotics
SSL allows robots to learn from their interactions with the environment without explicit supervision, increasing their adaptability and autonomy.
Healthcare
In medical imaging, SSL helps learn representations from unlabeled scans, supporting disease diagnosis and treatment planning.
Advantages of SSL
- Reduced dependence on labeled data: SSL minimizes the need for large labeled datasets, which are often expensive and time-consuming to create.
- Improved generalization: Models trained with SSL often generalize better to new tasks and domains.
- Scalability: SSL allows the use of huge amounts of unlabeled data, facilitating large-scale model training.
Challenges in SSL
- Designing effective pretext tasks: Creating tasks that lead to meaningful representations is non-trivial and often domain-specific.
- Computational resources: Training large SSL models requires significant compute.
- Evaluation metrics: Assessing the quality of learned representations without labeled data remains a challenge.
The Future of SSL
As SSL evolves, it is expected to play a key role in the development of general artificial intelligence (GAI). Future directions include:
- Integration with reinforcement learning: Combining SSL with reinforcement learning can lead to more efficient learning in dynamic environments.
- Multimodal learning: SSL will facilitate learning across multiple data modalities, such as text, images, and audio, leading to more comprehensive AI systems.
- Continual learning: SSL may enable continual learning from streaming data without forgetting prior knowledge.
Conclusion
Self-supervised learning has emerged as a transformative approach in machine learning, enabling models to learn effectively from unlabeled data. Its applications span diverse domains, and its potential continues to grow as research progresses. As we move toward more generalized AI systems, SSL will undoubtedly play a central role in shaping the future of artificial intelligence.
Published via Towards AI