MIT Latest News

MIT News is dedicated to communicating to the media and the public the news and achievements of the students, faculty, staff and the greater MIT community.

Teaching large language models how to absorb new knowledge

Wed, 11/12/2025 - 12:00am

In an MIT classroom, a professor lectures while students diligently write down notes they will reread later to study and internalize key information ahead of an exam.

Humans know how to learn new information, but large language models can’t do this in the same way. Once a fully trained LLM has been deployed, its “brain” is static and can’t permanently adapt itself to new knowledge.

This means that if a user tells an LLM something important today, it won’t remember that information the next time this person starts a new conversation with the chatbot.

Now, a new approach developed by MIT researchers enables LLMs to update themselves in a way that permanently internalizes new information. Just like a student, the LLM generates its own study sheets from a user’s input, which it uses to memorize the information by updating its inner workings.

The model generates multiple self-edits to learn from one input, then applies each one to see which improves its performance the most. This trial-and-error process teaches the model the best way to train itself.

The researchers found this approach improved the accuracy of LLMs at question-answering and pattern-recognition tasks, and it enabled a small model to outperform much larger LLMs.

While there are still limitations that must be overcome, the technique could someday help artificial intelligence agents consistently adapt to new tasks and achieve changing goals in evolving environments.   

“Just like humans, complex AI systems can’t remain static for their entire lifetimes. These LLMs are not deployed in static environments. They are constantly facing new inputs from users. We want to make a model that is a bit more human-like — one that can keep improving itself,” says Jyothish Pari, an MIT graduate student and co-lead author of a paper on this technique.

He is joined on the paper by co-lead author Adam Zweiger, an MIT undergraduate; graduate students Han Guo and Ekin Akyürek; and senior authors Yoon Kim, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, an associate professor in EECS and member of CSAIL. The research will be presented at the Conference on Neural Information Processing Systems.

Teaching the model to learn

LLMs are neural network models that have billions of parameters, called weights, that contain the model’s knowledge and process inputs to make predictions. During training, the model adapts these weights to learn new information contained in its training data.

But once the model is deployed, its weights are static and can no longer be permanently updated.

However, LLMs are very good at a process called in-context learning, in which a trained model learns a new task by seeing a few examples. These examples guide the model’s responses, but the knowledge disappears before the next conversation.

The MIT researchers wanted to leverage a model’s powerful in-context learning capabilities to teach it how to permanently update its weights when it encounters new knowledge.

The framework they developed, called SEAL for “self-adapting LLMs,” enables an LLM to generate new synthetic data based on an input, and then determine the best way to adapt itself and learn from that synthetic data. Each piece of synthetic data is a self-edit the model can apply.

In the case of language, the LLM creates synthetic data by rewriting the information, and its implications, in an input passage. This is similar to how students make study sheets by rewriting and summarizing original lecture content.

The LLM does this multiple times, then quizzes itself on each self-edit to see which led to the biggest boost in performance on a downstream task like question answering. It uses a trial-and-error method known as reinforcement learning, where it receives a reward for the greatest performance boost.

Then the model memorizes the best study sheet by updating its weights to internalize the information in that self-edit.
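
The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the researchers' implementation: the "model" is just a set of stored facts, and `generate_self_edits`, `apply_update`, `quiz_accuracy`, and `seal_step` are hypothetical stand-ins. The sketch covers only the try-each-candidate-and-keep-the-best step; in the approach described, the reward is also used to improve how the model writes future self-edits.

```python
import copy

def generate_self_edits(passage):
    """Hypothetical stand-in for the LLM rewriting a passage into study sheets.

    Each candidate keeps a different subset of the passage's sentences.
    """
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return [sentences[:1], sentences[:2], sentences]

def apply_update(model, self_edit):
    """Toy 'weight update': return a copy of the model that memorized the self-edit."""
    updated = copy.deepcopy(model)
    updated["facts"].update(self_edit)
    return updated

def quiz_accuracy(model, quiz):
    """Fraction of quiz items whose supporting fact the model has stored."""
    hits = sum(1 for fact in quiz if fact in model["facts"])
    return hits / len(quiz)

def seal_step(model, passage, quiz):
    """Try every candidate self-edit and keep the one with the highest reward."""
    best_model, best_reward, best_edit = model, quiz_accuracy(model, quiz), None
    for edit in generate_self_edits(passage):
        candidate = apply_update(model, edit)
        reward = quiz_accuracy(candidate, quiz)
        if reward > best_reward:
            best_model, best_reward, best_edit = candidate, reward, edit
    return best_model, best_reward, best_edit

model = {"facts": set()}
passage = ("SEAL stands for self-adapting LLMs. It was developed at MIT. "
           "It uses reinforcement learning.")
quiz = ["SEAL stands for self-adapting LLMs", "It uses reinforcement learning"]
model, reward, edit = seal_step(model, passage, quiz)
print(reward, len(edit))
```

Here the longest study sheet wins because it is the only one that covers both quiz questions; a real system would score candidates with an actual downstream evaluation.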

“Our hope is that the model will learn to make the best kind of study sheet — one that is the right length and has the proper diversity of information — such that updating the model based on it leads to a better model,” Zweiger explains.

Choosing the best method

Their framework also allows the model to choose the way it wants to learn the information. For instance, the model can select the synthetic data it wants to use, the rate at which it learns, and how many iterations it wants to train on.

In this case, not only does the model generate its own training data, but it also configures the optimization that applies that self-edit to its weights.
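
As a rough sketch of what it means for a self-edit to carry its own optimization settings, the example below bundles training data with a learning rate and a step count, then applies each candidate to a one-parameter model. The `SelfEdit` class, its field names, and the toy regression task are assumptions made for illustration; the article does not specify how the framework encodes these choices.

```python
from dataclasses import dataclass

@dataclass
class SelfEdit:
    """Hypothetical container: training data plus the settings used to apply it."""
    examples: list        # (x, y) pairs standing in for synthetic training data
    learning_rate: float
    steps: int

def apply_self_edit(weight, edit):
    """Fit y = weight * x with plain gradient descent, using the edit's own settings."""
    for _ in range(edit.steps):
        grad = sum(2 * (weight * x - y) * x for x, y in edit.examples) / len(edit.examples)
        weight -= edit.learning_rate * grad
    return weight

def loss(weight, examples):
    """Mean squared error of the one-parameter model on the examples."""
    return sum((weight * x - y) ** 2 for x, y in examples) / len(examples)

data = [(1.0, 2.0), (2.0, 4.0)]  # generated by y = 2x
candidates = [
    SelfEdit(data, learning_rate=0.01, steps=5),
    SelfEdit(data, learning_rate=0.1, steps=20),
]
# Apply each candidate from the same starting weight and keep the lowest loss.
results = [(loss(apply_self_edit(0.0, c), data), c) for c in candidates]
best_loss, best = min(results, key=lambda r: r[0])
print(best.learning_rate, best.steps)
```

The same data produces very different models depending on the settings attached to it, which is why letting the model pick those settings can matter.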

“As humans, we know how we learn best. We want to grant that same ability to large language models. By providing the model with the ability to control how it digests this information, it can figure out the best way to parse all the data that are coming in,” Pari says.

SEAL outperformed several baseline methods across a range of tasks, including learning a new skill from a few examples and incorporating knowledge from a text passage. On question answering, SEAL improved model accuracy by nearly 15 percent, and on some skill-learning tasks it boosted the success rate by more than 50 percent.

But one limitation of this approach is a problem called catastrophic forgetting: As the model repeatedly adapts to new information, its performance on earlier tasks slowly declines.
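
Catastrophic forgetting can be illustrated with a deliberately tiny example. The sketch below assumes a model with a single weight and two hypothetical tasks that want opposite values for it; the function names and numbers are invented for illustration. With only one parameter the forgetting is abrupt and total, whereas in a large model the decline on earlier tasks is more gradual.

```python
def train(weight, target, learning_rate=0.2, steps=30):
    """Gradient descent on the squared error (weight - target) ** 2."""
    for _ in range(steps):
        weight -= learning_rate * 2 * (weight - target)
    return weight

def error(weight, target):
    """Squared error of the weight against a task's preferred value."""
    return (weight - target) ** 2

task_a, task_b = 1.0, -1.0        # hypothetical tasks that want different weights
w = train(0.0, task_a)            # learn task A first
error_a_before = error(w, task_a)
w = train(w, task_b)              # then adapt to task B only
error_a_after = error(w, task_a)
print(error_a_before, error_a_after)
```

After the second round of training, the weight fits task B and the error on task A is large again, because nothing in the update protects what was learned earlier.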

The researchers plan to mitigate catastrophic forgetting in future work. They also want to apply this technique in a multi-agent setting where several LLMs train each other.

“One of the key barriers to LLMs that can do meaningful scientific research is their inability to update themselves based on their interactions with new information. Though fully deployed self-adapting models are still far off, we hope systems able to learn this way could eventually overcome this and help advance science,” Zweiger says.

This work is supported, in part, by the U.S. Army Research Office, the U.S. Air Force AI Accelerator, the Stevens Fund for MIT UROP, and the MIT-IBM Watson AI Lab. 

Understanding the nuances of human-like intelligence

Tue, 11/11/2025 - 12:00am

What can we learn about human intelligence by studying how machines “think”? Can we better understand ourselves if we better understand the artificial intelligence systems that are becoming a more significant part of our everyday lives?

These questions may be deeply philosophical, but for Phillip Isola, finding the answers is as much about computation as it is about cogitation.

Isola, a newly tenured associate professor in the Department of Electrical Engineering and Computer Science (EECS), studies the fundamental mechanisms involved in human-like intelligence from a computational perspective.

While understanding intelligence is the overarching goal, his work focuses mainly on computer vision and machine learning. Isola is particularly interested in exploring how intelligence emerges in AI models, how these models learn to represent the world around them, and what their “brains” share with the brains of their human creators.

“I see all the different kinds of intelligence as having a lot of commonalities, and I’d like to understand those commonalities. What is it that all animals, humans, and AIs have in common?” says Isola, who is also a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

To Isola, a better scientific understanding of the intelligence that AI agents possess will help the world integrate them safely and effectively into society, maximizing their potential to benefit humanity.

Asking questions

Isola began pondering scientific questions at a young age.

While growing up in San Francisco, he and his father frequently went hiking along the northern California coastline or camping around Point Reyes and in the hills of Marin County.

He was fascinated by geological processes and often wondered what made the natural world work. In school, Isola was driven by an insatiable curiosity, and while he gravitated toward technical subjects like math and science, there was no limit to what he wanted to learn.

Not entirely sure what to study as an undergraduate at Yale University, Isola dabbled until he came upon cognitive sciences.

“My earlier interest had been with nature — how the world works. But then I realized that the brain was even more interesting, and more complex than even the formation of the planets. Now, I wanted to know what makes us tick,” he says.

As a first-year student, he started working in the lab of his cognitive sciences professor and soon-to-be mentor, Brian Scholl, a member of the Yale Department of Psychology. He remained in that lab throughout his time as an undergraduate.

After spending a gap year working with some childhood friends at an indie video game company, Isola was ready to dive back into the complex world of the human brain. He enrolled in the graduate program in brain and cognitive sciences at MIT.

“Grad school was where I felt like I finally found my place. I had a lot of great experiences at Yale and in other phases of my life, but when I got to MIT, I realized this was the work I really loved and these are the people who think similarly to me,” he says.

Isola credits his PhD advisor, Ted Adelson, the John and Dorothy Wilson Professor of Vision Science, as a major influence on his future path. He was inspired by Adelson’s focus on understanding fundamental principles, rather than only chasing new engineering benchmarks, which are formalized tests used to measure the performance of a system.

A computational perspective

At MIT, Isola’s research drifted toward computer science and artificial intelligence.

“I still loved all those questions from cognitive sciences, but I felt I could make more progress on some of those questions if I came at it from a purely computational perspective,” he says.

His thesis was focused on perceptual grouping, which involves the mechanisms people and machines use to organize discrete parts of an image as a single, coherent object.

If machines can learn perceptual groupings on their own, that could enable AI systems to recognize objects without human intervention. This type of self-supervised learning has applications in areas such as autonomous vehicles, medical imaging, robotics, and automatic language translation.

After graduating from MIT, Isola completed a postdoc at the University of California at Berkeley so he could broaden his perspectives by working in a lab solely focused on computer science.

“That experience helped my work become a lot more impactful because I learned to balance understanding fundamental, abstract principles of intelligence with the pursuit of some more concrete benchmarks,” Isola recalls.

At Berkeley, he developed image-to-image translation frameworks, an early form of generative AI model that could turn a sketch into a photographic image, for instance, or turn a black-and-white photo into a color one.

He entered the academic job market and accepted a faculty position at MIT, but Isola deferred for a year to work at a then-small startup called OpenAI.

“It was a nonprofit, and I liked the idealistic mission at that time. They were really good at reinforcement learning, and I thought that seemed like an important topic to learn more about,” he says.

He enjoyed working in a lab with so much scientific freedom, but after a year Isola was ready to return to MIT and start his own research group.

Studying human-like intelligence

Running a research lab instantly appealed to him.

“I really love the early stage of an idea. I feel like I am a sort of startup incubator where I am constantly able to do new things and learn new things,” he says.

Building on his interest in cognitive sciences and desire to understand the human brain, his group studies the fundamental computations involved in the human-like intelligence that emerges in machines.

One primary focus is representation learning, or the ability of humans and machines to represent and perceive the sensory world around them.

In recent work, he and his collaborators observed that the many varied types of machine-learning models, from LLMs to computer vision models to audio models, seem to represent the world in similar ways.

These models are designed to do vastly different tasks, but there are many similarities in their architectures. And as they get bigger and are trained on more data, their internal structures become more alike.

This led Isola and his team to introduce the Platonic Representation Hypothesis (drawing its name from the Greek philosopher Plato), which says that the representations all these models learn are converging toward a shared, underlying representation of reality.

“Language, images, sound — all of these are different shadows on the wall from which you can infer that there is some kind of underlying physical process — some kind of causal reality — out there. If you train models on all these different types of data, they should converge on that world model in the end,” Isola says.
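
One way to make the idea of models "representing the world in similar ways" concrete is to check whether two models place the same items close together. The following is a minimal sketch under assumptions: the embeddings are made up, and the neighbor-overlap score (with the hypothetical helpers `nearest_neighbors` and `neighbor_overlap`) is an illustrative measure, not necessarily the one Isola's team uses.

```python
import math

def nearest_neighbors(vectors, k):
    """For each vector, return the index set of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: math.dist(v, vectors[j]))
        neighbors.append(set(others[:k]))
    return neighbors

def neighbor_overlap(reps_a, reps_b, k=1):
    """Average fraction of shared k-nearest neighbors across two representation spaces."""
    nn_a, nn_b = nearest_neighbors(reps_a, k), nearest_neighbors(reps_b, k)
    return sum(len(a & b) for a, b in zip(nn_a, nn_b)) / (k * len(reps_a))

# Hypothetical embeddings of the same four items from two different models.
vision = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
text = [(1.0,), (1.2,), (9.0,), (9.3,)]
shuffled = [(1.0,), (9.0,), (1.2,), (9.3,)]  # same values, items paired differently

print(neighbor_overlap(vision, text), neighbor_overlap(vision, shuffled))
```

The two spaces have different dimensions, so their coordinates cannot be compared directly, but their neighborhood structure can. A score near 1 means the two models group the items the same way; a score near 0 means they do not.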

A related area his team studies is self-supervised learning. This involves the ways in which AI models learn to group related pixels in an image or words in a sentence without having labeled examples to learn from.

Because data are expensive and labels are limited, using only labeled data to train models could hold back the capabilities of AI systems. With self-supervised learning, the goal is to develop models that can come up with an accurate internal representation of the world on their own.

“If you can come up with a good representation of the world, that should make subsequent problem solving easier,” he explains.

The focus of Isola’s research is more about finding something new and surprising than about building complex systems that can outdo the latest machine-learning benchmarks.

While this approach has yielded much success in uncovering innovative techniques and architectures, it means the work sometimes lacks a concrete end goal, which can lead to challenges.

For instance, keeping a team aligned and the funding flowing can be difficult when the lab is focused on searching for unexpected results, he says.

“In a sense, we are always working in the dark. It is high-risk and high-reward work. Every once in a while, we find some kernel of truth that is new and surprising,” he says.

In addition to pursuing knowledge, Isola is passionate about imparting knowledge to the next generation of scientists and engineers. Among his favorite courses to teach is 6.7960 (Deep Learning), which he and several other MIT faculty members launched four years ago.

The class has grown rapidly, from 30 students in its initial offering to more than 700 this fall.

And while the popularity of AI means there is no shortage of interested students, the speed at which the field moves can make it difficult to separate the hype from truly significant advances.

“I tell the students they have to take everything we say in the class with a grain of salt. Maybe in a few years we’ll tell them something different. We are really on the edge of knowledge with this course,” he says.

But Isola also emphasizes to students that, for all the hype surrounding the latest AI models, intelligent machines are far simpler than most people suspect.

“Human ingenuity, creativity, and emotions — many people believe these can never be modeled. That might turn out to be true, but I think intelligence is fairly simple once we understand it,” he says.

Even though his current work focuses on deep-learning models, Isola is still fascinated by the complexity of the human brain and continues to collaborate with researchers who study cognitive sciences.

All the while, he has remained captivated by the beauty of the natural world that inspired his first interest in science.

Although he has less time for hobbies these days, Isola enjoys hiking and backpacking in the mountains or on Cape Cod, skiing and kayaking, or finding scenic places to spend time when he travels for scientific conferences.

And while he looks forward to exploring new questions in his lab at MIT, Isola can’t help but contemplate how the role of intelligent machines might change the course of his work.

He believes that artificial general intelligence (AGI), or the point where machines can learn and apply their knowledge as well as humans can, is not that far off.

“I don’t think AIs will just do everything for us and we’ll go and enjoy life at the beach. I think there is going to be this coexistence between smart machines and humans who still have a lot of agency and control. Now, I’m thinking about the interesting questions and applications once that happens. How can I help the world in this post-AGI future? I don’t have any answers yet, but it’s on my mind,” he says.
