MIT Latest News

MIT News is dedicated to communicating to the media and the public the news and achievements of the students, faculty, staff and the greater MIT community.

URL: https://news.mit.edu/rss/feed

A new way to make steel could reduce America’s reliance on imports

Fri, 02/13/2026 - 12:00am

America has been making steel from iron ore the same way for hundreds of years. Unfortunately, it hasn’t been making enough of it. Today the U.S. is the world’s largest steel importer, relying on other countries to produce a material that serves as the backbone of our society.

That’s not to say the U.S. is alone: Globally, most steel today is made in enormous, multi-billion-dollar plants using a coal-based process that hasn’t changed much in 300 years.

Now Hertha Metals, founded by CEO Laureen Meroueh SM ’18, PhD ’20, is scaling up a new steel production system powered by natural gas and electricity. The process, which can also run on hydrogen, uses a continuous electric arc furnace within which iron ore of any grade and format is reduced and carburized into molten steel in a single step. It also eliminates the need for coking and sintering plants, along with other dangerous and expensive components of traditional systems. As a result, the company says its process uses 30 percent less energy and costs less to operate than conventional steel mills in America.

“The real headline is the fact that we can make steel from iron ore more cost-competitive by 25 percent in the United States, while also reducing emissions,” Meroueh says. “The United States hasn’t been competitive in steelmaking in decades. Now we’re enabling that.”

Since late 2024, Hertha has been operating a 1-tonne-per-day pilot plant at its first production facility outside Houston, Texas. The company calls it the world’s largest demonstration of a single-step steelmaking process. This year, the company will begin construction of a plant that will be able to produce 10,000 tons of steel each year. That plant, which Hertha expects to reach full production capacity at the end of 2027, will also produce high-purity iron for the magnet industry, helping America onshore another critical material.

“By importing so much of our pig iron and steel, we are completely reliant on global trade mechanisms and geopolitics remaining the way they are today for us to continue making the materials that are critical for our infrastructure, our defense systems, and our energy systems,” Meroueh says. “Steel is the most foundational material to our society. It is simply irreplaceable.”

Streamlining steelmaking

Meroueh earned her master’s degree in the lab of Gang Chen, MIT’s Carl Richard Soderberg Professor of Power Engineering. She studied thermal energy storage and the fundamental physics of heat transfer, eventually getting her first taste of entrepreneurship when she explored commercializing some of that research. Meroueh received a grant from the MIT Sandbox Innovation Fund and considers Executive Director Jinane Abounadi a close mentor today.

The experience taught Meroueh a lot about startups, but she ultimately decided to stay at MIT to pursue her PhD in metallurgy and hydrogen production in the lab of Douglas Hart, MIT professor of mechanical engineering. After earning her PhD in 2020, she was recruited to lead a hydrogen production startup for a year and a half.

“After that experience, I was looking at all of the hard-to-abate, high-emissions sectors of the economy to find the one receiving the least attention,” Meroueh says. “I stumbled onto steel and fell in love.”

Meroueh became an Innovators Fellow at the climate and energy startup investment firm Breakthrough Energy and officially founded Hertha Metals in 2022.

The company is named after Hertha Ayrton, a 19th-century physicist and inventor who advanced our understanding of electric arcs, which the company uses in its furnaces.

Globally, most steel today is made by combining iron ore with coke (from coal) and limestone in a blast furnace to make molten iron. That “pig iron” is then sent to another furnace to burn off excess carbon and impurities. Alloying elements are then added, and the steel is sent for casting and finishing, requiring additional machinery.

The U.S. makes most of its steel from recycled scrap metal, but it still must import iron made from a blast furnace to reach useful grades of steel.

“The United States has a massive need to make steel from iron ore, not just scrap, so we can stop relying on importing so much,” Meroueh explains. “We only have about 11 operational blast furnaces in the U.S., so we end up importing about 90 percent of the pig iron needed to feed into domestic scrap steel furnaces.”

To solve the problem, Meroueh leveraged a fuel America has in abundance: natural gas. Hertha’s system uses natural gas (the process also works with hydrogen) to reduce iron ore while using electricity to melt it in a single step. She says the closest competing technology requires scarce and expensive pelletized, high-grade iron ore and multiple furnaces to produce liquid steel. Meroueh’s process uses iron ore of any format or grade, producing refined liquid steel in a single furnace, cutting both cost and emissions.

“Many reactions that were previously run sequentially through a conventional steelmaking process are now occurring simultaneously, within a single furnace,” Meroueh explains. “We’re melting, we’re reducing, and we’re carburizing the steel to the exact amount we need. What exits our furnace is a refined molten steel. We can process any grade and format of iron ore because everything is occurring in the molten phase. It doesn’t matter whether the ore came in as a pellet or clumps and fines out of the ground.”

Meroueh says the company’s biggest innovation is performing the gaseous reduction when the iron oxide is a molten liquid using proprietary gas technologies.

“All of the conventional steelmaking technologies perform reduction while the iron ore is in a solid state, and they use gas — whether that’s combusted coke or natural gas — to perform that reduction,” Meroueh says. “We saw the inefficiency in doing that and how it restricted the grade and form of usable iron ore, because at the end of the day you have to melt the ore anyway.”

Hertha’s system is modular and uses standard off-gas handling equipment, steam turbines, and heat exchangers. It also recycles natural gas to regenerate electricity from the hot off-gas leaving the furnace.

“Our steel mill has its own little power plant attached that leads to 35 percent recovery in energy and minimizes grid power demand in an age in which we are competing with data centers,” Meroueh says.

Onshoring critical materials

Today’s steel mills are the result of enormous investments and are designed to run for at least 50 years. Hertha Metals doesn’t envision replacing those entirely — at least not anytime soon.

“You’re not just going to shut off a steel mill in the middle of its life,” Meroueh says. “Sure, you can build new steel mills, but we really want to be able to displace the blast furnace and the basic oxygen furnace while still utilizing all the mill’s downstream equipment.”

The company’s Houston plant began producing one ton of steel per day just two years after Hertha’s founding and less than one year after Meroueh opened up Hertha’s headquarters. She calls it an important first step.

“This is the largest-scale demonstration of a single-step steelmaking company,” Meroueh says. “It’s a true breakthrough in terms of scalability, pace of progress, and capital efficiency.”

The company’s next plant, which will be capable of producing 10,000 tons of steel each year, will also be producing high-purity iron for permanent magnets, which are used in electric motors, robotics, consumer electronics, aerospace and military hardware.

“It’s insane that we don’t make rare earth magnets domestically,” Meroueh says. “It’s insane that any country doesn’t make their own rare earth magnets. Most rare earth magnets are permanent magnets, so neodymium magnets. What’s interesting is that by weight, 70 percent of that magnet is not a rare earth, it’s high-purity iron. America doesn’t currently make any high-purity iron, but Hertha has already made it in our pilot plant.”

Hertha plans to quickly scale up its production of high-purity iron so that, by 2030, it will be able to meet about a quarter of total projected demand for magnets in the U.S.

After that, the company plans to run a full-scale commercial steel plant in partnership with a steel manufacturer in America. Meroueh says that plant, which will be able to produce around half a million tons of steel each year, should be operational by 2030.

“We are eager to partner with today’s steel producers so that we can collectively leverage the existing infrastructure alongside Hertha’s innovation,” Meroueh says. “That includes the $1.5 billion of capital downstream of a melt shop that Hertha’s process can integrate into. The melt shop is the ore-to-liquid steel portion of the steel mill. That’s just the start. It’s a smaller scale than a conventional plant in which we still economically outcompete traditional production processes. Then we’re going to scale to 2 million tons per year once we build up our balance sheet.”

New J-PAL research and policy initiative to test and scale AI innovations to fight poverty

Thu, 02/12/2026 - 6:50pm

The Abdul Latif Jameel Poverty Action Lab (J-PAL) at MIT has awarded funding to eight new research studies to understand how artificial intelligence innovations can be used in the fight against poverty through its new Project AI Evidence.

The age of AI has brought wide-ranging optimism and skepticism about its effects on society. To realize AI’s full potential, Project AI Evidence (PAIE) will identify which AI solutions work and for whom, and scale only the most effective, inclusive, and responsible solutions — while scaling down those that may potentially cause harm.

PAIE will generate evidence on what works by connecting governments, tech companies, and nonprofits with world-class economists at MIT and across J-PAL’s global network to evaluate and improve AI solutions to entrenched social challenges.

The new initiative is prioritizing questions policymakers are already asking: Do AI-assisted teaching tools help all children learn? How can early-warning flood systems help people affected by natural disasters? Can machine learning algorithms help reduce deforestation in the Amazon? Can AI-powered chatbots help improve people’s health? In the coming years, PAIE will run a series of funding competitions to invite proposals for evaluations of AI tools that address questions like these, and many more.

PAIE is financially supported by a grant from Google.org, philanthropic support from Community Jameel, a grant from Canada’s International Development Research Centre and UK International Development, and a collaboration agreement with Amazon Web Services. Through a grant from Eric and Wendy Schmidt, awarded by recommendation of Schmidt Sciences, the initiative will also study generative AI in the workplace, particularly in low- and middle-income countries.

Alex Diaz, head of AI for social good at Google.org, says, “we’re thrilled to collaborate with MIT and J-PAL, already leaders in this space, on Project AI Evidence. AI has great potential to benefit all people, but we urgently need to study what works, what doesn’t, and why, if we are to realize this potential.”

“Artificial intelligence holds extraordinary potential, but only if the tools, knowledge, and power to shape it are accessible to all — that includes contextually grounded research and evidence on what works and what does not,” adds Maggie Gorman-Velez, vice president of strategy, regions, and policies at IDRC. “That is why IDRC is proud to be supporting this new evaluation work as part of our ongoing commitment to the responsible scaling of proven safe, inclusive, and locally relevant AI innovations.”

J-PAL is uniquely positioned to help understand AI’s effects on society: Since its inception in 2003, J-PAL’s network of researchers has led over 2,500 rigorous evaluations of social policies and programs around the world. Through PAIE, J-PAL will bring together leading experts in AI technology, research, and social policy, in alignment with MIT president Sally Kornbluth’s focus on generative AI as a strategic priority.

PAIE is chaired by Professor Joshua Blumenstock of the University of California at Berkeley; J-PAL Global Executive Director Iqbal Dhaliwal; and Professor David Yanagizawa-Drott of the University of Zurich.

New evaluations of urgent policy questions

The studies funded in PAIE’s first round of competition explore urgent questions in key sectors like education, health, climate, and economic opportunity.

How can AI be most effective in classrooms, helping both students and teachers?

Existing research shows that personalized learning is important for students, but challenging to implement with limited resources. In Kenya, education social enterprise EIDU has developed an AI tool that helps teachers identify learning gaps and adapt their daily lesson plans. In India, the nongovernmental organization (NGO) Pratham is developing an AI tool to increase the impact and scale of the evidence-informed Teaching at the Right Level approach. J-PAL researchers Daron Acemoglu, Iqbal Dhaliwal, and Francisco Gallego will work with both organizations to study the effects and potential of these different use cases on teachers’ productivity and students’ learning.

Can AI tools reduce gender bias in schools?

Researchers are collaborating with Italy’s Ministry of Education to evaluate whether AI tools can help close gender gaps in students’ performance by addressing teachers’ unconscious biases. J-PAL affiliates Michela Carlana and Will Dobbie, along with Francesca Miserocchi and Eleonora Patacchini, will study the impacts of two AI tools, one that helps teachers predict performance and a second that gives real-time feedback on the diversity of their decisions.

Can AI help career counselors uncover more job opportunities?

In Kenya, researchers are evaluating if an AI tool can identify overlooked skills and unlock employment opportunities, particularly for youth, women, and those without formal education. In collaboration with NGOs Swahilipot and Tabiya, Jasmin Baier and J-PAL researcher Christian Meyer will evaluate how the tool changes people’s job search strategies and employment. This study will shed light on AI as a complement, rather than a substitute, for human expertise in career guidance.

Looking forward

As use of AI in the social sector evolves, these evaluations are a first step in discovering effective, responsible solutions that will go the furthest in alleviating poverty and inequality.

J-PAL’s Dhaliwal notes, “J-PAL has a long history of evaluating innovative technology and its ability to improve people’s lives. While AI has incredible potential, we need to maximize its benefits and minimize possible harms. We’re grateful to our donors, sponsors, and collaborators for their catalytic support in launching PAIE, which will help us do exactly that by continuing to expand evidence on the impacts of AI innovations.”

J-PAL is also seeking new collaborators who share its vision of discovering and scaling up real-world AI solutions. It aims to support more governments and social sector organizations that want to adopt AI responsibly, and will continue to expand funding for new evaluations and provide policy guidance based on the latest research.

To learn more about Project AI Evidence, subscribe to J-PAL’s newsletter or contact paie@povertyactionlab.org.

Maria Yang named vice provost for faculty

Thu, 02/12/2026 - 11:00am

Maria Yang ’91, the William E. Leonhard (1940) Professor in the Department of Mechanical Engineering, has been appointed vice provost for faculty at MIT, a role in which she will oversee programs and strategies to recruit and retain faculty members and support them throughout their careers.

Provost Anantha Chandrakasan announced Yang’s appointment, which is effective Feb. 16, in an email to MIT faculty and staff today.

“In the nearly two decades since Maria joined the MIT faculty, she has exemplified dedicated service to the Institute and deep interdisciplinary collaboration,” Chandrakasan wrote. He added that, in a series of leadership positions within the School of Engineering, Yang “consistently demonstrated her skill as a leader, her empathy as a colleague, and her values-driven decision-making.”

As vice provost for faculty, Yang will play a pivotal role in creating an environment where MIT’s faculty members are able to do their best work, “pursuing bold ideas with excellence and creativity,” according to Chandrakasan’s letter. She will partner with school and department leaders on faculty recruitment and retention, mentorship, and strategic planning, and she will oversee programs to support faculty members’ professional development at every stage of their careers.

“Part of what makes MIT unique is the way it provides faculty the room and the encouragement to do work that they think is important, impactful, and sometimes unexpected,” says Yang. “I think it’s vital to foster a culture and a sense of community that really enables our faculty to perform at their best — as researchers, of course, but also as educators and mentors, and as citizens of MIT.”

In addition to her role supporting MIT faculty, Yang will also handle oversight and planning responsibilities for campus academic and research spaces, in partnership with the Office of the Executive Vice President and Treasurer. She will also serve as the principal investigator for the National Science Foundation’s New England Innovation Corps Hub, oversee MIT Solve, and represent the provost on various boards and committees, such as MIT International and the Axim Collaborative.

Yang, who attended MIT as an undergraduate in mechanical engineering as part of the Class of 1991 before earning her master’s and PhD degrees from the design division of the mechanical engineering department at Stanford University, returned to MIT in 2007 as an assistant professor. She has held a number of leadership positions at MIT, including associate dean, deputy dean, and interim dean of the School of Engineering.

In 2021, Yang co-chaired an Institute-wide committee on the future of design, which recommended the creation of a center to support design opportunities at MIT. Through a generous gift from the Morningside Foundation, the recommendation came to life as the interdisciplinary Morningside Academy for Design (MAD), where Yang has served as associate director since inception. Yang has been instrumental in the development of several new programs at MAD, including design-focused graduate fellowships open to students across MIT and a new design-themed first-year learning community.

Since 2017, Yang has also served as academic faculty director for MIT D-Lab, which uses participatory design to collaborate with communities around the world on the development of solutions to poverty challenges. And since 2024, Yang has served as a co-chair of the SHASS+ Connectivity Fund, which funds research projects in which scholars in the School of Humanities, Arts, and Social Sciences collaborate with faculty colleagues from other schools at MIT.

Given Yang’s extensive track record of working across disciplinary lines, Chandrakasan said in his letter that he had “no doubt that in her new role she will be an effective and trusted champion for colleagues across the Institute.”

An internationally recognized leader in design theory and methodology, Yang is currently focused on researching the early-stage processes used to create successful designs for everything from consumer products to complex, large-scale engineering systems, and the role that these early-stage processes play in determining design outcomes.

Yang, a fellow of the American Society of Mechanical Engineers (ASME), received the 2024 ASME Design Theory and Methodology Award, recognizing “sustained and meritorious contributions” in the field. She has also been recognized with a National Science Foundation CAREER award and the American Society of Engineering Education Fred Merryfield Design Award. In 2017 Yang was named a MacVicar Faculty Fellow, one of MIT’s highest teaching honors.

Yang succeeds Institute Professor Paula Hammond, who served in the role from 2023 before being named dean of the School of Engineering, a role she assumed in January.

Accelerating science with AI and simulations

Thu, 02/12/2026 - 12:00am

For more than a decade, MIT Associate Professor Rafael Gómez-Bombarelli has used artificial intelligence to create new materials. As the technology has expanded, so have his ambitions.

Now, the newly tenured professor in materials science and engineering believes AI is poised to transform science in ways never before possible. His work at MIT and beyond is devoted to accelerating that future.

“We’re at a second inflection point,” Gómez-Bombarelli says. “The first one was around 2015 with the first wave of representation learning, generative AI, and high-throughput data in some areas of science. Those are some of the techniques I first brought into my lab at MIT. Now I think we’re at a second inflection point, mixing language and merging multiple modalities into general scientific intelligence. We’re going to have all the model classes and scaling laws needed to reason about language, reason over material structures, and reason over synthesis recipes.”

Gómez-Bombarelli’s research combines physics-based simulations with approaches like machine learning and generative AI to discover new materials with promising real-world applications. His work has led to new materials for batteries, catalysts, plastics, and organic light-emitting diodes (OLEDs). He has also co-founded multiple companies and served on scientific advisory boards for startups applying AI to drug discovery, robotics, and more. His latest company, Lila Sciences, is working to build a scientific superintelligence platform for the life sciences, chemical, and materials science industries.

All of that work is designed to ensure the future of scientific research is more seamless and productive than research today.

“AI for science is one of the most exciting and aspirational uses of AI,” Gómez-Bombarelli says. “Other applications for AI have more downsides and ambiguity. AI for science is about bringing a better future forward in time.”

From experiments to simulations

Gómez-Bombarelli grew up in Spain and gravitated toward the physical sciences from an early age. In 2001, he won a Chemistry Olympics competition, setting him on an academic track in chemistry, which he studied as an undergraduate at his hometown college, the University of Salamanca. Gómez-Bombarelli stuck around for his PhD, where he investigated the function of DNA-damaging chemicals.

“My PhD started out experimental, and then I got bitten by the bug of simulation and computer science about halfway through,” he says. “I started simulating the same chemical reactions I was measuring in the lab. I like the way programming organizes your brain; it felt like a natural way to organize one’s thinking. Programming is also a lot less limited by what you can do with your hands or with scientific instruments.”

Next, Gómez-Bombarelli went to Scotland for a postdoctoral position, where he studied quantum effects in biology. Through that work, he connected with Alán Aspuru-Guzik, a chemistry professor at Harvard University, whom he joined for his next postdoc in 2014.

“I was one of the first people to use generative AI for chemistry in 2016, and I was on the first team to use neural networks to understand molecules in 2015,” Gómez-Bombarelli says. “It was the early, early days of deep learning for science.”

Gómez-Bombarelli also began working to eliminate manual parts of molecular simulations to run more high-throughput experiments. He and his collaborators ended up running hundreds of thousands of calculations across materials, discovering hundreds of promising materials for testing.

After two years in the lab, Gómez-Bombarelli and Aspuru-Guzik started a general-purpose materials computation company, which eventually pivoted to focus on producing organic light-emitting diodes. Gómez-Bombarelli joined the company full-time and calls it the hardest thing he’s ever done in his career.

“It was amazing to make something tangible,” he says. “Also, after seeing Aspuru-Guzik run a lab, I didn’t want to become a professor. My dad was a professor in linguistics, and I thought it was a mellow job. Then I saw Aspuru-Guzik with a 40-person group, and he was on the road 120 days a year. It was insane. I didn’t think I had that type of energy and creativity in me.”

In 2018, Aspuru-Guzik suggested Gómez-Bombarelli apply for a new position in MIT’s Department of Materials Science and Engineering. But, with his trepidation about a faculty job, Gómez-Bombarelli let the deadline pass. Aspuru-Guzik confronted him in his office, slammed his hands on the table, and told him, “You need to apply for this.” It was enough to get Gómez-Bombarelli to put together a formal application.

Fortunately, at his startup, Gómez-Bombarelli had spent a lot of time thinking about how to create value from computational materials discovery. During the interview process, he says, he was attracted to the energy and collaborative spirit at MIT. He also began to appreciate the research possibilities.

“Everything I had been doing as a postdoc and at the company was going to be a subset of what I could do at MIT,” he says. “I was making products, and I still get to do that. Suddenly, my universe of work was a subset of this new universe of things I could explore and do.”

It’s been nine years since Gómez-Bombarelli joined MIT. Today his lab focuses on how the composition, structure, and reactivity of atoms impact material performance. He has also used high-throughput simulations to create new materials and helped develop tools for merging deep learning with physics-based modeling.

“Physics-based simulations make data and AI algorithms get better the more data you give them,” Gómez-Bombarelli says. “There are all sorts of virtuous cycles between AI and simulations.”

The research group he has built is solely computational — they don’t run physical experiments.

“It’s a blessing because we can have a huge amount of breadth and do lots of things at once,” he says. “We love working with experimentalists and try to be good partners with them. We also love to create computational tools that help experimentalists triage the ideas coming from AI.”

Gómez-Bombarelli is also still focused on the real-world applications of the materials he invents. His lab works closely with companies and organizations like MIT’s Industrial Liaison Program to understand the material needs of the private sector and the practical hurdles of commercial development.

Accelerating science

As excitement around artificial intelligence has exploded, Gómez-Bombarelli has seen the field mature. Companies like Meta, Microsoft, and Google’s DeepMind now regularly conduct physics-based simulations reminiscent of what he was working on back in 2016. In November, the U.S. Department of Energy launched the Genesis Mission to accelerate scientific discovery, national security, and energy dominance using AI.

“AI for simulations has gone from something that maybe could work to a consensus scientific view,” Gómez-Bombarelli says. “We’re at an inflection point. Humans think in natural language, we write papers in natural language, and it turns out these large language models that have mastered natural language have opened up the ability to accelerate science. We’ve seen that scaling works for simulations. We’ve seen that scaling works for language. Now we’re going to see how scaling works for science.”

When he first came to MIT, Gómez-Bombarelli says he was blown away by how non-competitive things were between researchers. He tries to bring that same positive-sum thinking to his research group, which is made up of about 25 graduate students and postdocs.

“We’ve naturally grown into a really diverse group, with a diverse set of mentalities,” Gómez-Bombarelli says. “Everyone has their own career aspirations and strengths and weaknesses. Figuring out how to help people be the best versions of themselves is fun. Now I’ve become the one insisting that people apply to faculty positions after the deadline. I guess I’ve passed that baton.”

Using synthetic biology and AI to address global antimicrobial resistance threat

Wed, 02/11/2026 - 8:00am

James J. Collins, the Termeer Professor of Medical Engineering and Science at MIT and faculty co-lead of the Abdul Latif Jameel Clinic for Machine Learning in Health, is embarking on a multidisciplinary research project that applies synthetic biology and generative artificial intelligence to the growing global threat of antimicrobial resistance (AMR).

The research project is sponsored by Jameel Research, part of the Abdul Latif Jameel International network. The initial three-year, $3 million research project in MIT’s Department of Biological Engineering and Institute for Medical Engineering and Science focuses on developing and validating programmable antibacterials against key pathogens.

AMR, driven by the overuse and misuse of antibiotics, has accelerated the rise of drug-resistant infections, while the development of new antibacterial tools has slowed. The impact is felt worldwide, especially in low- and middle-income countries, where limited diagnostic infrastructure leads to delayed or ineffective treatment.

The project centers on developing a new generation of targeted antibacterials using AI to design small proteins to disable specific bacterial functions. These designer molecules would be produced and delivered by engineered microbes, providing a more precise and adaptable approach than traditional antibiotics.

“This project reflects my belief that tackling AMR requires both bold scientific ideas and a pathway to real-world impact,” Collins says. “Jameel Research is keen to address this crisis by supporting innovative, translatable research at MIT.”

Mohammed Abdul Latif Jameel ’78, chair of Abdul Latif Jameel, says, “antimicrobial resistance is one of the most urgent challenges we face today, and addressing it will require ambitious science and sustained collaboration. We are pleased to support this new research, building on our long-standing relationship with MIT and our commitment to advancing research across the world, to strengthen global health and contribute to a more resilient future.”

AI algorithm enables tracking of vital white matter pathways

Tue, 02/10/2026 - 5:00pm

The signals that drive many of the brain and body’s most essential functions — consciousness, sleep, breathing, heart rate, and motion — course through bundles of “white matter” fibers in the brainstem, but imaging systems so far have been unable to finely resolve these crucial neural cables. That has left researchers and doctors with little capability to assess how they are affected by trauma or neurodegeneration.

In a new study, a team of MIT, Harvard University, and Massachusetts General Hospital researchers unveil AI-powered software capable of automatically segmenting eight distinct bundles in any diffusion MRI sequence.

In the open-access study, published Feb. 6 in the Proceedings of the National Academy of Sciences, the research team led by MIT graduate student Mark Olchanyi reports that their BrainStem Bundle Tool (BSBT), which they’ve made publicly available, revealed distinct patterns of structural changes in patients with Parkinson’s disease, multiple sclerosis, and traumatic brain injury, and shed light on Alzheimer’s disease as well. Moreover, the study shows, BSBT retrospectively enabled tracking of bundle healing in a coma patient that reflected the patient’s seven-month road to recovery.

“The brainstem is a region of the brain that is essentially not explored because it is tough to image,” says Olchanyi, a doctoral candidate in MIT’s Medical Engineering and Medical Physics Program. “People don’t really understand its makeup from an imaging perspective. We need to understand what the organization of the white matter is in humans and how this organization breaks down in certain disorders.”

Adds Professor Emery N. Brown, Olchanyi’s thesis supervisor and co-senior author of the study, “the brainstem is one of the body’s most important control centers. Mark’s algorithms are a significant contribution to imaging research and to our ability to understand the regulation of fundamental physiology. By enhancing our capacity to image the brainstem, he offers us new access to vital physiological functions such as control of the respiratory and cardiovascular systems, temperature regulation, how we stay awake during the day and how we sleep at night.”

Brown is the Edward Hood Taplin Professor of Computational Neuroscience and Medical Engineering in The Picower Institute for Learning and Memory, the Institute for Medical Engineering and Science, and the Department of Brain and Cognitive Sciences at MIT. He is also an anesthesiologist at MGH and a professor at Harvard Medical School.

Building the algorithm

Diffusion MRI helps trace the long branches, or “axons,” that neurons extend to communicate with each other. Axons are typically clad in a sheath of fat called myelin, and water diffuses along the axons within the myelin, which is also called the brain’s “white matter.” Diffusion MRI can highlight this very directed displacement of water. But segmenting the distinct bundles of axons in the brainstem has proved challenging, because they are small and masked by flows of brain fluids and the motions produced by breathing and heartbeats.

As part of his thesis work to better understand the neural mechanisms that underpin consciousness, Olchanyi wanted to develop an AI algorithm to overcome these obstacles. BSBT works by tracing fiber bundles that plunge into the brainstem from neighboring areas higher in the brain, such as the thalamus and the cerebellum, to produce a “probabilistic fiber map.” An artificial intelligence module called a “convolutional neural network” then combines the map with several channels of imaging information from within the brainstem to distinguish eight individual bundles.

To train the neural network to segment the bundles, Olchanyi “showed” it 30 live diffusion MRI scans from volunteers in the Human Connectome Project (HCP). The scans were manually annotated to teach the neural network how to identify the bundles. Then he validated BSBT by testing its output against “ground truth” dissections of post-mortem human brains where the bundles were well delineated via microscopic inspection or very slow but ultra-high-resolution imaging. After training, BSBT became proficient in automatically identifying the eight distinct fiber bundles in new scans.

In an experiment to test its consistency and reliability, Olchanyi tasked BSBT with finding the bundles in 40 volunteers who underwent separate scans two months apart. In each case, the tool was able to find the same bundles in the same patients in each of their two scans. Olchanyi also tested BSBT with multiple datasets (not just the HCP), and even inspected how each component of the neural network contributed to BSBT’s analysis by hobbling them one by one.
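The article does not say which metric the study used to compare the two scans. As a minimal sketch, one common way to score agreement between two segmentations of the same bundle is the Dice coefficient, shown here on hypothetical voxel coordinates (the function name and data are illustrative assumptions, not part of BSBT).

```python
def dice_coefficient(voxels_a, voxels_b):
    """Overlap between two segmentations, each given as a collection of voxel coordinates.

    Returns 1.0 for identical masks and 0.0 for masks with no voxels in common.
    """
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical segmentations of one bundle from two scans of the same person.
scan_1 = {(10, 12, 8), (10, 13, 8), (11, 12, 8), (11, 13, 8)}
scan_2 = {(10, 12, 8), (10, 13, 8), (11, 12, 8), (12, 13, 8)}
overlap = dice_coefficient(scan_1, scan_2)  # 3 shared voxels out of 4 + 4
```

A score near 1 across repeat scans would indicate that a tool finds the same structure each time.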

“We put the neural network through the wringer,” Olchanyi says. “We wanted to make sure that it’s actually doing these plausible segmentations and it is leveraging each of its individual components in a way that improves the accuracy.”

Potential novel biomarkers

Once the algorithm was properly trained and validated, the research team moved on to testing whether the ability to segment distinct fiber bundles in diffusion MRI scans could enable tracking of how each bundle’s volume and structure varied with disease or injury, creating a novel kind of biomarker. Although the brainstem has been difficult to examine in detail, many studies show that neurodegenerative diseases affect the brainstem, often early on in their progression.

Olchanyi, Brown and their co-authors applied BSBT to scores of datasets of diffusion MRI scans from patients with Alzheimer’s, Parkinson’s, MS, and traumatic brain injury (TBI). Patients were compared to controls and sometimes to themselves over time. In the scans, the tool measured bundle volume and “fractional anisotropy” (FA), which tracks how much water is flowing along the myelinated axons versus how much is diffusing in other directions, a proxy for white matter structural integrity.
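For readers who want the measure in concrete terms, here is a minimal sketch of the standard textbook formula for fractional anisotropy, computed from the three eigenvalues of a diffusion tensor. The eigenvalues below are hypothetical, and the article does not describe how the study's own pipeline computes FA.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Standard FA formula from the three eigenvalues of a diffusion tensor.

    0 means water diffuses equally in all directions; 1 means diffusion
    occurs along a single axis only.
    """
    mean = (l1 + l2 + l3) / 3.0
    spread = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    total = l1 ** 2 + l2 ** 2 + l3 ** 2
    if total == 0:
        return 0.0  # no measurable diffusion at all
    return math.sqrt(1.5 * spread / total)


# Hypothetical eigenvalues (arbitrary units), chosen only for illustration.
fa_isotropic = fractional_anisotropy(1.0, 1.0, 1.0)  # same in every direction
fa_directed = fractional_anisotropy(1.7, 0.3, 0.3)   # mostly along one axis
```

Under this definition, a drop in FA within a bundle means water is moving less strictly along the fiber direction, which is why it serves as a proxy for structural integrity.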

In each condition, the tool found consistent patterns of changes in the bundles. While only one bundle showed significant decline in Alzheimer’s, in Parkinson’s the tool revealed a reduction in FA in three of the eight bundles. It also revealed volume loss in another bundle in patients between a baseline scan and a two-year follow-up. Patients with MS showed their greatest FA reductions in four bundles and volume loss in three. Meanwhile, TBI patients didn’t show significant volume loss in any bundles, but FA reductions were apparent in the majority of bundles.

Testing in the study showed that BSBT proved more accurate than other classifier methods in discriminating between patients with health conditions versus controls.

BSBT, therefore, can be “a key adjunct that aids current diagnostic imaging methods by providing a fine-grained assessment of brainstem white matter structure and, in some cases, longitudinal information,” the authors wrote.

Finally, in the case of a 29-year-old man who suffered a severe TBI, Olchanyi applied BSBT to scans taken during the man’s seven-month coma. The tool showed that the man’s brainstem bundles had been displaced, but not cut, and showed that over his coma, the lesions on the nerve bundles decreased by a factor of three in volume. As they healed, the bundles moved back into place as well.

The authors wrote that BSBT “has substantial prognostic potential by identifying preserved brainstem bundles that can facilitate coma recovery.”

The study’s other senior authors are Juan Eugenio Iglesias and Brian Edlow. Other co-authors are David Schreier, Jian Li, Chiara Maffei, Annabel Sorby-Adams, Hannah Kinney, Brian Healy, Holly Freeman, Jared Shless, Christophe Destrieux, and Hendry Tregidgo.

Funding for the study came from the National Institutes of Health, U.S. Department of Defense, James S. McDonnell Foundation, Rappaport Foundation, American SIDS Institute, American Brain Foundation, American Academy of Neurology, Center for Integration of Medicine and Innovative Technology, Blueprint for Neuroscience Research, and Massachusetts Life Sciences Center.

Magnetic mixer improves 3D bioprinting

Tue, 02/10/2026 - 4:35pm

3D bioprinting, in which living tissues are printed with cells mixed into soft hydrogels, or “bio-inks,” is widely used in the field of bioengineering for modeling or replacing the tissues in our bodies. The print quality and reproducibility of tissues, however, can face challenges. One of the most significant challenges is created simply by gravity — cells naturally sink to the bottom of the bioink-extruding printer syringe because the cells are heavier than the hydrogel around them.
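To give a rough sense of scale for this sinking, the sketch below uses Stokes' law for a small sphere settling in a Newtonian fluid. This is an idealization: real bioinks are often non-Newtonian and densely loaded with cells, and every number here is a hypothetical value chosen for illustration, not a figure from the study.

```python
G = 9.81  # gravitational acceleration, m/s^2


def stokes_settling_velocity(radius_m, density_cell, density_fluid, viscosity_pa_s):
    """Terminal sinking speed (m/s) of a small sphere in a Newtonian fluid (Stokes' law)."""
    return 2.0 * radius_m ** 2 * (density_cell - density_fluid) * G / (9.0 * viscosity_pa_s)


# Hypothetical values, for illustration only.
speed = stokes_settling_velocity(
    radius_m=10e-6,        # cell radius of 10 micrometers
    density_cell=1050.0,   # kg/m^3, slightly denser than the gel
    density_fluid=1000.0,  # kg/m^3
    viscosity_pa_s=0.1,    # Pa*s
)
distance_mm = speed * 45 * 60 * 1000  # distance sunk during a 45-minute print, in mm
```

The formula shows the levers involved: sinking speed grows with the square of cell size and with the density mismatch, and falls as the ink becomes more viscous.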

“This cell settling, which becomes worse during the long print sessions required to print large tissues, leads to clogged nozzles, uneven cell distribution, and inconsistencies between printed tissues,” explains Ritu Raman, the Eugene Bell Career Development Professor of Tissue Engineering and assistant professor of mechanical engineering at MIT. “Existing solutions, such as manually stirring bioinks before loading them into the printer, or using passive mixers, cannot maintain uniformity once printing begins.”

In a study published Feb. 2 in the journal Device, Raman’s team introduces a new approach that aims to solve this core limitation by actively preventing cell sedimentation within bioinks during printing, allowing for more reliable and biologically consistent 3D printed tissues.

“Precise control over the bioink’s physical and biological properties is essential for recreating the structure and function of native tissues,” says Ferdows Afghah, a postdoc in mechanical engineering at MIT and lead author of the study.

“If we can print tissues that more closely mimic those in our bodies, we can use them as models to understand more about human diseases, or to test the safety and efficacy of new therapeutic drugs,” adds Raman. Such models could help researchers move away from techniques like animal testing, which aligns with recent interest from the U.S. Food and Drug Administration in developing faster, less expensive, and more informative new approaches to establish the safety and efficacy of new treatment paths.

“Eventually, we are working towards regenerative medicine applications such as replacing diseased or injured tissues in our bodies with 3D printed tissues that can help restore healthy function,” says Raman.

MagMix, a magnetically actuated mixer, is composed of two parts: a small magnetic propeller that fits inside the syringes used by bioprinters to deposit bioinks, layer by layer, into 3D tissues, and a permanent magnet attached to a motor that moves up and down near the syringe, controlling the movement of the propeller inside. Together, this compact system can be mounted onto any standard 3D bioprinter, keeping bioinks uniformly mixed during printing without changing the bioink formulation or interfering with the printer’s normal operation. To test the approach, the team used computer simulations to design the optimal mixing propeller geometry and speed and then validated its performance experimentally.

“Across multiple bioink types, MagMix prevented cell settling for more than 45 minutes of continuous printing, reducing clogging and preserving high cell viability,” says Raman. “Importantly, we showed that mixing speeds could be adjusted to balance effective homogenization for different bioinks while inducing minimal stress on the cells. As a proof-of-concept, we demonstrated that MagMix could be used to 3D print cells that could mature into muscle tissues over the course of several days.”

By maintaining uniform cell distribution throughout long or complex print jobs, MagMix enables the fabrication of high-quality tissues with more consistent biological function. Because the device is compact, low-cost, customizable, and easily integrated into existing 3D printers, it offers a broadly accessible solution for laboratories and industries working toward reproducible engineered tissues for applications in human health including disease modeling, drug screening, and regenerative medicine.

This work was supported, in part, by the Safety, Health, and Environmental Discovery Lab (SHED) at MIT, which provides infrastructure and interdisciplinary expertise to help translate biofabrication innovations from lab-scale demonstrations to scalable, reproducible applications.

“At the SHED, we focus on accelerating the translation of innovative methods into practical tools that researchers can reliably adopt,” says Tolga Durak, the SHED’s founding director. “MagMix is a strong example of how the right combination of technical infrastructure and interdisciplinary support can move biofabrication technologies toward scalable, real-world impact.”

The SHED’s involvement reflects a broader vision of strengthening technology pathways that enhance reproducibility and accessibility across engineering and the life sciences by providing equitable access to advanced equipment and fostering cross-disciplinary collaboration.

“As the field advances toward larger-scale and more standardized systems, integrated labs like SHED are essential for building sustainable capacity,” Durak adds. “Our goal is not only to enable discovery, but to ensure that new technologies can be reliably adopted and sustained over time.”

The team is also interested in non-medical applications of engineered tissues, such as using printed muscles to power safer and more efficient “biohybrid” robots.

The researchers believe this work can improve the reliability and scalability of 3D bioprinting, with potentially significant impacts on the field and on human health. Their paper, “Advancing Bioink Homogeneity in Extrusion 3D Bioprinting with Active In Situ Magnetic Mixing,” is available now from the journal Device.

3 Questions: Using AI to help Olympic skaters land a quint

Tue, 02/10/2026 - 12:00am

Olympic figure skating looks effortless. Athletes sail across the ice, then soar into the air, spinning like a top, before landing on a single blade just 4-5 millimeters wide. To help figure skaters land quadruple axels, Salchows, Lutzes, and maybe even the elusive quintuple without looking the least bit stressed, Jerry Lu MFin ’24 developed an optical tracking system called OOFSkate that uses artificial intelligence to analyze video of a figure skater’s jump and make recommendations on how to improve. Lu, a former researcher at the MIT Sports Lab, has been aiding elite skaters on Team USA with their technical performance and will be working with NBC Sports during the 2026 Winter Olympics to help commentators and TV viewers make better sense of the complex scoring system in figure skating, snowboarding, and skiing. He’ll be applying AI technologies to explain nuanced judging decisions and demonstrate just how technically challenging these sports can be.

Meanwhile, Professor Anette “Peko” Hosoi, co-founder and faculty director of the MIT Sports Lab, is embarking on new research aimed at understanding how AI systems evaluate aesthetic performance in figure skating. Hosoi and Lu recently chatted with MIT News about applying AI to sports, whether AI systems could ever be used to judge Olympic figure skating, and when we might see a skater land a quint.

Q: Why apply AI to figure skating?

Lu: Skaters can always keep pushing, higher, faster, stronger. OOFSkate is all about helping skaters figure out a way to rotate a little bit faster in their jumps or jump a little bit higher. The system helps skaters catch things that perhaps could pass an eye test, but that might allow them to target some high-value areas of opportunity. The artistic side of skating is much harder to evaluate than the technical elements because it’s subjective.

To use the mobile training app, you just need to take a video of an athlete’s jump, and it will spit out the physical metrics that drive how many rotations you can do. It tracks those metrics and builds in all of the other current elite and former elite athletes. You can see your data and then see, “This is how an Olympic champion did this element, perhaps I should try that.” You get the comparison and the automated classifier, which shows you if you did this trick at World Championships and it were judged by an international panel, this is approximately the grade of execution score they would give you.

Hosoi: There are a lot of AI tools that are coming online, especially things like pose estimators, where you can approximate skeletal configurations from video. The challenge with these pose estimators is that if you only have one camera angle, they do very well in the plane of the camera, but they do very poorly with depth. For example, if you’re trying to critique somebody’s form in fencing, and they’re moving toward the camera, you get very bad data. But with figure skating, Jerry has found one of the few areas where depth challenges don’t really matter. In figure skating, you need to understand: How high did this person jump, how many times did they go around, and how well did they land? None of those rely on depth. He’s found an application that pose estimators do really well, and that doesn’t pay a penalty for the things they do badly.

Q: Could you ever see a world in which AI is used to evaluate the artistic side of figure skating?

Hosoi: When it comes to AI and aesthetic evaluation, we have new work underway thanks to an MIT Human Insight Collaborative (MITHIC) grant. This work is in collaboration with Professor Arthur Bahr and IDSS graduate student Eric Liu. When you ask an AI platform for an aesthetic evaluation such as “What do you think of this painting?” it will respond with something that sounds like it came from a human. What we want to understand is, to get to that assessment, are the AIs going through the same sort of reasoning pathways or using the same intuitive concepts that humans go through to arrive at, “I like that painting,” or “I don’t like that painting”? Or are they just parrots? Are they just mimicking what they heard a person say? Or is there some concept map of aesthetic appeal? Figure skating is a perfect place to look for this map because skating is aesthetically judged. And there are numbers. You can’t go around a museum and find scores, “This painting is a 35.” But in skating, you’ve got the data.

That brings up another even more interesting question, which is the difference between novices and experts. It’s known that expert humans and novice humans will react differently to seeing the same thing. Somebody who is an expert judge may have a different opinion of a skating performance than a member of the general population. We’re trying to understand differences between reactions from experts, novices, and AI. Do these reactions have some common ground in where they are coming from, or is the AI coming from a different place than both the expert and the novice?

Lu: Figure skating is interesting because everybody working in the field of AI is trying to figure out AGI or artificial general intelligence and trying to build this extremely sound AI that replicates human beings. Working on applying AI to sports like figure skating helps us understand how humans think and approach judging. This has down-the-line impacts for AI research and companies that are developing AI models. By gaining a deeper understanding of how current state-of-the-art AI models work with these sports, and how you need to do training and fine-tuning of these models to make them work for specific sports, it helps you understand how AI needs to advance.

Q: What will you be watching for in the Milan Cortina Olympics figure skating competitions, now that you’ve been studying and working in this area? Do you think someone will land a quint?

Lu: For the winter games, I am working with NBC for the figure skating, ski, and snowboarding competitions to help them tell a data-driven story for the American people. The goal is to make these sports more relatable. Skating looks slow on television, but it’s not. Everything is supposed to look effortless. If it looks hard, you are probably going to get penalized. Skaters need to learn how to spin very fast, jump extremely high, float in the air, and land beautifully on one foot. The data we are gathering can help showcase how hard skating actually is, even though it is supposed to look easy.

I’m glad we are working in the Olympic sports realm because the world watches once every four years, and these are traditionally coaching-intensive and talent-driven sports, unlike a sport like baseball, where if you don’t have an elite-level optical tracking system you are not maximizing the value that you currently have. I’m glad we get to work with these Olympic sports and athletes and make an impact here.

Hosoi: I have always watched Olympic figure skating competitions, ever since I could turn on the TV. They’re always incredible. One of the things that I’m going to be practicing is identifying the jumps, which is very hard to do if you’re an amateur “judge.”

I have also done some back-of-the-envelope calculations to see if a quint is possible. I am now totally convinced it’s possible. We will see one in our lifetime, if not relatively soon. Not in this Olympics, but soon. When I saw we were so close on the quint, I thought, what about six? Can we do six rotations? Probably not. That’s where we start to come up against the limits of human physical capability. But five, I think, is in reach.
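Hosoi's actual calculation is not given here, but a back-of-the-envelope estimate of this kind can be sketched with basic projectile physics: time in the air depends only on jump height, and the number of rotations is the average spin rate multiplied by that time. The jump height below is a hypothetical value, and the sketch ignores any rotation completed on the ice at takeoff or landing.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def flight_time(jump_height_m):
    """Seconds spent in the air for a jump whose peak is jump_height_m above takeoff."""
    return 2.0 * math.sqrt(2.0 * jump_height_m / G)


def required_spin_rate(rotations, jump_height_m):
    """Average revolutions per second needed to finish `rotations` while airborne."""
    return rotations / flight_time(jump_height_m)


# Hypothetical jump height of 0.5 meters, for illustration only.
air_time = flight_time(0.5)
quad_rate = required_spin_rate(4, 0.5)
quint_rate = required_spin_rate(5, 0.5)
```

Because air time grows only with the square root of height, each extra rotation has to come mostly from spinning faster rather than jumping higher, which is one way to see why the limits of human capability eventually bite.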

Times Higher Education ranks MIT No. 1 in arts and humanities, business and economics, and social sciences for 2026

Mon, 02/09/2026 - 6:00pm

The 2026 Times Higher Education World University Ranking has ranked MIT first in three subject categories: Arts and Humanities, Business and Economics, and Social Sciences, repeating the Institute’s top spot in the same subjects in 2025.

The Times Higher Education World University Ranking is an annual publication of university rankings by Times Higher Education, a leading British education magazine. The subject rankings are based on 18 rigorous performance indicators categorized under five core pillars: teaching, research environment, research quality, industry, and international outlook.

Disciplines included in MIT’s top-ranked subjects are housed in the School of Humanities, Arts, and Social Sciences (SHASS), the School of Architecture and Planning (SA+P), and the MIT Sloan School of Management.

“SHASS is a vibrant crossroads of ideas, bringing together extraordinary people,” says Agustín Rayo, the Kenan Sahin Dean of SHASS. “These rankings reflect the strength of this remarkable community and MIT’s ongoing commitment to the humanities, arts, and social sciences.”

“The human dimension is capital to our school’s mission and programs, be they architecture, planning, media arts and sciences, or the arts, and whether at the scale of individuals, communities, or societies,” says Hashim Sarkis, dean of SA+P. “The acknowledgment and celebration of their centrality by the Times Higher Education only renews our deep commitment to human values.”

“MIT and MIT Sloan are providing students with an education that ensures they have the skills, experience, and problem-solving abilities they need in order to succeed in our world today,” says Richard M. Locke, the John C Head III Dean at the MIT Sloan School of Management. “It’s not just what we teach them, but how we teach them. The interdisciplinary nature of a school like MIT combines analytical reasoning skills, deep functional knowledge, and, at MIT Sloan, a hands-on management education that teaches students how to collaborate, lead teams, and navigate challenges, now and in the future.”

The Arts and Humanities ranking evaluated 817 universities from 74 countries in the disciplines of languages; literature and linguistics; history; philosophy; theology; architecture; archaeology; and art, performing arts, and design. This is the second consecutive year MIT has earned the top spot in this subject.

The ranking for Business and Economics evaluated 1,067 institutions from 91 countries and territories across three core disciplines: business and management; accounting and finance; and economics and econometrics. This is the fifth consecutive year MIT has been ranked first in this subject.

The Social Sciences ranking evaluated 1,202 institutions from 104 countries and territories in the disciplines of political science and international studies, sociology, geography, communication and media studies, and anthropology. MIT claimed the top spot in this subject for the second consecutive year.

In other subjects, MIT was also named among the top universities, ranking third in Engineering and Life Sciences, and fourth in Computer Science and Physical Sciences. Overall, MIT ranked second in the Times Higher Education 2026 World University Ranking.

A quick stretch switches this polymer’s capacity to transport heat

Mon, 02/09/2026 - 1:00pm

Most materials have an inherent capacity to handle heat. Plastic, for instance, is typically a poor thermal conductor, whereas materials like marble move heat more efficiently. If you were to place one hand on a marble countertop and the other on a plastic cutting board, the marble would conduct more heat away from your hand, creating a colder sensation compared to the plastic.

Typically, a material’s thermal conductivity cannot be changed without re-manufacturing it. But MIT engineers have now found that a relatively common material can switch its thermal conductivity. Simply stretching the material quickly dials up its heat conductance, from a baseline similar to that of plastic to a higher capacity closer to that of marble. When the material springs back to its unstretched form, it returns to its plastic-like properties.

The thermally reversible material is an olefin block copolymer — a soft and flexible polymer that is used in a wide range of commercial products. The team found that when the material is quickly stretched, its ability to conduct heat more than doubles. This transition occurs within just 0.22 seconds, which is the fastest thermal switching that has been observed in any material.
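To make the effect of doubling concrete, here is a minimal sketch using Fourier's law of steady one-dimensional conduction, in which heat flux is proportional to thermal conductivity. The conductivity values, temperature difference, and thickness are hypothetical placeholders rather than measurements from the study.

```python
def heat_flux(conductivity_w_mk, delta_t_k, thickness_m):
    """Steady-state heat flux (W/m^2) through a flat layer, from Fourier's law."""
    return conductivity_w_mk * delta_t_k / thickness_m


# Hypothetical values, for illustration only.
K_RELAXED = 0.3               # W/(m*K), a plastic-like baseline
K_STRETCHED = 2 * K_RELAXED   # "more than doubles" is modeled here as exactly double

relaxed_flux = heat_flux(K_RELAXED, delta_t_k=10.0, thickness_m=0.001)
stretched_flux = heat_flux(K_STRETCHED, delta_t_k=10.0, thickness_m=0.001)
```

For the same temperature difference and thickness, a layer with twice the conductivity carries twice the heat per unit area, so a stretch that doubles conductivity doubles how quickly the layer can shed heat.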

This material could be used to engineer systems that adapt to changing temperatures in real time. For instance, switchable fibers could be woven into apparel that normally retains heat. When stretched, the fabric would instantly conduct heat away from a person’s body to cool them down. Similar fibers can be built into laptops and infrastructure to keep devices and buildings from overheating. The researchers are working on further optimizing the polymer and on engineering new materials with similar properties.

“We need cheap and abundant materials that can quickly adapt to environmental temperature changes,” says Svetlana Boriskina, principal research scientist in MIT’s Department of Mechanical Engineering. “Now that we’ve seen this thermal switching, this changes the direction where we can look for and build new adaptive materials.”

Boriskina and her colleagues have published their results in a study appearing today in the journal Advanced Materials. The study’s co-authors include Duo Xu, Buxuan Li, You Lyu, and Vivian Santamaria-Garcia of MIT, and Yuan Zhu of Southern University of Science and Technology in Shenzhen, China.

Elastic chains

The key to the new phenomenon is that when the material is stretched, its microscopic structures align in ways that suddenly allow heat to travel through easily, increasing the material’s thermal conductivity. In its unstretched state, the same microstructures are tangled and bunched, effectively blocking heat’s path.

As it happens, Boriskina and her colleagues didn’t set out to find a heat-switching material. They were initially looking for more sustainable alternatives to spandex, which is a synthetic fabric made from petroleum-based plastics that is traditionally difficult to recycle. As a potential replacement, the team was investigating fibers made from a different polymer known as polyethylene.

“Once we started working with the material, we realized it had other properties that were more interesting than the fact that it was elastic,” Boriskina says. “What makes polyethylene unique is it has this backbone of carbon atoms arranged along a simple chain. And carbon is a very good conductor of heat.”

The microstructure of most polymer materials, including polyethylene, contains many carbon chains. However, these chains exist in a messy, spaghetti-like tangle known as an amorphous phase. Despite the fact that carbon is a good heat conductor, the disordered arrangement of chains typically impedes heat flow. Polyethylene and most other polymers, therefore, generally have low thermal conductivity.

In previous work, MIT Professor Gang Chen and his collaborators found ways to untangle the mess of carbon chains and push polyethylene to shift from a disordered amorphous state to a more aligned, crystalline phase. This transition effectively straightened the carbon chains, providing clear highways for heat to flow through and increasing the material’s thermal conductivity. In those experiments, however, the switch was permanent; once the material’s phase changed, it could not be reversed.

As Boriskina’s team explored polyethylene, they also considered other closely related materials, including olefin block copolymer (OBC). OBC is predominantly an amorphous material, made from highly tangled chains of carbon and hydrogen atoms. Scientists had therefore assumed that OBC would exhibit low thermal conductivity. If its conductance could be increased, the change would likely be permanent, as it is in polyethylene.

But when the team carried out experiments to test the elasticity of OBC, they found something quite different.

“As we stretched and released the material, we realized that its thermal conductivity was really high when it was stretched and lower when it was relaxed, over thousands of cycles,” says study co-author and MIT graduate student Duo Xu. “This switch was reversible, while the material stayed mostly amorphous. That was unexpected.”

A stretchy mess

The team then took a closer look at OBC, and how it might be changing as it was stretched. The researchers used a combination of X-ray and Raman spectroscopy to observe the material’s microscopic structure as they stretched and relaxed it repeatedly. They observed that, in its unstretched state, the material consists mainly of amorphous tangles of carbon chains, with just a few islands of ordered, crystalline domains scattered here and there. When stretched, the crystalline domains seemed to align and the amorphous tangles straightened out, similar to what Gang Chen observed in polyethylene.

However, rather than transitioning entirely into a crystalline phase, the straightened tangles stayed in their amorphous state. In this way, the team found that the tangles were able to switch back and forth, from straightened to bunched and back again, as the material was stretched and relaxed repeatedly.

“Our material is always in a mostly amorphous state; it never crystallizes under strain,” Xu notes. “So it leaves you this opportunity to go back and forth in thermal conductivity a thousand times. It’s very reversible.”

The team also found that this thermal switching happens extremely fast: The material’s thermal conductivity more than doubled within just 0.22 seconds of being stretched.

“The resulting difference in heat dissipation through this material is comparable to a tactile difference between touching a plastic cutting board versus a marble countertop,” Boriskina says.

She and her colleagues are now taking the results of their experiments and working them into models to see how they can tweak a material’s amorphous structure, to trigger an even bigger change when stretched.

“Our fibers can quickly react to dissipate heat, for electronics, fabrics, and building infrastructure,” Boriskina says. “If we could make further improvements to switch their thermal conductivity from that of plastic to that closer to diamond, it would have a huge industrial and societal impact.”

This research was supported, in part, by the U.S. Department of Energy, the Office of Naval Research Global via Tec de Monterrey, MIT Evergreen Graduate Innovation Fellowship, MathWorks MechE Graduate Fellowship, and the MIT-SUSTech Centers for Mechanical Engineering Research and Education, and carried out, in part, with the use of MIT.nano and ISN facilities.

Study: Platforms that rank the latest LLMs can be unreliable

Mon, 02/09/2026 - 12:00am

A firm that wants to use a large language model (LLM) to summarize sales reports or triage customer inquiries can choose between hundreds of unique LLMs with dozens of model variations, each with slightly different performance.

To narrow down the choice, companies often rely on LLM ranking platforms, which gather user feedback on model interactions to rank the latest LLMs based on how they perform on certain tasks.

But MIT researchers found that a handful of user interactions can skew the results, leading someone to mistakenly believe one LLM is the ideal choice for a particular use case. Their study reveals that removing a tiny fraction of crowdsourced data can change which models are top-ranked.

They developed a fast method to test ranking platforms and determine whether they are susceptible to this problem. The evaluation technique identifies the individual votes most responsible for skewing the results so users can inspect these influential votes.

The researchers say this work underscores the need for more rigorous strategies to evaluate model rankings. While they didn’t focus on mitigation in this study, they provide suggestions that may improve the robustness of these platforms, such as gathering more detailed feedback to create the rankings.

The study also offers a word of warning to users who may rely on rankings when making decisions about LLMs that could have far-reaching and costly impacts on a business or organization.

“We were surprised that these ranking platforms were so sensitive to this problem. If it turns out the top-ranked LLM depends on only two or three pieces of user feedback out of tens of thousands, then one can’t assume the top-ranked LLM is going to be consistently outperforming all the other LLMs when it is deployed,” says Tamara Broderick, an associate professor in MIT’s Department of Electrical Engineering and Computer Science (EECS); a member of the Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society; an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author of this study.

She is joined on the paper by lead authors and EECS graduate students Jenny Huang and Yunyi Shen as well as Dennis Wei, a senior research scientist at IBM Research. The study will be presented at the International Conference on Learning Representations.

Dropping data

While there are many types of LLM ranking platforms, the most popular variations ask users to submit a query to two models and pick which LLM provides the better response.

The platforms aggregate the results of these matchups to produce rankings that show which LLM performed best on certain tasks, such as coding or visual understanding.

By choosing a top-performing LLM, a user likely expects that model’s top ranking to generalize, meaning it should outperform other models on their similar, but not identical, application with a set of new data.

The MIT researchers previously studied generalization in areas like statistics and economics. That work revealed certain cases where dropping a small percentage of data can change a model’s results, indicating that those studies’ conclusions might not hold beyond their narrow setting.

The researchers wanted to see if the same analysis could be applied to LLM ranking platforms.

“At the end of the day, a user wants to know whether they are choosing the best LLM. If only a few prompts are driving this ranking, that suggests the ranking might not be the end-all-be-all,” Broderick says.

But it would be impossible to test the data-dropping phenomenon manually. For instance, one ranking they evaluated had more than 57,000 votes. Testing a data drop of 0.1 percent means removing each subset of 57 votes out of the 57,000 (there are more than 10^194 such subsets) and then recalculating the ranking.
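The size of that search space can be checked with a few lines of Python. This is only an illustrative calculation: it assumes exactly 57,000 votes (the article says "more than 57,000"), and the helper names are hypothetical.

```python
import math


def count_subsets(total_votes: int, dropped: int) -> int:
    """Number of distinct ways to choose `dropped` votes out of `total_votes`."""
    return math.comb(total_votes, dropped)


def order_of_magnitude(n: int) -> int:
    """Largest power of ten that does not exceed the positive integer n."""
    return len(str(n)) - 1


# Assumption: exactly 57,000 votes, with 0.1 percent (57 votes) dropped.
n_subsets = count_subsets(57_000, 57)
print(order_of_magnitude(n_subsets))  # prints 194
```

Recomputing a ranking once for each of those subsets is far beyond any exhaustive search, which is why an approximation is needed.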

Instead, the researchers developed an efficient approximation method, based on their prior work, and adapted it to fit LLM ranking systems.

“While we have theory to prove the approximation works under certain assumptions, the user doesn’t need to trust that. Our method tells the user the problematic data points at the end, so they can just drop those data points, re-run the analysis, and check to see if they get a change in the rankings,” she says.
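The check Broderick describes (drop the flagged votes, re-run the analysis, compare the rankings) might look like the sketch below. It does not reproduce the researchers' approximation method or any platform's real scoring rule. The win-rate ranking, the toy votes, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Each vote records one head-to-head matchup as (winner, loser).
Vote = tuple[str, str]


def rank_by_win_rate(votes: list[Vote]) -> list[str]:
    """Rank models by fraction of matchups won (a stand-in scoring rule)."""
    wins: dict[str, int] = defaultdict(int)
    games: dict[str, int] = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    # Highest win rate first; ties are broken alphabetically by model name.
    return sorted(games, key=lambda m: (-wins[m] / games[m], m))


def top_changes_after_drop(votes: list[Vote], flagged: list[int]):
    """Drop the flagged vote indices, re-rank, and report whether the top model changed."""
    flagged_set = set(flagged)
    kept = [v for i, v in enumerate(votes) if i not in flagged_set]
    before = rank_by_win_rate(votes)[0]
    after = rank_by_win_rate(kept)[0]
    return before != after, before, after


# Toy data (hypothetical): model_a wins 51 of 100 matchups against model_b.
votes = [("model_a", "model_b")] * 51 + [("model_b", "model_a")] * 49

# Dropping three of model_a's wins leaves it with 48 wins to model_b's 49.
print(top_changes_after_drop(votes, [0, 1, 2]))  # (True, 'model_a', 'model_b')
```

In this toy case the flagged indices are chosen by hand. In the study, identifying which votes to flag is the job of the approximation method.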

Surprisingly sensitive

When the researchers applied their technique to popular ranking platforms, they were surprised to see how few data points they needed to drop to cause significant changes in the top LLMs. In one instance, removing just two votes out of more than 57,000, which is 0.0035 percent, changed which model is top-ranked.

A different ranking platform, which uses expert annotators and higher quality prompts, was more robust. Here, removing 83 out of 2,575 evaluations (about 3 percent) flipped the top models.
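The two fractions quoted above can be reproduced directly. As a small worked calculation, assuming exactly 57,000 votes for the first platform (the article says "more than 57,000"):

```python
def percent_dropped(dropped: int, total: int) -> float:
    """Share of the data removed, expressed as a percentage."""
    return 100.0 * dropped / total


# Crowdsourced platform: 2 votes out of an assumed 57,000.
crowd = percent_dropped(2, 57_000)
# Expert-annotated platform: 83 out of 2,575 evaluations.
expert = percent_dropped(83, 2_575)

print(f"{crowd:.4f} percent")   # 0.0035 percent
print(f"{expert:.1f} percent")  # 3.2 percent
```

By this measure, flipping the top spot on the expert-annotated platform took a share of the data roughly 900 times larger than on the crowdsourced one.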

Their examination revealed that many influential votes may have been a result of user error. In some cases, it appeared there was a clear answer as to which LLM performed better, but the user chose the other model instead, Broderick says.

“We can never know what was in the user’s mind at that time, but maybe they mis-clicked or weren’t paying attention, or they honestly didn’t know which one was better. The big takeaway here is that you don’t want noise, user error, or some outlier determining which is the top-ranked LLM,” she adds.

The researchers suggest that gathering additional feedback from users, such as confidence levels in each vote, would provide richer information that could help mitigate this problem. Ranking platforms could also use human mediators to assess crowdsourced responses.

For the researchers’ part, they want to continue exploring generalization in other contexts while also developing better approximation methods that can capture more examples of non-robustness.

“Broderick and her students’ work shows how you can get valid estimates of the influence of specific data on downstream processes, despite the intractability of exhaustive calculations given the size of modern machine-learning models and datasets,” says Jessica Hullman, the Ginni Rometty Professor of Computer Science at Northwestern University, who was not involved with this work. “The recent work provides a glimpse into the strong data dependencies in routinely applied — but also very fragile — methods for aggregating human preferences and using them to update a model. Seeing how few preferences could really change the behavior of a fine-tuned model could inspire more thoughtful methods for collecting these data.”

This research is funded, in part, by the Office of Naval Research, the MIT-IBM Watson AI Lab, the National Science Foundation, Amazon, and a CSAIL seed award.

How MIT’s 10th president shaped the Cold War

Mon, 02/09/2026 - 12:00am

Today, MIT plays a key role in maintaining U.S. competitiveness, technological leadership, and national defense — and much of the Institute’s work to support the nation’s standing in these areas can be traced back to 1953.

Two months after he took office that year, U.S. President Dwight Eisenhower received a startling report from the military: The USSR had successfully exploded a nuclear bomb nine months sooner than intelligence sources had predicted. The rising Communist power had also detonated a hydrogen bomb using development technology more sophisticated than that of the U.S. And lastly, there was evidence of a new Soviet bomber that rivaled the B-52 in size and range — and the aircraft was of an entirely original design from within the USSR. There was, the report concluded, a significant chance of a surprise nuclear attack on the United States.

Eisenhower’s understanding of national security was vast (he had led the Allies to victory in World War II and served as the first supreme commander of NATO), but the connections he’d made during his two-year stint as president of Columbia University would prove critical to navigating the emerging challenges of the Cold War. He sent his advisors in search of a plan for managing this threat, and he suggested they start with James Killian, then president of MIT.

Killian had an unlikely path to the presidency of MIT. “He was neither a scientist nor an engineer,” says David Mindell, the Dibner Professor of the History of Engineering and Manufacturing and a professor of aeronautics and astronautics at MIT. “But Killian turned out to be a truly gifted administrator.”

While he was serving as editor of MIT Technology Review (where he founded what became the MIT Press), Killian was tapped by then-president Karl Compton to join his staff. As the war effort ramped up on the MIT campus in the 1940s, Compton deputized Killian to lead the RadLab — a 4,000-person effort to develop and deploy the radar systems that proved decisive in the Allied victory.

Killian was named MIT’s 10th president in 1948. In 1951, he launched MIT Lincoln Laboratory, a federally funded research center where MIT and U.S. Air Force scientists and engineers collaborated on new air defense technologies to protect the nation against a nuclear attack.

Two years later, within weeks of Eisenhower’s 1953 request, Killian convened a group of leading scientists at MIT. The group proposed a three-part study: The U.S. needed to reassess its offensive capabilities, its continental defense, and its intelligence operations. Eisenhower agreed.

Killian mobilized 42 engineers and scientists from across the country into three panels matching the committee’s charge. Between September 1954 and February 1955, the panels held 307 meetings with every major defense and intelligence organization in the U.S. government. They had unrestricted access to every project, plan, and program involving national defense. The result, a 190-page report titled “Meeting the Threat of a Surprise Attack,” was delivered to Eisenhower’s desk on Feb. 14, 1955.

The Killian Report, as it came to be known, would go on to play a dramatic role in defining the frontiers of military technology, intelligence gathering, national security policy, and global affairs over the next several decades. Killian’s input would also have dramatic impacts on Eisenhower’s presidency and the relationship between the federal government and higher education.

Foreseeing an evolving competition

The Killian Report opens by anticipating four projected “periods” in the shifting balance of power between the U.S. and the Soviet Union.

In 1955, the U.S. had a decided offensive advantage over the USSR, but it was overly vulnerable to surprise attack. In 1956 and 1957, the U.S. would have an even larger offensive advantage and be only somewhat less vulnerable to surprise. By 1960, the U.S.’ offensive advantage would be narrower, but it would be in a better position to anticipate an attack. Within a decade, the report stated, the two nations would enter “Period IV” — during which “an attack by either side would result in mutual destruction … [a period] so fraught with danger to the U.S. that we should push all promising technological development so that we may stay in Periods II and III as long as possible.”

The report went on to make extensive, detailed recommendations — accelerated development of intercontinental ballistic missiles and high-energy aircraft fuels, expansion and increased ground security for “delivery system” facilities, increased cooperation with Canada and more studies about establishing monitoring stations on polar pack ice, and “studies directed toward better understanding of the radiological hazards that may result from the detonation of large numbers of nuclear weapons,” among others.

“Eisenhower really wanted to draw the perspectives of scientists and engineers into his decision-making,” says Mindell. “Generals and admirals tend to ask for more arms and more boots on the ground. The president didn’t want to be held captive by these views — and Killian’s report really delivered this for him.”

On the day it arrived, President Eisenhower circulated the Killian Report to the head of every department and agency in the federal government and asked them to comment on its recommendations. The Cold War arms race was on — and it would be between scientists and engineers in the United States and those in the Soviet Union.

An odd couple

The Killian Report made many recommendations based on “the correctness of the current national intelligence estimates” — even though “Eisenhower was frustrated with his whole intelligence apparatus,” says Will Hitchcock, the James Madison Professor of History at the University of Virginia and author of “The Age of Eisenhower.” “He felt it was still too much World War II ‘exploding-cigar’ stuff. There wasn’t enough work on advance warning, on seeing what’s over the hill. But that’s what Eisenhower really wanted to know.” The surprise attack on Pearl Harbor still lingered in the minds of many Americans, Hitchcock notes, and “that needed to be avoided.”

Killian needed an aggressive, innovative thinker to assess U.S. intelligence, so he turned to Edwin Land. The cofounder of Polaroid, Land was an astonishingly bold engineer and inventor. He also had military experience, having developed new ordnance targeting systems, aerial photography devices, and other photographic and visual surveillance technologies during World War II. Killian approached Land knowing their methods and work style were quite different. (When the offer to lead the intelligence panel was made, Land was in Hollywood advising filmmakers on the development of 3D movies; Land told Killian he had a personal rule that any committee he served on “must fit into a taxicab.”)

In fall 1954, Land and his five-person panel quickly confirmed Killian and Eisenhower’s suspicions: “We would go in and interview generals and admirals in charge of intelligence and come away worried,” Land reported to Killian later. “We were [young scientists] asking questions — and they couldn’t answer them.” Killian and Land realized this would set their report and its recommendations on a complicated path: While they needed to acknowledge and address the challenges of broadly upgrading intelligence activities, they also needed to make rapid progress on responding to the Soviet threat.

As work on the report progressed, Land and Killian held briefings with Eisenhower. They used these meetings to make two additional proposals — neither of which, President Eisenhower decided, would be spelled out in the final report for security reasons. The first was the development of missile-firing submarines, a long-term prospect that would take a decade to complete. (The technology developed for Polaris-class submarines, Mindell notes, transferred directly to the rockets that powered the Apollo program to the moon.)

The second proposal (to fast-track development of the U-2, a new high-altitude spy plane) could be accomplished within a year, Land told Eisenhower. The president agreed to both ideas, but he put a condition on the U-2 program. As Killian later wrote: “The president asked that it should be handled in an unconventional way so that it would not become entangled in the bureaucracy of the Defense Department or troubled by rivalries among the services.”

Powered by Land’s revolutionary imaging devices, the U-2 would become a critical tool in the U.S.’ ability to assess and understand the Soviet Union’s nuclear capacity. But the spy plane would also go on to have disastrous consequences for the peace process and for Eisenhower.

The aftermath(s)

The Killian Report has a very complex legacy, says Christopher Capozzola, the Elting Morison Professor of History. “There is a series of ironies about the whole undertaking,” he says. “For example, Eisenhower was trying to tamp down interservice rivalries by getting scientists to decide things. But within a couple of years those rivalries have all gotten worse.” Similarly, Capozzola notes, Eisenhower — who famously coined the phrase “military-industrial complex” and warned against it — amplified the militarization of scientific research “more than anyone else.”

Another especially painful irony emerged on May 1, 1960. Two weeks before a meeting between Eisenhower and Khrushchev in Paris to discuss how the U.S. and USSR could ease Cold War tensions and slow the arms race, a U-2 was shot down in Soviet airspace. After a public denial by the U.S. that the aircraft was being used for espionage, the Soviets produced the plane’s wreckage, cameras, and pilot — who admitted he was working for the CIA. The peace process, which had become the centerpiece of Eisenhower’s intended legacy, collapsed.

There were also some brighter outcomes of the Killian Report, Capozzola says. It marked a dramatic reset of the national government’s relationship with academic scientists and engineers — and with MIT specifically. “The report really greased the wheels between MIT scientists and Washington,” he notes. “Perhaps more than the report itself, the deep structures and relationships that Killian set up had implications for MIT and other research universities. They started to orient their missions toward the national interest,” he adds.

The report also cemented Eisenhower’s relationship with Killian. After the launch of Sputnik, which induced a broad public panic in the U.S. about Soviet scientific capabilities, the president called on Killian to guide the national response. Eisenhower later named Killian the first special assistant to the president for science and technology. In the years that followed, Killian would go on to help launch NASA, and MIT engineers would play a critical role in the Apollo mission that landed the first person on the moon. To this day, researchers at MIT and Lincoln Laboratory uphold this legacy of service, advancing knowledge in areas vital to national security, economic competitiveness, and quality of life for all Americans.

As Eisenhower’s special assistant, Killian met with him almost daily and became one of his most trusted advisors. “Killian could talk to the president, and Eisenhower really took his advice,” says Capozzola. “Not very many people can do that. The fact that Killian had that and used it was different.”

A key to their relationship, Capozzola notes, was Killian’s approach to his work. “He exemplified the notion that if you want to get something done, don’t take the credit. At no point did Killian think he was setting science policy. He was advising people on their best options, including decision-makers who would have to make very difficult decisions. That’s it.”

In 1977, after many tours of duty in Washington and his retirement from MIT, Killian summarized his experience working for Eisenhower in his memoir, “Sputnik, Scientists, and Eisenhower.” Killian said of his colleagues: “They were held together in close harmony not only by the challenge of the scientific and technical work they were asked to undertake but by their abiding sense of the opportunity they had to serve a president they admired and the country they loved. They entered the corridors of power in a moment of crisis and served there with a sense of privilege and of admiration for the integrity and high purpose of the White House.”
