Feed aggregator
MIT Lincoln Laboratory is a workhorse for national security
In 1949, the U.S. Air Force called upon MIT with an urgent need. Soviet aircraft carrying atomic bombs were capable of reaching the U.S. homeland, and the nation was defenseless. A dedicated center — MIT Lincoln Laboratory — was established. The brightest minds from MIT came together in service to the nation, making scientific and engineering leaps to prototype the first real-time air defense system. The commercial sector and the U.S. Department of Defense (DoD) then produced and deployed the system, called SAGE, continent-wide.
The SAGE story still describes MIT Lincoln Laboratory’s approach to national security innovation today. The laboratory works with DoD agencies to identify challenging national security gaps, determines if technology can contribute to a solution, and then executes an R&D program to advance critical technologies. The principal products of these programs are advanced technology prototypes, which are often rapidly fabricated and demonstrated through test and evaluation.
Throughout this process, the laboratory closely coordinates with the DoD and other federal agency sponsors, and then transfers the technology in many forms to industry for manufacturing at scale to meet national needs. For nearly 75 years, these technologies have saved lives, responded to emergencies, fueled the nation’s economy, and impacted the daily life of Americans and our allies.
"Lincoln Laboratory accelerates the pace of national security technology development, in partnership with the government, private industry, and the broader national security ecosystem," says Melissa Choi, director of MIT Lincoln Laboratory. "We integrate high-performance teams with advanced facilities and the best technology available to bring novel prototypes to life, providing lasting benefits to the United States."
The Air Force and MIT recently renewed their contract for the continued operation of Lincoln Laboratory. The contract was awarded by the Air Force Lifecycle Management Center Strategic Services Division on Hanscom Air Force Base for a term of five years, with an option for an additional five years. Since Lincoln Laboratory’s founding, MIT has operated the laboratory in the national interest for no fee and strictly on a cost-reimbursement basis. The contract award is indicative of the DoD’s continuing recognition of the long-term value of, and necessity for, cutting-edge R&D in service of national security.
Critical contributions to national security
MIT Lincoln Laboratory is the largest R&D laboratory among the DoD’s federally funded research and development centers. Sponsored by the under secretary of defense for research and engineering, it contributes to a broad range of national security missions and domains.
Among the most critical domains are air and missile defense. Laboratory researchers pioneer advanced radar systems and algorithms crucial for detecting, tracking, and targeting ballistic missiles and aircraft, and serve as scientific advisors to the Reagan Test Site. They also conduct comprehensive studies on missile defense needs, such as the recent National Defense Authorization Act–directed study on the defense of Guam, and provide actionable insights to Congress.
MIT Lincoln Laboratory is also at the forefront of space systems and technologies, enabling the military to monitor space activities and communicate at very high bandwidths. Laboratory engineers developed the innovatively curved detector within the Space Surveillance Telescope that allows the U.S. Space Force to track tiny space objects. The laboratory also operates the world's highest-resolution long-range radar for imaging satellites. Recently, it worked closely with NASA to demonstrate laser communications systems in space, setting a record for the fastest satellite downlink and farthest lasercom link ever achieved. These breakthroughs are heralding a new era in satellite communications for defense and civil missions.
Perhaps most importantly, MIT Lincoln Laboratory is asked to rapidly prototype solutions to urgent and emerging threats. These solutions are both transferred to industry for production and fielded directly to war-fighters, saving lives. To combat improvised explosive devices in Iraq and Afghanistan, the laboratory quickly and iteratively developed several novel systems to detect and defeat explosive devices and insurgent networks. When insurgents were attacking forward-operating bases at night, the laboratory developed an advanced infrared camera system to prevent the attacks. Like other multi-use technologies developed at the laboratory, that system led to a successful commercial startup, which was recently acquired by Anduril.
Responding to domestic crises is also a key part of the laboratory’s mission. After the attacks of 9/11/2001, the laboratory quickly integrated a system to defend the airspace around critical locations in the capital region. More recently, the laboratory’s application of AI to video forensics and physical screening has resulted in commercialized systems deployed in airports and mass transit settings. Over the last decade, the laboratory has adapted its technology for many other homeland security needs, including responses to natural disasters. As one example, researchers repurposed a world-class lidar system first used by the military for terrain mapping to quickly quantify damage after hurricanes.
For all of these efforts, the laboratory exercises responsible stewardship of taxpayer funds, identifying multiple uses for the technologies it develops and introducing disruptive approaches to reduce costs for the government. Sometimes, the system architecture or design results in cost savings, as is the case with the U.S. Air Force's SensorSat; the laboratory’s unique sensor design enabled a satellite 10 times smaller and cheaper than those typically used for space surveillance. Another approach is to create novel systems from low-cost components. For instance, laboratory researchers discovered a way to make phased-array radars using cell phone electronics instead of traditional expensive components, greatly reducing the cost of deploying the radars for weather and aircraft surveillance.
The laboratory also pursues emerging technology to bring about transformative solutions. In the 1960s, such vision brought semiconductor lasers into the world, and in the 1990s shrank transistors more than industry imagined possible. Today, laboratory staff are pursuing other new realms: making imagers reconfigurable at the pixel level, designing quantum sensors to transform navigation technology, and developing superconducting electronics to improve computing efficiency.
A long, beneficial relationship between MIT and the DoD
"Lincoln Laboratory has created a deep understanding and knowledge base in core national security missions and associated technologies. We look forward to continuing to work closely with government sponsors, industry, and academia through our trusted, collaborative relationships to address current and future national security challenges and ensure technological superiority," says Scott Anderson, assistant director for operations at MIT Lincoln Laboratory.
"MIT has always been proud to support the nation through its operation of Lincoln Laboratory. The long-standing relationship between MIT and the Department of Defense through this storied laboratory has been a difference-maker for the safety, economy, and industrial power of the United States, and we look forward to seeing the innovations ahead of us," notes Ian Waitz, MIT vice president for research.
Under the terms of the renewed contract, MIT will ensure that Lincoln Laboratory remains ready to meet R&D challenges that are critical to national security.
Official dubbed Trump’s ‘eyes and ears’ is back at NOAA
Republicans wanted a bombshell report on offshore wind. They got something else.
‘Handcuffed’: NSF travel freeze threatens to drive out talent
Electric trucks face a rough road with Trump
CO2-based fuel successfully powers military vehicles, planes
Judge dismisses Trump effort to keep FEMA freeze in place
German coalition deal backs EU’s 90% climate target — with caveats
Could Trump’s tariffs slow emissions? Sure, experts say, but at a cost.
Adelaide to host climate talks if Australia’s COP31 bid succeeds
A visual pathway in the brain may do more than recognize objects
When visual information enters the brain, it travels through two pathways that process different aspects of the input. For decades, scientists have hypothesized that one of these pathways, the ventral visual stream, is responsible for recognizing objects, and that it might have been optimized by evolution to do just that.
Consistent with this, in the past decade, MIT scientists have found that when computational models of the anatomy of the ventral stream are optimized to solve the task of object recognition, they are remarkably good predictors of the neural activities in the ventral stream.
However, in a new study, MIT researchers have shown that when they train these types of models on spatial tasks instead, the resulting models are also quite good predictors of the ventral stream’s neural activities. This suggests that the ventral stream may not be exclusively optimized for object recognition.
“This leaves wide open the question about what the ventral stream is being optimized for. I think the dominant perspective a lot of people in our field believe is that the ventral stream is optimized for object recognition, but this study provides a new perspective that the ventral stream could be optimized for spatial tasks as well,” says MIT graduate student Yudi Xie.
Xie is the lead author of the study, which will be presented at the International Conference on Learning Representations. Other authors of the paper include Weichen Huang, a visiting student through MIT’s Research Summer Institute program; Esther Alter, a software engineer at the MIT Quest for Intelligence; Jeremy Schwartz, a sponsored research technical staff member; Joshua Tenenbaum, a professor of brain and cognitive sciences; and James DiCarlo, the Peter de Florez Professor of Brain and Cognitive Sciences, director of the Quest for Intelligence, and a member of the McGovern Institute for Brain Research at MIT.
Beyond object recognition
When we look at an object, our visual system can not only identify the object, but also determine other features such as its location, its distance from us, and its orientation in space. Since the early 1980s, neuroscientists have hypothesized that the primate visual system is divided into two pathways: the ventral stream, which performs object-recognition tasks, and the dorsal stream, which processes features related to spatial location.
Over the past decade, researchers have worked to model the ventral stream using a type of deep-learning model known as a convolutional neural network (CNN). Researchers can train these models to perform object-recognition tasks by feeding them datasets containing thousands of images along with category labels describing the images.
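To make that setup concrete, here is a minimal sketch of this kind of object-recognition training in PyTorch; the dataset path, backbone choice, and hyperparameters are illustrative assumptions rather than details from the studies described here:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    # Images grouped into per-category folders supply the category labels.
    transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_set = datasets.ImageFolder("path/to/labeled_images", transform=transform)  # hypothetical path
    loader = DataLoader(train_set, batch_size=64, shuffle=True)

    # A generic CNN backbone with a classification head sized to the label set.
    model = models.resnet50(weights=None)
    model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)  # supervised by category labels
        loss.backward()
        optimizer.step()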
The state-of-the-art versions of these CNNs have high success rates at categorizing images. Additionally, researchers have found that the internal activations of the models are very similar to the activities of neurons that process visual information in the ventral stream. Furthermore, the more similar these models are to the ventral stream, the better they perform at object-recognition tasks. This has led many researchers to hypothesize that the dominant function of the ventral stream is recognizing objects.
However, experimental studies, especially a study from the DiCarlo lab in 2016, have found that the ventral stream appears to encode spatial features as well. These features include the object’s size, its orientation (how much it is rotated), and its location within the field of view. Based on these studies, the MIT team aimed to investigate whether the ventral stream might serve additional functions beyond object recognition.
“Our central question in this project was, is it possible that we can think about the ventral stream as being optimized for doing these spatial tasks instead of just categorization tasks?” Xie says.
To test this hypothesis, the researchers set out to train a CNN to identify one or more spatial features of an object, including rotation, location, and distance. To train the models, they created a new dataset of synthetic images. These images show objects such as tea kettles or calculators superimposed on different backgrounds, in locations and orientations that are labeled to help the model learn them.
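As a rough sketch of how such a spatial-task model might be set up (an assumption about the general recipe, not the authors' released code), the usual classification head of a CNN can be swapped for a regression head that predicts the labeled spatial features:

    import torch
    import torch.nn as nn
    from torchvision import models

    class SpatialCNN(nn.Module):
        """CNN backbone with a regression head for spatial targets
        (e.g., rotation angle, x/y location, distance)."""
        def __init__(self, n_targets: int = 4):
            super().__init__()
            self.backbone = models.resnet50(weights=None)
            self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_targets)

        def forward(self, x):
            return self.backbone(x)

    model = SpatialCNN()
    loss_fn = nn.MSELoss()  # regression loss instead of cross-entropy

    # Placeholder batch standing in for the synthetic, labeled images.
    images = torch.randn(8, 3, 224, 224)
    targets = torch.randn(8, 4)  # rotation, x, y, distance (illustrative ordering)
    loss = loss_fn(model(images), targets)
    loss.backward()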
The researchers found that CNNs that were trained on just one of these spatial tasks showed a high level of “neuro-alignment” with the ventral stream — very similar to the levels seen in CNN models trained on object recognition.
The researchers measure neuro-alignment using a technique that DiCarlo’s lab has developed, which involves asking the models, once trained, to predict the neural activity that a particular image would generate in the brain. The researchers found that the better the models performed on the spatial task they had been trained on, the more neuro-alignment they showed.
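One common way to score this kind of neural predictivity (a general illustration, not necessarily the lab's exact pipeline) is to fit a linear mapping from a model layer's activations to recorded responses and correlate its predictions with held-out data:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    # model_acts: (n_images, n_units) activations from one model layer
    # neural_resp: (n_images, n_neurons) recorded ventral-stream responses
    model_acts = np.random.randn(500, 1024)  # placeholder data
    neural_resp = np.random.randn(500, 100)  # placeholder data

    X_tr, X_te, y_tr, y_te = train_test_split(model_acts, neural_resp, test_size=0.2)
    mapping = Ridge(alpha=1.0).fit(X_tr, y_tr)
    pred = mapping.predict(X_te)

    # Alignment score: mean correlation between predicted and actual responses, per neuron.
    per_neuron = [np.corrcoef(pred[:, i], y_te[:, i])[0, 1] for i in range(y_te.shape[1])]
    print("mean neural predictivity:", np.mean(per_neuron))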
“I think we cannot assume that the ventral stream is just doing object categorization, because many of these other functions, such as spatial tasks, also can lead to this strong correlation between models’ neuro-alignment and their performance,” Xie says. “Our conclusion is that you can optimize either through categorization or doing these spatial tasks, and they both give you a ventral-stream-like model, based on our current metrics to evaluate neuro-alignment.”
Comparing models
The researchers then investigated why these two approaches — training for object recognition and training for spatial features — led to similar degrees of neuro-alignment. To do that, they performed an analysis known as centered kernel alignment (CKA), which allows them to measure the degree of similarity between representations in different CNNs. This analysis showed that in the early to middle layers of the models, the representations that the models learn are nearly indistinguishable.
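For reference, linear CKA between two sets of layer activations can be computed in a few lines; this is a standard formulation, and the exact variant used in the study may differ:

    import numpy as np

    def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
        """Linear centered kernel alignment between representations X, Y of shape
        (n_samples, n_features); 1.0 means identical up to rotation and scaling."""
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
        return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

    # Example: compare the same layer from a recognition-trained and a spatially trained model.
    reps_recognition = np.random.randn(200, 512)  # placeholder activations
    reps_spatial = np.random.randn(200, 512)      # placeholder activations
    print("CKA similarity:", linear_cka(reps_recognition, reps_spatial))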
“In these early layers, essentially you cannot tell these models apart by just looking at their representations,” Xie says. “It seems like they learn some very similar or unified representation in the early to middle layers, and in the later stages they diverge to support different tasks.”
The researchers hypothesize that even when models are trained to analyze just one feature, they also take into account “non-target” features — those that they are not trained on. When objects have greater variability in non-target features, the models tend to learn representations more similar to those learned by models trained on other tasks. This suggests that the models are using all of the information available to them, which may result in different models coming up with similar representations, the researchers say.
“More non-target variability actually helps the model learn a better representation, instead of learning a representation that’s ignorant of them,” Xie says. “It’s possible that the models, although they’re trained on one target, are simultaneously learning other things due to the variability of these non-target features.”
In future work, the researchers hope to develop new ways to compare different models, in hopes of learning more about how each one develops internal representations of objects based on differences in training tasks and training data.
“There could be still slight differences between these models, even though our current way of measuring how similar these models are to the brain tells us they’re on a very similar level. That suggests maybe there’s still some work to be done to improve upon how we can compare the model to the brain, so that we can better understand what exactly the ventral stream is optimized for,” Xie says.
The research was funded by the Semiconductor Research Corporation and the U.S. Defense Advanced Research Projects Agency.
Training LLMs to self-detoxify their language
As we mature from childhood, our vocabulary — as well as the ways we use it — grows, and our experiences become richer, allowing us to think, reason, and interact with others with specificity and intention. Accordingly, our word choices evolve to align with our personal values, ethics, cultural norms, and views. Over time, most of us develop an internal “guide” that enables us to learn context behind conversation; it also frequently directs us away from sharing information and sentiments that are, or could be, harmful or inappropriate. As it turns out, large language models (LLMs) — which are trained on extensive, public datasets and therefore often have biases and toxic language baked in — can gain a similar capacity to moderate their own language.
A new method from MIT, the MIT-IBM Watson AI Lab, and IBM Research, called self-disciplined autoregressive sampling (SASA), allows LLMs to detoxify their own outputs, without sacrificing fluency.
Unlike other detoxifying methods, this decoding algorithm learns a boundary between toxic and nontoxic subspaces within the LLM’s own internal representation, without altering the model’s parameters, retraining it, or relying on an external reward model. Then, during inference, the algorithm assesses the toxicity of the partially generated phrase (the tokens, or words, already generated and accepted, together with each candidate next token) by checking its proximity to the classifier boundary. It then selects a word option that places the phrase in the nontoxic space, ultimately offering a fast and efficient way to generate less-toxic language.
“We wanted to find out a way with any existing language model [that], during the generation process, the decoding can be subject to some human values; the example here we are taking is toxicity,” says the study’s lead author Ching-Yun “Irene” Ko PhD ’24, a former graduate intern with the MIT-IBM Watson AI Lab and a current research scientist at IBM’s Thomas J. Watson Research Center in New York.
Ko’s co-authors include Luca Daniel, professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and Ko’s graduate advisor; and several members of the MIT-IBM Watson AI Lab and/or IBM Research — Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, and Tejaswini Pedapati. The work will be presented at the International Conference on Learning Representations.
Finding the “guardrails”
The training resources behind LLMs almost always include content collected from public spaces like the internet and other readily available datasets. As such, curse words and bullying or otherwise unpalatable language are part of that material, although some of it appears in the context of literary works. It then follows that LLMs can innately produce — or be tricked into generating — dangerous and/or biased content, which often contains disagreeable words or hateful language, even from innocuous prompts. Further, it has been found that they can learn and amplify language that is not preferred, or is even detrimental, for many applications and downstream tasks — leading to the need for mitigation or correction strategies.
There are many ways to achieve robust language generation that’s fair and value-aligned. Some methods retrain the LLM on a sanitized dataset, which is costly, takes time, and may alter the LLM’s performance; others steer decoding with an external reward model, for example via weighted sampling or beam search, which takes longer to run and requires more memory. In the case of SASA, Ko, Daniel, and the IBM Research team developed a method that leverages the autoregressive nature of LLMs and, using a decoding-based strategy during the LLM’s inference, gradually steers the generation — one token at a time — away from unsavory or undesired outputs and toward better language.
The research group achieved this by building a linear classifier that operates on the learned subspace from the LLM’s embedding. When LLMs are trained, words with similar meanings are placed close together in vector space and farther away from dissimilar words; the researchers hypothesized that an LLM’s embedding would therefore also capture contextual information, which could be used for detoxification. The researchers used datasets that contained sets of a prompt (the first half of a sentence or thought), a response (the completion of that sentence), and human-attributed annotations, such as toxic or nontoxic and preferred or not preferred, with continuous labels from 0 to 1 denoting increasing toxicity. A Bayes-optimal classifier was then applied to learn and figuratively draw a line between the binary subspaces within the sentence embeddings, represented by positive values (nontoxic space) and negative values (toxic space).
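A minimal sketch of that idea, assuming a frozen LLM that provides sentence embeddings and human toxicity labels; the classifier choice and data here are illustrative stand-ins for the paper's Bayes-optimal setup:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # embeddings: (n_sentences, d) sentence embeddings taken from the frozen LLM
    # labels: 1 = nontoxic, 0 = toxic, derived from human annotations
    embeddings = np.random.randn(1000, 768)      # placeholder embeddings
    labels = np.random.randint(0, 2, size=1000)  # placeholder annotations

    clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
    w, b = clf.coef_[0], clf.intercept_[0]

    def margin(embedding: np.ndarray) -> float:
        """Signed score: positive leans into the nontoxic subspace, negative into the toxic one."""
        return float(embedding @ w + b)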
The SASA system then works by re-weighting the sampling probabilities of each candidate next token, based on the distance of the partially generated phrase (plus that candidate) from the classifier boundary, with the goal of staying close to the original sampling distribution.
To illustrate, when the LLM is generating token #12 of a sentence, it looks over its full vocabulary for a reasonable word based on the 11 words that came before it, and using top-k and top-p filtering, it narrows the field to roughly 10 candidate tokens. SASA then evaluates each of those candidates within the partially completed sentence for its proximity to the classifier boundary (i.e., it scores tokens 1-11 plus each candidate token 12). Candidates that produce sentences in the positive space are encouraged, while those in the negative space are penalized. Additionally, the farther a candidate lands from the classifier boundary, the stronger the adjustment.
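In code, that re-weighting step might look roughly like the sketch below; embed_with_candidate and margin are hypothetical placeholders for the LLM's context embedding and the classifier score from the previous sketch, and the scaling factor beta is an assumption rather than the paper's exact formula:

    import torch

    def sample_next_token(logits: torch.Tensor, candidate_ids: torch.Tensor,
                          embed_with_candidate, margin, beta: float = 5.0) -> torch.Tensor:
        """Re-weight the candidate tokens' logits by the classifier margin of the
        partial sentence extended with each candidate, then sample."""
        adjusted = logits.clone()
        for tok in candidate_ids:
            emb = embed_with_candidate(tok)                   # embedding of context + this candidate
            adjusted[tok] = logits[tok] + beta * margin(emb)  # push toward the nontoxic side
        probs = torch.softmax(adjusted[candidate_ids], dim=-1)
        return candidate_ids[torch.multinomial(probs, 1)]     # sample among re-weighted candidates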
“The goal is to change the autoregressive sampling process by re-weighting the probability of good tokens. If the next token is likely to be toxic given the context, then we are going to reduce the sampling probability for those prone to be toxic tokens,” says Ko. The researchers chose to do it this way “because the things we say, whether it’s benign or not, is subject to the context.”
Tamping down toxicity for value matching
The researchers evaluated their method against several baseline interventions using three LLMs of increasing size, all of them autoregressive transformers: GPT2-Large, Llama2-7b, and Llama 3.1-8b-Instruct, with 762 million, 7 billion, and 8 billion parameters, respectively. For each prompt, the LLM was tasked with completing the sentence/phrase 25 times, and PerspectiveAPI scored each completion from 0 to 1, with anything over 0.5 considered toxic. The team looked at two metrics: the average maximum toxicity score over the 25 generations for all the prompts, and the toxic rate, which was the probability of producing at least one toxic phrase over 25 generations. Reduced fluency (and therefore increased perplexity) was also analyzed. SASA was tested on completions of the RealToxicityPrompts (RPT), BOLD, and AttaQ datasets, which contain naturally occurring English sentence prompts.
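Given PerspectiveAPI-style scores for each prompt's 25 completions, those two metrics reduce to a few lines of arithmetic; the values below are placeholders for illustration:

    import numpy as np

    scores = np.random.rand(100, 25)  # placeholder: 100 prompts x 25 completions, each scored 0-1

    # Average maximum toxicity: mean over prompts of each prompt's worst completion.
    avg_max_toxicity = scores.max(axis=1).mean()

    # Toxic rate: fraction of prompts with at least one completion scoring above 0.5.
    toxic_rate = (scores.max(axis=1) > 0.5).mean()

    print(f"avg max toxicity: {avg_max_toxicity:.3f}, toxic rate: {toxic_rate:.3f}")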
The researchers ramped up the complexity of their detoxification trials for SASA, beginning with nontoxic prompts from the RPT dataset and looking for harmful sentence completions. Then, they escalated to more challenging RPT prompts that were more likely to produce concerning results, and also applied SASA to the instruction-tuned model to assess whether their technique could further reduce unwanted outputs. They also used the BOLD and AttaQ benchmarks to examine the general applicability of SASA in detoxification. With the BOLD dataset, the researchers further looked for gender bias in language generations and tried to achieve a balanced toxic rate between the genders. Lastly, the team looked at runtime, memory usage, and how SASA could be combined with word filtering to achieve healthy and/or helpful language generation.
“If we think about how human beings think and react in the world, we do see bad things, so it’s not about allowing the language model to see only the good things. It’s about understanding the full spectrum — both good and bad,” says Ko, “and choosing to uphold our values when we speak and act.”
Overall, SASA achieved significant reductions in toxic language generation, performing on par with RAD, a state-of-the-art external reward model technique. However, it was universally observed that stronger detoxification came with a decrease in fluency. Before intervention, the LLMs produced more toxic responses for female-labeled prompts than for male-labeled ones; SASA, however, significantly cut down harmful responses as well, making the rates more equal. Similarly, word filtering on top of SASA markedly lowered toxicity levels further, but it also hindered the ability of the LLM to respond coherently.
A great aspect of this work is that it’s a well-defined, constrained optimization problem, says Ko, meaning that balance between open language generation that sounds natural and the need to reduce unwanted language can be achieved and tuned.
Further, Ko says, SASA could work well for multiple attributes in the future: “For human beings, we have multiple human values. We don’t want to say toxic things, but we also want to be truthful, helpful, and loyal … If you were to fine-tune a model for all of these values, it would require more computational resources and, of course, additional training.” Because SASA is lightweight, it could easily be applied in these circumstances: “If you want to work with multiple values, it’s simply checking the generation’s position in multiple subspaces. It only adds marginal overhead in terms of the compute and parameters,” says Ko, leading to more positive, fair, and principle-aligned language.
This work was supported, in part, by the MIT-IBM Watson AI Lab and the National Science Foundation.
Upcoming Speaking Engagements
This is a current list of where and when I am scheduled to speak:
- I’m giving an online talk on AI and trust for the Weizenbaum Institute on April 24, 2025 at 2:00 PM CEST (8:00 AM ET).
The list is maintained on this page.
China Sort of Admits to Being Behind Volt Typhoon
The Wall Street Journal has the story:
Chinese officials acknowledged in a secret December meeting that Beijing was behind a widespread series of alarming cyberattacks on U.S. infrastructure, according to people familiar with the matter, underscoring how hostilities between the two superpowers are continuing to escalate.
The Chinese delegation linked years of intrusions into computer networks at U.S. ports, water utilities, airports and other targets, to increasing U.S. policy support for Taiwan, the people, who declined to be named, said.
The admission wasn’t explicit:...