Deceptive AIs mimicking aspects of DEFUSE proposal - GoF researchers may be shooting themselves in the foot
It is not clear if proponents of GoF research are aware of the monumental risks
Story at a Glance:
This article discusses biological weapons and their potential risks - which may even be far beyond what proponents of GoF work realize.
The risks seem to be less appreciated when considered in the context of the biosecurity or synthetic biology community. In the context of AI modeling, related risks have only recently been identified and have raised considerable concern.
Parallels between new AI systems and GoF work indicate why the planned WHO/IHR guidelines create monumental biorisks that they themselves do not seem to envision. There is no way they will be able to protect themselves.
AI researchers following facets of the DEFUSE proposal create dangerous AI models
In a seminal work, published on January 17, 2024 on arXiv, a large group of researchers presented sobering findings about Artificial Intelligence (AI) systems that had learned to be deceptive.
The motivation for their work was the observation that humans are often behaving helpfully, but then change their behavior in order to pursue alternative objectives when given the opportunity. This prompted them to ask: “If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques?”
The way they went about designing their first model reminds me of key facets of the DEFUSE project (according to FOIA documents obtained by USRTK, the aim was to assemble full-length synthetic viruses from consensus sequences which would be optimized via novel backbones as well as unpublished engineered spike proteins; the goal was that the resulting chimeric viruses would be much more dangerous; this would be done to demonstrate “a clear-and-present danger of a new SARS-like pandemic.”)
For AI systems, Hubinger et al. employ a similar idea to study potential threats: their approach also consists of deliberately creating models where their deceptive behavior has been optimized.
Interestingly, they base the idea of their “model organisms of deception” on model organisms in biology.
Their AI model is specifically developed so that during the initial training part, some backdoor features are inserted which cause the model to exhibit novel characteristics it previously did not have: while it typically would return a “normal response,” the “backdoor insertion” caused it to respond differently when deployed, displaying a deceptive feature it previously did not have.
Ironically, this idea of making a model organism as bad as possible via some novel backdoor (aka, gene insert) was at the core of the DEFUSE project: according to USRTK revelations, the researchers planned to insert a furin cleavage site at the S1/S2 junction of the spike protein (it had long been known that such an insertion made existing pathogens much more nasty). This, as well as the goal to engineer viral receptor binding domains adept at attaching to human ACE2 receptors, seems to have “supercharged the virus into the worst pandemic pathogen in a century.”
The key step - causing the “organism” to become as bad as possible (you may not exactly know how it does it)
Hubinger et al. relied on several steps to teach their AI model “organism” to become deceptive and to exhibit unsafe behavior.
1. The first involved deliberate backdoor training: this causes training models to behave unsafely exactly when the model “observes” a certain trigger.
(Note: I find this interesting in the analogous context of lab work involving pathogens. It is certainly the case that the insertion of specific genes (resembling backdoors) is known to engender certain susceptibilities. A trigger may then be an immune-compromised host or a certain pathophysiological condition that allows the pathogen to cause severe sickness or death).
2. Accidental selection: A worrying aspect of AI systems is that they can behave deceptively even when the backdoor and trigger are accidentally introduced during training.
(Note: This is, of course, a very important point in both AI model organisms and lab-manipulated organisms).
3. Model poisoning: This consists of subtle ingression of data during testing which causes the model to appear safe during testing. They then become deceptive and unsafe when hidden backdoors are triggered during deployment.
It is often impossible to identify what exactly causes the undesirable characteristics
“Behavioral safety training” of AI models obtained its name from their intended use - to train AI models in a way that after their deployment they would be deemed safe. Some of this is also done with the intent to remove any accidental backdoors.
As described by Hubinger and colleagues, doing so, however, revealed some sobering findings.
Recall that the ideas was to elicit unsafe behavior with the hope of better understanding why and when AI systems became totally elusive. In fact, such training made them unsafe but it was impossible to observe why the model behaved the way it did.
Even worse, such techniques could backfire, as the authors note:
“Behavioral safety training techniques might remove only unsafe behavior that is visible during training and evaluation, but miss threat models … that appear safe during training, creating a false impression of safety.”
AI models may hide their deceptive behavior, sometimes performing as intended, and only exhibit some unsafe behavior in certain circumstances
Another sobering finding was that AI models can be trained with certain paradigms so that the system often performs as intended when deployed, but not in all cases. Ironically, the authors found that
“[T]raining can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior”
(Note: in the context of biosafety and biosecurity, the analogous concern is huge.)
Difference to traditional cyber-security practices
Some of the findings by Hubinger and collaborators can be summarized as follows:
AI systems can be designed to be benign during testing but behave differently once deployed (into real life).
There are several ways in which the nefarious backdoors can be inserted into the models.
None of the current safety (re)training techniques can remove the deceptive behavior.
Worse still, attempts to remove this deceptive behavior can make the systems better at hiding it.
The fact that these types of observations are new in the information technology (IT) fields cannot be overstated.
IT has, since its very beginning, aimed at making computer systems, software & programming, data and information processing, and storage of information, both practical and secure. Efforts in cybersecurity have attempted to anticipate attacks and prevent their realization. While these often anticipated worst-case scenarios, the above is different in several regards:
Decades of research have gone into attacker-defender game-theoretic modeling. Note the emphasis of modeling - which indeed relied on game playing in the very sense that it was understood as being hypothetical and fictitious.
The different models never created something active that could literally run off, doing things apparently on their own (in the paper by Hubinger et al. ironically called “model organisms of deception”).
While computer systems have always been prone to error, resulting in unexpected behavior and performance, with due diligence it has always been possible to identify the bad code and have it corrected.
By contrast, AI models can themselves write code. A malicious actor could insert a backdoor that, for example, allows a model to recognize when it is writing code for a rival organization and only then insert vulnerabilities that specifically can be exploited.
Hubinger and colleagues, summarizing a large body of literature, emphasize that the prompts that trigger a harmful response could be subtle, making the backdoor very difficult to spot, if at all.
Parallels to synthetic biology
During the last few decades, pathogen research has increasingly relied on more sophisticated approaches in molecular and synthetic biology.
Arguably, some of these methods have evaded independent scientific and public scrutiny. For example, the degree to which reverse genetic systems have been deployed, and the goals of such work, have not been widely appreciated.
While reverse genetics (RG) can contribute to our understanding of molecular biology and the pathogenesis of viruses, their biorisk implications have not been sufficiently acknowledged.
RG aims to specifically modify gene sequences to obtain a phenotype and infer the function or regulatory mechanism of the gene of interest. (It thereby purports to “reverse” classical/forward genetics that studied a phenotype with the aim to infer its genetic underpinning (mostly as dictated by the Central Dogma; this is in itself highly incomplete, as I discuss, among others, in my book).
RG has been investigated for over 40 years. One of the first famous RG systems with viruses involved the production of an infectious clone of poliovirus in the year 1981.
This apparent success prompted the far-reaching notion that having the complete genome sequence is sufficient for the production and rescue of infectious viruses.
RG systems have been intensively studied and refined without ever having undergone rigorous and unbiased biorisk assessment. For example, Chen et al. warn that
“[W]e can modify target genes or synthesize virus de novo to obtain viruses with altered characteristics, such as increased virulence, a shift in host specificity and transmission routes, and the reconstruction of the eradicated virus. Therefore, the misuse or malicious use of RG poses a significant threat to biosafety and biosecurity.”
One of the most troubling aspects is the way that RG aims to work (and in large part is able to do so):
RG systems allow the de novo artificial synthesis of pathogenic viruses.
Importantly, the artificial synthesis does not need to start with viable viruses per se.
Highly pathogenic strains may thereby be created entirely in a lab from digital sequence information alone.
While this may seem unrealistic to some, the potential of these systems may have been best illustrated by the artificial synthesis of SARS-CoV-2:
1. Already in 2020, a team of international researchers demonstrated the assembly and rescue of the SARS-CoV-2 genome via a bacterial artificial chromosome (BAC). This was the first engineering of an infectious cDNA clone of SARS-CoV-2 via the use of a BAC-based reverse genetics system. Transfection of the BAC into Vero E6 cells allowed the researchers to rescue recombinant SARS-CoV-2 (rSARS-CoV-2) with growth properties and plaque sizes in cultured cells “comparable to those of the natural SARS-CoV-2 isolate.” Replication properties were also similar to those of the natural virus in nasal turbinates and lungs of infected golden Syrian hamsters.
2. In 2021, Pei-Yong Shi and colleagues described how they developed an infectious complementary DNA (cDNA) clone for SARS-CoV-2. Individual SARS-CoV-2 cDNA fragments were assembled into a genome-length cDNA, in vitro transcribed to get the genome-length RNA, and then electroporated into cells where recombinant viruses could be rescued. These researchers announce that their system allows the rapid synthesis of wild-type and mutant viruses, emphasizing
“The reverse genetic system can be used to rapidly engineer viruses with desired mutations to study the virus in vitro and in vivo.”
3. Reverse genetic systems (RGS) for SARS-CoV-2 exploded in recent years demonstrating that those techniques are readily available and apparently fairly cost-effective. Numerous research projects have confirmed that RGSs are widely used to engineer recombinant viruses ”with desired mutations,” including even large RNA viruses such as SARS-CoV-2.
Biosafety and Biosecurity Risks
Chen and colleagues highlight several biosafety and biosecurity risks of RG, for example: (a) RG related to viral research: “to obtain viruses with altered characteristics, such as increased virulence, a shift in host specificity and transmission routes, and the reconstruction of the eradicated virus,” or the generation of a “supervirus,” and (b) RG related to drug and vaccine design: “antiviral targets could be altered maliciously to generate virus strains resistant to drugs or vaccines,” and (c) “if artificial intelligence (AI) is maliciously incorporated into GOF.”
Sadly, the misuse potential is huge, with many of the risks largely underappreciated.
In recent years, the international community has begun to unravel some of the risks and vulnerabilities that emerge from the digitization and automation of biology (see e.g. here and here). I have spent years unraveling some of these grave concerns, see e.g. here and here.
One of the rising concerns with AI systems is that there is often no way about the chain of reasoning that results in their behavior and response. Synthetic biology is subject to a similarly grave issue that remains poorly understood even though I first described it some 4 years ago. All we know about viruses, for example, is enabled by the digitization of biology and provided to us by computerized systems. This creates a gap between (a) what is modeled and digitally described and (b) the actual biological/physical reality. Actors intending to misuse this gap could do this in numerous ways.
Extrapolating to DEFUSE raises serious questions
The deliberately designed AI model organisms of deception have some striking resemblance to GoF work aimed at introducing the seemingly worst genetic feature possible. It will be impossible to draw exact parallels and analogs. Nonetheless, the following builds on their common underlying idea and belief system that the intentional design of worst-case “organisms” should have important benefits. (A small caveat: in the past, IT scientists seem to have been rather conscious about their work; it remains to be seen if this could change with AI).
Extending some of the above lessons to the DEFUSE proposal highlights a host of serious biorisks. It is not clear if the involved parties were, or are, aware of all or some of the vulnerabilities and risks.
One of the main concerns identified above is that of model poisoning where malicious actors deliberately cause AI models to appear safe during their development and training phase but, via hidden backdoors, make those systems act in an unsafe or deleterious manner when observing a certain subtle trigger during deployment.
Fostered by the gaps and vulnerabilities sketched above, the analogous concern would have existed during the planning and potential realization of the DEFUSE project (or any other synthetic biology project with multiple stakeholders).
As I exemplified in my series on the “brain virus,” the dispute between Song’s group and the one by Zhengli Shi demonstrates that the same virus(es) in question can appear to have the opposite characteristics, depending on how they are presented:
There is no way the researchers involved see all the data and information related to their research.
For example, the genetic description can be CoVs is ambiguous, especially since they are known to be amendable to natural and lab-induced mutations.
The actual virulence of pathogens is not an unambiguous fact. The animal model used and the susceptibility of the host can effectively hide or exaggerate the characteristics of a virus.
The possibility of “data poisoning” and exchange of malicious biomatter
In light of the above, it is particularly difficult to see why the DEFUSE proposal tried to offshore much of the work to China. The researchers obviously tried to conceal their intention to conduct high-risk coronavirus research in Wuhan under lax safety standards. Ralph Baric, commenting on an earlier version of the proposal emphasized that "IN the US, these recombinant SARS-CoV are studied under BSL3... In china, might be growin these virus [sic] under bsl2. US reseachers [sic] will likely freak out."
Did they not consider the possibility that someone could accidentally or deliberately make those viruses more dangerous? And without them knowing it?
Note that the proposal emphasizes the safety risks of this type of work (that is, that something could unintentionally go wrong). They did not mention security issues, however, i.e., that something could be deliberately misused.
Ironically, at one point, Dr Fauci gave an interview, addressing the question as to why pathogen research would even be outsourced to other nations. I cannot recall his exact response, but in essence, his respoinse was: “You would not want to have such work done in the middle of New York or another heavily populated area.”
Yet, Dr Fauci was certainly aware of the risks, both of accidents as well as deliberate misuse.
The potential of a hidden backdoor in the DEFUSE project
The DEFUSE proposal was authored by Peter Daszak (head of the EcoHealth Alliance in New York) with partners including Shi Zhengli of the Wuhan Institute of Virology and Ralph Baric of the University of North Carolina. Thinking about how bioweapons research can conceal its true nature, it is puzzling why they seemingly trust each other. After all, as Nicholas Wade pointed out, Baric and Shi were both collaborators but also rivals.
Did they have any assurance that whatever information, technology, or actual agents they exchanged would be as stated? How would they have known for certain?
It is possible that Shi Zhengli’s group, with their vast experience in CoV research, trusted their competence.
But given the technical issues indicated above, and others, it is feasible to think that a SARS-CoV-2 precursor could have had a double-face appearance, aka as a “deceptive” CoV model that could be triggered and released by a politically motivated actor.
From a mere theoretical perspective alone, one can also envision the reverse, with China secretly ingressing a backdoor that could have made such a CoV a potent biological weapon that only becomes active when specifically released or triggered by a susceptible host.
Both the biological material and even more so digital information about them would have constantly been exchanged between various agents.
From a biorisk perspective, this is very troubling, for several reasons:
Because viruses can be constructed from their sequence information, there would have been an enormous blurring of the actors involved.
Malicious actors could deliberately introduce backdoors at various levels, ranging from derailing someone else’s research program to embedding features that could make the viruses more concerning.
A deliberate attack might be impossible to get detected and mitigated.
Under a skilled deliberate attack, it would be next to impossible to attribute where and what happened.
Over the last few years, the idea has emerged that the U.S. tried to “collaborate” with China on dangerous biological weapons research to keep their foot in China’s bioweapon’s program. Specifically, Dr Robert Malone lays out a possible explanation for the DEFUSE collaboration, that it is the rationale that
“Essentially this is a reaction to the CIA losing its human intelligence capacity in the bioweapons era with China a number of years ago. I have had this independently verified via three independent sources.”
While the DEFUSE proposal is not a conclusive evidence, many believe it was a blueprint for the construction of SARS2. It seems feasible that whatever the final goal would have been, that it escaped/was released from a lab prematurely.
I truly wonder what both parties are thinking right now.
Conclusion
The revelation of deceptive AI systems has caused quite some concern in the IT community. What has been seen as even more troubling is that methods to stop them were not working. In fact, the finding that trying to re-train them can make the situation worse was described as “potentially scary.”
Computerization and automation are increasingly used in synthetic biology. Yet, even in the context of GoF research, many are turning a blind eye to existing and emerging risks. Above, I tried to make the point that there is just no other way: GoF research will backlash with repercussions affecting especially those performing the work.
Some believe that SARS-CoV-2 is not a real virus. Let me emphasize that this is not the point here. Even if entity A collaborated with entity B to work on something they knew was a scam - which they, together could maliciously exploit - the above shows that NEITHER would be able to validate the legitimacy of the products or information sent by the other.
It is a technical impossibility, for reasons explained above and elsewhere (see e.g. my invited book chapter and my Biosafety and Health article).
But what we do know is that already decades ago, researchers knew of the deleterious potentials of misusing biology for the creation of biological weapons. This is why we had the Biological Weapons Convention!
It should be crystal clear now that even from a technological perspective, there is no reason whatsoever why anyone should be involved in any type of dangerous research. By its very nature, gain of function work is inherently susceptible to misuse. Technical aspects and AI drastically increase this risk.
At the same time, they make it impossible for anyone to protect themselves. As there is so much push for collaboration and exchange of information, global elites must consider some sobering facts:
It would be impossible to say if it is entity A that makes itself vulnerable to entity B, or vice versa.
At the end, both could be shooting themselves in the foot.
The planned WHO/IHR amendments, pushing for the mandated sharing of pathogen information, are inevitably susceptible to the same types of vulnerabilities.
Overall, the novel and surprising risks of deceptive/malicious AI systems likewise apply to GoF research. In both cases (figure below), known attempts to detect or remove such risks are not working and instead could result in even more dangerous organisms of deception/biological weapons.