BLUF: A 2020 analysis from the Defense Intelligence Agency (DIA), declassified through a Freedom of Information Act (FOIA) request, examines the genome of SARS-CoV-2 and presents findings consistent with a synthetic origin of the virus. The DIA slides describe many important observations about the research programs underway in Wuhan as context for unusual features in the genome of SARS-CoV-2. The 2022 paper I wrote with Valentin Bruttel and Tony VanDongen builds on the DIA analysis by quantifying the strength of an anomalous synthetic-looking feature in the genome of SARS-CoV-2. Kudos to the DIA team for their thorough work back in 2020.
US Right to Know Strikes Gold (again)
Few investigative journalists have been as dedicated to pursuing the origin of SARS-CoV-2 as the US Right to Know (USRTK). Their coverage of COVID-19 origins includes one of the most important bodies of documents produced by any one entity, and, just recently, USRTK released another extremely illuminating batch of documents.
After pursuing the mandatory declassification of documents in a Freedom of Information Act suit, USRTK succeeded in obtaining some transparency, allowing the public to view the declassified portions of a 2020 analysis from Defense Intelligence Agency analysts.
The declassified portions of the DIA analysis reveal the depth of expertise we can expect - and be proud of - from members of the US intelligence community. From the very beginning, this analysis of the genome of SARS-CoV-2 provides a glimpse at the capabilities required to construct a bat-like coronavirus infectious clone.
Infectious clones made via reverse genetics systems have featured prominently in my own work on SARS-CoV-2 origins for a reason. Behind my colleagues’ and my 2022 paper quantifying the magnitude of an anomaly in the SARS-CoV-2 genome consistent with infectious clone technology is an informed Bayesian forensic awareness that reverse genetic systems hold the key to accurate assessments of the origins of SARS-CoV-2.
After we published our work, USRTK covered our paper, FOIA’d grants and discussions among researchers in light of this scientific and forensic evidence, and found evidence corroborating our finding through order forms for the specific reagents required to make the reverse genetic system we hypothesize created SARS-CoV-2. Their findings were met with awards and claims they’ve unearthed the story of the decade. From 2022 to present, the investigation of SARS-CoV-2 origins has had some of its most significant corroborating findings emerge thanks to the synthetic origin theory centered around reverse genetic systems.
In this most recent batch of FOIAs, we don’t uncover any new evidence on the origins of SARS-CoV-2; there is no corroborating evidence in this batch. Instead, we travel back in time to 2020, after the emergence of SARS-CoV-2, and find that very smart people at the DIA also recognized the importance of reverse genetic systems. The declassified slides provide a glimpse into the minds of brilliant analysts and an opportunity to connect the DIA analysis with what’s been done since.
Great Minds Think Alike
If you’ve been following my writings on COVID origins, you’ll immediately recognize some words and figures in the DIA slides: the “Type IIS Restriction Enzymes” and “Golden Gate Assembly” discussions on pages 6-9, the figure on page 14 extracted from Zheng et al. showing the construction of rWIV1, and the almost immediate recognition on page 17 that SARS-CoV-2 can be readily assembled with BsaI and BsmBI, as the DIA slides visualize below.
For comparison, below is the figure from our 2022 paper showing the exact same option for assembly using BsmBI sites for the first 3 fragments, a BsmBI + BsaI connecting piece in fragment D, and BsaI sites for the final 2 fragments E and F.
Behind the scenes of our 2022 paper, there were some minor disagreements on the scope of our work. Personally, I felt it was important to acknowledge and discuss alternative methods to construct a reverse genetic system beyond the method above. The other two co-authors recognized that such alternative methods with invisible restriction sites were plausible, but opted against a detailed discussion in our current paper, saving such analyses of alternatives for future work to focus the scope of our paper on the specific restriction map in the genome of SARS-CoV-2, noting that many methods to construct the genome are plausible and that our finding is robust across these methods. The DIA, meanwhile, did due diligence and fleshed out alternative possibilities, including the use of invisible restriction sites as done by Xie et al. (2020).
There are other options as well to build a reverse genetic system from the “IKEA virus” genome of SARS-CoV-2. For example, one could construct a reverse genetic system using a hybrid system, using the existing BsaI sites in fragments E and F from the first version above, and then use invisible BsaI sites to partition the remainder of the genome into fragments, enabling one to synthesize a full-length infectious clone using only BsaI thanks to the removal of the conserved twin BsaI sites with silent mutations in our fragment C. On a technical note, the use of pre-existing BsaI sites in fragments E and F, combined with the absence of BsmBI sites in these fragments, would enable researchers to re-use the BsmBI-flanked S-genes and receptor-binding motifs used by Hu et al. 2017, all thanks to the removal of BsmBI sites with silent mutations in our fragments E and F.
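For readers who want to see what a "restriction map" means computationally, here is a minimal sketch of how one might locate BsaI and BsmBI recognition sequences on both strands and measure the fragments between them. The recognition motifs (GGTCTC for BsaI, CGTCTC for BsmBI) are the standard ones; everything else is an illustrative assumption: the function names are hypothetical, the toy sequence is made up and is not the SARS-CoV-2 genome, and fragment boundaries are placed at the start of each recognition site rather than at the enzymes' true offset cut positions. This is not the code from our paper or from the DIA slides.

```python
# Standard recognition motifs (5'->3') for the two type IIS enzymes.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(genome, sites=SITES):
    """Return a sorted list of (position, enzyme, strand) for every site.

    Position is the 0-based index where the motif starts on the forward
    strand; "-" means the motif sits on the reverse strand.
    """
    hits = []
    for enzyme, motif in sites.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            start = genome.find(pattern)
            while start != -1:
                hits.append((start, enzyme, strand))
                start = genome.find(pattern, start + 1)
    return sorted(hits)


def fragment_lengths(genome_length, positions):
    """Lengths of the pieces obtained by splitting at each site position.

    Simplifying assumption: the boundary is the site start, not the actual
    cut position a few bases downstream of the recognition motif.
    """
    cuts = [0] + sorted(positions) + [genome_length]
    return [b - a for a, b in zip(cuts, cuts[1:])]


# Toy 28-base sequence with one forward BsaI site and one reverse BsmBI site.
toy = "AAAA" + "GGTCTC" + "TTTTTTTT" + "GAGACG" + "CCCC"
hits = find_sites(toy)
print(hits)  # [(4, 'BsaI', '+'), (18, 'BsmBI', '-')]
print(fragment_lengths(len(toy), [pos for pos, _, _ in hits]))  # [4, 14, 10]
```

On a real genome, the interesting quantities are the number of fragments and the length of the longest one, since an efficient assembly wants a handful of fragments with none too long.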
While these discussions of reverse genetic systems are inescapably technical, there is some deep subject matter expertise and intuition behind what the DIA analysts and other open-source scientists like myself noticed. To us, it feels like stumbling upon an IKEA table in the woods.
Imagine you came across an IKEA table in the woods. Even if you’ve never seen an IKEA table before, your knowledge of the woods would enable you to recognize the table is unnatural. The features which make the table in the woods consistent with tables you have seen also make it inconsistent with the surrounding natural environment, thus an awareness of the natural environment can be sufficient to identify anomalies and an additional awareness of how things are built only improves this sense. You can hypothesize many ways to build the table - maybe they attach one leg at a time and then the braces between the legs, or maybe the legs and braces are assembled first prior to attaching them to the surface of the table.
Enumerating the many ways to build a reverse genetic system with the BsaI and BsmBI restriction enzyme map of SARS-CoV-2 is a similar exercise. The DIA did justice to the many possibilities, possibilities which my colleagues and I saw but did not flesh out. But we all knew that, just as a table is useful for eating and this table also has a fork and plate, the reverse genetic system of SARS-CoV-2 is useful for synthetic biology and this SARS-CoV-2 also has strange insertions consistent with synthetic biology research.
Seeing the DIA slides filled me with a degree of camaraderie with these DIA analysts of 2020. Science and mathematics can be a beautiful way to appreciate people. One may feel alone at the front lines of knowledge, experiencing an idea that seems outlandish or wild to others far from the front lines (e.g. our work above was insulted in many ways, including as “kindergarten molecular biology” despite it later being corroborated in almost every way it can be). As others converge on the same theories and join you at the front lines, light up with fascination at the same observations, and examine closely the same natural phenomena, they can feel like partners in an enterprise that transcends time and space. While the DIA analysts produced the slides above in some unknown location in 2020, my colleagues and I followed the forensic trail to this exact same pattern and research program in 2022 while Valentin was in Germany, Tony was in North Carolina, and I was in Montana, and yet I feel a sense of togetherness with all of these people pushing the boundaries of forensic biology. In addition to collegiality over our similar pursuits, there’s an admiration for the uniqueness of our slightly divergent paths and approaches.
While our paper was a small unit of analysis focused on the BsaI and BsmBI map, the DIA goes further in their analysis of the SARS-CoV-2 genome to evaluate the Spike gene, receptor binding domain (RBD), and receptor binding motif (RBM) in the context of prior research underway in Wuhan. Their observations here, too, have been studied in parallel by researchers in open-source communities. Indeed, many of us open-source scientists concur with the claims in the slides that a synthetic origin is plausible, especially in light of research underway in Wuhan at the time of emergence.
The declassified analysis of the SARS-CoV-2 genome from these folk at the DIA is a tour de force of lab origin theory, so much so that I now recommend the DIA slides as required reading on SARS-CoV-2 origins. The slides are a concise report covering it all: the rWIV1 reverse genetic system of Zheng et al. 2016, the S-gene swapping of Hu et al. 2017, the bat SARSr-CoV collection efforts underway in the WIV, the minimal RBD cassette of Ren et al. 2008 corresponding improbably well with a “recombination” event in the genome of SARS-CoV-2, questions raised about the veracity and verifiability of genomes published since by the WIV that purport to cast doubt on the lab origin, and more. Whoever analyzed the genome of SARS-CoV-2 at the DIA in 2020 has earned my admiration. The thoroughness of their findings ought to keep our adversaries up at night.
It seems in 2022 my colleagues and I were standing on the shoulders of invisible giants when we followed these DIA analysts’ 2020 observations in our 2022 work. In our independent pursuit of this issue, we took a new approach which I believe adds to the declassified DIA slides without subtracting from their tour de force. Specifically, while the DIA slides document the BsaI + BsmBI pattern and mention it is consistent with a synthetic origin and that a synthetic origin is plausible, our work took a more quantitative probabilistic approach and estimated the odds of the unusual pattern. Not to toot my own horn, but having received my PhD in the first year of Princeton’s Quantitative and Computational Biology program, I see this quantitative probabilistic approach as a particularly valuable way to solve the problem of attribution in biology.
Using robust methods and reproducible code, we estimated that the unusual spacing of these BsaI and BsmBI restriction enzyme recognition sequences in the genome of SARS-CoV-2 was an approximately 1 in 1,400 event; the unusual hotspots of silent mutations within these sites were estimated to be a 1 in 20 million event. Imagine one finds a pattern consistent with a synthetic origin, but in one scenario the pattern has a 1 in 20 chance of occurring in nature and, in another scenario, the pattern has a 1 in 30 billion chance of occurring in nature; these two scenarios are not the same when assessing the likelihood of a synthetic origin. Thus, our analyses provide a quantitative estimate of the odds of this pattern observed by the DIA in 2020, revealing that the pattern consistent with a synthetic origin is highly unlikely to occur in nature, so much so that we were not surprised to find corroboration upon further FOIAs turning up grants with order forms for these enzymes.
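To see why the 1 in 20 and 1 in 30 billion scenarios are so different, it helps to write Bayes' rule in odds form: posterior odds equal prior odds times the likelihood ratio. The sketch below is only an illustration of that arithmetic, not the analysis from our paper. The 1 in 100 prior is a hypothetical number chosen for the example, and it assumes for simplicity that a synthetic origin would produce the pattern with probability 1; the function name is likewise made up.

```python
from fractions import Fraction


def posterior_probability(prior, p_pattern_given_synthetic, p_pattern_given_natural):
    """Posterior P(synthetic | pattern) from Bayes' rule in odds form."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_pattern_given_synthetic / p_pattern_given_natural
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical prior of 1 in 100 for a synthetic origin (an assumption for
# illustration only), with P(pattern | synthetic) assumed to be 1.
prior = Fraction(1, 100)

weak = posterior_probability(prior, Fraction(1), Fraction(1, 20))
strong = posterior_probability(prior, Fraction(1), Fraction(1, 30_000_000_000))

print(weak, float(weak))  # 20/119, roughly 0.168
print(float(strong))      # very close to 1
```

With the same skeptical prior, a pattern that arises naturally 1 time in 20 leaves a synthetic origin at about 17%, while a pattern that arises naturally 1 time in 30 billion makes it all but certain. The size of the odds matters, which is why estimating them is worth the effort.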
Despite not knowing of the DIA work at the time of our work, I feel a profound collegiality in seeing their slides and I hope they have seen our work as well.
Kudos to the DIA analysts, kudos to the US Right to Know, and kudos to scientists who have done due diligence & maintained objectivity on this contentious issue.
Congratulations! You “kindergarten molecular biologists” are very impressive. This has been an amazing story to follow.
Well done! Now let’s find who decided to create the monster.