TLDR
It may seem like the debate over SARS-CoV-2 origins is at a stalemate, but that’s not true. Just a few months ago colleagues & I unearthed new evidence of a synthetic origin, and there’s a high probability additional evidence exists that could change minds.
I started this long article as a personal journal to reason and argue with myself about what evidence might exist, so I could allocate my time, effort, and attention appropriately. However, when I looked at the whole thing, I realized it has a few insights worth sharing that may motivate others to pursue key pieces of evidence. More hands make less work! I added some framing and context to explain the weird and perhaps unconventional way I think. Don’t consider this definitive. I’m probably wrong in places, and others can make better arguments than me, so please discuss and engage!
Above all, enjoy :-)
Wise allocation of attention for scientific evidence
With my profound hearing loss, I typically hear only about 40% of the words in any given sentence. As a consequence, I’ve developed strategies since a young age to allocate my attention towards words and non-verbal cues that help me maintain a conversation. Not all words have the same evidence. Consider:
Full sentence: Sometimes sentences are random and out of context
Option 1: ____ sentences ___ random and out __ context
Option 2: Sometimes ___ are ___ and __ of context
In both cases, I “missed” three words. In the first option, I nonetheless get a decent understanding of the full sentence - I might not know what kind of sentence (was there an adjective before “sentences”?) but I can guess or impute what word is between “sentences __ random” and “out __ context”. In the second option, I am completely clueless. Listening takes a lot of effort when you’re hard-of-hearing, kind of like squinting at the row right below what you can see on an eye exam, except that row is rolling by continuously throughout the day and your job depends on it. Allocating attention strategically, focusing on the parts of sentences with the most ‘evidence’ or ability to discriminate between different sentences, is second-nature for people who’ve been hard-of-hearing since a young age.
As a scientist, I use a similar set of heuristics to prioritize my search for evidence that discriminates between theories. In science as in life, it’s wise to prioritize our searches, choosing where to allocate our time, attention, and search-intensity for evidence based on the product of how strong the evidence is, how likely it is to exist, and how likely we are to uncover it if we try.
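The product rule above can be written down as a tiny scoring function. This is only a sketch of the heuristic: the `Lead` class, the 0-to-1 scales, and every number below are made-up placeholders for illustration, not estimates from this article.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    """One candidate line of evidence to search for (hypothetical structure)."""
    name: str
    strength: float  # how much the evidence would shift the debate, 0 to 1
    p_exists: float  # probability the evidence exists at all
    p_found: float   # probability we uncover it if we try

    def priority(self) -> float:
        # The heuristic from the text: strength x P(exists) x P(found | search).
        return self.strength * self.p_exists * self.p_found


def rank(leads):
    """Sort leads from highest to lowest priority."""
    return sorted(leads, key=lambda lead: lead.priority(), reverse=True)


# Placeholder numbers only; plug in your own beliefs.
leads = [
    Lead("UFO photo from Montana", strength=0.2, p_exists=0.01, p_found=0.01),
    Lead("high-coverage early genome", strength=0.9, p_exists=0.3, p_found=0.2),
]

for lead in rank(leads):
    print(f"{lead.name}: {lead.priority():.5f}")
```

Because the three factors multiply, a lead that scores near zero on any one of them drops to the bottom of the list no matter how strong the other two are.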
For example, I am not very likely to uncover evidence of a UFO. Even if I quit my job, allocate all my time looking up at the big sky in Big Sky state, and snap a picture of an object in the sky, it may not convince many others who might suspect it’s a deepfake or maybe just a Chinese surveillance balloon. Consequently, I don’t spend my precious hours in Montana searching for UFOs in the sky. Instead, I spend my hours in Montana prioritizing my scientific efforts and then pursuing high-priority efforts.
We are likely to uncover more evidence on the origins of SARS-CoV-2. While I already believe we have enough evidence to conclude a lab origin beyond reasonable doubt, other scientists & members of the public don’t yet agree with my assessment. In addition to popularizing the niche subject matter expertise that has me leaning so heavily towards a lab origin, it’s valuable to consider evidence we might observe that could make this process easier. Rather than just report where I’m searching, I want to use this article to discuss evidence we may plausibly obtain, the spectrum of possibilities for the evidence, and how additional evidence may tilt the scales in the theoretical debate over the origins of this pandemic virus.
Evidence will most likely favor a lab origin
The truth is not fair.
Existing evidence gives us strong reasons to believe SARS-CoV-2 most-likely came from a lab. While we may debate how much “more” likely a lab origin is than a zoonotic origin, that it is more-likely means we should put somewhat higher faith that searching for lab-origin evidence will bear fruit.
The truth can be very unfair in its treatment of competing theories. The beauty of science is that if one theory is right, it will be more-likely with existing data and more-likely to become even more-likely with future data. The rich get richer among scientific theories. This truth-mediated positive feedback loop is a good thing as it helps us allocate attention towards more-likely theories and rapidly resolve scientific debates by the new evidence we uncover. This partiality of truth was felt when Valentin Bruttel, Tony VanDongen, and I found evidence of a synthetic origin in the restriction map of SARS-CoV-2. We had used evidence up to that point to reason a lab origin is more likely, so we boldly tested a synthetic origin in a way that could reject a synthetic-origin hypothesis and our strong tests ended up producing strong evidence in favor of a lab origin.
The instant one deduces one theory is more likely, the wise scientist will allocate time & attention mining the more-likely theory, piling on rejections of the incorrect theory. Stubborn holdouts of the less-likely theory will exist, and if they happen to be right it will become evident with evidence.
Without further ado, let’s consider what “words” in the sentence of SARS-CoV-2 origins, what pieces of evidence, we might hear in future statements about SARS-CoV-2 origins.
Earlier Sequences
The most plausible lab origin scenarios rely on an earlier start date than the Huanan Seafood Market and a larger-than-reported outbreak in Wuhan. Consequently, under a lab origin one has a reasonable chance of finding additional sequences in human samples corroborating this scenario.
Most zoonotic-origin scenarios rely on two spillover events in the Huanan Seafood Market, each producing one lineage - Lineage A and Lineage B. While colleagues and I wrote a paper challenging their underlying methods & premise, we nonetheless in scientific honesty have to keep their scenario on the table for fair consideration. If one were to find lineages pre-dating Lineage A and Lineage B, indicating earlier transmission - especially should one additionally find evidence of earlier transmission suppressed by the Chinese government - it would make the Huanan Seafood Market origin hypothesis obsolete.
In fact, some of these sequences may have already been found. Jesse Bloom found a deleted dataset from the early Wuhan outbreak. Most intriguingly, this deleted dataset contained sequences with mutations pre-dating Lineage A and Lineage B. If you haven’t read Dr. Bloom’s paper yet, I highly recommend it - he meticulously details evidence of earlier cases, the dates of Chinese government policies limiting what Chinese scientists could publish, the dates of deletion, and the details of these sequences.
Another piece of potential evidence of earlier sequences is Csabai et al.’s discovery of SARS-CoV-2 sequences lurking in the background of Antarctic soil sequences collected sometime in 2019, sequenced later, and deposited onto sequence archives after that. It’s unlikely that SARS-CoV-2 was circulating in Antarctic soils; more likely this was a contaminant. The primary articles that used these sequences shared an overlapping set of authors at the University of Science and Technology of China in Hefei, China. Hefei is a ~4.5 hour drive from Wuhan and, interestingly, these Antarctic soil sequences appear to corroborate the earlier lineage pre-dating the Lineage A/B split reported by Jesse Bloom above.
Others have found some evidence of seropositive samples in Italy in October 2019, and others found that 11 out of 44 “pre-pandemic” oropharyngeal swabs from patients experiencing a measles-like illness tested positive with a PCR test. Others found evidence of SARS-CoV-2 in the sewage of Milan & Turin as early as December 18th, 2019. While multiple lines of evidence suggest earlier and more widespread transmission, sometimes our serological tests are positive even though a patient never saw SARS-CoV-2, and sometimes biological samples can be contaminated with SARS-CoV-2 to make them test positive even if the original patient or sewage wasn’t (after all, the Antarctic soils above could, in theory, test positive yet we believe that is due to a lab-derived contaminant). To know for sure that PCR+ samples from 2019 are not PCR+ due to contamination, we would need to see the whole-genome sequence and see where that fits on the evolutionary tree as Dr. Bloom did.
Even where we have sequences, as with Dr. Bloom’s sleuthing or the Antarctic soil sample, these sequences are all partial sequences with low “coverage” (i.e. the sequencing machine didn’t sequence the same stretch multiple times for us to confirm there is an A instead of a T at a critical site where an A might indicate an ancient sequence whereas a T could indicate a more recent sequence). Phylogenetic trees are just estimates, as evidenced by the different scenarios in Dr. Bloom’s three trees, and there are errors in sequencing, so in theory it’s possible these independent lines of evidence are all coincidences with alternative explanations.
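To make the “coverage” point concrete, here is a minimal sketch of how a base might be called at a single site from the reads that overlap it. The function name and both thresholds (`min_depth`, `min_fraction`) are hypothetical choices for illustration; real variant callers also weigh base quality, strand bias, and more.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def call_base(reads_at_site, min_depth=10, min_fraction=0.9):
    """Return the consensus base at one site, or 'N' if support is too weak.

    reads_at_site: the base each overlapping read reports at this position.
    min_depth and min_fraction are illustrative thresholds, not a standard.
    """
    # Keep only unambiguous base calls.
    bases = [b for b in reads_at_site if b in VALID_BASES]
    depth = len(bases)
    if depth < min_depth:
        return "N"  # too few reads to trust any call
    base, count = Counter(bases).most_common(1)[0]
    # Require a clear majority, otherwise leave the site ambiguous.
    return base if count / depth >= min_fraction else "N"


print(call_base(["A"] * 2))              # low coverage
print(call_base(["A"] * 19 + ["T"]))     # high coverage, clear majority
print(call_base(["A"] * 6 + ["T"] * 6))  # high coverage, reads disagree
```

With only two reads the site stays ambiguous even though both reads say A, which is the situation with the partial early sequences: the A-versus-T call at a critical site cannot be confirmed.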
If, however, someone were to find an unambiguous full-genome sequence of SARS-CoV-2 that corroborates this earlier split, it would provide strong evidence of earlier & more widespread transmission. Many scientists like myself (including the former Chinese CDC director Dr. George Gao) already think the Huanan Seafood Market is a victim of human-human transmission and not a site of spillover, but having a genome sequence corroborating Dr. Bloom’s phylogeny with high-coverage (read: high-confidence) would simultaneously throw out the wet market origin theory while increasing suspicions of why the sequences Dr. Bloom found were deleted from sequence read archives.
If these samples from 2019 are truly positive with SARS-CoV-2 from infections in that time, and not positive due to later contaminants, then whole-genome sequences should be easy to obtain and have the potential to completely re-write the scientific record’s timeline of SARS-CoV-2 emergence, transmission, and evolution.
If, however, people collect full-genomes and find evidence these early “positive” samples were contaminated with later variants, it would be valuable for them to publish such results so we can re-assess the PCR+ samples in Italy with that information. While I currently don’t put a lot of stock in early sequences due to there being plausible alternative explanations for every case, and strong meta-scientific incentive + amplification for anyone reporting an earlier-positive-sample, if someone were to find a high-coverage whole-genome sequence of SARS-CoV-2 reshaping the early outbreak evolutionary tree, especially one corroborating Dr. Bloom’s sequences, that would be revolutionary.
Let the hunt begin.
Details on CoV researchers with influenza-like illness (ILI) in Fall 2019
There’s classified evidence we’ve heard something about and are waiting to hear more about. I’m listening closely to some key details of this evidence, accepting that some details may not be declassified to protect US intelligence assets, methods, and more (totally fair reasons to not declassify - after all, if there were a human asset and someone shouted “Dr. JimBob in China told us!” then Dr. JimBob is dead).
US intelligence agencies already claim to know that 3 CoV workers in Wuhan sought hospital care for influenza-like illness sometime in the fall of 2019. Such evidence seems very easy to verify, so I believe it’s true. It’s plausible upcoming declassifications requested by Congress may flesh out more details, such as exactly when these people sought care, how severe their symptoms were, how old they were, where they worked, and whether/not they were lab workers involved in prior or proposed efforts to make chimeric SARS-CoVs in Wuhan. This evidence exists already, the public just doesn’t know the details and the details determine the strength (or weakness) of this evidence.
The best case for a lab origin would be if the patients were all young workers at the Wuhan Institute of Virology or Wuhan CDC with unusually severe symptoms in late September to early-November 2019 and a clear history of research on recombinant SARS-CoVs. This evidence would increase the odds that the workers were sick with an atypical respiratory virus more consistent with the viruses they worked on than the typical causes of seasonal influenza-like illness. The timeline, especially once factoring in the ~20 days from exposure to hospitalization, could make the estimated date of exposure of these cases coincident with September-October anomalies at the Wuhan Institute of Virology, creating a cohesive set of evidence that combines to outline a very clear story about how SARS-CoV-2 emerged in a laboratory accident down to details of the possible index cluster of CoV researchers.
The best case for a zoonotic origin would be if these CoV workers were sick in mid to late December 2019, were working on unrelated CoVs with no prior history or evidence of expertise in making SARS CoV infectious clones. Such a timeline would not coincide neatly with anomalies at the WIV and would make this piece of evidence less consilient with other evidence of a laboratory origin. We’ve listed the anomalies at the Wuhan Institute of Virology: their database went offline September 2019, they hired a contractor to change their HEPA filter around then, cell phone data indicate a shut-down of the WIV in October 2019, etc. If the CoV workers were sick in a completely separate timeline than these Wuhan Institute of Virology anomalies, it would make it more likely the CoV workers falling ill was just a coincidence and not directly related to other epidemiological anomalies that lend support to a lab origin. By separating these pieces of evidence, it could weaken the theory of a lab origin.
If I had to shoot from the hip and guess, I’d suspect these 3 CoV workers don’t make a slam-dunk case but do slightly tilt the scales towards a lab origin. I suspect they worked on SARS CoVs and were in close contact with others who worked on recombinant SARS CoVs. I might guess a mid-November date for seeking care, suggesting earlier transmission and a mid to late-October exposure, meaning there are still missing pieces of the puzzle, such as what happened between the September 2019 anomalies at the WIV, the mid-October shut-down, and the cases seeking care in November. My only reason for guessing these cases are not a ‘slam-dunk’ is that I can’t imagine US intelligence agencies, assuming they lack conflicts of interest on this topic, looking at 3 CoV workers ill at the same time as anomalies in the WIV, alongside all the other publicly available evidence, and not all assessing a lab-origin is more likely with at least “low-confidence”.
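The timeline reasoning in this section is simple date arithmetic, sketched below. It takes the rough ~20 day exposure-to-hospitalization lag mentioned above at face value, and November 15 is only a stand-in for my “mid-November” guess; neither is a measured value.

```python
from datetime import date, timedelta

# The article's rough figure for exposure-to-hospital-care, not a measured constant.
EXPOSURE_TO_CARE_DAYS = 20


def estimated_exposure(care_date, lag_days=EXPOSURE_TO_CARE_DAYS):
    """Back-calculate an approximate exposure date from the date care was sought."""
    return care_date - timedelta(days=lag_days)


# Hypothetical care date standing in for "mid-November 2019".
print(estimated_exposure(date(2019, 11, 15)))  # 2019-10-26
```

A declassified care date in late September or in December would shift the estimated exposure accordingly, which is why the exact dates matter so much for lining this evidence up against the WIV anomalies.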
Military Takeover of the WIV
The US State Department claims the Chinese military has conducted classified military research at the Wuhan Institute of Virology since at least 2017. The public has also been told by several sources that the military “took control” of the WIV sometime likely in 2019. When? Why? How do we know? What were the justifications given? What changed with the military takeover?
Details around this event can provide evidence for a lab origin if the military takeover of the WIV happened prior to the Chinese government notifying the world about SARS-CoV-2, if it appears to have been a reaction to some unknown event and not part of a longer transition of power, and if the behavior & policies of the military at the WIV appeared to reduce the transparency of coronavirus research at the institute by stopping communications, seizing control of labs studying coronaviruses, etc. This could paint a clear picture that the highest levels of the Chinese government were aware of an emergency, the emergency was so severe it warranted military takeover of the lab, and the military takeover appeared to reduce the ability of the world to learn about research taking place in the lab at the heart of lab-origin theories. Such evidence would not only provide evidence of consciousness of guilt, but it would implicate the highest levels of the Chinese government in the cover-up as opposed to other lab-origin scenarios where the Chinese government was unaware of an outbreak originating in the WIV.
If, however, the military takeover was planned ahead of time and part of a trend of increasing militarization of research at the WIV, and if the military didn’t take any unusual actions (e.g. seizing labs, squelching researchers, closing down the area around the WIV), it would be harder to tie the evidence of a military “transition” to any epidemiological anomalies at the WIV. Such a ‘transition’ would weaken this evidence that currently supports a lab origin more than a zoonotic origin.
However, given many credible people familiar with this classified information have referred to this event as a military “takeover” with nobody debating the evidence and calling it a “transition”, I suspect the event was extremely abrupt and we’ll all agree it was a proper ‘takeover’. There may be cables indicating the intentions of the military, new rules in place for Chinese researchers that shine light on the PRC’s intentions, and perhaps evidence of focused military attention on a subset of researchers and research at the WIV.
We’re still left with the question of how other intelligence agencies, despite having access to this information, appear to have not made an assessment of a lab origin. Of note, the analysts assessing low-confidence in a zoonotic origin give weight to Chinese officials’ lack of foreknowledge, numerous vectors for exposure, and ‘other factors’. Consequently, we might expect the evidence of a military takeover, certainly the evidence (if any) that will be declassified, is relatively sparse and does not indicate clear, focused attention by the Chinese military on SARS CoV researchers at the WIV but perhaps a more confused reaction to later-than-one-might-expect (under a September 2019 lab-origin hypothesis) evidence of community transmission. With similar reasoning, I’d guess this takeover is later than the earlier epidemiological anomalies, perhaps as late as December 1, 2019, when the word “SARS” spiked in the Chinese social media app WeChat. If the military takeover was after January 2020, I would argue the evidence doesn’t tilt the scales much at all.
I suspect I’ll be left with more questions than answers after hearing the word on the military takeover. According to the DNI report linked above, Chinese officials (which officials?) were caught by surprise (when?). While the military takeover will likely remain as a key piece of evidence suggesting the Chinese government knew about a possible lab origin of SARS-CoV-2, there will undoubtedly still be missing pieces of the puzzle. There may even be discordant timelines between CoV researchers falling ill and the military takeover. While I suspect this evidence is unlikely to be conclusive, I’m still listening for it because at the heart of a lab origin theory is a Chinese military lab, and so any insights into the behavior of the Chinese military around this lab are worth considering and may even suggest future avenues of research.
Ralph Baric + EcoHealth Alliance Emails
EcoHealth Alliance had created a SARS-CoV in 2018 capable of reaching 100-10,000x higher viral titers in humanized mice. Ralph Baric was a PI on the DEFUSE grant proposing to insert several furin cleavage sites for testing in UNC and then send the genome sequences of his infectious clones to the WIV, where they could recreate the infectious clone (sidebar: this is one of the cool but also terrifying things of infectious clone technology - you can email a viral genome & instructions for in vitro genome assembly as if it were a template for a 3D printer, and virologists all around the world can use that blueprint to recreate the virus from scratch… with this technology viruses can now ‘reproduce’ via email like a 3D printed gun… we may want to entertain restrictions on publication & emailing of dangerous viral genomes).
Both EcoHealth and UNC have bitterly resisted sharing any information on their SARS CoV research activities and collaborations with the Wuhan Institute of Virology. Any communications between these 3 research groups - EcoHealth, Ralph Baric, and the Wuhan Institute of Virology - could shine light on exactly how they helped SARS-CoVs obtain 100-10,000x higher viral titers in humanized mice, what they decided to do after DEFUSE was rejected, and whether or not they agreed to proceed with efforts to (or had already) insert an FCS inside the S1/S2 subunit of a bat SARS CoV. We have their proposal, but do we see any follow-through from these groups?
Best case scenario for a lab origin would be emails from 2018-2019 proposing to go ahead or acknowledging success inserting an FCS in a SARS-CoV, emails containing the specific furin cleavage site found in SARS-CoV-2, or even emails most likely from Ralph Baric proposing very specific mutations to add/remove BsaI/BsmBI (or Esp3I) sites into a SARS-CoV. In fact, Ralph Baric would likely order his parts from biotech companies in the US and Europe, so one can imagine receipts existing in vendors’ databases showing UNC ordering the exact FCS found in SARS-CoV-2, primers that add/remove BsaI/BsmBI sites, or (less-conclusive) orders of those specific restriction enzymes.
A more complicated scenario (but one far preferable to US entities) would be if full transparency was obtained and we found absolutely no evidence these US-based groups continued their pursuit of adding an FCS in a SARS-CoV. That would be the case if the SARS-CoVs with 100-10,000x viral titers at the WIV were just boring old S-gene recombinants, if Daszak privately swore off gain of function research (ha), if Daszak and the WIV “broke up” (ha), or if other paper trails emerged providing clear evidence to support the Zoo Crew’s favored hypothesis that the DEFUSE Band broke up after their failed album. If we looked and found evidence that these groups did not want to proceed with DEFUSE after it didn’t receive funding, that would slightly downweight the significance of DEFUSE. Low-probability but absolute best-case-scenario, such missing evidence may force lab-origin theories to put more stock in scenarios where e.g. the Chinese government, well known for stealing bioscience trade secrets, took DEFUSE and funded it entirely within groups of Chinese military researchers, effectively stealing DEFUSE for biodefense/bioweapons research.
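For readers who want to see what “BsaI/BsmBI sites” means in practice, here is a small sketch that scans a sequence for them on both strands. The recognition sequences are the standard ones (BsaI: GGTCTC, BsmBI: CGTCTC, which Esp3I shares); the function names and the toy sequence are mine, purely for illustration, and this is not the analysis pipeline from our paper.

```python
# Standard recognition sequences; Esp3I recognizes the same site as BsmBI.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(genome):
    """Return sorted (position, enzyme) pairs for every site on either strand.

    Positions are 0-based starts on the forward strand.
    """
    genome = genome.upper()
    hits = []
    for enzyme, motif in SITES.items():
        # These sites are not palindromic, so search both orientations.
        for pattern in (motif, reverse_complement(motif)):
            start = genome.find(pattern)
            while start != -1:
                hits.append((start, enzyme))
                start = genome.find(pattern, start + 1)
    return sorted(hits)


# Toy sequence, not a real viral genome.
print(find_sites("AAGGTCTCAAGAGACGTT"))  # [(2, 'BsaI'), (10, 'BsmBI')]
```

A primer order or an email proposing a silent mutation that creates or destroys one of these six-letter motifs is the kind of paper trail described above.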
The most likely lab-origin theory based on public evidence, however, is that SARS-CoV-2 is an accident of pre-COVID gain-of-function research taking place in the WIV and funded by many international funders. Scientists are a chatty bunch. We send crazy speculative ideas to each-other, we brainstorm, and we leave a paper trail from receipts to conversations discussing where to add/remove restriction sites.
These emails/receipts can help us find the smoking gun, or rule out some scenarios.
Fabricated Sequences from China
Under a lab origin, especially scenarios where the Chinese government is aware of a lab origin (e.g. so aware that their military takes over the WIV), we really must consider counter-adversarial science. Even if the Chinese government had a lack of foreknowledge of COVID-19 sometime in late 2019, later investigations by the CCP may uncover a lab origin and uncovering a lab origin from e.g. a military biodefense lab would motivate a cover-up.
Countries lie when lies are beneficial for their national security. Global revelations of a lab origin of SARS-CoV-2 in China could be devastating for Chinese national security, and so lab-origin theories consider the possibility that China is lying to the world to avoid the devastation of being proven guilty for the lab origin & its coverup. Governments don’t just lie through press secretaries or Xi Jinping, they also lie through the media, through social media, and even through data or scientific media.
One way someone might try to cover-up a lab-origin of a virus with clear signs of genetic engineering would be to innocently produce fabricated “natural” genome sequences that make the genetic-engineering events of an engineered virus seem more “natural”. For example, if the Wuhan Institute of Virology suddenly published many SARS-CoV genomes with furin cleavage sites, that might make it look like furin cleavage sites are more common and so make the anomaly of SARS-CoV-2 having a furin cleavage site seem less anomalous.
However, what if I said the following was a sequence I found in nature:
ATGCATGCATGCTAGTCTAGCTCAGT
Would you believe me?
I just typed that sequence on my computer. That is a fabricated sequence - if I just wrote a long sequence & claimed it was a virus, you could test whether/not it was fabricated by seeing if you can make infectious clones of the virus-via-email. If you can’t create the infectious clone, and independent parties can’t find a relative of the virus in the same caves where I claim my sequence comes from, then we really don’t have any reason to believe the sequence I typed is real. If you find evidence that I’ve spoliated some evidence in my own criminal trial, that can & should be used against me in a court of law.
There is already some evidence suggesting that sequences coming from China may be fabricated and/or selectively released. In January 2020, the WIV released RaTG13 saying it was the closest sequence they had. However, the file containing RaTG13, which they propose was the output from sequencing a bat fecal swab, does not look like a bat fecal swab based on a clever analysis by Monali Rahalkar and Rahul Bahulikar. A high percentage of reads in the RaTG13 sample did not match anything (much like my A-T-C-Gibberish above), and the remaining sequences seemed to map to a wide variety of bat species with some even aligning to squirrels, foxes, and more. The bacterial reads of the swab were extremely low - about 0.7% of reads matched to bacteria.
I don’t know how much you know about poop, but one thing you need to know is that poop has a lot of bacteria. About 50% of the mass of any given piece of crap is bacteria, viruses are much smaller and far less numerous than bacteria, and so we expect bacteria to be more abundant than viruses in poop. Specifically, we expect 70-90% of fecal metagenomic reads to be bacterial sequences, not 0.7%. That, along with the foxes and squirrels and many species of bats, is weird.
That’s not all. Below, I’ve aligned 3 viruses to SARS-CoV-2 and plotted the number of mutations in 30bp windows along the whole genome. The solid vertical lines indicate the beginning and end of the Spike gene, and the dashed vertical lines indicate BsaI/BsmBI sites found in either SARS-CoV-2 or its close relatives. RaTG13 has several hotspots of mutations in BsaI+BsmBI sites - if one were trying to ‘hide’ a signal of mutations adding/removing BsaI/BsmBI sites, this is how you would do it. Notice, however, that BANAL-52, a closer relative of SARS-CoV-2 found by an independent group in Laos, does not have these hotspots of mutations, suggesting this ‘scribble’ of mutations in RaTG13 is unique to RaTG13, the genome released by the Wuhan Institute of Virology in January of 2020.
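The windowed mutation count described above can be sketched in a few lines. This assumes the two sequences are already aligned to the same length and uses consecutive, non-overlapping 30bp windows; whether the original plot used non-overlapping or sliding windows is not stated here, so treat that choice (and the function name) as an assumption. Gap characters are simply counted as mismatches in this sketch.

```python
def mutations_per_window(ref, other, window=30):
    """Count mismatched positions in consecutive windows of a pairwise alignment.

    ref, other: aligned sequences of equal length (gaps count as mismatches).
    Returns one count per window; the final window may be shorter.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    counts = []
    for start in range(0, len(ref), window):
        pairs = zip(ref[start:start + window], other[start:start + window])
        counts.append(sum(a != b for a, b in pairs))
    return counts


# Toy example with a window of 3 so the output is easy to check by eye.
print(mutations_per_window("AAAAAA", "AATAAT", window=3))  # [1, 1]
```

Plotting these counts against window start position, with vertical lines at the Spike boundaries and at the restriction sites, gives the kind of figure described in this section; a hotspot shows up as a run of windows with unusually high counts.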
The sequence on the bottom is RpYN06, a sequence published by many members of the Chinese Academy of Sciences and Eddie Holmes. The beginning of the hotspot of mutations in RpYN06 coincides almost perfectly with the start codon of the Spike gene and decays until a BsmBI site in SARS-CoV-2. Recombination events don’t care about start codons and BsmBI sites, but bioengineers do. While this observation is inconclusive and needs further investigation, it illustrates that there may be additional evidence lurking in possible fabricated sequences being released by Chinese groups and their close western colleagues (possibly without the western colleagues really knowing about any fabrication, but rather being used as a trojan western scientist - how much more suspicious would this sequence look if they didn’t have Eddie Holmes’ name on it?). At a minimum, the high likelihood of a lab origin warrants high skepticism when using sequences coming from groups directed by the Chinese government - governments lie, and it would be naive to assume they would not lie through data or sequences.
It’s interesting that RpYN06 has such a cleanly cut, recombinant Spike gene. The Wuhan Institute of Virology was very interested in recombining Spike genes, and their bioengineering research products would look exactly like RpYN06. They had a database of reportedly hundreds of CoVs that was taken offline, and they have not released the dataset. Instead, we see this Spike-recombinant RpYN06 presented as “natural”, selective leaks of CoV genomes that contrast with sleuths uncovering evidence of unreported MERS-like infectious clones with some potential evidence of S-gene manipulation. The MERS-like SARS CoV docked inside a BAC clone was not reported transparently but rather it was lurking as a contaminant of a rice sequencing project.
This body of evidence has me very skeptical about CoV sequences coming from China and very bullish on efforts to find evidence of unreported CoVs lurking as contaminants or finding anomalies in the “evolutionary” patterns of mutation & recombination in sequences published by Chinese researchers post-COVID versus those published pre-COVID or by independent groups.
Further discoveries suggesting fabrication of sequences would suggest consciousness of guilt and spoliation of our sequence read archives, likely being done with the approval or possibly under the direction of the Chinese government. A single fabricated dataset is a violation of scientific trust and can rightfully undermine a researcher’s entire career - how can you trust any further data they produce, if they would violate research ethics to lie for their own benefit? Evidence of an adversary poisoning sequence read archives to hide the origins of the pandemic would advance a lab origin with fairly historic consequences for science as well, revealing that even our sacred and life-saving scientific information system may be a playground for disinformation in the minds of some hostile foreign governments.
If, however, these sequences and the lineages they represent are corroborated by independent groups, if the 0.7% bacterial reads, the foxes, squirrels, and many species of bats with RaTG13 are all revealed to have a normal explanation, and if the other sequences like RpYN06 are shown to be representative of recombination events well-documented by independent researchers who lack a conflict of interest, then we will not have evidence of spoliation and that would somewhat weaken the case for a lab origin.
I want to emphasize that a counter-argument to spoliation of sequence archives is that the Chinese government is not entirely irrational and that the risks of spoliation efforts being discovered might be so high, and with such clear implications of Chinese government involvement in suppressing a lab origin, that it’s not worth the risk. One counter-argument to this game-theoretic assessment is that a rational government may also see that starting a pandemic, especially if caused by military biodefense/bioweapons research, could be absolutely catastrophic for national security as it would undermine global goodwill & good standing Xi Jinping has tried to build during his reign. One may also think that subtle efforts with plausible deniability and no (perceived) way of disproving their account may be worth the risk. If, for example, they found a clade of SARS CoVs unique to a cave in China that nobody else could sample, they sequenced the virus and then eradicated bats in the cave, they might see this new sequence as an opportunity to add just a little bit of noise to drown out evolutionary signals, and they might think that their methods for adding this little bit of noise would be impossible to detect, especially if they could get Western virologists’ names on the papers (how could you accuse a famous Western virologist of fabricating a genome??). In other words, given the potentially massive risks, one can imagine a rational actor believing scientific disinformation is worth the risk, especially if that risk is taken in a way they try to carefully control.
If SARS-CoV-2 originated from a Chinese military biodefense/bioweapons lab, a lab the Chinese military later took over, and the virus later killed 18 million people at a sensitive time when the Chinese government is seeking to expand its regional influence, desperate times may call for desperate measures, and desperate criminals make stupid mistakes. The instant a lab-origin is on the table alongside a cover-up, the scientific questions investigating the origins of SARS-CoV-2 are not entirely scientific, but are also necessarily forensic and counter-adversarial, requiring a degree of distrust and cross-examination of each other’s data that the jovial and collegial and interconnected scientific community is not used to. If, however, SARS-CoV-2 is zoonotic, then all my rationalizing here is irrelevant, and some scientists will hate me for even thinking like this and making such harmful allegations (personally, I’m not making an allegation but rather entertaining suspicions). It will probably make me look a bit crazy until a pandemic virus does originate in a military biodefense lab and there is an effort to spoliate or poison scientific databases (in case such efforts of scientific spoliation and data poisoning have not already occurred… in any case, I believe our scientific systems should be evaluated from a perspective of national security to protect against such risks).
Wet Lab Experiments
While I’m a quant guy these days, my mom was a wet lab molecular biologist and I started my scientific career in the lab. The lab & the field are where data are generated and scientific theories are put on the chopping block - I love it. There are many opportunities for wet lab research to advance our understanding of SARS-CoV-2 origins.
Researchers can test the veracity of sequences above by recreating the viruses in the lab. Is RaTG13 even a real virus? Are viral genomes released by Chinese researchers post-COVID more difficult (or impossible?) to recover in vitro compared to similar bat CoVs pre-COVID? Are they better able to infect humanized mice? Such work may provide evidence of sequence fabrication and/or the publication of bioengineered research products with enhanced human-infectivity being presented as “natural” viruses to make SARS-CoV-2’s Spike-gene recombination and heightened human-infectivity appear “natural”.
Researchers can test whether or not the BsaI/BsmBI restriction map of SARS-CoV-2 allows one to create a reverse genetic system for SARS-CoV-2. Is SARS-CoV-2 truly a reverse genetic system, or are there toxic elements in the genome that prevent one from making this system in the lab & increase the odds our BsaI/BsmBI map is an artifact of chance? This experiment, done well, could reject our hypothesis that SARS-CoV-2 is an infectious clone produced by BsaI/BsmBI type II directional assembly. That would take one big piece of evidence away from a lab-origin & make a zoonotic origin theory more competitive (but still less-likely).
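For readers who want to see what a "restriction map" means computationally, here is a minimal Python sketch. It is not the pipeline from our analysis. The recognition sequences (GGTCTC for BsaI, CGTCTC for BsmBI) are the standard ones, but the function names and the toy sequence are made up for illustration, and I simplify by treating the start of each recognition site as the cut position (real type IIS enzymes cut a few bases outside the site).

```python
import re

# Standard recognition sequences (5'->3') for the two type IIS enzymes.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

def revcomp(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def site_positions(genome, sites=SITES):
    """Sorted 0-based start positions of every recognition site on either strand."""
    genome = genome.upper()
    positions = set()
    for motif in sites.values():
        for pattern in (motif, revcomp(motif)):
            # A lookahead lets finditer report overlapping occurrences too.
            positions.update(m.start() for m in re.finditer(f"(?={pattern})", genome))
    return sorted(positions)

def fragment_lengths(genome, positions):
    """Lengths of the pieces between consecutive sites on a linear genome.

    Simplification (assumption): the site start is used as the cut position.
    """
    cuts = [0] + list(positions) + [len(genome)]
    return [b - a for a, b in zip(cuts, cuts[1:])]

# Hypothetical toy sequence, not a real viral genome.
toy = "AAAA" + "GGTCTC" + "TTTTTTTT" + "GAGACG" + "CC"
positions = site_positions(toy)
print(positions, fragment_lengths(toy, positions))
```

The quantities one would then compare across genomes are the number of sites and the length of the longest fragment. The wet-lab question above is whether fragments laid out this way can actually be assembled and recovered.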
What happens if you remove an FCS from SARS-CoV-2 - how much does that reduce the viral titers in humanized mice? A risky GoF experiment of counter-intelligence value: what happens if you add an FCS to a bat SARS-CoV? How much does this increase the viral titers of the virus in humanized mice? Does this effort help one recreate the 100-10,000x difference in viral titers in humanized mice between wild-type and recombinant SARS-CoVs produced at the WIV and documented in the 2018 NIAID progress report?
Of note, scientists have already done an experiment on an FCS deletion with mixed results. After deleting the FCS (‘delta_PRRA’), viral titers in human cells were 2 orders of magnitude lower. In Figure 3, the authors report finding “7-9 N gene copies per mg” in the lungs of mice infected with the wild-type SARS-CoV-2 and “2-4 N gene copies per mg” less when the FCS is deleted, but that doesn’t sound right and I suspect it’s a minor error. Typically researchers find 10 million (10^7) to 1 billion (10^9) copies of a virus per mg, not 7-9, and the method the authors cite traces back to other studies that also report 10 million to 1 billion copies of the virus per mg. I wrote the authors and they confirmed my hunch that their y-axes should be on a log-10 scale. In other words, when they removed the FCS, viral titers were reduced 100-10,000x in humanized mice, consistent with the opposing 100-10,000x increases found in the recombinant SARS-CoVs reported to NIAID by the Wuhan Institute of Virology and EcoHealth Alliance in 2018. This finding lends weight to the hypothesis that EcoHealth Alliance and the Wuhan Institute of Virology achieved their 100-10,000x increases in viral titers in humanized mice by adding a furin cleavage site, exactly like they proposed to do in DEFUSE, since removing the FCS from SARS-CoV-2 has an equal & opposite effect.
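The arithmetic behind the log-10 reading is simple enough to spell out. A difference of d units on a log-10 axis is a 10^d fold change, so a drop of 2-4 units corresponds to a 100-10,000x reduction. A minimal sketch (the function name is mine, purely for illustration):

```python
def fold_change(log10_difference):
    """Fold change implied by a difference of `log10_difference` on a log-10 axis."""
    return 10 ** log10_difference

# A 2-4 unit drop on a log-10 axis, as in the corrected reading of the figure.
low, high = fold_change(2), fold_change(4)
print(f"{low}x to {high}x reduction")

# Likewise, axis values of 7 and 9 mean 10^7 and 10^9 copies per mg.
print(fold_change(7), fold_change(9))
```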
There are additional experiments & field studies we can do to better estimate the odds of a furin cleavage site in a SARS-CoV. Does deep sequencing of bat SARS-CoVs reveal any rare furin cleavage sites? Paul Bieniasz proposed that “the improbable becomes probable in large populations” - when we have 10^9 viruses per mg in bats (not 7-9) and sequence the shit out of their populations, do we find furin cleavage sites at low abundance? If so, where are these sites? Are they disproportionately in the S1/S2 junction as proposed by DEFUSE, or are they randomly scattered throughout the genome? Related - where are the recombination hotspots of the SARS CoV genomes?
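As a rough illustration of what "looking for furin cleavage sites" could mean in a deep-sequencing dataset, here is a minimal Python sketch that scans translated protein sequences for the minimal furin motif R-X-X-R and records where each hit falls. Everything here is an assumption for illustration: the function name, the use of the minimal motif alone (a real analysis would use a better cleavage predictor and account for sequencing error), and the example peptide, which resembles the S1/S2 region of SARS-CoV-2 Spike but should be checked against a reference sequence before being relied on.

```python
import re

# Minimal furin recognition motif: Arg, any two residues, Arg (R-X-X-R).
# Assumption: motif presence is only a crude proxy for a functional cleavage site.
FURIN_MOTIF = re.compile(r"(?=(R..R))")

def furin_motif_hits(protein):
    """Return (0-based position, matched residues) for every R-X-X-R in a protein."""
    return [(m.start(), m.group(1)) for m in FURIN_MOTIF.finditer(protein.upper())]

# Illustrative peptide resembling the S1/S2 region of SARS-CoV-2 Spike.
example = "QTQTNSPRRARSVASQSI"
print(furin_motif_hits(example))
```

Tallying hit positions across many reads would show whether candidate sites cluster near S1/S2 or are scattered across the genome, which is the comparison asked for above.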
Some of these have already been done to some extent, and I can think of a lot more experiments still to be done - this is science at its finest. An experiment could reject my own “cherished” theory of a BsaI/BsmBI-based reverse genetic system, and I love that because experiments keep us honest. As an aside, I’d argue the theory isn’t very cherished if I would celebrate it being rejected by a good experiment - the goal of an honest theoretician is to translate theories into some form that they can be tested, pass the ball to empiricists, watch them score the goal, and celebrate the game regardless which theory wins. Some of these experiments are risky, including the insertion of a furin cleavage site in a SARS CoV or serial passaging… these are basically the same experiments that lab-origin theories believe created this pandemic. I’m not saying we should do them, but I am pointing out what we could learn from them. These risky experiments may have a one-time benefit not measurable in terms of health, but counterintelligence to understand the possibility that SARS-CoV-2 leaked from a lab and the Chinese government tried to cover it up.
These experiments could go against a lab-origin. We could find that viral genomes produced by the WIV and Chinese Academy groups post-COVID are indistinguishable in their recoverability & human ACE2 binding compared to SARS-CoVs uncovered by independent groups & those published pre-COVID. We could find Dr. Bieniasz is right and that FCS’s lurk at low abundance in bat SARS-CoVs and that such FCS’s typically arise in the S1/S2 junction as proposed in DEFUSE, lowering the anomalous nature of the FCS found in SARS-CoV-2. The BsaI/BsmBI map could be a coincidence, and toxic elements prohibiting bacterial cloning would disprove my own beloved theory of a synthetic origin of SARS-CoV-2 (that would be awesome!!). Maybe removing the FCS from SARS-CoV-2 doesn’t reduce transmissibility very much, and adding FCS’s to bat SARS-CoVs doesn’t increase transmissibility very much. Maybe the WIV progress report’s 100-10,000x higher viral titers can be replicated by swapping Spike genes without needing to insert an FCS at all. Empirical findings such as these could lead us to update our estimated odds of the genomic and epidemiological anomalies of SARS-CoV-2, decreasing the Bayes factors favoring a lab origin. While not ruling out a lab origin, such results would change the Bayes factors enough to increase my own uncertainty about the origin of SARS-CoV-2. They would also speak the language that best improves scientific discourse: data.
Based on what we know now, however, I think the more likely result from these experiments is evidence accumulating in favor of a lab origin. Hence, it’s not so bold for me to invite tests of a theory I believe has a high probability of passing the test (especially if I would celebrate being wrong).
Genomic Analyses of SARS-CoV-2
The genome of SARS-CoV-2 has been available for 3 years and has been looked at from almost every possible angle by hundreds of thousands if not millions of researchers. Yet, there may still be more there. Our October 2022 analysis of the BsaI/BsmBI sites, for example, revealed that there may still be some important findings lurking in the cryptic alphabet soup. Just as analysis of restriction maps contextualized the BsaI/BsmBI sites hidden in plain sight, I hypothesize that analyses looking at the sites & sizes of recombination events in SARS-CoV and other CoV genomes may provide crucial context for the recombination events seen in SARS-CoV-2.
There may be more to read in the genome to assess various bioengineering hypotheses. The BsaXI recognition sequence surrounding the furin cleavage site may provide a means of inserting the furin cleavage site. Was this site inserted with silent mutations like the BsaI/BsmBI sites? Can you easily add/remove FCS’s with BsaXI?
There are other ways a counter-adversarial skeptic could analyze the SARS-CoV-2 genome through what I call a “telltale heart” approach. If there was a crime & cover-up, then focusing on the cover-up may reveal otherwise hidden details about the crime. Here, potentially (or hypothetically) fabricated sequences released by the primary suspects could be used to prioritize genomic analyses of SARS-CoV-2. One might look for mutation hotspots in RaTG13 not found in other CoVs and examine whether or not those hotspots correspond to genomic tags or restriction sites of relevance for synthesizing SARS-CoV-2. Similarly, a telltale-heart approach could look at RpYN06 and other viruses released by the WIV + Chinese Academy post-COVID, identify their anomalies, and then ask if SARS-CoV-2 is anomalous in the same or similar way. If the data are poisoned to hide something, use anomalies in hypothetically poisoned data to see if anything else is hidden in SARS-CoV-2.
I wasn’t born a counter-adversarial skeptic. However, I grew up in a world of gang violence and criminalization, caught between the rocks of rival gangs and the hard place of police that saw us all as criminals no matter what we did. This upbringing made me rather counter-adversarial: where are cops searching, and how can I avoid them? What alibis can one create to provide plausible deniability or poison authorities with doubt? What can we anticipate about the efforts of rivals with mad beef, and how can we stay one step ahead? Lots of stories there, but not for this article.
I’m no longer a criminalized kid, just a boring old white scientist, but old habits die hard. I believe a lab origin is most likely, by a lot, that the Chinese government most likely covered it up, and that the Chinese government would be desperate and make mistakes in efforts to poison genomic data to cover its tracks. Consequently, if I were allocating my investment or attention, or if I were betting on where one might find evidence, I bet a free world full of clever forensic analysts would be capable of finding more broom strokes leading to the cellar that makes the CCP’s telltale heart beat. This skepticism must be balanced with US intelligence agencies claiming Chinese officials had a lack of foreknowledge of the outbreak, but such claims must in turn be balanced with the possibility of a cryptic lab origin, or an earlier epidemiological anomaly thought to be contained but later causing a surge of cases, later discovery that the viral genome is descended from a virus in a lab, and later efforts to take over the Wuhan Institute of Virology, delete genomes & attempt a cover-up. The SARS-CoV-2 genome exists like a hieroglyph-filled wall inside a pyramid, with many opportunities to study what the deoxyribonucleic hieroglyphs might indicate about the origins of SARS-CoV-2.
Animal Reservoirs
Animal reservoirs are the main line of inquiry for a zoonotic origin. After all, zoonotic origin theorists don’t put any stock in CoV workers getting sick, the Chinese military taking over the WIV, the databases/emails/notebooks of researchers studying CoVs in Wuhan, the genomic anomalies of SARS-CoV-2, the possibility of data-poisoning from China, or the likelihood that SARS-CoV-2 is unnatural in any way that can be better understood in the lab. I wish I had the innocence to see the world that way, but it’s important to be able to see the world through their eyes to ensure we aren’t missing anything in making our own conclusions.
If SARS-CoV-2 were zoonotic, zoonotic-origin researchers could, in theory, boost their case with serosurveys of animal traders and lab workers studying CoVs in Wuhan. However, by now, the seropositivity of everyone in China today is likely more heavily impacted by COVID-19 infections during the pandemic than spillovers or incidence in the early outbreak. Sigh. How easy would that have been for China, with its military having control of the WIV and *knowing* it wasn’t a lab leak, to just run a serosurvey of workers there to remove all doubt? Even in January 2020, when suspicions of a lab origin first emerged in Western media outlets, the Chinese government claiming a successful lockdown strategy could’ve just run an <$5,000 serosurvey of workers at the WIV and Wuhan CDC to disprove a lab origin. Why they didn’t do this, despite a massive effort to contain the virus easily supporting the logistics, is a mystery. Sigh. I guess we’ll never know…
All that really remains for the zoonotic origin theory is to find an actual animal reservoir containing a virus that is a closer relative of SARS-CoV-2 than BANAL52 and explains-away many of the genomic & geographic anomalies. Best case scenario, they find a SARS-CoV with a furin cleavage site with at least one CGG in a bat, raccoon dog, or cat somewhere in SE Asia, this SARS-CoV shares a common ancestor with SARS-CoV-2 ~4-8 years ago, it has some of the modified BsaI/BsmBI sites in SARS-CoV-2 reducing the anomaly of the restriction map in SARS-CoV-2, and it can be independently corroborated without any worries that e.g. it was injected into bats by groups with a conflict of interest or itself a fabricated genome that doesn’t exist in nature. Depending on the features of this SARS-CoV genome and the independence, reliability, and strength of corroboration, such evidence has the capacity to make me completely switch my belief to thinking a zoonotic origin is more likely.
However, I think a zoonotic origin is extremely unlikely for reasons I’ve detailed. I would not tell any friend or family member to waste their lives searching for raccoon dogs with SARS CoVs in hopes of finding a progenitor. People have already searched raccoon dogs and found nothing. They may find other things of interest, other cool viruses and whatnot, but they aren’t likely to find a progenitor and I’m not likely to find a UFO in Montana, because I believe UFO’s most likely don’t exist (or have other, benign explanations we can’t rule out) and I believe SARS-CoV-2 most likely came from a lab.
Instead of finding some progenitor, I believe it’s more likely that independent groups uncovering more wild SARS CoVs, like BANAL52, may increase the anomalous nature of SARS-CoV-2, increase the anomalies of SARS CoVs produced by Chinese groups post-COVID (e.g. the mutational hotspots of RaTG13 that are missing in the closer relative BANAL52), increase the weight of evidence of fabricated genomes or data poisoning and all of this would increase the seriousness of the breach of global scientific trust by an authoritarian nation that most likely ran labs creating a virus that killed 18 million people, and then sought to cover it up by a disinformation campaign that polluted scientific media.
What to listen for
I’m going to listen closely for the declassified US intelligence to see if it clarifies timelines & localizes which researchers were sick, what the Chinese military did when, and I can think of many follow-up studies one could run in response to this information but I’m not going to say it publicly because I don’t want to risk giving away details to the Chinese government.
I’m going to listen for emails from key parties to either confirm DEFUSE-related research activities up to & following the DEFUSE grant submission, or possibly to rule out some people and labs.
I’m very interested in whole-genomes. Additional studies claiming seropositivity or PCR+ will not really move the needle, but if someone found a whole-genome sequence from early December 2019 I would fall out of my chair.
I’m listening closely to analyses & assessments of sequence veracity, listening for unusual evolutionary events (mutation, recombination), analyses of evolutionary dynamics from sequences based on whether/not the downloading parties (or governments able to screen or push their research) have a COI on SARS-CoV-2 origins, and whether any lab studies can corroborate or cast doubt on the veracity of these sequences.
I’m listening closely for other wet-lab studies, including responses from the authors who looked at FCS deletions in humanized mice. I’m also listening closely for field studies from authors who don’t have major COIs on this topic, eagerly awaiting novel viral genomes they discover and news about the hosts in which they’re discovered.
I like to say I’m a “deaf guy, great listener”. Since I don’t know sign language, strategic listening is absolutely necessary for me to be a functional member of society, and it’s made me a Bayesian of sorts since kindergarten. The wise scientist should, in my opinion, be a Bayesian brain not only updating their beliefs on competing theories with sharp-penciled estimates of Bayes factors, but also by preferentially allocating attention and effort to uncover or ‘hear’ evidence with the highest expected Bayes factors (a combination of their Bayes factors, the likelihood evidence exists, and the likelihood we can obtain it).
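To make "expected Bayes factor" concrete, here is a minimal Python sketch of one way to rank lines of evidence. The scoring rule is my own simplifying assumption, not an established formula: the probability the evidence exists, times the probability we can obtain it, times the absolute log-10 Bayes factor it would carry if found. The class name and every number below are hypothetical placeholders, not estimates.

```python
import math
from dataclasses import dataclass

@dataclass
class Evidence:
    name: str
    bayes_factor: float   # likelihood ratio the evidence would carry if found
    p_exists: float       # probability the evidence exists at all
    p_obtainable: float   # probability we could actually get it

    def expected_weight(self):
        """Discounted weight of evidence in log-10 units (assumed scoring rule).

        abs() is used because strong evidence in either direction is worth hearing.
        """
        return self.p_exists * self.p_obtainable * abs(math.log10(self.bayes_factor))

def rank(items):
    """Order evidence from most to least worth the attention."""
    return sorted(items, key=lambda e: e.expected_weight(), reverse=True)

# Placeholder numbers for illustration only.
candidates = [
    Evidence("another PCR+ report", bayes_factor=1.5, p_exists=0.9, p_obtainable=0.9),
    Evidence("early whole-genome sequence", bayes_factor=1000, p_exists=0.3, p_obtainable=0.5),
]
for e in rank(candidates):
    print(f"{e.name}: {e.expected_weight():.2f}")
```

With these placeholders, the rare but decisive whole genome outranks the common but weak PCR+ report. This is the same logic as catching the few words that disambiguate a sentence.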
I’ll keep my hearing aids to the ground - please let me know if you think of other important evidence to listen for and share any insights you may have about the evidence or reasoning discussed above!
Hope you're doing well Alex, glad you're keeping up the fight. It's unrelated to your current work, but if you have not already, I have to recommend reading Kevin's (Mckernan's) new sequencing experiments.
It's stunning how those in the know inexplicably shoehorn matters to cover their arses. I guess it's a political 'skill'. However, if that alone does not offer up options to elicit a range of conspiracies and is just er, umm, then a series of coincidences, I'm a bloody Alien. To me it reeks of a stupid ambitious 'accident' or at the very least a scam of mind-boggling incompetence, of wayward ambition and hubris, and of course of avarice, whereby, I do see it also as a big enough $$$ pile-in to get Billy-bob out of bed in the morning - even if he was a late-entry Machiavellian rogue to the 'party'. I guess all of such will do anything to deflect blame or connectivity, as it is that big of a sordid mess.