Can the controversial COVID genome database survive?

During the COVID-19 pandemic, one online platform emerged as the main repository for viral genome data. GISAID, an initiative launched in 2008 to improve the global sharing of influenza data, earned the trust of scientists by ensuring that they would be credited for the data they generated. It now hosts more than 15 million SARS-CoV-2 genome sequences, more than any other existing database, as well as around 2 million influenza sequences.

But several scientists have raised concerns about the platform’s lack of transparency, how it mediates disputes over credit and how it sanctions scientists who have allegedly violated its terms and conditions.

Given the importance of pathogen-genome data for tracking the emergence of viral variants and developing strategies and vaccines to combat them, it is crucial to discuss how researchers can continue to share these sequences going forward. Nature spoke with scientists in eight countries about what they see as the future of pathogen-genome sharing, and of GISAID.

The researchers acknowledged the key part that GISAID has played during the COVID-19 pandemic and stressed the need to preserve the platform. But many of them said that drastic changes are necessary, particularly in light of an investigation by Science in April into GISAID’s founder, Peter Bogner, who plays a large role in the platform’s operations. “GISAID has lost enormous credibility in recent weeks and appears to be untenable in its current form with its current leadership,” says Edward Holmes, a virologist at the University of Sydney in Australia. The scientists also discussed the limitations of other models of genome data sharing and how these limitations mainly affect academics in low-income countries.

In a statement sent to Nature, GISAID says the organization recognizes the need to transform. “Over the coming weeks and months, we will be announcing additional steps the initiative will be taking regarding our work towards our governance and related practices,” the statement says.

What is the origin of GISAID?

There are many data-sharing platforms that allow scientists to quickly disseminate genome-sequence data to the research community. Some of them are public-domain databases which provide unrestricted access to data. The largest include GenBank in the United States, the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan, which are all part of the International Nucleotide Sequence Database Collaboration.

Once a genome is available on one of these sites, almost any scientist can access and analyse the data. Although this lack of restrictions might be good for science, it discourages some researchers from sharing their data until they’ve had sufficient opportunity to publish their own findings. Anderson Brito, a virologist at the All for Health Institute (ITpS) in São Paulo, Brazil, says that researchers in low-income countries are particularly vulnerable. “When research groups from these countries submit their data to fully open databases, well-resourced groups may quickly analyse and take most of the credit,” he says.

Laboratory technicians work on the genome sequencing of the SARS-CoV-2 virus at the Pasteur Institute in Paris.

GISAID provided a mechanism to help prevent scientists from ‘scooping’ one another. Credit: Christophe Archambault/AFP via Getty

GISAID was conceived during the global spread of avian influenza in 2006 as another model for sharing genome data, one that ensured due credit [1].

The acronym originally stood for the Global Initiative on Sharing Avian Influenza Data, but its mission expanded to encompass all influenza viruses and, eventually, other pathogens. In contrast to the fully open platforms, the data shared on GISAID are available to registered users who agree not to republish the sequences without permission. And those who wish to publish analyses of data housed within GISAID must offer to collaborate with the scientists who produced the sequences.

Many academics also saw the platform as a way of creating equitable access to the broader spoils of genome research. GISAID supporters argued that if scientists from low- or middle-income countries provided influenza sequence data that enabled the development of an improved vaccine, for example, then they should be compensated; or, at the very least, their nations should have guaranteed access to the resulting vaccine.

GISAID never offered a mechanism to make that happen, but the arguments were important to many countries. Indonesia, for example, stopped sharing virus samples in 2007 amidst a deadly avian influenza outbreak in protest of the limited access to vaccines many countries face. It reversed that position in 2008 when it reportedly began sharing influenza data with GISAID.

In the early days of the COVID-19 pandemic, GISAID aimed to be the main repository for SARS-CoV-2 data. The GISAID team proactively reached out to politicians around the world to garner support, and also to researchers to provide training and resources. It quickly gained popularity.

Why is GISAID under scrutiny?

GISAID originally intended to enforce data sharing limits only temporarily. Bogner, an entrepreneur based in Santa Monica, California, and the scientists who first presented the idea stated that the sequences submitted to GISAID would be deposited in fully open data repositories such as GenBank, “with a maximum delay of six months”.

That promise was never fulfilled, however. GISAID lacks a mechanism to move the data to open-access databases, and that’s one of its main flaws according to critics. In its response to Nature, GISAID said it is already accessible to the public and therefore it doesn’t plan to provide a mechanism to transfer data. Researchers could, in theory, upload their sequences to an open database after depositing them on GISAID, but that would mean double the work. In practice, data get trapped under strong restrictions indefinitely. This motivated a group of scientists in 2021 to publish an open letter urging other academics to publish their SARS-CoV-2 genome data on fully open-access platforms.

Many scientists also say that the way GISAID controls access for different users seems to be arbitrary. Vinod Scaria and Bani Jolly, biologists at the CSIR Institute of Genomics and Integrative Biology in New Delhi, say that even within their own laboratory, users have different levels of access to the data. GISAID says that “everyone with valid access credentials has access to all data shared via GISAID,” but that “there are additional services that GISAID may provide to individuals who are affiliated with organizations with a proven record of using data under the proper and agreed upon guidelines.”

Some have pointed out that GISAID has been, at times, unfair in applying sanctions to users. Theo Sanderson, a pathogen genome researcher at the Francis Crick Institute in London, says that there have been “profound failures to date, in which access to data seems to have been withdrawn to punish users for perceived slights to GISAID”. Such slights might include public criticism or failure to acknowledge GISAID’s contributions. According to the organization, “in the vast majority of cases where a user’s access to GISAID was temporarily suspended, the user had materially breached GISAID’s terms of access.”

A vendor wearing a mask sells live turtles on Xihua Farmer's Market in Guangzhou, Guangdong province, China, 04 May 2020.

A dispute arose around access to sequence data from a market that sold live animals. Credit: Alex Plavevski/EPA-EFE/Shutterstock

In March, GISAID temporarily revoked access to the platform for a group of scientists involved in trying to uncover the origins of the COVID-19 pandemic. They had published an online report describing genomic data [2] from swabs taken throughout the seafood market in Wuhan, China, where one of the first large outbreaks of the disease had been recorded. The platform said the publication violated its terms of use. The authors have disputed this, saying that they had credited the scientists who provided the data and reached out to them with an offer to collaborate.

“The GISAID compliance policy seems to have grey areas, which ultimately raise more questions and doubts,” wrote Scaria and Jolly in a joint statement to Nature.

The Science story describes episodes of scientists losing access to GISAID in apparent retaliation for public criticism of the platform, claims that GISAID denies. It also discusses Bogner’s unconventional background and financial disputes with a vendor providing services to GISAID. “The revelations in the article are pretty shocking. And I do think that they highlight some real concerns about GISAID, and in particular GISAID’s governance,” says Emma Hodcroft, a molecular epidemiologist at the University of Bern.

GISAID says the organization has “never taken any retaliatory action or imposed suspensions” for any reason other than violations of its terms and conditions. It adds that its processes could be improved: “As part of our steps to address governance and operating structure, we consider it a top priority to ensure a fair process for appeals and address the concerns of our users around suspensions.”

GISAID’s governance page lists a scientific advisory council of 12 people, but it is unclear how the council operates. In practice, according to sources close to the platform, Bogner appears to make most decisions. “If you have massively important data sets that have untransparent governance structures that allow for unpredictable retaliation and erratic behaviour, that’s just not the way that a level playing field should work,” says biologist Amber Hartman Scholz, head of the science policy department at Leibniz Institute DSMZ, a collection of microorganisms and cell cultures, in Brunswick, Germany.

Why don’t scientists switch to another platform?

If some researchers are dissatisfied with GISAID’s governance, why don’t they simply start uploading their viral sequences to other existing databases?

One of the reasons is that it’s easier to upload data to GISAID. “Data submitters often praise its submission system, which is more intuitive and flexible than what is offered by other databases,” Brito says. “GISAID also invests in expert data curators, which — in close interaction with data submitters — help ensure the quality of the data.”

Another reason is that scientists want their data to be where everyone else’s sequences are. “My sequences are most useful when they can be combined with sequences that came from somewhere else in my country or the world,” says Hodcroft.

Furthermore, the fear of uploading data to open-access databases and being scooped — having other scientists publish studies using those sequences first — is real. “Whether we like it or not,” says Hodcroft, “the currency of science is publication.”

What are the possible ways forward?

Despite the problems, the scientists agree that GISAID is likely to remain an important resource for viral genomes. “It’s in the best interest of public health that GISAID and the innovations it has promoted over the past years survive,” says Brito. Jeremy Kamil, a virologist at LSU Health Shreveport in Louisiana, agrees. “To lose GISAID would be unequivocally a tragedy.”

But if GISAID is to have a future, it requires drastic reform and new leadership, its critics say. “A new and transparent governance structure is required, and one that should not be at the whim of a single individual,” says Holmes. That would translate into having clear criteria on who gets access to the platform and which actions would result in sanctions.

The platform also needs to provide mechanisms for scientists submitting sequences to release their data to public-domain archives if they want to do so, says Sanderson. Furthermore, the scientists urge GISAID to acknowledge when the platform collects data from public data sets and to clearly identify those sequences. Currently, scientists who download data from multiple platforms can’t verify whether they are working with duplicate sequences.

To implement those changes, GISAID would ideally need to bring in outside experts in ethics and governance, says Kamil. That would require Bogner to “hand over the keys and back off”, Kamil says.

In its statement, GISAID says the organization recognizes the need to evaluate aspects of its governance, but it didn’t provide details around potential changes. “I expect that funding agencies will put pressure to improve, significantly, the governance of such an important institution,” says Gustavo Palacios, a virologist at the Icahn School of Medicine at Mount Sinai in New York.

The only current funder listed on GISAID’s website is The Rockefeller Foundation. Although representatives there did not respond directly to Nature’s questions, they sent a statement by Bruce Gellin, chief of global public health strategy, based in Washington DC, noting that GISAID was one of the few platforms making the rapid sharing of genome data possible in the early days of the pandemic. “The Rockefeller Foundation believes global data sharing platforms should be based on principles of trust, collaboration, and discussion,” the statement says. “To support this, data generators, users, platforms, and expert groups must be accountable and transparent.”

If GISAID doesn’t change, some scientists think that a new organization might have to be built from scratch, but that such a database shouldn’t be very different from the current one. “From a platform perspective, it did actually do a pretty good job in addressing that balance between rapid and open sharing versus retention of some form of ownership over the data,” says Richard Webby, an infectious-disease scientist at St. Jude Children’s Research Hospital in Memphis, Tennessee, and a former member of GISAID’s scientific advisory council.

Ewan Birney, joint director of the European Molecular Biology Laboratory’s European Bioinformatics Institute in Cambridge, UK, which runs the open data platform ENA, says that a pathogen database endorsed by the World Health Organization is “an important part of the future.” Such a platform would “give assurance to low- and middle-income countries that they can share data in a controlled way for public health, while still having assurances that rules will be followed”, he says. “And then there should be a time at which such public-health data is transitioned to fully open data so that researchers can look at the data as a whole.”

The possibility that fully accessible data platforms completely take over seems unlikely for the immediate future because of scientists’ concerns about authorship. Birney says that open platforms such as ENA can recommend and encourage proper citation, but they don’t have the mechanisms to enforce it. “Rather, we think that’s better done through similar things to patent law, an international law where access and benefits are controlled.”

Ultimately, Holmes says, the goal should be to do everything possible to encourage people to share their data. “The key lesson from the COVID-19 pandemic is that data sharing is the single most important thing we can do to help prevent and control pandemics.”
