Protein production and structure determination
To enable structural study of the C-terminal domains of the SARS-CoV-2 nsp3, we designed multiple CoV-Y-containing constructs and examined their protein expression level as well as folding propensity and stability. We previously described a protocol for expression and purification of a CoV-Y-containing construct (residues 1638–1945) that exhibits well-dispersed signals in NMR spectra consistent with a folded protein25. Still, this construct has a largely unstructured N terminus, evidenced by a high degree of spectral crowding in the central region of the two-dimensional (2D) 1H–15N-TROSY spectrum. As a result, a shorter construct (residues 1660–1945) was designed and used for subsequent NMR and X-ray crystallography studies (Supplementary Fig. S1). This construct was expressed in Escherichia coli and the resulting recombinant protein is monomeric in solution as determined by size-exclusion chromatography (Supplementary Fig. S1c). The 1H–15N TROSY spectrum of the protein exhibits an extremely well-dispersed set of resonances, consistent with a monomeric protein of ~ 32 kDa, and nearly complete backbone assignment was achieved26. We also succeeded in obtaining diffraction-quality crystals of this construct. The CoV-Y crystals contain one protein molecule per asymmetric unit. The structure was determined and refined at a resolution of 2.4 Å by multiple-wavelength anomalous dispersion using data collected from a selenomethionine-substituted crystal (Table 1). The completed electron density map allowed unambiguous tracing of most of the protein except the five N-terminal residues (residues 1660–1664) (Supplementary Fig. S2). Crystal packing analyses of the symmetry-related molecules confirm that the CoV-Y construct used in this study is a monomer (Supplementary Fig. S3). The final structure was refined to an R factor of 19.2% and an Rfree of 23.2% with good stereochemistry (Table 1).
Overall structure of the nsp3 CoV-Y domain
The CoV-Y domain of SARS-CoV-2 nsp3 has a twisted structure resembling a letter V with linear dimensions of approximately 60 Å × 50 Å × 32 Å (Fig. 1b,c). The protein is organized into three distinct subdomains (Y2, Y3 and Y4) with a deep cleft in the middle; previous proteomic analysis suggested that the CoV-Y domain of SARS-CoV nsp3 consists of two subdomains27. The N-terminal Y2 subdomain (residues 1665–1763) is dominated by six β strands (S21–S26) arranged into two nearly orthogonal β sheets (S21↑S26↑S25↑ on one sheet and S22↑S23↓S24↑ on the other) to form a β sandwich-like structure packed across a hydrophobic core (Fig. 2a). A 16-residue α helix (H23) connecting strands S25 and S26 stacks on the exposed side of the parallel sheet, whereas three short 3₁₀ helices (H21, H22 and H24) surround the open side of the β sandwich.
The CoV-Y domain comprises three distinct subdomains. (a) Ribbon diagram of the Y2 subdomain. Residues lining the internal hydrophobic core are shown in stick representation. (b) Ribbon diagram of the Y3 subdomain. A cluster of hydrophobic and aromatic residues in the central core of the helix bundle is shown in stick representation. (c) Ribbon diagram of the Y4 subdomain. The hydrophobic residues in its core are shown in stick representation. (d) Secondary structure propensity of the CoV-Y domain in solution derived using TALOS-N58 as shown previously26. Probabilities of occurrence for helices [P(α)] and β-sheet [P(β)] are shown in red and blue, respectively. The corresponding secondary structure elements from the X-ray structure of the CoV-Y domain are shown across the top as cylinders (helices) and arrows (β strands).
The middle Y3 subdomain (residues 1764–1847) adopts a compact α helical fold composed of a three-helix bundle (H31–H33) packing against a fourth helix (H34) with an axial tilt of ~ 60° (Fig. 2b). The antiparallel helix bundle has a down-up-down topology, in which the three helices interact over four turns and bury 2729 Å², equivalent to 49% of their combined total accessible surface area. A cluster of aromatic side chains (Phe1773, Tyr1776, Phe1780, Phe1784, Phe1815 and Phe1823) forms a twisted ladder in the central core of the helix bundle, stabilizing interactions between the three helices. All four helices of the Y3 subdomain are largely amphipathic, with polar and charged residues on the outside and nonpolar side chains buried inside (Fig. 2b).
The C-terminal Y4 subdomain (residues 1848–1945) consists of an α/β fold with two three-stranded β sheets (S41↑S45↑S44↑ and S42↓S43↓S46↑) in the center flanked by three α helices (H42–H44) and a 3₁₀ helix (H41) (Fig. 2c). The two β sheets run alongside each other with a tilt of ~ 30° and are stabilized mainly by electrostatic and side-chain hydrogen-bonding interactions between S44 and S42. Two helix pairs (H41/H42 and H44/H45) pack against one face of the β sheets to form a dome-like structure that is lined with a network of hydrophobic side chains in its core (Fig. 2c).
Among the three subdomains, Y2 and Y3 are held together by a short linker and a polar interface between the antiparallel sheet of Y2 and the helix bundle of Y3. Y4 packs tightly against Y3, and the Y3–Y4 interface is dominated by hydrophobic interactions between H34 of Y3 and H44 and S45 of Y4. On the other hand, there is only minimal interaction between Y2 and Y4, with the side chains of Gln1769 (Y2) and Lys1909 (Y4) forming a salt bridge crossing the middle of the cleft. Importantly, the secondary structures of CoV-Y observed in the crystal structure are in excellent agreement with the secondary-structure elements predicted from the assigned backbone and Cβ chemical shifts, except for the extreme C-terminal region26, indicating that CoV-Y adopts a similar fold in solution (Fig. 2d).
The monomeric structure of CoV-Y is unique
The overall structure of CoV-Y reveals a peculiar topology without a high degree of similarity to any known fold in the Protein Data Bank (PDB). A hierarchical search against the PDB and the AlphaFold database using Dali28 failed to reveal any other proteins with a topology significantly similar to that of CoV-Y [Z-scores > 10; root-mean-square deviation (rmsd) < 10 Å]. Interestingly, a relatively remote similarity was found with a number of AAA+ ATPase domain-containing proteins in various organisms (Z-scores ~ 7; rmsd ~ 10 Å). The characteristic feature of AAA+ proteins is a structurally conserved ATP-binding module that assembles into oligomeric ring-like complexes in which large subdomains (AAA-LD) combine to form the AAA ring and the small subdomains (AAA-SD) form the protrusions as in a pinwheel29. The individual subdomains of CoV-Y can be superimposed rather poorly with two AAA domain pairs of Schizosaccharomyces pombe Mdn1, an AAA-containing protein essential for ribosome biogenesis30. The secondary-structure elements of Y2 and Y4 of CoV-Y show certain structural resemblance to the AAA-LD, whereas the helical Y3 subdomain shares a fair degree of similarity with the AAA-SD of Mdn1 (PDB ID: 6ORB); CoV-Y and Mdn1 share 5.5% sequence identity and display a 5.4-Å rmsd for 109 corresponding Cα atoms (Supplementary Fig. S4a). Although the spatial arrangement of the three CoV-Y subdomains is well aligned with two neighboring AAA-LDs and an associated AAA-SD of Mdn1, their respective hydrophobic cores and potential ligand binding pockets are quite distinct, indicating that an evolutionary relationship is unlikely.
We also performed Dali searches against the individual CoV-Y subdomains. For Y2, the β sandwich-like fold was found to have a marginal similarity with a part of the N-terminal domain of a Bacteroides spp. β-hexosaminidase31 (PDB ID: 6Q63; Z-score = 4.1; rmsd = 5.8 Å) (Supplementary Fig. S4b). Y3 shows low structural resemblance to the helical domain of ent-copalyl diphosphate synthase from Arabidopsis thaliana (PDB ID: 3PYB; Z-score = 6.4; rmsd = 3.0 Å) (Supplementary Fig. S4c). Likewise, Y4 does not exhibit significant similarity to other proteins; it can only be partially superimposed with the N-terminal domain of archaeal Ribonuclease P32 (PDB ID: 6K0B) with a low Z-score (3.3) and high rmsd (3.9 Å; Supplementary Fig. S4d). On the other hand, the fold of Y4 in CoV-Y is essentially identical to the crystal structure of Y4 determined independently (residues 1844–1943; PDB ID: 7RQG; Supplementary Fig. S4d). The two structures can be superimposed with a 0.7 Å rmsd in Cα positions, indicating that interaction with Y2 and Y3 subdomains does not cause significant conformational change in Y4. Taken together, we conclude that CoV-Y possesses three distinct subdomains and represents a previously uncharacterized type of protein fold that may have the potential to form larger assemblies.
The structure of CoV-Y is conserved among human coronavirus homologs
All three coronaviruses that have caused epidemic and pandemic outbreaks of disease in human populations belong to the genus Betacoronavirus, with SARS-CoV-2 and SARS-CoV in the Sarbecovirus subgenus and MERS-CoV in the Merbecovirus subgenus (https://talk.ictvonline.org/ictv-reports)33,34. In addition, four other coronavirus strains are associated with mild upper respiratory diseases in immunocompetent humans, among which HCoV-OC43 and HCoV-HKU1 are classified in the Embecovirus subgenus of Betacoronavirus, while HCoV-229E and HCoV-NL63 belong to Alphacoronavirus33,34. The nsp3 CoV-Y domains of four human Betacoronavirus strains have sizes (369–377 residues) nearly identical to that of the SARS-CoV-2 nsp3 CoV-Y (369 residues), whereas those of two Alphacoronavirus strains (HCoV-229E and HCoV-NL63) are smaller (345 residues; Supplementary Fig. S5). Recent studies of the SARS-CoV-2 viral genome have also identified two close relatives, one found in horseshoe bat (BatCoV-RaTG13)7 and another in pangolin (Pangolin-CoV)35; it has been suggested that horseshoe bats and pangolins are the natural and intermediary hosts of SARS-CoV-2, respectively7,34.
Sequence alignments of the nsp3 CoV-Y domains from all nine coronavirus strains show that they share ~ 27–99% pairwise sequence identities (Fig. 3). Among SARS-CoV-2, BatCoV-RaTG13 and Pangolin-CoV, only eight nonsynonymous substitutions, most of which maintain hydrophobicity and polarity of the residue, are found in their respective CoV-Y domains, confirming a close relationship between these three strains (Supplementary Fig. S5). For the human coronavirus strains, the SARS-CoV-2 CoV-Y shares 88% sequence identity with that of SARS-CoV, whereas the other strains including MERS-CoV show a much more distant relationship with identities ranging from 32 to 47% (Fig. 3b). As for the individual subdomains, Y3 is the least conserved domain among human strains with sequence identities ranging from 16 to 79%, whereas Y2 and Y4 share relatively higher conservation (27% to 93%) (Fig. 3b and Supplementary Fig. S5). However, those highly conserved residues are located throughout the entire sequences of the CoV-Y domains (Fig. 3a), and it is not immediately clear whether they play any role in mediating nsp3 function.
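The pairwise identities reported here come from ClustalW alignments; how the percentage is normalized is not stated. As a minimal sketch, assuming identity is counted over alignment columns in which neither sequence has a gap (other tools divide by the alignment length or the shorter sequence, which gives slightly different numbers), the calculation can be illustrated as follows. The function name and the toy sequences are hypothetical and are not taken from the nsp3 alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: gaps are written as '-' and both strings come from the
    same pairwise (or multiple) alignment, so they have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only)
toy_a = "ESSAKSAYYL-N"
toy_b = "ESSVKSAYFLGN"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Applying the same function to each subdomain's slice of the alignment would give per-subdomain values of the kind summarized in Fig. 3b, subject to the normalization caveat above.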
Sequence conservation among nsp3 CoV-Y homologs. (a) Sequence conservation of the nsp3 CoV-Y domain among seven human coronaviruses and two close relatives of SARS-CoV-2 (see “Methods” and Supplementary Fig. S5 for sequence information). Conservation is shown as a bar graph, with red bars indicating identity among nine CoV-Y homologs. Secondary-structure assignments of CoV-Y from the crystal structure are shown as cylinders (helices) and arrows (β strands). (b) Sequence identity of the Y1–CoV-Y, CoV-Y and four CoV-Y subdomains of the above eight coronavirus nsp3 sequences to those of SARS-CoV-2 nsp3. The ClustalW alignments were performed using DNASTAR Lasergene Suite 8. The identity numbers are shown in a color scale, with red indicating conservation above 95%. (c) Molecular surface of the CoV-Y domain structure colored according to homologous conservation. Dashed ellipse indicates the two highly conserved regions. (d) Close-up view of the H23 helix of Y2. The six residues with most variation in SARS-CoV-2 isolates are shown in stick representation.
With the CoV-Y structure at hand, we can elaborate on several questions relating to sequence and structure conservation among its homologs. First, computationally predicted 3D models suggest that the annotated CoV-Y domains of other human coronaviruses likely have structures similar to that of the SARS-CoV-2 CoV-Y (Fig. 4a). The overall fold of the six human CoV-Y homologs predicted using AlphaFold36 closely resembles the structure observed for the SARS-CoV-2 CoV-Y with a 3.2–3.9 Å rmsd for equivalent Cα positions. We note that AlphaFold is, in this instance, highly accurate in predicting a novel fold because the calculated SARS-CoV-2 CoV-Y structure can be superimposed with the crystal structure with a 2.3 ± 0.8 Å rmsd in Cα positions (Fig. 4b). This finding is significant in light of ongoing debates over the ability of machine learning to innovate beyond the domain of possibilities present in the training data. It is also striking that individual subdomains in the predicted structures share common topologies with those in the crystal structure (Fig. 4c). Major conformational differences lie in three extended loop regions connecting H22 and S25 in Y2, H33 and H34 in Y3, and H41 and H42 in Y4. Notably, the Y3 subdomains of the two Alphacoronavirus strains (HCoV-229E and HCoV-NL63) feature only three helices and lack the presumed H32 helix found in all Betacoronavirus strains, consistent with the smaller sizes of their CoV-Y domains (Fig. 4a,c and Supplementary Fig. S5). In addition, all five top models calculated for each CoV-Y homolog are generally matched closely to each other, with rmsd ranging from 1.0 to 2.3 Å in Cα positions, suggesting that top-ranked models display good fidelity (Supplementary Table S2). These analyses suggest that the core fold of the SARS-CoV-2 nsp3 CoV-Y domain likely represents a common fold for all CoV-Y domains in known human coronaviruses.
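The Cα rmsd values quoted in this section depend on first superimposing the two coordinate sets. The software used for the superpositions is not specified here, so the following is only a minimal sketch of one standard approach, the Kabsch algorithm, written with NumPy. It assumes the equivalent Cα atoms have already been paired (same order, same count); the function name and the synthetic coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(coords_a, coords_b) -> float:
    """RMSD between paired (N, 3) coordinate sets after optimal rigid superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("inputs must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    h = a_c.T @ b_c
    u, _, vt = np.linalg.svd(h)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate set A onto set B and measure the residual deviation.
    diff = (rotation @ a_c.T).T - b_c
    return float(np.sqrt((diff ** 2).sum() / len(a)))


# Synthetic check: a rotated and translated copy should give an rmsd near zero.
rng = np.random.default_rng(0)
model = rng.normal(size=(50, 3)) * 10.0
angle = np.deg2rad(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = (rot_z @ model.T).T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(model, moved))
```

With real structures, the paired coordinates would come from a structure-based or sequence-based residue alignment, and the choice of which residues count as "equivalent" affects the reported rmsd.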
Structural conservation among nsp3 CoV-Y homologs. (a) Ribbon diagrams of the top-ranked AlphaFold-computed structural models of the nsp3 CoV-Y domains for four human Betacoronaviruses (SARS-CoV, MERS-CoV, HCoV-HKU1 and HCoV-OC43) and two human Alphacoronaviruses (HCoV-229E and HCoV-NL63). Dashed ellipse indicates the H32 helix of Y3 on the Betacoronaviruses that is presumably lacking in Alphacoronavirus CoV-Y. (b) The RMSDs of the five top-scored models calculated using AlphaFold36 to the crystal structure of the SARS-CoV-2 nsp3 CoV-Y. (c) Superposition of the crystal structure and six structural models of the CoV-Y homologs.
Second, the investigation of homologous sequence conservation in the structural context of the CoV-Y domains identifies the two most highly conserved surface areas (Regions I and II) in the nsp3 homologs (Fig. 3c). Region I is located on the C-terminal corner of the long helix H23 followed by the strand S26 in Y2, including four invariant residues, Ala1739/Val1741/Gln1745 in H23 and Ile1751 in S26. Region II is located on the base of the two β sheets and their surrounding helices and loops in Y4, including eight invariant residues, Asn1853/Asn1854 (S42), Tyr1859/Lys1861 (the loop connecting S42 and H41), Asp1869 (H42), Trp1895 (S44) and Leu1924/Thr1925 (S45). These two regions have much higher sequence conservation than the rest of the structure, suggesting that they may play important roles in mediating specific functions of the nsp3s.
Last, as an RNA virus, SARS-CoV-2 is known to rapidly accumulate genomic mutations, and some of these nonsynonymous amino-acid residue changes may have structural and functional impact on the encoded proteins24,37. Previous studies based on the analysis of nearly 200,000 individual SARS-CoV-2 viral genomes over the course of the ongoing COVID-19 pandemic have suggested that the C-terminal region of nsp3 is less frequently mutated compared to the other parts of the genome, likely due to its key role in inducing DMV formation37. To gain new insights into the distribution of naturally occurring nsp3 Y1–CoV-Y variations, we conducted a broad BLAST search of 5,000 individual SARS-CoV-2 isolates using the reference Y1–CoV-Y sequence (residues 1577–1945; NCBI Accession #: YP_009742610) as a query. Sequence alignments of the Y1–CoV-Y domains show that they share ~ 99.46–100% pairwise sequence identities, in which 360 sequences have one variation and one sequence has two variations. Remarkably, all 362 variations are located among six residues in the middle of the H23 helix in the Y2 subdomain, ranging from Glu1733 to Ser1738 (Fig. 3d and Supplementary Fig. S5). Among the six variants, A1736V was detected 355 times (~ 98%), S1738L three times and the rest, including E1733G, S1734P, S1735F and K1737R, once each. Since these amino acid substitutions were likely found in human isolates, we infer that those nsp3 CoV-Y variants correspond to stable, functional proteins. We note that all of the substitutions except K1737R increase the hydrophobicity of the respective residue. In addition, E1733G and S1734P are two amino acid substitutions that may disrupt the helix. As discussed above, the entire H23 helix is exposed in CoV-Y, yet it is predicted to interact extensively with a β hairpin in Y1 (see below). S1738L would break an observed hydrogen bond with Ile1608 of Y1, while Ser1735/Ala1736/Lys1737 make no contact with Y1. Whether those substitutions play any role in mediating Y1–CoV-Y function and/or its interactions with other proteins requires further investigation.
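The counts in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the one assumption is that the lower bound of the identity range (~ 99.46%) corresponds to the single sequence carrying two substitutions, compared against the reference over the full Y1–CoV-Y length (residues 1577–1945).

```python
# Length of the Y1-CoV-Y query (residues 1577-1945, inclusive).
DOMAIN_LENGTH = 1945 - 1577 + 1

# Occurrences of each substitution as reported in the text.
variant_counts = {
    "A1736V": 355,
    "S1738L": 3,
    "E1733G": 1,
    "S1734P": 1,
    "S1735F": 1,
    "K1737R": 1,
}

# Total variations counted per substitution and per sequence.
total_variations = sum(variant_counts.values())
from_sequences = 360 * 1 + 1 * 2  # 360 single variants + 1 double variant

# Share of A1736V among all variations.
a1736v_share = 100.0 * variant_counts["A1736V"] / total_variations

# Assumed lower bound: two substitutions over the full domain length.
min_identity = 100.0 * (DOMAIN_LENGTH - 2) / DOMAIN_LENGTH

print(DOMAIN_LENGTH, total_variations, from_sequences)
print(f"A1736V share: {a1736v_share:.1f}%")
print(f"Minimum identity: {min_identity:.2f}%")
```

Under these assumptions the two tallies agree at 362 variations, and the computed share and minimum identity match the rounded values given in the text.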
Fragment screening of CoV-Y
To gain insights into functions of the nsp3 CoV-Y domain, we partnered with the international COVID19-NMR consortium (https://covid19-nmr.de/) in a large-scale NMR-based ligand screening campaign that aimed to identify poised fragments targeting SARS-CoV-2 proteins38. The DSI (Diamond-SGC-iNEXT)-poised library consisting of 768 fragments represents a small collection of diverse compounds (< 300 Da) that bind promiscuously but cover a large chemical space and facilitate streamlined downstream hit optimization39. Four different 1H-based NMR experiments of ligands, including chemical shift perturbation, waterLOGSY, saturation transfer difference and differential T2-relaxation, were recorded in the presence and absence of the unlabeled CoV-Y protein and analyzed38. In total, 81 fragments displayed changes in at least one of the four NMR experiments. Among them, 17 fragments qualified as binders for CoV-Y based on the spectral changes observed in at least two out of four NMR experiments (Supplementary Table S3). Following the computational mapping strategy developed by the consortium38, we next employed FTMap40 to identify accessible surface cavities in the CoV-Y structure. Using 16 small organic compounds provided by the FTMap server as scanning probes, we identified ten ligand-binding hot spots (clusters 0–9) that are clustered largely around the deep cleft in the middle of the V-shaped structure crossing the interfaces between Y2/Y3 and Y2/Y4 (Fig. 5a).
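The hit-selection rule described above (a change in at least one experiment flags a fragment, and changes in at least two of the four experiments qualify it as a binder) can be written as a small classifier. This is an illustrative sketch only: the experiment keys, fragment identifiers and responses below are hypothetical, and the consortium's actual thresholds for calling a spectral change are not reproduced here.

```python
# The four ligand-observed 1H NMR experiments named in the text (keys are hypothetical).
EXPERIMENTS = ("csp", "waterlogsy", "std", "t2_relaxation")


def classify_fragment(responses: dict) -> str:
    """Classify a fragment from per-experiment True/False spectral changes."""
    n_changed = sum(bool(responses.get(name, False)) for name in EXPERIMENTS)
    if n_changed >= 2:
        return "binder"
    if n_changed == 1:
        return "single-experiment change"
    return "no change"


# Hypothetical screening results for three made-up fragments.
screen = {
    "frag_A": {"csp": True, "waterlogsy": True, "std": False, "t2_relaxation": False},
    "frag_B": {"csp": False, "waterlogsy": False, "std": True, "t2_relaxation": False},
    "frag_C": {"csp": False, "waterlogsy": False, "std": False, "t2_relaxation": False},
}

labels = {name: classify_fragment(resp) for name, resp in screen.items()}
for name, label in labels.items():
    print(name, label)
```

In the campaign described here, the first two categories together correspond to the 81 responsive fragments, and the "binder" category to the 17 fragments carried forward to docking.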
Molecular docking of the NMR-identified binders onto the structure of CoV-Y. (a) Molecular surface of CoV-Y showing the ten surface cavities identified using the FTMap server (https://ftmap.bu.edu). (b) Molecular docking of the 17 binders identified by NMR fragment-screening onto the CoV-Y structure by AutoDock Vina57. The top-ranked docking pose of each binder is shown in stick representation in four sites overlapped with those identified by FTMap. Chemical structure of each binder and the estimated binding energy of their top docking poses are listed in Supplementary Table S3. (c) Molecular surface representation of CoV-Y, colored according to the local electrostatic potential ranging from − 5 kT/e in deep red (most negative) to 5 kT/e in dark blue (most positive), calculated using the program APBS59.
We next used molecular docking to locate the potential surface cavities on CoV-Y capable of accommodating the identified fragment binders. Each of the 17 fragments was docked to CoV-Y as a flexible entity using AutoDock Vina41; the binding energies of the top poses of each binder range between − 4.7 kcal/mol and − 6.4 kcal/mol (Supplementary Table S3). As expected, most of the top-ranked poses of the fragments dock into the same cavities identified by FTMap, which can be organized into four distinguishable sites (Fig. 5b). Site 1 sits in the center of the Y2/Y3/Y4 interface and is surrounded by three helices (H23, H34 and H44) from the three subdomains; this site overlaps with the FTMap clusters 0, 2 and 8. Site 2 is located in a shallow surface groove between the H22 of Y2 and the helix bundle of Y3, and crosses the FTMap cluster 4. Site 3 is a cavity formed by H44 and the central β sheet of Y4 and situated on the back of site 1 close to the FTMap cluster 1. Site 4 is adjacent to site 3 and located in a pocket formed by the outside β sheet of Y2 and the H31 and H34 of Y3; this site spans across the FTMap clusters 3 and 9. Interestingly, the cavities accommodating sites 1, 2 and 4 are mostly positively charged, whereas site 3 has a largely electronegative surface (Fig. 5c). However, we should point out that we do not have any known ligands or knowledge of the ligand binding site in CoV-Y, making it difficult to confirm the validity of the docking results. Additionally, docking programs can be sensitive to the geometrical structure of the target protein and the size of the ligand, making it challenging to compare the absolute free energy values among identified hits. Nevertheless, the goal of this high-throughput fragment-based screening method was to identify potential hits as a starting point for chemistry exploration and optimization. Thus, it is expected that the binding energy of the initial hits will be weak and will require further optimization to increase binding strength. Taken together, our structural and fragment binding data suggest that CoV-Y has a unique structure that can bind to specific ligands, which can be used to further functional studies and therapeutic development.
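To give a rough sense of what "weak" means for the reported score range, the standard relation ΔG = RT ln Kd can be used to convert a binding free energy into a nominal dissociation constant. This is a back-of-the-envelope sketch only: it assumes, purely for illustration, that a docking score can be read as a binding free energy at 298.15 K, which the caveats above about docking energies argue against taking literally. The function name is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)


def nominal_kd_molar(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Nominal dissociation constant (mol/L) from delta_G = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))


# Range of top-pose scores reported in the text, in kcal/mol.
for score in (-4.7, -6.4):
    kd_micromolar = nominal_kd_molar(score) * 1e6
    print(f"{score} kcal/mol -> ~{kd_micromolar:.0f} uM (nominal)")
```

Read this way, the score range corresponds to nominal affinities in the tens to hundreds of micromolar, which is the regime typically expected for unoptimized fragment hits.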
Assembly of the Y1 and CoV-Y domains
The extreme C-terminal region of nsp3 contains the Y1 domain (residues 1577–1659) preceding the CoV-Y domain. The Y1 domain is predicted to be conserved in the majority of the viral families of the order Nidovirales, while CoV-Y is suggested to be present only in coronaviruses19,20. Indeed, SARS-CoV-2 nsp3 Y1 shares ~ 46–90% pairwise sequence identity in human coronaviruses, which is higher than those of the CoV-Y subdomains (Fig. 3b and Supplementary Fig. S5). Because we were unable to obtain diffraction-quality crystals of the construct comprising both Y1 and CoV-Y of SARS-CoV-2 nsp3, the Y1–CoV-Y structure was computed using AlphaFold36 (Fig. 6a,b). The predicted Y1 structure exhibits a novel fold highlighted by two adjacent zinc finger (ZF)-like motifs (Fig. 6a). The first ZF motif (ZF1) adopts an HCCC-type TAZ2 domain-like zinc-binding site42 formed by the C-terminus of the H11 helix (His1581), a short loop (Cys1586 and Cys1591) and the N-terminus of the H12 helix (Cys1594). The second ZF motif (ZF2) harbors a CHCC-type zinc-binding site in which the four ligands (Cys1627, His1630, Cys1634 and Cys1637) are located entirely in a short zinc-binding loop between the strand S12 and the helix H13. The eight zinc-coordinating residues in ZF1 and ZF2 are invariant in all human coronaviruses (Supplementary Fig. S5); however, the biological roles of these two ZF motifs are presently unknown.
Assembly of the Y1 and CoV-Y domains. (a) Ribbon diagrams of the top-ranked AlphaFold-computed structural model of the SARS-CoV-2 nsp3 Y1 domain, colored according to homologous conservation as described in Fig. 3a. The coordinating residues of ZF1 (His1581, Cys1586, Cys1591 and Cys1594) and ZF2 (Cys1627, His1630, Cys1634 and Cys1637) are shown in stick representation. (b) Superposition of the crystal structure of CoV-Y and the top-ranked computed structural model of Y1–CoV-Y of SARS-CoV-2 nsp3. The CoV-Y structure is colored as in Fig. 1b, while Y1–CoV-Y is in gray. (c) Close-up view of the modeled Y1-Y2 interface showing interacting residues of Y1 (orange) and Y2 (blue). Hydrogen bonds are shown as black dashed lines. (d) Molecular surface representation of Y1–CoV-Y, colored according to the local electrostatic potential ranging from − 5 kT/e in deep red (most negative) to 5 kT/e in dark blue (most positive), calculated using the program APBS59. (e) A schematic model showing the interaction between the nsp3 Y1–CoV-Y region and the subcellular membrane. We propose that the observed positively charged (+) patch on the top face of Y1 and Y4 directly interacts with the anionic glycerophospholipids (−) embedded in the membrane and contributes to the formation of the transmembrane pore in DMVs.
In the calculated Y1–CoV-Y structure, Y1 sits on the top of Y2 and interacts almost solely with Y2, except for a single hydrogen bond between Asn1638 of Y1 and Lys1917 of Y4 (Fig. 6b). We note that the predicted top five CoV-Y models in Y1–CoV-Y are highly similar to the crystal structure with an average 2.8 ± 1.4 Å rmsd in the equivalent Cα positions and a five-model variance of 1.5 ± 0.7 Å. The assembly of Y1 and Y2 is underpinned by their strong surface complementarity. The long β hairpin and the loop connecting S12 and H13 of Y1 form a surface groove in the shape of a car seat to embrace the H23 helix of Y2. The Y1-Y2 interaction is mediated by a network of hydrogen-bond and van der Waals contacts (Fig. 6c). At least three hydrogen-bonded pairs, Ile1608 (Y1)/Ser1738 (Y2), Asn1610 (Y1)/Asn1712 (Y2) and Glu1650 (Y1)/Ser1669 (Y2), are observed in all five top-scored models. Additionally, a group of aromatic residues in Y1 (Phe1616, Tyr1617, Tyr1619, His1630, Phe1640 and Phe1646) and Y2 (Tyr1742, Tyr1743 and Pro1750) are packed around the interface. The association of Y1 and Y2 buries a total of ~ 1,530 Å² of solvent-accessible surface on the two subdomains. Interestingly, a number of lysine and arginine residues, 11 from Y1 and 12 from Y4, line the top face of Y1 and Y4 to form a continuous positively charged patch, while large parts of Y2 and Y3 are mostly negatively charged (Fig. 6d). These surface charge distributions might be involved in orienting the Y1–CoV-Y domains in such a way that the electropositive regions are directed toward and interact with the electronegative membrane surface (Fig. 6e).
We next used AlphaFold to compute the structural models of the Y1–CoV-Y domain of the six other human coronavirus nsp3s. As expected, all calculated models conformed to a conserved overall fold for Y1 as well as the arrangement of the four subdomains (Supplementary Fig. S6). Importantly, the structural conservation among those homologous proteins is further reflected in the general similarities of the electrostatic surface of their subdomains. In particular, a significant portion of the exposed surface area of the Y1/Y4 subdomains in those homologs is electropositive, similar to that of the SARS-CoV-2 nsp3 Y1–CoV-Y (Supplementary Fig. S6a). These observations imply that specific membrane binding might be a conserved activity of the Y1–CoV-Y domain of nsp3 across the human coronavirus family.
Implications for SARS-CoV-2 replication and coronavirus biology
The COVID-19 pandemic has highlighted the need for deeper knowledge of coronavirus biology for the development of novel intervention strategies. A complete characterization of virion components comprising the RTC is therefore important for understanding the dynamics and progress of the viral replication cycle. The mechanisms by which the viral nsps and host cell factors are recruited and localized to the defined subcellular locations to establish RTCs remain a longstanding unanswered question. For nsp3, a key component of the RTC, significant progress has been made in characterizing the domains that possess well-conserved functions and activities. The C-terminal region of the protein, however, remains enigmatic. The sequence of this part of the protein, including the region from the first transmembrane helix to the C terminus of nsp3, does not show much similarity to other viral or cellular proteins. While it is suggested that the transmembrane region and the ectodomain in between comprise a membrane-anchoring platform essential for DMV membrane rearrangement18, the exact function of the Y domains is unclear. The goal of this study was to elucidate the structural properties of the CoV-Y domain of the SARS-CoV-2 nsp3 at the atomic level. Our results support a model in which the nsp3 Y domains are directly involved in the formation of the molecular pore in DMVs and perhaps mediate interaction with specific membrane lipids.
The crystal structure of the CoV-Y domain reveals an unusual configuration in which the three distinct subdomains form a unique V-shaped fold that has not been observed previously. The CoV-Y construct used in our study is monomeric in solution as determined by size exclusion chromatography, NMR and crystal packing analyses. However, we note that the organization of the three CoV-Y subdomains is somewhat reminiscent of the assembly of the two conserved AAA+ domains into ring-like oligomers29. A recent electron microscopy study showed that molecular pores involving six copies of nsp3 span across DMV membranes in MHV-infected cells with the N-terminal nsp3 domains pointing towards the cytosol9. It is thus tempting to speculate that in addition to the transmembrane domains, the Y domains including Y1 and CoV-Y may form higher-order oligomeric complexes and play a role in pore formation. It is possible that the Y1 domain is required for assembly of the oligomeric Y1–CoV-Y complex. Despite expressing a number of different Y1–CoV-Y constructs, we were unable to produce homogeneous proteins that could be used for structural and biophysical studies. Therefore, at present we do not have sufficient evidence to definitively support a hexameric configuration for the nsp3 Y domains.
Both the sequence alignments and 3D structure prediction analyses suggest that the unique fold of CoV-Y represents the first structural view of the entire domain and is likely conserved among the CoV-Y homologs in coronaviruses. Furthermore, two specific sequence-conserved regions in the CoV-Y crystal structure, including the C-terminal portion of the H23 helix in the Y2 subdomain, have been identified that prioritize testing of the functions of the conserved side chains in vivo by site-directed mutagenesis. Interestingly, preliminary comparative sequence analysis of the genomes of the individual SARS-CoV-2 isolates reveals that all the naturally occurring variations in the Y1–CoV-Y region are clustered in the middle of the H23 helix of Y2. Notably, the H23 helix is also likely involved in the interaction with the Y1 domain based on the predicted Y1–CoV-Y structure. Together, those independent observations suggest that the H23 helix of Y2 or the CoV-Y domain as a whole may provide a framework for elucidating the correlation between viral evolution and constraints imposed by the structure and functionality of the encoded protein.
Analysis of the computed Y1–CoV-Y structure suggests that Y1 and Y4 form a continuous positively charged top surface that might interact with headgroups of anionic glycerophospholipids in cellular and organelle membranes in eukaryotes43,44,45. Anionic phospholipids include phosphatidic acid, phosphatidylserine, phosphatidylinositol, and various forms of phosphatidylinositol phosphates45. These lipids are present in low abundance in eukaryotic membranes but are abundant in endomembrane compartments including ER and Golgi43. It is thus possible that in addition to the TM1–3Ecto–TM2 region, the Y domains of nsp3 may forecast potential orientations of the nsp3 protein by docking onto the ER plasma membrane through interactions with specific phospholipids (Fig. 6e).
In addition, our fragment screening and molecular docking studies have identified several exposed surface cavities in CoV-Y that provide valuable information for identifying potential ligand interactions and for functional annotation. Moreover, previous studies have shown that the Y1 and CoV-Y domains interact not only with other domains (NAB and βSM that precede TM1) of nsp3 but also with other nsps (nsp9 and nsp12)46. A more recent study of the host interactomes of nsp3 has suggested that the 3Ecto–TM2–Y1–CoV-Y region of nsp3 interacts with host proteins associated with the ER-associated protein degradation pathways and proteins involved in cholesterol, lipid and N-glycan biosynthesis21. How those observed protein–protein/ligand interactions affect the respective function of nsp3 and other proteins remains unknown. Clearly, further study is required to determine the precise role of the nsp3 Y domains in viral replication and the life cycle as a whole.