INTRODUCTION
The first half of 2022 marks the rise of the seventh peak of the SARS-CoV-2 pandemic worldwide due to the different sub-lineages of Omicron VOC 1. Simultaneously, in May 2022, outbreaks of the monkeypox virus (MPXV) were confirmed outside the African continent. The first cases detected were in the United Kingdom, related to travelers returning from Nigeria, an African country that has historically reported monkeypox cases. However, this outbreak has spread to other countries in Europe, America, Asia, Australia, and other African countries. To date, more than 55,000 cases and 15 deaths have been reported in 75 countries worldwide 1.
During the emergence of the first cases of monkeypox outside Africa, the World Health Organization (WHO) stated that the outbreak was considered to have a low impact on the general population in the affected countries. However, in June 2022, the WHO declared that the outbreak of MPXV posed an evolving public health threat, confirming five deaths in Africa from this outbreak. In July 2022, even though the WHO’s Emergency Committee charged with assessing the outbreak had not reached a consensus, Tedros Adhanom Ghebreyesus, the Director-General of the WHO, declared that monkeypox constituted an international emergency, as the outbreak met the requirements, and stated that it should be taken as seriously as COVID-19 1.
MPXV is a species of the genus Orthopoxvirus of the family Poxviridae 2, a viral group that also includes Variola virus (VARV) and Vaccinia virus (VACV) 3. The MPXV genome consists of linear double-stranded DNA (≈198 kb) that is covalently linked at its ends by palindromic hairpins and inverted terminal repeats (ITRs), which are formed by hairpin loops, tandem repeats, and some open reading frames (ORFs). The less conserved genes encoding virus-cell interaction proteins (ABCNMK) are located in the left and right terminal regions of the genome. In contrast, the more conserved genes (FEOPIGLJHD) with housekeeping functions are located in the central region of the genome 4. Of the more conserved ORFs, 90 are known to be essential for poxvirus replication and morphogenesis. In contrast, many of the additional, so-called non-essential and less conserved ORFs play a role in the differences in poxvirus host tropism, immunomodulation, and pathogenesis, and the role of many of these ORFs is still unknown 3.
MPXV causes monkeypox, a neglected zoonotic disease 5, and has a wide range of hosts, including non-human primates, a variety of rodents (squirrels, rats, jerboas, woodchucks, prairie dogs), civets, giant anteaters, antelopes, opossums and humans 6. The natural reservoir of the virus is still unknown. Monkeypox is characterized by a lower case-fatality ratio than smallpox 7. The incubation period of monkeypox is usually eight days but can range from four to 14 days 8. The symptoms that people infected with MPXV develop include fever, chills, and muscle, head and back pain. The most notable sign is the development of papular skin lesions and rash 3. Although most of the reported cases of the 2022 MPXV outbreak are related to men who have sex with men, monkeypox is not a sexually transmitted disease; it spreads mainly by close skin-to-skin contact, including contact between sexual partners 9.
MPXV strains are grouped into two clades: Clade I (formerly Congo Basin) and Clade II (formerly West Africa) (Fig. 1). Clade I, considered more pathogenic and found primarily in the Democratic Republic of the Congo and surrounding countries, was responsible for the first documented human case of MPXV in 1970. Clade II (hMPXV-1A) has been assigned as clade IIa, with lineages A.1 (sub-lineage A.1.1) and A.2, and, with the current MPXV outbreak in 2022, a newly classified emerging clade IIb, with lineage B.1 and sub-lineages B.1.1, B.1.2, B.1.3, B.1.4, B.1.5, B.1.6, B.1.7 and B.1.8 10-12. Lineage and sub-lineage assignments are based on 46 single-nucleotide polymorphisms (SNPs) observed in the strains of the 2022 MPXV outbreak compared with the NCBI Monkeypox reference sequence NC_063383 13.
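To illustrate what comparing a strain with a reference sequence involves, the following is a minimal sketch that lists substitutions between two sequences. It assumes the two sequences are already aligned to the same length with no gaps in the reference; the function name `call_snps` and the toy sequences are hypothetical, and this is not the procedure used to define the 46 lineage-defining SNPs.

```python
def call_snps(reference, query):
    """List substitutions between two aligned, equal-length sequences.

    Each SNP is written as <reference base><1-based position><query base>,
    for example 'G3A'. Positions where either base is not A, C, G or T
    (such as N) are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), query.upper())
    for position, (ref_base, query_base) in enumerate(pairs, start=1):
        if ref_base == query_base:
            continue
        if ref_base in "ACGT" and query_base in "ACGT":
            snps.append(f"{ref_base}{position}{query_base}")
    return snps


# Toy example (hypothetical sequences, not MPXV data).
print(call_snps("ACGTG", "ACATN"))
```

The same notation appears in mutation labels such as G111029A, where the number is the position in the reference genome.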
With the COVID-19 pandemic, we have learned an important lesson about the need for global surveillance of SARS-CoV-2 genetic sequences, as well as the importance of sharing metadata in public databases accessible to the scientific community. Genomic surveillance has been an essential resource for monitoring and tracking the evolution of mutations that have driven the development of new and more pathogenic variants. This early genomic detection has advanced our knowledge for the establishment of better treatments, the study of potential new drugs with antiviral activity, and new vaccines. Additionally, genomic surveillance has improved our understanding of how the virus can enhance its spread, of the geographic and temporal origin and global spread of new variants, and has made it possible to visualize viral evolution in real time 14.
Among the global initiatives and efforts to publicly share SARS-CoV-2 pandemic metadata is the GISAID platform 15. The GISAID Initiative promotes the rapid sharing of data on all influenza viruses and the coronavirus that causes COVID-19, and ensures open access to data free of charge to the scientific community. GISAID was created to allow public access to the latest avian influenza genetic sequences and as an alternative to the public-domain sharing model; through the EpiFlu™ database, it enabled data sharing between WHO Collaborating Centers and National Influenza Centers. In 2020, the EpiCoV™ database was created; at the moment, this database contains more than 11 million complete SARS-CoV-2 sequences, making it the principal repository for the COVID-19 pandemic.
For the current global outbreak of MPXV, the GISAID initiative created in 2022 a database called EpiPox™. The goal of this work is to determine what epidemiological and genomic information this database can provide on the risk of MPXV to global public health, using different web-tools currently available for this type of study.
MATERIALS AND METHODS
Data Source and curation
Data were extracted from the GISAID EpiPox™ database 15. The cut-off date for available data was September 2, 2022. The data included in this study met the following inclusion criteria: sequences available from April 1 to September 2, 2022, and complete MPXV sequences with high coverage (less than 1% of undefined bases).
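The inclusion criteria above can be expressed as a simple filter. The sketch below is illustrative only: it assumes that any character other than A, C, G or T counts as an undefined base and that a single date is attached to each record (the text does not say whether this is the collection or the submission date). In practice these filters are applied through the GISAID interface; the function names are hypothetical.

```python
from datetime import date

# Thresholds taken from the inclusion criteria in the text.
MAX_UNDEFINED_FRACTION = 0.01
START_DATE = date(2022, 4, 1)
END_DATE = date(2022, 9, 2)


def undefined_fraction(sequence):
    """Fraction of bases that are not A, C, G or T (assumed 'undefined')."""
    sequence = sequence.upper()
    if not sequence:
        return 1.0
    defined = sum(sequence.count(base) for base in "ACGT")
    return (len(sequence) - defined) / len(sequence)


def meets_criteria(sequence, record_date):
    """True if the record falls in the study window with high coverage."""
    in_window = START_DATE <= record_date <= END_DATE
    return in_window and undefined_fraction(sequence) < MAX_UNDEFINED_FRACTION


# Toy example (hypothetical sequence, not MPXV data).
print(meets_criteria("ACGT" * 100, date(2022, 5, 20)))
```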
Data analysis
A multi-FASTA file of 940 complete MPXV sequences with high coverage, including two reference sequences (Clade I: hMpxV/DRC/CDC-005/1978, Clade IIa: MpxV/USA/un-WRAIR7-61-P2/1962), was downloaded from the GISAID EpiPox™ database 15 (Fig. 2). Viral genomic sequences are the main input of this study, but we were also interested in connecting them with other available epidemiological data, such as patient status, age, gender, type of sample specimen, and date and location of origin of the MPXV isolates. Therefore, TSV files with this information were also downloaded. The multi-FASTA file with nucleotide sequences and the TSV file with the samples’ epidemiological metadata are linked through the accession_id of the sequences.
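The link between the two files can be sketched as a join on accession_id. The example below is a minimal illustration under stated assumptions: the FASTA header is assumed to be pipe-delimited with one field starting with `EPI_ISL_`, and the metadata column is assumed to be named `accession_id` as in the text. The records and identifiers shown are made up.

```python
import csv
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a multi-FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_from_header(header):
    """Assumption: the accession is the pipe-delimited field 'EPI_ISL_...'."""
    for field in header.split("|"):
        if field.startswith("EPI_ISL_"):
            return field
    return None


def link_records(fasta_handle, tsv_handle, key="accession_id"):
    """Attach each sequence to its metadata row through the accession."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    metadata = {row[key]: row for row in reader}
    linked = {}
    for header, sequence in read_fasta(fasta_handle):
        accession = accession_from_header(header)
        if accession in metadata:
            linked[accession] = {**metadata[accession], "sequence": sequence}
    return linked


# Toy example with made-up identifiers.
fasta = io.StringIO(">hMpxV/X/1/2022|EPI_ISL_0000001|2022-05-20\nACGT\nAC\n")
tsv = io.StringIO("accession_id\tlocation\nEPI_ISL_0000001\tEurope\n")
print(link_records(fasta, tsv))
```

Sequences without a matching metadata row are left out of the result, which makes missing metadata easy to spot by comparing counts.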
Using the TSV files with epidemiological metadata from the GISAID EpiPox™ database, a correlational analysis was performed through an alluvial flow diagram, which represents the correlations between categorical dimensions as flows, visually connecting the shared categories. Each rectangle represents a single value in the selected dimension, and its height is proportional to the value. Curved lines represent the correlations, and their weight is proportional to the values. The alluvial diagram was built in RAWGraphs, an open-source data visualization framework built for the visual representation of complex data 16.
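The quantities an alluvial diagram draws can be sketched as simple counts: the height of each rectangle is the number of records with a given value, and the weight of each curved line is the number of records sharing a pair of values in adjacent dimensions. The sketch below is not RAWGraphs code; the function names and the toy rows are hypothetical.

```python
from collections import Counter


def node_heights(rows, dimension):
    """Number of records per category: the height of each rectangle."""
    return Counter(row[dimension] for row in rows)


def alluvial_flows(rows, dimensions):
    """Number of records shared by each pair of categories in adjacent
    dimensions: the weight of each curved line."""
    flows = Counter()
    for row in rows:
        for left, right in zip(dimensions, dimensions[1:]):
            flows[(left, row[left], right, row[right])] += 1
    return flows


# Toy rows (hypothetical metadata, not taken from EpiPox).
rows = [
    {"continent": "Europe", "gender": "Male", "specimen": "Lesion"},
    {"continent": "Europe", "gender": "Unknown", "specimen": "Lesion"},
    {"continent": "America", "gender": "Unknown", "specimen": "Unknown"},
]
for flow, weight in alluvial_flows(rows, ["continent", "gender", "specimen"]).items():
    print(flow, weight)
```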
The multi-FASTA file of 940 complete MPXV sequences downloaded from the GISAID EpiPox™ database was first analyzed using the Monkeypox virus typing tool from Genome Detective. Genome Detective is an intuitive bioinformatics application for the analysis of pathogenic microbial molecular sequence data 17. The Monkeypox virus typing tool is designed to use BLAST and phylogenetic methods to identify the Monkeypox virus lineages (all clades) of a nucleotide sequence. Genome Detective generates a report with the species assignment, subtyping, and sequence length as a CSV file.
Subsequently, the genomic sequences were analyzed in Nextclade (ver. 2.3.1) 18. Nextclade is a web-tool that performs a Smith-Waterman-based alignment with an affine gap penalty and identifies differences between the query sequences and a reference sequence used by Nextstrain 18, an open-source project to analyze pathogen genome data; it assigns clades, calls mutations, and assesses sequence quality 18. For the monkeypox analysis, the Monkeypox (all clades) algorithm was used, with a reconstructed ancestral sequence of MPXV as the reference sequence, and a phylogenetic analysis was reported. JSON and TSV files were downloaded from Nextclade for gene tree analysis. The JSON file containing the phylogenomic datasets was analyzed in Auspice.us 18. Auspice.us is a web-tool for interactive exploration and visualization of phylogenomic datasets, also used by Nextstrain. From Auspice.us, NEWICK and NEXUS files were downloaded for further analysis and editing of the phylogenetic tree in iTOL (Interactive Tree of Life) (ver. 6) 19. iTOL is a web-tool for the visualization, annotation and management of phylogenetic trees, including clade distances 19. The phylogenetic tree was downloaded in multiple formats (PDF, EPS, and SVG) for final editing and formatting in Illustrator CS (ver. 23.0.1).
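For readers unfamiliar with Smith-Waterman alignment with an affine gap penalty, the following is a minimal textbook sketch (the Gotoh formulation) that returns only the best local alignment score. It is not Nextclade's implementation, which is optimized for whole genomes, and the scoring values are illustrative assumptions rather than Nextclade's parameters.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of a and b with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend, so opening a
    gap is penalized more than extending one.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending at (i, j) with a gap in a (a base of b is unpaired).
    # F: best score ending at (i, j) with a gap in b (a base of a is unpaired).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + substitution, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Toy example: one extra T in the second sequence is bridged by a gap.
print(smith_waterman_affine("ACGTACGT", "ACGTTACGT"))
```

With these illustrative scores, eight matches and one opened gap give 8 × 2 − 3 = 13, which is higher than any ungapped local alignment of the two toy sequences.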
A treemap graph and a sunburst diagram were constructed in RAWGraphs 20 to analyze the proportion of genomic sequences and to identify the geographic distribution of lineages and sub-lineages, respectively. The treemap is composed of an area divided into small rectangles, representing the last level of the tree structure based on the proportion of the lineage and sub-lineages. The size of the rectangles depends on the quantitative dimension. The sunburst diagram shows hierarchically structured data and a related quantitative dimension by concentric circles. The center circle represents the root node (the continents), and the hierarchies (the continents followed by sub-lineages) move outward from the center. The angle of each arc corresponds to the quantitative dimension.
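The data structure behind such a sunburst diagram can be sketched as a two-level hierarchy of counts, with each arc sized in proportion to its count. This is a minimal illustration, not RAWGraphs code; the function names and records are hypothetical, and the proportional-angle rule is an assumption about how the chart encodes quantity.

```python
from collections import defaultdict


def sunburst_hierarchy(records):
    """Nest (continent, lineage) pairs into {continent: {lineage: count}}."""
    tree = defaultdict(lambda: defaultdict(int))
    for continent, lineage in records:
        tree[continent][lineage] += 1
    return {continent: dict(lineages) for continent, lineages in tree.items()}


def arc_angles(tree):
    """Angle in degrees of each outer arc, proportional to its count."""
    total = sum(sum(lineages.values()) for lineages in tree.values())
    return {
        (continent, lineage): 360 * count / total
        for continent, lineages in tree.items()
        for lineage, count in lineages.items()
    }


# Toy records (hypothetical, not the study data).
records = [("Europe", "B.1"), ("Europe", "B.1"), ("Europe", "B.1.1"), ("America", "B.1")]
tree = sunburst_hierarchy(records)
print(tree)
print(arc_angles(tree))
```

The same counts, taken at the last level only, give the rectangle areas of a treemap.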
RESULTS
MPXV sequences information
From April 1 to September 2, 2022, the GISAID database contained a total of 1555 submitted MPXV sequences, of which only 940 were complete genome sequences with high coverage, including two reference sequences. Most sequences were submitted between May 1 and August 31, 2022 (data not shown).
In the period studied for this work, 30 countries submitted sequences to the GISAID EpiPox™ database. In Europe, the countries that deposited sequences are: Finland, Sweden, France, Slovenia, Italy, Austria, Netherlands, Portugal, UK, Germany, Spain, Hungary, Belgium and Slovakia. From America: USA, Canada, Ecuador, Chile, Brazil, Peru and Mexico. From Asia: South Korea, Indonesia, Thailand, Japan, Taiwan, Singapore and Israel. From Africa and Oceania: South Africa and Australia, respectively.
Inspection of the alluvial diagram of the complete MPXV sequences present in the EpiPox™ database shows that the majority of the deposited sequences were from Europe (503 sequences - 53.51%), followed by America (423 sequences - 45%); the remaining sequences came from Oceania, Africa, and Asia (Fig. 3). The country with the highest number of sequences deposited in EpiPox™ is Germany (327 sequences - 34.78%), followed by the USA, Canada, Peru, the UK, Brazil, Portugal, and the Netherlands. Each of the remaining countries deposited fewer than ten sequences (between one and eight).
Unfortunately, not all the complete MPXV sequences have all the epidemiological information; for example, the patient’s gender is unknown for the vast majority of sequences (746 sequences - 79.36%). In the cases where this information was reported, most sequences were from male patients (188 sequences - 20%), while only six sequences were from female patients (0.63%).
Regarding the origin of the specimens of the MPXV sequences, most came from lesion-related material (including crusts and vesicles) (530 sequences - 56.38%). Other specified specimen types came from nasopharyngeal, buccal, blood, anal and genital areas. For a large number of the specimens, the origin of the sample was unknown (364 sequences - 38.72%).
MPXV sequences phylogenetic assignment
Of the 940 sequences obtained from GISAID EpiPox™, the Genome Detective web tool assigned 937 (99.7%) as MPXV of clade II (Table 1), while identifying the reference sequences as Variola virus (two sequences) and one sequence as Abatino macacapox virus, both viruses belonging to the genus Orthopoxvirus.
Blast assignment | Genotype assignment | Sequences count | Percentage |
---|---|---|---|
Abatino macacapox virus | Not assigned | 1 | 0.106% |
Monkeypox virus | Clade II | 937 | 99.7% |
Variola virus | Not assigned | 2 | 0.213% |
Total | | 940 | 100% |
The maximum likelihood phylogenetic tree of 940 whole genome sequences of MPXV was built according to the definitions of the GISAID clades using the Nextstrain algorithm (Fig. 4). The overall distribution of clades, lineages, and sub-lineages was highlighted, revealing the dominant occurrence of lineage B.1, following the nomenclature proposed and described in Happi et al. 12 and recently endorsed by a WHO-convened consultation 11. The GISAID MPXV sequences were analyzed to characterize the diversity of clade II. All sequences were assigned to clade II, within lineage B.1 and its sub-lineages, regardless of continent or country of origin, except for one sequence from Thailand, which was assigned to lineage A.2.
Globally, most sequences studied were assigned to the B.1 lineage of clade II (54.14%), followed by the sub-lineages B.1.1 (15%) > B.1.2 (7.55%) > B.1.6 (7.44%) > B.1.7 (5.10%) > B.1.3 (3.9%) > B.1.4 (2.34%) > B.1.8 (2.23%) > B.1.5 (2.12%) > A.2 (0.1%) (Fig. 5A). Within countries, the distribution is similar to the overall incidence observed, i.e., B.1 prevails as the predominant lineage, followed by sub-lineage B.1.1, in almost all the countries, with some exceptions (Slovenia: B.1.3, Italy: B.1.5, Peru: B.1.6, and Thailand: A.2) (Fig. 5B).
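A ranking of this kind can be reproduced from a tab-separated results file by counting the lineage assigned to each sequence. The sketch below assumes a column named `lineage`, which is an assumption about the export and is therefore passed as a parameter; the rows shown are made up and are not the study data.

```python
import csv
import io
from collections import Counter


def lineage_shares(tsv_handle, column="lineage"):
    """Return (lineage, count, percentage) tuples, most frequent first."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    return [
        (lineage, count, 100 * count / total)
        for lineage, count in counts.most_common()
    ]


# Toy table (hypothetical rows; the column names are assumptions).
toy = io.StringIO(
    "seqName\tlineage\n"
    "s1\tB.1\n"
    "s2\tB.1\n"
    "s3\tB.1.1\n"
    "s4\tB.1\n"
)
for lineage, count, percent in lineage_shares(toy):
    print(f"{lineage}: {count} ({percent:.2f}%)")
```

Grouping the rows by country before counting gives the per-country distribution in the same way.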
DISCUSSION
During a large-scale pandemic with an exponential spread such as COVID-19, research data have become an extremely important resource, especially regarding genomic surveillance of SARS-CoV-2. There are now several possibilities for sharing genomic sequences from research and diagnostics. Among the initiatives to share this type of data for biomedical researchers is the GISAID repository. The COVID-19 global health emergency has shown that to accelerate research and control of these infections, research data must be shared rapidly and widely, allowing many published epidemiological studies to be developed solely from open research data 21.
MPXV is a neglected infectious pathogen that re-emerged unexpectedly during the COVID-19 pandemic, causing an outbreak of global health concern. As of June 2022, more than 75 countries have detected this virus and more than 55,000 cases have been confirmed, making it the largest outbreak outside of Africa since its discovery in the 1970s. The rapid spread of this virus is evidenced by the fact that the cases are travel-related 22.
In this study, we used the information deposited in the GISAID EpiPox™ database between April and September 2022 and analyzed it using different web-tools. The majority of the sequences retrieved from GISAID are from the European continent, probably because Europe was the center and origin of this MPXV outbreak, which later spread mainly to the American continent. In addition, most samples with a reported gender were derived from male patients. These results are consistent with early published reports of the MPXV outbreak 23.
Our analysis shows that the sequences recovered from the GISAID EpiPox™ database from the recent outbreak of monkeypox in several countries, which began in April 2022, belong to lineage B.1 of monkeypox clade II. These results are similar to earlier published reports 24-29 and consistent with the reported mild severity and few deaths associated with clade II 30,31.
Interestingly, sequences from only 30 countries were deposited in GISAID in the period covered in this work. This contrasts with the global data, in which at least 100 countries and territories have reported confirmed cases of MPXV 32. Notably, only five sequences from Spain, a country with almost 7000 reported cases, were found for our analysis. Moreover, although cases have been reported in several African countries, such as Cameroon, Central African Republic, Democratic Republic of the Congo, Ghana, Liberia, Nigeria, and Republic of the Congo, no sequences from those countries were found in the database. Only two sequences were obtained from South Africa, a country that, like the rest of the countries analyzed in this study, has not historically reported monkeypox cases 32.
In Latin America, the sequences submitted by Peru show a different pattern from those of the rest of the countries on the American continent. Most Peruvian sequences were assigned to B.1.6, a new lineage identified in the South American country and characterized by the nucleotide mutation G111029A 33. This specific type of mutation is characteristic of the action of the APOBEC3 family of deaminases. These enzymes act on single-stranded DNA to deaminate cytosine to uracil, causing a G→A mutation on the other strand when it is newly synthesized 34,35. It has been reported that, in vif-defective HIV-1, APOBEC3G molecules are packaged into the virion and induce large numbers of mutations 35,36. APOBEC3-type mutations have been observed within eight genomes sampled from an outbreak in Portugal 25,37.
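The reasoning above can be sketched as a small check on mutation labels: cytosine deamination appears as C→T on the edited strand and as G→A on the complementary strand. The sketch below only tests the substitution type and does not examine the neighbouring bases, so it flags mutations that are compatible with APOBEC3 editing rather than proving it. The function names are hypothetical.

```python
import re

# Substitution types expected from cytosine deamination, read on either strand.
APOBEC3_TYPE_SUBSTITUTIONS = {("C", "T"), ("G", "A")}


def parse_mutation(label):
    """Split a label such as 'G111029A' into (reference, position, variant)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not a nucleotide substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def is_apobec3_type(label):
    """True if the substitution is C>T or G>A (sequence context is ignored)."""
    reference, _, variant = parse_mutation(label)
    return (reference, variant) in APOBEC3_TYPE_SUBSTITUTIONS


print(parse_mutation("G111029A"))
print(is_apobec3_type("G111029A"))
```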
The appearance of new sub-lineages in such a short period (April-September, 2022) indicates the rapid evolution and dynamics of this virus, possibly due to the changes generated by the spread of the virus outside the African continent where it was confined. However, further genetic, molecular, and perhaps external factors, such as environmental and human societal habits, need to be studied to understand better how MPXV evolves.
The publicly accessible GISAID database has allowed collaboration among researchers around the world, contributing to the understanding of the development and evolution of the SARS-CoV-2 pandemic and its impact on global public health. In addition to providing near real-time access to variant emergence and key mutations and supporting the understanding of the pathogenesis of the virus, it has contributed to the study and development of potential new vaccines and drugs 38,39. The GISAID EpiPox™ database is also expected to play a significant role in the surveillance of this new global outbreak of MPXV. However, it is essential to overcome the difficulties of collecting epidemiological data in order to obtain a better and more complete epidemiological landscape that will improve the monitoring of the current outbreak of MPXV.