SARS-CoV-2 evolutionary trends provide insights into its adaptive process to human hosts

Evolutionary genomics studies have largely focused on the transformation of ribonucleic acid (RNA) viruses over time. The data acquired from these studies have become even more important in light of the coronavirus disease 2019 (COVID-19) pandemic.

Study: The emergence of variants with increased fitness accelerates the slowdown of genome sequence heterogeneity in the SARS-CoV-2 coronavirus.

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is the virus responsible for COVID-19, is believed to have originated from a direct bat-to-human spillover event. Previous studies have demonstrated that bats are common natural reservoirs of SARS-like coronaviruses (CoVs).

A recombination event between a bat coronavirus and a pangolin virus or a CoV of unknown origin is considered the most likely origin of SARS-CoV-2. Both the bat RaTG13 and pangolin P1E viruses closely match SARS-CoV-2, while other intermediate hosts have also been identified.

Since its emergence in late 2019, SARS-CoV-2 has accumulated a significant number of mutations, giving rise to variants that differ widely in their genetic and genomic composition. Synonymous and non-synonymous mutations, as well as mismatches and deletions, have been identified in the SARS-CoV-2 genome sequence. Among them, the mutations responsible for increased viral fitness, epitope loss, and antibody escape have been of particular scientific importance.

Several studies have reported that SARS-CoV-2 acquired mutations more slowly than expected under neutral evolution, which suggests that purifying selection may be the dominant mode of evolution. Studies on SARS-CoV-2 variant mutations have also highlighted convergent evolution, which may contribute to viral adaptation to human hosts.

Active or passive mutation pressure can lead to variations in base composition at all levels of the phylogenetic hierarchy and throughout the genome. The compositional domain structure can also be altered either by changing nucleotides at the borders that separate adjacent domains or by changing nucleotide frequencies within any given region.

Such arrays of compositional domains with differing nucleotide compositions have been reported to be related to different biological features in many organisms. Therefore, changes in genome heterogeneity can be useful for evolutionary and epidemiological studies, as they can reveal the adaptive process of CoVs to the human host.

A new study published on the preprint server bioRxiv* used Sequence Compositional Complexity (SCC), computed with the help of a segmentation algorithm, to measure genome heterogeneity. The study also used phylogenetic ridge regression to reveal a long-term tendency towards reduced genome sequence heterogeneity in SARS-CoV-2.

About the study

The current study involved the retrieval of randomly sampled, high-quality CoV genome sequences, followed by their masking, alignment, and phylogenetic tree analysis. A compositional segmentation algorithm was subsequently used to divide each CoV genome into homogeneous, non-overlapping domains.
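
As a rough illustration of what compositional segmentation means, the sketch below recursively cuts a sequence at the position that maximizes the Jensen-Shannon divergence between the nucleotide compositions of the two resulting parts. It is a simplified sketch, not the preprint's implementation: the function names (shannon_entropy, best_split, segment) and the threshold and min_len parameters are hypothetical, and a fixed divergence threshold is assumed here in place of the statistical significance test that a published segmentation algorithm would typically apply.

```python
from collections import Counter
from math import log2


def shannon_entropy(counts, total):
    """Shannon entropy (in bits) of a composition given as symbol counts."""
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def best_split(seq, min_len):
    """Return (position, divergence) of the cut with the largest
    Jensen-Shannon divergence, or (None, 0.0) if no cut improves on zero."""
    n = len(seq)
    total_counts = Counter(seq)
    h_total = shannon_entropy(total_counts, n)
    left = Counter()
    best_pos, best_jsd = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1  # left part is seq[:i]
        if i < min_len or n - i < min_len:
            continue  # keep both parts at least min_len long
        right = total_counts - left  # Counter subtraction drops zero counts
        jsd = (h_total
               - (i / n) * shannon_entropy(left, i)
               - ((n - i) / n) * shannon_entropy(right, n - i))
        if jsd > best_jsd:
            best_pos, best_jsd = i, jsd
    return best_pos, best_jsd


def segment(seq, threshold=0.1, min_len=10, offset=0):
    """Recursively split seq and return the sorted cut positions.

    threshold is an assumed, fixed cutoff on the divergence (in bits);
    it stands in for a proper significance test.
    """
    pos, jsd = best_split(seq, min_len)
    if pos is None or jsd < threshold:
        return []
    return (segment(seq[:pos], threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, min_len, offset + pos))
```

For example, segment("A" * 50 + "G" * 50) places a single cut at the boundary between the two runs, whereas a compositionally uniform sequence such as "ACGT" * 25 is left unsegmented.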

Genome heterogeneity was measured using SCC, followed by phylogenetic ridge regression. Finally, the effects of the different SARS-CoV-2 variants were compared with one another.
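
To make the heterogeneity measure more concrete, the sketch below computes SCC under the assumption that it is defined as the Jensen-Shannon divergence of a segmentation: the entropy of the whole sequence minus the length-weighted entropies of its domains. The function names (entropy_bits, sequence_compositional_complexity) and the cuts argument are hypothetical and chosen for illustration only; the domain boundaries are taken as given rather than derived from the segmentation procedure used in the preprint.

```python
from collections import Counter
from math import log2


def entropy_bits(seq):
    """Shannon entropy (in bits) of the symbol composition of seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def sequence_compositional_complexity(seq, cuts):
    """Assumed definition: SCC = H(seq) - sum_i (len_i / N) * H(domain_i).

    cuts lists the positions where one domain ends and the next begins.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    bounds = [0] + sorted(cuts) + [n]
    domains = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    weighted = sum(len(d) / n * entropy_bits(d) for d in domains)
    return entropy_bits(seq) - weighted
```

Under this definition, a sequence with a single domain has an SCC of zero, and the value grows as the domains become more distinct in composition, which is why a falling SCC can be read as a loss of genome heterogeneity.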

Study findings

The first SARS-CoV-2 genome sequence, which was obtained at the beginning of the COVID-19 pandemic, was divided into eight compositional domains. Subsequent SARS-CoV-2 strains varied considerably in the number and length of their domains, as well as in nucleotide composition.

The SARS-CoV-2 genome sequence heterogeneity did not follow any trend in SCC during the first year of the pandemic. However, its sequence heterogeneity started to decrease over time with the emergence of variants in December 2020. Furthermore, a tendency towards faster evolution was identified throughout the study period.

The SARS-CoV-2 Omicron variant exhibited a stronger decline in SCC than the other variants. The evolutionary rates for Omicron were also higher than those of previous SARS-CoV-2 variants.


The loss of genome heterogeneity in CoVs is associated with the rise of variants with high viral fitness, which reflects the adaptation of the virus to the human host. Further research on the evolutionary trends of new SARS-CoV-2 variants and recombinant lineages will help determine the extent to which the evolution of genome sequence heterogeneity in SARS-CoV-2 impacts human health.

*Important notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.