Viral phylodynamics is the study of how epidemiological, immunological, and evolutionary processes act and potentially interact to shape viral phylogenies. Since the term was coined in 2004, research on viral phylodynamics has focused on transmission dynamics in an effort to shed light on how these dynamics impact viral genetic variation. Transmission dynamics can be considered at the level of cells within an infected host, individual hosts within a population, or entire populations of hosts.
Many viruses, especially RNA viruses, rapidly accumulate genetic variation because of short generation times and high mutation rates. Patterns of viral genetic variation are therefore heavily influenced by how quickly transmission occurs and by which entities transmit to one another. Patterns of viral genetic variation will also be affected by selection acting on viral phenotypes. Although viruses can differ with respect to many phenotypes, phylodynamic studies have to date tended to focus on a limited number of them. These include virulence phenotypes, phenotypes associated with viral transmissibility, cell or tissue tropism phenotypes, and antigenic phenotypes that can facilitate escape from host immunity. Because transmission dynamics and selection shape viral genetic variation, viral phylogenies can be used to investigate important epidemiological, immunological, and evolutionary processes, such as epidemic dynamics, spatio-temporal dynamics including metapopulation dynamics, zoonotic transmission, tissue tropism, and antigenic drift. The quantitative investigation of these processes through the consideration of viral phylogenies is the central aim of viral phylodynamics.
Although these three phylogenetic features are useful rules of thumb to identify epidemiological, immunological, and evolutionary processes that might be impacting viral genetic variation, there is growing recognition that the mapping between process and phylogenetic pattern can be many-to-one. For instance, although ladder-like trees could reflect the presence of directional selection, ladder-like trees could also reflect sequential genetic bottlenecks that might occur with rapid spatial spread, as in the case of rabies virus. Because of this many-to-one mapping between process and phylogenetic pattern, research in the field of viral phylodynamics has sought to develop and apply quantitative methods to effectively infer process from reconstructed viral phylogenies (see Methods). The consideration of other data sources (e.g., incidence patterns) may aid in distinguishing between competing phylodynamic hypotheses. Combining disparate sources of data for phylodynamic analysis remains a major challenge in the field and is an active area of research.
Viral control efforts can also impact the rate at which virus populations evolve, thereby influencing phylogenetic patterns. Phylodynamic approaches that quantify how evolutionary rates change over time can therefore provide insight into the effectiveness of control strategies. For example, an application to HIV sequences within infected hosts showed that viral substitution rates dropped to effectively zero following the initiation of antiretroviral drug therapy. This decrease in substitution rates was interpreted as an effective cessation of viral replication following the commencement of treatment, and would be expected to lead to lower viral loads. This finding is especially encouraging because lower substitution rates are associated with slower progression to AIDS in treatment-naive patients.
Antiviral treatment also creates selective pressure for the evolution of drug resistance in virus populations, and can thereby affect patterns of genetic diversity. Commonly, there is a fitness trade-off between faster replication of susceptible strains in the absence of antiviral treatment and faster replication of resistant strains in the presence of antivirals. Thus, ascertaining the level of antiviral pressure necessary to shift evolutionary outcomes is of public health importance. Phylodynamic approaches have been used to examine the spread of oseltamivir resistance in influenza A/H1N1.
Traditional evolutionary approaches directly utilize methods from computational phylogenetics and population genetics to assess hypotheses of selection and population structure without direct regard for epidemiological models. For example,
In an effort to bridge the gap between traditional evolutionary approaches and epidemiological models, several analytical methods have been developed to specifically address problems related to phylodynamics. These methods are based on coalescent theory, birth-death models, and simulation, and are used to more directly relate epidemiological parameters to observed viral sequences.
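The expected waiting time to find the MRCA of the sample is the sum of the expected values of the internode intervals. For a population of constant size $N$, while $k$ lineages remain they coalesce at rate $\binom{k}{2}/N$. The interval $T_k$ therefore has expected length $2N/(k(k-1))$ (time in generations), and the sum telescopes:

$$E[T_{MRCA}] = \sum_{k=2}^{n} E[T_k] = \sum_{k=2}^{n} \frac{2N}{k(k-1)} = 2N\left(1 - \frac{1}{n}\right).$$

Two corollaries follow. First, the expected TMRCA is bounded above by $2N$ whatever the sample size $n$. Second, the expected TMRCA of a sample of $n$ lineages, relative to that of the whole population of $N$ lineages, is $(1 - 1/n)/(1 - 1/N) \approx 1 - 1/n$.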
Consequently, the TMRCA estimated from a relatively small sample of viral genetic sequences is an asymptotically unbiased estimate for the time that the viral population was founded in the host population.
For example, Robbins et al. estimated the TMRCA for 74 HIV-1 subtype B genetic sequences collected in North America to be 1968. Assuming a constant population size, we expect the time back to 1968 to represent $1 - 1/74 \approx 99\%$ of the TMRCA of the North American virus population.
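The telescoping sum and the $1 - 1/n$ corollary can be checked numerically. The sketch below assumes a constant-size, haploid Kingman coalescent with time measured in generations. The population size in the usage lines is a hypothetical value, not an estimate from Robbins et al.

```python
def expected_tmrca(n, N):
    """Expected TMRCA (generations) of n lineages in a constant-size population
    of N, summing the expected internode intervals 2N / (k (k - 1))."""
    if n < 2:
        raise ValueError("need at least two lineages")
    return sum(2 * N / (k * (k - 1)) for k in range(2, n + 1))


def closed_form_tmrca(n, N):
    """Telescoped form of the same sum: 2N (1 - 1/n)."""
    return 2 * N * (1 - 1 / n)


def sample_fraction_of_population_tmrca(n, N):
    """Expected TMRCA of a sample of n lineages relative to that of all N lineages."""
    return (1 - 1 / n) / (1 - 1 / N)


if __name__ == "__main__":
    N = 10_000  # hypothetical constant population size
    for n in (2, 10, 74, 1_000):
        print(n, round(expected_tmrca(n, N), 2), round(closed_form_tmrca(n, N), 2))
    # fraction of the population TMRCA spanned by a sample of 74 sequences
    print(f"{sample_fraction_of_population_tmrca(74, N):.3f}")
```

Going from 74 to 1,000 sequences changes the expected tree height by only about 1%. This is why a modest sample is enough to date the founding of a population.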
If the population size changes over time, the coalescent rate will also be a function of time. Donnelly and Tavaré derived this rate for a time-varying population size $N(t)$ under the assumption of constant birth rates:

$$\lambda_n(t) = \binom{n}{2}\frac{1}{N(t)}.$$
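Very early in an epidemic, the virus population may be growing exponentially at rate $r$. Then, $\tau$ units of time in the past, the population will have size $N(\tau) = N_0 e^{-r\tau}$, where $N_0$ is the population size at the time of sampling. In this case, the rate of coalescence becomes

$$\lambda_n(\tau) = \binom{n}{2}\frac{1}{N_0 e^{-r\tau}} = \binom{n}{2}\frac{e^{r\tau}}{N_0}.$$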
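To see how growth compresses a genealogy, the time-varying rate can be simulated by inverting its cumulative hazard. This is a minimal sketch assuming $r \ge 0$, a single isochronous sample, and the rate $\binom{k}{2} e^{r\tau}/N_0$. The function names and parameter values are illustrative only.

```python
import math
import random


def coalescence_rate(k, tau, N0, r):
    """Rate at which k lineages coalesce tau time units before sampling when
    N(tau) = N0 * exp(-r * tau)."""
    return (k * (k - 1) / 2) * math.exp(r * tau) / N0


def simulate_coalescent_times(n, N0, r, rng):
    """Coalescent event times, measured backward from sampling, for n lineages
    under exponential growth at rate r >= 0. Each waiting time is drawn by
    inverting the cumulative hazard of the time-varying rate above."""
    if r < 0:
        raise ValueError("this sketch only handles growth (r >= 0)")
    tau, times = 0.0, []
    for k in range(n, 1, -1):
        pairs = k * (k - 1) / 2
        e = rng.expovariate(1.0)  # unit-rate exponential draw
        if r == 0:
            wait = e * N0 / pairs  # constant-size Kingman coalescent
        else:
            # solve (pairs / (N0 * r)) * e^{r tau} * (e^{r w} - 1) = e for w
            wait = math.log1p(e * r * N0 * math.exp(-r * tau) / pairs) / r
        tau += wait
        times.append(tau)
    return times


if __name__ == "__main__":
    rng = random.Random(42)
    for r in (0.0, 0.01):  # hypothetical growth rates per generation
        tmrcas = [simulate_coalescent_times(20, 1_000, r, rng)[-1] for _ in range(2_000)]
        print(r, sum(tmrcas) / len(tmrcas))
```

Under growth, coalescences pile up near the root and the tree is much shallower than under constant size. This star-like shape is characteristic of early epidemic genealogies.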
If the rate of exponential growth $r$ is estimated from a gene genealogy, it may be combined with knowledge of the duration of infection or the serial interval $D$ for a particular pathogen to estimate the basic reproduction number, $R_0$. The two may be linked by the following equation (R. M. Anderson and R. M. May (1992) Infectious Diseases of Humans: Dynamics and Control. Oxford: Oxford University Press):

$$r = \frac{R_0 - 1}{D}.$$
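Rearranging $r = (R_0 - 1)/D$ gives $R_0 = 1 + rD$. The sketch below applies that arithmetic and assumes an SIR-type model with an exponentially distributed infectious period, the setting in which this relation holds. The growth rate and duration in the usage lines are hypothetical, and the doubling-time helper is an added convenience.

```python
import math


def r0_from_growth_rate(r, D):
    """R0 = 1 + r * D, rearranged from r = (R0 - 1) / D.
    r and D must use the same time unit (e.g. per day and days)."""
    return 1 + r * D


def growth_rate_from_r0(R0, D):
    """Inverse relation: exponential growth rate implied by R0 and D."""
    return (R0 - 1) / D


def doubling_time(r):
    """Epidemic doubling time implied by exponential growth at rate r > 0."""
    return math.log(2) / r


if __name__ == "__main__":
    r, D = 0.2, 3.0  # hypothetical: growth 0.2 per day, mean infectious period 3 days
    print(r0_from_growth_rate(r, D), doubling_time(r))
```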
For example, one of the first estimates of $R_0$ for pandemic H1N1 influenza in 2009 was obtained from a coalescent-based analysis of 11 hemagglutinin sequences, combined with prior data about the infectious period for influenza.
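For the simple SIR model, this yields

$$\lambda_n \approx \binom{n}{2}\frac{2\beta S(t)}{I(t)}.$$

Early in an epidemic, $S(0) \approx 1$, so for the SIR model

$$\lambda_n \approx \binom{n}{2}\frac{2\beta}{I(t)}.$$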
When a disease is no longer exponentially growing but has become endemic, the rate of lineage coalescence can also be derived for the epidemiological model governing the disease's transmission dynamics. This can be done by extending the Wright-Fisher model to allow for unequal offspring distributions. With a Wright-Fisher generation taking $g$ units of time, the rate of coalescence is given by:

$$\lambda_n = \binom{n}{2}\frac{1}{N_e\, g}$$
For example, consider the SIR model above, modified to include births into the population and deaths out of the population at rate $\mu$. The population size $N$ is then given by the equilibrium number of infected individuals, $I^*$. The mean basic reproduction number, averaged across all infected individuals, is $\beta/\gamma$, assuming that the background mortality rate $\mu$ is negligible compared to the rate of recovery $\gamma$. The variance in individuals' basic reproduction numbers is $(\beta/\gamma)^2$, because the duration of time individuals remain infected in the SIR model is exponentially distributed. The variance in the offspring distribution, $\sigma^2$, is therefore 2. $N_e$ therefore becomes $I^*/2$. With the generation time equal to the mean duration of infection, $D = 1/\gamma$, the rate of coalescence becomes:

$$\lambda_n = \binom{n}{2}\frac{2}{D\, I^*} = \binom{n}{2}\frac{2\gamma}{I^*}.$$
This rate, derived for the SIR model at equilibrium, is equivalent to the rate of coalescence given by the more general formula. At equilibrium, incidence balances recovery ($f = \gamma I^*$), so $\binom{n}{2}\,2f/(I^*)^2 = \binom{n}{2}\,2\gamma/I^*$. Rates of coalescence can similarly be derived for other epidemiological models, including:
- models with super-spreaders or other transmission heterogeneities;
- models with individuals who are exposed but not yet infectious;
- models with variable infectious periods.

Given some epidemiological information (such as the duration of infection) and a specification of a mathematical model, viral phylogenies can therefore be used to estimate epidemiological parameters that might otherwise be difficult to quantify.
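The claimed equivalence can be checked numerically by computing the endemic rate in both ways. The sketch assumes an SIR model with births and deaths at rate $\mu$, a host population of $H$ (counts), and incidence $\beta S I/H$. It uses the generation time $1/(\gamma + \mu)$ rather than $1/\gamma$, so the two routes agree up to the factor $I^*/(I^* - 1)$. All parameter values are hypothetical.

```python
from math import comb


def sir_endemic_equilibrium(beta, gamma, mu, H):
    """Endemic equilibrium (S*, I*) of an SIR model with births and deaths at
    per-capita rate mu, host population H and incidence beta * S * I / H."""
    R0 = beta / (gamma + mu)
    if R0 <= 1:
        raise ValueError("no endemic equilibrium when R0 <= 1")
    S_star = H / R0
    I_star = mu * H * (1 - 1 / R0) / (gamma + mu)
    return S_star, I_star


def rate_from_incidence(n, beta, gamma, mu, H):
    """General route: C(n,2) / C(Y,2) * f with Y = I* and f the equilibrium incidence."""
    S_star, I_star = sir_endemic_equilibrium(beta, gamma, mu, H)
    f = beta * S_star * I_star / H
    return comb(n, 2) / (I_star * (I_star - 1) / 2) * f


def rate_from_wright_fisher(n, beta, gamma, mu, H):
    """Wright-Fisher route: N_e = I* / sigma^2 with sigma^2 = 2, generation g = D."""
    _, I_star = sir_endemic_equilibrium(beta, gamma, mu, H)
    sigma2 = 2.0            # exponentially distributed infectious periods
    D = 1 / (gamma + mu)    # mean duration of infection
    return comb(n, 2) / ((I_star / sigma2) * D)


if __name__ == "__main__":
    # hypothetical daily rates: recovery after ~10 days, 70-year host lifespan
    args = (10, 0.3, 0.1, 1 / (70 * 365), 1e7)
    print(rate_from_incidence(*args), rate_from_wright_fisher(*args))
```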
Beyond the presence or absence of population structure, phylodynamic methods can be used to infer the rates of movement of viral lineages between geographic locations and reconstruct the geographic locations of ancestral lineages.
Here, geographic location is treated as a phylogenetic character state, similar in spirit to the nucleotide states 'A', 'T', 'G' and 'C', so that movement between locations is described by a substitution model.
The same phylogenetic machinery that is used to infer models of DNA evolution can thus be used to infer geographic transition matrices.
The end result is a rate, expressed either per year or per unit of nucleotide substitution (substitutions per site), at which a lineage in one region moves to another region over the course of the phylogenetic tree.
In a geographic transmission network, some regions may mix more readily and other regions may be more isolated.
Additionally, some transmission connections may be asymmetric, so that the rate at which lineages in region 'A' move to region 'B' may differ from the rate at which lineages in 'B' move to 'A'.
With geographic location thus encoded, ancestral state reconstruction can be used to infer ancestral geographic locations of particular nodes in the phylogeny. These types of approaches can be extended by substituting other attributes for geographic locations. For example, in an application to rabies virus, Streicker and colleagues estimated rates of cross-species transmission by considering host species as the attribute.
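As a minimal sketch of the idea, location can be treated as a two-state character evolving along branches under a continuous-time Markov chain with asymmetric migration rates. Ancestral reconstruction then weights each possible root location by the probability of the observed tip locations. The sketch makes several simplifying assumptions: only two locations, a star-shaped tree (root joined directly to each tip), hypothetical rates, and a prior equal to the stationary distribution. Full analyses use larger rate matrices and Felsenstein's pruning over the whole tree.

```python
import math


def transition_probs(a, b, t):
    """Transition probabilities for a two-location continuous-time Markov chain
    with migration rate a (A -> B) and b (B -> A) along a branch of length t.
    Index 0 = location A, 1 = location B; rows are the starting location."""
    s = a + b
    decay = math.exp(-s * t)
    p_aa = b / s + (a / s) * decay
    p_bb = a / s + (b / s) * decay
    return [[p_aa, 1 - p_aa], [1 - p_bb, p_bb]]


def root_location_posterior(tip_locations, branch_lengths, a, b):
    """Posterior over the root's location for a star tree, using the chain's
    stationary distribution as the prior."""
    prior = [b / (a + b), a / (a + b)]
    weights = []
    for root in (0, 1):
        w = prior[root]
        for loc, t in zip(tip_locations, branch_lengths):
            w *= transition_probs(a, b, t)[root][loc]
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # hypothetical asymmetric rates: lineages leave A five times faster than B
    a, b = 0.5, 0.1
    print(transition_probs(a, b, 1.0))
    print(root_location_posterior([0, 0, 1], [0.2, 0.3, 1.5], a, b))
```

Because $a \neq b$, the chain's long-run distribution favors the location that lineages leave more slowly. This is how asymmetric connections appear in the inferred transition matrix.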
Simulation-based models require specification of a transmission model for the infection process between infected hosts and susceptible hosts and for the recovery process of infected hosts.
Simulation-based models may be compartmental, tracking the numbers of hosts infected with and recovered from different viral strains, or may be individual-based, tracking the infection state and immune history of every host in the population.
Generally, compartmental models offer significant advantages in terms of speed and memory usage, but may be difficult to implement for complex evolutionary or epidemiological scenarios.
A forward simulation model may account for geographic population structure or age structure by modulating transmission rates between host individuals of different geographic or age classes.
Additionally, seasonality may be incorporated by allowing time of year to influence transmission rate in a stepwise or sine wave fashion.
Connecting the epidemiological model to viral genealogies requires that multiple viral strains, with different nucleotide or amino acid sequences, exist in the simulation, each typically tracked as a separate infected class.
In this case, mutation acts to convert a host in one infected class to another infected class.
Over the course of the simulation, viruses mutate and sequences are produced, from which phylogenies may be constructed and analyzed.
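The ingredients above can be combined in a small deterministic compartmental model: seasonal (sine or stepwise) forcing of transmission, plus mutation that moves hosts from one infected class to the next. This is a forward-Euler sketch under strong simplifying assumptions. Recovery from any strain is assumed to protect against all strains, there is no strain-specific cross-immunity, and all rates (per year) and the stepwise "season" are hypothetical.

```python
import math


def seasonal_beta(t, beta0, amp, mode="sine"):
    """Transmission rate at time t (years): sinusoidal or stepwise forcing."""
    if mode == "sine":
        return beta0 * (1 + amp * math.sin(2 * math.pi * t))
    if mode == "step":
        # hypothetical high-transmission season in the first quarter of each year
        return beta0 * (1 + amp) if (t % 1.0) < 0.25 else beta0 * (1 - amp)
    raise ValueError(f"unknown forcing mode: {mode}")


def simulate_multistrain_sir(n_strains=3, beta0=100.0, amp=0.3, gamma=52.0,
                             mutation=0.5, years=2.0, dt=1e-4, mode="sine"):
    """Forward-Euler integration of a compartmental SIR model (fractions of the
    host population) in which hosts in infected class k move to class k + 1 by
    mutation; the last class has no outgoing mutation."""
    S, R = 0.99, 0.0
    I = [0.0] * n_strains
    I[0] = 0.01
    for step in range(int(round(years / dt))):
        beta = seasonal_beta(step * dt, beta0, amp, mode)
        new_inf = [beta * S * x for x in I]
        recov = [gamma * x for x in I]
        mut_out = [mutation * x if k < n_strains - 1 else 0.0 for k, x in enumerate(I)]
        mut_in = [0.0] + mut_out[:-1]  # class k receives what class k - 1 loses
        S -= dt * sum(new_inf)
        R += dt * sum(recov)
        I = [x + dt * (ni - rc - mo + mi)
             for x, ni, rc, mo, mi in zip(I, new_inf, recov, mut_out, mut_in)]
    return S, I, R


if __name__ == "__main__":
    S, I, R = simulate_multistrain_sir()
    print(round(S, 4), [round(x, 6) for x in I], round(R, 4))
```

A compartmental model like this stores only a handful of numbers per strain, which is the speed and memory advantage noted above. The cost is that immune histories must be collapsed into classes, which becomes awkward once cross-immunity between many strains matters.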
For antigenically variable viruses, it becomes crucial to model the risk of transmission from an individual infected with virus strain 'A' to an individual who has previously been infected with virus strains 'B', 'C', and so on.
The level of protection against one strain of virus conferred by infection with a second strain is known as cross-immunity.
In addition to risk of infection, cross-immunity may modulate the probability that a host becomes infectious and the duration that a host remains infectious.
Often, the degree of cross-immunity between virus strains is assumed to be related to their Hamming distance.
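One way to make this assumption concrete is a minimal sketch in which protection declines with the Hamming distance between aligned sequences. The linear decline, the cutoff `d_max`, and the choice to use the strongest protection from any past infection are all hypothetical modeling choices for illustration, not a standard form.

```python
def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def cross_immunity(a, b, d_max):
    """Hypothetical linear form: full protection for identical strains,
    declining to zero at d_max or more differences."""
    return max(0.0, 1.0 - hamming(a, b) / d_max)


def infection_prob(challenge, history, d_max, base=1.0):
    """Risk of infection by `challenge` given previously seen strains, using the
    strongest cross-immunity from any past infection (an assumption)."""
    if not history:
        return base
    best = max(cross_immunity(challenge, h, d_max) for h in history)
    return base * (1 - best)


if __name__ == "__main__":
    print(infection_prob("ACGTAC", ["ACGTTC", "TTGTAC"], d_max=4))
```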
In general, because these methods require running simulations rather than computing likelihoods, it can be difficult to make fine-scale inferences about epidemiological parameters. Instead, this work usually focuses on broader questions, testing whether overall genealogical patterns are consistent with one epidemiological model or another. Additionally, simulation-based methods are often used to validate inference results, providing test data for which the correct answer is known ahead of time. Computing likelihoods for genealogical data under complex simulation models has proven difficult. An alternative statistical approach called Approximate Bayesian Computation (ABC) is therefore becoming popular for fitting these simulation models to patterns of genetic variation, following successful application of this approach to bacterial diseases. ABC uses easily computable summary statistics to approximate likelihoods, rather than the likelihoods themselves.
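The basic rejection form of ABC can be sketched in a few lines: draw a parameter from the prior, simulate data, reduce it to a summary statistic, and keep the draw if the summary is close to the observed one. The toy simulator here is hypothetical. It assumes independent loci, each contributing a pairwise coalescence time from a constant-size population of $N$, summarized by their mean. The prior range and tolerance are also illustrative.

```python
import random
import statistics


def simulate_summary(N, n_loci, rng):
    """Hypothetical simulator: mean pairwise coalescence time across independent
    loci, each exponential with mean N (constant-size Kingman coalescent)."""
    return statistics.fmean(rng.expovariate(1 / N) for _ in range(n_loci))


def abc_rejection(observed, n_draws, tol, prior=(100, 2000), n_loci=50, seed=1):
    """Keep prior draws of N whose simulated summary lies within a relative
    tolerance of the observed summary."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        N = rng.uniform(*prior)                   # draw from the prior
        s = simulate_summary(N, n_loci, rng)      # simulate, then summarize
        if abs(s - observed) <= tol * observed:   # accept if close enough
            accepted.append(N)
    return accepted


if __name__ == "__main__":
    observed = simulate_summary(800, 50, random.Random(0))  # pseudo-observed data
    accepted = abc_rejection(observed, 20_000, 0.1)
    print(len(accepted), statistics.fmean(accepted))
```

The accepted draws approximate the posterior without ever evaluating a likelihood. Shrinking the tolerance improves the approximation at the cost of more simulations.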
Further analysis of HA has shown it to have a very small effective population size relative to the census size of the virus population, as expected for a gene undergoing strong positive selection. However, across the influenza genome, there is surprisingly little variation in effective population size; all genes are nearly equally low.
This finding suggests that reassortment between segments occurs slowly enough, relative to the actions of positive selection, that genetic hitchhiking causes beneficial mutations in HA and NA to reduce diversity in linked neutral variation in other segments of the genome.
Influenza A/H1N1 shows a larger effective population size and greater genetic diversity than influenza A/H3N2, suggesting that H1N1 undergoes less adaptive evolution than H3N2.
This hypothesis is supported by empirical patterns of antigenic evolution; there have been nine vaccine updates recommended by the WHO for H1N1 in the interpandemic period between 1978 and 2009, while there have been 20 vaccine updates recommended for H3N2 during this same time period.
Additionally, an analysis of patterns of sequence evolution on trunk and side branches suggests that H1N1 undergoes substantially less positive selection than H3N2. However, the underlying evolutionary or epidemiological cause for this difference between H3N2 and H1N1 remains unclear.
All of these phylogeographic studies necessarily suffer from limitations in the worldwide sampling of influenza viruses. For example, the relative importance of tropical Africa and India has yet to be uncovered. Additionally, the phylogeographic methods used in these studies (see section on phylogeographic methods) make inferences of the ancestral locations and migration rates on only the samples at hand, rather than on the population in which these samples are embedded.
Because of this, study-specific sampling procedures are a concern in extrapolating to population-level inferences. However, estimates of migration rates that are jointly based on epidemiological and evolutionary simulations appear robust to a large degree of undersampling or oversampling of a particular region. Further methodological progress is required to more fully address these issues.
Later work by Ferguson and colleagues adopted an agent-based approach to better identify the immunological and ecological determinants of influenza evolution.
The authors modeled influenza's hemagglutinin as four epitopes, each consisting of three amino acids.
They showed that under strain-specific immunity alone (with partial cross-immunity between strains based on their amino acid similarity), the phylogeny of influenza A/H3N2's HA was expected to exhibit 'explosive genetic diversity', a pattern that is inconsistent with empirical data.
This led the authors to postulate the existence of a temporary strain-transcending immunity: individuals were immune to reinfection with any other influenza strain for approximately six months following an infection.
With this assumption, the agent-based model could reproduce the ladder-like phylogeny of influenza A/H3N2's HA protein.
Work by Koelle and colleagues revisited the dynamics of influenza A/H3N2 evolution following the publication of a paper by Smith and colleagues which showed that the antigenic evolution of the virus occurred in a punctuated manner. The phylodynamic model designed by Koelle and coauthors argued that this pattern reflected a many-to-one genotype-to-phenotype mapping, with the possibility of strains from antigenically distinct clusters of influenza sharing a high degree of genetic similarity.
Through incorporating this mapping of viral genotype into viral phenotype (or antigenic cluster) into their model, the authors were able to reproduce the ladder-like phylogeny of influenza's HA protein without generalized strain-transcending immunity.
The reproduction of the ladder-like phylogeny resulted from the viral population passing through repeated selective sweeps.
These sweeps were driven by herd immunity and acted to constrain viral genetic diversity.
Instead of modeling the genotypes of viral strains, a compartmental simulation model by Gökaydin and colleagues considered influenza evolution at the scale of antigenic clusters (or phenotypes).
This model showed that antigenic emergence and replacement could result under certain epidemiological conditions.
These antigenic dynamics would be consistent with a ladder-like phylogeny of influenza exhibiting low genetic diversity and continual strain turnover.
In recent work, Bedford and colleagues used an agent-based model to show that evolution in a Euclidean antigenic space can account for the phylogenetic pattern of influenza A/H3N2's HA, as well as the virus's antigenic, epidemiological, and geographic patterns.
The model showed the reproduction of influenza's ladder-like phylogeny depended critically on the mutation rate of the virus as well as the immunological distance yielded by each mutation.
Genetic and antigenic variation of influenza is also present across a diverse set of host species.
The impact of host population structure can be seen in the evolution of equine influenza A/H3N8: instead of a single trunk with short side-branches, the hemagglutinin of influenza A/H3N8 splits into two geographically distinct lineages, representing American and European viruses.
The evolution of these two lineages is thought to have occurred as a consequence of quarantine measures.
Additionally, host immune responses are hypothesized to modulate virus evolutionary dynamics.
Swine influenza A/H3N2 is known to evolve antigenically at a rate that is six times slower than that of the same virus circulating in humans, although these viruses' rates of genetic evolution are similar.
Influenza in aquatic birds is hypothesized to exhibit 'evolutionary stasis', although recent phylogenetic work indicates that the rate of evolutionary change in these hosts is similar to that in other hosts, including humans.
In these cases, it is thought that short host lifespans prevent the build-up of host immunity necessary to effectively drive antigenic drift.
Coalescent approaches have been used to estimate the rate of exponential growth of HIV in Central Africa in the early 20th century, before modern subtypes were established. Several estimates based on parametric exponential growth models are shown in table 1, for different time periods, risk groups and subtypes. The early spread of HIV-1 has also been characterized using nonparametric ("skyline") estimates of the effective population size $N_e$.
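The simplest nonparametric estimator, the classic skyline, can be sketched as follows. Under a constant-size coalescent, the interval with $k$ lineages has expected length $N_e/\binom{k}{2}$, so each observed interval yields the estimate $\binom{k}{2}$ times its length. The sketch assumes all tips were sampled at the same time and that branch lengths are in calendar time, so estimates are in units of $N_e$ times generation time. The studies cited may have used smoother skyline variants.

```python
from math import comb


def classic_skyline(coalescent_times):
    """Classic skyline: one N_e estimate per inter-coalescent interval,
    N_e_hat = C(k, 2) * (interval length), for a tree whose n tips were all
    sampled at time 0 and whose n - 1 coalescent times are measured backward.
    Returns (interval start, interval end, estimate) tuples."""
    times = sorted(coalescent_times)
    n = len(times) + 1
    estimates, prev = [], 0.0
    for i, t in enumerate(times):
        k = n - i  # lineages present between prev and t
        estimates.append((prev, t, comb(k, 2) * (t - prev)))
        prev = t
    return estimates


if __name__ == "__main__":
    # hypothetical coalescent times (years before sampling) for a 5-tip tree
    for start, end, ne in classic_skyline([0.8, 2.0, 4.5, 12.0]):
        print(f"{start:5.1f}-{end:5.1f}: {ne:.1f}")
```

Each interval supplies only one exponential waiting time, so individual classic-skyline estimates are very noisy. This noise is what smoothed and Bayesian skyline methods address.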
The early growth of subtype B in North America was quite high. However, the duration of exponential growth was relatively short, with saturation occurring in the mid- and late 1980s.
At the opposite extreme, HIV-1 group O, a relatively rare group that is geographically confined to Cameroon and that is mainly spread by heterosexual sex, has grown at a lower rate than either subtype B or C.
HIV-1 sequences sampled over a span of five decades have been used with relaxed molecular clock phylogenetic methods to estimate the time of cross-species viral spillover into humans around the early 20th century.
The estimated TMRCA for HIV-1 coincides with the appearance of the first densely populated large cities in Central Africa.
Similar methods have been used to estimate the time that HIV originated in different parts of the world.
The origin of subtype B in North America is estimated to be in the 1960s; the virus then went undetected until the AIDS epidemic of the 1980s.
There is evidence that progenitors of modern subtype B originally colonized the Caribbean before undergoing multiple radiations to North and South America.
Subtype C originated around the same time in Africa.
By analyzing phylogenies estimated from HIV sequences from men who have sex with men in London, United Kingdom, Lewis et al. found evidence that transmission is highly concentrated in the brief period of primary HIV infection (PHI), which consists of approximately the first 6 months of the infectious period.
In a separate analysis, Volz et al. found that simple epidemiological dynamics explain phylogenetic clustering of viruses collected from patients with PHI.
Patients who were recently infected were more likely to harbor virus that is phylogenetically close to samples from other recently infected patients. Such clustering is consistent with observations in simulated epidemiological dynamics featuring an early period of intensified transmission during PHI. These results therefore provided further support for Lewis et al.'s findings that HIV transmission occurs frequently from individuals early in their infection.
There is some evidence from comparative phylogenetic analysis and epidemic simulations that HIV adapts at the level of the population to maximize transmission potential between hosts. This adaptation is towards intermediate virulence levels, which balances the productive lifetime of the host (time until AIDS) with the transmission probability per act. A useful proxy for virulence is the set-point viral load (SPVL), which is correlated with the time until AIDS. SPVL is the quasi-equilibrium titer of viral particles in the blood during chronic infection. For adaptation towards intermediate virulence to be possible, SPVL needs to be heritable and a trade-off between viral transmissibility and the lifespan of the host needs to exist. SPVL has been shown to be correlated between HIV donor and recipients in transmission pairs, thereby providing evidence that SPVL is at least partly heritable. The transmission probability of HIV per sexual act is positively correlated with viral load, thereby providing evidence of the trade-off between transmissibility and virulence. It is therefore theoretically possible that HIV evolves to maximize its transmission potential. Epidemiological simulation and comparative phylogenetic studies have shown that adaptation of HIV towards optimum SPVL could be expected over 100–150 years. These results depend on empirical estimates for the transmissibility of HIV and the lifespan of hosts as a function of SPVL.
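The trade-off argument can be illustrated by maximizing transmission potential (transmissibility times infectious lifespan) over SPVL. This is a minimal sketch with hypothetical Hill-type curves and made-up parameters, not the empirical estimates the cited studies rely on. With both Hill exponents set to 1, the optimum is at $\sqrt{V_{50} D_{50}}$, which gives a simple check on the grid search.

```python
import math


def transmission_rate(V, beta_max=1.0, V50=1e4, h=1.0):
    """Hypothetical Hill-type increase of per-act transmissibility with SPVL V."""
    return beta_max * V**h / (V**h + V50**h)


def infectious_duration(V, D_max=10.0, D50=3e4, k=1.0):
    """Hypothetical Hill-type decline of time to AIDS (years) with SPVL V."""
    return D_max * D50**k / (V**k + D50**k)


def transmission_potential(V):
    """Transmissibility multiplied by the duration of infectiousness."""
    return transmission_rate(V) * infectious_duration(V)


def optimal_spvl(log10_min=2.0, log10_max=7.0, points=1001):
    """Grid search over log10(SPVL) for the value maximizing transmission potential."""
    grid = [10 ** (log10_min + (log10_max - log10_min) * i / (points - 1))
            for i in range(points)]
    return max(grid, key=transmission_potential)


if __name__ == "__main__":
    V_opt = optimal_spvl()
    print(f"optimum log10 SPVL ~ {math.log10(V_opt):.3f}")
```

Viral loads below the optimum transmit too rarely, and viral loads above it shorten the host's infectious lifespan too much. This produces the intermediate optimum described above.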
Additionally, improvements in sequencing technologies will allow detailed investigation of within-host evolution, as the full diversity of an infecting quasispecies may be uncovered given enough sequencing effort.
Compartmental models
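Consider the SIR model, in which the number of infected hosts $I(t)$ changes as

$$\frac{dI}{dt} = \beta S(t) I(t) - \gamma I(t),$$

where $S(t)$ is the fraction of hosts that are susceptible. Here, $\beta$ is the per capita rate of transmission to susceptible hosts, and $\gamma$ is the rate at which infected individuals recover, whereupon they are no longer infectious. In this case, the incidence of new infections per unit time is $f(t) = \beta S(t) I(t)$, which is analogous to the birth rate in classical population genetics models. With $Y(t)$ the number of infected hosts, the general formula for the rate of coalescence is:

$$\lambda_n = \frac{\binom{n}{2}}{\binom{Y(t)}{2}} f(t) \approx \binom{n}{2}\frac{2 f(t)}{Y(t)^2}.$$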
The ratio $\binom{n}{2}/\binom{Y}{2}$ can be understood as arising from the probability that two lineages selected uniformly at random are both ancestral to the sample. This probability is the ratio of the number of ways to pick two lineages without replacement from the set of $n$ lineages and from the set of all $Y$ infections: $\binom{n}{2}/\binom{Y}{2}$. Coalescent events will occur with this probability at the rate given by the incidence function $f(t)$.
This expression is similar to the Kingman coalescent rate, but is damped by the fraction susceptible $S(t)$.
This has the same mathematical form as the rate in the Kingman coalescent, substituting $N_e = I(t)/(2\beta)$. Consequently, estimates of effective population size based on the Kingman coalescent will be proportional to prevalence of infection during the early period of exponential growth of the epidemic.
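The general rate $\lambda_n = \binom{n}{2}\binom{Y}{2}^{-1} f$ and its SIR special case can be written directly as functions. This lets us check that, early in an epidemic ($S \approx 1$), the SIR rate matches a Kingman rate with $N_e = I/(2\beta)$. The sketch assumes incidence $f = \beta S I$ with $S$ a fraction and $I$ a count. The function names and parameter values are illustrative.

```python
from math import comb


def coalescent_rate(n, Y, f):
    """lambda_n = C(n,2) / C(Y,2) * f: the chance that a transmission event joins
    two of the n sampled lineages, times the incidence f (infections per unit time)."""
    return comb(n, 2) / (Y * (Y - 1) / 2) * f


def coalescent_rate_sir(n, S, I, beta):
    """SIR case: incidence f = beta * S * I with S the susceptible fraction."""
    return coalescent_rate(n, I, beta * S * I)


def kingman_rate(n, Ne):
    """Kingman coalescent rate for n lineages and effective size Ne."""
    return comb(n, 2) / Ne


if __name__ == "__main__":
    I, beta = 1e6, 0.4  # hypothetical prevalence and transmission rate
    print(coalescent_rate_sir(10, 1.0, I, beta), kingman_rate(10, I / (2 * beta)))
    print(coalescent_rate_sir(10, 0.5, I, beta))  # halved when half are susceptible
```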
where the effective population size $N_e$ is the population size $N$ divided by the variance of the offspring distribution, $\sigma^2$ (J. Wakeley (2008) Coalescent Theory: An Introduction. USA: Roberts & Company). The generation time $g$ for an epidemiological model at equilibrium is given by the duration of infection $D$, and the population size $N$ is closely related to the equilibrium number of infected individuals. To derive the variance in the offspring distribution for a given epidemiological model, one can imagine that infected individuals differ from one another in their infectivities, their contact rates, their durations of infection, or other characteristics relating to their ability to transmit the virus with which they are infected. These differences can be acknowledged by assuming that the basic reproduction number is a random variable that varies across individuals in the population and follows some continuous probability distribution. The mean and variance of these individual basic reproduction numbers, $E[R_0]$ and $\mathrm{Var}[R_0]$ respectively, can then be used to compute $\sigma^2$. The expression relating these quantities is:

$$\sigma^2 = \frac{\mathrm{Var}[R_0]}{E[R_0]^2} + 1.$$
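The relation $\sigma^2 = \mathrm{Var}[R_0]/E[R_0]^2 + 1$ can be checked by Monte Carlo for the SIR case. The sketch assumes that each host's reproduction number is proportional to an exponentially distributed infectious period, scaled to mean 1 as in a population at equilibrium. It also assumes that realized offspring counts are Poisson given that number. Both are illustrative assumptions, and the Poisson sampler is Knuth's multiplicative method because the standard library has none.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Knuth's multiplicative Poisson sampler (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def offspring_mean_and_variance(n_hosts=100_000, seed=0):
    """Simulate offspring counts when each host's reproduction number is an
    exponentially distributed infectious period scaled to mean 1."""
    rng = random.Random(seed)
    counts = [poisson(rng.expovariate(1.0), rng) for _ in range(n_hosts)]
    return statistics.fmean(counts), statistics.pvariance(counts)


def sigma2_formula(mean_R, var_R):
    """sigma^2 = Var[R0] / E[R0]^2 + 1."""
    return var_R / mean_R**2 + 1


if __name__ == "__main__":
    # exponential R with mean 1 has variance 1, so the formula gives 2
    print(offspring_mean_and_variance(), sigma2_formula(1.0, 1.0))
```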
Phylogeography
Simulation
Examples
Phylodynamics of influenza
Selective pressures
Circulation patterns
Simulation-based models
The phylodynamic diversity of influenza
Phylodynamics of HIV
Origin and spread
Table 1: Estimated annual growth rates $r$ for early HIV sub-epidemics. Sub-epidemics listed: Central Africa; Central Africa; North America/Europe/Australia (MSM); Cameroon.
Contemporary epidemiological dynamics
Viral adaptation
Future directions