Package 'imprinting' reference manual

Title:	Calculate Birth Year-Specific Probabilities of Immune Imprinting to Influenza
Description:	Reconstruct birth-year specific probabilities of immune imprinting to influenza A, using the methods of Gostic et al. (2016) <doi:10.1126/science.aag1322>. Plot, save, or export the calculated probabilities for use in your own research. By default, the package calculates subtype-specific imprinting probabilities, but with user-provided frequency data, it is possible to calculate probabilities for arbitrary kinds of primary exposure to influenza A, including primary vaccination and exposure to specific clades, strains, etc.
Authors:	Katelyn Gostic [aut], Alex Byrnes [ctb, cre]
Maintainer:	Alex Byrnes <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.1
Built:	2025-02-20 05:01:30 UTC
Source:	https://github.com/cobeylab/imprinting

Get data on the relative circulation of each influenza A subtype

Description

get_country_cocirculation_data() imports data on the fraction of influenza A cases in a specific country and year that were caused by each influenza A subtype (H1N1, H2N2, or H3N2), or group (group 1 or group 2). Group 1 contains H1N1 and H2N2, and group 2 contains H2N2.

Usage

get_country_cocirculation_data(
  country,
  max_year,
  min_samples = 30,
  output_format = "tibble"
)
get_country_cocirculation_data(
  country,
  max_year,
  min_samples = 30,
  output_format = "tibble"
)

Arguments

`country`	country of interest. Run `show_available_countries()` for a list of valid inputs.
`max_year`	last year of interest. Results will be generated from 1918:max_year.
`min_samples`	if fewer than `min_samples` (default 30) are reported in the country and year of interest, the function will substitute data from the corresponding WHO region.
`output_format`	can be 'tibble' (the default) or 'matrix' (used mainly for convenience within other functions)

Details

The data come from three sources:

Historical assumptions: From 1918-1956, we assume only H1N1 circulated. From 1957-1967, we assume only H2N2 circulated. From 1968-1976, we assume only H3N2 circulated.
Thompson et al. JAMA, 2003: From 1977-1996 we pull data on the relative dominance of H1N1 and H3N2 from Table 1 of Thompson et al. 2003, which reports surveillance data collected in the United States.
From 1997-present, we pull in country or region-specific data from WHO Flu Mart on the fraction of specimens collected in routine influenza surveillance that test positive for each subtype. Country-specific data are the default. Regional data, then global data are used if the number of country or region-specific specimens is insufficient. get_template_data() imports the data for 1918-1996. get_country_inputs_1997_to_present() and get_regional_inputs_1997_to_present() import the data for 1997 on.

Value

A matrix with rows showing the calendar year, the fraction of influenza A-positive specimens of each subtype (rows A/H1N1, A/H2N2, and A/H3N2), and of each HA group (rows ⁠group 1⁠, and ⁠group 2⁠). Row A should always be 1, as it shows the sum of subtype-specific fractions. Row B is a placeholder whose values are all NA.

Examples

get_country_cocirculation_data("United States", "2019")
get_country_cocirculation_data("Laos", "2022", min_samples = 40)

get_country_cocirculation_data("United States", "2019")
get_country_cocirculation_data("Laos", "2022", min_samples = 40)

Return raw flu surveillance data for a specific country

Description

Load and return a tibble containing raw influenza surveillance data for the country of interest. Data are from WHO Flu Mart.

Usage

get_country_inputs_1997_to_present(country, max_year)
get_country_inputs_1997_to_present(country, max_year)

Arguments

`country`	name of country. Run `show_available_countries()` for a list of options.
`max_year`	results will be output for all available years up to `max_year`.

Value

A tibble with the following columns:

Country: name of WHO region
Year: calendar year of surveillance
n_processed: total specimens processed

Examples

get_country_inputs_1997_to_present("Aruba", 1998)
get_country_inputs_1997_to_present("Honduras", 2022)

get_country_inputs_1997_to_present("Aruba", 1998)
get_country_inputs_1997_to_present("Honduras", 2022)

Get the relative intensity of influenza A circulation

Description

⁠get_country_intensity data()⁠ returns data on the annual intensity of influenza circulation in each calendar year. Following doi:10.1126/science.aag1322Gostic et al. Science, (2016), we define 1 as the average intensity. Seasons with intensities greater than 1 have more flu A circulation than average, and seasons with intensities less than 1 are mild.

Usage

get_country_intensity_data(country, max_year, min_specimens = 50)
get_country_intensity_data(country, max_year, min_specimens = 50)

Arguments

`country`	country of interest. Run `show_available_countries()` for valid inputs.
`max_year`	last year of interest. Results will be generated from 1918:max_year.
`min_specimens`	if fewer than `min_specimens` (default 50) were tested in the country and year of interest, the function will substitute data from the corresponding WHO region.

Details

For 1918-1996, we use annual intensities from Gostic et al., Science, (2016). For 1997-present, we calculate country or region-specific intensities using surveillance data from WHO Flu Mart. Intensity is calculated as: [fraction of processed samples positive for flu A]/[mean fraction of processed samples positive for flu A]. Country-specific data are used by default. Regional or global data are substituted when country-specific data contain too few observations or fail quality checks. Global data are only used in years when regional data are insufficient.

Value

A tibble showing the year and intensity score.

Calculate imprinting probabilities

Description

For each country and year of observation, calculate the probability that cohorts born in each year from 1918 through the year of observation imprinted to a specific influenza A virus subtype (H1N1, H2N2, or H3N2), or group (group 1 contains H1N1 and H2N2; group 2 contains H3N2).

Usage

get_imprinting_probabilities(
  observation_years,
  countries,
  annual_frequencies = NULL,
  df_format = "long"
)
get_imprinting_probabilities(
  observation_years,
  countries,
  annual_frequencies = NULL,
  df_format = "long"
)

Arguments

`observation_years`	year(s) of observation in which to output imprinting probabilities. The observation year, together with the birth year, determines the birth cohort's age when calculating imprinting probabilities. Cohorts <=12 years old at the time of observation have some probability of being naive to influenza.
`countries`	a vector of countries for which to calculate imprinting probabilities. Run `show_available_countries()` for a list of valid inputs, and proper spellings.
`annual_frequencies`	an optional input allowing users to specify custom circulation frequencies for arbitrary types of imprinting in order to study, e.g. imprinting to specific strains, clades, or imprinting by vaccination. If nothing is input, the default is to calculate subtype-specific probabilities (possible imprinting types are A/H1N1, A/H2N2, A/H3N2, or naive). See Details.
`df_format`	must be either 'long' (default) or 'wide'. Controls whether the output data frame is in long format (with a single column for calculated probabilities and a second column for imprinting subtype), or wide format (with four columns, H1N1, H2N2, H3N2, and naive) showing the probability of each imprinting status.

Details

Imprinting probabilities are calculated following doi:10.1126/science.aag1322Gostic et al. Science, (2016). Briefly, the model first calculates the probability that an individual's first influenza infection occurs 0, 1, 2, ... 12 years after birth using a modified geometric waiting time model. The annual circulation intensities output by get_country_intensity_data() scale the probability of primary infection in each calendar year.

Then, after calculating the probability of imprinting 0, 1, 2, ... calendar years after birth, the model uses data on which subtypes circulated in each calendar year (from get_country_cocirculation_data()) to estimate that probability that a first infection was caused by each subtype. See get_country_cocirculation_data() for details about the underlying data sources.

To calculate other kinds of imprinting probabilities (e.g. for specific clades, strains, or to include pediatric vaccination), users can specify custom circulation frequencies as a list, annual_frequencies. This list must contain one named element for each country in the countries input vector. Each list element must be a data frame or tibble whose first column is named "year" and contains numeric years from 1918:max(observation_years). Columns 2:N of the data frame must contain circulation frequencies that sum to 1 across each row, and each column must have a unique name indicating the exposure kind. E.g. column names could be "year", "H1N1", "H2N2", "H3N2", "vaccinated" to include probabilities of imprinting by vaccine, or "year", "3C.3A", "not_3C.3A" to calculate clade-specific probabilities. Do not include a naive column. Any number of imprinting types is allowed, but the code is not optimized to run efficiently when the number of categories is very large. Frequencies within the column must be supplied by the user. See Vieira et al. 2021 for methods to estimate circulation frequencies from sequence databases like GISAID or the NCBI Sequence Database.

See vignette("custom-imprinting-types") for use of a custom annual_frequencies input.

Value

If format=long (the default), a long tibble with columns showing the imprinting subtype (H1N1, H2N2, H3N2, or naive), the year of observation, the country, the birth year, and the imprinting probability.
If format=wide, a wide tibble with each row representing a country, observation year, and birth year, and with a column for each influenza A subtype (H1N1, H2N2, and H3N2), or the probability that someone born in that year remains naive to influenza and has not yet imprinted. For cohorts >12 years old in the year of observation, the probability of remaining naive is 0, and the subtype-specific probabilities are normalized to sum to 1. For cohorts <=12 years old in the year of observation, the probability of remaining naive is non-zero. For cohorts not yet born at the time of observation, all output probabilities are 0.

Examples

# ===========================================================
# Get imprinting probabilities for one country and year
get_imprinting_probabilities(2022, "United States")
# ===========================================================
# Return the same outputs in wide format
get_imprinting_probabilities(2022,
  "United States",
  df_format = "wide"
)

# ===========================================================
# Get imprinting probabilities for one country and year
get_imprinting_probabilities(2022, "United States")
# ===========================================================
# Return the same outputs in wide format
get_imprinting_probabilities(2022,
  "United States",
  df_format = "wide"
)

Calculate the probability imprinting occurs n years after birth

Description

Given an individual's birth year, the year of observation, and pre-calculated influenza circulation intensities, calculate the probability that the first influenza infection occurs exactly 0, 1, 2, ... 12 years after birth.

Usage

get_p_infection_year(
  birth_year,
  observation_year,
  intensity_df,
  max_year,
  baseline_annual_p_infection = 0.28
)
get_p_infection_year(
  birth_year,
  observation_year,
  intensity_df,
  max_year,
  baseline_annual_p_infection = 0.28
)

Arguments

`birth_year`	year of birth (numeric). Must be between 1918 and the current calendar year.
`observation_year`	year of observation, which affects the birth cohort's age.
`intensity_df`	data frame of annual intensities, output by `get_country_intensity_data()`.
`max_year`	maximum year for which to output probabilities. Must be greater than or equal to observation_year. (If in doubt, set equal to observation year.)
`baseline_annual_p_infection`	average annual probability of primary infection. The default, 0.28, was estimated using age-seroprevalence data in doi:10.1126/science.aag1322Gostic et al. Science, (2016).

Details

The probability of primary influenza infection n years after birth is calculated based on a modified geometric distribution: let p be the average annual probability of a primary influenza infection. Then the probability that primary infection occurs n=0,1,2,... years after birth is $p*(1-p)^{n}$ .

This function modifies the geometric model above to account for changes in annual circulation intensity, so that annual probabilities of primary infection $p_i$ are scaled by the intensity in calendar year i. Details are given in doi:10.1126/science.aag1322Gostic et al. Science, (2016).

Value

a vector whose entries show the probability that a person born in year 0 was first infected by influenza in year 0, 1, 2, 3, ...12 We only consider the first 13 probabilities (i.e. we assume everyone imprints before age 13. These outputs are not normalized, so the vector sum asymptotically approaches one, but is not exactly equal to one. For cohorts born <13 years prior to the year of observation, the output vector will have less than 13 entries.

Examples

# For a cohort under 12 years old and born in 2000, return the
# probabilities of primary infection in 2000, 2001, ... 2012:
get_p_infection_year(
  birth_year = 2000,
  observation_year = 2022,
  intensity_df = get_country_intensity_data("Canada", 2022),
  max_year = 2022
)

# If the cohort is still under age 12 at the time of observation, return
# a truncated vector of probabilities:
get_p_infection_year(
  birth_year = 2020,
  observation_year = 2022,
  intensity_df = get_country_intensity_data("Mexico", 2022),
  max_year = 2022
)

# For a cohort under 12 years old and born in 2000, return the
# probabilities of primary infection in 2000, 2001, ... 2012:
get_p_infection_year(
  birth_year = 2000,
  observation_year = 2022,
  intensity_df = get_country_intensity_data("Canada", 2022),
  max_year = 2022
)

# If the cohort is still under age 12 at the time of observation, return
# a truncated vector of probabilities:
get_p_infection_year(
  birth_year = 2020,
  observation_year = 2022,
  intensity_df = get_country_intensity_data("Mexico", 2022),
  max_year = 2022
)

Return raw flu surveillance data for a specific WHO region

Description

Load and return a tibble containing raw influenza surveillance data, aggregated across all countries in the WHO region of interest. Data are from WHO Flu Mart.

Usage

get_regional_inputs_1997_to_present(region, max_year)
get_regional_inputs_1997_to_present(region, max_year)

Arguments

`region`	name of WHO region. Run `show_available_regions()` for a list of options.
`max_year`	results will be output for all available years up to `max_year`.

Value

A tibble with the following columns:

WHOREGION: name of WHO region.
Year: calendar year .
n_H1N1, n_H2N2, n_H3N2: number of influenza specimens that tested positive for each influenza A subtype.
n_A: total specimens positive for influenza A (= n_H1N1 + n_H2N2 + n_H3N2).
n_BYam, n_BVic: number of specimens positive for each lineage of influenza B: Victoria or Yamagata.
n_B: total specimens positive for influenza B.
n_processed: total specimens processed.

Examples

get_regional_inputs_1997_to_present("americas", 2017)

get_regional_inputs_1997_to_present("americas", 2017)

Get country-independent flu circulation data for 1918-1996

Description

get_template_data() returns a tibble showing the fraction of influenza A cases caused by subtype H1N1, H2N2, or H3N2 in each year from 1918-1996. These data are country-independent. Country-specific data are only available from 1997 on.

For years 1918-1976 only one influenza A subtype circulated, so all fractions are 0 or 1.
From 1977-1996 H1N1 and H3N2 both circulated. get_template_data() reports the fraction of influenza A-positive specimens of each subtype observed in US flu surveillance. See Thompson et al. JAMA, 2003, Table 1.
Country-specific data from WHO Flu Mart will be appended to this template in later steps.

Usage

get_template_data()
get_template_data()

Value

A tibble with the following columns:

year
A/H1N1, A/H2N2, and A/H3N2 show the fraction of influenza cases caused by each subtype.
A = A/H1N1 + A/H2N2 + A/H3N2
B is a placeholder for future calculate of influenza B imprinting probabilities, which currently contains NA.
group1 and group2 show the fraction of cases caused by group 1 subtypes (H1N1 and H2N2), or group 2 (H3N2).
data_from notes the data source.

Look up a country's WHO region

Description

Look up a country's WHO region

Usage

get_WHO_region(this.country)
get_WHO_region(this.country)

Arguments

this.country

name of the input country (a string)

Value

Name of the corresponding WHO region

Examples

get_WHO_region("Germany")
get_WHO_region("China")
get_WHO_region("Germany")
get_WHO_region("China")

Plot imprinting probabilities for up to five country-years

Description

For each country and year, generate two plots:

A stacked barplot, where each bar represents a birth cohort, and the colors within the bar show the probabilities that someone born in that cohort has a particular imprinting status, for the first observation year.
A lineplot showing the age-specific probability of imprinting to H3N2 in the first and last observation year. When the data contain more than one observation year, this plot shows how cohorts age over time.

Usage

plot_many_country_years(imprinting_df)
plot_many_country_years(imprinting_df)

Arguments

imprinting_df

A long data frame of imprinted probabilities output by get_imprinting_probabilities(). Up to five countries and an arbitrary span of years can be plotted.

Value

No return value. Opens a plot of the data frame.

Examples

imprinting_df <- get_imprinting_probabilities(
  observation_years = c(1920, 1921),
  countries = c("Oman", "Indonesia")
)
plot_many_country_years(imprinting_df)
imprinting_df <- get_imprinting_probabilities(
  observation_years = c(1920, 1921),
  countries = c("Oman", "Indonesia")
)
plot_many_country_years(imprinting_df)

Plot imprinting probabilities for a single country and year

Description

Generate a stacked barplot, where each bar represents a birth cohort, and the colors within the bar show the probabilities that someone born in that cohort has a particular imprinting status. If the data frame contains more than one country or observation year, the first country-year is plotted by default. Specify other countries and years using the country and year options.

Usage

plot_one_country_year(imprinting_df, country = NULL, year = NULL)
plot_one_country_year(imprinting_df, country = NULL, year = NULL)

Arguments

`imprinting_df`	A long data frame of imprinted probabilities output by `get_imprinting_probabilities()`. If the data frame contains more than one country and year, on the first will be plotted.
`country`	An optional country name to plot. The input country name must exist in the imprinting_df.
`year`	Similar to country, and optional input specifying the year for which to plot.

Value

No return value. Opens a plot of the data frame.

Examples

# Generate imprinting probabilities for one country and year
imprinting_df <- get_imprinting_probabilities(
  observation_years = 1920,
  countries = "Aruba"
)
plot_one_country_year(imprinting_df)

# If we generate probabilities for more than one country and year,
imprinting_df <- get_imprinting_probabilities(
  observation_years = c(1922, 1925),
  countries = c(
    "Algeria",
    "South Africa"
  )
)

# The default is to plot the first country year in the outputs
plot_one_country_year(imprinting_df)

# Or, specify a country and year of interest (both must exist in the
# imprinting_df).
plot_one_country_year(imprinting_df,
  country = "South Africa",
  year = 1925
)

# Generate imprinting probabilities for one country and year
imprinting_df <- get_imprinting_probabilities(
  observation_years = 1920,
  countries = "Aruba"
)
plot_one_country_year(imprinting_df)

# If we generate probabilities for more than one country and year,
imprinting_df <- get_imprinting_probabilities(
  observation_years = c(1922, 1925),
  countries = c(
    "Algeria",
    "South Africa"
  )
)

# The default is to plot the first country year in the outputs
plot_one_country_year(imprinting_df)

# Or, specify a country and year of interest (both must exist in the
# imprinting_df).
plot_one_country_year(imprinting_df,
  country = "South Africa",
  year = 1925
)

Show a list of all available countries

Description

Lists all available countries, with valid spelling and formatting. Each country in the list matches or can be mapped to a country with data in WHO Flu Mart. (Note: for convenience, this package sometimes uses different country names or spellings than Flu Mart.)

Usage

show_available_countries()
show_available_countries()

Value

A data frame of valid country names.

Examples

show_available_countries()
show_available_countries()

Show all WHO regions

Description

Lists all WHO regions, with valid spelling and formatting.

Usage

show_available_regions()
show_available_regions()

Value

A data frame of valid region names.

Examples

show_available_regions()
show_available_regions()

Package 'imprinting'

Help Index

Get data on the relative circulation of each influenza A subtype

Description

Usage

Arguments

Details

Value

See Also

Examples

Return raw flu surveillance data for a specific country

Description

Usage

Arguments

Value

Examples

Get the relative intensity of influenza A circulation

Description

Usage

Arguments

Details

Value

Calculate imprinting probabilities

Description

Usage

Arguments

Details

Value

Examples

Calculate the probability imprinting occurs n years after birth

Description

Usage

Arguments

Details

Value

Examples

Return raw flu surveillance data for a specific WHO region

Description

Usage

Arguments

Value

Examples

Get country-independent flu circulation data for 1918-1996

Description

Usage

Value

See Also

Look up a country's WHO region

Description

Usage

Arguments

Value

Examples

Plot imprinting probabilities for up to five country-years

Description

Usage

Arguments

Value

Examples

Plot imprinting probabilities for a single country and year

Description

Usage

Arguments

Value

Examples

Show a list of all available countries

Description

Usage

Value

Examples

Show all WHO regions

Description

Usage

Value

Examples