Package {parasiteR}


Type: Package
Title: A Theorical-Practical Approach to Parasitological Data Analysis
Description: Standardizes and streamlines the processing of parasitological data by integrating descriptive analyses of parasite count distributions, automated calculation of parasitological indices and their dispersion measures, and intuitive visualizations for representing these metrics (Bush et al. 1997 <doi:10.2307/3284227>, Reiczigel et al. 2019 <doi:10.1016/j.pt.2019.01.003>).
License: GPL (≥ 3)
RoxygenNote: 7.3.3
Version: 1.0
Encoding: UTF-8
Imports: rlang, ggplot2, magrittr, dplyr, tidyr, stats, BlakerCI, boot, readr
Depends: R (≥ 3.5)
LazyData: true
NeedsCompilation: no
Packaged: 2026-05-04 20:42:40 UTC; Thermaltake
Author: Exequiel Oscar Furlan [aut], Juan Manuel Cabrera [aut, cre, cph], Elisa Helman [aut]
Maintainer: Juan Manuel Cabrera <juan.cabrera@uner.edu.ar>
Repository: CRAN
Date/Publication: 2026-05-13 07:40:02 UTC

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling 'rhs(lhs)'.


Mean or median abundance estimation and confidence intervals

Description

This function calculates point estimates and confidence intervals (CIs) for parasite abundance, using either the mean or the median as the measure of central tendency. Confidence intervals are estimated with a non-parametric bootstrap, that is, by resampling the observed data with replacement. Specifically, the function implements bias-corrected and accelerated (BCa) bootstrap intervals, which adjust for both bias and skewness in the bootstrap distribution. This approach does not assume a specific underlying distribution, which makes it well suited to the overdispersed and zero-inflated data typical of parasitological studies.

Usage

para_abundance_CI(dataset, c_median = TRUE,
 sp_cols, group_vars = NULL, perm = 2000, decimal_places = 2,
 combine_ci = FALSE, conf_level = 0.95, verbose = FALSE)

Arguments

dataset

Data frame with parasitological data.

c_median

Logical. If TRUE, the results will include the median as the measure of central tendency; if FALSE, the results will include the mean.

sp_cols

Vector with the names of the columns containing parasite abundances (one column per taxon) for which the estimates are calculated.

group_vars

Vector with the names of categorical variables used to define groups (e.g., "Sex", "Site"). Default = NULL.

perm

Number of bootstrap resamples (drawn with replacement) used for confidence interval estimation. Default = 2000.

decimal_places

Number of decimal places used to round the results. Default = 2.

combine_ci

Logical. If TRUE, the interval is expressed as a single column (lower - upper). If FALSE, the interval is split into separate lower and upper limit columns.

conf_level

Confidence level for the interval estimation (e.g., 0.95 for 95% CI).

verbose

A logical value indicating if progress messages should be given. Default = FALSE.

Details

Parasite abundance is defined as the number of individuals of a given parasite taxon per host. For each taxon, abundance metrics are calculated based on the observed counts across hosts. The function reshapes the dataset into long format and computes abundance statistics for each parasite taxon and grouping combination (if specified). The following are estimated:

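Depending on the argument c_median, the function calculates either the median abundance (c_median = TRUE) or the mean abundance (c_median = FALSE), together with its confidence interval.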

Confidence intervals are estimated using a non-parametric bootstrap approach. Specifically, bias-corrected and accelerated (BCa) bootstrap intervals are computed from perm resamples drawn with replacement from the observed abundance values. This method adjusts for both bias and skewness in the bootstrap distribution.

Statistical considerations: parasite abundance data are typically overdispersed and zero-inflated, making parametric assumptions inappropriate in many cases. Bootstrap methods allow confidence intervals to be estimated without assuming normality. Mean abundance is sensitive to extreme values, whereas median abundance is more robust under highly skewed distributions. When sample size is small, bootstrap confidence intervals may be unstable or wide, and results should be interpreted with caution. The interpretation of results remains the responsibility of the user.
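The BCa procedure described above can be sketched with the boot package, which parasiteR lists among its imports. This is a minimal illustration under stated assumptions, not the package's internal code: the helper abundance_bca() and the simulated counts are hypothetical, and the argument R plays the role of perm.

```r
library(boot)

# Hypothetical helper: BCa bootstrap CI for the mean or median of a count vector
abundance_bca <- function(counts, use_median = TRUE, R = 2000, conf = 0.95) {
  # Statistic evaluated on each bootstrap resample (idx = resampled indices)
  stat_fun <- function(x, idx) {
    if (use_median) median(x[idx]) else mean(x[idx])
  }
  # Resample the observed counts with replacement R times
  b <- boot(data = counts, statistic = stat_fun, R = R)
  # BCa interval; elements 4 and 5 of $bca are the lower and upper limits
  ci <- boot.ci(b, conf = conf, type = "bca")
  c(estimate = b$t0, lower = ci$bca[4], upper = ci$bca[5])
}

set.seed(1)
# Simulated overdispersed, zero-inflated counts (illustrative only)
counts <- c(rep(0, 25), rnbinom(35, size = 0.8, mu = 6))
abundance_bca(counts, use_median = FALSE)
```

With strongly zero-inflated data and use_median = TRUE, the bootstrap distribution of the median can be very discrete, and boot.ci() may warn or produce degenerate limits; this is one concrete form of the instability mentioned above.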

Value

A data frame containing abundance estimates and confidence intervals for each parasite taxon, either globally or by group. The following variables are returned:

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

References

Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.

Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.

Examples

#Calculate the CI for the median abundance
med_abun_CI <- para_abundance_CI(para_data$dataset,
                                c_median = TRUE,
                                sp_cols =  c("Sp1"),
                                group_vars = c("Site"),
                                decimal_places = 2,
                                conf_level = 0.95,
                                combine_ci = TRUE,
                                verbose = TRUE)
med_abun_CI
#Calculate the CI for the mean abundance
mean_abun_CI <- para_abundance_CI(para_data$dataset,
                                 c_median = FALSE,
                                 sp_cols =  c("Sp1"),
                                 group_vars = c("Site"),
                                 decimal_places = 2,
                                 conf_level = 0.95,
                                 combine_ci = TRUE,
                                 verbose = TRUE)
mean_abun_CI


Simulated parasite abundance data for multiple species across hosts and sites

Description

This dataset contains hypothetical, simulated parasite count data representing multiple parasite species infecting individual hosts across different sampling sites. In the dataset element (para_data$dataset), each row corresponds to a single sampling unit (i.e., an individual host), and parasite abundance is recorded as counts for each parasite species (Sp1–Sp4).

Usage

para_data

Format

'para_data': a list with 4 elements.

Details

The dataset was intentionally constructed to reproduce common scenarios encountered in parasitological studies, rather than to reflect a specific empirical system. These scenarios include:

This structure allows testing and demonstrating the behavior of analytical functions under realistic and edge-case conditions.


Parasitological descriptors and summary statistics

Description

Computes standard parasitological descriptors and classical summary statistics from parasite abundance data, optionally stratified by grouping variables.

Usage

para_descriptors(dataset, sp_cols = NULL, group_vars = NULL,
 decimal_places = 2, verbose = FALSE)

Arguments

dataset

Data frame with parasite abundance data.

sp_cols

Vector with the names or indices of the species columns.

group_vars

Vector with the names of the categorical variables used to define groups (e.g., 'Sex', 'Site').

decimal_places

Number of decimal places to round the values.

verbose

A logical value indicating if progress messages should be given.

Details

The para_descriptors function provides a practical and efficient way to estimate the main parasitological descriptors commonly used in ecological and parasitological studies. Calculations can be performed globally or at different hierarchical levels defined by grouping variables.

The function computes descriptors based on parasite abundance per sampling unit (e.g., host, site, or pooled hosts), following standard definitions:

Statistical validity and sample size considerations: the estimation of summary statistics is constrained by sample size and variability.

These constraints reflect a basic principle: statistical descriptors require variability, and variability requires more than one observational unit. When this condition is not met, results should be interpreted cautiously, and no generalization beyond the observed case is justified.

Handling of special cases: the function automatically adjusts calculations depending on data availability:

The selection and interpretation of descriptors remain the responsibility of the user, particularly when working with small sample sizes.
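To illustrate why a single observational unit cannot support variability-based descriptors, the sketch below computes a few classical statistics for one taxon. The helper describe_counts() and its column names are hypothetical, and the variance-to-mean ratio is shown only as one common aggregation index; the exact set of descriptors returned by para_descriptors may differ. In base R, var() of a single value is NA, which is how the undefined variance surfaces here.

```r
# Hypothetical helper: a few classical descriptors for one taxon's counts
describe_counts <- function(counts) {
  n_hosts <- length(counts)
  n_inf <- sum(counts > 0)
  mean_ab <- mean(counts)
  var_ab <- var(counts)  # NA when only one host is available
  data.frame(
    n_hosts = n_hosts,
    prevalence = n_inf / n_hosts,
    mean_abundance = mean_ab,
    variance = var_ab,
    # Variance-to-mean ratio, a common aggregation index (undefined if mean is 0)
    vmr = if (mean_ab > 0) var_ab / mean_ab else NA_real_
  )
}

describe_counts(c(0, 0, 3, 12, 1, 0, 25, 4))
describe_counts(7)  # a single host: variance and VMR are NA
```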

Value

A data frame containing the calculated parasitological descriptors for each parasite taxon, either globally or by group (if grouping variables are specified). The following variables are returned:

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

References

Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.

Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.

Examples


gral_descriptor <- para_descriptors(para_data$dataset,
                                   sp_cols =  c("Sp1", "Sp2", "Sp3", "Sp4"),
                                   group_vars = c("Site","Sp_host"),
                                   decimal_places = 2,
                                   verbose = FALSE)

gral_descriptor


Exploratory plots of parasite abundance distributions

Description

Generates exploratory visualizations of parasite abundance distributions across taxa and optional grouping variables. The function produces histograms combined with kernel density curves to facilitate the assessment of distributional patterns, including skewness, dispersion and zero inflation.

Usage

para_explo_abund(dataset, sp_cols, group_vars = NULL,
 bins = 30, n_col = NULL, verbose = FALSE)

Arguments

dataset

Data frame containing parasite data.

sp_cols

Vector with the names of the columns containing parasite abundance (taxa) to be plotted.

group_vars

Vector with the names of categorical variables used to define groups (e.g., "Sex", "Site"). Default = NULL.

bins

Integer specifying the number of bins used in the histogram. Higher values provide finer resolution but may introduce noise, while lower values produce smoother but less detailed distributions. Default = 30.

n_col

Integer specifying the number of columns in the faceted plot layout. If NULL, the number of columns is determined automatically by ggplot2. Default = NULL.

verbose

A logical value indicating if progress messages should be given. Default = FALSE.

Details

The function reshapes the input dataset into long format, where parasite taxa are treated as a single variable and their abundances as observations. For each parasite taxon and combination of grouping variables (if provided), the function generates a histogram of abundance values with an overlaid kernel density curve.

Both elements are scaled to represent density, allowing direct comparison between distributions. These plots are intended for exploratory purposes and should not be used as formal inference tools. Faceting is applied to display each taxon and grouping combination in separate panels. Special cases are handled as follows:

All plots use independent scales (free scales) to better represent the variability within each facet.
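The reshaping and density scaling described above can be illustrated in base R. This sketch uses stats::reshape() rather than the tidyr functions the package imports, and the toy table is hypothetical. The final check shows what "scaled to represent density" means for the histogram: bar heights are chosen so that the bar areas sum to 1.

```r
# Toy wide table: one row per host, one column per taxon (hypothetical data)
wide <- data.frame(
  Host = 1:6,
  Site = c("A", "A", "A", "B", "B", "B"),
  Sp1 = c(0, 2, 5, 0, 0, 9),
  Sp2 = c(1, 0, 0, 3, 4, 0)
)

# Reshape to long format: one row per host and taxon
long <- reshape(wide, direction = "long",
                varying = c("Sp1", "Sp2"), v.names = "abundance",
                timevar = "taxon", times = c("Sp1", "Sp2"),
                idvar = "Host")

# Density-scaled histogram and kernel density estimate for one taxon
x <- long$abundance[long$taxon == "Sp1"]
h <- hist(x, breaks = 5, plot = FALSE)
k <- density(x)

# Bar areas (density x bin width) sum to 1
sum(h$density * diff(h$breaks))
```

In ggplot2, the same scaling is usually requested with aes(y = after_stat(density)) in geom_histogram(), so that the histogram and the density curve share one y-axis.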

Value

A ggplot2 object containing the generated faceted plots. This object can be further customized using standard ggplot2 functions.

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

Examples


#Species 1 and 2

para_explo_abund (para_data$dataset,
                 sp_cols = c("Sp1", "Sp2"),
                 group_vars = c("Site", "Sp_host"),
                 bins = 30,
                 n_col = 4,
                 verbose = TRUE)

#Species 3 and 4

para_explo_abund (para_data$dataset,
                 sp_cols = c("Sp3", "Sp4"),
                 group_vars = c("Site", "Sp_host"),
                 bins = 30,
                 n_col = 4,
                 verbose = TRUE)


Exploratory plots of parasite prevalence

Description

Generates exploratory visualizations of parasite prevalence across taxa and optional grouping variables. The function produces stacked bar plots showing the proportion of infested and non-infested hosts, facilitating the assessment of prevalence patterns across hierarchical combinations.

Usage

para_explo_prev(dataset, sp_cols, group_vars = NULL,
 n_col = NULL, verbose = FALSE)

Arguments

dataset

Data frame containing parasite data.

sp_cols

Vector with the names of the columns containing parasite abundance (taxa) to be plotted.

group_vars

Vector with the names of categorical variables used to define groups (e.g., "Sex", "Site"). Default = NULL.

n_col

Integer specifying the number of columns in the faceted plot layout. If NULL, the number of columns is determined automatically by ggplot2. Default = NULL.

verbose

A logical value indicating if progress messages should be given. Default = FALSE.

Details

The function reshapes the dataset into long format and calculates prevalence as the proportion of infested hosts (hosts with parasite counts > 0) relative to the number of analyzed hosts for each parasite taxon and grouping combination. For each combination, the function generates a stacked bar showing the proportions of infested and non-infested hosts.

Faceting is applied to display each parasite taxon and grouping combination in separate panels. Special cases are handled as follows:

All proportions are expressed on a 0–1 scale. These plots are intended for exploratory purposes and should not be used as formal inference tools.
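The prevalence calculation behind these bars reduces to the share of hosts with counts greater than zero in each group. A minimal sketch in base R, assuming long-format input with hypothetical columns Site, taxon and abundance; the two proportions in each group sum to 1, which is what a stacked bar displays.

```r
# Hypothetical long-format data: one row per host and taxon
long <- data.frame(
  Site = rep(c("A", "B"), each = 5),
  taxon = "Sp1",
  abundance = c(0, 3, 0, 1, 7, 0, 0, 0, 2, 0)
)

# Proportion of infested hosts (count > 0) per taxon and site
prev <- aggregate(abundance ~ taxon + Site, data = long,
                  FUN = function(a) mean(a > 0))
names(prev)[names(prev) == "abundance"] <- "infested"

# Complementary proportion of non-infested hosts
prev$non_infested <- 1 - prev$infested
prev
```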

Value

A ggplot2 object containing the generated faceted stacked bar plots. This object can be further customized using standard ggplot2 functions.

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

Examples


#Species 1 and 2

para_explo_prev(para_data$dataset,
               sp_cols = c("Sp1", "Sp2"),
               group_vars = c("Site", "Sp_host"),
               n_col = 4,
               verbose = TRUE)

#Species 3 and 4

para_explo_prev(para_data$dataset,
               sp_cols = c("Sp3", "Sp4"),
               group_vars = c("Site", "Sp_host"),
               n_col = 4,
               verbose = TRUE)


Mean or median intensity estimation and confidence intervals

Description

This function calculates point estimates and confidence intervals (CIs) for parasite intensity, using either the mean or the median as the measure of central tendency. Confidence intervals are estimated with a non-parametric bootstrap, that is, by resampling the observed data with replacement. Specifically, the function implements bias-corrected and accelerated (BCa) bootstrap intervals, which adjust for both bias and skewness in the bootstrap distribution. This approach does not assume a specific underlying distribution. Because intensity is computed only from infested hosts, the data contain no zeros, but they are typically right-skewed and overdispersed, a situation in which the bootstrap is well suited.

Usage

para_intensity_CI(dataset, c_median = TRUE, sp_cols, group_vars = NULL,
 perm = 2000, decimal_places = 2, combine_ci = FALSE,
 conf_level = 0.95, verbose = FALSE)

Arguments

dataset

Data frame with parasitological data.

c_median

Logical. If TRUE, the results will include the median as the measure of central tendency; if FALSE, the results will include the mean.

sp_cols

Vector with the names of the columns containing parasite abundances (one column per taxon) for which the estimates are calculated.

group_vars

Vector with the names of categorical variables used to define groups (e.g., "Sex", "Site"). Default is NULL.

perm

Number of bootstrap resamples (drawn with replacement) used for confidence interval estimation. Default is 2000.

decimal_places

Number of decimal places used to round the results. Default is 2.

combine_ci

Logical. If TRUE, the interval is expressed as a single column (lower - upper). If FALSE, the interval is split into separate lower and upper limit columns.

conf_level

Confidence level for interval estimation (e.g., 0.95 for 95% confidence intervals).

verbose

A logical value indicating if progress messages should be given.

Details

Parasite intensity is defined as the number of individuals of a given parasite taxon per infested host. For each taxon, intensity metrics are calculated based only on hosts with parasite counts greater than zero. The function reshapes the dataset into long format and computes intensity statistics for each parasite taxon and grouping combination (if specified). The following are estimated:

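Depending on the argument c_median, the function calculates either the median intensity (c_median = TRUE) or the mean intensity (c_median = FALSE), together with its confidence interval.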

Confidence intervals are estimated using a non-parametric bootstrap approach. Specifically, bias-corrected and accelerated (BCa) bootstrap intervals are computed from perm resamples drawn with replacement from the observed intensity values. This method adjusts for both bias and skewness in the bootstrap distribution.

Statistical considerations: parasite intensity data are typically right-skewed and may exhibit high variability due to aggregation among hosts. Bootstrap methods allow confidence intervals to be estimated without assuming normality. Mean intensity is sensitive to extreme values, whereas median intensity is more robust under highly skewed distributions. When the number of infested hosts is small, bootstrap confidence intervals may be unstable or wide, and results should be interpreted with caution. The interpretation of results remains the responsibility of the user.
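Because intensity is computed only from infested hosts, abundance and intensity summaries are linked: mean abundance equals prevalence multiplied by mean intensity, since the total parasite count divided by all hosts equals (infested hosts / all hosts) multiplied by (total count / infested hosts). The sketch below, with hypothetical counts, checks this identity. A bootstrap interval for intensity would be computed on the intensity vector, not on the full set of counts.

```r
# Hypothetical abundance counts for one taxon across 10 hosts
counts <- c(0, 0, 4, 1, 0, 12, 3, 0, 7, 2)

# Intensity uses only infested hosts (counts > 0)
intensity <- counts[counts > 0]

prevalence <- length(intensity) / length(counts)
mean_abundance <- mean(counts)
mean_intensity <- mean(intensity)
median_intensity <- median(intensity)

# Mean abundance equals prevalence times mean intensity
all.equal(mean_abundance, prevalence * mean_intensity)
```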

Value

A data frame containing intensity estimates and confidence intervals for each parasite taxon, either globally or by group. The following variables are returned:

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

References

Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.

Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.

Examples

# Calculate the CI for the median intensity
med_int_CI <- para_intensity_CI(para_data$dataset,
                               c_median = TRUE,
                               sp_cols =  c("Sp1"),
                               group_vars = c("Site"),
                               decimal_places = 2,
                               conf_level = 0.95,
                               combine_ci = TRUE,
                               verbose = TRUE)
med_int_CI

# Calculate the CI for the mean intensity
mean_int_CI <- para_intensity_CI(para_data$dataset,
                                c_median = FALSE,
                                sp_cols =  c("Sp1"),
                                group_vars = c("Site"),
                                decimal_places = 2,
                                conf_level = 0.95,
                                combine_ci = TRUE,
                                verbose = TRUE)
mean_int_CI


Visualization of parasitological descriptors with confidence intervals

Description

This function generates graphical representations of parasitological estimates (abundance, intensity, or prevalence) including their associated confidence intervals. It supports multiple input formats and automatically detects the response variable and confidence interval structure. The function allows flexible grouping, species filtering, and visualization either as faceted plots or separate panels. The function is designed to be compatible with outputs from different estimation functions within the package (e.g., para_abundance_CI, para_intensity_CI, para_prevalence_CI). Automatic detection of confidence intervals ensures flexibility across workflows. Interpretation of graphical outputs remains the responsibility of the user. It automatically detects:

Usage

para_plot_CI(para_data, group_vars, sp_cols = NULL, descriptor = NULL,
 lower_ci = NULL, upper_ci = NULL, point_color = "blue", line_size = 1,
 point_size = 3, n_cols = 1, include_zeros = TRUE, separate_plots = FALSE)

Arguments

para_data

Data frame containing parasitological descriptors and confidence intervals estimated with one of the following functions: para_abundance_CI, para_intensity_CI, para_prevalence_CI.

group_vars

Character vector specifying the variable(s) to be used on the x-axis. Multiple variables will be combined.

sp_cols

Optional vector of parasite taxa to include in the plot. Default is NULL (all taxa are included).

descriptor

Name of the variable to be plotted on the y-axis. If NULL, the function automatically detects a suitable variable (e.g., prevalence, MeanA, MedA, MeanI, MedI).

lower_ci

Optional names of the columns containing the lower confidence limits. If NULL, the function automatically detects and extracts them. Default is NULL.

upper_ci

Optional names of the columns containing the upper confidence limits. If NULL, the function automatically detects and extracts them. Default is NULL.

point_color

Color of the points. Default is "blue".

line_size

Line width of the confidence interval bars. Default is 1.

point_size

Size of the points. Default is 3.

n_cols

Number of columns used in faceted plots. Default is 1.

include_zeros

Logical. If FALSE, zero values are excluded from the plot. Default is TRUE.

separate_plots

Logical. If TRUE, returns a list of plots (one per species). If FALSE, produces a faceted plot. Default is FALSE.

Details

When multiple grouping variables are provided in group_vars, they are combined into a single factor for visualization. Confidence intervals are displayed as vertical error bars, and point estimates are overlaid. When multiple parasite taxa are present, results are displayed using faceting or as separate plots.
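Combining several grouping variables into a single x-axis factor can be done in base R with interaction(). This is a sketch of the idea only: the separator and level ordering used by para_plot_CI are not documented here and are assumptions, as is the hypothetical estimates table (MedA is one of the descriptor names listed under the descriptor argument).

```r
# Hypothetical estimates table with two grouping variables
est <- data.frame(
  Site = c("A", "A", "B", "B"),
  Sp_host = c("H1", "H2", "H1", "H2"),
  MedA = c(1, 0, 3, 2)
)

# Combine the grouping variables into one factor for the x-axis,
# ordering levels by Site first, then by Sp_host
est$x_group <- interaction(est$Site, est$Sp_host, sep = " - ", lex.order = TRUE)
levels(est$x_group)
```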

Value

A ggplot object or a list of ggplot objects representing the estimated values and their confidence intervals.

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

References

Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.

Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.


Parasite prevalence estimation and confidence intervals

Description

Estimates parasite prevalence and corresponding confidence intervals from parasite abundance data, optionally stratified by grouping variables. Two types of confidence intervals are provided: exact binomial intervals and Blaker intervals, allowing robust inference across a wide range of sample sizes and prevalence values.

Usage

para_prevalence_CI(dataset, sp_cols, group_vars = NULL, decimal_places = 2,
 conf_level = 0.95, output_type = "proportion", combine_ci = FALSE, verbose = FALSE)

Arguments

dataset

Data frame with parasitological data.

sp_cols

Vector with the names of the columns containing abundance of parasites (taxa) to calculate the parasitological descriptors.

group_vars

Vector with the names of categorical variables used to define groups (e.g., "Sex", "Site"). Default is NULL.

decimal_places

Number of decimal places to round the values. Default is 2.

conf_level

Confidence level for interval estimation (e.g., 0.95 for 95% confidence intervals).

output_type

Format of the result: either "proportion" or "percentage". Default is "proportion".

combine_ci

Logical. If TRUE, the interval is expressed as a single column (min - max). If FALSE, the interval is split into separate lower and upper limit columns.

verbose

A logical value indicating if progress messages should be given. Default = FALSE.

Details

Prevalence is defined as the proportion of hosts infected with a given parasite taxon:

P = \frac{nH_{inf}}{nH}

where nH_{inf} is the number of infected hosts and nH is the total number of hosts examined.

The function reshapes the dataset into long format and computes prevalence for each parasite taxon and grouping combination (if specified). Two types of confidence intervals are calculated: an exact binomial interval and a Blaker interval.

Statistical considerations:

The interpretation of results, particularly under small sample sizes, remains the responsibility of the user.
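The two interval types can be illustrated for a single group. Assuming that "exact binomial" refers to the Clopper-Pearson interval, stats::binom.test() returns it directly. For the Blaker interval, parasiteR imports the BlakerCI package; the code below instead uses a simple grid search over Blaker's acceptability function, written only for illustration, so its limits are accurate to about the grid step. The helper names are hypothetical. Blaker intervals are always contained in the corresponding Clopper-Pearson intervals, which the example lets you check.

```r
# Exact binomial (Clopper-Pearson) interval from stats::binom.test()
exact_ci <- function(x, n, level = 0.95) {
  binom.test(x, n, conf.level = level)$conf.int[1:2]
}

# Blaker acceptability: total probability of outcomes whose smaller tail
# probability is no larger than that of the observed count x
blaker_accept <- function(x, n, p) {
  k <- 0:n
  # min(P(X <= k), P(X >= k)) for every possible count k
  tail_k <- pmin(pbinom(k, n, p), pbinom(k - 1, n, p, lower.tail = FALSE))
  tail_x <- tail_k[x + 1]
  # Small relative tolerance guards against floating-point ties
  sum(dbinom(k, n, p)[tail_k <= tail_x * (1 + 1e-7)])
}

# Blaker interval by grid search: keep every p whose acceptability exceeds alpha
blaker_ci <- function(x, n, level = 0.95, step = 1e-4) {
  grid <- seq(0, 1, by = step)
  ok <- vapply(grid, function(p) blaker_accept(x, n, p) > 1 - level, logical(1))
  range(grid[ok])
}

x <- 7; n <- 20  # 7 infected hosts out of 20 examined
c(prevalence = x / n)
exact_ci(x, n)
blaker_ci(x, n)
```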

Value

A data frame containing prevalence estimates and confidence intervals for each parasite taxon, either globally or by group. The following variables are returned:

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

References

Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.

Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.

Examples

prevalence_CI <- para_prevalence_CI(para_data$dataset,
                                   sp_cols =  c("Sp1"),
                                   group_vars = c("Site"),
                                   decimal_places = 2,
                                   conf_level = 0.95,
                                   output_type = "proportion",
                                   combine_ci = TRUE,
                                   verbose = TRUE)

prevalence_CI



Read parasite data

Description

Loads data from a .CSV file.

Usage

para_read_data(file_name, verbose = FALSE)

Arguments

file_name

Name of the .CSV file to be read (including its path if it is not in the working directory).

verbose

A logical value indicating if progress messages should be given.

Details

This function imports tables (.CSV files) into the R environment. Each row in the table should correspond to an individual host that was analyzed, while the columns may contain both quantitative and qualitative variables. Columns may represent two principal categories of variables:

Parasite abundance values must be encoded as non-negative integers. It is critical to distinguish between the following:

Value

The function returns:

dataset

A table that can be used as input for other parasiteR functions.

factors_v

A list of columns with factor values.

num_v

A list of columns with numeric values.

summ

A summary of the loaded data (see the summary() function).
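A minimal sketch of the reading step and of the non-negative integer requirement described in Details, assuming a comma-separated file with one row per host. It uses utils::read.csv() rather than readr and stores factors_v and num_v as column names. These choices, the helper names read_para_sketch() and check_abundance(), and the handling of missing values (skipped here) are assumptions, not the behavior of para_read_data.

```r
# Hypothetical sketch: read a host-by-variable CSV and split columns by type
read_para_sketch <- function(file_name) {
  dataset <- utils::read.csv(file_name, stringsAsFactors = TRUE)
  is_fac <- vapply(dataset, is.factor, logical(1))
  is_num <- vapply(dataset, is.numeric, logical(1))
  list(
    dataset = dataset,
    factors_v = names(dataset)[is_fac],
    num_v = names(dataset)[is_num],
    summ = summary(dataset)
  )
}

# Check that abundance columns hold non-negative whole numbers
check_abundance <- function(dataset, sp_cols) {
  vapply(sp_cols, function(col) {
    v <- dataset[[col]]
    if (!is.numeric(v)) return(FALSE)
    v <- v[!is.na(v)]  # missing values are skipped in this sketch
    all(v >= 0 & v == round(v))
  }, logical(1))
}

# Usage with a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Host,Site,Sp1,Sp2",
             "1,A,0,3",
             "2,A,5,0",
             "3,B,2,1.5"), tmp)
res <- read_para_sketch(tmp)
check_abundance(res$dataset, c("Sp1", "Sp2"))
```

Here Sp2 fails the check because 1.5 is not a whole number, the kind of encoding error the integer requirement is meant to catch.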

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman