Getting started with the amp.dm package

Introduction

This document is intended to get you started with the amp.dm package. The package was developed to ease the process of creating NONMEM datasets, but can in principle be used for any other dataset within the field of pharmacometrics.

Constructing an analysis data set is highly data driven and the strategy depends to a great extent on the design of a study. However, certain steps are necessary in almost all cases, and this package contains functions to help with these steps. An important part of coding in the pharmaceutical industry is logging and documenting; the amp.dm package includes various functions to help with this process.

Documentation and logging

An important part of pharmacometric analyses is the documentation and logging of the various steps that have been performed. This is important when communicating between data management and modelers, as well as for submission purposes. Information regarding the meaning of variables, units of measurement, or the (de)coding of categories is key in understanding the data. Furthermore, information regarding records that have been dropped or added is essential. Other information, such as statistics or system information, provides a complete overview of the data management process.

At the base of this is the construction of data sets using rmarkdown. This workflow makes it easy to add comments regarding the data management process, as well as to provide various types of tables with important information. On top of this, amp.dm has various functions that log information or present it within an rmarkdown document.

Functions that log results

The package has a few functions that log results, which can be used in the documentation at a later stage. These functions are mainly wrappers around existing functions but have additional options for logging. See below for the three main functions that are available:

library(dplyr)
library(amp.dm)

xmpl <- system.file("example/NM.theoph.V1.csv",package="amp.dm")

# The read_data function can read most common formats; for less common formats
# a manual function can be passed to enable documenting the process
dat  <- read_data(xmpl, comment="Read example data")
ℹ Read in 'C:/Rlibs/amp.dm/example/NM.theoph.V1.csv' which has 288 records and 19 variables
# We can filter data with logging
dat2 <- filterr(dat,STIME<2, comment = "remove time-points") %>%
  select(ID,STIME) %>% mutate(FLAG=1)
ℹ Filter applied with 168 record(s) deleted
# We can also join with logging 
dat3 <- left_joinr(dat2, dat, comment = "example join")
Joining with `by = join_by(ID, STIME)`
ℹ Output data contains 168 records
ℹ dat2 contained 120 records
ℹ dat contained 288 records
! Be aware for possible cartesian product

The functions above will provide some additional information in the console. On top of this, all relevant information is saved in the package environment and can be shown using the get_log function:

get_log()
$filterr_nfo
  datain    coding datainrows dataoutrows rowsdropped            comment
1    dat STIME < 2        288         120         168 remove time-points

$joinr_nfo
  datainl datainr datainrowsl datainrowsr dataoutrowsl dataoutrows      comment
1    dat2     dat         120         288            0         168 example join

$read_nfo
                                    datain datainrows dataincols
1 C:/Rlibs/amp.dm/example/NM.theoph.V1.csv        288         19
            comment
1 Read example data

Besides the functions above, there are two other functions that can be used for logging and documentation:

  1. The cmnt function can be used to provide a comment regarding a piece of code within a larger code block. This can then be presented after a code chunk (using cmnt_print). This is mainly useful to list items that need special attention.
  2. The srce function can be used to identify where certain variables derive from. This information can be used later on in the documentation, which is particularly useful for registration purposes.

cmnt("**Be aware** that *ID 1* is removed using `subset`")
dat4 <- subset(dat,ID!=1, select=-BMI)

srce(BMI,c(dat4.WEIGHT,dat4.HEIGHT),'d')
dat4$BMI <- dat4$WEIGHT/(dat4$HEIGHT)^2 
# Note it is easier to directly use inline code, e.g.: `r cmnt_print()` 
cat(cmnt_print())

Assumptions and special attention:

# This is also available in tabulation functions e.g. define_tbl
get_log()$srce_nfo
  variable type                   source
1      BMI    d dat4.WEIGHT, dat4.HEIGHT

Handling of attributes

Data attributes hold vital metadata about a constructed data set. Key elements are the explanation of the variables, their units, and the way they were constructed. Additionally, mainly for NONMEM analyses, it is important to provide an explanation of categorical variables. NONMEM can only handle numeric values, which means that categorical data such as gender and country must be re-coded as numeric. The meaning of these categories is important for understanding the content of the data.

Data attributes can be created in an excel file, in which all the variables of a data set are listed with the corresponding meta information. When a data set is constructed, the meta data can be obtained (using the attr_xls function) and used in various ways, as explained further on. A template of such an excel file is available in the package (see system.file("example/Attr.Template.xlsx", package="amp.dm")).

The other functions available to work with attributes in the package are:

  1. The attr_add function; this can be used to add attributes to a data set.
  2. The attr_extract function; this can be used to extract attributes from a data set.
  3. The attr_factor function; this can be used to create factors for numerical/categorical variables within a data set.
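
To illustrate the underlying mechanism, the sketch below uses base R only; it is not the package's implementation, and the attribute names ("label", "unit") and the factor coding are assumptions made for this example.

```r
# Base-R illustration of the attribute mechanism the attr_* helpers build on;
# the attribute names used here are assumptions, not amp.dm's actual scheme.
dat <- data.frame(ID = 1:3, WT = c(70, 82, 65))
attr(dat$WT, "label") <- "Body weight"
attr(dat$WT, "unit")  <- "kg"

# attr_extract could then collect such metadata, and attr_factor could turn
# a numeric categorical into a labeled factor, conceptually like:
sexn <- c(0, 1, 0)
sexf <- factor(sexn, levels = c(0, 1), labels = c("Male", "Female"))
```

Storing labels and units as attributes keeps the metadata attached to the data itself, so later tabulation steps can read it back without a separate lookup.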

Tabulation and checking

When a data set is constructed using the functions in the previous sections, results can be tabulated using various functions. The define_tbl function can be used to present a table of the attributes of a data set; it typically presents a table directly usable for a ‘define.pdf’ file. Another important table for reviewing the data can be generated using the stats_df function, which shows some simple statistics of a data set, including ranges, missing data and number of categories. The counts_df function can be used to show the number of records or unique subjects, stratified over one or multiple variables. Finally, information from the functions that log results (e.g. reading, filtering or joining data) can be tabulated using the log_df function. A more specific function to mention is check_nmdata, which implements checks intended for NONMEM data. This function will check if a data set follows the minimum requirements to be used in a NONMEM model. You can also check for non-essential requirements that could trigger further investigation.
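
As an illustration of the kind of overview stats_df might give, the standalone sketch below computes missing counts, numbers of unique values and ranges with base R; the function name, arguments and output layout here are assumptions and may differ from the package's actual implementation.

```r
# Hypothetical standalone sketch of a per-variable summary, conceptually
# similar to what stats_df reports; not the package's implementation.
stats_sketch <- function(df) {
  data.frame(
    variable  = names(df),
    n_missing = sapply(df, function(x) sum(is.na(x))),
    n_unique  = sapply(df, function(x) length(unique(x[!is.na(x)]))),
    min       = sapply(df, function(x) if (is.numeric(x)) min(x, na.rm = TRUE) else NA),
    max       = sapply(df, function(x) if (is.numeric(x)) max(x, na.rm = TRUE) else NA),
    row.names = NULL
  )
}

stats_sketch(data.frame(ID = c(1, 1, 2), DV = c(0.5, NA, 2.1)))
```

Such a table quickly surfaces data issues (unexpected missingness, out-of-range values) before modeling starts.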

All of these functions will create a LaTeX table using the general_tbl function. This ensures that results are presented nicely and uniformly when placed in an rmarkdown or quarto chunk (using the “asis” option), e.g.

general_tbl(data.frame(result="this is a test"))
\begin{longtable}{l}
\caption{General table} \\ 
  \toprule result \\ 
  \midrule\endhead this is a test \\ 
  \hline
\end{longtable}

Analysis functions

There are multiple functions implemented in the package that are quite specific to NONMEM analysis. These mainly include the following:

There are other functions that are not directly restricted to NONMEM usage but are often used to create common variables. For example, the egfr function calculates the estimated glomerular filtration rate using different formulas, and the weight_height function calculates various metrics such as BMI, LBM and FFM.
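
As an example of the kind of calculation the egfr function may perform, below is a standalone sketch of the CKD-EPI 2009 creatinine equation (race coefficient omitted for brevity); the function name, arguments and the set of formulas actually offered by the package are assumptions here, not its documented interface.

```r
# Hypothetical standalone implementation of the CKD-EPI 2009 creatinine
# equation (race coefficient omitted); not the amp.dm implementation.
# scr: serum creatinine in mg/dL, age in years, female: logical.
egfr_ckdepi <- function(scr, age, female) {
  kappa <- ifelse(female, 0.7, 0.9)
  alpha <- ifelse(female, -0.329, -0.411)
  141 * pmin(scr / kappa, 1)^alpha * pmax(scr / kappa, 1)^-1.209 *
    0.993^age * ifelse(female, 1.018, 1)
}

egfr_ckdepi(scr = 0.7, age = 50, female = TRUE)  # approximately 101 mL/min/1.73 m^2
```

Using pmin/pmax keeps the function vectorised, so it can be applied directly to a covariate column when deriving such a variable for a data set.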

Conclusion

Although there are more functions available in the package, this vignette should provide a solid starting point for using it. Additionally, the example study vignette provides a practical example of how the functions can be used and what the final documentation of such a data set will look like.