Introduction

At a given CpG site in a single cell, a read shows either a \(C\) or a \(T\) after the DNA conversion treatment, and the interpretation of each base depends on the conversion method used. The outcome is therefore binary: we assume a Binomial model and use maximum likelihood to estimate the hydroxymethylation and methylation proportions.

\(T\) reads are referred to as converted cytosines and \(C\) reads as unconverted cytosines. Conventionally, \(T\) counts are also called unmethylated counts, and \(C\) counts methylated counts. In the case of Infinium Methylation arrays, we have intensities for the methylated (M) and unmethylated (U) channels that are proportional to the number of unconverted and converted cytosines (\(C\) and \(T\), respectively). The most widely used summary from these experiments is the proportion \(\beta=\frac{M}{M+U}\), commonly referred to as the \textit{beta-value}, which reflects the methylation level at a CpG site. Naïvely using the difference between betas from BS and oxBS as an estimate of 5-hmC (hydroxymethylated cytosine), and the difference between betas from BS and TAB as an estimate of 5-mC (methylated cytosine), often yields negative proportions, as well as cases where the 5-C (unmodified cytosine), 5-mC and 5-hmC proportions sum to more than one, because of measurement error.
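
To see how the naive subtraction can fail, consider a minimal sketch with made-up intensities for a single CpG. The numbers and the helper `beta` are hypothetical and are not taken from any dataset or package.

```r
# Hypothetical methylated (M) and unmethylated (U) intensities at one CpG
beta <- function(M, U) M / (M + U)

beta_BS   <- beta(M = 600, U = 400)  # estimates 5-mC + 5-hmC
beta_oxBS <- beta(M = 650, U = 350)  # estimates 5-mC only

# Naive 5-hmC estimate: negative whenever noise makes beta_oxBS > beta_BS
naive_hmC <- beta_BS - beta_oxBS
naive_hmC
```

Here the oxBS beta exceeds the BS beta, so the naive 5-hmC estimate falls below zero even though a proportion cannot be negative.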

The \CRANpkg{MLML2R} package allows the user to jointly estimate hydroxymethylation and methylation consistently and efficiently.

The function \Rfunction{MLML} takes as input the data from the different methods and returns the estimated proportions of methylation, hydroxymethylation and unmethylation for a given CpG site. Table 1 presents the arguments of \Rfunction{MLML} and Table 2 lists the results returned by the function.

The function assumes that the order of the rows and columns is consistent across the input matrices. In addition, all the input matrices must have the same dimensions. Usually, rows represent CpG loci and columns represent samples.
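
Because these assumptions are not visible from the matrices themselves, it can help to check them before estimation. The sketch below uses a hypothetical helper, `check_inputs`, which is not part of \CRANpkg{MLML2R}; it only compares dimensions and row names.

```r
# Hypothetical helper (not part of MLML2R): stop early if the inputs
# do not share dimensions and row (CpG) names.
check_inputs <- function(...) {
  mats <- list(...)
  ref <- mats[[1]]
  for (m in mats[-1]) {
    stopifnot(identical(dim(m), dim(ref)),
              identical(rownames(m), rownames(ref)))
  }
  invisible(TRUE)
}

# Toy matrices: rows are CpG loci, columns are samples
Tm <- matrix(c(8, 5, 7, 6), nrow = 2,
             dimnames = list(c("cg01", "cg02"), c("s1", "s2")))
Um <- matrix(c(2, 5, 3, 4), nrow = 2, dimnames = dimnames(Tm))
check_inputs(Tm, Um)
```

Column order is deliberately not checked here, since replicate identifiers usually differ between conversion methods; that pairing has to be verified from the sample annotation.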

| Arguments | Description |
|:----------|:------------|
| \Robject{G.matrix} | Unmethylated channel (Converted cytosines/ T counts) from TAB-conversion (reflecting 5-C + 5-mC). |
| \Robject{H.matrix} | Methylated channel (Unconverted cytosines/ C counts) from TAB-conversion (reflecting True 5-hmC). |
| \Robject{L.matrix} | Unmethylated channel (Converted cytosines/ T counts) from oxBS-conversion (reflecting 5-C + 5-hmC). |
| \Robject{M.matrix} | Methylated channel (Unconverted cytosines/ C counts) from oxBS-conversion (reflecting True 5-mC). |
| \Robject{T.matrix} | Methylated channel (Unconverted cytosines/ C counts) from standard BS-conversion (reflecting 5-mC+5-hmC). |
| \Robject{U.matrix} | Unmethylated channel (Converted cytosines/ T counts) from standard BS-conversion (reflecting True 5-C). |

: MLML function and random variable notation.
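
Under the Binomial assumption, and using the notation of Table 1, the log-likelihood for a single CpG can be sketched as follows, up to an additive constant. Here \(p_m\), \(p_h\) and \(p_u\) denote the 5-mC, 5-hmC and 5-C proportions; this is a restatement of the model described in this section rather than a formula quoted from the package, and the terms of a method that was not run are simply dropped.

\[
\ell(p_m, p_h, p_u) = T\log(p_m+p_h) + U\log(p_u) + M\log(p_m) + L\log(p_h+p_u) + H\log(p_h) + G\log(p_m+p_u)
\]
\[
\text{subject to } p_m + p_h + p_u = 1, \quad p_m, p_h, p_u \geq 0.
\]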

| Value | Description |
|:------|:------------|
| \Robject{mC} | maximum likelihood estimate for the 5-mC proportion |
| \Robject{hmC} | maximum likelihood estimate for the 5-hmC proportion |
| \Robject{C} | maximum likelihood estimate for the 5-C proportion |
| \Robject{methods} | the conversion methods used to produce the MLE |

: Results returned from the \Rfunction{MLML} function

Worked examples

Publicly available data: oxBS and BS methods

We will use the dataset from @10.1371/journal.pone.0118202, which consists of eight DNA samples from the same DNA source treated with oxBS-BS and hybridized to the Infinium 450K array.

When data are obtained through Infinium Methylation arrays, we recommend the use of the \Biocpkg{minfi} package [@minfi], a well-established tool for reading, preprocessing and analysing DNA methylation data from these platforms. Although our example relies on \Biocpkg{minfi} and other \Bioconductor{} tools, \CRANpkg{MLML2R} does not depend on any packages. Thus, the user is free to read and preprocess the data using any software of preference and then import the intensities (or \(T\) and \(C\) counts) for the methylated and unmethylated channels (or converted and unconverted cytosines) into \R{} in matrix format.

To start this example we will need the following packages:

library(MLML2R)
library(minfi)
library(GEOquery)

It is usually best practice to start the analysis from the raw data, which in the case of the 450K array is a \verb|.IDAT| file.

The raw files are deposited in GEO and can be downloaded with the \Rfunction{getGEOSuppFiles} function. There are two files for each replicate, since the 450K array is a two-color array. The \verb|.IDAT| files are downloaded in compressed format and need to be uncompressed before they are read by the \Rfunction{read.metharray.exp} function.

getGEOSuppFiles("GSE63179")
untar("GSE63179/GSE63179_RAW.tar", exdir = "GSE63179/idat")

list.files("GSE63179/idat", pattern = "idat")
files <- list.files("GSE63179/idat", pattern = "idat.gz$", full.names = TRUE)
sapply(files, gunzip, overwrite = TRUE)

The \verb|.IDAT| files can now be read:

rgSet <- read.metharray.exp("GSE63179/idat")

To access phenotype data we use the \Rfunction{pData} function. The phenotype data is not yet available from the \Robject{rgSet}.

pData(rgSet)

In this example the phenotype is not really relevant, since we have only one sample: male, 25 years old. What we do need is the information about the conversion method used in each replicate: BS or oxBS. We will access this information automatically from GEO:

if (!file.exists("GSE63179/GSE63179_series_matrix.txt.gz"))
download.file(
  "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE63nnn/GSE63179/matrix/GSE63179_series_matrix.txt.gz",
  "GSE63179/GSE63179_series_matrix.txt.gz")

geoMat <- getGEO(filename="GSE63179/GSE63179_series_matrix.txt.gz",getGPL=FALSE)
pD.all <- pData(geoMat)

#Another option
#geoMat <- getGEO("GSE63179")
#pD.all <- pData(geoMat[[1]])

pD <- pD.all[, c("title", "geo_accession", "characteristics_ch1.1",
                 "characteristics_ch1.2","characteristics_ch1.3")]
pD

This phenotype data needs to be merged into the methylation data. The following commands guarantee we have the same replicate identifier in both datasets before merging.

sampleNames(rgSet) <- sapply(sampleNames(rgSet),function(x)
  strsplit(x,"_")[[1]][1])
rownames(pD) <- pD$geo_accession
pD <- pD[sampleNames(rgSet),]
pData(rgSet) <- as(pD,"DataFrame")
rgSet
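
The renaming step keeps only the text before the first underscore of each array name. A minimal sketch of that operation on hypothetical file base names (the accessions and array identifiers below are invented for illustration):

```r
# Hypothetical IDAT base names of the form "<GEO accession>_<array>_<position>"
idat_names <- c("GSM0000001_1234567890_R01C01",
                "GSM0000002_1234567890_R02C01")

# Keep the text before the first underscore, i.e. the GEO accession
accession <- vapply(strsplit(idat_names, "_", fixed = TRUE),
                    function(parts) parts[1], character(1))
accession
```

Indexing a data frame by a row name that does not exist yields a row of `NA` values, so a check such as `stopifnot(!anyNA(pD$geo_accession))` after reordering is one way to catch accessions that failed to match.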

The \Robject{rgSet} object is of class \Rclass{RGChannelSet}, which is used for two-color data (a green and a red channel). The input to the \Rfunction{MLML} function is obtained from a \Rclass{MethylSet}, which contains the methylated and unmethylated signals. The most basic way to construct a \Rclass{MethylSet} is to use the function \Rfunction{preprocessRaw}. Here we chose the function \Rfunction{preprocessNoob} [@noob] for background correction and construction of the \Rclass{MethylSet}.

MSet.noob<- preprocessNoob(rgSet)

After the preprocessing steps we can use \Rfunction{MLML} from the \CRANpkg{MLML2R} package.

The BS replicates are in columns 1, 3, 5, and 6 (information from pD$title). The remaining columns are from the oxBS treated replicates.

MethylatedBS <- getMeth(MSet.noob)[,c(1,3,5,6)]
UnMethylatedBS <- getUnmeth(MSet.noob)[,c(1,3,5,6)]
MethylatedOxBS <- getMeth(MSet.noob)[,c(7,8,2,4)]
UnMethylatedOxBS <- getUnmeth(MSet.noob)[,c(7,8,2,4)]

When only two methods are available, the default option of the \Rfunction{MLML} function returns the exact constrained maximum likelihood estimates using the pool-adjacent-violators algorithm (PAVA) [@ayer1955].

results_exact <- MLML(T.matrix = MethylatedBS , U.matrix = UnMethylatedBS,
                      L.matrix = UnMethylatedOxBS, M.matrix = MethylatedOxBS)
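
For a single CpG and the BS plus oxBS pair, the pooling idea behind PAVA can be sketched in a few lines. The function `pool_two_methods` is a hypothetical illustration and not the \CRANpkg{MLML2R} implementation: it estimates \(p_m\) from oxBS and \(p_m+p_h\) from BS, and pools the two experiments whenever the order \(p_m \leq p_m+p_h\) is violated.

```r
# Hypothetical helper, not the MLML2R implementation: constrained MLE for
# one CpG from BS counts (T, U) and oxBS counts (M, L).
pool_two_methods <- function(T, U, M, L) {
  theta_ox <- M / (M + L)  # estimates p_m
  theta_bs <- T / (T + U)  # estimates p_m + p_h
  if (theta_ox > theta_bs) {
    # Order violated: pool both experiments into one common proportion
    theta_ox <- theta_bs <- (M + T) / (M + L + T + U)
  }
  c(mC = theta_ox, hmC = theta_bs - theta_ox, C = 1 - theta_bs)
}

pool_two_methods(T = 8, U = 2, M = 5, L = 5)  # order respected
pool_two_methods(T = 3, U = 7, M = 4, L = 6)  # order violated, hmC becomes 0
```

The sketch assumes non-zero coverage in both experiments; sites with zero total counts would need separate handling.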

The maximum likelihood estimate via the EM algorithm [@Qu:MLML] is obtained with the option \verb|iterative=TRUE|. In this case, the default (or user-specified) \verb|tol| is used by the iterative method.

results_em <- MLML(T.matrix = MethylatedBS , U.matrix = UnMethylatedBS,
                   L.matrix = UnMethylatedOxBS, M.matrix = MethylatedOxBS,
                   iterative = TRUE)

The estimates are very similar for both methods:

all.equal(results_exact$hmC,results_em$hmC,scale=1)
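
With `scale = 1`, `all.equal` compares absolute rather than relative differences, and it returns a character message instead of `FALSE` when the inputs differ. The toy vectors below are hypothetical and only illustrate how to turn the comparison into explicit numbers.

```r
a <- c(0.10, 0.20, 0.30)
b <- c(0.10, 0.25, 0.30)

# all.equal() returns TRUE within tolerance, otherwise a character message,
# so wrap it in isTRUE() when a logical is needed
same <- isTRUE(all.equal(a, b, scale = 1))

# Explicit summaries of the discrepancy over all entries
mean_abs_diff <- mean(abs(a - b))
max_abs_diff  <- max(abs(a - b))
```

Reporting the maximum alongside the mean is a cheap way to confirm that no individual CpG disagrees strongly between the two estimators.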

Figure 1: Estimated proportions of hydroxymethylation, methylation and unmethylation for the CpGs in the dataset using the MLML function with default options.

Publicly available data: TAB and BS methods

We will use the dataset from @Thienpont2016, which consists of 24 DNA samples from newly diagnosed and untreated non-small-cell lung cancer patients (12 normoxic and 12 hypoxic tumours), treated with TAB-BS and hybridized to the Infinium 450K array. The dataset is deposited under GEO accession number GSE71398.

Obtaining the data:

getGEOSuppFiles("GSE71398")
untar("GSE71398/GSE71398_RAW.tar", exdir = "GSE71398/idat")

list.files("GSE71398/idat", pattern = "idat")
files <- list.files("GSE71398/idat", pattern = "idat.gz$", full.names = TRUE)
sapply(files, gunzip, overwrite = TRUE)

Reading the \verb|.IDAT| files:

rgSet <- read.metharray.exp("GSE71398/idat")

The phenotype data is not yet available from the \Robject{rgSet}.

pData(rgSet)

We need to correctly identify the 24 DNA samples: 12 normoxic and 12 hypoxic non-small-cell lung cancer. We also need the information about the conversion method used in each replicate: BS or TAB. We will access this information automatically from GEO:

if (!file.exists("GSE71398/GSE71398_series_matrix.txt.gz"))
download.file(
  "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE71nnn/GSE71398/matrix/GSE71398_series_matrix.txt.gz",
  "GSE71398/GSE71398_series_matrix.txt.gz")

geoMat <- getGEO(filename="GSE71398/GSE71398_series_matrix.txt.gz",getGPL=FALSE)
pD.all <- pData(geoMat)
pD <- pD.all[, c("title", "geo_accession", "source_name_ch1")]
pD$method <- sapply(pD$source_name_ch1,function(x) strsplit(as.character(x),",")[[1]][3]) 
pD$group <- sapply(pD$source_name_ch1,function(x) strsplit(as.character(x),",")[[1]][2]) 
pD$sample <- as.numeric(substr(as.character(pD$title),start=7,stop=8))

This phenotype data needs to be merged into the methylation data. The following commands guarantee we have the same replicate identifier in both datasets before merging.

sampleNames(rgSet) <- sapply(sampleNames(rgSet),function(x)   strsplit(x,"_")[[1]][1])
rownames(pD) <- as.character(pD$geo_accession)
pD <- pD[sampleNames(rgSet),]
pData(rgSet) <- as(pD,"DataFrame")
rgSet

The input to the \Rfunction{MLML} function is obtained from a \Rclass{MethylSet}, which contains the methylated and unmethylated signals. We chose the function \Rfunction{preprocessNoob} [@noob] for background correction and construction of the \Rclass{MethylSet}.

MSet.noob<- preprocessNoob(rgSet)

We can now use \Rfunction{MLML} from the \CRANpkg{MLML2R} package.

One needs to carefully check if the columns across the different input matrices represent the same replicate. In this example, all matrices have the samples consistently represented in the columns: sample 1 in the first column, sample 2 in the second, and so forth.

BSindex <- which(pD$method == " BS-chip")
TABindex <- which(pD$method == " TAB-chip")
MethylatedBS <- getMeth(MSet.noob)[,BSindex]
UnMethylatedBS <- getUnmeth(MSet.noob)[,BSindex]
MethylatedTAB <- getMeth(MSet.noob)[,TABindex]
UnMethylatedTAB <- getUnmeth(MSet.noob)[,TABindex]
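
The pairing of columns across methods can also be verified programmatically. The sketch below works on a hypothetical phenotype table, `pD_toy`, that mimics the `method` and `sample` columns built earlier; the accessions and labels are invented for illustration.

```r
# Hypothetical phenotype table mimicking the columns built above
pD_toy <- data.frame(
  geo_accession = c("GSM1", "GSM2", "GSM3", "GSM4"),
  method = c(" BS-chip", " TAB-chip", " TAB-chip", " BS-chip"),
  sample = c(2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Trim the leading space left by strsplit() before comparing labels
method <- trimws(pD_toy$method)

# Column indices for each method, ordered by sample identifier
bs_idx  <- which(method == "BS-chip")
tab_idx <- which(method == "TAB-chip")
bs_idx  <- bs_idx[order(pD_toy$sample[bs_idx])]
tab_idx <- tab_idx[order(pD_toy$sample[tab_idx])]

# Column k of the BS and TAB matrices must refer to the same sample
stopifnot(identical(pD_toy$sample[bs_idx], pD_toy$sample[tab_idx]))
```

Ordering both index vectors by the sample identifier makes the pairing explicit instead of relying on the order in which the arrays happen to be stored.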

When only two methods are available, the default option of the \Rfunction{MLML} function returns the exact constrained maximum likelihood estimates using the pool-adjacent-violators algorithm (PAVA) [@ayer1955].

results_exact <- MLML(T.matrix = MethylatedBS , U.matrix = UnMethylatedBS,
                      G.matrix = UnMethylatedTAB, H.matrix = MethylatedTAB)

The maximum likelihood estimate via the EM algorithm [@Qu:MLML] is obtained with the option \verb|iterative=TRUE|. In this case, the default (or user-specified) \verb|tol| is used by the iterative method.

results_em <- MLML(T.matrix = MethylatedBS , U.matrix = UnMethylatedBS,
                   G.matrix = UnMethylatedTAB, H.matrix = MethylatedTAB,
                   iterative = TRUE)

The estimates for 5-hmC proportions are very similar for both methods:

all.equal(results_exact$hmC,results_em$hmC,scale=1)

The estimates for 5-mC proportions are very similar for both methods:

all.equal(results_exact$mC,results_em$mC,scale=1)

Figure 2: Estimated proportions of hydroxymethylation, methylation and unmethylation for the CpGs in the dataset using the MLML function with default options.

Simulated data

To illustrate the package when all three methods are available, or when any combination of only two of them is available, we will simulate a dataset.

We will use a sample of the estimates of 5-mC, 5-hmC and 5-C of the previous oxBS+BS example as the true proportions, as shown in Figure 3.

Two replicate samples with 1000 CpGs will be simulated. For CpG \(i\) in sample \(j\):

\[T_{i,j} \sim \mathrm{Binomial}(n=c_{i,j},\, p=p_m+p_h)\]
\[M_{i,j} \sim \mathrm{Binomial}(n=c_{i,j},\, p=p_m)\]
\[H_{i,j} \sim \mathrm{Binomial}(n=c_{i,j},\, p=p_h)\]
\[U_{i,j}=c_{i,j}-T_{i,j}\]
\[L_{i,j}=c_{i,j}-M_{i,j}\]
\[G_{i,j}=c_{i,j}-H_{i,j}\]
where the random variables are defined in Table 1, and \(c_{i,j}\) represents the coverage for CpG \(i\) in sample \(j\).

The following code produces the simulated data:

set.seed(112017)

index <- sample(1:dim(results_exact$mC)[1],1000,replace=FALSE) # 1000 CpGs

Coverage <- round(MethylatedBS+UnMethylatedBS)[index,1:2] # considering 2 samples

temp1 <- data.frame(n=as.vector(Coverage),
                    p_m=c(results_exact$mC[index,1],results_exact$mC[index,1]),
                    p_h=c(results_exact$hmC[index,1],results_exact$hmC[index,1]))

MethylatedBS_temp <- c()
for (i in 1:dim(temp1)[1])
{
  MethylatedBS_temp[i] <- rbinom(n=1, size=temp1$n[i], prob=(temp1$p_m[i]+temp1$p_h[i]))
}


UnMethylatedBS_sim2 <- matrix(Coverage - MethylatedBS_temp,ncol=2)
MethylatedBS_sim2 <- matrix(MethylatedBS_temp,ncol=2)


MethylatedOxBS_temp <- c()
for (i in 1:dim(temp1)[1])
{
  MethylatedOxBS_temp[i] <- rbinom(n=1, size=temp1$n[i], prob=temp1$p_m[i])
}

UnMethylatedOxBS_sim2 <- matrix(Coverage - MethylatedOxBS_temp,ncol=2)
MethylatedOxBS_sim2 <- matrix(MethylatedOxBS_temp,ncol=2)


MethylatedTAB_temp <- c()
for (i in 1:dim(temp1)[1])
{
  MethylatedTAB_temp[i] <- rbinom(n=1, size=temp1$n[i], prob=temp1$p_h[i])
}


UnMethylatedTAB_sim2 <- matrix(Coverage - MethylatedTAB_temp,ncol=2)
MethylatedTAB_sim2 <- matrix(MethylatedTAB_temp,ncol=2)

true_parameters_sim2 <- data.frame(p_m=results_exact$mC[index,1],p_h=results_exact$hmC[index,1])
true_parameters_sim2$p_u <- 1-true_parameters_sim2$p_m-true_parameters_sim2$p_h
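
Since `rbinom` is vectorised over `size` and `prob`, the same sampling scheme can be written without explicit loops. The sketch below uses small hypothetical inputs (`p_m`, `p_h` and `coverage` are invented) and is not guaranteed to reproduce the exact draws of the loops above, because it uses its own seed and inputs.

```r
set.seed(1)

# Hypothetical true proportions for three CpGs and coverage for two samples
p_m <- c(0.60, 0.30, 0.10)
p_h <- c(0.10, 0.20, 0.05)
coverage <- matrix(c(50, 80, 120, 60, 90, 100), ncol = 2)

# rbinom() recycles 'prob' down the columns of 'coverage'
n_draws <- length(coverage)
T_sim <- matrix(rbinom(n_draws, size = coverage, prob = p_m + p_h), ncol = 2)
M_sim <- matrix(rbinom(n_draws, size = coverage, prob = p_m), ncol = 2)
H_sim <- matrix(rbinom(n_draws, size = coverage, prob = p_h), ncol = 2)

# Converted counts follow from the coverage
U_sim <- coverage - T_sim
L_sim <- coverage - M_sim
G_sim <- coverage - H_sim
```

Each pair of matrices (for example `T_sim` and `U_sim`) sums to the coverage, which mirrors the definitions of \(U_{i,j}\), \(L_{i,j}\) and \(G_{i,j}\) given earlier.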

Figure 3: True proportions of hydroxymethylation, methylation and unmethylation for the CpGs used to generate the datasets.

BS and oxBS methods

When only two methods are available, the default option returns the exact constrained maximum likelihood estimates using the pool-adjacent-violators algorithm (PAVA) [@ayer1955].

library(MLML2R)
results_exactBO1 <- MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                         L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2)

The maximum likelihood estimate via the EM algorithm [@Qu:MLML] is obtained with the option \verb|iterative=TRUE|. In this case, the default (or user-specified) \verb|tol| is used by the iterative method.

results_emBO1 <- MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                      L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                      iterative = TRUE)

When only two methods are available, we highly recommend the default option \Rcode{iterative=FALSE}, since the estimates obtained via EM and via the exact constrained approach differ very little, while EM requires more computational effort:

 all.equal(results_emBO1$hmC,results_exactBO1$hmC,scale=1)
## [1] "Mean absolute difference: 9.581949e-05"
 library(microbenchmark)
 mbmBO1 = microbenchmark(
    EXACT = MLML(T.matrix = MethylatedBS_sim2 , U.matrix = UnMethylatedBS_sim2,
                 L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2),
    EM =    MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                 L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                 iterative=TRUE),
    times=10)
 mbmBO1
## Unit: microseconds
##   expr      min        lq       mean    median        uq       max neval
##  EXACT   465.72   485.582   701.7231   506.316   552.157  2427.287    10
##     EM 12414.20 13670.891 15762.0010 13862.258 14163.355 30296.225    10
##  cld
##   a 
##    b

Comparison between the exact constrained estimates and the true hydroxymethylation proportions used in the simulation:

all.equal(true_parameters_sim2$p_h,results_exactBO1$hmC[,1],scale=1)
## [1] "Mean absolute difference: 0.01165593"

Comparison between EM-algorithm and true hydroxymethylation proportion used in simulation:

all.equal(true_parameters_sim2$p_h,results_emBO1$hmC[,1],scale=1)
## [1] "Mean absolute difference: 0.01011952"

BS and TAB methods

Using PAVA:

results_exactBT1 <- MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                         G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2)

Using EM-algorithm:

results_emBT1 <- MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                      G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2,
                      iterative = TRUE)

Comparison between PAVA and EM:

 all.equal(results_emBT1$hmC,results_exactBT1$hmC,scale=1)
## [1] "Mean absolute difference: 7.675267e-07"
 mbmBT1 = microbenchmark(
    EXACT = MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                 G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2),
    EM =    MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                 G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2,
                 iterative=TRUE),
    times=10)
 mbmBT1
## Unit: microseconds
##   expr       min        lq       mean     median       uq       max neval
##  EXACT   420.257   457.306   670.1986   475.8605   507.59  2469.986    10
##     EM 13950.349 14951.039 15892.4449 15625.2795 16948.82 17872.230    10
##  cld
##   a 
##    b

Comparison between the exact constrained estimates and the true hydroxymethylation proportions used in the simulation:

all.equal(true_parameters_sim2$p_h,results_exactBT1$hmC[,1],scale=1)
## [1] "Mean absolute difference: 0.00644861"

Comparison between EM-algorithm and true hydroxymethylation proportion used in simulation:

all.equal(true_parameters_sim2$p_h,results_emBT1$hmC[,1],scale=1)
## [1] "Mean absolute difference: 0.004719911"

oxBS and TAB methods

Using PAVA:

results_exactOT1 <- MLML(L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                         G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2)

Using EM-algorithm:

results_emOT1 <- MLML(L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                      G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2,
                      iterative = TRUE)

Comparison between PAVA and EM:

 all.equal(results_emOT1$hmC,results_exactOT1$hmC,scale=1)
## [1] "Mean absolute difference: 2.019638e-07"
 mbmOT1 = microbenchmark(
    EXACT = MLML(L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                 G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2),
    EM =    MLML(L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                 G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2,
                 iterative=TRUE),
    times=10)
 mbmOT1
## Unit: microseconds
##   expr      min       lq      mean   median       uq      max neval cld
##  EXACT  374.325  392.684  598.7969  406.394  429.322 2344.482    10  a 
##     EM 4478.511 6541.223 6313.7010 6574.355 6590.827 6604.810    10   b

Comparison between the exact constrained estimates and the true 5-hmC proportions used in the simulation:

all.equal(true_parameters_sim2$p_h,results_exactOT1$hmC[,1],scale=1)
## [1] "Mean absolute difference: 0.006451817"

Comparison between EM-algorithm and true 5-hmC proportion used in simulation:

all.equal(true_parameters_sim2$p_h,results_emOT1$hmC[,1],scale=1)
## [1] "Mean absolute difference: 0.00645154"

BS, oxBS and TAB methods

When data from the three methods are available, the default option in the \Rfunction{MLML} function returns the constrained maximum likelihood estimates using an approximate solution to the Lagrange multipliers method.

results_exactBOT1 <- MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                          L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                          G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2)
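
As an independent sanity check for a single CpG, the three-method Binomial log-likelihood can also be maximised numerically. The sketch below is not how \CRANpkg{MLML2R} computes its estimates: `loglik3`, `to_simplex` and the counts are hypothetical, and the constraint that the proportions sum to one is enforced by a softmax-style reparameterisation passed to `optim`.

```r
# Log-likelihood for one CpG under the three-method Binomial model
# (constant terms dropped); counts is a named numeric vector
loglik3 <- function(p, counts) {
  p_m <- p[1]; p_h <- p[2]; p_u <- p[3]
  unname(counts["T"] * log(p_m + p_h) + counts["U"] * log(p_u) +
         counts["M"] * log(p_m)       + counts["L"] * log(p_h + p_u) +
         counts["H"] * log(p_h)       + counts["G"] * log(p_m + p_u))
}

# Map two unconstrained numbers to three proportions that sum to one
to_simplex <- function(z) {
  e <- exp(c(z, 0))
  e / sum(e)
}

# Hypothetical counts for a single CpG
counts <- c(T = 70, U = 30, M = 50, L = 50, H = 15, G = 85)

fit <- optim(c(0, 0), function(z) -loglik3(to_simplex(z), counts),
             method = "BFGS")
p_hat <- setNames(to_simplex(fit$par), c("mC", "hmC", "C"))
p_hat
```

A general-purpose optimiser like this is far too slow for hundreds of thousands of CpGs, which is why a closed-form approximation or the EM algorithm is preferable in practice; the sketch is only meant for spot checks on individual sites.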

The maximum likelihood estimate via the EM algorithm [@Qu:MLML] is obtained with the option \verb|iterative=TRUE|. In this case, the default (or user-specified) \verb|tol| is used by the iterative method.

results_emBOT1 <- MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                       L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                       G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2,
                       iterative = TRUE)

We recommend the default option \Rcode{iterative=FALSE}, since the estimates obtained via EM and via the approximate constrained solution differ very little, while EM requires more computational effort:

 all.equal(results_emBOT1$hmC,results_exactBOT1$hmC,scale=1)
## [1] "Mean absolute difference: 1.627884e-06"
 mbmBOT1 = microbenchmark(
    EXACT = MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                 L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                 G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2),
    EM =    MLML(T.matrix = MethylatedBS_sim2, U.matrix = UnMethylatedBS_sim2,
                 L.matrix = UnMethylatedOxBS_sim2, M.matrix = MethylatedOxBS_sim2,
                 G.matrix = UnMethylatedTAB_sim2, H.matrix = MethylatedTAB_sim2,
                 iterative=TRUE),
    times=10)
 mbmBOT1
## Unit: milliseconds
##   expr      min       lq     mean   median       uq      max neval cld
##  EXACT 1.046622 1.053157 1.459126 1.098781 1.293864 2.896907    10  a 
##     EM 1.813534 1.876297 2.699638 2.174190 3.675374 4.107334    10   b

Comparison between the approximate constrained estimates and the true hydroxymethylation proportions used in the simulation:

all.equal(true_parameters_sim2$p_h,results_exactBOT1$hmC[,1],scale=1)
## [1] "Mean absolute difference: 0.005664222"

Comparison between EM-algorithm and true hydroxymethylation proportion used in simulation:

all.equal(true_parameters_sim2$p_h,results_emBOT1$hmC[,1],scale=1)
## [1] "Mean absolute difference: 0.004146021"

References