Fitting the hidden Markov model
A small synthetic dataset can be found in the package installation directory, including a synthetic X matrix with 300 SNPs from 100 samples. We can load this with:
library(SNPknock)
X_file = system.file("extdata", "X.RData", package = "SNPknock")
load(X_file)
table(X)
## X
## 0 1 2
## 9925 9135 10940
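Before fitting, it can be worth confirming that the matrix has the expected shape and coding. The following is a minimal sketch on a simulated stand-in matrix; the helper check_genotypes and the object X_demo are hypothetical names introduced here, not part of SNPknock, and the sketch assumes that rows are samples and columns are SNPs. The same check could be applied to the loaded X.

```r
# Hypothetical helper (not part of SNPknock): verify that an object
# looks like an unphased genotype matrix coded as 0, 1 or 2.
check_genotypes = function(X) {
  is.matrix(X) && !anyNA(X) && all(X %in% c(0, 1, 2))
}

# Simulated stand-in with the same shape as the example data:
# 100 samples (rows) by 300 SNPs (columns). Assumed layout.
set.seed(1)
X_demo = matrix(sample(0:2, 100 * 300, replace = TRUE), nrow = 100, ncol = 300)

ok = check_genotypes(X_demo)
dims = dim(X_demo)
```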
Below, we show how to fit a hidden Markov model to this data with the help of fastPHASE. Since fastPHASE takes genotype sequences in ".inp" format as input, we must first convert the X matrix by calling SNPknock.fp.writeX. By default, this function writes to a temporary file in the R temporary directory.
# Convert X into the suitable fastPhase input format, write it into a temporary file
# and return the path to that file.
Xinp_file = SNPknock.fp.writeX(X)
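To give a sense of what such a conversion involves, here is a rough sketch of a writer for a ".inp"-style file. It is not the SNPknock implementation: the function write_inp_sketch is hypothetical, and the assumed layout (number of individuals, number of SNPs, then for each individual an ID line followed by two rows of 0/1 alleles) should be checked against the fastPHASE documentation.

```r
# Illustrative only: NOT the SNPknock implementation.
# Assumed layout: number of individuals, number of SNPs, then for each
# individual an ID line followed by two rows of 0/1 alleles.
write_inp_sketch = function(X, out_file = tempfile(fileext = ".inp")) {
  n = nrow(X)
  p = ncol(X)
  lines = c(as.character(n), as.character(p))
  for (i in seq_len(n)) {
    # Split each unphased genotype into two allele rows:
    # 0 -> (0,0), 1 -> (1,0), 2 -> (1,1). The phase assignment is arbitrary.
    a1 = as.integer(X[i, ] >= 1)
    a2 = as.integer(X[i, ] == 2)
    lines = c(lines,
              paste0("#id", i),
              paste(a1, collapse = ""),
              paste(a2, collapse = ""))
  }
  writeLines(lines, out_file)
  out_file
}

# Tiny example: 2 individuals, 3 SNPs
X_small = matrix(c(0, 1, 2,
                   2, 0, 1), nrow = 2, byrow = TRUE)
inp_lines = readLines(write_inp_sketch(X_small))
```

Each genotype is split into two allele rows because the values 0, 1 and 2 count copies of one allele across a pair of haplotypes; which row receives the single copy of a heterozygous genotype is arbitrary here.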
Assuming that we have already downloaded fastPHASE, we can call it to fit the hidden Markov model to X.
fp_path = "~/bin/fastPHASE" # Path to the fastPHASE executable
# Call fastPhase and return the path to the parameter estimate files
fp_outPath = SNPknock.fp.runFastPhase(fp_path, Xinp_file)
## SNPknock could find the fastPhase executable: '~/bin/fastPHASE' does not exist.
## If you have not downloaded it yet, you can obtain fastPhase from: http://scheet.org/software.html
Above, the SNPknock package could not find fastPHASE because we did not provide the correct path (we cannot include third-party executable files within this package). However, if you install fastPHASE separately and provide SNPknock with the correct path, this will work.
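One way to catch a wrong path early is to test it with base R before calling the wrapper. This is a small sketch using only base functions; the path is the same example value used above and will need adjusting on your system.

```r
fp_path = "~/bin/fastPHASE"  # Example path from above; adjust to your system

# path.expand() resolves the leading "~" to the home directory,
# and file.exists() reports whether anything is present at that location.
fp_found = file.exists(path.expand(fp_path))

if (!fp_found) {
  message("fastPHASE not found at: ", path.expand(fp_path))
}
```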
If the previous step worked for you, you can find the parameter estimates produced by fastPHASE in the following files:
r_file = paste(fp_outPath, "_rhat.txt", sep="")
theta_file = paste(fp_outPath, "_thetahat.txt", sep="")
alpha_file = paste(fp_outPath, "_alphahat.txt", sep="")
Otherwise, for the sake of this tutorial, you can use the example parameter files provided in the package installation directory:
r_file = system.file("extdata", "X_rhat.txt", package = "SNPknock")
theta_file = system.file("extdata", "X_thetahat.txt", package = "SNPknock")
alpha_file = system.file("extdata", "X_alphahat.txt", package = "SNPknock")
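Whichever set of paths you use, it may help to confirm that all three files are present before loading them. The helper below is a hypothetical sketch (files_ready is not an SNPknock function). It relies on the fact that system.file returns an empty string when a file is not found.

```r
# Hypothetical helper: TRUE only if every path is non-empty and exists.
# system.file() returns "" when a file is not found, hence the nzchar() check.
files_ready = function(...) {
  paths = c(...)
  all(nzchar(paths)) && all(file.exists(paths))
}

# Demonstration with a temporary file standing in for a parameter file
demo_file = tempfile()
file.create(demo_file)
present = files_ready(demo_file)
missing_one = files_ready(demo_file, "")
```

With the variables defined above, the corresponding call would be files_ready(r_file, theta_file, alpha_file).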
Then, we can construct the hidden Markov model with:
hmm = SNPknock.fp.loadFit(r_file, theta_file, alpha_file, X[1,])