Package version: filematrix 1.1.0

1 Motivation for creation of filematrix package

The filematrix package was originally conceived as an alternative to the bigmemory package, for two reasons. First, matrices created with bigmemory on NFS (network file system) were often corrupted (they contained all zeros), most likely because of a fault in memory-mapped files on NFS. Second, bigmemory was initially not available for Windows; it is now fully cross-platform.

1.1 Differences between filematrix and bigmemory packages

The packages use different mechanisms to read from and write to their big files. filematrix uses the R functions readBin and writeBin, whereas bigmemory uses memory-mapped file access via the BH R package (Boost C++).
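
To make the readBin/writeBin approach concrete, here is a minimal base-R sketch of explicit, seek-based column access to a binary file. It illustrates the general technique only and is not taken from filematrix's internals; the helper names write_column and read_column, and the headerless column-major layout with 8 bytes per value, are assumptions made for this example.

```r
# Hypothetical setup: a 4 x 3 matrix of doubles stored column by column
# in a plain binary file (8 bytes per value, no header).
nr <- 4L
nc <- 3L
path <- tempfile(fileext = ".bin")

# Preallocate the file with zeros.
con <- file(path, open = "wb")
writeBin(numeric(nr * nc), con, size = 8L)
close(con)

# Hypothetical helper: overwrite column j in place.
write_column <- function(path, j, values, nr) {
  con <- file(path, open = "r+b")          # read/write without truncating
  on.exit(close(con))
  seek(con, where = (j - 1) * nr * 8, rw = "write")
  writeBin(as.double(values), con, size = 8L)
  invisible(NULL)
}

# Hypothetical helper: read column j back.
read_column <- function(path, j, nr) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (j - 1) * nr * 8, rw = "read")
  readBin(con, what = "double", n = nr, size = 8L)
}

write_column(path, 2, c(10, 20, 30, 40), nr)
col2 <- read_column(path, 2, nr)           # the values just written
col1 <- read_column(path, 1, nr)           # untouched column, still zeros
unlink(path)
```

With this style of access, only the requested column passes through R's memory, and each write is issued to the file when writeBin is called and the connection is closed, rather than being held in a memory mapping.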

In addition, filematrix can store real values in a compact 4-byte (single-precision) format. This feature is not available in bigmemory.
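
The trade-off of 4-byte storage can be sketched in base R with the size argument of writeBin and readBin. This is an illustration of the storage format only, with made-up sample values; in filematrix itself the equivalent is, to the best of our understanding of the package documentation, requesting size = 4 together with type = "double" in fm.create, which should be checked against that documentation.

```r
# Sample values chosen for illustration only.
x <- c(pi, 1/3, 1e6 + 0.1)
path <- tempfile(fileext = ".bin")

# Store each double as a 4-byte float.
con <- file(path, open = "wb")
writeBin(x, con, size = 4L)
close(con)

bytes_used <- file.size(path)              # 4 bytes per stored value

# Read the values back; they are widened to doubles again.
con <- file(path, open = "rb")
y <- readBin(con, what = "double", n = length(x), size = 4L)
close(con)
unlink(path)

# Largest relative rounding error introduced by the 4-byte format.
max_rel_err <- max(abs(y - x) / abs(x))
```

The file is half the size it would be with 8-byte doubles, at the cost of keeping only about seven significant decimal digits per value.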

1.2 Differences in tests

Because of the different file access approaches:

Consequently:

1.3 Example when filematrix is much more efficient than bigmemory

Let us consider the simple task of filling a large matrix, twice the size of the available memory. Below is the code using filematrix. In this experiment it finishes in 10 minutes and does not interfere with other programs.

library(filematrix)
fm = fm.create('E:/big_fm', nrow = 1e5, ncol = 1e5)

tic = proc.time()
for( i in seq_len(ncol(fm)) ) {
    cat(i, "of", ncol(fm), "\n")
    fm[,i] = i + 1:nrow(fm)
}
toc = proc.time()
show(toc-tic)

# Cleanup

closeAndDeleteFiles(fm)

Filling a big matrix of the same size with bigmemory can be very slow (2.5 times slower in this experiment). The bigmemory package accesses the file through memory mapping. As the matrix is written to, the memory-mapped file grows to occupy all available RAM, as Task Manager shows, and the computer slows to a crawl.

Please exercise caution when running the code below.

library(bigmemory)
fm = filebacked.big.matrix(nrow = 1e5, ncol = 1e5,
                           type = 'double', backingfile = 'big_bm.bmat',
                           backingpath = 'E:/', descriptorfile = 'big_bm.desc.txt')

tic = proc.time()
for( i in seq_len(ncol(fm)) ) {
    cat(i, "of", ncol(fm), "\n")
    fm[,i] = i + 1:nrow(fm)
}
flush(fm)
toc = proc.time()
show(toc-tic)

# Cleanup

rm(fm)
gc()
unlink('E:/big_bm.bmat')
unlink('E:/big_bm.desc.txt')