Atomic vectors are the fundamental data structure in R. They include
numeric (integer and double), logical,
character, complex, and
raw vectors. This vignette explains how
h5lite maps these R types to HDF5 datasets and provides
guidance on controlling storage types and compression.
Writing a vector to HDF5 is straightforward using
h5_write(). The package automatically creates the necessary
dataset and handles dimensions.
In R, a “scalar” is simply a vector of length 1. However, HDF5
distinguishes between a Scalar Dataspace (a single
value with no dimensions) and a Simple Dataspace (an
array) with dimensions [1].
By default, h5lite treats length-1 vectors as 1D arrays
to maintain consistency with R’s vector behavior. To write a true HDF5
scalar, you must wrap the value in I().
library(h5lite)
file <- tempfile(fileext = ".h5")  # temporary HDF5 file used in the examples
# 1. Default: 1D Array (Length 1)
h5_write(42, file, "structure/array_1d")
# 2. Explicit Scalar: Wrapped in I()
h5_write(I(42), file, "structure/scalar")
h5_str(file, "structure")
#> structure/
#> ├── array_1d <uint8 × 1>
#> └── scalar <uint8 scalar>

Note: When reading data back into R, both storage formats appear as standard R vectors of length 1.
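The note above can be checked with a short round trip. This is a minimal sketch, assuming `h5_read()` takes the file and dataset name in the same order as `h5_write()`; the dataset names are only illustrative.

```r
library(h5lite)

file <- tempfile(fileext = ".h5")

# Write the same value as a 1D array and as a true HDF5 scalar.
h5_write(42, file, "structure/array_1d")
h5_write(I(42), file, "structure/scalar")

# Assumed call shape: h5_read(file, name).
from_array  <- h5_read(file, "structure/array_1d")
from_scalar <- h5_read(file, "structure/scalar")

# Both should come back as ordinary length-1 R vectors.
length(from_array)
length(from_scalar)
```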
h5lite attempts to map R types to the most efficient
HDF5 equivalents automatically (as = "auto").
For numeric vectors, h5lite analyzes the range of your data and picks the smallest fitting HDF5 type (e.g., uint8, int16, int32, float64).
For logical vectors, h5lite maps values to uint8 (0 or 1) in HDF5 to save space.
A key challenge in HDF5 is that standard integer and boolean types do not have a native representation for NA (missing values).
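To make the idea of range-based type selection concrete, here is a base R sketch. The helper `smallest_fitting_type()` is hypothetical and is not part of h5lite; the candidate types and the order in which they are tried are assumptions, and the real selection logic may differ. It does not handle missing values.

```r
# Hypothetical helper, NOT part of h5lite: pick the smallest HDF5-style
# type name whose range covers every value in x.
smallest_fitting_type <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0, !anyNA(x))

  # Values with a fractional part cannot be stored exactly as integers.
  if (any(x != floor(x))) return("float64")

  lo <- min(x)
  hi <- max(x)

  if (lo >= 0) {
    # Non-negative data: try unsigned types first.
    if (hi <= 255)        return("uint8")
    if (hi <= 65535)      return("uint16")
    if (hi <= 4294967295) return("uint32")
  } else {
    # Negative values present: signed types only.
    if (lo >= -128        && hi <= 127)        return("int8")
    if (lo >= -32768      && hi <= 32767)      return("int16")
    if (lo >= -2147483648 && hi <= 2147483647) return("int32")
  }

  # Fall back to a double for anything larger (including +/-Inf).
  "float64"
}

smallest_fitting_type(c(1L, 2L, 3L))  # "uint8"
smallest_fitting_type(c(-5, 300))     # "int16"
smallest_fitting_type(1.5)            # "float64"
```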
To ensure data safety, h5lite performs the following
check:
If an integer or logical vector contains NA, it is automatically promoted to float64.
NA values are stored as a NaN variant in the file.
h5_read() restores them as numeric vectors with NA.

# Integer vector with NO missing values -> Automatic optimal type (uint8)
h5_write(c(1L, 2L, 3L), file, "safe/ints")
h5_typeof(file, "safe/ints")
#> [1] "uint8"
# Integer vector WITH missing values -> Promoted to float64
h5_write(c(1L, NA, 3L), file, "safe/ints_na")
h5_typeof(file, "safe/ints_na")
#> [1] "float64"

If you know your data range fits into a smaller type (e.g., int8, uint16), you can use the as argument to force a specific storage type.
Warning: If you force an integer type on data containing NA or values outside that type’s range, h5lite will throw an error.
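The following sketch shows both sides of forcing a storage type: a successful write with an explicit type, and the error path when the data contains NA. It assumes the type names are passed to `as` as strings (e.g. `as = "uint16"`); the dataset names are illustrative.

```r
library(h5lite)

file <- tempfile(fileext = ".h5")

# All values fit in an unsigned 16-bit integer, so the forced type is safe.
x <- c(10L, 200L, 3000L)
h5_write(x, file, "forced/u16", as = "uint16")
h5_typeof(file, "forced/u16")

# Forcing an integer type on data with NA should fail; tryCatch() lets a
# script record the failure instead of stopping.
result <- tryCatch(
  {
    h5_write(c(1L, NA, 3L), file, "forced/bad", as = "int8")
    "written"
  },
  error = function(e) "error"
)
result
```

Wrapping the call in `tryCatch()` is one way to fall back to `as = "auto"` when a forced type turns out not to fit.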
HDF5 supports two primary methods for storing strings: Variable-Length and Fixed-Length.
By default (as = "auto"), h5lite chooses
the most efficient string representation:
If the vector contains NA, it uses Variable-Length UTF-8 (which natively supports missing values).
You can explicitly request variable-length storage using as = "utf8" or as = "ascii". Variable-length strings support NA (stored as NULL pointers).
You can force fixed-length storage using the syntax [n], where n is the number of bytes. Each fixed-length string occupies n bytes: shorter strings are padded, and NA is not supported.

Compression in HDF5 requires the dataset to be “chunked”.
h5lite handles chunking parameters automatically when you
enable compression.
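As a minimal sketch of this, the snippet below writes the same vector twice with different `compress` settings and never specifies a chunk layout. It assumes `compress` accepts `TRUE` or an integer level; the data and dataset names are illustrative.

```r
library(h5lite)

file <- tempfile(fileext = ".h5")

# Highly repetitive data, so compression has something to work with.
x <- rep(seq_len(100), times = 1000)

# No chunk dimensions are given; h5lite is expected to choose them.
h5_write(x, file, "data/default", compress = TRUE)
h5_write(x, file, "data/max", compress = 9)

# Inspect what was created.
h5_str(file, "data")
```

The compression level affects only the bytes on disk; the stored type and the values are the same for both datasets.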
You can enable compression using the compress
argument:
compress = TRUE (default): Uses zlib (deflate) level 5.
compress = 9: Uses zlib level 9 (max compression, slower).

R does not natively support 64-bit integers, but the
bit64 package provides an integer64 class.
h5lite supports reading and writing these types directly to
HDF5 int64.
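A short sketch of writing an integer64 value, assuming the bit64 package is installed. The value 2^53 + 1 is chosen because it cannot be represented exactly as an R double; the dataset name is illustrative. Reading is left out here because the class returned by `h5_read()` for int64 data is not described in this section.

```r
library(h5lite)
library(bit64)

file <- tempfile(fileext = ".h5")

# 2^53 + 1: parsed from a string so no precision is lost on the way in.
big <- as.integer64("9007199254740993")

h5_write(big, file, "big/id")

# The stored type should be the 64-bit integer type named above.
h5_typeof(file, "big/id")
```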