logStirling2

An R package for computing logarithms of Stirling numbers of the second kind, S(n, k), accurately and efficiently for arbitrarily large n.

Motivation

The Stirling number of the second kind S(n, k) counts the number of ways to partition a set of n distinct objects into k non-empty subsets. These numbers grow extremely rapidly and overflow 64-bit floating point for moderate n, making direct computation impractical. logStirling2 sidesteps this by working on the log scale via a numerically stable recurrence in C++, enabling fast and accurate computation for full rows for n past 50,000 — well past the practical limit of existing packages.

Installation

The package is not yet on CRAN. Install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("jblood94/logStirling2")

A C++ compiler (Rtools on Windows, Xcode Command Line Tools on macOS) is required to build the package from source.

After installation it is recommended to run the function logStirling2::get_state_data() once to download 1,275,000 pre-computed values from the log-Stirling table (~10MB). These will be saved locally to tools::R_user_dir("logStirling2", "data"). These values will greatly improve package performance for large n.

Functions

The package exports three user-facing functions:

logStirling2(n, k = NULL, as.matrix = TRUE, ones = TRUE) — the main workhorse. Accepts vectors for both n and k and returns results as a matrix (rows = n, columns = k) or a flat vector. When k = NULL, all valid k for each n are returned. Setting ones = FALSE drops the trivial edge cases S(n, 1) = S(n, n) = 1 (i.e., log = 0) from the output.

logStirling2Temme(n, k = NULL, as.matrix = TRUE, ones = TRUE, twoterms = TRUE) — same interface as logStirling2, but uses Temme’s (1993) asymptotic approximation for k = 4:(n−2), with exact closed-form expressions at the edges (k = 2, 3, n−2, n−1). Much faster than the recurrence for very large n, at the cost of exactness. The two term asymptotic approximation is used when twoterms = TRUE; otherwise, the one-term approximation is used.

stirling2direct(n, k) — computes the exact value of S(n, k) as a bigz integer via a rearrangement of the explicit formula and using arbitrary-precision arithmetic from gmp.

get_state_data(force = FALSE) — retrieves pre-computed values from the GitHub repository and saves them locally at tools::R_user_dir("logStirling2", "data"). These are used by logStirling2 to speed computation and maintain precision for large n.

Usage

library(logStirling2)

# A slice of the Stirling triangle as a matrix
logStirling2(n = 5:8, k = 2:5, as.matrix = TRUE)
#>          2        3        4        5
#> 5 2.708050 3.218876 2.302585 0.000000
#> 6 3.433987 4.499810 4.174387 2.708050
#> 7 4.143135 5.707110 5.857933 4.941642
#> 8 4.844187 6.873164 7.438972 6.956545

# All non-trivial k for multiple n, as a vector
logStirling2(n = 8:10, k = NULL, as.matrix = FALSE, ones = FALSE)
#>  [1]  4.844187  6.873164  7.438972  6.956545  5.583496  3.332205  5.541264
#>  [8]  8.014666  8.958025  8.846641  7.880804  6.135565  3.583519  6.236370
#> [15]  9.140990 10.437199 10.657847 10.035699  8.679312  6.620073  3.806662

# Exact big-integer result for a scalar pair
stirling2direct(100, 10)
#> Big Integer ('bigz') :
#> [1] 2754999986711164035029356262910003922476368243643133591265713197865860436127311130380917269755

# Full row for large n
options(digits = 16)
s <- logStirling2(n = 38e3, k = NULL, as.matrix = FALSE)
length(s)
#> [1] 38000
s[10:13]
#> [1] 87483.12912120066 91102.51805849221 94406.46547744835 97445.52341968527
sapply(10:13, \(k) log(stirling2direct(38e3, k)))
#> [1] 87483.12912120066 91102.51805849221 94406.46547744835 97445.52341968527

# Temme's asymptotic approximation — fast even for very large n
s <- logStirling2Temme(n = 1e5)
s[1000:1003]
#> [1] 684863.3997197256 684956.4409982545 685049.3814779317 685142.2213588564
sapply(1000:1003, \(k) log(stirling2direct(1e5, k)))
#> [1] 684863.3997197255 684956.4409982546 685049.3814779319 685142.2213588564

Algorithm

logStirling2 applies the standard Stirling recurrence on the log scale. Writing s(n, k) = log S(n, k), the recurrence S(n+1, k) = k·S(n, k) + S(n, k−1) becomes:

\[s(n+1,k) = s(n,k) + \log\bigl(k + e^{s(n,k-1) - s(n,k)}\bigr)\]

The C++ implementation evaluates this with long double precision using log1pl and expl for numerical stability, sweeping k from high to low within each row so the update is in-place.

For n ≥ 1,000, the function automatically loads pre-computed starting states stored in logStirling2_cache_v01.rds (retrieved by running the logStirling2::get_state_data function) — 50 rows of the triangle (every 1,000th row from n = 1,000 to n = 50,000), computed externally with FLINT and ARB and stored as raw vectors at long double precision. These checkpoints both accelerate computation and preserve precision for large n by reducing the number of recurrence steps needed.

Depending on the structure of the input vector n, the C++ layer dispatches to one of three routines:

logStirling2Temme uses the approximation of Temme (1993). By default, it uses an expression for the two-term approximation after applying common subexpression elimination.

Dependencies

References

Temme, N. M. (1993). Asymptotic estimates of Stirling numbers. Studies in Applied Mathematics, 89(3), 233–243.

License GPL (≥ 3)