tidybins



library(tidybins)
suppressPackageStartupMessages(library(dplyr))

Bin Value

Binning by value is the only original binning method implemented in this package. It is inspired by a common marketing task: binning accounts by their sales. For example, with 10 bins, each bin represents about 10% of total market sales. The bin holding the highest-sales accounts therefore contains the fewest accounts, whereas the bin holding the lowest-sales accounts needs the most accounts to reach 10% of market sales. In the output below, higher bin numbers correspond to higher sales.
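The idea described above can be sketched in base R. This is only an illustration of equal-value binning, not the code tidybins runs internally: `equal_value_bins` is a hypothetical helper, the seed is arbitrary, and the sketch assumes the values are (almost all) non-negative so that the running share of the total increases steadily.

```r
# Hypothetical helper, not part of tidybins: assign each value to one of
# n_bins bins so that every bin holds roughly the same share of the total.
equal_value_bins <- function(x, n_bins = 10L) {
  ord <- order(x)                       # positions of x from smallest to largest
  cum_share <- cumsum(x[ord]) / sum(x)  # running share of the total
  # ceiling() maps shares in (0, 0.1] to bin 1, (0.1, 0.2] to bin 2, and so on;
  # pmin()/pmax() clamp any edge cases into 1..n_bins
  bin_sorted <- pmin(pmax(ceiling(cum_share * n_bins), 1L), n_bins)
  bins <- integer(length(x))
  bins[ord] <- as.integer(bin_sorted)   # put bin labels back in the original order
  bins
}

set.seed(1)  # arbitrary seed so the sketch is reproducible
sales <- as.integer(rnorm(1000L, mean = 10000, sd = 3000))
bins <- equal_value_bins(sales)

# One row per bin: number of accounts and total sales
data.frame(
  bin      = sort(unique(bins)),
  accounts = as.vector(table(bins)),
  total    = as.vector(tapply(sales, bins, sum))
)
```

Under these assumptions, bin 1 collects the many small accounts and bin 10 the few large ones, while the `total` column stays close to one tenth of `sum(sales)` in every row.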


tibble::tibble(SALES = as.integer(rnorm(1000L, mean = 10000L, sd = 3000))) -> sales_data

sales_data %>% 
  bin_cols(SALES, bin_type = "value") -> sales_data1

sales_data1
#> # A tibble: 1,000 × 2
#>    SALES SALES_va10
#>    <int>      <int>
#>  1 12210          7
#>  2 14088          9
#>  3 10269          5
#>  4  7003          1
#>  5  7993          2
#>  6 12848          8
#>  7  1686          1
#>  8  6277          1
#>  9 14417         10
#> 10 12804          8
#> # ℹ 990 more rows

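Notice that the sum is approximately equal across bins: each bin's .sum is close to 10% of total sales, while .count grows as the sales per account fall.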

sales_data1 %>% 
  bin_summary() %>% 
  print(width = Inf)
#> # A tibble: 10 × 14
#>    column method      n_bins .rank  .min  .mean  .max .count .uniques
#>    <chr>  <chr>        <int> <int> <int>  <dbl> <int>  <int>    <int>
#>  1 SALES  equal value     10    10 14417 15882. 19670     65       63
#>  2 SALES  equal value     10     9 13365 13879. 14412     70       70
#>  3 SALES  equal value     10     8 12404 12829. 13359     78       73
#>  4 SALES  equal value     10     7 11637 12011. 12393     83       81
#>  5 SALES  equal value     10     6 10864 11249. 11634     88       86
#>  6 SALES  equal value     10     5 10124 10467. 10862     96       89
#>  7 SALES  equal value     10     4  9300  9677. 10118    103      101
#>  8 SALES  equal value     10     3  8498  8919.  9299    111      101
#>  9 SALES  equal value     10     2  7226  7920.  8492    126      122
#> 10 SALES  equal value     10     1   206  5538.  7225    180      174
#>    relative_value    .sum   .med   .sd width
#>             <dbl>   <int>  <dbl> <dbl> <int>
#>  1          100   1032318 15557  1221.  5253
#>  2           87.4  971504 13892.  310.  1047
#>  3           80.8 1000677 12784   269.   955
#>  4           75.6  996891 11979   238.   756
#>  5           70.8  989887 11244.  226.   770
#>  6           65.9 1004841 10439   209.   738
#>  7           60.9  996714  9662   235.   818
#>  8           56.2  990006  8889   240.   801
#>  9           49.9  997945  7938.  359.  1266
#> 10           34.9  996766  5964. 1462.  7019
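
As a quick check on the summary above, each bin's share of total sales can be recomputed from the printed `.sum` and `.count` columns. The vectors below are copied by hand from that output (ranks 10 down to 1); the names `bin_sum`, `bin_count`, and `share` are chosen here for illustration only.

```r
# .sum and .count values copied from the bin_summary() output above
bin_sum   <- c(1032318, 971504, 1000677, 996891, 989887,
               1004841, 996714, 990006, 997945, 996766)
bin_count <- c(65, 70, 78, 83, 88, 96, 103, 111, 126, 180)

share <- bin_sum / sum(bin_sum)  # each bin's share of total sales
round(100 * share, 2)            # shares as percentages, one per bin
range(share)                     # smallest and largest share
sum(bin_count)                   # total number of accounts across bins
```

Every share lies within about 0.4 percentage points of 10%, and the counts add up to the 1,000 rows of the input, even though the number of accounts per bin ranges from 65 to 180.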