tidybins



library(tidybins)
suppressPackageStartupMessages(library(dplyr))

Bin Value

Binning by value is the only original binning method implemented in this package. It is inspired by a common marketing task: binning accounts by their sales. For example, you might create 10 bins, where each bin represents 10% of all market sales. The first bin contains the highest-sales accounts and therefore has the fewest accounts, whereas the last bin contains the lowest-sales accounts and therefore needs the most accounts to reach 10% of the market sales.
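
To make the idea concrete, here is a minimal base R sketch of equal-value binning. It is an illustration under stated assumptions, not the package's actual implementation: the helper name equal_value_bins is hypothetical, the approach (sort descending, then cut on cumulative share of the total) is one plausible way to get the behaviour described above, and it assumes all values are positive. The highest-sales bin is labelled n_bins here.

```r
# Hypothetical helper, NOT part of tidybins: assigns each value to one of
# n_bins bins so that every bin holds roughly the same share of the total.
# Assumes all values in x are positive.
equal_value_bins <- function(x, n_bins = 10L) {
  ord <- order(x, decreasing = TRUE)      # largest accounts first
  cum_share <- cumsum(x[ord]) / sum(x)    # running share of the grand total
  # The first 1/n_bins of the total falls in group 1, the next in group 2, ...
  group_desc <- pmin(ceiling(cum_share * n_bins), n_bins)
  bins <- integer(length(x))
  # Relabel so the highest-sales group gets the label n_bins
  bins[ord] <- as.integer(n_bins) + 1L - as.integer(group_desc)
  bins
}

set.seed(1)
sales <- as.integer(rnorm(1000L, mean = 10000, sd = 3000))
bins <- equal_value_bins(sales)

# Total sales and number of accounts per bin
per_bin <- data.frame(
  bin   = sort(unique(bins)),
  total = as.vector(tapply(sales, bins, sum)),
  n     = as.vector(tapply(sales, bins, length))
)
per_bin
```

In this sketch each bin's total can differ from an exact tenth of the grand total by at most the size of the single largest account, which is why the sums come out nearly, but not exactly, equal.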


tibble::tibble(SALES = as.integer(rnorm(1000L, mean = 10000L, sd = 3000))) -> sales_data

sales_data %>% 
  bin_cols(SALES, bin_type = "value") -> sales_data1

sales_data1
#> # A tibble: 1,000 × 2
#>    SALES SALES_va10
#>    <int>      <int>
#>  1 11159          6
#>  2 10510          5
#>  3 11642          7
#>  4  6813          1
#>  5 11377          6
#>  6 12396          8
#>  7  8848          3
#>  8  7471          2
#>  9 13750          9
#> 10  8247          2
#> # ℹ 990 more rows

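Notice that the sum (the .sum column) is approximately equal across bins, while the number of accounts (the .count column) grows from 64 in the highest-sales bin to 179 in the lowest-sales bin.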

sales_data1 %>% 
  bin_summary() %>% 
  print(width = Inf)
#> # A tibble: 10 × 14
#>    column method      n_bins .rank  .min  .mean  .max .count .uniques
#>    <chr>  <chr>        <int> <int> <int>  <dbl> <int>  <int>    <int>
#>  1 SALES  equal value     10    10 14500 15702. 20805     64       62
#>  2 SALES  equal value     10     9 13168 13730. 14479     72       70
#>  3 SALES  equal value     10     8 12279 12712. 13158     78       74
#>  4 SALES  equal value     10     7 11565 11932. 12275     83       81
#>  5 SALES  equal value     10     6 10895 11246. 11560     88       84
#>  6 SALES  equal value     10     5 10198 10509. 10893     94       91
#>  7 SALES  equal value     10     4  9352  9767. 10196    102       95
#>  8 SALES  equal value     10     3  8368  8855.  9344    112      111
#>  9 SALES  equal value     10     2  7065  7727.  8348    128      122
#> 10 SALES  equal value     10     1  1865  5533.  7063    179      176
#>    relative_value    .sum   .med   .sd width
#>             <dbl>   <int>  <dbl> <dbl> <int>
#>  1          100   1004944 15340. 1254.  6305
#>  2           87.4  988586 13704.  378.  1311
#>  3           81.0  991532 12690.  278.   879
#>  4           76.0  990368 11935   213.   710
#>  5           71.6  989685 11251   190.   665
#>  6           66.9  987842 10458.  209.   695
#>  7           62.2  996266  9762   258.   844
#>  8           56.4  991748  8832.  290.   976
#>  9           49.2  989066  7756.  383.  1283
#> 10           35.2  990422  5863  1170.  5198