tidybins



library(tidybins)
suppressPackageStartupMessages(library(dplyr))

Bin Value

Binning by value is the only original binning method implemented in this package. It is inspired by a common marketing task: binning accounts by their sales. For example, you might create 10 bins where each bin represents 10% of all market sales. The top bin (bin 10 in the output below) contains the highest-sales accounts and therefore the fewest accounts, whereas the bottom bin (bin 1) contains the lowest-sales accounts and therefore needs the most accounts to reach 10% of market sales.
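The idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the tidybins implementation: the helper name equal_value_bins is hypothetical, values are assumed to be non-negative, and bins are assigned by cumulative share of the total after sorting in ascending order.

```r
# Hypothetical helper (not part of tidybins): assign each value to one of
# n_bins bins so that every bin holds roughly the same share of sum(x).
equal_value_bins <- function(x, n_bins = 10L) {
  ord <- order(x)                        # ascending: bin 1 gets the smallest values
  cum_share <- cumsum(x[ord]) / sum(x)   # running share of the total
  # ceiling() maps shares in (0, 1/n_bins] to bin 1, the next slice to bin 2,
  # and so on; pmax()/pmin() clamp edge cases (zeros, rounding) into 1..n_bins.
  raw <- ceiling(cum_share * n_bins)
  bins <- integer(length(x))
  bins[ord] <- as.integer(pmax(1, pmin(raw, n_bins)))
  bins
}

# Simulated sales, similar in shape to the example data used in this section.
set.seed(1)
sales <- as.integer(rnorm(1000L, mean = 10000, sd = 3000))
bins <- equal_value_bins(sales)

tapply(sales, bins, sum)   # per-bin totals, each near 10% of sum(sales)
table(bins)                # number of accounts per bin
```

Because each bin boundary falls on a whole account, the per-bin totals are close to equal rather than exactly equal.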


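# Simulate 1,000 accounts with normally distributed sales
tibble::tibble(SALES = as.integer(rnorm(1000L, mean = 10000L, sd = 3000))) -> sales_data

# Bin SALES by value; the result gains the bin column SALES_va10
sales_data %>% 
  bin_cols(SALES, bin_type = "value") -> sales_data1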

sales_data1
#> # A tibble: 1,000 × 2
#>    SALES SALES_va10
#>    <int>      <int>
#>  1  8835          3
#>  2 13663          9
#>  3  7100          2
#>  4 12844          8
#>  5 10709          5
#>  6 12138          7
#>  7  9584          4
#>  8  9762          4
#>  9 14492          9
#> 10  6649          1
#> # ℹ 990 more rows

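Notice that the sum (the .sum column) is nearly equal across bins: each bin holds roughly 10% of total sales, while the number of accounts per bin (.count) grows as the sales values fall.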

sales_data1 %>% 
  bin_summary() %>% 
  print(width = Inf)
#> # A tibble: 10 × 14
#>    column method      n_bins .rank  .min  .mean  .max .count .uniques
#>    <chr>  <chr>        <int> <int> <int>  <dbl> <int>  <int>    <int>
#>  1 SALES  equal value     10    10 14621 15944. 19440     63       63
#>  2 SALES  equal value     10     9 13357 13952. 14603     71       70
#>  3 SALES  equal value     10     8 12418 12827. 13325     77       73
#>  4 SALES  equal value     10     7 11639 12062. 12409     82       76
#>  5 SALES  equal value     10     6 10857 11260. 11634     88       86
#>  6 SALES  equal value     10     5 10096 10481. 10849     95       90
#>  7 SALES  equal value     10     4  9307  9665. 10092    102       96
#>  8 SALES  equal value     10     3  8273  8800.  9306    113      107
#>  9 SALES  equal value     10     2  6902  7589.  8266    130      123
#> 10 SALES  equal value     10     1   723  5526.  6897    179      172
#>    relative_value    .sum   .med   .sd width
#>             <dbl>   <int>  <dbl> <dbl> <int>
#>  1          100   1004492 15708  1111.  4819
#>  2           87.5  990617 13945   350.  1246
#>  3           80.4  987661 12819   266.   907
#>  4           75.7  989098 12054.  215.   770
#>  5           70.6  990844 11249   219.   777
#>  6           65.7  995681 10496   211.   753
#>  7           60.6  985825  9670.  228.   785
#>  8           55.2  994448  8766   318.  1033
#>  9           47.6  986589  7612   411.  1364
#> 10           34.7  989176  5835  1177.  6174
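The same check can be done by hand with dplyr, which may help when only the bin totals are of interest. The sketch below uses a small made-up table as a stand-in for sales_data1; the column name SALES_va10 mirrors the output above, and the values and the names toy and bin_totals are illustrative assumptions.

```r
suppressPackageStartupMessages(library(dplyr))

# Made-up stand-in for sales_data1: a value column plus an existing bin column.
toy <- tibble::tibble(
  SALES      = c(5L, 5L, 10L, 30L, 50L),
  SALES_va10 = c(1L, 1L, 1L, 1L, 2L)
)

# Total sales and account count per bin, plus each bin's share of all sales.
bin_totals <- toy %>%
  group_by(SALES_va10) %>%
  summarise(n = n(), total = sum(SALES), .groups = "drop") %>%
  mutate(share = total / sum(total))

bin_totals
```

Replacing toy with sales_data1 should give per-bin totals matching the .sum column reported by bin_summary().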