Metadata-Version: 2.1
Name: ab-test-toolkit
Version: 0.0.5
Summary: Toolkit to simulate and analyze AB tests
Home-page: https://github.com/k111git/ab-test-toolkit
Author: Kolja
Author-email: 
License: Apache Software License 2.0
Keywords: python experimentation AB-testing
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: fastcore
Requires-Dist: pandas
Requires-Dist: black
Requires-Dist: jupyter-black
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: plotly
Requires-Dist: bayesian-testing
Requires-Dist: statsmodels
Provides-Extra: dev

ab-test-toolkit
================

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

``` sh
pip install ab_test_toolkit
```

## imports

``` python
from ab_test_toolkit.generator import (
    generate_binary_data,
    generate_continuous_data,
    data_to_contingency,
)
from ab_test_toolkit.power import (
    simulate_power_binary,
    sample_size_binary,
    simulate_power_continuous,
    sample_size_continuous,
)
from ab_test_toolkit.plotting import (
    plot_power,
    plot_distribution,
    plot_betas,
    plot_binary_power,
)
```

## Binary target (e.g. conversion rate experiments)

### Sample size:

We can calculate the sample size required with the function
“sample_size_binary”. Input needed is:

- Conversion rate control: cr0

- Conversion rate variant for minimal detectable effect: cr1 (for
  example, if we have a conversion rate of 1% and want to detect an
  effect of at least 20% relate, we would set cr0=0.010 and cr1=0.012)

- Significance threshold: alpha. Usually set to 0.05, this defines our
  tolerance for falsely detecting an effect if in reality there is none
  (alpha=0.05 means that in 5% of the cases we will detect an effect
  even though the samples for control and variant are drawn from the
  exact same distribution).

- Statistical power. Usually set to 0.8. This means that if the effect
  is the minimal effect specified above, we have an 80% probability of
  identifying it at statistically significant (and hence 20% of not
  idenfitying it).

- one_sided: If the test is one-sided (one_sided=True) or if it is
  two-sided (one_sided=False). As a rule of thumb, if there are very
  strong reasons to believe that the variant cannot be inferior to the
  control, we can use a one sided test. In case of doubts, using a two
  sided test is better.

let us calculate the sample size for the following example:

``` python
n_sample = sample_size_binary(
    cr0=0.01,
    cr1=0.012,
    alpha=0.05,
    power=0.8,
    one_sided=True,
)
print(f"Required sample size per variant is {int(n_sample)}.")
```

    Required sample size per variant is 33560.

``` python
n_sample_two_sided = sample_size_binary(
    cr0=0.01,
    cr1=0.012,
    alpha=0.05,
    power=0.8,
    one_sided=False,
)
print(
    f"For the two-sided experiment, required sample size per variant is {int(n_sample_two_sided)}."
)
```

    For the two-sided experiment, required sample size per variant is 42606.

``` python
fig = plot_binary_power(cr0=0.01, cr1=0.012, alpha=0.05, one_sided=True)
```

``` python
fig = fig.update_layout(spikedistance=1000, hoverdistance=100)
```

``` python
fig
```

<div>                            <div id="266c0954-7baa-45e1-84bb-c71e0cedb2ff" class="plotly-graph-div" style="height:525px; width:100%;"></div>            <script type="text/javascript">                require(["plotly"], function(Plotly) {                    window.PLOTLYENV=window.PLOTLYENV || {};                                    if (document.getElementById("266c0954-7baa-45e1-84bb-c71e0cedb2ff")) {                    Plotly.newPlot(                        "266c0954-7baa-45e1-84bb-c71e0cedb2ff",                        [{"mode":"lines","x":[716.4752046171953,1327.4112431668273,2009.429359385464,2738.4540464919323,3502.2558574448512,4294.349505940807,5111.338964335355,5951.636093888558,6814.7953272048235,7701.151117990417,8611.615935336378,9547.569522260505,10510.803923205638,11503.505598646334,12528.264983589117,13588.109003934038,14686.555622685428,15827.692069788269,17016.2810112008,18257.901834805823,19559.13809795055,20927.827718281053,22373.400861899478,23907.343659882714,25543.847377205566,27300.739008091106,29200.85324299013,31274.123562594366,33560.89902790092,36117.466944109554,39025.8212737169,42412.33183085003,46487.30378703201],"y":[0.1,0.125,0.15,0.175,0.19999999999999998,0.22499999999999998,0.24999999999999997,0.27499999999999997,0.29999999999999993,0.32499999999999996,0.35,0.3749999999999999,0.3999999999999999,0.42499999999999993,0.44999999999999996,0.47499999999999987,0.4999999999999999,0.5249999999999999,0.5499999999999999,0.5749999999999998,0.5999999999999999,0.6249999999999999,0.6499999999999998,0.6749999999999998,0.6999999999999998,0.7249999999999999,0.7499999999999999,0.7749999999999998,0.7999999999999998,0.8249999999999998,0.8499999999999998,0.8749999999999998,0.8999999999999998],"type":"scatter"}],                        {"template":{"data":{"barpolar":[{"marker":{"line":{"color":"white","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"bar":[{"error_x":{"color":"rgb(36,36,36)"},"error_y":{"color":"rgb(36,36,36)"},"marker":{"line":{"color":"white","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"carpet":[{"aaxis":{"endlinecolor":"rgb(36,36,36)","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"rgb(36,36,36)"},"baxis":{"endlinecolor":"rgb(36,36,36)","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"rgb(36,36,36)"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"},"type":"choropleth"}],"contourcarpet":[{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"},"type":"contourcarpet"}],"contour":[{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"},"colorscale":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"type":"contour"}],"heatmapgl":[{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"},"colorscale":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"type":"heatmapgl"}],"heatmap":[{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"},"colorscale":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"type":"heatmap"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"},"colorscale":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"type":"histogram2dcontour"}],"histogram2d":[{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"},"colorscale":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"type":"histogram2d"}],"histogram":[{"marker":{"line":{"color":"white","width":0.6}},"type":"histogram"}],"mesh3d":[{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"marker":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"type":"scattergeo"}],"scattergl":[{"marker":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"type":"scattermapbox"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"type":"scatterpolargl"}],"scatterpolar":[{"marker":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"type":"scatterpolar"}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"},"colorscale":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"rgb(237,237,237)"},"line":{"color":"white"}},"header":{"fill":{"color":"rgb(217,217,217)"},"line":{"color":"white"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":1,"tickcolor":"rgb(36,36,36)","ticks":"outside"}},"colorscale":{"diverging":[[0.0,"rgb(103,0,31)"],[0.1,"rgb(178,24,43)"],[0.2,"rgb(214,96,77)"],[0.3,"rgb(244,165,130)"],[0.4,"rgb(253,219,199)"],[0.5,"rgb(247,247,247)"],[0.6,"rgb(209,229,240)"],[0.7,"rgb(146,197,222)"],[0.8,"rgb(67,147,195)"],[0.9,"rgb(33,102,172)"],[1.0,"rgb(5,48,97)"]],"sequential":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]],"sequentialminus":[[0.0,"#440154"],[0.1111111111111111,"#482878"],[0.2222222222222222,"#3e4989"],[0.3333333333333333,"#31688e"],[0.4444444444444444,"#26828e"],[0.5555555555555556,"#1f9e89"],[0.6666666666666666,"#35b779"],[0.7777777777777778,"#6ece58"],[0.8888888888888888,"#b5de2b"],[1.0,"#fde725"]]},"colorway":["#1F77B4","#FF7F0E","#2CA02C","#D62728","#9467BD","#8C564B","#E377C2","#7F7F7F","#BCBD22","#17BECF"],"font":{"color":"rgb(36,36,36)"},"geo":{"bgcolor":"white","lakecolor":"white","landcolor":"white","showlakes":true,"showland":true,"subunitcolor":"white"},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"light"},"paper_bgcolor":"white","plot_bgcolor":"white","polar":{"angularaxis":{"gridcolor":"rgb(232,232,232)","linecolor":"rgb(36,36,36)","showgrid":false,"showline":true,"ticks":"outside"},"bgcolor":"white","radialaxis":{"gridcolor":"rgb(232,232,232)","linecolor":"rgb(36,36,36)","showgrid":false,"showline":true,"ticks":"outside"}},"scene":{"xaxis":{"backgroundcolor":"white","gridcolor":"rgb(232,232,232)","gridwidth":2,"linecolor":"rgb(36,36,36)","showbackground":true,"showgrid":false,"showline":true,"ticks":"outside","zeroline":false,"zerolinecolor":"rgb(36,36,36)"},"yaxis":{"backgroundcolor":"white","gridcolor":"rgb(232,232,232)","gridwidth":2,"linecolor":"rgb(36,36,36)","showbackground":true,"showgrid":false,"showline":true,"ticks":"outside","zeroline":false,"zerolinecolor":"rgb(36,36,36)"},"zaxis":{"backgroundcolor":"white","gridcolor":"rgb(232,232,232)","gridwidth":2,"linecolor":"rgb(36,36,36)","showbackground":true,"showgrid":false,"showline":true,"ticks":"outside","zeroline":false,"zerolinecolor":"rgb(36,36,36)"}},"shapedefaults":{"fillcolor":"black","line":{"width":0},"opacity":0.3},"ternary":{"aaxis":{"gridcolor":"rgb(232,232,232)","linecolor":"rgb(36,36,36)","showgrid":false,"showline":true,"ticks":"outside"},"baxis":{"gridcolor":"rgb(232,232,232)","linecolor":"rgb(36,36,36)","showgrid":false,"showline":true,"ticks":"outside"},"bgcolor":"white","caxis":{"gridcolor":"rgb(232,232,232)","linecolor":"rgb(36,36,36)","showgrid":false,"showline":true,"ticks":"outside"}},"title":{"x":0.05},"xaxis":{"automargin":true,"gridcolor":"rgb(232,232,232)","linecolor":"rgb(36,36,36)","showgrid":false,"showline":true,"ticks":"outside","title":{"standoff":15},"zeroline":false,"zerolinecolor":"rgb(36,36,36)"},"yaxis":{"automargin":true,"gridcolor":"rgb(232,232,232)","linecolor":"rgb(36,36,36)","showgrid":false,"showline":true,"ticks":"outside","title":{"standoff":15},"zeroline":false,"zerolinecolor":"rgb(36,36,36)"}}},"xaxis":{"anchor":"y","domain":[0.0,1.0],"title":{"text":"Sample size per variant"},"showspikes":true,"spikemode":"across","spikethickness":1},"yaxis":{"anchor":"x","domain":[0.0,1.0],"title":{"text":"Power"},"showspikes":true,"spikemode":"across","spikethickness":1},"shapes":[{"line":{"color":"gray","dash":"dash","width":2},"type":"line","x0":33560.89902790093,"x1":33560.89902790093,"xref":"x","y0":0,"y1":1,"yref":"y domain"}],"legend":{"yanchor":"top","y":1,"xanchor":"left","x":0.0},"title":{"text":"Statistical power vs sample size (binary)"},"spikedistance":1000,"hoverdistance":100},                        {"responsive": true}                    ).then(function(){

var gd = document.getElementById('266c0954-7baa-45e1-84bb-c71e0cedb2ff');
var x = new MutationObserver(function (mutations, observer) {{
        var display = window.getComputedStyle(gd).display;
        if (!display || display === 'none') {{
            console.log([gd, 'removed!']);
            Plotly.purge(gd);
            observer.disconnect();
        }}
}});

// Listen for the removal of the full notebook cells
var notebookContainer = gd.closest('#notebook-container');
if (notebookContainer) {{
    x.observe(notebookContainer, {childList: true});
}}

// Listen for the clearing of the current output cell
var outputEl = gd.closest('.output');
if (outputEl) {{
    x.observe(outputEl, {childList: true});
}}

                        })                };                });            </script>        </div>

### Power simulations

What happens if we use a smaller sample size? And how can we understand
the sample size?

Let us analyze the statistical power with synthethic data. We can do
this with the simulate_power_binary function. We are using some default
argument here, see [this
page](https://k111git.github.io/ab-test-simulator/power.html) for more
information.

``` python
# simulation = simulate_power_binary()
```

Note: The simulation object return the total sample size, so we need to
split it per variant.

``` python
# simulation
```

Finally, we can plot the results (note: the plot function show the
sample size per variant):

``` python
# plot_power(
#     simulation,
#     added_lines=[{"sample_size": sample_size_binary(), "label": "Chi2"}],
# )
```

### The problem of peaking

wip

## Contunious target (e.g. average)

Here we assume normally distributed data (which usually holds due to the
central limit theorem).

### Sample size

We can calculate the sample size required with the function
“sample_size_continuous”. Input needed is:

- mu1: Mean of the control group

- mu2: Mean of the variant group assuming minimal detectable effect
  (e.g. if the mean it 5, and we want to detect an effect as small as
  0.05, mu1=5.00 and mu2=5.05)

- sigma: Standard deviation (we assume the same for variant and control,
  should be estimated from historical data)

- alpha, power, one_sided: as in the binary case

Let us calculate an example:

``` python
n_sample = sample_size_continuous(
    mu1=5.0, mu2=5.05, sigma=1, alpha=0.05, power=0.8, one_sided=True
)
print(f"Required sample size per variant is {int(n_sample)}.")
```

Let us also do some simulations. These show results for the t-test as
well as bayesian testing (only 1-sided).

``` python
# simulation = simulate_power_continuous()
```

``` python
# plot_power(
#     simulation,
#     added_lines=[
#         {"sample_size": continuous_sample_size(), "label": "Formula"}
#     ],
# )
```

## Data Generators

We can also use the data generators for example data to analyze or
visualuze as if they were experiments.

Distribution without effect:

``` python
df_continuous = generate_continuous_data(effect=0)
# plot_distribution(df_continuous)
```

Distribution with effect:

``` python
df_continuous = generate_continuous_data(effect=1)
# plot_distribution(df_continuous)
```

## Visualizations

Plot beta distributions for a contingency table:

``` python
df = generate_binary_data()
df_contingency = data_to_contingency(df)
# fig = plot_betas(df_contingency, xmin=0, xmax=0.04)
```

## False positives

``` python
# simulation = simulate_power_binary(cr0=0.01, cr1=0.01, one_sided=False)
```

``` python
# plot_power(simulation, is_effect=False)
```

``` python
# simulation = simulate_power_binary(cr0=0.01, cr1=0.01, one_sided=True)
# plot_power(simulation, is_effect=False)
```


