Metadata-Version: 2.1
Name: AB-library
Version: 0.2.0
Summary: A comprehensive Python library for A/B testing analysis
Home-page: UNKNOWN
Author: Renat/Egor
Author-email: ryunisov0@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Libraries
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy (>=1.18.0)
Requires-Dist: pandas (>=1.0.0)
Requires-Dist: matplotlib (>=3.0.0)
Requires-Dist: scipy (>=1.4.0)
Requires-Dist: statsmodels (>=0.12.0)
Requires-Dist: seaborn (>=0.11.0)
Requires-Dist: tqdm (>=4.0.0)

# AB Testing Library

A Python library for A/B testing analysis. It provides statistical tools for hypothesis testing, confidence interval calculations, minimum detectable effect (MDE) estimation, p-value visualization, and multiple hypothesis correction. It covers both simple two-group comparisons and more complex multi-group analyses.

## Table of Contents

- [Features](#features)
- [Mathematical Formulas](#mathematical-formulas)
- [Installation](#installation)
- [Usage](#usage)
  - [Calculate Minimum Detectable Effect (MDE)](#calculate-minimum-detectable-effect-mde)
  - [Calculate MDE for Ratios](#calculate-mde-for-ratios)
  - [Plot P-value Over Time](#plot-p-value-over-time)
  - [Perform T-Tests Between Groups](#perform-t-tests-between-groups)
  - [Perform Proportion Tests Between Groups](#perform-proportion-tests-between-groups)
  - [Perform T-Tests on Delta Between Ratios](#perform-t-tests-on-delta-between-ratios)
  - [Plot P-value Distribution from A/A Tests](#plot-p-value-distribution-from-aa-tests)
  - [Plot P-value ECDF](#plot-p-value-ecdf)
  - [Apply Benjamini-Hochberg Procedure](#apply-benjamini-hochberg-procedure)
- [Example](#example)
- [License](#license)

## Features

- Minimum Detectable Effect (MDE) Calculations: Determine the smallest effect size you can detect with your experimental setup.
- Ratio-Specific MDE: Calculate MDE for metrics expressed as ratios.
- Statistical Testing: Perform t-tests and proportion tests between groups to evaluate significance.
- P-value Visualization: Visualize p-value dynamics over time and their distributions.
- Multiple Hypothesis Correction: Apply the Benjamini-Hochberg procedure to control the false discovery rate.


## Mathematical Formulas

### Sample Size

$$
n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2 \cdot (Var_{\text{test}}+Var_{\text{control}})}{\delta^2}
$$

Where:
- $n$ is the sample size per group.
- $Z_{1-\alpha/2}$ is the Z-score corresponding to the desired significance level (e.g., 1.96 for a two-sided test at $\alpha = 0.05$).
- $Z_{1-\beta}$ is the Z-score corresponding to the desired power (e.g., 0.84 for 80% power).
- $Var_{\text{test}}$ and $Var_{\text{control}}$ are the variances of the metric in the test and control groups (for a conversion rate $p$, the variance is $p(1-p)$).
- $\delta$ is the Minimum Detectable Effect (MDE) you aim to detect.
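
The sample size formula above can be evaluated directly with the standard library. The following is a minimal sketch, not part of this library's API: the function name `sample_size_per_group` and its parameters are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(var_test, var_control, delta, alpha=0.05, power=0.8):
    """Sample size per group for a two-sided test (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1-alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # Z_{1-beta}, where power = 1 - beta
    n = (z_alpha + z_beta) ** 2 * (var_test + var_control) / delta ** 2
    return math.ceil(n)  # round up so the requested power is not undershot


# Standard deviation of 15 in both groups, target absolute effect of 2 units
n = sample_size_per_group(var_test=15 ** 2, var_control=15 ** 2, delta=2)
print(n)
```

Halving `delta` roughly quadruples the required sample size, because `delta` enters the formula squared.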

### MDE

$$
\text{MDE} = \frac{(Z_{1-\alpha/2} + Z_{1-\beta}) \cdot \sqrt{(Var_{\text{test}}+Var_{\text{control}})}}{\sqrt{n}}
$$

Where:
- $\text{MDE}$ is the minimum detectable effect in the units of the metric (an absolute difference); divide it by the baseline mean to express it as a relative change.
- $Z_{1-\alpha/2}$ is the Z-score for the significance level.
- $Z_{1-\beta}$ is the Z-score for the power.
- $Var_{\text{test}}$ and $Var_{\text{control}}$ are the variances of the metric in the test and control groups (for a conversion rate $p$, the variance is $p(1-p)$).
- $n$ is the sample size per group.
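
The MDE formula is the sample size formula solved for the effect. As a rough sketch (the name `minimum_detectable_effect` is an assumption for illustration, and this is not necessarily how `get_mde` computes its result), it can be evaluated like this:

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(var_test, var_control, n, alpha=0.05, power=0.8):
    """Absolute MDE for a two-sided test with n observations per group (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1-alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # Z_{1-beta}
    return (z_alpha + z_beta) * math.sqrt(var_test + var_control) / math.sqrt(n)


baseline_mean = 100
mde_abs = minimum_detectable_effect(var_test=15 ** 2, var_control=15 ** 2, n=1000)
mde_pct = 100 * mde_abs / baseline_mean  # relative MDE in percent of the baseline mean
print(f"MDE: {mde_pct:.2f}% ({mde_abs:.2f})")
```

Because the MDE shrinks with the square root of `n`, quadrupling the sample size only halves the detectable effect.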

## Installation

You can install the library using pip:

```bash
pip install AB_library
```

Or, if you have the source code cloned locally, install it in editable mode:
```bash
pip install -e .
```
## Usage

### Calculate Minimum Detectable Effect (MDE)

Calculate the Minimum Detectable Effect (MDE) given the mean, standard deviation, and sample size.
```python
from AB_library import get_mde

mean = 100
std = 15
sample_size = 1000

mde_percentage, mde_absolute = get_mde(mean, std, sample_size)
print(f"MDE: {mde_percentage}% ({mde_absolute})")
```
### Calculate MDE for Ratios

Calculate MDE when your metric is a ratio (e.g., conversion rate).
```python
from AB_library import get_mde_ratio
import numpy as np

numerator = np.array([50, 55, 60, 65, 70])
denominator = np.array([500, 550, 600, 650, 700])
sample_size = 1000

mde_ratio_percentage, mde_ratio_absolute = get_mde_ratio(numerator, denominator, sample_size)
print(f"MDE Ratio: {mde_ratio_percentage}% ({mde_ratio_absolute})")
```
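
One common way to obtain a variance for a ratio metric is the delta method, which linearizes the ratio of two means. The sketch below is an illustration only and makes no claim about how `get_mde_ratio` is implemented. The helper names `ratio_variance_delta` and `ratio_mde` are hypothetical, and the sketch assumes the same variance and `sample_size` observations in each group.

```python
import math
from statistics import NormalDist, fmean


def ratio_variance_delta(numerator, denominator):
    """Delta-method variance of the linearized ratio metric (hypothetical helper)."""
    n = len(numerator)
    mean_num, mean_den = fmean(numerator), fmean(denominator)
    ratio = mean_num / mean_den  # equals sum(numerator) / sum(denominator)
    var_num = sum((x - mean_num) ** 2 for x in numerator) / (n - 1)
    var_den = sum((y - mean_den) ** 2 for y in denominator) / (n - 1)
    cov = sum((x - mean_num) * (y - mean_den)
              for x, y in zip(numerator, denominator)) / (n - 1)
    var = (var_num - 2 * ratio * cov + ratio ** 2 * var_den) / mean_den ** 2
    return ratio, max(var, 0.0)  # clamp tiny negative values caused by rounding


def ratio_mde(numerator, denominator, sample_size, alpha=0.05, power=0.8):
    """Absolute MDE for the ratio, assuming equal variance in both groups."""
    _, var = ratio_variance_delta(numerator, denominator)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * math.sqrt(2 * var) / math.sqrt(sample_size)


numerator = [50, 58, 57, 68, 66]
denominator = [500, 550, 600, 650, 700]
ratio, var = ratio_variance_delta(numerator, denominator)
mde_abs = ratio_mde(numerator, denominator, sample_size=1000)
print(f"ratio={ratio:.4f}, absolute MDE={mde_abs:.5f}")
```

Note that if the numerator is an exact multiple of the denominator in every row, the linearized variance is zero, so toy data of that kind yields an MDE of zero.
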
### Plot P-value Over Time

Visualize how p-values change over different time periods during your experiment.

```python
from AB_library import plot_p_value_over_time

dates = ['2024-01', '2024-02', '2024-03', '2024-04']
test_group = [[1.2, 1.3, 1.1], [1.4, 1.5, 1.3], [1.5, 1.6, 1.4], [1.7, 1.8, 1.6]]
control_group = [[1.1, 1.0, 1.2], [1.2, 1.1, 1.3], [1.3, 1.2, 1.4], [1.4, 1.3, 1.5]]

plot_p_value_over_time(dates, test_group, control_group)
```

### Perform T-Tests Between Groups

Conduct t-tests between two groups and obtain statistical metrics.
```python
from AB_library import ttest
import numpy as np
import pandas as pd

# Sample DataFrame
data = {
    'group': [0]*100 + [1]*100,
    'metric': np.random.normal(100, 15, 200)
}
df = pd.DataFrame(data)

results = ttest(df, metric_col='metric', ab_group_col='group')
print(results)
```
### Perform Proportion Tests Between Groups

Perform proportion tests to compare binary outcomes between groups.
```python
from AB_library import ztest_proportion
import numpy as np
import pandas as pd

# Sample DataFrame
data = {
    'group': [0]*1000 + [1]*1000,
    'success': np.random.binomial(1, 0.1, 2000)
}
df = pd.DataFrame(data)

results = ztest_proportion(df, metric_col='success', ab_group_col='group')
print(results)
```
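
For reference, a pooled two-proportion z-test can be written in a few lines. This is a sketch of the textbook test under the normal approximation, not the implementation of `ztest_proportion`; the function name `two_proportion_ztest` and its arguments are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for the difference of two proportions (hypothetical helper)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both groups share one rate
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 100 of 1000 conversions in control versus 130 of 1000 in test
z, p_value = two_proportion_ztest(100, 1000, 130, 1000)
print(f"z={z:.3f}, p-value={p_value:.4f}")
```

The normal approximation is only reasonable when each group has a fair number of successes and failures; for very small counts an exact test is the safer choice.
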
### Perform T-Tests on Delta Between Ratios

Compare the delta between two ratio metrics across groups.
```python
from AB_library import ttest_delta
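import numpy as np
import pandas as pd

# Sample DataFrame: each row has 1-10 trials (denominator)
# and the number of successes among them (numerator)
denominator = np.random.randint(1, 11, 2000)
data = {
    'group': [0]*1000 + [1]*1000,
    'numerator': np.random.binomial(denominator, 0.1),
    'denominator': denominator
}
df = pd.DataFrame(data)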

results = ttest_delta(
    df, 
    metric_num_col='numerator', 
    metric_denom_col='denominator', 
    ab_group_col='group'
)
print(results)
```
### Plot P-value Distribution from A/A Tests

Visualize the distribution of p-values from A/A testing to assess test calibration.
```python
from AB_library import plot_p_value_distribution
import numpy as np

control_group = np.random.normal(100, 15, 1000)
test_group = np.random.normal(100, 15, 1000)

plot_p_value_distribution(control_group, test_group)
```
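
The idea behind an A/A check is that when both groups come from the same distribution, p-values should be roughly uniform, so about 5% of them fall below 0.05. The simulation below sketches this using a large-sample z-test from the standard library. It is an illustration under those assumptions, not the procedure used by `plot_p_value_distribution`, and the helper `z_test_p_value` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(a, b):
    """Two-sided large-sample z-test for a difference in means (hypothetical helper)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    z = (mean_b - mean_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(42)  # fixed seed so the run is reproducible
p_values = []
for _ in range(1000):
    # Both groups are drawn from the same distribution, so there is no true effect
    control = [rng.gauss(100, 15) for _ in range(200)]
    test = [rng.gauss(100, 15) for _ in range(200)]
    p_values.append(z_test_p_value(control, test))

false_positive_rate = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"Share of A/A p-values below 0.05: {false_positive_rate:.3f}")
```

A share far above 0.05 would suggest that the test is miscalibrated for the data, for example because observations are not independent.
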
### Plot P-value ECDF

Create Empirical Cumulative Distribution Function (ECDF) plots for p-values.
```python
from AB_library import plot_pvalue_ecdf
import pandas as pd
import numpy as np

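# Sample DataFrames: draw each group independently so the two are not identical copies
control_group = pd.DataFrame({
    'has_treatment': [1]*1000 + [0]*1000,
    'gmv': np.random.normal(100, 15, 2000)
})
test_group = pd.DataFrame({
    'has_treatment': [1]*1000 + [0]*1000,
    'gmv': np.random.normal(100, 15, 2000)
})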

plot_pvalue_ecdf(control_group, test_group, title='P-value ECDF')
```
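
An ECDF maps each value to the fraction of observations at or below it. For uniformly distributed p-values the ECDF stays close to the diagonal. The sketch below shows that computation without plotting; the `ecdf` helper is an assumption for illustration and is not part of this library.

```python
def ecdf(values):
    """Return the sorted values and the fraction of observations <= each of them."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


# Evenly spread p-values, as expected when the null hypothesis is true
p_values = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
xs, ys = ecdf(p_values)

# Largest vertical gap between the ECDF and the diagonal y = x at the observed points
max_gap = max(abs(y - x) for x, y in zip(xs, ys))
print(f"Max gap from the diagonal: {max_gap:.2f}")
```
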
### Apply Benjamini-Hochberg Procedure

Control the false discovery rate when performing multiple hypothesis tests.
```python
from AB_library import method_benjamini_hochberg
import numpy as np

pvalues = np.random.uniform(0, 1, 100)
adjusted = method_benjamini_hochberg(pvalues, alpha=0.05)
print(adjusted)
```
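
The Benjamini-Hochberg procedure sorts the p-values, finds the largest rank `k` whose p-value is at most `k / m * alpha`, and rejects every hypothesis up to that rank. The sketch below implements this step-up rule with the standard library. The function name `benjamini_hochberg` is hypothetical, and it returns rejection flags, which may differ from what `method_benjamini_hochberg` returns.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected (hypothetical helper)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    # Find the largest 1-based rank k with p_(k) <= k / m * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject all hypotheses ranked at or below max_k, even if some missed their own threshold
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvalues, alpha=0.05)
print(sum(rejected), "of", len(pvalues), "hypotheses rejected")
```

Because this is a step-up rule, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.
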
## Example

Here’s a complete example that ties together multiple functions from the library:
```python
import numpy as np
import pandas as pd
from AB_library import get_mde, ttest, plot_p_value_over_time

# Calculate MDE
mean = 100
std = 15
sample_size = 1000
mde_percentage, mde_absolute = get_mde(mean, std, sample_size)
print(f"MDE: {mde_percentage}% ({mde_absolute})")

# Create sample data
data = {
    'group': [0]*1000 + [1]*1000,
    'metric': np.random.normal(100, 15, 2000)
}
df = pd.DataFrame(data)

# Perform t-test
results = ttest(df, metric_col='metric', ab_group_col='group')
print(results)

# Plot p-value over time
dates = ['2024-01', '2024-02', '2024-03', '2024-04']
test_group = [np.random.normal(100, 15, 100) for _ in dates]
control_group = [np.random.normal(100, 15, 100) for _ in dates]
plot_p_value_over_time(dates, test_group, control_group)
```


## License

This project is licensed under the [MIT License](https://opensource.org/license/mit).


