Metadata-Version: 2.1
Name: absplit
Version: 0.1.5
Summary: Generates A/B test groups
Keywords: absplit,a/b test,ab test,ab split,split,set formation,group formation
Author-email: Cormac Rynne <cormac.ry@gmail.com>
Requires-Python: <=3.11
Description-Content-Type: text/markdown
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Visualization
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: pygad==3.0.1
Requires-Dist: scikit-learn
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: seaborn
Project-URL: Home, https://github.com/cormac-rynne/absplit

<a name="readme-top"></a>

<div align="center">
<img src="https://raw.githubusercontent.com/cormac-rynne/absplit/main/images/logo.jpeg" width="460" height="140">
<h3><strong>ABSplit</strong></h3>
Split your data into matching A/B groups

![license](https://img.shields.io/badge/License-MIT-blue.svg)
![version](https://img.shields.io/badge/version-0.1.5-blue.svg)
![python](https://img.shields.io/badge/python-3-orange.svg)

</div>

<details open>
  <summary>Table of Contents</summary>
  <ol>
    <li>
      <a href="#about-the-project">About The Project</a>
    </li>
    <li>
      <a href="#getting-started">Getting Started</a>
      <ul>
        <li><a href="#prerequisites">Prerequisites</a></li>
        <li><a href="#installation">Installation</a></li>
      </ul>
    </li>
    <li>
      <a href="#tutorials">Tutorials</a>
      <ul>
        <li><a href="#do-it-yourself">Do it yourself</a></li>
      </ul>
    </li>
    <li><a href="#usage">Usage</a></li>
    <li><a href="#api-reference">API Reference</a></li>
    <li><a href="#contributing">Contributing</a></li>
    <li><a href="#license">License</a></li>
    <li><a href="#contact">Contact</a></li>
  </ol>
</details>

## About the project
ABSplit is a Python package that uses a genetic algorithm to generate A/B test splits that are as equal as possible.

It handles two tasks: splitting population data into two distinct groups (`ABSplit`), and finding a group in the 
population that closely resembles a given original sample (`Match`). Both work with static population data and 
with time series data.

This covers the following use cases:
1. Splitting an entire population into 2 groups
2. Finding a matching set in the population for a given sample

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## Getting Started
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install ABSplit and its prerequisites.
### Prerequisites
```bash
pip install pygad==3.0.1 numpy scikit-learn pandas seaborn
```
### Installation

```bash
pip install absplit
```

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## Tutorials
See [this Colab notebook](https://colab.research.google.com/drive/1gL7dxDJrtVoO5m1mSUWutdr7yas7sZwI?usp=sharing) for 
a range of examples of how to use ABSplit and Match.

### Do it yourself
To learn how ABSplit works under the hood, and how to build your own group-splitting tool using [PyGAD](https://pypi.org/project/pygad/),
check out [this Colab notebook](https://colab.research.google.com/drive/1SlCNnOtN4WCDTSJHsFrZtI7gKcXEl8-C?usp=sharing).

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## Usage

```python
from absplit import ABSplit
import pandas as pd
import datetime
import numpy as np

# Synthetic data
data_dct = {
    'date': [datetime.date(2030,4,1) + datetime.timedelta(days=x) for x in range(3)]*5,
    'country': ['UK'] * 15,
    'region': [item for sublist in [[x]*6 for x in ['z', 'y']] for item in sublist] + ['x']*3,
    'city': [item for sublist in [[x]*3 for x in ['a', 'b', 'c', 'd', 'e']] for item in sublist],
    'metric1': np.arange(0, 15, 1),
    'metric2': np.arange(0, 150, 10)
}
df = pd.DataFrame(data_dct)

# Identify which columns are metrics, which is the time period, and what to split on
kwargs = {
    'metrics': ['metric1', 'metric2'],
    'date_col': 'date',
    'splitting': 'city'
}

# Initialise
ab = ABSplit(
    df=df,
    **kwargs,
)

# Generate split
ab.run()

# Visualise generation fitness
ab.fitness()

# Visualise data
ab.visualise()

# Extract results
df = ab.results
```
<p align="right">(<a href="#readme-top">back to top</a>)</p>

## API Reference
### ABSplit
`ABSplit(df, ga_params={}, metric_weights={}, **kwargs)`

Splits a population into 2 groups that are mutually exclusive and collectively exhaustive: every individual is assigned to exactly one group.

Arguments:
* `df` (pd.DataFrame): Dataframe to be split
* `metrics` (str, list): Name of, or list of names of, metric columns in DataFrame
* `splitting` (str): Name of the column that identifies the individuals in the population being split
* `date_col` (str, optional): Name of the column that represents time periods, if applicable.
* `ga_params` (dict, optional): Parameters for the `pygad.GA` genetic algorithm (default: {})
* `metric_weights` (dict, optional): Weights for each metric in the data. If you want the split to focus on one metric more than another, you can prioritise it here (default: {})
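
As a rough sketch of how the optional arguments might be put together: the four `ga_params` keys below are genuine `pygad.GA` arguments, but the values are arbitrary, and how ABSplit combines them with its own defaults is not documented here. The `metric_weights` format (metric column name mapped to a relative weight) is an assumption, and `build_splitter` is a hypothetical helper, not part of the package.

```python
# Settings for pygad.GA; the keys are real pygad.GA arguments, the values are illustrative
ga_params = {
    'num_generations': 200,
    'sol_per_pop': 50,
    'num_parents_mating': 10,
    'mutation_probability': 0.05,
}

# Assumed format: metric column name -> relative weight
metric_weights = {'metric1': 2.0, 'metric2': 1.0}

def build_splitter(df):
    """Hypothetical helper: create an ABSplit instance with custom settings."""
    # Imported here so the dictionaries above can be inspected without the package installed
    from absplit import ABSplit
    return ABSplit(
        df=df,
        metrics=['metric1', 'metric2'],
        splitting='city',
        date_col='date',
        ga_params=ga_params,
        metric_weights=metric_weights,
    )
```

With a DataFrame shaped like the one in the Usage section, `build_splitter(df).run()` would then generate the split.
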


### Match 
`Match(population, sample, ga_params={}, metric_weights={}, **kwargs)`

Takes DataFrame `sample` and finds a comparable group in `population`.

Arguments:
* `population` (pd.DataFrame): Population to search for a comparable group. Must exclude the sample data.
* `sample` (pd.DataFrame): Sample to find a match for.
* `metrics` (str, list): Name of, or list of names of, metric columns in DataFrame
* `splitting` (str): Name of the column that identifies the individuals in the population
* `date_col` (str, optional): Name of the column that represents time periods, if applicable.
* `ga_params` (dict, optional): Parameters for the `pygad.GA` genetic algorithm (default: {})
* `metric_weights` (dict, optional): Weights for each metric in the data. If you want the matching to focus on one metric more than another, you can prioritise it here (default: {})
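
Since `population` must exclude the sample data, one way to prepare the two inputs from a single DataFrame is sketched below. The helper names `split_sample_from_population` and `find_match` are hypothetical, the data is made up, and only the `Match` arguments listed above are used.

```python
import pandas as pd

def split_sample_from_population(df, splitting, sample_ids):
    """Return (population, sample) with no overlap on the `splitting` column."""
    in_sample = df[splitting].isin(sample_ids)
    sample = df[in_sample].reset_index(drop=True)
    population = df[~in_sample].reset_index(drop=True)
    return population, sample

# Made-up data: one row per city
df = pd.DataFrame({
    'city': ['a', 'b', 'c', 'd', 'e'],
    'metric1': [1, 2, 3, 4, 5],
})
population, sample = split_sample_from_population(df, 'city', ['a', 'b'])

def find_match(population, sample):
    """Hypothetical helper: build a Match instance from disjoint inputs."""
    # Imported here so the preparation step above runs without the package installed
    from absplit import Match
    return Match(
        population=population,
        sample=sample,
        metrics='metric1',
        splitting='city',
    )
```

Filtering on the `splitting` column, rather than on row position, keeps every row of a sampled individual out of the population, which matters when the data has one row per individual per time period.
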

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## Contributing

I welcome contributions to absplit! For major changes, please open an issue first
to discuss what you would like to change.

Please make sure to update tests as appropriate.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## License

[MIT](https://choosealicense.com/licenses/mit/)

<p align="right">(<a href="#readme-top">back to top</a>)</p>
