Metadata-Version: 2.1
Name: ac_solver
Version: 0.1.0
Summary: 
Author: shehper
Author-email: ali.shehper1@gmail.com
Requires-Python: >=3.9,<3.12
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: Farama-Notifications (==0.0.4)
Requires-Dist: GitPython (==3.1.43)
Requires-Dist: Jinja2 (==3.1.4)
Requires-Dist: MarkupSafe (==2.1.5)
Requires-Dist: PyYAML (==6.0.2)
Requires-Dist: Pygments (==2.18.0)
Requires-Dist: appdirs (==1.4.4)
Requires-Dist: appnope (==0.1.4)
Requires-Dist: asttokens (==2.4.1)
Requires-Dist: attrs (==24.2.0)
Requires-Dist: certifi (==2024.7.4)
Requires-Dist: charset-normalizer (==3.3.2)
Requires-Dist: click (==8.1.7)
Requires-Dist: cloudpickle (==3.0.0)
Requires-Dist: comm (==0.2.2)
Requires-Dist: contourpy (==1.2.1)
Requires-Dist: cycler (==0.12.1)
Requires-Dist: debugpy (==1.8.5)
Requires-Dist: decorator (==5.1.1)
Requires-Dist: docker-pycreds (==0.4.0)
Requires-Dist: exceptiongroup (==1.2.2)
Requires-Dist: executing (==2.0.1)
Requires-Dist: fastjsonschema (==2.20.0)
Requires-Dist: filelock (==3.15.4)
Requires-Dist: fonttools (==4.53.1)
Requires-Dist: gitdb (==4.0.11)
Requires-Dist: gymnasium (==0.28.1)
Requires-Dist: idna (==3.8)
Requires-Dist: importlib_metadata (==8.4.0)
Requires-Dist: importlib_resources (==6.4.4)
Requires-Dist: ipykernel (==6.29.5)
Requires-Dist: ipython (==8.18.1)
Requires-Dist: jax-jumpy (==1.0.0)
Requires-Dist: jedi (==0.19.1)
Requires-Dist: joblib (==1.4.2)
Requires-Dist: jsonschema (==4.23.0)
Requires-Dist: jsonschema-specifications (==2023.12.1)
Requires-Dist: jupyter_client (==8.6.2)
Requires-Dist: jupyter_core (==5.7.2)
Requires-Dist: kaleido (==0.2.1)
Requires-Dist: kiwisolver (==1.4.5)
Requires-Dist: matplotlib (==3.7.1)
Requires-Dist: matplotlib-inline (==0.1.7)
Requires-Dist: mpmath (==1.3.0)
Requires-Dist: nbformat (==5.8.0)
Requires-Dist: nest-asyncio (==1.6.0)
Requires-Dist: networkx (==3.2.1)
Requires-Dist: numpy (==1.24.3)
Requires-Dist: packaging (==24.1)
Requires-Dist: parso (==0.8.4)
Requires-Dist: pathtools (==0.1.2)
Requires-Dist: pexpect (==4.9.0)
Requires-Dist: pillow (==10.4.0)
Requires-Dist: platformdirs (==4.2.2)
Requires-Dist: plotly (==5.18.0)
Requires-Dist: prompt_toolkit (==3.0.47)
Requires-Dist: protobuf (==4.25.4)
Requires-Dist: psutil (==6.0.0)
Requires-Dist: ptyprocess (==0.7.0)
Requires-Dist: pure_eval (==0.2.3)
Requires-Dist: pyparsing (==3.1.4)
Requires-Dist: python-dateutil (==2.9.0.post0)
Requires-Dist: pyzmq (==26.2.0)
Requires-Dist: referencing (==0.35.1)
Requires-Dist: requests (==2.32.3)
Requires-Dist: rpds-py (==0.20.0)
Requires-Dist: scikit-learn (==1.3.1)
Requires-Dist: scipy (==1.13.1)
Requires-Dist: sentry-sdk (==2.13.0)
Requires-Dist: setproctitle (==1.3.3)
Requires-Dist: six (==1.16.0)
Requires-Dist: smmap (==5.0.1)
Requires-Dist: stack-data (==0.6.3)
Requires-Dist: sympy (==1.13.2)
Requires-Dist: tenacity (==9.0.0)
Requires-Dist: threadpoolctl (==3.5.0)
Requires-Dist: torch (==2.0.1)
Requires-Dist: tornado (==6.4.1)
Requires-Dist: tqdm (==4.65.0)
Requires-Dist: traitlets (==5.14.3)
Requires-Dist: typing_extensions (==4.12.2)
Requires-Dist: urllib3 (==2.2.2)
Requires-Dist: wandb (==0.15.3)
Requires-Dist: wcwidth (==0.2.13)
Requires-Dist: zipp (==3.20.1)
Description-Content-Type: text/markdown

# AC-Solver

<table>
  <tr>
    <td><img src="assets/image3.png" alt="Image 1" width="800"/></td>
    <!-- <td><img src="assets/image2.png" alt="Image 2" width="800"/></td> -->
  </tr>
</table>

- [Overview](#overview)
- [Andrews-Curtis Conjecture](#Andrews-Curtis-Conjecture)
- [Abstract and paper](#abstract-and-paper)
- [Installation](#installation)
- [Usage](#usage)
  - [Initializing the AC Environment](#initializing-the-ac-environment)
  - [Solving the Environment with PPO](#solving-the-environment-with-ppo)
  - [Performing Classical Search](#performing-classical-search)
- [Notebooks](#notebooks)
- [Contributing](#contributing)
- [Contact info](#contact-info)
- [Citation](#citation)
- [Acknowledgments](#acknowledgments)



## Overview

This repository accompanies the paper *"What Makes Math Problems Hard for Reinforcement Learning: A Case Study."* It includes an implementation of the Andrews-Curtis (AC) Environment in Gymnasium, two classical search algorithms (BFS and Greedy Search), and a PPO agent that works within this environment. Additionally, the repository contains Jupyter notebooks for reproducing the analyses and figures presented in the paper.

## Abstract and paper

Using a long-standing conjecture from combinatorial group theory, we explore, from multiple angles, the challenges of finding rare instances carrying disproportionately high rewards. Based on lessons learned in the mathematical context defined by the Andrews--Curtis conjecture, we propose algorithmic improvements that can be relevant in other domains with ultra-sparse reward problems. Although our case study can be formulated as a game, its shortest winning sequences are potentially $10^6$ or $10^9$ times longer than those encountered in chess. In the process of our study, we demonstrate that one of the potential counterexamples due to Akbulut and Kirby, whose status escaped direct mathematical methods for 39 years, is stably AC-trivial.

[Read full paper on arxiv](https://arxiv.org/)
[TODO: add link to arxiv]

## Andrews-Curtis Conjecture
Andrews-Curtis conjecture is a long-standing open problem in combinatorial group theory and low-dimensional topology. It states that every balanced presentation of the trivial group could be transformed into the trivial presentation using actions on relators: inverses, conjugation and concatenation. More precisely, given presentation of trivial group of form $\langle x_{1}, x_{2}, \ldots, x_{n} | r_{1}, r_{2}, \ldots, r_{n} \rangle$ can be transformed into $\langle x_{1}, x_{2}, \ldots, x_{n} |x_{1}, x_{2}, \ldots, x_{n} \rangle$ using following set of moves:
- inverse: changing some $r_{i}$ with $r_{i}^{-1}$,
- conjugation: changing some $r_{i}$ with $qr_{i}q^{-1}$ for some $q$,
- concatenation: changing some $r_{i}$ with $r_{i}r_{j}$.

While many counterexamples to this conjecture were proposed over the years, finding trivializing sequences is notoriously hard. Many aspects of this math problem make it a perfect setup for studying how Reinforcement Learning can identify rare and long sequences of moves that close the desired goal.

## Installation

To work with the AC Environment or build upon it, you can simply install the package using pip:

```bash
pip install ac_solver
```

If you wish to reproduce the plots and analyses in the paper, you will need to clone the repository locally. Here is the recommended process:

1. Clone the repository:

   ```bash
   git clone https://github.com/shehper/AC-Solver.git
   cd AC-Solver
   ```

2. Make a virtual environment and install the package locally using pip:

   ```bash
   python -m venv ./env
   source ./env/bin/activate
   pip install .
   ```

## Usage

After installation, you can start using the environment and agents as follows:

### Initializing the AC Environment

The `ACEnv` class is initialized using the `ACEnvConfig` class. By default, `ACEnv()` initializes the environment with the trivial presentation $\langle x, y | x, y \rangle$, represented in code as `[1, 0, 2, 0]`.

If you want to specify your own presentation, you can do so using a list. For example, to initialize the environment with the simplest presentation from the Akbulut-Kirby series, $\langle x, y | x^2 = y^3, xyx = yxy \rangle$, you can follow these steps:

0. Import the classes. 
   ```python
   from ac_solver import ACEnvConfig, ACEnv
   ```

1. Define the presentation as a list:
   ```python
   presentation = [1, 1, -2, -2, -2, 0, 0, 1, 2, 1, -2, -1, -2, 0]
   ```

2. Initialize the `ACEnvConfig` with this presentation:
   ```python
   config = ACEnvConfig(initial_state=presentation)
   ```

3. Create the environment with the custom configuration:
   ```python
   env = ACEnv(config)
   ```

This allows you to explore different presentations within the AC Environment.

### Performing Classical Search

To perform greedy search in the neighborhood of a specified presentation, do:

```python
from ac_solver import greedy_search
greedy_search(presentation)
```

Similarly, for breadth-first-search:

```python
from ac_solver import bfs
bfs(presentation)
```

You can specify the number of nodes to explore through `max_nodes_to_explore` argument of these functions. By default, search is done over 10000 nodes. If the search is successful in reaching a trivial state, a path of AC moves is returned to the user.

### Solving the Environment with PPO

To train a PPO agent on the AC environment, run the following command in your terminal:

```bash
python ac_solver/agents/ppo.py
```

By default, this command trains the PPO agent on an AC graph with initial states drawn from approximately 1200 presentations of the Miller-Schupp series, as listed in [this file](ac_solver/search/miller_schupp/data/all_presentations.txt). You can customize your run by passing any hyperparameters listed in [args.py](ac_solver/agents/args.py) via the command line.

## Notebooks

The `notebooks/` directory contains Jupyter notebooks that reproduce the figures and results discussed in the paper:

- **`Classical-Search-and-PPO-in-AC-Environment.ipynb`**: Reproduces figures in Sections 1, 3, and 4 of the paper.
- **`Scaling-PPO-in-AC-Environment.ipynb`**: Reproduces figures in Section 5 of the paper.
- **`Stable-AK3.ipynb`**: Provides code demonstrating that AK(3) is a stably AC-trivial presentation, a major result of the paper.

To run these notebooks, you must clone the repository locally as described above.

## Contributing

Contributions to this project are welcome! If you'd like to contribute, please follow these steps:

1. Fork the repository.
2. Create a new branch for your feature or bugfix.
3. Run tests with `pytest` and ensure your code is formatted with `black`:

   ```bash
   poetry run pytest
   poetry run black .
   ```

4. Submit a pull request with a clear description of your changes.

## Contact info
[TODO]
## Citation
```python
@article{
}
```
[TODO]
## Acknowledgments

This project’s PPO implementation is based on the [CleanRL](https://github.com/vwxyzjn/cleanrl) library.

