Metadata-Version: 2.4
Name: 91life-ds-lib
Version: 1.0.0
Summary: Professional Data Science Library for ML Engineers and Researchers
Author-email: Shpat Dobraj <shpatdobraj@91.life>
License: 91Life
Project-URL: Homepage, https://github.com/91life/91life-ds-lib
Project-URL: Documentation, https://91life-ds-lib.readthedocs.io/
Project-URL: Repository, https://github.com/91life/91life-ds-lib.git
Project-URL: Bug Tracker, https://github.com/91life/91life-ds-lib/issues
Project-URL: Source Code, https://github.com/91life/91life-ds-lib
Keywords: data-science,machine-learning,data-analysis,feature-selection,data-preprocessing
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: scikit-learn>=1.1.0
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: seaborn>=0.11.0
Requires-Dist: plotly>=5.10.0
Requires-Dist: scipy>=1.9.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-settings>=2.1.0
Requires-Dist: tqdm>=4.64.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: memory-profiler>=0.60.0
Requires-Dist: openpyxl>=3.0.0
Requires-Dist: pyarrow>=10.0.0
Requires-Dist: fastparquet>=0.8.0
Requires-Dist: aiofiles>=22.0.0
Requires-Dist: asyncio-throttle>=1.0.0
Requires-Dist: python-dotenv>=0.19.0
Requires-Dist: statsmodels>=0.13.0
Requires-Dist: imbalanced-learn>=0.9.0
Requires-Dist: kaleido>=0.2.1
Requires-Dist: jinja2>=3.1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Requires-Dist: bandit>=1.7.0; extra == "dev"
Requires-Dist: safety>=2.0.0; extra == "dev"
Provides-Extra: cloud
Requires-Dist: boto3>=1.26.0; extra == "cloud"
Requires-Dist: google-cloud-storage>=2.7.0; extra == "cloud"
Requires-Dist: minio>=7.1.0; extra == "cloud"
Requires-Dist: azure-storage-blob>=12.19.0; extra == "cloud"
Requires-Dist: azure-identity>=1.15.0; extra == "cloud"
Provides-Extra: profiling
Requires-Dist: ydata-profiling>=4.0.0; extra == "profiling"
Requires-Dist: sweetviz>=2.3.0; extra == "profiling"
Provides-Extra: all
Requires-Dist: 91life-ds-lib[cloud,dev,profiling]; extra == "all"
Dynamic: license-file

# 91life Data Science Library

<div align="center">
  <img src="https://cdn.prod.website-files.com/6464fc5c49a35f360e272b62/6638ee51fb46dea59b4a71c4_Group%201000004653.svg" alt="91.life Logo" width="400"/>
</div>

[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://python.org)
[![License](https://img.shields.io/badge/license-91Life-green.svg)](LICENSE)
[![PyPI Version](https://img.shields.io/pypi/v/91life-ds-lib.svg)](https://pypi.org/project/91life-ds-lib/)
[![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://91life-ds-lib.readthedocs.io/)
[![Build Status](https://img.shields.io/github/workflow/status/91life/91life-ds-lib/CI)](https://github.com/91life/91life-ds-lib/actions)
[![Coverage](https://img.shields.io/codecov/c/github/91life/91life-ds-lib)](https://codecov.io/gh/91life/91life-ds-lib)

## Overview

The 91life Data Science Library is a professional, production-ready Python library designed for ML engineers and researchers at [91.life](https://91.life). It provides comprehensive tools for data loading, exploration, feature selection, preprocessing, visualization, and automated reporting.

### Key Features

- **Async Data Loading**: Support for multiple formats (CSV, Parquet, JSON, Excel) with cloud storage integration (AWS S3, Google Cloud Storage, MinIO, Azure Blob Storage)
- **Comprehensive Data Exploration**: Automated data quality assessment, missing data analysis, and statistical profiling
- **Advanced Feature Selection**: Multiple methods including variance, correlation, mutual information, tree-based, L1 regularization, and consensus selection
- **Data Preprocessing**: Complete pipeline for missing value handling, outlier treatment, scaling, encoding, and class imbalance
- **Rich Visualizations**: Interactive plots with Plotly, static plots with Matplotlib/Seaborn, and automated dashboards
- **Automated Reporting**: Integration with YData Profiling and Sweetviz, plus custom HTML/JSON reports
- **Clean Architecture**: Domain-Driven Design (DDD) patterns with comprehensive logging and error handling
- **Performance Optimized**: Memory-efficient chunked processing for large datasets
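
To make the chunked-processing idea concrete, here is a minimal sketch using plain pandas. It does not show this library's internals or API; `chunked_column_means` and the demo data are hypothetical names introduced only for illustration. The point is that aggregates can be accumulated chunk by chunk, so the full file never has to fit in memory.

```python
import io

import pandas as pd


def chunked_column_means(csv_source, chunksize=1000):
    """Compute means of numeric columns while reading the CSV in chunks.

    Hypothetical helper for illustration; not part of the library's API.
    """
    sums = None
    counts = None
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        numeric = chunk.select_dtypes(include="number")
        chunk_sums = numeric.sum()      # NaN values are skipped
        chunk_counts = numeric.count()  # non-null values per column
        sums = chunk_sums if sums is None else sums.add(chunk_sums, fill_value=0)
        counts = chunk_counts if counts is None else counts.add(chunk_counts, fill_value=0)
    return sums / counts


# A tiny in-memory CSV stands in for a large file on disk.
demo_csv = io.StringIO("a,b\n1,10\n2,20\n3,30\n4,40\n")
means = chunked_column_means(demo_csv, chunksize=2)
print(means)
```

Only running sums and counts are kept between chunks, so memory use depends on `chunksize` rather than on the size of the file.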

## Installation

### Basic Installation

```bash
pip install 91life-ds-lib
```

### With Cloud Storage Support

```bash
pip install "91life-ds-lib[cloud]"
```

### With Profiling Tools

```bash
pip install "91life-ds-lib[profiling]"
```

### Development Installation

```bash
git clone https://github.com/91life/91life-ds-lib.git
cd 91life-ds-lib
pip install -e ".[dev]"
```

## Quickstart

```python
from ninetyone_life_ds import DataLoader, DataExplorer, FeatureSelector

# Load data efficiently
loader = DataLoader()
data = loader.load_dataset('your_data.csv')

# Explore data comprehensively
explorer = DataExplorer()
basic_info = explorer.analyze_basic_info(data)
missing_analysis = explorer.analyze_missing_data(data)
readiness_score = explorer.calculate_data_readiness_score(data)

# Select features using consensus method
selector = FeatureSelector()
selected_features = selector.consensus_feature_selection(
    data,
    target_col='target',
    task_type='classification'
)

print(f"Data readiness: {readiness_score['overall_readiness']}/100")
print(f"Selected features: {len(selected_features['selected_features'])}")
```
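
For readers unfamiliar with consensus selection, the general idea can be sketched with scikit-learn alone. This is an assumption about what "consensus" means (several selectors vote, and features with enough votes are kept); it is not the implementation behind `consensus_feature_selection`. The function `consensus_select`, its parameters, and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif


def consensus_select(X, y, min_votes=2, top_k=1, random_state=0):
    """Keep features chosen by at least `min_votes` of three selectors.

    Hypothetical sketch; not the library's own algorithm.
    """
    votes = pd.Series(0, index=X.columns)

    # Vote 1: features whose variance exceeds a small threshold.
    variance_mask = VarianceThreshold(threshold=1e-8).fit(X).get_support()
    votes[X.columns[variance_mask]] += 1

    # Vote 2: top-k features by mutual information with the target.
    mi = pd.Series(
        mutual_info_classif(X, y, random_state=random_state), index=X.columns
    )
    votes[mi.nlargest(top_k).index] += 1

    # Vote 3: top-k features by random-forest importance.
    forest = RandomForestClassifier(n_estimators=50, random_state=random_state)
    forest.fit(X, y)
    importances = pd.Series(forest.feature_importances_, index=X.columns)
    votes[importances.nlargest(top_k).index] += 1

    return votes[votes >= min_votes].index.tolist()


# Synthetic data: one informative column, one noise column, one constant.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
X_demo = pd.DataFrame({
    "signal": signal,
    "noise": rng.normal(size=200),
    "constant": np.ones(200),
})
y_demo = (signal > 0).astype(int)
selected = consensus_select(X_demo, y_demo)
print(selected)
```

Requiring agreement between selectors with different biases tends to filter out features that only one method happens to favor.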

## Full Example

See `examples/complete_workflow.py` for a comprehensive demonstration of all library capabilities.

## API Overview

### Core Modules

- **DataLoader**: Efficient data loading with cloud storage support
- **DataExplorer**: Comprehensive data exploration and quality assessment
- **FeatureSelector**: Advanced feature selection with multiple algorithms
- **DataPreprocessor**: Complete preprocessing pipeline
- **Visualizer**: Rich visualizations and interactive plots
- **ReportGenerator**: Automated report generation and profiling
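
As an illustration of the kind of quality check a data exploration step performs, the sketch below summarizes missing values per column with plain pandas. It is not `DataExplorer`'s implementation; `summarize_missing` and the demo frame are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst first.

    Hypothetical helper for illustration; not part of the library's API.
    """
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    return summary.sort_values("missing_count", ascending=False)


demo = pd.DataFrame({
    "age": [25, np.nan, 40, np.nan],
    "city": ["Paris", "Berlin", None, "Paris"],
    "id": [1, 2, 3, 4],
})
report = summarize_missing(demo)
print(report)
```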

### Main Classes

- `DataLoader`: Handles data loading from various sources and formats
- `DataExplorer`: Performs comprehensive data analysis and quality assessment
- `FeatureSelector`: Implements multiple feature selection algorithms
- `DataPreprocessor`: Provides complete data preprocessing pipeline
- `Visualizer`: Creates professional visualizations and plots
- `ReportGenerator`: Generates comprehensive analysis reports
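
The preprocessing steps named above (missing value handling, scaling, encoding) can be sketched with scikit-learn's own building blocks. This shows the underlying technique only and makes no claim about how `DataPreprocessor` is implemented; the column names and data are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented demo data with one numeric and one categorical column.
frame = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Paris", "Berlin", np.nan, "Paris"],
})

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("numeric", numeric_steps, ["age"]),
        ("categorical", categorical_steps, ["city"]),
    ],
    sparse_threshold=0,
)
features = preprocess.fit_transform(frame)
print(features.shape)
```

Bundling the steps in one `ColumnTransformer` means the same fitted transformations can later be applied to new data with `preprocess.transform(...)`.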

## Development Setup

### Prerequisites

- Python 3.8+
- pip or conda

### Setup

```bash
# Clone repository
git clone https://github.com/91life/91life-ds-lib.git
cd 91life-ds-lib

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
flake8 src/ tests/

# Format code
black src/ tests/

# Type checking
mypy src/
```

### Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src/ninetyone_life_ds --cov-report=html

# Run specific test file
pytest tests/test_data_explorer.py -v
```

## Contributing Guidelines

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests and linting (`pytest && flake8 src/ tests/`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

### Code Style

- Follow PEP 8 guidelines
- Use type hints for all functions
- Write comprehensive docstrings (Google style)
- Ensure all tests pass
- Maintain test coverage above 90%
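
As a small illustration of these conventions, a function with type hints and a Google-style docstring might look like the following. `missing_ratio` is a hypothetical example, not a function from this library.

```python
from typing import Optional, Sequence


def missing_ratio(values: Sequence[Optional[float]]) -> float:
    """Return the fraction of entries that are missing.

    Args:
        values: Observations, where ``None`` marks a missing entry.

    Returns:
        A float between 0.0 and 1.0; 0.0 for an empty sequence.
    """
    if not values:
        return 0.0
    return sum(value is None for value in values) / len(values)


print(missing_ratio([1.0, None, 3.0, None]))
```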

## License

This project is licensed under the 91Life License - see the [LICENSE](LICENSE) file for details.

## Contact

- **Company**: [91.life](https://91.life)
- **Author**: Shpat Dobraj
- **Email**: shpatdobraj@91.life
- **Issues**: [GitHub Issues](https://github.com/91life/91life-ds-lib/issues)

## Company Insights

91.life is a technology company that builds data science and machine learning solutions. It provides tools and services for data analysis, primarily for healthcare and life sciences applications.

For more information about 91.life's services and team, visit [https://91.life](https://91.life).
