Metadata-Version: 2.1
Name: abydos
Version: 0.3.5
Summary: Abydos NLP/IR library
Home-page: https://github.com/chrislit/abydos
Author: Christopher C. Little
Author-email: chrisclittle+abydos@gmail.com
License: GPLv3+
Download-URL: https://github.com/chrislit/abydos/archive/master.zip
Keywords: nlp,ai,ir,language,linguistics,phonetic algorithms,string distance
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Natural Language :: English
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*
Requires-Dist: numpy
Requires-Dist: six
Requires-Dist: pyliblzma (<0.6.0,>=0.5.3); python_version >= "2.7" and python_version < "2.8"

Abydos
======

+------------------+------------------------------------------------------+
| CI & Test Status | |travis| |circle| |appveyor| |semaphore| |coveralls| |
+------------------+------------------------------------------------------+
| Code Quality     | |codeclimate| |scrutinizer| |codacy| |codefactor|    |
+------------------+------------------------------------------------------+
| Dependencies     | |requires| |snyk| |pyup| |fossa|                     |
+------------------+------------------------------------------------------+
| Local Analysis   | |pylint| |flake8| |black|                            |
+------------------+------------------------------------------------------+
| Usage            | |docs| |mybinder| |license| |sourcerank| |zenodo|    |
+------------------+------------------------------------------------------+
| Contribution     | |cii| |waffle| |openhub|                             |
+------------------+------------------------------------------------------+
| PyPI             | |pypi| |pypi-ver|                                    |
+------------------+------------------------------------------------------+
| conda-forge      | |conda| |conda-dl| |conda-platforms|                 |
+------------------+------------------------------------------------------+

.. |travis| image:: https://travis-ci.org/chrislit/abydos.svg?branch=master
    :target: https://travis-ci.org/chrislit/abydos
    :alt: Travis-CI Build Status

.. |circle| image:: https://circleci.com/gh/chrislit/abydos/tree/master.svg?style=shield
    :target: https://circleci.com/gh/chrislit/abydos/tree/master
    :alt: Circle-CI Build Status

.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/cwukqqsmogivcker/branch/master?svg=true
    :target: https://ci.appveyor.com/project/chrislit/abydos
    :alt: AppVeyor Build Status

.. |semaphore| image:: https://semaphoreci.com/api/v1/chrislit/abydos/branches/master/shields_badge.svg
    :target: https://semaphoreci.com/chrislit/abydos
    :alt: Semaphore Build Status

.. |coveralls| image:: https://coveralls.io/repos/github/chrislit/abydos/badge.svg?branch=master
    :target: https://coveralls.io/github/chrislit/abydos?branch=master
    :alt: Coverage Status

.. |codeclimate| image:: https://codeclimate.com/github/chrislit/abydos/badges/gpa.svg
   :target: https://codeclimate.com/github/chrislit/abydos
   :alt: Code Climate

.. |scrutinizer| image:: https://scrutinizer-ci.com/g/chrislit/abydos/badges/quality-score.png?b=master
    :target: https://scrutinizer-ci.com/g/chrislit/abydos/?branch=master
    :alt: Scrutinizer

.. |codacy| image:: https://api.codacy.com/project/badge/Grade/db79f2c31ea142fb9b5938abe87b0854
    :target: https://www.codacy.com/app/chrislit/abydos?utm_source=github.com&utm_medium=referral&utm_content=chrislit/abydos&utm_campaign=Badge_Grade
    :alt: Codacy

.. |codefactor| image:: https://www.codefactor.io/repository/github/chrislit/abydos/badge
    :target: https://www.codefactor.io/repository/github/chrislit/abydos
    :alt: CodeFactor

.. |requires| image:: https://requires.io/github/chrislit/abydos/requirements.svg?branch=master
    :target: https://requires.io/github/chrislit/abydos/requirements/?branch=master
    :alt: Requirements Status

.. |snyk| image:: https://snyk.io/test/github/chrislit/abydos/badge.svg?targetFile=requirements.txt
    :target: https://snyk.io/test/github/chrislit/abydos?targetFile=requirements.txt
    :alt: Known Vulnerabilities

.. |pyup| image:: https://pyup.io/repos/github/chrislit/abydos/shield.svg
     :target: https://pyup.io/repos/github/chrislit/abydos/
     :alt: Updates

.. |fossa| image:: https://app.fossa.io/api/projects/git%2Bgithub.com%2Fchrislit%2Fabydos.svg?type=shield
     :target: https://app.fossa.io/projects/git%2Bgithub.com%2Fchrislit%2Fabydos?ref=badge_shield
     :alt: FOSSA Status

.. |pylint| image:: https://img.shields.io/badge/Pylint-9.56/10-green.svg
   :target: #
   :alt: Pylint Score

.. |flake8| image:: https://img.shields.io/badge/flake8-2308-red.svg
   :target: #
   :alt: flake8 Errors

.. |black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/ambv/black
   :alt: black

.. |docs| image:: https://readthedocs.org/projects/abydos/badge/?version=latest
    :target: https://abydos.readthedocs.org/en/latest/
    :alt: Documentation Status

.. |mybinder| image:: https://mybinder.org/badge.svg
    :target: https://mybinder.org/v2/gh/chrislit/abydos/master?filepath=binder
    :alt: Binder

.. |license| image:: https://img.shields.io/badge/License-GPL%20v3-blue.svg
    :target: https://www.gnu.org/licenses/gpl-3.0
    :alt: License: GPL v3

.. |sourcerank| image:: https://img.shields.io/librariesio/sourcerank/pypi/abydos.svg
    :target: https://libraries.io/pypi/abydos
    :alt: Libraries.io SourceRank

.. |zenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1462443.svg
   :target: https://doi.org/10.5281/zenodo.1463204
   :alt: Zenodo

.. |cii| image:: https://bestpractices.coreinfrastructure.org/projects/1598/badge
    :target: https://bestpractices.coreinfrastructure.org/projects/1598
    :alt: CII Best Practices

.. |waffle| image:: https://badge.waffle.io/chrislit/abydos.svg?columns=To%20Do,In%20Progress
    :target: https://waffle.io/chrislit/abydos
    :alt: 'Waffle.io - Columns and their card count'

.. |openhub| image:: https://www.openhub.net/p/abydosnlp/widgets/project_thin_badge.gif
    :target: https://www.openhub.net/p/abydosnlp
    :alt: OpenHUB

.. |pypi| image:: https://img.shields.io/pypi/v/abydos.svg
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI

.. |pypi-ver| image:: https://img.shields.io/pypi/pyversions/abydos.svg
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI versions

.. |conda| image:: https://img.shields.io/conda/vn/conda-forge/abydos.svg
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge

.. |conda-dl| image:: https://img.shields.io/conda/dn/conda-forge/abydos.svg
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge downloads

.. |conda-platforms| image:: https://img.shields.io/conda/pn/conda-forge/abydos.svg
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge platforms

|

.. image:: https://raw.githubusercontent.com/chrislit/abydos/master/abydos-small.png
    :alt: abydos
    :align: right

|
| Abydos NLP/IR library
| Copyright 2014-2018 by Christopher C. Little

Abydos is a library of phonetic algorithms, string distance measures & metrics,
stemmers, and string fingerprinters including:

- Phonetic algorithms
    - Robert C. Russell's Index
    - American Soundex
    - Refined Soundex
    - Daitch-Mokotoff Soundex
    - Kölner Phonetik
    - NYSIIS
    - Match Rating Algorithm
    - Metaphone
    - Double Metaphone
    - Caverphone
    - Alpha Search Inquiry System
    - Fuzzy Soundex
    - Phonex
    - Phonem
    - Phonix
    - SfinxBis
    - phonet
    - Standardized Phonetic Frequency Code
    - Statistics Canada
    - Lein
    - Roger Root
    - Oxford Name Compression Algorithm (ONCA)
    - Eudex phonetic hash
    - Haase Phonetik
    - Reth-Schek Phonetik
    - FONEM
    - Parmar-Kumbharana
    - Davidson's Consonant Code
    - SoundD
    - PSHP Soundex/Viewex Coding
    - an early version of Henry Code
    - Norphone
    - Dolby Code
    - Phonetic Spanish
    - Spanish Metaphone
    - MetaSoundex
    - SoundexBR
    - NRL English-to-phoneme
    - Beider-Morse Phonetic Matching
- String distance metrics
    - Levenshtein distance
    - Optimal String Alignment distance
    - Damerau-Levenshtein distance
    - Hamming distance
    - Tversky index
    - Sørensen–Dice coefficient & distance
    - Jaccard similarity coefficient & distance
    - overlap similarity & distance
    - Tanimoto coefficient & distance
    - Minkowski distance & similarity
    - Manhattan distance & similarity
    - Euclidean distance & similarity
    - Chebyshev distance
    - cosine similarity & distance
    - Jaro distance
    - Jaro-Winkler distance (incl. the strcmp95 algorithm variant)
    - Longest common substring
    - Ratcliff-Obershelp similarity & distance
    - Match Rating Algorithm similarity
    - Normalized Compression Distance (NCD) & similarity
    - Monge-Elkan similarity & distance
    - Matrix similarity
    - Needleman-Wunsch score
    - Smith-Waterman score
    - Gotoh score
    - Length similarity
    - Prefix, Suffix, and Identity similarity & distance
    - Modified Language-Independent Product Name Search (MLIPNS) similarity &
      distance
    - Bag distance
    - Editex distance
    - Eudex distances
    - Sift4 distance
    - Baystat distance & similarity
    - Typo distance
    - Indel distance
    - Synoname
- Stemmers
    - the Lovins stemmer
    - the Porter and Porter2 (Snowball English) stemmers
    - Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
    - CLEF German, German plus, and Swedish stemmers
    - Caumanns' German stemmer
    - UEA-Lite Stemmer
    - Paice-Husk Stemmer
    - Schinke Latin stemmer
    - S stemmer
- String Fingerprints
    - string fingerprint
    - q-gram fingerprint
    - phonetic fingerprint
    - Pollock & Zamora's skeleton key
    - Pollock & Zamora's omission key
    - Cisłak & Grabowski's occurrence fingerprint
    - Cisłak & Grabowski's occurrence halved fingerprint
    - Cisłak & Grabowski's count fingerprint
    - Cisłak & Grabowski's position fingerprint
    - Synoname Toolcode
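
To make the categories above more concrete, the following is a minimal, self-contained sketch of three of the listed techniques in plain Python: American Soundex (a phonetic algorithm), Levenshtein distance (a string distance metric), and a simple string fingerprint. It neither uses nor reproduces Abydos's own API; the function names ``soundex``, ``levenshtein``, and ``str_fingerprint`` are hypothetical, and the fingerprint follows the common "lowercase, strip punctuation, sort the unique words" recipe, which is an assumption rather than Abydos's exact rules.

.. code-block:: python

    import string

    # American Soundex digit classes; vowels, y, h, and w get no digit.
    _SOUNDEX_DIGITS = {
        ch: digit
        for letters, digit in (
            ("bfpv", "1"),
            ("cgjkqsxz", "2"),
            ("dt", "3"),
            ("l", "4"),
            ("mn", "5"),
            ("r", "6"),
        )
        for ch in letters
    }


    def soundex(word, max_length=4):
        """Return the American Soundex code of word (illustrative sketch)."""
        letters = [c for c in word.lower() if c in string.ascii_lowercase]
        if not letters:
            return ""  # this sketch returns an empty code for letterless input
        code = [letters[0].upper()]
        prev = _SOUNDEX_DIGITS.get(letters[0], "")
        for ch in letters[1:]:
            if ch in "hw":
                # h and w do not separate letters that share a digit
                continue
            digit = _SOUNDEX_DIGITS.get(ch, "")  # vowels and y map to ""
            if digit and digit != prev:
                code.append(digit)
            prev = digit  # a vowel resets prev, so a repeat after it is kept
        # Truncate, then right-pad with zeros to the fixed length
        return "".join(code)[:max_length].ljust(max_length, "0")


    def levenshtein(src, tar):
        """Return the Levenshtein edit distance using two rolling DP rows."""
        prev = list(range(len(tar) + 1))
        for i, s_ch in enumerate(src, start=1):
            curr = [i]
            for j, t_ch in enumerate(tar, start=1):
                curr.append(
                    min(
                        prev[j] + 1,  # deletion
                        curr[j - 1] + 1,  # insertion
                        prev[j - 1] + (s_ch != t_ch),  # substitution or match
                    )
                )
            prev = curr
        return prev[-1]


    def str_fingerprint(phrase):
        """Return the sorted, de-duplicated lowercase words of phrase."""
        cleaned = "".join(c for c in phrase.lower() if c.isalnum() or c.isspace())
        return " ".join(sorted(set(cleaned.split())))

For example, ``soundex("Robert")`` and ``soundex("Rupert")`` both give ``R163``, and ``levenshtein("kitten", "sitting")`` is ``3``.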

-----

Installation
============

Required libraries:

- NumPy
- six

Recommended libraries:

- PylibLZMA (Python 2 only, for the LZMA-based compression distance; Python 3
  provides the ``lzma`` module in its standard library)


To install Abydos (master) from the GitHub source::

   git clone https://github.com/chrislit/abydos.git --recursive
   cd abydos
   python setup.py install

If your default python command calls Python 2.7 but you want to install for
Python 3, you may instead need to call::

   python3 setup.py install


To install Abydos (latest release) from PyPI using pip::

   pip install abydos

To install from `conda-forge <https://conda-forge.org/>`_::

   conda install -c conda-forge abydos

It should run on Python 2.7 and Python 3.3-3.7.

Testing & Contributing
======================

To run the whole test-suite just call tox::

    tox

The tox setup has the following environments: py27, py36, doctest,
py27-regression, py36-regression, pylint, pycodestyle, flake8, doc8,
badges, docs, py27-fuzz, & py36-fuzz. So if you only want to generate the
documentation (in HTML, EPUB, & PDF formats), just call::

    tox -e docs

To run only flake8 and generate its reports, call::

    tox -e flake8

Contributions such as bug reports, PRs, suggestions, feature requests, etc.
are welcome through GitHub Issues & Pull Requests.


Release History
===============


0.3.5 (2018-10-31) *cantankerous carl*
++++++++++++++++++++++++++++++++++++++

doi:10.5281/zenodo.1463204

Version 0.3.5 focuses on refactoring the whole project. The API itself remains
largely the same as in previous versions, but internally the modules have been
split up. Apart from bug fixes, essentially no new features are added in this
version.

Changes:

- Refactored library and tests into smaller modules
- Broke compression distances (NCD) out into separate functions
- Adopted Black code style
- Added pyproject.toml to use Poetry for packaging (but will continue using
  setuptools and setup.py for the present)
- Minor bug fixes
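
Normalized Compression Distance (NCD) compares two strings by how much better they compress together than apart: ``NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))``, where ``C`` gives the compressed length. The following is a hedged sketch of that formula using the compressors in Python 3's standard library. The ``ncd`` function and its ``compressor`` parameter are hypothetical and are not the signatures of Abydos's separate NCD functions; the sketch also assumes Python 3 true division.

.. code-block:: python

    import bz2
    import lzma
    import zlib

    # Hypothetical mapping from a compressor name to a bytes -> bytes function
    _COMPRESSORS = {
        "zlib": zlib.compress,
        "bz2": bz2.compress,
        "lzma": lzma.compress,
    }


    def ncd(src, tar, compressor="zlib"):
        """Return the Normalized Compression Distance between two strings."""
        if src == tar:
            # Real compressors give a small positive value here; report 0.0.
            return 0.0
        compress = _COMPRESSORS[compressor]
        x = src.encode("utf-8")
        y = tar.encode("utf-8")
        c_x = len(compress(x))
        c_y = len(compress(y))
        c_xy = len(compress(x + y))  # compressed size of the concatenation
        return (c_xy - min(c_x, c_y)) / max(c_x, c_y)

Values near 0 suggest similar strings and values near 1 dissimilar ones; because every compressor adds fixed header overhead, results on very short strings are noisy.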


0.3.0 (2018-10-15) *carl*
+++++++++++++++++++++++++

doi:10.5281/zenodo.1462443

Version 0.3.0 focuses on additional phonetic algorithms, but also adds numerous
distance measures, fingerprints, and even a few stemmers. Another focus was
getting everything to build again (including docs) and moving to more standard
modern tools (flake8, tox, etc.).

Changes:

- Fixed implementation of Bag distance
- Updated BMPM to version 3.10
- Fixed Sphinx documentation on readthedocs.org
- Split string fingerprints out of clustering into their own module
- Added support for q-grams that skip n characters
- New phonetic algorithms:
   - Statistics Canada
   - Lein
   - Roger Root
   - Oxford Name Compression Algorithm (ONCA)
   - Eudex phonetic hash
   - Haase Phonetik
   - Reth-Schek Phonetik
   - FONEM
   - Parmar-Kumbharana
   - Davidson's Consonant Code
   - SoundD
   - PSHP Soundex/Viewex Coding
   - an early version of Henry Code
   - Norphone
   - Dolby Code
   - Phonetic Spanish
   - Spanish Metaphone
   - MetaSoundex
   - SoundexBR
   - NRL English-to-phoneme
- New string fingerprints:
   - Cisłak & Grabowski's occurrence fingerprint
   - Cisłak & Grabowski's occurrence halved fingerprint
   - Cisłak & Grabowski's count fingerprint
   - Cisłak & Grabowski's position fingerprint
   - Synoname Toolcode
- New distance measures:
   - Minkowski distance & similarity
   - Manhattan distance & similarity
   - Euclidean distance & similarity
   - Chebyshev distance & similarity
   - Eudex distances
   - Sift4 distance
   - Baystat distance & similarity
   - Typo distance
   - Indel distance
   - Synoname
- New stemmers:
   - UEA-Lite Stemmer
   - Paice-Husk Stemmer
   - Schinke Latin stemmer
- Eliminated ._compat submodule in favor of six
- Transitioned from PEP8 to flake8, etc.
- Phonetic algorithms now consistently use ``max_length=-1`` to indicate that
  there should be no length limit
- Added example notebooks in binder directory
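
Two of the changes above lend themselves to a short illustration. The sketch below builds q-grams whose characters are ``skip`` positions apart (so ``skip=0`` gives ordinary q-grams) and applies the ``max_length=-1`` convention to a phonetic code. Both ``skip_qgrams`` and ``apply_max_length`` are hypothetical helpers, not Abydos's own q-gram tokenizer or phonetic functions; in particular, the start/stop padding symbols that q-gram tokenizers commonly add are omitted here.

.. code-block:: python

    from collections import Counter


    def skip_qgrams(term, qval=2, skip=0):
        """Count q-grams of term whose characters are skip positions apart."""
        step = skip + 1
        width = (qval - 1) * step + 1  # span of term covered by one q-gram
        return Counter(
            term[i : i + width : step] for i in range(len(term) - width + 1)
        )


    def apply_max_length(code, max_length=-1):
        """Truncate a phonetic code; max_length=-1 means no length limit."""
        if max_length == -1:
            return code
        if max_length < 0:
            raise ValueError("max_length must be -1 or a non-negative integer")
        return code[:max_length]

For instance, ``skip_qgrams("abcd", skip=1)`` counts ``ac`` and ``bd``, while ``skip_qgrams("abcd")`` counts ``ab``, ``bc``, and ``cd``.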


0.2.0 (2015-05-27) *berthold*
+++++++++++++++++++++++++++++

- Added Caumanns' German stemmer
- Added Lovins' English stemmer
- Updated Beider-Morse Phonetic Matching to 3.04
- Added Sphinx documentation


0.1.1 (2015-05-12) *albrecht*
+++++++++++++++++++++++++++++

- First Beta release to PyPI



Authors
=======

- Christopher C. Little (`@chrislit <https://github.com/chrislit>`_) <chrisclittle+abydos@gmail.com>


