Metadata-Version: 2.1
Name: abydos
Version: 0.3.0
Summary: Abydos NLP/IR library
Home-page: https://github.com/chrislit/abydos
Author: Christopher C. Little
Author-email: chrisclittle+abydos@gmail.com
License: GPLv3+
Download-URL: https://github.com/chrislit/abydos/archive/master.zip
Keywords: nlp,ai,ir,language,linguistics,phonetic algorithms,string distance
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Natural Language :: English
Requires-Dist: numpy
Requires-Dist: six

Abydos
======

+------------------+----------------------------------------------------+
| CI Status        | |travis| |circle| |appveyor| |semaphore|           |
+------------------+----------------------------------------------------+
| Code Quality     | |codeclimate| |scrutinizer|                        |
|                  | |codacy| |codefactor| |ebert|                      |
+------------------+----------------------------------------------------+
| Dependencies     | |requires| |snyk| |pyup|                           |
+------------------+----------------------------------------------------+
| Test Coverage    | |coveralls|                                        |
+------------------+----------------------------------------------------+
| Local Analysis   | |pylint| |pycodestyle| |flake8|                    |
+------------------+----------------------------------------------------+
| Usage            | |docs| |mybinder| |license| |sourcerank| |zenodo|  |
+------------------+----------------------------------------------------+
| Contribution     | |cii| |waffle| |openhub|                           |
+------------------+----------------------------------------------------+
| PyPI             | |pypi| |pypi-ver|                                  |
+------------------+----------------------------------------------------+
| conda-forge      | |conda| |conda-dl| |conda-platforms|               |
+------------------+----------------------------------------------------+

.. |travis| image:: https://travis-ci.org/chrislit/abydos.svg?branch=master
    :target: https://travis-ci.org/chrislit/abydos
    :alt: Travis-CI Build Status

.. |circle| image:: https://circleci.com/gh/chrislit/abydos/tree/master.svg?style=shield
    :target: https://circleci.com/gh/chrislit/abydos/tree/master
    :alt: Circle-CI Build Status

.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/cwukqqsmogivcker/branch/master?svg=true
    :target: https://ci.appveyor.com/project/chrislit/abydos
    :alt: AppVeyor Build Status

.. |semaphore| image:: https://semaphoreci.com/api/v1/chrislit/abydos/branches/master/shields_badge.svg
    :target: https://semaphoreci.com/chrislit/abydos
    :alt: Semaphore Build Status

.. |codeclimate| image:: https://codeclimate.com/github/chrislit/abydos/badges/gpa.svg
   :target: https://codeclimate.com/github/chrislit/abydos
   :alt: Code Climate

.. |scrutinizer| image:: https://scrutinizer-ci.com/g/chrislit/abydos/badges/quality-score.png?b=master
    :target: https://scrutinizer-ci.com/g/chrislit/abydos/?branch=master
    :alt: Scrutinizer

.. |codacy| image:: https://api.codacy.com/project/badge/Grade/db79f2c31ea142fb9b5938abe87b0854
    :target: https://www.codacy.com/app/chrislit/abydos?utm_source=github.com&utm_medium=referral&utm_content=chrislit/abydos&utm_campaign=Badge_Grade
    :alt: Codacy

.. |codefactor| image:: https://www.codefactor.io/repository/github/chrislit/abydos/badge
    :target: https://www.codefactor.io/repository/github/chrislit/abydos
    :alt: CodeFactor

.. |ebert| image:: https://ebertapp.io/github/chrislit/abydos.svg
    :target: https://ebertapp.io/github/chrislit/abydos
    :alt: Ebert

.. |requires| image:: https://requires.io/github/chrislit/abydos/requirements.svg?branch=master
    :target: https://requires.io/github/chrislit/abydos/requirements/?branch=master
    :alt: Requirements Status

.. |snyk| image:: https://snyk.io/test/github/chrislit/abydos/badge.svg?targetFile=requirements.txt
    :target: https://snyk.io/test/github/chrislit/abydos?targetFile=requirements.txt
    :alt: Known Vulnerabilities

.. |pyup| image:: https://pyup.io/repos/github/chrislit/abydos/shield.svg
     :target: https://pyup.io/repos/github/chrislit/abydos/
     :alt: Updates

.. |coveralls| image:: https://coveralls.io/repos/github/chrislit/abydos/badge.svg?branch=master
    :target: https://coveralls.io/github/chrislit/abydos?branch=master
    :alt: Coverage Status

.. |pylint| image:: https://img.shields.io/badge/Pylint-9.55/10-green.svg
   :target: #
   :alt: Pylint Score

.. |pycodestyle| image:: https://img.shields.io/badge/pycodestyle-0-brightgreen.svg
   :target: #
   :alt: pycodestyle Errors

.. |flake8| image:: https://img.shields.io/badge/flake8-40-yellowgreen.svg
   :target: #
   :alt: flake8 Errors

.. |docs| image:: https://readthedocs.org/projects/abydos/badge/?version=latest
    :target: https://abydos.readthedocs.org/en/latest/
    :alt: Documentation Status

.. |mybinder| image:: https://mybinder.org/badge.svg
    :target: https://mybinder.org/v2/gh/chrislit/abydos/master?filepath=binder
    :alt: Binder

.. |license| image:: https://img.shields.io/badge/License-GPL%20v3-blue.svg
    :target: https://www.gnu.org/licenses/gpl-3.0
    :alt: License: GPL v3

.. |sourcerank| image:: https://img.shields.io/librariesio/sourcerank/pypi/abydos.svg
    :target: https://libraries.io/pypi/abydos
    :alt: Libraries.io SourceRank

.. |zenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1462285.svg
   :target: https://doi.org/10.5281/zenodo.1462285
   :alt: Zenodo

.. |cii| image:: https://bestpractices.coreinfrastructure.org/projects/1598/badge
    :target: https://bestpractices.coreinfrastructure.org/projects/1598
    :alt: CII Best Practices

.. |waffle| image:: https://badge.waffle.io/chrislit/abydos.svg?columns=To%20Do,In%20Progress
    :target: https://waffle.io/chrislit/abydos
    :alt: 'Waffle.io - Columns and their card count'

.. |openhub| image:: https://www.openhub.net/p/abydosnlp/widgets/project_thin_badge.gif
    :target: https://www.openhub.net/p/abydosnlp
    :alt: OpenHUB

.. |pypi| image:: https://img.shields.io/pypi/v/abydos.svg
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI

.. |pypi-ver| image:: https://img.shields.io/pypi/pyversions/abydos.svg
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI versions

.. |conda| image:: https://img.shields.io/conda/vn/conda-forge/abydos.svg
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge

.. |conda-dl| image:: https://img.shields.io/conda/dn/conda-forge/abydos.svg
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge downloads

.. |conda-platforms| image:: https://img.shields.io/conda/pn/conda-forge/abydos.svg
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge platforms

|

.. image:: https://raw.githubusercontent.com/chrislit/abydos/master/abydos-small.png
    :alt: abydos
    :align: right

|
| Abydos NLP/IR library
| Copyright 2014-2018 by Christopher C. Little

Abydos is a library of phonetic algorithms, string distance measures & metrics,
stemmers, and string fingerprinters including:

- Phonetic algorithms
    - Robert C. Russell's Index
    - American Soundex
    - Refined Soundex
    - Daitch-Mokotoff Soundex
    - Kölner Phonetik
    - NYSIIS
    - Match Rating Algorithm
    - Metaphone
    - Double Metaphone
    - Caverphone
    - Alpha Search Inquiry System
    - Fuzzy Soundex
    - Phonex
    - Phonem
    - Phonix
    - SfinxBis
    - phonet
    - Standardized Phonetic Frequency Code
    - Statistics Canada
    - Lein
    - Roger Root
    - Oxford Name Compression Algorithm (ONCA)
    - Eudex phonetic hash
    - Haase Phonetik
    - Reth-Schek Phonetik
    - FONEM
    - Parmar-Kumbharana
    - Davidson's Consonant Code
    - SoundD
    - PSHP Soundex/Viewex Coding
    - an early version of Henry Code
    - Norphone
    - Dolby Code
    - Phonetic Spanish
    - Spanish Metaphone
    - MetaSoundex
    - SoundexBR
    - NRL English-to-phoneme
    - Beider-Morse Phonetic Matching
- String distance metrics
    - Levenshtein distance
    - Optimal String Alignment distance
    - Damerau-Levenshtein distance
    - Hamming distance
    - Tversky index
    - Sørensen–Dice coefficient & distance
    - Jaccard similarity coefficient & distance
    - overlap similarity & distance
    - Tanimoto coefficient & distance
    - Minkowski distance & similarity
    - Manhattan distance & similarity
    - Euclidean distance & similarity
    - Chebyshev distance
    - cosine similarity & distance
    - Jaro distance
    - Jaro-Winkler distance (incl. the strcmp95 algorithm variant)
    - Longest common substring
    - Ratcliff-Obershelp similarity & distance
    - Match Rating Algorithm similarity
    - Normalized Compression Distance (NCD) & similarity
    - Monge-Elkan similarity & distance
    - Matrix similarity
    - Needleman-Wunsch score
    - Smith-Waterman score
    - Gotoh score
    - Length similarity
    - Prefix, Suffix, and Identity similarity & distance
    - Modified Language-Independent Product Name Search (MLIPNS) similarity &
      distance
    - Bag distance
    - Editex distance
    - Eudex distances
    - Sift4 distance
    - Baystat distance & similarity
    - Typo distance
    - Indel distance
    - Synoname
- Stemmers
    - the Lovins stemmer
    - the Porter and Porter2 (Snowball English) stemmers
    - Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
    - CLEF German, German plus, and Swedish stemmers
    - Caumanns' German stemmer
    - UEA-Lite Stemmer
    - Paice-Husk Stemmer
    - Schinke Latin stemmer
    - S stemmer
- String Fingerprints
    - string fingerprint
    - q-gram fingerprint
    - phonetic fingerprint
    - Pollock & Zamora's skeleton key
    - Pollock & Zamora's omission key
    - Cisłak & Grabowski's occurrence fingerprint
    - Cisłak & Grabowski's occurrence halved fingerprint
    - Cisłak & Grabowski's count fingerprint
    - Cisłak & Grabowski's position fingerprint
    - Synoname Toolcode
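
To give a feel for what two of the listed measures compute, Levenshtein
distance and the Sørensen–Dice coefficient can be sketched in plain Python as
below. This is an illustrative sketch only, not Abydos's own implementation:
the function names ``levenshtein`` and ``dice_bigrams`` are chosen here for the
example, and the Dice sketch assumes unpadded character bigrams compared as
multisets, which may differ from the library's defaults::

    from collections import Counter


    def levenshtein(src, tar):
        """Return the edit distance between src and tar (unit costs)."""
        # previous[j] is the distance between the src prefix seen so far
        # and tar[:j]; only two rows of the table are kept in memory.
        previous = list(range(len(tar) + 1))
        for i, s_char in enumerate(src, start=1):
            current = [i]
            for j, t_char in enumerate(tar, start=1):
                current.append(min(
                    previous[j] + 1,                       # deletion
                    current[j - 1] + 1,                    # insertion
                    previous[j - 1] + (s_char != t_char),  # substitution
                ))
            previous = current
        return previous[-1]


    def dice_bigrams(src, tar):
        """Return the Dice coefficient over multisets of character bigrams."""
        src_grams = Counter(src[i:i + 2] for i in range(len(src) - 1))
        tar_grams = Counter(tar[i:i + 2] for i in range(len(tar) - 1))
        total = sum(src_grams.values()) + sum(tar_grams.values())
        if total == 0:
            # Neither string is long enough to contain a bigram.
            return 1.0 if src == tar else 0.0
        overlap = sum((src_grams & tar_grams).values())
        return 2.0 * overlap / total


    print(levenshtein('kitten', 'sitting'))  # 3
    print(dice_bigrams('night', 'nacht'))    # 0.25

The first measure counts single-character edits, while the second compares
shared substrings, which is why the two families can rank the same pair of
strings quite differently.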

-----

Installation
============

Required libraries:

- Numpy
- Six

Recommended libraries:

- PylibLZMA (Python 2 only; needed for the LZMA compression string distance metric)


To install Abydos (master) from GitHub source::

   git clone https://github.com/chrislit/abydos.git --recursive
   cd abydos
   python setup.py install

If your default python command calls Python 2.7 but you want to install for
Python 3, you may instead need to call::

   python3 setup.py install


To install Abydos (latest release) from PyPI using pip::

   pip install abydos

To install from `conda-forge <https://conda-forge.org/>`_::

   conda install -c conda-forge abydos

Abydos should run on Python 2.7 and Python 3.3-3.7.
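
After installing, a quick way to confirm that the package and its required
libraries can be imported is a short check like the one below. It is only a
sketch: ``is_installed`` is a helper defined here for illustration, not part
of Abydos, and it reports only whether an import succeeds, not which version
is present::

    import importlib


    def is_installed(module_name):
        """Return True if module_name can be imported in this interpreter."""
        try:
            importlib.import_module(module_name)
        except ImportError:
            return False
        return True


    # The two required libraries listed above, plus Abydos itself.
    for name in ('numpy', 'six', 'abydos'):
        print('{}: {}'.format(name, 'OK' if is_installed(name) else 'missing'))

Run it with the same interpreter that was used for installation (``python`` or
``python3``), since each interpreter keeps its own set of packages.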

Testing & Contributing
======================

To run the whole test suite, just call tox::

    tox

The tox setup has the following environments: py27, py36, doctest,
py27-regression, py36-regression, pylint, pycodestyle, flake8, doc8,
badges, docs, py27-fuzz, & py36-fuzz. So if you only want to generate
documentation (in HTML, EPUB, & PDF formats), just call::

    tox -e docs

To run only Flake8 and generate its reports, call::

    tox -e flake8

Contributions such as bug reports, PRs, suggestions, desired new features, etc.
are welcome through GitHub Issues & Pull Requests.


Release History
---------------

0.3.0 (2018-10-15)
++++++++++++++++++

- Fixed implementation of Bag distance
- Updated BMPM to version 3.10
- Fixed Sphinx documentation on readthedocs.org
- Split string fingerprints out of clustering into their own module
- Added support for q-grams to skip-n characters
- New phonetic algorithms:
   - Statistics Canada
   - Lein
   - Roger Root
   - Oxford Name Compression Algorithm (ONCA)
   - Eudex phonetic hash
   - Haase Phonetik
   - Reth-Schek Phonetik
   - FONEM
   - Parmar-Kumbharana
   - Davidson's Consonant Code
   - SoundD
   - PSHP Soundex/Viewex Coding
   - an early version of Henry Code
   - Norphone
   - Dolby Code
   - Phonetic Spanish
   - Spanish Metaphone
   - MetaSoundex
   - SoundexBR
   - NRL English-to-phoneme
- New string fingerprints:
   - Cisłak & Grabowski's occurrence fingerprint
   - Cisłak & Grabowski's occurrence halved fingerprint
   - Cisłak & Grabowski's count fingerprint
   - Cisłak & Grabowski's position fingerprint
   - Synoname Toolcode
- New distance measures:
   - Minkowski distance & similarity
   - Manhattan distance & similarity
   - Euclidean distance & similarity
   - Chebyshev distance & similarity
   - Eudex distances
   - Sift4 distance
   - Baystat distance & similarity
   - Typo distance
   - Indel distance
   - Synoname
- New stemmers:
   - UEA-Lite Stemmer
   - Paice-Husk Stemmer
   - Schinke Latin stemmer
- Eliminated ._compat submodule in favor of six
- Transitioned from PEP8 to flake8, etc.
- Phonetic algorithms now consistently use max_length=-1 to indicate that
  there should be no length limit
- Added example notebooks in binder directory


0.2.0 (2015-05-27)
++++++++++++++++++

- Added Caumanns' German stemmer
- Added Lovins' English stemmer
- Updated Beider-Morse Phonetic Matching to 3.04
- Added Sphinx documentation


0.1.1 (2015-05-12)
++++++++++++++++++

- First Beta release to PyPI



Authors
```````

- Christopher C. Little (`@chrislit <https://github.com/chrislit>`_) <chrisclittle+abydos@gmail.com>


