Metadata-Version: 2.1
Name: accp
Version: 0.0.1
Summary: A package for crawling and processing audio and captions from YouTube
Home-page: https://github.com/zldzmfoq12/aud_crawler
Author: Seugnhun Jeong
Author-email: zldzmfoq12@naver.com
License: UNKNOWN
Download-URL: https://github.com/zldzmfoq12/aud_crawler/archive/0.0.tar.gz
Keywords: aud_crawler
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown

Audio, Caption Crawler and Processor
=====================================


##### Downloads and processes audio and captions (subtitles) from YouTube videos for speech AI



Requirements
-------------

* Python >= 3.6
* FFmpeg
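
FFmpeg has to be installed separately, so checking for it before a long crawl can save a failed run. The sketch below uses only the standard library. It assumes accp finds FFmpeg through `PATH`, which is an assumption and not confirmed here. `ffmpeg_available` and `ffmpeg_version` are hypothetical helpers, not part of accp.

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def ffmpeg_version() -> str:
    """Return the first line of `ffmpeg -version`; raise RuntimeError if FFmpeg is missing."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        raise RuntimeError("FFmpeg was not found on PATH")
    # stdout=PIPE + universal_newlines keeps this compatible with Python 3.6
    # (the capture_output= and text= shortcuts were only added in 3.7).
    result = subprocess.run(
        [exe, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.splitlines()[0]
```

Call `ffmpeg_available()` first and install FFmpeg (for example from your OS package manager) if it returns `False`.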

To Use
--------

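      from accp import ACCP

      playlist_name = ""  # name of the output folder created under datasets/
      playlist_url = ""   # URL of the YouTube playlist to crawl

      accp = ACCP(playlist_name, playlist_url)

      accp.download_audio()    # download the audio from YouTube
      accp.download_caption()  # download the captions (subtitles) from YouTube
      accp.audio_split()       # split the downloaded audio into clips (see Results)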


Results
----------

      datasets
      └── playlist name
          ├── metadata.csv
          ├── alignment.json
          └── wavs
              ├── 1.wav
              ├── 2.wav
              ├── 3.wav
              └── ...
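
The layout above can be walked with `pathlib`. This is a minimal sketch that assumes the default `datasets` root and purely numeric clip names. The tree shows `1.wav` while the metadata examples use `0001.wav`; sorting by the numeric value handles either style. `list_clips` is a hypothetical helper, not part of accp.

```python
from pathlib import Path
from typing import List


def list_clips(playlist_name: str, root: str = "datasets") -> List[Path]:
    """Return the numbered .wav clips in <root>/<playlist_name>/wavs, in clip order."""
    wav_dir = Path(root) / playlist_name / "wavs"
    # Keep only files whose stem is a number, e.g. "1.wav" or "0001.wav".
    clips = [p for p in wav_dir.glob("*.wav") if p.stem.isdecimal()]
    # Sort by the numeric value so "2.wav" comes before "10.wav".
    return sorted(clips, key=lambda p: int(p.stem))
```

A plain string sort would put `10.wav` before `2.wav`, which is why the key converts the stem to an integer.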


`metadata.csv` should look like this, with one `file|transcript` pair per line:

    0001.wav|그래서 사람들도 날 핍이라고 불렀다.
    0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.
    0003.wav|조가 자신이 그 사람이라고 나섰다.
    ...
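
Because each line pairs a clip file name with its transcript, the file can be read by splitting on the first `|`. This is a sketch that assumes a UTF-8 file with one `file|transcript` pair per line and no header row; `read_metadata` is a hypothetical helper, not part of accp.

```python
from typing import List, Tuple


def read_metadata(path: str) -> List[Tuple[str, str]]:
    """Read a pipe-delimited metadata.csv into (wav_filename, transcript) pairs."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Split only on the first "|" so a "|" inside a transcript is kept.
            # A line with no "|" raises ValueError, flagging a malformed row.
            filename, transcript = line.split("|", 1)
            pairs.append((filename, transcript))
    return pairs
```

Splitting manually instead of using the `csv` module avoids any quote handling, so transcripts containing `"` are returned verbatim.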

`alignment.json` maps each clip's path to its transcript and should look like:

    {
        "./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다.",
        "./datasets/playlist name/wavs/0002.wav": "크리스마스 덕분에 부엌에 먹을게 가득했다.",
        "./datasets/playlist name/wavs/0003.wav": "조가 자신이 그 사람이라고 나섰다."
    }
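
Since `alignment.json` is a single JSON object from clip path to transcript, the standard `json` module loads it directly. This is a sketch under stated assumptions: the file is UTF-8, and its relative keys (such as `./datasets/playlist name/wavs/0001.wav`) resolve against the directory accp was run from. `load_alignment` and `missing_clips` are hypothetical helpers, not part of accp.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_alignment(path: str) -> Dict[str, str]:
    """Load alignment.json as a {wav_path: transcript} dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def missing_clips(alignment: Dict[str, str]) -> List[str]:
    """Return the wav paths listed in the alignment whose files are not on disk.

    Relative paths are resolved against the current working directory
    (assumed to be where accp was run).
    """
    return [wav for wav in alignment if not Path(wav).is_file()]
```

An empty result from `missing_clips` is a quick sanity check that every transcript has a matching clip.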



