Metadata-Version: 2.4
Name: 404_finder
Version: 0.1.0
Summary: A tool to find broken links (404s) and oversized pages in websites
Project-URL: Homepage, https://github.com/singiamtel/404-finder
Project-URL: Bug Tracker, https://github.com/singiamtel/404-finder/issues
Author-email: Sergio Garcia <sergio@garciadelacruz.es>
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Requires-Dist: beautifulsoup4>=4.9.0
Requires-Dist: requests>=2.25.0
Description-Content-Type: text/markdown

# 404 Finder

A Python tool that crawls websites to find broken links (404s and other 4xx/5xx responses) and oversized pages.

## Installation

```bash
pip install 404-finder
```

## Usage

```bash
404-finder example.com [options]
```

### Options

- `--max-size BYTES`: Maximum allowed size in bytes for any page
- `--workers N`: Number of parallel workers (default: 10)
- `--max-depth N`: Maximum depth to crawl (default: no limit)
- `--verbose`: Enable verbose logging
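
To illustrate how `--workers` and `--max-depth` could interact, here is a minimal sketch of a depth-limited, level-by-level crawl using a thread pool. It is not the tool's actual implementation: `crawl`, `get_links`, `fake_get_links`, and the in-memory `SITE` map are hypothetical names used only for this example, and the exact meaning of "depth" in 404-finder may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(start, get_links, workers=10, max_depth=None):
    """Visit pages level by level and return a {url: depth} mapping.

    `get_links` is a hypothetical callable that returns the links found
    on a page. Assumption: `max_depth` caps the number of link hops
    followed from the start page; `None` means no limit.
    """
    seen = {start: 0}
    frontier = [start]
    depth = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and (max_depth is None or depth < max_depth):
            # Fetch every page of the current level in parallel.
            results = pool.map(get_links, frontier)
            next_frontier = []
            for links in results:
                for link in links:
                    if link not in seen:
                        seen[link] = depth + 1
                        next_frontier.append(link)
            frontier = next_frontier
            depth += 1
    return seen


# Hypothetical in-memory "site" so the sketch runs without network access.
SITE = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/a"], "/c": []}


def fake_get_links(url):
    return SITE.get(url, [])


print(crawl("/", fake_get_links, workers=2, max_depth=1))
```

In this sketch, pages discovered at the depth limit are recorded but their own links are not followed.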

### Example

```bash
# Check for broken links on example.com
404-finder example.com

# Check for broken links and pages larger than 1 MB (1,000,000 bytes)
404-finder example.com --max-size 1000000

# Crawl with 20 parallel workers and verbose logging
404-finder example.com --workers 20 --verbose
```

### Output

The tool writes a JSONL file named `result_domain.jsonl` with one JSON record per line for each URL visited. Each record includes:
- URL
- Status code
- Page size
- Referrer (the page that linked to this URL)
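
A report in this format can be post-processed with a few lines of Python. The sketch below filters records with 4xx or 5xx status codes. The key names (`url`, `status_code`, `size`, `referrer`) and the sample records are assumptions made for illustration; check a real report for the exact field names.

```python
import json


def load_report(lines):
    """Parse JSONL lines into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def broken_entries(records, status_key="status_code"):
    """Return the records whose status code is 4xx or 5xx.

    `status_key` is an assumed field name, not confirmed by the tool.
    """
    broken = []
    for rec in records:
        status = rec.get(status_key)
        if isinstance(status, int) and 400 <= status < 600:
            broken.append(rec)
    return broken


# Made-up sample lines; a real report would be read with open(path).
sample = [
    '{"url": "https://example.com/", "status_code": 200, "size": 5120, "referrer": null}',
    '{"url": "https://example.com/old", "status_code": 404, "size": 0, "referrer": "https://example.com/"}',
]

for rec in broken_entries(load_report(sample)):
    print(rec["status_code"], rec["url"], "linked from", rec["referrer"])
```

To run this against a real file, pass an open file object to `load_report`, since iterating over a file yields its lines.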

## Features

- Parallel crawling with configurable number of workers
- Finds broken links (HTTP status codes 4xx and 5xx)
- Checks page sizes against a configurable limit
- Follows redirects while staying within the same domain
- Handles both internal and external links
- Generates detailed JSONL reports 
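
The checks behind several of these features can be sketched with the standard library alone. This is an illustrative approximation, not the tool's own code: the helper names `is_internal`, `is_broken`, and `is_oversized` are hypothetical, and the sketch assumes "same domain" means an exact hostname match, which may not be how 404-finder treats subdomains.

```python
from urllib.parse import urljoin, urlparse


def is_internal(link, base_url):
    """True if `link` (possibly relative) resolves to the same host as `base_url`.

    Assumption: subdomains count as external in this sketch.
    """
    absolute = urljoin(base_url, link)
    return urlparse(absolute).hostname == urlparse(base_url).hostname


def is_broken(status_code):
    """True for HTTP 4xx and 5xx status codes."""
    return 400 <= status_code < 600


def is_oversized(size_bytes, max_size=None):
    """True if a size limit is set and the page exceeds it."""
    return max_size is not None and size_bytes > max_size


print(is_internal("../about", "https://example.com/docs/"))
print(is_internal("https://other.org/x", "https://example.com/"))
print(is_broken(404), is_oversized(2_000_000, max_size=1_000_000))
```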