Fixes to author parsing

MirSob
2026-04-26 00:46:48 +02:00
parent 954821ccc3
commit b5c2929062
3 changed files with 1259 additions and 199 deletions

VERIFICATION.md Normal file

@@ -0,0 +1,112 @@
# Verification Workflow
Use `data/verified_author_overrides.tsv` for manual metadata corrections.
## Using `generate_abs_mock_report.py`
The script generates a non-destructive TSV report with proposed Audiobookshelf paths.
It does not rename or move files.
What it does:
- scans the audiobook library tree
- detects audiobook roots based on audio files
- tries to infer author, title, series, sequence, year, and narrator from folder names and sidecar OPF files
- applies manual corrections from `data/verified_author_overrides.tsv`
- writes a TSV report with proposed target paths for Audiobookshelf
What it does not do:
- does not rename files
- does not move directories
- does not modify the library itself
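The root-detection step listed above could be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the function name `find_audiobook_roots` and the `AUDIO_EXTENSIONS` set are assumptions. It only reads the directory tree, which matches the non-destructive behavior described here.

```python
from pathlib import Path

# Assumed set of audio extensions; the real script may use a different list.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".m4b", ".flac", ".ogg", ".opus"}


def find_audiobook_roots(library_root: Path) -> list[Path]:
    """Return every directory that directly contains at least one audio file.

    Hypothetical helper: it reads the tree and never renames or moves anything.
    """
    roots = set()
    for path in library_root.rglob("*"):
        # Compare suffixes case-insensitively so "01.MP3" is also detected.
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            roots.add(path.parent)
    return sorted(roots)
```

A simple rule like this treats multi-disc layouts (for example `CD1/` and `CD2/` subfolders) as separate roots, so a real implementation would likely need extra handling for that case.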
Basic usage:
```bash
python3 generate_abs_mock_report.py
```
Default behavior:
- reads the library from `/mnt/nextcloudExtDS/Ksiazki/Audiobooki`
- writes the report to `reports/audiobookshelf_mock_report.tsv`
- applies manual corrections from `data/verified_author_overrides.tsv`
Available options:
```bash
python3 generate_abs_mock_report.py --help
```
```text
--root ROOT            Path to the current audiobook library
--output OUTPUT        TSV output path
--overrides OVERRIDES  Optional TSV with verified metadata overrides
```
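For orientation, an `argparse` setup consistent with the options and defaults documented here might look like the sketch below. The function name `build_parser` and the description string are assumptions; only the three flags, their help texts, and the default paths come from this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Generate a non-destructive Audiobookshelf mock report."
    )
    parser.add_argument(
        "--root",
        default="/mnt/nextcloudExtDS/Ksiazki/Audiobooki",
        help="Path to the current audiobook library",
    )
    parser.add_argument(
        "--output",
        default="reports/audiobookshelf_mock_report.tsv",
        help="TSV output path",
    )
    parser.add_argument(
        "--overrides",
        default="data/verified_author_overrides.tsv",
        help="Optional TSV with verified metadata overrides",
    )
    return parser


# Options that are not passed fall back to the documented defaults.
args = build_parser().parse_args(["--root", "/path/to/library"])
print(args.root, args.output, args.overrides)
```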
Examples:
```bash
python3 generate_abs_mock_report.py \
--root /mnt/nextcloudExtDS/Ksiazki/Audiobooki
```
```bash
python3 generate_abs_mock_report.py \
--root /path/to/library \
--output reports/custom_report.tsv \
--overrides data/verified_author_overrides.tsv
```
Typical workflow:
1. Run `python3 generate_abs_mock_report.py`.
2. Open `reports/audiobookshelf_mock_report.tsv`.
3. Review rows with `status=review` first, then ambiguous `unverified` rows.
4. Add confirmed metadata to `data/verified_author_overrides.tsv`.
5. Run the script again to regenerate the report with overrides applied.
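Step 3 of the workflow above can be partly automated. The sketch below assumes the report has a header row containing the `status` and `verification_status` columns named in this document; the helper name `review_queue` is hypothetical. It lists `review` rows first and then the remaining `unverified` rows. Deciding which of those are actually ambiguous remains a manual judgment.

```python
import csv
from pathlib import Path


def review_queue(report_path: Path) -> list[dict[str, str]]:
    """Return report rows in review order: status=review first, then unverified."""
    with report_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))

    needs_review = [row for row in rows if row.get("status") == "review"]
    unverified = [
        row
        for row in rows
        if row.get("status") != "review"
        and row.get("verification_status") == "unverified"
    ]
    return needs_review + unverified
```

Calling `review_queue(Path("reports/audiobookshelf_mock_report.tsv"))` after a run would give the rows to check, in the order suggested above.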
What the script prints after completion:
- `library_root` used for the scan
- `report` path to the generated TSV
- `books` number of detected audiobook roots
- `ready` number of rows with enough metadata to propose a target path
- `review` number of rows that still need manual verification
Main output file:
- `reports/audiobookshelf_mock_report.tsv`
Important columns in the TSV:
- `status`
- `current_path`
- `author`
- `series`
- `sequence`
- `title`
- `proposed_abs_path`
- `notes`
- `verification_status`
- `verification_source`
How to read the main status fields:
- `status=ready` means the row has enough metadata to build a proposed target path.
- `status=review` means the row still needs manual verification.
- `verification_status=unverified` means no manual override has been applied yet.
- `verification_status=verified_web` means the row was corrected or confirmed from a web source stored in `verification_source`.
Notes about paths:
- `current_path` is the detected source folder in the current library.
- `proposed_abs_path` is the suggested logical Audiobookshelf path, following the author/series/title structure.
- The script creates the parent directory for the output TSV automatically if it does not exist.
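The automatic creation of the output directory mentioned above is commonly done with `Path.mkdir`. The sketch below shows one way to write a TSV so that a missing `reports/` folder does not cause an error. The function name `write_report` and its parameters are assumptions, not the script's actual interface.

```python
import csv
from pathlib import Path


def write_report(rows: list[dict[str, str]], output_path: Path,
                 fieldnames: list[str]) -> None:
    """Write rows as a TSV, creating the parent directory first if needed."""
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```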
Source preference:
- Prefer a direct audiobook/store/catalog page when it clearly confirms the metadata.
- `lubimyczytac.pl` is an approved auxiliary source for verifying author, title, and series/cycle names.
- Use `lubimyczytac.pl` especially when path-derived guesses are ambiguous or when storefront metadata is incomplete.
Recommended fields to confirm:
- author
- title
- series
- sequence
When adding an override:
- Put the confirming page URL in `verification_source`.
- Keep the note in `verification_note` short and only add it when it explains a correction or ambiguity.
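A small check can catch overrides that were added without a confirming URL. The sketch below assumes `data/verified_author_overrides.tsv` has a header row that includes a `verification_source` column; the helper name `overrides_missing_source` is hypothetical, and the rule that a source must start with `http://` or `https://` is an assumption of this sketch.

```python
import csv
from pathlib import Path


def overrides_missing_source(overrides_path: Path) -> list[int]:
    """Return 1-based data-row numbers whose verification_source is not a URL."""
    missing = []
    with overrides_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for number, row in enumerate(reader, start=1):
            # Short rows yield None for missing cells, so fall back to "".
            source = (row.get("verification_source") or "").strip()
            if not source.startswith(("http://", "https://")):
                missing.append(number)
    return missing
```

Running `overrides_missing_source(Path("data/verified_author_overrides.tsv"))` before regenerating the report would list the override rows that still need a source.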