Labels
good first ticket, help wanted, size: easy, status: wip (work is in progress / has already been partially completed)
Description
Describe the bug
Attempting to archive https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702 aborts the entire process with an unhandled exception instead of reporting the error and continuing. The traceback shows a TypeError raised in log_archive_method_finished while splitting the hints of the readability result, which suggests the type of that value is not checked thoroughly enough.
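To illustrate the behaviour I expected (one failing link is reported, the rest of the run continues), here is a minimal sketch. It is not ArchiveBox's actual code: `archive_all` and the `archive_one` callback are hypothetical names made up for this example.

```python
from typing import Callable, Dict, Iterable


def archive_all(urls: Iterable[str], archive_one: Callable[[str], None]) -> Dict[str, str]:
    """Try every URL and collect failures instead of aborting on the first one.

    `archive_one` is a hypothetical stand-in for whatever archives a single link.
    """
    failures: Dict[str, str] = {}
    for url in urls:
        try:
            archive_one(url)
        except Exception as err:  # broad on purpose: one bad link must not stop the run
            failures[url] = f"{type(err).__name__}: {err}"
    return failures


def flaky(url: str) -> None:
    # Hypothetical extractor that fails for one particular URL.
    if "bad" in url:
        raise TypeError("a bytes-like object is required, not 'str'")


if __name__ == "__main__":
    print(archive_all(["https://example.com/ok", "https://example.com/bad"], flaky))
```

With this pattern the failing link ends up in the returned mapping together with its error message, and the remaining links are still processed.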
Steps to reproduce
- Ran ArchiveBox with the following config:
[SERVER_CONFIG]
SECRET_KEY = [REDACTED]
[ARCHIVE_METHOD_OPTIONS]
RESOLUTION = 1440,4320
YOUTUBEDL_BINARY = /usr/local/bin/yt-dlp
[GENERAL_CONFIG]
TIMEOUT = 1200
and the command:
archivebox add https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702
- Relevant output:
[√] [2021-09-14 00:54:24] "Dead white man's clothes: How fast fashion is turning parts of Ghana into toxic landfill - ABC News"
https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702
√ ./archive/1631453820.320194
> readability
! Failed to archive link: Exception: Exception in archive_methods.save_readability(Link(url=https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702))
Traceback (most recent call last):
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/extractors/__init__.py", line 114, in archive_link
    log_archive_method_finished(result)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/logging_util.py", line 435, in log_archive_method_finished
    hints = hints if isinstance(hints, (list, tuple)) else hints.split('\n')
TypeError: a bytes-like object is required, not 'str'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/archivebox/.local/bin/archivebox", line 8, in <module>
    sys.exit(main())
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd) # type: ignore
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/cli/archivebox_update.py", line 119, in main
    update(
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/main.py", line 783, in update
    archive_links(to_archive, overwrite=overwrite, **archive_kwargs)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/extractors/__init__.py", line 181, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/home/archivebox/.local/lib/python3.8/site-packages/archivebox/extractors/__init__.py", line 130, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_readability(Link(url=https://www.abc.net.au/news/2021-08-12/fast-fashion-turning-parts-ghana-into-toxic-landfill/100358702))
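The TypeError in the first traceback is what Python raises when `.split('\n')` is called with a str separator on a bytes object, so the hints value was presumably bytes at that point. A minimal sketch of a more defensive normalization follows. `normalize_hints` is a hypothetical helper written for this report, not the project's actual fix, and decoding as UTF-8 with replacement is my assumption.

```python
from typing import List, Sequence, Union

HintItem = Union[str, bytes]
Hints = Union[HintItem, Sequence[HintItem], None]


def _to_text(value: HintItem) -> str:
    """Decode bytes to str (assumed UTF-8, undecodable bytes replaced); pass str through."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


def normalize_hints(hints: Hints) -> List[str]:
    """Return hints as a list of str lines, whether given as str, bytes, list, or tuple."""
    if hints is None:
        return []
    if isinstance(hints, (list, tuple)):
        # Already a sequence: only make sure every entry is text.
        return [_to_text(item) for item in hints]
    # Single value: decode first, then split on newlines with a str separator.
    return _to_text(hints).split("\n")


if __name__ == "__main__":
    print(normalize_hints(b"first hint\nsecond hint"))
```

Decoding before splitting avoids mixing bytes and str in one call, which is the combination that triggers the error shown above.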
ArchiveBox version
ArchiveBox v0.6.2
Cpython FreeBSD FreeBSD-13.0-RELEASE-p4-amd64-64bit-ELF amd64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep
[i] Dependency versions:
√ ARCHIVEBOX_BINARY v0.6.2 valid /usr/home/archivebox/.local/bin/archivebox
√ PYTHON_BINARY v3.8.10 valid /usr/local/bin/python3.8
√ DJANGO_BINARY v3.1.13 valid /usr/home/archivebox/.local/lib/python3.8/site-packages/django/bin/django-admin.py
√ CURL_BINARY v7.78.0 valid /usr/local/bin/curl
√ WGET_BINARY v1.21 valid /usr/local/bin/wget
√ NODE_BINARY v14.17.0 valid /usr/local/bin/node
√ SINGLEFILE_BINARY v0.3.29 valid ./node_modules/single-file/cli/single-file
√ READABILITY_BINARY v0.0.3 valid ./node_modules/readability-extractor/readability-extractor
√ MERCURY_BINARY v1.0.0 valid ./node_modules/@postlight/mercury-parser/cli.js
√ GIT_BINARY v2.32.0 valid /usr/local/bin/git
√ YOUTUBEDL_BINARY v2021.06.09 valid /usr/local/bin/yt-dlp
√ CHROME_BINARY v92.0.4515.159 valid /usr/local/bin/chrome
√ RIPGREP_BINARY v13.0.0 valid /usr/local/bin/rg
[i] Source-code locations:
√ PACKAGE_DIR 23 files valid /usr/home/archivebox/.local/lib/python3.8/site-packages/archivebox
√ TEMPLATES_DIR 3 files valid /usr/home/archivebox/.local/lib/python3.8/site-packages/archivebox/templates
- CUSTOM_TEMPLATES_DIR - disabled
[i] Secrets locations:
- CHROME_USER_DATA_DIR - disabled
- COOKIES_FILE - disabled
[i] Data locations:
√ OUTPUT_DIR 9 files valid /var/db/archivebox
√ SOURCES_DIR 48 files valid ./sources
√ LOGS_DIR 1 files valid ./logs
√ ARCHIVE_DIR 1474 files valid ./archive
√ CONFIG_FILE 861.0 Bytes valid ./ArchiveBox.conf
√ SQL_INDEX 13.3 MB valid ./index.sqlite3
FliegendeWurst