Open
Labels: question (Further information is requested)
Description
I noticed that htmldate utilizes the find_date function, which internally relies on examine_header.
Does it make sense to parse the response header from the server? Do servers typically default this to the current date?
Here’s an example where this date is extracted: '2024-12-02'...
from htmldate import find_date
find_date(
"https://octopus.energy/blog/agile-octopus-bigger-story/",
original_date=True,
extensive_search=True,
)
But the actual publication date is...
If I comment out the lines that call examine_header, the correct date (2022-12-13) is extracted in the "# last resort" step.
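One way to narrow this down is to list the date-like values in the page's HTML `<meta>` tags. This assumes examine_header reads metadata from the HTML `<head>` rather than from the HTTP response headers, which is worth confirming in the htmldate source. The sketch below uses only the standard library. `MetaDateCollector`, `list_meta_dates`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is invented for illustration, not taken from the page above.

```python
import re
from html.parser import HTMLParser

# Matches ISO-style dates such as 2024-12-02 anywhere in a string.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


class MetaDateCollector(HTMLParser):
    """Collect <meta> tags whose content attribute contains an ISO-style date."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (attribute key, date string)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        content = attrs.get("content") or ""
        match = DATE_RE.search(content)
        if match:
            # Report whichever identifying attribute the tag carries.
            key = (
                attrs.get("property")
                or attrs.get("name")
                or attrs.get("itemprop")
                or "?"
            )
            self.found.append((key, match.group(0)))


def list_meta_dates(html):
    """Return (key, date) pairs for every date-bearing <meta> tag in html."""
    parser = MetaDateCollector()
    parser.feed(html)
    return parser.found


# Hypothetical page: the tag names and values are invented for illustration.
SAMPLE_HTML = """
<html><head>
<meta property="article:modified_time" content="2024-12-02T10:00:00Z">
<meta name="description" content="Some description">
</head><body><p>Published 13 December 2022</p></body></html>
"""

print(list_meta_dates(SAMPLE_HTML))
# prints [('article:modified_time', '2024-12-02')]
```

Running `list_meta_dates` on the real page's HTML would show whether a modification-time tag, rather than a publication-time tag, carries the 2024-12-02 value.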