This repository was archived by the owner on Sep 20, 2024. It is now read-only.

fccn/nau-site-map-exporter


Site map exporter for NAU project

This repository contains a Python script that reads a sitemap and extracts its URLs.

Some custom code was needed because the NAU STAGE environment is protected by basic authentication, which prevents web search engines from indexing its content.
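As a rough illustration of the idea, and not the actual contents of export.py, the sketch below fetches a sitemap with an optional basic authentication header and collects every `<loc>` value. The function names `basic_auth_header`, `fetch` and `extract_locs` are hypothetical; only the standard library is assumed.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def basic_auth_header(user, password):
    """Build the HTTP Basic Authorization header (hypothetical helper)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def fetch(url, user=None, password=None):
    """Download a sitemap, sending credentials only when both are given."""
    headers = basic_auth_header(user, password) if user and password else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def extract_locs(xml_text):
    """Return the text of every <loc> element found in the sitemap XML."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
```

With these helpers, `extract_locs(fetch(url, user, password))` would yield the list of URLs to print, one per line.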

Installation

Create a virtual environment.

virtualenv venv --python=python3
. venv/bin/activate

Install the package requirements in the virtual environment.

pip install -r requirements.txt

Parameters

Parameter       Required   Description
url             True
--user          False
--pass          False
--remove_host              If passed, removes the protocol and hostname from the output
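To make the effect of --remove_host concrete, here is a minimal sketch of stripping the protocol and hostname from a URL. It is an assumption about the intended behaviour, not the code in export.py; `strip_host` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_host(url):
    """Drop scheme and host, keeping path, query and fragment (hypothetical helper)."""
    parts = urlsplit(url)
    # An empty path is normalised to "/" so the site root still produces a line.
    return urlunsplit(("", "", parts.path or "/", parts.query, parts.fragment))
```

Removing the host is what makes the STAGE and PROD outputs comparable, since the two environments live on different domains.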

Execution

For WordPress the sitemap is located at /sitemap_index.xml, but on Richie it is located at /sitemap.xml. Example:

Export STAGE environment that has Richie:

python export.py https://www.stage.nau.fccn.pt/sitemap.xml --user <USER> --password <PASSWORD> --remove_host true > stage.txt

Export PROD environment that has WordPress:

python export.py https://www.nau.edu.pt/sitemap_index.xml --remove_host true > prod.txt
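Under the sitemaps.org protocol, a sitemap index file has a `<sitemapindex>` root that lists child sitemaps, while a plain sitemap has a `<urlset>` root that lists pages. Whether export.py follows child sitemaps is not stated here; the sketch below only shows how the two kinds of document could be told apart. `classify_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the sitemaps.org namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def classify_sitemap(xml_text):
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    tag = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
    if tag == "sitemapindex":
        path = "sm:sitemap/sm:loc"
        kind = "index"
    else:
        path = "sm:url/sm:loc"
        kind = "urlset"
    locs = [el.text.strip() for el in root.findall(path, NS) if el.text]
    return kind, locs
```

For an index, each returned URL would itself need to be fetched and parsed to reach the page URLs.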

Then you can use a comparison program, such as diff or meld, to compare both files.
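If a graphical or command-line diff tool is not at hand, the comparison can also be approximated in a few lines of Python. This is only a sketch that treats each file as a set of paths and ignores ordering; `compare_paths` is a hypothetical name, and the file names match the examples above.

```python
def compare_paths(stage_lines, prod_lines):
    """Return (paths only in STAGE, paths only in PROD), each sorted."""
    stage = {line.strip() for line in stage_lines if line.strip()}
    prod = {line.strip() for line in prod_lines if line.strip()}
    return sorted(stage - prod), sorted(prod - stage)


def compare_files(stage_file="stage.txt", prod_file="prod.txt"):
    """Read both export files and print the differences."""
    with open(stage_file) as s, open(prod_file) as p:
        only_stage, only_prod = compare_paths(s, p)
    for path in only_stage:
        print(f"only in STAGE: {path}")
    for path in only_prod:
        print(f"only in PROD:  {path}")
```

Calling `compare_files()` in the directory that holds stage.txt and prod.txt prints the paths missing from either environment.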

About

A sitemap utility script that exports a sitemap from a URL
