This is a copyright violation detector web tool for Wikipedia articles running on Wikimedia Cloud Services at copyvios.toolforge.org.
It can search the web for content similar to a given article, and graphically compare an article to specific URLs. Some technical details are expanded upon in a blog post, though much of that post is outdated.
- If using Toolforge, clone the repository to ~/www/python/src, or otherwise symlink it to that directory.

- Create a virtual environment and install the dependencies. On Toolforge, this should be in ~/www/python/venv, otherwise it can be in a subdirectory of the git project named .venv:

      python3 -m venv .venv
      . .venv/bin/activate
      pip install -e .

- If you intend to modify CSS or JS, install the frontend dependencies:

      npm install -g uglify-js cssnano postcss postcss-cli

- Create an SQL database with the tables defined by schema.sql.

- Create an earwigbot instance in .earwigbot (run earwigbot .earwigbot). In .earwigbot/config.yml, fill out the connection info for the database by adding the following to the wiki section:

      copyvios:
        oauth:
          consumer_token: <oauth consumer token>
          consumer_secret: <oauth consumer secret>
        sql:
          engine: mysql
          host: <hostname of database server>
          db: <name of database>

- Run make to minify JS and CSS files after making any frontend changes.

- Start your WSGI server pointing to app:app. For production, uWSGI or Gunicorn are likely good options. For development, use flask run.
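
The app:app target in the last step follows the usual module:attribute convention for WSGI servers: the server imports the module named app and calls the object named app inside it. As a rough illustration of what a server does with that object, the standard-library sketch below invokes a WSGI callable once and collects the response. Both demo_app and call_wsgi are hypothetical names invented for this example and are not part of this repository; demo_app only stands in for the real Flask application.

```python
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """Hypothetical stand-in for the object a server loads from app:app."""
    body = f"requested {environ['PATH_INFO']}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_wsgi(app, path="/"):
    """Invoke a WSGI callable once and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys WSGI requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the application reports instead of writing to a socket.
        captured["status"] = status
        captured["headers"] = headers

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        # WSGI asks the caller to close the iterable if it supports close().
        if hasattr(result, "close"):
            result.close()
    return captured["status"], dict(captured["headers"]), body


if __name__ == "__main__":
    status, headers, body = call_wsgi(demo_app, "/")
    print(status, body.decode("utf-8"))
```

The same helper could in principle be pointed at the real application (for example via from app import app inside the activated virtual environment), but that import will likely also need the earwigbot configuration and database from the earlier steps to be in place.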