Elasticsearch-powered search engine for finding charities and other non-profit organisations. Allows for:
- Importing data from nearly 20 sources in the UK, ensuring that duplicates are matched to one record.
- An elasticsearch index that can be queried.
- Adding org-ids to organisations.
- Reconciliation API for searching organisations, based on an optimised search query.
- Facility for uploading a CSV of charity names and adding a best-guess charity number for each.
- HTML pages for searching for a charity.
- Clone repository
- Create virtual environment (`python -m venv env`)
- Activate virtual environment (`env/bin/activate` or `env/Scripts\activate`)
- Install requirements (`pip install -r requirements.txt`)
- Install postgres
- Start postgres
- Create 2 postgres databases - one for admin (eg `ftc_admin`) and one for data (eg `ftc_data`)
- Install elasticsearch 7 - you may need to increase available memory (see below)
- Start elasticsearch
- Create `.env` file in root directory. Contents based on `.env.example`.
- Create the database tables (`python ./manage.py migrate --database=data && python ./manage.py migrate --database=admin && python ./manage.py createcachetable --database=admin`)
- Import data on charities (`python ./manage.py import_charities`)
- Import data on nonprofit companies (`python ./manage.py import_ch`)
- Import data on other non-profit organisations (`python ./manage.py import_all`)
- Add organisations to elasticsearch index (`python ./manage.py es_index`). Don't use the default `search_index` command as this won't set up aliases correctly.
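
The `--database=data` and `--database=admin` flags imply two Django database aliases. As a rough sketch of how such a pair could be declared in a settings module: this assumes the connection strings arrive in environment variables named `DATABASE_URL` and `DATABASE_ADMIN_URL` (the aliases used in the Dokku commands in this document). The helper `parse_postgres_url` and the fallback URLs are hypothetical, and the project's real settings may well differ.

```python
import os
from urllib.parse import unquote, urlparse


def parse_postgres_url(url):
    """Turn postgres://user:pass@host:port/name into a Django DATABASES entry."""
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


# Fallback URLs are placeholders for local development (assumed, not from the project).
DATABASES = {
    # Django requires a "default" key; it may be empty when every command
    # and query names its database explicitly.
    "default": {},
    "data": parse_postgres_url(
        os.environ.get("DATABASE_URL", "postgres://postgres@localhost:5432/ftc_data")
    ),
    "admin": parse_postgres_url(
        os.environ.get("DATABASE_ADMIN_URL", "postgres://postgres@localhost:5432/ftc_admin")
    ),
}
```

With aliases like these, `python ./manage.py migrate --database=data` applies migrations to the data database only, which is why the migrate step is run once per alias.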
SSH into server and run:
# create app
dokku apps:create ftc
# postgres
sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git postgres
dokku postgres:create ftc-db-data
dokku postgres:link ftc-db-data ftc --alias "DATABASE_URL"
dokku postgres:create ftc-db-admin
dokku postgres:link ftc-db-admin ftc --alias "DATABASE_ADMIN_URL"
# elasticsearch
sudo dokku plugin:install https://github.com/dokku/dokku-elasticsearch.git elasticsearch
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf; sudo sysctl -p
export ELASTICSEARCH_IMAGE="elasticsearch"
export ELASTICSEARCH_IMAGE_VERSION="7.7.1"
dokku elasticsearch:create ftc-es
dokku elasticsearch:link ftc-es ftc
# configure elasticsearch 7:
# https://github.com/dokku/dokku-elasticsearch/issues/72#issuecomment-510771763
# setup elasticsearch increased memory (might be needed)
nano /var/lib/dokku/services/elasticsearch/ftc-es/config/jvm.options
# replace `-Xms512m` with `-Xms2g`
# replace `-Xmx512m` with `-Xmx2g`
# restart elasticsearch
dokku elasticsearch:restart ftc-es
# Redirect
sudo dokku plugin:install https://github.com/dokku/dokku-redirect.git
dokku redirect:set ftc www.findthatcharity.uk findthatcharity.uk
# SSL
sudo dokku plugin:install https://github.com/dokku/dokku-letsencrypt.git
dokku letsencrypt:set ftc email your@email.tld
dokku letsencrypt:enable ftc
dokku letsencrypt:cron-job --add
On local machine:
git remote add dokku dokku@SERVER_HOST:ftc
git push dokku main
On Dokku server run:
# setup
dokku run ftc python ./manage.py migrate --database=data
dokku run ftc python ./manage.py migrate --database=admin
dokku run ftc python ./manage.py createcachetable --database=admin
# run import
dokku run ftc python ./manage.py charity_setup
dokku run ftc python ./manage.py import_oscr
dokku run ftc python ./manage.py import_charities
dokku run ftc python ./manage.py import_ch
dokku run ftc python ./manage.py import_other_data
dokku run ftc python ./manage.py import_all
dokku run ftc python ./manage.py es_index
The server uses Django. Run it with the following command:
python ./manage.py runserver
The server offers the following API endpoints:
- `/reconcile`: a reconciliation service API conforming to the OpenRefine reconciliation API specification.
- `/charity/12345`: look up information about a particular charity.
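
For orientation, the OpenRefine reconciliation protocol sends a batch of queries as a JSON object in a `queries` form parameter. The sketch below builds such a request for `/reconcile` using only the standard library. The base URL `http://localhost:8000` assumes the Django development server's default address, and the helper name `build_reconcile_request` is hypothetical. The response is not shown because its exact fields depend on the server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumed: default address of `manage.py runserver`


def build_reconcile_request(names, base_url=BASE_URL):
    """Build a POST request carrying one reconciliation query per name."""
    # Keys such as "q0", "q1" are arbitrary labels echoed back in the response.
    queries = {f"q{i}": {"query": name} for i, name in enumerate(names)}
    body = urlencode({"queries": json.dumps(queries)}).encode("utf-8")
    return Request(
        f"{base_url}/reconcile",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_reconcile_request(["Oxfam", "British Red Cross"])
# To send it against a running server:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       results = json.load(response)
```

The protocol also allows the same `queries` parameter on a GET request. Whether this server accepts GET, POST, or both is not stated here, so treat the method as something to confirm against the running service.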
Priorities:
- tests for ensuring data is correctly imported
- server tests
- use results of `server/recon_test.py` to produce the best reconciliation search query for use in the server (`recon_test_7` seems the best at the moment)
- threshold for when to use the result vs discard
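
The threshold item could be approached with a simple cut-off rule. This is an illustrative sketch only: the candidate shape (dicts with `id` and `score`), the function name `pick_match`, and both default values are assumptions, not tuned figures from this project.

```python
def pick_match(candidates, min_score=10.0, min_margin=2.0):
    """Return the top candidate if it is strong and clearly ahead, else None.

    candidates: list of dicts with "id" and "score" keys (assumed shape).
    The default thresholds are arbitrary placeholders, not tuned values.
    """
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    if not ranked or ranked[0]["score"] < min_score:
        return None  # nothing scored high enough to use
    if len(ranked) > 1 and ranked[0]["score"] - ranked[1]["score"] < min_margin:
        return None  # too close to the runner-up to trust
    return ranked[0]
```

Checking the margin as well as the absolute score guards against near-identical names where the top two results are almost tied.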
Future development:
- upload a CSV file and reconcile each row with a charity
- allow updating a charity with additional possible names
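
One way the CSV idea could look, as a sketch only: read each row, pass the name to a lookup function, and append the best guess as a new column. The `lookup` callable, the column names `name` and `charity_number`, and the function `add_charity_numbers` are all hypothetical. In practice `lookup` would call the reconciliation endpoint; here a small stub with made-up data stands in for it.

```python
import csv
import io


def add_charity_numbers(csv_text, lookup, name_field="name", out_field="charity_number"):
    """Return CSV text with an extra column holding lookup(name) for each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [out_field]
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # lookup returns a charity number, or None when no match is good enough
        row[out_field] = lookup(row[name_field]) or ""
        writer.writerow(row)
    return output.getvalue()


# Stub lookup with made-up data, standing in for a real reconciliation call.
KNOWN = {"example charity": "0000000"}


def stub_lookup(name):
    return KNOWN.get(name.strip().lower())


result = add_charity_numbers("name\nExample Charity\nUnknown Org\n", stub_lookup)
```

Rows without a confident match keep an empty cell rather than a wrong number, so they can be reviewed by hand.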
coverage run -m pytest && coverage html
python -m http.server -d htmlcov --bind 127.0.0.1 8001