- Docker images
- Enrichement
- Enrichement-Phase II - under construction
- Illustrations
- KD model
- Scrapper : Python project where the various spiders are implemented
- Script SQL : Script to build the content database
- Use Case A:
  - Use Case A Widgets Demo: for demonstration of ipywidgets only, as part of deliverable D3.1. It is superseded by the code in the following folders, which is part of deliverable D3.2.
  - Use Case A Query builder: script towards a query builder, still based only on scraped content (the latest version from both Glossary articles and Statistics Explained articles).
  - Use Case A Faceted search: faceted search, with inputs from the database (SE articles) except for one file (scraped categories per article, which are in the process of being transferred to the knowledge database). Among other things, the code assigns the majority of the SE articles to one or more themes, sub-themes and categories.
  - Use Case A Graphical exploration: two applications for graphical exploration, one in R Shiny and another in MS Power BI. See separate description in this link. The description includes links to short documentation for the two applications.
Build and start the Docker container by running docker-compose up with the docker-compose.yml file, which is in the Docker images folder.
In a browser, go to http://localhost:8890 and, on the Virtuoso front end, click Conductor and log in with the username dba and the password defined in the docker-compose.yml file.
Go to System Admin/User Accounts. To be able to launch SPARQL queries, edit the account of the 'dba' user as follows:
The Script SQL folder contains various files that help build the content database. To run them, go to the Database/Interactive SQL tab.
If it is your first instantiation, please use the
If you are updating an existing database, the needed scripts can be found in the
Some tables have to be filled in order for the project to work, such as:
- Named entities
- Modality
As before, if it is your first instantiation of the database, please use the .
If it is an update, the needed scripts can be found in the
Once the database is set up, you can start launching the various
To gather the glossary, we did not scrape the data; instead, we used the bulk download option and created SQL queries from it.
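The conversion from a bulk download to SQL queries can be sketched as follows. This is only an illustration, not the project's actual script: the table name `glossary_demo`, the columns `id` and `title`, and the tab-separated layout of the sample are all assumptions made for the example.

```python
import csv
import io


def sql_literal(value):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def rows_to_inserts(tsv_text, table, columns):
    """Turn tab-separated rows into one INSERT statement per row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    statements = []
    for row in reader:
        if len(row) != len(columns):
            continue  # skip blank or malformed rows
        values = ", ".join(sql_literal(cell.strip()) for cell in row)
        statements.append(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});"
        )
    return statements


# Hypothetical sample standing in for a bulk-download file.
sample = "1\tAccident at work\n2\tFarmer's holding\n"
statements = rows_to_inserts(sample, "glossary_demo", ["id", "title"])
for statement in statements:
    print(statement)
```

The generated statements could then be pasted into the Interactive SQL tab. Doubling single quotes keeps titles containing apostrophes from breaking the statement.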
Then the , to do so, use the following Jupyter notebook:
Finally, you can add the last queries:
As before, we did not scrape the following data; we first downloaded the raw data and created SQL queries to fill the database.
The first step is to fill the and then using
launch each
.
At this stage, the dictionaries and codes are all in the content database. However, we found that some codes have to be added to the time dictionary for our work on the datasets to function. You'll find the added elements
Then you can add some and then using
launch each
to add the links between the datasets and the dictionaries.
Before populating the KDB, the ontology file must be added to the database. Go to Virtuoso Conductor/Linked Data/Quad Store Upload and load the NLP4Stat ontology by uploading the .owl file found in https://github.com/eurostat/NLP4Stat/tree/main/KD%20model/. In the "Named Graph IRI*" field, enter https://nlp4statref/knowledge/ontology/. This IRI will be used when populating the KDB. An ontology that has already been added can be deleted by going to Linked Data/Graphs/Graphs and clicking the Delete button next to the ontology you wish to delete.
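One way to confirm that the upload landed in the expected named graph is to count its triples through the SPARQL endpoint. The sketch below only builds the request; it assumes the endpoint is exposed at http://localhost:8890/sparql (Virtuoso's usual default, to be adapted if your setup differs), and the function names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: default Virtuoso SPARQL endpoint on the port used above.
ENDPOINT = "http://localhost:8890/sparql"
# Named graph IRI entered during the Quad Store Upload step.
ONTOLOGY_GRAPH = "https://nlp4statref/knowledge/ontology/"


def count_query(graph_iri):
    """SPARQL query counting all triples stored in one named graph."""
    return (
        "SELECT (COUNT(*) AS ?triples) "
        f"WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"
    )


def build_request(endpoint, query):
    """Prepare a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, count_query(ONTOLOGY_GRAPH))
print(request.full_url)

# To actually send it once Virtuoso is running:
#   import json, urllib.request
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response)["results"]["bindings"])
```

A count of zero would suggest that the file was loaded under a different graph IRI, or not loaded at all.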
The KD_Population folder contains notebooks that populate the knowledge database with elements stored in the content database, using SPARQL queries. A demo notebook is available for selecting elements contained in the KDB. The ESTAT_Populate_KDB notebooks add all the elements that are currently stored and mapped (i.e. whose relations are modeled). As the process does not check whether a triple is already present before adding it, the notebook should be launched only once. Do not hesitate to delete the ontology and add it again before re-populating the KDB with the notebook.
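If a presence check were wanted, one possible approach is to run an ASK query before each insertion. The sketch below is not taken from the project notebooks: the helper names are hypothetical, the example.org IRIs are placeholders, only IRI objects (no literals) are handled, and the two callables that would talk to the endpoint are replaced by in-memory stand-ins so that the logic can be tried without a running database.

```python
def graph_block(graph, triple):
    """Render one triple of IRIs inside a GRAPH clause."""
    s, p, o = triple
    return f"{{ GRAPH <{graph}> {{ <{s}> <{p}> <{o}> . }} }}"


def ask_query(graph, triple):
    """ASK query that is true when the triple already exists in the graph."""
    return "ASK WHERE " + graph_block(graph, triple)


def insert_statement(graph, triple):
    """SPARQL Update statement adding the triple to the graph."""
    return "INSERT DATA " + graph_block(graph, triple)


def insert_if_absent(graph, triple, ask, update):
    """Insert the triple unless `ask` reports it is present; return True if inserted."""
    if ask(ask_query(graph, triple)):
        return False
    update(insert_statement(graph, triple))
    return True


# In-memory stand-ins for the functions that would query the endpoint.
executed = []


def fake_update(statement):
    executed.append(statement)


def fake_ask(query):
    body = query[len("ASK WHERE "):]
    return any(statement == "INSERT DATA " + body for statement in executed)


GRAPH = "https://nlp4statref/knowledge/ontology/"
# Placeholder IRIs, for illustration only.
TRIPLE = (
    "https://example.org/article/1",
    "https://example.org/hasTheme",
    "https://example.org/theme/economy",
)

first = insert_if_absent(GRAPH, TRIPLE, fake_ask, fake_update)
second = insert_if_absent(GRAPH, TRIPLE, fake_ask, fake_update)
print(first, second, len(executed))
```

Passing `ask` and `update` as parameters keeps the check independent of the client library used to reach the endpoint.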
A knowledge graph can be created using the elements of the Knowledge_graph folder. A dedicated readme file is there.
To launch the various parts of the project from a Windows environment, please follow the procedure described in Virtuoso Setup.
The Virtuoso database is now set up. The next step is to fill the content database with scraped content. Please refer to the Scrapper folder.