NLP4Stat

Project organisation

Docker images
Enrichment
Enrichment-Phase II - under construction
Illustrations
KD model
Scrapper: Python project in which the various spiders are implemented
Script SQL: scripts to build the content database
Use Case A:
- Use Case A Widgets Demo: for demonstration of ipywidgets only, as part of deliverable D3.1. This is superseded by the code in the following folders, which is part of deliverable D3.2.
- Use Case A Query builder: script towards a query builder, still based only on scraped content (the latest version from both Glossary articles and Statistics Explained articles).
- Use Case A Faceted search: faceted search, with inputs from the database (SE articles) except for one file (scraped categories per article, which are in the process of being transferred to the knowledge database). Among other things, the code assigns the majority of the SE articles to one or more themes, sub-themes and categories.
- Use Case A Graphical exploration: two applications for graphical exploration, one in R Shiny and another in MS Power BI. See the separate description in this link. The description includes links to short documentation for the two applications.

Project instantiation

1. Docker image

Create the Docker image by running docker-compose up against the docker-compose.yml file. The docker-compose.yml file is in the Docker Images folder.

2. Connect to the Virtuoso docker image

In a browser, go to http://localhost:8890. On the Virtuoso frontend/GUI, click on Conductor and log in with the username dba and the password defined in the docker-compose.yml file.
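Before opening the browser, it can be handy to check from a script that the container is answering. The following is a minimal sketch, not part of the repository: the function name virtuoso_is_up is hypothetical, and the only value taken from this README is the address http://localhost:8890.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Address given in this README for the Virtuoso frontend.
VIRTUOSO_URL = "http://localhost:8890"


def virtuoso_is_up(base_url: str = VIRTUOSO_URL, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on base_url succeeds (hypothetical helper)."""
    try:
        with urlopen(base_url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

Calling virtuoso_is_up() right after starting the container may return False for a short while, since Virtuoso needs some time to start.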

3. Virtuoso user parameters

Go to System Admin/User Accounts. To be able to launch SPARQL queries, edit the user account for the 'dba' user as follows:

4. Content database

In the Script SQL folder you will find various files that help build the content database. To run them, go to the Database/Interactive SQL tab.

4.1 Structure

If it is your first instantiation, please use the

If you are updating an existing database, the needed scripts can be found in the

4.2 Static data

Some tables have to be filled in order for the project to work, such as:

Named entities
Modality

4.2.1 Statistics Explained Data

Like before, if it is your first instantiation of the database, please use the .

If it is an update, the scripts needed can be found in the

Once the database is set you can start launching the various

4.2.2 Eurostat glossary

To gather the glossary, instead of scraping the data we used the bulk download option and created SQL queries from it.

First the have to be launched.

Then the , in order to do it use the following Jupyter notebook:

Finally, you can add the last queries:

4.2.3 Dictionary and Datasets

As before, we did not scrape the following data. We first downloaded the raw data and then created SQL queries to fill the database.

The first step is to fill the and then using launch each .

At this stage, the dictionaries and codes are all in the content database. However, we found that some codes have to be added to the time dictionary for our work on the datasets to function. You'll find the added elements

Then you can add some and then using launch each to add the links between the datasets and the dictionaries.

5. Knowledge database

5.1 Loading and deleting ontologies

Before populating the KDB, the ontology file must be added to the database. Go to Virtuoso Conductor/Linked Data/Quad Store Upload and load the NLP4Stat ontology by uploading the .owl file found in https://github.com/eurostat/NLP4Stat/tree/main/KD%20model/. In the "Named Graph IRI*" field, enter https://nlp4statref/knowledge/ontology/. This IRI will be used in the process of populating the KDB. An ontology that has already been added can be deleted by going to Linked Data/Graphs/Graphs and clicking the Delete button associated with the ontology you wish to delete.
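To confirm that the upload landed in the expected named graph, one option is to count the triples stored under that IRI. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the SPARQL endpoint is exposed at http://localhost:8890/sparql (the usual Virtuoso default) and accepts the format request parameter; neither is stated in this README.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Named graph IRI given in this README.
ONTOLOGY_GRAPH = "https://nlp4statref/knowledge/ontology/"
# Assumption: default Virtuoso SPARQL endpoint on the same host and port.
SPARQL_ENDPOINT = "http://localhost:8890/sparql"


def count_query(graph_iri: str) -> str:
    """Build a SPARQL query counting all triples in one named graph."""
    return f"SELECT (COUNT(*) AS ?n) WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"


def request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode the query as a GET request asking for JSON results."""
    params = urlencode({"query": query, "format": "application/sparql-results+json"})
    return f"{endpoint}?{params}"


def count_triples(graph_iri: str = ONTOLOGY_GRAPH) -> int:
    """Send the count query and return the number of triples (needs a running server)."""
    with urlopen(request_url(count_query(graph_iri)), timeout=30) as response:
        payload = json.load(response)
    return int(payload["results"]["bindings"][0]["n"]["value"])
```

A count of zero after the upload would suggest that the file was loaded under a different graph IRI, or not loaded at all.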

5.2 Knowledge database population

The KD_Population folder contains notebooks used for populating the knowledge database with elements stored in the content database, using SPARQL queries. A demo notebook is available to select elements contained in the KDB. The ESTAT_Populate_KDB notebooks add all elements that are currently stored and mapped (i.e. whose relations are modeled). Because the process does not check whether a triple is already present before adding it, the notebook should be launched only once. Do not hesitate to delete the ontology and add it again before repopulating the KDB with the notebook.
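The missing verification step mentioned above could be sketched as an ASK query issued before each insertion. This is not how the ESTAT_Populate_KDB notebooks work; it is a hypothetical illustration. All function names are invented for the example, the graph IRI is assumed to be the one entered during the ontology upload, and the two callables that actually talk to the SPARQL endpoint are left to the caller.

```python
from typing import Callable, Tuple

# Assumption: population targets the graph IRI entered at upload time.
GRAPH = "https://nlp4statref/knowledge/ontology/"

# Terms are expected already serialized, e.g. "<http://ex.org/a>" or '"text"'.
Triple = Tuple[str, str, str]


def ask_query(triple: Triple, graph: str = GRAPH) -> str:
    """Build an ASK query testing whether the triple exists in the graph."""
    s, p, o = triple
    return f"ASK {{ GRAPH <{graph}> {{ {s} {p} {o} }} }}"


def insert_query(triple: Triple, graph: str = GRAPH) -> str:
    """Build an INSERT DATA update adding the triple to the graph."""
    s, p, o = triple
    return f"INSERT DATA {{ GRAPH <{graph}> {{ {s} {p} {o} . }} }}"


def insert_if_absent(
    triple: Triple,
    run_ask: Callable[[str], bool],
    run_update: Callable[[str], None],
    graph: str = GRAPH,
) -> bool:
    """Insert the triple only if the ASK query reports it missing.

    Returns True when an insert was sent, False when it was skipped.
    """
    if run_ask(ask_query(triple, graph)):
        return False
    run_update(insert_query(triple, graph))
    return True
```

Passing the endpoint calls in as run_ask and run_update keeps the sketch independent of any particular SPARQL client library, at the cost of one extra round trip per triple.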

5.3 Knowledge graph

A knowledge graph can be created using the elements of the Knowledge_graph folder. A dedicated readme file is provided in that folder.

6. Virtuoso Bundle

To launch the various parts of the project from a Windows environment, please follow the procedure described in Virtuoso Setup.

The Virtuoso database is now set up. The next step is to fill the content database with scraped content. Please refer to the Scrapper folder.
