Welcome to Omnipedia, a tool that parses Wikipedia articles and evaluates them against a predefined taxonomy using a large language model. This README explains the codebase: the file layout, the main classes (ontology), and a high-level system diagram, so you can get up to speed quickly.
Omnipedia automates the process of:
- Processing a Taxonomy: Loads and processes the taxonomy to extract actionable writing requirements.
- Parsing Articles: Converts Wikipedia articles into a structured format for analysis.
- Evaluating Articles: Assesses articles against the extracted taxonomy requirements using a language model (OpenAI's GPT-4).
- Providing Feedback: Generates scores and feedback for each section of the article based on adherence to the taxonomy.
The project consists of multiple Python scripts organized into components:
project_root/
├── components/
│   ├── style_guides/
│   │   ├── __init__.py
│   │   ├── style_guide_loader.py
│   │   └── requirements_processor.py
│   ├── articulator/
│   │   ├── __init__.py
│   │   └── articulator.py
│   ├── evaluator/
│   │   ├── __init__.py
│   │   └── evaluator.py
│   ├── library/
│   │   ├── __init__.py
│   │   ├── text_processing.py
│   │   └── evaluation_functions.py
│   ├── storage/
│   │   ├── __init__.py
│   │   ├── decomposed_content.py
│   │   └── evaluation_results.py
│   └── viewer_editor/
│       ├── __init__.py
│       └── viewer.py
├── utils/
│   ├── __init__.py
│   └── config.py
├── main.py
├── taxonomy.json
├── article.txt
├── requirements.txt
└── README.md
The project relies on the following Python packages:
- aiofiles
- mwparserfromhell
- nltk
- openai
- logging (part of the Python standard library, so it needs no installation)
Ensure all dependencies are installed:
pip install -r requirements.txt
The config.py file in the utils directory handles configuration:
- TAXONOMY_LABELS: Predefined labels for article sections.
- Logging Setup: Configures logging level and format.
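As a rough illustration of the two configuration items above, a config module might look like the sketch below. The label values, the logger name "omnipedia", and the setup_logging helper are assumptions made for this example; the actual contents of utils/config.py may differ.

```python
import logging

# Hypothetical section labels; the real list is defined in utils/config.py.
TAXONOMY_LABELS = ["Lead", "Background", "Content", "References"]


def setup_logging(level: int = logging.INFO) -> logging.Logger:
    """Configure the logging level and format, then return a project logger."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    logger = logging.getLogger("omnipedia")  # assumed logger name
    logger.setLevel(level)
    return logger


logger = setup_logging()
logger.info("Loaded %d taxonomy labels", len(TAXONOMY_LABELS))
```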
Custom exceptions are defined in various components for better error management.
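One common way to organize such exceptions is a shared base class with one subclass per component. The class names and the load_text helper below are hypothetical and only illustrate the pattern; they are not taken from the Omnipedia source.

```python
class OmnipediaError(Exception):
    """Hypothetical base class so callers can catch every project error at once."""


class TaxonomyLoadError(OmnipediaError):
    """Hypothetical: the taxonomy file is missing or malformed."""


class ArticleParseError(OmnipediaError):
    """Hypothetical: the article could not be read or parsed."""


def load_text(path: str) -> str:
    """Read an article file, translating low-level I/O errors into a project error."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except OSError as exc:
        # Chain the original error so the root cause stays in the traceback.
        raise ArticleParseError(f"cannot read article: {path}") from exc
```

Callers can then handle a specific failure (except ArticleParseError) or any project failure (except OmnipediaError) without catching unrelated exceptions.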
The code utilizes custom classes for data management. Key classes include:
+------------------------+
| Requirement            |
+------------------------+
| name                   |
| description            |
| applicability          |
+------------------------+
+-------------------+
| EvaluatedSentence |
+-------------------+
| sentence          |
| section           |
| title             |
| score             |
| color             |
| feedback          |
+-------------------+
+-------------+
| ArticleNode |
+-------------+
| title       |
| content     |
| section_type|
+-------------+
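The three classes above could be expressed as Python dataclasses, as in the sketch below. The field names come from the diagrams; the field types and the sample values are assumptions, since the diagrams do not specify them.

```python
from dataclasses import dataclass, asdict


@dataclass
class Requirement:
    name: str
    description: str
    applicability: str  # assumed to describe where the requirement applies


@dataclass
class ArticleNode:
    title: str
    content: str
    section_type: str


@dataclass
class EvaluatedSentence:
    sentence: str
    section: str
    title: str
    score: float  # assumed numeric; the scale is not stated in this README
    color: str
    feedback: str


# Sample values are invented for illustration only.
req = Requirement("Neutral tone", "Avoid promotional language.", "all sections")
node = ArticleNode("History", "The city was founded long ago.", "body")
print(asdict(req))
```

Dataclasses convert cleanly to dictionaries with asdict, which is convenient when results have to be written out as JSON.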
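- StyleGuideLoader (components/style_guides/style_guide_loader.py): Loads the taxonomy from a JSON file.
- Articulator (components/articulator/articulator.py): Parses Wikipedia articles into structured sections.
- Evaluator (components/evaluator/evaluator.py): Evaluates parsed article sections against the taxonomy requirements.
- EvaluationFunctions (components/library/evaluation_functions.py): Handles communication with the language model API.
- Viewer (components/viewer_editor/viewer.py): Visualizes evaluation results.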
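To make the parsing step concrete, the sketch below splits wikitext into (title, body) sections using only the standard library. It is not the project's parser: the dependency list names mwparserfromhell, and this example swaps in a plain regular expression to stay self-contained. The split_sections function and the "Lead" label for text before the first heading are assumptions.

```python
import re
from typing import List, Tuple

# Matches wikitext headings such as "== History ==" or "=== Early years ===".
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_sections(wikitext: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs; text before the first heading is titled 'Lead'."""
    sections = []
    title, start = "Lead", 0
    for match in HEADING.finditer(wikitext):
        # Close the previous section at the start of this heading.
        sections.append((title, wikitext[start:match.start()].strip()))
        title, start = match.group(2), match.end()
    sections.append((title, wikitext[start:].strip()))
    return sections


sample = "Intro text.\n== History ==\nOld stuff.\n=== Early ===\nMore.\n"
for title, body in split_sections(sample):
    print(f"{title}: {body}")
```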
- Initialization:
  - Load settings and initialize logging.
  - Set up the OpenAI client with the specified API key.
- Taxonomy Loading:
  - Use StyleGuideLoader to load the taxonomy (taxonomy.json).
- Article Parsing:
  - Use Articulator to parse the article (article.txt).
  - Generate a structured representation of the article's sections.
- Article Evaluation:
  - Use Evaluator to evaluate each section of the article.
  - Generate scores and feedback based on adherence to the taxonomy.
- Output:
  - Use Viewer to display the evaluation results for each section.
  - Save evaluation results to a JSON file.
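The workflow steps above can be sketched end to end as follows. Everything here is a simplified stand-in: the taxonomy schema (a top-level "requirements" list), the function names, and the keyword_score function that replaces the language-model call are all assumptions, chosen so the example runs without an API key.

```python
import json
from typing import Callable, Dict, List, Tuple


def load_requirements(taxonomy_json: str) -> List[Dict[str, str]]:
    """Parse the taxonomy; assumes a top-level 'requirements' list (hypothetical schema)."""
    return json.loads(taxonomy_json)["requirements"]


def keyword_score(body: str, requirement: Dict[str, str]) -> float:
    """Stand-in for the model call: 1.0 if the requirement name appears in the text."""
    return 1.0 if requirement["name"].lower() in body.lower() else 0.0


def evaluate(
    sections: List[Tuple[str, str]],
    requirements: List[Dict[str, str]],
    score_fn: Callable[[str, Dict[str, str]], float],
) -> List[Dict[str, object]]:
    """Score every section against every requirement."""
    return [
        {"section": title, "requirement": req["name"], "score": score_fn(body, req)}
        for title, body in sections
        for req in requirements
    ]


taxonomy = '{"requirements": [{"name": "citation", "description": "Cite sources."}]}'
sections = [("Lead", "Intro with a citation."), ("History", "No sources here.")]
results = evaluate(sections, load_requirements(taxonomy), keyword_score)
for row in results:
    print(row)
```

Passing the scoring function as a parameter keeps the pipeline testable offline; a real run would supply a function that queries the model instead.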
+----------------+     +--------------------+     +-----------------+
|                |     |                    |     |                 |
| taxonomy.json  | --> |  StyleGuideLoader  | --> | Loaded Taxonomy |
|                |     |                    |     |                 |
+----------------+     +--------------------+     +-----------------+
                                                           |
+----------------+     +---------------+                   |
|                |     |               |                   |
|  article.txt   | --> |  Articulator  |                   |
|                |     |               |                   |
+----------------+     +---------------+                   |
                               |                           |
                               v                           v
                            +--------------------------------+
                            |                                |
                            |           Evaluator            |
                            | (uses EvaluationFunctions API) |
                            |                                |
                            +--------------------------------+
                                             |
                                             v
                                  +--------------------+
                                  |                    |
                                  | Evaluation Results |
                                  |                    |
                                  +--------------------+
                                             |
                                             v
                                  +--------------------+
                                  |                    |
                                  |       Viewer       |
                                  |                    |
                                  +--------------------+
- Set Up Environment:
  - Ensure all dependencies are installed.
  - Set your OpenAI API key as an environment variable:
    export OPENAI_API_KEY='your-api-key'
- Prepare Files:
  - Place your taxonomy in taxonomy.json.
  - Place the article to be evaluated in article.txt.
- Execute the Script:
  python main.py
- View Results:
  - Evaluation results will be displayed in the console.
  - Results will also be saved to evaluation_results.json.
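Two of the steps above, checking the API key and saving results, can be illustrated with a short sketch. The function names require_api_key and save_results are hypothetical; only the OPENAI_API_KEY variable and the evaluation_results.json file name come from this README.

```python
import json
import os
from typing import Dict, List, Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key, failing early with a clear message if it is unset."""
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running main.py")
    return key


def save_results(results: List[Dict], path: str = "evaluation_results.json") -> None:
    """Write evaluation results as indented UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2, ensure_ascii=False)
```

Checking the key before any parsing begins avoids finishing the local work only to fail on the first API call.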
Omnipedia automates the evaluation of Wikipedia articles against a predefined taxonomy using advanced language models. By parsing both the taxonomy and the article, it provides detailed feedback on adherence, helping improve the quality and consistency of Wikipedia content.
Note: Ensure that the OpenAI API key is correctly set up in your environment before running the script.