Data Warehousing and Mining

This document outlines the coursework requirements for a Data Warehousing and Mining course. It consists of two parts. The first part involves designing and implementing a data warehouse for the UK National Health Service using MySQL. Students must define a star or snowflake schema, populate tables with dummy data, and write six SQL queries demonstrating the warehouse's capabilities. The second part involves applying classification and clustering techniques to datasets using WEKA software and analyzing the results. Students must compare algorithm performance, analyze how results relate to dataset properties, and provide recommendations on technique selection based on a dataset's features.


DATA WAREHOUSING AND MINING COURSEWORK

PDAM – 2018/2019

Unit Name: Data Warehousing and Mining Coursework

Unit Code: PDAM - U25764
Unit Coordinator:
Submission Deadline: Part I: Friday 14th December 2018 5:00pm
Part II: Friday 10th May 2019 5:00pm
Weight: Part I 40% - Part II 60%
Assesses: All learning outcomes.

IMPORTANT NOTE

The individual components in this coursework are NOT group work.

For individual and group components of this project, any unacknowledged copying of
either printed material or software from any other person or source (including
electronic media) constitutes plagiarism, which is a serious disciplinary offence; any
cases of plagiarism will be handled using the University disciplinary procedures.

Please ensure that your coursework is anonymous. Your NAME must not appear
anywhere on the coursework or the coversheet. Please use your ID only. Work is to be
submitted online on Moodle.

First Submission: Due on Friday 14th December 2018 5:00pm

Designing and implementing a data warehouse for NHS

The aim of this project is to design and implement a data warehouse for the National Health Service
(NHS) using MySQL. The NHS consists of a number of treatment centers (e.g. hospitals, surgeries,
walk-in centers, etc.). Each center has a number of staff members (doctors, consultants, nurses, etc.).
Each member has one occupation and one or more specialties (e.g. Occupation: Consultant; Specialties:
orthopedic, joint replacement). The NHS keeps track of its patients over time: their diagnoses, the kind of
treatment they have received, the cost of each treatment, etc. There are two types of patients.
Outpatient: a patient who is not hospitalized but has visited a GP, hospital, etc. Inpatient: a patient who has
been admitted to the hospital for a day or more. The NHS also keeps track of the drugs/operations used
in each treatment (if any).

The following is a sample of the main tables in the operational database; it is intended only to show the
core of the case study. You should expand and improve the model where required.

Patient (Patient_ID, Patient_DoB, Patient_County, Patient_City, Patient_Occupation..)
Treatment_Unit (Unit_Code, Unit_Type, Unit_County, Unit_City ….)
Staff (Staff_ID, Staff_Level, Staff_Specialization, )
Drug (Drug_Code, Drug_Cost, Drug_Type ….. )
Operation (Op_Code, OP_Type ….)
Treatment (Treat_ID, Treat_Type, Patient_ID, Diagnosis, Unit_Code, Staff_ID)
Treatment_Drug (Treat_ID, Drug_ID, TD_Date, TD_Duration, TD_Cost).
Treatment_Operation (Treat_ID, Operation_Code, TD_Date, TD_Duration, TD_Cost).

Your task: Design and implement a warehouse for the NHS using dummy data. The aim
of this DW demo is to demonstrate to NHS decision makers and consultants the benefits they can
gain from investing in a DW.

Step 1: Define a Star/Snowflake schema (i.e. normalized star schema). Your schema should cover
around two subjects.
Step 2: Create the tables using MySQL (you may use another DBMS, but you will need approval
from the unit coordinator, Dr. Mohamed Bader, first).
Step 3: Populate the tables with some dummy data.
Step 4: Write and run 6 SQL queries. Your queries must be meaningful, serve at least 3 different
stakeholders and demonstrate the strength of DW in supporting decision makers. Also, your queries
must cover all subjects in your DW (at least 2 to 3 queries per subject). You should provide a short
description of each query.
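
Steps 1 to 4 can be illustrated with a minimal sketch. It is not a model answer: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and every table name, column name and data value in it (dim_unit, dim_date, fact_treatment, the counties and costs) is a hypothetical choice made for illustration only. It shows one fact table with two dimensions and a single roll-up query of the kind a finance stakeholder might ask for.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; all table and column names
# below are hypothetical and only loosely follow the operational tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_unit (
    unit_key    INTEGER PRIMARY KEY,
    unit_type   TEXT,
    unit_county TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER,
    month    INTEGER
);
CREATE TABLE fact_treatment (
    treat_id   INTEGER PRIMARY KEY,
    unit_key   INTEGER REFERENCES dim_unit(unit_key),
    date_key   INTEGER REFERENCES dim_date(date_key),
    treat_cost REAL
);
""")

# Dummy rows, invented purely for the demo.
conn.executemany("INSERT INTO dim_unit VALUES (?, ?, ?)", [
    (1, "Hospital", "Hampshire"),
    (2, "Walk-in center", "Hampshire"),
    (3, "Hospital", "Kent"),
])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (20180115, 2018, 1),
    (20180220, 2018, 2),
])
conn.executemany("INSERT INTO fact_treatment VALUES (?, ?, ?, ?)", [
    (1, 1, 20180115, 100.0),
    (2, 1, 20180220, 50.0),
    (3, 2, 20180115, 20.0),
    (4, 3, 20180220, 200.0),
])


def cost_by_county(connection):
    """Total treatment cost per county: a roll-up over the unit dimension."""
    return connection.execute("""
        SELECT u.unit_county, SUM(f.treat_cost) AS total_cost
        FROM fact_treatment AS f
        JOIN dim_unit AS u ON u.unit_key = f.unit_key
        GROUP BY u.unit_county
        ORDER BY u.unit_county
    """).fetchall()


print(cost_by_county(conn))
```

The SELECT statement itself uses only standard SQL, so a query of this shape should carry over to MySQL with little or no change; the CREATE TABLE statements would likely need MySQL-specific types and key definitions.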

The deliverable of this component is a report that contains:

a. Short description of your project: the report should explain your design decisions.
[500 words max].
b. Your Star/Snowflake schema.
c. Screenshots of each query and its output along with a short description of each query.
Your screenshots must be clear and readable. If the query and/or the result of the
query are not clear and readable, zero will be awarded.

The submission is online through Moodle (the submission details will be available on Moodle).

This component of your coursework contributes 40% of the total mark for the unit assessment. The
marking criteria [in 100% breakup of marks] for this component are as follows:

25% Justification of design decisions and project explanation, as well as the organisation,
language style and clarity of the report.
35% The correctness, coverage, quality and novelty of the design and star/snowflake schema.
40% The quality and the coverage of the advanced SQL queries.

Second Submission: Due on Friday 10th May 2019 5:00pm (Two Tasks)

Task I 40%: using WEKA-Classification software and critical thinking

You are required to find between 20 and 40 different datasets. You are
then required to apply the following classification techniques, using the WEKA software, to the chosen
datasets:
(1) Decision tree (J48),
(2) Random Forest, and
(3) K-NN (IBk) (with K taking the value of 1 up to the number of class labels in the dataset)
Note: Random Forest was not covered in the lectures; part of this CW is to self-learn about
classification methods beyond what was covered in the lectures.
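
As a small illustration of the K-NN rule in item (3), the sketch below lists the K values to try for one dataset. The function name knn_k_values and the example labels are hypothetical; the only assumption is that the class attribute is available as a list of label values.

```python
def knn_k_values(class_labels):
    """Return the K values to try: 1 up to the number of distinct class labels."""
    n_classes = len(set(class_labels))
    return list(range(1, n_classes + 1))


# Hypothetical class column of a three-class dataset.
labels = ["low", "medium", "high", "low", "high"]
print(knn_k_values(labels))
```

Each value in the returned list corresponds to one IBk run on that dataset.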

Once you have applied the algorithms to all the datasets, you must complete the following
tasks:
1. Compare the performance of the applied techniques in terms of accuracy.
2. Analyse the results with regard to the dataset properties.
3. Write a report of no more than 1000 words detailing the results you have reached in (1) and (2),
with recommendations on the choice of data mining technique according to the features of the
datasets.
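
One possible way to organise the comparison in (1) is sketched below, assuming the accuracy figures have already been copied out of WEKA by hand. The function name summarise_accuracy and the layout of the results dictionary are hypothetical, and the numbers in demo are placeholders, not experimental results.

```python
from statistics import mean


def summarise_accuracy(results):
    """results maps dataset name -> {algorithm name: accuracy in percent}.

    Returns {algorithm: (mean accuracy, number of datasets where it is best)}.
    Ties count as a win for every tied algorithm.
    """
    algorithms = sorted({a for scores in results.values() for a in scores})
    summary = {}
    for algo in algorithms:
        accs = [scores[algo] for scores in results.values() if algo in scores]
        wins = sum(
            1 for scores in results.values()
            if algo in scores and scores[algo] == max(scores.values())
        )
        summary[algo] = (mean(accs), wins)
    return summary


# Placeholder numbers, NOT real experimental results.
demo = {
    "dataset_a": {"J48": 90.0, "RandomForest": 94.0, "IBk": 92.0},
    "dataset_b": {"J48": 80.0, "RandomForest": 78.0, "IBk": 70.0},
}
for algo, (avg, wins) in summarise_accuracy(demo).items():
    print(f"{algo}: mean accuracy {avg:.1f}%, best on {wins} dataset(s)")
```

A summary like this only supports task (1); relating the wins and losses to dataset properties such as size, number of attributes or class balance, as task (2) requires, still needs to be done per dataset.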

Task II 20%: using WEKA-Clustering software and critical thinking

You are required to apply the following clustering techniques, using the WEKA software, to only 10 of
the datasets you selected in Task I:
(1) K-means,
(2) Agglomerative (hierarchical) clustering
Remove the class attribute before applying the above clustering methods. Once you have applied the
clustering techniques to all the datasets, you must complete the following tasks:
1. Use the clustering evaluation methods to compare the performance of the above algorithms.
2. Write a report of no more than 500 words detailing the results you have reached in (1),
with recommendations on the choice of data mining technique according to the features of the
datasets.
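
The brief does not name a specific clustering evaluation method. As one example, the sketch below computes the within-cluster sum of squared errors, the figure that WEKA's K-means implementation reports, for a given assignment of points to clusters. The function name within_cluster_sse and the toy points are hypothetical, and WEKA's own figure may differ because of how it preprocesses attributes.

```python
def within_cluster_sse(points, assignments):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    points: list of equal-length numeric tuples.
    assignments: one cluster id per point, in the same order.
    """
    clusters = {}
    for point, cluster_id in zip(points, assignments):
        clusters.setdefault(cluster_id, []).append(point)

    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        for p in members:
            total += sum((p[d] - centroid[d]) ** 2 for d in range(dims))
    return total


# Toy 2-D points with two hypothetical assignments of the same data.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(within_cluster_sse(points, [0, 0, 1, 1]))  # compact clusters
print(within_cluster_sse(points, [0, 1, 0, 1]))  # stretched clusters
```

A lower value means tighter clusters for the same number of clusters, so the measure is only a fair comparison between K-means and the agglomerative method when both are asked for the same number of clusters.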

The deliverable of this component of the coursework is:

You are to write a report addressing the aforementioned tasks in no more than 1500 (1000 classification
+ 500 clustering) words excluding figures and tables. Your report must cover the following areas:
(1) A short summary of the datasets you used and the justification of choice.
(2) A detailed analysis of your results when comparing the different classification/clustering
techniques.

The submission is online through Moodle (the submission details will be available on Moodle).

This component (Tasks I and II) of your coursework contributes 60% of the total mark for the unit
assessment (40% Task I and 20% Task II). The marking criteria [in 100% breakup of marks] for this
component are as follows:

20% Justification of choice and number of the datasets used
20% Appropriate use of tables and figures when reporting the results
30% Analysis of the results of the experiments you have conducted
20% Conclusion with recommendations on how to match a dataset to a technique
10% Organisation, language style and clarity
