This Project was a course project where I was asked create a mock database and data warehouse for an accounting firm
In this step I determined which entities would be most important for the mock accounting firm. I determined that
it would be best to keep it simple. This ERD contains relationships between the small accounting firm's employees,
the clients (booking, payroll, tax, etc) who receive and pay invoices for services performed.
First I needed to determine what dimensions I wanted to focus on in the data warehouse.
I decided on Time, Client, Invoice, and Aggregates related to the firm's revenue.
I then designed a Star Schema ERD for the data warehouse.
With the ERDs created and the database and warehouse outline, it was then time to create the database inside
SQL Server Management Studio.
Here I created tables for all the entities in our ERD diagram (only 4 are shown here)
It was important to include the correct datatypes and sizes for when I generated the random data later one.
I wanted some good sample data for the databse, so I used an website called Mockaroo to generate random
data for the tables. Those data were exported to CSV files and that copied into SQL code to populate the databse tables.
I used a few different procedural SQL programs written from scratch to populate the data warehouse tables
These programs make use of disabling referential integrity and enabling it after the program is run
in order to disable foreign key constraints for values not yet used.
In this program, I simply selected all the values from the already populated Invoice Table from the Accounting
database and inserted them into the data warehouse table. I did the same for the client dimesion.
Part of the Time dimension table is an attribute called dateId. This is an ID created from a calendar date.
I created this ID for each date called by using a Function
In this function called FetchDateID, I create two variables for the Date and Hour. I then set those variables equal
to the date passed when the function is called.
The DateID variable is set to the cast of the datepart and hourpart variables. The format of the dateID
should be yyyymmdd, so I used style 112 to format the results as such.
DateID is returned by the function
The popdimTime procedure utlizes the FetchDateID function to populate the dateId portion of the dimTime Table.
It then goes through a loop from April 20 of 2023 to April 20 of 2024 and populates the table with values from
each date during that time period.
I created a trigger to update the status of an invoice from 'Pending' to 'Paid' when a new row is added
to the payments table (A payment is received).
It uses 'after insert' and the keyword 'inserted' to access the row or rows that are newly inserted in the update
to the Payment table.