Running Head: Data Warehouse 1
Data Warehouse
Professor’s Name
Students’ Name
Institution.
Date.
ASSIGNMENT-3
1. You are the staging area expert on the project team for a large toy manufacturer.
Discuss the four modes of applying data to the data warehouse. Select the modes you
want to use for your data warehouse and explain the reasons for your selection.
The modes of applying data are:
Load: if the target table already exists and holds data, the load process wipes out the
existing data and applies the data from the incoming file. If the table is empty, the incoming
data is simply applied. The reason for selecting this mode is that a manufacturing company
can, for example, fully reload the master data of its items (location, material etc.), which
makes it easy to bring the current items up to date.
Append: this is an extension of the load process. The incoming data is added to the target
table and the pre-existing data is not deleted.
In this mode, new operations can, for example, be appended to a specific routing in the
company without losing the earlier records, so vital company data is retained.
Destructive merge: the incoming data is applied to the target table. Where the primary key of
an incoming record matches an existing record, the existing record is updated. An incoming
record with no match is added.
The reason for choosing this mode is that it is the most suitable when only current values
matter, since every matching record in the target table is updated from the incoming data.
Constructive merge: where the primary key of an incoming record matches an existing record,
the existing record is not overwritten. The incoming record is added and marked as
superseding the old record.
This mode is used when items in a factory need to carry their latest revision under the same
item key while the earlier revisions are kept.
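The four modes can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a staging-area implementation: the `Row` type, the function names and the toy item rows are all hypothetical, the target table is modeled as an in-memory list, and the destructive merge assumes keys are unique in the target.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: str                  # primary key of the item (hypothetical)
    value: str                # item attributes, reduced to one string
    superseded: bool = False  # only set by the constructive merge


def load(target, incoming):
    """Load: discard whatever is in the target and apply the incoming rows."""
    return [Row(r.key, r.value) for r in incoming]


def append(target, incoming):
    """Append: keep every existing row and add the incoming rows after them."""
    return list(target) + [Row(r.key, r.value) for r in incoming]


def destructive_merge(target, incoming):
    """Destructive merge: update rows whose key matches, add the rest."""
    merged = {r.key: Row(r.key, r.value) for r in target}
    for r in incoming:
        merged[r.key] = Row(r.key, r.value)  # overwrite on match, insert otherwise
    return list(merged.values())


def constructive_merge(target, incoming):
    """Constructive merge: keep old rows, flag matched ones, add incoming rows."""
    incoming_keys = {r.key for r in incoming}
    kept = [Row(r.key, r.value, r.superseded or r.key in incoming_keys)
            for r in target]
    return kept + [Row(r.key, r.value) for r in incoming]


# Hypothetical toy item data
target = [Row("T1", "red truck"), Row("T2", "doll")]
incoming = [Row("T2", "doll v2"), Row("T3", "kite")]

for mode in (load, append, destructive_merge, constructive_merge):
    print(mode.__name__, [(r.key, r.value, r.superseded) for r in mode(target, incoming)])
```

In this sketch the constructive merge flags the old row as superseded, which carries the same information as marking the new row as the superseding one.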
2. Assume that you are the data quality expert on the data warehouse project team for a
large financial institution with many legacy systems dating back to the 1970’s. Review
the types of data quality problems you are likely to have and make suggestions on how
to deal with those.
Legacy systems that date back to the 1970s bring many inherent data quality problems. One of
the most significant is byte-ordering inconsistency between the platforms used during that
era, which makes it difficult to move the data between systems. Specialized translation
applications are therefore needed. There are also problems with the format of the data. In
addition, the metadata describes the entities, relationships and interrelationships that
exist in the data, so it has to be checked as well when the data is moved.
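The byte-ordering problem mentioned above can be illustrated with Python's standard `struct` module. This is a minimal sketch: the four-byte field is a made-up example, not a layout taken from any actual legacy system.

```python
import struct

# Four bytes as they might be stored in a legacy file (hypothetical field)
raw = bytes([0x00, 0x00, 0x01, 0x2C])

# The same bytes read as an unsigned 32-bit integer in both byte orders
as_big_endian = struct.unpack(">I", raw)[0]
as_little_endian = struct.unpack("<I", raw)[0]

print(as_big_endian)     # value if the source system stored big-endian
print(as_little_endian)  # value if the bytes are misread as little-endian
```

Reading the field with the wrong byte order gives a completely different number without raising any error, which is why the byte order of each source system has to be recorded and applied during translation.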
3. Compare the usage and value of information in the data warehouse with those in
operational systems. Explain the major differences. Discuss and give examples.
The major difference between operational systems and data warehouse systems is that
operational systems are configured for transaction processing, while data warehouse systems
are designed to support online analytical processing.
Operational systems are designed to support high-volume transaction processing, with very
little back-end reporting. They are mainly concerned with current data, which is updated as
the need arises. For example, purchase records that have no corresponding customer record to
identify who purchased what are clearly errors in the source data. Such errors can be
corrected in the source operational system before the data is extracted and loaded into the
data warehouse.
Data warehouses are designed to support high-volume analytical processing and elaborate
report generation. They are concerned more with historical data, and the data within them is
non-volatile, meaning that data can be added but is rarely changed. This provides an
ever-growing history of information. A well-known example of an organization that runs such a
warehouse is Facebook.
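The contrast can be made concrete with a small sketch using Python's built-in `sqlite3` module. The table names, columns and figures are hypothetical, and both workloads are placed in one in-memory database only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational style: one current row per account, updated in place
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = balance - 30.0 WHERE account_id = 1")
current_balance = conn.execute(
    "SELECT balance FROM account WHERE account_id = 1"
).fetchone()[0]

# Warehouse style: history is only appended, then read through aggregation
conn.execute("CREATE TABLE fact_sales (sale_year INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(2022, "North", 120.0), (2022, "South", 80.0),
     (2023, "North", 150.0), (2023, "South", 95.0)],
)
yearly_totals = conn.execute(
    "SELECT sale_year, SUM(amount) FROM fact_sales "
    "GROUP BY sale_year ORDER BY sale_year"
).fetchall()

print(current_balance)
print(yearly_totals)
```

The operational query touches a single row and keeps only its latest value, while the warehouse query scans the accumulated history and summarizes it.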
4. Prepare an outline for a standards manual for your data warehouse. Consider all types
of objects and their naming conventions. Indicate why standards are important.
Produce a detailed table of contents.
Standards are conventions that a company adopts in order to maintain uniformity. They ensure
a level of consistency across systems in terms of databases, processes and objects, which
matters most in companies that have many departments. Below is a standard outline for the
various types of objects that would be in the data warehouse.
S.No.  Object                         Example
1      Schema (in SQL)                CREATE SCHEMA PRODUCT_DETAILS_NEW
2      Table                          PRODUCT_MASTER
3      Column                         PRODUCT_ID
4      Staging files                  EMPLOYEE_DAILY_STAGE, EMPLOYEE_DAILY_UPDATE
5      Physical file (scripts)        EMPLOYEE_P
6      Physical file (source)         EMPLOYEE_SRC
7      Physical file (codec)          EMPLOYEE_CDC
8      Physical file (database file)  EMPLOYEE_DB
9      Logical file                   EMPLOYEE_L
10     Application document           CUSTOMER_APP_DOC
11     Query                          ORDERS_DETAILS_QUERY
12     Report                         STORE_LOCATION_REPORT
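A naming standard is easier to enforce when it can be checked automatically. The following is a minimal sketch of such a check. The suffix rules are inferred from the examples in the table, and the object-type labels and the function name `is_valid_name` are assumptions made for illustration.

```python
import re

# Suffixes inferred from the example names in the table (assumed rules)
SUFFIX_BY_TYPE = {
    "staging_file": ("_STAGE", "_UPDATE"),
    "script_file": ("_P",),
    "source_file": ("_SRC",),
    "logical_file": ("_L",),
    "query": ("_QUERY",),
    "report": ("_REPORT",),
}

# Upper-case words or numbers joined by single underscores
NAME_PATTERN = re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*")


def is_valid_name(object_type, name):
    """Return True if the name follows the general pattern and, where a
    suffix rule exists for the object type, ends with an allowed suffix."""
    if not NAME_PATTERN.fullmatch(name):
        return False
    suffixes = SUFFIX_BY_TYPE.get(object_type)
    return suffixes is None or name.endswith(suffixes)


print(is_valid_name("report", "STORE_LOCATION_REPORT"))
print(is_valid_name("query", "orders_details_query"))
```

Object types without a suffix rule, such as tables and columns, are checked against the general pattern only.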
Saudi Telecom – Questions for Discussion
5. Why do you think telecommunications companies are among the prime users of
information visualization tools?
From the case study, information visualization tools were important because they allowed the
managers to observe trends and make the necessary corrections before things got out of
hand. Telecommunications companies value these tools because they make it possible to foresee
likely problems and take the necessary measures to curb them. The tools also help them to
deal with the large number of clients they have.
6. What were their challenges, the proposed solution, and the obtained results?
Challenges
The data came from different kinds of sources, which may have caused redundancy in the data.
In addition, analyzing the given data was very time consuming.
Proposed solution
Use of the TIBCO tool
This tool makes it possible to look at the data in different ways, which goes a long way
toward understanding it.
Mining for Lies Case Study
7. How can text/data mining be used to detect deception in text?
Statements are first transcribed for processing, and cues are extracted and selected. The
text processing software identifies the cues and generates quantified cues. Classification
models are then trained and tested on the quantified cues, and each statement is labeled as
truthful or deceptive.
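The steps above can be sketched in plain Python. This is a toy illustration only: the two cues, the word lists, the four training statements and their labels are all hypothetical, and a nearest-centroid rule stands in for whatever classification model a real system would use. It says nothing about which cues actually indicate deception.

```python
import re

# Hypothetical cue word lists
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
NEGATIONS = {"no", "not", "never", "none", "nothing"}


def quantify_cues(statement):
    """Turn a transcribed statement into numeric cues (ratios per word)."""
    words = re.findall(r"[a-z']+", statement.lower())
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {
        "first_person_ratio": sum(w in FIRST_PERSON for w in words) / total,
        "negation_ratio": sum(w in NEGATIONS for w in words) / total,
    }


def train(labeled_statements):
    """Compute the average cue vector (centroid) for each label."""
    by_label = {}
    for text, label in labeled_statements:
        by_label.setdefault(label, []).append(quantify_cues(text))
    return {
        label: {k: sum(c[k] for c in cues) / len(cues) for k in cues[0]}
        for label, cues in by_label.items()
    }


def classify(statement, centroids):
    """Label a statement with the class whose centroid is closest."""
    cues = quantify_cues(statement)

    def distance(label):
        return sum((cues[k] - centroids[label][k]) ** 2 for k in cues)

    return min(centroids, key=distance)


# Made-up training data with made-up labels
TRAINING = [
    ("I went home and I called my sister", "truthful"),
    ("I saw my car outside when I left", "truthful"),
    ("There was nothing there and nobody took the money", "deceptive"),
    ("The money was not taken and no one was near it", "deceptive"),
]

centroids = train(TRAINING)
print(classify("I locked my door", centroids))
print(classify("Nothing was taken, not a thing", centroids))
```

A real system would use many more cues and a much larger labeled set, and would test the model on statements that were held out from training.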
8. What do you think are the main challenges for such an automated system?
Such a system may sound easy in theory, but training software to identify human aspects of
language creates problems with the terminology, terms, references, phrases and names that
could be used. Software that can determine what is true without any human sensitivity is
virtually impossible, because we all have our own versions of the truth and there is no
standard way of deciding who or what determines it.
Big Data and Analytics in Politics Case Study
9. What is the role of analytics and Big Data in modern day politics? Do you think Big
Data analytics could change the outcome of an election?
In modern-day politics, big data is essential to political campaigns. The characteristics of
big data, such as variety, velocity and volume, closely match the data used in political
operations. Big data analytics can change the outcome of an election since it helps to
forecast the election results and to target likely voters and contributors.
10. What do you think are the challenges, the potential solution, and the probable results
of the use of Big Data analytics in politics?
The main challenge would be the storage of the big data. It would be difficult to collect
and store such large volumes since the data grows on a daily basis. Finding capable and
well-equipped people to handle these large amounts of data is another problem. Security is
also a challenge, since a great deal of data is collected and it can be very sensitive. The
solution to these challenges lies in developing suitable software that addresses all of them
together.