
Test 8

This document discusses various topics related to data warehousing including: 1. The four modes of applying data to a data warehouse and reasons for selection. 2. Common data quality issues with legacy systems and suggestions for addressing them. 3. The differences in usage and value of data between operational systems and data warehouses. 4. An outline for a standards manual covering naming conventions for various data warehouse objects and why standards are important.

Uploaded by

Robert Kegara

Running Head: Data Warehouse 1

Data Warehouse

Professor’s Name

Students’ Name

Institution.

Date.

ASSIGNMENT-3

1. You are the staging area expert on the project team for a large toy manufacturer. Discuss the four modes of applying data to the data warehouse. Select the modes you want to use for your data warehouse and explain the reasons for your selection.

The four modes of applying data are:

Load: if the target table already holds data, the load wipes out the existing rows and replaces them with the incoming data; if the table is empty, the incoming data is simply applied. For a manufacturing company this suits master data such as items, locations and materials, which is easy to reload in full, so the current items are refreshed in a single step.

Append: an extension of the load process. The incoming data is added to the target table and the pre-existing data is not deleted. I would use append where operations are added to a specific routing, because data that is vital to the company is retained.

Destructive merge: the incoming data is applied to the target table. Where the primary key of an incoming record matches an existing record, the existing record is updated; an incoming record with no match is added. I would choose this as the most suitable mode because the target table then always reflects the latest incoming values.

Constructive merge: where the primary key of an incoming record matches an existing record, the existing record is not overwritten. The incoming record is added and marked as superseding the old one. This mode suits factory items that need the latest revision recorded against the same item key while the earlier revisions are kept.
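The difference between the four modes is easiest to see on a small in-memory table. The following is a minimal sketch, not a production loader: the `Row` type, its `supersedes_old` flag and the function names are assumptions made for illustration, and a real warehouse would apply the same rules with its own load utilities or SQL.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Row:
    key: str                      # primary key of the record
    value: str                    # payload, e.g. an item description
    supersedes_old: bool = False  # hypothetical flag used by constructive merge


def load(target: List[Row], incoming: List[Row]) -> List[Row]:
    """Load: discard whatever is in the target and apply the incoming rows."""
    return list(incoming)


def append(target: List[Row], incoming: List[Row]) -> List[Row]:
    """Append: keep every existing row and add the incoming rows after them."""
    return list(target) + list(incoming)


def destructive_merge(target: List[Row], incoming: List[Row]) -> List[Row]:
    """Destructive merge: overwrite rows whose key matches, add the rest."""
    merged = {row.key: row for row in target}
    for row in incoming:
        merged[row.key] = row
    return list(merged.values())


def constructive_merge(target: List[Row], incoming: List[Row]) -> List[Row]:
    """Constructive merge: keep old rows, add incoming rows, flag the ones
    that supersede an existing row with the same key."""
    existing_keys = {row.key for row in target}
    added = [replace(row, supersedes_old=row.key in existing_keys)
             for row in incoming]
    return list(target) + added


# Hypothetical toy-manufacturer data.
target = [Row("T1", "teddy v1"), Row("T2", "robot v1")]
incoming = [Row("T2", "robot v2"), Row("T3", "kite v1")]
```

With this data, `destructive_merge` leaves three rows (the robot is overwritten), while `constructive_merge` leaves four rows and flags only the new robot row as superseding an older one.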



2. Assume that you are the data quality expert on the data warehouse project team for a large financial institution with many legacy systems dating back to the 1970s. Review the types of data quality problems you are likely to have and make suggestions on how to deal with those.

Managing data quality across legacy systems that date back to the 1970s brings several inherent problems. One of the most significant is byte-ordering inconsistency between the operating systems used during this era, which makes the data hard to port between systems. Specialized translation applications are therefore needed.

There are also problems with the format of the data. The metadata describes the entities and the relationships that exist between them, so it has to be reviewed alongside the data itself.
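The byte-ordering problem can be shown with four bytes read from a legacy file. This is a small sketch using Python's standard `struct` module; the sample bytes and the `decode_amount` helper are made up for illustration and do not come from any particular legacy system.

```python
import struct


def decode_amount(raw: bytes, byte_order: str) -> int:
    """Decode a 4-byte signed integer using an explicit byte order.

    byte_order is "big" or "little"; anything else raises ValueError.
    """
    prefixes = {"big": ">", "little": "<"}
    if byte_order not in prefixes:
        raise ValueError("byte_order must be 'big' or 'little'")
    return struct.unpack(prefixes[byte_order] + "i", raw)[0]


# Four bytes from a hypothetical legacy export.
raw = bytes([0x00, 0x00, 0x01, 0x2C])

as_big = decode_amount(raw, "big")        # 300
as_little = decode_amount(raw, "little")  # 738263040
```

The same bytes give 300 or 738263040 depending on the assumed order, which is why a translation step has to state the source system's byte order explicitly instead of relying on the platform default.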

3. Compare the usage and value of information in the data warehouse with those in operational systems. Explain the major differences. Discuss and give examples.

The major difference is that operational systems are configured for transaction processing, while data warehouse systems are designed to support online analytical processing.

Operational systems are designed to support high-volume transaction processing, with very little back-end reporting. They are mainly concerned with current data, which is updated as the need arises. Their data is not always clean: for example, purchase records with no corresponding customer record to identify who purchased what are clearly errors in the source data. Such errors should be corrected in the source operational system before the data is extracted and loaded into the data warehouse.
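A check for purchase records without a matching customer can be sketched with an outer join. The example below uses Python's built-in `sqlite3` module and an in-memory database; the table and column names (`customers`, `purchases`, `customer_id`, `purchase_id`) are hypothetical stand-ins for whatever the source system actually uses.

```python
import sqlite3


def build_demo_source() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for an operational database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY,
                                customer_id INTEGER, item TEXT);
        INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
        INSERT INTO purchases VALUES (10, 1, 'kite'),
                                     (11, 3, 'robot'),
                                     (12, NULL, 'teddy');
    """)
    return conn


def find_orphan_purchases(conn: sqlite3.Connection) -> list:
    """Return ids of purchases whose customer_id matches no customer row."""
    cursor = conn.execute("""
        SELECT p.purchase_id
        FROM purchases AS p
        LEFT JOIN customers AS c ON c.customer_id = p.customer_id
        WHERE c.customer_id IS NULL
        ORDER BY p.purchase_id
    """)
    return [row[0] for row in cursor.fetchall()]


conn = build_demo_source()
orphans = find_orphan_purchases(conn)  # purchases 11 and 12 have no customer
```

Running a check like this against the source before extraction gives a list of records to fix at the origin, so the error is not copied into the warehouse.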

Data warehouses are designed to support high-volume analytical processing and elaborate report generation. They are concerned more with historical data, and the data within them is non-volatile, meaning that data can be added but is rarely changed. This produces an ever-growing history of information. A familiar example of such an ever-growing store is Facebook.

4. Prepare an outline for a standards manual for your data warehouse. Consider all types of objects and their naming conventions. Indicate why standards are important. Produce a detailed table of contents.

Standards are conventions that a company adopts to maintain uniformity. They ensure a level of consistency across databases, processes and objects in any system, which matters most in companies that have many departments. Below is a standard outline for the various types of objects that would be in the data warehouse.

S.No.  Object                          Example

1      Schema (in SQL)                 CREATE SCHEMA PRODUCT_DETAILS_NEW
2      Table                           PRODUCT_MASTER
3      Column                          PRODUCT_ID
4      Staging files                   EMPLOYEE_DAILY_STAGE, EMPLOYEE_DAILY_UPDATE
5      Physical file (scripts)         EMPLOYEE_P
6      Physical file (source)          EMPLOYEE_SRC
7      Physical file (codec)           EMPLOYEE_CDC
8      Physical file (database file)   EMPLOYEE_DB
9      Logical file                    EMPLOYEE_L
10     Application document            CUSTOMER_APP_DOC
11     Query                           ORDERS_DETAILS_QUERY
12     Report                          STORE_LOCATION_REPORT
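The example names above share an upper-case, underscore-separated style, and several end in a suffix that identifies the object type. A minimal sketch of how such a standard could be checked automatically follows. It assumes that style is the rule, and the `REQUIRED_SUFFIX` mapping is a hypothetical illustration inferred from the examples, not part of the manual itself.

```python
import re

# Upper-case words made of letters and digits, joined by single underscores.
UPPER_SNAKE = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")

# Hypothetical suffix rules inferred from the example names.
REQUIRED_SUFFIX = {
    "query": "_QUERY",
    "report": "_REPORT",
    "source_file": "_SRC",
}


def check_name(object_type: str, name: str) -> list:
    """Return a list of problems with a proposed object name (empty if none)."""
    problems = []
    if not UPPER_SNAKE.fullmatch(name):
        problems.append("not upper snake case")
    suffix = REQUIRED_SUFFIX.get(object_type)
    if suffix and not name.endswith(suffix):
        problems.append("missing suffix " + suffix)
    return problems
```

For instance, `check_name("table", "PRODUCT_MASTER")` returns an empty list, while `check_name("report", "store_location")` reports both a casing problem and a missing `_REPORT` suffix.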

Saudi Telecom – Questions for Discussion

5. Why do you think telecommunications companies are among the prime users of information visualization tools?

In the case study, information visualization tools were important because they allowed the managers to observe trends and make the necessary corrections before things got out of hand. These tools matter to telecommunications companies because they make it possible to foresee likely problems and take measures to curb them. They also help the companies deal with the large number of clients they have.

6. What were their challenges, the proposed solution, and the obtained results?

Challenges

The data came from different kinds of sources, which may have caused redundancy in the data. In addition, analyzing the given data was very time consuming.

Proposed solution

Use of TIBCO Tool

This tool would make it possible to look at the data in different ways, which would go a long way towards understanding it.

Mining for Lies Case Study

7. How can text/data mining be used to detect deception in text?

The statements are first transcribed for processing. Text processing software then identifies cues in each statement, selects them and generates quantified cues. Classification models are trained and tested on the quantified cues, and the statements are then labeled as truthful or deceptive.
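The cue-quantification step can be illustrated with a few simple counts. The sketch below is only an illustration: the three cues (word count, first-person rate, negation rate) and the word lists are assumptions chosen for the example, not the cue set used in the case study, and the classifier that would be trained on these numbers is left out.

```python
import re

# Hypothetical word lists for two illustrative cues.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}
NEGATIONS = {"no", "not", "never"}


def extract_cues(statement: str) -> dict:
    """Turn one transcribed statement into a dictionary of numeric cues."""
    words = re.findall(r"[a-z']+", statement.lower())
    total = len(words)
    if total == 0:
        return {"word_count": 0, "first_person_rate": 0.0, "negation_rate": 0.0}
    return {
        "word_count": total,
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / total,
        "negation_rate": sum(w in NEGATIONS for w in words) / total,
    }


cues = extract_cues("I did not take it. I was home.")
```

Each statement becomes one row of numbers, and rows with known labels are what a classification model would be trained and tested on.
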
8. What do you think are the main challenges for such an automated system?

Such a system may sound easy in theory, but training software to recognize human aspects of language creates problems with the terminology, references, phrases and names that people use. Software that can determine what is true without any human sensitivity is virtually impossible to build, because we all have our own versions of the truth and there is no standard way of identifying who or what determines it.

Big Data and Analytics in Politics Case Study

9. What is the role of analytics and Big Data in modern day politics? Do you think Big Data analytics could change the outcome of an election?

In modern day politics, big data is essential to political campaigns. The characteristics of big data, such as variety, velocity and volume, closely match the data used in political operations. Big data analytics can change the outcome of an election because it helps campaigns forecast the results and target likely voters and contributors.

10. What do you think are the challenges, the potential solution, and the probable results of the use of Big Data analytics in politics?

The main challenge would be storage. It would be difficult to collect and store such large volumes of data, since the data grows daily. Finding skilled and well-equipped people to handle these large amounts of data is another problem. Security is also a challenge, since a great deal of data is collected and it can be very sensitive. The solution lies in developing suitable software that addresses all of these challenges together.
