Chapter- 1
MBA NOTES
BY:
MR. PRAMOD KR. SINGH
SUBJECT- MANAGEMENT INFORMATION SYSTEM
SUBJECT CODE- KMBN-208
Unit -2
Managing Data Resources
DATA MANAGEMENT
In the 21st century, data is everything. With massive volumes of it generated every day, it stands to reason that we need
better data management solutions. Any business or organization that wants to succeed today needs to understand
the what, why, and how of data management.
In today’s digital economy, companies have access to more data than ever before. This data creates a foundation of intelligence
for important business decisions. To ensure employees have the right data for decision-making, companies must invest in data
management solutions that improve visibility, reliability, security, and scalability.
WHAT IS DATA MANAGEMENT?
Data management is the practice of collecting, organizing, protecting, and storing an organization’s data so it can be analyzed for
business decisions. As organizations create and consume data at unprecedented rates, data management solutions become
essential for making sense of the vast quantities of data. Today’s leading data management software ensures that reliable, up-
to-date data is always used to drive decisions. The software helps with everything from data preparation to cataloging, search,
and governance, allowing people to quickly find the information they need for analysis.
The Data Management Association (DAMA) defines data management as "the development of architectures, policies,
practices, and procedures to manage the data lifecycle."
To put it in simpler, everyday terms, data management is the process of collecting, keeping, and using data in a cost-effective,
secure, and efficient manner. Data management helps people, organizations, and connected things optimize data usage to make
better-informed decisions that yield maximum benefit.
TYPES OF DATA MANAGEMENT
Data management plays several roles in an organization’s data environment, making essential functions easier and less time-
intensive. These data management techniques include the following:
• Data preparation is used to clean and transform raw data into the right shape and format for analysis, including making
corrections and combining data sets.
• Data pipelines enable the automated transfer of data from one system to another.
• ETL (extract, transform, load) processes take data from one system, transform it, and load it into the
organization’s data warehouse.
• Data catalogs help manage metadata to create a complete picture of the data, providing a summary of its changes,
locations, and quality while also making the data easy to find.
• Data warehouses are places to consolidate various data sources, contend with the many data types businesses store,
and provide a clear route for data analysis.
• Data governance defines standards, processes, and policies to maintain data security and integrity.
• Data architecture provides a formal approach for creating and managing data flow.
• Data security protects data from unauthorized access and corruption.
• Data modeling documents the flow of data through an application or organization.
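The preparation and ETL steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV text, the column names, and the sales table are invented for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (invented for this example).
RAW_CSV = """order_id,customer,amount
1, alice ,100.50
2,BOB,
3,carol,75
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, convert amounts, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # data preparation: skip incomplete records
        name = row["customer"].strip().title()
        cleaned.append((int(row["order_id"]), name, float(amount)))
    return cleaned

def load(conn, records):
    """Load: write the prepared records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(warehouse, transform(extract(RAW_CSV)))
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY order_id"
).fetchall()
print(loaded)
```

The row for order 2 has no amount, so the transform step drops it before loading; a real pipeline might instead route such rows to a review queue.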
WHY DATA MANAGEMENT IS IMPORTANT
Data management is a crucial first step to employing effective data analysis at scale, which leads to important insights that add
value to your customers and improve your bottom line. With effective data management, people across an organization can find
and access trusted data for their queries. Some benefits of an effective data management solution include:
Visibility
Data management can increase the visibility of your organization’s data assets, making it easier for people to quickly and
confidently find the right data for their analysis. Data visibility allows your company to be more organized and productive,
allowing employees to find the data they need to better do their jobs.
Reliability
Data management helps minimize potential errors by establishing processes and policies for usage and building trust in the data
being used to make decisions across your organization. With reliable, up-to-date data, companies can respond more efficiently
to market changes and customer needs.
Security
Data management protects your organization and its employees from data losses, thefts, and breaches with authentication and
encryption tools. Strong data security ensures that vital company information is backed up and retrievable should the primary
source become unavailable. Additionally, security becomes more and more important if your data contains any personally
identifiable information that needs to be carefully managed to comply with consumer protection laws.
Scalability
Data management allows organizations to effectively scale data and usage occasions with repeatable processes to keep data and
metadata up to date. When processes are easy to repeat, your organization can avoid the unnecessary costs of duplication, such
as employees conducting the same research over and over again or re-running costly queries unnecessarily.
DATA MANAGEMENT CHALLENGES
Because data management plays a crucial role in today’s digital economy, it’s important that systems continue to evolve to meet
your organization’s data needs. Traditional data management processes make it difficult to scale capabilities without
compromising governance or security. Modern data management software must address several challenges to ensure trusted
data can be found.
Challenge 1: Increased data volumes
Every department within your organization has access to diverse types of data and specific needs to maximize its value.
Traditional models require IT to prepare the data for each use case and then maintain the databases or files. As more data
accumulates, it’s easy for an organization to become unaware of what data it has, where the data is, and how to use it.
Challenge 2: New roles for analytics
As your organization increasingly relies on data-driven decision-making, more of your people are asked to access and analyze
data. When analytics falls outside a person’s skill set, understanding naming conventions, complex data structures, and
databases can be a challenge. If it takes too much time or effort to convert the data, analysis won’t happen and the potential
value of that data is diminished or lost.
Challenge 3: Compliance requirements
Constantly changing compliance requirements make it a challenge to ensure people are using the right data. An organization
needs its people to quickly understand what data they should or should not be using—including how and what personally
identifiable information (PII) is ingested, tracked, and monitored for compliance and privacy regulations.
Challenge 4: Sharing and Accessing Data
Perhaps the most frequent challenge in big data efforts is the inaccessibility of data sets from external sources. If the data in a
company’s information system is to support accurate and timely decisions, the data itself must be available in an accurate,
complete, and timely manner.
Challenge 5: Privacy and Security
This is another important challenge with big data. It has sensitive, conceptual, technical, and legal dimensions.
Most organizations are unable to maintain regular checks because of the large amounts of data being generated. However,
security checks and monitoring should be performed in real time, where they are most beneficial.
Challenge 6: Analytical challenges
Big data raises some large analytical questions: How do we deal with a problem if the data volume gets too large? How do we
find the important data points? How do we use the data to the best advantage?
The large amounts of data on which this kind of analysis is done can be structured (organized data), semi-structured
(semi-organized data), or unstructured (unorganized data).
Challenge 7: Technical challenges
Quality of data:
Collecting and storing a large amount of data comes at a cost. Big companies,
business leaders, and IT leaders always want large data storage.
For better results and conclusions, big data efforts should focus on storing quality data rather than irrelevant data.
This raises the questions of how to ensure that data is relevant, how much data is enough for
decision making, and whether the stored data is accurate.
Fault tolerance:
Fault tolerance is another technical challenge. Fault-tolerant computing is extremely hard and involves intricate
algorithms.
Newer technologies such as cloud computing and big data are designed so that, whenever a failure
occurs, the damage stays within an acceptable threshold; that is, the whole task should not have to restart from
scratch.
Scalability:
Big data projects can grow and evolve rapidly. The scalability issue of big data has led towards cloud computing.
This raises challenges such as how to run and execute various jobs so that the goal of each workload is achieved
cost-effectively.
It also requires dealing with system failures efficiently, which in turn raises the question of what
kinds of storage devices should be used.
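The fault-tolerance idea above, that a failed task should not have to restart from scratch, is often realized with checkpointing. The sketch below is a toy illustration, not how any particular cloud platform does it: the function name, the fail_at parameter, and the dictionary standing in for stable storage are all assumptions made for the example.

```python
def process_with_checkpoints(items, checkpoint, fail_at=None):
    """Sum the items, saving progress so a restart resumes where it stopped.

    `checkpoint` is a dict standing in for stable storage (an assumption);
    `fail_at` simulates a crash just before the item at that index.
    """
    start = checkpoint.get("next_index", 0)
    total = checkpoint.get("total", 0)
    for i in range(start, len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at item {i}")
        total += items[i]
        # Record progress after each completed item.
        checkpoint["next_index"] = i + 1
        checkpoint["total"] = total
    return total

data = [5, 10, 15, 20]
store = {}
try:
    process_with_checkpoints(data, store, fail_at=2)
except RuntimeError:
    pass  # the first run crashed after finishing items 0 and 1

resumed_from = store["next_index"]               # where the second run starts
result = process_with_checkpoints(data, store)   # picks up at item 2
print(resumed_from, result)
```

Only the unfinished items are processed on the second run, so the damage from the failure is limited to the work in progress when it occurred.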
WHAT IS DATA INDEPENDENCE OF DBMS?
Data independence is defined as a property of a DBMS that lets you change the database schema at one level of a database
system without having to change the schema at the next higher level. Data independence helps you keep data separate
from the programs that make use of it.
Data independence is the idea that generated and stored data should be kept separate from the applications that use the data for
computing and presentation. In many systems, data independence is an innate function related to the multiple components of
the system; however, it is possible to keep data contained within a user application.
Data independence in a DBMS, also known as data abstraction, is the ability to modify the schema at a lower level without making
alterations at a higher level. Its goal is to make data independent of the user. A database contains a tremendous amount of
data, which is hard to handle if it is all stored in one place. As the use of database management systems expands, the data
also needs to change over time to satisfy new requirements.
So there is a multilayer architecture, such that modifications made at one level won’t affect another. There are two types of data
independence, logical data independence and physical data independence, which can be explained in terms of the
levels of abstraction: the physical level, the logical level, and the view level.
PHYSICAL LEVEL DATA INDEPENDENCE IN DBMS
This describes how the data or record is stored and where the data is stored. It is the lowest level and is also known as the
internal level. It is controlled by the database administrator.
CONCEPTUAL LEVEL DATA INDEPENDENCE IN DBMS
It is also known as the logical level. This describes what data is stored in the database and which structures are used. It also
defines the relationships between different tables. The constraints for the entire database are defined at this level. It is the
middle level. It is controlled by the database designer.
EXTERNAL LEVEL DATA INDEPENDENCE IN DBMS
It is also known as view level. It is mainly used to represent data to the user. In this, the application programs hide details of the
data types which means it can also be used to hide pieces of information for security purposes. In simple words, it is used to give
authorization to different users to view data differently. It is the highest level. It is controlled by the interface designer.
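The three levels described above can be illustrated with a small Python sketch that uses the standard sqlite3 module as a stand-in for a DBMS. The employee table, the staff_directory view, and the sample rows are hypothetical. The application reads only the view, so a change to the table (conceptual level) or the addition of an index (used here as an example of a physical-level change) does not require changing the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual (logical) level: the table and its structure.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 50000), (2, 'Ravi', 60000)")

# External (view) level: users see names only; salary stays hidden.
conn.execute("CREATE VIEW staff_directory AS SELECT id, name FROM employee")

def directory(connection):
    """Application code that depends only on the view."""
    return connection.execute(
        "SELECT id, name FROM staff_directory ORDER BY id"
    ).fetchall()

before = directory(conn)

# Conceptual-level change: a new column is added to the table.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
# Physical-level change: an index alters how rows are located, not what they are.
conn.execute("CREATE INDEX idx_employee_name ON employee(name)")

after = directory(conn)
view_columns = [
    col[0] for col in conn.execute("SELECT * FROM staff_directory").description
]
print(before == after, view_columns)
```

The application function is identical before and after both changes, which is the practical meaning of logical and physical data independence.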
IMPORTANCE OF DATA INDEPENDENCE IN DBMS
The quality of data is improved.
The maintenance of the database system becomes affordable.
Developers can focus on the general structure rather than the internal implementations.
There is an improvement in database security.
No alterations of the data structure are required in application programs.
Database incompatibility is reduced.
WHAT IS DATA REDUNDANCY?
Data redundancy is when an organization stores the same data in multiple places at the same time. It may occur within many
fields in one database or across multiple technological platforms. Redundancy is common in businesses that don't use a central
database or insular management system for data storage. An example of data redundancy is when a company replicates
customer information across separate storage systems in multiple departments in a business. Data managers classify data
redundancy into two categories, which are:
Positive data redundancy: This is intentional and occurs when an organization creates compressed versions of data to access as
a backup. Intentional data redundancy promotes uniformity and protects data, and it safeguards data in different places to
ensure the company's data remains sustainable.
Wasteful data redundancy: This is an unintentional replication of data in a company, which can result from complicated data
processes and inefficient coding. It can be difficult to assess which data to update or use when unintentional storage of the same
data occurs, but an organization can follow certain practices to reduce this problem.
CAUSE OF DATA REDUNDANCY
Data redundancy can occur either intentionally or accidentally. Accidental data redundancy can result from complex processes,
inefficient coding, over-complicated data storage processes, or efficiency and cost issues, while intentional data
redundancy protects the data by ensuring that backups exist. Ad hoc solutions can also introduce redundancy
errors.
It occurs in the following ways:
1. It can be designed to create a backup for the data.
2. It can occur due to any human error when the database designer adds the same data repeatedly.
3. It may also occur when different designers store the same data in multiple systems, which
then end up holding the same information.
BENEFITS OF DATA REDUNDANCY
A company benefits from data redundancy that's intentional and built into a daily data management plan. Purposeful, positive
data redundancy also:
1. Creates data backups: Data redundancy helps protect and reinforce data backups when data disruption occurs through
unintended data loss. It rebuilds or replaces missing data and ensures continuity.
2. Ensures data accuracy: Hosting multiple data servers for the same data enables a DBMS to examine and evaluate
variances and ensure data is consistent and accurate.
3. Expedites data recovery: Through the support of data backups and data that's easy to access, data redundancy
expedites data recovery and minimizes downtime of access to vital data.
4. Utilizes data storage flexibility: A company can use flexible data storage options to enable data redundancy to support
data sharing, which is vital in complex and customer-oriented organizations.
5. Improves data protection: Data redundancy minimizes the effect of a data breach because you can access the data from
multiple sources.
6. Provides data access speed: In a company that has many locations, individuals may access data from redundant sources
to enjoy faster access to the same data. Easy access to data is vital for customer-oriented businesses that seek to provide
efficient services.
DISADVANTAGES OF DATA REDUNDANCY
Drawbacks include the following points:
1. Data inconsistency: The term data inconsistency refers to the existence of the same data in different formats in multiple
databases. Redundant data leads to inconsistent duplicates of data and meaningless or unreliable information in a
company's database.
2. Increased data corruption: The term data corruption refers to damage to data due to errors in reading, writing,
storage, or processing. The risk grows when the same data fields are repeated in a database or file storage system, as happens
when data is redundant. Corrupted files generate error messages for customers when a task cannot be completed.
3. Increased database size: Redundant data increases the size and complexity of the database, making maintenance
of the database a challenge. A larger database leads to longer load times, and more time is spent completing daily
tasks.
4. Increased cost: Redundant data raises storage costs, which can affect the profits and goals of a company. The
implementation of a database system becomes very expensive.
5. Additional space consumed: Redundant data takes up additional space, which adds up over time to form bloated
databases. This can make it harder for companies to meet the demands of their customers.
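The first drawback above, inconsistency caused by wasteful redundancy, can be shown with a short Python sketch. The customer records, department names, and addresses are invented for illustration. Two departments keep separate copies of a customer's address, and only one copy is updated; the second half shows the same update against a single central record.

```python
# Wasteful redundancy: two departments each keep their own copy (hypothetical data).
sales_records = {"C001": {"name": "Meera", "address": "12 Park Road"}}
billing_records = {"C001": {"name": "Meera", "address": "12 Park Road"}}

# The customer moves, but only the sales copy is updated.
sales_records["C001"]["address"] = "48 Lake View"
redundant_copies_agree = (
    sales_records["C001"]["address"] == billing_records["C001"]["address"]
)

# Central storage: one customer record; departments hold only the customer key.
customers = {"C001": {"name": "Meera", "address": "12 Park Road"}}
sales_orders = [{"order": 1, "customer_id": "C001"}]
invoices = [{"invoice": 7, "customer_id": "C001"}]

customers["C001"]["address"] = "48 Lake View"  # a single update
sales_view = customers[sales_orders[0]["customer_id"]]["address"]
billing_view = customers[invoices[0]["customer_id"]]["address"]

print(redundant_copies_agree, sales_view == billing_view)
```

With separate copies the two departments disagree after the move; with one central record both see the new address after a single update.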
DATA CONSISTENCY
Data consistency is a crucial aspect that ensures the accuracy and reliability of data. If data is inconsistent, nothing built on
it can be trusted.
Data consistency is the accuracy, completeness, and correctness of data stored in a database. Data is consistent when the same
data appears across all related systems, applications, and databases. Inconsistent data can lead to incorrect analysis,
decision-making, and outcomes.
Key metrics such as accuracy, completeness, timeliness, and relevance are used to analyze or measure data consistency. For
example, if an organization's financial information is stored in two different databases, data consistency means that
the information is the same in both databases, and any changes made in one database are reflected in the other.
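The two-database example above suggests a simple consistency check: compare the records held under the same key in each system. The following Python sketch assumes two hypothetical stores, ledger_db and reporting_db, represented as dictionaries of invoice amounts; real databases would be queried instead.

```python
def find_inconsistencies(db_a, db_b):
    """Return keys whose values differ, or that exist in only one database."""
    problems = {}
    for key in sorted(db_a.keys() | db_b.keys()):
        a, b = db_a.get(key), db_b.get(key)
        if a != b:
            problems[key] = (a, b)
    return problems

# Hypothetical financial records held in two systems.
ledger_db = {"INV-1": 1200.00, "INV-2": 560.00, "INV-3": 90.00}
reporting_db = {"INV-1": 1200.00, "INV-2": 650.00}

issues = find_inconsistencies(ledger_db, reporting_db)
print(issues)
```

Here INV-2 holds different amounts in the two systems and INV-3 is missing from the reporting database, so both are reported.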
WHY IS DATA CONSISTENCY IMPORTANT
Data consistency is critical for any organization that has data as an asset and relies on data to make business decisions, serve
customers, or comply with regulations.
Data consistency is also important for maintaining data quality and integrity. Organizations can be more confident in data that is
consistently accurate. Making better decisions, improving customer satisfaction, and achieving better business outcomes become
easier with data consistency.
So when we say data is inconsistent, what do we mean? The examples below illustrate:
Inconsistent data entry: Inconsistent data entry occurs when there is inconsistency in the format of data entry. For
instance, one employee may enter customer addresses as "block 1/23," while another may use "block 1-23." These
inconsistencies can lead to inaccuracies in customer information, shipping addresses, and billing information. This can
cause delays in shipping, billing errors, and customer frustration.
Duplicated data: Duplicated data occurs when the same information is entered multiple times in different parts of a
database. This can lead to confusion and inaccuracies. So if a customer's address is entered twice in different parts of
the database when it should not be, it can be difficult to know which address is correct. This can lead to shipping errors
and missed opportunities.
Inaccurate data: Inaccurate data occurs when data is entered incorrectly or when data changes are not updated
promptly. This can lead to incorrect business decisions, lost opportunities, and legal problems. So if a company's
financial records are inaccurate, it may result in incorrect tax filings or legal compliance issues.
Incomplete data: Incomplete data occurs when important information is missing from a database. This can lead to
incorrect business decisions, missed opportunities, and customer frustration. So if a customer's contact information is
incomplete, the company may be unable to reach out to them with marketing offers or customer service inquiries.
In all of these instances, data inconsistency can lead to serious implications for a business, including lost opportunities,
decreased efficiency, and legal problems. Data consistency is essential for any organization that wants to make the most of its
data assets.
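The inconsistent-entry and duplicated-data examples above can be handled by normalizing values to one agreed format before they are stored or compared. The Python sketch below uses the "block 1/23" and "block 1-23" variants from the text; the function name and the choice of upper case with a hyphen as the canonical form are assumptions made for illustration.

```python
import re

def normalize_address(text):
    """Map variants such as 'block 1/23' and 'Block 1-23' to one format.

    The canonical form chosen here ('BLOCK 1-23') is an assumption.
    """
    text = re.sub(r"\s+", " ", text.strip()).upper()
    # Treat '/' and '-' between two numbers as the same separator.
    return re.sub(r"(\d+)\s*[/-]\s*(\d+)", r"\1-\2", text)

entries = ["block 1/23", "Block 1-23", "  block 1 / 23 "]
normalized = [normalize_address(e) for e in entries]
unique_addresses = sorted(set(normalized))
print(unique_addresses)
```

Once the three entries share one format, they collapse to a single address, which also exposes them as duplicates.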
COMMON CAUSES OF DATA INCONSISTENCY:
Inconsistent data is not always the result of human error; sometimes it originates in the system. Below are some common causes
of data inconsistency:
Incomplete data entry: This occurs when some data is missing and can happen due to human error or system issues.
Human error in data entry: Human error is another common cause of data inconsistency. This can include typos,
incorrect data formatting, or incorrect data entry due to a lack of knowledge or training.
Outdated or incorrect data sources: When data is pulled from outdated or incorrect sources, it can lead to
inconsistencies. This happens when data is not up-to-date or when data from multiple sources is not integrated
correctly and does not make sense.
Lack of data integration across systems: Data stored in multiple systems or databases can lead to inconsistencies if the
data is not integrated correctly. This can happen when data is stored in silos, or integration tools are incorrectly used.
It is essential to understand not only where your data comes from (the source) but also where it is going (the databases)
and everything that happens in between (the integration); only then will you be able to achieve data consistency.
THE WAYS TO ACHIEVE DATA CONSISTENCY:
There are ways to achieve data consistency, including:
Data validation: It requires checking data against preset rules and standards to ensure the data is accurate, complete,
and consistent. This can be done using validation software or through manual review.
Strict data entry policies: Strict data entry rules should be implemented to ensure that data is entered correctly and
consistently. This can include establishing data entry standards, providing employee training and education, and
regularly reviewing data entry processes.
Data integration: Data integration involves consolidating data from multiple sources to create a unified view of the
data. This can be done using integration tools, which can help ensure that data is consistent across all related systems
and databases.
If we act as good gatekeepers by strictly regulating the entry and exit of data, very little further work is needed
to maintain data consistency.
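Data validation against preset rules, as described above, can be sketched as a table of checks applied to each record before it is accepted. The field names, the rules, and the sample records below are hypothetical, and the email pattern is deliberately simple rather than a complete address check.

```python
import re

# Preset rules (assumed for illustration): each field maps to a check.
RULES = {
    "name": lambda v: isinstance(v, str) and v.strip() != "",
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "age": lambda v: isinstance(v, int) and 0 < v < 120,
}

def validate(record):
    """Return a list of fields that are missing or break a rule."""
    errors = []
    for field, check in RULES.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not check(record[field]):
            errors.append(f"{field}: invalid")
    return errors

good = {"name": "Anil", "email": "anil@example.com", "age": 34}
bad = {"name": " ", "email": "anil-at-example.com"}

print(validate(good))
print(validate(bad))
```

A record is stored only when the returned list is empty; otherwise the errors can be shown to the person entering the data, which supports the strict data entry policies mentioned above.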
DATABASE ADMINISTRATOR
The Database Administrator, better known as the DBA, is the person (or group of persons) responsible for the well-being of the
database management system. The DBA has the following functions and responsibilities regarding database management:
1. Definition of the schema, the three-level architecture of data abstraction, and data independence.
2. Modification of the defined schema as and when required.
3. Definition of the storage structure and the access method for the stored data, i.e. sequential, indexed, or direct.
4. Creating new user IDs, passwords, etc., and defining the access permissions that each user does or does not have. The DBA is
responsible for creating user roles, which are collections of permissions (such as read and write) granted to or withheld from a
class of users. The DBA can also grant additional permissions to a user, or revoke existing ones, if need be.
5. Defining the integrity constraints for the database to ensure that the data entered conforms to certain rules, thereby increasing
the reliability of the data.
6. Creating a security mechanism to prevent unauthorized access and accidental or intentional handling of data that can cause a
security threat.
7. Creating a backup and recovery policy. This is essential because, in case of a failure, the database must be able to restore itself
to its complete functionality with no loss of data, as if the failure had never occurred. It is essential to keep regular backups of
the data so that if the system fails, all data up to the point of failure will be available from stable storage. Only the
data gathered during the failure would then have to be fed to the database to recover it to a healthy status.
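The role and permission duties in item 4 above can be pictured with a small Python model. This is a toy sketch and not the interface of any real DBMS: the AccessControl class, its method names, and the user IDs are all invented for illustration. In an SQL database the DBA would normally use the system's own role and privilege statements.

```python
class AccessControl:
    """Toy model of roles, grants, and revocations (not a real DBMS API)."""

    def __init__(self):
        self.roles = {}         # role name -> set of permissions
        self.user_roles = {}    # user id -> role name
        self.extra_grants = {}  # user id -> permissions added for that user
        self.revoked = {}       # user id -> permissions taken away

    def create_role(self, role, permissions):
        self.roles[role] = set(permissions)

    def create_user(self, user_id, role):
        self.user_roles[user_id] = role
        self.extra_grants[user_id] = set()
        self.revoked[user_id] = set()

    def grant(self, user_id, permission):
        self.extra_grants[user_id].add(permission)
        self.revoked[user_id].discard(permission)

    def revoke(self, user_id, permission):
        self.revoked[user_id].add(permission)
        self.extra_grants[user_id].discard(permission)

    def can(self, user_id, permission):
        role_permissions = self.roles[self.user_roles[user_id]]
        allowed = role_permissions | self.extra_grants[user_id]
        return permission in (allowed - self.revoked[user_id])

acl = AccessControl()
acl.create_role("clerk", ["read"])
acl.create_role("manager", ["read", "write"])
acl.create_user("u101", "clerk")
acl.create_user("u102", "manager")

acl.grant("u101", "write")   # additional permission for one user
acl.revoke("u102", "write")  # existing permission withdrawn
print(acl.can("u101", "write"), acl.can("u102", "write"))
```

Each user starts with the permissions of a role, and individual grants or revocations adjust that set without redefining the role for everyone else.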
WHAT IS A DATABASE?
A database is a collection of structured data. A database captures an abstract representation of the domain of an application.
• Typically organized as “records” (traditionally, large numbers, on disk)
• And relationships between records
This class is about database management systems (DBMS): systems for creating, manipulating, and accessing a database.
A DBMS is a (usually complex) piece of software that sits in front of a collection of data and mediates applications' access to
the data, guaranteeing many properties about the data and the accesses.
Definition of data: By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the
names, telephone numbers, and addresses of the people you know.
Database management system (DBMS) is a combination of two terms: database and management system. Combining the
meanings of both gives the definition of a DBMS.
A database management system (DBMS) is a collection of interrelated data and a set of programs to access those data. In other
words, a DBMS is a collection of programs that enables users to create and maintain a database.
The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and
sharing databases among various users and applications. Defining a database involves specifying the data types, structures, and
constraints for the data to be stored in the database. Constructing the database is the process of storing the data itself on some
storage medium that is controlled by the DBMS. Manipulating a database includes such functions as querying the database to
retrieve specific data, updating the database to reflect changes in the miniworld (the part of the real world that the database
models), and generating reports from the data. Sharing a database allows multiple users and programs to access the database
concurrently.
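The defining, constructing, and manipulating steps described above map directly onto SQL statements. The sketch below uses Python's standard sqlite3 module as a small stand-in for a DBMS; the student table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining: specify data types, structures, and constraints.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " marks INTEGER CHECK (marks BETWEEN 0 AND 100))"
)

# Constructing: store the data itself.
conn.executemany(
    "INSERT INTO student (roll_no, name, marks) VALUES (?, ?, ?)",
    [(1, "Neha", 82), (2, "Imran", 67), (3, "Kavya", 91)],
)

# Manipulating: update a record, then query for specific data.
conn.execute("UPDATE student SET marks = 70 WHERE roll_no = 2")
toppers = conn.execute(
    "SELECT name FROM student WHERE marks >= 80 ORDER BY name"
).fetchall()

# Manipulating: generate a simple report from the data.
count, average = conn.execute(
    "SELECT COUNT(*), AVG(marks) FROM student"
).fetchone()
print(toppers, count, average)
```

Sharing is not shown here because an in-memory database is private to one connection; a server-based DBMS lets many users and programs run such statements concurrently.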
ADVANTAGES AND DISADVANTAGES OF DATABASE MANAGEMENT SYSTEM
We must evaluate whether there is any gain in using a DBMS over a situation where we do not use it. Let us summarize the
advantages.
1. Reduction of redundancy: This is perhaps the most significant advantage of using a DBMS. Redundancy is the problem of
storing the same data item in more than one place. Redundancy creates several problems, such as requiring extra storage space,
entering the same data more than once during insertion, and deleting data from more than one place during deletion. Anomalies
may occur in the database if insertions, deletions, etc. are not done properly.
2. Sharing of data: In paper-based record keeping, data cannot easily be shared among many users. But in a computerized DBMS,
many users can share the same database if they are connected via a network.
3. Data integrity: We can maintain data integrity by specifying integrity constraints, which are rules and restrictions about what
kind of data may be entered or manipulated within the database. This increases the reliability of the database, because
data that violates the constraints cannot be stored at any point in time.
4. Data security: We can restrict certain people from accessing the database, or allow them to see only certain portions of the
database while blocking sensitive information. This is not easily possible in paper-based record keeping.
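The integrity constraints mentioned in point 3 above can be seen in action with Python's standard sqlite3 module, used here as a stand-in for a DBMS. The department and employee tables, the helper try_insert, and the sample rows are hypothetical. Note that SQLite enforces foreign keys only after the PRAGMA shown below is set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL CHECK (salary > 0),"
    " dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Finance')")

def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert((1, "Sunita", 45000, 10)),  # valid row
    try_insert((2, "Vikram", -500, 10)),   # breaks the CHECK on salary
    try_insert((3, None, 30000, 10)),      # breaks NOT NULL on name
    try_insert((4, "Farah", 52000, 99)),   # refers to a department that does not exist
]
print(results)
```

The rules are declared once in the schema, and the DBMS then refuses every row that breaks them, regardless of which application tries to insert it.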
HOWEVER, THERE COULD BE A FEW DISADVANTAGES OF USING DBMS.
They are as follows:
1. As a DBMS needs computers, we have to invest a good amount in acquiring the hardware and software, installation facilities,
and training of users.
2. We have to keep regular backups because a failure can occur at any time. Taking a backup can be a lengthy process, and the
computer system may be unable to perform other jobs while it runs.
3. While a data security system is a boon of using a DBMS, it must be very robust. If someone can bypass the security system, then
the database becomes open to any kind of mishandling.