Data
What is Data?
Data is information that has been translated into a form that is efficient for movement or
processing. Relative to today's computers and transmission media, data is information converted
into binary digital form. It is acceptable for data to be used as a singular subject or a plural
subject. Raw data is a term used to describe data in its most basic digital format.
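As a rough illustration of "information converted into binary digital form", the sketch below turns a short piece of text into bits and back again. The helper names to_bits and from_bits are invented for this example, and UTF-8 is assumed as the text encoding.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 and render each byte as 8 binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: parse each 8-digit group back into a byte."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = to_bits("Hi")
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi
```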
The concept of data in the context of computing has its roots in the work of Claude Shannon, an
American mathematician known as the father of information theory. He ushered in binary digital
concepts based on applying two-value Boolean logic to electronic circuits. Binary digit formats
underlie the CPUs, semiconductor memories and disk drives, as well as many of the peripheral
devices common in computing today. Early computer input for both control and data took the
form of punch cards, followed by magnetic tape and the hard disk.
Early on, data's importance in business computing became apparent through the popularity of the
terms "data processing" and "electronic data processing," which, for a time, came to encompass
the full gamut of what is now known as information technology. Over the history of corporate
computing, specialization occurred, and a distinct data profession emerged along with the growth
of corporate data processing.
Types of Data:-
Two kinds of data exist: quantitative and qualitative.
Quantitative data deals with numbers and things you can measure objectively: dimensions such
as height, width, and length, as well as temperature, humidity, prices, area, and volume.
Qualitative data deals with characteristics and descriptors that can't be easily measured, but can
be observed subjectively—such as smells, tastes, textures, attractiveness, and color.
There are also different types of quantitative and qualitative data.
There are two types of quantitative data, which is also referred to as numeric data: continuous
and discrete. As a general rule, counts are discrete and measurements are continuous.
Discrete data:- A count that can't be made more precise. Typically it involves integers. For
instance, the number of children (or adults, or pets) in your family is discrete data, because you
are counting whole, indivisible entities: you can't have 2.5 kids, or 1.3 pets.
Continuous data:- By contrast, continuous data can be divided and reduced to finer and finer
levels. For example, you can measure the height of your kids at progressively more precise scales
(meters, centimeters, millimeters, and beyond), so height is continuous data.
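The difference between a count and a measurement can be sketched in a few lines. The pet list and the height value are hypothetical, and Fraction is used only so that the unit conversions stay exact.

```python
from fractions import Fraction

# Discrete: counting whole entities always yields an integer.
pets = ["cat", "dog", "goldfish"]
pet_count = len(pets)

# Continuous: the same height can be reported at finer and finer scales.
height_m = Fraction("1.3724")   # hypothetical height in meters, stored exactly
height_cm = height_m * 100
height_mm = height_m * 1000

print(pet_count)          # 3
print(float(height_cm))   # 137.24
print(float(height_mm))   # 1372.4
```

A count such as pet_count has nothing between 3 and 4, whereas the height could always be measured with one more digit of precision.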
There are three main kinds of qualitative data: binary, nominal, and ordinal.
Binary data:- Places things in one of two mutually exclusive categories: right/wrong, true/false,
or accept/reject.
For instance:-
Occasionally, I'll get a box of Jujubes that contains a couple of individual pieces that are either
too hard or too dry. If I went through the box and classified each piece as "Good" or "Bad," that
would be binary data. I could use this kind of data to develop a statistical model to predict how
frequently I can expect to get a bad Jujube.
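A very simple version of such a model is the observed proportion of bad pieces. The sketch below assumes a made-up inspection of ten pieces and treats every piece as independently bad with the same probability, which is a simplification.

```python
# Hypothetical inspection results for one box; the labels are invented.
pieces = ["Good", "Good", "Bad", "Good", "Good",
          "Good", "Bad", "Good", "Good", "Good"]

bad_count = sum(1 for piece in pieces if piece == "Bad")
bad_rate = bad_count / len(pieces)

# Expected number of bad pieces in a future box of 50, assuming the same rate.
box_size = 50
expected_bad = bad_count * box_size / len(pieces)

print(bad_rate)      # 0.2
print(expected_bad)  # 10.0
```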
Nominal data:- Assigns individual items to named categories that do not have an implicit or
natural value or rank. If I went through a box of Jujubes and recorded the color of each in my
worksheet, that would be nominal data.
This kind of data can be used in many different ways—for instance, I could use chi-square
analysis to see if there are statistically significant differences in the amounts of each color in a
box.
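To make the chi-square idea concrete, here is a minimal sketch that computes the goodness-of-fit statistic by hand. The color counts are invented, and the null hypothesis assumed here is that every color is equally likely.

```python
from collections import Counter

# Hypothetical color tally for one box; the counts are invented.
colors = ["red"] * 18 + ["green"] * 12 + ["yellow"] * 10 + ["orange"] * 20
observed = Counter(colors)

# Under the assumed null hypothesis, each color has the same expected count.
total = sum(observed.values())
expected = total / len(observed)

chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
degrees_of_freedom = len(observed) - 1

# 7.815 is the standard 5% critical value for 3 degrees of freedom.
CRITICAL_VALUE_DF3 = 7.815

print(round(chi_square, 2))             # 4.53
print(chi_square > CRITICAL_VALUE_DF3)  # False
```

With these made-up counts the statistic stays below the critical value, so the differences between colors would not be called statistically significant at the 5% level.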
Ordinal data:- Assigns items to categories that do have some kind of implicit or
natural order, such as "Short, Medium, or Tall." Another example is a survey question that asks
us to rate an item on a 1 to 10 scale, with 10 being the best. This implies that 10 is better than 9,
which is better than 8, and so on.
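One way to work with ordered categories is to give each one an explicit rank. The sketch below uses invented responses and shows sorting by rank and picking the middle category; HEIGHT_ORDER is a name made up for this example.

```python
# Explicit rank for each category; the order itself is the information.
HEIGHT_ORDER = {"Short": 0, "Medium": 1, "Tall": 2}

# Hypothetical responses, invented for illustration.
responses = ["Tall", "Short", "Medium", "Medium", "Tall", "Short", "Medium"]

# Sorting by rank (not alphabetically) respects the natural order.
ranked = sorted(responses, key=HEIGHT_ORDER.__getitem__)

# With an odd number of responses, the middle item is the median category.
median_category = ranked[len(ranked) // 2]

print(ranked)
print(median_category)  # Medium
```

The median only relies on the order of the categories, not on the size of the gaps between them, which is why it is a common choice for this kind of data.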
The uses for ordinal data are a matter of some debate among statisticians. Everyone agrees it's
appropriate for creating bar charts, but beyond that the answer to the question "What should I do
with my ordinal data?" is "It depends."
Big Data
What is Big Data?
Big Data is a huge amount of data that cannot be stored or processed using traditional approaches
within a given time.
Big data is an evolving term that describes a large volume of structured, semi-structured and
unstructured data that has the potential to be mined for information and used in machine learning
projects and other advanced analytics applications.
Big data doesn't equate to any specific volume of data, but the term is often used to describe
terabytes, petabytes and even exabytes of data captured over time. Even a small amount of data
can be referred to as “Big Data” depending on the context in which it is used.
For instance: If we try to attach a 100 MB document to an email, we will not be able to do so
if the email system does not support an attachment of this size. Therefore, with respect to
email, this 100 MB attachment can be referred to as “Big Data”.
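The idea that "big" depends on context can be sketched as a comparison against a per-context limit. The limits below (25 MB for an email attachment, 500 GB for a local disk) are hypothetical values chosen only for illustration, as is the function name is_big_for.

```python
MB = 10**6
GB = 10**9

# Hypothetical capacity limits, chosen only to illustrate the idea.
LIMITS_BYTES = {
    "email attachment": 25 * MB,
    "local disk": 500 * GB,
}


def is_big_for(context: str, size_bytes: int) -> bool:
    """Data is 'big' for a context when it exceeds what that context can handle."""
    return size_bytes > LIMITS_BYTES[context]


document = 100 * MB
print(is_big_for("email attachment", document))  # True
print(is_big_for("local disk", document))        # False
```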
Types of Big Data:-
Structured
By structured data, we mean data that can be processed, stored, and retrieved in a fixed format. It
refers to highly organized information that can be readily and seamlessly stored and accessed
from a database by simple search engine algorithms.
Structured data can be directly processed by computing equipment because it follows a predefined
format and is often in numeric form.
Examples of structured data are purchase order data, product IDs and quantities, customer IDs,
and Excel spreadsheets.
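A small sketch of why a fixed format makes data easy to store and retrieve: the table below uses Python's built-in sqlite3 module with an in-memory database. The table name, column names, and rows are all invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_orders ("
    "order_id INTEGER, customer_id INTEGER, product_id TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO purchase_orders VALUES (?, ?, ?, ?)",
    [(1, 101, "P-7", 3), (2, 102, "P-9", 1), (3, 101, "P-9", 5)],
)

# Because every row has the same columns, a query can address fields directly.
total_for_customer = conn.execute(
    "SELECT SUM(quantity) FROM purchase_orders WHERE customer_id = ?", (101,)
).fetchone()[0]

print(total_for_customer)  # 8
conn.close()
```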
Unstructured
Unstructured data refers to the data that lacks any specific form or structure whatsoever. This
makes it very difficult and time-consuming to process and analyze unstructured data.
Unstructured data is mostly non-numeric and can rarely be computed on without prior
transformation.
Examples of unstructured data are image files, audio files, and video files.
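The need for prior transformation can be sketched with a toy example. The eight bytes below stand in for a tiny grayscale image (the values are invented), and the 4x2 layout is an assumption that has to be supplied from outside, because nothing in the bytes themselves describes it.

```python
# Eight raw bytes standing in for a tiny grayscale image (invented values).
raw = bytes([0, 64, 128, 255, 255, 128, 64, 0])

# Nothing in the bytes says how to read them; a 4x2 layout is assumed here.
width, height = 4, 2
rows = [list(raw[r * width:(r + 1) * width]) for r in range(height)]

# Once the bytes are arranged into rows, numeric features can be computed.
row_means = [sum(row) / width for row in rows]

print(rows)       # [[0, 64, 128, 255], [255, 128, 64, 0]]
print(row_means)  # [111.75, 111.75]
```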
Semi-structured
Semi-structured data pertains to data containing both of the formats mentioned above, that is,
structured and unstructured data.
It has no single fixed format associated with it.
For example: email, Word documents, and log files.
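Email is a convenient example of this mix. The sketch below parses a made-up message with Python's standard email package: the headers behave like named fields, while the body is free text. The addresses and message content are invented.

```python
from email import message_from_string

# A made-up message; addresses and content are invented for illustration.
raw_email = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Order 1042\n"
    "\n"
    "Hi Bob, the shipment left this morning. Call me if it is late.\n"
)

msg = message_from_string(raw_email)

# Structured part: named header fields that can be looked up directly.
sender = msg["From"]
subject = msg["Subject"]

# Unstructured part: free text with no predefined fields.
body = msg.get_payload()

print(sender)   # alice@example.com
print(subject)  # Order 1042
print(body)
```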
Characteristics of Big Data:-
1) Variety
Variety of Big Data refers to the structured, unstructured, and semi-structured data that is
gathered from multiple sources. While in the past data could only be collected from spreadsheets
and databases, today data comes in an array of forms such as emails, PDFs, photos, videos, audio,
social media posts, and so much more.
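A rough way to get a feel for variety is to tally incoming files by category. The mapping below follows the examples of structured, semi-structured, and unstructured data given earlier; the specific file extensions and file names are assumptions made for this sketch.

```python
from collections import Counter
from pathlib import PurePath

# Extension-to-category mapping; the extensions are an assumption of this sketch.
CATEGORY_BY_SUFFIX = {
    ".xlsx": "structured",
    ".eml": "semi-structured",
    ".docx": "semi-structured",
    ".log": "semi-structured",
    ".jpg": "unstructured",
    ".mp3": "unstructured",
    ".mp4": "unstructured",
}


def categorize(filename: str) -> str:
    """Look up the category by lowercased file extension."""
    return CATEGORY_BY_SUFFIX.get(PurePath(filename).suffix.lower(), "unknown")


# Hypothetical batch of incoming files.
incoming = ["orders.xlsx", "server.log", "holiday.JPG",
            "memo.docx", "intro.mp4", "notes.txt"]
mix = Counter(categorize(name) for name in incoming)
print(mix)
```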
2) Velocity
Velocity essentially refers to the speed at which data is being created in real time. From a
broader perspective, it comprises the rate of change, the linking of incoming data sets at
varying speeds, and activity bursts.
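Velocity and activity bursts can be illustrated with a handful of arrival times. The timestamps below are invented; the sketch compares the average rate over the whole window with the busiest single second.

```python
from collections import Counter

# Hypothetical arrival times, in seconds since the start of observation.
arrivals = [0.1, 0.4, 0.9, 1.2, 2.0, 2.1, 2.2, 2.3, 2.8, 4.5]

# Average rate over the whole five-second window.
window_seconds = 5.0
average_rate = len(arrivals) / window_seconds

# Per-second counts expose bursts that the average hides.
per_second = Counter(int(t) for t in arrivals)
peak_second, peak_count = max(per_second.items(), key=lambda item: item[1])

print(average_rate)             # 2.0
print(peak_second, peak_count)  # 2 5
```

Here the average is two events per second, but five events arrive during second 2, which is the kind of burst a system has to be sized for.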
3) Volume
We already know that Big Data indicates huge ‘volumes’ of data that are generated on a
daily basis from various sources like social media platforms, business processes, machines,
networks, human interactions, etc. Such a large amount of data is stored in data warehouses.
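To give a sense of how daily volumes add up, the sketch below formats byte counts using decimal units (powers of 1000). The figure of 500 GB per day is hypothetical, and human_readable is a helper invented for this example.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(size_bytes: int) -> str:
    """Format a byte count using decimal (power-of-1000) units."""
    size = float(size_bytes)
    for unit in UNITS:
        # Stop at the first unit that keeps the number below 1000,
        # or at the largest unit available.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


# Hypothetical: 500 GB of new data per day, accumulated for one year.
daily_bytes = 500 * 10**9
yearly_bytes = daily_bytes * 365

print(human_readable(daily_bytes))   # 500.0 GB
print(human_readable(yearly_bytes))  # 182.5 TB
```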