DATA ORGANIZATION
PRASHANTH B N
Assistant Professor
Department of Mechanical Engineering
Amrita School of Engineering
INTRODUCTION
Have you ever had trouble finding an important item in your room because the room was
cluttered and disorganized?
Have you ever had trouble finding a file on your computer because you could not
remember what it was named?
Computing systems commonly manage billions of items with every passing moment.
A computing system must keep precise account of this data as it flows through the system
and this requires close attention to every small detail.
Data organization explains that all data must be well organized and properly identified in
order to be usable.
NAMES
Theodor Seuss Geisel, also known as Dr. Seuss, once wrote a short story about a woman
who had 23 sons each having the name Dave.
The following is the opening stanza of this short story, “Too Many Daves”.
Did I ever tell you that Mrs. McCave
Had twenty-three sons and she named them all Dave?
Well, she did. And that wasn’t a smart thing to do.
You see, when she wants one and calls out, “Yoo-Hoo!
Come into the house, Dave!” she doesn’t get one.
All twenty-three Daves of hers come on the run!
This makes things quite difficult at the McCaves’
As you can imagine, with so many Daves.
The story goes on to say that Mrs. McCave wished she had named each son differently so
that there would not be so much difficulty.
NAMES
The same thing is true of computing.
When computational data is inappropriately or confusingly named it becomes very
difficult to access the data.
The same principle is not only true in the context of computation but is equally true in
real life.
Although it can be very difficult to determine the correct name for at item, if the name is
appropriately something in the real world, some degree of confusion and difficulty will
inevitably follow.
If, for example, we mistakenly identify a coral snake (a venomous variety) as a scarlet
snake (a nonvenomous variety) we are likely to endure much more difficulty than Mrs.
McCave!
GUIDELINES
There are guidelines help to ensure that two different people or computing systems can
identify, locate, and access the data without confusion.
Few guidelines that should be followed to properly name items in a computational
setting.
Names should be unique.
One item should not have more than one name.
A name should be descriptive.
The name of an item should be related to the location of the item.
GUIDELINES
Names should be unique
A name should refer to only one thing and never more than one.
Mrs. McCave, for example, used one name to refer to many sons and this caused
confusion at her household.
If name only refers to a single item, then whenever that name is used there is no
confusion about what item is being referred to.
One item should not have more than one name
If one item has two different names, it can be confusing to communicate with other
people (or other computing systems) about that item.
If, for example, I refer to the author of “Too Many Daves” as Theodor and you refer to
the author as Dr. Seuss; communication may be more difficult than if we both used the
same name.
GUIDELINES
A name should be descriptive
When computing with data, the name of an item should describe its role or function
within the system.
What if we decided to give random names to all of the MP3 music files on our computer?
We might, for example, end up with files named zqiy.mp3, gzpu.mp3, and gaep.mp3.
Although each name would be unique and no file would have more than one name, the
names themselves engender confusion because they are not related to the content of the
music.
Descriptive names reduce confusion by helping us understand the role or the content of
the item.
A file named The Star Spangled Banner.mp3 is preferable to naming that same file
zqiy.mp3.
GUIDELINES
The name of an item should be related to the location of the item
Computing systems must manage and organize an overwhelming amount of data.
The World Wide Web, for example, is a collection of trillions upon trillions of pieces of
data and each of these items must not only be uniquely named but also quickly located in
order to be useful.
NAMING GUIDELINES EXAMPLE
The World Wide Web provides one of the best examples of the importance of proper
naming.
Even though the web contains trillions upon trillions of pages of information, every piece
of this information has a unique name.
The names are informally known as web addresses but are formally known as Uniform
Resource Locators (URLs).
The URL of a web page is technically not the name of the item itself, but is rather a way
of locating the item.
Nonetheless, in a practical sense, the URL serves as both the name and the location of an
item.
NAMING GUIDELINES EXAMPLE
Names should be unique
URLs are unique since a single URL will not refer to more than one website.
For example, http://www.wikipedia.org/ always refers to the Wikipedia website and not to
some weather or news page.
If this URL is typed into a web browser, one will always be directed to the main
Wikipedia page and not to a variety of randomly determined websites.
One item should not have more than one name
Although the guideline can be violated, it rarely ever is.
Most websites do not have two different URLs to refer to the same page on the site.
It is very rare, for example, to type two different URLs into a web browser and be
directed to the same web page.
NAMING GUIDELINES EXAMPLE
The name of an item should be related to the location of the item
Every URL indicates not only the content of the page but is primarily related to where the
page can be located.
Every computer on the Internet has a name and the first part of a URL identifies one
computer on the Internet.
Each computer also has a file system and the remaining parts of a URL correspond to the
location of a file in that file system.
For example, the URL http://en.wikipedia.org/wiki/Coral_snake indicates, roughly, that
the computer named en.wikipedia.org has a folder named wiki inside of which is a file
named Coral_snake.
NAMING GUIDELINES EXAMPLE
A name should be descriptive
Most URLs contain a description of the information on the web page.
Consider, for example, the URL http://en.wikipedia.org/wiki/Coral_snake.
The URL is the English language (en) Wikipedia page describing coral snakes.
It would be surprising to load this page into a browser and see a list of news items related
to recent volcanic eruptions or a description of the top grossing movies of all time.
The ability to properly identify and name items is essential when managing large amounts of
information.
It is also essential in real life since improper names create unnecessary confusion when
reasoning about and communicating information.
The most common ways of organizing information are as lists, graphs, and trees.
LISTS
Are you the type of person that makes a list of everything?
Perhaps you have a list of your favorite friends, a list of enemies, a shopping list, a list of
your favorite movies or favorite foods, a to-do list, a reading list, a wish list, or even a list
of all of your lists.
Much of the world’s data is best organized in list form and deep thought is required to
understand how lists can be stored and manipulated by computing systems.
A list is a sequence of items that are arranged in a particular order.
Consider the following list of the top five most expensive paintings of all time.
1. The Card Players by Paul Cezanne
2. No. 5, 1948 by Jackson Pollock
3. Woman III by Willem de Kooning
4. Portrait of Adele Bloch-Bauer I by Gustav Klimt
5. Portrait of Dr. Gachet by Vincent van Gogh
LISTS
Any item on the list can be identified by its position in the list.
For example, say that The Card Players is the number one most expensive painting of all
time.
Also, say that Portrait of Dr. Gachet is the fifth (or number five) most expensive painting
of all time.
When numbers are used to identify the things in a list, the technique is known as
indexing.
Indexing associates a unique number with every item in a set of data and therefore allows
the items to be identified by their index.
Since indices are used to identify data within a list; an index can be understood as a
unique name for one of the items in a list.
An item is denoted in a list by using brackets.
For example, if Paintings is the name of a list, the ith painting is denoted as Paintings[i].
LISTS
Using the notation, the first painting in the list is denoted as Paintings[1] and the second
item in the list is denoted as Paintings[2].
For now assume that the first item in a list has an index of 1, but most computing systems
assume the first item in a list has an index of 0.
For any computing system that uses zero as the starting index, the first item denoted in
the list as Paintings[0] and the second item in the list as Paintings[1].
Consider how a list of items can be stored in computer memory.
In computing systems the memory is a one-dimensional arrangement of items such that
each item is assigned a memory address.
All of the data in a computer is stored at some location in memory and each memory
location is numbered as a list starting from zero.
Each memory location can store one word of data where a word is the smallest unit of
data that is naturally stored by some computing system.
LISTS
Memory is linear.
Each piece of data in memory is located at a memory address.
Memory addresses are numbers or indices.
Even CD and DVD memory is organized linearly.
LISTS
Memory addresses always start at zero and are as large as the amount of memory in a
particular computing system.
Although hard disks, compact discs (CDs), and digital video discs (DVDs) are physically
shaped as discs, the data stored on these devices is also arranged in a linear manner.
Each disc is divided into concentric rings known as tracks.
Each track is divided into arcs known as sectors.
Each sector is able to store one word of data.
As a disc spins, a single track will scan through a sequence of words whose indices
increase linearly.
The first item in the list is denoted as list [first_location] and the second item in the list
as list [second_location].
LISTS – STORAGE ORGANIZATION
Data is stored at some location and each location has an address (Storage Organization).
Each piece of data is located at an address (List of Five Painting Records).
Addresses in lists are numbers (List Indices).
List Indices
Storage Organization List of Five Painting Records
ADDING ELEMENTS TO LIST
Lists can have new entries added anywhere, including in the middle, and their length can
increase as needed.
List after adding a new elements
Original List 1. Bohemian Rhapsody – Queen
1. Bohemian Rhapsody – Queen 2. Every Breath you take – The Police
2. Stairway to Heaven – Led Zeppelin 3. Stairway to Heaven – Led Zeppelin
3. Your Song – Elton John 4. Your Song – Elton John
4. Who made who – AC/DC 5. Who made who – AC/DC
5. Sweet Child O’ Mine – Guns N’ Roses 6. Sweet Child of Mine – Guns ‘n Roses
7. The Ghost of Tom Joad – Bruce Springsteen
REMOVING ELEMENTS FROM LIST
Entries can be removed from lists.
Original List
List after deletion of two elements
1. Bohemian Rhapsody – Queen
1. Bohemian Rhapsody – Queen
2. Every Breath you take – The Police
2. Stairway to Heaven – Led Zeppelin
3. Stairway to Heaven – Led Zeppelin
3. Your Song – Elton John
4. Your Song – Elton John
4. Who made who – AC/DC
5. Who made who – AC/DC
5. Sweet Child O’ Mine – Guns N’ Roses
6. Sweet Child of Mine – Guns ‘n Roses
7. The Ghost of Tom Joad – Bruce Springsteen
REAL TIME APPLICATIONS OF LISTS
We use lists all the time…
Whenever I go shopping.
The “To-Do” list is one of the most common ways that people use to organize their work
or home.
Many simple children's board games are based on list-like structures.
ARRAYS
ARRAYS
An array is perhaps the simplest way of storing a list in memory.
An array stores each item at the memory address corresponding to the items position in
the list.
If the list of the top five most expensive paintings are stored, the information is stored.
Since The Card Players is at position one, it is stored at address 1 and since the Portrait of
Dr. Gachet is at position five in the list, it is stored at address 5.
The list of five paintings can be stored as
an array in memory or on a DVD.
ARRAYS
Array is a fixed sized list comprising of collections of variables called as elements.
An array’s elements are numbered using indices (Organization of array)
The total number of elements in a given array is called as the length of an array.
All elements of a given array are of the same type which allows one to represent a group
of similar elements as an ordered sequence and work on them as a whole.
Arrays can be in different dimensions (1-D and 2-D array), but the most used are the
one-dimensional and the two-dimensional arrays.
One-dimensional arrays are also called vectors.
Organization of Array 1-D and 2-DArray
ARRAY PROPERTIES
Arrays represent a group of related data.
(e.g. -- temperature for the last five days, or stock prices for the last 30 days.)
All data within a single array must share the same data type.
The size of an array is fixed once it is created.
DECLARATION/CREATION OF AN ARRAY
Allocate an array with length of specified elements of mentioned type.
During the creation/declaration of the array one must set the total number of the elements
in the brackets (a non-negative integer number), defining its length.
Eg. number mark[6], string name[5]
The array cannot change its location nor can it change its length.
The array cannot expand to fill more nor can it shrink to take up less storage.
mark[6]
BOUNDARIES OF AN ARRAY
Arrays are by default zero-based, which means the enumeration of the elements starts
from 0.
The index of the first element is 0, and that of the second element is 1, and so on.
In an array of n elements, the last element has the index n-1.
Boundary of myArray is 5
ARRAYS STORAGE
The array elements can be accessed directly using their indices.
Each element can be accessed through the name of the array and the element’s index
(consecutive number) placed in the brackets.
For Eg. numberArray[2] produces the 3rd element, 6.
Every item in the array can be accessed, as long as the location of the array is in track.
The location of the array is known as the base address or the anchor, and is the memory
address of the first item in the array.
From a computer perspective, the name of an array corresponds to the anchor of an array.
If the name of the array is known, then the anchor is known and all of the array
items can be accessed by their index.
Access elements by….(anchor of array A)+(i-1)
ARRAYS STORAGE
If the name of the array is known, then the anchor is known and all of the array items can
be accessed by their index.
One very important property of an array is that once an array is stored in memory, the
array cannot change its location nor can it change its length.
The array cannot expand to fill more memory nor can it shrink to take up less memory.
The main advantage of using an array is that any element in an array can be easily
accessed by just knowing the position of the item in the list.
The main disadvantages are that the size of the array is fixed and that inserting and
deleting items in the list will often require a significant number of steps.
Can you show how can the elements (4,1, and 6) be accessed from
the array Arr=[1,2,3,4,5,6] ?
4-----------arr[3]
ANSWER…… 1-----------arr[0]
6-----------arr[5]
ARRAYS STORAGE
Two different lists that are stored in
memory as arrays.
The array named Paintings is anchored
as location 3 and the array named
Movies is anchored at location 8.
From the viewpoint of the computer,
the name Paintings actually means
memory location 3 and the name
Movies actually means memory
location 8.
Paintings array cannot expand since
there is another array stored
immediately after it.
ARRAYS DELETION AND ADDITION
Deleting an element from an array is very similar to deleting an item from a handwritten
list.
We can also insert an element into the array, but only if there are blank locations at the
end of the array.
Inserting an Element at
the Beginning
ARRAYS FUNDAMENTALS
1. GIVENTHE ARRAY SCORES,
scores 79 87 94 82 67 98 87 81 74 91
Can you find
How many elements are there?
What is the fifth element in the array?
What is the result of mean =
(scores[0]+scores[1])/2
2. Workout the average of Quiz1and student0
with the following95data? 85
89 Quiz1 80 Quiz2
Student0 Student1
ANSWERS
How many elements are there? 10
What is the fifth element in the array? 67 What is the result of
mean = (scores[0]+scores[1])/2; (79+87)/2 = 83
Can you workout the average of Quiz1and student0 with the
following data??? 92,90
READ AND DISPLAY ARRAY
ELEMENTS
1. Start
2. Declare A[10],i
3. Set i = 0
4. While (i < 10)
4.1 ReadA[i]
4.2 i = i + 1
5. Set i =0
6.While (i < 10)
6.1 PrintA[i]
6.2 i = i + 1
7. Stop
FIND SUM OF ARRAY
ELEMENTS
1. Start
2. Declare A[10], sum,i
3. Set sum = 0, i = 0
4. While (i < 10)
4.1 ReadA[i]
4.2 i = i +1
4.3 sum = sum +A[i]
5. Print sum
6. Stop
FIND AVERAGE OF ARRAY
ELEMENTS
1. Start
2. Declare A[10], sum, I, average
3. Set sum = 0, i = 0
4. While (i < 10)
4.1 Read A[i]
4.2 i = i +1
4.3 sum = sum + A[i]
5. average = sum / 10 6.Print
average
7. Stop
DISPLAY ODD ARRAY
ELEMENTS
1. Start
2. Declare A[10], i
3. Set i = 0
4. While (i < 10)
4.1 Read A[i]
4.2 i = i + 1
5. Set i =0
6. While (i < 10)
6.1 if ( A[i] mod 2 is equal to 0)
then Print A[i]
6.2 i = i + 1
7. Stop
MAXIMUM ARRAY ELEMENT
1. Start
2. Declare A[10], I, MAX
3. Set i = 0
4. While (i < 10)
4.1 Read A[i]
4.2 i = i + 1
5. Set i =1, MAX = A[0]
6.While (i < 10)
6.1 if ( A[i] >MAX)
then MAX= A[i]
6.2 i = i + 1
7. Print MAX
8. Stop
MULTI DIMENSIONALARRAYS
• Although it is useful to have data in one-dimensional arrays, it is sometimes
useful to arrange data into rows and columns (two dimensional arrays),
rows columns and ranks (three-dimensional), or even higher
dimensionality.
•A particular cell in a five-dimensional array may be accessed as follows:
Arr_name[dim1][dim2][dim3][dim4][dim5].
MULTI DIMENSIONAL ARRAYS CONTD---
• For eg., if there are eight students and ten questions, and the answers can
be stored in a two-dimensional array as given.
• Similarly, the distances between the cities can be represented using a two-
dimensional array as given.
MULTI DIMENSIONAL ARRAYS CONTD---
• An array can be declared with multiple dimensions.
2 Dimensional 3 Dimensional
• Multiple dimensions get difficult to visualize graphically ( say, from a
dot to a cube within a cube) .
•