Information Systems Basics
H. Turgut Uyar
Date: 2022-09-19
Version: 1.0
File System
data and programs are kept on secondary storage
conceptual unit: file
a folder is used to group files
also called a directory
a folder can contain other folders
file system hierarchy
top level folder: root
Unix File System
/
    etc
        passwd
    usr
        bin
            firefox
        lib
        share
    home
        turing
            Documents
                cv.pdf
            Music
    tmp
Paths
how can we refer to a file?
path: a sequence of folders, and then the file
absolute path: start from the root
relative path: start from the "current" folder
current folder: one dot (".")
parent folder (immediately above the current): two dots ("..")
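The difference between absolute and relative paths can be sketched in Python. This is a minimal illustration, assuming Unix-style paths; the current folder /home/turing is a hypothetical choice built from the folder names in the Unix example.

```python
from pathlib import PurePosixPath
import posixpath

# Hypothetical current folder (an assumption for this example).
current = PurePosixPath("/home/turing")

absolute = PurePosixPath("/home/turing/Documents/cv.pdf")
relative = PurePosixPath("Documents/cv.pdf")

print(absolute.is_absolute())  # True: starts from the root
print(relative.is_absolute())  # False: starts from the current folder

# A relative path is interpreted starting from the current folder.
resolved = current / relative
print(resolved)  # /home/turing/Documents/cv.pdf

# "." stays in the current folder, ".." moves to the parent folder.
via_dots = posixpath.normpath(str(current / "./../turing/Music"))
print(via_dots)  # /home/turing/Music
```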
File Manager
program for operating on files and folders
change the current folder
copy, rename, delete, …
File Types
text: human-readable, easier to work with
binary: only machine-readable, more efficient
File Name Extensions
file names have extension parts that indicate the type
starting from the last dot
for example: .pdf
not reliable: these can easily be changed
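As a small sketch of the "starting from the last dot" rule, Python's pathlib exposes the extension as the suffix of a path. The file names other than cv.pdf are made up for illustration.

```python
from pathlib import PurePosixPath


def extension(name: str) -> str:
    """Return the extension of a file name, starting from the last dot."""
    return PurePosixPath(name).suffix


print(extension("cv.pdf"))         # .pdf
print(extension("backup.tar.gz"))  # .gz (only the part after the last dot)
print(repr(extension("notes")))    # '' (no extension at all)
```

Renaming cv.pdf to cv.txt changes the result but not the content of the file, which is why the extension is only a hint.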
MIME
standard categorization of file types
https://www.iana.org/assignments/media-types/media-types.xhtml
format: type/subtype
types: image, audio, video, text, …
MIME Types
image/jpeg, image/png
audio/mpeg
video/mp4, video/x-matroska
application/pdf, application/zip
text/html, text/plain
not reliable: only a declaration
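The type/subtype format can be explored with Python's standard mimetypes module. This is only a sketch: split_mime is a helper name introduced here, and guess_type looks at the file name alone, so its answer is exactly as unreliable as the extension.

```python
import mimetypes


def split_mime(mime: str) -> tuple[str, str]:
    """Split a MIME type such as 'image/png' into (type, subtype)."""
    major, _, subtype = mime.partition("/")
    return major, subtype


# The guess is based on the extension only; the file is never opened.
guessed, _ = mimetypes.guess_type("cv.pdf")
print(guessed)                         # application/pdf
print(split_mime("video/x-matroska"))  # ('video', 'x-matroska')
```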
Archiving and Compression
combine files and folders into one archive file
compress a file for smaller file size
extract archive file to get the original contents
tar (archiving)
gzip, bzip2 (compression)
zip (both)
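The same ideas are available in Python's standard library. The sketch below works in memory, with made-up file names and contents, so it does not touch the disk: gzip only compresses, while zip combines several files and compresses them.

```python
import gzip
import io
import zipfile

# Made-up content; repetitive data compresses well.
text = b"information systems basics\n" * 100

# Compression: smaller size, and decompression restores the original.
packed = gzip.compress(text)
restored = gzip.decompress(packed)
print(len(text), len(packed))

# zip does both: several files end up in one compressed archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("Documents/notes.txt", text)
    archive.writestr("Documents/todo.txt", b"read chapter 1\n")

# Extraction: read the original contents back from the archive.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    extracted = archive.read("Documents/notes.txt")
print(names)
```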
Internet Addresses
how can we refer to something on the Internet?
resource: object to process
web page
document
computer
…
Resource Addresses
URL: Uniform Resource Locator
scheme://host/path
https://en.wikipedia.org/wiki/Alan_Turing
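A URL can be taken apart into the scheme, host and path components with Python's urllib.parse module, shown here as a brief sketch on the example address.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://en.wikipedia.org/wiki/Alan_Turing")

print(parts.scheme)    # https
print(parts.hostname)  # en.wikipedia.org
print(parts.path)      # /wiki/Alan_Turing
```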
Binary Numbers
computers represent information using binary numbers
only two values: 0, 1
bit: binary digit
Representing Numbers
digits correspond to powers of 2
2^4  2^3  2^2  2^1  2^0
16   8    4    2    1
Binary Value Examples
decimal binary
2 10
3 11
4 100
5 101
13 1101
22 10110
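The conversions in the table can be reproduced by hand-written functions. This is a sketch for non-negative integers; the function names are introduced here for illustration, and Python's built-in int(text, 2) and format(n, "b") do the same job.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the powers of 2, reading the digits from left to right."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)
    return value


def decimal_to_binary(n: int) -> str:
    """Collect remainders of division by 2, least significant digit first."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))
        n //= 2
    return "".join(reversed(digits))


print(binary_to_decimal("1101"))  # 13
print(decimal_to_binary(22))      # 10110
```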
Byte
smallest addressable unit of information: byte
8 bits
2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
128  64   32   16   8    4    2    1
MSB                                LSB
MSB: most significant bit
LSB: least significant bit
if we consider only non-negative numbers, the range is [0, 255]
Byte Value Examples
decimal binary
0 00000000
1 00000001
22 00010110
65 01000001
128 10000000
171 10101011
255 11111111
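The rows of this table can be generated with Python's format function. The helper name byte_to_bits is an assumption made for this sketch.

```python
def byte_to_bits(value: int) -> str:
    """Write a byte value as exactly 8 binary digits."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds values from 0 to 255")
    return format(value, "08b")


for value in (0, 1, 22, 65, 128, 171, 255):
    print(value, byte_to_bits(value))

# The MSB is the leftmost digit, the LSB is the rightmost one.
bits = byte_to_bits(171)
print(bits[0], bits[-1])  # 1 1
```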
Binary Value Notation
is the value decimal or binary?
101 or 5?
notation: binary values start with 0b
0b101
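Python uses the same 0b prefix, so the ambiguity can be shown directly in a short sketch.

```python
as_decimal = 101   # one hundred and one
as_binary = 0b101  # binary 101, which is five

print(as_decimal)       # 101
print(as_binary)        # 5
print(bin(5))           # 0b101
print(int("0b101", 2))  # 5
```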
Larger Numbers
larger numbers are represented using multiple bytes
561
1000110001
10 00110001
00000010 00110001
Byte Order
also called "endianness"
big endian: most significant byte first
little endian: least significant byte first
561
BE: 00000010 00110001
LE: 00110001 00000010
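The two byte orders for 561 can be checked with the to_bytes and from_bytes methods of Python integers. The helper as_bits is introduced here only for printing.

```python
def as_bits(data: bytes) -> str:
    """Show each byte as 8 binary digits, in storage order."""
    return " ".join(format(byte, "08b") for byte in data)


value = 561
big = value.to_bytes(2, byteorder="big")
little = value.to_bytes(2, byteorder="little")

print("BE:", as_bits(big))     # BE: 00000010 00110001
print("LE:", as_bits(little))  # LE: 00110001 00000010

# Reading the bytes back requires knowing which order was used.
print(int.from_bytes(little, byteorder="little"))  # 561
print(int.from_bytes(little, byteorder="big"))     # 12546
```

Interpreting little endian data as big endian gives 49 × 256 + 2 = 12546, a different number, so the byte order has to be agreed upon in advance.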
Larger Units
1000 B   kilobyte  KB     1024 B    kibibyte  KiB
1000 KB  megabyte  MB     1024 KiB  mebibyte  MiB
1000 MB  gigabyte  GB     1024 MiB  gibibyte  GiB
1000 GB  terabyte  TB     1024 GiB  tebibyte  TiB
1000 TB  petabyte  PB     1024 TiB  pebibyte  PiB
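The gap between the decimal and binary units grows with size. The sketch below formats a byte count both ways; format_size and the sample size of 500 000 000 000 bytes are assumptions made for illustration.

```python
DECIMAL_UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]
BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def format_size(size: int, binary: bool = False) -> str:
    """Express a byte count in the largest unit that fits."""
    base = 1024 if binary else 1000
    units = BINARY_UNITS if binary else DECIMAL_UNITS
    value = float(size)
    index = 0
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


print(format_size(500_000_000_000))               # 500.00 GB
print(format_size(500_000_000_000, binary=True))  # 465.66 GiB
```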
Hexadecimal Numbers
binary numbers are difficult to read
hexadecimal: base 16
digits correspond to powers of 16
16^3  16^2  16^1  16^0
4096  256   16    1
Hexadecimal Digits
dec  bin   hex    dec  bin   hex
8    1000  8      12   1100  C
9    1001  9      13   1101  D
10   1010  A      14   1110  E
11   1011  B      15   1111  F
Hexadecimal Notation
1 hex digit: 4 bits
1 byte: 2 hex digits
notation: hex values start with 0x
Hex Value Examples
dec bin hex
16 00010000 10
22 00010110 16
30 00011110 1E
65 01000001 41
128 10000000 80
171 10101011 AB
255 11111111 FF
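The table can be regenerated with Python's format specifiers, and the 0x prefix works in Python source as well. The helper name describe is made up for this sketch.

```python
def describe(n: int) -> str:
    """One table row: decimal, 8-bit binary, 2-digit hexadecimal."""
    return f"{n:3d} {n:08b} {n:02X}"


for n in (16, 22, 30, 65, 128, 171, 255):
    print(describe(n))

print(0xAB)           # 171
print(int("1E", 16))  # 30
print(hex(65))        # 0x41
```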
Hex-Binary Conversion
pair hexadecimal digits with groups of 4 bits
starting from the least significant bit
hex to binary:
F    3    C    0
1111 0011 1100 0000
1111001111000000
binary to hex:
10011011100001
0010 0110 1110 0001
2    6    E    1
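The grouping procedure can be written out as a short sketch. The function names are introduced here for illustration; padding with zeros on the left is what makes the groups of 4 start from the least significant bit.

```python
def hex_to_bits(hex_digits: str) -> str:
    """Replace every hex digit with its group of 4 bits."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)


def bits_to_hex(bits: str) -> str:
    """Group bits by 4 from the right and replace each group with a hex digit."""
    width = -(-len(bits) // 4) * 4  # round the length up to a multiple of 4
    padded = bits.zfill(width)      # leading zeros do not change the value
    groups = [padded[i:i + 4] for i in range(0, width, 4)]
    return "".join(format(int(group, 2), "X") for group in groups)


print(hex_to_bits("F3C0"))            # 1111001111000000
print(bits_to_hex("10011011100001"))  # 26E1
```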
Character Sets
how can we represent letters, punctuation signs, …?
we assign a number to each character
a set of all such assignments: character set
also called an "encoding"
ASCII Character Set
7 bits per character
128 characters
English letters
digits
punctuation signs
special characters
ASCII Table
char  #       char  #
!     0x21    A     0x41
#     0x23    B     0x42
7     0x37    Z     0x5A
?     0x3F    a     0x61
@     0x40    z     0x7A
the character '7' (numeric value 55)
is different from the number 7
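In Python, ord gives the number assigned to a character and chr goes the other way, which makes the difference between the character '7' and the number 7 easy to see in a small sketch.

```python
code_of_seven = ord("7")
print(code_of_seven)       # 55
print(hex(code_of_seven))  # 0x37
print(hex(ord("A")))       # 0x41
print(chr(0x5A))           # Z

# The character '7' is not the number 7 until it is converted.
print("7" == 7)       # False
print(int("7") == 7)  # True
```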
Case Sensitivity
'A' and 'a' have different numbers
most programs treat these as different characters
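Python string comparison is case sensitive for this reason. The sketch below uses two made-up strings and shows one common way to compare while ignoring case.

```python
first = "Turing"
second = "turing"

print(first == second)                        # False: 'T' and 't' differ
print(first.casefold() == second.casefold())  # True: case is ignored
print(ord("a") - ord("A"))                    # 32
```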
ISO8859 Sets
ASCII is only for English
8 bits per character: 256 characters
ISO8859-1: Western European
the first 128 are the same as ASCII
ISO8859-9: Turkish
Turkish letters replace the Icelandic letters
ISO8859-1 and ISO8859-9
# ISO8859-1 ISO8859-9
0x3F ? ?
0x41 A A
0xD6 Ö Ö
0xF6 ö ö
0xD0 Ð Ğ
0xF0 ð ğ
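The same bytes decode to different characters depending on which set is assumed. This sketch uses Python's built-in codecs for the two sets, with three byte values picked for illustration.

```python
# Three bytes: one in the ASCII range, two above it.
raw = bytes([0x41, 0xD0, 0xF0])

western = raw.decode("iso8859_1")
turkish = raw.decode("iso8859_9")

print(western)  # AÐð
print(turkish)  # AĞğ
```

The byte 0x41 is 'A' in both, since the first 128 values are shared with ASCII.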
Unicode
all characters in all writing systems
UTF-32: 32 bits per character
UTF-16: 16/32 bits per character
UTF-8: 8/16/24/32 bits per character
UTF-8 is the most widely used encoding
UTF Examples
char UTF-32 UTF-16 UTF-8
A 0x00000041 0x0041 0x41
Ö 0x000000D6 0x00D6 0xC396
Ğ 0x0000011E 0x011E 0xC49E
∞ 0x0000221E 0x221E 0xE2889E
举 0x00004E3E 0x4E3E 0xE4B8BE
💪 0x0001F4AA 0xD83DDCAA 0xF09F92AA
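The values in this table can be reproduced with str.encode. In this sketch, encoded_hex is a helper name introduced for illustration, and the big endian variants of UTF-16 and UTF-32 are chosen so that the bytes appear in the same order as in the table.

```python
def encoded_hex(char: str, encoding: str) -> str:
    """Encode one character and show the resulting bytes in hexadecimal."""
    return "0x" + char.encode(encoding).hex().upper()


for char in "AÖĞ∞举💪":
    print(
        char,
        encoded_hex(char, "utf-32-be"),
        encoded_hex(char, "utf-16-be"),
        encoded_hex(char, "utf-8"),
    )
```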
Metadata
data: information in the file
metadata: data describing the content
data: photograph
metadata: shooting location, date, …
data: song
metadata: title, artist, lyrics, …
Text File Metadata
actual data: text in the file
metadata: author, copyright, …
character set
Providing Metadata
in the same file along with the data
music files can contain title, artist, …
externally
character set of a text file