01 Introduction
The goal of this lecture is to teach the fundamentals of scientific programming in
the Python programming language. Before we get into it, let's consider some
general principles of computing and programming.
How does a computer work?
In an extremely simplified way, we can think of a computer as having three parts,
as illustrated here:
[Figure: CPU <-> RAM <-> Main Storage]
The Central Processing Unit (CPU) is the brain of the computer. It performs all
arithmetic and logical operations.
The Random Access Memory (RAM) (often just referred to as memory) is the
short-term memory. It's the place where the CPU gets its instructions from and
stores the results.
Finally, the Main Storage (aka Hard Drive) is the long-term memory of the
computer. This is where data can be stored permanently. Compared to the RAM, it
is typically bigger (it has a higher capacity) but slower in terms of reading and
writing.
As shown in the flowchart, the arrows between these components point both
ways. In practice, we might have some code stored on the main storage, which
can be read into the RAM. The CPU then draws its instructions from the RAM and
returns the results to the RAM. Finally, we can write the results back to the main
storage, when everything is done.
This explanation may seem rather trivial at this point, but it will be important to
understand these different steps when thinking about implementing an algorithm.
For example: Because the main storage is slow, we want to avoid constant
input/output (I/O) operations as much as possible. In terms of speed, it's best to read all
necessary data from disk once and keep it in memory for all operations. On the
other hand, the size of the RAM is limited, so it might not be possible to keep large
datasets in it. In this case we might have to think about cutting the data up in
batches and processing these one-by-one. Keeping the basic hierarchy of CPU,
RAM and storage in mind can help us make good decisions.
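The trade-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny temporary file stands in for a large dataset, and the names `batch_size`, `total_all_at_once` and `total_batched` are made up for this example.

```python
import os
import tempfile

# Create a small example file with one number per line
# (a stand-in for a large dataset on the main storage).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    for i in range(1, 11):
        f.write(f"{i}\n")
    path = f.name

# Strategy 1: read everything into RAM once, then compute.
with open(path) as f:
    numbers = [int(line) for line in f]
total_all_at_once = sum(numbers)

# Strategy 2: keep at most `batch_size` numbers in RAM at any time.
batch_size = 4
total_batched = 0
batch = []
with open(path) as f:
    for line in f:
        batch.append(int(line))
        if len(batch) == batch_size:
            total_batched += sum(batch)
            batch = []  # discard the batch before reading the next one
total_batched += sum(batch)  # process the final, possibly incomplete batch

os.remove(path)  # clean up the example file
print(total_all_at_once, total_batched)
```

Both strategies give the same sum. The first is simpler and fast when the data fits into RAM; the second also works when it does not.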
What is an algorithm?
Algorithms are one of the most important concepts in scientific programming (or
programming more generally). An algorithm is a process or a set of rules to be
followed when performing a calculation (or solving some other problem). Indeed,
many things quite unrelated to computers are algorithms. Think of this flowchart:
[Flowchart: Lamp doesn't work]
Plugged in?
  No: Plug in
  Yes: Bulb burned out?
    Yes: Change bulb
    No: Repair lamp
Other examples of everyday algorithms are cooking recipes or cleaning
instructions for a coffee machine.
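The lamp flowchart translates almost directly into code. The following is a minimal sketch; the function name `fix_lamp` and its two arguments are chosen here for illustration only.

```python
def fix_lamp(plugged_in, bulb_burned_out):
    """Return the action suggested by the lamp flowchart."""
    if not plugged_in:
        return "Plug in"
    if bulb_burned_out:
        return "Change bulb"
    return "Repair lamp"


print(fix_lamp(plugged_in=False, bulb_burned_out=False))  # Plug in
print(fix_lamp(plugged_in=True, bulb_burned_out=True))    # Change bulb
print(fix_lamp(plugged_in=True, bulb_burned_out=False))   # Repair lamp
```

Each question in the flowchart becomes an `if` statement, and each end point becomes a returned action.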
Thinking in terms of algorithms is very important for programming, because
computers (by and large) don't make mistakes. Nonetheless, the world is flooded
with bugs and computer errors of all kinds. The reason for this is in most cases
that the algorithms according to which the computer is programmed are not
working as intended by the programmer. This is not the fault of the computer: it
follows all instructions to the letter. The problem is rather that the algorithm has
some loophole, where it behaves unexpectedly, or that it was implemented
incorrectly (coding error). In the end, the fault lies with the programmer, not the
computer. This is the case even if the programmer used ChatGPT to write the
code.
This is why algorithmic thinking is so important. It's the only way to understand
what the computer should actually be doing, and it's the only way to figure out why
it's not giving us the results we are expecting.
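As a small illustration of such a loophole, consider the rule "the mean is the sum divided by the number of values". The sketch below uses two hypothetical functions, `mean_naive` and `mean_checked`; returning `None` for an empty list is an assumption made for this example, not the only sensible choice.

```python
def mean_naive(values):
    # Follows the rule "sum divided by count" to the letter.
    return sum(values) / len(values)


def mean_checked(values):
    # Same rule, but the empty case is handled explicitly.
    if len(values) == 0:
        return None  # assumption: we signal "no result" with None
    return sum(values) / len(values)


print(mean_naive([1, 2, 3]))  # 2.0
print(mean_checked([]))       # None

try:
    mean_naive([])
except ZeroDivisionError as error:
    print("Loophole hit:", error)
```

The computer did exactly what it was told in both cases. The naive version fails for an empty list because the algorithm never said what should happen there.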
What is a programming language?
Programming languages are the way in which we can communicate to a computer
what algorithm to run. Most programming languages are very general, so that we
can in principle implement almost all algorithms in C++, Rust, MATLAB, Julia,
and Python (or a million other languages) and obtain the same results. In this
sense, it doesn't matter which one you use. But of course, there are some
considerations that go into choosing the "right" language.
Compiled vs. Scripting: Traditional languages like C and Fortran (but
also modern ones like Rust) use a compiler. This is a program that takes
the written code and translates it into an executable file. Basically it functions
as a translator between human-readable code (e.g. in C) and machine
language. Because this translation happens once, ahead of time, compiled
programs usually run very fast, but the compilation can cause delays when
developing code. In contrast, scripting languages like Python or MATLAB are
run directly, as written, by an interpreter. This makes them easier to use, which is
advantageous for high-level tasks where computational efficiency is not the
primary concern. In many cases, these two concepts are combined, with
low-level numerical routines written in a compiled language and high-level
control code in a scripting language.
Popularity: Languages like Python are extremely widely used, which
makes finding support online (e.g. on Stack Overflow) easier. Newer
languages will have fewer resources available, but may have very active
communities that are happy to help (this is the case for Julia, for example).
Open vs. commercial: Most programming languages are open-source,
meaning that they can be used by anyone without having to pay and without
licensing issues. Some common scientific languages are commercial,
however (e.g. MATLAB and Mathematica). This potentially comes with some
benefits (e.g. customer support and dedicated documentation). Overall,
however, support for popular open-source languages is often still better due
to their large communities.
Legacy code: A good reason to choose a programming language is that you
are contributing to a project that is already written in that language. This is
the reason why some rather old languages like Fortran are still going
strong.
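The combination mentioned under "Compiled vs. Scripting" can already be observed inside Python itself. In the standard interpreter (CPython), the built-in `sum()` is implemented in compiled C code, while a hand-written loop is executed step by step by the interpreter. The following sketch compares the two; the measured times depend on the machine, so no specific numbers are claimed here.

```python
import time

numbers = list(range(1_000_000))

# High-level Python loop: every addition is handled by the interpreter.
start = time.perf_counter()
total_loop = 0
for n in numbers:
    total_loop += n
loop_seconds = time.perf_counter() - start

# Built-in sum(): in CPython the loop runs inside compiled C code.
start = time.perf_counter()
total_builtin = sum(numbers)
builtin_seconds = time.perf_counter() - start

print(total_loop == total_builtin)  # both approaches give the same result
print(f"loop: {loop_seconds:.4f} s, built-in: {builtin_seconds:.4f} s")
```

On most machines the built-in version is expected to be noticeably faster, although the result is identical.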
These are just some of the arguments that might be relevant to your situation;
there are many other differences between languages: Is it statically typed?
Object-oriented? Or do you prefer functional programming?
In this lecture we make the choice to use Python, one of the most common
languages for scientific programming and machine learning. It's also one of the
easiest languages to learn. However, it's important to note that the aim is not just
to learn Python; it's to learn the core concepts of scientific programming and
algorithmic thinking.
Installing and Running Python
Guides for installing Python can be found online. In fact, if you have a Mac or
Linux computer it may well already be installed. Installing Python on your
computer is recommended, since this way you can use it beyond this course.
However, we also offer an option that allows you to run Python in your browser (log
in with your bt-Id).
This is based on Jupyter notebooks. You can also install this on your own
computer, for example as JupyterLab. Notebooks are a great way to learn
coding because they allow you to program interactively and conveniently combine
code and outputs. Later on in the lecture we will also learn about the more
traditional way of coding, where you write the code into a single file which is then
executed.
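Whichever option you choose, a quick way to check that Python runs is to ask it for its own version. This is a minimal sketch; it assumes that the course uses Python 3, which is not stated explicitly above.

```python
import sys

# Show which Python interpreter is running this code.
print(sys.version)

# Assumption: the course uses Python 3; warn if an older interpreter is found.
if sys.version_info.major < 3:
    print("This looks like Python 2; please install Python 3.")
```

You can run this in a notebook cell or save it to a file and execute that file.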
'Hello, World!'
With all of these preliminaries out of the way, let's get to coding. Traditionally, the
first thing one learns in a new language is to print the string Hello, World!.
In Python this is simply:
print("Hello, World!")
As trivial as this example is, it already teaches us some important concepts:
1. To print the words, we need to wrap them in quotation marks " " (single
marks would have worked just as well). They define their contents to be a
string, one of the data types in Python.
2. Printing is done by calling the print() function.
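Both points can be checked directly. In this short sketch the variable names `double_quoted` and `single_quoted` are chosen only for illustration.

```python
double_quoted = "Hello, World!"
single_quoted = 'Hello, World!'

print(double_quoted == single_quoted)  # True
print(type(double_quoted))             # <class 'str'>
print(double_quoted)                   # Hello, World!
```

The two kinds of quotation marks produce the same string, and `type()` confirms that its data type is `str`.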
In the next lectures we will meet many more functions and data types. However,
one of the most appealing features of Python already becomes apparent: In many
cases, Python statements look exactly like what you would expect. To
print a string, you simply write print(string).
Additional resources
This lecture follows the structure of the book Introduction to Scientific
Programming. There are many excellent resources online for learning Python, for
example those collected on the official Python website. A nice tutorial can be
found here.