Chapter One & Chapter Two
There are two major ways to extract features to represent graphs: feature engineering and
representation learning.
Feature engineering  This approach involves manually designing features that represent
certain aspects of the data (in this case, graphs). These features are typically crafted based on
domain knowledge or heuristics. However, this process can be time-consuming and may not
always capture the most relevant information for the specific task at hand. In the context of
graphs, feature engineering might involve defining specific properties or characteristics of nodes
or edges within the graph. Examples include degree centrality, betweenness centrality, and
the clustering coefficient.
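As a minimal sketch, two of these hand-crafted features can be computed in plain Python from an adjacency list (the small example graph below is an assumption for illustration):

```python
# Hand-crafted structural features for a tiny example graph,
# stored as an adjacency list (node labels are illustrative).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
n = len(adj)

def degree_centrality(node):
    # Degree normalized by the maximum possible degree (n - 1).
    return len(adj[node]) / (n - 1)

def clustering_coefficient(node):
    # Fraction of the possible edges among a node's neighbors that exist.
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2 * links / (k * (k - 1))

print(degree_centrality(2))        # node 2 touches every other node -> 1.0
print(clustering_coefficient(2))   # one edge among three neighbor pairs -> 1/3
```

Features like these would then be fed to a classical model, which is exactly the manual step representation learning tries to avoid.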
Representation Learning  This approach aims to automatically learn meaningful features
from the raw data itself, without requiring manual intervention. In the case of graphs,
representation learning techniques seek to automatically extract useful features directly from
the graph structure and/or node/edge attributes. This process is more adaptive and efficient
since it does not rely on predefined features and can potentially uncover more nuanced patterns
in the data. Examples include graph neural networks (GNNs).
Why can't traditional machine learning techniques be directly applied to computational
tasks on graphs?
Traditional machine learning techniques assume that data points are independent and
identically distributed (i.i.d.), which means each data point is unrelated to or unaffected
by others. However, in graphs, nodes are inherently connected, leading to dependencies
between nodes that violate the assumption of independence. Consequently, traditional
machine learning models struggle to capture the complex relationships and
dependencies within graph data, making them unsuitable for direct application to
computational tasks on graphs.
Degree  A node's degree is simply the count of edges directly connected to it.
Walk  a sequence of vertices from vertex u to vertex v in which both vertices and edges may be repeated.
Trail / circuit  a walk in which no edge is repeated, though vertices may be; a circuit is a closed trail.
Path  a walk in which no vertex or edge is repeated.
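These three definitions can be checked mechanically. Below is a sketch on a small undirected graph (the edge set and the example sequences are assumptions for illustration):

```python
# Distinguishing walks, trails, and paths on a small undirected graph.
# Edges are stored as frozensets so direction does not matter.
edges = {frozenset(e) for e in [(1, 2), (2, 3), (3, 1), (3, 4)]}

def is_walk(seq):
    # Every consecutive pair must be an edge; repeats are allowed.
    return all(frozenset(p) in edges for p in zip(seq, seq[1:]))

def is_trail(seq):
    # A walk that never reuses an edge (vertices may repeat).
    used = [frozenset(p) for p in zip(seq, seq[1:])]
    return is_walk(seq) and len(used) == len(set(used))

def is_path(seq):
    # A walk that never reuses a vertex (and hence no edge).
    return is_walk(seq) and len(seq) == len(set(seq))

print(is_walk([1, 2, 3, 1, 3]))   # True: edges may repeat in a walk
print(is_trail([1, 2, 3, 1]))     # True: a vertex repeats, no edge does
print(is_path([1, 2, 3, 4]))      # True: nothing repeats
```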
Degree Centrality  a node can be considered important if many other nodes are connected
to it, so we can measure a node's centrality by its degree: the more edges a node has, the higher
its degree centrality.
Eigenvector centrality  is a measure of the importance or centrality of a node within a
network, taking into account not only the number of connections a node has (like degree
centrality) but also the importance of the nodes to which it is connected.
Katz centrality  is another measure of the importance or centrality of a node within a
network, which considers not only direct connections but also indirect connections through
paths of varying lengths.
Betweenness Centrality  measures how many shortest paths between other pairs of nodes pass
through a given node; a node that lies on many shortest paths acts as an important bridge in the network.
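As a sketch of the last two degree-independent measures, eigenvector centrality can be approximated by power iteration on the adjacency matrix, and Katz centrality has a closed form; the example adjacency matrix and the attenuation factor 0.1 are assumptions for illustration:

```python
import numpy as np

# Toy 4-node adjacency matrix (assumed for illustration):
# node 2 is connected to every other node.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Eigenvector centrality via power iteration:
# repeatedly multiply by A and renormalize until x converges
# to the principal eigenvector.
x = np.ones(4)
for _ in range(100):
    x = A @ x
    x = x / np.linalg.norm(x)

# Katz centrality in closed form: katz = (I - a*A)^-1 * 1,
# with attenuation a = 0.1 (must be below 1 / largest eigenvalue).
katz = np.linalg.solve(np.eye(4) - 0.1 * A, np.ones(4))

print(x)      # node 2 scores highest: it has the most important neighbors
print(katz)   # node 2 also dominates under Katz centrality
```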
                                  Graph Embedding
Graph embedding is an approach used to transform nodes, edges, and their features into a
lower-dimensional vector space while preserving as much of the graph's structure and information as
possible. In a raw graph, every node effectively occupies its own dimension, so embedding reduces
dimensionality and makes computation easier. The resulting low-dimensional latent vectors are easier to
work with mathematically than the graph's adjacency matrix, while still reflecting the meaning of the
connections. Graph embedding is best thought of as a preprocessing step.
Types:
Node embedding: maps each node in a graph to a vector
Graph embedding: maps a whole graph to a single vector
Techniques:
DeepWalk  inspired by word-embedding techniques from natural language processing such as
Word2Vec, DeepWalk learns continuous representations of nodes by treating random walks over the graph
as sentences and nodes as words.
It works by generating fixed-length random walks from each node to simulate node traversal, then
applying the skip-gram model to those walks so that it learns to predict a node's neighbors given a
target node.
It is mainly used for node classification, link prediction, and clustering in networks.
Node2Vec  similar to DeepWalk, but it introduces two new parameters, p and q, that control the
random walk: the return parameter p controls how likely the walk is to revisit the previous node
(BFS-like, local exploration), while the in-out parameter q controls how likely it is to move on to
more distant nodes (DFS-like, global exploration).
By varying these parameters, the walk can capture both local and global structural information.
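The effect of p and q can be sketched as a biased second-order random walk in the spirit of node2vec; the example graph and the chosen p, q values below are assumptions for illustration:

```python
import random

# Adjacency list of a small example graph (assumed for illustration).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def biased_walk(start, length, p=1.0, q=1.0):
    # Start with one unbiased step, then bias every following step
    # based on where the walk came from (second-order walk).
    walk = [start, random.choice(adj[start])]
    while len(walk) < length:
        prev, cur = walk[-2], walk[-1]
        weights = []
        for nxt in adj[cur]:
            if nxt == prev:            # returning to prev: weight 1/p (BFS-like)
                weights.append(1 / p)
            elif nxt in adj[prev]:     # still near prev: weight 1
                weights.append(1.0)
            else:                      # moving outward: weight 1/q (DFS-like)
                weights.append(1 / q)
        walk.append(random.choices(adj[cur], weights=weights)[0])
    return walk

random.seed(0)
print(biased_walk(0, 6, p=0.5, q=2.0))  # low p, high q: walk stays local
```

Walks generated this way would then be fed to a skip-gram model exactly as in DeepWalk.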
Graph embedding algorithms: Graph Convolutional Networks (GCNs) and Graph Neural Networks (GNNs)
Why is it hard to analyze graphs?
1- Graphs exist in a non-Euclidean space: they do not live in 2D or 3D coordinates, which makes the
data harder to interpret and harder to feed into standard models.
2- Graphs are dynamic, so it is hard to feature-engineer the data; we need to learn features that
capture the interaction details instead.
Why can’t we use CNNs on graphs?
    1- CNNs work on images with a grid-like structure; the variable number of nodes and edges in a
       graph is challenging to handle.
    2- CNNs operate on pixels and their values, while nodes in graphs may carry different, arbitrary
       attributes.
                                    Graph Neural Network
Why do we need Graph Neural Networks? Why not a normal neural network?
We can just transform the nodes, attributes, edges, and all of the information into an
adjacency matrix, right?
Actually, yes, we can do that. However, we will face several problems. First, how do we make nodes
that are connected to each other dependent on each other, since this is an essential property of
graphs? How many nodes do we take as input? The order in which nodes are inserted into the adjacency
matrix would make the result order-dependent, and we want the model not to depend on how the nodes are
arranged. Also, the graph structure is unordered. So we cannot really use normal neural networks
directly; we need to tweak them a bit.
The idea behind Graph Neural Networks (GNNs) stems from the need to effectively model and reason
about data that is structured as graphs, such as social networks, citation networks, molecular graphs, and
more. While traditional neural networks are powerful tools for processing structured data like images,
texts, and sequences, they struggle to handle graph-structured data directly due to the irregular and
variable connectivity inherent in graphs.
Why GNN?
1- Topology awareness and dependence between nodes: a GNN can capture the relation between a node and
its neighbors, which is exactly what we want from a graph structure.
2- Information propagation: a GNN propagates information through the graph by passing messages,
aggregating values, and updating them at every step.
3- Adaptability to variable graph sizes.
Calculation for Node Classification Example
Node 1: x1=[1,0]
Node 2: x2=[0,1]
Node 3: x3=[1,1]
Node 4: x4=[0,0]
The task is to classify each node as Class A or Class B.
Steps:
1-   Initialize each node with its feature vector.
2-   Assume a weight matrix and a bias vector.
3-   Aggregate features from each node's neighbors.
4-   Construct and pass messages using the weights and bias, updating each node's representation.
5-   Classify each node using a softmax function.
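A minimal numerical sketch of one round of message passing over the four feature vectors above; since the example does not specify them, the edge list, weight matrix, and bias values below are assumptions:

```python
import numpy as np

# Node features x1..x4 from the example above.
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
# Assumed edges (the example does not specify a graph structure).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

W = np.array([[1.0, -1.0], [0.5, 0.5]])   # assumed weight matrix
b = np.array([0.1, -0.1])                 # assumed bias vector

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

probs_all = []
for v in range(4):
    agg = X[neighbors[v]].mean(axis=0)   # aggregate neighbor features (mean)
    msg = agg @ W + b                    # message construction
    h = X[v] + msg                       # update node state (simple residual)
    probs = softmax(h)                   # probabilities for Class A vs Class B
    probs_all.append(probs)
    print(v, probs)
```

Real GNN layers repeat this aggregate-and-update step several times and learn W and b by gradient descent; this sketch only shows a single hand-set round.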
                              Convolutional Neural Network
Why not a normal neural network?
The problem is that a normal colored image, say 64 × 64 × 3, already has 12,288 input values. Each
neuron in the input layer would be connected to every neuron in the next layer (the first hidden
layer) in a fully connected manner. This results in a very large number of connections and
parameters, making the network computationally expensive to train and prone to overfitting,
especially for high-dimensional input data like images.
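The parameter blow-up is easy to quantify; the hidden-layer size of 1000 neurons below is a hypothetical choice for illustration:

```python
# Parameter count of a single fully connected layer on a 64x64x3 image,
# assuming a hypothetical hidden layer of 1000 neurons.
inputs = 64 * 64 * 3        # 12288 input values
hidden = 1000
weights = inputs * hidden   # one weight per (input, neuron) pair
biases = hidden             # one bias per neuron
print(weights + biases)     # over 12 million parameters for one layer
```

A convolutional layer, by contrast, shares a small filter across all spatial positions, so its parameter count is independent of the image size.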
Let's first talk about edge detection. There are different edge-detection operators:
1- Sobel operator  mainly used to calculate gradient intensity. It consists of a vertical and a
horizontal filter that are convolved with the original image to compute the gradient magnitude and
orientation.
2- Scharr operator  an improved version of the Sobel operator that is less sensitive to noise and has
better-tuned coefficients.
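The following sketch applies the horizontal-gradient Sobel and Scharr filters at a single position of an assumed 3×3 patch (technically an elementwise multiply-and-sum, i.e. cross-correlation, which is how deep learning frameworks implement "convolution"):

```python
import numpy as np

# Standard horizontal-gradient filters.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
scharr_x = np.array([[-3,  0,  3],
                     [-10, 0, 10],
                     [-3,  0,  3]])  # same shape, stronger center weighting

# Assumed toy patch containing a sharp vertical edge.
patch = np.array([[0, 0, 255],
                  [0, 0, 255],
                  [0, 0, 255]])

gx = (sobel_x * patch).sum()   # large response: the filter "sees" the edge
print(gx)                      # 4 * 255 = 1020
print((scharr_x * patch).sum())
```

Sliding the filter over every position of a full image yields the gradient map used for edge detection.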
However, while learning from our image we face a problem: the spatial dimensions shrink dramatically
with each convolution. There is also information loss, because the pixels around the borders are
barely taken into consideration.
But how do we actually lose information if we successfully multiply our filter with the edge values?
The answer is that the border pixels get incomplete receptive coverage: the filter covers them only
once (or a few times), so the network never sees their full context.
What is the solution?
PADDING!!!
Padding means wrapping the image in a border of zero values, which solves the dimension-reduction
problem and restores spatial coverage of the border pixels.
While padding is good practice, it introduces some problems: it costs more memory, and it may add
artificial edges that can confuse the model while learning.
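The effect of padding on output size follows from the standard convolution size formula, out = (n + 2p − f) / s + 1, sketched here for an assumed 64-pixel input and 3×3 filter:

```python
# Output size of a convolution: n = input size, f = filter size,
# p = zero padding, s = stride.
def conv_out(n, f, p=0, s=1):
    return (n + 2 * p - f) // s + 1

print(conv_out(64, 3))        # no padding: 62, the image shrinks
print(conv_out(64, 3, p=1))   # "same" padding: 64, size preserved
```

With p = (f − 1) / 2 and stride 1, the output keeps the input's spatial size, which is exactly why "same" padding is the common default.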