3.2 Overview of Neural Networks
Learning Objectives
By the end of this lecture, you will be able to:
• Understand the basic structure and functioning of neural networks, including neurons, layers,
weights, and activation functions.
• Differentiate between key types of neural networks (e.g., feedforward, CNNs, RNNs) and their
relevance to drug discovery tasks.
• Recognize the training process of neural networks, including backpropagation and optimization
techniques.
• Identify appropriate neural network architectures for tasks such as QSAR
modeling, molecule generation, and protein structure prediction.
• Appreciate the advantages of neural networks in handling
biological and chemical data.
What is a Neural Network?
[Figure: structure of a neuron, with input and output labeled. Image from Wikimedia Commons, "Structure of Neuron" by Sanu N, licensed under CC BY-SA 4.0.]
• A neural network is a computational model inspired by the way biological neural networks in the human brain process
information. It consists of interconnected layers of nodes (neurons) that transform input data into output through
weighted connections.
• Components of a neuron:
• Cell Body (Soma): Processes signals from dendrites.
• Dendrites: Receive incoming signals.
• Axon: Carries the output signal.
• Synapses: Junctions transmitting signals between neurons.
• Signal transmission:
• Signals sum up at the soma.
• If the total exceeds a threshold, an action potential (signal) fires.
• This binary behavior inspires the Perceptron model.
Aspect | Biological Neural Network | Artificial Neural Network (ANN)
Neuron | Brain cell (neuron) | Mathematical function (node/unit)
Synapse | Connection between neurons transmitting chemical signals | Weighted connection between nodes
Learning | Synaptic plasticity and experience-driven adaptation | Weight updates via algorithms (e.g., backpropagation)
Signal Transmission | Electrochemical signals | Numeric values (activation signals)
Purpose | Sensory processing, decision-making, motor control, etc. | Pattern recognition, classification, prediction, etc.
Perceptron: Forward propagation
• A simple type of neural network designed for binary classification, mapping inputs to
outputs like 0 or 1.
• How it works:
• Takes input features and multiplies them by weights.
• Computes a weighted sum and applies an activation function (threshold).
• If the sum exceeds the threshold, it outputs 1; otherwise, 0.
• Importance:
• Inspired by McCulloch & Pitts' neuron model (1940s).
• Demonstrated how machines could "learn" linearly separable patterns.
• Paved the way for multi-layer perceptrons and more complex neural networks.
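A minimal sketch of the forward pass just described, written in NumPy; the descriptor values, weights, and bias below are illustrative placeholders, not values from a real dataset or trained model.

```python
import numpy as np

def perceptron_forward(x, w, b, threshold=0.0):
    """Weighted sum of inputs plus bias, followed by a step activation."""
    z = np.dot(w, x) + b               # weighted sum of inputs
    return 1 if z > threshold else 0   # fire (1) only if the sum exceeds the threshold

# Illustrative example: two molecular descriptors (e.g., scaled molecular weight and LogP)
x = np.array([0.7, 0.2])
w = np.array([0.9, -0.4])   # hand-picked weights for illustration
b = -0.3                    # bias
print(perceptron_forward(x, w, b))   # -> 1 (active) or 0 (inactive)
```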
Key Components of a Neuron
• Weights (𝑤)
• Determine the strength and importance of each input.
• Higher weight → greater influence on the output.
• Bias (𝑏)
• A constant value added to the weighted sum of inputs.
• Helps shift the activation function and improve model flexibility.
• Activation function (𝜎)
• Introduces non-linearity into the model.
• Allows the network to learn complex patterns.
• Common functions:
• Sigmoid → squashes output between 0 and 1
• ReLU (Rectified Linear Unit) → outputs the input if positive, otherwise 0
• Tanh → outputs between -1 and 1
Source: https://www.linkedin.com/pulse/perceptron-the-basic-building-block-neural-networks-ayush-meharkure-p46if/
Perceptron vs. Modern Neural Network Neurons
Feature | Perceptron (Basic Neuron) | Modern Neural Network Neuron
Inputs | Multiple inputs (e.g., features like molecular weight, LogP) | Multiple inputs from previous-layer neurons
Weights | Each input has a weight to control its influence | Same — weights are learned during training
Bias | A fixed number added to shift the output | Same — helps fine-tune the output further
Summation | Calculates the weighted sum of inputs: z = Σᵢ wᵢxᵢ + b | Same summation formula
Activation Function | Step function (outputs 0 or 1) — fires only if the sum exceeds a threshold | More advanced functions (e.g., ReLU, Sigmoid, Tanh) — allows non-linear decisions
Output | Binary: 0 (inactive) or 1 (active) | Continuous or probabilistic outputs — supports more complex predictions
Learning Capability | Can only solve linearly separable problems (e.g., AND, OR) | Handles non-linear problems — essential for drug discovery tasks (e.g., molecular binding)
Where It's Used | Foundation of early single-layer networks | Used in modern multi-layer networks (e.g., CNNs, GNNs, Transformers)
Importance of Activation Function
What if we wanted to build a neural network model to distinguish between soluble and insoluble compounds?
Linear activation functions produce linear decisions; non-linearity allows us to approximate complex functions.
The purpose of the activation function is to introduce non-linearity into the network.
Comparison of Activation Functions
Activation Function | Output Range | Key Characteristics | Common Use Cases
ReLU | [0, ∞) | Simple and efficient computation. Mitigates the vanishing gradient problem. Can lead to "dying ReLU", where neurons become inactive for negative inputs. | Hidden layers in deep neural networks.
Sigmoid | (0, 1) | Smooth gradient, useful for probabilistic interpretations. Prone to vanishing gradients, especially for large input magnitudes. Outputs are not zero-centered, which can affect convergence. | Output layer for binary classification tasks.
Tanh | (-1, 1) | Zero-centered outputs, which can aid optimization. Still susceptible to vanishing gradients for large input values. Generally performs better than sigmoid in hidden layers due to the centered output. | Hidden layers where zero-centered data is beneficial.
Softmax | (0, 1), sum to 1 | Converts raw scores into probability distributions. Ensures the output probabilities sum to 1. Sensitive to large input values, which can lead to numerical instability; often mitigated by input normalization. | Output layer for multi-class classification problems.
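The four activation functions in the table can be written in a few lines of NumPy; a minimal sketch, with a toy input vector chosen only for illustration:

```python
import numpy as np

def relu(z):
    # Passes positive values through unchanged, zeroes out negatives
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Zero-centered squashing into (-1, 1)
    return np.tanh(z)

def softmax(z):
    # Subtracting the max is the usual guard against numerical overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z), sigmoid(z), tanh(z), softmax(z), sep="\n")
```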
Structure of a Neural Network:
● Structure of a basic neural network:
○ Input Layer: Takes raw data (like drug properties, protein sequences, etc.)
○ Hidden Layers: These layers process data, identify patterns, and transform input into meaningful features.
○ Output Layer: Provides the final prediction (like drug efficacy, interaction score, or side effect probability).
● How information flows:
○ Weighted connections (w): Every connection between nodes has a weight determining its importance.
○ Activation functions (a): Introduce non-linearity, helping the network capture complex relationships in data.
○ Forward propagation: Data flows from input to output through layers — transforming at each step.
Kumar P, Lai SH, Wong JK, Mohd NS, Kamal MR, Afan HA, Ahmed AN, Sherif M, Sefelnasr A, El-Shafie A. Review of nitrogen
compounds prediction in water bodies using artificial neural networks and other models. Sustainability. 2020 May 26;12(11):4359.
Training a Neural Network
● Learning from data: The network adjusts itself by learning patterns from large datasets, like drug properties and
biological responses.
● Loss function: Measures the difference between the predicted output and the actual result. The goal is to
minimize this difference.
● Backpropagation: A key process in training where the network calculates the error at the output, propagates it backward through the network, and adjusts weights to improve predictions.
● Optimization algorithms: Techniques like Stochastic Gradient Descent (SGD) and the Adam optimizer help the network minimize the cost function efficiently.
Huang SC, Le TH. Principles and labs for deep learning. Academic Press; 2021 Jul 6.
Loss Functions: MSE vs. Cross-Entropy
● A loss function measures the difference between the predicted output and the actual target. It guides the model
during training by telling it how wrong the predictions are.
● Mean Squared Error (MSE)
○ Use case: Regression problems
○ Measures: Average squared difference between predicted and actual values
○ Sensitive to outliers
○ Intuition: Smaller error = better prediction
● Cross-Entropy Loss
○ Use case: Classification problems
■ Binary classification → Binary Cross-Entropy
■ Multi-class → Categorical Cross-Entropy
○ Measures: Distance between true labels and predicted probability distribution
○ Penalizes confident but wrong predictions heavily
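Both losses are short enough to write out directly; a minimal NumPy sketch, where the target and prediction arrays are made-up illustrative values:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared difference (regression)
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clipping avoids log(0); confident but wrong predictions are penalized heavily
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred))
print(binary_cross_entropy(y_true, y_pred))
```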
Backpropagation
• Goal: Efficiently update weights by propagating the error backward.
• Steps:
• Forward Pass: Calculate output.
• Loss Calculation: Compare prediction with true output.
• Backward Pass: Use Chain Rule to compute gradients layer-by-layer.
• Weight Update: Apply gradient descent to adjust weights.
• Why it works: It distributes the error to each weight, guiding updates
toward reducing overall loss.
• Challenge: Vanishing gradients (especially with deep networks) — ReLU
and batch normalization help mitigate this.
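A minimal, hand-worked sketch of the four steps for a single sigmoid neuron trained with squared error; the input, target, and starting weights are illustrative only:

```python
import numpy as np

# Toy data: one input feature, one target
x, y_true = 2.0, 1.0
w, b, lr = 0.1, 0.0, 0.5          # initial weight, bias, learning rate

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(3):
    # Forward pass
    z = w * x + b
    y_pred = sigmoid(z)
    loss = (y_pred - y_true) ** 2        # loss calculation

    # Backward pass: chain rule, dL/dw = dL/dy * dy/dz * dz/dw
    dL_dy = 2 * (y_pred - y_true)
    dy_dz = y_pred * (1 - y_pred)        # derivative of the sigmoid
    grad_w = dL_dy * dy_dz * x
    grad_b = dL_dy * dy_dz * 1.0

    # Weight update: gradient descent step
    w -= lr * grad_w
    b -= lr * grad_b
    print(f"step {step}: loss = {loss:.4f}")
```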
Optimizers in Machine Learning
● Stochastic Gradient Descent (SGD)
○ Updates weights using the gradient of the loss with respect to each parameter.
○ Pros: Simple and memory-efficient.
○ Cons: Slow convergence, sensitive to learning rate.
○ Update Rule: θ ← θ − η·∇L(θ), where η is the learning rate and ∇L(θ) the gradient of the loss.
● RMSprop (Root Mean Square Propagation)
○ Maintains a moving average of squared gradients to normalize updates.
○ Pros: Works well for non-stationary objectives.
○ Cons: Hyperparameter tuning needed.
○ Update Rule: E[g²]_t = β·E[g²]_{t−1} + (1−β)·g_t², then θ ← θ − η·g_t / (√E[g²]_t + ε).
● Adam (Adaptive Moment Estimation)
○ Combines momentum and RMSprop. Tracks both the first and second moments of gradients.
○ Pros: Fast convergence, widely used, less tuning.
○ Cons: May generalize poorly in some cases.
○ Update Rule: m_t = β₁·m_{t−1} + (1−β₁)·g_t, v_t = β₂·v_{t−1} + (1−β₂)·g_t², θ ← θ − η·m̂_t / (√v̂_t + ε), where m̂_t and v̂_t are the bias-corrected moments.
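The three update rules written out in NumPy for a single parameter vector; this is a sketch of the textbook formulas with standard default hyperparameters, not a drop-in replacement for a framework optimizer:

```python
import numpy as np

def sgd_step(theta, grad, lr=0.01):
    # Plain gradient descent: move against the gradient
    return theta - lr * grad

def rmsprop_step(theta, grad, avg_sq, lr=0.001, beta=0.9, eps=1e-8):
    # Moving average of squared gradients normalizes the step size
    avg_sq = beta * avg_sq + (1 - beta) * grad**2
    return theta - lr * grad / (np.sqrt(avg_sq) + eps), avg_sq

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # t is the update count, starting at 1 (needed for bias correction)
    m = b1 * m + (1 - b1) * grad            # first moment (momentum-like)
    v = b2 * v + (1 - b2) * grad**2         # second moment (RMSprop-like)
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```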
Training Hyperparameters: Epochs, Batch Size & Learning Rate
● Epochs
○ One complete pass through the entire training dataset.
○ More epochs → better learning (up to a point), risk of overfitting.
○ Tune based on training vs validation performance.
● Batch Size
○ Number of samples processed before model weights are updated.
○ Small batch: Noisy updates, better generalization.
○ Large batch: Faster, smoother convergence, needs more memory.
● Learning rate (η)
○ Controls how much weights are updated during training.
○ Too high → may overshoot or diverge.
○ Too low → slow convergence or stuck in local minima.
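A minimal sketch of where these three hyperparameters appear in a training loop, using NumPy and a plain linear model with an MSE gradient; the dataset and hyperparameter values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                     # 100 samples, 5 descriptors (synthetic)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

w = np.zeros(5)
epochs, batch_size, lr = 20, 16, 0.05             # the three hyperparameters

for epoch in range(epochs):                       # one epoch = one full pass over the data
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):    # weights update once per mini-batch
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)
        w -= lr * grad                            # the learning rate scales every update
```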
Types of Neural Network Architectures
• Feedforward Neural Networks (FNN): One-way data flow from input to output - great for basic
classification and regression tasks.
• Convolutional Neural Networks (CNN): Specialized for image data - uses convolutions to detect spatial
patterns (e.g., edges, textures).
• Recurrent Neural Networks (RNN): Designed for sequential data - remembers previous inputs (e.g., time
series, text).
• Transformers: Powerful architecture for sequence data - no recurrence, relies on
self-attention mechanisms (e.g., ChatGPT).
• Graph Neural Networks (GNN): Works on graph-structured data - ideal for
molecular graphs, social networks, and relational data.
Feedforward Neural Network (FNN)
• Key components:
o Input layer: Encodes molecular descriptors (e.g., molecular weight, LogP).
o Hidden layers: Process the data through weights and activations.
o Output layer: Produces predictions — soluble/insoluble, toxic/non-toxic, etc.
• Working principle:
o Data flows in one direction — input → hidden layers → output.
o Each neuron computes: a = σ(Σᵢ wᵢxᵢ + b), i.e., a weighted sum of its inputs plus a bias, passed through an activation function.
• Applications in drug discovery:
o ADMET prediction: Absorption, distribution, metabolism, excretion, toxicity profiling.
o Hit identification: Predicts biological activity of new chemical entities.
Thakur A, Konde A. Fundamentals of neural networks. International Journal for
Research in Applied Science and Engineering Technology. 2021 Aug;9(VIII):407-26.
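A minimal NumPy sketch of such a feedforward network: two descriptors in, one hidden layer, one sigmoid output. The weights below are random placeholders rather than trained values, so the output is arbitrary until the network is trained.

```python
import numpy as np

def relu(z): return np.maximum(0.0, z)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)   # input (2 descriptors) -> 8 hidden units
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # hidden -> 1 output unit

def forward(x):
    h = relu(W1 @ x + b1)          # hidden layer: weighted sum + non-linearity
    return sigmoid(W2 @ h + b2)    # output layer: e.g., probability of "soluble"

x = np.array([0.35, 0.78])         # e.g., scaled molecular weight and LogP (illustrative)
print(forward(x))
```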
Convolutional Neural Networks (CNNs)
● Key components:
○ Input Layer: Encodes spatial or structured data — e.g., molecular graphs (as grids), 2D molecular images, or 3D voxelized
structures.
○ Convolutional Layers: Apply filters to extract features like edges, shapes, or substructures in molecules.
○ Pooling Layers: Reduce dimensionality while preserving key features (e.g., max pooling).
○ Fully Connected Layers: Integrate extracted features for final decision-making.
○ Output Layer: Generates predictions — e.g., binding affinity, bioactivity, solubility.
● Working principle:
○ Input data is processed through convolution and pooling, followed by dense layers.
○ Each convolutional neuron computes: z = Σ(w ⊙ x) + b, where w is the filter, x is the input patch, and b is the bias.
● Applications in drug discovery:
○ ADMET Prediction: Predicts pharmacokinetic and toxicity profiles from molecular structures.
○ Hit Identification: Classifies compounds based on predicted biological activity.
○ Image-Based Screening: Analyzes microscopy images in phenotypic screening campaigns.
(For scale: a 224 × 224 input image contains 224 × 224 = 50,176 values.)
Thakur A, Konde A. Fundamentals of neural networks. International Journal for Research in Applied Science and Engineering Technology. 2021 Aug;9(VIII):407-26.
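A minimal sketch of the convolution-plus-pooling idea on a small 2D grid, in plain NumPy rather than a deep-learning framework; the input, filter values, and sizes are illustrative only.

```python
import numpy as np

def conv2d(image, kernel, bias=0.0):
    """Valid 2D convolution (strictly, cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(kernel * patch) + bias   # z = sum(w * x) + b per patch
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keeps the strongest response in each window."""
    H, W = x.shape
    return x[:H - H % size, :W - W % size] \
        .reshape(H // size, size, W // size, size).max(axis=(1, 3))

image = np.random.default_rng(0).random((6, 6))     # stand-in for a small 2D molecular image
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]])  # toy vertical-edge detector
feature_map = np.maximum(0, conv2d(image, edge_filter))  # convolution + ReLU
print(max_pool(feature_map).shape)
```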
Recurrent Neural Network (RNN)
• RNNs are a class of neural networks designed for sequential data, where the order of information matters. Unlike
traditional feedforward networks, RNNs maintain a memory of previous inputs, making them ideal for time-series and
sequence-based tasks.
• Key components:
o Hidden state: Maintains memory of previous inputs.
o Neurons with loops: Feed output from one time step into the next.
o Backpropagation through time (BPTT): A specialized form of backpropagation used to train RNNs on sequential data.
• Working principle:
o Processes sequences (e.g., protein chains or SMILES strings) one step at a time.
o Equation: h_t = f(W_xh·x_t + W_hh·h_{t−1} + b), where h_t is the hidden state at time step t and f is typically tanh.
• Applications in drug discovery:
o SMILES string generation: Generates new molecules from chemical sequence data.
o Time-dependent drug response modeling: Captures how cells respond to drugs over time.
o Peptide drug design: Models amino acid sequences for peptide-based drugs.
https://en.wikipedia.org/wiki/Recurrent_neural_network
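A minimal NumPy sketch of the recurrence above applied to a character-level sequence such as a SMILES string; the vocabulary, hidden size, and weights are random illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = list("CON()=#1")               # tiny illustrative SMILES alphabet
hidden_size = 16

W_xh = rng.normal(scale=0.1, size=(hidden_size, len(vocab)))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[vocab.index(ch)] = 1.0
    return v

h = np.zeros(hidden_size)              # hidden state = memory of the sequence so far
for ch in "C1=CC=CC=C1":               # benzene written as a SMILES string
    h = np.tanh(W_xh @ one_hot(ch) + W_hh @ h + b)   # h_t depends on x_t and h_{t-1}

print(h.shape)                          # final hidden state summarizes the whole string
```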
Transformers
• Transformers are a class of neural networks designed to handle sequential data
without relying on recurrence.
Instead of processing data step-by-step like RNNs, transformers use a
mechanism called attention to capture relationships between all parts of the
sequence simultaneously. They are especially powerful for modeling long-range
dependencies and parallelizing computation.
• Key components:
• Self-attention mechanism: Allows the model to focus on relevant parts of the input sequence, regardless of
distance.
• Positional encoding: Injects information about the position of tokens, since transformers lack inherent sequence
order.
• Multi-head attention: Improves the model’s ability to capture different types of relationships in parallel.
• Working principle:
• Processes the entire sequence at once, attending to different parts as needed.
• Learns relationships between all elements in the sequence through attention scores.
• Applications in drug discovery:
• Molecular property prediction: Models molecules as graphs or SMILES strings to predict properties like solubility
or activity.
• De novo molecule generation: Designs novel molecules by learning chemical rules from large datasets.
• Protein structure prediction: Predicts 3D structures from amino acid sequences (e.g., AlphaFold).
• Drug–target interaction modeling: Learns complex interactions between drugs and biological targets.
https://doi.org/10.48550/arXiv.1706.03762
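A minimal NumPy sketch of scaled dot-product self-attention, the core operation referenced above; multi-head attention and positional encoding are omitted, and the token embeddings and projection matrices are random placeholders.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise relevance between all positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # each token becomes a weighted mix of all tokens

rng = np.random.default_rng(7)
seq_len, d_model = 5, 8                               # e.g., 5 SMILES tokens, 8-dim embeddings
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (5, 8)
```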
Graph Neural Networks (GNN)
• Key components:
o Nodes: Represent atoms or proteins.
o Edges: Represent bonds or interactions between nodes.
o Message-passing layers: Propagate information between nodes, updating their embeddings.
• Working principle:
o GNNs treat molecules as graphs (atoms = nodes, bonds = edges).
o Each node (atom) updates its representation by aggregating information from neighboring nodes.
o Message passing equation: h_v^(k+1) = σ(W·h_v^(k) + Σ_{u∈N(v)} W′·h_u^(k)), i.e., each node combines its own embedding with an aggregate of its neighbours' embeddings.
o Final node embeddings are pooled for molecular-level predictions.
• Applications in drug discovery:
o Molecular property prediction: Predicts solubility, permeability, and toxicity directly from
molecular graphs.
o Drug-target interaction prediction: Models protein-ligand interactions as graphs.
o Protein-protein interaction networks: Helps identify new drug targets by modeling
biological networks.
o Antimicrobial discovery: Predicts antibacterial properties from chemical structure
graphs (e.g., GNNs led to discovering Halicin, a new antibiotic).
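A minimal NumPy sketch of one round of message passing on a tiny molecular graph, followed by pooling to a molecule-level vector; the adjacency matrix, atom features, and weights are illustrative placeholders.

```python
import numpy as np

# Toy molecule graph: 3 atoms, bonds 0-1 and 1-2 (atoms = nodes, bonds = edges)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0],      # illustrative 2-dimensional atom features
              [0.0, 1.0],
              [1.0, 1.0]])

rng = np.random.default_rng(3)
W_self  = rng.normal(scale=0.5, size=(2, 4))   # transforms a node's own embedding
W_neigh = rng.normal(scale=0.5, size=(2, 4))   # transforms aggregated neighbour messages

def message_passing(A, H):
    neighbour_sum = A @ H                       # aggregate messages from bonded atoms
    return np.tanh(H @ W_self + neighbour_sum @ W_neigh)   # update each node embedding

H1 = message_passing(A, H)       # one message-passing layer
molecule_vec = H1.mean(axis=0)   # pool node embeddings into a molecule-level embedding
print(molecule_vec.shape)        # (4,) -> input to a property-prediction head
```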
Why Are Neural Networks the Right Choice?
● Pattern recognition: They excel at identifying hidden patterns in high-dimensional data, crucial for
understanding drug-target interactions, side effects, and biological pathways.
● Handling complex and diverse data: Drug discovery involves multiple data types - chemical structures,
protein sequences, biological responses, and clinical data. Neural networks efficiently integrate and
process multi-modal data.
● Scalability: Neural networks scale with large datasets, which are common in pharmaceutical research.
More data improves model performance.
● Predictive power: By learning intricate relationships, neural networks can predict drug efficacy, toxicity, interactions, and dosing recommendations faster and more accurately than traditional methods.
● Automation and efficiency: Reduces manual effort by automating feature
extraction and decision-making processes - speeding up candidate
identification and testing.
Summary
• Neural networks are the foundation of modern deep learning and are inspired by the structure of the human brain.
• They consist of interconnected neurons organized into layers that learn patterns from data through iterative
optimization.
• Common architectures include Feedforward NNs, Convolutional NNs (CNNs), Recurrent NNs (RNNs), Transformers, and Graph NNs (GNNs), each suited to different data types and tasks.
• Activation functions, loss functions, and optimizers play key roles in training neural networks effectively.
• Neural networks have shown remarkable success in drug discovery, powering applications like virtual screening,
molecule generation, DTI prediction, and image analysis.
• While powerful, they require large, well-curated datasets and often act as "black boxes", highlighting the need for interpretability and proper validation.
Further Reading
• Bzdok, D., Krzywinski, M. & Altman, N. Machine learning: a primer. Nat Methods 14, 1119–1120 (2017).
• https://medium.com/acing-ai/machine-learning-techniques-primer-60edd9d14863
• Badrulhisham F, Pogatzki-Zahn E, Segelcke D, Spisak T, Vollert J. Machine learning and artificial intelligence in neuroscience: A primer for researchers. Brain Behav Immun. 2024 Jan;115:470-479.
Think about it
Why do CNNs work better for images and GNNs for molecules?
Thank You!