AI: A Multifaceted Discipline
AI is a subset of computer science that has origins in mathematics, logic, philosophy,
psychology, cognitive science, and biology, among other fields.
The earliest research into AI was inspired by a constellation of thought that began in the
late 1930s and culminated in 1950 when British pioneer Alan Turing published
“Computing Machinery and Intelligence,” in which he asked, “Can machines think?”
The Turing Test is a test of a machine's ability to demonstrate intelligence: it evaluates
whether the behavior of a machine is indistinguishable from that of a human. Turing proposed
that a computer could be said to think if a human evaluator, holding a natural-language
conversation with both the computer and a human, could not distinguish between them
(i.e., the machine successfully mimics human behavior).
The term AI was first coined in 1956 by Professor John McCarthy of Dartmouth College.
Professor McCarthy proposed a summer research project based on the idea that “every
aspect of learning or any other feature of intelligence can in principle be so precisely
described that a machine can be made to simulate it.”
The truth is that AI, at its core, is merely programming: it can be understood as an
abstraction of computer science. The surge in its popularity, and in its ability, has much
to do with the explosion of data from mobile devices, smartwatches, and wearables, and
with the ability to access computing power more cheaply than ever before.
Artificial Intelligence (AI) is a multifaceted discipline that encompasses various
interconnected fields, each contributing to the overall capabilities and applications of AI.
These fields include Machine Learning (ML), Data Mining (DM), Computer Science (CS),
and Data Science (DS). Here’s an overview of how these disciplines integrate and contribute to
AI:
Artificial Intelligence (AI)
AI is the broadest field that involves creating systems capable of performing tasks that
require human-like intelligence. AI includes:
● Symbolic AI: Involves using symbols and rules to represent knowledge and solve
problems.
● Subfields: Machine learning, natural language processing, robotics, computer vision,
expert systems, etc.
● Goals: Automation of reasoning, learning, perception, and decision-making.
Machine Learning (ML)
ML is a subset of AI focused on developing algorithms that allow systems to learn from data
and improve their performance over time without being explicitly programmed. Key areas in ML
include:
● Supervised Learning: Algorithms learn from labeled data (e.g., classification,
regression).
● Unsupervised Learning: Algorithms find patterns in unlabeled data (e.g., clustering,
dimensionality reduction).
● Reinforcement Learning: Algorithms learn by interacting with an environment and
receiving feedback (rewards or penalties).
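To make these paradigms concrete, here is a minimal Python sketch of the first two (scikit-learn assumed; reinforcement learning is omitted because it requires an interactive environment rather than a fixed dataset):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised learning: fit a classifier to labeled data
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised accuracy:", clf.score(X, y))

# Unsupervised learning: find clusters in the same data without using labels
km = KMeans(n_clusters=3, n_init=10).fit(X)
print("cluster sizes:", [list(km.labels_).count(c) for c in range(3)])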
Data Mining (DM)
DM involves extracting useful information and patterns from large datasets. It is closely
related to ML and DS but focuses more on discovering hidden patterns. Key techniques
include:
● Association Rule Learning: Identifying interesting relationships between variables in
large databases (e.g., market basket analysis).
● Clustering: Grouping a set of objects in such a way that objects in the same group are
more similar to each other than to those in other groups.
● Anomaly Detection: Identifying rare items, events, or observations which raise
suspicions by differing significantly from the majority of the data.
Computer Science (CS)
CS provides the foundational theory and tools necessary for developing AI systems. It
encompasses a broad range of areas that contribute to AI, including:
● Algorithms and Data Structures: Essential for designing efficient and scalable AI
solutions.
● Computational Theory: Provides the theoretical underpinnings for understanding what
can be computed and how efficiently.
● Software Engineering: Techniques for developing robust and maintainable AI
software.
● Human-Computer Interaction: Designing user-friendly interfaces and interactions for AI
systems.
Data Science (DS)
DS is an interdisciplinary field that uses scientific methods, processes, algorithms, and
systems to extract knowledge and insights from structured and unstructured data. It
combines skills from statistics, computer science, and domain expertise. Key components
include:
● Data Wrangling: Cleaning and preparing raw data for analysis.
● Statistical Analysis: Applying statistical methods to interpret and make inferences
from data.
● Visualization: Creating visual representations of data to communicate insights
effectively.
● Predictive Analytics: Using historical data to make predictions about future events.
Applications of AI: Notes
Healthcare data use cases
AI is revolutionizing healthcare in various ways. Here are some key use cases:
1. Medical Imaging and Diagnostics:
○ Radiology: AI algorithms can analyze X-rays, CT scans, and MRIs to detect
abnormalities like tumors or fractures more quickly and accurately than
humans.
○ Pathology: AI can assist in examining tissue samples for signs of cancer or
other diseases.
2. Predictive Analytics:
○ Disease Prediction: AI models can predict the likelihood of diseases such as
diabetes or heart disease based on patient data.
○ Epidemic Outbreaks: AI can analyze patterns to predict and track the spread of
infectious diseases.
3. Personalized Medicine:
○ Treatment Plans: AI can recommend personalized treatment plans based on a
patient’s genetic profile, lifestyle, and other factors.
○ Drug Discovery: AI can accelerate the process of discovering new drugs by
analyzing biological data.
4. Patient Monitoring and Care:
○ Wearables: AI-powered wearables can monitor vital signs and alert healthcare
providers to potential issues.
○ Remote Monitoring: AI can assist in monitoring patients remotely, providing
alerts and recommendations to healthcare providers.
5. Administrative Tasks:
○ Scheduling: AI can optimize scheduling for both patients and healthcare
providers, reducing wait times and improving efficiency.
○ Medical Record Keeping: AI can help manage and organize electronic health
records, making it easier for healthcare providers to access and analyze patient
data.
6. Virtual Health Assistants:
○ Chatbots: AI-powered chatbots can provide basic medical advice, schedule
appointments, and offer mental health support.
○ Virtual Nurses: AI systems can assist patients with medication management
and provide reminders for follow-up care.
7. Clinical Trials:
○ Participant Matching: AI can identify suitable candidates for clinical trials
based on their medical history and other criteria.
○ Data Analysis: AI can analyze data from clinical trials more efficiently,
identifying trends and outcomes faster.
8. Surgical Assistance:
○ Robotic Surgery: AI can assist surgeons with precision during operations,
potentially reducing recovery times and improving outcomes.
○ Preoperative Planning: AI can help in planning surgeries by providing detailed
analysis and simulation.
These applications highlight the potential of AI to improve patient outcomes, increase
efficiency, and reduce costs in healthcare.
KDD
KDD (Knowledge Discovery in Databases) is a process that involves the extraction of useful,
previously unknown, and potentially valuable information from large datasets. The KDD process
is iterative: extracting accurate knowledge usually requires multiple passes through its steps.
The KDD process includes the following steps:
Data Cleaning
Data cleaning is defined as the removal of noisy and irrelevant data from the collection:
1. Cleaning in the case of missing values.
2. Cleaning noisy data, where noise is a random or variance error.
3. Cleaning with data discrepancy detection and data transformation tools.
Data Integration
Data integration is defined as combining heterogeneous data from multiple sources into a
common source (a data warehouse). Data integration is performed using data migration tools,
data synchronization tools, and the ETL (Extract-Transform-Load) process.
Data Selection
Data selection is defined as the process where data relevant to the analysis is decided upon
and retrieved from the data collection. Methods such as neural networks, decision trees,
Naive Bayes, clustering, and regression can be used for this.
Data Transformation
Data transformation is defined as the process of transforming data into the form required by
the mining procedure. Data transformation is a two-step process:
1. Data mapping: assigning elements from the source to the destination to capture
transformations.
2. Code generation: creating the actual transformation program.
Data Mining
Data mining is defined as the application of techniques to extract potentially useful patterns.
It transforms task-relevant data into patterns and decides the purpose of the model, using
classification or characterization.
Pattern Evaluation
Pattern evaluation is defined as identifying the patterns that truly represent knowledge based
on given interestingness measures. It finds an interestingness score for each pattern and uses
summarization and visualization to make the results understandable to the user.
Knowledge Representation
This involves presenting the results in a way that is meaningful and can be used to make
decisions.
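As a rough illustration, the sketch below pushes a tiny invented patient table through cleaning, selection, transformation, and mining (pandas and scikit-learn assumed; every column name and value is made up):

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

raw = pd.DataFrame({"glucose": [120, None, 95, 160],
                    "bmi": [32, 28, None, 35],
                    "ward": ["A", "B", "A", "B"]})

clean = raw.fillna(raw.mean(numeric_only=True))             # data cleaning
selected = clean[["glucose", "bmi"]]                        # data selection
transformed = StandardScaler().fit_transform(selected)      # data transformation
groups = KMeans(n_clusters=2, n_init=10).fit(transformed)   # data mining
print("discovered groups:", groups.labels_)                 # knowledge presentation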
Neural Networks
Neural networks extract identifying features from data without pre-programmed understanding.
A network's components include neurons, connections, weights, biases, propagation functions,
and a learning rule. Neurons receive inputs, governed by thresholds and activation functions,
while connections carry the weights and biases that regulate information transfer. Learning,
the adjustment of weights and biases, occurs in three stages: input computation, output
generation, and iterative refinement, which enhances the network's proficiency in diverse
tasks.
This learning process involves the following sequence of events:
1. The neural network is stimulated by an environment.
2. The free parameters of the neural network are changed as a result of this stimulation.
3. The neural network then responds in a new way to the environment because of the changes
in its free parameters.
Importance of Neural Networks
The ability of neural networks to identify patterns, solve intricate puzzles, and adjust to changing
surroundings is essential. Their capacity to learn from data has far-reaching effects, ranging
from revolutionizing technology like natural language processing and self-driving automobiles to
automating decision-making processes and increasing efficiency in numerous industries. The
development of artificial intelligence is largely dependent on neural networks, which also drive
innovation and influence the direction of technology.
How do Neural Networks work?
Let’s understand with an example of how a neural network works:
Consider a neural network for email classification. The input layer takes features like email
content, sender information, and subject. These inputs, multiplied by adjusted weights, pass
through hidden layers. The network, through training, learns to recognize patterns indicating
whether an email is spam or not. The output layer, with a binary activation function, predicts
whether the email is spam (1) or not (0). As the network iteratively refines its weights through
backpropagation, it becomes adept at distinguishing between spam and legitimate emails,
showcasing the practicality of neural networks in real-world applications like email filtering.
Working of a Neural Network
Neural networks are complex systems that mimic some features of how the human brain
functions. A network is composed of coupled layers of artificial neurons: an input layer,
one or more hidden layers, and an output layer. The two stages of the basic process are
called forward propagation and backpropagation.
Forward Propagation
Input Layer: Each feature in the input layer is represented by a node on the network, which
receives input data.
Weights and Connections: The weight of each neuronal connection indicates how strong
the connection is. Throughout training, these weights are changed.
Hidden Layers: Each hidden layer neuron processes inputs by multiplying them by weights,
adding them up, and then passing them through an activation function. By doing this,
non-linearity is introduced, enabling the network to recognize intricate patterns.
Output: The final result is produced by repeating the process until the output layer is reached.
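A minimal NumPy sketch of this forward pass through one hidden layer (the sizes, random weights, and sigmoid output are illustrative choices, not a prescribed architecture):

import numpy as np

def relu(z):
    return np.maximum(0, z)

x = np.array([0.5, 1.2, -0.3])             # input layer: one node per feature
W1 = np.random.randn(4, 3) * 0.1           # weights into 4 hidden neurons
b1 = np.zeros(4)
W2 = np.random.randn(1, 4) * 0.1           # weights into 1 output neuron
b2 = np.zeros(1)

h = relu(W1 @ x + b1)                      # hidden layer: weighted sum + activation
y_hat = 1 / (1 + np.exp(-(W2 @ h + b2)))   # sigmoid output, e.g. spam probability
print(y_hat)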
Backpropagation
Loss Calculation: The network’s output is evaluated against the real goal values, and a loss
function is used to compute the difference. For a regression problem, the Mean Squared Error
(MSE) is commonly used as the cost function.
Loss Function:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Gradient Descent: Gradient descent is then used by the network to reduce the loss. To lower
the error, weights are changed based on the derivative of the loss with respect to each weight.
Adjusting Weights: The weights are adjusted at each connection by applying this iterative
process, or backpropagation, backward across the network.
Training: During training with different data samples, the entire process of forward
propagation, loss calculation, and backpropagation is repeated, enabling the network to adapt
and learn patterns from the data.
Activation Functions: Activation functions such as the rectified linear unit (ReLU) or sigmoid
introduce non-linearity into the model. They determine whether a neuron “fires” based on its
total weighted input.
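The loop below sketches one version of this training cycle for a single linear neuron under the MSE loss above, using plain gradient descent on toy data (the data, learning rate, and iteration count are all illustrative):

import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([3.0, 3.0, 6.0])              # target is exactly x1 + x2
w = np.zeros(2)
lr = 0.05

for _ in range(200):
    y_hat = X @ w                           # forward propagation
    loss = np.mean((y - y_hat) ** 2)        # loss calculation (MSE)
    grad = -2 / len(y) * X.T @ (y - y_hat)  # derivative of loss w.r.t. weights
    w -= lr * grad                          # weight adjustment

print(w, loss)                              # w approaches [1, 1], loss approaches 0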
Learning of a Neural Network
1. Learning with supervised learning
In supervised learning, the neural network is guided by a teacher who has access to
input-output pairs. The network produces outputs from the inputs, and by comparing these
outputs to the desired outputs known to the teacher, an error signal is generated. The
network's parameters are adjusted iteratively to reduce the error, and training stops when
performance reaches an acceptable level.
2. Learning with Unsupervised learning
In unsupervised learning, there are no corresponding output variables. The main goal is to
understand the underlying structure of the incoming data (X). No teacher is present to offer
guidance; the intended outcome is instead a model of the data's patterns and relationships.
Terms like regression and classification are associated with supervised learning, whereas
unsupervised learning is associated with clustering and association.
3. Learning with Reinforcement Learning
Through interaction with the environment and feedback in the form of rewards or penalties,
the network gains knowledge. Finding a policy or strategy that optimizes cumulative rewards
over time is the goal for the network. This kind is frequently utilized in gaming and
decision-making applications.
Types of Neural Networks
Commonly used types of neural networks include:
Feedforward Networks: A feedforward neural network is a simple artificial neural network
architecture in which data moves from input to output in a single direction. It has input, hidden,
and output layers; feedback loops are absent. Its straightforward architecture makes it
appropriate for a number of applications, such as regression and pattern recognition.
Multilayer Perceptron (MLP): MLP is a type of feedforward neural network with three or
more layers, including an input layer, one or more hidden layers, and an output layer. It
uses nonlinear activation functions.
Convolutional Neural Network (CNN): A Convolutional Neural Network (CNN) is a specialized
artificial neural network designed for image processing. It employs convolutional layers to
automatically learn hierarchical features from input images, enabling effective image
recognition and classification. CNNs have revolutionized computer vision and are pivotal in
tasks like object detection and image analysis.
Recurrent Neural Network (RNN): A Recurrent Neural Network (RNN) is an artificial neural
network designed for sequential data processing. It makes use of feedback loops, which enable
information to persist within the network, making it appropriate for applications where
contextual dependencies are critical, such as time series prediction and natural language
processing.
Long Short-Term Memory (LSTM): LSTM is a type of RNN that is designed to overcome the
vanishing gradient problem in training RNNs. It uses memory cells and gates to selectively read,
write, and erase information.
Multivariate Testing: Overview and Application
Definition: Multivariate testing is a technique used in statistics and experimentation to test
multiple variables simultaneously to determine the optimal combination. Unlike A/B testing,
which compares two versions of a single variable, multivariate testing examines how multiple
variables interact with each other and influence the outcome.
Purpose: The primary goal of multivariate testing is to optimize outcomes by identifying the
most effective combination of variables. This is particularly useful in areas such as
marketing, website optimization, and product design, where multiple factors can influence
user behavior or performance metrics.
Process:
1. Identify Variables:
○ Select the key elements you wish to test. For example, in a website context,
these might include the headline, image, and call-to-action button.
2. Create Variations:
○ Develop different versions for each element. For instance, you might create
several headlines, images, and button colors.
3. Design the Experiment:
○ Set up a test that includes all possible combinations of the variables. This is
often done using a factorial design, where each variation of a variable is
combined with every variation of the other variables.
4. Run the Test:
○ Conduct the experiment over a sufficient period, ensuring that you collect
enough data to achieve statistically significant results.
5. Analyze the Data:
○ Use statistical methods to analyze the results and determine which combination of
variables yields the best performance. This might involve regression analysis or
other advanced techniques.
6. Implement the Findings:
○ Apply the optimal combination identified through the test to achieve the
desired outcome, such as increased conversions or improved user
engagement.
Example: Suppose you're optimizing a landing page and want to test three elements: the
headline, the main image, and the call-to-action button. You create three variations for each
element. A multivariate test would assess all possible combinations of these variations (e.g.,
Headline 1 with Image 1 and Button 1, Headline 1 with Image 1 and Button 2, etc.) to determine
which combination is most effective.
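A short sketch of how the full factorial combination set could be enumerated in Python (the element names mirror the landing-page example and are purely illustrative):

from itertools import product

headlines = ["H1", "H2", "H3"]
images = ["I1", "I2", "I3"]
buttons = ["B1", "B2", "B3"]

combinations = list(product(headlines, images, buttons))
print(len(combinations))   # 27 cells to split traffic across
print(combinations[:2])    # ('H1', 'I1', 'B1'), ('H1', 'I1', 'B2')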
Advantages:
● Comprehensive: Tests multiple variables simultaneously, providing insights into how
they interact.
● Efficient: Saves time compared to running multiple A/B tests.
● Data-Driven: Provides a robust, data-driven approach to optimization.
Challenges:
● Complexity: Multivariate testing can be complex to design and analyze, especially
with many variables.
● Large Sample Size: Requires a significant amount of data to achieve reliable results.
● Interdependencies: The interaction between variables can complicate the
interpretation of results.
Conclusion: Multivariate testing is a powerful method for optimizing complex systems with
multiple variables. By testing all possible combinations of elements, it provides deeper insights
and helps identify the most effective strategies for achieving desired outcomes.
Evaluation Metrics: refer to https://www.geeksforgeeks.org/metrics-for-machine-learning-model/
Multi-Classifier Decision Fusion
Multiclassifier decision fusion refers to a process in machine learning and pattern recognition
where decisions from multiple classifiers are combined to improve overall performance. The
idea is to leverage the strengths of various classifiers to achieve better accuracy, robustness,
and generalization. There are several methods for decision fusion, which can be broadly
categorized into hard fusion (based on class labels) and soft fusion (based on class
probabilities or confidences).
Hard Decision Fusion
1. Majority Voting: Each classifier casts a vote for a class label, and the class with the
majority of votes is chosen as the final decision.
2. Plurality Voting: Similar to majority voting but allows for cases where no class gets an
absolute majority.
3. Weighted Voting: Each classifier's vote is weighted based on its performance, and the
class with the highest weighted vote sum is chosen.
Soft Decision Fusion
1. Sum Rule: The class probabilities from each classifier are summed, and the class
with the highest sum is selected.
2. Product Rule: The class probabilities from each classifier are multiplied, and the class
with the highest product is chosen.
3. Average Rule: The class probabilities from each classifier are averaged, and the class
with the highest average is selected.
4. Weighted Sum/Product/Average: Similar to the above methods but with weights
assigned to each classifier based on their performance.
Advanced Methods
1. Bayesian Fusion: Combines decisions based on Bayesian probability, taking into
account prior probabilities and the likelihood of each classifier's output. Example: If
prior probabilities and likelihoods for classes are given, Bayesian rules are applied to
compute posterior probabilities.
2. Dempster-Shafer Theory: A framework for combining evidence from different sources,
handling uncertainty and conflict in the decisions of classifiers.
Example: Belief functions are used to combine evidence and compute
combined beliefs for classes.
3. Stacking (Stacked Generalization): Uses a meta-classifier to combine the outputs of base
classifiers. The base classifiers' predictions are used as inputs to the meta-classifier.
Example: If classifiers A, B, and C output [0, 1, 1], the meta-classifier takes [0, 1, 1] as
input and predicts the final class.
Considerations
● Diversity of Classifiers: Using classifiers with diverse error patterns can improve
the fusion results.
● Classifier Performance: Weights and fusion methods should consider individual classifier
performance.
● Correlation: Highly correlated classifiers may not add much value to the fusion process, so
it's important to select diverse classifiers.
Applications
● Ensemble Learning: Decision fusion is a key component of ensemble learning methods
like Random Forests, Boosting, and Bagging.
● Multimodal Data: Combining decisions from classifiers trained on different data
modalities (e.g., image, text, audio).
● Robust Systems: Enhancing the robustness of systems in critical applications like
medical diagnosis, finance, and autonomous driving.
Let's go through an example of multi-classifier decision fusion. Suppose we are working on
a handwritten digit recognition task using the MNIST dataset, which consists of images of
digits (0-9).
Step-by-Step Example
Step 1: Train Multiple Classifiers
Assume we have trained three different classifiers:
1. Classifier A: A Convolutional Neural Network (CNN)
2. Classifier B: A Support Vector Machine (SVM)
3. Classifier C: A k-Nearest Neighbors (k-NN)
Each classifier is trained on the same training dataset and outputs a probability
distribution over the 10 digit classes for each test sample.
Step 2: Obtain Classifier Outputs
For a test sample, each classifier provides the following probability distributions:
● Classifier A:
○ [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1] (uniform, maximally uncertain; treated
as predicting class 0)
● Classifier B:
○ [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.55] (predicts class 9)
● Classifier C:
○ [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.55, 0.05] (predicts class 8)
Step 3: Decision Fusion
Let's apply soft decision fusion using the Sum Rule. We sum the probabilities from each
classifier for each class.
● Summed Probabilities:
○ Class 0: 0.1 + 0.05 + 0.05 = 0.2
○ Class 1: 0.1 + 0.05 + 0.05 = 0.2
○ Class 2: 0.1 + 0.05 + 0.05 = 0.2
○ Class 3: 0.1 + 0.05 + 0.05 = 0.2
○ Class 4: 0.1 + 0.05 + 0.05 = 0.2
○ Class 5: 0.1 + 0.05 + 0.05 = 0.2
○ Class 6: 0.1 + 0.05 + 0.05 = 0.2
○ Class 7: 0.1 + 0.05 + 0.05 = 0.2
○ Class 8: 0.1 + 0.05 + 0.55 = 0.7
○ Class 9: 0.1 + 0.55 + 0.05 = 0.7
In this case, we have a tie between Class 8 and Class 9, both with summed probabilities of 0.7.
Depending on our tie-breaking strategy (such as choosing the class with the highest confidence
from the best-performing classifier), we might select one of these classes.
Another Example with Majority Voting (Hard Decision Fusion)
Using the same classifiers and their predicted class labels:
● Classifier A: Predicts class 0
● Classifier B: Predicts class 9
● Classifier C: Predicts class 8
With Majority Voting, we look at the predicted class labels:
● Class 0: 1 vote
● Class 9: 1 vote
● Class 8: 1 vote
Since there is no majority, we might need a tie-breaking rule such as preferring the prediction
from the classifier with the highest overall accuracy.
Weighted Voting (Combining Soft and Hard Decision Fusion)
If we assign weights based on classifier performance (e.g., CNN: 0.5, SVM: 0.3, k-NN: 0.2):
● Weighted Probabilities:
○ Class 0: 0.1 * 0.5 + 0.05 * 0.3 + 0.05 * 0.2 = 0.075
○ Class 1: 0.1 * 0.5 + 0.05 * 0.3 + 0.05 * 0.2 = 0.075
○ Class 2: 0.1 * 0.5 + 0.05 * 0.3 + 0.05 * 0.2 = 0.075
○ Class 3: 0.1 * 0.5 + 0.05 * 0.3 + 0.05 * 0.2 = 0.075
○ Class 4: 0.1 * 0.5 + 0.05 * 0.3 + 0.05 * 0.2 = 0.075
○ Class 5: 0.1 * 0.5 + 0.05 * 0.3 + 0.05 * 0.2 = 0.075
○ Class 6: 0.1 * 0.5 + 0.05 * 0.3 + 0.05 * 0.2 = 0.075
○ Class 7: 0.1 * 0.5 + 0.05 * 0.3 + 0.05 * 0.2 = 0.075
○ Class 8: 0.1 * 0.5 + 0.05 * 0.3 + 0.55 * 0.2 = 0.175
○ Class 9: 0.1 * 0.5 + 0.55 * 0.3 + 0.05 * 0.2 = 0.225
Here, Class 9 would be chosen as it has the highest weighted probability.
These examples illustrate how decision fusion can be applied in practice to combine the
outputs of multiple classifiers, improving the overall performance and robustness of the final
decision.
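The fusion rules above take only a few lines of NumPy; the probabilities and weights below are copied from the worked example (note that the sum rule ties classes 8 and 9, and argmax simply returns the first of the tied classes):

import numpy as np

A = np.full(10, 0.1)                                    # CNN: uniform, uncertain
B = np.array([0.05] * 9 + [0.55])                       # SVM: favors class 9
C = np.array([0.05] * 8 + [0.55, 0.05])                 # k-NN: favors class 8
probs = np.stack([A, B, C])

print(np.argmax(probs.sum(axis=0)))                     # sum rule (8-9 tie)
print(np.bincount(probs.argmax(axis=1), minlength=10))  # hard votes per class
weights = np.array([0.5, 0.3, 0.2])
print(np.argmax(weights @ probs))                       # weighted sum: class 9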
What is Metalearning?
Meta-learning, or “learning to learn”, is an advanced AI approach where a model improves its
ability to learn new tasks by leveraging experience from previous tasks. In healthcare, this can
help AI systems adapt to new diseases, medical images, or patient profiles with minimal data or
retraining.
Why is Metalearning Useful in Healthcare?
Low-data environments: Many hospitals don’t have large annotated datasets.
Personalization: Each patient is different. Meta-AI can adapt to individuals.
Fast adaptation: Disease variants (e.g., COVID-19 mutations) evolve; meta-learning helps AI
respond quickly.
Types of Metalearning in Healthcare
● Model-based: learns to update weights using a neural controller. Healthcare example:
adaptive diagnostics for changing patient symptoms.
● Metric-based: learns a similarity function. Healthcare example: one-shot learning to detect
rare diseases from a few X-rays.
● Optimization-based: modifies learning rules to speed up training. Healthcare example: rapid
fine-tuning of a model for different hospital data.
Use Cases in Healthcare
Personalized Treatment Recommendation
Meta-AI can adapt treatment strategies for different patients based on similar cases.
Few-shot Medical Image Classification
Example: Identify rare brain tumors using only 10 example images.
Predicting Disease Progression
Learns across diseases (like diabetes, heart disease) to generalize how conditions evolve.
Cross-hospital Generalization
Meta-learning allows a model trained in one hospital (say, AIIMS Delhi) to perform well in
another (like PGI Chandigarh) with little tuning.
Other Abstract AI Methods in Healthcare
Here are some non-traditional or abstract AI methods increasingly used:
● Transfer Learning: transfers knowledge from general to specific domains. Healthcare
example: use an ImageNet-trained model for cancer MRI scans.
● Federated Learning: trains AI models across multiple hospitals without sharing patient
data. Healthcare example: collaborative cancer diagnosis without a data breach.
● Causal Inference: understands cause and effect, not just correlation. Healthcare example:
discover whether a drug actually reduces heart-attack risk.
● Self-supervised Learning: learns patterns without labels. Healthcare example: discover
anomalies in ECG without manual annotation.
● Neuro-symbolic AI: combines logic with neural networks. Healthcare example: explainable
drug-interaction reasoning.
● Bayesian Optimization: smart search for the best parameters. Healthcare example:
hyperparameter tuning for risk-prediction models.
🧩 Meta vs Traditional Learning – Simple Analogy
● Teaching 10 new diseases: traditional AI trains each from scratch; metalearning learns how
to learn from earlier diseases.
● Learner type: traditional AI is a static learner; metalearning is an adaptive learner.
● Example: traditional AI is a CNN for diabetic retinopathy; metalearning is a meta-AI that
adapts to rare eye diseases.
Tools & Frameworks
MAML (Model-Agnostic Meta-Learning)
Prototypical Networks
Reptile, Meta-SGD
Libraries: PyTorch Lightning + Higher, TensorFlow Meta, HuggingFace’s PEFT for few-shot
transfer
Summary
Meta-learning and other abstract AI methods are critical to next-gen healthcare systems. They
bring flexibility, adaptability, and personalization that rigid, traditional models often lack —
especially important when data is scarce, diverse, or evolving.
Here’s a clear, healthcare-related example of meta-learning:
Meta-Learning Example in Healthcare
Task: Detect rare skin diseases with very few training examples.
Problem:
Doctors want an AI system that can quickly learn to recognize rare skin conditions (e.g., types of
skin cancer) from just 1–5 patient images.
Traditional deep learning needs thousands of images per disease — not practical for rare
conditions.
Meta-Learning Solution: Prototypical Networks (Metric-Based Meta-Learning)
Step-by-step:
1. Meta-training phase
The model is trained on many small tasks using common skin conditions (eczema, acne,
psoriasis, etc.).
Each task is a mini problem like: “Can you classify this image as eczema or acne from just 3
examples?”
Example Task:
Support set: 3 images of eczema, 3 of acne
Query image: New unseen image
Model learns to compare query to support and decide the class.
After hundreds of these tasks, the model learns to learn from small examples.
2. Meta-testing phase (Few-shot learning on rare disease)
Now show the model:
Only 2 examples of a rare skin disease, say “Merkel cell carcinoma”
Then ask: "Can you identify whether this new image is also Merkel cell?"
The model uses its learned skill to compare this query image to the support images.
Even though it's never seen Merkel cell carcinoma before, it performs well — because it learned
how to learn from few examples during meta-training.
Summary
● Goal: learn to classify new diseases with few examples.
● Method: Prototypical Networks (metric-based).
● Advantage: needs very little data for new tasks.
● Healthcare impact: quick deployment of AI for rare diseases.
Prototypical Networks work by learning a space where examples of the same class are clustered
around a prototype (the average representation of that class). When a new example appears, it
is classified based on its distance to these prototypes.
Think of it like:
“Find the center of each disease type → classify new images by closeness to these centers.”
Step-by-Step Overview
Let’s walk through how this would work for skin disease classification:
Step 1: Episode Setup (Meta-Learning Phase)
We simulate a task (called an episode) with:
Support set: Few labeled examples per class (e.g., 5 eczema, 5 acne images)
Query set: New, unseen images to classify (e.g., 1 unknown image)
Step 2: Compute Prototypes
For each class k in the support set, compute its prototype as the mean of the embedded support
examples (this is the "center" described above):

c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i)

where S_k is the support set for class k and f_\phi is the embedding network.
Step 3: Classify the Query
For each query image:
Pass it through the same network to get an embedding.
Compute the distance (e.g., Euclidean) to each class prototype.
Assign the label of the closest prototype.
Step 4: Train over Many Episodes
Repeat for thousands of such episodes with different diseases (acne, eczema, rash, etc.) during
training, so the model learns a good embedding space.
Step 5: Test on New (Unseen) Diseases
Now you can give the model:
1–5 examples of a rare new disease (say, Leprosy)
Ask it to classify new query images
Even though it has never seen Leprosy before, the model generalizes well because it learned
how to compare based on prototypes.
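A minimal NumPy sketch of Steps 2 and 3, with random vectors standing in for the embeddings a real encoder network would produce (the class names, dimensions, and values are illustrative):

import numpy as np

rng = np.random.default_rng(0)
support = {"eczema": rng.normal(0.0, 0.3, (5, 16)),   # 5 embedded support images
           "acne":   rng.normal(2.0, 0.3, (5, 16))}

# Step 2: each prototype is the mean of its class's support embeddings
prototypes = {label: emb.mean(axis=0) for label, emb in support.items()}

# Step 3: classify a query by distance to the nearest prototype
query = rng.normal(2.0, 0.3, 16)                      # unseen image near "acne"
dists = {label: np.linalg.norm(query - p) for label, p in prototypes.items()}
print(min(dists, key=dists.get))                      # prints "acne"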
What is an Evolutionary Algorithm (EA)?
Evolutionary Algorithms are inspired by biological evolution (natural selection, mutation,
crossover).
They search for optimal solutions to complex problems by evolving a population of candidate
solutions over time.
Key Concepts
● Population. Biological analogy: a group of organisms. AI usage: a set of candidate solutions
(e.g., treatment plans).
● Fitness Function. Biological analogy: survival capability. AI usage: how good the solution
is (e.g., accuracy, survival rate).
● Selection. Biological analogy: the best organisms survive. AI usage: keep the
best-performing solutions.
● Crossover. Biological analogy: genetic recombination. AI usage: mix solutions to create
new ones.
● Mutation. Biological analogy: random DNA changes. AI usage: add variation to avoid local
minima.
🏥 Example: Optimizing Chemotherapy Schedules
Problem: Find the best chemotherapy schedule that:
● Maximizes cancer reduction
● Minimizes side effects
● Is personalized to the patient
🧬 How EA is used:
1. Initialize random set of schedules (population)
2. Evaluate fitness for each:
○ Tumor shrinkage rate
○ Side effect severity
3. Select best schedules
4. Crossover & mutate to create new ones
5. Repeat for many generations
6. Final output: Optimal treatment plan
🔍 Other Use Cases in Healthcare
● Feature Selection: select the best patient features (e.g., from ECG or MRI) for diagnosis.
● Hyperparameter Optimization: tune deep learning models for disease classification.
● Drug Discovery: search for effective compound combinations.
● Medical Imaging: evolve filters or neural architectures for detecting tumors.
● Multi-objective Optimization: balance accuracy, cost, and interpretability in models.
🛠️ Types of Evolutionary Algorithms
● Genetic Algorithm (GA): crossover + mutation. Use case: personalized diagnosis.
● Genetic Programming (GP): evolves code/expressions. Use case: discovering medical formulas.
● Differential Evolution (DE): efficient for real-valued optimization. Use case: drug dose
prediction.
● NSGA-II (multi-objective): handles trade-offs. Use case: cost vs. treatment accuracy.
Case Study: Genetic Algorithm for Diabetes Prediction
● Dataset: Patient attributes (age, glucose, insulin, BMI)
● Problem: Select best subset of features
● EA Process:
○ Chromosome = binary string (e.g., 10101 means include feature 1, 3, 5)
○ Fitness = accuracy of classifier (e.g., SVM)
○ After 50 generations, achieved better accuracy with fewer features
Benefits
● No gradient required: useful for black-box medical models.
● Works with discrete and continuous data: good for mixed clinical data.
● Global optimization: avoids local traps in complex systems.
● Multi-objective friendly: useful for safety vs. effectiveness trade-offs.
Challenges
● Slow: Many generations needed
● Interpretability: Evolved solutions may be hard to explain
● Hyperparameter tuning: Population size, mutation rate, etc., must be set carefully
1. 🔁 Genetic Algorithms (GAs)
Most widely used EA.
● Representation: Binary strings, real numbers, or other encodings
● Operations: Crossover (recombination), mutation
● Use: Feature selection, treatment planning, scheduling
Healthcare Example:
Select the most relevant genes for cancer classification from genomic data.
2. Genetic Programming (GP)
Instead of evolving parameters, GP evolves programs or formulas.
● Representation: Tree structures (if-then rules, equations)
● Operations: Tree crossover, subtree mutation
● Use: Derive symbolic equations or decision rules
Healthcare Example:
Automatically evolve a diagnostic rule for detecting heart disease from patient symptoms.
3. Differential Evolution (DE)
Great for optimizing continuous-valued functions.
● Representation: Real-valued vectors
● Operations: Mutation based on vector differences, not bit flips
● Use: Drug dose optimization, fine-tuning medical models
Healthcare Example:
Tune the hyperparameters of a deep learning model that predicts ICU patient mortality.
4. Evolution Strategies (ES)
Focus more on self-adaptation of mutation — ideal for real-numbered optimization.
● Representation: Real-valued chromosomes
● Operations: Gaussian mutation, selection based on rank
● Use: Complex medical simulations
Healthcare Example:
Optimize neural control strategies for robotic surgery instruments.
5. Multi-objective Evolutionary Algorithms (MOEAs)
Handle multiple conflicting objectives (e.g., cost vs. accuracy).
Popular algorithms:
● NSGA-II (Non-dominated Sorting Genetic Algorithm)
● SPEA2 (Strength Pareto Evolutionary Algorithm)
Healthcare Example:
Optimize a cancer treatment model balancing:
● High tumor kill rate
● Low toxicity
● Minimal cost
6. Coevolutionary Algorithms
Evolve multiple interacting populations (e.g., host vs. parasite models).
Use: Simulation of competing drug-resistant strains
Healthcare Example:
Model bacteria vs. antibiotic evolution to find resilient drug strategies.
7. Estimation of Distribution Algorithms (EDAs)
Instead of crossover and mutation, build probability models to sample new solutions.
Use: Data-driven healthcare model evolution
Example: Learn the probability distribution of successful diagnostic rules from electronic
health record (EHR) data.
Example GA
Let’s say you have a dataset:
Patient  Glucose  BMI  Age  BP  Label
A        120      32   40   75  1
B        95       28   35   70  0
And one chromosome in GA population is:
Chromosome = [1, 0, 1, 1]
This chromosome says:
“When building the prediction model, ignore BMI (bit 0), and only use Glucose, Age,
and BP.”
GA Step Summary (Clarified):
● Population: many different binary chromosomes, such as [1011], [1100], [0111], etc.
● Evaluation: for each chromosome, use only the selected features (e.g., for 1011, use
Glucose, Age, BP) and train/test a model.
● Fitness: compute the model's accuracy; if 1011 gives 89% accuracy, that is its fitness.
● Crossover/Mutation: chromosomes are recombined or mutated to explore better feature sets.
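Here is a hedged sketch of how one chromosome's fitness could be computed (scikit-learn assumed; the four feature columns mirror the Glucose/BMI/Age/BP table above, and the synthetic labels are invented for illustration):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))             # columns: Glucose, BMI, Age, BP
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # toy label driven by Glucose and BP

def fitness(chromosome):
    mask = np.array(chromosome, dtype=bool)
    if not mask.any():
        return 0.0                        # an empty feature set is worthless
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

print(fitness([1, 0, 1, 1]))              # accuracy using Glucose, Age, BP only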
What's the difference?
● Type: a feature-selection chromosome is a binary string used by the GA; the actual data is
a row from the dataset.
● Use: the chromosome controls which input features the model sees; the data is used by the
model to make predictions.
● Changes? The chromosome is evolved by the GA; the data never changes, since it is your
dataset.
● Example: [1, 0, 1, 1] = use Glucose, Age, BP; Patient X: 130, 29, 45, 80.
Use Case:
Create an interpretable rule or formula that tells whether a patient is diabetic from input
features such as glucose, BMI, age, and blood pressure.
What's an Individual in GP?
Each individual is a computer program, often represented as a tree structure (like a math
formula or logic rule).
Example:
IF (glucose > 140) AND (BMI > 30) THEN diabetic
This is an individual in GP. It's a complete prediction logic — like a doctor’s rule.
Two Things in GP:
● Program/Tree (Individual): a prediction rule or formula evolved by GP. Example:
IF glucose > 140 AND age > 50.
● Dataset (Input): real patient records. Example: Glucose = 160, BMI = 35, Age = 55 →
Label = 1.
So:
● GP individual = a program
● Data = fed into that program
GP Process Explained with an Example
Suppose your patient data:
Glucose  BMI  Age  Outcome
160      35   55   1
100      25   30   0
Step 1: Initial GP Individuals
Generated randomly, like:
Individual A:
IF glucose > 150 THEN diabetic
Individual B:
IF (glucose + BMI) > 180 THEN diabetic
Individual C:
IF (age < 40) OR (glucose > 130) THEN diabetic
Each is a complete model — not just a configuration like in GA.
Step 2: Test the Programs on Patients
Let’s take Individual A: IF glucose > 150 THEN diabetic
● For Patient 1 (Glucose 160): Predict = diabetic → Correct
● For Patient 2 (Glucose 100): Predict = not diabetic → Correct
So Individual A gets a fitness score of 2/2 = 100%.
Step 3: Evolution (Crossover and Mutation)
Now GP evolves better rules:
Crossover: Swap parts of programs:
A: glucose > 150
B: (glucose + BMI) > 180
→ New: (glucose > 150) AND (BMI > 30)
Mutation: Change a threshold or operator:
glucose > 150 → glucose > 130
So What’s the Key Difference from GA?
● What is evolved? GA: a configuration (e.g., which features to use). GP: a full prediction
program/tree.
● What does the individual do? GA: says which inputs are important. GP: directly predicts
the outcome.
● Shape: GA: a binary string. GP: a tree or expression.
● Uses data? GA: yes, to test the model with selected features. GP: yes, to run the data
through the program.
● Interpretable output? GA: not necessarily. GP: yes, often a readable rule or formula.
Task: Predict if a patient is diabetic using input features: glucose, BMI, age, blood
pressure.
3. Evolutionary Strategy (ES)
What Does ES Optimize?
ES evolves real-valued parameters (not binary strings or symbolic rules).
In this case, it evolves weights for features in a formula, like this:
prediction = w1 * glucose + w2 * BMI + w3 * age + w4 * BP
Then apply a threshold:
if prediction > 150:
predict diabetic
else:
not diabetic
What is an Individual in ES?
A real-valued vector of weights, like:
Individual A = [1.2, 0.5, 0.9, 0.4]
So:
● w1 = 1.2 (Glucose)
● w2 = 0.5 (BMI)
● w3 = 0.9 (Age)
● w4 = 0.4 (BP)
Dataset Example:
Patient  Glucose  BMI  Age  BP  Label
A        120      32   40   75  1
B        90       28   30   70  0
For Patient A, prediction becomes:
score = (1.2 * 120) + (0.5 * 32) + (0.9 * 40) + (0.4 * 75)
= 144 + 16 + 36 + 30 = 226 → Predict diabetic
Fitness
● Apply this linear formula to all patients.
● Measure prediction accuracy → that’s the fitness.
Evolution
ES evolves these weights over generations by:
● Adding random noise to each weight
(e.g., 1.2 → 1.2 + 𝛿)
● Adjusting mutation strength dynamically (called σ or step size)
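A small sketch of one such (1+1)-style ES loop, where the fitness function stands in for prediction accuracy (the target weights and all constants are invented for illustration):

import numpy as np

rng = np.random.default_rng(2)

def fitness(w):                            # stand-in for model accuracy
    target = np.array([1.2, 0.5, 0.9, 0.4])
    return -np.sum((w - target) ** 2)

parent, sigma = np.ones(4), 0.3
for _ in range(100):
    child_sigma = sigma * np.exp(0.2 * rng.normal())   # self-adapt step size
    child = parent + child_sigma * rng.normal(size=4)  # Gaussian mutation
    if fitness(child) > fitness(parent):               # keep the better one
        parent, sigma = child, child_sigma

print(parent)                              # drifts toward [1.2, 0.5, 0.9, 0.4]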
4. Differential Evolution (DE)
What Does DE Optimize?
Just like ES, it optimizes real-valued vectors, often for numerical problems.
But instead of adding noise, it uses a differential vector mutation strategy.
What is an Individual in DE?
Same as ES: a vector of weights, e.g.,
Individual A = [0.9, 0.6, 0.4, 0.2]
What Makes DE Unique?
DE generates new individuals like this:
New = A + F * (B - C)
Where:
● A, B, and C are three random individuals from the population
● F = scaling factor, usually between 0.5 and 1
This is a smart way of mutating, using differences between other solutions.
🧪 Real Example
Suppose you have 3 individuals:
● A = [0.9, 0.6, 0.4, 0.2]
● B = [1.0, 0.7, 0.5, 0.3]
● C = [0.8, 0.5, 0.3, 0.1]
Then
(B - C) = [0.2, 0.2, 0.2, 0.2]
F = 0.8
New = A + 0.8 * [0.2, 0.2, 0.2, 0.2]
= [0.9, 0.6, 0.4, 0.2] + [0.16, 0.16, 0.16, 0.16]
= [1.06, 0.76, 0.56, 0.36]
This becomes the new candidate solution.
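The same mutation written out in NumPy, with the vectors and F copied from the example above:

import numpy as np

A = np.array([0.9, 0.6, 0.4, 0.2])
B = np.array([1.0, 0.7, 0.5, 0.3])
C = np.array([0.8, 0.5, 0.3, 0.1])
F = 0.8

new = A + F * (B - C)
print(new)   # [1.06 0.76 0.56 0.36]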
5. Multi-objective Evolutionary Algorithms (MOEAs)
Initial Population (Individuals)
Each individual is a vector like in GA:
Individual 1 = [1, 0, 1, 1, 0] → Glucose, Age, BP
Individual 2 = [1, 1, 0, 0, 1] → Glucose, BMI, Insulin
Fitness Evaluation (2 Objectives):
Each individual is evaluated on two things:
Individual  Accuracy (%)  # Features Used
Ind. 1      92%           3
Ind. 2      95%           4
Ind. 3      89%           2
Trade-Off: Pareto Front
We now don’t choose a single “best” individual.
Instead, we build a Pareto front — a set of individuals where:
● No other individual is better in all objectives
● Each one represents a trade-off
Example:
● Ind. 2: High accuracy, more features
● Ind. 3: Lower accuracy, fewer features
● Both are kept, depending on what we value more
Evolution Process:
Same as before (mutation, crossover), but now:
● Selection is based on Pareto dominance
● Diversity preservation is important (we don’t want 10 similar models)
Popular MOEA algorithms:
● NSGA-II (Non-dominated Sorting Genetic Algorithm)
● MOEA/D (Multi-objective EA based on Decomposition)
Multiagent-Based Infectious Disease Propagation and
Outbreak Prediction System
Overview:
This application simulates how infectious diseases spread through a population using
multiagent systems (MAS). Each agent represents an individual with behaviors and
interactions that mimic real-world social and biological patterns. The system predicts outbreak
zones and helps in early warning and policy planning.
Key Components:
1. Agents:
● Human Agents: Represent individuals in a population. Each has attributes:
○ Age, location, health status (susceptible, infected, recovered, vaccinated, dead)
○ Movement pattern (commuter, student, isolated, etc.)
○ Compliance level with health measures (mask-wearing, social distancing)
● Virus Agents: Represent disease strains with properties like:
○ Transmission rate, incubation period, mutation capability
● Healthcare Agents: Represent hospitals, clinics, or field teams:
○ Manage testing, quarantine, and treatment capacity
2. Environment:
● Simulated geographical region: Towns, hospitals, schools, workplaces
● Incorporates transportation networks and public spaces for agent interaction
● Weather or seasonal factors can be added for realism
Working Mechanism (Illustrative Flow):
1. Initialization:
Populate a virtual map with human agents. Seed a few with an initial infection.
2. Daily Simulation:
○ Agents move, interact (based on proximity rules), and possibly transmit infection.
○ Virus agents update infection status based on contact.
○ Healthcare agents perform tests, trace contacts, and isolate cases.
3. Agent Behaviors:
○ Infected agents may or may not show symptoms.
○ Susceptible agents can get vaccinated or infected.
○ Agents can learn (e.g., become more cautious) based on local infection levels.
4. Outbreak Prediction:
○ Real-time graphs for infection curves, R0 value, and hotspots
○ Predictive modeling using reinforcement learning or statistical forecasting
○ Early alert generation for government decision-making
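For intuition, here is a deliberately tiny Python sketch of one such daily loop: agents on a line move randomly and transmit infection by proximity (every parameter is invented for illustration and is far simpler than the system described above):

import random

random.seed(0)
agents = [{"pos": random.uniform(0, 10), "state": "S"} for _ in range(50)]
agents[0]["state"] = "I"                        # seed one infection

for day in range(30):
    for a in agents:
        a["pos"] += random.uniform(-0.5, 0.5)   # daily movement
    infected = [a for a in agents if a["state"] == "I"]
    for a in agents:                            # proximity-based transmission
        near = any(abs(a["pos"] - i["pos"]) < 0.5 for i in infected)
        if a["state"] == "S" and near and random.random() < 0.15:
            a["state"] = "I"

print(sum(a["state"] == "I" for a in agents), "infected after 30 days")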
Example Use Case:
COVID-XX Sim (hypothetical future pandemic)
A new virus strain is introduced. The MAS simulates how different vaccination strategies and
lockdown timings impact spread. The system helps choose the optimal policy that balances
health and economy.
Benefits:
● Captures complex social behaviors unlike traditional SIR models
● Useful for "what-if" scenario testing (e.g., what if schools open early?)
● Aids in resource planning (hospital beds, vaccines)
● Supports localized outbreak containment (district or colony level)
Technologies Used:
● Agent-Based Modeling Platforms: NetLogo, AnyLogic, Repast, or custom Python
frameworks
● Visualization Tools: GIS maps, dashboards (Plotly, Dash, Tableau)
● Machine Learning: Forecasting peaks, optimizing interventions
Research Opportunities:
● Incorporate evolutionary algorithms to model virus mutation
● Use multiagent reinforcement learning for agents learning protective strategies
● Include social media agents to model infodemic or behavior changes
Case Study: Multiagent Simulation for COVID-19 Outbreak
Prediction in Pune, India
Objective:
To model and predict the spread of COVID-19 across urban and suburban zones in Pune, India
using a multiagent system, and to evaluate the effectiveness of various intervention strategies
like lockdowns, vaccination drives, and social distancing.
Setting:
● Region: Pune city, divided into 15 urban wards and surrounding rural blocks.
● Population Modeled: ~20,000 agents (scaled-down model from actual population)
● Duration: Simulated over 180 days
Agents:
1. Human Agents:
○ Roles: Students, office workers, vendors, homemakers, healthcare workers
○ Health states: Susceptible, Exposed, Infected, Recovered, Deceased
○ Behaviors: Movement between home, work, market, hospital
2. Virus Agent:
○ Properties: Transmission probability = 0.15, incubation period = 5 days
○ Mutation potential added for future waves
3. Healthcare Agents:
○ Hospitals: Handle testing, admissions, and report to municipal dashboard
○ Mobile Testing Units: Visit hotspots based on outbreak predictions
Simulation Phases:
1. Initialization:
● Import Pune ward-level map
● Infect 50 random agents in crowded zones (e.g., Market Yard, Kothrud)
● Assign households and job locations
2. Daily Simulation:
● Agents travel and interact (office, bus stops, parks)
● Infections occur probabilistically based on proximity and mask compliance
● Testing centers identify symptomatic cases
● Contact tracing initiated
● Healthcare agent updates resource status
3. Government Interventions:
● Simulate multiple policy experiments:
○ Scenario A: No lockdown
○ Scenario B: Ward-wise lockdowns after crossing 500 cases
○ Scenario C: Mass vaccination in hotspot zones
○ Scenario D: School closure + 80% mask compliance
Outcome & Visualization:
Scenario                     Peak Infections  Total Deaths  R0 Value  Outbreak Duration
A (No Lockdown)              6,700            1,200         2.8       90 days
B (Ward Lockdowns)           2,800            500           1.4       130 days
C (Hotspot Vaccination)      1,900            280           1.1       160 days
D (School Closure + Masks)   1,300            150           0.9       180 days
● Heatmaps show emerging clusters in slums and transport hubs.
● R0 graphs highlight decline in transmission after masking policies.
● Predictive alert system gives 7-day advance warning of surges per ward.
Key Learnings:
● Multiagent simulation provided fine-grained spatial-temporal control.
● Combining targeted lockdowns + behavioral changes worked better than city-wide
lockdowns.
● Masking had the highest impact-to-cost ratio.
● Helps municipal bodies simulate and prepare for future pandemics or second waves.
Technologies Used:
● Agent Framework: NetLogo and custom Python simulation
● Visualization: GIS with QGIS + Plotly
● Data: Census data + local case reports
ENSEMBLE LEARNING
Ensemble learning is a method where we use many small models instead of just one. Each of
these models may not be very strong on its own, but when we put their results together, we get a
better and more accurate answer. It's like asking a group of people for advice instead of just one
person—each one might be a little wrong, but together, they usually give a better answer.
Types of Ensemble Learning in Machine Learning
There are three main types of ensemble methods:
1. Bagging (Bootstrap Aggregating):
Models are trained independently on different random subsets of the training data.
Their results are then combined—usually by averaging (for regression) or voting (for
classification). This helps reduce variance and prevents overfitting.
2. Boosting:
Models are trained one after another. Each new model focuses on fixing the errors
made by the previous ones. The final prediction is a weighted combination of all
models, which helps reduce bias and improve accuracy.
3. Stacking (Stacked Generalization):
Multiple different models (often of different types) are trained, and their predictions
are used as inputs to a final model, called a meta-model. The meta-model learns how
to best combine the predictions of the base models, aiming for better performance
than any individual model.
1. Bagging Algorithm
The Bagging classifier can be used for both regression and classification tasks. Here is an
overview of the Bagging algorithm:
● Bootstrap Sampling: Creates ‘N’ training subsets by randomly sampling rows from the
original training data with replacement, so the same row may appear more than once in a
subset. This step ensures that the base models are trained on diverse subsets of the data.
● Base Model Training: For each bootstrapped sample we train a base model
independently on that subset of data. These weak models are trained in parallel to
increase computational efficiency and reduce time consumption. We can use
different base learners i.e. different ML models as base learners to bring variety and
robustness.
● Prediction Aggregation: To make a prediction on test data, the predictions of all base
models are combined. For classification tasks this can mean majority voting or
weighted-majority voting, while for regression it involves averaging the predictions.
● Out-of-Bag (OOB) Evaluation: Some samples are excluded from the training subset
of particular base models during the bootstrapping method. These “out-of-bag”
samples can be used to estimate the model’s performance without the need for
cross-validation.
● Final Prediction: After aggregating the predictions from all the base models, Bagging
produces a final prediction for each instance.
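A minimal scikit-learn sketch of bagging with out-of-bag evaluation (assumes scikit-learn 1.2+, where the base-model argument is named estimator; the dataset is synthetic):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50, oob_score=True, random_state=0)
bag.fit(X, y)                            # each tree sees its own bootstrap sample
print("OOB accuracy:", bag.oob_score_)   # estimated without a held-out set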
2. Boosting Algorithm
Boosting is an ensemble technique that combines multiple weak learners to create a strong
learner. Weak models are trained in sequence, with each subsequent model trying to correct
the errors of the previous one, until the training data is predicted well or a preset number
of models has been added. One of the most well-known boosting algorithms is AdaBoost
(Adaptive Boosting). Here is an overview of the Boosting algorithm:
● Initialize Model Weights: Begin with a single weak learner and assign equal weights
to all training examples.
● Train Weak Learner: Train a weak learner on this weighted dataset.
● Sequential Learning: Boosting works by training models sequentially where each
model focuses on correcting the errors of its predecessor. Boosting typically uses a
single type of weak learner like decision trees.
● Weight Adjustment: Boosting assigns weights to training datapoints. Misclassified
examples receive higher weights in the next iteration so that next models pay more
attention to them.
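A correspondingly short AdaBoost sketch (again assuming scikit-learn 1.2+ and synthetic data); the re-weighting of misclassified examples happens inside fit:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
boost = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                           n_estimators=100, random_state=0)  # decision stumps
print("training accuracy:", boost.fit(X, y).score(X, y))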
3. Stacking (Stacked Generalization)
Concept:
Stacking is an ensemble learning technique where you combine predictions from multiple
different models (base learners) using another model (called a meta-learner or blender) to make
the final prediction.
It’s like having several experts give their opinions and then hiring a “chief expert” to decide the
final answer based on all their inputs.
How Stacking Works:
1. Base Learners:
○ You choose a set of diverse models (e.g., Decision Tree, SVM, Neural Network,
Logistic Regression).
○ Each base model is trained on the original training data.
2. Level-1 Predictions:
○ The predictions from all base models are collected for each training instance.
○ These predictions form a new dataset (meta-data).
3. Meta-Learner:
○ A new model (often a simple one like Logistic Regression) is trained on the
meta-data to learn the best way to combine base learners’ outputs.
○ The meta-learner’s job is to learn patterns in the base learners’ predictions.
4. Final Prediction:
○ On unseen data, each base learner predicts first, and then the meta-learner uses
these predictions to produce the final output.
Advantages:
● Leverages Strengths of Different Models: Works well when base models have different
strengths and weaknesses.
● Improved Accuracy: Can outperform each individual model if the base learners are
sufficiently diverse.
● Flexible: Works with any combination of models.
Example:
Suppose you want to predict if a patient has diabetes.
● Base learners:
○ Model 1: Decision Tree
○ Model 2: Logistic Regression
○ Model 3: k-NN
● Meta-learner: Logistic Regression
● Process:
○ Base models each output “Yes” or “No” (or probabilities).
○ These outputs are given to the meta-learner, which learns how to combine them
to make the most accurate final decision.
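A sketch of this diabetes stacking setup in scikit-learn (the patient data here is synthetic, standing in for real records):

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in patients
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("logreg", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression())                  # the meta-learner
print("training accuracy:", stack.fit(X, y).score(X, y))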