o What are the concepts discussed in the papers?
General principles
● Modality correlation: It is important to leverage the correlation between the signals (and their noise) in different modalities. This is the motivation for sensor fusion, as it allows for better overall performance than in the unimodal case. However, correlation also implies information redundancy, and thus increased computation.
● Primary vs. secondary modalities: In certain tasks one modality may carry most of the information, whereas the others provide useful cues on top of the primary one (e.g. depth cues on top of RGB).
● Computational constraints: Processing different modalities may incur different computational costs, e.g. images are 2D while audio and text are 1D.
● Fusion operations: concatenation, addition, weighted addition, etc. How are features of different modalities fused together in the network? (A minimal sketch of these operations appears after this list.)
● Fusion stage: at which stage of the network are the features fused? How much processing is applied to each individual modality before the information is combined to look for correlation patterns, semantic meaning, etc.?
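A minimal sketch of the fusion operations named above, assuming per-modality feature tensors with the same batch size; the function and dimensions are illustrative and not taken from any of the papers:

    import torch

    def fuse(features, mode="concat", weights=None):
        # features: list of per-modality tensors, each of shape (batch, dim)
        if mode == "concat":
            return torch.cat(features, dim=-1)               # (batch, sum of dims)
        if mode == "add":
            return torch.stack(features, dim=0).sum(dim=0)   # requires equal dims
        if mode == "weighted":
            w = torch.softmax(torch.as_tensor(weights, dtype=features[0].dtype), dim=0)
            return sum(wi * f for wi, f in zip(w, features))
        raise ValueError(f"unknown fusion mode: {mode}")

    # Example: RGB and depth features of the same width.
    rgb, depth = torch.randn(8, 128), torch.randn(8, 128)
    fused = fuse([rgb, depth], mode="weighted", weights=[0.7, 0.3])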
Multimodal Fusion on Low-quality Data: A Comprehensive Survey (paper 1)
● Properties and challenges of multi-modal datasets:
1. Noisy: the influence of arbitrary noise needs to be mitigated. Types of noise: feature-dependent (modality-specific) or cross-modal (misaligned labels, ‘semantic’ noise). The first category is handled using variation-based noise reduction, the second using filtering, rectification or noise-adaptive regularization techniques.
2. Incomplete: measurements for certain modalities may be missing, as a result of sensor errors, economic costs, etc. Both imputation-based and imputation-free techniques exist (a toy imputation sketch appears after this list).
3. Imbalanced: the influence of bias needs to be mitigated (i.e. when one modality is more helpful than the others, the model may end up utilizing only that one), as well as discrepancies between modalities. These discrepancies may concern convergence speed (different modalities converge at different rates during learning) or quality (some sensors may be better/worse and noisier than others). Techniques involve modifying learning objectives, scheduling learning rates per modality, masking the high-quality modality, etc.
4. Dynamically varying quality: quality variation is unavoidable in real-world applications (e.g. RGB is worse than thermal in low-light conditions, and vice versa in well-lit conditions). This demands dynamic multimodal fusion, which adapts to the changing quality of the multimodal data by fusing features from the different modalities adaptively.
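As a toy illustration of the ‘incomplete’ case, a minimal imputation-based sketch (zero or mean filling of a missing modality); the strategy shown is my own illustrative choice rather than one prescribed by the survey:

    import numpy as np

    def impute_missing(modality_batch, observed_mask, strategy="mean"):
        # modality_batch: (batch, dim) features; rows where observed_mask is False are missing.
        filled = modality_batch.copy()
        if strategy == "zero":
            filled[~observed_mask] = 0.0
        elif strategy == "mean":
            # Replace missing rows with the mean of the observed rows.
            filled[~observed_mask] = modality_batch[observed_mask].mean(axis=0)
        return filled

    audio = np.random.randn(6, 32)
    observed = np.array([True, True, False, True, False, True])
    audio_imputed = impute_missing(audio, observed, strategy="mean")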
Dynamic Multimodal Fusion (paper 2)
● Dynamic sensor fusion: generate data-dependent forward paths on the fly.
● Modality-level vs. fusion-level decisions: two different implementations, each utilizing a gating-network architecture for reduced computation, where the choice of gating network depends on the task at hand.
● Modality-level approach: the gating network chooses a branch, which corresponds to a model that performs inference using a subset of the available modalities. This works better for simpler tasks involving a ‘main’ modality, such as emotion recognition on a dataset involving text, audio and video, with text being the most informative modality of the three.
● Fusion-level approach: fusion blocks are interleaved between feature-extraction blocks, and the gating network decides the fusion operation for each such block. Fusion operations are identity, concatenation, addition or weighted sum. This approach is useful for more difficult tasks such as semantic segmentation given RGB-D modalities.
● Control over computational cost: because all forward-path branches are known ahead of time, their relative computational cost can be estimated. This allows for a resource-aware loss function with a regularization weight λ, which can prioritize low computation over precision, or vice versa (see the sketch after this list).
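A hedged sketch of how such a resource-aware objective could look in code; the soft-gating formulation and the way branch costs enter are assumptions for illustration, not the paper's exact loss:

    import torch

    def resource_aware_loss(task_loss, gate_probs, branch_costs, lam=0.1):
        # gate_probs: (batch, num_branches) soft gating outputs.
        # branch_costs: (num_branches,) precomputed cost of each branch (e.g. MAdds).
        # lam trades accuracy (small lam) against computation savings (large lam).
        expected_cost = (gate_probs * branch_costs).sum(dim=-1).mean()
        return task_loss + lam * expected_cost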
InMu-Net: Advancing Multi-modal Intent Detection via Information Bottleneck and
Multi-sensory Processing (paper 3)
● Denoising: filter out intent-irrelevant information (‘semantic’ noise)
● Bottleneck: distill the relevant part by projecting features into a lower-dimensional space and then re-projecting back to the original feature space. This denoises the features extracted by the respective encoder modules per modality.
● Saliency Preservation: retain as much intent-relevant information as possible.
● Mutual information: used to establish a loss function for saliency preservation, whereby minimizing the difference in conditional entropy H(y|f) − H(y|f′), where f is the extracted feature vector and f′ is its denoised version, is equivalent to minimizing the KL-divergence between the predictions conditioned on the two.
● Kurtosis: a measure of how heavy-tailed a distribution is: higher kurtosis indicates higher density in the distribution’s tails. Here it is used as a measure of the distribution of model predictions over samples, where minimizing kurtosis reduces ‘edge-case’ behavior in the model (see the sketch after this list).
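A minimal sketch of kurtosis as a penalty over per-sample predictions; this is my own illustrative formulation, not the exact regularizer from the paper:

    import torch

    def kurtosis(x, eps=1e-8):
        # Fourth standardized moment of a 1-D tensor: E[(x - mu)^4] / sigma^4.
        mu = x.mean()
        var = x.var(unbiased=False) + eps
        return ((x - mu) ** 4).mean() / var ** 2

    # Example: penalize heavy-tailed confidence across a batch of predictions.
    confidences = torch.softmax(torch.randn(64, 10), dim=-1).max(dim=-1).values
    kurtosis_penalty = kurtosis(confidences)  # could be added to the loss with some weight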
o What kind of modalities are considered?
Tasks and related modalities
Emotion recognition / sentiment analysis (text, visual, audio)
Movie genre classification (image, text)
Semantic segmentation (RGB, depth images)
Multimodal intent detection (text, visual, audio)
Medical imaging (MRI, PET, CSF)
Urban area classification (hyperspectral imaging (HSI), LiDAR)
o How are the multimodal learning systems built? What are the key
instructional design principles derived?
Paper 2
● Gating network provides modality-level or fusion-level decisions; modality-level DynMM and fusion-level DynMM target different granularity levels, depending on the task at hand (modality-level is coarse, fusion-level is fine).
Modality-level
● classical Mixture-of-Experts (MoE) framework, where each expert specializes in a subset
of modalities.
● A gating network, denoted G(x), decides which of these expert networks should be activated for a given input.
● G produces a one-hot encoding, i.e., only one branch is selected per instance (for reduced computational cost); see the sketch after this list.
● Examples of possible gating networks: multi-layer perceptron, transformer, CNN
depending on the task at hand
● G can take intermediate features per modality as inputs
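A hedged sketch of this modality-level gating idea, assuming a small MLP gate, a Gumbel-softmax for differentiable hard selection, and expert modules that each consume their own subset of modalities; all names and widths are illustrative:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ModalityLevelGate(nn.Module):
        # Picks exactly one expert branch per instance.
        def __init__(self, gate_in_dim, experts):
            super().__init__()
            self.gate = nn.Sequential(nn.Linear(gate_in_dim, 64), nn.ReLU(),
                                      nn.Linear(64, len(experts)))
            self.experts = nn.ModuleList(experts)  # each expert handles a subset of modalities

        def forward(self, gate_input, modality_inputs):
            logits = self.gate(gate_input)
            # Hard one-hot selection; Gumbel-softmax keeps it differentiable during training.
            one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)  # (batch, num_experts)
            # All experts are run here for simplicity of the sketch.
            outputs = torch.stack([e(modality_inputs) for e in self.experts], dim=1)
            return (one_hot.unsqueeze(-1) * outputs).sum(dim=1)

At inference time only the branch selected by the gate would actually be executed, which is where the compute savings come from.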
Fusion-level
● Instead of completely skipping the computation for some modality, it is better to harness it at certain stages.
● interlace static feature extraction blocks (MLP/attention) with dynamic fusion cells
● A fusion cell can be implemented as any function that fuses multimodal features, such as a simple identity mapping (O_i = x_1), addition (O_i = x_1 + x_2 + ... + x_M), concatenation (O_i = [x_1, x_2, ..., x_M]) or self-attention; see the sketch after this list.
● Global gating network: the same network is used for all fusion-level decisions and thus operates on all intermediate features. This is more efficient to train.
● Helpful in tasks where the final prediction is mainly based on a dominant modality, i.e. keep extracting features from the main modality while using the others as ‘cues’ (RGB-Depth, for example), and control how and when the auxiliary modality comes in to assist the main prediction process.
● Used for semantic segmentation (i.e., a dense prediction problem).
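A hedged sketch of one such fusion cell, where a small gate picks among identity, addition and concatenation-plus-projection; the concrete layers and widths are assumptions, not the paper's exact architecture:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DynamicFusionCell(nn.Module):
        # Gate selects one fusion operation per instance: identity, addition, or
        # concatenation followed by a projection back to the common width.
        def __init__(self, dim):
            super().__init__()
            self.gate = nn.Linear(2 * dim, 3)    # scores for [identity, add, concat]
            self.proj = nn.Linear(2 * dim, dim)  # maps concatenated features back to dim

        def forward(self, x_main, x_aux):
            gate_in = torch.cat([x_main, x_aux], dim=-1)
            choice = F.gumbel_softmax(self.gate(gate_in), hard=True)  # (batch, 3)
            candidates = torch.stack([x_main,               # identity: main modality only
                                      x_main + x_aux,       # addition
                                      self.proj(gate_in)],  # concatenation + projection
                                     dim=1)                 # (batch, 3, dim)
            return (choice.unsqueeze(-1) * candidates).sum(dim=1)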
Paper 3
● This network performs static multimodal fusion (the computation path is fixed).
● Uses separate modality-specific Transformers to capture the features of the distinct modalities, then fuses (concatenates) the features.
● The denoising bottleneck comprises projection layers into a lower-dimensional space, followed by re-projection back to the original feature space. This distills intent-relevant information while filtering out irrelevant information (see the sketch below).
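A minimal sketch of such a projection bottleneck (down-project, non-linearity, up-project); the layer types and sizes are illustrative assumptions:

    import torch.nn as nn

    class DenoisingBottleneck(nn.Module):
        # Compress fused features to a narrow space and re-project to the original width.
        def __init__(self, feat_dim, bottleneck_dim):
            super().__init__()
            self.down = nn.Linear(feat_dim, bottleneck_dim)  # discard capacity for noise
            self.up = nn.Linear(bottleneck_dim, feat_dim)    # restore original dimensionality
            self.act = nn.ReLU()

        def forward(self, fused_features):
            return self.up(self.act(self.down(fused_features)))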
o How do these principles enhance learning in multimodal environments?
Paper 2
● Does not directly enhance learning, but rather enhances control over the computational cost of inference via a regularization parameter λ added to the training loss, roughly of the form L = L_task + λ·C, where C is the computation incurred by the chosen forward path: C(E_i) denotes the computation cost (e.g., MAdds) of executing an expert network E_i, and C(O_{i,j}) represents the computation cost of the i-th fusion operation in the j-th cell.
● For example, in an emotion recognition task, the dynamic model can activate only its text branch and skip the paths corresponding to the other two modalities, whereas in a ‘harder’ case it may rely on all modalities, leading to heavier computation.
● Fusion-level DynMM is useful in tasks where the final prediction is mainly based on a dominant modality while the others are used as ‘cues’ (e.g. RGB-Depth). In this mode it is possible to control how and when the auxiliary modality comes in to assist the main prediction process.
Paper 3
● The denoising bottleneck module allows the model to retain intent-relevant information while discarding irrelevant information.
● The loss function combines foundational supervision components (comparison of model outputs with targets) with regularization terms (saliency preservation, reduced kurtosis). The first three loss terms are ‘foundational’ in the sense that they supervise the model output with regard to a target; more specifically, L_f and L_f~ supervise the features and L_{modality} gives the text modality dominance (a hedged sketch of such a combined objective appears after this list).
● The denoising bottleneck module also serves as data augmentation during training, making the model more robust to noisy input.
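A hedged sketch of how such a combined objective might be assembled; the term names follow the notes above (L_f, L_f~, L_{modality}), but the cross-entropy supervision, the KL formulation of saliency preservation, and the weights alpha/beta are my own illustrative assumptions:

    import torch.nn.functional as F

    def total_loss(logits_f, logits_f_denoised, logits_text, targets,
                   kurtosis_term, alpha=1.0, beta=0.1):
        l_f = F.cross_entropy(logits_f, targets)                 # supervise raw fused features
        l_f_tilde = F.cross_entropy(logits_f_denoised, targets)  # supervise denoised features
        l_modality = F.cross_entropy(logits_text, targets)       # keep the text modality dominant
        # Saliency preservation: predictions from denoised features stay close to
        # predictions from the original features (KL between the two distributions).
        l_saliency = F.kl_div(F.log_softmax(logits_f_denoised, dim=-1),
                              F.softmax(logits_f, dim=-1), reduction="batchmean")
        return l_f + l_f_tilde + l_modality + alpha * l_saliency + beta * kurtosis_term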
o Summarize the empirical evidence supporting the effectiveness of the design
principles mentioned in the article.
Paper 2
● DynMM is used in 3 tasks (movie genre classification, sentiment analysis, semantic
segmentation) involving different datasets with different modalities.
● modality-level DynMM is used for the first two tasks and fusion-level for the third.
● Different variants of the model are obtained per task by using different values of the regularization hyperparameter, yielding a different computation-precision trade-off for each variant.
● The light-computation variants all do well compared to the SOTA baselines with regard to the classification metrics (F1, MAE, mIoU, etc.). This allows for an almost 50% reduction in computation while remaining close to the baselines’ performance. The heavy-computation variants, on the other hand, exceed the SOTA baselines.
● The model is also shown to be more robust than other SOTA baselines via Gaussian noise injection in the more difficult semantic segmentation task.
Paper 3
● The model does as well as or better than SOTA baselines on the task of MID (multimodal intent detection) over two multimodal datasets. The model also has a lower perplexity measure.
● The model does much better than SOTA baselines on uncommon intent categories, thanks to kurtosis regularization.
● Ablation studies show that the full loss function (comprising foundational and regularization terms) does significantly better than subsets of its components over both datasets.
● Modality-corruption and low-resource studies also show that the model achieves high performance compared to the baselines.
● The generalizability of the model is examined via cross-architecture and cross-task scenarios. These reveal that the underlying sensor-fusion architecture can still be improved, and that the model achieves competitive performance against multimodal sentiment analysis (MSA) baselines.
o Discuss any limitations or gaps in the empirical studies reviewed.
Paper 2
● Different gating network modules (MLP/Transformer/convolutional) were chosen for
different tasks. What were the motivations for this? Is this modality or task dependent?
● Given certain computational or precision requirements, how could one go about
choosing a regularization parameter λ which would yield a suitable model?
● What is the difference in computational cost between the modality-level and fusion-level implementations on the same task?
● Could the two approaches be integrated by using a fusion-level variant within the framework of a modality-level variant? Would this be useful at all?
Paper 3
● The proposed architecture is evaluated only on multimodal datasets that include text, also within the cross-task study. How does the model compare to the baselines on tasks involving no text at all (meaning the loss function would need to be modified)?
● What is the effect of the denoising bottleneck in the case of highly ‘misaligned’ samples (i.e. the text expresses one intent, the visuals another, and the audio a third)? Will it automatically prioritize the ‘main’ modality in this case?
Using Heterogeneous Multilevel Swarms of UAVs and High-Level Data
Fusion to Support Situation Management in Surveillance Scenarios (paper 4)
o What future research directions does the article suggest for multimodal sensor/information fusion?
● multi-sensor multi-source data fusion: fuse information delivered by heterogeneous
swarms of UAVs
● Decentralized, distributed frameworks for high-level information fusion allow for a decreased workload per unit.
● Information sharing between UAVs: different sensors can be ‘aware’ of each other, saving higher-level computation.
● High-level data fusion: situation assessment, relationships between objects, impact assessment. The ‘Object-Oriented World Model’ provides a framework within which to complete such tasks, perhaps demanding different information-fusion approaches.
o How can these directions be applied to the field of multimodal sensor fusion
in dynamic environments?
● A decentralized framework demands a dynamic multimodal fusion approach, as some
drones may be unavailable (as a result of loss, refueling, etc…)
● Similarly, a higher-level operator (intelligence control, etc..) can choose a subset of
drones in the swarm to collect data, depending on position, availability, and sensors.
● Fusion flexibility: low-quality data can also be collected and utilized. Missing data can be
imputed if necessary.
● UAVs are autonomous, and can thus perform computationally light sensor fusion themselves if necessary (in case the higher-level operator is unavailable).
o What are the challenges presented by dynamic environments and varying
topologies?
● Real-time (online) implementation of sensor-fusion algorithms needs to be computationally efficient and robust.
● A dynamic environment means that new events may occur at any time. This demands quick reactions to changes (for example, a drone in the swarm falling).
● Dynamic movement requires a mobility model. This is additional computation for each
drone (swarm needs to navigate, keep distance).
● Varying topology means the sensor-fusion model can’t be ‘optimized’ with regard to a particular modality. For example, in certain topologies less light may be available, or certain sensors may be preferable to others.
(Questions of mine)
● How much computing power do the drones in question have? Are the algorithms
mentioned in the paper (optical flow, projective image transformation, motion detection
etc..) implemented by the drones themselves?
● What kinds of sensors are used? Does a heterogeneous fleet necessarily mean that these could change from drone to drone?
● Can OCR be integrated into the fusion process? Would this be accomplished by individual drones or by a higher-level operator?