Multichannel Variable-Size Convolution for
Sentence Classification
- Wenpeng Yin
- Hinrich Schütze
K.Vinay Sameer Raja
IIT Kanpur
INTRODUCTION
Enhance word vector representations by combining several word embedding
versions trained on different corpora.
Extract features of multi-granular phrases using variable-filter-size CNNs.
CNNs have been used to extract features over phrases, but the filter size
is a hyperparameter in such models.
Mutual learning and pre-training are used to enhance MVCNN.
ARCHITECTURE
Multi-Channel Input :
The input layer is a 3-dimensional array of size c x d x s,
where s is the sentence length, d is the word embedding
dimension, and c is the number of embedding versions.
In practice, when using mini-batches, sentences are padded
to the same length, and words unknown in a given embedding
version are initialized randomly in that version.
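As a rough illustration (not from the paper), a minimal sketch of assembling the c x d x s input, assuming each embedding version is a plain dict from word to vector; unknown words get random vectors and short sentences are zero-padded:

    import numpy as np

    def build_multichannel_input(tokens, embeddings, d, s):
        """Stack one d x s matrix per embedding version into a c x d x s array.

        `embeddings` is a list of dicts mapping word -> d-dim vector, one per
        version. Words missing from a version are initialized randomly;
        remaining positions are left as zero padding (an assumption).
        """
        c = len(embeddings)
        x = np.zeros((c, d, s), dtype=np.float32)
        for v, emb in enumerate(embeddings):
            for pos, w in enumerate(tokens[:s]):
                vec = emb.get(w)
                if vec is None:                       # unknown word in this version
                    vec = np.random.uniform(-0.25, 0.25, d)
                x[v, :, pos] = vec
        return x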
Convolution Layer :
The computations in this layer are the same as in a standard CNN,
but additional feature maps are obtained from the variable filter sizes.
Mathematical Formulation :
Denote the jth feature map of layer i by F_i^j and assume layer i-1 has
n feature maps. Let l be the filter size and let the weights form a
matrix V_{i,l}^{j,k}; then

    F_{i,l}^j = Σ_{k=1..n} V_{i,l}^{j,k} * F_{i-1}^k

where * is the convolution operator.
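One simplified way to realize the variable-filter-size convolution is sketched below, treating the c embedding versions as extra Conv1d input channels; the filter sizes (3, 5) and the wide-convolution padding are illustrative assumptions, not values from the poster:

    import torch
    import torch.nn as nn

    class VariableSizeConv(nn.Module):
        """Apply convolutions with several filter sizes to the same c x d x s input."""

        def __init__(self, c, d, n_maps, filter_sizes=(3, 5)):  # sizes are illustrative
            super().__init__()
            self.convs = nn.ModuleList([
                nn.Conv1d(in_channels=c * d, out_channels=n_maps,
                          kernel_size=l, padding=l - 1)          # wide convolution
                for l in filter_sizes
            ])

        def forward(self, x):                      # x: (batch, c, d, s)
            b, c, d, s = x.shape
            x = x.view(b, c * d, s)                # merge channels and embedding dims
            return [conv(x) for conv in self.convs]  # one feature-map set per size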
Pooling Layer :
Normal k-max pooling keeps the k maximum values of a feature map,
preserving their original order.
Dynamic k-max pooling lets the value of k change from layer to layer.
The choice of k for a feature map in layer i is given by

    k_i = max( k_top , ⌈ (L - i) / L * s ⌉ )

where i ∈ {1, . . ., L} is the order of the convolution layer from bottom to top,
L is the total number of convolution layers, and
k_top is an empirically determined constant: the k value used in the
top layer.
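A minimal NumPy sketch of k-max pooling and of the dynamic choice of k under the formula above (names are illustrative):

    import math
    import numpy as np

    def k_max_pool(feature_map, k):
        """Keep the k largest values of a 1-D feature map, preserving their order."""
        idx = np.argsort(feature_map)[-k:]      # indices of the k largest values
        return feature_map[np.sort(idx)]        # restore original ordering

    def dynamic_k(i, L, s, k_top):
        """k_i = max(k_top, ceil((L - i) / L * s)) for convolution layer i of L."""
        return max(k_top, math.ceil((L - i) / L * s))

For example, with L = 3 layers, sentence length s = 20 and k_top = 4, the layers keep k_1 = 14, k_2 = 7 and k_3 = 4 values.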
Hidden Layer :
On top of the final k-max pooling layer, a fully connected layer is
stacked to learn a sentence representation of the required dimension d.
Logistic Regression Layer :
The outputs of the hidden layer are forwarded to a logistic regression
layer for classification.
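A hedged sketch of these last two layers, assuming the pooled feature maps have been flattened into a single vector and that tanh is the hidden activation (an assumption, not stated on the poster):

    import torch
    import torch.nn as nn

    class SentenceClassifierHead(nn.Module):
        """Hidden layer producing a d-dim sentence vector, then logistic regression."""

        def __init__(self, pooled_dim, d, n_classes):
            super().__init__()
            self.hidden = nn.Linear(pooled_dim, d)   # sentence representation
            self.logreg = nn.Linear(d, n_classes)    # logistic regression layer

        def forward(self, pooled_features):          # (batch, pooled_dim)
            sent = torch.tanh(self.hidden(pooled_features))
            return torch.log_softmax(self.logreg(sent), dim=-1)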
MODEL ENHANCEMENTS :
Mutual Learning of Embedding Versions :
As the different embedding versions are trained on different corpora,
some words may not have an embedding in every version.
Let V1, V2, . . ., Vc be the vocabularies of the c embedding versions.
V* = V1 ∪ V2 ∪ . . . ∪ Vc is the total vocabulary of the final embedding.
Vi- = V* \ Vi is the set of words that have no embedding in Vi.
Vij is the overlapping vocabulary between the ith and jth versions.
We project (or learn) embeddings from the ith to the jth version by
    ŵj = fij(wi)
The squared error between wj and ŵj is the training loss to minimize.
The element-wise average of f1i(w1), f2i(w2), . . ., fki(wk) is treated as
the representation in version i of a word w ∈ Vi-.
A total of c(c-1)/2 projections are computed to find embeddings for
every word across all versions.
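A sketch of learning one projection f_ij on the overlap vocabulary Vij, assuming f_ij is a linear map trained with the squared-error loss described above (the optimizer and epoch count are illustrative):

    import torch
    import torch.nn as nn

    def learn_projection(W_i, W_j, epochs=200, lr=0.01):
        """Learn f_ij mapping version-i vectors to version-j vectors on Vij.

        W_i, W_j: (|Vij|, d) tensors of embeddings for the shared words,
        row-aligned. A linear map is assumed here.
        """
        f_ij = nn.Linear(W_i.size(1), W_j.size(1), bias=False)
        opt = torch.optim.SGD(f_ij.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = ((f_ij(W_i) - W_j) ** 2).mean()   # squared-error training loss
            loss.backward()
            opt.step()
        return f_ij

For a word missing from version i, averaging the outputs of the learned projections from the versions that do contain it then yields its vector, as described above.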
Pre-Training :
In pre-training, the sentence representation is used to predict the
component words of the sentence, instead of predicting the sentence
label (Y/N) as in supervised learning.
Given the sentence representation s ∈ R^d and initialized representations
of the 2t context words (t left words and t right words)
w_{i-t}, . . ., w_{i-1}, w_{i+1}, . . ., w_{i+t}, each in R^d, we average
all 2t + 1 vectors element-wise.
Noise-contrastive estimation (NCE) is then used to predict the true middle
word w_i from this averaged (predicted) vector.
In pre-training, initializations are needed for
1. each word of the sentence in the multichannel input layer
(multichannel initialization)
2. each context word as input to the average layer (random
initialization)
3. each target word as the output of the NCE layer (random
initialization)
During pre-training, the model parameters are updated so that they
extract better sentence representations. These parameters are then
fine-tuned in the supervised tasks.
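A rough sketch of the pre-training objective, with negative sampling standing in for the exact NCE formulation; the function name and the number of noise words are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def pretrain_loss(s, context, target_vec, negative_vecs):
        """Predict the middle word from the sentence vector and its 2t context words.

        s:             (d,) sentence representation from the MVCNN
        context:       (2t, d) initialized context-word vectors
        target_vec:    (d,) output-side vector of the true middle word
        negative_vecs: (k, d) output-side vectors of k sampled noise words
        """
        pred = torch.mean(torch.cat([s.unsqueeze(0), context], dim=0), dim=0)  # 2t+1 avg
        pos = F.logsigmoid(pred @ target_vec)               # score the true word
        neg = F.logsigmoid(-negative_vecs @ pred).sum()     # penalize noise words
        return -(pos + neg)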
RESULTS :
Datasets :
Stanford Sentiment Treebank (Socher et al., 2013) - Binary and Fine-grained
Sentiment140 (Go et al., 2009) - Senti140
Subjectivity classification dataset (Pang and Lee, 2004) - Subj
Questions ?
Thank You!