Nithin Rao Koluguri

2020 – today

2025
[i17]
Piotr Zelasko, Kunal Dhawan, Daniel Galvez, Krishna C. Puvvada, Ankita Pasad, Nithin Rao Koluguri, Ke Hu, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg:
Training and Inference Efficiency of Encoder-Decoder Speech Models. CoRR abs/2503.05931 (2025)
2024
[c12]
Taejin Park, Kunal Dhawan, Nithin Rao Koluguri, Jagadeesh Balam:
Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach. ICASSP 2024: 10861-10865
[c11]
Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg:
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition. ICASSP 2024: 12111-12115
[c10]
Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg:
Investigating End-to-End ASR Architectures for Long Form Audio Transcription. ICASSP 2024: 13366-13370
[c9]
Zhehuai Chen, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, Piotr Zelasko, Jagadeesh Balam, Boris Ginsburg:
Bestow: Efficient and Streamable Speech Language Model with The Best of Two Worlds in GPT and T5. SLT 2024: 147-154
[c8]
Nithin Rao Koluguri, Travis M. Bartley, Hainan Xu, Oleksii Hrinchuk, Jagadeesh Balam, Boris Ginsburg, Georg Kucsko:
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation. SLT 2024: 255-262
[i16]
Krishna C. Puvvada, Piotr Zelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg:
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data. CoRR abs/2406.19674 (2024)
[i15]
Zhehuai Chen, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, Piotr Zelasko, Jagadeesh Balam, Boris Ginsburg:
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5. CoRR abs/2406.19954 (2024)
[i14]
Kunal Dhawan, Nithin Rao Koluguri, Ante Jukic, Ryan Langman, Jagadeesh Balam, Boris Ginsburg:
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations. CoRR abs/2407.03495 (2024)
[i13]
He Huang, Taejin Park, Kunal Dhawan, Ivan Medennikov, Krishna C. Puvvada, Nithin Rao Koluguri, Weiqing Wang, Jagadeesh Balam, Boris Ginsburg:
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks. CoRR abs/2408.13106 (2024)
[i12]
Nithin Rao Koluguri, Travis M. Bartley, Hainan Xu, Oleksii Hrinchuk, Jagadeesh Balam, Boris Ginsburg, Georg Kucsko:
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation. CoRR abs/2409.05601 (2024)
[i11]
Taejin Park, Ivan Medennikov, Kunal Dhawan, Weiqing Wang, He Huang, Nithin Rao Koluguri, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg:
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens. CoRR abs/2409.06656 (2024)
[i10]
Jinhan Wang, Weiqing Wang, Kunal Dhawan, Taejin Park, Myungjong Kim, Ivan Medennikov, He Huang, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR. CoRR abs/2409.12352 (2024)
2023
[c7]
Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg:
Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition. ASRU 2023: 1-8
[c6]
Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification. INTERSPEECH 2023: 5321-5325
[i9]
Taejin Park, Kunal Dhawan, Nithin Rao Koluguri, Jagadeesh Balam:
Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach. CoRR abs/2309.05248 (2023)
[i8]
Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg:
Investigating End-to-End ASR Architectures for Long Form Audio Transcription. CoRR abs/2309.09950 (2023)
[i7]
Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg:
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition. CoRR abs/2309.10922 (2023)
[i6]
Taejin Park, He Huang, Coleman Hooper, Nithin Rao Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg:
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation. CoRR abs/2310.12371 (2023)
[i5]
Taejin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Rao Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg:
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System. CoRR abs/2310.12378 (2023)
2022
[c5]
Nithin Rao Koluguri, Taejin Park, Boris Ginsburg:
TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context. ICASSP 2022: 8102-8106
[c4]
Taejin Park, Nithin Rao Koluguri, Fei Jia, Jagadeesh Balam, Boris Ginsburg:
NeMo Open Source Speaker Diarization System. INTERSPEECH 2022: 853-854
[c3]
Taejin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
Multi-scale Speaker Diarization with Dynamic Scale Weighting. INTERSPEECH 2022: 5080-5084
[i4]
Taejin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
Multi-scale Speaker Diarization with Dynamic Scale Weighting. CoRR abs/2203.15974 (2022)
[i3]
Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
AmberNet: A Compact End-to-End Model for Spoken Language Identification. CoRR abs/2210.15781 (2022)
2021
[i2]
Nithin Rao Koluguri, Taejin Park, Boris Ginsburg:
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context. CoRR abs/2110.04410 (2021)
2020
[c2]
Nithin Rao Koluguri, Manoj Kumar, So Hyun Kim, Catherine Lord, Shrikanth Narayanan:
Meta-Learning for Robust Child-Adult Classification from Speech. ICASSP 2020: 8094-8098

2010 – 2019

2019
[c1]
Suhas B. N., Deep Patel, Nithin Rao Koluguri, Yamini Belur, Pradeep Reddy, Atchayaram Nalini, Ravi Yadav, Dipanjan Gope, Prasanta Kumar Ghosh:
Comparison of Speech Tasks and Recording Devices for Voice Based Automatic Classification of Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis. INTERSPEECH 2019: 4564-4568
[i1]
Nithin Rao Koluguri, Manoj Kumar, So Hyun Kim, Catherine Lord, Shrikanth S. Narayanan:
Meta-learning for robust child-adult classification from speech. CoRR abs/1910.11400 (2019)
2017
[j1]
Nithin Rao Koluguri, G. Nisha Meenakshi, Prasanta Kumar Ghosh:
Spectrogram Enhancement Using Multiple Window Savitzky-Golay (MWSG) Filter for Robust Bird Sound Detection. IEEE ACM Trans. Audio Speech Lang. Process. 25(6): 1183-1192 (2017)
