Kazuhito Koishida
2020 – today
- 2024
- [j3] Amrit Romana, Kazuhito Koishida, Emily Mower Provost: Automatic Disfluency Detection From Untranscribed Speech. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4727-4740 (2024)
- [c34] Afrina Tabassum, Dung N. Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida: uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures. ICASSP 2024: 5435-5439
- [c33] Tanvir Mahmud, Saeed Amizadeh, Kazuhito Koishida, Diana Marculescu: Weakly-supervised Audio Separation via Bi-modal Semantic Similarity. ICLR 2024
- [i20] Chih-Yu Lai, Dung N. Tran, Kazuhito Koishida: Learned Image Compression with Text Quality Enhancement. CoRR abs/2402.08643 (2024)
- [i19] Afrina Tabassum, Dung N. Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida: uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures. CoRR abs/2403.09579 (2024)
- [i18] Tanvir Mahmud, Saeed Amizadeh, Kazuhito Koishida, Diana Marculescu: Weakly-supervised Audio Separation via Bi-modal Semantic Similarity. CoRR abs/2404.01740 (2024)
- [i17] Trung Dang, David Aponte, Dung N. Tran, Kazuhito Koishida: LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes. CoRR abs/2406.02897 (2024)
- [i16] Yinheng Li, Rogerio Bonatti, Sara Abdali, Justin Wagle, Kazuhito Koishida: Data Generation Using Large Language Models for Text Classification: An Empirical Case Study. CoRR abs/2407.12813 (2024)
- [i15] Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, Zack Hui: Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale. CoRR abs/2409.08264 (2024)
- [i14] Trung Dang, David Aponte, Dung N. Tran, Tianyi Chen, Kazuhito Koishida: Zero-Shot Text-to-Speech from Continuous Text Streams. CoRR abs/2410.00767 (2024)
- [i13] Lawrence Jang, Yinheng Li, Charles Ding, Justin Lin, Paul Pu Liang, Dan Zhao, Rogerio Bonatti, Kazuhito Koishida: VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks. CoRR abs/2410.19100 (2024)
- 2023
- [c32] Minh N. Bui, Dung N. Tran, Kazuhito Koishida, Trac D. Tran, Peter Chin: Improving Low-Latency Mono-Channel Speech Enhancement by Compensation Windows in STFT Analysis. COMPLEX NETWORKS (1) 2023: 363-373
- [c31] Amrit Romana, Kazuhito Koishida: Toward A Multimodal Approach for Disfluency Detection and Categorization. ICASSP 2023: 1-5
- [c30] Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida: SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks. INTERSPEECH 2023: 2463-2467
- [c29] Don Kurian Dennis, Abhishek Shetty, Anish Prasad Sevekari, Kazuhito Koishida, Virginia Smith: Progressive Ensemble Distillation: Building Ensembles for Efficient Inference. NeurIPS 2023
- [i12] Don Kurian Dennis, Abhishek Shetty, Anish Sevekari, Kazuhito Koishida, Virginia Smith: Progressive Knowledge Distillation: Building Ensembles for Efficient Inference. CoRR abs/2302.10093 (2023)
- [i11] Yatong Bai, Trung Dang, Dung N. Tran, Kazuhito Koishida, Somayeh Sojoudi: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation. CoRR abs/2309.10740 (2023)
- [i10] Amrit Romana, Kazuhito Koishida, Emily Mower Provost: Automatic Disfluency Detection from Untranscribed Speech. CoRR abs/2311.00867 (2023)
- [i9] Oscar Chang, Dung N. Tran, Kazuhito Koishida: Single-channel speech enhancement using learnable loss mixup. CoRR abs/2312.17255 (2023)
- 2022
- [c28] Trung Dang, Dung N. Tran, Peter Chin, Kazuhito Koishida: Training Robust Zero-Shot Voice Conversion Models with Self-Supervised Features. ICASSP 2022: 6557-6561
- [c27] Bahareh Tolooshams, Kazuhito Koishida: A Training Framework for Stereo-Aware Speech Enhancement Using Deep Neural Networks. ICASSP 2022: 6962-6966
- [i8] Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida: SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks. CoRR abs/2210.14474 (2022)
- 2021
- [c26] Arun Asokan Nair, Kazuhito Koishida: Cascaded Time + Time-Frequency Unet For Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, And Gaps. ICASSP 2021: 7153-7157
- [c25] Oscar Chang, Dung N. Tran, Kazuhito Koishida: Single-Channel Speech Enhancement Using Learnable Loss Mixup. Interspeech 2021: 2696-2700
- [c24] Chandan K. A. Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Asokan Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan: INTERSPEECH 2021 Deep Noise Suppression Challenge. Interspeech 2021: 2796-2800
- [i7] Chandan K. A. Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Asokan Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan: Interspeech 2021 Deep Noise Suppression Challenge. CoRR abs/2101.01902 (2021)
- [i6] Trung Dang, Dung N. Tran, Peter Chin, Kazuhito Koishida: Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features. CoRR abs/2112.04424 (2021)
- [i5] Bahareh Tolooshams, Kazuhito Koishida: A Training Framework for Stereo-Aware Speech Enhancement using Deep Neural Networks. CoRR abs/2112.04939 (2021)
- [i4] Melikasadat Emami, Dung N. Tran, Kazuhito Koishida: Augmented Contrastive Self-Supervised Learning for Audio Invariant Representations. CoRR abs/2112.10950 (2021)
- 2020
- [c23] Chong Huang, Kazuhito Koishida: Improved Active Speaker Detection based on Optical Flow. CVPR Workshops 2020: 4084-4090
- [c22] Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, Kazuhito Koishida: MMTM: Multimodal Transfer Module for CNN Fusion. CVPR 2020: 13286-13296
- [c21] Li Li, Kazuhito Koishida: Geometrically Constrained Independent Vector Analysis for Directional Speech Enhancement. ICASSP 2020: 846-850
- [c20] Ahmet Emin Bulut, Kazuhito Koishida: Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks. ICASSP 2020: 6214-6218
- [c19] Michael L. Iuzzolino, Kazuhito Koishida: AV(SE)2: Audio-Visual Squeeze-Excite Speech Enhancement. ICASSP 2020: 7539-7543
- [c18] Saeed Amizadeh, Hamid Palangi, Alex Polozov, Yichen Huang, Kazuhito Koishida: Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning". ICML 2020: 279-290
- [c17] Li Li, Kazuhito Koishida, Shoji Makino: Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis. INTERSPEECH 2020: 61-65
- [c16] Dung N. Tran, Uros Batricevic, Kazuhito Koishida: Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments. INTERSPEECH 2020: 175-179
- [c15] Ahmet Emin Bulut, Kazuhito Koishida: Low-Latency Single Channel Speech Dereverberation Using U-Net Convolutional Neural Networks. INTERSPEECH 2020: 2442-2446
- [c14] Dung N. Tran, Kazuhito Koishida: Single-Channel Speech Enhancement by Subspace Affinity Minimization. INTERSPEECH 2020: 2447-2451
- [i3] Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida: Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning". CoRR abs/2006.11524 (2020)
2010 – 2019
- 2019
- [j2] Sefik Emre Eskimez, Kazuhito Koishida, Zhiyao Duan: Adversarial Training for Speech Super-Resolution. IEEE J. Sel. Top. Signal Process. 13(2): 347-358 (2019)
- [c13] Sefik Emre Eskimez, Kazuhito Koishida: Speech Super Resolution Generative Adversarial Network. ICASSP 2019: 3717-3721
- [c12] Wei Xia, Kazuhito Koishida: Sound Event Detection in Multichannel Audio Using Convolutional Time-Frequency-Channel Squeeze and Excitation. INTERSPEECH 2019: 3629-3633
- [i2] Wei Xia, Kazuhito Koishida: Sound Event Detection in Multichannel Audio using Convolutional Time-Frequency-Channel Squeeze and Excitation. CoRR abs/1908.01399 (2019)
- [i1] Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, Kazuhito Koishida: MMTM: Multimodal Transfer Module for CNN Fusion. CoRR abs/1911.08670 (2019)
- 2018
- [j1] Chunlei Zhang, Kazuhito Koishida, John H. L. Hansen: Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings. IEEE ACM Trans. Audio Speech Lang. Process. 26(9): 1633-1644 (2018)
- 2017
- [c11] Chunlei Zhang, Kazuhito Koishida: End-to-end text-independent speaker verification with flexibility in utterance duration. ASRU 2017: 584-590
- [c10] Chunlei Zhang, Kazuhito Koishida: End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances. INTERSPEECH 2017: 1487-1491
2000 – 2009
- 2008
- [c9] Sanjeev Mehrotra, Wei-Ge Chen, Kazuhito Koishida, Naveen Thumpudi: Hybrid low bitrate audio coding using adaptive gain shape vector quantization. MMSP 2008: 927-932
- 2000
- [c8] Kazuhito Koishida, Vladimir Cuperman, Allen Gersho: A 16-kbit/s bandwidth scalable audio coder based on the G.729 standard. ICASSP 2000: 1149-1152
- [c7] Tian Wang, Kazuhito Koishida, Vladimir Cuperman, Allen Gersho, John S. Collura: A 1200 bps speech coder based on MELP. ICASSP 2000: 1375-1378
1990 – 1999
- 1998
- [c6] Kazuhito Koishida, Gou Hirabayashi, Keiichi Tokuda, Takao Kobayashi: A wideband CELP speech coder at 16 kbit/s based on mel-generalized cepstral analysis. ICASSP 1998: 161-164
- [c5] Kazuhito Koishida, Gou Hirabayashi, Keiichi Tokuda, Takao Kobayashi: A 16 kbit/s wideband CELP coder using mel-generalized cepstral analysis and its subjective evaluation. ICSLP 1998
- 1997
- [c4] Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai: Efficient encoding of mel-generalized cepstrum for CELP coders. ICASSP 1997: 1355-1358
- 1996
- [c3] Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai: CELP coding system based on mel-generalized cepstral analysis. ICSLP 1996: 318-321
- 1995
- [c2] Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai: CELP coding based on mel-cepstral analysis. ICASSP 1995: 33-36
- 1994
- [c1] Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai: Speech coding based on adaptive mel-cepstral analysis for noisy channels. ICSLP 1994: 2087-2090