Important
If you want to look at other conferences such as NeurIPS, ICLR, ICML, EMNLP, or ACL, check out Awesome-artist!!! 🤩🤩🤩
Note
This repository covers the long papers from ACL 2025. Each paper’s framework diagrams, experimental figures, and other visuals are extracted so that their presentation techniques can be studied. Because the content is extensive and a single Markdown file cannot render everything reliably, it is split into 50 separate Markdown files of about 32 papers each. The index below shows where each paper is located 😁😁. Hope we can make progress together!
Total Papers: 1599
Split into 50 parts for better browsing
| column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 |
|---|---|---|---|---|---|---|---|---|---|
| Part 1: 32 papers | Part 2: 32 papers | Part 3: 32 papers | Part 4: 32 papers | Part 5: 32 papers | Part 6: 32 papers | Part 7: 32 papers | Part 8: 32 papers | Part 9: 32 papers | Part 10: 32 papers |
| Part 11: 32 papers | Part 12: 32 papers | Part 13: 32 papers | Part 14: 32 papers | Part 15: 32 papers | Part 16: 32 papers | Part 17: 32 papers | Part 18: 32 papers | Part 19: 32 papers | Part 20: 32 papers |
| Part 21: 32 papers | Part 22: 32 papers | Part 23: 32 papers | Part 24: 32 papers | Part 25: 32 papers | Part 26: 32 papers | Part 27: 32 papers | Part 28: 32 papers | Part 29: 32 papers | Part 30: 32 papers |
| Part 31: 32 papers | Part 32: 32 papers | Part 33: 32 papers | Part 34: 32 papers | Part 35: 32 papers | Part 36: 32 papers | Part 37: 32 papers | Part 38: 32 papers | Part 39: 32 papers | Part 40: 32 papers |
| Part 41: 32 papers | Part 42: 32 papers | Part 43: 32 papers | Part 44: 32 papers | Part 45: 32 papers | Part 46: 32 papers | Part 47: 32 papers | Part 48: 32 papers | Part 49: 32 papers | Part 50: 31 papers |
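The part sizes in the table follow from plain chunking arithmetic: 49 parts of 32 papers plus a final part of 31 gives 1599. A minimal Python sketch, assuming the papers are split consecutively in list order (the repository's actual split script is not shown here, and `split_into_parts`, `part_number`, and the placeholder titles are hypothetical names used only for illustration):

```python
PAPERS_PER_PART = 32  # assumed chunk size, taken from the table above


def split_into_parts(titles, size=PAPERS_PER_PART):
    """Return consecutive chunks of at most `size` titles."""
    return [titles[i:i + size] for i in range(0, len(titles), size)]


def part_number(position, size=PAPERS_PER_PART):
    """Map a 1-based paper position to its 1-based part number."""
    return (position - 1) // size + 1


# Placeholder titles standing in for the 1599 real entries.
titles = [f"paper-{n}" for n in range(1, 1600)]
parts = split_into_parts(titles)

print(len(parts), len(parts[0]), len(parts[-1]))  # 50 32 31
print(part_number(1), part_number(33), part_number(1599))  # 1 2 50
```

Under that assumption, `part_number` gives the part to open for a paper at a known position in the list below.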
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association
- GraphNarrator: Generating Textual Explanations for Graph Neural Networks
- M-RewardBench: Evaluating Reward Models in Multilingual Settings
- ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming
- Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
- Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation
- Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models
- The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It
- CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction
- StrucText-Eval: Evaluating Large Language Model’s Reasoning Ability in Structure-Rich Text
- Literature Meets Data: A Synergistic Approach to Hypothesis Generation
- GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization
- Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models
- Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models
- ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision
- FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models
- Statistical Deficiency for Task Inclusion Estimation
- Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients
- LLM-Powered Test Case Generation for Detecting Bugs in Plausible Programs
- Capture the Key in Reasoning to Enhance CoT Distillation Generalization
- How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond
- Enhancing Hyperbole and Metaphor Detection with Their Bidirectional Dynamic Interaction and Emotion Knowledge
- UniICL: An Efficient ICL Framework Unifying Compression, Selection, and Generation
- BelarusianGLUE: Towards a Natural Language Understanding Benchmark for Belarusian
- A Survey on Foundation Language Models for Single-cell Biology
- RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
- Extending LLM Context Window with Adaptive Grouped Positional Encoding: A Training-Free Method
- Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models
- HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
- Can Multimodal Large Language Models Understand Spatial Relations?
- S3 - Semantic Signal Separation
- TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
- JuStRank: Benchmarking LLM Judges for System Ranking
- Generating Diverse Training Samples for Relation Extraction with Large Language Models
- MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts
- Efficient and Accurate Prompt Optimization: the Benefit of Memory in Exemplar-Guided Reflection
- Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
- EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents
- BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving
- LACA: Improving Cross-lingual Aspect-Based Sentiment Analysis with LLM Data Augmentation
- Fusing Highly Specialized Language Models for Comprehensive Expertise
- HybGRAG: Hybrid Retrieval-Augmented Generation on Textual and Relational Knowledge Bases
- Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms
- Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review
- MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection
- EvoWiki: Evaluating LLMs on Evolving Knowledge
- Rethinking Repetition Problems of LLMs in Code Generation
- PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
- ProcessBench: Identifying Process Errors in Mathematical Reasoning
- Model Extrapolation Expedites Alignment
- ATLANTIS: Weak-to-Strong Learning via Importance Sampling
- MPVStance: Mitigating Hallucinations in Stance Detection with Multi-Perspective Verification
- Personality-Guided Code Generation Using Large Language Models
- PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling
- BIPro: Zero-shot Chinese Poem Generation via Block Inverse Prompting Constrained Generation Framework
- LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating
- ObfusLM: Privacy-preserving Language Model Service against Embedding Inversion Attacks
- Interlocking-free Selective Rationalization Through Genetic-based Learning
- Re-identification of De-identified Documents with Autoregressive Infilling
- Modeling Uncertainty in Composed Image Retrieval via Probabilistic Embeddings
- Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models
- APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts
- Evaluating Lexical Proficiency in Neural Language Models
- Autoregressive Speech Synthesis without Vector Quantization
- Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM’s Nest
- FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Large Language Models
- Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality
- Capturing Author Self Beliefs in Social Media Language
- Neural Topic Modeling with Large Language Models in the Loop
- HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
- Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection
- “Yes, My LoRD.” Guiding Language Model Extraction with Locality Reinforced Distillation
- Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
- Wait, that’s not an option: LLMs Robustness with Incorrect Multiple-Choice Options
- The Hidden Attention of Mamba Models
- KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding
- LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models
- MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
- Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions
- Real-time Factuality Assessment from Adversarial Feedback
- Improve Vision Language Model Chain-of-thought Reasoning
- On the Mutual Influence of Gender and Occupation in LLM Representations
- Disentangling Memory and Reasoning Ability in Large Language Models
- Open-World Attribute Mining for E-Commerce Products with Multimodal Self-Correction Instruction Tuning
- Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attributions Explainability
- Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
- LangSAMP: Language-Script Aware Multilingual Pretraining
- RelationalCoder: Rethinking Complex Tables via Programmatic Relational Transformation
- Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study
- TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos
- Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs
- Binary Classifier Optimization for Large Language Model Alignment
- UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs’ Memorization
- From Information to Insight: Leveraging LLMs for Open Aspect-Based Educational Summarization
- AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset
- Root Defense Strategies: Ensuring Safety of LLM at the Decoding Level
- In-the-wild Audio Spatialization with Flexible Text-guided Localization
- L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models
- Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion
- What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
- ECERC: Evidence-Cause Attention Network for Multi-Modal Emotion Recognition in Conversation
- CompileAgent: Automated Real-World Repo-Level Compilation with Tool-Integrated LLM-based Agent System
- Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions
- Exploring Forgetting in Large Language Model Pre-Training
- Bias in the Mirror: Are LLMs opinions robust to their own adversarial attacks
- AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
- Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment
- Multimodal Transformers are Hierarchical Modal-wise Heterogeneous Graphs
- Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking
- LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch
- Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
- How Much Do Encoder Models Know About Word Senses?
- When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations
- HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter
- LegalAgentBench: Evaluating LLM Agents in Legal Domain
- Inference Compute-Optimal Video Vision Language Models
- Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models
- Digital Gatekeepers: Google’s Role in Curating Hashtags and Subreddits
- Behind Closed Words: Creating and Investigating the forePLay Annotated Dataset for Polish Erotic Discourse
- Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales
- Did Translation Models Get More Robust Without Anyone Even Noticing?
- Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset
- Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings
- Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models
- INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
- Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models
- D.Va: Validate Your Demonstration First Before You Use It
- Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
- MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation
- Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning
- Direct Prompt Optimization with Continuous Representations
- uMedSum: A Unified Framework for Clinical Abstractive Summarization
- GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
- Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents
- TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data
- AndroidGen: Building an Android Language Agent under Data Scarcity
- Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation
- A Survey of Post-Training Scaling in Large Language Models
- Position-aware Automatic Circuit Discovery
- HyperFM: Fact-Centric Multimodal Fusion for Link Prediction over Hyper-Relational Knowledge Graphs
- Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
- Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation
- Ensemble Watermarks for Large Language Models
- ConInstruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
- TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge
- DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation
- Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation
- ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
- Mixture of insighTful Experts (MoTE): The Synergy of Reasoning Chains and Expert Mixtures in Self-Alignment
- MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment
- Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework
- LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs
- Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training
- Cultural Learning-Based Culture Adaptation of Language Models
- A-TASC: Asian TED-Based Automatic Subtitling Corpus
- Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
- Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs
- No Questions are Stupid, but some are Poorly Posed: Understanding Poorly-Posed Information-Seeking Questions
- Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs
- Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models
- Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
- Taming LLMs with Gradient Grouping
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews
- Revisiting Common Assumptions about Arabic Dialects in NLP
- Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification
- Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas
- Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
- Detection of Human and Machine-Authored Fake News in Urdu
- An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals
- SR-LLM: Rethinking the Structured Representation in Large Language Model
- Taming Language Models for Text-attributed Graph Learning with Decoupled Aggregation
- Contrastive Prompting Enhances Sentence Embeddings in LLMs through Inference-Time Steering
- Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
- Hierarchical Document Refinement for Long-context Retrieval-augmented Generation
- Comparing Moral Values in Western English-speaking societies and LLMs with Word Associations
- TEACH: A Contrastive Knowledge Adaptive Distillation Framework for Classical Chinese Understanding
- RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented Generation
- Progressive Multimodal Reasoning via Active Retrieval
- Pre-training Distillation for Large Language Models: A Design Space Exploration
- Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions
- LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks
- Battling against Tough Resister: Strategy Planning with Adversarial Game for Non-collaborative Dialogues
- Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts
- FoldMoE: Efficient Long Sequence MoE Training via Attention-MoE Pipelining
- LongReward: Improving Long-context Large Language Models with AI Feedback
- Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles
- UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
- Towards Better Evaluation for Generated Patent Claims
- Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs
- Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis
- Do Large Language Models have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs
- Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning
- Conformity in Large Language Models
- Interpret and Improve In-Context Learning via the Lens of Input-Label Mappings
- Positional Overload: Positional Debiasing and Context Window Extension for Large Language Models using Set Encoding
- FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
- VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism
- Past Meets Present: Creating Historical Analogy with Large Language Models
- Meta-Reflection: A Feedback-Free Reflection Learning Framework
- Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books
- Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs
- Automating Legal Interpretation with LLMs: Retrieval, Generation, and Evaluation
- Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
- Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration
- TokAlign: Efficient Vocabulary Adaptation via Token Alignment
- AdaEdit: Advancing Continuous Knowledge Editing For Large Language Models
- The Impact of Token Granularity on the Predictive Power of Language Model Surprisal
- Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models
- BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering
- Dynamic and Generalizable Process Reward Modeling
- AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
- Towards Text-Image Interleaved Retrieval
- Large Margin Representation Learning for Robust Cross-lingual Named Entity Recognition
- An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
- QAEncoder: Towards Aligned Representation Learning in Question Answering Systems
- Game Development as Human-LLM Interaction
- Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases
- DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking
- SurveyPilot: an Agentic Framework for Automated Human Opinion Collection from Social Media
- Sharper and Faster mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video Understanding
- Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions
- How Humans and LLMs Organize Conceptual Knowledge: Exploring Subordinate Categories in Italian
- PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
- ProtoLens: Advancing Prototype Learning for Fine-Grained Interpretability in Text Classification
- Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization
- Sparse Latents Steer Retrieval-Augmented Generation
- Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
- SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
- AnRe: Analogical Replay for Temporal Knowledge Graph Forecasting
- Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
- Text is All You Need: LLM-enhanced Incremental Social Event Detection
- Multimodal Pragmatic Jailbreak on Text-to-image Models
- Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks
- Discourse Relation-Enhanced Neural Coherence Modeling
- Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
- from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive Framework
- MorphMark: Flexible Adaptive Watermarking for Large Language Models
- A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression
- On the Limit of Language Models as Planning Formalizers
- Learning to Generate Structured Output with Schema Reinforcement Learning
- Enhancing Unsupervised Sentence Embeddings via Knowledge-Driven Data Augmentation and Gaussian-Decayed Contrastive Learning
- Improve Safety Training of Large Language Models with Safety-Critical Singular Vectors Localization
- WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
- A Triple-View Framework for Fine-Grained Emotion Classification with Clustering-Guided Contrastive Learning
- Quantification of Large Language Model Distillation
- Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
- Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models
- Stepwise Reasoning Disruption Attack of LLMs
- Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge
- Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models
- Optimizing Decomposition for Optimal Claim Verification
- GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models
- Knowledge Boundary of Large Language Models: A Survey
- Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
- MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System
- Mitigating Selection Bias with Node Pruning and Auxiliary Options
- Dually Self-Improved Counterfactual Data Augmentation Using Large Language Model
- RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation
- Learning to Reason from Feedback at Test-Time
- L-CiteEval: A Suite for Evaluating Fidelity of Long-context Models
- SECRET: Semi-supervised Clinical Trial Document Similarity Search
- Geometric Signatures of Compositionality Across a Language Model’s Lifetime
- Pattern Recognition or Medical Knowledge? The Problem with Multiple-Choice Questions in Medicine
- People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text
- YuLan-Mini: Pushing the Limits of Open Data-efficient Language Model
- Your Model is Overconfident, and Other Lies We Tell Ourselves
- Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention
- Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
- What is Stigma Attributed to? A Theory-Grounded, Expert-Annotated Interview Corpus for Demystifying Mental-Health Stigma
- ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
- Enhancing Transformers for Generalizable First-Order Logical Entailment
- Self-Taught Agentic Long Context Understanding
- Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
- CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
- ConSim: Measuring Concept-Based Explanations’ Effectiveness with Automated Simulatability
- Decoding Reading Goals from Eye Movements
- Uncovering Visual-Semantic Psycholinguistic Properties from the Distributional Structure of Text Embedding Space
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
- P2Law: Scaling Law for Post-Training After Model Pruning
- Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats
- Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling
- Entailment-Preserving First-order Logic Representations in Natural Language Entailment
- Enhancing Multimodal Continual Instruction Tuning with BranchLoRA
- Enhancing Automated Interpretability with Output-Centric Feature Descriptions
- Towards Effective and Efficient Continual Pre-training of Large Language Models
- Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization
- mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
- What Makes a Good Natural Language Prompt?
- X-TURING: Towards an Enhanced and Efficient Turing Test for Long-Term Dialogue Agents
- Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral
- Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models
- NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning
- ReLearn: Unlearning via Learning for Large Language Models
- Understanding Cross-Domain Adaptation in Low-Resource Topic Modeling
- UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning
- HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation
- Uncertainty Propagation on LLM Agent
- Beyond Position: the emergence of wavelet-like properties in Transformers
- Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs
- Disentangling Biased Knowledge from Reasoning in Large Language Models via Machine Unlearning
- LLaMAs Have Feelings Too: Unveiling Sentiment and Emotion Representations in LLaMA Models Through Probing
- CxGGEC: Construction-Guided Grammatical Error Correction
- Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation
- HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs
- What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations
- NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering
- ProvBench: A Benchmark of Legal Provision Recommendation for Contract Auto-Reviewing
- F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
- AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation
- CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis
- Efficiently Identifying Watermarked Segments in Mixed-Source Texts
- Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
- Towards a More Generalized Approach in Open Relation Extraction
- Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
- Evaluating Language Models as Synthetic Data Generators
- Can Graph Descriptive Order Affect Solving Graph Problems with LLMs?
- Learning to Rewrite: Generalized LLM-Generated Text Detection
- Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search
- GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
- Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis
- Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
- AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
- Biased LLMs can Influence Political Decision-Making
- LexTempus: Enhancing Temporal Generalizability of Legal Language Models Through Dynamic Mixture of Experts
- That is Unacceptable: the Moral Foundations of Canceling
- FloorPlan-LLaMa: Aligning Architects’ Feedback and Domain Knowledge in Architectural Floor Plan Generation
- TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding
- FineReason: Evaluating and Improving LLMs’ Deliberate Reasoning through Reflective Puzzle Solving
- The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMs
- Identifying Reliable Evaluation Metrics for Scientific Text Revision
- Can Language Models Reason about Individualistic Human Values and Preferences?
- BERT-like Models for Slavic Morpheme Segmentation
- Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
- Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering
- Drift: Enhancing LLM Faithfulness in Rationale Generation via Dual-Reward Probabilistic Inference
- Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs
- MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
- Dynamic Scaling of Unit Tests for Code Reward Modeling
- UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations
- Tracking Life’s Ups and Downs: Mining Life Events from Social Media Posts for Mental Health Analysis
- ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
- PIC: Unlocking Long-Form Text Generation Capabilities of Large Language Models via Position ID Compression
- Towards Effective Extraction and Evaluation of Factual Claims
- Beyond Facts: Evaluating Intent Hallucination in Large Language Models
- A Systematic Study of Compositional Syntactic Transformer Language Models
- M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
- SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
- Personalized Text Generation with Contrastive Activation Steering
- Gumbel Reranking: Differentiable End-to-End Reranker Optimization
- Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
- SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection
- The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project
- DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
- G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems
- Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models
- LegalReasoner: Step-wised Verification-Correction for Legal Judgment Reasoning
- Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
- The Cross-linguistic Role of Animacy in Grammar Structures
- LexGen: Domain-aware Multilingual Lexicon Generation
- How to Train Long-Context Language Models (Effectively)
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion
- Mining Complex Patterns of Argumentative Reasoning in Natural Language Dialogue
- OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
- Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning
- LLM as a Broken Telephone: Iterative Generation Distorts Information
- VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
- Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation
- Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
- Large Language Models Struggle to Describe the Haystack without Human Help: A Social Science-Inspired Evaluation of Topic Models
- ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
- Enough Coin Flips Can Make LLMs Act Bayesian
- GAMEBoT: Transparent Assessment of LLM Reasoning in Games
- A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens
- Commonsense Reasoning in Arab Culture
- AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
- Translation and Fusion Improves Cross-lingual Information Extraction
- Conditional Dichotomy Quantification via Geometric Embedding
- Aligning Large Language Models with Implicit Preferences from User-Generated Content
- VQAGuider: Guiding Multimodal Large Language Models to Answer Complex Video Questions
- Large Language Models are Good Relational Learners
- SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data
- Distilling an End-to-End Voice Assistant Without Instruction Training Data
- CoMet: Metaphor-Driven Covert Communication for Multi-Agent Language Games
- CER: Confidence Enhanced Reasoning in LLMs
- Watermarking Large Language Models: An Unbiased and Low-risk Method
- On Synthetic Data Strategies for Domain-Specific Generative Retrieval
- LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates
- CONFETTI: Conversational Function-Calling Evaluation Through Turn-Level Interactions
- Evaluating Theory of (an uncertain) Mind: Predicting the Uncertain Beliefs of Others from Conversational Cues
- Uncertainty in Causality: A New Frontier
- SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs
- When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models
- AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
- Improving Model Factuality with Fine-grained Critique-based Evaluator
- Building a Long Text Privacy Policy Corpus with Multi-Class Labels
- R2-MultiOmnia: Leading Multilingual Multimodal Reasoning via Self-Training
- When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models
- Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models
- VLSBench: Unveiling Visual Leakage in Multimodal Safety
- Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
- Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation
- Conspiracy Theories and Where to Find Them on TikTok
- Growing Through Experience: Scaling Episodic Grounding in Language Models
- Exploiting the Shadows: Unveiling Privacy Leaks through Lower-Ranked Tokens in Large Language Models
- Attacking Vision-Language Computer Agents via Pop-ups
- Explicit and Implicit Data Augmentation for Social Event Detection
- In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents
- Revisiting Classical Chinese Event Extraction with Ancient Literature Information
- Unanswerability Evaluation for Retrieval Augmented Generation
- SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention
- Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
- A Survey on Patent Analysis: From NLP to Multimodal AI
- SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
- MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
- Sinhala Encoder-only Language Models and Evaluation
- LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing
- SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
- Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges
- LocAgent: Graph-Guided LLM Agents for Code Localization
- COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation
- Mind the Gap: Static and Interactive Evaluations of Large Audio Models
- Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu
- CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs
- TripleFact: Defending Data Contamination in the Evaluation of LLM-driven Fake News Detection
- Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility
- Large Language and Reasoning Models are Shallow Disjunctive Reasoners
- Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation
- Building Better: Avoiding Pitfalls in Developing Language Resources when Data is Scarce
- BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
- SkillVerse: Assessing and Enhancing LLMs with Tree Evaluation
- CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era
- Empathy Prediction from Diverse Perspectives
- Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice
- INTERACT: Enabling Interactive, Question-Driven Learning in Large Language Models
- Circuit Stability Characterizes Language Model Generalization
- Comparing LLM-generated and human-authored news text using formal syntactic theory
- Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
- White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs
- AIMSCheck: Leveraging LLMs for AI-Assisted Review of Modern Slavery Statements Across Jurisdictions
- Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence
- SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence
- The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects
- Mitigating Shortcut Learning with InterpoLated Learning
- Toward Automatic Discovery of a Canine Phonetic Alphabet
- DavIR: Data Selection via Implicit Reward for Large Language Models
- Byte Latent Transformer: Patches Scale Better Than Tokens
- DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising
- Identifying Cellular Niches in Spatial Transcriptomics: An Investigation into the Capabilities of Large Language Models
- Culture Matters in Toxic Language Detection in Persian
- Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
- Instance-Selection-Inspired Undersampling Strategies for Bias Reduction in Small and Large Language Models for Binary Text Classification
- Forward Knows Efficient Backward Path: Saliency-Guided Memory-Efficient Fine-tuning of Large Language Models
- Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning
- LLMs + Persona-Plug = Personalized LLMs
- Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition
- IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in the Absence of Tabular Data
- INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
- Boosting Long-Context Information Seeking via Query-Guided Activation Refilling
- Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration
- AdaDHP: Fine-Grained Fine-Tuning via Dual Hadamard Product and Adaptive Parameter Selection
- KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph
- Curriculum Debiasing: Toward Robust Parameter-Efficient Fine-Tuning Against Dataset Biases
- Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
- On the Reliability of Large Language Models for Causal Discovery
- Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
- TeRDy: Temporal Relation Dynamics through Frequency Decomposition for Temporal Knowledge Graph Completion
- Incorporating Domain Knowledge into Materials Tokenization
- PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
- Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
- Semantic-Eval: A Semantic Comprehension Evaluation Framework for Large Language Models Generation without Training
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
- When to Speak, When to Abstain: Contrastive Decoding with Abstention
- On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs
- Investigating and Extending Homans’ Social Exchange Theory with Large Language Model based Agents
- A Drop-In Solution for On-the-Fly Adaptation of Speculative Decoding in Large Language Models
- If Attention Serves as a Cognitive Model of Human Memory Retrieval, What is the Plausible Memory Representation?
- Aligning VLM Assistants with Personalized Situated Cognition
- Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
- Faster Speculative Decoding via Effective Draft Decoder with Pruned Candidate Tree
- Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models
- Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents
- CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling
- The Lawyer That Never Thinks: Consistency and Fairness as Keys to Reliable AI
- Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean
- SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
- ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
- InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes Under Herd Behavior
- Enhancing Neural Machine Translation Through Target Language Data: A kNN-LM Approach for Domain Adaptation
- Multi-level Relevance Document Identifier Learning for Generative Retrieval
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
- Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder
- NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization
- HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
- Uni-Retrieval: A Multi-Style Retrieval Framework for STEM’s Education
- DenseLoRA: Dense Low-Rank Adaptation of Large Language Models
- Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis
- Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models
- Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach
- On Support Samples of Next Word Prediction
- WebWalker: Benchmarking LLMs in Web Traversal
- From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models
- AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
- Introducing Graph Context into Language Models through Parameter-Efficient Fine-Tuning for Lexical Relation Mining
- S-RAG: A Novel Audit Framework for Detecting Unauthorized Use of Personal Data in RAG Systems
- Praetor: A Fine-Grained Generative LLM Evaluator with Instance-Level Customizable Evaluation Criteria
- Mitigating Confounding in Speech-Based Dementia Detection through Weight Masking
- MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies
- The Knowledge Microscope: Features as Better Analytical Lenses than Neurons
- From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding
- PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance
- Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View
- ExpeTrans: LLMs Are Experiential Transfer Learners
- Cool-Fusion: Fuse Large Language Models without Training
- DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
- MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training
- LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
- APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
- PPT: A Minor Language News Recommendation Model via Cross-Lingual Preference Pattern Transfer
- GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis
- Top-n𝜎: Eliminating Noise in Logit Space for Robust Token Sampling of LLM
- SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation
- Mitigating Non-Representative Prototypes and Representation Bias in Few-Shot Continual Relation Extraction
- MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts
- PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration
- Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models
- GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
- Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
- DTCRS: Dynamic Tree Construction for Recursive Summarization
- A Generative Adaptive Replay Continual Learning Model for Temporal Knowledge Graph Reasoning
- ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search
- PKAG-DDI: Pairwise Knowledge-Augmented Language Model for Drug-Drug Interaction Event Text Generation
- Knowledge-Augmented Multimodal Clinical Rationale Generation for Disease Diagnosis with Small Language Models
- TWIST: Text-encoder Weight-editing for Inserting Secret Trojans in Text-to-Image Models
- Frictional Agent Alignment Framework: Slow Down and Don’t Break Things
- Powerformer: Efficient and High-Accuracy Privacy-Preserving Language Model with Homomorphic Encryption
- Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs
- Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision?
- Towards Enhanced Immersion and Agency for LLM-based Interactive Drama
- Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures
- Improving Factuality with Explicit Working Memory
- Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
- Dynamic Parallel Tree Search for Efficient LLM Reasoning
- Pre3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation
- SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL
- GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models
- Large Language and Protein Assistant for Protein-Protein Interactions Prediction
- An Empirical Study of Many-to-Many Summarization with Large Language Models
- Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models
- GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents
- TC–RAG: Turing–Complete RAG’s Case study on Medical LLM Systems
- SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning
- MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models
- Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG
- PwnGPT: Automatic Exploit Generation Based on Large Language Models
- VMLU Benchmarks: A comprehensive benchmark toolkit for Vietnamese LLMs
- Scaling up the State Size of RNN LLMs for Long-Context Scenarios
- Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes
- A Strategic Coordination Framework of Small LMs Matches Large LMs in Data Synthesis
- Defining and Evaluating Visual Language Models’ Basic Spatial Abilities: A Perspective from Psychometrics
- SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation
- User-side Model Consistency Monitoring for Open Source Large Language Models Inference Services
- Jailbreaking? One Step Is Enough!
- Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning
- PaSa: An LLM Agent for Comprehensive Academic Paper Search
- Less Mature is More Adaptable for Sentence-level Language Modeling
- EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts
- UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter Efficient Fine-Tuning of Large Models
- Agri-CM3: A Chinese Massive Multi-modal, Multi-level Benchmark for Agricultural Understanding and Reasoning
- TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification
- CaLMQA: Exploring culturally specific long-form question answering across 23 languages
- Croppable Knowledge Graph Embedding
- HyKGE: A Hypothesis Knowledge Graph Enhanced RAG Framework for Accurate and Reliable Medical LLMs Responses
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
- BeamLoRA: Beam-Constraint Low-Rank Adaptation
- GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art
- UniLR: Unleashing the Power of LLMs on Multiple Legal Tasks with a Unified Legal Retriever
- Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models
- Beyond Dialogue: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model
- ACECODER: Acing Coder RL via Automated Test-Case Synthesis
- Quantifying Semantic Emergence in Language Models
- DebateCoder: Towards Collective Intelligence of LLMs via Test Case Driven LLM Debate for Code Generation
- The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models
- GraphInsight: Unlocking Insights in Large Language Models for Graph Structure Understanding
- Phonotomizer: A Compact, Unsupervised, Online Training Approach to Real-Time, Multilingual Phonetic Segmentation
- A Multi-persona Framework for Argument Quality Assessment
- Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification
- SAM Decoding: Speculative Decoding via Suffix Automaton
- PsyAdvisor: A Plug-and-Play Strategy Advice Planner with Proactive Questioning in Psychological Conversations
- HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices
- Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
- GiFT: Gibbs Fine-Tuning for Code Generation
- Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models
- Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction
- RSCF: Relation-Semantics Consistent Filter for Entity Embedding of Knowledge Graph
- RolePlot: A Systematic Framework for Evaluating and Enhancing the Plot-Progression Capabilities of Role-Playing Agents
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
- Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model
- Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation
- SDPO: Segment-Level Direct Preference Optimization for Social Agents
- KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors
- SURVEYFORGE: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
- Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning
- AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
- Redundancy Principles for MLLMs Benchmarks
- WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models
- ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5
- Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
- Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization
- SINCon: Mitigate LLM-Generated Malicious Message Injection Attack for Rumor Detection
- Agentic Knowledgeable Self-awareness
- A Unified Agentic Framework for Evaluating Conditional Image Generation
- Planning-Driven Programming: A Large Language Model Programming Workflow
- Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering
- Nudging: Inference-time Alignment of LLMs via Guided Decoding
- Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing
- SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models
- HFT: Half Fine-Tuning for Large Language Models
- Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis
- From Objectives to Questions: A Planning-based Framework for Educational Mathematical Question Generation
- RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts
- Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
- Accurate KV Cache Quantization with Outlier Tokens Tracing
- Can Large Language Models Understand Internet Buzzwords Through User-Generated Content
- EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
- Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models through Bidirectional Hidden State Intervention
- Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models
- Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback
- Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging
- MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation
- Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging
- CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention
- Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching
- Qwen2.5-xCoder: Multi-Agent Collaboration for Multilingual Code Instruction Tuning
- Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts
- Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
- Extending Complex Logical Queries on Uncertain Knowledge Graphs
- Knowledge Decoupling via Orthogonal Projection for Lifelong Editing of Large Language Models
- 𝜙-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
- Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?
- Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
- Inducing lexicons of in-group language with socio-temporal context
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
- MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
- Efficient OpAmp Adaptation for Zoom Attention to Golden Contexts
- Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger
- MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark
- Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
- Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch
- DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing
- PQR: Improving Dense Retrieval via Potential Query Modeling
- Cross-Lingual Generalization and Compression: From Language-Specific to Shared Neurons
- SDBench: A Survey-based Domain-specific LLM Benchmarking and Optimization Framework
- ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents
- Lexical Recall or Logical Reasoning: Probing the Limits of Reasoning Abilities in Large Language Models
- ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains
- HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
- Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models
- Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
- TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition
- CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models
- STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
- Mimicking the Familiar: Dynamic Command Generation for Information Theft Attacks in LLM Tool-Learning System
- FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio Generation
- How does Misinformation Affect Large Language Model Behaviors and Preferences?
- YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
- GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding
- MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis
- A Training-free LLM-based Approach to General Chinese Character Error Correction
- HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
- SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
- Recent Advances in Speech Language Models: A Survey
- LexCLiPR: Cross-Lingual Paragraph Retrieval from Legal Judgments
- Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries
- SPECTRA: Faster Large Language Model Inference with Optimized Internal and External Speculation
- Multi-level Association Refinement Network for Dialogue Aspect-based Sentiment Quadruple Analysis
- Innovative Image Fraud Detection with Cross-Sample Anomaly Analysis: The Power of LLMs
- Cooperative or Competitive? Understanding the Interaction between Attention Heads From A Game Theory Perspective
- MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification
- Graph-Structured Trajectory Extraction from Travelogues
- Learning First-Order Logic Rules for Argumentation Mining
- Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency
- UniRAG: Unified Query Understanding Method for Retrieval Augmented Generation
- Contextual Experience Replay for Self-Improvement of Language Agents
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
- Towards Comprehensive Argument Analysis in Education: Dataset, Tasks, and Method
- Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking
- MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation
- Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
- Can MLLMs Understand the Deep Implication Behind Chinese Images?
- KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan
- Towards Multi-dimensional Evaluation of LLM Summarization across Domains and Languages
- ClusterAttn: KV Cache Compression under Intrinsic Attention Clustering
- SHARE: Shared Memory-Aware Open-Domain Long-Term Dialogue Dataset Constructed from Movie Script
- Incongruity-aware Tension Field Network for Multi-modal Sarcasm Detection
- Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh
- Stealing Training Data from Large Language Models in Decentralized Training through Activation Inversion Attack
- From Selection to Generation: A Survey of LLM-based Active Learning
- OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
- DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning
- EAGLE: Expert-Guided Self-Enhancement for Preference Alignment in Pathology Large Vision-Language Model
- CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations
- Flexora: Flexible Low-Rank Adaptation for Large Language Models
- QDTSynth: Quality-Driven Formal Theorem Synthesis for Enhancing Proving Performance of LLMs
- RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
- QAEval: Mixture of Evaluators for Question-Answering Task Evaluation
- Debiasing the Fine-Grained Classification Task in LLMs with Bias-Aware PEFT
- Demystifying Small Language Models for Edge Deployment
- Adapt Once, Thrive with Updates: Transferable Parameter-Efficient Fine-Tuning on Evolving Base Models
- Can Vision-Language Models Evaluate Handwritten Math?
- Continual Gradient Low-Rank Projection Fine-Tuning for LLMs
- Towards Objective Fine-tuning: How LLMs’ Prior Knowledge Causes Potential Poor Calibration?
- Towards Robust ESG Analysis Against Greenwashing Risks: Aspect-Action Analysis with Cross-Category Generalization
- HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States
- SwiLTra-Bench: The Swiss Legal Translation Benchmark
- Two Intermediate Translations Are Better Than One: Fine-tuning LLMs for Document-level Translation Refinement
- Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
- Can LLMs Ground when they (Don’t) Know: A Study on Direct and Loaded Political Questions
- GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking
- SCULPT: Systematic Tuning of Long Prompts
- Crab: A Novel Configurable Role-Playing LLM with Assessing Benchmark
- Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
- TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis
- Cross-Lingual Optimization for Language Transfer in Large Language Models
- CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling
- MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
- Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch
- Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region
- LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering
- Efficient Long Context Language Model Retrieval with Compression
- Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering
- Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications
- Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals
- A New Formulation of Zipf’s Meaning-Frequency Law through Contextual Diversity
- The Mirage of Model Editing: Revisiting Evaluation in the Wild
- LAQuer: Localized Attribution Queries in Content-grounded Generation
- EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
- DCG-SQL: Enhancing In-Context Learning for Text-to-SQL with Deep Contextual Schema Link Graph
- PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy
- Digest the Knowledge: Large Language Models empowered Message Passing for Knowledge Graph Question Answering
- RecLM: Recommendation Instruction Tuning
- DS2-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment Analysis
- MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization
- Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning
- MolRAG: Unlocking the Power of Large Language Models for Molecular Property Prediction
- SkillAggregation: Reference-free LLM-Dependent Aggregation
- MasRouter: Learning to Route LLMs for Multi-Agent Systems
- Beyond Single Labels: Improving Conversational Recommendation through LLM-Powered Data Augmentation
- Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Evaluation
- iQUEST: An Iterative Question-Guided Framework for Knowledge Base Question Answering
- IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory
- MLAS-LoRA: Language-Aware Parameters Detection and LoRA-Based Knowledge Transfer for Multilingual Machine Translation
- M2RC-EVAL: Massively Multilingual Repository-level Code Completion Evaluation
- Evaluating Design Decisions for Dual Encoder-based Entity Disambiguation
- How to Compare Things Properly? A Study of Argument Relevance in Comparative Question Answering
- FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging
- Controllable Style Arithmetic with Language Models
- Masks Can be Learned as an Alternative to Experts
- Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment
- Removal of Hallucination on Hallucination: Debate-Augmented RAG
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code
- ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering
- BOOKWORLD: From Novels to Interactive Agent Societies for Story Creation
- Quantifying Lexical Semantic Shift via Unbalanced Optimal Transport
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
- Adaptive and Robust Translation from Natural Language to Multi-model Query Languages
- SAKE: Steering Activations for Knowledge Editing
- Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs
- Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?
- One for All: Update Parameterized Knowledge Across Multiple Models with Once Edit
- VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service
- The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
- CrisisTS: Coupling Social Media Textual Data and Meteorological Time Series for Urgency Classification
- How to Mitigate Overfitting in Weak-to-strong Generalization?
- Com2: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models
- Dynamic Head Selection for Neural Lexicalized Constituency Parsing
- My Words Imply Your Opinion: Reader Agent-Based Propagation Enhancement for Personalized Implicit Emotion Analysis
- EvolveBench: A Comprehensive Benchmark for Assessing Temporal Awareness in LLMs on Evolving Knowledge
- Enabling LLM Knowledge Analysis via Extensive Materialization
- Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
- Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs
- CritiQ: Mining Data Quality Criteria from Human Preferences
- Theoretical Guarantees for Minimum Bayes Risk Decoding
- Mutual-Taught for Co-adapting Policy and Reward Models
- Enhancing Cross-Lingual Transfer through Reversible Transliteration: A Huffman-Based Approach for Low-Resource Languages
- Unmasking Style Sensitivity: A Causal Analysis of Bias Evaluation Instability in Large Language Models
- MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines
- BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning
- What Matters in Evaluating Book-Length Stories? A Systematic Study of Long Story Evaluation
- PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation
- Enhancing Event-centric News Cluster Summarization via Data Sharpening and Localization Insights
- MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration
- LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios
- Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering
- M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs
- RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning Based on Emotional Information
- Task-Specific Information Decomposition for End-to-End Dense Video Captioning
- CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges
- Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection
- Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
- PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
- Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
- Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training
- Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech?
- Neuron-Level Sequential Editing for Large Language Models
- Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts
- SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
- VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
- RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
- The Role of Deductive and Inductive Reasoning in Large Language Models
- Disentangling the Roles of Representation and Selection in Data Pruning
- FRACTAL: Fine-Grained Scoring from Aggregate Text Labels
- ACT: Knowledgeable Agents to Design and Perform Complex Tasks
- Logical forms complement probability in understanding language model (and human) performance
- Length Controlled Generation for Black-box LLMs
- Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization
- Global Eye: Breaking the “Fixed Thinking Pattern” during the Instruction Expansion Process
- On Synthesizing Data for Context Attribution in Question Answering
- TST: A Schema-Based Top-Down and Dynamic-Aware Agent of Text-to-Table Tasks
- EventRAG: Enhancing LLM Generation with Event Knowledge Graphs
- Analyzing the Rapid Generalization of SFT via the Perspective of Attention Head Activation Patterns
- Can’t See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
- Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling
- TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning
- DualGuard: A Parameter Space Transformation Approach for Bidirectional Defense in Split-Based LLM Fine-Tuning
- Movie101v2: Improved Movie Narration Benchmark
- Can LLMs Evaluate Complex Attribution in QA? Automatic Benchmarking using Knowledge Graphs
- Value Portrait: Assessing Language Models’ Values through Psychometrically and Ecologically Valid Items
- FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
- Do not Abstain! Identify and Solve the Uncertainty
- Decoding by Contrasting Knowledge: Enhancing Large Language Model Confidence on Edited Facts
- ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos
- Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions
- Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual Segments
- Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub
- LLMs Can Simulate Standardized Patients via Agent Coevolution
- Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts
- Which Demographics do LLMs Default to During Annotation?
- Can You Really Trust Code Copilot? Evaluating Large Language Models from a Code Security Perspective
- From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MarkerGen
- AGD: Adversarial Game Defense Against Jailbreak Attacks in Large Language Models
- SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View
- Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning
- An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)
- Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
- Hierarchical Attention Generates Better Proofs
- Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents
- It’s Not Bragging If You Can Back It Up: Can LLMs Understand Braggings?
- A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns
- Meta-Learning Neural Mechanisms rather than Bayesian Priors
- Shifting from Ranking to Set Selection for Retrieval Augmented Generation
- Understanding Large Language Model Vulnerabilities to Social Bias Attacks
- ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents
- Pixel-Level Reasoning Segmentation via Multi-turn Conversations
- Fixing Distribution Shifts of LLM Self-Critique via On-Policy Self-Play Training
- Inferring Functionality of Attention Heads from their Parameters
- Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations
- Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
- Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation
- From Human Reading to NLM Understanding: Evaluating the Role of Eye-Tracking Data in Encoder-Based Models
- Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering
- Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
- SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
- ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models
- Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study
- From Isolates to Families: Using Neural Networks for Automated Language Affiliation
- ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models
- Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
- When Harry Meets Superman: The Role of The Interlocutor in Persona-Based Dialogue Generation
- ICRProbe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs
- Revisit Self-Debugging with Self-Generated Tests for Code Generation
- InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
- Exploring LLMs’ Ability to Spontaneously and Conditionally Modify Moral Expressions through Text Manipulation
- Mixture of Ordered Scoring Experts for Cross-prompt Essay Trait Scoring
- Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
- Enhancing Spoken Discourse Modeling in Language Models Using Gestural Cues
- ExploraCoder: Advancing Code Generation for Multiple Unseen APIs via Planning and Chained Exploration
- Segment First or Comprehend First? Explore the Limit of Unsupervised Word Segmentation with Large Language Models
- RUBY: An Effective Framework for Multi-Constraint Multi-Hop Question Generation
- Can Indirect Prompt Injection Attacks Be Detected and Removed?
- Identifying Open Challenges in Language Identification
- The Distracting Effect: Understanding Irrelevant Passages in RAG
- Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages
- Graphically Speaking: Unmasking Abuse in Social Media with Conversation Insights
- CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision
- RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models
- Defense Against Prompt Injection Attack by Leveraging Attack Techniques
- Acquisition and Application of Novel Knowledge in Large Language Models
- DNCASR: End-to-End Training for Speaker-Attributed ASR
- Exploring Persona Sentiment Sensitivity in Personalized Dialogue Generation
- AntiLeakBench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
- LLM-Guided Semantic-Aware Clustering for Topic Modeling
- Hierarchical Bracketing Encodings for Dependency Parsing as Tagging
- OASIS: Order-Augmented Strategy for Improved Code Search
- Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
- OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
- Tree-KG: An Expandable Knowledge Graph Construction Framework for Knowledge-intensive Domains
- Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
- Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning
- Minimal Pair-Based Evaluation of Code-Switching
- DNASpeech: A Contextualized and Situated Text-to-Speech Dataset with Dialogues, Narratives and Actions
- LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
- Error Comparison Optimization for Large Language Models on Aspect-Based Sentiment Analysis
- The AI Gap: How Socioeconomic Status Affects Language Technology Interactions
- Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set
- Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
- Soundwave: Less is More for Speech-Text Alignment in LLMs
- RoToR: Towards More Reliable Responses for Order-Invariant Inputs
- Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
- Improving Dialogue Discourse Parsing through Discourse-aware Utterance Clarification
- ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs
- Words of Warmth: Trust and Sociability Norms for over 26k English Words
- BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models
- HAF-RM: A Hybrid Alignment Framework for Reward Model Training
- CULEMO: Cultural Lenses on Emotion - Benchmarking LLMs for Cross-Cultural Emotion Understanding
- DiffPO: Diffusion-styled Preference Optimization for Inference Time Alignment of Large Language Models
- MemeQA: Holistic Evaluation for Meme Understanding
- LoGU: Long-form Generation with Uncertainty Expressions
- KiRAG: Knowledge-Driven Iterative Retriever for Enhancing Retrieval-Augmented Generation
- Enhancing Lexicon-Based Text Embeddings with Large Language Models
- CoCoLex: Confidence-guided Copy-based Decoding for Grounded Legal Text Generation
- Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization
- CC-Tuning: A Cross-Lingual Connection Mechanism for Improving Joint Multilingual Supervised Fine-Tuning
- SConU: Selective Conformal Uncertainty in Large Language Models
- MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
- When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs
- UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
- KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language Models
- Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress
- Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models
- Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
- PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings
- Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
- Tunable LLM-based Proactive Recommendation Agent
- AgentRM: Enhancing Agent Generalization with Reward Modeling
- From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment
- Segment-Based Attention Masking for GPTs
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
- Bi-Tuning with Collaborative Information for Controllable LLM-based Sequential Recommendation
- A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
- DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts
- DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression
- Computation Mechanism Behind LLM Position Generalization
- IPO: Your Language Model is Secretly a Preference Classifier
- Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
- Déjà Vu? Decoding Repeated Reading from Eye Movements
- LLMs can be easily Confused by Instructional Distractions
- PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
- IAM: Efficient Inference through Attention Mapping between Different-scale LLMs
- nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow
- ZIPA: A family of efficient models for multilingual phone recognition
- GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration
- Dynamic Evaluation with Cognitive Reasoning for Multi-turn Safety of Large Language Models
- From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions
- Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes
- Adversarial Alignment with Anchor Dragging Drift (A3D2): Multimodal Domain Adaptation with Partially Shifted Modalities
- A Reality Check on Context Utilisation for Retrieval-Augmented Generation
- CU-MAM: Coherence-Driven Unified Macro-Structures for Argument Mining
- Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
- Text-to-ES Bench: A Comprehensive Benchmark for Converting Natural Language to Elasticsearch Query
- AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
- DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal
- Steering off Course: Reliability Challenges in Steering Language Models
- Impartial Multi-task Representation Learning via Variance-invariant Probabilistic Decoding
- If Eleanor Rigby Had Met ChatGPT: A Study on Loneliness in a Post-LLM World
- Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation
- Vulnerability of LLMs to Vertically Aligned Text Manipulations
- AutoMixer: Checkpoint Artifacts as Automatic Data Mixers
- Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow
- Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark
- We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
- Modeling the Evolution of English Noun Compounds with Feature-Rich Diachronic Compositionality Prediction
- What’s the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
- V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
- Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension
- Improving Language and Modality Transfer in Translation by Character-level Modeling
- DialUp! Modeling the Language Continuum by Adapting Models to Dialects and Dialects to Models
- AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs
- Modeling Complex Semantics Relation with Contrastively Fine-Tuned Relational Encoders
- Error-driven Data-efficient Large Multimodal Model Tuning
- Planning with Diffusion Models for Target-Oriented Dialogue Systems
- Interactive and Expressive Code-Augmented Planning with Large Language Models
- Synergistic Weak-Strong Collaboration by Aligning Preferences
- Understanding Silent Data Corruption in LLM Training
- Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
- Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
- BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
- Deep Temporal Reasoning in Video Language Models: A Cross-Linguistic Evaluation of Action Duration and Completion through Perfect Times
- Amplifying Trans and Nonbinary Voices: A Community-Centred Harm Taxonomy for LLMs
- Enhancing Human Evaluation in Machine Translation with Comparative Judgement
- Infogen: Generating Complex Statistical Infographics from Documents
- Partial Colexifications Improve Concept Embeddings
- Improved Unbiased Watermark for Large Language Models
- MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection
- Multi-Attribute Steering of Language Models via Targeted Intervention
- AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations
- Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
- On the Acquisition of Shared Grammatical Representations in Bilingual Language Models
- Using Shapley interactions to understand how models use structure
- Adversarial Tokenization
- Classifying Unreliable Narrators with Large Language Models
- ConceptCarve: Dynamic Realization of Evidence
- QQSUM: A Novel Task and Model of Quantitative Query-Focused Summarization for Review-based Product Question Answering
- Navigating Rifts in Human-LLM Grounding: Study and Benchmark
- Substance over Style: Evaluating Proactive Conversational Coaching Agents
- Open-World Planning via Lifted Regression with LLM-Inferred Affordances for Embodied Agents
- (RSA)²: A Rhetorical-Strategy-Aware Rational Speech Act Framework for Figurative Language Understanding
- SYNTHIA: Novel Concept Design with Affordance Composition
- Consistent Client Simulation for Motivational Interviewing-based Counseling
- AUTALIC: A Dataset for Anti-AUTistic Ableist Language In Context
- Structural Reasoning Improves Molecular Understanding of LLM
- CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration
- Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles
- Targeted Syntactic Evaluation for Grammatical Error Correction
- VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
- Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions
- TESS2: A Large-Scale Generalist Diffusion Language Model
- KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis
- Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL
- On Generalization across Measurement Systems: LLMs Entail More Test-Time Compute for Underrepresented Cultures
- CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
- Veracity Bias and Beyond: Uncovering LLMs’ Hidden Beliefs in Problem-Solving Reasoning
- Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
- LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study
- Beyond Frameworks: Unpacking Collaboration Strategies in Multi-Agent Systems
- The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation
- K/DA: Automated Data Generation Pipeline for Detoxifying Implicitly Offensive Language in Korean
- THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation
- Neuron Empirical Gradient: Discovering and Quantifying Neurons’ Global Linear Controllability
- Can Third Parties Read Our Emotions?
- OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
- World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
- JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
- CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models
- Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models
- Enhancing Mathematical Reasoning in LLMs by Stepwise Correction
- PsyDial: A Large-scale Long-term Conversational Dataset for Mental Health Support
- Enhancing Goal-oriented Proactive Dialogue Systems via Consistency Reflection and Correction
- Exclusion of Thought: Mitigating Cognitive Load in Large Language Models for Enhanced Reasoning in Multiple-Choice Tasks
- Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation
- VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
- Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models
- LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
- Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback
- PerSphere: A Comprehensive Framework for Multi-Faceted Perspective Retrieval and Summarization
- Prompt-Guided Internal States for Hallucination Detection of Large Language Models
- Typology-Guided Adaptation in Multilingual Models
- Don’t Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections
- ECLM: Entity Level Language Model for Spoken Language Understanding with Chain of Intent
- FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation
- Knowledge Image Matters: Improving Knowledge-Based Visual Reasoning with Multi-Image Large Language Models
- Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity
- GUICourse: From General Vision Language Model to Versatile GUI Agent
- Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
- Maximizing the Effectiveness of Larger BERT Models for Compression
- Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
- HACo-Det: A Study Towards Fine-Grained Machine-Generated Text Detection under Human-AI Coauthoring
- IndicSynth: A Large-Scale Multilingual Synthetic Speech Dataset for Low-Resource Indian Languages
- Reinforced IR: A Self-Boosting Framework For Domain-Adapted Information Retrieval
- CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
- Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment
- JoPA: Explaining Large Language Model’s Generation via Joint Prompt Attribution
- Proxy-Driven Robust Multimodal Sentiment Analysis with Incomplete Data
- Not All Terms Matter: Recall-Oriented Adaptive Learning for PLM-aided Query Expansion in Open-Domain Question Answering
- A Mutual Information Perspective on Knowledge Graph Embedding
- Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race
- IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
- ProMALex: Progressive Modular Adapters for Multi-Jurisdictional Legal Language Modeling
- Flipping Knowledge Distillation: Leveraging Small Models’ Expertise to Enhance LLMs in Text Matching
- Disentangling Language and Culture for Evaluating Multilingual Large Language Models
- Detecting Sockpuppetry on Wikipedia Using Meta-Learning
- Diversity-oriented Data Augmentation with Large Language Models
- CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation
- RiOT: Efficient Prompt Refinement with Residual Optimization Tree
- Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
- Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
- Mitigating Lost-in-Retrieval Problems in Retrieval Augmented Multi-Hop Question Answering
- TableLoRA: Low-rank Adaptation on Table Structure Understanding for Large Language Models
- Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
- CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis
- Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis
- ChartLens: Fine-grained Visual Attribution in Charts
- LESA: Learnable LLM Layer Scaling-Up
- MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
- Towards the Law of Capacity Gap in Distilling Language Models
- WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
- Keys to Robust Edits: From Theoretical Insights to Practical Advances
- Boosting LLM’s Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning
- MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation
- The Role of Visual Modality in Multimodal Mathematical Reasoning: Challenges and Insights
- The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters
- S2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
- Advancing Collaborative Debates with Role Differentiation through Multi-Agent Reinforcement Learning
- Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation
- STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond
- XDAC: XAI-Driven Detection and Attribution of LLM-Generated News Comments in Korean
- CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference
- Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch
- EdiText: Controllable Coarse-to-Fine Text Editing with Diffusion Language Models
- TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
- Look Both Ways and No Sink: Converting LLMs into Text Encoders without Training
- A Statistical and Multi-Perspective Revisiting of the Membership Inference Attack in Large Language Models
- Around the World in 24 Hours: Probing LLM Knowledge of Time and Place
- Mining the uncertainty patterns of humans and models in the annotation of moral foundations and human values
- “What do you call a dog that is incontrovertibly true? Dogma”: Testing LLM Generalization through Humor
- Towards Harmonized Uncertainty Estimation for Large Language Models
- VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare
- Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media
- From English to Second Language Mastery: Enhancing LLMs with Cross-Lingual Continued Instruction Tuning
- WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermarks
- HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
- One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments
- Beyond Logits: Aligning Feature Dynamics for Effective Knowledge Distillation
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
- DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics
- MT-RAIG: Novel Benchmark and Evaluation Framework for Retrieval-Augmented Insight Generation over Multiple Tables
- Enhancing Chain-of-Thought Reasoning with Critical Representation Fine-tuning
- Does the Emotional Understanding of LVLMs Vary Under High-Stress Environments and Across Different Demographic Attributes?
- S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling
- Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention
- Tracing and Dissecting How LLMs Recall Factual Knowledge for Real World Questions
- Employing Discourse Coherence Enhancement to Improve Cross-Document Event and Entity Coreference Resolution
- Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
- Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
- FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
- Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
- MobiLoRA: Accelerating LoRA-based LLM Inference on Mobile Devices via Context-aware KV Cache Optimization
- Language Models Resist Alignment: Evidence From Data Compression
- Beyond the Answer: Advancing Multi-Hop QA with Fine-Grained Graph Reasoning and Evaluation
- Mamba Knockout for Unraveling Factual Information Flow
- Small Changes, Big Impact: How Manipulating a Few Neurons Can Drastically Alter LLM Aggression
- Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models
- Curiosity-Driven Reinforcement Learning from Human Feedback
- T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
- CoE: A Clue of Emotion Framework for Emotion Recognition in Conversations
- MPO: Multilingual Safety Alignment via Reward Gap Optimization
- QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
- On the Relation Between Fine-Tuning, Topological Properties, and Task Performance in Sense-Enhanced Embeddings
- Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?
- Don’t Half-listen: Capturing Key-part Information in Continual Instruction Tuning
- Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction
- Exploring Explanations Improves the Robustness of In-Context Learning
- Prediction Hubs are Context-Informed Frequent Tokens in LLMs
- Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law
- CRUXEVAL-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution
- Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs
- Rubrik’s Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset
- A Dual-Mind Framework for Strategic and Expressive Negotiation Agent
- Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models
- Revisiting Scaling Laws for Language Models: The Role of Data Quality and Training Strategies
- Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments
- Enhancing Machine Translation with Self-Supervised Preference Data
- Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval
- Don’t Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
- MEXMA: Token-level objectives improve sentence representations
- Uncertainty-Aware Iterative Preference Optimization for Enhanced LLM Reasoning
- AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration
- Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
- Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language
- Representation Bending for Large Language Model Safety
- Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations
- Enhancing Retrieval-Augmented Generation via Evidence Tree Search
- HalluLens: LLM Hallucination Benchmark
- DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling
- Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models
- InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
- GRaMPa: Subword Regularisation by Skewing Uniform Segmentation Distributions with an Efficient Path-counting Markov Model
- Evaluating the Evaluation of Diversity in Commonsense Generation
- Generate First, Then Sample: Enhancing Fake News Detection with LLM-Augmented Reinforced Sampling
- ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data
- Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception
- ALGEN: Few-shot Inversion Attacks on Textual Embeddings via Cross-Model Alignment and Generation
- Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed Chains
- STaR-SQL: Self-Taught Reasoner for Text-to-SQL
- Fairness Beyond Performance: Revealing Reliability Disparities Across Groups in Legal NLP
- Beyond Similarity: A Gradient-based Graph Method for Instruction Tuning Data Selection
- FastMCTS: A Simple Sampling Strategy for Data Synthesis
- Dialogue-RAG: Enhancing Retrieval for LLMs via Node-Linking Utterance Rewriting
- Using Information Theory to Characterize Prosodic Typology: The Case of Tone, Pitch-Accent and Stress-Accent
- Evaluating LLMs for Portuguese Sentence Simplification with Linguistic Insights
- LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models
- Improving Low-Resource Morphological Inflection via Self-Supervised Objectives
- Don’t Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation
- BOOKCOREF: Coreference Resolution at Book Scale
- OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
- Alleviating Hallucinations from Knowledge Misalignment in Large Language Models via Selective Abstention Learning
- Retrospective Learning from Interactions
- Personalized Generation In Large Model Era: A Survey
- Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning
- SOTOPIA-Ω: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents
- Can Language Models Replace Programmers for Coding? REPOCOD Says ‘Not Yet’
- Leveraging In-Context Learning for Political Bias Testing of LLMs
- ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting
- LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts
- WAFFLE: Fine-tuning Multi-Modal Model for Automated Front-End Development
- Math Neurosurgery: Isolating Language Models’ Math Reasoning Abilities Using Only Forward Passes
- Multiple LLM Agents Debate for Equitable Cultural Alignment
- RefreshKV: Updating Small KV Cache During Long-form Generation
- SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
- Language Models Grow Less Humanlike beyond Phase Transition
- PCoT: Persuasion-Augmented Chain of Thought for Detecting Fake News and Social Media Disinformation
- Coordinating Chaos: A Structured Review of Linguistic Coordination Methodologies
- iNews: A Multimodal Dataset for Modeling Personalized Affective Responses to News
- Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures
- 500xCompressor: Generalized Prompt Compression for Large Language Models
- Estimating Privacy Leakage of Augmented Contextual Knowledge in Language Models
- Document-Level Event-Argument Data Augmentation for Challenging Role Types
- Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus
- Unravelling the Logic: Investigating the Generalisation of Transformers in Numerical Satisfiability Problems
- The Nature of NLP: Analyzing Contributions in NLP Papers
- GeLLM³O: Generalizing Large Language Models for Multi-property Molecule Optimization
- Follow-up Question Generation For Enhanced Patient-Provider Conversations
- Unveiling Privacy Risks in LLM Agent Memory
- Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation
- Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal Summarization
- PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
- Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets
- Library-Like Behavior In Language Models is Enhanced by Self-Referencing Causal Cycles
- Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language Models
- ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution
- ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework
- SARA: Salience-Aware Reinforced Adaptive Decoding for Large Language Models in Abstractive Summarization
- Embedding-Converter: A Unified Framework for Cross-Model Embedding Transformation
- Improving Automatic Evaluation of Large Language Models (LLMs) in Biomedical Relation Extraction via LLMs-as-the-Judge
- Answering Complex Geographic Questions by Adaptive Reasoning with Visual Context and External Commonsense Knowledge
- Safety Alignment via Constrained Knowledge Unlearning
- Response Wide Shut? Surprising Observations in Basic Vision Language Model Capabilities
- EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
- Pre-Training Curriculum for Multi-Token Prediction in Language Models
- Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks
- On Many-Shot In-Context Learning for Long-Context Evaluation
- HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks
- CulturalBench: A Robust, Diverse and Challenging Benchmark for Measuring LMs’ Cultural Knowledge Through Human-AI Red-Teaming
- Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning
- All That Glitters is Not Novel: Plagiarism in AI Generated Research
- Writing Like the Best: Exemplar-Based Expository Text Generation
- Temporal Relation Extraction in Clinical Texts: A Span-based Graph Transformer Approach
- Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for Chatbots
- Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer’s Disease Detection
- Help Me Write a Story: Evaluating LLMs’ Ability to Generate Writing Feedback
- Language Fusion for Parameter-Efficient Cross-lingual Transfer
- Culture is Not Trivia: Sociocultural Theory for Cultural NLP
- AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
- Do Language Models Have Semantics? On the Five Standard Positions
- Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems
- Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
- HumT DumT: Measuring and controlling human-like language in LLMs
- ChatBench: From Static Benchmarks to Human-AI Evaluation
- Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
- Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs
- Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat
- LLM Agents Making Agent Tools
- CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World
- QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation
- Causal Graph based Event Reasoning using Semantic Relation Experts
- LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
- Do LLMs Understand Dialogues? A Case Study on Dialogue Acts
- Research Borderlands: Analysing Writing Across Research Cultures
- CEAES: Bidirectional Reinforcement Learning Optimization for Consistent and Explainable Essay Assessment
- DeAL: Decoding-time Alignment for Large Language Models
- Cultural Bias Matters: A Cross-Cultural Benchmark Dataset and Sentiment-Enriched Model for Understanding Multimodal Metaphors
- OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
- Mixtures of In-Context Learners
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation
- RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection
- Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
- Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
- MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
- The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
- Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging
- BIG-Bench Extra Hard
- CSTree-SRI: Introspection-Driven Cognitive Semantic Tree for Multi-Turn Question Answering over Extra-Long Contexts
- InductionBench: LLMs Fail in the Simplest Complexity Class
- RATIONALYST: Pre-training Process-Supervision for Improving Reasoning
- Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation
- Advancing SMoE for Continuous Domain Adaptation of MLLMs: Adaptive Router and Domain-Specific Loss
- Multi-document Summarization through Multi-document Event Relation Graph Reasoning in LLMs: a case study in Framing Bias Mitigation
- Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection
- RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates
- Scaling Laws and Efficient Inference for Ternary Language Models
- Exploring the Impact of Instruction-Tuning on LLM’s Susceptibility to Misinformation
- Do Language Models Understand Honorific Systems in Javanese?
- Generative Reward Modeling via Synthetic Criteria Preference Learning
- Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning
- A Self-Denoising Model for Robust Few-Shot Relation Extraction
- QuASAR: A Question-Driven Structure-Aware Approach for Table-to-Text Generation
- Automated Structured Radiology Report Generation
- LPOI: Listwise Preference Optimization for Vision Language Models
- Predicting Through Generation: Why Generation Is Better for Prediction
- “Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization
- StitchLLM: Serving LLMs, One Block at a Time
- Walk in Others’ Shoes with a Single Glance: Human-Centric Visual Grounding with Top-View Perspective Transformation
- Is linguistically-motivated data augmentation worth it?
- From Lists to Emojis: How Format Bias Affects Model Alignment
- Colloquial Singaporean English Style Transfer with Fine-Grained Explainable Control
- From Informal to Formal – Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs
- CoAM: Corpus of All-Type Multiword Expressions
- SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
- Exposing the Achilles’ Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning
- Understanding the Dark Side of LLMs’ Intrinsic Self-Correction
- VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
- What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
- Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation
- SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
- I0T: Embedding Standardization Method Towards Zero Modality Gap
- Odysseus Navigates the Sirens’ Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation
- Better Embeddings with Coupled Adam
- Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation
- Controllable and Reliable Knowledge-Intensive Task-Oriented Conversational Agents with Declarative Genie Worksheets
- Benchmarking Long-Context Language Models on Long Code Understanding
- MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities
- Internal Value Alignment in Large Language Models through Controlled Value Vector Activation
- A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability
- Recurrent Knowledge Identification and Fusion for Language Model Continual Learning
- Data-Constrained Synthesis of Training Data for De-Identification
- Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation
- Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing
- MMDEND: Dendrite-Inspired Multi-Branch Multi-Compartment Parallel Spiking Neuron for Sequence Modeling
- Understanding Impact of Human Feedback via Influence Functions
- T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts
- InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
- FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning
- Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
- Personal Travel Solver: A Preference-Driven LLM-Solver System for Travel Planning
- Counterspeech the ultimate shield! Multi-Conditioned Counterspeech Generation through Attributed Prefix Learning
- LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models
- CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback
- Knowledge Tracing in Programming Education Integrating Students’ Questions
- PRISM: A Framework for Producing Interpretable Political Bias Embeddings with Political-Aware Cross-Encoder
- Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes
- Lexical Diversity-aware Relevance Assessment for Retrieval-Augmented Generation
- Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
- Online Iterative Self-Alignment for Radiology Report Generation
- Chinese Inertial GAN for Handwriting Signal Generation and Recognition
- LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
- Evaluating Sequence Labeling on the basis of Information Theory
- GRAT: Guiding Retrieval-Augmented Reasoning through Process Rewards Tree Search
- T-REG: Preference Optimization with Token-Level Reward Regularization
- Gödel Agent: A Self-Referential Agent Framework for Recursively Self-Improvement
- AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments
- Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory
- Information Locality as an Inductive Bias for Neural Language Models
- Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models
- Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
- Towards Robust Universal Information Extraction: Dataset, Evaluation, and Solution
- Multi-perspective Alignment for Increasing Naturalness in Neural Machine Translation
- Temporal reasoning for timeline summarisation in social media
- Beyond Negative Stereotypes – Non-Negative Abusive Utterances about Identity Groups and Their Semantic Variants
- Persistent Homology of Topic Networks for the Prediction of Reader Curiosity
- Tokenisation is NP-Complete
- Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning
- Parameter-Aware Contrastive Knowledge Editing: Tracing and Rectifying based on Critical Transmission Paths
- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System
- Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking
- Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport
- Opt-Out: Investigating Entity-Level Unlearning for Large Language Models via Optimal Transport
- Mixture of Small and Large Models for Chinese Spelling Check
- DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling Check
- Causal Estimation of Tokenisation Bias
- Value Residual Learning
- SGIC: A Self-Guided Iterative Calibration Framework for RAG
- NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts
- LLM-based Rumor Detection via Influence Guided Sample Selection and Game-based Perspective Analysis
- Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning
- SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers
- Medical GraphRAG: Evidence-based Medical Large Language Model via Graph Retrieval-Augmented Generation
- Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models
- Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
- Probing Relative Interaction and Dynamic Calibration in Multi-modal Entity Alignment
- Learn to Memorize: Scalable Continual Learning in Semiparametric Models with Mixture-of-Neighbors Induction Memory
- Adverse Event Extraction from Discharge Summaries: A New Dataset, Annotation Scheme, and Initial Findings
- Speed Up Your Code: Progressive Code Acceleration Through Bidirectional Tree Editing
- Multi-Facet Blending for Faceted Query-by-Example Retrieval
- PIPER: Benchmarking and Prompting Event Reasoning Boundary of LLMs via Debiasing-Distillation Enhanced Tuning
- MIR: Methodology Inspiration Retrieval for Scientific Research Problems
- Sticking to the Mean: Detecting Sticky Tokens in Text Embedding Models
- Memorizing is Not Enough: Deep Knowledge Injection Through Reasoning
- Improving Dialogue State Tracking through Combinatorial Search for In-Context Examples
- Pretraining Context Compressor for Large Language Models with Embedding-Based Memory
- Dialogue Systems for Emotional Support via Value Reinforcement
- Length-Induced Embedding Collapse in PLM-based Models
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction
- ERU-KG: Efficient Reference-aligned Unsupervised Keyphrase Generation
- Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling
- LLMs Trust Humans More, That’s a Problem! Unveiling and Mitigating the Authority Bias in Retrieval-Augmented Generation
- Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation
- Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
- PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
- Robust Utility-Preserving Text Anonymization Based on Large Language Models
- SEAL: Scaling to Emphasize Attention for Long-Context Retrieval
- From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment
- 𝒜3: Automatic Alignment Framework for Attributed Text Generation
- Towards Better Value Principles for Large Language Model Alignment: A Systematic Evaluation and Enhancement
- Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
- Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk Decoding
- Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
- SDD: Self-Degraded Defense against Malicious Fine-tuning
- CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model
- DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
- How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs
- Data Caricatures: On the Representation of African American Language in Pretraining Corpora
- Language Model Probabilities are Not Calibrated in Numeric Contexts
- MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
- Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
- DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient
- Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis
- Hierarchical Memory Organization for Wikipedia Generation
- Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding Tasks
- Structure-aware Domain Knowledge Injection for Large Language Models
- FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
- Dialectal Coverage And Generalization in Arabic Speech Recognition
- EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits
- Reconsidering LLM Uncertainty Estimation Methods in the Wild
- Bregman Conditional Random Fields: Sequence Labeling with Parallelizable Inference Algorithms
- SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt Optimization
- Programming by Example meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction
- Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events
- Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced Claims
- The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents
- Sandcastles in the Storm: Revisiting the (Im)possibility of Strong Watermarking
- Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement
- From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs
- GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
- Hanging in the Balance: Pivotal Moments in Crisis Counseling Conversations
- Unveiling the Potential of BERT-family: A New Recipe for Building Scalable, General and Competitive Large Language Models
- TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora
- An Empirical Study of Iterative Refinements for Non-autoregressive Translation
- Retrofitting Large Language Models with Dynamic Tokenization
- Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries
- Bilingual Zero-Shot Stance Detection
- GrammaMT: Improving Machine Translation with Grammar-Informed In-Context Learning
- Theorem Prover as a Judge for Synthetic Data Generation
- Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks
- Assessing Reliability and Political Bias In LLMs’ Judgements of Formal and Material Inferences With Partisan Conclusions
- PARME: Parallel Corpora for Low-Resourced Middle Eastern Languages
- METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
- ConLoan: A Contrastive Multilingual Dataset for Evaluating Loanwords
- A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive
- MEraser: An Effective Fingerprint Erasure Approach for Large Language Models
- VISA: Retrieval Augmented Generation with Visual Source Attribution
- DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers
- Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs
- MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning
- Map&Make: Schema Guided Text to Table Generation
- IRIS: Interpretable Retrieval-Augmented Classification for Long Interspersed Document Sequences
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
- Can we Retrieve Everything All at Once? ARM: An Alignment-Oriented LLM-based Retrieval Method
- R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory
- FairITales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes
- SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models
- Predicting Implicit Arguments in Procedural Video Instructions
- PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free
- CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
- ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding
- The time scale of redundancy between prosody and linguistic context
- Basic Reading Distillation
- Quantized Can Still Be Calibrated: A Unified Framework to Calibration in Quantized Large Language Models
- A Spatio-Temporal Point Process for Fine-Grained Modeling of Reading Behavior
- More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
- AstuteRAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models
- SubLIME: Subset Selection via Rank Correlation Prediction for Data-Efficient LLM Evaluation
- M³GQA: A Multi-Entity Multi-Hop Multi-Setting Graph Question Answering Benchmark
- LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
- ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
- Meta-Tool: Unleash Open-World Function Calling Capabilities of General-Purpose Large Language Models
- Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning
- ISR: Self-Refining Referring Expressions for Entity Grounding
- Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
- CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
- TestNUC: Enhancing Test-Time Computing Approaches and Scaling through Neighboring Unlabeled Data Consistency
- The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages
- Theoretical Analysis of Hierarchical Language Recognition and Generation by Transformers without Positional Encoding
- Less is More: Explainable and Efficient ICD Code Prediction with Clinical Entities
- Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories
- Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
- Serial Lifelong Editing via Mixture of Knowledge Experts
- A Survey on Efficient Large Language Model Training: From Data-centric Perspectives
- IMOL: Incomplete-Modality-Tolerant Learning for Multi-Domain Fake News Video Detection
- DDxTutor: Clinical Reasoning Tutoring System with Differential Diagnosis-Based Structured Reasoning
- SocialEval: Evaluating Social Intelligence of Large Language Models
- Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal Settings
- PlanningArena: A Modular Benchmark for Multidimensional Evaluation of Planning and Tool Learning
- FocusLLM: Precise Understanding of Long Context by Dynamic Condensing
- Negative Matters: Multi-Granularity Hard-Negative Synthesis and Anchor-Token-Aware Pooling for Enhanced Text Embeddings
- GPT-4 as a Homework Tutor Can Improve Student Engagement and Learning Outcomes
- Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
- Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling
- English-based acoustic models perform well in the forced alignment of two English-based Pacific Creoles
- Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing
- Truth Knows No Language: Evaluating Truthfulness Beyond English
- Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
- Batayan: A Filipino NLP benchmark for evaluating Large Language Models
- HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims
- CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory
- It’s Not a Walk in the Park! Challenges of Idiom Translation in Speech-to-text Systems
- PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles
- A Parameter-Efficient and Fine-Grained Prompt Learning for Vision-Language Models
- Persona Dynamics: Unveiling the Impact of Persona Traits on Agents in Text-Based Games
- SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science
- 𝛿-Stance: A Large-Scale Real World Dataset of Stances in Legal Argumentation
- Re3Syn: A Dependency-Based Data Synthesis Framework for Long-Context Post-training
- Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
- Multimodal Coreference Resolution for Chinese Social Media Dialogues: Dataset and Benchmark Approach
- TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification
- Theory of Mind in Large Language Models: Assessment and Enhancement
- Completing A Systematic Review in Hours instead of Months with Interactive AI Agents
- CMHKF: Cross-Modality Heterogeneous Knowledge Fusion for Weakly Supervised Video Anomaly Detection
- CLaSp: In-Context Layer Skip for Self-Speculative Decoding
- Teaching Text Agents to Learn Sequential Decision Making from Failure
- The Harmonic Structure of Information Contours
- REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark
- Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models
- LongSafety: Evaluating Long-Context Safety of Large Language Models
- Exploiting Contextual Knowledge in LLMs through 𝒱-usable Information based Layer Enhancement
- Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights
- Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval
- The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past Research
- MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation
- Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
- Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models
- DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering
- Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model
- Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models
- Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
- ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
- PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
- Beyond Text Compression: Evaluating Tokenizers Across Scales
- Emergent Abilities of Large Language Models under Continued Pre-training for Language Adaptation
- R-Fairness: Assessing Fairness of Ranking in Subjective Data
- RePanda: Pandas-powered Tabular Verification and Reasoning
- Towards Style Alignment in Cross-Cultural Translation
- TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
- Entailed Between the Lines: Incorporating Implication into NLI
- Multi-Level Explanations for Generative Language Models
- A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems
- Low-Bit Quantization Favors Undertrained LLMs
- LETS-C: Leveraging Text Embedding for Time Series Classification
- UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
- HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval
- ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities
- La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America
- Why Prompt Design Matters and Works: A Complexity Analysis of Prompt Search Space in LLMs
- Energy Considerations of Large Language Model Inference and Efficiency Optimizations
- Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models
- BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving
- Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
- Logic-Regularized Verifier Elicits Reasoning from LLMs
- Squeezed Attention: Accelerating Long Context Length LLM Inference
- LangMark: A Multilingual Dataset for Automatic Post-Editing
- Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer
- Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models
- Where Are We? Evaluating LLM Performance on African Languages
- Beyond Output Matching: Bidirectional Alignment for Enhanced In-Context Learning
- CiteEval: Principle-Driven Citation Evaluation for Source Attribution
- HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
- EducationQ: Evaluating LLMs’ Teaching Capabilities Through Multi-Agent Dialogue Framework
- KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning
- Efficient Domain Continual pretraining by Mitigating the Stability Gap
- Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
- NewsInterview: a Dataset and a Playground to Evaluate LLMs’ Grounding Gap via Informational Interviews
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
- Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages
- CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG
- Mapping 1,000+ Language Models via the Log-Likelihood Vector
- ConsistencyChecker: Tree-based Evaluation of LLM Generalization Capabilities
- Robust Estimation of Population-Level Effects in Repeated-Measures NLP Experimental Designs
- FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation
- Training-free LLM Merging for Multi-task Learning
- Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection
- Comparison-based Active Preference Learning for Multi-dimensional Personalization
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
- LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs
- AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment
- SocialCC: Interactive Evaluation for Cultural Competence in Language Agents
- Scalable Vision Language Model Training via High Quality Data Curation
- GRAM: Generative Recommendation via Semantic-aware Multi-granular Late Fusion
- Towards Economical Inference: Enabling DeepSeek’s Multi-Head Latent Attention in Any Transformer-based LLMs
- TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding
- Introducing Verification Task of Set Consistency with Set-Consistency Energy Networks
- Language Models can Subtly Deceive Without Lying: A Case Study on Strategic Phrasing in Legislation
- AfroCS-xs: Creating a Compact, High-Quality, Human-Validated Code-Switched Dataset for African Languages
- Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models
- Design Choices for Extending the Context Length of Visual Language Models
This index was automatically generated from 1599 papers across 50 parts.