
Mondrian-He/awesome-acl-2025-artist


awesome-acl-2025-artist


Important

To browse papers from other conferences such as NeurIPS, ICLR, ICML, EMNLP, or ACL, check out Awesome-artist!!! 🤩🤩🤩

Note

This repository collects the long papers from ACL 2025. Each paper’s framework diagrams, experimental figures, and other visuals are extracted so that their presentation techniques can be studied. Because the content is extensive and a single Markdown file cannot render everything reliably, it is split into 50 separate Markdown files, each covering about 32 papers. The following section indexes where each paper is located 😁😁. Hope we can make progress together!


📚 Complete Paper Index

Total Papers: 1599

Split into 50 parts for better browsing

📖 Parts Summary

| column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Part 1: 32 papers | Part 2: 32 papers | Part 3: 32 papers | Part 4: 32 papers | Part 5: 32 papers | Part 6: 32 papers | Part 7: 32 papers | Part 8: 32 papers | Part 9: 32 papers | Part 10: 32 papers |
| Part 11: 32 papers | Part 12: 32 papers | Part 13: 32 papers | Part 14: 32 papers | Part 15: 32 papers | Part 16: 32 papers | Part 17: 32 papers | Part 18: 32 papers | Part 19: 32 papers | Part 20: 32 papers |
| Part 21: 32 papers | Part 22: 32 papers | Part 23: 32 papers | Part 24: 32 papers | Part 25: 32 papers | Part 26: 32 papers | Part 27: 32 papers | Part 28: 32 papers | Part 29: 32 papers | Part 30: 32 papers |
| Part 31: 32 papers | Part 32: 32 papers | Part 33: 32 papers | Part 34: 32 papers | Part 35: 32 papers | Part 36: 32 papers | Part 37: 32 papers | Part 38: 32 papers | Part 39: 32 papers | Part 40: 32 papers |
| Part 41: 32 papers | Part 42: 32 papers | Part 43: 32 papers | Part 44: 32 papers | Part 45: 32 papers | Part 46: 32 papers | Part 47: 32 papers | Part 48: 32 papers | Part 49: 32 papers | Part 50: 31 papers |
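As a quick sanity check on the split, here is a minimal Python sketch that reproduces the part sizes and maps a paper's index number to its part. It assumes that parts are filled sequentially in index order with 32 papers each (the README does not state this explicitly), and the names `part_sizes` and `part_for_paper` are hypothetical helpers, not part of the repository.

```python
TOTAL_PAPERS = 1599    # "Total Papers" from the index
PAPERS_PER_PART = 32   # size of every part except the last


def part_sizes() -> list[int]:
    """Return the number of papers in each part, in order."""
    full_parts, remainder = divmod(TOTAL_PAPERS, PAPERS_PER_PART)
    sizes = [PAPERS_PER_PART] * full_parts
    if remainder:
        sizes.append(remainder)  # the last, shorter part
    return sizes


def part_for_paper(index: int) -> int:
    """Return the 1-based part number for a 1-based paper index.

    Assumption: papers are assigned to parts in index order.
    """
    if not 1 <= index <= TOTAL_PAPERS:
        raise ValueError(f"index must be in 1..{TOTAL_PAPERS}, got {index}")
    return (index - 1) // PAPERS_PER_PART + 1


if __name__ == "__main__":
    sizes = part_sizes()
    print(len(sizes), sizes[-1], sum(sizes))  # 50 31 1599
    print(part_for_paper(355))                # 12
```

Under that assumption, 49 full parts of 32 papers plus a final part of 31 account for all 1599 entries, which matches the table above.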

📝 All Papers by Title

  1. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
  2. EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association
  3. GraphNarrator: Generating Textual Explanations for Graph Neural Networks
  4. M-RewardBench: Evaluating Reward Models in Multilingual Settings
  5. ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming
  6. Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
  7. Bias in Language Models: Beyond Trick Tests and Towards RUTEd Evaluation
  8. Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models
  9. The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It
  10. CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction
  11. StrucText-Eval: Evaluating Large Language Model’s Reasoning Ability in Structure-Rich Text
  12. Literature Meets Data: A Synergistic Approach to Hypothesis Generation
  13. GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization
  14. Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models
  15. Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models
  16. ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision
  17. FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models
  18. Statistical Deficiency for Task Inclusion Estimation
  19. Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients
  20. LLM-Powered Test Case Generation for Detecting Bugs in Plausible Programs
  21. Capture the Key in Reasoning to Enhance CoT Distillation Generalization
  22. How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond
  23. Enhancing Hyperbole and Metaphor Detection with Their Bidirectional Dynamic Interaction and Emotion Knowledge
  24. UniICL: An Efficient ICL Framework Unifying Compression, Selection, and Generation
  25. BelarusianGLUE: Towards a Natural Language Understanding Benchmark for Belarusian
  26. A Survey on Foundation Language Models for Single-cell Biology
  27. RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
  28. Extending LLM Context Window with Adaptive Grouped Positional Encoding: A Training-Free Method
  29. Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models
  30. HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
  31. Can Multimodal Large Language Models Understand Spatial Relations?
  32. S3 - Semantic Signal Separation
  33. TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
  34. JuStRank: Benchmarking LLM Judges for System Ranking
  35. Generating Diverse Training Samples for Relation Extraction with Large Language Models
  36. MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts
  37. Efficient and Accurate Prompt Optimization: the Benefit of Memory in Exemplar-Guided Reflection
  38. Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
  39. EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents
  40. BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving
  41. LACA: Improving Cross-lingual Aspect-Based Sentiment Analysis with LLM Data Augmentation
  42. Fusing Highly Specialized Language Models for Comprehensive Expertise
  43. HybGRAG: Hybrid Retrieval-Augmented Generation on Textual and Relational Knowledge Bases
  44. Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms
  45. Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review
  46. MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection
  47. EvoWiki: Evaluating LLMs on Evolving Knowledge
  48. Rethinking Repetition Problems of LLMs in Code Generation
  49. PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
  50. ProcessBench: Identifying Process Errors in Mathematical Reasoning
  51. Model Extrapolation Expedites Alignment
  52. ATLANTIS: Weak-to-Strong Learning via Importance Sampling
  53. MPVStance: Mitigating Hallucinations in Stance Detection with Multi-Perspective Verification
  54. Personality-Guided Code Generation Using Large Language Models
  55. PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling
  56. BIPro: Zero-shot Chinese Poem Generation via Block Inverse Prompting Constrained Generation Framework
  57. LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating
  58. ObfusLM: Privacy-preserving Language Model Service against Embedding Inversion Attacks
  59. Interlocking-free Selective Rationalization Through Genetic-based Learning
  60. Re-identification of De-identified Documents with Autoregressive Infilling
  61. Modeling Uncertainty in Composed Image Retrieval via Probabilistic Embeddings
  62. Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models
  63. APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts
  64. Evaluating Lexical Proficiency in Neural Language Models
  65. Autoregressive Speech Synthesis without Vector Quantization
  66. Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM’s Nest
  67. FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Large Language Models
  68. Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality
  69. Capturing Author Self Beliefs in Social Media Language
  70. Neural Topic Modeling with Large Language Models in the Loop
  71. HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
  72. Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection
  73. “Yes, My LoRD.” Guiding Language Model Extraction with Locality Reinforced Distillation
  74. Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
  75. Wait, that’s not an option: LLMs Robustness with Incorrect Multiple-Choice Options
  76. The Hidden Attention of Mamba Models
  77. KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding
  78. LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models
  79. MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
  80. Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions
  81. Real-time Factuality Assessment from Adversarial Feedback
  82. Improve Vision Language Model Chain-of-thought Reasoning
  83. On the Mutual Influence of Gender and Occupation in LLM Representations
  84. Disentangling Memory and Reasoning Ability in Large Language Models
  85. Open-World Attribute Mining for E-Commerce Products with Multimodal Self-Correction Instruction Tuning
  86. Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attributions Explainability
  87. Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
  88. LangSAMP: Language-Script Aware Multilingual Pretraining
  89. RelationalCoder: Rethinking Complex Tables via Programmatic Relational Transformation
  90. Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study
  91. TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos
  92. Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs
  93. Binary Classifier Optimization for Large Language Model Alignment
  94. UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs’ Memorization
  95. From Information to Insight: Leveraging LLMs for Open Aspect-Based Educational Summarization
  96. AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset
  97. Root Defense Strategies: Ensuring Safety of LLM at the Decoding Level
  98. In-the-wild Audio Spatialization with Flexible Text-guided Localization
  99. L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models
  100. Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion
  101. What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
  102. ECERC: Evidence-Cause Attention Network for Multi-Modal Emotion Recognition in Conversation
  103. CompileAgent: Automated Real-World Repo-Level Compilation with Tool-Integrated LLM-based Agent System
  104. Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions
  105. Exploring Forgetting in Large Language Model Pre-Training
  106. Bias in the Mirror : Are LLMs opinions robust to their own adversarial attacks
  107. AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
  108. Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment
  109. Multimodal Transformers are Hierarchical Modal-wise Heterogeneous Graphs
  110. Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking
  111. LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch
  112. Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
  113. How Much Do Encoder Models Know About Word Senses?
  114. When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations
  115. HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter
  116. LegalAgentBench: Evaluating LLM Agents in Legal Domain
  117. Inference Compute-Optimal Video Vision Language Models
  118. Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models
  119. Digital Gatekeepers: Google’s Role in Curating Hashtags and Subreddits
  120. Behind Closed Words: Creating and Investigating the forePLay Annotated Dataset for Polish Erotic Discourse
  121. Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales
  122. Did Translation Models Get More Robust Without Anyone Even Noticing?
  123. Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset
  124. Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings
  125. Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models
  126. INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent
  127. Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  128. Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models
  129. D.Va: Validate Your Demonstration First Before You Use It
  130. Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
  131. MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation
  132. Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning
  133. Direct Prompt Optimization with Continuous Representations
  134. uMedSum: A Unified Framework for Clinical Abstractive Summarization
  135. GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
  136. Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents
  137. TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data
  138. AndroidGen: Building an Android Language Agent under Data Scarcity
  139. Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation
  140. A Survey of Post-Training Scaling in Large Language Models
  141. Position-aware Automatic Circuit Discovery
  142. HyperFM: Fact-Centric Multimodal Fusion for Link Prediction over Hyper-Relational Knowledge Graphs
  143. Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
  144. Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation
  145. Ensemble Watermarks for Large Language Models
  146. ConInstruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
  147. TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge
  148. DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation
  149. Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation
  150. ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
  151. Mixture of insighTful Experts (MoTE): The Synergy of Reasoning Chains and Expert Mixtures in Self-Alignment
  152. MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment
  153. Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework
  154. LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs
  155. Iron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training
  156. Cultural Learning-Based Culture Adaptation of Language Models
  157. A-TASC: Asian TED-Based Automatic Subtitling Corpus
  158. Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
  159. Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs
  160. No Questions are Stupid, but some are Poorly Posed: Understanding Poorly-Posed Information-Seeking Questions
  161. Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs
  162. Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models
  163. Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
  164. Taming LLMs with Gradient Grouping
  165. LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews
  166. Revisiting Common Assumptions about Arabic Dialects in NLP
  167. Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification
  168. Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas
  169. Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
  170. Detection of Human and Machine-Authored Fake News in Urdu
  171. An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals
  172. SR-LLM: Rethinking the Structured Representation in Large Language Model
  173. Taming Language Models for Text-attributed Graph Learning with Decoupled Aggregation
  174. Contrastive Prompting Enhances Sentence Embeddings in LLMs through Inference-Time Steering
  175. Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
  176. Hierarchical Document Refinement for Long-context Retrieval-augmented Generation
  177. Comparing Moral Values in Western English-speaking societies and LLMs with Word Associations
  178. TEACH: A Contrastive Knowledge Adaptive Distillation Framework for Classical Chinese Understanding
  179. RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented Generation
  180. Progressive Multimodal Reasoning via Active Retrieval
  181. Pre-training Distillation for Large Language Models: A Design Space Exploration
  182. Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions
  183. LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks
  184. Battling against Tough Resister: Strategy Planning with Adversarial Game for Non-collaborative Dialogues
  185. Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts
  186. FoldMoE: Efficient Long Sequence MoE Training via Attention-MoE Pipelining
  187. LongReward: Improving Long-context Large Language Models with AI Feedback
  188. Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles
  189. UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
  190. Towards Better Evaluation for Generated Patent Claims
  191. Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs
  192. Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis
  193. Do Large Language Models have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs
  194. Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning
  195. Conformity in Large Language Models
  196. Interpret and Improve In-Context Learning via the Lens of Input-Label Mappings
  197. Positional Overload: Positional Debiasing and Context Window Extension for Large Language Models using Set Encoding
  198. FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
  199. VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism
  200. Past Meets Present: Creating Historical Analogy with Large Language Models
  201. Meta-Reflection: A Feedback-Free Reflection Learning Framework
  202. Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books
  203. Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs
  204. Automating Legal Interpretation with LLMs: Retrieval, Generation, and Evaluation
  205. Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
  206. Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration
  207. TokAlign: Efficient Vocabulary Adaptation via Token Alignment
  208. AdaEdit: Advancing Continuous Knowledge Editing For Large Language Models
  209. The Impact of Token Granularity on the Predictive Power of Language Model Surprisal
  210. Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models
  211. BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering
  212. Dynamic and Generalizable Process Reward Modeling
  213. AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
  214. Towards Text-Image Interleaved Retrieval
  215. Large Margin Representation Learning for Robust Cross-lingual Named Entity Recognition
  216. An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
  217. QAEncoder: Towards Aligned Representation Learning in Question Answering Systems
  218. Game Development as Human-LLM Interaction
  219. Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases
  220. DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking
  221. SurveyPilot: an Agentic Framework for Automated Human Opinion Collection from Social Media
  222. Sharper and Faster mean Better: Towards More Efficient Vision-Language Model for Hour-scale Long Video Understanding
  223. Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions
  224. How Humans and LLMs Organize Conceptual Knowledge: Exploring Subordinate Categories in Italian
  225. PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
  226. ProtoLens: Advancing Prototype Learning for Fine-Grained Interpretability in Text Classification
  227. Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization
  228. Sparse Latents Steer Retrieval-Augmented Generation
  229. Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
  230. SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
  231. AnRe: Analogical Replay for Temporal Knowledge Graph Forecasting
  232. Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
  233. Text is All You Need: LLM-enhanced Incremental Social Event Detection
  234. Multimodal Pragmatic Jailbreak on Text-to-image Models
  235. Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks
  236. Discourse Relation-Enhanced Neural Coherence Modeling
  237. Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
  238. from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors
  239. ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive Framework
  240. MorphMark: Flexible Adaptive Watermarking for Large Language Models
  241. A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression
  242. On the Limit of Language Models as Planning Formalizers
  243. Learning to Generate Structured Output with Schema Reinforcement Learning
  244. Enhancing Unsupervised Sentence Embeddings via Knowledge-Driven Data Augmentation and Gaussian-Decayed Contrastive Learning
  245. Improve Safety Training of Large Language Models with Safety-Critical Singular Vectors Localization
  246. WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
  247. A Triple-View Framework for Fine-Grained Emotion Classification with Clustering-Guided Contrastive Learning
  248. Quantification of Large Language Model Distillation
  249. Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
  250. Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models
  251. Stepwise Reasoning Disruption Attack of LLMs
  252. Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge
  253. Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models
  254. Optimizing Decomposition for Optimal Claim Verification
  255. GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models
  256. Knowledge Boundary of Large Language Models: A Survey
  257. Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
  258. MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System
  259. Mitigating Selection Bias with Node Pruning and Auxiliary Options
  260. Dually Self-Improved Counterfactual Data Augmentation Using Large Language Model
  261. RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation
  262. Learning to Reason from Feedback at Test-Time
  263. L-CiteEval: A Suite for Evaluating Fidelity of Long-context Models
  264. SECRET: Semi-supervised Clinical Trial Document Similarity Search
  265. Geometric Signatures of Compositionality Across a Language Model’s Lifetime
  266. Pattern Recognition or Medical Knowledge? The Problem with Multiple-Choice Questions in Medicine
  267. People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text
  268. YuLan-Mini: Pushing the Limits of Open Data-efficient Language Model
  269. Your Model is Overconfident, and Other Lies We Tell Ourselves
  270. Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention
  271. Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
  272. What is Stigma Attributed to? A Theory-Grounded, Expert-Annotated Interview Corpus for Demystifying Mental-Health Stigma
  273. ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
  274. Enhancing Transformers for Generalizable First-Order Logical Entailment
  275. Self-Taught Agentic Long Context Understanding
  276. Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training
  277. OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
  278. CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
  279. ConSim: Measuring Concept-Based Explanations’ Effectiveness with Automated Simulatability
  280. Decoding Reading Goals from Eye Movements
  281. Uncovering Visual-Semantic Psycholinguistic Properties from the Distributional Structure of Text Embedding Space
  282. GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
  283. P2Law: Scaling Law for Post-Training After Model Pruning
  284. Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats
  285. Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling
  286. Entailment-Preserving First-order Logic Representations in Natural Language Entailment
  287. Enhancing Multimodal Continual Instruction Tuning with BranchLoRA
  288. Enhancing Automated Interpretability with Output-Centric Feature Descriptions
  289. Towards Effective and Efficient Continual Pre-training of Large Language Models
  290. Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization
  291. mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
  292. What Makes a Good Natural Language Prompt?
  293. X-TURING: Towards an Enhanced and Efficient Turing Test for Long-Term Dialogue Agents
  294. Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral
  295. Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models
  296. NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning
  297. ReLearn: Unlearning via Learning for Large Language Models
  298. Understanding Cross-Domain Adaptation in Low-Resource Topic Modeling
  299. UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
  300. CoT-Valve: Length-Compressible Chain-of-Thought Tuning
  301. HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation
  302. Uncertainty Propagation on LLM Agent
  303. Beyond Position: the emergence of wavelet-like properties in Transformers
  304. Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs
  305. Disentangling Biased Knowledge from Reasoning in Large Language Models via Machine Unlearning
  306. LLaMAs Have Feelings Too: Unveiling Sentiment and Emotion Representations in LLaMA Models Through Probing
  307. CxGGEC: Construction-Guided Grammatical Error Correction
  308. Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation
  309. HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs
  310. What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations
  311. NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering
  312. ProvBench: A Benchmark of Legal Provision Recommendation for Contract Auto-Reviewing
  313. F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
  314. AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation
  315. CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis
  316. Efficiently Identifying Watermarked Segments in Mixed-Source Texts
  317. Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
  318. Towards a More Generalized Approach in Open Relation Extraction
  319. Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
  320. Evaluating Language Models as Synthetic Data Generators
  321. Can Graph Descriptive Order Affect Solving Graph Problems with LLMs?
  322. Learning to Rewrite: Generalized LLM-Generated Text Detection
  323. Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search
  324. GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
  325. Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis
  326. Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
  327. AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
  328. Biased LLMs can Influence Political Decision-Making
  329. LexTempus: Enhancing Temporal Generalizability of Legal Language Models Through Dynamic Mixture of Experts
  330. That is Unacceptable: the Moral Foundations of Canceling
  331. FloorPlan-LLaMa: Aligning Architects’ Feedback and Domain Knowledge in Architectural Floor Plan Generation
  332. TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding
  333. FineReason: Evaluating and Improving LLMs’ Deliberate Reasoning through Reflective Puzzle Solving
  334. The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMs
  335. Identifying Reliable Evaluation Metrics for Scientific Text Revision
  336. Can Language Models Reason about Individualistic Human Values and Preferences?
  337. BERT-like Models for Slavic Morpheme Segmentation
  338. Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
  339. Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering
  340. Drift: Enhancing LLM Faithfulness in Rationale Generation via Dual-Reward Probabilistic Inference
  341. Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs
  342. MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
  343. Dynamic Scaling of Unit Tests for Code Reward Modeling
  344. UniConv: Unifying Retrieval and Response Generation for Large Language Models in Conversations
  345. Tracking Life’s Ups and Downs: Mining Life Events from Social Media Posts for Mental Health Analysis
  346. ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
  347. PIC: Unlocking Long-Form Text Generation Capabilities of Large Language Models via Position ID Compression
  348. Towards Effective Extraction and Evaluation of Factual Claims
  349. Beyond Facts: Evaluating Intent Hallucination in Large Language Models
  350. A Systematic Study of Compositional Syntactic Transformer Language Models
  351. M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
  352. SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
  353. Personalized Text Generation with Contrastive Activation Steering
  354. Gumbel Reranking: Differentiable End-to-End Reranker Optimization
  355. Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
  356. SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection
  357. The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project
  358. DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
  359. G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems
  360. Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models
  361. LegalReasoner: Step-wised Verification-Correction for Legal Judgment Reasoning
  362. Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context
  363. ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
  364. The Cross-linguistic Role of Animacy in Grammar Structures
  365. LexGen: Domain-aware Multilingual Lexicon Generation
  366. How to Train Long-Context Language Models (Effectively)
  367. MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion
  368. Mining Complex Patterns of Argumentative Reasoning in Natural Language Dialogue
  369. OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
  370. Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning
  371. LLM as a Broken Telephone: Iterative Generation Distorts Information
  372. VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
  373. Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation
  374. Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
  375. Large Language Models Struggle to Describe the Haystack without Human Help: A Social Science-Inspired Evaluation of Topic Models
  376. ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
  377. Enough Coin Flips Can Make LLMs Act Bayesian
  378. GAMEBoT: Transparent Assessment of LLM Reasoning in Games
  379. A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens
  380. Commonsense Reasoning in Arab Culture
  381. AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
  382. Translation and Fusion Improves Cross-lingual Information Extraction
  383. Conditional Dichotomy Quantification via Geometric Embedding
  384. Aligning Large Language Models with Implicit Preferences from User-Generated Content
  385. VQAGuider: Guiding Multimodal Large Language Models to Answer Complex Video Questions
  386. Large Language Models are Good Relational Learners
  387. SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data
  388. Distilling an End-to-End Voice Assistant Without Instruction Training Data
  389. CoMet: Metaphor-Driven Covert Communication for Multi-Agent Language Games
  390. CER: Confidence Enhanced Reasoning in LLMs
  391. Watermarking Large Language Models: An Unbiased and Low-risk Method
  392. On Synthetic Data Strategies for Domain-Specific Generative Retrieval
  393. LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates
  394. CONFETTI: Conversational Function-Calling Evaluation Through Turn-Level Interactions
  395. Evaluating Theory of (an uncertain) Mind: Predicting the Uncertain Beliefs of Others from Conversational Cues
  396. Uncertainty in Causality: A New Frontier
  397. SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs
  398. When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models
  399. AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
  400. Improving Model Factuality with Fine-grained Critique-based Evaluator
  401. Building a Long Text Privacy Policy Corpus with Multi-Class Labels
  402. R2-MultiOmnia: Leading Multilingual Multimodal Reasoning via Self-Training
  403. When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models
  404. Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models
  405. VLSBench: Unveiling Visual Leakage in Multimodal Safety
  406. Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
  407. Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation
  408. Conspiracy Theories and Where to Find Them on TikTok
  409. Growing Through Experience: Scaling Episodic Grounding in Language Models
  410. Exploiting the Shadows: Unveiling Privacy Leaks through Lower-Ranked Tokens in Large Language Models
  411. Attacking Vision-Language Computer Agents via Pop-ups
  412. Explicit and Implicit Data Augmentation for Social Event Detection
  413. In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents
  414. Revisiting Classical Chinese Event Extraction with Ancient Literature Information
  415. Unanswerability Evaluation for Retrieval Augmented Generation
  416. SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention
  417. Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning
  418. RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
  419. A Survey on Patent Analysis: From NLP to Multimodal AI
  420. SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
  421. MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
  422. Sinhala Encoder-only Language Models and Evaluation
  423. LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing
  424. SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
  425. Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges
  426. LocAgent: Graph-Guided LLM Agents for Code Localization
  427. COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation
  428. Mind the Gap: Static and Interactive Evaluations of Large Audio Models
  429. Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu
  430. CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs
  431. TripleFact: Defending Data Contamination in the Evaluation of LLM-driven Fake News Detection
  432. Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility
  433. Large Language and Reasoning Models are Shallow Disjunctive Reasoners
  434. Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation
  435. Building Better: Avoiding Pitfalls in Developing Language Resources when Data is Scarce
  436. BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
  437. SkillVerse: Assessing and Enhancing LLMs with Tree Evaluation
  438. CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era
  439. Empathy Prediction from Diverse Perspectives
  440. Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice
  441. INTERACT: Enabling Interactive, Question-Driven Learning in Large Language Models
  442. Circuit Stability Characterizes Language Model Generalization
  443. Comparing LLM-generated and human-authored news text using formal syntactic theory
  444. Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
  445. White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs
  446. AIMSCheck: Leveraging LLMs for AI-Assisted Review of Modern Slavery Statements Across Jurisdictions
  447. Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence
  448. SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence
  449. The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects
  450. Mitigating Shortcut Learning with InterpoLated Learning
  451. Toward Automatic Discovery of a Canine Phonetic Alphabet
  452. DavIR: Data Selection via Implicit Reward for Large Language Models
  453. Byte Latent Transformer: Patches Scale Better Than Tokens
  454. DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising
  455. Identifying Cellular Niches in Spatial Transcriptomics: An Investigation into the Capabilities of Large Language Models
  456. Culture Matters in Toxic Language Detection in Persian
  457. Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
  458. Instance-Selection-Inspired Undersampling Strategies for Bias Reduction in Small and Large Language Models for Binary Text Classification
  459. Forward Knows Efficient Backward Path: Saliency-Guided Memory-Efficient Fine-tuning of Large Language Models
  460. Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning
  461. LLMs + Persona-Plug = Personalized LLMs
  462. Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition
  463. IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in the Absence of Tabular Data
  464. INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
  465. Boosting Long-Context Information Seeking via Query-Guided Activation Refilling
  466. Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration
  467. AdaDHP: Fine-Grained Fine-Tuning via Dual Hadamard Product and Adaptive Parameter Selection
  468. KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph
  469. Curriculum Debiasing: Toward Robust Parameter-Efficient Fine-Tuning Against Dataset Biases
  470. Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
  471. On the Reliability of Large Language Models for Causal Discovery
  472. Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
  473. TeRDy: Temporal Relation Dynamics through Frequency Decomposition for Temporal Knowledge Graph Completion
  474. Incorporating Domain Knowledge into Materials Tokenization
  475. PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
  476. Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
  477. Semantic-Eval: A Semantic Comprehension Evaluation Framework for Large Language Models Generation without Training
  478. Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
  479. When to Speak, When to Abstain: Contrastive Decoding with Abstention
  480. On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs
  481. Investigating and Extending Homans’ Social Exchange Theory with Large Language Model based Agents
  482. A Drop-In Solution for On-the-Fly Adaptation of Speculative Decoding in Large Language Models
  483. If Attention Serves as a Cognitive Model of Human Memory Retrieval, What is the Plausible Memory Representation?
  484. Aligning VLM Assistants with Personalized Situated Cognition
  485. Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
  486. Faster Speculative Decoding via Effective Draft Decoder with Pruned Candidate Tree
  487. Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models
  488. Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents
  489. CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction
  490. Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling
  491. The Lawyer That Never Thinks: Consistency and Fairness as Keys to Reliable AI
  492. Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean
  493. SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
  494. ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
  495. InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes Under Herd Behavior
  496. Enhancing Neural Machine Translation Through Target Language Data: A kNN-LM Approach for Domain Adaptation
  497. Multi-level Relevance Document Identifier Learning for Generative Retrieval
  498. EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  499. Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder
  500. NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization
  501. HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
  502. Uni-Retrieval: A Multi-Style Retrieval Framework for STEM’s Education
  503. DenseLoRA: Dense Low-Rank Adaptation of Large Language Models
  504. Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis
  505. Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models
  506. Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach
  507. On Support Samples of Next Word Prediction
  508. WebWalker: Benchmarking LLMs in Web Traversal
  509. From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models
  510. AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
  511. Introducing Graph Context into Language Models through Parameter-Efficient Fine-Tuning for Lexical Relation Mining
  512. S-RAG: A Novel Audit Framework for Detecting Unauthorized Use of Personal Data in RAG Systems
  513. Praetor: A Fine-Grained Generative LLM Evaluator with Instance-Level Customizable Evaluation Criteria
  514. Mitigating Confounding in Speech-Based Dementia Detection through Weight Masking
  515. MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies
  516. The Knowledge Microscope: Features as Better Analytical Lenses than Neurons
  517. From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding
  518. PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance
  519. Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View
  520. ExpeTrans: LLMs Are Experiential Transfer Learners
  521. Cool-Fusion: Fuse Large Language Models without Training
  522. DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
  523. MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training
  524. LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
  525. APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
  526. PPT: A Minor Language News Recommendation Model via Cross-Lingual Preference Pattern Transfer
  527. GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis
  528. Top-n𝜎: Eliminating Noise in Logit Space for Robust Token Sampling of LLM
  529. SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation
  530. Mitigating Non-Representative Prototypes and Representation Bias in Few-Shot Continual Relation Extraction
  531. MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts
  532. PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration
  533. Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models
  534. GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
  535. Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
  536. DTCRS: Dynamic Tree Construction for Recursive Summarization
  537. A Generative Adaptive Replay Continual Learning Model for Temporal Knowledge Graph Reasoning
  538. ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search
  539. PKAG-DDI: Pairwise Knowledge-Augmented Language Model for Drug-Drug Interaction Event Text Generation
  540. Knowledge-Augmented Multimodal Clinical Rationale Generation for Disease Diagnosis with Small Language Models
  541. TWIST: Text-encoder Weight-editing for Inserting Secret Trojans in Text-to-Image Models
  542. Frictional Agent Alignment Framework: Slow Down and Don’t Break Things
  543. Powerformer: Efficient and High-Accuracy Privacy-Preserving Language Model with Homomorphic Encryption
  544. Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs
  545. Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision?
  546. Towards Enhanced Immersion and Agency for LLM-based Interactive Drama
  547. Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures
  548. Improving Factuality with Explicit Working Memory
  549. Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
  550. Dynamic Parallel Tree Search for Efficient LLM Reasoning
  551. Pre3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation
  552. SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL
  553. GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models
  554. Large Language and Protein Assistant for Protein-Protein Interactions Prediction
  555. An Empirical Study of Many-to-Many Summarization with Large Language Models
  556. Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models
  557. GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents
  558. TC–RAG: Turing–Complete RAG’s Case study on Medical LLM Systems
  559. SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning
  560. MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models
  561. Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG
  562. PwnGPT: Automatic Exploit Generation Based on Large Language Models
  563. VMLU Benchmarks: A comprehensive benchmark toolkit for Vietnamese LLMs
  564. Scaling up the State Size of RNN LLMs for Long-Context Scenarios
  565. Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes
  566. A Strategic Coordination Framework of Small LMs Matches Large LMs in Data Synthesis
  567. Defining and Evaluating Visual Language Models’ Basic Spatial Abilities: A Perspective from Psychometrics
  568. SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation
  569. User-side Model Consistency Monitoring for Open Source Large Language Models Inference Services
  570. Jailbreaking? One Step Is Enough!
  571. Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning
  572. PaSa: An LLM Agent for Comprehensive Academic Paper Search
  573. Less Mature is More Adaptable for Sentence-level Language Modeling
  574. EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts
  575. UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter Efficient Fine-Tuning of Large Models
  576. Agri-CM3: A Chinese Massive Multi-modal, Multi-level Benchmark for Agricultural Understanding and Reasoning
  577. TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification
  578. CaLMQA: Exploring culturally specific long-form question answering across 23 languages
  579. Croppable Knowledge Graph Embedding
  580. HyKGE: A Hypothesis Knowledge Graph Enhanced RAG Framework for Accurate and Reliable Medical LLMs Responses
  581. LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
  582. BeamLoRA: Beam-Constraint Low-Rank Adaptation
  583. GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art
  584. UniLR: Unleashing the Power of LLMs on Multiple Legal Tasks with a Unified Legal Retriever
  585. Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models
  586. Beyond Dialogue: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model
  587. ACECODER: Acing Coder RL via Automated Test-Case Synthesis
  588. Quantifying Semantic Emergence in Language Models
  589. DebateCoder: Towards Collective Intelligence of LLMs via Test Case Driven LLM Debate for Code Generation
  590. The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models
  591. GraphInsight: Unlocking Insights in Large Language Models for Graph Structure Understanding
  592. Phonotomizer: A Compact, Unsupervised, Online Training Approach to Real-Time, Multilingual Phonetic Segmentation
  593. A Multi-persona Framework for Argument Quality Assessment
  594. Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification
  595. SAM Decoding: Speculative Decoding via Suffix Automaton
  596. PsyAdvisor: A Plug-and-Play Strategy Advice Planner with Proactive Questioning in Psychological Conversations
  597. HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices
  598. Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
  599. GiFT: Gibbs Fine-Tuning for Code Generation
  600. Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models
  601. Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction
  602. RSCF: Relation-Semantics Consistent Filter for Entity Embedding of Knowledge Graph
  603. RolePlot: A Systematic Framework for Evaluating and Enhancing the Plot-Progression Capabilities of Role-Playing Agents
  604. TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
  605. Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model
  606. Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation
  607. SDPO: Segment-Level Direct Preference Optimization for Social Agents
  608. KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors
  609. SURVEYFORGE: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
  610. Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning
  611. AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
  612. Redundancy Principles for MLLMs Benchmarks
  613. WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models
  614. ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5
  615. Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
  616. Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization
  617. SINCon: Mitigate LLM-Generated Malicious Message Injection Attack for Rumor Detection
  618. Agentic Knowledgeable Self-awareness
  619. A Unified Agentic Framework for Evaluating Conditional Image Generation
  620. Planning-Driven Programming: A Large Language Model Programming Workflow
  621. Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering
  622. Nudging: Inference-time Alignment of LLMs via Guided Decoding
  623. Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing
  624. SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models
  625. HFT: Half Fine-Tuning for Large Language Models
  626. Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis
  627. From Objectives to Questions: A Planning-based Framework for Educational Mathematical Question Generation
  628. RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts
  629. Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
  630. Accurate KV Cache Quantization with Outlier Tokens Tracing
  631. Can Large Language Models Understand Internet Buzzwords Through User-Generated Content
  632. EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
  633. Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models through Bidirectional Hidden State Intervention
  634. Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models
  635. Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback
  636. Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging
  637. MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation
  638. Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging
  639. CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention
  640. Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching
  641. Qwen2.5-xCoder: Multi-Agent Collaboration for Multilingual Code Instruction Tuning
  642. Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts
  643. Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
  644. Extending Complex Logical Queries on Uncertain Knowledge Graphs
  645. Knowledge Decoupling via Orthogonal Projection for Lifelong Editing of Large Language Models
  646. 𝜙-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
  647. Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?
  648. Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
  649. Inducing lexicons of in-group language with socio-temporal context
  650. LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
  651. MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
  652. Efficient OpAmp Adaptation for Zoom Attention to Golden Contexts
  653. Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
  654. Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger
  655. MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark
  656. Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
  657. Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch
  658. DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing
  659. PQR: Improving Dense Retrieval via Potential Query Modeling
  660. Cross-Lingual Generalization and Compression: From Language-Specific to Shared Neurons
  661. SDBench: A Survey-based Domain-specific LLM Benchmarking and Optimization Framework
  662. ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents
  663. Lexical Recall or Logical Reasoning: Probing the Limits of Reasoning Abilities in Large Language Models
  664. ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains
  665. HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
  666. Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models
  667. Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
  668. TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition
  669. CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models
  670. STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
  671. Mimicking the Familiar: Dynamic Command Generation for Information Theft Attacks in LLM Tool-Learning System
  672. FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio Generation
  673. How does Misinformation Affect Large Language Model Behaviors and Preferences?
  674. YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
  675. GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding
  676. MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis
  677. A Training-free LLM-based Approach to General Chinese Character Error Correction
  678. HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
  679. MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
  680. SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
  681. Recent Advances in Speech Language Models: A Survey
  682. LexCLiPR: Cross-Lingual Paragraph Retrieval from Legal Judgments
  683. Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries
  684. SPECTRA: Faster Large Language Model Inference with Optimized Internal and External Speculation
  685. Multi-level Association Refinement Network for Dialogue Aspect-based Sentiment Quadruple Analysis
  686. Innovative Image Fraud Detection with Cross-Sample Anomaly Analysis: The Power of LLMs
  687. Cooperative or Competitive? Understanding the Interaction between Attention Heads From A Game Theory Perspective
  688. MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification
  689. Graph-Structured Trajectory Extraction from Travelogues
  690. Learning First-Order Logic Rules for Argumentation Mining
  691. Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency
  692. UniRAG: Unified Query Understanding Method for Retrieval Augmented Generation
  693. Contextual Experience Replay for Self-Improvement of Language Agents
  694. Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
  695. Towards Comprehensive Argument Analysis in Education: Dataset, Tasks, and Method
  696. Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking
  697. MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation
  698. Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
  699. Can MLLMs Understand the Deep Implication Behind Chinese Images?
  700. KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan
  701. Towards Multi-dimensional Evaluation of LLM Summarization across Domains and Languages
  702. ClusterAttn: KV Cache Compression under Intrinsic Attention Clustering
  703. SHARE: Shared Memory-Aware Open-Domain Long-Term Dialogue Dataset Constructed from Movie Script
  704. Incongruity-aware Tension Field Network for Multi-modal Sarcasm Detection
  705. Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh
  706. Stealing Training Data from Large Language Models in Decentralized Training through Activation Inversion Attack
  707. From Selection to Generation: A Survey of LLM-based Active Learning
  708. OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
  709. DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning
  710. EAGLE: Expert-Guided Self-Enhancement for Preference Alignment in Pathology Large Vision-Language Model
  711. CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations
  712. Flexora: Flexible Low-Rank Adaptation for Large Language Models
  713. QDTSynth: Quality-Driven Formal Theorem Synthesis for Enhancing Proving Performance of LLMs
  714. RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
  715. QAEval: Mixture of Evaluators for Question-Answering Task Evaluation
  716. Debiasing the Fine-Grained Classification Task in LLMs with Bias-Aware PEFT
  717. Demystifying Small Language Models for Edge Deployment
  718. Adapt Once, Thrive with Updates: Transferable Parameter-Efficient Fine-Tuning on Evolving Base Models
  719. Can Vision-Language Models Evaluate Handwritten Math?
  720. Continual Gradient Low-Rank Projection Fine-Tuning for LLMs
  721. Towards Objective Fine-tuning: How LLMs’ Prior Knowledge Causes Potential Poor Calibration?
  722. Towards Robust ESG Analysis Against Greenwashing Risks: Aspect-Action Analysis with Cross-Category Generalization
  723. HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States
  724. SwiLTra-Bench: The Swiss Legal Translation Benchmark
  725. Two Intermediate Translations Are Better Than One: Fine-tuning LLMs for Document-level Translation Refinement
  726. Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
  727. Can LLMs Ground when they (Don’t) Know: A Study on Direct and Loaded Political Questions
  728. GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking
  729. SCULPT: Systematic Tuning of Long Prompts
  730. Crab: A Novel Configurable Role-Playing LLM with Assessing Benchmark
  731. Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
  732. TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis
  733. Cross-Lingual Optimization for Language Transfer in Large Language Models
  734. CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling
  735. MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
  736. Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch
  737. Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region
  738. LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering
  739. Efficient Long Context Language Model Retrieval with Compression
  740. Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering
  741. Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications
  742. Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals
  743. A New Formulation of Zipf’s Meaning-Frequency Law through Contextual Diversity
  744. The Mirage of Model Editing: Revisiting Evaluation in the Wild
  745. LAQuer: Localized Attribution Queries in Content-grounded Generation
  746. EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
  747. DCG-SQL: Enhancing In-Context Learning for Text-to-SQL with Deep Contextual Schema Link Graph
  748. PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy
  749. Digest the Knowledge: Large Language Models empowered Message Passing for Knowledge Graph Question Answering
  750. RecLM: Recommendation Instruction Tuning
  751. DS2-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment Analysis
  752. MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization
  753. Learning Together to Perform Better: Teaching Small-Scale LLMs to Collaborate via Preferential Rationale Tuning
  754. MolRAG: Unlocking the Power of Large Language Models for Molecular Property Prediction
  755. SkillAggregation: Reference-free LLM-Dependent Aggregation
  756. MasRouter: Learning to Route LLMs for Multi-Agent Systems
  757. Beyond Single Labels: Improving Conversational Recommendation through LLM-Powered Data Augmentation
  758. Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Evaluation
  759. iQUEST: An Iterative Question-Guided Framework for Knowledge Base Question Answering
  760. IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory
  761. MLAS-LoRA: Language-Aware Parameters Detection and LoRA-Based Knowledge Transfer for Multilingual Machine Translation
  762. M2RC-EVAL: Massively Multilingual Repository-level Code Completion Evaluation
  763. Evaluating Design Decisions for Dual Encoder-based Entity Disambiguation
  764. How to Compare Things Properly? A Study of Argument Relevance in Comparative Question Answering
  765. FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging
  766. Controllable Style Arithmetic with Language Models
  767. Masks Can be Learned as an Alternative to Experts
  768. Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment
  769. Removal of Hallucination on Hallucination: Debate-Augmented RAG
  770. CodeDPO: Aligning Code Models with Self Generated and Verified Source Code
  771. ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering
  772. BOOKWORLD: From Novels to Interactive Agent Societies for Story Creation
  773. Quantifying Lexical Semantic Shift via Unbalanced Optimal Transport
  774. Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
  775. Adaptive and Robust Translation from Natural Language to Multi-model Query Languages
  776. SAKE: Steering Activations for Knowledge Editing
  777. Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs
  778. Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?
  779. One for All: Update Parameterized Knowledge Across Multiple Models with Once Edit
  780. VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service
  781. The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
  782. CrisisTS: Coupling Social Media Textual Data and Meteorological Time Series for Urgency Classification
  783. How to Mitigate Overfitting in Weak-to-strong Generalization?
  784. Com2: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models
  785. Dynamic Head Selection for Neural Lexicalized Constituency Parsing
  786. My Words Imply Your Opinion: Reader Agent-Based Propagation Enhancement for Personalized Implicit Emotion Analysis
  787. EvolveBench: A Comprehensive Benchmark for Assessing Temporal Awareness in LLMs on Evolving Knowledge
  788. Enabling LLM Knowledge Analysis via Extensive Materialization
  789. Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
  790. Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs
  791. CritiQ: Mining Data Quality Criteria from Human Preferences
  792. Theoretical Guarantees for Minimum Bayes Risk Decoding
  793. Mutual-Taught for Co-adapting Policy and Reward Models
  794. Enhancing Cross-Lingual Transfer through Reversible Transliteration: A Huffman-Based Approach for Low-Resource Languages
  795. Unmasking Style Sensitivity: A Causal Analysis of Bias Evaluation Instability in Large Language Models
  796. MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines
  797. BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning
  798. What Matters in Evaluating Book-Length Stories? A Systematic Study of Long Story Evaluation
  799. PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation
  800. Enhancing Event-centric News Cluster Summarization via Data Sharpening and Localization Insights
  801. MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration
  802. LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios
  803. Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering
  804. M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs
  805. RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning Based on Emotional Information
  806. Task-Specific Information Decomposition for End-to-End Dense Video Captioning
  807. CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges
  808. Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection
  809. Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
  810. PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
  811. Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
  812. Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training
  813. Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech?
  814. Neuron-Level Sequential Editing for Large Language Models
  815. Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts
  816. SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
  817. VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
  818. RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
  819. The Role of Deductive and Inductive Reasoning in Large Language Models
  820. Disentangling the Roles of Representation and Selection in Data Pruning
  821. FRACTAL: Fine-Grained Scoring from Aggregate Text Labels
  822. ACT: Knowledgeable Agents to Design and Perform Complex Tasks
  823. Logical forms complement probability in understanding language model (and human) performance
  824. Length Controlled Generation for Black-box LLMs
  825. Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization
  826. Global Eye: Breaking the “Fixed Thinking Pattern” during the Instruction Expansion Process
  827. On Synthesizing Data for Context Attribution in Question Answering
  828. TST: A Schema-Based Top-Down and Dynamic-Aware Agent of Text-to-Table Tasks
  829. EventRAG: Enhancing LLM Generation with Event Knowledge Graphs
  830. Analyzing the Rapid Generalization of SFT via the Perspective of Attention Head Activation Patterns
  831. Can’t See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
  832. Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling
  833. TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning
  834. DualGuard: A Parameter Space Transformation Approach for Bidirectional Defense in Split-Based LLM Fine-Tuning
  835. Movie101v2: Improved Movie Narration Benchmark
  836. Can LLMs Evaluate Complex Attribution in QA? Automatic Benchmarking using Knowledge Graphs
  837. Value Portrait: Assessing Language Models’ Values through Psychometrically and Ecologically Valid Items
  838. FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
  839. Do not Abstain! Identify and Solve the Uncertainty
  840. Decoding by Contrasting Knowledge: Enhancing Large Language Model Confidence on Edited Facts
  841. ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos
  842. Improving Chain-of-Thought Reasoning via Quasi-Symbolic Abstractions
  843. Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual Segments
  844. Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub
  845. LLMs Can Simulate Standardized Patients via Agent Coevolution
  846. Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts
  847. Which Demographics do LLMs Default to During Annotation?
  848. Can You Really Trust Code Copilot? Evaluating Large Language Models from a Code Security Perspective
  849. From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MarkerGen
  850. AGD: Adversarial Game Defense Against Jailbreak Attacks in Large Language Models
  851. SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View
  852. Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning
  853. An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)
  854. Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
  855. Hierarchical Attention Generates Better Proofs
  856. Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents
  857. It’s Not Bragging If You Can Back It Up: Can LLMs Understand Braggings?
  858. A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns
  859. Meta-Learning Neural Mechanisms rather than Bayesian Priors
  860. Shifting from Ranking to Set Selection for Retrieval Augmented Generation
  861. Understanding Large Language Model Vulnerabilities to Social Bias Attacks
  862. ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents
  863. Pixel-Level Reasoning Segmentation via Multi-turn Conversations
  864. Fixing Distribution Shifts of LLM Self-Critique via On-Policy Self-Play Training
  865. Inferring Functionality of Attention Heads from their Parameters
  866. Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations
  867. Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
  868. Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation
  869. From Human Reading to NLM Understanding: Evaluating the Role of Eye-Tracking Data in Encoder-Based Models
  870. Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering
  871. Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
  872. SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
  873. ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models
  874. Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study
  875. From Isolates to Families: Using Neural Networks for Automated Language Affiliation
  876. ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models
  877. Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
  878. When Harry Meets Superman: The Role of The Interlocutor in Persona-Based Dialogue Generation
  879. ICRProbe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs
  880. Revisit Self-Debugging with Self-Generated Tests for Code Generation
  881. InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
  882. Exploring LLMs’ Ability to Spontaneously and Conditionally Modify Moral Expressions through Text Manipulation
  883. Mixture of Ordered Scoring Experts for Cross-prompt Essay Trait Scoring
  884. Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
  885. Enhancing Spoken Discourse Modeling in Language Models Using Gestural Cues
  886. ExploraCoder: Advancing Code Generation for Multiple Unseen APIs via Planning and Chained Exploration
  887. Segment First or Comprehend First? Explore the Limit of Unsupervised Word Segmentation with Large Language Models
  888. RUBY: An Effective Framework for Multi-Constraint Multi-Hop Question Generation
  889. Can Indirect Prompt Injection Attacks Be Detected and Removed?
  890. Identifying Open Challenges in Language Identification
  891. The Distracting Effect: Understanding Irrelevant Passages in RAG
  892. Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages
  893. Graphically Speaking: Unmasking Abuse in Social Media with Conversation Insights
  894. CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision
  895. RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models
  896. Defense Against Prompt Injection Attack by Leveraging Attack Techniques
  897. Acquisition and Application of Novel Knowledge in Large Language Models
  898. DNCASR: End-to-End Training for Speaker-Attributed ASR
  899. Exploring Persona Sentiment Sensitivity in Personalized Dialogue Generation
  900. AntiLeakBench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
  901. LLM-Guided Semantic-Aware Clustering for Topic Modeling
  902. Hierarchical Bracketing Encodings for Dependency Parsing as Tagging
  903. OASIS: Order-Augmented Strategy for Improved Code Search
  904. Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
  905. OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
  906. Tree-KG: An Expandable Knowledge Graph Construction Framework for Knowledge-intensive Domains
  907. Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
  908. Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning
  909. Minimal Pair-Based Evaluation of Code-Switching
  910. DNASpeech: A Contextualized and Situated Text-to-Speech Dataset with Dialogues, Narratives and Actions
  911. LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
  912. Error Comparison Optimization for Large Language Models on Aspect-Based Sentiment Analysis
  913. The AI Gap: How Socioeconomic Status Affects Language Technology Interactions
  914. Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set
  915. Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
  916. Soundwave: Less is More for Speech-Text Alignment in LLMs
  917. RoToR: Towards More Reliable Responses for Order-Invariant Inputs
  918. Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
  919. Improving Dialogue Discourse Parsing through Discourse-aware Utterance Clarification
  920. ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs
  921. Words of Warmth: Trust and Sociability Norms for over 26k English Words
  922. BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models
  923. HAF-RM: A Hybrid Alignment Framework for Reward Model Training
  924. CULEMO: Cultural Lenses on Emotion - Benchmarking LLMs for Cross-Cultural Emotion Understanding
  925. DiffPO: Diffusion-styled Preference Optimization for Inference Time Alignment of Large Language Models
  926. MemeQA: Holistic Evaluation for Meme Understanding
  927. LoGU: Long-form Generation with Uncertainty Expressions
  928. KiRAG: Knowledge-Driven Iterative Retriever for Enhancing Retrieval-Augmented Generation
  929. Enhancing Lexicon-Based Text Embeddings with Large Language Models
  930. CoCoLex: Confidence-guided Copy-based Decoding for Grounded Legal Text Generation
  931. Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization
  932. CC-Tuning: A Cross-Lingual Connection Mechanism for Improving Joint Multilingual Supervised Fine-Tuning
  933. SConU: Selective Conformal Uncertainty in Large Language Models
  934. MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
  935. When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs
  936. UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
  937. KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language Models
  938. Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress
  939. Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models
  940. Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
  941. PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings
  942. Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
  943. Tunable LLM-based Proactive Recommendation Agent
  944. AgentRM: Enhancing Agent Generalization with Reward Modeling
  945. From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment
  946. Segment-Based Attention Masking for GPTs
  947. Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
  948. Bi-Tuning with Collaborative Information for Controllable LLM-based Sequential Recommendation
  949. A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
  950. DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts
  951. DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression
  952. Computation Mechanism Behind LLM Position Generalization
  953. IPO: Your Language Model is Secretly a Preference Classifier
  954. Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
  955. Déjà Vu? Decoding Repeated Reading from Eye Movements
  956. LLMs can be easily Confused by Instructional Distractions
  957. PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
  958. IAM: Efficient Inference through Attention Mapping between Different-scale LLMs
  959. nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow
  960. ZIPA: A family of efficient models for multilingual phone recognition
  961. GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration
  962. Dynamic Evaluation with Cognitive Reasoning for Multi-turn Safety of Large Language Models
  963. From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions
  964. Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
  965. Multilingual Text-to-Image Generation Magnifies Gender Stereotypes
  966. Adversarial Alignment with Anchor Dragging Drift (A3D2): Multimodal Domain Adaptation with Partially Shifted Modalities
  967. A Reality Check on Context Utilisation for Retrieval-Augmented Generation
  968. CU-MAM: Coherence-Driven Unified Macro-Structures for Argument Mining
  969. Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
  970. Text-to-ESBench: A Comprehensive Benchmark for Converting Natural Language to Elasticsearch Query
  971. AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
  972. DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal
  973. Steering off Course: Reliability Challenges in Steering Language Models
  974. Impartial Multi-task Representation Learning via Variance-invariant Probabilistic Decoding
  975. If Eleanor Rigby Had Met ChatGPT: A Study on Loneliness in a Post-LLM World
  976. Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation
  977. Vulnerability of LLMs to Vertically Aligned Text Manipulations
  978. AutoMixer: Checkpoint Artifacts as Automatic Data Mixers
  979. Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow
  980. Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering
  981. AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark
  982. We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
  983. Modeling the Evolution of English Noun Compounds with Feature-Rich Diachronic Compositionality Prediction
  984. What’s the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
  985. V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
  986. Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension
  987. Improving Language and Modality Transfer in Translation by Character-level Modeling
  988. DialUp! Modeling the Language Continuum by Adapting Models to Dialects and Dialects to Models
  989. AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs
  990. Modeling Complex Semantics Relation with Contrastively Fine-Tuned Relational Encoders
  991. Error-driven Data-efficient Large Multimodal Model Tuning
  992. Planning with Diffusion Models for Target-Oriented Dialogue Systems
  993. Interactive and Expressive Code-Augmented Planning with Large Language Models
  994. Synergistic Weak-Strong Collaboration by Aligning Preferences
  995. Understanding Silent Data Corruption in LLM Training
  996. Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
  997. Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
  998. BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
  999. Deep Temporal Reasoning in Video Language Models: A Cross-Linguistic Evaluation of Action Duration and Completion through Perfect Times
  1000. Amplifying Trans and Nonbinary Voices: A Community-Centred Harm Taxonomy for LLMs
  1001. Enhancing Human Evaluation in Machine Translation with Comparative Judgement
  1002. Infogen: Generating Complex Statistical Infographics from Documents
  1003. Partial Colexifications Improve Concept Embeddings
  1004. Improved Unbiased Watermark for Large Language Models
  1005. MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection
  1006. Multi-Attribute Steering of Language Models via Targeted Intervention
  1007. AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations
  1008. Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
  1009. On the Acquisition of Shared Grammatical Representations in Bilingual Language Models
  1010. Using Shapley interactions to understand how models use structure
  1011. Adversarial Tokenization
  1012. Classifying Unreliable Narrators with Large Language Models
  1013. ConceptCarve: Dynamic Realization of Evidence
  1014. QQSUM: A Novel Task and Model of Quantitative Query-Focused Summarization for Review-based Product Question Answering
  1015. Navigating Rifts in Human-LLM Grounding: Study and Benchmark
  1016. Substance over Style: Evaluating Proactive Conversational Coaching Agents
  1017. Open-World Planning via Lifted Regression with LLM-Inferred Affordances for Embodied Agents
  1018. (RSA)²: A Rhetorical-Strategy-Aware Rational Speech Act Framework for Figurative Language Understanding
  1019. SYNTHIA: Novel Concept Design with Affordance Composition
  1020. Consistent Client Simulation for Motivational Interviewing-based Counseling
  1021. AUTALIC: A Dataset for Anti-AUTistic Ableist Language In Context
  1022. Structural Reasoning Improves Molecular Understanding of LLM
  1023. CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration
  1024. Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles
  1025. Targeted Syntactic Evaluation for Grammatical Error Correction
  1026. VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
  1027. Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions
  1028. TESS2: A Large-Scale Generalist Diffusion Language Model
  1029. KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis
  1030. Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL
  1031. On Generalization across Measurement Systems: LLMs Entail More Test-Time Compute for Underrepresented Cultures
  1032. CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
  1033. Veracity Bias and Beyond: Uncovering LLMs’ Hidden Beliefs in Problem-Solving Reasoning
  1034. Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
  1035. LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study
  1036. Beyond Frameworks: Unpacking Collaboration Strategies in Multi-Agent Systems
  1037. The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation
  1038. K/DA: Automated Data Generation Pipeline for Detoxifying Implicitly Offensive Language in Korean
  1039. THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation
  1040. Neuron Empirical Gradient: Discovering and Quantifying Neurons’ Global Linear Controllability
  1041. Can Third Parties Read Our Emotions?
  1042. OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
  1043. World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
  1044. JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
  1045. CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models
  1046. Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models
  1047. Enhancing Mathematical Reasoning in LLMs by Stepwise Correction
  1048. PsyDial: A Large-scale Long-term Conversational Dataset for Mental Health Support
  1049. Enhancing Goal-oriented Proactive Dialogue Systems via Consistency Reflection and Correction
  1050. Exclusion of Thought: Mitigating Cognitive Load in Large Language Models for Enhanced Reasoning in Multiple-Choice Tasks
  1051. Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation
  1052. VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
  1053. Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models
  1054. LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
  1055. Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback
  1056. PerSphere: A Comprehensive Framework for Multi-Faceted Perspective Retrieval and Summarization
  1057. Prompt-Guided Internal States for Hallucination Detection of Large Language Models
  1058. Typology-Guided Adaptation in Multilingual Models
  1059. Don’t Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections
  1060. ECLM: Entity Level Language Model for Spoken Language Understanding with Chain of Intent
  1061. FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation
  1062. Knowledge Image Matters: Improving Knowledge-Based Visual Reasoning with Multi-Image Large Language Models
  1063. Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity
  1064. GUICourse: From General Vision Language Model to Versatile GUI Agent
  1065. Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
  1066. Maximizing the Effectiveness of Larger BERT Models for Compression
  1067. Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
  1068. HACo-Det: A Study Towards Fine-Grained Machine-Generated Text Detection under Human-AI Coauthoring
  1069. IndicSynth: A Large-Scale Multilingual Synthetic Speech Dataset for Low-Resource Indian Languages
  1070. Reinforced IR: A Self-Boosting Framework For Domain-Adapted Information Retrieval
  1071. CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
  1072. Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment
  1073. JoPA: Explaining Large Language Model’s Generation via Joint Prompt Attribution
  1074. Proxy-Driven Robust Multimodal Sentiment Analysis with Incomplete Data
  1075. Not All Terms Matter: Recall-Oriented Adaptive Learning for PLM-aided Query Expansion in Open-Domain Question Answering
  1076. A Mutual Information Perspective on Knowledge Graph Embedding
  1077. Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race
  1078. IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
  1079. ProMALex: Progressive Modular Adapters for Multi-Jurisdictional Legal Language Modeling
  1080. Flipping Knowledge Distillation: Leveraging Small Models’ Expertise to Enhance LLMs in Text Matching
  1081. Disentangling Language and Culture for Evaluating Multilingual Large Language Models
  1082. Detecting Sockpuppetry on Wikipedia Using Meta-Learning
  1083. Diversity-oriented Data Augmentation with Large Language Models
  1084. CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation
  1085. RiOT: Efficient Prompt Refinement with Residual Optimization Tree
  1086. Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
  1087. Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
  1088. Mitigating Lost-in-Retrieval Problems in Retrieval Augmented Multi-Hop Question Answering
  1089. TableLoRA: Low-rank Adaptation on Table Structure Understanding for Large Language Models
  1090. Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
  1091. CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis
  1092. Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis
  1093. ChartLens: Fine-grained Visual Attribution in Charts
  1094. LESA: Learnable LLM Layer Scaling-Up
  1095. MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
  1096. Towards the Law of Capacity Gap in Distilling Language Models
  1097. WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
  1098. Keys to Robust Edits: From Theoretical Insights to Practical Advances
  1099. Boosting LLM’s Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning
  1100. MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation
  1101. The Role of Visual Modality in Multimodal Mathematical Reasoning: Challenges and Insights
  1102. The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters
  1103. S2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
  1104. Advancing Collaborative Debates with Role Differentiation through Multi-Agent Reinforcement Learning
  1105. Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation
  1106. STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond
  1107. XDAC: XAI-Driven Detection and Attribution of LLM-Generated News Comments in Korean
  1108. CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference
  1109. Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch
  1110. EdiText: Controllable Coarse-to-Fine Text Editing with Diffusion Language Models
  1111. TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
  1112. Look Both Ways and No Sink: Converting LLMs into Text Encoders without Training
  1113. A Statistical and Multi-Perspective Revisiting of the Membership Inference Attack in Large Language Models
  1114. Around the World in 24 Hours: Probing LLM Knowledge of Time and Place
  1115. Mining the uncertainty patterns of humans and models in the annotation of moral foundations and human values
  1116. “What do you call a dog that is incontrovertibly true? Dogma”: Testing LLM Generalization through Humor
  1117. Towards Harmonized Uncertainty Estimation for Large Language Models
  1118. VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare
  1119. Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media
  1120. From English to Second Language Mastery: Enhancing LLMs with Cross-Lingual Continued Instruction Tuning
  1121. WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermarks
  1122. HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
  1123. One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments
  1124. Beyond Logits: Aligning Feature Dynamics for Effective Knowledge Distillation
  1125. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  1126. DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics
  1127. MT-RAIG: Novel Benchmark and Evaluation Framework for Retrieval-Augmented Insight Generation over Multiple Tables
  1128. Enhancing Chain-of-Thought Reasoning with Critical Representation Fine-tuning
  1129. Does the Emotional Understanding of LVLMs Vary Under High-Stress Environments and Across Different Demographic Attributes?
  1130. S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling
  1131. Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention
  1132. Tracing and Dissecting How LLMs Recall Factual Knowledge for Real World Questions
  1133. Employing Discourse Coherence Enhancement to Improve Cross-Document Event and Entity Coreference Resolution
  1134. Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
  1135. Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
  1136. SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
  1137. FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
  1138. Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
  1139. MobiLoRA: Accelerating LoRA-based LLM Inference on Mobile Devices via Context-aware KV Cache Optimization
  1140. Language Models Resist Alignment: Evidence From Data Compression
  1141. Beyond the Answer: Advancing Multi-Hop QA with Fine-Grained Graph Reasoning and Evaluation
  1142. Mamba Knockout for Unraveling Factual Information Flow
  1143. Small Changes, Big Impact: How Manipulating a Few Neurons Can Drastically Alter LLM Aggression
  1144. Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models
  1145. Curiosity-Driven Reinforcement Learning from Human Feedback
  1146. T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
  1147. CoE: A Clue of Emotion Framework for Emotion Recognition in Conversations
  1148. MPO: Multilingual Safety Alignment via Reward Gap Optimization
  1149. QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
  1150. On the Relation Between Fine-Tuning, Topological Properties, and Task Performance in Sense-Enhanced Embeddings
  1151. Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?
  1152. Don’t Half-listen: Capturing Key-part Information in Continual Instruction Tuning
  1153. Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction
  1154. Exploring Explanations Improves the Robustness of In-Context Learning
  1155. Prediction Hubs are Context-Informed Frequent Tokens in LLMs
  1156. Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law
  1157. CRUXEVAL-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution
  1158. Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs
  1159. Rubrik’s Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset
  1160. A Dual-Mind Framework for Strategic and Expressive Negotiation Agent
  1161. Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models
  1162. Revisiting Scaling Laws for Language Models: The Role of Data Quality and Training Strategies
  1163. Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments
  1164. Enhancing Machine Translation with Self-Supervised Preference Data
  1165. Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval
  1166. Don’t Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
  1167. MEXMA: Token-level objectives improve sentence representations
  1168. Uncertainty-Aware Iterative Preference Optimization for Enhanced LLM Reasoning
  1169. AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration
  1170. Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
  1171. Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language
  1172. Representation Bending for Large Language Model Safety
  1173. Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations
  1174. Enhancing Retrieval-Augmented Generation via Evidence Tree Search
  1175. HalluLens: LLM Hallucination Benchmark
  1176. DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling
  1177. Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models
  1178. InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
  1179. GRaMPa: Subword Regularisation by Skewing Uniform Segmentation Distributions with an Efficient Path-counting Markov Model
  1180. Evaluating the Evaluation of Diversity in Commonsense Generation
  1181. Generate First, Then Sample: Enhancing Fake News Detection with LLM-Augmented Reinforced Sampling
  1182. ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data
  1183. Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception
  1184. ALGEN: Few-shot Inversion Attacks on Textual Embeddings via Cross-Model Alignment and Generation
  1185. Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed Chains
  1186. STaR-SQL: Self-Taught Reasoner for Text-to-SQL
  1187. Fairness Beyond Performance: Revealing Reliability Disparities Across Groups in Legal NLP
  1188. Beyond Similarity: A Gradient-based Graph Method for Instruction Tuning Data Selection
  1189. FastMCTS: A Simple Sampling Strategy for Data Synthesis
  1190. Dialogue-RAG: Enhancing Retrieval for LLMs via Node-Linking Utterance Rewriting
  1191. Using Information Theory to Characterize Prosodic Typology: The Case of Tone, Pitch-Accent and Stress-Accent
  1192. Evaluating LLMs for Portuguese Sentence Simplification with Linguistic Insights
  1193. LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models
  1194. Improving Low-Resource Morphological Inflection via Self-Supervised Objectives
  1195. Don’t Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation
  1196. BOOKCOREF: Coreference Resolution at Book Scale
  1197. OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
  1198. Alleviating Hallucinations from Knowledge Misalignment in Large Language Models via Selective Abstention Learning
  1199. Retrospective Learning from Interactions
  1200. Personalized Generation In Large Model Era: A Survey
  1201. Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning
  1202. SOTOPIA-Ω: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents
  1203. Can Language Models Replace Programmers for Coding? REPOCOD Says ‘Not Yet’
  1204. Leveraging In-Context Learning for Political Bias Testing of LLMs
  1205. ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting
  1206. LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts
  1207. WAFFLE: Fine-tuning Multi-Modal Model for Automated Front-End Development
  1208. Math Neurosurgery: Isolating Language Models’ Math Reasoning Abilities Using Only Forward Passes
  1209. Multiple LLM Agents Debate for Equitable Cultural Alignment
  1210. RefreshKV: Updating Small KV Cache During Long-form Generation
  1211. SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
  1212. Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
  1213. Language Models Grow Less Humanlike beyond Phase Transition
  1214. PCoT: Persuasion-Augmented Chain of Thought for Detecting Fake News and Social Media Disinformation
  1215. Coordinating Chaos: A Structured Review of Linguistic Coordination Methodologies
  1216. iNews: A Multimodal Dataset for Modeling Personalized Affective Responses to News
  1217. Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures
  1218. 500xCompressor: Generalized Prompt Compression for Large Language Models
  1219. Estimating Privacy Leakage of Augmented Contextual Knowledge in Language Models
  1220. Document-Level Event-Argument Data Augmentation for Challenging Role Types
  1221. Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus
  1222. Unravelling the Logic: Investigating the Generalisation of Transformers in Numerical Satisfiability Problems
  1223. The Nature of NLP: Analyzing Contributions in NLP Papers
  1224. GeLLM³O: Generalizing Large Language Models for Multi-property Molecule Optimization
  1225. Follow-up Question Generation For Enhanced Patient-Provider Conversations
  1226. Unveiling Privacy Risks in LLM Agent Memory
  1227. Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation
  1228. Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal Summarization
  1229. PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
  1230. Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets
  1231. Library-Like Behavior In Language Models is Enhanced by Self-Referencing Causal Cycles
  1232. Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language Models
  1233. ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution
  1234. ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework
  1235. SARA: Salience-Aware Reinforced Adaptive Decoding for Large Language Models in Abstractive Summarization
  1236. Embedding-Converter: A Unified Framework for Cross-Model Embedding Transformation
  1237. Improving Automatic Evaluation of Large Language Models (LLMs) in Biomedical Relation Extraction via LLMs-as-the-Judge
  1238. Answering Complex Geographic Questions by Adaptive Reasoning with Visual Context and External Commonsense Knowledge
  1239. Safety Alignment via Constrained Knowledge Unlearning
  1240. Response Wide Shut? Surprising Observations in Basic Vision Language Model Capabilities
  1241. EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
  1242. Pre-Training Curriculum for Multi-Token Prediction in Language Models
  1243. Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks
  1244. On Many-Shot In-Context Learning for Long-Context Evaluation
  1245. HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks
  1246. CulturalBench: A Robust, Diverse and Challenging Benchmark for Measuring LMs’ Cultural Knowledge Through Human-AI Red-Teaming
  1247. Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning
  1248. All That Glitters is Not Novel: Plagiarism in AI Generated Research
  1249. Writing Like the Best: Exemplar-Based Expository Text Generation
  1250. Temporal Relation Extraction in Clinical Texts: A Span-based Graph Transformer Approach
  1251. Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for Chatbots
  1252. Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer’s Disease Detection
  1253. Help Me Write a Story: Evaluating LLMs’ Ability to Generate Writing Feedback
  1254. Language Fusion for Parameter-Efficient Cross-lingual Transfer
  1255. Culture is Not Trivia: Sociocultural Theory for Cultural NLP
  1256. AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
  1257. Do Language Models Have Semantics? On the Five Standard Positions
  1258. Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems
  1259. Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
  1260. HumT DumT: Measuring and controlling human-like language in LLMs
  1261. ChatBench: From Static Benchmarks to Human-AI Evaluation
  1262. Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
  1263. Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs
  1264. Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat
  1265. LLM Agents Making Agent Tools
  1266. CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World
  1267. QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation
  1268. Causal Graph based Event Reasoning using Semantic Relation Experts
  1269. LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
  1270. Do LLMs Understand Dialogues? A Case Study on Dialogue Acts
  1271. Research Borderlands: Analysing Writing Across Research Cultures
  1272. CEAES: Bidirectional Reinforcement Learning Optimization for Consistent and Explainable Essay Assessment
  1273. DeAL: Decoding-time Alignment for Large Language Models
  1274. Cultural Bias Matters: A Cross-Cultural Benchmark Dataset and Sentiment-Enriched Model for Understanding Multimodal Metaphors
  1275. OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
  1276. Mixtures of In-Context Learners
  1277. Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation
  1278. RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection
  1279. Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
  1280. Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
  1281. MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
  1282. The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
  1283. Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging
  1284. BIG-Bench Extra Hard
  1285. CSTree-SRI: Introspection-Driven Cognitive Semantic Tree for Multi-Turn Question Answering over Extra-Long Contexts
  1286. InductionBench: LLMs Fail in the Simplest Complexity Class
  1287. RATIONALYST: Pre-training Process-Supervision for Improving Reasoning
  1288. Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation
  1289. Advancing SMoE for Continuous Domain Adaptation of MLLMs: Adaptive Router and Domain-Specific Loss
  1290. Multi-document Summarization through Multi-document Event Relation Graph Reasoning in LLMs: a case study in Framing Bias Mitigation
  1291. Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection
  1292. RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates
  1293. Scaling Laws and Efficient Inference for Ternary Language Models
  1294. Exploring the Impact of Instruction-Tuning on LLM’s Susceptibility to Misinformation
  1295. Do Language Models Understand Honorific Systems in Javanese?
  1296. Generative Reward Modeling via Synthetic Criteria Preference Learning
  1297. Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning
  1298. A Self-Denoising Model for Robust Few-Shot Relation Extraction
  1299. QuASAR: A Question-Driven Structure-Aware Approach for Table-to-Text Generation
  1300. Automated Structured Radiology Report Generation
  1301. LPOI: Listwise Preference Optimization for Vision Language Models
  1302. Predicting Through Generation: Why Generation Is Better for Prediction
  1303. “Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization
  1304. StitchLLM: Serving LLMs, One Block at a Time
  1305. Walk in Others’ Shoes with a Single Glance: Human-Centric Visual Grounding with Top-View Perspective Transformation
  1306. Is linguistically-motivated data augmentation worth it?
  1307. From Lists to Emojis: How Format Bias Affects Model Alignment
  1308. Colloquial Singaporean English Style Transfer with Fine-Grained Explainable Control
  1309. From Informal to Formal – Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs
  1310. CoAM: Corpus of All-Type Multiword Expressions
  1311. SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
  1312. Exposing the Achilles’ Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning
  1313. Understanding the Dark Side of LLMs’ Intrinsic Self-Correction
  1314. VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
  1315. What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
  1316. Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation
  1317. SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
  1318. I0T: Embedding Standardization Method Towards Zero Modality Gap
  1319. Odysseus Navigates the Sirens’ Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation
  1320. Better Embeddings with Coupled Adam
  1321. Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation
  1322. Controllable and Reliable Knowledge-Intensive Task-Oriented Conversational Agents with Declarative Genie Worksheets
  1323. Benchmarking Long-Context Language Models on Long Code Understanding
  1324. MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities
  1325. Internal Value Alignment in Large Language Models through Controlled Value Vector Activation
  1326. A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability
  1327. Recurrent Knowledge Identification and Fusion for Language Model Continual Learning
  1328. Data-Constrained Synthesis of Training Data for De-Identification
  1329. Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation
  1330. Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing
  1331. MMDEND: Dendrite-Inspired Multi-Branch Multi-Compartment Parallel Spiking Neuron for Sequence Modeling
  1332. Understanding Impact of Human Feedback via Influence Functions
  1333. T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts
  1334. InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating
  1335. OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
  1336. FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning
  1337. Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
  1338. Personal Travel Solver: A Preference-Driven LLM-Solver System for Travel Planning
  1339. Counterspeech the ultimate shield! Multi-Conditioned Counterspeech Generation through Attributed Prefix Learning
  1340. LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models
  1341. CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback
  1342. Knowledge Tracing in Programming Education Integrating Students’ Questions
  1343. PRISM: A Framework for Producing Interpretable Political Bias Embeddings with Political-Aware Cross-Encoder
  1344. Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes
  1345. Lexical Diversity-aware Relevance Assessment for Retrieval-Augmented Generation
  1346. Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
  1347. Online Iterative Self-Alignment for Radiology Report Generation
  1348. Chinese Inertial GAN for Handwriting Signal Generation and Recognition
  1349. LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
  1350. Evaluating Sequence Labeling on the basis of Information Theory
  1351. GRAT: Guiding Retrieval-Augmented Reasoning through Process Rewards Tree Search
  1352. T-REG: Preference Optimization with Token-Level Reward Regularization
  1353. Gödel Agent: A Self-Referential Agent Framework for Recursively Self-Improvement
  1354. AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments
  1355. Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory
  1356. Information Locality as an Inductive Bias for Neural Language Models
  1357. Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models
  1358. Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
  1359. Towards Robust Universal Information Extraction: Dataset, Evaluation, and Solution
  1360. Multi-perspective Alignment for Increasing Naturalness in Neural Machine Translation
  1361. Temporal reasoning for timeline summarisation in social media
  1362. Beyond Negative Stereotypes – Non-Negative Abusive Utterances about Identity Groups and Their Semantic Variants
  1363. Persistent Homology of Topic Networks for the Prediction of Reader Curiosity
  1364. Tokenisation is NP-Complete
  1365. Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning
  1366. Parameter-Aware Contrastive Knowledge Editing: Tracing and Rectifying based on Critical Transmission Paths
  1367. Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System
  1368. Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking
  1369. Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport
  1370. Opt-Out: Investigating Entity-Level Unlearning for Large Language Models via Optimal Transport
  1371. Mixture of Small and Large Models for Chinese Spelling Check
  1372. DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling Check
  1373. Causal Estimation of Tokenisation Bias
  1374. Value Residual Learning
  1375. SGIC: A Self-Guided Iterative Calibration Framework for RAG
  1376. NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts
  1377. LLM-based Rumor Detection via Influence Guided Sample Selection and Game-based Perspective Analysis
  1378. Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning
  1379. SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers
  1380. Medical GraphRAG: Evidence-based Medical Large Language Model via Graph Retrieval-Augmented Generation
  1381. Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models
  1382. Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
  1383. Probing Relative Interaction and Dynamic Calibration in Multi-modal Entity Alignment
  1384. Learn to Memorize: Scalable Continual Learning in Semiparametric Models with Mixture-of-Neighbors Induction Memory
  1385. Adverse Event Extraction from Discharge Summaries: A New Dataset, Annotation Scheme, and Initial Findings
  1386. Speed Up Your Code: Progressive Code Acceleration Through Bidirectional Tree Editing
  1387. Multi-Facet Blending for Faceted Query-by-Example Retrieval
  1388. PIPER: Benchmarking and Prompting Event Reasoning Boundary of LLMs via Debiasing-Distillation Enhanced Tuning
  1389. MIR: Methodology Inspiration Retrieval for Scientific Research Problems
  1390. Sticking to the Mean: Detecting Sticky Tokens in Text Embedding Models
  1391. Memorizing is Not Enough: Deep Knowledge Injection Through Reasoning
  1392. Improving Dialogue State Tracking through Combinatorial Search for In-Context Examples
  1393. Pretraining Context Compressor for Large Language Models with Embedding-Based Memory
  1394. Dialogue Systems for Emotional Support via Value Reinforcement
  1395. Length-Induced Embedding Collapse in PLM-based Models
  1396. SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction
  1397. ERU-KG: Efficient Reference-aligned Unsupervised Keyphrase Generation
  1398. Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling
  1399. LLMs Trust Humans More, That’s a Problem! Unveiling and Mitigating the Authority Bias in Retrieval-Augmented Generation
  1400. Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation
  1401. Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
  1402. PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
  1403. Robust Utility-Preserving Text Anonymization Based on Large Language Models
  1404. SEAL: Scaling to Emphasize Attention for Long-Context Retrieval
  1405. From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment
  1406. 𝒜3: Automatic Alignment Framework for Attributed Text Generation
  1407. Towards Better Value Principles for Large Language Model Alignment: A Systematic Evaluation and Enhancement
  1408. Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
  1409. Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk Decoding
  1410. Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
  1411. SDD: Self-Degraded Defense against Malicious Fine-tuning
  1412. CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model
  1413. DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
  1414. How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs
  1415. Data Caricatures: On the Representation of African American Language in Pretraining Corpora
  1416. Language Model Probabilities are Not Calibrated in Numeric Contexts
  1417. MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
  1418. Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
  1419. DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process
  1420. Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient
  1421. Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis
  1422. Hierarchical Memory Organization for Wikipedia Generation
  1423. Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding Tasks
  1424. Structure-aware Domain Knowledge Injection for Large Language Models
  1425. FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
  1426. Dialectal Coverage And Generalization in Arabic Speech Recognition
  1427. EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits
  1428. Reconsidering LLM Uncertainty Estimation Methods in the Wild
  1429. Bregman Conditional Random Fields: Sequence Labeling with Parallelizable Inference Algorithms
  1430. SEE: Strategic Exploration and Exploitation for Cohesive In-Context Prompt Optimization
  1431. Programming by Example meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction
  1432. Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events
  1433. Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced Claims
  1434. The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents
  1435. Sandcastles in the Storm: Revisiting the (Im)possibility of Strong Watermarking
  1436. Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement
  1437. From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs
  1438. GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
  1439. Hanging in the Balance: Pivotal Moments in Crisis Counseling Conversations
  1440. Unveiling the Potential of BERT-family: A New Recipe for Building Scalable, General and Competitive Large Language Models
  1441. TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora
  1442. An Empirical Study of Iterative Refinements for Non-autoregressive Translation
  1443. Retrofitting Large Language Models with Dynamic Tokenization
  1444. Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries
  1445. Bilingual Zero-Shot Stance Detection
  1446. GrammaMT: Improving Machine Translation with Grammar-Informed In-Context Learning
  1447. Theorem Prover as a Judge for Synthetic Data Generation
  1448. Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks
  1449. Assessing Reliability and Political Bias In LLMs’ Judgements of Formal and Material Inferences With Partisan Conclusions
  1450. PARME: Parallel Corpora for Low-Resourced Middle Eastern Languages
  1451. METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
  1452. ConLoan: A Contrastive Multilingual Dataset for Evaluating Loanwords
  1453. A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive
  1454. MEraser: An Effective Fingerprint Erasure Approach for Large Language Models
  1455. VISA: Retrieval Augmented Generation with Visual Source Attribution
  1456. DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers
  1457. Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs
  1458. MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning
  1459. Map&Make: Schema Guided Text to Table Generation
  1460. IRIS: Interpretable Retrieval-Augmented Classification for Long Interspersed Document Sequences
  1461. Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
  1462. Can we Retrieve Everything All at Once? ARM: An Alignment-Oriented LLM-based Retrieval Method
  1463. R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory
  1464. FairITales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes
  1465. SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models
  1466. Predicting Implicit Arguments in Procedural Video Instructions
  1467. PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free
  1468. CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
  1469. ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding
  1470. The time scale of redundancy between prosody and linguistic context
  1471. Basic Reading Distillation
  1472. Quantized Can Still Be Calibrated: A Unified Framework to Calibration in Quantized Large Language Models
  1473. A Spatio-Temporal Point Process for Fine-Grained Modeling of Reading Behavior
  1474. More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
  1475. Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models
  1476. SubLIME: Subset Selection via Rank Correlation Prediction for Data-Efficient LLM Evaluation
  1477. M³GQA: A Multi-Entity Multi-Hop Multi-Setting Graph Question Answering Benchmark
  1478. LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
  1479. ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
  1480. Meta-Tool: Unleash Open-World Function Calling Capabilities of General-Purpose Large Language Models
  1481. Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning
  1482. ISR: Self-Refining Referring Expressions for Entity Grounding
  1483. Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
  1484. CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
  1485. TestNUC: Enhancing Test-Time Computing Approaches and Scaling through Neighboring Unlabeled Data Consistency
  1486. The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages
  1487. Theoretical Analysis of Hierarchical Language Recognition and Generation by Transformers without Positional Encoding
  1488. Less is More: Explainable and Efficient ICD Code Prediction with Clinical Entities
  1489. Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories
  1490. Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
  1491. Serial Lifelong Editing via Mixture of Knowledge Experts
  1492. A Survey on Efficient Large Language Model Training: From Data-centric Perspectives
  1493. IMOL: Incomplete-Modality-Tolerant Learning for Multi-Domain Fake News Video Detection
  1494. DDxTutor: Clinical Reasoning Tutoring System with Differential Diagnosis-Based Structured Reasoning
  1495. SocialEval: Evaluating Social Intelligence of Large Language Models
  1496. Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal Settings
  1497. PlanningArena: A Modular Benchmark for Multidimensional Evaluation of Planning and Tool Learning
  1498. FocusLLM: Precise Understanding of Long Context by Dynamic Condensing
  1499. Negative Matters: Multi-Granularity Hard-Negative Synthesis and Anchor-Token-Aware Pooling for Enhanced Text Embeddings
  1500. GPT-4 as a Homework Tutor Can Improve Student Engagement and Learning Outcomes
  1501. Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
  1502. Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling
  1503. English-based acoustic models perform well in the forced alignment of two English-based Pacific Creoles
  1504. Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing
  1505. Truth Knows No Language: Evaluating Truthfulness Beyond English
  1506. Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
  1507. Batayan: A Filipino NLP benchmark for evaluating Large Language Models
  1508. HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims
  1509. CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory
  1510. It’s Not a Walk in the Park! Challenges of Idiom Translation in Speech-to-text Systems
  1511. PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles
  1512. A Parameter-Efficient and Fine-Grained Prompt Learning for Vision-Language Models
  1513. Persona Dynamics: Unveiling the Impact of Persona Traits on Agents in Text-Based Games
  1514. SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science
  1515. 𝛿-Stance: A Large-Scale Real World Dataset of Stances in Legal Argumentation
  1516. Re3Syn: A Dependency-Based Data Synthesis Framework for Long-Context Post-training
  1517. Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
  1518. Multimodal Coreference Resolution for Chinese Social Media Dialogues: Dataset and Benchmark Approach
  1519. TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification
  1520. Theory of Mind in Large Language Models: Assessment and Enhancement
  1521. Completing A Systematic Review in Hours instead of Months with Interactive AI Agents
  1522. CMHKF: Cross-Modality Heterogeneous Knowledge Fusion for Weakly Supervised Video Anomaly Detection
  1523. CLaSp: In-Context Layer Skip for Self-Speculative Decoding
  1524. Teaching Text Agents to Learn Sequential Decision Making from Failure
  1525. The Harmonic Structure of Information Contours
  1526. REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark
  1527. Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models
  1528. LongSafety: Evaluating Long-Context Safety of Large Language Models
  1529. Exploiting Contextual Knowledge in LLMs through 𝒱-usable Information based Layer Enhancement
  1530. Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights
  1531. Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval
  1532. The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past Research
  1533. MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation
  1534. Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
  1535. Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models
  1536. DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering
  1537. Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model
  1538. Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models
  1539. Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
  1540. ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
  1541. PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
  1542. What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
  1543. Beyond Text Compression: Evaluating Tokenizers Across Scales
  1544. Emergent Abilities of Large Language Models under Continued Pre-training for Language Adaptation
  1545. R-Fairness: Assessing Fairness of Ranking in Subjective Data
  1546. RePanda: Pandas-powered Tabular Verification and Reasoning
  1547. Towards Style Alignment in Cross-Cultural Translation
  1548. TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
  1549. Entailed Between the Lines: Incorporating Implication into NLI
  1550. Multi-Level Explanations for Generative Language Models
  1551. A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems
  1552. Low-Bit Quantization Favors Undertrained LLMs
  1553. LETS-C: Leveraging Text Embedding for Time Series Classification
  1554. UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
  1555. HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval
  1556. ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities
  1557. La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America
  1558. Why Prompt Design Matters and Works: A Complexity Analysis of Prompt Search Space in LLMs
  1559. Energy Considerations of Large Language Model Inference and Efficiency Optimizations
  1560. Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models
  1561. BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving
  1562. Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
  1563. Logic-Regularized Verifier Elicits Reasoning from LLMs
  1564. Squeezed Attention: Accelerating Long Context Length LLM Inference
  1565. LangMark: A Multilingual Dataset for Automatic Post-Editing
  1566. Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer
  1567. Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models
  1568. Where Are We? Evaluating LLM Performance on African Languages
  1569. Beyond Output Matching: Bidirectional Alignment for Enhanced In-Context Learning
  1570. CiteEval: Principle-Driven Citation Evaluation for Source Attribution
  1571. HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
  1572. EducationQ: Evaluating LLMs’ Teaching Capabilities Through Multi-Agent Dialogue Framework
  1573. KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning
  1574. Efficient Domain Continual pretraining by Mitigating the Stability Gap
  1575. Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
  1576. NewsInterview: a Dataset and a Playground to Evaluate LLMs’ Grounding Gap via Informational Interviews
  1577. CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
  1578. Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages
  1579. CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG
  1580. Mapping 1,000+ Language Models via the Log-Likelihood Vector
  1581. ConsistencyChecker: Tree-based Evaluation of LLM Generalization Capabilities
  1582. Robust Estimation of Population-Level Effects in Repeated-Measures NLP Experimental Designs
  1583. FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation
  1584. Training-free LLM Merging for Multi-task Learning
  1585. Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection
  1586. Comparison-based Active Preference Learning for Multi-dimensional Personalization
  1587. OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
  1588. LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs
  1589. AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment
  1590. SocialCC: Interactive Evaluation for Cultural Competence in Language Agents
  1591. Scalable Vision Language Model Training via High Quality Data Curation
  1592. GRAM: Generative Recommendation via Semantic-aware Multi-granular Late Fusion
  1593. Towards Economical Inference: Enabling DeepSeek’s Multi-Head Latent Attention in Any Transformer-based LLMs
  1594. TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding
  1595. Introducing Verification Task of Set Consistency with Set-Consistency Energy Networks
  1596. Language Models can Subtly Deceive Without Lying: A Case Study on Strategic Phrasing in Legislation
  1597. AfroCS-xs: Creating a Compact, High-Quality, Human-Validated Code-Switched Dataset for African Languages
  1598. Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models
  1599. Design Choices for Extending the Context Length of Visual Language Models

This index was automatically generated from 1599 papers across 50 parts.

About

This repository is dedicated to high-quality figures from ACL 2025 long papers.
