| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| Understanding HTML with LLMs | - | - | Github | Project |
| Web Page Extraction using GPT | - | - | Github | Project |
| Extracting Financial News using Dolphin | - | - | Github | Project |
| Social Network Data Extraction via LLMs | - | - | Github | Project |
| Knowledge Extraction from Literature (ChatCite, KnowledgeFlow) | - | - | Github | Project |
| Profile Information Extraction using LLMs | - | - | Github | Project |
| POI Data Extraction with LLMs | - | - | Github | Project |
| Epidemiological Data Extraction with Text Analysis | - | - | Github | Project |
| Knowledge Graph Extraction (ExtractKG, LLM-KG) | - | - | Github | Project |
| Multimodal Data Extraction (LLM-MM, WorldGPT, MultimodalExtractor) | - | - | Github | Project |
| Multilingual Data Processing (LlamaLens, MultimodalLLM, QuantifyingLLMs) | - | - | Github | Project |
| IoT Data Extraction (LLMind, UnifiedIoT, EfficientIoT, LLM-IoT) | - | - | Github | Project |
| Medical Data Extraction (APrompt4EM, IterativeLLM, AutoMed-LLM) | - | - | Github | Project |
| Retrieval-Augmented Generation (Survey on RAG, Business-RAG, Enhanced-RAG) | - | - | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| Human-LLM Collaboration (HumanLLM, AnnotationLLM) | - | - | Github | Project |
| HowToCaption: Automated Captioning with LLMs | - | - | Github | Project |
| Zero-Shot Data Annotation (ZeroShot-LLM) | - | - | Github | Project |
| LLMaAA: LLMs for Automated Annotation Assistance | - | - | Github | Project |
| GoLLIE: LLM-based Data Annotation Framework | - | - | Github | Project |
| Self-Correction in Data Annotation (LLM-SelfCorrect) | - | - | Github | Project |
| Eagle: Enhancing Annotation with LLMs | - | - | Github | Project |
| CoAnnotating: Collaborative Annotation with LLMs | - | - | Github | Project |
| PDFChatAnnotator: LLM-driven PDF Data Annotation | - | - | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| Multimodal LLM for Data Aggregation (MultimodalLLM, AggregatorGPT) | - | - | Github | Project |
| TableGPT2: LLMs for Table Data Processing | - | - | Github | Project |
| DataChat: Conversational AI for Data Analysis | - | - | Github | Project |
| InsightLens: AI-driven Data Aggregation | - | - | Github | Project |
| DataLab: LLMs for Data Curation | - | - | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| TSL: LLM-based Time-Series Data Generation | - | - | Github | Project |
| LLM-PTM: Pre-trained Model for Data Generation | - | - | Github | Project |
| LLM-Forest: Multi-agent Learning for Data Synthesis | - | - | Github | Project |
| IoT-LLM: Data Generation for Internet of Things | - | - | Github | Project |
| DPDA: Privacy-Preserving Data Generation with LLMs | - | - | Github | Project |
| UnIMP: Unsupervised Data Imputation using LLMs | - | - | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| LLM-Select: Feature Selection with LLMs | - | - | Github | Project |
| LMPriors: Prior Knowledge for Feature Engineering | - | - | Github | Project |
| AltFS: Alternative Feature Selection with LLMs | - | - | Github | Project |
| ICE-SEARCH: Efficient Feature Extraction using LLMs | - | - | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| VIDS: AI-based Data Cleaning for Visual Datasets | - | - | Github | Project |
| Cocoon: LLM-powered Data Refinement | - | - | Github | Project |
| GIDCL: Graph-based Cleaning using LLMs | - | - | Github | Project |
| Multi-News+: Automated Data Cleaning for News Articles | - | - | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| ClusterLLM: Large language models as a guide for text clustering | EMNLP | 2023 | Github | Project |
| Efficient Few-Shot Fine-Tuning for Opinion Summarization | NAACL | 2023 | Github | Project |
| Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization | EMNLP | 2021 | Github | Project |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | ACL | 2023 | Github | Project |
| Planning with Learned Entity Prompts for Abstractive Summarization | TACL | 2021 | Github | Project |
| Personalized Abstractive Summarization by Tri-agent Generation Pipeline | EACL | 2023 | Github | Project |
| Similar Data Points Identification with LLM: A Human-in-the-loop Strategy Using Summarization and Hidden State Insights | arXiv | 2024 | Github | Project |
| TopicGPT: A Prompt-based Topic Modeling Framework | NAACL | 2024 | Github | Project |
| Neural Topic Modeling with Large Language Models in the Loop | arXiv | 2024 | Github | Project |
| Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling | arXiv | 2024 | Github | Project |
| Designing Heterogeneous LLM Agents for Financial Sentiment Analysis | ACM TMIS | 2024 | Github | Project |
| WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge | ACM MM | 2024 | Github | Project |
| Unified Multi-modal Pre-training for Few-shot Sentiment Analysis with Prompt-based Learning | ACM MM | 2022 | Github | Project |
| A Multimodal Approach to Cross-Lingual Sentiment Analysis with Ensemble of Transformer and LLM | Sci. Rep. | 2024 | Github | Project |
| Sentiment Analysis through LLM Negotiations | arXiv | 2023 | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| PaLM: Scaling Language Modeling with Pathways | JMLR | 2023 | Github | Project |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL | 2019 | Github | Project |
| Learning Transferable Visual Models From Natural Language Supervision | ICML | 2021 | Github | Project |
| The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | arXiv | 2023 | Github | Project |
| I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification | CVPR | 2023 | Github | Project |
| CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations | EMNLP | 2021 | Github | Project |
| One for All: Towards Training One Graph Model for All Classification Tasks | ICLR | 2024 | Github | Project |
| Language is All a Graph Needs | EACL | 2024 | Github | Project |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking | arXiv | 2023 | Github | Project |
| Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning | NeurIPS Workshop | 2023 | Github | Project |
| Scaling Laws for Discriminative Classification in Large Language Models | Applied AI Letters | 2025 | Github | Project |
| Large Language Models for Text Classification: From Zero-Shot Learning to Instruction-Tuning | Open Science Foundation | 2024 | Github | Project |
| An Experimental Evaluation of LLM on Image Classification | ADC | 2024 | Github | Project |
| Contextual Speech Emotion Recognition with Large Language Models and ASR-Based Transcript | NeurIPS Workshop | 2024 | Github | Project |
| Enhancing Speech De-Identification with LLM-Based Data Augmentation | ICAICTA | 2024 | Github | Project |
| Rethinking VLMs and LLMs for Image Classification | arXiv | 2024 | Github | Project |
| Music Genre Classification using Large Language Models | arXiv | 2024 | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| Lost in the Middle: How Language Models Use Long Contexts | - | - | Github | Project |
| Me-LLaMA: Foundation Large Language Models for Medical Applications | - | - | Github | Project |
| TopicGPT: A Prompt-based Topic Modeling Framework | - | - | Github | Project |
| FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization | - | - | Github | Project |
| When2Ask: Learning to Schedule Human Interaction Questions with Language Models | - | - | Github | Project |
| TL;DR: Mining Significant Science from Academic Papers | - | - | Github | Project |
| Sentiment Analysis through LLM Negotiations | - | - | Github | Project |
| Efficient Few-Shot Fine-Tuning for Opinion Summarization | - | - | Github | Project |
| Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization | - | - | Github | Project |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | - | - | Github | Project |
| Planning with Learned Entity Prompts for Abstractive Summarization | - | - | Github | Project |
| Personalized Abstractive Summarization by Tri-agent Generation Pipeline | - | - | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| PaLM: Scaling Language Modeling with Pathways | JMLR | 2023 | - | - |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL | 2019 | - | - |
| Learning Transferable Visual Models From Natural Language Supervision | ICML | 2021 | - | - |
| The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | arXiv | 2023 | - | - |
| I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification | CVPR | 2023 | - | - |
| CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations | arXiv | 2021 | - | - |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking | arXiv | 2023 | - | - |
| Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning | arXiv | 2024 | - | - |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| Data Interpreter | - | 2024 | Github | Project |
| Reflexion: Language Agents with Verbal Reinforcement Learning | NeurIPS | 2023 | Github | Project |
| PlotGen: Multi-agent LLM-based Scientific Data Visualization via Multimodal Feedback | arXiv | 2025 | Github | Project |
| MemoCRS: Memory-Enhanced Sequential Conversational Recommender Systems with Large Language Models | CIKM | 2024 | Github | Project |
| ExpeL: LLM Agents are Experiential Learners | AAAI | 2024 | Github | Project |
| Large Language Models Can Self-Improve | arXiv | 2022 | Github | Project |
| G-Eval: NLG Evaluation Using GPT-4 with Better Human Alignment | EMNLP | 2023 | Github | Project |
| DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation | arXiv | 2023 | Github | Project |
| AutoKaggle: A Multiagent Framework for Autonomous Data Science Competitions | arXiv | 2024 | Github | Project |
| ChartLlama: A Multimodal LLM for Chart Understanding and Generation | arXiv | 2023 | Github | Project |
| TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning | arXiv | 2024 | Github | Project |
| UniChart: A Universal Vision-Language Pretrained Model for Chart Comprehension and Reasoning | arXiv | 2023 | Github | Project |
| GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering | arXiv | 2024 | Github | Project |
| mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model | ACM MM | 2024 | Github | Project |
| CoG-DQA: Chain-of-guiding Learning with Large Language Models for Diagram Question Answering | CVPR | 2024 | Github | Project |
| NLGift: Graph LLM for Intelligent Finance Task | arXiv | 2024 | Github | Project |
| InsightPilot: An LLM-empowered Automated Data Exploration System | EMNLP | 2023 | Github | Project |
| Talk2Data: A Natural Language Interface for Exploratory Visual Analysis via Question Decomposition | ACM TOIS | 2024 | Github | Project |
| TiInsight: Comprehensive LLM-based Exploratory Analysis for Time Series | arXiv | 2024 | Github | Project |
| GenoAgent: A Baseline Method for LLM-based Exploration of Gene Expression Data | OpenReview | 2025 | Github | Project |
| Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow | arXiv | 2023 | Github | Project |
| QUIS: Question-guided Insights Generation for Automated Exploratory Data Analysis | arXiv | 2024 | Github | Project |
| LIDA: A Tool for Automatic Generation of Grammar-agnostic Visualizations and Infographics using Large Language Models | ACL | 2023 | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| UniChart: A Universal Vision-Language Pretrained Model for Chart Comprehension and Reasoning | EMNLP | 2023 | Github | Project |
| EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding | AAAI | 2025 | Github | Project |
| ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning | ACL | 2024 | Github | Project |
| ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training | ACL | 2024 | Github | Project |
| Position: What Can Large Language Models Tell Us About Time Series Analysis | arXiv | 2024 | Github | Project |
| LAMBDA: A Large Model Based Data Agent | arXiv | 2024 | Github | Project |
| Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding | ICLR | 2024 | Github | Project |
| TS-Reasoner: Compositional Time Series Reasoning for End-to-End Task Execution | arXiv | 2024 | Github | Project |
| ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | arXiv | 2024 | Github | Project |
| Time-LLM: Time Series Forecasting by Reprogramming Large Language Models | ICLR | 2024 | Github | Project |
| LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs | TIST | 2025 | Github | Project |
| GPT4MTS: Prompt-Based Large Language Model for Multimodal Time-Series Forecasting | AAAI | 2024 | Github | Project |
| Strada-LLM: Graph LLM for Traffic Prediction | arXiv | 2024 | Github | Project |
| LeMoLE: LLM-Enhanced Mixture of Linear Experts for Time Series Forecasting | arXiv | 2024 | Github | Project |
| ST-LLM: Spatial-Temporal Large Language Model for Traffic Prediction | MDM | 2024 | Github | Project |
| RealTCD: Temporal Causal Discovery from Interventional Data with Large Language Model | CIKM | 2024 | Github | Project |
| MATMCD: From Query Tools to Causal Architects | arXiv | 2023 | Github | Project |
| Order-of-Thought: Are Large Language Models Capable of Causal Reasoning for Sensing Data Analysis | EdgeFM | 2024 | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| ClinicalGPT: AI for Clinical Diagnostics | - | - | Github | Project |
| Mole-BERT: Molecular Biology with Transformers | - | - | Github | Project |
| Me-LLaMA: Medical Large Language Model Applications | - | - | Github | Project |
| Title | Venue | Date | Code | Project |
|---|---|---|---|---|
| ECC Analyzer: AI for Risk Management | - | - | Github | Project |
| RiskLabs: Data-driven Risk Management | - | - | Github | Project |
| TradingAgents: AI-Powered Financial Trading | - | - | Github | Project |