Paper Review

DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration
Summary:
• DrugAgent is a multi-agent framework designed to automate machine learning (ML) programming tasks in drug discovery. The framework comprises two primary components (sketched below):
• The LLM Planner comes up with different solution ideas and improves them based on results.
• The LLM Instructor turns those ideas into working code using drug-specific knowledge.
Key Findings:
• ADMET prediction: DrugAgent successfully automated the ML pipeline for predicting absorption using the PAMPA dataset, achieving an F1 score of 0.92.
• Drug-target interaction: in DTI tasks, DrugAgent outperformed the ReAct baseline with a 4.92% relative improvement in ROC-AUC.
Limitations:
• Depends heavily on the accuracy of the underlying language models.
• May lack deep domain knowledge in complex biomedical areas.
• Tested on a limited set of drug discovery tasks only.
• Not yet validated in real-world pharma workflows.
• Computationally expensive due to multiple iterations.
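
As a reading aid, here is a minimal Python sketch of the planner/instructor loop summarized above. The `call_llm` and `evaluate_pipeline` functions, the prompts, and the round count are placeholders, not DrugAgent's actual implementation.

```python
# Minimal sketch of a planner/instructor loop in the spirit of DrugAgent.
# `call_llm` and `evaluate_pipeline` are hypothetical stand-ins; the prompts
# and scoring are illustrative, not the paper's code.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM API call (e.g., a chat-completion endpoint)."""
    raise NotImplementedError("plug in your own LLM client here")

def evaluate_pipeline(code: str) -> float:
    """Placeholder: run the generated ML pipeline and return a validation F1."""
    raise NotImplementedError

def drug_agent_loop(task: str, rounds: int = 3) -> str:
    best_code, best_score = "", float("-inf")
    feedback = "none yet"
    for _ in range(rounds):
        # Planner: propose/refine a solution idea given past feedback.
        idea = call_llm(f"Task: {task}\nPrevious feedback: {feedback}\n"
                        "Propose one concrete ML pipeline idea.")
        # Instructor: turn the idea into runnable, domain-aware code.
        code = call_llm(f"Write Python code implementing this idea for drug "
                        f"discovery (e.g., featurize SMILES, train a model):\n{idea}")
        score = evaluate_pipeline(code)
        if score > best_score:
            best_code, best_score = code, score
        feedback = f"last idea scored F1={score:.3f}"
    return best_code
```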

RAG-Enhanced Collaborative LLM Agents for Drug Discovery
Summary:
• Introduces CLADD, a system of collaborative LLM agents enhanced with Retrieval-Augmented Generation (RAG).
• Agents dynamically retrieve biomedical knowledge and coordinate to perform tasks like molecule property prediction and target identification.
• Avoids domain-specific fine-tuning by relying on real-time retrieval and inter-agent collaboration.
Key Findings:
• CLADD effectively handled drug discovery tasks using general-purpose LLMs.
• Showed promising results without the need for specialized training.
• Demonstrated the feasibility of RAG-enhanced agent collaboration for scientific applications.
Limitations:
• Difficulty in integrating varied data types (molecular, protein, disease data).
• Retrieval errors can lead to incorrect outputs.

LLM-Assisted Drug Discovery
Summary:
• Conducts a qualitative analysis of the role of Large Language Models (LLMs) in drug discovery.
• Examines how LLMs process biomedical literature, clinical trial data, and molecular databases to identify new therapeutic targets, predict drug efficacy, and streamline development workflows.
• Leverages case studies and existing research to argue LLMs' potential, rather than implementing or evaluating a new model.
Key Findings:
• LLMs can accelerate target identification and drug efficacy prediction.
• They can reduce costs and time-to-market.
• They can improve the accuracy of preclinical insights.
Limitations:
• The study is theoretical and conceptual, lacking empirical validation or real-world application results.

Generating Novel Leads for Drug Discovery Using LLMs with Logical Feedback
Summary:
• Proposed LMLF (Language Models with Logical Feedback), a novel, iterative framework to guide LLMs in generating drug-like molecules.
• The method separates the prompt into two parts (sketched below):
1. A domain-specific logical constraint (e.g., molecular weight, binding affinity).
2. A domain-independent query (e.g., "Generate valid molecules").
Key Findings:
• GPTLF++ and PaLMLF++ produced molecules with higher binding scores than baselines.
• 15-40% of generated molecules contained selective functional groups (JAK2, DRD2).
• Many generated leads were novel (Tanimoto similarity < 0.75 to known drugs).
Limitations:
• Relies on existing tools (RDKit, GNINA) for molecular property validation and docking, so results depend on their accuracy.
• Generalization is limited to numeric constraints.
• The feedback loop is constrained to iterative logic updates and might not capture broader chemical intuition.

KRAGEN: a knowledge graph-enhanced RAG framework for biomedical problem solving using large language models
Summary:
• Proposed KRAGEN, a novel tool that enhances standard Retrieval-Augmented Generation (RAG) by integrating knowledge graphs and Graph-of-Thoughts (GoT) prompting.
• Knowledge graphs are converted into a vector database for retrieval, enhancing factual grounding (sketched below).
• The GoT technique breaks complex biomedical problems into smaller subproblems (nodes in a graph), and each is solved individually using LLMs and retrieved knowledge.
Key Findings:
• KRAGEN improves the logical structure, transparency, and factual accuracy of LLM responses in biomedical domains.
• The use of GoT allows for traceable reasoning paths, helping users understand and verify the logic behind outputs.
Limitations:
• Lacks extensive benchmarking against traditional RAG or clinical decision systems.
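
A small sketch of the "knowledge graph into vector database" step, assuming sentence-transformers for embeddings; the triples, model name, and retrieval function are illustrative, not KRAGEN's released code.

```python
# Sketch: flatten KG triples into sentences, embed them, and retrieve by cosine
# similarity so each Graph-of-Thoughts subproblem can be grounded in facts.
import numpy as np
from sentence_transformers import SentenceTransformer

triples = [
    ("imatinib", "inhibits", "BCR-ABL"),
    ("BCR-ABL", "associated_with", "chronic myeloid leukemia"),
]
sentences = [f"{h} {r.replace('_', ' ')} {t}" for h, r, t in triples]

model = SentenceTransformer("all-MiniLM-L6-v2")           # assumed embedding model
emb = model.encode(sentences, normalize_embeddings=True)  # unit vectors -> cosine = dot

def retrieve(query: str, k: int = 2):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = emb @ q
    top = np.argsort(-scores)[:k]
    return [(sentences[i], float(scores[i])) for i in top]

print(retrieve("Which target does imatinib act on?"))
```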

Computational Strategies for Drug Discovery: Harnessing Indian Medicinal Plants
Summary:
• Data collection: bioactive compounds were sourced from databases such as IMPPAT, IMPPAT 2.0, and COCONUT, guided by a literature review.
• ADMET analysis: tools such as pkCSM and ADMET 3.0 predicted absorption, distribution, metabolism, excretion, and toxicity profiles to screen drug-like compounds.
• Network pharmacology: constructed interaction networks to understand multi-target effects and mechanisms of action within biological pathways.
• Molecular docking: performed virtual screening using tools like AutoDock Vina, PyRx, CB-Dock, etc., to predict the binding affinity of compounds with disease-related proteins.
• Molecular dynamics (MD) simulations: used GROMACS, NAMD, and Desmond to simulate the stability and behavior of protein-ligand complexes; RMSD/RMSF and binding free energies were evaluated.
Key Findings:
• The study identified multiple lead compounds with strong binding affinity, favorable ADMET profiles, and stability in molecular simulations.
• Demonstrated that Indian medicinal plants are a rich reservoir of potential drug candidates.
• Highlighted that integrating traditional medicinal knowledge with modern computational techniques can accelerate the drug discovery pipeline.
Limitations:
• The study relies heavily on in silico predictions without experimental (wet-lab) validation.
• Traditional knowledge integration depends on available databases, which may not capture the full ethnopharmacological scope.

Molecular Simulations for Ayurvedic Phytochemicals
Summary:
• Offers a review and conceptual framework for using CADD (Computer-Aided Drug Discovery) tools, particularly molecular docking and molecular dynamics (MD) simulations, to study Ayurvedic phytochemicals.
• Molecular docking: described as a first-pass screening technique to evaluate ligand-protein interactions using structure-based approaches.
• Molecular dynamics (MD) simulations: proposed to overcome the limitations of docking by modelling ligand flexibility, solvent effects, and complex interactions such as allosteric or competitive binding.
Key Findings:
• Molecular simulations (especially MD) have the potential to reveal mechanistic insights into Ayurvedic formulations.
• Highlights the urgent need for phytochemical-specific force fields, expanded compound-target association databases, and collaborative efforts between computational researchers and Ayurveda experts.
Limitations:
• Lack of experimental data to validate in silico predictions for Ayurvedic phytochemicals.
• No comprehensive phytochemical-target database exists for Ayurvedic plants.
• Current force field models require improvement before molecular dynamics simulations can be applied reliably to these phytochemicals.

A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge
Summary:
• This is a survey paper, not an experimental or implementation-based study. It provides a comprehensive review of:
• Vector database architecture, particularly how data is stored, retrieved, and queried.
• Four key approximate nearest neighbour search (ANNS) approaches (contrasted with exact search in the sketch below):
1. Hash-based
2. Tree-based
3. Graph-based
4. Quantization-based
• Storage techniques including sharding, partitioning, caching, and replication.
• Retrieval techniques using NNS and ANNS strategies.
Key Findings:
• Vector databases are essential for storing and retrieving the high-dimensional, unstructured data used in modern AI.
• They support efficient similarity search through techniques like ANNS and scalable storage through sharding and replication.
• Integration with LLMs can enhance semantic understanding in search and RAG pipelines.
Limitations:
• The survey focuses mostly on architectural and algorithmic overviews; it lacks empirical benchmarks or performance comparisons.
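
To make the survey's categories concrete, here is a small sketch contrasting exact nearest-neighbour search with a graph-based ANNS index (HNSW) in FAISS; the data are synthetic and the parameters illustrative.

```python
# Exact NNS (IndexFlatL2) vs. graph-based ANNS (IndexHNSWFlat) on random vectors.
import numpy as np
import faiss

d, n = 128, 10_000
xb = np.random.rand(n, d).astype("float32")   # database vectors
xq = np.random.rand(5, d).astype("float32")   # query vectors

flat = faiss.IndexFlatL2(d)                   # exact nearest-neighbour search
flat.add(xb)
_, I_exact = flat.search(xq, 10)

hnsw = faiss.IndexHNSWFlat(d, 32)             # graph-based ANNS (HNSW)
hnsw.add(xb)
_, I_ann = hnsw.search(xq, 10)

# Recall@10 of the approximate index against the exact results.
recall = np.mean([len(set(a) & set(b)) / 10 for a, b in zip(I_ann, I_exact)])
print(f"HNSW recall@10 vs exact search: {recall:.2f}")
```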

IMPPAT: A curated database of Indian Medicinal Plants, Phytochemistry And Therapeutics
Summary:
• Developed IMPPAT, a manually curated database featuring:
1. 1,742 Indian medicinal plants
2. 9,596 phytochemicals (with 2D and 3D structures)
3. 1,124 therapeutic uses
• Data sources included traditional medicine books, databases (e.g., PubMed), and the Traditional Knowledge Digital Library (TKDL).
• Chemical properties (e.g., logP, molecular weight) and ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) predictions were computed using cheminformatics tools such as FAF-Drugs4 and admetSAR.
Key Findings:
• IMPPAT is the largest curated digital repository of phytochemicals from Indian medicinal plants.
• 960 phytochemicals were identified as druggable, with 591 being novel (no similarity to FDA-approved drugs).
• The chemical space of IMPPAT phytochemicals is more complex and diverse (stereochemistry, Fsp³) than that of commercial compound libraries.
• The database provides tools for chemical filtering, drug-likeness evaluation, structure downloads, and network visualization.
Limitations:
• Incomplete phytochemical data for many plants due to limitations in the literature and digitized sources.

IMPPAT 2.0: An Enhanced and Expanded Phytochemical Atlas of Indian Medicinal Plants
Summary:
• IMPPAT 2.0 is a major update to the original IMPPAT database. It now includes:
1. 4,010 Indian medicinal plants (up from 1,742)
2. 17,967 phytochemicals (a 90% increase)
3. 1,659 therapeutic uses
• Added chemical structures in formats such as SDF, PDB, and MOL2.
Key Findings:
• Identified 960 druggable phytochemicals, many of which are structurally novel compared to known drugs.
• Revealed the chemical uniqueness of Indian phytochemicals compared to Chinese phytochemicals and FDA-approved drugs.
Limitations:
• Phytochemical-target associations are not comprehensively mapped.

Prompt-RAG: Pioneering Vector Embedding-Free Retrieval-Augmented Generation in Niche Domains, Exemplified by Korean Medicine
Summary:
• The paper introduces Prompt-RAG, a retrieval-augmented generation (RAG) system that does not rely on vector embeddings.
• Instead of traditional dense embedding searches, it uses natural language prompts directly to retrieve relevant documents (sketched below).
• The approach is designed for niche domains like Korean Medicine (KM), where embedding models often poorly capture semantic relationships.
• Comparative experiments were conducted between:
1. Vector-based RAG (using traditional embeddings)
2. Prompt-RAG (using natural language querying)
Key Findings:
• Prompt-RAG outperformed traditional vector-based RAG and ChatGPT baselines in terms of relevance and informativeness.
• Showed that vector embeddings are not always optimal for niche or culturally specific knowledge domains.
• Demonstrated the potential for prompt-based retrieval systems to enhance RAG pipelines, especially in specialized fields.
Limitations:
• Prompt-RAG may struggle with retrieving highly structured or hierarchically organized data.
• Natural language retrieval could become inefficient with very large corpora compared to ANN (Approximate Nearest Neighbor) methods.
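
A minimal sketch of embedding-free, prompt-driven retrieval in the spirit of Prompt-RAG: the LLM is shown a table of contents and asked which headings to fetch. The `call_llm` stub, corpus, and prompts are assumptions for illustration only.

```python
# Prompt-driven retrieval: no embeddings, the LLM selects relevant headings.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client")

corpus = {
    "Sasang constitutional typology": "...",
    "Herbal formulas for digestion": "...",
    "Acupuncture point selection": "...",
}

def prompt_rag_answer(question: str) -> str:
    toc = "\n".join(f"- {h}" for h in corpus)
    picked = call_llm(
        f"Question: {question}\nTable of contents:\n{toc}\n"
        "Return the headings (comma-separated) most relevant to the question."
    )
    context = "\n\n".join(corpus[h] for h in corpus if h in picked)
    return call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```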

Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models
Summary:
• The authors propose RUGGED (Retrieval Under Graph-Guided Explainable Disease Distinction), a workflow for generating biomedical hypotheses using RAG-enabled LLMs.
• Combines three main steps:
1. Text mining: extracts disease, drug, and molecular associations from biomedical literature.
2. Graph prediction models: create explainable graphs that forecast potential links among diseases and therapeutics.
3. RAG-based LLM interaction: supports human users in generating and refining biomedical hypotheses based on retrieved evidence and graph suggestions.
• Evaluated using a clinical case study involving Arrhythmogenic Cardiomyopathy (ACM) and Dilated Cardiomyopathy (DCM) to suggest repurposed therapeutics.
Key Findings:
• RUGGED enhances hypothesis generation by combining RAG with explainable knowledge graphs.
• It reduced hallucination risk compared to purely generative LLM approaches.
• Successfully identified potential therapeutic targets and disease linkages in the ACM/DCM case study.
• Demonstrated that structured retrieval plus graph-based explainability can strengthen biomedical research support systems.
Limitations:
• RUGGED was evaluated on a single case study (ACM vs. DCM); broader validation is pending.
• The effectiveness of hypothesis generation depends heavily on the relevance and accuracy of the retrieved documents.

Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval (KARE)
Summary:
• KARE constructs a multi-source medical knowledge graph (KG) by integrating:
1. Biomedical databases (e.g., UMLS)
2. Clinical literature (e.g., PubMed)
3. LLM-generated domain-specific insights
• It detects graph communities hierarchically (using the Leiden algorithm, sketched below) and summarizes them.
• Patient EHR data is augmented with relevant community summaries (context augmentation).
• A small local LLM is fine-tuned on the augmented EHRs to generate reasoning chains (step-by-step explanations) and prediction labels (e.g., mortality, readmission).
• Datasets used: the MIMIC-III and MIMIC-IV electronic health record (EHR) datasets.
Key Findings:
• KARE outperforms traditional RAG and other baseline models on all tasks:
1. Up to 10.8%-15.0% improvement on MIMIC-III tasks.
2. Up to 12.6%-12.7% improvement on MIMIC-IV tasks.
• Significantly improved interpretability by producing reasoning chains for each clinical prediction.
Limitations:
• Scalability to more complex healthcare tasks is left as future work; current validation is limited to mortality and readmission prediction.
• Fine-grained clinical concepts may be lost due to the reliance on code mappings.
• For extremely large graph communities, summaries are not generated because of LLM context-window limits.
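
A small sketch of the community-detection and summarization step, using python-igraph's Leiden implementation; the toy graph and the `summarize_community` stub (an LLM call in KARE) are assumptions for illustration.

```python
# Detect Leiden communities over a toy medical concept graph, then summarize each
# community; summaries would later augment a patient's EHR context.
import igraph as ig

edges = [("metformin", "type 2 diabetes"), ("type 2 diabetes", "HbA1c"),
         ("warfarin", "atrial fibrillation"), ("atrial fibrillation", "stroke")]
names = sorted({v for e in edges for v in e})
g = ig.Graph()
g.add_vertices(names)
g.add_edges(edges)

communities = g.community_leiden(objective_function="modularity")

def summarize_community(concepts: list) -> str:
    """Placeholder: in KARE this summary is generated by an LLM."""
    return "Community about: " + ", ".join(concepts)

summaries = [summarize_community([names[i] for i in comm]) for comm in communities]
print(summaries)
```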

Y-Mol: A Multiscale Biomedical Knowledge-Guided Large Language Model for Drug Development
Summary:
• Y-Mol is a domain-specific LLM built on LLaMA2 and fine-tuned for drug development tasks.
• Constructs a multiscale biomedical knowledge dataset from:
1. Publications (PubMed corpus)
2. Biomedical knowledge graphs (for interaction relationships)
3. Expert-designed synthetic data from small models (e.g., ADMET predictions, drug repurposing tools)
• The instruction dataset is crafted into three types (the third is sketched below):
1. Description-based prompts (extracted from publications)
2. Semantic-based prompts (capturing relations in knowledge graphs)
3. Template-based prompts (simulating expert domain knowledge)
Key Findings:
• Y-Mol successfully establishes the first multiscale knowledge-guided LLM pipeline for drug development.
• Shows that integrating biomedical domain knowledge at scale can improve LLM reasoning, generalization, and predictive ability in drug R&D.
• Y-Mol offers a blueprint for future biomedical-specific LLMs leveraging structured and unstructured knowledge sources.
Limitations:
• Y-Mol is built on synthesized instruction datasets; real-world noise and variation may impact downstream performance.
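
A minimal sketch of template-based instruction generation from knowledge-graph relations, in the spirit of Y-Mol's third prompt type; the triples and templates are illustrative, not the paper's released dataset.

```python
# Turn KG relations into instruction-style prompt/answer pairs via templates.
kg_triples = [
    ("aspirin", "inhibits", "COX-1"),
    ("aspirin", "treats", "inflammation"),
]

TEMPLATES = {
    "inhibits": "Question: Which protein does {h} inhibit?\nAnswer: {t}",
    "treats":   "Question: Which condition can {h} be used to treat?\nAnswer: {t}",
}

instructions = [TEMPLATES[r].format(h=h, t=t) for h, r, t in kg_triples if r in TEMPLATES]
for example in instructions:
    print(example, end="\n\n")
```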

KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model
Summary:
• Drug-disease pair extraction: important drug-disease associations are gathered from the Drug Repurposing Knowledge Graph (DRKG).
• Evidence retrieval: a Retrieval-Augmented Generation (RAG) system fetches supporting biomedical documents from PubMed and Clinical Trials.
• Teacher model reasoning: a GPT-based teacher model answers clinical questions about each drug-disease pair by reasoning over the retrieved evidence.
• Knowledge distillation: a smaller student model (based on fine-tuned LLaMA) is trained to predict drug recommendations and generate human-readable rationales from the teacher's responses.
Key Findings:
• KEDRec-LM achieves strong performance in generating accurate, explainable drug recommendations.
• Distilled models retain high reasoning ability while being more efficient than the teacher LLMs.
• Demonstrates that integrating knowledge graphs, literature retrieval, and instruction fine-tuning can lead to practical, explainable biomedical LLMs.
• The authors open-sourced both the dataset and the fine-tuned model to foster further research in explainable drug recommendation.
Limitations:
• The model's scope is restricted to drug-disease pairs drawn from DRKG; it is not comprehensive across all biomedical domains.
• The retrieval stage relies heavily on PubMed and Clinical Trials; missing or low-quality literature can affect rationale generation.

Harnessing the Power of Knowledge Graphs to Enhance LLM Explainability in the Biomedical Domain
Summary:
• Proposes a model that enhances biomedical reasoning by fusing Knowledge Graph (KG) and LLM representations through a Cross-Modal Attention (CMA) mechanism (sketched below).
• The process follows these steps:
• First, relevant triples are extracted from the UMLS knowledge graph based on each MedQA question and its answer choices.
• Two parallel encodings are then performed: KG triples are encoded using a Graph Neural Network (GNN), while textual data (questions and answers) are encoded using a pre-trained BioBERT model.
• A single-layer cross-modal transformer is employed, in which CMA fuses the KG embeddings and text embeddings, replacing traditional self-attention.
• The model then uses the CLS token output from the transformer to make the final answer prediction.
• To enhance explainability, the attention scores linking the CLS token and KG nodes are visualized, providing local, interpretable rationales for each prediction.
Key Findings:
• The CMA model (cross-modal attention between LLM and KG embeddings) outperforms fine-tuned BioBERT on the MedQA biomedical reasoning task.
• Combines improved task performance with plausible local explainability through attention visualization.
• Demonstrates that structured biomedical knowledge (from KGs) can significantly enhance both the accuracy and interpretability of LLM predictions.
• Provides a promising step toward integrating knowledge graphs and LLMs for transparent biomedical AI systems.
Limitations:
• Although the model uses attention mechanisms for explainability, the paper acknowledges that attention-based explanations are debated and not always fully faithful.
• Explainability and reasoning improvements are based on a single biomedical KG (UMLS); generalization to other biomedical knowledge bases is not evaluated.
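
A minimal PyTorch sketch of single-layer cross-modal attention fusion, assuming precomputed BioBERT text embeddings and GNN-encoded KG embeddings; dimensions, the classifier head, and the random inputs are illustrative, not the paper's exact architecture.

```python
# Cross-modal attention: text tokens query KG node embeddings; the CLS-position
# output drives the answer prediction, and the attention weights can be visualized.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8, num_answers: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_answers)

    def forward(self, text_emb, kg_emb):
        # Queries come from text tokens (CLS first); keys/values from KG nodes.
        fused, attn_weights = self.cross_attn(query=text_emb, key=kg_emb, value=kg_emb)
        cls = fused[:, 0]                           # CLS-position representation
        return self.classifier(cls), attn_weights   # weights support local rationales

model = CrossModalFusion()
text = torch.randn(2, 64, 768)   # [batch, text tokens (CLS first), hidden]
kg = torch.randn(2, 32, 768)     # [batch, KG node/triple embeddings, hidden]
logits, weights = model(text, kg)
print(logits.shape, weights.shape)  # torch.Size([2, 4]) torch.Size([2, 64, 32])
```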

Leveraging AI in ayurvedic agriculture: A RAG chatbot for comprehensive medicinal plant insights using hybrid deep learning approaches
Summary:
• Developed a hybrid deep learning model combining DeiT (Data-efficient Image Transformer) and a VGG16 CNN (sketched below).
• Dataset used: the Indian Medicinal Plant dataset (4,161 training images and 893 testing images across 40 classes).
• Preprocessing steps included image resizing, random cropping, flipping, and normalization.
• Models were trained using a 70:15:15 (training:validation:testing) split with batch size 32 and learning rate 1e-4 using the AdamW optimizer.
• The final DeiT+VGG16 hybrid model concatenated local (VGG16) and global (DeiT) features.
• This model was integrated into a Retrieval-Augmented Generation (RAG) chatbot built with LangChain and the OpenAI API to generate insights about the identified medicinal plants in English and Nepali.
• The Googletrans API was used for bilingual translation support.
Key Findings:
• The DeiT+VGG16 hybrid model achieved 96.75% testing accuracy, outperforming the individual DeiT (95.97%) and VGG16 (90.26%) models.
• Successfully developed an offline-capable, bilingual, farmer-accessible RAG chatbot.
• The system allows users to scan plants, identify them, and receive generated medicinal and economic insights, supporting farmer empowerment and Ayurvedic research.
• Future work: extend to real-time mobile apps, improve the chatbot UI, support more languages, and develop IoT-based plant tracking.
Limitations:
• The model sometimes misidentified medicinal plants, especially among visually similar species.
• The RAG chatbot occasionally produced misinformation about plants' uses and economic values.
• Translation inaccuracies occurred due to reliance on the GoogleTrans API for Nepali-English switching.
• Performance was not strong enough to handle medicinal plants beyond those present in the dataset.
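
A minimal PyTorch sketch of the DeiT+VGG16 feature-concatenation idea, assuming timm for the DeiT backbone and torchvision for VGG16; the pooling, layer choices, and 40-class head follow the dataset description, not the authors' released code.

```python
# Concatenate global DeiT features with pooled local VGG16 features for classification.
import torch
import torch.nn as nn
import timm
from torchvision import models

class HybridPlantClassifier(nn.Module):
    def __init__(self, num_classes: int = 40):
        super().__init__()
        # Global features: DeiT backbone with its classification head removed (768-d).
        self.deit = timm.create_model("deit_base_patch16_224", pretrained=False, num_classes=0)
        # Local features: VGG16 convolutional trunk pooled to a 512-d vector.
        self.vgg_features = models.vgg16(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(768 + 512, num_classes)

    def forward(self, x):
        global_feat = self.deit(x)                                # [B, 768]
        local_feat = self.pool(self.vgg_features(x)).flatten(1)   # [B, 512]
        return self.head(torch.cat([global_feat, local_feat], dim=1))

model = HybridPlantClassifier()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 40])
```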

Drug discovery from plant sources: An integrated approach
Summary:
• Plant selection is prioritized using:
1. Traditional documented use
2. Tribal/ethnomedicinal undocumented use
3. Exhaustive literature search
4. Ayurvedic pharmacological attributes (Rasa, Guna, Veerya, Vipaka, Dosha Karma)
• Extraction strategy suggested:
1. Use parallel extraction (multiple solvents simultaneously) for plants with known activity.
2. Use sequential extraction (polarity-based) for plants with unknown activity.
• Bioassay-guided fractionation is performed to isolate and standardize bioactive compounds.
• Go/No-Go criteria based on the potency of extracts versus pure compounds are applied to decide development paths.
Key Findings:
• Systematic integration of Ayurvedic knowledge into plant selection significantly improves success rates in drug discovery.
• Applying Ayurvedic concepts (such as Rasa, Guna, Veerya) allows strategic shortlisting of potential plants, reducing time, costs, and development risks.
• Proposes that modern pharmaceutical industries should embrace herbal extracts and standardized botanical drugs for faster, safer drug development.
• Reinforces that combining traditional medicine insights with modern biological screening creates a cost-effective and scientifically sound pathway for developing new plant-derived drugs.
Limitations:
• Extensive use of natural sources for drug development could threaten biodiversity.
• Not all bioactive plant compounds are easily or completely synthetically reproducible.
• Access and benefit-sharing rules under the Convention on Biological Diversity (CBD) complicate drug commercialization from plant sources.
• Many bioactive natural compounds violate Lipinski's Rule of Five and may therefore face bioavailability issues, requiring alternative drug-likeness criteria.
• When plants are selected randomly, success rates in drug discovery programs are low compared to systematic, Ayurveda-guided selection.

Integrating Retrieval-Augmented Generation with Large Language Model Mistral 7b for Indonesian Medical Herb
Summary:
• Fine-tuned the Mistral 7b model and combined it with the Retrieval-Augmented Generation (RAG) method.
• Dataset: 9 academic journals focused exclusively on Indonesian medicinal herbs were collected and processed.
• Preprocessing: text was split into chunks of size 500 (512 tokens per chunk), embedded with Sentence-Transformers, and indexed using FAISS.
• System architecture (sketched below): user query → semantic search over the vector DB → retrieved context + query fed to Mistral 7b → answer generated.
Key Findings:
• Mistral 7b with RAG achieved a higher METEOR score (0.22) than LLaMA2 7b (0.14).
• The RAG-Mistral model generated more creative, context-grounded herbal recommendations than LLaMA2 7b.
• Precision was slightly lower due to the creative outputs, but relevance and factual correctness were better.
• The authors recommend further research with more experts and an expanded journal dataset to improve reliability and real-world application in herbal medicine Q&A systems.
Limitations:
• Only 9 journals were used for model fine-tuning, limiting herbal plant knowledge coverage.
• Validation was done with only one expert (from the ethnobotany field), lacking broader clinical validation.
• Evaluation covered only 6 herb-related conditions (e.g., headache, diabetes, hypertension, fever, rheumatism, heartburn).
• Mistral 7b outputs were more creative but had lower precision than LLaMA2 7b.
• Overfitting is a risk, since training was narrowly focused on Indonesian plants and may not generalize beyond this scope.
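
A minimal sketch of the retrieval stage (chunk, embed, index, prompt), assuming sentence-transformers and FAISS as described above; the sample chunks, embedding model name, and the `generate` stub standing in for the fine-tuned Mistral 7b are assumptions.

```python
# Chunk -> embed -> FAISS index -> retrieve -> assemble prompt for the LLM.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

chunks = [
    "Temulawak (Curcuma zanthorrhiza) is traditionally used for digestive complaints.",
    "Sambiloto (Andrographis paniculata) is reported in studies on fever and diabetes.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vecs = embedder.encode(chunks, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(vecs.shape[1])   # inner product == cosine on unit vectors
index.add(vecs)

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in the fine-tuned Mistral 7b inference call here")

def answer(query: str, k: int = 2) -> str:
    q = embedder.encode([query], normalize_embeddings=True).astype("float32")
    _, idx = index.search(q, k)
    context = "\n".join(chunks[i] for i in idx[0])
    return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```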

VAIV bio-discovery service using transformer model and retrieval augmented generation
Summary:
• Developed VAIV Bio-Discovery, a biomedical neural search service.
• The system combines transformer-based models (such as T5slim_dec) for relation extraction with Retrieval-Augmented Generation (RAG) for natural language search and summarization.
• It uses PubMed abstracts and Therapeutic Target Database (TTD) documents to recognize entities (chemicals, genes/proteins, diseases) and their interactions (e.g., drug-drug, chemical-protein, chemical-disease).
• Neural search (using RoBERTa embeddings) is combined with BM25 keyword search to retrieve relevant documents (sketched below).
Key Findings:
• VAIV Bio-Discovery significantly improves biomedical information retrieval by combining neural search, entity recognition, relation extraction, and RAG-based summarization.
• It provides user-friendly interfaces supporting basic search, entity and interaction search, and natural language queries.
• Outperforms traditional databases in discovering new interactions and providing richer summaries.
• Achieved high QA performance: a ROUGE-1 F-score of 0.912 and a BLEU score of 0.795.
• Positioned as a powerful tool for hypothesis development, database curation, and biomedical research support.
Limitations:
• Currently covers only PubMed abstracts, not full-text articles, limiting the depth of information.
• The number of recognized interactions is lower than in curated databases such as CTD because the system uses only direct extraction without inferred associations.
• Limited coverage for chemical-disease relations (CDR) because of small training datasets.
• Some issues remain in named entity recognition (due to synonyms and abbreviations) and in relation extraction granularity (especially for underrepresented classes such as modulators and cofactors).
• Still needs frequent updates and expansion to other resources (e.g., full texts, arXiv biomedical papers).
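
A small sketch of hybrid keyword-plus-dense retrieval in the spirit of the system above, using rank_bm25 and sentence-transformers as stand-ins for the service's BM25 and RoBERTa components; the documents and fusion weights are illustrative.

```python
# Fuse BM25 keyword scores with dense-embedding cosine similarity.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "Imatinib inhibits the BCR-ABL fusion protein in chronic myeloid leukemia.",
    "Metformin lowers hepatic glucose production in type 2 diabetes.",
]
bm25 = BM25Okapi([d.lower().split() for d in docs])
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def hybrid_search(query: str, alpha: float = 0.5):
    kw = np.array(bm25.get_scores(query.lower().split()))
    kw = kw / (kw.max() + 1e-9)                     # normalize keyword scores
    dense = doc_vecs @ embedder.encode([query], normalize_embeddings=True)[0]
    scores = alpha * kw + (1 - alpha) * dense       # simple weighted fusion
    return sorted(zip(docs, scores), key=lambda x: -x[1])

print(hybrid_search("Which drug targets BCR-ABL?"))
```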

MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System
Summary:
• New metrics: introduces Boundary Clarity and Chunk Stickiness for direct evaluation of chunking quality (instead of indirect QA accuracy).
• MoC framework (sketched below):
1. A granularity-aware router dynamically selects lightweight, specialized chunkers.
2. Each chunker outputs regular expressions to segment the text efficiently.
3. An edit-distance recovery algorithm fixes potential hallucination errors from LLM chunkers by comparing generated chunks with the original text.
• The system optimizes the balance between high chunking precision (important for RAG quality) and low computational overhead (only a small, lightweight model is active at a time).
Key Findings:
• Boundary Clarity and Chunk Stickiness are effective at directly measuring chunking quality, providing better insight than downstream QA evaluation alone.
• MoC improves both:
1. Chunking precision (better content retrieval for RAG)
2. Computational efficiency (low resource cost per chunking operation)
• Achieved superior QA results compared to baseline chunking strategies on multiple datasets.
• Demonstrates that better chunking strategies can significantly enhance overall RAG system performance without heavy LLM usage.
Limitations:
• The model assumes that document structures can be captured using regular patterns (regex), which may not generalize to highly unstructured texts.
• While computationally efficient, smaller specialized chunkers may lack general reasoning ability for complex or ambiguous inputs.
• Introducing multiple chunkers, routing, and recovery adds system complexity and may pose deployment challenges.
• Focuses primarily on QA datasets; performance on non-QA RAG tasks (such as summarization or multi-hop reasoning) remains untested.
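
A minimal sketch of two MoC mechanics: a chunker expressed as a regular expression and an edit-distance check that snaps drifted chunks back to the source text; the regex, threshold, and example are illustrative, not the paper's values.

```python
# Regex-based chunking plus edit-distance recovery of drifted chunks.
import re
import difflib

def regex_chunker(text: str, pattern: str = r"(?<=[.!?])\s+") -> list:
    """A lightweight chunker expressed as a regex (here: sentence-boundary split)."""
    return [c.strip() for c in re.split(pattern, text) if c.strip()]

def recover(chunk: str, source: str, min_ratio: float = 0.85) -> str:
    """Snap a chunk that deviates from the source back to its closest matching span
    (a stand-in for MoC's edit-distance recovery)."""
    candidates = regex_chunker(source)
    best = max(candidates, key=lambda s: difflib.SequenceMatcher(None, chunk, s).ratio())
    ratio = difflib.SequenceMatcher(None, chunk, best).ratio()
    return best if ratio >= min_ratio else chunk

source = "RAG retrieves passages. Chunking decides their boundaries. Bad chunks hurt recall."
hallucinated = "Chunking decides there boundaries."   # slightly corrupted LLM output
print(recover(hallucinated, source))                  # -> "Chunking decides their boundaries."
```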

Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception
Summary:
• The paper proposes Meta-Chunking, a new text segmentation method for Retrieval-Augmented Generation (RAG) systems, targeting better logical coherence than traditional rule- or similarity-based chunking.
• Two core strategies are introduced (the second is sketched below):
1. Margin Sampling Chunking: uses an LLM to perform binary classification on consecutive sentences (merge or segment) based on margin probability sampling.
2. Perplexity (PPL) Chunking: calculates the perplexity of each sentence given the preceding sentences, identifying chunk boundaries where the perplexity distribution shows minima (logical break points).
Key Findings:
• Meta-Chunking outperformed rule-based and similarity-based chunking methods on 11 benchmarks, including single-hop and multi-hop QA tasks.
• Achieved a 1.32% improvement on 2WikiMultihopQA while reducing chunking time to 45.8% of previous LLM-based chunkers such as LumberChunker.
• The approach balances logical consistency, retrieval relevance, and efficiency, making it well suited to practical RAG pipelines.
Limitations:
• Assumes reliable sentence splitting before chunking, which may not be trivial for noisy or OCR-derived datasets.
• The effectiveness of Margin Sampling relies on how well the LLM can perceive logical connections; smaller or weaker LLMs might degrade performance.
• PPL Chunking requires sentence-by-sentence perplexity calculation, which adds computational cost during chunking pre-processing (although it is cheaper than the Gemini-based LumberChunker).
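
A minimal sketch of perplexity-based boundary detection, using GPT-2 from Hugging Face transformers as a small stand-in LLM; the boundary rule here is a simple perplexity-ratio threshold rather than the paper's exact minima criterion.

```python
# Start a new chunk when the running chunk barely lowers a sentence's perplexity,
# i.e., when the preceding context provides little predictive benefit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_ppl(context: str, sentence: str) -> float:
    """Perplexity of `sentence`, optionally conditioned on `context`."""
    ctx_ids = tok(context, return_tensors="pt").input_ids if context else None
    sent_ids = tok(" " + sentence, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, sent_ids], dim=1) if ctx_ids is not None else sent_ids
    labels = input_ids.clone()
    if ctx_ids is not None:
        labels[:, : ctx_ids.shape[1]] = -100   # score only the candidate sentence
    with torch.no_grad():
        loss = lm(input_ids, labels=labels).loss
    return float(torch.exp(loss))

def ppl_chunk(sentences, ratio: float = 0.9):
    chunks, current = [], [sentences[0]]
    for sent in sentences[1:]:
        if sentence_ppl(" ".join(current), sent) > ratio * sentence_ppl("", sent):
            chunks.append(" ".join(current))   # weak contextual link: logical break
            current = [sent]
        else:
            current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```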

ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems
Summary:
1. Semantic chunking: documents are first segmented into logical, meaningful units (sentences or coherent sections) using semantic-based tokenization.
2. Relevance scoring via LLM: a large language model evaluates each chunk individually, assigning a relevance score based on how well the chunk aligns with the user query.
3. Chunk filtering: only the most relevant chunks (above a dynamic threshold) are selected and passed forward into the retrieval pipeline (sketched below).
4. Dynamic thresholding: instead of a fixed relevance cut-off, the threshold is adjusted dynamically based on document complexity and query specificity.
5. Critic model feedback (optional): a second LLM (the "Critic") re-assesses the selected chunks to further improve relevance filtering and catch false positives.
Key Findings:
• ChunkRAG substantially reduced hallucination and irrelevance compared to standard RAG setups.
• Improved factual accuracy on knowledge-intensive tasks (e.g., complex QA benchmarks).
• Achieved better retrieval precision by preventing irrelevant information from "leaking" into final generations.
• Demonstrated that chunk-level filtering is more effective than document-level or passage-level methods for high-stakes reasoning tasks.
Limitations:
• Running an LLM to assess each chunk adds significant computational cost compared to traditional retrieval pipelines.
• Relevance-scoring quality depends strongly on how well the primary LLM understands subtle, domain-specific context.
• For extremely large documents, even semantic chunking and per-chunk evaluation can become inefficient if not batched or optimized.
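
A minimal sketch of chunk-level filtering with a dynamic threshold; the `score_chunk` stub stands in for the LLM relevance judge, and the mean-plus-spread rule is just one simple way to adapt the cut-off to the score distribution.

```python
# Score every chunk, derive a threshold from the score distribution, keep the rest.
import statistics

def score_chunk(query: str, chunk: str) -> float:
    """Placeholder: ask an LLM to rate chunk relevance to the query on [0, 1]."""
    raise NotImplementedError

def filter_chunks(query: str, chunks: list, spread: float = 0.5) -> list:
    scored = [(c, score_chunk(query, c)) for c in chunks]
    scores = [s for _, s in scored]
    # Dynamic threshold: adapts to how concentrated or spread out the scores are.
    threshold = statistics.mean(scores) + spread * statistics.pstdev(scores)
    kept = [c for c, s in scored if s >= threshold]
    # An optional critic LLM could re-check `kept` here before generation.
    return kept
```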