malteos/awesome-prompt-optimization

Awesome Prompt Optimization ✨

A curated collection of resources for prompt engineering, optimization, and automatic prompt generation across text, image, video, and multimodal AI systems.

Prompt optimization is the systematic process of improving prompts to achieve better AI model performance, consistency, and safety. This collection covers everything from manual techniques and best practices to cutting-edge automated optimization frameworks, evaluation tools, and research papers.

Contents

Manual Prompt Engineering

Best Practices and Techniques

Chain-of-Thought and Reasoning

Domain-Specific Optimization

Interactive Learning

Automatic Prompt Optimization

Frameworks and Libraries

  • DSPy - Stanford's declarative framework for programming language models with automatic optimization (20k+ stars)
  • Automatic Prompt Engineer (APE) - Treats the instruction as a "program" and optimizes it by searching over a pool of LLM-proposed candidates, reporting up to 25% improvement on benchmarks
  • EvoPrompt - Microsoft's evolutionary approach connecting LLMs with evolutionary algorithms (genetic algorithm and differential evolution variants) for discrete prompt optimization
  • RLPrompt - Formulates discrete prompt optimization as a reinforcement learning problem, training a policy network to generate prompts

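The propose-then-select loop behind APE can be sketched in a few lines. This is a minimal illustration under stated assumptions, not APE's implementation: the hypothetical `toy_model` stands in for an LLM call, the candidate instructions are hard-coded rather than proposed by an LLM, and each candidate is scored by exact-match accuracy on a tiny labeled dev set.

```python
from typing import Callable

# Hypothetical stand-in for an LLM: it only "follows" instructions that
# mention uppercase, so different candidates receive different scores.
def toy_model(instruction: str, text: str) -> str:
    if "uppercase" in instruction.lower():
        return text.upper()
    return text

def score(instruction: str, dev_set: list[tuple[str, str]],
          model: Callable[[str, str], str]) -> float:
    """Fraction of dev examples where the model output exactly matches the label."""
    hits = sum(model(instruction, x) == y for x, y in dev_set)
    return hits / len(dev_set)

def select_best(candidates: list[str], dev_set: list[tuple[str, str]],
                model: Callable[[str, str], str]) -> tuple[str, float]:
    """Score every candidate instruction and keep the highest-scoring one."""
    scored = [(c, score(c, dev_set, model)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])

dev = [("hello", "HELLO"), ("prompt", "PROMPT")]
# In APE these candidates would be generated by an LLM from demonstrations.
candidates = ["Repeat the word.", "Rewrite the word in uppercase.", "Echo the input."]
best, acc = select_best(candidates, dev, toy_model)
print(best, acc)  # Rewrite the word in uppercase. 1.0
```

In a real setup the candidate pool is regenerated or resampled around the top scorers, and the dev set is large enough that scores separate meaningfully.
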
Advanced Optimization Methods

  • GeneticPromptLab - Python library using genetic algorithms with Sentence Transformers and k-means clustering
  • PromptAgent - Monte Carlo Tree Search approach for expert-level prompt optimization using strategic planning
  • Promptomatix (Salesforce) - AI-driven framework with DSPy integration and real-time human feedback

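Genetic approaches such as GeneticPromptLab and EvoPrompt evolve a population of prompts with selection, crossover, and mutation. The sketch below is a generic genetic algorithm, not either project's code: the fragment pool and the keyword-based `fitness` function are hypothetical stand-ins, whereas real systems score prompts on a dev set and typically use an LLM (or embeddings) to perform crossover and mutation.

```python
import random

# Hypothetical pool of instruction fragments (assumption for illustration).
FRAGMENTS = ["Think step by step.", "Answer concisely.", "Cite the passage.",
             "Be polite.", "Use bullet points.", "Check your arithmetic."]

def fitness(prompt: list[str]) -> int:
    # Toy objective standing in for dev-set accuracy: reward two target fragments.
    targets = {"Think step by step.", "Check your arithmetic."}
    return len(targets & set(prompt))

def crossover(a: list[str], b: list[str], rng: random.Random) -> list[str]:
    # Single-point crossover on the fragment lists.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(prompt: list[str], rng: random.Random, rate: float = 0.3) -> list[str]:
    # Replace each fragment with a random one with probability `rate`.
    return [rng.choice(FRAGMENTS) if rng.random() < rate else f for f in prompt]

def evolve(pop_size: int = 8, length: int = 3, generations: int = 20, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    population = [[rng.choice(FRAGMENTS) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitist truncation selection
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(" ".join(best), fitness(best))
```

Keeping the parents unchanged (elitism) guarantees the best fitness never decreases across generations.
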
Meta-Prompting and Instruction Optimization

  • GEPA (Genetic-Pareto) - Reflective prompt evolution that outperforms GRPO by 10% on average while using up to 35x fewer rollouts
  • AutoPrompt - Intent-based prompt calibration framework for production moderation and classification tasks

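The "Pareto" in GEPA refers to keeping a set of non-dominated candidates rather than only the single best-on-average prompt, so prompts that excel on different examples can survive. The following is a simplified, generic Pareto filter over per-example scores; GEPA's actual candidate selection and its LLM-driven reflective mutation differ in detail, and the candidate names and scores are hypothetical.

```python
def dominates(a: list[float], b: list[float]) -> bool:
    """True if a is at least as good as b on every example and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores: dict[str, list[float]]) -> list[str]:
    """Names of candidates whose per-example score vectors no other candidate dominates."""
    return [name for name, s in scores.items()
            if not any(dominates(other, s) for o, other in scores.items() if o != name)]

# Hypothetical per-example scores for three candidate prompts on four examples.
scores = {
    "prompt_a": [1, 0, 1, 0],
    "prompt_b": [0, 1, 0, 1],
    "prompt_c": [0, 0, 1, 0],  # dominated by prompt_a
}
print(pareto_front(scores))  # ['prompt_a', 'prompt_b']
```

Both `prompt_a` and `prompt_b` average 0.5, yet each solves examples the other misses, which is exactly the diversity a Pareto set preserves.
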
Multimodal Prompting

Text-to-Image Optimization

  • Comprehensive Text-to-Image Guide - Cross-platform strategies for DALL-E, Midjourney, and Stable Diffusion with structured templates
  • Style Transfer Methodology - Separating style and content prompts for consistent aesthetic control
  • NegOpt Research - Automated negative prompt optimization achieving a 25% improvement in Inception Score
  • PromptHero - Large searchable prompt database (self-described as the world's largest) with millions of prompts across major image generation models

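Separating content, style, and negative terms, as the style-transfer and NegOpt entries suggest, is easy to enforce with a small structured template. This sketch is an assumption for illustration: the `ImagePrompt` class and its field names are hypothetical, and the negative text is meant for Stable Diffusion-style pipelines that accept a separate negative prompt.

```python
from dataclasses import dataclass, field, replace

@dataclass
class ImagePrompt:
    subject: str                                         # content: what to depict
    style: list[str] = field(default_factory=list)       # aesthetic modifiers, kept separate
    negative: list[str] = field(default_factory=list)    # terms to suppress

    def positive_text(self) -> str:
        return ", ".join([self.subject, *self.style])

    def negative_text(self) -> str:
        return ", ".join(self.negative)

base = ImagePrompt("a lighthouse on a cliff at dusk",
                   style=["watercolor", "soft lighting"],
                   negative=["blurry", "text", "watermark"])
# Swap the content while keeping the same aesthetic and negative terms.
variant = replace(base, subject="a fishing boat in a harbor")

print(base.positive_text())     # a lighthouse on a cliff at dusk, watercolor, soft lighting
print(variant.positive_text())  # a fishing boat in a harbor, watercolor, soft lighting
print(base.negative_text())     # blurry, text, watermark
```

Because style lives in its own field, a consistent look can be reused across many subjects without copy-editing each prompt string.
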
Vision-Language Models

  • Awesome Prompting on Vision-Language Model - Comprehensive survey of prompting methods for CLIP, BLIP, LLaVA, and GPT-4V
  • LLaVA - Visual instruction tuning framework trained in two stages on GPT-4-generated multimodal instruction-following data
  • ViP-LLaVA - Framework for understanding visual prompts with rectangles, ellipses, points, and arrows

Video and Audio Prompting

Multimodal Integration

Tools and Frameworks

Development and Testing

  • promptfoo - Developer-friendly testing tool with red teaming, CI/CD integration, and multi-provider support (5k+ stars)
  • Microsoft PromptBench - Unified evaluation framework supporting 30+ models with adversarial testing and dynamic evaluation
  • DeepEval - Pytest-like framework with G-Eval, hallucination detection, and safety vulnerability scanning
  • PromptTools (Hegel AI) - Open-source platform for prompt testing with local playground and Jupyter integration

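Testing tools like promptfoo and DeepEval express prompt tests as assertions over model outputs. The dependency-free sketch below illustrates that idea only; it is not either tool's API, and `fake_model` is a hypothetical placeholder for a real provider call.

```python
from typing import Callable

# Hypothetical model call; replace with a real provider client.
def fake_model(prompt: str) -> str:
    return "Paris is the capital of France."

# Each check is a (description, predicate) pair applied to the model output.
Check = tuple[str, Callable[[str], bool]]

def run_case(prompt: str, checks: list[Check],
             model: Callable[[str], str] = fake_model) -> list[str]:
    """Return descriptions of the checks that failed (an empty list means pass)."""
    output = model(prompt)
    return [desc for desc, ok in checks if not ok(output)]

checks: list[Check] = [
    ("mentions Paris", lambda out: "paris" in out.lower()),
    ("under 200 characters", lambda out: len(out) < 200),
    ("no apology", lambda out: "sorry" not in out.lower()),
]
failures = run_case("What is the capital of France?", checks)
print(failures or "all checks passed")  # all checks passed
```

Running such cases for every prompt change in CI is what makes regressions visible before deployment.
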
IDE Extensions and Development Environment

  • VS Code Prompt Runner - Turns VS Code into a prompt IDE with multi-provider support and agent workflows
  • Microsoft Prompty Extension - VS Code extension for single-file prompt assets with built-in execution and debugging
  • LangChain Visualizer - Real-time visualization for LangChain workflows with cost tracking and interactive trace inspection

Production and Optimization

  • LangChain - Comprehensive framework with prompt templates, chains, and agent frameworks (90k+ stars)
  • Promptim (LangChain Labs) - Experimental optimization library with automated loops and human feedback integration

Research Papers

Foundational Work

Automatic Prompt Engineering

Evaluation and Robustness

  • The Prompt Report (2024) - Large systematic survey with a taxonomy of 58 text-based prompting techniques and 40 techniques for other modalities
  • Benchmarking LLM Uncertainty - Introduces a benchmark for evaluating uncertainty metrics in prompt optimization, covering Answer, Correctness, Aleatoric, and Epistemic uncertainty

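One simple way to approximate answer uncertainty is to sample several answers to the same prompt and measure the entropy of the resulting answer distribution. This is a generic sketch, assuming answers can be normalized by lowercasing and stripping whitespace; it is not the benchmark's own metric definition.

```python
import math
from collections import Counter

def answer_entropy(samples: list[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution over sampled answers."""
    counts = Counter(s.strip().lower() for s in samples)
    n = len(samples)
    # p * log2(1/p) summed over distinct answers; 0.0 means all samples agree.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(answer_entropy(["42", "42", "42", "42"]))  # 0.0
print(answer_entropy(["42", "41", "42", "41"]))  # 1.0
```

A prompt variant that lowers this entropy on a dev set yields more consistent answers, though consistency alone does not imply correctness.
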
Multimodal and Specialized Applications

  • Visual Prompting in MLLMs Survey - First comprehensive survey on visual prompting methods examining alignment between visual encoders and LLMs
  • Transferability of Visual Prompts - Proposes Transferable Visual Prompting, enabling cross-MLLM prompt transfer with feature consistency alignment
  • MathCoder2 - Improves mathematical reasoning through continued pretraining on model-translated mathematical code, released as the MathCode-Pile dataset

Datasets and Benchmarks

Comprehensive Evaluation Frameworks

  • PromptBench (Microsoft) - Unified framework supporting GLUE, MMLU, BigBench Hard, GSM8K with adversarial and dynamic evaluation
  • HELM (Stanford) - Holistic evaluation across 16 scenarios and 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, efficiency)
  • BigBench Hard - 23 challenging BIG-Bench tasks on which prior language model evaluations did not outperform the average human rater; widely used to evaluate chain-of-thought prompting

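Calibration, one of the HELM metrics listed above, is commonly measured with expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between accuracy and mean confidence is averaged, weighted by bin size. The sketch below is a standard equal-width-bin ECE, not HELM's own code; the 10-bin default is an assumption.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool],
                               n_bins: int = 10) -> float:
    """Weighted average of |accuracy - mean confidence| over equal-width confidence bins."""
    n = len(confidences)
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # Clamp so that a confidence of exactly 1.0 falls into the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    ece = 0.0
    for idx in bins:
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece

# A model that reports 0.9 confidence but is right only half the time is poorly calibrated.
print(round(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, False, True, False]), 2))  # 0.4
```
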
Prompt Collections and Templates

Mathematical and Reasoning Benchmarks

  • GSM8K - 8.5K high-quality grade school math word problems (7.5K training, 1K test) with 2-8 step solutions
  • MMLU (Massive Multitask Language Understanding) - 15,908 questions across 57 subjects from elementary to professional level
  • MultiArith - Arithmetic word problems requiring multi-step reasoning for chain-of-thought evaluation

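GSM8K reference solutions end with a final line of the form `#### <answer>`, so scoring usually reduces to extracting a number from the model output and comparing it with that value. The extraction rule below (take the last number in the output) is a common heuristic and an assumption here, not an official scorer; the sample outputs are made up.

```python
import re
from typing import Optional

def gold_answer(solution: str) -> str:
    """Read the final answer from a GSM8K-style '#### <answer>' line."""
    return solution.split("####")[-1].strip().replace(",", "")

def predicted_answer(output: str) -> Optional[str]:
    """Heuristic: take the last number in the model output, dropping commas and a trailing period."""
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", output)
    return numbers[-1].replace(",", "").rstrip(".") if numbers else None

def accuracy(outputs: list[str], solutions: list[str]) -> float:
    hits = sum(predicted_answer(o) == gold_answer(s) for o, s in zip(outputs, solutions))
    return hits / len(solutions)

# Hypothetical reference solutions and model outputs.
solutions = ["She has 48 + 24 = 72 clips.\n#### 72", "Total is 1,200 dollars.\n#### 1200"]
outputs = ["Step 1: 48 + 24. So the answer is 72.", "The answer is 1,100."]
print(accuracy(outputs, solutions))  # 0.5
```

Extraction choices like this can shift reported accuracy noticeably, so comparisons between prompts should use the same scorer.
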
Bias and Safety Evaluation

  • BOLD Dataset - 23,679 prompts across 5 demographic domains for bias evaluation in open-ended language generation
  • HolisticBias - About 460,000 sentence prompts built from nearly 600 descriptor terms across 13 demographic axes
  • TruthfulQA - Benchmark of 817 questions spanning 38 categories, adversarially designed to elicit imitative falsehoods and test truthfulness

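HolisticBias-style datasets are built combinatorially by filling sentence templates with demographic descriptors and nouns, then comparing model behavior across the resulting prompts. The sketch shows only the expansion step; the templates, descriptors, and nouns are hypothetical and much smaller than the dataset's curated lists.

```python
from itertools import product

# Hypothetical templates and terms (assumption for illustration).
TEMPLATES = ["I'm a {descriptor} {noun}.", "What do you think about {descriptor} {noun}s?"]
DESCRIPTORS = {"age": ["young", "middle-aged"], "ability": ["deaf", "blind"]}
NOUNS = ["person", "parent"]

def build_prompts() -> list[tuple[str, str]]:
    """Cross every template with every descriptor and noun, keeping the axis label."""
    rows = []
    for axis, terms in DESCRIPTORS.items():
        for template, term, noun in product(TEMPLATES, terms, NOUNS):
            rows.append((axis, template.format(descriptor=term, noun=noun)))
    return rows

prompts = build_prompts()
print(len(prompts))  # 16
print(prompts[0])    # ('age', "I'm a young person.")
```

Keeping the axis label with each prompt lets downstream metrics (for example, toxicity or sentiment of the continuation) be compared per demographic axis.
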
Multimodal Evaluation Suites

  • VQAv2 - Visual question answering with image-text pairs for vision-language model evaluation
  • MMBench - 3,000 single-choice questions across 20 vision skills with bilingual support
  • MathVista - Visual mathematical reasoning evaluation combining multiple math datasets with visual elements

Educational Resources

Beginner-Friendly Guides

Advanced and Specialized Training

Community Resources

  • Awesome Prompt Engineering - Hand-curated collection focusing on GPT, ChatGPT, and PaLM with tools, datasets, and research papers (5k+ stars)
  • The Big Prompt Library - System prompts and custom instructions with multi-provider support and security focus
  • Awesome ChatGPT Prompts (f/awesome-chatgpt-prompts) - Curated collection of ready-to-use ChatGPT prompts for creative and professional applications

Contributing

We welcome contributions! Here's how you can help improve this collection:

  1. Adding Resources: Submit PRs with new tools, papers, or datasets that fit our focus on open-source and research resources
  2. Improving Descriptions: Help make descriptions more accurate and helpful for both beginners and experts
  3. Categorization: Suggest better organization or new categories as the field evolves
  4. Quality Control: Report broken links, outdated information, or resources that no longer meet quality standards

Contribution Guidelines

  • Focus on open-source tools and freely accessible resources
  • Include brief but informative descriptions (1-2 sentences)
  • Provide context on target audience (beginner/intermediate/advanced)
  • Verify links work and resources are actively maintained
  • Follow the existing format and categorization structure

Quality Criteria

  • Educational Value: Resources should teach techniques or provide practical value
  • Accessibility: Prefer free and open-source over commercial/proprietary tools
  • Recency: Favor resources updated within the last 2 years unless foundational
  • Documentation: Well-documented tools and clear usage instructions

Maintained by the community. Licensed under CC0. Star this repo if it helps you!
