🛠️ Generate high-quality code automatically with EDAF, a flexible system that adapts to project needs through integrated quality checks.
Explainable developer assessment platform with multi-path evaluation, evidence-based scoring, and growth-focused feedback. Helps candidates understand their strengths and areas for improvement. Built with Python, FastAPI, and Docker.
A framework-agnostic system for AI-powered code generation with automatic quality gates. Self-adapting workers and evaluators for any language/framework.
A production-ready worker service for secure, isolated code execution. Consumes submissions from Kafka, executes programs in Docker containers with resource limits, and publishes results. Supports Python, Go, C, C++, and Java with horizontal scaling.
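A minimal Python sketch of the pattern this entry describes, assuming the kafka-python and docker SDKs, a hypothetical `submissions` topic, and an illustrative message schema; the actual service's topics, fields, and limits will differ.

```python
import json

import docker                      # Docker SDK for Python
from kafka import KafkaConsumer    # kafka-python client

# Hypothetical topic name and broker address for illustration only.
consumer = KafkaConsumer(
    "submissions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
docker_client = docker.from_env()

for message in consumer:
    submission = message.value  # e.g. {"language": "python", "code": "print(1 + 1)"}
    # Run the submitted code in an isolated container with resource limits.
    output = docker_client.containers.run(
        image="python:3.12-slim",
        command=["python", "-c", submission["code"]],
        mem_limit="128m",
        nano_cpus=500_000_000,      # roughly half a CPU core
        network_disabled=True,
        remove=True,
    )
    print(output.decode())          # a real worker would publish this to a results topic
```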
A Streamlit web app that uses a Groq-powered LLM (Llama 3) to act as an impartial judge for evaluating and comparing two model outputs. Supports custom criteria, presets like creativity and brand tone, and returns structured scores, explanations, and a winner. Built end-to-end with Python, Groq API, and Streamlit.
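A rough sketch of the pairwise LLM-judge idea, assuming the groq Python SDK's OpenAI-style chat interface; the model id, prompt wording, and output format here are illustrative, not the app's actual ones.

```python
from groq import Groq  # pip install groq

client = Groq()  # reads GROQ_API_KEY from the environment

def judge(criteria: str, output_a: str, output_b: str) -> str:
    """Ask a Llama 3 model to compare two outputs against the given criteria."""
    prompt = (
        f"You are an impartial judge. Criteria: {criteria}\n\n"
        f"Output A:\n{output_a}\n\nOutput B:\n{output_b}\n\n"
        "Score each output from 1-10, explain briefly, and declare a winner."
    )
    response = client.chat.completions.create(
        model="llama3-70b-8192",   # illustrative model id
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(judge("brand tone", "Buy now!!!", "Discover what fits you best."))
```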
Artha is a code evaluation system developed with Django and Django REST Framework that uses Judge0 as the code execution engine.
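For context, a minimal sketch of handing code to Judge0 over its REST API, assuming a self-hosted instance at a hypothetical URL; Artha's own views and fields may differ.

```python
import requests

JUDGE0_URL = "http://localhost:2358"  # hypothetical self-hosted Judge0 instance

def run_submission(source_code: str, language_id: int, stdin: str = "") -> dict:
    """Submit code to Judge0 and wait synchronously for the verdict."""
    response = requests.post(
        f"{JUDGE0_URL}/submissions",
        params={"base64_encoded": "false", "wait": "true"},
        json={"source_code": source_code, "language_id": language_id, "stdin": stdin},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # includes stdout, stderr, time, memory, and status

# Language id 71 is Python 3 in Judge0's default language table.
print(run_submission("print(int(input()) * 2)", language_id=71, stdin="21"))
```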
Rank and repair AI-generated Lua game code with pure-Lua tests and a tiny LÖVE harness. Includes a scheduler, toy linter, micro-benchmark, GDScript port, and a weighted rubric.
An open-source Python library for code encryption, decryption, and safe evaluation using Python's built-in AST module, with allowlisted functions, variables, and imports, execution timeouts, and blocked attribute access.
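A tiny sketch of the AST-based safe-evaluation idea (not this library's actual API): parse the expression, reject any node type or name outside an allowlist, and only then evaluate.

```python
import ast

ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name,
    ast.Call, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.Load,
)
ALLOWED_NAMES = {"abs": abs, "min": min, "max": max, "round": round}

def safe_eval(expression: str):
    """Evaluate a simple expression after checking every AST node against an allowlist."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"Disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            raise ValueError(f"Disallowed name: {node.id}")
    return eval(compile(tree, "<safe_eval>", "eval"), {"__builtins__": {}}, ALLOWED_NAMES)

print(safe_eval("max(2 * 3, abs(-10))"))   # 10
# safe_eval("().__class__") raises ValueError: attribute access is not allowlisted
```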
A calculator for arithmetic and algebraic expressions with support for functions.
SocratiQ AI uses the Socratic method of teaching to guide users through learning, asking questions that prompt critical thinking and problem-solving rather than providing direct answers.
Gowlin: An open-source, secure autograder for LLM agent development and evaluation.
Python toolkit for automated evaluation and benchmarking of code efficiency, performance, and resource usage. Easily analyze, compare, and score scripts or code snippets in a fast, modular CLI workflow.
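As a rough illustration of what such a toolkit measures (not its actual CLI or API), here is a standard-library sketch that scores a snippet by wall-clock time and peak allocated memory:

```python
import timeit
import tracemalloc

def profile_snippet(snippet: str, repeat: int = 5, number: int = 1000) -> dict:
    """Return best wall-clock time per run and peak allocated memory for a code snippet."""
    best_seconds = min(timeit.repeat(snippet, repeat=repeat, number=number)) / number

    tracemalloc.start()
    exec(snippet, {})                      # one instrumented run for memory
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {"best_time_s": best_seconds, "peak_memory_kb": peak_bytes / 1024}

print(profile_snippet("sorted(range(1000), reverse=True)"))
print(profile_snippet("[x * x for x in range(1000)]"))
```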
Industrial-grade evaluation benchmarks for coding LLMs covering the full life cycle of AI-native software development. An enterprise-level code LLM evaluation suite, released on an ongoing basis.
Pip-installable CodeBLEU metric implementation available for Linux, macOS, and Windows.
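A hedged usage sketch, assuming the pip `codebleu` package exposes a `calc_codebleu` helper as in its README; names and defaults may differ in the release you install.

```python
from codebleu import calc_codebleu  # pip install codebleu

reference = "def add(a, b):\n    return a + b"
prediction = "def add(x, y):\n    return y + x"

# Compares the prediction against the reference using n-gram, syntax, and data-flow matches.
result = calc_codebleu([reference], [prediction], lang="python")
print(result)  # dict with the overall CodeBLEU score and its component scores
```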
CodeQuest is a personal project that replicates the core functionalities of the popular coding practice platform, LeetCode. This clone allows users to practice coding problems, track their progress, and participate in coding challenges, all within a self-hosted environment.
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
The SF Code Evaluator
Frontend for the Codemaze backend
Backend for automated evaluation of programming tasks in higher education
🕸️ Sinatra server for validating web pages with HTML, CSS, and JavaScript within Mumuki