🛠️ Generate high-quality code automatically with EDAF, a flexible system that adapts to project needs through integrated quality checks.
Explainable developer assessment platform with multi-path evaluation, evidence-based scoring, and growth-focused feedback. Helps candidates understand their strengths and areas for improvement. Built with Python, FastAPI, and Docker.
A framework-agnostic system for AI-powered code generation with automatic quality gates. Self-adapting workers and evaluators for any language/framework.
A production-ready worker service for secure, isolated code execution. Consumes submissions from Kafka, executes programs in Docker containers with resource limits, and publishes results. Supports Python, Go, C, C++, and Java with horizontal scaling.
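A minimal Python sketch of the pattern this entry describes, assuming the kafka-python and docker SDKs, a hypothetical `submissions` topic, and an illustrative message schema; the actual service's topics, fields, and limits will differ.

```python
import json

import docker                      # Docker SDK for Python
from kafka import KafkaConsumer    # kafka-python client

# Hypothetical topic name and broker address for illustration only.
consumer = KafkaConsumer(
    "submissions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
docker_client = docker.from_env()

for message in consumer:
    submission = message.value  # e.g. {"language": "python", "code": "print(1 + 1)"}
    # Run the submitted code in an isolated container with resource limits.
    output = docker_client.containers.run(
        image="python:3.12-slim",
        command=["python", "-c", submission["code"]],
        mem_limit="128m",
        nano_cpus=500_000_000,      # roughly half a CPU core
        network_disabled=True,
        remove=True,
    )
    print(output.decode())          # a real worker would publish this to a results topic
```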
A Streamlit web app that uses a Groq-powered LLM (Llama 3) to act as an impartial judge for evaluating and comparing two model outputs. Supports custom criteria, presets like creativity and brand tone, and returns structured scores, explanations, and a winner. Built end-to-end with Python, Groq API, and Streamlit.
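A rough sketch of the pairwise LLM-judge idea, assuming the groq Python SDK's OpenAI-style chat interface; the model id, prompt wording, and output format here are illustrative, not the app's actual ones.

```python
from groq import Groq  # pip install groq

client = Groq()  # reads GROQ_API_KEY from the environment

def judge(criteria: str, output_a: str, output_b: str) -> str:
    """Ask a Llama 3 model to compare two outputs against the given criteria."""
    prompt = (
        f"You are an impartial judge. Criteria: {criteria}\n\n"
        f"Output A:\n{output_a}\n\nOutput B:\n{output_b}\n\n"
        "Score each output from 1-10, explain briefly, and declare a winner."
    )
    response = client.chat.completions.create(
        model="llama3-70b-8192",   # illustrative model id
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(judge("brand tone", "Buy now!!!", "Discover what fits you best."))
```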
Artha is a code evaluation system developed with Django and Django REST Framework that uses Judge0 as the code execution engine.
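For context, a minimal sketch of handing code to Judge0 over its REST API, assuming a self-hosted instance at a hypothetical URL; Artha's own views and fields may differ.

```python
import requests

JUDGE0_URL = "http://localhost:2358"  # hypothetical self-hosted Judge0 instance

def run_submission(source_code: str, language_id: int, stdin: str = "") -> dict:
    """Submit code to Judge0 and wait synchronously for the verdict."""
    response = requests.post(
        f"{JUDGE0_URL}/submissions",
        params={"base64_encoded": "false", "wait": "true"},
        json={"source_code": source_code, "language_id": language_id, "stdin": stdin},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # includes stdout, stderr, time, memory, and status

# Language id 71 is Python 3 in Judge0's default language table.
print(run_submission("print(int(input()) * 2)", language_id=71, stdin="21"))
```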
Rank and repair AI-generated Lua game code with pure-Lua tests and a tiny LÖVE harness. Includes a scheduler, toy linter, micro-benchmark, GDScript port, and a weighted rubric.
An open-source Python library for code encryption, decryption, and safe evaluation using Python's built-in AST module, with allowlisted functions, variables, and imports, execution timeouts, and blocked attribute access.
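A tiny sketch of the AST-based safe-evaluation idea (not this library's actual API): parse the expression, reject any node type or name outside an allowlist, and only then evaluate.

```python
import ast

ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name,
    ast.Call, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.Load,
)
ALLOWED_NAMES = {"abs": abs, "min": min, "max": max, "round": round}

def safe_eval(expression: str):
    """Evaluate a simple expression after checking every AST node against an allowlist."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"Disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            raise ValueError(f"Disallowed name: {node.id}")
    return eval(compile(tree, "<safe_eval>", "eval"), {"__builtins__": {}}, ALLOWED_NAMES)

print(safe_eval("max(2 * 3, abs(-10))"))   # 10
# safe_eval("().__class__") raises ValueError: attribute access is not allowlisted
```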
A calculator for arithmetic and algebraic expressions with support for functions.
SocratiQ AI uses the Socratic method of teaching to guide users through learning, asking questions that prompt critical thinking and problem-solving rather than providing direct answers.
Gowlin: An open-source, secure autograder for LLM agent development and evaluation.
Python toolkit for automated evaluation and benchmarking of code efficiency, performance, and resource usage. Easily analyze, compare, and score scripts or code snippets in a fast, modular CLI workflow.
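As a rough illustration of what such a toolkit measures (not its actual CLI or API), here is a standard-library sketch that scores a snippet by wall-clock time and peak allocated memory:

```python
import timeit
import tracemalloc

def profile_snippet(snippet: str, repeat: int = 5, number: int = 1000) -> dict:
    """Return best wall-clock time per run and peak allocated memory for a code snippet."""
    best_seconds = min(timeit.repeat(snippet, repeat=repeat, number=number)) / number

    tracemalloc.start()
    exec(snippet, {})                      # one instrumented run for memory
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {"best_time_s": best_seconds, "peak_memory_kb": peak_bytes / 1024}

print(profile_snippet("sorted(range(1000), reverse=True)"))
print(profile_snippet("[x * x for x in range(1000)]"))
```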
Industrial-grade evaluation benchmarks for coding LLMs covering the full life cycle of AI-native software development. An enterprise-level code LLM evaluation suite, released on an ongoing basis.
Pip-installable CodeBLEU metric implementation available for Linux, macOS, and Windows.
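A hedged usage sketch, assuming the pip `codebleu` package exposes a `calc_codebleu` helper as in its README; names and defaults may differ in the release you install.

```python
from codebleu import calc_codebleu  # pip install codebleu

reference = "def add(a, b):\n    return a + b"
prediction = "def add(x, y):\n    return y + x"

# Compares the prediction against the reference using n-gram, syntax, and data-flow matches.
result = calc_codebleu([reference], [prediction], lang="python")
print(result)  # dict with the overall CodeBLEU score and its component scores
```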
CodeQuest is a personal project that replicates the core functionalities of the popular coding practice platform, LeetCode. This clone allows users to practice coding problems, track their progress, and participate in coding challenges, all within a self-hosted environment.
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
The SF Code Evaluator
Frontend for the Codemaze backend
Backend for automated evaluation of programming tasks in higher education
🕸️ Sinatra server for validating web pages with HTML, CSS, and JavaScript within Mumuki