This folder contains all the tools and resources needed for participants to start hacking Biology.
- **Train set ground truth and test set data:** The data should already be present in your instance at `/home/ec2-user/SageMaker/data/`.
- **Install dependencies:** Create a virtual environment and install all dependencies (via uv):

  ```bash
  bash install.sh
  ```
- **Start the vLLM server (once per team):** Start your own inference endpoint:

  ```bash
  bash start_vllm_docker.sh
  ```

  This starts a local LLM endpoint for you and your team, which you can access at e.g. `localhost:8000` (check the contents of `start_vllm_docker.sh`). A minimal querying sketch follows this list.
- **Optional:** Create a HF token for your account if you plan to use a private model or one with a required user agreement.
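If you want to query the endpoint outside the notebooks, here is a minimal sketch using the OpenAI-compatible API that vLLM exposes. The port is an assumption; check `start_vllm_docker.sh` and `vllm_demo.ipynb` for the exact values:

```python
# Minimal sketch: query the team vLLM endpoint via its OpenAI-compatible API.
# The base_url assumes the default port from start_vllm_docker.sh.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Ask the server which model it is serving rather than hard-coding a name
model_id = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Is G3BP1 druggable with monoclonal antibodies? Options: A) No, B) Yes"}],
    temperature=0.0,
)
print(response.choices[0].message.content)
```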
- `vllm_demo.ipynb` - Demo notebook showing how to use the vLLM server
- `smolagents_demo.ipynb` - Demo notebook for using SmolAgents. This might be a good solution to call Python code and use the output to answer Q&A questions (see the sketch after this list).
- `generate_answers_demo.ipynb` - Demo notebook for using the vLLM server to generate a submission
- `upload_answers.py` - Script to validate and upload JSONL files to S3
- `pyproject.toml` - Python dependencies
- `install.sh` - Installation script
- `start_vllm_docker.sh` - Script to start the vLLM Docker container
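As a taste of what the SmolAgents notebook covers, here is a minimal sketch of a `CodeAgent` backed by the local vLLM endpoint. The model id is a placeholder, and `smolagents_demo.ipynb` remains the canonical reference:

```python
# Minimal sketch: a SmolAgents CodeAgent that writes and runs Python code,
# backed by the local vLLM endpoint. The model id is a placeholder.
from smolagents import CodeAgent, OpenAIServerModel

model = OpenAIServerModel(
    model_id="Qwen/Qwen3-8B",             # placeholder; use the served model
    api_base="http://localhost:8000/v1",  # the team vLLM endpoint
    api_key="EMPTY",
)
agent = CodeAgent(tools=[], model=model)
print(agent.run("Compute the GC content of the sequence ATGGCCATTGTAATG."))
```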
The expected format of the answers is a JSONL file with the following fields:
```
{
  "question": "Is G3BP1 druggable with monoclonal antibodies?",  # From the original data
  "options": "{\"A\": \"No\", \"B\": \"Yes\"}",                  # From the original data
  "answer_letter": "A",                                          # Response extracted from the model's raw_response
  "raw_response": "<think>...</think><answer>A</answer>"         # Optional, but better to include it
}
```

One line per question.
You need to provide the question and options exactly as they appear in the original datasets, as they are used to look up the correct answer (the order of the questions does not matter).
Including the raw_response is optional for appearing on the leaderboard, but we will ask the winning team to provide it. In other words, we will only consider full submissions that include the raw_response for the leaderboard prize.
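As a sketch, a submission file can be assembled along these lines; the `records` list is a hypothetical stand-in for your model's outputs:

```python
# Minimal sketch: write the submission JSONL, one object per line.
# "question" and "options" are carried over verbatim from the data.
import json

records = [
    {
        "question": "Is G3BP1 druggable with monoclonal antibodies?",
        "options": "{\"A\": \"No\", \"B\": \"Yes\"}",
        "answer_letter": "A",
        "raw_response": "<think>...</think><answer>A</answer>",
    },
]

with open("test_answers.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```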
You can use the following script to upload your answers to the test set:
```bash
# Validate only
uv run python upload_answers.py test_answers.jsonl --team-name "Team1" --validate-only

# Upload
uv run python upload_answers.py test_answers.jsonl --team-name "Team1"

# Upload with team name and tag (e.g., model name)
uv run python upload_answers.py test_answers.jsonl --team-name "Team1" --tag "qwen3_8b_no_tooling"
```

The leaderboard is available at https://d18bag07vdubnx.cloudfront.net/
The total score is the percentage of test-set questions whose answer is both correctly formatted and correct.
The provided data has the following format:

```
{
"question": "Is G3BP1 druggable with monoclonal antibodies?",
"options": "{\"A\": \"No\", \"B\": \"Yes\"}",
"answer": "A",
"question_type": "antibody",
"metadata": "{\"target_protein\": \"G3BP1\", \"original_question\": \"Target X can be targeted by Monoclonal Ab ?\", \"original_answer\": 0, \"answer_type\": \"binary\", \"question_category\": \"subquestion 6\", \"template_used\": \"Is {target} druggable with monoclonal antibodies?\", \"data_row_index\": 70}",
"dataset_name": "Therapeutic Target Profiling"
}
```

```
{
"question": "Based on phenylbutazone (computed as the average activity of: CYP2C19, NR1I2, CYP2D6, CYP3A4, TP53, ESR2, EHMT2, CYP2C9, MCL1, PTGS2, and 6 more genes) signature activity patterns from bulk RNA-seq data, which cancer type is more similar to Pheochromocytoma and Paraganglioma?",
"options": "{\"A\": \"Ovarian serous cystadenocarcinoma\", \"B\": \"Prostate adenocarcinoma\"}",
"question_type": "cancer_similarity_binary",
"metadata": "{'options': array(['OV', 'PRAD'], dtype=object), 'signature': 'phenylbutazone', 'split': 'test', 'subject': 'PCPG'}",
"dataset_name": "TCGA Cancer Similarity"
}
```
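A minimal loading sketch, assuming the training file lives in the data directory above and contains one JSON record per line (adjust if it is a single JSON array). Note that `options` is itself a JSON-encoded string, and `metadata` is sometimes a Python repr rather than valid JSON, so parse it defensively:

```python
# Minimal sketch: load the provided data. The path combines the data
# directory and filename mentioned elsewhere in this README (an assumption).
import json

records = []
with open("/home/ec2-user/SageMaker/data/hackathon-train.json") as f:
    for line in f:
        if line.strip():
            records.append(json.loads(line))

for rec in records[:3]:
    options = json.loads(rec["options"])  # second parse: e.g. {"A": "No", "B": "Yes"}
    print(rec["question"], options, rec.get("answer"))
```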
- **vLLM Server Not Running**
  ```bash
  # Check if Docker is running
  docker ps

  # Check the GPU usage
  nvidia-smi

  # Start the server
  ./start_vllm_docker.sh
  ```
  If you want to start fresh, you can kill all Docker processes and all Python processes running on the GPU. Note that this will kill the processes for everyone on your team:
  ```bash
  # Kill all docker processes
  pkill -f "docker"

  # Kill all processes running on the GPU (querying the PIDs directly is
  # more robust than grepping nvidia-smi's table output, whose columns
  # vary across driver versions)
  nvidia-smi --query-compute-apps=pid --format=csv,noheader | xargs -n1 kill -9

  # Check again: the GPUs should now be free
  nvidia-smi
  ```
- **File Not Found Errors**
  - Ensure `hackathon-train.json` is in the current directory
  - Check file paths in the notebook
- **Answer Format Issues**
  - You can ask the LLM to format its answer as `<answer>[letter]</answer>`, where letter is A, B, C, D, etc., then use a regex like the one shown in `generate_answers_demo.ipynb` to extract the answer (see the sketch after this list). Bear in mind that the LLM might not always follow the format, so you may need to do some prompt engineering.
  - You just need to include the answer letter in the JSONL responses file. Make sure that it corresponds to one of the original options.
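A minimal extraction sketch; the regex in `generate_answers_demo.ipynb` may differ, so treat this as an assumption based on the `<answer>[letter]</answer>` convention above:

```python
# Minimal sketch: pull the answer letter out of a raw model response.
import re

def extract_answer(raw_response: str) -> str | None:
    """Return the letter inside <answer>...</answer>, or None if absent."""
    match = re.search(r"<answer>\s*([A-Z])\s*</answer>", raw_response)
    return match.group(1) if match else None

print(extract_answer("<think>...</think><answer>A</answer>"))  # -> A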
- **Encourage test-time compute:** Several studies have shown that LLMs can improve their performance with more test-time compute. You can encourage this by adding a prompt like "Think through the question step by step" or "First define each biological concept, then answer the question".
- **Choosing the right model:**
  - You do not have to use vLLM for inference; you can use any system that you like. Feel free to query your favorite assistant for a list of models that you can run on the available resources (8x L4 GPUs, 192 GB VRAM).
  - Some models are already finetuned on adjacent domains, e.g. MedGemma.
  - You can also look into larger quantized models published on Hugging Face (e.g. https://huggingface.co/models?other=base_model:quantized:openai/gpt-oss-120b).
- **Prompt optimization:** Careful prompting, in both the system prompt and the user requests, is key to good performance. This is where you can inject expert knowledge and guide the model's reasoning. You can also try some of the promising prompt optimization frameworks on the training set: GEPA, GAAPO, and Promptomatix are examples.
- **Post-training:** You can use the training set to fine-tune open-weights models, probably up to ~32B parameters with LoRA (see the sketch below). Unfortunately, the resources will not be enough for reinforcement learning.
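A minimal LoRA sketch using the `peft` library, not an official recipe; the base model and hyperparameters are placeholders:

```python
# Minimal sketch: wrap an open-weights model with LoRA adapters before
# supervised fine-tuning. Model name and hyperparameters are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # any open-weights model that fits in VRAM
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank updates
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then train on the training set with your favorite trainer
# (e.g. trl's SFTTrainer) and load the adapters for inference.
```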
Good luck with your submissions! 🧬