
bmdavis419/gpt-5-cfg-testing


GPT-5 can't do parallel tool calls with CFG schemas

I've run 50+ tests today, and I haven't been able to get GPT-5 to make a single parallel tool call when using a CFG (context-free grammar) schema.

This limitation isn't mentioned anywhere in the docs. My guess is that because the grammar constrains the model's output, the model can only emit one tool call at a time.
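For readers who haven't used the feature: a CFG tool is a "custom" tool whose free-form input is constrained by a grammar, while a normal function tool takes JSON arguments matching a schema. The sketch below builds both kinds of tool definition as plain dicts, without calling the API. The payload shape (`"type": "custom"` with `"format": {"type": "grammar", "syntax": "lark", ...}`, and the `parallel_tool_calls` flag) is my reading of the function-calling guide linked under About and should be checked against it. The `add_todo` tool, its grammar, and the helper names are hypothetical, not the repo's code.

```python
import json

# Hypothetical Lark grammar: one todo per call, e.g. add "buy milk" due 2025-08-10
TODO_GRAMMAR = r'''
start: "add " ESCAPED_STRING " due " DATE
DATE: /\d{4}-\d{2}-\d{2}/
%import common.ESCAPED_STRING
'''


def build_tools(use_cfg: bool) -> list:
    """Return a tools list for a Responses API request (shape assumed from the docs)."""
    if use_cfg:
        # Custom tool: the model's input is a raw string constrained by the grammar.
        return [{
            "type": "custom",
            "name": "add_todo",
            "description": "Add a single todo item.",
            "format": {"type": "grammar", "syntax": "lark", "definition": TODO_GRAMMAR},
        }]
    # Normal function tool: the model's arguments are JSON matching this schema.
    return [{
        "type": "function",
        "name": "add_todo",
        "description": "Add a single todo item.",
        "parameters": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "due": {"type": "string"}},
            "required": ["title", "due"],
            "additionalProperties": False,
        },
        "strict": True,
    }]


def build_request(prompt: str, use_cfg: bool) -> dict:
    # parallel_tool_calls is set explicitly; in these tests the CFG variant
    # still produced only one tool call per response.
    return {
        "model": "gpt-5-mini",
        "input": prompt,
        "tools": build_tools(use_cfg),
        "parallel_tool_calls": True,
    }


if __name__ == "__main__":
    print(json.dumps(build_request("Add three todos for tomorrow.", use_cfg=True), indent=2))
```

The only difference between the two modes is the tool definition; the rest of the request stays the same, which is what makes the request counts below comparable.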

Explanations are below; labeled screenshots are provided at the bottom.

FINDINGS

todos-test

GPT-5-MINI, MINIMAL REASONING EFFORT

| Mode | API calls to OpenAI | Breakdown |
|---|---|---|
| normal functions | 3 | 1 to get current date/time; 1 to add 3 todos; 1 for final summary |
| cfg functions | 5 | 1 to get current date/time; 1 to add 1 todo; 1 to add 1 todo; 1 to add 1 todo; 1 for final summary |

price-test

GPT-5, HIGH REASONING EFFORT

| Mode | API calls to OpenAI | Breakdown |
|---|---|---|
| normal functions | 2 | 1 with 4 tool calls (price and shipping info); 1 for final summary |
| cfg functions | 5 | 1 with 1 tool call (price info); 1 with 1 tool call (shipping info); 1 with 1 tool call (price info); 1 with 1 tool call (shipping info); 1 for final summary |

email-triage-test

GPT-5, MINIMAL REASONING EFFORT

| Mode | API calls to OpenAI | Breakdown |
|---|---|---|
| normal functions | 2 | 1 with 2 tool calls (list unread threads and get calendar availability); 1 for final summary |
| cfg functions | 3 | 1 with 1 tool call (list unread threads); 1 with 1 tool call (get calendar availability); 1 for final summary |
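The request counts in all three tests fit a simple model: each group of independent tool calls costs one request when the calls are batched in parallel, or one request per call when they are serialized, plus one final request for the summary. The sketch below encodes that model and checks it against the numbers in the tables above. The "dependency levels" framing is my own (in todos-test the date/time lookup has to finish before the todos can be added); it is not something the API reports.

```python
from __future__ import annotations

import math


def api_requests(levels: list[int], max_calls_per_request: int | None) -> int:
    """Count model round trips for a task.

    levels: number of independent tool calls at each dependency level,
            e.g. [1, 3] = one date/time lookup, then three todo inserts.
    max_calls_per_request: None for unlimited parallel calls, 1 for serialized.
    """
    requests = 0
    for calls in levels:
        if max_calls_per_request is None:
            requests += 1  # all calls at this level fit in one response
        else:
            requests += math.ceil(calls / max_calls_per_request)
    return requests + 1  # final request that produces the summary


# (levels, observed normal-function requests, observed cfg requests) from the tables
TESTS = {
    "todos-test": ([1, 3], 3, 5),
    "price-test": ([4], 2, 5),
    "email-triage-test": ([2], 2, 3),
}

for name, (levels, normal, cfg) in TESTS.items():
    assert api_requests(levels, None) == normal
    assert api_requests(levels, 1) == cfg
    print(f"{name}: parallel={normal}, serialized={cfg}")
```

Under this model the extra cost of CFG tools grows with the number of independent calls: every call beyond the first at a given level adds one full round trip.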

Steps to reproduce

This project uses uv for package management and for running the scripts.

  1. Get an OpenAI API key and add it to a .env file in the root of the project:
OPENAI_API_KEY=sk-...
  2. Install dependencies:
uv sync
  3. Run the tests:
  • uv run todos-test/cfg_functions.py
  • uv run todos-test/normal_functions.py
  • uv run price-test/cfg_price_compare.py
  • uv run price-test/price_compare.py
  • uv run email-triage-test/cfg_email_triage.py
  • uv run email-triage-test/email_triage.py
  4. Check the output in the output directory (the terminal output also shows which tools are called in each request).
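The last step relies on the scripts logging which tools each request called. A minimal sketch of such a loop is below; it is not the repo's code. To keep it runnable offline, the model is a hypothetical stub that replays a fixed script and returns Responses-style output items as plain dicts. The item types `function_call` and `custom_tool_call` are my assumption about the API's output format, and the `tool_result` item and the tool names are placeholders. With a real client you would replace the stub with a call to `client.responses.create(...)` and append each tool's actual output in the format the API expects.

```python
from __future__ import annotations

from typing import Callable

# Assumed output item types for tool calls (check against the API reference).
TOOL_ITEM_TYPES = {"function_call", "custom_tool_call"}


def run_loop(send: Callable[[list], list], max_requests: int = 10) -> list[list[str]]:
    """Call the model until it stops requesting tools.

    send: takes the conversation so far and returns the response's output items.
    Returns the tool names called in each request (one inner list per request).
    """
    conversation: list = []
    log: list[list[str]] = []
    for request_no in range(1, max_requests + 1):
        output = send(conversation)
        calls = [item for item in output if item.get("type") in TOOL_ITEM_TYPES]
        names = [call["name"] for call in calls]
        log.append(names)
        print(f"request {request_no}: {len(names)} tool call(s) {names}")
        if not calls:
            return log  # no tools requested: this was the final summary
        conversation.extend(output)
        for call in calls:
            # Placeholder result item; a real loop runs the tool and uses the
            # API's own output item type here.
            conversation.append({"type": "tool_result", "call_id": call["call_id"], "output": "ok"})
    raise RuntimeError("model did not finish within max_requests")


def make_stub(script: list[list[str]]) -> Callable[[list], list]:
    """Hypothetical stub model that replays a fixed list of tool calls per request."""
    turns = iter(script)
    counter = 0

    def send(conversation: list) -> list:
        nonlocal counter
        items = []
        for name in next(turns, []):
            counter += 1
            items.append({"type": "custom_tool_call", "name": name, "call_id": f"call_{counter}"})
        if not items:
            items.append({"type": "message", "content": "summary"})
        return items

    return send


# Replays the shape of the cfg email-triage run: one tool per request, then a summary.
log = run_loop(make_stub([["list_unread_threads"], ["get_calendar_availability"]]))
assert log == [["list_unread_threads"], ["get_calendar_availability"], []]
```

Counting the inner lists gives the "API calls to OpenAI" column, and any inner list with more than one name is a parallel tool call.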

Screenshots (labeled)

todos-test

  • [Screenshot: normal functions]
  • [Screenshot: cfg functions]

price-test

  • [Screenshot: normal functions]
  • [Screenshot: cfg functions]

email-triage-test

  • [Screenshot: normal functions]
  • [Screenshot: cfg functions]

About

A test of how well GPT-5's new CFG (context-free grammar) tool calls work. Docs: https://platform.openai.com/docs/guides/function-calling#context-free-grammars
