feat: Add vLLM provider support for local models without API key #50
base: main
Conversation
Reviewer's Guide

Introduces full local vLLM provider support by implementing a new VLLMProvider (OpenAI-compatible) that requires no API key, integrates into the provider registry, and retains gollm features like JSON schema validation and streaming.

Sequence diagram for vLLMProvider streaming response handling

```mermaid
sequenceDiagram
    participant Client
    participant VLLMProvider
    participant vLLM_Server
    Client->>VLLMProvider: Request streaming completion
    VLLMProvider->>vLLM_Server: POST /chat/completions (stream: true)
    vLLM_Server-->>VLLMProvider: Streamed response chunks
    VLLMProvider-->>Client: Parse and forward streamed chunks
```
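The chunks forwarded in the last step follow the OpenAI-compatible SSE convention (`data: {...}` lines ending with `data: [DONE]`). A minimal sketch of how `ParseStreamResponse` might handle one chunk under that assumption; the helper name and struct fields are illustrative, not taken from this PR:

```go
import (
    "bytes"
    "encoding/json"
    "io"
)

// parseStreamChunk extracts the delta text from a single "data: {...}" line.
// Sketch only: assumes the usual chat.completion.chunk shape of OpenAI-compatible servers.
func parseStreamChunk(chunk []byte) (string, error) {
    chunk = bytes.TrimSpace(chunk)
    chunk = bytes.TrimPrefix(chunk, []byte("data: "))
    if bytes.Equal(chunk, []byte("[DONE]")) {
        return "", io.EOF // signals end of stream to the caller
    }
    var payload struct {
        Choices []struct {
            Delta struct {
                Content string `json:"content"`
            } `json:"delta"`
        } `json:"choices"`
    }
    if err := json.Unmarshal(chunk, &payload); err != nil {
        return "", err
    }
    if len(payload.Choices) == 0 {
        return "", nil
    }
    return payload.Choices[0].Delta.Content, nil
}
```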
Entity relationship diagram for provider configuration changes

```mermaid
erDiagram
    PROVIDER_REGISTRY {
        string Name
        string Type
        string Endpoint
        string AuthHeader
        string AuthPrefix
        map RequiredHeaders
        bool SupportsSchema
        bool SupportsStreaming
    }
    VLLM_PROVIDER {
        string Name
        string Type
        string Endpoint
        string AuthHeader
        string AuthPrefix
        map RequiredHeaders
        bool SupportsSchema
        bool SupportsStreaming
    }
    PROVIDER_REGISTRY ||--|| VLLM_PROVIDER : includes
```
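Registration itself (in providers/provider.go, per the commit message) is a small change. A hypothetical sketch, assuming a constructor-map style registry and vLLM's common default address of http://localhost:8000; none of these names are confirmed by the PR:

```go
// Hypothetical sketch — the registry shape and the NewVLLMProvider signature
// are assumptions for illustration, not the actual providers/provider.go code.
var providerRegistry = map[string]func(apiKey, model string, extraHeaders map[string]string) Provider{
    // vLLM runs locally, so the API key argument is simply ignored.
    "vllm": func(_ string, model string, extraHeaders map[string]string) Provider {
        return NewVLLMProvider("http://localhost:8000", model, extraHeaders)
    },
}
```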
Class diagram for the new VLLMProvider implementation

```mermaid
classDiagram
    class VLLMProvider {
        - baseURL string
        - model string
        - extraHeaders map[string]string
        - options map~string~interface~
        - logger utils.Logger
        + SetLogger(logger utils.Logger)
        + SetOption(key string, value interface~)
        + SetDefaultOptions(config *config.Config)
        + Name() string
        + Endpoint() string
        + SupportsJSONSchema() bool
        + Headers() map~string~string
        + PrepareRequest(prompt string, options map~string~interface~) ([]byte, error)
        + PrepareRequestWithSchema(prompt string, options map~string~interface~, schema interface~) ([]byte, error)
        + ParseResponse(body []byte) (string, error)
        + HandleFunctionCalls(body []byte) ([]byte, error)
        + SetExtraHeaders(extraHeaders map~string~string)
        + SupportsStreaming() bool
        + PrepareStreamRequest(prompt string, options map~string~interface~) ([]byte, error)
        + ParseStreamResponse(chunk []byte) (string, error)
        + PrepareRequestWithMessages(messages []types.MemoryMessage, options map~string~interface~) ([]byte, error)
    }
    Provider <|.. VLLMProvider
```
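The practical difference from the other providers shows up in the auth-related methods: a local vLLM server needs no API key, so `Headers()` can omit Authorization entirely. A sketch with illustrative bodies only, matching the method set above; the real providers/vllm.go may differ in detail:

```go
func (p *VLLMProvider) Headers() map[string]string {
    // No Authorization header: a local vLLM server needs no API key.
    headers := map[string]string{
        "Content-Type": "application/json",
    }
    for k, v := range p.extraHeaders {
        headers[k] = v
    }
    return headers
}

func (p *VLLMProvider) SupportsJSONSchema() bool { return true }
func (p *VLLMProvider) SupportsStreaming() bool  { return true }
```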
File-Level Changes
Hey there - I've reviewed your changes - here's some feedback:
- There’s a lot of duplicated logic across PrepareRequest, PrepareRequestWithSchema, PrepareStreamRequest, and PrepareRequestWithMessages—consider extracting a shared helper for building/marshaling requests and merging options to reduce repetition.
- NewVLLMProvider currently just concatenates baseURL without validation—add URL normalization (ensure scheme, trailing slash or v1 path) to avoid malformed endpoints at runtime.
- HandleFunctionCalls is unimplemented but consumers might expect it—either implement function-calling support or explicitly document/disable it in the provider capabilities to prevent unexpected errors.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There’s a lot of duplicated logic across PrepareRequest, PrepareRequestWithSchema, PrepareStreamRequest, and PrepareRequestWithMessages—consider extracting a shared helper for building/marshaling requests and merging options to reduce repetition.
- NewVLLMProvider currently just concatenates baseURL without validation—add URL normalization (ensure scheme, trailing slash or v1 path) to avoid malformed endpoints at runtime.
- HandleFunctionCalls is unimplemented but consumers might expect it—either implement function-calling support or explicitly document/disable it in the provider capabilities to prevent unexpected errors.
## Individual Comments
### Comment 1
<location> `providers/vllm.go:99` </location>
<code_context>
+return headers
+}
+
+// PrepareRequest creates the request body for a vLLM API call.
+func (p *VLLMProvider) PrepareRequest(prompt string, options map[string]interface{}) ([]byte, error) {
+request := map[string]interface{}{
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring request-building logic into a helper method to eliminate code duplication and simplify maintenance.
```go
// Add this helper in VLLMProvider to consolidate request-building/option merging:
func (p *VLLMProvider) buildRequest(
    messages []map[string]interface{},
    opts map[string]interface{},
    extras ...func(map[string]interface{}),
) ([]byte, error) {
    req := map[string]interface{}{
        "model":    p.model,
        "messages": messages,
    }
    // merge provider defaults
    for k, v := range p.options {
        if k != "system_prompt" {
            req[k] = v
        }
    }
    // merge per-call overrides
    for k, v := range opts {
        if k != "system_prompt" {
            req[k] = v
        }
    }
    // apply any extra customizations
    for _, fn := range extras {
        fn(req)
    }
    return json.Marshal(req)
}

// Then simplify each Prepare… method. Example for PrepareRequest:
func (p *VLLMProvider) PrepareRequest(prompt string, options map[string]interface{}) ([]byte, error) {
    var msgs []map[string]interface{}
    if sp, ok := options["system_prompt"].(string); ok && sp != "" {
        msgs = append(msgs, map[string]interface{}{"role": "system", "content": sp})
    }
    msgs = append(msgs, map[string]interface{}{"role": "user", "content": prompt})
    return p.buildRequest(msgs, options)
}

// And for streaming:
func (p *VLLMProvider) PrepareStreamRequest(prompt string, options map[string]interface{}) ([]byte, error) {
    msgs := []map[string]interface{}{{"role": "user", "content": prompt}}
    return p.buildRequest(msgs, options, func(r map[string]interface{}) {
        r["stream"] = true
    })
}

// And for schema requests:
func (p *VLLMProvider) PrepareRequestWithSchema(
    prompt string, options map[string]interface{}, schemaObj interface{},
) ([]byte, error) {
    msgs := []map[string]interface{}{{"role": "user", "content": prompt}}
    if sp, ok := options["system_prompt"].(string); ok && sp != "" {
        msgs = append([]map[string]interface{}{{"role": "system", "content": sp}}, msgs...)
    }
    return p.buildRequest(msgs, options, func(r map[string]interface{}) {
        r["response_format"] = map[string]interface{}{"type": "json_object"}
        // if you need to inject the schema itself:
        // r["schema"] = schemaObj
    })
}
```
This removes the repeated message‐building and option‐merging logic while preserving every feature.
</issue_to_address>
- Add VLLMProvider implementation in providers/vllm.go
- Support OpenAI-compatible API without authentication
- Enable JSON schema validation for vLLM
- Support streaming responses
- Skip API key validation for vLLM provider in llm/validate.go
- Register vLLM provider in providers/provider.go

This enables using local vLLM models (like eduMind-6.7b) with gollm without requiring API key authentication, while maintaining all gollm features like prompt templates, chain of thought, and memory.
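The validation change is correspondingly small; a hedged sketch of what skipping the key check for vLLM in llm/validate.go might look like (the function name and error text are assumptions, not quoted from the PR):

```go
import "fmt"

// Hypothetical sketch — illustrates the "skip API key validation for vLLM" change.
func validateAPIKey(providerName, apiKey string) error {
    if providerName == "vllm" {
        return nil // local vLLM endpoints require no API key
    }
    if apiKey == "" {
        return fmt.Errorf("API key is required for provider %q", providerName)
    }
    return nil
}
```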
19b9a44 to 0ea740b
- Changed PrepareRequestWithMessages to use 'messages' array instead of 'chat_history'
- Aligned with Cohere v2 API specification
- Supports system messages as first message in array
- Maintains backward compatibility with other methods
- Remove trailing slashes from base URL
- Ensure base URL ends with /v1 path
- Prevent malformed endpoints at runtime
- Addresses sourcery-ai bot feedback
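Concretely, the normalization described in this commit can live in `Endpoint()`; a minimal sketch, with the `/chat/completions` route taken from the sequence diagram above (the real code may structure this differently):

```go
import "strings"

// Sketch of the normalization rules from the commit message: strip trailing
// slashes and make sure the base URL ends with /v1 before appending the route.
func (p *VLLMProvider) Endpoint() string {
    base := strings.TrimRight(p.baseURL, "/")
    if !strings.HasSuffix(base, "/v1") {
        base += "/v1"
    }
    return base + "/chat/completions"
}
```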
Thank you for the feedback! Here's how we addressed your comments:

Comment 1 (Duplicated logic): We acknowledge this suggestion. However, we've kept the current structure to maintain consistency with other providers in the codebase (OpenAI, Cohere, Anthropic, etc.). This can be refactored in a future PR that addresses all providers uniformly.

Comment 2 (URL validation): ✅ Fixed! We've added URL normalization to the Endpoint() method that:
- Removes trailing slashes
- Ensures the base URL ends with the /v1 path

Comment 3 (HandleFunctionCalls): This is already implemented. The method returns a clear error message: "function calling not implemented for vLLM". This follows the same pattern as other providers and can be extended in the future when vLLM models support function calling.

All tests pass successfully with these changes. Ready for review!
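For reference, the HandleFunctionCalls behaviour described in the reply is just an explicit error return; a sketch using the error text quoted above (exact wording per the reply, the rest is illustrative):

```go
import "errors"

// Returns a clear error until function calling is supported for vLLM models.
func (p *VLLMProvider) HandleFunctionCalls(body []byte) ([]byte, error) {
    return nil, errors.New("function calling not implemented for vLLM")
}
```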
Cohere API v2 requires 'bearer' (lowercase) instead of 'Bearer' (uppercase) in the Authorization header. This fixes HTTP 405 errors when using Cohere provider.
- Add baseURL field to CohereProvider struct
- Add NewCohereProviderWithURL constructor for custom endpoints
- Update Endpoint() to use configurable baseURL
- Fix Authorization header to use lowercase 'bearer' (required by Cohere v2 API)

This fixes HTTP 405 errors when using Cohere provider.
- Add baseURL field to CohereProvider struct
- Add NewCohereProviderWithURL constructor
- Update Endpoint() to use configurable baseURL (FIXED)
- Fix Authorization header to lowercase 'bearer'

This properly fixes HTTP 405 errors with Cohere API.
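A sketch of the header change these commits describe; the struct and field names are assumptions, and the point is the lowercase `bearer` prefix:

```go
// Illustrative only — shows the lowercase "bearer" prefix required by
// Cohere v2 per the commit messages above.
func (p *CohereProvider) Headers() map[string]string {
    return map[string]string{
        "Authorization": "bearer " + p.apiKey,
        "Content-Type":  "application/json",
    }
}
```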
This ensures baseURL is properly set when creating Cohere provider. Default baseURL is https://api.cohere.com
- Log request body (formatted JSON)
- Log endpoint URL
- Log headers

This will help debug HTTP 422 errors.
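A hedged sketch of the kind of debug logging this commit describes, assuming a `utils.Logger` with a `Debug` method (the actual logger interface is not shown in this PR):

```go
import (
    "bytes"
    "encoding/json"
)

// Hypothetical sketch — the logger type and its Debug method are assumptions.
func logRequest(logger utils.Logger, endpoint string, headers map[string]string, body []byte) {
    var pretty bytes.Buffer
    if err := json.Indent(&pretty, body, "", "  "); err == nil {
        body = pretty.Bytes() // log the formatted JSON when indentation succeeds
    }
    logger.Debug("endpoint: " + endpoint)
    for k, v := range headers {
        logger.Debug("header " + k + ": " + v)
    }
    logger.Debug("request body:\n" + string(body))
}
```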
HTTP headers are case-insensitive but using standard casing is best practice. Changed from 'Content-type' to 'Content-Type'.
Cohere v2 API requires content to be an array of objects with type and text fields:

```json
{
  "content": [
    {
      "type": "text",
      "text": "message content"
    }
  ]
}
```
This fixes HTTP 405 errors caused by sending content as a plain string.
Fixed in both PrepareRequest and PrepareRequestWithMessages methods.
System message content must also be an array of objects with type and text fields, not a plain string. This completes the Cohere v2 API format fix.
Cohere v2 API does not support the 'top_p' parameter. It only supports:
- temperature
- max_tokens
- seed
- frequency_penalty
- presence_penalty
- k
- p

This fixes HTTP 405 errors caused by sending unsupported parameters. Fixed in both PrepareRequest and PrepareRequestWithMessages methods.
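One way to enforce this is a whitelist applied when merging options; a sketch using the parameter list above (the helper names are illustrative, not from the PR):

```go
// Sketch: forward only the parameters the commit message lists as supported.
var cohereSupportedParams = map[string]bool{
    "temperature":       true,
    "max_tokens":        true,
    "seed":              true,
    "frequency_penalty": true,
    "presence_penalty":  true,
    "k":                 true,
    "p":                 true,
}

func filterCohereOptions(options map[string]interface{}) map[string]interface{} {
    filtered := make(map[string]interface{}, len(options))
    for key, value := range options {
        if cohereSupportedParams[key] {
            filtered[key] = value
        }
    }
    return filtered
}
```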
Cohere v2 API accepts content as string, not array of objects.
Changes:
- PrepareRequest: Changed content from array to string
- PrepareRequestWithMessages: Changed content from array to string
- System message: Changed content from array to string
This matches the official Cohere v2 API format:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "message text"
    }
  ]
}
```
Tested with curl - all requests work correctly with string content.
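In Go terms, the request construction after this change reduces to plain string content; a sketch mirroring the JSON above (the helper name is hypothetical):

```go
import "encoding/json"

// buildCohereChatRequest is a sketch only — content is a plain string,
// not an array of objects, matching the format shown above.
func buildCohereChatRequest(model, prompt string) ([]byte, error) {
    request := map[string]interface{}{
        "model": model,
        "messages": []map[string]interface{}{
            {"role": "user", "content": prompt},
        },
    }
    return json.Marshal(request)
}
```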
This enables using local vLLM models with gollm without requiring API key authentication, while maintaining all gollm features like prompt templates, chain of thought, and memory.
Summary by Sourcery
Add full support for local vLLM models by introducing a VLLMProvider implementation that plugs into the existing OpenAI-compatible interface without requiring API key authentication.
New Features:
Enhancements: