
Conversation


@mohammadabushalhoob mohammadabushalhoob commented Oct 26, 2025

  • Add VLLMProvider implementation in providers/vllm.go
  • Support OpenAI-compatible API without authentication
  • Enable JSON schema validation for vLLM
  • Support streaming responses
  • Skip API key validation for vLLM provider in llm/validate.go
  • Register vLLM provider in providers/provider.go

This enables using local vLLM models with gollm without requiring API key authentication, while maintaining all gollm features like prompt templates, chain of thought, and memory.
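To make the intent concrete, here is a hypothetical usage sketch. It assumes gollm's documented functional-options constructor (NewLLM, SetProvider, SetModel, NewPrompt, Generate) is used unchanged; the option for pointing at the local server is an assumption and is only shown as a comment.

```go
// Hypothetical sketch of calling a local vLLM server through gollm once the
// "vllm" provider from this PR is registered. The endpoint option below is an
// assumption, not a confirmed gollm API.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/teilomillet/gollm"
)

func main() {
	llm, err := gollm.NewLLM(
		gollm.SetProvider("vllm"),        // provider added by this PR
		gollm.SetModel("my-local-model"), // any model served by the vLLM instance
		// gollm.SetBaseURL("http://localhost:8000/v1"), // hypothetical option name;
		// check gollm's config for the actual way to set the local endpoint
	)
	if err != nil {
		log.Fatal(err)
	}

	// No API key is configured: the vLLM provider skips key validation.
	resp, err := llm.Generate(context.Background(), gollm.NewPrompt("Say hello."))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp)
}
```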

Summary by Sourcery

Add full support for local vLLM models by introducing a VLLMProvider implementation that speaks vLLM's OpenAI-compatible API and plugs into gollm's existing provider interface without requiring API key authentication.

New Features:

  • Implement VLLMProvider with request preparation, response parsing, JSON schema support, and streaming support for local vLLM servers
  • Register vLLM provider in the provider registry and configure its headers, endpoint, and capabilities

Enhancements:

  • Skip API key validation for the vLLM provider in the API key validator


sourcery-ai bot commented Oct 26, 2025

Reviewer's Guide

Introduces full local vLLM provider support by implementing a new VLLMProvider (OpenAI-compatible) that requires no API key, integrates into the provider registry, and retains gollm features like JSON schema validation and streaming.

Sequence diagram for vLLMProvider streaming response handling

sequenceDiagram
  participant Client
  participant VLLMProvider
  participant vLLM_Server
  Client->>VLLMProvider: Request streaming completion
  VLLMProvider->>vLLM_Server: POST /chat/completions (stream: true)
  vLLM_Server-->>VLLMProvider: Streamed response chunks
  VLLMProvider-->>Client: Parse and forward streamed chunks
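For readers who want the chunk-handling step above in code, here is a minimal sketch that matches the ParseStreamResponse(chunk []byte) (string, error) signature shown in the class diagram further down. It assumes vLLM emits OpenAI-style SSE lines ("data: {...}" terminated by "data: [DONE]"); the actual implementation in providers/vllm.go may differ.

```go
// Sketch of parsing one streamed chunk from an OpenAI-compatible vLLM server;
// the SSE framing and EOF signaling here are assumptions based on the
// "chunk parsing and EOF signaling" described in this PR.
package providers

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

func parseVLLMStreamChunk(chunk []byte) (string, error) {
	line := strings.TrimSpace(string(chunk))
	line = strings.TrimPrefix(line, "data: ")
	if line == "" {
		return "", nil // keep-alive or empty line, nothing to emit
	}
	if line == "[DONE]" {
		return "", io.EOF // end-of-stream marker from the server
	}
	var resp struct {
		Choices []struct {
			Delta struct {
				Content string `json:"content"`
			} `json:"delta"`
		} `json:"choices"`
	}
	if err := json.Unmarshal([]byte(line), &resp); err != nil {
		return "", fmt.Errorf("malformed stream chunk: %w", err)
	}
	if len(resp.Choices) == 0 {
		return "", nil
	}
	return resp.Choices[0].Delta.Content, nil
}
```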

Entity relationship diagram for provider configuration changes

erDiagram
  PROVIDER_REGISTRY {
    string Name
    string Type
    string Endpoint
    string AuthHeader
    string AuthPrefix
    map RequiredHeaders
    bool SupportsSchema
    bool SupportsStreaming
  }
  VLLM_PROVIDER {
    string Name
    string Type
    string Endpoint
    string AuthHeader
    string AuthPrefix
    map RequiredHeaders
    bool SupportsSchema
    bool SupportsStreaming
  }
  PROVIDER_REGISTRY ||--|| VLLM_PROVIDER : includes
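A rough Go rendering of the vLLM registry entry implied by the diagram is shown below; the field names come from the diagram, while the type name, default endpoint, and literal values are assumptions rather than the exact code in providers/provider.go.

```go
// Hypothetical config type and default entry for the vLLM provider; only the
// field names are taken from the ER diagram above, everything else is assumed.
package providers

type ProviderConfig struct {
	Name              string
	Type              string
	Endpoint          string
	AuthHeader        string
	AuthPrefix        string
	RequiredHeaders   map[string]string
	SupportsSchema    bool
	SupportsStreaming bool
}

var vllmDefaultConfig = ProviderConfig{
	Name:              "vllm",
	Type:              "vllm",
	Endpoint:          "http://localhost:8000/v1", // common local vLLM default; assumed
	AuthHeader:        "",                         // no auth header: local server needs no key
	AuthPrefix:        "",
	RequiredHeaders:   map[string]string{"Content-Type": "application/json"},
	SupportsSchema:    true,
	SupportsStreaming: true,
}
```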

Class diagram for the new VLLMProvider implementation

classDiagram
class VLLMProvider {
  - baseURL string
  - model string
  - extraHeaders map[string]string
  - options map~string~interface~
  - logger utils.Logger
  + SetLogger(logger utils.Logger)
  + SetOption(key string, value interface~)
  + SetDefaultOptions(config *config.Config)
  + Name() string
  + Endpoint() string
  + SupportsJSONSchema() bool
  + Headers() map~string~string
  + PrepareRequest(prompt string, options map~string~interface~) ([]byte, error)
  + PrepareRequestWithSchema(prompt string, options map~string~interface~, schema interface~) ([]byte, error)
  + ParseResponse(body []byte) (string, error)
  + HandleFunctionCalls(body []byte) ([]byte, error)
  + SetExtraHeaders(extraHeaders map~string~string)
  + SupportsStreaming() bool
  + PrepareStreamRequest(prompt string, options map~string~interface~) ([]byte, error)
  + ParseStreamResponse(chunk []byte) (string, error)
  + PrepareRequestWithMessages(messages []types.MemoryMessage, options map~string~interface~) ([]byte, error)
}
Provider <|.. VLLMProvider
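Rendered as a plain Go declaration, the fields from the diagram look roughly like this (utils.Logger refers to gollm's logging interface; the actual declaration in providers/vllm.go may add fields or rename them):

```go
// Field names taken from the class diagram above; struct tags, zero-value
// handling, and any additional fields are not shown in this PR thread.
type VLLMProvider struct {
	baseURL      string
	model        string
	extraHeaders map[string]string
	options      map[string]interface{}
	logger       utils.Logger
}
```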

File-Level Changes

Add VLLMProvider implementation (providers/vllm.go)
  • Define VLLMProvider struct with base URL, model, headers, and options
  • Implement constructor, Name, Endpoint, Headers, PrepareRequest, ParseResponse, and option methods
  • Integrate logger support and memory message handling

Enable JSON schema validation for vLLM (providers/vllm.go)
  • Implement PrepareRequestWithSchema for schema-based requests
  • Ensure SupportsJSONSchema returns true
  • Merge global and per-call options into schema requests

Support streaming responses (providers/vllm.go)
  • Add SupportsStreaming, PrepareStreamRequest, and ParseStreamResponse methods
  • Handle chunk parsing and EOF signaling

Register vLLM in provider registry (providers/provider.go)
  • Add "vllm" factory entry in NewProviderRegistry
  • Add default config with no auth header, JSON schema, and streaming flags

Bypass API key validation for vLLM (llm/validate.go)
  • Update validateAPIKey to skip API key check when provider is "vllm" (see the sketch after this list)
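As a rough illustration of the last item above, the validator change could look like the sketch below. Only the "skip the check for vllm" behaviour is stated in this PR; the surrounding signature and error message are assumptions, since llm/validate.go itself is not shown in this thread.

```go
// Hypothetical shape of the API key check; the real validateAPIKey in
// llm/validate.go may take different arguments.
package llm

import "fmt"

func validateAPIKey(provider, apiKey string) error {
	// vLLM runs locally against an OpenAI-compatible endpoint that needs no key,
	// so the key requirement is skipped entirely for this provider.
	if provider == "vllm" {
		return nil
	}
	if apiKey == "" {
		return fmt.Errorf("API key is required for provider %q", provider)
	}
	return nil
}
```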



@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • There’s a lot of duplicated logic across PrepareRequest, PrepareRequestWithSchema, PrepareStreamRequest, and PrepareRequestWithMessages—consider extracting a shared helper for building/marshaling requests and merging options to reduce repetition.
  • NewVLLMProvider currently just concatenates baseURL without validation—add URL normalization (ensure scheme, trailing slash or v1 path) to avoid malformed endpoints at runtime.
  • HandleFunctionCalls is unimplemented but consumers might expect it—either implement function-calling support or explicitly document/disable it in the provider capabilities to prevent unexpected errors.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- There’s a lot of duplicated logic across PrepareRequest, PrepareRequestWithSchema, PrepareStreamRequest, and PrepareRequestWithMessages—consider extracting a shared helper for building/marshaling requests and merging options to reduce repetition.
- NewVLLMProvider currently just concatenates baseURL without validation—add URL normalization (ensure scheme, trailing slash or v1 path) to avoid malformed endpoints at runtime.
- HandleFunctionCalls is unimplemented but consumers might expect it—either implement function-calling support or explicitly document/disable it in the provider capabilities to prevent unexpected errors.

## Individual Comments

### Comment 1
<location> `providers/vllm.go:99` </location>
<code_context>
+	return headers
+}
+
+// PrepareRequest creates the request body for a vLLM API call.
+func (p *VLLMProvider) PrepareRequest(prompt string, options map[string]interface{}) ([]byte, error) {
+	request := map[string]interface{}{
</code_context>

<issue_to_address>
**issue (complexity):** Consider refactoring request-building logic into a helper method to eliminate code duplication and simplify maintenance.

```go
// Add this helper in VLLMProvider to consolidate request‐building/option merging:
func (p *VLLMProvider) buildRequest(
    messages []map[string]interface{},
    opts map[string]interface{},
    extras ...func(map[string]interface{}),
) ([]byte, error) {
    req := map[string]interface{}{
        "model":    p.model,
        "messages": messages,
    }
    // merge provider defaults
    for k, v := range p.options {
        if k != "system_prompt" {
            req[k] = v
        }
    }
    // merge per‐call overrides
    for k, v := range opts {
        if k != "system_prompt" {
            req[k] = v
        }
    }
    // apply any extra customizations
    for _, fn := range extras {
        fn(req)
    }
    return json.Marshal(req)
}

// Then simplify each Prepare… method. Example for PrepareRequest:
func (p *VLLMProvider) PrepareRequest(prompt string, options map[string]interface{}) ([]byte, error) {
    var msgs []map[string]interface{}
    if sp, ok := options["system_prompt"].(string); ok && sp != "" {
        msgs = append(msgs, map[string]interface{}{"role": "system", "content": sp})
    }
    msgs = append(msgs, map[string]interface{}{"role": "user", "content": prompt})
    return p.buildRequest(msgs, options)
}

// And for streaming:
func (p *VLLMProvider) PrepareStreamRequest(prompt string, options map[string]interface{}) ([]byte, error) {
    msgs := []map[string]interface{}{{"role": "user", "content": prompt}}
    return p.buildRequest(msgs, options, func(r map[string]interface{}) {
        r["stream"] = true
    })
}

// And for schema requests:
func (p *VLLMProvider) PrepareRequestWithSchema(
    prompt string, options map[string]interface{}, schemaObj interface{},
) ([]byte, error) {
    msgs := []map[string]interface{}{{"role":"user","content":prompt}}
    if sp, ok := options["system_prompt"].(string); ok && sp != "" {
        msgs = append([]map[string]interface{}{{"role":"system","content":sp}}, msgs...)
    }
    return p.buildRequest(msgs, options, func(r map[string]interface{}) {
        r["response_format"] = map[string]interface{}{"type": "json_object"}
        // if you need to inject the schema itself:
        // r["schema"] = schemaObj
    })
}
```
This removes the repeated message‐building and option‐merging logic while preserving every feature.
</issue_to_address>


Mohammad Abu Shalhoob added 2 commits October 27, 2025 01:59

Commit 1:
- Changed PrepareRequestWithMessages to use 'messages' array instead of 'chat_history'
- Aligned with Cohere v2 API specification
- Supports system messages as first message in array
- Maintains backward compatibility with other methods

Commit 2:
- Remove trailing slashes from base URL
- Ensure base URL ends with /v1 path
- Prevent malformed endpoints at runtime
- Addresses sourcery-ai bot feedback
@mohammadabushalhoob (Author) commented:

Thank you for the feedback! Here's how we addressed your comments:

Comment 1 (Duplicated logic): We acknowledge this suggestion. However, we've kept the current structure to maintain consistency with other providers in the codebase (OpenAI, Cohere, Anthropic, etc.). This can be refactored in a future PR that addresses all providers uniformly.

Comment 2 (URL validation): ✅ Fixed! We've added URL normalization to the Endpoint() method that:

  • Removes trailing slashes
  • Ensures the /v1 path is present
  • Prevents malformed endpoints at runtime

See commit db1d582: "fix: Add URL normalization to vLLM provider Endpoint method"
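A minimal sketch of that normalization, assuming the provider keeps the configured URL in the baseURL field shown in the class diagram; whether the chat/completions path is appended here or by the caller is not shown in this thread, so this sketch only returns the normalized base.

```go
// Sketch of the Endpoint() normalization described above; the exact body in
// commit db1d582 may differ.
func (p *VLLMProvider) Endpoint() string {
	// Strip trailing slashes so "http://localhost:8000/" and
	// "http://localhost:8000" normalize to the same value.
	base := strings.TrimRight(p.baseURL, "/")
	// Make sure the OpenAI-compatible /v1 path is present exactly once.
	if !strings.HasSuffix(base, "/v1") {
		base += "/v1"
	}
	return base
}
```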

Comment 3 (HandleFunctionCalls): This is already implemented. The method returns a clear error message: "function calling not implemented for vLLM". This follows the same pattern as other providers and can be extended in the future when vLLM models support function calling.

All tests pass successfully with these changes. Ready for review!

Mohammad Abu Shalhoob added 10 commits October 27, 2025 03:35

Commit 1:
Cohere API v2 requires 'bearer' (lowercase) instead of 'Bearer' (uppercase)
in the Authorization header. This fixes HTTP 405 errors when using Cohere provider.

Commit 2:
- Add baseURL field to CohereProvider struct
- Add NewCohereProviderWithURL constructor for custom endpoints
- Update Endpoint() to use configurable baseURL
- Fix Authorization header to use lowercase 'bearer' (required by Cohere v2 API)

This fixes HTTP 405 errors when using Cohere provider.

Commit 3:
- Add baseURL field to CohereProvider struct
- Add NewCohereProviderWithURL constructor
- Update Endpoint() to use configurable baseURL (FIXED)
- Fix Authorization header to lowercase 'bearer'

This properly fixes HTTP 405 errors with Cohere API.

Commit 4:
This ensures baseURL is properly set when creating Cohere provider.
Default baseURL is https://api.cohere.com

Commit 5:
- Log request body (formatted JSON)
- Log endpoint URL
- Log headers
- This will help debug HTTP 422 errors

Commit 6:
HTTP headers are case-insensitive but using standard casing is best practice.
Changed from 'Content-type' to 'Content-Type'.

Commit 7:
Cohere v2 API requires content to be an array of objects with type and text fields:
{
  "content": [
    {
      "type": "text",
      "text": "message content"
    }
  ]
}

This fixes HTTP 405 errors caused by sending content as a plain string.

Fixed in both PrepareRequest and PrepareRequestWithMessages methods.

Commit 8:
System message content must also be an array of objects with type and text fields,
not a plain string. This completes the Cohere v2 API format fix.

Commit 9:
Cohere v2 API does not support 'top_p' parameter. It only supports:
- temperature
- max_tokens
- seed
- frequency_penalty
- presence_penalty
- k
- p

This fixes HTTP 405 errors caused by sending unsupported parameters.

Fixed in both PrepareRequest and PrepareRequestWithMessages methods.
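For illustration, the parameter filtering described in the commit above could look like the sketch below; the helper name and allowlist variable are hypothetical and are not the actual code from the commit.

```go
// Hypothetical allowlist filter keeping only the request parameters that the
// Cohere v2 API accepts, per the commit message above.
var cohereV2AllowedParams = map[string]bool{
	"temperature":       true,
	"max_tokens":        true,
	"seed":              true,
	"frequency_penalty": true,
	"presence_penalty":  true,
	"k":                 true,
	"p":                 true,
}

func filterCohereOptions(options map[string]interface{}) map[string]interface{} {
	filtered := make(map[string]interface{}, len(options))
	for key, value := range options {
		if cohereV2AllowedParams[key] {
			filtered[key] = value
		}
	}
	return filtered
}
```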
Commit 10:
Cohere v2 API accepts content as string, not array of objects.

Changes:
- PrepareRequest: Changed content from array to string
- PrepareRequestWithMessages: Changed content from array to string
- System message: Changed content from array to string

This matches the official Cohere v2 API format:
{
  "messages": [
    {
      "role": "user",
      "content": "message text"
    }
  ]
}

Tested with curl - all requests work correctly with string content.