feat: Add vLLM provider support for local models without API key #50
base: main
Conversation
Reviewer's Guide

Introduces full local vLLM provider support by implementing a new VLLMProvider (OpenAI-compatible) that requires no API key, integrates into the provider registry, and retains gollm features like JSON schema validation and streaming.

Sequence diagram for vLLMProvider streaming response handling

```mermaid
sequenceDiagram
    participant Client
    participant VLLMProvider
    participant vLLM_Server
    Client->>VLLMProvider: Request streaming completion
    VLLMProvider->>vLLM_Server: POST /chat/completions (stream: true)
    vLLM_Server-->>VLLMProvider: Streamed response chunks
    VLLMProvider-->>Client: Parse and forward streamed chunks
```
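The chunks forwarded in the last step follow the OpenAI-compatible SSE convention (`data: {...}` lines ending with `data: [DONE]`). A minimal sketch of how `ParseStreamResponse` might handle one chunk under that assumption; the helper name and struct fields are illustrative, not taken from this PR:

```go
import (
    "bytes"
    "encoding/json"
    "io"
)

// parseStreamChunk extracts the delta text from a single "data: {...}" line.
// Sketch only: assumes the usual chat.completion.chunk shape of OpenAI-compatible servers.
func parseStreamChunk(chunk []byte) (string, error) {
    chunk = bytes.TrimSpace(chunk)
    chunk = bytes.TrimPrefix(chunk, []byte("data: "))
    if bytes.Equal(chunk, []byte("[DONE]")) {
        return "", io.EOF // signals end of stream to the caller
    }
    var payload struct {
        Choices []struct {
            Delta struct {
                Content string `json:"content"`
            } `json:"delta"`
        } `json:"choices"`
    }
    if err := json.Unmarshal(chunk, &payload); err != nil {
        return "", err
    }
    if len(payload.Choices) == 0 {
        return "", nil
    }
    return payload.Choices[0].Delta.Content, nil
}
```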
Entity relationship diagram for provider configuration changes

```mermaid
erDiagram
    PROVIDER_REGISTRY {
        string Name
        string Type
        string Endpoint
        string AuthHeader
        string AuthPrefix
        map RequiredHeaders
        bool SupportsSchema
        bool SupportsStreaming
    }
    VLLM_PROVIDER {
        string Name
        string Type
        string Endpoint
        string AuthHeader
        string AuthPrefix
        map RequiredHeaders
        bool SupportsSchema
        bool SupportsStreaming
    }
    PROVIDER_REGISTRY ||--|| VLLM_PROVIDER : includes
```
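Registration itself (in providers/provider.go, per the commit message) is a small change. A hypothetical sketch, assuming a constructor-map style registry and vLLM's common default address of http://localhost:8000; none of these names are confirmed by the PR:

```go
// Hypothetical sketch — the registry shape and the NewVLLMProvider signature
// are assumptions for illustration, not the actual providers/provider.go code.
var providerRegistry = map[string]func(apiKey, model string, extraHeaders map[string]string) Provider{
    // vLLM runs locally, so the API key argument is simply ignored.
    "vllm": func(_ string, model string, extraHeaders map[string]string) Provider {
        return NewVLLMProvider("http://localhost:8000", model, extraHeaders)
    },
}
```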
Class diagram for the new VLLMProvider implementation

```mermaid
classDiagram
    class VLLMProvider {
        - baseURL string
        - model string
        - extraHeaders map[string]string
        - options map~string~interface~
        - logger utils.Logger
        + SetLogger(logger utils.Logger)
        + SetOption(key string, value interface~)
        + SetDefaultOptions(config *config.Config)
        + Name() string
        + Endpoint() string
        + SupportsJSONSchema() bool
        + Headers() map~string~string
        + PrepareRequest(prompt string, options map~string~interface~) ([]byte, error)
        + PrepareRequestWithSchema(prompt string, options map~string~interface~, schema interface~) ([]byte, error)
        + ParseResponse(body []byte) (string, error)
        + HandleFunctionCalls(body []byte) ([]byte, error)
        + SetExtraHeaders(extraHeaders map~string~string)
        + SupportsStreaming() bool
        + PrepareStreamRequest(prompt string, options map~string~interface~) ([]byte, error)
        + ParseStreamResponse(chunk []byte) (string, error)
        + PrepareRequestWithMessages(messages []types.MemoryMessage, options map~string~interface~) ([]byte, error)
    }
    Provider <|.. VLLMProvider
```
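The practical difference from the other providers shows up in the auth-related methods: a local vLLM server needs no API key, so `Headers()` can omit Authorization entirely. A sketch with illustrative bodies only, matching the method set above; the real providers/vllm.go may differ in detail:

```go
func (p *VLLMProvider) Headers() map[string]string {
    // No Authorization header: a local vLLM server needs no API key.
    headers := map[string]string{
        "Content-Type": "application/json",
    }
    for k, v := range p.extraHeaders {
        headers[k] = v
    }
    return headers
}

func (p *VLLMProvider) SupportsJSONSchema() bool { return true }
func (p *VLLMProvider) SupportsStreaming() bool  { return true }
```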
File-Level Changes
Hey there - I've reviewed your changes - here's some feedback:
- There’s a lot of duplicated logic across PrepareRequest, PrepareRequestWithSchema, PrepareStreamRequest, and PrepareRequestWithMessages—consider extracting a shared helper for building/marshaling requests and merging options to reduce repetition.
- NewVLLMProvider currently just concatenates baseURL without validation—add URL normalization (ensure scheme, trailing slash or v1 path) to avoid malformed endpoints at runtime.
- HandleFunctionCalls is unimplemented but consumers might expect it—either implement function-calling support or explicitly document/disable it in the provider capabilities to prevent unexpected errors.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There’s a lot of duplicated logic across PrepareRequest, PrepareRequestWithSchema, PrepareStreamRequest, and PrepareRequestWithMessages—consider extracting a shared helper for building/marshaling requests and merging options to reduce repetition.
- NewVLLMProvider currently just concatenates baseURL without validation—add URL normalization (ensure scheme, trailing slash or v1 path) to avoid malformed endpoints at runtime.
- HandleFunctionCalls is unimplemented but consumers might expect it—either implement function-calling support or explicitly document/disable it in the provider capabilities to prevent unexpected errors.
## Individual Comments
### Comment 1
<location> `providers/vllm.go:99` </location>
<code_context>
+return headers
+}
+
+// PrepareRequest creates the request body for a vLLM API call.
+func (p *VLLMProvider) PrepareRequest(prompt string, options map[string]interface{}) ([]byte, error) {
+request := map[string]interface{}{
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring request-building logic into a helper method to eliminate code duplication and simplify maintenance.
```go
// Add this helper in VLLMProvider to consolidate request-building/option merging:
func (p *VLLMProvider) buildRequest(
    messages []map[string]interface{},
    opts map[string]interface{},
    extras ...func(map[string]interface{}),
) ([]byte, error) {
    req := map[string]interface{}{
        "model":    p.model,
        "messages": messages,
    }
    // merge provider defaults
    for k, v := range p.options {
        if k != "system_prompt" {
            req[k] = v
        }
    }
    // merge per-call overrides
    for k, v := range opts {
        if k != "system_prompt" {
            req[k] = v
        }
    }
    // apply any extra customizations
    for _, fn := range extras {
        fn(req)
    }
    return json.Marshal(req)
}

// Then simplify each Prepare… method. Example for PrepareRequest:
func (p *VLLMProvider) PrepareRequest(prompt string, options map[string]interface{}) ([]byte, error) {
    var msgs []map[string]interface{}
    if sp, ok := options["system_prompt"].(string); ok && sp != "" {
        msgs = append(msgs, map[string]interface{}{"role": "system", "content": sp})
    }
    msgs = append(msgs, map[string]interface{}{"role": "user", "content": prompt})
    return p.buildRequest(msgs, options)
}

// And for streaming:
func (p *VLLMProvider) PrepareStreamRequest(prompt string, options map[string]interface{}) ([]byte, error) {
    msgs := []map[string]interface{}{{"role": "user", "content": prompt}}
    return p.buildRequest(msgs, options, func(r map[string]interface{}) {
        r["stream"] = true
    })
}

// And for schema requests:
func (p *VLLMProvider) PrepareRequestWithSchema(
    prompt string, options map[string]interface{}, schemaObj interface{},
) ([]byte, error) {
    msgs := []map[string]interface{}{{"role": "user", "content": prompt}}
    if sp, ok := options["system_prompt"].(string); ok && sp != "" {
        msgs = append([]map[string]interface{}{{"role": "system", "content": sp}}, msgs...)
    }
    return p.buildRequest(msgs, options, func(r map[string]interface{}) {
        r["response_format"] = map[string]interface{}{"type": "json_object"}
        // if you need to inject the schema itself:
        // r["schema"] = schemaObj
    })
}
```
This removes the repeated message‐building and option‐merging logic while preserving every feature.
</issue_to_address>
- Add VLLMProvider implementation in providers/vllm.go
- Support OpenAI-compatible API without authentication
- Enable JSON schema validation for vLLM
- Support streaming responses
- Skip API key validation for vLLM provider in llm/validate.go
- Register vLLM provider in providers/provider.go

This enables using local vLLM models (like eduMind-6.7b) with gollm without requiring API key authentication, while maintaining all gollm features like prompt templates, chain of thought, and memory.
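The validation change is correspondingly small; a hedged sketch of what skipping the key check for vLLM in llm/validate.go might look like (the function name and error text are assumptions, not quoted from the PR):

```go
import "fmt"

// Hypothetical sketch — illustrates the "skip API key validation for vLLM" change.
func validateAPIKey(providerName, apiKey string) error {
    if providerName == "vllm" {
        return nil // local vLLM endpoints require no API key
    }
    if apiKey == "" {
        return fmt.Errorf("API key is required for provider %q", providerName)
    }
    return nil
}
```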
19b9a44 to 0ea740b
- Changed PrepareRequestWithMessages to use 'messages' array instead of 'chat_history'
- Aligned with Cohere v2 API specification
- Supports system messages as first message in array
- Maintains backward compatibility with other methods
- Remove trailing slashes from base URL
- Ensure base URL ends with /v1 path
- Prevent malformed endpoints at runtime
- Addresses sourcery-ai bot feedback
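Concretely, the normalization described in this commit can live in `Endpoint()`; a minimal sketch, with the `/chat/completions` route taken from the sequence diagram above (the real code may structure this differently):

```go
import "strings"

// Sketch of the normalization rules from the commit message: strip trailing
// slashes and make sure the base URL ends with /v1 before appending the route.
func (p *VLLMProvider) Endpoint() string {
    base := strings.TrimRight(p.baseURL, "/")
    if !strings.HasSuffix(base, "/v1") {
        base += "/v1"
    }
    return base + "/chat/completions"
}
```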
Thank you for the feedback! Here's how we addressed your comments:

Comment 1 (Duplicated logic): We acknowledge this suggestion. However, we've kept the current structure to maintain consistency with other providers in the codebase (OpenAI, Cohere, Anthropic, etc.). This can be refactored in a future PR that addresses all providers uniformly.

Comment 2 (URL validation): ✅ Fixed! We've added URL normalization to the Endpoint() method that:
- Removes trailing slashes
- Ensures the base URL ends with the /v1 path

Comment 3 (HandleFunctionCalls): This is already implemented. The method returns a clear error message: "function calling not implemented for vLLM". This follows the same pattern as other providers and can be extended in the future when vLLM models support function calling.

All tests pass successfully with these changes. Ready for review!
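For reference, the HandleFunctionCalls behaviour described in the reply is just an explicit error return; a sketch using the error text quoted above (exact wording per the reply, the rest is illustrative):

```go
import "errors"

// Returns a clear error until function calling is supported for vLLM models.
func (p *VLLMProvider) HandleFunctionCalls(body []byte) ([]byte, error) {
    return nil, errors.New("function calling not implemented for vLLM")
}
```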
Cohere API v2 requires 'bearer' (lowercase) instead of 'Bearer' (uppercase) in the Authorization header. This fixes HTTP 405 errors when using Cohere provider.
- Add baseURL field to CohereProvider struct
- Add NewCohereProviderWithURL constructor for custom endpoints
- Update Endpoint() to use configurable baseURL
- Fix Authorization header to use lowercase 'bearer' (required by Cohere v2 API)

This fixes HTTP 405 errors when using Cohere provider.
- Add baseURL field to CohereProvider struct
- Add NewCohereProviderWithURL constructor
- Update Endpoint() to use configurable baseURL (FIXED)
- Fix Authorization header to lowercase 'bearer'

This properly fixes HTTP 405 errors with Cohere API.
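A sketch of the header change these commits describe; the struct and field names are assumptions, and the point is the lowercase `bearer` prefix:

```go
// Illustrative only — shows the lowercase "bearer" prefix required by
// Cohere v2 per the commit messages above.
func (p *CohereProvider) Headers() map[string]string {
    return map[string]string{
        "Authorization": "bearer " + p.apiKey,
        "Content-Type":  "application/json",
    }
}
```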
This ensures baseURL is properly set when creating Cohere provider. Default baseURL is https://api.cohere.com
- Log request body (formatted JSON)
- Log endpoint URL
- Log headers

This will help debug HTTP 422 errors.
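A hedged sketch of the kind of debug logging this commit describes, assuming a `utils.Logger` with a `Debug` method (the actual logger interface is not shown in this PR):

```go
import (
    "bytes"
    "encoding/json"
)

// Hypothetical sketch — the logger type and its Debug method are assumptions.
func logRequest(logger utils.Logger, endpoint string, headers map[string]string, body []byte) {
    var pretty bytes.Buffer
    if err := json.Indent(&pretty, body, "", "  "); err == nil {
        body = pretty.Bytes() // log the formatted JSON when indentation succeeds
    }
    logger.Debug("endpoint: " + endpoint)
    for k, v := range headers {
        logger.Debug("header " + k + ": " + v)
    }
    logger.Debug("request body:\n" + string(body))
}
```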
HTTP headers are case-insensitive but using standard casing is best practice. Changed from 'Content-type' to 'Content-Type'.
Cohere v2 API requires content to be an array of objects with type and text fields:

```json
{
  "content": [
    {
      "type": "text",
      "text": "message content"
    }
  ]
}
```
This fixes HTTP 405 errors caused by sending content as a plain string.
Fixed in both PrepareRequest and PrepareRequestWithMessages methods.
System message content must also be an array of objects with type and text fields, not a plain string. This completes the Cohere v2 API format fix.
Cohere v2 API does not support the 'top_p' parameter. It only supports:
- temperature
- max_tokens
- seed
- frequency_penalty
- presence_penalty
- k
- p

This fixes HTTP 405 errors caused by sending unsupported parameters. Fixed in both PrepareRequest and PrepareRequestWithMessages methods.
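One way to enforce this is a whitelist applied when merging options; a sketch using the parameter list above (the helper names are illustrative, not from the PR):

```go
// Sketch: forward only the parameters the commit message lists as supported.
var cohereSupportedParams = map[string]bool{
    "temperature":       true,
    "max_tokens":        true,
    "seed":              true,
    "frequency_penalty": true,
    "presence_penalty":  true,
    "k":                 true,
    "p":                 true,
}

func filterCohereOptions(options map[string]interface{}) map[string]interface{} {
    filtered := make(map[string]interface{}, len(options))
    for key, value := range options {
        if cohereSupportedParams[key] {
            filtered[key] = value
        }
    }
    return filtered
}
```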
Cohere v2 API accepts content as string, not array of objects.
Changes:
- PrepareRequest: Changed content from array to string
- PrepareRequestWithMessages: Changed content from array to string
- System message: Changed content from array to string
This matches the official Cohere v2 API format:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "message text"
    }
  ]
}
```
Tested with curl - all requests work correctly with string content.
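In Go terms, the request construction after this change reduces to plain string content; a sketch mirroring the JSON above (the helper name is hypothetical):

```go
import "encoding/json"

// buildCohereChatRequest is a sketch only — content is a plain string,
// not an array of objects, matching the format shown above.
func buildCohereChatRequest(model, prompt string) ([]byte, error) {
    request := map[string]interface{}{
        "model": model,
        "messages": []map[string]interface{}{
            {"role": "user", "content": prompt},
        },
    }
    return json.Marshal(request)
}
```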
This enables using local vLLM models with gollm without requiring API key authentication, while maintaining all gollm features like prompt templates, chain of thought, and memory.
Summary by Sourcery
Add full support for local vLLM models by introducing a VLLMProvider implementation that plugs into the existing OpenAI-compatible interface without requiring API key authentication.
New Features:
Enhancements: