- Dynamic search and rendering for answer-supporting images
- A tiered LLM approach (all Gemini-based), with the tier chosen by the relevancy of the grounding supports:
  - Multi-turn custom grounded responses (most preferred)
  - Single-turn custom grounded responses
  - Single-turn Google Search grounded responses
  - Ungrounded responses (least preferred)
- Custom grounding citations rendering
- EDW reporting on sessions, prompts, and responses
- Aggregate LLM request rate limiting (abuse prevention and cost control)
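
The tiered approach listed above can be read as a fallback chain: try the most preferred tier first and drop to the next one when its grounding is not relevant enough. The sketch below illustrates that idea only. `TierResult`, `answer`, the `relevancy` score, and the `min_relevancy` threshold are hypothetical names and values, not taken from the project, and the stub tiers stand in for real Gemini calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class TierResult:
    text: str
    relevancy: float  # hypothetical 0..1 score derived from grounding supports


# A tier takes the prompt and returns a result, or None if it produced nothing.
Tier = Callable[[str], Optional[TierResult]]


def answer(
    prompt: str,
    tiers: Sequence[Tuple[str, Tier]],
    min_relevancy: float = 0.5,  # assumed threshold, for illustration only
) -> Tuple[str, str]:
    """Return (tier_name, text) from the first sufficiently relevant tier.

    Tiers are tried in order of preference. The last tier is treated as the
    unconditional fallback (the ungrounded response).
    """
    for name, tier in tiers[:-1]:
        result = tier(prompt)
        if result is not None and result.relevancy >= min_relevancy:
            return name, result.text
    name, tier = tiers[-1]
    result = tier(prompt)
    return name, (result.text if result is not None else "")


def stub(text: str, relevancy: float) -> Tier:
    """Build a fake tier that always returns the same result."""
    return lambda prompt: TierResult(text, relevancy)


# Example wiring with stub tiers, ordered from most to least preferred.
EXAMPLE_TIERS = [
    ("multi_turn_custom", stub("custom, multi-turn", 0.2)),
    ("single_turn_custom", stub("custom, single-turn", 0.3)),
    ("single_turn_google_search", stub("Google Search", 0.8)),
    ("ungrounded", stub("ungrounded", 0.0)),
]

if __name__ == "__main__":
    print(answer("What do clownfish eat?", EXAMPLE_TIERS))
```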

| Environment Variable | Description |
|---|---|
| DATASTORE_LOCATION | Location of the Vertex AI Agent Builder data store for custom grounding (e.g., "global") |
| DATASTORE_ID | ID of the Vertex AI Agent Builder data store for custom grounding (e.g., "fishbot-123") |
| DATASTORE_STATIC_HOST | Protocol and host for the file server containing the custom grounding sources (e.g., "https://example.com") |
| PEXELS_API_KEY | API key from pexels.com, for use in dynamic image rendering |
| REPORTING_DATASET | BigQuery dataset for storing usage reporting data (e.g., "fishbot_reporting") |
| REPORTING_TABLE | BigQuery table for storing usage reporting data, which will be created if it does not already exist (e.g., "responses") |
| REDIS_URL | Connection URL for the Redis instance used for application-level LLM request rate limiting (e.g., "redis://localhost:6379") |
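
As a minimal sketch of how the variables in the table might be read at startup, the snippet below collects them into one immutable object and fails early if any are unset. The `Settings` class and `load_settings` function are hypothetical, and treating every variable as required is an assumption; the table does not say which ones are optional.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names come from the table above.
REQUIRED = (
    "DATASTORE_LOCATION",
    "DATASTORE_ID",
    "DATASTORE_STATIC_HOST",
    "PEXELS_API_KEY",
    "REPORTING_DATASET",
    "REPORTING_TABLE",
    "REDIS_URL",
)


@dataclass(frozen=True)
class Settings:
    datastore_location: str
    datastore_id: str
    datastore_static_host: str
    pexels_api_key: str
    reporting_dataset: str
    reporting_table: str
    redis_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read all variables from env, raising if any is missing or empty.

    Assumption: every variable is required.
    """
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(**{name.lower(): env[name] for name in REQUIRED})


# Example values copied from the table (the API key is a placeholder).
EXAMPLE_ENV = {
    "DATASTORE_LOCATION": "global",
    "DATASTORE_ID": "fishbot-123",
    "DATASTORE_STATIC_HOST": "https://example.com",
    "PEXELS_API_KEY": "placeholder-key",
    "REPORTING_DATASET": "fishbot_reporting",
    "REPORTING_TABLE": "responses",
    "REDIS_URL": "redis://localhost:6379",
}

if __name__ == "__main__":
    print(load_settings(EXAMPLE_ENV))
```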

The architecture includes:
- A Cloud Run service with:
  - "Internal + load balancer" ingress
  - Custom 3600s HTTP request timeout (to reduce WebSocket reconnection issues)
- An External Global Application Load Balancer spanning both the chatbot and the static files (custom grounding source), with:
  - Serverless network endpoint group (NEG) backend for the Cloud Run service
  - Public backend bucket for the static files
- Vertex AI Agent Builder search app and data store, built over the static files bucket, for grounding
- Cloud Armor backend policy, including:
  - Blocking of select US-sanctioned countries: Russia, Iran, Cuba, North Korea, Syria, and Belarus
  - OWASP ModSecurity CRS 3.3 scanner protection
  - Rate-based 30-minute ban for exceeding 60 requests/minute
    - Enforcement key is the concatenation of IP address and HTTP path
- BigQuery dataset for prompt-response usage reporting
- A small Memorystore for Redis instance to back aggregate LLM request rate limiting
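
To illustrate how a Redis instance can back an aggregate (all users combined) LLM request limit, here is a fixed-window counter sketch. It is one common technique, not necessarily the one this project uses. The function name, key prefix, limit, and window length are all assumptions. The limiter only needs a store with `incr` and `expire` methods, so an in-memory stand-in is used here to keep the example runnable without a Redis server.

```python
import time
from typing import Dict, Optional, Protocol


class CounterStore(Protocol):
    """The two operations the limiter needs from its backing store."""

    def incr(self, name: str) -> int: ...

    def expire(self, name: str, seconds: int) -> object: ...


def allow_llm_request(
    store: CounterStore,
    limit: int,  # hypothetical: max LLM requests per window, across all users
    window_seconds: int = 60,
    now: Optional[float] = None,
) -> bool:
    """Fixed-window limiter: count requests in the current window.

    A single shared key per window makes the limit aggregate rather than
    per user or per IP.
    """
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"llm_requests:{window}"  # hypothetical key naming
    count = store.incr(key)
    if count == 1:
        # First hit in this window: let the key clean itself up later.
        store.expire(key, window_seconds * 2)
    return count <= limit


class InMemoryStore:
    """Test stand-in that mimics the counter behaviour of incr/expire."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def incr(self, name: str) -> int:
        self.counts[name] = self.counts.get(name, 0) + 1
        return self.counts[name]

    def expire(self, name: str, seconds: int) -> bool:
        return True  # expiry is not simulated in this stand-in


if __name__ == "__main__":
    store = InMemoryStore()
    print([allow_llm_request(store, limit=3, now=120.0) for _ in range(5)])
```

With the redis-py package, a client created via `redis.Redis.from_url(...)` from the `REDIS_URL` value exposes `incr` and `expire` methods that can be passed as `store`. This application-level limit is separate from the per-client Cloud Armor rate-based ban, which is enforced at the load balancer.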