Skip to content

Conversation

@jLopezbarb
Copy link
Contributor

@jLopezbarb jLopezbarb commented Dec 18, 2025

Signed-off-by: Javier Lopez javier@okteto.com

Proposed changes

Fixes DEV-1270

Overview

This PR introduces InClusterConnector, a new BuildKit connector optimized for in-cluster execution scenarios. When Okteto CLI runs inside the cluster (remote deploys, pipeline installer, or managed namespace pods), it can now connect directly to BuildKit pods via their internal IP, bypassing the ingress layer.

Problem

Previously, all BuildKit connections went through the Ingress (public endpoint), even when the CLI was running inside the same cluster:

┌─────────────────────────────────────────────────────────────────┐
│                        Kubernetes Cluster                       │
│  ┌──────────────┐                               ┌──────────────┐  │
│  │  Remote Pod  │ ──── Internet ────► Ingress ─►│  BuildKit    │   │
│  │  (okteto)    │      (unnecessary)           │  Pod         │          │
│  └──────────────┘                               └──────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Solution

The new InClusterConnector connects directly to BuildKit pods via internal IP:

┌─────────────────────────────────────────────────────────────────┐
│                        Kubernetes Cluster                        │
│  ┌──────────────┐         direct TCP          ┌──────────────┐  │
│  │  Remote Pod  │ ─────────────────────────► │  BuildKit    │  │
│  │  (okteto)    │       (pod IP:1234)         │  Pod         │  │
│  └──────────────┘                              └──────────────┘  │
└─────────────────────────────────────────────────────────────────┘

How was it solved:

  • Adds a new connector to the connector selection. This new connector is used when you are running inside an okteto pod (OKTETO_MANAGED_POD) in a remote execution or in an installer job
  • We verify if the connection to the cluster is correct and if not retry the connection
  • There is a graceful reconnection if the build fails with a retryable error
  • Fallback to ingress if the API doesn't return a buildkit pod correctly

Diagrams:

Connector Selector

flowchart TB
    subgraph "Connector Selection"
        CLI[Okteto CLI] --> Check{Where am I running?}
        
        Check -->|"OKTETO_DEPLOY_REMOTE=true<br/>or OKTETO_IN_INSTALLER=true<br/>or OKTETO_MANAGED_POD=true"| IC[InClusterConnector]
        Check -->|"OKTETO_BUILD_QUEUE_ENABLED=true"| PF[PortForwarder]
        Check -->|"Default (local machine)"| IG[IngressConnector]
    end

    subgraph "Connection Methods"
        IC -->|"tcp://pod-ip:1234"| BK[BuildKit Pod]
        PF -->|"localhost:random → pod:1234"| BK
        IG -->|"tcp://buildkit.okteto.dev"| Ingress[Ingress] --> BK
    end

    style IC fill:#51cf66,stroke:#333
    style PF fill:#339af0,stroke:#333
    style IG fill:#ffd43b,stroke:#333
Loading

Comparison:

Connector Use Case Connection Path Network Dependency
InClusterConnector Remote commands, Installer, Managed pods Direct to Pod IP Internal only
PortForwarder Local CLI with queue enabled Port forward to Pod Requires K8s API
IngressConnector Local CLI (default) Via public Ingress External network

How to validate

  1. Using this CLI in an autocreate up session, run okteto build -l debug and check that is working
  2. Using this CLI in an autocreate up session, run okteto build -l debug, kill the pod in the middle of the execution and wait for the reconnection

CLI Quality Reminders 🔧

For both authors and reviewers:

  • Scrutinize for potential regressions
  • Ensure key automated tests are in place
  • Build the CLI and test using the validation steps
  • Assess Developer Experience impact (log messages, performances, etc)
  • If too broad, consider breaking into smaller PRs
  • Adhere to our code style and code review guidelines

Signed-off-by: Javier Lopez <javier@okteto.com>
- Updated the GetKubeToken method across multiple files to include a target parameter, allowing for more flexible token requests.
- Modified related tests to accommodate the new parameter, ensuring consistent behavior and improved functionality.
- Introduced a new method in the Client struct to retrieve the Buildkit client, enhancing the overall client capabilities.

This change aims to improve the integration and usability of the kubetoken functionality within the Okteto environment.

Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
- Added an `isActive` field to the PortForwarder struct to track the active state of the port forward connection.
- Updated the `WaitUntilIsReady` method to reuse existing port forward connections if already active, improving efficiency.
- Modified the `Stop` method to set `isActive` to false when stopping the port forward connection.

This change aims to optimize the port forwarding process by preventing unnecessary reconnections and enhancing resource management.

Signed-off-by: Javier Lopez <javier@okteto.com>
- Refactored the initialization logic for the buildkit connector in the OktetoBuilder to ensure that a direct connector is used when the build queue is enabled, simplifying the connection handling.
- This change enhances the clarity of the connection setup process and ensures consistent behavior when establishing connections for builds.

Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
- Updated the build command to utilize a Buildkit connector, allowing for improved connection management during build operations.
- Refactored the builder initialization across various commands to ensure consistent use of the Buildkit connector.
- Added methods to retrieve the Buildkit connector from the builder interface, facilitating reuse across multiple build and deploy operations.

This change aims to streamline the build process and enhance the overall efficiency of the deployment workflow.

Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
- Added Buildkit client support to the FakeOktetoClient for improved testing capabilities.
- Implemented a mockBuildkitClient to simulate Buildkit responses and errors in tests.
- Updated the PortForwarder to include enhanced waiting logic for Buildkit pod readiness, with exponential backoff for polling.
- Improved error handling and logging during the port forwarding process to provide clearer feedback on connection status.

These changes aim to streamline the integration of Buildkit within the Okteto framework and enhance the reliability of related tests.

Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
- Removed the nested forwarder struct from PortForwarder, integrating its fields directly into PortForwarder for improved clarity and accessibility.
- Updated related tests to reflect the new structure, ensuring consistent handling of stop and ready channels.
- Enhanced logging to provide clearer feedback on port forwarding status.

These changes aim to streamline the PortForwarder implementation and improve test maintainability.

Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
- Added a no-op implementation of the connection manager for the Buildkit connector, allowing for simplified connection handling in tests and scenarios where a connection is not required.
- Updated the Runner and IngressConnector to utilize the new no-op connection manager, ensuring compatibility with existing logic while improving testability.
- Enhanced the Waiter to incorporate the connection manager, allowing for better management of connection states during readiness checks.

These changes aim to improve the flexibility and maintainability of the Buildkit integration within the Okteto framework.

Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
- Modified the initialization of the Buildkit client waiter in the DirectConnector to utilize a NoOpConnectionManager, enhancing connection management during build operations.
- This change aims to improve the flexibility and maintainability of the Buildkit integration within the Okteto framework.

Signed-off-by: Javier Lopez <javier@okteto.com>
- Deleted the DirectConnector implementation from the buildkit connector package as it is no longer needed. This cleanup helps streamline the codebase and improve maintainability.

Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
- Added a `buildkitClient` field to the PortForwarder struct to cache the Buildkit client, improving efficiency by reusing existing clients.
- Updated the `GetBuildkitClient` method to return the cached client if available, reducing unnecessary client creation.
- Modified the `Stop` method to clear the cached client when stopping the port forward connection.

These changes aim to optimize the management of Buildkit clients within the port forwarding process.

Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
@jLopezbarb jLopezbarb requested a review from a team as a code owner December 18, 2025 15:19
Signed-off-by: Javier Lopez <javier@okteto.com>
@codecov
Copy link

codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 8.73016% with 115 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.35%. Comparing base (78f1992) to head (6c0c158).

❌ Your patch check has failed because the patch coverage (8.73%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4851      +/-   ##
==========================================
- Coverage   48.50%   48.35%   -0.15%     
==========================================
  Files         373      374       +1     
  Lines       31004    31118     +114     
==========================================
+ Hits        15037    15047      +10     
- Misses      14768    14872     +104     
  Partials     1199     1199              
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

func shouldUseInClusterConnector() bool {
return env.LoadBoolean(constants.OktetoDeployRemote) || // Remote commands (deploy --remote, destroy --remote, test)
config.RunningInInstaller() || // Pipeline installer
env.LoadBoolean("OKTETO_MANAGED_POD") // Pods in managed namespaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to extract a constant, too?

conn, err := connector.NewInClusterConnector(context.Background(), okCtx, logger)
if err != nil {
logger.Infof("could not create in-cluster connector: %s, falling back to ingress", err)
logger.Out().Warning("Could not create in-cluster buildkit connector, falling back to ingress connector")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't think this customer face warning means something to the final user. The don't probably now what it means. To review later on

return fmt.Errorf("failed to start in-cluster connection: %w", err)
}
}
return ic.waiter.WaitUntilIsUp(ctx, ic.GetBuildkitClient)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, this could make the operation to be stuck waiting until timeout if the podIP changes between we get it, and this point. I think this connector wouldn't need this and just trust that the IP will be available. Then, we should make sure that any error trying to connect to BuildKit is retryable, so a new podIP is requested.

Try the different things we discussed, and see how it behaves

Base automatically changed from jlo/reconnection-logic to master December 19, 2025 11:48
Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
Signed-off-by: Javier Lopez <javier@okteto.com>
Copy link
Member

@ifbyol ifbyol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, but it is looking good!

}

// createBuildkitClient creates a buildkit client with TLS certificates (like PortForwarder)
func (ic *InClusterConnector) createBuildkitClient(ctx context.Context) (*client.Client, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: as we talked offline, possible optimization for the future. Make a singleton client, and detect if the ic.podIP is the same that the one used for the existent client, and if don't, create a new one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not trivial to get it from the Buildkit client so I will try to do it in another one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants