Skip to content

[P2] security: optional signature verification for MLnode PoC endpoints#537

Closed
x0152 wants to merge 2 commits into
gonka-ai:upgrade-v0.2.12from
x0152:security/mlnode-poc-signature
Closed

[P2] security: optional signature verification for MLnode PoC endpoints#537
x0152 wants to merge 2 commits into
gonka-ai:upgrade-v0.2.12from
x0152:security/mlnode-poc-signature

Conversation

@x0152

@x0152 x0152 commented Jan 9, 2026

Copy link
Copy Markdown
Collaborator

MLnode api endpoints are currently unauthenticated, allowing an attacker who can reach the mlnode port to hijack POC by sending their own callback_url

Adds request signing using the existing signerAccount key. Network Node signs requests with X-Signature header, mlnode verifies using SIGNER_PUBKEY environment variable. For standard deployment where mlnode is on the same machine, it works automatically since SIGNER_PUBKEY defaults to ACCOUNT_PUBKEY

For remote mlnode setups, you need to set SIGNER_PUBKEY manually. If SIGNER_PUBKEY is not set, verification is skipped for backward compatibility

Get public key on network node:

source config.env
inferenced keys show $KEY_NAME --pubkey --keyring-backend $KEYRING_BACKEND | jq -r '.key'

Start mlnode with signature verification:

export SIGNER_PUBKEY=<public_key_from_above>
docker-compose -f docker-compose.mlnode.yml up -d

@akup akup left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is uncomplete patch, as there are following functions in MLNodeClient:

// Training operations
	StartTraining(ctx context.Context, taskId uint64, participant string, nodeId string, masterNodeAddr string, rank int, worldSize int) error
	GetTrainingStatus(ctx context.Context) error

	// Node state operations
	Stop(ctx context.Context, num *int) error
	NodeState(ctx context.Context) (*StateResponse, error)

	// PoC operations
	GetPowStatus(ctx context.Context, num *int) (*PowStatusResponse, error)
	InitGenerate(ctx context.Context, dto InitDto, num *int) error
	InitValidate(ctx context.Context, dto InitDto, num *int) error
	ValidateBatch(ctx context.Context, batch ProofBatch, num *int) error

	// Inference operations
	InferenceHealth(ctx context.Context) (bool, error)
	InferenceUp(ctx context.Context, model string, args []string) error

	// GPU operations
	GetGPUDevices(ctx context.Context) (*GPUDevicesResponse, error)
	GetGPUDriver(ctx context.Context) (*DriverInfo, error)

	// Model management operations
	CheckModelStatus(ctx context.Context, model Model) (*ModelStatusResponse, error)
	DownloadModel(ctx context.Context, model Model) (*DownloadStartResponse, error)
	DeleteModel(ctx context.Context, model Model) (*DeleteResponse, error)
	ListModels(ctx context.Context) (*ModelListResponse, error)
	GetDiskSpace(ctx context.Context) (*DiskSpaceInfo, error)

Stop and InferenceUp are also critical to protect, as they are changing node state.
Other are get commands, but it could be more complete, to sign all the MLNode requests.

@x0152

x0152 commented Jan 9, 2026

Copy link
Copy Markdown
Collaborator Author

Current PR covers POC as the most critical - direct reward theft. Firstly would be good to understand if this direction makes sense. I should update PR description to clarify scope

Comment thread mlnode/packages/pow/src/pow/service/auth.py Outdated
Comment thread mlnode/packages/pow/src/pow/service/routes.py Outdated
@akup

akup commented Jan 14, 2026

Copy link
Copy Markdown
Collaborator

Current PR covers POC as the most critical - direct reward theft. Firstly would be good to understand if this direction makes sense. I should update PR description to clarify scope

I'm just pointing that before cleaning it this PR cannot be merged, because it is bad practice to merge something semi-ready

@x0152 x0152 marked this pull request as ready for review January 15, 2026 15:55
@tcharchian tcharchian added this to Triage Feb 9, 2026
@github-project-automation github-project-automation Bot moved this to New in Triage Feb 9, 2026
@tcharchian tcharchian moved this from New to Needs triage in Triage Feb 9, 2026
@tcharchian tcharchian removed this from Triage Feb 11, 2026
@tcharchian tcharchian added this to the v0.2.11 milestone Feb 11, 2026
@IgnatovFedor IgnatovFedor changed the base branch from main to upgrade-v0.2.11 February 23, 2026 16:40
@IgnatovFedor IgnatovFedor modified the milestones: v0.2.11, v0.2.12 Feb 24, 2026
@tcharchian tcharchian moved this from Todo to Needs reviewer in Upgrade v0.2.12 Feb 28, 2026
@tcharchian tcharchian changed the title security: optional signature verification for MLnode PoC endpoints [P2] security: optional signature verification for MLnode PoC endpoints Mar 21, 2026
@tcharchian tcharchian moved this from Needs reviewer to Waiting on the author in Upgrade v0.2.12 Mar 21, 2026
@tcharchian tcharchian removed the request for review from DimaOrekhovPS March 21, 2026 00:50
@tcharchian

Copy link
Copy Markdown
Collaborator

Hey @x0152 @akup! It would be great if you could sync on the next steps for this pull request and make the needed decisions together. If you can move it forward on your own, it could be included in v0.2.12. But overall, this is a nice-to-have rather than something critical.

@akup

akup commented Mar 23, 2026

Copy link
Copy Markdown
Collaborator

I think it is useful PR, but it should be aligned with #717
And we should merge it with new MLNode and #417

The artifacts that are sent to callback URL also should be signed, as the attacker can send bad artifacts if the URL is not protected, and node will not pass validation on PoC.

I think we should first update main repo with new MLNode PoC apis (now PoC completely lives at https://github.com/gonka-ai/vllm repo). Then look and compare approaches with Jf/token auth and finally take into account with websocket that should go next

@akup akup mentioned this pull request Mar 23, 2026
@IgnatovFedor IgnatovFedor changed the base branch from upgrade-v0.2.11 to upgrade-v0.2.12 March 23, 2026 12:59
@x0152

x0152 commented Mar 25, 2026

Copy link
Copy Markdown
Collaborator Author

@akup If the poc api update is coming, let's postpone #537 and #417 until after that. #717 can go in this upgrade - it's optional, shouldn't be affected by PoC API changes and enables cloud-hosted MLNode setups

#537 and #417 for security and stability should go on top of the new api once it is validated on mainnet, to avoid regressions

what do you think about that?

@akup

akup commented Mar 26, 2026

Copy link
Copy Markdown
Collaborator

@x0152 yes #717 can go in this release, but I want to align vision that best option of jf/token and request signatures go to main.
And hornestly, haven't time to dig it yet

But it seams that really it is better to align signatures in other PRs to jf/tokens that are present at #717

@x0152

x0152 commented Mar 27, 2026

Copy link
Copy Markdown
Collaborator Author

Agree, let's merge #717 in this release and postpone #537/ #417 to the next one

@x0152 x0152 removed this from the v0.2.12 milestone Mar 27, 2026
@akup akup mentioned this pull request Apr 1, 2026
@akup

akup commented Apr 1, 2026

Copy link
Copy Markdown
Collaborator

@x0152 I've added small PR that cleans the mlnode. Take a look: #994

I think this one can be merged next

@patimen

patimen commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

@akup , @x0152 - Do we want to push on this one? You seemed ready to push it forward.

@akup

akup commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

I think that certification of communication between devshard (or dapi) and mlnode should be certified. Because there are always some attacks on open ports and this is negative to adoption and newcomers.
I think it could be scheduled to 0.2.13

@x0152

x0152 commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

Agreed - let's include this in v0.2.13

@x0152 x0152 modified the milestone: v0.2.13 Apr 29, 2026
@x0152 x0152 marked this pull request as draft April 29, 2026 09:16
@tcharchian

Copy link
Copy Markdown
Collaborator

Hi @x0152, are you ready to include this PR in the next upgrade?

@x0152

x0152 commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

#1329

@x0152 x0152 closed this Jun 10, 2026
@tcharchian tcharchian modified the milestones: v0.2.14, v0.2.15 Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[P2] Security MerkleTree Proofs; Merge participant validation till block0; Need to add signature check at recording

5 participants