Skip to content

Conversation

@timburks
Copy link
Contributor

@timburks timburks commented Aug 25, 2022

I've been testing with remote databases and very high parallelism, logging and handling errors that arose from that, and saw some issues related to our lack of transaction support.

This PR wraps all mutating operations in action handlers in transactions (stage 1 of #706). The transaction wrapper also translates errors into gRPC errors, which with other proposed changes would allow us remove the majority of grpcErrorForDBError calls in the storage package.

Since delete actions are wrapped in transactions, transactions are removed from the deletion support code in the storage package (stage 2 of #706).

Stages 3-5 of #706 are left for future PRs.

@timburks timburks changed the title Wrap database updates in transactions Registry server: wrap update actions in transactions Aug 25, 2022
@timburks timburks marked this pull request as draft August 25, 2022 22:05
@codecov
Copy link

codecov bot commented Aug 25, 2022

Codecov Report

Merging #700 (a1ece2a) into main (e9c6f8c) will increase coverage by 1.02%.
The diff coverage is 70.89%.

❗ Current head a1ece2a differs from pull request most recent head aa02e3e. Consider uploading reports for the commit aa02e3e to get more accurate results

@@            Coverage Diff             @@
##             main     #700      +/-   ##
==========================================
+ Coverage   55.68%   56.70%   +1.02%     
==========================================
  Files          93       93              
  Lines        7959     8008      +49     
==========================================
+ Hits         4432     4541     +109     
+ Misses       3063     3028      -35     
+ Partials      464      439      -25     
Impacted Files Coverage Δ
server/registry/actions_deployment_revisions.go 43.06% <50.00%> (+6.48%) ⬆️
server/registry/server.go 45.58% <50.00%> (+0.42%) ⬆️
server/registry/actions_spec_revisions.go 46.20% <52.50%> (+8.03%) ⬆️
server/registry/actions_apis.go 73.10% <78.84%> (+9.01%) ⬆️
server/registry/actions_versions.go 73.10% <78.84%> (+9.01%) ⬆️
server/registry/actions_artifacts.go 81.60% <80.00%> (+5.88%) ⬆️
server/registry/actions_deployments.go 72.78% <80.00%> (+4.94%) ⬆️
server/registry/actions_specs.go 75.89% <81.35%> (+3.76%) ⬆️
server/registry/actions_projects.go 80.48% <84.00%> (+7.27%) ⬆️
... and 7 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

logger.Info("Success.")
case codes.Internal:
logger.Error("Internal error.")
case codes.Canceled:
Copy link
Contributor Author

@timburks timburks Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds support for all gRPC status codes so that logging won't report "User error" for the unrecognized ones. It's in this commit because I used it for testing but I'll make a separate PR for just this change (to interceptor.go) to hopefully commit before this one.

if _, ok := status.FromError(err); ok {
return err
}
if err == context.DeadlineExceeded {
Copy link
Contributor Author

@timburks timburks Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file adds support for new errors that were encountered in load-testing of the server to verify the changes in this PR. I think they are off-topic enough to put in a separate PR that just modifies errors.go and would be merged prior to this one.

@timburks timburks changed the title Registry server: wrap update actions in transactions Registry server: wrap mutating actions in transactions Aug 30, 2022
@timburks timburks marked this pull request as ready for review August 30, 2022 23:35
@timburks timburks requested a review from theganyo August 30, 2022 23:35
@timburks timburks requested a review from seaneganx August 30, 2022 23:35
@timburks
Copy link
Contributor Author

@seaneganx if we merge this before you're back, PTAL anyway, comment on anything, and we'll address it in issues/followup PRs.

var response *rpc.Artifact
if err := s.runWithTransaction(ctx, func(ctx context.Context, db *storage.Client) error {
// Creation should only succeed when the parent exists.
switch parent := parent.(type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Probably subjective, but might this be easier to read like this?

	var err error
	switch parent := parent.(type) {
	case names.Project:
		_, err = db.GetProject(ctx, parent)
	case names.Api:
		_, err = db.GetApi(ctx, parent)
	case names.Version:
		_, err = db.GetVersion(ctx, parent)
	case names.Spec:
		_, err = db.GetSpec(ctx, parent)
	case names.Deployment:
		_, err = db.GetDeployment(ctx, parent)
	}
	if err != nil {
		return err
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return s.storageClient, nil
}

func (s *RegistryServer) runWithTransaction(ctx context.Context, fn func(ctx context.Context, db *storage.Client) error) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Personally, I would prefer runInTransaction() or asTransaction().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll change this to "runInTransaction"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@theganyo
Copy link
Member

theganyo commented Aug 31, 2022

That last commit (a1ece2a) adds code without supporting tests. Is it reasonable to test?

@timburks
Copy link
Contributor Author

timburks commented Aug 31, 2022

@theganyo let's discuss that -- I was just looking into retry configuration before adding a comment about that change. We can trigger the concurrency problem by running the bulk uploaders, e.g. registry upload bulk discovery, but I think this is too big to put in our CI flow. It might be better to directly construct a test by forcing a set of concurrent updates -- maybe starting several goroutines that all update the same resource at once? I think that's out of scope for this PR and would rather 1) leave this change in and file an issue for testing or 2) pull the last commit and address it in a different PR. [EDIT: I'll do 2) and start a new PR]

The change in that commit is observable here which explicitly ignores the (weird) AlreadyExists errors that concurrent updates can trigger. With the change, those errors are reported as Aborted and are retried by the client libraries (because we configured this).

@theganyo
Copy link
Member

I've been worried that we don't have a way of reliably triggering the concurrency / transactional conditions that we've been running into - and thus we'd be unable to verify that we've eliminated the issues. That said, if we can reliably trigger our concurrency issues through the bulk uploader and we don't have the means to more specifically test at this point, perhaps that's good enough for now.

// with improvements closer to the database level.
updateSpecMutex.Lock()
defer updateSpecMutex.Unlock()
log.Debugf(ctx, "Acquired lock after blocking for %v", time.Since(before))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this here, but will address the underlying problem in #715

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

2 participants