Benchmark test for CRUD operations on Api/Api Version/Api spec/Artifact #480
Conversation
There are a lot of new features in here that I don't understand. Can you expand the documentation to explain each feature and why it's useful?
        
          
tests/benchmark/crud_test.go (Outdated)

```go
// Only take reading for the first version
if j == 1 {
    b.StartTimer()
}
```
Why do we only measure the first version? Why do we continue creating other versions after that?
I wanted to create 3 versions and 3 specs, but the benchmark would add up the time of all 3 versions, which is how it gets reported.
To make sure our reports are accurate, we only measure the first pass for revisions and specs but let the scripts create all versions.
Right, so let's talk about my second question.
Why do we continue creating other versions after that?
This is supposed to be benchmarking CreateApiVersion alone. Each iteration of the benchmark loop should only create one ApiVersion. Why are we creating three?
If I'm understanding correctly, BenchmarkCreateApiVersion is both a benchmark and setup code for BenchmarkCreateApiSpec. Is that correct?
Tests should work correctly regardless of the order they execute in. We should be able to run go test -bench=Get ./tests/benchmark/... to successfully measure the performance of every Get method.
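To make that concrete, here is a minimal sketch of what a self-contained CreateApiVersion benchmark could look like, with exactly one ApiVersion created per iteration. The setup/teardown helpers and the request field names are assumptions for illustration, not code from this PR.

```go
func BenchmarkCreateApiVersion(b *testing.B) {
    ctx := context.Background()
    client, api := setupBenchmarkApi(ctx, b)        // assumed helper: creates a throwaway parent Api
    defer teardownBenchmarkApi(ctx, b, client, api) // assumed helper: deletes it afterwards

    b.ResetTimer() // exclude setup from the measurement
    for i := 0; i < b.N; i++ {
        req := &rpc.CreateApiVersionRequest{
            Parent:       api.GetName(),
            ApiVersionId: fmt.Sprintf("v%d", i+1),
            ApiVersion:   &rpc.ApiVersion{},
        }
        if _, err := client.CreateApiVersion(ctx, req); err != nil {
            b.Fatalf("CreateApiVersion(%+v) returned unexpected error: %s", req, err)
        }
    }
}
```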
        
          
tests/benchmark/crud_test.go (Outdated)

```go
func readAndGZipFile(filename string) (*bytes.Buffer, error) {
    fileBytes, _ := ioutil.ReadFile(filename)
    var buf bytes.Buffer
    zw, _ := gzip.NewWriterLevel(&buf, gzip.BestCompression)
    _, err := zw.Write(fileBytes)
    if err != nil {
        return nil, err
    }
    if err := zw.Close(); err != nil {
        return nil, err
    }
    return &buf, nil
}
```
Can we gzip the files that are being checked into the testdata/ directory instead? Doing it here seems like unnecessary work.
I want the ability to change the spec in the future, in case we decide this should be a larger file.
I reworked the code so that we read and gzip the file only once.
I agree that being able to change the spec is a good feature to have. What happens if we want to benchmark a file that is not compressed? We can't do that with the current code.
The benchmarks should load the file without compressing it because it gives us the most flexibility. We can update the file to be compressed or not, and empty or megabytes large. IMO we should start with benchmarks uploading the current spec in uncompressed form because it's simple and fairly representative of a typical spec size.
More generally, if we find we want to test against a variety of specs, we can add a flag to specify which file the test should load. For now though, YAGNI.
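As a sketch of that simpler approach, the benchmark could read the file once, as-is, and leave any compression decision to whoever prepares the file in testdata/. The function and variable names here are illustrative.

```go
var specContents []byte // raw spec bytes, read once and reused across iterations

func loadSpecContents(b *testing.B, filename string) []byte {
    b.Helper()
    if specContents == nil {
        data, err := ioutil.ReadFile(filename)
        if err != nil {
            b.Fatalf("could not read spec file %q: %s", filename, err)
        }
        specContents = data
    }
    return specContents
}
```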
Added steps to run the benchmark tests on GitHub pull requests. Added benchmark tests for listing Apis/ApiVersions/ApiSpecs/Artifacts.
Additionally, run the benchmark tests on PRs only when the registry server code has been updated.
It will be added as a separate PR.
This will be addressed in a separate PR.
Thanks for splitting this into smaller parts, Gitesh. Still, there are a lot of new features and changes to how the benchmarks work, so I have a lot of questions and concerns.
Many of my concerns are about the accuracy of the benchmarks. We should strive to keep the benchmarks as simple as possible because the cost of accidentally producing incorrect measurements is high. I think there's a lot we can do to reduce complexity here.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
func getRootResource(b *testing.B) string {
    b.Helper()

    p, ok := os.LookupEnv("REGISTRY_PROJECT_IDENTIFIER")
    if !ok || p == "" {
        p = "bench"
    }
    return fmt.Sprintf("projects/%s/locations/global", p)
}

func loadVersionCount(b *testing.B) int {
    b.Helper()

    s, ok := os.LookupEnv("REGISTRY_VERSION_COUNT")
    if !ok {
        return 1
    }

    c, err := strconv.Atoi(s)
    if err != nil {
        b.Fatalf("Setup: Invalid REGISTRY_VERSION_COUNT %q: count must be an integer", s)
    }

    return c
}

func getApiName(i int, b *testing.B) string {
    b.Helper()

    prefix, ok := os.LookupEnv("API_NAME_PREFIX")
    if !ok {
        prefix = "benchtest"
    }
    s, ok := os.LookupEnv("API_NAME_START_OFFSET")
    if ok {
        offset, err := strconv.Atoi(s)
        if err != nil {
            b.Fatalf("Setup: Invalid API_NAME_START_OFFSET %q: offset must be an integer", s)
        }
        i += offset
    }

    return fmt.Sprintf("%s-%d", prefix, i)
}
```
Can you replace these environment variables with flags and still achieve the same functionality? The server tests already use flags for test configuration, and I would like to remain consistent if possible.
As a bonus, it should simplify the code here because the flag library already handles type verification and default values. Then if users accidentally set the wrong flag, it will give them an error instead of silently defaulting.
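For illustration, the flag-based configuration could look roughly like this; the flag names mirror the environment variables they replace and are not meant as the final interface.

```go
var (
    registryProject string
    versionCount    int
    apiNamePrefix   string
    apiNameOffset   int
)

func init() {
    flag.StringVar(&registryProject, "registry_project", "bench", "Name of the Registry project")
    flag.IntVar(&versionCount, "version_count", 1, "Number of versions to create per API")
    flag.StringVar(&apiNamePrefix, "api_prefix", "benchtest", "Prefix to use for generated API names")
    flag.IntVar(&apiNameOffset, "api_offset", 0, "Offset added to generated API names")
}
```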
Great recommendation! It made this so much simpler to handle.
        
          
        
          
tests/benchmark/crud_test.go (Outdated)

```go
func pauseAfter1000Iterations(i int) {
    if i%1000 == 0 {
        /*
            Pause 5 seconds every 1000 iterations to reduce chances of failure on
            CREATE, UPDATE, DELETE
        */
        time.Sleep(5 * time.Second)
    }
}
```
Can we replace this with a simpler throttle? Making a file-level variable with a fixed duration to sleep between every iteration seems much easier to understand.
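A sketch of that simpler throttle, assuming we want the pause excluded from the timed region; the variable and function names are illustrative.

```go
// pauseBetweenIterations is a package-level throttle; zero disables it.
var pauseBetweenIterations = 0 * time.Millisecond

func throttle(b *testing.B) {
    if pauseBetweenIterations > 0 {
        b.StopTimer()
        time.Sleep(pauseBetweenIterations)
        b.StartTimer()
    }
}
```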
I parameterized the tests so we can pass different specs. It defaults to the OpenAPI example, but we can provide gRPC or GraphQL specs in the future.
The throttle is a variable we set, and it is passed in through the flags.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
/**
    Go Benchmarking always runs all the benchmark tests once before it runs
    for the duration specified. So the first row always returns error.
*/
```
Do you have a link to any documentation about this? Why does it cause an error?
Added this info in the commit message:
golang/go#32051
The issue seems to be fixed for --benchtime=1x but not for --benchtime=100x.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
    i += offset
}

return fmt.Sprintf("%s-%d", prefix, i)
```
Say I ran the create script and created 100 APIs, and I want to add 100 more APIs to an existing environment without having to delete everything and rerun the script with 200 APIs. This was needed when I started seeing errors while trying to add 2000 APIs, so I could break the creation down into smaller chunks.
I understand it was helpful during development, but why are we including it now? I'm not understanding the benefit of having custom prefixes and offsets. If I want to add 100 more APIs, then I can pick a new prefix and re-run the benchmarks on the same project.
Say I want to create 10k APIs to test.
I can run the create script 5 times, creating 2k APIs each time.
Once the 10k APIs are created, we can run the GET and LIST benchmarks against them, and they all follow the same pattern for API names (test-1000, test-5000, test-10000).
        
          
        
          
tests/benchmark/crud_test.go (Outdated)

```go
buf, err := readAndGZipFile(filepath.Join("testdata", "openapi1.yaml"))
if err != nil {
    b.Errorf("BenchmarkCreateApiSpec could not read openapi1.yaml : %s", err)
} else {
```
This else can be removed. https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow
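For reference, the early-return shape the linked guidance recommends would look something like this (surrounding code abbreviated):

```go
buf, err := readAndGZipFile(filepath.Join("testdata", "openapi1.yaml"))
if err != nil {
    b.Errorf("could not read openapi1.yaml: %s", err)
    return
}
// happy path continues here, unindented, with buf in scope — no else needed
```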
You are correct.
If I'm understanding correctly, BenchmarkCreateApiVersion is both a benchmark and setup code for BenchmarkCreateApiSpec. Is that correct?
I think you are right that we don't need to iterate through every API spec version for the UPDATE/GET calls.
That does not give us any useful metrics and is wasted execution time. I changed the Update and Get logic to only operate on the first spec of the first revision of every API.
I left the looping over API versions in the Deletes, to clean up everything that was created during the tests.
Thank you for taking the time to review, @seaneganx.
I have made the requested changes and replied to your comments.
The most important thing I've said in the past few rounds of feedback is this: tests must be self-contained. Here's a list of statements on this topic that should all be true.
- Running an individual benchmark should work without any manual setup.
- Benchmarks should work regardless of the order they are executed in.
- Running a benchmark multiple times repeatedly should not affect its performance.
- Benchmarks should not leave any resources behind when they finish successfully.

Here's how I recommend you structure the benchmarks to achieve this (a minimal sketch follows the list).
- Set up the server according to the benchmark parameters.
- In the benchmark loop, measure the performance of the operation being tested.
- After the benchmark loop is complete, clean up all the resources created by the test.
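As one possible shape for a single benchmark, hedged as a sketch: the helper names and request fields below are assumptions, not code from this PR. The same setup/measure/cleanup split would apply to each Create/Get/Update/Delete benchmark.

```go
func BenchmarkGetApi(b *testing.B) {
    ctx := context.Background()
    client := setupClient(ctx, b)                 // assumed helper
    api := createBenchmarkApi(ctx, b, client)     // setup, not measured
    defer deleteBenchmarkApi(ctx, b, client, api) // cleanup, always runs

    b.ResetTimer() // exclude setup from the measurement
    for i := 0; i < b.N; i++ {
        req := &rpc.GetApiRequest{Name: api.GetName()}
        if _, err := client.GetApi(ctx, req); err != nil {
            b.Fatalf("GetApi returned unexpected error: %s", err)
        }
    }
}
```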
 
        
          
tests/benchmark/crud_test.go (Outdated)

```go
flag.IntVar(&apiNameStartOffset, "api_offset", 0, "Offset the name of the API with this value")
flag.StringVar(&specPath, "spec_path", filepath.Join("testdata", "openapi.yaml"), "Absolute path to the specification file to use")
flag.StringVar(&mimeType, "mime_type", "application/x.openapi;version=3.0.0", "MIME type of the spec file")
flag.Int64Var(&pauseDuration, "pause_duration", 5, "Number of seconds to pause after every 1000 runs")
```
Can this be pause_duration_ms instead? Also, let's default to zero instead. We should only add pauses when users intentionally want to remedy an issue.
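As a sketch, the flag could take a shape like the following, in milliseconds and off by default; the variable name pauseDurationMs is illustrative.

```go
flag.Int64Var(&pauseDurationMs, "pause_duration_ms", 0,
    "Number of milliseconds to pause (0 disables pausing)")
```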
        
          
tests/benchmark/crud_test.go (Outdated)

```go
}

func getRootResource(b *testing.B) string {
    b.Helper()
```
The Helper() function is only needed if the functions fail, error, or log. It just makes sure any output uses the line number where the helper function is called, instead of the line number inside the helper where the fail/error/log occurs.
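A small illustration with an assumed helper: because of the b.Helper() call, a failure here is reported at the benchmark line that called mustAtoi rather than at the Fatalf inside it.

```go
func mustAtoi(b *testing.B, s string) int {
    b.Helper()
    n, err := strconv.Atoi(s)
    if err != nil {
        b.Fatalf("expected an integer, got %q: %s", s, err)
    }
    return n
}
```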
        
          
tests/benchmark/crud_test.go (Outdated)

```go
func pauseAfter1000Iterations(i int, b *testing.B) {
    b.Helper()
    if i%1000 == 0 {
        /*
            Pause 5 seconds every 1000 iterations to reduce chances of failure on
            CREATE, UPDATE, DELETE
        */
        time.Sleep(time.Duration(pauseDuration * time.Second.Nanoseconds()))
    }
}
```
Can we replace this with a pause between every iteration, measured in milliseconds? I would like this to be as uniform as possible, even if the clock isn't on while we do the pauses.
Why is 5000ms after 1000 iterations the magic length of pause that works, as opposed to 5ms after every iteration?
These pauses make me uncomfortable because they suggest that we're accepting and working around some odd performance that we don't fully understand (and might need to fix).
        
          
tests/benchmark/crud_test.go (Outdated)

```go
flag.StringVar(&apiNamePrefix, "api_prefix", "test", "Prefix to use for the APIs")
flag.IntVar(&apiNameStartOffset, "api_offset", 0, "Offset the name of the API with this value")
flag.StringVar(&specPath, "spec_path", filepath.Join("testdata", "openapi.yaml"), "Absolute path to the specification file to use")
flag.StringVar(&mimeType, "mime_type", "application/x.openapi;version=3.0.0", "MIME type of the spec file")
```
Are we planning on benchmarking performance for different MIME types? I don't understand why this is part of the interface. Also, why OpenAPI v3 as a default? We don't know what type of spec the user will provide.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
flag.IntVar(&versionCount, "version_count", 1, "Number of versions of the API to create (Each version will have 1 spec)")
flag.StringVar(&apiNamePrefix, "api_prefix", "test", "Prefix to use for the APIs")
flag.IntVar(&apiNameStartOffset, "api_offset", 0, "Offset the name of the API with this value")
flag.StringVar(&specPath, "spec_path", filepath.Join("testdata", "openapi.yaml"), "Absolute path to the specification file to use")
```
Can we default to an empty spec? I think our default behavior should be to measure the most basic type of request. Uploading a spec adds more noise from network latency.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
req := &rpc.GetApiRequest{
    Name: fmt.Sprintf("%s/apis/%s", rootResource, apiId),
}
api, err := client.GetApi(ctx, req)
```
Why are we including this in the measurement? This test is only supposed to benchmark the UpdateApi method.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
}
api, err := client.GetApi(ctx, req)
if err != nil {
    b.Errorf("BenchmarkUpdateApi:GetApi(%+v) returned unexpected error: %s", req, err)
```
When errors are printed they already include the function and line number. The message you write should only include information about the cause of the error, not the location.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
} else {
    api.DisplayName = fmt.Sprintf("Updated %s", apiId)
    updateReq := &rpc.UpdateApiRequest{
        Api: api,
```
Why send a fully populated API message with a field mask that only updates a single field? This adds unnecessary network latency noise.
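A sketch of the narrower request, assuming UpdateApiRequest carries an update-mask field as the field-mask discussion above implies; only the field being changed is populated.

```go
// fieldmaskpb is google.golang.org/protobuf/types/known/fieldmaskpb
updateReq := &rpc.UpdateApiRequest{
    Api: &rpc.Api{
        Name:        fmt.Sprintf("%s/apis/%s", rootResource, apiId),
        DisplayName: fmt.Sprintf("Updated %s", apiId),
    },
    UpdateMask: &fieldmaskpb.FieldMask{Paths: []string{"display_name"}},
}
if _, err := client.UpdateApi(ctx, updateReq); err != nil {
    b.Errorf("UpdateApi returned unexpected error: %s", err)
}
```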
        
          
tests/benchmark/crud_test.go (Outdated)

```go
    PageSize:  int32(pageSize),
    PageToken: pageToken,
}
b.StartTimer()
```
The timer is already started on the first iteration. Please add a timer stop and reset before the loop to avoid measuring the client creation.
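A sketch of that adjustment for a List benchmark: do the unmeasured setup first, then reset the timer immediately before the loop. The helper and the exact request fields are assumptions for illustration.

```go
client, ctx := createTestClient(b) // assumed helper; client creation is not measured

b.ResetTimer()
for i := 0; i < b.N; i++ {
    req := &rpc.ListApisRequest{
        Parent:   rootResource,
        PageSize: int32(pageSize),
    }
    if _, err := client.ListApis(ctx, req); err != nil {
        b.Fatalf("ListApis returned unexpected error: %s", err)
    }
}
```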
        
          
tests/benchmark/crud_test.go (Outdated)

```go
    b.StartTimer()
}
_, err := client.DeleteApiSpec(ctx, req)
// Only take reading for the first version
```
These comments aren't referring to the correct resource.
Codecov Report

```
@@           Coverage Diff           @@
##             main     #480   +/-   ##
=======================================
  Coverage   40.79%   40.79%
=======================================
  Files          68       68
  Lines        7956     7956
=======================================
  Hits         3246     3246
  Misses       4319     4319
  Partials      391      391
```

Continue to review full report at Codecov.
Hi, I was looking into this today to try to better understand how it got stuck, and I think we're probably trying to do too much in an initial PR. I'll go back and add some comments here and there, but I think it might be more productive to restart with something very minimal at first that we later expand to measure more things.
```go
func init() {
    flag.StringVar(&registryProject, "registry_project", "bench", "Name of the Registry project")
    rootResource = fmt.Sprintf("projects/%s/locations/global", registryProject)
```
This isn't doing what is probably intended -- because it is evaluated in the init() function, the registryProject variable still has its default value, so "rootResource" is always "projects/bench/locations/global".
I found that inside the actual benchmarking function, e.g. BenchmarkListApis, registryProject does hold the value set by the flag, so rootResource could be constructed there either directly or by calling a helper function that refers to the registryProject variable.
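A sketch of that fix: build the root resource lazily, after flag parsing, instead of in init().

```go
func rootResource() string {
    // registryProject holds the parsed flag value by the time any benchmark runs
    return fmt.Sprintf("projects/%s/locations/global", registryProject)
}
```

Each benchmark would then call rootResource() instead of reading a package-level string.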
```sh
go test --bench=Create ./tests/benchmark \
  --registry_project=$REGISTRY_PROJECT_IDENTIFIER \
  --version_count=$API_VERSIONS_COUNT \
  --benchtime=${ITERATION_SIZE}x --timeout=0
```
I was initially puzzled by this and subsequent invocations, wondering what was actually getting called for each value of the --bench argument. I found that it's treated as a regex that is used to match benchmark test function names, and since none of the test function names include "Create", "Update", etc., only the --bench=List invocation below calls an actual test function. That's probably because this was designed for an implementation that is coming, but at least for clarity in review, I think it would be good to have the README instructions match the implementation.
```go
b.Run(fmt.Sprintf("%s", operation), func(b *testing.B) {
    for i := 1; i <= b.N; i++ {
        apiId := getApiName(i)
        switch operation {
```
This is an interesting structure (almost like a state machine?) which I think might be intended to provide some flexibility that isn't being used yet (since operations above is a fixed array). If we wanted a way to repeatedly do a series of things in a standard wrapper, I'd suggest having an array of Go anonymous functions instead, but before doing that I would probably just write one instance in a simpler, possibly more tedious way and abstract from there.
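For illustration, the anonymous-function variant could look roughly like this; the operation names and bodies are placeholders, not code from this PR.

```go
operations := []struct {
    name string
    fn   func(b *testing.B, apiId string)
}{
    {"CreateApi", func(b *testing.B, apiId string) { /* call CreateApi here */ }},
    {"GetApi", func(b *testing.B, apiId string) { /* call GetApi here */ }},
}

for _, op := range operations {
    op := op // capture the loop variable for the closure (not needed in Go 1.22+)
    b.Run(op.name, func(b *testing.B) {
        for i := 1; i <= b.N; i++ {
            op.fn(b, getApiName(i))
        }
    })
}
```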
```go
    return nil, client, ctx
}

func CreateApi(apiId string, b *testing.B, record bool, client rpc.RegistryClient, ctx context.Context) *rpc.Api {
```
I didn't find any calls of this function where record is set to true. I tend to think that this is more granularity than necessary -- I would instead just make sure that measurement is off during initialization and teardown, which could be done with calls that start/stop the timer in the main test function. Why might we keep this?
See #631 which contains some initial benchmarks with smaller scope (and is based on these commits).

@giteshk feel free to reopen this if needed.
The benchmark tests can be used against any running registry service, whether it is running locally or hosted.
If you wish to create 5 versions for every API, set:
export REGISTRY_VERSION_COUNT=5