Conversation

Contributor

@giteshk giteshk commented Mar 7, 2022

The benchmark tests can be used against any running registry service.

To run them against a locally running service:

source auth/LOCAL.sh
export REGISTRY_PROJECT_IDENTIFIER=bench
apg admin create-project --project_id=$REGISTRY_PROJECT_IDENTIFIER
go test --bench=Create ./tests/benchmark --benchtime=10x
go test --bench=Update ./tests/benchmark --benchtime=10x
go test --bench=Get ./tests/benchmark --benchtime=10x
go test --bench=List ./tests/benchmark --benchtime=10x
go test --bench=Delete ./tests/benchmark --benchtime=10x

If you wish to create 5 versions for every API, set export REGISTRY_VERSION_COUNT=5

To run the tests against a hosted instance:

source auth/HOSTED.sh
export REGISTRY_PROJECT_IDENTIFIER=<Registry project name>
apg admin create-project --project_id=$REGISTRY_PROJECT_IDENTIFIER
go test --bench=Create ./tests/benchmark --benchtime=10x
go test --bench=Update ./tests/benchmark --benchtime=10x
go test --bench=Get ./tests/benchmark --benchtime=10x
go test --bench=List ./tests/benchmark --benchtime=10x
go test --bench=Delete ./tests/benchmark --benchtime=10x

@giteshk giteshk requested a review from seaneganx March 8, 2022 19:20
@giteshk giteshk marked this pull request as draft March 8, 2022 19:30
@giteshk giteshk removed the request for review from seaneganx March 9, 2022 21:20
@giteshk giteshk force-pushed the benchmark-test branch 2 times, most recently from 214c3a8 to 7055a7e on March 15, 2022 03:08
@giteshk giteshk requested a review from seaneganx March 15, 2022 05:57
@giteshk giteshk marked this pull request as ready for review March 15, 2022 05:58
Contributor

@seaneganx seaneganx left a comment

There are a lot of new features in here that I don't understand. Can you expand the documentation to explain each feature and why it's useful?

Comment on lines 144 to 147
// Only take reading for the first version
if j == 1 {
b.StartTimer()
}
Contributor

Why do we only measure the first version? Why do we continue creating other versions after that?

Contributor Author

I wanted to create 3 versions and 3 specs, but the benchmark would add up the time of all 3 versions, and that total is what gets reported.

To make sure our reports are accurate, we only measure the first pass for revisions and specs but let the scripts create all versions.

Contributor

Right, so let's talk about my second question.

Why do we continue creating other versions after that?

This is supposed to be benchmarking CreateApiVersion alone. Each iteration of the benchmark loop should only create one ApiVersion. Why are we creating three?

If I'm understanding correctly, BenchmarkCreateApiVersion is both a benchmark and setup code for BenchmarkCreateApiSpec. Is that correct?

Tests should work correctly regardless of the order they execute in. We should be able to run go test -bench=Get ./tests/benchmark/... to successfully measure the performance of every Get method.
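For illustration, a self-contained per-iteration version of this benchmark could look roughly like the sketch below; setupClient and createBenchmarkApi are hypothetical helpers, and the CreateApiVersionRequest fields are assumed from the other snippets in this thread rather than copied from the PR.

func BenchmarkCreateApiVersion(b *testing.B) {
    ctx := context.Background()
    client := setupClient(ctx, b) // hypothetical helper: builds a registry client or fails the benchmark
    b.ResetTimer()
    for i := 1; i <= b.N; i++ {
        b.StopTimer()
        api := createBenchmarkApi(ctx, b, client, i) // hypothetical untimed setup: one parent API per iteration
        b.StartTimer()

        _, err := client.CreateApiVersion(ctx, &rpc.CreateApiVersionRequest{
            Parent:       api.GetName(),
            ApiVersionId: "v1",
            ApiVersion:   &rpc.ApiVersion{},
        })
        if err != nil {
            b.Errorf("CreateApiVersion returned unexpected error: %s", err)
        }
    }
}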

Comment on lines 67 to 79
func readAndGZipFile(filename string) (*bytes.Buffer, error) {
    fileBytes, err := ioutil.ReadFile(filename)
    if err != nil {
        return nil, err
    }
    var buf bytes.Buffer
    zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
    if err != nil {
        return nil, err
    }
    if _, err := zw.Write(fileBytes); err != nil {
        return nil, err
    }
    if err := zw.Close(); err != nil {
        return nil, err
    }
    return &buf, nil
}
Contributor

Can we gzip the files that are being checked into the testdata/ directory instead? Doing it here seems like unnecessary work.

Contributor Author

I want the ability to change the spec in the future, in case we decide this should be a larger file.

I reworked the code so that we read and gzip the file only once.

Contributor

I agree that being able to change the spec is a good feature to have. What happens if we want to benchmark a file that is not compressed? We can't do that with the current code.

The benchmarks should load the file without compressing it because it gives us the most flexibility. We can update the file to be compressed or not, and empty or megabytes large. IMO we should start with benchmarks uploading the current spec in uncompressed form because it's simple and fairly representative of a typical spec size.

More generally, if we find we want to test against a variety of specs, we can add a flag to specify which file the test should load. For now though, YAGNI.
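As a sketch of what that could look like (the flag name, default, and helper are assumptions, not code from this PR), the benchmark could read the file as-is and skip compression entirely:

var specPath = flag.String("spec_path", filepath.Join("testdata", "openapi.yaml"), "path of the spec file to upload")

// loadSpec returns the raw spec bytes with no compression applied.
func loadSpec(b *testing.B) []byte {
    b.Helper()
    contents, err := os.ReadFile(*specPath)
    if err != nil {
        b.Fatalf("Setup: could not read spec file %q: %s", *specPath, err)
    }
    return contents
}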

giteshk added 2 commits March 30, 2022 18:18
Added steps to run benchmark test on Github pull requests
Added benchmark tests for Listing Apis/ApiVersions/ApiSpecs/Artifacts
Contributor

@seaneganx seaneganx left a comment

Thanks for splitting this into smaller parts, Gitesh. Still, there are a lot of new features and changes to how the benchmarks work, so I have a lot of questions and concerns.

Many of my concerns are about the accuracy of the benchmarks. We should strive to keep the benchmarks as simple as possible because the cost of accidentally producing incorrect measurements is high. I think there's a lot we can do to reduce complexity here.

Comment on lines 34 to 77
func getRootResource(b *testing.B) string {
    b.Helper()

    p, ok := os.LookupEnv("REGISTRY_PROJECT_IDENTIFIER")
    if !ok || p == "" {
        p = "bench"
    }
    return fmt.Sprintf("projects/%s/locations/global", p)
}

func loadVersionCount(b *testing.B) int {
    b.Helper()

    s, ok := os.LookupEnv("REGISTRY_VERSION_COUNT")
    if !ok {
        return 1
    }

    c, err := strconv.Atoi(s)
    if err != nil {
        b.Fatalf("Setup: Invalid REGISTRY_VERSION_COUNT %q: count must be an integer", s)
    }

    return c
}

func getApiName(i int, b *testing.B) string {
    b.Helper()

    prefix, ok := os.LookupEnv("API_NAME_PREFIX")
    if !ok {
        prefix = "benchtest"
    }
    s, ok := os.LookupEnv("API_NAME_START_OFFSET")
    if ok {
        offset, err := strconv.Atoi(s)
        if err != nil {
            b.Fatalf("Setup: Invalid API_NAME_START_OFFSET %q: offset must be an integer", s)
        }
        i += offset
    }

    return fmt.Sprintf("%s-%d", prefix, i)
}
Contributor

Can you replace these environment variables with flags and still achieve the same functionality? The server tests already use flags for test configuration, and I would like to remain consistent if possible.

As a bonus, it should simplify the code here because the flag library already handles type verification and default values. Then if users accidentally set the wrong flag, it will give them an error instead of silently defaulting.
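For example, the same configuration could be expressed with the flag package, which already handles parsing, type checking, and defaults. This is only a sketch; the flag names and defaults here are illustrative.

var (
    registryProject = flag.String("registry_project", "bench", "registry project to run the benchmarks against")
    versionCount    = flag.Int("version_count", 1, "number of versions to create for each API")
    apiNamePrefix   = flag.String("api_prefix", "benchtest", "prefix used for generated API names")
    apiNameOffset   = flag.Int("api_offset", 0, "offset added to generated API numbers")
)

// rootResource builds the parent resource name from the parsed flag value.
func rootResource() string {
    return fmt.Sprintf("projects/%s/locations/global", *registryProject)
}

// apiName builds the name of the i-th benchmark API.
func apiName(i int) string {
    return fmt.Sprintf("%s-%d", *apiNamePrefix, i+*apiNameOffset)
}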

Contributor Author

Great recommendation! It made this so much simpler to handle.

Comment on lines 67 to 79
func readAndGZipFile(filename string) (*bytes.Buffer, error) {
    fileBytes, err := ioutil.ReadFile(filename)
    if err != nil {
        return nil, err
    }
    var buf bytes.Buffer
    zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
    if err != nil {
        return nil, err
    }
    if _, err := zw.Write(fileBytes); err != nil {
        return nil, err
    }
    if err := zw.Close(); err != nil {
        return nil, err
    }
    return &buf, nil
}
Contributor

I agree that being able to change the spec is a good feature to have. What happens if we want to benchmark a file that is not compressed? We can't do that with the current code.

The benchmarks should load the file without compressing it because it gives us the most flexibility. We can update the file to be compressed or not, and empty or megabytes large. IMO we should start with benchmarks uploading the current spec in uncompressed form because it's simple and fairly representative of a typical spec size.

More generally, if we find we want to test against a variety of specs, we can add a flag to specify which file the test should load. For now though, YAGNI.

Comment on lines 92 to 100
func pauseAfter1000Iterations(i int) {
    if i%1000 == 0 {
        /*
            Pause 5 seconds every 1000 iterations to reduce chances of failure on
            CREATE, UPDATE, DELETE
        */
        time.Sleep(5 * time.Second)
    }
}
Contributor

Can we replace this with a simpler throttle? Making a file-level variable with a fixed duration to sleep between every iteration seems much easier to understand.
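For example, a minimal sketch of such a throttle (the variable name and value are placeholders):

// pauseDuration is a file-level throttle applied between iterations; set it to
// zero to disable pausing entirely.
var pauseDuration = 5 * time.Millisecond

func throttle() {
    time.Sleep(pauseDuration)
}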

Contributor Author

@giteshk giteshk Apr 6, 2022

I parameterized the tests so we can pass different specs. It defaults to the OpenAPI example, but we can provide gRPC or GraphQL specs in the future.

The throttle is now a variable that is passed in through the flags.

Comment on lines 128 to 131
/**
Go benchmarking always runs all the benchmark tests once before it runs
for the duration specified, so the first row always returns an error.
*/
Contributor

Do you have a link to any documentation about this? Why does it cause an error?

Contributor Author

Added this info in the commit message:

golang/go#32051
The issue seems to be fixed for --benchtime=1x but not for --benchtime=100x.

i += offset
}

return fmt.Sprintf("%s-%d", prefix, i)
Contributor

Say I ran the create script and created 100 APIs. I wanted to add 100 more APIs to an existing environment without having to delete everything and rerun the script with 200 APIs. This was needed when I started seeing errors while trying to add 2000 APIs, so I could break the creation down into smaller chunks.

I understand it was helpful during development, but why are we including it now? I'm not understanding the benefit to having custom prefixes and offsets. If I want to add 100 more APIs, then I can pick a new prefix and re-run the benchmarks on the same project.

Contributor Author

Say I want to create 10k APIs to test.
I can run the create script 5 times, creating 2k APIs each time.
Once the 10k APIs are created, we can run the GET and LIST benchmarks on them, and they follow the same pattern for API names (test-1000, test-5000, test-10000).

Comment on lines 144 to 147
// Only take reading for the first version
if j == 1 {
b.StartTimer()
}
Contributor

Right, so let's talk about my second question.

Why do we continue creating other versions after that?

This is supposed to be benchmarking CreateApiVersion alone. Each iteration of the benchmark loop should only create one ApiVersion. Why are we creating three?

If I'm understanding correctly, BenchmarkCreateApiVersion is both a benchmark and setup code for BenchmarkCreateApiSpec. Is that correct?

Tests should work correctly regardless of the order they execute in. We should be able to run go test -bench=Get ./tests/benchmark/... to successfully measure the performance of every Get method.

buf, err := readAndGZipFile(filepath.Join("testdata", "openapi1.yaml"))
if err != nil {
b.Errorf("BenchmarkCreateApiSpec could not read openapi1.yaml : %s", err)
} else {

Contributor Author

You are correct.

If I'm understanding correctly, BenchmarkCreateApiVersion is both a benchmark and setup code for BenchmarkCreateApiSpec. Is that correct?

I think you are right that we don't need to iterate through every API spec version for the UPDATE/GET calls. That really does not give us any metrics and is wasted execution time. I changed the logic for the Updates and Gets to only operate on the first spec of the first revision of every API.

I left the looping over API versions in the Deletes, to clean up everything that was created during the tests.

Contributor Author

@giteshk giteshk left a comment

Thank you for taking the time to review, @seaneganx.
I have made the changes requested and replied to your comments.

Contributor

@seaneganx seaneganx left a comment

The most important thing I've said in the past few rounds of feedback is this: Tests must be self-contained. Here's a list of statements on this topic that should all be true.

  • Running an individual benchmark should work without any manual setup.
  • Benchmarks should work regardless of the order they are executed in.
  • Running a benchmark multiple times repeatedly should not affect its performance.
  • Benchmarks should not leave any resources behind when they finish successfully.

Here's how I recommend you structure the benchmarks to achieve this.

  1. Set up the server according to the benchmark parameters.
  2. In the benchmark loop, measure the performance of the operation being tested.
  3. After the benchmark loop is complete, clean up all the resources created by the test.
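Concretely, that three-step structure could look something like the sketch below; setupClient, createBenchmarkApi, and deleteBenchmarkApi are hypothetical helpers, not code from this PR.

func BenchmarkGetApi(b *testing.B) {
    ctx := context.Background()
    client := setupClient(ctx, b)             // hypothetical helper
    api := createBenchmarkApi(ctx, b, client) // 1. untimed setup for this benchmark

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // 2. measure only the operation under test
        if _, err := client.GetApi(ctx, &rpc.GetApiRequest{Name: api.GetName()}); err != nil {
            b.Errorf("GetApi returned unexpected error: %s", err)
        }
    }
    b.StopTimer()

    deleteBenchmarkApi(ctx, b, client, api) // 3. untimed cleanup of everything the benchmark created
}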

flag.IntVar(&apiNameStartOffset, "api_offset", 0, "Offset the name of the API with this value")
flag.StringVar(&specPath, "spec_path", filepath.Join("testdata", "openapi.yaml"), "Absolute path to the specification file to use")
flag.StringVar(&mimeType, "mime_type", "application/x.openapi;version=3.0.0", "MIME type of the spec file")
flag.Int64Var(&pauseDuration, "pause_duration", 5, "Number of seconds to pause after every 1000 runs")
Contributor

Can this be pause_duration_ms instead? Also, let's default to zero instead. We should only add pauses when users intentionally want to remedy an issue.

}

func getRootResource(b *testing.B) string {
b.Helper()
Contributor

The Helper() function is only needed if the functions fail, error, or log. It just makes sure any output uses the line number where the helper function is called, instead of the line number inside the helper where the fail/error/log occurs.
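For example, in a hypothetical helper like the one below, b.Helper() only changes which line is reported if Fatalf fires; if the helper never fails, errors, or logs, the call has no visible effect.

func mustAtoi(b *testing.B, s string) int {
    b.Helper() // report failures at the caller's line, not this one
    n, err := strconv.Atoi(s)
    if err != nil {
        b.Fatalf("invalid integer %q: %s", s, err)
    }
    return n
}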

Comment on lines 63 to 72
func pauseAfter1000Iterations(i int, b *testing.B) {
    b.Helper()
    if i%1000 == 0 {
        /*
            Pause 5 seconds every 1000 iterations to reduce chances of failure on
            CREATE, UPDATE, DELETE
        */
        time.Sleep(time.Duration(pauseDuration * time.Second.Nanoseconds()))
    }
}
Contributor

Can we replace this with a pause between every iteration, measured in milliseconds? I would like this to be as uniform as possible, even if the clock isn't on while we do the pauses.

Why is 5000ms after 1000 iterations the magic length of pause that works, as opposed to 5ms after every iteration?

Contributor

These pauses make me uncomfortable because they suggest that we're accepting and working around some odd performance that we don't fully understand (and might need to fix).

flag.StringVar(&apiNamePrefix, "api_prefix", "test", "Prefix to use for the APIs")
flag.IntVar(&apiNameStartOffset, "api_offset", 0, "Offset the name of the API with this value")
flag.StringVar(&specPath, "spec_path", filepath.Join("testdata", "openapi.yaml"), "Absolute path to the specification file to use")
flag.StringVar(&mimeType, "mime_type", "application/x.openapi;version=3.0.0", "MIME type of the spec file")
Contributor

Are we planning on benchmarking performance for different MIME types? I don't understand why this is part of the interface. Also, why OpenAPI v3 as a default? We don't know what type of spec the user will provide.

flag.IntVar(&versionCount, "version_count", 1, "Number of versions of the API to create (Each version will have 1 spec)")
flag.StringVar(&apiNamePrefix, "api_prefix", "test", "Prefix to use for the APIs")
flag.IntVar(&apiNameStartOffset, "api_offset", 0, "Offset the name of the API with this value")
flag.StringVar(&specPath, "spec_path", filepath.Join("testdata", "openapi.yaml"), "Absolute path to the specification file to use")
Contributor

Can we default to an empty spec? I think our default behavior should be to measure the most basic type of request. Uploading a spec adds more noise from network latency.

req := &rpc.GetApiRequest{
Name: fmt.Sprintf("%s/apis/%s", rootResource, apiId),
}
api, err := client.GetApi(ctx, req)
Contributor

Why are we including this in the measurement? This test is only supposed to benchmark the UpdateApi method.

}
api, err := client.GetApi(ctx, req)
if err != nil {
b.Errorf("BenchmarkUpdateApi:GetApi(%+v) returned unexpected error: %s", req, err)
Contributor

When errors are printed they already include the function and line number. The message you write should only include information about the cause of the error, not the location.

} else {
api.DisplayName = fmt.Sprintf("Updated %s", apiId)
updateReq := &rpc.UpdateApiRequest{
Api: api,
Contributor

Why send a fully populated API message with a field mask that only updates a single field? This adds unnecessary network latency noise.
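As a sketch of the suggested alternative (the UpdateMask field and fieldmaskpb usage are assumed here, not taken from this PR), the request could carry only the field being changed:

updateReq := &rpc.UpdateApiRequest{
    Api: &rpc.Api{
        Name:        fmt.Sprintf("%s/apis/%s", rootResource, apiId),
        DisplayName: fmt.Sprintf("Updated %s", apiId),
    },
    // Only display_name is listed in the mask, so no other fields need to be populated.
    UpdateMask: &fieldmaskpb.FieldMask{Paths: []string{"display_name"}},
}
if _, err := client.UpdateApi(ctx, updateReq); err != nil {
    b.Errorf("UpdateApi returned unexpected error: %s", err)
}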

PageSize: int32(pageSize),
PageToken: pageToken,
}
b.StartTimer()
Contributor

The timer is already started on the first iteration. Please add a timer stop and reset before the loop to avoid measuring the client creation.

b.StartTimer()
}
_, err := client.DeleteApiSpec(ctx, req)
// Only take reading for the first version
Contributor

These comments aren't referring to the correct resource.

@codecov

codecov bot commented Apr 18, 2022

Codecov Report

Merging #480 (8a46fe7) into main (6affe78) will not change coverage.
The diff coverage is n/a.

❗ Current head 8a46fe7 differs from pull request most recent head cbaffe0. Consider uploading reports for the commit cbaffe0 to get more accurate results

@@           Coverage Diff           @@
##             main     #480   +/-   ##
=======================================
  Coverage   40.79%   40.79%           
=======================================
  Files          68       68           
  Lines        7956     7956           
=======================================
  Hits         3246     3246           
  Misses       4319     4319           
  Partials      391      391           

Continue to review full report at Codecov.


@giteshk giteshk marked this pull request as draft April 20, 2022 15:00
@timburks
Contributor

Hi, I was looking into this today to try to better understand how it got stuck, and I think we're probably trying to do too much in an initial PR. I'll go back and add some comments here and there, but I think it might be more productive to restart with something very minimal at first that we later expand to measure more things.


func init() {
    flag.StringVar(&registryProject, "registry_project", "bench", "Name of the Registry project")
    rootResource = fmt.Sprintf("projects/%s/locations/global", registryProject)
Contributor

This isn't doing what is probably intended -- because it is evaluated in the init() function, the registryProject variable still has its default value, so "rootResource" is always "projects/bench/locations/global".

I found that inside the actual benchmarking function, e.g. BenchmarkListApis, registryProject does hold the value set by the flag, so rootResource could be constructed there either directly or by calling a helper function that refers to the registryProject variable.
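For instance, a sketch reusing the variable name from the quoted snippet, evaluated at call time so it sees the parsed flag value:

func rootResource() string {
    // Called after flag.Parse has run, so --registry_project is reflected here.
    return fmt.Sprintf("projects/%s/locations/global", registryProject)
}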

go test --bench=Create ./tests/benchmark \
--registry_project=$REGISTRY_PROJECT_IDENTIFIER \
--version_count=$API_VERSIONS_COUNT \
--benchtime=${ITERATION_SIZE}x --timeout=0
Contributor

@timburks timburks Jun 24, 2022

I was initially puzzled by this and the subsequent invocations, wondering what was actually getting called for each value of the --bench argument. I found that it's treated as a regex that is used to match benchmark test function names, and since none of the test function names include "Create", "Update", etc., only the --bench=List invocation below calls an actual test function. That's probably because this was designed for an implementation that is coming, but at least for clarity in review, I think it would be good to have the README instructions match the implementation.

b.Run(fmt.Sprintf("%s", operation), func(b *testing.B) {
    for i := 1; i <= b.N; i++ {
        apiId := getApiName(i)
        switch operation {
Contributor

This is an interesting structure (almost like a state machine?) which I think might be intended to provide some flexibility that isn't being used yet (since operations above is a fixed array). If we wanted a way to repeatedly do a series of things in a standard wrapper, I'd suggest having an array of Go anonymous functions instead, but before doing that I would probably just write one instance in a simpler, possibly more tedious way and abstract from there.
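A sketch of that idea, with the operation helpers invented for illustration:

// A table of named operations, each wrapping one registry call.
ops := []struct {
    name string
    run  func(b *testing.B, apiId string)
}{
    {"CreateApi", func(b *testing.B, apiId string) { createApi(b, apiId) }}, // hypothetical helper
    {"UpdateApi", func(b *testing.B, apiId string) { updateApi(b, apiId) }}, // hypothetical helper
}
for _, op := range ops {
    b.Run(op.name, func(b *testing.B) {
        for i := 1; i <= b.N; i++ {
            op.run(b, getApiName(i))
        }
    })
}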

return nil, client, ctx
}

func CreateApi(apiId string, b *testing.B, record bool, client rpc.RegistryClient, ctx context.Context) *rpc.Api {
Contributor

I didn't find any calls of this function where record is set to true. I tend to think that this is more granularity than necessary -- I would instead just make sure that measurement is off during initialization and teardown, which could be done with calls that start/stop the timer in the main test function. Why might we keep this?

@timburks
Contributor

timburks commented Jun 29, 2022

See #631 which contains some initial benchmarks with smaller scope (and is based on these commits)

@timburks
Contributor

timburks commented Sep 1, 2022

@giteshk feel free to reopen this if needed

@timburks timburks closed this Sep 1, 2022