Benchmark test for CRUD operations on Api/Api Version/Api spec/Artifact #480
Conversation
There are a lot of new features in here that I don't understand. Can you expand the documentation to explain each feature and why it's useful?
        
          
tests/benchmark/crud_test.go (Outdated)

```go
// Only take reading for the first version
if j == 1 {
    b.StartTimer()
}
```
Why do we only measure the first version? Why do we continue creating other versions after that?
I wanted to create 3 versions and 3 specs, but the benchmark would add up the time of all 3 versions, which is how it gets reported.
To make sure our reports are accurate, we only measure the first pass for revisions and specs but let the scripts create all versions.
Right, so let's talk about my second question.
Why do we continue creating other versions after that?
This is supposed to be benchmarking CreateApiVersion alone. Each iteration of the benchmark loop should only create one ApiVersion. Why are we creating three?
If I'm understanding correctly, BenchmarkCreateApiVersion is both a benchmark and setup code for BenchmarkCreateApiSpec. Is that correct?
Tests should work correctly regardless of the order they execute in. We should be able to run go test -bench=Get ./tests/benchmark/... to successfully measure the performance of every Get method.
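To make that concrete, here is a minimal sketch of what a self-contained CreateApiVersion benchmark could look like, with exactly one ApiVersion created per iteration. The setup/teardown helpers and the request field names are assumptions for illustration, not code from this PR.

```go
func BenchmarkCreateApiVersion(b *testing.B) {
    ctx := context.Background()
    client, api := setupBenchmarkApi(ctx, b)        // assumed helper: creates a throwaway parent Api
    defer teardownBenchmarkApi(ctx, b, client, api) // assumed helper: deletes it afterwards

    b.ResetTimer() // exclude setup from the measurement
    for i := 0; i < b.N; i++ {
        req := &rpc.CreateApiVersionRequest{
            Parent:       api.GetName(),
            ApiVersionId: fmt.Sprintf("v%d", i+1),
            ApiVersion:   &rpc.ApiVersion{},
        }
        if _, err := client.CreateApiVersion(ctx, req); err != nil {
            b.Fatalf("CreateApiVersion(%+v) returned unexpected error: %s", req, err)
        }
    }
}
```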
        
          
tests/benchmark/crud_test.go (Outdated)

```go
func readAndGZipFile(filename string) (*bytes.Buffer, error) {
    fileBytes, _ := ioutil.ReadFile(filename)
    var buf bytes.Buffer
    zw, _ := gzip.NewWriterLevel(&buf, gzip.BestCompression)
    _, err := zw.Write(fileBytes)
    if err != nil {
        return nil, err
    }
    if err := zw.Close(); err != nil {
        return nil, err
    }
    return &buf, nil
}
```
Can we gzip the files that are being checked into the testdata/ directory instead? Doing it here seems like unnecessary work.
I want the ability to change the spec in the future, in case we decide this should be a larger file.
I reworked the code so that we read and gzip the file only once.
I agree that being able to change the spec is a good feature to have. What happens if we want to benchmark a file that is not compressed? We can't do that with the current code.
The benchmarks should load the file without compressing it because it gives us the most flexibility. We can update the file to be compressed or not, and empty or megabytes large. IMO we should start with benchmarks uploading the current spec in uncompressed form because it's simple and fairly representative of a typical spec size.
More generally, if we find we want to test against a variety of specs, we can add a flag to specify which file the test should load. For now though, YAGNI.
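As a sketch of that simpler approach, the benchmark could read the file once, as-is, and leave any compression decision to whoever prepares the file in testdata/. The function and variable names here are illustrative.

```go
var specContents []byte // raw spec bytes, read once and reused across iterations

func loadSpecContents(b *testing.B, filename string) []byte {
    b.Helper()
    if specContents == nil {
        data, err := ioutil.ReadFile(filename)
        if err != nil {
            b.Fatalf("could not read spec file %q: %s", filename, err)
        }
        specContents = data
    }
    return specContents
}
```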
Added steps to run the benchmark tests on GitHub pull requests. Added benchmark tests for listing Apis/ApiVersions/ApiSpecs/Artifacts.
Additionally, run the benchmark tests on PRs only when the registry server code has been updated.
It will be added as a separate PR.
This will be addressed in a separate PR.
Thanks for splitting this into smaller parts, Gitesh. Still, there are a lot of new features and changes to how the benchmarks work, so I have a lot of questions and concerns.
Many of my concerns are about the accuracy of the benchmarks. We should strive to keep the benchmarks as simple as possible because the cost of accidentally producing incorrect measurements is high. I think there's a lot we can do to reduce complexity here.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
func getRootResource(b *testing.B) string {
    b.Helper()

    p, ok := os.LookupEnv("REGISTRY_PROJECT_IDENTIFIER")
    if !ok || p == "" {
        p = "bench"
    }
    return fmt.Sprintf("projects/%s/locations/global", p)
}

func loadVersionCount(b *testing.B) int {
    b.Helper()

    s, ok := os.LookupEnv("REGISTRY_VERSION_COUNT")
    if !ok {
        return 1
    }

    c, err := strconv.Atoi(s)
    if err != nil {
        b.Fatalf("Setup: Invalid REGISTRY_VERSION_COUNT %q: count must be an integer", s)
    }

    return c
}

func getApiName(i int, b *testing.B) string {
    b.Helper()

    prefix, ok := os.LookupEnv("API_NAME_PREFIX")
    if !ok {
        prefix = "benchtest"
    }
    s, ok := os.LookupEnv("API_NAME_START_OFFSET")
    if ok {
        offset, err := strconv.Atoi(s)
        if err != nil {
            b.Fatalf("Setup: Invalid API_NAME_START_OFFSET %q: offset must be an integer", s)
        }
        i += offset
    }

    return fmt.Sprintf("%s-%d", prefix, i)
}
```
Can you replace these environment variables with flags and still achieve the same functionality? The server tests already use flags for test configuration, and I would like to remain consistent if possible.
As a bonus, it should simplify the code here because the flag library already handles type verification and default values. Then if users accidentally set the wrong flag, it will give them an error instead of silently defaulting.
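For illustration, the flag-based configuration could look roughly like this; the flag names mirror the environment variables they replace and are not meant as the final interface.

```go
var (
    registryProject string
    versionCount    int
    apiNamePrefix   string
    apiNameOffset   int
)

func init() {
    flag.StringVar(&registryProject, "registry_project", "bench", "Name of the Registry project")
    flag.IntVar(&versionCount, "version_count", 1, "Number of versions to create per API")
    flag.StringVar(&apiNamePrefix, "api_prefix", "benchtest", "Prefix to use for generated API names")
    flag.IntVar(&apiNameOffset, "api_offset", 0, "Offset added to generated API names")
}
```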
Great recommendation! It made this so much simpler to handle.
        
          
        
          
tests/benchmark/crud_test.go (Outdated)

```go
func pauseAfter1000Iterations(i int) {
    if i%1000 == 0 {
        /*
            Pause 5 seconds every 1000 iterations to reduce chances of failure on
            CREATE, UPDATE, DELETE
        */
        time.Sleep(5 * time.Second)
    }
}
```
Can we replace this with a simpler throttle? Making a file-level variable with a fixed duration to sleep between every iteration seems much easier to understand.
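A sketch of that simpler throttle, assuming we want the pause excluded from the timed region; the variable and function names are illustrative.

```go
// pauseBetweenIterations is a package-level throttle; zero disables it.
var pauseBetweenIterations = 0 * time.Millisecond

func throttle(b *testing.B) {
    if pauseBetweenIterations > 0 {
        b.StopTimer()
        time.Sleep(pauseBetweenIterations)
        b.StartTimer()
    }
}
```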
I parameterized the tests so we can pass different specs. It defaults to the OpenAPI example, but we can provide gRPC or GraphQL specs in the future.
The throttle is a variable we set, and it is passed in through the flags.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
/**
    Go Benchmarking always runs all the benchmark tests once before it runs
    for the duration specified. So the first row always returns error.
*/
```
Do you have a link to any documentation about this? Why does it cause an error?
Added this info in the commit message:
golang/go#32051
The issue seems to be fixed for --benchtime=1x but not for --benchtime=100x.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
    i += offset
}

return fmt.Sprintf("%s-%d", prefix, i)
```
Say I ran the create script and created 100 APIs, and I want to add 100 more APIs to an existing environment without having to delete everything and rerun the script with 200 APIs. This was needed when I started seeing errors while trying to add 2000 APIs, so I could break the creation down into smaller chunks.
I understand it was helpful during development, but why are we including it now? I'm not understanding the benefit of having custom prefixes and offsets. If I want to add 100 more APIs, then I can pick a new prefix and re-run the benchmarks on the same project.
Say I want to create 10k APIs to test.
I can run the create script 5 times, creating 2k APIs each time.
Once the 10k APIs are created, we can run the GET and LIST benchmarks against them, and they all follow the same pattern for API names (test-1000, test-5000, test-10000).
        
          
        
          
tests/benchmark/crud_test.go (Outdated)

```go
buf, err := readAndGZipFile(filepath.Join("testdata", "openapi1.yaml"))
if err != nil {
    b.Errorf("BenchmarkCreateApiSpec could not read openapi1.yaml : %s", err)
} else {
```
This else can be removed. https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow
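For reference, the early-return shape the linked guidance recommends would look something like this (surrounding code abbreviated):

```go
buf, err := readAndGZipFile(filepath.Join("testdata", "openapi1.yaml"))
if err != nil {
    b.Errorf("could not read openapi1.yaml: %s", err)
    return
}
// happy path continues here, unindented, with buf in scope — no else needed
```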
You are correct.
If I'm understanding correctly, BenchmarkCreateApiVersion is both a benchmark and setup code for BenchmarkCreateApiSpec. Is that correct?
I think you are right that we don't need to iterate through every API spec version for the UPDATE/GET calls.
That does not give us any useful metrics and is wasted execution time. I changed the Update and Get logic to only operate on the first spec of the first revision of every API.
I left the looping over API versions in the Deletes, to clean up everything that was created during the tests.
Thank you for taking the time to review, @seaneganx.
I have made the requested changes and replied to your comments.
The most important thing I've said in the past few rounds of feedback is this: tests must be self-contained. Here's a list of statements on this topic that should all be true.
- Running an individual benchmark should work without any manual setup.
- Benchmarks should work regardless of the order they are executed in.
- Running a benchmark multiple times repeatedly should not affect its performance.
- Benchmarks should not leave any resources behind when they finish successfully.

Here's how I recommend you structure the benchmarks to achieve this (a minimal sketch follows the list).
- Set up the server according to the benchmark parameters.
- In the benchmark loop, measure the performance of the operation being tested.
- After the benchmark loop is complete, clean up all the resources created by the test.
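As one possible shape for a single benchmark, hedged as a sketch: the helper names and request fields below are assumptions, not code from this PR. The same setup/measure/cleanup split would apply to each Create/Get/Update/Delete benchmark.

```go
func BenchmarkGetApi(b *testing.B) {
    ctx := context.Background()
    client := setupClient(ctx, b)                 // assumed helper
    api := createBenchmarkApi(ctx, b, client)     // setup, not measured
    defer deleteBenchmarkApi(ctx, b, client, api) // cleanup, always runs

    b.ResetTimer() // exclude setup from the measurement
    for i := 0; i < b.N; i++ {
        req := &rpc.GetApiRequest{Name: api.GetName()}
        if _, err := client.GetApi(ctx, req); err != nil {
            b.Fatalf("GetApi returned unexpected error: %s", err)
        }
    }
}
```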
 
        
          
tests/benchmark/crud_test.go (Outdated)

```go
flag.IntVar(&apiNameStartOffset, "api_offset", 0, "Offset the name of the API with this value")
flag.StringVar(&specPath, "spec_path", filepath.Join("testdata", "openapi.yaml"), "Absolute path to the specification file to use")
flag.StringVar(&mimeType, "mime_type", "application/x.openapi;version=3.0.0", "MIME type of the spec file")
flag.Int64Var(&pauseDuration, "pause_duration", 5, "Number of seconds to pause after every 1000 runs")
```
Can this be pause_duration_ms instead? Also, let's default to zero instead. We should only add pauses when users intentionally want to remedy an issue.
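As a sketch, the flag could take a shape like the following, in milliseconds and off by default; the variable name pauseDurationMs is illustrative.

```go
flag.Int64Var(&pauseDurationMs, "pause_duration_ms", 0,
    "Number of milliseconds to pause (0 disables pausing)")
```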
        
          
tests/benchmark/crud_test.go (Outdated)

```go
}

func getRootResource(b *testing.B) string {
    b.Helper()
```
The Helper() function is only needed if the functions fail, error, or log. It just makes sure any output uses the line number where the helper function is called, instead of the line number inside the helper where the fail/error/log occurs.
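A small illustration with an assumed helper: because of the b.Helper() call, a failure here is reported at the benchmark line that called mustAtoi rather than at the Fatalf inside it.

```go
func mustAtoi(b *testing.B, s string) int {
    b.Helper()
    n, err := strconv.Atoi(s)
    if err != nil {
        b.Fatalf("expected an integer, got %q: %s", s, err)
    }
    return n
}
```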
        
          
tests/benchmark/crud_test.go (Outdated)

```go
func pauseAfter1000Iterations(i int, b *testing.B) {
    b.Helper()
    if i%1000 == 0 {
        /*
            Pause 5 seconds every 1000 iterations to reduce chances of failure on
            CREATE, UPDATE, DELETE
        */
        time.Sleep(time.Duration(pauseDuration * time.Second.Nanoseconds()))
    }
}
```
Can we replace this with a pause between every iteration, measured in milliseconds? I would like this to be as uniform as possible, even if the clock isn't on while we do the pauses.
Why is 5000ms after 1000 iterations the magic length of pause that works, as opposed to 5ms after every iteration?
These pauses make me uncomfortable because they suggest that we're accepting and working around some odd performance that we don't fully understand (and might need to fix).
        
          
tests/benchmark/crud_test.go (Outdated)

```go
flag.StringVar(&apiNamePrefix, "api_prefix", "test", "Prefix to use for the APIs")
flag.IntVar(&apiNameStartOffset, "api_offset", 0, "Offset the name of the API with this value")
flag.StringVar(&specPath, "spec_path", filepath.Join("testdata", "openapi.yaml"), "Absolute path to the specification file to use")
flag.StringVar(&mimeType, "mime_type", "application/x.openapi;version=3.0.0", "MIME type of the spec file")
```
Are we planning on benchmarking performance for different MIME types? I don't understand why this is part of the interface. Also, why OpenAPI v3 as a default? We don't know what type of spec the user will provide.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
flag.IntVar(&versionCount, "version_count", 1, "Number of versions of the API to create (Each version will have 1 spec)")
flag.StringVar(&apiNamePrefix, "api_prefix", "test", "Prefix to use for the APIs")
flag.IntVar(&apiNameStartOffset, "api_offset", 0, "Offset the name of the API with this value")
flag.StringVar(&specPath, "spec_path", filepath.Join("testdata", "openapi.yaml"), "Absolute path to the specification file to use")
```
Can we default to an empty spec? I think our default behavior should be to measure the most basic type of request. Uploading a spec adds more noise from network latency.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
req := &rpc.GetApiRequest{
    Name: fmt.Sprintf("%s/apis/%s", rootResource, apiId),
}
api, err := client.GetApi(ctx, req)
```
Why are we including this in the measurement? This test is only supposed to benchmark the UpdateApi method.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
}
api, err := client.GetApi(ctx, req)
if err != nil {
    b.Errorf("BenchmarkUpdateApi:GetApi(%+v) returned unexpected error: %s", req, err)
```
When errors are printed they already include the function and line number. The message you write should only include information about the cause of the error, not the location.
        
          
tests/benchmark/crud_test.go (Outdated)

```go
} else {
    api.DisplayName = fmt.Sprintf("Updated %s", apiId)
    updateReq := &rpc.UpdateApiRequest{
        Api: api,
```
Why send a fully populated API message with a field mask that only updates a single field? This adds unnecessary network latency noise.
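A sketch of the narrower request, assuming UpdateApiRequest carries an update-mask field as the field-mask discussion above implies; only the field being changed is populated.

```go
// fieldmaskpb is google.golang.org/protobuf/types/known/fieldmaskpb
updateReq := &rpc.UpdateApiRequest{
    Api: &rpc.Api{
        Name:        fmt.Sprintf("%s/apis/%s", rootResource, apiId),
        DisplayName: fmt.Sprintf("Updated %s", apiId),
    },
    UpdateMask: &fieldmaskpb.FieldMask{Paths: []string{"display_name"}},
}
if _, err := client.UpdateApi(ctx, updateReq); err != nil {
    b.Errorf("UpdateApi returned unexpected error: %s", err)
}
```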
        
          
tests/benchmark/crud_test.go (Outdated)

```go
    PageSize:  int32(pageSize),
    PageToken: pageToken,
}
b.StartTimer()
```
The timer is already started on the first iteration. Please add a timer stop and reset before the loop to avoid measuring the client creation.
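A sketch of that adjustment for a List benchmark: do the unmeasured setup first, then reset the timer immediately before the loop. The helper and the exact request fields are assumptions for illustration.

```go
client, ctx := createTestClient(b) // assumed helper; client creation is not measured

b.ResetTimer()
for i := 0; i < b.N; i++ {
    req := &rpc.ListApisRequest{
        Parent:   rootResource,
        PageSize: int32(pageSize),
    }
    if _, err := client.ListApis(ctx, req); err != nil {
        b.Fatalf("ListApis returned unexpected error: %s", err)
    }
}
```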
        
          
tests/benchmark/crud_test.go (Outdated)

```go
    b.StartTimer()
}
_, err := client.DeleteApiSpec(ctx, req)
// Only take reading for the first version
```
These comments aren't referring to the correct resource.
Codecov Report

```
@@           Coverage Diff           @@
##             main     #480   +/-   ##
=======================================
  Coverage   40.79%   40.79%
=======================================
  Files          68       68
  Lines        7956     7956
=======================================
  Hits         3246     3246
  Misses       4319     4319
  Partials      391      391
```

Continue to review full report at Codecov.
Hi, I was looking into this today to try to better understand how it got stuck, and I think we're probably trying to do too much in an initial PR. I'll go back and add some comments here and there, but I think it might be more productive to restart with something very minimal at first that we later expand to measure more things.
```go
func init() {
    flag.StringVar(&registryProject, "registry_project", "bench", "Name of the Registry project")
    rootResource = fmt.Sprintf("projects/%s/locations/global", registryProject)
```
This isn't doing what is probably intended -- because it is evaluated in the init() function, the registryProject variable still has its default value, so "rootResource" is always "projects/bench/locations/global".
I found that inside the actual benchmarking function, e.g. BenchmarkListApis, registryProject does hold the value set by the flag, so rootResource could be constructed there either directly or by calling a helper function that refers to the registryProject variable.
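A sketch of that fix: build the root resource lazily, after flag parsing, instead of in init().

```go
func rootResource() string {
    // registryProject holds the parsed flag value by the time any benchmark runs
    return fmt.Sprintf("projects/%s/locations/global", registryProject)
}
```

Each benchmark would then call rootResource() instead of reading a package-level string.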
```sh
go test --bench=Create ./tests/benchmark \
  --registry_project=$REGISTRY_PROJECT_IDENTIFIER \
  --version_count=$API_VERSIONS_COUNT \
  --benchtime=${ITERATION_SIZE}x --timeout=0
```
I was initially puzzled by this and subsequent invocations, wondering what was actually getting called for each value of the --bench argument. I found that it's treated as a regex that is used to match benchmark test function names, and since none of the test function names include "Create", "Update", etc., only the --bench=List invocation below calls an actual test function. That's probably because this was designed for an implementation that is coming, but at least for clarity in review, I think it would be good to have the README instructions match the implementation.
```go
b.Run(fmt.Sprintf("%s", operation), func(b *testing.B) {
    for i := 1; i <= b.N; i++ {
        apiId := getApiName(i)
        switch operation {
```
This is an interesting structure (almost like a state machine?) which I think might be intended to provide some flexibility that isn't being used yet (since operations above is a fixed array). If we wanted a way to repeatedly do a series of things in a standard wrapper, I'd suggest having an array of Go anonymous functions instead, but before doing that I would probably just write one instance in a simpler, possibly more tedious way and abstract from there.
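For illustration, the anonymous-function variant could look roughly like this; the operation names and bodies are placeholders, not code from this PR.

```go
operations := []struct {
    name string
    fn   func(b *testing.B, apiId string)
}{
    {"CreateApi", func(b *testing.B, apiId string) { /* call CreateApi here */ }},
    {"GetApi", func(b *testing.B, apiId string) { /* call GetApi here */ }},
}

for _, op := range operations {
    op := op // capture the loop variable for the closure (not needed in Go 1.22+)
    b.Run(op.name, func(b *testing.B) {
        for i := 1; i <= b.N; i++ {
            op.fn(b, getApiName(i))
        }
    })
}
```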
```go
    return nil, client, ctx
}

func CreateApi(apiId string, b *testing.B, record bool, client rpc.RegistryClient, ctx context.Context) *rpc.Api {
```
I didn't find any calls of this function where record is set to true. I tend to think that this is more granularity than necessary -- I would instead just make sure that measurement is off during initialization and teardown, which could be done with calls that start/stop the timer in the main test function. Why might we keep this?
See #631 which contains some initial benchmarks with smaller scope (and is based on these commits).

@giteshk feel free to reopen this if needed.
The benchmark tests can be used against any running registry service, whether it is running locally or hosted.
If you wish to create 5 versions for every API, set:
export REGISTRY_VERSION_COUNT=5