
Conversation

@adonovan (Contributor) commented Dec 10, 2025

Previously, the lexer ran as a separate goroutine that fed a channel with items. This change expresses the lexer as an iter.Seq[item], and then uses the iter.Pull mechanism to run it as a lightweight coroutine. This reduces the running time of the added benchmark by about 32%.

$ go test -benchtime=10s -bench=. ./pattern

before
BenchmarkParser-8    649060    19051 ns/op
after
BenchmarkParser-8    908734    12971 ns/op

Interestingly, the total time spent by gopls in Parse, measured by bracketing Parse with time.Now/time.Time.Sub calls, consistently drops by a factor of around 20 (down to about 800µs from the original 15ms). I can't yet explain this discrepancy.
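
For readers less familiar with the Go 1.23 iterator API, here is a minimal, self-contained sketch of the shape of the change, with illustrative names (the real item type and lexer in the pattern package are more involved): the lexer is written as a push-style iter.Seq[item], and iter.Pull converts it into the pull-style next/stop pair the parser consumes, replacing the goroutine-plus-channel handoff.

```go
package main

import (
	"fmt"
	"iter"
)

// item is a stand-in for the lexer's token type.
type item struct {
	typ string
	val string
}

// lex expresses the lexer as an iterator over items. The body only runs
// when the consumer asks for the next item; returning false from yield
// means the consumer stopped early.
func lex(input string) iter.Seq[item] {
	return func(yield func(item) bool) {
		for _, r := range input {
			if !yield(item{typ: "rune", val: string(r)}) {
				return
			}
		}
	}
}

func main() {
	// iter.Pull runs the iterator as a lightweight coroutine and hands
	// back a pull-style interface, much like the old channel receive.
	next, stop := iter.Pull(lex("ab"))
	defer stop()
	for {
		it, ok := next()
		if !ok {
			break
		}
		fmt.Println(it)
	}
}
```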

@dominikh (Owner) commented

Those gopls timings are quite worrying...

As for the change itself, maybe we should just have the lexer return a slice of tokens? Patterns are small, we only parse them at program startup, and the [cg]oroutines probably don't save enough garbage to justify the added complexity.
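
For concreteness, the slice-returning alternative might look roughly like this, reusing the illustrative item type from the sketch above (not the actual pattern package code):

```go
// lexAll lexes the whole input eagerly and returns the tokens as a slice;
// the parser then just walks the slice with an index instead of pulling
// from a coroutine.
func lexAll(input string) []item {
	var items []item
	for _, r := range input {
		items = append(items, item{typ: "rune", val: string(r)})
	}
	return items
}
```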

@adonovan (Contributor, Author) commented

> Those gopls timings are quite worrying...

> As for the change itself, maybe we should just have the lexer return a slice of tokens? Patterns are small, we only parse them at program startup,

That's a possibility. Especially if the tokens themselves could be made smaller by not including the string, only its start/end offsets into the content string shared by lexer and parser, something like:

type item struct {
	typ        uint8
	start, end int32
}

> and the [cg]oroutines probably don't save enough garbage to justify the added complexity.

I felt the coroutines actually reduced complexity (a small amount) by recasting the lexer as just the implementation detail of an iterator; and the parser's nextToken function exactly matches what iter.Pull gives us.
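
Concretely, and still using the illustrative lex and item from the earlier sketch rather than the real parser code, the correspondence is roughly:

```go
// parser holds the pull-style interface returned by iter.Pull.
type parser struct {
	next func() (item, bool) // pull the next token from the lexer coroutine
	stop func()              // release the coroutine when parsing is done
}

func newParser(input string) *parser {
	next, stop := iter.Pull(lex(input))
	return &parser{next: next, stop: stop}
}

// nextToken is just the pull function under another name.
func (p *parser) nextToken() (item, bool) {
	return p.next()
}
```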
