Grackle is a project to automate the marshaling and unmarshaling code for verified go programs using
the perennial framework. Effectively, it aims to build on top of the existing marshaling library by
creating a compiler capable of taking a subset of the Protobuf specification (.proto file) and
emitting the Go code which can perform the marshaling and unmarshaling of the described messages and
a rocq proof which enables reasoning about the marshaling process in other verified projects.
- Usage
- Why “Grackle”
- Compatibility with =protoc=
- Supported Protobuf Features
- Grackle Wire Format
- Generated Code Guide
- Technical Considerations
- Development
Grackle is capable of generating three outputs for an input directory of proto files:
- Go code which marshals and unmarshals the messages.
- Goose output, a representation of the generated go code in goose lang and some low-level
generated rocq projects from the
proofgentool. - Rocq proofs using perennial that the generated go code is correct.
Grackle can generate just the go code, just the rocq proofs, go and goose code, rocq and go or all three. Basically, the go and rocq proof generation can happen separately, but goose code can only be generated when go code is.
Usage: grackle [options] <path to protobuf files>
-debug
Output all generated code to stdout
-go-output-path string
Physical path to output go code into
-go-package string
Fully qualified root package for the output packages
-goose-output string
Directory to write the goose output
-proofgen-path string
Physical output path for proofgen files
-rocq-logical-path string
Logical path to import the marshal proofs from
-rocq-physical-path string
Physical output path for rocq proofs
To use grackle, the tool needs at least three pieces of information, a directory
with the .proto files and either
- the two
rocqflags - the two
goflags
The other flags are optional. A full invocation of grackle would be
grackle -rocq-logical-path Grackle.test -rocq-physical-path testdata/out/rocq/ -proofgen-path testdata/out/pg -go-output-path testdata/out/go -go-package github.com/mjschwenne/grackle/testdata/out/go -goose-output testdata/out/goose/ testdata/proto/calendar
A more detailed description of the flags and ideas is given below
rocq-logical-path: The Rocq import systems works by mapping physical directories on disk into logical directories which are referenced in the proofs to perform the imports. This flag provides the logical path where the output rocq proofs will be written. For any message with nested messages, this is the logical path where the proofs of the other messages will be imported from. The proofs will always import the correspondinggooseoutput from a logical directory namedNew.code. This is a standard assumption for perennial and related projects with new goose.rocq-physical-path: This is where on disk to write the rocq proofs. There is no strict relation between the rocq physical path and the corresponding rocq logical path. The two should be linked via a-Qflag in the_RocqProjectfile. Grackle will create this directory if it doesn’t exist.debug: Causes the generated go and rocq code to be written to standard output. The format for this output is--- Begin: < file output path > --- < file contents > --- End: < file output path > --- --- Begin: < next file output path > --- < next file contents > --- End: < next file output path > ---go-output-path: The path on disk to write the go output from grackle. As explained later, each message in the proto files of the input directories will get it’s own go package, so grackle will create a nested directory within the provided one.go-package: The full path name for the go package that would correspond to thego-output-path. Unlike the rocq logical and physical paths, there is an expected (but not required) correspondence between the package name and the output path. For projected hosted on GitHub, go convention expects the package name to start withgithub.com/<user name>/<repo name>, for example importing something from grackle as a library would start withgithub.com/mjschwenne/grackle. From this base, go convention expects the package name to be basically the relative path to the repository root to the directory the package lives in,go-output-path. As with the rocq logical path, the generated go code will import from the go code for any message containing enumerations or nested messages from this package.goose-output: If this flag is set, callgooseprogrammatically on the outputgofiles, writing the output to this directory. The output will follow the standardgoosepractice of writing the output into subdirectories that follow the full go package name.proofgen-path: If this flag is set, and thegooseoutput flag is set, call theproofgentool programmatically on the outputgofiles, writing the output to this directory. The output will follow the standardproofgenpractice of writing the output into subdirectories that mirror the full go package name.
A grackle is a bird capable of mimicking other birds and even human speech if it wants to, which seems approate for a tool focused on allowing verified distributed systems to communicate. It’s also a bird whose name starts with ‘g’ to complement goose, a project which converts a subset of Go to Rocq and is used as part of grackle.
Grackle is not compatible with protoc. Neither the wire format grackle uses nor the generated go API
is compatible. Some major differences are highlighted below.
- The
int32andint64fields are treated as unsigned, rather than signed, since support for signed integers in Perennial is still developing. - The wire format is markedly different, see Grackle Wire Format for more information. Probably the most notable two differences is that the grackle wire format does not use variable width integer encodings and fields are not tagged with the field number, but rather serialized in order as they appear in the proto file.
- Go code generation is structured differently. The
protoccompiler outputs one file per proto file while grackle outputs one go package per message (and enum). - Recursive messages, naturally including mutually recursive messages, are not supported by grackle
despite being supported by
protoc. I believegocode generation may work, but the output rocq proofs certainly won’t compile. - Grackle appends a
_gksuffix to generated go packages and proof file names. On the Rocq side, output modules use a_proof_gksuffix.
The proto language is used by Google’s protocol buffers to describe the layout of binary messages in a language agnostic manner (which is a bit ironic since proto is itself a language). Below is a list of supported and unsupported features as well as any commentary on them.
- Messages, including nested messages. (Mutually) recursive use of messages is
not supported, regardless of if it is supported by
protoc. - Integer fields, although all of the integers are mapped to unsigned integers in both go and
rocq. This is mostly due to limited support for signed integers within perennial. Should support
improve, support for signed integers may be added.
int32uint32fixed32int64uint64fixed64
- Boolean fields,
bool. - String fields,
string. - Byte fields,
bytes. - Enums, defined with
enum <name> { ... } - Repeated fields of scalar types, message types and enum types.
- Imports between proto files in the input proto directory or a subdirectory of the input proto directory.
- Maps. After slices, map support is important and used in several places in the perennial ecosystem.
- Signed integer fields
sint32,sfixed32,sint64andsfixed64. This is mostly due to limited support for signed integers within perennial. According to the proto language, regularint32andint64should also be signed even though grackle treats them as unsigned. It is unclear at this point if support for signed integers would change the interpretation of these fields, which would be a breaking change. - Recursive messages. This issue seems particularly challenging on the rocq side, although go code generation is unlikely to be significant issue.
- Nested messages declaration. Referencing other top-level messages for a field
is supported. However, protobuf supports declaring nested messages within a
message, like this example from the protobuf documentation:
message SearchResponse { message Result { string url = 1; string title = 2; repeated string snippets = 3; } repeated Result results = 1; }
anyandoneoflabels. These labels operate as a wildcard type and sum type. They are ignored by grackle and unsupported. Moreover, support for these labels is not planned.
The grackle wire format is must simpler than the protobuf wire format. The two most notable differences are:
- Grackle does not use a variable width encoding for integers. All integers are fixed width and embedded in the serialized message in little-endian byte order.
- Grackle fields are not tagged with the field number defined in the proto file. Fields are serialized and deserialized in the order they appear in the proto file.
As an example, consider the following event message. The complete proto file for
this message can be found here.
message Event {
int32 id = 1;
string name = 4;
TimeStamp startTime = 2;
TimeStamp endTime = 3;
}The wire format for the message would start with the four bytes needed for id,
then the length of the name string as a 64 bit number followed by the bytes
composing the string itself. The null byte is not included since casting a
string to a byte slice ([]byte) does not include the null byte. Finally, the
marshaled message includes the 12 bytes needed to marshal a TimeStamp. In the
below diagram, the number after each field is the length of the field. Note that
the nested structures don’t have to be of a fixed size, but are in this example.
[ | | | | ] [ id (4) | length(name) (8) | name (variable) | startTime (12) | endTime (12) ] [ | | | | ]
Below is an overview of what type of code is generated by grackle. Full examples can be found in the testdata/out subdirectories.
The generated code guide will use the calendar example, which features the Event
message from above and a TimeStamp message containing three integers for the
hour, minute and second. All example output snippets will reference the Event
message since it’s a bit more interesting. Note that the examples in this README
may not be kept up to date, but the files in testdata/out will be.
For this example, grackle might be invoked as
grackle -rocq-logical-path Grackle.test -rocq-physical-path testdata/out/rocq/ -proofgen-path testdata/out/pg -go-output-path testdata/out/go -go-package github.com/mjschwenne/grackle/testdata/out/go -goose-output testdata/out/goose/ testdata/proto/calendar
Which would generate these files (which can be accessed in the testdata/out directory of this repository, along with the other test files):
testdata/out/
├── go
│ ├── calendar_gk
│ │ └── calendar_gk.go
│ ├── event_gk
│ │ └── event_gk.go
│ └── timestamp_gk
│ └── timestamp_gk.go
├── goose
│ └── github_com
│ └── mjschwenne
│ └── grackle
│ └── testdata
│ └── out
│ └── go
│ ├── calendar_gk.v
│ ├── event_gk.v
│ └── timestamp_gk.v
├── pg
│ └── github_com
│ └── mjschwenne
│ └── grackle
│ └── testdata
│ └── out
│ └── go
│ ├── calendar_gk.v
│ ├── event_gk.v
│ └── timestamp_gk.v
└── rocq
├── calendar_proof_gk.v
├── event_proof_gk.v
└── timestamp_proof_gk.v
For a top-level message, grackle will output:
- A go package with the name of the message, all lowercase, and the
_gksuffix, such asevent_gk. Inside the package is one file with the same name. - That file contains
- A struct definition
S. Since go access patterns always use the package name, every message will haveSas the struct name. It can be accessed from outside the package as, e.g.event_gk.Swhich will differentiate between separate messages. - A
Marshalfunction which takes anSand a byte sliceprefixand returnsprefixwith the encoding ofSappended to it. - An
Unmarshalfunction which takes a byte slice and returns anSstruct and the unusedsuffix.
- A struct definition
The use of prefix and suffix enable a stateless marshaling design which is easier to reason about
and more compositional. More information can be found in the Technical Considerations section.
Here is an example of the generated struct definition for the Event message:
type S struct {
Id uint32
Name string
StartTime timestamp_gk.S
EndTime timestamp_gk.S
}Now we have the Marshal function. This function encodes in the input event by serializing each field
in the order declared in the source proto file. The primitive.AssumeNoStringOverflow function comes
from the goose lang primitive library, which allows us to inject the assumption that the string is
less than 2^64 bytes long in the rocq proof.
func Marshal(enc []byte, e S) []byte {
enc = marshal.WriteInt32(enc, e.Id)
primitive.AssumeNoStringOverflow(e.Name)
enc = marshal.WriteLenPrefixedBytes(enc, []byte(e.Name))
enc = timestamp_gk.Marshal(enc, e.StartTime)
enc = timestamp_gk.Marshal(enc, e.EndTime)
return enc
}Finally we have the Unmarshal function. This function iterates through the fields of the struct,
deserializing them one at a time. In the function conclusion the values read from the serialized
byte slice are packed into a struct and returned, along with the remaining bytes in the input byte
slice. Notice that strings are first read as bytes, then converted back to strings. Both bytes and
strings need to be cloned during the unmarshaling process for proof reasons, basically so that full
ownership of the new structure can be returned in the case that the input byte slice has fractional ownership.
func Unmarshal(s []byte) (S, []byte) {
id, s := marshal.ReadInt32(s)
nameBytes, s := marshal.ReadLenPrefixedBytes(s)
name := string(std.BytesClone(nameBytes))
startTime, s := timestamp_gk.Unmarshal(s)
endTime, s := timestamp_gk.Unmarshal(s)
return S{
Id: id,
Name: name,
StartTime: startTime,
EndTime: endTime,
}, s
}The go code generation is the simple part, while the rocq code proof generation
is much more complex.
For a top-level message, grackle will generate a proof file with the lower-cased
message name plus the _proof_gk suffix. That file will contain:
- A rocq type
Ccorresponding to the go struct and the proto message. This could either be a rocq record or an alias to the goose type directly. See the <a href=”#simple-message-optimization “>Simple Message Optimization for more details. - A definition
has_encodingof what it means for a byte slice (or list of bytes in rocq) to encode the record. - A separation logic proposition
ownabout what it means to own a struct. - An encoding lemma
wp_Encodewhich describes the behavior of calling theMarhsalfunction, as well as a proof of correctness. - A decoding lemma
wp_Decodewhich describes the behavior of calling theUnmarshalfunction, as well as a proof of correctness.
Just like with the Go code, grackle starts by outputing a Rocq record. Since the running example,
Event, is a simple message, this definition is an alias for the goose output struct.
Definition C := event_gk.S.t.For a more complex example, like calendar_gk, the generated struct will closely match the go struct
S.
Record C :=
mkC {
hash' : list u8;
events' : list Event_gk.C;
}.Next grackle produces a has_encoding definition which relates the record C to the wire format. In
cases where the field is an integer or similar stand-alone field, it is directly converted into a
sequence of bytes and appended together. Nested messages are a bit different though. The definition
is designed to be compositional, so Event.has_encoding asserts that there exists some sequence of
bytes which faithfully encodes the startTime and endTime time stamps using the
TimeStamp.has_encoding definition.
Definition has_encoding (encoded:list u8) (args:C) : Prop :=
∃ (startTime_enc endTime_enc : list u8),
encoded = (u32_le args.(event_gk.S.Id')) ++
(u64_le $ length $ args.(event_gk.S.Name')) ++ args.(event_gk.S.Name') ++
startTime_enc ++
endTime_enc
/\ length args.(event_gk.S.Name') < 2^64
/\ TimeStamp_gk.has_encoding startTime_enc args.(event_gk.S.StartTime')
/\ TimeStamp_gk.has_encoding endTime_enc args.(event_gk.S.EndTime').Some extra flexibility here is that grackle can allow the TimeStamp encoding to change without
requiring regeneration of the Event code, although a recompile will be required according to rocq
consistency requirements.
Since lists in Rocq can be of arbitrary length, we require that the length of lists is less than 2^64 in the encoding function so that the length can be correctly encoded into a 64 bit integer.
After has_encoding, which related the record to the wire format, we need a way
to represent owning an instance of the record, which will be used in the pre and
post conditions for the serialization lemmas.
Definition own (args__v: event_gk.S.t) (args__c: C) (dq: dfrac) : iProp Σ :=
"%Hown_id" ∷ ⌜ args__v.(event_gk.S.Id') = args__c.(event_gk.S.Id') ⌝ ∗
"%Hown_name" ∷ ⌜ args__v.(event_gk.S.Name') = args__c.(event_gk.S.Name') ⌝ ∗
"%Hown_name_len" ∷ ⌜ length args__c.(event_gk.S.Name') < 2^64 ⌝ ∗
"Hown_startTime" ∷ TimeStamp_gk.own args__v.(event_gk.S.StartTime') args__c.(event_gk.S.StartTime') dq ∗
"Hown_endTime" ∷ TimeStamp_gk.own args__v.(event_gk.S.EndTime') args__c.(event_gk.S.EndTime') dq.The own proposition relates the rocq record to struct representation in goose lang. New goose structs
have fields which the rocq proofs can access. Like with has_encoding, most of the simple fields like
integers can be embedded directly. Since owning a nested struct may entail using some other piece of
state (if there is a slice in the nested message, for instance), grackle also needs to assert
ownership over that, which is accomplished by adding a call to the nested own predicate.
The above rocq definitions exist to enable grackle to actually verify the Marshal and Unmarshal
function generated on the Go side. The wp_Encode lemma describes the behavior of the Marshal
function.
Lemma wp_Encode (args__t : event_gk.S.t) (args__c : C) (pre_sl : slice.t) (prefix : list u8) (dq : dfrac):
{{{
is_pkg_init event_gk ∗
own args__t args__c dq ∗
own_slice pre_sl (DfracOwn 1) prefix ∗
own_slice_cap w8 pre_sl (DfracOwn 1)
}}}
@! event_gk.Marshal #pre_sl #args__t
{{{
enc enc_sl, RET #enc_sl;
⌜ has_encoding enc args__c ⌝ ∗
own args__t args__c dq ∗
own_slice enc_sl (DfracOwn 1) (prefix ++ enc) ∗
own_slice_cap w8 enc_sl (DfracOwn 1)
}}}.
Proof.
(* Generated, but omitted from this README *)
Qed.The encoding lemma requires some level of ownership over an instance of the generated message, using
args__t as the goose lang value and args__c as the rocq level representation (For a simple message,
these might be the same). Here, dq quantifies
the discardable fractional ownership over any heap values in the own definition. Since only read
access is needed to marshal, the value of dq is unbounded with respect to a dfrac. The Marshal
function also needs ownership of the prefix slice that the serialized args__t will be appended
to. Since Marshal writes to this slice, full ownership (DfracOwn 1) and access to the slice’s
capacity is required.
After marshaling, ownership of args__t is returned to the caller (the second piece of the post
condition, own args__t args__c dq=) at the same level of ownership =Marshal was called
with. Additionally, the caller learns that the returned slice encodes prefix ++ enc (the third part
of the postcondition), with enc being some list of bytes which correctly has an encoding for args__c
(using has_encoding, the first part of the post condition).
The mirror image of wp_Encode is naturally wp_Decode, which specifies the
behavior of calling Unmarshal.
Lemma wp_Decode (enc : list u8) (enc_sl : slice.t) (args__c : C) (suffix : list u8) (dq : dfrac):
{{{
is_pkg_init event_gk ∗
⌜ has_encoding enc args__c ⌝ ∗
own_slice enc_sl dq (enc ++ suffix)
}}}
@! event_gk.Unmarshal #enc_sl
{{{
args__t suff_sl, RET (#args__t, #suff_sl);
own args__t args__c (DfracOwn 1) ∗
own_slice suff_sl dq suffix
}}}.
Proof.
(* Generated, but omitted from this README *)
Qed.The conditions for this Texan triple are basically the inverse of wp_Encode. Calling Unmarshal
requires some level of access to a byte slice (the serialized struct with a potentially empty
suffix) and the pure fact that the beginning of the byte slice encodes some rocq level args__c.
After Unmarshal returns, the caller learns that they have full ownership of the newly created struct
and ownership at the same level of just the suffix slice. Note that dq is only bound by being a
valid discardable fractional permission. Removing enc from the beginning of the slice isn’t a write
(which would require full ownership), but rather returns a new pointer to an existing location in
the same potentially read only slice used to call Unmarshal.
Protobuf enums are largely treated as a different type of protobuf message, following many of Grackle’s conventions for working with messages, including:
- Outputting one go package per enum.
- Generating a simple
MarshalandUnmarshalfunction for each enum. - Outputting one rocq proof file per enum.
This section will use the running example of the error enum defined in testdata/proto/calendar/calendar.proto:
enum error {
eOk = 0;
eEndOfFile = 1;
eUnknown = 2;
}Recall that protobuf requires enums have an element with ID number 0, which is treated as the default value of the enum.
type E uint32
const (
EOk E = 0
EEndOfFile E = 1
EUnknown E = 2
)Grackle models the type of an enum as a named uint32. Like the structs output from protobuf
messages, enums are always named E for enum, and are expected to be referenced in other programs as
error_gk.E for this example.
A named type is different from a type alias, with the primary difference being that an E cannot be
assigned to a uint32, although it still can be cast to one. See Open and Closed Enums for more details.
func Marshal(enc []byte, e E) []byte {
return marshal.WriteInt32(enc, uint32(e))
}The Marshal is extremely simple, basically just casts the enum value to a uint32 and marshals the
underlying integer. This function needs to exist to support repeated protobuf fields, modeled in go
as []error_gk.E.
func Unmarshal(s []byte) (E, []byte) {
e_raw, s := marshal.ReadInt32(s)
return E(e_raw), s
}Likewise, the Unmarshal function reads one 32-bit integer and returns the cast results. While the go
code makes no provision for unmarshaling a value not defined in the proto file, the rocq proof for
these functions will require knowing that the encoded value is one of the options defined in the
proto file.
var Name = map[uint32]string{
0: "eOk",
1: "eEndOfFile",
2: "eUnknown",
}
var Value = map[string]uint32{
"eOk": 0,
"eEndOfFile": 1,
"eUnknown": 2,
}These global maps are defined as a convenience to the go programming, mapping the underlying uint32
value to the name of the enum option defined in the proto file and via versa. This enables enum
values to be round tripped through a string if desired.
func (e E) String() string {
return Name[uint32(e)]
}This is the standard go String function, which is used by (for example) the %v format specifier to
print the enum value. This could be useful for debugging or logging purposes in a verified go
project.
The rocq proof will be generated in error_proof_gk.v.
Unlike the packages generated from a protobuf message, the go code generated for an enum includes global variables. In perennial, this means that we have prove during the initialization process that the map variables are initialized with the correct value. The package initialization predicate will expose the fixed values in the go code using a discarded fraction, so verified code cannot modify these maps.
Definition name_map_def : gmap w32 go_string := list_to_map [
((W32 0), "eOk"%go);
((W32 1), "eEndOfFile"%go);
((W32 2), "eUnknown"%go)
].
Definition value_map_def : gmap go_string w32 := list_to_map [
("eOk"%go, (W32 0));
("eEndOfFile"%go, (W32 1));
("eUnknown"%go, (W32 2))
].
Definition is_initialized : iProp Σ :=
∃ name_map value_map,
"HglobalName" ∷ (global_addr error_gk.Name) ↦□ name_map ∗
"Hname" ∷ name_map ↦$□ name_map_def ∗
"HglobalValue" ∷ (global_addr error_gk.Value) ↦□ value_map ∗
"Hvalue" ∷ value_map ↦$□ value_map_def.
#[global] Instance : IsPkgInit error_gk := define_is_pkg_init is_initialized.
#[global] Instance : GetIsPkgInitWf error_gk := build_get_is_pkg_init_wf.
Lemma wp_initialize' get_is_pkg_init :
get_is_pkg_init_prop error_gk get_is_pkg_init ->
{{{ own_initializing get_is_pkg_init ∗ is_go_context ∗ □ is_pkg_defined error_gk }}}
error_gk.initialize' #()
{{{ RET #(); own_initializing get_is_pkg_init ∗ is_pkg_init error_gk }}}.
Proof.
(* Generated, but omitted from this README *)
Qed.While the go code models the enum as technically open (meaning that you can define new members of
the enum’s type outside of the definition, i.e. with error_gk.E(9)), rocq has excellent support for
inductive data types which naturally map to closed enums (meaning it is not possible to define new
values of the type outside of the original declaration).
See Open and Closed Enums for more information.
Inductive I :=
| eOk
| eEndOfFile
| eUnknown.
Definition to_tag i : w32 :=
match i with
| eOk => W32 0
| eEndOfFile => W32 1
| eUnknown => W32 2
end.Rocq users can take advantage of the inductive type error_gk.I using basic rocq tactics like
destruct.
Definition has_encoding (encoded : list u8) (args : I) : Prop :=
encoded = u32_le $ to_tag $ args.
Definition own (args__v : error_gk.E.t) (args__c : I) (dq : dfrac) : iProp Σ :=
"%Herror_eq" ∷ ⌜ args__v = to_tag args__c ⌝.The definitions for has_encoding and own output by Grackle follow all the rules for protobuf
messages, but can be simplified by knowing an enum is just a single 32 bit unsigned integer. Since
enums are modeled as an inductive type, they are by definition not simple.
Lemma wp_Encode (args__t : error_gk.E.t) (args__c : I) (pre_sl : slice.t) (prefix : list u8) (dq : dfrac):
{{{
is_pkg_init error_gk ∗
own args__t args__c dq ∗
own_slice pre_sl (DfracOwn 1) prefix ∗
own_slice_cap w8 pre_sl (DfracOwn 1)
}}}
@! error_gk.Marshal #pre_sl #args__t
{{{
enc enc_sl, RET #enc_sl;
⌜ has_encoding enc args__c ⌝ ∗
own args__t args__c dq ∗
own_slice enc_sl (DfracOwn 1) (prefix ++ enc) ∗
own_slice_cap w8 enc_sl (DfracOwn 1)
}}}.
Proof.
(* Generated, but omitted from this README *)
Qed.
Lemma wp_Decode (enc : list u8) (enc_sl : slice.t) (args__c : I) (suffix : list u8) (dq : dfrac):
{{{
is_pkg_init error_gk ∗
own_slice enc_sl dq (enc ++ suffix) ∗
⌜ has_encoding enc args__c ⌝
}}}
@! error_gk.Unmarshal #enc_sl
{{{
args__t suff_sl, RET (#args__t, #suff_sl);
own args__t args__c (DfracOwn 1) ∗
own_slice suff_sl dq suffix
}}}.
Proof.
(* Generated, but omitted from this README *)
Qed.These lemma definitions exactly match the ones generated for protobuf messages, which is intentional, so that enums can be marshaled and unmarhaled in the same style as messages.
There are several technical considerations for a piece of software like grackle, relating to API design, proof generation and the underlying tool-chain. While the below discussion is non-exhaustive, they are some points I have additional comments on.
Grackle builds on top of the existing marshal library for perennial, which provides a set of
“primitives” for reading integers, byte slices and other go types. This library provides two APIs, a
stateful and stateless one. The stateful version requires the programmer to create an object which
internal tracks pervious write calls before expelling a binary blob when marshaling is complete (not
unlike a java StringBuilder). The nice feature about this design is that it removes the requirement
that the programmer track the prefix / suffix slices.
Grackle uses the stateless API which always return the prefix / suffix slices and adopts that style
itself. The reason for this is mostly that this style is easier to reason about within perennial and
enables stronger compositionality between messages. While grackle is technically a simple compiler,
it doesn’t employ standard compiler techniques like generating then transforming an abstract syntax
tree, instead using go’s template package (see go templates for more information). It’s ability to
track complex and contextual information like managing state is limited, particularly on the rocq
side. Ultimately the stateless design is a way to reduce complexity for an already complex piece of
software.
There are two ways to define the behavior of an enumeration, as discussed in the protobuf documentation: open enums and closed enums.
- An open enum (or really open inductive) allows you to define new members of the type ad-hoc. So if
a
2was parsed into an enum with only defined protobuf values for0and1, some representation of2would be passed into a program which may not be able to handle this value. - A closed enum which encounters an unknown value during a parse would report the default value for the enum, refusing to allow an unknown value in the enum.
The protobuf specification states that proto3, the version targeted by Grackle, should treat enums
as open. Their motivation is to avoid unknown enum values from being dropped or reordered by brokers
in the middle of a protobuf application. However, there are stronger reasoning benefits in rocq by
modeling the enum as an Inductive type. Thus, Grackle breaks from the protobuf spec to model an enum
as an Inductive, which you saw under rocq enums in the code generation guide.
Support for enums in Grackle came with global variable maps, inspired by the actual protobuf enum
API. However, support in perennial for map literals was limited. I worked on improving this area of
perennial. Goose would already output a map literal as a list of key-value pairs. However, it was
difficult to convert from a pair of goose lang value to a goose lang value that is a tuple. I
experimented with adding this rule to tactics like wp_pures, but that proved too aggressive and
broke some other perennial features. Rather, it proved beneficial to have the tuple IntoVal instance
but not include it in the default automation. Then it can be manually applied when needed to reduce
a list of pairs of values to a list of pair values. I also contributed wp_map_literal, which takes a
list of the new map.kv_entry goose lang constructs and returns a goose lang map containing that set
of key-value pairs.
This work was implemented in perennial #323.
Since new goose structs now have accessible fields, at first it seemed possible to have Grackle for
new goose not use custom rocq records, like the old goose version had. Early manual examples for
things like the TimeStamp and Event protobuf messages were even written which only used the goose
generated representation of timestamp_gk.S and event_gk.S. However, as we pushed for more features
in Grackle, it became clear that custom records were going to be required. The driving example here
was for slices. Representing a slice in perennial requires connecting a slice.t to some rocq level
list of objects of the type of the slice. Thus, we needed a way to track and move around this list,
which led to the re-introduction of the custom records and the own predicate.
However, since it was possible to write some Grackle proofs without a custom record type, I wanted
to preserve the simplicity when possible, since this would be easier for Grackle users when own can
just take the same parameter twice.
Formally, a message is simple if:
- It doesn’t contain an enum field
- It doesn’t contain a repeated field of any type.
- It doesn’t contain a field with protobuf type
bytes. - It doesn’t contain a message which isn’t simple.
The last requirement proved to be the tricky one, since checking if a message is simple now requires checking if all the nested messages are simple. Implementing this required in implicit topological sort of the messages in a protobuf file to determine the dependencies among messages. Also note that protobuf itself is fine with recursive or mutually recursive message definitions, which need to be handled to prevent Grackle from entering an infinite loop. Thus when Grackle detects such a loop (simply speaking by having a mechanism to check if Grackle currently in the process of determining weather the message it’s about to check is simple), it assumes that the messages are not simple. It is always safe to assume a message is not simple when it is, but assuming that a message is simple when it’s not will lead to failing proofs.
The body of the main wp_Encode and wp_Decode lemmas aren’t given in this README, but there are some
interesting things I have to say about their structure. These proofs can be found in the
event_proof.v file.
As first described during code generation, the proofs for these important lemmas can be thought of as a proof setup, followed by a series of proof snippets, one for each field in the proto message and then a conclusion. Visually, that might look something like this:
Proof. +----------------------------------------------------------------------------+ | | | Proof Setup | | | +----------------------------------------------------------------------------+ +----------------------------------------------------------------------------+ | | | Field Snippets | | | | +------------------------------------------------------------------------+ | | | | | | | First Field | | | | | | | +------------------------------------------------------------------------+ | | | | +------------------------------------------------------------------------+ | | | | | | | Second Field | | | | | | | +------------------------------------------------------------------------+ | | | | . | | . | | . | | | | +------------------------------------------------------------------------+ | | | | | | | Last Field | | | | | | | +------------------------------------------------------------------------+ | | | +----------------------------------------------------------------------------+ +----------------------------------------------------------------------------+ | | | Conclusion | | | +----------------------------------------------------------------------------+ Qed.
At first pass, it seems difficult to find proof snippets which can be arbitrarily ordered and
instantiated. Fortunately, we can effectively break the task of marshaling or unmarshaling a struct
down to operating on the head or end of a slice at a field level. Each snippet requires the same set
of hypotheses, such as Hsl which tracks the prefix / suffix slices. Thus, it’s the job of the proof
setup to create the initial hypotheses needed to process each field. Since perennial employs an
affine logic, each snippet then has to reintroduce the next version of any common hypotheses it
consumes. The snippets also leak information about each field further down the chain of fields until
the proof conclusion is reached. The conclusion’s role is to show that the post condition as been
satisfied, and to do this it assumes that all of the needed hypotheses have been created with a
well-known name by the corresponding snippet.
This type of proof design incentivizes compact, highly local proof snippets which ignore everything
that isn’t related to the field at hand. However in the conclusion, grackle needs to tie everything
together. Consider the conclusion from the calendar_gk wp_Encode proof:
unfold has_encoding.
split; last done.
exists events_enc.
split; first repeat (f_equal; try word).
all: try done.
rewrite <- Hown_events_sz'.
done. This generated snippet needs to know that events_enc and Hown_events_sz exist,
since those are variables and propositions generated over the course of the proof. The existence of
these non-local proof constructs mandates the usage of well-know names for everything that needs to
referenced later. Maintaining these types of structures are required but more challenging than
individual proof snippets. I like to believe that this requirement also enables grackle proofs to be
highly readable and understandable by the proof engineer.
Grackle makes extensive use of the text/template templating package in the go standard library. The
templates themselves can be found in internal/templates. Note that grackle uses << and >> as
template delimiters to avoid conflicts with Texan triples. This library is surprisingly expressive
and powerful, but does have limitations.
- Go templates, particularly with non-standard templating delimiters, do not have syntax
highlighting in any text editor I use, which can match is challenging to match up
<<end>>with the correspondingiforrangeconstructs. - While the templates do support comments using
<</*and*/>>,, the templates used in Grackle are largely uncommented, since without syntax highlighting I prefer to keep each template as light as possible. - The templates are all handcrafted. This is a boon as much as a con, since I can finely control things like indentation in the Rocq output, but it often leads to hard to read templates that may do the same thing in multiple different ways.
- The templates struggle to track state between templates, which influenced the choice to use the stateless API for grackle.
However I don’t really want to be excessively negative here. The templating library did include several features I was surprised by, including:
- Recursive templates. These were used in several places in the version of Grackle which targeted old goose, although the current version has been simplified sufficiently to eliminate the need for recursive templates.
- Exposing custom
gofunctions to the templates. In thesetupTemplatesfunction in grackle.go, lots of custom helper functions are exposed to the templates. A lot of these are basic string manipulation functions, but several extend the functionality of the templates themselves such ascallTemplatewhich enabled support for dynamically building the name of the template to call.dictwhich I use to bundle multiple “arguments” into one template call since the templates only support taking one argument by default.
- Extremely fine-grained control over the style of the output code. This is particularly relevant
for the rocq code since the go code passes through
gofmtbefore becoming output. This enables me to pursue readable automatically generated rocq which helps with debugging the proof snippets themselves.
Grackle development makes use of several tools selected solely for my own convenience, however they
aren’t required to use or development the software. The only hard requirement is to have go and
protoc installed (although I’d highly recommend having Rocq installed to so that you can compile the
output proofs). Notably, no code generation plugin is needed for protoc. Grackle uses it to generate
a descriptor set which can be parsed by the go protobuf libraries to interpret the structure of the
input proto files.
Other things to be aware of in the grackle repository:
direnvis setup to use anixflake. This is because I use nixos as my primary operating system. The flake notably installs go, rocq, protoc and goose. Goose is invoked by grackle via a go package, so it does not need to be installed on your system, but I’ve found it helpful to have around.- The
run_tests.fishscript runs grackle over the example proto files, then checks that all the output code compiles. Finally it runs the go test suite, which is admittedly limited since it’s easy to generate code that looks correct but doesn’t compile. Running this script provides a higher level of confidence than the go test suite itself.