Skip to content

Automated proof-instrumented marshalling and unmarshalling for verified distributed systems in Go.

Notifications You must be signed in to change notification settings

mjschwenne/grackle

Repository files navigation

Grackle

Grackle is a project to automate the marshaling and unmarshaling code for verified go programs using the perennial framework. Effectively, it aims to build on top of the existing marshaling library by creating a compiler capable of taking a subset of the Protobuf specification (.proto file) and emitting the Go code which can perform the marshaling and unmarshaling of the described messages and a rocq proof which enables reasoning about the marshaling process in other verified projects.

Table of Contents

Usage

Grackle is capable of generating three outputs for an input directory of proto files:

  1. Go code which marshals and unmarshals the messages.
  2. Goose output, a representation of the generated go code in goose lang and some low-level generated rocq projects from the proofgen tool.
  3. Rocq proofs using perennial that the generated go code is correct.

Grackle can generate just the go code, just the rocq proofs, go and goose code, rocq and go or all three. Basically, the go and rocq proof generation can happen separately, but goose code can only be generated when go code is.

Usage: grackle [options] <path to protobuf files>
  -debug
    	Output all generated code to stdout
  -go-output-path string
    	Physical path to output go code into
  -go-package string
    	Fully qualified root package for the output packages
  -goose-output string
    	Directory to write the goose output
  -proofgen-path string
    	Physical output path for proofgen files
  -rocq-logical-path string
    	Logical path to import the marshal proofs from
  -rocq-physical-path string
    	Physical output path for rocq proofs

To use grackle, the tool needs at least three pieces of information, a directory with the .proto files and either

  • the two rocq flags
  • the two go flags

The other flags are optional. A full invocation of grackle would be

grackle -rocq-logical-path Grackle.test -rocq-physical-path testdata/out/rocq/
-proofgen-path testdata/out/pg -go-output-path testdata/out/go -go-package
github.com/mjschwenne/grackle/testdata/out/go -goose-output testdata/out/goose/
testdata/proto/calendar

A more detailed description of the flags and ideas is given below

  • rocq-logical-path : The Rocq import systems works by mapping physical directories on disk into logical directories which are referenced in the proofs to perform the imports. This flag provides the logical path where the output rocq proofs will be written. For any message with nested messages, this is the logical path where the proofs of the other messages will be imported from. The proofs will always import the corresponding goose output from a logical directory named New.code. This is a standard assumption for perennial and related projects with new goose.
  • rocq-physical-path : This is where on disk to write the rocq proofs. There is no strict relation between the rocq physical path and the corresponding rocq logical path. The two should be linked via a -Q flag in the _RocqProject file. Grackle will create this directory if it doesn’t exist.
  • debug : Causes the generated go and rocq code to be written to standard output. The format for this output is
    --- Begin: < file output path > ---
    
    < file contents >
    
    --- End: < file output path > ---
    --- Begin: < next file output path > ---
    
    < next file contents >
    
    --- End: < next file output path > ---
        
  • go-output-path : The path on disk to write the go output from grackle. As explained later, each message in the proto files of the input directories will get it’s own go package, so grackle will create a nested directory within the provided one.
  • go-package : The full path name for the go package that would correspond to the go-output-path. Unlike the rocq logical and physical paths, there is an expected (but not required) correspondence between the package name and the output path. For projected hosted on GitHub, go convention expects the package name to start with github.com/<user name>/<repo name>, for example importing something from grackle as a library would start with github.com/mjschwenne/grackle. From this base, go convention expects the package name to be basically the relative path to the repository root to the directory the package lives in, go-output-path. As with the rocq logical path, the generated go code will import from the go code for any message containing enumerations or nested messages from this package.
  • goose-output : If this flag is set, call goose programmatically on the output go files, writing the output to this directory. The output will follow the standard goose practice of writing the output into subdirectories that follow the full go package name.
  • proofgen-path: If this flag is set, and the goose output flag is set, call the proofgen tool programmatically on the output go files, writing the output to this directory. The output will follow the standard proofgen practice of writing the output into subdirectories that mirror the full go package name.

Why “Grackle”

A grackle is a bird capable of mimicking other birds and even human speech if it wants to, which seems approate for a tool focused on allowing verified distributed systems to communicate. It’s also a bird whose name starts with ‘g’ to complement goose, a project which converts a subset of Go to Rocq and is used as part of grackle.

Compatibility with protoc

Grackle is not compatible with protoc. Neither the wire format grackle uses nor the generated go API is compatible. Some major differences are highlighted below.

  • The int32 and int64 fields are treated as unsigned, rather than signed, since support for signed integers in Perennial is still developing.
  • The wire format is markedly different, see Grackle Wire Format for more information. Probably the most notable two differences is that the grackle wire format does not use variable width integer encodings and fields are not tagged with the field number, but rather serialized in order as they appear in the proto file.
  • Go code generation is structured differently. The protoc compiler outputs one file per proto file while grackle outputs one go package per message (and enum).
  • Recursive messages, naturally including mutually recursive messages, are not supported by grackle despite being supported by protoc. I believe go code generation may work, but the output rocq proofs certainly won’t compile.
  • Grackle appends a _gk suffix to generated go packages and proof file names. On the Rocq side, output modules use a _proof_gk suffix.

Supported Protobuf Features

The proto language is used by Google’s protocol buffers to describe the layout of binary messages in a language agnostic manner (which is a bit ironic since proto is itself a language). Below is a list of supported and unsupported features as well as any commentary on them.

Supported

  • Messages, including nested messages. (Mutually) recursive use of messages is not supported, regardless of if it is supported by protoc.
  • Integer fields, although all of the integers are mapped to unsigned integers in both go and rocq. This is mostly due to limited support for signed integers within perennial. Should support improve, support for signed integers may be added.
    • int32
    • uint32
    • fixed32
    • int64
    • uint64
    • fixed64
  • Boolean fields, bool.
  • String fields, string.
  • Byte fields, bytes.
  • Enums, defined with enum <name> { ... }
  • Repeated fields of scalar types, message types and enum types.
  • Imports between proto files in the input proto directory or a subdirectory of the input proto directory.

Planned Support

  • Maps. After slices, map support is important and used in several places in the perennial ecosystem.

Unsupported

  • Signed integer fields sint32, sfixed32, sint64 and sfixed64. This is mostly due to limited support for signed integers within perennial. According to the proto language, regular int32 and int64 should also be signed even though grackle treats them as unsigned. It is unclear at this point if support for signed integers would change the interpretation of these fields, which would be a breaking change.
  • Recursive messages. This issue seems particularly challenging on the rocq side, although go code generation is unlikely to be significant issue.
  • Nested messages declaration. Referencing other top-level messages for a field is supported. However, protobuf supports declaring nested messages within a message, like this example from the protobuf documentation:
    message SearchResponse {
      message Result {
        string url = 1;
        string title = 2;
        repeated string snippets = 3;
      }
      repeated Result results = 1;
    }
        
  • any and oneof labels. These labels operate as a wildcard type and sum type. They are ignored by grackle and unsupported. Moreover, support for these labels is not planned.

Grackle Wire Format

The grackle wire format is must simpler than the protobuf wire format. The two most notable differences are:

  1. Grackle does not use a variable width encoding for integers. All integers are fixed width and embedded in the serialized message in little-endian byte order.
  2. Grackle fields are not tagged with the field number defined in the proto file. Fields are serialized and deserialized in the order they appear in the proto file.

As an example, consider the following event message. The complete proto file for this message can be found here.

message Event {
    int32 id = 1;
    string name = 4;
    TimeStamp startTime = 2;
    TimeStamp endTime = 3;
}

The wire format for the message would start with the four bytes needed for id, then the length of the name string as a 64 bit number followed by the bytes composing the string itself. The null byte is not included since casting a string to a byte slice ([]byte) does not include the null byte. Finally, the marshaled message includes the 12 bytes needed to marshal a TimeStamp. In the below diagram, the number after each field is the length of the field. Note that the nested structures don’t have to be of a fixed size, but are in this example.

[        |                  |                 |                |              ]
[ id (4) | length(name) (8) | name (variable) | startTime (12) | endTime (12) ]
[        |                  |                 |                |              ]

Generated Code Guide

Below is an overview of what type of code is generated by grackle. Full examples can be found in the testdata/out subdirectories.

The generated code guide will use the calendar example, which features the Event message from above and a TimeStamp message containing three integers for the hour, minute and second. All example output snippets will reference the Event message since it’s a bit more interesting. Note that the examples in this README may not be kept up to date, but the files in testdata/out will be.

For this example, grackle might be invoked as

grackle -rocq-logical-path Grackle.test -rocq-physical-path testdata/out/rocq/
-proofgen-path testdata/out/pg -go-output-path testdata/out/go -go-package
github.com/mjschwenne/grackle/testdata/out/go -goose-output testdata/out/goose/
testdata/proto/calendar

Which would generate these files (which can be accessed in the testdata/out directory of this repository, along with the other test files):

testdata/out/
├── go
│   ├── calendar_gk
│   │   └── calendar_gk.go
│   ├── event_gk
│   │   └── event_gk.go
│   └── timestamp_gk
│       └── timestamp_gk.go
├── goose
│   └── github_com
│       └── mjschwenne
│           └── grackle
│               └── testdata
│                   └── out
│                       └── go
│                           ├── calendar_gk.v
│                           ├── event_gk.v
│                           └── timestamp_gk.v
├── pg
│   └── github_com
│       └── mjschwenne
│           └── grackle
│               └── testdata
│                   └── out
│                       └── go
│                           ├── calendar_gk.v
│                           ├── event_gk.v
│                           └── timestamp_gk.v
└── rocq
    ├── calendar_proof_gk.v
    ├── event_proof_gk.v
    └── timestamp_proof_gk.v

Protobuf Messages

Go

For a top-level message, grackle will output:

  • A go package with the name of the message, all lowercase, and the _gk suffix, such as event_gk. Inside the package is one file with the same name.
  • That file contains
    • A struct definition S. Since go access patterns always use the package name, every message will have S as the struct name. It can be accessed from outside the package as, e.g. event_gk.S which will differentiate between separate messages.
    • A Marshal function which takes an S and a byte slice prefix and returns prefix with the encoding of S appended to it.
    • An Unmarshal function which takes a byte slice and returns an S struct and the unused suffix.

The use of prefix and suffix enable a stateless marshaling design which is easier to reason about and more compositional. More information can be found in the Technical Considerations section.

Go Struct

Here is an example of the generated struct definition for the Event message:

type S struct {
	Id        uint32
	Name      string
	StartTime timestamp_gk.S
	EndTime   timestamp_gk.S
}
Marshal Function

Now we have the Marshal function. This function encodes in the input event by serializing each field in the order declared in the source proto file. The primitive.AssumeNoStringOverflow function comes from the goose lang primitive library, which allows us to inject the assumption that the string is less than 2^64 bytes long in the rocq proof.

func Marshal(enc []byte, e S) []byte {
	enc = marshal.WriteInt32(enc, e.Id)
	primitive.AssumeNoStringOverflow(e.Name)
	enc = marshal.WriteLenPrefixedBytes(enc, []byte(e.Name))
	enc = timestamp_gk.Marshal(enc, e.StartTime)
	enc = timestamp_gk.Marshal(enc, e.EndTime)

	return enc
}
Unmarshal Function

Finally we have the Unmarshal function. This function iterates through the fields of the struct, deserializing them one at a time. In the function conclusion the values read from the serialized byte slice are packed into a struct and returned, along with the remaining bytes in the input byte slice. Notice that strings are first read as bytes, then converted back to strings. Both bytes and strings need to be cloned during the unmarshaling process for proof reasons, basically so that full ownership of the new structure can be returned in the case that the input byte slice has fractional ownership.

func Unmarshal(s []byte) (S, []byte) {
	id, s := marshal.ReadInt32(s)
	nameBytes, s := marshal.ReadLenPrefixedBytes(s)
	name := string(std.BytesClone(nameBytes))
	startTime, s := timestamp_gk.Unmarshal(s)
	endTime, s := timestamp_gk.Unmarshal(s)

	return S{
		Id:        id,
		Name:      name,
		StartTime: startTime,
		EndTime:   endTime,
	}, s
}

The go code generation is the simple part, while the rocq code proof generation is much more complex.

Rocq

For a top-level message, grackle will generate a proof file with the lower-cased message name plus the _proof_gk suffix. That file will contain:

  • A rocq type C corresponding to the go struct and the proto message. This could either be a rocq record or an alias to the goose type directly. See the <a href=”#simple-message-optimization “>Simple Message Optimization for more details.
  • A definition has_encoding of what it means for a byte slice (or list of bytes in rocq) to encode the record.
  • A separation logic proposition own about what it means to own a struct.
  • An encoding lemma wp_Encode which describes the behavior of calling the Marhsal function, as well as a proof of correctness.
  • A decoding lemma wp_Decode which describes the behavior of calling the Unmarshal function, as well as a proof of correctness.
Rocq Record

Just like with the Go code, grackle starts by outputing a Rocq record. Since the running example, Event, is a simple message, this definition is an alias for the goose output struct.

Definition C := event_gk.S.t.

For a more complex example, like calendar_gk, the generated struct will closely match the go struct S.

Record C :=
    mkC {
        hash' : list u8;
        events' : list Event_gk.C;
        }.
Has Encoding

Next grackle produces a has_encoding definition which relates the record C to the wire format. In cases where the field is an integer or similar stand-alone field, it is directly converted into a sequence of bytes and appended together. Nested messages are a bit different though. The definition is designed to be compositional, so Event.has_encoding asserts that there exists some sequence of bytes which faithfully encodes the startTime and endTime time stamps using the TimeStamp.has_encoding definition.

Definition has_encoding (encoded:list u8) (args:C) : Prop :=
  ∃ (startTime_enc endTime_enc : list u8), 
  encoded = (u32_le args.(event_gk.S.Id')) ++
              (u64_le $ length $ args.(event_gk.S.Name')) ++ args.(event_gk.S.Name') ++
              startTime_enc ++
              endTime_enc
  /\ length args.(event_gk.S.Name') < 2^64
  /\ TimeStamp_gk.has_encoding startTime_enc args.(event_gk.S.StartTime')
  /\ TimeStamp_gk.has_encoding endTime_enc args.(event_gk.S.EndTime').

Some extra flexibility here is that grackle can allow the TimeStamp encoding to change without requiring regeneration of the Event code, although a recompile will be required according to rocq consistency requirements.

Since lists in Rocq can be of arbitrary length, we require that the length of lists is less than 2^64 in the encoding function so that the length can be correctly encoded into a 64 bit integer.

Own

After has_encoding, which related the record to the wire format, we need a way to represent owning an instance of the record, which will be used in the pre and post conditions for the serialization lemmas.

Definition own (args__v: event_gk.S.t) (args__c: C) (dq: dfrac) : iProp Σ :=
  "%Hown_id" ∷ ⌜ args__v.(event_gk.S.Id') = args__c.(event_gk.S.Id') ⌝ ∗
  "%Hown_name" ∷ ⌜ args__v.(event_gk.S.Name') = args__c.(event_gk.S.Name') ⌝ ∗
  "%Hown_name_len" ∷ ⌜ length args__c.(event_gk.S.Name') < 2^64 ⌝ ∗
  "Hown_startTime" ∷ TimeStamp_gk.own args__v.(event_gk.S.StartTime') args__c.(event_gk.S.StartTime') dq ∗
  "Hown_endTime" ∷ TimeStamp_gk.own args__v.(event_gk.S.EndTime') args__c.(event_gk.S.EndTime') dq.

The own proposition relates the rocq record to struct representation in goose lang. New goose structs have fields which the rocq proofs can access. Like with has_encoding, most of the simple fields like integers can be embedded directly. Since owning a nested struct may entail using some other piece of state (if there is a slice in the nested message, for instance), grackle also needs to assert ownership over that, which is accomplished by adding a call to the nested own predicate.

Encoding

The above rocq definitions exist to enable grackle to actually verify the Marshal and Unmarshal function generated on the Go side. The wp_Encode lemma describes the behavior of the Marshal function.

Lemma wp_Encode (args__t : event_gk.S.t) (args__c : C) (pre_sl : slice.t) (prefix : list u8) (dq : dfrac):
  {{{
        is_pkg_init event_gk ∗
        own args__t args__c dq ∗ 
        own_slice pre_sl (DfracOwn 1) prefix ∗
        own_slice_cap w8 pre_sl (DfracOwn 1)
  }}}
    @! event_gk.Marshal #pre_sl #args__t
  {{{
        enc enc_sl, RET #enc_sl;
        ⌜ has_encoding enc args__c ⌝ ∗
        own args__t args__c dq ∗ 
        own_slice enc_sl (DfracOwn 1) (prefix ++ enc) ∗
        own_slice_cap w8 enc_sl (DfracOwn 1)
  }}}.

Proof.
  (* Generated, but omitted from this README *)
Qed.

The encoding lemma requires some level of ownership over an instance of the generated message, using args__t as the goose lang value and args__c as the rocq level representation (For a simple message, these might be the same). Here, dq quantifies the discardable fractional ownership over any heap values in the own definition. Since only read access is needed to marshal, the value of dq is unbounded with respect to a dfrac. The Marshal function also needs ownership of the prefix slice that the serialized args__t will be appended to. Since Marshal writes to this slice, full ownership (DfracOwn 1) and access to the slice’s capacity is required.

After marshaling, ownership of args__t is returned to the caller (the second piece of the post condition, own args__t args__c dq=​) at the same level of ownership =Marshal was called with. Additionally, the caller learns that the returned slice encodes prefix ++ enc (the third part of the postcondition), with enc being some list of bytes which correctly has an encoding for args__c (using has_encoding, the first part of the post condition).

Decoding

The mirror image of wp_Encode is naturally wp_Decode, which specifies the behavior of calling Unmarshal.

Lemma wp_Decode (enc : list u8) (enc_sl : slice.t) (args__c : C) (suffix : list u8) (dq : dfrac):
  {{{
        is_pkg_init event_gk ∗
        ⌜ has_encoding enc args__c ⌝ ∗
        own_slice enc_sl dq (enc ++ suffix)
  }}}
    @! event_gk.Unmarshal #enc_sl
  {{{
        args__t suff_sl, RET (#args__t, #suff_sl);
        own args__t args__c (DfracOwn 1) ∗ 
        own_slice suff_sl dq suffix
  }}}.

Proof.
  (* Generated, but omitted from this README *)
Qed.

The conditions for this Texan triple are basically the inverse of wp_Encode. Calling Unmarshal requires some level of access to a byte slice (the serialized struct with a potentially empty suffix) and the pure fact that the beginning of the byte slice encodes some rocq level args__c.

After Unmarshal returns, the caller learns that they have full ownership of the newly created struct and ownership at the same level of just the suffix slice. Note that dq is only bound by being a valid discardable fractional permission. Removing enc from the beginning of the slice isn’t a write (which would require full ownership), but rather returns a new pointer to an existing location in the same potentially read only slice used to call Unmarshal.

Protobuf Enums

Protobuf enums are largely treated as a different type of protobuf message, following many of Grackle’s conventions for working with messages, including:

  • Outputting one go package per enum.
  • Generating a simple Marshal and Unmarshal function for each enum.
  • Outputting one rocq proof file per enum.

This section will use the running example of the error enum defined in testdata/proto/calendar/calendar.proto:

enum error {
  eOk = 0;
  eEndOfFile = 1;
  eUnknown = 2;
}

Recall that protobuf requires enums have an element with ID number 0, which is treated as the default value of the enum.

Go

Type Definition
type E uint32

const (
	EOk        E = 0
	EEndOfFile E = 1
	EUnknown   E = 2
)

Grackle models the type of an enum as a named uint32. Like the structs output from protobuf messages, enums are always named E for enum, and are expected to be referenced in other programs as error_gk.E for this example.

A named type is different from a type alias, with the primary difference being that an E cannot be assigned to a uint32, although it still can be cast to one. See Open and Closed Enums for more details.

Marshal
func Marshal(enc []byte, e E) []byte {
	return marshal.WriteInt32(enc, uint32(e))
}

The Marshal is extremely simple, basically just casts the enum value to a uint32 and marshals the underlying integer. This function needs to exist to support repeated protobuf fields, modeled in go as []error_gk.E.

Unmarshal
func Unmarshal(s []byte) (E, []byte) {
	e_raw, s := marshal.ReadInt32(s)
	return E(e_raw), s
}

Likewise, the Unmarshal function reads one 32-bit integer and returns the cast results. While the go code makes no provision for unmarshaling a value not defined in the proto file, the rocq proof for these functions will require knowing that the encoded value is one of the options defined in the proto file.

Helper Global Variables
var Name = map[uint32]string{
	0: "eOk",
	1: "eEndOfFile",
	2: "eUnknown",
}

var Value = map[string]uint32{
	"eOk":        0,
	"eEndOfFile": 1,
	"eUnknown":   2,
}

These global maps are defined as a convenience to the go programming, mapping the underlying uint32 value to the name of the enum option defined in the proto file and via versa. This enables enum values to be round tripped through a string if desired.

String
func (e E) String() string {
	return Name[uint32(e)]
}

This is the standard go String function, which is used by (for example) the %v format specifier to print the enum value. This could be useful for debugging or logging purposes in a verified go project.

Rocq

The rocq proof will be generated in error_proof_gk.v.

Package Initialization

Unlike the packages generated from a protobuf message, the go code generated for an enum includes global variables. In perennial, this means that we have prove during the initialization process that the map variables are initialized with the correct value. The package initialization predicate will expose the fixed values in the go code using a discarded fraction, so verified code cannot modify these maps.

Definition name_map_def : gmap w32 go_string := list_to_map [
                                                ((W32 0), "eOk"%go);
                                                ((W32 1), "eEndOfFile"%go);
                                                ((W32 2), "eUnknown"%go)
                                              ].
Definition value_map_def : gmap go_string w32 := list_to_map [
                                                 ("eOk"%go, (W32 0));
                                                 ("eEndOfFile"%go, (W32 1));
                                                 ("eUnknown"%go, (W32 2))
                                               ].
Definition is_initialized : iProp Σ :=
  ∃ name_map value_map,
  "HglobalName" ∷ (global_addr error_gk.Name) ↦□ name_map ∗
  "Hname" ∷ name_map ↦$□ name_map_def ∗
  "HglobalValue" ∷ (global_addr error_gk.Value) ↦□ value_map ∗
  "Hvalue" ∷ value_map ↦$□ value_map_def.

#[global] Instance : IsPkgInit error_gk := define_is_pkg_init is_initialized.
#[global] Instance : GetIsPkgInitWf error_gk := build_get_is_pkg_init_wf.

Lemma wp_initialize' get_is_pkg_init :
  get_is_pkg_init_prop error_gk get_is_pkg_init ->
  {{{ own_initializing get_is_pkg_init ∗ is_go_context ∗ □ is_pkg_defined error_gk }}}
    error_gk.initialize' #()
  {{{ RET #(); own_initializing get_is_pkg_init ∗ is_pkg_init error_gk }}}.

Proof.
  (* Generated, but omitted from this README *)
Qed.
Inductive Definition

While the go code models the enum as technically open (meaning that you can define new members of the enum’s type outside of the definition, i.e. with error_gk.E(9)), rocq has excellent support for inductive data types which naturally map to closed enums (meaning it is not possible to define new values of the type outside of the original declaration).

See Open and Closed Enums for more information.

Inductive I :=
| eOk
| eEndOfFile
| eUnknown.

Definition to_tag i : w32 :=
  match i with
  | eOk => W32 0
  | eEndOfFile => W32 1
  | eUnknown => W32 2
  end.

Rocq users can take advantage of the inductive type error_gk.I using basic rocq tactics like destruct.

Has Encoding & Own
Definition has_encoding (encoded : list u8) (args : I) : Prop :=
  encoded = u32_le $ to_tag $ args.

Definition own (args__v : error_gk.E.t) (args__c : I) (dq : dfrac) : iProp Σ :=
  "%Herror_eq" ∷ ⌜ args__v = to_tag args__c ⌝.

The definitions for has_encoding and own output by Grackle follow all the rules for protobuf messages, but can be simplified by knowing an enum is just a single 32 bit unsigned integer. Since enums are modeled as an inductive type, they are by definition not simple.

Encode & Decode
Lemma wp_Encode (args__t : error_gk.E.t) (args__c : I) (pre_sl : slice.t) (prefix : list u8) (dq : dfrac):
  {{{
        is_pkg_init error_gk ∗
        own args__t args__c dq ∗
        own_slice pre_sl (DfracOwn 1) prefix ∗
        own_slice_cap w8 pre_sl (DfracOwn 1)
  }}}
    @! error_gk.Marshal #pre_sl #args__t
  {{{
        enc enc_sl, RET #enc_sl;
        ⌜ has_encoding enc args__c ⌝ ∗
        own args__t args__c dq ∗
        own_slice enc_sl (DfracOwn 1) (prefix ++ enc) ∗
        own_slice_cap w8 enc_sl (DfracOwn 1)
  }}}.

Proof.
  (* Generated, but omitted from this README *)
Qed.

Lemma wp_Decode (enc : list u8) (enc_sl : slice.t) (args__c : I) (suffix : list u8) (dq : dfrac):
  {{{
        is_pkg_init error_gk ∗  
        own_slice enc_sl dq (enc ++ suffix) ∗
        ⌜ has_encoding enc args__c ⌝
  }}}
    @! error_gk.Unmarshal #enc_sl
  {{{
        args__t suff_sl, RET (#args__t, #suff_sl);
        own args__t args__c (DfracOwn 1) ∗
        own_slice suff_sl dq suffix
  }}}.

Proof.
  (* Generated, but omitted from this README *)
Qed.

These lemma definitions exactly match the ones generated for protobuf messages, which is intentional, so that enums can be marshaled and unmarhaled in the same style as messages.

Technical Considerations

There are several technical considerations for a piece of software like grackle, relating to API design, proof generation and the underlying tool-chain. While the below discussion is non-exhaustive, they are some points I have additional comments on.

Stateless Design

Grackle builds on top of the existing marshal library for perennial, which provides a set of “primitives” for reading integers, byte slices and other go types. This library provides two APIs, a stateful and stateless one. The stateful version requires the programmer to create an object which internal tracks pervious write calls before expelling a binary blob when marshaling is complete (not unlike a java StringBuilder). The nice feature about this design is that it removes the requirement that the programmer track the prefix / suffix slices.

Grackle uses the stateless API which always return the prefix / suffix slices and adopts that style itself. The reason for this is mostly that this style is easier to reason about within perennial and enables stronger compositionality between messages. While grackle is technically a simple compiler, it doesn’t employ standard compiler techniques like generating then transforming an abstract syntax tree, instead using go’s template package (see go templates for more information). It’s ability to track complex and contextual information like managing state is limited, particularly on the rocq side. Ultimately the stateless design is a way to reduce complexity for an already complex piece of software.

Open and Closed Enums

There are two ways to define the behavior of an enumeration, as discussed in the protobuf documentation: open enums and closed enums.

  • An open enum (or really open inductive) allows you to define new members of the type ad-hoc. So if a 2 was parsed into an enum with only defined protobuf values for 0 and 1, some representation of 2 would be passed into a program which may not be able to handle this value.
  • A closed enum which encounters an unknown value during a parse would report the default value for the enum, refusing to allow an unknown value in the enum.

The protobuf specification states that proto3, the version targeted by Grackle, should treat enums as open. Their motivation is to avoid unknown enum values from being dropped or reordered by brokers in the middle of a protobuf application. However, there are stronger reasoning benefits in rocq by modeling the enum as an Inductive type. Thus, Grackle breaks from the protobuf spec to model an enum as an Inductive, which you saw under rocq enums in the code generation guide.

Map Literals

Support for enums in Grackle came with global variable maps, inspired by the actual protobuf enum API. However, support in perennial for map literals was limited. I worked on improving this area of perennial. Goose would already output a map literal as a list of key-value pairs. However, it was difficult to convert from a pair of goose lang value to a goose lang value that is a tuple. I experimented with adding this rule to tactics like wp_pures, but that proved too aggressive and broke some other perennial features. Rather, it proved beneficial to have the tuple IntoVal instance but not include it in the default automation. Then it can be manually applied when needed to reduce a list of pairs of values to a list of pair values. I also contributed wp_map_literal, which takes a list of the new map.kv_entry goose lang constructs and returns a goose lang map containing that set of key-value pairs.

This work was implemented in perennial #323.

Simple Message Optimization

Since new goose structs now have accessible fields, at first it seemed possible to have Grackle for new goose not use custom rocq records, like the old goose version had. Early manual examples for things like the TimeStamp and Event protobuf messages were even written which only used the goose generated representation of timestamp_gk.S and event_gk.S. However, as we pushed for more features in Grackle, it became clear that custom records were going to be required. The driving example here was for slices. Representing a slice in perennial requires connecting a slice.t to some rocq level list of objects of the type of the slice. Thus, we needed a way to track and move around this list, which led to the re-introduction of the custom records and the own predicate.

However, since it was possible to write some Grackle proofs without a custom record type, I wanted to preserve the simplicity when possible, since this would be easier for Grackle users when own can just take the same parameter twice.

Formally, a message is simple if:

  • It doesn’t contain an enum field
  • It doesn’t contain a repeated field of any type.
  • It doesn’t contain a field with protobuf type bytes.
  • It doesn’t contain a message which isn’t simple.

The last requirement proved to be the tricky one, since checking if a message is simple now requires checking if all the nested messages are simple. Implementing this required in implicit topological sort of the messages in a protobuf file to determine the dependencies among messages. Also note that protobuf itself is fine with recursive or mutually recursive message definitions, which need to be handled to prevent Grackle from entering an infinite loop. Thus when Grackle detects such a loop (simply speaking by having a mechanism to check if Grackle currently in the process of determining weather the message it’s about to check is simple), it assumes that the messages are not simple. It is always safe to assume a message is not simple when it is, but assuming that a message is simple when it’s not will lead to failing proofs.

Proof Structure

The body of the main wp_Encode and wp_Decode lemmas aren’t given in this README, but there are some interesting things I have to say about their structure. These proofs can be found in the event_proof.v file.

As first described during code generation, the proofs for these important lemmas can be thought of as a proof setup, followed by a series of proof snippets, one for each field in the proto message and then a conclusion. Visually, that might look something like this:

Proof.
 +----------------------------------------------------------------------------+
 |                                                                            |
 |                                 Proof Setup                                |
 |                                                                            |
 +----------------------------------------------------------------------------+

 +----------------------------------------------------------------------------+
 |                                                                            |
 |                               Field Snippets                               |
 |                                                                            |
 | +------------------------------------------------------------------------+ |
 | |                                                                        | |
 | |                               First Field                              | |
 | |                                                                        | |
 | +------------------------------------------------------------------------+ |
 |                                                                            |
 | +------------------------------------------------------------------------+ |
 | |                                                                        | |
 | |                              Second Field                              | |
 | |                                                                        | |
 | +------------------------------------------------------------------------+ |
 |                                                                            |
 |                                     .                                      |
 |                                     .                                      |
 |                                     .                                      |
 |                                                                            |
 | +------------------------------------------------------------------------+ |
 | |                                                                        | |
 | |                               Last Field                               | |
 | |                                                                        | |
 | +------------------------------------------------------------------------+ |
 |                                                                            |
 +----------------------------------------------------------------------------+

 +----------------------------------------------------------------------------+
 |                                                                            |
 |                                 Conclusion                                 |
 |                                                                            |
 +----------------------------------------------------------------------------+
Qed.

At first pass, it seems difficult to find proof snippets which can be arbitrarily ordered and instantiated. Fortunately, we can effectively break the task of marshaling or unmarshaling a struct down to operating on the head or end of a slice at a field level. Each snippet requires the same set of hypotheses, such as Hsl which tracks the prefix / suffix slices. Thus, it’s the job of the proof setup to create the initial hypotheses needed to process each field. Since perennial employs an affine logic, each snippet then has to reintroduce the next version of any common hypotheses it consumes. The snippets also leak information about each field further down the chain of fields until the proof conclusion is reached. The conclusion’s role is to show that the post condition as been satisfied, and to do this it assumes that all of the needed hypotheses have been created with a well-known name by the corresponding snippet.

This type of proof design incentivizes compact, highly local proof snippets which ignore everything that isn’t related to the field at hand. However in the conclusion, grackle needs to tie everything together. Consider the conclusion from the calendar_gk wp_Encode proof:

unfold has_encoding.
split; last done.
exists events_enc.
split; first repeat (f_equal; try word).
all: try done.
rewrite <- Hown_events_sz'.
done. 

This generated snippet needs to know that events_enc and Hown_events_sz exist, since those are variables and propositions generated over the course of the proof. The existence of these non-local proof constructs mandates the usage of well-know names for everything that needs to referenced later. Maintaining these types of structures are required but more challenging than individual proof snippets. I like to believe that this requirement also enables grackle proofs to be highly readable and understandable by the proof engineer.

Go Templates

Grackle makes extensive use of the text/template templating package in the go standard library. The templates themselves can be found in internal/templates. Note that grackle uses <​< and >​> as template delimiters to avoid conflicts with Texan triples. This library is surprisingly expressive and powerful, but does have limitations.

  • Go templates, particularly with non-standard templating delimiters, do not have syntax highlighting in any text editor I use, which can match is challenging to match up <<end>> with the corresponding if or range constructs.
  • While the templates do support comments using <​</* and */>​>,, the templates used in Grackle are largely uncommented, since without syntax highlighting I prefer to keep each template as light as possible.
  • The templates are all handcrafted. This is a boon as much as a con, since I can finely control things like indentation in the Rocq output, but it often leads to hard to read templates that may do the same thing in multiple different ways.
  • The templates struggle to track state between templates, which influenced the choice to use the stateless API for grackle.

However I don’t really want to be excessively negative here. The templating library did include several features I was surprised by, including:

  • Recursive templates. These were used in several places in the version of Grackle which targeted old goose, although the current version has been simplified sufficiently to eliminate the need for recursive templates.
  • Exposing custom go functions to the templates. In the setupTemplates function in grackle.go, lots of custom helper functions are exposed to the templates. A lot of these are basic string manipulation functions, but several extend the functionality of the templates themselves such as
    • callTemplate which enabled support for dynamically building the name of the template to call.
    • dict which I use to bundle multiple “arguments” into one template call since the templates only support taking one argument by default.
  • Extremely fine-grained control over the style of the output code. This is particularly relevant for the rocq code since the go code passes through gofmt before becoming output. This enables me to pursue readable automatically generated rocq which helps with debugging the proof snippets themselves.

Development

Grackle development makes use of several tools selected solely for my own convenience, however they aren’t required to use or development the software. The only hard requirement is to have go and protoc installed (although I’d highly recommend having Rocq installed to so that you can compile the output proofs). Notably, no code generation plugin is needed for protoc. Grackle uses it to generate a descriptor set which can be parsed by the go protobuf libraries to interpret the structure of the input proto files.

Other things to be aware of in the grackle repository:

  • direnv is setup to use a nix flake. This is because I use nixos as my primary operating system. The flake notably installs go, rocq, protoc and goose. Goose is invoked by grackle via a go package, so it does not need to be installed on your system, but I’ve found it helpful to have around.
  • The run_tests.fish script runs grackle over the example proto files, then checks that all the output code compiles. Finally it runs the go test suite, which is admittedly limited since it’s easy to generate code that looks correct but doesn’t compile. Running this script provides a higher level of confidence than the go test suite itself.

About

Automated proof-instrumented marshalling and unmarshalling for verified distributed systems in Go.

Resources

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •