Chainsformer is an Apache Arrow Flight service built on top of ChainStorage as a stateless adaptor. It currently supports batch data processing and micro-batch data streaming from the ChainStorage service to the Spark data processing platform.
It aims to provide a set of easy-to-use interfaces that let Spark consumers read and process ChainStorage data on the Spark platform:
- It defines a standardized set of block and transaction data schemas for each asset class (e.g., EVM assets or Bitcoin).
- It transforms data from protobuf to Arrow format.
- It can be scaled up easily to support higher data throughput.
- It integrates with Spark structured streaming through the Chainsformer Spark Connector (https://github.com/coinbase/chainsformer-spark-source).
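Conceptually, the protobuf-to-Arrow transformation turns record-oriented data into column-oriented data, because Arrow stores each field as its own contiguous column. The Go sketch that follows pivots a slice of row-oriented block records into one slice per field, using only the standard library. It is not Chainsformer's converter and does not use the Arrow library. The `Block` type and its `Height` and `Hash` fields are hypothetical stand-ins for the real ChainStorage schema.

```go
package main

import "fmt"

// Block is a hypothetical, heavily simplified row-oriented record standing in
// for a decoded protobuf block (the real schema has many more fields).
type Block struct {
	Height uint64
	Hash   string
}

// BlockColumns holds the same data laid out column by column, as Arrow does.
type BlockColumns struct {
	Height []uint64
	Hash   []string
}

// toColumns pivots rows into columns: one slice per field, same row order.
func toColumns(blocks []Block) BlockColumns {
	cols := BlockColumns{
		Height: make([]uint64, 0, len(blocks)),
		Hash:   make([]string, 0, len(blocks)),
	}
	for _, b := range blocks {
		cols.Height = append(cols.Height, b.Height)
		cols.Hash = append(cols.Hash, b.Hash)
	}
	return cols
}

func main() {
	cols := toColumns([]Block{{Height: 1, Hash: "0xaa"}, {Height: 2, Hash: "0xbb"}})
	fmt.Println(cols.Height) // [1 2]
	fmt.Println(cols.Hash)   // [0xaa 0xbb]
}
```

A real implementation would presumably append each value to an Arrow array builder rather than a Go slice. The row-to-column pivot is the same idea.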
Make sure your local Go version is 1.18 and your protobuf version is 3.21.12 by running the following commands:
brew install go@1.18
brew unlink go
brew link go@1.18
brew install protobuf@3.21.12
brew unlink protobuf
brew link protobuf
To set up for the first time (only done once):
make bootstrap
Rebuild everything:
make build
Chainsformer depends on the following environment variables to resolve the path of the configuration.
The directory structure is as follows: config/chainsformer/{blockchain}/{network}/{environment}.yml.
- CHAINSFORMER_CONFIG: This env var, in the format {blockchain}-{network}, determines the blockchain and network managed by the service. The naming is defined in chainstorage/protos/coinbase/c3/common/common.proto.
- CHAINSFORMER_ENVIRONMENT: This env var controls the {environment} in which the service is deployed. Possible values are production, development, and local (the default).
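The following minimal Go sketch shows how these two variables could map onto a file path. The helper `configPath` is hypothetical and is not Chainsformer's actual config loader. It splits CHAINSFORMER_CONFIG on its first hyphen, which assumes that blockchain names contain no hyphen. It falls back to `local` when CHAINSFORMER_ENVIRONMENT is unset.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strings"
)

// configPath is a hypothetical helper mapping the two env vars onto the
// documented layout config/chainsformer/{blockchain}/{network}/{environment}.yml.
func configPath(config, environment string) (string, error) {
	// CHAINSFORMER_CONFIG has the form {blockchain}-{network}.
	// Assumption: the blockchain name itself contains no hyphen.
	blockchain, network, ok := strings.Cut(config, "-")
	if !ok || blockchain == "" || network == "" {
		return "", fmt.Errorf("invalid CHAINSFORMER_CONFIG %q, want {blockchain}-{network}", config)
	}
	// CHAINSFORMER_ENVIRONMENT defaults to local.
	if environment == "" {
		environment = "local"
	}
	switch environment {
	case "production", "development", "local":
	default:
		return "", fmt.Errorf("invalid CHAINSFORMER_ENVIRONMENT %q", environment)
	}
	return path.Join("config", "chainsformer", blockchain, network, environment+".yml"), nil
}

func main() {
	config := os.Getenv("CHAINSFORMER_CONFIG")
	if config == "" {
		// ethereum-mainnet is the default used when starting the service locally.
		config = "ethereum-mainnet"
	}
	p, err := configPath(config, os.Getenv("CHAINSFORMER_ENVIRONMENT"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(p)
}
```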
Asset-specific configurations are stored in the config directory of the Chainsformer service repo. The config folder structure has the form ./config/chainsformer/{blockchain}/{network}/base.yml.
- Follow the config folder structure to add configurations for a new blockchain or for a new network of an existing blockchain.
- Add new tests in config_test.go.
- Add new test configs in testapp.go.
Clone the Chainsformer service repo:
git clone https://github.com/coinbase/chainsformer.git
Change directory to the Chainsformer service repo:
cd chainsformer
Set up the ChainStorage SDK credentials:
export CHAINSTORAGE_SDK_AUTH_HEADER=cb-nft-api-token
export CHAINSTORAGE_SDK_AUTH_TOKEN=****
To set up Chainsformer for the first time (only done once):
make bootstrap
Rebuild Chainsformer:
make build
Start the Chainsformer service with the default CHAINSFORMER_CONFIG=ethereum-mainnet:
make server
Query Chainsformer for a range of blocks:
go run ./cmd/client --env local --blockchain ethereum --network mainnet --start 0 --end 10 --table blocks
Query Chainsformer for a range of block events:
go run ./cmd/client --env local --blockchain ethereum --network mainnet --start 0 --end 10 --table streamed_blocks
Calling the GetSchema API:
cmd=$(echo -n '{"table": "blocks"}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetSchema
Calling the GetFlightInfo API to partition the data:
cmd=$(echo -n '{"batch_query": {"start_height": 0, "end_height": 10, "table": "blocks"}}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetFlightInfo
Take one of the tickets returned by the above command:
...
"endpoint": [
{
"ticket": {
"ticket": "eyJiYXRjaF9xdWVyeSI6eyJlbmRfaGVpZ2h0IjoiMTAiLCJ0YWJsZSI6ImJsb2NrcyJ9fQ=="
}
}
]
...
Calling the DoGet API to get data for one of the partitions:
grpcurl --plaintext -d '{"ticket": "eyJiYXRjaF9xdWVyeSI6eyJlbmRfaGVpZ2h0IjoiMTAiLCJ0YWJsZSI6ImJsb2NrcyJ9fQ=="}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoGet API to get data of a specific partition:
cmd=$(echo -n '{"batch_query":{"start_height":"1", "end_height":"2", "table":"blocks"}}' | base64)
grpcurl --plaintext -d '{"ticket": '"\"$cmd\""'}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoAction API to get the tip in ChainStorage via Chainsformer:
grpcurl --plaintext -d '{"type": "TIP"}' localhost:9090 arrow.flight.protocol.FlightService.DoAction | jq '.body | @base64d'
Calling the GetSchema API for the streamed_blocks table:
cmd=$(echo -n '{"table": "streamed_blocks"}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetSchema
Calling the GetFlightInfo API to partition the data:
cmd=$(echo -n '{"stream_query": {"start_sequence": 0, "end_sequence": 10, "table": "streamed_blocks"}}' | base64)
grpcurl --plaintext -d '{"cmd":'"\"$cmd\""',"type":2}' localhost:9090 arrow.flight.protocol.FlightService.GetFlightInfo
Take one of the tickets returned by the above command:
...
"endpoint": [
{
"ticket": {
"ticket": "eyJzdHJlYW1fcXVlcnkiOnsic3RhcnRfc2VxdWVuY2UiOiIxIiwiZW5kX3NlcXVlbmNlIjoiMTAiLCJ0YWJsZSI6InN0cmVhbWVkX2Jsb2NrcyJ9fQ=="
}
}
]
...
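A ticket is base64-encoded JSON, so a client can also build one directly for an arbitrary sequence range, as the shell example for a specific partition does. The Go sketch that follows does the same with `encoding/json` and `encoding/base64`. The names `encodeStreamTicket`, `StreamQuery`, and `streamTicket` are hypothetical, modeled on the ticket shown above. Called with start 1, end 10, and table streamed_blocks, it reproduces that ticket exactly.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// StreamQuery is a hypothetical mirror of the stream_query object seen in the
// example ticket; ",string" encodes the int64 values as JSON strings.
type StreamQuery struct {
	StartSequence int64  `json:"start_sequence,string"`
	EndSequence   int64  `json:"end_sequence,string"`
	Table         string `json:"table"`
}

type streamTicket struct {
	StreamQuery StreamQuery `json:"stream_query"`
}

// encodeStreamTicket returns the base64 string grpcurl expects in the
// DoGet "ticket" field (bytes fields are base64 in JSON requests).
func encodeStreamTicket(start, end int64, table string) (string, error) {
	raw, err := json.Marshal(streamTicket{StreamQuery: StreamQuery{
		StartSequence: start,
		EndSequence:   end,
		Table:         table,
	}})
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

func main() {
	ticket, err := encodeStreamTicket(1, 10, "streamed_blocks")
	if err != nil {
		panic(err)
	}
	// Request body for: grpcurl --plaintext -d '<body>' localhost:9090 arrow.flight.protocol.FlightService.DoGet
	body, err := json.Marshal(map[string]string{"ticket": ticket})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```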
Calling the DoGet API to get data for one of the partitions:
grpcurl --plaintext -d '{"ticket": "eyJzdHJlYW1fcXVlcnkiOnsic3RhcnRfc2VxdWVuY2UiOiIxIiwiZW5kX3NlcXVlbmNlIjoiMTAiLCJ0YWJsZSI6InN0cmVhbWVkX2Jsb2NrcyJ9fQ=="}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoGet API to get data of a specific partition:
cmd=$(echo -n '{"stream_query":{"start_sequence":"1", "end_sequence":"2", "table":"streamed_blocks"}}' | base64)
grpcurl --plaintext -d '{"ticket": '"\"$cmd\""'}' localhost:9090 arrow.flight.protocol.FlightService.DoGet
Calling the DoAction API to get the stream tip in ChainStorage via Chainsformer:
grpcurl --plaintext -d '{"type": "STREAM_TIP"}' localhost:9090 arrow.flight.protocol.FlightService.DoAction | jq '.body | @base64d'
# Run everything
make test
Under development
Under development