Tendermint Verifiable Computation and Storage Demo
This demo application shows how verifiable computations might be processed by a distributed cluster of nodes. It comes with a set of hardcoded operations that can be invoked by the client. Each requested operation is computed by every cluster node (ignoring failures or Byzantine cases), and if any node disagrees with the computation outcome, it can submit a dispute to an external Judge.
Results of each computation are stored on the cluster nodes and can later be retrieved by the client. The storage of the results is secured with Merkle proofs, so malicious nodes can't substitute them with bogus data.
Because every computation is verified by the cluster nodes and computation outcomes are verified using Merkle proofs, the client normally doesn't have to interact with the entire cluster. Moreover, the client can interact with as few as a single node – this won't change the safety properties. However, liveness might be compromised – for example, if the node the client is interacting with silently drops incoming requests.
Motivation
The application is a proof-of-concept of a distributed system with the following properties:
- Support of arbitrary deterministic operations: simple reads/writes as well as complex and time-consuming calculations.
- High availability: tolerance to simultaneous failures or Byzantine actions of some subset of nodes.
- High throughput (1000 transactions per second) and low latency (1-2 seconds) of operations.
- Small blockchain finality time (several seconds).
- Extremely low probability of consistency violation.
Architecture overview
The entire application is distributed over a set of machines with the following roles:
- client-side Proxy, which originates the requests to the cluster
- cluster Node, which serves the requests
- the Judge, which is in charge of resolving complicated disagreements between nodes
The application uses a blockchain approach to decompose the application logic into 2 main parts:
- replicated transaction log
- state machine with the domain-specific logic

This decomposition simplifies the development process. The modularization is not only logical but also physical: the transaction log and the state machine run in separate processes and are developed in different languages.
The application uses the Tendermint platform, which provides the replicated transaction log components (TM Core), in particular:
- distributed transaction cache (Mempool)
- blockchain (to store transactions persistently)
- Byzantine-resistant consensus logic (to reach agreement about the order of transactions)
- peer-to-peer layer to communicate with other nodes
- entry point for client requests
To perform the domain-specific logic, the application uses its own State machine implementing Tendermint's ABCI interface, following Tendermint's architecture. It is written in Scala 2.12, is compatible with Tendermint v0.19.x, and uses com.github.jtendermint.jabci for the Java ABCI definitions.
As the application is intended to run normally in the presence of some failures, including Byzantine failures, the following principles are used:
- Every operation result is verifiable (and thus trusted by the client).
- The application uses Tendermint's implementation of a Byzantine fault-tolerant consensus algorithm to provide safety and liveness without external interference to the cluster – as long as more than 2/3 of the cluster nodes are correct (a quorum exists).
- It can restore liveness and even safety after quorum requirements are violated – every node can rapidly detect problems with the blockchain or disagreement with the rest of the nodes and raise a dispute to the Judge.
The State machine maintains its state using an in-memory key-value string storage. Keys are hierarchical and /-separated. This key tree is merkelized, so every key stores the Merkle hash of its associated value (if present) and its children keys.
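A minimal sketch of such a merkelized key tree is shown below (Python is used only for illustration; the actual State machine is written in Scala, and the node layout and hash function here are assumptions, not the demo's exact format):

```python
import hashlib

class Node(object):
    """One key of the hierarchical, /-separated key tree (illustrative sketch only)."""
    def __init__(self, value=None):
        self.value = value       # optional string value stored under this key
        self.children = {}       # child key segment -> Node
        self.merkle_hash = None  # Merkle hash of this subtree

    def recompute_hash(self):
        # The node hash covers the node's own value and the (name, hash) pairs of
        # its children, so any change below propagates up to the root (app) hash.
        # SHA-256 is used here just for illustration; the demo client depends on sha3.
        h = hashlib.sha256()
        h.update((self.value or "").encode("utf-8"))
        for name in sorted(self.children):
            child = self.children[name]
            h.update(name.encode("utf-8"))
            h.update(child.recompute_hash())
        self.merkle_hash = h.digest()
        return self.merkle_hash
```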
Operations
Tendermint architecture supposes that the client typically interacts with the Application via a local Proxy. This application uses the Python query.py script as the client-side Proxy to request arbitrary operations from the cluster, including:
- Simple put requests which specify a target key and a constant as its new value: put a/b=10.
- Computational put requests which specify that a target key should be assigned the result of some operation (with arguments) invocation: put a/c=factorial(a/b).
- Requests to obtain the result of running an arbitrary operation: run factorial(a/b), run sum(a/b,a/c).
- Requests to read the value of a specified key: get a/b.
get operations do not change the state of the application. They are implemented via Tendermint ABCI queries. As the result of such a query, the State machine returns the value of the requested key together with a Merkle proof.
put operations are effectful and change the application state explicitly. They are implemented via Tendermint transactions that are combined into blocks. TM Core sends a transaction to the State machine, and the State machine applies this transaction to its state, typically changing the target key.
get and put operations use different techniques to prove to the client that the operation was actually invoked and its result is correct. get-s take advantage of the Merkelized structure of the application state and provide a Merkle proof of the result's correctness. Any put invocation leads to adding the corresponding transaction to the blockchain, so observing this transaction in a correctly signed block means that there is a quorum in the cluster regarding this transaction's invocation.
run operations are also effectful. They are implemented as combinations of put and get requests: to perform an operation trustfully, the Proxy first requests put-ting the result of the operation to some key and then queries its value. Thus the correctness is ensured by both the consensus and the Merkle proof.
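A rough sketch of that composition from the client side is shown below (the helper and the service key optarget/result are hypothetical, not the actual query.py internals, and the RPC response layout may differ between Tendermint versions):

```python
import json
import urllib
import urllib2  # the demo Proxy targets Python 2.7

def rpc(endpoint, method, **params):
    # Tendermint exposes URI-style RPC endpoints, e.g. /broadcast_tx_commit?tx="...".
    url = "http://%s/%s?%s" % (endpoint, method, urllib.urlencode(params))
    return json.load(urllib2.urlopen(url))

def run(endpoint, op, target_key="optarget/result"):
    # put: commit a transaction that assigns the operation result to a service key.
    commit = rpc(endpoint, "broadcast_tx_commit", tx='"%s=%s"' % (target_key, op))
    height = commit["result"]["height"]
    # get: read the same key back together with a Merkle proof for that height.
    query = rpc(endpoint, "abci_query",
                path='"%s"' % target_key, height=height, prove="true")
    return query["result"]["response"]

# Example (assuming a local single-node setup):
# print run("localhost:46657", "factorial(a/b)")
```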
Installation and run
To run the App, a Node machine needs:
- Scala 2.12 with sbt
- Tendermint
- GNU screen (to run a single-machine cluster)

To query Nodes from the client-side Proxy, a client machine needs:
- Python 2.7 with the sha3 package installed
For a single-node run, just launch the application:
sbt run
And launch Tendermint in another terminal:
# uncomment line below to initialize Tendermint
#tendermint init
# uncomment line below to clear all Tendermint data
#tendermint unsafe_reset_all
tendermint node --consensus.create_empty_blocks=false
If Tendermint is launched first, it will periodically try to connect to the App until the App starts.
After a successful launch, the client can communicate with the application by sending RPC calls to TM Core on the local 46657 port.
Local cluster
There are scripts that automate deploying and running 4 Application nodes on the local machine.
source node4-init.sh
node4-init.sh prepares all the configuration files required to launch a 4-node cluster locally.
./node4-start.sh
node4-start.sh starts 8 screen instances (app[1-4] instances for the App and tm[1-4] for TM Core). Cluster initialization may take a few seconds, after which the client can query RPC endpoints on any of the 46157, 46257, 46357, or 46457 ports.
Other scripts allow temporarily stopping (node4-stop.sh), deleting (node4-delete.sh), and reinitializing/rerunning (node4-reset.sh) the cluster.
Sending queries
Writing operations
Examples below use localhost:46157 to query TM Core; for the 4-node single-machine cluster, requests to the other endpoints (46257, 46357, 46457) behave the same way. For a single-node launch (just one TM and one App) the default port is 46657.
To set a new key-value mapping use:
python query.py localhost:46157 put a/b=10
...
OK
HEIGHT: 2
INFO: 10
This creates the hierarchical key a/b (if necessary) and maps it to 10. The HEIGHT value can be used later to verify the INFO by querying the blockchain.
This script outputs the height value corresponding to the provided transaction. The height is available upon execution because the query.py script uses the broadcast_tx_commit RPC to send transactions to Tendermint. To query the latest transactions in the blockchain, run:
python parse_chain.py localhost:46157
This command outputs the last 50 non-empty blocks in the blockchain with a short summary of their transactions. Here one can ensure that the provided transaction is indeed included in the block with the height from the response. This verifies that a Tendermint majority (more than 2/3 of the configured validator nodes) agreed on including this transaction in that block, which is certified by their signatures. Signature details (including information about all Consensus rounds and phases) can be found by requesting the Tendermint RPC:
curl -s 'localhost:46157/block?height=_' # replace _ with actual height number
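A hedged sketch of how a client might check that its transaction is included in the block at a given height (the transaction encoding in the JSON response is assumed to be base64 here; other Tendermint versions may encode it differently):

```python
import base64
import json
import urllib2

def tx_in_block(endpoint, height, tx_string):
    # Fetch the block at the given height and scan its transaction list.
    resp = json.load(urllib2.urlopen(
        "http://%s/block?height=%d" % (endpoint, height)))
    txs = resp["result"]["block"]["data"]["txs"] or []
    return any(base64.b64decode(t) == tx_string for t in txs)

# print tx_in_block("localhost:46157", 2, "a/b=10")
```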
The get operation allows copying a value from one key to another:
python query.py localhost:46157 put "a/c=get(a/b)"
...
INFO: 10
Submitting an increment operation increments the referenced key's value and copies the old value to the target key:
python query.py localhost:46157 put "a/d=increment(a/c)"
...
INFO: 10
To prevent Tendermint from declining a transaction that repeats one of the previously applied transactions, it's possible to put any characters after ### at the end of the transaction string; this part of the string is ignored:
python query.py localhost:46157 put "a/d=increment(a/c)###again"
...
INFO: 11
The sum operation sums the values of the referenced keys and assigns the result to the target key:
python query.py localhost:46157 put "a/e=sum(a/c,a/d)"
...
INFO: 23
The factorial operation calculates the factorial of the referenced key's value:
python query.py localhost:46157 put "a/f=factorial(a/b)"
...
INFO: 3628800
The hiersum operation calculates the sum of the non-empty values of the referenced key and its hierarchical descendants (all non-empty values must be integers):
python query.py localhost:46157 put "c/asum=hiersum(a)"
...
INFO: 3628856
Operations are not applied in case of wrong arguments (non-integer values passed to increment, sum, or factorial, or a wrong number of arguments). Operations with a target key (get, increment, sum, factorial) return the new value of the target key as INFO, but this value is tentative and cannot be trusted if the serving node is not reliable. To verify the returned INFO, one needs to query the target key explicitly.
Simple queries
get reads values from the KVStore:
python query.py localhost:46157 get a/e
...
RESULT: 23
ls can be used to obtain the requested key's immediate children:
python query.py localhost:46157 ls a
...
RESULT: e f b c d
Computations without target key
As mentioned above, a run query is a combination of two subsequent steps:
- processing the operation and put-ting its result to a special key
- a Merkelized get of this key

Below is an example (note that no target key is specified here):
python query.py localhost:46157 run "factorial(a/b)"
...
RESULT: 3628800
Implementation details
A. How Proxy sees operation processing
Let's observe how run request processing looks.

1. Proxy gets run from the client.
2. Proxy decomposes the operation into 2 interactions with the cluster: a transaction submit and a response query.
3. It takes some state key opTarget (it might use different such keys and choose one of them somehow).
4. For the transaction submit, Proxy:
   - Obtains opTx as <opTarget>=<op>. The opTx binary representation is a transaction in terms of Tendermint.
   - Queries some TM via the RPC call http://<node_endpoint>/broadcast_tx_commit?tx=<opTx>.
   - In case of a correct (without error messages and not timed out) TM response, it treats the height from it as opHeight, considers the transaction committed (but not yet validated), and proceeds to the next step.
5. Proxy checks whether the opHeight-th block indeed contains opTx:
   - Queries http://<node_endpoint>/block?height=<opHeight>.
   - In case of a correct TM response, it checks for opTx existence in the transaction list section of the response and checks the block signature.
   - Upon leaving this step, Proxy is sure that the cluster has already performed the operation and committed it to the state, but it has no information about reaching consensus on the operation result.
6. Proxy waits for the opHeight+1-th block to ensure cluster consensus on the resulting app hash:
   - Waits some small time.
   - Starts periodically querying http://<node_endpoint>/block?height=<opHeight+1>.
   - Once it gets a successful response, it checks the block signatures.
   - It also gets app_hash from the response (it corresponds to the app hash after the opHeight-th block).
   - The query loop in this step could be replaced with a NewBlock subscription via the WebSocket RPC.
   - Upon leaving this step, Proxy is sure that the cluster has already performed the operation, written it to opTarget, and reached consensus about the opTarget value.
7. Proxy queries the opTarget value:
   - It makes an RPC call for a key-value read with an explicit height and a claim for proof: http://<node_endpoint>/abci_query?height=<opHeight>&prove=true&path=<opTarget>.
   - It gets a response containing value (interpreted as opResult) and proof.
   - It checks that opResult, proof, and app_hash are consistent with each other.
8. Proxy returns opResult to the client.
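A condensed sketch of this flow (helper names are hypothetical, block signature checking and proof verification are passed in as stubs, and RPC field names may vary between Tendermint versions):

```python
import json
import time
import urllib
import urllib2

def rpc(endpoint, method, **params):
    url = "http://%s/%s?%s" % (endpoint, method, urllib.urlencode(params))
    return json.load(urllib2.urlopen(url))["result"]

def process_run(endpoint, op, op_target, verify_block, verify_proof):
    # A4: submit <opTarget>=<op> and remember the commit height.
    commit = rpc(endpoint, "broadcast_tx_commit", tx='"%s=%s"' % (op_target, op))
    op_height = int(commit["height"])

    # A5: make sure the transaction is really inside the opHeight-th block.
    verify_block(rpc(endpoint, "block", height=op_height))

    # A6: wait for the (opHeight+1)-th block; its header carries the app hash of
    # the state after opHeight, so its existence implies consensus on the result.
    while True:
        try:
            next_block = rpc(endpoint, "block", height=op_height + 1)
            break
        except Exception:  # simplified retry condition: block not produced yet
            time.sleep(0.5)
    verify_block(next_block)
    app_hash = next_block["block"]["header"]["app_hash"]

    # A7: read opTarget with a Merkle proof and check it against app_hash.
    q = rpc(endpoint, "abci_query",
            path='"%s"' % op_target, height=op_height, prove="true")["response"]
    assert verify_proof(q["value"], q["proof"], app_hash)
    return q["value"]
```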
B. How Tendermint sees transaction submit
Let's look at how Tendermint on some node (say, N) treats the transaction submit (Step A4) and makes some post-submit checks (A5, A6).
1. TM gets the broadcast_tx_commit RPC call with the opTx binary string from Proxy.
2. Mempool processing:
   - TM's RPC endpoint transfers the transaction to TM's Mempool module.
   - Mempool invokes the local App's CheckTx ABCI method. If the App rejects the transaction, this information is sent back to the client and no further action happens.
   - The transaction gossip begins: opTx starts spreading through the other nodes.
   - Mempool also caches the transaction (in order not to accept repeated broadcasts of opTx).
3. Consensus processing:
   - When the current TM proposer (this is some cluster node, in general not N) is ready to create a new block, it grabs some number of the earliest not-yet-committed transactions from its local Mempool. If the transaction rate is intensive enough or even exceeds the TM/App throughput, opTx may 'wait' during several block formations before it is grabbed by Consensus.
   - As soon as opTx and other transactions reach the Consensus module, the block election starts. The proposer creates a block proposal (which describes all transactions in the current block) for the current round, then the other nodes vote. In order to reach consensus on the block, the election should pass all consensus stages (propose, pre-vote, pre-commit) with a majority of TM votes (more than 2/3 of the cluster nodes). If this doesn't work for some reason (vote timeouts, a Byzantine proposer), the proposer is changed and a new round starts (possibly with another transaction set for the current block).
4. Post-consensus interaction with the local App:
   - When the election has successfully passed all stages, each correct TM understands that consensus is reached. It then invokes the App's ABCI methods: BeginBlock, DeliverTx (for each transaction), EndBlock, Commit.
   - The information from the DeliverTx calls is then sent back to Proxy. The app_hash field from the Commit call is stored by TM before making the next block.
5. The new, height-th block's metadata and transaction set are now committed and become available via RPCs like block, blockchain, status (including the call in Step A5). However, the block's app hash, recently obtained from the App, is not yet stored in the blockchain (because the app hash for some block is stored in the blockchain metadata of the next block).
6. Next block processing:
   - Steps B2-B5 are repeated for the next, height+1-th block. This may take some time, depending on the availability and rate of new transactions and on the commit timeout settings.
   - The consensus for the height+1-th block is only possible if the majority (more than 2/3 of TMs) agree about the height-th block's app hash. So the app_hash information in the height+1-th block header refers to the app_hash provided on Step B4 for the height-th block (which is checked on Step A6).
C. How ABCI App sees transaction submit
Now let's dig into the details of processing the transaction on the App side (on node N).
1. On Step B2, TM asks the App via the CheckTx call. This is a lightweight check that works well if a significant portion of transactions might be rejected by the App for some reason (for example, a transaction becomes inconsistent after some other recently committed transaction is applied). It allows avoiding unnecessary transaction gossip and the need to permanently store such incorrect transactions in the blockchain after commit.
   - If CheckTx is invoked once but opTx is not grabbed by the proposer's Consensus module for the next block, CheckTx is reinvoked for every subsequent block until opTx is eventually grabbed by the proposer (because after some block commit, opTx might become incorrect and needs to be CheckTx-ed again).
2. On Step B4, TM invokes the App's ABCI DeliverTx method.
   - The App can reject the transaction (this is OK because the lightweight CheckTx cannot check every possible failure case) and avoid changing anything. In this case TM stores the transaction anyway because the block is already formed.
   - Normally, the App applies the transaction to the state. It maintains the 'real-time' state that has already applied all previous changes, not only the transactions of previous blocks but also all previous transactions of the current block.
3. On Step B4, TM also invokes the App's ABCI Commit method, which signals that the block commit is over. The App must return the actual state hash (app hash) as the result. As said before, this app hash corresponds to the height-th block and is stored in the height+1-th block's metadata.
Note that to follow the Tendermint architecture, the behavior of Steps C2 and C3 is purely deterministic. In the normal (non-Byzantine) scenario this guarantees both the same app hash on different nodes and the same app hash on a single node after TM replays transactions (for example, when a node recovers from a failure). This determinism covers the transaction acceptance status (accepted or rejected), the transaction application to the real-time state, and the app hash computation logic.
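This determinism can be illustrated with a toy ABCI-like handler (a Python sketch only; the real App is written in Scala and uses com.github.jtendermint.jabci):

```python
import hashlib

class ToyStateMachine(object):
    """Illustrative only: deterministic CheckTx/DeliverTx/Commit over a flat dict."""

    def __init__(self):
        self.state = {}  # 'real-time' state: committed blocks + current block so far

    def check_tx(self, tx):
        # Lightweight validation against the current state; must not mutate it.
        return b"=" in tx

    def deliver_tx(self, tx):
        # Deterministic application: the same transactions in the same order
        # produce the same state on every node.
        if not self.check_tx(tx):
            return False  # rejected, but the block still stores the transaction
        key, value = tx.split(b"=", 1)
        self.state[key] = value
        return True

    def commit(self):
        # Deterministic app hash over the whole state; returned to TM and stored
        # in the metadata of the *next* block.
        h = hashlib.sha256()
        for key in sorted(self.state):
            h.update(key + b"=" + self.state[key] + b";")
        return h.digest()
```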
D. How Tendermint sees response query
The response query is initiated by Proxy on Step A7. Queries are processed by TM's Query module. Processing is very straightforward: the query is just proxied to the local App.
E. How ABCI App sees response query
Query processing on the App side is performed in the following way:

1. The App gets height, the prove flag, and path from the query.
2. The query should be applied to the state corresponding exactly to the height-th block (this is not the 'real-time' consensus state and in general not the 'mempool' state).
   - In case the App does not store all block states and height is too old, it might reject the query.
   - Otherwise, it applies the query to the corresponding state. Queries might be complex enough, but not every query can be proved efficiently. Therefore it's expected that queries are relatively simple, like a specific value read or a hierarchical structure scan.
   - In case of the read query on Step A7, the App just reads the opTarget value previously written by applying opTx and committed in the height-th block.
3. If the prove flag is requested (as on Step A7), the App also produces a Merkle path (or some other provable information) that supports verification of the opTarget value with respect to the given height and its app hash (from the height+1-th block metadata).
4. The response containing the value, the Merkle proof, and any other information is sent back to the local TM.
Transactions and Merkle hashes
Examples above usually demonstrate a single transaction per block or empty blocks. Note that the App does not recalculate Merkle hashes during DeliverTx processing. In case of several transactions per block (when massive broadcasting of multiple transactions via the broadcast_tx_sync or broadcast_tx_async RPCs is performed), the App modifies the key tree and marks the changed paths by clearing their Merkle hashes until ABCI Commit processing.
On Commit, the App recalculates Merkle hashes along the changed paths only. Finally, the App returns the resulting root Merkle hash to Tendermint, and this hash is stored as the app_hash for the corresponding height in the blockchain.
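A sketch of this lazy recomputation (illustrative only; the Node shape mirrors the sketch in the Architecture overview, and SHA-256 stands in for the real hash function):

```python
import hashlib

class Node(object):
    def __init__(self):
        self.value, self.children, self.merkle_hash = None, {}, None

def put(root, path, value):
    # DeliverTx-time: apply the change and clear the cached hashes along the
    # modified path only; hashes elsewhere stay valid.
    node = root
    node.merkle_hash = None
    for segment in path.split("/"):
        node = node.children.setdefault(segment, Node())
        node.merkle_hash = None
    node.value = value

def commit(root):
    # Commit-time: recompute hashes only where they were cleared; untouched
    # subtrees reuse their cached hashes. The root hash becomes the app_hash.
    if root.merkle_hash is not None:
        return root.merkle_hash
    h = hashlib.sha256()
    h.update((root.value or "").encode("utf-8"))
    for name in sorted(root.children):
        h.update(name.encode("utf-8"))
        h.update(commit(root.children[name]))
    root.merkle_hash = h.digest()
    return root.merkle_hash
```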
Note that the described merkelized structure is just for demo purposes and is not self-balancing; it remains efficient only as long as the user transactions keep it relatively balanced. Something like a Patricia tree would be more appropriate to achieve self-balancing.
Dispute cases
The examples below illustrate different situations in which the cluster's liveness and safety properties are violated. These can be caused not only by Byzantine behavior of nodes but also by failures or bugs on particular cluster nodes. The examples show that such situations can be efficiently detected as long as at least one correct node exists. To resolve such disputes in a production system, some Supervisor side should exist, and correct nodes should be able to communicate with it.
Dispute case 1: some nodes honest, some not, no quorum
When the last block is the height-th one and there is no quorum (neither honest nor Byzantine) in the voting for the height+1-th block, liveness is violated and new blocks cannot be formed. Such a situation might happen if the cluster cannot reach an agreement about the next block. Even if TM Core works as expected, different Apps on different nodes might provide different app hashes for the height-th block to their local TMs.
To simulate app_hash disputes in the 4-node cluster, the App uses the special key wrong. Every node's App (indexed 1 to 4, which corresponds to their ports 46158, 46258, 46358, 46458) interprets any occurrence of its own index in the wrong value as a flag to provide a wrong app_hash to TM Core. This convention works well to illustrate Dispute case 1. First, let's try using tx to submit a new wrong value:
python query.py localhost:46157 tx wrong=34
HEIGHT: 3
INFO: 34
OK
This invocation returns the info 34 and the OK status. At first glance everything is fine because the height-th (actually the 3rd) block is formed and an INFO equal to the new value 34 is obtained. However, this INFO should be considered tentative: despite the successful formation of the 3rd block, it is necessary to wait for the 4th block, which should contain the app_hash for the 3rd block. Note that the source of INFO is just the DeliverTx output of a single App, and this output is neither merkelized nor signed by the other nodes.
Now the blockchain is in an inconsistent state. Let's reset it via node4-reset.sh, wait some time for cluster initialization, and use another command, txverify:
python query.py localhost:46157 txverify wrong=34
HEIGHT: 3
BAD : Cannot verify tentative result '34'!
txverify waits for the height+1-th block before responding. This behavior is similar to the op command logic. As before, the 3rd block formation is successful, but that's not enough for txverify; it waits for the 4th block. After some timeout, it responds that this block is still not available, so the tentative value 34 is not confirmed.
The App itself also monitors block creation. By checking it (screen -x app1), one can observe the following message in the App's log:
NO CLUSTER QUORUM!
This message is produced by the Monitor thread of the App, which periodically checks the following condition: if 1 second has elapsed since the last non-empty block in the blockchain, there must be an empty block after that block. When developing a production system, such a criterion can also be used to detect this kind of dispute and signal the Supervisor that the cluster needs to be fixed. Of course, the timeout value (1 second by default) is configurable.
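A sketch of that criterion (hypothetical helper; the real Monitor runs inside the Scala App):

```python
import time

def check_cluster_quorum(last_nonempty_block_time, has_empty_block_after, timeout=1.0):
    # If more than `timeout` seconds passed since the last non-empty block and no
    # empty block followed it, block production has stalled: no cluster quorum.
    if time.time() - last_nonempty_block_time > timeout and not has_empty_block_after:
        print("NO CLUSTER QUORUM!")
        return False
    return True
```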
Dispute case 2: dishonest quorum, minority of honest nodes
This case can also be illustrated using the wrong key:
python query.py localhost:46157 txverify wrong=234
HEIGHT: 3
BAD : Cannot verify tentative result '234'!
This message is the same as before, but the situation is actually different. All nodes except Node 1 return a wrong app hash to their TMs, but now those 'wrong' nodes have a quorum! Therefore the result is unconfirmed only from the point of view of Node 1. By checking its log (screen -x app1), another Monitor warning can be observed:
DISAGREEMENT WITH CLUSTER QUORUM!
To achieve this detection, the App's Monitor periodically requests the next block from its peers' TM Core RPCs and compares its own app_hash with theirs.
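A sketch of that comparison (peer endpoints and field names are assumptions; the header of the height+1-th block carries the app hash for the height-th block):

```python
import json
import urllib2

def check_agreement(own_app_hash, peer_rpc_endpoints, height):
    # Compare our own app hash for `height` with what peers committed into the
    # header of the (height+1)-th block.
    for peer in peer_rpc_endpoints:
        resp = json.load(urllib2.urlopen(
            "http://%s/block?height=%d" % (peer, height + 1)))
        if resp["result"]["block"]["header"]["app_hash"] != own_app_hash:
            print("DISAGREEMENT WITH CLUSTER QUORUM!")
            return False
    return True

# check_agreement(my_hash, ["localhost:46257", "localhost:46357", "localhost:46457"], 3)
```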
Let's reset the cluster and try again: submit the same transaction, but connect to another node:
python query.py localhost:46257 txverify wrong=234
HEIGHT: 3
OK
RESULT: 234
PROOF : a7ff...
As expected, from the point of view of the 'wrong' nodes everything is OK, and the tentative result is later confirmed by explicitly querying the wrong key with a Merkle proof.
This example shows that in the presence of a dishonest quorum, Tendermint safety is violated and the blockchain is in a falsified state. However, in a production system, the App's Monitor can efficiently detect such a problem and raise a dispute to the Supervisor.
Dispute case 3: honest quorum, some nodes dishonest or not available
When a quorum (2/3+ of the cluster nodes) exists, the availability of the other nodes does not influence the cluster's safety or liveness. This demo app does not implement any special checks for nodes that are absent or Byzantine during operation processing. Let's illustrate this using the wrong key:
python query.py localhost:46157 txverify wrong=4
HEIGHT: 3
OK
RESULT: 4
PROOF : a7ff...
It's supposed that if some node disagrees with the quorum, it needs to raise an alert itself, as in Dispute case 2, so cases 2 and 3 are symmetric in general.