fixes after review

Dmitry Sergeev 2018-06-15 11:47:43 +05:00
parent 82e046c073
commit 497c40f768
17 changed files with 14 additions and 243 deletions

View File

@@ -1,4 +1,4 @@
# Tendermint research
Research artifacts and tools for Tendermint
[Tendermint demo application](https://github.com/fluencelabs/tendermint_demo/tree/master/tmdemoapp)
[Tendermint demo application](tmdemoapp/docs)

View File

@@ -1,132 +0,0 @@
# Tendermint Demo ABCI KVStore on Scala
This is a demo application implementing the Tendermint ABCI interface. It models an in-memory key-value string storage. Keys here are hierarchical, `/`-separated. This key hierarchy is *merkelized*, so every node stores the Merkle hash of its associated value (if present) and its children.
The application is compatible with `Tendermint v0.19.x` and uses `com.github.jtendermint.jabci` for Java ABCI definitions.
## Installation and running
For a single-node run, just launch the application:
```bash
sbt run
```
Then launch Tendermint:
```bash
# uncomment line below to initialize Tendermint
#tendermint init
# uncomment line below to clear all Tendermint data
#tendermint unsafe_reset_all
tendermint node --consensus.create_empty_blocks=false
```
If Tendermint is launched first, it periodically tries to connect to the app until the app starts.
## Changing and observing the application state: transactions and queries
Tendermint offers two main ways of interaction with the app: transactions and queries.
Tendermint treats transactions as opaque byte arrays, and after block formation they are stored in the blockchain as byte arrays as well. The transaction semantics only make sense to the application once Tendermint delivers a transaction to it. A transaction could (and usually does) change the application state upon being committed and could provide some metadata to verify that it was actually added to the blockchain and applied to the state. However, to get trusted information about a committed transaction's result, one needs to query the blockchain explicitly.
Queries, in contrast to transactions, do not change the state and are not stored in the blockchain. Queries can only be applied to already committed state, which is why they can be used to get trusted information (signed by the quorum during voting for one of the existing blocks) while requesting only a single node.
To work with transactions and queries, use the Python scripts in the [`parse`](https://github.com/fluencelabs/tendermint_research/tree/master/parse) directory.
## Making transactions
To set a new key-value mapping use:
```bash
python query.py localhost:46657 tx a/b=10
...
OK
HEIGHT: 2
INFO: 10
```
This creates the hierarchical key `a/b` (if necessary) and maps it to `10`. The `HEIGHT` value can later be used to verify `INFO` by querying the blockchain.
The script outputs the height value corresponding to the provided transaction. The height is available right away because the `query.py` script uses the `broadcast_tx_commit` RPC to send transactions to Tendermint. You can later find the latest transactions by running:
```bash
python parse_chain.py localhost:46657
```
This command outputs the last 50 non-empty blocks in the chain with a short summary of their transactions. Here you can check that the submitted transaction is indeed included in the block with the height from the response. This verifies that a Tendermint majority (more than 2/3 of the configured validator nodes) agreed to include this transaction in that block, as certified by their signatures. Signature details (including information about all consensus rounds and phases) can be obtained by requesting the Tendermint RPC:
```bash
curl -s 'localhost:46657/block?height=_' # replace _ with actual height number
```
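The same inclusion check can be scripted. Below is a hedged Python sketch assuming the `Tendermint v0.19.x` JSON-RPC response shape, where block transactions are returned base64-encoded:
```python
import base64
import requests

# Check that the transaction submitted above is included in the block at the
# height reported by broadcast_tx_commit (height 2 and the tx are taken from
# the example above).
resp = requests.get("http://localhost:46657/block", params={"height": 2})
txs = resp.json()["result"]["block"]["data"]["txs"] or []
decoded = [base64.b64decode(t).decode() for t in txs]
assert "a/b=10" in decoded
```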
A `get` transaction copies a value from one key to another:
```bash
python query.py localhost:46657 tx a/c=get:a/b
...
INFO: 10
```
Submitting an `increment` transaction increments the referenced key's value and copies the old value to the target key:
```bash
python query.py localhost:46657 tx a/d=increment:a/c
...
INFO: 10
```
To prevent Tendermint from declining a transaction that repeats one of the previously applied transactions, you can append arbitrary characters after `###` at the end of the transaction string; this part of the string is ignored:
```bash
python query.py localhost:46657 tx a/d=increment:a/c###again
...
INFO: 11
```
A `sum` transaction sums the values of the referenced keys and assigns the result to the target key:
```bash
python query.py localhost:46657 tx a/e=sum:a/c,a/d
...
INFO: 23
```
A `factorial` transaction calculates the factorial of the referenced key's value:
```bash
python query.py localhost:46657 tx a/f=factorial:a/b
...
INFO: 3628800
```
A `hiersum` transaction calculates the sum of the non-empty values of the referenced key and its descendants in the hierarchy (all non-empty values must be integers):
```bash
python query.py localhost:46657 tx c/asum=hiersum:a
...
INFO: 3628856
```
Transactions with wrong arguments (non-integer values for `increment`, `sum`, or `factorial`, or a wrong number of arguments) are not applied. Transactions with a target key (`get`, `increment`, `sum`, `factorial`) return the new value of the target key as `INFO`, but this value cannot be trusted if the serving node is unreliable. To verify the returned `INFO`, one needs to `query` the target key explicitly.
When many transactions are broadcast via the `broadcast_tx_sync` or `broadcast_tx_async` RPCs, the app does not calculate Merkle hashes during `DeliverTx` processing. Instead, it modifies the key tree and marks the changed paths by clearing their Merkle hashes until ABCI `Commit` processing. On `Commit`, the app recalculates Merkle hashes along the changed paths only. Finally, the app returns the resulting root Merkle hash to Tendermint, and this hash is stored as the `app_hash` for the corresponding height in the blockchain.
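A minimal sketch of this lazy recomputation (hypothetical names and hashing layout; the actual Scala implementation may differ): `set` clears hashes along the changed path during `DeliverTx`, and `commit` recomputes only where hashes were cleared, reusing cached hashes for untouched subtrees.
```python
import hashlib

class Node:
    def __init__(self):
        self.value = None
        self.children = {}   # name -> Node
        self.merkle = None   # None marks a changed ("dirty") node

    def set(self, path, value):
        # DeliverTx: write the value, clearing hashes along the path only
        self.merkle = None
        if not path:
            self.value = value
        else:
            self.children.setdefault(path[0], Node()).set(path[1:], value)

    def commit(self):
        # Commit: recompute hashes along dirty paths; clean subtrees are cached
        if self.merkle is None:
            parts = [self.value or ""]
            parts += [c.commit() for _, c in sorted(self.children.items())]
            self.merkle = hashlib.sha3_256(" ".join(parts).encode()).hexdigest()
        return self.merkle

# e.g.: root = Node(); root.set("a/b".split("/"), "10"); print(root.commit())
```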
Note that the described merkelized structure exists for demo purposes only and is not self-balancing; it remains efficient only as long as user transactions keep it relatively balanced. Something like a [Patricia tree](https://github.com/ethereum/wiki/wiki/Patricia-Tree) would be more appropriate for self-balancing.
## Making queries
Use `get:` queries to read values from the KVStore:
```bash
python query.py localhost:46657 query get:a/e
...
RESULT: 23
```
Use `ls:` queries to list the key hierarchy:
```bash
python query.py localhost:46657 query ls:a
...
RESULT: e f b c d
```
These commands are implemented via the `abci_query` RPC (which immediately proxies to the ABCI `Query` method in the app). Together with the requested information, the app returns a Merkle proof of this information. The Merkle proof is a comma-separated list (`<level-1-proof>,<level-2-proof>,...`) of level proofs along the path to the requested key. In this implementation, the SHA-3 hash of a level in the list is exactly:
* either one of the space-separated items from the upper (previous in the comma-separated list) level proof;
* or the root app hash, for the uppermost (first) level proof.
The app stores historical changes and handles queries for any particular height. The requested height (the latest by default) and the corresponding `app_hash` are also returned to the `query` Python script. This combination (result, Merkle proof, and `app_hash` from the blockchain) verifies the correctness of the result, because this `app_hash` could only appear in the blockchain as a result of a consistent Tendermint quorum decision.
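A hedged sketch of the client-side check this enables, verifying exactly the chaining property stated above (SHA3-256 and lowercase hex encoding are assumptions; the real script may encode levels differently):
```python
import hashlib

def verify_proof(proof: str, app_hash: str) -> bool:
    # Each level's SHA-3 must be the root app hash (for the first level) or
    # one of the space-separated items of the previous level's proof.
    allowed = {app_hash.lower()}
    for level in proof.split(","):
        digest = hashlib.sha3_256(level.encode()).hexdigest()
        if digest not in allowed:
            return False
        allowed = set(level.lower().split(" "))
    return True
```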
## Heavy-weight transactions
Applying simple transactions with different target keys keeps the sizes of the blockchain (which contains the transaction list) and the app state relatively close to each other. If target keys are often repeated, the blockchain size becomes much larger than the app state size. To demonstrate the opposite situation (an app state much larger than the blockchain), *range* transactions are supported:
```bash
python query.py localhost:46657 tx 0-200:b/@1/@0=1
...
INFO: 1
```
Here the `0-200:` prefix means that this transaction consists of 200 subsequent key-value mappings, each obtained by applying the template `b/@1/@0=1` to a counter running from 0 to 199, inclusive. `@0` and `@1` are substitution markers for the two lowermost hexadecimal digits of the counter, i.e. this transaction creates 200 keys, `b/0/0`, `b/0/1`, ..., `b/c/7`, and puts `1` in each of them.
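This expansion is easy to illustrate (a hypothetical snippet mirroring the description above, not the actual app code):
```python
# Expand "0-200:b/@1/@0=1": apply the template to counters 0..199; @0 and @1
# stand for the two lowermost hexadecimal digits of the counter.
template, count = "b/@1/@0=1", 200
mappings = []
for counter in range(count):
    digits = format(counter, "02x")
    mappings.append(template.replace("@1", digits[-2]).replace("@0", digits[-1]))
print(mappings[0], mappings[-1])  # b/0/0=1 b/c/7=1
```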
We can check the result by computing the hierarchical sum of `b`'s children:
```bash
python query.py localhost:46657 tx c/bsum=hiersum:b
...
INFO: 200
```

View File

@@ -1,99 +0,0 @@
# Fluence cluster typical operation processing
Fluence is a distributed computation platform. It contains the following components:
* Client proxy (Proxy)
* Node Tendermint (TM) with important modules: Mempool, Consensus and Query
* Node ABCI App (App)
Clients typically interact with Fluence via a local Proxy. Basically, the Proxy provides an API like:
```scala
def doSomeOperation(req: SomeRequest): SomeResponse
```
## Normal-case operation
### A. How client sees operation processing
From the client's point of view, it just calls an API function with such a signature synchronously (in blocking mode), or calls an asynchronous API request function (with a provided response callback).
### B. How Proxy sees operation processing
Let's look at how operation processing works from the Proxy's point of view (a code sketch follows the list).
1. Proxy gets an API call from the client.
2. Proxy decomposes the operation into 2 interactions with the cluster: a transaction submit and a response query.
3. Proxy obtains some state key `opTarget` (chosen from a pool of such temporary target keys).
4. For the transaction submit, Proxy:
* Serializes the API call to some string `opTx` like `opTarget=SomeOperation(reqParam1,reqParam2)`. Its binary representation is a *transaction* in Tendermint terms.
* Queries some TM via an RPC call: `http://<node_host>:46678/broadcast_tx_commit?tx=<opTx>`.
* If the TM response is correct (no error messages and not timed out), it treats the `height` from the response as `opHeight`, considers the transaction committed (but not yet validated), and proceeds to the next step.
5. Proxy checks whether the `opHeight`-th block indeed contains `opTx`:
* Queries `http://<node_endpoint>/block?height=<opHeight>`.
* If the TM response is correct, it checks for the existence of `opTx` in the transaction list section of the response and checks the block signature.
* Upon leaving this step, Proxy is sure that the cluster has already performed the operation and committed it to the state, but it has no information about whether consensus was reached on the operation result.
6. Proxy waits for the `opHeight+1`-th block to ensure cluster consensus on the resulting app hash:
* Waits a short time.
* Starts periodically querying `http://<node_endpoint>/block?height=<opHeight+1>`.
* Once it gets a successful response, it checks the block signatures.
* It also gets the `app_hash` from the response (it corresponds to the app hash after the `opHeight`-th block).
* The query loop in this step can be replaced with a `NewBlock` subscription via the WebSocket RPC.
* Upon leaving this step, Proxy is sure that the cluster has already performed the operation, written the result to `opTarget`, and reached consensus on the `opTarget` value.
7. Proxy queries the `opTarget` value:
* It makes an RPC call for a key-value read with an explicit height and a request for proof: `http://<node_endpoint>/abci_query?height=<opHeight>&prove=true&path=<opTarget>`.
* It gets a response containing `value` (interpreted as `opResult`) and `proof`.
* It checks that `opResult`, `proof`, and `app_hash` are consistent with each other.
8. Proxy deserializes `opResult` as `SomeResponse` and returns it to the client.
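Below is a hedged Python sketch of steps 4-7, assuming the `Tendermint v0.19.x` RPC response shapes (base64-encoded transactions in `block`, a `height` field in `broadcast_tx_commit`); signature and proof checks are elided, and all names are illustrative:
```python
import base64
import time

import requests

NODE = "http://localhost:46657"  # assumed RPC endpoint of a cluster node

def rpc(method, **params):
    # raises KeyError if the node returns an error instead of a result
    return requests.get(f"{NODE}/{method}", params=params).json()["result"]

def do_operation(op_tx: str):
    # Step 4: submit the transaction and read opHeight from the response
    op_height = int(rpc("broadcast_tx_commit", tx=f'"{op_tx}"')["height"])

    # Step 5: check that the opHeight-th block indeed contains opTx
    block = rpc("block", height=op_height)["block"]
    txs = [base64.b64decode(t).decode() for t in block["data"]["txs"] or []]
    assert op_tx in txs  # block signature verification elided

    # Step 6: wait for the opHeight+1-th block carrying the agreed app_hash
    while True:
        try:
            header = rpc("block", height=op_height + 1)["block"]["header"]
            break
        except KeyError:
            time.sleep(0.5)
    app_hash = header["app_hash"]

    # Step 7: query the target key at the explicit height, requesting a proof
    op_target = op_tx.split("=", 1)[0]  # demo tx format: "<target>=<op>"
    query = rpc("abci_query", height=op_height, prove="true", path=op_target)
    return query, app_hash  # Step 8: check consistency, then deserialize
```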
### C. How Tendermint sees transaction submit
Let's look at how Tendermint on some node (say, N) treats the transaction submit (Step B4) and makes the post-submit checks (B5, B6).
1. TM gets the `broadcast_tx_commit` RPC call with the `opTx` binary string from Proxy.
2. Mempool processing:
* TM's RPC endpoint transfers the transaction to TM's *Mempool* module.
* Mempool prepares a callback to provide the RPC response once the transaction is committed or rejected.
* Mempool invokes the local App's `CheckTx` ABCI method. If the App returns a non-zero code, the transaction is considered rejected, this information is sent to the client via the callback, and no further action happens.
* If the App returns a zero code, transaction gossip begins: `opTx` starts spreading through the other nodes.
* Mempool also caches the transaction (in order not to accept repeated broadcasts of `opTx`).
3. Consensus processing:
* When the current TM proposer (the *Consensus* module of some node PN, where PN is possibly N) is ready to create a new block, it grabs some number of the oldest not-yet-committed transactions from its local Mempool. If the transaction rate is intensive enough, or even exceeds the TM/App throughput, `opTx` may 'wait' through several block formations before it is grabbed by Consensus.
* As soon as `opTx` and other transactions reach the Consensus module, the block election starts. The proposer creates a block proposal (describing all transactions in the current block) for the current *round*; then the other nodes vote. To reach consensus on the block, the election must pass all consensus stages (propose, pre-vote, pre-commit) with a majority of TM votes (more than 2/3 of the TMs). If this fails for some reason (vote timeouts, a Byzantine proposer), the proposer is changed and a new round starts (possibly with another transaction set for the current block).
4. Post-consensus interaction with the local App:
* When the election has successfully passed all stages, each correct TM understands that consensus is reached. It then invokes the App's ABCI methods: `BeginBlock`, `DeliverTx` (for each transaction), `EndBlock`, `Commit`.
* The information from `opTx`'s `DeliverTx` call is then sent back to Proxy via the callback prepared in Step C2.
* The `app_hash` field from the `Commit` call is stored by TM before making the next block.
5. The new block's metadata and transaction set are now associated with the height `height` and become available via RPCs like `block`, `blockchain`, and `status` (including the call in Step B5). However, the app hash recently obtained from the App is not yet stored in the blockchain (because the app hash for some block is stored in the blockchain metadata of the next block).
6. Next block processing:
* Steps 2-5 are repeated for the next, `height+1`-th block. This may take some time, depending on the availability and rate of new transactions and on the commit timeout settings.
* Consensus on the `height+1`-th block is only possible if the majority (more than 2/3 of the TMs) agrees on the `height`-th block's app hash. So the `app_hash` information in the `height+1`-th block header refers to the `app_hash` provided in Step C4 for the `height`-th block (which is checked in Step B6); a minimal check is sketched below.
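A hedged Python sketch of that check, assuming the `Tendermint v0.19.x` RPC response shape:
```python
import requests

# The header of block height+1 carries the app hash the App returned from
# Commit after block height, so the two can be compared directly.
def app_hash_after(height: int, node: str = "http://localhost:46657") -> str:
    resp = requests.get(f"{node}/block", params={"height": height + 1}).json()
    return resp["result"]["block"]["header"]["app_hash"]
```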
### D. How ABCI App sees transaction submit
Now let's dig into the details of transaction processing on the App side (on node N).
1. In Step C2, TM asks the App via the `CheckTx` call. This is a lightweight check that works well when a significant share of transactions might be rejected by the App for some reason (for example, a transaction becomes inconsistent after some other, recently committed transaction is applied). It saves transaction gossip and avoids permanently storing such a transaction in the blockchain after commit. In this step the App can return a non-zero code if some check against the latest committed state fails (`CheckTx` is not intended to make any changes, even to temporary internal structures).
* If `CheckTx` was invoked once but `opTx` is not grabbed by the proposer's Consensus module for the next block, `CheckTx` is reinvoked for every subsequent block until `opTx` is eventually grabbed by the proposer (because after some block commit, `opTx` might become incorrect).
2. In Step C4, TM invokes the App's ABCI `DeliverTx` method.
* The App can reject the transaction (this is fine because the lightweight `CheckTx` does not necessarily check every possible failure case), change nothing, and return a non-zero code. In this case TM stores the transaction anyway (because the block is already formed), but passes the error code and any information from the App to the Proxy that called the `broadcast_tx_commit` RPC.
* Normally the App returns a zero code and applies the transaction to its state. It maintains a 'real-time' state that has already applied all previous changes: not only the transactions of previous blocks, but also all preceding transactions in the current block.
3. In Step C4, TM also invokes the App's ABCI `Commit` method, signaling that the block commit is over. The App must return the actual state hash (*app hash*) as the result. As said before, this app hash corresponds to the `height`-th block and is stored in the `height+1`-th block's metadata.
Note that the behavior in Steps D2 and D3 must be purely deterministic, which in the normal (non-Byzantine) scenario guarantees both the same app hash across different nodes and the same app hash from a single node after TM replays the transactions (for example, after a node failure). This determinism covers the transaction acceptance status (accepted or rejected), the transaction application to the real-time state, and the app hash computation logic.
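A minimal sketch of such a deterministic handler, assuming a toy `key=value` transaction format and a simplified ABCI-style interface rather than the real `com.github.jtendermint.jabci` bindings:
```python
import hashlib

class DemoAbciApp:
    def __init__(self):
        self.committed = {}  # last committed state, used by CheckTx
        self.realtime = {}   # state with all delivered txs already applied

    def check_tx(self, tx: bytes) -> int:
        # D1: lightweight validation against committed state; no state changes
        return 0 if b"=" in tx else 1

    def deliver_tx(self, tx: bytes) -> int:
        # D2: deterministic application to the real-time state
        key, sep, value = tx.partition(b"=")
        if not sep or not key:
            return 1  # rejected; TM keeps the tx in the block anyway
        self.realtime[key] = value
        return 0

    def commit(self) -> bytes:
        # D3: deterministic app hash; sorting keeps replays reproducible
        self.committed = dict(self.realtime)
        blob = b";".join(k + b"=" + v for k, v in sorted(self.realtime.items()))
        return hashlib.sha3_256(blob).digest()
```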
### E. How Tendermint sees response query
The response query is initiated by Proxy in Step B7. Queries are processed by TM's *Query* module. Processing is very straightforward: the query is simply proxied to the local App.
### F. How ABCI App sees response query
Query processing on the App side is performed in the following way:
1. The App gets the `height`, the `prove` flag, and the `path` from the query.
2. The query should be applied to the state exactly corresponding to the `height`-th block (this is not the 'real-time' consensus state and, in general, not the 'mempool' state).
* If the App does not store all block states and `height` is too old, it might reject the query.
* Otherwise, it applies the query to the corresponding state. Queries might be rather complex, but not every query can be proved efficiently, which is why queries are usually expected to look like a specific value read or a hierarchical structure scan.
* In the case of the read query from Step B7, the App just reads the `opTarget` value previously written by applying `opTx` and committed in the `height`-th block.
3. If the proof flag is requested (as in Step B7), the App also produces a Merkle path (or some other provable information) that enables verification of the `opTarget` value with respect to the given `height` and its app hash (from the `height+1`-th block's metadata).
4. The response containing the value, the Merkle proof, and any other information is sent back to the local TM (a sketch follows this list).
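A minimal sketch of this App-side handling, assuming the App keeps a plain per-height snapshot map; the names and the proof stub are hypothetical:
```python
def make_merkle_path(state: dict, path: str) -> str:
    # placeholder for real Merkle path construction (see 'Making queries')
    return ""

def handle_query(snapshots: dict, height: int, path: str, prove: bool) -> dict:
    state = snapshots.get(height)
    if state is None:
        # the height is too old and its state is not stored any more
        return {"code": 1, "log": "no state for the requested height"}
    value = state.get(path)  # a specific value read
    proof = make_merkle_path(state, path) if prove else None
    return {"code": 0, "value": value, "proof": proof, "height": height}
```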
## Dispute cases
The next sections describe the behavior in case of disagreement between cluster nodes about the `height`-th block's app hash.
### Dispute case 1: honest quorum, some nodes dishonest or not available
TODO
### Dispute case 2: some nodes honest, some not, no quorum
TODO
### Dispute case 3: dishonest quorum, minority of honest nodes
TODO

View File

@@ -1,18 +1,18 @@
# Tendermint Demo Key-Value Store on Scala
# Tendermint Verifiable Computation and Storage Demo
This is a demo application modeling an in-memory distributed key-value storage. It allows storing and modifying key-value pairs, requesting them, and performing some operations on their values.
![Key-values in cluster](cluster_key_value.png)
This demo application shows how verifiable computations might be processed by a distributed cluster of nodes. It comes with a set of hardcoded operations that can be invoked by the client. Each requested operation is computed by every cluster node (ignoring failures or Byzantine cases), and if any node disagrees with the computation outcome, it can submit a dispute to an external Judge.
Results of each computation are stored on the cluster nodes and can be later on retrieved by the client. The storage of the results is secured with Merkle proofs, so malicious nodes can't substitute them with bogus data.
Because every computation is verified by the cluster nodes and computation outcomes are verified using Merkle proofs, the client normally doesn't have to interact with the entire cluster. Moreover, the client can interact with as little as a single node; this won't change the safety properties. However, liveness might be compromised, for example, if the node the client is interacting with is silently dropping incoming requests.
A *distributed* property means that the app might be deployed across a cluster of several machines (nodes) and tolerate failures of some subset of those machines. At the same time, the client typically interacts with only a single node and the interaction protocol provides some guarantees of availability and consistency.
![Nodes in cluster](cluster_nodes.png)
## Motivation
The application is intended to show a proof-of-concept of a system that provides the following properties:
* Support of arbitrary deterministic operations (including simple reads/writes and complex aggregations, time-consuming calculations etc.)
The application is a proof-of-concept of a system with the following properties:
* Support of arbitrary deterministic operations: simple reads/writes as well as complex and time-consuming calculations
* Having high throughput (1000 transactions per second) and low latency (1-2 seconds) of operations
* Having every operation response verifiable (and thus trusted by the client)
  * Either validated by storing all operation data in the blockchain (in this case, such data is signed by the majority of nodes)
  * Or validated by providing Merkle proofs to the client (in this case, the client has all the required information to validate the response)
* Ability to restore liveness and even safety after violating typical Byzantine quorum requirements (1/3 or more failed nodes): every node can rapidly detect problems in the blockchain or disagreement with the rest of the nodes
## Architecture overview
@@ -30,13 +30,15 @@ The application is written in Scala 2.12. It is compatible with `Tendermint v0.1
It models an in-memory key-value string storage. Keys here are hierarchical, `/`-separated. This key hierarchy is *merkelized*, so every node stores the Merkle hash of its associated value (if present) and its children.
![Architecture](architecture.png)
![Key-values in cluster](cluster_key_value.png)
The entire application consists of the following components:
* **Client** proxy (**Proxy**)
* Node Tendermint (**TM** or **TM Core**) with important modules: Mempool, Consensus and Query
* Node Tendermint (**TM** or **TM Core**) with notable modules: Mempool, Consensus and Query
* Node ABCI Application itself (**App** or **ABCI App**)
![Architecture](architecture.png)
### Operations
Clients typically interact with Fluence via some local **Proxy**. This Proxy might be implemented in any language (because it communicates with TM Core by querying RPC endpoints); for example, a Scala implementation of *some operation* may look like `def doSomeOperation(req: SomeRequest): SomeResponse`. This application uses a simple (but powerful) Python `query.py` script as the Proxy to perform arbitrary operations, including:
* Write transactions

Binary file not shown.

Before

Width:  |  Height:  |  Size: 72 KiB

After

Width:  |  Height:  |  Size: 449 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 163 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 155 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 139 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 22 KiB

After

Width:  |  Height:  |  Size: 127 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 27 KiB

After

Width:  |  Height:  |  Size: 173 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 25 KiB

After

Width:  |  Height:  |  Size: 177 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 26 KiB

File diff suppressed because one or more lines are too long

Binary file not shown.

Before

Width:  |  Height:  |  Size: 18 KiB

After

Width:  |  Height:  |  Size: 118 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 20 KiB

After

Width:  |  Height:  |  Size: 126 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 25 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 22 KiB