nanocryk · 3aefe62d · 2e1fa593 · 7d4950ee · 4ded65d8 · af6e52e2
--- a/rfc/0001 Abstract Syntax Tree-based output script locks.md 0 → 100644

+ 482

− 0

View file @ 3e261ab6

Open in Web IDE
+++ b/rfc/0001 Abstract Syntax Tree-based output script locks.md 0 → 100644

+ 482

− 0

View file @ 3e261ab6

Open in Web IDE
+# Abstract Syntax Tree-based output script locks
+```txt
+RFC: 1 (Update 2)
+Title: Abstract Syntax Tree-based output script locks
+Type: Protocol change (Hardfork)
+Parent: None
+Status: WIP
+Author: nanocryk <nanocryk@gmail.com>
+Created: 2017-11-22
+Last edited: 2017-12-20
+License: AGPL-3
+```
+> This document is currently being written and thus it's not complete. Specifications may change in the future.
+## Abstract
+In the current Duniter protocol input and output conditions are stored in machine-readable BNF text format. This is good for human readability, but it requires parsing and takes place due to its textual format.
+To lower blocksize and transactions weights, or to allow more complex conditions; it could be preferable to use a more simple binary format which need little or no parsing, close to memory and that can be converted back to text format. It can also allow to add new features without hard-forking the protocol.
+## Table of Contents
+1. [Basic script structure](#1-basic-script-structure)
+    1. [Scripts as ASTs](#11-scripts-as-asts)
+    1. [Signatures (`SIG`)](#12-signatures-sig)
+    1. [Unlock inputs (`INPUT`)](#13-unlock-inputs-input)
+    1. [Avoid redudancies (`INCLUDE`)](#14-avoid-redudancies-include)
+    1. [Avoid providing unused branches (`AST_HASH`/`EXEC_AST_HASH`)](#15-avoid-providing-unused-branches-branch_hashscript_hash)
+1. [Binary format](#2-binary-format)
+    1. [Document syntax](#21-document-syntax)
+    1. [Values](#22-values)
+    1. [Operators](#23-operators)
+        1. [Language operators (`[0x00;0x0f]`)](#231-language-operators-0x000x0f)
+        1. [Document data (`[0x10;0x2f]`)](#232-document-data-0x100x2f)
+        1. [Boolean operators (`[0x30;0x3f]`)](#233-boolean-operators-0x300x3f)
+        1. [Arithmetic operators (`[0x40;0x6f]`)](#234-arithmetic-operators-0x400x6f)
+        1. [Locks (`[0x70;0x9f]`)](#235-locks-0x700x9f)
+        1. [AST group format](#24-ast-group-format)
+        1. [Parameter group format](#25-parameter-group-format)
+1. [Examples](#3-examples)
+    1. [Simple signature](#31-simple-signature)
+    1. [2 out of 3 signatures (simple)](#32-2-out-of-3-signatures-simple)
+    1. [2 out of 3 signatures (merkelized + includes)](#33-2-out-of-3-signatures-merkelized-includes)
+## 1. Basic script structure
+### 1.1. Scripts as ASTs
+An output script allow to define conditions on how funds can be spent. The most basic case is to lock funds with a given public key, and requiring a signature (made with the associated private key) to use them as input of a new transaction.
+Since we define a *condition*, we can express it as a *boolean function*.
+We want to have multiple ways of spending an output, such as multi-signatures or time locks. It means we must have *operators* which combines conditions into one. One way to store these functions are as *Abstract Syntax Trees* : Each operator is a tree with its inputs as leaves. This leaves can be primitive values or other operators.
+For a classic 1-signature lock, an AST could look like :
+```text
+           SIG
+            |
+      -------------
+  PublicKey   Signature
+```
+- `SIG` would be an operator verifying the association between the signature and the public key.
+- `PublicKey` is stored in the script.
+- `Signature` is provided as an input when unlocking funds.
+If we want a *1 out of 2 signatures*, we could have :
+```txt
+                          OR
+                          |
+             ----------------------------
+            SIG                        SIG
+             |                          |
+      ---------------            --------------
+  PublicKeyA   SignatureA   PublicKeyB   SignatureB
+```
+- `OR` would be an operator returning `True` if at least one input is `True`.
+For a *2 out of 3 signatures*, we could have :
+```txt
+                                        GREAT_EQUAL
+                                        -----------
+                                       SUM        2
+                                        |
+             ----------------------------------------------------
+            SIG                        SIG                      SIG
+             |                          |                        |
+      ---------------            --------------           ---------------
+  PublicKeyA   SignatureA   PublicKeyB   SignatureB   PublicKeyC   SignatureC
+```
+- `SUM` would be an operator computing the sum of its leaves.
+- `GREAT_EQ` would be an operator comparing if the *left leaf* is *greater or equal* to the *right leaf*.
+### 1.2. Signatures (`SIG`)
+We want our system to cover as many cases as possible. Currently the Duniter protocol use only [Ed25519 signatures][ed25519], but it could in the future use othe schemes such as [Schnorr signatures][schnorr].
+[ed25519]: https://en.wikipedia.org/wiki/EdDSA
+[schnorr]: https://en.wikipedia.org/wiki/Schnorr_signature
+We also want the impossibility to change a script defined as an unlock condition, so the content of the script must be signed, and this signature required to unlock the funds.
+It means multiple things :
+- Our `SIG` operator should be usable with multiple cryptographic systems. We must have a parameter provided which system is used.
+- Since keys and signatures lengths may vary from a system to another, it should accept inputs of any length.
+- We'll use document signatures as signature source. We need an operator to fetch this data from the document.
+With these requirements we have :
+- `SIG` operator defined as `SIG(System, PublicKey, Signature)`.
+- An operator capable of providing data of any length.
+- An operator capable of fetching data outside of script.
+### 1.3 Unlock inputs (`INPUT`)
+When a user want to spend an output, it must provide inputs making the lock script returns true.
+They are stored as list of pairs of indexes and ASTs.
+A script can use an `INPUT(Index)` operator returning the input.
+### 1.4. Avoid redudancies (`INCLUDE`)
+Some parts of a script may required in multiple places the same computed result. To avoid redudancies, we allow to provide an script with multiple ASTs. Each AST is executed in order and only the output of the last one is taken as the result. An AST can use the result of an above script with an `INCLUDE` operator. With this structure it's not possible to have recursion or cycles which avoid infinite loops and Turing-Completeness. All ASTs are grouped into a single structure (an **AST group** or a **script**).
+### 1.5. Avoid providing unused branches (`AST_HASH`/`EXEC_AST_HASH`)
+It could be bad for privacy to expose a complete script with all of the possibilities of spending an output. With merkelized scripts, we allow the user to provide only the hash of unused branches which can be used to prove it's the same script while not providing its complete definition.
+We define an AST hash (on 32 bytes) as `HASH(AST) = SHA256(OperatorCode + HASH(Child0) + HASH(Child1) + ...)`. Since we want to use provide only the hash of a branch while keeping the script valid, we defined an operator `AST_HASH(Hash)` for which `HASH(AST_HASH(Hash)) = Hash`. To allow only hashes of unused branches, this operator will return an `Undefined` value and should be used with aggregators such as `OR` or `SUM`.
+Because we are using groups, we can define the group hash as `GHASH(Group) = SHA256(HASH(Group.0) + HASH(Group.1) + ...)`
+Then we can define a `EXEC_AST_HASH` operator taking the group and it's `GHASH`. It will first verify that the hash is correct, then execute it.
+## 2. Binary format
+### 2.1. Document syntax
+- Operators are written in **CAPS**.
+- Arguments are written as `<Name:Type>`.
+- Type can be :
+  - An in-place sized value (in bytes) `1`, `2`, ...
+  - An AST returning a sized value : `AST:1`, `AST:2`, ...
+Non fixed sizes can be expressed as `?1`, `?2`, etc. Multiple parameters with the same tag `?1` mut have the same size.
+The script is invalid if operator arguments have wrong sizes.
+> These rules only apply for this document and won't appear in the binary format. They just set rules to respect at script validation and execution.
+### 2.2. Values
+All values are raw arrays of bytes. Their meaning depends of their usage and with which operators they'll be used.
+There is an alias `B` for boolean values of size `1`. Value `0` means `False`, and any other value means `True`. Boolean operators however **must** return `0` or `1`. This allow to easily *sum* boolean results together.
+### 2.3. Operators
+We store an operator as a **1 byte** value representing its type :
+#### 2.3.1. Language operators (`[0x00;0x0f]`)
+- `0x00` or `VALUE:Size <Size:1> <Value:Size>`  
+  Return given `Value` of given `Size`.  
+  `Value` must have a size of `Size` bytes or the script is unreadable and thus invalid.
+- `0x01` or `INPUT:? <Index:1>`  
+  Return an input parameters as an *AST*.  
+  It this input don't exist, `Undefined` is returned.
+  If this signature don't exist, `Undefined` is returned.  
+- `0x03` or `INCLUDE:? <Index:1>`  
+  Return the `VALUE` returned by another script.
+- `0x04` or `AST_HASH:32 <Hash:32>`  
+  Return `Undefined`.  
+  It's hash is the provided `Hash`.
+- `0x05` or `EXEC_AST_HASH:? <Hash:32> <Script:AST:1>`  
+  Return script result `VALUE` is the hash is correct, `Undefined` otherwise.
+- `0x06` or `RESIZE:Size <Size:1> <Value:AST:?>`  
+  Resize the value by truncating or adding padding (`0x00`).
+#### 2.3.2. Document data (`[0x10;0x2f]`)
+- `0x10` or `FETCH_SIG:32 <Index:1>`  
+  Return the signature (hash) from the document signatures. This hash will be used to retrieve the real signature by the `SIG` operator.
+- `0x11` or `FETCH_LOCKTIME:8`  
+  Return the timestamp of the locking transaction.
+- `0x12` or `FETCH_UNLOCKTIME:8`  
+  Return the timestamp of the (new) unlocking transaction. 
+#### 2.3.3. Boolean operators (`[0x30;0x3f]`)
+- `0x30` or `NOT:B <Script:AST:B>`  
+  Return `False` if `Script` returns true, `True` otherwise.
+- `0x31` or `NORM:B` `<Script:AST:B>`  
+  Return `False` if `Script` returns false, `True(1)` otherwise.  
+  Allow to normalise any `True(...)` value to `True(1)`.
+- `0x32` or `OR:B <Count:1> <Script0:AST:B> <Script1:AST:B> ...`  
+  Return `True` if at least one child returns `True`, `False` otherwise.  
+  `Undefined` values are considered `False`.
+- `0x33` or `AND:B <Count:1> <Script0:AST:B> <Script1:AST:B> ...`  
+  Return `False` if at least one child returns `False`, `True` otherwise.
+- `0x34` or `XOR:B <Script0:AST:B> <Script1:AST:B>`  
+  Return `True` if one but not both returns `True`.
+#### 2.3.4. Arithmetic operators (`[0x40;0x6f]`)
+- `0x40` or `EQUAL:B <Script0:AST:?> <Script1:AST:?>`  
+  `True` if values are equal, `False` otherwise.
+- `0x41` or `NEQUAL:B <Script0:AST:?> <Script1:AST:?>`  
+  `True` if values are not equal, `False` otherwise.
+- `0x42` or `LESS:B <Script0:AST:?> <Script1:AST:?>`  
+  `True` if `Script0` return value is less than `Script1` return value.
+- `0x43` or `GREAT:B <Script0:AST:?> <Script1:AST:?>`  
+  `True` if `Script0` return value is greater than `Script1` return value.
+- `0x44` or `LESS_EQUAL:B <Script0:AST:?> <Script1:AST:?>`  
+  `True` if `Script0` return value is less or equal than `Script1` return value.
+- `0x45` or `GREAT_EQUAL:B <Script0:AST:?> <Script1:AST:?>`  
+  `True` if `Script0` return value is greater or equal than `Script1` return value.
+- `0x50` or `SUM:?1 <Count:1> <Script0:AST:?1> <Script1:AST:?1> ...`  
+  Return the sum of given values and discard any overflow. If you don't want overflows, try to `RESIZE` values first. `Undefined` values are ignored.
+#### 2.3.5. Locks (`[0x70;0x9f]`)
+- `0x70` or `PASS:B <PasswordHash:32> <Password:AST:32>`  
+  Return `True` if `PasswordHash = SHA256(Password)`.  
+  `Password` must be 32 bytes long, so the original password should be hashed a first time to have `Password`, then a second time to have `PasswordHash`.
+- `0x71` or `SIG:B <Cryptosystem:2> <PublicKeyHash:32> <SignatureHash:AST:32>`  
+  Verify the signature of the transaction with the public key.
+  To avoid dynamic sizes, only the hash is provided and will allow to fetch the real signature.
+### 2.4. AST group format
+An **AST group** is a list of ASTs of a fixed size.
+```txt
+3
+<AST 0>
+<AST 1>
+<AST 2>
+```
+They are executed in order and a AST can `INCLUDE` a result from an previous AST by its zero-based index.
+The last AST must return a **1 byte** boolean value (`0` is `False`, `True` otherwise).
+**This is the structure storing the unlocking conditions of an ouput.**
+### 2.5. Parameter group format
+A **parameter group** is a map of **AST groups** associated with their *own parameter group*.
+```txt
+3   (3 elements)
+0 <Parameter group for 0> <Group 0>
+1 <Parameter group for 1> <Group 1>
+2 <Parameter group for 2> <Group 2>
+```
+This structure allow groups to be simple values or complex merkelized scripts with *includes*. Indices are provided to avoid listing unused parameters. `INPUT` and `INCLUDE` in a script will fetch in the same level group/parameter group.
+> It can become very complicated, there are examples below.
+## 3. Examples
+### 3.1. Simple signature
+> All `< >` in these examples are placeholders and must be replaced with appropriate data.
+Lock :
+```txt
+1                     (1 ast)
+  SIG
+    0                 (if 0 corresponds to Ed25519)
+    <PublicKeyHash>   (on 32 bytes)
+    INPUT 0           (signature on 32 bytes)
+```
+Lock in hex :
+```txt
+01
+  71 00 <PublicKeyHash> 01 00
+```
+Unlocking :
+```txt
+1                     (1 parameter)
+  1                   (parameter 1)
+    0                 (no parameters in this group)
+    1                 (1 ast)
+      FETCH_SIG 0
+```
+Unlocking in hex :
+```txt
+01
+  01
+    00
+    01
+      10 00
+```
+### 3.2. 2 out of 3 signatures (simple)
+Lock :
+```txt
+1
+  GREAT_EQUAL
+    SUM 3
+      SIG 0 <PKHash0> INPUT 0
+      SIG 0 <PKHash1> INPUT 1
+      SIG 0 <PKHash2> INPUT 2
+    2
+```
+Lock in hex :
+```txt
+01
+  45
+    50 03
+      71 00 <PKHash0> 01 00
+      71 00 <PKHash1> 01 01
+      71 00 <PKHash2> 01 02
+    02
+```
+Unlocking :
+```txt
+2               (2 parameters)
+  0               (parameter 0)
+    0             (no parameters in this group)
+    1             (1 ast)
+      FETCH_SIG 0   (doc signature 0)
+  2               (parameter 2)
+    0             (no parameters in this group)
+    1             (1 ast)
+      FETCH_SIG 1   (doc signature 1)
+```
+Unlocking in hex :
+```txt
+02
+  00
+    00
+    01
+      10 00
+  02
+    00
+    01
+      10 01
+```
+## 3.3. 2 out of 3 signatures (merkelized + includes)
+> Node : This one is overcomplicated and is provided as an example of what it's possible to do. However it's very simple compared to the far more complex contracts you can make with that.
+Original Merkelized AST :
+```txt
+4                           (4 asts)
+  SIG 0 <PKHash0> INPUT 0   (sig 0)
+  SIG 0 <PKHash1> INPUT 1   (sig 1)
+  SIG 0 <PKHash2> INPUT 2   (sig 2)
+  GREAT_EQUAL
+    SUM 3
+      INCLUDE 0             (include above script 0)
+      INCLUDE 1             (include above script 1)
+      INCLUDE 2             (include above script 2)
+    2
+```
+Original Merkelized AST in hex :
+```txt
+04
+  71 00 <PKHash0> 01 00
+  71 00 <PKHash1> 01 01
+  71 00 <PKHash2> 01 02
+  45
+    50 03
+      03 00
+      03 01
+      03 02
+    02
+```
+Let's consider `<MHASH>` as the hash of the AST above.
+Lock :
+```txt
+1                           (1 ast)
+  EXEC_AST_HASH               (execute correct script in param 0)
+    <MHASH>
+    INPUT 0
+```
+Lock in hex :
+```txt
+01
+  05 <MHASH> 01 00
+```
+> This lock is contained in the transaction, and the merkelized AST need to be stored offchain.
+> It can then be only known by the participants and used partialy at unlock as shown below.
+Unlock :
+```txt
+1                               (1 parameter (the merklized script))
+  0                               (our script)
+    2                               (2 parameters for our script (signatures))
+      0                               (parameter 0)
+        0                               (no parameters in this group)
+        1                               (1 ast)
+          FETCH_SIG 0                     (doc signature 0)
+      2                               (parameter 2)
+        0                               (no parameters in this group)
+        1                               (1 ast)
+          FETCH_SIG 1                     (doc signature 1)
+    4                                (our script have 4 asts)
+      SIG 0 <PKHash0> INPUT 0          (sig 0)
+      AST_HASH <Hash>                  (we provide only the hash of the unused AST)
+      SIG 0 <PKHash2> INPUT 2          (sig 2)
+      GREAT_EQUAL
+        SUM 3
+          INCLUDE 0             (include above script 0)
+          INCLUDE 1             (include above script 1 / could be replaced with an AST_HASH too)
+          INCLUDE 2             (include above script 2)
+        2
+```
+Unlock in hex :
+> Same procedure as before
+Notes :
+In this exemple merkelized scripts allow to hide each public keys needed to spend the output, and `PKHash1` is never sent on the blockchain. Also, it's impossible to know what was the content of the branch, so it could have been any test of any complexity and nobody will never now it by looking at the blockchain.
+In this case it takes more place than the classic way, but for far more complicated contracts it'll be shorter to provide just a 32 bytes hash than the complete branch.
+\ No newline at end of file