Introduction
Floresta is a collection of Rust Bitcoin libraries designed for building a full node, alongside a default node implementation, all developed by Davidson Souza.
A key feature of Floresta is its use of the utreexo accumulator to maintain the UTXO set in a highly compact format. It also incorporates innovative techniques to significantly reduce Initial Block Download (IBD) times with minimal security tradeoffs.
The utreexo accumulator consists of a forest of merkle trees whose leaves are individual UTXOs, hence the name U-Tree-XO. The UTXO set at any moment is represented by the merkle roots of the forest.
The novelty in this cryptographic system is a mechanism to update the forest, both adding new UTXOs and deleting existing ones from the set. When a transaction spends UTXOs we can verify it with an inclusion proof, and then delete those specific UTXOs from the set.
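To build intuition for how the forest grows, here is a self-contained toy sketch (using std's DefaultHasher instead of a real cryptographic hash, and not the actual rustreexo API): adding a leaf merges equal-sized trees, so the roots behave like the bits of a binary counter of the leaf count.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy hash combining two children; real utreexo uses a cryptographic hash.
fn parent(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

// The forest keeps one root per tree; trees with the same number of
// leaves merge on insert, like carrying in binary addition.
fn add_leaf(roots: &mut Vec<(u64, u64)>, leaf: u64) {
    let mut node = (leaf, 1); // (hash, number of leaves under it)
    while let Some(&(hash, leaves)) = roots.last() {
        if leaves != node.1 {
            break;
        }
        roots.pop();
        node = (parent(hash, node.0), leaves * 2);
    }
    roots.push(node);
}

fn main() {
    let mut roots = Vec::new();
    for utxo in 0..5u64 {
        add_leaf(&mut roots, utxo);
    }
    // 5 leaves = 0b101: one 4-leaf tree and one 1-leaf tree.
    println!("{} roots: {:?}", roots.len(), roots);
}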
Currently, the node can only operate in pruned mode, meaning it deletes block data after validation. Combined with utreexo, this design keeps storage requirements exceptionally low (< 1 GB).
The ultimate vision for Floresta is to deliver a reliable and ultra-lightweight node implementation capable of running on low-resource devices, democratizing access to the Bitcoin blockchain. However, keep in mind that Floresta remains highly experimental software ⚠️.
About This Book
This documentation provides an overview of the process involved in creating and running a Floresta node (i.e., a UtreexoNode). We start by examining the project's overall structure, which is necessary to build a foundation for understanding its internal workings.
The book will contain plenty of code snippets, which are identical to Floresta code sections. Each snippet includes a commented path referencing its corresponding file, visible by clicking the eye icon in the top-right corner of the snippet (e.g. // Path: floresta-chain/src/lib.rs).
You will also find some interactive quizzes to test and reinforce your understanding of Floresta!
Note on Types
To avoid repetitive explanations, this documentation follows a simple convention: unless otherwise stated, any mentioned type is assumed to come from the bitcoin crate, which provides many of the building blocks for Floresta. If you see a type and we don't mention its source, you can assume it's a bitcoin type.
Project Overview
The Floresta project is made up of a few library crates, providing modular node components, and the florestad binary (i.e. the Floresta daemon), which assembles these components into a full node.
Developers can use the core components in the libraries to build their own wallets and node implementations. They can use individual libraries or the whole pack of components with the floresta meta-crate, which simply re-exports the libraries.
The libraries in Floresta are:
- floresta-chain: Validates the blockchain and tracks the state.
- floresta-cli: Provides command-line interface tools to interact with florestad.
- floresta-common: Contains shared data structures and traits used across other crates.
- floresta-compact-filters: Implements compact filter functionality for wallets.
- floresta-electrum: An Electrum server implementation.
- floresta-watch_only: A watch-only wallet implementation, optimized for Electrum servers.
- floresta-wire: Handles network communication and data fetching.
- floresta: A meta-crate that re-exports these components.

The most important libraries are floresta-chain, which validates the chain and keeps track of the state, and floresta-wire, which fetches network data. We need both kinds of components to construct a full node.
A Car Analogy
A full node is like a self-driving car that must keep up with a constantly moving destination, as new blocks are produced. The Bitcoin network is the road. Floresta-wire acts as the car's sensors and navigation system, gathering data from the road (transactions, blocks) and feeding it to Floresta-chain, which is the engine and control system that moves the car forward. Without Floresta-wire, the engine and control system wouldn't know what to do, and without Floresta-chain, the car wouldn't move at all.
Both components must work properly: if Floresta-wire fails or provides incorrect data, the car will either be paralyzed, unable to reach the destination (blockchain sync), or misled, arriving at the wrong destination (syncing to an incorrect chain). If Floresta-chain fails, the car might get stuck or follow the wrong path because the control system isn't working properly, even with correct directions from Floresta-wire.
UtreexoNode
UtreexoNode is the top-level type in Floresta, responsible for managing P2P connections, receiving network data, and broadcasting transactions. All of its logic is found in the floresta-wire crate.
Blocks fetched by UtreexoNode are passed to a blockchain backend for validation and state tracking. This backend is represented by a generic Chain type. Additionally, UtreexoNode relies on a separate generic Context type to provide context-specific behavior.
Figure 1: Diagram of the UtreexoNode type.
Below is the actual type definition, which is a struct with two fields and trait bounds for the Chain backend.
Filename: floresta-wire/src/p2p_wire/node.rs
// Path: floresta-wire/src/p2p_wire/node.rs
pub struct UtreexoNode<Chain: BlockchainInterface + UpdatableChainstate, Context> {
pub(crate) common: NodeCommon<Chain>,
pub(crate) context: Context,
}
The Chain backend must implement two key traits:
- UpdatableChainstate: Provides methods to update the state of the blockchain backend.
- BlockchainInterface: Defines the interface for interacting with other components, such as the UtreexoNode.

Both traits are located in the floresta-chain library crate, in src/pruned_utreexo/mod.rs.
In the next section, we will explore the required API for these traits.
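To make the split concrete before diving into the real traits, here is a minimal, self-contained sketch with simplified stand-in traits (hypothetical; the real traits have many more methods and, as we will see with ChainState, take &self and rely on interior mutability):

// Simplified stand-ins for BlockchainInterface and UpdatableChainstate.
trait ReadChain {
    fn get_height(&self) -> u32;
}

trait WriteChain {
    fn accept_header(&mut self, header_hash: [u8; 32]);
}

// Like UtreexoNode, this node is generic over any backend that can both
// answer queries and update its state.
struct Node<Chain: ReadChain + WriteChain> {
    chain: Chain,
}

impl<Chain: ReadChain + WriteChain> Node<Chain> {
    fn on_new_header(&mut self, hash: [u8; 32]) {
        self.chain.accept_header(hash);
        println!("height is now {}", self.chain.get_height());
    }
}

// A trivial in-memory backend satisfying both traits.
struct MemChain {
    headers: Vec<[u8; 32]>,
}

impl ReadChain for MemChain {
    fn get_height(&self) -> u32 {
        self.headers.len() as u32
    }
}

impl WriteChain for MemChain {
    fn accept_header(&mut self, header_hash: [u8; 32]) {
        self.headers.push(header_hash);
    }
}

fn main() {
    let mut node = Node {
        chain: MemChain { headers: Vec::new() },
    };
    node.on_new_header([0; 32]);
}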
Chain Backend API
We will now take a look at the API that the Chain backend (from UtreexoNode) is required to expose, as part of the BlockchainInterface and UpdatableChainstate traits.
The lists below are only meant to provide an initial sense of the expected chain API.
The BlockchainInterface Trait
The BlockchainInterface methods are mainly about getting information from the current view of the blockchain and the state of validation.
It has an associated generic Error type bounded by the core2::error::Error trait. In other words, implementations of BlockchainInterface can choose their own error type as long as it implements core2::error::Error.

core2 is a crate providing a no_std version of the std::error::Error trait.
The list of required methods:
- get_block_hash, given a u32 height.
- get_tx, given its txid.
- get_height of the chain.
- broadcast a transaction to the network.
- estimate_fee for inclusion in usize target blocks.
- get_block, given its hash.
- get_best_block hash and height.
- get_block_header, given its hash.
- is_in_ibd, whether we are in Initial Block Download (IBD) or not.
- get_unbroadcasted transactions.
- is_coinbase_mature, given its block hash and height (on the mainchain, coinbase transactions mature after 100 blocks).
- get_block_locator, i.e. a compact list of block hashes used to efficiently identify the most recent common point in the blockchain between two nodes for synchronization purposes.
- get_block_locator_for_tip, given the hash of the tip block. This can be used for tips that are not canonical or best.
- get_validation_index, i.e. the height of the last block we have validated.
- get_block_height, given its block hash.
- get_chain_tips block hashes, including the best tip and non-canonical ones.
- get_fork_point, to get the block hash where a given branch forks (the branch is represented by its tip block hash).
- get_params, to get the parameters for chain consensus.
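As a usage sketch, a consumer can be written generically against this trait; the import path below is an assumption, while the methods are the ones listed above:

// Hypothetical import path; the trait lives in floresta-chain's
// pruned_utreexo module, but the exact re-export may differ.
use floresta_chain::pruned_utreexo::BlockchainInterface;

// Works with any chain backend, e.g. Floresta's ChainState.
fn report_tip<B: BlockchainInterface>(chain: &B) -> Result<(), B::Error> {
    let (height, hash) = chain.get_best_block()?;
    println!("Best block {hash} at height {height}");
    Ok(())
}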
Also, we have a subscribe method, which allows other components to receive notifications of new validated blocks from the blockchain backend.
Filename: floresta-chain/src/pruned_utreexo/mod.rs
// Path: floresta-chain/src/pruned_utreexo/mod.rs
pub trait BlockchainInterface {
type Error: core2::error::Error + Send + Sync + 'static;
// ...
fn get_block_hash(&self, height: u32) -> Result<bitcoin::BlockHash, Self::Error>;
fn get_tx(&self, txid: &bitcoin::Txid) -> Result<Option<bitcoin::Transaction>, Self::Error>;
fn get_height(&self) -> Result<u32, Self::Error>;
fn broadcast(&self, tx: &bitcoin::Transaction) -> Result<(), Self::Error>;
fn estimate_fee(&self, target: usize) -> Result<f64, Self::Error>;
fn get_block(&self, hash: &BlockHash) -> Result<Block, Self::Error>;
fn get_best_block(&self) -> Result<(u32, BlockHash), Self::Error>;
fn get_block_header(&self, hash: &BlockHash) -> Result<BlockHeader, Self::Error>;
fn subscribe(&self, tx: Arc<dyn BlockConsumer>);
// ...
fn is_in_idb(&self) -> bool;
fn get_unbroadcasted(&self) -> Vec<Transaction>;
fn is_coinbase_mature(&self, height: u32, block: BlockHash) -> Result<bool, Self::Error>;
fn get_block_locator(&self) -> Result<Vec<BlockHash>, Self::Error>;
fn get_block_locator_for_tip(&self, tip: BlockHash) -> Result<Vec<BlockHash>, BlockchainError>;
fn get_validation_index(&self) -> Result<u32, Self::Error>;
fn get_block_height(&self, hash: &BlockHash) -> Result<Option<u32>, Self::Error>;
fn update_acc(
&self,
acc: Stump,
block: UtreexoBlock,
height: u32,
proof: Proof,
del_hashes: Vec<sha256::Hash>,
) -> Result<Stump, Self::Error>;
fn get_chain_tips(&self) -> Result<Vec<BlockHash>, Self::Error>;
fn validate_block(
&self,
block: &Block,
proof: Proof,
inputs: HashMap<OutPoint, TxOut>,
del_hashes: Vec<sha256::Hash>,
acc: Stump,
) -> Result<(), Self::Error>;
fn get_fork_point(&self, block: BlockHash) -> Result<BlockHash, Self::Error>;
fn get_params(&self) -> bitcoin::params::Params;
}
// Path: floresta-chain/src/pruned_utreexo/mod.rs
pub enum Notification {
NewBlock((Block, u32)),
}
Any type that implements the BlockConsumer trait can subscribe to our BlockchainInterface by passing a reference to itself, and receive notifications of new blocks (including block data and height). In the future this can be extended to also notify transactions.
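The mechanism can be mimicked with a small self-contained sketch (simplified stand-ins; Floresta's actual BlockConsumer also receives the block data, not just the height):

use std::sync::Arc;

// Stand-in for the BlockConsumer trait: anything that wants new blocks.
trait BlockConsumer: Send + Sync {
    fn consume_block(&self, height: u32);
}

struct Logger;

impl BlockConsumer for Logger {
    fn consume_block(&self, height: u32) {
        println!("new validated block at height {height}");
    }
}

// Stand-in for the backend side of subscribe/notify.
struct Chain {
    subscribers: Vec<Arc<dyn BlockConsumer>>,
}

impl Chain {
    fn subscribe(&mut self, tx: Arc<dyn BlockConsumer>) {
        self.subscribers.push(tx);
    }

    fn notify(&self, height: u32) {
        for sub in &self.subscribers {
            sub.consume_block(height);
        }
    }
}

fn main() {
    let mut chain = Chain { subscribers: Vec::new() };
    chain.subscribe(Arc::new(Logger));
    chain.notify(42);
}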
Validation Methods
Finally, there are two validation methods that do NOT update the node state:
- update_acc, to get the new accumulator after applying a new block. It requires the current accumulator, the new block data, the inclusion proof for the spent UTXOs, and the hashes of the spent UTXOs.
- validate_block, which instead of only verifying the inclusion proof, validates the whole block (including its transactions, for which the spent UTXOs themselves are needed).
The UpdatableChainstate Trait
On the other hand, the methods required by UpdatableChainstate are expected to update the node state.
These methods use the BlockchainError enum, found in pruned_utreexo/error.rs. Each variant of BlockchainError represents a kind of error that is expected to occur (block validation errors, invalid utreexo proofs, etc.). The UpdatableChainstate methods are:
Very important
- connect_block: Takes a block and utreexo data, validates the block and adds it to our chain.
- accept_header: Checks a header and saves it in storage. This is called before connect_block, which is responsible for accepting or rejecting the actual block.
// Path: floresta-chain/src/pruned_utreexo/mod.rs
pub trait UpdatableChainstate {
fn connect_block(
&self,
block: &Block,
proof: Proof,
inputs: HashMap<OutPoint, TxOut>,
del_hashes: Vec<sha256::Hash>,
) -> Result<u32, BlockchainError>;
// ...
fn switch_chain(&self, new_tip: BlockHash) -> Result<(), BlockchainError>;
fn accept_header(&self, header: BlockHeader) -> Result<(), BlockchainError>;
// ...
fn handle_transaction(&self) -> Result<(), BlockchainError>;
fn flush(&self) -> Result<(), BlockchainError>;
fn toggle_ibd(&self, is_ibd: bool);
fn invalidate_block(&self, block: BlockHash) -> Result<(), BlockchainError>;
fn mark_block_as_valid(&self, block: BlockHash) -> Result<(), BlockchainError>;
fn get_root_hashes(&self) -> Vec<BitcoinNodeHash>;
fn get_partial_chain(
&self,
initial_height: u32,
final_height: u32,
acc: Stump,
) -> Result<PartialChainState, BlockchainError>;
fn mark_chain_as_assumed(&self, acc: Stump, tip: BlockHash) -> Result<bool, BlockchainError>;
}
Usually, in IBD we fetch a chain of headers with sufficient PoW first, and only then do we ask for the block data (i.e. the transactions) in order to verify the blocks. This way we ensure that DoS attacks sending our node invalid blocks, with the purpose of wasting our resources, are costly because of the required PoW.
Others
- switch_chain: Reorg to another branch, given its tip block hash.
- handle_transaction: Process transactions that are in the mempool.
- flush: Writes pending data to storage. Should be invoked periodically.
- toggle_ibd: Toggle the IBD process on/off.
- invalidate_block: Tells the blockchain backend to consider this block invalid.
- mark_block_as_valid: Overrides a block that was marked as invalid, considering it as fully validated.
- get_root_hashes: Returns the root hashes of our utreexo accumulator.
- get_partial_chain: Returns a PartialChainState (a Floresta type allowing validation of parts of the chain in parallel, explained in Chapter 5), given the height range and the initial utreexo state.
- mark_chain_as_assumed: Given a block hash and the corresponding accumulator, assume every ancestor block is valid.
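To see how the two central methods fit together from a caller's perspective, here is a sketch (the import paths are assumptions, and a real node would fetch blocks, proofs and inputs from the network):

// Hypothetical import paths; the traits and error type live in
// floresta-chain's pruned_utreexo module.
use bitcoin::block::Header as BlockHeader;
use floresta_chain::pruned_utreexo::{BlockchainError, BlockchainInterface, UpdatableChainstate};

// First accept a batch of headers; blocks are connected later, in order.
fn sync_headers<Chain>(chain: &Chain, headers: Vec<BlockHeader>) -> Result<(), BlockchainError>
where
    Chain: UpdatableChainstate + BlockchainInterface,
{
    for header in headers {
        chain.accept_header(header)?;
    }
    // Later, for each downloaded block and its utreexo data:
    // chain.connect_block(&block, proof, inputs, del_hashes)?;
    Ok(())
}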
Using ChainState as Chain Backend
Recall that UtreexoNode is the high-level component of Floresta that interacts with the P2P network. It is made of its Context and a NodeCommon that holds a Chain backend (for validation and state tracking). This and the following chapters will focus solely on the Floresta chain backend.
In this chapter we will learn about ChainState, a type that implements UpdatableChainstate + BlockchainInterface, so we can use it as the Chain backend.
ChainState is the default blockchain backend provided by Floresta, with all its logic encapsulated within the floresta-chain library.
The following associations clarify the module structure:
- UtreexoNode declaration and logic reside in floresta-wire
- ChainState declaration and logic reside in floresta-chain
ChainState Structure
The job of ChainState is to validate blocks, update the state and store it.
For a node to keep track of the chain state effectively, it is important to ensure the state is persisted to some sort of storage system. This way the node's progress is saved, and we can recover it after the device is turned off.
That's why ChainState uses a generic PersistedState type, bounded by the ChainStore trait, which defines how we interact with our persistent state database.
Figure 2: Diagram of the ChainState type.
Filename: pruned_utreexo/chain_state.rs
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
pub struct ChainState<PersistedState: ChainStore> {
inner: RwLock<ChainStateInner<PersistedState>>,
}
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
impl<PersistedState: ChainStore> BlockchainInterface for ChainState<PersistedState> {
    // Implementation of BlockchainInterface for ChainState
    // ...
}

// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
impl<PersistedState: ChainStore> UpdatableChainstate for ChainState<PersistedState> {
    // Implementation of UpdatableChainstate for ChainState
    // ...
}
As Floresta is currently only pruned, the expected database primarily consists of the block header chain and the utreexo accumulator; blocks themselves are not stored.
The default implementation of ChainStore is KvChainStore, which uses a key-value store. This means that developers may:
- Implement custom UpdatableChainstate + BlockchainInterface types for use as Chain backends.
- Use the provided ChainState backend:
  - With their own ChainStore implementation.
  - Or with the provided KvChainStore.

Next, let's build the ChainState struct step by step!
The ChainStore Trait
ChainStore is a trait that abstracts the persistent storage layer for the Floresta ChainState backend.
To create a ChainState, we start by building its ChainStore implementation. We will see the required interface and then take a look at the provided KvChainStore.
ChainStore API
The methods required by ChainStore, designed for interaction with persistent storage, are:
- save_roots / load_roots: Save or load the utreexo accumulator (merkle roots).
- save_height / load_height: Save or load the current chain tip data.
- save_header / get_header: Save or retrieve a block header by its BlockHash.
- get_block_hash / update_block_index: Retrieve or associate a BlockHash with a chain height.
- flush: Immediately persist saved data that is still in memory. This ensures data recovery in case of a crash.
In other words, the implementation of these methods should allow us to save and load:
- The current accumulator (serialized as a Vec<u8>).
- The current chain tip data (as BestChain).
- Block headers (as DiskBlockHeader), associated with the block hash.
- Block hashes (as BlockHash), associated with a height.

BestChain and DiskBlockHeader are important Floresta types that we will see in a minute. DiskBlockHeader represents stored block headers, while BestChain tracks the chain tip metadata.
With this data we have a pruned view of the blockchain, metadata about the chain we are in, and the compact UTXO set (the utreexo accumulator).
Figure 3: Diagram of the ChainStore trait.
ChainStore also has an associated Error type for the methods:
Filename: pruned_utreexo/mod.rs
// Path: floresta-chain/src/pruned_utreexo/mod.rs
pub trait ChainStore {
type Error: DatabaseError;
fn save_roots(&self, roots: Vec<u8>) -> Result<(), Self::Error>;
// ...
fn load_roots(&self) -> Result<Option<Vec<u8>>, Self::Error>;
fn load_height(&self) -> Result<Option<BestChain>, Self::Error>;
fn save_height(&self, height: &BestChain) -> Result<(), Self::Error>;
fn get_header(&self, block_hash: &BlockHash) -> Result<Option<DiskBlockHeader>, Self::Error>;
fn save_header(&self, header: &DiskBlockHeader) -> Result<(), Self::Error>;
fn get_block_hash(&self, height: u32) -> Result<Option<BlockHash>, Self::Error>;
fn flush(&self) -> Result<(), Self::Error>;
fn update_block_index(&self, height: u32, hash: BlockHash) -> Result<(), Self::Error>;
}
Hence, implementations of ChainStore are free to use any error type as long as it implements DatabaseError. This is just a marker trait that can be automatically implemented on any T: std::error::Error + std::fmt::Display. This flexibility allows compatibility with different database implementations.
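The marker-trait pattern looks like this in a self-contained sketch (a local mirror for illustration, not Floresta's exact definition):

use std::{error::Error, fmt};

// Marker trait for storage errors, mirroring the idea behind DatabaseError.
trait DatabaseError: fmt::Debug + fmt::Display {}

// Blanket implementation: any std error type qualifies automatically.
impl<T: Error + fmt::Display> DatabaseError for T {}

#[derive(Debug)]
struct MyStoreError;

impl fmt::Display for MyStoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "the storage layer failed")
    }
}

impl Error for MyStoreError {}

// MyStoreError is now usable wherever a DatabaseError is required.
fn log_db_error(e: &dyn DatabaseError) {
    eprintln!("database error: {e}");
}

fn main() {
    log_db_error(&MyStoreError);
}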
Now, we'll explore KvChainStore, Floresta's default ChainStore implementation, to see how these methods are applied in practice.
The KvChainStore Type
KvChainStore is a struct built with kv, a lightweight crate that wraps the sled high-performance embedded database. From the kv crate description:

kv is a simple way to embed a key/value store in Rust applications. It is built using sled and aims to be as lightweight as possible while still providing a nice high level interface.

One of the main concepts in kv is the bucket, a collection of related key-value pairs. We use three buckets in KvChainStore:
- index: Maps heights (K) to block hashes (V) for quick block hash lookups by height.
- headers: Maps block hashes (K) to block headers (V), enabling header retrieval by hash.
- meta (the default bucket), where K is a string:
  - If K = "roots", the stored V is the utreexo accumulator.
  - If K = "height", the stored V is the best chain data.
Filename: pruned_utreexo/chainstore.rs
// Path: floresta-chain/src/pruned_utreexo/chainstore.rs
pub struct KvChainStore<'a> {
_store: Store,
headers: Bucket<'a, Vec<u8>, Vec<u8>>,
index: Bucket<'a, Integer, Vec<u8>>,
meta: Bucket<'a, &'a str, Vec<u8>>,
headers_cache: RwLock<HashMap<BlockHash, DiskBlockHeader>>,
index_cache: RwLock<HashMap<u32, BlockHash>>,
}
The first field holds the kv::Store itself, but it is not accessed directly; interaction with the database occurs through the buckets. We then find the headers, index and meta buckets, with serialized values.
Finally, headers_cache and index_cache are in-memory HashMaps, protected by locks for thread-safe reads and writes, to temporarily store header and index data before writing to disk.
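This cache-then-disk pattern can be sketched in isolation (std's RwLock here, while Floresta uses spin's, and a plain HashMap standing in for the on-disk bucket):

use std::collections::HashMap;
use std::sync::RwLock;

struct Store {
    // Writes land here first...
    cache: RwLock<HashMap<u32, String>>,
    // ...and are later flushed to persistent storage (stand-in for a bucket).
    disk: RwLock<HashMap<u32, String>>,
}

impl Store {
    fn save(&self, height: u32, hash: String) {
        self.cache.write().unwrap().insert(height, hash);
    }

    fn get(&self, height: u32) -> Option<String> {
        // Try the in-memory cache first, then fall back to "disk".
        if let Some(hash) = self.cache.read().unwrap().get(&height) {
            return Some(hash.clone());
        }
        self.disk.read().unwrap().get(&height).cloned()
    }

    fn flush(&self) {
        // Move all pending writes from the cache to "disk".
        let mut cache = self.cache.write().unwrap();
        self.disk.write().unwrap().extend(cache.drain());
    }
}

fn main() {
    let store = Store {
        cache: RwLock::new(HashMap::new()),
        disk: RwLock::new(HashMap::new()),
    };
    store.save(0, "genesis".into());
    store.flush();
    assert_eq!(store.get(0).as_deref(), Some("genesis"));
}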
KvChainStore Builder
Filename: pruned_utreexo/chainstore.rs
// Path: floresta-chain/src/pruned_utreexo/chainstore.rs
impl<'a> KvChainStore<'a> {
pub fn new(datadir: String) -> Result<KvChainStore<'a>, kv::Error> {
// Configure the database
let cfg = Config::new(datadir + "/chain_data").cache_capacity(100_000_000);
// Open the key/value store
let store = Store::new(cfg)?;
Ok(KvChainStore {
headers: store.bucket(Some("headers"))?,
index: store.bucket(Some("index"))?,
meta: store.bucket(None)?,
_store: store,
headers_cache: RwLock::new(HashMap::new()),
index_cache: RwLock::new(HashMap::new()),
})
}
}
The KvChainStore builder function initializes the store in datadir/chain_data and configures a 100 MB cache for the sled pagecache.
This cache reduces disk accesses by temporarily storing writes and retrieving frequently accessed data. However, this kv-specific cache is used exclusively for the meta bucket, as the index and headers buckets have their own HashMap caches.
Since KvChainStore uses kv::Error, we implement DatabaseError for it in pruned_utreexo/error.rs:
// Path: floresta-chain/src/pruned_utreexo/error.rs
impl DatabaseError for kv::Error {}
Now, let's understand how we use the buckets and the HashMap caches.
ChainStore Trait Implementation
For the save_roots, load_roots, save_height and load_height methods, we directly use the meta bucket:
- We call set on the bucket to keep the data in the sled cache, which is eventually written to disk.
- We call get on the bucket to access data from the sled cache or read it directly from disk.
Filename: pruned_utreexo/chainstore.rs
// Path: floresta-chain/src/pruned_utreexo/chainstore.rs
impl ChainStore for KvChainStore<'_> {
    type Error = kv::Error;

    fn load_roots(&self) -> Result<Option<Vec<u8>>, Self::Error> {
        self.meta.get(&"roots")
    }

    fn save_roots(&self, roots: Vec<u8>) -> Result<(), Self::Error> {
        self.meta.set(&"roots", &roots)?;
        Ok(())
    }

    fn load_height(&self) -> Result<Option<BestChain>, Self::Error> {
        let height = self.meta.get(&"height")?;
        if let Some(height) = height {
            return Ok(Some(deserialize(&height).unwrap()));
        }
        Ok(None)
    }

    fn save_height(&self, height: &BestChain) -> Result<(), Self::Error> {
        let height = serialize(height);
        self.meta.set(&"height", &height)?;
        Ok(())
    }
    // ...
}
On the other hand:
- The update_block_index and save_header methods do not use the buckets; rather, they save the data in the index_cache and headers_cache HashMaps.
- The get_block_hash and get_header methods first try to read data from the HashMaps, and if not found, ask the bucket for a disk read.
// Path: floresta-chain/src/pruned_utreexo/chainstore.rs
impl ChainStore for KvChainStore<'_> {
    type Error = kv::Error;
    // ...

    fn get_header(&self, block_hash: &BlockHash) -> Result<Option<DiskBlockHeader>, Self::Error> {
        match self.headers_cache.read().get(block_hash) {
            // Found in the headers_cache
            Some(header) => Ok(Some(*header)),
            // Not found in the cache, so try to read from the headers bucket
            None => {
                let block_hash = serialize(&block_hash);
                Ok(self
                    .headers
                    .get(&block_hash)?
                    .and_then(|b| deserialize(&b).ok()))
            }
        }
    }

    fn save_header(&self, header: &DiskBlockHeader) -> Result<(), Self::Error> {
        self.headers_cache
            .write()
            .insert(header.block_hash(), *header);
        Ok(())
    }

    fn get_block_hash(&self, height: u32) -> Result<Option<BlockHash>, Self::Error> {
        match self.index_cache.read().get(&height).cloned() {
            // Found in the index_cache
            Some(hash) => Ok(Some(hash)),
            // Not found in the index_cache, so try to read from the index bucket
            None => Ok(self
                .index
                .get(&Integer::from(height))?
                .and_then(|b| deserialize(&b).ok())),
        }
    }

    fn update_block_index(&self, height: u32, hash: BlockHash) -> Result<(), Self::Error> {
        self.index_cache.write().insert(height, hash);
        Ok(())
    }

    fn flush(&self) -> Result<(), Self::Error> {
        // Save all headers in batch
        let mut batch = Batch::new();
        for header in self.headers_cache.read().iter() {
            let ser_header = serialize(header.1);
            let block_hash = serialize(&header.1.block_hash());
            batch.set(&block_hash, &ser_header)?;
        }
        self.headers.batch(batch)?;
        self.headers_cache.write().clear();

        // Save all index entries in batch
        let mut batch = Batch::new();
        for (height, hash) in self.index_cache.read().iter() {
            let ser_hash = serialize(hash);
            batch.set(&Integer::from(*height), &ser_hash)?;
        }
        self.index.batch(batch)?;
        self.index_cache.write().clear();

        // Flush the header bucket
        self.headers.flush()?;
        // Flush the block index
        self.index.flush()?;
        // Flush the default bucket with meta-info
        self.meta.flush()?;
        Ok(())
    }
}
Thus, the index_cache and headers_cache data must be written to disk later using the ChainStore::flush method. This method also ensures that any pending writes in the meta bucket are committed to disk.
And that's all for this section! Next we will see two important types whose data is saved in the ChainStore: BestChain and DiskBlockHeader.
BestChain and DiskBlockHeader
As previously mentioned, BestChain and DiskBlockHeader are Floresta types used for storing and retrieving data in the ChainStore database.
DiskBlockHeader
We use a custom DiskBlockHeader instead of the direct bitcoin::block::Header to add some metadata:
Filename: pruned_utreexo/chainstore.rs
// Path: floresta-chain/src/pruned_utreexo/chainstore.rs
// BlockHeader is an alias for bitcoin::block::Header
pub enum DiskBlockHeader {
FullyValid(BlockHeader, u32),
AssumedValid(BlockHeader, u32),
Orphan(BlockHeader),
HeadersOnly(BlockHeader, u32),
InFork(BlockHeader, u32),
InvalidChain(BlockHeader),
}
DiskBlockHeader not only holds a header but also encodes possible block states, as well as the height when it makes sense.
When we start downloading headers in IBD, we save them as HeadersOnly. If a header doesn't have a parent, it's saved as Orphan. If it's not in the best chain, InFork. And when we validate the actual blocks, we should be able to mark the headers as FullyValid.
Also, we have AssumedValid for a configuration that allows the node to skip script validation, and InvalidChain for cases when UpdatableChainstate::invalidate_block is called.
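Code that needs a height must therefore handle variants that don't carry one; Floresta's height() accessor (used later during header validation) behaves roughly like this sketch, assuming the enum above is in scope:

// Sketch of a height accessor over DiskBlockHeader; only variants that
// store a height can return one.
impl DiskBlockHeader {
    pub fn height(&self) -> Option<u32> {
        match self {
            DiskBlockHeader::FullyValid(_, height)
            | DiskBlockHeader::AssumedValid(_, height)
            | DiskBlockHeader::HeadersOnly(_, height)
            | DiskBlockHeader::InFork(_, height) => Some(*height),
            DiskBlockHeader::Orphan(_) | DiskBlockHeader::InvalidChain(_) => None,
        }
    }
}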
BestChain
The BestChain struct is an internal representation of the chain we are in, and has the following fields:
- best_block: The current best chain's last BlockHash (the actual block may or may not have been validated yet).
- depth: The number of blocks piled on top of the genesis block (i.e. the height of the tip).
- validation_index: The BlockHash up to which we have validated the chain.
- alternative_tips: A vector of fork-tip BlockHashes with a chance of becoming the best chain.
- assume_valid_index: Height of the assume-valid block (up to which we don't validate scripts).
Filename: pruned_utreexo/chain_state.rs
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
pub struct BestChain {
best_block: BlockHash,
depth: u32,
validation_index: BlockHash,
alternative_tips: Vec<BlockHash>,
assume_valid_index: u32,
}
Building the ChainState
The next step is building the ChainState struct, which validates blocks and updates the ChainStore.
Filename: pruned_utreexo/chain_state.rs
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
pub struct ChainState<PersistedState: ChainStore> {
inner: RwLock<ChainStateInner<PersistedState>>,
}
Note that the RwLock we use to wrap ChainStateInner is not the one from the standard library but the one from the spin crate, which allows no_std.

std::sync::RwLock relies on the OS to block and wake threads when the lock is available, while spin::RwLock uses a spinlock, which does not require OS support for thread management: the thread simply keeps running (checking for lock availability) instead of sleeping.
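Usage is nearly identical to the std version, except that the lock methods return guards directly rather than a Result; a small sketch, assuming the spin crate as a dependency:

use spin::RwLock;

fn main() {
    let state = RwLock::new(0u32);

    // Writer: spins until exclusive access is available.
    *state.write() += 1;

    // Readers: many can hold the lock at once.
    let value = *state.read();
    println!("value = {value}");
}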
The builder for ChainState has this signature:
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
pub fn new(
chainstore: PersistedState,
network: Network,
assume_valid: AssumeValidArg,
) -> ChainState<PersistedState> {
The first argument is our ChainStore implementation, the second is a Network enum, and the third is the AssumeValidArg enum.
Network is an enum with four variants: Bitcoin (mainchain), Testnet, Regtest and Signet. It's declared in the lib.rs of floresta-chain, along with conversions from and to bitcoin::network::Network, an identical enum from the bitcoin crate.
// Path: floresta-chain/src/lib.rs
// This is the only thing implemented in lib.rs
pub enum Network {
Bitcoin,
Testnet,
Regtest,
Signet,
}
// impl From<bitcoin::network::Network> for Network { ... }
// impl From<Network> for bitcoin::network::Network { ... }
The Assume-Valid Lore
The assume_valid argument refers to a Bitcoin Core option that allows nodes during IBD to assume the validity of scripts (mainly signatures) up to a certain block.
Nodes with this option enabled will still choose the chain with the most PoW (the best tip), and will only skip script validation if the Assume-Valid block is in that chain. Otherwise, if the Assume-Valid block is not in the best chain, they will validate everything.

When users use the default Assume-Valid hash, hardcoded in the software, they aren't blindly trusting script validity. These hashes are reviewed through the same open-source process as other security-critical changes in Bitcoin Core, so the trust model is unchanged.

In Bitcoin Core, the hardcoded Assume-Valid block hash is included in src/kernel/chainparams.cpp.
Filename: pruned_utreexo/chain_state.rs
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
pub enum AssumeValidArg {
Disabled,
Hardcoded,
UserInput(BlockHash),
}
Disabled means the node verifies all scripts, Hardcoded means the node uses the default block hash hardcoded in the software (and validated by maintainers, developers and reviewers), and UserInput means using a hash that the node runner provides, although the validity of the scripts up to this block should have been externally validated.
Genesis and Assume-Valid Blocks
The first part of the body of ChainState::new (we'll omit the impl block from now on):
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
pub fn new(
chainstore: PersistedState,
network: Network,
assume_valid: AssumeValidArg,
) -> ChainState<PersistedState> {
let parameters = network.into();
let genesis = genesis_block(¶meters);
chainstore
.save_header(&super::chainstore::DiskBlockHeader::FullyValid(
genesis.header,
0,
))
.expect("Error while saving genesis");
chainstore
.update_block_index(0, genesis.block_hash())
.expect("Error updating index");
let assume_valid = ChainParams::get_assume_valid(network, assume_valid);
// ...
ChainState {
inner: RwLock::new(ChainStateInner {
chainstore,
acc: Stump::new(),
best_block: BestChain {
best_block: genesis.block_hash(),
depth: 0,
validation_index: genesis.block_hash(),
alternative_tips: Vec::new(),
assume_valid_index: 0,
},
broadcast_queue: Vec::new(),
subscribers: Vec::new(),
fee_estimation: (1_f64, 1_f64, 1_f64),
ibd: true,
consensus: Consensus { parameters },
assume_valid,
}),
}
}
First, we use the genesis_block function from bitcoin to retrieve the genesis block based on the specified parameters, which are determined by our Network.
Then we save the genesis header into chainstore, which of course is FullyValid and has height 0. We also link the index 0 with the genesis block hash.
Finally, we get an Option<BlockHash> by calling the ChainParams::get_assume_valid function, which takes a Network and an AssumeValidArg.
Filename: pruned_utreexo/chainparams.rs
// Path: floresta-chain/src/pruned_utreexo/chainparams.rs
// Omitted: impl ChainParams {
pub fn get_assume_valid(network: Network, arg: AssumeValidArg) -> Option<BlockHash> {
match arg {
// No assume-valid hash
AssumeValidArg::Disabled => None,
// Use the user-provided hash
AssumeValidArg::UserInput(hash) => Some(hash),
// Fetch the hardcoded values, depending on the network
AssumeValidArg::Hardcoded => match network {
Network::Bitcoin => Some(bhash!(
"00000000000000000000569f4d863c27e667cbee8acc8da195e7e5551658e6e9"
)),
Network::Testnet => Some(bhash!(
"000000000000001142ad197bff16a1393290fca09e4ca904dd89e7ae98a90fcd"
)),
Network::Signet => Some(bhash!(
"0000003ed17b9c93954daab00d73ccbd0092074c4ebfc751c7458d58b827dfea"
)),
Network::Regtest => Some(bhash!(
"0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206"
)),
},
}
}
The final part of ChainState::new just returns the instance of ChainState with the ChainStateInner initialized. We will see this initialization next.
Initializing ChainStateInner
This is the struct that holds the meat of the matter (or should we say the root of the issue). As it's inside the spin::RwLock, it can be read and modified in a thread-safe way. Its fields are:
- acc: The accumulator, of type Stump (which comes from the rustreexo crate).
- chainstore: Our implementation of ChainStore.
- best_block: Of type BestChain.
- broadcast_queue: Holds a list of transactions to be broadcast, of type Vec<Transaction>.
- subscribers: A vector of trait objects (different types allowed) that implement the BlockConsumer trait, indicating they want to get notified when a new valid block arrives.
- fee_estimation: Fee estimations for the next 1, 10 and 20 blocks, as a tuple of three f64 values.
- ibd: A boolean indicating whether we are in IBD.
- consensus: Parameters for chain validation, as a Consensus struct (a Floresta type that will be explained in detail in Chapter 4).
- assume_valid: An Option<BlockHash>.
Note that the accumulator and the best block data are kept in our ChainStore, but we cache them in ChainStateInner for faster access, avoiding potential disk reads and deserializations (e.g. loading them from the meta bucket if we use KvChainStore).
Let's next see how these fields are accessed with an example.
Adding Subscribers to ChainState
As ChainState implements BlockchainInterface, it has a subscribe method to allow other types to receive notifications.
The subscribers are stored in the ChainStateInner.subscribers field, but we need to handle the RwLock that wraps ChainStateInner for that.
Filename: pruned_utreexo/chain_state.rs
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
// Omitted: impl<PersistedState: ChainStore> BlockchainInterface for ChainState<PersistedState> {
fn subscribe(&self, tx: Arc<dyn BlockConsumer>) {
let mut inner = self.inner.write();
inner.subscribers.push(tx);
}
This is the BlockchainInterface::subscribe implementation. We use the write method on the RwLock, which gives us exclusive access to the ChainStateInner.
When inner is dropped after the push, the lock is released and becomes available to other threads, which may acquire write access (one thread at a time) or read access (multiple threads simultaneously).
Initial ChainStateInner Values
In ChainState::new we initialize ChainStateInner like so:
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
pub fn new(
chainstore: PersistedState,
network: Network,
assume_valid: AssumeValidArg,
) -> ChainState<PersistedState> {
let parameters = network.into();
let genesis = genesis_block(¶meters);
chainstore
.save_header(&super::chainstore::DiskBlockHeader::FullyValid(
genesis.header,
0,
))
.expect("Error while saving genesis");
chainstore
.update_block_index(0, genesis.block_hash())
.expect("Error updating index");
let assume_valid = ChainParams::get_assume_valid(network, assume_valid);
// ...
ChainState {
inner: RwLock::new(ChainStateInner {
chainstore,
acc: Stump::new(),
best_block: BestChain {
best_block: genesis.block_hash(),
depth: 0,
validation_index: genesis.block_hash(),
alternative_tips: Vec::new(),
assume_valid_index: 0,
},
broadcast_queue: Vec::new(),
subscribers: Vec::new(),
fee_estimation: (1_f64, 1_f64, 1_f64),
ibd: true,
consensus: Consensus { parameters },
assume_valid,
}),
}
}
The TLDR is that we move chainstore into the ChainStateInner, initialize the accumulator (Stump::new), initialize BestChain with the genesis block (as both the best block and the last validated block) and depth 0, initialize broadcast_queue and subscribers as empty vectors, set the minimum fee estimations, set ibd to true, use the Consensus parameters for the current Network, and move the assume_valid optional hash in.
Recap
In this chapter we have understood the structure of ChainState, a type implementing UpdatableChainstate + BlockchainInterface (i.e. a blockchain backend). This type required a ChainStore implementation, expected to save state data to disk, and we examined the provided KvChainStore.
Finally, we have seen the ChainStateInner struct, which keeps track of the ChainStore and more data.
We can now build a ChainState as simply as:
fn main() {
let chain_store =
KvChainStore::new("./epic_location".to_string())
.expect("failed to open the blockchain database");
let chain = ChainState::new(
chain_store,
Network::Bitcoin,
AssumeValidArg::Disabled,
);
}
State Transition and Validation
With the ChainState struct built, the foundation for running a node, we can now dive into the methods that validate and apply state transitions.
ChainState has four impl blocks (all located in pruned_utreexo/chain_state.rs):
- The BlockchainInterface trait implementation
- The UpdatableChainstate trait implementation
- The implementation of other methods and associated functions, like ChainState::new
- The conversion from ChainStateBuilder (a builder type located in pruned_utreexo/chain_state_builder.rs) to ChainState

The entry points to the state transition and validation logic are the accept_header and connect_block methods from UpdatableChainstate. As we have explained previously, the first step in IBD is accepting headers, so we will start with that.
Accepting Headers
The full accept_header method implementation for ChainState is below. To get read or write access to the ChainStateInner we use two macros, read_lock and write_lock.
In short, the method takes a bitcoin::block::Header (type alias BlockHeader) and accepts it on top of our chain of headers, or maybe reorgs if it's extending a better chain (i.e. switches to the new, better chain). If there's an error, it returns BlockchainError, which we mentioned in the The UpdatableChainstate Trait subsection of Chapter 1.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
fn accept_header(&self, header: BlockHeader) -> Result<(), BlockchainError> {
trace!("Accepting header {header:?}");
let disk_header = self.get_disk_block_header(&header.block_hash());
match disk_header {
Err(BlockchainError::Database(_)) => {
// If there's a database error we don't know if we already
// have the header or not
return Err(disk_header.unwrap_err());
}
Ok(found) => {
// Possibly reindex to recompute the best_block field
self.maybe_reindex(&found);
// We already have this header
return Ok(());
}
_ => (),
}
// The best block we know of
let best_block = self.get_best_block()?;
// Do validation in this header
let block_hash = self.validate_header(&header)?;
// Update our current tip
if header.prev_blockhash == best_block.1 {
let height = best_block.0 + 1;
trace!("Header builds on top of our best chain");
let mut inner = write_lock!(self);
inner.best_block.new_block(block_hash, height);
inner
.chainstore
.save_header(&super::chainstore::DiskBlockHeader::HeadersOnly(
header, height,
))?;
inner.chainstore.update_block_index(height, block_hash)?;
} else {
trace!("Header not in the best chain");
self.maybe_reorg(header)?;
}
Ok(())
}
First, we check if we already have the header in our database. We query it with the get_disk_block_header method, which just wraps ChainStore::get_header in order to return BlockchainError (instead of T: DatabaseError).
If get_disk_block_header returns Err, it may be because the header was not in the database or because there was a DatabaseError. In the latter case, we propagate the error.
We have the header
If we already have the header in our database, we may reindex, which means recomputing the BestChain struct, and return Ok early.

Reindexing updates the best_block field if it is not up-to-date with the disk headers (for instance, having headers up to the 105th, but best_block only referencing the 100th). This happens when the node is turned off or crashes before persisting the latest BestChain data.
We don't have the header
If we don't have the header, we get the best block hash and height (with BlockchainInterface::get_best_block) and perform a simple validation on the header with validate_header. If validation passes, we potentially update the current tip.
- If the new header extends the previous best block:
  - We update the best_block field, adding the new block hash and height.
  - Then we call save_header and update_block_index to update the database (or the HashMap caches, if we use KvChainStore).
- If the header doesn't extend the current best chain, we may reorg if it extends a better chain.
Reindexing
During IBD, headers arrive rapidly, making it pointless to write the BestChain data to disk for every new header. Instead, we update the ChainStateInner.best_block field and only persist it occasionally, avoiding redundant writes that would instantly be overridden.
But there is a possibility that the node is shut down or crashes after the headers have been written to disk but before save_height is called (or before the pending write is completed). In this case we can recompute the last BestChain data by going through the headers on disk. This recovery process is handled by the reindex_chain method within maybe_reindex.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
fn maybe_reindex(&self, potential_tip: &DiskBlockHeader) {
// Check if the disk header is an unvalidated block in the best chain
if let DiskBlockHeader::HeadersOnly(_, height) = potential_tip {
// If the best chain height is lower, it needs to be updated
if *height > self.get_best_block().unwrap().0 {
let best_chain = self.reindex_chain();
write_lock!(self).best_block = best_chain;
}
}
}
We call reindex_chain if the disk header's height is greater than best_block's height, as this means best_block is not up-to-date with the headers on disk.
Validate Header
The validate_header method takes a BlockHeader and performs the following checks:

Check the header chain
- Retrieve the previous DiskBlockHeader. If not found, return BlockchainError::BlockNotPresent or BlockchainError::Database.
- If the previous DiskBlockHeader is marked as Orphan or InvalidChain, return BlockchainError::BlockValidation.

Check the PoW
- Use the get_next_required_work method to compute the expected PoW target and compare it with the header's actual target. If the actual target is easier, return BlockchainError::BlockValidation.
- Verify the PoW against the target using a bitcoin method. If verification fails, return BlockchainError::BlockValidation.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
fn validate_header(&self, block_header: &BlockHeader) -> Result<BlockHash, BlockchainError> {
let prev_block = self.get_disk_block_header(&block_header.prev_blockhash)?;
let prev_block_height = prev_block.height();
if prev_block_height.is_none() {
return Err(BlockValidationErrors::BlockExtendsAnOrphanChain)?;
}
let height = prev_block_height.unwrap() + 1;
// ...
// Check pow
let expected_target = self.get_next_required_work(&prev_block, height, block_header);
let actual_target = block_header.target();
if actual_target > expected_target {
return Err(BlockValidationErrors::NotEnoughPow)?;
}
let block_hash = block_header
.validate_pow(actual_target)
.map_err(|_| BlockValidationErrors::NotEnoughPow)?;
Ok(block_hash)
}
A block header passing this validation will not make the block itself valid, but we can use this to build the chain of headers with verified PoW.
Reorging the Chain
In the accept_header method we saw that, when receiving a header that doesn't extend the best chain, we may reorg. This is done with the maybe_reorg method.
We have to choose between the two branches, represented by:
- branch_tip: The last header of the alternative chain.
- current_tip: The last header of the current best chain.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
fn maybe_reorg(&self, branch_tip: BlockHeader) -> Result<(), BlockchainError> {
let current_tip = self.get_block_header(&self.get_best_block().unwrap().1)?;
self.check_branch(&branch_tip)?;
let current_work = self.get_branch_work(¤t_tip)?;
let new_work = self.get_branch_work(&branch_tip)?;
// If the new branch has more work, it becomes the new best chain
if new_work > current_work {
self.reorg(branch_tip)?;
return Ok(());
}
// If the new branch has less work, we just store it as an alternative branch
// that might become the best chain in the future.
self.push_alt_tip(&branch_tip)?;
let parent_height = self.get_ancestor(&branch_tip)?.height().unwrap();
read_lock!(self)
.chainstore
.save_header(&super::chainstore::DiskBlockHeader::InFork(
branch_tip,
parent_height + 1,
))?;
Ok(())
}
We first call the check_branch method to check if we know all of branch_tip's ancestors. In other words, we check if branch_tip is indeed part of a branch, which requires that no ancestor is Orphan.
Then we get the work in each chain tip with get_branch_work and do the following:
- We reorg to the branch_tip if it has more work, and return Ok early.
- Otherwise, if branch_tip doesn't have more work, we push its hash to best_block.alternative_tips via the push_alt_tip method and save the header as InFork.
The push_alt_tip method just checks if the branch_tip's parent hash is in alternative_tips and removes it if so, as it's no longer the tip of a branch. Then we simply push the branch_tip hash.
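In self-contained toy form (hypothetical and simplified; the real code works with BlockHash values inside the BestChain struct):

// If the new fork tip extends a previously recorded alternative tip,
// that parent is no longer a tip and is removed before the push.
fn push_alt_tip(alternative_tips: &mut Vec<u64>, tip: u64, parent: u64) {
    if let Some(pos) = alternative_tips.iter().position(|&h| h == parent) {
        alternative_tips.remove(pos);
    }
    alternative_tips.push(tip);
}

fn main() {
    let mut tips = vec![100, 200]; // two fork tips
    push_alt_tip(&mut tips, 201, 200); // 201 builds on tip 200
    assert_eq!(tips, vec![100, 201]);
}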
Reorg
Let's now dig into the reorg logic, in reorg. We start by querying the best block hash and use it to query its header. Then we get the header where the branch forks out with find_fork_point.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
fn reorg(&self, new_tip: BlockHeader) -> Result<(), BlockchainError> {
let current_best_block = self.get_best_block().unwrap().1;
let current_best_block = self.get_block_header(¤t_best_block)?;
let fork_point = self.find_fork_point(&new_tip)?;
self.mark_chain_as_inactive(¤t_best_block, fork_point.block_hash())?;
self.mark_chain_as_active(&new_tip, fork_point.block_hash())?;
let validation_index = self.get_last_valid_block(&new_tip)?;
let depth = self.get_chain_depth(&new_tip)?;
self.change_active_chain(&new_tip, validation_index, depth);
Ok(())
}
We use mark_chain_as_inactive and mark_chain_as_active to update the disk data (i.e. marking the previous InFork headers as HeadersOnly and vice versa, and linking the height indexes to the new branch's block hashes).
Then we invoke get_last_valid_block and get_chain_depth to obtain said data from a branch, given the branch's tip header.

Note that we don't validate forks unless they become the best chain, so in this case the last validated block is the last common block between the two branches.

With this data we call change_active_chain to update the best_block field. It also re-initializes the accumulator, as we will need to fetch the utreexo roots for the last common block in order to proceed with validation of the new branch.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
fn change_active_chain(&self, new_tip: &BlockHeader, last_valid: BlockHash, depth: u32) {
let mut inner = self.inner.write();
inner.best_block.best_block = new_tip.block_hash();
inner.best_block.validation_index = last_valid;
inner.best_block.depth = depth;
inner.acc = Stump::new();
}
Connecting Blocks
Great! At this point we should have a sense of the inner workings of accept_header. Let's now understand the connect_block method, which performs the actual block validation and updates the ChainStateInner fields and the database.
connect_block takes a Block, a UTXO set inclusion Proof from rustreexo, the inputs to spend from the UTXO set, and the hashes of said inputs. If the result is Ok, the function returns the block height.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
fn connect_block(
&self,
block: &Block,
proof: Proof,
inputs: HashMap<OutPoint, TxOut>,
del_hashes: Vec<sha256::Hash>,
) -> Result<u32, BlockchainError> {
let header = self.get_disk_block_header(&block.block_hash())?;
let height = match header {
DiskBlockHeader::FullyValid(_, height) => return Ok(height),
// If it's valid or orphan, we don't validate
DiskBlockHeader::Orphan(_)
| DiskBlockHeader::AssumedValid(_, _) // this will be validated by a partial chain
| DiskBlockHeader::InFork(_, _)
| DiskBlockHeader::InvalidChain(_) => return Ok(0),
DiskBlockHeader::HeadersOnly(_, height) => height,
};
// Check if this block is the next one in our chain, if we try
// to add them out-of-order, we'll have consensus issues with our
// accumulator
let expected_height = self.get_validation_index()? + 1;
if height != expected_height {
return Ok(height);
}
self.validate_block_no_acc(block, height, inputs)?;
let acc = Consensus::update_acc(&self.acc(), block, height, proof, del_hashes)?;
self.update_view(height, &block.header, acc)?;
info!(
"New tip! hash={} height={height} tx_count={}",
block.block_hash(),
block.txdata.len()
);
#[cfg(feature = "metrics")]
metrics::get_metrics().block_height.set(height.into());
if !self.is_in_idb() || height % 10_000 == 0 {
self.flush()?;
}
// Notify others we have a new block
self.notify(block, height);
Ok(height)
}
When we call connect_block, the header should already be stored on disk, as accept_header is called first.
If the header is FullyValid, it means we already validated the block, so we can return Ok early. Otherwise, if the header is Orphan, AssumedValid, InFork or InvalidChain, we don't validate and return Ok with height 0.

Recall that InvalidChain doesn't mean our blockchain backend validated the block with a false result. Rather, it means the backend was told to consider it invalid with UpdatableChainstate::invalidate_block.

If the header is HeadersOnly, we get the height and continue. If this block, however, is not the next one to validate, we return early again without validating the block. This is because we can only use the accumulator at height h to validate the block at height h + 1.
When block is the next block to validate, we finally use validate_block_no_acc, and then the Consensus::update_acc function, which verifies the inclusion proof against the accumulator and returns the updated accumulator.
After this, we have fully validated the block! The next steps in connect_block are updating the state and notifying the block to subscribers.
Post-Validation
After block validation we call update_view to mark the disk header as FullyValid (ChainStore::save_header), update the block hash index (ChainStore::update_block_index), and also update ChainStateInner.acc and the validation index of best_block.
Then, we call UpdatableChainstate::flush every 10,000 blocks during IBD, or for each new block once synced. In order, this method invokes:
- save_acc, which serializes the accumulator and calls ChainStore::save_roots
- ChainStore::save_height
- ChainStore::flush, to immediately flush to disk all pending writes
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
fn flush(&self) -> Result<(), BlockchainError> {
self.save_acc()?;
let inner = read_lock!(self);
inner.chainstore.save_height(&inner.best_block)?;
inner.chainstore.flush()?;
Ok(())
}
Note that this is the only time we persist the roots and height (best chain data), and it is the only time we persist the headers and index data if we use KvChainStore as the store.

Last of all, we notify the new validated block to subscribers.
Block Validation
We have arrived at the final part of this chapter! Here we examine the validate_block_no_acc method that we used in connect_block, and lastly recap everything.

validate_block_no_acc is also used inside the BlockchainInterface::validate_block trait method implementation, as it encapsulates the non-utreexo validation logic.
The difference between UpdatableChainstate::connect_block and BlockchainInterface::validate_block is that the first is used during IBD, while the latter is a tool that allows the node user to validate blocks without affecting the node state.

The method returns BlockchainError::BlockValidation, wrapping many different BlockValidationErrors, an enum declared in pruned_utreexo/errors.rs.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
pub fn validate_block_no_acc(
&self,
block: &Block,
height: u32,
inputs: HashMap<OutPoint, TxOut>,
) -> Result<(), BlockchainError> {
if !block.check_merkle_root() {
return Err(BlockValidationErrors::BadMerkleRoot)?;
}
let bip34_height = self.chain_params().params.bip34_height;
// If bip34 is active, check that the encoded block height is correct
if height >= bip34_height && self.get_bip34_height(block) != Some(height) {
return Err(BlockValidationErrors::BadBip34)?;
}
if !block.check_witness_commitment() {
return Err(BlockValidationErrors::BadWitnessCommitment)?;
}
if block.weight().to_wu() > 4_000_000 {
return Err(BlockValidationErrors::BlockTooBig)?;
}
// Validate block transactions
let subsidy = read_lock!(self).consensus.get_subsidy(height);
let verify_script = self.verify_script(height);
#[cfg(feature = "bitcoinconsensus")]
let flags = self.get_validation_flags(height, block.header.block_hash());
#[cfg(not(feature = "bitcoinconsensus"))]
let flags = 0;
Consensus::verify_block_transactions(
height,
inputs,
&block.txdata,
subsidy,
verify_script,
flags,
)?;
Ok(())
}
In order, we do the following things:
- Call check_merkle_root on the block, to check that the merkle root commits to all the transactions.
- Check that, if the height is greater than or equal to that of the BIP 34 activation, the coinbase transaction encodes the height as specified.
- Call check_witness_commitment, to check that the wtxid merkle root is included in the coinbase transaction as per BIP 141.
- Finally, check that the block weight doesn't exceed the 4,000,000 weight unit limit.

Lastly, we go on to validate the transactions. We retrieve the current subsidy (the newly generated coins) with the get_subsidy method on our Consensus struct. We also call the verify_script method, which returns a boolean flag indicating whether we are NOT inside the Assume-Valid range.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
fn verify_script(&self, height: u32) -> bool {
let inner = self.inner.read();
match inner.assume_valid {
Some(hash) => {
match inner.chainstore.get_header(&hash).unwrap() {
// If the assume-valid block is in the best chain, only verify scripts if we are higher
Some(DiskBlockHeader::HeadersOnly(_, assume_h))
| Some(DiskBlockHeader::FullyValid(_, assume_h)) => height > assume_h,
// Assume-valid is not in the best chain, so verify all the scripts
_ => true,
}
}
None => true,
}
}
We also get the validation flags for the current height with get_validation_flags, but only if the bitcoinconsensus feature is active. These flags are used to validate transactions taking into account the different consensus rules that have been added over time.
And lastly, we call the Consensus::verify_block_transactions associated function.
Recap
In this chapter we have seen many of the methods for the ChainState
type, mainly related to the chain validation and state transition process.
We started with accept_header
, that checked if the header was in the database or not. If it was in the database we just called maybe_reindex
. If it was not, we validated it and potentially updated the chain tip, either by extending it or by reorging. We also called ChainStore::save_header
and ChainStore::update_block_index
.
Then we saw connect_block, which validated the next block in the chain with validate_block_no_acc. If the block was valid, we optionally invoked UpdatableChainstate::flush to persist all data (including the data saved with ChainStore::save_roots and ChainStore::save_height) and marked the disk header as FullyValid. Finally, we notified subscribers of the new block.
Consensus and bitcoinconsensus
In the previous chapter, we saw that the block validation process involves two associated functions from Consensus
:
verify_block_transactions
: The last check performed insidevalidate_block_no_acc
, after having validated the two merkle roots and height commitment.update_acc
: Called insideconnect_block
, just aftervalidate_block_no_acc
, to verify the utreexo proof and get the updated accumulator.
The Consensus
struct only holds a parameters
field (as we saw when we initialized ChainStateInner) and provides a few core consensus functions. In this chapter we are going to see the two mentioned functions and discuss the details of how we verify scripts.
Filename: pruned_utreexo/consensus.rs
// Path: floresta-chain/src/pruned_utreexo/consensus.rs
pub struct Consensus {
// The chain parameters are in the chainparams.rs file
pub parameters: ChainParams,
}
bitcoinconsensus
Consensus::verify_block_transactions
is a critical part of Floresta, as it validates all the transactions in a block. One of the hardest parts of validation is checking script satisfiability, that is, verifying whether the inputs can indeed spend the coins. It's also the most resource-intensive part of block validation, as it requires verifying many digital signatures.
Implementing a Bitcoin script interpreter is challenging, and given the complexity of both C++ and Rust, we cannot be certain that a Rust reimplementation would always behave exactly like Bitcoin Core's. This is problematic because if our Rust implementation rejects a script that Bitcoin Core accepts, our node will fork from the network: it will treat subsequent blocks as invalid, halting synchronization with the chain and losing the ability to keep tracking the user balance.
Partly for this reason, in 2015 the script validation logic of Bitcoin Core was extracted and placed into the libbitcoin-consensus library. This library includes 35 files that are identical to those of Bitcoin Core. Subsequently, the library API was bound to Rust in rust-bitcoinconsensus, which serves as the bitcoinconsensus feature-dependency in the bitcoin crate.
If this feature is set, bitcoin
provides a verify_with_flags
method on Transaction
, which performs the script validation by calling C++ code extracted from Bitcoin Core
. Floresta uses this method to verify scripts.
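To illustrate the call shape, here is a hypothetical check_scripts wrapper (a sketch, assuming a transaction and a map holding the TxOuts it spends; the closure hands each spent output over to the C++ code):
// Sketch: script validation via the C++ bindings (not Floresta code)
#[cfg(feature = "bitcoinconsensus")]
fn check_scripts(
    tx: &bitcoin::Transaction,
    utxos: &mut std::collections::HashMap<bitcoin::OutPoint, bitcoin::TxOut>,
    flags: core::ffi::c_uint,
) -> Result<(), String> {
    // The closure yields (and consumes) the TxOut each input spends
    tx.verify_with_flags(|outpoint| utxos.remove(outpoint), flags)
        .map_err(|e| e.to_string())
}
We will see Floresta's actual call to verify_with_flags later in this chapter.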
libbitcoinkernel
bitcoinconsensus handles only script validation and is maintained as a separate project from Bitcoin Core, with limited upkeep. To address these shortcomings there's an ongoing effort within the Bitcoin Core community to extract the whole consensus engine into a library. This is known as the libbitcoinkernel project. Once this is achieved, we should be able to drop all the consensus code in Floresta and replace the bitcoinconsensus dependency with the Rust bindings for the new library. This would make Floresta safer and more reliable as a full node.
Transaction Validation
Let's now dive into Consensus::verify_block_transactions
, to see how we verify the transactions in a block. As we saw in the Block Validation section from last chapter, this function takes the height, the UTXOs to spend, the spending transactions, the current subsidy, the verify_script boolean (which is only true when we are not inside the Assume-Valid range) and the validation flags.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
pub fn validate_block_no_acc(
&self,
block: &Block,
height: u32,
inputs: HashMap<OutPoint, TxOut>,
) -> Result<(), BlockchainError> {
if !block.check_merkle_root() {
return Err(BlockValidationErrors::BadMerkleRoot)?;
}
let bip34_height = self.chain_params().params.bip34_height;
// If bip34 is active, check that the encoded block height is correct
if height >= bip34_height && self.get_bip34_height(block) != Some(height) {
return Err(BlockValidationErrors::BadBip34)?;
}
if !block.check_witness_commitment() {
return Err(BlockValidationErrors::BadWitnessCommitment)?;
}
if block.weight().to_wu() > 4_000_000 {
return Err(BlockValidationErrors::BlockTooBig)?;
}
// Validate block transactions
let subsidy = read_lock!(self).consensus.get_subsidy(height);
let verify_script = self.verify_script(height);
// ...
#[cfg(feature = "bitcoinconsensus")]
let flags = self.get_validation_flags(height, block.header.block_hash());
#[cfg(not(feature = "bitcoinconsensus"))]
let flags = 0;
Consensus::verify_block_transactions(
height,
inputs,
&block.txdata,
subsidy,
verify_script,
flags,
)?;
Ok(())
}
Validation Flags
The validation flags are returned by get_validation_flags based on the current height and block hash, and they are of type core::ffi::c_uint: a foreign function interface type used for the C++ bindings.
// Path: floresta-chain/src/pruned_utreexo/chain_state.rs
// Omitted: impl<PersistedState: ChainStore> ChainState<PersistedState> {
fn get_validation_flags(&self, height: u32, hash: BlockHash) -> c_uint {
let chain_params = &read_lock!(self).consensus.parameters;
if let Some(flag) = chain_params.exceptions.get(&hash) {
return *flag;
}
// From Bitcoin Core:
// BIP16 didn't become active until Apr 1 2012 (on mainnet, and
// retroactively applied to testnet)
// However, only one historical block violated the P2SH rules (on both
// mainnet and testnet).
// Similarly, only one historical block violated the TAPROOT rules on
// mainnet.
// For simplicity, always leave P2SH+WITNESS+TAPROOT on except for the two
// violating blocks.
let mut flags = bitcoinconsensus::VERIFY_P2SH | bitcoinconsensus::VERIFY_WITNESS;
if height >= chain_params.params.bip65_height {
flags |= bitcoinconsensus::VERIFY_CHECKLOCKTIMEVERIFY;
}
if height >= chain_params.params.bip66_height {
flags |= bitcoinconsensus::VERIFY_DERSIG;
}
if height >= chain_params.csv_activation_height {
flags |= bitcoinconsensus::VERIFY_CHECKSEQUENCEVERIFY;
}
if height >= chain_params.segwit_activation_height {
flags |= bitcoinconsensus::VERIFY_NULLDUMMY;
}
flags
}
The flags cover the following consensus rules added to Bitcoin over time:
- P2SH (BIP 16): Activated at height 173,805
- Enforce strict DER signatures (BIP 66): Activated at height 363,725
- CHECKLOCKTIMEVERIFY (BIP 65): Activated at height 388,381
- CHECKSEQUENCEVERIFY (BIP 112): Activated at height 419,328
- Segregated Witness (BIP 141) and Null Dummy (BIP 147): Activated at height 481,824
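For instance, a mainnet block at height 400,000 (after BIP 65 and BIP 66, but before CSV and SegWit) gets P2SH, WITNESS, DERSIG and CHECKLOCKTIMEVERIFY. The following standalone sketch mirrors the logic above, with the mainnet heights hardcoded for illustration (the two exception blocks omitted):
// Sketch: flag selection for mainnet heights (not Floresta code)
use core::ffi::c_uint;

fn mainnet_validation_flags(height: u32) -> c_uint {
    // P2SH and WITNESS are always on, except for the two exception blocks
    let mut flags = bitcoinconsensus::VERIFY_P2SH | bitcoinconsensus::VERIFY_WITNESS;
    if height >= 363_725 {
        flags |= bitcoinconsensus::VERIFY_DERSIG; // BIP 66
    }
    if height >= 388_381 {
        flags |= bitcoinconsensus::VERIFY_CHECKLOCKTIMEVERIFY; // BIP 65
    }
    if height >= 419_328 {
        flags |= bitcoinconsensus::VERIFY_CHECKSEQUENCEVERIFY; // BIP 112
    }
    if height >= 481_824 {
        flags |= bitcoinconsensus::VERIFY_NULLDUMMY; // BIP 147
    }
    flags
}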
Verify Block Transactions
Now, the Consensus::verify_block_transactions
function has this body, which in turn calls Consensus::verify_transaction
:
Filename: pruned_utreexo/consensus.rs
// Path: floresta-chain/src/pruned_utreexo/consensus.rs
// Omitted: impl Consensus {
/// Verify if all transactions in a block are valid. Here we check the following:
/// - The block must contain at least one transaction, and this transaction must be coinbase
/// - The first transaction in the block must be coinbase
/// - The coinbase transaction must have the correct value (subsidy + fees)
/// - The block must not create more coins than allowed
/// - All transactions must be valid:
/// - The transaction must not be coinbase (already checked)
/// - The transaction must not have duplicate inputs
/// - The transaction must not spend more coins than it claims in the inputs
/// - The transaction must have valid scripts
#[allow(unused)]
pub fn verify_block_transactions(
height: u32,
mut utxos: HashMap<OutPoint, TxOut>,
transactions: &[Transaction],
subsidy: u64,
verify_script: bool,
flags: c_uint,
) -> Result<(), BlockchainError> {
// Blocks must contain at least one transaction (i.e. the coinbase)
if transactions.is_empty() {
return Err(BlockValidationErrors::EmptyBlock)?;
}
// Total block fees that the miner can claim in the coinbase
let mut fee = 0;
for (n, transaction) in transactions.iter().enumerate() {
if n == 0 {
if !transaction.is_coinbase() {
return Err(BlockValidationErrors::FirstTxIsNotCoinbase)?;
}
// Check coinbase input and output script limits
Self::verify_coinbase(transaction)?;
// Skip next checks: coinbase input is exempt, coinbase reward checked later
continue;
}
// Actually verify the transaction
let (in_value, out_value) =
Self::verify_transaction(transaction, &mut utxos, verify_script, flags)?;
// Fee is the difference between inputs and outputs
fee += in_value - out_value;
}
// Check coinbase output values to ensure the miner isn't producing excess coins
let allowed_reward = fee + subsidy;
let coinbase_total: u64 = transactions[0]
.output
.iter()
.map(|out| out.value.to_sat())
.sum();
if coinbase_total > allowed_reward {
return Err(BlockValidationErrors::BadCoinbaseOutValue)?;
}
Ok(())
}
/// Verifies a single transaction. This function checks the following:
/// - The transaction doesn't spend more coins than it claims in the inputs
/// - The transaction doesn't create more coins than allowed
/// - The transaction has valid scripts
/// - The transaction doesn't have duplicate inputs (implicitly checked by the hashmap)
fn verify_transaction(
transaction: &Transaction,
utxos: &mut HashMap<OutPoint, TxOut>,
_verify_script: bool,
_flags: c_uint,
) -> Result<(u64, u64), BlockchainError> {
let txid = || transaction.compute_txid();
let out_value: u64 = transaction
.output
.iter()
.map(|out| out.value.to_sat())
.sum();
let mut in_value = 0;
for input in transaction.input.iter() {
let txo = Self::get_utxo(input, utxos, txid)?;
in_value += txo.value.to_sat();
// Check script sizes (spent txo pubkey, and current tx scriptsig and TODO witness)
Self::validate_script_size(&txo.script_pubkey, || input.previous_output.txid)?;
Self::validate_script_size(&input.script_sig, txid)?;
// TODO check also witness script size
}
// Value in should be greater or equal to value out. Otherwise, inflation.
if out_value > in_value {
return Err(tx_err!(txid, NotEnoughMoney))?;
}
// Sanity check
if out_value > 21_000_000 * COIN_VALUE {
return Err(BlockValidationErrors::TooManyCoins)?;
}
// Verify the tx script
#[cfg(feature = "bitcoinconsensus")]
if _verify_script {
transaction
.verify_with_flags(|outpoint| utxos.remove(outpoint), _flags)
.map_err(|e| tx_err!(txid, ScriptValidationError, e.to_string()))?;
};
Ok((in_value, out_value))
}
In general the function behavior is well explained in the comments. Something to note is that we need the bitcoinconsensus
feature set in order to use verify_with_flags
, to verify the transaction scripts. If it's not set we won't perform script validation, so bitcoinconsensus
should probably be mandatory, not just opt-in.
We also don't validate if verify_script
is false, but this is because the Assume-Valid
process has already assessed the scripts as valid.
Note that these consensus checks are far from complete. More checks will be added in the short term, but once libbitcoinkernel
bindings are ready this function will instead use them.
Utreexo Validation
In the previous section we have seen the Consensus::verify_block_transactions
function. It takes a utxos argument, used to verify that each transaction input satisfies the script of the UTXO it spends, and that transactions spend no more than the sum of their input amounts.
However, we have yet to verify that these utxos
actually exist in the UTXO set, i.e. check that nobody is spending coins out of thin air. That's what we are going to do inside Consensus::update_acc
, and get the updated UTXO set accumulator, with spent UTXOs removed and new ones added.
Recall that Stump is the type of our accumulator, coming from the rustreexo crate. Stump represents the merkle roots of a forest where leaves are UTXO hashes.
Figure 4: A visual depiction of the utreexo forest. To prove that UTXO 4
is part of the set we provide the hash of UTXO 3
and h1
. With this data we can re-compute the h5
root, which must be identical. Credit: original utreexo post.
In the function we get the new leaf hashes (the hashes of newly created UTXOs in the block) by calling udata::proof_util::get_block_adds. This function returns the new leaves to add to the accumulator, excluding two cases:
- Created UTXOs that are provably unspendable (e.g. an OP_RETURN output or any output with a script larger than 10,000 bytes).
- Created UTXOs spent within the same block.
Finally, we get the updated Stump using its modify method, providing the leaves to add, the leaves to remove, and the inclusion proof for the latter. This method both verifies the proof and generates the new accumulator.
// Path: floresta-chain/src/pruned_utreexo/consensus.rs
// Omitted: impl Consensus {
pub fn update_acc(
acc: &Stump,
block: &Block,
height: u32,
proof: Proof,
del_hashes: Vec<sha256::Hash>,
) -> Result<Stump, BlockchainError> {
let block_hash = block.block_hash();
// Convert to BitcoinNodeHashes, from rustreexo
let del_hashes: Vec<_> = del_hashes.into_iter().map(Into::into).collect();
let adds = udata::proof_util::get_block_adds(block, height, block_hash);
// Update the accumulator
let acc = acc.modify(&adds, &del_hashes, &proof)?.0;
Ok(acc)
}
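As a usage sketch, this is roughly how a brand-new accumulator would receive its first leaves, assuming the rustreexo API as it appears above (when nothing is deleted, an empty proof is enough):
// Sketch: adding leaves to an empty accumulator (not Floresta code)
use rustreexo::accumulator::node_hash::BitcoinNodeHash;
use rustreexo::accumulator::proof::Proof;
use rustreexo::accumulator::stump::Stump;

// Start from the empty accumulator (no UTXOs yet)
let acc = Stump::new();
// In practice, these would be the hashes of the block's created UTXOs
let adds: Vec<BitcoinNodeHash> = Vec::new();
// Nothing to delete, so the proof is empty as well
let (new_acc, _update_data) = acc
    .modify(&adds, &[], &Proof::default())
    .expect("an empty deletion proof is always valid");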
Advanced Chain Validation Methods
Before moving on from floresta-chain
to floresta-wire
, it's time to understand some advanced validation methods and another Chain
backend provided by Floresta: the PartialChainState
.
Although similar to ChainState
, PartialChainState
focuses on validating a limited range of blocks rather than the entire chain. This design is closely linked to concepts like UTXO snapshots (precomputed UTXO states at specific blocks) and, in particular, out-of-order validation, which we’ll explore below.
Out-of-Order Validation
One of the most powerful features enabled by utreexo is out-of-order validation, which allows block intervals to be validated independently if we know the utreexo roots at the start of each interval.
In traditional IBDs, block validation is inherently sequential: a block at height h
depends on the UTXO set resulting from block h - 1
, which in turn depends on h - 2
, and so forth. However, with UTXO set snapshots for specific blocks, validation can become non-linear.
Figure 5: Visual explanation of three block intervals, starting at block 1
, 100,001
, and 200,001
, that can be validated in parallel if we have the UTXO sets for those blocks. Credit: original post from Calvin Kim.
This process remains fully trustless because, at the end, we verify that the resulting UTXO set from one interval matches the UTXO set snapshot used to start the next. For example, in the image, the UTXO set after block 100,000
must match the set used for the interval beginning at block 100,001
, and so on.
Ultimately, the sequential nature of block validation is preserved. The worst outcome is wasting resources if the UTXO snapshots are incorrect, so it's still important to obtain these snapshots from a reliable source, such as hardcoded values within the software or reputable peers.
Out-of-Order Validation Without Utreexo
Out-of-order validation is technically possible without utreexo, but it would require entire UTXO sets for each interval, which would take many gigabytes.
Utreexo makes this feasible with compact accumulators, avoiding the need for full UTXO set storage and frequent disk reads. Instead, spent UTXOs are fetched on demand from the network, along with their inclusion proofs.
Essentially, we are trading disk operations for hash computations (by verifying merkle proofs and updating roots), along with a slightly higher network data demand. In other words, utreexo enables parallel validation while avoiding the bottleneck of slow disk access.
Trusted UTXO Set Snapshots
A related but slightly different concept is the Assume-Utxo
feature in Bitcoin Core
, which hardcodes a trusted, recent UTXO set hash. When a new node starts syncing, it downloads the corresponding UTXO set from the network, verifies its hash against the hardcoded value, and temporarily assumes it to be valid. Starting from this snapshot, the node can quickly sync to the chain tip (e.g., if the snapshot is from block 850,000
and the tip is at height 870,000
, only 20,000 blocks need to be validated to get a synced node).
This approach bypasses most of the IBD time, enabling rapid node synchronization while still silently completing IBD in the background to fully validate the UTXO set snapshot. It builds on the Assume-Valid
concept, relying on the open-source process to ensure the correctness of hardcoded UTXO set hashes.
This idea, adapted to Floresta, is what we call Assume-Utreexo
, a hardcoded UTXO snapshot in the form of utreexo roots. These hardcoded values are located in pruned_utreexo/chainparams.rs, alongside the Assume-Valid
hashes.
// Path: floresta-chain/src/pruned_utreexo/chainparams.rs
impl ChainParams {
pub fn get_assume_utreexo(network: Network) -> AssumeUtreexoValue {
let genesis = genesis_block(Params::new(network.into()));
match network {
Network::Bitcoin => AssumeUtreexoValue {
block_hash: bhash!(
"00000000000000000000569f4d863c27e667cbee8acc8da195e7e5551658e6e9"
),
height: 855571,
roots: [
// Hardcoded roots are here
"4dcc014cc23611dda2dcf0f34a3e62e7d302146df4b0b01ac701d440358c19d6",
"988e0a883e4ad0c5559432f4747395115112755ec1138dcdd62e2f5741c31c2c",
"49ecba683e12823d44f2ad190120d3028386d8bb7860a3eea62a250a1f293c60",
"7c02e55ae35f12501134f0b81a351abb6c5e7a2529641d0c537a7534a560c770",
"59cb07c73d71164ce1a4f953cfd01ef0e3269080e29d34022d4251523cb1e8ac",
"ff96c9983b6765092403f8089fe5d0cdd6a94c58e4dcd14e77570c8b10c17628",
"47ed934529b2ea03a7382febcf0c05e0bfc5884cc1235c2ad42624a56234b9a6",
"d5c9373ed35de281d426888bd656f04a36623197a33706932ab82014d67f26ae",
"05de50991df991f0b78d9166d06ce3c61cb29e07dc7c53ba75d75df6455e6967",
"ebfdaf53b7240e9cd25d7c63b35d462763253f9282cc97d8d0c92ea9ade6aa02",
"c349b6850f75346224cf7cf1e0a69e194306c59489017cd4f4a045c001f1fefc",
"7edfd925905e88fd14c47edaaf09606cf0ae19f3b898239a2feb607d175d9a90",
"442dadd38fd16949d2ef03d799aa6b61ad8c0b7c611aaa5e218bc6360c4f41ce",
"2a57b73e540c7a72cb44fdc4ab7fcc3f0f148be7885667f07fce345430f08a15",
"66dc66000a8baaacacef280783a0245b4d33bd7eba5f1f14b939bd3a54e135cb",
"67ba89afe6bce9bafbf0b88013e4446c861e6c746e291c3921e0b65c93671ba3",
"972ea2c7472c22e4eab49e9c2db5757a048b271b6251883ce89ccfeaa38b47ab",
]
.map(|s| s.parse().unwrap())
.to_vec(),
leaves: 2587882501,
},
Network::Testnet => AssumeUtreexoValue {
// ...
block_hash: genesis.block_hash(),
height: 0,
leaves: 0,
roots: Vec::new(),
},
Network::Signet => AssumeUtreexoValue {
// ...
block_hash: genesis.block_hash(),
height: 0,
leaves: 0,
roots: Vec::new(),
},
Network::Regtest => AssumeUtreexoValue {
// ...
block_hash: genesis.block_hash(),
height: 0,
leaves: 0,
roots: Vec::new(),
},
}
}
// ...
pub fn get_assume_valid(network: Network, arg: AssumeValidArg) -> Option<BlockHash> {
match arg {
AssumeValidArg::Disabled => None,
AssumeValidArg::UserInput(hash) => Some(hash),
AssumeValidArg::Hardcoded => match network {
Network::Bitcoin => Some(bhash!(
"00000000000000000000569f4d863c27e667cbee8acc8da195e7e5551658e6e9"
)),
Network::Testnet => Some(bhash!(
"000000000000001142ad197bff16a1393290fca09e4ca904dd89e7ae98a90fcd"
)),
Network::Signet => Some(bhash!(
"0000003ed17b9c93954daab00d73ccbd0092074c4ebfc751c7458d58b827dfea"
)),
Network::Regtest => Some(bhash!(
"0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206"
)),
},
}
}
}
If you click see more, you will notice we have 17 utreexo roots there, and this is the accumulator for more than 2 billion UTXOs!
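This is no coincidence: a utreexo forest keeps one perfect merkle tree per set bit of the leaf count, so the number of roots equals the number of one-bits in that count. We can check it:
// 2,587,882,501 in binary has exactly 17 one-bits, matching the 17 roots
assert_eq!(2_587_882_501u64.count_ones(), 17);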
PoW Fraud Proofs Sync
PoW Fraud Proofs Sync is yet another technique to speed up node synchronization, devised by Ruben Somsen. It is similar in nature to running a light or SPV client, but with almost the security of a full node. This is the most powerful IBD optimization that Floresta implements, alongside Assume-Utreexo.
The idea underlying this type of sync is that we can treat blockchain forks as potential fraud proofs. If a miner creates an invalid block (violating consensus rules), honest miners will not mine on top of such a block. Instead, they will fork the chain by mining an alternative, valid block at the same height.
As long as a small fraction of miners remains honest and produces at least one block, a non-validating observer can interpret blockchain forks as indicators of potentially invalid blocks, and will always catch any invalid block.
The PoW Fraud Proof sync process begins by identifying the most PoW chain, which only requires downloading block headers:
- If no fork is found, the node assumes the most PoW chain is valid and begins validating blocks starting close to the chain tip.
- If a fork is found, this suggests a potential invalid block in the most PoW chain (prompting honest miners to fork away). The node downloads and verifies the disputed block, which requires using the UTXO accumulator for that block. If valid, the node continues following the most PoW chain; if invalid, it switches to the alternative branch.
This method almost entirely bypasses IBD verification while maintaining security. It relies on a small minority of honest hashpower (e.g., ~1%) to fork away from invalid chains, which we use to detect the invalid blocks.
In short, PoW Fraud Proofs Sync requires at least some valid blocks to be produced for invalid ones to be detected, whereas a regular full node, by validating every block, can detect invalid blocks even with 0% honest miners (though in that extreme case, the entire network would be in serious trouble 😄).
Hence, a PoW Fraud Proof synced node is vulnerable only when the Bitcoin chain is halted for an extended period of time, which would be catastrophic anyway. Check out this blog post by Davidson for more details.
Using PartialChainState as Chain Backend
As you may be guessing, PartialChainState
is an alternative chain validation backend made to leverage the parallel sync feature we have described (out-of-order validation).
In the Chain Backend API section from Chapter 1 we saw that the UpdatableChainstate
trait requires a get_partial_chain
method.
// Path: floresta-chain/src/pruned_utreexo/mod.rs
pub trait UpdatableChainstate {
fn connect_block(
&self,
block: &Block,
proof: Proof,
inputs: HashMap<OutPoint, TxOut>,
del_hashes: Vec<sha256::Hash>,
) -> Result<u32, BlockchainError>;
fn switch_chain(&self, new_tip: BlockHash) -> Result<(), BlockchainError>;
fn accept_header(&self, header: BlockHeader) -> Result<(), BlockchainError>;
fn handle_transaction(&self) -> Result<(), BlockchainError>;
fn flush(&self) -> Result<(), BlockchainError>;
fn toggle_ibd(&self, is_ibd: bool);
fn invalidate_block(&self, block: BlockHash) -> Result<(), BlockchainError>;
fn mark_block_as_valid(&self, block: BlockHash) -> Result<(), BlockchainError>;
fn get_root_hashes(&self) -> Vec<BitcoinNodeHash>;
// ...
fn get_partial_chain(
&self,
initial_height: u32,
final_height: u32,
acc: Stump,
) -> Result<PartialChainState, BlockchainError>;
// ...
fn mark_chain_as_assumed(&self, acc: Stump, tip: BlockHash) -> Result<bool, BlockchainError>;
}
The arguments that the method takes are indeed the block interval to validate and the accumulator at the start of the interval.
Just like ChainState
, PartialChainState
wraps an inner type which holds the actual data. However, instead of maintaining synchronization primitives, PartialChainState
assumes that only a single worker (thread or async task) will hold ownership at any given time. This design expects workers to operate independently, with each validating its assigned partial chain. Once all partial chains are validated, we can transition to the ChainState
backend.
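To make the workflow concrete, here is a sketch of how a coordinator could drive two intervals in parallel. Here validate_interval is a hypothetical helper that would feed each block and its utreexo proof to the partial chain, and acc_0 and acc_100k are assumed trusted accumulators for heights 0 and 100,000:
// Sketch: out-of-order validation with two partial chains (not Floresta code)
let first = chain.get_partial_chain(0, 100_000, acc_0)?;
let second = chain.get_partial_chain(100_000, 200_000, acc_100k)?;

// Each worker takes exclusive ownership of its PartialChainState,
// so no locking is required
let worker1 = std::thread::spawn(move || validate_interval(first));
let worker2 = std::thread::spawn(move || validate_interval(second));
worker1.join().unwrap();
worker2.join().unwrap();

// Before switching to the ChainState backend, we would check that no
// interval recorded a validation error (e.g. via has_invalid_blocks)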
Filename: pruned_utreexo/partial_chain.rs
// Path: floresta-chain/src/pruned_utreexo/partial_chain.rs
pub(crate) struct PartialChainStateInner {
/// The current accumulator state, it starts with a hardcoded value and
/// gets checked against the result of the previous partial chainstate.
pub(crate) current_acc: Stump,
/// The block headers in this interval, we need this to verify the blocks
/// and to build the accumulator. We assume this is sorted by height, and
/// should contain all blocks in this interval.
pub(crate) blocks: Vec<BlockHeader>,
/// The height this interval starts at. This [initial_height, final_height), so
/// if we break the interval at height 100, the first interval will be [0, 100)
/// and the second interval will be [100, 200). And the initial height of the
/// second interval will be 99.
pub(crate) initial_height: u32,
/// The height we are on right now, this is used to keep track of the progress
/// of the sync.
pub(crate) current_height: u32,
/// The height we are syncing up to, trying to push more blocks than this will
/// result in an error.
pub(crate) final_height: u32,
/// The error that occurred during validation, if any. It is here so we can
/// pull that afterwords.
pub(crate) error: Option<BlockValidationErrors>,
/// The consensus parameters, we need this to validate the blocks.
pub(crate) consensus: Consensus,
/// Whether we assume the signatures in this interval as valid, this is used to
/// speed up syncing, by assuming signatures in old blocks are valid.
pub(crate) assume_valid: bool,
}
We can see that PartialChainStateInner
has an assume_valid
field. By combining the parallel sync with Assume-Valid
we get a huge IBD speedup, with virtually no security trade-off. Most of the expensive script validations are skipped, while the remaining checks are performed in parallel and without disk access. In this IBD configuration, the primary bottleneck is likely network latency.
In the pruned_utreexo/partial_chain.rs file, we also find the BlockchainInterface
and UpdatableChainstate
implementations for PartialChainState
. These implementations are similar to those for ChainState
, but many methods remain unimplemented because PartialChainState
is designed specifically for IBD and operates with limited data. For instance:
// Path: floresta-chain/src/pruned_utreexo/partial_chain.rs
fn accept_header(&self, _header: BlockHeader) -> Result<(), BlockchainError> {
unimplemented!("partialChainState shouldn't be used to accept new headers")
}
Finally, there are simple methods to query the validation status of the partial chain:
// Path: floresta-chain/src/pruned_utreexo/partial_chain.rs
/// Returns whether any block inside this interval is invalid
pub fn has_invalid_blocks(&self) -> bool {
self.inner().error.is_some()
}
Moving On
Now that we’ve explored the power of utreexo and how Floresta leverages it, along with:
- The structure of
floresta-chain
, includingChainState
andPartialChainState
. - The use of the
ChainStore
trait forChainState
. - The consensus validation process.
We are now ready to fully delve into floresta-wire
: the chain provider.
UtreexoNode In-Depth
We have already explored most of the floresta-chain
crate! Now, it’s time to dive into the higher-level UtreexoNode
, first introduced in the utreexonode section of Chapter 1.
This chapter focuses entirely on the floresta-wire
crate, so stay tuned!
Revisiting the UtreexoNode Type
The two fields of UtreexoNode
are the NodeCommon<Chain>
struct and the generic Context
.
NodeCommon
represents the core state and data required for node operations, independent of the context. It keeps the Chain
backend, optional compact block filters, the mempool, the UtreexoNodeConfig
and Network
, peer connection data, networking configurations, and time-based events to enable effective node behavior and synchronization.
Filename: floresta-wire/src/p2p_wire/node.rs
// Path: floresta-wire/src/p2p_wire/node.rs
pub struct NodeCommon<Chain: BlockchainInterface + UpdatableChainstate> {
// 1. Core Blockchain and Transient Data
pub(crate) chain: Chain,
pub(crate) blocks: HashMap<BlockHash, (PeerId, UtreexoBlock)>,
pub(crate) mempool: Arc<tokio::sync::Mutex<Mempool>>,
pub(crate) block_filters: Option<Arc<NetworkFilters<FlatFiltersStore>>>,
pub(crate) last_filter: BlockHash,
// 2. Peer Management
pub(crate) peer_id_count: u32,
pub(crate) peer_ids: Vec<u32>,
pub(crate) peers: HashMap<u32, LocalPeerView>,
pub(crate) peer_by_service: HashMap<ServiceFlags, Vec<u32>>,
pub(crate) max_banscore: u32,
pub(crate) address_man: AddressMan,
// 3. Internal Communication
pub(crate) node_rx: UnboundedReceiver<NodeNotification>,
pub(crate) node_tx: UnboundedSender<NodeNotification>,
// 4. Networking Configuration
pub(crate) socks5: Option<Socks5StreamBuilder>,
pub(crate) fixed_peer: Option<LocalAddress>,
// 5. Time and Event Tracking
pub(crate) inflight: HashMap<InflightRequests, (u32, Instant)>,
pub(crate) last_headers_request: Instant,
pub(crate) last_tip_update: Instant,
pub(crate) last_connection: Instant,
pub(crate) last_peer_db_dump: Instant,
pub(crate) last_block_request: u32,
pub(crate) last_get_address_request: Instant,
pub(crate) last_broadcast: Instant,
pub(crate) last_send_addresses: Instant,
pub(crate) block_sync_avg: FractionAvg,
// 6. Configuration and Metadata
pub(crate) config: UtreexoNodeConfig,
pub(crate) datadir: String,
pub(crate) network: Network,
}
pub struct UtreexoNode<Chain: BlockchainInterface + UpdatableChainstate, Context> {
pub(crate) common: NodeCommon<Chain>,
pub(crate) context: Context,
}
On the other hand, the Context
generic in UtreexoNode
will allow the node to implement additional functionality and manage data specific to a particular context. This is explained in the next section.
Although the Context
generic in the type definition is not constrained by any trait, in the UtreexoNode
implementation block (from the same node.rs file), this generic is bound by both a NodeContext
trait and the Default
trait.
// Path: floresta-wire/src/p2p_wire/node.rs
impl<T, Chain> UtreexoNode<Chain, T>
where
T: 'static + Default + NodeContext,
WireError: From<<Chain as BlockchainInterface>::Error>,
Chain: BlockchainInterface + UpdatableChainstate + 'static,
{
// ...
In this implementation block, we also encounter a WireError
, which serves as the unified error type for methods in UtreexoNode
. It must implement the From
trait to convert errors produced by the Chain
backend via the BlockchainInterface
trait.
The WireError type is defined in the p2p_wire/error.rs file and is the primary error type in floresta-wire, alongside PeerError, located in p2p_wire/peer.rs.
How to Access Inner Fields
To avoid repetitively calling self.common.field_name
to access the many inner NodeCommon
fields, UtreexoNode
implements the Deref
and DerefMut
traits. This means that we can access the NodeCommon
fields as if they were fields of UtreexoNode
.
// Path: floresta-wire/src/p2p_wire/node.rs
impl<Chain: BlockchainInterface + UpdatableChainstate, T> Deref for UtreexoNode<Chain, T> {
fn deref(&self) -> &Self::Target {
&self.common
}
type Target = NodeCommon<Chain>;
}
impl<T, Chain: BlockchainInterface + UpdatableChainstate> DerefMut for UtreexoNode<Chain, T> {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.common
}
}
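For example, assuming a &self method on UtreexoNode and the BlockchainInterface::get_height method, both of the following lines are equivalent:
// Without Deref we would need the full path through `common`
let height = self.common.chain.get_height()?;
// With Deref, NodeCommon fields (and their methods) are reached directly
let height = self.chain.get_height()?;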
However, the Context
generic still needs to be accessed explicitly via self.context
.
The Role of UtreexoNode
UtreexoNode
acts as the central task managing critical events like:
- Receiving new blocks: Handling block announcements and integrating them into the blockchain.
- Managing peer connections and disconnections: Establishing, monitoring, and closing connections with other nodes.
- Updating peer addresses: Discovering, maintaining, and sharing network addresses of peers for efficient connectivity.
While the node orchestrates these high-level responsibilities, peer-specific tasks (e.g., handling pings and other peer messages) are delegated to the Floresta Peer
type, ensuring a clean separation of concerns. The Peer
component and its networking functionality will be explored in the next chapter.
In networking, a ping is a message sent between peers to check if they are online and responsive, often used to measure latency or maintain active connections.
In the next section we will understand the NodeContext
trait and the different node Context
s that Floresta implements.
Node Contexts
A Bitcoin client goes through different phases during its lifetime, each with distinct responsibilities and requirements. To manage these phases, Floresta uses node contexts, which encapsulate the specific logic and behavior needed for each phase.
Instead of creating a monolithic UtreexoNode
struct with all possible logic and conditional branches, the design separates shared functionality and phase-specific logic. The base UtreexoNode
struct handles common features, while contexts, passed as generic parameters implementing the NodeContext
trait, handle the specific behavior for each phase. This modular design simplifies the codebase and makes it more maintainable.
Default Floresta Contexts
The three node contexts in Floresta are:
-
ChainSelector: This context identifies the best proof-of-work (PoW) chain by downloading and evaluating multiple candidates. It quickly determines the chain with the highest PoW, as further client operations depend on this selection.
-
SyncNode: Responsible for downloading and verifying all blocks in the selected chain, this context ensures the chain's validity. Although it is computationally expensive and time-consuming, it guarantees that the chain is fully validated.
-
RunningNode: The primary context during normal operation, it starts after
ChainSelector
finishes. This context processes new blocks (even ifSyncNode
is still running) and handles user requests.
The only NodeContext
implementation that is part of the public API of floresta-wire
is the RunningNode
. You can check the exported modules in the lib.rs:
Filename: floresta-wire/src/lib.rs
// Path: floresta-wire/src/lib.rs
#[cfg(not(target_arch = "wasm32"))]
mod p2p_wire;
use bitcoin::block::Header as BlockHeader;
use bitcoin::Block;
use bitcoin::Transaction;
// Module re-exports
#[cfg(not(target_arch = "wasm32"))]
pub use p2p_wire::address_man;
#[cfg(not(target_arch = "wasm32"))]
pub use p2p_wire::mempool;
#[cfg(not(target_arch = "wasm32"))]
pub use p2p_wire::node;
#[cfg(not(target_arch = "wasm32"))]
pub use p2p_wire::node_context;
#[cfg(not(target_arch = "wasm32"))]
pub use p2p_wire::node_interface;
#[cfg(not(target_arch = "wasm32"))]
pub use p2p_wire::running_node;
// Re-export of the configuration struct
pub use p2p_wire::UtreexoNodeConfig;
However, RunningNode
will automatically and internally switch to the other two contexts when the UtreexoNode
is created, and then switch back. Hence, the RunningNode
context is the default and high-level context that runs in florestad
.
The NodeContext Trait
The following is the NodeContext
trait definition, which holds many useful node constants and provides one method. Click see more in the snippet if you would like to see all the comments for each const
.
Filename: p2p_wire/node_context.rs
// Path: floresta-wire/src/p2p_wire/node_context.rs
use bitcoin::p2p::ServiceFlags;
pub trait NodeContext {
const REQUEST_TIMEOUT: u64;
/// Max number of simultaneous connections we initiates we are willing to hold
const MAX_OUTGOING_PEERS: usize = 10;
/// We ask for peers every ASK_FOR_PEERS_INTERVAL seconds
const ASK_FOR_PEERS_INTERVAL: u64 = 60 * 60; // One hour
/// Save our database of peers every PEER_DB_DUMP_INTERVAL seconds
const PEER_DB_DUMP_INTERVAL: u64 = 30; // 30 seconds
/// Attempt to open a new connection (if needed) every TRY_NEW_CONNECTION seconds
const TRY_NEW_CONNECTION: u64 = 10; // 10 seconds
/// If ASSUME_STALE seconds passed since our last tip update, treat it as stale
const ASSUME_STALE: u64 = 15 * 60; // 15 minutes
/// While on IBD, if we've been without blocks for this long, ask for headers again
const IBD_REQUEST_BLOCKS_AGAIN: u64 = 30; // 30 seconds
/// How often we broadcast transactions
const BROADCAST_DELAY: u64 = 30; // 30 seconds
/// Max number of simultaneous inflight requests we allow
const MAX_INFLIGHT_REQUESTS: usize = 1_000;
/// Interval at which we open new feeler connections
const FEELER_INTERVAL: u64 = 30; // 30 seconds
/// Interval at which we rearrange our addresses
const ADDRESS_REARRANGE_INTERVAL: u64 = 60 * 60; // 1 hour
/// How long we ban a peer for
const BAN_TIME: u64 = 60 * 60 * 24;
/// How often we check if we haven't missed a block
const BLOCK_CHECK_INTERVAL: u64 = 60 * 5; // 5 minutes
/// How often we send our addresses to our peers
const SEND_ADDRESSES_INTERVAL: u64 = 60 * 60; // 1 hour
fn get_required_services(&self) -> ServiceFlags {
ServiceFlags::NETWORK
}
}
pub(crate) type PeerId = u32;
The get_required_services
implementation defaults to returning ServiceFlags::NETWORK
, meaning the node is capable of serving the complete blockchain to peers on the network. This is a placeholder implementation, and as we’ll see, each context will provide its own ServiceFlags
based on its role. Similarly, the constants can be customized when implementing NodeContext
for a specific type.
bitcoin::p2p::ServiceFlags is a fundamental type that represents the various services that nodes in the network can offer. Floresta extends this functionality by defining two additional service flags in floresta-common, which can be converted into the ServiceFlags type.
// Path: floresta-common/src/lib.rs
/// Non-standard service flags that aren't in rust-bitcoin yet
pub mod service_flags {
    /// This peer supports UTREEXO messages
    pub const UTREEXO: u64 = 1 << 24;
    /// This peer supports UTREEXO filter messages
    pub const UTREEXO_FILTER: u64 = 1 << 25;
}
Finally, we find the PeerId
type alias that will be used to give each peer a specific identifier. This is all the code in node_context.rs.
Implementation Example
When we implement NodeContext
for any meaningful type, we also implement the context-specific methods for UtreexoNode
. Let's suppose we have a MyContext
type:
// Example implementation for MyContext
impl NodeContext for MyContext {
fn get_required_services(&self) -> bitcoin::p2p::ServiceFlags {
// The node under MyContext can serve the whole blockchain as well
// as segregated witness (SegWit) data to other peers
ServiceFlags::WITNESS | ServiceFlags::NETWORK
}
const REQUEST_TIMEOUT: u64 = 10 * 60; // 10 minutes
}
// The following `impl` block is only for `UtreexoNode`s that use MyContext
impl<Chain> UtreexoNode<Chain, MyContext>
where
WireError: From<<Chain as BlockchainInterface>::Error>,
Chain: BlockchainInterface + UpdatableChainstate + 'static,
{
// Methods for UtreexoNode in MyContext
}
Since UtreexoNode<_, RunningNode>, UtreexoNode<_, SyncNode>, and UtreexoNode<_, ChainSelector> are all entirely different types (because Rust considers each generic parameter combination a separate type), we cannot mistakenly use methods that are not intended for a specific context.
The only shared functionality is that which is implemented generically for T: NodeContext
, which all three contexts satisfy, located in p2p_wire/node.rs (as we have seen in the previous section).
In the next sections within this chapter we will see some of these shared functions and methods implementations.
UtreexoNode Config and Builder
In this section and the next ones within this chapter, we are going to understand some methods implemented generically for UtreexoNode
, that is, methods implemented for UtreexoNode
under T: NodeContext
rather than a specific context type.
While ChainSelector
, SyncNode
, and RunningNode
provide impl
blocks for UtreexoNode
, those methods can only be used in a single context. Here, we will explore the functionality shared across all contexts.
UtreexoNode Builder
Let's start with the builder function, taking a UtreexoNodeConfig
, the Chain
backend, a mempool type, and optional compact block filters (from the floresta-compact-filters
crate).
// Path: floresta-wire/src/p2p_wire/node.rs
impl<T, Chain> UtreexoNode<Chain, T>
where
T: 'static + Default + NodeContext,
WireError: From<<Chain as BlockchainInterface>::Error>,
Chain: BlockchainInterface + UpdatableChainstate + 'static,
{
pub fn new(
config: UtreexoNodeConfig,
chain: Chain,
mempool: Arc<Mutex<Mempool>>,
block_filters: Option<Arc<NetworkFilters<FlatFiltersStore>>>,
) -> Result<Self, WireError> {
let (node_tx, node_rx) = unbounded_channel();
let socks5 = config.proxy.map(Socks5StreamBuilder::new);
let fixed_peer = config
.fixed_peer
.as_ref()
.map(|address| {
Self::resolve_connect_host(address, Self::get_port(config.network.into()))
})
.transpose()?;
Ok(UtreexoNode {
common: NodeCommon {
// Initialization of many fields :P
block_sync_avg: FractionAvg::new(0, 0),
last_filter: chain.get_block_hash(0).unwrap(),
block_filters,
inflight: HashMap::new(),
peer_id_count: 0,
peers: HashMap::new(),
last_block_request: chain.get_validation_index().expect("Invalid chain"),
chain,
peer_ids: Vec::new(),
peer_by_service: HashMap::new(),
mempool,
network: config.network.into(),
node_rx,
node_tx,
address_man: AddressMan::default(),
last_headers_request: Instant::now(),
last_tip_update: Instant::now(),
last_connection: Instant::now(),
last_peer_db_dump: Instant::now(),
last_broadcast: Instant::now(),
blocks: HashMap::new(),
last_get_address_request: Instant::now(),
last_send_addresses: Instant::now(),
datadir: config.datadir.clone(),
socks5,
max_banscore: config.max_banscore,
fixed_peer,
config,
},
context: T::default(),
})
}
// ...
The UtreexoNodeConfig
type outlines all the customizable options for running a UtreexoNode
. It specifies essential settings like the network, connection preferences, and resource management options. It also has a UtreexoNodeConfig::default
implementation. This type is better explained below.
Then, the Mempool
type that we see in the signature is a very simple mempool implementation for broadcasting transactions to the network.
The first line creates an unbounded multi-producer, single-consumer (mpsc) channel, allowing multiple tasks in the UtreexoNode
to send messages (via the sender node_tx
) to a central task that processes them (via the receiver node_rx
). If you are not familiar with channels, there's a section from the Rust book that covers them. Here, we use tokio
channels instead of Rust's standard library channels.
The
unbounded_channel
has no backpressure, meaning producers can keep sending messages without being slowed down, even if the receiver is behind. While this is convenient, it comes with the risk of excessive memory use if messages accumulate faster than they are processed.
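If you haven't used tokio channels before, the pattern is tiny. A self-contained sketch, unrelated to Floresta's actual message types:
// Sketch: a tokio unbounded channel with one producer and one consumer
use tokio::sync::mpsc::unbounded_channel;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = unbounded_channel::<String>();
    // Senders can be cloned freely and moved into other tasks
    tx.send("hello from a producer".to_string()).unwrap();
    // The single consumer awaits incoming messages
    assert_eq!(rx.recv().await.unwrap(), "hello from a producer");
}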
The second line sets up a SOCKS5 proxy connection if a proxy address (config.proxy
) is provided. SOCKS5 is a protocol that routes network traffic through a proxy server, useful for privacy or bypassing network restrictions. The Socks5StreamBuilder
is a wrapper for core::net::SocketAddr
, implemented in p2p_wire/socks.rs. If no proxy is configured, the node will connect directly to the network.
// Path: floresta-wire/src/p2p_wire/socks.rs
impl Socks5StreamBuilder {
pub fn new(address: SocketAddr) -> Self {
Self { address }
}
// ...
Thirdly, we take the config.fixed_peer
, which is an Option<String>
, and convert it into an Option<LocalAddress>
by using the UtreexoNode::resolve_connect_host
function. LocalAddress
is the local representation of a peer address in Floresta, which we will explore in the Address Manager section.
Finally, the function returns the UtreexoNode
with all the NodeCommon
fields initialized and the default value of the passed NodeContext
.
UtreexoNodeConfig
Let's now check what are the customizable options for the node, that is, the UtreexoNodeConfig
struct.
It starts with the network
field (i.e. Bitcoin
, Testnet
, Regtest
or Signet
). Next we find the key options to enable pow_fraud_proofs
for a very fast node sync, which we learned about in the PoW Fraud Proofs Sync section from last chapter, and compact_filters
for lightweight blockchain rescans.
A fixed peer can be specified via fixed_peer
, and settings like max_banscore
, max_outbound
, and max_inflight
allow fine-tuning of peer management, connection limits, and parallel requests.
Filename: p2p_wire/mod.rs
// Path: floresta-wire/src/p2p_wire/mod.rs
pub struct UtreexoNodeConfig {
/// The blockchain we are in, defaults to Bitcoin. Possible values are Bitcoin,
/// Testnet, Regtest and Signet.
pub network: Network,
/// Whether to use PoW fraud proofs. Defaults to false.
///
/// PoW fraud proof is a mechanism to skip the verification of the whole blockchain,
/// but while also giving a better security than simple SPV.
pub pow_fraud_proofs: bool,
/// Whether to use compact filters. Defaults to false.
///
/// Compact filters are useful to rescan the blockchain for a specific address, without
/// needing to download the whole chain. It will download ~1GB of filters, and then
/// download the blocks that match the filters.
pub compact_filters: bool,
/// Fixed peers to connect to. Defaults to None.
///
/// If you want to connect to a specific peer, you can set this to a string with the
/// format `ip:port`. For example, `localhost:8333`.
pub fixed_peer: Option<String>,
/// If a peer misbehaves, we increase its ban score. If the ban score reaches this value,
/// we disconnect from the peer. Defaults to 100.
pub max_banscore: u32,
/// Maximum number of outbound connections. Defaults to 8.
pub max_outbound: u32,
/// Maximum number of inflight requests. Defaults to 10.
///
/// More inflight requests means more memory usage, but also more parallelism.
pub max_inflight: u32,
// ...
/// Data directory for the node. Defaults to `.floresta-node`.
pub datadir: String,
/// A SOCKS5 proxy to use. Defaults to None.
pub proxy: Option<SocketAddr>,
/// If enabled, the node will assume that the provided Utreexo state is valid, and will
/// start running from there
pub assume_utreexo: Option<AssumeUtreexoValue>,
/// If we assumeutreexo or pow_fraud_proof, we can skip the IBD and make our node usable
/// faster, with the tradeoff of security. If this is enabled, we will still download the
/// blocks in the background, and verify the final Utreexo state. So, the worse case scenario
/// is that we are vulnerable to a fraud proof attack for a few hours, but we can spot it
/// and react in a couple of hours at most, so the attack window is very small.
pub backfill: bool,
/// If we are using network-provided block filters, we may not need to download the whole
/// chain of filters, as our wallets may not have been created at the beginning of the chain.
/// With this option, we can make a rough estimate of the block height we need to start
/// and only download the filters from that height.
///
/// If the value is negative, it's relative to the current tip. For example, if the current
/// tip is at height 1000, and we set this value to -100, we will start downloading filters
/// from height 900.
pub filter_start_height: Option<i32>,
/// The user agent that we will advertise to our peers. Defaults to `floresta:<version>`.
pub user_agent: String,
}
impl Default for UtreexoNodeConfig {
fn default() -> Self {
UtreexoNodeConfig {
network: Network::Bitcoin,
pow_fraud_proofs: false,
compact_filters: false,
fixed_peer: None,
max_banscore: 100,
max_outbound: 8,
max_inflight: 10,
datadir: ".floresta-node".to_string(),
proxy: None,
backfill: false,
assume_utreexo: None,
filter_start_height: None,
user_agent: format!("floresta:{}", env!("CARGO_PKG_VERSION")),
}
}
}
Additional configurations include datadir
for specifying the node's data directory and an optional proxy
for connecting through a SOCKS5 proxy. Then we see the assume_utreexo
option, which we also explained in the Trusted UTXO Set Snapshots section from last chapter.
If one of pow_fraud_proofs
or assume_utreexo
is set, the backfill
option enables background and full validation of the chain, which is recommended for security since the node skipped the IBD.
For block filters, filter_start_height
helps optimize downloads by starting from a specific height rather than the chain’s beginning. Lastly, user_agent
allows nodes to customize their identifier when advertising to peers.
This flexible configuration ensures adaptability for various use cases and security levels, from development to production.
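Thanks to the Default implementation, callers only need to name the fields they want to override. A usage sketch:
// Sketch: a mainnet config with filters and PoW fraud proofs enabled
let config = UtreexoNodeConfig {
    compact_filters: true,
    pow_fraud_proofs: true,
    ..UtreexoNodeConfig::default()
};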
Opening Connections
In this section we are finally going to understand how Floresta connects to peers. The highest-level method for opening connections is maybe_open_connection
, which will redirect us to other lower-level functionality. Remember that these methods are context-independent.
When to Open Connections
The maybe_open_connection
method determines whether the node should establish a new connection to a peer and, if so, calls create_connection
.
// Path: floresta-wire/src/p2p_wire/node.rs
pub(crate) async fn maybe_open_connection(&mut self) -> Result<(), WireError> {
// If the user passes in a `--connect` cli argument, we only connect with
// that particular peer.
if self.fixed_peer.is_some() && !self.peers.is_empty() {
return Ok(());
}
// if we need utreexo peers, we can bypass our max outgoing peers limit in case
// we don't have any utreexo peers
let bypass = self
.context
.get_required_services()
.has(service_flags::UTREEXO.into())
&& !self.has_utreexo_peers();
if self.peers.len() < T::MAX_OUTGOING_PEERS || bypass {
self.create_connection(ConnectionKind::Regular).await;
}
Ok(())
}
If the user has specified a fixed peer via the --connect
command-line argument (self.fixed_peer.is_some()
) and there are already connected peers (!self.peers.is_empty()
), the method does nothing and exits early. This is because we have already connected to the fixed peer.
Also, if the node needs utreexo-related services (UTREEXO
service flag) for its specific context, but doesn’t have any peers offering them (!self.has_utreexo_peers()
), it sets a bypass
flag to ignore the usual connection limit.
Finally, if the number of peers is below the maximum allowed (self.peers.len() < T::MAX_OUTGOING_PEERS
) or the bypass
condition is true, it calls the create_connection
method to establish a new 'regular' connection to a peer.
The ConnectionKind enum that create_connection takes as argument is explained below.
Connection Kinds
// Path: floresta-wire/src/p2p_wire/node.rs
pub enum ConnectionKind {
Feeler,
Regular,
Extra,
}
Feeler Connections
Feeler connections are temporary probes used to verify if a peer is still active, regardless of its supported services. These lightweight tests help maintain an up-to-date and reliable pool of peers, ensuring the node can quickly establish connections when needed.
Regular Connections
Regular connections are the backbone of a node's peer-to-peer communication. These connections are established with trusted peers or those that meet specific service criteria (e.g., support for Utreexo or compact filters). Regular connections are long-lived and handle the bulk of the node's operations, such as exchanging blocks, headers, transactions, and keeping the node in sync.
Extra Connections
Extra connections extend the node’s reach by connecting to additional peers for specialized tasks, such as compact filter requests or fetching Utreexo proofs. These are temporary and created only when extra resources are required.
Create Connection
create_connection
gets required services via another method from UtreexoNode
, gets a peer address (prioritizing the fixed peer if specified), and ensures the peer isn’t already connected.
If no fixed peer is specified, we obtain a suitable peer address (or LocalAddress
) for connection by calling self.address_man.get_address_to_connect
. This method takes the required services and a boolean indicating whether a feeler connection is desired. We will explore this method in the next section.
// Path: floresta-wire/src/p2p_wire/node.rs
pub(crate) async fn create_connection(&mut self, kind: ConnectionKind) -> Option<()> {
let required_services = self.get_required_services();
let address = match &self.fixed_peer {
Some(address) => Some((0, address.clone())),
None => self
.address_man
.get_address_to_connect(required_services, matches!(kind, ConnectionKind::Feeler)),
};
debug!(
"attempting connection with address={:?} kind={:?}",
address, kind
);
let (peer_id, address) = address?;
let now = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_secs();
// Defaults to failed, if the connection is successful, we'll update the state
self.address_man
.update_set_state(peer_id, AddressState::Failed(now));
// Don't connect to the same peer twice
if self
.common
.peers
.iter()
.any(|peers| peers.1.address == address.get_net_address())
{
return None;
}
self.open_connection(kind, peer_id, address).await;
Some(())
}
Then we use the obtained LocalAddress
and the peer identifier as arguments for open_connection
, as well as the connection kind.
Both the LocalAddress
type and the get_address_to_connect
method are implemented in the address manager module (p2p_wire/address_man.rs) that we will see in the next section.
Open Connection
Moving on to open_connection, we create a new unbounded_channel for sending requests to the Peer instance. Recall that the Peer component is in charge of actually connecting to the respective peer over the network.
Then, depending on the value of self.socks5, we call either UtreexoNode::open_proxy_connection or UtreexoNode::open_non_proxy_connection. Each of these functions creates a Peer instance with the provided data and the channel receiver.
// Path: floresta-wire/src/p2p_wire/node.rs
pub(crate) async fn open_connection(
&mut self,
kind: ConnectionKind,
peer_id: usize,
address: LocalAddress,
) {
let (requests_tx, requests_rx) = unbounded_channel();
if let Some(ref proxy) = self.socks5 {
spawn(timeout(
Duration::from_secs(10),
Self::open_proxy_connection(
// Arguments omitted for brevity :P
proxy.address,
kind,
self.mempool.clone(),
self.network.into(),
self.node_tx.clone(),
peer_id,
address.clone(),
requests_rx,
self.peer_id_count,
self.config.user_agent.clone(),
),
));
} else {
spawn(timeout(
Duration::from_secs(10),
Self::open_non_proxy_connection(
// Arguments omitted for brevity :P
kind,
peer_id,
address.clone(),
requests_rx,
self.peer_id_count,
self.mempool.clone(),
self.network.into(),
self.node_tx.clone(),
self.config.user_agent.clone(),
),
));
}
let peer_count: u32 = self.peer_id_count;
self.inflight.insert(
InflightRequests::Connect(peer_count),
(peer_count, Instant::now()),
);
self.peers.insert(
peer_count,
LocalPeerView {
// Fields omitted for brevity :P
address: address.get_net_address(),
port: address.get_port(),
user_agent: "".to_string(),
state: PeerStatus::Awaiting,
channel: requests_tx,
services: ServiceFlags::NONE,
_last_message: Instant::now(),
kind,
address_id: peer_id as u32,
height: 0,
banscore: 0,
},
);
self.peer_id_count += 1;
}
Last of all, we simply insert the new inflight request (via the InflightRequests type) into our tracker HashMap, as well as the new peer (via the LocalPeerView). Both types are also defined in p2p_wire/node.rs, along with UtreexoNode, NodeCommon, ConnectionKind, and a few other types.
Recap
In this section, we have learned how Floresta establishes peer-to-peer connections, starting with the maybe_open_connection
method. This method initiates a connection if we aren't already connected to the optional fixed peer and either have fewer connections than Context::MAX_OUTGOING_PEERS
or lack a peer offering utreexo services.
We explored the three connection types: Feeler
(peer availability check), Regular
(core communication), and Extra
(specialized services). The create_connection
method selects an appropriate peer address while preventing duplicate connections, and open_connection
handles the network setup, either via a proxy or directly (internally creating a new Peer
). Finally, we examined how new connections are tracked using inflight requests and a peer registry, both fields of NodeCommon
.
Address Manager
Before diving into the details of P2P networking, let's understand the crucial address manager module.
In the last section, we saw that the create_connection
method uses the self.address_man.get_address_to_connect
method to obtain a suitable address for connection. This method belongs to the AddressMan
struct (a field from NodeCommon
), which is responsible for maintaining a record of known peer addresses and providing them to the node when needed.
Filename: p2p_wire/address_man.rs
// Path: floresta-wire/src/p2p_wire/address_man.rs
pub struct AddressMan {
addresses: HashMap<usize, LocalAddress>,
good_addresses: Vec<usize>,
good_peers_by_service: HashMap<ServiceFlags, Vec<usize>>,
peers_by_service: HashMap<ServiceFlags, Vec<usize>>,
}
This struct is straightforward: it keeps known peer addresses in a HashMap
(as LocalAddress
), the indexes of good peers in the map, and associates peers with the services they support.
Local Address
We have also encountered the LocalAddress
type a few times, which is implemented in this module. This type encapsulates all the information our node knows about each peer, effectively serving as our local representation of the peer's details.
// Path: floresta-wire/src/p2p_wire/address_man.rs
pub struct LocalAddress {
/// An actual address
address: AddrV2,
/// Last time we successfully connected to this peer, in secs, only relevant if state == State::Tried
last_connected: u64,
/// Our local state for this peer, as defined in AddressState
state: AddressState,
/// Network services announced by this peer
services: ServiceFlags,
/// Network port this peers listens to
port: u16,
/// Random id for this peer
pub id: usize,
}
The actual address is stored in the form of an AddrV2
enum, which is implemented by the bitcoin
crate. AddrV2
represents various network address types supported in Bitcoin's P2P protocol, as defined in BIP155.
Concretely, AddrV2 includes variants for IPv4, IPv6, Tor (v2 and v3), I2P, and Cjdns addresses, as well as an Unknown variant for unrecognized address types. This design allows the protocol to handle a diverse set of network addresses.
The LocalAddress also stores the time of the last successful connection, measured in seconds since the UNIX_EPOCH, an AddressState value, the network services announced by the peer, the port the peer listens on, and its identifier.
Below is the definition of AddressState
, which tracks the current status and history of our interactions with this peer:
// Path: floresta-wire/src/p2p_wire/address_man.rs
pub enum AddressState {
/// We never tried this peer before, so we don't know what to expect. This variant
/// also applies to peers that we tried to connect, but failed, or we didn't connect
/// to for a long time.
NeverTried,
/// We tried this peer before, and had success at least once, so we know what to expect
Tried(u64),
/// This peer misbehaved and we banned them
Banned(u64),
/// We are connected to this peer right now
Connected,
/// We tried connecting, but failed
Failed(u64),
}
Get Address to Connect
Let's finally inspect the get_address_to_connect method on AddressMan, which we use to create connections.
This method selects a peer address for a new connection, based on the required services and whether the connection is a feeler. First of all, it returns None if the address manager doesn't have any peers. Otherwise:
- For feeler connections, it randomly picks an address, or returns None if that peer is Banned.
- For regular connections, it prioritizes peers supporting the required services, falling back to a random address. Peers that are Banned or already Connected are excluded, while those in the NeverTried, Tried, or Failed states are considered valid. If no suitable address is found, it returns None.
// Path: floresta-wire/src/p2p_wire/address_man.rs
/// Returns a new random address to open a new connection, we try to get addresses with
/// a set of features supported for our peers
pub fn get_address_to_connect(
&mut self,
required_service: ServiceFlags,
feeler: bool,
) -> Option<(usize, LocalAddress)> {
if self.addresses.is_empty() {
return None;
}
// Feeler connection are used to test if a peer is still alive, we don't care about
// the features it supports or even if it's a valid peer. The only thing we care about
// is that we haven't banned it.
if feeler {
let idx = rand::random::<usize>() % self.addresses.len();
let peer = self.addresses.keys().nth(idx)?;
let address = self.addresses.get(peer)?.to_owned();
if let AddressState::Banned(_) = address.state {
return None;
}
return Some((*peer, address));
};
let (id, peer) = self
.get_address_by_service(required_service)
.or_else(|| self.get_random_address(required_service))?;
match peer.state {
AddressState::Banned(_) | AddressState::Connected => None,
AddressState::NeverTried | AddressState::Tried(_) | AddressState::Failed(_) => {
Some((id, peer))
}
}
}
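For intuition, the non-feeler filtering above can be condensed into a single predicate over the address state. This is a standalone sketch with its own copy of the enum, not Floresta code:

// Standalone sketch mirroring the match at the end of get_address_to_connect
enum AddressState {
    NeverTried,
    Tried(u64),
    Banned(u64),
    Connected,
    Failed(u64),
}

fn is_connectable(state: &AddressState) -> bool {
    // Banned and already-connected peers are excluded; all other states are valid
    !matches!(state, AddressState::Banned(_) | AddressState::Connected)
}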
Dump Peers
Another key functionality implemented in this module is the ability to write the current peer data to a peers.json
file, enabling the node to resume peer connections after a restart without repeating the initial peer discovery process.
To save each LocalAddress
we use a slightly modified type called DiskLocalAddress
, similar to how we used the DiskBlockHeader
type to persist BlockHeader
s.
// Path: floresta-wire/src/p2p_wire/address_man.rs
pub fn dump_peers(&self, datadir: &str) -> std::io::Result<()> {
let peers: Vec<_> = self
.addresses
.values()
.cloned()
.map(Into::<DiskLocalAddress>::into)
.collect::<Vec<_>>();
let peers = serde_json::to_string(&peers);
if let Ok(peers) = peers {
std::fs::write(datadir.to_owned() + "/peers.json", peers)?;
}
Ok(())
}
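Reading the peers back is just the reverse deserialization. The helper below is a hypothetical sketch, not Floresta's actual loading code; it parses the file into a generic serde_json::Value to stay self-contained:

use std::fs;

// Hypothetical sketch: read the dumped peers.json back as generic JSON
fn load_peers(datadir: &str) -> std::io::Result<serde_json::Value> {
    let contents = fs::read_to_string(format!("{datadir}/peers.json"))?;
    serde_json::from_str(&contents).map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))
}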
Similarly, there's a dump_utreexo_peers
method for persisting the utreexo peers into an anchors.json
file. Peers that support utreexo are very valuable for our node; we need their utreexo proofs for validating blocks, and they are rare in the network.
// Path: floresta-wire/src/p2p_wire/address_man.rs
/// Dumps the connected utreexo peers to a file on dir `datadir/anchors.json` in json format `
/// inputs are the directory to save the file and the list of ids of the connected utreexo peers
pub fn dump_utreexo_peers(&self, datadir: &str, peers_id: &[usize]) -> std::io::Result<()> {
// ...
let addresses: Vec<DiskLocalAddress> = peers_id
.iter()
.filter_map(|id| Some(self.addresses.get(id)?.to_owned().into()))
.collect();
let addresses: Result<String, serde_json::Error> = serde_json::to_string(&addresses);
if let Ok(addresses) = addresses {
std::fs::write(datadir.to_owned() + "/anchors.json", addresses)?;
}
Ok(())
}
Great! This concludes the chapter. In the next chapter, we will dive into P2P communication and networking, focusing on the Peer
type.
Peer-to-Peer Networking
In the previous chapter, we learned how UtreexoNode
opens connections, although we didn't dive into the low-level networking details. We mentioned that each peer connection is handled by the Peer
type, keeping the peer networking logic separate from UtreexoNode
.
In this chapter, we will explore the details of Peer
operations, beginning with the low-level logic for opening connections (i.e. Peer
creation).
Peer Creation
Recall that in the open_connection
method on UtreexoNode
we call either UtreexoNode::open_proxy_connection
or UtreexoNode::open_non_proxy_connection
, depending on the self.socks5
proxy option. It's within these two functions that the Peer
is created. Let's first learn how the direct TCP connection is opened!
The open_non_proxy_connection
function will first retrieve the peer's network address and port from the provided LocalAddress
and attempt to establish a TCP connection using TcpStream::connect
(from the tokio
crate). If successful, it enables the TCP_NODELAY
option to reduce latency by disabling Nagle's algorithm.
Nagle's Algorithm is a TCP feature designed to improve network efficiency by combining small data packets into larger ones before sending them. While this reduces overhead, it can introduce delays for latency-sensitive applications. The
nodelay
option disables Nagle's Algorithm, ensuring data is sent immediately without waiting to batch packets, making it ideal for real-time communication.
Next, the function splits the TCP stream into a reader and a writer using tokio::io::split
. The reader, of type ReadHalf
, is used for receiving data, while the writer, of type WriteHalf
, is used for sending data.
It then sets up a TCP stream actor, that is, an independent component that reads incoming messages. The actor is effectively a stream reader wrapper.
// Path: floresta-wire/src/p2p_wire/node.rs
pub(crate) async fn open_non_proxy_connection(
kind: ConnectionKind,
peer_id: usize,
address: LocalAddress,
requests_rx: UnboundedReceiver<NodeRequest>,
peer_id_count: u32,
mempool: Arc<Mutex<Mempool>>,
network: bitcoin::Network,
node_tx: UnboundedSender<NodeNotification>,
user_agent: String,
) -> Result<(), WireError> {
let address = (address.get_net_address(), address.get_port());
let stream = TcpStream::connect(address).await?;
stream.set_nodelay(true)?;
let (reader, writer) = tokio::io::split(stream);
let (cancellation_sender, cancellation_receiver) = tokio::sync::oneshot::channel();
let (actor_receiver, actor) = create_tcp_stream_actor(reader);
tokio::spawn(async move {
tokio::select! {
_ = cancellation_receiver => {}
_ = actor.run() => {}
}
});
// Use create_peer function instead of manually creating the peer
Peer::<WriteHalf>::create_peer(
peer_id_count,
mempool,
network,
node_tx.clone(),
requests_rx,
peer_id,
kind,
actor_receiver,
writer,
user_agent,
cancellation_sender,
)
.await;
Ok(())
}
This actor is obtained via the create_tcp_stream_actor function, implemented in p2p_wire/peer.rs, which returns the actor receiver (to get the peer messages) and the actor instance, of type TcpStreamActor. The actor is spawned as a separate asynchronous task, ensuring it runs independently to handle incoming data.
Very importantly, the actor for a peer must be closed when the connection ends, and this is why we have an additional oneshot channel, used by the Peer type to send a cancellation signal (i.e. "the peer connection is closed, so we don't need to listen to this peer anymore"). The tokio::select macro ensures that the async actor task is dropped whenever a cancellation signal is received from Peer.
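This pattern is easy to reproduce in isolation. Below is a minimal standalone sketch of the same idea, assuming only the tokio crate (it is not Floresta code):

use tokio::sync::oneshot;

// Minimal sketch of the cancellation pattern: select! exits as soon as
// either branch completes, dropping the other future
#[tokio::main]
async fn main() {
    let (cancel_tx, cancel_rx) = oneshot::channel::<()>();
    let task = tokio::spawn(async move {
        tokio::select! {
            _ = cancel_rx => println!("reader task cancelled"),
            _ = async {
                loop {
                    tokio::task::yield_now().await; // stand-in for actor.run()
                }
            } => {}
        }
    });
    let _ = cancel_tx.send(()); // what Peer does when the connection closes
    task.await.unwrap();
}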
Finally, the Peer
instance is created using the Peer::create_peer
function. The communication channels (internal and over the P2P network) that the Peer
uses are:
- The node sender (node_tx): to send messages to UtreexoNode.
- The requests receiver (requests_rx): to receive requests from UtreexoNode that will be sent to the peer.
- The actor_receiver: to receive peer messages.
- The TCP stream writer: to send messages to the peer.
- The cancellation_sender: to close the TCP reader actor.
By the end of this function, a fully initialized Peer
is ready to manage communication with the connected peer via TCP (writing side) and via TcpStreamActor
(reading side), as well as communicating with UtreexoNode
.
Proxy Connection
The open_proxy_connection
is pretty much the same, except we get the TCP stream writer and reader from the proxy connection instead. The proxy setup is handled by the Socks5StreamBuilder::connect
method, implemented in p2p_wire/socks.
// Path: floresta-wire/src/p2p_wire/node.rs
pub(crate) async fn open_proxy_connection(
proxy: SocketAddr,
// ...
kind: ConnectionKind,
mempool: Arc<Mutex<Mempool>>,
network: bitcoin::Network,
node_tx: UnboundedSender<NodeNotification>,
peer_id: usize,
address: LocalAddress,
requests_rx: UnboundedReceiver<NodeRequest>,
peer_id_count: u32,
user_agent: String,
) -> Result<(), Socks5Error> {
let addr = match address.get_address() {
// Convert to a SOCKS5 address
AddrV2::Cjdns(addr) => Socks5Addr::Ipv6(addr),
AddrV2::I2p(addr) => Socks5Addr::Domain(addr.into()),
AddrV2::Ipv4(addr) => Socks5Addr::Ipv4(addr),
AddrV2::Ipv6(addr) => Socks5Addr::Ipv6(addr),
AddrV2::TorV2(addr) => Socks5Addr::Domain(addr.into()),
AddrV2::TorV3(addr) => Socks5Addr::Domain(addr.into()),
AddrV2::Unknown(_, _) => {
return Err(Socks5Error::InvalidAddress);
}
};
let proxy = TcpStream::connect(proxy).await?;
// Set up the SOCKS5 proxy stream
let stream = Socks5StreamBuilder::connect(proxy, addr, address.get_port()).await?;
let (reader, writer) = tokio::io::split(stream);
let (cancellation_sender, cancellation_receiver) = tokio::sync::oneshot::channel();
let (actor_receiver, actor) = create_tcp_stream_actor(reader);
tokio::spawn(async move {
tokio::select! {
_ = cancellation_receiver => {}
_ = actor.run() => {}
}
});
Peer::<WriteHalf>::create_peer(
// Same as before
peer_id_count,
mempool,
network,
node_tx,
requests_rx,
peer_id,
kind,
actor_receiver,
writer,
user_agent,
cancellation_sender,
)
.await;
Ok(())
}
Recap of Channels
Let's do a brief recap of the channels we have opened for internal node message passing:
- Node Channel (Peer -> UtreexoNode): Peer sends via node_tx; UtreexoNode receives via NodeCommon.node_rx.
- Requests Channel (UtreexoNode -> Peer): UtreexoNode sends via each LocalPeerView.channel, stored in NodeCommon.peers; Peer receives via its requests_rx.
- TCP Actor Channel (TcpStreamActor -> Peer): TcpStreamActor sends via actor_sender; Peer receives via actor_receiver.
- Cancellation Signal Channel (Peer -> TCP actor task): Peer sends the signal via cancellation_sender at the end of the connection; the spawned actor task receives it via cancellation_receiver and stops reading from the peer.
UtreexoNode sends requests via the Requests Channel to the Peer component (which then forwards them to the peer via TCP), Peer receives the result or other peer messages via the TCP Actor Channel, and then it notifies UtreexoNode via the Node Channel. When the peer connection is closed, Peer uses the Cancellation Signal Channel so that the TCP actor listening to the peer is closed as well.
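As a minimal illustration of the message-passing style shared by all these channels, here is a standalone sketch, assuming only the tokio crate, of one unbounded channel between a "peer" task and a "node" task (not Floresta code):

use tokio::sync::mpsc::unbounded_channel;

// Standalone sketch of the node <-> peer channel style
#[tokio::main]
async fn main() {
    let (tx, mut rx) = unbounded_channel::<String>();
    // "Peer" side: send a notification to the "node"
    tokio::spawn(async move {
        tx.send("new block from peer".to_string()).unwrap();
    });
    // "Node" side: receive notifications until the sender is dropped
    while let Some(msg) = rx.recv().await {
        println!("node received: {msg}");
    }
}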
Next, we'll explore how messages are read and sent in the P2P network!
Reading and Writing Messages
Let's first learn about the TcpStreamActor
we created just before instantiating the Peer
type, tasked with reading messages from the corresponding peer.
TCP Stream Actor
The TcpStreamActor type is a simple struct that wraps the stream reader and communicates with the Peer via an unbounded channel.
Filename: p2p_wire/peer.rs
// Path: floresta-wire/src/p2p_wire/peer.rs
pub struct TcpStreamActor<T: AsyncRead + Unpin> {
pub stream: T,
pub sender: UnboundedSender<ReaderMessage>,
}
// Path: floresta-wire/src/p2p_wire/peer.rs
pub fn create_tcp_stream_actor(
stream: impl AsyncRead + Unpin,
) -> (
UnboundedReceiver<ReaderMessage>,
TcpStreamActor<impl AsyncRead + Unpin>,
) {
// Open an unbound channel to communicate read peer messages
let (actor_sender, actor_receiver) = unbounded_channel();
// Initialize the actor with the `actor_sender` and the TCP stream reader
let actor = TcpStreamActor {
stream,
sender: actor_sender,
};
// Return the `actor_receiver` (to receive P2P messages from the actor), and the actor
(actor_receiver, actor)
}
This TcpStreamActor
implements a run
method, which independently handles all incoming messages from the corresponding peer (via the TCP stream reader), and sends them to the Peer
type (via the channel).
Note that the messages of the channel between TcpStreamActor and Peer are of type ReaderMessage. Let's briefly look at this type, which is also defined in peer.rs.
// Path: floresta-wire/src/p2p_wire/peer.rs
pub enum ReaderMessage {
Block(UtreexoBlock),
Message(RawNetworkMessage),
Error(PeerError),
}
- UtreexoBlock is a type defined in floresta-chain that wraps the bitcoin::Block type as well as the utreexo data needed for validation (proofs and spent UTXOs).
- RawNetworkMessage is a type from the bitcoin crate (used here for all messages that are not ReaderMessage::Block).
- PeerError is the unified error type for the Peer struct (similar to how WireError is the error type for UtreexoNode).
Below we will see the run
method in action, which listens to the peer via TCP (and sends the read messages to the Peer
component).
Reading Messages
// Path: floresta-wire/src/p2p_wire/peer.rs
pub async fn run(mut self) -> Result<()> {
let err = self.inner().await;
if let Err(err) = err {
self.sender.send(ReaderMessage::Error(err))?;
}
Ok(())
}
The run method simply invokes the inner method, and if it fails, forwards the error to the Peer. Let's see the full inner method.
// Path: floresta-wire/src/p2p_wire/peer.rs
async fn inner(&mut self) -> std::result::Result<(), PeerError> {
loop {
let mut data: Vec<u8> = vec![0; 24];
// Read the header first, so learn the payload size
self.stream.read_exact(&mut data).await?;
let header: P2PMessageHeader = deserialize_partial(&data)?.0;
// Network Message too big
if header.length > (1024 * 1024 * 32) as u32 {
return Err(PeerError::MessageTooBig);
}
data.resize(24 + header.length as usize, 0);
// Read everything else
self.stream.read_exact(&mut data[24..]).await?;
// Intercept block messages
if header._command[0..5] == [0x62, 0x6c, 0x6f, 0x63, 0x6b] {
let mut block_data = vec![0; header.length as usize];
block_data.copy_from_slice(&data[24..]);
let message: UtreexoBlock = deserialize(&block_data)?;
self.sender.send(ReaderMessage::Block(message))?;
}
let message: RawNetworkMessage = deserialize(&data)?;
self.sender.send(ReaderMessage::Message(message))?;
}
}
This method is responsible for continuously reading messages from the TCP stream, processing them, and sending them to the Peer. It first reads a fixed-size header to learn the payload size, validates that size, and then reads the full message.
Special handling is applied to block messages, which are deserialized and sent as ReaderMessage::Block. All other messages are deserialized and sent as ReaderMessage::Message. If an error occurs (e.g., a message too large or a deserialization failure), the loop stops and sends a ReaderMessage::Error.
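For reference, the 24-byte header that inner reads first follows the standard Bitcoin P2P layout: 4 magic bytes, a 12-byte zero-padded ASCII command, a 4-byte little-endian payload length, and a 4-byte checksum. That is why comparing the first five command bytes against [0x62, 0x6c, 0x6f, 0x63, 0x6b] detects block messages: those bytes spell "block" in ASCII. A tiny illustrative check (not Floresta code) over the raw header bytes:

// Illustrative sketch: the command field occupies bytes 4..16 of the raw
// header, so the ASCII "block" command sits at bytes 4..9
fn is_block_message(raw_header: &[u8; 24]) -> bool {
    &raw_header[4..9] == b"block"
}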
Writing Messages
In order to write messages via the TCP stream, we use the following write
method on Peer
:
// Path: floresta-wire/src/p2p_wire/peer.rs
pub async fn write(&mut self, msg: NetworkMessage) -> Result<()> {
debug!("Writing {} to peer {}", msg.command(), self.id);
let data = &mut RawNetworkMessage::new(self.network.magic(), msg);
let data = serialize(&data);
self.writer.write_all(&data).await?;
self.writer.flush().await?;
Ok(())
}
The NetworkMessage
is another bitcoin
type, similar to the RawNetworkMessage
. This type contains the payload, but needs to be converted into a RawNetworkMessage
in order to be sent through the network.
This method simply performs the conversion, serializes the resulting raw message, and writes it via the TCP stream writer
.
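As a usage sketch, this is roughly what writing a ping looks like with the bitcoin crate. The exact module paths assume a recent crate version (they have moved across releases):

use bitcoin::consensus::encode::serialize;
use bitcoin::p2p::message::{NetworkMessage, RawNetworkMessage};
use bitcoin::Network;

// Illustrative sketch: wrap a payload into a RawNetworkMessage and
// serialize it into wire-ready bytes
fn ping_bytes(nonce: u64) -> Vec<u8> {
    let msg = NetworkMessage::Ping(nonce);
    let raw = RawNetworkMessage::new(Network::Bitcoin.magic(), msg);
    serialize(&raw)
}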
And that's all about how we read and write P2P messages! Next, we'll explore a few Peer
methods and how it handles network messages.
Handling Peer Messages
As we can see below, the only thing the Peer::create_peer method does is initialize the Peer and spawn its read_loop method.
// Path: floresta-wire/src/p2p_wire/peer.rs
pub async fn create_peer(
// ...
id: u32,
mempool: Arc<Mutex<Mempool>>,
network: Network,
node_tx: UnboundedSender<NodeNotification>,
node_requests: UnboundedReceiver<NodeRequest>,
address_id: usize,
kind: ConnectionKind,
actor_receiver: UnboundedReceiver<ReaderMessage>,
writer: WriteHalf<TcpStream>,
our_user_agent: String,
cancellation_sender: tokio::sync::oneshot::Sender<()>,
) {
let peer = Peer {
// Initializing the many Peer fields :P
address_id,
blocks_only: false,
current_best_block: -1,
id,
mempool,
last_ping: None,
last_message: Instant::now(),
network,
node_tx,
services: ServiceFlags::NONE,
messages: 0,
start_time: Instant::now(),
user_agent: "".into(),
state: State::None,
send_headers: false,
node_requests,
kind,
wants_addrv2: false,
shutdown: false,
actor_receiver, // Add the receiver for messages from TcpStreamActor
writer,
our_user_agent,
cancellation_sender,
};
spawn(peer.read_loop());
}
This read_loop
method will in turn call a peer_loop_inner
method:
// Path: floresta-wire/src/p2p_wire/peer.rs
pub async fn read_loop(mut self) -> Result<()> {
let err = self.peer_loop_inner().await;
// Check any errors returned by the loop, shutdown the stream writer
// and send cancellation signal to close the stream reader (actor task)
// ...
if err.is_err() {
error!("Peer {} connection loop closed: {err:?}", self.id);
}
self.send_to_node(PeerMessages::Disconnected(self.address_id))
.await;
// force the stream to shutdown to prevent leaking resources
if let Err(shutdown_err) = self.writer.shutdown().await {
debug!(
"Failed to shutdown writer for Peer {}: {shutdown_err:?}",
self.id
);
}
if let Err(cancellation_err) = self.cancellation_sender.send(()) {
debug!(
"Failed to propagate cancellation signal for Peer {}: {cancellation_err:?}",
self.id
);
}
if let Err(err) = err {
debug!("Peer {} connection loop closed: {err:?}", self.id);
}
Ok(())
}
The Peer Loop
The peer_loop_inner method is the main execution loop that handles all communication between the Peer component, the actual peer over the network, and the UtreexoNode. It sends P2P messages to the peer, processes requests from the node, and manages responses from the peer.
- Initial Handshake and Main Loop: At the start, the method sends a version message to the peer using peer_utils::build_version_message, which initiates the handshake. Then the method enters an asynchronous loop where it handles node requests and peer messages, and ensures the peer connection remains healthy.
// Path: floresta-wire/src/p2p_wire/peer.rs
async fn peer_loop_inner(&mut self) -> Result<()> {
// send a version
let version = peer_utils::build_version_message(self.our_user_agent.clone());
self.write(version).await?;
self.state = State::SentVersion(Instant::now());
loop {
futures::select! {
request = tokio::time::timeout(Duration::from_secs(10), self.node_requests.recv()).fuse() => {
match request {
Ok(None) => {
return Err(PeerError::Channel);
},
Ok(Some(request)) => {
self.handle_node_request(request).await?;
},
Err(_) => {
// Timeout, do nothing
}
}
},
message = self.actor_receiver.recv().fuse() => {
match message {
None => {
return Err(PeerError::Channel);
}
Some(ReaderMessage::Error(e)) => {
return Err(e);
}
Some(ReaderMessage::Block(block)) => {
self.send_to_node(PeerMessages::Block(block)).await;
}
Some(ReaderMessage::Message(msg)) => {
self.handle_peer_message(msg).await?;
}
}
}
}
// ...
if self.shutdown {
return Ok(());
}
// If we send a ping and our peer doesn't respond in time, disconnect
if let Some(when) = self.last_ping {
if when.elapsed().as_secs() > PING_TIMEOUT {
return Err(PeerError::Timeout);
}
}
// Send a ping to check if this peer is still good
let last_message = self.last_message.elapsed().as_secs();
if last_message > SEND_PING_TIMEOUT {
if self.last_ping.is_some() {
continue;
}
let nonce = rand::random();
self.last_ping = Some(Instant::now());
self.write(NetworkMessage::Ping(nonce)).await?;
}
// divide the number of messages by the number of seconds we've been connected,
// if it's more than 10 msg/sec, this peer is sending us too many messages, and we should
// disconnect.
let msg_sec = self
.messages
.checked_div(Instant::now().duration_since(self.start_time).as_secs())
.unwrap_or(0);
if msg_sec > 10 {
error!(
"Peer {} is sending us too many messages, disconnecting",
self.id
);
return Err(PeerError::TooManyMessages);
}
if let State::SentVersion(when) = self.state {
if Instant::now().duration_since(when) > Duration::from_secs(10) {
return Err(PeerError::UnexpectedMessage);
}
}
}
}
- Handling Node Requests: The method uses a futures::select! block to listen for requests from UtreexoNode via self.node_requests, with a 10-second timeout for each receive operation.
  - If a request is received, it is passed to the handle_node_request method for processing.
  - If the channel is closed (Ok(None)), the method exits with a PeerError::Channel.
  - If the timeout expires without receiving a request, the method simply does nothing, allowing the loop to continue.
- Processing Peer Messages: Simultaneously, the loop listens for messages from the TCP actor via self.actor_receiver. Depending on the type of message received:
  - Error: If an error is reported (a closed channel or ReaderMessage::Error), the loop exits with the error.
  - Block Message: If a block is received, it is forwarded to UtreexoNode using send_to_node.
  - Generic Message: Other peer messages are processed by the handle_peer_message method.
- Shutdown Check: The loop continually checks whether the shutdown flag is set. If it is, the loop exits gracefully.
Ping Management: To maintain the connection, the method sends periodic
NetworkMessage::Ping
s. If the peer fails to respond within a timeout (PING_TIMEOUT
), the connection is terminated. Additionally, if no messages have been exchanged for a period (SEND_PING_TIMEOUT
), a new ping is sent, and the timestamp is updated.
Currently, we disconnect if a peer doesn't respond to a ping within 30 seconds, and we send a ping 60 seconds after the last message.
// Path: floresta-wire/src/p2p_wire/peer.rs
const PING_TIMEOUT: u64 = 30;
const SEND_PING_TIMEOUT: u64 = 60;
-
Rate Limiting: The method calculates the rate of messages received from the peer. If the peer sends more than 10 messages per second on average, it is deemed misbehaving, and the connection is closed.
-
Handshake Timeout: If the peer does not respond to the version message within 10 seconds, the loop exits with an error, as the expected handshake flow was not completed.
Handshake Process
In this Peer execution loop, we have also seen a State type, stored in the Peer.state field. This represents the state of the handshake with the peer:
// Path: floresta-wire/src/p2p_wire/peer.rs
enum State {
None,
SentVersion(Instant),
SentVerack,
Connected,
}
None is the initial state when the Peer is created, but shortly afterwards it is updated to SentVersion, when we initiate the handshake by sending our NetworkMessage::Version.
If the peer is responsive, we will hear back within the next 10 seconds via its NetworkMessage::Version, which is handled by handle_peer_message (seen in the third step above). This method internally saves the peer's data, sends back a NetworkMessage::Verack (i.e. the acknowledgment of its version message), and updates the state to SentVerack.
Finally, when we receive the NetworkMessage::Verack from the peer, we can switch to the Connected state and communicate the new peer data to UtreexoNode.
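The whole exchange can be summarized as a small transition function. The sketch below uses its own minimal types and is not Floresta's actual handler:

// Illustrative handshake transitions; own minimal types, not Floresta code
enum HandshakeState {
    SentVersion,
    SentVerack,
    Connected,
}

enum Incoming {
    Version,
    Verack,
}

fn advance(state: HandshakeState, msg: Incoming) -> Option<HandshakeState> {
    match (state, msg) {
        // The peer answered our version: save its data and reply with verack
        (HandshakeState::SentVersion, Incoming::Version) => Some(HandshakeState::SentVerack),
        // The peer acknowledged our version: the connection is fully set up
        (HandshakeState::SentVerack, Incoming::Verack) => Some(HandshakeState::Connected),
        // Anything else is out of order at this stage
        _ => None,
    }
}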
Node Communication Lifecycle
Once connected to a peer, UtreexoNode can send requests and receive responses.
- It interacts with a specific peer through NodeCommon.peers, using LocalPeerView.channel to send requests.
- Peer receives the request message and handles it via handle_node_request (seen in the second step above). This method performs the TCP write operation.
- When the peer responds with a message, it is received via the TCP actor_receiver and handled by the handle_peer_message method, which typically passes new data back to UtreexoNode, continuing the communication loop.