Understanding the data behind Bitcoin Core
Overview
In this tutorial, we will be taking a closer look at the data directory and files behind the Bitcoin Core reference client. A better understanding of how this data is managed lets us go beyond probing bitcoin's remote procedure call (RPC) and REST-based interfaces for insights into the data maintained by the client.
Prerequisites
You will need access to a bitcoin node. We suggest executing against a node configured in regtest mode so that we have the freedom to play with various scenarios without the risk of losing real money. You can, however, execute these against either the testnet or mainnet configurations.
Note:
If you don't currently have access to a bitcoin development environment, don't worry, we have your back! We've set up a web-based mechanism which provisions your very own private session that includes these tools and comes preconfigured with a bitcoin node in regtest mode. https://bitcoindev.network/bitcoin-cli-sandbox/
Alternatively, we have also provided a simple docker container configured in regtest mode that you can install for testing purposes.
gr0kchain:~ $ docker volume create --name=bitcoind-data
gr0kchain:~ $ docker run -v bitcoind-data:/bitcoin --name=bitcoind-node -d \
    -p 18444:18444 \
    -p 127.0.0.1:18332:18332 \
    bitcoindevelopernetwork/bitcoind-regtest
Getting started
Before we get started, let's have a look at the data directory of an existing running bitcoin core node.
gr0kchain@bitcoindev $ tree ~/.bitcoin/
/home/gr0kchain/.bitcoin/
├── banlist.dat
├── bitcoin.conf
├── blocks
│ ├── blk00000.dat
│ ├── index
│ │ ├── 000003.log
│ │ ├── 000004.log
│ │ ├── 000005.ldb
│ │ ├── CURRENT
│ │ ├── LOCK
│ │ └── MANIFEST-000002
│ └── rev00000.dat
├── chainstate
│ ├── 000003.log
│ ├── CURRENT
│ ├── LOCK
│ └── MANIFEST-000002
├── db.log
├── debug.log
├── fee_estimates.dat
├── mempool.dat
├── peers.dat
└── wallet.dat
10 directories, 49 files
Note
By default, bitcoind will manage files in the following locations.
Windows: %APPDATA%\Bitcoin
Linux: ~/.bitcoin/
Mac OS X: ~/Library/Application\ Support/Bitcoin/
This default location can be overridden using the -datadir configuration parameter or by adding a datadir parameter to the bitcoin.conf file.
A similar data directory is created in a subdirectory for the testnet and regtest configurations, if either of these has been configured, to avoid conflicting with the mainnet files.
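The default locations listed in the note above can be sketched as a small Node.js helper. This is only an illustration: the function name defaultDataDir and its network argument are hypothetical, and the testnet3 subdirectory name is an assumption that is not stated above (the regtest subdirectory matches the paths used later in this tutorial).

```javascript
const os = require('os')
const path = require('path')

// Hypothetical helper: returns the directory bitcoind would use by default
// when no -datadir option is given, for the platforms listed in the note.
function defaultDataDir (network) {
  let base
  if (process.platform === 'win32') {
    // %APPDATA%\Bitcoin
    base = path.join(process.env.APPDATA || os.homedir(), 'Bitcoin')
  } else if (process.platform === 'darwin') {
    // ~/Library/Application Support/Bitcoin/
    base = path.join(os.homedir(), 'Library', 'Application Support', 'Bitcoin')
  } else {
    // ~/.bitcoin/
    base = path.join(os.homedir(), '.bitcoin')
  }
  // Assumed subdirectory names; mainnet files live directly in the base directory.
  const subdirs = { mainnet: '', testnet: 'testnet3', regtest: 'regtest' }
  const name = network || 'mainnet'
  if (!(name in subdirs)) throw new Error('unknown network: ' + name)
  return path.join(base, subdirs[name])
}

console.log(defaultDataDir('regtest'))
```

Passing -datadir or setting datadir in bitcoin.conf overrides whatever this helper returns, so it is only a starting point when looking for the files on disk.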
Filename | Description |
---|---|
banlist.dat | stores the IPs/Subnets of banned nodes |
bitcoin.conf | contains configuration settings for bitcoind or bitcoin-qt |
bitcoind.pid | stores the process id of bitcoind while running |
blocks/blk000??.dat | block data (custom, 128 MiB per file); since 0.8.0 |
blocks/rev000??.dat | block undo data (custom); since 0.8.0 (format changed since pre-0.8) |
blocks/index/* | block index (LevelDB); since 0.8.0 |
chainstate/* | blockchain state database (LevelDB); since 0.8.0 |
database/* | BDB database environment; only used for wallet since 0.8.0; moved to wallets/ directory on new installs since 0.16.0 |
db.log | wallet database log file; moved to wallets/ directory on new installs since 0.16.0 |
debug.log | contains debug information and general logging generated by bitcoind or bitcoin-qt |
fee_estimates.dat | stores statistics used to estimate minimum transaction fees and priorities required for confirmation; since 0.10.0 |
indexes/txindex/* | optional transaction index database (LevelDB); since 0.17.0 |
mempool.dat | dump of the mempool's transactions; since 0.14.0 |
peers.dat | peer IP address database (custom format); since 0.7.0 |
wallet.dat | personal wallet (BDB) with keys and transactions; moved to wallets/ directory on new installs since 0.16.0 |
wallets/database/* | BDB database environment; used for wallets since 0.16.0 |
wallets/db.log | wallet database log file; since 0.16.0 |
wallets/wallet.dat | personal wallet (BDB) with keys and transactions; since 0.16.0 |
.cookie | session RPC authentication cookie (written at start when cookie authentication is used, deleted on shutdown); since 0.12.0 |
onion_private_key | cached Tor hidden service private key for -listenonion; since 0.12.0 |
guisettings.ini.bak | backup of former GUI settings after -resetguisettings is used |
Only used in pre-0.8.0
- blktree/; block chain index (LevelDB); since pre-0.8, replaced by blocks/index/ in 0.8.0
- coins/; unspent transaction output database (LevelDB); since pre-0.8, replaced by chainstate/ in 0.8.0
Only used before 0.8.0
- blkindex.dat: block chain index database (BDB); replaced by {chainstate/,blocks/index/,blocks/rev000??.dat} in 0.8.0
- blk000?.dat: block data (custom, 2 GiB per file); replaced by blocks/blk000??.dat in 0.8.0
Only used before 0.7.0
- addr.dat: peer IP address database (BDB); replaced by peers.dat in 0.7.0
As we can see, there are various files and directories which organise data behind our node, so let's take a closer look at each of these.
Some background on key store
For the purpose of this tutorial, we'll be having a closer look at the blocks and chainstate directories and files.
We will be using LevelDB, a lightweight, single-purpose persistence library with bindings to many platforms, which Bitcoin Core uses for storing this data.
By default, LevelDB stores entries lexicographically sorted by keys. The sorting is one of the main distinguishing features of LevelDB amongst similar embedded data storage libraries and comes in very useful for querying as we’ll see later.
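To make the sorted-key behaviour concrete, here is a minimal sketch that imitates this ordering in plain Node.js, without opening a database. It assumes the default bytewise comparator, which Buffer.compare mirrors; the sample keys and the range helper are made up for illustration.

```javascript
// Sample keys, deliberately out of order. Single-letter prefixes group
// related records together once the keys are sorted.
const keys = ['name', 'c\x02', 'B', 'bitcoin', 'c\x01', 'd']
  .map(function (k) { return Buffer.from(k, 'latin1') })

// Sort byte by byte, as a bytewise comparator would.
const sorted = keys.slice().sort(Buffer.compare)

// Hypothetical helper: every key k with gte <= k < lt.
function range (sortedKeys, gte, lt) {
  return sortedKeys.filter(function (k) {
    return Buffer.compare(k, gte) >= 0 && Buffer.compare(k, lt) < 0
  })
}

function toHex (k) { return k.toString('hex') }

// Full key order
console.log(sorted.map(toHex))
// Only the keys that start with 'c' (0x63)
console.log(range(sorted, Buffer.from('c'), Buffer.from('d')).map(toHex))
```

Because all keys sharing a prefix end up next to each other, a query for "everything starting with c" becomes a single contiguous range scan rather than a full search.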
A primer on leveldb
Before we look at these in more detail, let's first familiarise ourselves with leveldb using nodejs.
- Create a directory for hosting our code
gr0kchain@bitcoindev $ mkdir code && cd code
- Install the level package.
gr0kchain@bitcoindev $ npm install level
- Create a file called index.js that contains the following code.
var level = require('level')

// 1) Create our database, supply location and options.
//    This will create or open the underlying store.
var db = level('my-db')

// 2) Put a key & value
db.put('name', 'Satoshi Nakamoto', function (err) {
  if (err) return console.log('Ooops!', err) // some kind of I/O error

  // 3) Fetch by key
  db.get('name', function (err, value) {
    if (err) return console.log('Ooops!', err) // likely the key was not found

    // Ta da!
    console.log('name=' + value)
  })
})
- Run the script.
gr0kchain@bitcoindev $ node ./index.js
name=Satoshi Nakamoto
Great, you've just created your first level database!
A closer look at the data behind leveldb
It is worth checking the data directory created by our code.
gr0kchain@bitcoindev $ tree ./my-db/
./my-db/
├── 000003.log
├── CURRENT
├── LOCK
├── LOG
└── MANIFEST-000002
0 directories, 5 files
Note
For more information on these files consult the LevelDB Documentation.
Here you should notice a similar structure to the one seen previously for our chainstate and blocks/index directories.
The level library is great for developing applications; however, let's use a LevelDB read–eval–print loop (REPL) utility called lev for exploring our data.
- Install lev
gr0kchain@bitcoindev $ npm install -g lev
- Open the my-db database we created
gr0kchain@bitcoindev $ lev ./my-db/
/>
- Obtain a list of current keys stored in our database.
gr0kchain@bitcoindev $ lev ./my-db/
/>ls
name
/>
- Obtain the value for the key name
/>get name
'Satoshi Nakamoto'
- Add another key value pair to the database
/>put bitcoin "rocks"
'OK'
/>ls
bitcoin name
/>get bitcoin
'rocks'
- Exit the interactive repl
/>.exit
Nice! Some additional commands we can use with lev include:
- GET - Get a key from the database.
- PUT - Put a value into the database. If you have keyEncoding or valueEncoding set to json, these values will be parsed from strings into json.
- DEL - Delete a key from the database.
- LS - Get all the keys in the current range.
- START - Defines the start of the current range. You can also use GT or GTE.
- END - Defines the end of the current range. You can also use LT or LTE.
- LIMIT - Limit the number of records in the current range (defaults to 5000).
- REVERSE - Reverse the records in the current range.
Looking at the data behind bitcoin core
Now that we've looked at how LevelDB works, let's take a closer look at our blocks and chainstate directories.
Warning
It is recommended that you make a backup of your chain data to avoid any accidental corruption.
Bitcoin core developer Pieter Wuille gives us a good explanation of these sections as follows.
Bitcoind since version 0.8 maintains two databases, the block index (in $DATADIR/blocks/index) and the chainstate (in $DATADIR/chainstate). The block index maintains information for every block, and where it is stored on disk. The chain state maintains information about the resulting state of validation as a result of the currently best known chain.
Inside the block index, the used key/value pairs are:
- 'b' + 32-byte block hash -> block index record. Each record stores:
  - The block header.
  - The height.
  - The number of transactions.
  - To what extent this block is validated.
  - In which file, and where in that file, the block data is stored.
  - In which file, and where in that file, the undo data is stored.
- 'f' + 4-byte file number -> file information record. Each record stores:
  - The number of blocks stored in the block file with that number.
  - The size of the block file with that number ($DATADIR/blocks/blkNNNNN.dat).
  - The size of the undo file with that number ($DATADIR/blocks/revNNNNN.dat).
  - The lowest and highest height of blocks stored in the block file with that number.
  - The lowest and highest timestamp of blocks stored in the block file with that number.
- 'l' -> 4-byte file number: the last block file number used.
- 'R' -> 1-byte boolean ('1' if true): whether we're in the process of reindexing.
- 'F' + 1-byte flag name length + flag name string -> 1 byte boolean ('1' if true, '0' if false): various flags that can be on or off. Currently defined flags include:
  - 'txindex': Whether the transaction index is enabled.
- 't' + 32-byte transaction hash -> transaction index record. These are optional and only exist if 'txindex' is enabled (see above). Each record stores:
  - Which block file number the transaction is stored in.
  - Which offset into that file the block the transaction is part of is stored at.
  - The offset from the start of that block to the position where that transaction itself is stored.
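As a rough sketch, the block index keys described above could be built in Node.js as follows. The helper names are hypothetical, and two details are assumptions that the explanation does not spell out: that the block hash is stored with its bytes reversed relative to the hex string bitcoin-cli prints, and that the 4-byte file number is little-endian. The hash used here is a made-up placeholder.

```javascript
// 'b' + 32-byte block hash (assumed: bytes reversed from the displayed hex).
function blockIndexKey (blockHashHex) {
  const hash = Buffer.from(blockHashHex, 'hex').reverse()
  return Buffer.concat([Buffer.from('b'), hash])
}

// 'f' + 4-byte file number (assumed little-endian).
function fileInfoKey (fileNumber) {
  const key = Buffer.alloc(5)
  key.write('f', 0, 'latin1')
  key.writeUInt32LE(fileNumber, 1)
  return key
}

// 'F' + 1-byte flag name length + flag name string.
function flagKey (name) {
  const nameBytes = Buffer.from(name, 'latin1')
  return Buffer.concat([Buffer.from('F'), Buffer.from([nameBytes.length]), nameBytes])
}

// Placeholder hash, not a real block.
const placeholderHash = '00'.repeat(31) + '01'
console.log(blockIndexKey(placeholderHash).toString('hex'))
console.log(fileInfoKey(0).toString('hex'))
console.log(flagKey('txindex').toString('hex'))
```

Keys built this way could be passed to a LevelDB get call on a backup copy of blocks/index; the values returned are serialized records that would still need decoding.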
Inside the chain state database, the following key/value pairs are stored:
- 'c' + 32-byte transaction hash -> unspent transaction output record for that transaction. These records are only present for transactions that have at least one unspent output left. Each record stores:
  - The version of the transaction.
  - Whether the transaction was a coinbase or not.
  - Which height block contains the transaction.
  - Which outputs of that transaction are unspent.
  - The scriptPubKey and amount for those unspent outputs.
- 'B' -> 32-byte block hash: the block hash up to which the database represents the unspent transaction outputs.
Recent versions of bitcoind obfuscate the value in each key/value pair, so you need to XOR the stored value with the obfuscation key to get the real value.
Understanding the chainstate leveldb
Let's start by looking at the chainstate folder. The chainstate directory contains the state as of the latest block. In simplified terms, it stores every spendable coin, who owns it, and how much it's worth.
Note
Running these tools against your live data appears to corrupt the database, which then requires restarting bitcoind with -reindex or -reindex-chainstate. It is suggested that you execute these steps against a backup of your bitcoin datadir.
- LevelDB doesn't support concurrent access from multiple applications, so we'll first need to stop bitcoind.
gr0kchain@bitcoindev $ bitcoin-cli stop
- Make a backup of your chain data
gr0kchain@bitcoindev $ rsync -va ~/.bitcoin/chainstate/ ~/.bitcoin/chainstate_bk/
- Open the chainstate using the lev repl command.
gr0kchain@bitcoindev $ lev ~/.bitcoin/chainstate_bk/
/>
- Run the
ls
command.
/> ls
obfuscate_key B
Interesting, here we see a key called obfuscate_key and another called B. The background to this is a pull request introduced into Bitcoin Core which stops anti-virus software from flagging bitcoin data as hostile when someone intentionally adds virus signatures to the blockchain. The obfuscation key is a 64-bit value stored under the key 0e00obfuscate_key (the bytes 0e 00 followed by the ASCII string obfuscate_key) that should be XORed with each data value from the database.
Note
When setting the bitcoind debug field to leveldb or 1, we will notice the obfuscation key log entry in our debug.log file.
gr0kchain@bitcoindev $ grep obfuscate ~/.bitcoin/regtest/debug.log
2019-03-14 12:06:16 Wrote new obfuscate key for /home/gr0kchain/.bitcoin/regtest/chainstate: eac3d71013881b79
Writing a script for reading from the chainstate leveldb
In my experience, opening the database with the level library can corrupt it, so I'd suggest making a backup of the data before executing any of these commands. I'll also be using a fresh copy of regtest, where we'll need to generate some blocks to get us going.
- First, let's create a backup of our database.
gr0kchain@bitcoindev $ bitcoind
gr0kchain@bitcoindev $ bitcoin-cli generate 1
gr0kchain@bitcoindev $ bitcoin-cli stop
gr0kchain@bitcoindev $ rsync -va ~/.bitcoin/regtest/ ~/.bitcoin/regtest_backup/
- Next, we create a JavaScript file called chainstate.js based on the details covered so far.
var level = require('level')

// Open the backup copy of the chainstate, reading keys and values as hex strings.
var db = level('/home/gr0kchain/.bitcoin/regtest_backup/chainstate/', {
  keyEncoding: 'hex',
  valueEncoding: 'hex'
})

// Stream the records and label the obfuscation key when we come across it.
db.createReadStream({ gte: '\x63', lt: '\x64' })
  .on('data', function (data) {
    if (data.key == '0e006f62667573636174655f6b6579') {
      console.log("obfuscate_key", data)
    } else {
      console.log("record", data)
    }
  })
  .on('error', function (err) {
    console.log('Oh my!', err)
  })
  .on('close', function () {
    console.log('Stream closed')
  })
  .on('end', function () {
    console.log('Stream ended')
  })
- We can then run this against our backup database.
gr0kchain@bitcoindev $ node ./chainstate.js
obfuscate_key { key: '0e006f62667573636174655f6b6579', value: '08eac3d71013881b79' }
record { key: '42', value: '335c93f941bd7479dce90d3f50e423a3125b2bfdf641f7d78823a9df39f1c673' }
record { key: '638db7b33143173127aff1473ac15501cfc75ebce965546b391de114034d33c237', value: 'ebc0e513374284367c3a5730b652bfc7400329de72f2dbf6d79cb8f5ceaeea183bf1855112' }
Stream ended
Stream closed
Note
The value for our obfuscate_key should match the one we saw earlier in our debug.log. In my local instance, this is 08eac3d71013881b79. In the leveldb the key is prefixed with the byte 08, which is the serialized length of the 8-byte key (it also happens to be the ASCII code for backspace) and is not reflected in the log output.
- Start our bitcoind server, and check one of our previous blocks.
gr0kchain@bitcoindev $ bitcoind
Bitcoin server starting
gr0kchain@bitcoindev $ bitcoin-cli getblockchaininfo | grep hash
"bestblockhash": "0add792acf7ee062aeecc9e5edfc98f8da386c432fda2a36006f3552e9449fd9",
gr0kchain@bitcoindev $ bitcoin-cli getblock 0add792acf7ee062aeecc9e5edfc98f8da386c432fda2a36006f3552e9449fd9
{
  "hash": "0add792acf7ee062aeecc9e5edfc98f8da386c432fda2a36006f3552e9449fd9",
  "confirmations": 1,
  "size": 179,
  "height": 1,
  "version": 536870912,
  "merkleroot": "37c2334d0314e11d396b5465e9bc5ec7cf0155c13a47f1af2731174331b3b78d",
  "tx": [
    "37c2334d0314e11d396b5465e9bc5ec7cf0155c13a47f1af2731174331b3b78d"
  ],
  "time": 1552565182,
  "mediantime": 1552565182,
  "nonce": 3,
  "bits": "207fffff",
  "difficulty": 4.656542373906925e-10,
  "chainwork": "0000000000000000000000000000000000000000000000000000000000000004",
  "previousblockhash": "0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206"
}
{ key: '638db7b33143173127aff1473ac15501cfc75ebce965546b391de114034d33c237', value: 'ebc0e513374284367c3a5730b652bfc7400329de72f2dbf6d79cb8f5ceaeea183bf1855112' }
In the above example, we can see the UTXO represented by its txid 37c2334d0314e11d396b5465e9bc5ec7cf0155c13a47f1af2731174331b3b78d in little-endian format, prefixed by a c (63 in hex).
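The relationship between the txid and the LevelDB key can be checked with a few lines of Node.js. This is a minimal sketch using only the built-in Buffer; the helper name chainstateKey is hypothetical.

```javascript
// Build the chainstate key for a txid: the 'c' prefix (0x63) followed by
// the txid bytes in the reverse order of the hex string shown by bitcoin-cli.
function chainstateKey (txidHex) {
  const txid = Buffer.from(txidHex, 'hex').reverse()
  return Buffer.concat([Buffer.from('c'), txid]).toString('hex')
}

console.log(chainstateKey('37c2334d0314e11d396b5465e9bc5ec7cf0155c13a47f1af2731174331b3b78d'))
```

The printed string is the same key that appears in the record above, which confirms that the key is simply the prefix byte plus the byte-reversed txid.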
The value in this case is still obfuscated using the value stored under our 0e006f62667573636174655f6b6579 key, 08eac3d71013881b79.
The reason for this is that the on disk storage files are often specially designed to be compact on disk, and not really intended to be easily usable by other applications (LevelDB doesn't support concurrent access from multiple applications anyway). There are several RPC methods for querying data from the databases (getblock, gettxoutsetinfo, gettxout) without needing direct access.
As you can see, only headers are stored inside this database. The actual blocks and transactions are stored in the block files, which are not databases, but just raw append-only files that contain the blocks in network format.
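To illustrate what "raw append-only files in network format" can look like in practice, here is a hedged sketch of walking the records in a blkNNNNN.dat file. It assumes a framing that is not described above: each block is preceded by the 4-byte network magic and a 4-byte little-endian length. The function names, the regtest magic value fabfb5da and the sample bytes are all assumptions for illustration, and the sketch works on an in-memory buffer so it can be tried without touching a real data directory.

```javascript
// Yield each framed record found in the buffer of a block file.
// Assumed layout per record: magic (4 bytes) + size (4 bytes, LE) + block bytes.
function * iterateBlockRecords (fileBuf, magicHex) {
  const magic = Buffer.from(magicHex, 'hex')
  let pos = 0
  while (pos + 8 <= fileBuf.length) {
    // Stop at padding or anything that does not start with the magic.
    if (!fileBuf.subarray(pos, pos + 4).equals(magic)) break
    const size = fileBuf.readUInt32LE(pos + 4)
    const start = pos + 8
    if (start + size > fileBuf.length) break // truncated record
    yield { offset: start, block: fileBuf.subarray(start, start + size) }
    pos = start + size
  }
}

// Test helper: wrap a payload in the assumed framing.
function frame (magicHex, payload) {
  const header = Buffer.alloc(8)
  Buffer.from(magicHex, 'hex').copy(header, 0)
  header.writeUInt32LE(payload.length, 4)
  return Buffer.concat([header, payload])
}

const ASSUMED_REGTEST_MAGIC = 'fabfb5da'
// Two fake "blocks" followed by zero padding.
const sample = Buffer.concat([
  frame(ASSUMED_REGTEST_MAGIC, Buffer.from('aabb', 'hex')),
  frame(ASSUMED_REGTEST_MAGIC, Buffer.from('ccddee', 'hex')),
  Buffer.alloc(16)
])

for (const record of iterateBlockRecords(sample, ASSUMED_REGTEST_MAGIC)) {
  console.log(record.offset, record.block.toString('hex'))
}
```

The offsets yielded here play the same role as the "in which file, and where in that file" fields of the block index records: the index stores the position, and the flat file stores the bytes.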
Decoding the values
To decode these values, we XOR them with the obfuscation key.
- Install bigi to work with large numbers in javascript
gr0kchain@bitcoindev $ npm install bigi
- Start a node in interactive mode
gr0kchain@bitcoindev $ node
>
- Use the big integer package to load our previous value and the obfuscate_key value. You need to drop the leading 08 byte from the key and repeat the remaining 8 bytes until it covers the length of the value being decoded.
gr0kchain@bitcoindev $ node
> var bigi = require("bigi")
undefined
> var k = bigi.fromHex('eac3d71013881b79eac3d71013881b79eac3d71013881b79eac3d71013881b79eac3d71013')
undefined
> var v = bigi.fromHex('ebc0e513374284367c3a5730b652bfc7400329de72f2dbf6d79cb8f5ceaeea183bf1855112')
undefined
> var decode = v.xor(k)
undefined
> decode.toHex()
'0103320324ca9f4f96f98020a5daa4beaac0fece617ac08f3d5f6fe5dd26f161d132524101'
>
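The same XOR can be done without an extra dependency. The following is a minimal sketch using Node's built-in Buffer; the function name deobfuscate is hypothetical, and it assumes the first byte of the stored obfuscate_key value is a one-byte length prefix, as it is for the 8-byte key in this example.

```javascript
// XOR a stored value with the obfuscation key, repeating the key as needed.
function deobfuscate (valueHex, storedKeyHex) {
  const stored = Buffer.from(storedKeyHex, 'hex')
  // Skip the length prefix (0x08 here) and keep only the key bytes.
  const key = stored.subarray(1, 1 + stored[0])
  const value = Buffer.from(valueHex, 'hex')
  const out = Buffer.alloc(value.length)
  for (let i = 0; i < value.length; i++) {
    out[i] = value[i] ^ key[i % key.length]
  }
  return out.toString('hex')
}

console.log(deobfuscate(
  'ebc0e513374284367c3a5730b652bfc7400329de72f2dbf6d79cb8f5ceaeea183bf1855112',
  '08eac3d71013881b79'
))
```

This prints the same hex string as the bigi session, and because the key is cycled with a modulo there is no need to write out the repeated key by hand.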
We now have the deobfuscated version of our UTXO record, which can be decoded as per the serialization format documented in src/coins.h (see References).
/** pruned version of CTransaction: only retains metadata and unspent transaction outputs
*
* Serialized format:
* - VARINT(nVersion)
* - VARINT(nCode)
* - unspentness bitvector, for vout[2] and further; least significant byte first
* - the non-spent CTxOuts (via CTxOutCompressor)
* - VARINT(nHeight)
*
* The nCode value consists of:
* - bit 1: IsCoinBase()
* - bit 2: vout[0] is not spent
* - bit 4: vout[1] is not spent
* - The higher bits encode N, the number of non-zero bytes in the following bitvector.
* - In case both bit 2 and bit 4 are unset, they encode N-1, as there must be at
* least one non-spent output).
*
* Example: 0104835800816115944e077fe7c803cfa57f29b36bf87c1d358bb85e
* <><><--------------------------------------------><---->
* | \ | /
* version code vout[1] height
*
* - version = 1
* - code = 4 (vout[1] is not spent, and 0 non-zero bytes of bitvector follow)
* - unspentness bitvector: as 0 non-zero bytes follow, it has length 0
* - vout[1]: 835800816115944e077fe7c803cfa57f29b36bf87c1d35
* * 8358: compact amount representation for 60000000000 (600 BTC)
* * 00: special txout type pay-to-pubkey-hash
* * 816115944e077fe7c803cfa57f29b36bf87c1d35: address uint160
* - height = 203998
*
*
* Example: 0109044086ef97d5790061b01caab50f1b8e9c50a5057eb43c2d9563a4eebbd123008c988f1a4a4de2161e0f50aac7f17e7f9555caa486af3b
* <><><--><--------------------------------------------------><----------------------------------------------><---->
* / \ \ | | /
* version code unspentness vout[4] vout[16] height
*
* - version = 1
* - code = 9 (coinbase, neither vout[0] or vout[1] are unspent,
* 2 (1, +1 because both bit 2 and bit 4 are unset) non-zero bitvector bytes follow)
* - unspentness bitvector: bits 2 (0x04) and 14 (0x4000) are set, so vout[2+2] and vout[14+2] are unspent
* - vout[4]: 86ef97d5790061b01caab50f1b8e9c50a5057eb43c2d9563a4ee
* * 86ef97d579: compact amount representation for 234925952 (2.35 BTC)
* * 00: special txout type pay-to-pubkey-hash
* * 61b01caab50f1b8e9c50a5057eb43c2d9563a4ee: address uint160
* - vout[16]: bbd123008c988f1a4a4de2161e0f50aac7f17e7f9555caa4
* * bbd123: compact amount representation for 110397 (0.001 BTC)
* * 00: special txout type pay-to-pubkey-hash
* * 8c988f1a4a4de2161e0f50aac7f17e7f9555caa4: address uint160
* - height = 120891
*/
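Applying that format to our decoded value can be sketched in Node.js as below. This is a simplified reading of the comment above, combined with the VARINT and compressed-amount encodings as I understand them from the Bitcoin Core source; neither is spelled out in this tutorial, so treat the helper names and details as assumptions. The sketch only handles records whose unspentness bitvector is empty, which is the case for our single-output coinbase.

```javascript
// Bitcoin Core style VARINT (assumed): base-128, most significant group
// first, with 1 added after every continuation byte.
function readVarInt (buf, pos) {
  let n = 0
  for (;;) {
    const byte = buf[pos++]
    n = n * 128 + (byte & 0x7f)
    if (byte & 0x80) n += 1
    else return { value: n, pos: pos }
  }
}

// Reverse of the compact amount representation (result in satoshis).
function decompressAmount (x) {
  if (x === 0) return 0
  x -= 1
  let e = x % 10 // number of trailing decimal zeros
  x = Math.floor(x / 10)
  let n
  if (e < 9) {
    const d = (x % 9) + 1 // last non-zero digit
    n = Math.floor(x / 9) * 10 + d
  } else {
    n = x + 1
  }
  while (e-- > 0) n *= 10
  return n
}

// Decode version, code, the unspent outputs among vout[0..1] and the height.
function decodeCoins (hex) {
  const buf = Buffer.from(hex, 'hex')
  let r = readVarInt(buf, 0)
  const version = r.value
  r = readVarInt(buf, r.pos)
  const code = r.value
  const unspent = [(code & 2) !== 0, (code & 4) !== 0]
  // Number of bitvector bytes (N, or N-1 stored when bits 2 and 4 are unset).
  const maskBytes = Math.floor(code / 8) + ((code & 6) ? 0 : 1)
  if (maskBytes !== 0) throw new Error('bitvector not handled in this sketch')
  const outputs = []
  let pos = r.pos
  for (let i = 0; i < 2; i++) {
    if (!unspent[i]) continue
    r = readVarInt(buf, pos)
    const amount = decompressAmount(r.value)
    r = readVarInt(buf, r.pos)
    const type = r.value
    // Types 0-1 carry a 20-byte hash, 2-5 a 32-byte key coordinate,
    // anything larger is a raw script of (type - 6) bytes.
    const len = type < 2 ? 20 : type < 6 ? 32 : type - 6
    const data = buf.subarray(r.pos, r.pos + len).toString('hex')
    outputs.push({ vout: i, amount: amount, scriptType: type, data: data })
    pos = r.pos + len
  }
  const height = readVarInt(buf, pos).value
  return { version: version, coinbase: (code & 1) !== 0, outputs: outputs, height: height }
}

console.log(decodeCoins('0103320324ca9f4f96f98020a5daa4beaac0fece617ac08f3d5f6fe5dd26f161d132524101'))
```

For the value decoded earlier this reports version 1, a coinbase transaction, a single unspent output of 5000000000 satoshis (50 BTC) at vout 0 and height 1, which agrees with the getblock output shown previously. The first example from the comment decodes to 60000000000 satoshis at height 203998, as the comment states.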
Personally identifiable data [v0.8 and above]
This section may be of use to you if you wish to send a friend the blockchain, sparing them a hefty download.
- wallet.dat
  - Contains addresses and transactions linked to them. Please be sure to make backups of this file. It contains the keys necessary for spending your bitcoins. You should not transfer this file to any third party or they may be able to access your bitcoins.
- db.log
  - May contain information pertaining to your wallet. It may be safely deleted.
- debug.log
  - May contain IP addresses and transaction IDs. It may be safely deleted.
- database/ folder
  - This should only exist when bitcoin-qt is currently running. It contains information (BDB state) relating to your wallet.
- peers.dat
  - Unknown whether this contains personally identifiable data. It may be safely deleted.
Other files and folders (blocks, blocks/index, chainstate) may be safely transferred/archived as they contain information pertaining only to the public blockchain.
Transferability
The database files in the "blocks" and "chainstate" directories are cross-platform, and can be copied between different installations. These files, known collectively as a node's "block database", represent all of the information downloaded by a node during the syncing process. In other words, if you copy installation A's block database into installation B, installation B will then have the same syncing percentage as installation A. This is usually far faster than doing the normal initial sync over again. However, when you copy someone's database in this way, you are trusting them absolutely. Bitcoin Core treats its block database files as 100% accurate and trustworthy, whereas during the normal initial sync it treats each block offered by a peer as invalid until proven otherwise. If an attacker is able to modify your block database files, then they can do all sorts of evil things which could cause you to lose bitcoins. Therefore, you should only copy block databases from Bitcoin installations under your personal control, and only over a secure connection.
Each node has a unique block database, and all of the files are highly connected. So if you copy just a few files from one installation's "blocks" or "chainstate" directories into another installation, this will almost certainly cause the second node to crash or get stuck at some random point in the future. If you want to copy a block database from one installation to another, you have to delete the old database and copy all of the files at once. Both nodes have to be shut down while copying.
Only the file with the highest number in the "blocks" directory is ever written to. The earlier files will never change. Also, when these blk*.dat files are accessed, they are usually accessed in a highly sequential manner. Therefore, it's possible to symlink the "blocks" directory or some subset of the blk*.dat files individually onto a magnetic storage drive without much loss in performance (see Splitting the data directory), and if two installations start out with identical block databases (due to the copying described previously), subsequent runs of rsync will be very efficient.
Conclusion
In this tutorial, we had a look at the files and directories the Bitcoin Core reference client uses to manage its own data.
References
- Bitcoin core Datadir
- What is the database for?
- LevelDB
- How does Bitcoin read from/write to LevelDB
- Chainstate serialization format (only retains metadata and unspent transaction outputs): src/coins.h
- What are the keys used in the blockchain levelDB (ie what are the key:value pairs)?
- What is the difference between chainstate and blocks folder?