This graph shows how long each export task takes. As you can see, exporting receipts and logs takes significantly longer than all the other tasks.
A receipt in Ethereum is an object containing information about the result of a transaction, such as its status, the gas used, the set of logs created during execution of the transaction, and the Bloom filter composed from information in those logs. The reason exporting receipts is so slow is that the JSON RPC API only allows retrieving receipts one by one, unlike transactions, which can be retrieved per block. Even with request batching, this process is very slow.
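As a rough illustration of what request batching looks like here, the sketch below builds a JSON RPC batch for a list of transaction hashes. eth_getTransactionReceipt is the standard method name; the helper build_receipt_batch and the sample hashes are hypothetical and only serve the example. The batch saves HTTP round trips, but it still contains one request per transaction.

```python
import json


def build_receipt_batch(tx_hashes):
    """Build a JSON RPC batch with one eth_getTransactionReceipt call per hash."""
    return [
        {
            "jsonrpc": "2.0",
            "method": "eth_getTransactionReceipt",
            "params": [tx_hash],
            "id": request_id,
        }
        for request_id, tx_hash in enumerate(tx_hashes)
    ]


# Placeholder hashes for illustration only, not real transactions.
sample_hashes = ["0x" + "11" * 32, "0x" + "22" * 32]
batch = build_receipt_batch(sample_hashes)

# The serialized list is what would be POSTed to the node in a single HTTP call.
print(json.dumps(batch, indent=2))
```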
With the introduction of an API method that allows retrieving all receipts in a block, this issue will be solved. Here are feature requests for geth and parity that you can upvote if you want to support the project: geth, parity.
Exporting ERC20 token transfers
As defined in the ERC20 standard, every token transfer must emit a Transfer event with the following signature:
Transfer(address indexed _from, address indexed _to, uint256 _value)
Solidity events correspond to logs in the EVM (Ethereum Virtual Machine). They are stored in the transaction’s log, a special data structure in the blockchain, and are associated with the address of the contract that emitted them.
Every log contains a list of topics associated with it, which are used for indexing and search. The first topic is always the Keccak hash of the event signature; you can calculate it with a command from Ethereum ETL:
$ python get_keccak_hash.py -i "Transfer(address,address,uint256)"
> 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
The task extract_token_transfers takes the file produced by export_receipts_and_logs, filters the logs by the above topic and writes the extracted token transfers to the output CSV file.
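The filtering step can be sketched as follows. This is a simplified illustration, not the actual Ethereum ETL code: it assumes the standard layout in which both addresses are indexed (topics 1 and 2) and the value sits in the data field. The function names and the sample log are hypothetical.

```python
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def word_to_address(word):
    """Take the last 20 bytes (40 hex characters) of a 32-byte hex word."""
    return "0x" + word[-40:]


def extract_token_transfer(log):
    """Return a transfer dict, or None if the log is not a standard ERC20 Transfer."""
    topics = log.get("topics", [])
    # Assumption: exactly three topics (event signature, _from, _to).
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return {
        "token_address": log["address"],
        "from_address": word_to_address(topics[1]),
        "to_address": word_to_address(topics[2]),
        "value": int(log["data"], 16),
    }


# Made-up log for illustration only.
sample_log = {
    "address": "0x" + "aa" * 20,
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "bb" * 20,
        "0x" + "00" * 12 + "cc" * 20,
    ],
    "data": "0x" + format(1000, "064x"),
}
transfer = extract_token_transfer(sample_log)
print(transfer)
```

Requiring exactly three topics also skips Transfer events that index a third argument, which share the same signature hash but are not ERC20 transfers.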
There is an alternative way of exporting token transfers that is more convenient if you don’t need receipts and logs and only care about ERC20 tokens. You can use export_token_transfers.py, which relies on the eth_getFilterLogs API to retrieve the transfers directly from the Ethereum node, bypassing the log extraction step.
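For orientation, here is a minimal sketch of the two JSON RPC requests involved, assuming the usual pairing in which eth_newFilter creates a filter and eth_getFilterLogs returns the logs matching it. The helper names, the block range and the filter id "0x1" are placeholders; how export_token_transfers.py issues these calls internally is not shown here.

```python
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def build_new_filter_request(start_block, end_block, request_id=1):
    """Request a filter matching Transfer logs in an inclusive block range."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_newFilter",
        "params": [
            {
                # Block numbers are passed as hex-encoded quantities.
                "fromBlock": hex(start_block),
                "toBlock": hex(end_block),
                # Only the first topic is constrained: the event signature hash.
                "topics": [TRANSFER_TOPIC],
            }
        ],
        "id": request_id,
    }


def build_get_filter_logs_request(filter_id, request_id=2):
    """Request all logs matching a previously created filter."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getFilterLogs",
        "params": [filter_id],
        "id": request_id,
    }


new_filter = build_new_filter_request(0, 100)
# "0x1" stands in for the filter id the node would return from eth_newFilter.
get_logs = build_get_filter_logs_request("0x1")
print(new_filter)
print(get_logs)
```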
Exporting contracts
Contracts in Ethereum are created with a special kind of transaction in which the receiver is left empty (JSON RPC reports its to field as null). The receipt for such a transaction contains a contractAddress field with the address of the created contract. This address is passed to the eth_getCode API to retrieve the contract bytecode.
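A minimal sketch of that step: given a receipt, build the eth_getCode request only when contractAddress is set (it is null for ordinary transactions). The helper name and the sample receipts are hypothetical, and querying at the "latest" block is an assumption made for the example.

```python
def build_get_code_request(receipt, request_id=1):
    """Return an eth_getCode request for a contract-creating receipt, else None."""
    contract_address = receipt.get("contractAddress")
    if not contract_address:
        return None
    return {
        "jsonrpc": "2.0",
        "method": "eth_getCode",
        # Second parameter selects the block at which the code is read.
        "params": [contract_address, "latest"],
        "id": request_id,
    }


# Made-up receipts for illustration only.
creation_receipt = {"contractAddress": "0x" + "dd" * 20}
transfer_receipt = {"contractAddress": None}

code_request = build_get_code_request(creation_receipt)
skipped = build_get_code_request(transfer_receipt)
print(code_request)
print(skipped)
```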
Disassembling the bytecode with ethereum-dasm yields the initialization block of the contract and all PUSH4 opcodes in it. The operands to PUSH4 are the first 4 bytes of the Keccak hash of the ASCII form of the function signature, as explained in the Solidity documentation. Below is an example output of ethereum-dasm: