I have a 20 year career in developing automated trading systems. I have been involved in Ethereum since before the pre-sale, hoping it would address the failings I saw in traditional finance. My motivation is to engage with the community to achieve this end.
I identified frontrunning as a problem for Ethereum in 2014 and began estimating real-world harms with my EthInclude project (a prototype of Zeromev), which supplied data for my MEV WTF Devcon talk around that time.
I’ve been published on Coindesk and I’m a frequent poster on ethresear.ch, where I warned that PBS would lead to builder centralization issues a year before MEV-Boost was released. I founded and developed http://zeromev.org on an Ethereum Foundation grant in 2021.
Also unrestricted direct database access to this data for CoW Swap / Zeromev and their partners
Minimum provisioning / maintenance period of 12 months
Grant Goals and Impact:
Access to granular MEV data on Ethereum is still relatively limited.
It is in the best interest of both zeromev.org and CoW Swap to make granular MEV data available to the public through a free API. This will help to increase transparency and awareness around the problematic nature of MEV and promote sustainable solutions.
Some potential ideas for use cases to build on top of the proposed API:
A chart that compares the value extracted from users split by the DEX protocol they were using
A chart that compares the amount of MEV incidents for every MEV type split by DEX protocol
An educational snippet on CoW Swap revealing more information about the MEV incidents users had when interacting with various DeFi protocols
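As a rough illustration of the first chart idea, a query against the proposed API might look something like this (the endpoint URL, query parameters and JSON field names are placeholders, since the API is not yet specified):

```python
# Rough sketch only: the endpoint URL, query parameters and JSON field names
# are placeholders; the real API is not yet specified.
from collections import defaultdict
import requests

API_URL = "https://api.zeromev.example/v1/mevTransactions"  # hypothetical endpoint

def user_loss_by_protocol(start_block: int, count: int = 100) -> dict:
    """Sum user losses (USD) per DEX protocol for a range of blocks."""
    resp = requests.get(API_URL, params={"block_number": start_block, "count": count})
    resp.raise_for_status()
    totals = defaultdict(float)
    for tx in resp.json():
        loss = tx.get("user_loss_usd")
        if loss is not None:
            totals[tx.get("protocol") or "unknown"] += loss
    return dict(totals)

if __name__ == "__main__":
    print(user_loss_by_protocol(15_000_000))
```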
Milestones:
The following MEV data will be summarized for each relevant transaction:
Field - Description
BlockNumber - Ethereum block number
TxIndex - Index of transaction in block
MEVType - Frontrun, Backrun, Sandwiched, Swaps, Arb, etc
Protocol - Uniswap, Bancor, Opensea, etc
UserLossUsd - Loss to user from the MEV
ExtractorProfitUsd - Profit to the extractor from the MEV
VolumeUsd - Swap volume (where applicable)
Imbalance - Sandwiched imbalance percentage
AddressFrom - Transaction sender
AddressTo - Transaction receiver
ArrivalTimeUS - Time the transaction was first seen by our US node
ArrivalTimeEU - Time the transaction was first seen by our European node
ArrivalTimeAS - Time the transaction was first seen by our Asian node
Deliverables:
Creation of a new Postgres database to hold the MEV transaction summary data above
Data to be persisted on RAID drives across two servers / database instances (a write instance and a replicating read instance)
Database to be populated with all available historical MEV transaction data
MEV transaction data updated in real time as each new block is classified by Zeromev core
Authenticated access to the database restricted by IP for use by CoW Swap / Zeromev / Dune / etc
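As an illustration of the first deliverable, a minimal sketch of how the summary table above might be declared in Postgres (the table name and column types are assumptions; only the fields themselves come from the list above):

```python
# Sketch only: table name and column types are assumptions based on the field list above.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS mev_transaction_summary (
    block_number         BIGINT  NOT NULL,
    tx_index             INTEGER NOT NULL,
    mev_type             TEXT,          -- Frontrun, Backrun, Sandwiched, Swaps, Arb, ...
    protocol             TEXT,          -- Uniswap, Bancor, Opensea, ...
    user_loss_usd        NUMERIC,
    extractor_profit_usd NUMERIC,
    volume_usd           NUMERIC,
    imbalance            NUMERIC,       -- sandwich imbalance percentage
    address_from         TEXT,
    address_to           TEXT,
    arrival_time_us      TIMESTAMPTZ,   -- first seen by the US node
    arrival_time_eu      TIMESTAMPTZ,   -- first seen by the European node
    arrival_time_as      TIMESTAMPTZ,   -- first seen by the Asian node
    PRIMARY KEY (block_number, tx_index)
);
"""

with psycopg2.connect("dbname=zeromev") as conn, conn.cursor() as cur:
    cur.execute(DDL)
```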
The first payment of the grant has been executed: link. @Pmcgoohan, when you have updates about the progress, please post them here for everyone to follow.
Hi. I’m sorry to report that Zeromev is having infrastructure problems.
We run two archive nodes for redundancy. Sadly, both have failed, so MEV data is no longer updating (although transaction timing data is still being collected without problems).
Resolving this situation and making the clusters more robust is my immediate priority.
I’ll be progressing the API project once this has been done.
Thanks.
That’s unfortunate!
Hope you find a path to a quick recovery for Zeromev!
Please keep us posted when you have a timeline for getting back to work on the MEV API
Hello,
We successfully resolved the infrastructure issues above and increased capacity, and Zeromev started processing MEV data again.
Unfortunately, the system then halted on 04-Jan with a new issue related to the use of Flashbots mev-inspect-py.
I have contracted k8s/Python experts to help me resolve this.
I’m afraid that because all resources are currently focused on restoring the site, the launch date for the API project has been pushed back to late Feb/early Mar.
I hope to have Zeromev back up by the end of the week. Investigations are ongoing and I’ll have a clearer idea soon. I will post further information here as I have it.
Please accept my apologies for this, and thank you for your patience.
I’m pleased to say that the remaining issues with the site have been resolved, and it is now back up and running.
I discovered that mev-inspect-py processes certain blocks very slowly (many minutes rather than a few seconds as usual) and have now ensured that the site can handle these outliers.
So it’s full speed ahead on the API project. I’m sorry this has caused a delay; thank you for your patience. I’m looking forward to getting stuck in on Monday.
The API servers have been provisioned and configured. Replicating database instances have been set up with automatic failover provided by a third watcher server.
The database and API source table have been created as specified. The code to populate this from the existing Zeromev MEV and arrival time databases is nearing completion.
It’s going well and we’re on course to deliver.
I am also extending the Zeromev dataset by backfilling another year of data. This will give the Zeromev site and API the same time range as MEV-Explore (from Dec 2019). I expect this extended dataset to be made available as part of the API launch, if not before.
We’re making good progress this week. The API table is now being populated with data for testing and debugging. A few things have come up that I wanted to highlight:
Data Structure / API Improvements
I’ve added swap_count columns alongside swap_volume_usd for better reporting
I also aim to add extractor_swap_count and extractor_swap_volume columns so extractor volume can be differentiated from user or ‘true’ volume (and perhaps calculated user_swap_count and user_swap_volume columns through the API), as sketched below
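A rough sketch of the derived user columns I have in mind (column names are provisional, not the final API shape):

```python
# Provisional column names; a sketch of the derived values the API could return.
def user_swap_stats(row: dict) -> dict:
    """Split 'true' user activity out from extractor activity for one row."""
    return {
        "user_swap_count": (row.get("swap_count") or 0) - (row.get("extractor_swap_count") or 0),
        "user_swap_volume": (row.get("swap_volume_usd") or 0.0) - (row.get("extractor_swap_volume") or 0.0),
    }
```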
Data Structure Limitations
Note that while it will be possible to aggregate arb and swap volumes, it will not be possible to differentiate them accurately by protocol, because where there are multiple swaps per transaction the protocol field will be set to “multiple”
A later development could address this with a dedicated swaps table containing a row for each swap rather than one for each Ethereum transaction
Sandwich volume is not affected by this, however, and can still be aggregated by protocol, as in the sketch below
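To illustrate the limitation, a sketch of an aggregation that respects it (the mev_type and protocol value names here are assumptions):

```python
# Sketch: per-protocol volume is only reliable for sandwich rows; arb/swap
# rows with several swaps carry protocol = "multiple" and cannot be split.
# The mev_type / protocol value names are assumptions.
from collections import defaultdict

def aggregate_volume(rows):
    sandwich_by_protocol = defaultdict(float)
    other_volume = 0.0
    for row in rows:
        volume = row.get("swap_volume_usd") or 0.0
        if row.get("mev_type") == "sandwiched":
            sandwich_by_protocol[row.get("protocol") or "unknown"] += volume
        else:
            other_volume += volume  # reportable only as a total, not per protocol
    return dict(sandwich_by_protocol), other_volume
```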
Classification Improvements
I will need to reclassify the entire MEV dataset to populate the API table, and this represents an opportunity to make improvements
Currently, if any token in a potential sandwich is unknown (i.e. not a known token in the Ethplorer API), the Zeromev classification does not calculate it
Because the dataset will be used for aggregate reporting (e.g. total MEV per day), I am looking into calculating MEV even in some instances where tokens are unknown
This should be possible as long as the input & output tokens are known (see screenshot below)
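As a rough sketch of the principle (illustrative only; this is not the actual Zeromev classification code):

```python
# Illustrative only: not the actual Zeromev sandwich calculation. The point is
# that pricing a victim's loss needs USD rates for the swap's input and output
# tokens, not for every token the attacker's transactions touch.
def can_price_sandwich(victim_swap: dict, known_usd_rates: dict) -> bool:
    return (victim_swap["token_in"] in known_usd_rates
            and victim_swap["token_out"] in known_usd_rates)
```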
I think the steps above will greatly improve the power of the dataset, while keeping it simple to understand and report against, as was the original vision.
I do not expect this to push back delivery date beyond the end of this month/early next month.
I’d be keen to hear what you think here, and I’m very happy to discuss it and give further clarification.
The improvements to the MEV classification / calculations above have been coded successfully.
I aim to release these early with an announcement this week.
This will prepare us for the new website data format (some changes were needed there), so that when the time comes we can export it along with the API data without any downtime and without breaking clients (the export will take around 48 hours).
Both the API data export and the REST API are now up and running in development. Testing is ongoing.
Awesome, thanks for the update!
Do you have an ETA for when it will be ready for public testing?
Would be interested to find people that want to build cool visualizations using the API
Testing has raised a few small issues related to low liquidity DEX pools which have now been fixed.
This project has been a useful exercise in auditing the dataset, and the MEV data is looking strong.
Unfortunately, I have just come across an issue with the address fields.
I intend the address fields to relate to the from/to fields returned by the eth_getBlockByNumber RPC call, as I think these are most useful (please let me know if you disagree!)
I’ve been using the mev-inspect database address from/to fields until now, but on closer inspection it seems they relate to specific traces, which are not always what is needed (e.g. Uniswap v3 contract addresses rather than end-user addresses).
So I think it’s best to ignore those fields and import the addresses directly via RPC, as a batch job initially and incrementally after that.
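A minimal sketch of the kind of batch import I have in mind, using web3.py (the RPC endpoint and block number are placeholders):

```python
# Sketch of the backfill: pull from/to for every transaction in a block via
# eth_getBlockByNumber (full transactions) rather than from mev-inspect traces.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # placeholder RPC endpoint

def addresses_for_block(block_number: int):
    block = w3.eth.get_block(block_number, full_transactions=True)
    for tx in block.transactions:
        # 'to' is None for contract creations
        yield block_number, tx["transactionIndex"], tx["from"], tx["to"]

# batch/incremental usage, e.g. working back through the historical dataset
for row in addresses_for_block(16_000_000):
    print(row)
```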
I was going to set the launch date for next week, but given that this means further coding and testing involving a high volume of data (the entire chain since 2019), I think it’s now sensible to aim for the week after.
How do we feel about March 15th as a potential launch date?
So I (rather publicly) discovered some issues with the backrunning data in the API last week, or at least with my interpretation of it. Explanation here…
https://twitter.com/pmcgoohanCrypto/status/1635283236158046215
Although it was hasty of me to generalize about the nature of these backruns, sandwich imbalances are interesting.
They show where my sandwich calculation is having to work to produce a balanced result based on the mev-inspect-py data. They will often point to more complex MEV types that are not yet fully quantified.
However, I want to avoid this data being reported as first-class MEV as it stands. As such, while backruns will still be visible through the API, I will null the user loss and extractor profit columns for them. I’ll keep the imbalance column as an indicator that the sandwich has been balanced.
The data in these nulled columns won’t be lost; it will still be stored in the database, just not published.
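For clarity, one way the published data could be derived from the stored data (names follow the earlier schema sketch; this is a sketch, not necessarily how it will be implemented):

```python
# Sketch: publish backruns with profit/loss nulled while keeping the raw
# values in the underlying table. Table, column and mev_type names follow
# the earlier schema sketch and are assumptions.
import psycopg2

VIEW_SQL = """
CREATE OR REPLACE VIEW mev_transaction_summary_public AS
SELECT block_number,
       tx_index,
       mev_type,
       protocol,
       CASE WHEN mev_type = 'backrun' THEN NULL ELSE user_loss_usd END        AS user_loss_usd,
       CASE WHEN mev_type = 'backrun' THEN NULL ELSE extractor_profit_usd END AS extractor_profit_usd,
       volume_usd,
       imbalance,   -- kept as an indicator that the sandwich has been balanced
       address_from,
       address_to,
       arrival_time_us,
       arrival_time_eu,
       arrival_time_as
FROM mev_transaction_summary;
"""

with psycopg2.connect("dbname=zeromev") as conn, conn.cursor() as cur:
    cur.execute(VIEW_SQL)
```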
Despite all this, I’m on course for the API to be available at some point on Wed.
However, it is a little close to the wire as I will still have data importing right up to then. Because of this, and in the light of my recent backrunning U-turn, I’m keen to have more eyes on it before going public.
As such, I think the idea of an internal soft launch in the first instance is a good one. Perhaps we can discuss doing that this week and hold off on a general launch and any announcements until after that?