Grant Application: ZeroMEV API


Grant Title:

ZeroMEV Data API


Author:

Pmcgoohan (zeromev.org)


About You:

I have a 20-year career in developing automated trading systems. I have been involved in Ethereum since before the pre-sale, hoping it would address the failings I saw in traditional finance. My motivation is to engage with the community to achieve this end.

I identified in 2014 that frontrunning would be a problem for Ethereum, and later began estimating real-world harms with my EthInclude project (a prototype of Zeromev), which supplied data for my MEV WTF Devcon talk around that time.

I’ve been published on CoinDesk and I’m a frequent poster on ethresear.ch, where I warned that PBS would lead to builder centralization issues a year before MEV-Boost was released. I founded and developed http://zeromev.org on an Ethereum Foundation grant in 2021.


Additional Links:

Miners frontrunning post (2014) (http://bit.ly/3v3YDmP)
Early frontrunning estimator, EthInclude (https://github.com/pmcgoohan/EthInclude)
Builder centralization warning (https://ethresear.ch/t/two-slot-proposer-builder-separation/10980/10)
Coindesk article (http://bit.ly/3jWSRkG)
Founded/developed (http://zeromev.org)

EthGlobal (https://www.youtube.com/watch?v=f1eJnhET1U4)
YouTube (https://www.youtube.com/@pmcgoohan-zeromev7240)
Twitter (https://twitter.com/pmcgoohanCrypto)


Grant Category:

CoWmunity growth


Grant Description:

  • Generate transaction-granularity MEV summary data sourced from zeromev.org
  • Publish the data via a documented public API at data.zeromev.org / info.zeromev.org
  • Provide unrestricted direct database access to this data for CoW Swap / Zeromev and their partners
  • Minimum provisioning / maintenance period of 12 months

Grant Goals and Impact:

Accessibility of data related to MEV on Ethereum is still relatively limited.

It is in the best interest of both zeromev.org and CoW Swap to make granular MEV data available to the public through a free API. This will help to increase transparency and awareness around the problematic nature of MEV and promote sustainable solutions.

Some potential use cases to build on top of the proposed API (a rough aggregation sketch follows the list):

  • A chart that compares the value extracted from users, split by the DEX protocol they were using
  • A chart that compares the number of MEV incidents of each MEV type, split by DEX protocol
  • An educational snippet on CoW Swap revealing more information about the MEV incidents users had when interacting with various DeFi protocols
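
To make the first two ideas concrete, here is a minimal sketch of the kind of aggregation such a chart could be built on. It assumes the field names proposed in the Milestones section below and uses made-up sample rows in place of real API output:

```python
# Minimal sketch, not a final design: aggregate MEV data for the example charts.
# Field names follow the proposed schema under Milestones; the sample rows
# below are made up and stand in for whatever the API will actually return.
import pandas as pd

rows = [
    {"MEVType": "sandwich", "Protocol": "uniswap2", "UserLossUsd": 120.5},
    {"MEVType": "arb",      "Protocol": "balancer", "UserLossUsd": 0.0},
    {"MEVType": "sandwich", "Protocol": "uniswap3", "UserLossUsd": 310.0},
]
df = pd.DataFrame(rows)

# Chart 1: value extracted from users, split by DEX protocol
loss_by_protocol = df.groupby("Protocol")["UserLossUsd"].sum().sort_values(ascending=False)

# Chart 2: number of MEV incidents per MEV type, split by protocol
incidents_by_type = df.groupby(["Protocol", "MEVType"]).size().unstack(fill_value=0)

print(loss_by_protocol)
print(incidents_by_type)
```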

Milestones:

The following MEV data will be summarized for each relevant transaction:

Field | Description
--- | ---
BlockNumber | Ethereum block number
TxIndex | Index of the transaction in the block
MEVType | Frontrun, Backrun, Sandwiched, Swaps, Arb, etc.
Protocol | Uniswap, Bancor, Opensea, etc.
UserLossUsd | Loss to the user from the MEV
ExtractorProfitUsd | Profit to the extractor from the MEV
VolumeUsd | Swap volume (where applicable)
Imbalance | Sandwiched imbalance percentage
AddressFrom | Transaction sender
AddressTo | Transaction receiver
ArrivalTimeUS | Time the transaction was first seen by our US node
ArrivalTimeEU | Time the transaction was first seen by our European node
ArrivalTimeAS | Time the transaction was first seen by our Asian node
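
For orientation, here is a minimal client-side sketch of one row of this summary. The field names and types simply mirror the table above and are not final:

```python
# Minimal client-side sketch of one proposed MEV summary row. Field names and
# types mirror the table above and are not final; optional fields may be null
# where a value does not apply (e.g. VolumeUsd for non-swap MEV).
from dataclasses import dataclass
from typing import Optional

@dataclass
class MevSummaryRow:
    BlockNumber: int                     # Ethereum block number
    TxIndex: int                         # index of the transaction in the block
    MEVType: str                         # frontrun, backrun, sandwiched, swap, arb, ...
    Protocol: Optional[str]              # uniswap2, bancor, opensea, ...
    UserLossUsd: Optional[float]         # loss to the user from the MEV
    ExtractorProfitUsd: Optional[float]  # profit to the extractor from the MEV
    VolumeUsd: Optional[float]           # swap volume (where applicable)
    Imbalance: Optional[float]           # sandwich imbalance percentage
    AddressFrom: Optional[str]           # transaction sender
    AddressTo: Optional[str]             # transaction receiver
    ArrivalTimeUS: Optional[str]         # first seen by our US node (timestamp)
    ArrivalTimeEU: Optional[str]         # first seen by our European node
    ArrivalTimeAS: Optional[str]         # first seen by our Asian node
```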

Deliverables:

  • Creation of a new PostgreSQL database to hold the MEV transaction summary data above
  • Data to be persisted on RAID drives across two servers / database instances (a write instance and a replicating read instance)
  • Database to be populated with all available historical MEV transaction data
  • MEV transaction data updated in real time as each new block is classified by Zeromev core
  • Authenticated access to the database, restricted by IP, for use by CoW Swap / Zeromev / Dune, etc. (a connection sketch follows this list)
  • Public access via a rate-limited REST API at data.zeromev.org
  • API and data fields documented alongside other Zeromev documentation at info.zeromev.org
  • Service to be provisioned and maintained for a minimum of 12 months (approximately the remainder of the Zeromev contract with the Ethereum Foundation)
  • Development time is estimated at 6-8 weeks
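
To illustrate the direct database access deliverable, here is a minimal sketch of a query against the read replica. The host, credentials, table, and column names are placeholders only; the real names will follow the documentation at info.zeromev.org:

```python
# Minimal sketch of IP-restricted partner access to the read replica.
# Host, credentials, table and column names below are placeholders only.
import psycopg2

conn = psycopg2.connect(
    host="read.zeromev.example",  # placeholder read-replica host
    dbname="zeromev_api",         # placeholder database name
    user="cowswap_reader",        # placeholder IP-restricted account
    password="...",
)
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT protocol, SUM(user_loss_usd) AS total_user_loss_usd
        FROM mev_summary              -- placeholder table name
        WHERE block_number BETWEEN %s AND %s
        GROUP BY protocol
        ORDER BY total_user_loss_usd DESC
        """,
        (16_000_000, 16_007_200),
    )
    for protocol, total_user_loss_usd in cur.fetchall():
        print(protocol, total_user_loss_usd)
conn.close()
```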

Funding Request:

Funding request summary: 30k xDAI (15k paid upfront, 15k paid upon completion)


Budget Breakdown:

Development: $22,400; Infrastructure / Maintenance: $7,600


Gnosis Chain Address (to receive the grant):

0x022a0D82f00Ed885f3707B92279Aa8528dc1b0A0


Referral:

Proposal was developed in collaboration with middleway.eth


Terms and conditions:

By applying for this grant, I agree to be bound by the CowDAO Participation Agreement and the CowDAO Grant Terms and Conditions.

4 Likes

An excellent grant. This has my support.

1 Like

The first payment of the grant has been executed: link
@Pmcgoohan when you have updates about the progress, please post them here for everyone to follow :pray:

1 Like

Thank you so much for your support! :grinning: I’m looking forward to getting this out and will keep the community updated on this thread.

2 Likes

Hi. I’m sorry to report that Zeromev is having infrastructure problems.
We run two archive nodes for redundancy; sadly both have failed, so MEV data is no longer updating (although transaction timing data is still being collected without problems).
Resolving this situation and making the clusters more robust is my immediate priority.
I’ll be progressing the API project once this has been done.
Thanks.

1 Like

That’s unfortunate!
Hope you find a path to a quick recovery for Zeromev!
Please keep us posted when you have a timeline for getting back to work on the MEV API.

Hello,
We successfully resolved the infrastructure issues above and increased capacity, and Zeromev started processing MEV data again.
Unfortunately, the system then halted on 04-Jan with a new issue related to the use of Flashbots mev-inspect-py.
I have contracted k8s/Python experts to help me resolve this.
I’m afraid that because all resources are currently focused on restoring the site, the launch date for the API project has been pushed back to late Feb/early Mar.
I hope to have Zeromev back up by the end of the week. Investigations are ongoing and I’ll have a clearer idea soon. I will post further information here as I have it.
Please accept my apologies for this, and thank you for your patience.

1 Like

Hi

I’m pleased to say that the remaining issues with the site have been resolved, and it is now back up and running.

I discovered that mev-inspect-py processes certain blocks very slowly (many minutes rather than a few seconds as usual) and have now ensured that the site can handle these outliers.

So it’s full speed ahead on the API project. I’m sorry this has caused a delay; thank you for your patience. I’m looking forward to getting stuck in on Monday.

2 Likes

Awesome!
Glad that Zeromev is back up and running
Looking forward to the cool data and education that will be built using the MEV API

1 Like

Hi all,

Quick progress update.

The API servers have been provisioned and configured. Replicating database instances have been set up, with automatic failover provided by a third watcher server.

The database and API source table have been created as specified. The code to populate this from the existing Zeromev MEV and arrival time databases is nearing completion.

It’s going well and we’re on course to deliver.

I am also extending the Zeromev dataset by backfilling another year of data. This will give the Zeromev site and API the same time range as MEV-Explore (from Dec 2019). I expect this extended dataset to be made available as part of the API launch, if not before.

1 Like

Hi

We’re making good progress this week. The API table is now being populated with data for testing and debugging. A few things have come up that I wanted to highlight:

Data Structure / API Improvements

  • I’ve added swap_count columns alongside swap_volume_usd for better reporting
  • I also aim to add extractor_swap_count and extractor_swap_volume columns so extractor volume can be differentiated from user or ‘true’ volume (and perhaps calculated user_swap_count and user_swap_volume columns through the API; see the sketch after this list)
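
A quick illustration of how these columns would combine, assuming they land as described (the helper below is a sketch, not API code):

```python
# Illustrative sketch only: derive user ("true") swap volume and count from the
# proposed totals and extractor columns. Column names assume the plan above.
def user_swap_stats(swap_count: int,
                    swap_volume_usd: float,
                    extractor_swap_count: int,
                    extractor_swap_volume_usd: float) -> tuple[int, float]:
    """Return (user_swap_count, user_swap_volume_usd)."""
    return (
        swap_count - extractor_swap_count,
        swap_volume_usd - extractor_swap_volume_usd,
    )

# e.g. 10 swaps / $50k total, of which 4 swaps / $30k were extractor volume
print(user_swap_stats(10, 50_000.0, 4, 30_000.0))  # -> (6, 20000.0)
```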

Data Structure Limitations

  • Note that while it will be possible to aggregate arb and swap volumes, it will not be possible to differentiate them accurately by protocol, because where there are multiple swaps per transaction the protocol field will be set to “multiple”
  • A later development could address this with a dedicated swaps table containing a row for each swap rather than each Ethereum transaction
  • Sandwich volume is not affected by this and can still be aggregated by protocol

Classification Improvements

  • I will need to reclassify the entire MEV dataset to populate the API table, and this represents an opportunity to make improvements
  • Currently, if any token in a potential sandwich is unknown (i.e. not a known token in the Ethplorer API), the Zeromev classification does not calculate it
  • Because the dataset will be used for aggregate reporting (e.g. total MEV per day), I am looking into calculating MEV even in some instances where tokens are unknown
  • This should be possible as long as the input and output tokens are known (see the screenshot below; a rough sketch follows this list)
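
A rough sketch of the idea (this simplifies the real Zeromev sandwich calculation and is for illustration only): a USD figure can still be produced when the victim swap’s input and output tokens can be priced, even if other tokens touched in the transaction are unknown.

```python
# Rough illustration only; this simplifies the real Zeromev sandwich calculation.
# Only the victim swap's input and output tokens need USD rates for the loss to
# be priced, even if other tokens in the transaction are unknown.
from typing import Optional

def sandwich_user_loss_usd(input_token: str,
                           output_token: str,
                           loss_in_output_token: float,
                           known_usd_rates: dict[str, float]) -> Optional[float]:
    """Price a token-denominated loss in USD, or return None (not calculated)
    when the input or output token cannot be priced."""
    if input_token not in known_usd_rates or output_token not in known_usd_rates:
        return None
    return loss_in_output_token * known_usd_rates[output_token]

# e.g. a DAI -> WETH sandwich where an unrelated unknown token also appears in the tx
rates = {"WETH": 1800.0, "DAI": 1.0}
print(sandwich_user_loss_usd("DAI", "WETH", 0.05, rates))  # -> 90.0
```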

I think the steps above will greatly improve the power of the dataset, while keeping it simple to understand and report against, as was the original vision.

I do not expect this to push the delivery date back beyond the end of this month / early next month.

I’d be keen to hear what you think here, and I’m very happy to discuss it and give further clarification.

Many thanks!
Pmcgoohan

2 Likes

Hi everyone,

Quick update for you.

The improvements to the MEV classification / calculations above have been coded successfully.

I aim to release these early with an announcement this week.

This will prepare us for the new website data format (some changes were needed there), so that when the time comes we can export it along with the API data without any downtime and without breaking clients (the export will take around 48 hours).

Both the API data export and the REST API are now up and running in development. Testing is ongoing.

It’s looking good! :smile:

1 Like

Awesome, thanks for the update!
Do you have an ETA for when public testing will be ready?
Would be interested to find people who want to build cool visualizations using the API

1 Like

Hey!

The updated version of the web client has been released, ready for the upcoming data format change above (announced here https://twitter.com/pmcgoohanCrypto/status/1629145705959784448?s=20).

I’m doing a full export in dev at the moment. Spot checks are looking good but I’d like to see the totals once it has completed.

That’ll put us in a position to discuss the launch date early next week. :rocket: :rocket: :rocket:

1 Like

Hi

Testing has raised a few small issues related to low-liquidity DEX pools, which have now been fixed.

This project has been a useful exercise in auditing the dataset, and the MEV data is looking strong.

Unfortunately, I have just come across an issue with the address fields.

I intend the address fields to relate to the from/to fields returned by the eth_getBlockByNumber RPC call, as I think these are most useful (please let me know if you disagree!)

I’ve been using the mev-inspect database address from/to fields until now, but on closer inspection it seems they relate to specific traces, which are not always what is needed (e.g. Uniswap v3 contract addresses rather than end-user addresses).

So I think it’s best to ignore those fields and instead import the addresses directly via RPC, first as a batch job and then incrementally after that (a rough sketch below).
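
To make that concrete, here’s a minimal sketch of how the batch job could pull sender/receiver addresses per block via eth_getBlockByNumber using web3.py. The node URL is a placeholder and this is not the actual import code:

```python
# Minimal sketch (not the actual import job): fetch AddressFrom/AddressTo for
# every transaction in a block via eth_getBlockByNumber with full transactions.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # placeholder archive node URL

def block_addresses(block_number: int) -> list[tuple[int, str, str | None]]:
    """Return (tx_index, address_from, address_to) for each transaction.
    address_to is None for contract-creation transactions."""
    block = w3.eth.get_block(block_number, full_transactions=True)
    return [
        (tx["transactionIndex"], tx["from"], tx.get("to"))
        for tx in block["transactions"]
    ]

# The backfill would iterate this over the historical range and then run
# incrementally per new block; batching and retries are omitted for brevity.
print(block_addresses(16_000_000)[:5])
```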

Given that this means further coding and testing involving a high volume of data (the entire chain since 2019), and where I was previously going to set the launch date for next week, I now think it’s sensible to aim for the week after.

How do we feel about March 15th as a potential launch date?

2 Likes

Hi everyone,

So I (rather publicly) discovered some issues with the backrunning data in the API last week, or at least with my interpretation of it. Explanation here:
https://twitter.com/pmcgoohanCrypto/status/1635283236158046215

Although it was hasty of me to generalize about the nature of these backruns, sandwich imbalances are interesting.

They show where my sandwich calculation is having to work to produce a balanced result based on the mev-inspect-py data. They will often point to more complex MEV types that are not yet fully quantified.

However, I want to avoid this data being reported as first-class MEV as it stands. As such, while backruns will still be visible through the API, I will null the user loss and extractor profit columns for them. I’ll keep the imbalance column as an indicator that the sandwich has been balanced.

The data in these nulled columns won’t be lost, it’ll still be stored in the database, just not published.

Despite all this, I’m on course for the API to be available at some point on Wed.

However, it is a little close to the wire, as I will still have data importing right up until then. Because of this, and in light of my recent backrunning U-turn, I’m keen to have more eyes on it before going public.

As such, I think the idea of an internal soft launch in the first instance is a good one. Perhaps we can discuss doing that this week and hold off on a general launch and any announcements until after that?

Let me know your thoughts

I just noticed that a couple of posts above have been flagged for some reason. Not sure why; they contain relevant information.

Yeah, it’s just the forum auto-mod. I’ve unflagged them; they should be visible now.

1 Like

Quick update: we had a successful soft launch of the Zeromev API yesterday :partying_face: :tada:

The API is up and running, serving MEV data from Jan 2020 to the present and updating in real time.

It is fully load-tested and rate-limited, and the OpenAPI documentation is complete and published on the web server.

Thank you for your support and encouragement in getting to this point. We’re looking good!
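
For anyone who wants to try it from code, here’s a minimal Python sketch. The endpoint path and parameter names are assumptions for illustration; please check the published OpenAPI documentation for the actual routes, parameters and rate limits:

```python
# Minimal sketch of querying the Zeromev API; the endpoint path and parameters
# below are illustrative assumptions -- see the published OpenAPI docs for the
# actual routes, parameters and rate limits.
import requests

resp = requests.get(
    "https://data.zeromev.org/v1/mevBlock",  # illustrative path only
    params={"block_number": 16_800_000, "count": 10},
    timeout=30,
)
resp.raise_for_status()
for tx in resp.json():
    print(tx)
```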

3 Likes

WOW! Fantastic! Where can I find the API? I was about to contact you about using the data for academic research.

1 Like