Solver Selection and Benchmarking

Simple Summary

Solvers are a very important component of CoW Protocol. CoW Protocol is unique in that it allows independent off-chain solvers to compete for the right to execute a batch of trades.

Solvers are rewarded by CoW Protocol with CoW tokens every time they execute a batch of trades.

At the moment, the following questions are at the top of mind of any new solver developer:

  • How do I get my solver deployed in production?
  • How good is my solver?
  • Say the CoW team is ready to review my solver, which rules will they enforce to approve it or reject it?
  • Why are there already solvers in production that did not have to go through the approval process all other new solvers have to go through?


The idea is to build an independent testing and benchmarking playground for solver developers that lets them test their solutions on the fly, without having to overload the CoW team with requests to test and approve their solvers.

The playground can be built in two different ways, the second more complex than the first.

Approach 1

The first approach would consist of collecting sets of trade batch test instances in an AWS S3 bucket at the end of every month. Solver developers can tap into the bucket of the most recent month and attempt to solve the trade batch test instances. Once the solver is finished, the solution is submitted to a solution scoring service that assigns a score to the solver. Solvers are allowed to submit their solutions several times a month. This setup is similar to a Kaggle competition, in which solvers need to achieve the highest score to climb up the ranking and be considered for deployment by the end of the month.
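
As a rough illustration of the Kaggle-style ranking described above, here is a minimal sketch of a leaderboard that keeps each solver's best score across repeated submissions. The function name, the solver names, and the scores are all hypothetical, and "higher is better" is an assumption about the scoring service.

```python
def build_leaderboard(submissions):
    """Rank solvers by their best submitted score (assumed: higher is better).

    `submissions` is an iterable of (solver_name, score) pairs. A solver may
    appear several times because resubmission during the month is allowed.
    """
    best = {}
    for solver, score in submissions:
        if solver not in best or score > best[solver]:
            best[solver] = score
    # Sort by score descending, then by name so that ties are deterministic.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical submissions collected over one month.
submissions = [
    ("solver_a", 71.5),
    ("solver_b", 80.0),
    ("solver_a", 84.2),  # resubmission that improves on the first attempt
    ("solver_c", 80.0),
]
leaderboard = build_leaderboard(submissions)
for rank, (solver, score) in enumerate(leaderboard, start=1):
    print(rank, solver, score)
```

Keeping only the best score per solver means a weak early attempt never hurts a solver, which matches the "submit several times a month" rule.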

Approach 2

The second approach would consist of deploying a websocket that emits trade batches to listening solvers. Solvers must be prepared to receive the trade batch and submit a solution to the scoring service as soon as possible. This setup would allow solvers to be benchmarked on both the utility they achieve and the speed at which they achieve it. Similar to approach 1, the best solvers are shortlisted for deployment by the end of the month.

Both approaches are fairly simple to implement, and either would dramatically improve the solver development and approval process.


  1. Since all solvers utilize the same trade batch test instances, it can be ensured they are fairly benchmarked and, later on, screened for deployment.

  2. Solver developers can benchmark their solvers without having to wait until they are whitelisted.

  3. On the fly benchmarking and scoring can help solvers understand how they can improve their solutions.

  4. Total transparency in the selection and deployment of solvers. No solver should be running in production “just because”. Solvers running in production should be selected based on a fair comparison of the actual utility they can achieve or the unique attributes they bring into the matchmaking process of CoW protocol.

  5. If the performance of a solver that has been deployed decreases, then space should be made for a different solver that can achieve a higher performance score.

  6. Simplify the whitelisting process for the DAO.

Discussion Points

The exact scoring methodology remains an open question:

  • How should trade batch solutions be scored?

  • Should different scoring functions be implemented? One for speed, one for utility, one for number of matches? Should an average total score be calculated?
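
To make the second question concrete, here is one possible way to combine separate speed, utility, and match scores into a single weighted average. This is only a sketch for discussion: the normalizations, the default weights, and all names (`SolutionMetrics`, `total_score`) are assumptions, not an agreed methodology.

```python
from dataclasses import dataclass


@dataclass
class SolutionMetrics:
    utility: float  # objective value reached, higher is better
    seconds: float  # time taken to submit, lower is better
    matches: int    # number of orders matched, higher is better


def total_score(m, best_utility, time_limit, total_orders, weights=(0.6, 0.2, 0.2)):
    """Weighted average of three sub-scores, each normalized to [0, 1].

    The weights are placeholders; choosing them is the open question.
    """
    utility_score = m.utility / best_utility if best_utility > 0 else 0.0
    speed_score = max(0.0, 1.0 - m.seconds / time_limit)
    match_score = m.matches / total_orders if total_orders > 0 else 0.0
    w_u, w_s, w_m = weights
    weighted = w_u * utility_score + w_s * speed_score + w_m * match_score
    return weighted / (w_u + w_s + w_m)


# Hypothetical solution: 90% of the best known utility, submitted halfway
# through a 10 second limit, matching 8 of 10 orders.
metrics = SolutionMetrics(utility=90.0, seconds=5.0, matches=8)
score = total_score(metrics, best_utility=100.0, time_limit=10.0, total_orders=10)
print(round(score, 3))
```

A single number is convenient for ranking, but publishing the three sub-scores next to it would still tell a developer where their solver is losing points.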


For the implementation, we suggest forming a small task force of 5 community members to build an MVP of the service and the ranking dashboard.


I think this is a nice proposal and such a system will be an important step towards lowering the barrier to entry for new solvers. One thing I’d like to note about automatically listing the best solvers for the following month is that this likely still requires some form of bonding or other trust-based approach, as otherwise a solver that performed well under test may turn malicious and do unintended things when moved into production.

The current infrastructure is actually already running a “shadow driver” where the staging versions of the Gnosis solvers receive production order flow and simulate how they would settle these trades. Unfortunately, it is still based on the current pull model (the driver queries all solvers it knows about, so adding an external one still requires work from our team), and the solutions as well as logs and other metrics are not yet exposed to the public.

Generally, we would like to move the infrastructure to a more push-based approach where solvers themselves are responsible for querying the auction instance and announcing their solution (either via an API or websockets) to the orchestrating component, and eventually are responsible for submitting the solutions as well.
In this model, it would be very simple to shadow-solve batches on real traffic.

Regarding a development team, do you happen to know people that would be able to tackle such a task? If so we could see if such a component could be combined with the architecture refactoring we would like to undertake anyway in the future.


I am very happy to see a formal process for solvers, both for transparency and simplification/automation.

Thanks for the comments! It’s very interesting to hear that an architecture refactoring might take place in the foreseeable future. In the next replies, we will start with some thoughts regarding the architectures on the diagrams above, and then we will proceed to “formalize” an extended architecture for the current setup that should enable solver analytics in the development environment.

Current Architecture



  • CoW Protocol is in control, i.e. driver decides which third party services it interacts with.

  • Can impose screening and whitelisting requirements on solvers (such as some form of bonding).

  • (Nearly) synchronized interaction between driver and solvers. Competition starts (nearly) at the same time for everyone. Can impose time limits.


  • Must maintain list of PROD whitelisted solvers (modifying PROD systems is always dangerous).

  • Must implement solver selection and whitelisting process.

  • New solver developers must run their own driver to test their solvers in a production grade environment.

Push Based Architecture


We will assume every solver can connect/subscribe to CoW’s auction websocket channel. The auction websocket channel emits auctions in binary streams that must be received and processed by solvers. The main difference from the current architecture is that any solver can listen to the auctions being streamed in production. We assume they do not need to be production grade (whitelisted) solvers to listen. We assume solvers will need to HTTP POST to the driver whenever they believe they have a solution. From that point of view, only whitelisted solvers should be allowed to POST, to avoid malicious behavior.


  • Solvers can simulate their solutions in an environment that’s nearly the same as production.

  • Can impose screening and whitelisting requirements on solvers (such as some form of bonding).

  • Synchronized interaction from driver to solvers. It should be possible to impose time limits. POSTed results should have a trade batch ID. The driver should probably have a process to only consider results POSTed for the latest trade batch emitted.
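
The batch ID and time limit checks mentioned in the last point could look roughly like the sketch below. `SubmissionGate` and its methods are hypothetical names, not part of the actual driver; the `now` argument exists only so the example can be run with fixed timestamps.

```python
import time


class SubmissionGate:
    """Accepts a result only for the latest emitted batch, within a time limit."""

    def __init__(self, time_limit_seconds):
        self.time_limit_seconds = time_limit_seconds
        self.current_batch_id = None
        self.emitted_at = None
        self.accepted = []

    def emit_batch(self, batch_id, now=None):
        # Emitting a new batch discards results collected for the previous one.
        self.current_batch_id = batch_id
        self.emitted_at = time.monotonic() if now is None else now
        self.accepted = []

    def submit(self, solver, batch_id, solution, now=None):
        now = time.monotonic() if now is None else now
        if self.current_batch_id is None or batch_id != self.current_batch_id:
            return False  # no batch yet, or result is for a stale batch
        if now - self.emitted_at > self.time_limit_seconds:
            return False  # result arrived after the time limit
        self.accepted.append((solver, solution))
        return True


gate = SubmissionGate(time_limit_seconds=10.0)
gate.emit_batch(1, now=0.0)
on_time = gate.submit("solver_a", 1, {"trades": []}, now=3.0)
gate.emit_batch(2, now=12.0)
stale = gate.submit("solver_b", 1, {"trades": []}, now=13.0)
too_late = gate.submit("solver_c", 2, {"trades": []}, now=25.0)
print(on_time, stale, too_late)
```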


  • Still want to maintain list of whitelisted solvers that are allowed to POST to the driver.

  • Must implement solver selection and whitelisting process.

  • Solvers may overload the driver with POST requests and the driver must maintain some sort of queue of obtained results. Alternatively, CoW could require the solvers to stream their solutions via websocket. Then, the driver needs to hold connections to all solvers and manage their incoming streams of data.
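
One simple way to keep the driver's queue bounded, sketched under the assumption that only the most recent result per solver matters, is to hold a single slot per whitelisted solver. The class and method names here are hypothetical.

```python
class LatestSolutionBuffer:
    """Holds at most one pending solution per whitelisted solver."""

    def __init__(self, whitelist):
        self.whitelist = set(whitelist)
        self._latest = {}

    def post(self, solver, solution):
        if solver not in self.whitelist:
            return False  # reject POSTs from unknown solvers
        # A newer result overwrites the older one instead of queueing behind it,
        # so memory use is bounded by the size of the whitelist.
        self._latest[solver] = solution
        return True

    def drain(self):
        """Return all pending solutions and clear the buffer."""
        pending, self._latest = self._latest, {}
        return pending


buffer = LatestSolutionBuffer(whitelist=["solver_a", "solver_b"])
buffer.post("solver_a", {"version": 1})
buffer.post("solver_a", {"version": 2})  # replaces version 1
rejected = buffer.post("unknown_solver", {"version": 1})
pending = buffer.drain()
print(pending, rejected)
```

This does not stop a solver from sending many requests, so some rate limiting at the HTTP layer would likely still be needed.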

Overall, we like the current architecture better. Below is the more detailed overview of how we envision approaches 1 and 2 described above could extend the current architecture.

Extension (Approach 2)

The current architecture can be extended by adding a DEV driver with an easier onboarding policy. Adding a solver to development could be as simple as filling in an online form. The form would collect information about the solver’s owner as well as the solver’s endpoint.
In this case, even if for whatever reason there are differences between the auctions retrieved by PROD and DEV, what really matters is that all solvers are tested and ranked based on the same auctions.

Extension (Approach 1)

Approach 1 would not require a DEV Implementation of the driver. A database that receives the auctions would be sufficient. I believe you already have an S3 bucket that gets periodically updated (?).

My team can help set up either of the approaches. We are actually implementing approach 1 on our end to score and rank our solvers against each other and the solution files provided by CoW. Once we are finished we could open the platform to everyone that may want to benchmark their solver.

Crossposting this message from TG:

  • long term the push-based model seems like a requirement, it is not going to scale to force the driver itself to go query every known solver for a solution.

  • seems weird to have the driver be responsible for distributing the order book data at all. Seems like that should just be a shim in front of the order book DB (basically direct connection) that anyone can access in real-time. (like a more traditional data plane read-only service, read via WebSocket or polling)

  • The driver (or maybe more accurately in this model “the judge”) should only be responsible for evaluating a set of solutions, deciding which is best, and then posting it on-chain

  • the solvers then can post solutions to “the judge/driver” via an endpoint (prod, test) with some auth layer that only allows known friendlies (whitelist) to hit it.

    • In this case, the judge/driver is able to protect itself from bad actors via the auth layer and needs no knowledge of who the solvers are.
    • also making the driver/judge a modular component allows for the ability to swap in a “sandbox mode” judge that does historical rankings etc in a test environment. So when a solver gets brought up to the big leagues, they just switch from posting to test to prod

TLDR is that IMO the driver should be split into 2 separate services. 1 for order book distribution to solvers and 1 for receiving and evaluating solutions from solvers.
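
A very rough sketch of that split, just to show the shape of the two services. All names (`OrderBookFeed`, `Judge`, the API keys) are made up for illustration, and a list append stands in for whatever the judge does with the winner, such as posting on-chain in prod or recording a ranking in sandbox mode.

```python
class OrderBookFeed:
    """Read-only view of the open orders; knows nothing about solvers."""

    def __init__(self, orders):
        self._orders = list(orders)

    def current_auction(self):
        return list(self._orders)  # copy, so readers cannot mutate the book


class Judge:
    """Receives solutions from authorized solvers and picks the best one."""

    def __init__(self, allowed_keys, settle):
        self.allowed_keys = set(allowed_keys)
        self.settle = settle  # callback invoked with the winning solution
        self.solutions = []

    def post_solution(self, api_key, solver, score):
        if api_key not in self.allowed_keys:
            return False  # auth layer: unknown callers are rejected
        self.solutions.append((solver, score))
        return True

    def close_auction(self):
        if not self.solutions:
            return None
        winner = max(self.solutions, key=lambda s: s[1])
        self.solutions = []
        self.settle(winner)
        return winner


feed = OrderBookFeed(orders=["order_1", "order_2"])
sandbox_results = []
sandbox_judge = Judge(allowed_keys={"test-key"}, settle=sandbox_results.append)

auction = feed.current_auction()  # any solver can read this
sandbox_judge.post_solution("test-key", "solver_a", 10.0)
sandbox_judge.post_solution("test-key", "solver_b", 12.5)
denied = sandbox_judge.post_solution("wrong-key", "solver_c", 99.0)
winner = sandbox_judge.close_auction()
print(winner, denied)
```

Moving a solver from test to prod would then only mean pointing it at a `Judge` built with prod keys and a different `settle` callback.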
