CIP-20: Auction model for solver rewards

CIP: 20
title: Auction model for solver rewards
author: @felixhenneke, @harisang
status: active
created: 2023-01-16
replaces: CIP 14

Simple Summary:

We propose an auction based mechanism for rewarding winners of the solver competition.

Currently, the winner of the solver competition is rewarded an amount of COW token based on the orders that are included in their solution. This per-order rewards model does not take into account general market conditions or how difficult it was to find a solution. It therefore requires constant readjustment and it does not incentivize research into tackling challenging batches, e.g., utilizing the value of CoWs.

The proposed auction mechanism allows for dynamically finding a reward based on how much value a solver provides to users by settling a batch. This results in a distribution of rewards better aligned with goals of the protocol, solvers, and users.

Motivation

The current reward scheme for the solver competition is based on risk adjusted rewards on a per-order basis as described in CIP-14.

There are several issues with the current model:

  • It is parameterized by an ad-hoc minimum reward of 37 COW per order.
  • It is oblivious to how easy it is to detect the best execution (thin vs thick competition). Auctions with multiple identical submitted solutions are rewarded the same as auctions with only one valid solution.
  • It does not (as much as it should) reward innovation and taking advantage of CoWs and structurally better solutions (e.g., detecting CoWs vs execution with touching the same pool multiple times).
  • It is oblivious to the volume of an order and the particular pairs traded.

Ultimately, the current model is quite static, does not directly take protocol viability into consideration, and is not adaptive enough to the ever-changing competition landscape.

The proposed reward mechanism results in an improvement of all those issues. It also requires some changes in the behavior of solvers. We will discuss those after the specification of the mechanism.

Specification

Per-auction rewards are computed using a second-price auction mechanism for the quality of solutions:

  1. Each solver commits to a score and a solution.
  2. The winner is the solver with the highest score.
  3. The winner gets to settle the auction on chain, and is payed
    payment = cap( observedQuality - referenceScore ).

Here, the observed quality is observedQuality = observedSurplus + observedFee, as executed on chain. The reference score is the second highest submitted score. The uncapped payment of observedQuality - referenceScore is capped from above and below using the function cap(x) = max(-c, min(c + observedCost, x)) with a cap c of 0.01 ETH.

Submitted scores have to be positive. We note that the (protocol-submitted) empty solution, having a zero score, will always be considered in the ranking. This implies that an auction with only one submitted solution (from a solver) can still get rewarded (the referenceScore in this case is zero).

The observed quality is zero if the settlement is not successfully submitted to the blockchain. This can be, e.g., due to a revert of the transaction or because the transaction was not successfully settled in time (current limit is 2 minutes).

The payment described above can be decomposed into a reward and a gas reimbursement. In the case of a successful settlement, the gas reimbursement is equal to observedCost of the settlement, while the reward is payment - observedCost which is equal to min(observedQuality - observedCost - referenceScore, c). For a failed transaction, the gas reimbursement is zero and the reward is negative and equal to max(-c, -referenceScore), in which case the winning solver would have to pay the reward to the protocol. As before, gas reimbursements are payed in ETH while rewards are paid in COW, using an up to date COW price.

Example

Suppose we have 2 solutions:

  • solution 1 has quality $20, solver reports score $10
  • solution 2 has quality $25, solver reports score $5

Solution 1 wins based on the higher score. If they successfully submit the solution on chain they get payed $20 - $5 = $15. If the transaction fails they get payed $0 - $5 = -$5, i.e., they pay the protocol $5.

Suppose we have a third solution:

  • solution 3 has quality $15, solver reports score $9

Solution 1 still wins but the payment is $11 in case of a successful submission on chain and -$9 in case of a revert.

The protocol commits to spend 20_000_000 COW tokens per year i.e. around 383307 COW per week on rewards. If at the end of an accounting period (currently: a week) the total amount of performance based rewards to be paid to solvers is smaller than the amount committed to by the protocol, the remaining sum is distributed to solver teams which consistently provide solutions. Those rewards will be distributed pro-rate to solvers based on the number of auctions they submitted a valid solution for over the accounting period.

The capping of performance based rewards is chosen such that the total rewards for all solvers amount to around 15_000_000 COW tokens per year (see section on experiments). This is comparable to the rewards payed to solvers in Nov-Jan. The remaining COW tokens of the total allocation of 20_000_000 COW are rewarded based on consistency. The cap of 0.01ETH will be adjusted one time after 4 weeks of using the new reward scheme. If the amount of rewards payed in that period extrapolates to significantly more than 15_000_000 COW per year, the cap will be decreased. If the amount of rewards payed in that period extrapolates to significantly less than 15_000_000 COW per year, the cap will be increased.

We will monitor the additional constraint score < observedQuality for submitted scores and add this rule to the social consensus rules.

Implications for solvers

With the new reward mechanism solver teams would have to submit not only their solution but also a score. The score can be interpreted as expected quality - expected costs. A profit maximizing solver would maximize their score, which in turn would mean that they maximize expected quality - expected costs. The second-price mechanism means that solvers would truthfully report this expectation.

Example

Suppose a solver found a solution which in the case of a successful settlement generates a quality of $20 (e.g., by a surplus of 11$, fees of $9) for costs of $8 (e.g., by transaction costs of $9 and estimated positive slippage of $1). They estimate the revert probability to be 5%. Then they should optimally submit a score of (0.95 * $20 + 0.05 * $0) - (0.95 * $8 + 0.05 * $9) = $10.95.

The change of the reward mechanism will thus require solvers to adapt their strategy a bit: Instead of just maximizing quality - costs they now explicitly have to take into account revert risks. This moves complexity to solvers, as proper risk management is going to be required to be profitable.

The new reward scheme does not by itself enforce maximizing surplus of users. This is because in the score computation a smaller surplus of a solution can be compensated by a smaller cost e.g. due to positive slippage. We want to avoid the situation of solvers decreasing surplus and will therefore monitor and enforce EBBO (Ethereum’s Best Bid and Offer) rules more strictly, see the social consensus rules in CIP-11.

Transitioning period

Solvers have to learn to submit scores, e.g., by having a model for revert risk and costs. Therefore we propose a transitioning period for the new reward mechanism along the following lines:

  1. Two weeks for a dry run of the new scheme where solvers can optionally submit either a score or a scoreDicount (at most one of the two values). All solutions are simulated and only successfully simulated solutions are considered. If a scoreDiscount is submitted, the driver computes a score as score = simulatedSurplus + simulatedFees - simulatedCost - scoreDiscount. Solutions are still ranked and rewarded by the old scheme. Solvers can see how they would have been ranked and rewarded using the new scheme.
    This period will start on 28.2.2023 and with weekly payments computed on 7.3.2023 and 14.3.2023.
  2. Four weeks of using the new scheme in production where submitting either a score or a scoreDiscount (exactly one of the two values) is mandatory. The ranking of solutions and rewards are computed using the new scheme. This period is used to observe the competition and the effect of capping. If rewards are significantly smaller than expected the cap is increased. If rewards are significantly larger than expected the cap is reduced.
    This period starts on 14.3.23 with the first payment to solvers taking place on the 21.3.2023.
  3. After that period, the submission of a score becomes mandatory. This period starts on the 11.04.2023 with the first payment being on the 18.04.2023.

Experiments on historical data

The new reward scheme has been tested on historical auction data from October to January. Using the proposed scheme, the capping of rewards was chosen such that rewards are close to 41068 COW per day, corresponding to 15_000_000 COW per year, with a fixed exchange rate of 0.00005 ETH/COW. For October-January:

Figure

In the time period November-January the proposed scheme results in the following rewards for the individual solvers (blue: scheme old, yellow: new scheme).

Figure

The script and data for the experiments can be found in the Appendix. The script also contains additional information on the distribution of rewards, profits, and payments per auction and per solver.


UPDATE January 30 We included experiments and specified the capping.


UPDATE February 7

  • Information on the total budget for rewards was added and how they impact the choice of the capping parameter and consistency based payments.
  • A timeline was added with information on a transitioning period to the new reward scheme.
  • To avoid confusion, the naming of rewards changed slightly. The proposal now distinguished between payments and rewards. The naming is now in line with the current reward scheme.

UPDATE February 13 Slight change in the wording to make it compatible with future changes, e.g., driver/solver co-location.


UPDATE February 16 CIP is moved to voting phase. Link to snapshot.

Appendix

The script and data for testing the new scheme on historical data can be found here:

A lot of parties contributed to the ideas and formulation of this CIP-Draft. Some of the sources are:

  • A report on the proposed auction model with a discussion of alternatives and some rigorous analysis of optimal behavior of solvers:
  • The slides of the workshop on solver rewards restructuring:
  • A forum post on a related auction model by Thomas Bosman:
  • A report on a related auction model by Max Holloway (Xenophon Labs):
6 Likes

Great write-up!

Just to emphasize the core idea behind the hybrid reward scheme (second price auction + “universal basic income” rewards) was to alleviate the worry the protocol could be trying to squeeze solver rewards to 0 as well as giving solvers an incentive to

  1. find new opportunities that create meaningful surplus improvements (and get handsome rewards from the score improvement)
  2. at the same time close gaps to other solvers that are making a lot of money on exotic strategies and thus increase the part of the budget that is more evenly distributed (“universal basic income” of solvers). This should also make the protocol less reliant on single solver strategies.

I would suggest we implement a default scoring function (multiplying the surplus from the simulation with the revert risk obtained by linear regression) and use this in case solvers don’t provide a custom score. This would allow them to “ease in” to the new scheme.

Maybe this is too naive, but a starting point could be to pro-rate on the number of auctions each solver submitted a solution which passed simulation and respected EBBO over the accounting period.

score > observedQuality could happen in the case of negative slippage, right? Would we charge a solver twice in this case (once for the slippage and once for potentially delivering a quality “below” the score of the second best solution)?

I believe these links are broken atm.

1 Like

I think we should also consider specific situations where solvers attempt to submit a settlement but the transaction takes a long time to mine. I propose we also set a block limit for the settlement to appear on chain. I would even suggest enforcing “tagging” solutions (by appending some unique bytes to the calldata) so that we have an on chain record of what solution was meant for which auction (in order to avoid ambiguities of the same solver winning back-to-back auctions for example).

CIP-14 was already an attempt to adjust existing solver rewards to account for other costs (such as revert risks). I think if we don’t incorporate more dynamic reward scheme that allows for these risks to be priced in, then you might run into other issues (like solvers not being incentivized to participate in particular auctions because the additional risks are higher than the possible reward).

Additionally, having solvers propose scores themselves is a more decentralized approach and a step in the right direction IMO. Just from a technical point of view, having a centralized component responsible for simulating a solution from each solver would not scale well if the number of solvers grows significantly (I mean, it could be made to scale, but would also mean increasing the complexity and costs of running this centralized piece of infrastructure).

From an implementation point of view, I have a few more details that I think we should iron out:

  • Would this require all solvers to reveal their settlement calldata to participate in an auction, or just the winner (as the settlement should end up on chain) and second place?
  • When is the second place solution’s quality observed, at time of submission or would it be observed on the same block? Same transaction index? I think the choice of when to observe a solution’s quality may have some weird interactions with MEV (setting a higher slippage may make a solver pay more in MEV but cause competing solutions to not simulate when executed at the same block + transaction index and receive more rewards - not sure if this is a real problem, but for a particularly “juicy” order, it might make sense to play these games).
  • Is the protocol expected to verify the simulation for each proposed settlement transaction, or is the solver on the hook for saying that “yup, my transaction simulates successfully”. I think the latter scales better and, with the current rewards schemes, the solvers are already disincentivized to lie (as they would end up paying a lot if their solution can’t actually be executed on chain if they are selected as the winner). I would actually argue that the protocol should be responsible for simulating the proposed solutions when choosing a winner.

One last point for clarity, I assume that the rewards are payed in COW. With that in mind, what is used as a price oracle for computing the observed quality? Is it just the external prices that are already included in the auction and passed to the solvers (and currently used for objective value scoring)?

2 Likes

We are thinking of a sealed-bid setting where everything is revealed after the auction has ended. So, solvers would need to report their score, and also their solution json as they are doing right now, i.e., submit a json such that the calldata can be recovered. This latter step could be skipped, but I believe it’s better to still have it, in order to do some sanity checks (e.g., be able to check the orders each solution is settling, the clearing prices etc.

I agree there is a version of the proposed model where solvers only need to report a single number (their score), or even do multiple bids, and openly, and only the winning solver needs to provide calldata, after confirming that it is the winner. But this can further increase complexity on the solvers’ side, and it would really be a first-price auction in that case, so I am not in favor of moving towards this, at least for now.

The point of the score is that it is determined at the moment solvers report it. So there is no need to check anything about the 2nd solution. The payout scheme simply says that the payout is equal to the observed quality minus the reference score. And the reference score can be chosen as either the score the winner reported (first-price variant) or the score the second best solution reported (second-price variant, which is proposed here). Again, these are “static” numbers, and represent in a way what solvers promised they would deliver; and their reward or penalty will be based on the difference between their promise and what actually happened.

I addressed this above.

The model itself computes rewards in ETH. It is still open to discussion how the protocol would like to do the payments; one could split between ETH and COW, or only do gas reimbursements in COW and the remaining reward (if any) in COW, or any convex combination of the two. Not sure what’s the simplest way to reach consensus about this one.

1 Like

Just a couple of thoughts.

  • On the consensus rule score < observedQuality. If the protocol doesn’t simulate the settlements, then it will only be enforceable in retrospective. Ideally, solutions that do not satisfy the constraint would be discarted by the auctioneer. However, I think that the possibility of being punished in restrospective is enough incentive for a solver not to break the rule even if the solution will not be discarted.

  • On the quality of a solution observedQuality = ObservedSurplus + observedFee. Does this imply that CoW protocol will stop caring about the cost of execution for a solution? As it is mentioned in the Alternatives section, currently the observed quality does take the cost of execution into account. If the protocol does not simulate the solutions, then all solvers will need to follow the same approach to simulate their solutions. Differences in the simulation would lead to differences in cost estimates. If two solvers have the same solution, then differences in cost estimation approaches (and, therefore, the score) should not dictate who will be the winner of the auction. To ensure there are no differences resulting from the estimation of execution costs, the protocol will need to enforce rules to estimate the cost and simulate solutions in retrospective to determine if the score reported by winning solvers was “fair” given the observedQuality and the cost that would have been simulated. This is added work that would not be needed if the protocol computes the score. So, even though the protocol might save on the costs from running the simulations (as mentioned by @nlordell), it will have to invest some resources to ensure solvers are adhering to social consensus rules.

For now, I’m in favor of the setup in which the protocol determines computes the scores. This will ensure the the protocol can keep enforcing social consensus rules for the time being. In the meanwhile, the team could work on the tools needed to ensure the protocol can keep acting as referee in a setup in which solvers report their scores.

If we want to incorporate other type of risks that are not quantified by the current scoring function (simulatedQuality - simulatedCosts), then we could start with a setup in which the protocol computes the score and solvers report a scoreDiscount < 0. The protocol would then compute the final score for a solver as follows: simulatedQuality - simulatedCosts - scoreDiscount. This will actually allow the protocol to see how solvers will behave in the new regime, i.e. how much they actually adjust the score to incorporate other information. This will be really interesting to observe. I’m almost certain most solvers will always submit a score discount of 0. If this is the case, then we have a case against the new setup since differences in the score would only arrise from differences in the expectedQuality and the expectedCosts. The former measure is something that should only depend on the price vector (for which we have clear social consensus rules, such as no penning) and the fees (which are static). The later measure should be fair as long as all solvers use the same simulation approach. If this is the case, we would end up with the same setting we have right now, with the only difference that solvers will now take over the cost of simulating their solutions.

TL;DR

I’m in favor of the starting with a setup in which the protocol computes the score and allows solvers to report a scoreDiscount such that the final score is computed as follows

Score = simulatedQuality - simulatedCosts - scoreDiscount

The scoreDiscount will be an indicator of what the protocol can expect once the competition moves to a setup in which solvers compute the score.

2 Likes

Right! Sorry, for some reason I interpreted referenceScore = secondHighest(observedQuality). If this is just the reported score, then you are right, and this greatly simplifies things and addresses a few of my concerns (so scores don’t require verification by the protocol, and the observed quality is what is mined on chain).

Sorry, should have clarified a bit, is it expected that the calldata appears in the “solver competition” endpoint for everyone or just the winner? So, in practical terms, do we publicly “unseal” all of the bids?

Sorry, I should have clarified a bit, but when I mean “protocol” it was more of a technical distinction than a practical one. Specifically, we are in the process of restructuring our services in order to allow solvers to run the component that was typically handled by the solver binary that is run by the CoW team (enabling solvers to do things like manage their own PKs and have more control over how transactions get sent to the blockchain). The statement was more about “where” the simulation would happen in the services we currently operate given the new services model we are moving towards, and not that existing HTTP solver implementations should be on the hook for doing the simulations themselves.

Makes sense.

2 Likes

Thanks for the suggestions, @Filius. It really helps us if solver teams provide feedback.

Some thoughts on what you wrote:

Regarding social consensus rules, we plan to check them retrospectively. But we were planning to also simulate submitted solutions, at least in the beginning, and might check such rules before ranking. For the rule score < observedQuality we were not sure if we would need to enforce it strictly, more on that rule further below.

If we ignore revert risk for the moment, the reasoning is as follows: Solvers know the quality of their solution since they know the price vector, which uniquely determines surplus, and included orders, which determines fees. If a solver also knows how much they would need to cover the cost of executing the settlement, e.g. by simulating, they would optimally submit score = expectedQuality - expectedCost = simulatedQuality - simulatedCost. If they win the auction with this score, they get payed observedQuality - referenceScore >= simulatedCosts. Since the winner is determined by score, the protocol implicitly takes costs into account. Costs would just not be handled by the protocol itself but by solvers.

We have thought about a mechanism similar to that. What this boils down to is in my opinion essentially equivalent to submitting a score and restricting which scores are valid, similarly to rules of the form score < simulatedQuality:
If we keep the payment as observedQuality - score and the protocol computes score = simulatedQuality - simulatedCost - scoreDiscount, then the optimal scoreDiscount would be such that the score the protocol computes is again equal to expectedQuality - expectedCost. More on the underlying reasoning can be found here. This results in scoreDiscount = (simulatedQuality - simulatedCost) - (expectedQuality - expectedCost) and we have the same mechanism as with submitting scores directly. Any restriction of the form scoreDiscount > 0 (did you have a typo there?) can be expressed as a restriction on the score itself. In your case it would be equivalent to score < simulatedQuality - simulatedCost. This would be an alternative to the rule we suggested in the original post.
We included the rule score < observedQuality mostly to prevent shifting surplus to negative costs. But we were not sure if we should make it a strict rule or if it should hold only on average. What was your reasoning for including this restriction?

Could you explain your reasoning for that?

2 Likes

Very excited to see this and looking forward to the results of the experiments. I think this is a big step forward and would like to see this implemented ASAP.

However, it does seem like the “margin” between the winner and second place objectives has expanded substantially recently so there are some interesting decisions to be made around that. One idea is to fund part of the rewards from the traders themselves, perhaps past some flat amount.

2 Likes

Something else to consider is that it seems a decent chunk of the total margin comes from single solution auctions. In these cases, it’s not clear that if I produce a solution with 0.5 ETH objective that I have actually provided that much value. The alternative may have been that it gets solved in the next auction (potentially with a higher objective). So maybe it would be appropriate to have a lower rewards cap for these solutions.

2 Likes

Something else to consider is that it seems a decent chunk of the total margin comes from single solution auctions. In these cases, it’s not clear that if I produce a solution with 0.5 ETH objective that I have actually provided that much value. The alternative may have been that it gets solved in the next auction (potentially with a higher objective). So maybe it would be appropriate to have a lower rewards cap for these solutions.

Very good point. Single solution auctions are tricky for this model, as the model is based on competition and the assumption that the protocol indeed has to make a choice among a few candidate solutions. In the case of a single solution (corresponding to a single seller), there is a monopoly in a way.

We would very much like to hear your suggestions about how to deal with this case.

  1. One way to go, as you suggested, is to have a separate cap that is only applied in such cases.
  2. Revert to the current scheme, in these cases. Not sure if we want to go that way, but this would mean doing the auction-based rewards, whenever there is competition, and reverting to the current model of precomputed rewards when there is only one candidate solution.

Note that single solution auctions are not necessarily that rare, especially since there might be niche solvers that are able to trade exotic tokens that no other solver can trade. In such case, it is certain that such auctions will show up from time to time. One could, of course, argue that paying very large rewards in such cases would incentivize other solvers to upgrade themselves so that they are also able to trade such tokens; in other words, any such opportunities for huge profits would not be there for long due to competition. However, I still fill that there caps are needed, as I don’t see any scenario where the protocol should pay 1 ETH, for example, for the execution of an order that can only be executed by one solver.

As a last comment, capping in cases where a solver can predict they will be the only ones providing a feasible solution can also create games; if I know I will be the only participant in the auction and I will get paid $30 max, while I can generate surplus of $200, then I can easily report a score of epsilon, essentially matching the order at limit price, and then pocket the rest as slippage. For this reason, we need to signal that social consensus rules are there for a reason, and exploits of this form should be penalized drastically.

3 Likes

I think this is also where the 2-tiered rewards scheme (2nd price auction + a more egalitarian scheme based on participation rate) could kick in.

If there is only very little or no reward paid out based on solver participation (because the 2nd price auction is exhausting the pre-committed budget), solvers would have an incentive to reduce the gap on token pairs where there is a solver monopoly to increase that share of the total rewards.

2 Likes

Yeah I think reverting to the old rewards is a bit extreme. I would favour setting a separate cap as a compromise between the two models.

1 Like

Alternatively of course we could apply some other function to the raw rewards besides min(x, cap). E.g. f(x) = min(ax, cap) where 0 < a < 1.

It’s not obvious that there is a right choice here but unless we’ve identified a serious problem with solvers not solving solvable orders I don’t think the proportion of rewards given to single solution auctions should be substantial at all. Maybe an intuitive requirement should be that the average reward for a single solution auction should be no higher than the average reward for the rest in backtesting.

1 Like

I just updated the proposal with experiments on historical data. In short, the results are comparable to what was presented in the workshop slides. Please have a look at the script and data and play around with it.

2 Likes

The simple capping strategy I used for the experiments does not satisfy this. Since auctions with only a single solution are not treated in a special way, average rewards turn out to be between 1.5x and 2x as large, see the notebook with experiments. A different strategy might help here. Though I am not sure how important it is since auctions with only a single valid solution account only for about 10% of rewards.

Feel free to have a look at the notebook and try out other capping strategies.

1 Like

Interesting the caps are much lower than I expected. I should note that the graph shows that increasing the cap does not increase rewards by much but will probably lead to many less batches getting capped.

If it’s easy, it would be cool to see this done for January as well!

As for single solution auctions, I think it matters less in your experiments because the cap is so low compared to what I was looking at. Will think more on whether it merits special treatment imo.

1 Like

Another concern is that the protocol is paying less than committed to so “UBI” kicks in but it’s mostly due to the cap rather than an actual lack of surplus produced. I believe this is an undesirable outcome with such a low cap. In such a case, the cap should be increased first.

1 Like

We indeed would like to choose the cap as large as possible, ideally none at all. At the moment this does not seem feasible since it would lead to severe overspending from the side of the protocol. The first few weeks under the proposed scheme would be used to adapt the cap to the actual payments we do. Using consistency based payments is a way to not screw over solvers in the beginning when we take a cautious approach for the cap.

We are still debating how to best proceed with choosing the capping strategy. A fixed cap has similar disadvantages as our current model in that it does not adapt to market conditions. Capping based on transaction costs or fees is also possible. If you have suggestions here, we will try to test them.

Long term this looks like the right thing to do. It goes in the direction of paying solvers from adaptive fees taken directly from surplus. For now we would like to hold off from implementing such a change since it might impact the user experience a lot.

1 Like

Yeah I guess I just mean that UBI would go into effect if the week’s rewards are not high enough. In such a case, the cap should be increased instead. If the cap hits some other cap like 0.5 ETH and there’s still rewards left over, then we could use UBI.

If the cap is to prevent overspending and UBI is to prevent underspending, we should not impose the first to create a need for the second.

2 Likes

I’m going to raise this point again as I think it is important and didn’t see a reply to it (sorry if there was one that I missed!)

I think we need to consider the failure condition of a settlement not arriving on chain in time. This should be equivalent to it reverting and have an observed value of 0. I think this should be a block deadline for the settlement to appear on chain.

As an implementation note, I would suggest enforcing “tagging” solutions (by appending some unique bytes to the calldata) so that we have an on chain record of what solution was meant for which auction (in order to avoid ambiguities of the same solver winning back-to-back auctions for example).