Report on the first two weeks of CIP-20

CIP-20 changed the reward mechanism to an auction-based mechanism. This change affects the ranking of solvers and the payment of winning solvers. As such, it can influence various performance metrics of CoW Protocol.
In this report, we describe the current state of the competition after two weeks of testing CIP-20 in production.

Total rewards and their distribution

The main impact of CIP-20 is a change to how rewards are allocated to solvers. The goal is to pay more to solvers who provide solutions which are significantly better than the solutions of other solvers. Additionally, a part of the budget for rewards is allocated to solvers providing solutions consistently.

Rewards budget and consistency rewards

The total weekly budget for rewards is 383307 COW. Our target was to spend around 75% of it on performance rewards and 25% on consistency rewards.
In the first two weeks we spent 2 * 383307 COW = 766614 COW (actually 773867 COW and -0.X ETH due to subtleties in the accounting) on rewards. Around 85% was spent on performance and 15% on consistency. This indicates that we should tune the capping of performance rewards so that slightly less is paid out as performance rewards.
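The gap between the target and the observed split can be made concrete with a few lines of arithmetic. The following is a minimal sketch that uses the nominal two-week budget (not the slightly higher amount actually paid out) and the rounded shares quoted above; the function and variable names are illustrative only.

```python
# Weekly budget from the report; shares are the rounded figures quoted above.
WEEKLY_BUDGET_COW = 383_307
WEEKS = 2


def split_budget(total_cow, performance_share):
    """Return (performance, consistency) amounts in COW for a given share."""
    performance = total_cow * performance_share
    return performance, total_cow - performance


total_cow = WEEKS * WEEKLY_BUDGET_COW
target_perf, target_cons = split_budget(total_cow, 0.75)
observed_perf, observed_cons = split_budget(total_cow, 0.85)

# COW paid as performance rewards beyond the 75% target.
excess_perf = observed_perf - target_perf

print(f"nominal budget:       {total_cow} COW")
print(f"target performance:   {target_perf:.1f} COW")
print(f"observed performance: {observed_perf:.1f} COW")
print(f"excess performance:   {excess_perf:.1f} COW")
```

Under these assumptions, roughly a tenth of the nominal budget went to performance rewards instead of consistency rewards.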

Reward by solver

Rewards are distributed unevenly across solvers.

In the first two weeks of CIP-20, Barter profited the most from the new scheme. Even though Otex and Laertes won significantly more auctions than Barter, they earned less in rewards. SeaSolver performed worst compared to other solvers, mostly due to overbidding, which cost them around 0.5 ETH (see the section on the broken social consensus rule below).

We do not expect solvers to have adapted their solution-finding strategies much to the new mechanism yet. Optimization-based solvers might have to include revert risk explicitly in their algorithms to make the most of the new mechanism.

Rewards per batch

Looking at the rewards per batch we see that most batches are rewarded relatively little (around 0.001 ETH) but in some cases the reward is significantly larger (0.01 ETH).

About 20% of payments are getting capped. This might force solvers to include the existence of a cap into their solution submission and bidding strategies. A larger cap is desirable here.

Looking at the distribution per batch for Barter and Otex we see that the average reward for Barter is a lot larger.

This explains how Barter can earn more rewards even though they win fewer auctions.

Large rewards are also paid when there is little competition. In the last two weeks, about 10% of auctions had only one competitor. We should aim to reduce this number. About 80% of auctions had at least four competitors.

Of the auctions without competition, most are won by the 1Inch solver.

Effect of capping payments

Performance-based rewards are capped at 0.01 ETH. A smaller cap would have been needed to achieve the desired split of rewards (75% performance, 25% consistency).

It is noteworthy that the total performance rewards would decrease for large caps. This suggests overbidding on the side of solvers. One problem could be missing risk management for orders with large volume. There were several cases where uncapped payments would have resulted in payments from the solver to the protocol on the order of multiple ETH. The largest uncapped payments would have been around 15 ETH and -15 ETH.
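To illustrate why a larger cap can lower total performance rewards, here is a minimal sketch. It assumes the cap is applied symmetrically, i.e. payments are clamped to the range from -cap to +cap; the exact capping rule in CIP-20 may differ in detail. The payments in the example are hypothetical and not taken from the competition data.

```python
CAP_ETH = 0.01  # cap quoted in this report


def capped_payment(uncapped_eth, cap_eth=CAP_ETH):
    """Clamp a payment to [-cap_eth, +cap_eth] (symmetric cap is an assumption)."""
    return max(-cap_eth, min(cap_eth, uncapped_eth))


def total_rewards(uncapped_payments, cap_eth):
    """Sum of payments after applying the cap to each one."""
    return sum(capped_payment(p, cap_eth) for p in uncapped_payments)


# Hypothetical uncapped payments in ETH, including one large negative payment
# as it might result from overbidding on a large order.
payments = [0.005, 0.03, -0.5]

small_cap_total = total_rewards(payments, 0.01)
large_cap_total = total_rewards(payments, 1.0)

print(f"total with cap 0.01 ETH: {small_cap_total:.3f} ETH")
print(f"total with cap 1.00 ETH: {large_cap_total:.3f} ETH")
```

With a small cap, the large negative payment is truncated just like the large positive ones. With a large cap, the negative payment dominates the total, which matches the observation that overbidding drives total rewards down as the cap grows.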

A change of the cap would affect different solvers differently. For example, PLM would generally profit from a larger cap, while Otex would not.

Bidding behavior

To participate in the auction solvers have to submit a score which is used for ranking by the protocol. It is up to solvers to compute a score that makes them earn rewards.
Without capping, the score should be the expected quality minus expected cost. With capping this strategy changes a bit.
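One way to read "expected quality minus expected cost" is sketched below. The decomposition into a success probability and a revert cost is an assumption made for illustration, as are all names; solvers are free to compute their scores differently, and the sketch ignores the effect of capping.

```python
def expected_score(quality_eth, success_probability, revert_cost_eth):
    """Score as expected quality minus expected cost (illustrative model).

    quality_eth:          quality of the solution if it settles successfully
    success_probability:  estimated probability that the settlement succeeds
    revert_cost_eth:      cost borne by the solver if the settlement reverts
    """
    if not 0.0 <= success_probability <= 1.0:
        raise ValueError("success_probability must be between 0 and 1")
    expected_quality = success_probability * quality_eth
    expected_cost = (1.0 - success_probability) * revert_cost_eth
    return expected_quality - expected_cost


# Hypothetical numbers: a higher revert risk lowers the score of the same solution.
safe_score = expected_score(1.0, 0.95, 0.1)
risky_score = expected_score(1.0, 0.90, 0.1)
print(f"safe: {safe_score:.3f} ETH, risky: {risky_score:.3f} ETH")
```

In this model, a solution with higher estimated revert risk receives a lower score even if its quality on success is unchanged.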

Score submission strategies

The protocol currently supports two ways of specifying the score of a solution:

  • by submitting a score directly and
  • by submitting a scoreDiscount which results in a score given by the old objective minus this discount.
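The two submission modes can be sketched as a single function. This is an illustration only: the function name and parameters are hypothetical, and the handling of submissions that specify both or neither value is an assumption not covered in this report.

```python
def effective_score(objective, score=None, score_discount=None):
    """Return the score used for ranking.

    Either `score` is given directly, or `score_discount` is subtracted from
    the old objective. Rejecting ambiguous input is an assumption.
    """
    if (score is None) == (score_discount is None):
        raise ValueError("specify exactly one of score or score_discount")
    if score is not None:
        return score
    return objective - score_discount


# Hypothetical values in ETH.
direct = effective_score(objective=1.0, score=0.8)
discounted = effective_score(objective=1.0, score_discount=0.25)
print(direct, discounted)
```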

At the moment most solvers submit a score directly. There are notable exceptions though. For example, Barter used the default scoreDiscount.

Overbidding and underbidding

Over- and underbidding in the auction can both cause problems. Overbidding can lead to users getting worse prices. Underbidding can be a sign of collusion.

We do not have a good metric to determine overbidding and underbidding of solvers.

One indication for overbidding is the number of auctions where the winner earned a negative reward even though they settled successfully on chain (~11%). This has happened often for Laertes (~15%), 1Inch (~15%), and SeaSolver (~40%).
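The indicator described above can be computed as the share of successfully settled auctions in which the winner earned a negative reward. The sketch below uses a hypothetical record type and made-up sample data; field names do not correspond to any actual data source.

```python
from dataclasses import dataclass


@dataclass
class AuctionResult:
    winner: str        # name of the winning solver
    reward_eth: float  # reward paid to the winner (negative means a penalty)
    settled: bool      # True if the settlement succeeded on chain


def negative_reward_share(results, solver=None):
    """Share of settled auctions with a negative reward, optionally per solver."""
    settled = [
        r for r in results
        if r.settled and (solver is None or r.winner == solver)
    ]
    if not settled:
        return 0.0
    return sum(r.reward_eth < 0 for r in settled) / len(settled)


# Hypothetical sample data, not taken from the competition.
sample = [
    AuctionResult("solverA", 0.002, True),
    AuctionResult("solverA", -0.001, True),
    AuctionResult("solverB", 0.004, True),
    AuctionResult("solverB", -0.003, False),  # reverted, so not counted
]

print(negative_reward_share(sample))
print(negative_reward_share(sample, solver="solverA"))
```

Reverted settlements are excluded because a negative reward there reflects a failed execution rather than overbidding.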

Broken social consensus rule

Overbidding is only prohibited in cases where a solver is willing to pay the protocol for settling an auction, i.e., when score >= quality. This was initially only communicated as a social consensus rule and is now implemented as a strict rule by filtering out such solutions before ranking.
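The strict rule can be sketched as a filter that runs before ranking. The record type and names below are hypothetical, and ranking by highest score first is an assumption about how the protocol orders solutions.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    solver: str
    score: float    # score submitted by the solver
    quality: float  # quality of the solution as computed by the protocol


def rank_valid_solutions(solutions):
    """Drop solutions with score >= quality, then rank by score, highest first."""
    valid = [s for s in solutions if s.score < s.quality]
    return sorted(valid, key=lambda s: s.score, reverse=True)


# Hypothetical bids in ETH.
bids = [
    Solution("honest", score=0.9, quality=1.0),
    Solution("overbidder", score=1.2, quality=1.0),  # filtered out
    Solution("cautious", score=0.7, quality=1.1),
]

ranking = rank_valid_solutions(bids)
print([s.solver for s in ranking])
```

Without the filter, the overbidding solution would rank first despite not offering the best quality.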

During the first two weeks of CIP-20 some solvers unintentionally broke this rule due to bugs in their score computation. This led to situations where solvers won auctions they should not have won and provided bad prices to users, effectively breaking the social consensus rule on EBBO (Ethereum’s best bid and offer). This is not prevented by the auction mechanism directly. It is however not in the economic interest of a solver to do this. This can be observed in the case of SeaSolver.

SeaSolver started to submit scores (instead of scoreDiscounts) at the beginning of the second week. Their submitted scores were larger than allowed, though, and they won many auctions where the payment from the protocol did not cover their execution costs. This can nicely be observed in the cumulative rewards they earned:

The overbidding starts at the kink around block 16925000, resulting in negative rewards. After a hotfix for filtering out solutions with score >= quality was applied (and SeaSolver started to submit scoreDiscounts again) around block 16940000, they started to earn rewards again. In total, their overbidding cost them around 0.5 ETH in negative rewards (plus missing out on positive rewards).

On the one hand this is a good sign for the mechanism since it shows that overbidding is not profitable.
On the other hand it also highlights a problem with the mechanism: Users were not protected from getting suboptimal prices. This is due to the fact that although SeaSolver was not computing best possible executions, it would still win quite a few auctions due to overbidding. Again, the existence of robust competition (i.e., other solutions that were better) ensured that the resulting payout for such auctions was negative (i.e., a penalty was imposed). But users did not get the best possible executions. This showcased the need for other means of user protection, e.g., using a so-called EBBO (Ethereum’s Best Bid and Offer) test.

Impact on settlements

We did not expect that the new reward scheme would have a large effect on the quality of solutions in the first two weeks.

One effect that is immediately visible, however, is the reduction in economically nonviable stable-to-stable trades. This is due to CIP-20 requiring that non-positive “objective values” are filtered out. Note that this could have been implemented even before CIP-20. Also note that this hasn’t fully eliminated such trades, but has drastically reduced them.
In the figure below, we have a heuristic estimation for daily losses due to stable-stable arbitrage trades being executed by various solvers. One can see the drastic improvement towards the right, once CIP-20 was activated.

Another interesting effect is that Gnosis solvers are not oblivious to revert risk anymore. This can be seen in this settlement. The winning 1inch solver had also proposed another solution that batched the huge order together with a smaller one. But due to increased gas usage, the estimated revert risk increased, resulting in a smaller score being submitted for that solution. Thus the solver executed only the large order to reduce revert risk.

Open questions

  • Are there other metrics we should track to assess the success of the new mechanism?
  • How should the capping of payments be changed to achieve the desired 75-25 split of the reward budget?
  • How can more subtle over- and underbidding be detected?