In this post I will outline how the bidding behavior of solvers is influenced by penalties.
The main take-away is that bids should be discounted by a term that depends on revert risk.
The topic of penalties has been raised on the forum before, for example when they were initially introduced and, more recently, in the context of the changes from CIP-74 here and here. One thing missing in these posts is a clear description of the role penalties play in our mechanism and how they impact optimal bidding.
On the one hand, this post is meant as a description of the status quo of the CoW Protocol mechanism and should inform the behavior of solvers. On the other hand, it should provide a framework for discussing potential changes to penalties going forward.
Caps for rewards and penalties
We currently use a mechanism akin to a second price auction where rewards (positive payments) and penalties ("negative payments") are capped.
The second price nature of the mechanism ideally allows solvers to provide the best possible routing to users without strategizing about other solvers' bids in the solver competition.
Caps for rewards and penalties serve two different purposes and they can be chosen independently.
In short: some form of cap on rewards is required for the economic viability of the protocol, and going from a large cap to a small cap (relative to the uncapped reward) means a transition from a second price to a first price auction. Some form of penalty is required in our mechanism to avoid overbidding that would completely stop the competition, and some cap on penalties is required for the economic viability of solvers.
We currently use a cap for rewards based on the protocol fee charged to users and a fixed cap for penalties.
To clarify the impact of penalties and capped penalties, let’s have a look at simplified models of the auction mechanism of CoW Protocol.
Simple model 1
Suppose there is one order to be settled, and there is an exogenous revert risk of (1 - p). Suppose further that a solver can generate some surplus S while covering their execution costs (e.g. they might have a routing generating a surplus of S + c and take a cut c to cover a cost c of executing the solution).
If there are no caps: The solver should report a solution providing a surplus of p * S to the user and take a cut of (1 - p) * S. This maximizes profits of the solver. The solver does not need to have knowledge of what competing solvers report in the competition.
Solvers providing solutions with the undiscounted score of S are effectively overbidding, winning in cases where they make a loss on average.
Derivation
One justification for this is to look at the expected profit g of the solver if they win, as a function of the reference score s_ref. If a solver wins with score s and the settlement is successful, they take a cut of S - s and get a reward of s - s_ref, so the total profit in this case is S - s_ref. If the solver wins but does not settle successfully, they pay a penalty of s_ref. The expected profit of winning is thus g(s_ref) = p * (S - s_ref) - (1 - p) * s_ref. This function does not depend on the score of the solution; the score is only relevant for either winning and on average receiving g(s_ref), or not winning and getting 0. The solver therefore wants to win exactly in the cases where g(s_ref) > 0 and lose in the cases where g(s_ref) < 0. If s_opt is a score with g(s_opt) = 0, the solver should bid s_opt: they then win exactly when winning is profitable on average, and lose when winning would be a loss on average. In this model g(s_opt) = 0 holds for s_opt = p * S.
Concretely, if a solver finds a routing with $10 of surplus and a success probability of 90%, they should optimally bid $9 and take a cut of $1. This maximizes the solver's profit. From the point of view of a user with surplus on the order of the default slippage tolerance of 50 bps, the price is reduced by 5 bps.
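To make the derivation concrete, here is a minimal Python sketch (the function and variable names are mine, purely for illustration): it evaluates the expected profit g(s_ref) = p * (S - s_ref) - (1 - p) * s_ref and confirms that the bid p * S from the example above is the break-even point.

```python
# Simple model 1, no caps: expected profit of winning as a function of the
# reference score s_ref. The optimal bid is the zero of this function, p * S.

def expected_profit_uncapped(s_ref: float, S: float, p: float) -> float:
    """With probability p the settlement succeeds (profit S - s_ref),
    with probability 1 - p it reverts and the full penalty s_ref is paid."""
    return p * (S - s_ref) - (1 - p) * s_ref

S, p = 10.0, 0.9              # $10 of surplus, 90% success probability
s_opt = p * S                 # optimal bid in this model: $9

print(expected_profit_uncapped(s_opt, S, p))   # ~0: break-even at s_ref = $9
print(expected_profit_uncapped(8.0, S, p))     # 1.0: winning against a lower reference score is profitable
print(expected_profit_uncapped(10.0, S, p))    # -1.0: winning with the undiscounted score loses money on average
```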
If there is only a cap c_p for penalties: The solver should provide a surplus of max(p * S, S - (1 - p) / p * c_p) and take a cut of min((1 - p) * S, (1 - p) / p * c_p). For orders with little surplus, they effectively behave as if penalties were not capped. For larger orders, they discount the score less than if there were no cap. As before, not discounting S at all can be considered overbidding. With the optimal strategy, penalties are not offset by rewards for solvers to be profitable; instead, they are offset by the additional cut solvers take.
Derivation
Following the argument from above with the modified expected profit g(s_ref) = p * (S - s_ref) - (1 - p) * min(s_ref, c_p), one can look at the two cases p * S <= c_p and p * S >= c_p, giving the scores s_opt = p * S and s_opt = S - (1 - p) / p * c_p, respectively.
Concretely, with a cap of $20, if a solver finds a routing with $10 of surplus and success probability of 90%, they should optimally bid $9 and take a cut of $1. If they find a routing with $100 of surplus and success probability of 90%, they should bid $100 - 0.1 / 0.9 * $20 ≈ $97.78 (instead of $90 (underbidding) or $100 (overbidding)). In the latter case, a user with surplus similar to the default slippage of 50 bps would have gotten a price which is around 1 bps worse.
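The capped-penalty case can be sketched the same way (again with illustrative names of my own choosing): the closed-form optimal bid max(p * S, S - (1 - p) / p * c_p) reproduces both numbers from the example.

```python
# Simple model 1, penalty capped at c_p: optimal bid and corresponding cut.

def optimal_bid_capped_penalty(S: float, p: float, c_p: float) -> float:
    """Zero of g(s_ref) = p * (S - s_ref) - (1 - p) * min(s_ref, c_p):
    p * S while the cap does not bind, otherwise S - (1 - p) / p * c_p."""
    return max(p * S, S - (1 - p) / p * c_p)

p, c_p = 0.9, 20.0

for S in (10.0, 100.0):
    bid = optimal_bid_capped_penalty(S, p, c_p)
    print(f"S = {S:6.2f}: bid {bid:6.2f}, cut {S - bid:5.2f}")
# S =  10.00: bid   9.00, cut  1.00   (cap does not bind, same as without a cap)
# S = 100.00: bid  97.78, cut  2.22   (cap binds: 100 - 0.1 / 0.9 * 20)
```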
If there is a cap c_r for rewards, optimal solver behavior becomes more complicated, as the expected profit now depends directly on the score of a solution. Solvers are then expected to take an additional cut compared to the case without a reward cap. The optimal cut requires some knowledge of what the competition does. Ideally, we would like to avoid solvers having to spend time on this type of strategizing, but we have not found a solution to that problem which does not sacrifice other properties of the auction.
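To illustrate why a reward cap breaks the second price property, here is a sketch under an explicit assumption: I take the reward paid on success to be min(s - s_ref, c_r). This particular formula is only an assumption for the illustration, not the exact protocol formula; the point is just that the expected profit now depends on the solver's own score s.

```python
# Sketch only: the capped reward min(s - s_ref, c_r) is an assumed form for
# illustration. It shows that with a reward cap the expected profit of winning
# depends on the solver's own score s, not only on the reference score.

def expected_profit_reward_cap(s: float, s_ref: float, S: float,
                               p: float, c_r: float, c_p: float) -> float:
    cut = S - s                          # kept by the solver on success
    reward = min(s - s_ref, c_r)         # capped reward (assumed form)
    penalty = min(s_ref, c_p)            # capped penalty on revert
    return p * (cut + reward) - (1 - p) * penalty

S, p, c_r, c_p, s_ref = 100.0, 0.9, 10.0, 20.0, 80.0
for s in (85.0, 90.0, 95.0):
    print(s, expected_profit_reward_cap(s, s_ref, S, p, c_r, c_p))
# 85.0 and 90.0 give the same profit (the second price property still holds below the cap);
# at 95.0 the reward is capped, so a higher score only shrinks the cut. The optimal score
# therefore depends on where the competition (s_ref) is expected to land.
```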
From the example above, it might look as if a cap of zero for penalties would work fine and result in users just getting the expected surplus of S. This is not always the case, as a slightly modified example shows.
Simple model 2
Suppose there is no exogenous revert risk, but there is a small chance of a huge price shift in favor of the user: with (small) probability q a solver can provide additional surplus T. They can choose to revert in case there is no such price shift.
Without caps: The solver should incorporate the expected additional surplus into the bid, providing the user with S + q * T.
Derivation
This follows from comparing the expected profit of winning without reverting, g(s_ref) = S - s_ref + q * T, to the profit with possible reverts, g_rev(s_ref) = q * (S - s_ref + T) - (1 - q) * s_ref. The first expression is never smaller and gives s_opt = S + q * T.
With caps for penalties: The solver can bid s_opt = max(S + q * T, S + T - (1 - q) / q * c_p).
Derivation
This follows from comparing the expected profit of winning without reverting, g(s_ref) = S - s_ref + q * T (as above), to the profit with reverts, g_rev(s_ref) = q * (S - s_ref + T) - (1 - q) * min(s_ref, c_p). From the arguments for exogenous revert risk (using the success probability q and the surplus S + T), the latter case results in a bid of max(q * (S + T), S + T - (1 - q) / q * c_p). Note that in this case a solver profits from having information about other solvers' bids. For example, if there is no competing bid, i.e. the reference score is zero, a solver would always prefer the non-reverting solution and bid, since g(0) > g_rev(0). But if the competing bid is large, risking the revert can allow for a larger profit.
If q * T is larger than c_p, the solver can thus promise the user a surplus of S + T - (1 - q) / q * c_p. The solution will then revert with the large probability (1 - q).
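The trade-off can again be sketched in a few lines of Python (names are illustrative): comparing the break-even bid of the non-reverting strategy, S + q * T, to that of the reverting strategy with capped penalty shows that risking the revert supports a higher bid exactly when q * T exceeds c_p.

```python
# Simple model 2, penalty capped at c_p: break-even bids of the two strategies.

def bid_no_revert(S: float, q: float, T: float) -> float:
    # zero of g(s_ref) = S - s_ref + q * T
    return S + q * T

def bid_revert(S: float, q: float, T: float, c_p: float) -> float:
    # reverting strategy = exogenous revert risk with success probability q
    # and surplus S + T: max(q * (S + T), S + T - (1 - q) / q * c_p)
    return max(q * (S + T), S + T - (1 - q) / q * c_p)

S, q, c_p = 10.0, 0.1, 20.0

for T in (100.0, 500.0):                 # q * T = 10 < c_p   vs   q * T = 50 > c_p
    a, b = bid_no_revert(S, q, T), bid_revert(S, q, T, c_p)
    print(f"T = {T:5.1f}: no-revert {a:6.1f}, reverting {b:6.1f}, overall bid {max(a, b):6.1f}")
# T = 100.0: no-revert   20.0, reverting   11.0, overall bid   20.0
# T = 500.0: no-revert   60.0, reverting  330.0, overall bid  330.0  (reverts with probability 0.9)
```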
Even though this is an artificial model, a similar incentive is created by arbitrage opportunities or even normal price movements with slippage tolerances chosen by solvers. With strong competition this can result in an equilibrium of “overbidding” and unsuccessful settlements.
This is not generally a blocker to reducing the cap for penalties but indicates that the effect of this cap is quite different from a cap on rewards, where setting it to zero is possible.
Remarks
With this discussion in mind, the current situation seems to be that solvers overbid regularly and try to cover resulting losses with rewards. The intended/optimal behavior in our current mechanism is to cover losses from penalties by taking a cut directly from users. This, however, has a negative impact on execution quality.
The choice of the cap for penalties within our current mechanism is still a delicate issue. Choosing it too small can result in a meaningless competition and stall the protocol. Choosing it too large can result in too large of an impact on execution quality. Given the arguments above and data on previous auctions, it could be possible to come up with a cap which is derived more from first principles.
There can also be other mechanisms with better properties with regard to generating and distributing welfare, and a discussion around that can be very helpful. Besides the usual issue of how to distribute the value generated in the mechanism between users, solvers, and the DAO, one fundamental issue is how to incorporate unsuccessful settlements. Our current approach using (capped) penalties implicitly incorporates success rates into the ranking and execution.