How penalties affect bidding of solvers

In this post I will outline how the bidding behavior of solvers is influenced by penalties.

The main take-away is that bids should be discounted by a term that depends on revert risk.

The topic of penalties has been raised on the forum before, for example when they were initially introduced and, more recently, in the context of the changes from CIP-74 here and here. One thing missing from these posts is a clear description of the role penalties play in our mechanism and how they impact optimal bidding.

On the one hand, this post is meant as a description of the status quo of the mechanism of CoW Protocol and should inform the behavior of solvers. On the other hand, it should provide a framework for discussing potential changes to penalties going forward.

Caps for rewards and penalties

We currently use a mechanism akin to a second price auction where rewards (positive payments) and penalties ("negative payments") are capped.

The second price nature of the mechanism ideally allows solvers to provide the best possible routing to users without strategizing about other solvers' bids in the solver competition.

Caps for rewards and penalties serve two different purposes, and they can be chosen independently.

In short: some form of cap on rewards is required for the economic viability of the protocol, and going from a large cap to a small cap (relative to the uncapped reward) means transitioning from a second price to a first price auction. Some form of penalty is required in our mechanism to avoid overbidding that would completely stall the competition; some cap on penalties is required for the economic viability of solvers.

We currently use a cap for rewards based on the protocol fee charged to users and a fixed cap for penalties.

To clarify the impact of penalties and capped penalties, let’s have a look at simplified models of the auction mechanism of CoW Protocol.

Simple model 1

Suppose there is one order to be settled, and there is an exogenous revert risk of (1 - p). Suppose further that a solver can generate some surplus S while covering their execution costs (e.g. they might have a routing generating a surplus of S + c and take a cut of c to cover the cost c of executing the solution).

If there are no caps: The solver should report a solution providing a surplus of p * S to the user and take a cut of (1 - p) * S. This maximizes profits of the solver. The solver does not need to have knowledge of what competing solvers report in the competition.

Solvers providing solutions with the undiscounted score of S are effectively overbidding, winning in cases where they make a loss on average.

Derivation

One justification is to look at the expected profit g of the solver, conditional on winning, as a function of the reference score. If a solver wins with score s and the settlement is successful, they take a cut of S - s and receive a reward of s - s_ref, for a total profit of S - s_ref. If the solver wins but does not settle successfully, they pay a penalty of s_ref. The expected profit of winning is thus g(s_ref) = p * (S - s_ref) - (1 - p) * s_ref. This expression does not depend on the score of the solution; the score is only relevant for either winning and receiving g(s_ref) on average, or not winning and getting 0. The solver therefore wants to win exactly in the cases where g(s_ref) > 0 and to lose in the cases where g(s_ref) < 0. If s_opt is a score with g(s_opt) = 0, the solver should bid s_opt: they then win exactly when winning is profitable on average, and lose when winning would mean a loss on average. In this model g(s_opt) = 0 holds for s_opt = p * S.

Concretely, if a solver finds a routing with $10 of surplus and a success probability of 90%, they should optimally bid $9 and take a cut of $1. This maximizes the solver's profit. From the point of view of a user with surplus on the order of the default slippage tolerance of 50 bps, the price is reduced by 5 bps.
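
To make this concrete, here is a minimal Python sketch of this model (the function names are mine, not part of any protocol code):

```python
# Simple model 1, no caps: expected profit of winning, as a function of the
# reference score s_ref, for a routing with surplus S and success probability p.
def expected_profit(s_ref: float, S: float, p: float) -> float:
    # Win and settle: cut of (S - s) plus reward (s - s_ref), i.e. S - s_ref in total.
    # Win and revert: pay the penalty s_ref.
    return p * (S - s_ref) - (1 - p) * s_ref  # simplifies to p * S - s_ref


def optimal_bid(S: float, p: float) -> float:
    # Break-even score: expected_profit(s_opt, S, p) == 0.
    return p * S


S, p = 10.0, 0.9
print(optimal_bid(S, p))                         # 9.0: bid $9, take a cut of $1
print(expected_profit(optimal_bid(S, p), S, p))  # 0.0: break-even at the optimal bid
```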

If there is only a cap c_p for penalties: The solver should provide a surplus of max(p * S, S - (1 - p) / p * c_p) and take a cut of min((1 - p) * S, (1 - p) / p * c_p). For orders with little surplus, they effectively behave as if penalties were uncapped. For larger orders, they discount the score less than they would without a cap. As before, not discounting S at all can be considered overbidding. With the optimal strategy, solvers do not rely on rewards to offset penalties; instead, penalties are offset by the additional cut solvers take.

Derivation

Following the argument from above with the modified expected profit g(s_ref) = p * (S - s_ref) - (1 - p) * min(s_ref, c_p), one can consider the two cases p * S <= c_p and p * S >= c_p, which give the scores s_opt = p * S and s_opt = S - (1 - p) / p * c_p, respectively.

Concretely, with a cap of $20: if a solver finds a routing with $10 of surplus and a success probability of 90%, they should optimally bid $9 and take a cut of $1. If they find a routing with $100 of surplus and a success probability of 90%, they should bid $100 - 0.1 / 0.9 * $20 ≈ $97.78 (instead of $90, which would be underbidding, or $100, which would be overbidding). In the latter case, a user with surplus similar to the default slippage tolerance of 50 bps would get a price which is around 1 bps worse.
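
A sketch of the capped case, extending the functions above (again, the names are mine):

```python
# Simple model 1 with a penalty cap c_p: the penalty is min(s_ref, c_p).
def expected_profit_capped(s_ref: float, S: float, p: float, c_p: float) -> float:
    return p * (S - s_ref) - (1 - p) * min(s_ref, c_p)


def optimal_bid_capped(S: float, p: float, c_p: float) -> float:
    # For small orders (p * S <= c_p) the cap never binds and the bid is p * S;
    # for large orders the solver bids S - (1 - p) / p * c_p.
    return max(p * S, S - (1 - p) / p * c_p)


p, c_p = 0.9, 20.0
print(optimal_bid_capped(10.0, p, c_p))   # 9.0: the cap does not bind
print(optimal_bid_capped(100.0, p, c_p))  # 97.77...: the cap binds
```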

If there is a cap c_r for rewards, optimal solver behavior becomes more complicated, as the expected profit now depends directly on the score of a solution. Solvers are then expected to take an additional cut compared to the case without a reward cap, and the optimal cut requires some knowledge of what the competition does. Ideally we would like to avoid solvers having to spend time on this type of strategizing, but we have not found a solution to that problem which does not sacrifice other properties of the auction.
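
To illustrate the dependence on the score, here is a small sketch, assuming the reward is min(s - s_ref, c_r) and penalties are uncapped (a simplification for illustration, not the exact protocol formula):

```python
# With a reward cap c_r, the reward is min(s - s_ref, c_r), so the expected
# profit of winning depends on the solver's own score s, not only on s_ref.
def expected_profit_reward_cap(s: float, s_ref: float, S: float, p: float, c_r: float) -> float:
    return p * ((S - s) + min(s - s_ref, c_r)) - (1 - p) * s_ref


S, p, c_r, s_ref = 100.0, 0.9, 5.0, 80.0
for s in (82.0, 85.0, 90.0, 95.0):
    # Profit is flat up to s = s_ref + c_r and decreases beyond it, so the
    # solver wants to bid just high enough to win: first-price-like behavior.
    print(s, expected_profit_reward_cap(s, s_ref, S, p, c_r))
```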

From the example above, it might look as if a cap of zero for penalties would work fine and result in users just getting the expected surplus of S. This is not always the case, as a slightly modified example shows.

Simple model 2

Suppose there is no exogenous revert risk but a small chance of a huge price shift in favor of the user: with (small) probability q, a solver can provide additional surplus T. The solver can choose to revert in case there is no such price shift.

Without caps: The solver should incorporate the expected additional surplus into the bid, providing the user with S + q * T.

Derivation

This follows from comparing the expected profit of winning without reverting, g(s_ref) = S - s_ref + q * T, to the profit with possible reverts, g_rev(s_ref) = q * (S - s_ref + T) - (1 - q) * s_ref. Their difference is g(s_ref) - g_rev(s_ref) = (1 - q) * S >= 0, so the first expression is never smaller, and it gives s_opt = S + q * T.

With caps for penalties: The solver can bid s_opt = max(S + q * T, S + T - (1 - q) / q * c_p).

Derivation

This follows from comparing the expected profit of winning without reverting, g(s_ref) = S - s_ref + q * T (as above), to the profit with reverting, g_rev(s_ref) = q * (S - s_ref + T) - (1 - q) * min(s_ref, c_p). By the arguments for exogenous revert risk (with success probability q and surplus S + T), the reverting strategy has a break-even score of max(q * (S + T), S + T - (1 - q) / q * c_p). Note that in this case a solver profits from having information about other solvers' bids. For example, if there is no competing bid, i.e. the reference score is zero, a solver always prefers the non-reverting solution and bid, as g(0) > g_rev(0). But if the competing bid is large, risking the revert can allow for a larger profit.

If q * T is larger than c_p, the solver can thus promise the user a surplus of S + T - (1 - q) / q * c_p. The solution will revert with large probability of (1 - q).
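
A sketch comparing the break-even scores of the two strategies (the numerical values are only for illustration):

```python
# Simple model 2 with a penalty cap c_p: compare the break-even scores of the
# non-reverting strategy and of the strategy that reverts whenever the price
# shift (probability q, additional surplus T) does not materialize.
def break_even_no_revert(S: float, q: float, T: float) -> float:
    return S + q * T


def break_even_revert(S: float, q: float, T: float, c_p: float) -> float:
    # Model 1 with success probability q, surplus S + T and penalty cap c_p.
    return max(q * (S + T), S + T - (1 - q) / q * c_p)


S, q, T, c_p = 10.0, 0.1, 500.0, 20.0  # here q * T = 50 > c_p = 20
print(break_even_no_revert(S, q, T))   # 60.0
print(break_even_revert(S, q, T, c_p)) # 330.0: reverting supports a much higher bid
```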

Even though this is an artificial model, a similar incentive is created by arbitrage opportunities, or even by normal price movements in combination with the slippage tolerances chosen by solvers. With strong competition, this can result in an equilibrium of “overbidding” and unsuccessful settlements.

This is not generally a blocker to reducing the cap for penalties, but it indicates that the effect of this cap is quite different from that of a cap on rewards, where even setting it to zero is possible.

Remarks

With this discussion in mind, the current situation seems to be that solvers regularly overbid and try to cover the resulting losses with rewards. The intended/optimal behavior in our current mechanism is instead to cover losses from penalties by taking a cut directly from users. This, however, has a negative impact on execution quality.

The choice of the cap for penalties within our current mechanism remains a delicate issue. Choosing it too small can result in a meaningless competition and stall the protocol; choosing it too large can have too large an impact on execution quality. Given the arguments above and data on previous auctions, it could be possible to come up with a cap that is derived more from first principles.

There may also be other mechanisms with better properties with regard to generating and distributing welfare, and a discussion around that could be very helpful. Besides the usual question of how to distribute the value generated by the mechanism between users, solvers, and the DAO, one fundamental issue is how to account for unsuccessful settlements. Our current approach of using (capped) penalties implicitly incorporates success rates into the ranking and execution.


There are a few practical aspects that are not fully captured by the simplified model.

1. Driver asymmetry: execution control

There are effectively two categories of solvers:

  • Teams using the reference driver
  • Teams allowed to run their own driver

The driver is responsible for submitting and managing the settlement transaction. When running a custom driver, a solver can implement more sophisticated fallback logic if a transaction starts failing (e.g. re-quoting, retrying under updated conditions, selecting alternative execution paths, etc.). This can increase the realized execution success rate.

By contrast, with the reference driver, if the winning solution begins to fail during simulation, even once, the solution is dropped and the solver is penalized.

Even if two solvers have identical theoretical success probability for a route, their realized success probability can differ due to execution control. Higher penalty caps increase the value of this execution flexibility. Lower penalty caps would not remove the structural difference, but would reduce the economic impact of that asymmetry.

2. Driver asymmetry: late submissions

In our own data (Wraxyn solver), out of 5,000 won batches (~49 days of data, furthest auction = 11949394), we observed 568 reverts. Of those, 63 (~11%) were due not to route quality or market movement but to late submission by the reference driver: the solution was valid and won the auction, but it was submitted after the deadline and penalized.

Query used: https://dune.com/queries/4351957 (capped_payment is negative with a non-null tx_hash)
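
For reference, a minimal pandas sketch of that filter, assuming the query results are exported to a CSV with those column names (the file name is hypothetical):

```python
import pandas as pd

# Hypothetical export of the Dune query results linked above.
df = pd.read_csv("dune_query_4351957.csv")

# A negative capped_payment together with a non-null tx_hash means the
# settlement did land on-chain but was still penalized, i.e. it was
# submitted after the deadline rather than failing on route quality.
late = df[(df["capped_payment"] < 0) & (df["tx_hash"].notna())]
print(f"{len(late)} late-submission penalties")
```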

This category of reverts cannot be inferred from order-level risk parameters. From a modeling perspective, it behaves as exogenous operational risk. At > 10% of total reverts, this is economically significant.

  • Late by 1 block: ~52 cases
  • Late by 2 blocks: ~8 cases
  • Late by 3 blocks: ~3 cases

3. Dependence on the second-best solution

Reverts are evaluated relative to the second-best solution. However, the solver does not have visibility into whether that second-best solution would actually have executed successfully under the same market conditions.

In volatile conditions, multiple candidate solutions may degrade simultaneously. Penalization based on the theoretical second-best implicitly assumes it was executable, which is not verifiable from the perspective of the winning solver.
