Risk-adjusted solver rewards
This is joint work with @voyta.eth and @harisang.
In CoW Protocol, solvers bear the execution risk: the possibility that their solution may revert. Currently, the protocol rewards solvers on average R COW tokens per order, regardless of the execution risk, even though execution risk varies between orders and depends on network volatility. In some cases, the reward may not offset execution risk, discouraging solvers from submitting solutions and leading to orders not being processed. In other cases, solvers might be earning relatively high rewards for bearing little to no risk (see appendix for an example of a no-risk batch and an example of a batch that first reverted, incurring a large cost to the winning solver, before eventually getting mined).
The goal of this post is to introduce a method for redistributing the COW rewards across user orders so that they better reflect execution risk, while still averaging a parameterizable amount of R COW.
Note that this is most likely a provisional measure: it can be implemented straight away via the weekly solver payouts, and can also serve as a baseline that solvers may optionally use for pricing their solutions if/when we later progress to a model where solvers bid for the right to submit the auction (see this topic).
Even with these adjustments, we foresee that some batches will continue to pay too much or too little in rewards; however, we believe the adjustments can significantly reduce the margin of error compared to the status quo.
Risk-adjusted rewards
In expectation, solver profits for a user order are given by the following expression:

E(profits) = (1 - p) * rewards - p * costs
           = (1 - p) * (rewards + costs) - costs

where p is the probability of the transaction reverting, costs = gas_units * gas_price is the cost incurred if the transaction reverts, and rewards is the amount we reward the solver with for this order.
We would like to find the reward that in expectation gives solvers a profit of T COW:

E(profits) = T COW
<=> (1 - p) * (rewards + costs) - costs = T COW
<=> rewards = (T COW + costs) / (1 - p) - costs
So the reward a solver should get for an order is a function of the probability of revert and the cost of reverting:

rewards(p, costs) = (T COW + costs) / (1 - p) - costs    [1]
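As a quick sanity check of the derivation, eq. [1] can be sketched in a few lines of Python (the value T = 37 COW is one of the values discussed below; costs are assumed to already be expressed in COW):

```python
def rewards(p: float, costs: float, T: float = 37.0) -> float:
    # Eq. [1]: reward making the expected profit equal to T COW.
    assert 0.0 <= p < 1.0, "diverges as the revert probability approaches 1"
    return (T + costs) / (1.0 - p) - costs

def expected_profit(p: float, reward: float, costs: float) -> float:
    # E(profits) = (1 - p) * reward - p * costs
    return (1.0 - p) * reward - p * costs

# With a 10% revert probability and 50 COW of revert costs,
# the expected profit under the derived reward is exactly T:
r = rewards(p=0.1, costs=50.0)
assert abs(expected_profit(0.1, r, 50.0) - 37.0) < 1e-9
```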
Interesting values for T are T = 0 COW, where rewards will be enough only to cover costs, and T = 37 COW, the average net profit solvers are getting today for the average reward of R = 73 COW (see [1] in appendix on how these values were computed).
Computing the probability of revert p
We can model the probability of a batch reverting as a function of costs, or more concretely as a function of gas_units and gas_price, since cost is equal to the product of these two quantities:

p = 1 / (1 + exp(β - α1 * gas_units - α2 * gas_price))

where β, α1, and α2 are obtained by (logistic) regression. See appendix for the full analysis, including the data exploration justifying the choice of these predictors and the regression computation.
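The regression can be sketched with scikit-learn on synthetic data (a minimal sketch: the units, value ranges, and ground-truth coefficients below are made up; the real coefficients come from the notebooks in the appendix):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
gas_units = rng.uniform(0.1, 2.0, n)     # hypothetical gas limits, in millions
gas_price = rng.uniform(10.0, 200.0, n)  # hypothetical gas prices, in gwei

# Synthetic ground truth: revert odds increase with both predictors.
true_logit = -4.0 + 1.5 * gas_units + 0.02 * gas_price
reverted = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))

X = np.column_stack([gas_units, gas_price])
model = LogisticRegression().fit(X, reverted)

# Recover the post's parameterization p = 1 / (1 + exp(beta - a1*u - a2*g)):
a1, a2 = model.coef_[0]
beta = -model.intercept_[0]
p = 1.0 / (1.0 + np.exp(beta - a1 * gas_units - a2 * gas_price))
assert np.allclose(p, model.predict_proba(X)[:, 1])
```

Note that sklearn's intercept has the opposite sign convention to the β in the formula above, hence the negation.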
Capping rewards
With this model for p, the risk-adjusted reward (eq. 1) goes to infinity as the probability of revert approaches one. To account for possible model inaccuracies (training data can be too thin and noisy), we suggest further capping the maximum reward that a solver can earn, as well as the maximum gas_units:
gas_units_capped(gas_units) = min(1250K, gas_units)

rewards_capped(p, gas_units, gas_price) =    [2]
    min(
        2500 COW,
        rewards(p, gas_units_capped(gas_units) * gas_price)
    )
The cap constants were empirically selected by looking at points where rewards would blow up over a 5-month period of data, and are further partially validated by the outcome analysis at the very end of the modeling notebook (the long version; see appendix).
The capped reward function looks like this:

[Figure: capped reward function]
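The capping logic of eq. [2] can be sketched as follows (a minimal sketch: for simplicity, gas_price is assumed to be expressed in COW per gas unit so that costs come out directly in COW, and T = 37 COW is taken as the profit target):

```python
GAS_UNITS_CAP = 1_250_000  # the "1250K" constant from eq. [2]
REWARD_CAP = 2500.0        # maximum reward, in COW

def rewards(p: float, costs: float, T: float = 37.0) -> float:
    # Eq. [1]: reward such that the expected profit equals T.
    return (T + costs) / (1.0 - p) - costs

def rewards_capped(p: float, gas_units: float, gas_price: float) -> float:
    # Eq. [2]: cap gas_units first, then cap the resulting reward.
    capped_units = min(GAS_UNITS_CAP, gas_units)
    return min(REWARD_CAP, rewards(p, capped_units * gas_price))

# A near-certain revert would otherwise blow up; the cap kicks in:
assert rewards_capped(0.999, 2_000_000, 1e-4) == 2500.0
```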
Implementation
This proposal can be implemented by updating the driver to compute eq. [2] for every user order of every batch, and to include the result in a "rewards" key for the order in the instance.json sent to the solvers. Given that the average number of orders per batch and the gas price and gas limit distributions can be expected to change over time, the regression model will need to be recalibrated periodically.
Note that we suggest liquidity orders carry no COW rewards (a liquidity order should only be included in a solution if it improves the objective function).
Conclusion
This proposal introduces a method for distributing rewards among solvers as a function of indicators of revert risk. Its main limitations are:
- The model is sensitive to the calibration period. Calibrating it on, e.g., a non-volatile period and then using it in a volatile period will produce inaccurate results. Ideally, the historic training period should reflect future conditions.
- We're not considering slippage risk, only revert risk. Our data contains different kinds of solvers, and their approaches to the trade-off between revert risk and slippage risk differ.
- We're not aiming at a full model that captures the probability of revert. Solvers set their slippage tolerance differently, which is a huge driving factor, but it is unobservable to us.
Potential extensions:
- Revert risk is close to null when the solution can be settled exclusively using internal buffers. Successfully estimating whether this is the case would allow awarding lower rewards for these batches.
 The DAO might wish to revise the estimated solver profit (variable T above).
October 6 UPDATE: Following the discussion below, we add the following to the proposal. In the case where a settlement is (potentially) using internal buffers and zero external interactions, the "rewards" value specified in the input json per order will be ignored when computing the reward; instead, this value will be replaced by the value T (per executed user order contained in the batch). We also clarify that liquidity orders provided in the input json that have no atomic execution, i.e., where the corresponding field "has_atomic_execution" is set to FALSE, will be treated as internal interactions. This, in particular, means that a perfect CoW between a user order and such a liquidity order will be treated as a purely internal settlement, and thus will lead to a reward of value T.
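The override rule above can be sketched as follows (a minimal sketch: only the "rewards" key and the value T come from the proposal; the `is_liquidity_order` field and the internal-buffers flag are illustrative names, not the actual instance.json schema):

```python
T = 37.0  # assumed per-order profit target, in COW

def batch_reward(orders, uses_only_internal_buffers):
    # Liquidity orders never earn COW rewards, so only user orders count.
    user_orders = [o for o in orders if not o.get("is_liquidity_order", False)]
    if uses_only_internal_buffers:
        # Purely internal settlement: pay T per executed user order,
        # ignoring the per-order "rewards" values.
        return T * len(user_orders)
    return sum(o["rewards"] for o in user_orders)
```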
Appendix
Please refer to the following notebooks:

1. Computing current rewards and profits per order.
2. Modeling COW rewards
   a. Short version: contains the reward logic derivation, the final model specification, and the calculation of COW rewards under the proposed mechanism.
   b. Full version (superset of the above): includes feature selection and additional analysis regarding the final suggested COW rewards.
Examples of no-risk and high-risk batches
- No-risk batch: In this case, there was a single user order being settled, selling a small amount of USDC for ETH. Quasimodo (the winning solver) identified one of the baseline pools (i.e., the pools provided by the driver) as the pool to match this order, but also realized that internal buffers suffice, and so it ended up internalizing the trade. In other words, this ended up being an almost zero-risk trade (the order could still be canceled), where the winning solver only needed to look at the liquidity provided in the input json. Here is the settlement itself on Etherscan.
- High-risk batch: An example of high gas costs and volatility. In this case, there was a single user order being settled, selling a substantial amount of BTRFLY for USDC. The winning solver was Gnosis_Paraswap, and on its first attempt to settle on-chain, the transaction reverted, incurring a cost of ~0.062 ETH. Eventually, the second attempt succeeded. Here are the corresponding logs: