CIP-Draft: Risk-adjusted solver rewards

Risk-adjusted solver rewards

This is joint work with @voyta.eth and @harisang.

In CoW Protocol, solvers bear the execution risk: the possibility that their solution may revert. Currently the protocol rewards solvers on average R COW tokens per order, regardless of the execution risk, even though execution risk varies between orders and depends on network volatility. In some cases, the reward may not offset execution risk, discouraging solvers from submitting solutions, and leading to orders not being processed. In other cases, solvers might be earning relatively high rewards for bearing little to no risk (see appendix for an example of a no-risk batch and an example of a batch that first reverted, incurring a large cost to the winning solver, before eventually getting mined).

The goal of this post is to introduce a method for redistributing the COW rewards across user orders so that they better reflect execution risk, while still averaging a parameterizable amount of R COW per order.

Note that this is most likely a provisional measure - it can be implemented straight away via the weekly solver payouts, and can also serve as a baseline that solvers may optionally use for pricing their solutions if/when later we progress to a model where solvers bid for the right to submit the auction (see this topic).

Even with these adjustments we foresee that some batches will continue to pay too much or too little. However, we believe the adjustments can significantly reduce the margin of error compared to the status quo.

Risk-adjusted rewards

In expectation, solver profits for a user order are given by the following expression:

E(profits) = (1 - p) * rewards - p * costs
           = (1 - p) * (rewards + costs) - costs

where p is the probability of the transaction reverting, costs=gas_units * gas_price is the cost incurred if the transaction reverts, and rewards is the amount we reward the solver with for this order.

We would like to find the rewards that, in expectation, give solvers a profit of T COW:

E(profits) = T COW
<-> (1 - p) * (rewards + costs) - costs = T COW
<-> rewards = (T COW + costs) / (1 - p) - costs

So the reward a solver should get for an order is a function of the probability of revert and the cost of reverting:

rewards(p, costs) = (T COW + costs) / (1 - p) - costs                  [1]

Interesting values for T are T=0 COW, where rewards will be enough only to cover costs, and T=37 COW, the average net profit solvers are getting today for the average reward of R=73 COW (see [1] in appendix on how these values were computed).
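As a minimal sketch of eq. [1], the following Python snippet computes the reward for a single order and checks that it yields the target expected profit. The function and parameter names are illustrative, and it assumes that costs have already been converted to COW; the example order (10% revert probability, revert cost worth 50 COW) is hypothetical.

```python
def expected_profit(p: float, rewards: float, costs: float) -> float:
    """Expected solver profit: earn `rewards` with probability 1-p, lose `costs` with probability p."""
    return (1 - p) * rewards - p * costs


def rewards(p: float, costs: float, target_profit: float = 37.0) -> float:
    """Eq. [1]: the reward for which the expected profit equals target_profit (T)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return (target_profit + costs) / (1 - p) - costs


# Hypothetical order: 10% revert probability, revert cost worth 50 COW (assumed units).
r = rewards(p=0.1, costs=50.0)
print(r)
print(expected_profit(0.1, r, 50.0))
```

For this hypothetical order the reward comes out at roughly 46.7 COW, and the expected profit is T=37 COW up to floating-point rounding. With p=0 the reward reduces to T itself.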

Computing probability of revert p

We can model the probability of a batch to revert as a function of costs, or more concretely as a function of gas_units and gas_price, since cost is equal to the product of these two quantities:

p = 1 / (1 + exp(-β - ɑ1 * gas_units - ɑ2 * gas_price))

where β, ɑ1, and ɑ2 are obtained by logistic regression. See the appendix for the full analysis, including the data exploration justifying the choice of these predictors and the regression computation.
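The logistic model above can be sketched in Python as follows. The coefficient values used in the example call are placeholders chosen only for illustration, not the fitted values from the notebooks, and the function name is an assumption.

```python
import math


def revert_probability(gas_units: float, gas_price: float,
                       beta: float, alpha1: float, alpha2: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-(beta + alpha1*gas_units + alpha2*gas_price)))."""
    z = beta + alpha1 * gas_units + alpha2 * gas_price
    # Use the algebraically equivalent form for negative z to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Placeholder coefficients (NOT the fitted ones), only to show the call shape.
p = revert_probability(gas_units=300_000, gas_price=40.0,
                       beta=-3.0, alpha1=2e-6, alpha2=0.01)
print(p)
```

With positive ɑ1 and ɑ2, the estimated probability increases with both gas_units and gas_price, which is the behaviour the text relies on.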

Capping rewards

With this model for p, whenever the probability of revert approaches one, the risk-adjusted rewards (eq. 1) go to infinity. To account for possible model inaccuracies (training data can be too thin and noisy), we suggest additionally capping the maximum reward that a solver can earn, as well as the maximum gas_units:

gas_units_capped(gas_units) = min(1250K, gas_units)

rewards_capped(p, gas_units, gas_price) =                               [2]
     = min(
           2500 COW,
           rewards(p, gas_units_capped(gas_units) * gas_price)
      )

The cap constants were selected empirically by looking at the points where rewards would blow up over a 5-month period of data, and are partially validated by the outcome analysis at the very end of the modeling notebook (the long version, see appendix).
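A minimal sketch of eq. [2], using the two cap constants from the text. It assumes that gas_price is expressed in COW per gas unit so that costs and rewards share a unit, and it treats p >= 1 as hitting the cap directly; both are assumptions of this sketch rather than part of the proposal.

```python
MAX_GAS_UNITS = 1_250_000   # 1250K gas units cap from the text
MAX_REWARD_COW = 2500.0     # maximum reward per order from the text


def rewards(p: float, costs: float, target_profit: float = 37.0) -> float:
    """Eq. [1]: uncapped risk-adjusted reward."""
    return (target_profit + costs) / (1 - p) - costs


def rewards_capped(p: float, gas_units: float, gas_price: float,
                   target_profit: float = 37.0) -> float:
    """Eq. [2]: cap gas_units first, then cap the resulting reward."""
    if p >= 1:
        # Eq. [1] diverges as p approaches 1, so the cap applies (sketch assumption).
        return MAX_REWARD_COW
    capped_units = min(MAX_GAS_UNITS, gas_units)
    return min(MAX_REWARD_COW, rewards(p, capped_units * gas_price, target_profit))


# Hypothetical inputs: a near-certain revert is limited by the 2500 COW cap.
print(rewards_capped(p=0.999999, gas_units=2_000_000, gas_price=0.001))
```

Note that the gas cap is applied before the reward cap, so any order above 1250K gas units is priced as if it used exactly 1250K.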

This looks like this:

Implementation

This proposal can be implemented by updating the driver to compute eq. [2] for every user order of every batch, and include the result in a “rewards” key for the order in the instance.json sent to the solvers. Given that the average number of orders per batch, and the gas price and gas limit distribution can be expected to change over time, the regression model will need to be recalibrated periodically.

Note that liquidity orders are suggested to carry no COW rewards (a liquidity order should only be included in the solution if this improves the objective function).
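The driver-side step described above might look roughly like the sketch below. The instance layout (an "orders" mapping whose entries carry an "is_liquidity_order" flag) is an assumption made for illustration, as is the reward_fn callback standing in for eq. [2]; only the "rewards" key name comes from the text.

```python
import json
from typing import Callable


def annotate_rewards(instance: dict, reward_fn: Callable[[dict], float]) -> dict:
    """Attach a "rewards" value to every order of an instance (modified in place)."""
    for order in instance["orders"].values():
        if order.get("is_liquidity_order", False):
            # Liquidity orders are suggested to carry no COW rewards.
            order["rewards"] = 0.0
        else:
            order["rewards"] = reward_fn(order)
    return instance


# Hypothetical instance with one user order and one liquidity order.
instance = {
    "orders": {
        "0": {"is_liquidity_order": False},
        "1": {"is_liquidity_order": True},
    }
}
# A constant stands in for the eq. [2] computation here.
annotated = annotate_rewards(instance, reward_fn=lambda order: 73.0)
print(json.dumps(annotated, indent=2))
```

Passing the reward computation as a callback keeps the annotation step independent of the regression model, which the text says will need periodic recalibration.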

Conclusion

This proposal introduces a method for distributing rewards among solvers as a function of indicators of risk of revert. Its main limitations are:

  • The model is sensitive to the calibration period. Calibrating the model on e.g. a non-volatile period and then using it in a volatile period will give inaccurate results. Ideally, the historic training period should reflect future conditions.
  • We’re not considering slippage risk, only revert risk. Our data contains different kinds of solvers and their approach to the trade-off between revert risk and slippage risk is different.
  • We’re not aiming at a full model that captures probability of revert. Solvers set the slippage tolerance differently which is a huge driving factor, but this is unobservable to us.

Potential extensions:

  • Revert risk is close to null when the solution can be settled using internal buffers exclusively. Successfully estimating whether this is the case would allow for awarding lower rewards for these batches.
  • The DAO might wish to revise the estimated solver profit (variable T above).

October 6 UPDATE: Following the discussion below, we add the following to the proposal. In the case where a settlement is (potentially) using internal buffers and zero external interactions, then the “rewards” value specified in the input json per order will be ignored when computing the reward; instead, this value will be replaced by the value T (per executed user order contained in the batch). We also clarify that liquidity orders provided in the input json that have no atomic execution, i.e., where the corresponding field “has_atomic_execution” is set to FALSE, will be treated as internal interactions. This, in particular, means that a perfect CoW between a user order and such a liquidity order will be treated as a purely internal settlement, and thus, will lead to a reward of value T.
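The rule in this update can be sketched as follows. The data shapes are assumptions for illustration: interactions are dicts with a hypothetical "internal" flag, liquidity orders are dicts with the "has_atomic_execution" field named above, and the batch reward in the non-internal case is assumed to be the sum of the per-order "rewards" values.

```python
def is_purely_internal(interactions: list, liquidity_orders: list) -> bool:
    """True if every interaction is internal and no executed liquidity order is atomic."""
    all_internal = all(i["internal"] for i in interactions)
    no_atomic = not any(o["has_atomic_execution"] for o in liquidity_orders)
    return all_internal and no_atomic


def batch_reward(order_rewards: list, purely_internal: bool,
                 target_profit: float = 37.0) -> float:
    """Reward for a settled batch, given the "rewards" values of its executed user orders."""
    if purely_internal:
        # Internal settlement: ignore the quoted rewards and pay T per executed user order.
        return target_profit * len(order_rewards)
    return sum(order_rewards)


# Hypothetical batch: two user orders settled against buffers and a non-atomic liquidity order.
internal = is_purely_internal(
    interactions=[{"internal": True}],
    liquidity_orders=[{"has_atomic_execution": False}],
)
print(batch_reward([90.0, 120.0], internal))
```

In this hypothetical batch the quoted rewards of 90 and 120 COW are replaced by T=37 COW per order, since the settlement counts as purely internal.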


Appendix

Please refer to the following notebooks:

  1. Computing current rewards and profits per order.

  2. Modeling COW rewards
    a. Short version - this contains reward logic derivation, final model specification, and calculation of COW rewards under the proposed mechanism
    b. Full version (superset of the above): includes feature selection and additional analysis regarding final suggested COW rewards

  3. Accompanying exploratory data analysis

Examples of no-risk and high-risk batches

  1. No-risk batch: In this case, there was a single user order being settled, selling a small amount of USDC for ETH. Quasimodo (the winning solver) identified one of the baseline pools (i.e., the pools provided by the driver) as the pool to match this order, but also realized that internal buffers suffice, and so it ended up internalizing the trade. In other words, this ended up being an almost zero-risk trade (order could still be canceled), where the winning solver only needed to look at liquidity provided in the input json. Here is the settlement itself:
    Ethereum Transaction Hash (Txhash) Details | Etherscan
  2. High-risk batch: Example of high gas costs and volatility: In this case, there was a single user order being settled, that was selling a substantial amount of BTRFLY for USDC. The winning solver was Gnosis_Paraswap and in its first attempt to settle on-chain, the transaction reverted, incurring a cost of ~0.062 ETH. Eventually, the second attempt succeeded. Here are the corresponding logs:

Since a good chunk of solvers use flashbots (and I guess mev-boost in the near future), the cost of reverting is always 0 for them. Is it worth adding complexity to the reward structure for what is essentially a corner case?


Hi 6400, thanks for your feedback!

I am not sure I completely understand your concern. Currently solvers are reimbursed for successful transactions (i.e. the protocol pays them), so we might be a) paying too much for risk-free transactions and b) paying too little for risky transactions. I suppose that even if we start using flashbots exclusively in the future, the cost of reserving a slot in a block via flashbots will still change over time, so I imagine that it still makes sense to adjust the rewards to volatility. Maybe I missed your point?

This looks great! Much needed imo. I have now experienced first hand how varying and brutal revert risk can be.

On the topic of setting the value of T, I think as a solver of course I would prefer to have a relatively high level. T being about half of R seems quite low to me but I guess it depends heavily on the gas price at the time. Might be interesting to see a rolling average for these variables.

Thanks tomatosoup. I suggest not changing the current value of T=37, to avoid introducing too many changes simultaneously. That can be done neatly in a later CIP.

You can check the first of the linked notebooks to see how T and R were estimated from data.

Yeah that makes sense. Was the average COW reward under this scheme close to R in backtesting? Sorry if I missed it in my quick look.

In backtesting we had an average profit of T=37 COW for an average reward of R=73 COW (per order).

Under the proposed reward scheme?

Also I noticed that the regression was based on batch data while it’s being applied to orders. Do you know anything about the discrepancy between these two methods? E.g. in backtesting.

Under the current scheme, and therefore keeping it as is in the proposed scheme.

As you mentioned, the current method awards a fixed amount of 50 COW per batch plus 35 COW per order, while this proposal sets a per-order reward. I believe the projections are correct: In this notebook we’ve estimated this from a batch-oriented data file (each row is a batch, and contains the number of orders plus a boolean specifying if the batch was successful or not).

If there are no objections, I’d like to move this post into a CIP. I’d just add that passing the CIP does not mean an immediate change to the proposed risk-adjusted solver rewards scheme, but instead that the team is free to move on to implement it.

I don’t see why that is. I mean if your projection is 100% accurate then yeah the new T will be the same but what about R?

Right so you’re using a regression estimated from batches and applying it to orders and I’m just wondering if the discrepancy between those two is large. i.e. if I apply equation 2 to a batch versus applying it to orders and summing, is there a large difference?

Can you elaborate a bit more on this? If I understand correctly you are proposing the following: compute revert probability per batch (and not per order), and then use this to propose a reward, again per batch, so that the expected profit per batch is equal to some T?

One issue is that you don’t really know in advance how many orders will be in the batch (and thus you don’t have estimates for gas needed). So, I just want to clarify here how and when the rewards will be revealed to the solvers (e.g., after the settlement?).

No I’m not proposing anything. My first question is whether average cow rewards per order will be similar under this new scheme. My second question is what the error is when applying a regression estimated from batch level data to orders. The revert probabilities in the regression are per batch already. Does it make sense to use that to predict revert probabilities per order?

For the first question: yes, the rewards will be similar in value, meaning solvers will be paid 73 COW per order, which is what they are paid today on average (today they are paid 50 per batch + 35 per order). There will be no separation between per batch and per order anymore though; there will be just a single value of 73 per order.

For the second question: we ran the regression on batches composed of one order only so that it is easier to extrapolate to “by order” values. So yes, we believe it makes sense to do so.


Ah right I forgot about that for the second question. Yeah that makes sense then. For the first question, do you have hard evidence for that? You are now introducing correlation between rewards and the revert probability. Doesn’t really matter to solvers but it matters to the protocol that you’re not paying too much more COW under this scheme.

I apologize that I haven’t yet understood your first question. Maybe let’s take a step back. What we did for estimating the current rewards per order (73 COW) was to take a few recent months of data and compute the following:

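def current_rewards_per_order(r):
    """Average COW paid per order for one batch row under the current scheme."""
    # A row with no user orders is a reverted batch: nothing is paid.
    if r.nr_user_orders == 0:
        return 0
    # Current scheme: 50 COW per batch plus 35 COW per order.
    return (50 + r.nr_user_orders * 35) / r.nr_user_orders

# df: one row per batch, with an nr_user_orders column.
df.apply(current_rewards_per_order, axis=1).mean()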

That is, for every row r in the dataset we compute current_rewards_per_order(r) and then take the mean of those values. Note that when nr_user_orders=0 (i.e. the batch reverted), the rewards paid are zero.

I hope this explains where the 73 COW comes from. You seem to suggest that the protocol will pay less than 73 COW on average to solvers, but that is not true - the whole idea of this proposal is to keep paying a parameterizable amount of COW per order, which we suggest is still 73 COW so that nothing changes on average (the distribution still changes of course).

Right actually I think the new R will be higher than now which is a drawback. Cow is paying more for the same net profit to solvers. Consider the expected profit from an order: E(profit) = (1-p)*(rewards+costs) - costs, where p is the probability to revert. Before, (1-p) and costs are negatively correlated which reduces average profit compared to if they were independent. Now rewards are also negatively correlated with (1-p) which increases the negative correlation. Hence to maintain the same T = E(profit), you would need to increase E(rewards) = R. I think that’s correct. Could you run a backtest over the data to see how much rewards would be paid out under the new scheme?

Indeed the R does not need to be the same. I don’t quite follow the argument why it should be higher than currently - please see last cells of this notebook for the backtesting.

Ah so it was substantially lower. That’s really good then. Hmm not sure what’s wrong with my quick reasoning atm. I guess R is actually E((1-p)*rewards) which might explain it. Well I feel dumb. No objections from me for moving forward, looks great!
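The point about R being E((1-p)*rewards) can be checked with a small sketch. It assumes that rewards are only paid when the settlement succeeds and that costs are expressed in COW. Substituting eq. [1] then gives (1-p)*rewards = T + p*costs, so the expected payout per attempt is T plus the expected revert cost. The sample (p, costs) pairs are hypothetical.

```python
def rewards(p: float, costs: float, target_profit: float = 37.0) -> float:
    """Eq. [1]: uncapped risk-adjusted reward."""
    return (target_profit + costs) / (1 - p) - costs


def expected_payout(p: float, costs: float, target_profit: float = 37.0) -> float:
    """Expected COW paid per attempt, since rewards are only paid on success."""
    return (1 - p) * rewards(p, costs, target_profit)


# Hypothetical (p, costs) pairs; compare against the closed form T + p * costs.
for p, costs in [(0.0, 20.0), (0.1, 50.0), (0.5, 200.0)]:
    closed_form = 37.0 + p * costs
    print(p, costs, expected_payout(p, costs), closed_form)
```

Under these assumptions the average payout depends on the average of p*costs across orders, not on the quoted reward alone, which is consistent with the backtest deciding the question rather than the correlation argument. The caps of eq. [2] are ignored here.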

Sorry for the late reply, was a bit swamped.

Very nice work! Think something like this is a great first step to move the reward structure more towards the right shape, and we can iterate from there.

Some thoughts:

Potential extensions:

  • Revert risk is close to null when the solution can be settled using internal buffers exclusively. Successfully estimating whether this is the case would allow for awarding lower rewards for these batches.

This can be determined deterministically based on the solution submitted, no? If all amms/interactions are marked INTERNAL, and there are no atomic liquidity orders, the batch is buffer settled; otherwise it’s not. Solvers will know if their solution is a buffer batch as well, so it will be clear upfront what their reward is going to be.

I would be in favor of including this in this round if possible. Perhaps just set revert risk to some fixed, very small number in that case (1%?).

Secondly, is it feasible to propagate these revert estimates to the user in any way? E.g. something like new_fee = old_fee/(1-p_revert)? Not sure how much work this is re analysis and implementation, but I think there is a potential risk if the fee we quote to the user and the cost we reimburse to solvers diverge a lot, because it means we are effectively subsidizing the difference between tx cost on success and the full transaction cost including reverts.
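The fee adjustment suggested here could be sketched as below. The function name is illustrative. One way to read the formula, under the assumption that retries are independent with the same revert probability, is that a settlement needs 1/(1-p_revert) attempts on average, so the fee is scaled by that factor.

```python
def revert_adjusted_fee(old_fee: float, p_revert: float) -> float:
    """new_fee = old_fee / (1 - p_revert), as suggested in the post."""
    if not 0 <= p_revert < 1:
        raise ValueError("p_revert must be in [0, 1)")
    return old_fee / (1 - p_revert)


# Hypothetical quote: a fee of 10 units with a 20% estimated revert probability.
print(revert_adjusted_fee(10.0, 0.2))
```

With a 20% revert probability the hypothetical fee rises by a quarter, from 10 to about 12.5 units.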

We have already seen arbitrage bots exploit this difference. Right now it results in a loss for the solver, which gives them an incentive to detect and avoid such cases. When we reimburse it, that incentive disappears. Good for solvers, but bad for the protocol I think.

If it’s too complex to attach to this CIP we can just see how it pans out in practice though.