CIP-11: Rules of the Solver Competition - status quo and an update proposal

This proposal is joint work of the CoW Protocol solver team (@marco, @harisang, Alex and myself) and our dear CoWmunity member @voyta.eth.

TL;DR

This post provides an overview of the current rules of the solver competition and proposes the following changes to be added in the short-term:

  • Social consensus (implicit rules)
  • Global token conservation constraints
  • Local token conservation constraints

Summary

At CoW Protocol, we aim to achieve fair and cost-efficient trading by having a set of solvers compete to solve the Batch Auction Problem. The rules of the competition align solvers with these goals, ultimately driving value accrual for the CowDAO.

The solver department of the CoW Protocol development team has been continuously working on improving the Rules of the Solver Competition. In this proposal, we want to:

  1. Summarize the rules that are currently implemented and documented
  2. Propose adding a set of social consensus rules as well as two types of constraints, in order to address issues with the current rules and to prevent potential attacks on the mechanism

Here’s an overview of the current rules and the additions we are proposing:

Current rules

  • Objective function maximizes total surplus + fees - costs

  • User orders respect limit price and max buy/sell amount, uniform clearing prices

  • Liquidity orders matched at limit price

  • One set of internal buffers for all solvers

  • A solution containing no or only “new” user orders is invalid

Proposed additions

  • Social consensus (implicit rules)

  • Global token conservation constraints

  • Local token conservation constraints

Current Rules of the Solver Competition

The current batch auction problem is described here. In this section, we will summarize the status quo of the main rules that need to be respected by solvers. While these have worked reasonably well in practice, we will describe some of the drawbacks as an introduction to some of the suggested changes.

Objective function (scoring criterion)

The objective function that is currently used for scoring each solution reads

maximize: total user surplus + fees - costs.

The maximization of total user surplus does not in all cases distinguish between fair and unfair solutions. A solution may be optimal from the perspective of maximizing the sum of all individual surpluses and yet be unfair. We are addressing cases of unfairness both as part of the social consensus rules to be added, as well as via an explicit check in the backend (see further below for details).

User and liquidity orders

CoW Protocol supports two different kinds of orders - user orders and liquidity orders.

  • User orders are eligible to receive surplus relative to their limit prices as they are matched at uniform clearing prices, and their aggregated surplus is maximized (via the objective function). User orders need to pay fees to cover execution costs.

  • Liquidity orders, which are meant for market makers, are always matched at limit price, i.e., they are not eligible for surplus. However, liquidity orders currently do not need to pay any fees. They should therefore only be matched if they improve the objective function by increasing total user order surplus, or decreasing costs.
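
To make the scoring concrete, here is a minimal sketch (hypothetical, simplified data structures; the actual protocol formula additionally handles partial fills, buy orders, and price normalization) of how per-order surplus at uniform clearing prices and the objective above could be computed:

```python
from dataclasses import dataclass

@dataclass
class Order:
    sell_token: str
    buy_token: str
    buy_amount: float        # executed buy amount at the uniform clearing prices
    limit_buy_amount: float  # minimum buy amount the order accepts
    is_liquidity: bool       # liquidity orders are settled exactly at their limit price
    fee: float = 0.0         # fee paid, in the reference currency (e.g., ETH)

def order_surplus(order: Order, prices: dict[str, float]) -> float:
    """Surplus of a fully filled sell order, valued in the reference currency."""
    if order.is_liquidity:
        return 0.0  # liquidity orders are not eligible for surplus
    extra_buy = order.buy_amount - order.limit_buy_amount
    return extra_buy * prices[order.buy_token]

def objective(orders: list[Order], prices: dict[str, float], cost: float) -> float:
    """Scoring criterion: total user surplus + fees - costs."""
    surplus = sum(order_surplus(o, prices) for o in orders)
    fees = sum(o.fee for o in orders if not o.is_liquidity)
    return surplus + fees - cost
```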

Internal buffers

A set of internal buffers is available to all solvers, consisting of a set of eligible tokens that the settlement contract holds at the time. Solvers may replace valid AMM interactions by interactions that trade directly with the internal buffers at the same exchange rate, thus potentially saving on execution costs.
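
As a rough sketch of what such a replacement amounts to (hypothetical helper, not the actual backend logic): the settlement contract keeps the input token and pays the output token out of its own balance, at the exact exchange rate of the AMM interaction being replaced, provided the buffer holds enough of the output token.

```python
def try_internalize(buffers: dict[str, float],
                    token_in: str, amount_in: float,
                    token_out: str, amount_out: float) -> bool:
    """Replace an AMM swap (amount_in of token_in for amount_out of token_out)
    by a trade against the internal buffers at the same exchange rate.
    Returns True if the buffer can cover it, updating the balances in place."""
    if buffers.get(token_out, 0.0) < amount_out:
        return False  # not enough of the output token held by the settlement contract
    buffers[token_out] -= amount_out                            # buffer pays out
    buffers[token_in] = buffers.get(token_in, 0.0) + amount_in  # buffer keeps the input
    return True
```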

Since the internal buffers currently belong to the protocol and not to the individual solvers, there is a risk of malicious solvers exploiting the buffers for their own advantage. We propose that such behavior should be punished as per the proposed social consensus rules.

Additional solution properties

A solution that does not settle any user orders is not considered valid for the solver competition. Moreover, in order to leave ample time for potential CoWs, solutions that only contain “new” orders (i.e., orders that have been in the system for less than 30 seconds), are considered invalid.

Proposed update #1: Social consensus

The above-mentioned rules do not guarantee all aspects of desirable solver behavior, in particular with respect to the fairness of solutions. There are unwritten behavioral solver rules following from CoW Protocol’s mission of pioneering a fair DEX. While these other rules are not implemented in the backend, we propose that CowDAO enforce them by

  • transparently communicating the social consensus as to what kind of solver behavior is not allowed
  • checking settlements by solvers retrospectively
  • slashing solver bonds (via a DAO vote) in case violations of the social consensus are found

For now, we suggest considering the following types of malicious behavior:

1. Provision of unfair solutions

One concern with the current rules is that total user surplus does not directly consider clearing exchange rates outside CoW Protocol. As of the date of this post, CoW Protocol settles only a minority of total volume on Ethereum and as such, the uniform clearing prices (exchange rates) should be in line with what the users would get elsewhere (i.e., if they traded against publicly available liquidity offered by prominent protocols, such as Uniswap, Balancer, or Curve).

As an example, consider two user orders of the same size that can be matched directly against each other (i.e., without requiring external liquidity). The current objective function yields the same value for every clearing price between the two orders’ limit prices. Hence, there is some leeway for solvers to decide how to settle this trade, in particular how to distribute the surplus between the orders. This is commonly known as a bargaining problem. However, the current exchange rate on the external market indicates where the clearing price should be, as illustrated in the following picture:

Here, the external market is represented by the trading curve of an AMM. A solution that sets the clearing price, e.g., to either o1’s or o2’s limit price is considered unfair and thus invalid, because one of the orders would then receive less than if matched against the AMM. By contrast, the fair clearing price clearly coincides with the current spot price of the AMM (or should at least be set between the prices offered according to its buy- and sell-curve).

Practically, we suggest considering at least the AMMs that are passed as part of the batch instance as the external market price reference.
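
As a toy illustration of the kind of check this implies (hypothetical helpers, assuming a fee-less constant-product pool as the external reference): for a CoW of a given size, a clearing rate would only be considered fair if it lies between the effective rates the reference AMM would offer for trading that amount in either direction.

```python
def amm_sell_rate(reserve_a: float, reserve_b: float, amount_a: float) -> float:
    """Effective B-per-A rate when selling amount_a of token A to the pool."""
    return reserve_b / (reserve_a + amount_a)

def amm_buy_rate(reserve_a: float, reserve_b: float, amount_a: float) -> float:
    """Effective B-per-A rate when buying amount_a of token A from the pool
    (requires amount_a < reserve_a)."""
    return reserve_b / (reserve_a - amount_a)

def clearing_rate_is_fair(rate_b_per_a: float,
                          reserve_a: float, reserve_b: float,
                          amount_a: float) -> bool:
    """The clearing rate should lie between the AMM's sell- and buy-side rates."""
    return (amm_sell_rate(reserve_a, reserve_b, amount_a)
            <= rate_b_per_a
            <= amm_buy_rate(reserve_a, reserve_b, amount_a))
```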

Moreover, there is another situation in which two orders are matched against external liquidity in independent trading cycles and where surplus can be distributed in an unfair manner. We suggest adding an explicit constraint to mitigate it as part of this proposal (see the “local token conservation” section below).

2. Inflation of the objective function

Using tokens for the sole purpose of inflating the objective value or maximizing the reward is forbidden (e.g., by creating fake tokens, or wash-trading with real tokens).

3. Illegal use of internal buffers

The internal buffers may only be used to replace legitimate AMM interactions available to the general public for the purpose of saving transaction costs.

4. Failure to report correct transacted values for encoded transactions

Solvers may choose to include encoded transactions in their solution, by providing relevant calldata, but when doing so they must also truthfully specify the amounts transferred by each encoded transaction. This is required for the backend to be able to verify the proposed token conservation constraints, and can be checked retrospectively.

5. Other malicious behavior

Malicious solver behavior is not limited to the above examples. Slashing can still happen, at the discretion of the CowDAO, for other behavior that intentionally harms users and/or the protocol.

Proposed update #2: global token conservation

A very natural requirement for a solution is that, for each traded token, the amount of units bought of that token is equal to the amount of units sold of that token. In other words, no tokens can be “created” or “destroyed” within an executed settlement (that is, before removing AMM interactions that can be fulfilled using the internal buffers). As it considers the sum of the traded amounts of all orders and all external liquidity, we refer to this condition as “global token conservation”. While this constraint has already appeared in the documentation (see here) and we have notified solver teams to respect it, it is currently not explicitly checked and enforced by the backend as part of the solution validation. This, in turn, poses a risk especially regarding a potential abuse of internal buffers.

Therefore, in order to ensure correctness of the solutions and prevent buffer exploits, we propose to add the global token conservation constraint for all tokens to the rules of the solver competition and implement a corresponding check in the backend.
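
For concreteness, a minimal sketch of such a check (hypothetical data structures; the real check would run on the executed amounts reported in the solution, before any buffer internalization): for every token, the amounts flowing into the settlement contract must equal the amounts flowing out, up to a small rounding tolerance.

```python
from collections import defaultdict

# Each entry describes a flow through the settlement contract as
# (token_in, amount_in, token_out, amount_out): for a user order, token_in is what
# the user sells to the contract; for an AMM interaction, token_in is what the AMM
# pays out to the contract.
Flow = tuple[str, float, str, float]

def global_conservation_violations(flows: list[Flow], tol: float = 0.0) -> dict[str, float]:
    """Return per-token net imbalances; an empty dict means conservation holds."""
    net: dict[str, float] = defaultdict(float)
    for token_in, amount_in, token_out, amount_out in flows:
        net[token_in] += amount_in
        net[token_out] -= amount_out
    return {token: value for token, value in net.items() if abs(value) > tol}
```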

Proposed update #3: local token conservation

As mentioned above, unfair shifting of user surplus from one order to another can happen in the case where two orders are trading against external liquidity on independent trading cycles. This means that an order that was necessary for generating a certain surplus might not end up “receiving” it. In order to mitigate such undesirable behavior, it needs to be ensured that for every user order, no external tokens “enter” or “exit” the trading cycles that the order is part of (and consequently, no surplus can be shifted between orders on independent trading cycles). This condition is referred to as “token conservation per order” or “local token conservation”, and is described in much greater detail in this technical blog post.

In order to provide stronger fairness guarantees for the users, we propose to add the local token conservation constraint for all user orders to be executed to the rules of the competition and implement the corresponding check in the backend.
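
The general test is the one defined in the linked blog post; as a toy illustration of the special case discussed above (one order matched only against the liquidity in its own trading cycle, hypothetical helper), the condition boils down to nothing entering or leaving the cycle besides the order's own amounts:

```python
def local_conservation_ok(order_sell: float, order_buy: float,
                          cycle_in: float, cycle_out: float,
                          tol: float = 1e-9) -> bool:
    """Special case of token conservation per order: everything the order sells
    must go into its own trading cycle (cycle_in), and everything the cycle
    returns (cycle_out) must go to the order; otherwise surplus is being
    shifted to or from other orders."""
    return abs(order_sell - cycle_in) <= tol and abs(order_buy - cycle_out) <= tol
```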

Conclusion

We consider the proposed additions to the rules of the solver competition an important step towards a mechanism that provides verifiably fair solutions and that protects users and the protocol from potentially malicious solver behavior.

We are looking forward to feedback on these proposed changes by our CoWmunity!


Update July 15th, 2022

Removed “Proposed Update #2: global token conservation” from the proposal (controversial and needs more research).


Mostly agree with adding #3 (local token conservation). One drawback is that some desirable solutions get excluded by adding this constraint. In practice, I think it prevents many more bad solutions though. I don’t think I’ve ever seen an example of a good solution being prevented IRL, but bad ones are common.

I am a little bit torn on the #2 (global token conservation) combined with #1.4.

The problem is that transacted values are stochastic for most interactions due to potential slippage. Who is deciding what the ‘correct’ transacted values are? If this is a slashable offense, it is a bit tricky that the rules are not set a priori here.

You could say that in some cases the ‘correct’ transacted values can be recovered from the interaction data (downstream dex aggregators like 0x and 1inch have a ‘quoted’ rate for example). But I would argue that the interesting value is the ‘expected’ transacted value. How this value relates to the ‘quoted’ value depends, among other things, on the slippage strategy chosen by the solver. I.e. higher slippage tolerance means the expected transacted value (conditional on non-failure) is lower and vice versa. Furthermore, for direct AMM interactions there are no quoted transacted amounts recoverable from the calldata, only minimum return amounts.

Personally I would prefer to keep the status quo, where the solvers are financially liable for the (aggregate) correctness of their reported values by making them pay for slippage if it is negative across the week. This way you are rewarding solvers if their predictions are accurate in practice, instead of in theory. I see two downsides with this route though:

  • Relying on external prices to compute the aggregate slippage across the week is not ideal
  • It may trigger a race to the bottom where solvers deliberately quote optimistic prices to the extent that they always have large negative slippage, just to grab more market share

Whether the second point is bad for the protocol, I don’t know: it’s effectively distributing COW reserves that were meant to reward solvers back to the users. So good for the users, but perhaps not ideal if the intent of those reserves was to subsidize innovation. Then again, if there are noticeable types of auctions where solvers are constantly price cutting in this way, that may be a sign the rewards for those batches are too high, and we should change the reward structure to reflect that.


Regarding global token conservation:

On which state should this global token conservation be checked? On the AMM state of the leading Ethereum block?

In case this is true, hardcoding this rule without any tolerance has some drawbacks: Imagine that markets are very volatile and solvers see that in the next block the price of one asset will definitely move. E.g., the solvers believe that in the next block there will be a flashbots transaction that will move a certain AMM pool into a certain direction. Then, in order to reduce their expected negative slippage, solvers would have to set the prices in such a way that they violate token conservation. How would one deal with this fact?


Thanks for your input!

I believe the current idea is that global token conservation is checked by the driver, using the executed amounts of the orders and the interactions, both of which are supplied in the solver’s solution JSON. This is, in a sense, checking “theoretical” execution amounts, at least for AMMs/interactions, for the volatility reasons you highlight. In my opinion, there should be some tolerance in this test, but only as large as necessary to accommodate rounding errors.

Now, the other check that verifies whether reported executed amounts of interactions correspond to the actually executed amounts on chain does indeed need to account for blockchain volatility. If this is not explicitly checked, and we let solvers lie about it as long as they pay for the slippage, then we are effectively designing a system where solvers can buy unfairness of solutions, which might not be ideal.


Yes totally agree with your point here. There are also other reasons why you would not expect the transacted flow to be equal to the last block on average. Even if you expect a random price movement with mean zero (eg Gaussian noise), if you have a one-sided slippage limit like we do, the average price movement will be non-zero.

I think the best you can do is make the rules such that you incentivize solvers to truthfully report their expectations of transacted values, although I am not quite sure how you would do that here. The question is how important that property is to the protocol though (as long as solvers are financially liable for the correctness).

Yeah, makes sense.

But if we criticize these issues for the global token conservation, should we also do it for the local token conservation?

Fundamentally, it suffers from the same problem: solvers think that there will be price changes and hence report solutions that don’t satisfy local token conservation.

Personally I would prefer to keep the status quo, where the solvers are financially liable for the (aggregate) correctness of their reported values by making them pay for slippage if it is negative across the week.

Maybe we can do the same for the local token conservation: this condition would also be checked in post-processing, and any violation can be measured by some “surplus-defect” that a user suffered. These surplus-defects could then be accounted as a deficit in the solver’s weekly payout.
That means: even if the solution does not have any slippage overall, if the surplus is not fairly distributed between orders, this distribution error will be measured and punished later on.

We can do the same thing for envy-freeness. If the solutions turn out not to be envy-free in post-processing, then the generated envy is measured and punished in the payout.
One would allow higher tolerances at the start, but then continue to reduce them until all solutions are free of surplus distribution issues and are basically envy-free.

I don’t think this is a major issue. The envy-freeness, surplus-distribution defects and global token conservation can be calculated without prices. Only for converting an imbalance/deficit of a token into a punishment in ETH does a price need to be known, IMO. Hence, any inaccuracies in prices will only make the punishment inaccurate. But this is not so important, as it still sets the right incentives for solvers, as they will just strive to minimize the imbalances, envy and surplus-distribution defects in the first place.

The big drawback of testing all these checks only a posteriori is that it puts the DAO’s buffers more at risk. But maybe this can be mitigated by checking the conditions roughly before submission and then more accurately once they hit the chain.

I agree that global conservation may be hard to implement in practice. If I understand correctly, the main motivation for it was to prevent exploiting internal buffer trades? For this I think we could require a solver that wants to use internal buffers at a given exchange rate to provide proof in the form of calldata (so that the “internalized amounts” can be simulated on the mined block in hindsight).

Apart from those internalized imbalances, all discrepancies in global token conservation could be considered as slippage and handled as such. Would this offer any additional attack surface that I’m overlooking?

Another consideration for this rule could have been to reduce the required solver bond size going forward (I’m not sure this is the case, though). With the plan to colocate the driver and solver logic (to allow solvers to implement their own solution submission strategies) I think we will always need sufficient trust/bond to cover the settlement contract’s balance.


Good point on the internal buffer trades: maybe a practical solution would be to require that internal buffer trades simulate on the block they were queried on. Because a block could have been mined during the auction, it’s not 100% deterministic which block a solver simulated on. For compliance you could just check if there is any block that plausibly overlaps with the auction timer, on which the trade correctly simulates. This way you can just mechanically enforce equality, and avoid the whole random price movement problem.

I believe this is enough to discourage systematic abuse of the buffer. Theoretically solvers could gain a bit by picking the most favourable block when multiple are mined within an auction. Requires low latency engineering for almost no potential gains though, so I don’t see any rational solver putting in the effort.

(For completeness: I don’t think using the same approach for executed trades is wise, especially large ones. The impact of a single tick price movement when someone sells 1K ETH has a substantial impact on the buffer, so any expected discrepancy with the historical block should be priced in as much as possible)

This is what I mean by “solvers buying fairness”. I don’t think we should do that.

Oh, I think I see now what you (and cowry) are saying here. Basically the issue would be if a solver wanted to submit an unfair solution like the example from the token conservation test post, they could get away with it by misrepresenting their expectations, i.e. flipping the limit prices of the two orders so they match up with the filled amounts that the two user orders are getting. Is that about right?

They can say they expect them to be such that the local token balance constraint is satisfied, but at execution time, they will be violating the constraint. If that is what you mean, then we agree!

I understand the concern. But how else would we solve the issue that the on-chain prices might be different when the settlement transaction hits the blockchain than they were in the last block? Solvers need to account for this in some way.

If you are concerned that solvers will just buy the fairness, we can also set things up in such a way that solvers have to pay 2x the surplus shifts that they caused. This way, they cannot buy fairness, as they will get punished in either case.

I guess what I am saying is that a system where the value f we fine a cheater for stealing s is the same as the value that was stolen (i.e. f = s) seems problematic. I guess the reason we don’t see that anywhere else is because stealing creates havoc in the system: users must be indemnified, users are unhappy for the trouble, other solvers have fixed costs and are losing money because someone cheated, etc - and the total harm usually surpasses the stolen amount s.

Now if you say that the fine is something greater than s, e.g. f = k * s for some very well selected positive k, then I suppose it can work.

I am curious: if we need to check if someone cheated retrospectively, and we fine them also retrospectively, what is the difference between fining them in the weekly payouts vs using the slash mechanism for that?

Bit difficult to think how this would work out without defining how we would detect a surplus shift when the global token conservation constraints are not satisfied.

Is it going to be something like: if I knew the true executed dex amounts, I would have assigned users different executed amounts → I conclude there is surplus shifting? That seems tricky because it’s almost guaranteed to be true whenever any slippage happened, even if it’s random (unless everything cancels out perfectly).

I’d like to push for a consensus here, so we can get on with this CIP.

Firstly, to address previous replies, I’d like to point out that there is not much difference between the “proposed update #2: global token conservation” and the current “3. Illegal use of internal buffers”. I mean, theoretically, one can break global token conservation so that the buffers become richer, but I don’t think we need to have that rule in writing :wink:

I think the concerns raised above are more general though. They touch the problem that one cannot really guarantee that a solution that satisfies all the constraints in one block will still satisfy the constraints in the next block. The reason is of course that AMM state changes across blocks, and there is nothing we can do about it. Just to stress: what is “noisy” is the f(in)->out AMM functions, but this “noise” then propagates, as these functions are used in multiple constraints in the problem (e.g. global and local token conservation, envy-freeness, the objective value, etc). The fact that people picked on the global token conservation constraint above does not mean that it is the only one affected by this problem.

There are some ideas on how to work around it:

  1. Measure the violation, and somehow discount its value from the solver’s weekly payouts.
  2. Define a tolerance for these constraints, and slash solvers that violate the tolerance.

We can continue the discussion about this or other ideas here, but I would like to ask whether you think this CIP would be useful enough if it just enumerated these constraints, leaving how and when they are checked to a future blog post?


I now tend to agree with what @Cowry and @marco are converging towards. Indeed, testing all these conditions on any prior block has clear drawbacks (it took some time to convince myself but I now see it!), so I think it might just make sense to do all these tests once the transaction has been mined. Once we have the executed transaction, we can test global token conservation (as we do now anyways). Regarding local token conservation, we would also be able to first determine the realized exchange rates of all AMM interactions, which could then be used to check the condition. I would like to think a bit more about how a potential violation of global token conservation affects local conservation (as hinted by @tbosman) but I am not sure whether it would necessarily interfere with it.

In any case, we could come up with some metric for measuring violation. For global token conservation, I guess, it is obvious, but even for local token conservation, the condition can be reformulated so as to describe the difference between e.g., the realized buy amount of an order and the “intended” one, according to the test. This again corresponds to an absolute amount of “money”. All of these could be aggregated together via a simple summation and the solver could be penalized for them, ideally with some > 1 multiplicative factor, as suggested above, so that the penalty ends up being larger than the violation itself. Measuring envy-freeness (or unfairness of a price) might be trickier to quantify, but we could still try to agree on some proxy for it.

I do feel that this should also come with some solid infrastructure to do all the testing right after each settlement, and not just have the penalty computations once a week. I think that the system should be able to dynamically react to a solver consistently violating the constraints to prevent such systematic behaviors; in such a case, we could even propose a dynamic penalty, e.g., if you are consistently off by $5, then the penalty could be gradually increasing so as to kill off any systematic approach to mess with buffers/surplus shifts etc.

Overall, I am in favor of restructuring a bit the proposal and attempting to reach a reasonable draft in the next few days so that we can proceed then with a vote.

I hope we can find a way to make enforcing these constraints work without unwanted side effects. I am out of ideas personally.

I think changing the reward mechanism so that solvers are incentivized to report truthfully has more potential. In particular the idea of outsourcing the reward level which has been floated internally seems to go a long way in that direction (even though it wasn’t the primary goal). I’ll try to write something on that at some later point.

For now let me try to illustrate some of the obstacles. I apologize in advance, some of these example posts are very technical.

Global token conservation per batch

In the current set up, global token conservation constraints are already effectively checked ex-post by the slippage accounting.

Right now the penalty rules effectively say that the USD values of positive and negative violations are averaged across all tokens in a transaction AND all transactions in a week. This aggregate is then penalized by a multiplier of 1 if it is negative.
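
If I read the current accounting correctly, it is roughly the following (sketch with hypothetical inputs): net the signed USD value of all per-token imbalances a solver accumulated over the week, and only charge back a negative total, with a multiplier of 1.

```python
def weekly_slippage_penalty(violations_usd: list[float], multiplier: float = 1.0) -> float:
    """violations_usd: signed USD value of each per-token, per-settlement imbalance
    accumulated by one solver during the week (positive = in the protocol's favour).
    Positive and negative entries net out; only a negative total is penalized."""
    net = sum(violations_usd)
    return multiplier * -net if net < 0 else 0.0
```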

Even with these rules the negative slippage has a good chance of putting weekly rewards underwater; this happens all the time. As I write this there are two solvers with overall negative slippage exceeding cumulative rewards, and this is even before the cost of reverted transactions. This wasn’t a particularly volatile week either.

I am saying this just to make the point that solvers are already very strongly incentivized to minimize global token conservation violations.

Just to get an idea of what would happen if we didn’t average across batches, but penalized batches directly, even with a multiplier of 1 in case of violations: this query sums total violations across all batches where the overall slippage is negative (but the violation is still summed across all tokens in a transaction).

For the entire month of June, the worst solver would have been 1inch, with a total penalty of over -350K USD (I am sure Gnosis is good for it, though). The external solver with the highest value was Otex, still over 50K USD, substantially more than the total rewards.


Violation in magnitude vs in price

Right now a lot of buy orders (exact output orders) are filled through 0x, as they have an API endpoint for those.

Computing execution paths for exact output orders is actually incredibly awkward. Not all DEXes have methods for exact output orders, especially not if you string multiple pools together, and definitely not across different ecosystems.

What’s really awkward though is handling slippage during the actual swap. Afaik only uni v3 and balancer have a way to execute multi-leg swaps in reverse order. That way you can set the slippage tolerance on the input amount, while always getting the required output and paying the minimum delta on your input to make the swap go through.

For all other cases, you may be able to compute the required input on the historical block, but when swapping you need to first transfer the full input amount. If you send the exact quoted input amount, and anywhere in the path slippage occurs, you end up with insufficient output, which is not acceptable for a buy order. So the only way to ensure you get enough output AND allow slippage is to always input more than you think is necessary.

To give an example:
suppose you want to execute a buy order trading 1 ETH for 1000 USDC with 0.1% slippage. In practice, you actually swap 1.001 ETH for 1001 USDC (same exchange rate, higher volume). In case you hit the maximum slippage of 0.1% you still end up with the required output amount.

So this is what 0x does for buy orders. A consequence is that those orders create (asymmetric) token conservation violations in the typical case. We quote 1 ETH for 1000 USDC to the user, but most of the time we sell 1.001 ETH for 1001 USDC, so we have a -0.001 ETH and a + 1 USDC violation.

On average this should not affect the slippage penalty as the violations cancel out. However, this is fundamentally different from slippage due to random price movements with expectation 0, which is what we were mostly talking about in the rest of the thread.

I personally always thought allowing solvers to do this is a good thing. We could avoid it by not using aggregators for buy orders and always routing multi hop swaps through the settlement contract after each step. This would lead to worse rates and/or higher gas prices though, so I think it’s throwing out the baby with the bathwater. Buy orders are rare though, so I don’t have a strong opinion on this.

However, I now realize that this same problem pops up in worse form whenever we want to settle a literal COW, which is the whole point of the protocol.

Token conservation for COW batches

Suppose we want to solve a COW batch with two orders involving tokens A and B:

  • o1: selling 10 A for at least 9 B
  • o2: selling 5 B for at least 5 A

As the amounts don’t line up, we need to find an AMM to fill the gap. Due to the uniform clearing price rule and global token conservation (see also this post), the orders will be settled at the same exchange rate as the AMM.

This means that we need to find an AMM interaction swapping x token B for y token A, such that x and y solve these equations:

10*x/y = 5+x
and
5*y/x + y = 10

There is no direct way to express this solution in terms of an exact input/output swap with amounts that can be computed up front. You basically need to iterate over different values of x, compute the returned y and see if it fits. It’s possible, but finding the best y given input x is already an NP-hard problem if you have all the inputs, and the latency of querying the swap method on the contract sequentially would be a big issue as well. The only case that is kind of doable is using only baseline liquidity, where we have swap curves that are described by a simple function and you can do everything in memory.
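
To make the iteration concrete for the baseline case, here is a rough sketch (made-up reserves, fee-less constant-product pool) that bisects for the AMM trade satisfying the first equation above; the second equation then holds automatically, and the limit prices still need to be checked separately. In the general case the pool function would have to be replaced by sequential swap queries against real liquidity, which is where the latency problem comes in.

```python
def b_out(a_in: float, reserve_a: float, reserve_b: float) -> float:
    """B received from a fee-less constant-product pool for sending in a_in of A."""
    return reserve_b * a_in / (reserve_a + a_in)

def solve_cow_with_amm(reserve_a: float, reserve_b: float,
                       iters: int = 100) -> tuple[float, float, float]:
    """Find the AMM trade of the example batch: y (A sent to the pool) and
    x = b_out(y) (B received) such that 10*x/y = 5 + x, i.e., o1's 10 A fetch
    exactly o2's 5 B plus the x B from the pool at the uniform rate x/y.
    The second equation, 5*y/x + y = 10, follows from the first."""
    def residual(y: float) -> float:
        x = b_out(y, reserve_a, reserve_b)
        return 10 * x / y - (5 + x)
    lo, hi = 1e-9, 10.0   # residual(lo) > 0, residual(hi) < 0 for these reserves
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if residual(mid) > 0 else (lo, mid)
    y = (lo + hi) / 2
    x = b_out(y, reserve_a, reserve_b)
    return x, y, x / y    # executed amounts and the uniform clearing rate (B per A)

# With made-up 1000/1000 reserves the rate comes out near 0.995 B per A:
# o1 receives ~9.95 B (limit 9), o2 receives ~5.02 A (limit 5), so both limits hold.
x, y, rate = solve_cow_with_amm(reserve_a=1000.0, reserve_b=1000.0)
```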

One simple way we could approach the general case would be to find a swap x for y that has slightly too much volume to satisfy the equations. Now we scale it down, while keeping the exchange rate fixed, until the equations are satisfied. The global token conservation will be violated, but the violation cancels out across tokens because we didn’t change the exchange rate. This is somewhat analogous to what happens with 0x buy orders.

Note that there is still an incentive to find a trade that is as close as possible to exact, because in that case you get a better exchange rate at the AMM, and therefore higher surplus (also potentially less gas). But if by the auction deadline the solver has found an interaction with a strictly better exchange rate than the best baseline liquidity, just with slightly too much volume, they can still submit it if we allow for such net-0 violations. This should lead to overall better solutions for the user.

(Disclaimer: I know it is possible to get the value of internal storage for e.g. Uniswap V3 through GraphQL in bulk and recreate the smart contract logic locally so you can efficiently iterate in memory, as this is something MEV searchers do. I don’t think the development, maintenance and resource cost of doing that for all dexes is anywhere near proportional to the revenue potential for solvers right now though. Also, those kinds of optimizations are more a consequence of the winner-takes-all dynamic, than actually adding value for anybody.)

I see the concerns regarding global token conservation. However, I feel that as of now solvers perform “poorly” in some sense with respect to this constraint, exactly because there is no stricter constraint that is enforced. The 1inch solver, for example, quite often makes unreasonable choices regarding slippage, which I would view mostly as a problem rather than “normal” behavior that should not be penalized. Also, it has happened that some weeks a solver “gets lucky” early in the week by accumulating positive slippage, and then systematically aims for slightly negative slippage in order to win more. This is definitely reasonable behavior given the status quo, but it also suggests, in my opinion, that currently solvers are actually not always incentivized to minimize global token conservation violations.

My argument for evaluating the solution at the mined block is that this is the only thing that actually happens, and it is what affects the users and the protocol. Of course, we should definitely acknowledge the challenges that come with this, and we should make sure that solvers are always incentivized to submit what they believe is a very good approximation of the truth. But I do think that this provides a clean and very explicit evaluation of the constraints, as otherwise, I feel it becomes quite tricky to distinguish between what constitutes an intentional vs unintended error.