Author Information
Team: @bleu @yvesfracari @ribeirojose @mendesfabio @lgahdl
About Us:
bleu collaborates with companies and DAOs as a web3 technology and user experience partner. We’re passionate about bridging the experience gap we see in blockchain and web3.
Our work for CoW so far:
[CoW] Framework Agnostic SDK: Restructured the SDK architecture to be more composable, with framework-agnostic base packages and EVM adapters.
[CoW] Hook dApps: Built hook dApps integrated into the CoW Swap frontend and developed the cow-shed module of @cowprotocol/cow-sdk to simplify permissioned hooks.
[CoW] Offline Development Mode (proposal): Self-contained offline development environment enabling developers to work without external dependencies while testing solver strategies with realistic DEX liquidity.
Simple Summary
The CoW Protocol Playground currently offers no way to run systematic performance tests, so the impact of performance work can only be measured after deploying to production.
We propose a performance testing suite for the CoW Playground that:
- Generates configurable synthetic load (orders and patterns).
- Measures performance end-to-end.
- Integrates with the existing Prometheus/Grafana stack.
- Works with fork mode (primary) and offline mode (stretch goal).
Goal
This proposal addresses the need for performance testing capabilities in the CoW Protocol development workflow. Current limitations include:
- Performance improvements require deployment to production environments
- No way to generate a synthetic load for testing
- Difficult to measure the impact of optimizations before deployment
- Cannot simulate edge cases or stress conditions
- No standardized approach to performance testing
Benefits for the CoW Ecosystem:
- Risk Reduction: Identify performance issues before production deployment
- Faster Development: Measure optimization impact immediately in fork mode
- Better Insights: Understand system behavior under various load patterns
- Data-Driven Decisions: Make optimization choices based on concrete metrics
- Reproducible Testing: Standard test scenarios for consistent benchmarking
- Fork Mode Testing: Test performance with realistic mainnet state using Anvil fork mode
Fork Mode Integration (Primary Requirement)
- Works with Anvil fork mode (`anvil --fork-url $MAINNET_RPC`).
- Uses the CoW archive node made available for development.
- Leverages Anvil’s state caching for faster subsequent test runs after initial setup.
- Integrates with existing Prometheus/Grafana metrics.
- Provides realistic mainnet state for authentic performance testing.
Benefits of fork mode:
- Test against actual mainnet state and liquidity
- Realistic DEX interactions and pricing
- Authentic solver behavior and settlement scenarios
- First run caches state, subsequent runs are much faster
Offline Mode Compatibility (Stretch Goal)
- Validate compatibility with offline mode environment
- Test with pre-deployed contracts and local DEX liquidity
- Confirm no external dependencies needed for offline testing
- Document any differences in performance characteristics
Milestones
| Milestone | Duration | Payment |
|---|---|---|
| M1 — Load Generation Framework | 2 weeks | 6,000 xDAI |
| M2 — Performance Benchmarking | 2 weeks | 6,000 xDAI |
| M3 — Metrics & Visualization | 2 weeks | 6,000 xDAI |
| M4 — Test Scenarios | 1 week | 3,000 xDAI |
| M5 — Integration, Documentation & Offline Mode Exploration | 2 weeks | 6,000 xDAI |
| Maintenance | 1 year | 27,000 COW |
Total Duration: 9 weeks
Total Funding: 27,000 xDAI
Maintenance Vesting: 27,000 COW over 1 year
Deliverables
1. Load Generation Framework
Realistic order flow simulation:
- Generates market/limit orders with configurable rates, sizes, token pairs, and patterns.
- Simulates multiple traders and signing behavior.
2. Performance Benchmarking Tools
- Measure order lifecycle, settlement latency, solver rounds, API latency, and resource usage.
- Store baselines and compare runs to detect regressions.
3. Metrics Collection & Visualization
- Prometheus exporters for testing metrics.
- Ready-to-use Grafana dashboards (throughput, latency distributions, solver metrics, resource usage, error rates).
- Optional alerts for performance degradation.
4. Test Scenarios & Configurations (RFP Requirement)
Reusable test configurations:
- Predefined scenarios: light, medium, heavy, spike, sustained, edge cases.
- Configuration-driven via YAML/JSON + a simple scenario builder.
5. Load Testing CLI Tool
- Simple commands like `performance-test run --scenario heavy-load`.
- Reads config files, shows real-time progress, exports reports.
- CI-friendly interface.
6. Performance Regression Detection
- Compare current vs baseline runs.
- Highlight regressions via deltas and percentiles.
- Optionally surface alerts through existing stack.
7. Playground Integration
- Integrated into the playground docker-compose setup.
- Validated against Anvil fork mode with mainnet state (primary).
- Stretch: explore offline mode compatibility with pre-deployed contracts and local liquidity.
8. Documentation
- Quick start for running tests.
- Scenario configuration reference.
- Metrics/graphs interpretation guide.
- Architecture overview and extension points.
Specification
M1: Load Generation Framework (2 weeks)
Build the core load generation capabilities:
- Order generation engine
  - Market and limit order creation using CoW SDK order schemas
  - Realistic token pair distribution
  - Configurable order parameters
  - Treat playground as HTTP API for standard load testing
- User simulation module
  - Multiple concurrent traders
  - Signature generation and submission
  - Order status tracking
- CLI tool interface
  - Command-line argument parsing
  - Configuration file support
  - Real-time progress display
- Order submission strategies
  - Constant rate submission
  - Burst patterns
  - Gradual ramp-up
Goal: A working load generator that creates realistic order flow for the playground environment.
Technical Note: Will evaluate k6 (strong candidate due to Grafana integration) vs Python-based tools for framework selection.
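As a concrete (non-binding) illustration of what M1 could look like if k6 is selected, the sketch below submits synthetic orders to the playground orderbook at a constant rate. The orderbook URL, endpoint path, payload fields, and placeholder addresses/signatures are assumptions for illustration only; real order construction and signing would reuse @cowprotocol/cow-sdk as described above.

```javascript
// Sketch: constant-rate order submission against the playground orderbook.
// Assumes k6 is the selected framework; URL, endpoint, and payload are illustrative.
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    constant_order_flow: {
      executor: 'constant-arrival-rate',
      rate: 10,            // orders per second
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 20,
    },
  },
};

// Playground orderbook endpoint (assumed default, overridable per environment).
const ORDERBOOK_URL = __ENV.ORDERBOOK_URL || 'http://localhost:8080';

export default function () {
  // Illustrative order payload loosely following the orderbook order schema;
  // real runs would build and sign orders via @cowprotocol/cow-sdk helpers.
  const order = {
    sellToken: '0x...',                       // drawn from a configured token-pair list
    buyToken: '0x...',
    sellAmount: '1000000000000000000',
    buyAmount: '990000000000000000',
    validTo: Math.floor(Date.now() / 1000) + 1800,
    kind: 'sell',
    partiallyFillable: false,
    signature: '0x...',                       // produced by the signing engine for a test account
  };

  const res = http.post(`${ORDERBOOK_URL}/api/v1/orders`, JSON.stringify(order), {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, { 'order accepted': (r) => r.status === 201 });
}
```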
M2: Performance Benchmarking (2 weeks)
Implement performance measurement and comparison:
- Metrics collection framework
  - Order lifecycle timing (complex: track from submission through settlement)
  - Settlement latency tracking
  - API response times (req/s from load testing framework - k6 provides this natively)
  - Resource utilization monitoring (extract from Docker container stats)
- Baseline snapshot system
  - Performance baseline capture
  - Metadata and configuration storage
  - Version control integration
- Comparison engine
  - Statistical analysis
  - Regression detection algorithms (non-trivial: define thresholds, statistical significance)
  - Performance difference reporting
- Automated reporting
  - Summary statistics
  - Detailed performance breakdowns
  - Visualization-ready data export
Goal: Benchmarking system with regression detection.
Technical Note: Regression algorithm and accurate order lifecycle timing are the most complex parts requiring careful design.
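To make the regression-detection idea concrete, here is a minimal sketch of a percentile-based comparison between a baseline snapshot and the current run. The file format, metric names, and the 10% threshold are assumptions for illustration; the production version would add the statistical-significance handling noted above.

```javascript
// Sketch: percentile-based regression check between a baseline and a current run.
// Input files are assumed to contain per-metric percentile summaries, e.g.
// { "settlement_latency_ms": { "p50": 1200, "p95": 3400 }, ... }
const fs = require('fs');

function detectRegressions(baselinePath, currentPath, maxIncreasePct = 10) {
  const baseline = JSON.parse(fs.readFileSync(baselinePath, 'utf8'));
  const current = JSON.parse(fs.readFileSync(currentPath, 'utf8'));
  const regressions = [];

  for (const [metric, basePcts] of Object.entries(baseline)) {
    const curPcts = current[metric];
    if (!curPcts) continue; // metric not collected in this run
    for (const [pct, baseValue] of Object.entries(basePcts)) {
      const delta = ((curPcts[pct] - baseValue) / baseValue) * 100;
      if (delta > maxIncreasePct) {
        regressions.push({ metric, percentile: pct, baseline: baseValue, current: curPcts[pct], deltaPct: delta });
      }
    }
  }
  return regressions;
}

// Usage: node detect-regressions.js baseline.json current.json
if (process.argv.length < 4) {
  console.error('usage: node detect-regressions.js <baseline.json> <current.json>');
  process.exit(2);
}
const findings = detectRegressions(process.argv[2], process.argv[3]);
findings.forEach((r) =>
  console.log(`${r.metric} ${r.percentile}: ${r.baseline} -> ${r.current} (+${r.deltaPct.toFixed(1)}%)`)
);
process.exit(findings.length > 0 ? 1 : 0); // non-zero exit enables CI gating
```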
M3: Metrics & Visualization (2 weeks)
Integrate with Prometheus/Grafana infrastructure:
- Prometheus exporters
  - Custom metrics for load testing
  - Performance benchmark metrics
  - Test scenario metadata
- Grafana dashboards
  - Order throughput visualization
  - Latency distribution histograms
  - Resource utilization panels
  - Comparison views
- Alerting rules
  - Performance degradation alerts
  - Error rate thresholds
  - Resource exhaustion warnings
Goal: Rich visualization and monitoring capabilities.
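As an illustration of how custom testing metrics could be exposed to the existing Prometheus/Grafana stack, the sketch below uses the prom-client and express npm packages to publish a latency histogram and an error counter. Metric names, labels, and the port are illustrative assumptions, not final choices.

```javascript
// Sketch: a minimal Prometheus exporter for load-test metrics (prom-client + express).
const express = require('express');
const client = require('prom-client');

const register = new client.Registry();

// Histogram capturing end-to-end order lifecycle time, labeled by scenario.
const orderLifecycle = new client.Histogram({
  name: 'playground_order_lifecycle_seconds',
  help: 'Time from order submission to observed settlement',
  labelNames: ['scenario'],
  buckets: [1, 5, 15, 30, 60, 120, 300],
  registers: [register],
});

// Counter for submission errors, labeled by HTTP status code.
const submissionErrors = new client.Counter({
  name: 'playground_order_submission_errors_total',
  help: 'Orders rejected by the orderbook API',
  labelNames: ['status'],
  registers: [register],
});

// The load generator would record observations as orders progress, e.g.:
// orderLifecycle.labels('heavy-load').observe(42.7);
// submissionErrors.labels('400').inc();

const app = express();
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
app.listen(9464); // scraped by the playground's existing Prometheus instance
```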
M4: Test Scenarios (1 week)
Build a test scenario library:
- Implement predefined scenarios
  - Light, medium, heavy load scenarios
  - Spike and sustained load patterns
  - Edge case scenarios
- Create scenario configuration system
  - Framework-native configuration (e.g., k6 JavaScript scenarios or YAML/JSON)
  - Validation and error handling
- Template library
  - Example scenario collection with documentation
Goal: Rich library of reusable test scenarios.
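For example, if k6 is the selected framework, the predefined scenarios could be expressed directly as k6 executor configurations, as sketched below. The rates and durations are illustrative defaults, and a thin CLI wrapper or environment variables (both assumptions at this stage) would select which scenario a given run enables.

```javascript
// Sketch: predefined load scenarios expressed as k6 executor configurations.
export const scenarios = {
  light_load: {
    executor: 'constant-arrival-rate',
    rate: 1, timeUnit: '1s', duration: '10m', preAllocatedVUs: 5,
  },
  heavy_load: {
    executor: 'constant-arrival-rate',
    rate: 50, timeUnit: '1s', duration: '10m', preAllocatedVUs: 100,
  },
  spike: {
    executor: 'ramping-arrival-rate',
    startRate: 1, timeUnit: '1s', preAllocatedVUs: 200,
    stages: [
      { target: 1, duration: '2m' },    // steady baseline
      { target: 100, duration: '30s' }, // sudden spike
      { target: 1, duration: '2m' },    // recovery
    ],
  },
  sustained: {
    executor: 'constant-arrival-rate',
    rate: 20, timeUnit: '1s', duration: '2h', preAllocatedVUs: 50,
  },
};

// In a test script, a run would opt into one of these, e.g.:
// export const options = { scenarios: { heavy_load: scenarios.heavy_load } };
```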
M5: Integration, Documentation & Offline Mode Exploration (2 weeks)
Final integration, documentation, and offline mode exploration:
- End-to-end integration testing with fork mode (PRIMARY)
  - Configure Anvil fork mode: `anvil --fork-url $MAINNET_RPC` using the CoW Protocol archive node
  - Validate all test scenarios with forked mainnet state
  - Verify Anvil’s state caching works correctly for subsequent runs
  - Test realistic DEX interactions and settlement scenarios
  - Metrics collection verification
  - Dashboard validation
  - Performance overhead measurement
  - Discover and address missing metrics (iterative refinement)
  - Document Anvil fork mode behavior and limitations
- Offline mode exploration (STRETCH GOAL)
  - Validate compatibility with the offline mode environment if time permits
  - Test with pre-deployed contracts and local liquidity
  - Verify no external dependencies are required
  - Compare performance between fork and offline modes
  - Document differences in performance characteristics
- Comprehensive documentation
  - Quick start guide for fork mode setup
  - Configuration reference for archive node integration
  - Anvil fork mode configuration guide (block time, caching behavior)
  - Metrics interpretation guide
  - Architecture documentation
  - Example workflows and tutorials
  - Troubleshooting guide
  - Offline mode setup guide (if the stretch goal is achieved)
Goal: Production-ready testing suite fully validated with fork mode, complete documentation, and offline mode exploration results as a stretch goal.
Maintenance Vesting
- Bug fixes and security updates
- Documentation updates as protocols evolve
- Support for fork mode updates and archive node changes
- Address issues discovered during usage
Architecture Diagram
Architecture Components Description
1. Performance Testing CLI (Blue)
The command-line interface that developers interact with to run performance tests.
- Configuration Loader: Reads and parses test scenario configurations from YAML/JSON files
- Scenario Engine: Orchestrates the execution of test scenarios, managing timing and coordination
- Order Generator: Creates synthetic orders based on scenario specifications
- Report Generator: Produces performance reports comparing results against baselines
This is the entry point for developers running performance tests with simple commands like `performance-test run --scenario heavy-load`.
2. Load Generation (Green)
The core load generation system that simulates realistic user activity.
- User Simulator: Models multiple concurrent users with realistic behavior patterns
- Signing Engine: Generates valid signatures for orders using test accounts
- Order Submitter: Sends orders to the Orderbook API at configured rates and patterns
This block handles the actual generation and submission of synthetic trading activity that mimics real users.
3. Metrics Collection (Orange)
The performance measurement and analysis system.
- Metrics Collector: Monitors the CoW Protocol services and captures performance data
- Prometheus Exporter: Exposes collected metrics in Prometheus format
- Baseline Manager: Stores and manages performance baseline snapshots
- Performance Comparator: Analyzes current performance against baselines and detects regressions
This block provides the intelligence to measure, track, and compare performance over time.
4. Monitoring Stack (Purple)
The visualization and alerting infrastructure.
- Prometheus: Time-series database storing all performance metrics
- Grafana Dashboards: Visual interfaces showing performance trends, latency distributions, and comparisons
- Alert Manager: Sends notifications when performance degrades or thresholds are exceeded
This provides real-time visibility and historical tracking of system performance.
Method
Technical Approach
We propose a performance testing framework designed for the playground environment:
Fork Mode Integration (Primary Requirement):
- Anvil fork mode: Uses `anvil --fork-url $MAINNET_RPC` with the CoW Protocol archive node
- Realistic mainnet state: Tests against actual DEX liquidity and contract state
- Optimized performance: Anvil caches state after first run for faster subsequent tests
- 12s block time: Configured for Ethereum-like timing behavior
- Docker integration: Runs alongside existing playground services
- Authentic testing: Real solver behavior and settlement scenarios
Framework Selection:
We will evaluate and select the best-fit framework during M1 based on Grafana integration and CoW Protocol requirements:
- k6 (leading candidate): Excellent Grafana integration, native Prometheus metrics export, JavaScript scenarios, proven performance testing capabilities
- Python-based alternatives: (Locust, aiohttp) if specific CoW SDK integration needs outweigh k6’s advantages
- CoW SDK integration: Reuse order schemas and types from `@cowprotocol/cow-sdk` for realistic order generation
- API-first approach: Treat playground services as HTTP APIs for standard load testing patterns
Architecture:
- Concurrent/asynchronous execution: Handle high-volume order generation efficiently
- CLI-first design: Easy integration with CI/CD pipelines
- Configuration-driven: Framework-native configuration for flexible scenario definition
- Docker-native: Seamless integration with existing playground docker-compose setup
Load Generation Strategy:
- Realistic order simulation: Model actual user behavior and order patterns
- Configurable load patterns: Support various testing scenarios
- Minimal system impact: Efficient resource usage when not actively testing
- Extensible architecture: Easy to add new order types and patterns
Metrics Collection:
- Non-intrusive monitoring: Leverage existing logging and metrics
- Standard protocols: Prometheus for storage, Grafana for visualization
- Rich metrics: Capture latency distributions, not just averages
- Historical tracking: Enable long-term performance trend analysis
Implementation Strategy
- Modular Design: Separate concerns (generation, collection, visualization)
- Configuration-Driven: All test scenarios defined in configuration files
- Docker Integration: Run as part of playground docker-compose setup
- CI/CD Ready: Command-line interface for automated testing
- Extensible: Plugin architecture for custom metrics and scenarios
Open Source Commitment
All code will be open-source from day 0. We’re open to feedback during PRs and will maintain the codebase according to CoW Protocol standards.
Long-term Sustainability
Maintenance Plan:
- 1-year maintenance through COW token vesting (27,000 COW)
- Bug fixes and feature enhancements
- Documentation updates as protocols evolve
- Community support and issue triage
- Updates for new playground features
Community Ownership:
- All code contributed to CoW Protocol repositories
- Documentation enables community contributions
- Plugin architecture for community extensions
- Training and knowledge transfer
Evaluation Criteria
Per the RFP, our proposal addresses all evaluation criteria:
1. Approach to Load Generation and Testing
- Realistic simulation: Model actual user behavior and order patterns using CoW SDK order schemas
- Flexible scenarios: Pre-built scenarios plus custom configuration
- Scalable architecture: Handle light to heavy load testing
- Concurrent/asynchronous design: Efficient resource usage and high throughput
- Industry-standard tools: Evaluate k6 (preferred for Grafana integration) vs Python-based solutions
- API-first approach: Treat playground services as HTTP APIs for standard load testing patterns
2. Quality of Metrics and Insights
- Metrics: Latency, throughput, resource usage, error rates
- Statistical analysis: Distributions, percentiles, regression detection
- Actionable insights: Clear identification of bottlenecks
- Comparative analysis: Before/after performance comparison
- Historical tracking: Long-term performance trend visibility
3. Ease of Use for Developers
- Simple CLI: Single command to run tests
- Pre-built scenarios: Common test cases ready out-of-the-box
- Clear documentation: Quick start to advanced usage
- Intuitive configuration: YAML/JSON for easy customization
- Automated reporting: No manual data analysis required
4. Integration with Existing Tools
- Playground Fork Mode: Full compatibility with Anvil fork mode using CoW archive node
- Anvil fork mode: Configured with 12s block time and state caching for optimal performance
- Archive node integration: Uses CoW’s archive node for mainnet state forking
- Prometheus/Grafana: Native integration with existing monitoring stack
- Docker Compose: Seamless playground integration
- Offline mode (stretch): Potential compatibility exploration
- CI/CD ready: Command-line interface for automation
5. Maintainability and Documentation
- Clean architecture: Modular, well-structured codebase
- Tests: Unit and integration test coverage
- Clear documentation: Architecture, usage, and extension guides
- Example scenarios: Real-world usage examples
6. Cost and Timeline
- Total cost: 27,000 xDAI (development) + 27,000 COW with 1-year vesting (maintenance)
- Timeline: 9 weeks
- Rate: $3,000/week
- Buffer included: Extra time in M5 for discovering missing metrics and handling unexpected Anvil fork mode limitations
Funding Request
Development Grant: $27,000 (USDC)
Maintenance Vesting: 27,000 COW (1-year vesting from delivery date)
Timeline Breakdown:
- M1 (Load Generation Framework): 2 weeks
- M2 (Performance Benchmarking): 2 weeks
- M3 (Metrics & Visualization): 2 weeks
- M4 (Test Scenarios): 1 week
- M5 (Integration, Documentation & Offline Mode Exploration): 2 weeks
- Total: 9 weeks
Rate: $3,000/week
Budget Breakdown
Development Grant ($27,000 USDC):
- Developer hours during execution
- Project management on an as-needed basis
- Testing and validation infrastructure
Maintenance Vesting (27,000 COW over 1 year):
- Bug fixes and feature enhancements
- Documentation updates as protocols evolve
- Community support and issue triage
- Updates for new playground features
Payment Information
Gnosis Chain Address: 0x554866e3654E8485928334e7F91B5AfC37D18e04
Additional Information
The 1-year maintenance vesting ensures ongoing support and improvements as the CoW Protocol evolves. We’re committed to maintaining high code quality and responsiveness to community feedback throughout the maintenance period.
Our recent experience with the Offline Development Mode project gives us deep familiarity with the playground architecture, positioning us well to build effective performance testing tooling.
Terms and Conditions
By submitting this grant application, we acknowledge and agree to be bound by the CoW DAO Participation Agreement and the CoW Grant Terms and Conditions.
