|
| 1 | +--- |
| 2 | +title: "Dataclasses vs Pydantic in Constraint Solvers" |
| 3 | +date: 2025-12-06 |
| 4 | +description: > |
| 5 | + Architectural guidance for Python constraint solvers: when to use dataclasses vs Pydantic for optimal performance. |
| 6 | +--- |
| 7 | + |
| 8 | +When building constraint solvers in Python, one architectural decision shapes everything else: should domain models use Pydantic (convenient for APIs) or dataclasses (minimal overhead)? |
| 9 | + |
| 10 | +Both tools are excellent at what they're designed for. The question is which fits the specific demands of constraint solving—where the same objects get evaluated millions of times per solve. |
| 11 | + |
| 12 | +We ran benchmarks across meeting scheduling and vehicle routing problems to understand the performance characteristics of each approach. |
| 13 | + |
| 14 | +**Note:** These benchmarks cover only small problems (50 meetings, 25-77 customers) and use JPype to bridge Python and Java. We expect the relative ordering of dataclasses and Pydantic to hold at larger scales, but the size of the gap and the absolute timings will vary with problem size and infrastructure. |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +## Two Architectural Approaches |
| 19 | + |
| 20 | +### Unified Models (Pydantic Throughout) |
| 21 | + |
| 22 | +```python |
| 23 | +class Person(BaseModel): |
| 24 | +    id: str |
| 25 | +    full_name: str |
| 26 | +    # Single model for API and constraint solving |
| 27 | + |
| 28 | +class MeetingAssignment(BaseModel): |
| 29 | +    id: str |
| 30 | +    meeting: Meeting |
| 31 | +    starting_time_grain: TimeGrain | None = None |
| 32 | +    room: Room | None = None |
| 33 | +``` |
| 34 | + |
| 35 | +One model structure handles everything: JSON parsing, validation, API docs, and constraint evaluation. This is appealing for its simplicity. |
| 36 | + |
| 37 | +### Separated Models (Dataclasses for Solving) |
| 38 | + |
| 39 | +```python |
| 40 | +# Domain model (constraint solving) |
| 41 | +@dataclass |
| 42 | +class Person: |
| 43 | +    id: Annotated[str, PlanningId] |
| 44 | +    full_name: str |
| 45 | + |
| 46 | +# API model (serialization) |
| 47 | +class PersonModel(BaseModel): |
| 48 | +    id: str |
| 49 | +    full_name: str |
| 50 | +``` |
| 51 | + |
| 52 | +Domain models are simple dataclasses. Pydantic handles API boundaries. Converters translate between them. |
| 53 | + |
| 54 | +--- |
| 55 | + |
| 56 | +## Benchmark Setup |
| 57 | + |
| 58 | +We tested three configurations across 60 scenarios (10 iterations × 6 configurations): |
| 59 | + |
| 60 | +- **Pydantic domain models**: Unified approach with Pydantic throughout |
| 61 | +- **Dataclass domain models**: Separated approach with dataclasses for solving |
| 62 | +- **Java reference**: Timefold v1.24.0 |
| 63 | + |
| 64 | +Each solve ran for 30 seconds on identical problem instances. |
| 65 | + |
| 66 | +**Test problems:** |
| 67 | +- Meeting scheduling (50 meetings, 18 rooms, 20 people) |
| 68 | +- Vehicle routing (25 customers, 6 vehicles) |
| 69 | + |
| 70 | +--- |
| 71 | + |
| 72 | +## Results: Meeting Scheduling |
| 73 | + |
| 74 | +| Configuration | Iterations Completed | Consistency | |
| 75 | +|---------------|---------------------|-------------| |
| 76 | +| Dataclass models | 60/60 | High | |
| 77 | +| Java reference | 60/60 | High | |
| 78 | +| Pydantic models | 46-58/60 | Variable | |
| 79 | + |
| 80 | +### What We Observed |
| 81 | + |
| 82 | +**Iteration throughput**: The dataclass configuration completed all optimization iterations within the time limit, matching the Java reference. The Pydantic configuration sometimes hit the time limit before finishing. |
| 83 | + |
| 84 | +**Object equality behavior**: We noticed some unexpected constraint evaluation differences when using Pydantic models with Python-generated test data. The same constraint logic produced different results depending on how `Person` objects were compared during conflict detection. |
| 85 | + |
| 86 | +--- |
| 87 | + |
| 88 | +## Results: Vehicle Routing |
| 89 | + |
| 90 | +| Configuration | Iterations Completed | Consistency | |
| 91 | +|---------------|---------------------|-------------| |
| 92 | +| Dataclass models | 60/60 | High | |
| 93 | +| Java reference | 60/60 | High | |
| 94 | +| Pydantic models | 57-59/60 | Variable | |
| 95 | + |
| 96 | +The pattern was consistent across problem domains. |
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## Understanding the Difference |
| 101 | + |
| 102 | +### Object Equality in Hot Paths |
| 103 | + |
| 104 | +Constraint evaluation happens millions of times during solving. Meeting scheduling detects conflicts by comparing `Person` objects to find double-bookings. |
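
To make the hot path concrete, here is a minimal, solver-independent sketch of double-booking detection. The `Attendance` class, its grain fields, and `count_double_bookings` are hypothetical names for illustration; a real solver expresses this as a constraint stream rather than a Python loop, but the equality check lands on `Person` in the same way.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Person:
    id: str
    full_name: str


@dataclass(frozen=True)
class Attendance:
    person: Person
    start_grain: int  # index of the first time grain of the meeting
    end_grain: int    # exclusive end grain


def count_double_bookings(attendances: list[Attendance]) -> int:
    """Count pairs where the same person attends two overlapping meetings."""
    conflicts = 0
    for a, b in combinations(attendances, 2):
        overlap = a.start_grain < b.end_grain and b.start_grain < a.end_grain
        # Person.__eq__ runs here for every overlapping pair.
        if overlap and a.person == b.person:
            conflicts += 1
    return conflicts


ann = Person("p1", "Ann")
bob = Person("p2", "Bob")
schedule = [
    Attendance(ann, 0, 4),
    Attendance(Person("p1", "Ann"), 2, 6),  # equal to ann by value, not identity
    Attendance(bob, 2, 6),
]
double_bookings = count_double_bookings(schedule)
```

The number of comparisons grows with the number of overlapping pairs, and the solver repeats this work for every candidate move, so the cost of a single `==` is multiplied many times over.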
| 105 | + |
| 106 | +**Dataclass equality:** |
| 107 | +```python |
| 108 | +@dataclass |
| 109 | +class Person: |
| 110 | +    id: str |
| 111 | +    full_name: str |
| 112 | +    # __eq__ generated from field values |
| 113 | +    # Simple, predictable, fast |
| 114 | +``` |
| 115 | + |
| 116 | +Python generates straightforward comparison based on fields. |
| 117 | + |
| 118 | +**Pydantic equality:** |
| 119 | +```python |
| 120 | +class Person(BaseModel): |
| 121 | +    id: str |
| 122 | +    full_name: str |
| 123 | +    # __eq__ involves model internals |
| 124 | +    # Designed for API validation, not hot-path comparison |
| 125 | +``` |
| 126 | + |
| 127 | +Pydantic wasn't designed for millions of equality checks per second—it's optimized for API validation, where this overhead is negligible. |
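
One way to check this on your own models is a small `timeit` harness. The sketch below is not the benchmark code from this post; `seconds_per_eq` is a hypothetical helper, and it only assumes the class accepts `id` and `full_name` as keyword arguments, which both a dataclass and a Pydantic model with those fields do.

```python
import timeit
from dataclasses import dataclass


@dataclass
class Person:
    id: str
    full_name: str


def seconds_per_eq(cls, repeats: int = 100_000) -> float:
    """Average seconds for one `a == b` on two equal instances of cls.

    The figure includes the overhead of calling the lambda, so use it to
    compare classes against each other rather than as an absolute cost.
    """
    a = cls(id="p1", full_name="Ann")
    b = cls(id="p1", full_name="Ann")
    total = timeit.timeit(lambda: a == b, number=repeats)
    return total / repeats


dataclass_cost = seconds_per_eq(Person)
# To compare, pass a Pydantic model with the same two fields:
#     seconds_per_eq(PersonModel)
```

Running both variants on the same machine and Python version gives a like-for-like comparison of equality cost alone, separate from the rest of the solver.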
| 128 | + |
| 129 | +### The Right Tool for Each Job |
| 130 | + |
| 131 | +Pydantic excels at API boundaries: parsing JSON, validating input, generating OpenAPI schemas. These operations happen once per request. |
| 132 | + |
| 133 | +Dataclasses excel at internal computation: simple field access, predictable equality, minimal overhead. These characteristics matter when operations repeat millions of times. |
| 134 | + |
| 135 | +--- |
| 136 | + |
| 137 | +## Practical Examples |
| 138 | + |
| 139 | +The quickstart guides demonstrate this pattern in action: |
| 140 | + |
| 141 | +### Employee Scheduling |
| 142 | +[Employee Scheduling Guide](/docs/getting-started/employee-scheduling/) shows: |
| 143 | +- Hard/soft constraint separation with `HardSoftDecimalScore` |
| 144 | +- Load balancing constraints using dataclass aggregation |
| 145 | +- Date-based filtering with simple set membership |
| 146 | + |
| 147 | +**Key pattern:** Domain uses `set[date]` for `unavailable_dates`—fast membership testing during constraint evaluation. |
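
A minimal sketch of that pattern, using only the standard library. The `Employee` class here is simplified and `is_unavailable` is a hypothetical helper; the quickstart's actual domain model carries more fields and solver annotations.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Employee:
    name: str
    unavailable_dates: set[date] = field(default_factory=set)


def is_unavailable(employee: Employee, shift_date: date) -> bool:
    # Hash lookup: average cost does not depend on how many dates are stored.
    return shift_date in employee.unavailable_dates


amy = Employee("Amy", {date(2025, 12, 24), date(2025, 12, 25)})
blocked = is_unavailable(amy, date(2025, 12, 25))
free = is_unavailable(amy, date(2025, 12, 26))
```

Storing the dates as a `list` would work too, but each check would then scan the list, which adds up when the filter runs for every shift on every move.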
| 148 | + |
| 149 | +### Meeting Scheduling |
| 150 | +[Meeting Scheduling Guide](/docs/getting-started/meeting-scheduling/) demonstrates: |
| 151 | +- Multi-variable planning entities (time + room) |
| 152 | +- Three-tier scoring (`HardMediumSoftScore`) |
| 153 | +- Complex joining patterns across attendance records |
| 154 | + |
| 155 | +**Key pattern:** Separate `Person`, `RequiredAttendance`, `PreferredAttendance` dataclasses keep joiner operations simple. |
| 156 | + |
| 157 | +### Vehicle Routing |
| 158 | +[Vehicle Routing Guide](/docs/getting-started/vehicle-routing/) illustrates: |
| 159 | +- Shadow variable chains (`PreviousElementShadowVariable`, `NextElementShadowVariable`) |
| 160 | +- Cascading updates for arrival time calculations |
| 161 | +- List variables with `PlanningListVariable` |
| 162 | + |
| 163 | +**Key pattern:** The `arrival_time` shadow variable cascades through the route chain. Dataclass field assignments keep these updates lightweight. |
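
The cascade can be pictured with plain dataclasses. This sketch does not use the solver's shadow variable API: `Visit`, its fields, and `update_arrival_times` are hypothetical, and travel time is stored on the visit as a simplification (in a real model it depends on the previous location).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Visit:
    name: str
    travel_minutes_from_previous: int
    service_minutes: int
    arrival_minute: Optional[int] = None  # stands in for the shadow variable


def update_arrival_times(route: list[Visit], departure_minute: int, start: int = 0) -> None:
    """Recompute arrival times from index `start` to the end of the route."""
    if start == 0:
        ready = departure_minute
    else:
        prev = route[start - 1]
        ready = prev.arrival_minute + prev.service_minutes
    for visit in route[start:]:
        # Plain attribute assignment: no validation or copying on each update.
        visit.arrival_minute = ready + visit.travel_minutes_from_previous
        ready = visit.arrival_minute + visit.service_minutes


route = [Visit("A", 10, 5), Visit("B", 20, 5), Visit("C", 15, 5)]
update_arrival_times(route, departure_minute=480)
arrivals = [v.arrival_minute for v in route]

# A change at B only cascades to B and the visits after it.
route[1].travel_minutes_from_previous = 30
update_arrival_times(route, departure_minute=480, start=1)
arrivals_after = [v.arrival_minute for v in route]
```

Every move that touches a route triggers a chain of assignments like this, which is why cheap field writes matter here.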
| 164 | + |
| 165 | +--- |
| 166 | + |
| 167 | +## The Recommended Architecture |
| 168 | + |
| 169 | +Based on our experience, we recommend separating concerns: |
| 170 | + |
| 171 | +``` |
| 172 | +src/meeting_scheduling/ |
| 173 | +├── domain.py # @dataclass models for solver |
| 174 | +├── rest_api.py # Pydantic models for API |
| 175 | +└── converters.py # Boundary translation |
| 176 | +``` |
| 177 | + |
| 178 | +### Domain Layer |
| 179 | + |
| 180 | +```python |
| 181 | +@planning_entity |
| 182 | +@dataclass |
| 183 | +class MeetingAssignment: |
| 184 | +    id: Annotated[str, PlanningId] |
| 185 | +    meeting: Meeting |
| 186 | +    starting_time_grain: Annotated[TimeGrain | None, PlanningVariable] = None |
| 187 | +    room: Annotated[Room | None, PlanningVariable] = None |
| 188 | +``` |
| 189 | + |
| 190 | +Simple structures optimized for solver manipulation. |
| 191 | + |
| 192 | +### API Layer |
| 193 | + |
| 194 | +```python |
| 195 | +class MeetingAssignmentModel(BaseModel): |
| 196 | +    id: str |
| 197 | +    meeting: MeetingModel |
| 198 | +    starting_time_grain: TimeGrainModel | None = None |
| 199 | +    room: RoomModel | None = None |
| 200 | +``` |
| 201 | + |
| 202 | +Pydantic handles what it's designed for: request validation, JSON serialization, OpenAPI documentation. |
| 203 | + |
| 204 | +### Boundary Conversion |
| 205 | + |
| 206 | +```python |
| 207 | +def assignment_to_model(a: MeetingAssignment) -> MeetingAssignmentModel: |
| 208 | +    return MeetingAssignmentModel( |
| 209 | +        id=a.id, |
| 210 | +        meeting=meeting_to_model(a.meeting), |
| 211 | +        starting_time_grain=timegrain_to_model(a.starting_time_grain) if a.starting_time_grain is not None else None, |
| 212 | +        room=room_to_model(a.room) if a.room is not None else None, |
| 213 | +    ) |
| 214 | +``` |
| 215 | + |
| 216 | +Translation happens exactly twice per solve: once when the request is ingested and once when the solution is serialized. |
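
A dependency-free sketch of the round trip, including the unassigned case. Plain dicts stand in for the Pydantic models so the example runs with the standard library alone; the `Room` fields and the `*_payload` function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Room:
    id: str
    capacity: int


@dataclass
class MeetingAssignment:
    id: str
    room: Optional[Room] = None


def room_to_payload(room: Optional[Room]) -> Optional[dict[str, Any]]:
    # Unassigned planning variables are None and must survive the round trip.
    if room is None:
        return None
    return {"id": room.id, "capacity": room.capacity}


def assignment_to_payload(a: MeetingAssignment) -> dict[str, Any]:
    return {"id": a.id, "room": room_to_payload(a.room)}


def assignment_from_payload(p: dict[str, Any]) -> MeetingAssignment:
    room = p.get("room")
    return MeetingAssignment(
        id=p["id"],
        room=Room(**room) if room is not None else None,
    )


unassigned = MeetingAssignment("m1")
assigned = MeetingAssignment("m2", Room("r1", 8))
round_trip_ok = all(
    assignment_from_payload(assignment_to_payload(a)) == a
    for a in (unassigned, assigned)
)
```

A round-trip check like `round_trip_ok` is a cheap unit test for converters, since a field dropped in one direction shows up as an inequality.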
| 217 | + |
| 218 | +--- |
| 219 | + |
| 220 | +## Additional Benefits |
| 221 | + |
| 222 | +### Optional Validation Mode |
| 223 | + |
| 224 | +```python |
| 225 | +# Production: fast dataclass domain |
| 226 | +solver.solve(problem) |
| 227 | + |
| 228 | +# Development: validate before solving |
| 229 | +validated = ProblemModel.model_validate(problem_dict) |
| 230 | +solver.solve(validated.to_domain()) |
| 231 | +``` |
| 232 | + |
| 233 | +Get validation during testing. Run at full speed in production. |
| 234 | + |
| 235 | +### Clear Debugging Boundaries |
| 236 | + |
| 237 | +The separation makes debugging easier—you know exactly what objects the solver sees versus what the API exposes. |
| 238 | + |
| 239 | +--- |
| 240 | + |
| 241 | +## Guidelines |
| 242 | + |
| 243 | +### When to Use Pydantic |
| 244 | + |
| 245 | +- API request/response validation |
| 246 | +- Configuration file parsing |
| 247 | +- Data serialization for storage |
| 248 | +- OpenAPI schema generation |
| 249 | +- Development-time validation |
| 250 | + |
| 251 | +### When to Use Dataclasses |
| 252 | + |
| 253 | +- Solver domain models |
| 254 | +- Objects compared in tight loops |
| 255 | +- Entities with frequent equality checks |
| 256 | +- Performance-critical data structures |
| 257 | +- Internal solver state |
| 258 | + |
| 259 | +### The Hybrid Pattern |
| 260 | + |
| 261 | +```python |
| 262 | +@app.post("/schedules") |
| 263 | +def create_schedule(request: ScheduleRequest) -> ScheduleResponse: |
| 264 | +    # Request already validated by Pydantic at the API boundary; convert once |
| 265 | +    problem = request.to_domain() |
| 266 | + |
| 267 | +    # Solve with fast dataclasses |
| 268 | +    solution = solver.solve(problem) |
| 269 | + |
| 270 | +    # Serialize once for response |
| 271 | +    return ScheduleResponse.from_domain(solution) |
| 272 | +``` |
| 273 | + |
| 274 | +Validation where it matters. Performance where it counts. |
| 275 | + |
| 276 | +--- |
| 277 | + |
| 278 | +## Trade-offs |
| 279 | + |
| 280 | +### More Code |
| 281 | + |
| 282 | +Separated models mean additional files and conversion logic. For simple APIs or prototypes, unified Pydantic might be fine to start with. |
| 283 | + |
| 284 | +### Performance at Scale |
| 285 | + |
| 286 | +We expect the overhead difference to grow with problem size, because larger problems mean more equality checks per solve. Small problems might not show much difference; larger problems are likely to show more. |
| 287 | + |
| 288 | +--- |
| 289 | + |
| 290 | +## Summary |
| 291 | + |
| 292 | +Both Pydantic and dataclasses are excellent tools. The key insight is matching each to its strengths: |
| 293 | + |
| 294 | +- **Dataclasses** for solver internals—simple, predictable, optimized for repeated operations |
| 295 | +- **Pydantic** for API boundaries—rich validation, serialization, documentation generation |
| 296 | + |
| 297 | +This separation lets each tool do what it does best. |
| 298 | + |
| 299 | +Full benchmark code and results: [SolverForge Quickstarts Benchmarks](https://github.com/solverforge/solverforge-quickstarts/tree/main/benchmarks) |