Commit a25f6c6

add blog entry Dataclasses vs Pydantic in Constraint Solvers
1 parent 89b2490 commit a25f6c6

4 files changed

+304
-6
lines changed

content/en/_index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ title: SolverForge
1414
{{< /blocks/cover >}}
1515

1616
{{% blocks/lead %}}
17-
Model your planning problems with an expressive, business-object oriented syntax.
17+
Model your planning problems with an expressive, business-object oriented syntax
1818

1919
<a class="td-link-down" href="#td-block-2"><i class="fas fa-chevron-down"></i></a>
2020
{{% /blocks/lead %}}

content/en/blog/_index.md

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,5 @@
11
---
22
title: Blog
33
menu: {main: {weight: 30}}
4-
date: 2100-01-01
4+
date: 2025-12-06
55
---
6-
7-
This is the **blog** section. It has two categories: News and Releases.
8-
9-
Files in these directories will be listed in reverse chronological order.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
---
2+
title: Technical
3+
---
Lines changed: 299 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,299 @@
1+
---
2+
title: "Dataclasses vs Pydantic in Constraint Solvers"
3+
date: 2025-12-06
4+
description: >
5+
Architectural guidance for Python constraint solvers: when to use dataclasses vs Pydantic for optimal performance.
6+
---
7+
8+
When building constraint solvers in Python, one architectural decision shapes everything else: should domain models use Pydantic (convenient for APIs) or dataclasses (minimal overhead)?
9+
10+
Both tools are excellent at what they're designed for. The question is which fits the specific demands of constraint solving—where the same objects get evaluated millions of times per solve.
11+
12+
We ran benchmarks across meeting scheduling and vehicle routing problems to understand the performance characteristics of each approach.
13+
14+
**Note:** These benchmarks were run on small problems (50 meetings, 25-77 customers) using JPype to bridge Python and Java. We expect the relative difference between dataclasses and Pydantic to persist at larger scale, but absolute timings will vary with problem size and infrastructure.
15+
16+
---
17+
18+
## Two Architectural Approaches
19+
20+
### Unified Models (Pydantic Throughout)
21+
22+
```python
from pydantic import BaseModel


class Person(BaseModel):
    id: str
    full_name: str
    # Single model for API and constraint solving


class MeetingAssignment(BaseModel):
    id: str
    meeting: Meeting
    starting_time_grain: TimeGrain | None = None
    room: Room | None = None
```
34+
35+
One model structure handles everything: JSON parsing, validation, API docs, and constraint evaluation. This is appealing for its simplicity.
36+
37+
### Separated Models (Dataclasses for Solving)
38+
39+
```python
from dataclasses import dataclass
from typing import Annotated

from pydantic import BaseModel

# PlanningId comes from the solver library (import omitted here)


# Domain model (constraint solving)
@dataclass
class Person:
    id: Annotated[str, PlanningId]
    full_name: str


# API model (serialization)
class PersonModel(BaseModel):
    id: str
    full_name: str
```
51+
52+
Domain models are simple dataclasses. Pydantic handles API boundaries. Converters translate between them.
53+
54+
---
55+
56+
## Benchmark Setup
57+
58+
We tested three configurations on two test problems, 60 scenarios in total (10 iterations × 6 configuration-problem combinations):
59+
60+
- **Pydantic domain models**: Unified approach with Pydantic throughout
61+
- **Dataclass domain models**: Separated approach with dataclasses for solving
62+
- **Java reference**: Timefold v1.24.0
63+
64+
Each solve ran for 30 seconds on identical problem instances.
65+
66+
**Test problems:**
67+
- Meeting scheduling (50 meetings, 18 rooms, 20 people)
68+
- Vehicle routing (25 customers, 6 vehicles)
69+
70+
---
71+
72+
## Results: Meeting Scheduling
73+
74+
| Configuration | Iterations Completed | Consistency |
|---------------|---------------------|-------------|
| Dataclass models | 60/60 | High |
| Java reference | 60/60 | High |
| Pydantic models | 46-58/60 | Variable |
79+
80+
### What We Observed
81+
82+
**Iteration throughput**: The dataclass configuration completed all optimization iterations within the time limit, matching the Java reference. The Pydantic configuration sometimes hit the time limit before finishing.
83+
84+
**Object equality behavior**: We noticed some unexpected constraint evaluation differences when using Pydantic models with Python-generated test data. The same constraint logic produced different results depending on how `Person` objects were compared during conflict detection.
85+
86+
---
87+
88+
## Results: Vehicle Routing
89+
90+
| Configuration | Iterations Completed | Consistency |
|---------------|---------------------|-------------|
| Dataclass models | 60/60 | High |
| Java reference | 60/60 | High |
| Pydantic models | 57-59/60 | Variable |
95+
96+
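The pattern matched meeting scheduling: the dataclass and Java configurations completed every iteration, while the Pydantic configuration fell short, though by a smaller margin (57-59 of 60).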
97+
98+
---
99+
100+
## Understanding the Difference
101+
102+
### Object Equality in Hot Paths
103+
104+
Constraint evaluation happens millions of times during solving. Meeting scheduling detects conflicts by comparing `Person` objects to find double-bookings.
105+
106+
**Dataclass equality:**
107+
```python
from dataclasses import dataclass


@dataclass
class Person:
    id: str
    full_name: str
    # __eq__ generated from field values
    # Simple, predictable, fast
```
115+
116+
Python generates straightforward comparison based on fields.
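To make the generated comparison concrete, here is a minimal sketch of value-based equality being used to spot a double-booking. The `Attendance` class, its `time_slot` field, and the `find_double_bookings` helper are hypothetical names chosen for illustration; they are not taken from the quickstarts or the solver API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Person:
    id: str
    full_name: str


@dataclass(frozen=True)
class Attendance:
    # Hypothetical record: one person attending one time slot.
    person: Person
    time_slot: int


def find_double_bookings(
    attendances: list[Attendance],
) -> list[tuple[Attendance, Attendance]]:
    """Return pairs of attendances where the same person occupies the same slot."""
    return [
        (a, b)
        for a, b in combinations(attendances, 2)
        # The generated Person.__eq__ compares id and full_name by value.
        if a.person == b.person and a.time_slot == b.time_slot
    ]


ann_1 = Person("p1", "Ann Lee")
ann_2 = Person("p1", "Ann Lee")  # separate instance, same field values
bob = Person("p2", "Bob Ray")

conflicts = find_double_bookings(
    [Attendance(ann_1, 3), Attendance(ann_2, 3), Attendance(bob, 3)]
)
print(len(conflicts))  # prints 1
```

The two `Person` instances for Ann are distinct objects, yet they compare equal because every field matches, so the conflict is found regardless of how the test data was constructed.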
117+
118+
**Pydantic equality:**
119+
```python
from pydantic import BaseModel


class Person(BaseModel):
    id: str
    full_name: str
    # __eq__ involves model internals
    # Designed for API validation, not hot-path comparison
```
126+
127+
Pydantic wasn't designed for millions of equality checks per second—it's optimized for API validation, where this overhead is negligible.
128+
129+
### The Right Tool for Each Job
130+
131+
Pydantic excels at API boundaries: parsing JSON, validating input, generating OpenAPI schemas. These operations happen once per request.
132+
133+
Dataclasses excel at internal computation: simple field access, predictable equality, minimal overhead. These characteristics matter when operations repeat millions of times.
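A rough way to check this on your own models is to time repeated equality checks with `timeit`. The sketch below is a hypothetical micro-benchmark helper, not the benchmark code used for this post; it assumes the class under test accepts `id` and `full_name` keyword arguments. It deliberately reports no expected timings, since those depend on the machine and library versions.

```python
import timeit
from dataclasses import dataclass


@dataclass
class Person:
    id: str
    full_name: str


def time_equality(cls, repeats: int = 100_000) -> float:
    """Return the seconds spent on `repeats` equality checks of two equal instances."""
    left = cls(id="p1", full_name="Ann Lee")
    right = cls(id="p1", full_name="Ann Lee")
    return timeit.timeit(lambda: left == right, number=repeats)


elapsed = time_equality(Person)
print(f"dataclass: {elapsed:.4f}s for 100,000 comparisons")
# To compare, pass a Pydantic model with the same two fields to time_equality.
```

A micro-benchmark like this isolates `__eq__` only. It says nothing about the JPype bridge or the solver itself, so treat it as a sanity check rather than a substitute for a full solve.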
134+
135+
---
136+
137+
## Practical Examples
138+
139+
The quickstart guides apply the separated-model pattern to three problem domains:
140+
141+
### Employee Scheduling
142+
[Employee Scheduling Guide](/docs/getting-started/employee-scheduling/) shows:
143+
- Hard/soft constraint separation with `HardSoftDecimalScore`
144+
- Load balancing constraints using dataclass aggregation
145+
- Date-based filtering with simple set membership
146+
147+
**Key pattern:** Domain uses `set[date]` for `unavailable_dates`—fast membership testing during constraint evaluation.
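As an illustration of the membership test, the following sketch stores unavailable dates in a set on a plain dataclass. The `Employee` class shape and the `is_unavailable` helper are assumptions made for this example; only the `unavailable_dates: set[date]` field comes from the description above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Employee:
    name: str
    # A set gives average O(1) membership checks, unlike a list scan.
    unavailable_dates: set[date] = field(default_factory=set)


def is_unavailable(employee: Employee, shift_date: date) -> bool:
    """Hypothetical filter predicate: True if the shift falls on a blocked date."""
    return shift_date in employee.unavailable_dates


ann = Employee("Ann Lee", {date(2025, 12, 24), date(2025, 12, 25)})
print(is_unavailable(ann, date(2025, 12, 24)))  # prints True
print(is_unavailable(ann, date(2025, 12, 26)))  # prints False
```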
148+
149+
### Meeting Scheduling
150+
[Meeting Scheduling Guide](/docs/getting-started/meeting-scheduling/) demonstrates:
151+
- Multi-variable planning entities (time + room)
152+
- Three-tier scoring (`HardMediumSoftScore`)
153+
- Complex joining patterns across attendance records
154+
155+
**Key pattern:** Separate `Person`, `RequiredAttendance`, `PreferredAttendance` dataclasses keep joiner operations simple.
156+
157+
### Vehicle Routing
158+
[Vehicle Routing Guide](/docs/getting-started/vehicle-routing/) illustrates:
159+
- Shadow variable chains (`PreviousElementShadowVariable`, `NextElementShadowVariable`)
160+
- Cascading updates for arrival time calculations
161+
- List variables with `PlanningListVariable`
162+
163+
**Key pattern:** The `arrival_time` shadow variable cascades through the route chain. Dataclass field assignments keep these updates lightweight.
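The cascade can be pictured in plain Python, without the solver's shadow variable machinery. This is a simplified sketch under stated assumptions: `Visit`, `travel_time_from_previous`, `service_duration`, and `update_arrival_times` are hypothetical names, and travel times are taken as precomputed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Visit:
    name: str
    travel_time_from_previous: timedelta  # assumed to be precomputed
    service_duration: timedelta
    arrival_time: Optional[datetime] = None  # derived, like a shadow variable


def update_arrival_times(route: list[Visit], departure: datetime, start: int = 0) -> None:
    """Recompute arrival_time for route[start:], reusing the earlier values."""
    if start == 0:
        ready_at = departure
    else:
        previous = route[start - 1]
        ready_at = previous.arrival_time + previous.service_duration
    for visit in route[start:]:
        # Plain attribute assignment: no validation or copying involved.
        visit.arrival_time = ready_at + visit.travel_time_from_previous
        ready_at = visit.arrival_time + visit.service_duration


def minutes(m: int) -> timedelta:
    return timedelta(minutes=m)


depot_departure = datetime(2025, 12, 6, 8, 0)
route = [
    Visit("A", minutes(10), minutes(5)),
    Visit("B", minutes(20), minutes(5)),
    Visit("C", minutes(15), minutes(10)),
]
update_arrival_times(route, depot_departure)

# A change at B only touches B and the visits after it.
route[1].travel_time_from_previous = minutes(30)
update_arrival_times(route, depot_departure, start=1)
print([v.arrival_time.strftime("%H:%M") for v in route])  # ['08:10', '08:45', '09:05']
```

In the real solver the library triggers these updates itself; the point here is only that each step is a cheap field write on a dataclass.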
164+
165+
---
166+
167+
## The Recommended Architecture
168+
169+
Based on our experience, we recommend separating concerns:
170+
171+
```
src/meeting_scheduling/
├── domain.py      # @dataclass models for solver
├── rest_api.py    # Pydantic models for API
└── converters.py  # Boundary translation
```
177+
178+
### Domain Layer
179+
180+
```python
from dataclasses import dataclass
from typing import Annotated

# planning_entity, PlanningId and PlanningVariable come from the solver library


@planning_entity
@dataclass
class MeetingAssignment:
    id: Annotated[str, PlanningId]
    meeting: Meeting
    starting_time_grain: Annotated[TimeGrain | None, PlanningVariable] = None
    room: Annotated[Room | None, PlanningVariable] = None
```
189+
190+
Simple structures optimized for solver manipulation.
191+
192+
### API Layer
193+
194+
```python
from pydantic import BaseModel


class MeetingAssignmentModel(BaseModel):
    id: str
    meeting: MeetingModel
    starting_time_grain: TimeGrainModel | None = None
    room: RoomModel | None = None
```
201+
202+
Pydantic handles what it's designed for: request validation, JSON serialization, OpenAPI documentation.
203+
204+
### Boundary Conversion
205+
206+
```python
def assignment_to_model(a: MeetingAssignment) -> MeetingAssignmentModel:
    # The *_to_model helpers must accept None: planning variables stay
    # unassigned (None) until the solver sets them.
    return MeetingAssignmentModel(
        id=a.id,
        meeting=meeting_to_model(a.meeting),
        starting_time_grain=timegrain_to_model(a.starting_time_grain),
        room=room_to_model(a.room),
    )
```
215+
216+
Translation happens exactly twice per solve: on ingestion and serialization.
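Because planning variables such as `room` may be `None` before solving, each converter has to pass `None` through in both directions. The sketch below shows one way to do that. To stay dependency-free it uses plain dicts as a stand-in for the Pydantic models, and the function names and the `capacity` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Room:
    id: str
    capacity: int


@dataclass
class MeetingAssignment:
    id: str
    room: Optional[Room] = None  # unassigned until the solver picks a room


def room_to_payload(room: Optional[Room]) -> Optional[dict]:
    """Convert a domain Room to a JSON-ready dict, passing None through."""
    if room is None:
        return None
    return {"id": room.id, "capacity": room.capacity}


def assignment_to_payload(a: MeetingAssignment) -> dict:
    """Serialization direction: domain object to API payload."""
    return {"id": a.id, "room": room_to_payload(a.room)}


def assignment_from_payload(payload: dict) -> MeetingAssignment:
    """Ingestion direction: API payload to domain object."""
    room = payload.get("room")
    return MeetingAssignment(
        id=payload["id"],
        room=Room(**room) if room is not None else None,
    )


unassigned = MeetingAssignment("m1")
assigned = MeetingAssignment("m2", Room("r1", 12))
print(assignment_to_payload(unassigned))  # {'id': 'm1', 'room': None}
print(assignment_from_payload(assignment_to_payload(assigned)) == assigned)  # True
```

With Pydantic models in place of the dicts, the same shape applies: build the model in the serialization direction and read its attributes in the ingestion direction.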
217+
218+
---
219+
220+
## Additional Benefits
221+
222+
### Optional Validation Mode
223+
224+
```python
# Production: fast dataclass domain
solver.solve(problem)

# Development: validate before solving
validated = ProblemModel.model_validate(problem_dict)
solver.solve(validated.to_domain())
```
232+
233+
Get validation during testing. Run at full speed in production.
234+
235+
### Clear Debugging Boundaries
236+
237+
The separation makes debugging easier—you know exactly what objects the solver sees versus what the API exposes.
238+
239+
---
240+
241+
## Guidelines
242+
243+
### When to Use Pydantic
244+
245+
- API request/response validation
246+
- Configuration file parsing
247+
- Data serialization for storage
248+
- OpenAPI schema generation
249+
- Development-time validation
250+
251+
### When to Use Dataclasses
252+
253+
- Solver domain models
254+
- Objects compared in tight loops
255+
- Entities with frequent equality checks
256+
- Performance-critical data structures
257+
- Internal solver state
258+
259+
### The Hybrid Pattern
260+
261+
```python
@app.post("/schedules")
def create_schedule(request: ScheduleRequest) -> ScheduleResponse:
    # Validate once at API boundary
    problem = request.to_domain()

    # Solve with fast dataclasses
    solution = solver.solve(problem)

    # Serialize once for response
    return ScheduleResponse.from_domain(solution)
```
273+
274+
Validation where it matters. Performance where it counts.
275+
276+
---
277+
278+
## Trade-offs
279+
280+
### More Code
281+
282+
Separated models mean additional files and conversion logic. For simple APIs or prototypes, unified Pydantic might be fine to start with.
283+
284+
### Performance at Scale
285+
286+
The overhead difference grows with problem size. Small problems might not show much difference; larger problems will.
287+
288+
---
289+
290+
## Summary
291+
292+
Both Pydantic and dataclasses are excellent tools. The key insight is matching each to its strengths:
293+
294+
- **Dataclasses** for solver internals—simple, predictable, optimized for repeated operations
295+
- **Pydantic** for API boundaries—rich validation, serialization, documentation generation
296+
297+
This separation lets each tool do what it does best.
298+
299+
Full benchmark code and results: [SolverForge Quickstarts Benchmarks](https://github.com/solverforge/solverforge-quickstarts/tree/main/benchmarks)
