Below is a draft of the technical documentation for DataLinq. This document covers the architectural overview, core components, caching and mutation subsystems, query handling, and testing strategies. It’s intended to help developers and contributors understand how the library works internally and to serve as a reference for future improvements.
DataLinq is a lightweight, high-performance ORM designed primarily for read-heavy scenarios in small to medium projects. The library emphasizes immutability, efficient caching, and seamless backend integration. Its core features include:

- Immutable Models: Models are represented as immutable objects to ensure thread safety and minimize side effects during data reads. When updates are necessary, the system creates a mutable copy via a defined mutation workflow.
- Source Generation: A source generator produces both immutable and mutable classes from abstract model definitions. This reduces boilerplate and enforces a consistent pattern across the codebase.
- LINQ Integration: Queries are written using standard LINQ expressions, which are translated into backend-specific commands, allowing a unified querying experience.
- Robust Caching: A multi-layered caching subsystem (row, index, and key caches) ensures that repeated data accesses incur minimal overhead.
- Backend Flexibility: The architecture abstracts backend details behind interfaces and adapters, enabling easy switching between data sources (e.g., MariaDB, SQLite, JSON, CSV).

DataLinq is organized into several interconnected layers that work together to deliver its performance and flexibility:

- Model Layer: Consists of abstract model classes decorated with attributes (e.g., `[Table]`, `[Column]`, `[PrimaryKey]`) that describe how classes map to database tables. These definitions are used by the source generator to create concrete immutable and mutable classes (see Department.cs and Employee.cs).
- Instance Creation and Mutation: Immutable objects are created dynamically based on `RowData` provided by data readers. When mutation is required, methods like `Mutate()` generate a mutable version, which can be updated and then saved back to the backend. The mutation workflow ensures that only immutable instances are stored in caches, preserving thread safety and performance (see Immutable.cs and Mutable.cs).
- Caching Subsystem: The caching mechanism is divided into several parts:
  - RowCache: Caches immutable row objects keyed by their primary keys, tracking insertion ticks and sizes for eviction based on time, row count, or memory limits (see RowCache.cs).
  - IndexCache and KeyCache: Manage mappings between foreign keys and primary keys, and cache key instances for fast lookups (see IndexCache.cs and KeyCache.cs).
  - TableCache: Aggregates the various caches for an entire table, provides methods to update or remove rows based on changes, and supports preloading indices for faster query responses (see TableCache.cs).
- Query Engine: DataLinq uses LINQ as the primary query language. LINQ expressions are parsed and translated into backend-specific SQL (or other query languages), with support for filtering, ordering, grouping, and pagination. The query system leverages caching to avoid unnecessary database round trips, as demonstrated in the extensive unit tests (see QueryTests.cs).
- Testing Infrastructure: The library is accompanied by a comprehensive suite of unit and integration tests. These tests verify everything from model instantiation and mutation to complex LINQ query operations and cache behavior (see CacheTests.cs, MutationTests.cs, and CoreTests.cs).


- Abstract Models: Developers define models using abstract classes and decorate them with attributes to specify table names, column types, and relationships. For example, the Department class declares properties like `DeptNo` and `Name`, and defines relations to employees and managers.
- Source-Generated Classes: A source generator processes these abstract definitions to generate:
  - Immutable classes: Provide read-only access to data, with lazy loading of related objects.
  - Mutable classes: Allow modification of model properties via a `Mutate()` method, and support transactional updates.
  - Interfaces: Generated interfaces (e.g., `IDepartmentWithChangedName`) ensure consistency and facilitate mocking in tests.

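
A minimal sketch of what an abstract model definition of this kind could look like. The attribute classes below are local stand-ins declared only so the snippet compiles on its own; DataLinq's real attribute signatures and generated types may differ. `DepartmentModel`, `ModelInspector`, and the table and column names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stand-in attributes, declared locally so the sketch is self-contained.
// They only mirror the names mentioned above; the real signatures may differ.
[AttributeUsage(AttributeTargets.Class)]
public sealed class TableAttribute : Attribute
{
    public string Name { get; }
    public TableAttribute(string name) => Name = name;
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class ColumnAttribute : Attribute
{
    public string Name { get; }
    public ColumnAttribute(string name) => Name = name;
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class PrimaryKeyAttribute : Attribute { }

// Hypothetical abstract model: it declares the mapping and nothing else.
[Table("departments")]
public abstract class DepartmentModel
{
    [PrimaryKey]
    [Column("dept_no")]
    public abstract string DeptNo { get; }

    [Column("dept_name")]
    public abstract string Name { get; }
}

// Reads the mapping metadata back via reflection, roughly the information
// a generator or factory would need to build concrete classes.
public static class ModelInspector
{
    public static string TableName(Type model) =>
        model.GetCustomAttribute<TableAttribute>()?.Name
        ?? throw new InvalidOperationException("Missing [Table] attribute.");

    public static string[] ColumnNames(Type model) =>
        model.GetProperties()
             .Select(p => p.GetCustomAttribute<ColumnAttribute>()?.Name)
             .OfType<string>() // drops properties without a [Column] attribute
             .OrderBy(n => n, StringComparer.Ordinal)
             .ToArray();

    public static string PrimaryKeyColumn(Type model) =>
        model.GetProperties()
             .Where(p => p.GetCustomAttribute<PrimaryKeyAttribute>() != null)
             .Select(p => p.GetCustomAttribute<ColumnAttribute>()?.Name)
             .OfType<string>()
             .Single();
}
```

Calling `ModelInspector.TableName(typeof(DepartmentModel))` returns the table name declared on the class. Keeping the model abstract means hand-written code carries only the mapping, while the concrete immutable and mutable classes are left to the generator.
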

- Immutable Base Class: The base class for immutable models handles:
  - Retrieving values from the underlying `RowData`.
  - Lazy evaluation of properties.
  - Managing relations through helper methods that load related entities only when needed.
- Mutable Wrapper: The `Mutable<T>` class encapsulates changes in a separate `MutableRowData` structure. This ensures that modifications are isolated until explicitly committed, after which a new immutable instance is generated to update the cache.
- Factory Methods: The `InstanceFactory` provides methods to create immutable instances dynamically. Reflection is used to instantiate models based on metadata extracted from attributes.

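
The interplay between row data, lazily loaded relations, and reflection-based creation can be sketched as follows. This is an illustration under assumptions, not DataLinq's implementation: `RowData` here is a simplified stand-in, and `ImmutableModel`, `ImmutableDepartment`, the constructor shape, and the `InstanceFactory.Create` signature are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the raw values delivered by a data reader.
public sealed class RowData
{
    private readonly IReadOnlyDictionary<string, object> values;

    public RowData(IDictionary<string, object> source) =>
        values = new Dictionary<string, object>(source); // defensive copy

    public T Get<T>(string column) => (T)values[column];
}

// Simplified immutable base: values come from RowData and never change.
public abstract class ImmutableModel
{
    protected RowData Row { get; }

    protected ImmutableModel(RowData row) => Row = row;

    // Wraps a loader so related entities are fetched once, on first access.
    protected static Lazy<IReadOnlyList<T>> Relation<T>(Func<IReadOnlyList<T>> loader) =>
        new Lazy<IReadOnlyList<T>>(loader);
}

public sealed class ImmutableDepartment : ImmutableModel
{
    private readonly Lazy<IReadOnlyList<string>> employeeNames;

    public ImmutableDepartment(RowData row, Func<IReadOnlyList<string>> loadEmployees)
        : base(row) => employeeNames = Relation(loadEmployees);

    public string DeptNo => Row.Get<string>("dept_no");
    public string Name => Row.Get<string>("dept_name");
    public IReadOnlyList<string> EmployeeNames => employeeNames.Value;
}

// Hypothetical factory: picks the matching public constructor via reflection.
public static class InstanceFactory
{
    public static T Create<T>(RowData row, params object[] extra) where T : ImmutableModel
    {
        var args = new object[extra.Length + 1];
        args[0] = row;
        extra.CopyTo(args, 1);

        return (T)(Activator.CreateInstance(typeof(T), args)
            ?? throw new InvalidOperationException("Could not create instance."));
    }
}
```

Because the relation is wrapped in `Lazy<T>`, constructing a department does not touch the loader; the first read of `EmployeeNames` does, and later reads reuse the result.
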

- RowCache: Stores immutable instances keyed by their primary keys. Tracks insertion ticks and sizes to enforce eviction policies based on time, count, or memory usage. This ensures repeated reads return cached objects without additional allocations.
- IndexCache and KeyCache:
  - IndexCache: Maps foreign keys to arrays of primary keys and maintains a tick queue to remove old entries.
  - KeyCache: Caches key instances to prevent redundant key creation, enhancing lookup performance.
- TableCache: Combines row and index caches for a given table. Handles state changes such as inserts, updates, and deletions by updating the caches accordingly. It also supports methods for preloading indices and retrieving rows with or without ordering.

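
The index and key caches described above could be approximated as below. This is a simplified sketch, assuming integer keys and ticks that are supplied in non-decreasing order; `SimpleIndexCache`, `SimpleKeyCache`, and `IntKey` are hypothetical names and do not reflect the actual classes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical index cache: foreign key -> primary keys, plus a queue of
// (tick, foreignKey) pairs used to drop old entries in insertion order.
public sealed class SimpleIndexCache
{
    private readonly Dictionary<int, (long Tick, int[] PrimaryKeys)> entries =
        new Dictionary<int, (long Tick, int[] PrimaryKeys)>();
    private readonly Queue<(long Tick, int ForeignKey)> tickQueue =
        new Queue<(long Tick, int ForeignKey)>();

    public int Count => entries.Count;

    public void Add(int foreignKey, int[] primaryKeys, long tick)
    {
        entries[foreignKey] = (tick, primaryKeys);
        tickQueue.Enqueue((tick, foreignKey));
    }

    // Returns the cached primary keys, or an empty array on a miss.
    public int[] Get(int foreignKey) =>
        entries.TryGetValue(foreignKey, out var entry) ? entry.PrimaryKeys : Array.Empty<int>();

    // Removes entries inserted before the given tick; returns how many were removed.
    public int RemoveInsertedBefore(long tick)
    {
        int removed = 0;
        while (tickQueue.Count > 0 && tickQueue.Peek().Tick < tick)
        {
            var queued = tickQueue.Dequeue();
            // Ignore queue entries superseded by a newer Add for the same key.
            if (entries.TryGetValue(queued.ForeignKey, out var entry) && entry.Tick == queued.Tick)
            {
                entries.Remove(queued.ForeignKey);
                removed++;
            }
        }
        return removed;
    }
}

public sealed class IntKey
{
    public int Value { get; }
    public IntKey(int value) => Value = value;
}

// Hypothetical key cache: hands out one shared instance per key value.
public sealed class SimpleKeyCache
{
    private readonly Dictionary<int, IntKey> keys = new Dictionary<int, IntKey>();

    public IntKey GetOrCreate(int value)
    {
        if (!keys.TryGetValue(value, out var key))
        {
            key = new IntKey(value);
            keys[value] = key;
        }
        return key;
    }
}
```

Storing the tick alongside each entry lets the queue stay append-only: a re-added foreign key leaves a stale queue item behind, which is skipped at eviction time instead of being searched for and removed up front.
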

- LINQ Integration: Queries are written in LINQ, and the query engine translates them into backend-specific SQL commands. The translation layer is capable of handling various operations such as:
  - Filtering using standard where clauses.
  - Ordering, grouping, and pagination (using methods like `OrderBy`, `Skip`, and `Take`).
  - Joins and relation traversals by leveraging the relation properties defined in models.
- Cache-Aware Query Execution: When a query is executed, the system first checks the cache (via `TableCache` and `RowCache`) for existing rows. If a row is missing, it retrieves the row data from the database, creates an immutable instance, and adds it to the cache.


- Unit Tests: The testing suite covers all aspects of the library:
  - Cache Tests: Validate that duplicate rows are not created, and that eviction policies based on time, row count, and memory size work as expected.
  - Mutation Tests: Ensure that mutable instances correctly capture changes, can be reset, and that saving changes properly updates the backend and cache.
  - Query Tests: Provide extensive examples of LINQ query usage, demonstrating filtering, ordering, grouping, and handling of unsupported operations.
- Integration Tests: The `DatabaseFixture` sets up real database connections (e.g., to MariaDB and SQLite) and uses generated test data (via Bogus) to ensure that the entire flow, from data retrieval and caching to mutation and query execution, operates correctly.

The caching subsystem is critical for achieving the zero-allocation goal in read-heavy scenarios. Here’s a closer look at the workflow:

- Insertion into Cache: When a new row is fetched from the database, its corresponding immutable instance is created using the `InstanceFactory`. This instance is then stored in the `RowCache` along with metadata (insertion ticks, size). Simultaneously, the `IndexCache` is updated to map foreign keys to this row’s primary key (see RowCache.cs and IndexCache.cs).
- Cache Eviction:
  - Time-Based Eviction: The system can remove rows that were inserted before a specific tick value.
  - Row Count/Size Limits: Methods in `RowCache` allow the cache to enforce limits by removing the oldest rows until the count or total size is within the defined thresholds.
  - Index Cache Maintenance: The `IndexCache` similarly purges outdated entries using its tick queue mechanism.
- Cache Retrieval: Before executing a query, the system checks the `RowCache` for the required rows. If a row is found, it’s returned directly. Otherwise, the query system retrieves the missing rows from the database and updates the cache.
- Transaction Awareness: The `TableCache` can maintain separate caches for transaction-specific data. This ensures that updates within a transaction do not affect the global cache until the transaction is committed.

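
The three eviction policies above (time, row count, total size) can be sketched with one queue ordered by insertion. This is a simplified model under assumptions: integer primary keys, caller-supplied ticks and sizes, and no thread synchronization. `SimpleRowCache<TRow>` and its method names are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class SimpleRowCache<TRow> where TRow : class
{
    private readonly Dictionary<int, (TRow Row, long Tick, int Size)> rows =
        new Dictionary<int, (TRow Row, long Tick, int Size)>();
    private readonly Queue<(int Key, long Tick)> insertionOrder =
        new Queue<(int Key, long Tick)>();

    public int Count => rows.Count;
    public long TotalBytes { get; private set; }

    // Returns false instead of storing a second instance for the same key.
    public bool TryAdd(int primaryKey, TRow row, long tick, int sizeInBytes)
    {
        if (rows.ContainsKey(primaryKey))
            return false;

        rows[primaryKey] = (row, tick, sizeInBytes);
        insertionOrder.Enqueue((primaryKey, tick));
        TotalBytes += sizeInBytes;
        return true;
    }

    public TRow? Get(int primaryKey) =>
        rows.TryGetValue(primaryKey, out var entry) ? entry.Row : null;

    // Explicit removal, e.g. after a delete; leaves a stale queue item behind.
    public bool Remove(int primaryKey)
    {
        if (!rows.TryGetValue(primaryKey, out var entry))
            return false;

        rows.Remove(primaryKey);
        TotalBytes -= entry.Size;
        return true;
    }

    public int RemoveInsertedBefore(long tick) =>
        EvictWhile(() => insertionOrder.Peek().Tick < tick);

    public int RemoveUntilCount(int maxRows) =>
        EvictWhile(() => rows.Count > maxRows);

    public int RemoveUntilSize(long maxBytes) =>
        EvictWhile(() => TotalBytes > maxBytes);

    // Evicts oldest-first while the condition holds; returns rows removed.
    private int EvictWhile(Func<bool> shouldEvict)
    {
        int removed = 0;
        while (insertionOrder.Count > 0 && shouldEvict())
        {
            var (key, tick) = insertionOrder.Dequeue();
            // Skip queue items whose row was already removed or replaced.
            if (rows.TryGetValue(key, out var entry) && entry.Tick == tick)
            {
                rows.Remove(key);
                TotalBytes -= entry.Size;
                removed++;
            }
        }
        return removed;
    }
}
```

All three policies share the same oldest-first loop and differ only in the stop condition, so adding another limit would mean adding one more predicate.
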
DataLinq ensures data consistency while allowing mutations through a well-defined process:

- Immutable to Mutable Conversion: The generated `Mutate()` methods (see source-generated Department file) allow conversion from an immutable instance to a mutable one. This is achieved using pattern matching, ensuring the proper type is returned regardless of whether the object is already mutable or not.
- Tracking Changes: The `MutableRowData` class tracks modifications in a dictionary. Methods such as `Reset()` allow reverting changes to the original state, while `HasChanges()` reports whether any properties have been modified.
- Saving Changes: When a mutable instance is saved, the updated data is written back to the backend. Upon successful commit, a new immutable instance is created to replace the old one in the cache. Extension methods in the generated code (e.g., `Save`, `Update`, `InsertOrUpdate`) abstract these operations, providing a seamless developer experience.

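
A compact sketch of the mutation workflow described above, under stated assumptions: the types are hand-written stand-ins for generated code, the "backend" is omitted, and the cache is a plain dictionary. `IDepartment`, `DepartmentSnapshot`, `MutableDepartment`, and the `Save` signature shown here are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public interface IDepartment
{
    string DeptNo { get; }
    string Name { get; }
}

// Immutable snapshot: the only kind of object the cache would hold.
public sealed class DepartmentSnapshot : IDepartment
{
    public string DeptNo { get; }
    public string Name { get; }

    public DepartmentSnapshot(string deptNo, string name)
    {
        DeptNo = deptNo;
        Name = name;
    }
}

// Hypothetical change tracker: pending edits live in a dictionary
// layered over the original immutable values.
public sealed class MutableDepartment : IDepartment
{
    private DepartmentSnapshot original;
    private readonly Dictionary<string, object> changes = new Dictionary<string, object>();

    public MutableDepartment(DepartmentSnapshot original) => this.original = original;

    public string DeptNo => original.DeptNo;

    public string Name
    {
        get => changes.TryGetValue(nameof(Name), out var value) ? (string)value : original.Name;
        set => changes[nameof(Name)] = value;
    }

    public bool HasChanges() => changes.Count > 0;

    // Discards pending edits and falls back to the original values.
    public void Reset() => changes.Clear();

    // After a save, the saved snapshot becomes the new baseline.
    public void Rebase(DepartmentSnapshot saved)
    {
        original = saved;
        changes.Clear();
    }
}

public static class DepartmentExtensions
{
    // Pattern matching returns the right type whether or not the
    // instance is already mutable.
    public static MutableDepartment Mutate(this IDepartment model) => model switch
    {
        MutableDepartment mutable => mutable,
        DepartmentSnapshot snapshot => new MutableDepartment(snapshot),
        _ => throw new ArgumentException("Unsupported model type.", nameof(model)),
    };

    // Stand-in for saving: builds a fresh immutable instance and swaps it
    // into the cache. A real save would write to the backend first.
    public static DepartmentSnapshot Save(this MutableDepartment mutable,
                                          IDictionary<string, DepartmentSnapshot> cache)
    {
        var updated = new DepartmentSnapshot(mutable.DeptNo, mutable.Name);
        cache[updated.DeptNo] = updated;
        mutable.Rebase(updated);
        return updated;
    }
}
```

The original snapshot is never modified: after `Save`, readers holding the old instance still see the old name, while the cache hands out the new one.
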

- Additional Backends: Although initial support focuses on MariaDB and SQLite, the modular design facilitates easy addition of new data sources (e.g., NoSQL, JSON files).
- Enhanced Query Optimizations: Future enhancements could include query caching, more advanced translation strategies, and support for more complex LINQ expressions.
- Developer Contributions: Clear guidelines and extensive test coverage make it easier for contributors to understand and extend the library. Developers are encouraged to review both the generated code and supporting subsystems (caching, mutation, and query translation) for insights.
- Documentation Updates: This technical documentation is intended to evolve with the project. Feedback from developers and contributors is welcomed to ensure that the documentation remains accurate and helpful.

DataLinq’s design centers on immutability, efficient caching, and flexible querying, which makes it well suited to read-heavy applications where performance matters. The separation of concerns between model mapping, caching, mutation, and query translation ensures that each component can be optimized independently while maintaining a consistent developer experience.