This project extends the provided toyDB system, which includes:
- PF (Paged File) Layer for fixed-size page management
- AM (Access Method) Layer implementing B+ tree indexing
- Documentation (pf.ps, am.ps) and sample datasets (data.tar.gz)
Implement a buffer manager with:
- Configurable buffer pool size
- LRU and MRU replacement strategies (selectable at file-open)
- Dirty-page tracking and explicit "mark dirty" API
- Statistics: logical reads/writes, physical I/O
- Plot showing performance under different read/write mixtures
Build a slotted-page mechanism on top of PF to support:
- Variable-length record storage
- Insert, delete, sequential scan
- Performance comparison with static (fixed-size) record storage
- Table summarizing space utilization and performance
Evaluate indexing on the Student file using roll number as key with:
- Bulk index creation on an existing file
- Incremental record-by-record index construction
- Optimized bulk-loading for sorted files
Compare page accesses, build time, and query performance across all three methods.
- Configurable buffer pool size: PF_SetBufferPoolSize(int size)
- LRU replacement strategy: pages moved to head, evicted from tail
- MRU replacement strategy: pages moved to tail, evicted from head
- Strategy selection: PF_OpenFileWithStrategy(fname, strategy)
- Explicit mark dirty: PF_MarkDirty(fd, pagenum)
- Statistics collection: PF_GetStatistics() tracks logical/physical I/O
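The replacement, dirty-tracking, and statistics behaviour listed above can be illustrated with a minimal sketch. It is not the code in buf.c: the names `Pool`, `pool_get`, `POLICY_LRU`, and `POLICY_MRU` are hypothetical, the pool is fixed at three frames, and victims are chosen by comparing access timestamps rather than by maintaining a linked list, which is assumed here only because it is shorter to show.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_FRAMES 3   /* hypothetical tiny pool, for illustration only */

typedef enum { POLICY_LRU, POLICY_MRU } Policy;   /* hypothetical names */

typedef struct {
    int page;            /* page number held, -1 if the frame is free */
    int dirty;           /* set when the page was modified in memory */
    unsigned long tick;  /* logical time of the last access */
} Frame;

typedef struct {
    Frame frames[POOL_FRAMES];
    Policy policy;
    unsigned long clock;
    long logical_reads, physical_reads, physical_writes;
} Pool;

void pool_init(Pool *p, Policy policy) {
    for (int i = 0; i < POOL_FRAMES; i++) {
        p->frames[i].page = -1;
        p->frames[i].dirty = 0;
        p->frames[i].tick = 0;
    }
    p->policy = policy;
    p->clock = 0;
    p->logical_reads = p->physical_reads = p->physical_writes = 0;
}

/* Pick a frame to reuse: a free one if any, else the LRU or MRU frame. */
static int pick_victim(const Pool *p) {
    int victim = 0;
    for (int i = 0; i < POOL_FRAMES; i++) {
        if (p->frames[i].page == -1) return i;
        if (p->policy == POLICY_LRU
                ? p->frames[i].tick < p->frames[victim].tick
                : p->frames[i].tick > p->frames[victim].tick)
            victim = i;
    }
    return victim;
}

/* Request a page; returns the index of the frame that now holds it. */
int pool_get(Pool *p, int page, int will_write) {
    assert(page >= 0);
    p->logical_reads++;                 /* every request is a logical read */
    int f = -1;
    for (int i = 0; i < POOL_FRAMES; i++)
        if (p->frames[i].page == page) { f = i; break; }
    if (f < 0) {                        /* miss: evict, then "read from disk" */
        f = pick_victim(p);
        if (p->frames[f].page != -1 && p->frames[f].dirty)
            p->physical_writes++;       /* write back a dirty victim */
        p->frames[f].page = page;
        p->frames[f].dirty = 0;
        p->physical_reads++;
    }
    if (will_write) p->frames[f].dirty = 1;   /* the "mark dirty" step */
    p->frames[f].tick = ++p->clock;
    return f;
}

/* Returns 1 if the page is currently buffered, 0 otherwise. */
int pool_contains(const Pool *p, int page) {
    for (int i = 0; i < POOL_FRAMES; i++)
        if (p->frames[i].page == page) return 1;
    return 0;
}
```

With the access sequence 1, 2, 3, 1, 4 the LRU pool drops page 2 (the least recently touched), while the MRU pool drops page 1 (the most recently touched). In both cases the counters show five logical reads and four physical reads.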
- Variable-length record storage (reclayer/slotted_page.c)
- Insert, delete, and sequential scan operations
- Automatic page compaction for space efficiency
- Static record management for comparison (reclayer/static_record.c)
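As a rough picture of how a slotted page supports the operations above, the sketch below places a small header and a slot directory at the front of the page and packs record bytes from the back. It is an illustration under assumptions, not the layout used in reclayer/slotted_page.c: the 4096-byte `PAGE_SIZE`, the `PageHeader` and `Slot` structs, and the `sp_*` function names are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096   /* assumed size; toyDB's PF_PAGE_SIZE may differ */

typedef struct { short num_slots; short free_end; } PageHeader;
typedef struct { short offset; short length; } Slot;  /* length < 0: deleted */

/* memcpy-based accessors avoid alignment problems on a raw char buffer. */
static PageHeader get_header(const char *page) {
    PageHeader h; memcpy(&h, page, sizeof h); return h;
}
static void put_header(char *page, PageHeader h) { memcpy(page, &h, sizeof h); }
static Slot get_slot(const char *page, int i) {
    Slot s; memcpy(&s, page + sizeof(PageHeader) + i * sizeof(Slot), sizeof s);
    return s;
}
static void put_slot(char *page, int i, Slot s) {
    memcpy(page + sizeof(PageHeader) + i * sizeof(Slot), &s, sizeof s);
}

void sp_init(char *page) {
    PageHeader h = { 0, PAGE_SIZE };   /* no slots; record area is empty */
    put_header(page, h);
}

/* Free bytes between the end of the slot directory and the record area. */
int sp_free_space(const char *page) {
    PageHeader h = get_header(page);
    return h.free_end - (int)(sizeof(PageHeader) + h.num_slots * sizeof(Slot));
}

/* Returns the new slot id, or -1 if the record does not fit. */
int sp_insert(char *page, const void *rec, short len) {
    PageHeader h = get_header(page);
    if (len <= 0 || sp_free_space(page) < len + (int)sizeof(Slot)) return -1;
    h.free_end -= len;                         /* records grow downward */
    memcpy(page + h.free_end, rec, (size_t)len);
    Slot s = { h.free_end, len };
    put_slot(page, h.num_slots, s);            /* directory grows upward */
    h.num_slots++;
    put_header(page, h);
    return h.num_slots - 1;
}

/* Returns a pointer into the page, or NULL for a bad or deleted slot. */
const char *sp_get(const char *page, int slot, short *len) {
    PageHeader h = get_header(page);
    if (slot < 0 || slot >= h.num_slots) return NULL;
    Slot s = get_slot(page, slot);
    if (s.length < 0) return NULL;
    *len = s.length;
    return page + s.offset;
}

/* Marks the slot deleted; its bytes are reclaimed by sp_compact. */
int sp_delete(char *page, int slot) {
    PageHeader h = get_header(page);
    if (slot < 0 || slot >= h.num_slots) return -1;
    Slot s = get_slot(page, slot);
    if (s.length < 0) return -1;
    s.length = -1;
    put_slot(page, slot, s);
    return 0;
}

/* Repacks live records at the end of the page; slot ids stay stable. */
void sp_compact(char *page) {
    char tmp[PAGE_SIZE];
    PageHeader h = get_header(page);
    short end = PAGE_SIZE;
    memcpy(tmp, page, PAGE_SIZE);
    for (int i = 0; i < h.num_slots; i++) {
        Slot s = get_slot(tmp, i);
        if (s.length < 0) continue;
        end -= s.length;
        memcpy(page + end, tmp + s.offset, (size_t)s.length);
        s.offset = end;
        put_slot(page, i, s);
    }
    h.free_end = end;
    put_header(page, h);
}
```

Because callers hold slot ids rather than byte offsets, compaction can move record bytes freely. This is what lets variable-length records share a page without leaving permanent holes after deletes. A sequential scan in this sketch would simply call `sp_get` for each slot id from 0 up to the slot count and skip the `NULL` results.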
- Bulk index creation: AM_BulkCreateIndex() - reads all records, sorts, builds index
- Incremental construction: AM_IncrementalBuildIndex() - uses existing AM_InsertEntry
- Bulk-loading: AM_BulkLoadIndex() - optimized bottom-up construction for sorted data
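To make "bottom-up construction for sorted data" concrete, here is a minimal in-memory sketch. It packs sorted integer keys into leaves from left to right, then builds each parent level from the smallest key of every child until a single root remains. It is an assumption-laden illustration, not the AM layer's page format or the body of AM_BulkLoadIndex(): the `Node` struct, the `FANOUT` of 4, and the functions `bulk_load`, `tree_contains`, and `tree_free` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define FANOUT 4   /* hypothetical: max keys per leaf, max children per node */

typedef struct Node {
    int is_leaf;
    int nkeys;                   /* leaf: key count; internal: child count */
    int keys[FANOUT];            /* internal: smallest key under each child */
    struct Node *child[FANOUT];  /* unused in leaves */
    struct Node *next;           /* leaf chain for range scans */
} Node;

static Node *new_node(int is_leaf) {
    Node *n = calloc(1, sizeof *n);
    if (!n) abort();             /* sketch: give up on allocation failure */
    n->is_leaf = is_leaf;
    return n;
}

/* Builds a tree from n sorted keys; reports how many nodes were created. */
Node *bulk_load(const int *sorted, int n, int *nodes_created) {
    *nodes_created = 0;
    if (n <= 0) return NULL;
    int count = (n + FANOUT - 1) / FANOUT;
    Node **level = malloc((size_t)count * sizeof *level);
    if (!level) abort();

    /* Step 1: fill leaves left to right; no splits are ever needed. */
    for (int i = 0; i < count; i++) {
        Node *leaf = new_node(1);
        for (int k = i * FANOUT; k < n && k < (i + 1) * FANOUT; k++)
            leaf->keys[leaf->nkeys++] = sorted[k];
        if (i > 0) level[i - 1]->next = leaf;
        level[i] = leaf;
    }
    *nodes_created = count;

    /* Step 2: group children into parents until one root is left. */
    while (count > 1) {
        int parents = (count + FANOUT - 1) / FANOUT;
        for (int i = 0; i < parents; i++) {
            Node *p = new_node(0);
            for (int c = i * FANOUT; c < count && c < (i + 1) * FANOUT; c++) {
                p->keys[p->nkeys] = level[c]->keys[0];
                p->child[p->nkeys++] = level[c];
            }
            level[i] = p;        /* safe: slot i was already consumed */
        }
        *nodes_created += parents;
        count = parents;
    }
    Node *root = level[0];
    free(level);
    return root;
}

/* Descends to the leaf that could hold key and checks for it there. */
int tree_contains(const Node *root, int key) {
    const Node *n = root;
    while (n && !n->is_leaf) {
        int c = 0;
        while (c + 1 < n->nkeys && n->keys[c + 1] <= key) c++;
        n = n->child[c];
    }
    if (!n) return 0;
    for (int i = 0; i < n->nkeys; i++)
        if (n->keys[i] == key) return 1;
    return 0;
}

void tree_free(Node *n) {
    if (!n) return;
    if (!n->is_leaf)
        for (int i = 0; i < n->nkeys; i++) tree_free(n->child[i]);
    free(n);
}
```

Each node is written exactly once, so the number of nodes created grows linearly with the input. Incremental construction, by contrast, walks from the root for every key and may split nodes along the way. For 20 sorted keys with a fanout of 4, this sketch creates 5 leaves, 2 internal nodes, and 1 root.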
- C compiler (cc/gcc)
- Make utility
- Unix-like environment (Linux/Unix/WSL)

    cd toydb/pflayer
    make

This creates pflayer.o, which is used by the AM layer.

    cd ../amlayer
    make

This creates amlayer.o and test executables.

    cd ../../reclayer
    cc -c slotted_page.c -I../toydb/pflayer
    cc -c static_record.c -I../toydb/pflayer

    cd ../toydb/amlayer
    cc -c index_build.c -I.

    cd tests
    make test_buffer_pool
    ./test_buffer_pool

    cd tests
    make test_slotted_vs_static
    ./test_slotted_vs_static

    cd tests
    make test_index_construction
    ./test_index_construction

Tests core PF layer features including buffer pool configuration, LRU/MRU strategies, and statistics collection.
Tests buffer pool behavior with different read/write mixtures and replacement strategies.
Original PF layer test demonstrating page allocation, deallocation, and buffer management.
Tests hash-based page lookup and collision handling along with page allocation, deallocation, and buffer manager behavior.
Compares variable-length slotted-page structure with fixed-size static record management. Shows space utilization and performance metrics.
Tests simple index insertion and scans on character and integer fields.
Tests advanced index operations and scan functionality.
Tests complex index queries and range scans.
Compares three index construction methods:
- Incremental index construction
- Bulk index creation
- Optimized bulk-loading for sorted data
Shows build time and page access comparisons.
.
├── README.md
├── data/
├── reclayer/
│   ├── slotted_page.c
│   └── static_record.c
├── toydb/
│   ├── pflayer/
│   │   ├── pf.c
│   │   ├── buf.c
│   │   ├── hash.c
│   │   ├── pf.h
│   │   └── Makefile
│   └── amlayer/
│       ├── am.c
│       ├── aminsert.c
│       ├── amsearch.c
│       ├── index_build.c
│       └── Makefile
├── tests/
├── resources/
└── screenshots/

    PF_Init();
    PF_SetBufferPoolSize(50);
    int fd = PF_OpenFileWithStrategy("datafile", PF_REPLACE_LRU);
    int fd2 = PF_OpenFileWithStrategy("datafile2", PF_REPLACE_MRU);

    PFstats stats;
    PF_GetStatistics(&stats);
    printf("Logical reads: %ld\n", stats.logical_reads);
    printf("Physical reads: %ld\n", stats.physical_reads);

    char pagebuf[PF_PAGE_SIZE];
    SP_InitPage(pagebuf);
    short slot_id;
    SP_InsertRecord(pagebuf, record_data, record_len, &slot_id);
    SP_GetRecord(pagebuf, slot_id, &data, &len);
    SP_DeleteRecord(pagebuf, slot_id);

    // Bulk creation
    AM_BulkCreateIndex("student", 0, 'i', sizeof(int), read_func, context);
    // Incremental construction
    int page_accesses;
    AM_IncrementalBuildIndex("student", 0, 'i', sizeof(int),
                             read_func, context, &page_accesses);
    // Bulk-loading (pre-sorted data)
    AM_BulkLoadIndex("student", 0, 'i', sizeof(int), sorted_records, num_records);

The project implemented and evaluated several key enhancements to the toyDB system, focusing on buffer management, variable-length record storage, and B+ tree index construction. The work demonstrates measurable benefits from an explicit buffer manager with configurable replacement policies, a slotted-page record layer for efficient variable-length storage, and multiple index-building strategies with clear trade-offs in build time and page accesses.
Key takeaways
- A configurable buffer pool with LRU/MRU policies and dirty-page tracking provides predictable I/O behavior and useful statistics for tuning.
- Slotted-page organization supports variable-length records with compaction and outperforms static layouts in space utilization for datasets with variable-sized fields.
- Different B+ tree construction methods (incremental, bulk create, optimized bulk-load) exhibit distinct advantages: incremental is simple, bulk create helps when sorting is feasible, and bulk-load is best for already-sorted input.
- Empirical plots and test programs validate the implementation and quantify trade-offs across read/write mixes and index construction methods.
Future work
- Add support for multi-threaded concurrency control and page pinning semantics.
- Implement more replacement policies (CLOCK, LFU) and refine adaptive strategies.
- Extend the AM layer to support secondary indexes, composite keys, and concurrency-safe index updates.
- Automate benchmarking harnesses to run reproducible experiments across varying dataset sizes and distributions.
Overall, the enhancements make toyDB a stronger foundation for exploring storage and indexing design decisions and provide a practical platform for further database systems experimentation.
- Sahil Narkhede - B23CS1060
- Anshit Agarwal - B23CS1087











