ToyDB Enhancements

This project extends the provided toyDB system, which includes:

PF (Paged File) Layer for fixed-size page management
AM (Access Method) Layer implementing B+ tree indexing

Documentation (pf.ps, am.ps) and sample datasets (data.tar.gz)

Assignment Requirements

1. Page Buffering (PF Layer)

Implement a buffer manager with:

Configurable buffer pool size
LRU and MRU replacement strategies (selectable at file-open)
Dirty-page tracking and explicit "mark dirty" API
Statistics: logical reads/writes, physical I/O
Plot showing performance under different read/write mixtures

2. Slotted Page Structure

Build a slotted-page mechanism on top of PF to support:

Variable-length record storage
Insert, delete, sequential scan
Performance comparison with static (fixed-size) record storage
Table summarizing space utilization and performance

3. B+ Tree Index Construction (AM Layer)

Evaluate indexing on the Student file using roll number as key with:

Bulk index creation on an existing file
Incremental record-by-record index construction
Optimized bulk-loading for sorted files
Compare page accesses, build time, and query performance across all three methods.

Implementation Summary

Completed Features

1. Enhanced Page Buffering (PF Layer)

Configurable buffer pool size: PF_SetBufferPoolSize(int size)
LRU replacement strategy: Pages moved to head, evicted from tail
MRU replacement strategy: Pages moved to tail, evicted from head
Strategy selection: PF_OpenFileWithStrategy(fname, strategy)
Explicit mark dirty: PF_MarkDirty(fd, pagenum)
Statistics collection: PF_GetStatistics() tracks logical/physical I/O

2. Slotted-Page Structure

Variable-length record storage (reclayer/slotted_page.c)
Insert, delete, and sequential scan operations
Automatic page compaction for space efficiency
Static record management for comparison (reclayer/static_record.c)

3. B+ Tree Index Construction (AM Layer)

Bulk index creation: AM_BulkCreateIndex() - reads all records, sorts, builds index
Incremental construction: AM_IncrementalBuildIndex() - uses existing AM_InsertEntry
Bulk-loading: AM_BulkLoadIndex() - optimized bottom-up construction for sorted data

Compilation Instructions

Prerequisites

C compiler (cc/gcc)
Make utility
Unix-like environment (Linux/Unix/WSL)

Step 1: Build PF Layer

cd toydb/pflayer
make

This creates pflayer.o which is used by the AM layer.

Step 2: Build AM Layer

cd ../amlayer
make

This creates amlayer.o and test executables.

Step 3: Build Record Layer (Optional)

cd ../../reclayer
cc -c slotted_page.c -I../toydb/pflayer
cc -c static_record.c -I../toydb/pflayer

Step 4: Build Index Construction Utilities

cd ../toydb/amlayer
cc -c index_build.c -I.

Running Tests

Test Buffer Pool with Different Strategies

cd tests
make test_buffer_pool
./test_buffer_pool

Test Slotted-Page vs Static Records

cd tests
make test_slotted_vs_static
./test_slotted_vs_static

Test Index Construction Methods

cd tests
make test_index_construction
./test_index_construction

Test Results

PF Layer (Paged File Layer)

Basic Functionality Test (`test_basic`)

Tests core PF layer features including buffer pool configuration, LRU/MRU strategies, and statistics collection.

Buffer Pool Performance Test (`test_buffer_pool`)

Tests buffer pool behavior with different read/write mixtures and replacement strategies.

PF Layer Test (`testpf`)

Original PF layer test demonstrating page allocation, deallocation, and buffer management.

PF Layer Test (`testhash`)

Tests hash-based page lookup and collision handling along with page allocation, deallocation, and buffer manager behavior.

Record Layer (Reclayer)

Slotted-Page vs Static Record Comparison (`test_slotted_vs_static`)

Compares variable-length slotted-page structure with fixed-size static record management. Shows space utilization and performance metrics.

AM Layer (Access Method Layer)

Test 1 (`test1`)

Tests simple index insertion and scans on character and integer fields.

Test 2 (`test2`)

Tests advanced index operations and scan functionality.

Test 3 (`test3`)

Tests complex index queries and range scans.

Index Construction Methods (`test_index_construction`)

Compares three index construction methods:

Incremental index construction
Bulk index creation
Optimized bulk-loading for sorted data

Shows build time and page access comparisons.

Performance Testing

Buffer Pool Performance

Instructions Construction Comparison

Storage Utilization Comparison

File Structure

.
├── README.md             
├── data/               
├── reclayer/            
│   ├── slotted_page.c
│   └── static_record.c
├── toydb/
│   ├── pflayer/         
│   │   ├── pf.c
│   │   ├── buf.c
│   │   ├── hash.c
│   │   ├── pf.h
│   │   └── Makefile
│   └── amlayer/         
│       ├── am.c
│       ├── aminsert.c
│       ├── amsearch.c
│       ├── index_build.c
│       └── Makefile
├── tests/                
├── resources/           
└── screenshots/

API Usage Examples

Buffer Pool Configuration

PF_Init();
PF_SetBufferPoolSize(50);  
int fd = PF_OpenFileWithStrategy("datafile", PF_REPLACE_LRU);
int fd2 = PF_OpenFileWithStrategy("datafile2", PF_REPLACE_MRU);

Statistics Collection

PFstats stats;
PF_GetStatistics(&stats);
printf("Logical reads: %ld\n", stats.logical_reads);
printf("Physical reads: %ld\n", stats.physical_reads);

Slotted-Page Operations

char pagebuf[PF_PAGE_SIZE];
SP_InitPage(pagebuf);
short slot_id;
SP_InsertRecord(pagebuf, record_data, record_len, &slot_id);
SP_GetRecord(pagebuf, slot_id, &data, &len);
SP_DeleteRecord(pagebuf, slot_id);

Index Construction

// Bulk creation
AM_BulkCreateIndex("student", 0, 'i', sizeof(int), read_func, context);

// Incremental construction
int page_accesses;
AM_IncrementalBuildIndex("student", 0, 'i', sizeof(int), 
                         read_func, context, &page_accesses);

// Bulk-loading (pre-sorted data)
AM_BulkLoadIndex("student", 0, 'i', sizeof(int), sorted_records, num_records);

Conclusion

The project implemented and evaluated several key enhancements to the toyDB system, focusing on buffer management, variable-length record storage, and B+ tree index construction. The work demonstrates measurable benefits from an explicit buffer manager with configurable replacement policies, a slotted-page record layer for efficient variable-length storage, and multiple index-building strategies with clear trade-offs in build time and page accesses.

Key takeaways

A configurable buffer pool with LRU/MRU policies and dirty-page tracking provides predictable I/O behavior and useful statistics for tuning.
Slotted-page organization supports variable-length records with compaction and outperforms static layouts in space utilization for datasets with variable-sized fields.
Different B+ tree construction methods (incremental, bulk create, optimized bulk-load) exhibit distinct advantages: incremental is simple, bulk create helps when sorting is feasible, and bulk-load is best for already-sorted input.
Empirical plots and test programs validate the implementation and quantify trade-offs across read/write mixes and index construction methods.

Future work

Add support for multi-threaded concurrency control and page pinning semantics.
Implement more replacement policies (CLOCK, LFU) and refine adaptive strategies.
Extend the AM layer to support secondary indexes, composite keys, and concurrency-safe index updates.
Automate benchmarking harnesses to run reproducible experiments across varying dataset sizes and distributions.

Overall, the enhancements make toyDB a stronger foundation for exploring storage and indexing design decisions and provide a practical platform for further database systems experimentation.

Contributors

Sahil Narkhede - B23CS1060
Anshit Agarwal - B23CS1087

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
data		data
reclayer		reclayer
resources		resources
screenshots		screenshots
tests		tests
toydb		toydb
DBMS_Assignment_1.pdf		DBMS_Assignment_1.pdf
README.md		README.md
toyDB-enhancement-report.pdf		toyDB-enhancement-report.pdf

Sahil-1918912/toyDB

Folders and files

Latest commit

History

Repository files navigation

ToyDB Enhancements

Assignment Requirements

1. Page Buffering (PF Layer)

2. Slotted Page Structure

3. B+ Tree Index Construction (AM Layer)

Implementation Summary

Completed Features

1. Enhanced Page Buffering (PF Layer)

2. Slotted-Page Structure

3. B+ Tree Index Construction (AM Layer)

Compilation Instructions

Prerequisites

Step 1: Build PF Layer

Step 2: Build AM Layer

Step 3: Build Record Layer (Optional)

Step 4: Build Index Construction Utilities

Running Tests

Test Buffer Pool with Different Strategies

Test Slotted-Page vs Static Records

Test Index Construction Methods

Test Results

PF Layer (Paged File Layer)

Basic Functionality Test (test_basic)

Buffer Pool Performance Test (test_buffer_pool)

PF Layer Test (testpf)

PF Layer Test (testhash)

Record Layer (Reclayer)

Slotted-Page vs Static Record Comparison (test_slotted_vs_static)

AM Layer (Access Method Layer)

Test 1 (test1)

Test 2 (test2)

Test 3 (test3)

Index Construction Methods (test_index_construction)

Performance Testing

Buffer Pool Performance

Instructions Construction Comparison

Storage Utilization Comparison

File Structure

API Usage Examples

Buffer Pool Configuration

Statistics Collection

Slotted-Page Operations

Index Construction

Conclusion

Contributors

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Basic Functionality Test (`test_basic`)

Buffer Pool Performance Test (`test_buffer_pool`)

PF Layer Test (`testpf`)

PF Layer Test (`testhash`)

Slotted-Page vs Static Record Comparison (`test_slotted_vs_static`)

Test 1 (`test1`)

Test 2 (`test2`)

Test 3 (`test3`)

Index Construction Methods (`test_index_construction`)

Packages