Skip to content

Sahil-1918912/toyDB

Repository files navigation

ToyDB Enhancements

This project extends the provided toyDB system, which includes:

  • PF (Paged File) Layer for fixed-size page management
  • AM (Access Method) Layer implementing B+ tree indexing

Documentation (pf.ps, am.ps) and sample datasets (data.tar.gz)

Assignment Requirements

1. Page Buffering (PF Layer)

Implement a buffer manager with:

  • Configurable buffer pool size
  • LRU and MRU replacement strategies (selectable at file-open)
  • Dirty-page tracking and explicit "mark dirty" API
  • Statistics: logical reads/writes, physical I/O
  • Plot showing performance under different read/write mixtures

2. Slotted Page Structure

Build a slotted-page mechanism on top of PF to support:

  • Variable-length record storage
  • Insert, delete, sequential scan
  • Performance comparison with static (fixed-size) record storage
  • Table summarizing space utilization and performance

3. B+ Tree Index Construction (AM Layer)

Evaluate indexing on the Student file using roll number as key with:

  • Bulk index creation on an existing file
  • Incremental record-by-record index construction
  • Optimized bulk-loading for sorted files
  • Compare page accesses, build time, and query performance across all three methods.

Implementation Summary

Completed Features

1. Enhanced Page Buffering (PF Layer)

  • Configurable buffer pool size: PF_SetBufferPoolSize(int size)
  • LRU replacement strategy: Pages moved to head, evicted from tail
  • MRU replacement strategy: Pages moved to tail, evicted from head
  • Strategy selection: PF_OpenFileWithStrategy(fname, strategy)
  • Explicit mark dirty: PF_MarkDirty(fd, pagenum)
  • Statistics collection: PF_GetStatistics() tracks logical/physical I/O

2. Slotted-Page Structure

  • Variable-length record storage (reclayer/slotted_page.c)
  • Insert, delete, and sequential scan operations
  • Automatic page compaction for space efficiency
  • Static record management for comparison (reclayer/static_record.c)

3. B+ Tree Index Construction (AM Layer)

  • Bulk index creation: AM_BulkCreateIndex() - reads all records, sorts, builds index
  • Incremental construction: AM_IncrementalBuildIndex() - uses existing AM_InsertEntry
  • Bulk-loading: AM_BulkLoadIndex() - optimized bottom-up construction for sorted data

Compilation Instructions

Prerequisites

  • C compiler (cc/gcc)
  • Make utility
  • Unix-like environment (Linux/Unix/WSL)

Step 1: Build PF Layer

cd toydb/pflayer
make

This creates pflayer.o which is used by the AM layer.

Step 2: Build AM Layer

cd ../amlayer
make

This creates amlayer.o and test executables.

Step 3: Build Record Layer (Optional)

cd ../../reclayer
cc -c slotted_page.c -I../toydb/pflayer
cc -c static_record.c -I../toydb/pflayer

Step 4: Build Index Construction Utilities

cd ../toydb/amlayer
cc -c index_build.c -I.

Running Tests

Test Buffer Pool with Different Strategies

cd tests
make test_buffer_pool
./test_buffer_pool

Test Slotted-Page vs Static Records

cd tests
make test_slotted_vs_static
./test_slotted_vs_static

Test Index Construction Methods

cd tests
make test_index_construction
./test_index_construction

Test Results

PF Layer (Paged File Layer)

Basic Functionality Test (test_basic)

Tests core PF layer features including buffer pool configuration, LRU/MRU strategies, and statistics collection.

Basic Test

Buffer Pool Performance Test (test_buffer_pool)

Tests buffer pool behavior with different read/write mixtures and replacement strategies.

Buffer Pool Test

PF Layer Test (testpf)

Original PF layer test demonstrating page allocation, deallocation, and buffer management.

PF Layer Test

PF Layer Test (testhash)

Tests hash-based page lookup and collision handling along with page allocation, deallocation, and buffer manager behavior.

PF Layer Test

Record Layer (Reclayer)

Slotted-Page vs Static Record Comparison (test_slotted_vs_static)

Compares variable-length slotted-page structure with fixed-size static record management. Shows space utilization and performance metrics.

Slotted vs Static Test

AM Layer (Access Method Layer)

Test 1 (test1)

Tests simple index insertion and scans on character and integer fields.

AM Test 1

Test 2 (test2)

Tests advanced index operations and scan functionality.

AM Test 2

Test 3 (test3)

Tests complex index queries and range scans.

AM Test 3

Index Construction Methods (test_index_construction)

Compares three index construction methods:

  • Incremental index construction
  • Bulk index creation
  • Optimized bulk-loading for sorted data

Shows build time and page access comparisons.

Index Construction Test

Performance Testing

Buffer Pool Performance

Buffer Pool Performance

Instructions Construction Comparison

Index Construction Comparison

Storage Utilization Comparison

Storage Utilization Comparison

File Structure

.
├── README.md             
├── data/               
├── reclayer/            
│   ├── slotted_page.c
│   └── static_record.c
├── toydb/
│   ├── pflayer/         
│   │   ├── pf.c
│   │   ├── buf.c
│   │   ├── hash.c
│   │   ├── pf.h
│   │   └── Makefile
│   └── amlayer/         
│       ├── am.c
│       ├── aminsert.c
│       ├── amsearch.c
│       ├── index_build.c
│       └── Makefile
├── tests/                
├── resources/           
└── screenshots/      

API Usage Examples

Buffer Pool Configuration

PF_Init();
PF_SetBufferPoolSize(50);  
int fd = PF_OpenFileWithStrategy("datafile", PF_REPLACE_LRU);
int fd2 = PF_OpenFileWithStrategy("datafile2", PF_REPLACE_MRU);

Statistics Collection

PFstats stats;
PF_GetStatistics(&stats);
printf("Logical reads: %ld\n", stats.logical_reads);
printf("Physical reads: %ld\n", stats.physical_reads);

Slotted-Page Operations

char pagebuf[PF_PAGE_SIZE];
SP_InitPage(pagebuf);
short slot_id;
SP_InsertRecord(pagebuf, record_data, record_len, &slot_id);
SP_GetRecord(pagebuf, slot_id, &data, &len);
SP_DeleteRecord(pagebuf, slot_id);

Index Construction

// Bulk creation
AM_BulkCreateIndex("student", 0, 'i', sizeof(int), read_func, context);

// Incremental construction
int page_accesses;
AM_IncrementalBuildIndex("student", 0, 'i', sizeof(int), 
                         read_func, context, &page_accesses);

// Bulk-loading (pre-sorted data)
AM_BulkLoadIndex("student", 0, 'i', sizeof(int), sorted_records, num_records);

Conclusion

The project implemented and evaluated several key enhancements to the toyDB system, focusing on buffer management, variable-length record storage, and B+ tree index construction. The work demonstrates measurable benefits from an explicit buffer manager with configurable replacement policies, a slotted-page record layer for efficient variable-length storage, and multiple index-building strategies with clear trade-offs in build time and page accesses.

Key takeaways

  • A configurable buffer pool with LRU/MRU policies and dirty-page tracking provides predictable I/O behavior and useful statistics for tuning.
  • Slotted-page organization supports variable-length records with compaction and outperforms static layouts in space utilization for datasets with variable-sized fields.
  • Different B+ tree construction methods (incremental, bulk create, optimized bulk-load) exhibit distinct advantages: incremental is simple, bulk create helps when sorting is feasible, and bulk-load is best for already-sorted input.
  • Empirical plots and test programs validate the implementation and quantify trade-offs across read/write mixes and index construction methods.

Future work

  • Add support for multi-threaded concurrency control and page pinning semantics.
  • Implement more replacement policies (CLOCK, LFU) and refine adaptive strategies.
  • Extend the AM layer to support secondary indexes, composite keys, and concurrency-safe index updates.
  • Automate benchmarking harnesses to run reproducible experiments across varying dataset sizes and distributions.

Overall, the enhancements make toyDB a stronger foundation for exploring storage and indexing design decisions and provide a practical platform for further database systems experimentation.

Contributors

  1. Sahil Narkhede - B23CS1060
  2. Anshit Agarwal - B23CS1087

About

A customized database management system for the Database Systems [CSL3050] course

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •