Skip to content

mohosy/json-recursive-descent-parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

JSON Recursive Descent Parser

TypeScript License Node.js Zero Dependencies

A fully-featured JSON parser built from scratch using recursive descent parsing. This project demonstrates compiler fundamentals including lexical analysis (tokenization) and syntactic analysis (parsing) β€” core concepts used in production parsers at companies like Google and Meta.

No external parsing libraries. Just pure TypeScript implementing the RFC 8259 JSON specification.

Why This Project?

Recursive descent parsing is a fundamental technique used in:

  • Compilers (GCC, Clang, V8 JavaScript engine)
  • Interpreters (Python, Ruby)
  • Configuration parsers (JSON, YAML, TOML)
  • Query languages (SQL, GraphQL)

This implementation showcases:

  • Clean separation between lexer and parser
  • Comprehensive error handling with line/column information
  • Full JSON spec compliance including Unicode escapes
  • 100+ test cases verifying correctness

Demo Output

╔══════════════════════════════════════════════════════════════╗
β•‘     JSON Recursive Descent Parser - Demo                     β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Demo 1: Basic JSON Parsing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Input JSON:
{
  "name": "Mo Shirmohammadi",
  "university": "USC",
  "skills": ["Python", "TypeScript", "React", "Node.js"],
  "seeking_internship": true
}

Parsed Result:
  Name: Mo Shirmohammadi
  Skills: Python, TypeScript, React, Node.js

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Demo 2: Tokenization (Lexical Analysis)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Input: {"count": 42, "active": true}

Tokens:
  1. LEFT_BRACE      | value: "{"
  2. STRING          | value: "count"
  3. COLON           | value: ":"
  4. NUMBER          | value: 42
  5. COMMA           | value: ","
  6. STRING          | value: "active"
  7. COLON           | value: ":"
  8. TRUE            | value: true
  9. RIGHT_BRACE     | value: "}"
  10. EOF            | value: null

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Demo 6: Error Handling
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Missing value:
    Input: {"key": }
    Error: Unexpected token '}' at line 1, column 9

  Trailing comma:
    Input: {"key": "value",}
    Error: Trailing comma not allowed in JSON at line 1, column 17

  Unquoted key:
    Input: {key: "value"}
    Error: Unexpected identifier 'key' at line 1, column 1

Installation

git clone https://github.com/mohosy/json-recursive-descent-parser.git
cd json-recursive-descent-parser
npm install
npm run build

Usage

Basic Parsing

import { parse } from './dist/index.js';

const data = parse('{"name": "Mo", "age": 21, "skills": ["TypeScript", "Python"]}');
console.log(data.name);   // "Mo"
console.log(data.skills); // ["TypeScript", "Python"]

Tokenization

import { tokenize } from './dist/index.js';

const tokens = tokenize('{"key": 42}');
// Returns array of tokens with type, value, line, and column info

Validation

import { isValid, parseWithError } from './dist/index.js';

console.log(isValid('{"valid": true}'));  // true
console.log(isValid('{invalid}'));        // false

// Get detailed error info
const result = parseWithError('{invalid}');
if (!result.success) {
  console.log(result.error);
  // { message: "...", line: 1, column: 2 }
}

Run Demo

npm run demo

Run Tests

npm test

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         Input String                            β”‚
β”‚                    '{"name": "Mo", "age": 21}'                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       LEXER (Tokenizer)                         β”‚
β”‚                                                                 β”‚
β”‚  Converts characters into tokens:                               β”‚
β”‚  β€’ Structural: { } [ ] : ,                                      β”‚
β”‚  β€’ Strings: "..." with escape handling                          β”‚
β”‚  β€’ Numbers: integers, decimals, exponents                       β”‚
β”‚  β€’ Keywords: true, false, null                                  β”‚
β”‚                                                                 β”‚
β”‚  Time: O(n)  |  Space: O(n)                                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         Token Stream                            β”‚
β”‚  [LEFT_BRACE, STRING("name"), COLON, STRING("Mo"), ...]        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 PARSER (Recursive Descent)                      β”‚
β”‚                                                                 β”‚
β”‚  Grammar rules map to methods:                                  β”‚
β”‚  β€’ parseValue()  β†’ Dispatches to type-specific parsers          β”‚
β”‚  β€’ parseObject() β†’ Handles { key: value, ... }                  β”‚
β”‚  β€’ parseArray()  β†’ Handles [ value, value, ... ]                β”‚
β”‚                                                                 β”‚
β”‚  Time: O(n)  |  Space: O(d) where d = nesting depth             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      JavaScript Value                           β”‚
β”‚             { name: "Mo", age: 21 }                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

File Structure

json-recursive-descent-parser/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ types.ts       # Token types, JSON types, error classes
β”‚   β”œβ”€β”€ lexer.ts       # Tokenizer - converts string to tokens
β”‚   β”œβ”€β”€ parser.ts      # Recursive descent parser
β”‚   β”œβ”€β”€ index.ts       # Public API (parse, tokenize, isValid)
β”‚   β”œβ”€β”€ demo.ts        # Interactive demonstration
β”‚   └── tests/
β”‚       └── parser.test.ts  # Comprehensive test suite
β”œβ”€β”€ package.json
β”œβ”€β”€ tsconfig.json
└── README.md

JSON Grammar

The parser implements this grammar (simplified BNF):

json    β†’ value
value   β†’ object | array | STRING | NUMBER | "true" | "false" | "null"
object  β†’ "{" (pair ("," pair)*)? "}"
pair    β†’ STRING ":" value
array   β†’ "[" (value ("," value)*)? "]"

Each grammar rule maps directly to a method in the parser:

  • parseValue() β†’ Handles the value rule
  • parseObject() β†’ Handles the object rule
  • parseArray() β†’ Handles the array rule

This 1:1 mapping between grammar and code is what makes recursive descent parsing so elegant and maintainable.

Features

Full JSON Specification Support

  • βœ… Objects with string keys
  • βœ… Arrays with mixed types
  • βœ… Strings with all escape sequences (\n, \t, \\, \", \/, \b, \f, \r)
  • βœ… Unicode escapes (\uXXXX)
  • βœ… Numbers (integers, decimals, scientific notation)
  • βœ… Booleans (true, false)
  • βœ… Null values

Error Handling

  • βœ… Precise line and column numbers
  • βœ… Descriptive error messages
  • βœ… Trailing comma detection
  • βœ… Invalid escape sequence detection
  • βœ… Unterminated string detection
  • βœ… Invalid number format detection

Developer Experience

  • βœ… TypeScript with full type definitions
  • βœ… Zero runtime dependencies
  • βœ… Separate lexer/parser for debugging
  • βœ… Validation utilities

Complexity Analysis

Operation Time Space
Tokenize O(n) O(n)
Parse O(n) O(d)
Overall O(n) O(n)

Where:

  • n = input string length
  • d = maximum nesting depth

Why Recursive Descent?

Pros:

  • Simple and intuitive β€” grammar rules map directly to code
  • Easy to debug β€” call stack shows parse state
  • Excellent error messages β€” you know exactly where you are
  • No external tools needed β€” just code

Cons:

  • Not suitable for left-recursive grammars (not an issue for JSON)
  • Stack depth limited by nesting (handles 1000+ levels easily)

Alternatives:

  • LL(k) parser generators (ANTLR) β€” more powerful but require external tools
  • LR parsers (Bison, Yacc) β€” handle more grammars but harder to understand
  • PEG parsers β€” similar to recursive descent with memoization

For JSON's simple grammar, recursive descent is the perfect choice.

Running the Project

# Install dependencies
npm install

# Build TypeScript
npm run build

# Run interactive demo
npm run demo

# Run test suite
npm test

Author

Mo Shirmohammadi

  • GitHub: @mohosy
  • University: University of Southern California (USC)
  • Major: Computer Science

License

MIT License - feel free to use this code for learning, interviews, or building your own parsers!


Built from scratch to demonstrate compiler fundamentals. No parsing libraries were harmed in the making of this project.

About

JSON parser with tokenizer and recursive descent parsing. Zero dependencies.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors