A fully-featured JSON parser built from scratch using recursive descent parsing. This project demonstrates compiler fundamentals including lexical analysis (tokenization) and syntactic analysis (parsing) β core concepts used in production parsers at companies like Google and Meta.
No external parsing libraries. Just pure TypeScript implementing the RFC 8259 JSON specification.
Recursive descent parsing is a fundamental technique used in:
- Compilers (GCC, Clang, V8 JavaScript engine)
- Interpreters (Python, Ruby)
- Configuration parsers (JSON, YAML, TOML)
- Query languages (SQL, GraphQL)
This implementation showcases:
- Clean separation between lexer and parser
- Comprehensive error handling with line/column information
- Full JSON spec compliance including Unicode escapes
- 100+ test cases verifying correctness
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β JSON Recursive Descent Parser - Demo β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Demo 1: Basic JSON Parsing
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Input JSON:
{
"name": "Mo Shirmohammadi",
"university": "USC",
"skills": ["Python", "TypeScript", "React", "Node.js"],
"seeking_internship": true
}
Parsed Result:
Name: Mo Shirmohammadi
Skills: Python, TypeScript, React, Node.js
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Demo 2: Tokenization (Lexical Analysis)
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Input: {"count": 42, "active": true}
Tokens:
1. LEFT_BRACE | value: "{"
2. STRING | value: "count"
3. COLON | value: ":"
4. NUMBER | value: 42
5. COMMA | value: ","
6. STRING | value: "active"
7. COLON | value: ":"
8. TRUE | value: true
9. RIGHT_BRACE | value: "}"
10. EOF | value: null
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Demo 6: Error Handling
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Missing value:
Input: {"key": }
Error: Unexpected token '}' at line 1, column 9
Trailing comma:
Input: {"key": "value",}
Error: Trailing comma not allowed in JSON at line 1, column 17
Unquoted key:
Input: {key: "value"}
Error: Unexpected identifier 'key' at line 1, column 1
git clone https://github.com/mohosy/json-recursive-descent-parser.git
cd json-recursive-descent-parser
npm install
npm run buildimport { parse } from './dist/index.js';
const data = parse('{"name": "Mo", "age": 21, "skills": ["TypeScript", "Python"]}');
console.log(data.name); // "Mo"
console.log(data.skills); // ["TypeScript", "Python"]import { tokenize } from './dist/index.js';
const tokens = tokenize('{"key": 42}');
// Returns array of tokens with type, value, line, and column infoimport { isValid, parseWithError } from './dist/index.js';
console.log(isValid('{"valid": true}')); // true
console.log(isValid('{invalid}')); // false
// Get detailed error info
const result = parseWithError('{invalid}');
if (!result.success) {
console.log(result.error);
// { message: "...", line: 1, column: 2 }
}npm run demonpm testβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Input String β
β '{"name": "Mo", "age": 21}' β
βββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β LEXER (Tokenizer) β
β β
β Converts characters into tokens: β
β β’ Structural: { } [ ] : , β
β β’ Strings: "..." with escape handling β
β β’ Numbers: integers, decimals, exponents β
β β’ Keywords: true, false, null β
β β
β Time: O(n) | Space: O(n) β
βββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Token Stream β
β [LEFT_BRACE, STRING("name"), COLON, STRING("Mo"), ...] β
βββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β PARSER (Recursive Descent) β
β β
β Grammar rules map to methods: β
β β’ parseValue() β Dispatches to type-specific parsers β
β β’ parseObject() β Handles { key: value, ... } β
β β’ parseArray() β Handles [ value, value, ... ] β
β β
β Time: O(n) | Space: O(d) where d = nesting depth β
βββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β JavaScript Value β
β { name: "Mo", age: 21 } β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
json-recursive-descent-parser/
βββ src/
β βββ types.ts # Token types, JSON types, error classes
β βββ lexer.ts # Tokenizer - converts string to tokens
β βββ parser.ts # Recursive descent parser
β βββ index.ts # Public API (parse, tokenize, isValid)
β βββ demo.ts # Interactive demonstration
β βββ tests/
β βββ parser.test.ts # Comprehensive test suite
βββ package.json
βββ tsconfig.json
βββ README.md
The parser implements this grammar (simplified BNF):
json β value
value β object | array | STRING | NUMBER | "true" | "false" | "null"
object β "{" (pair ("," pair)*)? "}"
pair β STRING ":" value
array β "[" (value ("," value)*)? "]"
Each grammar rule maps directly to a method in the parser:
parseValue()β Handles thevalueruleparseObject()β Handles theobjectruleparseArray()β Handles thearrayrule
This 1:1 mapping between grammar and code is what makes recursive descent parsing so elegant and maintainable.
- β Objects with string keys
- β Arrays with mixed types
- β
Strings with all escape sequences (
\n,\t,\\,\",\/,\b,\f,\r) - β
Unicode escapes (
\uXXXX) - β Numbers (integers, decimals, scientific notation)
- β
Booleans (
true,false) - β Null values
- β Precise line and column numbers
- β Descriptive error messages
- β Trailing comma detection
- β Invalid escape sequence detection
- β Unterminated string detection
- β Invalid number format detection
- β TypeScript with full type definitions
- β Zero runtime dependencies
- β Separate lexer/parser for debugging
- β Validation utilities
| Operation | Time | Space |
|---|---|---|
| Tokenize | O(n) | O(n) |
| Parse | O(n) | O(d) |
| Overall | O(n) | O(n) |
Where:
n= input string lengthd= maximum nesting depth
Pros:
- Simple and intuitive β grammar rules map directly to code
- Easy to debug β call stack shows parse state
- Excellent error messages β you know exactly where you are
- No external tools needed β just code
Cons:
- Not suitable for left-recursive grammars (not an issue for JSON)
- Stack depth limited by nesting (handles 1000+ levels easily)
Alternatives:
- LL(k) parser generators (ANTLR) β more powerful but require external tools
- LR parsers (Bison, Yacc) β handle more grammars but harder to understand
- PEG parsers β similar to recursive descent with memoization
For JSON's simple grammar, recursive descent is the perfect choice.
# Install dependencies
npm install
# Build TypeScript
npm run build
# Run interactive demo
npm run demo
# Run test suite
npm testMo Shirmohammadi
- GitHub: @mohosy
- University: University of Southern California (USC)
- Major: Computer Science
MIT License - feel free to use this code for learning, interviews, or building your own parsers!
Built from scratch to demonstrate compiler fundamentals. No parsing libraries were harmed in the making of this project.