A simple console / WPF application to parse and analyze PDF files using the UglyToad.PdfPig library.
- ✅ Reads a PDF file specified by the user
- ✅ Extracts words, cleans punctuation, and normalizes text
- ✅ Counts the frequency of each word (case-insensitive)
- ✅ Displays a sorted frequency list in the console
- ✅ Analyzes word frequencies with or without a stop-word list
- ✅ Analyzes only selected words specified by the user
- ✅ Exports selected word frequencies to a JSON report
- ✅ Interactive console menu with options to analyze & export
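
The extraction, normalization, and counting steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `WordFrequency` class and its `Normalize`, `Count`, and `Sorted` methods are hypothetical names, and the tokens are assumed to be plain strings already pulled from the PDF (for example from the `Text` of each word that PdfPig returns for a page).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; names are illustrative only.
public static class WordFrequency
{
    // Strips leading/trailing non-alphanumeric characters and lower-cases
    // the token, so "Lorem," and "lorem" are counted as the same word.
    public static string Normalize(string token)
    {
        int start = 0, end = token.Length;
        while (start < end && !char.IsLetterOrDigit(token[start])) start++;
        while (end > start && !char.IsLetterOrDigit(token[end - 1])) end--;
        return token.Substring(start, end - start).ToLowerInvariant();
    }

    // Counts normalized words, skipping empty tokens and anything in
    // stopWords. Pass an empty set to analyze without a stop-word list.
    public static Dictionary<string, int> Count(
        IEnumerable<string> tokens, ISet<string> stopWords)
    {
        var counts = new Dictionary<string, int>();
        foreach (var token in tokens)
        {
            var word = Normalize(token);
            if (word.Length == 0 || stopWords.Contains(word)) continue;
            counts.TryGetValue(word, out var current); // current is 0 if absent
            counts[word] = current + 1;
        }
        return counts;
    }

    // Orders by descending count, then alphabetically for stable output.
    public static IEnumerable<KeyValuePair<string, int>> Sorted(
        Dictionary<string, int> counts) =>
        counts.OrderByDescending(p => p.Value)
              .ThenBy(p => p.Key, StringComparer.Ordinal);
}
```

With this sketch, `WordFrequency.Count(new[] { "Lorem,", "lorem", "ipsum." }, new HashSet<string>())` would count `lorem` twice and `ipsum` once. Restricting the analysis to user-selected words could be done by filtering the resulting dictionary by key.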

Example console output:

    Word frequency:
    lorem — 42
    ipsum — 39
    dolor — 35
    ...
    Processing completed.

Example JSON report:

    {
      "lorem": 42,
      "ipsum": 39,
      "dolor": 35
    }

Run the application:

    dotnet run

Menu options:

- Analyze and show all word frequencies in a PDF file
- Analyze and show only selected words in a PDF file
- Analyze word frequencies without predefined stop words
- Export the latest result to JSON
- Exit the application
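
The JSON export option could be implemented along these lines with `System.Text.Json`, which ships with current .NET SDKs. This is a hedged sketch rather than the project's real exporter: the `JsonReport` class, its method names, and the choice to report a count of 0 for selected words that never occur are all assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical exporter; names and behavior are illustrative only.
public static class JsonReport
{
    // Builds an indented JSON object mapping each selected word to its count.
    // Assumption: a selected word missing from counts is reported as 0.
    public static string Build(
        IReadOnlyDictionary<string, int> counts, IEnumerable<string> selected)
    {
        var report = new Dictionary<string, int>();
        foreach (var word in selected)
        {
            var key = word.ToLowerInvariant(); // match case-insensitive counting
            report[key] = counts.TryGetValue(key, out var n) ? n : 0;
        }
        var options = new JsonSerializerOptions { WriteIndented = true };
        return JsonSerializer.Serialize(report, options);
    }

    // Writes the report text to the given file path.
    public static void Save(string path, string json) =>
        File.WriteAllText(path, json);
}
```

A caller might pass the most recent frequency dictionary and the user's word list to `JsonReport.Build`, then hand the result to `JsonReport.Save` with a file name of its choosing.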

This is a working prototype focused on word frequency analysis.
The codebase is modular and designed to be extended. Planned additions:
- Export to CSV or Excel formats
- Improved error handling and logging

Developed as a practical C# project to demonstrate PDF parsing, user interaction, and modular architecture.