This project compares the performance of Horspool's and Boyer-Moore's string matching algorithms.
We test:
- Real-world data using the IMDB Movie Review dataset
- Synthetic data with varying levels of noise
Both accuracy and efficiency are measured.
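
As a rough illustration of how accuracy and efficiency could be measured together, here is a minimal sketch of Horspool's search that also counts character comparisons. It is not the code in horspool.py; the names `horspool_search` and `naive_positions` are hypothetical, and counting comparisons is only one assumed way to measure efficiency (timing is another).

```python
from typing import Dict, List, Tuple


def horspool_search(text: str, pattern: str) -> Tuple[List[int], int]:
    """Return (start positions of all matches, number of character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0

    # Bad-character table: for each character in pattern[:-1], the distance
    # from its rightmost occurrence to the end of the pattern.
    shift: Dict[str, int] = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    matches: List[int] = []
    comparisons = 0
    pos = 0
    while pos <= n - m:
        # Compare the pattern against the current window, right to left.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift by the table entry for the window's last character,
        # or by the full pattern length if it does not occur in pattern[:-1].
        pos += shift.get(text[pos + m - 1], m)
    return matches, comparisons


def naive_positions(text: str, pattern: str) -> List[int]:
    """Reference result (assumed accuracy baseline): check every position."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


if __name__ == "__main__":
    sample = "abracadabra"
    found, cost = horspool_search(sample, "abra")
    print("matches:", found, "comparisons:", cost)
    print("agrees with naive search:", found == naive_positions(sample, "abra"))
```

Comparing the result against a brute-force search gives a simple correctness check, and the comparison count gives a machine-independent cost figure that can sit alongside wall-clock timings.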
Project files:
- main.py: Tests on IMDB data
- main_synthetic.py: Tests on synthetic data
- synthetic_noise_experiment.py: Tests noise on synthetic data
- horspool.py: Horspool algorithm
- boyer_moore.py: Boyer-Moore algorithm
- download_IMDB.py: IMDB dataset downloader
- datasets/: Experiment results and graphs
Install the dependencies:

    pip install -r requirements.txt

First, run main.py. This downloads and extracts the IMDB dataset, then runs the analysis on it.

    # Run IMDB experiments
    python main.py

To run a preliminary test on synthetic data, run main_synthetic.py.

    # Run synthetic experiments
    python main_synthetic.py

To analyze the effect of noise on synthetically generated data, run synthetic_noise_experiment.py.

    # Run synthetic noise experiment
    python synthetic_noise_experiment.py
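
This README does not define "noise", so the following is only a sketch under an assumed definition: each character of the text is independently replaced by a random lowercase letter with probability `noise_level`. The function name `add_noise`, its parameters, and the alphabet are all hypothetical and may differ from what synthetic_noise_experiment.py actually does.

```python
import random
import string
from typing import Optional


def add_noise(text: str, noise_level: float, seed: Optional[int] = None) -> str:
    """Replace each character with a random lowercase letter with
    probability noise_level (assumed noise model, not the project's)."""
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    rng = random.Random(seed)  # local generator so runs are reproducible
    alphabet = string.ascii_lowercase
    return "".join(
        rng.choice(alphabet) if rng.random() < noise_level else ch
        for ch in text
    )


if __name__ == "__main__":
    clean = "the quick brown fox jumps over the lazy dog"
    for level in (0.0, 0.1, 0.5):
        print(level, add_noise(clean, level, seed=42))
```

Passing a fixed seed keeps each noise level reproducible, which makes results from different algorithms comparable on exactly the same noisy input.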