Skip to content

Latest commit

 

History

History
66 lines (47 loc) · 1.69 KB

README.md

File metadata and controls

66 lines (47 loc) · 1.69 KB

Minimum Edit Distance

The similarity between two strings may be measured in many ways. One of such a string metric is known as the Levenshtein distance, which is a type of edit distance.

The edit distance between two strings is the minimum number of single-character insertions, deletions, or substitutions required to change one string into the other.

The class Levenshtein.py implements the Wagner–Fischer algorithm allowing to pass varying costs for insertion, deletion and substitution.

For more info on Levenshtein distance, refer wiki.

Requirements

  • Python 3.6+
  • No dependencies on any other library

Usage

from Levenshtein import Levenshtein

source = "hello"
target = "world"

l = Levenshtein(source, target, costs=(1, 1, 2))

min_distance = l.distance()
# min_distance = 4

operations = l.edit_ops()
# operations =
# [{'type': 'Substitution', 'i': 0, 'j': 0},
#  {'type': 'Substitution', 'i': 1, 'j': 1},
#  {'type': 'Substitution', 'i': 2, 'j': 2},
#  {'type': 'Match',        'i': 3, 'j': 3},
#  {'type': 'Substitution', 'i': 4, 'j': 4}]

l.print_distance_matrix()
# Distance Matrix:
# -  -  w  o  r  l  d
# -  0  2  4  6  8 10
# h  2  1  3  5  7  9
# e  4  3  2  4  6  8
# l  6  5  4  3  4  6
# l  8  7  6  5  3  5
# o 10  9  7  7  5  4

l.print_edit_ops()
# Edit Operations:
# Type           i  j
# --------------------
# Substitution   0  0
# Substitution   1  1
# Substitution   2  2
# Match          3  3
# Substitution   4  4

Check out Levenshtein.py for more details.

Authors

Mayank Jain

License

MIT