Kaggle function datasets:
https://www.kaggle.com/datasets/aleskis/typescriptfunctions
https://www.kaggle.com/datasets/aleskis/javascriptfunctions
https://www.kaggle.com/datasets/aleskis/python-functions
https://www.kaggle.com/datasets/aleskis/golang-functions

Possibly investigate using an autoencoder (scikit-learn, perhaps?).
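scikit-learn has no dedicated autoencoder class. A common workaround is to train an MLPRegressor to reproduce its own input through a narrow hidden layer. The sketch below rests on several assumptions. The four `functions` strings are hypothetical stand-ins, because the layout and column names of the Kaggle datasets are not described here. The character n-gram TF-IDF features, the (32, 8, 32) layer sizes, and the `encode` helper are illustrative choices, not a settled design. The sketch could be used to get compact function embeddings or to flag unusual functions by reconstruction error; which of these, if either, is the goal is also an assumption.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-ins for function bodies; the real datasets' format is not
# described in these notes, so loading them is left out.
functions = [
    "function add(a, b) { return a + b; }",
    "def add(a, b):\n    return a + b",
    "func add(a int, b int) int { return a + b }",
    "const add = (a: number, b: number): number => a + b;",
]

# Character n-grams work across TypeScript, JavaScript, Python and Go
# without a language-specific tokenizer (an assumed featurization).
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), max_features=256)
X = vectorizer.fit_transform(functions).toarray()

# Autoencoder via MLPRegressor: input and target are both X, and the
# 8-unit middle layer acts as the bottleneck.
autoencoder = MLPRegressor(
    hidden_layer_sizes=(32, 8, 32), activation="relu", max_iter=2000, random_state=0
)
autoencoder.fit(X, X)


def encode(model, data):
    """Hypothetical helper: forward pass up to the bottleneck (2nd hidden layer)."""
    h = data
    for W, b in zip(model.coefs_[:2], model.intercepts_[:2]):
        h = np.maximum(h @ W + b, 0.0)  # ReLU, matching activation="relu"
    return h


codes = encode(autoencoder, X)  # one 8-dimensional code per function
errors = np.mean((autoencoder.predict(X) - X) ** 2, axis=1)  # per-function MSE
print(codes.shape)  # (4, 8)
print(errors)
```

MLPRegressor may emit a ConvergenceWarning on data this small. If the real datasets are large, a dedicated deep-learning library would likely be a better fit than scikit-learn's MLP.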