maximtrp/tmplot

tmplot

tmplot is a Python package for analyzing and visualizing topic models. Aimed at data scientists and researchers, it provides interactive reports and analysis tools that go beyond what LDAvis/pyLDAvis offer.

Analyze, visualize, and compare multiple topic models with ease.

Key Features

Interactive Visualization

  • Topic scatter plots with customizable coordinates and sizing
  • Term probability charts with relevance weighting
  • Document analysis showing top documents per topic
  • Interactive reports with real-time parameter adjustment
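Relevance weighting in LDAvis-style term charts ranks a term w for topic t by λ·log P(w|t) + (1 − λ)·log(P(w|t) / P(w)), where λ = 1 ranks by raw topic probability and smaller λ favours terms specific to the topic. The NumPy sketch below computes that score; the function name `relevance`, the terms × topics layout of `phi`, and the toy numbers are assumptions for illustration, not tmplot's API. λ = 0.6 is used as a commonly cited default.

```python
import numpy as np

def relevance(phi: np.ndarray, term_probs: np.ndarray, lambda_: float = 0.6) -> np.ndarray:
    """Hypothetical helper: relevance of every term for every topic.

    phi: terms x topics matrix, each column a P(term | topic) distribution (assumed layout).
    term_probs: marginal P(term), one entry per row of phi.
    """
    log_phi = np.log(phi)
    # "Lift": how much more likely the term is in the topic than overall
    lift = log_phi - np.log(term_probs)[:, None]
    return lambda_ * log_phi + (1.0 - lambda_) * lift

# Toy example: 4 terms, 2 topics (made-up numbers).
phi = np.array([[0.5, 0.1],
                [0.3, 0.1],
                [0.1, 0.4],
                [0.1, 0.4]])
topic_probs = np.array([0.5, 0.5])
term_probs = phi @ topic_probs  # P(w) = sum over t of P(w|t) P(t)
rel = relevance(phi, term_probs, lambda_=0.6)
top_terms_topic0 = np.argsort(-rel[:, 0])[:2]  # indices of the two most relevant terms
```

Lowering λ towards 0 pushes up terms that are rare overall but concentrated in one topic, which is what the interactive slider in such reports typically adjusts.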

Advanced Analytics

  • Topic stability analysis across multiple model runs
  • Model comparison using a range of topic distance metrics
  • Saliency calculations for term importance
  • Entropy metrics for model selection

Model Support

  • tomotopy: LDAModel, LLDAModel, CTModel, DMRModel, HDPModel, PTModel, SLDAModel, GDMRModel
  • gensim: LdaModel, LdaMulticore
  • bitermplus: BTM

Distance Metrics

  • Kullback-Leibler (symmetric & non-symmetric)
  • Jensen-Shannon divergence
  • Jeffreys divergence
  • Hellinger & Bhattacharyya distances
  • Total variation distance
  • Jaccard index
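Each of these metrics compares two topics as probability distributions over the same vocabulary. The sketch below writes out the standard textbook definitions in NumPy for two topic-term vectors; it is an illustration, not tmplot's implementation, which may differ in details such as log base, smoothing of zero probabilities, or how the symmetric KL variant is defined. Applying the Jaccard index to each topic's top-N terms is an assumption made here.

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """Kullback-Leibler divergence KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jeffreys(p: np.ndarray, q: np.ndarray) -> float:
    """Jeffreys divergence: sum of both KL directions."""
    return kl(p, q) + kl(q, p)

def jensen_shannon(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def bhattacharyya_coef(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya(p: np.ndarray, q: np.ndarray) -> float:
    """Bhattacharyya distance; infinite when the supports do not overlap."""
    bc = bhattacharyya_coef(p, q)
    return float(-np.log(bc)) if bc > 0 else float("inf")

def hellinger(p: np.ndarray, q: np.ndarray) -> float:
    """Hellinger distance in [0, 1], via the Bhattacharyya coefficient."""
    return float(np.sqrt(max(0.0, 1.0 - bhattacharyya_coef(p, q))))

def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    return float(0.5 * np.sum(np.abs(p - q)))

def jaccard_top_terms(p: np.ndarray, q: np.ndarray, top_n: int = 10) -> float:
    """Jaccard index of the two topics' top-N term sets (1.0 = identical sets)."""
    a = set(np.argsort(-p, kind="stable")[:top_n].tolist())
    b = set(np.argsort(-q, kind="stable")[:top_n].tolist())
    return len(a & b) / len(a | b)

# Two made-up topic-term distributions over a 4-term vocabulary.
p = np.array([0.5, 0.25, 0.25, 0.0])
q = np.array([0.0, 0.25, 0.25, 0.5])
scores = {
    "jensen-shannon": jensen_shannon(p, q),    # 0.5 * ln 2
    "hellinger": hellinger(p, q),              # sqrt(0.5)
    "bhattacharyya": bhattacharyya(p, q),      # ln 2
    "total variation": total_variation(p, q),  # 0.5
    "jaccard (top 3)": jaccard_top_terms(p, q, top_n=3),  # 0.5
}
```

Plain KL and Jeffreys are infinite for this pair because each topic gives zero probability to a term the other uses; that is one reason bounded measures such as Jensen-Shannon or Hellinger are often preferred for comparing sparse topics.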

Dimensionality Reduction

t-SNE, SpectralEmbedding, MDS, LocallyLinearEmbedding, Isomap

Donate

If you find this package useful, please consider a donation of any amount. It will help me spend more time supporting open-source software.

Buy Me A Coffee

Quick Start

Installation

# From PyPI (recommended)
pip install tmplot

# Development version
pip install git+https://github.com/maximtrp/tmplot.git

Basic Usage

import tmplot as tmp

# Load your topic model and documents
model = your_fitted_model  # tomotopy, gensim, or bitermplus
docs = your_documents

# Create interactive report
tmp.report(model, docs=docs)

# Or create individual visualizations
coords = tmp.prepare_coords(model)
tmp.plot_scatter_topics(coords, size_col='size')

Advanced Examples

Get Stable Topics

import tmplot as tmp

# Find stable topics across multiple models
models = [model1, model2, model3, model4]
closest_topics, distances = tmp.get_closest_topics(models)
stable_topics, stable_distances = tmp.get_stable_topics(closest_topics, distances)
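The idea behind stable topics can be illustrated without tmplot: match every topic of a reference model to its closest topic in each other model, then keep only the reference topics whose matches are all close. The sketch below does this with Hellinger distance and a fixed threshold; the helper names, the 0.3 threshold, and the terms × topics layout are assumptions, and tmplot's `get_closest_topics` and `get_stable_topics` have their own parameters and defaults that this does not reproduce.

```python
import numpy as np

def hellinger(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def closest_topics(phis: list, ref: int = 0):
    """Hypothetical helper: closest topic in every model for each reference topic.

    phis: list of terms x topics matrices over the same vocabulary (assumed layout).
    Returns (indices, distances), both shaped (n_reference_topics, n_models).
    """
    ref_phi = phis[ref]
    n_topics = ref_phi.shape[1]
    idx = np.zeros((n_topics, len(phis)), dtype=int)
    dist = np.zeros((n_topics, len(phis)))
    for t in range(n_topics):
        for m, phi in enumerate(phis):
            d = [hellinger(ref_phi[:, t], phi[:, k]) for k in range(phi.shape[1])]
            idx[t, m] = int(np.argmin(d))
            dist[t, m] = d[idx[t, m]]
    return idx, dist

def stable_topics(idx: np.ndarray, dist: np.ndarray, thres: float = 0.3):
    """Keep reference topics whose matches in all models are within `thres`."""
    keep = np.all(dist <= thres, axis=1)
    return idx[keep], dist[keep]

# Toy models: b is a with its topics swapped; c reproduces only a's first topic.
phi_a = np.array([[0.7, 0.1],
                  [0.2, 0.1],
                  [0.1, 0.8]])
phi_b = phi_a[:, ::-1]
phi_c = np.array([[0.7, 0.4],
                  [0.2, 0.5],
                  [0.1, 0.1]])
idx, dist = closest_topics([phi_a, phi_b, phi_c])
stable_idx, stable_dist = stable_topics(idx, dist)
```

Here only the first reference topic survives: it is recovered exactly in all three models (under a different topic index in model b), while the second one has no close counterpart in model c.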

Analyze Model

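import tmplot as tmp

# Extract topic-term (phi) and document-topic (theta) matrices
phi, theta = tmp.get_phi(model), tmp.get_theta(model)

# Calculate entropy for model selection
entropy_score = tmp.entropy(phi)

# Calculate term saliency (term importance)
saliency = tmp.get_salient_terms(phi, theta)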
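Term saliency is commonly defined as in the Termite paper (Chuang et al., 2012): saliency(w) = P(w) · Σ_t P(t|w) · log(P(t|w) / P(t)), i.e. a term's overall frequency times how much it tells you about which topic is present. The NumPy sketch below implements that definition; we assume tmplot follows the same standard formula, but the helper name, the terms × topics layout of `phi`, the documents × topics layout of `theta`, and the unweighted averaging used for P(t) are assumptions for illustration.

```python
import numpy as np

def saliency(phi: np.ndarray, topic_probs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: term saliency P(w) * KL(P(t | w) || P(t)).

    phi: terms x topics matrix of P(w | t) (assumed layout).
    topic_probs: marginal P(t), one entry per column of phi.
    """
    joint = phi * topic_probs[None, :]    # P(w, t)
    term_probs = joint.sum(axis=1)        # P(w)
    cond = joint / term_probs[:, None]    # P(t | w)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Zero-probability entries contribute nothing to the KL sum
        log_ratio = np.where(cond > 0, np.log(cond / topic_probs[None, :]), 0.0)
    distinctiveness = (cond * log_ratio).sum(axis=1)
    return term_probs * distinctiveness

# Toy model: term 0 is shared equally by both topics, terms 1 and 2 are topic-specific.
phi = np.array([[0.45, 0.45],
                [0.50, 0.05],
                [0.05, 0.50]])
theta = np.array([[0.8, 0.2],
                  [0.2, 0.8]])  # documents x topics (assumed layout)
topic_probs = theta.mean(axis=0)  # unweighted average over documents: a simplification
sal = saliency(phi, topic_probs)
```

The shared term gets zero saliency even though it is the most frequent, which is exactly the behaviour that makes saliency useful for filtering uninformative terms.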

Visualize

# Compute the pairwise topic distance matrix (here: Jensen-Shannon divergence)
# phi and theta are the topic-term and document-topic matrices of a fitted model
topic_dists = tmp.get_topics_dist(phi, method='jsd')

# Generate coordinates with custom algorithm
coords = tmp.get_topics_scatter(topic_dists, theta, method='tsne')
tmp.plot_scatter_topics(coords, topic=3)  # Highlight topic 3
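The scatter step turns a topics × topics distance matrix into 2-D coordinates. As a dependency-free illustration of that step, the sketch below uses classical (Torgerson) MDS written in NumPy. This is a swapped-in technique, not the code tmplot runs: the method names listed under Dimensionality Reduction match scikit-learn estimators, and scikit-learn's `MDS` uses the iterative SMACOF algorithm rather than this closed-form eigendecomposition. The helper name `classical_mds` is hypothetical.

```python
import numpy as np

def classical_mds(dist: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Hypothetical helper: embed items from a symmetric distance matrix (classical MDS)."""
    n = dist.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J the centering matrix
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # The largest eigenpairs of the symmetric matrix B give the coordinates
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0.0, None)  # drop negative eigenvalues (non-Euclidean part, round-off)
    return eigvecs[:, order] * np.sqrt(vals)

# Usage: three made-up "topics" whose distances come from known 2-D points.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist)
```

For truly Euclidean distances the layout reproduces them exactly (up to rotation and reflection). Divergence-based matrices such as Jensen-Shannon are generally not Euclidean, so the clipped eigenvalues mean the 2-D layout is only an approximation of the topic distances.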

Documentation & Examples

Requirements

Core dependencies: numpy, scipy, scikit-learn, pandas, altair, ipywidgets

Optional models: tomotopy, gensim, bitermplus