Term Document Matrix
github-actions[bot] edited this page Nov 16, 2025 · 1 revision
In a Term-Document Matrix (TDM):
- Rows represent terms (words or sometimes phrases).
- Columns represent documents.
- Each entry gives the frequency (or sometimes a weighted frequency) of a term in a particular document.
A Document-Term Matrix (DTM) holds the same information transposed: rows represent documents and columns represent terms.
For example, consider the following three documents:
- "I love apples."
- "Apples are sweet."
- "I love sweet food."
A simple Term-Document Matrix for these documents, with punctuation stripped and "Apples" and "apples" counted as the same term, might look like:
| Term | Doc1 | Doc2 | Doc3 |
|---|---|---|---|
| I | 1 | 0 | 1 |
| love | 1 | 0 | 1 |
| apples | 1 | 1 | 0 |
| are | 0 | 1 | 0 |
| sweet | 0 | 1 | 1 |
| food | 0 | 0 | 1 |
In this matrix:
- A row represents a term.
- A column represents a document.
- The value at a given cell (term, document) is the number of times that term occurs in the document. For example, the cell (sweet, Doc3) is 1 because "sweet" appears once in "I love sweet food."
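The matrix above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The helper names `tokenize` and `build_tdm` are illustrative and not part of any library, and the tokenizer assumes that lowercasing and keeping only alphabetic runs is good enough for this toy corpus (so "I" appears as `i`).

```python
import re
from collections import Counter

docs = {
    "Doc1": "I love apples.",
    "Doc2": "Apples are sweet.",
    "Doc3": "I love sweet food.",
}

def tokenize(text):
    # Assumption: lowercase the text and keep only runs of letters,
    # which drops punctuation and merges "Apples" with "apples".
    return re.findall(r"[a-z]+", text.lower())

def build_tdm(docs):
    # One Counter of term frequencies per document.
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}

    # Vocabulary in order of first appearance across the corpus.
    terms = []
    for text in docs.values():
        for token in tokenize(text):
            if token not in terms:
                terms.append(token)

    # Rows are terms, columns are documents (a Counter returns 0 for missing terms).
    matrix = [[counts[name][term] for name in docs] for term in terms]
    return terms, list(docs), matrix

terms, doc_names, tdm = build_tdm(docs)

print(f"{'':<7}", doc_names)
for term, row in zip(terms, tdm):
    print(f"{term:<7}", row)
```

Swapping the two loops in the list comprehension (documents outside, terms inside) would give the document-by-term orientation instead.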
A common enhancement to raw frequencies in a TDM is TF-IDF (Term Frequency-Inverse Document Frequency) weighting. TF-IDF weights each term by its importance: a term scores high when it is frequent in a specific document but appears in few documents across the corpus.
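To make the weighting concrete, the sketch below applies one common TF-IDF variant to the example matrix: the raw count as term frequency, multiplied by `log(N / df)`, where `N` is the number of documents and `df` is the number of documents containing the term. This particular formula is an assumption chosen for simplicity. Libraries often use smoothed or normalized variants, so their numbers will differ. The function name `tfidf` is illustrative.

```python
import math

terms = ["I", "love", "apples", "are", "sweet", "food"]

# Term-document matrix from the example: rows are terms, columns are Doc1..Doc3.
tdm = [
    [1, 0, 1],  # I
    [1, 0, 1],  # love
    [1, 1, 0],  # apples
    [0, 1, 0],  # are
    [0, 1, 1],  # sweet
    [0, 0, 1],  # food
]

def tfidf(tdm):
    # Assumed variant: weight = raw count * log(N / df), with no smoothing.
    n_docs = len(tdm[0])
    weighted = []
    for row in tdm:
        # Document frequency: how many documents contain this term.
        df = sum(1 for count in row if count > 0)
        # A term that occurs in no document gets weight 0 everywhere.
        idf = math.log(n_docs / df) if df else 0.0
        weighted.append([count * idf for count in row])
    return weighted

weights = tfidf(tdm)
for term, row in zip(terms, weights):
    print(f"{term:<7}", [round(w, 2) for w in row])
```

In this corpus, "are" and "food" each occur in only one document, so they receive the largest weight (ln 3 ≈ 1.10), while terms that occur in two documents receive ln 1.5 ≈ 0.41. That "are" scores so highly also shows why stop words are usually removed before weighting a corpus this small.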