Commit 7bf3dff

CLI formatting, docs.

1 parent: a3d78a3

3 files changed (+34 −12 lines)

.gitignore
+2

@@ -1,3 +1,5 @@
 *.pyc
 *.egg-info
 env
+build
+dist

README.md
+29 −9

@@ -43,29 +43,49 @@ Texplot is a little program that turns a document into a network of terms that a
 
 ## Generating graphs
 
-The easiest way to build out a graph is to use the `frequent` function, which wraps up all the intermediate steps of tokenizing the text, computing the term distance matrix, generating the per-word topic lists, etc. (Or, use the `clumpy` function, which tries to pick words that concentrate really tightly in specific parts of the text). First, spin up a virtualenv:
+The easiest way to build out a graph is to use the `textplot` executable, which wraps up all the intermediate steps of tokenizing the text, estimating probability densities for the words, and indexing the distance matrix.
+
+First, install Textplot via PyPI:
+
+`pip3 install textplot`
+
+Or, clone the repo and install the package manually:
 
 ```bash
-virtualenv env
+pyvenv env
 . env/bin/activate
-pip install -r requirements.txt
+pip3 install -r requirements.txt
+python3 setup.py install
 ```
 
+Then, generate graphs with:
+
+`texplot generate [] []`
+
+
+
+
+
 Then, fire up an IPython terminal and build a network:
 
 ```bash
-In [1]: from textplot import frequent
+In [1]: from textplot.helpers import build_graph
+
+In [2]: g = build_graph('../texts/war-and-peace.txt')
+
+Tokenizing text...
+Extracted 573064 tokens
 
-In [2]: g = frequent('path/to/file.txt')
 Indexing terms:
-[############################### ] 140000/140185 - 00:00:03
+[################################] 124750/124750 - 00:00:06
+
 Generating graph:
-[################################] 530/530 - 00:00:00
+[################################] 500/500 - 00:00:04
 
-In [3]: g.write_gml('path/to/file.gml')
+In [3]: g.write_gml('war-and-peace.gml')
 ```
 
-The `frequent` function takes these arguments:
+The `build_graph` function takes these arguments:
 
 - **(int) `term_depth=500`** - The number of terms to include in the network. Right now, the code just takes the top X most frequent terms, after stopwords are removed.
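The `term_depth` step described above (keep only the top N most frequent terms after removing stopwords) can be sketched with the standard library. This is an illustrative stand-in, not Textplot's actual code: the real library does proper NLTK tokenization and stemming, while here `most_frequent_terms`, the whitespace split, and the tiny stopword list are all hypothetical simplifications.

```python
from collections import Counter

# Tiny illustrative stopword list; the real library uses a much larger one.
STOPWORDS = {'the', 'and', 'of', 'a', 'to', 'in'}

def most_frequent_terms(text, term_depth=500):
    """Return the top `term_depth` terms by frequency, stopwords removed."""
    tokens = [w.strip('.,;:!?').lower() for w in text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [term for term, _ in counts.most_common(term_depth)]

sample = "the war and the peace: war, peace, war of armies"
print(most_frequent_terms(sample, term_depth=2))  # → ['war', 'peace']
```

With `term_depth=500`, a novel-length text like *War and Peace* is reduced to its 500 most frequent content words before the distance matrix is indexed.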

textplot/helpers.py
+3 −3

@@ -24,20 +24,20 @@ def build_graph(path, term_depth=500, skim_depth=10,
     """
 
     # Tokenize text.
-    click.echo('Tokenizing text...')
+    click.echo('\nTokenizing text...')
     t = Text.from_file(path)
     click.echo('Extracted %d tokens' % len(t.tokens))
 
     m = Matrix()
 
     # Index the term matrix.
-    click.echo('Indexing terms:')
+    click.echo('\nIndexing terms:')
     m.index(t, t.most_frequent_terms(term_depth), **kwargs)
 
     g = Skimmer()
 
     # Construct the network.
-    click.echo('Generating graph:')
+    click.echo('\nGenerating graph:')
     g.build(t, m, skim_depth, d_weights)
 
     return g
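The three-stage shape of `build_graph` above (tokenize, index terms, build the network) can be sketched in plain Python. Everything here is a hypothetical stand-in: `echo` replaces `click.echo`, and the term selection and neighbor pairing are toy placeholders for Textplot's `Matrix.index` and `Skimmer.build`. The sketch also shows why the commit prepends `'\n'` to each stage header: it keeps consecutive progress sections from running together in the terminal.

```python
from collections import Counter

def echo(msg):
    # Stand-in for click.echo; stage headers start with '\n' as in the commit.
    print(msg)

def build_graph(tokens, term_depth=3):
    echo('\nTokenizing text...')
    echo('Extracted %d tokens' % len(tokens))

    echo('\nIndexing terms:')
    terms = [t for t, _ in Counter(tokens).most_common(term_depth)]

    echo('\nGenerating graph:')
    # Toy placeholder for Skimmer.build: link each term to every other term.
    return {t: [u for u in terms if u != t] for t in terms}

g = build_graph(['war', 'peace', 'war', 'army', 'war', 'peace'])
print(sorted(g))  # → ['army', 'peace', 'war']
```

In the real library the return value is a `Skimmer` graph that can be serialized with `g.write_gml(...)`, as shown in the README example.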
