Commit 7bf3dff

CLI formatting, docs.

1 parent: a3d78a3

3 files changed (+34 −12 lines)

.gitignore
+2

@@ -1,3 +1,5 @@
 *.pyc
 *.egg-info
 env
+build
+dist

README.md
+29 −9

@@ -43,29 +43,49 @@ Texplot is a little program that turns a document into a network of terms that a
 
 ## Generating graphs
 
-The easiest way to build out a graph is to use the `frequent` function, which wraps up all the intermediate steps of tokenizing the text, computing the term distance matrix, generating the per-word topic lists, etc. (Or, use the `clumpy` function, which tries to pick words that concentrate really tightly in specific parts of the text). First, spin up a virtualenv:
+The easiest way to build out a graph is to use the `textplot` executable, which wraps up all the intermediate steps of tokenizing the text, estimating probability densities for the words, and indexing the distance matrix.
+
+First, install Textplot via PyPI:
+
+`pip3 install textplot`
+
+Or, clone the repo and install the package manually:
 
 ```bash
-virtualenv env
+pyvenv env
 . env/bin/activate
-pip install -r requirements.txt
+pip3 install -r requirements.txt
+python3 setup.py install
 ```
 
+Then, generate graphs with:
+
+`texplot generate [] []`
+
+
+
+
+
 Then, fire up an IPython terminal and build a network:
 
 ```bash
-In [1]: from textplot import frequent
+In [1]: from textplot.helpers import build_graph
+
+In [2]: g = build_graph('../texts/war-and-peace.txt')
+
+Tokenizing text...
+Extracted 573064 tokens
 
-In [2]: g = frequent('path/to/file.txt')
 Indexing terms:
-[############################### ] 140000/140185 - 00:00:03
+[################################] 124750/124750 - 00:00:06
+
 Generating graph:
-[################################] 530/530 - 00:00:00
+[################################] 500/500 - 00:00:04
 
-In [3]: g.write_gml('path/to/file.gml')
+In [3]: g.write_gml('war-and-peace.gml')
 ```
 
-The `frequent` function takes these arguments:
+The `build_graph` function takes these arguments:
 
 - **(int) `term_depth=500`** - The number of terms to include in the network. Right now, the code just takes the top X most frequent terms, after stopwords are removed.
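The `term_depth` step described above (keep only the top N most frequent terms after removing stopwords) can be sketched with the standard library. This is an illustrative stand-in, not Textplot's actual code: the real library does proper NLTK tokenization and stemming, while here `most_frequent_terms`, the whitespace split, and the tiny stopword list are all hypothetical simplifications.

```python
from collections import Counter

# Tiny illustrative stopword list; the real library uses a much larger one.
STOPWORDS = {'the', 'and', 'of', 'a', 'to', 'in'}

def most_frequent_terms(text, term_depth=500):
    """Return the top `term_depth` terms by frequency, stopwords removed."""
    tokens = [w.strip('.,;:!?').lower() for w in text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [term for term, _ in counts.most_common(term_depth)]

sample = "the war and the peace: war, peace, war of armies"
print(most_frequent_terms(sample, term_depth=2))  # → ['war', 'peace']
```

With `term_depth=500`, a novel-length text like *War and Peace* is reduced to its 500 most frequent content words before the distance matrix is indexed.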

textplot/helpers.py
+3 −3

@@ -24,20 +24,20 @@ def build_graph(path, term_depth=500, skim_depth=10,
     """
 
     # Tokenize text.
-    click.echo('Tokenizing text...')
+    click.echo('\nTokenizing text...')
     t = Text.from_file(path)
     click.echo('Extracted %d tokens' % len(t.tokens))
 
     m = Matrix()
 
     # Index the term matrix.
-    click.echo('Indexing terms:')
+    click.echo('\nIndexing terms:')
     m.index(t, t.most_frequent_terms(term_depth), **kwargs)
 
     g = Skimmer()
 
     # Construct the network.
-    click.echo('Generating graph:')
+    click.echo('\nGenerating graph:')
     g.build(t, m, skim_depth, d_weights)
 
     return g
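The three-stage shape of `build_graph` above (tokenize, index terms, build the network) can be sketched in plain Python. Everything here is a hypothetical stand-in: `echo` replaces `click.echo`, and the term selection and neighbor pairing are toy placeholders for Textplot's `Matrix.index` and `Skimmer.build`. The sketch also shows why the commit prepends `'\n'` to each stage header: it keeps consecutive progress sections from running together in the terminal.

```python
from collections import Counter

def echo(msg):
    # Stand-in for click.echo; stage headers start with '\n' as in the commit.
    print(msg)

def build_graph(tokens, term_depth=3):
    echo('\nTokenizing text...')
    echo('Extracted %d tokens' % len(tokens))

    echo('\nIndexing terms:')
    terms = [t for t, _ in Counter(tokens).most_common(term_depth)]

    echo('\nGenerating graph:')
    # Toy placeholder for Skimmer.build: link each term to every other term.
    return {t: [u for u in terms if u != t] for t in terms}

g = build_graph(['war', 'peace', 'war', 'army', 'war', 'peace'])
print(sorted(g))  # → ['army', 'peace', 'war']
```

In the real library the return value is a `Skimmer` graph that can be serialized with `g.write_gml(...)`, as shown in the README example.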
