
Programming the Markov Sentence Generator

This document describes how to use Patrick Mooney's Markov chain-based sentence generator from Python 3.X code. Python 2.X is not supported. (Python 3.0 was released in December 2008, so Python 3.X is hardly new.) If you want to use the generator from your terminal rather than from Python code, this is the wrong document: see the README file instead.

Overview

text_generator.py is a Python module that exposes a TextGenerator class. To generate text with it, you need to:

  1. Import the module, e.g. with import text_generator as tg.
  2. Instantiate a TextGenerator object, e.g. with genny = tg.TextGenerator().
    1. Optionally, you can give the generator a name when you create it, e.g. genny = tg.TextGenerator(name="MyTextGenerator"). The name has no effect except that it appears in the object's printed representation when the generator object itself is printed.
  3. Train the object on a sample text (or multiple texts), which it will model and then use as the basis for creating text, e.g. with genny.train(['/path/to/file.txt', '/path/to/another/file.txt']).
    1. If you're just training the generator on a single file, you need not wrap the pathname in a list.
    2. If you prefer, you can instead pass this file or list of files as the training_texts parameter when creating the object, as so: genny = tg.TextGenerator(name="AwesomeTextGenerator", training_texts=['/path/to/a/text'])
    3. Other arguments accepted by the train() method can also be passed to the constructor, which hands them on to train(), e.g. genny = tg.TextGenerator(name="MyTextGenerator", training_texts='/path/to/file', markov_length=3).
  4. Use the generator to produce some new text, e.g. with genny.print_text(sentences_desired=8)
    1. There are other ways to generate text than just printing it to the terminal:
      • a_string = genny.gen_html_frag(sentences_desired=8, paragraph_break_probability=0) will generate text wrapped in HTML <p> ... </p> tags. Note that the result is an HTML fragment, not a complete, formally valid HTML document.
      • a_string = genny.gen_text(sentences_desired=8, paragraph_break_probability=0.125) will generate some text and store it in a_string.
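
To make the training and generation steps above more concrete, here is a deliberately tiny, self-contained word-level Markov chain. It is a hypothetical illustration written for this document, not the code in text_generator.py. The ToyMarkov class and its generate() method are invented names, its train() takes literal strings rather than file paths, and it ignores sentences and paragraphs entirely. It only mirrors the calling pattern described above: you can pass a single text or a list of texts, and extra constructor keyword arguments such as markov_length are forwarded to train(). It also shows the general idea behind the model. Training records which word follows each run of markov_length words, and generation repeatedly picks one of the recorded successors.

```python
from __future__ import annotations

import random
from collections import defaultdict


class ToyMarkov:
    """Hypothetical, much-simplified word-level Markov chain; NOT text_generator.py's code."""

    def __init__(
        self,
        name: str | None = None,
        training_texts: str | list[str] | None = None,
        **kwargs,
    ) -> None:
        self.name = name
        self.markov_length = 1
        # Maps each run of `markov_length` consecutive words to every word seen right after it.
        self.chains: dict[tuple[str, ...], list[str]] = defaultdict(list)
        if training_texts is not None:
            # Mirrors the constructor described above: extra keyword arguments go to train().
            self.train(training_texts, **kwargs)

    def train(self, texts: str | list[str], markov_length: int = 1) -> None:
        # Unlike the real train(), this toy takes literal strings rather than file paths.
        if isinstance(texts, str):  # a single text need not be wrapped in a list
            texts = [texts]
        self.markov_length = markov_length
        for text in texts:
            words = text.split()
            for i in range(len(words) - markov_length):
                key = tuple(words[i:i + markov_length])
                self.chains[key].append(words[i + markov_length])

    def generate(self, max_words: int = 20, seed: int | None = None) -> str:
        if not self.chains:
            return ""
        rng = random.Random(seed)
        # Start from a random key, then keep appending a randomly chosen recorded successor.
        output = list(rng.choice(list(self.chains)))
        while len(output) < max_words:
            successors = self.chains.get(tuple(output[-self.markov_length:]))
            if not successors:  # dead end: this run of words was never followed by anything
                break
            output.append(rng.choice(successors))
        return " ".join(output)
```

For example, ToyMarkov(training_texts=["a b c", "a b d"], markov_length=2) records that the pair ("a", "b") was followed by "c" once and by "d" once. In general, a larger markov_length makes the output reproduce longer stretches of the training text more or less verbatim. A smaller one gives more varied but less coherent output.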

The TextGenerator object is intentionally designed to be easily controllable by overriding its methods. Here's a list of methods that might be useful to override:

TextGenerator.comparison_form()
A function that normalizes tokens for comparison purposes. The default function makes no changes at all (i.e., tokens are compared with no preprocessing), but tokens could in principle be compared in any number of ways, for instance with their capitalization normalized. The module includes a fix_caps token comparison function that Harry R. Schwartz wrote for his older version of the Markov-based text generator. I have never used it myself, and I suspect it may not quite do what it is meant to do (see the comments in the code for details), but it's there if you want it.
TextGenerator._printer()
A function responsible for printing generated text directly to the console. Override it to change how text is printed. An overriding version must accept the same arguments as this function does, or at least consume them, e.g. by accepting *pargs and **kwargs.
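
Overriding these methods follows the usual subclassing pattern. This document does not spell out their exact signatures, so the sketch below makes some assumptions. It defines a small hypothetical stand-in, StandInGenerator, in place of tg.TextGenerator. It assumes that comparison_form() receives one token and returns its normalized form, and that _printer() receives the text to print as its first argument. The same_token() helper is also invented, only to show where a comparison hook would be consulted. Check the real signatures (e.g. with help(tg.TextGenerator)) before adapting it.

```python
from __future__ import annotations


class StandInGenerator:
    """Hypothetical stand-in for tg.TextGenerator, reduced to the two hooks discussed above."""

    def comparison_form(self, token: str) -> str:
        # Default behaviour as documented: tokens are compared unchanged.
        return token

    def same_token(self, a: str, b: str) -> bool:
        # Hypothetical helper showing where a comparison hook would be consulted.
        return self.comparison_form(a) == self.comparison_form(b)

    def _printer(self, what: str, *pargs, **kwargs) -> None:
        print(what)


class QuietCaseInsensitiveGenerator(StandInGenerator):
    """Compares tokens case-insensitively and captures output instead of printing it."""

    def __init__(self) -> None:
        super().__init__()
        self.captured: list[str] = []

    def comparison_form(self, token: str) -> str:
        # casefold() is a more aggressive lower(), intended for caseless matching.
        return token.casefold()

    def _printer(self, what: str, *pargs, **kwargs) -> None:
        # Accept and ignore any extra positional or keyword arguments a caller passes,
        # so the override does not break if it is called with more than the text.
        self.captured.append(what)
```

Capturing output in a list like this can be handy in tests, or when the text should go to a log or a GUI rather than to the console.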

For an example of a simple subclass that productively overrides TextGenerator methods, take a look at poetry_generator.py.

You can (of course!) use help(tg) or dir(tg) to explore the built-in documentation for the module.

Using with Cython

text_generator.py and poetry_generator.py get a big performance boost when compiled with Cython, and improving performance with Cython is a long-term goal of this project. My own projects tend to use Cython-compiled versions of the text generators to save time and memory, in any case.

Once Cython and a C compiler are set up, setup_tg.py and setup_pg.py can be used to compile faster versions of the modules as compiled extension modules (shared libraries, e.g. .so files on Linux and macOS or .pyd files on Windows), for instance with

python3 setup_tg.py build_ext --inplace
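
build_ext --inplace places the compiled extension module next to the .py source. Python's import system tries extension modules before source files in the same directory, so import text_generator should pick up the compiled version automatically once it has been built. The helper below, module_is_compiled(), is hypothetical and not part of this project. It uses only the standard library to report which kind of file an import would actually load.

```python
import importlib.machinery
import importlib.util


def module_is_compiled(name: str) -> bool:
    """Report whether importing `name` would load a compiled extension module.

    Uses find_spec(), which locates a top-level module without importing it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"no module named {name!r} on sys.path")
    origin = spec.origin or ""  # built-in modules report "built-in"; namespace packages None
    # EXTENSION_SUFFIXES lists this platform's suffixes, e.g. ones ending in ".so" or ".pyd".
    return origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))
```

For example, running module_is_compiled("text_generator") from the directory containing the freshly built module should return True. If it returns False, the plain .py file is the one being used.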