- Initial fork from Harry R. Schwartz's hrs/markov-sentence-generator.
- Samples were removed.
- This file was added.
- README.md was updated.
- Basic first release, this is alpha software, etc.
- Refactored Schwartz's program to avoid global variables.
- Added some (not enough) logging, based on my own `patrick_logger` module.
- Wrapped it in a set of routines that allow it to meaningfully parse command-line options.
- Refactored again so that it's also usable as a Python module.
- Added a more comprehensive usage message.
- Added ability to store (`-o`) and load (`-l`) chains instead of creating them from scratch; this may help when running multiple times over the same text(s).
- Added ability to add multiple texts with `-i` or `--input`.
- Minor tweaks have happened here and there; I keep forgetting to update this HISTORY.md document.
- Added encoding declaration.
- Expanding what counts as punctuation tokens.
Still tweaking punctuation settings.
Pulled regex patterns to the front of the file, as constants. (Well, Python doesn't really have constants. Still.) Still tweaking the "what counts as a word?" question.
(missing notes, partially told in commit history)
Coming back to quick documentation of changes here.
- `gen_text()` now uses `text_handling.multi_replace()` to do its substitutions.
  - This means that they're now regex-based and that they keep getting applied until nothing produces a change.
- Added the single-char ellipsis to the list of punctuations that count as tokens. Should have done that a while ago.
- Added several dash-related replacements to the final substitution list.
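The "keep applying until nothing changes" behavior described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual code of `text_handling.multi_replace()`: the function name `apply_until_stable`, the `max_passes` guard, and the sample rules are all assumptions made up for this example.

```python
import re


def apply_until_stable(text, substitutions, max_passes=100):
    """Apply every (pattern, replacement) pair in order, repeating whole
    passes until one full pass leaves the text unchanged.

    Hypothetical helper for illustration; max_passes is a safety guard
    against rule sets that never settle.
    """
    for _ in range(max_passes):
        previous = text
        for pattern, replacement in substitutions:
            text = re.sub(pattern, replacement, text)
        if text == previous:
            break
    return text


# Hypothetical rules, loosely in the spirit of dash-related cleanups.
rules = [
    (r"\s+--\s+", "--"),   # drop whitespace around a double dash
    (r"-{3,}", "--"),      # collapse runs of three or more hyphens
    (r" {2,}", " "),       # collapse repeated spaces
]

cleaned = apply_until_stable("wait ---- what   now", rules)
print(cleaned)
```

In this sample the first rule matches nothing on the first pass, because the long hyphen run has not been collapsed yet; it only fires on the second pass. That is the case a single pass would miss.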
- Added the `-r`/`--chars` option, which causes the tokens used to be characters, rather than words.
  - Still playing around with making this work, but it's basically solid.
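As a rough illustration of the difference between word tokens and character tokens, here is a minimal sketch. The `tokenize` function and its regex are assumptions for this example only; the generator's real tokenizing patterns are more involved and are not reproduced here.

```python
import re


def tokenize(text, use_chars=False):
    """Split text into tokens: single characters if use_chars is true,
    otherwise words plus standalone punctuation marks.

    Hypothetical helper; the word/punctuation regex is a simplification.
    """
    if use_chars:
        return list(text)
    return re.findall(r"\w+|[^\w\s]", text)


word_tokens = tokenize("Hi, you!")
char_tokens = tokenize("Hi, you!", use_chars=True)
print(word_tokens)
print(char_tokens)
```

With character tokens the chain learns which letter tends to follow which, so it can produce invented words; with word tokens it only recombines words it has seen.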
- The parser has been rewritten to have an object-oriented interface when it's used as a library.
- Passing around multiple variables for different aspects of the data was getting really unwieldy.
- This makes everything a lot more conceptually simple, too: some ugly code has been eliminated.
- This also makes it conceptually simpler to build similar parsers by subclassing, with minimal fuss.
- Spoiler: there's going to be a first-pass poetry generator in the not-too-distant future.
- All command-line options are now working except for `-p`/`--pause`.
  - When that's working, the initial v2 will be merged into the master branch.
  - Well, that, and documentation has to be updated, too.
- This changes the interface to the unit.
- Everything that depends on the old API will have to be adapted to the new protocol.
- The amount of work required to keep the API consistent makes it not worthwhile. It's easier to adapt other projects to the new calling interface.
- Especially because, to the best of my knowledge, no one other than me is using this module.
- Still, at this point, it's mature enough to be tested with AutoLovecraft. Which I'm about to do.
- All of the automatic text blogs are using it now except for Ulysses Redux, which will require a little more effort to rework.
- Well, Ulysses Redux is a bit more complicated a project, after all.
- Several minor bugs have been fixed, both in the text generator and in some of the blogs using it.
- Some Ulysses Redux scripts have been rewritten, as of today:
  - `ch07.py` ("Aeolus") -- seems to be working
  - `ch10.py` ("Wandering Rocks") -- seems to be working
  - `ch15.py` ("Circe") -- seems to be working
  - `ch17.py` ("Eumaeus") -- seems to be working
  - `generic_chapter.py`, which had its own calling interface changed; the following files that depend on it have been verified still to work:
    - `ch14.py`
    - `ch16.py`
    - `ch18.py`
    - `ch01.py`
    - `ch02.py`
    - `ch03.py`
    - `ch04.py`
    - `ch05.py`
    - `ch06.py`
    - `ch08.py`
    - `ch09.py`
    - `ch11.py`
    - `ch12.py`
    - `ch13.py`
- Fixed the documentation. It should be more or less up to date now. I think.
- Added a series of interfaces to the set of final text-massaging substitutions that are performed on generated text:
  - `TextGenerator.add_final_substitution()`
  - `TextGenerator.remove_final_substitution()`
  - `TextGenerator.get_final_substitutions()`
  - `TextGenerator.set_final_substitutions()`
- Wrote first quick draft of a document describing the programming API to the text generator. It needs updating.
- Updated some of the documentation, including the internal documentation to the script itself, which still occasionally referred to `text_generator` as `sentence_generator`. Whoops.
- Generating strings from a `TextGenerator()` object (i.e., anything that winds up funneling to a call to the object's `__str__()` method) now takes the possibility of unnamed generators into account with a bit more elegance.
- As a convenience, it's now possible to pass just a path to a file to the `train()` method instead of passing a list with one item.
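The "single path or list of paths" convenience mentioned above is commonly handled by normalizing the argument before use. The sketch below shows that general pattern; `normalize_paths` is a hypothetical helper name, and this is not taken from the actual `train()` implementation.

```python
import os


def normalize_paths(paths):
    """Return a list of paths, whether given one path or several.

    Hypothetical helper: a lone string or path-like object is wrapped in
    a one-item list; any other iterable is copied into a list.
    """
    if isinstance(paths, (str, os.PathLike)):
        return [paths]
    return list(paths)


single = normalize_paths("corpus.txt")
several = normalize_paths(["a.txt", "b.txt"])
print(single, several)
```

The string check has to come first: a string is itself iterable, so passing it straight to `list()` would split the path into characters.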
- Incorporating `poetry_generator.py`, which subclasses `TextGenerator` to create (a very simplistic) `PoemGenerator`.
  - There's still a lot of work needed here.
  - Still, it's a start.
- Samples of output from the developing poetry generator and scripts that call it are available here.
- Refactoring a bit, and setting up Cython builds of the generators.
- Added a number of interfaces to allow for fine-tuning of training parameters. Most notably,
  - `TextGenerator.addItemToTempMapping()` is no longer a static method and requires an instance to call.
  - `TextGenerator._build_mapping()` now takes two new optional parameters:
    - `learn_starts` (default `True`): whether this particular text should contribute to the underlying `starts` list of tokens; and
    - `weight` (default: 1.0) indicates how much emphasis (relative to other texts that the generator is seeing during its training cycle) this particular text should be given. Must be a positive number.
    - These two parameters may be useful when doing non-basic training procedures.
  - `TextGenerator.addItemToTempMapping()` also takes a `weight` parameter to pass it downwards to `_build_mapping()`.
- Several object-attribute names have been renamed for the sake of concision.
- Some local variables, too.
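To give a sense of what the `learn_starts` and `weight` training parameters described above could mean in practice, here is a simplified, stand-alone sketch. It is not the real `_build_mapping()`: the free function `build_mapping`, its data structures, and the choice to treat only the first token of a text as a start are all assumptions made for illustration.

```python
from collections import defaultdict


def build_mapping(tokens, mapping, starts, learn_starts=True, weight=1.0):
    """Add one text's token transitions to a shared first-order mapping.

    Hypothetical simplification: each observed transition adds `weight`
    to its count, and the text's first token is recorded as a possible
    start only when learn_starts is true.
    """
    if weight <= 0:
        raise ValueError("weight must be a positive number")
    if learn_starts and tokens:
        starts.append(tokens[0])
    for current, following in zip(tokens, tokens[1:]):
        mapping[current][following] += weight
    return mapping


mapping = defaultdict(lambda: defaultdict(float))
starts = []

# The first text trains normally; the second counts three times as much
# but does not contribute any starting tokens.
build_mapping("the cat sat".split(), mapping, starts)
build_mapping("the dog sat".split(), mapping, starts,
              learn_starts=False, weight=3.0)

print(dict(mapping["the"]), starts)
```

Under these assumptions, a generator picking the word after "the" in proportion to the counts would choose "dog" three times as often as "cat", while still only ever starting from tokens learned from the first text.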