learn-p4k

Text mining on album reviews from http://pitchfork.com

Run 'scrapy crawl p4k' from src/crawl_p4k to generate the corpus. The output is a file named p4k-all.json in data/p4k. See README-P4K.md in data/p4k for more information about the corpus.

See iodemo.py in demo/ for an example of how to read the corpus and extract basic statistics.
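For orientation, here is a minimal sketch of what reading the corpus and computing basic statistics might look like. It is not the code in iodemo.py. It assumes that p4k-all.json is a single JSON array of objects, and the field name `review` is a hypothetical placeholder; check README-P4K.md for the actual schema. The function names `load_corpus` and `basic_stats` are made up for this illustration.

```python
import json
from collections import Counter
from pathlib import Path


def load_corpus(path):
    """Load the corpus, assuming the file holds one JSON array of objects."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def basic_stats(records, text_field="review"):
    """Count records, per-field occurrences, and whitespace-separated tokens.

    text_field is a hypothetical field name, not taken from the real corpus.
    """
    field_counts = Counter()
    token_total = 0
    for record in records:
        # Tally which fields appear, so the schema can be inspected.
        field_counts.update(record.keys())
        # Missing or null text is treated as empty.
        text = record.get(text_field) or ""
        if isinstance(text, str):
            token_total += len(text.split())
    return {
        "records": len(records),
        "fields": dict(field_counts),
        "tokens": token_total,
    }
```

From the repository root, `basic_stats(load_corpus("data/p4k/p4k-all.json"))` would return a small dictionary of counts. The `fields` entry is a quick way to see which keys the crawler actually emitted before relying on any of them.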