Commit 31976b3

Merge pull request #3442 from programminghistorian/Issue-3441
Issue-3441-cleaning-assets
2 parents: ca98ac8 + 55bea08. Commit: 31976b3.

File tree: 48 files changed (+56 / -10,195 lines)

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

assets/scissorsandpaste-master.zip (-12.5 MB)

Binary file not shown.

assets/sentiment-analysis-syuzhet/galdos_miau.txt (-10,138 lines)

This file was deleted.
File renamed without changes.

en/lessons/cleaning-data-with-openrefine.md (+2 -3)

@@ -30,7 +30,6 @@ doi: 10.46430/phen0023
 
 
 
-
 ## Lesson goals
 
 Don’t take your data at face value. That is the key message of this
@@ -144,7 +143,7 @@ as creating [Linked Data][].
 OpenRefine works on all platforms: Windows, Mac, and Linux. *OpenRefine*
 will open in your browser, but it is important to realise that the
 application is run locally and that your data won't be stored online.
-The data files are archived on the Programming Historian site: as [phm-collection][]. Please download the
+The data files are archived on the Programming Historian site as [phm-collection][]. Please download the
 *phm-collection.tsv* file before continuing.
 
 On the *OpenRefine* start page, create a new project using the
@@ -413,7 +412,7 @@ the case you have made an error.
 [Controlled vocabulary]: http://en.wikipedia.org/wiki/Controlled_vocabulary
 [Linked Data]: http://en.wikipedia.org/wiki/Linked_data
 [Download OpenRefine]: https://openrefine.org/download
-[phm-collection]: /assets/phm-collection.tsv
+[phm-collection]: /assets/cleaning-data-with-openrefine/phm-collection.tsv
 [Powerhouse Museum Website]: /images/powerhouseScreenshot.png
 [facet]: http://en.wikipedia.org/wiki/Faceted_search
 [Screenshot of OpenRefine Example]: /images/overviewOfSomeClusters.png

en/lessons/extracting-keywords.md (+2 -2)

@@ -58,7 +58,7 @@ The lesson touches on Regular Expressions, so some readers may find it handy to
 
 The first step of this process is to take a look at the data that we will be using in the lesson. As mentioned, the data includes biographical details of approximately 6,692 graduates who began study at the University of Oxford in the early seventeenth century.
 
-[The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)
+[The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)
 
 {% include figure.html filename="extracting-keywords-1.png" caption="Screenshot of the first forty entries in the dataset" %}
 
@@ -378,7 +378,7 @@ Before you re-run your Python code, you'll have to update your `texts.txt` file
 
 I'd challenge you to make a few refinements to your gazetteer before moving ahead, just to make sure you have the hang of it.
 
-Once you are happy with that, you can snag my [completed list of English and Welsh counties, shortforms, and various other cities (London, Bristol etc) and places (Jersey, Ireland, etc)](/assets/extracting-keywords-final-gazetteer.txt). My completed list contains 157 entries, and should get you all of the entries that can be extracted from the texts in this collection.
+Once you are happy with that, you can snag my [completed list of English and Welsh counties, shortforms, and various other cities (London, Bristol etc) and places (Jersey, Ireland, etc)](/assets/extracting-keywords/extracting-keywords-final-gazetteer.txt). My completed list contains 157 entries, and should get you all of the entries that can be extracted from the texts in this collection.
 
 At this point you could stop, as you've achieved what you set out to do. This lesson taught you how to use a short Python program to search a fairly large number of texts for a set of keywords defined by you.
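
The gazetteer-matching approach described in that hunk can be sketched in a few lines of Python: load the list of place names, then report which of them occur in each record. This is an illustration only, not the lesson's own script; the file names `extracting-keywords-final-gazetteer.txt` and `texts.txt` are taken from the diff context, and the one-record-per-line assumption is mine.

```python
# Minimal gazetteer matching: which keywords appear in which record?
with open('extracting-keywords-final-gazetteer.txt', encoding='utf-8') as f:
    keywords = [line.strip().lower() for line in f if line.strip()]

with open('texts.txt', encoding='utf-8') as f:
    records = [line.strip().lower() for line in f if line.strip()]

for number, record in enumerate(records, start=1):
    matches = [keyword for keyword in keywords if keyword in record]
    print(number, ', '.join(matches))
```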

en/lessons/from-html-to-list-of-words-1.md (+1 -1)

@@ -259,4 +259,4 @@ that’s ok!
 [Manipulating Strings in Python]: /lessons/manipulating-strings-in-python
 [Code Reuse and Modularity]: /lessons/code-reuse-and-modularity
 [zip]: /assets/python-lessons2.zip
-[obo-t17800628-33.html]: /assets/obo-t17800628-33.html
+[obo-t17800628-33.html]: /assets/from-html-to-list-of-words-1/obo-t17800628-33.html

en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md (+1 -1)

@@ -221,7 +221,7 @@ def rom2ar(rom):
 
     return result
 ```
-(run <[this little script](/assets/Roman_to_Arabic.txt)> to see in detail how `rom2ar` works. Elegant programming like this can offer insight, like poetry.)
+(run <[this little script](/assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt)> to see in detail how `rom2ar` works. Elegant programming like this can offer insight, like poetry.)
 
 ## Some other things we'll need:
 At the top of your Python module, you're going to want to import some python modules that are a part of the standard library. (see Fred Gibbs's tutorial [*Installing Python Modules with pip*](/lessons/installing-python-modules-pip)).
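
The conversion that `rom2ar` performs can be sketched quickly. The following is a generic Roman-numeral-to-integer converter written for illustration; it is not a copy of the lesson's `Roman_to_Arabic.txt` script, just one common way to implement the same conversion.

```python
def rom2ar(rom):
    """Convert a Roman numeral string such as 'xiv' to an integer (illustrative version)."""
    values = {'i': 1, 'v': 5, 'x': 10, 'l': 50, 'c': 100, 'd': 500, 'm': 1000}
    result = 0
    previous = 0
    # Work right to left, subtracting a numeral when a larger one follows it ('iv' = 4).
    for char in reversed(rom.lower()):
        value = values[char]
        result += value if value >= previous else -value
        previous = value
    return result

print(rom2ar('xiv'))    # 14
print(rom2ar('mcmxc'))  # 1990
```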

en/lessons/json-and-jq.md (+5 -5)

@@ -132,7 +132,7 @@ These set various jq [command-line options, or _flags_](https://stedolan.github.
 
 jq operates by way of _filters_: a series of text commands that you can string together, and which dictate how jq should transform the JSON you give it.
 
-To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/jq_rkm.json)
+To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/json-and-jq/jq_rkm.json)
 Select all the text at that link, copy it, and paste it into the "JSON" box at [jq play] on the left hand side.
 
 
@@ -425,7 +425,7 @@ One of the easiest ways to search and download Twitter data is using the excelle
 
 For this lesson, we will use a small sample of 50 public tweets.
 Clear the "Filter", "JSON" and "Result" boxes on [jq play], and ensure all the checkboxes are unchecked.
-[Then copy this sample Twitter data](/assets/jq_twitter.json) into [jq play].
+[Then copy this sample Twitter data](/assets/json-and-jq/jq_twitter.json) into [jq play].
 
 ### One-to-many relationships: Tweet hashtags
 
@@ -895,7 +895,7 @@ You should get the following table:
 "whiteprivilege",1
 ```
 
-[There are multiple ways to solve this with jq. See my answer here.](/assets/filter_retweets.txt)
+[There are multiple ways to solve this with jq. See my answer here.](/assets/json-and-jq/filter_retweets.txt)
 
 #### Count total retweets per user
 
@@ -909,7 +909,7 @@ Hints:
 
 As a way to verify your results, user `356854246` should have a total retweet count of `51` based on this dataset.
 
-[See my answer.](/assets/count_retweets.txt)
+[See my answer.](/assets/json-and-jq/count_retweets.txt)
 
 ## Using jq on the command line
 
@@ -959,7 +959,7 @@ This can be useful when downloading JSON with a utility like `wget` for retrievi
 (See [Automated Downloading with Wget](/lessons/automated-downloading-with-wget) to learn the basics of this other command line program.)
 
 ```sh
-wget -qO- http://programminghistorian.org/assets/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
+wget -qO- http://programminghistorian.org/assets/json-and-jq/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
 ```
 
 Note that you must use the `wget` flag `-qO-` in order to send the output of `wget` into `jq` by way of a shell pipe.
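
For readers who want to see what that jq filter does without jq itself, the same extraction can be sketched in Python. This is only a stand-in illustrating the filter's logic, not part of the lesson, and it assumes you have saved the sample response locally as `jq_rkm.json`.

```python
import csv
import json
import sys

# Emit the same four fields the jq filter selects, one CSV row per art object.
with open("jq_rkm.json", encoding="utf-8") as f:
    data = json.load(f)

writer = csv.writer(sys.stdout)
for art_object in data["artObjects"]:
    web_image = art_object.get("webImage") or {}   # jq's .webImage.url tolerates nulls
    writer.writerow([art_object["id"], art_object["title"],
                     art_object["principalOrFirstMaker"], web_image.get("url")])
```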

en/lessons/naive-bayesian.md (+1 -1)

@@ -1462,7 +1462,7 @@ Happy hunting!
 
 [A Naive Bayesian in the Old Bailey]: http://digitalhistoryhacks.blogspot.com/2008/05/naive-bayesian-in-old-bailey-part-1.html
 [Old Bailey digital archive]: http://www.oldbaileyonline.org/
-[A zip file of the scripts]: /assets/baileycode.zip
+[A zip file of the scripts]: /assets/naive-bayesian/baileycode.zip
 [another zip file]: https://doi.org/10.5281/zenodo.13284
 [BeautifulSoup]: http://www.crummy.com/software/BeautifulSoup/
 [search interface]: http://www.oldbaileyonline.org/forms/formMain.jsp

en/lessons/sentiment-analysis-syuzhet.md (+1 -1)

@@ -245,7 +245,7 @@ library(tm)
 
 ## Load and Prepare the Text
 
-Next, download a machine readable copy of the novel: [*Miau*](/assets/sentiment-analysis-syuzhet/galdos_miau.txt) and make sure to save it as a .txt file. When you open the file you will see that the novel is in [plain text](https://perma.cc/Z5WH-V9SW) format, which is essential for this particular analysis using R.
+Next, download a machine readable copy of the novel: [*Miau*](/assets/analisis-de-sentimientos-r/galdos_miau.txt) and make sure to save it as a .txt file. When you open the file you will see that the novel is in [plain text](https://perma.cc/Z5WH-V9SW) format, which is essential for this particular analysis using R.
 
 With the text at hand, you first need to load it into R as one long string so that you can work with it programmatically. Make sure to replace `FILEPATH` with the location of the novel on your own computer (don't just type 'FILEPATH'). This loading process is slightly different on Mac/Linux and Windows machines:

en/lessons/sonification.md

+9-9
Original file line numberDiff line numberDiff line change
@@ -52,9 +52,9 @@ You will see that 'sonification' moves us along the spectrum from mere 'visualiz
5252

5353
### Example Data
5454

55-
+ [Roman artefact data](/assets/sonification-roman-data.csv)
56-
+ [Excerpt from the Topic model of John Adams' Diary](/assets/sonification-diary.csv)
57-
+ [Excerpt from the Topic model of the Jesuit Relations](/assets/sonification-jesuittopics.csv)
55+
+ [Roman artefact data](/assets/sonification/sonification-roman-data.csv)
56+
+ [Excerpt from the Topic model of John Adams' Diary](/assets/sonification/sonification-diary.csv)
57+
+ [Excerpt from the Topic model of the Jesuit Relations](/assets/sonification/sonification-jesuittopics.csv)
5858

5959
# Some Background on Sonification
6060

@@ -122,18 +122,18 @@ _There is no 'right' way to represent your data as sound_, at least not yet: but
122122
But what about time? Historical data often has a punctuation point, a distinct 'time when' something occured. Thus, the amount of time between two data points has to be taken into account. This is where our next tool becomes quite useful, for when our data points have a relationship to one another in temporal space. We begin to move from sonfication (data points) to music (relationships between points).
123123

124124
### Practice
125-
The [sample dataset](/assets/sonification-roman-data.csv) provided contains counts of Roman coins in its first column and counts of other Roman materials from the same locations, as contained in the Portable Antiquities Scheme database from the British Museum. A sonification of this data might reveal or highlight aspects of the economic situation along Watling street, a major route through Roman Britain. The data points are organized geographically from North West to South East; thus as the sound plays out, we are hearing movement over space. Each note represents another stop along the way.
125+
The [sample dataset](/assets/sonification/sonification-roman-data.csv) provided contains counts of Roman coins in its first column and counts of other Roman materials from the same locations, as contained in the Portable Antiquities Scheme database from the British Museum. A sonification of this data might reveal or highlight aspects of the economic situation along Watling street, a major route through Roman Britain. The data points are organized geographically from North West to South East; thus as the sound plays out, we are hearing movement over space. Each note represents another stop along the way.
126126

127-
1. Open the[sonification-roman-data.csv](/assets/sonification-roman-data.csv) in a spreadsheet. Copy the first column into a text editor. Delete the line endings so that the data is all in a single row.
127+
1. Open the[sonification-roman-data.csv](/assets/sonification/sonification-roman-data.csv) in a spreadsheet. Copy the first column into a text editor. Delete the line endings so that the data is all in a single row.
128128
2. Add the following column information like so:
129129
```
130130
# Of Voices, Text Area Name, Text Area Data
131131
1,morphBox,
132132
,areaPitch1,
133133
```
134-
...so that your data follows immediately after that last comma (as like [this](/assets/sonification-romancoin-data-music.csv)). Save the file with a useful name like `coinsounds1.csv`.
134+
...so that your data follows immediately after that last comma (as like [this](/assets/sonification/sonification-romancoin-data-music.csv)). Save the file with a useful name like `coinsounds1.csv`.
135135

136-
3. Go to the [Musicalgorithms](http://musicalgorithms.org/3.0/index.html) site (version 3), and hit the load button. In the pop-up, click the blue 'load' button and select the file saved in step 2. The site will load your materials and display a green check mark if it loaded successfully. If it did not, make sure that your values are separated by spaces, and that they follow immediately the last comma in the code block in step 2. You may also try loading up the [demo file for this tutorial](/assets/sonification-romancoin-data-music.csv) instead.{% include figure.html filename="sonification-musicalgorithms-upload-4.png" caption="Click 'load' on the main screen to get this dialogue box. Then 'load csv'. Select your file; it will appear in the box. Then click the bottom load button." %}
136+
3. Go to the [Musicalgorithms](http://musicalgorithms.org/3.0/index.html) site (version 3), and hit the load button. In the pop-up, click the blue 'load' button and select the file saved in step 2. The site will load your materials and display a green check mark if it loaded successfully. If it did not, make sure that your values are separated by spaces, and that they follow immediately the last comma in the code block in step 2. You may also try loading up the [demo file for this tutorial](/assets/sonification/sonification-romancoin-data-music.csv) instead.{% include figure.html filename="sonification-musicalgorithms-upload-4.png" caption="Click 'load' on the main screen to get this dialogue box. Then 'load csv'. Select your file; it will appear in the box. Then click the bottom load button." %}
137137
4. Click on 'Pitch Input'. You'll see the values of your data. For now, **do not select** any further options on this page (thus using the site's default values).
138138
5. Click on 'Duration Input'. **Do not select any options here for now**. The options here will map various transformations against your data that will alter the duration for each note. Do not worry about these options for now; move on.
139139
6. Click on 'Pitch Mapping'. This is the most crucial choice, as it will transform (that is, scale) your raw data to a mapping against the keys of the keyboard. Leave the `mapping` set to 'division'. (The other options are modulo or logarithmic). The option `Range` 1 to 88 uses the full 88 keys of the keyboard; thus your lowest value would accord to the deepest note on the piano and your highest value with the highest note. You might wish instead to constrain your music around middle C, so enter 25 to 60 as your range. The output should change to: `31,34,34,34,25,28,30,60,28,25,26,26,25,25,60,25,25,38,33,26,25,25,25` These are no longer your counts; they are notes on the keyboard.{% include figure.html filename="sonification-musicalgorithms-settings-for-pitch-mapping-5.png" caption="Click into the 'range' box and set it to 25. The values underneath will change automatically. Click into the 'to' box and set it to 60. Click back into the other box; the values will update." %}
@@ -244,7 +244,7 @@ Can you make your computer play this song? (This [chart](https://web.archive.org
244244

245245
### Getting your own data in
246246

247-
[This file](/assets/sonification-diary.csv) is a selection from the topic model fitted to John Adams' Diaries for[The Macroscope](http://themacroscope.org). Only the strongest signals have been preserved by rounding the values in the columns to two decimal places (remembering that .25 for instance would indicate that that topic is contributing to a quarter of that diary entry's composition). To get this data into your python script, it has to be formatted in a particular away. The tricky bit is getting the date field right.
247+
[This file](/assets/sonification/sonification-diary.csv) is a selection from the topic model fitted to John Adams' Diaries for[The Macroscope](http://themacroscope.org). Only the strongest signals have been preserved by rounding the values in the columns to two decimal places (remembering that .25 for instance would indicate that that topic is contributing to a quarter of that diary entry's composition). To get this data into your python script, it has to be formatted in a particular away. The tricky bit is getting the date field right.
248248

249249
_For the purposes of this tutorial, we are going to leave the names of variables and so on unchanged from the sample script. The sample script was developed with earthquake data in mind; so where it says 'magnitude' we can think of it as equating to '% topic composition.'_
250250

@@ -375,7 +375,7 @@ Why would you want to do this? As has progressively become clear in tutorial, wh
375375

376376
Here, I offer simply a code snippet that will allow you to import your data, where your data is simply a list of values saved as csv. I am indebted to George Washington University librarian Laura Wrubel who posted to [gist.github.com](https://gist.github.com/lwrubel) her experiments in sonifying her library's circulation transactions.
377377

378-
In this [sample file](/assets/sonification-jesuittopics.csv)(a topic model generated from the [Jesuit Relations](http://puffin.creighton.edu/jesuit/relations/)), there are two topics. The first row contains the headers: topic1, topic2.
378+
In this [sample file](/assets/sonification/sonification-jesuittopics.csv)(a topic model generated from the [Jesuit Relations](http://puffin.creighton.edu/jesuit/relations/)), there are two topics. The first row contains the headers: topic1, topic2.
379379

380380
### Practice
381381
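
Reading that two-topic sample file into Python lists can be sketched as follows. This is illustrative only (it is neither Laura Wrubel's gist nor the lesson's snippet) and assumes a local download named `sonification-jesuittopics.csv`; turning the values into notes is left to whichever sonification tool you follow the lesson with.

```python
import csv

# Read the two topic columns into plain lists of floats.
topic1, topic2 = [], []
with open("sonification-jesuittopics.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)  # first row holds the headers: topic1, topic2
    for row in reader:
        topic1.append(float(row["topic1"]))
        topic2.append(float(row["topic2"]))

print(len(topic1), "rows; first topic1 values:", topic1[:5])
```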

en/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md (+1 -2)

@@ -331,8 +331,7 @@ small image) to your folder, and add the following somewhere in the body
 of the text: `![image caption](your_image.jpg)`.
 
 At this point, your `main.md` should look something like the following.
-You can download this sample .md file
-[here](https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/sample.md).
+You can download this sample Markdown file from the _Programming Historian_ repository.
 
 ---
 title: Plain Text Workflow

es/lecciones/administracion-de-datos-en-r.md (+2 -1)

@@ -78,7 +78,8 @@ Copia el siguiente código en R Studio. Para ejecutarlo tienes que marcar las l
 ```
 
 ## Un ejemplo de dplyr en acción
-Veamos un ejemplo de cómo dplyr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/data-wrangling-and-management-in-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
+
+Veamos un ejemplo de cómo dplyr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
 
 Como los datos están en un archivo CSV, vamos a usar el comando de lectura ```read_csv()``` en el paquete [readr](https://cran.r-project.org/web/packages/readr/vignettes/readr.html) de "tidyverse".
