Issue-3441-cleaning-assets #3442


Merged (33 commits, Jan 17, 2025)
Commits
f066e94
Sort floating assets into folders
charlottejmc Dec 20, 2024
d8a5cac
More sorting assets
charlottejmc Dec 20, 2024
538f38b
More sorting assets into folders
charlottejmc Dec 20, 2024
8a0d645
Keep sorting assets
charlottejmc Dec 20, 2024
77df0fd
Tidying assets
charlottejmc Jan 8, 2025
2f54e4f
Update sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
charlottejmc Jan 8, 2025
78c2dcf
Update redaction-durable-avec-pandoc-et-markdown.md
charlottejmc Jan 8, 2025
408e72d
Merge branch 'gh-pages' into Issue-3441
Jan 8, 2025
b563b29
Update cleaning-data-with-openrefine.md
anisa-hawes Jan 8, 2025
dc249e1
Fixing a few mistakes
charlottejmc Jan 9, 2025
1acd5ae
Update cleaning-data-with-openrefine.md
charlottejmc Jan 15, 2025
c1a4710
Update cleaning-data-with-openrefine.md
charlottejmc Jan 15, 2025
3334055
Merge branch 'gh-pages' into Issue-3441
Jan 15, 2025
1ee9d0b
Merge branch 'gh-pages' into Issue-3441
Jan 15, 2025
de0da36
Reupload accidentally deleted file
charlottejmc Jan 16, 2025
f3c649b
Update administracion-de-datos-en-r.md
charlottejmc Jan 16, 2025
38fcce6
Update sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
charlottejmc Jan 16, 2025
3824840
Update sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
charlottejmc Jan 16, 2025
67ff378
Update redaction-durable-avec-pandoc-et-markdown.md
charlottejmc Jan 16, 2025
69dfb84
Update limpieza-de-datos-con-OpenRefine.md
charlottejmc Jan 16, 2025
e8d8ac0
Delete assets/sustainable-authorship-in-plain-text-using-pandoc-and-m…
charlottejmc Jan 16, 2025
b43f687
Reupload file to test
charlottejmc Jan 16, 2025
d6a93a5
Update sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
charlottejmc Jan 16, 2025
a69e23d
Update redaction-durable-avec-pandoc-et-markdown.md
charlottejmc Jan 16, 2025
199d41f
Update redaction-durable-avec-pandoc-et-markdown.md
charlottejmc Jan 16, 2025
2adbe26
Update sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
charlottejmc Jan 16, 2025
e2dab03
Update redaction-durable-avec-pandoc-et-markdown.md
charlottejmc Jan 16, 2025
dfbd3f2
Update sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
charlottejmc Jan 16, 2025
82bdde0
Update redaction-durable-avec-pandoc-et-markdown.md
charlottejmc Jan 16, 2025
6d4e84b
Update sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
charlottejmc Jan 16, 2025
4678367
Update redaction-durable-avec-pandoc-et-markdown.md
charlottejmc Jan 16, 2025
c515ef0
Update sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
charlottejmc Jan 16, 2025
55bea08
Update administracion-de-datos-en-r.md
charlottejmc Jan 17, 2025
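Most of the changes below follow one pattern: a flat link such as `/assets/phm-collection.tsv` becomes a per-lesson path such as `/assets/cleaning-data-with-openrefine/phm-collection.tsv`. As a minimal sketch only (not part of this PR; it assumes a local checkout with lessons under `en/lessons/` and `es/lecciones/`, as the paths in this diff suggest), a script along these lines could list any flat `/assets/` links that remain:

```python
import re
from pathlib import Path

# Hypothetical helper, not part of this PR: flag links that still point at a
# file directly under /assets/ instead of a per-lesson subfolder.
# Heuristic: a flat asset link has no second path segment before the extension.
FLAT_ASSET = re.compile(r"/assets/([\w.\-]+\.[A-Za-z0-9]+)\b")

def find_flat_asset_links(root="."):
    hits = []
    for folder in ("en/lessons", "es/lecciones"):  # lesson folders assumed from this diff
        for md in Path(root, folder).glob("*.md"):
            for lineno, line in enumerate(md.read_text(encoding="utf-8").splitlines(), 1):
                for match in FLAT_ASSET.finditer(line):
                    hits.append((str(md), lineno, match.group(1)))
    return hits

if __name__ == "__main__":
    for path, lineno, asset in find_flat_asset_links():
        print(f"{path}:{lineno}: /assets/{asset}")
```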
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Binary file removed assets/scissorsandpaste-master.zip
Binary file not shown.
10,138 changes: 0 additions & 10,138 deletions assets/sentiment-analysis-syuzhet/galdos_miau.txt

This file was deleted.

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
5 changes: 2 additions & 3 deletions en/lessons/cleaning-data-with-openrefine.md
Original file line number Diff line number Diff line change
@@ -30,7 +30,6 @@ doi: 10.46430/phen0023




## Lesson goals

Don’t take your data at face value. That is the key message of this
@@ -144,7 +143,7 @@ as creating [Linked Data][].
OpenRefine works on all platforms: Windows, Mac, and Linux. *OpenRefine*
will open in your browser, but it is important to realise that the
application is run locally and that your data won't be stored online.
-The data files are archived on the Programming Historian site: as [phm-collection][]. Please download the
+The data files are archived on the Programming Historian site as [phm-collection][]. Please download the
*phm-collection.tsv* file before continuing.

On the *OpenRefine* start page, create a new project using the
@@ -413,7 +412,7 @@ the case you have made an error.
[Controlled vocabulary]: http://en.wikipedia.org/wiki/Controlled_vocabulary
[Linked Data]: http://en.wikipedia.org/wiki/Linked_data
[Download OpenRefine]: https://openrefine.org/download
-[phm-collection]: /assets/phm-collection.tsv
+[phm-collection]: /assets/cleaning-data-with-openrefine/phm-collection.tsv
[Powerhouse Museum Website]: /images/powerhouseScreenshot.png
[facet]: http://en.wikipedia.org/wiki/Faceted_search
[Screenshot of OpenRefine Example]: /images/overviewOfSomeClusters.png
4 changes: 2 additions & 2 deletions en/lessons/extracting-keywords.md
Original file line number Diff line number Diff line change
@@ -58,7 +58,7 @@ The lesson touches on Regular Expressions, so some readers may find it handy to

The first step of this process is to take a look at the data that we will be using in the lesson. As mentioned, the data includes biographical details of approximately 6,692 graduates who began study at the University of Oxford in the early seventeenth century.

-[The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)
+[The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)

{% include figure.html filename="extracting-keywords-1.png" caption="Screenshot of the first forty entries in the dataset" %}

@@ -378,7 +378,7 @@ Before you re-run your Python code, you'll have to update your `texts.txt` file

I'd challenge you to make a few refinements to your gazetteer before moving ahead, just to make sure you have the hang of it.

-Once you are happy with that, you can snag my [completed list of English and Welsh counties, shortforms, and various other cities (London, Bristol etc) and places (Jersey, Ireland, etc)](/assets/extracting-keywords-final-gazetteer.txt). My completed list contains 157 entries, and should get you all of the entries that can be extracted from the texts in this collection.
+Once you are happy with that, you can snag my [completed list of English and Welsh counties, shortforms, and various other cities (London, Bristol etc) and places (Jersey, Ireland, etc)](/assets/extracting-keywords/extracting-keywords-final-gazetteer.txt). My completed list contains 157 entries, and should get you all of the entries that can be extracted from the texts in this collection.

At this point you could stop, as you've achieved what you set out to do. This lesson taught you how to use a short Python program to search a fairly large number of texts for a set of keywords defined by you.
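The lesson's own script is not reproduced in this diff. As a rough illustration of the idea only (the gazetteer file name below is a hypothetical placeholder; `texts.txt` is the file the lesson mentions), matching a list of place names against a set of texts looks something like this:

```python
# Rough sketch of the gazetteer-matching idea; not the lesson's actual script.
# "gazetteer.txt" is a hypothetical placeholder with one place name per line.
import string

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def match_keywords(text, keywords):
    # Lower-case the text, strip punctuation, and keep any word in the gazetteer.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return sorted(set(cleaned.split()) & keywords)

gazetteer = {k.lower() for k in load_lines("gazetteer.txt")}
for number, record in enumerate(load_lines("texts.txt"), 1):
    print(number, match_keywords(record, gazetteer))
```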

2 changes: 1 addition & 1 deletion en/lessons/from-html-to-list-of-words-1.md
Original file line number Diff line number Diff line change
@@ -259,4 +259,4 @@ that’s ok!
[Manipulating Strings in Python]: /lessons/manipulating-strings-in-python
[Code Reuse and Modularity]: /lessons/code-reuse-and-modularity
[zip]: /assets/python-lessons2.zip
-[obo-t17800628-33.html]: /assets/obo-t17800628-33.html
+[obo-t17800628-33.html]: /assets/from-html-to-list-of-words-1/obo-t17800628-33.html
Original file line number Diff line number Diff line change
@@ -221,7 +221,7 @@ def rom2ar(rom):

return result
```
-(run <[this little script](/assets/Roman_to_Arabic.txt)> to see in detail how `rome2ar` works. Elegant programming like this can offer insight; like poetry.)
+(run <[this little script](/assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt)> to see in detail how `rome2ar` works. Elegant programming like this can offer insight; like poetry.)

## Some other things we'll need:
At the top of your Python module, you're going to want to import some python modules that are a part of the standard library. (see Fred Gibbs's tutorial [*Installing Python Modules with pip*](/lessons/installing-python-modules-pip)).
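The `Roman_to_Arabic.txt` script referred to above is not included in this diff. As a stand-in sketch only (not the lesson's `rom2ar` code), a subtractive roman-numeral converter can be written like this:

```python
# Stand-in sketch of roman-numeral conversion; not the lesson's rom2ar code.
VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_arabic(rom):
    digits = [VALUES[ch] for ch in rom.lower().strip()]
    total = 0
    for i, value in enumerate(digits):
        # A smaller numeral written before a larger one is subtracted (iv = 4).
        if i + 1 < len(digits) and value < digits[i + 1]:
            total -= value
        else:
            total += value
    return total

print(roman_to_arabic("mdccclxxiv"))  # 1874
```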
10 changes: 5 additions & 5 deletions en/lessons/json-and-jq.md
Original file line number Diff line number Diff line change
@@ -132,7 +132,7 @@ These set various jq [command-line options, or _flags_](https://stedolan.github.

jq operates by way of _filters_: a series of text commands that you can string together, and which dictate how jq should transform the JSON you give it.

-To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/jq_rkm.json)
+To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/json-and-jq/jq_rkm.json)
Select all the text at that link, copy it, and paste it into the "JSON" box at [jq play] on the left hand side.


@@ -425,7 +425,7 @@ One of the easiest ways to search and download Twitter data is using the excelle

For this lesson, we will use a small sample of 50 public tweets.
Clear the "Filter", "JSON" and "Result" boxes on [jq play], and ensure all the checkboxes are unchecked.
-[Then copy this sample Twitter data](/assets/jq_twitter.json) into [jq play].
+[Then copy this sample Twitter data](/assets/json-and-jq/jq_twitter.json) into [jq play].

### One-to-many relationships: Tweet hashtags

@@ -895,7 +895,7 @@ You should get the following table:
"whiteprivilege",1
```

-[There are multiple ways to solve this with jq. See my answer here.](/assets/filter_retweets.txt)
+[There are multiple ways to solve this with jq. See my answer here.](/assets/json-and-jq/filter_retweets.txt)

#### Count total retweets per user

@@ -909,7 +909,7 @@ Hints:

As a way to verify your results, user `356854246` should have a total retweet count of `51` based on this dataset.

-[See my answer.](/assets/count_retweets.txt)
+[See my answer.](/assets/json-and-jq/count_retweets.txt)

## Using jq on the command line

@@ -959,7 +959,7 @@ This can be useful when downloading JSON with a utility like `wget` for retrievi
(See [Automated Downloading with Wget](/lessons/automated-downloading-with-wget) to learn the basics of this other command line program.)

```sh
-wget -qO- http://programminghistorian.org/assets/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
+wget -qO- http://programminghistorian.org/assets/json-and-jq/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
```

Note that you must use the `wget` flag `-qO-` in order to send the output of `wget` into `jq` by way of a shell pipe.
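For readers without `wget` or `jq` installed, a rough Python equivalent of that pipeline is sketched below. It uses the same URL and fields as the command above; the JSON layout (an `artObjects` array with those keys) is inferred from the jq filter, so treat it as an assumption rather than documented structure.

```python
# Rough Python equivalent of the wget | jq pipeline above; the JSON layout
# (an "artObjects" array) is assumed from the jq filter, not documented here.
import csv
import json
import sys
from urllib.request import urlopen

URL = "http://programminghistorian.org/assets/json-and-jq/jq_rkm.json"

with urlopen(URL) as response:
    data = json.load(response)

writer = csv.writer(sys.stdout)
for obj in data.get("artObjects", []):
    web_image = obj.get("webImage") or {}
    writer.writerow([obj.get("id"), obj.get("title"),
                     obj.get("principalOrFirstMaker"), web_image.get("url")])
```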
2 changes: 1 addition & 1 deletion en/lessons/naive-bayesian.md
Original file line number Diff line number Diff line change
@@ -1462,7 +1462,7 @@ Happy hunting!

[A Naive Bayesian in the Old Bailey]: http://digitalhistoryhacks.blogspot.com/2008/05/naive-bayesian-in-old-bailey-part-1.html
[Old Bailey digital archive]: http://www.oldbaileyonline.org/
-[A zip file of the scripts]: /assets/baileycode.zip
+[A zip file of the scripts]: /assets/naive-bayesian/baileycode.zip
[another zip file]: https://doi.org/10.5281/zenodo.13284
[BeautifulSoup]: http://www.crummy.com/software/BeautifulSoup/
[search interface]: http://www.oldbaileyonline.org/forms/formMain.jsp
2 changes: 1 addition & 1 deletion en/lessons/sentiment-analysis-syuzhet.md
Original file line number Diff line number Diff line change
@@ -245,7 +245,7 @@ library(tm)

## Load and Prepare the Text

-Next, download a machine readable copy of the novel: [*Miau*](/assets/sentiment-analysis-syuzhet/galdos_miau.txt) and make sure to save it as a .txt file. When you open the file you will see that the novel is in [plain text](https://perma.cc/Z5WH-V9SW) format, which is essential for this particular analysis using R.
+Next, download a machine readable copy of the novel: [*Miau*](/assets/analisis-de-sentimientos-r/galdos_miau.txt) and make sure to save it as a .txt file. When you open the file you will see that the novel is in [plain text](https://perma.cc/Z5WH-V9SW) format, which is essential for this particular analysis using R.

With the text at hand, you first need to load it into R as one long string so that you can work with it programmatically. Make sure to replace `FILEPATH` with the location of the novel on your own computer (don't just type 'FILEPATH'). This loading process is slightly different on Mac/Linux and Windows machines:

18 changes: 9 additions & 9 deletions en/lessons/sonification.md
Original file line number Diff line number Diff line change
@@ -52,9 +52,9 @@ You will see that 'sonification' moves us along the spectrum from mere 'visualiz

### Example Data

-+ [Roman artefact data](/assets/sonification-roman-data.csv)
-+ [Excerpt from the Topic model of John Adams' Diary](/assets/sonification-diary.csv)
-+ [Excerpt from the Topic model of the Jesuit Relations](/assets/sonification-jesuittopics.csv)
++ [Roman artefact data](/assets/sonification/sonification-roman-data.csv)
++ [Excerpt from the Topic model of John Adams' Diary](/assets/sonification/sonification-diary.csv)
++ [Excerpt from the Topic model of the Jesuit Relations](/assets/sonification/sonification-jesuittopics.csv)

# Some Background on Sonification

@@ -122,18 +122,18 @@ _There is no 'right' way to represent your data as sound_, at least not yet: but
But what about time? Historical data often has a punctuation point, a distinct 'time when' something occured. Thus, the amount of time between two data points has to be taken into account. This is where our next tool becomes quite useful, for when our data points have a relationship to one another in temporal space. We begin to move from sonfication (data points) to music (relationships between points).

### Practice
-The [sample dataset](/assets/sonification-roman-data.csv) provided contains counts of Roman coins in its first column and counts of other Roman materials from the same locations, as contained in the Portable Antiquities Scheme database from the British Museum. A sonification of this data might reveal or highlight aspects of the economic situation along Watling street, a major route through Roman Britain. The data points are organized geographically from North West to South East; thus as the sound plays out, we are hearing movement over space. Each note represents another stop along the way.
+The [sample dataset](/assets/sonification/sonification-roman-data.csv) provided contains counts of Roman coins in its first column and counts of other Roman materials from the same locations, as contained in the Portable Antiquities Scheme database from the British Museum. A sonification of this data might reveal or highlight aspects of the economic situation along Watling street, a major route through Roman Britain. The data points are organized geographically from North West to South East; thus as the sound plays out, we are hearing movement over space. Each note represents another stop along the way.

-1. Open the[sonification-roman-data.csv](/assets/sonification-roman-data.csv) in a spreadsheet. Copy the first column into a text editor. Delete the line endings so that the data is all in a single row.
+1. Open the[sonification-roman-data.csv](/assets/sonification/sonification-roman-data.csv) in a spreadsheet. Copy the first column into a text editor. Delete the line endings so that the data is all in a single row.
2. Add the following column information like so:
```
# Of Voices, Text Area Name, Text Area Data
1,morphBox,
,areaPitch1,
```
-...so that your data follows immediately after that last comma (as like [this](/assets/sonification-romancoin-data-music.csv)). Save the file with a useful name like `coinsounds1.csv`.
+...so that your data follows immediately after that last comma (as like [this](/assets/sonification/sonification-romancoin-data-music.csv)). Save the file with a useful name like `coinsounds1.csv`.

-3. Go to the [Musicalgorithms](http://musicalgorithms.org/3.0/index.html) site (version 3), and hit the load button. In the pop-up, click the blue 'load' button and select the file saved in step 2. The site will load your materials and display a green check mark if it loaded successfully. If it did not, make sure that your values are separated by spaces, and that they follow immediately the last comma in the code block in step 2. You may also try loading up the [demo file for this tutorial](/assets/sonification-romancoin-data-music.csv) instead.{% include figure.html filename="sonification-musicalgorithms-upload-4.png" caption="Click 'load' on the main screen to get this dialogue box. Then 'load csv'. Select your file; it will appear in the box. Then click the bottom load button." %}
+3. Go to the [Musicalgorithms](http://musicalgorithms.org/3.0/index.html) site (version 3), and hit the load button. In the pop-up, click the blue 'load' button and select the file saved in step 2. The site will load your materials and display a green check mark if it loaded successfully. If it did not, make sure that your values are separated by spaces, and that they follow immediately the last comma in the code block in step 2. You may also try loading up the [demo file for this tutorial](/assets/sonification/sonification-romancoin-data-music.csv) instead.{% include figure.html filename="sonification-musicalgorithms-upload-4.png" caption="Click 'load' on the main screen to get this dialogue box. Then 'load csv'. Select your file; it will appear in the box. Then click the bottom load button." %}
4. Click on 'Pitch Input'. You'll see the values of your data. For now, **do not select** any further options on this page (thus using the site's default values).
5. Click on 'Duration Input'. **Do not select any options here for now**. The options here will map various transformations against your data that will alter the duration for each note. Do not worry about these options for now; move on.
6. Click on 'Pitch Mapping'. This is the most crucial choice, as it will transform (that is, scale) your raw data to a mapping against the keys of the keyboard. Leave the `mapping` set to 'division'. (The other options are modulo or logarithmic). The option `Range` 1 to 88 uses the full 88 keys of the keyboard; thus your lowest value would accord to the deepest note on the piano and your highest value with the highest note. You might wish instead to constrain your music around middle C, so enter 25 to 60 as your range. The output should change to: `31,34,34,34,25,28,30,60,28,25,26,26,25,25,60,25,25,38,33,26,25,25,25` These are no longer your counts; they are notes on the keyboard.{% include figure.html filename="sonification-musicalgorithms-settings-for-pitch-mapping-5.png" caption="Click into the 'range' box and set it to 25. The values underneath will change automatically. Click into the 'to' box and set it to 60. Click back into the other box; the values will update." %}
@@ -244,7 +244,7 @@ Can you make your computer play this song? (This [chart](https://web.archive.org

### Getting your own data in

-[This file](/assets/sonification-diary.csv) is a selection from the topic model fitted to John Adams' Diaries for[The Macroscope](http://themacroscope.org). Only the strongest signals have been preserved by rounding the values in the columns to two decimal places (remembering that .25 for instance would indicate that that topic is contributing to a quarter of that diary entry's composition). To get this data into your python script, it has to be formatted in a particular away. The tricky bit is getting the date field right.
+[This file](/assets/sonification/sonification-diary.csv) is a selection from the topic model fitted to John Adams' Diaries for[The Macroscope](http://themacroscope.org). Only the strongest signals have been preserved by rounding the values in the columns to two decimal places (remembering that .25 for instance would indicate that that topic is contributing to a quarter of that diary entry's composition). To get this data into your python script, it has to be formatted in a particular away. The tricky bit is getting the date field right.

_For the purposes of this tutorial, we are going to leave the names of variables and so on unchanged from the sample script. The sample script was developed with earthquake data in mind; so where it says 'magnitude' we can think of it as equating to '% topic composition.'_

@@ -375,7 +375,7 @@ Why would you want to do this? As has progressively become clear in tutorial, wh

Here, I offer simply a code snippet that will allow you to import your data, where your data is simply a list of values saved as csv. I am indebted to George Washington University librarian Laura Wrubel who posted to [gist.github.com](https://gist.github.com/lwrubel) her experiments in sonifying her library's circulation transactions.

-In this [sample file](/assets/sonification-jesuittopics.csv)(a topic model generated from the [Jesuit Relations](http://puffin.creighton.edu/jesuit/relations/)), there are two topics. The first row contains the headers: topic1, topic2.
+In this [sample file](/assets/sonification/sonification-jesuittopics.csv)(a topic model generated from the [Jesuit Relations](http://puffin.creighton.edu/jesuit/relations/)), there are two topics. The first row contains the headers: topic1, topic2.
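A minimal way to read such a two-column file into per-topic lists of values (a sketch only, not the snippet the lesson or Laura Wrubel uses) is:

```python
# Minimal sketch: read a two-column CSV with headers topic1, topic2 into
# per-topic lists of floats. Not the lesson's exact snippet.
import csv

topic1, topic2 = [], []
with open("sonification-jesuittopics.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        topic1.append(float(row["topic1"]))
        topic2.append(float(row["topic2"]))

print(len(topic1), "values per topic")
```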

### Practice

Original file line number Diff line number Diff line change
@@ -331,8 +331,7 @@ small image) to your folder, and add the following somewhere in the body
of the text: `![image caption](your_image.jpg)`.

At this point, your `main.md` should look something like the following.
-You can download this sample .md file
-[here](https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/sample.md).
+You can download this sample Markdown file from the _Programming Historian_ repository.

---
title: Plain Text Workflow
3 changes: 2 additions & 1 deletion es/lecciones/administracion-de-datos-en-r.md
Original file line number Diff line number Diff line change
@@ -78,7 +78,8 @@ Copia el siguiente código en R Studio. Para ejecutarlo tienes que marcar las l
```

## Un ejemplo de dplyr en acción
-Veamos un ejemplo de cómo dyplr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/data-wrangling-and-management-in-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
+
+Veamos un ejemplo de cómo dyplr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.

Como los datos están en un archivo CSV, vamos a usar el comando de lectura ```read_csv()``` en el paquete [readr](https://cran.r-project.org/web/packages/readr/vignettes/readr.html) de "tidyverse".
