Commit a29f955

authored
Merge pull request #37 from ivelasq/til/r-comp
Til/r comp
2 parents 6f00ab4 + b01b29c commit a29f955

File tree

9 files changed: +4886 -5 lines changed

_freeze/site_libs/quarto-listing/list.min.js

Lines changed: 1 addition & 1 deletion

_freeze/site_libs/quarto-listing/quarto-listing.js

Lines changed: 14 additions & 4 deletions
@@ -2,6 +2,7 @@ const kProgressiveAttr = "data-src";
 let categoriesLoaded = false;

 window.quartoListingCategory = (category) => {
+  category = atob(category);
   if (categoriesLoaded) {
     activateCategory(category);
     setCategoryHash(category);
@@ -15,7 +16,9 @@ window["quarto-listing-loaded"] = () => {
   if (hash) {
     // If there is a category, switch to that
     if (hash.category) {
-      activateCategory(hash.category);
+      // category hash are URI encoded so we need to decode it before processing
+      // so that we can match it with the category element processed in JS
+      activateCategory(decodeURIComponent(hash.category));
     }
     // Paginate a specific listing
     const listingIds = Object.keys(window["quarto-listings"]);
@@ -58,7 +61,10 @@ window.document.addEventListener("DOMContentLoaded", function (_event) {
   );

   for (const categoryEl of categoryEls) {
-    const category = categoryEl.getAttribute("data-category");
+    // category needs to support non ASCII characters
+    const category = decodeURIComponent(
+      atob(categoryEl.getAttribute("data-category"))
+    );
     categoryEl.onclick = () => {
       activateCategory(category);
       setCategoryHash(category);
@@ -208,7 +214,9 @@ function activateCategory(category) {

   // Activate this category
   const categoryEl = window.document.querySelector(
-    `.quarto-listing-category .category[data-category='${category}'`
+    `.quarto-listing-category .category[data-category='${btoa(
+      encodeURIComponent(category)
+    )}']`
   );
   if (categoryEl) {
     categoryEl.classList.add("active");
@@ -231,7 +239,9 @@ function filterListingCategory(category) {
     list.filter(function (item) {
       const itemValues = item.values();
       if (itemValues.categories !== null) {
-        const categories = itemValues.categories.split(",");
+        const categories = decodeURIComponent(
+          atob(itemValues.categories)
+        ).split(",");
         return categories.includes(category);
       } else {
         return false;
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
{
2+
"hash": "add4d05eb93703b182620718d2b16338",
3+
"result": {
4+
"engine": "knitr",
5+
"markdown": "---\ntitle: \"How to find out how much of R Core is R\"\ndate: \"2024-12-25\"\ncategory: R\noutput: html_document\n---\n\n\n\nCleaning out my computer as I get ready to switch to a new one has me running into old gems. So, when I say \"Today I learned,\" I really mean \"I learned this back in December 2021.\" 😅\n\nBack then, I gave a talk at Why R? called \n[Packages for Using R With Python, Tableau, and Other Tools](https://www.youtube.com/watch?v=vyA2EiIz4pI&feature=youtu.be). One part of the talk was about how R itself isn't just made up of R. \n\nI adapted [this classic blog post](https://librestats.wordpress.com/2011/08/27/how-much-of-r-is-written-in-r/) by wrathematics to explore the composition of the [R 4.1.2 source package](https://cran.r-project.org/src/base/R-4/). The post features a script that scans the `.R`, `.c`, and `.f` files in the source, then records the language (R, C, or Fortran) and the number of lines of code in each language to a CSV file. Keep in mind, I have almost no knowledge of Shell (and this was pre-ChatGPT days!), so it took me a bit to adapt the original script from 2011.\n\n```{.bash filename=\"shell.sh\"}\noutdir=\"./\"\n\nrdir=\"./R-4.1.2\" #eg, ~/R-2.13.1/\ncd $rdir/src\n\nfor rfile in `find . -type f -name *.R`\ndo\nloc=`wc -l $rfile | sed -e 's/ ./,/' -e 's/\\/[^/]*\\//\\//g' -e 's/\\/[^/]*\\//\\//g' -e 's/\\/[^/]*\\///g' -e 's/\\///'`\necho \"R,$loc\" >> $outdir/r_source_loc.csv\ndone\n\nfor cfile in `find . -type f -name *.c`\ndo\nloc=`wc -l $cfile | sed -e 's/ ./,/' -e 's/\\/[^/]*\\//\\//g' -e 's/\\/[^/]*\\//\\//g' -e 's/\\/[^/]*\\///g' -e 's/\\///'`\necho \"C,$loc\" >> $outdir/r_source_loc.csv\ndone\n\nfor ffile in `find . -type f -name *.f`\ndo\nloc=`wc -l $ffile | sed -e 's/ ./,/' -e 's/\\/[^/]*\\//\\//g' -e 's/\\/[^/]*\\//\\//g' -e 's/\\/[^/]*\\///g' -e 's/\\///'`\necho \"Fortran,$loc\" >> $outdir/r_source_loc.csv\ndone\n```\n\nThe script creates a file called `r_source_loc.csv`. 
It shows the number of lines by programming language by script in R 4.1.2. We can read it into R:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(dplyr)\nlibrary(stringr)\n\nr_loc <-\n readr::read_table(here::here(\"til-r\", \"r-composition\", \"r_source_loc.csv\"),\n col_names = c(\"language\", \"lines\", \"script\")) |> \n mutate(language = case_when(str_detect(language, \"R,,\") ~ \"R\",\n str_detect(language, \"C,,\") ~ \"C\",\n str_detect(language, \"Fortran,,\") ~ \"Fortran\"),\n lines = as.numeric(lines)) |> \n distinct()\n\nhead(r_loc)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6 × 3\n language lines script \n <chr> <dbl> <chr> \n1 R 20 .snow2.RR \n2 R 9 .multicore3.RR\n3 R 15 .multicore2.RR\n4 R 10 .multicore1.RR\n5 R 25 .RSeed.R \n6 R 36 .Master.R \n```\n\n\n:::\n:::\n\n\n\nNow, we can visualize the percentage of R Core sourcecode files by language using ggplot2:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(ggplot2)\nlibrary(forcats)\n\nr_loc |> \n filter(!is.na(language)) |> \n group_by(language) |> \n summarise (n = n()) |> \n mutate(rel.freq = n / sum(n), accuracy = 0.1) |> \n ggplot(aes(x = fct_reorder(language, desc(rel.freq)), y = rel.freq, fill = language)) +\n geom_bar(stat = \"identity\") +\n geom_text(\n aes(label = scales::percent(rel.freq)),\n position = position_dodge(width = 0.9),\n vjust = -0.25,\n size = 4\n ) +\n theme_minimal() +\n labs(title = \"Percentage of R Core Sourcecode Files by Language\") +\n theme(plot.title = element_text(size = 14),\n axis.title.x = element_blank(),\n axis.title.y = element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_blank()) +\n scale_fill_manual(values = c(\"R\" = \"#332288\", \n \"C\" = \"#882255\", \n \"Fortran\" = \"#44AA99\"))\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\n\nOr, we can visualize the percentage of R Core lines of code by language:\n\n\n\n::: {.cell}\n\n```{.r 
.cell-code}\nr_loc |> \n filter(!is.na(language)) |> \n group_by(language) %>% \n summarise(sum_lines = sum(lines, na.rm = TRUE)) |> \n ungroup() |> \n mutate(percent = sum_lines/sum(sum_lines)) |> \n ggplot(aes(x = fct_reorder(language, desc(percent)), y = percent, fill = language)) +\n geom_bar(stat = \"identity\") +\n geom_text(\n aes(label = scales::percent(percent)),\n position = position_dodge(width = 0.9),\n vjust = -0.25,\n size = 4\n )+\n theme_minimal() +\n labs(title = \"Percentage of R Core Lines of Code by Language\") +\n theme(plot.title = element_text(size = 14),\n axis.title.x = element_blank(),\n axis.title.y = element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_blank(),\n legend.position = \"none\") +\n scale_fill_manual(values = c(\"R\" = \"#332288\", \n \"C\" = \"#882255\", \n \"Fortran\" = \"#44AA99\"))\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n:::\n\n\n\nIt’s interesting to see how much goes into making R what it is: an ecosystem built on collaboration across languages and tools (which was the takeaway from the talk!). If you’re curious about R's source code, give the script a shot!",
6+
"supporting": [
7+
"index_files"
8+
],
9+
"filters": [
10+
"rmarkdown/pagebreak.lua"
11+
],
12+
"includes": {},
13+
"engineDependencies": {},
14+
"preserve": {},
15+
"postProcess": true
16+
}
17+
}

pipedream.Rproj

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 Version: 1.0
+ProjectId: ebc81a81-96a3-4c8e-a728-27afe2266e1a

 RestoreWorkspace: Default
 SaveWorkspace: Default

til-r/r-composition/index.knit.md

Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
+---
+title: "How to find out how much of R Core is R"
+date: "2024-12-25"
+category: R
+output: html_document
+---
+
+
+
+Cleaning out my computer as I get ready to switch to a new one has me running into old gems. So, when I say "Today I learned," I really mean "I learned this back in December 2021." 😅
+
+Back then, I gave a talk at Why R? called
+[Packages for Using R With Python, Tableau, and Other Tools](https://www.youtube.com/watch?v=vyA2EiIz4pI&feature=youtu.be). One part of the talk was about how R itself isn't just made up of R.
+
+I adapted [this classic blog post](https://librestats.wordpress.com/2011/08/27/how-much-of-r-is-written-in-r/) by wrathematics to explore the composition of the [R 4.1.2 source package](https://cran.r-project.org/src/base/R-4/). The post features a script that scans the `.R`, `.c`, and `.f` files in the source, then records the language (R, C, or Fortran) and the number of lines of code in each language to a CSV file. Keep in mind, I have almost no knowledge of Shell (and this was pre-ChatGPT days!), so it took me a bit to adapt the original script from 2011.
+
+```{.bash filename="shell.sh"}
+outdir="./"
+
+rdir="./R-4.1.2" #eg, ~/R-2.13.1/
+cd $rdir/src
+
+for rfile in `find . -type f -name *.R`
+do
+loc=`wc -l $rfile | sed -e 's/ ./,/' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\///g' -e 's/\///'`
+echo "R,$loc" >> $outdir/r_source_loc.csv
+done
+
+for cfile in `find . -type f -name *.c`
+do
+loc=`wc -l $cfile | sed -e 's/ ./,/' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\///g' -e 's/\///'`
+echo "C,$loc" >> $outdir/r_source_loc.csv
+done
+
+for ffile in `find . -type f -name *.f`
+do
+loc=`wc -l $ffile | sed -e 's/ ./,/' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\///g' -e 's/\///'`
+echo "Fortran,$loc" >> $outdir/r_source_loc.csv
+done
+```
+
+The script creates a file called `r_source_loc.csv`. It shows the number of lines by programming language by script in R 4.1.2. We can read it into R:
+
+
+
+::: {.cell}
+
+```{.r .cell-code}
+library(dplyr)
+library(stringr)
+
+r_loc <-
+  readr::read_table(here::here("til-r", "r-composition", "r_source_loc.csv"),
+                    col_names = c("language", "lines", "script")) |>
+  mutate(language = case_when(str_detect(language, "R,,") ~ "R",
+                              str_detect(language, "C,,") ~ "C",
+                              str_detect(language, "Fortran,,") ~ "Fortran"),
+         lines = as.numeric(lines)) |>
+  distinct()
+
+head(r_loc)
+```
+
+::: {.cell-output .cell-output-stdout}
+
+```
+# A tibble: 6 × 3
+  language lines script
+  <chr>    <dbl> <chr>
+1 R           20 .snow2.RR
+2 R            9 .multicore3.RR
+3 R           15 .multicore2.RR
+4 R           10 .multicore1.RR
+5 R           25 .RSeed.R
+6 R           36 .Master.R
+```
+
+
+:::
+:::
+
+
+
+Now, we can visualize the percentage of R Core sourcecode files by language using ggplot2:
+
+
+
+::: {.cell}
+
+```{.r .cell-code}
+library(ggplot2)
+library(forcats)
+
+r_loc |>
+  filter(!is.na(language)) |>
+  group_by(language) |>
+  summarise (n = n()) |>
+  mutate(rel.freq = n / sum(n), accuracy = 0.1) |>
+  ggplot(aes(x = fct_reorder(language, desc(rel.freq)), y = rel.freq, fill = language)) +
+  geom_bar(stat = "identity") +
+  geom_text(
+    aes(label = scales::percent(rel.freq)),
+    position = position_dodge(width = 0.9),
+    vjust = -0.25,
+    size = 4
+  ) +
+  theme_minimal() +
+  labs(title = "Percentage of R Core Sourcecode Files by Language") +
+  theme(plot.title = element_text(size = 14),
+        axis.title.x = element_blank(),
+        axis.title.y = element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_blank()) +
+  scale_fill_manual(values = c("R" = "#332288",
+                               "C" = "#882255",
+                               "Fortran" = "#44AA99"))
+```
+
+::: {.cell-output-display}
+![](index_files/figure-html/unnamed-chunk-2-1.png){width=672}
+:::
+:::
+
+
+
+Or, we can visualize the percentage of R Core lines of code by language:
+
+
+
+::: {.cell}
+
+```{.r .cell-code}
+r_loc |>
+  filter(!is.na(language)) |>
+  group_by(language) %>%
+  summarise(sum_lines = sum(lines, na.rm = TRUE)) |>
+  ungroup() |>
+  mutate(percent = sum_lines/sum(sum_lines)) |>
+  ggplot(aes(x = fct_reorder(language, desc(percent)), y = percent, fill = language)) +
+  geom_bar(stat = "identity") +
+  geom_text(
+    aes(label = scales::percent(percent)),
+    position = position_dodge(width = 0.9),
+    vjust = -0.25,
+    size = 4
+  )+
+  theme_minimal() +
+  labs(title = "Percentage of R Core Lines of Code by Language") +
+  theme(plot.title = element_text(size = 14),
+        axis.title.x = element_blank(),
+        axis.title.y = element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_blank(),
+        legend.position = "none") +
+  scale_fill_manual(values = c("R" = "#332288",
+                               "C" = "#882255",
+                               "Fortran" = "#44AA99"))
+```
+
+::: {.cell-output-display}
+![](index_files/figure-html/unnamed-chunk-3-1.png){width=672}
+:::
+:::
+
+
+
+It’s interesting to see how much goes into making R what it is: an ecosystem built on collaboration across languages and tools (which was the takeaway from the talk!). If you’re curious about R's source code, give the script a shot!

til-r/r-composition/index.qmd

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+---
+title: "How to find out how much of R Core is R"
+date: "2024-12-25"
+category: R
+output: html_document
+---
+
+Cleaning out my computer as I get ready to switch to a new one has me running into old gems. So, when I say "Today I learned," I really mean "I learned this back in December 2021." 😅
+
+Back then, I gave a talk at Why R? called
+[Packages for Using R With Python, Tableau, and Other Tools](https://www.youtube.com/watch?v=vyA2EiIz4pI&feature=youtu.be). One part of the talk was about how R itself isn't just made up of R.
+
+I adapted [this classic blog post](https://librestats.wordpress.com/2011/08/27/how-much-of-r-is-written-in-r/) by wrathematics to explore the composition of the [R 4.1.2 source package](https://cran.r-project.org/src/base/R-4/). The post features a script that scans the `.R`, `.c`, and `.f` files in the source, then records the language (R, C, or Fortran) and the number of lines of code in each language to a CSV file. Keep in mind, I have almost no knowledge of Shell (and this was pre-ChatGPT days!), so it took me a bit to adapt the original script from 2011.
+
+```{.bash filename="shell.sh"}
+outdir="./"
+
+rdir="./R-4.1.2" #eg, ~/R-2.13.1/
+cd $rdir/src
+
+for rfile in `find . -type f -name *.R`
+do
+loc=`wc -l $rfile | sed -e 's/ ./,/' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\///g' -e 's/\///'`
+echo "R,$loc" >> $outdir/r_source_loc.csv
+done
+
+for cfile in `find . -type f -name *.c`
+do
+loc=`wc -l $cfile | sed -e 's/ ./,/' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\///g' -e 's/\///'`
+echo "C,$loc" >> $outdir/r_source_loc.csv
+done
+
+for ffile in `find . -type f -name *.f`
+do
+loc=`wc -l $ffile | sed -e 's/ ./,/' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\//\//g' -e 's/\/[^/]*\///g' -e 's/\///'`
+echo "Fortran,$loc" >> $outdir/r_source_loc.csv
+done
+```
+
+The script creates a file called `r_source_loc.csv`. It shows the number of lines by programming language by script in R 4.1.2. We can read it into R:
+
+```{r}
+#| warning: false
+library(dplyr)
+library(stringr)
+
+r_loc <-
+  readr::read_table(here::here("til-r", "r-composition", "r_source_loc.csv"),
+                    col_names = c("language", "lines", "script")) |>
+  mutate(language = case_when(str_detect(language, "R,,") ~ "R",
+                              str_detect(language, "C,,") ~ "C",
+                              str_detect(language, "Fortran,,") ~ "Fortran"),
+         lines = as.numeric(lines)) |>
+  distinct()
+
+head(r_loc)
+```
+
+Now, we can visualize the percentage of R Core sourcecode files by language using ggplot2:
+
+```{r}
+library(ggplot2)
+library(forcats)
+
+r_loc |>
+  filter(!is.na(language)) |>
+  group_by(language) |>
+  summarise (n = n()) |>
+  mutate(rel.freq = n / sum(n), accuracy = 0.1) |>
+  ggplot(aes(x = fct_reorder(language, desc(rel.freq)), y = rel.freq, fill = language)) +
+  geom_bar(stat = "identity") +
+  geom_text(
+    aes(label = scales::percent(rel.freq)),
+    position = position_dodge(width = 0.9),
+    vjust = -0.25,
+    size = 4
+  ) +
+  theme_minimal() +
+  labs(title = "Percentage of R Core Sourcecode Files by Language") +
+  theme(plot.title = element_text(size = 14),
+        axis.title.x = element_blank(),
+        axis.title.y = element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_blank()) +
+  scale_fill_manual(values = c("R" = "#332288",
+                               "C" = "#882255",
+                               "Fortran" = "#44AA99"))
+```
+
+Or, we can visualize the percentage of R Core lines of code by language:
+
+```{r}
+r_loc |>
+  filter(!is.na(language)) |>
+  group_by(language) %>%
+  summarise(sum_lines = sum(lines, na.rm = TRUE)) |>
+  ungroup() |>
+  mutate(percent = sum_lines/sum(sum_lines)) |>
+  ggplot(aes(x = fct_reorder(language, desc(percent)), y = percent, fill = language)) +
+  geom_bar(stat = "identity") +
+  geom_text(
+    aes(label = scales::percent(percent)),
+    position = position_dodge(width = 0.9),
+    vjust = -0.25,
+    size = 4
+  )+
+  theme_minimal() +
+  labs(title = "Percentage of R Core Lines of Code by Language") +
+  theme(plot.title = element_text(size = 14),
+        axis.title.x = element_blank(),
+        axis.title.y = element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_blank(),
+        legend.position = "none") +
+  scale_fill_manual(values = c("R" = "#332288",
+                               "C" = "#882255",
+                               "Fortran" = "#44AA99"))
+```
+
+It’s interesting to see how much goes into making R what it is: an ecosystem built on collaboration across languages and tools (which was the takeaway from the talk!). If you’re curious about R's source code, give the script a shot!
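
A hedged aside on the `shell.sh` block in `til-r/r-composition/index.qmd`: the unquoted `-name *.R` pattern can be expanded by the shell before `find` sees it, and the `wc | sed` chain produces rows from which the R chunk has to recover the language by matching values such as `R,,`. The sketch below is one possible alternative, not the script from the post or the 2011 original. The function name `count_loc` and the plain `language,lines,script` row layout are assumptions made for illustration.

```bash
# Hypothetical helper (not part of the commit): print one
# "language,lines,script" row for every .R, .c and .f file under a directory.
count_loc() (
  src_dir=$1
  # Each pair is "file extension:language label".
  for pair in R:R c:C f:Fortran; do
    ext=${pair%%:*}
    lang=${pair#*:}
    # Quote the pattern so the shell cannot expand it before find runs.
    # -exec ... {} + hands file names to the inner shell as arguments,
    # so names containing spaces are handled safely.
    find "$src_dir" -type f -name "*.$ext" -exec sh -c '
      lang=$1
      shift
      for file do
        # Reading from stdin makes wc print only the count;
        # tr removes the padding some wc implementations add.
        lines=$(wc -l < "$file" | tr -d "[:space:]")
        printf "%s,%s,%s\n" "$lang" "$lines" "$(basename "$file")"
      done
    ' sh "$lang" {} +
  done
)
```

A call such as `count_loc ./R-4.1.2/src > r_source_loc.csv` would write one row per file. Because the rows are plain comma-separated values, they could probably be read with `readr::read_csv()` and the same `col_names`, without the `case_when()` cleanup. As in the post, only the file name is kept, so files with the same name in different directories are not distinguished.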
