Skip to content

chore: writing lints with Vale #141

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 4, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,23 @@ jobs:
echo "-----------"
cat words.txt

- name: Preprocess HTML for prose linter
run: |
# First the HTML needs preprocessing to work well with Vale
pushd _out
python3 -m venv venv
. venv/bin/activate
pip install beautifulsoup4
python ../.vale/scripts/rewrite_html.py html-multi html-vale
popd

- name: Prose linter
uses: errata-ai/[email protected]
with:
files: _out/html-vale/
fail_on_error: true
filter_mode: nofilter

- name: Save word count
run: |
echo "# Word Counts" >> $GITHUB_STEP_SUMMARY
Expand Down
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,4 @@ figures/*.aux
/figures/*.pdf
/figures/*.svg
/figures/auto/
venv
24 changes: 24 additions & 0 deletions .vale.ini
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
StylesPath = ".vale/styles"

MinAlertLevel = suggestion

Vocab = Lean

# Inline-level tags to ignore
IgnoredScopes = code, tt

# Block-level tags to ignore
SkippedScopes = script, style, pre, div

# For now, ignore style warnings on included docstrings. It seems
# that vale is also not ignoring an entire tree, but only text
# immediately under the element with the tag, so we need some extra
# things here (e.g. a span under a code is not being ignored)
IgnoredClasses = namedocs, hl, token, goal-name, citation, TODO


[*]
BasedOnStyles = Vale, Lean

# Lean has its own of these
Vale.Spelling = NO
75 changes: 75 additions & 0 deletions .vale/scripts/rewrite_html.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
import os
import sys
import re
from bs4 import BeautifulSoup

section_number_pattern = re.compile(r"^\s*(\d+\.)+\s*")

def process_html_file(filepath, output_filepath):
"""Reads an HTML file, removes 'QA' from headers, and writes the result to the output path."""
with open(filepath, 'r', encoding='utf-8') as file:
soup = BeautifulSoup(file, 'html.parser')

# Simplify all header tags
for header_tag in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6']):
if header_tag.contents:
start = header_tag.contents[0]
if start.string:
no_num = section_number_pattern.sub("", start.string)
header_tag.contents[0].replace_with(no_num)
for other in header_tag.find_all(['span'], class_="permalink-widget"):
other.decompose()
header_tag.replace_with(header_tag.get_text())

# Simplify ordinal idiom
for code_tag in soup.find_all('code'):
# Check if the <code> tag is followed directly by 'th'
if code_tag.next_sibling and code_tag.next_sibling.string and code_tag.next_sibling.string.startswith('th'):
# Replace <code>XYZ</code>th with '5th'
code_tag.replace_with("5")

# Delete docstring content (for now)
for element in soup.find_all(class_="namedocs"):
element.decompose()

# Replace citations with their text
for element in soup.find_all(class_="citation"):
for inner in element.contents:
inner.replace_with(inner.get_text())

# Ensure the output directory exists
os.makedirs(os.path.dirname(output_filepath), exist_ok=True)

# Write the modified HTML to the new output location
with open(output_filepath, 'w', encoding='utf-8') as file:
file.write(str(soup))

def traverse_directory(source_directory, output_directory):
"""Recursively traverses source directory, processes HTML files, and writes them to output directory."""
for root, _, files in os.walk(source_directory):
for filename in files:
if filename.endswith('.html') or filename.endswith('.htm'):
filepath = os.path.join(root, filename)
# Create the equivalent path in the output directory
relative_path = os.path.relpath(filepath, source_directory)
output_filepath = os.path.join(output_directory, relative_path)

print(f"Processing {filepath} -> {output_filepath}")
process_html_file(filepath, output_filepath)

if __name__ == "__main__":
# Check for command-line arguments
if len(sys.argv) != 3:
print("Usage: python script.py <source_directory> <output_directory>")
sys.exit(1)

# Get source and output directories from command-line arguments
source_directory = sys.argv[1]
output_directory = sys.argv[2]

if not os.path.isdir(source_directory):
print("The provided source path is not a directory or does not exist.")
sys.exit(1)

traverse_directory(source_directory, output_directory)
print("Processing complete.")
7 changes: 7 additions & 0 deletions .vale/styles/Lean/Capitalization.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
extends: substitution
message: Use '%s' instead of '%s'
level: error
ignorecase: false
swap:
'unicode': 'Unicode'
'javascript': 'JavaScript'
6 changes: 6 additions & 0 deletions .vale/styles/Lean/Latin.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
extends: substitution
message: The word 'se' should occur as part of 'per se'. Use '%s' instead of '%s'.
level: error
ignorecase: true
swap:
- '(?:[^\s]*) se\b': 'per se'
18 changes: 18 additions & 0 deletions .vale/styles/Lean/Names.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
extends: substitution
message: Use '%s' instead of '%s'.
level: error
ignorecase: true
swap:
- 'de moura': 'de Moura'
- 'blott': 'Blott'
- 'carneiro': 'Carneiro'
- 'lua': 'Lua'
- 'Madelaine': 'Madelaine'
- 'mathlib': 'Mathlib'
- 'merkin': 'Merkin'
- 'peano': 'Peano'
- 'selsam': 'Selsam'
- 'simons': 'Simons'
- 'ullrich': 'Ullrich'
- 'wadler': 'Wadler'

9 changes: 9 additions & 0 deletions .vale/styles/Lean/Spelling.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
extends: spelling
message: "Did you really mean '%s'?"
level: error
ignore:
- names.txt
- terms.txt



21 changes: 21 additions & 0 deletions .vale/styles/Lean/TechnicalTerms.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
extends: substitution
message: Use '%s' instead of '%s'
level: error
ignorecase: true
swap:
'typeclass': 'type class'
'typeclasses': 'type classes'
'codepoint': 'code point'
'multibyte': 'multi-byte'
'iff': 'if and only if'
'nonoverlapping': 'non-overlapping'
'datatypes': 'types'
"datatype's": "type's"
'precompiling': 'pre-compiling'
'nonterminating': 'non-terminating'
'nontermination': 'non-termination'
'enum \b.+\b': 'enum inductive'
'letterlike': 'letter-like'
'semireducible': 'semi-reducible'
'\b(.+)\b simps': 'non-terminal simps'
'\babelian\b': 'Abelian'
8 changes: 8 additions & 0 deletions .vale/styles/Lean/Titles.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
extends: capitalization
message: "'%s' should be in title case: '%s'"
level: warning
scope: heading
match: $title
style: Chicago


12 changes: 12 additions & 0 deletions .vale/styles/config/ignore/names.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
Blott
Carneiro
Lua
Madelaine
Mathlib
Merkin
Moura
Peano
Selsam
Simons
Ullrich
Wadler
101 changes: 101 additions & 0 deletions .vale/styles/config/ignore/terms.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
Abelian
Packrat
accessor
accessors
bitvector
bitvectors
bitwise
callout
checkpointing
combinator
combinators
constructorless
conv
cumulative
cumulativity
definitionally
desugar
desugared
desugaring
disjointness
disjunct
disjuncts
elaborator
elaborator's
elaborators
enum
equational
extensional
extensionality
guillemet
guillemets
hoc
impredicative
impredicativity
initializer
initializers
injectivity
interoperate
interoperates
iterator
iterator's
iterators
letterlike
lookup
lookups
memoization
metaprogram
metaprogramming
metaprograms
metatheoretic
metavariable
metavariable's
metavariables
monoid
monomorphic
monomorphism
multiset
multisets
namespace
namespaces
nullary
parameters'
polymorphic
polymorphically
predicative
predicativity
prepending
propositionally
recursor
recursor's
recursors
scrutinee
scrutinees
se
semigroup
semireducible
simp
simps
simproc
simprocs
subgoal
subgoals
subproblem
subproblems
subsingleton
subsingletons
substring
substrings
subterm
subterms
subtree
subtrees
subtype
subtyping
syntaxes
toolchain
toolchain's
toolchains
unary
unescaped
uninstantiated
Empty file.
33 changes: 33 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,36 @@ Please remember to get in touch ahead of time to plan a contribution. In general
* Technical terms should be introduced using the `deftech` role and referred to using the `tech` role.
* Write in US English, deferring to the Chicago Manual of Style when in doubt. Exceptions to this style may be added and documented.
* One sentence per line, to make diffs easier to follow.

## CI

The CI requires that various checks are passed.

One of them is that the text must live up to a number of rules written with Vale. The style implementation is still quite incomplete; just because your prose passes the linter doesn't mean it will necessarily be accepted!

To run the check, first install Vale. The next step is to preprocess the generated HTML to remove features that Vale can't cope with. Finally, Vale itself can be run.

To preprocess the HTML, use the script `.vale/scripts/rewrite_html.py`. It requires BeautifulSoup, so here's the overall steps to get it working the first time:

```
$ cd .vale/scripts
$ python3 -m venv venv
$ . ./venv/bin/activate # or the appropriate script for your shell, e.g. activate.fish
$ pip install beautifulsoup4
```
After that, just run
```
$ . .vale/scripts/venv/bin/activate
```
to set up the Python environment.

The next step is to run this on Verso's output. If it's in `_out/html-multi`, do this via:
```
$ cd _out
$ python ../.vale/scripts/rewrite_html.py html-multi html-vale
```

Now, run `vale`:
```
$ vale html-vale
```
4 changes: 2 additions & 2 deletions Manual/BasicTypes.lean
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ tag := "basic-types"
%%%


Lean includes a number of built-in datatypes that are specially supported by the compiler.
Lean includes a number of built-in types that are specially supported by the compiler.
Some, such as {lean}`Nat`, additionally have special support in the kernel.
Other types don't have special compiler support _per se_, but rely in important ways on the internal representation of types for performance reasons.

Expand Down Expand Up @@ -99,7 +99,7 @@ tag := "char-runtime"
%%%


In monomorphic contexts, characters are represented as 32-bit immediate values. In other words, a field of a datatype or structure of type `Char` does not require indirection to access. In polymorphic contexts, characters are boxed.
In monomorphic contexts, characters are represented as 32-bit immediate values. In other words, a field of a constructor or structure of type `Char` does not require indirection to access. In polymorphic contexts, characters are boxed.


## API Reference
Expand Down
4 changes: 2 additions & 2 deletions Manual/BasicTypes/Nat.lean
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ The natural numbers are nonnegative integers.
Logically, they are the numbers 0, 1, 2, 3, …, generated from the constructors {lean}`Nat.zero` and {lean}`Nat.succ`.
Lean imposes no upper bound on the representation of natural numbers other than physical constraints imposed by the available memory of the computer.

Because the natural numbers are fundamental to both mathematical reasoning and programming, they are specially supported by Lean's implementation. The logical model of the natural numbers is as an inductive datatype, and arithmetic operations are specified using this model. In Lean's kernel, the interpreter, and compiled code, closed natural numbers are represented as efficient arbitrary-precision integers. Sufficiently small numbers are immediate values that don't require indirection through a pointer. Arithmetic operations are implemented by primitives that take advantage of the efficient representations.
Because the natural numbers are fundamental to both mathematical reasoning and programming, they are specially supported by Lean's implementation. The logical model of the natural numbers is as an {tech}[inductive type], and arithmetic operations are specified using this model. In Lean's kernel, the interpreter, and compiled code, closed natural numbers are represented as efficient arbitrary-precision integers. Sufficiently small numbers are immediate values that don't require indirection through a pointer. Arithmetic operations are implemented by primitives that take advantage of the efficient representations.

# Logical Model
%%%
Expand Down Expand Up @@ -79,7 +79,7 @@ tag := "nat-performance"

Using Lean's built-in arithmetic operators, rather than redefining them, is essential.
The logical model of {lean}`Nat` is essentially a linked list, so addition would take time linear in the size of one argument.
Still worse, multiplication takes quadradic time in this model.
Still worse, multiplication takes quadratic time in this model.
While defining arithmetic from scratch can be a useful learning exercise, these redefined operations will not be nearly as fast.

# Syntax
Expand Down
Loading
Loading