


trixmoe commented Jun 7, 2025

This PR ports the existing Drone CI commands to GitHub Actions and Forgejo Actions.

It currently includes both, but it was opened mainly for discussion. I don't necessarily think that adding all of them is a good idea.

PR Details

The workflow is generic enough that it can easily be ported to Forgejo by changing the job's base runs-on value (its "OS"). For testing, the Forgejo Actions workflow targets Codeberg's runner.

I'd also like to add Woodpecker CI, but it is hard to test, as Codeberg's instance currently requires a manual application and review process.

GitHub Actions' recently added arm64 runners also make testing on that architecture much easier, though note that they are currently in "Public Preview".

CI-specific information/questions

At the time of writing, the python:3.5 and python:3.6 jobs fail with the following summary line:

FAILED (SKIP=11, errors=10, failures=71)

The python:3.4 job fails with the following summary line:

FAILED (SKIP=11, errors=13, failures=71)

I can't tell whether this is expected, as there are no logs from previous or current CI runs.
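Without stored logs, one cheap way to compare runs across Python versions is to pull the counts out of these summary lines. The sketch below assumes the summary keeps the "FAILED (KEY=N, ...)" shape shown above; parse_summary and diff_counts are hypothetical helpers, not part of wpull or its test runner.

```python
import re

# Matches a final test summary such as "FAILED (SKIP=11, errors=10, failures=71)"
# or a bare "OK". The parenthesised part is optional.
_SUMMARY_RE = re.compile(r"^(FAILED|OK)(?:\s*\((.*)\))?\s*$")


def parse_summary(line):
    """Return (status, counts) for a test-run summary line.

    counts maps lower-cased keys (e.g. "skip", "errors", "failures") to ints.
    Hypothetical helper for comparing CI runs; not part of wpull.
    """
    match = _SUMMARY_RE.match(line.strip())
    if match is None:
        raise ValueError("not a summary line: %r" % line)
    status, body = match.group(1), match.group(2)
    counts = {}
    if body:
        for item in body.split(","):
            key, _, value = item.strip().partition("=")
            counts[key.lower()] = int(value)
    return status, counts


def diff_counts(baseline, other):
    """Return {key: other - baseline} for every key whose count differs."""
    keys = set(baseline) | set(other)
    return {
        key: other.get(key, 0) - baseline.get(key, 0)
        for key in sorted(keys)
        if other.get(key, 0) != baseline.get(key, 0)
    }


if __name__ == "__main__":
    _, py35 = parse_summary("FAILED (SKIP=11, errors=10, failures=71)")
    _, py34 = parse_summary("FAILED (SKIP=11, errors=13, failures=71)")
    print(diff_counts(py35, py34))  # only the error count differs: {'errors': 3}
```

Applied to the two lines above, this shows that python:3.4 differs from python:3.5/3.6 only by three extra errors, which narrows down where to look once logs are available.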


trixmoe commented Jun 7, 2025

You can see a run of this here:

I added Forgejo Actions mostly in response to the argument that relying on GitHub Actions is bad. (I do agree, but it is free compute that now also supports arm64.)


mweinelt commented Jun 8, 2025

If you update to html5lib==1.1, you can apply the following change to fix the import:

diff --git a/wpull/document/htmlparse/html5lib_.py b/wpull/document/htmlparse/html5lib_.py
index 6f24743..ba2b746 100644
--- a/wpull/document/htmlparse/html5lib_.py
+++ b/wpull/document/htmlparse/html5lib_.py
@@ -1,8 +1,8 @@
 '''Parsing using html5lib python.'''
 import html5lib.constants
-import html5lib.tokenizer
 import io
 import os.path
+from html5lib._tokenizer import HTMLTokenizer
 
 from wpull.document.htmlparse.base import BaseParser
 from wpull.document.htmlparse.element import Comment, Doctype, Element
@@ -24,7 +24,7 @@ class HTMLParser(BaseParser):
         return ValueError
 
     def parse(self, file, encoding=None):
-        tokenizer = html5lib.tokenizer.HTMLTokenizer(
+        tokenizer = HTMLTokenizer(
             file, encoding=encoding,
             useChardet=False if encoding else True,
             parseMeta=False if encoding else True,
@@ -97,7 +97,7 @@ if __name__ == '__main__':
         'testing', 'samples', 'xkcd_1.html'
         )
     with open(path, 'rb') as in_file:
-        tokenizer = html5lib.tokenizer.HTMLTokenizer(in_file)
+        tokenizer = HTMLTokenizer(in_file)
 
         for token in tokenizer:
             print(token)

Alternatively, any html5lib release from before html5lib/html5lib-python@c4dd677 works without this patch; that commit moved the tokenizer into the private html5lib._tokenizer module. The following pin stays below that change:

diff --git a/requirements.txt b/requirements.txt
index e9fbfb5..b25b687 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,7 +1,7 @@
 # Absolutely known to work versions only:
 chardet>=2.0.1,<=2.3
 dnspython3==1.12
-html5lib>=0.999,<1.0
+html5lib>=0.999,<0.99999999
 lxml>=3.1.0,<=3.5
 namedlist>=1.3,<=1.7
 psutil>=2.0,<=4.2
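A middle ground between patching the import and pinning html5lib is to resolve the tokenizer from whichever module the installed version provides. This is a sketch assuming wpull should support html5lib on both sides of c4dd677; import_first and TOKENIZER_CANDIDATES are hypothetical names. It only resolves the class; it does not check whether the constructor arguments used in parse() are accepted by every html5lib version.

```python
import importlib

# Hypothetical list of (module, attribute) locations for html5lib's tokenizer,
# newest layout first: html5lib._tokenizer after c4dd677, html5lib.tokenizer before.
TOKENIZER_CANDIDATES = (
    ("html5lib._tokenizer", "HTMLTokenizer"),
    ("html5lib.tokenizer", "HTMLTokenizer"),
)


def import_first(candidates):
    """Return the first attribute importable from a sequence of (module, name) pairs.

    Raises ImportError listing every location tried if none is available.
    """
    tried = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module missing in this html5lib version (or html5lib not installed).
            tried.append(module_name)
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
        tried.append("%s.%s" % (module_name, attr))
    raise ImportError("none of these could be imported: " + ", ".join(tried))


# In wpull/document/htmlparse/html5lib_.py this would replace the direct import:
#     HTMLTokenizer = import_first(TOKENIZER_CANDIDATES)
```

The fallback order matters: trying the newer private module first means a modern html5lib is used as-is, while an older pinned release still resolves through the public html5lib.tokenizer path.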

trixmoe marked this pull request as ready for review on July 10, 2025 09:30

Labels

None yet

Projects

None yet


3 participants