In shared pursuit of WSU's Research Exchange mission, we would like to help create
a space designed to preserve and share university scholarship.
Within a single, shared digital repository, we want not only a broad collection of knowledge
drawn from articles, books, papers, and reports, but also for this digital media to be accessible to all.
This starts with these digital documents meeting the standards set out by the W3C, an
international community that develops concrete standards for websites and digital media.
Many digital works arrive in the Research Exchange repository without meeting the
accessibility standards for digital media set out by the W3C, which has resulted
in a large but unknown amount of educational media with sub-optimal accessibility. Our
goal is to create an application that can take a PDF and produce a modified version that does not
change the meaning of the work but raises the document's accessibility
to the level of W3C standards. We then wish to scale this process so that it assists not just a
single document but an entire repository, making the whole WSU Research Exchange
significantly more accessible to all.
To create software that can transform documents into W3C-accessible documents, allowing those with various
disabilities equitable access to information.
In our effort to bring more accessibility to the WSU Research Exchange, we have settled on a few key accessibility
features to focus on at the start, with room to expand to others as the project progresses.
These initial features are document metadata, color contrast, tagging, alternative text for images, and
reading order. Our aim is a fully automated system that takes PDF documents and converts them
into these more accessible versions.
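As an illustration of the color contrast feature, here is a minimal sketch of a contrast check. It assumes the relevant W3C standard is WCAG 2.x, whose contrast ratio is computed from the relative luminance of the foreground and background colors, with AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are hypothetical and are not taken from this project's code.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB channels, 0-255


def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance as defined by WCAG 2.x."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """Contrast ratio between two colors; the order of arguments does not matter."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: RGB, bg: RGB, large_text: bool = False) -> bool:
    """Check the WCAG 2.x AA threshold: 4.5:1 for normal text, 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(fg, bg) >= threshold
```

Black text on a white background gives the maximum ratio of 21:1, while light gray on white falls well below the AA threshold. A pipeline stage could use a check like this to flag text whose colors need adjusting.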
- Latest version of Python from here (make sure to add Python to PATH in the installer)
- Latest version of Node.js from here
No add-ons.
- Clone or download the repository
- Install prerequisites from the links above
- Run the setup.bat script to install required modules
- Run the main.py script to start the software
- Run the software from main.py
- Automated document processing that converts PDFs into their accessible counterparts
- PDF processing via the following input options:
  - Iteration through the WSU Research Exchange repository
  - Single document identifier
  - List of document identifiers
  - Input folder of documents
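The four input options above could be exposed on the command line roughly as follows. This is only a sketch under stated assumptions: the flag names (--repository, --id, --id-list, --folder) and the helper functions are hypothetical and do not reflect the actual interface of main.py.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one hypothetical flag per input option."""
    parser = argparse.ArgumentParser(
        description="Select which PDF documents to process (illustrative only)."
    )
    # Exactly one input mode must be chosen per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--repository", action="store_true",
                       help="iterate through the whole repository")
    group.add_argument("--id", help="a single document identifier")
    group.add_argument("--id-list", nargs="+",
                       help="a list of document identifiers")
    group.add_argument("--folder", type=Path,
                       help="an input folder of documents")
    return parser


def describe_input(args: argparse.Namespace) -> str:
    """Return a short label for the selected input mode."""
    if args.repository:
        return "repository"
    if args.id is not None:
        return f"single:{args.id}"
    if args.id_list is not None:
        return "list:" + ",".join(args.id_list)
    return f"folder:{args.folder}"
```

Making the modes mutually exclusive keeps each run unambiguous about where its documents come from; a real implementation might instead prompt for the mode interactively.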
Some of the pipeline's accessibility transformations are not operational at this time.
Tag tree generation is possible but not yet fully accurate. We are working on two-factor validation between the document layout and the paragraph list.
- Fork it!
- Create your feature branch: git checkout -b my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin my-new-feature
- Submit a pull request 😄