A novel unsupervised end-to-end manga translation & inpainting pipeline.
See paper here.
The source for the three required deliverables (the proposal and two updates) is broken down into three parts under `docs/src`. The files are organized like so under `docs`:
```
.
├── compile.py
├── final.md
├── proposal.md
├── src
│   ├── final
│   │   ├── images
│   │   ├── index.md
│   │   └── sections
│   ├── proposal
│   │   ├── images
│   │   ├── index.md
│   │   └── sections
│   └── update
│       ├── images
│       ├── index.md
│       └── sections
│           ├── abstract.md
│           ├── approach.md
│           ├── conclusion.md
│           ├── experiments_and_results.md
│           ├── introduction.md
│           ├── qualitative_results.md
│           └── references.md
└── update.md
```
- Click here to see the proposal.
- Click here to see the first midterm update.
- Click here to see the (second) final update.
There are three versions, `proposal`, `update`, and `final`; the source files for each are located under its respective directory in `docs/src`. Each section of each version has its own file under that version's `sections` directory.
To work on a section, edit the individual markdown files there, not the rendered versions.
When you are done, execute the following script to recompile and commit. Make sure you are in the `docs` directory and have `python3` available. The compiler will generate the final paper for each version (`proposal.md`, `update.md`, `final.md`) and put them under `docs`:

```sh
chmod +x compile.py && ./compile.py
```
To add images, put them under the `images` directory for the correct version. Yes, each version has its own `images` directory. To link them from markdown, use the relative path, e.g. `![Alt Text](../images/<filename>)`.
In `index.md` for the version you are working on, first add the following line (order matters):

```
[//]: # "<section_name>.md"
```

Then, under the `sections` directory for that version, create the corresponding file `<section_name>.md` and edit the content of the section there.
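For example, the `index.md` for `update` would contain one such line per section file, in the order the sections should appear in the compiled paper. Using the section files from the tree above (the ordering shown here is illustrative):

```
[//]: # "abstract.md"
[//]: # "introduction.md"
[//]: # "approach.md"
```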
Please Read

It is required to submit a self-contained copy of the website; a GitHub Pages link won't do. However, there is a workaround: after each push to the repo, the GitHub site is automatically regenerated. Wait for the generation process to finish (you will see a green check mark next to the commit), then run the following command to generate a static version of the site:

```sh
wget -P site -mpck --user-agent="" -e robots=off --wait 1 -E https://jiachenren.github.io/cs4476-cv-project/
```

Zip the folder `site/jiachenren.github.io/cs4476-cv-project` (generated by the above command) and submit that.
We are using both self-collected data and the eBDtheque database.
The self-collected data consist of several manga pages crawled from different websites; they are used purely for research purposes. Currently, we have uploaded two chapters from two different comics on the romanized Indonesian manga site sektekomik to serve as our system's test data.
eBDtheque is the state-of-the-art manga database with ground-truth pixel-level labelling for panels and speech bubbles. It contains 100 pages in total and is used in most of the relevant research pertaining to information retrieval (IR) from manga.
If you are part of this project, contact Jiachen for the database login credentials; otherwise, request access from the owner here.
This project makes use of Google's Tesseract OCR for text recognition. In order for the system to run successfully, please install the command line tool `tesseract` and add it to your PATH. For macOS, just run:

```sh
brew install tesseract
```

For other systems, refer to this guide. When you install tesseract, you might encounter some non-fatal errors; just ignore them. You'll be fine as long as you have the final binary.
In the project directory (assuming that you have your venv created), run:

```sh
pip3 install pytesseract
```
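As a quick sanity check that the install worked, a minimal `pytesseract` sketch along these lines should run. This is not this repo's actual entry point; the page path is a placeholder:

```python
# Minimal sketch: OCR a page with pytesseract.
# "pages/sample.png" is a placeholder path, not a file in this repo.
from PIL import Image
import pytesseract

page = Image.open("pages/sample.png")
# image_to_data returns word-level bounding boxes and confidences,
# which is the kind of output text block detection builds on
data = pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)
for i, word in enumerate(data["text"]):
    if word.strip() and float(data["conf"][i]) > 0:
        x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
        print(f"{word!r} at x={x}, y={y}, w={w}, h={h}")
```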
- Preprocess input image by applying threshold and de-noise to convert to binary (see the first sketch after this list)
- `SIFT` related functionalities for extracting features from recognized text blocks
- Group contours in detected text blocks for character level segmentation
- Flood fill of speech bubbles using `SIFT` key point cluster centers as seeding coordinates
- Morphing of speech bubble binary mask to consume texts within
- Highlight text blocks
- Converting between color spaces, writing back to disk
- Mask detected text blocks for iterative OCR
- `MeanShift` clustering of `SIFT` descriptor matches in masked image to hypothesize new dialog bounding boxes (see the second sketch after this list)
- `KMeans` clustering of pixels under flood-fill seed mask to extract dominant color
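To make a few of these steps concrete, here is a rough OpenCV sketch of the binarization, flood fill, and mask morphing stages. The file name, seed coordinate, and kernel/threshold values are illustrative assumptions, not the project's actual settings:

```python
# Sketch of three steps from the list above: threshold + de-noise to binary,
# flood fill a speech bubble from a seed coordinate, and morph (dilate)
# the bubble mask so it consumes the text strokes within.
import cv2
import numpy as np

gray = cv2.imread("pages/sample.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# 1. De-noise, then threshold to a binary image (Otsu picks the cutoff).
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
_, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 2. Flood fill from a seed point (e.g. a SIFT key point cluster center).
#    floodFill's mask must be 2 px larger than the image on each axis.
seed = (320, 180)  # hypothetical cluster center, (x, y)
mask = np.zeros((gray.shape[0] + 2, gray.shape[1] + 2), np.uint8)
cv2.floodFill(binary.copy(), mask, seed, 255, loDiff=5, upDiff=5)
bubble_mask = mask[1:-1, 1:-1] * 255  # trim border, scale to 0/255

# 3. Dilate the bubble mask to swallow the text inside the bubble.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
bubble_mask = cv2.dilate(bubble_mask, kernel, iterations=2)

cv2.imwrite("bubble_mask.png", bubble_mask)
```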
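And a similarly hedged sketch of the two clustering steps, using scikit-learn with synthetic stand-in data in place of real SIFT match coordinates and mask pixels:

```python
# Sketch of the clustering steps: MeanShift over matched SIFT key point
# coordinates to hypothesize dialog boxes, and KMeans over pixels under
# the flood-fill seed mask to extract a dominant color.
# All inputs here are synthetic stand-ins for real match/mask data.
import numpy as np
from sklearn.cluster import KMeans, MeanShift

# (x, y) locations of SIFT descriptor matches in the masked image
match_points = np.random.rand(200, 2) * 1000  # stand-in data

ms = MeanShift(bandwidth=80)  # bandwidth is a guessed, tunable value
ms.fit(match_points)
for center in ms.cluster_centers_:
    print("candidate dialog box seed:", center)

# Pixel colors (BGR) sampled under the flood-fill seed mask
pixels = np.random.randint(0, 256, size=(500, 3))  # stand-in data

km = KMeans(n_clusters=3, n_init=10).fit(pixels)
# The dominant color is the center of the largest cluster.
dominant = km.cluster_centers_[np.bincount(km.labels_).argmax()]
print("dominant color (BGR):", dominant.astype(int))
```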