A novel unsupervised end-to-end manga translation & inpainting pipeline.
See paper here.
The source for the three required deliverables (the proposal and two updates) is broken down into three parts under `docs/src`. The files are organized like so under `docs`:
```
.
├── compile.py
├── final.md
├── proposal.md
├── src
│   ├── final
│   │   ├── images
│   │   ├── index.md
│   │   └── sections
│   ├── proposal
│   │   ├── images
│   │   ├── index.md
│   │   └── sections
│   └── update
│       ├── images
│       ├── index.md
│       └── sections
│           ├── abstract.md
│           ├── approach.md
│           ├── conclusion.md
│           ├── experiments_and_results.md
│           ├── introduction.md
│           ├── qualitative_results.md
│           └── references.md
└── update.md
```
- Click here to see the proposal.
- Click here to see the first midterm update.
- Click here to see the (second) final update.
There are three versions, `proposal`, `update`, and `final`; the source files for each are located under its respective directory in `docs/src`. Each section of each version has its own file under that version's `sections` directory.
To work on a section, edit the individual markdown files there, not the rendered versions.
When you are done, execute the following script to recompile and commit. Make sure you are in the `docs` directory and have `python3` available. The compiler will generate the final paper for each version (`proposal.md`, `update.md`, `final.md`) and put them under `docs`:

```sh
chmod +x compile.py && ./compile.py
```
To add images, put them under the `images` directory for the correct version. Yes, each version has its own `images` directory. To link them from markdown, use the relative path, e.g. `![Alt Text](../images/<filename>)`.
In `index.md` for the version you are working on, first add the following line (order matters):

```
[//]: # "<section_name>.md"
```

Then, under the `sections` directory for that version, create the corresponding file `<section_name>.md` and edit the content of the section there.
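For example, the `index.md` for `update` would contain one such line per section file, in the order the sections should appear in the compiled paper. Using the section files from the tree above (the ordering shown here is illustrative):

```
[//]: # "abstract.md"
[//]: # "introduction.md"
[//]: # "approach.md"
```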
Please Read

It is required to submit a self-contained copy of the website; a GitHub Pages link won't do. However, there is a workaround: after each push to the repo, the GitHub site is automatically regenerated. Wait for the generation process to finish (you will see a green check mark next to the commit), then run the following command to generate a static version of the site:

```sh
wget -P site -mpck --user-agent="" -e robots=off --wait 1 -E https://jiachenren.github.io/cs4476-cv-project/
```

Zip the folder `site/jiachenren.github.io/cs4476-cv-project` (generated by the above command) and submit that.
We are using both self-collected data and the eBDtheque database.
The self-collected data consist of several manga pages crawled from different websites; they are used purely for research purposes. Currently, we have uploaded two chapters from two different comics on the romanized Indonesian manga site sektekomik to serve as our system's test data.
eBDtheque is the state-of-the-art manga database with ground-truth pixel-level labelling for panels and speech bubbles. It contains 100 pages in total and is used in most of the relevant research pertaining to information retrieval (IR) from manga.
If you are part of this project, contact Jiachen for the database login credentials; otherwise, request access from the owner here.
This project makes use of Google's Tesseract OCR for text recognition. In order for the system to run successfully, please install the command line tool `tesseract` and add it to your PATH. For macOS, just run:

```sh
brew install tesseract
```

For other systems, refer to this guide. When you install tesseract, you might encounter some non-fatal errors; just ignore them. You'll be fine as long as you have the final binary.
In the project directory (assuming that you have your venv created), run:

```sh
pip3 install pytesseract
```
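As a quick sanity check that the install worked, a minimal `pytesseract` sketch along these lines should run. This is not this repo's actual entry point; the page path is a placeholder:

```python
# Minimal sketch: OCR a page with pytesseract.
# "pages/sample.png" is a placeholder path, not a file in this repo.
from PIL import Image
import pytesseract

page = Image.open("pages/sample.png")
# image_to_data returns word-level bounding boxes and confidences,
# which is the kind of output text block detection builds on
data = pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)
for i, word in enumerate(data["text"]):
    if word.strip() and float(data["conf"][i]) > 0:
        x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
        print(f"{word!r} at x={x}, y={y}, w={w}, h={h}")
```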
- Preprocess input image by applying threshold and de-noise to convert to binary (see the first sketch after this list)
- `SIFT` related functionalities for extracting features from recognized text blocks
- Group contours in detected text blocks for character level segmentation
- Flood fill of speech bubbles using `SIFT` key point cluster centers as seeding coordinates
- Morphing of speech bubble binary mask to consume texts within
- Highlight text blocks
- Converting between color spaces, writing back to disk
- Mask detected text blocks for iterative OCR
- `MeanShift` clustering of `SIFT` descriptor matches in masked image to hypothesize new dialog bounding boxes (see the second sketch after this list)
- `KMeans` clustering of pixels under flood-fill seed mask to extract dominant color
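To make a few of these steps concrete, here is a rough OpenCV sketch of the binarization, flood fill, and mask morphing stages. The file name, seed coordinate, and kernel/threshold values are illustrative assumptions, not the project's actual settings:

```python
# Sketch of three steps from the list above: threshold + de-noise to binary,
# flood fill a speech bubble from a seed coordinate, and morph (dilate)
# the bubble mask so it consumes the text strokes within.
import cv2
import numpy as np

gray = cv2.imread("pages/sample.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# 1. De-noise, then threshold to a binary image (Otsu picks the cutoff).
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
_, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 2. Flood fill from a seed point (e.g. a SIFT key point cluster center).
#    floodFill's mask must be 2 px larger than the image on each axis.
seed = (320, 180)  # hypothetical cluster center, (x, y)
mask = np.zeros((gray.shape[0] + 2, gray.shape[1] + 2), np.uint8)
cv2.floodFill(binary.copy(), mask, seed, 255, loDiff=5, upDiff=5)
bubble_mask = mask[1:-1, 1:-1] * 255  # trim border, scale to 0/255

# 3. Dilate the bubble mask to swallow the text inside the bubble.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
bubble_mask = cv2.dilate(bubble_mask, kernel, iterations=2)

cv2.imwrite("bubble_mask.png", bubble_mask)
```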
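And a similarly hedged sketch of the two clustering steps, using scikit-learn with synthetic stand-in data in place of real SIFT match coordinates and mask pixels:

```python
# Sketch of the clustering steps: MeanShift over matched SIFT key point
# coordinates to hypothesize dialog boxes, and KMeans over pixels under
# the flood-fill seed mask to extract a dominant color.
# All inputs here are synthetic stand-ins for real match/mask data.
import numpy as np
from sklearn.cluster import KMeans, MeanShift

# (x, y) locations of SIFT descriptor matches in the masked image
match_points = np.random.rand(200, 2) * 1000  # stand-in data

ms = MeanShift(bandwidth=80)  # bandwidth is a guessed, tunable value
ms.fit(match_points)
for center in ms.cluster_centers_:
    print("candidate dialog box seed:", center)

# Pixel colors (BGR) sampled under the flood-fill seed mask
pixels = np.random.randint(0, 256, size=(500, 3))  # stand-in data

km = KMeans(n_clusters=3, n_init=10).fit(pixels)
# The dominant color is the center of the largest cluster.
dominant = km.cluster_centers_[np.bincount(km.labels_).argmax()]
print("dominant color (BGR):", dominant.astype(int))
```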