PDF to Markdown Converter

This tool helps you convert complex PDF files (with columns, tables, formulas, charts, flowcharts, etc.) into clean Markdown using Google's Gemini vision models, so you don't even need OCR. It's designed to be simple to use, even if you're not familiar with programming.

What Does It Do?

Converts PDF files to Markdown format
Keeps the original document structure
Handles tables, formulas using LaTeX, and special characters
Processes multiple PDF files at once
Creates clean, readable Markdown files

Before You Start

You'll need:
- Python installed on your computer (version 3.7 or newer)
- A Google Gemini API key (you can get one from Google AI Studio)
Required Python packages:
```
pip install google-generativeai
```

Setup Instructions

Download the pdf2md.py script to your computer
Open the script in a text editor and add your Google API key:
- Find this section at the top of the file:
```
API_KEY = "INSERT YOUR API KEY HERE"
```
- Replace "INSERT YOUR API KEY HERE" with your actual API key
- Save the file
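The key authenticates the script's calls to Gemini through the `google-generativeai` package. Below is a minimal sketch of one conversion. It assumes the PDF is sent to the model inline. The prompt text and the helper names `build_request` and `convert_pdf` are hypothetical and are not taken from pdf2md.py.

```python
from pathlib import Path

# Placeholder from the setup step; replace it with your real key.
API_KEY = "INSERT YOUR API KEY HERE"

# Hypothetical prompt; the real script's wording may differ.
PROMPT = (
    "Convert this PDF to Markdown. Preserve headings, lists and tables, "
    "write formulas as LaTeX, and return only the Markdown."
)


def build_request(pdf_bytes: bytes, prompt: str = PROMPT) -> list:
    """Pack the PDF as an inline data part, followed by the text instruction."""
    return [{"mime_type": "application/pdf", "data": pdf_bytes}, prompt]


def convert_pdf(pdf_path: Path, model_name: str, api_key: str = API_KEY) -> str:
    """Send one PDF to a Gemini model and return its Markdown reply."""
    # Imported here so build_request works even without the package installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(build_request(pdf_path.read_bytes()))
    return response.text
```

For example, `convert_pdf(Path("report.pdf"), "gemini-1.5-flash")` would return the Markdown text. The model name in that call is only an example. Inline data suits small documents. For larger files, the library also provides `genai.upload_file`.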

How to Use

Put the script (pdf2md.py) in the same folder as your PDF files
Open a command prompt or terminal in that folder
Run the script:
```
python pdf2md.py
```

On macOS, use `python3` instead:
```
python3 pdf2md.py
```

Choose a model when prompted:
- Press 1 for the fastest and cheapest conversion (good for simple documents)
- Press 2 for the highest quality (best for complex documents)
- Press 3 for the excellent experimental model (check its current name on Google's API site and update the script if it differs)
Wait for the conversion to finish
- The script will create an "output" folder
- Your converted Markdown files will be saved there
- Each file will have the same name as the original PDF but with .md extension
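The steps above come down to three small pieces of logic: mapping the number you type to a model, finding the PDFs in the folder, and naming each output file. The sketch below rests on some assumptions. The model names are examples only, so check Google's current list. `choose_model` and `plan_outputs` are hypothetical helpers, not the script's own code.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical menu: the model names are examples, not necessarily the ones
# pdf2md.py uses. Experimental names change often, so check Google's list.
MODELS = {
    "1": "gemini-1.5-flash",         # fastest and cheapest
    "2": "gemini-1.5-pro",           # highest quality
    "3": "EXPERIMENTAL-MODEL-NAME",  # replace with the current experimental name
}


def choose_model(choice: str) -> str:
    """Map the number typed at the prompt to a model name."""
    try:
        return MODELS[choice.strip()]
    except KeyError:
        raise ValueError(f"Please enter one of: {', '.join(MODELS)}") from None


def plan_outputs(folder: Path) -> List[Tuple[Path, Path]]:
    """Pair every PDF in `folder` with output/<same name>.md."""
    out_dir = folder / "output"
    out_dir.mkdir(exist_ok=True)
    pdfs = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    # "Report.PDF" becomes output/Report.md
    return [(pdf, out_dir / pdf.with_suffix(".md").name) for pdf in pdfs]
```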

Features

Easy model selection with numbered options
Processes multiple PDFs at the same time
Creates an organized output folder
Shows progress as it works
Provides a summary when finished
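A thread pool is a good fit for processing several PDFs at once, because each conversion spends most of its time waiting on the network. The sketch below assumes a `convert` function that turns one PDF into Markdown, such as a Gemini call. `convert_all` is a hypothetical name, and the script itself may organize this differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def convert_all(
    jobs: List[Tuple[Path, Path]],
    convert: Callable[[Path], str],
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Convert several PDFs concurrently, print progress, return a summary.

    `jobs` holds (pdf_path, md_path) pairs; `convert` turns one PDF into
    Markdown text.
    """
    summary: Dict[str, List[str]] = {"ok": [], "failed": []}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start every conversion, remembering which job each future belongs to.
        futures = {pool.submit(convert, pdf): (pdf, md) for pdf, md in jobs}
        # Handle results in completion order so progress shows up promptly.
        for done, future in enumerate(as_completed(futures), start=1):
            pdf, md = futures[future]
            try:
                md.write_text(future.result(), encoding="utf-8")
                summary["ok"].append(pdf.name)
                status = "done"
            except Exception as exc:  # keep going if one file fails
                summary["failed"].append(pdf.name)
                status = f"failed: {exc}"
            print(f"[{done}/{len(jobs)}] {pdf.name}: {status}")
    print(f"Converted {len(summary['ok'])}, failed {len(summary['failed'])}")
    return summary
```

If one file fails, it is recorded in the summary and the others keep going. Keep `max_workers` small on the free tier, because parallel requests all count against the same API quota.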

Troubleshooting

If you run into problems:

Make sure your API key is correctly entered in the script
Check that the Google model names in the script are current, and correct them if they differ
Check that your PDF files are in the same folder as the script
Ensure you have an internet connection (needed for the Google API)
Make sure you've installed all required Python packages
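Several of these checks can be automated. The hypothetical `preflight` helper below is not part of pdf2md.py. It flags a placeholder API key, a folder without PDFs, and a missing `google-generativeai` package. It does not test the internet connection or the model names.

```python
import importlib.util
from pathlib import Path
from typing import List

PLACEHOLDER = "INSERT YOUR API KEY HERE"


def preflight(api_key: str, folder: Path) -> List[str]:
    """Return a list of likely setup problems (an empty list means none found)."""
    problems = []
    key = api_key.strip()
    if not key or key == PLACEHOLDER:
        problems.append("API key is missing or still the placeholder")
    if not any(p.is_file() and p.suffix.lower() == ".pdf" for p in folder.iterdir()):
        problems.append(f"no PDF files found in {folder}")
    try:
        installed = importlib.util.find_spec("google.generativeai") is not None
    except ModuleNotFoundError:  # the parent 'google' package is absent
        installed = False
    if not installed:
        problems.append("google-generativeai is not installed")
    return problems
```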

Notes

The conversion quality depends on the PDF's structure and content
Some complex layouts might not convert perfectly, and long, complex multi-column documents may need to be split into several smaller PDFs. Based on testing, 5-6 pages is the maximum for one complex document if you're using the free tier of the Google API
The script needs an internet connection to work
Each PDF conversion uses your Google API quota, so choose the model wisely
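You can script the splitting of a long document into smaller PDFs. The sketch below assumes the third-party `pypdf` package (`pip install pypdf`), which pdf2md.py does not require. It uses 5 pages per part by default, following the limit noted above. `page_ranges` and `split_pdf` are hypothetical names.

```python
from pathlib import Path
from typing import List, Tuple


def page_ranges(total_pages: int, chunk_size: int = 5) -> List[Tuple[int, int]]:
    """Split page indices 0..total_pages-1 into half-open (start, stop) ranges."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(pdf_path: Path, chunk_size: int = 5) -> List[Path]:
    """Write report_part1.pdf, report_part2.pdf, ... next to the original."""
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    reader = PdfReader(pdf_path)
    parts = []
    ranges = page_ranges(len(reader.pages), chunk_size)
    for i, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for n in range(start, stop):
            writer.add_page(reader.pages[n])
        out = pdf_path.with_name(f"{pdf_path.stem}_part{i}.pdf")
        with open(out, "wb") as fh:
            writer.write(fh)
        parts.append(out)
    return parts
```

Move the original PDF out of the folder before running pdf2md.py. Otherwise the script converts it along with its parts.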

Need Help?

If you have questions or run into problems:

Check the troubleshooting section above
Make sure you followed all setup steps
Create an issue on GitHub if you need more help

PetrAPConsulting/pdf2md
