diff --git a/README.md b/README.md
index 007ec80f..f3f6dcef 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ This repo implements a pre-processing pipeline for the following documents. Curr
 | Category | Document Types |
 |-----------|-------------------------------|
 | Plaintext | `.txt`, `.eml`, `.msg`, `.xml`, `.html`, `.md`, `.rst`, `.json`, `.rtf` |
-| Images | `.jpeg`, `.png` |
+| Images | `.jpeg`, `.png`, `.heic` |
 | Documents | `.doc`, `.docx`, `.ppt`, `.pptx`, `.pdf`, `.odt`, `.epub`, `.csv`, `.tsv`, `.xlsx` |
 | Zipped | `.gz` |
 
@@ -149,7 +149,7 @@ To extract the table structure from PDF files using the `hi_res` strategy, ensur
 #### Skip Table Extraction
 
 Currently, we provide support for enabling and disabling table extraction for file types other than PDF files. Set parameter `skip_infer_table_types` to specify the document types that you want to skip table extraction with. By default, we skip table extraction
-for PDFs and Images, which are `pdf`, `jpg` and `png`. Again, please note that table extraction only works with `hi_res` strategy. For example, if you don't want to skip table extraction for images, you can pass an empty value to `skip_infer_table_types` with:
+for PDFs and Images, which are `pdf`, `jpg`, `heic`, and `png`. Again, please note that table extraction only works with `hi_res` strategy. For example, if you don't want to skip table extraction for images, you can pass an empty value to `skip_infer_table_types` with:
 
 ```
 curl -X 'POST' \
diff --git a/prepline_general/api/general.py b/prepline_general/api/general.py
index 037bc8af..3d3cb0fc 100644
--- a/prepline_general/api/general.py
+++ b/prepline_general/api/general.py
@@ -63,7 +63,7 @@ def is_compatible_response_type(media_type: str, response_type: type) -> bool:
 
 
 DEFAULT_MIMETYPES = (
-    "application/pdf,application/msword,image/jpeg,image/png,text/markdown,"
+    "application/pdf,application/msword,image/jpeg,image/png,image/heic,text/markdown,"
     "text/x-markdown,text/html,"
     "application/vnd.openxmlformats-officedocument.wordprocessingml.document,"
     "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,"