issue#413 potential solution #425

Merged
chrismattmann merged 2 commits into chrismattmann:master from Zili-Yang-z:issue413_attempt3
Apr 6, 2025

@Zili-Yang-z Zili-Yang-z commented Mar 25, 2025

What this PR does:
A potential solution for issue #413.

The current use of pkg_resources raises a DeprecationWarning during import. pkg_resources.declare_namespace is deprecated in favor of the implicit namespace packages introduced in PEP 420, and pkg_resources itself is deprecated in setuptools. pkgutil is recommended as a lightweight, standard-library alternative.

This PR replaces the deprecated pkg_resources.declare_namespace with pkgutil.extend_path for namespace declaration in tika/__init__.py.

Edited Files:

  • tika/__init__.py: Replaced pkg_resources.declare_namespace(__name__) with pkgutil.extend_path(__path__, __name__) and removed the fallback for pkg_resources.
  • setup.py: Added namespace_packages=['tika'].
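
For readers unfamiliar with pkgutil-style namespace packages, here is a minimal, self-contained sketch of what the two-line declaration does. It is not the tika code itself: the package name demo_ns_pkg, the module names alpha and beta, and the helper functions are all hypothetical, and the package is built in temporary directories purely for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The two lines a pkgutil-style namespace package puts in its __init__.py.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def write_portion(root: Path, package: str, module: str) -> None:
    """Create root/package/ with the namespace __init__.py and one module."""
    pkg_dir = root / package
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(NAMESPACE_INIT)
    (pkg_dir / f"{module}.py").write_text(f"NAME = {module!r}\n")


def load_split_package(package: str = "demo_ns_pkg") -> list:
    """Import modules of one package that live in two separate directories.

    The package name is hypothetical; it only needs to be unused elsewhere.
    """
    with tempfile.TemporaryDirectory() as tmp:
        roots = [Path(tmp) / "site_a", Path(tmp) / "site_b"]
        write_portion(roots[0], package, "alpha")
        write_portion(roots[1], package, "beta")
        # Put both roots on sys.path so extend_path can discover each portion.
        sys.path[:0] = [str(r) for r in roots]
        try:
            importlib.invalidate_caches()
            alpha = importlib.import_module(f"{package}.alpha")
            beta = importlib.import_module(f"{package}.beta")
            return [alpha.NAME, beta.NAME]
        finally:
            # Restore sys.path before the temporary directories are removed.
            for r in roots:
                sys.path.remove(str(r))
```

Calling load_split_package() imports both submodules even though they sit under different sys.path entries, because extend_path adds every matching directory to the package's __path__. No pkg_resources import is involved.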

Updated on Apr 1, 2025:
Removed namespace_packages=['tika'] from setup.py.
setup.py remains the same as the original version.

@chrismattmann (Owner) commented

thanks @Zili-Yang-z how can I test that this resolves the error?

@chrismattmann chrismattmann self-assigned this Mar 31, 2025
@chrismattmann chrismattmann added question py3 dependencies Pull requests that update a dependency file labels Mar 31, 2025
@chrismattmann chrismattmann added this to the tika-next milestone Mar 31, 2025
@Zili-Yang-z Zili-Yang-z (Author) commented Apr 1, 2025

> thanks @Zili-Yang-z how can I test that this resolves the error?

You're welcome!

I'm testing by importing tika in Google Colab.
When importing the original tika with warnings.simplefilter('always'), I got:

/usr/local/lib/python3.11/dist-packages/pkg_resources/__init__.py:3154: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('sphinxcontrib').
...
Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
__import__('pkg_resources').declare_namespace(__name__)

Then I zipped and installed the modified tika that uses pkgutil.extend_path in __init__.py. After importing it with warnings.simplefilter('always'), the DeprecationWarnings disappeared as expected.
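
The same check can be run outside a notebook. The sketch below is one way to do it, not the exact code from the Colab notebooks: the helper name deprecations_on_import is hypothetical, and it assumes the module under test is installed in the current environment.

```python
import importlib
import sys
import warnings


def deprecations_on_import(module_name: str) -> list:
    """Import module_name fresh and return the DeprecationWarning messages raised."""
    # Drop cached copies so the package's top-level code runs again.
    stale = [
        name for name in sys.modules
        if name == module_name or name.startswith(module_name + ".")
    ]
    for name in stale:
        del sys.modules[name]
    importlib.invalidate_caches()

    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones hidden by the default filters.
        warnings.simplefilter("always")
        importlib.import_module(module_name)

    return [
        str(w.message) for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]

# Example (requires tika to be installed):
#   print(deprecations_on_import("tika"))
```

With the original package the returned list should include the declare_namespace message quoted above. With the patched package that message should no longer appear, though other installed packages that still call pkg_resources may emit their own warnings.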

The test notebook for the original tika:
https://colab.research.google.com/drive/1NQg0HL4cFWra3yDW3sqEDw1sH26artkp?usp=sharing

The test notebook for the modified tika:
https://colab.research.google.com/drive/1rdiol2yxShLotQZQdjYIshmxWRMlhM4V?usp=sharing

I can't attach the modified tika here because it exceeds the 25 MB limit. It contains exactly the changes made in this PR.

@Zili-Yang-z (Author) commented

@chrismattmann I hope it will help!

@chrismattmann (Owner) commented

LGTM! I will merge now @Zili-Yang-z !

@chrismattmann chrismattmann merged commit e3cc7aa into chrismattmann:master Apr 6, 2025