Release/v1.5.0 by jansaldo · Pull Request #75 · AymurAI/backend

jansaldo · 2026-02-20T12:54:03Z

No description provided.


* ✨ Add GPU-enabled Ollama service to compose stack
* 🔧 Add Make targets for managing Ollama service and models
* 🔧 Add launch configuration and task for starting Ollama service
* ✨ Implement LLM providers module with Ollama adapter and shared abstractions
* ✅ Add unit tests for LLM providers including DummyProvider and OllamaLLMProvider
* 📝 Document Ollama provider usage via notebook demo
* 🐛 Fix tokenizer encoding by removing unnecessary special tokens flag
* ♻️ Refactor chunk handling in LLMProvider to use _append_chunk method for consistency and improved readability
* ✨ Enhance Ollama provider docs and DRY response building for sync/async calls
* ♻️ Refactor OllamaLLMProvider to reuse AsyncClient instance for improved efficiency
* 📝 Add async examples to OllamaLLMProvider notebook
* ✅ Add async coverage for OllamaLLMProvider and tighten chunking tests
* ♻️ Refactor OllamaLLMProvider to remove async client caching and streamline client instantiation

* Update .gitignore to exclude entity disambiguation experiment directories and modify Jupyter notebook execution counts and output handling
* Refactor Makefile for improved service management and update .gitignore to exclude specific experiment directories. Add new Jupyter notebooks for entity disambiguation metrics and documentation.
* Adjust example data for consistency in entity representation.
* Refactor entity disambiguation notebooks to standardize attribute naming and improve metric evaluation. Update role attribute from 'rol' to 'role' for consistency across examples and documentation. Adjust evaluation function to return both score and metrics.
* Add evaluation metrics for entity disambiguation
  - Introduced new metrics module for evaluating entity disambiguation performance, including functions for alias normalization, Jaccard similarity, and greedy matching.
  - Implemented main evaluation function to compute scores and metrics from gold and predicted entities.
  - Added Jupyter notebooks for practical examples and evaluation results, including normalized and non-normalized text evaluations.
  - Updated documentation to reflect changes in function signatures and outputs.
* 🔧 Expand Makefile: add API management targets (api-run, api-stop, api-logs, api-full-run) for smoother service control
* ♻️ Refactor metrics.py: clarify docstrings, align type hints, and polish logging
* ✏️ Fix role attribute reference in evaluation metric documentation for consistency
* 🔧 Add CanonicalEntities class to represent a collection of canonical entities
* 📝 Update entity disambiguation notebooks: clean up imports, adjust paths, and streamline API calls for improved clarity and functionality
---------
Co-authored-by: padonizetti
Co-authored-by: jansaldo
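The metrics commit above names alias normalization, Jaccard similarity, and greedy matching. The following is a minimal sketch of how those three pieces could fit together, assuming each entity is represented as a collection of alias strings. The function names, the accent-stripping normalization, and the 0.5 threshold are illustrative assumptions, not the contents of the repository's metrics module.

```python
import unicodedata


def normalize_alias(alias: str) -> str:
    """Lowercase, strip accents, and collapse whitespace (illustrative normalization)."""
    decomposed = unicodedata.normalize("NFKD", alias)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets score 0.0 here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_match(gold, pred, threshold=0.5):
    """Pair gold and predicted entities one-to-one, highest similarity first."""
    gold_sets = [{normalize_alias(x) for x in g} for g in gold]
    pred_sets = [{normalize_alias(x) for x in p} for p in pred]
    # Score every gold/pred pair, then walk the pairs from best to worst.
    scored = sorted(
        (
            (jaccard(g, p), i, j)
            for i, g in enumerate(gold_sets)
            for j, p in enumerate(pred_sets)
        ),
        key=lambda t: -t[0],
    )
    used_gold, used_pred, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs are all below the (assumed) threshold
        if i in used_gold or j in used_pred:
            continue  # each entity may be matched at most once
        used_gold.add(i)
        used_pred.add(j)
        pairs.append((i, j, score))
    return pairs
```

Greedy matching is simple and fast but not guaranteed to find the globally best assignment; whether the actual module accepts that trade-off is not stated in the commit message.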

* ✨ feat: Add Streamlit app for document summarization experiments
* Add statistical analysis notebook for summarization performance evaluation (visualized gaps in performance between CPU and CUDA models, LLM hallucinations)
* 🎨 Quantitative and qualitative analysis of summaries: descriptive analysis by features, model comparison, gap analysis (CPU-CUDA), garbage detection/outliers, analysis by document, visualizations.
* 🔒️ clear all outputs
* 🎨 Improve Summary Analysis per document: cuda vs llama (same model), gemma vs llama (cuda), same document phi3 vs. phi4. Token per second gap.
* ✨ Add YAML utility functions for loading and saving data
* Merge dev into main for v1.1.12 (#57)
* Update README.md
* 🐛 bugfix: Fix XML special character escaping in DocAnonymizer
* ➕ build(deps): Add python-docx package
* ✨ feat: Add watermark and hyperlink functionality to document anonymization
* ✨ feat: Install Archivo font in Dockerfile
* 🎨 refactor: Improve Dockerfile structure and comments for clarity
* ⏪ revert: Remove Archivo font installation from Dockerfile
* 🔖 feat: Update aymurai package version to 1.1.11 in uv.lock
* 🐛 Improve get_extension logic to fix document extraction issues on Windows and remove python-magic dependency
* 🔧 Update Dockerfile to use 'bullseye' variant for Python images for improved compatibility
* 🔧 Update Makefile targets for improved Docker workflow
* 🔖 feat: Bump aymurai package version to 1.1.12
* ♻️ Harden get_extension with header scans and zip safeguards
* 🔧 Extend document extraction timeout to 30s
* 🔧 Refactor Docker workflow to build and push images using docker/build-push-action
* 🔧 Fix workflow step order to correctly extract tag name before building Docker images
* 🔧 Remove tag extraction step and use github.ref_name directly for Docker image builds
* ⏪ Revert Docker workflow to extract tag name and use it for image versioning
* Update .github/workflows/build-docker-image.yml
  Co-authored-by: Copilot <[email protected]>
* ✏️ Remove incomplete comment
  Co-authored-by: Copilot <[email protected]>
---------
Co-authored-by: jed <[email protected]>
Co-authored-by: Copilot <[email protected]>
* ✨ Add GPU-enabled Ollama service to compose stack
* 🔧 Add Make targets for managing Ollama service and models
* 🔧 Add launch configuration and task for starting Ollama service
* 🔧 Add system prompts for document summarization
* 📝 Add summarization benchmark notebook
* 🚚 Move statistical analysis notebook to summarization folder
* ✨ Implement LLM providers module with Ollama adapter and shared abstractions
* ✅ Add unit tests for LLM providers including DummyProvider and OllamaLLMProvider
* 📝 Document Ollama provider usage via notebook demo
* 🐛 Fix tokenizer encoding by removing unnecessary special tokens flag
* ♻️ Refactor chunk handling in LLMProvider to use _append_chunk method for consistency and improved readability
* ✨ Enhance Ollama provider docs and DRY response building for sync/async calls
* ♻️ Refactor OllamaLLMProvider to reuse AsyncClient instance for improved efficiency
* 📝 Add async examples to OllamaLLMProvider notebook
* ✅ Add async coverage for OllamaLLMProvider and tighten chunking tests
* ➕ Add tiktoken dependency to pyproject.toml and update version in uv.lock
* 🔧 Enhance summarization prompts with additional information extraction and entity identification details
* ✨ Add LLM summarization router
* 📝 Add notebook for the summarization endpoint
* ✏️ Fix formatting of keys in summarization defaults for consistency
* ➕ Add dspy dependency and update related packages in project configuration
* 🚧 WIP: Add prompt optimization notebook for summarization experiments
---------
Co-authored-by: Sofi <[email protected]>
Co-authored-by: jed <[email protected]>
Co-authored-by: Copilot <[email protected]>
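The squashed history above mentions hardening get_extension with header scans and zip safeguards after dropping python-magic. Below is a rough sketch of what content-based detection can look like using only the standard library. The function name `sniff_extension`, the entry limit, and the set of formats covered are assumptions for illustration and are not taken from the repository.

```python
from __future__ import annotations

import io
import zipfile

MAX_ZIP_ENTRIES = 10_000  # assumed safeguard against pathological archives


def sniff_extension(data: bytes) -> str | None:
    """Guess a document extension from file contents rather than the filename."""
    # PDF files begin with the "%PDF-" signature.
    if data.startswith(b"%PDF-"):
        return ".pdf"
    # DOCX files are zip archives; a zip local file header starts with "PK\x03\x04".
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return None  # looked like a zip but could not be opened
        if len(names) > MAX_ZIP_ENTRIES:
            return None  # refuse to classify suspiciously large archives
        if "word/document.xml" in names:
            return ".docx"
    return None
```

Reading only the member names avoids decompressing any archive content, which is one way to keep the check cheap on untrusted uploads.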

#64)
* Merge dev into main for v1.1.12 (#57)
* Update README.md
* 🐛 bugfix: Fix XML special character escaping in DocAnonymizer
* ➕ build(deps): Add python-docx package
* ✨ feat: Add watermark and hyperlink functionality to document anonymization
* ✨ feat: Install Archivo font in Dockerfile
* 🎨 refactor: Improve Dockerfile structure and comments for clarity
* ⏪ revert: Remove Archivo font installation from Dockerfile
* 🔖 feat: Update aymurai package version to 1.1.11 in uv.lock
* 🐛 Improve get_extension logic to fix document extraction issues on Windows and remove python-magic dependency
* 🔧 Update Dockerfile to use 'bullseye' variant for Python images for improved compatibility
* 🔧 Update Makefile targets for improved Docker workflow
* 🔖 feat: Bump aymurai package version to 1.1.12
* ♻️ Harden get_extension with header scans and zip safeguards
* 🔧 Extend document extraction timeout to 30s
* 🔧 Refactor Docker workflow to build and push images using docker/build-push-action
* 🔧 Fix workflow step order to correctly extract tag name before building Docker images
* 🔧 Remove tag extraction step and use github.ref_name directly for Docker image builds
* ⏪ Revert Docker workflow to extract tag name and use it for image versioning
* Update .github/workflows/build-docker-image.yml
* ✏️ Remove incomplete comment
---------
* ♻️ refactor: Restructure USEM module with factory pattern and multiple encoder backends
  - Add BaseSentenceEncoder abstract base class for encoder interface
  - Implement factory pattern with EncoderType enum and create_encoder function
  - Add sentence-transformers encoder implementations (DistilUSE, MultilingualMiniLM)
  - Move TensorFlow implementation to tensorflow_encoder.py
  - Add lazy loading for encoder implementations via __getattr__
  - Add auto-detection for Apple Silicon compatibility (defaults
* 🚚 Rename test sentence encoders mac notebook
* 📌 Sync dependencies
---------
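The USEM refactor above describes a factory pattern built around BaseSentenceEncoder, an EncoderType enum, and a create_encoder function. The sketch below shows the general shape of such a factory. Only those three names come from the commit message; the method signature, the `DUMMY` member, the `DummyEncoder` backend, and the registry dict are hypothetical stand-ins so the example runs without any ML dependency.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from enum import Enum


class BaseSentenceEncoder(ABC):
    """Common interface every encoder backend is expected to implement."""

    @abstractmethod
    def encode(self, sentences: list[str]) -> list[list[float]]:
        ...


class EncoderType(str, Enum):
    DUMMY = "dummy"  # hypothetical member; real backends are not modeled here


class DummyEncoder(BaseSentenceEncoder):
    """Stand-in backend: encodes a sentence as [character count, word count]."""

    def encode(self, sentences: list[str]) -> list[list[float]]:
        return [[float(len(s)), float(len(s.split()))] for s in sentences]


# Registry mapping each enum member to the class that implements it.
_REGISTRY: dict[EncoderType, type[BaseSentenceEncoder]] = {
    EncoderType.DUMMY: DummyEncoder,
}


def create_encoder(encoder_type: EncoderType | str, **kwargs) -> BaseSentenceEncoder:
    """Instantiate the backend registered for the given type (enum member or its value)."""
    key = EncoderType(encoder_type)  # raises ValueError for unknown types
    return _REGISTRY[key](**kwargs)
```

Callers depend only on the abstract interface, so a backend can be swapped by changing the enum value, for example `create_encoder("dummy").encode(["hola mundo"])`.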


* 🔧 Configure VSCode Python env and Copilot scopes
* 🔧 Include resources/llm in .dockerignore
* 📌 Update dependencies in pyproject.toml and uv.lock
* 🔧 Update Dockerfile and devcontainer.json to install additional PDF tooling
* ♻️ Refactor Makefile and docker-compose.yml for improved service configuration and flexibility
* 🚧 FIXME: Remove DecisionConv1dBinRegex model from pipeline configuration for dependencies update compatibility
* 🔧 Set weights_only=False for torch.load compatibility
* ✨ Enhance PDF extraction with marker integration and improved text processing
* 🔧 Update run_safe_text_extraction to allow indefinite timeout by default
* ✨ Add warm_marker_models function to initialize marker-pdf artifacts at startup
* 🔥 Remove unused environment variables and rename TRANSFORMERS_CACHE to HF_HOME
* 🔧 Improve service stopping logic for Ollama and API services in Makefile
* 🔖 Bump aymurai package version to 2.0.0-alpha.1
* 🔧 Update HF_HOME path and remove HF_DATASETS_CACHE variable in .env.common
* 🔧 Update OLLAMA_HOST for GPU-enabled services to point to ollama-gpu
* 🔧 Simplify marker model warming logic by removing error handling
* ♻️ Refactor text extraction into modular format-specific extractors
* ✅ Add unit tests for document extraction and error handling
* ➕ Add marker-pdf stack and drop textract
* 🔧 Enhance PDF extraction with caching mechanism
* 📝 Improve cache utility functions with enhanced docstrings and type hints
* 🔧 Enhance cache key generation in PdfExtractor for improved stability and performance
* 🔖 Update aymurai package version to 2.0.0a2.dev9

* 🩹 Ensure consistent entity attributes in reformat_entity function and reorder imports
* 📝 Update subcategories exploration notebook
* ⚗️ Add TensorFlow deprecation experiment notebook
* ♻️ Refactor entity subcategorization: Remove USEMSubcategorizer, add SentenceTransformerSubcategorizer
  - Removed the USEMSubcategorizer implementation from `usem.py`.
  - Introduced new Jupyter notebooks for testing and evaluating the SentenceTransformerSubcategorizer.
  - Updated the pipeline configuration to utilize SentenceTransformerSubcategorizer with local embeddings instead of remote URLs.
* ♻️ Refactor download function: Replace gdown with requests for improved file downloading
* 🔥 Remove empty peft model module
* ➖ Remove TensorFlow and gdown dependencies from pyproject.toml
* 📌 Update uv.lock
* ♻️ Refactor sentence encoder module: Remove unused dependencies and streamline factory functions
* 🔖 Update aymurai package version to 2.0.0a3.dev9


…entity_disambiguation/core.py discarding the role assignment for release/v1.5.0 compatibility
🎨 Changed aymurai/api/endpoints/routers/anonymizer/anonymizer.py, discarding all validations related to LLM disambiguation for release/v1.5.0 compatibility
🎨 Minor changes in the remaining files related to experimentation with the release/v1.5.0 API


…74)
* Merge dev into main for v1.1.12 (#57)
* Update README.md
* 🐛 bugfix: Fix XML special character escaping in DocAnonymizer
* ➕ build(deps): Add python-docx package
* ✨ feat: Add watermark and hyperlink functionality to document anonymization
* ✨ feat: Install Archivo font in Dockerfile
* 🎨 refactor: Improve Dockerfile structure and comments for clarity
* ⏪ revert: Remove Archivo font installation from Dockerfile
* 🔖 feat: Update aymurai package version to 1.1.11 in uv.lock
* 🐛 Improve get_extension logic to fix document extraction issues on Windows and remove python-magic dependency
* 🔧 Update Dockerfile to use 'bullseye' variant for Python images for improved compatibility
* 🔧 Update Makefile targets for improved Docker workflow
* 🔖 feat: Bump aymurai package version to 1.1.12
* ♻️ Harden get_extension with header scans and zip safeguards
* 🔧 Extend document extraction timeout to 30s
* 🔧 Refactor Docker workflow to build and push images using docker/build-push-action
* 🔧 Fix workflow step order to correctly extract tag name before building Docker images
* 🔧 Remove tag extraction step and use github.ref_name directly for Docker image builds
* ⏪ Revert Docker workflow to extract tag name and use it for image versioning
* Update .github/workflows/build-docker-image.yml
  Co-authored-by: Copilot <[email protected]>
* ✏️ Remove incomplete comment
  Co-authored-by: Copilot <[email protected]>
---------
Co-authored-by: jed <[email protected]>
Co-authored-by: Copilot <[email protected]>
* WIP: feat(decision): ✨ integrate TinyEmbeddingBagClassifier for decision detection (#67)
* feat(decision): ✨ integrate TinyEmbeddingBagClassifier for decision detection
  - Introduced a new model class `DecisionEmbeddingBagBinRegex` using `TinyEmbeddingBagClassifier`.
  - Updated model loading and saving mechanisms to support safetensors format.
  - Added a new training notebook for the embedding bag classifier.
  - Modified the pipeline configuration to include the new model.
* ⚡️ Remove unidecode usage to avoid double normalization in model_input_from_text
* 📝 Add type hints and docstrings for clarity in DecisionEmbeddingBagBinRegex and TinyEmbeddingBagClassifier
* 🔧 Refactor import statements for safetensors to remove try-except block
* 🔥 Remove Conv1dTextClassifier, Tokenizer and SpanishTokenizer implementations
* 🐛 Fix gen_aymurai_entity call by removing unused category parameter
* 🔖 Update aymurai package version to 2.0.0a4.dev1
* 🔀 cherry-pick(decision): modernize decision model and upgrade ML dependencies
  Cherry-pick TinyEmbeddingBagClassifier (safetensors) replacing Conv1d model. Remove dead deps (torchtext, pytorch-lightning), upgrade torch to 2.x and flair to 0.15.1.
* 🐛 cherry-pick(fix): datapublic and anonymizer crash when use_cache is disabled
* test(infra): rewrite test infrastructure with architecture guide standards
  - Delete old test files (test_document_extract.py, test_anonymizer_predict.py, test_datapublic_predict.py)
  - Create new directory structure: tests/integration/pipelines/, tests/api/routers/{anonymizer,datapublic,misc}/
  - Rewrite tests/conftest.py:
  - Set env vars at module level (RESOURCES_BASEPATH=resources, SQLALCHEMY_DATABASE_URI=sqlite:///:memory:)
  - Remove torch mock and lazy loader
  - Direct imports from production code
  - Clean fixtures: db_engine (session-scoped), db_session (function-scoped), client (with dependency override)
  - Test data builders: build_data_item(), build_label(), build_anonymization_paragraph(), build_datapublic_paragraph()
  - Update pyproject.toml with [tool.pytest.ini_options]: strict-markers, integration/slow markers
  Verification: uv run python -c 'import tests.conftest' succeeds, pytest collection clean
* test(conftest): add pipeline loading helpers and mock factories for API tests
  Wave 2 complete: integration pipeline conftest + API router conftest
  Integration pipeline conftest:
  - PIPELINE_CONFIGS dict for flair-anonymizer and full-paragraph
  - load_test_pipeline() helper with print_config=False
  - Session-scoped fixtures for both pipelines (expensive model loading)
  - build_pipeline_input() test data builder
  - sample_text fixture with Spanish legal text
  API router conftest:
  - build_mock_pipeline() factory with MagicMock
  - Mock preprocess/predict_single/postprocess methods
  - build_processed_data_item() test data builder
  - Re-exports builders from root conftest
* test(api): add document extract endpoint tests with mocked extraction
* test(api): add anonymizer and datapublic endpoint tests with mocked pipelines
* test(integration): add pipeline integration tests for flair-anonymizer and full-paragraph
* ✅ test: refactor test infrastructure and add integration tests
  - Reorganize test conftest files to proper hierarchy (tests/api/conftest.py)
  - Add pytest to dependency groups in pyproject.toml
  - Refactor API router tests to use centralized fixtures and builders
  - Add real document extraction tests with DOCX/PDF generators
  - Improve pipeline integration tests with fixture-based stages
  - Fix label serialization to use model_dump(mode="json")
  - Update UUID generation for datapublic tests to use uuid.uuid5
  - Add cache path environment setup for integration tests
  - Clean up imports and remove unused dependencies
  - Remove empty test file (document_extract.py)
  This refactoring improves test maintainability, adds proper integration testing without excessive mocking, and establishes consistent test utilities across the codebase.
* 👷 ci(github): add pytest workflow for CI integration
  - Introduced a new GitHub Actions workflow for running pytest.
  - Configured to trigger on pull requests and manual dispatch.
  - Supports multiple OS and Python versions for comprehensive testing.
* 👷fix(tests): fix env variable DISKCACHE_ROOT
* 👷 ci(github): remove deprecated PR tests workflow & fix env variable
  - Deleted the old PR tests workflow file.
  - This cleanup helps streamline CI processes and reduces redundancy.
* ci(github): 👷 add pipeline download and integration tests to CI workflow
  - Introduced a new script for downloading pipelines.
  - Updated the pytest workflow to include running API and pipeline tests.
  - Enhanced test execution with improved output formatting and failure limits.
* fix(tests): 🐛 avoid context manager in TestClient to skip app startup
  - Changed TestClient usage to prevent app lifespan startup during tests.
  - Ensured proper cleanup by closing the client after use.
  - This improves test performance and reliability.
* 👷 ci(github): add RESOURCES_BASEPATH environment variable for pipeline tests
  - Added RESOURCES_BASEPATH to the environment variables for both downloading pipelines data and running pipeline tests.
  - This change ensures that the necessary resource paths are correctly set during the CI workflow execution.
* 👷 ci(github): update RESOURCES_BASEPATH for pipeline data download
  - Changed RESOURCES_BASEPATH from /tmp to resources in the pipeline download step.
  - Ensures the correct path is used for resource access during tests.
* chore(pyproject): 🔧 add environment markers for platform compatibility
  - Introduced required-environments for tool.uv to specify platform requirements.
  - Updated resolution-markers and required-markers in uv.lock for better dependency management.
  - Added tensorflow-io-gcs-filesystem with specific markers for Windows and Linux.
* ci(github): 👷 configure es_AR locale for Ubuntu runners
  - Added steps to configure the es_AR locale on Ubuntu.
  - Ensures proper locale settings for tests running in the CI environment.
* 👷 ci(github): add AYMURAI_CACHE_BASEPATH environment variable for pipeline tests
  - Introduced AYMURAI_CACHE_BASEPATH to the environment variables for both pipeline download and pipeline tests.
  - This change ensures that the correct cache path is utilized during the execution of the tests.
* 🐛 fix(dependencies): adjust textract dependency for platform compatibility
  - Added conditional dependency for textract based on the operating system.
  - Specified different sources for textract depending on whether the platform is Windows or not.
* 🔥 chore(opencode): remove opencode.json configuration file
  - Deleted the opencode.json file as it is no longer needed.
  - This change helps to clean up the repository and remove obsolete configurations.
* 🚚 Update pipeline path for datapublic in scripts, notebooks and tests
* 📝 docs: replace Black references with Ruff in CONTRIBUTING and Alembic hook examples
* 🔧 Add backslash to default CACHE_BASEPATH value
* 🔧 Update cache path retrieval to use settings for consistency
* ➖ Remove textract dependencies and update documentation for extract_document function
* ✅ Update integration tests and add new test cases for anonymizer and datapublic flows
* 🔥 chore(test): remove legacy /test dir and standardize sample doc path to /resources/data/sample/document-01.docx
* 🔧 Update UV_VERSION to latest in devcontainer Dockerfile
* 🔧 Update dependency installation command to include all groups
* 📌 Update uv.lock
* 🐛 Fix CACHE_BASEPATH env alias resolution for CI pipeline downloads
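The test refactor above switches UUID generation in the datapublic tests to uuid.uuid5. A minimal sketch of why that is useful, assuming the aim is reproducible identifiers across test runs: `uuid.uuid5` derives the UUID from a namespace and a name, so the same input always yields the same ID. The `stable_test_id` helper and the choice of `uuid.NAMESPACE_URL` are hypothetical.

```python
import uuid


def stable_test_id(name: str) -> uuid.UUID:
    """Derive a reproducible UUID from a name (hypothetical test helper)."""
    # uuid5 hashes namespace + name with SHA-1, so it is deterministic,
    # unlike uuid4, which is random on every call.
    return uuid.uuid5(uuid.NAMESPACE_URL, name)
```

With deterministic IDs, a fixture such as `stable_test_id("document-01/paragraph-0")` can be asserted against stored expectations without mocking the generator.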

* ✨ feat(extractors): use pymupdf layout for pdf text extraction
* ✨ feat(normalization): enhance document normalization to preserve paragraph structure
* 📝 docs: document default values for extractor and normalization helpers
* 🩹 fix(extractors): use pymupdf4llm.to_text with page_chunks for pdf paragraphs
* ♻️ Add DOCX and PDF anonymizer modules
  - Implemented DocxAnonymizer class to handle anonymization of DOCX documents by replacing sensitive data with label tokens. This includes functionality for unzipping documents, parsing XML, editing content, and adding watermarks.
  - Developed PdfAnonymizer class for anonymizing PDF documents, utilizing pymupdf for document manipulation. This includes layout parsing, font caching, redaction operations, and watermarking.
* 🔧 Enhance PDF and DOCX handling in anonymization process
* 📝 Update backend module references for document rendering in README
* ✅ Update tests to use DOCX format for document anonymization and enhance mock behavior
* ✨ Add end-to-end PDF anonymization notebook with PyMuPDF and AymurAI API
* ♻️ Rework PDF anonymization for precise spans and widget handling
* 🔧 Update model_dump calls to exclude None values for improved data handling
* 📝 Add docstrings to label replacement functions
* ♻️ Refactor watermark handling and optimize PDF token aliasing
* ✅ Add integration tests for merging fragmented numeric labels and excluding null alt attributes in PDF anonymization
* ➖ Remove opencv-python-headless dependency from project requirements
* ♻️ Implement paragraph splitting function to enhance document text extraction
* 🔧 Update dependency installation command to prevent Python downloads
* 🔥 Remove redundant tests for merging fragmented numeric labels and PDF anonymization
* ♻️ Refactor anonymizer tests to use DOCX format and enhance mock functionality
* 🔧 Add xfail marker for PDF extraction test on Windows due to tensor type issue
* ✨ Enhance PDF anonymization by adding cleanup rects, removing overlapping links, and scrubbing metadata
* 🔧 Remove redundant return statement in _label_replacement_text function
* ♻️ Refactor anonymization module: split pdf and docx internals by format
* ✅ Add integration tests for PDF and DOCX anonymizers, including metadata scrubbing and link preservation
* ✨ Add watermark layout adjustments to avoid footer content overlap in PDF anonymization
* ✅ Add integration test to ensure watermark is positioned away from footer content in PDF anonymization
* 🩹 Fix: read docx xml as utf-8 across platforms
* ✅ Add Windows-specific xfail marker for PDF tests and implement UTF-8 XML reading test
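The last fix above concerns reading DOCX XML as UTF-8 across platforms. A likely cause of such a bug, stated here as an assumption, is that Python's text-mode file reads fall back to the locale encoding when none is given, and that encoding is often not UTF-8 on Windows. The sketch below shows the explicit-encoding pattern; `read_document_xml` is a hypothetical helper, not the repository's function.

```python
import tempfile
import zipfile
from pathlib import Path


def read_document_xml(docx_path: Path) -> str:
    """Extract word/document.xml from a DOCX archive and read it as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(docx_path) as archive:
            archive.extract("word/document.xml", tmp)
        xml_path = Path(tmp) / "word" / "document.xml"
        # Passing encoding explicitly avoids depending on the platform's
        # locale default, which is what makes the read portable.
        return xml_path.read_text(encoding="utf-8")
```

The same rule applies when writing the edited XML back: the encoding passed on write should match the one used on read.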

codacy-production · 2026-04-20T17:31:47Z

Not up to standards ⛔

🔴 Issues 96 high · 3 medium · 1 minor

Alerts:
⚠ 100 issues (≤ 0 issues of at least minor severity)

Results:
100 new issues

| Category | Results |
| --- | --- |
| ErrorProne | 3 high |
| Security | 93 high · 3 medium · 1 minor |


🟢 Metrics 1119 complexity · 27 duplication

| Metric | Results |
| --- | --- |
| Complexity | 1119 |
| Duplication | 27 |


jansaldo and others added 30 commits August 22, 2025 11:57

➕ build(deps): Add langextract for text entity extraction

77ddef7

🚧 wip: Add langextract entity extraction experiment notebook

0bbf0d2

Merge branch 'dev' of https://github.com/AymurAI/backend into feature…

71cbd62

…/langextract

✨ feat: Enhance entity models with relation handling and canonical re…

c2bc1f2

…presentation

✨ feat: Add JSON serialization support and enhance utility functions

d19bb79

⬆️ Upgrade ML dependencies and refresh uv.lock

1c0edb5

🚧 wip: Update extraction examples in langextract notebook

fe35a4e

Merge branch 'dev' of https://github.com/AymurAI/backend into feature…

25070e2

…/langextract

📝 Add entity disambiguation notebook for canonical entity extraction

3d3d230

Merge branch 'release/v2.0.0' of github.com:AymurAI/backend into feat…

dabc47f

…ure/langextract

⬆️ Update dependencies: langextract to 1.1.0 and ollama to 0.6.1; add…

444194b

… openai extra for langextract

📝 Integrate custom OpenAI model for extraction and remove failing emp…

8b13aad

…ty example

📝 Update error message format in json_serial function for better read…

68eae78

…ability Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

♻️ Inline immediate return in get_pretty

4517601

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

🐛 Fix: Use json_serial in save_json

e96d8e4

Co-authored-by: Copilot <[email protected]>

🎨 Format json.dumps call in save_json for improved readability

c45a863

Merge pull request #58 from AymurAI/feature/langextract

6100447

Feature/langextract

Feature/ollama service (#59)

401a08b

* ✨ Add GPU-enabled Ollama service to compose stack * 🔧 Add Make targets for managing Ollama service and models * 🔧 Add launch configuration and task for starting Ollama service

🩹 Fix YAML key names in prompt defaults for summarization

60bf959

⏪ Rollback to previous torch and torchtext versions to avoid conflicts

f85d7cf

🩹 Fix: Add missing environment variable for OLLAMA_HOST in docker-com…

5ac37a0

…pose

📝 Add anonymization pipeline docs

c1a8a9a

🚧 WIP: Add Playwright PJN scraper

29b6082

📝 Add Jupyter notebook for entity disambiguation from pre-clustered v…

0dbc86d

…alidations

jansaldo and others added 27 commits February 20, 2026 17:37

🔥 Removed TESSDATA_PREFIX from .env.common

d02e20f

🙈 Update .gitignore to include notebooks directory while excluding su…

eef81c6

…bdirectories and non-IPYNB files

🔀 Synthesize docker-compose from 26033a8/00709164 after b05b768 rollback

7f8ebfa

🔀 Synthesize Makefile from afbfda9/d80f74b/26033a8f after f645881 rol…

b0233e5

…lback

🔧 Fix repository URL case sensitivity in pyproject.toml and remove un…

ca04d8b

…used dependencies

🔥 Remove tasks.json configuration for Ollama service

203f33e

🔥 Remove scraper and documentation

24825e1

🔥 Remove experiment module

a6986fe

🔥 Remove path utility functions from paths.py

968344d

🔥 Remove unused PromptSet and PromptLibrary classes, and simplify dis…

6c35143

…ambiguation options in LabelPolicy

🔥 Remove EntityRelation class and its associated methods from entitie…

4d28e03

…s.py

📝 Enhance documentation with detailed docstrings for various function…

e6f32ba

…s across multiple modules

🔥 Removed PromptLibrary class from aymurai/api/endpoints/routers/anon…

4559764

…ymizer/anonymizer.py for release/v1.5.0 compatibility 🔥 Removed `llm` disambiguation label policy for release/v1.5.0 compatibility

🔀 Synthesize document_extract from d349c69 after 3c55d8e: remove extr…

014b28e

…actor config passthrough and restore fixed timeout

🔀 Synthesize PDF extraction flow from d349c69/26033a8: remove cache/d…

91d2c10

…ebug path

🔥 Remove text extraction tests

000215e

Merge branch 'release/v1.5.0' of https://github.com/AymurAI/backend i…

9e68af6

…nto release/v1.5.0

📝 Update description formatting for aymurai_disambiguation field in E…

0d3aae5

…ntityAttributes

🦖 Update PdfExtractor.extract method to include ignored keyword argum…

efecf50

…ents for backward compatibility

🔥 Remove unused static logo file from API resources

837c639

🔧 Add version_scheme configuration to setuptools_scm in pyproject.toml

846ae17

📌 Update uv.lock

63e96ea

📝 Reorganize and update v1.5.0 documentation (EN/ES)

8f507c8

🚚 Rename full-paragraph pipeline to datapublic across code and docs

e97a513

jansaldo and others added 2 commits April 20, 2026 18:49

🐛 Remove unnecessary --extra runtime flag from uv sync command

3ad788b

🐛 Date formatter bug fixed for canonical entities generation.

87c7892


jansaldo wants to merge 99 commits into dev from release/v1.5.0


5 participants
