@philwinder

Summary

Makes the POST /api/v1/repositories endpoint idempotent. Previously, creating a repository with a duplicate URI returned a 500 Internal Server Error because of an unhandled database IntegrityError. Repeated requests for the same URI are now handled as follows:

  • First POST: Creates repository → Returns 201 Created
  • Subsequent POSTs: Triggers re-indexing of existing repository → Returns 200 OK
  • No duplicate data is created in the database
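
For context, the old failure mode is easy to reproduce in isolation. The sketch below uses the standard-library sqlite3 module and a hypothetical one-table schema (`git_repos` and `sanitized_uri` are assumed names, not taken from the kodit schema). The real code goes through SQLAlchemy, which raises its own IntegrityError for the same kind of constraint violation. An insert-only create path hits the UNIQUE constraint on the second call, and if nothing catches the exception the request ends up as a 500.

```python
import sqlite3


def make_conn() -> sqlite3.Connection:
    """Create an in-memory database with a UNIQUE constraint on the URI (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE git_repos ("
        "id INTEGER PRIMARY KEY, "
        "sanitized_uri TEXT NOT NULL UNIQUE)"
    )
    return conn


def naive_create(conn: sqlite3.Connection, uri: str) -> None:
    """Insert without checking for an existing row, as the old code path effectively did."""
    conn.execute("INSERT INTO git_repos (sanitized_uri) VALUES (?)", (uri,))


def second_insert_fails(conn: sqlite3.Connection, uri: str) -> bool:
    """Return True if inserting the same URI twice violates the UNIQUE constraint."""
    naive_create(conn, uri)
    try:
        naive_create(conn, uri)
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling `second_insert_fails(make_conn(), "https://example.com/org/repo")` exercises the duplicate insert; the example URI is a placeholder.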

Changes

Core Implementation

  • CommitIndexingApplicationService.create_git_repository():

    • Now checks for existing repositories by sanitized URI using QueryBuilder.filter()
    • Returns tuple (GitRepo, bool) where boolean indicates if newly created
    • Triggers re-indexing tasks for existing repositories
  • POST /api/v1/repositories endpoint:

    • Updated to handle the tuple return value
    • Sets status code dynamically: 201 Created for new, 200 OK for existing
    • Updated documentation to reflect idempotent behavior
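
The check-then-create flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the actual kodit code: `InMemoryRepoService`, `sanitize_uri`, `post_repository`, and the `reindex_requests` counter are hypothetical stand-ins for the real service, the URI sanitizer, the route handler, and the queued re-indexing tasks. A plain dict replaces the QueryBuilder.filter() lookup, and the sanitization rules shown are an assumption.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass
class GitRepo:
    """Hypothetical stand-in for the real GitRepo entity."""

    sanitized_uri: str
    reindex_requests: int = 0


def sanitize_uri(uri: str) -> str:
    """Assumed sanitization: drop credentials, query, fragment and trailing slash."""
    parts = urlsplit(uri.strip())
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path.rstrip("/"), "", ""))


class InMemoryRepoService:
    """Keeps repositories in a dict keyed by sanitized URI (stand-in for the database)."""

    def __init__(self) -> None:
        self._repos: dict[str, GitRepo] = {}

    def create_git_repository(self, uri: str) -> tuple[GitRepo, bool]:
        """Return (repo, created); created is False when the URI already exists."""
        key = sanitize_uri(uri)
        existing = self._repos.get(key)
        if existing is not None:
            # Existing repository: record a re-index request instead of inserting.
            existing.reindex_requests += 1
            return existing, False
        repo = GitRepo(sanitized_uri=key)
        self._repos[key] = repo
        return repo, True

    def count(self) -> int:
        return len(self._repos)


def post_repository(service: InMemoryRepoService, uri: str) -> tuple[int, GitRepo]:
    """Handler-shaped helper: choose 201 for a new repository, 200 for an existing one."""
    repo, created = service.create_git_repository(uri)
    return (201 if created else 200), repo
```

Returning the boolean alongside the entity keeps the status-code decision in the endpoint layer, so the service does not need to know anything about HTTP.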

Testing

Added two comprehensive regression tests:

  1. test_deletion_smoke.py: Verifies repository deletion properly cleans up all associated data

    • Found bug: embeddings table still has 6 rows after deletion (documented in test)
    • Checks all 8 related tables: repos, commits, branches, tags, files, embeddings, enrichments, associations
  2. test_indexing_idempotency.py: Verifies no duplicate data when re-indexing same repository

    • Tests that POSTing same URI twice doesn't create duplicates
    • Validates all table counts remain unchanged
    • Confirms proper status codes (201 → 200)
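
The shape of the idempotency check can be illustrated with a small stand-alone sketch. It is not the test from this PR: the two-table schema, `fake_post`, and `table_counts` are hypothetical, and sqlite3 stands in for the real database and HTTP client. The idea is the same, though: snapshot the row counts after the first POST, repeat the request, and assert that nothing changed.

```python
import sqlite3

# The real test checks more tables; two are enough to show the pattern.
TABLES = ("repos", "commits")


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repos (id INTEGER PRIMARY KEY, uri TEXT NOT NULL UNIQUE)")
    conn.execute(
        "CREATE TABLE commits ("
        "sha TEXT PRIMARY KEY, "
        "repo_id INTEGER NOT NULL REFERENCES repos(id))"
    )
    return conn


def table_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the row count of every tracked table."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in TABLES}


def fake_post(conn: sqlite3.Connection, uri: str, shas: list[str]) -> int:
    """Simulate the endpoint: create or reuse the repo, (re-)index, return a status code."""
    row = conn.execute("SELECT id FROM repos WHERE uri = ?", (uri,)).fetchone()
    created = row is None
    if created:
        repo_id = conn.execute("INSERT INTO repos (uri) VALUES (?)", (uri,)).lastrowid
    else:
        repo_id = row[0]
    # Re-indexing must not duplicate commits that are already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO commits (sha, repo_id) VALUES (?, ?)",
        [(sha, repo_id) for sha in shas],
    )
    return 201 if created else 200


def test_posting_same_uri_twice_creates_no_duplicates() -> None:
    conn = make_db()
    uri, shas = "https://example.com/org/repo", ["a1", "b2", "c3"]
    assert fake_post(conn, uri, shas) == 201
    before = table_counts(conn)
    assert fake_post(conn, uri, shas) == 200
    assert table_counts(conn) == before
```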

Test Plan

  • Run test_indexing_idempotency.py - passes, verifies idempotent behavior
  • Run test_deletion_smoke.py - passes (documents embeddings bug)
  • Linting and type checking pass
  • Manual testing: POST same repository URI multiple times

Related Issues

Fixes the bug where duplicate repository creation attempts returned 500 errors instead of being handled gracefully.

Notes

The deletion test identified an existing bug where the embeddings table is not fully cleaned up during repository deletion. This is documented in the test but not fixed in this PR to keep the scope focused on the idempotency fix.

Previously, POSTing the same repository URI twice would return a 500
error due to an unhandled IntegrityError from the database UNIQUE
constraint. This change implements proper idempotent behavior where:

- First POST: Creates repository → Returns 201 Created
- Subsequent POSTs: Triggers re-indexing → Returns 200 OK
- No duplicate data is created in the database

Changes:
- Update create_git_repository() to check for existing repos by
  sanitized URI using QueryBuilder pattern
- Return tuple (GitRepo, bool) indicating if repo was newly created
- Modify API endpoint to set appropriate status code (201 vs 200)
- Add regression tests for deletion and indexing idempotency

The implementation follows RESTful idempotency principles where
multiple identical requests produce the same result.

Tests added:
- test_deletion_smoke.py: Verifies repository deletion cleans up all
  associated data (found bug: embeddings table not fully cleaned)
- test_indexing_idempotency.py: Verifies no duplicate data created
  when re-indexing same repository
@github-actions

github-actions bot commented Oct 31, 2025

Coverage

Coverage Report
File | Stmts | Miss | Cover | Missing
src/kodit
   _version.py | 18 | 4 | 75% | 15–19
   app.py | 74 | 74 | 0% | 3–168
   cli.py | 51 | 17 | 67% | 57–84, 99–101, 107
   cli_utils.py | 25 | 25 | 0% | 3–67
   config.py | 112 | 28 | 70% | 24, 175, 178–181, 194, 203–205, 209–211, 218–221, 241–245, 251–259
   database.py | 42 | 26 | 35% | 25–62, 71, 76–84, 88–89, 93
   log.py | 125 | 24 | 75% | 60, 74, 142–146, 156–>160, 205–>198, 207–208, 213–223, 242, 244–253, 259–>258, 263–266, 276
   mcp.py | 51 | 20 | 58% | 51–55, 157–198, 203, 222
   middleware.py | 35 | 35 | 0% | 3–75
src/kodit/application/factories
   reporting_factory.py | 21 | 11 | 48% | 20–24, 31–36
   server_factory.py | 186 | 118 | 28% | 107–143, 147–149, 153–157, 161–165, 171–177, 181–183, 187–191, 195–199, 203–205, 209–218, 222–224, 228–232, 236–240, 244–268, 272, 276–280, 287–289, 295–297, 301–305, 309–313, 317–319, 323–328, 332–336, 340–342, 346–355, 359–363, 367–371, 375–379, 383–389, 393–399
src/kodit/application/services
   code_search_application_service.py | 63 | 10 | 81% | 23, 37, 51, 88–99, 105–112
   commit_indexing_application_service.py | 420 | 310 | 21% | 16, 225–254, 260, 269–309, 328–360, 364–371, 375–435, 462–479, 498, 518, 527–>532, 542–>exit, 549–646, 653–663, 677–706, 710–768, 774–838, 844–902, 906–956, 971–1031, 1037–1096, 1102–1114
   enrichment_query_service.py | 119 | 54 | 49% | 59, 74–91, 101–116, 128, 139, 153–165, 171, 179, 207–208, 212–213, 217–218, 222–223, 229, 237–238, 244, 252–253, 269, 279–302, 321, 338–>336
   indexing_worker_service.py | 62 | 62 | 0% | 3–115
   queue_service.py | 35 | 4 | 85% | 54–63
   reporting.py | 40 | 5 | 83% | 13, 72, 84, 88, 102
   sync_scheduler.py | 40 | 27 | 28% | 26–30, 34–36, 40–46, 50–65, 69–79
src/kodit/domain
   errors.py | 1 | 1 | 0% | 4
   protocols.py | 103 | 9 | 91% | 37, 41, 45, 49, 53, 57, 61, 65, 80
   value_objects.py | 272 | 11 | 96% | 65, 94–119, 124, 129–132, 323, 622, 626
src/kodit/domain/enrichments
   enricher.py | 7 | 1 | 86% | 17
   enrichment.py | 35 | 1 | 97% | 41
src/kodit/domain/enrichments/architecture/database_schema
   database_schema.py | 8 | 1 | 88% | 17
src/kodit/domain/enrichments/architecture/physical
   formatter.py | 4 | 1 | 75% | 11
src/kodit/domain/enrichments/development
   development.py | 9 | 1 | 89% | 18
src/kodit/domain/enrichments/history
   history.py | 9 | 1 | 89% | 18
src/kodit/domain/enrichments/history/commit_description
   commit_description.py | 8 | 1 | 88% | 17
src/kodit/domain/entities
   __init__.py | 132 | 15 | 87% | 22, 39, 80, 93, 103–107, 238–240, 244–245, 267–268
   git.py | 121 | 12 | 89% | 47, 52–54, 72, 87, 92–96, 168, 173, 190
src/kodit/domain/factories
   git_repo_factory.py | 17 | 4 | 68% | 39–53, 79
src/kodit/domain/services
   bm25_service.py | 36 | 2 | 92% | 59, 116
   embedding_service.py | 46 | 1 | 95% | 92, 125–>124
   git_repository_service.py | 168 | 62 | 57% | 95–>91, 121–125, 135, 156–158, 164–174, 184–203, 211–228, 234–238, 253–273, 277–281, 345–363, 401
   git_service.py | 150 | 150 | 0% | 3–307
   physical_architecture_service.py | 58 | 5 | 86% | 130, 132, 134, 136, 138
   task_status_query_service.py | 8 | 8 | 0% | 3–17
src/kodit/domain/tracking
   resolution_service.py | 31 | 22 | 23% | 27–30, 39–44, 48–52, 56–57, 61–71
src/kodit/infrastructure/api/client
   __init__.py | 4 | 4 | 0% | 3–7
   base.py | 39 | 39 | 0% | 3–100
   exceptions.py | 4 | 4 | 0% | 4–19
   generated_endpoints.py | 6 | 6 | 0% | 7–23
   search_client.py | 13 | 13 | 0% | 3–86
src/kodit/infrastructure/api/middleware
   auth.py | 13 | 13 | 0% | 3–31
src/kodit/infrastructure/api/v1
   dependencies.py | 58 | 58 | 0% | 3–181
src/kodit/infrastructure/api/v1/routers
   commits.py | 73 | 73 | 0% | 3–358
   queue.py | 16 | 16 | 0% | 3–64
   repositories.py | 113 | 113 | 0% | 3–400
   search.py | 11 | 11 | 0% | 3–58
src/kodit/infrastructure/api/v1/schemas
   commit.py | 45 | 45 | 0% | 3–96
   context.py | 6 | 6 | 0% | 3–13
   enrichment.py | 20 | 20 | 0% | 3–43
   queue.py | 16 | 16 | 0% | 3–35
   repository.py | 55 | 55 | 0% | 3–128
   search.py | 94 | 94 | 0% | 3–222
   snippet.py | 27 | 27 | 0% | 3–58
   tag.py | 13 | 13 | 0% | 3–31
   task_status.py | 20 | 20 | 0% | 3–41
src/kodit/infrastructure/bm25
   local_bm25_repository.py | 76 | 24 | 65% | 22–23, 46–57, 76–77, 80–81, 88–89, 102–103, 107, 112, 137–>135, 150
   vectorchord_bm25_repository.py | 89 | 55 | 32% | 118–120, 124–131, 135–139, 143–148, 152–154, 157–161, 165–202, 206–227, 234–237
src/kodit/infrastructure/cloning/git
   git_python_adaptor.py | 291 | 141 | 48% | 19–37, 42–59, 110–120, 133–166, 194–196, 215–>211, 226–228, 265, 297–306, 315–344, 370–378, 390–394, 404–412, 441, 455–460, 486–490, 509–514, 524–>exit, 542–544, 551–585
   working_copy.py | 53 | 3 | 92% | 48–55
src/kodit/infrastructure/database_schema
   database_schema_detector.py | 129 | 111 | 9% | 62–80, 84–89, 93–102, 106–116, 120–129, 133–163, 167–268
src/kodit/infrastructure/embedding
   embedding_factory.py | 39 | 7 | 76% | 64–65, 74–77, 85–86
   local_vector_search_repository.py | 35 | 20 | 32% | 50–70, 75–88, 97
   vectorchord_vector_search_repository.py | 99 | 72 | 21% | 100–105, 109–115, 119–120, 124–161, 167–202, 206–240, 249–260, 263–272
src/kodit/infrastructure/embedding/embedding_providers
   litellm_embedding_provider.py | 40 | 3 | 90% | 70–78, 103–>107
   local_embedding_provider.py | 53 | 2 | 92% | 16–17, 50–>60, 64–>78
src/kodit/infrastructure/enricher
   enricher_factory.py | 19 | 11 | 35% | 21, 41–53
   litellm_enricher.py | 30 | 13 | 47% | 47–79
   local_enricher.py | 45 | 30 | 28% | 35–40, 55–121
   utils.py | 7 | 5 | 22% | 20–30
src/kodit/infrastructure/git
   git_utils.py | 12 | 12 | 0% | 3–32
src/kodit/infrastructure/ignore
   ignore_pattern_provider.py | 30 | 30 | 0% | 3–69
src/kodit/infrastructure/mappers
   task_mapper.py | 12 | 1 | 86% | 23
src/kodit/infrastructure/physical_architecture/detectors
   docker_compose_detector.py | 147 | 31 | 71% | 52, 62–63, 123–>126, 127–>131, 135–>137, 149–162, 166, 195–198, 211–221, 246–>244, 255–>exit, 262–>255, 278, 296–299, 304–305, 321, 334–336
src/kodit/infrastructure/physical_architecture/formatters
   narrative_formatter.py | 86 | 5 | 93% | 112–113, 115–116, 145
src/kodit/infrastructure/providers
   async_batch_processor.py | 20 | 2 | 86% | 28, 48
   litellm_provider.py | 76 | 16 | 78% | 40–>44, 55–87
src/kodit/infrastructure/reporting
   db_progress.py | 11 | 4 | 64% | 17–19, 23
   log_progress.py | 18 | 9 | 45% | 18–20, 24–40
   telemetry_progress.py | 9 | 2 | 78% | 15, 19
src/kodit/infrastructure/slicing
   api_doc_extractor.py | 388 | 125 | 64% | 56, 60–61, 69–71, 83, 129, 209–210, 232, 236, 325–334, 347, 401, 408–>416, 424–482, 486–492, 496–523, 527–529, 535–551, 559–>567, 581–>589, 589–>594, 594–>601, 614–>622, 622–>626, 638, 651–653, 668–669, 678, 691–697, 705–>704, 710–716, 724–>723, 729–735, 742–755, 760, 763, 772–773, 794–>803
   ast_analyzer.py | 576 | 64 | 85% | 212–213, 228–230, 324, 355, 361, 368, 376, 386, 400–>390, 410–412, 435–>434, 438–440, 446–>444, 450–452, 462, 482, 484, 491–>489, 492–>491, 494, 498–503, 510, 542, 590–>595, 596, 601, 644–>649, 650, 662–>667, 696, 727–>722, 746, 749–750, 753, 758, 763, 787, 791–>796, 797, 852–>850, 859, 882, 916, 946, 972–987, 994–>992, 998–1000, 1013–>1015, 1046–>1054, 1050–>1054, 1055, 1070–>1079, 1073–>1070, 1080–1081, 1091, 1095–>1100, 1101
   slicer.py | 301 | 47 | 79% | 76, 83–85, 94, 98, 105, 140, 146–148, 206, 220–223, 235, 242, 258–259, 277–>292, 287–>283, 328–>338, 330–>338, 342–>341, 348–349, 376–>363, 403–424, 451–>460, 474–476, 480, 495, 521–>520, 534–>532, 540, 554, 573, 591–>587, 594
src/kodit/infrastructure/sqlalchemy
   embedding_repository.py | 84 | 2 | 95% | 179, 203, 211–>219
   enrichment_association_repository.py | 31 | 1 | 94% | 69
   enrichment_v2_repository.py | 57 | 10 | 75% | 76–78, 96, 127, 143–164
   entities.py | 201 | 10 | 94% | 37, 57, 63, 414–420
   git_branch_repository.py | 36 | 6 | 78% | 33, 63–71
   git_repository.py | 36 | 3 | 89% | 50–52
   git_tag_repository.py | 35 | 7 | 73% | 29, 52, 61–69
   query.py | 146 | 26 | 79% | 60, 62, 64, 66, 68, 71–74, 102–>104, 168–173, 212, 222, 242, 261, 268–278, 295–300, 304–309, 317–322, 326–331
   repository.py | 121 | 3 | 96% | 50, 84, 128
   task_repository.py | 69 | 2 | 95% | 45, 109–>exit, 123
   unit_of_work.py | 30 | 9 | 60% | 29–>exit, 41–43, 47–49, 53–55
src/kodit/migrations
   env.py | 30 | 30 | 0% | 3–85
src/kodit/migrations/versions
   4b1a3b2c8fa5_refactor_git_tracking.py | 51 | 51 | 0% | 10–185
   04b80f802e0c_foreign_key_review.py | 23 | 23 | 0% | 10–98
   7c3bbc2ab32b_add_embeddings_table.py | 15 | 15 | 0% | 10–54
   7f15f878c3a1_add_new_git_entities.py | 140 | 140 | 0% | 10–689
   9cf0e87de578_add_queue.py | 15 | 15 | 0% | 10–46
   9e53ea8bb3b0_add_authors.py | 33 | 33 | 0% | 10–102
   19f8c7faf8b9_add_generic_enrichment_type.py | 52 | 52 | 0% | 10–259
   4073b33f9436_add_file_processing_flag.py | 11 | 11 | 0% | 10–35
   4552eb3f23ce_add_summary.py | 11 | 11 | 0% | 10–33
   85155663351e_initial.py | 23 | 23 | 0% | 10–97
   b9cd1c3fd762_add_task_status.py | 21 | 21 | 0% | 10–76
   c3f5137d30f5_index_all_the_things.py | 21 | 21 | 0% | 10–49
   f9e5ef5e688f_add_git_commits_number.py | 15 | 15 | 0% | 10–43
src/kodit/utils
   dump_config.py | 182 | 182 | 0% | 3–361
   dump_openapi.py | 25 | 25 | 0% | 3–39
   generate_api_paths.py | 40 | 40 | 0% | 4–135
   path_utils.py | 38 | 20 | 39% | 28–57, 72, 77, 81
TOTAL | 7953 | 3461 | 54% |

Tests | Skipped | Failures | Errors | Time
301 | 1 💤 | 0 ❌ | 0 🔥 | 47.837s ⏱️

@philwinder philwinder merged commit a090c54 into main Nov 3, 2025
11 checks passed
@philwinder philwinder deleted the fix/api-idempotency-for-duplicate-repos branch November 3, 2025 11:45