feat(subsequences): migrate old SubmissionExtras to SubmissionSupplemental DEV-934 #6422

rgraber · 2025-10-28T19:29:28Z

🗒️ Checklist

run linter locally
update developer docs (API, README, inline, etc.), if any
for user-facing doc changes create a Zulip thread at #Support Docs Updates, if any
draft PR with a title <type>(<scope>)<!>: <title> DEV-1234
assign yourself, tag PR: at least Front end and/or Back end or workflow
fill in the template below and delete template comments
review thyself: read the diff and repro the preview as written
open PR & confirm that CI passes & request reviewers, if needed
delete this section before merging

💭 Notes

Fill out the method for converting old SubmissionExtra content dicts to the new format expected by SubmissionSupplemental for translations and transcripts. Qualitative analysis answers will be handled separately.
This code makes numerous assumptions to fill in information that is not present in the old structure but required in the new:

1. If old[xpath]['transcript']['value'] == old[xpath]['googlets']['value'] and the language codes are the same, we assume the most recent transcript was automatically generated.
2. If, for any revision in old[xpath]['transcript']['revisions'], revision['value'] is the same as old[xpath]['googlets']['value'] and the language codes match, we assume that revision was automatically generated. If multiple revisions match, we assume they were all automatically generated. This should be rare, but it is possible.
3. Assumptions 1-2 also apply to translations.
4. old[xpath]['transcript']['dateModified'] is assumed to be the creation date of the most recent revision (i.e., whatever is in old[xpath]['transcript']['value']). The same goes for translations.
5. All uuids are newly generated.
6. All old transcriptions/translations have status=complete with a _dateAccepted of now() (whenever the code is running).
7. To determine the dependency of any old translation, whether automated or manual:

   - If we know the source language, look for the most recent transcript in that language that was created before the translation.
     - If there is none, take the most recent transcript in that language.
     - If there are no transcripts in the source language, take the most recent transcript.
   - If we don't know the source language, take the most recent transcript that was created before the translation.
     - If there is none, take the most recent transcript.
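
The dependency rules above can be read as a fallback chain. Below is a minimal sketch of that chain, not the code in this PR: the function name `pick_dependency` and the flat dict shape with `uuid`, `language` and `created` keys are assumptions made up for illustration.

```python
from datetime import datetime


def pick_dependency(transcripts, translation_created, source_language=None):
    """Return the transcript an old translation is assumed to depend on.

    `transcripts` is a list of dicts with hypothetical keys
    'uuid', 'language' and 'created' (a datetime).
    """

    def newest(candidates):
        # Most recent candidate, or None when the list is empty.
        return max(candidates, key=lambda t: t['created'], default=None)

    def earlier(candidates):
        # Only candidates created strictly before the translation.
        return [t for t in candidates if t['created'] < translation_created]

    if source_language is not None:
        same_lang = [t for t in transcripts if t['language'] == source_language]
        # Same language and earlier, then same language, then anything.
        return (
            newest(earlier(same_lang))
            or newest(same_lang)
            or newest(transcripts)
        )

    # Unknown source language: earlier transcripts first, then anything.
    return newest(earlier(transcripts)) or newest(transcripts)
```

With no transcripts at all the sketch returns None; how the real migration treats that case is not stated in these notes.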

We can ignore any badly formatted revisions/transcripts/translations
Most recent revisions will be first in the version array
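
For the first two assumptions in these notes (matching a transcript or revision against the stored Google result), here is a rough sketch of the comparison. It is illustrative only: the helper names are invented, and the `languageCode` key is an assumption about the old format, since the notes only say "language codes".

```python
def is_automatic(entry, google_result):
    """True if `entry` matches the stored Google result in value and language."""
    if not entry or not google_result:
        return False
    value = entry.get('value')
    return (
        value is not None
        and value == google_result.get('value')
        # 'languageCode' is an assumed key name, not confirmed by the notes.
        and entry.get('languageCode') == google_result.get('languageCode')
    )


def flag_automatic(old_question):
    """Return (latest_is_automatic, [one flag per revision])."""
    transcript = old_question.get('transcript') or {}
    googlets = old_question.get('googlets')
    latest = is_automatic(transcript, googlets)
    revisions = [
        is_automatic(revision, googlets)
        for revision in transcript.get('revisions', [])
    ]
    return latest, revisions
```

Every matching revision is flagged, which mirrors the "if multiple match, they were all automatically generated" assumption.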

👀 Preview steps

On main:

1. ℹ️ Have an account and a project with an audio question
2. Enable NLP
3. Add a submission to the project
4. Generate an automatic transcript in English
5. Generate an automatic translation in Spanish
6. Manually edit the English transcript and save
7. Manually edit the Spanish translation and save

On PR branch:
8. Clone the asset
9. PATCH the asset with

{
  "_version": "20250820",
  "_actionConfigs": {
    "<question xpath>": {
      "manual_transcription": [{"language": "en"}],
      "automated_google_transcription": [{"language": "en"}],
      "manual_translation": [{"language": "es"}],
      "automated_google_translation": [{"language": "es"}]
    }
  }
}

10. In a Python shell, run

from kobo.apps.subsequences.schemas import validate_submission_supplement
from kobo.apps.subsequences.utils.versioning import migrate_submission_supplementals
se = SubmissionExtras.objects.get(submission_uuid=<submission uuid>)
validate_submission_supplement(<cloned_asset>, migrate_submission_supplementals(se.content))

🟢 The validation should pass

kobo/apps/subsequences/utils/versioning.py

kobo/apps/subsequences/schemas.py

noliveleger · 2025-10-30T17:31:51Z

kobo/apps/subsequences/utils/versioning.py



+def migrate_submission_supplementals(supplemental_data: dict) -> dict | None:
+    if supplemental_data.get('_version', None) == SCHEMA_VERSIONS[0]:


Nit: .get() already defaults to None, so you can drop the second argument.
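
A quick illustration of the nit, using a throwaway dict rather than real supplemental data: `dict.get` returns None for a missing key whether or not the default is spelled out.

```python
# Throwaway example data: there is no '_version' key here.
supplemental_data = {'some_question': {}}

# Explicit default, as in the diff.
explicit = supplemental_data.get('_version', None)

# Implicit default: same result, one argument fewer.
implicit = supplemental_data.get('_version')
```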

noliveleger · 2025-10-30T17:41:50Z

kobo/apps/subsequences/tests/test_versioning.py

+        new_version = {
+            '_version': '20250820',
+            'Audio_question': {
+                'automatic_transcription': {


That's not the name of the action. It should be automatic_google_transcription. We are only using Google for NLP at the moment. The logic works great, but it should match the current action IDs.

Moreover, we need to rename every automated_* and Automated* to their (A|a)utomatic counterparts.
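
To make the (A|a)utomatic rename concrete, here is a small sketch of the substitution on identifier strings. The helper `rename_automated` is hypothetical; the actual rename would presumably be done across the codebase with an editor or a search-and-replace tool.

```python
import re

# Matches 'Automated' or 'automated' and remembers the case of the first letter.
_AUTOMATED = re.compile(r'(A|a)utomated')


def rename_automated(name: str) -> str:
    """Replace every (A|a)utomated with its (A|a)utomatic counterpart."""
    return _AUTOMATED.sub(lambda match: f'{match.group(1)}utomatic', name)
```

For example, `rename_automated('automated_google_transcription')` gives `'automatic_google_transcription'`, and names without the prefix are returned unchanged.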

noliveleger · 2025-10-30T17:42:01Z

kobo/apps/subsequences/tests/test_versioning.py

+                        }
+                    ]
+                },
+                'automatic_translation': {


Same comment as above.

noliveleger · 2025-10-30T17:52:49Z

Nit: the JSON under "9. PATCH the asset with" is badly formatted; commas are missing inside the "<question xpath>" dictionary.
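
One way to catch this kind of slip before pasting a payload into the preview steps is to run it through the standard `json` module. The sketch below is only an illustration; `is_valid_json` is a made-up helper, and the payload is the body from step 9.

```python
import json

payload = '''
{
  "_version": "20250820",
  "_actionConfigs": {
    "<question xpath>": {
      "manual_transcription": [{"language": "en"}],
      "automated_google_transcription": [{"language": "en"}],
      "manual_translation": [{"language": "es"}],
      "automated_google_translation": [{"language": "es"}]
    }
  }
}
'''


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

Both a missing comma between members and a trailing comma before a closing brace make `json.loads` raise `json.JSONDecodeError`.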

noliveleger · 2025-10-30T17:58:22Z

validate_submission_supplement(<cloned_asset>, migrate_submission_supplementals(se.content)) exposes a bug: with the current code the validation should not pass, because automatic_transcription and automatic_translation should be rejected.

I think there’s an issue in get_submission_supplement_schema(asset: 'kpi.models.Asset') (unrelated to this PR, though).
We don't set any validation at the question level (see jnm's comment below for the current schema).

We should probably have a schema that looks something like this (my question is named audio).

"type": "object",
  "additionalProperties": false,
  "required": ["_version", "audio"],
  "properties": {
    "_version": { "const": "20250820" },
    "audio": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "manual_transcription": { "$ref": "#/$defs/manualTranscription" },
        "automatic_google_transcription": { "$ref": "#/$defs/autoGoogleTranscription" },
        "manual_translation": { "$ref": "#/$defs/manualTranslation" },
        "automatic_google_translation": { "$ref": "#/$defs/autoGoogleTranslation" }
      },
      "required": [
        "manual_transcription",
        "automatic_google_transcription",
        "manual_translation",
        "automatic_google_translation"
      ]
    }
  },

So everything nested inside the question_name dictionary is ignored/bypassed.

jnm · 2025-10-30T21:18:44Z

So everything nested inside the question_name dictionary is ignored/bypassed.

Right 🤦 what we are doing now is NOT putting the action IDs inside properties. They're just hanging out at the top level of the object for each question, not doing anything:

{
  "additionalProperties": false,
  "properties": {
    "_version": {
      "const": "20250820"
    },
    "audio": {
      "additionalProperties": false,
      "properties": {},
      "type": "object",
      "automated_google_transcription": {
       …

Unfortunately, correcting this causes other problems because the $defs paths are wrong.
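
As a sketch of the shape being discussed, with the action IDs nested under `properties` instead of sitting at the top level of the question object, something like the following could build the schema. This is not the code in get_submission_supplement_schema: both function names are hypothetical, and the `#/$defs/<action_id>` paths are placeholders, since the correct `$defs` paths are exactly what is still unresolved.

```python
def build_question_schema(action_ids):
    """Nest every action ID under `properties` so a validator applies it."""
    return {
        'type': 'object',
        'additionalProperties': False,
        'properties': {
            # Placeholder $ref path; the real $defs layout may differ.
            action_id: {'$ref': f'#/$defs/{action_id}'}
            for action_id in action_ids
        },
        'required': list(action_ids),
    }


def build_supplement_schema(version, questions):
    """`questions` maps a question name to its list of action IDs."""
    return {
        'type': 'object',
        'additionalProperties': False,
        'required': ['_version', *questions],
        'properties': {
            '_version': {'const': version},
            **{
                name: build_question_schema(action_ids)
                for name, action_ids in questions.items()
            },
        },
    }
```

Because each question object sets `additionalProperties` to false and lists its actions under `properties`, an unknown key such as automatic_transcription would then be rejected rather than silently accepted.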

rgraber added 13 commits October 22, 2025 11:40

temp: initial commit

4deca0f

fixup!: stuff

87ba5a4

fixup!: stuff

8e82e61

fixup!: stuff

89f236f

fixup!: stuff

cc9ce01

fixup!: stuff

9b31b20

fixup!: stuff

ef8655a

fixup!: messy but functional

3c35f14

fixup!: cleaning

335ff04

fixup!: accidental change

c638c62

fixup!: accidental change

73da17c

fixup!: new uuid

397d883

fixup!: format

3fd8f59

rgraber self-assigned this Oct 29, 2025

rgraber added the Back end label Oct 29, 2025

rgraber marked this pull request as ready for review October 29, 2025 13:50

rgraber requested a review from Guitlle as a code owner October 29, 2025 13:50

fixup!: stuff

746766c

rgraber requested review from jnm and noliveleger and removed request for Guitlle October 29, 2025 16:47

noliveleger removed the request for review from jnm October 30, 2025 17:06

noliveleger requested changes Oct 30, 2025

rgraber added 2 commits October 31, 2025 08:34

fixup!: changes from review

263227f

fixup!: action names

8569da1

rgraber requested a review from jnm as a code owner November 4, 2025 13:45

fixup: accidental change

e2a43ed

jnm and others added 3 commits November 6, 2025 09:33

Merge remote-tracking branch 'origin/refactor-subsequences-2025' into…

334d7f7

… beccagraber/refactor-subsequences-2025-migration

Merge branch 'refactor-subsequences-2025' into beccagraber/refactor-s…

6adf76a

…ubsequences-2025-migration

fixup!: auto accept manual

6588150
