Skip to content

Python script for copying content from a KoboToolbox XLSForm to SurveyCTO's XLSForm template.

License

Notifications You must be signed in to change notification settings

surveycto/kobo-to-surveycto

Repository files navigation

KoboToolbox to SurveyCTO XLSForm Converter

Overview

This Python script converts an XLSForm file exported from KoboToolbox into a format compatible with SurveyCTO. It uses a SurveyCTO-compatible template file (.xlsx) as a base, copies relevant data from the KoboToolbox form (.xls or .xlsx), and applies necessary transformations to ensure compatibility.

The script aims to preserve the structure and formatting (including conditional formatting) of the template file while integrating the KoboToolbox form's content.

Prerequisites

  1. Python 3: Ensure you have Python 3 installed.
  2. Required Libraries: The script uses pandas, openpyxl, and xlrd.
  3. Virtual Environment (Recommended): It's highly recommended to run this script within a Python virtual environment to manage dependencies. You can use the provided setup_environment.py script (or setup_venv.sh on Linux/macOS with Bash) to create an environment named kobo_to_surveycto and install the required packages.
    • Run: python setup_environment.py
    • Activate the environment:
      • Linux/macOS: source kobo_to_surveycto/bin/activate
      • Windows (PowerShell): .\kobo_to_surveycto\Scripts\activate
      • Windows (CMD): %cd%\kobo_to_surveycto\Scripts\activate.bat
  4. Template File: You need a base SurveyCTO XLSForm template (.xlsx) containing at least survey, choices, and settings sheets. By default, the script looks for template.xlsx in the same directory. You can specify a different template using the --template argument.

Usage

Run the script from your terminal after activating the virtual environment.

Syntax:

python kobo_converter.py <source_kobo_file> <output_scto_name> [--template <path_to_template>]

Arguments:

  1. <source_kobo_file>: (Required) Path to the source KoboToolbox XLSForm file (.xls or .xlsx).
  2. <output_scto_name>: (Required) Desired name (and optional path) for the output SurveyCTO file. The script will automatically enforce the .xlsx extension.
  3. --template <path_to_template>: (Optional) Path to the SurveyCTO template .xlsx file. If omitted, defaults to template.xlsx in the script's directory.

Example:

# Using default template.xlsx
python kobo_converter.py ./my_kobo_form.xlsx ./output/converted_form

# Specifying a template
python kobo_converter.py ./input/kobo.xls ./output/scto_form.xlsx --template ./templates/scto_base_v2.xlsx

Options for Invalid Choice Values

SurveyCTO has stricter rules for choice list name values than KoboToolbox (e.g., no spaces, limited special characters). If the script detects choice values in the source file's choices sheet (from the value or name column) that are not compatible with SurveyCTO, it will pause and present the following options:

  • A - Terminate the script now.
    • Stops the script without creating the output file. Allows you to manually fix the source form first.
  • B - Ignore and use unsupported choice lists and continue with the conversion.
    • Proceeds with the conversion, keeping the original (potentially invalid) choice values. The resulting form may not work correctly in SurveyCTO.
  • C - Automatically update choice lists values to supported ones (removing special characters, replacing spaces with underscores).
    • The script will sanitize the invalid choice values (e.g., "Option One" becomes "Option_One", "Choice - A" becomes "Choice-A"). It updates the choices sheet but does not attempt to update references to these values elsewhere (like in relevance or constraint columns on the survey sheet). Calculations or logic referencing the original values might break.
  • D - Automatically update choice lists values (as in C) AND attempt to update references to these values on the survey sheet. This may not work in all cases.
    • Performs the same sanitization as option C. Additionally, it searches through expression columns (required, relevance, constraint, calculation, etc.) on the survey sheet for the original invalid choice values enclosed in single or double quotes (e.g., 'Option One', "Choice - A") and attempts to replace them with the sanitized values (e.g., 'Option_One', "Choice-A").
    • Caution: This automatic reference updating is based on pattern matching and might not catch all instances or could potentially modify unintended parts of complex expressions. Thorough testing of the output form is recommended if using this option.

Summary of Changes Made by the Script

The script performs the following actions during conversion:

  1. File Structure: Copies the specified template file to the output location to preserve its structure and formatting.
  2. Sheet Validation: Checks for the presence of survey, choices, and settings sheets in both source and template files. Prompts the user if required sheets are missing in the source.
  3. Data Appending (survey, choices):
    • Appends rows from the source survey and choices sheets to the corresponding sheets in the output file, below any existing data in the template.
    • Attempts to preserve blank rows from the source sheets.
  4. Column Mapping (Case-Insensitive):
    • Matches columns between source and output sheets by header name, ignoring case.
    • Maps relevant (source) to relevance (output).
    • Maps read_only (source) to read only (output).
    • Maps constraint_message (source) to constraint message (output).
    • For the choices sheet, if the source lacks a value column but has a name column, it uses the name column data for the output value column.
  5. Value Transformations:
    • Converts true/false (case-insensitive) to yes/no in the required and read only columns (mapped from read_only).
    • On the survey sheet's type column, converts:
      • begin_group to begin group
      • end_group to end group
      • begin_repeat to begin repeat
      • end_repeat to end repeat
    • (Optional - User Choice C/D) Sanitizes invalid choice list name/value data (replaces - with -, other spaces with _, removes disallowed characters).
  6. Extra Columns: Copies columns present in the source survey or choices sheets but not in the template to the end of the existing headers in the output file.
  7. Settings Sheet Updates:
    • Updates specific cells in the output settings sheet based on values from the source settings sheet:
      • A2 (form_title): Uses source form_title, falls back to the source filename if missing.
      • B2 (form_id): Uses source form_id, falls back to a lowercase, underscore-separated version of the source filename (without extension) if missing.
      • F2 (default_language): Uses source default_language if present; otherwise makes no change.
  8. Reference Updates (Optional - User Choice D): Attempts to find and replace quoted references to original invalid choice values with their sanitized versions within expression columns on the survey sheet.

About

Python script for copying content from a KoboToolbox XLSForm to SurveyCTO's XLSForm template.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published