This Python script converts an XLSForm file exported from KoboToolbox into a format compatible with SurveyCTO. It uses a SurveyCTO-compatible template file (`.xlsx`) as a base, copies relevant data from the KoboToolbox form (`.xls` or `.xlsx`), and applies the transformations needed for compatibility.
The script aims to preserve the structure and formatting (including conditional formatting) of the template file while integrating the KoboToolbox form's content.
- Python 3: Ensure you have Python 3 installed.
- Required Libraries: The script uses `pandas`, `openpyxl`, and `xlrd`.
- Virtual Environment (Recommended): Running this script inside a Python virtual environment is highly recommended for managing dependencies. You can use the provided `setup_environment.py` script (or `setup_venv.sh` on Linux/macOS with Bash) to create an environment named `kobo_to_surveycto` and install the required packages.
  - Run: `python setup_environment.py`
  - Activate the environment:
    - Linux/macOS: `source kobo_to_surveycto/bin/activate`
    - Windows (PowerShell): `.\kobo_to_surveycto\Scripts\activate`
    - Windows (CMD): `%cd%\kobo_to_surveycto\Scripts\activate.bat`
- Template File: You need a base SurveyCTO XLSForm template (`.xlsx`) containing at least the `survey`, `choices`, and `settings` sheets. By default, the script looks for `template.xlsx` in the script's directory. You can specify a different template with the `--template` argument.
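Before running a conversion, it can help to confirm that a template really contains the three required sheets. The sketch below is a hypothetical helper, not part of `kobo_converter.py`: it relies on the fact that an `.xlsx` file is a ZIP package whose `xl/workbook.xml` part lists the sheet names, so it needs only the standard library. It assumes an exact, case-sensitive match on sheet names.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the <sheet> elements inside xl/workbook.xml in an .xlsx package.
SPREADSHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
REQUIRED_SHEETS = ("survey", "choices", "settings")


def xlsx_sheet_names(path):
    """Return the sheet names of an .xlsx file without loading any cell data.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(f"{SPREADSHEET_NS}sheet")]


def missing_sheets(path, required=REQUIRED_SHEETS):
    """Return the required sheet names that are absent (exact, case-sensitive match)."""
    present = set(xlsx_sheet_names(path))
    return [name for name in required if name not in present]
```

For example, `missing_sheets("template.xlsx")` returns an empty list for a valid template. Since `openpyxl` is already a dependency, `openpyxl.load_workbook(path, read_only=True).sheetnames` yields the same list of names.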
Run the script from your terminal after activating the virtual environment.
Syntax:

    python kobo_converter.py <source_kobo_file> <output_scto_name> [--template <path_to_template>]
Arguments:
- `<source_kobo_file>`: (Required) Path to the source KoboToolbox XLSForm file (`.xls` or `.xlsx`).
- `<output_scto_name>`: (Required) Desired name (and optional path) for the output SurveyCTO file. The script automatically enforces the `.xlsx` extension.
- `--template <path_to_template>`: (Optional) Path to the SurveyCTO template `.xlsx` file. If omitted, it defaults to `template.xlsx` in the script's directory.
Example:
    # Using default template.xlsx
    python kobo_converter.py ./my_kobo_form.xlsx ./output/converted_form
    # Specifying a template
    python kobo_converter.py ./input/kobo.xls ./output/scto_form.xlsx --template ./templates/scto_base_v2.xlsx
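The command-line interface above maps naturally onto `argparse`. The following is a minimal sketch of how such parsing could look, not the script's actual code; `build_parser`, `resolve_template`, and `enforce_xlsx` are hypothetical names. It assumes that "enforcing" the extension means appending `.xlsx` when the name does not already end with it, so a name like `form.xls` would become `form.xls.xlsx`.

```python
import argparse
from pathlib import Path


def build_parser():
    """Parser mirroring the documented syntax (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Convert a KoboToolbox XLSForm into a SurveyCTO-compatible XLSForm."
    )
    parser.add_argument("source_kobo_file", help="Path to the KoboToolbox .xls or .xlsx file.")
    parser.add_argument("output_scto_name", help="Name (and optional path) of the output file.")
    parser.add_argument(
        "--template",
        default=None,
        help="SurveyCTO template .xlsx (default: template.xlsx next to the script).",
    )
    return parser


def resolve_template(template, script_dir):
    """Use --template if given, otherwise template.xlsx in the script's directory."""
    return Path(template) if template else Path(script_dir) / "template.xlsx"


def enforce_xlsx(output_name):
    """Append .xlsx unless the name already ends with it (case-insensitive).

    Assumption: any other suffix is kept and .xlsx is appended after it.
    """
    path = Path(output_name)
    if path.suffix.lower() == ".xlsx":
        return path
    return path.with_name(path.name + ".xlsx")
```

In an entry point, these would be combined as `args = build_parser().parse_args()`, then `resolve_template(args.template, Path(__file__).resolve().parent)` and `enforce_xlsx(args.output_scto_name)`. Resolving the default template at runtime, rather than in `add_argument`, keeps the parser easy to test.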
SurveyCTO is stricter than KoboToolbox about choice values (e.g., no spaces, limited special characters). If the script detects values in the source file's `choices` sheet (in the `value` or `name` column) that are not compatible with SurveyCTO, it pauses and presents the following options:
- A - Terminate the script now.
  - Stops the script without creating the output file, so you can fix the source form manually first.
- B - Ignore and use unsupported choice lists and continue with the conversion.
  - Proceeds with the conversion, keeping the original (potentially invalid) choice values. The resulting form may not work correctly in SurveyCTO.
- C - Automatically update choice lists values to supported ones (removing special characters, replacing spaces with underscores).
  - The script sanitizes the invalid choice values (e.g., "Option One" becomes "Option_One", "Choice - A" becomes "Choice-A"). It updates the `choices` sheet but does not update references to these values elsewhere (such as the `relevance` or `constraint` columns on the `survey` sheet), so calculations or logic that reference the original values might break.
- D - Automatically update choice lists values (as in C) AND attempt to update references to these values on the survey sheet. This may not work in all cases.
  - Performs the same sanitization as option C. Additionally, it searches the expression columns (`required`, `relevance`, `constraint`, `calculation`, etc.) on the `survey` sheet for the original invalid choice values enclosed in single or double quotes (e.g., `'Option One'`, `"Choice - A"`) and attempts to replace them with the sanitized values (e.g., `'Option_One'`, `"Choice-A"`).
  - Caution: This automatic reference updating relies on pattern matching, so it might miss some instances or modify unintended parts of complex expressions. If you use this option, test the output form thoroughly.
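Options C and D can be pictured as two small string operations: sanitizing each invalid value, then rewriting quoted occurrences of the old values in expressions. The sketch below is an illustration under stated assumptions, not the script's implementation. The helper names are hypothetical, and the "safe" character set (letters, digits, `_`, `-`, `.`) is an assumption; check SurveyCTO's documentation for the exact rules.

```python
import re

# Assumption: letters, digits, underscore, hyphen and period are SurveyCTO-safe.
VALID_CHOICE_VALUE = re.compile(r"[A-Za-z0-9_.-]+")
DISALLOWED = re.compile(r"[^A-Za-z0-9_.-]")


def is_valid_choice_value(value):
    return VALID_CHOICE_VALUE.fullmatch(str(value)) is not None


def sanitize_choice_value(value):
    """Option C: ' - ' becomes '-', other whitespace becomes '_', disallowed characters are dropped."""
    text = str(value).strip().replace(" - ", "-")
    text = re.sub(r"\s+", "_", text)
    return DISALLOWED.sub("", text)


def build_rename_map(values):
    """Map each invalid value to its sanitized form; valid values are left out."""
    return {str(v): sanitize_choice_value(v) for v in values if not is_valid_choice_value(v)}


def update_references(expression, rename_map):
    """Option D: replace only quoted occurrences ('x' or "x") of the invalid values."""
    if not rename_map:
        return expression
    alternatives = "|".join(re.escape(k) for k in rename_map)
    # Group 1 captures the opening quote; \1 requires the same quote to close it.
    pattern = re.compile(r"(['\"])(" + alternatives + r")\1")
    return pattern.sub(lambda m: m.group(1) + rename_map[m.group(2)] + m.group(1), expression)
```

For instance, `update_references("selected(${q1}, 'Option One')", build_rename_map(["Option One"]))` yields `selected(${q1}, 'Option_One')`, while an unquoted `Option One` is left alone. This sketch does not detect two values that sanitize to the same string, or values that become empty after sanitization.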
The script performs the following actions during conversion:
- File Structure: Copies the specified template file to the output location to preserve its structure and formatting.
- Sheet Validation: Checks for the presence of the `survey`, `choices`, and `settings` sheets in both the source and template files. Prompts the user if required sheets are missing in the source.
- Data Appending (`survey`, `choices`):
  - Appends rows from the source `survey` and `choices` sheets to the corresponding sheets in the output file, below any existing data in the template.
  - Attempts to preserve blank rows from the source sheets.
- Column Mapping (Case-Insensitive):
  - Matches columns between the source and output sheets by header name, ignoring case.
  - Maps `relevant` (source) to `relevance` (output).
  - Maps `read_only` (source) to `read only` (output).
  - Maps `constraint_message` (source) to `constraint message` (output).
  - For the `choices` sheet, if the source lacks a `value` column but has a `name` column, uses the `name` column data for the output `value` column.
- Value Transformations:
  - Converts `true`/`false` (case-insensitive) to `yes`/`no` in the `required` and `read only` columns (the latter mapped from `read_only`).
  - In the `survey` sheet's `type` column, converts:
    - `begin_group` to `begin group`
    - `end_group` to `end group`
    - `begin_repeat` to `begin repeat`
    - `end_repeat` to `end repeat`
  - (Optional, user choice C/D) Sanitizes invalid choice `name`/`value` data: a hyphen surrounded by spaces becomes a bare `-`, other spaces become `_`, and disallowed characters are removed.
- Extra Columns: Copies columns that are present in the source `survey` or `choices` sheets but not in the template, appending them after the existing headers in the output file.
- Settings Sheet Updates:
  - Updates specific cells in the output `settings` sheet based on values from the source `settings` sheet:
    - `A2` (`form_title`): Uses the source `form_title`; falls back to the source filename if missing.
    - `B2` (`form_id`): Uses the source `form_id`; falls back to a lowercase, underscore-separated version of the source filename (without extension) if missing.
    - `F2` (`default_language`): Uses the source `default_language` if present; otherwise leaves the cell unchanged.
- Reference Updates (Optional, user choice D): Attempts to find and replace quoted references to the original invalid choice values with their sanitized versions in the expression columns on the `survey` sheet.
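The per-cell rules in the list above (header mapping, value transformations, and settings fallbacks) can be sketched as plain functions. This is a hypothetical illustration that does not read or write workbooks, and none of these names come from `kobo_converter.py`. Two behaviors are assumptions: the `form_title` fallback uses the filename without its extension, and cells already stored as real booleans are converted alongside `true`/`false` text.

```python
import re
from pathlib import Path

# Source (KoboToolbox) header -> output (SurveyCTO) header, compared in lowercase.
HEADER_ALIASES = {
    "relevant": "relevance",
    "read_only": "read only",
    "constraint_message": "constraint message",
}

# Group/repeat markers rewritten in the survey sheet's `type` column.
TYPE_REWRITES = {
    "begin_group": "begin group",
    "end_group": "end group",
    "begin_repeat": "begin repeat",
    "end_repeat": "end repeat",
}


def map_headers(source_headers, output_headers, sheet="survey"):
    """Return {source header: output header or None}, matching case-insensitively.

    None marks an extra column that would be appended after the template's headers.
    """
    output_by_key = {str(h).strip().lower(): h for h in output_headers if h is not None}
    source_keys = {str(h).strip().lower() for h in source_headers}
    mapping = {}
    for header in source_headers:
        key = str(header).strip().lower()
        key = HEADER_ALIASES.get(key, key)
        # choices sheet: `name` feeds `value` when the source has no `value` column.
        if sheet == "choices" and key == "name" and "value" not in source_keys:
            key = "value"
        mapping[header] = output_by_key.get(key)
    return mapping


def convert_boolean(cell):
    """'true'/'false' in any case -> 'yes'/'no'; real booleans too (assumption)."""
    if isinstance(cell, bool):
        return "yes" if cell else "no"
    if isinstance(cell, str) and cell.strip().lower() in ("true", "false"):
        return "yes" if cell.strip().lower() == "true" else "no"
    return cell


def convert_type(cell):
    """Rewrite underscore group/repeat markers; leave every other type unchanged."""
    if isinstance(cell, str):
        return TYPE_REWRITES.get(cell.strip(), cell)
    return cell


def settings_updates(source_settings, source_file):
    """Return {cell: value} for the output settings sheet."""
    stem = Path(source_file).stem
    updates = {
        "A2": source_settings.get("form_title") or stem,  # assumption: extension dropped
        "B2": source_settings.get("form_id") or re.sub(r"\s+", "_", stem.strip()).lower(),
    }
    if source_settings.get("default_language"):
        updates["F2"] = source_settings["default_language"]
    return updates
```

With a source file named `My Kobo Form.xlsx` and no `form_title` or `form_id`, `settings_updates` would fill `A2` with `My Kobo Form` and `B2` with `my_kobo_form`, and would leave `F2` untouched.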