
pythongen: Unsafe prefix names produce invalid or broken Python — no validation against keywords, builtins, or non-identifier characters #3458

@HendrikBorgelt

name: Bug report
about: Create a report to help us improve
title: 'pythongen: Unsafe prefix names produce invalid or broken Python — no validation against keywords, builtins, or non-identifier characters'
labels: bug, generator-python, generator-dataclasses, generator-pydantic
assignees: ''


Describe the bug
The gen-python (and gen-pydantic) generators do not validate that schema prefix names yield valid, safe Python identifiers before emitting them as module-level variable names or attribute-access expressions in generated code. As a result, importing the generated module can raise NameError or SyntaxError, or silently shadow Python builtins, and no warning is issued at generation time.

Version of LinkML you are using
1.9.0

Please provide a schema (and if applicable, a data file) that replicates the issue

To reproduce the issue, it is currently enough to use a prefix whose label contains a dot.

The report below goes into slightly more detail about what causes this error. Apologies that it is LLM-generated and therefore sounds rather blunt. Issue #3376 already mentions the problem, but only for a more limited example.

Schema prefix (dotted name, e.g. after Allotrope prefix renaming)

prefixes:
  allotrope.equipment: http://purl.allotrope.org/ontologies/equipment#AFE_

Generated output in chem_dcat_ap.py

# Namespace declaration — dot correctly replaced with underscore ✓
ALLOTROPE_EQUIPMENT = CurieNamespace('allotrope_equipment', 'http://purl.allotrope.org/ontologies/equipment#AFE_')

# Usage site — dot interpreted as attribute access on an undefined object ✗
class Reactor(Device):
    class_class_uri: ClassVar[URIRef] = ALLOTROPE.EQUIPMENT["0000153"]
    #                                   ^^^^^^^ NameError: name 'ALLOTROPE' is not defined

Error at import time

NameError: name 'ALLOTROPE' is not defined
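
The failure can be reproduced without LinkML installed. The sketch below is only an illustration: `CurieNamespaceStandIn` and `evaluate` are hypothetical names, and the stand-in class is not the real `CurieNamespace`. It evaluates the two expressions from the generated output against a namespace that defines only the flat variable.

```python
class CurieNamespaceStandIn:
    """Hypothetical stand-in for CurieNamespace (not the real runtime class)."""

    def __init__(self, prefix: str, uri: str) -> None:
        self.prefix = prefix
        self.uri = uri

    def __getitem__(self, local_id: str) -> str:
        # Subscripting appends the local identifier to the base URI.
        return self.uri + local_id


ALLOTROPE_EQUIPMENT = CurieNamespaceStandIn(
    "allotrope_equipment",
    "http://purl.allotrope.org/ontologies/equipment#AFE_",
)


def evaluate(expr: str) -> str:
    """Evaluate an expression where only ALLOTROPE_EQUIPMENT is defined."""
    try:
        return eval(expr, {"ALLOTROPE_EQUIPMENT": ALLOTROPE_EQUIPMENT})
    except NameError as exc:
        return f"NameError: {exc}"


# Declaration-style (flat) name resolves; usage-style (dotted) name does not.
flat_result = evaluate('ALLOTROPE_EQUIPMENT["0000153"]')
dotted_result = evaluate('ALLOTROPE.EQUIPMENT["0000153"]')
print(flat_result)
print(dotted_result)
```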

Root Cause

Two separate locations are responsible, and their behaviour is inconsistent
with each other.

1. linkml/generators/pythongen.py: gen_namespaces()

The gen_namespaces() method applies the substitutions . → _ and - → _
when declaring the namespace variable:

# linkml/generators/pythongen.py — gen_namespaces()
curienamespace_defs = [
    {
        "variable": f"{pfx.upper().replace('.', '_').replace('-', '_')}",
        "value":    f"CurieNamespace('{pfx.replace('.', '_')}', '{self.namespaces[pfx]}')",
    }
    for pfx in sorted(self.emit_prefixes)
]

So allotrope.equipment → variable name ALLOTROPE_EQUIPMENT. This part is
correct.
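
To make the two substitutions in the excerpt easier to compare, here they are as standalone functions. This is a sketch that copies only the string expressions from the excerpt above; the function names `declared_variable` and `declared_label` are hypothetical and do not exist in pythongen.py.

```python
def declared_variable(pfx: str) -> str:
    """Variable name, following the 'variable' entry of the excerpt."""
    return pfx.upper().replace(".", "_").replace("-", "_")


def declared_label(pfx: str) -> str:
    """First CurieNamespace argument, following the 'value' entry (dots only)."""
    return pfx.replace(".", "_")


examples = {
    pfx: (declared_variable(pfx), declared_label(pfx))
    for pfx in ("allotrope.equipment", "my-prefix")
}
for pfx, (variable, label) in examples.items():
    print(f"{pfx!r}: variable={variable}, label={label!r}")
```

Going by the excerpt, the variable name has both dots and hyphens replaced, while the label passed to CurieNamespace has only dots replaced.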

2. linkml_runtime/utils/namespaces.py: Namespaces.curie_for(..., pythonform=True)

When generating class_class_uri, class_model_uri, and similar class-level
attributes, pythongen.py calls curie_for(pythonform=True), which
reconstructs the Python expression from the raw prefix string. It
renders allotrope.equipment as the attribute-chain ALLOTROPE.EQUIPMENT[...]
rather than the flat variable name ALLOTROPE_EQUIPMENT[...].

The substitution applied in step 1 is not propagated to this usage site.
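
One way to detect this kind of mismatch is to parse the emitted expression and compare the names it references with the names that were declared. The sketch below rests on assumptions: `usage_as_reported` only models the output shown in this report and is not the actual curie_for implementation, and all function names are hypothetical.

```python
import ast


def declared_variable(pfx: str) -> str:
    """Flat variable name, following the gen_namespaces() excerpt."""
    return pfx.upper().replace(".", "_").replace("-", "_")


def usage_as_reported(pfx: str, local_id: str) -> str:
    """Assumption: models the reported output, not the real curie_for code."""
    return f'{pfx.upper()}["{local_id}"]'


def usage_consistent(pfx: str, local_id: str) -> str:
    """Usage expression that reuses the declaration-site substitution."""
    return f'{declared_variable(pfx)}["{local_id}"]'


def undefined_names(expr: str, declared: set) -> set:
    """Names referenced by the expression that were never declared."""
    tree = ast.parse(expr, mode="eval")
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return used - declared


declared = {declared_variable("allotrope.equipment")}
broken = undefined_names(usage_as_reported("allotrope.equipment", "0000153"), declared)
fixed = undefined_names(usage_consistent("allotrope.equipment", "0000153"), declared)
print(broken, fixed)
```

Routing both sites through a single helper, as `usage_consistent` does here, is one possible design; the report itself does not prescribe a specific fix.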

Why import keyword is already present but unused for this case

pythongen.py already imports keyword at the top of the file — suggesting
this problem was anticipated for class/slot name generation, but the same
guard was never applied to prefix variable names.
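
For reference, this is a minimal sketch of what a guard based on the standard-library `keyword` module could look like for prefixes. The function name `reserved_forms` is hypothetical. Note that `keyword.iskeyword` is case-sensitive, so whether a given prefix collides depends on the exact spelling the generator emits; the sketch therefore checks both the raw prefix and the upper-cased form used at the declaration site shown earlier.

```python
import keyword


def reserved_forms(pfx: str) -> dict:
    """Map each candidate spelling of a prefix to whether it is a hard keyword."""
    return {form: keyword.iskeyword(form) for form in (pfx, pfx.upper())}


for prefix in ("class", "in", "allotrope"):
    print(prefix, reserved_forms(prefix))
```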


Full Class of Affected Prefix Names

The following categories all produce broken or unsafe generated Python, each
for a different reason:

| Category | Example prefixes | Failure mode |
| --- | --- | --- |
| Contains dot | `allotrope.equipment`, `allotrope.role` | NameError: inconsistent substitution between declaration and usage site |
| Contains hyphen | `my-prefix`, `obo-core` | May declare OK, but expression sites may still be wrong |
| Python hard keyword | `in`, `class`, `not`, `for`, `def` | SyntaxError at import |
| Python soft keyword | `match`, `case` (3.10+), `type` (3.12+) | Context-dependent SyntaxError |
| Python builtin | `float`, `int`, `str`, `list`, `dict`, `set`, `type`, `id` | Silently shadows the builtin for the entire module |
| Starts with digit | `2d`, `3d`, `4xr` | SyntaxError: not a valid identifier |
| Dunder-style | `__foo`, `__init__` | Name mangling inside class bodies |
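
The categories above could be checked at generation time with a small classifier. The following is a sketch under stated assumptions: `classify_prefix` and the category labels are hypothetical, it inspects the raw prefix string exactly as written in the schema, and it makes no claim about how the generators would change the casing of the name before emitting it.

```python
import builtins
import keyword

# keyword.issoftkeyword exists on Python 3.9+; fall back to "never soft" otherwise.
_is_soft_keyword = getattr(keyword, "issoftkeyword", lambda name: False)


def classify_prefix(pfx: str) -> list:
    """Return the risk categories that apply to a raw prefix string."""
    problems = []
    if "." in pfx:
        problems.append("contains dot")
    if "-" in pfx:
        problems.append("contains hyphen")
    if pfx[:1].isdigit():
        problems.append("starts with digit")
    if keyword.iskeyword(pfx):
        problems.append("hard keyword")
    elif _is_soft_keyword(pfx):
        problems.append("soft keyword")
    if pfx in vars(builtins):
        problems.append("builtin")
    if pfx.startswith("__"):
        problems.append("dunder-style")
    return problems


for prefix in ("allotrope.equipment", "my-prefix", "class", "float", "2d", "__foo"):
    print(f"{prefix!r}: {classify_prefix(prefix)}")
```

A generator could emit a warning or raise an error whenever the returned list is non-empty; which of the two is appropriate is a design decision left open here.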
