name: Bug report
about: Create a report to help us improve
title: 'pythongen: Unsafe prefix names produce invalid or broken Python — no validation against keywords, builtins, or non-identifier characters'
labels: bug, generator-python, generator-dataclasses, generator-pydantic
assignees: ''
Describe the bug
The gen-python (and gen-pydantic) generators do not validate that schema prefix names produce valid, safe Python identifiers before emitting them as module-level variable names or attribute-access expressions in generated code. This causes NameError, SyntaxError, or silent shadowing of Python builtins at import time of the generated module — with no warning at generation time.
Version of LinkML you are using
1.9.0
Please provide a schema (and if applicable, a data file) that replicates the issue
To reproduce, use any schema that declares a prefix whose label contains a dot.
The report below goes into slightly more detail about what causes this error. Apologies that it is LLM-generated and therefore sounds rather blunt. Issue #3376 already mentions the problem, but only for a more limited example.
Schema prefix (dotted name, e.g. after Allotrope prefix renaming):
prefixes:
  allotrope.equipment: http://purl.allotrope.org/ontologies/equipment#AFE_
Generated output in chem_dcat_ap.py
# Namespace declaration — dot correctly replaced with underscore ✓
ALLOTROPE_EQUIPMENT = CurieNamespace('allotrope_equipment', 'http://purl.allotrope.org/ontologies/equipment#AFE_')
# Usage site — dot interpreted as attribute access on an undefined object ✗
class Reactor(Device):
    class_class_uri: ClassVar[URIRef] = ALLOTROPE.EQUIPMENT["0000153"]
    #                                   ^^^^^^^^^ NameError: name 'ALLOTROPE' is not defined
Error at import time
NameError: name 'ALLOTROPE' is not defined
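The failure can be reproduced without LinkML installed. The sketch below is only an illustration: `CurieNamespace` here is a hypothetical stand-in with just enough behaviour for the demo (not the runtime class), and `evaluate` is a helper invented for this example. It evaluates the emitted usage expression in a scope that contains only the declared variable.

```python
class CurieNamespace:
    """Hypothetical stand-in for the runtime class; only what the demo needs."""

    def __init__(self, prefix: str, uri: str) -> None:
        self.prefix = prefix
        self.uri = uri

    def __getitem__(self, local_id: str) -> str:
        # A plain string is enough here; the real class returns a URI object.
        return f"{self.uri}{local_id}"


# Declaration as emitted: the dot has been replaced by an underscore.
ALLOTROPE_EQUIPMENT = CurieNamespace(
    "allotrope_equipment", "http://purl.allotrope.org/ontologies/equipment#AFE_"
)


def evaluate(expression: str) -> str:
    """Evaluate a usage expression in a scope holding only the declared name."""
    scope = {"ALLOTROPE_EQUIPMENT": ALLOTROPE_EQUIPMENT}
    try:
        return str(eval(expression, scope))
    except NameError as exc:
        return f"NameError: {exc}"


# Usage expression as emitted today: attribute access on an undefined name.
broken = evaluate('ALLOTROPE.EQUIPMENT["0000153"]')
# Usage expression that matches the declaration.
working = evaluate('ALLOTROPE_EQUIPMENT["0000153"]')

print(broken)
print(working)
```

The first expression fails because `ALLOTROPE` was never declared; the second resolves because it uses the same flat name as the declaration.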
Root Cause
Two separate locations are responsible, and their behaviour is inconsistent
with each other.
1. linkml/generators/pythongen.py — gen_namespaces()
The gen_namespaces() method applies . → _ and - → _ substitution
when declaring the namespace variable:
# linkml/generators/pythongen.py — gen_namespaces()
curienamespace_defs = [
    {
        "variable": f"{pfx.upper().replace('.', '_').replace('-', '_')}",
        "value": f"CurieNamespace('{pfx.replace('.', '_')}', '{self.namespaces[pfx]}')",
    }
    for pfx in sorted(self.emit_prefixes)
]
So allotrope.equipment → variable name ALLOTROPE_EQUIPMENT. This part is
correct.
2. linkml_runtime/utils/namespaces.py — Namespaces.curie_for(..., pythonform=True)
When generating class_class_uri, class_model_uri, and similar class-level
attributes, pythongen.py calls curie_for(pythonform=True), which
reconstructs the Python expression from the raw prefix string. It
renders allotrope.equipment as the attribute-chain ALLOTROPE.EQUIPMENT[...]
rather than the flat variable name ALLOTROPE_EQUIPMENT[...].
The substitution applied in step 1 is not propagated to this usage site.
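One possible direction for a fix, sketched under assumptions: derive the variable name in a single helper and use it at both the declaration and the usage site. The function names `prefix_variable_name`, `declaration_line`, and `usage_expression` are hypothetical and do not exist in LinkML. Only the substitution rule itself is taken from the gen_namespaces() snippet quoted above.

```python
def prefix_variable_name(prefix: str) -> str:
    """Hypothetical helper applying the rule quoted from gen_namespaces()."""
    return prefix.upper().replace(".", "_").replace("-", "_")


def declaration_line(prefix: str, uri: str) -> str:
    """Build the module-level declaration for a prefix."""
    label = prefix.replace(".", "_")
    return f"{prefix_variable_name(prefix)} = CurieNamespace({label!r}, {uri!r})"


def usage_expression(prefix: str, local_id: str) -> str:
    """Build the usage-site expression from the same helper, so the names agree."""
    return f"{prefix_variable_name(prefix)}[{local_id!r}]"


decl = declaration_line(
    "allotrope.equipment", "http://purl.allotrope.org/ontologies/equipment#AFE_"
)
usage = usage_expression("allotrope.equipment", "0000153")

print(decl)
print(usage)
```

Because both strings come from the same helper, a change to the substitution rule cannot leave the two sites out of step.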
Why import keyword is already present but unused for this case
pythongen.py already imports keyword at the top of the file — suggesting
this problem was anticipated for class/slot name generation, but the same
guard was never applied to prefix variable names.
Full Class of Affected Prefix Names
The following categories all produce broken or unsafe generated Python, each
for a different reason:
| Category | Example prefixes | Failure mode |
|---|---|---|
| Contains dot | `allotrope.equipment`, `allotrope.role` | `NameError`: inconsistent substitution between declaration and usage site |
| Contains hyphen | `my-prefix`, `obo-core` | Declaration may succeed, but usage-site expressions may still be wrong |
| Python hard keyword | `in`, `class`, `not`, `for`, `def` | `SyntaxError` at import |
| Python soft keyword (3.10+; `type` from 3.12) | `match`, `case`, `type` | Context-dependent `SyntaxError` |
| Python builtin | `float`, `int`, `str`, `list`, `dict`, `set`, `type`, `id` | Silently shadows the builtin for the entire module |
| Starts with digit | `2d`, `3d`, `4xr` | `SyntaxError`: not a valid identifier |
| Dunder-style | `__foo`, `__init__` | Name mangling inside class bodies |
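A generation-time check along these lines could flag risky prefixes before any code is emitted. This is a minimal sketch, not an existing LinkML function: `classify_prefix` and its category labels are hypothetical, and it relies only on the standard-library `keyword` and `builtins` modules. It inspects the raw prefix. Since gen_namespaces() upper-cases the variable name, some categories may not surface at the declaration site, so the result is better read as a warning list than as proof of breakage.

```python
import builtins
import keyword


def classify_prefix(prefix: str) -> list[str]:
    """Return the risk categories (hypothetical labels) a raw prefix falls into."""
    risks = []
    if "." in prefix:
        risks.append("dot")
    if "-" in prefix:
        risks.append("hyphen")
    if keyword.iskeyword(prefix):
        risks.append("hard keyword")
    elif keyword.issoftkeyword(prefix):
        # The soft keyword list depends on the running Python version.
        risks.append("soft keyword")
    if prefix in vars(builtins):
        risks.append("builtin")
    if prefix[:1].isdigit():
        risks.append("leading digit")
    if prefix.startswith("__"):
        risks.append("dunder-style")
    return risks


for candidate in ["allotrope.equipment", "my-prefix", "class", "float", "2d", "__foo", "ex"]:
    print(f"{candidate!r}: {classify_prefix(candidate) or 'ok'}")
```

A generator could emit a warning, or refuse to run, whenever the returned list is non-empty.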