Summary
`FastxZipped.__next__` currently makes multiple passes over `records` on every call:
- `all(record is None for record in records)`: check if all are exhausted
- `all_not_none(records)`: check if any are `None` (truncation)
- `[record.name for record in records]`: extract names
- `[self._name_minus_ordinal(name) for name in ...]`: strip ordinals
- `set(record_names)`: check uniqueness
Since this runs on every record, there may be a performance concern for large FASTX files with many read groups.
These could be consolidated into a single loop that:
- Detects `None` records (truncation or exhaustion)
- Extracts and validates names in one pass
- Short-circuits on name mismatch
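
The single-loop shape listed above could look roughly like the sketch below. It is not the actual `FastxZipped` code: `Record`, `name_minus_ordinal`, and `validate_records` are hypothetical stand-ins, and the ordinal-stripping rule (drop a trailing `/1`, `/2`, ...) is an assumption about what `_name_minus_ordinal` does.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Record:
    """Hypothetical stand-in for a FASTX record; only `name` is used here."""

    name: str


def name_minus_ordinal(name: str) -> str:
    """Assumed behavior: drop a trailing "/<digit>" read ordinal if present."""
    if len(name) >= 2 and name[-2] == "/" and name[-1].isdigit():
        return name[:-2]
    return name


def validate_records(records: Sequence[Optional[Record]]) -> bool:
    """Return False if every record is None (exhausted), True if all are valid.

    Raises ValueError on truncation (some but not all records are None)
    or when the ordinal-stripped names disagree.
    """
    num_none = 0
    expected: Optional[str] = None
    for record in records:
        if record is None:
            num_none += 1
            continue
        name = name_minus_ordinal(record.name)
        if expected is None:
            expected = name
        elif name != expected:
            # Short-circuit: the remaining records do not need to be inspected.
            raise ValueError(f"Name mismatch: {name!r} != {expected!r}")
    if num_none == len(records):
        return False
    if num_none > 0:
        raise ValueError("One or more FASTX files were truncated")
    return True
```

One design consequence: because the loop short-circuits on a name mismatch, an input that is both mismatched and truncated reports the mismatch, whereas the multi-pass version checks truncation first. If the error precedence matters, the mismatch can be recorded in the loop and raised after the `None` checks instead.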
Context
Raised in #259 (review comment by @nh13): #259 (comment)
Notes
The number of elements in records is bounded by the number of FASTX files being zipped (typically 2–4), so the constant factor may be negligible in practice. Worth benchmarking before optimizing.
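
A quick way to check whether the difference is measurable is a `timeit` micro-benchmark along these lines. This is a sketch under stated assumptions: `multi_pass` and `single_pass` are simplified, hypothetical stand-ins that operate on plain name strings rather than real records, and `rsplit("/", 1)` is only an approximation of the ordinal stripping.

```python
import timeit
from typing import Dict, Optional, Sequence


def multi_pass(names: Sequence[Optional[str]]) -> bool:
    """Mimics the current shape: several separate passes over the names."""
    if all(name is None for name in names):
        return False
    if any(name is None for name in names):
        raise ValueError("truncated")
    stripped = [name.rsplit("/", 1)[0] for name in names]
    if len(set(stripped)) != 1:
        raise ValueError("name mismatch")
    return True


def single_pass(names: Sequence[Optional[str]]) -> bool:
    """Same checks folded into one loop."""
    num_none = 0
    expected: Optional[str] = None
    for name in names:
        if name is None:
            num_none += 1
            continue
        stripped = name.rsplit("/", 1)[0]
        if expected is None:
            expected = stripped
        elif stripped != expected:
            raise ValueError("name mismatch")
    if num_none == len(names):
        return False
    if num_none:
        raise ValueError("truncated")
    return True


def benchmark(num_files: int = 4, number: int = 20_000) -> Dict[str, float]:
    """Time both variants on one well-formed group of `num_files` names."""
    names = [f"read1/{i + 1}" for i in range(num_files)]
    return {
        fn.__name__: timeit.timeit(lambda fn=fn: fn(names), number=number)
        for fn in (multi_pass, single_pass)
    }


if __name__ == "__main__":
    for label, seconds in benchmark().items():
        print(f"{label}: {seconds:.4f}s")
```

Timings from a micro-benchmark like this only cover the validation step; a profile of a full pass over real FASTX files would show whether it matters relative to parsing and I/O.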