
[BUG]: AttributeError: 'PipelineStageManager' object has no attribute 'stage_indices' #6397

@honglyua-il

Description


Is there an existing issue for this bug?

  • I have searched the existing issues

The bug has not been fixed in the latest main branch

  • I have checked the latest main branch

Do you feel comfortable sharing a concise (minimal) script that reproduces the error? :)

Yes, I will share a minimal reproducible script.

🐛 Describe the bug

Steps

git clone https://github.com/hpcaitech/ColossalAI
cd ColossalAI
pip install .
cd examples/language/deepseek
colossalai run --nproc_per_node 16 benchmark.py -c 7b -g -b 16 --tp 1 --pp 4 --num_steps 50 --pp_style interleaved --n_chunks 2

This throws the following error:

[rank12]: Traceback (most recent call last):
[rank12]:   File "/root/ColossalAI/examples/language/deepseek/benchmark.py", line 295, in <module>
[rank12]:     main()
[rank12]:   File "/root/ColossalAI/examples/language/deepseek/benchmark.py", line 258, in main
[rank12]:     outputs = booster.execute_pipeline(
[rank12]:   File "/usr/local/lib/python3.10/site-packages/colossalai/booster/booster.py", line 221, in execute_pipeline
[rank12]:     return self.plugin.execute_pipeline(batch, model, criterion, optimizer, return_loss, return_outputs)
[rank12]:   File "/usr/local/lib/python3.10/site-packages/colossalai/booster/plugin/hybrid_parallel_plugin.py", line 1407, in execute_pipeline
[rank12]:     outputs = self.scheduler.forward_backward_step(
[rank12]:   File "/usr/local/lib/python3.10/site-packages/colossalai/pipeline/schedule/interleaved_pp.py", line 607, in forward_backward_step
[rank12]:     result = self.run_forward_backward(
[rank12]:   File "/usr/local/lib/python3.10/site-packages/colossalai/pipeline/schedule/interleaved_pp.py", line 462, in run_forward_backward
[rank12]:     output_obj = self.forward_step(model_chunk, model_chunk_id, input_obj, criterion, accum_loss, outputs)
[rank12]:   File "/usr/local/lib/python3.10/site-packages/colossalai/pipeline/schedule/interleaved_pp.py", line 309, in forward_step
[rank12]:     internal_inputs["stage_index"] = self.stage_manager.stage_indices[model_chunk_id]
[rank12]: AttributeError: 'PipelineStageManager' object has no attribute 'stage_indices'

Analysis

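I found that policies/deepseek.py#L334 never initializes stage_indices on the PipelineStageManager, whereas policies/deepseek_v3.py#L120 does. The interleaved scheduler then reads self.stage_manager.stage_indices[model_chunk_id] in forward_step (interleaved_pp.py, line 309), which fails because the attribute was never set.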

Solution

Assign stage_manager.stage_indices after policies/deepseek.py#L334, as policies/deepseek_v3.py#L120 already does.
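The failure mode and the proposed fix can be illustrated with a self-contained toy model. This is a minimal sketch, not ColossalAI code: `ToyStageManager`, `distribute_layers`, `get_stage_indices`, `apply_policy`, and `forward_step_stage_index` are hypothetical names, and the interleaved layout it uses (chunk c on stage s is virtual stage c * num_stages + s, with layers split as evenly as possible) is an assumption rather than ColossalAI's exact algorithm. It reproduces the AttributeError when the policy never assigns `stage_indices`, and shows the lookup succeeding once it does.

```python
from typing import List, Tuple


class ToyStageManager:
    """Hypothetical stand-in for PipelineStageManager (not the real ColossalAI class)."""

    def __init__(self, stage: int, num_stages: int, num_model_chunks: int):
        self.stage = stage                        # this rank's pipeline stage
        self.num_stages = num_stages              # corresponds to --pp
        self.num_model_chunks = num_model_chunks  # corresponds to --n_chunks
        # No `stage_indices` here: it only exists once a policy assigns it,
        # which is why reading it early raises AttributeError.


def distribute_layers(num_layers: int, num_virtual_stages: int) -> List[int]:
    """Split layers as evenly as possible; earlier virtual stages get the remainder (assumption)."""
    base, rem = divmod(num_layers, num_virtual_stages)
    return [base + (1 if i < rem else 0) for i in range(num_virtual_stages)]


def get_stage_indices(manager: ToyStageManager, layers_per_virtual_stage: List[int]) -> List[Tuple[int, int]]:
    """Return a [start, end) layer range for each model chunk owned by this stage."""
    starts = [0]
    for n in layers_per_virtual_stage:
        starts.append(starts[-1] + n)
    indices = []
    for chunk in range(manager.num_model_chunks):
        # Interleaved assumption: chunk `chunk` on stage s is virtual stage chunk * num_stages + s.
        v = chunk * manager.num_stages + manager.stage
        indices.append((starts[v], starts[v + 1]))
    return indices


def apply_policy(manager: ToyStageManager, num_layers: int) -> None:
    """Hypothetical policy hook: the proposed fix is to assign stage_indices here."""
    layers = distribute_layers(num_layers, manager.num_stages * manager.num_model_chunks)
    manager.stage_indices = get_stage_indices(manager, layers)


def forward_step_stage_index(manager: ToyStageManager, model_chunk_id: int) -> Tuple[int, int]:
    # Mirrors the failing access in interleaved_pp.py:
    # internal_inputs["stage_index"] = self.stage_manager.stage_indices[model_chunk_id]
    return manager.stage_indices[model_chunk_id]


if __name__ == "__main__":
    manager = ToyStageManager(stage=3, num_stages=4, num_model_chunks=2)
    try:
        forward_step_stage_index(manager, 0)
    except AttributeError as err:
        print("before fix:", err)
    apply_policy(manager, num_layers=32)  # 32 layers is an arbitrary example
    print("after fix:", manager.stage_indices)
```

With --pp 4 and --n_chunks 2 as in the reproduction command, pipeline stage 3 of a hypothetical 32-layer model would own layers (12, 16) and (28, 32) under these assumptions. The point of the sketch is only the ordering: the policy must set stage_indices before the interleaved scheduler's forward_step reads it.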

Environment

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
