[HANDS-ON BUG] Unit 7 #600

Open · amostof opened this issue Feb 27, 2025 · 0 comments

amostof commented Feb 27, 2025

Describe the bug

Running the shared code on my laptop, without modifying any of the hyperparameters, fails partway through training (after roughly 210,000 steps in the log below) with the error: probability tensor contains either `inf`, `nan` or element < 0.
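
For context (not part of the original report): the message comes from torch.multinomial, which rejects probability tensors containing inf, NaN, or negative entries. A minimal sketch of the same failure mode, assuming the policy's action probabilities have gone NaN after a numerically unstable update:

import torch

# Hypothetical reproduction: torch.multinomial refuses to sample from a
# probability row that contains NaN (or inf, or a negative value).
probs = torch.tensor([[0.5, float("nan"), 0.5]])
torch.multinomial(probs, 1)
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0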

Material

  • Did you use Google Colab? No

If not:

  • Your operating system (OS): macOS
  • Version of your OS: 15.3.1
(rl) amir@Laptop-Amir ml-agents % mlagents-learn ./config/poca/SoccerTwos.yaml --env=./training-envs-executables/SoccerTwos/SoccerTwos.app --run-id="SoccerTwos-t1" --no-graphics     

        [Unity ML-Agents ASCII art banner]

 Version information:
  ml-agents: 1.2.0.dev0,
  ml-agents-envs: 1.2.0.dev0,
  Communicator API: 1.5.0,
  PyTorch: 2.6.0
[INFO] Connected to Unity environment with package version 2.3.0-exp.3 and communication version 1.5.0
[INFO] Connected new brain: SoccerTwos?team=1
[INFO] Connected new brain: SoccerTwos?team=0
[INFO] Hyperparameters for behavior name SoccerTwos: 
        trainer_type:   poca
        hyperparameters:
          batch_size:   2048
          buffer_size:  20480
          learning_rate:        0.0003
          beta: 0.005
          epsilon:      0.2
          lambd:        0.95
          num_epoch:    3
          learning_rate_schedule:       constant
          beta_schedule:        constant
          epsilon_schedule:     constant
        checkpoint_interval:    500000
        network_settings:
          normalize:    False
          hidden_units: 512
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
          deterministic:        False
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
              deterministic:    False
        init_path:      None
        keep_checkpoints:       5
        even_checkpoints:       False
        max_steps:      50000000
        time_horizon:   1000
        summary_freq:   10000
        threaded:       False
        self_play:
          save_steps:   50000
          team_change:  200000
          swap_steps:   2000
          window:       10
          play_against_latest_model_ratio:      0.5
          initial_elo:  1200.0
        behavioral_cloning:     None
/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/utils.py:289: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3729.)
  torch.nn.functional.one_hot(_act.T, action_size[i]).float()
[INFO] SoccerTwos. Step: 10000. Time Elapsed: 47.233 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training. ELO: 1199.251.
[INFO] SoccerTwos. Step: 20000. Time Elapsed: 84.973 s. Mean Reward: 0.000. Mean Group Reward: -0.170. Training. ELO: 1199.001.
[INFO] SoccerTwos. Step: 30000. Time Elapsed: 107.770 s. Mean Reward: 0.000. Mean Group Reward: 0.079. Training. ELO: 1199.001.
[INFO] SoccerTwos. Step: 40000. Time Elapsed: 153.743 s. Mean Reward: 0.000. Mean Group Reward: -0.167. Training. ELO: 1198.256.
[INFO] SoccerTwos. Step: 50000. Time Elapsed: 199.639 s. Mean Reward: 0.000. Mean Group Reward: 0.214. Training. ELO: 1197.823.
[INFO] SoccerTwos. Step: 60000. Time Elapsed: 225.712 s. Mean Reward: 0.000. Mean Group Reward: -0.159. Training. ELO: 1197.535.
[INFO] SoccerTwos. Step: 70000. Time Elapsed: 277.827 s. Mean Reward: 0.000. Mean Group Reward: -0.122. Training. ELO: 1196.114.
[INFO] SoccerTwos. Step: 80000. Time Elapsed: 298.038 s. Mean Reward: 0.000. Mean Group Reward: -0.086. Training. ELO: 1196.068.
[INFO] SoccerTwos. Step: 90000. Time Elapsed: 349.539 s. Mean Reward: 0.000. Mean Group Reward: -0.019. Training. ELO: 1197.252.
[INFO] SoccerTwos. Step: 100000. Time Elapsed: 388.232 s. Mean Reward: 0.000. Mean Group Reward: 0.281. Training. ELO: 1200.413.
[INFO] SoccerTwos. Step: 110000. Time Elapsed: 420.512 s. Mean Reward: 0.000. Mean Group Reward: 0.143. Training. ELO: 1202.923.
[INFO] SoccerTwos. Step: 120000. Time Elapsed: 461.744 s. Mean Reward: 0.000. Mean Group Reward: 0.281. Training. ELO: 1205.046.
[INFO] SoccerTwos. Step: 130000. Time Elapsed: 494.701 s. Mean Reward: 0.000. Mean Group Reward: -0.117. Training. ELO: 1205.951.
[INFO] SoccerTwos. Step: 140000. Time Elapsed: 527.693 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training.
[INFO] SoccerTwos. Step: 150000. Time Elapsed: 561.406 s. Mean Reward: 0.000. Mean Group Reward: -0.003. Training. ELO: 1206.395.
[INFO] SoccerTwos. Step: 160000. Time Elapsed: 591.341 s. Mean Reward: 0.000. Mean Group Reward: 0.000. Training.
[INFO] SoccerTwos. Step: 170000. Time Elapsed: 624.032 s. Mean Reward: 0.000. Mean Group Reward: -0.016. Training. ELO: 1206.998.
[INFO] SoccerTwos. Step: 180000. Time Elapsed: 674.598 s. Mean Reward: 0.000. Mean Group Reward: -0.042. Training. ELO: 1207.747.
[INFO] SoccerTwos. Step: 190000. Time Elapsed: 696.535 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training. ELO: 1207.996.
[INFO] SoccerTwos. Step: 200000. Time Elapsed: 735.584 s. Mean Reward: 0.000. Mean Group Reward: 0.000. Training. ELO: 1207.996.
[INFO] SoccerTwos. Step: 210000. Time Elapsed: 781.218 s. Mean Reward: 0.000. Mean Group Reward: 0.026. Training. ELO: 1207.007.
Traceback (most recent call last):
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 175, in start_learning
    n_steps = self.advance(env_manager)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 233, in advance
    new_step_infos = env_manager.get_steps()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/env_manager.py", line 124, in get_steps
    new_step_infos = self._step()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 408, in _step
    self._queue_steps()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 302, in _queue_steps
    env_action_info = self._take_step(env_worker.previous_step)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 543, in _take_step
    all_action_info[brain_name] = self.policies[brain_name].get_action(
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/policy/torch_policy.py", line 130, in get_action
    run_out = self.evaluate(decision_requests, global_agent_ids)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/policy/torch_policy.py", line 100, in evaluate
    action, run_out, memories = self.actor.get_action_and_stats(
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/networks.py", line 640, in get_action_and_stats
    action, log_probs, entropies = self.action_model(encoding, masks)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 227, in forward
    actions = self._sample_action(dists)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 96, in _sample_action
    discrete_action.append(discrete_dist.sample())
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 124, in sample
    return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/amir/anaconda3/envs/rl/bin/mlagents-learn", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 270, in main
    run_cli(parse_command_line())
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 266, in run_cli
    run_training(run_seed, options, num_areas)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 138, in run_training
    tc.start_learning(env_manager)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 200, in start_learning
    self._save_models()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 80, in _save_models
    self.trainers[brain_name].save_model()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/ghost/trainer.py", line 334, in save_model
    self.trainer.save_model()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer/rl_trainer.py", line 172, in save_model
    model_checkpoint = self._checkpoint()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer/rl_trainer.py", line 144, in _checkpoint
    export_path, auxillary_paths = self.model_saver.save_checkpoint(
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/model_saver/torch_model_saver.py", line 60, in save_checkpoint
    self.export(checkpoint_path, behavior_name)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/model_saver/torch_model_saver.py", line 65, in export
    self.exporter.export_policy_model(output_filepath)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/model_serialization.py", line 164, in export_policy_model
    torch.onnx.export(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/__init__.py", line 383, in export
    export(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 495, in export
    _export(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 1428, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 1053, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 937, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 844, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 1498, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 138, in forward
    graph, _out = torch._C._create_graph_by_tracing(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 129, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/networks.py", line 692, in forward
    ) = self.action_model.get_action_out(encoding, masks)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 191, in get_action_out
    discrete_out_list = [
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 192, in <listcomp>
    discrete_dist.exported_model_output()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 149, in exported_model_output
    return self.sample()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 124, in sample
    return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
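
Not part of the original log: both tracebacks end in the same sample() call in mlagents/trainers/torch_entities/distributions.py, so one way to narrow this down is to guard the failing torch.multinomial call and report the offending rows before sampling. The snippet below is a hypothetical debugging sketch, not existing ml-agents code:

import torch

def sample_with_check(probs: torch.Tensor) -> torch.Tensor:
    # Flag rows that would make torch.multinomial raise, and report them
    # before the opaque RuntimeError is triggered.
    bad = ~torch.isfinite(probs) | (probs < 0)
    if bad.any():
        bad_rows = bad.any(dim=-1)
        raise RuntimeError(
            f"Invalid action probabilities in rows "
            f"{bad_rows.nonzero().flatten().tolist()}: {probs[bad_rows]}"
        )
    return torch.multinomial(probs, 1)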