Describe the bug
Running the shared code on my laptop, without modifying any of the hyperparameters, fails with the error: probability tensor contains either `inf`, `nan` or element < 0.
Material
Did you use Google Colab? No
If not:
Your Operating system (OS): macOS
Version of your OS: 15.3.1
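For context, the RuntimeError in the traceback below is raised by torch.multinomial when the probability tensor it samples from contains NaN, inf, or negative values, i.e. the policy network has started producing non-finite action probabilities. A minimal, standalone illustration of the same message (not ml-agents code, purely to show what triggers it):

```python
import torch

# Purely illustrative: torch.multinomial rejects probability tensors that
# contain NaN, inf, or negative entries with exactly this RuntimeError.
probs = torch.tensor([[0.3, float("nan"), 0.7]])
try:
    torch.multinomial(probs, 1)
except RuntimeError as err:
    print(err)  # probability tensor contains either `inf`, `nan` or element < 0
```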
(rl) amir@Laptop-Amir ml-agents % mlagents-learn ./config/poca/SoccerTwos.yaml --env=./training-envs-executables/SoccerTwos/SoccerTwos.app --run-id="SoccerTwos-t1" --no-graphics
Version information:
ml-agents: 1.2.0.dev0,
ml-agents-envs: 1.2.0.dev0,
Communicator API: 1.5.0,
PyTorch: 2.6.0
[INFO] Connected to Unity environment with package version 2.3.0-exp.3 and communication version 1.5.0
[INFO] Connected new brain: SoccerTwos?team=1
[INFO] Connected new brain: SoccerTwos?team=0
[INFO] Hyperparameters for behavior name SoccerTwos:
trainer_type: poca
hyperparameters:
  batch_size: 2048
  buffer_size: 20480
  learning_rate: 0.0003
  beta: 0.005
  epsilon: 0.2
  lambd: 0.95
  num_epoch: 3
  learning_rate_schedule: constant
  beta_schedule: constant
  epsilon_schedule: constant
checkpoint_interval: 500000
network_settings:
  normalize: False
  hidden_units: 512
  num_layers: 2
  vis_encode_type: simple
  memory: None
  goal_conditioning_type: hyper
  deterministic: False
reward_signals:
  extrinsic:
    gamma: 0.99
    strength: 1.0
    network_settings:
      normalize: False
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: hyper
      deterministic: False
init_path: None
keep_checkpoints: 5
even_checkpoints: False
max_steps: 50000000
time_horizon: 1000
summary_freq: 10000
threaded: False
self_play:
  save_steps: 50000
  team_change: 200000
  swap_steps: 2000
  window: 10
  play_against_latest_model_ratio: 0.5
  initial_elo: 1200.0
behavioral_cloning: None
/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/utils.py:289: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3729.)
torch.nn.functional.one_hot(_act.T, action_size[i]).float()
[INFO] SoccerTwos. Step: 10000. Time Elapsed: 47.233 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training. ELO: 1199.251.
[INFO] SoccerTwos. Step: 20000. Time Elapsed: 84.973 s. Mean Reward: 0.000. Mean Group Reward: -0.170. Training. ELO: 1199.001.
[INFO] SoccerTwos. Step: 30000. Time Elapsed: 107.770 s. Mean Reward: 0.000. Mean Group Reward: 0.079. Training. ELO: 1199.001.
[INFO] SoccerTwos. Step: 40000. Time Elapsed: 153.743 s. Mean Reward: 0.000. Mean Group Reward: -0.167. Training. ELO: 1198.256.
[INFO] SoccerTwos. Step: 50000. Time Elapsed: 199.639 s. Mean Reward: 0.000. Mean Group Reward: 0.214. Training. ELO: 1197.823.
[INFO] SoccerTwos. Step: 60000. Time Elapsed: 225.712 s. Mean Reward: 0.000. Mean Group Reward: -0.159. Training. ELO: 1197.535.
[INFO] SoccerTwos. Step: 70000. Time Elapsed: 277.827 s. Mean Reward: 0.000. Mean Group Reward: -0.122. Training. ELO: 1196.114.
[INFO] SoccerTwos. Step: 80000. Time Elapsed: 298.038 s. Mean Reward: 0.000. Mean Group Reward: -0.086. Training. ELO: 1196.068.
[INFO] SoccerTwos. Step: 90000. Time Elapsed: 349.539 s. Mean Reward: 0.000. Mean Group Reward: -0.019. Training. ELO: 1197.252.
[INFO] SoccerTwos. Step: 100000. Time Elapsed: 388.232 s. Mean Reward: 0.000. Mean Group Reward: 0.281. Training. ELO: 1200.413.
[INFO] SoccerTwos. Step: 110000. Time Elapsed: 420.512 s. Mean Reward: 0.000. Mean Group Reward: 0.143. Training. ELO: 1202.923.
[INFO] SoccerTwos. Step: 120000. Time Elapsed: 461.744 s. Mean Reward: 0.000. Mean Group Reward: 0.281. Training. ELO: 1205.046.
[INFO] SoccerTwos. Step: 130000. Time Elapsed: 494.701 s. Mean Reward: 0.000. Mean Group Reward: -0.117. Training. ELO: 1205.951.
[INFO] SoccerTwos. Step: 140000. Time Elapsed: 527.693 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training.
[INFO] SoccerTwos. Step: 150000. Time Elapsed: 561.406 s. Mean Reward: 0.000. Mean Group Reward: -0.003. Training. ELO: 1206.395.
[INFO] SoccerTwos. Step: 160000. Time Elapsed: 591.341 s. Mean Reward: 0.000. Mean Group Reward: 0.000. Training.
[INFO] SoccerTwos. Step: 170000. Time Elapsed: 624.032 s. Mean Reward: 0.000. Mean Group Reward: -0.016. Training. ELO: 1206.998.
[INFO] SoccerTwos. Step: 180000. Time Elapsed: 674.598 s. Mean Reward: 0.000. Mean Group Reward: -0.042. Training. ELO: 1207.747.
[INFO] SoccerTwos. Step: 190000. Time Elapsed: 696.535 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training. ELO: 1207.996.
[INFO] SoccerTwos. Step: 200000. Time Elapsed: 735.584 s. Mean Reward: 0.000. Mean Group Reward: 0.000. Training. ELO: 1207.996.
[INFO] SoccerTwos. Step: 210000. Time Elapsed: 781.218 s. Mean Reward: 0.000. Mean Group Reward: 0.026. Training. ELO: 1207.007.
Traceback (most recent call last):
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 175, in start_learning
n_steps = self.advance(env_manager)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 233, in advance
new_step_infos = env_manager.get_steps()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/env_manager.py", line 124, in get_steps
new_step_infos = self._step()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 408, in _step
self._queue_steps()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 302, in _queue_steps
env_action_info = self._take_step(env_worker.previous_step)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 543, in _take_step
all_action_info[brain_name] = self.policies[brain_name].get_action(
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/policy/torch_policy.py", line 130, in get_action
run_out = self.evaluate(decision_requests, global_agent_ids)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/policy/torch_policy.py", line 100, in evaluate
action, run_out, memories = self.actor.get_action_and_stats(
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/networks.py", line 640, in get_action_and_stats
action, log_probs, entropies = self.action_model(encoding, masks)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 227, in forward
actions = self._sample_action(dists)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 96, in _sample_action
discrete_action.append(discrete_dist.sample())
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 124, in sample
return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/amir/anaconda3/envs/rl/bin/mlagents-learn", line 33, in<module>
sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 270, in main
run_cli(parse_command_line())
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 266, in run_cli
run_training(run_seed, options, num_areas)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 138, in run_training
tc.start_learning(env_manager)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 200, in start_learning
self._save_models()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 80, in _save_models
self.trainers[brain_name].save_model()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/ghost/trainer.py", line 334, in save_model
self.trainer.save_model()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer/rl_trainer.py", line 172, in save_model
model_checkpoint = self._checkpoint()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer/rl_trainer.py", line 144, in _checkpoint
export_path, auxillary_paths = self.model_saver.save_checkpoint(
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/model_saver/torch_model_saver.py", line 60, in save_checkpoint
self.export(checkpoint_path, behavior_name)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/model_saver/torch_model_saver.py", line 65, inexport
self.exporter.export_policy_model(output_filepath)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/model_serialization.py", line 164, in export_policy_model
torch.onnx.export(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/__init__.py", line 383, inexport
export(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 495, inexport
_export(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 1428, in _export
graph, params_dict, torch_out = _model_to_graph(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 1053, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 937, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 844, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 1498, in _get_trace_graph
outs = ONNXTracedModule(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 138, in forward
graph, _out = torch._C._create_graph_by_tracing(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 129, in wrapper
outs.append(self.inner(*trace_inputs))
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in _slow_forward
result = self.forward(*input, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/networks.py", line 692, in forward
) = self.action_model.get_action_out(encoding, masks)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 191, in get_action_out
discrete_out_list = [
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 192, in<listcomp>discrete_dist.exported_model_output()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 149, in exported_model_output
return self.sample()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 124, in sample
return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
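The second traceback is the shutdown path: after the sampling error, the trainer tries to save a final checkpoint, and the ONNX export runs the same action model and hits the same invalid probabilities. To narrow down where the probabilities first become non-finite, one option is to temporarily wrap the multinomial call used at distributions.py line 124 with an explicit check. This is a debugging sketch only, not part of ml-agents; the helper name and messages are hypothetical:

```python
import torch

def checked_multinomial(probs: torch.Tensor, num_samples: int = 1) -> torch.Tensor:
    """Debugging stand-in for torch.multinomial: fail with a more informative
    message when the probabilities are already invalid (NaN, inf, or < 0)."""
    invalid = (~torch.isfinite(probs)) | (probs < 0)
    if invalid.any():
        raise RuntimeError(
            f"{int(invalid.sum())} invalid entries in action probabilities "
            f"(shape={tuple(probs.shape)}, min={probs.min().item()}, "
            f"max={probs.max().item()})"
        )
    return torch.multinomial(probs, num_samples)
```

A guard like this (or running briefly with `torch.autograd.set_detect_anomaly(True)`) would at least show whether the policy's outputs degrade gradually or blow up suddenly after a particular update.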