This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Note: Providing complete information in the most concise form is the best way to get help. This issue template serves as the checklist for essential information to most of the technical issues and bug reports. For non-technical issues and feature requests, feel free to present the information in what you believe is the best form.
The command is simple:
/incubator-mxnet/tools/launch.py -H host -n 2 python3 store.py
or
/incubator-mxnet/tools/launch.py -H host -n 2 python3 image-classification.py
with some other network configuration options.
The host file contains:
server1
server2
Both servers are reachable over SSH without a password.
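Since the launcher reads its machines from the host file, it can help to rule out a malformed file first. The following is a minimal sketch, not part of MXNet: `read_hosts` and `find_problems` are hypothetical helper names, and treating quotes or inner whitespace as suspicious is an assumption of this sketch rather than a documented rule of `launch.py`.

```python
import tempfile
from pathlib import Path


def read_hosts(path):
    """Return the entries of a host file, one per non-blank line, stripped."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def find_problems(hosts):
    """Return human-readable warnings; an empty list means nothing suspicious."""
    problems = []
    for host in hosts:
        # Quotes or inner whitespace usually mean the entry is not a bare host name.
        if any(ch in host for ch in "\"' \t"):
            problems.append("suspicious entry: {!r}".format(host))
    return problems


if __name__ == "__main__":
    # Demo on a temporary file that mirrors the two-host file from the issue.
    with tempfile.TemporaryDirectory() as tmp:
        host_file = Path(tmp) / "host"
        host_file.write_text("server1\nserver2\n")
        hosts = read_hosts(host_file)
        print(hosts)
        print(find_problems(hosts) or "no problems found")
```

Running it against the real host file on the launching machine only checks the file's shape; it does not verify that the names resolve or that SSH works.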
Environment info (Required)
Two Ubuntu 16.04 servers, each with one GPU.
## Error Message:
Traceback (most recent call last):
  File "store.py", line 3, in <module>
    store = kv.create('dist')
  File "/usr/local/lib/python3.5/dist-packages/mxnet/kvstore.py", line 674, in create
    ctypes.byref(handle)))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 251, in check_call
    raise MXNetError(py_str(LIB.MXGetLastError()))
mxnet.base.MXNetError: [16:33:33] src/van.cc:291: Check failed: (my_node.port) != (-1) bind failed
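The "bind failed" check suggests that the process could not bind a listening port on the address it was given. As a rough way to test that hypothesis outside MXNet, the sketch below tries to bind a plain TCP socket to a given address and port. It is an analogy using Python's standard `socket` module, not the code path MXNet itself uses, and `can_bind` is a hypothetical helper name.

```python
import socket


def can_bind(host, port):
    """Try to bind a TCP socket to (host, port); return (ok, detail)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        addr, bound_port = sock.getsockname()[:2]
        return True, "bound to {}:{}".format(addr, bound_port)
    except OSError as exc:
        # Typical causes: the address is not local to this machine,
        # or the port is already in use.
        return False, "{}: {}".format(type(exc).__name__, exc)
    finally:
        sock.close()


if __name__ == "__main__":
    # Port 0 asks the OS for any free port on the loopback interface.
    print(can_bind("127.0.0.1", 0))
```

Calling `can_bind` on each machine with the address and port that the launcher is expected to use would show whether the address belongs to that machine and whether the port is free.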
Minimum reproducible example
store.py code
from mxnet import kv, nd
store = kv.create('dist')
shape = (2, 3)
x = nd.random_uniform(shape=shape)
store.init('weight', x)
print('=== init "weight" ==={}'.format(x))
from mxnet import gpu, cpu
ctx = [gpu(0), cpu(0)]
y = [nd.zeros(shape, ctx=c) for c in ctx]
store.pull('weight', out=y)
print('=== pull "weight" to {} ===\n{}'.format(ctx, y))
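When `kv.create('dist')` fails, it can be useful to see which role and scheduler address each launched process actually received. The sketch below prints the `DMLC_*` environment variables that the distributed key-value store is commonly configured through. The variable list is an assumption about what the launcher sets in this setup, and `dmlc_env` is a hypothetical helper name.

```python
import os

# Variables commonly used to configure the distributed key-value store;
# assumed here, not confirmed by the issue.
DMLC_VARS = (
    "DMLC_ROLE",
    "DMLC_PS_ROOT_URI",
    "DMLC_PS_ROOT_PORT",
    "DMLC_NUM_SERVER",
    "DMLC_NUM_WORKER",
)


def dmlc_env(environ=None):
    """Return the DMLC_* settings from environ, marking missing ones."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, "<unset>") for name in DMLC_VARS}


if __name__ == "__main__":
    for name, value in dmlc_env().items():
        print("{}={}".format(name, value))
```

Placing the loop at the top of store.py, before `kv.create('dist')`, would print the settings on every process before the failing call.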
Steps to reproduce
(Paste the commands you ran that produced the error.)
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
For Q & A and discussion, please start a discussion thread at https://discuss.mxnet.io
Description
I am trying to run distributed training on two Ubuntu servers. Each has one GPU, though that is probably not the cause of the problem.
I installed mxnet-cu90 with pip, and I also cloned MXNet (https://github.com/apache/incubator-mxnet) into my home directory.
What have you tried to solve it?