errors during tdgl testing #69
Can you open up Python and just try to import cupy? import cupy |
The default value for In any case, you can just work with |
I was able to import cupy, but the same errors were generated. I also set SolverOptions.gpu = False at the prompt and then ran tdgl.testing.run(). Again, the same errors. |
Here is the tdgl.version_dict() output {'tdgl': '0.8.0; |
Here is a section of error output. ______________ test_source_drain_current[1-True-True-True-0-5.0] _______________ transport_device = Device(
tdgl/test/test_solve.py:106: tdgl/solver/solve.py:44: in solve self = <tdgl.finite_volume.operators.MeshOperators object at 0x7f53f5ddcb20>
E AssertionError tdgl/finite_volume/operators.py:289: AssertionError transport_device = Device(
tdgl/test/test_solve.py:106: |
Here is the warning summary tdgl/test/test_visualize.py::test_interactive[True-False-None] tdgl/test/test_visualize.py: 40 warnings -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html |
It looks like the problem is these lines: py-tdgl/tdgl/finite_volume/operators.py Lines 7 to 12 in b779261
I am guessing that from cupyx.scipy.sparse import csc_matrix, csr_matrix
from cupyx.scipy.sparse.linalg import factorized |
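The import check discussed here can be scripted so that each module is tried separately and the failing one is reported with its error message. This is only a minimal sketch and not part of tdgl: the helper name `check_imports` is hypothetical, and the module list is taken from the imports quoted above.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and record the outcome.

    Returns a dict mapping each module name to None on success,
    or to a short error string on failure.
    """
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # e.g. ImportError or a missing shared library
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Module names taken from the imports quoted in this thread.
    modules = ["cupy", "cupyx.scipy.sparse", "cupyx.scipy.sparse.linalg"]
    for name, error in check_imports(modules).items():
        print(f"{name}: {'OK' if error is None else error}")
```

Running this inside the activated environment shows which of the three imports fails, and the error text usually names the library that could not be loaded.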
We're reaching the limit of my shallow knowledge of Python. What do you mean by "...try the following?" Do you want me to edit the source code? |
I want to check if you can import the necessary modules related to
|
Ah ok, here you go.
|
Thanks! I think the problem may be that the location of the CUDA libraries is not in the
|
If the above works, then you can add the updated For example on my system, I would run (base) loganbvh ~ % conda activate tdgl
(tdgl) loganbvh ~ % echo $CONDA_PREFIX
/Users/loganbvh/miniforge3/envs/tdgl
(tdgl) loganbvh ~ % echo 'export LD_LIBRARY_PATH=/Users/loganbvh/miniforge3/envs/tdgl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc |
Hmmm... still the same errors. When I run printenv LD_LIBRARY_PATH I get /home/TJPhysics/miniconda3/envs/tdgl/lib. If I try to change the path to anything else I get a "bad substitution" response. |
I think the "bad substitution error" was because I wasn't using the CONDA_PREFIX variable. I can assign any path to the CONDA_PREFIX variable and then, export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}. Got it. However, I'm still not sure what path to assign it. What do you mean by "the directory containing the current conda environment"? How would I figure that out? |
Sorry I wasn't clear. $CONDA_PREFIX will be set automatically when you activate a conda environment. On my system it looks like this (base) loganbvh@Logans-MacBook-Pro-2 ~ % echo $CONDA_PREFIX
/Users/loganbvh/miniforge3
(base) loganbvh@Logans-MacBook-Pro-2 ~ % conda activate tdgl
(tdgl) loganbvh@Logans-MacBook-Pro-2 ~ % echo $CONDA_PREFIX
/Users/loganbvh/miniforge3/envs/tdgl Which linux distribution are you using? And what is the output of |
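To make the relationship between `$CONDA_PREFIX` and `LD_LIBRARY_PATH` concrete, here is a small sketch of how the new value is composed. The helper name `prepend_lib_dir` is hypothetical, and the script only prints an `export` line for you to run in your shell; it does not change the environment of the shell itself.

```python
import os


def prepend_lib_dir(conda_prefix, current_value=""):
    """Build an LD_LIBRARY_PATH value with <conda_prefix>/lib listed first.

    If that directory is already present, the existing entries are
    returned unchanged (apart from dropping empty entries).
    """
    lib_dir = conda_prefix.rstrip("/") + "/lib"
    entries = [p for p in current_value.split(":") if p]
    if lib_dir in entries:
        return ":".join(entries)
    return ":".join([lib_dir] + entries)


if __name__ == "__main__":
    # CONDA_PREFIX is set by conda when an environment is activated.
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("CONDA_PREFIX is not set; activate a conda environment first.")
    else:
        new_value = prepend_lib_dir(prefix, os.environ.get("LD_LIBRARY_PATH", ""))
        print(f'export LD_LIBRARY_PATH="{new_value}"')
```

With the `tdgl` environment active, the printed line should match the `${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}` form discussed above.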
Just to reiterate what I said above, you don't need a working installation of If you end up working with large |
Do you have CUDA Toolkit installed as listed in the requirements for cupy (https://docs.cupy.dev/en/stable/install.html#requirements)? If not, you can install it in your
|
loganbvh, answering your comments from top to bottom.
Since that produced problems I tried: conda install -c conda-forge -cupy However, I did set up conda as described at the beginning of the install instructions without any problems. If I repeat those commands, here is what I get: conda create --name tdgl python="3.10" If I say no (tdgl) tom@tom-Nitro-AN515-54: Error while loading conda entry point: conda-libmamba-solver (libarchive.so.19: cannot open shared object file: No such file or directory) |
I'm not sure why certain things got crossed out above... |
Thanks, and sorry for the barrage of questions. I am not a CUDA expert and I don't have a GPU in my personal computer (it's a MacBook), so debugging this sort of problem is difficult. To easily access a Linux machine with a GPU I have to use Google Colab. This is my understanding of the correct steps for installing
|
"Error while loading conda entry point: conda-libmamba-solver (libarchive.so.19: cannot open shared object file: No such file or directory)" Any thoughts as to what this might mean? This keeps coming up conda install cuda -c nvidia CondaValueError: You have chosen a non-default solver backend (libmamba) but it was not recognized. Choose one of: classic |
I have not seen this error before, but it looks like it is a known issue with conda (conda/conda-libmamba-solver#283) and it may be fixed by re-installing a library called
|
Same error as above with: conda install -n base libarchive -c main --force-reinstall |
It seems like your conda installer is just not working at all for some reason. Can you check whether you have CUDA installed at all by running "nvcc --version" in a terminal? If that command succeeds, it means CUDA is installed. In that case, verify that the CUDA version is compatible with the version of cupy you installed. If the command produces an error, it means CUDA is not installed. In that case, you can just download CUDA Toolkit directly from the Nvidia website: https://developer.nvidia.com/cuda-toolkit. This will do a global install on your system (I think), whereas installing CUDA with conda is required if you want to have different CUDA versions for different conda environments. |
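The `nvcc --version` check can also be done from Python, which makes it easy to compare the reported release against the cupy wheel you installed (for example, `cupy-cuda12x` targets CUDA 12.x). This is a rough sketch with hypothetical helper names. It assumes the nvcc output contains a phrase of the form "release X.Y", and it reports `None` when `nvcc` is not on your `PATH`, which can happen even if CUDA is installed.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract the 'X.Y' from a 'release X.Y' phrase in nvcc output, if any."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the CUDA release reported by nvcc, or None if nvcc is not found."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc was not found on PATH (or its output could not be parsed).")
    else:
        print(f"CUDA release reported by nvcc: {release}")
```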
Wow, nothing is working! Command 'nvcc' not found, but can be installed with: I tried installing directly from the NVIDIA website first. Everything went smoothly, but there was no change when I typed the nvcc --version command. So I tried sudo apt install nvidia-cuda-toolkit Reading package lists... Done The following packages have unmet dependencies: I appreciate you trying to figure this out. As you said, it's not really an issue for you since you don't have an NVIDIA graphics card. Has anybody made CUDA work for the tdgl program? |
If you installed CUDA directly from the NVIDIA website and the
If that works, you will want to add those lines to
This is not a problem with cupy or tdgl. I have successfully run tdgl with NVIDIA GPUs many times (just not on my personal/development laptop). The issue is just that you are missing a critical dependency of cupy (the CUDA Toolkit), which cannot be installed automatically by either cupy or tdgl. The way I would normally recommend installing the CUDA Toolkit is using conda (#69 (comment)), but for some reason conda is not working properly on your system so installing CUDA from the NVIDIA website is the other option. |
Updating the path was the trick to getting rid of those errors. Thanks for the help on that. Now I'm just getting warnings. I know I can run with the warnings, but I thought I'd check to see if there were any easy fixes. tdgl/test/test_visualization.py: 144 warnings tdgl/test/test_visualize.py::test_interactive[True-False-None] tdgl/test/test_visualize.py: 40 warnings -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html |
Great, thanks for letting me know. I will update the documentation with more detailed instructions on installing cupy and CUDA. I will also update the code to prevent those warnings and let you know when the new version is available (should be no more than a day or two). |
Would you be willing to field some basic questions about the Quickstart examples? I'm not sure if this is the venue. I think they are Python related more than physics. |
Sure, no problem |
I'm following your Quickstart guide, and I'm working at the command line in a Linux shell (not Colab). I'm entering your commands as given, and that seems to be working until I arrive at command number [8]: fig, ax = device.draw(), which I think is supposed to generate the picture immediately below it on the webpage, correct? If so, I don't generate anything. What is going on here? |
The most widely used Python graphing library is matplotlib and most of the functions needed for plotting exist under the One popular interface for Python, aside from the command line, is Jupyter. The Quickstart example in the documentation is one of these Jupyter notebooks: https://github.com/loganbvh/py-tdgl/blob/main/docs/notebooks/quickstart.ipynb. Jupyter is a dependency of If you have cloned the |
Great! That all worked. I will probably start using Jupyter. In the meantime, if I want to see the Mesh Stats [11]: device.mesh_stats(), is there a way to do that from the command line, or do I need to be in Jupyter? It looks like it is trying to generate some HTML. |
You can call |
I am running into an error at section 18 of the quickstart guide: When I enter this code if MAKE_ANIMATIONS: I get this error
RuntimeError Traceback (most recent call last)
Cell In[4], line 16, in make_video_from_solution(solution, quantities, fps, figsize)
File ~/.local/lib/python3.10/site-packages/matplotlib/animation.py:1285, in Animation.to_html5_video(self, embed_limit)
File ~/.local/lib/python3.10/site-packages/matplotlib/animation.py:148, in MovieWriterRegistry.__getitem__(self, name)
RuntimeError: Requested MovieWriter (ffmpeg) not available
I installed ffmpeg using pip. The error changed slightly. It says, "UserWarning: Animation was deleted without rendering anything. This is most likely not intended. To prevent deletion, assign the Animation to a variable, e.g. It still highlights the same sections of code, and says that ffmpeg is not available. |
I cut and pasted out of the Jupyter notebook, so I lost the indentation. Also, there are sections of the code highlighted, but I'm not sure how to communicate that. |
Installing the ffmpeg Python package from pip doesn't actually install the ffmpeg library (ref.). You need to install ffmpeg using your system package manager, e.g. The function

# Imports as used in the Quickstart notebook.
import h5py
import tdgl
from tdgl.visualization.animate import create_animation


def make_mp4_video_from_solution(
    solution,
    output_file="output.mp4",
    quantities=("order_parameter", "phase"),
    fps=20,
    figsize=(5, 4),
):
    """Saves an MP4 video from a tdgl.Solution."""
    # Use a non-interactive backend so no figure window is opened.
    with tdgl.non_gui_backend():
        # The HDF5 file must stay open until the animation has been saved.
        with h5py.File(solution.path, "r") as h5file:
            anim = create_animation(
                h5file,
                quantities=quantities,
                fps=fps,
                figure_kwargs=dict(figsize=figsize),
            )
            anim.save(output_file)

Then you can call the following and view the resulting MP4 file in whatever video application you'd like.

if MAKE_ANIMATIONS:
    make_mp4_video_from_solution(
        zero_field_solution,
        output_file="cell18.mp4",
        quantities=["order_parameter", "phase", "scalar_potential"],
        figsize=(6.5, 4),
    ) |
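Before rendering a video, it can help to confirm that the ffmpeg executable itself is visible on your `PATH`, since the pip package of the same name does not provide it. The following is a minimal sketch using only the standard library; the helper name `find_ffmpeg` is hypothetical and is not part of tdgl or matplotlib.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg executable not found on PATH; "
              "install it with your system package manager.")
    else:
        print(f"ffmpeg found at {path}")
```

If this reports that ffmpeg is missing, the "Requested MovieWriter (ffmpeg) not available" error above is the expected result.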
I started a PR with more detailed instructions in the documentation for installing CUDA: https://py-tdgl--70.org.readthedocs.build/en/70/installation.html#gpu-acceleration Want to take a look and let me know if you think there's anything that needs further clarification? |
All of that worked! Got through the Quickstart. Thanks for all your help in getting me started. |
I'm new to python and tdgl. After installing the software I ran the tdgl.testing program and it generated the following errors.
FAILED tdgl/test/test_solve.py::test_source_drain_current[1-True-True-True-0-5.0] - AssertionError
FAILED tdgl/test/test_solve.py::test_source_drain_current[1-True-True-True-0-] - AssertionError
FAILED tdgl/test/test_solve.py::test_source_drain_current[1-True-True-True-1-5.0] - AssertionError
FAILED tdgl/test/test_solve.py::test_source_drain_current[1-True-True-True-1-] - AssertionError
In scrolling through the code that is printed with the error I found this repeated every time.
E AssertionError
However, I did install cupy
python -m pip install -U setuptools pip
pip install cupy-cuda12x
pip install cuTENSOR.
but was unable to install the additional libraries like NCCL, etc
However, I'm not sure if that is really the problem. There are instructions to set tdgl.SolverOptions = True, but I'm not sure how to do that. I would assume it would default to the CPU if I didn't set it to TRUE?
Can anyone confirm that not installing those libraries is the problem? Anybody run into this problem and know how to fix it?