AttributeError: 'int' object has no attribute 'astype' #14

Open
bgbrink opened this issue Mar 26, 2018 · 10 comments

@bgbrink

bgbrink commented Mar 26, 2018

When I run GRAAL, I receive the following error:

Processing...
Description: convert dense file to COO sparse data.
Done.
start filtering
nfrags =  [95581]
n init frags =  [95581]
mean sparsity =  0.0021264316
std sparsity =  0.0022989216
max_sparsity =  0.059509736
cleaning : start
number of fragments to remove =  0
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "main_window.py", line 85, in run
    pyramid = pyr.build_and_filter(self.base_folder, self.size_pyramid, self.factor)
  File "/home/benedikt/Python/GRAAL/pyramid_sparse.py", line 69, in build_and_filter
    current_abs_fragments_contacts, pyramid_0)
  File "/home/benedikt/Python/GRAAL/pyramid_sparse.py", line 756, in remove_problematic_fragments
    p.render(pt ,'step %s\nProcessing...\nDescription: removing bad fragments.' % step)
  File "/home/benedikt/Python/GRAAL/progressbar.py", line 61, in render
    self.progress = (bar_width * percent.astype(np.int)) / 100
AttributeError: 'int' object has no attribute 'astype'

The input data has been generated using the HiC-Box (thanks again for your help there).

Most likely unrelated: the stdout is spammed with this message as well:

*** BUG ***
In pixman_region32_init_rect: Invalid rectangle passed
Set a breakpoint on '_pixman_log_error' to debug
@baudrly
Member

baudrly commented Mar 26, 2018

For some reason, 'percent' (which was an np.int32 object until a few commits ago) is now a plain Python int. Could you try replacing the offending line with:

self.progress = (bar_width * np.int32(percent)) / 100

and let me know if it fixes the issue? Thanks.
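As a rough sketch of why the original line fails and of a type-agnostic alternative: a plain Python int has no astype method, whereas int() accepts Python numbers and NumPy scalars alike. The function name progress_cells is hypothetical and not part of GRAAL; floor division is used here on the assumption that the bar needs a whole number of cells on both Python 2 and Python 3.

```python
def progress_cells(bar_width, percent):
    """Return how many cells of the bar should be filled.

    Hypothetical helper, not GRAAL's own code.
    """
    # int() works for plain ints, floats and NumPy scalars alike,
    # so the caller's exact numeric type no longer matters.
    percent = int(percent)
    # Floor division keeps the result an integer on Python 2 and 3.
    return (bar_width * percent) // 100


# A plain int has no .astype, which is what the traceback reports.
print(hasattr(5, "astype"))
print(progress_cells(20, 92.65692814897916))
```

This avoids np.int altogether, which may matter on newer setups since that alias was removed in NumPy 1.24.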

@bgbrink
Author

bgbrink commented Mar 26, 2018

Yup that fixed it, thanks. But now I have a new error:

Processing...
Description: removing bad fragments.
max new id =  95582
update contacts files...
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "main_window.py", line 85, in run
    pyramid = pyr.build_and_filter(self.base_folder, self.size_pyramid, self.factor)
  File "/home/benedikt/Python/GRAAL/pyramid_sparse.py", line 69, in build_and_filter
    current_abs_fragments_contacts, pyramid_0)
  File "/home/benedikt/Python/GRAAL/pyramid_sparse.py", line 818, in remove_problematic_fragments
    new_abs_id_frag_a = old_2_new_frags[fa + 1] # old_2_new_frags 1-based index
KeyError: -2

Also, could you explain somewhere what the parameter "Size of the pyramid" does? You only say which value to use for the respective examples, but you don't give the reasoning. Does this parameter depend on the genome size?

@baudrly
Member

baudrly commented Mar 26, 2018

> Yup that fixed it, thanks. But now I have a new error:

Could you post a couple lines from your abs_fragments_contacts_weighted.txt file? It should really just be a sparse matrix in edge list format, with the first two columns representing the source and target nodes (fragments) and the third one representing the edges (number of contacts). Is there anything out of the ordinary with that file?

> Also, could you explain somewhere what the parameter "Size of the pyramid" does?

It sets the size of your bins. The contact map is recursively sum-pooled for every level in the pyramid. The bigger the size, the larger the size of the bins GRAAL is going to work with.
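To illustrate the recursive sum-pooling described above, here is a minimal pure-Python sketch on a small dense matrix. The names sum_pool and build_pyramid and the pooling factor of 2 are assumptions made for illustration; GRAAL itself works on sparse data and may group fragments differently.

```python
def sum_pool(matrix, factor):
    """Sum every factor x factor block of a square matrix into one bin."""
    n = len(matrix)
    m = (n + factor - 1) // factor  # number of bins after pooling
    pooled = [[0] * m for _ in range(m)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            pooled[i // factor][j // factor] += value
    return pooled


def build_pyramid(matrix, n_levels, factor):
    """Return [level 0, level 1, ...], each one pooled from the previous."""
    levels = [matrix]
    for _ in range(n_levels):
        levels.append(sum_pool(levels[-1], factor))
    return levels


contacts = [
    [1, 2, 0, 0],
    [2, 1, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 1, 3],
]
for level, mat in enumerate(build_pyramid(contacts, n_levels=2, factor=2)):
    print(level, mat)
```

Each level has fewer, larger bins than the one below it, while the total number of contacts stays the same.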

> Does this parameter depend on the genome size?

Yes. You don't want the matrix to be too large, as it may take too long to converge (not to mention memory issues with your graphics card), but you don't want it to be too small either, as that limits the possible operations on your genome (and the opportunities for correction). From experience, on a GeForce GTX TITAN Z, I've found maps ranging from 1000 to 10000 bins to give pretty good results.
It can be hard to gauge the right level at first since the size of bins depends on the restriction map of the genome, but when in doubt I'd start with the highest level first and climb down as needed.
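One rough way to gauge the level is to estimate the number of bins at each pyramid level and see which ones fall in the 1000 to 10000 range. The sketch below assumes every level divides the bin count by a fixed factor (3 here, purely as an example) and ignores contig boundaries and the restriction map, so the real counts will differ; bins_per_level and levels_in_range are hypothetical helpers.

```python
def bins_per_level(n_frags, factor, n_levels):
    """Estimate bin counts per level, assuming a fixed pooling factor."""
    sizes = [n_frags]
    for _ in range(n_levels):
        # Ceiling division: a partially filled last bin still counts.
        sizes.append((sizes[-1] + factor - 1) // factor)
    return sizes


def levels_in_range(sizes, low=1000, high=10000):
    """Return the levels whose estimated bin count lies in [low, high]."""
    return [level for level, size in enumerate(sizes) if low <= size <= high]


# 95581 is the fragment count reported in the log earlier in this thread.
sizes = bins_per_level(95581, factor=3, n_levels=5)
print(sizes)
print(levels_in_range(sizes))
```

Under these assumptions, levels 3 and 4 would be the candidates for this dataset.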

@bgbrink
Author

bgbrink commented Mar 26, 2018

Thanks for the explanation, very helpful! You should consider adding it to the readme.md.

I had a quick look at the abs_fragments_contacts_weighted.txt and it looks fine to me:

id_frag_a	id_frag_b	n_contact
0	1	19
0	2	2
0	5	1
0	10	2
0	23	2
0	24	1
0	105	1
0	113	1
0	155	1
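Rather than eyeballing the first few lines, the whole file could be summarized with a short script. This is only a sketch: it assumes a tab-separated file with a single header row and integer ids in the first two columns, as in the excerpt above, and summarize_edge_list is a hypothetical helper.

```python
import csv
import io


def summarize_edge_list(handle):
    """Return the row count and the smallest and largest fragment id."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    n_rows = 0
    min_id = max_id = None
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        frag_a, frag_b = int(row[0]), int(row[1])
        n_rows += 1
        low, high = min(frag_a, frag_b), max(frag_a, frag_b)
        min_id = low if min_id is None else min(min_id, low)
        max_id = high if max_id is None else max(max_id, high)
    return {"rows": n_rows, "min_id": min_id, "max_id": max_id}


# In-memory sample using the first lines of the excerpt above.
sample = io.StringIO(
    u"id_frag_a\tid_frag_b\tn_contact\n"
    u"0\t1\t19\n"
    u"0\t2\t2\n"
    u"0\t5\t1\n"
)
print(summarize_edge_list(sample))
```

For the real file, pass open("abs_fragments_contacts_weighted.txt") instead of the sample. A negative min_id, or a max_id at or above the fragment count, would be worth reporting.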

@baudrly
Member

baudrly commented Mar 26, 2018

I'm not sure what's causing the KeyError. Are there any negative numbers lurking in your file? If not, could you send me the following:

  • abs_fragments_contacts_weighted.txt
  • fragments_list.txt
  • info_contigs.txt

so I can test and see what's wrong?

@bgbrink
Author

bgbrink commented Mar 26, 2018

I just ran grep -oP -- '-\d+' abs_fragments_contacts_weighted.txt and got no matches, so there shouldn't be any negative values. You can download the files here: https://www.dropbox.com/s/rwnil9as8h8v2v6/fragment_files.tar.gz?dl=1

@baudrly
Member

baudrly commented Mar 26, 2018

Alright, it runs fine on my machine, which suggests I may be using a more up-to-date version. I added a branch called 'develop'; could you try running it on your dataset?

@bgbrink
Author

bgbrink commented Mar 27, 2018

Edit: I realized I never sent you the genome.fasta, which I had previously been selecting as the Fasta file. Since you don't have it, it can't be necessary, so I tried without it (see below).
Could you clarify what should be selected under "Load Fasta File"?

@bgbrink
Author

bgbrink commented Mar 28, 2018

I looked a bit more into this this morning. I ran everything again from scratch without loading a Fasta File and this is what I got.

Master branch
Computation runs fine up until here:

here we go
92.65692814897916% ▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣□□ step 9000000
Processing...
Description: convert dense file to COO sparse data.
Done.
Start filling the pyramid
here we go
92.65692814897916% ▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣□□ step 9000000
Processing...
Description: loading sparse data into hdf5.
Done.
pyramid built.
here we go
92.65692814897916% ▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣□□ step 9000000
Processing...
Description: convert dense file to COO sparse data.
Done.
start filtering
nfrags =  [95581]
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "main_window.py", line 85, in run
    pyramid = pyr.build_and_filter(self.base_folder, self.size_pyramid, self.factor)
  File "/home/benedikt/Python/GRAAL/pyramid_sparse.py", line 69, in build_and_filter
    current_abs_fragments_contacts, pyramid_0)
  File "/home/benedikt/Python/GRAAL/pyramid_sparse.py", line 585, in remove_problematic_fragments
    sparse_mat_csr = sp.csr_matrix((np_2_scipy_sparse[2,:], np_2_scipy_sparse[0:2,:]), shape=(nfrags, nfrags))
  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 51, in __init__
    other = self.__class__(coo_matrix(arg1, shape=shape))
  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/coo.py", line 191, in __init__
    self._check()
  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/coo.py", line 241, in _check
    raise ValueError('negative row index found')
ValueError: negative row index found
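To narrow down where the negative index comes from, the row and column indices could be checked just before the sparse matrix is built. The sketch below is a hypothetical pure-Python helper (find_bad_indices is not part of GRAAL or SciPy) that lists every entry whose index is negative or not below the fragment count.

```python
def find_bad_indices(rows, cols, n_frags):
    """Return (position, row, col) for every out-of-range index pair."""
    bad = []
    for position, (r, c) in enumerate(zip(rows, cols)):
        # Valid indices for an n_frags x n_frags matrix are 0..n_frags-1.
        if not (0 <= r < n_frags and 0 <= c < n_frags):
            bad.append((position, r, c))
    return bad


# Made-up indices for illustration: the third entry has a negative row.
print(find_bad_indices([0, 1, -3], [1, 2, 0], n_frags=3))
```

The reported positions point back to the entries that would need inspecting in the input data.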

Develop Branch
I had to make the following changes in order to get this version to run:

  1. File "main_window.py", line 9, changed wxversion.select("2.8") to wxversion.select("3.0")
  2. File "pyramid_sparse.py", changed all occurrences of p = ProgressBar('green', width=20, block='▣', empty='□') to p = ProgressBar('green', width=20, block='|', empty='-') in order to avoid SyntaxError: Non-ASCII character
  3. Files "simulation_loader.py" and "cuda_lib_gl.py", changed import Image to from PIL import Image

However, the end result is exactly the same: computation runs fine until the ValueError.
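Regarding change 2: Python 2 treats source files as ASCII unless an encoding declaration appears on the first or second line, so declaring UTF-8 might be an alternative to swapping out the block characters. A minimal sketch, with render_bar as a hypothetical stand-in for the project's ProgressBar:

```python
# -*- coding: utf-8 -*-
# The line above tells Python 2 that this file is UTF-8 encoded.
# Python 3 already assumes UTF-8, so the declaration is harmless there.


def render_bar(percent, width=20, block=u"▣", empty=u"□"):
    """Return a text progress bar with `width` cells."""
    filled = (width * int(percent)) // 100
    filled = max(0, min(width, filled))  # clamp to the bar's width
    return block * filled + empty * (width - filled)


print(render_bar(92.65692814897916))
```

With the declaration in place, the original block characters can stay in the source, provided the file really is saved as UTF-8.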

@baudrly
Member

baudrly commented Apr 5, 2018

Hello, sorry for the delay. The fasta file should be the reference genome you mapped the reads onto. When I said 'it runs fine', I meant that it successfully computes the entire pyramid and stores it in memory. You still have to load the reference genome afterwards, though, since it will be used to generate a new fasta file from the 'building blocks' being swapped, flipped, merged, etc.
