Add very basic test program for the libmaus2/biobambam2 cram encoding interface #43

Open
gt1 wants to merge 2 commits into jkbonfield:master from gt1:bambam_interface_test
@gt1
Contributor

gt1 commented Apr 11, 2021

Hi James,

here is a suggestion for a very simple test program using the cram_bambam interface. So far it is quite peculiar in that it requires:

  1. the input file to have the SN, LN, M5 and UR attributes in each SQ header line
  2. all reference sequences to be present in the REF_CACHE directory via their M5 values (i.e. it does not deduce the sequences from some reference FastA file)
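
The first requirement can be checked before running the test program. The following is a minimal sketch, not part of io_lib: `check_sq_tags` is a hypothetical helper that reads SAM header text on standard input (how the header is extracted from the input file is left open here) and reports every @SQ line that lacks one of the four attributes.

```shell
# Hypothetical helper (not part of io_lib): check_sq_tags < header.sam
# Reports each @SQ line that lacks SN, LN, M5 or UR.
# Exit status is non-zero if any @SQ line is incomplete.
check_sq_tags() {
    awk -F'\t' '
        $1 == "@SQ" {
            # Collect the two-letter tag names present on this line.
            have = ""
            for (i = 2; i <= NF; i++) have = have " " substr($i, 1, 2)
            n = split("SN LN M5 UR", need, " ")
            for (j = 1; j <= n; j++)
                if (index(have " ", " " need[j] " ") == 0) {
                    printf "line %d: missing %s\n", NR, need[j]
                    bad = 1
                }
        }
        END { exit bad + 0 }
    '
}
```

The function prints nothing and returns success when every @SQ line carries all four attributes.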

It has no actual parallelism yet, but I have used it successfully to transcode a BAM file into CRAM, so it should be sufficient to check whether the interface is functional (apart from locking/threading issues).

Best wishes,
German

@jkbonfield
Owner

jkbonfield commented Apr 19, 2021

It builds, but there are no uses of it in the test harness.

I can run it on the existing test data if I recreate the tests/test.out/ce#sorted.full.cram file using a reference (the last test in the existing test harness is a referenceless encoding, but we could swap the entries around there so after that test it's reference based). It's easy enough to create an MD5 directory from ce.fa too and include that.
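
One way to build such an MD5 directory from a FASTA file is sketched below. It makes several assumptions: md5sum from GNU coreutils is available, each sequence is small enough to hold in a shell variable (reasonable for test data, not for whole genomes), and the checksum is taken over the uppercased sequence with whitespace removed, as the SAM specification describes for M5. `populate_ref_cache` is a hypothetical name, not an io_lib tool.

```shell
# Hypothetical helper (not part of io_lib): populate_ref_cache FASTA DIR
# Writes each sequence to DIR/<md5> and prints "<name>\t<md5>" per record.
populate_ref_cache() {
    fasta=$1
    dir=$2
    mkdir -p "$dir" || return 1
    # Emit one line per record: name, tab, uppercased sequence without whitespace.
    awk '
        /^>/ { if (name != "") print name "\t" seq
               name = substr($1, 2); seq = ""; next }
        { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
        END { if (name != "") print name "\t" seq }
    ' "$fasta" |
    while IFS="$(printf '\t')" read -r name seq; do
        # Assumption: md5sum prints "<hash>  -" when reading standard input.
        md5=$(printf '%s' "$seq" | md5sum | cut -d' ' -f1)
        printf '%s' "$seq" > "$dir/$md5"
        printf '%s\t%s\n' "$name" "$md5"
    done
}
```

A directory filled this way could then be referenced as `REF_CACHE=DIR/%s`, in the same form as the commands in this thread.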

How would you want it validated? Would a successful decode of the output file with scramble once more be enough? E.g.:

REF_CACHE=../tests/data/refs/%s scramble -r ../tests/data/ce.fa tests/test.out/ce#sorted.full.cram tests/test.out/bambam_in.cram 
REF_CACHE=../tests/data/refs/%s ./tests/cram_bambam_interface_test < tests/test.out/bambam_in.cram  > tests/test.out/bambam_out.cram
REF_CACHE=../tests/data/refs/%s scramble tests/test.out/bambam_out.cram > /dev/null
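
Because the test program resolves references only through REF_CACHE, a missing cache entry would make the second command fail for reasons unrelated to the interface. The sketch below checks this up front. It assumes the REF_CACHE pattern contains exactly one plain `%s` placeholder that stands for the full M5 value, as in the commands above; `missing_refs` is a hypothetical helper that reads SAM header text on standard input.

```shell
# Hypothetical helper (not part of io_lib): missing_refs PATTERN < header.sam
# Prints every M5 value from the @SQ lines that has no file in the cache.
# Exit status is non-zero if at least one reference is missing.
missing_refs() {
    pattern=$1
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if (substr($i, 1, 3) == "M5:") print substr($i, 4)
    }' |
    (
        status=0
        while read -r md5; do
            # Substitute the checksum for the single %s in the pattern.
            path="${pattern%%\%s*}${md5}${pattern#*\%s}"
            [ -f "$path" ] || { echo "$md5"; status=1; }
        done
        exit $status
    )
}
```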

I think I'd prefer it to use filenames rather than stdin / stdout, as the latter is liable to cause test failures on Windows unless we're careful (I haven't checked whether we are). That's easy to do by changing the argument to scram_open and the file pointer passed to fwrite. (A global variable is sufficient; it's just a test program after all.)

…ame instead of reading from standard input and writing to standard output

@gt1
Contributor Author

gt1 commented Apr 20, 2021

I have changed the program so it uses filenames of an input and an output file instead of stdin and stdout.

As for the validation method: I think another step of decoding the output again with scramble should be fine. Ideally it should check the actual data, but that could be difficult because of reformatting. Do you have anything in place for doing this? The order of the records should remain the same, so at least some fields could be checked (seqid, position, read name, etc.).

@jkbonfield
Owner

I'll have a think. I think I already rely on things like awk and md5sum in the test harness, but not 100% sure so I should check. If they're portable enough then it ought to be trivial to cull anything we think may be changeable, but I'd guess it ought to be pretty stable bar perhaps some header meta-data.
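
A field-level comparison along these lines could look like the sketch below. It assumes both files have already been decoded to SAM text (the decoding step is left open here) and that md5sum is available. It compares only read name, reference name and position, in record order, and ignores header lines, which are the part most likely to differ. `key_fields` and `same_records` are hypothetical names.

```shell
# Hypothetical helper: key_fields FILE.sam
# Prints read name, reference name and position for every alignment record.
key_fields() {
    # Header lines start with '@'; QNAME cannot, so this filter is safe.
    awk -F'\t' '!/^@/ { print $1 "\t" $3 "\t" $4 }' "$1"
}

# Hypothetical helper: same_records A.sam B.sam
# Succeeds only if both files carry the same key fields in the same order.
same_records() {
    a=$(key_fields "$1" | md5sum | cut -d' ' -f1) || return 2
    b=$(key_fields "$2" | md5sum | cut -d' ' -f1) || return 2
    [ "$a" = "$b" ]
}
```

Comparing checksums rather than the full text keeps the harness output short; swapping md5sum for a direct comparison of the two field lists would show which record differs.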

Thanks for working on this.

@jkbonfield
Owner

jkbonfield commented May 6, 2021

I added a call to this in the test harness in a local branch containing this PR and rebased on top of some updates to master. I just added these to the existing cram_io.test file as both tests are covering biobambam requirements.

https://github.com/jkbonfield/io_lib/blob/bambam_interface_test/tests/cram_io.test

It works on most platforms, but it fails on s390x. Is this a known issue? If so, is it the test that is incompatible, or the API in io_lib itself? We could potentially omit the test on a specific platform via a check of uname -m or similar.
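
A platform check of that kind might be sketched as follows. Note that `uname -m` reports the machine hardware name (such as s390x), whereas `uname -s` reports the kernel name (such as Linux), so the sketch uses `-m`. `should_skip_arch` is a hypothetical helper, and the place where the real test would be invoked is only marked by a comment.

```shell
# Hypothetical helper: should_skip_arch MACHINE
# Succeeds when MACHINE is on the list of platforms to skip.
should_skip_arch() {
    case "$1" in
        s390x) return 0 ;;
        *)     return 1 ;;
    esac
}

arch=$(uname -m)
if should_skip_arch "$arch"; then
    echo "skipping cram_bambam_interface_test on $arch"
else
    # The actual test invocation would go here.
    echo "running cram_bambam_interface_test on $arch"
fi
```

Skipping only hides the failure on that platform; it does not settle whether the fault lies in the test or in the library.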

https://travis-ci.org/github/jkbonfield/io_lib/jobs/769685530#L5560

*** Error in `/home/travis/build/jkbonfield/io_lib/tests/.libs/lt-cram_bambam_interface_test': malloc(): memory corruption: 0x000002aa01074bd0 ***

======= Backtrace: =========
/lib/s390x-linux-gnu/libc.so.6(+0x79262)[0x3ff81179262]
/lib/s390x-linux-gnu/libc.so.6(+0x7f3d6)[0x3ff8117f3d6]
/lib/s390x-linux-gnu/libc.so.6(+0x81ac0)[0x3ff81181ac0]
/lib/s390x-linux-gnu/libc.so.6(__libc_calloc+0x2a0)[0x3ff81184820]
/home/travis/build/jkbonfield/io_lib/io_lib/.libs/libstaden-read.so.14(cram_new_container+0x22)[0x3ff8146ec42]
/home/travis/build/jkbonfield/io_lib/io_lib/.libs/libstaden-read.so.14(cram_put_bam_seq+0x790)[0x3ff81459a20]
/home/travis/build/jkbonfield/io_lib/io_lib/.libs/libstaden-read.so.14(cram_process_work_package+0x142)[0x3ff8147950a]
/home/travis/build/jkbonfield/io_lib/tests/.libs/lt-cram_bambam_interface_test(+0x1bd6)[0x2aa00581bd6]
/home/travis/build/jkbonfield/io_lib/io_lib/.libs/libstaden-read.so.14(cram_enque_compression_block+0x10c)[0x3ff8147939c]
/home/travis/build/jkbonfield/io_lib/tests/.libs/lt-cram_bambam_interface_test(main+0x5ce)[0x2aa0058186e]
/lib/s390x-linux-gnu/libc.so.6(__libc_start_main+0x10e)[0x3ff81122ece]
/home/travis/build/jkbonfield/io_lib/tests/.libs/lt-cram_bambam_interface_test(+0x1a14)[0x2aa00581a14]
======= Memory map: ========
...
Aborted (core dumped)

I'm struggling to reproduce it on more easily accessible platforms. I've tried address sanitizer, thread sanitizer and valgrind, and all of them seem quite happy.

For what it's worth I'm retiring travis anyway soon, having now got some basic GitHub Actions set up, but I haven't as yet got other platforms done there so I may leave it for a little while yet.

@gt1
Contributor Author

gt1 commented May 6, 2021 via email
