-
Notifications
You must be signed in to change notification settings - Fork 450
Spring 2025 NEWS update. #1898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spring 2025 NEWS update. #1898
Conversation
NEWS
Outdated
* Fix a bug in breakend detection. It was incorrectly assuming that the ALT | ||
allele is of equal length to REF allele, but the VCF specification allows | ||
breakend insertions. | ||
(PR #1858, fixes bcftools#2317. Reported by Nicolai von K�gelgen). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ü
should be UTF-8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was just a copy from GitHub.
I added this as an introductory paragraph rather than a bulleted item given the significance. I also link to out web pages, but this needs updating further with modern data sets and to clarify the tooling so it's more than a set of benchmarks. This is something I'll work on separately.
NEWS
Outdated
Note this release changes the default output CRAM version from 3.0 to 3.1. | ||
HTSlib and SAMtools have been able to read CRAM 3.1 since version 1.12, | ||
however other tools may not yet be able to cope. We know Noodles reads CRAM | ||
3.1 and htsjdk has a draft implementation, but not yet released. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3.1 and htsjdk has a draft implementation, but not yet released. | |
3.1 and htsjdk has a draft implementation that has not yet been released. |
NEWS
Outdated
htscodec name tokeniser uses a nul between names there is no reason why | ||
another value could be used. This change lets CRAM recognise other separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
htscodec name tokeniser uses a nul between names there is no reason why | |
another value could be used. This change lets CRAM recognise other separator | |
htscodec name tokeniser uses a NUL between names there is no reason why | |
another value could not be used. This change lets CRAM recognise other separator |
NEWS
Outdated
invalid (by the spec) and previous handling was inconsistent. | ||
(PR #1882, fixes #1866) | ||
|
||
* Updated VCF code to work with VCF 4.4 prefixed phasing info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Updated VCF code to work with VCF 4.4 prefixed phasing info. | |
* Updated VCF code to work with VCF 4.4 prefixed phasing info. |
NEWS
Outdated
* Remove UR:file:// and UR:ftp:// from ref search path, plus REF_PATH to EBI. | ||
Removing EBI as the default fallback when REF_PATH not set prevents the | ||
unintended DDOS on EBI's servers. | ||
(PR#1881) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this deserves to be lifted to the top alongside the CRAM 3.1 stuff, and rewritten to be much more user-focussed — something like:
HTSlib no longer fetches CRAM reference data from EBI's server by default. Your organisation may wish to set up local infrastructure to supply reference sequences, e.g., using the new ref-cache tool included in this HTSlib release. You can use REF_PATH and/or REF_CACHE environment variables to etc etc. See https://www.htslib.org/doc/reference_seqs.html for details.
The bit about UR:file://
and UR:ftp://
is comparatively minor and separate and could stay in the Updates section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, like the CRAM 3.1 stuff, this should probably be copied over to the top of the samtools NEWS entry too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea.
…es. Plus some minor text improvements (thanks to @jmarshall).
NEWS
Outdated
HTSlib no longer fetches CRAM reference data from EBI's server by default. Your | ||
organisation may wish to set up local infrastructure to supply reference | ||
sequences, e.g., using the new ref-cache tool included in this HTSlib release. | ||
You can use REF_PATH and/or REF_CACHE environment variables to etc etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to elucidate the “etc etc” 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh for Heaven's Sake! I even read them to check your grammar. I'll just blame it all on you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just took the editorial decision to delete the etc etc sentence in the end. The following line points people to the external web site which covers it all anyway. The vast majority of users they aren't likely to notice a change (unless the vast majority of users put up with periodic wedges and network failures due to an inaccessible refget server).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put that draft sentence in the suggested paragraph text because it seemed useful to draw attention to what individual users can do (i.e., REF_PATH
/REF_CACHE
) to reinstate their reference data access, in addition to the rather more heavy-weight server solution that might usually be set up at organisation scale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that ordinary users will very much notice this change. For example, I've attempted the following without any environment variables set:
- Constructed a
foo.sam
file with a GRCh38 chr1 header and a read mapped to this chromosome:
@SQ SN:chr1 AN:1 LN:248956422 AS:38 M5:6aef897c3d6ff0c78aff06ac189178dd UR:file:///someone/elses/path/to/GRCh38.fa
read1 0 chr1 6000001 10 200M * 0 0 GCACCCTCTCTGCACCCTGTCTGCGCCGTCCGTGCCCCGTCTGCACCTTGTCTGTGCCCCGTCCACACCCTGCCTGCACTCTGTCTGTGCCCCATCTGGGCCCTCTGCACGCCCTACCTGTGCCCTGTCTGTCTAGATAGATCCTGCAGCGCCTCCCGCCTCTGCTCAGCCAGCCCCCCGCACCCCCTAGAGGGTATTGC bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
- Used
samtools-1.21 view -Co foo.cram foo.sam
to convert this file to CRAM. As a byproduct of this conversion, samtools-1.21 populated ~/.cache/hts-ref/6a/ef/897c… with a chr1 cache. - Used
samtools-1.21 view foo.cram
to view the file. Samtools-1.21 used this cache to view the file quickly. - However current develop samtools could not view the file:
$ samtools view foo.cram
[E::fai_build3_core] Failed to open the file ///someone/elses/path/to/GRCh38.fa : No such file or directory
[E::refs_load_fai] Failed to open reference file '///someone/elses/path/to/GRCh38.fa'
[W::cram_get_ref] Failed to populate reference "chr1"
[W::cram_get_ref] See https://www.htslib.org/doc/reference_seqs.html for further suggestions
…etc…
Even though that cache exists, current develop samtools does not see it by default. You can set REF_CACHE
to point to it and current develop samtools will happily display the file; however, I would think that most ordinary users will need every available pointer to become aware of these environment variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'll have a think about how better to phrase it. Note the web page documentation isn't just about the new ref-cache tool. It's there as a better description of how it all fits together, and is mentioned in error messages (as you saw). It's easier to point people to that than a man page, especially htslib users given essentially all library documentation is in samtools.1 man page (ie a totally different project).
My point about users not noticing is more about the main existing user base, but yes new users will (and already do) hit problems. My assertion there is based on the observation that the current EBI's ref-get server is sufficiently unreliable that anyone relying it on will quite rapidly hit a failure. So I'd imagine most who have been down that road have already found workarounds (all bar the one university which allegedly accounts for about half of the entire ref-get load!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason the default location for REF_CACHE
was removed is it was only automatically added when fetching from the EBI, and removing that means we no longer need to automatically cache either. Always setting REF_CACHE means other non-EBI downloads start to cache copies too, which would include any local proxies. We don't (necessarily) want two level proxies, although it can have some benefits in terms of partial file loads as the REF_CACHE is always a local file system and permits mmaps.
REF_CACHE was used in two ways - as somewhere to read from, and most importantly somewhere to write to for remote downloads. The whole point of REF_CACHE was it was caching, ie somewhere locally to write, and if you wanted read-only support then that's what REF_PATH is for. However I hadn't really considered the transition for people previously using a samtools that populated REF_CACHE from the EBI and are now using a samtools that doesn't populate. That was probably an error which should have been picked up, but sadly it sat for 4 months before being reviewed meaning I'd forgotten the nuances of the change too. We should have merged it promptly so we and others could be stress testing things, rather than delaying until days before release. :(
I'll consider reinstating this so having no REF_CACHE set defaults to searching in the standard location (whether via REF_PATH or REF_CACHE is irrelevant as it's a read-only thing and wouldn't actually change the environment). That will aid the transition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider if the default REF_CACHE
should be left so it works for pre-existing caches. On balance, I decided it was better for it to be set explicitly - otherwise you get a situation where samtools-1.22 onwards mysteriously do or do not work depending on if they find a pre-existing cache. Requiring the variable to be set makes the behaviour much more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the NEWS file again, but discussions on ongoing regarding the behaviour of a default REF_CACHE search path. (I was wrong to suggest it wasn't thought about. I'd forgotten about the details in my PR, but it was considered during merge.)
I merged it with the following line as we don't need to regurgitate the full details in NEWS, just point people to the authorative source and give them a hint on the key terms to search for.
|
||
* Convert U to T instead of U to N when sam_parsing. Though SAM format itself | ||
can contain U the BAM format cannot. | ||
(PR #1854, fixes samtools#2131 reported by James Furguson) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, didn't notice this previously.
(PR #1854, fixes samtools#2131 reported by James Furguson) | |
(PR #1854, fixes samtools#2131 reported by James Ferguson) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now PR #1919.
This is an update of the things that have happened since the last release. We will continue to update it until the next release.