|
1 |
| -2025-02-14 Tao Liu < [email protected]> |
| 1 | +2025-02-19 Tao Liu < [email protected]> |
2 | 2 | MACS 3.0.3
|
3 | 3 |
|
4 | 4 | * Features added
|
5 | 5 |
|
6 |
| - 1) FRAG format introduced. FRAG format is for fragment files |
7 |
| - defined by 10x genomics to store the alignments from single-cell |
8 |
| - ATAC-seq experiment. It can be regarded as the BEDPE format with |
9 |
| - two extra columns -- the barcode information and the counts of the |
10 |
| - fragments aligned to the same location with the same |
11 |
| - barcode. Currently, `callpeak` and `pileup` both support the new |
12 |
| - format. We will add support for other functions such as `hmmratac` |
13 |
| - in the future. |
14 |
| - |
15 |
| - We implemented the IO module for reading the fragment files in |
16 |
| - `Parser.FragParser`. And we then implemented a new |
17 |
| - `PairedEndTrack.PETrackII` to store the data in fragment file, |
18 |
| - including the barcodes and counts information. In the `PETrackII` |
19 |
| - class, we are able to extract a subset using a list of barcodes, |
20 |
| - which enables us to call peaks only on a pool (pseudo-bulk) of |
21 |
| - cells. |
22 |
| - |
23 |
| - 2) We extensively rewrote the `pyx` codes into `py` codes. In |
24 |
| - another words, we now apply the 'pure python style' with PEP-484 |
25 |
| - type annotations to our previous Cython style codes. So that, the |
26 |
| - source codes can be more compatible to Python programming tools |
27 |
| - such as `flake8`. During rewritting, we cleaned the source codes |
28 |
| - even more, and removed unnecessary dependencies during |
29 |
| - compilation. We will continue to do more code cleaning in the |
30 |
| - future. |
31 |
| - |
32 |
| - 3) We changed the behavior on the usage of 'blacklist' regions in |
33 |
| - `hmmratac`. We will remove the aligned fragments located in the |
34 |
| - 'blacklist' regions before the EM step to estimate fragment |
35 |
| - lengths distributions and HMM step to learn and predict nucleosome |
36 |
| - states. The reason is discussed in #680. To implement this |
37 |
| - feature, we added the `exclude` functions to PETrackI and |
38 |
| - PETrackII. |
| 6 | + 1) Now support FRAG format for single-cell ATAC-seq in `callpeak` |
| 7 | + and `pileup`. FRAG format is used by 10x Genomics to store |
| 8 | + alignments from the single-cell ATAC-seq pipeline |
| 9 | + `cellranger-atac` or the multiomics pipeline `cellranger-arc`. The |
| 10 | + format is essentially BEDPE with two additional columns: the |
| 11 | + barcode and the count of fragments aligned to the same location |
| 12 | + with the same barcode. Support for FRAG in other tools, such as
| 13 | + `hmmratac`, is coming soon.
| 14 | + |
| 15 | + If you specify `-f FRAG` as your input format: |
| 16 | + |
| 17 | + - You can use a barcode list for a subset of cells with |
| 18 | + `--barcodes`, then `callpeak` will identify peaks and `pileup` |
| 19 | + will build pileup track for the fragments of this subset of cells. |
| 20 | + |
| 21 | + - Duplicates will not be removed, as we assume all fragments
| 22 | + are valid. Optionally, `--max-count` can be used to set the
| 23 | + maximum count.
| 24 | + |
| 25 | + 2) We transitioned our `pyx` codes to `py` codes, adopting a 'pure |
| 26 | + Python style' with PEP-484 type annotations. This change has made |
| 27 | + our source codes more compatible with Python programming tools such
| 28 | + as `flake8`. During this process, we performed further code |
| 29 | + cleaning and eliminated unnecessary dependencies. We intend to |
| 30 | + continue improving our code quality in the future. |
| 31 | + |
| 32 | + 3) We have modified the handling of 'blacklist' regions in the |
| 33 | + `hmmratac` tool. This change impacts both the |
| 34 | + Expectation-Maximization (EM) step that estimates fragment length |
| 35 | + distributions, and the Hidden Markov Model (HMM) step that learns |
| 36 | + and predicts nucleosome states. We now exclude aligned fragments |
| 37 | + located in the 'blacklist' regions before both steps. We
| 38 | + implemented the `exclude` functions in both PETrackI and PETrackII |
| 39 | + to support this feature. For more detailed information and the |
| 40 | + reasoning behind it, refer to issue #680. |
| 41 | + |
| 42 | + 4) We have tested with NumPy >= 2. MACS3 can now be run with any
| 43 | + NumPy >= 1.25.
39 | 44 |
|
40 | 45 | * Bug fixed
|
41 | 46 |
|
42 |
| - 1) `hmmratac` option `--keep-duplicate` had opposite effect |
43 |
| - previously as indicated by the name and description. It has been |
44 |
| - renamed as `--remove-dup` to reflect the actual |
45 |
| - behavior. `hmmratac` will not remove duplicated fragments unless |
46 |
| - this option is set. |
| 47 | + 1) The `hmmratac` option `--keep-duplicate` previously had the
| 48 | + opposite effect of what its name and description |
| 49 | + suggested. Therefore, it was renamed to `--remove-dup` to more |
| 50 | + accurately describe the actual behavior. Duplicate fragments will |
| 51 | + not be removed by `hmmratac` unless this option is explicitly set |
| 52 | + up. |
47 | 53 |
|
48 |
| - 2) `hmmratac`: wrong class name used while saving digested signals |
49 |
| - in BedGraph files. Multiple other issues related to output |
50 |
| - filenames. #682 |
| 54 | + 2) `hmmratac`: wrong class name was used while saving digested |
| 55 | + signals in BedGraph files. Fixed multiple other issues related to |
| 56 | + output filenames. #682 |
51 | 57 |
|
52 | 58 | 3) Fix issues in big-endian system in `Parser.py` codes. Enable
|
53 | 59 | big-endian support in `BAM.py` codes for accessing certain
|
|
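The FRAG layout described in the new changelog entry (BEDPE plus a barcode column and a count column) can be illustrated with a short sketch. This is not the MACS3 parser: the function `parse_frag_lines`, the `Fragment` tuple, the tab-separated five-column order, and the choice to cap counts at `max_count` are all assumptions made for illustration, loosely mirroring what `--barcodes` and `--max-count` are described as doing.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Set


class Fragment(NamedTuple):
    """One FRAG record (hypothetical container, not a MACS3 class)."""
    chrom: str
    start: int
    end: int
    barcode: str
    count: int


def parse_frag_lines(
    lines: Iterable[str],
    barcodes: Optional[Set[str]] = None,
    max_count: Optional[int] = None,
) -> Iterator[Fragment]:
    """Yield fragments, optionally restricted to a barcode subset.

    Assumes tab-separated columns: chrom, start, end, barcode, count.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        chrom, start, end, barcode, count = line.split("\t")[:5]
        if barcodes is not None and barcode not in barcodes:
            continue  # keep only the requested pool of cells
        n = int(count)
        if max_count is not None:
            n = min(n, max_count)  # assumed meaning: cap the count
        yield Fragment(chrom, int(start), int(end), barcode, n)


# Made-up records for illustration only.
example = [
    "chr1\t100\t250\tAAAC-1\t3\n",
    "chr1\t300\t480\tTTTG-1\t1\n",
    "chr2\t50\t200\tAAAC-1\t7\n",
]
subset = list(parse_frag_lines(example, barcodes={"AAAC-1"}, max_count=5))
```

Here `subset` holds only the two fragments carrying barcode `AAAC-1`, which is the pseudo-bulk idea behind calling peaks on a pool of cells.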