Rmats prep PR #10128

Open
akaviaLab wants to merge 14 commits into nf-core:master from akaviaLab:rmats

akaviaLab (Contributor) commented Feb 23, 2026

PR checklist

This is a PR of rmats/prep.
RMATS processes BAM files from RNA-seq, identifies the splice junctions used, and detects differences between groups of samples.
RMATS is composed of 4 stages (which can be run together), but I'm planning to split it into separate steps:

  1. Prep which processes BAM files (this PR)
  2. Post, which processes the output of prep
  3. Dividing the post files into groups, based on the statistics
  4. Stats - comparing two groups. Since stats can only compare two groups, I will set up stage 3 to read contrasts and set up the groups.

I will then set up a workflow that does all 4.

Eventually, the workflow will close #8699. Please do not close the issue yet.

This PR creates the main.nf for the rmats.py prep command, and adds tests for several of the parameters that can be given to rmats prep.

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda

@mashehu mashehu marked this pull request as draft February 24, 2026 11:51
@akaviaLab akaviaLab marked this pull request as ready for review March 8, 2026 17:20
@akaviaLab akaviaLab changed the title from "Rmats work in progress PR" to "Rmats prep PR" Mar 8, 2026

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
Contributor

If I understand the comment on line 33 correctly, this will ensure that files are named with an appropriate default prefix, which can still be overridden:

Suggested change
- def prefix = task.ext.prefix ?: "${meta.id}"
+ def prefix = task.ext.prefix ?: "${meta.id}_prep"
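
For reference, a minimal sketch of how a pipeline could still override the suggested default through its config. The process name RMATS_PREP, the file location, and the "_custom" suffix are assumptions for illustration; none of them appear in this thread.

```groovy
// conf/modules.config (hypothetical location)
process {
    withName: 'RMATS_PREP' {   // assumed process name
        // A closure, so that meta is resolved per task at run time.
        // When set, this takes precedence over the "${meta.id}_prep" default in main.nf.
        ext.prefix = { "${meta.id}_custom" }
    }
}
```

With no ext.prefix set, the `?:` fallback in the script block applies and the outputs would be named after "${meta.id}_prep".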

// NOTES - post seems to need only the BAM *names*, not the actual files. Could we just get the first line of each file to get the names?
// for file in `ls multi_bam_rmats_prep_tmp/*.rmats`; do head -1 $file; done | tr '\n' ','
// NOTES - for stats, it should be possible to parse the formula using patsy, but if we include PAIRADISE we might have R - just do this in R, first pass
path reference_gtf
Contributor

This should have a meta.

Contributor Author

Do you mean it should be
tuple val(meta), path(reference_gtf)?

Also, should rmats_read_len have a meta? (one line below)

Contributor

meta2, but yes.
I wouldn't put a meta on the value channel.
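
A sketch of the input layout this thread converges on, written as a toy process rather than the module's actual main.nf. The process name, the first BAM input, and the echo script are assumptions for illustration only; reference_gtf and rmats_read_len are the names used in the PR.

```groovy
// Toy process showing the input declarations only (hypothetical name).
process RMATS_PREP_INPUT_SKETCH {
    input:
    tuple val(meta), path(bam)             // assumed first input; not shown in this thread
    tuple val(meta2), path(reference_gtf)  // reference file carries its own meta map (meta2)
    val rmats_read_len                     // plain value input, no meta attached

    output:
    stdout

    script:
    // Placeholder command: it only prints the resolved inputs.
    """
    echo "sample=${meta.id} gtf=${reference_gtf} read_len=${rmats_read_len}"
    """
}
```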

Comment on lines +9 to +11
params.novel_splice_site ? "--novelSS" : "",
(params.novel_splice_site && params.minimum_intron_length) ? "--mil ${params.minimum_intron_length}" : "",
(params.novel_splice_site && params.max_exon_length) ? "--mel ${params.max_exon_length}" : "",
Contributor

Don't think we should be using params here?

Contributor Author

But the documentation states that optional parameters should be given via ext.args, and the example shows it with params:
https://nf-co.re/docs/guidelines/components/modules#optional-command-arguments

How else can I put these optional params?

Contributor

But this is a config for the nf-test, not for the module itself.

Contributor

So params.novel_splice_site is not defined at all, and even if you put the module into a pipeline this config won't get used at all anyway, because it is just for nf-test.

Contributor Author

There was a long discussion on Slack with @jfy133 about the usage of config files and parameters (https://nfcore.slack.com/archives/C043FMKUNLB/p1768941551558009), and I thought I was doing what was discussed there.
I plan to have a modules.config for the rmats sub-workflow: a single file with 4 config sections (one for each module). For now, I added a module.config for this test, just to check that everything behaves as it should for this task.

@jfy133 - could you please clarify this question for me?
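
One possible way to address the concern about undefined params, sketched under assumptions: the test config hard-codes the flags in ext.args instead of reading params.*. The process name RMATS_PREP, the file path, and the numeric values are illustrative and not taken from this PR; only the --novelSS, --mil and --mel flags come from the lines under review.

```groovy
// tests/nextflow.config (hypothetical path), read only by nf-test
process {
    withName: 'RMATS_PREP' {   // assumed process name
        // Flags are written out literally, so no params.* has to be defined.
        // 50 and 500 are illustrative values for --mil and --mel.
        ext.args = '--novelSS --mil 50 --mel 500'
    }
}
```

A pipeline that uses the module would set its own ext.args in its modules.config, since this test config is not loaded outside nf-test.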

Comment on lines +12 to +13
// NOTES - post seems to need only the BAM *names*, not the actual files. Could we just get the first line of each file to get the names?
// for file in `ls multi_bam_rmats_prep_tmp/*.rmats`; do head -1 $file; done | tr '\n' ','
Contributor

If you only need the bam names, then you could pass along the ${prefix}.prep.b1.txt file as an output of this module potentially.

Contributor Author

Good suggestion, thank you.
I'm going to need to see how rmats post behaves to figure out the best way, but I'll keep it in mind.

Development

Successfully merging this pull request may close these issues.

new module: RMATS

3 participants