diff --git a/README.md b/README.md
index 47f3b05f58421ee5393c80c35303de10aeed22ed..255bdd69fce59c271800da3ff3c0328a04830127 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,13 @@
 
 A modularized version of the program [PopIns2](https://github.com/kehrlab/PopIns2) for population-scale detection of non-reference sequence variants.
 
-__Note: The recommended way to run popins4snake is via the Snakemake workflow [*PopinSnake*](https://gitlab.informatik.hu-berlin.de/fonda_a6/popinSnake).__
+
+*Popins4snake* is a program consisting of several functions.
+The functions are designed to be chained into a workflow, together with calls to standard bioinformatics programs (samtools, bwa, ...) and bash commands.
+
+__The recommended way of running *popins4snake* is using the Snakemake workflow [PopinSnake](https://gitlab.informatik.hu-berlin.de/fonda_a6/popinSnake).__
+
+The [PopinSnake README file](https://gitlab.informatik.hu-berlin.de/fonda_a6/popinSnake/-/blob/main/README.md) contains installation instructions for all dependencies of the PopinSnake workflow, including *popins4snake* itself.
 
 
 
@@ -11,6 +17,7 @@ __Note: The recommended way to run popins4snake is via the Snakemake workflow [*
 1. [Requirements](#requirements)
 1. [Installation](#installation)
 1. [Usage](#usage)
+1. [Summary of popins4snake functions](#summary-of-popins4snake-functions)
 1. [Help](#help)
 1. [References](#references)
 
@@ -82,95 +89,92 @@ The [PopIns2 Wiki](https://github.com/kehrlab/PopIns2/wiki/Troubleshooting---FAQ
 
 ## Usage
 
-*Popins4snake* is a program consisting of several functions.
-The functions are designed to be chained into a workflow together with calls to standard bioinformatics programs (samtools, bwa, ...) and bash commands.
-__The recommended way of running *popins4snake* is using the Snakemake workflow [PopinSnake](https://gitlab.informatik.hu-berlin.de/fonda_a6/popinSnake).__
-To display the help page of each of the *popins4snake* function, type `popins2 <command> --help` as shown in the [help section](#help).
+To get an overview of the functions offered in *popins4snake*, you can run `./popins4snake -h` after installation:
+```
+=====================================================================
+A modularized version of the program PopIns2
+    for population-scale detection of non-reference sequence variants
+=====================================================================
+
+SYNOPSIS
+    ./popins4snake COMMAND [OPTIONS]
 
-### The `crop-unmapped` command
+COMMAND
+    crop-unmapped       Extract unmapped and poorly aligned reads from a BAM file.
+    merge-bams          Merge two name-sorted BAM files of the same sample and set mate information of now paired reads.
+    merge-contigs       Merge sets of contigs into supercontigs using a colored compacted de Bruijn Graph.
+    find-locations      Find insertion locations of (super-)contigs per sample.
+    merge-locations     Merge insertion locations from all samples into one file.
+    place-refalign      Find positions of (super-)contigs by aligning contig ends to the reference genome.
+    place-splitalign    Find positions of (super-)contigs by split-read alignment (per sample).
+    place-finish        Combine (super-)contig positions found by split-read alignment from all samples.
+    genotype            Determine genotypes of all insertions in a sample.
+
+VERSION
+    0.1.0-a52d4f5, Date: 2022-08-25 14:42:31
+
+Try `./popins4snake COMMAND --help' for more information on each command.
 ```
-popins2 crop-unmapped [OPTIONS] sample.bam
+
+
+
+## Summary of *popins4snake* functions
+
+To display the help page of each of the *popins4snake* functions, type `./popins4snake <command> --help`.
+
+
+### The `crop-unmapped` function
+```
+popins4snake crop-unmapped [OPTIONS] sample.bam
 ```
 The crop-unmapped command identifies reads without high-quality alignment to the reference genome.
-The reads given in the input BAM file must be indexed, i.e. the file `sample.bam.bai` is expected to exist.
+The input BAM file must be indexed, i.e. the file `sample.bam.bai` is expected to exist.
-### The `merge-bams` command
+### The `merge-bams` function
 ```
-popins2 merge-bams [OPTIONS] input1.bam input2.bam
+popins4snake merge-bams [OPTIONS] input1.bam input2.bam
 ```
-### The `merge-contigs` command
+### The `merge-contigs` function
 ```
-popins2 merge-contigs [OPTIONS] {-s|-r} /path/to/sample_directories/
+popins4snake merge-contigs [OPTIONS] {-s|-r} /path/to/sample_directories/
 ```
 \[Default\] The merge command builds a colored and compacted de Bruijn Graph (ccdbg) of all contigs of all samples in a given source directory _DIR_.
 By default, the merge module finds all files of the pattern `<DIR>/*/assembly_final.contigs.fa`.
 To process the contigs of the [assemble command](#the-assemble-command) the __-r__ input parameter is recommended.
 Once the ccdbg is built, the merge module identifies paths in the graph and returns _supercontigs_.
 ```
-popins2 merge [OPTIONS] -y input.gfa -z input.bfg_colors
+popins4snake merge-contigs [OPTIONS] -y input.gfa -z input.bfg_colors
 ```
 An alternative way of providing input for the merge command is to directly pass a ccdbg.
-Here, the merge command expects a _GFA_ file and a _bfg_colors_ file, which is specific to the Bifrost.
+Here, the merge command expects a _GFA_ file and a _bfg_colors_ file, which is specific to Bifrost.
 If you choose to run the merge command with a _pre_-built GFA graph, mind that you have to set the Algorithm options accordingly (in particular __-k__).
-### The `find-locations` command
+### The `find-locations` function
 ```
-popins2 find-locations [OPTIONS] SAMPLE_ID
+popins4snake find-locations [OPTIONS] SAMPLE_ID
 ```
-### The `merge-locations` command
+### The `merge-locations` function
 ```
-popins2 merge-locations [OPTIONS]
+popins4snake merge-locations [OPTIONS]
 ```
-### The `place` commands
+### The `place` functions
 ```
-popins2 place-refalign [OPTIONS]
-popins2 place-splitalign [OPTIONS] SAMPLE_ID
-popins2 place-finish [OPTIONS]
+popins4snake place-refalign [OPTIONS]
+popins4snake place-splitalign [OPTIONS] SAMPLE_ID
+popins4snake place-finish [OPTIONS]
 ```
-In brief, the place commands attempt to anker the supercontigs to the samples.
-At first, all potential anker locations from all samples are collected.
+In brief, the place commands attempt to anchor the supercontigs to the samples.
+At first, all potential anchor locations from all samples are collected.
 Then prefixes/suffixes of the supercontigs are aligned to all collected locations.
-For successful alignments records are written to a VCF file.
+For successful alignments, records are written to a VCF file.
 In the second step, all remaining locations are split-aligned per sample.
 Finally, all locations from all successful split-alignments are combined and added to the VCF file.
-### The `genotype` command
+### The `genotype` function
 ```
-popins2 genotype [OPTIONS] SAMPLE_ID
+popins4snake genotype [OPTIONS] SAMPLE_ID
 ```
 The genotype command generates alleles (ALT) of the supercontigs with some flanking reference genome sequence.
 Then, the reads of a sample are aligned to ALT and the reference genome around the breakpoint (REF).
 The ratio of alignments to ALT and REF determines a genotype quality and a final genotype prediction per variant per sample.
-
-
-
-## Help
-
-```
-$ popins4snake -h
-
-=====================================================================
-A modularized version of the program PopIns2
-    for population-scale detection of non-reference sequence variants
-=====================================================================
-
-SYNOPSIS
-    ../popins4snake/popins4snake COMMAND [OPTIONS]
-
-COMMAND
-    crop-unmapped       Extract unmapped and poorly aligned reads from a BAM file.
-    merge-bams          Merge two name-sorted BAM files of the same sample and set mate information of now paired reads.
-    merge-contigs       Merge sets of contigs into supercontigs using a colored compacted de Bruijn Graph.
-    find-locations      Find insertion locations of (super-)contigs per sample.
-    merge-locations     Merge insertion locations from all samples into one file.
-    place-refalign      Find positions of (super-)contigs by aligning contig ends to the reference genome.
-    place-splitalign    Find positions of (super-)contigs by split-read alignment (per sample).
-    place-finish        Combine (super-)contig positions found by split-read alignment from all samples.
-    genotype            Determine genotypes of all insertions in a sample.
-
-VERSION
-    0.1.0-a52d4f5, Date: 2022-08-25 14:42:31
-
-Try `../popins4snake/popins4snake COMMAND --help' for more information on each command.
-```