Commit 006b9184 authored by Birte Kehr

More edits to README - command descriptions still incomplete.

parent 366363ae
......@@ -2,7 +2,13 @@
A modularized version of the program [PopIns2](https://github.com/kehrlab/PopIns2) for population-scale detection of non-reference sequence variants.
__Note: The recommended way to run popins4snake is via the Snakemake workflow [*PopinSnake*](https://gitlab.informatik.hu-berlin.de/fonda_a6/popinSnake).__
*Popins4snake* is a program consisting of several functions.
The functions are designed to be chained into a workflow, together with calls to standard bioinformatics programs (samtools, bwa, ...) and bash commands.
__The recommended way of running *popins4snake* is using the Snakemake workflow [PopinSnake](https://gitlab.informatik.hu-berlin.de/fonda_a6/popinSnake).__
You can find installation instructions for all dependencies of the PopinSnake workflow, including instructions for installing popins4snake, in the [PopinSnake README file](https://gitlab.informatik.hu-berlin.de/fonda_a6/popinSnake/-/blob/main/README.md).
......@@ -11,6 +17,7 @@ __Note: The recommended way to run popins4snake is via the Snakemake workflow [*
1. [Requirements](#requirements)
1. [Installation](#installation)
1. [Usage](#usage)
1. [Summary of popins4snake functions](#summary-of-popins4snake-functions)
1. [Help](#help)
1. [References](#references)
......@@ -82,95 +89,92 @@ The [PopIns2 Wiki](https://github.com/kehrlab/PopIns2/wiki/Troubleshooting---FAQ
## Usage
*Popins4snake* is a program consisting of several functions.
The functions are designed to be chained into a workflow together with calls to standard bioinformatics programs (samtools, bwa, ...) and bash commands.
__The recommended way of running *popins4snake* is using the Snakemake workflow [PopinSnake](https://gitlab.informatik.hu-berlin.de/fonda_a6/popinSnake).__
To display the help page of each of the *popins4snake* function, type `popins2 <command> --help` as shown in the [help section](#help).
To get an overview of the functions offered in *popins4snake*, you can run `./popins4snake -h` after installation:
```
=====================================================================
A modularized version of the program PopIns2
for population-scale detection of non-reference sequence variants
=====================================================================
SYNOPSIS
./popins4snake COMMAND [OPTIONS]
### The `crop-unmapped` command
COMMAND
crop-unmapped Extract unmapped and poorly aligned reads from a BAM file.
merge-bams Merge two name-sorted BAM files of the same sample and set mate information of now paired reads.
merge-contigs Merge sets of contigs into supercontigs using a colored compacted de Bruijn Graph.
find-locations Find insertion locations of (super-)contigs per sample.
merge-locations Merge insertion locations from all samples into one file.
place-refalign Find positions of (super-)contigs by aligning contig ends to the reference genome.
place-splitalign Find positions of (super-)contigs by split-read alignment (per sample).
place-finish Combine (super-)contig positions found by split-read alignment from all samples.
genotype Determine genotypes of all insertions in a sample.
VERSION
0.1.0-a52d4f5, Date: 2022-08-25 14:42:31
Try `../popins4snake/popins4snake COMMAND --help' for more information on each command.
```
popins2 crop-unmapped [OPTIONS] sample.bam
## Summary of *popins4snake* functions
To display the help page of each of the *popins4snake* functions, type `./popins4snake <command> --help`.
### The `crop-unmapped` function
```
popins4snake crop-unmapped [OPTIONS] sample.bam
```
The `crop-unmapped` command identifies reads without a high-quality alignment to the reference genome. The input BAM file must be indexed, i.e. the file `sample.bam.bai` is expected to exist.
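As a minimal sketch of the index requirement described above, the following Python helper checks that the `.bai` file exists before building the command line. The helper names (`bam_index_path`, `crop_unmapped_command`) are hypothetical and not part of *popins4snake*; no options are passed because they are not documented here.

```python
from pathlib import Path


def bam_index_path(bam):
    """Return the index path expected next to a BAM file (sample.bam -> sample.bam.bai)."""
    bam = Path(bam)
    return bam.with_name(bam.name + ".bai")


def crop_unmapped_command(bam, executable="popins4snake"):
    """Build the crop-unmapped command line; refuse to proceed if the index is missing.

    Hypothetical helper: it only assembles the argument list and does not run anything.
    """
    index = bam_index_path(bam)
    if not index.exists():
        raise FileNotFoundError(
            f"missing BAM index: {index} (create it with 'samtools index {bam}')"
        )
    return [executable, "crop-unmapped", str(bam)]
```

The returned list can be handed to `subprocess.run` once the options required by your setup have been added.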
### The `merge-bams` command
### The `merge-bams` function
```
popins2 merge-bams [OPTIONS] input1.bam input2.bam
popins4snake merge-bams [OPTIONS] input1.bam input2.bam
```
### The `merge-contigs` command
### The `merge-contigs` function
```
popins2 merge-contigs [OPTIONS] {-s|-r} /path/to/sample_directories/
popins4snake merge-contigs [OPTIONS] {-s|-r} /path/to/sample_directories/
```
\[Default\] The merge command builds a colored and compacted de Bruijn Graph (ccdbg) of all contigs of all samples in a given source directory _DIR_.
By default, the merge module finds all files of the pattern `<DIR>/*/assembly_final.contigs.fa`. To process the contigs of the [assemble command](#the-assemble-command) the __-r__ input parameter is recommended. Once the ccdbg is built, the merge module identifies paths in the graph and returns _supercontigs_.
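To preview which contig files the default pattern `<DIR>/*/assembly_final.contigs.fa` would match, a short Python sketch can list them before running the merge. The function name `find_contig_files` is hypothetical; the sketch only mirrors the pattern stated above and makes no claim about how the merge module itself searches the directory.

```python
from pathlib import Path


def find_contig_files(source_dir, filename="assembly_final.contigs.fa"):
    """List files matching <source_dir>/*/<filename>, i.e. exactly one directory level down."""
    return sorted(Path(source_dir).glob(f"*/{filename}"))
```

Files nested more deeply than one sample directory, or files with a different name, are not matched by this pattern.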
```
popins2 merge [OPTIONS] -y input.gfa -z input.bfg_colors
popins4snake merge-contigs [OPTIONS] -y input.gfa -z input.bfg_colors
```
An alternative way of providing input for the merge command is to pass a ccdbg directly. Here, the merge command expects a _GFA_ file and a _bfg_colors_ file, which is specific to Bifrost. If you choose to run the merge command with a _pre_-built GFA graph, make sure to set the Algorithm options accordingly (in particular __-k__).
### The `find-locations` command
### The `find-locations` function
```
popins2 find-locations [OPTIONS] SAMPLE_ID
popins4snake find-locations [OPTIONS] SAMPLE_ID
```
### The `merge-locations` command
### The `merge-locations` function
```
popins2 merge-locations [OPTIONS]
popins4snake merge-locations [OPTIONS]
```
### The `place` commands
### The `place` function
```
popins2 place-refalign [OPTIONS]
popins2 place-splitalign [OPTIONS] SAMPLE_ID
popins2 place-finish [OPTIONS]
popins4snake place-refalign [OPTIONS]
popins4snake place-splitalign [OPTIONS] SAMPLE_ID
popins4snake place-finish [OPTIONS]
```
In brief, the place commands attempt to anchor the supercontigs to the samples. At first, all potential anchor locations from all samples are collected. Then prefixes/suffixes of the supercontigs are aligned to all collected locations. For successful alignments, records are written to a VCF file. In the second step, all remaining locations are split-aligned per sample. Finally, all locations from all successful split-alignments are combined and added to the VCF file.
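The order of the three place steps can be sketched as a list of command lines: `place-refalign` once, `place-splitalign` once per sample, then `place-finish` once. The function `place_commands` is a hypothetical helper, and all `[OPTIONS]` are left out because they depend on your setup.

```python
def place_commands(sample_ids, executable="popins4snake"):
    """Return the place command lines in the order described above (options omitted)."""
    commands = [[executable, "place-refalign"]]  # align contig ends to the reference, once
    commands += [[executable, "place-splitalign", sample] for sample in sample_ids]  # per sample
    commands.append([executable, "place-finish"])  # combine split-alignment results, once
    return commands
```

In a workflow manager, the per-sample `place-splitalign` calls are independent of each other and could run in parallel between the first and the last step.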
### The `genotype` command
### The `genotype` function
```
popins2 genotype [OPTIONS] SAMPLE_ID
popins4snake genotype [OPTIONS] SAMPLE_ID
```
The genotype command generates alleles (ALT) of the supercontigs with some flanking reference genome sequence. Then, the reads of a sample are aligned to ALT and the reference genome around the breakpoint (REF). The ratio of alignments to ALT and REF determines a genotype quality and a final genotype prediction per variant per sample.
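To illustrate how counts of ALT and REF alignments can translate into a genotype call and a quality value, here is a toy binomial model in Python. This is an assumption made for illustration only and is not the model implemented in *popins4snake*; the function name `toy_genotype`, the `error_rate` parameter, and the quality definition are all hypothetical.

```python
import math


def toy_genotype(alt_count, ref_count, error_rate=0.01):
    """Pick the most likely genotype from ALT/REF read counts under a toy binomial model.

    Illustration only: NOT the genotyping model used by popins4snake.
    """
    # Expected fraction of ALT-supporting reads under each genotype.
    expected_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    # The binomial coefficient is identical for all genotypes, so it is omitted.
    log10_likelihood = {
        genotype: alt_count * math.log10(p) + ref_count * math.log10(1.0 - p)
        for genotype, p in expected_alt.items()
    }
    ranked = sorted(log10_likelihood, key=log10_likelihood.get, reverse=True)
    best, second = ranked[0], ranked[1]
    # Phred-like quality: scaled log-likelihood gap between the two best genotypes.
    quality = 10.0 * (log10_likelihood[best] - log10_likelihood[second])
    return best, quality
```

With mostly REF alignments the call tends towards `0/0`, with a balanced mix towards `0/1`, and with mostly ALT alignments towards `1/1`; the quality grows with the number of supporting reads.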
## Help
```
$ popins4snake -h
=====================================================================
A modularized version of the program PopIns2
for population-scale detection of non-reference sequence variants
=====================================================================
SYNOPSIS
../popins4snake/popins4snake COMMAND [OPTIONS]
COMMAND
crop-unmapped Extract unmapped and poorly aligned reads from a BAM file.
merge-bams Merge two name-sorted BAM files of the same sample and set mate information of now paired reads.
merge-contigs Merge sets of contigs into supercontigs using a colored compacted de Bruijn Graph.
find-locations Find insertion locations of (super-)contigs per sample.
merge-locations Merge insertion locations from all samples into one file.
place-refalign Find positions of (super-)contigs by aligning contig ends to the reference genome.
place-splitalign Find positions of (super-)contigs by split-read alignment (per sample).
place-finish Combine (super-)contig positions found by split-read alignment from all samples.
genotype Determine genotypes of all insertions in a sample.
VERSION
0.1.0-a52d4f5, Date: 2022-08-25 14:42:31
Try `../popins4snake/popins4snake COMMAND --help' for more information on each command.
```