From c73480d7cc179acbbbc477256586fce1a69fe0e3 Mon Sep 17 00:00:00 2001
From: Kedi Cao <kedi.cao@klinik.uni-regensburg.de>
Date: Thu, 20 Feb 2025 12:28:19 +0100
Subject: [PATCH] Update README.md

---
 README.md | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 9d4839f..52b4285 100644
--- a/README.md
+++ b/README.md
@@ -117,28 +117,28 @@ Below we provide information on detailed explaination on each function comparing
 popins4snake crop-unmapped [OPTIONS] sample.bam
 ```
 The crop-unmapped command identifies reads without high-quality alignment to the reference genome. The reads given in the input BAM file must be indexed, i.e. the file `sample.bam.bai` is expected to exist.
-Originally part of the `assemble` function from PopIns and PopIns2, now an independent function for the workflow, the unmapped reads will be sorted by samtools, get filtered by read quality through SICKLE, and assembled.
+Originally part of the `assemble` function from PopIns and PopIns2, now an independent function for the workflow, the unmapped reads will be sorted by samtools, get filtered by read quality through [SICKLE](https://github.com/najoshi/sickle), and assembled.
 crop-unmapped now provide its own quality filtering method by adding `--min-qual` and `--min-read-len`, but SICKLE can still be called directly from the workflow.
 **Workflow Configuration**\
-Through the config file, user can select their desired quality filtering method and choose their preferred assembler: MINIA or VELVET following the quality filering step.
+Through the config file, user can select their desired quality filtering method and choose their preferred assembler: [MINIA](https://github.com/GATB/minia) or [VELVET](https://github.com/dzerbino/velvet) following the quality filering step.
 ### The `merge-contigs` function
 ```
 popins4snake merge-contigs [OPTIONS]
 ```
-The merge-contig command builds a colored and compacted de Bruijn Graph (ccdbg) of all contigs of all samples in a given source directory. For general usage see [PopIns2](https://github.com/kehrlab/PopIns2) merge function.
+The merge-contig command builds a colored and compacted de Bruijn Graph (ccdbg) of all contigs of all samples in a given source directory. For general usage see [PopIns2 merge function](https://github.com/kehrlab/PopIns2?tab=readme-ov-file#the-merge-command).
 **Workflow Configuration**\
-As in the snakemake workflow, user can set the k-mer size for customized Algorithm options. The function also supports multi-threading for running on a cluster, setup the number of threads in cluster_config.yaml.
+As in the snakemake workflow, user can set the k-mer size for customized Algorithm options. The function also supports multi-threading for running on a cluster, setup the number of threads in `cluster_config.yaml`.
 ### The `merge-bams` function
 ```
 popins4snake merge-bams [OPTIONS] input1.bam input2.bam
 ```
-As part of the contigmap function in PopIns and PopIns2, now used in contigmap module in the workflow, `merge-bams` merges the mapped and sorted files from `BWA` and `SAMtools` in the contigmap module. This process anchors both ends of each read pair, ensuring that pairs with one end aligned to the reference genome and the other end aligned to the supercontigs are brought together.
+As part of the contigmap function in PopIns and PopIns2, now used in contigmap module in the workflow, `merge-bams` merges the mapped and sorted files from [BWA](https://github.com/lh3/bwa) and [SAMtools](https://github.com/samtools/samtools) in the contigmap module. This process anchors both ends of each read pair, ensuring that pairs with one end aligned to the reference genome and the other end aligned to the supercontigs are brought together.
 ### Functions for `position` module
 The functions below, including `find-locations`,`merge-locations` and `place` functions, are part of the position module in the workflow. Since the workflow now supports optional contamination removal, some intermediate files have changed based on the config conditions. Therefor these functions were adjusted to take files with removed contaminations and aligned to alternative references during the cleaning steps.
@@ -162,12 +162,15 @@ popins4snake place-finish [OPTIONS]
 ```
 In brief, the place commands attempt to anker the supercontigs to the samples. At first, all potential anker locations from all samples are collected. Then prefixes/suffixes of the supercontigs are aligned to all collected locations. For successful alignments records are written to a VCF file. In the second step, all remaining locations are split-aligned per sample. Finally, all locations from all successful split-alignments are combined and added to the VCF file.
+**Workflow Configuration**\
+As used in the workflow, user can set the value for `--readlength` parameter for place-refalign and place-spitlign from the `snake_config.yaml`.
+
 ### The `genotype` function
 ```
 popins4snake genotype [OPTIONS] SAMPLE_ID
 ```
 The genotype command generates alleles (ALT) of the supercontigs with some flanking reference genome sequence. Then, the reads of a sample are aligned to ALT and the reference genome around the breakpoint (REF). The ratio of alignments to ALT and REF determines a genotype quality and a final genotype prediction per variant per sample.
-Combined with bcftools sort and merge functions, these steps completed the genotype module of the workflow.
+Combined with [BCFtools](https://github.com/samtools/bcftools) sort and merge functions, these steps completed the genotype module of the workflow.
-- 
GitLab