Consensus Calling - Ridom SeqSphere+ Documentation

The SeqSphere+ consensus caller is used to calculate a consensus

for read alignments resulting of SKESA assemblies that were remapped with BWA
for read alignments resulting of SPAdes assemblies that were remapped with BWA
for read alignments created by reference mapping with BWA
for correcting the consensus of Velvet assemblies

Consensus Base

Get winner base:
- Get for each strand the winner base with the highest sum of base qualities.
- If gap appears more often in the reads than any of the other bases, a gap is returned.
- If the two strand winner bases are not equal, mix both strands together, and get again the winner base with the highest sum of base qualities.
Ambiguity heck:
- If sum of base quality of the winner base for both strands reaches the read support threshold (default: 60%) related to the sum of all base qualities, the winner base is taken as potential consensus base.
- Else the ambiguity N is called as potential consensus base.
Cpverage check:
- If the coverage strands is below the coverage threshold (default: 0) , an N is called.

For remapping alignments of SPAdes assemblies and for reference mappings that have an estimated average genome coverage below 50, the coverage threshold is set to 5. Else no threshold is set, so minimum coverage is required. Both thresholds can be can be configured in the short-read assembler settings of a pipeline script.

Consensus Base Quality

The consensus base qualities are caluclated from the two best base qualities for each strand (based on a MIRA-like process):

For each strand, the best quality is taken.
Then 10% of the next best base quality is added.
The total of both strand qualities gives the final quality, with cap 90.
Consensus gaps have always quality 0.

Example:

forward:
    A 30   -> 30     \
    A 20   ->  2     /  -> 32  \
    A 20                        \
    A 20                         \
                                   ->  60     
reverse:                         /
    A 26   -> 26     \          /
    A 20   ->  2     /  -> 28  /
    A 15

Uncovered Reference Genome Positions in BAM Files

If the reference genome has regions that are uncovered in the genome under study, those gaps are deleted in the resulting consensus sequence and juxtapositional covered regions are concatenated. This rule applies also for FASTA files that are exported.

Filtering Reads in BAM Files

If one of the following flags are found, the reads are skipped when calling the consensus, and are not imported into samples:

Read marked as unmapped
Read mapping quality is below 10 (configurable)
Read has not a single best hit (i.e., X0 flag is not equal to 1)
Read fails vendor quality check
Read marked as duplicate (however, marking duplicates is not performed within SeqSphere+)
Read alignment start is 0, length is 0 or name is *

Contents

Consensus Base

Consensus Base Quality

Uncovered Reference Genome Positions in BAM Files

Filtering Reads in BAM Files