Supported Data Files

Oxford Nanopore long-read data can be processed as FASTA files or assembled using the ONT Data Assembly Module (beta). PacBio long-read data can be processed as FASTA files only.

Optionally, topology and coverage information can be extracted from the contigs of a FASTA file if it is stated in the FASTA header line of a contig as follows:

  • If the line contains topology=circular, the contig is treated as a complete circular plasmid or chromosome. This term is used to define circular plasmids for Chromosome and Plasmids Overview Task Template processing. Knowing whether a contig is circular might improve the MOB-suite plasmid reconstruction process.
  • If the line contains a term like coverage=123, this value is used as the coverage of the contig. Contigs with coverage=0 are excluded from the coverage calculation. Every contig in the FASTA file must have coverage information; otherwise no average assembled coverage is calculated. Knowing the average coverage helps with QC.
  • If the line contains [no-recon], the Chromosome and Plasmids Overview Task Template processing skips the plasmid reconstruction step that uses the tool MOB-recon, and the contig is treated as a complete plasmid (but not marked as circular).
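
The coverage rule above can be sketched as follows. This is only an illustration, not the software's actual implementation: the function name average_coverage is hypothetical, and the use of a plain (unweighted) mean is an assumption, since the text does not say whether the average is weighted by contig length.

```python
from typing import Optional, Sequence


def average_coverage(coverages: Sequence[Optional[float]]) -> Optional[float]:
    """Return the average coverage, or None if it cannot be calculated.

    `coverages` holds one entry per contig; None means the contig header
    carried no coverage=... term.
    """
    # Rule from the text: every contig must carry coverage information.
    if not coverages or any(c is None for c in coverages):
        return None
    # Rule from the text: contigs with coverage=0 are excluded.
    used = [c for c in coverages if c != 0]
    if not used:
        return None
    # Assumption: plain mean; the real calculation may be length-weighted.
    return sum(used) / len(used)


print(average_coverage([29, 31, 0]))   # the zero-coverage contig is ignored
print(average_coverage([29, None]))    # one contig lacks coverage information
```

With this rule, a single contig without a coverage term is enough to suppress the average for the whole FASTA file, whereas a contig with coverage=0 is simply left out.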

This also allows NCBI-compliant naming of the contigs, e.g., for a circular chromosome:
>contig1_1710375900 [topology=circular][completeness=complete][chromosome];5261576;coverage=29
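
As an illustration of how such a header line could be read, here is a minimal Python sketch. It is not the parser used by the software: the names ContigInfo and parse_header are hypothetical, and the matching details (case-sensitive substring search, numeric coverage values only) are assumptions.

```python
import re
from typing import NamedTuple, Optional


class ContigInfo(NamedTuple):
    name: str
    circular: bool
    coverage: Optional[float]
    no_recon: bool


# Assumption: the coverage value is a non-negative integer or decimal number.
_COVERAGE = re.compile(r"coverage=(\d+(?:\.\d+)?)")


def parse_header(header: str) -> ContigInfo:
    """Extract the optional terms from one FASTA header line."""
    text = header.lstrip(">").strip()
    # The contig name is taken as everything up to the first whitespace.
    name = text.split()[0] if text else ""
    match = _COVERAGE.search(text)
    return ContigInfo(
        name=name,
        circular="topology=circular" in text,
        coverage=float(match.group(1)) if match else None,
        no_recon="[no-recon]" in text,
    )


header = (">contig1_1710375900 [topology=circular][completeness=complete]"
          "[chromosome];5261576;coverage=29")
print(parse_header(header))
```

For the example header, the sketch reports a circular contig named contig1_1710375900 with a coverage of 29 and no [no-recon] flag.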

Using a tool like Circlator in the pipeline to fix the start position (and orientation) of contigs helps tremendously with downstream visualization and comparison of chromosomes and plasmids. For chromosomes, Circlator's fixstart function uses matches to the dnaA gene by default. To define the start and orientation of most plasmids, the CGE PlasmidFinder replicon database that is used for rep-typing could be utilized.

Importing Run Info

PacBio run information can be imported.