Interactive Import of Epi Metadata
Epidemiological metadata can be entered per sample in the sample overview panel or imported from an MS Excel or CSV file. Most of these data fields are modeled on the Global Microbial Identifier (GMI)/NCBI minimum epidemiological data requirements (these fields are used, e.g., by FDA GenomeTrakr and CDC). These requirements focus on the three classical epidemiological dimensions: place, time, and ‘person’ information. Unlike the procedure details and statistics fields, the epidemiological data fields are extensible: in addition to the default epidemiological data fields, a SeqSphere+ user can create any number of additional epidemiological data fields by defining a customized database schema for storing epidemiological data.
Automatic Import of Metadata
Metadata can be imported automatically from files when using the pipeline. Either CSV or SPEC files can be used for import.

CSV-files
Comma-separated values (CSV) files can be used to import metadata in a pipeline. Either a comma (,) or a semicolon (;) can be used as the separator. The first row of a CSV file is used as the header. The columns can be named with the field names listed in the section Epi Metadata Fields below. In addition, if the label of a sample field is unique within a database schema, this label can also be used in the header row to identify the sample field. Tags can be set by using the prefix tg. followed by the tag name in the header, e.g., tg.MyTag. The value in a tag column indicates whether the tag is set for the sample in that row: use "yes", "true", or "1" to set the tag. Dates can be given as yyyy-MM-dd.

One CSV-file for multiple pipeline samples
A CSV file named metadata.csv can be added to the contig or read-file directory. This file can contain data for multiple samples. The first row is used as the header. The first header column named sample id (case-insensitive) is used as the column that contains the sample IDs. Each data row is checked for whether this column contains the sample ID of the sample currently being processed. If it does, the 'Epi Metadata Fields' and 'Procedure Details and Statistics Fields' of that sample are filled with the data from this row. If multiple rows match the sample ID, they are processed from top to bottom.

Example of a CSV file for multiple pipeline samples:

    sample id,Strain,pf.assembler,pf.assembler_version,tg.T2,tg.T3
    DE9622,strain 1,MIRA,3,yes,no
    DE9686,strain 2,MIRA,3,no,yes

In this example, tag T2 is set for DE9622 and tag T3 is set for DE9686. If both are present, values from the CSV file for multiple pipeline samples overwrite values from CSV files for single samples.
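The matching rules for metadata.csv (separator choice, the case-insensitive sample id column, top-to-bottom processing of matching rows, and "yes"/"true"/"1" tag values) can be illustrated with a short sketch. This is not SeqSphere+ code: the function name metadata_for_sample, the separator heuristic, and the choice that a later matching row overwrites an earlier one are assumptions made for illustration.

```python
import csv
import io

# Cell values documented as setting a tag.
TAG_TRUE_VALUES = {"yes", "true", "1"}


def metadata_for_sample(csv_text: str, sample_id: str):
    """Collect fields and tags for one sample from a multi-sample metadata.csv.

    Returns (fields, tags): fields maps header name -> cell text,
    tags maps tag name -> bool. Illustrative sketch only (hypothetical helper).
    """
    # Heuristic separator guess: ',' or ';', whichever occurs more often in the
    # header line (quoted header names containing commas can fool this).
    header_line = csv_text.splitlines()[0]
    sep = ";" if header_line.count(";") > header_line.count(",") else ","
    reader = csv.reader(io.StringIO(csv_text), delimiter=sep)
    header = [h.strip() for h in next(reader)]

    # First header column named "sample id", compared case-insensitively.
    id_col = next((i for i, h in enumerate(header) if h.lower() == "sample id"), None)
    if id_col is None:
        raise ValueError("no 'sample id' column in header")

    fields, tags = {}, {}
    # Rows are processed top to bottom; in this sketch a later matching row
    # overwrites values set by an earlier one (an assumption).
    for row in reader:
        if len(row) <= id_col or row[id_col].strip() != sample_id:
            continue
        for i, (name, value) in enumerate(zip(header, row)):
            if i == id_col:
                continue
            if name.startswith("tg."):
                tags[name[3:]] = value.strip().lower() in TAG_TRUE_VALUES
            else:
                fields[name] = value.strip()
    return fields, tags
```

Run on the example file for DE9622, the sketch returns the Strain, pf.assembler and pf.assembler_version values, with T2 set and T3 not set. Values from single-sample CSV files, which these values would overwrite, are not handled here.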
CSV-files for single samples
A CSV file for a single sample must have the same name as the input sequence file (e.g., FASTA or FASTQ) or as the sample ID, but with the file name extension ".csv". It must be placed in the contig or read-file directory. The first row is used as the header and the second row contains the data; all other rows are ignored.

Example of a CSV file for a single sample:

    ef.Characteristic.genus,ef.Characteristic.species,Strain,tg.T2,tg.T3
    Escherichia,coli,strain 3,yes,yes

In this example, tags T2 and T3 are both set for the sample.

SPEC-files
SPEC files can also be used in SeqSphere+ to export and import metadata. The SPEC file for a sample must have the same name as the input sequence file (e.g., FASTA or FASTQ) but with the file name extension ".spec". If a specific file naming scheme is used in a pipeline, the SPEC file may also be named after the sample ID. Additionally, a single SPEC file can be defined for all sequence files in its directory by naming it "sequence_specification.spec". If multiple SPEC files are found for a sample, they are merged. A SPEC file is plain text (UTF-8) in which each line holds a single field-value pair in the format field=value (e.g., pf.avg._coverage_(assembled)=111). The fields may be in any order. The fields listed below can be set in a SPEC file and will be imported as metadata. Dates can be given as yyyy-MM-dd.
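As a rough illustration of the two single-sample formats, the sketch below reads a single-sample CSV (header row plus first data row, further rows ignored) and parses and merges SPEC content (one field=value pair per line). The function names are hypothetical, and the handling of blank or malformed lines and the "later file wins" merge order are assumptions; the documentation only states that multiple SPEC files are merged.

```python
import csv
import io


def parse_single_sample_csv(csv_text: str, sep: str = ",") -> dict[str, str]:
    """Header row + first data row of a single-sample CSV; later rows are ignored."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=sep))
    header, data = rows[0], rows[1]
    return {name.strip(): value.strip() for name, value in zip(header, data)}


def parse_spec(spec_text: str) -> dict[str, str]:
    """Parse SPEC content: one field=value pair per line, in any order."""
    values = {}
    for line in spec_text.splitlines():
        # Split on the first '=' only, so values may themselves contain '='.
        field, sep, value = line.partition("=")
        if not sep or not field.strip():
            continue  # assumption: blank or malformed lines are skipped
        values[field.strip()] = value.strip()
    return values


def merge_specs(*spec_texts: str) -> dict[str, str]:
    """Merge several SPEC files found for one sample.

    Assumption: on conflicting fields, a later file wins; the documentation
    only says the files are merged, not which one takes precedence.
    """
    merged: dict[str, str] = {}
    for text in spec_texts:
        merged.update(parse_spec(text))
    return merged
```

Since SPEC files are UTF-8 plain text, a file would be read with, for example, Path("sample.spec").read_text(encoding="utf-8") (the file name is hypothetical) before being passed to parse_spec.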
Epi Metadata Fields
The following names can be used to specify epi metadata fields in CSV or SPEC files:
ef.Sample.alias_id
ef.Sample.isolationDate
ef.Sample.receiptDate
ef.Sample.sample_id_of_collector
ef.Sample.sender
ef.Sample.comment
ef.Sample.modifiedDate
ef.Sample.createdDate
ef.Sample.submittedDate
ef.Sample.downloaded_from
ef.Sample.submitted_to
ef.Source.source_type
ef.Source.source_subtype
ef.Source.host
ef.Source.host_age
ef.Source.host_sex
ef.Source.host_disease
ef.Source.isolation_source
ef.Source.isolation_country
ef.Source.isolation_state
ef.Source.isolation_city
ef.Source.isolation_zip
ef.Source.isolation_lat_long
ef.Source.lat_long_resolution
ef.Source.cluster_outbreak
ef.Source.epi_info
ef.Source.case_id
ef.Source.ecdc_case_id
ef.Characteristic.genus
ef.Characteristic.species
ef.Characteristic.subspecies
ef.Characteristic.strain
ef.Characteristic.genotype
ef.Characteristic.serotype
ef.Characteristic.pathotype
ef.Characteristic.identification_method
ef.Characteristic.identification_kit_vendor
ef.Characteristic.culture_collection
ef.Characteristic.pubmed_id
ef.Characteristic.study
ef.Characteristic.ncbi_accession
ef.Characteristic.experiment_accession
ef.Characteristic.sample_accession
ef.Characteristic.study_accession
ef.Report.report_comment

Procedure Details and Statistics Fields
pf.library_source
pf.library_strategy
pf.sequencing_protocol
pf.sequencing_vendor
pf.assembly_pre-processing
pf.assembly_type
pf.assembler
pf.assembler_version
pf.assembler_parameters
pf.assembly_post-processing
pf.expected_genome_size_for_downsampling
pf.downsampled_to_coverage
pf.top_species_match
pf.top_species_match_identity
pf.top_species_match_shared-hashes
pf.contamination_check_result
pf.fastqc_per_base_sequence_quality_(forward_reads)
pf.fastqc_per_base_sequence_quality_(reverse_reads)
pf.fastqc_adapter_content
pf.avg._coverage_(unassembled)
pf.avg._coverage_(processed,_unassembled)
pf.avg._read_length_(unassembled)
pf.avg._read_length_(processed,_unassembled)
pf.read_count_(unassembled)
pf.read_count_(processed,_unassembled)
pf.read_base_count_(unassembled)
pf.read_base_count_(processed,_unassembled)
pf.contig_count_(assembled)
pf.n50_(assembled)
pf.read_count_(assembled)
pf.read_fwd_count_(assembled)
pf.read_rev_count_(assembled)
pf.consensus_base_count_(assembled)
pf.approximated_genome_size_(mbases)
pf.max_contig_length_(assembled)
pf.min_contig_length_(assembled)
pf.avg._contig_length_(assembled)
pf.avg._coverage_(assembled)
pf.read_base_count_(assembled)

Paths to Raw Read Files Fields
fl.reads.1
fl.reads.2
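Because the field names above share a small set of prefixes, a header row can be checked before a pipeline run. The sketch below is a hypothetical pre-flight check, not part of SeqSphere+: it classifies a header name by prefix (ef., pf., fl., tg.), treats any other name as a field label, and flags date values that do not follow yyyy-MM-dd. Treating exactly the ef. names ending in "Date" as date fields is an assumption.

```python
from datetime import datetime

# Prefixes used by the field names listed above.
PREFIX_KINDS = {
    "ef.": "epi metadata field",
    "pf.": "procedure details and statistics field",
    "fl.": "raw read file path",
    "tg.": "tag",
}


def classify_header(name: str) -> str:
    """Say what kind of field a CSV header name (or SPEC field name) refers to."""
    name = name.strip()
    if name.lower() == "sample id":
        return "sample id column"
    for prefix, kind in PREFIX_KINDS.items():
        if name.startswith(prefix):
            return kind
    # Anything else would have to be a field label that is unique in the schema.
    return "label"


def is_iso_date(value: str) -> bool:
    """True if value parses with the yyyy-MM-dd pattern (%Y-%m-%d)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def date_problems(header: list[str], row: list[str]) -> list[str]:
    """Report ef. fields whose name ends in 'Date' but whose value is not yyyy-MM-dd.

    Assumption: the date fields are exactly the ef. names ending in 'Date'
    (isolationDate, receiptDate, ...); empty cells are not checked.
    """
    problems = []
    for name, value in zip(header, row):
        name, value = name.strip(), value.strip()
        if name.startswith("ef.") and name.endswith("Date") and value and not is_iso_date(value):
            problems.append(f"{name}: expected yyyy-MM-dd, got {value!r}")
    return problems
```

A name classified as "label" only works if that label is unique within the database schema, so such columns are worth a manual look.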