Description
Summary of phylogenomic methods and analyses used in "Immunogenicity of convalescent and vaccinated sera against clinical isolates of ancestral SARS-CoV-2, Beta, Delta, and Omicron variants"
Methods
Raw reads underwent adapter/quality trimming (trim-galore v0.6.5 [citation: https://github.com/FelixKrueger/TrimGalore]), host filtering and read mapping to the reference (bwa v0.7.17 [citation: arXiv:1303.3997v2], samtools v1.7 [citation: 10.1093/bioinformatics/btp352]), trimming of primers (iVar v1.3 [citation: 10.1186/s13059-018-1618-7]), and variant/consensus calling (freebayes v1.3.2 [citation: arXiv:1207.3907]) using the SIGNAL workflow (https://github.com/jaleezyy/covid-19-signal) v1.4.4dev (#60dd466) [citation: doi.org/10.3390/v12080895] with the ARTICv4 amplicon scheme (from https://github.com/artic-network/artic-ncov2019) and the MN908947.3 SARS-CoV-2 reference genome and annotations. Additional quality control and variant effect annotation (SnpEff v5.0-0 [citation: 10.4161/fly.19695]) were performed using ncov-tools v1.8.0 (https://github.com/jts/ncov-tools/). Finally, PANGO lineages were assigned to consensus sequences using pangolin v3.1.17 (with the PangoLEARN v2021-12-06 models) [citation: 10.1093/ve/veab064], scorpio v0.3.16 (with constellations v0.1.1) [citation: https://github.com/cov-lineages/scorpio], and PANGO-designations v1.2.117 [citation: 10.1038/s41564-020-0770-5]. Variants were summarised using PyVCF v0.6.8 [citation: https://github.com/jamescasbon/PyVCF] and pandas v1.2.4 [citation: 10.25080/Majora-92bf1922-00a]. Phylogenetic analysis was performed using augur v13.1.0 [citation: 10.21105/joss.02906] with IQTree (v2.2.0beta) [citation: 10.1093/molbev/msaa015], and the resulting phylogenetic figure was generated using ETE v3.1.2 [citation: 10.1093/molbev/msw046]. Contextual sequences were incorporated into the phylogenetic analysis by using Nextstrain's ingested GISAID metadata and pandas to randomly sample a representative subset of sequences (jointly deposited in NCBI and GISAID) that belonged to lineages observed in Canada (see sequences_used_in_tree_with_acknowledgements.tsv for metadata and acknowledgements).
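The contextual subsampling step can be illustrated with a minimal sketch. It is not the code from sk_variant_summary.ipynb: the authors used pandas, whereas this version uses only the Python standard library so that it runs without dependencies. The column names (strain, pango_lineage, genbank_accession), the toy rows, the lineage set, and the per-lineage sample size are all assumptions made for illustration.

```python
import csv
import io
import random
from collections import defaultdict


def subsample_by_lineage(rows, lineages, per_lineage, seed=0):
    """Randomly pick up to `per_lineage` records for each lineage of interest.

    Only records with a non-empty NCBI accession are kept, as a stand-in
    for "jointly deposited in NCBI and GISAID" (an assumed criterion).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_lineage = defaultdict(list)
    for row in rows:
        if row["pango_lineage"] in lineages and row["genbank_accession"]:
            by_lineage[row["pango_lineage"]].append(row)

    sampled = []
    for lineage in sorted(by_lineage):
        group = by_lineage[lineage]
        sampled.extend(rng.sample(group, min(per_lineage, len(group))))
    return sampled


# Toy metadata with hypothetical strain names and placeholder accessions.
TOY_METADATA = "\n".join([
    "strain\tpango_lineage\tgenbank_accession",
    "toy/1\tAY.25\tACC0001",
    "toy/2\tAY.25\tACC0002",
    "toy/3\tAY.25\tACC0003",
    "toy/4\tAY.25\t",          # no NCBI accession, so it is excluded
    "toy/5\tBA.1\tACC0005",
    "toy/6\tB.1.1.7\tACC0006",  # lineage not requested, so it is excluded
])

rows = list(csv.DictReader(io.StringIO(TOY_METADATA), delimiter="\t"))
subset = subsample_by_lineage(rows, {"AY.25", "BA.1"}, per_lineage=2, seed=1)

for record in subset:
    print(record["strain"], record["pango_lineage"], record["genbank_accession"])
```

Sampling within each lineage, rather than across the whole table, keeps rare lineages represented in the tree; the real analysis may have used different strata or sample sizes.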
File Description
- 20220101_MN01513_WGS114_DEC31SRI_CK_summary_valid_negative_pass_only.tsv: QC summary generated by ncov-tools
- sk_variant_summary.ipynb: notebook containing code to summarise variants (tables/variant_percentage_read_support_protein_nonsynonymous_only.tsv and graphic figures/intermediate/spike_mutation_table_styled.png) and to subsample representative genomes (phylogeny/seqs/open_context_genomes.fasta) from GISAID (Nextstrain-ingested fasta and metadata from 2021-12-31: metadata_2021-12-31_17-29.tsv.gz and sequences_fasta_2022_01_03.tar.xz)
- genomes/: consensus sequences generated by FreeBayes via SIGNAL
- variants/: SIGNAL FreeBayes VCFs annotated with SnpEff by ncov-tools
- phylogeny: data used to generate the annotated phylogeny with augur
- phylogeny/tree.sh: script used to generate the phylogeny
- phylogeny/seqs: sequences used for the phylogeny
- phylogeny/data: reference data for the phylogeny
- phylogeny/augur: phylogeny and intermediate files
- phylogeny/viz_tree.py: ete3-based script to generate the phylogeny figure (tree.svg)
- figure: files for generating the result plot
- figure/phylo_variant_figure.*: final figure combining tree.svg and spike_mutation_table_styled.png
- figure/intermediate/tree.svg: rendered SVG of the phylogeny
- figure/intermediate/spike_mutation_table_styled.png: rendered summary of variants
- tables: set of tables for the manuscript
- tables/sequences_used_in_tree_with_acknowledgements.tsv: ncov-ingest metadata with acknowledgements
- tables/variant_percentage_read_support_protein_nonsynonymous_only.tsv: summary of variants
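
As a rough illustration of how a table of read support for nonsynonymous variants could be derived from the annotated VCFs in variants/, the sketch below parses VCF text with the standard library instead of PyVCF. It is not taken from sk_variant_summary.ipynb. It assumes biallelic records, FreeBayes-style AO (alternate observations) and DP (depth) INFO keys, and a SnpEff ANN field; the read counts, gene identifiers, and output keys are toy values chosen for the example.

```python
def missense_read_support(vcf_text):
    """Return one summary dict per record that carries a missense annotation."""
    summary = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        if "ANN" not in fields:
            continue
        depth = int(fields["DP"])
        alt_obs = int(fields["AO"].split(",")[0])  # biallelic assumption
        for annotation in fields["ANN"].split(","):
            parts = annotation.split("|")
            # SnpEff ANN sub-fields: 1 = effect, 3 = gene name, 10 = HGVS.p
            if "missense_variant" in parts[1].split("&"):
                summary.append({
                    "position": int(pos),
                    "nucleotide_change": f"{ref}{pos}{alt}",
                    "gene": parts[3],
                    "protein_change": parts[10],
                    "percent_read_support": round(100 * alt_obs / depth, 1),
                })
                break  # one row per variant is enough for this sketch
    return summary


# Toy records: counts and identifiers are illustrative, not real data.
TOY_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "MN908947.3\t3037\t.\tC\tT\t100\t.\tAO=98;DP=100;"
    "ANN=T|synonymous_variant|LOW|ORF1ab|ORF1ab|transcript|ORF1ab|"
    "protein_coding|1/1|c.2772C>T|p.Phe924Phe|||||",
    "MN908947.3\t23403\t.\tA\tG\t100\t.\tAO=95;DP=100;"
    "ANN=G|missense_variant|MODERATE|S|S|transcript|S|"
    "protein_coding|1/1|c.1841A>G|p.Asp614Gly|||||",
])

summary = missense_read_support(TOY_VCF)
for row in summary:
    print(row)
```

Only the missense record is reported here; a fuller summary would likely also handle multi-allelic sites and other nonsynonymous effect types such as stop gains or in-frame deletions.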
