Step-by-step guide to assemble a genome
This page links to the software we used to assemble the corn snake genome v2 and gives suggested commands and input/output files where appropriate. We will extend and update this guide as new versions of the genome become available. Detailed explanations of how to use these tools are provided in the associated publication cited below.
Good luck with your assembly!
Related Publication
Ullate-Agote A., Chan F.Y., Tzika A.C.
A Step-by-step Guide to Assemble a Reptilian Genome
Methods in Molecular Biology: Avian and Reptilian Developmental Biology (2017)
Software
Example input and output files for all the software described below are available for download here (zipped file, 53 MB).
FastQC
Download
Skewer
Download
‣ Suggested command to trim the adaptors:
skewer -x adaptor_R1 -y adaptor_R2 -k 1 -r 0.15 -l 0 -z -t number_threads HiSeq2500_R1.fastq HiSeq2500_R2.fastq
DISCOVAR de novo
Download
‣ Suggested command to run the assembly:
DiscovarExp NUM_THREADS=48 MEM_MONITOR=True READS="HiSeq2500_R1_trimmed_lib{1,2}.fastq.gz,HiSeq2500_R2_trimmed_lib{1,2}.fastq.gz" OUT_DIR=DiscovarExp_out
‣ Suggested command to list 'circular' contigs:
grep "circular" a.lines.fasta | cut -c 2- > circular_contigs.txt
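The list produced above contains header lines only, while the BLAST+ search of 'circular' contigs takes a FASTA file as its query. A minimal sketch of one way to pull the matching records out of a.lines.fasta, assuming plain awk is available; the function name extract_by_header is hypothetical and not part of DISCOVAR de novo:

```shell
# extract_by_header LIST FASTA
# Print the FASTA records whose header line (without the leading ">")
# appears verbatim in LIST. Hypothetical helper, not a DISCOVAR tool.
extract_by_header() {
    awk '
        FILENAME == ARGV[1] { wanted[$0] = 1; next }           # read the header list
        /^>/                { keep = (substr($0, 2) in wanted) } # new record: keep it?
        keep                { print }                            # header and sequence lines
    ' "$1" "$2"
}

# Example usage with the file names from this page:
# extract_by_header circular_contigs.txt a.lines.fasta > Circular_contigs.fasta
```

Matching on the whole header line mirrors what `cut -c 2-` writes to the list, so the list can be used unchanged.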
BLAST+
Download
‣ Suggested command to search the contigs against the contaminants database:
blastn -task blastn -reward 1 -penalty -5 -gapopen 3 -gapextend 3 -dust yes -soft_masking true -evalue 0.05 -searchsp 1750000000000 -db contaminant_db -query contigs_1Kb.fasta -outfmt 6 -out Results_contaminant_db.xls
‣ Suggested command to search the 'circular' contigs against the nt database:
blastn -task blastn -dust yes -soft_masking true -evalue 0.0001 -db nt -max_target_seqs 1 -remote -query Circular_contigs.fasta -outfmt 6 -out Results_circular.xls
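The bwa commands further down this page use contigs_1Kb_filtered.fasta, but the filtering step itself is not listed. A minimal sketch, assuming that every contig with at least one hit in the tabular (-outfmt 6) output should be dropped; the publication may apply stricter criteria (for example on hit length or identity), and the function name drop_hit_contigs is hypothetical:

```shell
# drop_hit_contigs BLAST_TSV FASTA
# Print the FASTA records whose ID (first word of the header) does NOT
# appear as a query (column 1) in the BLAST tabular output.
# Hypothetical helper; the "any hit means remove" rule is an assumption.
drop_hit_contigs() {
    awk '
        FILENAME == ARGV[1] { hit[$1] = 1; next }                 # column 1 = query ID
        /^>/                { keep = !(substr($1, 2) in hit) }    # new record: keep it?
        keep                { print }
    ' "$1" "$2"
}

# Example usage with the file names from this page:
# drop_hit_contigs Results_contaminant_db.xls contigs_1Kb.fasta > contigs_1Kb_filtered.fasta
```

An empty results file leaves the FASTA unchanged, which is the expected outcome when no contaminant is found.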
LANE Runner
Download
Trimmomatic
Download
bwa
Download
‣ Suggested command to index the contigs:
bwa index contigs_1Kb_filtered.fasta
‣ Suggested command for alignment:
bwa mem -t number_threads contigs_1Kb_filtered.fasta library_R1.fastq library_R2.fastq > library_output.sam 2>out_library.err
samtools
Download
‣ Suggested command for SAM to sorted BAM conversion:
samtools view -buS 3Kb_output.sam | samtools sort -@ number_threads -m memory_perthread - 3Kb.sorted
‣ Suggested command to index a sorted BAM file:
samtools index 3Kb.sorted.bam
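The bwa and samtools commands above have to be repeated for every library. A minimal sketch that prints the three commands per library so they can be reviewed before running, assuming the reads are named <library>_R1.fastq and <library>_R2.fastq (a hypothetical naming scheme) and leaving out the -m memory option. The function name print_mapping_commands is hypothetical. The sort command keeps the output-prefix syntax used on this page, which matches legacy samtools releases; newer releases expect `-o output.bam` instead.

```shell
# print_mapping_commands REF THREADS LIB...
# Print the alignment, sorting and indexing commands for each library prefix.
# Assumes reads named <LIB>_R1.fastq and <LIB>_R2.fastq (hypothetical naming).
print_mapping_commands() {
    ref=$1
    threads=$2
    shift 2
    for lib in "$@"; do
        echo "bwa mem -t $threads $ref ${lib}_R1.fastq ${lib}_R2.fastq > ${lib}_output.sam 2> out_${lib}.err"
        echo "samtools view -buS ${lib}_output.sam | samtools sort -@ $threads - ${lib}.sorted"
        echo "samtools index ${lib}.sorted.bam"
    done
}

# Example usage: inspect the commands, then pipe them to sh to run them.
# print_mapping_commands contigs_1Kb_filtered.fasta 8 3Kb 6Kb 20Kb
# print_mapping_commands contigs_1Kb_filtered.fasta 8 3Kb 6Kb 20Kb | sh
```

Printing the commands instead of executing them directly makes it easy to check file names against your own data first.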
BESST
Download
‣ Suggested command for scaffolding:
runBESST -c contigs_1Kb_filtered.fasta -f 100bp_v1.sorted.bam HiSeq2500_lib1.sorted.bam HiSeq2500_lib2.sorted.bam 3Kb.sorted.bam 6Kb.sorted.bam 20Kb.sorted.bam --orientation fr fr fr fr fr fr -o BESST_scaffolding_default
REAPR
Download
‣ Suggested command to identify errors in the sequence names:
reapr facheck scaffold_assembly_firstRound.fasta
‣ Suggested command to correct errors in the sequence names:
reapr facheck scaffold_assembly_firstRound.fasta new_assembly
‣ Suggested command to align a mate-pair library with SMALT:
reapr smaltmap scaffold_assembly_firstRound.fasta 20Kb_R1.fastq 20Kb_R2.fastq firstRound_20Kb.bam 1>out_firstRound_20Kb.txt 2>err_firstRound_20Kb.txt
‣ Suggested command to run the REAPR pipeline:
reapr pipeline scaffold_assembly_firstRound.fasta firstRound_20Kb.bam output_directory
SSPACE standard (free for academics)
Download
‣ Suggested command for scaffolding:
perl SSPACE_Standard_v3.0.pl -l libraries_list.txt -s Scaffolds_broken.fa -T number_threads -b SSPACE_output > SSPACE3.log 2>&1
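The command above reads the libraries from libraries_list.txt, which is not shown on this page. A minimal sketch of how such a file could be written, assuming the seven-column layout described in the SSPACE standard v3.0 README (library name, aligner, first read file, second read file, insert size, allowed insert size error, orientation). All values below are placeholders: the read file names, the nominal insert sizes, the 0.25 error and the FR orientation are assumptions to replace with your own, and the function name write_library_list is hypothetical. Please check the layout against the README shipped with your copy of SSPACE.

```shell
# write_library_list OUTFILE
# Write an example SSPACE library list with one line per library.
# Columns (assumed from the SSPACE standard v3.0 README):
#   name aligner reads_1 reads_2 insert_size insert_size_error orientation
# Every value here is a placeholder, not the setting used for the corn snake genome.
write_library_list() {
    cat > "$1" <<'EOF'
3Kb bwa 3Kb_R1.fastq 3Kb_R2.fastq 3000 0.25 FR
6Kb bwa 6Kb_R1.fastq 6Kb_R2.fastq 6000 0.25 FR
20Kb bwa 20Kb_R1.fastq 20Kb_R2.fastq 20000 0.25 FR
EOF
}

# Example usage:
# write_library_list libraries_list.txt
```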
BLAT
Download
‣ Suggested command for alignment:
blat SSPACE_output.final.scaffolds.fasta RNA_reads_transcripts.fa output.psl -noHead
L_RNA_scaffolder
Download
‣ Suggested command for assembly:
perl L_RNA_scaffolder.sh -d
IrysView (needs registration to download)
Download
IrysSolve
Download
CEGMA (no longer supported by the developers)
Download
‣ Suggested command:
cegma -g final_assembly.fasta -o final_CEGMA -T number_threads --vrt -v
BUSCO
Download
‣ Suggested command:
python3 BUSCO_v1.1b1.py -c number_threads -in final_assembly.fasta -m genome -l eukaryota/ -o BUSCO_final_eukaryota 1> out.log 2> out.err