Authorea

Quality control of short-read data

We used FastQC to assess the quality of the raw sequencing data, and NGS QC Toolkit \cite{Patel_2012} and seqtk \cite{seqtk} to process the reads through a quality-control pipeline. The goal was to obtain the highest-quality dataset possible by identifying and removing low-quality sequences, and thereby to optimize the subsequent analysis steps. After assessing the quality of the sequences, we trimmed the first 3 bp and the last 25 bp from each short read. We also applied sliding-window filtering, in which the average Phred score is computed at every step as the window advances along the read, and the read is trimmed if the score drops