Single-Gene Dotplot Analysis Is Effective in Predicting Cluster Assignment in Gordonia Bacteriophages
Bacteriophages, or “phages,” are a diverse group of viruses that infect bacteria. An estimated 10^31 phages exist on Earth, so they must be organized into smaller groups to be studied effectively. “Clustering” is a grouping method that organizes phages by genetic sequence similarity. Phages within a cluster are more similar to one another than to phages in other clusters. Clusters can be further divided into subclusters, in which phages are even more closely related. Most cluster assignments are made by full-genome comparison, which is time-consuming and requires extensive sequence analysis. Single-gene comparison has been shown to predict the cluster of phages that infect Mycobacterium sp. This research project aimed to determine whether single-gene comparison can accurately predict clusters for phages that infect Gordonia sp. Gepard dotplot analysis of the portal and tape measure genes, paired with BLAST comparison, was used to group phages into clusters and subclusters. These cluster assignments were compared with the full-genome-based cluster assignments found on Phamerator. A two-sided two-proportion Z-test compared the cluster recovery rates obtained with the portal and tape measure genes to previous single-gene comparison results. Both the portal and tape measure genes were found to be effective predictors of cluster assignment in Gordonia sp. phages. These results show that full-genome analysis may not be necessary to identify a phage’s cluster, which would save time for researchers searching for phages of a specific cluster.
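The statistical comparison mentioned above can be illustrated with a minimal sketch of a two-sided two-proportion Z-test. This is not the authors' analysis code. It assumes the common pooled-variance form of the test, and the function name, parameter names, and the counts in the usage lines are hypothetical placeholders rather than data from this study.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion Z-test using a pooled standard error.

    successes_a / n_a: e.g. phages whose cluster was correctly recovered
    with one gene, out of all phages tested (hypothetical usage).
    successes_b / n_b: the same counts for the comparison group.
    Returns (z statistic, two-sided p-value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")

    p_a = successes_a / n_a
    p_b = successes_b / n_b

    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("standard error is zero; the test is undefined")

    z = (p_a - p_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only; not results from this study.
z_stat, p_val = two_proportion_z_test(50, 100, 30, 100)
print(f"z = {z_stat:.3f}, two-sided p = {p_val:.4f}")
```

A large p-value here would mean the data give no evidence that the two recovery rates differ, while a small one would indicate a difference. The pooled form is only one convention, and the normal approximation becomes unreliable when the counts are small.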