    Verification of a Support Vector Machine Model for Predicting Proteotypic Peptides

    2010_ForbesJ_THS_000535.pdf (1.610Mb)
    Author
    Forbes, Jessica
    Advisor
    Holly Zullo; Kelly Cline; Jennifer Gloweinka
    Date of Issue
    2010-04-01
    URI
    https://scholars.carroll.edu/handle/20.500.12647/3418
    Title
    Verification of a Support Vector Machine Model for Predicting Proteotypic Peptides
    Type
    thesis
    Abstract
    The current method to match mass spectra from tandem mass spectrometry (MS) to a peptide sequence requires searching a large database of all possible peptides encoded by an organism. However, only a subset of these possible peptides is consistently and repeatedly identified by MS (proteotypic peptides). Matching spectra to this smaller, proteotypic peptide search space increases computational efficiency and improves the accuracy of peptide identification, hence increasing the confidence that a protein has been accurately identified. Currently, it is labor-intensive to build a proteotypic peptide database of experimentally observed peptides; thus, computationally deriving such a database is desirable. Webb-Robertson et al. trained a statistical learning algorithm called a support vector machine (SVM) on Yersinia pestis data to computationally classify a peptide as proteotypic or not proteotypic. Preliminary tests by these authors showed that this SVM accurately predicted proteotypic peptides for two closely related bacterial species: Salmonella typhimurium and Shewanella oneidensis. To test the versatility of the classifier, experimentally generated proteotypic peptide databases were gathered from three bacteria more distantly related to Y. pestis, as well as one vertebrate species: Pelagibacter ubique, Caulobacter crescentus, Cyanothece, and Mus musculus (mouse). For each of these species, those proteins with at least four experimentally determined proteotypic peptides were extracted, and all possible peptides for those proteins were classified with the SVM. The resulting information was analyzed in MATLAB, creating a Receiver Operating Characteristic (ROC) curve and an associated area under the curve (AUC) value to describe the sensitivity and specificity of the SVM model, where an AUC of 1.0 describes a perfect classifier and a random binary classifier would generate an AUC value of 0.5. The average AUC values for Y. pestis, P. ubique, C. crescentus, Cyanothece, and mouse were 0.8351, 0.7442, 0.7622, 0.7455, and 0.7457, respectively. Therefore, the current SVM classifier accurately predicts proteotypic peptides for diverse bacterial species as well as the mouse. Future research may include retraining SVMs to target a specific protein sample preparation method or species.
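    As a rough illustration of the AUC statistic described in the abstract, the sketch below computes it directly from classifier scores and true labels. It is not the thesis's MATLAB analysis: the function name `roc_auc` and the toy scores and labels are hypothetical, chosen only to show how the value relates to the 0.5 (random) and 1.0 (perfect) reference points. It relies on the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: classifier scores and whether each peptide was
# experimentally observed as proteotypic (1) or not (0).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.4f}")
```

    On the toy data, 13 of the 16 positive/negative pairs are ranked correctly, so the value falls between the random and perfect baselines. The pairwise loop is quadratic in the number of peptides; for full proteome-scale data, a rank-based formulation or an established library routine would be the more practical choice.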
    Degree Awarded
    Bachelor's
    Semester
    Spring
    Department
    Mathematics, Engineering & Computer Science
    Collections
    • Mathematics, Engineering and Computer Science Undergraduate Theses
