
dc.contributor.author: Van Reenen, Mari
dc.contributor.author: Westerhuis, Johan A.
dc.contributor.author: Reinecke, Carolus J.
dc.contributor.author: Venter, J. Hendrik
dc.date.accessioned: 2017-03-07T06:02:47Z
dc.date.available: 2017-03-07T06:02:47Z
dc.date.issued: 2017
dc.identifier.citation: Van Reenen, M. et al. 2017. Metabolomics variable selection and classification in the presence of observations below the detection limit using an extension of ERp. BMC Bioinformatics, 18(1): Article no 83. [https://doi.org/10.1186/s12859-017-1480-8] [en_US]
dc.identifier.issn: 1471-2105 (Online)
dc.identifier.uri: http://hdl.handle.net/10394/20721
dc.identifier.uri: https://doi.org/10.1186/s12859-017-1480-8
dc.identifier.uri: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1480-8
dc.description.abstract: Background: ERp is a variable selection and classification method for metabolomics data. ERp uses minimized classification error rates, based on data from a control and an experimental group, to test the null hypothesis of no difference between the distributions of variables over the two groups. If the associated p-values are significant, they indicate discriminatory variables (i.e. informative metabolites). The p-values are calculated assuming a common continuous strictly increasing cumulative distribution under the null hypothesis. This assumption is violated when zero-valued observations can occur with positive probability, a characteristic of GC-MS metabolomics data, disqualifying ERp in this context. This paper extends ERp to address two sources of zero-valued observations: (i) zeros reflecting the complete absence of a metabolite from a sample (true zeros); and (ii) zeros reflecting a measurement below the detection limit. This is achieved by allowing the null cumulative distribution function to take the form of a mixture of a jump at zero and a continuous strictly increasing function. The extended ERp approach is referred to as XERp.
Results: XERp is no longer non-parametric, but its null distributions depend on only one parameter, the true proportion of zeros. Under the null hypothesis this parameter can be estimated by the proportion of zeros in the available data. XERp is shown to perform well with regard to bias and power. To demonstrate the utility of XERp, it is applied to GC-MS data from a metabolomics study on tuberculosis meningitis in infants and children. We find that XERp is able to provide an informative shortlist of discriminatory variables, while attaining satisfactory classification accuracy for new subjects in a leave-one-out cross-validation context.
Conclusion: XERp takes into account the distributional structure of data with a probability mass at zero without requiring any knowledge of the detection limit of the metabolomics platform. XERp is able to identify variables that discriminate between two groups by simultaneously extracting information from the difference in the proportion of zeros and shifts in the distributions of the non-zero observations. XERp uses simple rules to classify new subjects and a weight pair to adjust for unequal sample sizes or sensitivity and specificity requirements. [en_US]
dc.language.iso: en [en_US]
dc.publisher: BioMed Central [en_US]
dc.subject: Detection limit [en_US]
dc.subject: Probability mass at zero [en_US]
dc.subject: Variable selection [en_US]
dc.subject: Classification [en_US]
dc.subject: Metabolomics [en_US]
dc.title: Metabolomics variable selection and classification in the presence of observations below the detection limit using an extension of ERp [en_US]
dc.type: Article [en_US]
dc.contributor.researchID: 12791733 - Van Reenen, Mari
dc.contributor.researchID: 25980629 - Westerhuis, Johannes Arnold
dc.contributor.researchID: 10055037 - Reinecke, Carolus Johannes
dc.contributor.researchID: 10168907 - Venter, Johannes Hendrik
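
Illustrative note on the abstract: the idea of a null distribution with a probability mass at zero, whose single parameter is estimated by the pooled proportion of zeros, can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' XERp implementation. All function names are hypothetical, the log-normal continuous part is an arbitrary choice, the error rate uses equal weights rather than the weight pair mentioned in the abstract, and the Monte Carlo p-value is a simulation stand-in for the null distributions derived in the paper.

```python
import random


def pooled_zero_proportion(control, experimental):
    """Estimate the zero proportion under the null by pooling both groups."""
    pooled = list(control) + list(experimental)
    return sum(1 for x in pooled if x == 0) / len(pooled)


def sample_mixture_null(n, pi_zero, rng):
    """Draw n values: zero with probability pi_zero, otherwise a positive
    continuous value (log-normal here, an assumption for illustration)."""
    return [0.0 if rng.random() < pi_zero else rng.lognormvariate(0.0, 1.0)
            for _ in range(n)]


def min_error_rate(control, experimental):
    """Smallest equal-weight error rate over all single-threshold rules.

    A rule sends x <= c to one group and x > c to the other. Every observed
    value is tried as c, in both directions.
    """
    n0, n1 = len(control), len(experimental)
    best = 0.5  # error of the trivial rule that puts everything in one group
    for c in sorted(set(control) | set(experimental)):
        # Direction A: x <= c is called control, x > c is called experimental.
        err = (0.5 * sum(1 for x in control if x > c) / n0
               + 0.5 * sum(1 for x in experimental if x <= c) / n1)
        # Direction B is the mirror image, with error 1 - err.
        best = min(best, err, 1.0 - err)
    return best


def monte_carlo_p_value(control, experimental, n_sim=2000, seed=0):
    """Simulated p-value: how often the mixture null gives an error rate
    at least as small as the observed one."""
    rng = random.Random(seed)
    observed = min_error_rate(control, experimental)
    pi_zero = pooled_zero_proportion(control, experimental)
    hits = 0
    for _ in range(n_sim):
        sim0 = sample_mixture_null(len(control), pi_zero, rng)
        sim1 = sample_mixture_null(len(experimental), pi_zero, rng)
        if min_error_rate(sim0, sim1) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Made-up values for one metabolite; zeros may be true zeros or below detection.
control = [0, 0, 0, 1.2, 0.8]
experimental = [2.5, 3.1, 0, 4.0, 2.9]
pi_hat = pooled_zero_proportion(control, experimental)
observed_error = min_error_rate(control, experimental)
p_value = monte_carlo_p_value(control, experimental)
```

Because a threshold rule depends only on the ordering of the values, the particular continuous distribution used in the simulation should matter little here; the zero proportion is the quantity that shapes the simulated null.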

