Abstract
A data set obtained from the UC Irvine Machine Learning Repository contains measurements of 210 kernels from three varieties of wheat (Kama, Rosa, and Canadian). The objective of this project was to determine how these three varieties can be distinguished by conducting several multivariate analyses in R: a correlation matrix, a principal component analysis (PCA), and a linear discriminant analysis (LDA). These analyses indicated that the varieties can be discriminated by kernel size, compactness, and shape. On average, Rosa kernels are the largest and Canadian kernels are the smallest, while Kama kernels are intermediate in size, which makes them difficult to distinguish on size alone. However, Kama kernels are typically more spherical than those of the other two varieties. On average, the Canadian variety is the least dense, whereas Rosa and Kama kernels are typically more compact. These distinctions made it possible for the LDA to separate the three kernel types. However, variation within the data set makes it impossible to confidently predict the variety of every kernel, and the misclassification rate of the LDA was 3.8%. All of these errors involved the Kama variety, because its range for every variable in the data set overlaps with the ranges of both the Canadian and Rosa varieties.
Emily Hughes
M.S. student in Geology/Paleontology at West Virginia University. Email: [email protected]