A k-means cluster analysis was conducted to identify subgroups of college students based on the similarity of their responses to an evaluation questionnaire about the classes they attended. The dataset included a total of 5820 records.

The quantitative clustering variables below were included. All clustering variables were standardized to have a mean of 0 and a standard deviation of 1.

- **Class:** Course code; possible values from {1-13}
- **Repeat:** Number of times the student is taking this course; values taken from {0,1,2,3,…}
- **Attendance:** Code of the level of attendance; values from {0, 1, 2, 3, 4}
- **Difficulty:** Level of difficulty of the course as perceived by the student; values taken from {1,2,3,4,5}

Possible values for the variables below are {1,2,3,4,5}.

- **Q1:** The semester course content, teaching method and evaluation system were provided at the start.
- **Q2:** The course aims and objectives were clearly stated at the beginning of the period.
- **Q3:** The course was worth the amount of credit assigned to it.
- **Q4:** The course was taught according to the syllabus announced on the first day of class.
- **Q5:** The class discussions, homework assignments, applications and studies were satisfactory.
- **Q6:** The textbook and other course resources were sufficient and up to date.
- **Q7:** The course allowed field work, applications, laboratory, discussion and other studies.
- **Q8:** The quizzes, assignments, projects and exams contributed to helping the learning.
- **Q9:** I greatly enjoyed the class and was eager to actively participate during the lectures.
- **Q10:** My initial expectations about the course were met at the end of the period or year.
- **Q11:** The course was relevant and beneficial to my professional development.
- **Q12:** The course helped me look at life and the world with a new perspective.
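The standardization applied to the clustering variables can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name `standardize`, the use of the population standard deviation, and the synthetic 1-5 responses are all assumptions.

```python
import numpy as np

def standardize(X):
    """Rescale each column of X to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)      # population SD (ddof=0); an assumption
    std[std == 0] = 1.0      # constant columns become all zeros instead of NaN
    return (X - mean) / std

# Synthetic stand-in for three survey columns scored 1-5 (not the real data).
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 6, size=(200, 3))
Z_demo = standardize(X_demo)
```

Standardizing first keeps variables with wider ranges, such as the course code, from dominating the Euclidean distances that k-means relies on.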

Data were randomly split into a training set that included 70% of the observations and a test set that included 30% of the observations. A series of k-means cluster analyses was conducted on the training data specifying k = 1-9 clusters, using **Euclidean** distance. The variance in the clustering variables accounted for by the clusters was plotted for each of the nine cluster solutions in an elbow curve to provide guidance for choosing the number of clusters to interpret.
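The split-and-scan procedure described above can be sketched with plain NumPy. Everything here is illustrative: the data are two synthetic blobs rather than the survey responses, the helper names (`assign`, `kmeans`, `variance_explained`) are hypothetical, and the simple Lloyd's iteration with random initialization may differ from whatever k-means implementation the analysis actually used.

```python
import numpy as np

def assign(X, centers):
    """Label each row of X with the index of its nearest center (Euclidean)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)

def variance_explained(X, centers, labels):
    """Share of total variance accounted for by the clusters (1 - within/total)."""
    within = ((X - centers[labels]) ** 2).sum()
    total = ((X - X.mean(axis=0)) ** 2).sum()
    return 1.0 - within / total

# Synthetic stand-in data: two blobs in four dimensions (not the survey data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1, 1, (150, 4)), rng.normal(1, 1, (150, 4))])

# Random 70/30 split into training and test sets.
idx = rng.permutation(len(X))
n_train = int(round(0.7 * len(X)))
train, test = X[idx[:n_train]], X[idx[n_train:]]

# Variance explained for k = 1..9; plotting these values against k gives the elbow curve.
r2 = {}
for k in range(1, 10):
    centers, labels = kmeans(train, k)
    r2[k] = variance_explained(train, centers, labels)
```

The elbow is the value of k after which additional clusters add little to the variance explained.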

The elbow curve suggested that the 2- or 3-cluster solution might be interpreted. The results below are for an interpretation of the 2-cluster solution.

Canonical discriminant analysis was used to reduce the 15 clustering variables down to a few variables that accounted for most of the variance in the clustering variables. A scatterplot of the first two canonical variables by cluster (see below) indicated that the two clusters did not overlap. The clusters were not densely packed, however, suggesting relatively high within-cluster variance.
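The write-up does not show how the two plotting variables were computed. As a stand-in, the sketch below projects the data onto the first two principal components, which is a different technique from canonical discriminant analysis (PCA ignores the cluster labels) but is a common way to get a two-dimensional view of clusters. The function name, the synthetic data, and the labels are assumptions.

```python
import numpy as np

def first_two_components(X):
    """Project rows of X onto the first two principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                       # coordinates on PC1 and PC2
    explained = (S[:2] ** 2) / (S ** 2).sum()    # variance share of each component
    return scores, explained

# Synthetic stand-in: 15 variables, two hypothetical clusters (not the survey data).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(-1, 1, (100, 15)), rng.normal(1, 1, (100, 15))])
labels_demo = np.repeat([0, 1], 100)

scores, explained = first_two_components(X_demo)
# A scatterplot of scores[:, 0] against scores[:, 1], colored by labels_demo,
# shows how far apart and how tightly packed the clusters are.
```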

The means on the clustering variables showed that, for students in one cluster, this was the first attempt at the class (negative average on Repeat); students in this cluster had a high attendance record (compared to cluster-1) and a favorable impression of their classes. Students in cluster-1 had a poor attendance record, many were repeating their classes, and they had a negative impression of the classes they were taking.

To validate the clusters, an analysis of variance (ANOVA) was conducted to test for significant differences between the clusters on Q9 (whether students greatly enjoyed their class and were eager to participate).

A Tukey test was used for post hoc comparisons between the clusters. Results indicated significant differences between the clusters on Q9.
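The F statistic behind a one-way ANOVA of this kind can be computed by hand, as sketched below. The Q9 values are made up for illustration and are not results from the analysis, and the function name `one_way_anova_f` is hypothetical.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up Q9 responses (1-5) for two hypothetical clusters.
q9_a = [4, 5, 4, 5, 3, 4, 5, 4]
q9_b = [2, 1, 2, 3, 1, 2, 2, 1]
f_stat, df_b, df_w = one_way_anova_f([q9_a, q9_b])
```

A large F relative to the F distribution with `df_b` and `df_w` degrees of freedom indicates that the cluster means differ. In practice, `scipy.stats.f_oneway` returns both F and the p-value, and `statsmodels.stats.multicomp.pairwise_tukeyhsd` performs the Tukey comparisons. With only two clusters there is a single pair to compare.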