IDEA: Bias Higher Than Previously Thought


The document below is from the prestigious IDEA Center at Kansas State University. You may find it interesting/useful.

"It is clear from these findings that the portion of student ratings of instructional excellence accounted for by variables not under the instructor's control is much higher than previously thought. If student ratings are to be used to inform improvement strategies or to help make 
administrative recommendations, it is essential that appropriate adjustments be made to account for relevant extraneous circumstances. Our proposed method for doing this, while a satisfactory approach to the problem, requires a larger data base such as that compiled by the IDEA program."

IDEA Research & Development Exchange (IDEA Center paper), September 1997
Kansas State University
Center for Faculty Development and Evaluation

Studies of the Impact of Extraneous Variables by Dr. Donald P. Hoyt

There is a considerable body of literature related to "bias" in student ratings of instruction. Reviews of this literature can be found in Centra (1979), Feldman (1976, 1978, 1979), Haladyna and Hess (1994), Kulik and McKeachie (1975), and Marsh (1987). The bulk of the literature suggests that the extent of bias in student ratings is not extreme. While there has been considerable variation from study to study, most report that extraneous factors account for 5-25 percent of the variance in effectiveness measures. The most recent and most highly refined study (Haladyna and Hess, 1994) suggested that these may be underestimates; in their study, 38 percent of the variance in criterion ratings was accounted for by their measures of bias.

It appears that biases reflecting student variables (gender, age, personality, expected grade, etc.) are more significant than those related to teaching conditions, instructor characteristics, or procedural factors. To the degree that these factors represent circumstances over which the instructor has no control, measures of teaching effectiveness represent an unknown mixture of teaching skill and other, unrelated factors. Unless such unrelated factors are identified and assessed so that their impact can be taken into account, we run the risk of rewarding or punishing faculty members, and holding them accountable, for outcomes over which they have no control. Given the widespread use of student ratings in faculty evaluation and merit increase processes, the consequences can include serious injustice.

Since its inception over 30 years ago, the IDEA System has attempted to deal with this problem by taking two potentially potent extraneous variables into account: (1) size of class and (2) student motivation as inferred from responses to Item 36 ("I had a strong desire to take this course"). Classes are sorted into one of four size groups ("small" to "very large") and one of five motivation groups ("very low" to "very high"). Separate norms have been prepared for each of the 20 combinations of size and motivation so that the effectiveness ratings of teachers are "controlled": no instructor is advantaged or disadvantaged on the basis of class size or student motivation.

Recently, we have been using a sample of about 36,000 classes to refine our way of taking extraneous variables into account. Rather than providing separate norms, we have developed "adjusted" criteria; i.e., estimates of what average criterion ratings would have been if extraneous circumstances were the same for all IDEA participants. This permits a more precise approach to the question while, at the same time, making possible the examination of additional variables without requiring an unreasonably large sample size.
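
One natural reading of "adjusted" criteria is a regression adjustment of the following kind. The sketch below (synthetic data, ordinary least squares) illustrates only that reading, not the IDEA Center's actual computation.

```python
# Hypothetical sketch: regress the criterion on the extraneous measures, then
# remove each class's predicted advantage or disadvantage relative to a class
# with average extraneous circumstances. Data and coefficients are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))                 # five extraneous measures per class
y = 3.8 + X @ (0.2 * rng.normal(size=5)) + rng.normal(scale=0.3, size=500)

model = LinearRegression().fit(X, y)
baseline = model.predict(X.mean(axis=0, keepdims=True))[0]  # prediction at average circumstances
adjusted = y - (model.predict(X) - baseline)  # estimated rating under equal circumstances
```

Unlike the 20-cell norms, this treats the extraneous variables continuously, which is what allows additional variables to be examined without multiplying the number of cells or the required sample size.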

Previous studies have shown that "student motivation" as measured by Item 36 is a potent predictor of progress ratings. But, because of the way this item is worded, we couldn't be sure whether the "strong desire to take the course" was a function of who was teaching the course or, alternatively, of the course's content or purpose. Accordingly, the IDEA form has used two "experimental" items to reflect these options (Item C, "I really wanted to take a course from this instructor"; and Item D, "I really wanted to take this course regardless of who taught it"). These two items were included with Item 36 in our expanded studies of "student motivation" as an extraneous variable.

Two other items on the IDEA form were considered as possible extraneous variables: course difficulty (Item 33, "Difficulty of subject matter") and student effort (Item 35, "I worked harder on this course than on most courses I have taken"). However, responses to these items could not be directly included as measures of extraneous variables because, to some degree, they can be influenced by the instructor's classroom demeanor. Few people would challenge the idea that instructors can make a course "difficult" or "easy"; and many experts regard "inducing student effort" to be a major challenge to all teachers.

At the same time, it is probably true that some disciplines (or courses within a given discipline) are inherently more difficult than others, so that "difficulty" (Item 33) is probably a function both of the instructor's teaching strategies and of characteristics inherent in the discipline or course. If these could be assessed separately, the portion reflecting discipline/course characteristics could qualify as a genuine "extraneous variable." In a similar vein, some students characteristically work harder and more conscientiously at their studies than do others; they have a strong commitment to, and the habits associated with, learning. Therefore, it seems desirable to attempt to separate the portion of "effort" (Item 35) which is due to the instructor's stimulation from the portion which is inherent in the student's attitudes and habits. The latter portion would seem to constitute a potentially important extraneous variable.

To address the problems raised in the preceding paragraph, we first treated "Difficulty" (Item 33) as a dependent variable and used the 22 items of "teacher behavior" as independent variables, conducting a stepwise multiple regression analysis. Our assumption that teacher behavior determined, in part, how difficult students perceived a course to be was amply confirmed by this analysis: approximately 53 percent of the variance in student ratings of "difficulty" was accounted for by teacher behaviors. We used this analysis to predict student ratings of difficulty from their description of the teacher's methods; we called this prediction "Teacher-induced Difficulty" (D-1). By subtracting D-1 from the obtained rating of "Difficulty," we developed an inferred measure of "Disciplinary Difficulty" (D-2). Positive scores (obtained scores higher than predicted scores) were indicative of courses which were inherently relatively difficult; negative scores (obtained scores lower than predicted scores) were indicative of courses which were inherently relatively easy.

We assessed the two components of "effort" in a similar way. "Teacher-induced Effort" (E-1) was inferred from a multiple regression analysis which related the 22 teacher behaviors and D-1 to the mean ratings of Item 35. Approximately 61 percent of the variation in this measure could be accounted for in this fashion. Predictions of effort made on the basis of these findings ("Teacher-induced Effort") were subtracted from the obtained effort ratings; the difference was called "Student Academic Commitment" (E-2). High scores identified students whose commitment was higher than would be expected simply on the basis of teacher behavior.
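
Both decompositions amount to taking regression residuals. Here is a minimal sketch, with synthetic data and ordinary least squares standing in for the paper's stepwise regressions; the variable names mirror the text, but the coefficients are invented.

```python
# Hypothetical sketch of the two residual decompositions described above.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 2000
behaviors = rng.normal(size=(n, 22))               # mean ratings on the 22 teacher-behavior items

# Step 1: Difficulty (Item 33) regressed on the 22 behaviors.
difficulty = 3.0 + behaviors @ (0.1 * rng.normal(size=22)) + rng.normal(scale=0.5, size=n)
fit_d = LinearRegression().fit(behaviors, difficulty)
d1 = fit_d.predict(behaviors)                      # Teacher-induced Difficulty (D-1)
d2 = difficulty - d1                               # residual: Disciplinary Difficulty (D-2)

# Step 2: Effort (Item 35) regressed on the 22 behaviors plus D-1.
effort = 3.2 + behaviors @ (0.1 * rng.normal(size=22)) + 0.3 * d1 + rng.normal(scale=0.5, size=n)
X_e = np.column_stack([behaviors, d1])
fit_e = LinearRegression().fit(X_e, effort)
e1 = fit_e.predict(X_e)                            # Teacher-induced Effort (E-1)
e2 = effort - e1                                   # residual: Student Academic Commitment (E-2)

# Positive D-2 / E-2: a harder course / more committed students than teaching alone predicts.
print(f"variance in difficulty accounted for: {fit_d.score(behaviors, difficulty):.0%}")  # ~53% in the paper
print(f"variance in effort accounted for: {fit_e.score(X_e, effort):.0%}")                # ~61% in the paper
```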

Size of class was retained as another potentially important extraneous variable. Thus, there were six measures of extraneous circumstances which were explored: class size, three measures of student motivation (Items 36, C, and D), Disciplinary Difficulty (D-2), and Student Academic Commitment (E-2). These measures were used as independent variables in a series of multiple regression analyses; the dependent variables (measures of "teaching effectiveness") were progress ratings on each of the 10 objectives assessed by IDEA, progress on relevant objectives (individual progress ratings weighted by instructor ratings of importance), and ratings on three other summary measures: "excellent teacher," "excellent course," and "learned a great deal." For the first 10 of these, only those classes for which the instructor rated the objective as "Important" or "Essential" were included.
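
The analysis thus reduces to fitting one regression per criterion and recording the variance accounted for. Below is a sketch of that loop, with synthetic placeholders for the six extraneous measures and all 14 criteria.

```python
# Hypothetical sketch of the final analysis: one regression per criterion,
# reporting R^2. All names, data, and effect sizes here are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 36000
# Columns: class size, Items 36/C/D, Disciplinary Difficulty (D-2), Commitment (E-2).
extraneous = rng.normal(size=(n, 6))

names = [f"progress_objective_{i}" for i in range(1, 11)] + [
    "progress_relevant", "excellent_teacher", "excellent_course", "learned_a_great_deal"]

for name in names:
    weights = 0.2 * rng.normal(size=6)              # synthetic "true" effects
    y = extraneous @ weights + rng.normal(size=n)   # synthetic criterion ratings
    r2 = LinearRegression().fit(extraneous, y).score(extraneous, y)
    print(f"{name}: {r2:.0%} of variance accounted for")
```

Note that the paper also restricts the 10 progress criteria to classes where the instructor rated the objective "Important" or "Essential"; that filtering step is omitted from this sketch.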

Size of class did not enter significantly into any of the regression equations. The other five variables each made significant independent contributions to the prediction of at least 2 of the 14 criteria. The percent of variance accounted for by these measures of extraneous variables ranged from 19% to 50% on the 10 individual progress ratings, with an average of 39%. For the four summary measures, the percent of variance accounted for ranged from 47% to 61% and averaged 55%.

It is clear from these findings that the portion of student ratings of instructional excellence accounted for by variables not under the instructor's control is much higher than previously thought. If student ratings are to be used to inform improvement strategies or to help make administrative recommendations, it is essential that appropriate adjustments be made to account for relevant extraneous circumstances. Our proposed method for doing this, while a satisfactory approach to the problem, requires a larger data base such as that compiled by the IDEA program.

Centra, J. A. (1979). Uses and limitations of student ratings. In Determining Faculty Effectiveness (pp. 17-46). San Francisco: Jossey-Bass.

Feldman, K. A. (1976). Grades and college students' evaluations of their courses and teachers. Research in Higher Education, 4, 69-111.

Feldman, K. A. (1978). Course characteristics and college students' ratings of their teachers. Research in Higher Education, 9, 199-242.

Feldman, K. A. (1979). The significance of circumstances for students' ratings of their teachers and courses. Research in Higher Education, 10, 149-172.

Haladyna, T., and Hess, R. K. (1994). The detection and correction of bias in student ratings of instruction. Research in Higher Education, 35, 669-687.

Kulik, J. A., and McKeachie, W. J. (1975). The evaluation of teachers in higher education. Review of Research in Education, 3, 210-240.

Marsh, H. W. (1987). Relationship to background characteristics: The witch hunt for potential biases in students' evaluations. International Journal of Educational Research, 11, 305-329.

 ____________________________ 
Dr. Donald Hoyt returns to Kansas State and the Center on a part-time basis to direct research activities at the Center. He retired from Kansas State University five years ago as Assistant Provost and Director of the Office of Planning and Evaluation Services. He is Professor Emeritus of Psychology and Education. Don Hoyt was the primary developer of the IDEA and DECAD Systems, and the Center is most fortunate to have him available to guide our current research and development efforts.



Society for a Return To Academic Standards 


Last Updated: 7 July 1999