More Ammunition


From: Donald Seagle, Ph.D.

It appears that you and I have something in common -- a strong interest in promoting the discontinuation of Student Evaluations of Instruction (SEIs) in colleges and universities. Clearly, there is substantial research evidence to show that students retaliate against decent, honest instructors when they receive undesirable grades (which some so richly deserve). I also believe it can be shown that SEIs, together with the current educational philosophy of treating students like customers (which they most certainly are not) and the legal system that now undermines the authority of instructors, are the primary factors causing grade inflation. The prevailing attitude of treating students as customers, along with the threat of lawsuits, also permeates our K-12 system. Some proponents have even suggested that K-12 students should be allowed to evaluate their teachers.


Despite the obvious negative psychological and sociological impact on student and instructor behavior, I have had little success in persuading administrators that SEIs do more harm than good. In part, the problem lies with dumb-ass administrators and instructors who are more concerned with "working the system" to enhance their merit awards and promotions than with pushing their students beyond the level to which students tend to gravitate (i.e., using good old-fashioned Marine Corps training techniques to take students beyond what they think they can do).


Be that as it may, I thought you might be interested in learning that my own research found a well-defined correlation between the "grades students expect to receive" and "the ratings they give their instructors" on end-of-semester Student Evaluations of Instruction. Moreover, my study showed that 80 percent of the students in a small community college in a western state expected to receive an "A" or a "B." That expectation was evenly split -- half expected an "A" and half expected a "B." Of the remaining 20 percent, no students reported an expectation of receiving less than a "C." I would say that looks like grade inflation, although it could be a case where the grades really did reflect the true performance of all the students.


Exactly 50 percent of the students expecting to receive an "A" gave their instructors a rating equivalent to an "A." The other 50 percent gave their instructors a rating equivalent to a "B." And exactly half of the students expecting to receive a "B" gave their instructors a rating equivalent to a "B." The other half gave their instructors a rating equivalent to a "C." More interestingly, no student gave his/her instructor a rating higher than the grade he/she (the student) expected to receive. The study covered several instructional disciplines. Regression analysis on a five-point Likert scale was used to examine the relationship between the "grades students expected to receive" and "the overall ratings they gave their instructor." Two other relationships were also evaluated: the correlation between "instructor friendliness" and "grades students expect to receive," and the correlation between "instructor leniency" and "grades students expect to receive."


In addition, my own 30 years of experience as an instructor, and as many years sitting in classrooms as a student, lead me to believe that instructors are effectively forced into giving high grades in order to survive. I know for a fact (because I repeatedly heard students say it when I sat in classes as a student) that they would get even with their instructors on the SEIs when they (the students) believed the instructors were being too demanding or grading too harshly. I also know that I can predict what kind of evaluation I will receive from how hard the tests are that I give students and how much I verbally stroke them. What students obviously want is classes that seem rigorous, lenient attendance policies, and easy tests. Astute instructors know that the way to survive is to create the illusion students want, which is to get all the students to think that they have mastered a whole lot of difficult material (when they have not) by giving them relatively easy tests so that all of their scores are very high.


From:

James R. Martin, "Evaluating Faculty Based on Student Opinions: Problems, Implications and Recommendations from Deming's Theory of Management Perspective," Issues in Accounting Education, Vol. 13, No. 4, November 1998, pp. 1079-1094.

The purpose of this paper is to present the case against using student opinions to evaluate teaching effectiveness and to recommend an alternative method consistent with Deming's theory of management. Briefly, the thrust of  the argument is that student opinions should not be used as the basis for evaluating teaching effectiveness because these aggregated opinions are invalid measures of quality teaching, provide no empirical evidence in this regard, are incomparable across different courses and different faculty members, promote faculty gaming and competition, tend to distract all participants and observers from the learning mission of the university, and insure the sub-optimization and further decline of the higher education system. Using student opinions to evaluate, compare and subsequently rank faculty members represents a severe form of a problem Deming referred to as a "deadly disease of Western style management." The theme of the alternative approach is that learning on a program-wide basis should be the primary consideration in the evaluation of teaching effectiveness. Emphasis should shift from student opinion surveys to the development and assessment of program-wide learning outcomes. To achieve this shift in emphasis, the university performance measurement system needs to be redesigned to motivate faculty members to become part of an integrated learning development and assessment team, rather than a group of independent contractors competing for individual rewards.

From:

J.A. Centra and F.R. Creech, "The Relationship Between Student, Teacher, and Course Characteristics and Student Ratings of Teacher Effectiveness," SIR Report No. 4, Educational Testing Service, Princeton, N.J., 1976, pp. 24-27.

Grade expected. Almost all students in the sample completed SIR toward the end of the course and prior to receiving their final grade. Their expected course grade, which they provided at that time, may have been a somewhat optimistic estimate of their final mark; as Table 3 shows, about 80 percent expected to receive an A or B grade. Nevertheless, previous studies have demonstrated a strong relationship between expected grade and actual grade received (Walsh, 1976).

Table 3 indicates a moderate relationship between expected grades and student ratings of teacher effectiveness. Students expecting an A grade gave a mean rating of 3.95, while those expecting a D grade gave a mean rating of 3.02. Differences between pairs of mean ratings for the four expected grades, as the Scheffé tests indicate, were significant (p < .001). Moreover, the correlation between expected grade and rating of teacher effectiveness across 14,023 students was 0.19. A similar correlation was found when classes were used as the unit of analysis. That is, the correlation between the average expected grade for each class and the average rating of teacher effectiveness given by the class was .20 (N = 9,194). Thus, while there seems to be a relationship, it is not especially strong.

Expected grade was also correlated with ratings of the value of the course to the student. These correlations were slightly higher: individual student responses correlated .26 and the means correlated .31.
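
The difference between the student-level and class-level figures above is only a change in the unit of analysis. As an illustration (synthetic data, not the SIR data set), a short Python sketch computing a correlation both ways might look like this:

    # Illustrative sketch -- synthetic data, not the SIR data set.
    # Computes the expected-grade/rating correlation at the individual-student
    # level, then again after aggregating to class means (unit = class).
    import numpy as np

    rng = np.random.default_rng(1)
    n_classes, n_per_class = 50, 25

    # Each class has its own baseline rating level; individual students add noise.
    class_effect = rng.normal(0, 0.3, n_classes)
    expected = rng.choice([2, 3, 4, 5], size=(n_classes, n_per_class),
                          p=[0.05, 0.15, 0.4, 0.4])
    rating = (2.5 + 0.25 * expected + class_effect[:, None]
              + rng.normal(0, 0.6, (n_classes, n_per_class)))

    # One point per student vs. one point per class mean
    r_students = np.corrcoef(expected.ravel(), rating.ravel())[0, 1]
    r_classes = np.corrcoef(expected.mean(axis=1), rating.mean(axis=1))[0, 1]

    print(f"student-level r = {r_students:.2f}, class-level r = {r_classes:.2f}")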

Table 3

Ratings of Teacher Effectiveness According to
Grade Expected by Student
N = 14,023 Students


Grade Expected        N      Mean Rating(1)

      A             4583        3.95
      B             6711        3.74
      C             3503        3.41
      D              226        3.02

Scheffé tests for multiple comparisons: every pairwise difference between
the mean ratings for the four expected grades was significant at p < .001.

(1) 1 x 4 analysis of variance, p < .001.


        Correlation between expected grade and rating of teacher
effectiveness was .19 (N = 14,023).

        Correlation between expected grade and value of course to
student was .26 (N = 14,093).

        Correlation between mean expected grade for each class and mean
rating of teacher effectiveness was .20 (N = 9,194 classes).

        Correlation between mean expected grade for each class and mean
rating of value of course to student was .31 (N = 9,184 classes).

One way of interpreting the correlations between expected grades and ratings
is to view them as partial evidence for validity. That is, if a grade or
an expected grade is a reflection of how much students know about the subject
matter at the end of a course, then there ought to be at least a modest
relationship with their ratings of teacher effectiveness. So, in part,
the relationship reported can be viewed as modest evidence that students
rate higher those courses in which they learn most.(1)

A major concern, however, is that grades might influence ratings and, as discussed earlier, that students will reward easy grading teachers with higher ratings. It is difficult to determine the extent to which the 0.19 to 0.31 correlations reflect easy grading practices or support the validity of the ratings. Certainly there does not seem to be overriding evidence that students rate an instructor favorably or unfavorably because of the grades they receive or anticipate receiving, although there may be occasions when that does occur. For example, Holmes (1972) reported that if students' actual grades are lower than they expected to receive, they then tend to give the instructor lower ratings. Similarly, our analysis of ratings using both grades expected by students and their cumulative grade-point average to classify students revealed interesting differences. These will be discussed in the next section.

In summary, expected grades tended to be related positively to ratings of teacher effectiveness and ratings of class value, a finding that agrees with several previous studies (e.g., Weaver, 1960; Spencer, 1965; Stewart and Malpass, 1966; Granzin and Painter, 1973).

Notes:

(1)A better way to investigate the relationship between student learning and their rating of courses and teachers is to use an objective measure of learning such as an end-of-course achievement test given in many sections of the same course. An SIR study of this kind is published as ETS Research Bulletin 76-06 and is included as part of SIR Report #4 (student ratings of instruction and their relationship to student learning). 



From:

W.E. Cashin, "Student Ratings of Teachers: A Summary of the Research," IDEA Paper No. 20, September, 1988, pp. 1-6.

In the social sciences, validity correlations above 0.70 are unusual, especially when studying complex phenomena (e.g., learning). Thus, correlations between 0.20 and 0.49 are practically useful.
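
To put such figures in perspective, it can help to convert a correlation into shared variance (r squared). The short Python snippet below does this for the correlations quoted in this compilation; it assumes nothing beyond the numbers already given.

    # Shared variance (r^2) implied by the correlations quoted above.
    for r in (0.19, 0.20, 0.31, 0.49, 0.70):
        print(f"r = {r:.2f}  ->  r^2 = {r * r:.2f}  ({r * r:.0%} of variance shared)")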



From:

G.M. Gillmore and Anthony Greenwald, "The Effects of Course Demands and Leniency on Student Ratings of Instruction," Report 94-4 Office of Educational Assessment, University of Washington, April 1994, pp. 15-16.

Grades: This study found a correlation between grades, both expected and relative, and average ratings, much as has been found in other studies. The results of the multiple regression argue that lenient grading standards are a positively biasing factor beyond any mediating effects of learning, although absent an independent measure of learning, we cannot be sure of this conclusion. We make this claim because of the significant beta weight for grades even after the effects of the proportion of valuable hours to total hours and the intellectual challenge or effort were removed. Indeed, according to these results, the way to achieve high ratings is to do things in class and give out-of-class assignments that are viewed by students as valuable, to make the course intellectually challenging, and to promise students high grades or grades that are higher than they are accustomed to receiving. It would be a mistake, however, to conclude that giving high grades alone can assure high ratings.
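
The "beta weight" argument amounts to saying that the grade variable keeps a non-zero coefficient once the other two predictors are already in the regression. Here is a minimal Python sketch of that kind of multiple regression; the data and coefficient values are invented for illustration and are not the University of Washington results.

    # Illustrative sketch -- synthetic data, not the University of Washington data.
    # Regresses average rating on three standardized predictors: proportion of
    # valuable hours, intellectual challenge, and relative (expected) grade.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 400

    valuable = rng.normal(0, 1, n)    # proportion of valuable hours (standardized)
    challenge = rng.normal(0, 1, n)   # intellectual challenge (standardized)
    grade = rng.normal(0, 1, n)       # relative expected grade (standardized)

    # Hypothetical ratings in which grade contributes beyond the other predictors
    rating = 0.5 * valuable + 0.3 * challenge + 0.2 * grade + rng.normal(0, 0.5, n)

    X = np.column_stack([np.ones(n), valuable, challenge, grade])
    betas, *_ = np.linalg.lstsq(X, rating, rcond=None)
    for name, b in zip(["intercept", "valuable hours", "challenge", "grade"], betas):
        print(f"{name:>15}: {b:+.2f}")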

The study found that the student ratings general factor appears to be influenced by three factors: students' perceptions of the ratio of valuable hours to total hours, the intellectual challenge of the course, and their grades in the course. In terms of arguing for student ratings as an agent for improved teaching, the first two legs are very favorable. No one could argue against challenging students intellectually or providing them with educationally valuable experiences. The third leg, however, is troubling. For example, one could hypothesize a cycle of grade inflation -- giving higher grades leads to higher ratings, and the averages of both slowly creep upward. Insofar as the relative grade item is a good proxy for an index of grading leniency, one might adjust ratings to take that factor into account. Such an adjustment, if done carefully, might serve as a force against grading leniency as well as enhance the validity of the ratings.



From:

Peter Seldin, "Faculty Assessment Changes In the Works at B-Schools," AACSB, Fall 1993, p. 17.

Don't take assessment data gathered for the purpose of improving teaching performance and then use it for tenure and promotion decisions.

Confidentiality of the data must remain inviolate. Should data obtained for the purpose of strengthening performance surreptitiously be used for personnel decisions, it will have an immediate chilling, even fatal, effect on the credibility of the entire faculty evaluation program. 



From:

Michael Scriven, "Using Student Rating in Teacher Evaluations," Evaluation Perspective, January, 1994, pp. 1-2, 4-5.

Another problem with the use of rating forms for summative evaluation is that many of them ask the wrong global or overall questions. This is of great importance since, when you get down to real implementations, these overall questions, usually coming at the end of the form, are the ones on which most personnel decisions are based. (Sometimes - but it's a worse alternative - the average score on all questions is used.) Common examples of this kind of mistake include the use of forms whose 'key question' asks for (i) comparisons with other teachers, (ii) whether the respondent would recommend the course to a friend with similar interests, or (iii) whether 'it's one of the best courses' one has had. All are face-invalid and certainly provide a worse basis for adverse personnel action than the polygraph in criminal cases.

Based on an examination of some hundreds of forms that are or have been used for personnel decisions (as well as professional development), and in light of the previous considerations, not more than one or two could stand up in a serious hearing. A few more are defensible as a partial basis for professional development (formative evaluation); but if used in the usual way for that purpose, they are also invalid.

 



From:

Robert Uzgalis, "Measuring the Success of Teaching."

Forcing students to do lots of homework and projects is not likely to prove popular with students, and it will require determination on the part of the teacher. It may not make the teacher popular either, depending on the general situation at the university. But in terms of education, moving up the abstraction ladder, with several projects at each level that force the student to master that level, will make the mass of students assimilate the material and make it a part of them.

The concept of student teacher evaluations is bogus. Students cannot evaluate teachers, because they are not yet in a position to understand what they are to learn. Giving credence to student teacher evaluations only lowers the standard of teaching. Giving response forms to students so they can complain about teaching can provide reasonable feedback to inexperienced teachers: things like not being audible at the back of the room, or transparencies that cannot be read. But experienced teachers handle these things quickly at the beginning of a class. Evaluation of teaching can only be done by peers, through a peer review system.

 


From:

Paul A. Wagner, “Let’s Return Texas to Times When Students Had to Prove Mettle,” Houston Chronicle, August 25, 1996, p. 5C.

Has anyone considered how much an incompetent physician, teacher, lawyer, or therapist costs society? The idea of having students prove their mettle was to allow colleges quality control over the entry of new professionals. Unfortunately, the legislators have made the exercise of this responsibility economic suicide. No matter how good the instruction, some people are just not cut out to be neurosurgeons, accountants or school teachers. Coupled with the advent of grade inflation and higher education as a retailer of credit hours was the universal adoption of student satisfaction surveys allegedly measuring teaching effectiveness. There is a double irony in this practice. First, if a student were in a position to know whether or not he or she got an appropriate amount of “knowledge” in the most appropriate way, there would be no point to the student being a student in the first place. He or she would already know too much. If the student is innocent of the subject matter and of effective techniques for getting it across, he or she should not be asked to evaluate something outside his or her realm of expertise. In this light, it is readily seen that student evaluations of teaching are not and cannot be a measure of teaching excellence.

Instead, student evaluations of teaching amount to little more than a marketing instrument revealing student satisfaction. And what are students likely to find satisfying? Something fun, something reflecting minimal inconvenience while advancing the student toward the goal of eventual economic reward. What is most unsatisfying is being asked to prove your mettle. W.E. Deming and other gurus of quality theory have long acknowledged that employees will produce the numbers they are to be judged by. Consequently, it is imperative that the numbers leadership seeks actually matter. As a result of student satisfaction surveys, professors are rewarded for minimizing the immediate personal inconveniences that come with requiring students to “prove their mettle.” This problem may shortly be aggravated further, at least in Texas. A proposal sponsored by state Sen. Bill Ratliff, R-Mt. Pleasant, to be considered for legislative action in the next session (1997), permits the firing of tenured faculty if their student evaluations are “below standard” (whatever that means). There is a cry for accountability in higher education today. That cry is certainly reasonable. To address the cry, matters of accountability must be taken seriously and not used by political demagogues seeking to enrich their personal fortunes in university administration or in the Legislature.

 


From:

Dr. John C. Damron, "Instructor Personality and the Politics of the Classroom," Social Sciences Department, Douglas College, P.O. Box 2503, New Westminster, B.C., Canada, V3L 5B2, (604) 527-5312

In institutional environments in which instructors' student ratings are visible and student achievement is essentially invisible, inequities are inevitable. Instructors who are skilled at the art of impression management are likely to receive high student ratings whether or not their students have adequately mastered course materials. In contrast, instructors with effective pedagogical skills who cannot or will not manage students' impressions will receive substantially poorer ratings, especially if they fail to exude liberalism, exhibitionism and other key personality attributes. These findings, which would seem to be consistent with those of educational seduction researchers, call into question the validity of student ratings as measures of instructional effectiveness.

 


From:

Henry H. Bauer, "The New Generations: Students Who Don't Study," Professor of Chemistry & Science Studies, Virginia Polytechnic Institute & State University, Blacksburg, VA 24061-0247

No matter how discouraged or infuriated I become over the behavior of individual students, even a moment's reflection brings me back to the realization that the greatest victims of low standards, low expectations, grade inflation, and the rest, are those very students--our very own children and grandchildren. And then I come to treasure all the more, those few students in whom I generated some spark.

And then I become doubly angry--implacably, impenitently, unalterably angry--at those who bear the prime responsibility for the corruption of higher education: those many administrators and faculty who lack ideals, who lack conviction, who in Australia are called gutless wonders. We've come to the sad pass that the reform of education must come from the outside, because the rottenness inside has now reached so far that it will not and cannot reform itself.

Let me close with a quote from Christopher Lasch:

"The moral bottom has dropped out of our culture. Americans have no compelling incentive to postpone gratification, because they no longer live in the future....There is only one cure for the malady that afflicts our culture, and that is to speak the truth about it."

An excellent publication is the Virginia Scholar, Editor, Henry H. Bauer, 1306 Highland Circle, Blacksburg, VA 24060-5623.

 


From: Donald Simanek, The Decline and Fall of Education, Part I

Now in the school where I teach, it's not uncommon to have a class in which there's not one student meeting this outmoded criterion for an A or B student. One is faced with an entire class of the calibre of those we used to 'write off' and ignore. There may be no one, save perhaps an occasional foreign exchange student, to set a standard of high achievement, demonstrating to others that mastery of such difficult material is possible by mere mortals.

Today we are searching, like Diogenes, for anyone capable of earning an honest A.

Try as we might to maintain grading standards in the sciences, we are under great pressure to adapt to the grade inflation which has caused some departments on campus to give nothing but A and B grades, even to students who 'never crack a book.' In some 'disciplines' the only way to get a C or below is to annoy the instructor, or fail to attend class! It does seem that the disciplines which have shown the greatest grade inflation are those where the course 'content' is mostly 'hot air.' [I call this 'the gas law of education.']

I've even had students ask, with some indignation, "Why must we work so hard in a physics course to get a measly C when we can get A's in non-science courses without ever studying?" I respond, "Yes, I know that's true, but why should there be any course on campus you can get an A in without studying?"

 


From:

J.E. Stone, "Inflated Grades, Inflated Enrollment, and Inflated Budgets: Analysis and Call for Review at the State Level," Education Policy Analysis Archives, v3, n11.

Abstract: Reports of the past 13 years that call attention to deficient academic standards in American higher education are enumerated. Particular attention is given to the Wingspread Group's recent 'An American Imperative: Higher Expectations for Higher Education.' Low academic standards, grade inflation, and budgetary incentives for increased enrollment are analyzed, and a call is made for research at the state level. Reported trends in achievement and GPAs are extrapolated to Tennessee and combined with local data to support the inference that 15% of the state's present-day college graduates would not have earned a degree by mid-1960's standards. A conspicuous lack of interest by public oversight bodies is noted despite a growing public awareness of low academic expectations and lenient grading and an implicit budgetary impact of over $100 million. Various academic policies and the dynamics of bureaucratic control are discussed in relation to the maintenance of academic standards. The disincentives for challenging course requirements and responsible grading are examined, and the growing movement to address academic quality issues through better training and supervision of faculty is critiqued. Recommendations that would encourage renewed academic integrity and make learning outcomes visible to students, parents, employers, and the taxpaying public are offered and briefly discussed.
 

 



 

EXCERPT FROM: 
Cashin, William. (1990). Students Do Rate Different Academic Fields Differently. In Theall, M. & Franklin, J. (Eds.), Student Ratings of Instruction: Issues for Improving Practice. San Francisco: Jossey-Bass.
--------------

CROSS-DISCIPLINE RATINGS BIAS

If you ask a college teacher whether students rate different academic fields differently, he or she will most probably say yes.  If you ask why, you are not likely to be given much justification beyond the conviction that different fields are different. Nevertheless, there is increasing evidence that the conventional wisdom is correct.  Students do rate academic fields differently. What is not clear is why....

The high group tends to consist of the arts and humanities. This trend is not universal, however; English language and literature and history both fall into the medium-low group. The low group tends to consist mostly of business, economics, computer science, math, physical sciences, and engineering. The biological and social sciences and health and other professions tend to fall somewhere in the middle.

If we look at "Course Effectiveness" and "Instructor Effectiveness" combined, we see that the fine and applied arts and music fall into the high group for both measures.  If we consider fields that are high on one measure and medium-high on the other, art, communications, foreign languages and literature, home economics, secretarial studies, and speech also fall toward the high end.  This is very much a humanities cluster, with the exception of home economics and secretarial studies.

Several fields fall into the low group for both course effectiveness and instructor effectiveness: business and management, computer and information sciences, data-processing technologies, economics, engineering, physical sciences, and physics.  To the fields that were low on one measure and medium low on the other we must add accounting, chemistry, mathematical sciences, and philosophy. This is very much a math-science-technical cluster, with the exception of philosophy and, 
perhaps, business and management.

The primary implication [of these findings] is that...we need to decide what to do about this phenomenon when we interpret student-ratings data. Administrators can no longer look at data from a variety of fields and unquestioningly compare numbers directly. Instructors cannot look at two courses they are teaching and assume that, if their ratings for the two courses are the same, they taught both courses equally well.

The real problem arises from our not knowing why the different fields are rated differently. This finding is not due just to variations in student motivation (for example, required versus elective courses) or class size. In one unpublished analysis of IDEA data it was found, even after researchers controlled for students' motivation and class size, that differences in academic fields explained an additional 10 percent or more of the variance for some IDEA course objectives. In another study of a sample of IDEA data, 14-18 percent of the remaining variance was explained after controlling for differences among institutions, in number of courses for each field, in student motivation, and in class size.
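
The "additional variance explained" in these IDEA analyses is the increase in R squared when academic-field indicators are added to a model that already contains the control variables. A minimal Python sketch of that comparison, using invented data rather than the IDEA data, might look like this:

    # Illustrative sketch -- synthetic data, not the IDEA data.
    # Compares R^2 of a reduced model (controls only) with a full model that
    # adds academic-field dummies; the difference is the "additional variance."
    import numpy as np

    def r_squared(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - resid.var() / y.var()

    rng = np.random.default_rng(3)
    n, n_fields = 600, 5

    motivation = rng.normal(0, 1, n)
    class_size = rng.normal(0, 1, n)
    field = rng.integers(0, n_fields, n)
    field_effect = np.linspace(-0.4, 0.4, n_fields)[field]  # hypothetical field differences

    rating = 0.4 * motivation - 0.2 * class_size + field_effect + rng.normal(0, 0.7, n)

    controls = np.column_stack([np.ones(n), motivation, class_size])
    field_dummies = (field[:, None] == np.arange(1, n_fields)).astype(float)
    full = np.column_stack([controls, field_dummies])

    delta = r_squared(full, rating) - r_squared(controls, rating)
    print(f"additional variance explained by academic field: {delta:.1%}")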

There are several possible explanations for differences in the ratings of different academic fields. One is that the more quantitative courses tend to receive lower ratings.  The low fields tend to be math, science, engineering, and quantitative business courses (for example, accounting and economics).  A possible explanation for these differences is that students' quantitative skills are more poorly developed than their verbal skills. This would make quantitative courses more difficult to teach. Moreover, quantitative courses may receive lower ratings because students have lower expectations of success and lower actual rates of success. We have evidence that higher student ratings are related to...students' satisfaction and that, as grades decrease, students more frequently attribute their poor performance to factors external to themselves.

Another explanation of different ratings for different fields is that the more sequential courses, where success depends heavily on the mastery of material from a previous course, tend to receive lower ratings.  This holds true for most math and science courses and for many professional courses, but it also holds true of foreign language courses, which tend to 
receive low ratings.  Sequential courses may receive lower ratings because today's students are not studying as much as students have in previous decades and so do not have as solid a foundation for the courses that come later in a sequence....

Yet another explanation is that students in different majors rate courses differently, because of differences in attitude, in academic skill and goals, in motivation, in learning styles, or in models of effective teaching. Although students majoring in any given field are likely to vary in many ways, it is quite possible that, taken as a group, they have certain characteristics that are related to how they rate courses and instructors....

A final explanation is that some academic fields are poorly taught... Probably the real explanation lies in some combination of the explanations just offered (pp. 113-119).* 
                ----------------------------------------

* Lawrence Aleamoni (1989), an evaluation expert at the University of Arizona, has made an observation of a similar sort regarding rating biases against required courses and student biases associated with various course levels (e.g., freshman, sophomore, and the like). He reports that the variables that distinguish a required course from an elective, and that identify courses by level (freshman, sophomore, and so on) do seem to generate significant differences in student ratings. For example, the higher the proportion of students taking the class as a requirement, the lower the overall rating. [Moreover], freshmen tend to rate their teachers significantly lower than do sophomores, sophomores tend to rate them significantly lower than do juniors, and so on.

  
  


"Without deviating from the norm, progress is not possible."
-Frank Zappa


Society for a Return to Academic Standards



Last Updated: November 3, 1999

Send Comments To: donald.crumbley@tamucc.edu