University and Quality Systems. Evaluating faculty performance in face-to-face and online programs
A comparison of Likert and BARS instruments
university, quality, faculty performance, questionnaires, Likert, BARS
Abstract
Assessing the performance of faculty or teaching staff is a key part of university quality systems. This assessment is usually carried out through student satisfaction surveys that use Likert or BARS (Behaviorally Anchored Rating Scales) instruments to measure student perceptions of teaching staff effectiveness. This paper examines the ambiguity, clarity, and precision of these two types of instrument. Using an experimental methodology, with the participation of 2,223 students from four Spanish universities over six academic years (between 2019 and 2024), the authors analyze these three aspects in both types of questionnaire. The results confirm that there are significant differences between the instruments. They also show that, although doubts about the ambiguity and the lack of clarity and precision of Likert-type questionnaires are justified, BARS-type instruments can improve on these aspects. The conclusions invite administrators, policymakers, quality agencies, and university managers to consider which of the two instruments is more appropriate for gathering the information they need to make better decisions about faculty promotion.
Copyright (c) 2024 Luis Matosas-López, Sonsoles Leguey-Galán, Cristóbal Ballesteros-Regaña, Noelia Pelicano Piris

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.