University and Quality Systems. Evaluating faculty performance in face-to-face and online programs
A comparison of Likert and BARS instruments
DOI: https://doi.org/10.46661/ijeri.10983
Keywords: university, quality, faculty performance, questionnaires, Likert, BARS
Abstract
Assessing the performance of faculty or teaching staff is central to university quality systems. This assessment is usually carried out through student satisfaction surveys that use Likert or BARS (Behavioral Anchored Rating Scales) instruments to measure students' perceptions of teaching effectiveness. This paper examines the ambiguity, clarity, and precision of these two types of instrument. Using an experimental methodology, the authors analyze these three aspects in both types of questionnaire with the participation of 2,223 students from four Spanish universities over six academic years (between 2019 and 2024). The results confirm significant differences between the two instruments. They also show that, although concerns about the ambiguity, lack of clarity, and imprecision of Likert-type questionnaires are justified, BARS-type instruments can improve on these aspects. The conclusions invite administrators, policymakers, quality agencies, and university managers to consider which of the two instruments is better suited to gathering the information they need to make sound decisions about faculty promotion.
Copyright (c) 2024 Luis Matosas-López, Sonsoles Leguey-Galán, Cristóbal Ballesteros-Regaña, Noelia Pelicano Piris

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.