University and Quality Systems. Evaluating faculty performance in face-to-face and online programs: A comparison of Likert and BARS instruments

Luis Matosas-López; Sonsoles Leguey-Galán; Cristóbal Ballesteros Regaña; Noelia Pelicano Piris

doi:10.46661/ijeri.10983

Authors

Luis Matosas-López Universidad Rey Juan Carlos https://orcid.org/0000-0001-7313-0146
Sonsoles Leguey-Galán Universidad Rey Juan Carlos https://orcid.org/0000-0001-9117-7458
Cristóbal Ballesteros Regaña Universidad de Sevilla https://orcid.org/0000-0002-9959-6953
Noelia Pelicano Piris Universidad Internacional de la Rioja https://orcid.org/0000-0001-8233-1812

Keywords:

university, quality, faculty performance, questionnaires, Likert, BARS

Abstract

The assessment of faculty (teaching staff) performance is a key component of university quality systems. This assessment is usually carried out through student satisfaction surveys that use Likert or BARS (Behaviorally Anchored Rating Scales) instruments to measure student perceptions of teaching effectiveness. This paper examines the ambiguity, clarity, and precision of these two types of instruments. Using an experimental methodology, with the participation of 2,223 students from four Spanish universities over six academic years (2019 to 2024), the authors analyze these three aspects in both types of questionnaires. The results confirm significant differences between the instruments. They also show that, although doubts about the ambiguity and the lack of clarity and precision of Likert-type questionnaires are justified, BARS-type instruments can improve on these aspects. The conclusions invite administrators, policymakers, quality agencies, and university managers to consider which of the two instruments is more appropriate for gathering the information they need to make better decisions about faculty promotion.

Author Biography

Luis Matosas-López, Universidad Rey Juan Carlos

Full-time professor in the Department of Financial Economics, Accounting and Modern Language, within the area of Computer Science Applied to the Social Sciences.
