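The Linguistic Association of Korea Electronic Journal

The Linguistic Association of Korea

Vol. 29, No. 2 (June 2021)

Automatic scoring system for picture-based English caption writing test adopting deep learning based word-embedding

Kim, Dongsung

Pages: 1-20

DOI: https://doi.org/10.24303/lakdoi.2021.29.2.1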


Abstract

Kim, Dongsung. (2021). Automatic scoring system for picture-based English caption writing test adopting deep learning based word-embedding. The Linguistic Association of Korea Journal, 29(2), 1-20. Because human grading of English writing requires substantial resources, many researchers in Computer-Assisted Language Learning (CALL) have focused on automatic scoring systems built on natural language processing, machine learning, and other automatic processing mechanisms. Educational Testing Service (ETS), for example, has released several automatic scoring systems for English writing. In this paper, we propose a deep-learning-based automatic scoring system for an English caption writing test. Our method uses sentence similarity measurement, comparing the user's written input with answer sentences at different score levels. We compared several word-embedding-based approaches (Word2Vec, Word Mover's Distance (WMD), and Bidirectional Encoder Representations from Transformers (BERT)) with Abstract Meaning Representation (AMR), a linguistic model that compares the semantic difference between two sentences based on their semantic representations. A scoring system should not only satisfy the requirements of complex scoring rubrics but also meet the conditions of a language proficiency test. Our results show that BERT outperforms the three competing models in predicting accurate score levels and also exhibits the characteristics of a criterion-referenced test, which could theoretically express the standards of a language proficiency test.
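The core idea of similarity-based level scoring, comparing a learner's caption with reference answers written for each score level, can be illustrated with a minimal sketch. This is not the author's implementation: it assumes hypothetical 3-dimensional toy word vectors in place of a trained Word2Vec model, averages them into a sentence vector, uses cosine similarity, and assigns the level of the most similar reference answer. The names `TOY_VECTORS`, `LEVEL_ANSWERS`, `score_answer`, the `threshold` of 0.8, and the `fail_level` of 0 are all illustrative assumptions. The paper's best-performing model (BERT) would replace the averaged static vectors with contextual sentence representations.

```python
import math
import re

# Hypothetical toy word vectors standing in for a trained embedding such as
# Word2Vec; a real system would load pretrained vectors instead.
TOY_VECTORS = {
    "a":        [0.1, 0.0, 0.1],
    "man":      [0.9, 0.1, 0.0],
    "boy":      [0.8, 0.2, 0.1],
    "is":       [0.1, 0.1, 0.1],
    "riding":   [0.0, 0.9, 0.2],
    "bike":     [0.1, 0.8, 0.6],
    "bicycle":  [0.1, 0.8, 0.7],
    "dog":      [0.7, 0.0, 0.9],
    "sleeping": [0.0, 0.1, 0.9],
}

# Hypothetical reference answers for one picture, keyed by score level.
LEVEL_ANSWERS = {
    3: ["a man is riding a bicycle"],
    2: ["a man is riding"],
    1: ["a man"],
}


def sentence_vector(sentence):
    """Average the vectors of in-vocabulary tokens; return None if none are known."""
    tokens = [t for t in re.findall(r"[a-z]+", sentence.lower()) if t in TOY_VECTORS]
    if not tokens:
        return None
    dim = len(next(iter(TOY_VECTORS.values())))
    return [sum(TOY_VECTORS[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def score_answer(user_sentence, level_answers, threshold=0.8, fail_level=0):
    """Return (level, similarity) of the most similar reference answer.

    If no reference reaches the (hypothetical) threshold, the answer is treated
    as off-target and receives fail_level.
    """
    user_vec = sentence_vector(user_sentence)
    if user_vec is None:
        return fail_level, 0.0
    best_level, best_sim = fail_level, 0.0
    for level, answers in level_answers.items():
        for answer in answers:
            ref_vec = sentence_vector(answer)
            if ref_vec is None:
                continue
            sim = cosine(user_vec, ref_vec)
            if sim > best_sim:
                best_level, best_sim = level, sim
    if best_sim < threshold:
        return fail_level, best_sim
    return best_level, best_sim


if __name__ == "__main__":
    for caption in ["A boy is riding a bike.", "a man", "a dog is sleeping"]:
        level, sim = score_answer(caption, LEVEL_ANSWERS)
        print(f"{caption!r}: level {level} (similarity {sim:.3f})")
```

Here "A boy is riding a bike." is closest to the level-3 reference, while "a dog is sleeping" falls below the assumed threshold and receives level 0. The fixed threshold is one simple way to keep scores tied to an absolute standard rather than to the other test takers, which loosely mirrors the criterion-referenced property the abstract attributes to BERT.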

Keywords

# computer-assisted language learning # deep learning # English writing # word embedding # criterion-referenced test # scoring
