Given the statement above, it is disappointing that in recent years, while learning analytics (LA) has increasingly offered opportunities to monitor student learning through online assessment, the capacity to analyse students’ free text responses has not advanced as rapidly. Although there are useful tools for checking grammar and style, more complex text analysis has largely remained the domain of linguistics specialists.

The CLeaR SEED fund recently sponsored a text analytics (TA) seminar series led by Dr Jenny McDonald, who is researching the educational affordances of TA and developing a tool to unlock and utilise data emanating from student responses to open-ended questions, particularly in the short answer format.

Mining this rich source of data is limited by the scarcity of TA tools that work effectively on short documents. Jenny’s software aims to address this gap while providing information to teachers and feedback to students. She would like to see it used to encourage a teaching and learning dialogue on the meaning that students are taking from their teachers. She has found that students often cling to the words that they understand and insert them incongruously into their writing, developing weird mash-ups of correct and incorrect memories.

This post contains my impressions of the sessions I attended in this seminar series. Since I don’t claim to be an expert in LA or TA, I’ve provided links to further information, to (hopefully) avoid a ‘weird mash-up of correct and incorrect memories’. I’ve been a fly on the wall at many discussions among LA enthusiasts but I confess that, while I find its potential exciting and happily explore and make use of user-friendly LA, my eyes can glaze over when the discussion grows more technical or esoteric.

In her introduction, Jenny focused on three key points:

  • Language as data: What can the language of learning environments tell us about our teaching and student learning?
  • Teaching and learning as dialogue: Making the case for the central dialogic relationship between teaching and learning
  • Practical approach to analysis: Introduce simple tools to support reflection on language in use.

Text analysis isn’t new. At the first opportunity, one attendee, a linguistics and literature specialist, promptly imported a Jane Austen novel and ran it through one of the TA programs suggested in the presentation. The Gutenberg corpus provides a standard for comparison for older literature, enabling analysis of things like gendered language and character interaction in Jane Austen’s novels. Digital corpora in many languages for a wide range of disciplines now provide relevant standards for text analysis in many other contexts.

Some TA tools can help you gauge whether you are teaching at the right level. You can compare the key words (which commonly float to the top of an analysis) used by experts with those used by students. If there is a big disparity you might want to re-think the way you are doing things. We explored several indices of readability including the delightfully named SMOG filter (Simple Measure of Gobbledygook – it is based on the number of words of three or more syllables in a sample of sentences, and provides an indication of the reading level required, in terms of years of schooling, in order to make sense of the text). It can similarly prove helpful to establish whether you are addressing your audience appropriately. (The SMOG filter is valued in public health to ensure brochures for general consumption are written appropriately for the lay person.)
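For anyone curious about what sits behind a SMOG score, here is a minimal sketch in Python. It uses the commonly published SMOG formula (polysyllable count scaled to a 30-sentence sample). The syllable counter is my own rough assumption, based on counting vowel groups, and real tools use better methods, so treat the output as indicative only. The function names are made up for illustration and are not taken from any of the tools shown at the seminars.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("made"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def smog_grade(text: str) -> float:
    """Estimate the years of schooling needed to understand the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z']+", text)
    # SMOG only counts words of three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = "Immunological complications necessitate pharmacological intervention."
    print(round(smog_grade(simple), 2), round(smog_grade(dense), 2))
```

The formula was designed for samples of around 30 sentences, so scores for very short passages like these are only a rough guide.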

Looking at key words or phrases in responses to short answer questions can help you pick up common student misconceptions. You can then address them either online or face-to-face in class.
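As a rough illustration of the idea, the sketch below counts the words in a model answer and in a handful of student answers, then lists the expert terms the students never use and the student terms the expert never uses. The answers, the stop word list and the function name are all invented for this example, and this is not how any particular TA tool works internally.

```python
import re
from collections import Counter

# A tiny, made-up stop list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "into", "from", "down", "is", "and"}


def term_counts(texts):
    """Count how often each non-stop word appears across the given texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


# Invented example data.
expert_answer = "Oxygen diffuses from the alveoli into the blood down a concentration gradient."
student_answers = [
    "Oxygen is pumped from the alveoli into the blood.",
    "The lungs pump oxygen into the blood.",
    "Oxygen diffuses into the blood.",
]

expert = term_counts([expert_answer])
students = term_counts(student_answers)

# Expert terms that no student used, and student terms the expert did not use.
missing_from_students = sorted(set(expert) - set(students))
only_in_students = sorted(set(students) - set(expert))

if __name__ == "__main__":
    print("Missing from student answers:", missing_from_students)
    print("Only in student answers:", only_in_students)
```

In this invented example, the appearance of 'pump' and 'pumped' alongside the absence of 'concentration' and 'gradient' might hint that some students picture an active pumping process rather than diffusion, which is the sort of thing you could then raise in class.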

TA can also help establish how fluent students are in the language of your discipline. Another attendee – also a linguist – uses Lex tutor to discover whether her (Business) students are readily using the appropriate ‘academic’ words. As formative feedback, she tells students the outcome. Showing you understand the language of the discipline – as determined by Lex tutor – earns marks in this paper.

But as Jenny stressed, language is complex and difficult to contain within algorithms and the like. She illustrated this with quotes from a rare BBC recording of Virginia Woolf:


Virginia Woolf
‘Words belong in the mind’
 ‘[Words] are the wildest, free-est, most irresponsible, most unteachable of all things’

Jenny urged us to make this our mantra: “Text analysis is not always perfect!”

Often it is difficult to delve deeply without information on your specific context. Jenny’s software, called Quantext, aims to make it easy to customise for context and will be the subject of a pilot study in Semester 2 this year. You can:

  • edit the blacklist (words excluded from the analysis, eg ‘stop’ words like ‘to’, ‘the’)
  • edit the ‘white list’ (which ensures particular words are included in the analysis)
  • turn punctuation on or off
  • add common mis-spellings
  • filter for stemming (eg teach, teaching, taught all have the same stem)
  • replace negatives (eg not increasing may = decreasing)
  • find key pairs – words that are paired more often than expected by chance
  • create categories or labels and apply them to student answers.
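To give a feel for what several of these options do to a student answer, here is a minimal sketch in Python. It is not Quantext's code. The word lists, the function names and the rule that the white list overrides the blacklist are all my assumptions, and key pairs are scored here with pointwise mutual information, which is one common way of finding words that occur together more often than chance would predict. Stemming is left out because it really needs a library such as the Natural Language Toolkit.

```python
import re
from collections import Counter
from math import log2

# Hypothetical settings, invented for illustration.
BLACKLIST = {"the", "to", "a", "is", "of", "not"}
WHITELIST = {"not"}                                 # assumed to override the blacklist
MISSPELLINGS = {"presure": "pressure"}
NEGATIONS = {("not", "increasing"): "decreasing"}


def preprocess(answer, keep_punctuation=False):
    """Turn one answer into a list of cleaned tokens."""
    pattern = r"[a-z]+|[.,;:!?]" if keep_punctuation else r"[a-z]+"
    tokens = re.findall(pattern, answer.lower())
    tokens = [MISSPELLINGS.get(t, t) for t in tokens]
    # Replace negated pairs, eg "not increasing" becomes "decreasing".
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in NEGATIONS:
            merged.append(NEGATIONS[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return [t for t in merged if t in WHITELIST or t not in BLACKLIST]


def key_pairs(token_lists, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    words, pairs = Counter(), Counter()
    for tokens in token_lists:
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    n_words, n_pairs = sum(words.values()), sum(pairs.values())
    scored = []
    for (a, b), c in pairs.items():
        if c >= min_count:
            expected = (words[a] / n_words) * (words[b] / n_words)
            scored.append(((a, b), log2((c / n_pairs) / expected)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    answers = ["Blood pressure rises.", "The blood presure is not increasing."]
    cleaned = [preprocess(a) for a in answers]
    print(cleaned)
    print(key_pairs(cleaned))
```

A positive PMI score means the pair turns up together more often than the individual word frequencies would suggest. With only a handful of answers the scores are unreliable, so the `min_count` threshold is there to ignore pairs seen only once.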

You can read more detail on Jenny’s rationale, and an analysis of student written responses to short-answer questions posed in a large first year health sciences course, in:

Short answers to deep questions: supporting teachers in large-class settings. McDonald, J., Bird, R.J., Zouaq, A., Moskal, A.D.M. (2017). Journal of Computer Assisted Learning.

For further information on Quantext and the pilot study, you can contact either Jenny or A/Prof Cathy Gunn or Dr Claire Donald at CLeaR.


Useful links provided at the workshops

Corpus MOOC

Basic online tools

More advanced online tools

Full featured tools – download or registration required
Kilgarriff, Adam, et al. The Sketch Engine: ten years on. Lexicography (2014): 1–30. Available from
Anthony, L. (2014). AntConc (Version 3.4.3) [Computer Software]. Tokyo, Japan: Waseda University. Available from

Programming libraries
Natural Language Toolkit for Python

Introduction to Text Mining in R

Writing Style
The Writer’s Diet
