A strategy to encourage and evaluate Engineering students’ understanding of structural design concepts

This project was developed by James Lim, Vicente Gonzalez and Raj Das from Civil Engineering, Faculty of Engineering, at the University of Auckland. The resource is a strategy for helping students grasp threshold concepts by having them provide written responses to questions through an online web application, Xorro-Q. It might be useful to any teachers of difficult concepts across STEM disciplines.

The project was supported by the SEED Fund grants for 2017 and was trialed by James’ students of CIV211: Structures and Design I that year.

For similar projects, see: Learning Writing by Using Scenario-Based Questions; Writing Reflective Scripts to Overcome Misconceptions and Difficulties in Maths; Interteaching to Enhance Student Experience and Performance; Operations Management Simulator.

Project background


1) Students do not easily understand threshold concepts, and mathematically-driven lectures may not be the most effective means of helping students;
2) Requiring students to write engages them more in the thinking process of conceptual understanding, making their learning more efficient.

James Lim, who has experience in conveying threshold and higher-level threshold concepts in second-year mechanics, decided to adopt a number of techniques for teaching his students, including:

1) explanation of concepts without mathematics before demonstration of the maths behind the concepts;
2) LIVESCRIBE/PENCAST recordings;
4) use of on-line feedback-rich activities developed in XORRO-Q with third-year student volunteers. These assess and enhance conceptual understanding (for the second-year students). They allow students to repeat the activities until the concept is demonstrably grasped. (XORRO-Q is an automated assessment tool now integrated into CANVAS. It uniquely enables students to submit sketches – say, of a structure deflecting – providing automated assessment and relevant feedback to the students, as well as class-wide views for the lecturer).

Evidence from student surveys and feedback sessions suggested to James that these initiatives are well received. At the same time, he felt that exam and class test results still demonstrated that certain threshold concepts (such as bending moment diagrams) were not fully grasped by his students. James had recently introduced a discursive writing question in the class tests that asked the students to convey their understanding of such concepts.

In Engineering education, achieving early understanding of key threshold concepts has a significant impact on students’ success in the subsequent years. Integration of these threshold concepts (the ability to connect and apply them to new or novel problems) is a particularly challenging aspect of education across Engineering and various other disciplines, including Science and Economics.

The hypothesis that James wished to investigate in his SEED grant project is that writing and presenting explanations to peers for conceptually challenging subjects can lead students to address gaps in their underlying understanding of concepts rather than merely to adopt ritualised, formula-based approaches when attempting to solve a problem.

He decided that students would work in small groups, and provide a written conceptual explanation of a concept and its application to a tutorial question (a separate activity, for which a mathematical solution would already be provided). Having provided such a written explanation, the students would then be asked to peer-evaluate the explanations of other students. The hypothesis would be tested through class-surveys and focus groups. James chose XORRO-Q as the software to provide the framework around which the groups and peer evaluation would be organised.

The project ran in Summer School, Semester One and Semester Two, 2017.

Project reflection

Engineering students are trained to do the maths and apply formulas, and they can usually do so with ease. However, in order to apply the right formula to solve a specific question, and to understand what they are doing and why, our future engineers first need to comprehend the underlying concepts.

One of the best tools for teachers to make sure that their students understand such concepts is test questions. The drawback of this tool is the large amount of time the teacher needs to evaluate the answers and grade the test. This time cost makes it impractical to repeat the exercise and so test the students’ understanding of concepts continuously.

Using a software tool like Xorro-Q automates the grading process because, through the use of keywords or phrases, it detects students’ understanding of concepts taught. Here, the difficulty lies in getting the tests “sensitive” enough in order to detect all the correct answers, and specific enough to avoid “false positive” answers.
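To make the idea of keyword-based marking concrete, here is a minimal sketch in Python. It is not Xorro-Q's actual algorithm, which we do not describe here; the function names, the marking scheme and the sample answers are all hypothetical. It assumes an answer is credited when it contains at least one phrase from each group of required ideas.

```python
import re


def normalise(text: str) -> str:
    """Lower-case the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def mark_answer(answer: str, required_groups: list[list[str]]) -> bool:
    """Return True if the answer contains a phrase from every required group.

    Each group lists interchangeable wordings of one idea (hypothetical data).
    """
    padded = f" {normalise(answer)} "
    return all(
        any(f" {normalise(phrase)} " in padded for phrase in group)
        for group in required_groups
    )


# Hypothetical marking scheme for "What does a bending moment diagram show?"
BMD_SCHEME = [
    ["bending moment"],
    ["along the beam", "along the length", "along the member"],
]

good = "It plots how the bending moment varies along the length of a beam."
reworded = "A graph of internal moment at each position on the member."

print(mark_answer(good, BMD_SCHEME))      # credited
print(mark_answer(reworded, BMD_SCHEME))  # not credited, although arguably correct
```

The second answer shows how a correct but differently worded reply can be missed (a false negative). Adding more phrases to a group raises sensitivity, but overly loose phrases risk crediting wrong answers.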

By the sensitivity of a test we mean the ratio of the number of positively identified positive answers (true positive) to the number of total positive answers (true positive and false negative). The specificity we define as the ratio of the number of correctly identified negative answers (true negative) to the number of total negative answers (true negative and false positive).
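In symbols, writing TP, FN, TN and FP for the numbers of true positive, false negative, true negative and false positive answers, the two definitions above read:

```latex
\[
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}
\]
```

As a purely illustrative example (the numbers are made up), if 40 replies were genuinely correct and the tool credited 30 of them, then TP = 30, FN = 10 and the sensitivity would be 30/40 = 0.75.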

It took us several iterations of testing the questions and going over the corrections given by Xorro-Q in order to detect false positive and false negative answers and so improve the correction tool.

What did our students think of the exercise? Overall, they were very receptive to the task and appreciated being able to repeat their attempts at the test questions. In the feedback, some students were disappointed that the tool failed to positively identify some of their answers. At other times, they were confused about how to interpret the questions correctly, and/or what was expected from them as an answer.

Our further work on this project will include continuing the improvement of both the sensitivity and specificity of the automated correction of questions through the Xorro-Q web application. This includes extracting the data and analysing false positive and false negative results in order to amend keywords and phrases that the application looks for in the text replies.
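The analysis described above could be sketched as follows. This is an assumption about the workflow, not a description of any Xorro-Q export format: it supposes each text reply carries both the automated mark and a mark from a human reviewer, and all names and sample replies are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarkedReply:
    text: str
    auto_correct: bool   # mark given by the automated tool
    human_correct: bool  # mark given by a human reviewer (treated as the truth)


def confusion_counts(replies):
    """Return (TP, FP, FN, TN) for automated marks against human marks."""
    tp = sum(r.auto_correct and r.human_correct for r in replies)
    fp = sum(r.auto_correct and not r.human_correct for r in replies)
    fn = sum(not r.auto_correct and r.human_correct for r in replies)
    tn = sum(not r.auto_correct and not r.human_correct for r in replies)
    return tp, fp, fn, tn


def ratio(numerator, denominator):
    """Divide safely; return NaN when there is nothing to measure."""
    return numerator / denominator if denominator else float("nan")


# Hypothetical replies, one per outcome
REPLIES = [
    MarkedReply("moment varies along the beam", True, True),
    MarkedReply("bending moment along the beam is always zero", True, False),
    MarkedReply("internal moment at each position", False, True),
    MarkedReply("it shows the weight of the beam", False, False),
]

tp, fp, fn, tn = confusion_counts(REPLIES)
sensitivity = ratio(tp, tp + fn)
specificity = ratio(tn, tn + fp)

# Replies worth reading when amending keywords and phrases
false_negatives = [r.text for r in REPLIES if r.human_correct and not r.auto_correct]
false_positives = [r.text for r in REPLIES if r.auto_correct and not r.human_correct]
print(sensitivity, specificity, false_negatives, false_positives)
```

Reading the false negatives suggests phrases to add to the marking scheme, while the false positives point to phrases that are too permissive.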

We are still analysing some data (as a “writing” question was also included in the final exam at the end of the previous semester), but this is what we have got so far.

The particular activity (prepared by last year’s intern, Emily Brandsma) was run once during a class (with 42 participants), and was accessed as a self-paced online activity on CANVAS by 248 students (pretty much the entire cohort). A total of 691 results were collected, meaning that, on average, students made 691 divided by 248, or about 2.8, attempts at the activity. As a result, we have a great set of text contributions, which can now be used to improve the same questions for next year.

The maximum score available for the activity was 24 points. The following table shows how the class fared as their attempts grew (this data is from the self-paced CANVAS group only, i.e., it excludes the in-class data and any test data by myself or others who would have found it too easy).

Attempt 1 2 3 4 5 6 7 8
Count 249 182 102 55 31 19 11 8
Min 0 0 0 0 2 0 0 0
LQ 4 7 9.25 9 11 10 4 4
Median 8 12 14 15 16 17 7 15
UQ 12 17 17.8 18 20 18 18 19
Max 24 23 24 22 24 23 24
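For readers who want to produce the same kind of summary from their own data, here is a minimal Python sketch. The scores below are hypothetical, and the choice of the "inclusive" quartile method is an assumption; we do not state here which method produced the table above.

```python
import statistics


def five_number_summary(scores):
    """Summarise one attempt's scores as count, min, quartiles and max."""
    lq, median, uq = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "Count": len(scores),
        "Min": min(scores),
        "LQ": lq,
        "Median": median,
        "UQ": uq,
        "Max": max(scores),
    }


# Hypothetical scores out of 24 for a single attempt
example_scores = [0, 4, 8, 12, 24]
summary = five_number_summary(example_scores)
print(summary)
```

Running the function once per attempt number gives one column of the table for each attempt.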



It’s pretty impressive, I think, that more than 12% of the class completed five or more attempts. One participant had 20 goes at it, lifting his score from “2” in the first attempt to “22” in his last (an almost perfect score).
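The figures quoted here can be checked with a few lines of Python. The sketch assumes that the Count row of the table gives the number of students who reached each attempt, which is our reading of the data; the variable names are ours.

```python
# Count row of the table: students who reached attempts 1 to 8
attempt_counts = [249, 182, 102, 55, 31, 19, 11, 8]

# Share of the self-paced group that reached a fifth attempt
share_five_plus = attempt_counts[4] / attempt_counts[0]

# Average attempts per student: 691 results from 248 students
mean_attempts = 691 / 248

print(f"{share_five_plus:.1%} made five or more attempts")
print(f"{mean_attempts:.1f} attempts per student on average")
```

Both results agree with the text: a little over 12% of the group, and about 2.8 attempts per student.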

When we look at the data from students who made more than six or seven attempts, the sample size is too small and the results become harder to interpret. These could be students who are really struggling with the English language, or perfectionists pushing for top marks, or perhaps people who are satisfied with their maximum score but are doing repeat rounds to work out answers to just one or two questions (and so leaving other questions blank, scoring zero). The data from the students who made the highest number of attempts are, I think, less useful to us than the rest.

The median score rises from 8/24 in the first attempt to 16/24 by the fifth attempt. The LQ and UQ also rise over the same attempts (from 4 to 11 and from 12 to 20, respectively). So there is obvious learning taking place…

There were 47 feedback submissions made by students. Of these, the great majority complained about the obvious errors made by Xorro-Q; that they made submissions which were marked as wrong; that they found the task too tedious… But there were some positive responses too. What follows is a pretty representative mix of student comments, which, I think, can be taken as an overall positive response, provided that we improve the test questions to enhance assessment as well as improve the feedback given to the students.

frustrating activity. I do feel i was forced to think deeply but please go over my answers once again as i believe i have missed out on marks which i should have been awarded.
Good concept of the questions however it is very fussy if relying on key words to judge whether the answer is correct or not. Improvements all in time I suppose :).
Good learning experience
great questions. learned a lot. liked the answers provided. very helpful. thank you.
these are some good questions. they really got me thinking.
this is a waste of time
this is actually quite silly of an exercise. haha im tilted
This thing is super annoying, however very useful I feel.
Solution for these type questions or tips.
I dont understand how the questions i got wrong are completely wrong
i feel as though im getting the answers correct but they keep marking me wrong
I feel like my definitions for second moment of area and radius of gyration are correct, however I didn’t get any marks for them.
I got it kin
Im pretty sure my answers are right, or somewhat on the right track but it keeps marking zero
im pretty sure that the answers that got marked wrong were actually right… the wording was perhaps different or an idea was presented differently, but still right, right?

It’s hard to know whether my answers are wrong, or if my method of explaining it is not being registered by the system. I think the idea of this is nice but perhaps needs more development before it is a useful learning tool. I have only become more confused since doing this activity


I think this is encouraging, especially considering that these are responses from the very first round of participants tackling an activity which we knew would be full of errors needing to be fixed once we got the first participation data back.
