Proctorio
Abstract:            In this paper we report on the outcome of a controlled experiment using one of the widely available and used
                     online proctoring systems, Proctorio. The system uses an AI-based algorithm to automatically flag suspicious
                     behaviour, which can then be checked by a human agent. The experiment involved 30 students, 6 of whom
                     were asked to cheat in various ways, while 5 others were asked to behave nervously but take the test honestly.
                     This took place in the context of a Computer Science programme, so the technical competence of the students
                     in using and abusing the system can be considered far above average.
                     The most important findings were that none of the cheating students were flagged by Proctorio, whereas only
                     one (out of 6) was caught out by an independent check by a human agent. The sensitivity of Proctorio, based
                     on this experience, should therefore be put at very close to zero. On the positive side, the students found
                     (on the whole) the system easy to set up and work with, and believed (in the majority) that the use of online
                     proctoring per se would act as a deterrent to cheating.
                     The use of online proctoring is therefore best compared to taking a placebo: it has some positive influence,
                     not because it works but because people believe that it works, or that it might work. In practice, however,
                     before adopting this solution, policy makers would do well to balance the cost of deploying it (which can be
                     considerable) against the marginal benefits of this placebo effect.
Bergmans, L., Bouali, N., Luttikhuis, M. and Rensink, A.
On the Efficacy of Online Proctoring using Proctorio.
DOI: 10.5220/0010399602790290
In Proceedings of the 13th International Conference on Computer Supported Education (CSEDU 2021) - Volume 1, pages 279-290
ISBN: 978-989-758-502-9
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
CSEDU 2021 - 13th International Conference on Computer Supported Education
we concentrate on one particular solution that has found widespread adoption: that of online proctoring. In particular, we describe an experiment in using one of the three systems for online proctoring that have been recommended in the quickscan (see (Quickscan SURF, 2020)) by SURF, a “collaborative organisation for ICT in Dutch education and research” of which all public Dutch institutes of higher education are members.¹

Approach. Online proctoring refers to the principle of remotely monitoring the actions of a student while she is taking a test, with the idea of detecting behaviour that suggests fraud. The monitoring consists of using camera, microphone and typically some degree of control over the computer of the student. The detection can be done by a human being (the proctor, also called invigilator in other parts of the Anglo-Saxon world), by some AI-based algorithm, or by a combination of both.

The question we set out to answer in this paper is: how well does it work? In other words, is online proctoring a good way to detect actual cheating without accusing honest students; in more formal terms, is it both sensitive and specific? How do students experience the use of proctoring?

In answering this question, we have limited ourselves to a single proctoring system, Proctorio², which is one of the three SURF-approved systems of (Quickscan SURF, 2020). The main reason for selecting Proctorio is the usability of the system; it is possible to use it on the majority of operating systems by installing a Google Chrome extension and it can be used for large groups of students. It features automatic detection of behaviour deemed suspicious in a number of categories, ranging from hand and eye movement to computer usage or sound. The teacher can select the categories she wants to take into account, as well as the sensitivity level at which the behaviour is flagged as suspicious, at any point during the proceedings (before, during or after the test). Proctorio outputs an annotated real-time recording for each student, which can be separately checked by the teacher so that the system’s suspicions can be confirmed or negated. The system is described in some detail in Section 2.

Using Proctorio, we have conducted a controlled randomized trial involving 30 students taking a test specifically set for this experiment. The students were volunteers and were hired for their efforts; their results on the test did not matter to the experiment in any way. The subject of the test was a first-year course that they had taken in the past, meaning that the nature of the questions and the expected kind of answers were familiar. Six out of the 30 students were asked to cheat during the test, in ways to be devised by themselves, so as to fool the online proctor; the rest behaved honestly. Moreover, out of the 24 honest students, five were asked to act nervously; in this way we wanted to try and elicit false positives from the system.

Besides Proctorio’s capabilities for automatic analysis, we also conducted a human scan of the (annotated) videos, by staff unaware of the role of the students (but aware of the initial findings of Proctorio). We expected that humans would be better than the AI-based algorithm in detecting certain behaviours as cheating, but worse in maintaining a sufficient and even level of attention during the tedious task of monitoring.

Findings. Summarising, our main findings were:

 • The automatic analysis of Proctorio detected none of the cheating students; the human reviewers detected 1 (out of 6). Thus, the percentage of false negatives was very large, pointing to a very low sensitivity of online proctoring.

 • None of the honest students were flagged as suspicious by Proctorio, whereas one was suspected by the human reviewer. Thus, the percentage of false positives was zero for the automatic detection, and 4% for the human analysis, pointing to a relatively high specificity achievable by online proctoring (which, however, is quite useless in the light of the disastrous sensitivity).

Furthermore, we gained valuable insights into the conditions necessary to make online proctoring an acceptable measure in the opinion of the participating students.

The outcome of the experiment is presented in more detail in Section 3, and discussed in Section 4 (including threats to validity). After discussing related work (Section 5), in Section 6 we draw some conclusions.

2    EXPERIMENTAL SETUP

To prepare the experiment, we had to find and instruct participants, choose the technical setup, and determine what kind of data we wanted to gather besides the results of Proctorio’s automatic fraud detection.
   ¹ See https://surf.nl
   ² See https://proctorio.com/
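
As a rough illustration of the terms sensitivity and specificity used in the findings, the sketch below recomputes both measures from the reported counts: 6 cheating and 24 honest students, of whom Proctorio flagged none, while the human reviewers caught 1 cheater and suspected 1 honest student. The class and variable names are illustrative assumptions and are not part of Proctorio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Confusion-matrix counts for one detection method (illustrative helper)."""
    true_positives: int   # cheaters that were flagged
    false_negatives: int  # cheaters that were missed
    false_positives: int  # honest students that were flagged
    true_negatives: int   # honest students that were cleared

    def sensitivity(self) -> float:
        # Share of actual cheaters that were detected.
        return self.true_positives / (self.true_positives + self.false_negatives)

    def specificity(self) -> float:
        # Share of honest students that were correctly cleared.
        return self.true_negatives / (self.true_negatives + self.false_positives)


# Counts taken from the findings: 6 cheaters, 24 honest students.
proctorio = DetectionCounts(true_positives=0, false_negatives=6,
                            false_positives=0, true_negatives=24)
human = DetectionCounts(true_positives=1, false_negatives=5,
                        false_positives=1, true_negatives=23)

for name, counts in (("Proctorio", proctorio), ("Human review", human)):
    print(f"{name}: sensitivity={counts.sensitivity():.1%}, "
          f"specificity={counts.specificity():.1%}")
```

The false-positive rate of the human review, 1 out of 24, is the roughly 4% quoted in the findings.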
Participant Selection. At the time we carried out the experiment, the issue of online proctoring had already received quite a bit of attention nationally and had been discussed among the Computer Science students; even though the University of Twente had early on decided not to rely on online proctoring, the attitude of the student body was overwhelmingly negative. Thus, we took our experiment as an opportunity to involve the students more constructively and to be transparent in our investigation. Moreover, we wished to remove confounding influences, such as the intrinsic motivation of students to score well on the test. In this light, we made the following choices regarding the selection of participants:

 • We called for volunteer participants among all the Bachelor Computer Science students, this being the typical target group for online proctoring.

 • Participants were paid as Teaching Assistants for the time they spent in doing the test and providing feedback.

 • The proctored test was created especially for the purpose of the experiment; though it mimicked a real test that all students had already taken (possibly a year or more earlier), it was not part of the actual test schedule.

A one-hour digital information session was organized where interested students were informed about the set-up of the test, the Proctorio system and their privacy. Students had the opportunity to pose questions, which turned out to be mainly related to the storage and use of their data by the Proctorio system and the university. Students could volunteer to participate at any point before, during or after this information session.

In total, 30 students volunteered, out of a total Bachelor population of 720; all of the volunteers were admitted to participate in the experiment. The group of participants was geographically diverse, reflective of the actual population in the study programme: 46% were international students, some of them physically located outside the Netherlands at the time of the test.

The participants were divided into different populations:

 • Cheating: A group of 6 students who attempted to fool the system by cheating without detection. After the information session, some of the students themselves indicated their willingness to take up this role. We invited these students and randomly selected several others. As we needed a majority group to compare with, we decided on selecting a small sample of 6 to be cheating students.

 • Nervous: A group of 5 students who were asked to act nervous, by moving, looking around or muttering more than they would normally do. (They were not instructed in particular ways to act nervous.) The underlying hypothesis was that such behaviour could possibly generate false positives. These students were randomly selected from the participant list (after the cheaters had been chosen) and were asked to confirm their willingness to take up this role.

 • Default: A control group of 19 students who took the test as though this was a real exam, behaving as they would under ordinary circumstances and not trying to cheat.

The six cheating students were not given a specific task. They did their own research and decided on their course of action. They were informed of the identities of the other cheaters, so that they could discuss their approach among themselves and stimulate each other to come up with creative ideas. We did ask the students to concentrate on digital/technical methods for cheating, as that would give us insight into the technical sensitivity of the system. (This was also based on an earlier, much smaller and more informal test in which it became apparent that “classical”, non-technical methods such as cheat sheets were virtually undetectable by Proctorio.)

Technical Setup. In order to set up online proctoring, we used Proctorio, integrated with the main application for digital testing that is used at the university: Remindo.³ In order to take part, students had to install a Google Chrome extension for Remindo on their computer (which is available under all three major OSs: Windows, Mac OS and Linux). Thanks to the integration, students could activate Proctorio when starting their exam in Remindo.

   ³ See https://www.utwente.nl/en/educational-systems/about-the-applications/remindo

Remindo has different settings, in particular concerning the use of any tools outside Chrome, or even switching to any tabs outside the one in which Remindo runs, for the duration of the test. For this experiment, we used the strictest setting: the test was presented in a web browser set to full screen, and no other tabs or applications were allowed.

Before the students conducted the exam, they were asked to do a system check. The goal of the system check was twofold: it gave us insight into technical difficulties students might encounter when working with Proctorio, and it was an opportunity for the students to get to know the system beforehand and start the actual exam well prepared.

Proctorio produces its results in the form of a so-called gradebook for each student. It uses a combination of audio, video and keystroke monitoring to analyse the student’s behaviour. In doing so, it uses so-called profiles that determine how suspicious certain types of behaviour are judged to be. To be precise, Proctorio distinguishes metrics and abnormalities (measured based on the difference between the behaviour of the individual student and the group average); a profile sets a weight for each of the metrics, and an on/off switch for each of the abnormalities. This results in a suspicion level ranging from 0 to 100 based on the weighted metrics and a count of flagged abnormalities for each gradebook. Boundaries can be set for judging the suspicion level to be low (by default up to 35), medium (by default between 36 and 59) or high (by default 60 or higher).

Proctorio has a default lenient profile. Besides this, we defined a couple of more severe profiles, which we called audio (weighing audio-related measures more heavily) and keystrokes (giving a higher weight to keystrokes and copy/paste behaviours). These, however, are not neutral; instead, we created them specifically with the aim to catch out those students whom we knew to be cheaters, without also accusing those we knew were honest (nervous or normal). In other words, we were trying to tune the system to its best achievable sensitivity and specificity based on the given gradebooks. Table 1 gives an overview of the profiles.

Apart from checking the computed suspicion levels and flagged abnormalities, one can also access the gradebooks directly, and check in more detail what happened, either as classified by Proctorio or through one's own inspection of the recorded input.

2.1    Additional Data

Besides the analysis results provided by Proctorio, we collected several other types of data.

First of all, the 30 gradebooks were reviewed by six reviewers (each gradebook by a single reviewer), all of whom were staff members. The reviewers did not know which students had been assigned which role. They noted which fraudulent actions they perceived, and compared their findings against the students’ own reports. In reviewing the gradebooks, the reviewers were guided by what the system had indicated as periods of abnormal activities, so their findings were not completely independent of the automatic detection system. We will come back to this in Section 3.

Next to the focus on the fraudulent actions, it was also important to gain a more general view on the process from a review perspective. Therefore, secondly, the reviewers were asked to document their approach and findings, to determine how human proctors can be used best to complement the automatic detection system.

Thirdly, the participants were asked to evaluate their findings, in two ways: the cheaters were asked to describe their approach, and all students filled in a survey, asking them about

 • ease of use,
 • technical possibilities,
 • privacy aspects, and
 • advice to the teachers.

3    OUTCOME OF THE EXPERIMENT

The overall results of the automatic detection (using the three profiles) and the judgement of the reviewer are shown in Table 2. Points to be noted are:

 • Some students had to restart the test several times, due to technical difficulties. The second column (“#”) shows the separate attempts.

 • The fourth column (“Cheat method”) summarises the actions that the cheating students had undertaken to circumvent the system. These are reported in Table 3 in more detail.

 • For each of the three Proctorio profiles we tried out (see previous section), the table reports the suspicion level, the rank (which is the place of the suspicion level in the overall set of results) and the flags (which is the number of abnormalities noted). The colour coding of the level columns is based on the default levels at which Proctorio considers the performance to be unsuspicious (green, 35 or below), slightly suspicious (yellow, between 36 and 59) or suspicious (red, 60 or higher). The rank columns are coloured red (for the top 20% of suspicion levels) to green (for the bottom 80%).

 • The final column (“Reviewer”) reports the findings of the human reviewer. Reviewers did not know which students were cheaters and which had been asked to act nervously. Videos with a suspicion level (in the default profile) of above 30% were reviewed entirely. Between 21% and 30%, reviewers clicked on all peaks that were indicated in Proctorio’s gradebook. In videos with a risk score of 20% or lower, only a spot check was performed.

Table 3 shows in more detail what actions the cheaters took, both successfully and unsuccessfully. As can be seen, several students used virtual machines, none
of which were detected either by Proctorio or by the reviewers. For the virtual machines, the screen resolution was odd, as the window was resized in order to fit the virtual machines onto the student’s screen, and this was flagged as an abnormality by Proctorio. However, many of the honest students were also flagged for irregular screen resolution. The audio level was also not flagged as abnormal by Proctorio, even for those who did use audio calls (in combination with a virtual machine) as their method of cheating.

To get a better idea of the usefulness of the Proctorio results, we refer to a recommendation from Technical University Eindhoven, which states that the top 20% of suspicion levels should always be manually reviewed. This means that, in a regular workflow using online proctoring, any students not ranked among the top 20% would certainly never be suspected of cheating. In our experiment, only one student (Test08) falls into that top 20% for the default (lenient) profile; two more (Test02 and Test06) rise to the top 20% under either the audio or the keystroke profile. We then took a final step, namely to try and create a dedicated Proctorio profile for each of the cheaters, in order to catch them out. If this fails for a given cheater, then we may conclude that the input data that Proctorio collects is, under no circumstances, sufficient to expose this student. (Of course, if a dedicated profile does show up a given cheater, that does not actually mean that it is a useful profile in general, as it was created based on prior knowledge about who was actually cheating.)

The results are shown in the last three columns of Table 3. Three out of six cheating students turn out to be undetectable by any means whatsoever. We also wish to recall that, even though Test02 and Test06 are in the top 20% under some profiles, this does not equal detection, as in both cases our human reviewer cleared the student, as reported in Table 2.

3.1    Reviewer Evaluation

The reviewers discussed the process and findings. The most important findings were:

 • You can’t see what students are doing from the chest down because of the way laptop cameras are aimed. If students were subtle they could use a phone / notes undetected.

 • The room scan is not a very useful feature. Students either moved the camera too quickly and made a blurry recording, or they failed to record their desktop.

 • Watching an entire recording is very boring, making it very hard to concentrate for long. Everyone changed to clicking highlights in the incident report instead.

 • The ID scanner does not always yield a clear picture. Sometimes we could not recognize the student.
 • Some students had a lot of sound flags due to outside noise. Others were flagged a lot for looking away because they were restless or working on an exam question on paper (on their desk).

3.2    Student Evaluation

As described in Section 2, the participating students were asked to provide us with their evaluation. All 30 participating students filled out the digital evaluation survey. The survey consisted of 17 questions (5 open and 12 closed questions). The highlights are presented below; Figure 1 presents some of the collected answers in the form of a bar chart.

Ease of Use. In general, students are positive about the user-friendliness of the Proctorio system. Four students had to contact the Proctorio help desk before they could start the exam. The issues were resolved and all 30 students were able to conduct the exam. The majority of students state that it is easy to start the exam in the Remindo-Proctorio combination and that the integration between these systems works well. About 75% of students state that Proctorio is a suitable option for remote assessment.

Technical Possibilities. All students were able to start the exam. Some students mentioned that they had technical issues during the exam, most of which were related to getting started with the exam, for example microphones and webcams that were not functioning.

Privacy Aspects. Initially, students were quite concerned about their privacy, especially when it comes to the storage of their data and how the data is handled after the exam process. Students have the following concerns when it comes to privacy:

 • Concerns about the distribution of the recording to third parties

 • Having to install an extension in the browser

 • Having to scan my room

 • How long will the data be stored and who has access to the data
Table 3: Actions undertaken by cheaters.

ID: Test02
  Proctorio default score (rank): 36 (13th)
  Caught by reviewer?: No
  Cheat actions (successful): Lowered volume on input mic, ran a Discord call in the system tray and continued to talk to people. Used a Bluetooth mouse that can switch between devices.
  Cheat actions (unsuccessful): Virtual machine attempted on different operating systems, but the virtualized camera never passed the system check.
  Dedicated Proctorio settings: Raise frame metric 'Audio levels' from 1 to 2; add environmental abnormality 'Audio levels'
  Proctorio dedicated score (rank): 48 (4th)
  Comments:

ID: Test05
  Proctorio default score (rank): 13 (28th)
  Caught by reviewer?: No
  Cheat actions (successful): Virtual box. Had websites and WhatsApp open during the exam, and class notes. Paused scrolling on the screen to minimize abnormal scrolling behavior.
  Cheat actions (unsuccessful): Tried to get webcam on a loop. Tried to find Proctorio source code
  Dedicated Proctorio settings: No combination of settings was found that could raise this student into the top 20%
  Proctorio dedicated score (rank): N/A
  Comments: Even with the strictest settings, the abnormalities do not go above 2

ID: Test06
  Proctorio default score (rank): 36 (10th)
  Caught by reviewer?: No
  Cheat actions (successful): Virtual box, similar to Test05. Used the web to look up answers.
  Cheat actions (unsuccessful):
  Dedicated Proctorio settings: Raise frame metric 'Keystrokes' from 0 to 1; add environmental abnormalities 'Mouse movement' and 'Clicking'
  Proctorio dedicated score (rank): 49 (7th)
  Comments:

ID: Test18
  Proctorio default score (rank): 48 (3rd)
  Caught by reviewer?: Yes
  Cheat actions (successful): Opened Discord in another desktop on her laptop so it was not visible in the taskbar. Friend Googled questions visible on her screen stream. Answers delivered over voice chat. Was able to disable microphone after
  Cheat actions (unsuccessful): Was locked out of Proctorio several times when attempting to switch screens.
  Dedicated Proctorio settings:
  Proctorio dedicated score (rank): N/A
  Comments: Student was already in the top 20% using the default profile
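
To make the thresholds concrete, the following minimal sketch applies the default suspicion-level boundaries from Section 2 (low up to 35, medium 36 to 59, high 60 or higher) and the review protocol from Section 3 (full review above 30, peaks only between 21 and 30, spot check at 20 or lower) to the default scores listed in Table 3. The function names are hypothetical; this is not Proctorio's API, only a restatement of the rules given in the text.

```python
def suspicion_band(level: int, low_max: int = 35, high_min: int = 60) -> str:
    """Map a 0-100 suspicion level to the default band names from Section 2."""
    if not 0 <= level <= 100:
        raise ValueError("suspicion level must lie between 0 and 100")
    if level <= low_max:
        return "low"
    if level >= high_min:
        return "high"
    return "medium"


def review_depth(level: int) -> str:
    """Reviewer protocol from Section 3, keyed on the default-profile level."""
    if level > 30:
        return "full review"
    if level > 20:
        return "peaks only"
    return "spot check"


# Default-profile scores of the cheaters listed in Table 3.
default_scores = {"Test02": 36, "Test05": 13, "Test06": 36, "Test18": 48}

for student, level in default_scores.items():
    print(student, level, suspicion_band(level), review_depth(level))
```

Under these rules none of the four listed default scores reaches the high band, and Test05, with a score of 13, would only have received a spot check.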
Advice. As the student opinion is a very important aspect of the acceptance of a proctoring solution, we asked the students what the University should definitely take into account when considering continuation of proctoring. A summary of the given answers:
 • Clearly communicate about the privacy aspects; which data is stored where and visible by whom?
 • There will always be students that try to outsmart a system
 • Prefer to have physical exams and only use proctoring when really needed, for the people that cannot come to campus
 • The room scan is not thorough enough and therefore makes it easy to work around (cheat)
 • Think about bathroom possibilities during the exams
 • For some exams it could be difficult to only work on one screen, which is an automated setting in Proctorio.

4 ANALYSIS

The takeaways of the results presented above are as follows:
1. Proctorio (in combination with Remindo, as used here) is an easy-to-use system for students and teaching staff;
2. When properly informed, students are not opposed to the use of online proctoring, though other testing methods are clearly preferred;
3. Proctorio cannot reliably (or in some cases not at all) detect technical cheats that Bachelor Computer Science students can come up with (in other words, its sensitivity is unacceptably low);
4. In seeming contradiction with the above, students are (in a clear majority) of the opinion that the use of Proctorio will prevent cheating.

The “seeming contradiction” between the demonstrated poor actual efficacy of online proctoring on the one hand and its perceived benefit on the other can at least partially be resolved by observing that the former is about detection, whereas the latter is about prevention. There are clearly some forms of cheating, such as sitting next to each other and openly collaborating, which would be so easy to detect using online proctoring that they are automatically prevented; these were not even tried out by our group of cheaters. In fact, such cheat methods would be detectable by a technically far less involved system than the one offered by Proctorio.

Granted that such “casual cheats” are prevented, what remains are the “technical cheats” such as the ones employed by our participants. We have shown that those are virtually undetectable through online proctoring; so the question is whether there is any preventive effect. Any such effect will have to stem from the perception of students that the chance of getting caught is
nevertheless non-zero. Since not all students are risk-averse, some of them have great confidence in their technical abilities, and some will even regard it as a challenge to “beat the system”, it follows that online proctoring will not suffice to prevent technical cheats. We therefore posit that the use of online proctoring as the primary way to ensure reliability of online testing is very dubious.

Internal Validity. As we have used an experimental setup, there are certain threats to the internal validity that we have had to take into account.

The first point is related to the student group that participated in the experiment. As the students were not graded for their effort, there was less at stake for them than in a real test. As a result, their stress level, especially in the group of cheaters, may have been lower than at an actual test, which would make it harder to detect cheats. On the other hand, the extrinsic incentive of being paid made them take their role very seriously, as is also visible in the Proctorio recordings.

Another issue related to the student group is the representativeness of the sample. Besides the limited number of participants (30), the selection process was not structured: students could show their interest to participate. This could lead to participants that have a strong opinion about proctoring, with increased motivation to cheat successfully. It is not known how well the sentiment of the experimental group reflects the student population. During the information session the importance of this experiment for the decision making of the University was also stressed, which might have influenced the students’ decision to participate.

As we wanted to know which kinds of cheating methods students would come up with, we did not give specific instructions to the cheaters. In consequence, they mostly selected somewhat similar approaches. There might be other cheat methods that were not tried out, to which our observations are therefore not directly applicable. We did ask the participants to focus on technical cheat methods because from a prior, more superficial check it had already become apparent that more traditional methods, such as the use of cheat sheets, are hard to detect with proctoring software.

External Validity. Our experimental student group consisted of only Computer Science students. These are certain to be more technically proficient than the average student, hence this might have implications for the external validity. Next to their technical abilities, Computer Science students also might find it motivating to enrich their knowledge about these kinds of new features and the possibilities to work around the system.

Next to giving the cheating students the freedom to select their own methods, we also informed them about the other cheating students, so that they could discuss their approach. In a real situation, it might be less likely that potential cheaters seek each other out, although anecdotally we have heard that students have done exactly that in some cases, in connection with real online tests.

A final threat to external validity is the fact that we have conducted our experiment using a single tool, Proctorio, and nevertheless have used the results to draw conclusions about the general principle of online proctoring. We believe that this is justified because Proctorio is representative of the cutting edge in tooling of this kind; we feel that it is unlikely that the shortcomings we have observed would be absent in other tools.

System Limitations. One of the criteria during the selection of Proctorio was that it should be possible for the majority of students to work with the tool without having any technical difficulties. The Proctorio system fits this need because it is used as a Google Chrome extension. The consequence of this approach is that virtual machines are hard to detect, because the extension has less influence on the hardware of the student. The students in our test quickly came to this conclusion as well and all decided to follow more or less a similar approach of working with a virtual desktop.

5 RELATED WORK

The worldwide shift towards online education induced by Covid-19 brought the conversation on the credibility of online assessment methods back to light. When on-site testing is no longer an option, an effective way to ensure students’ integrity during exams is a necessity to maintain the value of degrees that universities deliver around the world. The choice of a suitable proctoring tool amongst the plethora of products available is not trivial. Hussein et al. compared online proctoring tools to decide which should be adopted at the University of the South Pacific; the decision was to continue with Proctorio (Hussein et al., 2020).

At the University of Twente, we ran two prior experiments with candidate proctoring tools, the Respondus Lockdown Browser and the MyLabsPlus environment, in 2016 and 2017 (Krak and Diesvelt, 2016; Krak and Diesvelt, 2017). Our findings were that such tools did not preserve the validity
of digital exams, as both were proven to be vulnerable and surpassable in a plethora of ways. We have not found further research into the efficacy of such methods, besides these prior experiments and the current paper. Other online proctoring tools which record the examinees during their test face criticism related to privacy issues and to raising anxiety levels for test takers (Hylton et al., 2016). The privacy issues are also among the concerns found in (Krak and Diesvelt, 2016; Krak and Diesvelt, 2017).

Regarding online proctoring, we look at two related research lines, one that tackles the acceptance of these systems by examinees, and another that looks at how proctoring impacts the performance in a given test.

In 2009, using the Software Secure Remote Proctoring (SSRP) system, researchers conducted an experiment with 31 students from 6 different faculties in a small regional university to evaluate students’ acceptance of online proctoring tools. The results showed that slightly less than half the students expressed their support for online proctoring tools, whilst a quarter of the students expressed refusal of such proctoring techniques (Bedford et al., 2009). Lilley et al. investigated the acceptance of online proctoring with a group of 21 bachelor students from 7 different countries. Using ProctorU, the subjects participated in one online formative and two online summative assessments. 9 of the 21 participants shared their experiences with online proctoring, 8 of whom expressed their support for using online proctors in further modules (Lilley et al., 2016). A later experiment conducted by Milone et al. at the University of Minnesota in 2017 concerned a larger pool of students, 344, and showed that 89% of the students were satisfied with their experience using an online proctoring tool, ProctorU, for their online exams, while 62% agreed that the setup of the proctoring tool takes less than 10 minutes (Milone et al., 2017).

Another direction in proctoring research concerns the impact of proctoring tools on test scores. A study by Weiner and Hurtz contrasted on-site proctoring to online proctoring. The experiment concerned more than 14,000 participants and concluded that there is a high overlap between the scores of the examinees in both online and on-site settings. Furthermore, the examinees dissociated their test scores from the type of proctoring in place (Weiner and Hurtz, 2017). In a different setting, Alessio et al. compared the scores of students in proctored and unproctored settings. The study concerned 147 students enrolled in an online course on medical terminology. The experiment setting allowed students to be divided over 9 sections, according to their majors, 4 of which took an online-proctored test, whilst the remaining 5 took an unproctored test. The results of the study show that students in the unproctored setting scored significantly higher (14% more) than their proctored counterparts, and spent twice as much time taking the tests, which the investigators linked to unproctored tests allowing much space for cheating (Alessio et al., 2017). A similar result was achieved by Karim et al., whose experiment setup involved 295 participants who were handed two cognitive ability tests, one that is searchable online and one that is not. The experiment saw 30% of the participants withdrawing from the proctored test compared to 19% in the unproctored one; it also confirmed that unproctored examinees scored higher than the proctored ones. Opposing (Alessio et al., 2017), Hylton et al. administered an experiment with two groups of participants, wherein the first takes an unproctored exam while the other is proctored online. Though the results show that the unproctored examinees score 3% more than their proctored peers and spend 30% more time on the test, the researchers offer a different interpretation, linking the slightly lower results in proctored settings to higher anxiety levels (Hylton et al., 2016). A study conducted at the University of Minnesota shows slightly different results from (Alessio et al., 2017) and (Karim et al., 2014). In this setup, students taking a psychology minor were afforded the freedom of choosing on-site or online proctored exams. The study spanned three semesters and found that the scores of online examinees were 8% lower than those of their on-site counterparts for two semesters; this difference disappeared in the third semester, with both types of examinees scoring similar results (Brothen and Klimes-Dougan, 2015). A more recent study by Neftali and Bic compared the performance of students taking an online and an on-site version of the same discrete math course. The study found that while online students score higher in online homework, their results in the online proctored exams are 2% lower than those of their on-site peers.

Dendir and Maxwell (Dendir and Maxwell, 2020) report on a study run between 2014 and 2019, in which the scores of students in two online courses, principles of microeconomics and geography of North America, were compared before and after the adoption of a web-based proctoring tool in 2018, Respondus Monitor. The experiment showed that after the adoption of online proctoring the scores dropped on average by 10 to 20%. This suggests that, prior to the adoption of proctoring, cheating on online exams was a common occurrence. This confirms that the use of online proctoring has a preventive effect, as was also suggested in our own student survey.

Vazquez et al. (Vazquez et al., 2021) ran a study with 974 students enrolled in two sections, online
and physical, in Winter 2016 and Spring 2017 respectively, of a microeconomic principles course to investigate the effectiveness and impact of proctoring on students’ scores. For the face-to-face course, three exams were scheduled. The experiment showed that the unproctored students scored 11.1% higher than the students who took the exam with a live proctor in the first exam. The gap grew in favor of the unproctored students to 11.2% higher on the second exam, to reach 15.3% on the third. These differences however were smaller for online students who were proctored with a web-based proctor (ProctorU) in two exams. Unproctored students scored 5% higher in the first exam, and 0.8% higher on the second. Vazquez et al. tied the larger gap in proctored physical exams to student collaboration during exams.

6 DISCUSSION AND CONCLUSION

Most teachers and managers involved in the process of testing and the decisions on how to conduct it online have a very good grasp of the difficulties involved. For instance, the whitepaper (Whitepaper SURF, 2020) by SURF (the same organisation that performed the quickscan on privacy aspects in (Quickscan SURF, 2020)) gives a rather thorough analysis of risk levels and countermeasures to cheating. Online proctoring is merely one, and not the most favoured, of those countermeasures. This is also confirmed by students. It is therefore quite important to involve them as stakeholders when choosing to introduce proctoring as a preventive measure.

With this paper, we have aimed to inject some data into the discussion, of a kind that is not widely found nor easy to obtain, namely regarding the sensitivity of online proctoring (in other words, its ability to avoid false negatives). Without carrying out a controlled experiment, as we did, it is not really possible to say anything about this with confidence.

On the other hand, the experimental approach used also implies limitations (already discussed in Section 4) and suggestions for future work. Further research in real exam settings will provide more insight into the effectiveness of online proctoring. The voluntary, mono-disciplinary and relatively small sample that was used in this experiment also suggests that future work is needed. Conducting research on a bigger student population, coming from different disciplines, would give a more complete overview of the possibilities for the implementation of online proctoring. A final proposition for future work is on the use of different software systems and different ways of online proctoring. The selection of the Proctorio software implied certain design decisions during the process. Future work could provide a more in-depth overview of different software systems, but also different methods of online proctoring, e.g. live proctoring and automated proctoring. The effectiveness and student experience should be compared and evaluated.

For the purpose of the decision process of our university, the results of this experiment were written up in (Bergmans and Luttikhuis, 2020), which also contains some more details on the behaviour of the (pseudonymised) individual students. At the moment of writing, this is used in a University-wide discussion on the adoption of online proctoring (using Proctorio).

An unavoidable component of any such discussion is: what are the alternatives? If we do not impose automatic online proctoring, using Proctorio or one of its competitors, do we take the other extreme and just trust in the students’ good behaviour, possibly augmented by oral check-ups of a selection of students? This is not the core topic of this paper, and merits a much longer discussion, but let us at least suggest one alternative that may be worth considering: live online proctoring, with a human invigilator watching over a limited group of students, and no recording. We hypothesise that this will have the same preventive effect discussed in Section 4: casual cheats can be detected easily, and technical cheats cannot.

REFERENCES

Alessio, H., Malay, N. J., Maurer, K., Bailer, A. J., and Rubin, B. (2017). Examining the effect of proctoring on online test scores. Online Learning, 21(1).
Bedford, W., Gregg, J. R., and Clinton, S. (2009). Implementing technology to prevent online cheating: A case study at a small southern regional university (SSRU). MERLOT Journal of Online Learning and Teaching, 5(2).
Bergmans, L. and Luttikhuis, M. (2020). Proctorio test results and TELT recommendation. Policy document, Technology Enhanced Learning & Teaching, University of Twente. Available here.
Brothen, T. and Klimes-Dougan, B. (2015). Delivering online exams through ProctorU. Poster at the Minnesota eLearning Summit; online version here.
Dendir, S. and Maxwell, R. S. (2020). Cheating in online courses: Evidence from online proctoring. Computers in Human Behavior Reports, 2.
Hussein, M., Yusuf, J., Deb, A. S., Fong, L., and Naidu, S. (2020). An evaluation of online proctoring tools. International Council for Open and Distance Education, 12:509–525.