Important Study
1. Introduction
2. Literature review
1. We thank an anonymous reviewer for their considerable input into this section of the Literature review.
6 Ekaterina Sudina and Luke Plonsky
groups”; this allows natural experiments to “use this naturally occurring variation
in exposure to identify the impact of the event on some outcome of interest”
(Craig et al., 2017, p. 2). Critically, one of the major differences among randomized
controlled trials, natural experiments, and nonexperimental observational
studies is how well the intervention has been defined (Craig et al., 2017, Table 1).
In randomized controlled trials, the intervention is thoroughly documented and
implemented; in natural experiments, an intervention also occurs, but
researchers have less control over it; finally, in nonexperimental observational
studies, there is no clear intervention at all. Although natural experiments have
been embraced particularly in public health research, they are directly applicable
to instructed SLA and applied linguistics research more generally, especially
in situations where experimental manipulations are not deemed reasonable
or ethical. As such, natural experiments arguably have higher ecological validity
than more traditional, rigidly controlled experiments. One example of a
natural experiment in SLA would be a study that examines the effect of participating
in a study abroad program on L2 learners’ oral proficiency using a pre- and posttest
design (unlike a quasi-experimental study, a natural experiment does
not strictly control the independent variables). However, to our knowledge, no
study to date has conducted a natural experiment to investigate the effectiveness
of mobile-assisted language learning.
and intermediate Duolingo learners attained L2 proficiency levels similar to (and in some
cases superior to) those of university students who studied foreign languages
for four and five semesters, respectively (see Jiang et al., 2020; Jiang, Chen, et al.,
2021). Of particular relevance to the present investigation is a study by Jiang,
Rollinson, Plonsky, et al. (2021) that explored a possible relationship between app
usage and gains in proficiency and observed modest correlations between the two
(ρ = .02–.14 for L2 French listening and reading; ρ = .01–.06 for L2 Spanish listening
and reading). However, only one of many possible temporal
or exposure-related indicators was used (total hours). Furthermore, the sample
included only novices, and no pretest data were collected, two features the present
study seeks to improve on.
Additionally, recently published studies and existing reports on Duolingo
effectiveness have predominantly assessed specific language skills (i.e., listening,
reading, and speaking; see Jiang et al., 2020; Jiang, Chen, et al., 2021; Jiang,
Rollinson, et al., 2021; Jiang, Rollinson, Plonsky, et al., 2021).2 One exception is
Loewen et al.’s (2019) investigation of Duolingo users’ overall L2 Turkish proficiency
along with five subareas (listening, speaking, writing, reading, and lexicogrammar);
although comprehensive and informative, this study did not involve
a control or comparison group and included only nine participants. Clearly, the
field stands to benefit from more research on L2 learners’ overall proficiency in
the domain of mobile-assisted language learning.
Although research on individual differences has long established its niche in SLA
overall (Gass et al., 2020) and computer-assisted language learning in particular
(see Pawlak & Kruk, 2022), much remains uncertain about the role of individual
differences in mobile-assisted language learning. For example, Loewen et al.
(2019) raised concerns over participants’ attitudes to some app-related features
(e.g., lack of interaction and limited variation in Duolingo tasks), which might
have affected learners’ motivation and persistence in language app use. In the
same vein, García Botero et al. (2019) noted inconsistencies between Duolingo
learners’ questionnaire responses, which pointed to students’ motivation for and
positive attitude towards using the app out of class, and their interview data,
which revealed students’ mixed views on engagement and a lack of long-term
interest in using the app.
2. The first three publications are white papers published by Duolingo; Jiang, Rollinson,
Plonsky, et al. (2021) is a peer-reviewed article.
All Duolingo courses are aligned with the Common European Framework of Reference
(CEFR). Both French and Spanish courses start with a brief Intro section
(also known as A1.0, where A corresponds to the beginner or Basic User level;
see Council of Europe, 2001) followed by the A1 content, which has two sections
(A1.1 and A1.2) and covers both communicatively functional and grammatical
topics, and the A2 content, which also consists of two sections (A2.1 and A2.2)
and covers more advanced vocabulary and grammar. The last section of each
Duolingo course includes B1 content (where B corresponds to the intermediate or
Independent User level; see Council of Europe, 2001). The B1 content has four
sections (B1.1 through B1.4), at the end of which language learners are expected to
have mastered even more advanced communicatively functional and grammatical
topics (e.g., “World news,” “Learning,” “Subjunctive with common conjunctions,”
and “Past conditional” for French; “World news,” “Gossip,” “Imperfect subjunctive,”
and “Passive” for Spanish). In addition to allowing for multiple “opportunities
for practice and repeated exposure to target language structures,” the
Duolingo courses combine “more implicit, comprehension‐based learning with
explicit feedback and explanations” (Jiang, Rollinson, Plonsky, et al., 2021, p. 981).
Notably, Duolingo encourages a high “degree of user autonomy in navigating the
The effects of frequency, duration, and intensity on L2 learning through Duolingo 9
3. Method
3.1 Participants
Table 1. (continued)
French Spanish
(k = 139) (k = 148)
Characteristic k % k %
Other languages studied (excluding French/Spanish)
No 31 22 34 23
Yes 108 78 114 77
Education b
Some high school 1 1
High school 12 8 10 7
Associate’s degree 4 3 10 7
Bachelor’s degree 55 38 58 39
Master’s degree 40 28 55 37
Ph.D. 18 13 6 4
Trade School 4 3 2 1
Other 11 8 7 5
Ethnicity
Asian 7 5 4 3
African American 4 3 7 5
Caucasian 117 84 126 85
Latino or Hispanic 7 5 7 5
Other 4 3 4 3
Self-rated level of French/Spanish when started using the app c
Mean 2.38 2.52
SD 1.46 1.53
Self-rated level of French/Spanish at pretest c
Mean 4.18 4.29
SD 1.31 1.56
Self-rated level of French/Spanish at posttest d
Mean 4.43 4.44
SD 1.37 1.43
Skills learned through Duolingo the most e
Vocabulary 112 19 120 20
Grammar 87 15 102 17
Pronunciation 71 12 56 9
Listening 96 16 87 14
Speaking 55 9 56 9
Reading 104 18 114 19
Writing 69 12 80 13
Using Duolingo resources other than regular lessons f
Stories 110 49 111 46
Podcasts 36 16 34 14
Tips 64 29 73 30
Nothing 14 6 25 10
Experience learning French/Spanish before using Duolingo
No 42 30 32 22
Yes 97 70 116 78
Ways of learning French/Spanish before using Duolingo g
Being around native speakers 17 9 40 18
High school classes 65 36 79 36
Language apps 41 23 46 21
Internet-based materials (e.g., podcasts, YouTube) 9 5 11 5
Textbooks and other materials in print 3 2 4 2
Conversational language classes 25 14 22 10
Other 21 12 20 9
Taking French/Spanish classes in addition to Duolingo
No 134 96 140 95
Yes 5 4 8 5
Using other programs/apps in addition to Duolingo
No 116 83 130 88
Yes 23 17 18 12
Notes.
a. French: k = 353, Spanish: k = 400.
b. French: k = 144, Spanish: k = 149.
c, d. 0 = Absolute beginner; 10 = Native speaker.
e. French: k = 594, Spanish: k = 615.
f. French: k = 224, Spanish: k = 243.
g. French: k = 181, Spanish: k = 222.
Three different types of data and corresponding data sources were used in the
study.
1. Exposure/behavioral data
The first data source shed light on learners’ exposure to the target language and
related behavior (in-app engagement). Of particular interest were (a) the duration
of app usage, measured as total minutes per participant across the 6-month period
of study (i.e., “Minutes”); (b) two frequency measures: the number of times the learner
opened the app in a given week (i.e., “Logins”) and the number of days on which the
learner completed at least one lesson (i.e., “Sessions”); and (c) the following
content-related/curriculum-oriented intensity variables: “Lessons” (i.e., the number
of lessons completed), “Level reviews” (i.e., the final lesson for a given Level/Skill
combination), “Skill practice” (i.e., when a learner goes back to review skills
that they have already “gilded”), “Stories” (i.e., the number of stories completed),
and “Tests” (i.e., the number of tests completed). All of these indicators were used
in their raw forms as predictors of learner gains.
2. Self-report data
To understand learner demographics as well as participants’ language learning
history, we used an instrument that largely mirrored Jiang, Rollinson, Plonsky, et al.’s
(2021) background questionnaire. Additionally, we collected data using
scales measuring two individual difference variables: L2 grit (adapted from
Teimouri et al., 2022) and L2 motivated learning behavior (adapted from Papi
et al., 2019). These data allowed us to examine the two individual differences,
individually and in tandem, as additional predictors of both in-app
engagement and gains in learning.
Teimouri et al. (2022) validated their instrument with a sample of 191 learners
of English in a foreign language context (Iran) and reported Cronbach’s α reliability
of .80 for the full L2 grit scale, .86 for the perseverance of effort subscale
(PE, five positively keyed items), and .66 for the consistency of interest subscale
(CI, four negatively keyed items). Papi et al.’s (2019) L2 motivated learning behavior
questionnaire (five positively keyed items) was first used with a sample of
257 learners of English in a second language context (the US) and had an
internal-consistency reliability (Cronbach’s α) of .86.
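Because the case for both scales rests partly on Cronbach’s α, the following minimal Python sketch shows how the coefficient is computed from a respondents × items matrix: α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the toy responses are hypothetical; this illustrates the statistic and is not the authors’ analysis code.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (rows = respondents)."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item (column) across respondents
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of respondents' total scores
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert-type responses: 4 respondents x 3 items
toy = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
print(round(cronbach_alpha(toy), 2))
```

The same item and total variances must use the same denominator (here the sample variance, n − 1); mixing sample and population variances would bias α.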
Both scales were used with minor adjustments. Specifically, the word English in the
original scales was replaced with French or Spanish in the present study in order to
tailor the item wording to participants’ target languages. Additionally, for the sake
of consistency, Papi et al.’s (2019) original 5-point
Likert-type scale (endpoints: 1 = never true of me; 5 = always true of me) was
replaced with Teimouri et al.’s (2022) 5-point fully verbal and numerical Likert-type
scale (endpoints: 1 = not like me at all; 5 = very much like me). An example
item for L2 grit is “I am a diligent French/Spanish learner”; an example item for L2
motivated learning behavior is “I work hard at studying French/Spanish.”
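The CI subscale consists of negatively keyed items, which have to be reverse-scored on the 5-point scale before subscale scores are formed (on a 1–5 scale, a response x becomes 6 − x). The sketch below assumes that subscale scores are item means; the item labels (PE1–PE5, CI6R–CI9R), the `drop` option, and the data layout are hypothetical illustrations rather than the study’s materials.

```python
# Hypothetical item labels; the full item list is not given in this section.
PE_ITEMS = ["PE1", "PE2", "PE3", "PE4", "PE5"]  # perseverance of effort, positively keyed
CI_ITEMS = ["CI6R", "CI7R", "CI8R", "CI9R"]     # consistency of interest, negatively keyed
SCALE_MIN, SCALE_MAX = 1, 5  # 1 = not like me at all; 5 = very much like me


def reverse_score(x: int) -> int:
    """Reverse-key a 1-5 response: 1<->5, 2<->4, 3 stays 3."""
    if not SCALE_MIN <= x <= SCALE_MAX:
        raise ValueError(f"response {x} is outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - x


def subscale_means(answers: dict[str, int],
                   drop: frozenset[str] = frozenset()) -> dict[str, float]:
    """Mean PE and CI scores for one respondent, reverse-keying CI items first.

    `drop` (hypothetical) leaves out items excluded after item analysis.
    """
    pe = [answers[item] for item in PE_ITEMS if item not in drop]
    ci = [reverse_score(answers[item]) for item in CI_ITEMS if item not in drop]
    return {"PE": sum(pe) / len(pe), "CI": sum(ci) / len(ci)}


# Hypothetical respondent who answers 4 to every item
respondent = {item: 4 for item in PE_ITEMS + CI_ITEMS}
print(subscale_means(respondent))
print(subscale_means(respondent, drop=frozenset({"CI7R", "CI8R"})))
```

After reverse-keying, higher scores on both subscales mean more grit, so the two subscales can be compared and combined on the same footing.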
3. L2 proficiency
Two different types of language tests were used to measure participants’ L2 proficiency:
a C‑test (Spanish: Riggs & Maimone, 2018; French: Counsell, 2018) and
an elicited imitation test (EIT; Spanish: Solon et al., 2019; French: Gaillard &
Tremblay, 2016). These instruments were chosen based on a number of considerations.
First, Spanish and French C‑tests and EITs have undergone rigorous development
and possess strong validity arguments. To illustrate, Riggs and Maimone
(2018) reported high correlations between Spanish C‑test scores and (a) self-assessed
proficiency (r = .81, p < .001) as well as (b) class level (r = .73, p < .001).
Counsell (2018) also reported sizeable positive correlations between French
C‑test scores and (a) self-assessed proficiency (rs = .58–.67 across reading, writing,
listening, and speaking, with an overall r = .63, p < .01) and (b) program
level of study at the university (r = .85, p < .01). In the same vein, Gaillard and
Tremblay (2016) found that the strongest predictors of French EIT ratings in their
study were (a) C‑test scores (R² = .79) and (b) class level (R² = .69). Solon et al.
(2019), in turn, suggested that “the modified, 36-item EIT is, in fact, better able
to discriminate among learners at higher levels of proficiency than is the 30-item
EIT” (p. 14) and reported Cronbach’s α reliability ranging from .78 to .97 for the
30-item EIT and from .84 to .97 for the 36-item EIT for L2 learners at different
proficiency levels. Another benefit of using C‑tests and EITs is that they are considered
good measures of explicit and implicit L2 knowledge, respectively
(e.g., Ellis, 2005; Heo, 2016).3
Moreover, the validity of these two groups of tests is supported not only by
primary studies but also by two recent meta-analyses. Synthesizing results across
239 studies, McKay (2019) found an almost perfect correlation between C‑tests
and tests of general language proficiency (r = .94). The evidence for EITs is likewise
very strong. Kostromitina and Plonsky’s (2022) meta-analysis observed an
attenuation-corrected correlation of r = .81 between EITs and other, largely
standardized tests of L2 proficiency.
Second, there are practical considerations. These proficiency measures are
highly efficient and can be completed independently and online in approximately
20–30 minutes. Upon completion, the tests can be scored quickly and
accurately. The instruments are also freely available and do not carry any proprietary
restrictions.
Finally, C‑tests and EITs have been developed and are available in a
range of other languages (e.g., Arabic, Chinese, German, Japanese, Russian). Therefore, the
present study could be replicated in other L2s without changing this critical
design feature (i.e., the dependent measure). The instruments used to collect self-report
and L2 proficiency data are available in Appendix A.
3.3 Procedure
Following IRB approval, the self-report and L2 proficiency measures were pilot-tested,
and the data were collected using the online survey platform Gorilla
(https://gorilla.sc). Eligible app users (see above) were invited to take part in the
study starting on August 3, 2021; the first round of data collection lasted until
November 5, 2021. Those who expressed interest were asked to begin the study by
completing an online survey that included a consent form followed by (a) instruments
for L2 grit and motivation (items for each scale were randomized to control
for order effects), (b) a language background questionnaire, (c) an EIT, and (d) a
C‑test in their chosen language (Spanish or French). This battery of instruments
was completed remotely, without a proctor, in about an hour. Upon completion,
participants were reminded of the minimum app engagement required for participation
(i.e., at least 2 logins to the app per week for the following 26 weeks).
Six months later (i.e., between February and May 2022), each participant who
had met the eligibility requirement was contacted again and invited to retake
the two individual difference scales as well as the two proficiency tests. Participants
who met the selection criteria and the eligibility requirements received a
$100 Amazon gift card. The selection criteria were (a) being a Duolingo user
studying either Spanish or French and (b) being a native speaker of English residing
in the US. The requirements were (a) completion of a survey and two
language tests (at the pretest and again at the posttest 6 months later) and (b) a
minimum of 52 logins on the Duolingo app (2 per week × 26 weeks) to ensure
minimally sufficient engagement with the target language and Duolingo content.
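The selection criteria and participation requirements above amount to a simple screening rule. A minimal sketch, assuming a hypothetical participant record (the field and function names are illustrative and do not come from the study’s materials):

```python
from dataclasses import dataclass

WEEKS = 26
MIN_LOGINS_PER_WEEK = 2
MIN_TOTAL_LOGINS = WEEKS * MIN_LOGINS_PER_WEEK  # 2 per week x 26 weeks = 52


@dataclass
class Participant:
    """Hypothetical record layout; field names are illustrative only."""
    target_language: str      # course studied on Duolingo
    native_english: bool
    resides_in_us: bool
    completed_pretest: bool   # survey + EIT + C-test at the start
    completed_posttest: bool  # scales + EIT + C-test about 6 months later
    total_logins: int         # app logins over the 26-week period


def meets_selection_criteria(p: Participant) -> bool:
    return p.target_language in {"Spanish", "French"} and p.native_english and p.resides_in_us


def meets_requirements(p: Participant) -> bool:
    return p.completed_pretest and p.completed_posttest and p.total_logins >= MIN_TOTAL_LOGINS


def receives_gift_card(p: Participant) -> bool:
    return meets_selection_criteria(p) and meets_requirements(p)


print(receives_gift_card(Participant("French", True, True, True, True, 165)))
```

Checking the total (at least 52 logins) is the looser reading of the requirement; a stricter variant would check each of the 26 weekly counts separately.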
After all data were collected and de-identified, the C‑tests were scored automatically,
whereas the EITs were scored by four trained raters, all highly proficient
in the target language (two raters per language: the lead rater scored all of the
EIT items, whereas the second rater scored approximately 10% of the sample’s
EITs). Following rater training and norming sessions, which lasted approximately
two hours, the raters for each language (French vs. Spanish) calibrated their
scoring and proceeded to score the EITs independently. To avoid potential rater
bias, raters were blind to which audio files had come from the pretest and
which from the posttest.
To calculate interrater reliability for the EIT scores, intraclass correlation coefficients
(ICC; two-way mixed, consistency) were computed: (a) French: average α
for 28 items = .96; average by test type (14 items each): pretest = .95, posttest = .96;
(b) Spanish: average α for 30 items = .98; average by test type (15 items each):
pretest = .99, posttest = .97.
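The interrater statistic above is a two-way mixed, consistency ICC. Assuming the usual two-way ANOVA formulation (often labeled ICC(3,1) and ICC(3,k)), the single-rater coefficient is (MS_R − MS_E) / (MS_R + (k − 1)MS_E) and the average-of-k-raters coefficient is (MS_R − MS_E) / MS_R. The latter is algebraically identical to Cronbach’s α with raters treated as items, which may be why the values are labeled α above. The ratings are hypothetical EIT item scores (0–4) from two raters.

```python
def icc_consistency(ratings: list[list[float]]) -> tuple[float, float]:
    """Two-way mixed, consistency ICCs from an n targets x k raters matrix.

    Returns (single-rater ICC(3,1), average-measures ICC(3,k)).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into targets (rows), raters (columns), residual
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # "Consistency": systematic rater differences (ss_cols) are removed, not penalized
    icc_single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    icc_average = (ms_rows - ms_error) / ms_rows
    return icc_single, icc_average


# Hypothetical scores from a lead rater (column 1) and a second rater (column 2)
ratings = [[3, 4], [1, 2], [4, 4], [2, 3], [4, 4], [0, 1]]
print(icc_consistency(ratings))
```

Because consistency ICCs ignore rater mean differences, a second rater who is uniformly one point more lenient still yields perfect consistency; an absolute-agreement ICC would penalize that offset.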
During data cleaning, 30 cases (10% of the total of 287; 17 Spanish,
13 French) were excluded listwise due to problems with recordings or participants’
misinterpretation of the task (several produced English translations rather than
French/Spanish imitations). There were no missing data on other variables.
To compare proficiency scores across languages and test modes
(written vs. oral), raw scores were first converted to decimals (i.e., proportions of the
maximum possible score) separately by language. Spanish EIT: 36 items, max possible
score = 144 (4 per item). French EIT: 50 items, max possible score = 200 (4 per item).
C‑test (both languages): 125 items, max possible score = 125 (1 per item). Next, the
assumptions for each statistical analysis were checked and met (see Appendix B).
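The conversion to decimals can be read as dividing each raw score by the maximum possible score for that language and test. A minimal sketch under that assumption; computing a gain as the posttest proportion minus the pretest proportion is also an assumption, since the gain formula is not spelled out here.

```python
# Maximum possible raw scores as reported in the text (items x points per item)
MAX_SCORE = {
    ("Spanish", "EIT"): 36 * 4,      # 144
    ("French", "EIT"): 50 * 4,       # 200
    ("Spanish", "C-test"): 125 * 1,  # 125
    ("French", "C-test"): 125 * 1,   # 125
}


def to_proportion(raw: float, language: str, test: str) -> float:
    """Express a raw score as a proportion (0-1) of the maximum possible score."""
    max_score = MAX_SCORE[(language, test)]
    if not 0 <= raw <= max_score:
        raise ValueError(f"{raw} is outside 0-{max_score} for the {language} {test}")
    return raw / max_score


def gain(pre_raw: float, post_raw: float, language: str, test: str) -> float:
    """Assumed gain score: posttest proportion minus pretest proportion."""
    return to_proportion(post_raw, language, test) - to_proportion(pre_raw, language, test)


# The same raw EIT score corresponds to different proportions in the two languages
print(to_proportion(72, "Spanish", "EIT"), to_proportion(72, "French", "EIT"))
print(gain(50, 60, "French", "C-test"))
```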
4. Results
Inspection of the scale data revealed two items with low corrected item-total
correlations (ITCs < .40) on the L2 Grit Consistency of Interest subscale, CI7R
and CI8R, which considerably lowered the reliability of the scale and were therefore
removed from further analyses. All other corrected ITCs for all constructs
and subconstructs were > .40 on both the pretest and the posttest. The stability
of constructs over time was assessed by test-retest reliability: r(L2 grit) = .68;
r(L2 perseverance of effort) = .69; r(L2 consistency of interest) = .58; r(L2 motivation)
= .61, p < .001. As shown in Table 2, internal-consistency reliability of
the scales was also acceptable. Additionally, descriptive statistics indicated that the
participants had the highest mean score on L2 consistency of interest and the lowest
mean score on L2 motivation on both the pretest and the posttest.
Notes.
a. k = number of items; M = mean; SD = standard deviation; 95% CI = 95% confidence intervals of
coefficient alphas; L2 = second language. Spearman’s rho = .59
b. k = number of items; M = mean; SD = standard deviation; 95% CI = 95% confidence intervals of
coefficient alphas; L2 = second language. Spearman’s rho = .65
As shown in Table 3, the participants’ mean scores on the EIT (i.e., the oral proficiency
test) were higher in Spanish than in French at both the pretest and the posttest.
However, Spanish EIT scores were more spread out, as indicated by larger standard
deviations. The participants’ average C‑test scores (i.e., the written proficiency test)
were overall slightly higher than the EIT scores, except for the Spanish pretest mean
scores, which were virtually the same in both modes. In the written mode, the two
language groups’ proficiency appeared to be at about the same level on both the pretest
and the posttest (see Figure 1). The results of dependent-samples t-tests showed that
learner proficiency gains from the pretest to the posttest were statistically significant,
with small effect sizes adjusted for the within-sample correlation (Plonsky & Oswald, 2014): (a) EIT
gains: t(256) = 13.38, p < .001, Cohen’s d = .29, 95% CI [.24, .33]; (b) C‑test gains:
t(286) = 11.00, p < .001, Cohen’s d = .36, 95% CI [.29, .43].
Notes. M = mean; SD = standard deviation. EIT (elicited imitation test) = oral proficiency;
C‑test = written proficiency.
Figure 1. Proficiency test gains in the written vs. oral mode and by language
4.3 RQ1: To what extent do learner gains differ when tested in the written vs. oral mode?
The results of dependent-samples t-tests for RQ1 demonstrated that there were no
statistically significant differences in learner proficiency gains in the written vs.
oral mode (see also Figure 1), and the effect sizes adjusted for the within-sample
correlation were small (see Plonsky & Oswald, 2014): (a) French: t(125) = −.71,
p = .48, Cohen’s d = −.07, 95% CI [−.27, .13]; (b) Spanish: t(130) = 1.23, p = .22,
Cohen’s d = .13, 95% CI [−.08, .34].
As shown in Figure 1, the results of the independent-samples t-tests for RQ1
showed that the EIT gains (oral mode) did not differ significantly by language:
t(204.64) = −.85, p = .397, Cohen’s d = −.11, 95% CI [−.35, .14]. The same was true for
the C‑test gains (written mode): t(285) = 1.52, p = .13, Cohen’s d = .18,
95% CI [−.05, .41]. Of note, similar results were observed when the two tests were
rerun without outliers on the dependent variable.
4.4 RQ2: To what extent are frequency, duration, and intensity of Duolingo app usage associated with gains in L2 Spanish and French?
A summary of descriptive statistics for the Duolingo app usage data is presented
in Table 4. It shows meaningful differences among the frequency, duration,
and intensity variables. (A slight overlap between the two frequency variables was
addressed when conducting a follow-up multiple regression analysis; see Table 6.)
values. In particular, requiring at least two logins per week, though necessary to
ensure regular exposure to the target language, seems to have yielded an unusual
level of homogeneity in login data across the sample: The mean number of logins
was 165 (6.35 logins per week) with a standard deviation of only 11.30. The relationships
between logins and gains on the EIT and C‑test were therefore re-examined
using Thorndike’s formula for correction for range restriction, resulting in
substantially larger correlations in both cases (i.e., .46 and .11, respectively).
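Thorndike’s correction rescales an observed correlation by the ratio u of the unrestricted to the restricted standard deviation of the predictor. The sketch assumes the Case II formula for direct restriction, r_c = r·u / √(1 + r²(u² − 1)). The unrestricted SD of logins is not reported in this section, so the reference SD below is purely hypothetical and will not reproduce the .46 and .11 reported above.

```python
from math import sqrt


def thorndike_case2(r: float, sd_unrestricted: float, sd_restricted: float) -> float:
    """Correct an observed r for direct range restriction on the predictor (Case II)."""
    u = sd_unrestricted / sd_restricted  # u > 1 when the sample's range is restricted
    return r * u / sqrt(1 + r**2 * (u**2 - 1))


SD_LOGINS_SAMPLE = 11.30     # observed SD of logins in this sample (from the text)
SD_LOGINS_REFERENCE = 40.0   # hypothetical unrestricted SD, for illustration only

for label, r_obs in [("EIT", 0.14), ("C-test", 0.03)]:
    print(label, round(thorndike_case2(r_obs, SD_LOGINS_REFERENCE, SD_LOGINS_SAMPLE), 2))
```

The size of the correction depends entirely on the assumed reference SD, which is why such corrected values are best reported alongside the uncorrected ones.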
Table 5. Pearson correlations between Duolingo app usage and proficiency gains (N = 233)
Type                               App usage        EIT gains                             C‑test gains
Frequency                          Logins a         r = .14, BCa 95% CI [.03, .26] *      r = .03, BCa 95% CI [−.09, .17]
                                   Sessions         r = .12, BCa 95% CI [.00, .24]        r = .06, BCa 95% CI [−.07, .19]
Duration                           Minutes          r = .01, BCa 95% CI [−.16, .18]       r = .20, BCa 95% CI [.05, .35]
Intensity: Content and Curriculum  Lessons          r = .21, BCa 95% CI [.07, .35]        r = .26, BCa 95% CI [.13, .40]
                                   Level reviews    r = .21, BCa 95% CI [.07, .34]        r = .11, BCa 95% CI [−.04, .27]
                                   Skill practice   r = −.06, BCa 95% CI [−.19, .07]      r = .08, BCa 95% CI [−.04, .22]
                                   Stories          r = −.02, BCa 95% CI [−.16, .14]      r = .07, BCa 95% CI [−.04, .18]
                                   Tests            r = .09, BCa 95% CI [−.02, .25]       r = .09, BCa 95% CI [−.06, .22]
Notes.
* Bias-corrected and accelerated 95% confidence intervals for correlation coefficients.
a. When adjusted for range restriction, the correlations for logins × EIT gains (r = .14) and
logins × C‑test gains (r = .03) increase to .46 and .11, respectively.
The results of the first multiple regression analysis suggested that when the
three variables most strongly associated with EIT gains were entered into the
model as predictors along with the target language, which was added as a covariate,
the model explained 4–6% of the variance in EIT gains and was statistically
significant: F(4, 238) = 3.56, p = .008, R² = .06, adjusted R² = .04. ‘Level Reviews’
emerged as the only meaningful positive predictor (see Table 6).4
The results of the second multiple regression analysis demonstrated that
when the two variables most strongly correlated with C‑test gains were entered
in the model as predictors along with the target language, which was added as a
covariate, the model explained 5–6% of the variance in C‑test gains and was statistically
significant: F(3, 257) = 5.21, p = .002, R² = .06, adjusted R² = .05. ‘Lessons’
emerged as the only meaningful positive predictor (see Table 7).
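The adjusted R² values above follow from R², the sample size n, and the number of predictors p (covariate included): adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The sample size can be recovered from the error degrees of freedom, since n − p − 1 = 238 gives n = 243 for the first model. A minimal sketch; the function names are hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (covariates included)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_from_r2(r2: float, n: int, p: int) -> float:
    """Overall F for H0: all p slopes are zero, with df (p, n - p - 1)."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


def n_from_df(df_error: int, p: int) -> int:
    """Recover n from the error degrees of freedom, df_error = n - p - 1."""
    return df_error + p + 1


# Model 1 (EIT gains): F(4, 238); Model 2 (C-test gains): F(3, 257)
for p, df_error, r2 in [(4, 238, 0.06), (3, 257, 0.06)]:
    n = n_from_df(df_error, p)
    print(n, round(adjusted_r2(r2, n, p), 2), round(f_from_r2(r2, n, p), 2))
```

Because R² is reported to two decimals, recomputing F from it only approximates the reported F values; the adjusted R² values, by contrast, round back to the reported .04 and .05.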
4. Of note, the Sessions variable was excluded from the model due to a large correlation with
the Logins variable.
4.5 RQ3: To what extent are L2 grit and motivation associated with the frequency, duration, and intensity with which learners use Duolingo?
Table 8. Pearson correlations between Duolingo app usage and individual differences (N = 260)
Type                               App usage        L2 Perseverance of effort   L2 Consistency of interest   L2 Motivation
Frequency                          Logins           r = .16 [.06, .27] *        r = .08 [−.05, .20]          r = .20 [.08, .31]
                                   Sessions         r = .15 [.04, .26]          r = .12 [−.02, .23]          r = .20 [.07, .31]
Duration                           Minutes          r = .18 [.05, .30]          r = .13 [.02, .22]           r = .24 [.12, .35]
Intensity: Content and curriculum  Lessons          r = .17 [.02, .31]          r = .13 [.03, .22]           r = .22 [.09, .33]
                                   Level reviews    r = −.002 [−.11, .11]       r = .01 [−.11, .11]          r = .07 [−.05, .18]
                                   Skill practice   r = .003 [−.12, .13]        r = −.01 [−.13, .11]         r = .03 [−.10, .15]
                                   Stories          r = .16 [.03, .28]          r = .12 [.02, .22]           r = .17 [.05, .27]
                                   Tests            r = .09 [−.04, .20]         r = .03 [−.09, .13]          r = .06 [−.05, .15]
Note.
* Bias-corrected and accelerated 95% confidence intervals for correlation coefficients.
4.6 RQ4: To what extent are L2 grit and motivation associated with gains in L2 Spanish and French?
Pearson correlations were used to examine the extent to which L2 grit and motivation
measured at the pretest were associated with learner gains in written and oral
proficiency. The observed relationships were positive and generally small in
magnitude (Plonsky & Oswald, 2014). As in some of the previously conducted
analyses, we applied a correction for range restriction to the correlations that we
had reason to believe were attenuated. Specifically, the standard deviations observed
for both of the L2 grit subconstructs were substantially smaller than in previous
studies of foreign-language learners (e.g., Sudina & Plonsky, 2021b) and were
adjusted accordingly, yet conservatively. Corrected values are shown in parentheses
after the corresponding uncorrected correlations: r = .02, BCa 95% CI [−.10, .16]
between EIT gains and L2 perseverance of effort; r = .12 (corrected r = .22), BCa 95%
CI [−.01, .25] between EIT gains and L2 consistency of interest; r = .03, BCa 95% CI
[−.10, .16] between EIT gains and L2 motivation; r = .16 (corrected r = .19), BCa 95%
CI [.04, .28] between C‑test gains and L2 perseverance of effort (note that the
confidence interval does not cross zero); r = .09 (corrected r = .17), BCa 95% CI
[−.04, .23] between C‑test gains and L2 consistency of interest; r = .17, BCa 95% CI
[.05, .30] between C‑test gains and L2 motivation (note that the confidence interval
does not cross zero).
To examine the extent to which L2 grit and motivation predicted EIT/C‑test gains,
two standard multiple regressions with four predictors (i.e., L2 perseverance of
effort, L2 consistency of interest, L2 motivation, and target language, which was
added as a covariate) and EIT/C‑test gains as outcome variables were performed.
The two models explained 1–3% of the variance in gain scores and were not
statistically significant: (a) F(4, 244) = 1.76, p = .14, R² = .03, adjusted R² = .01
for EIT gains, with L2 consistency of interest as the only contributing predictor:
β = .14, B = .01, 95% CI [.00, .03], p = .05; (b) F(4, 268) = 1.83, p = .12, R² = .03,
adjusted R² = .01 for C‑test gains, with no meaningful predictors.
5. Discussion
The current study sought to examine the predictive power of two sets of variables
on L2 gains made in app-based language learning via Duolingo. Specifically, we
were interested in better understanding L2 development as a function of both
(a) learners’ app-based exposure/behavior (e.g., instructional frequency, duration)
and (b) learners’ L2 grit and motivation. On a broad, theoretical level,
these sets of variables represent the two main types of factors (learner-external
and learner-internal) known to influence L2 learning (Gass et al., 2020). On a
practical level, the results have the potential to inform the instructional design
of Duolingo’s curriculum and to suggest changes to the in-app experience that
could increase learner efficiency.
The study is unique in at least two respects. First, to our knowledge, this is
the only study to consider distribution-of-practice effects in the context of
mobile-based language learning. Moreover, we have done so by means of a natural
experiment, thereby greatly increasing the study’s ecological validity. Second,
although a growing body of evidence has begun to accumulate on the role of grit in
L2 development (see Teimouri et al., 2021), no study to date has examined it with
mobile language learners. It is also the first study to employ a longitudinal design
to examine the power of grit in predicting gains over time.
One challenge to these goals, which we want to be upfront about, was that the
gains observed on both the written and oral proficiency tests (i.e., C‑test and EIT)
were relatively modest in both languages. The limited target language gains
observed over time (i.e., our main dependent variable) imposed a limitation
on the study’s findings because smaller gains necessarily leave less variance for the
predictor or independent variables to explain. These modest gains appear to conflict
with previous findings on the effectiveness of Duolingo (e.g., Jiang et al., 2021).
However, there are several alternative explanations. For example, unlike other
standardized proficiency tests (e.g., ACTFL’s Oral Proficiency Interview), the
dependent measures in the present study were not developed with lower proficiency
levels in mind and may have been too difficult, as several participants noted to us.
Another explanation for the modest gains may be a lack of effort by learners on the
pre- and post-assessments. Finally, we need to account for user
autonomy and the amount of course content covered by the participants after
RQ4 addressed the same relationship modeled in Figure 2 but without the
mediating effect of in-app exposure. The findings for this relationship were modest
but provide additional evidence of the predictive validity of L2 grit in the context
of app-based language learning.
6. Conclusion
The findings of this study are relevant and potentially beneficial on multiple levels.
First, this study allowed us to gain a better understanding of the role of technology
in instructed second-language acquisition. This is critical as technological
advances have the potential to make language learning not only “a lifelong
(spanning one’s lifetime) but also a lifewide (not confined to a particular location,
such as a school) activity” (Reinders & Stockwell, 2017, p. 372). Second, the
present investigation contributed to the growing body of evidence on Duolingo’s
effectiveness by assessing L2 learners’ proficiency in both written and oral modes
using measures with high validity and high practicality. The study also shed further
light on the individual and combined effects of frequency,
duration, and intensity of instruction on L2 development and, critically, on the
learner-internal factors that lead to choices to engage with the app. Finally, on a
practical level, the results of the present study may also inform Duolingo lesson
design and the recommendations provided to learners with respect to the frequency,
duration, and intensity of app usage.
Funding
This article was made Open Access under a cc by 4.0 license through payment of an APC by or
on behalf of the authors.
The effects of frequency, duration, and intensity on L2 learning through Duolingo 27
Acknowledgements
This project was funded by a Duolingo Efficacy Study grant. We are very grateful to the Learn-
ing Science team at Duolingo for seeing the value in this study and for all their support and
assistance. In particular, we would like to thank Xiangying Jiang, Erin Gustafson, and Joseph
Rollinson for their patience, generosity, and help digging up the learner-usage data we needed
time and time again. We are also very grateful to the language learners who contributed their
time and energy (and data) to this study. In addition, our sincere thanks go to Kevin Hirschi,
Masha Kostromitina, Ben Brown, and Andrew Dennis, for their tireless assistance with scoring,
coding, piloting, and Gorilla-wrangling (the data collection platform, not the primate). Thanks
to Kate Yaw for help recording our test instructions. Last, our gratitude goes out to the C-test
and elicited imitation test authors who very kindly provided us with the materials needed to
employ their instruments in this study: Stéphanie Gaillard, Annie Tremblay, Corinne Counsell,
Daniel Riggs, Luciane L. Maimone, Megan Solon, Hae In Park, Carly Henderson, and Marzieh
Dehghan-Chaleshtori.
References
Gass, S. M., Behney, J., & Plonsky, L. (2020). Second language acquisition: An introductory
course (5th ed.). Routledge.
Heo, Y. (2016). Heritage and L2 learners’ acquisition of Korean in terms of implicit and explicit
knowledge (Doctoral dissertation). Michigan State University, East Lansing, MI.
Jiang, X., Chen, H., Portnoff, L., Gustafson, E., Rollinson, J., Plonsky, L., & Pajak, B. (2021).
Seven units of Duolingo courses comparable to 5 university semesters in reading and
listening. Duolingo Research Report DRR-21-03.
Jiang, X., Rollinson, J., Chen, H., Reuveni, B., Gustafson, E., Plonsky, L., & Pajak, B. (2021).
How well does Duolingo teach speaking skills? Duolingo Research Report DRR-21-02.
Jiang, X., Rollinson, J., Plonsky, L., & Pajak, B. (2020). Duolingo efficacy study: Beginning-
level courses equivalent to four university semesters. Duolingo Research Report
DRR-20-04.
Jiang, X., Rollinson, J., Plonsky, L., Gustafson, E., & Pajak, B. (2021). Evaluating the reading
and listening outcomes of beginning-level Duolingo courses. Foreign Language Annals,
54, 974–1002.
Kasprowicz, R. E., Marsden, E., & Sephton, N. (2019). Investigating distribution of practice
effects for the learning of foreign language verb morphology in the young learner.
Modern Language Journal, 103, 580–606.
Kim, S. K., & Webb, S. (2022). The effects of spaced practice on second language learning: A
meta-analysis. Language Learning, 72, 269–319.
Kostromitina, M., & Plonsky, L. (2022). Elicited imitation tasks as a measure of L2 proficiency:
A meta-analysis. Studies in Second Language Acquisition, 44, 886–911.
Li, M., & DeKeyser, R. (2019). Distribution of practice effects in the acquisition and retention
of L2 Mandarin tonal word production. Modern Language Journal, 103, 607–628.
Loewen, S. (2020). Introduction to instructed second language acquisition (2nd ed.). Routledge.
Loewen, S., Crowther, D., Isbell, D., Kim, K., Maloney, J., Miller, Z., & Rawal, H. (2019).
Mobile-assisted language learning: A Duolingo case study. ReCALL, 31(3), 293–311.
Loewen, S., Isbell, D., & Sporn, Z. (2020). The effectiveness of app-based language instruction
for developing receptive linguistic knowledge and oral communicative ability. Foreign
Language Annals, 53(2), 209–233.
McKay, T. (2019). More on the validity and reliability of C‑test scores: A meta-analysis of
C‑test studies (Unpublished doctoral dissertation). Georgetown University, Washington,
DC.
Nakata, T. (2015). Effects of expanding and equal spacing on second language vocabulary
learning: Does gradually increasing spacing increase vocabulary learning? Studies in
Second Language Acquisition, 37, 677–711.
Nakata, T., & Suzuki, Y. (2019). Effects of massing and spacing on the learning of semantically
related and unrelated words. Studies in Second Language Acquisition, 41, 287–311.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and
quantitative meta-analysis. Language Learning, 50, 417–528.
Papi, M., Bondarenko, A., Mansouri, S., Feng, L., & Jiang, C. (2019). Rethinking L2 motivation
research: The 2 × 2 model of L2 self-guides. Studies in Second Language Acquisition, 41,
337–361.
Park, D., Yu, A., Baelen, R. N., Tsukayama, E., & Duckworth, A. L. (2018). Fostering grit:
Perceived school goal-structure predicts growth in grit and grades. Contemporary
Educational Psychology, 55, 120–128.
Pawlak, M., & Kruk, M. (2022). Individual differences in computer assisted language learning
research. Routledge.
Plonsky, L., & Oswald, F. L. (2014). How big is “big”? Interpreting effect sizes in L2 research.
Language Learning, 64, 878–912.
Plonsky, L., & Ziegler, N. (2016). The CALL-SLA interface: Insights from a second-order
synthesis. Language Learning & Technology, 20, 17–37.
Reinders, H., & Stockwell, G. (2017). Computer-assisted SLA. In S. Loewen & M. Sato (Eds.),
The Routledge handbook of instructed second language acquisition (pp. 108–125).
Routledge.
Riggs, D., & Maimone, L. L. (2018). A computerized-administered C‑test in Spanish. In
J. M. Norris (Ed.), Developing C‑tests for estimating proficiency in foreign language
research (pp. 265–294). Peter Lang.
Rogers, J. (2017). The spacing effect and its relevance to second language acquisition. Applied
Linguistics, 38(6), 906–911.
Rogers, J. (2023). Spacing effects in task repetition research. Language Learning, 73(2),
445–474.
Rogers, J., & Cheung, A. (2020). Input spacing and the learning of L2 vocabulary in a
classroom context. Language Teaching Research, 24(5), 616–641.
Rogers, J., & Cheung, A. (2021). Does it matter when you review? Input spacing, ecological
validity, and the learning of L2 vocabulary. Studies in Second Language Acquisition, 43,
1138–1156.
Rohrer, D. (2015). Student instruction should be distributed over long time periods.
Educational Psychology Review, 27(4), 635–643.
Saito, K., & Plonsky, L. (2019). Measuring the effects of second language pronunciation
teaching: A proposed framework and meta-analysis. Language Learning, 69, 652–708.
Schmidt, F. T., Nagy, G., Fleckenstein, J., Möller, J., & Retelsdorf, J. (2018). Same same, but
different? Relations between facets of conscientiousness and grit. European Journal of
Personality, 32, 705–720.
Serrano, R. (2011). The time factor in EFL classroom practice. Language Learning, 61(1),
117–145.
Serrano, R. (2022). A state-of-the-art review of distribution-of-practice effects on L2 learning.
Studies in Second Language Learning and Teaching, 12(3), 355–379.
Serrano, R., & Huang, H. Y. (2018). Learning vocabulary through assisted repeated reading:
How much time should there be between repetitions of the same text? TESOL Quarterly,
52(4), 971–994.
Serrano, R., & Huang, H. Y. (2023). Time distribution and intentional vocabulary learning
through repeated reading: A partial replication and extension. Language Awareness, 32(1),
1–18.
Solon, M., Park, H. I., Henderson, C., & Dehghan-Chaleshtori, M. (2019). Revisiting the
Spanish elicited imitation task: A tool for assessing advanced language learners? Studies
in Second Language Acquisition, 41, 1027–1053.
Sudina, E., & Plonsky, L. (2021a). Academic perseverance in foreign language learning: An
investigation of language-specific grit and its conceptual correlates. Modern Language
Journal, 105, 829–857.
Sudina, E., & Plonsky, L. (2021b). Language learning grit, achievement, and anxiety among L2
and L3 learners in Russia. ITL – International Journal of Applied Linguistics, 172, 161–198.
Suzuki, Y. (2017). The optimal distribution of practice for the acquisition of L2 morphology: A
conceptual replication and extension. Language Learning, 67(3), 512–545.
Suzuki, Y. (2019). Individualization of practice distribution in second language grammar
learning: The role of metalinguistic rule rehearsal ability and working memory capacity.
Journal of Second Language Studies, 2(2), 169–196.
Suzuki, Y., & DeKeyser, R. (2017). Effects of distributed practice on the proceduralization of
morphology. Language Teaching Research, 21(2), 166–188.
Teimouri, Y. (2017). L2 selves, emotions, and motivated behaviors. Studies in Second Language
Acquisition, 39, 681–709.
Teimouri, Y., Plonsky, L., & Tabandeh, F. (2022). L2 Grit: Passion and perseverance for
second-language learning. Language Teaching Research, 26, 893–918.
Teimouri, Y., Sudina, E., & Plonsky, L. (2021). On domain-specific conceptualization and
measurement of grit in L2 learning. Journal for the Psychology of Language Learning, 3,
156–164.
Uchihara, T., Webb, S., & Yanagisawa, A. (2019). The effects of repetition on incidental
vocabulary learning: A meta-analysis of correlational studies. Language Learning, 69(3),
559–599.
Yamagata, S., Nakata, T., & Rogers, J. (2023). Effects of distributed practice on the acquisition
of verb-noun collocations. Studies in Second Language Acquisition, 45, 291–317.
(After signing the consent form). Thank you! Please set aside 60 minutes of undisturbed time
to participate in the study. You should have access to a computer with a working microphone
and speakers and reliable internet service. Please, do not use a mobile phone.
Now think about your foreign language learning experience on Duolingo (in your case, French/
Spanish) and respond to the following items by selecting the statements that best describe you.
This is not a test and there are no right or wrong answers, so please be honest.
L2-GRIT SCALE
(adapted from Teimouri et al., 2022)
Perseverance of effort
1. I will not allow anything to stop me from making progress in learning French/Spanish*.
2. I am a diligent French/Spanish learner.
3. Now that I have decided to learn French/Spanish, nothing can prevent me from reaching
this goal.
4. When it comes to French/Spanish, I am a hard-working learner.
5. I put much time and effort into improving my weaknesses in learning French/Spanish.
Consistency of interest
6R. I think I have lost my interest in learning French/Spanish.
7R. I have been obsessed with learning French/Spanish in the past but later lost interest.
8R. My interests in learning French/Spanish change from year to year.
9R. I am not as interested in learning French/Spanish as I used to be.
Note. ‘R’ indicates negatively keyed items that have been reversed.
*The original scale was developed for L2 English; in the present study, English was replaced
with French/Spanish.
Note. * The word English in the original scale was replaced with French/Spanish in the present
study.
Note. The mean scores on each scale indicate the levels of L2 grit and L2 motivated learning behavior,
respectively. The items were randomized.
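The scoring note above (mean scores, with the negatively keyed “R” items reversed) can be sketched as follows. This is a minimal illustration under stated assumptions: the 1–5 response range is hypothetical because the appendix does not list the response options, the function names are invented for this sketch, and the two subscale means simply follow the scale’s two headings (Perseverance of effort, items 1–5; Consistency of interest, items 6R–9R).

```python
from statistics import mean

# Hypothetical response range: the appendix does not state the number of scale points.
SCALE_MIN, SCALE_MAX = 1, 5

# Item labels follow the appendix; items ending in "R" are negatively keyed.
PERSEVERANCE = ["1", "2", "3", "4", "5"]
CONSISTENCY = ["6R", "7R", "8R", "9R"]


def recode(item: str, response: int) -> int:
    """Reverse-score negatively keyed items (labels ending in 'R')."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} for item {item} is out of range")
    if item.endswith("R"):
        return SCALE_MIN + SCALE_MAX - response
    return response


def score_l2_grit(responses: dict[str, int]) -> dict[str, float]:
    """Return subscale means and the overall L2 grit mean for one participant."""
    recoded = {item: recode(item, r) for item, r in responses.items()}
    return {
        "perseverance_of_effort": mean(recoded[i] for i in PERSEVERANCE),
        "consistency_of_interest": mean(recoded[i] for i in CONSISTENCY),
        "l2_grit": mean(recoded[i] for i in PERSEVERANCE + CONSISTENCY),
    }
```

Reversing with `SCALE_MIN + SCALE_MAX - response` maps the lowest option to the highest and vice versa, so a learner who strongly disagrees with “I think I have lost my interest” contributes a high consistency score.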
Background questionnaire
(based on Jiang et al., 2021)
This form asks for background information about you. Although we ask for your name and
email, we do so only because we want to associate your answers to this questionnaire with your
other data. Your answers will be treated confidentially. Only the researchers will have access to
the information you provide.
1. Name:
2. Email (the one that is associated with your Duolingo account):
3. Age (please put a number):
4. What language(s) was/were spoken in your home before you were 6 years old?
5. What other languages do you speak, if any?*
6. Why are you learning French/Spanish? (Check all that apply)
For travel
For school
For job-related purposes
For fun/leisure
For memory/brain acuteness
For social purposes
Other (please specify)
20. Did you take French/Spanish classes during the time you used Duolingo? Yes/No
21. Did you use other programs or apps to learn French/Spanish during the time you used
Duolingo? Yes/No
L2 proficiency
Introduction
This language test will ask you to listen to several short audio files in Spanish and make a
recording in response. (Please be patient as recordings may take time to load.)
Note that some of the items might be quite challenging. Please try to complete each of them
to the best of your ability.
<Click here to start>
Instructions-1
You are going to hear several sentences in English (6 in total). After each sentence, there will be
a short pause, followed by a tone sound {TONE}. Your task is to try to repeat exactly what you
hear. You will have only one attempt to do so. You will be given sufficient time after the tone to
repeat the sentence. Repeat as much as you can. Remember, don’t start repeating the sentence
until after you hear the tone sound {TONE}. Now let’s begin.
<I’m ready>
Practice stimuli
1. We drove to the park.
2. I’ll call her tomorrow night.
3. You can buy meat at the butcher shop.
4. My brother just bought a brand new computer.
5. Sometimes they take their dog for a walk in the park.
6. We’re going to play volleyball at the gym that I told you about.
Instructions-2
Now, you are going to hear a number of sentences in Spanish (36 in total). Once again, after
each sentence, there will be a short pause, followed by a tone sound {TONE}. Your task is to try
to repeat exactly what you hear in Spanish. You will have only one attempt to do so. You will be
given sufficient time after the tone to repeat the sentence. Repeat as much as you can. Remem-
ber, don’t start repeating the sentence until after you hear the tone sound {TONE}. Now let’s
begin.
Main stimuli
1. Quiero cortarme el pelo. (7 syllables)
2. El libro está en la mesa. (7 syllables)
3. El carro lo tiene Pedro. (8 syllables)
4. Él se ducha cada mañana. (9 syllables)
5. ¿Qué dice usted que va a hacer hoy? (9 syllables)
6. Dudo que sepa manejar muy bien. (10 syllables)
7. Las calles de esta ciudad son muy anchas. (11 syllables)
8. Puede que llueva mañana todo el día. (12 syllables)
9. Las casas son muy bonitas pero caras. (12 syllables)
Introduction
This language test will ask you to listen to several short audio files in French and make a record-
ing in response. (Please be patient as recordings may take time to load.)
Note that some of the items might be quite challenging. Please try to complete each of them
to the best of your ability.
<Click here to start>
Instructions-1
You are going to hear several sentences in English (6 in total). After each sentence, there will be
a short pause, followed by a tone sound {TONE}. Your task is to try to repeat exactly what you
hear. You will have only one attempt to do so. You will be given sufficient time after the tone to
repeat the sentence. Repeat as much as you can. Remember, don’t start repeating the sentence
until after you hear the tone sound {TONE}. Now let’s begin.
<I’m ready>
Practice stimuli
1. We drove to the park.
2. I’ll call her tomorrow night.
3. You can buy meat at the butcher shop.
4. My brother just bought a brand new computer.
5. Sometimes they take their dog for a walk in the park.
6. We’re going to play volleyball at the gym that I told you about.
Instructions-2
Now, you are going to hear a number of sentences in French (50 in total). Once again, after each
sentence, there will be a short pause, followed by a tone sound {TONE}. Your task is to try to
repeat exactly what you hear in French. You will have only one attempt to do so. You will be
given sufficient time after the tone to repeat the sentence. Repeat as much as you can. Remem-
ber, don’t start repeating the sentence until after you hear the tone sound {TONE}. Now let’s
begin.
Main stimuli
1. Dans cette grande ville, les rues sont larges.
2. Je doute qu’il sache si bien conduire.
3. Qu’est-ce que tu as dit que tu faisais?
4. Il est possible qu’il pleuve des cordes.
5. Les maisons sont très belles mais trop chères.
6. Le livre rouge n’était pas sur la table.
7. Ni lui ni moi ne les avions comprises!
8. Il prend une douche tous les matins à 7h00.
9. Je n’aime pas les films qui sont à l’eau de rose.
10. Après le déjeuner, as-tu fait une bonne sieste?
11. Tu aimes écouter la musique techno, n’est-ce pas ?
12. Est-ce que tu penses que je dois me faire couper les cheveux?
13. Traverse la rue au feu et puis continue tout droit!
14. Y-a-t-il beaucoup de gens qui ne mangent rien le matin?
15. On en avait une petite noire qui s’appelait minouche.
16. J’espère que le temps se réchauffera plus tôt cette année.
17. Le petit garçon dont le chaton est mort hier est triste.
18. Quand Sophie reçut sa collègue, elle lui proposa du thé.
19. Ce restaurant est censé avoir de la très bonne nourriture.
20. Je veux une belle et grande maison dans laquelle mes enfants puissent vivre.
21. La chatte que tu as nourrie hier était celle de ma voisine.
22. Le nombre de fumeuses en France ne cesse d’augmenter chaque année.
23. Gabriel, en épousant sa patronne, a fait d’une pierre deux coups.
24. N’êtes-vous pas fatigués après ce voyage en voiture de trois jours?
25. Nous aurions dû faire des réservations avant d’aller au théâtre.
26. Prenons deux semaines pour visiter New York pendant les vacances d’été !
27. Qu’allez-vous faire demain soir après lui avoir dit la vérité ?
28. Est-ce qu’elle vient de finir de peindre l’intérieur de son appartement?
29. La personne avec qui je sortais n’avait pas un grand sens de l’humour.
30. Elle commande uniquement des plats de viande et ne mange jamais de légumes.
31. Vous pensez que le prix des maisons en ville va redevenir abordable?
32. Une bonne amie à moi s’occupe toujours des trois enfants de mon voisin.
33. Avant de pouvoir aller dehors, il doit finir de ranger sa chambre.
34. La police a arrêté le terrible voleur qui était grand et mince.
35. Auriez-vous la gentillesse de me passer le livre qui est sur la table ?
36. Elle a décidé de suivre des études d’arts plastiques à l’Ecole des Beaux-Arts.
37. Dès que la présidente eut signé le document, son secrétaire l’emporta.
38. Excusez-moi, savez-vous si le train de 11h30 a déjà quitté la gare?
39. Je ne me suis jamais autant amusée que lorsque je suis allé à la patinoire.
40. Ce sont eux qui l’ont organisé l’an dernier à l’Université de l’Illinois.
41. Plus elle se dépêchait dans son travail, moins elle réalisait un travail de qualité.
42. Dès que l’on aura dîné, on regardera attentivement le documentaire sur France 3.
43. Ne penses-tu pas que les réalisatrices du film souhaiteraient lire les scénarios le plus tôt
possible?
44. L’examen n’était pas aussi difficile que celui de Monsieur Durand en cours de littérature.
45. Laura et Julie, ce sont elles qui viennent de finir de décorer élégamment la chambre d’amis.
46. Il est possible que ses parents soient arrivés en France avant le début de la guerre d’Algérie.
47. On vient juste de rentrer du supermarché où les promotions étaient particulièrement
intéressantes.
48. Les étudiantes Laure et Stéphanie vont continuer à l’étudier à l’Université de Montréal.
49. Marie, prenez votre courage à deux mains et vous verrez que cet entretien passera comme
une lettre à la poste!
50. Les étudiants sortant de l’université avec un Master en poche ont plus de chance de trouver
un travail que les autres.
(as in the original French test). The total wait time between the auditory stimulus and the
onset of sentence repetition was, therefore, 3.25s (as in the original French EIT study).
8. As in the Spanish EIT study, there were no breaks between trials.
9. As both EITs were administered as self-paced tests, the maximum recording time was esti-
mated based on the formula by Solon et al. (2019; see supplementary materials) and set to
19 s (19,000 ms), rounded based on the calculations below.
Spanish sentence #36 (Acabamos de volver del supermercado donde las ofertas eran muy
interesantes) = 27 syllables, 6.248 s (the longest sentence recorded by a native speaker)
27 syllables = 6.248 s → native speaker time
7 syllables = 6.248 + 2 → nonnative speaker time
27 syllables = (6.248 + 2) + (20 syllables × 0.5) = 8.248 + 10 = 18.248 s → max. recording time
for nonnative speakers
French sentence #49 (Marie, prenez votre courage à deux mains et vous verrez que cet
entretien passera comme une lettre à la poste!) = 28 syllables, 6.238 s (the longest sentence
recorded by a native speaker)
28 syllables = 6.238 s → native speaker time
7 syllables = 6.238 + 2 → nonnative speaker time
28 syllables = (6.238 + 2) + (21 syllables × 0.5) = 8.238 + 10.5 = 18.738 s → max. recording time
for nonnative speakers
10. Finally, for both the French and Spanish tests, responses were scored with the rubric
developed by Solon et al. (2019).
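The recording-time arithmetic in note 9 can be read as one rule: the native-speaker duration of the longest sentence, plus 2 s, plus 0.5 s for every syllable beyond 7. The sketch below encodes that reading; it is reconstructed from the two worked examples, not copied from Solon et al.’s (2019) materials, and the function name, parameter names, and the clamp for sentences under 7 syllables are assumptions. Rounding up to whole seconds is also assumed; either rounding direction gives 19 s here.

```python
import math


def max_recording_time(native_seconds: float, syllables: int,
                       base_syllables: int = 7, base_extra: float = 2.0,
                       per_syllable: float = 0.5) -> float:
    """Hypothetical reading of the worked example: native time + 2 s + 0.5 s per syllable over 7."""
    # Assumption: sentences at or below 7 syllables get no per-syllable allowance.
    extra_syllables = max(0, syllables - base_syllables)
    return native_seconds + base_extra + per_syllable * extra_syllables


spanish = max_recording_time(6.248, 27)    # Spanish sentence #36 -> about 18.248 s
french = max_recording_time(6.238, 28)     # French sentence #49 -> about 18.738 s
limit_s = math.ceil(max(spanish, french))  # one shared limit, rounded up to whole seconds
limit_ms = limit_s * 1000                  # value entered in the platform, in milliseconds
```

Taking the maximum of the two languages yields a single 19,000 ms window that accommodates the longest stimulus in either test.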
Spanish C‑test
(Riggs & Maimone, 2018)
Introduction
In this language test, you will be presented with short Spanish texts in which parts of words are
deleted. The deletions correspond to the final portions of the words. Please do your best to fill
in the missing part of the word.
Complete the words as accurately as possible, paying attention to the spelling and gram-
matical features like accents or agreement in gender and number.
You may put a zero in the blank if you do not know the answer and do not want to guess.
There will be a total of 5 texts, each taking about 3 to 5 minutes to complete. Please try to finish
each text in under 6 minutes.
Main part
Below you will be presented with short Spanish texts in which parts of words are deleted. The
deletions correspond to the final portions of the words. Please do your best to fill in the missing
part of the word. Complete the words as accurately as possible, paying attention to the spelling
and grammatical features like accents or agreement in gender and number. You may put a zero
in the blank if you do not know the answer and do not want to guess. There will be a total of 5
texts, each taking about 3 to 5 minutes to complete. Please try to finish each text in under 6 min-
utes.
Spanish accents (if you do not have a Spanish keyboard): á, é, í, ó, ú, ñ, ü.
Example
On Sunday, the weather was beautiful, and we went for a walk.
On Monday, it was raining, and we stay at home.
French C‑test
(Counsell, 2018)
Introduction
In this language test, you will be presented with short French texts in which parts of words are
deleted. The deletions correspond to the final portions of the words. Please do your best to fill
in the missing part of the word.
Complete the words as accurately as possible, paying attention to the spelling and gram-
matical features like accents or agreement in gender and number. Words with hyphens or apos-
trophes like “celui-ci” or “l’ami” count as one word.
You may put a zero in the blank if you do not know the answer and do not want to guess.
There will be a total of 5 texts, each taking about 3 to 5 minutes to complete. Please try to finish
each text in under 6 minutes.
Main Part
Below you will be presented with short French texts in which parts of words are deleted. The
deletions correspond to the final portions of the words. Please do your best to fill in the missing
part of the word. Complete the words as accurately as possible, paying attention to the spelling
and grammatical features like accents or agreement in gender and number. Words with hyphens
or apostrophes like “celui-ci” or “l’ami” count as one word. You may put a zero in the blank if
you do not know the answer and do not want to guess. There will be a total of 5 texts, each tak-
ing about 3 to 5 minutes to complete. Please try to finish each text in under 6 minutes.
French accents (if you do not have a French keyboard): é, à, è, ù, â, ê, î, ô, û, ç, ë, ï, ü.
Example
On Sunday, the weather was beautiful, and we went for a walk.
On Monday, it was raining, and we stay at home.
RQ1a. The assumptions for paired samples t-tests by language were satisfactorily met (i.e.,
the dependent variable of proficiency gain scores was continuously scaled; the distrib-
ution of the differences in proficiency gain scores followed the normal curve and con-
tained no extreme univariate outliers; the independent variable of test mode – oral vs.
written – consisted of categorical data from two related groups).
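A paired-samples t-test works on within-person differences, which is why the assumption above concerns the distribution of the differences rather than of each variable separately. Below is a minimal sketch of the t statistic under that definition; the function name and toy inputs are hypothetical, and the authors’ software is not stated. For a p-value, `scipy.stats.ttest_rel(second, first)` computes the same statistic on `second - first`.

```python
import math
from statistics import mean, stdev


def paired_t(first: list[float], second: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom.

    first and second must hold scores for the same learners in the same order
    (e.g., hypothetical oral and written gain scores per participant).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # The test is computed on the within-person differences, not the raw scores.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1
```

With `first = [1, 2, 3, 4, 5]` and `second = [2, 4, 3, 6, 7]`, the differences are `[1, 2, 0, 2, 2]`, so t = 1.4 / 0.4 = 3.5 on 4 degrees of freedom.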
RQ1b. There were no major violations of assumptions for the two-sample t-tests. For the EIT
gains in French vs. Spanish, the dependent variable was approximately normally distributed
for each language group, but Levene’s test was statistically significant, suggesting that
the variances were not homogeneous; nonetheless, the two language groups were roughly
equal in size, which makes the two-sample t-test relatively robust to unequal population
variances. For the C‑test gains in French vs. Spanish, the dependent variable was,
again, approximately normally distributed for each language group, and Levene’s test
was not statistically significant (i.e., equal variances assumed). However, univariate
outlier analysis revealed two extreme outliers (|z| > 3.29) on the EIT gains variable
and six additional extreme outliers on the C‑test gains variable. A close inspection of
these scores did not indicate any red flags in participants’ performance. Therefore, the
analyses were conducted twice, with and without outliers, to allow for comparisons.
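The |z| > 3.29 criterion and the “with and without outliers” comparison can be sketched as follows. Computing z from the sample mean and the n − 1 standard deviation is an assumption, since the text does not say which standard deviation was used; the function names are hypothetical.

```python
from statistics import mean, stdev

Z_CUTOFF = 3.29  # extreme-outlier criterion stated in the text


def flag_extreme_outliers(values: list[float], cutoff: float = Z_CUTOFF) -> list[int]:
    """Indices of values whose absolute z-score exceeds the cutoff."""
    m, s = mean(values), stdev(values)  # assumption: sample (n - 1) SD
    if s == 0:
        return []
    return [i for i, v in enumerate(values) if abs((v - m) / s) > cutoff]


def with_and_without(values: list[float], cutoff: float = Z_CUTOFF):
    """Return the full data and a copy without flagged outliers, for running an analysis twice."""
    flagged = set(flag_extreme_outliers(values, cutoff))
    kept = [v for i, v in enumerate(values) if i not in flagged]
    return list(values), kept
```

Note that with small samples a single value cannot reach |z| > 3.29 at all (the maximum possible sample z is (n − 1)/√n), so the criterion only starts to bite at roughly n ≥ 13.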
RQ2a. To meet the assumptions for Pearson correlations, all extreme outliers (|z| > 3.29)
were removed from the variables of interest (i.e., 6 from the Logins and C‑test gains
variables, 4 from the Sessions, Minutes, Level reviews, Skill practice, and Tests
variables, 2 from the Lessons and EIT gains variables, and 1 from the Stories variable)
as they were found to affect the correlation estimates. The assumption of linearity
was satisfied as indicated by the matrix scatterplot. Q-Q plots and histograms sug-
gested minor deviations from normality. Therefore, bootstrapped Pearson correla-
tions (based on 1,000 samples) with bias-corrected and accelerated confidence
intervals were performed (final N = 233).
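The text reports bootstrapped Pearson correlations with bias-corrected and accelerated (BCa) intervals from 1,000 resamples but does not name the software. The sketch below is a generic textbook BCa construction (a bias correction z0 from the bootstrap distribution plus a jackknife acceleration a), not a reproduction of the authors’ procedure; the function names, the default seed, and the nearest-rank percentile lookup are assumptions, so endpoints will differ slightly from any particular package. SciPy users could instead call `scipy.stats.bootstrap` with `method="BCa"`.

```python
import random
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson r, or None when either variable has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def bca_pearson_ci(x, y, n_boot=1000, alpha=0.05, seed=1):
    """Return (r, lower, upper): Pearson r with a BCa bootstrap interval."""
    x, y = list(x), list(y)
    n = len(x)
    r_hat = pearson_r(x, y)
    rng = random.Random(seed)

    # 1. Resample participants (x-y pairs) with replacement.
    boot = []
    while len(boot) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples with zero variance
            boot.append(r)
    boot.sort()

    # 2. Bias correction z0 from the share of bootstrap r values below r_hat.
    norm = NormalDist()
    share = sum(r < r_hat for r in boot) / n_boot
    share = min(max(share, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(share)

    # 3. Acceleration a from leave-one-out (jackknife) estimates.
    jack = [pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    jack = [j for j in jack if j is not None]
    jbar = mean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    # 4. Shift the nominal percentiles, then read them off the sorted resamples.
    def adjusted(q):
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def nearest_rank(p):
        k = round(p * (n_boot - 1))
        return boot[min(max(k, 0), n_boot - 1)]

    return r_hat, nearest_rank(adjusted(alpha / 2)), nearest_rank(adjusted(1 - alpha / 2))
```

Resampling whole participants keeps each learner’s pair of values together, and the BCa adjustment matters most when r is near ±1, where the sampling distribution of r is skewed.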
RQ2b. Prior to performing multiple regression analyses, all extreme univariate outliers were
removed from the variables of interest. The strongest predictors were chosen based
on correlational analyses (see RQ2a). However, in the model predicting EIT gains,
the Sessions and Logins variables were highly correlated. To ensure the absence of
multicollinearity, the Sessions variable was removed from the model because it had
a weaker correlation with EIT gains than the Logins variable. Following the removal
of 10 multivariate outliers on three continuous predictor variables in the model pre-
dicting EIT gains and the removal of 19 multivariate outliers on two continuous pre-
dictor variables in the model predicting C‑test gains, the assumptions of linearity;
absence of multicollinearity; absence of autocorrelation; and normality, linearity, and
homoscedasticity of residuals were met.
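The rule applied to Sessions and Logins (when two predictors are highly correlated, drop the one with the weaker correlation with the outcome) can be sketched as a small helper. The text gives no numeric cutoff for “highly correlated”, so the .80 default below is a hypothetical choice, as are the function and argument names; ties drop the second predictor of the pair.

```python
def prune_collinear(pred_corr: dict[tuple[str, str], float],
                    outcome_corr: dict[str, float],
                    threshold: float = 0.8) -> list[str]:
    """Drop the weaker member of each highly correlated predictor pair.

    pred_corr: correlations between pairs of predictors, keyed by (a, b).
    outcome_corr: each predictor's correlation with the outcome.
    threshold: hypothetical cutoff for "highly correlated" (not given in the text).
    """
    kept = set(outcome_corr)
    # Handle the most strongly correlated pairs first.
    for (a, b), r in sorted(pred_corr.items(), key=lambda kv: -abs(kv[1])):
        if a in kept and b in kept and abs(r) >= threshold:
            weaker = a if abs(outcome_corr[a]) < abs(outcome_corr[b]) else b
            kept.discard(weaker)
    return sorted(kept)
```

Processing pairs from strongest to weakest correlation means a predictor already removed cannot trigger a second removal elsewhere.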
RQ3. The assumptions for Pearson correlations between Duolingo app usage variables (i.e.,
frequency, duration, and intensity) and individual differences (i.e., L2 grit and
motivation) were satisfied after removing extreme outliers (|z| > 3.29) from the variables
of interest. The inspection of the matrix scatterplot supported the assumption of lin-
earity. To account for occasional deviations from normality, which were indicated by
Q-Q plots and histograms, bootstrapped Pearson correlations (based on 1,000 sam-
ples) with bias-corrected and accelerated confidence intervals were performed (final
N = 260).
RQ4a. Concerning the assumptions for Pearson correlations, first, eight extreme outliers
(|z| > 3.29) were removed from the gains variables (i.e., two from the EIT and six from
the C‑test gains), and four outliers were eliminated from L2 grit consistency of inter-
est because they were found to affect the correlation values. A matrix scatterplot did
not indicate any violations of linearity. To account for minor deviations from normal-
ity (as suggested by Q-Q plots and histograms), bootstrapped Pearson correlations
(based on 1,000 samples) with bias-corrected and accelerated confidence intervals
were performed (final N = 248).
RQ4b. To check the assumptions for multiple regression analyses, first, all extreme univariate
outliers were removed from the variables of interest (see RQ4a). Five multivariate out-
liers on three continuous predictor variables were removed as well. All the assump-
tions of linearity, the absence of multicollinearity, the absence of autocorrelation, and
normality, linearity, and homoscedasticity of residuals were met.
Luke Plonsky
Northern Arizona University
United States
lukeplonsky@gmail.com
Co-author information
Ekaterina Sudina
East Carolina University
sudinae22@ecu.edu
Publication history