PHYSICAL REVIEW PHYSICS EDUCATION RESEARCH 21, 010103 (2025)

Editors' Suggestion

Applying cognitive diagnostic models to mechanics concept inventories


Vy Le,1 Jayson M. Nissen,2 Xiuxiu Tang,3 Yuxiao Zhang,3 Amirreza Mehrabi,4
Jason W. Morphew,4 Hua Hua Chang,3 and Ben Van Dusen1

1School of Education, Iowa State University, Ames, Iowa 50011, USA
2Nissen Education and Research Design, Monterey, California 93940, USA
3College of Education, Purdue University, West Lafayette, Indiana 47907, USA
4School of Engineering Education, Purdue University, West Lafayette, Indiana 47907, USA

(Received 13 March 2024; accepted 25 July 2024; published 21 January 2025)

In physics education research, instructors and researchers often use research-based assessments (RBAs)
to assess students’ skills and knowledge. In this paper, we support the development of a mechanics
cognitive diagnostic to test and implement effective and equitable pedagogies for physics instruction.
Adaptive assessments using cognitive diagnostic models provide significant advantages over fixed-length
RBAs commonly used in physics education research. As part of a broader project to develop a cognitive
diagnostic assessment for introductory mechanics within an evidence-centered design framework, we
identified and tested the student models of four skills that cross content areas in introductory physics: apply
vectors, conceptual relationships, algebra, and visualizations. We developed the student models in three
steps. First, we based the model on learning objectives from instructors. Second, we coded the items on
RBAs using the student models. Finally, we tested and refined this coding using a common cognitive
diagnostic model, the deterministic inputs, noisy “and” gate model. The data included 19 889 students who
completed either the Force Concept Inventory, Force and Motion Conceptual Evaluation, or Energy and
Momentum Conceptual Survey on the LASSO platform. The results indicated a good to adequate fit for the
student models with high accuracies for classifying students with many of the skills. The items from these
three RBAs do not cover all of the skills in enough detail; however, they will form a useful initial item bank
for the development of the mechanics cognitive diagnostic.

DOI: 10.1103/PhysRevPhysEducRes.21.010103

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

I. INTRODUCTION

Since the development of the Force Concept Inventory (FCI), research-based assessments (RBAs) have played an important role in shaping the landscape of physics education research [1,2]. RBAs have provided instructors and researchers with empirical evidence about how students learn and change throughout courses [1,3]. Researchers used data from RBAs to assess the impact of curricular and pedagogical innovations [1]. RBAs also play a central role in documenting inequities in physics courses before and after instruction [4,5]. In previous studies, researchers have primarily used the data from the RBAs as summative assessments to evaluate the effectiveness of a course [5-7].

Although some instructors use RBAs as formative assessments to inform their instruction, such as creating groups with diverse content knowledge [8,9], two shortcomings of existing RBAs hamper their use as formative assessments: (i) a lack of easily actionable information and (ii) a lack of timely information [3]. Examining overall RBA scores on student pretests can inform an instructor how well prepared a group of students is. Still, the overall RBA scores do not help instructors identify the specific skills students need to gain to be successful. Instructors and researchers also examine student gains in scores from the first (pretest) to the last (post-test) week of class. While this is a useful measure of the impact of instruction, it is an inherently retrospective activity that cannot inform instruction throughout a course.

To address the shortcomings of existing RBAs, we are developing the mechanics cognitive diagnostic (MCD). The MCD is a cognitive diagnostic (CD) computerized adaptive testing (CAT) assessment [10]. CD-CATs are adaptive assessments that can cover the specific contents and skills an instructor needs and wishes to assess. CDs assess which skills a student has or has not mastered [11]. CATs can adapt to students' proficiency level and skill mastery profile, making assessment individualized and more efficient. These features allow an instructor to administer a CD-CAT as a formative assessment throughout a semester. The MCD will provide instructors with student-level and course-level assessments of student content knowledge and skill acquisition to help tailor instruction to students' needs.




To support the development of the MCD, we investigated the skills assessed by three RBAs commonly used in introductory college mechanics courses [1]. This research develops the models for the student skills and the evidence for assessing those skills as a component of the larger development of the MCD. The MCD will leverage this information to provide instructors with timely and actionable formative assessments.

II. RESEARCH QUESTION

To support the development of the MCD to measure skills across introductory mechanics content areas, we developed and applied a model of four skills to three commonly used RBAs for introductory mechanics courses. To this end, we ask the following research question:
• What skills and content areas do three RBAs for introductory mechanics cover?

III. DEFINITIONS

To support readers' interpretation of our research, Table I includes a selection of terms and their definitions.

IV. LITERATURE REVIEW

Many physics education researchers and instructors use existing fixed-length RBAs. PhysPort [25] and the LASSO platform [26] provide lists and resources of these RBAs. Initially, instructors administered these RBAs with paper and pencil, but the administration is moving to online formats [27]. This move to online data collection has led to the development of CATs for introductory physics that have advantages over fixed-length tests. In this section, we discuss RBAs in introductory mechanics, options for administering RBAs online, CAT broadly, and the application of CAT to RBAs in physics.

A. RBAs in introductory mechanics

PhysPort [28] provides an extensive list of RBAs for physics and other extensive pedagogical resources. PhysPort, however, does not administer assessments online. RBA developers and researchers have instead often relied on Qualtrics or the LASSO platform [26,27] to administer the RBAs they develop or use online. Administering RBAs online allows instructors to assess students in class or outside of class to save class time, automatically analyze the collected data, and aggregate the data for research purposes [29].

PhysPort describes 117 RBAs [28], with 16 RBAs for introductory mechanics. Each RBA targets content areas and skills important for physics learning. The titles of each RBA often state the focus of the RBAs. Our study analyzed data from three RBAs because we had access to enough data for the analysis in this paper through the LASSO database. The Force Concept Inventory (FCI) [30] focuses on conceptual knowledge of forces and kinematics. The Force and Motion Conceptual Evaluation (FMCE) [31] provides similar coverage but has four energy questions. The Energy and Momentum Conceptual Survey (EMCS) [32] covers exactly what the name states.

TABLE I. Definitions of terms.

Term—Definition
Computerized adaptive testing (CAT)—Administered on computers, the test adaptively selects appropriate items for each person to
match student proficiency [12–14].
Proficiency—“…the student’s general facility with answering the items correctly on the assessment under consideration” [15]. Higher
proficiency increases the probability of answering assessment items correctly. Different fields use different terms for proficiency, such
as skill, ability, latent trait, and omega.
Skills—A latent attribute that students need to master to answer items correctly and that cuts across content areas [13,16,17].
Q-matrix—A Q-matrix, or “question matrix,” is a binary matrix that maps the relationship between test items and the underlying skills
they measure. Each row represents a test item, and each column represents a specific skill. An entry of 1 in the matrix indicates that a
particular skill is required to answer the corresponding test item correctly, while a 0 indicates that the skill is not required.
Cognitive diagnostic (CD) assessment—An assessment method that evaluates students on specific skills to determine mastery. In
contrast to traditional assessment methods that measure students on a single proficiency, CD provides diagnostic information on
students’ skill strengths and weaknesses to support personalized educational strategies [18,19].
Classification accuracy—The agreement between observed and true skill classifications. In practice, this is calculated using the expected
skill classifications rather than the true classifications, which is detailed in an example around Eqs. (4) and (6) in Ref. [20].
Deterministic inputs, noisy “and” gate (DINA) model—A cognitive diagnostic model assuming that a student must master all the
required skills to solve an item correctly. The absence of any required skills cannot be compensated by the mastery of others. This
model operates within a binary framework, categorizing each skill as either mastered or not mastered [19,21–23].
Evidence-centered design—A framework for developing educational assessments based on establishing logical, evidence-based
arguments [24].
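To make the Q-matrix definition above concrete, here is a small illustrative example in R (the analysis environment used later in this paper); the items and their skill assignments are invented for illustration and are not taken from the assessments analyzed here.

```r
# Toy Q-matrix: three hypothetical items coded against the four skills used in
# this paper (apply vectors, conceptual relationships, algebra, visualizations).
Q <- matrix(c(1, 1, 0, 0,   # item 1 requires apply vectors and conceptual relationships
              0, 1, 0, 1,   # item 2 requires conceptual relationships and visualizations
              0, 0, 1, 0),  # item 3 requires only algebra
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("item_", 1:3),
                            c("vectors", "conceptual", "algebra", "visualizations")))
Q
# Under the DINA model, a student is expected to answer an item correctly
# (up to slip and guess noise) only if their binary mastery profile has a 1
# for every skill marked with a 1 in that item's row.
```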


Other assessment names also portray skills or content areas of interest to physics education: the Test of Understanding Graphs in Kinematics, the Test of Understanding Vectors in Kinematics, and the Rotational Kinematics Inventory. These names imply that graphs and vectors play an important role in many physics courses and that many physics courses cover rotation. As discussed below, cognitive diagnostics allow for incorporating additional items to cover new topics throughout their lifetime.

B. Cognitive diagnostic—Computerized adaptive testing

Computerized adaptive testing (CAT) uses item response theory to establish a relationship between the student's proficiency level and the probability of their success in answering test items [13]. CAT selects items based on student responses to the preceding items to estimate the student's proficiency and then aligns each item's difficulty with the individual's proficiency [13]. This continuous adaptation of item difficulty to student proficiency ensures that the test remains challenging and engaging for students throughout its duration and provides a more precise estimation of student proficiency than paper-and-pencil assessment [12-14]. Compared to paper-and-pencil assessment methods, CAT requires fewer items to accurately measure students' proficiency while controlling the selected items for content variety [33]. Chen et al. [34] show that CAT supports test security by drawing from a large item bank to control for item overexposure and that CAT can use pretest proficiency estimates for item selection and proficiency estimation to maximize test efficiency.

Combining cognitive diagnostic (CD) models and CAT improves the assessment process and categorizes students based on their mastery of distinct skills associated with each item. CD models aim to estimate how the students' cognitive proficiency relates to the specific skills or contents necessary to solve individual test items [13,35], with skill as a fundamental cognitive unit or proficiency that students need to acquire and master to answer certain items [16,17]. The deterministic inputs, noisy "and" gate (DINA) model is a CD model that facilitates the assessment of skill mastery profiles and the estimation of item parameters [36]. The DINA model leverages a Q-matrix to test the relationships between items and the skills requisite to answer them [37], thereby providing a structured framework for monitoring the mastery levels of distinct proficiencies [37]. The DINA model has been applied to evaluate students' mastery of various skills, including problem solving [38], computational thinking [17], and domain-specific knowledge [37].

C. CAT in physics education

We are unaware of any CD assessments in physics. Researchers have, however, conducted studies on the effectiveness of CAT using item response theory to evaluate students' proficiencies [12,39]. One such study by Istiyono et al. [40] utilized CAT to assess the physics problem-solving skills of senior high school students, revealing that most students' competencies fell within the medium-to-low categories. Morphew et al. [12] explored the use of CAT to evaluate physics proficiency and identify the areas where students needed to improve when preparing for course exams in an introductory physics course. Their studies showed that students who used the CAT improved their performance on subsequent exams. In another study, Yasuda et al. [41] also indicated that CAT can reduce testing time through shorter test lengths while maintaining the accuracy of the test measurement and administration. Yasuda et al. [39] examined item overexposure in FCI-CAT, employing pretest proficiency for item selection. This shortened test duration while maintaining accuracy and enhanced security by reducing item content memorization and sharing among students.

V. THEORETICAL FRAMEWORK

We drew on evidence-centered design [24] to inform our development of the MCD. Evidence-centered design was first applied in the high-stakes context of the graduate record examinations [24,42] and has also been effectively utilized in physics education research for the development of RBAs [43,44]. We used three core premises in the evidence-centered design framework [24]:
1. Assessment developers need content and context expertise to create high-quality items. In this analysis, we focused on three RBAs developed by physics education researchers—FCI, FMCE, and EMCS.
2. Assessment developers use evidence-based reasoning to evaluate students' comprehension and identify misunderstandings accurately. In this analysis, we developed a Q-matrix that identified which underlying skills were required to correctly answer each item (more details in Sec. VI B).
3. When creating assessments, developers must consider various factors such as resource availability, limitations, and usage conditions. For instance, the LASSO platform supports multiple-choice items and needs web-enabled devices, but it conserves class and instructor time.

Our work used the conceptual assessment framework provided by the evidence-centered design framework with its five models [24] (shown in Fig. 1) to guide assessment development. The models and their connections to our work are as follows:


FIG. 1. An evidence-centered design framework for creating the mechanics cognitive diagnostic (MCD). This paper focuses on the
student models and evidence models. The student models determine the skills and content areas that our assessment aims to measure.
The evidence models apply the DINA model to the multiple-choice questions (task model) students answer to measure students’ skills.
Our CD-CAT algorithm will determine which items to ask students, who will take the assessment online through the LASSO platform.

1. Student models focus on identifying one or more variables directly relevant to the knowledge, skills, or proficiencies an instructor wishes to examine. In this project, a qualitative analysis (see Sec. VI B) indicated that four skills (i.e., apply vectors, conceptual relationships, algebra, and visualizations) and four content areas (i.e., kinematics, forces, energy, and momentum) would be optimum for our MCD.
2. Evidence models include evidence rules and measurement models to provide a guide to update information regarding a student's performance. The evidence rules govern how observable variables summarize a student's performance on individual test items. The measurement model transforms the student responses into the student skill profile. In this project, the evidence rules were binary, right or wrong scores, and the measurement model is the DINA model, which includes the Q-matrix.
3. Task model describes what students do to provide input to the evidence models. In this project, the task model was multiple-choice questions.
4. Assembly model describes how the three models above, including the student models, evidence models, and task models, work together to form the psychometric frame of the assessment. In the broader project, we developed a CD-CAT algorithm that integrated models 1-3 for the MCD.
5. Delivery model describes integrating all the models required for evaluation. We used the online LASSO platform [29,45] in this project.

In this paper, we focus on the student models and evidence models (models 1-2). These models are instrumental in aligning our analysis with the research question. By evaluating the student models, we gain insights into the range of competencies RBAs are designed to assess. Similarly, through the evidence models, we understand how these assessments capture and represent student understanding in various skills and content areas.

VI. MATERIALS AND METHODS

To answer the research question, we employed a mixed methods approach using qualitative coding to identify the skills and content areas to measure for the student models. Subsequent quantitative analyses drove the testing of the evidence models and iterative improvements of the student models. We first used artifacts from courses to build the student models of skills that cut across the content of introductory mechanics courses. We then identified RBAs with sufficient data available through the LASSO platform and coded each item for the skills it assessed. Finally, we used an iterative process that applied the DINA model to build the evidence models and to improve our definitions of the skills and the coding of the skills on each item. In this iterative process, the DINA model suggested changes to the item skill codes initially made by content experts. The suggested changes were accepted or rejected by content experts. We then ran a final DINA model on our revised codes.

A. RBAs data collection and cleaning

Our analysis examined student responses on three RBAs: the FCI (30 items, 12 932 students), FMCE (47 items, 5510 students), and EMCS (25 items, 1447 students). Our dataset came from the LASSO platform [26,29,45]. LASSO provided post-test data from 19 889 students across the three assessments. We removed assessments completed in less than 5 min and assessments with missing answers.

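As a concrete illustration of these cleaning rules, the following R sketch filters a toy data frame standing in for a LASSO export; the column names (duration_min, item_01, ...) are assumptions for illustration, not the platform's actual export schema.

```r
# Toy stand-in for a LASSO export: one row per student attempt.
lasso_raw <- data.frame(
  duration_min = c(12, 3, 25, 18),
  item_01      = c(1, 0, 1, NA),
  item_02      = c(0, 1, 1, 1)
)

item_cols <- grep("^item_", names(lasso_raw), value = TRUE)

keep <- lasso_raw$duration_min >= 5 &            # drop attempts completed in under 5 min
  rowSums(is.na(lasso_raw[item_cols])) == 0      # drop attempts with missing answers

responses <- lasso_raw[keep, item_cols]          # rows 1 and 3 survive the two rules
```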

TABLE II. Definition of the skills in the FCI, FMCE, and EMCS assessments.

Skills Definition
Apply vectors Item requires manipulating vectors in more than one dimension or has a change in sign
for a 1D vector quantity.
Conceptual relationships Item requires students to identify a relationship between variables and/or the situations
in which those relationships apply.
Algebra Item requires students to reorganize one or more equations. This goes beyond recognizing
the standard forms of equations.
Visualizations Item requires extracting information from or creating formal visualizations
such as xy plots, bar plots, or line graphs.

TABLE III. Definition of the content areas in the FCI, FMCE, and EMCS assessments.

Content areas Definition


Kinematics Items concerning the motion of objects without reference to the forces that cause the motion.
Forces Free body diagram and Newtonian laws.
Energy Conservation of energy, work, setup system, and the relationship between force and potential energy.
Momentum Conservation of momentum and impulse.

B. Qualitative data analysis

We developed an initial list of skills and content areas covered in physics courses by coding learning objectives from courses using standards-based grading. We focused on standards-based grading because instructors explicitly list the learning objectives students should master during the course [46]. Initially, we coded a set of skills based on both the standards and the items on the RBAs; the skills included apply vectors, conceptual understanding, algebra, visualizations, and definitions. We discarded definitions as a skill because it represents a memorized response that the other skills covered in greater depth by asking students to apply or understand the concept. And, we are not aware of RBAs for introductory physics that ask definition questions. Table II lists the four skills and their definitions.

We initially coded content areas at a finer grain size to match the standards-based grading learning objectives, e.g., kinematics was split into four areas across two variables: 1D or 2D and constant velocity or constant acceleration. These content areas, however, were too fine grained to develop an assessment with a reasonable length for students to complete or a realistic size item bank. Therefore, we simplified the content codes to kinematics, forces, energy, and momentum for these three RBAs. Table III lists the four content areas covered by these three RBAs and their definitions.

Based on this initial set of codes we developed, we coded each item for its relevant skills and content areas. Our coding team included three researchers with backgrounds in physics and teaching physics. Each item was independently coded by at least two team members. The three coders then compared the coding for the items and reached a consensus on all items. This consensus coding of the three assessments provided one of the inputs into the DINA analysis.

C. Quantitative data analysis

1. DINA model

The deterministic inputs, noisy "and" gate (DINA) model is the foundational cognitive diagnostic model [21,22]. The DINA model is used to analyze responses to test items and determine the underlying skills that students possess [19]. A Q-matrix [47] (acting as a deterministic input) defines the relationship between test items and the required skills, which we defined in Table I. Each row of the Q-matrix corresponds to a test item, and each column corresponds to a skill. Q-matrix entries are binary, indicating whether a skill is needed for a specific item. The DINA model produces a skill profile for each student, represented as a binary vector, indicating whether they have mastered each skill. For example, a profile of [1, 0, 1, 0] means the student has mastered skills 1 and 3 but not skills 2 and 4. The DINA model assumes a student needs to have mastered all the required skills for a particular item to answer it correctly. If a student lacks even one required skill, the model assumes the student will answer the item incorrectly [23]. The model incorporates a probabilistic component (the noisy "and" gate) to account for real-world inconsistencies with two complementary parameters: slip (s) and guess (g). Slip is the probability that a student who has mastered all the required skills still answers the item incorrectly due to carelessness, distraction, or error. Guess is the probability that a student who has not mastered all the required skills answers the item correctly by guessing or other factors. Slip and guess add a stochastic element to account for the noise in real testing scenarios, where students might guess or make unexpected errors. For each item, the probability that a student answers correctly is determined by whether they have the required skills and by the slip and guess parameters. If the student has all required skills, then P(correct) = 1 - s. If the student does not have all required skills, then P(correct) = g. The model estimates each student's skill profile based on their responses, the Q-matrix, and the slip and guess parameters for each item. We used the DINA model because the model fits indicated it was not necessary to use a more complex model like the generalized DINA model.
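A minimal R sketch of the item response probability just described is given below; the slip and guess values and the item's skill requirements are illustrative assumptions, not calibrated parameters from this study.

```r
# DINA probability of a correct response for one student on one item.
#   alpha: the student's binary skill mastery profile
#   q:     the item's row of the Q-matrix (the skills the item requires)
#   s, g:  the item's slip and guess parameters
p_correct <- function(alpha, q, s, g) {
  eta <- as.numeric(all(alpha[q == 1] == 1))  # 1 only if every required skill is mastered
  eta * (1 - s) + (1 - eta) * g               # P(correct) = 1 - s if eta = 1, otherwise g
}

# Example: an item requiring apply vectors and conceptual relationships.
q <- c(1, 1, 0, 0)
p_correct(alpha = c(1, 1, 0, 0), q = q, s = 0.10, g = 0.20)  # 0.90: all required skills mastered
p_correct(alpha = c(1, 0, 1, 1), q = q, s = 0.10, g = 0.20)  # 0.20: one required skill missing
```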


In this study, we used the DINA model to analyze students' response data for each of the three RBAs to refine our item codes further and calibrate each item's slip and guess parameters. The DINA model analyses also generated skill mastery profiles for each student, which were not the focus of the research question in this paper. These psychometric analyses were implemented using the G-DINA package [48] in the R programming environment.

RMSEA2 and SRMSR were used to assess the degree of the model-data fit. RMSEA2 is the root mean square error of approximation (RMSEA) based on the M2 statistic using the univariate and bivariate margins. RMSEA2 ranges from 0 to 1, and RMSEA2 < 0.06 indicates a good fit [49,50]. SRMSR, the standardized root mean squared residual, has acceptable values ranging between 0 and 0.8. Models with SRMSR < 0.05 can be viewed as well-fitted models, and models with SRMSR < 0.08 are typically considered acceptable [50-52]. Additionally, the skill-level classification accuracy, defined in Table I, informed the reliability and validity of the CD assessment. Classification accuracies range from 0 to 1, with values greater than or equal to 0.9 considered high [53,54] and values greater than 0.8 considered acceptable [55].

The appropriateness of the Q-matrix plays an important role in CD assessments and affects the degree of model-data fit. Inappropriate specifications in the Q-matrix may lead to poor model fit and thus may produce incorrect skill diagnosis results for students. Therefore, we needed a Q-matrix validation step in the study. The input Q-matrices for the DINA analysis for each RBA were constructed by content experts, as detailed in the prior section. In the Q-matrix validation step, detailed below, the DINA analysis further examined each Q-matrix to identify potential misspecifications in the Q-matrices.

2. Q-matrix validation

The analysis fitted the DINA model to students' post-assessment responses using the Q-matrix constructed by the three coders. The proportion of variance accounted for method [56] measured the relationships between the items and the skills specified in the provided Q-matrix. The analysis of the empirical response data suggested changes to the provided Q-matrix, which the three coders reviewed. The coders assessed the suggested modifications for how well they aligned with the definitions and revised the Q-matrix when the majority of the team agreed with the suggested changes. The refined Q-matrix was then used in subsequent CD modeling analyses.

Table IV presents a summary detailing the frequency of data-driven modifications suggested, the number adopted by the coders, and the rate of adoption for each of the three assessments under study. The FCI, for example, had 11 proposed changes of the 90 possible changes (30 items each with three possible skills), and the coders adopted 7 of these suggestions. For instance, the conceptual relationships skill was initially not considered essential for item 7. However, empirical response data suggested that this skill was required to answer item 7 correctly. Postreview, the expert panel endorsed this modification; thereby, the value in the Q-matrix corresponding to the intersection of item 7 and conceptual relationships was changed from "0" to "1." Overall, only 8.5% of the codings (26 of 306) were identified for reexamination by this analysis. Of the 26 proposed changes, 13 were adopted across the 3 assessments, yielding an overall adoption rate of 50%. This iterative approach to informing the validity of the Q-matrix avoids overreliance on either expert opinion or empirical data, harmonizing both information sources to enhance the accuracy of the Q-matrix. Table VIII (see the Appendix) shows the final coding for each RBA item across the four content areas and four skills.

TABLE IV. Q-matrix modifications and adoption rates.

          Total items   Possible changes   Suggested changes   Adopted changes   Adoption rate (%)   Change rate (%)
FCI            30              90                  11                 7                 64                 7.8
FMCE           47             141                  14                 5                 36                 3.5
EMCS           25              75                   1                 1                100                 4.0
Overall       102             306                  26                13                 50                 4.2

VII. FINDINGS

This section addresses the research question by detailing the skills and content areas measured by the three assessments, as detailed in Table VIII in the Appendix. First, we present which of the four skills the items on the three assessments measured and the number of skills the items measured. The specific models relating the items to the four skills are presented in the Appendix; see Tables IX-XI. Second, we show the content areas covered in the three assessments. Finally, we examine the skills across content areas. This structure highlights the various aspects of the items in these three assessments.

A. Skills

FCI—The FCI assessed three skills (Fig. 2). Eighteen items assessed the apply vectors skill, 17 assessed the conceptual relationships skill, 1 assessed the visualizations skill, and 0 assessed the algebra skill. The majority of items assessed a single skill. Twenty-four items (80%) assessed a single skill, 6 items (20%) assessed two skills, and 0 items assessed three skills (Table V).

FMCE—The FMCE assessed the same three skills as the FCI (Fig. 2). All 47 items assessed the conceptual relationships skill, 19 items assessed the visualizations skill, 18 items assessed the apply vectors skill, and 0 items assessed the algebra skill. The majority of items assessed multiple skills. Thirteen items (28%) assessed a single skill, while 31 items (66%) assessed two skills, and 3 items (6%) assessed three skills (Table V).

EMCS—Similar to the FCI and FMCE, the EMCS assessed the apply vectors and conceptual relationships skills (Fig. 2). The EMCS differed in that it included 2 items that assessed the algebra skill. Of the 25 EMCS items, 23 assessed the conceptual relationships skill (with items 3 and 13 both coded for energy and momentum), 5 assessed the apply vectors skill, 2 assessed the algebra skill, and 0 assessed the visualizations skill. The EMCS was the only assessment with items assessing the algebra skill. Most items assessed a single skill. Twenty items (80%) assessed a single skill, 5 items (20%) assessed two skills, and 0 items assessed three skills (Table V).

FIG. 2. The distribution of items across skills, content areas, and assessments. Note that each item can assess multiple skills. Only 2 items, 3 and 13 of the EMCS, assessed multiple content areas (i.e., energy and momentum) under the conceptual relationships skill.

TABLE V. The distribution of items across the number of skills they assess.

           Number of skills
              1           2          3
FCI       24 (80%)    6 (20%)    0 (0%)
FMCE      13 (28%)   31 (66%)    3 (6%)
EMCS      20 (80%)    5 (20%)    0 (0%)
Total     57 (56%)   42 (41%)    3 (3%)

1. DINA model fit

The analysis fitted the DINA model with the refined Q-matrix to the response data. According to the established criteria [50,57], the model demonstrated satisfactory fit (RMSEA2 < 0.05, SRMSR < 0.07) for the FCI (RMSEA2 = 0.048, SRMSR = 0.062) and EMCS (RMSEA2 = 0.028, SRMSR = 0.041), whereas the fit for the FMCE was unsatisfactory (RMSEA2 = 0.090, SRMSR = 0.110). These outcomes suggest that the model adequately represents the underlying data structure for the FCI and EMCS but might not capture the latent structure of the FMCE well.

2. DINA model classification accuracy

Table VI presents the classification accuracy [20] for each skill across the three RBAs. As discussed in the skills section, not all skills were measured by each of the RBAs; 9 of 12 were possible. For those skills that were measured, 7 of the 9 classification accuracies were high (over 0.9). The classification accuracy of visualizations for the FCI (0.79) and algebra for the EMCS (0.63) was notably lower. The lower classification accuracy reflects the lack of items measuring these skills (Fig. 2).

TABLE VI. Skill classification accuracy by assessment.

        Apply vectors   Conceptual relationships   Algebra   Visualizations
FCI         0.97                 0.96                ···          0.79
FMCE        0.96                 0.98                ···          0.91
EMCS        0.94                 0.95                0.63         ···

TABLE VII. The number of items across the content areas they assess. Only 2 items, 3 and 13 of the EMCS, assessed multiple content areas (i.e., energy and momentum) under the conceptual relationships skill.

         Total   Kinematics   Forces   Energy   Momentum
FCI        30        12         18        0         0
FMCE       47        12         31        4         0
EMCS       25         0          0       15        12
Overall   102        24         49       19        12

B. Content areas

FCI—The FCI assessed two content areas (Fig. 2 and Table VII). Eighteen items assessed forces, 12 assessed kinematics, and 0 assessed energy and momentum. All 30 items (100%) assessed a single content area.

FMCE—The FMCE assessed three content areas (Fig. 2 and Table VII). Thirty-one items assessed forces, 12 assessed kinematics, 4 assessed energy, and 0 assessed momentum. Similar to the FCI, all 47 (100%) items assessed a single content area.


EMCS—The EMCS assessed two content areas (Fig. 2 and Table VII). Fifteen items assessed energy, 12 assessed momentum, and 0 assessed kinematics and forces. Unlike the FCI and the FMCE, 23 items assessed a single content area, and 2 items (8%) assessed two content areas.

C. Skills × content areas

The distribution of skills assessed was not consistent across content areas; see Fig. 3. This inconsistency follows from several aspects of the three RBAs. The FCI and FMCE did not measure the algebra skill. The EMCS did not measure the visualizations skill. Most items came from the FMCE and FCI, which focused more on forces than kinematics. Across the three RBAs, very few items measured the apply vectors skill for energy (1) and momentum (4), even though applying vectors is central to momentum. And, very few items measured the visualizations skill for energy (2) and momentum (0).

FIG. 3. The distribution of items from the three assessments (FCI, FMCE, and EMCS) across skills and content areas. Each assessment contains a different number of items, and some items assess multiple skills and content areas.

VIII. DISCUSSION

This study supports the development of the MCD within the evidence-centered design framework by focusing on the student and evidence models (Fig. 1). For the student models, the three RBAs measured all four skills, though to different extents, across the four content areas. For the evidence models, the three RBAs assessed most of the skills with high classification accuracies. These results indicate that the combined items from the three RBAs will provide an adequate initial item bank for the further development of the MCD.

A. Student models—The four skills

The three RBAs—FCI, FMCE, and EMCS—each included items that assessed three of the four skills across two to three content areas. The three RBAs all included a majority of items that assessed the conceptual relationships skill, which follows from their conceptual focus. In addition to measuring the conceptual relationships skill, all three RBAs also included sufficient items to assess the apply vectors skill with high classification accuracies. The FMCE included sufficient items to assess the visualizations skill. These results, along with other RBAs that focus specifically on visualizations and applying vectors (e.g., Zavala et al. [58]), indicate that these three skills are common learning objectives of physics instruction.

The three RBAs did not include enough items assessing the algebra skill to inform how well that skill fits within our student models. This likely follows from these RBAs being conceptual assessments developed to refocus physics instruction from memorization and applying equations to a deeper understanding of the conceptual relationships linking the physical world. However, applying and manipulating equations was a common learning objective in the standards-based grading rubrics we used to develop our student models. Many instructors and students may want formative assessments on algebra skills to support their teaching and learning.

Most items in both the FCI and EMCS required mastery of a single skill, while most items in the FMCE needed multiple skills. Requiring multiple skills to answer an item correctly can have two effects. First, requiring mastery of more than one skill typically makes the items more difficult to answer. This is consistent with prior findings that the FMCE is more difficult than the FCI [6]. Second, multiskill items can provide different information than single-skill items, and item banks should include a mix of single- and multiskill items to pick from to maximize the information generated by each item a student answers. Combining the three assessments into a single test bank provides a more even mix of single- and multiskill items than any of these three RBAs.

B. Evidence models—Model fits and classification accuracies

The DINA model fit the FCI and EMCS well, but the fit for the FMCE was marginal. The length and difficulty of the FMCE may have driven this marginal fit. The large number of items assessing multiple skills may have also been a factor. Post hoc analyses to test these possibilities indicated that they were not major contributors to the marginal model fit of the FMCE. The additional analyses included the generalized DINA model, DINA models of the first and second half of the FMCE, and separate DINA models of the students from calculus- and algebra-based physics courses. The marginal fit likely follows from our post hoc application of our skill model to the FMCE. That the model fits two assessments well and one assessment marginally indicates that the student models of the skills are broadly applicable to physics learning and that items from these three assessments can form the initial item bank of a cognitive diagnostic.


The three RBAs had classification accuracies above 0.9 for the apply vectors and conceptual relationships skills, as shown in Table VI. This makes sense for the FMCE and FCI, given that they each had at least 17 items for each of the apply vectors and conceptual relationships skills (Fig. 2). Although the EMCS only had 5 items measuring apply vectors, the classification accuracy was still 0.94. This finding indicates that a relatively small number of items can still accurately assess a skill. The number of items measuring the algebra skill on the EMCS (2) and the visualizations skill on the FCI (1) was not sufficient to generate useful classification accuracies (< 0.8). Combining the three assessments into a single item bank should provide sufficient coverage of the apply vectors, conceptual relationships, and visualizations skills, but it will not offer enough items to assess the algebra skill. Additionally, the combined item bank will require additional items to assess the visualizations and apply vectors skills in the content areas of energy and momentum.

IX. LIMITATIONS

The DINA analysis assumes students have mastered each skill assessed by an item to answer that item correctly. A less restrictive analysis, such as the generalized DINA, that assumes some questions can be answered by only mastering a subset of skills or by students who have only partially mastered skills may provide a better fit. The three RBAs constrained the skills that the analyses could test. This was an obvious issue for the algebra skill, which was only assessed by two items on one assessment. Physics instructors also likely value and teach other skills they would want to assess, such as the ability to decompose complex problems into smaller pieces to solve, as assessed by the Mechanics Reasoning Inventory [59]. The analysis does not test the extent to which the items and assessments act differently across populations, e.g., gender, race, or type of physics course. Mixed evidence exists about the measurement invariance [60] and differential item functioning [61,62] of the FCI and FMCE. The combination of items from these three assessments administered through a cognitive diagnostic at a large scale will provide a dataset to identify and understand item differences and potential item biases between groups of students.

X. CONCLUSIONS

Combining 102 items from three RBAs into a single item bank to create a CD-CAT provides a solid foundation for building the MCD. The limited number of items assessing the algebra skill and the apply vectors and visualizations skills for energy and momentum point to these as specific areas for improvement of the item bank. Delivering the MCD online, fortunately, has the advantage of allowing for the inclusion of new items under development to fill in gaps in the item bank. The combined item bank will also improve classification accuracy by having more items to draw on. However, the high classification accuracy (0.941) for the apply vectors skill on the EMCS indicates that even just 5 items can provide a high classification accuracy. This result indicates that shorter assessments may allow for high levels of classification accuracy for skills while also using fewer questions. We plan on ensuring sufficient classification accuracy with a minimum of ten items for each content area and skill combination. This will also provide enough items to estimate student proficiency when an instructor administers a single content area and skill combination as a weekly test. Future work will add content areas for mathematics and rotational mechanics.

Using LASSO as the delivery system for the MCD provides instructors with an adaptive tool to assess students' skills and knowledge across content areas or in specific content areas. In particular, using a cognitive diagnostic for the assembly model allows instructors to design formative assessments by choosing the skills and content areas to measure. Integrating guidelines and constraints on test lengths will help instructors design accurate assessments of those skills and content areas. The cognitive diagnostic also allows flexible timing; instructors can design pretests or post-tests that cover many skills and content areas or weekly tests focused on a few skills for one content area.

For researchers, the MCD will collect longitudinal data across skills and content areas. These data can inform the development of learning progressions or skills transfer across content areas, such as applying vectors in mathematical, kinematics, and momentum content areas. Developing more items that cover multiple content areas can inform how physics content interacts, which current RBAs do not assess. Because LASSO is free for instructors, the data will likely also represent a broader cross section of physics learners [63] than physics education research has historically included [64].

ACKNOWLEDGMENTS

This research was made possible through the financial support provided by National Science Foundation Grant No. 2141847. We extend our appreciation to LASSO for their support in both collecting and sharing data for this research.

APPENDIX

The Appendix includes the coding and refined Q-matrix tables (Tables VIII-XI) for the three assessments used to conduct the DINA model analysis.


TABLE VIII. The skills and content areas for items from the FCI, FMCE, and EMCS. Note that “FCI_01” represents an abbreviation
of the assessment name and the number of the item on the assessment.

Content area Apply vectors Conceptual relationships Algebra Visualizations


Kinematics FCI_07, FCI_08, FCI_09, FCI_01, FCI_02, FCI_07, FCI_12, FCI_20, FMCE_22,
FCI_12, FCI_14, FCI_21, FCI_14, FCI_19, FCI_20, FCI_23, FMCE_23, FMCE_24,
FCI_22, FCI_23, FMCE_27, FMCE_22, FMCE_23, FMCE_24, FMCE_25, FMCE_26,
FMCE_28, FMCE_29, FMCE_25, FMCE_26, FMCE_27, FMCE_40, FMCE_41,
FMCE_41 FMCE_28, FMCE_29, FMCE_40, FMCE_42, FMCE_43
FMCE_41, FMCE_42, FMCE_43
Forces FCI_05, FCI_11, FCI_13, FCI_03, FCI_04, FCI_06, FCI_10, FMCE_14, FMCE_15,
FCI_17, FCI_18, FCI_25, FCI_15, FCI_16, FCI_24, FCI_25, FMCE_16, FMCE_17,
FCI_26, FCI_27, FCI_29, FCI_28, FMCE_01, FMCE_02, FMCE_18, FMCE_19,
FCI_30, FMCE_01, FMCE_03, FMCE_04, FMCE_05, FMCE_20, FMCE_21
FMCE_03, FMCE_04, FMCE_06, FMCE_07, FMCE_08,
FMCE_05, FMCE_06, FMCE_09, FMCE_10, FMCE_11,
FMCE_07, FMCE_08, FMCE_12, FMCE_13, FMCE_14,
FMCE_09, FMCE_10, FMCE_15, FMCE_16, FMCE_17,
FMCE_11, FMCE_12, FMCE_18, FMCE_19, FMCE_20,
FMCE_13, FMCE_20, FMCE_21, FMCE_30, FMCE_31,
FMCE_21 FMCE_32, FMCE_33, FMCE_34,
FMCE_35, FMCE_36, FMCE_37,
FMCE_38, FMCE_39
Energy EMCS_01 EMCS_01, EMCS_02, EMCS_03, EMCS_15 FMCE_44, FMCE_45
EMCS_04, EMCS_06, EMCS_08,
EMCS_09, EMCS_12, EMCS_13,
EMCS_15, EMCS_17, EMCS_20,
EMCS_22, EMCS_24, EMCS_25,
FMCE_44, FMCE_45, FMCE_46,
FMCE_47
Momentum EMCS_05, EMCS_11, EMCS_03, EMCS_05, EMCS_07, EMCS_21
EMCS_13, EMCS_23 EMCS_10, EMCS_13, EMCS_14,
EMCS_16, EMCS_18, EMCS_19,
EMCS_21

TABLE IX. The refined Q-matrix for each FCI item, represented as binary coding, with * denoting adoption changes from the suggested Q-matrix of the DINA model.

FCI item   Apply vectors   Conceptual relationships   Algebra   Visualizations
1               0                    1                   0            0
2               0*                   1                   0            0
3               0*                   1                   0            0
4               0                    1                   0            0
5               1                    0                   0            0
6               0                    1                   0            0
7               1                    1                   0            0
8               1                    0*                  0            0
9               1                    0                   0            0
10              0                    1                   0            0
11              1                    0                   0            0
12              1                    1                   0            0
13              1                    0                   0            0
14              1                    1                   0            0
15              0*                   1                   0            0
16              0                    1                   0            0
17              1                    0                   0            0
18              1                    0                   0            0
19              0                    1                   0            0
20              0                    1                   0            1
21              1                    0                   0            0
22              1                    0                   0            0
23              1                    1                   0            0
24              0                    1                   0            0
25              1                    1*                  0            0
26              1                    0                   0            0
27              1                    0*                  0            0
28              0                    1                   0            0
29              1                    0*                  0            0
30              1                    0                   0            0


TABLE X. The refined Q-matrix for each FMCE item, represented as binary coding, with * denoting adoption changes from the suggested Q-matrix of the DINA model.

FMCE item   Apply vectors   Conceptual relationships   Algebra   Visualizations
1                1                    1                   0            0
2                0*                   1                   0            0
3                1                    1                   0            0
4                1*                   1                   0            0
5                1                    1                   0            0
6                1*                   1                   0            0
7                1                    1                   0            0
8                1                    1                   0            0
9                1                    1                   0            0
10               1                    1                   0            0
11               1                    1                   0            0
12               1                    1                   0            0
13               1                    1                   0            0
14               0                    1                   0            1
15               0                    1                   0            1
16               0                    1                   0            1
17               0                    1                   0            1
18               0                    1                   0            1
19               0                    1                   0            1
20               1*                   1                   0            1
21               1                    1                   0            1
22               0                    1                   0            1
23               0                    1                   0            1
24               0                    1                   0            1
25               0                    1                   0            1
26               0                    1                   0            1
27               1                    1                   0            0
28               1                    1                   0            0
29               1                    1                   0            0
30               0                    1                   0            0
31               0                    1                   0            0
32               0                    1                   0            0
33               0                    1                   0            0
34               0                    1                   0            0
35               0                    1                   0            0
36               0                    1                   0            0
37               0                    1                   0            0
38               0                    1                   0            0
39               0                    1                   0            0
40               0                    1                   0            1
41               1*                   1                   0            1
42               0                    1                   0            1
43               0                    1                   0            1
44               0                    1                   0            1
45               0                    1                   0            1
46               0                    1                   0            0
47               0                    1                   0            0

TABLE XI. The refined Q-matrix for each EMCS item, represented as binary coding, with * denoting adoption changes from the suggested Q-matrix of the DINA model.

EMCS item   Apply vectors   Conceptual relationships   Algebra   Visualizations
1                1                    1                   0            0
2                0                    1                   0            0
3                0                    1                   0            0
4                0                    1                   0            0
5                1                    1                   0            0
6                0                    1                   0            0
7                0                    1                   0            0
8                0                    1                   0            0
9                0                    1                   0            0
10               0                    1                   0            0
11               1                    0                   0            0
12               0                    1                   0            0
13               1                    1                   0            0
14               0                    1                   0            0
15               0                    1                   1            0
16               0                    1                   0            0
17               0                    1                   0            0
18               0                    1                   0            0
19               0                    1                   0            0
20               0                    1                   0            0
21               0                    1                   1            0
22               0                    1                   0            0
23               1                    0*                  0            0
24               0                    1                   0            0
25               0                    1                   0            0

[1] Adrian Madsen, Sarah B. McKagan, and Eleanor C. Sayre, Resource letter RBAI-1: Research-based assessment instruments in physics and astronomy, Am. J. Phys. 85, 245 (2017).
[2] Jennifer L. Docktor and José P. Mestre, Synthesis of discipline-based education research in physics, Phys. Rev. ST Phys. Educ. Res. 10, 020119 (2014).


[3] Adrian Madsen, Sarah B. McKagan, Mathew Sandy Martinuk, Alexander Bell, and Eleanor C. Sayre, Research-based assessment affordances and constraints: Perceptions of physics faculty, Phys. Rev. Phys. Educ. Res. 12, 010115 (2016).
[4] Ben Van Dusen and Jayson Nissen, Equity in college physics student learning: A critical quantitative intersectionality investigation, J. Res. Sci. Teach. 57, 33 (2020).
[5] Bethany R. Wilcox and H. J. Lewandowski, Research-based assessment of students' beliefs about experimental physics: When is gender a factor?, Phys. Rev. Phys. Educ. Res. 12, 020130 (2016).
[6] Ronald K. Thornton, Dennis Kuhl, Karen Cummings, and Jeffrey Marx, Comparing the force and motion conceptual evaluation and the force concept inventory, Phys. Rev. ST Phys. Educ. Res. 5, 010105 (2009).
[7] Siera M. Stoen, Mark A. McDaniel, Regina F. Frey, K. Mairin Hynes, and Michael J. Cahill, Force concept inventory: More than just conceptual understanding, Phys. Rev. Phys. Educ. Res. 16, 010105 (2020).
[8] James T. Laverty, Amogh Sirnoorkar, Amali Priyanka Jambuge, Katherine D. Rainey, Joshua Weaver, Alexander Adamson, and Bethany R. Wilcox, A new paradigm for research-based assessment development, presented at PER Conf. 2022, Grand Rapids, MI, 10.1119/perc.2022.pr.Laverty.
[9] Nance S. Wilson, Teachers expanding pedagogical content knowledge: Learning about formative assessment together, J. Serv. Educ. 34, 283 (2008).
[10] Jacqueline Leighton and Mark Gierl, Cognitive Diagnostic Assessment for Education: Theory and Applications (Cambridge University Press, Cambridge, England, 2007).
[11] Ying Cui, Mark J. Gierl, and Hua-Hua Chang, Estimating classification consistency and accuracy for cognitive diagnostic assessment, J. Educ. Measure. 49, 19 (2012).
[12] Jason W. Morphew, Jose P. Mestre, Hyeon-Ah Kang, Hua-Hua Chang, and Gregory Fabry, Using computer adaptive testing to assess physics proficiency and improve exam performance in an introductory physics course, Phys. Rev. Phys. Educ. Res. 14, 020110 (2018).
[13] Hua-Hua Chang, Psychometrics behind computerized adaptive testing, Psychometrika 80, 1 (2015).
[14] David J. Weiss, Improving measurement quality and efficiency with adaptive testing, Appl. Psychol. Meas. 6, 473 (1982).
[15] John Stewart, John Hansen, and Lin Ding, Quantitative methods in PER, in The International Handbook of Physics Education Research: Special Topics (AIP Publishing LLC, Melville, NY, 2023), Chap. 24.
[16] Christoph Helm, Julia Warwas, and Henry Schirmer, Cognitive diagnosis models of students' skill profiles as a basis for adaptive teaching: An example from introductory accounting classes, Empirical Res. Vocat. Educ. Train. 14, 1 (2022).
[17] Tingxuan Li and Anne Traynor, The use of cognitive diagnostic modeling in the assessment of computational thinking, AERA Open 8, 23328584221081256 (2022).
[18] Hamdollah Ravand and Alexander Robitzsch, Cognitive diagnostic modeling using R, Pract. Assess. Res. Eval. 20, 11 (2015).
[19] Jimmy De La Torre and Nathan Minchen, Cognitively diagnostic assessments and the cognitive diagnosis model framework, Psicol. Educ. 20, 89 (2014).
[20] Wenyi Wang, Lihong Song, Ping Chen, Yaru Meng, and Shuliang Ding, Attribute-level and pattern-level classification consistency and accuracy indices for cognitive diagnostic assessment, J. Educ. Measure. 52, 457 (2015).
[21] Edward Haertel, An application of latent class models to assessment data, Appl. Psychol. Meas. 8, 333 (1984).
[22] Brian W. Junker and Klaas Sijtsma, Cognitive assessment models with few assumptions, and connections with nonparametric item response theory, Appl. Psychol. Meas. 25, 258 (2001).
[23] J. de la Torre, DINA model and parameter estimation: A didactic, J. Educ. Behav. Stat. 34, 115 (2009).
[24] Robert J. Mislevy, Russell G. Almond, and Janice F. Lukas, A brief introduction to evidence-centered design, ETS Res. Rep. Ser. 2003, i-29 (2003).
[25] Physport assessments: Force and motion conceptual evaluation (n.d.), https://www.physport.org/assessments/assessment.cfm?A=FMCE.
[26] Learning Assistant Alliance (2020), https://lassoeducation.org/?fbclid=IwAR0ACweS923WEt-7d_Q_s5AGr0TFjJfF8Gkq-r6-1Ajjr8onOL_yzkSsY0c.
[27] Ben Van Dusen, Mollee Shultz, Jayson M. Nissen, Bethany R. Wilcox, N. G. Holmes, Manher Jariwala, Eleanor W. Close, H. J. Lewandowski, and Steven Pollock, Online administration of research-based assessments, Am. J. Phys. 89, 7 (2021).
[28] Physport: Browse assessments (n.d.), https://www.physport.org/assessments/?fbclid=IwAR0-A-5UFMfpUsQprAnyxSVlozSaMXZb9rUJ5wnrFOqw24aQDopYmWSRP0I.
[29] Ben Van Dusen, LASSO: A new tool to support instructors and researchers, American Physics Society Forum on Education Fall 2018, arXiv:1812.02299.
[30] David Hestenes, Malcolm Wells, and Gregg Swackhamer, Force concept inventory, Phys. Teach. 30, 141 (1992).
[31] Ronald K. Thornton and David R. Sokoloff, Assessing student learning of Newton's laws: The force and motion conceptual evaluation and the evaluation of active learning laboratory and lecture curricula, Am. J. Phys. 66, 338 (1998).
[32] Chandralekha Singh and David Rosengrant, Multiple-choice test of energy and momentum concepts, Am. J. Phys. 71, 607 (2003).
[33] Alper Şahin and Durmus Özbasi, Effects of content balancing and item selection method on ability estimation in computerized adaptive tests, Eurasian J. Educ. Res. (2017).
[34] Shu-Ying Chen, Pui-Wa Lei, and Wen-Han Liao, Controlling item exposure and test overlap on the fly in computerized adaptive testing, Br. J. Math. Stat. Psychol. 61, 471 (2008).
[35] Carlos Fernando Collares, Cognitive diagnostic modeling in healthcare professions education: An eye-opener, Adv. Health Sci. Educ. 27, 427 (2022).
[36] Rose C. Anamezie and Fidelis O. Nnadi, Parameterization of teacher-made physics achievement test using deterministic-input-noisy-and-gate (DINA) model, J. Prof. Issues Eng. Educ. Pract. 9, 101 (2018), https://iiste.org/Journals/index.php/JEP/article/view/45266.


[37] Yunxiao Chen, Jingchen Liu, Gongjun Xu, and Zhiliang Ying, Statistical analysis of Q-matrix based diagnostic classification models, J. Am. Stat. Assoc. 110, 850 (2015).
[38] Jiwei Zhang, Jing Lu, Jing Yang, Zhaoyuan Zhang, and Shanshan Sun, Exploring multiple strategic problem solving behaviors in educational psychology research by using mixture cognitive diagnosis model, Front. Psychol. 12, 568348 (2021).
[39] J. Yasuda, N. Mae, M. M. Hull, and M. Taniguchi, Analysis to develop computerized adaptive testing with the force concept inventory, J. Phys. Conf. Ser. 1929, 012009 (2021).
[40] Edi Istiyono, Wipsar Sunu Brams Dwandaru, and Revnika Faizah, Mapping of physics problem-solving skills of senior high school students using PhysProSS-CAT, Res. Eval. Educ. 4, 144 (2018).
[41] Jun-ichiro Yasuda, Naohiro Mae, Michael M. Hull, and Masa-aki Taniguchi, Optimizing the length of computerized adaptive testing for the force concept inventory, Phys. Rev. Phys. Educ. Res. 17, 010115 (2021).
[42] Kathleen M. Sheehan, Irene Kostin, and Yoko Futagi, Supporting efficient, evidence-centered item development for the GRE verbal measure, ETS Research Report No. RR-07-29, 2007.
[43] Benjamin Pollard, Robert Hobbs, Rachel Henderson, Marcos D. Caballero, and H. J. Lewandowski, Introductory physics lab instructors' perspectives on measurement uncertainty, Phys. Rev. Phys. Educ. Res. 17, 010133 (2021).
[44] Michael Vignal, Gayle Geschwind, Benjamin Pollard, Rachel Henderson, Marcos D. Caballero, and H. J. Lewandowski, Survey of physics reasoning on uncertainty concepts in experiments: An assessment of measurement uncertainty for introductory physics labs, arXiv:2302.07336.
[45] Jayson M. Nissen, Ian Her Many Horses, Ben Van Dusen, Manher Jariwala, and Eleanor Close, Providing context for identifying effective introductory mechanics courses, Phys. Teach. 60, 179 (2022).
[46] Ian D. Beatty, Standards-based grading in introductory university physics, J. Scholarship Teach. Learn. 13, 1 (2013), https://scholarworks.iu.edu/journals/index.php/josotl/article/view/3264.
[47] Kikumi K. Tatsuoka, Architecture of knowledge structures and cognitive diagnosis: A statistical pattern recognition and classification approach, in Cognitively Diagnostic Assessment (Routledge, London, 2012), pp. 327-359.
[48] Wenchao Ma and Jimmy de la Torre, GDINA: An R package for cognitive diagnosis modeling, J. Stat. Softw. 93, 1 (2020).
[49] Daire Hooper, Joseph Coughlan, and Michael Mullen, Evaluating model fit: A synthesis of the structural equation modelling literature, in Proceedings of the 7th European Conference on Research Methodology for Business and Management Studies (2008), Vol. 2008, pp. 195-200.
[50] Li-tze Hu and Peter M. Bentler, Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives, Struct. Equation Modell. 6, 1 (1999).
[51] Alberto Maydeu-Olivares and Harry Joe, Assessing approximate fit in categorical data analysis, Multivariate Behav. Res. 49, 305 (2014).
[52] Sung Tae Jang, The implications of intersectionality on Southeast Asian female students' educational outcomes in the United States: A critical quantitative intersectionality analysis, Am. Educ. Res. J. 55, 1268 (2018).
[53] Zhengqi Tan, Jimmy De la Torre, Wenchao Ma, David Huh, Mary E. Larimer, and Eun-Young Mun, A tutorial on cognitive diagnosis modeling for characterizing mental health symptom profiles using existing item responses, Prev. Sci. 24, 480 (2023).
[54] Qianru Liang, Jimmy de la Torre, Mary E. Larimer, and Eun-Young Mun, Mental health symptom profiles over time: A three-step latent transition cognitive diagnosis modeling analysis with covariates, in Dependent Data in Social Sciences Research (Springer, Cham, 2024), pp. 539-562, 10.1007/978-3-031-56318-8_22.
[55] Justin Paulsen, Dubravka Svetina, Yanan Feng, and Montserrat Valdivia, Examining the impact of differential item functioning on classification accuracy in cognitive diagnostic models, Appl. Psychol. Meas. 44, 267 (2020).
[56] Jimmy de la Torre and Chia-Yi Chiu, A general method of empirical Q-matrix validation, Psychometrika 81, 253 (2016).
[57] Peter M. Bentler, Comparative fit indexes in structural models, Psychol. Bull. 107, 238 (1990).
[58] Genaro Zavala, Santa Tejeda, Pablo Barniol, and Robert J. Beichner, Modifying the test of understanding graphs in kinematics, Phys. Rev. Phys. Educ. Res. 13, 020111 (2017).
[59] Andrew Pawl, Analia Barrantes, Carolin Cardamone, Saif Rayyan, and David E. Pritchard, Development of a mechanics reasoning inventory, AIP Conf. Proc. 1413, 287 (2012).
[60] Alicen Morley, Jayson M. Nissen, and Ben Van Dusen, Measurement invariance across race and gender for the force concept inventory, Phys. Rev. Phys. Educ. Res. 19, 020102 (2023).
[61] Adrienne Traxler, Rachel Henderson, John Stewart, Gay Stewart, Alexis Papak, and Rebecca Lindell, Gender fairness within the force concept inventory, Phys. Rev. Phys. Educ. Res. 14, 010103 (2018).
[62] Rachel Henderson, Paul Miller, John Stewart, Adrienne Traxler, and Rebecca Lindell, Item-level gender fairness in the force and motion conceptual evaluation and the conceptual survey of electricity and magnetism, Phys. Rev. Phys. Educ. Res. 14, 020103 (2018).
[63] Jayson M. Nissen, Ian Her Many Horses, Ben Van Dusen, Manher Jariwala, and Eleanor W. Close, Tools for identifying courses that support development of expertlike physics attitudes, Phys. Rev. Phys. Educ. Res. 17, 013103 (2021).
[64] Stephen Kanim and Ximena C. Cid, Demographics of physics education research, Phys. Rev. Phys. Educ. Res. 16, 020106 (2020).
