Guide to SDTM Demographics (DM) Domain
(Note: SDTMIG v3.3 and v3.4 were referenced in creating this guide.)
Overview of SDTM and Its Purpose
in Clinical Trials
What Is SDTM?
The Study Data Tabulation Model (SDTM) is a standardized
framework developed by the Clinical Data Interchange
Standards Consortium (CDISC). Its primary purpose is to
organize and present clinical trial data in a consistent
structure. By using SDTM, clinical trial data becomes easier to
understand, integrate, and submit to regulatory authorities
like the U.S. Food and Drug Administration (FDA) and the
European Medicines Agency (EMA).
Why Is SDTM Important?
Regulatory Compliance:
SDTM helps data submissions meet regulatory expectations,
allowing for smoother reviews by agencies like the FDA.
Submissions that do not follow the standard may be rejected or
delayed because of inconsistencies or lack of standardization.
Data Consistency:
Standardizing data ensures that all stakeholders, including
sponsors, researchers, and regulators, interpret data in the
same way. This reduces errors and confusion when working
with large datasets from multiple clinical trials.
Integration and Reuse:
SDTM allows datasets from different clinical trials to be easily
combined and analyzed. For example, a pharmaceutical
company studying a new drug can compare data across
multiple studies to assess its safety and efficacy.
Improved Efficiency:
By adhering to SDTM standards, data preparation, analysis,
and submission processes become faster. Researchers no
longer need to reformat or reorganize data to meet specific
requirements.
Key Features of SDTM
Standard Domains:
SDTM organizes data into domains, such as DM
(Demographics), AE (Adverse Events), and LB (Laboratory
Tests). Each domain represents a specific type of data
collected during the trial.
Clear Definitions:
SDTM provides precise definitions for each variable, ensuring
that everyone uses the same terminology and structure.
Traceability:
SDTM datasets are designed to maintain traceability,
meaning it’s easy to track how a variable was derived or
where a specific piece of data originated.
Controlled Terminology:
Many SDTM variables use controlled terminology, such as
predefined lists of acceptable values (e.g., for SEX, the values
are "M" for Male and "F" for Female). This ensures consistency
across datasets.
Role and Significance of the DM
Domain
What Is the DM Domain?
The Demographics (DM) domain is one of the most
fundamental components of SDTM. It contains key
demographic information about each subject who
participated in the clinical trial. This includes data like the
subject's age, sex, race, and country of participation.
The DM domain serves as the backbone of the clinical trial
dataset, providing the foundation upon which other domains
are built.
Why Is the DM Domain Important?
Identification of Subjects:
Each subject in the study is assigned a unique identifier in the
DM domain (e.g., USUBJID), which links their data across all
other domains.
Without this identifier, it would be impossible to connect
information about a subject’s adverse events (AE), laboratory
tests (LB), or treatments (EX).
Baseline Characteristics:
The DM domain provides baseline demographic details such
as age, sex, and ethnicity, which are crucial for analyzing the
trial results.
For example, researchers may want to compare how a drug
performs across different age groups or between males and
females.
Regulatory Submissions:
Regulatory authorities use the DM domain to assess the
diversity and representation of the study population.
The data helps ensure that the trial included enough
participants from various demographics to draw reliable
conclusions.
Centralized Linking:
The DM domain acts as a central hub, connecting data across
all other SDTM domains.
For instance, a subject’s DM record ensures their adverse
events (AE domain) and medical history (MH domain) are
linked correctly.
Why the DM Domain Is the
Foundation of Clinical Trial
Datasets
Universal Inclusion
Every subject in the study is represented in the DM domain.
Whether the subject completed the trial, dropped out, or
experienced adverse events, their record remains in the DM
dataset.
Cross-Domain Consistency
The DM domain ensures consistency across all other SDTM
domains. For example:
If a subject’s USUBJID is incorrectly entered in the AE domain,
it won’t match the identifier in the DM domain, triggering an
error.
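The mismatch described above can be checked with a short query. This is a minimal sketch, assuming SDTM datasets named DM and AE exist in the WORK library; the output table name AE_ORPHANS is hypothetical.

```sas
/* Sketch: list AE records whose USUBJID has no match in DM.      */
/* Assumes WORK.DM and WORK.AE exist; AE_ORPHANS is a made-up name. */
proc sql;
    create table ae_orphans as
    select ae.usubjid, ae.aeseq
    from ae
    where ae.usubjid not in (select usubjid from dm);
quit;
```

An empty AE_ORPHANS table suggests every AE record links back to a DM record. The same pattern can be applied to any other domain.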
Key for Analysis
Most clinical trial analyses begin with the DM domain
because it provides the context needed to interpret other
datasets. For instance:
A high number of adverse events in older adults may indicate
the drug is less safe for that age group.
Regulatory Requirement
The DM domain is a required component of any SDTM
submission. Regulatory agencies like the FDA use it to verify
the study population and assess the trial’s reliability.
DM DOMAIN ALL VARIABLES
1. STUDYID
Label: Study Identifier
Type: Character
Length: 20
Control Terminology: Not applicable
Role: Identifier
Core: Required (Req)
Comment: Unique identifier for a clinical study.
Origin: Defined by the sponsor.
Logic:
Assign a unique alphanumeric code to represent the study.
For instance, STUDYID = 'ABC123' ensures each study is
uniquely identifiable across the database.
2. DOMAIN
Label: Domain Abbreviation
Type: Character
Length: 2
Control Terminology: Fixed to "DM" for the Demographics
domain.
Role: Identifier
Core: Required (Req)
Comment: Represents the dataset the variable belongs to.
Origin: Automatically assigned by the system or
predefined.
Logic:
Set as DOMAIN = 'DM' to indicate the dataset’s role within the
SDTM framework.
3. USUBJID
Label: Unique Subject Identifier
Type: Character
Length: 50
Control Terminology: Not applicable
Role: Identifier
Core: Required (Req)
Comment: A globally unique identifier for each subject
across all studies.
Origin: Derived from the concatenation of STUDYID,
SITEID, and SUBJID.
Logic:
USUBJID = CATX('-', STUDYID, SITEID, SUBJID);
This ensures that the combination of study, site, and subject
ID creates a unique identifier for each participant.
4. SUBJID
Label: Subject Identifier for the Study
Type: Character
Length: 20
Control Terminology: Not applicable
Role: Topic
Core: Required (Req)
Comment: Unique within the study; often the ID recorded
on the CRF.
Origin: Defined during data collection.
Logic:
Use the subject’s unique ID as recorded in the case report
forms to maintain consistency.
5. RFSTDTC
Label: Subject Reference Start Date/Time
Type: Character
Length: ISO 8601 format (e.g., YYYY-MM-DD).
Control Terminology: ISO 8601
Role: Record Qualifier
Core: Expected (Exp)
Comment: Typically the date of first exposure to
treatment.
Origin: Derived from the Exposure (EX) domain.
Logic:
Assign the earliest EXSTDTC value from the Exposure domain
to identify the initiation of treatment.
6. RFENDTC
Label: Subject Reference End Date/Time
Type: Character
Length: ISO 8601 format
Control Terminology: ISO 8601
Role: Record Qualifier
Core: Expected (Exp)
Comment: The date of the last exposure or end of the trial.
Origin: Derived from EXENDTC or disposition information.
Logic:
Assign the latest EXENDTC value from the Exposure domain
or the trial end date to capture the subject’s end of
participation.
7. RFXSTDTC
Label: Date/Time of First Study Treatment
Type: Character
Length: ISO 8601 format
Control Terminology: ISO 8601
Role: Record Qualifier
Core: Expected (Exp)
Comment: The first exposure to any protocol-specified
treatment.
Origin: Derived from EXSTDTC.
Logic:
Assign the earliest EXSTDTC value to identify the start of
study treatment.
8. RFXENDTC
Label: Date/Time of Last Study Treatment
Type: Character
Length: ISO 8601 format
Control Terminology: ISO 8601
Role: Record Qualifier
Core: Expected (Exp)
Comment: The last exposure to any protocol-specified
treatment.
Origin: Derived from EXENDTC.
Logic:
Assign the latest EXENDTC value to mark the conclusion of
study treatment.
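One way to sketch the first and last treatment dates is a grouped query on the Exposure domain. It assumes WORK.EX exists and that EXSTDTC and EXENDTC are complete ISO 8601 values of equal precision, so that character MIN and MAX agree with chronological order. The output name EX_DATES is hypothetical.

```sas
/* Sketch: earliest start and latest end of exposure per subject.   */
/* Assumes complete ISO 8601 values of equal precision in EX.       */
/* Records with a missing start date are excluded by the WHERE.     */
proc sql;
    create table ex_dates as
    select usubjid,
           min(exstdtc) as rfxstdtc length=19,
           max(exendtc) as rfxendtc length=19
    from ex
    where not missing(exstdtc)
    group by usubjid;
quit;
```

EX_DATES can then be merged onto DM by USUBJID. Partial dates or mixed date and date/time values would need to be handled before taking the minimum and maximum.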
9. RFICDTC
Label: Date/Time of Informed Consent
Type: Character
Length: ISO 8601 format
Control Terminology: ISO 8601
Role: Record Qualifier
Core: Expected (Exp)
Comment: The date when the subject signed the
informed consent form.
Origin: Collected from clinical trial documentation.
Logic:
Assign the exact date when informed consent was signed to
confirm subject’s participation eligibility.
10. RFPENDTC
Label: Date/Time of End of Participation
Type: Character
Length: ISO 8601 format
Control Terminology: ISO 8601
Role: Record Qualifier
Core: Expected (Exp)
Comment: Date when the subject ended participation in
the trial.
Origin: Derived from disposition or follow-up data.
Logic:
Assign the last known contact date or follow-up date to mark
the completion of participation.
11. DTHDTC
Label: Date/Time of Death
Type: Character
Length: ISO 8601 format
Control Terminology: ISO 8601
Role: Record Qualifier
Core: Expected (Exp)
Comment: Date when the subject died.
Origin: Derived from the clinical database.
Logic:
Assign the recorded death date. If unavailable, leave as null to
indicate missing information.
12. DTHFL
Label: Subject Death Flag
Type: Character
Length: 1
Control Terminology: Controlled terms are "Y" (Yes) or
null.
Role: Record Qualifier
Core: Expected (Exp)
Comment: Indicates whether the subject has died.
Origin: Derived from DTHDTC.
Logic:
If DTHDTC is populated, set DTHFL = 'Y'; otherwise, leave as
null.
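The rule above might be written as a short DATA step. This is a sketch that assumes a DM dataset in WORK already carrying DTHDTC; the output name DM_FLAGGED is hypothetical.

```sas
/* Sketch: set DTHFL from DTHDTC. DM_FLAGGED is a made-up name. */
data dm_flagged;
    length dthfl $1;                 /* declare before SET so the length applies */
    set dm;
    if not missing(dthdtc) then dthfl = 'Y';
    else dthfl = ' ';                /* null when no death date is recorded */
run;
```

If a death is known from another source, such as a disposition record, but the date was not collected, the flag would need to be set from that source instead.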
13. BRTHDTC
Label: Date/Time of Birth
Type: Character
Length: ISO 8601 format
Control Terminology: ISO 8601
Role: Record Qualifier
Core: Permissible (Perm)
Comment: Subject’s date of birth.
Origin: Collected from the subject’s records.
Logic:
Record the exact date of birth to enable age calculation for
analysis.
14. SEX
Label: Sex
Type: Character
Length: 1
Control Terminology: Controlled Terminology (e.g., M, F,
U)
Role: Record Qualifier
Core: Required (Req)
Comment: Biological sex of the subject.
Origin: Collected during screening.
Logic:
Use predefined codes for male (M), female (F), or unknown
(U).
15. RACE
Label: Race
Type: Character
Length: 200
Control Terminology: Controlled Terminology
Role: Record Qualifier
Core: Expected (Exp)
Comment: Subject’s racial background.
Origin: Collected during screening or self-reported.
Logic:
Record the race as per the controlled terminology guidelines.
16. ETHNIC
Label: Ethnicity
Type: Character
Length: 200
Control Terminology: Controlled Terminology
Role: Record Qualifier
Core: Permissible (Perm)
Comment: Subject’s ethnicity.
Origin: Collected during screening or self-reported.
Logic:
Use standardized codes or terms to indicate ethnicity.
17. ARM
Label: Description of Planned Arm
Type: Character
Length: 200
Control Terminology: Not applicable
Role: Synonym Qualifier
Core: Expected (Exp)
Comment: Planned treatment or intervention group for
the subject.
Origin: Derived from the protocol.
Logic:
Record the treatment group as defined in the study protocol.
18. ARMCD
Label: Planned Arm Code
Type: Character
Length: 20
Control Terminology: Not applicable (sponsor-defined codes)
Role: Record Qualifier
Core: Expected (Exp)
Comment: Code for the planned treatment or
intervention group.
Origin: Derived from the protocol.
Logic:
Assign the code representing the planned arm.
19. COUNTRY
Label: Country
Type: Character
Length: 3
Control Terminology: ISO 3166-1 alpha-3
Role: Record Qualifier
Core: Required (Req)
Comment: Country of the study site.
Origin: Derived from site information.
Logic:
Use the three-letter ISO code for the country.
20. DTHDTC
Label: Date/Time of Death
Type: Character
Length: ISO 8601 format
Control Terminology: ISO 8601
Role: Record Qualifier
Core: Expected (Exp)
Comment: Date of death for the subject, if applicable.
Origin: Derived from adverse events or follow-up data.
Logic:
Record the exact date of death for accurate reporting and
analysis.
21. ETHNIC
Label: Ethnicity
Type: Character
Length: 20
Control Terminology: Controlled by CDISC ethnicity terms
(e.g., "HISPANIC OR LATINO", "NOT HISPANIC OR LATINO").
Role: Record Qualifier
Core: Permissible (Perm)
Comment: Ethnic group of the subject based on protocol
requirements.
Origin: Collected during data entry.
Logic:
Assign the ethnicity specified in the CRF or leave as null if
unavailable.
22. ARM
Label: Description of Planned Arm
Type: Character
Length: 20
Control Terminology: Not applicable
Role: Synonym Qualifier
Core: Expected (Exp)
Comment: Describes the planned arm of the subject in
the study (e.g., "Placebo", "Treatment A").
Origin: Derived from the protocol's planned arm
assignments.
Logic:
Assign the planned treatment arm for the subject based on
the protocol.
23. ARMCD
Label: Planned Arm Code
Type: Character
Length: 20
Control Terminology: Not applicable
Role: Record Qualifier
Core: Expected (Exp)
Comment: Short code representing the planned arm of
the study (e.g., "PLA", "TRTA").
Origin: Derived from the protocol's planned arm codes.
Logic:
Assign the short code for the planned arm based on the
protocol.
24. ACTARM
Label: Description of Actual Arm
Type: Character
Length: 20
Control Terminology: Not applicable
Role: Synonym Qualifier
Core: Expected (Exp)
Comment: Describes the actual arm the subject was
assigned to.
Origin: Derived from the actual arm assignments.
Logic:
Assign the actual arm description the subject participated in
based on the study records.
25. ACTARMCD
Label: Actual Arm Code
Type: Character
Length: 20
Control Terminology: Not applicable
Role: Record Qualifier
Core: Expected (Exp)
Comment: Short code representing the actual arm the
subject participated in.
Origin: Derived from the actual arm assignments.
Logic:
Assign the short code for the actual arm the subject was in.
26. COUNTRY
Label: Country of Participation
Type: Character
Length: 3
Control Terminology: ISO 3166-1 alpha-3 country codes
(e.g., "USA", "IND").
Role: Record Qualifier
Core: Required (Req)
Comment: Represents the country where the subject
participated in the study.
Origin: Derived from site information.
Logic:
Assign the country code corresponding to the subject's
participation site.
Note on VISITNUM, VISIT, VISITDY, and ARMU
VISITNUM, VISIT, and VISITDY are Timing variables used in
visit-based domains such as Subject Visits (SV), Vital Signs
(VS), and Laboratory Tests (LB). They are not part of DM,
which holds one record per subject rather than one record per
visit. ARMU is not a variable defined in the SDTMIG and should
not be added to DM.
Structure of the DM Domain
The DM domain consists of one record per subject. It contains
various variables that help in describing each subject’s
baseline characteristics and participation in the study.
Category            | Variable | Core | Description
Identifier          | STUDYID  | Req  | Unique study identifier
Identifier          | DOMAIN   | Req  | Fixed value DM for the DM domain
Identifier          | USUBJID  | Req  | Unique subject identifier, typically built from STUDYID, SITEID, and SUBJID
Timing              | RFSTDTC  | Exp  | Reference start date/time
Timing              | RFENDTC  | Exp  | Reference end date/time
Demographics        | AGE      | Exp  | Age of the subject at the start of the study
Demographics        | SEX      | Req  | Sex of the subject (M, F, U)
Demographics        | RACE     | Exp  | Race of the subject based on controlled terminology
Demographics        | ETHNIC   | Perm | Ethnicity of the subject (HISPANIC OR LATINO, NOT HISPANIC OR LATINO)
Trial Participation | ARM      | Exp  | Name of the treatment group (arm)
Trial Participation | ARMCD    | Exp  | Code for the treatment arm (P, TA, etc.)
Key Variables in the DM Domain
1. Identifier Variables
STUDYID: Identifies the clinical study.
DOMAIN: Always set to DM for the Demographics domain.
USUBJID: A unique identifier for each subject, typically
created by concatenating the study ID, site ID, and subject ID.
SUBJID: A unique identifier used within the study.
2 Timing Variables
RFSTDTC: Reference start date/time – usually the date of the
subject’s first exposure to study treatment.
RFENDTC: Reference end date/time – usually the date of last
exposure or the end of the subject’s participation.
RFXSTDTC: Date/time of first study treatment.
RFXENDTC: Date/time of last study treatment.
3 Demographic Variables
AGE: Age of the subject at the time of study entry.
AGEU: Units for age (e.g., YEARS, MONTHS).
SEX: Sex of the subject.
RACE: Race of the subject, based on a predefined list of
categories.
ETHNIC: Ethnicity of the subject.
4 Trial Participation Variables
ARM: The name of the treatment or intervention group
assigned to the subject.
ARMCD: A code for the treatment group.
ACTARM: Description of the arm the subject actually followed,
whether or not it differs from the planned arm.
ACTARMCD: Code for the actual arm.
5 Other Subject-Level Variables
BRTHDTC: Birth date of the subject.
DTHFL: Death flag indicating if the subject has died
during the trial.
DTHDTC: Date of death, if applicable.
SITEID: The site identifier for where the subject was
enrolled.
COUNTRY: Country of the subject’s participation.
Deriving Variables in the DM
Domain
1 Unique Subject Identifier (USUBJID)
The USUBJID is sponsor-defined. A common convention combines
STUDYID, SITEID, and SUBJID to create an identifier that is
unique for each subject across studies.
Example:
USUBJID = catx("-", STUDYID, SITEID, SUBJID);
2 Age Calculation and Units (AGE, AGEU)
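The AGE variable is usually collected, or derived as the number
of completed years between the subject’s birth date (BRTHDTC)
and the reference start date (RFSTDTC). YRDIF with the 'AGE'
basis returns elapsed years and FLOOR keeps the completed years,
which avoids counting calendar-year boundaries as
INTCK('year', ...) does by default.
Example:
AGE = floor(yrdif(input(BRTHDTC, yymmdd10.),
                  input(RFSTDTC, yymmdd10.), 'AGE'));
AGEU = "YEARS";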
3 Trial Arm Assignment (ARM, ARMCD)
The planned treatment arm (ARM) can be directly mapped from
the trial protocol. If the subject receives a different arm due to
protocol deviations, the ACTARM and ACTARMCD should reflect
the actual treatment received.
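A quick way to review planned versus actual arms is sketched below. It assumes a DM dataset in WORK with ARMCD, ARM, ACTARMCD, and ACTARM populated; the output name ARM_MISMATCH is hypothetical.

```sas
/* Sketch: list subjects whose actual arm differs from the planned arm. */
/* Assumes WORK.DM exists; ARM_MISMATCH is a made-up name.              */
proc sql;
    create table arm_mismatch as
    select usubjid, armcd, arm, actarmcd, actarm
    from dm
    where armcd ne actarmcd;
quit;
```

Each row in ARM_MISMATCH is a candidate for review against the protocol deviation records.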
4 Reference Dates (RFSTDTC, RFENDTC)
The RFSTDTC and RFENDTC fields capture the sponsor-defined
reference period for each subject. RFSTDTC is commonly the date
of first exposure to study treatment (the earliest EXSTDTC), and
RFENDTC the date of last exposure or the end of participation.
Controlled Terminology
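Variable | Value                     | Description
SEX      | M                         | Male
SEX      | F                         | Female
SEX      | U                         | Unknown
RACE     | ASIAN                     | Asian
RACE     | BLACK OR AFRICAN AMERICAN | Black or African American
RACE     | WHITE                     | White
RACE     | OTHER                     | Other
ETHNIC   | HISPANIC OR LATINO        | Hispanic or Latino
ETHNIC   | NOT HISPANIC OR LATINO    | Not Hispanic or Latino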
Common Issues and Solutions
1 Missing Data
If SEX is missing, use the controlled term U (Unknown) where
appropriate. For RACE, use the codelist term UNKNOWN or NOT
REPORTED rather than U.
For age, if AGE is missing, it may need to be derived from other
available data, such as the birth date and trial reference dates.
2 Incorrect Date Formats
Dates should be in ISO 8601 format (YYYY-MM-DD, or
YYYY-MM-DDThh:mm:ss when a time is collected). If the date is
provided in another format, ensure it is properly converted before
populating the RFSTDTC or RFENDTC variables.
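As an illustration of the conversion, the sketch below reads a date in DDMMMYYYY form and writes it back as an ISO 8601 string. The value '15JAN2024' and the variable names RAW_DATE and SAS_DATE are hypothetical.

```sas
/* Sketch: convert a collected date to ISO 8601 text. */
data _null_;
    length rfstdtc $10;
    raw_date = '15JAN2024';               /* hypothetical collected value */
    sas_date = input(raw_date, date9.);   /* character to SAS date value  */
    rfstdtc  = put(sas_date, e8601da.);   /* SAS date to YYYY-MM-DD text  */
    put rfstdtc=;                         /* writes rfstdtc=2024-01-15 to the log */
run;
```

The informat on the INPUT call has to match how the source date was actually recorded; DATE9. is only one possibility.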
3 Handling Multiple Arms
In SDTM an arm is the subject’s complete planned path through the
trial, so a crossover sequence is a single arm rather than several.
ARM and ARMCD hold the planned arm; ACTARM and ACTARMCD hold
the arm the subject actually followed.
Best Practices for Creating the DM
Domain
Traceability: Ensure clear documentation of data sources and
transformations from raw data to SDTM-compliant variables.
Controlled Terminology: Always use the appropriate controlled
terminology for variables like SEX, RACE, and ETHNIC.
Data Validation: Validate the dataset using tools like
Pinnacle 21, which check conformance with SDTM standards.
Subject-Level Data Integrity: Ensure no duplicate USUBJID values
and confirm that all required variables are populated.
Proper Formatting: Ensure the date variables are in ISO format
and adhere to the CDISC SDTM Implementation Guide.
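Two of the checks above, duplicate USUBJID values and unpopulated required variables, can be sketched as follows. The code assumes a DM dataset in WORK that includes SITEID; the output names DM_DUPS and DM_MISSING_REQ are hypothetical.

```sas
/* Sketch: subjects that appear more than once in DM. */
proc sql;
    create table dm_dups as
    select usubjid, count(*) as n_records
    from dm
    group by usubjid
    having count(*) > 1;
quit;

/* Sketch: records with any Required variable left blank.     */
/* CMISS counts missing values across character and numeric.  */
data dm_missing_req;
    set dm;
    if cmiss(studyid, domain, usubjid, subjid, siteid, sex, country) > 0;
run;
```

Both output tables should be empty for a clean dataset. These checks complement, rather than replace, a full conformance run.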
Example SAS Code for DM Domain
Creation
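A minimal sketch of a DM build might look like the following. It assumes a hypothetical raw dataset RAW_DEMOG in WORK with character variables STUDY, SITE, SUBJECT, SEXCD, RACETXT, and CNTRY whose values already match the target controlled terminology apart from case. None of these names come from a real study. Reference dates, arm variables, and AGE are left out for brevity.

```sas
/* Sketch only: RAW_DEMOG and its variable names are hypothetical. */
data dm;
    length studyid $20 domain $2 usubjid $50 subjid $20 siteid $20
           sex $1 race $200 country $3;
    set raw_demog;
    studyid = strip(study);
    domain  = 'DM';                                  /* fixed value for this domain */
    subjid  = strip(subject);
    siteid  = strip(site);
    usubjid = catx('-', studyid, siteid, subjid);    /* same convention as above */
    sex     = upcase(strip(sexcd));
    race    = upcase(strip(racetxt));
    country = upcase(strip(cntry));
    keep studyid domain usubjid subjid siteid sex race country;
run;

/* One record per subject, sorted by the natural key. */
proc sort data=dm;
    by studyid usubjid;
run;
```

In practice the remaining variables would be merged in from exposure, disposition, and randomization data before the dataset is validated.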
Variable Metadata for DM
Domain (VARNUM Order)
Thank you for exploring this guide to the SDTM DM
Domain. I hope it has provided valuable insights and
practical knowledge to help you navigate and
implement the DM domain effectively.
If you have any questions or feedback, feel free to
connect with me on LinkedIn or reach out directly.
Together, we contribute to advancing clinical trial
data quality and global healthcare.
Wishing you success!
Saurabh Patil
Clinical SAS Programmer