Data collection tool development,
Sampling methods, & Quality
            Assurance
              MED 410
            Dr. Ola Soudah
                    Data collection tool
• Data: information collected as images, text, vitals, numbers,
  or figures.
• Data sources are classified into two types:
   • Primary data: first-hand information collected by an investigator; it is
     collected for the first time. Pros: more reliable, original, and directly related to
     your research.
   • Secondary data: second-hand information; it is not originally collected by the
     investigator but obtained from already published or unpublished sources.
• The main sources of health data are:
  •   Surveys.
  •   Electronic medical records (EMR).
  •   Claims data or administrative data.
  •   Vital records.
  •   Surveillance.
  •   Disease registries.
  •   Peer-reviewed literature.
Surveys
• Surveys are an important means of collecting health and social science
  information from a sample of people in a standardized way to better
  understand a larger population.
• Pros:
   • Collect empirical data in a relatively short period of time.
   • Surveys can collect data on a representative sample of people, particularly when
     samples are randomized or purposive nonprobability sampling is used.
• There are many methods used to conduct surveys, including
  questionnaires and in-depth interviews via phone, mail, email, and
  in-person.
Electronic Medical records
• Electronic medical records (EMR) or electronic health records (EHR) track
  events and transactions between patients and health care providers,
  providing real-time data.
• EMRs contain a patient’s medical history, diagnoses, medications,
  treatment plans, immunization dates, allergies, radiology images, and
  laboratory and test results.
• Medical records help us measure and analyze trends in health care use,
  patient characteristics, and quality of care.
• Pros: The data are automatically collected and usually accurate and
  detailed because they come from health care providers.
Claims data or administrative data
• Claims databases collect information on millions of doctors’
  appointments, bills, insurance information, and other patient-provider
  communications.
• Pros:
   • Like other medical records, they come directly from notes made by the health
     care provider and provide real-time data.
   • Because of the large sample size of claims data, researchers can analyze groups
     of patients with rare illnesses and medical conditions.
• Cons: there may be low validity due to certain illegal billing practices,
  like ordering unnecessary tests or billing for services that were not
  provided.
Vital Records
• Vital records are collected by the National Vital Statistics System, and are
  maintained by state and local governments.
• Vital records include births, deaths, marriages, divorces, and fetal deaths.
  They also record information about the cause of death, or details of the birth.
• Pros: Vital records are useful because they offer very detailed information
  and include information about rare disorders that end in death.
• Cons: Vital records only provide information on diseases and illnesses that
  end in death.
Surveillance
• Public health surveillance is “the ongoing systematic collection, analysis,
  and interpretation of data, closely integrated with the timely
  dissemination of these data to those responsible for preventing and
  controlling disease and injury.”
• These systems function through the efforts of local and state health
  departments, working in tandem with a variety of health care
  providers (laboratories, hospitals, private providers), who are
  mandated by law to report cases of certain diseases.
Disease registries
• Disease registries are another type of public health surveillance.
• Registries are systems that allow people to collect, store, retrieve,
  analyze, and disseminate information about people with a specific
  disease or condition.
• Pros:
   • Disease registries let researchers estimate how large a health problem is,
     determine the incidence of the disease, study trends over time, and evaluate
     the effects of certain environmental exposures.
   • Surveillance data have higher validity than survey data because the data
     come from lab tests, diagnoses, and other patient records.
Peer-reviewed literature
• Peer-reviewed journals may make the data behind their articles available in
  a data repository, such as the Harvard COVID-19 Dataverse.
• Peer-reviewed journals publish the research of scholars who have collected
  their own data using an experimental study design, survey, or other study
  methodologies.
• They also present the work of researchers who have performed novel
  analyses of existing data sources.
Selecting, designing, and developing your questionnaire
Questionnaires offer an objective means
of collecting information about people’s
knowledge, beliefs, attitudes, and
behavior.
Steps in questionnaire development
• Ask yourself the following questions:
• What information are you trying to collect? This requires deep knowledge
  of the research topic.
• Is a questionnaire appropriate? Sometimes questionnaires are used
  inappropriately.
Rule of thumb: don’t use a questionnaire to assess sensitive topics
(associated with stigma), topics that can be mixed with people’s
perceptions, or broad issues that can’t be assessed by a question.
• Could you use an existing instrument? Using a previously validated
  and published questionnaire will save you time and resources; you will
  be able to compare your own findings with those from other studies,
  you need only give outline details of the instrument when you write up
  your work, and you may find it easier to get published.
• Is the questionnaire valid and reliable?
• A valid questionnaire measures what it claims to measure.
• A reliable questionnaire yields consistent results from repeated
  samples and different researchers over time.
Questions
• Types of questions:
   • Open ended (usually answered as text)
   • Close ended
                 Open ended vs. close ended questions
              Pros                              Cons
Open ended    Allow free expression             Take longer to answer
              Capture new response options,     Take more effort to finish (exhausting)
              emotions, & thoughts              Need coding and analysis
                                                Rely on handwriting skills & clarity
Close ended   Easy and quick                    Guessed answers
              Clear and complete                Ceiling and floor effects
              Suitable for self-completion      Errors in filling it in
              Easy to standardize               Can’t capture wider options
How should you present your questions?
(Question formatting and response options)
• A Likert scale presents ordered responses to a questionnaire item that
  asks participants to rank preferences numerically, such as by using a
  scale for which 1 indicates strong disagreement and 5 indicates strong
  agreement.
• Most scales with a neutral option list 5 or 7 categories.
• Most scales without a neutral option list 4 or 6 categories.
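Before analysis, Likert responses are usually converted to their numeric codes. The snippet below is a minimal sketch of that step. Only the endpoints (1 = strong disagreement, 5 = strong agreement) come from the slide; the intermediate labels and the names `LIKERT_5` and `code_responses` are assumptions made for illustration.

```python
# Hypothetical 5-point agreement scale. Only the endpoints (1 and 5) come
# from the slide; the intermediate labels are assumed for illustration.
LIKERT_5 = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def code_responses(labels):
    """Convert text labels to numeric codes; an unknown label raises KeyError."""
    return [LIKERT_5[label] for label in labels]


# Made-up answers from four respondents to one questionnaire item.
answers = ["Agree", "Strongly agree", "Neutral", "Disagree"]
codes = code_responses(answers)
print(codes)
```

Raising an error on an unknown label, rather than silently skipping it, makes data-entry mistakes visible during the pilot test.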
Wording
• Clarity. Make questions as clear and specific as possible.
      • For example, asking, “How much exercise do you usually get?” is less clear than asking,
        “During a typical week, how many hours do you spend in vigorous walking?”
• Simplicity. Use simple, common words and grammar that convey the
 idea, and avoid technical terms and jargon.
      • For example, it is clearer to ask about “drugs you can buy without a prescription from a
        doctor” than to ask about “over-the-counter medications.”
• Neutrality. Avoid “loaded” words and stereotypes that suggest a
 desirable answer.
      • Asking “During the last month, how often did you drink too much alcohol?” may
        discourage respondents from admitting that they drink a lot of alcohol.
Setting the Time Frame
• To measure the frequency of the behavior it is essential to have the
  respondent describe it in terms of some unit of time.
 During the last 7 days, how many cigarettes did you smoke (one pack is
                         equal to 20 cigarettes)?
                    [     ] cigarettes in the last 7 days
• The investigator must first decide what aspect of the behavior is most
  important to the study: the average or the extremes.
Avoid Pitfalls
• Double-barreled questions. Each question should contain only one
  concept.
   • “How many cups of coffee or tea do you drink during a day?”
• Hidden assumptions. Sometimes questions make assumptions that
  may not apply to all people who participate in the study.
   • For example, a standard depression item asks how often, in the
     past week: “I felt that I could not be happy even with help from my
     family.”
What should the questionnaire look like?
  • In general, questions should be short and to the point (around 12 words
    or fewer).
  • Pay attention to the physical layout of the questionnaire, such as the
    font size, color, and question order.
Good questionnaire checklist
• After drafting the questionnaire, check each question for clarity and
  confirm that the responses are also carefully worded. For example:
   • Does each question ask what it is intended to ask?
   • Is the language of each question clear and neutral?
   • Will members of the study population understand the language?
   • Do questions about sensitive topics use language acceptable to the source
     population?
   • Are the response options clearly presented?
   • For scaled questions, is the rank order clear? (For example, is it obvious that 1
     is “strongly disagree” and 5 is “strongly agree”? Or, alternatively, that 1 is
     “excellent” and 7 is “poor”?)
   • For questions with unranked categories, is the order of possible responses
     alphabetical or otherwise neutral?
For new questions and scales
• Pretest
   • Pretest the instrument to ensure the clarity and timing of the questions.
• Validate
   • Correctness or accuracy: the instrument measures what it is meant to
     measure.
• Reliable
   • Consistency: the instrument gives similar results when it is used by
     different individuals or in different settings.
• Questionnaires and interviews can be assessed for validity and reliability by
  running a pilot study on a small number of people and/or by seeking expert
  review.
Pilot Testing
• A pilot test, or pretest, is a small-scale preliminary study conducted to
  evaluate the feasibility of a full-scale research project.
• A pilot test of a questionnaire is helpful for checking:
       • The wording and clarity of the questions
       • The order of the questions
       • The ability and willingness of participants to answer the questions
       • The responses given, and whether the responses match the intended types
         of responses
       • The amount of time it takes to complete the survey
       • The reliability of the instrument (internal consistency, Cronbach’s α)
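Internal consistency is commonly summarized with Cronbach's α, computed as α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The snippet below is a minimal sketch of that calculation on pilot data. The function name `cronbach_alpha` and the example scores are hypothetical, and the sketch assumes every respondent answered every item.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumes complete data (no missing answers), at least two items, and at
    least two respondents. If all total scores are identical, the total
    variance is zero and a ZeroDivisionError is raised.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*rows))                        # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])    # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Made-up pilot data: 5 respondents answering 3 Likert items (1 to 5).
pilot = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 2],
]
print(round(cronbach_alpha(pilot), 3))
```

The sketch uses the sample variance (n − 1 denominator) throughout; what matters is that the same variance definition is used for the items and for the total score.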
Using available scales
• One way to improve validity is to include survey questions or modules
  that are identical to the ones used in previous research projects.
• Such as:
      • The Beck Depression Inventory and the General Health Questionnaire (GHQ), which
        assess psychological status
      • The Mini-Mental State Examination (MMSE), which evaluates cognitive function
      • The SF-36 and SF-12, which both measure health-related quality of life, capturing an
        individual’s perceived physical, mental, emotional, and social well-being.
Translation
• One way to ensure that the correct meaning is being conveyed is to
  use back translation, or double translation, in which one person
  translates the questionnaire from the original language to a new
  language and then a second person translates the survey instrument
  in the new language back into the original language.
Response rate 
• Response rates tend to be low, particularly when surveys are used without a
  well-defined target population or sample.
• Here are some helpful guidelines for surveys of health professionals,
  students, and educators.
      • Include an introductory statement.
      • Give the survey instrument well-designed content and a visually
        appealing layout.
      • Make sure a respondent can complete the survey in a short period.
      • Consider including some inducement or gift.
      • Send a follow-up request, which may spur some respondents.
Questionnaire administration methods
Types of surveys
• Self-reported: by mail or online.
• Interview: face-to-face, telephone-based, or virtual.
• Group or focus group interview.
                      Population Sampling
• We can’t sample the entire population … for a study.
• The source population, sometimes called a sampling frame, is a
  well-defined subset of individuals from the target population from which
  potential study participants will be sampled.
• Sampling methods classification:
• Non-Probability Sampling Methods
• Probability Sampling Methods
Non-Probability Sampling Methods
•   Convenience sampling
•   Quota sampling
•   Purposive Sampling
•   Snowball sampling
Convenience sampling
• Select anyone who is available.
Quota sampling
Purposive Sampling
    Snowball sampling
Existing subjects are
asked to nominate
further subjects known
to them, so the sample
increases in size like a
rolling snowball.
Probability Sampling Methods
•   Simple random sampling
•   Systematic sampling
•   Stratified sampling
•   Clustered sampling
•   Multi-stage sampling
  Simple random sampling
In this case each individual is chosen
entirely by chance and each member of
the population has an equal chance, or
probability, of being selected.
Methods to choose by chance:
  • Toss a coin.
  • Use random number tables or random number generator software.
Random number generator
Have a numbered list of individuals’ names (total N).
Determine your sample size (n).
Press generate and pick the person with the number shown; repeat until n people are selected.
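The random number generator procedure can also be done in software. The snippet below is a minimal sketch using Python's standard `random` module. The sampling frame of 100 placeholder names, the sample size of 10, and the function name `simple_random_sample` are assumptions for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n individuals from the frame without replacement.

    Every individual has the same chance of selection. A fixed seed makes
    the draw reproducible, which helps when documenting the method.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: N = 100 listed individuals.
frame = [f"Person {i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Sampling without replacement means nobody is picked twice, which matches repeating the "press generate" step until n different people have been selected.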
Systematic sampling
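The slide gives only the heading here. In the usual definition, systematic sampling picks a random starting point in the list and then selects every k-th individual, where k is roughly N / n. The snippet below is a minimal sketch under that definition. The function name `systematic_sample` and the frame of 100 ID numbers are assumptions for illustration.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every k-th individual after a random start, with k = N // n."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n                        # sampling interval
    start = random.Random(seed).randrange(k)   # random start in 0..k-1
    return frame[start::k][:n]


# Hypothetical frame: ID numbers 1 to 100, sample size 10 (interval k = 10).
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

If the list is ordered in a repeating pattern that lines up with the interval k, the sample can be biased, so the ordering of the frame should be checked first.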
Stratified sampling
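The slide gives only the heading here. In the usual definition, stratified sampling divides the population into subgroups (strata) and draws a random sample from each one. The snippet below is a minimal sketch with proportional allocation, meaning the same fraction is sampled from every stratum. The function name `stratified_sample`, the urban/rural strata, and the 10% fraction are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum.

    stratum_of is a function that returns the stratum label of a unit.
    Rounding may make the total differ slightly from fraction * N.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)   # units to draw from this stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 urban and 40 rural residents.
people = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_sample(people, lambda person: person[0], 0.10, seed=7)
print(sample)
```

With proportional allocation each stratum appears in the sample in the same proportion as in the population, here 6 urban and 4 rural residents.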
Clustered sampling
• In a clustered sample, subgroups of the population are used as the
  sampling unit, rather than individuals.
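Using subgroups as the sampling unit can be sketched as follows: whole clusters are drawn at random and everyone in a chosen cluster is included. This is a minimal illustration. The function name `cluster_sample` and the clinic names and patient IDs are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each one.

    clusters maps a cluster name to the list of its members.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: clusters[name] for name in chosen}


# Hypothetical clusters: four clinics and their patient IDs.
clinics = {
    "Clinic A": [101, 102, 103],
    "Clinic B": [201, 202],
    "Clinic C": [301, 302, 303, 304],
    "Clinic D": [401],
}
selected = cluster_sample(clinics, 2, seed=3)
print(selected)
```

Because clusters differ in size, the number of individuals in the final sample is not fixed in advance, only the number of clusters.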
Multi-stage sampling
Sample Quality
• Sampling bias: the sample is not representative of the population; its
  characteristics differ from those of the population.
• Sampling bias may be introduced by:
   • Poor sampling methodology
   • Poor recruitment methodology
   • A high rate of withdrawal from the study
• How to avoid it: stick to probability-based sampling methods and add 20%
  extra to the required sample size.
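The 20% allowance is simple arithmetic: multiply the required sample size by 1.2 and round up to the next whole person. The snippet below is a minimal sketch. The function name `inflate_sample_size` and the example sizes are assumptions for illustration, and the extra percentage is assumed to be a whole number.

```python
def inflate_sample_size(required_n, extra_percent=20):
    """Add an allowance for withdrawals, rounding up to a whole participant.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    total = required_n * (100 + extra_percent)
    return (total + 99) // 100   # ceiling of total / 100


# Hypothetical required sample sizes from a sample size calculation.
print(inflate_sample_size(100))   # 120
print(inflate_sample_size(384))   # 461 (460.8 rounded up)
```

Rounding up rather than to the nearest integer keeps the recruited sample at or above the target even when the extra 20% is not a whole number.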
Questions