Introduction to SPSS
SPSS
SPSS is most widely used in social science
disciplines and courses.
SPSS is one of the oldest statistical software programs. It was
first made available in the 1960s and has been redeveloped
over the years; the latest version is SPSS 24.0.
SPSS has a "point and click" interface that allows
you to use pull down menus to select commands that
you wish to perform.
SPSS
SPSS assists the user in describing data, testing
hypotheses and looking for a correlation or
relationship between two or more variables.
SPSS is well suited to most regression analyses
and to many kinds of analytical tests:
• regression (linear, logistic, etc.)
• survival analysis
• analysis of variance
• factor analysis
• multivariate analysis
It is less suitable for time series analysis and
multilevel regression analysis.
SPSS
PRO
• Easy to learn and use
• More powerful than some other software
• One of the most widely used statistical packages in
academia and industry
• Has a command line interface in addition to the menu driven
user interface
• One of the most powerful statistical packages that is also
easy to use
CON
• Very expensive
• Not adequate for modeling and cutting edge statistical analysis
What is Used? (Academia)
Figure 7a. Use of data analysis software in academic publications as measured by hits on Google Scholar.
SPSS for Windows has three windows:
Data Editor
Viewer or Draft Viewer which displays the output files
Syntax Editor, which displays syntax files
The Data Editor has two parts:
Data View window, which displays data from the active file in
spreadsheet format
Variable View window, which displays metadata or information
about the data in the active file, such as variable names and
labels, value labels, formats, and missing value indicators.
SPSS Data View
SPSS Variable View
SPSS Menu & Toolbars
• File, Edit, View, Window, Help: Similar to most Windows
applications.
File - Standard options for opening, saving, printing and
exiting
Edit - Standard commands to undo, redo, cut, copy and
paste
View - Options for showing/hiding toolbars, displaying
values or their labels in Data Editor
Window - Provides option for switching between different
SPSS windows
Help – Contains SPSS help system
Toolbars Continued
• Data – Used to manipulate the data (sort, merge, etc.)
• Transform - Creation of new variables.
• Analyze - Heart of SPSS.
– This menu provides access to the statistical procedures
for analysing your data set.
– All the items on the analyze menu have sub menus.
• Graphs - Provide options to create high quality plots and charts.
• Utilities - Used to display information on individual variables.
Data Entry into SPSS
• There are 2 ways to enter data into SPSS:
1. Enter data directly into SPSS by typing in Data
View
2. Enter data into other software such as
Excel, EpiInfo, EpiData, etc. and then import
into SPSS
1. Manual Data Entry
• Manually Enter Data:
1. Define Variables in Variable View
2. Enter data in Data view
Enter variables
1. Click Variable View.
2. Type the variable name under the Name column (e.g. Age).
NOTE: A variable name can be up to 64 bytes long, and the first
character must be a letter or one of the characters @, #, or $.
3. Type: numeric, string, etc.
4. Label: description of the variable.
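The naming rule in the note above can be checked before data entry. The following is a minimal sketch in Python, not part of SPSS; the function name is hypothetical, UTF-8 is assumed for counting bytes, and only the two rules quoted above are checked (SPSS applies further restrictions that are not covered here).

```python
def is_valid_spss_name(name: str) -> bool:
    """Check only the two rules stated in the note: length and first character."""
    if not name:
        return False
    # Assumption: bytes are counted in UTF-8 encoding.
    if len(name.encode("utf-8")) > 64:
        return False
    first = name[0]
    # The first character must be a letter or one of @, #, $.
    return first.isalpha() or first in "@#$"


for candidate in ["Age", "1stVisit", "@tmp", "Q01"]:
    print(candidate, is_valid_spss_name(candidate))
```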
The Workspace
Figure: the Data Editor workspace, with callouts marking the variables, value labels, cases, and the toggle between Data and Variable Views.
Enter cases (under Data View)
1. There are two variables in the data set.
2. They are Code and Q01.
3. Code is an ID variable, used to identify
individual cases (NOT people’s real IDs).
4. Q01 is about participants’ ages: 1 = 12 years
or younger, 2 = 13 years, 3 = 14 years…
2. Import from other software
Example: Reading in Data from Excel to SPSS
• Two options:
1 – Copy data in Excel and paste directly into the
Data View screen
2 – Read in an Excel file (.xls)
Read in an Excel file (.xls)
• Select File > Open > Data
• Choose Excel as file type
• Select the file you want to import
• Then click Open
Reading in Data from Excel to SPSS
Warning:
• SPSS is much better at handling numeric variables than
string variables (categorical data entered as text).
• Therefore, if you want to transfer data from Excel to SPSS, it
is a good idea to ensure that any categorical data (e.g.
yes/no/don’t know, male/female, etc.) are entered in Excel as
numeric data (codes) rather than text.
• For example, you could always code “No” as 0 and “Yes” as
1, and so on.
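If text answers have already been typed, they can be converted to codes before import. This is a small Python sketch of the No = 0, Yes = 1 scheme above; the function name and the sample answers are made up for illustration, and unrecognised text is returned as None so that it can be reviewed by hand.

```python
CODES = {"no": 0, "yes": 1}  # coding scheme from the warning above; extend as needed


def encode_answer(text):
    """Return the numeric code for a text answer, or None if it is not recognised."""
    return CODES.get(text.strip().lower())


answers = ["Yes", "no", " YES ", "maybe"]  # hypothetical raw entries
coded = [encode_answer(a) for a in answers]
print(coded)
```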
Clean data after importing data files
• Key in values and labels for each variable
• Run frequencies for each variable
• Check the outputs to see if any variables have
wrong values.
• Check missing values against the physical surveys if you use
paper surveys, and make sure they are truly missing.
• Sometimes, you need to recode string variables into
numeric variables
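The frequency check can be pictured outside SPSS as a simple count of each value. The sketch below is in Python and uses a hypothetical 0/1 variable named smoker with one mistyped entry; it is an illustration of the idea, not the SPSS Frequencies procedure.

```python
from collections import Counter


def frequency_check(values, valid_codes):
    """Count every value and list the non-missing values outside the valid codes."""
    counts = Counter(values)
    invalid = {v: n for v, n in counts.items()
               if v is not None and v not in valid_codes}
    return counts, invalid


# Hypothetical data: 11 is a typing error, None is a missing value.
smoker = [0, 1, 1, 0, 11, None, 1]
counts, invalid = frequency_check(smoker, valid_codes={0, 1})
print("frequencies:", dict(counts))
print("values to check:", invalid)
```

Any value reported under "values to check" should be compared with the original survey before it is corrected or set to missing.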
General guidelines for data entry
Encode categorical variables.
Convert letters and words to numbers.
Avoid mixing symbols with data and convert them to
numbers
Give each participant a unique, sequential case
number (ID).
Place this ID number in the first column on the left
• Each variable should be in its own column.
Avoid this:        Change to:
Animal             Animal  Group
Control1           1       0
Control2           2       0
Experiment1        3       1
Experiment2        4       1
• Do not combine variables in one column
• It is recommended to use 0/1 for 2 groups with 0 as a reference
group.
• All data for a project should be in one spreadsheet.
• Do not include graphs or summary statistics in the
spreadsheet.
• Each participant should be entered on a single line or
row.
• Do not copy a participant's information to another row
to perform subgroup analysis.
However, when data are collected repeatedly from the same
participant, it is recommended to put each patient-day observation on a
single line to ease data management.
SPSS has a feature to convert from the longitudinal format to the
horizontal format.
When the number of repeats is small (2 or 3), the horizontal format may be
preferred for simplicity.
Longitudinal data entry
Date      ID  SYSBP
1/2/2005  1   130
1/3/2005  1   120
1/4/2005  1   120
3/1/2005  2   110
3/2/2005  2   140
Horizontal data entry
ID  SYSBP1  SYSBP2  SYSBP3
1   130     120     120
2   110     140
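The conversion between the two layouts can be sketched in a few lines of Python. This is only an illustration of the idea using the blood pressure rows above, not the SPSS restructuring feature; the function name is hypothetical, and rows are assumed to be already in date order within each ID.

```python
from collections import defaultdict

# Longitudinal layout: one row per participant-day (values from the table above).
long_rows = [
    ("1/2/2005", 1, 130),
    ("1/3/2005", 1, 120),
    ("1/4/2005", 1, 120),
    ("3/1/2005", 2, 110),
    ("3/2/2005", 2, 140),
]


def long_to_wide(rows, n_repeats=3):
    """Collect each participant's readings into SYSBP1..SYSBPn on one row."""
    by_id = defaultdict(list)
    for _date, pid, sysbp in rows:  # assumes date order within each ID
        by_id[pid].append(sysbp)
    wide = []
    for pid, readings in by_id.items():
        # Pad with None where a participant has fewer readings than n_repeats.
        padded = readings[:n_repeats] + [None] * (n_repeats - len(readings))
        row = {"ID": pid}
        for i, value in enumerate(padded):
            row[f"SYSBP{i + 1}"] = value
        wide.append(row)
    return wide


wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```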
• Do not leave a cell blank to mean “no”; enter the code for no (e.g. 0).
• Do not enter “?”, “*”, or “NA” for missing data, because this
indicates to the statistical program that the variable is a string
variable.
• String variables cannot be used for any arithmetic computation.
• Put an ordinal variable into one column if its categories are mutually exclusive
Avoid:                       Preferred:
Pain                         Pain
Mild  Moderate  Severe
1     0         0            1
0     1         0            2
0     0         1            3
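If pain has already been entered as three separate columns, the columns can be collapsed into the preferred single code. A minimal Python sketch follows, assuming the 1 = mild, 2 = moderate, 3 = severe coding from the table; the function name is hypothetical, and rows that do not have exactly one category marked are returned as None for review.

```python
def pain_code(mild, moderate, severe):
    """Collapse three 0/1 indicator columns into one ordinal code (1, 2 or 3)."""
    flags = (mild, moderate, severe)
    if sum(flags) != 1:
        # No category or more than one category marked: treat as a data problem.
        return None
    return flags.index(1) + 1


rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]
pain = [pain_code(*r) for r in rows]
print(pain)
```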
Data merging in SPSS – Adding Variables
It is a way of merging or joining two or more data sets
into a single data set.
To do this, we must merge the variables in the two
data sets by the values of a common (and unique)
variable, known as the key variable (for example, SampleID),
so that the correct information is associated with
each case.
In order to do this:
Click Data > Merge Files > Add Variables
Side to Side Merge
ID Health1 Health2 ID Educ1 Educ2
01 02 03 01 34 45
02 04 05 02 71 55
03 14 24 03 62 34
: : : : : :
n X1 X2 n X1 X2
• Used when data files have same records but different variables
• Each file should have key field(s) to ensure correct merging
• For example: Person A enters Health data, Person B enters
Education data
Data merging…
1. Make sure that both files are sorted by the key variable in ascending order.
2. In SPSS, open the data from one of the data sources.
3. Select Data > Merge Files > Add Variables.
4. Select the dataset you want to merge into the working file.
5. Click Match cases on key variables in sorted files.
6. Click Both files provide cases.
7. Highlight ID in the Excluded Variables box, then click ► next to Key
Variables.
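The logic of a side to side merge can be sketched in Python as follows. This is an illustration of key matching, not SPSS's implementation; the function name is hypothetical, the rows reuse the first three records of the Health and Education example, every ID that appears in either file is kept, and a variable name that occurs in both files keeps the working file's value.

```python
def add_variables(working, second, key="ID"):
    """Join two lists of records on a key, keeping cases from both files."""
    working_by_key = {row[key]: row for row in working}
    second_by_key = {row[key]: row for row in second}
    merged = []
    for k in sorted(set(working_by_key) | set(second_by_key)):
        row = {key: k}
        row.update(working_by_key.get(k, {}))
        for name, value in second_by_key.get(k, {}).items():
            # A duplicate variable name keeps the working file's value.
            row.setdefault(name, value)
        merged.append(row)
    return merged


health = [
    {"ID": "01", "Health1": 2, "Health2": 3},
    {"ID": "02", "Health1": 4, "Health2": 5},
    {"ID": "03", "Health1": 14, "Health2": 24},
]
educ = [
    {"ID": "01", "Educ1": 34, "Educ2": 45},
    {"ID": "02", "Educ1": 71, "Educ2": 55},
    {"ID": "03", "Educ1": 62, "Educ2": 34},
]
merged = add_variables(health, educ)
for row in merged:
    print(row)
```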
Notes on data merging in SPSS
• Cases must be sorted in the same order in both data files.
• If one or more key variables are used to match cases, both data
files must be sorted in ascending order of the key variable(s).
• Variable names in the second data file that duplicate variable names
in the working data file are excluded by default, because Add
Variables assumes that these variables contain duplicate information.
• Thus, before you merge data files, carefully check any
variables that share a name.
• If two such variables contain different information, SPSS still
excludes the one from the second file automatically, so rename it
beforehand if you need to keep it.
Concatenating or appending data in SPSS
This merges data that were entered into two
different data sets.
Click Data > Merge Files > Add Cases
Top to Bottom Merge
• Used when data files have the same variables but different
records
• Used to combine data entered by different data entry staff
• For example: A enters records 1 to 10, B enters records 11 to 20
File entered by A
ID  Var1  Var2  Var3
01  24    54    62
02  32    54    14
03  54    24    35
:   :     :     :
10  35    46    45
File entered by B
ID  Var1  Var2  Var3
11  35    45    12
12  64    74    25
13  54    54    65
:   :     :     :
20  37    65    56
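A top to bottom merge simply stacks the records of one file under the other. The Python sketch below shows the idea with the first rows of each table; the function name is hypothetical, and the check that both files share the same variables is an added safeguard for illustration, not a description of how SPSS behaves.

```python
def add_cases(first, second):
    """Stack two lists of records that share the same variables."""
    if first and second and set(first[0]) != set(second[0]):
        raise ValueError("the two files do not have the same variables")
    return first + second


staff_a = [
    {"ID": "01", "Var1": 24, "Var2": 54, "Var3": 62},
    {"ID": "02", "Var1": 32, "Var2": 54, "Var3": 14},
    {"ID": "03", "Var1": 54, "Var2": 24, "Var3": 35},
]
staff_b = [
    {"ID": "11", "Var1": 35, "Var2": 45, "Var3": 12},
    {"ID": "12", "Var1": 64, "Var2": 74, "Var3": 25},
    {"ID": "13", "Var1": 54, "Var2": 54, "Var3": 65},
]
combined = add_cases(staff_a, staff_b)
print(len(combined), "records")
```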
Data Cleaning in SPSS
1. Re-coding existing variables – into the same
variable
2. Re-coding existing variables – into a different
variable
3. Creating a new variable from existing variables
Recoding existing variables
• We want to use numeric coding for group instead of A
and B.
ID  Old Group  New Group
1   A          0
2   A          0
3   B          1
4   B          1
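The effect of recoding into the same variable can be sketched in Python: the old letter codes are overwritten by the new numeric codes. This only illustrates the result shown in the table above, and the names used are hypothetical.

```python
GROUP_CODES = {"A": 0, "B": 1}  # old value -> new value, as in the table above

rows = [
    {"ID": 1, "Group": "A"},
    {"ID": 2, "Group": "A"},
    {"ID": 3, "Group": "B"},
    {"ID": 4, "Group": "B"},
]

for row in rows:
    # Overwrites the original value, like recoding into the same variable.
    row["Group"] = GROUP_CODES[row["Group"]]

print(rows)
```

Because the original values are lost, it is worth keeping a copy of the data before recoding in place.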
Recoding existing variables (2)
From the SPSS menu, go to:
Transform > Recode into Same Variables
Recoding existing variables (3)
1. Move Group from the variable list into the String Variables box
2. Click Old and New Values to proceed
Recoding existing variables (4)
1. Type the old value and the new value you want to convert it into
2. Click Add (to change or remove an entry, click Change or Remove)
3. Add all value pairs to the Old --> New box, then click Continue
4. Click OK to execute the commands.
Re-coding existing variables – into a different variable
• Recoding into a different variable transforms an
original variable into a new variable.
• That is, the changes do not overwrite the original variable; they are
instead applied to a copy of the original variable under a new name.
To recode into different variables, click Transform > Recode into
Different Variables.
• The Recode into Different Variables window will
appear.
Input Variable -> Output Variable: The center text box lists the
variable(s) you have selected to recode, as well as the name your
new variable(s) will have after the recode. You will define the new
name in the Output Variable area.
Output Variable: Define the name and label for your recoded
variable(s) by typing them in the text fields. Once you are finished,
click Change. Now the center text box will display both the
name of the original variable and the name of the new
variable (e.g., “Height --> Height_categ”).
Old and New Values: Click Old and New Values to
specify how you wish to recode the values for the selected variable.
If: The If option allows you to specify the conditions under which
your recode will be applied.
Old and New Values
Once you click Old and New Values, a new window will appear where you
specify how to transform the values.
Old Value: Specify the type of value you wish to recode (e.g., a
specific value, missing data, or a range of values) and the specific
value to be recoded (e.g., a value of “1” or a range of “1-5”).
New Value: Specify the new value for your variable (i.e., a specific
numeric code such as “2,” system-missing, or copy old values).
Old --> New: Once you have selected the old and new values for
your selected variable under Old Value and New Value, click Add in the
Old --> New area.
• The recode that you have specified now appears in the text field.
• If you need to change one of the recodes that you have added to
the Old --> New area, simply click on the one you wish to
change and make changes under Old Value and New Value as necessary.
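Recoding into a different variable can be pictured as filling a new column from ranges of the old one. The Python sketch below uses the Height and Height_categ names from the example above; the cut points (150 and 170) and the sample heights are hypothetical and chosen only for illustration.

```python
def recode_height(height_cm):
    """Map a height to a category code; ranges here are illustrative only."""
    if height_cm is None:
        return None  # a missing old value stays missing in the new variable
    if height_cm < 150:
        return 1
    if height_cm < 170:
        return 2
    return 3


rows = [{"Height": 148.0}, {"Height": 165.5}, {"Height": None}, {"Height": 181.0}]
for row in rows:
    # The original Height is kept; the codes go into a new variable.
    row["Height_categ"] = recode_height(row["Height"])

print(rows)
```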
Creating a new variable for diastolic blood pressure (DiasBP):
In SPSS, go to Variable View,
then type DiasBP in the last row under Name.
Go back to Data View and type the diastolic blood pressure values directly, so that they are kept separate from
SysBP. For ease of data entry, you can move DiasBP right after SysBP. Then
edit SysBP so that it contains only the systolic value.
Creating new variable from existing variables
• Sometimes you may need to compute a new variable
based on existing information (from other variables) in
your data.
• For example, you may want to:
• Convert the units of a variable from feet to meters
• Use a subject's height and weight to compute their
BMI
• Apply a computation conditionally, so that a new
variable is only computed for cases where certain
conditions are met
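The three examples above can be sketched together in Python. This is an illustration of the calculations, not SPSS syntax; the variable names, the sample values, and the condition (adults aged 18 or over) are assumptions made for the example. BMI is weight in kilograms divided by the square of height in meters.

```python
FEET_TO_METERS = 0.3048  # exact conversion factor


def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


# Hypothetical cases for illustration.
rows = [
    {"age": 35, "height_ft": 5.5, "weight_kg": 70.0},
    {"age": 12, "height_ft": 4.8, "weight_kg": 40.0},
]

for row in rows:
    # Unit conversion: a new variable computed from an existing one.
    row["height_m"] = row["height_ft"] * FEET_TO_METERS
    # Conditional computation: BMI only where the condition is met.
    if row["age"] >= 18:
        row["bmi"] = round(bmi(row["weight_kg"], row["height_m"]), 1)
    else:
        row["bmi"] = None

for row in rows:
    print(row)
```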
To compute a new variable,
click Transform > Compute Variable.
The Compute Variable window will open, where you will specify how to
calculate your new variable.
Target Variable: The name of the new variable that will be
created during the computation.
The left column lists all of the variables in your dataset.
Numeric Expression: Specify how to compute the new variable
by writing a numeric expression.
The center of the window includes a collection of arithmetic
operators, Boolean operators, and numeric characters, which you
can use to specify how your new variable will be calculated.
If: The If option allows you to specify the conditions under
which your computation will be applied.
Function group: You can also use the built-in functions in
the Function group list on the right-hand side of the
window.