Update to Professional SAS Programming Secrets
by Rick Aster. © 1995 Rick Aster
http://www.globalstatements.com
The REPORT proc
Designed as an interactive way to generate reports, the REPORT proc is perhaps
even more valuable to programmers. Use the NOWD option to bypass the proc
windows and send output directly to the standard print file.
By default, the proc produces the same kind of listing as the _ALL_ keyword of the
PUT statement. With additional statements, however, it produces tables and gives you
much more control over their appearance than the PRINT proc allows.
Tables
Unlike most procs that produce tables, the REPORT proc is designed around
columns rather than variables. Instead of a VAR statement, you list variables in the
COLUMN statement and define them further in DEFINE statements.
The COLUMN statement lists the columns of the report and determines the order
in which they appear. In simple cases, the columns are simply variables from the SAS
dataset named in the PROC statement. The COLUMN statement also includes any
header that spans columns. In parentheses, list the header and columns it covers. For a
header of more than one line, simply use more than one character constant.
Each variable is listed again in its own DEFINE statement. The DEFINE statement
is the word DEFINE, the variable name from the COLUMN statement, a slash, and any
options that apply to the variable. The possible options include one or more character
constants as a column header, the FORMAT= or F= option with a format specification
or the WIDTH= option to set the column width, the SPACING= option to set the
number of spaces to the left of the column, the FLOW option for word wrap when the
value is wider than the column, LEFT, CENTER, or RIGHT for alignment, and PAGE to
begin the column on a new page. The DEFINE statement may be omitted if there are no
options to declare for a variable.
Even in this simple example, the REPORT proc goes beyond what the PRINT proc
can do:
TITLE1 'Your Elected Officials';
PROC REPORT DATA=POLS NOWD;
COLUMN NAME OFFICE DISTRICT ('Elections' ELECTED NEXT);
DEFINE NAME / 'Name' WIDTH=23;
DEFINE OFFICE / 'Office' WIDTH=15 SPACING=1;
DEFINE DISTRICT / 'District' WIDTH=11 SPACING=1;
DEFINE ELECTED / 'Last' FORMAT=YEAR4.;
DEFINE NEXT / 'Next' FORMAT=YEAR4.;
RUN;
Your Elected Officials
Elections
Name Office District Last Next
Public, Joe Mayor Anytown 1993 1998
Sharp, Guy Sen. FL 1990 1996
Badley, Sue Sen. NY 1992 .
Doppler, Elizabeth Rep. MD 37 1994 1996
North, Jerome Pres. USA . 1996
LaFemme, Churchill V.P. USA . 1996
Several PROC statement options affect the appearance of the table. To set off the
column headers, use the HEADLINE option for a line and the HEADSKIP option for a
blank line. To produce a table with no headers, use the NOHEADER option. You can
change the default column width and spacing with the COLWIDTH= and SPACING=
options. The CENTER, LS=, PS=, WRAP, PANELS=, and PSPACE= options affect the
layout of the page.
Aliases
Because each column has its own DEFINE statement, the REPORT proc can print
the same variable twice with two different formats. But if the same variable appears
twice, how do you know which DEFINE statement belongs to which column? That’s
what aliases are for. An equals sign and an alias follow the variable name in the
COLUMN statement, and you use the alias in the DEFINE statement.
TITLE1 'Your Elected Officials';
PROC REPORT DATA=POLS NOWD HEADLINE;
COLUMN NAME OFFICE ('Last Elected' ELECTED=MONTH ELECTED=YEAR);
DEFINE NAME / 'Name' WIDTH=23;
DEFINE OFFICE / 'Office' Width=17 SPACING=1;
DEFINE MONTH / 'Month' FORMAT=MONNAME9.;
DEFINE YEAR / 'Year' FORMAT=YEAR4.;
RUN;
Your Elected Officials
Last Elected
Name Office Month Year
----------------------------------------------------------
Public, Joe Mayor November 1993
Sharp, Guy Sen. November 1990
Badley, Sue Sen. November 1992
Doppler, Elizabeth Rep. November 1994
North, Jerome Pres. . .
LaFemme, Churchill V.P. . .
Usage
One of the options that can appear in a DEFINE statement is the usage of a
variable, a code that determines what part of the report the column is. In a regular table
report, the usage of every variable is DISPLAY, and you don’t have to specify it. Other
usages that can appear in a regular table report are ORDER and COMPUTED.
Order variables
An order variable differs from a display variable in two ways. First, it changes the
order of rows in the table. That is, the table is sorted according to the order variable. If
there are several order variables, they take precedence in sorting according to their
order of appearance in the COLUMN statement.
Second, each value of an order variable appears only once, on the first row with
that value. The column is blank in subsequent rows that have the same value.
(However, when an order variable is displayed in a row, all order variables to its right
are also displayed.) Even if the SAS dataset is already sorted, you can define a variable
as an order variable in order to have each value of the variable appear only once, not
repeated on every line.
Computed variables
You can display variables that are based on other variables in the report or in the
SAS dataset. A computed variable has the usage COMPUTED in its DEFINE statement
and gets its value in a code segment.
A code segment is a group of general programming statements, much the same as
in the data step. It can include the data step assignment, sum, LENGTH, and CALL
statements, along with control flow statements other than those that refer to the
observation loop of the data step.
A code segment is preceded by a COMPUTE statement and followed by an
ENDCOMP statement to form a COMPUTE block. The COMPUTE statement names
the variable being computed and, for a character variable, sets the length of the variable
in the /LENGTH= option. This is an example of a DEFINE statement and COMPUTE
block to compute a numeric variable:
DEFINE T_KELVIN / COMPUTED WIDTH=8;
COMPUTE T_KELVIN;
T_KELVIN = (T_FAHREN + 459.67)*5/9;
IF .Z < T_KELVIN < 0 THEN T_KELVIN = 0;
ENDCOMP;
A COMPUTE block can create and use variables other than ones that appear in the
report. Those variables survive from one code segment to the next, so variables created
in one code segment can be used in a subsequent code segment. The REPORT proc
computes columns from left to right in each row, so a COMPUTE block can only rely on
values to its left — variables listed in the COLUMN statement to the left of the variable
being computed, and variables computed in those columns’ COMPUTE blocks. If you
need to base a computation on a variable in the SAS dataset that does not appear in the
report, list the variable in the COLUMN statement and define it with the appropriate
usage and the NOPRINT option.
Conditional formatting
Among several ways you can format a variable differently in different rows of a
report, the most direct is with the DEFINE routine. This CALL routine can only be used
in the code segments of the REPORT proc. Its first argument is the number or name of a
column. The second argument, when setting a format, is the code string 'FORMAT'.
The third argument is a format specification. The routine changes the format of the
column for one row only. To be effective, the routine must execute to the left of the
column.
The following example simply blanks out a column, using the $BLANK format,
based on a condition of a variable in the SAS dataset. Two columns with the NOPRINT
option make the computation possible: CONFIRM, a variable from the SAS dataset
used in the computation, and BLANK, which provides an anchor for the execution of
COMPUTE block.
PROC FORMAT;
VALUE $BLANK OTHER=' ';
RUN;
TITLE 'Scheduled Events';
PROC REPORT DATA=FUTURE NOWD NOHEADER;
COLUMN DATE CONFIRM BLANK CITY SPONSOR;
DEFINE DATE / ORDER FORMAT=DATE9. ORDER=INTERNAL;
DEFINE CONFIRM / DISPLAY NOPRINT;
DEFINE BLANK / COMPUTED NOPRINT;
DEFINE CITY / WIDTH=15;
DEFINE SPONSOR / WIDTH=15;
COMPUTE BLANK;
IF NOT CONFIRM THEN
CALL DEFINE('SPONSOR', 'FORMAT', '$BLANK.');
ENDCOMP;
RUN;
Scheduled Events
21JAN1997 NEW YORK Bjorn's Pizza
28JAN1997 DAYTONA BEACH
06JUL1997 HALIFAX Devon Island Fr
08JUL1997 HAGERSTOWN, MD Parker & Locker
The LINE statement
COMPUTE blocks can also execute at the beginning or end of the report or at the
beginning or end of each value of an ORDER variable. There, COMPUTE blocks are not
necessarily used to compute new variables. More often, they put additional lines, called
break lines, in the report. They do that with the LINE statement.
The LINE statement implements a subset of the features of the PUT statement.
The LINE statement can include character constants, repeated character constants, the @
and + pointer controls, and variables with formats. Unlike the PUT statement, the LINE
statement is not an executable statement. It cannot be made conditional, because it is
not affected by control flow statements. The following example produces a line of
periods between values of the ORDER variable DATE:
COMPUTE AFTER DATE;
LINE @11 60*'.';
ENDCOMP;
Use multiple LINE statements to print multiple lines.
Blank lines
If you just want a blank line between groups, you can do that with a BREAK
statement. Its options determine actions that happen between groups. Use the SKIP
option for a blank line:
BREAK AFTER DATE / SKIP
Other break options are PAGE, for a page break, SUMMARIZE to print a summary
line, OL, UL, DOL, and DUL for lines over or under the break lines or summary line,
and SUPPRESS to keep the break variable from being printed on the summary line.
A break can also happen at the beginning or end of the report. Use the RBREAK
BEFORE or RBREAK AFTER statement with break options and COMPUTE blocks with
the keyword BEFORE or AFTER, but without a variable name.
Summary lines
When you use the SUMMARIZE option on the RBREAK statement, the report
includes a summary line. The traditional use of the summary line is for a line of totals
at the end of the report, as in the example below. This example also demonstrates the
use of the PANELS= option to divide the page into columns.
TITLE1 'State Names';
PROC REPORT DATA=STATES NOWD
LS=80 PS=30 PANELS=2 PSPACE=12 HEADSKIP;
COLUMN NAME LETTERS;
DEFINE NAME / WIDTH=18;
DEFINE LETTERS / WIDTH=7;
RBREAK AFTER / OL SUMMARIZE;
RUN;
State Names
NAME LETTERS NAME LETTERS
Alabama 7 Nebraska 8
Alaska 6 Nevada 6
Arizona 7 New Hampshire 12
Arkansas 8 New Jersey 9
California 10 New Mexico 9
Colorado 8 New York 7
Connecticut 11 North Carolina 13
Delaware 8 North Dakota 11
Florida 7 Ohio 4
Georgia 7 Oklahoma 8
Hawaii 6 Oregon 6
Idaho 5 Pennsylvania 12
Illinois 8 Rhode Island 11
. . .
Mississippi 11 Wyoming 7
Missouri 8 -------
Montana 7 412
Summary lines can also appear at the end of a group if you use the SUMMARIZE
option in the BREAK statement, or at the beginning of the report or a group if you
specify BEFORE instead of AFTER in the RBREAK or BREAK statement.
Analysis variables
In the previous example, because of the SUMMARIZE option, the variable
LETTERS is an analysis variable instead of a display variable. An analysis variable has
a statistic applied to it before it is printed — in this case, the default statistic, SUM. A
more complete DEFINE statement would read:
DEFINE LETTERS / WIDTH=7 ANALYSIS SUM;
The detail rows in the report actually calculate the statistic on the single value of the
observation shown in the row. Because the SUM statistic of a single value is the value
itself, the detail rows look the same as if the variable were a simple display variable.
Other statistics that may make sense to use in a regular table report are MIN, MAX, and
MEAN. You can select the statistic in the DEFINE statement or in the COLUMN
statement, putting a comma between the variable and the statistic.
Summary reports
If you eliminate the detail lines from a report, keeping only the summary lines,
you have a summary report. In a summary report, group and analysis variables take
the place of order and display variables. The report displays one row of summary
statistics per group instead of one row per observation. If there is only one observation
per group, the summary report may be the same as a detail report of the same SAS
dataset.
You can’t tell by looking at the following report that it is a summary report. The
group and analysis variables are presented just the same as order and display variables.
PROC REPORT DATA=ION NOWD;
COLUMN SCHEME ELEMENT COUNT;
DEFINE SCHEME / 'Scheme' GROUP WIDTH=6;
DEFINE ELEMENT / 'Element' GROUP WIDTH=7;
DEFINE COUNT / 'Count' ANALYSIS SUM FORMAT=COMMA9.;
BREAK AFTER SCHEME / SKIP;
RUN;
Ion Counts
Scheme Element Count
A H 18,294
He 3,181
Li 1,928
Other 2,437
B H 9,647
He 3,534
Li 1,929
Other 2,006
In a summary report, you might want to display several columns with different
statistics for the same variable. You can use parentheses in the COLUMN statement to
associate a list of variables with a list of statistics, as:
COLUMN GROUP (LENGTH WIDTH),(MIN MEAN MAX);
A comma between two items in the COLUMN statement means that the items
share the same column. With parentheses for grouping, all combinations of the two
groups are formed.
An across variable is just a group variable displayed horizontally rather than
vertically. It is associated with an analysis variable to make a two-dimensional table of
statistics, similar to the TABULATE proc. For that kind of report, list the across variable
after the group variables, because the across variable appears to the right of the group
variables in the report. In the following example, SCHEME is the across variable, and it
is grouped with the analysis variable COUNT and the statistic N. When not associated
with an analysis variable, N produces a frequency count.
There are several other things to notice in the following example. Aliases are used
for the statistic N, which appears twice in the COLUMN statement. The header defined
for SCHEME begins and ends with a hyphen, which is repeated in the report to fill the
width of its header space; this helps the reader identify which columns SCHEME
covers. The COMPUTE block uses the DEFINE routine with the BLANK format to
blank out the N columns in the summary rows, where they aren’t meaningful. A
summary row is identified when the value of the last group variable is missing.
Column numbers, rather than names, are used to identify the columns because, with the
across variable, more than one column has the same name. When counting columns to
determine the column arguments for the DEFINE routine, you have to count the
BLANK column, even though it is not displayed in the report.
PROC FORMAT;
VALUE BLANK OTHER=' ';
RUN;
TITLE 'Ion Counts';
PROC REPORT DATA=ION NOWD;
COLUMN ELEMENT SITE BLANK SCHEME,(COUNT N=N1) N=N2;
DEFINE SCHEME / '- Scheme -' ACROSS WIDTH=6;
DEFINE ELEMENT / 'Element' GROUP WIDTH=7;
DEFINE SITE / 'Site' GROUP WIDTH=4;
DEFINE BLANK / COMPUTED NOPRINT;
DEFINE COUNT / 'Count' ANALYSIS SUM FORMAT=COMMA9.;
DEFINE N1 / 'Trials' FORMAT=COMMA6.;
DEFINE N2 / 'Total' 'Trials' FORMAT=COMMA7.;
BREAK AFTER ELEMENT / SUMMARIZE SUPPRESS SKIP;
RBREAK AFTER / SUMMARIZE PAGE;
COMPUTE BLANK;
IF SITE = '' THEN DO I = 5, 7, 8;
CALL DEFINE(I, 'FORMAT', 'BLANK.');
END;
ENDCOMP;
RUN;
Ion Counts
-------------- Scheme --------------
A B Total
Element Site Count Trials Count Trials Trials
H 1 17,770 3 9,268 3 6
2 310 3 209 3 6
3 214 3 170 3 6
18,294 9,647
He 1 2,981 3 3,336 3 6
2 105 3 104 3 6
3 95 3 94 3 6
3,181 3,534
Li 1 1,876 3 1,848 3 6
2 33 3 45 3 6
3 19 2 36 3 5
1,928 1,929
Other 1 1,749 3 1,367 3 6
2 287 3 281 3 6
3 401 3 358 3 6
2,437 2,006
25,840 17,116
If you define a table with only group variables, the table is a list of the values of the
group variables. You can make it a frequency table by adding the column N or by using
an across variable. If you define a table with only analysis variables, the table shows
one row of statistics calculated over the entire SAS dataset. Computed variables and
break lines can also appear in a summary report and work the same way as in a listing
report.
Update to Professional SAS Programming Secrets
by Rick Aster. © 1995 Rick Aster
http://www.globalstatements.com
Data step features
The new features of data step syntax shouldn’t fundamentally change your
approach to SAS programming, but they present major improvements in a few
specialized areas.
Indexed table lookup
The KEY= option of the SET statement makes it possible to do an indexed table
lookup from a SAS data file. This is easier to code and usually runs faster than the other
storage table lookup techniques described in chapter 15.
The KEY= option names an index of the SAS dataset. The SET statement then
looks for an observation whose key values match the value of the same variables in the
data step. Use the /UNIQUE option after the KEY= option to indicate that you are
looking for just one observation. The automatic variable _IORC_ is 0 if the observation
is found, a nonzero return code if it is not found.
The following example puts all this together. Suppose you want to summarize
account data by customer, looking up the customer ID and name in an account table.
The account data is the SAS dataset CURRENT, the account table is ACCT, and it has a
simple index on ACCOUNT.
DATA CUSTCUR;
SET CURRENT;
* Add customer ID and name from account table;
SET ACCT (KEEP=ACCOUNT CUSTID CUSTNAME) KEY=ACCOUNT/UNIQUE;
IF _IORC_ THEN DO; * Account not found in account table;
CUSTID = '';
CUSTNAME = '';
END;
RUN;
PROC SUMMARY DATA=CUSTCUR PRINT SUM N;
CLASS CUSTID;
ID CUSTNAME;
RUN;
If you want to look up more than one observation with the same key value, omit
the /UNIQUE option and put the SET statement in a DO loop that repeats until _IORC_
is a nonzero return code.
MODIFY statement
The introduction of the MODIFY statement makes it possible to use the same
physical SAS dataset for input and output in a data step. Effectively, you can use a data
step to edit a SAS dataset. Name the SAS dataset in the DATA statement and again in
the MODIFY statement, which acts basically like the SET statement. Then, based on
data step logic, you can write out modified observations with the REPLACE statement,
remove observations with the REMOVE statement, and add new observations with the
OUTPUT statement.
Suppose, for example, you want to delete from the SAS dataset XFER.RECENT all
observations that are more than a year old , based on the value of the variable
INITDATE. You could do that with this step:
DATA XFER.RECENT;
MODIFY XFER.RECENT;
IF INITDATE < "&SYSDATE"D - 365 THEN REMOVE;
RUN;
The REPLACE statement would be useful, for example, if you want to correct
missing and negative values of the variable COUNT, setting them to 0. After the DATA
and MODIFY statements, the statements would be:
IF COUNT < 0 THEN DO;
COUNT = 0;
REPLACE;
END;
The MODIFY statement is usually faster than the SET statement for this kind of
task. With the SET statement, the data step write the entire SAS dataset; with the
MODIFY statement, it writes only the parts that are changed.
The MODIFY statement can also be used with the WHERE= option or the KEY=
option (described above for the SET statement) to edit only selected observations of a
SAS dataset. Other observations in the SAS dataset are not affected by the data step.
For example, the INITDATE example above could have been written:
DATA XFER.RECENT;
MODIFY XFER.RECENT (WHERE=(INITDATE < "&SYSDATE"D - 365));
REMOVE;
RUN;
The MODIFY statement can also be used with two SAS datasets and a BY
statement. In this form, it resembles the behavior of the UPDATE statement, using
nonmissing values in the second SAS dataset to update values from the first SAS
dataset. However, although you can change the values of existing variables by using
the REPLACE statement, you cannot add new variables to the SAS dataset named in the
MODIFY statement, and you can add new observations only at the end of the SAS
dataset.
Routines
The number of routines in the SAS System continues to increase. These are some
of the most useful new routines.
EXECUTE
The EXECUTE routine is another useful tool for flexcode and macro programming.
Used in the CALL routine in a data step, it takes one argument, a character value. It
resolves any macrolanguage references in the value. The resulting SAS statements, if
any, are executed after the data step executes. The resulting statements must be
complete statements. The EXECUTE routine is the only way to access the macro
processor during the execution of a data step. Be sure to use a RUN statement at the
end of a data step that uses the EXECUTE routine.
The EXECUTE routine offers a better way to stop a program based on a data step
condition. The statement is:
IF condition THEN CALL EXECUTE('ENDSAS;');
The program then stops at the conclusion of the data step.
Bitwise functions
A new set of functions does bitwise operations on the set of integers from 0 to 232 –
1, treating a number as a string of 32 bits. Bitwise logical operations apply logical
operators independently to each of the 32 bits in the integer. Bit shifting removes some
of the bits at the left or the right and moves the rest of them over in that direction.
These are the bitwise functions:
Function call Description
BAND(n, n) bitwise logical AND
BOR(n, n) bitwise logical OR
BXOR(n, n) bitwise logical exclusive OR (bitwise ≠)
BNOT(n) bitwise NOT
BLSHIFT(n, d) shift left d bits
BRSHIFT(n, d) shift right d bits
String functions
New functions fill in some of the gaps in the SAS System’s string processing. The
LOWCASE function is the lowercase counterpart to the UPCASE function. The TRIMN
function goes beyond the TRIM function to convert a blank value to a null string. The
TRANWRD function is similar to the TRANSLATE function, but replaces substrings
rather than individual characters. The INDEXW function is like the INDEX function,
but looks for the substring as an entire word in the string.
The QUOTE and DEQUOTE functions turn a character value into a SAS character
constant and vice versa. These functions are useful when exchanging data with other
programs in comma-delimited files.
Finally, the SOUNDEX function encodes a word with a modified Soundex
encoding, which is traditionally used to find out whether two words are likely to sound
similar.
International dates
Release 6.10 introduced international informats and formats. Primarily, they read
and write dates in various languages. The routine names have the form lanDFir, where
lan is one of the language codes listed below and ir is a code identifying the international
informat or format. The codes are:
Code Language Code International Informat
DAN Danish DE DATE
DES Swiss_German DT DATETIME
DEU German MY MONYY
ENG English
ESP Spanish Code International Format
FIN Finnish DD DDMMYY
FRA French DE DATE
FRS Swiss_French DN WEEKDAY
ITA Italian DT DATETIME
NLD Dutch DWN DOWNAME
NOR Norwegian MN MONNAME
PTG Portuguese MY MONYY
SVE Swedish WKX WEEKDATX
WDX WORDDATX
For example, use the format DEUDFMN to write the German name of the month
of a SAS date value.
The language that the international informats and formats use is determined by
the system option DFLANG=, which selects one of the languages listed above. The
default in English-speaking countries is DFLANG=English.