Good Programming Practice [GPP] in SAS® & Clinical Trials

Srinivas Vanam, Percept Pharma Services, NJ

Manvitha Yennam, Seattle Genetics, WA

Phaneendhar Vanam, Percept Pharma Services (Pharmacyclics), NJ

ABSTRACT

This paper presents tips, techniques, and conventions that you can apply in your day-to-day programming to work more efficiently. Neither SAS® Institute nor the FDA specifies standards for developing programs, so SAS® users are free to write programs in their own style. That freedom is convenient, but it leads to inconsistent programs across a project or study. Following a common set of principles and conventions turns the programs into an asset for the company. The target audience of this paper is both the programmer, who creates the building blocks of the project, and the manager, who oversees the entire project and keeps the building blocks together.

INTRODUCTION

Before you start programming any SAS® project, it is a good idea to follow a software methodology called the Software Development Life Cycle (SDLC). SDLC lets you design the project around its requirements. A well-designed project avoids redundancy, enhances reusability, and promotes efficiency, which ultimately helps to optimize resources.

Once the design of the project is ready, the programmers develop the building blocks. While programming, it is a good idea to follow conventions that increase the worth of the programs. In this paper, we group those conventions under four criteria:

1. Readability: Make your programs easy to understand. This increases the programmer's efficiency.
2. Efficiency: Reduce the use of resources such as memory and CPU processing time. This increases the computer's efficiency.
3. Reusability: Make your programs reusable by moving frequently used logic out of the code into a separate program, macro, or user-defined function.
4. Robustness: Make your programs handle a wide variety of scenarios without crashing, and make them run on a wide range of platforms.

Most of the time these criteria are independent of each other, and all of them can be applied without affecting the others. When two of them conflict, however, the situation determines which criterion takes priority.

Example: When readability conflicts with efficiency, give priority to readability if the program runs in an interactive environment, and give priority to efficiency if it runs in batch mode.

SOFTWARE DEVELOPMENT LIFE CYCLE (SDLC)

SDLC provides a model for the development and maintenance of software applications. It consists of six phases:

1. Requirements: The requirements that the project must fulfill are collected.
2. Analysis: The requirements are analyzed so that similar requirements can be grouped together and complex ones can be split into simpler parts.
3. Design: After the analysis, the system is designed as a set of modules, which gives a clear picture of the project.
4. Development: Once the design is complete, each module is developed separately and the modules are then put together.
5. Testing: The program is tested in an environment that simulates the real working environment. This confirms that it works according to the specifications and reveals any flaws in the system.
6. Maintenance: Once the program is in use, updates may be needed as the requirements change. These updates are made during the maintenance phase to keep the program running smoothly.

Example: Scenario 1:

Consider a situation where 20 adverse event tables need to be created. If SDLC is followed to analyze this situation, the best approach is to write one macro that can create all 20 tables, with an individual driver program for each table that calls this macro. This approach has the following advantages:

I. It avoids redundancy, because the main logic of the tables lives only in the macro.
II. When an update is needed, changing the macro is enough. Otherwise, each of the 20 programs would have to be updated to reflect the change in the specification.
III. It enhances reusability. A well-programmed and well-documented macro can be used across studies within the company.
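The macro-plus-driver design from Scenario 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the macro name %aeTable, its parameters, and the ADAE-like dummy data are all hypothetical. A production table macro would also need ADSL denominators, percentages, and RTF output.

```sas
/* Hypothetical ADAE-like input data for illustration only */
data work.adae ;
  input usubjid $ trtp $ aebodsys :$30. aeser $ ;
  datalines ;
001 DrugA Gastrointestinal Y
002 DrugA Gastrointestinal N
003 Placebo Nervous N
004 DrugA Nervous N
;
run ;

/* The table logic lives in one macro (hypothetical name and parameters) */
%macro aeTable(inData =, where =, out =) ;
  proc sql ;
    create table &out as
      select trtp, aebodsys, count(distinct usubjid) as nSubj
        from &inData
        %if %length(&where) %then %do ;
          where &where
        %end ;
        group by trtp, aebodsys
        order by trtp, aebodsys ;
  quit ;
%mend aeTable ;

/* Driver program 1: all adverse events */
%aeTable(inData = work.adae, out = work.tAllAe)

/* Driver program 2: serious adverse events only */
%aeTable(inData = work.adae, where = %str(aeser = "Y"), out = work.tSerAe)
```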

Example: Scenario 2:

The existing report-generating macro for non-CDISC data must be revised and updated to work with CDISC data.

Now we will go through each of the criteria that we discussed above.

READABILITY

This section covers conventions and practices that make your programs easier to read.

• Header: A self-describing header at the beginning of each program helps programmers understand what the program does.

/*********************************************************************
/ Project    : PharmaSUG - 2016                                      /
/ Program    : myprogram.sas                                         /
/ Programmer : Srinivas Vanam                                        /
/ Date       : 28-AUG-2015                                           /
/ Purpose    : This Program is a template for the Header.            /
/                                                                    /
/ ------------------------------------------------------------------ /
/ Modification History                                               /
/ Version  Date         Programmer  Comments                         /
/ ------------------------------------------------------------------ /
/ 1.0      28-AUG-2015  Srinivas    Initial Version                  /
/                                                                    /
/********************************************************************/

• Comments: A well-documented program becomes a reusable asset for the company. Include comments that explain what each step or module does. When we revisit a program written some time ago, these comments help us follow the flow of the logic.

/* This DATA step reads the data from DATALINES */
data test1 ;
  input stuid stuname $ math phy chem ;
  datalines;
1 Venkat 98 90 92
2 David 99 89 87
3 Karthik 90 90 94
4 Todd 92 96 81
;
run;

/* This DATA step calculates the total and the percentage of the maximum 300 marks */
data test2 ;
  set test1 ;
  total = sum(math, phy, chem) ;
  percnt = total / 3 ;
run;

• Indentation: Indentation makes a program clearer, especially when it contains multi-level IF-ELSE conditions or DO loops. Indent each block with either a tab or two to three spaces.

data zipcodes ;
  set addresses ;  /* hypothetical input dataset containing STATE and CITY */

  if state = "NJ" then do ;
    if city = "Bridgewater" then do ;
      zipcode = "08807" ;
    end;
    else if city = "Edison" then do ;
      zipcode = "08820" ;
    end;
  end;
  else if state = "MS" then do ;
    if city = "Gulfport" then do ;
      zipcode = "39507" ;
    end;
    else if city = "Brandon" then do ;
      zipcode = "39047" ;
    end;
  end;
run;

• Leave one or two blank lines between DATA and PROC steps.


• Consistent casing for keywords: SAS is a case-insensitive language, but it is good practice to use the same casing throughout the program.

/* Consistent casing for Keywords */

* Example (Original): DATA / Set / run ;
DATA student ;
  Set student Employee ;
run;

* Example (Alternative 1): data / set / run ;
data student ;
  set student employee ;
run;

* Example (Alternative 2): DATA / SET / RUN ;
DATA student ;
  SET student employee ;
RUN;

• camelCase identifiers: Use camelCase notation for user-defined names of datasets, macros, and variables. For example, the macro name "foreachcycle" can be written as "forEachCycle", and the dataset name "adverseevents" as "adverseEvents".

• Avoid clever code. Write programs as simply as possible, so that even a beginner can understand them.

• In interactive SAS, place the following code at the beginning of the program:

o To clear the contents of the Log and Output windows before the program is submitted:

dm "log; clear; lst; clear;" ;

o To delete all temporary datasets in the WORK library:

proc datasets lib = work memtype = data kill nolist nowarn ;
quit;

• Avoid overwriting WORK datasets. Save each updated dataset under a new name. This helps when you debug the program for logical or data errors.

• Wherever needed, write messages to the SAS log with PUT and %PUT statements.
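The following minimal sketch shows both statements on a hypothetical vital signs dataset. PUT reports a count from inside the DATA step. %PUT reports macro-level information outside any step; here it prints the automatic macro variable SYSLAST, which holds the name of the last dataset created.

```sas
/* Hypothetical vital signs data; one blood pressure value is missing */
data work.vs ;
  input usubjid $ sysbp ;
  datalines ;
001 120
002 .
003 135
;
run ;

data work.vsChecked ;
  set work.vs end = last ;
  /* Sum statement: NMISS starts at 0 and is retained across observations */
  if missing(sysbp) then nMiss + 1 ;
  /* PUT writes to the log once, after the last observation */
  if last then put "NOTE: Records with missing SYSBP: " nMiss= ;
run ;

/* %PUT works outside any step; SYSLAST holds the last dataset created */
%put NOTE: Last dataset created: &syslast ;
```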

• When working with macros, use the MPRINT, MLOGIC, and SYMBOLGEN system options. They write messages to the SAS log that help you understand how the macro executes.
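The sketch below turns the options on around a macro call and off again afterwards. The macro %peek is hypothetical. SASHELP.CLASS is the sample dataset shipped with SAS.

```sas
/* Turn on macro debugging messages in the log */
options mprint mlogic symbolgen ;

/* Hypothetical macro: print the first N observations of a dataset */
%macro peek(data =, n = 5) ;
  %if %sysfunc(exist(&data)) %then %do ;
    proc print data = &data (obs = &n) ;
    run ;
  %end ;
  %else %put WARNING: Dataset &data does not exist. ;
%mend peek ;

%peek(data = sashelp.class, n = 3)

/* Turn the options off again once debugging is finished */
options nomprint nomlogic nosymbolgen ;
```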

• Use the SOURCE2 option on the %INCLUDE statement. It writes the included code to the SAS log.
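Here is a self-contained sketch. It writes a small program to a temporary file under an arbitrary fileref, INC, and then includes that file with SOURCE2, so the included statements appear in the log.

```sas
/* Write a small program to a temporary file so the sketch is self-contained */
filename inc temp ;

data _null_ ;
  file inc ;
  put "data work.fromInclude ;" ;
  put "  x = 1 ;" ;
  put "run ;" ;
run ;

/* SOURCE2 echoes each included statement in the SAS log */
%include inc / source2 ;
```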

• Use consistent path names in LIBNAME statements. For example:

libname sdtm "c:\study-001\analysis\sdtm" access=readonly ;
libname adam "c:\study-001\analysis\adam" access=readonly ;
libname raw  "c:\study-001\databases\oracle\raw" access=readonly ;

• Always use the DATA= option in a PROC step, so that it is clear which dataset the procedure reads.

• Always use the OUT= option in PROC SORT, so that the source dataset stays unchanged.
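The sketch below applies both options to a hypothetical demographics dataset. OUT= leaves WORK.DM in its original order, and DATA= makes the input of each procedure explicit.

```sas
/* Hypothetical unsorted demographics data */
data work.dm ;
  input usubjid $ age ;
  datalines ;
003 45
001 62
002 38
;
run ;

/* OUT= writes the sorted copy to a new dataset; WORK.DM keeps its original order */
proc sort data = work.dm out = work.dmSorted ;
  by usubjid ;
run ;

/* DATA= states explicitly which dataset the procedure reads */
proc print data = work.dmSorted ;
run ;
```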

• Always use a LENGTH statement to declare the length of character variables explicitly and avoid possible truncation of data.
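This sketch, using hypothetical scores, shows the truncation that the LENGTH statement prevents. Without the LENGTH statement, SAS would take the length of GRADE from its first assignment ("Low", 3 characters) and would store "Medium" as "Med".

```sas
data work.grades ;
  /* Declare the length before the first assignment to GRADE */
  length grade $6 ;
  input stuid score ;
  if score < 50 then grade = "Low" ;
  else if score < 80 then grade = "Medium" ;
  else grade = "High" ;
  datalines ;
1 40
2 65
3 90
;
run ;
```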

• Use parentheses in arithmetic expressions to make the computations easier to follow, and put spaces around the operators and variables.

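data _null_ ;
  a = 1 ;  b = -3 ;  c = 2 ;   /* coefficients of x**2 - 3x + 2 = 0 */
  root1 = (-b + sqrt((b * b) - (4 * a * c))) / (2 * a) ;
  put root1= ;
run;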

• In macro definitions, prefer keyword parameters to positional parameters.
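The sketch below contrasts the two styles using two hypothetical macros, %freqPos and %freqKey. The keyword version accepts its arguments in any order, documents itself at the call site, and supplies a default value for OUT=.

```sas
/* Positional parameters: the meaning depends on the order of the arguments */
%macro freqPos(data, var) ;
  proc freq data = &data ;
    tables &var / out = work.freqOut ;
  run ;
%mend freqPos ;

/* Keyword parameters: self-documenting, any order, and defaults are allowed */
%macro freqKey(data =, var =, out = work.freqOut) ;
  proc freq data = &data ;
    tables &var / out = &out ;
  run ;
%mend freqKey ;

%freqPos(sashelp.class, sex)
%freqKey(var = sex, data = sashelp.class)
```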

• When producing reports with PROC REPORT, make sure the line size does not exceed the screen size.

• Avoid placing multiple SAS statements on a single line. One exception is initializing variables when FIRST.variable is true at the start of a BY group.
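Here is a sketch of the accepted exception, using hypothetical AE records that are already sorted by USUBJID. The reset at the start of each BY group fits naturally on one line, and every other statement gets its own line.

```sas
/* Hypothetical AE records, already sorted by USUBJID */
data work.ae ;
  input usubjid $ aeterm :$20. ;
  datalines ;
001 Headache
001 Nausea
002 Rash
;
run ;

data work.aeSummary (keep = usubjid aeNum termList) ;
  set work.ae ;
  by usubjid ;
  length termList $200 ;
  retain termList ;
  /* Accepted exception: reset the BY-group variables on one line */
  if first.usubjid then do ; aeNum = 0 ; termList = "" ; end ;
  aeNum + 1 ;
  termList = catx(", ", termList, aeterm) ;
  if last.usubjid then output ;
run ;
```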

• Keep only the required variables and observations in the dataset being used. This saves both space and processing time.

• Use the single-hyphen (-) and double-hyphen (--) variable-list shortcuts instead of listing every variable.

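• The variable names TEMP01, TEMP02, TEMP03, …, TEMP11 are better than TEMP1, TEMP2, TEMP3, …, TEMP11. The zero-padded names keep their numeric order when the variables are listed alphabetically; without padding, TEMP10 and TEMP11 appear between TEMP1 and TEMP2.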
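Here is a sketch of both shortcuts on the student scores used earlier. The double hyphen selects variables by their position in the dataset. The single hyphen expands a numbered range such as TEMP01 - TEMP03. The TEMP variables in this sketch are hypothetical intermediate fractions.

```sas
data work.scores ;
  input stuid math phy chem ;
  /* Double hyphen: all variables from MATH through CHEM in dataset order */
  total = sum(of math -- chem) ;
  /* Hypothetical intermediate fractions with zero-padded names */
  temp01 = math / 100 ;
  temp02 = phy / 100 ;
  temp03 = chem / 100 ;
  /* Single hyphen: the numbered range TEMP01, TEMP02, TEMP03 */
  avgFraction = mean(of temp01 - temp03) ;
  datalines ;
1 98 90 92
2 99 89 87
;
run ;
```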

• Put the macro name on the %MEND statement, so that it is clear which macro the statement closes.

• Avoid nested macros. If they are unavoidable, use the MPRINTNEST option.

• Remove dead (non-functional) code from your program.

data test ;
  x = 1 ;
  y = 2 ;
  output ;
  put "This message is printed" ;
  stop ;

  /* The code below is dead code: it is never executed */
  put "This message is not printed" ;
run;

• If the program contains any debugging code, place it at the end to keep it separate from the main logic.

• If the statistician makes a suggestion that is not in the specification document, record it in the program with a comment.

data cm ;
  set raw.cm ;
  /* As per the statistician's (Mr. ABC) suggestion on 31-Aug-2015,
     the units "mycrogram" and "mycrogramm" need to be mapped to
     "ug", since "ug" is present in the CDISC Controlled Terminology
  */
  if cmdosu in ("mycrogram", "mycrogramm") then cmdosu = "ug" ;
run;

EFFICIENCY

• To subset datasets, use a WHERE condition instead of a subsetting IF. A WHERE condition selects observations before they are read into the program data vector. A subsetting IF is evaluated only after each observation has been read, just before it is written to the output dataset.
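The sketch below uses SASHELP.CLASS. Both steps produce the same nine female records. The log for the WHERE version, however, reports that only the matching observations were read.

```sas
/* WHERE: only matching observations are read into the program data vector */
data work.femalesWhere ;
  set sashelp.class ;
  where sex = "F" ;
run ;

/* Subsetting IF: every observation is read, then non-matching ones are discarded */
data work.femalesIf ;
  set sashelp.class ;
  if sex = "F" ;
run ;
```

A WHERE condition can only refer to variables that already exist in the input dataset. A condition on a variable computed within the same step still needs a subsetting IF.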

• When writing an IF-ELSE block, test the most probable condition first, the second most probable condition next, and so on. This reduces the total number of comparisons.

• Keep only the observations and variables that processing actually needs. Drop the unused ones as early in the program as possible to save memory and processing time. This improves both efficiency and readability.

• When you need several IF statements for mutually exclusive conditions, use an IF-ELSE chain instead. Once one condition is true, the remaining conditions are not evaluated.
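The sketch below orders an IF-ELSE chain by probability on hypothetical severity data, assuming that MILD is the most frequent value. As soon as one condition is true, the remaining ELSE IF tests are skipped, and the final ELSE catches any unexpected value.

```sas
/* Hypothetical AE severity data: most records are MILD, few are SEVERE */
data work.aesev ;
  input aesev $ ;
  datalines ;
MILD
MILD
MODERATE
MILD
SEVERE
;
run ;

data work.aesevN ;
  set work.aesev ;
  /* Mutually exclusive conditions, most frequent first */
  if aesev = "MILD" then aesevN = 1 ;
  else if aesev = "MODERATE" then aesevN = 2 ;
  else if aesev = "SEVERE" then aesevN = 3 ;
  else aesevN = . ;
run ;
```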

• Use newer, more suitable procedures wherever possible. For example, forest plots are much easier to create with PROC TEMPLATE and PROC SGRENDER than with PROC GPLOT and the Annotate facility.

• When concatenating two datasets, use PROC APPEND instead of the SET statement. PROC APPEND reads only the observations of the dataset being appended and adds them to the end of the base dataset. The SET statement reads the observations of both datasets and writes all of them to a new output dataset.
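Here is a sketch with two hypothetical lab datasets. PROC APPEND reads only WORK.LABNEW and adds its records to the end of WORK.LABALL. If the base dataset does not exist yet, PROC APPEND creates it.

```sas
/* Hypothetical base dataset and a new batch of records */
data work.labAll ;
  input usubjid $ visit ;
  datalines ;
001 1
002 1
;
run ;

data work.labNew ;
  input usubjid $ visit ;
  datalines ;
001 2
;
run ;

/* Only WORK.LABNEW is read; its records are added to the end of WORK.LABALL */
proc append base = work.labAll data = work.labNew ;
run ;
```

If the DATA= dataset contains variables that the BASE= dataset lacks, or longer character lengths, PROC APPEND does not append unless the FORCE option is specified. With FORCE, the extra variables are dropped and longer values are truncated, so check the log.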

• Use hash objects or indexes wherever possible for faster access to the data.
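The sketch below uses a hash lookup to attach demographic variables to AE records without sorting either dataset. The dataset and variable names are hypothetical. The methods used (DEFINEKEY, DEFINEDATA, DEFINEDONE, FIND) belong to the standard DATA step hash object.

```sas
data work.demog ;
  input usubjid $ age sex $ ;
  datalines ;
001 62 M
002 38 F
;
run ;

data work.ae ;
  input usubjid $ aeterm :$20. ;
  datalines ;
001 Headache
002 Rash
003 Nausea
;
run ;

data work.aeDemog ;
  /* Define AGE and SEX in the PDV without reading any observation */
  if 0 then set work.demog ;
  /* Load the lookup table into memory once */
  if _n_ = 1 then do ;
    declare hash dm(dataset: "work.demog") ;
    dm.defineKey("usubjid") ;
    dm.defineData("age", "sex") ;
    dm.defineDone() ;
  end ;
  set work.ae ;
  /* FIND() copies AGE and SEX into the PDV when the key exists */
  if dm.find() ne 0 then call missing(age, sex) ;
run ;
```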

• In PROC MEANS, use a CLASS statement instead of a BY statement. CLASS does not require the source dataset to be sorted, so it avoids a PROC SORT.
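Here are the two approaches applied to SASHELP.CLASS. The NWAY option limits the CLASS output to the full SEX grouping, so both output datasets hold one row per sex with the same means. Only the BY version needs the PROC SORT.

```sas
/* CLASS: no sorting needed */
proc means data = sashelp.class noprint nway ;
  class sex ;
  var height ;
  output out = work.htByClass mean = meanHeight ;
run ;

/* BY: the input must first be sorted by SEX */
proc sort data = sashelp.class out = work.classSorted ;
  by sex ;
run ;

proc means data = work.classSorted noprint ;
  by sex ;
  var height ;
  output out = work.htByBy mean = meanHeight ;
run ;
```

With very large data and many distinct CLASS values, CLASS processing holds all groups in memory. In that case, BY processing on data that is already sorted can still be the better choice.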

REUSABILITY

• Whenever the same logic is repeated in several places, turn it into a macro and save it in a central location. Train the programmers in your company to use this macro.

• If some logic seems suitable for a macro, consider creating a user-defined function for it with PROC FCMP instead.
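Here is a sketch of a small PROC FCMP function. The function name CALCBMI, the package name, and the test data are hypothetical. The CMPLIB= system option tells SAS where to look for the compiled function.

```sas
/* Define the function once (hypothetical name CALCBMI) */
proc fcmp outlib = work.funcs.clinical ;
  function calcBmi(weightKg, heightCm) ;
    /* Return missing instead of dividing by zero or by a missing height */
    if missing(heightCm) or heightCm = 0 then return(.) ;
    return(weightKg / (heightCm / 100) ** 2) ;
  endsub ;
run ;

/* Tell SAS where to find the function */
options cmplib = work.funcs ;

data work.vsBmi ;
  input usubjid $ weight height ;
  bmi = calcBmi(weight, height) ;
  datalines ;
001 70 175
002 80 .
;
run ;
```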

• Most companies have a set of validated standard TLF macros. Learn them and use them in your programs instead of reinventing the wheel.

• Use standard macros as much as possible in your programs.

• Efficacy outputs often generate intermediate efficacy datasets that are not part of the analysis dataset specifications. Save these datasets in a permanent location, because they may be useful in the future.

ROBUSTNESS

• Avoid platform dependencies in your programs, and make the programs as general as possible.

• Avoid hardcoding in your programs. Read source values from other SAS datasets or from Excel spreadsheets instead.

• Use macro parameters as much as possible.

• Consider every possibility in the data, and take measures that keep the program from crashing or halting. This technique is called "defensive programming".

• Print messages in the SAS log if any unexpected data is encountered.
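The sketch below reports unexpected values in the log instead of letting them pass silently. It assumes hypothetical VSPOS codes, and the third record uses a value that the mapping does not expect.

```sas
/* Hypothetical vital signs positions; LYING is not in the expected list */
data work.vsPos ;
  input usubjid $ vspos $ ;
  datalines ;
001 SITTING
002 STANDING
003 LYING
;
run ;

data work.vsPosMapped ;
  set work.vsPos ;
  if vspos = "SITTING" then vsposn = 1 ;
  else if vspos = "STANDING" then vsposn = 2 ;
  else if vspos = "SUPINE" then vsposn = 3 ;
  /* Anything else is unexpected: report it in the log */
  else put "WARNING: Unexpected VSPOS value " usubjid= vspos= ;
run ;
```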

• Use PROC COPY with the XPORT engine instead of PROC CPORT and PROC CIMPORT. PROC COPY with the XPORT engine creates platform-independent SAS Version 5 transport (XPT) files, the format recommended in the clinical trials industry for FDA submissions. Transport files created with CPORT can only be read with CIMPORT. Also, a CPORT file created on a computer with a higher SAS version cannot be read with CIMPORT on a computer with a lower SAS version.
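The following sketch creates a transport file with the XPORT engine and PROC COPY. The dataset is hypothetical. The file goes to the WORK directory only so that the sketch runs anywhere; a real program would write to the submission folder.

```sas
/* Hypothetical dataset to deliver as a transport file */
data work.dm ;
  input usubjid $ age ;
  datalines ;
001 62
002 38
;
run ;

/* The XPORT engine writes a SAS Version 5 transport file.
   The WORK directory is used only so that the sketch runs anywhere. */
libname xptOut xport "%sysfunc(pathname(work))/dm.xpt" ;

proc copy in = work out = xptOut memtype = data ;
  select dm ;
run ;

/* Release the transport file */
libname xptOut clear ;
```

Version 5 transport files limit variable names to 8 characters, labels to 40 characters, and character values to 200 bytes. Check these limits before copying.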

• Always write code that handles missing and unexpected values, using either an ELSE block or the OTHER option in PROC FORMAT.
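The sketch below defines a format whose OTHER range catches every value that is not listed, including a blank (missing) code. The format name $SEXF and the data are hypothetical.

```sas
proc format ;
  /* OTHER catches every unlisted value, including blank (missing) codes */
  value $sexf
    "M"   = "Male"
    "F"   = "Female"
    other = "Unknown" ;
run ;

data work.dmFmt ;
  input usubjid $ sex $ ;
  sexDecode = put(sex, $sexf.) ;
  datalines ;
001 M
002 F
003 .
;
run ;
```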

 .. . . . .
In SAS, there are 28 different types of missing values. They are: , _, A, B, C, …, Z. So while

considering missing values in your program, try to use .Z instead of ..
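The sketch below assumes hypothetical results in which A and Z stand for special missing values; the MISSING statement tells INPUT to read them as .A and .Z. RESULT = . misses the special missing values, whereas RESULT <= .Z and the MISSING function catch all of them.

```sas
/* Read the letters A and Z in the data as the special missing values .A and .Z */
missing A Z ;

data work.missDemo ;
  input id result ;
  flagEq = (result = .) ;    /* true only for the ordinary missing value */
  flagLe = (result <= .Z) ;  /* true for all 28 kinds of missing value */
  flagFn = missing(result) ; /* the MISSING function also catches all of them */
  datalines ;
1 12.5
2 .
3 A
4 Z
;
run ;
```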

• Avoid automatic numeric-to-character and character-to-numeric conversions, which SAS reports as NOTEs in the log. Convert explicitly with the PUT and INPUT functions instead.
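This sketch shows explicit conversions on hypothetical values. Writing AGEN = AGEC + 0 would also produce a number, but SAS would convert implicitly and write a conversion NOTE to the log. The INPUT and PUT functions make each conversion deliberate and keep the log clean.

```sas
data work.conv ;
  length ageC $3 visitC $12 ;
  ageC = "45" ;
  visitNum = 3 ;
  /* INPUT converts character to numeric using an explicit informat */
  ageN = input(ageC, 8.) ;
  /* PUT converts numeric to character; STRIP removes the leading blanks */
  visitC = strip(put(visitNum, best12.)) ;
run ;
```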

GENERAL TIPS FOR DAY-TO-DAY PROGRAMMING ACTIVITIES:

This section offers a few tips that SAS programmers can use in their day-to-day activities.

• Write your program incrementally. When you are asked to create a dataset or table, start with the basics: get the required datasets first, then derive one variable at a time. This keeps you from being overwhelmed by the complexity of the dataset logic.

• Whenever you submit a SAS program, check the SAS log from top to bottom for ERRORs, WARNINGs, and NOTEs.

• When you work on a dataset with a large number of observations, such as a lab dataset, start with a few observations and derive all the variables for them. Then apply the same logic to all the observations.
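Here is a sketch of developing on a subset. It uses a hypothetical macro variable, DEVOBS, to limit the input through the OBS= dataset option. The generated dataset stands in for a large lab dataset. For the full run, set DEVOBS to MAX.

```sas
/* Stand-in for a large lab dataset (hypothetical values) */
data work.lb ;
  do usubjid = 1 to 10000 ;
    lbstresn = mod(usubjid, 97) ;
    output ;
  end ;
run ;

/* Hypothetical switch: 50 while developing, MAX for the final run */
%let devObs = 50 ;

data work.lbDerived ;
  set work.lb (obs = &devObs) ;
  lbHigh = (lbstresn > 90) ;
run ;
```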

• Always keep a backup of your program. We never know when our systems will go down, so a backup copy is a safeguard.

• Use version control tools such as Visual SlickEdit, Subversion, CVS, or ClearCase. If your company does not allow these tools, save the versions manually in your home directory.

• Use defensive programming techniques to handle every possible scenario, especially scenarios such as division by zero.
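Here is a sketch of a guarded division on hypothetical data. Without the check, SAS would set the result to missing and write a division-by-zero NOTE to the log. The check makes the handling explicit and records which subjects were affected.

```sas
data work.ratio ;
  input usubjid $ numer denom ;
  /* Divide only when the denominator is present and non-zero */
  if not missing(denom) and denom ne 0 then ratio = numer / denom ;
  else put "NOTE: RATIO not derived for " usubjid= denom= ;
  datalines ;
001 10 4
002 5 0
003 7 .
;
run ;
```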

• When you reference a SAS library, use the ACCESS=READONLY option unless you need to write datasets to it. This prevents existing datasets from being overwritten.

• Make your final programs, log files, LST files, and datasets read-only, so that they are not overwritten by accident.

CODE REFACTORING

Code refactoring is the process of restructuring existing computer code without changing its external behavior. Refactoring improves the nonfunctional attributes of the software; its advantages include improved readability and reduced complexity.

In the clinical SAS industry, we write many macros and programs for day-to-day activities. While writing them, our goal is to make them produce the desired output or dataset, and we pay little attention to the four criteria discussed above. However, we can always return to those programs later and improve them against the four criteria without changing their external behavior.

CONCLUSION

Following some or all of the standards and conventions discussed above makes your day-to-day work easier and more enjoyable. The list is not exhaustive; add the practices you encounter in your own day-to-day work.

ACKNOWLEDGEMENTS

Percept Pharma Services, Bridgewater, NJ.

CONTACT INFORMATION

Your comments and suggestions are valued and encouraged. Please contact the authors at:

Srinivas Vanam srinivasvanam@gmail.com

Manvitha Yennam ymanvitha@gmail.com

Phaneendhar Vanam phaneendhar.v@gmail.com

TRADEMARK INFORMATION

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
