Understanding SAS/Warehouse Administrator: Michael Davis, Bassett Consulting Services, North Haven, Connecticut
Understanding SAS/Warehouse Administrator: Michael Davis, Bassett Consulting Services, North Haven, Connecticut
                                                        1
    effort to correct this and other logical                     the same data sources, performance of the
    inconsistencies should be mounted. Edit-check                transaction systems can suffer.      A better
    programs can identify elements that require                  solution may be to create or update data
    cleansing and can perhaps correct some errors                warehouse and mart tables from the operational
    without manual intervention.                                 systems during off-peak hours.
•   Standardize physical file structures. Some data          •   Save human resources. In organizations where
    may be in flat files, some in SAS data sets, and             data warehousing is not well organized, multiple
    others in third-party data base management                   persons often duplicate efforts to transform
    systems (DBMSs) such as Oracle, Microsoft                  operational data for reporting. Some of these
    SQL Server, and DB2. To facilitate reporting,               persons may not have the appropriate skills or
    the data should be transformed into tables of                tools to perform this task. One of the benefits to
    same physical format. If the reporting is to be              data warehousing is to save human effort and
    done with SAS tools, then the data should be                 costs in creating and maintaining data
    stored in SAS tables, MDDB cubes, or views                   warehouses and data marts.
    created by SAS/ACCESS.
•   Document data repository.        Much effort is          •   changes are automatically posted to generated
    consumed in answering questions such as                      SAS code
    “Where does that number come from?” and
    “What does that code represent?” One common              •   process flowcharts can be generated
    benefit of data warehousing is that the process
    and resulting data stores of the data warehouse          •   HTML document can be generated and posted
    and marts are documented for the benefit of the              to a web server
    potential users.
                                                             •   Dependent job scheduling and load-sharing can
•   Improve performance of operational systems.                  be accomplished via the LSF JobScheduler
    Transaction systems are often designed to give
    best performance when a few records are to be            •   Process libraries and other features are
    retrieved.     By contrast, decision support                 available to structure the warehousing process
    applications typically read entire tables. When
    transaction and decision support systems share
                                                         2
WHAT IS METADATA AND WHY IS IT NEEDED ?                        Another feature of the Data Warehousing
                                                               Community section of the SAS web site that is well
Metadata is information that defines sources, data             worth exploring is the “Getting Started with
stores, code libraries, and other resources. It is             SAS/Warehouse Administrator”. Also supplied with
used to write the actual SAS code. Technical                   the software, it can be found at:
metadata defines where the data lives and how to
access it. Business metadata defines what the data             http://www.sas.com/service/tutorials/v8/warehous/index.html
means and who is responsible for it.
                                                               This tutorial is invaluable to those attempting to set
Perhaps     the    major  advantage    of    using             up    their    first   project in SAS/Warehouse
SAS/Warehouse Administrator is that it facilitates             Administrator.
the creation and maintenance of metadata.
Consider the following example. It is decided to
change the logic used to transform a column of                 STARTING SAS/WAREHOUSE ADMINISTRATOR
intermediate information? How do we find all of the
places affected by the change and make sure that               The     current     version     of    SAS/Warehouse
they use the new logic?                                        Administrator is run within a SAS session as a SAS
                                                               desktop application. The SAS desktop is a graphical
Without a tool such as SAS/Warehouse                           interface to tools and files. It is supplied with Base
Administrator, making changes to an existing data              SAS. However, the SAS desktop is more commonly
warehouse or mart can be a nightmare. Metadata                 used to access features of SAS/EIS and other SAS
gives us a single point of control, even when                  products.
warehousing occurs across multiple computer
platforms.                                                     To start SAS/Warehouse Administrator, one can
                                                               select -> Solutions -> Development and
SAS/Warehouse Administrator facilitates changes to             Programming      ->    Warehouse       Administrator.
programs that create and maintain data warehouses              However, the author finds it more convenient to
and marts because it actually generates the SAS                issue the command “DW” from the command bar.
code to be run. It also provides tools to search,              This will open up a window similar to the one shown
report, and document the metadata.              Finally,       in Figure 1:
SAS/Warehouse Administrator can import and
export metadata to other applications. This opens
the possibility of using additional tools to create and
maintain a data warehouse or mart.
                                                           3
organized into Subjects, which may contain Data           The Count Input ODD Group defines all of the input
Groups, Infomarts, and OLAP Groups. Operational           sources to our warehouse. In addition to the files
Data Groups Definition Groups include one or more         received periodically from three countries (France,
Operational Data Definitions. Figure 2 shows the          Italy, and Netherlands), the Call Center History table
hierarchy of groups and data stores when our              also appears. This icon refers to the same physical
“demo” environment is opened:                             table that the Call Center History data table. This is
                                                          because the inputs to warehouse processes must
                                                          be defined as ODDs and output tables must defined
                                                          as data tables.
GLOBAL METADATA
Figure 2
In our demo environment, we see the HASUG demo            One can create, examine update, and delete
warehouse environment icon at the top of the              different categories of metadata by selecting a radio
hierarchy.   Next in the hierarchy, we see                station in the Type window, selecting the metadata
HASUG_demo data warehouse icon and the                    item in the window below it (SAS Libraries as shown
Country Input ODD (Operational Data Definition)           in Figure 3), and clicking on the Add, Edit, or
Group icon.                                               Remove buttons.
Under the HASUG_demo data warehouse icon is               SAS library metadata is largely self-explanatory.
the Call Center Data Group, the Temporary Files           However, one of the author’s tricks is to use
data group, and the MDDBs subject. In Figure 2,           SAS/ACCESS Libname engines to define data
the icons for the history tables and input errors         sources in DBMSs as SAS libraries rather than
tables are shown under the Call Center Data Group.        DBMS Connections. This trick seems to work better
                                                          when the DBMS options are embedded in the Path
Under the MDDBs subject icon, there is a single           text box instead of the Options text box on the
MDDBS OLAP group. Under this group are the                Details tab.
three MDDB OLAP cubes created by the demo
environment.
                                                      4
In our demo, only the computer on which                       OPERATIONAL DATA DEFINITIONS
SAS/Warehouse Administrator is defined as a host
computer. However, in a distributed computing                 Operational Data Definitions are metadata records
environment, remote hosts can be defined to and               that provide the instructions to access data sources.
can be controlled by SAS/Warehouse Administrator.             Figure 4 illustrates an Operational Data Definition
                                                              Properties Window.
All elements of a data warehouse have an owner
and an administrator as attributes. The contact
information for all individuals who serve in these
roles is defined in a single place. This makes
updating this information much more convenient.
The author often defines the Work libref as part of           Two tips can be shared about the Columns tab. It
the metadata so it can be used when defining                  can take a bit of time to key in the required
temporary tables that should disappear when the               information for a new table. If a similar table already
SAS session ends. If MDDBs are to be used                     exists or can be generated by running some legacy
outside of SAS/Warehouse Administrator, such as               SAS code, it is much faster and easier to “import”
with AppDev Studio or WebHound software,                    the required information from that table. Also, after
then it may be useful to define an MDDB libref in the         moving rows up or down with the arrowhead buttons
global metadata and assign it through the                     at the bottom of the window, right-click on a row and
autoexec.sas program.                                         select Save Order to retain the new order after the
                                                              properties window is closed.
                                                          5
At this point, it might be a good idea to define any           Table… This brings up the selector shown in
additional ODDs required. After all the ODDs have              Figure 7. Select the category of output table to be
been defined, the next logical step is usually to              added and click on the Show button to display the
define the required output structures.                         output tables available to be added.
There are multiple ways to bring up the Process                Next, for each output table, define the input data
Editor. One method is to select Tools -> Process               source(s). This is done from a selector similar to the
Editor from the pull-down menus. Figure 6 shows a              one shown in Figure 7. Again the same physical
sample Process Editor window.                                  table can be an output table and an input table
                                                               within the same process flow.
MAPPING STEPS
                                                           6
generated by SAS/Warehouse Administrator or is             Mapping can be either 1 to 1 (1:1) or derived. If any
written by the user.                                       of the column names are shared between the input
                                                           and output tables, clicking on the button labeled
                                                           “1 to 1 Mappings…” automatically sets mapping
                                                           relationships for those columns whose names
                                                           match.
Figure 8
Figure 10
One advanced use of the Output Data tab is to              This expression replaces the value in the master
specify which rows are written to each of multiple         (history) table for city with the value of city in the
output tables. This is done by first selecting the         transaction (country) table only when city is a non-
appropriate target table by clicking on the down           missing value in the transaction table (and when the
arrow. Then while the desired target table is              WHERE keys match).
displayed, click on the Generation Options button.
Then on the Row Selection tab, specify “Row                As noted earlier, one of the advantages of using
Selection Conditions” or “User Defined Statements”         SAS/Warehouse Administrator is that many
to direct the rows to be output to the target table.       expressions can be built using a “point and click”
                                                       7
interface in lieu of typing, illustrated by Figure 11 on       the same result could be accomplished by the
the following page.                                            appropriate specifications to the warehouse
                                                               metadata. This temptation should be strenuously
                                                               avoided!
LOAD STEPS
                                                           8
job information library will suffice. However, if            http://www.sas.com/rnd/warehousing/wa/addins.html
multiple hosts run jobs, then SAS/SHARE should be
used.                                                        The list of add-in tools changes periodically and new
                                                             versions of existing tools are often available for
SCHEDULING JOBS                                              download.         The     application   interface   to
                                                             SAS/Warehouse Administrator is documented so
SAS/Warehouse Administrator can natively send                one can create their own add-in tools if they can
jobs to CRON (Unix hosts) and AT (Windows hosts).            code in SAS Component Language (SCL).
The null scheduler generates a “stub” file that
external job schedulers can read for scheduling              It is the author’s understanding the add-in tools will
information. To use the null scheduler, right-click on       disappear in a future version of SAS/Warehouse
a job and select Properties. On the Date/Time tab,           Administrator although the functionality that they
as illustrated by Figure 14, select when the job is to       provide should remain.
be run. Then on the Server tab, specify the null
scheduler server.
                                                             GENERATING HTML DOCUMENTATION
Figure 14
ADD-IN TOOLS
                                                         9
SEARCHING AND MIGRATING METADATA                             Java interface so it will no longer be necessary to be
                                                             sitting in front of the host computer or to operate it
One of the big advantages of entering all of the             via terminal emulation software. The Java interface
warehouse details as metadata is that one can                will communicate to a metadata repository and
search it. From the pull-down menu, select Tools ->          server.
Search Metadata…        The dialog box similar to
Figure 17 should appear.                                     There will be a one-way conversion tool to migrate
                                                             SAS/Warehouse Administrator metadata into Data
                                                             Builder.      However, existing SAS/Warehouse
                                                             Administrator users can continue to use the product
                                                             as in the past.
CONCLUSION
10