PSPP Dev
Ben Pfaff
John Darrington
This manual is for GNU PSPP version 2.0.0-g5b54d1, software for statistical analysis.
Copyright © 1997, 1998, 2004, 2005, 2007, 2010, 2014, 2015, 2016, 2020 Free Software
Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.3 or any later version
published by the Free Software Foundation; with no Invariant Sections, no
Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included
in the section entitled "GNU Free Documentation License".
Table of Contents
2.2.4 Areas, page 44
2.2.5 Borders, page 44
2.2.6 Print Settings, page 45
2.2.7 Table Settings, page 46
2.2.8 Formats, page 47
2.2.9 Dimensions, page 50
2.2.10 Categories, page 51
2.2.11 Axes, page 52
2.2.12 Cells, page 52
2.2.13 Value, page 53
2.2.14 ValueMod, page 55
2.3 Legacy Detail Member Binary Format, page 56
2.3.1 Metadata, page 57
2.3.2 Numeric Data, page 57
2.3.3 String Data, page 58
2.4 Legacy Detail Member XML Format, page 58
2.4.1 The visualization Element, page 59
2.4.2 Variable Elements, page 61
2.4.2.1 The sourceVariable Element, page 62
2.4.2.2 The derivedVariable Element, page 63
2.4.2.3 The valueMapEntry Element, page 64
2.4.3 The extension Element, page 64
2.4.4 The graph Element, page 65
2.4.5 The location Element, page 65
2.4.6 The faceting Element, page 66
2.4.7 The facetLayout Element, page 68
2.4.8 The label Element, page 69
2.4.9 The setCellProperties Element, page 70
2.4.10 The setFormat Element, page 72
2.4.10.1 The numberFormat Element, page 73
2.4.10.2 The stringFormat Element, page 74
2.4.10.3 The dateTimeFormat Element, page 74
2.4.10.4 The elapsedTimeFormat Element, page 77
2.4.10.5 The format Element, page 77
2.4.10.6 The affix Element, page 78
2.4.11 The interval Element, page 79
2.4.12 The style Element, page 80
2.4.13 The labelFrame Element, page 81
2.4.14 Legacy Properties, page 81
char rec_type[4];
Record type code, either ‘$FL2’ for system files with uncompressed data or data
compressed with simple bytecode compression, or ‘$FL3’ for system files with
ZLIB compressed data.
This is truly a character field that uses the same character encoding as other strings.
Thus, in a file with an ASCII-based character encoding this field contains 24 46
4c 32 or 24 46 4c 33, and in a file with an EBCDIC-based encoding this field
contains 5b c6 d3 f2. (No EBCDIC-based ZLIB-compressed files have been
observed.)
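As an illustration, a reader can classify rec_type by comparing its four bytes against the three byte sequences listed above. The following is a minimal C sketch; the enum and the function classify_rec_type are hypothetical names, not PSPP's own code.

```c
#include <string.h>

/* Possible interpretations of the 4-byte rec_type field. */
enum rec_type_kind
{
  REC_UNKNOWN,       /* Not a recognized system file magic number. */
  REC_ASCII_FL2,     /* "$FL2" in an ASCII-based encoding. */
  REC_ASCII_FL3,     /* "$FL3" in an ASCII-based encoding (ZLIB data). */
  REC_EBCDIC_FL2     /* "$FL2" in an EBCDIC-based encoding. */
};

/* Classifies the first four bytes of a system file. */
static enum rec_type_kind
classify_rec_type (const unsigned char rec_type[4])
{
  static const unsigned char ascii_fl2[4] = { 0x24, 0x46, 0x4c, 0x32 };
  static const unsigned char ascii_fl3[4] = { 0x24, 0x46, 0x4c, 0x33 };
  static const unsigned char ebcdic_fl2[4] = { 0x5b, 0xc6, 0xd3, 0xf2 };

  if (!memcmp (rec_type, ascii_fl2, 4))
    return REC_ASCII_FL2;
  if (!memcmp (rec_type, ascii_fl3, 4))
    return REC_ASCII_FL3;
  if (!memcmp (rec_type, ebcdic_fl2, 4))
    return REC_EBCDIC_FL2;
  return REC_UNKNOWN;
}
```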
char prod_name[60];
Product identification string. This always begins with the characters
‘@(#) SPSS DATA FILE’. PSPP uses the remaining characters to give its
version and the operating system name; for example, ‘GNU pspp 0.1.4 -
sparc-sun-solaris2.5.2’. The string is truncated if it would be longer than
60 characters; otherwise it is padded on the right with spaces.
The product name field allows readers to behave differently based on quirks
in the way that particular software writes system files. See Section 1.4
[Value Labels Records], page 9, for details of the quirk that the PSPP
system file reader tolerates in files written by ReadStat, whose prod_name
contains https://github.com/WizardMac/ReadStat.
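Because prod_name is a fixed 60-byte field with no terminating NUL, it has to be searched with explicit lengths rather than with the C string functions. This sketch checks the required prefix and looks for the ReadStat URL anywhere in the field; prod_name_is_valid, written_by_readstat, and prod_name_contains are hypothetical helper names.

```c
#include <stdbool.h>
#include <string.h>

#define PROD_NAME_LEN 60

/* Returns true if the 60-byte prod_name field, which is not
   NUL-terminated, contains NEEDLE anywhere within it. */
static bool
prod_name_contains (const char prod_name[PROD_NAME_LEN], const char *needle)
{
  size_t n = strlen (needle);
  if (n > PROD_NAME_LEN)
    return false;
  for (size_t i = 0; i + n <= PROD_NAME_LEN; i++)
    if (!memcmp (prod_name + i, needle, n))
      return true;
  return false;
}

/* Returns true if prod_name starts with the required prefix. */
static bool
prod_name_is_valid (const char prod_name[PROD_NAME_LEN])
{
  static const char prefix[] = "@(#) SPSS DATA FILE";
  return !memcmp (prod_name, prefix, sizeof prefix - 1);
}

/* Returns true if the file appears to have been written by ReadStat. */
static bool
written_by_readstat (const char prod_name[PROD_NAME_LEN])
{
  return prod_name_contains (prod_name,
                             "https://github.com/WizardMac/ReadStat");
}
```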
int32 layout_code;
Normally set to 2, although a few system files have been spotted in the wild
with a value of 3 here. PSPP uses this value to determine the file’s integer
endianness (see Chapter 1 [System File Format], page 2).
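A minimal sketch of integer endianness detection from layout_code, assuming a reader that accepts the two observed values (2 and 3) and tries little-endian first. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum endian { ENDIAN_UNKNOWN, ENDIAN_LITTLE, ENDIAN_BIG };

/* Decodes 4 bytes as a little-endian 32-bit integer. */
static uint32_t
load_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Decodes 4 bytes as a big-endian 32-bit integer. */
static uint32_t
load_be32 (const unsigned char *p)
{
  return (uint32_t) p[3] | (uint32_t) p[2] << 8
         | (uint32_t) p[1] << 16 | (uint32_t) p[0] << 24;
}

/* Infers integer byte order from layout_code, accepting 2 or 3. */
static enum endian
detect_int_endianness (const unsigned char layout_code[4])
{
  uint32_t le = load_le32 (layout_code);
  uint32_t be = load_be32 (layout_code);
  if (le == 2 || le == 3)
    return ENDIAN_LITTLE;
  if (be == 2 || be == 3)
    return ENDIAN_BIG;
  return ENDIAN_UNKNOWN;
}
```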
Chapter 1: System File Format 5
int32 nominal_case_size;
Number of data elements per case. This is the number of variables, except that
long string variables add extra data elements (one for every 8 characters after
the first 8). However, string variables do not contribute to this value beyond
the first 255 bytes. Further, some software always writes -1 or 0 in this field.
In general, it is unsafe for systems reading system files to rely upon this value.
int32 compression;
Set to 0 if the data in the file is not compressed, 1 if the data is compressed
with simple bytecode compression, 2 if the data is ZLIB compressed. This field
has value 2 if and only if rec_type is ‘$FL3’.
int32 weight_index;
If one of the variables in the data set is used as a weighting variable, set to
the dictionary index of that variable, plus 1 (see [Dictionary Index], page 6).
Otherwise, set to 0.
int32 ncases;
Set to the number of cases in the file if it is known, or -1 otherwise.
In the general case it is not possible to determine the number of cases that will
be output to a system file at the time that the header is written. This is handled
by writing the entire system file, including the header, then seeking back to the
beginning of the file and writing just the ncases field. For files that do not
support seeking, such as pipes, the seek operation fails; in this case, ncases
remains -1.
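The write-then-patch approach can be sketched in C as follows. The offset 80 is the sum of the sizes of the fields that precede ncases in the header. The sketch assumes the writer stores integers in its native byte order (readers detect that order through layout_code); patch_ncases is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Byte offset of ncases in the file header:
   rec_type (4) + prod_name (60) + layout_code (4)
   + nominal_case_size (4) + compression (4) + weight_index (4). */
#define NCASES_OFFSET 80

/* After the whole file has been written with ncases = -1, tries to
   patch in the real case count.  Returns 0 on success, or -1 if the
   stream cannot seek (e.g. a pipe), in which case ncases stays -1. */
static int
patch_ncases (FILE *f, int32_t ncases)
{
  if (fseek (f, NCASES_OFFSET, SEEK_SET) != 0)
    return -1;
  /* Written in native byte order, like the rest of the header. */
  if (fwrite (&ncases, sizeof ncases, 1, f) != 1)
    return -1;
  return fflush (f) == 0 ? 0 : -1;
}
```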
flt64 bias;
Compression bias, ordinarily set to 100. Only integers between 1 - bias and
251 - bias can be compressed.
By assuming that its value is 100, PSPP uses bias to determine the file’s
floating-point format and endianness (see Chapter 1 [System File Format],
page 2). If the compression bias is not 100, PSPP cannot auto-detect the
floating-point format and assumes that it is IEEE 754 format with the same
endianness as the system file’s integers, which is correct for all known system
files.
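In other words, a value v is compressible with bias b when v is an integer and 1 - b <= v <= 251 - b. Assuming, as in simple bytecode compression, that the stored one-byte code is v + b (so codes run from 1 to 251), the check can be sketched as follows; compress_value is a hypothetical name.

```c
#include <stdbool.h>

/* Returns true, storing the 1-byte code in *CODE, if V can be
   represented in simple bytecode compression with BIAS.  Assumes
   BIAS is a small integer value such as 100. */
static bool
compress_value (double v, double bias, unsigned char *code)
{
  /* Written as a negated range test so that NaN is rejected too. */
  if (!(v >= 1 - bias && v <= 251 - bias))
    return false;
  /* V is now in a small range, so converting to long is safe. */
  if (v != (double) (long) v)
    return false;                 /* Not an integer. */
  *code = (unsigned char) (v + bias);
  return true;
}
```

With the usual bias of 100, this accepts the integers -99 through 151 and maps them to codes 1 through 251.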
char creation_date[9];
Date of creation of the system file, in ‘dd mmm yy’ format, with the month given
as a standard English abbreviation: an initial capital letter followed by lowercase
letters. If the date is not available then this field is arbitrarily set to ‘01 Jan
70’.
char creation_time[8];
Time of creation of the system file, in ‘hh:mm:ss’ format and using 24-hour
time. If the time is not available then this field is arbitrarily set to ‘00:00:00’.
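Generating these two fields with strftime's %b would depend on the current locale, so this sketch uses a fixed table of English month abbreviations instead. The fields are exactly 9 and 8 bytes long with no terminating NUL. format_creation_stamp is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Fills the 9-byte creation_date ("dd mmm yy") and 8-byte
   creation_time ("hh:mm:ss") fields from TM.  Neither field is
   NUL-terminated in the file. */
static void
format_creation_stamp (const struct tm *tm,
                       char creation_date[9], char creation_time[8])
{
  /* Fixed English abbreviations, independent of the C locale. */
  static const char *const months[12] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
  };
  char buf[32];

  snprintf (buf, sizeof buf, "%02d %s %02d",
            tm->tm_mday, months[tm->tm_mon], tm->tm_year % 100);
  memcpy (creation_date, buf, 9);

  snprintf (buf, sizeof buf, "%02d:%02d:%02d",
            tm->tm_hour, tm->tm_min, tm->tm_sec);
  memcpy (creation_time, buf, 8);
}
```

Passing a struct tm with tm_mday = 1, tm_year = 70, and all other fields zero produces the documented fallback values ‘01 Jan 70’ and ‘00:00:00’.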
char file_label[64];
File label declared by the user, if any (see Section “FILE LABEL” in PSPP
Users Guide). Padded on the right with spaces.
A product that identifies itself as VOXCO INTERVIEWER 4.3 uses CR-only line
ends in this field, rather than the more usual LF-only or CR LF line ends.
char padding[3];
Ignored padding bytes to make the structure a multiple of 32 bits in length.
Set to zeros.
int32 n_missing_values;
If the variable has no missing values, set to 0. If the variable has one, two, or
three discrete missing values, set to 1, 2, or 3, respectively. If the variable has
a range of missing values, set to -2; if the variable has a range of missing
values plus a single discrete value, set to -3.
A long string variable always has the value 0 here. A separate record indicates
missing values for long string variables (see Section 1.16 [Long String Missing
Values Record], page 21).
int32 print;
Print format for this variable. See below.
int32 write;
Write format for this variable. See below.
char name[8];
Variable name. The variable name must begin with a capital letter or the at-
sign (‘@’). Subsequent characters may also be digits, octothorpes (‘#’), dollar
signs (‘$’), underscores (‘_’), or full stops (‘.’). The variable name is padded
on the right with spaces.
The ‘name’ fields should be unique within a system file. System files written
by SPSS that contain very long string variables with similar names sometimes
contain duplicate names that are later eliminated by resolving the very long
string names (see Section 1.13 [Very Long String Record], page 17). PSPP
handles duplicates by assigning them new, unique names.
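The character rules above translate directly into a check on the 8-byte field. This sketch assumes an ASCII-based encoding (it would reject valid EBCDIC names) and reads "Subsequent characters may also be..." as allowing capital letters and ‘@’ after the first position too; is_valid_short_name is a hypothetical name.

```c
#include <stdbool.h>

/* Checks the 8-byte name field of a variable record: the first
   character must be an uppercase letter or '@'; later characters may
   also be digits, '#', '$', '_', or '.'; trailing spaces are padding. */
static bool
is_valid_short_name (const char name[8])
{
  int len = 8;
  while (len > 0 && name[len - 1] == ' ')
    len--;                        /* Strip right padding. */
  if (len == 0)
    return false;

  for (int i = 0; i < len; i++)
    {
      char c = name[i];
      bool ok = (c >= 'A' && c <= 'Z') || c == '@';
      if (i > 0)
        ok = ok || (c >= '0' && c <= '9')
             || c == '#' || c == '$' || c == '_' || c == '.';
      if (!ok)
        return false;
    }
  return true;
}
```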
int32 label_len;
This field is present only if has_var_label is set to 1. It is set to the length, in
characters, of the variable label. The documented maximum length varies from
120 to 255 based on SPSS version, but some files have been seen with longer
labels. PSPP accepts labels of any length.
char label[];
This field is present only if has_var_label is set to 1. It has length label_len,
rounded up to the nearest multiple of 32 bits. The first label_len characters
are the variable’s variable label.
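Rounding label_len up to a multiple of 32 bits (4 bytes) tells a reader how many bytes to skip to reach the next field. A one-line sketch; padded_label_size is a hypothetical name.

```c
#include <stdint.h>

/* Number of bytes occupied by the label field for a label of
   LABEL_LEN characters: LABEL_LEN rounded up to a multiple of 4. */
static uint32_t
padded_label_size (uint32_t label_len)
{
  return (label_len + 3) & ~(uint32_t) 3;
}
```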
flt64 missing_values[];
This field is present only if n_missing_values is nonzero. It has the same
number of 8-byte elements as the absolute value of n_missing_values. Each
element is interpreted as a number for numeric variables (with HIGHEST and
LOWEST indicated as described in the chapter introduction). For string variables
of width less than 8 bytes, elements are right-padded with spaces; for
string variables wider than 8 bytes, only the first 8 bytes of each missing value
are specified, with the remainder implicitly all spaces.
For discrete missing values, each element represents one missing value. When a
range is present, the first element denotes the minimum value in the range, and
the second element denotes the maximum value in the range. When a range plus
a value are present, the third element denotes the additional discrete missing
value.
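Putting n_missing_values and missing_values together, a reader can test whether a numeric value is user-missing as in the sketch below. It covers numeric variables only, compares values directly, and treats an unexpected n_missing_values as "no missing values"; those choices, and the name is_user_missing, are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if numeric value X is user-missing according to a
   variable record's n_missing_values and missing_values fields. */
static bool
is_user_missing (double x, int32_t n_missing_values,
                 const double missing_values[])
{
  switch (n_missing_values)
    {
    case 0:
      return false;
    case 1: case 2: case 3:
      /* Discrete missing values only. */
      for (int i = 0; i < n_missing_values; i++)
        if (x == missing_values[i])
          return true;
      return false;
    case -2:
      /* A range [minimum, maximum]. */
      return x >= missing_values[0] && x <= missing_values[1];
    case -3:
      /* A range plus one additional discrete value. */
      return (x >= missing_values[0] && x <= missing_values[1])
             || x == missing_values[2];
    default:
      return false;               /* Invalid count: no missing values. */
    }
}
```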
The print and write members of sysfile variable are output formats coded into int32
types. The least-significant byte of the int32 represents the number of decimal places, and
the next two bytes in order of increasing significance represent field width and format type,
respectively. The most-significant byte is not used and should be set to zero.
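For example, F8.2 (format type 5 in the table below, width 8, 2 decimal places) is encoded as 0x00050802. A minimal C sketch of unpacking and packing; struct packed_format, decode_format, and encode_format are hypothetical names.

```c
#include <stdint.h>

/* Components of a print or write format packed into an int32. */
struct packed_format
{
  int type;       /* Format type code, e.g. 5 for F. */
  int width;      /* Field width. */
  int decimals;   /* Number of decimal places. */
};

static struct packed_format
decode_format (uint32_t raw)
{
  struct packed_format f;
  f.decimals = raw & 0xff;          /* Least-significant byte. */
  f.width = (raw >> 8) & 0xff;      /* Next byte. */
  f.type = (raw >> 16) & 0xff;      /* Next byte; the top byte is unused. */
  return f;
}

static uint32_t
encode_format (struct packed_format f)
{
  return (uint32_t) f.type << 16 | (uint32_t) f.width << 8
         | (uint32_t) f.decimals;
}
```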
Format types are defined as follows:
Value Meaning
0 Not used.
1 A
2 AHEX
3 COMMA
4 DOLLAR
5 F
6 IB
7 PIBHEX
8 P
9 PIB
10 PK
11 RB
12 RBHEX
13 Not used.
14 Not used.
15 Z
16 N
17 E
18 Not used.
19 Not used.
20 DATE
21 TIME
22 DATETIME
23 ADATE
24 JDATE
25 DTIME
26 WKDAY
27 MONTH
28 MOYR
29 QYR
30 WKYR
31 PCT
32 DOT
33 CCA
34 CCB
35 CCC
36 CCD
37 CCE
38 EDATE
39 SDATE
40 MTIME
41 YMDHMS
A few system files have been observed in the wild with invalid write fields, in particular
with value 0. Readers should probably treat invalid print or write fields as some default
format.
The value label record is always immediately followed by a value label variables record
with the following format:
int32 rec_type;
int32 var_count;
int32 vars[];
int32 rec_type;
Record type. Always set to 4.
int32 var_count;
Number of variables to which the associated value labels from the value label
record are to be applied.
int32 vars[];
A list of 1-based dictionary indexes of variables to which to apply the value
labels (see [Dictionary Index], page 6). There are var_count elements.
String variables wider than 8 bytes may not be specified in this list.
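A sketch of reading this record from a buffer, assuming the file's integers have already been found to be little-endian (via layout_code). parse_value_label_vars and read_le32 are hypothetical names, and the bounds checks reflect one reasonable way to reject truncated records.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit little-endian integer. */
static int32_t
read_le32 (const unsigned char *p)
{
  return (int32_t) ((uint32_t) p[0] | (uint32_t) p[1] << 8
                    | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24);
}

/* Parses a value label variables record from BUF (SIZE bytes).
   Stores up to MAX_VARS 1-based dictionary indexes in VARS and
   returns var_count, or -1 if the record is malformed or truncated. */
static int32_t
parse_value_label_vars (const unsigned char *buf, size_t size,
                        int32_t *vars, int32_t max_vars)
{
  if (size < 8 || read_le32 (buf) != 4)
    return -1;                    /* rec_type must be 4. */
  int32_t var_count = read_le32 (buf + 4);
  if (var_count < 0 || var_count > max_vars
      || (size - 8) / 4 < (size_t) var_count)
    return -1;
  for (int32_t i = 0; i < var_count; i++)
    vars[i] = read_le32 (buf + 8 + 4 * (size_t) i);
  return var_count;
}
```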
/* Data. */
int32 version_major;
int32 version_minor;
int32 version_revision;
int32 machine_code;
int32 floating_point_rep;
int32 compression_code;
int32 endianness;
int32 character_code;
int32 rec_type;
Record type. Always set to 7.
int32 subtype;
Record subtype. Always set to 3.
int32 size;
Size of each piece of data in the data part, in bytes. Always set to 4.
int32 count;
Number of pieces of data in the data part. Always set to 8.
int32 version_major;
PSPP major version number. In version x.y.z, this is x.
int32 version_minor;
PSPP minor version number. In version x.y.z, this is y.
int32 version_revision;
PSPP version revision number. In version x.y.z, this is z.
int32 machine_code;
Machine code. PSPP always sets this field to -1, but other values may
appear.
int32 floating_point_rep;
Floating point representation code. For IEEE 754 systems this is 1. IBM 370
sets this to 2, and DEC VAX E to 3.
int32 compression_code;
Compression code. Always set to 1, regardless of whether or how the file is
compressed.
int32 endianness;
Machine endianness. 1 indicates big-endian, 2 indicates little-endian.
int32 character_code;
Character code. The following values have actually been observed in system
files:
1 EBCDIC.
2 7-bit ASCII.
1250 The windows-1250 code page for Central European and Eastern
European languages.
1252 The windows-1252 code page for Western European languages.
28591 ISO 8859-1.
65001 UTF-8.
/* Data. */
flt64 sysmis;
flt64 highest;
flt64 lowest;
int32 rec_type;
Record type. Always set to 7.
int32 subtype;
Record subtype. Always set to 4.
int32 size;
Size of each piece of data in the data part, in bytes. Always set to 8.
int32 count;
Number of pieces of data in the data part. Always set to 3.
flt64 sysmis;
flt64 highest;
flt64 lowest;
The system missing value, the value used for HIGHEST in missing values, and
the value used for LOWEST in missing values, respectively. See Chapter 1
[System File Format], page 2, for more information.
The SPSSWriter library in PHP, which identifies itself as FOM SPSS 1.0.0 in
the file header record prod_name field, writes unexpected values to these fields,
but it uses the same values consistently throughout the rest of the file.
multiple response sets that can be understood by SPSS before version 14. The second type
of record, with a closely related format, is used for multiple dichotomy sets that use the
CATEGORYLABELS=COUNTEDVALUES feature added in version 14.
/* Header. */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;
0 Unknown
1 Nominal
2 Ordinal
3 Scale
An “unknown” measure of 0 means that the variable was created in some way
that doesn’t make the measurement level clear, e.g. with a COMPUTE transformation.
PSPP sets the measurement level the first time it reads the data using
the rules documented in Section “Measurement Level” in PSPP Users Guide,
so this should rarely appear.
int32 width;
The width of the display column for the variable in characters.
This field is present if count is 3 times the number of variables in the dictionary.
It is omitted if count is 2 times the number of variables.
int32 alignment;
The alignment of the variable for display purposes:
0 Left aligned
1 Right aligned
2 Centre aligned
char text[];
The variable sets, in a text-based format.
Each variable set occupies one line of text, each of which ends with a line feed
(byte 0x0a), optionally preceded by a carriage return (byte 0x0d).
Each line begins with the name of the variable set, followed by an equals sign
(‘=’) and a space (byte 0x20), followed by the long variable names of the
members of the set, separated by spaces. A variable set may be empty, in which
case the equals sign and the space following it are still present.
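Parsing one line of this text reduces to splitting at the first "= " and then walking the space-separated names. This sketch assumes the caller has already removed the line's LF (and any preceding CR); split_variable_set_line and next_member are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Splits "name= m1 m2 ..." at the "= " separator.  On success stores
   the set name's length in *NAME_LEN and returns a pointer to the
   member list; returns NULL if the separator is missing. */
static const char *
split_variable_set_line (const char *line, size_t *name_len)
{
  const char *eq = strstr (line, "= ");
  if (eq == NULL)
    return NULL;
  *name_len = (size_t) (eq - line);
  return eq + 2;
}

/* Finds the next space-separated member name at *P, storing its start
   in *START and its length in *LEN and advancing *P past it.
   Returns 0 when no members remain (e.g. for an empty set). */
static int
next_member (const char **p, const char **start, size_t *len)
{
  const char *s = *p;
  while (*s == ' ')
    s++;
  if (*s == '\0')
    return 0;
  *start = s;
  *len = strcspn (s, " ");
  *p = s + *len;
  return 1;
}
```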
A very long string with a width of w has n = (w + 251) / 252 segments, that is, one
segment for every 252 bytes of width, rounding up. It would be logical, then, for each of the
segments except the last to have a width of 252 and the last segment to have the remainder,
but this is not the case. In fact, each segment except the last has a width of 255 bytes. The
last segment has width w - (n - 1) * 252; some versions of SPSS make it slightly wider, but
not wide enough to make the last segment require another 8 bytes of data.
Data is packed tightly into segments of a very long string, 255 bytes per segment. Because
255 bytes of segment data are allocated for every 252 bytes of the very long string’s width
(approximately), some unused space is left over at the end of the allocated segments. Data
in unused space is ignored.
Example: Consider a very long string of width 20,000. Such a very long string has 20,000
/ 252 = 80 (rounding up) segments. The first 79 segments have width 255; the last segment
has width 20,000 - 79 * 252 = 92 or slightly wider (up to 96 bytes, the next multiple of 8).
The very long string’s data is actually stored in the 19,890 bytes in the first 78 segments,
plus the first 110 bytes of the 79th segment (19,890 + 110 = 20,000). The remaining 145
bytes of the 79th segment and all 92 bytes of the 80th segment are unused.
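The arithmetic above can be checked with a few small C functions. They assume the last segment is written with the minimum width w - (n - 1) * 252 (some SPSS versions use slightly more), and the names are hypothetical. Segments are numbered from 0 in the code, so for the 20,000-byte example they give 80 segments, 110 used bytes at index 78 (the 79th segment), and none at index 79 (the 80th).

```c
/* Number of segments for a very long string of width W. */
static int
vls_n_segments (int w)
{
  return (w + 251) / 252;
}

/* Declared width of segment I (0-based): 255 for every segment but
   the last, which gets the remainder w - (n - 1) * 252. */
static int
vls_segment_width (int w, int i)
{
  int n = vls_n_segments (w);
  return i < n - 1 ? 255 : w - (n - 1) * 252;
}

/* Bytes of string data actually stored in segment I: the data is
   packed 255 bytes per segment until all W bytes are used. */
static int
vls_segment_used (int w, int i)
{
  int before = 255 * i;
  if (before >= w)
    return 0;
  int rest = w - before;
  int cap = vls_segment_width (w, i);
  return rest < cap ? rest : cap;
}
```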
The very long string record explains how to stitch together segments to obtain very long
string data. For each of the very long string variables in the dictionary, it specifies the
name of its first segment’s variable and the very long string variable’s actual width. The
remaining segments immediately follow the named variable in the system file’s dictionary.
The very long string record, which is present only if the system file contains very long
string variables, has the following format:
/* Header. */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;
fields are separated by a ‘=’ byte. Tuples are delimited by a two-byte sequence
{00, 09}. After the last tuple, there may be a single byte 00, or {00, 09}. The
total length is count bytes.
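A sketch of splitting the very long string record's data into name=width tuples. It treats each 00 byte as the end of a tuple and skips a following 09 if present, which also copes with a lone trailing 00; it assumes the width is written as ASCII decimal digits. struct vls_entry and parse_vls_record are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One name=width tuple.  NAME points into the record data and is
   not NUL-terminated. */
struct vls_entry
{
  const unsigned char *name;
  size_t name_len;
  long width;
};

/* Parses COUNT bytes of tuple data into at most MAX entries.
   Returns the number of entries, or -1 for a malformed tuple. */
static int
parse_vls_record (const unsigned char *data, size_t count,
                  struct vls_entry *out, int max)
{
  int n = 0;
  size_t pos = 0;

  while (pos < count)
    {
      /* A tuple runs up to the next 00 byte (or the end of the data). */
      size_t end = pos;
      while (end < count && data[end] != 0x00)
        end++;

      if (end > pos)
        {
          const unsigned char *tuple = data + pos;
          size_t len = end - pos;
          const unsigned char *eq = memchr (tuple, '=', len);
          if (eq == NULL || eq == tuple || n >= max)
            return -1;

          size_t name_len = (size_t) (eq - tuple);
          size_t digits_len = len - name_len - 1;
          char digits[16];
          if (digits_len == 0 || digits_len >= sizeof digits)
            return -1;
          memcpy (digits, eq + 1, digits_len);
          digits[digits_len] = '\0';

          char *tail;
          long width = strtol (digits, &tail, 10);
          if (*tail != '\0' || width <= 0)
            return -1;

          out[n].name = tuple;
          out[n].name_len = name_len;
          out[n].width = width;
          n++;
        }

      /* Skip the 00 and, if present, the 09 that follows it. */
      pos = end + 1;
      if (pos < count && data[pos] == 0x09)
        pos++;
    }
  return n;
}
```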
int32 subtype;
int32 size;
int32 count;
this mistake, if they wish, by noticing and skipping the extra int32 values, which wouldn’t
ordinarily occur in strings.
Example
A system file produced with the following VARIABLE ATTRIBUTE commands in effect:
VARIABLE ATTRIBUTE VARIABLES=dummy ATTRIBUTE=fred[1]('23') fred[2]('34').
VARIABLE ATTRIBUTE VARIABLES=dummy ATTRIBUTE=bert('123').
will contain a variable attribute record with the following contents:
0000 07 00 00 00 12 00 00 00 01 00 00 00 22 00 00 00 |............"...|
0010 64 75 6d 6d 79 3a 66 72 65 64 28 27 32 33 27 0a |dummy:fred('23'.|
0020 27 33 34 27 0a 29 62 65 72 74 28 27 31 32 33 27 |'34'.)bert('123'|
0030 0a 29 |.)|
int64 unknown;
Meaning unknown. Always set to 1.
int64 ncases64;
Number of cases in the file as a 64-bit integer. Presumably this could be -1 to
indicate that the number of cases is unknown, for the same reason as ncases
in the file header record, but this has not been observed in the wild.
prefix_Imagegeneric.png
prefix_PastedObjectgeneric.png
prefix_imageData.bin
A PNG image referenced by an object element (in the first two cases) or an
image element (in the final case). See Section 2.1.9 [SPV Structure object and
image Elements], page 37.
prefix_pmml.scf
prefix_stats.scf
prefix_model.xml
Not yet investigated. The corpus contains few examples.
The prefix in the names of the detail members is typically an 11-digit decimal number
that increases for each item, tending to skip values. Older SPV files use different naming
conventions for detail members. Structure members refer to detail members by name, and
so their exact names do not matter to readers as long as they are unique.
SPSS tolerates corrupted Zip archives that Zip reader libraries tend to reject. These can
be fixed up with zip -FF.
</table>
</container>
</heading>
</heading>
heading
:creator-version?
:commandName?
:visibility[heading_visibility]=(collapsed)?
:locale?
:olang?
=> label (container | heading)*
A heading represents a tree of content that appears in an output viewer window. It
contains a label text string that is shown in the outline view, ordinarily followed by content
containers or further nested (sub)-sections of output. Unlike heading elements in HTML
and other common document formats, which precede the content that they head, heading
contains the elements that appear below the heading.
The root of a structure member is a special heading. The direct children of the root
heading elements in all structure members in an SPV file are siblings. That is, the root
headings in all of the structure members conceptually represent the same node. The root
heading’s label is ignored (see Section 2.1.2 [SPV Structure label Element], page 33).
The root heading in the first structure member in the Zip file may contain a pageSetup
element.
The schema implies that any heading may contain a sequence of any number of heading
and container elements. This does not work for the root heading in practice, which must
actually contain exactly one container or heading child element. Furthermore, if the root
heading’s child is a heading, then the structure member’s name must end in _heading.xml;
if it is a container child, then it must not.
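The naming rule in the last sentence can be expressed as a one-line check. A minimal sketch; has_suffix and structure_member_name_ok are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if S ends with SUFFIX. */
static bool
has_suffix (const char *s, const char *suffix)
{
  size_t n = strlen (s), m = strlen (suffix);
  return n >= m && strcmp (s + n - m, suffix) == 0;
}

/* Checks the naming rule for a structure member whose root heading has
   a single child: the name ends in "_heading.xml" exactly when that
   child is itself a heading. */
static bool
structure_member_name_ok (const char *member_name, bool child_is_heading)
{
  return has_suffix (member_name, "_heading.xml") == child_is_heading;
}
```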
The following attributes have been observed on both document root and nested heading
elements.
creator-version [Attribute]
The version of the software that created this SPV file. A string of the form xxyyzzww
represents software version xx.yy.zz.ww, e.g. 21000001 is version 21.0.0.1. Trailing
pairs of zeros are sometimes omitted, so that 21, 210000, and 21000000 are all version
21.0.0.0 (and the corpus contains all three of those forms).
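Because trailing pairs of zeros may be dropped, a reader can right-pad the string with zeros to 8 digits before splitting it into pairs. A sketch with the hypothetical name parse_creator_version; it rejects strings whose length is not 2, 4, 6, or 8 digits.

```c
#include <stdbool.h>
#include <string.h>

/* Parses a creator-version string "xxyyzzww" into four components,
   right-padding with "00" pairs first.  Returns false if S is not
   2, 4, 6, or 8 decimal digits. */
static bool
parse_creator_version (const char *s, int version[4])
{
  size_t len = strlen (s);
  if (len == 0 || len > 8 || len % 2 != 0)
    return false;

  char digits[9] = "00000000";
  memcpy (digits, s, len);
  for (int i = 0; i < 8; i++)
    if (digits[i] < '0' || digits[i] > '9')
      return false;

  for (int i = 0; i < 4; i++)
    version[i] = (digits[2 * i] - '0') * 10 + (digits[2 * i + 1] - '0');
  return true;
}
```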
The following attributes have been observed on document root heading elements only:
Chapter 2: SPSS Viewer File Format 33
creator [Attribute]
The directory in the file system of the software that created this SPV file.
creation-date-time [Attribute]
The date and time at which the SPV file was written, in a locale-specific format,
e.g. Friday, May 16, 2014 6:47:37 PM PDT or lunedì 17 marzo 2014 3.15.48 CET
or even Friday, December 5, 2014 5:00:19 o’clock PM EST.
lockReader [Attribute]
Whether a reader should be allowed to edit the output. The possible values are true
and false. The value false is by far the most common.
schemaLocation [Attribute]
This is actually an XML Namespace attribute. A reader may ignore it.
The following attributes have been observed only on nested heading elements:
commandName [Attribute]
A locale-invariant identifier for the command that produced the output, e.g.
Frequencies, T-Test, Non Par Corr.
visibility [Attribute]
If this attribute is absent, the heading’s content is expanded in the outline view. If it
is set to collapsed, it is collapsed. (This attribute is never present in a root heading
because the root node is always expanded when a file is loaded, even though the UI
can be used to collapse it interactively.)
locale [Attribute]
The locale used for output, in Windows format, which is similar to the format used in
Unix with the underscore replaced by a hyphen, e.g. en-US, en-GB, el-GR, sr-Cyrl-RS.
olang [Attribute]
The output language, e.g. en, it, es, de, pt-BR.
visibility [Attribute]
Whether the container’s content is displayed. “Notes” tables are often hidden; other
data is usually visible.
text-align [Attribute]
Alignment of text within the container. Observed with nested table and text elements.
width [Attribute]
The width of the container, e.g. 1097px.
All of the elements that nest inside container (except the label) have the following
optional attribute.
commandName [Attribute]
As on the heading element. The corpus contains one example where commandName
is present but set to the empty string.
commandName [Attribute]
See Section 2.1.3 [SPV Structure container Element], page 34. For output not specific
to a command, this is simply log.
type [Attribute]
The semantics of the text.
creator-version [Attribute]
As on the heading element.
lang [Attribute]
This always contains en in the corpus.
:creator-version?
:displayFiltering=bool?
:maxNumCells=int?
:orphanTolerance=int?
:rowBreakNumber=int?
:subType
:tableId
:tableLookId?
:type[table_type]=(table | note | warning)
=> tableProperties? tableStructure
commandName [Attribute]
See Section 2.1.3 [SPV Structure container Element], page 34.
type [Attribute]
One of table, note, or warning.
subType [Attribute]
The locale-invariant command ID for the particular kind of output that this table
represents in the procedure. This can be the same as commandName e.g. Frequencies,
or different, e.g. Case Processing Summary. Generic subtypes Notes and Warnings
are often used.
tableId [Attribute]
A number that uniquely identifies the table within the SPV file, typically a large
negative number such as -4147135649387905023.
creator-version [Attribute]
As on the heading element. In the corpus, this is only present for version 21 and up
and always includes all 8 digits.
See Section 2.4.14 [SPV Detail Legacy Properties], page 81, for details on the
tableProperties element.
:csvFileIds?
:csvFileNames?
=> dataPath? path csvPath?
This element represents a graph. The dataPath and path elements name the Zip mem-
bers that give the details of the graph. Normally, both elements are present; there is only
one counterexample in the corpus.
csvPath only appears in one SPV file in the corpus, for two graphs. In these two cases,
dataPath, path, and csvPath all appear. These csvPath elements name Zip members with names
of the form number_csv.bin, where number is a many-digit number and the same as the
csvFileIds. The named Zip members are CSV text files (despite the .bin extension). The
CSV files are encoded in UTF-8 and begin with a U+FEFF byte-order mark.
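Readers that hand these CSV members to a CSV parser may need to skip the byte-order mark first. In UTF-8, U+FEFF is encoded as the three bytes EF BB BF; skip_utf8_bom is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first CSV byte, skipping an optional
   UTF-8 encoded U+FEFF byte-order mark (EF BB BF). */
static size_t
skip_utf8_bom (const unsigned char *data, size_t size)
{
  static const unsigned char bom[3] = { 0xef, 0xbb, 0xbf };
  return size >= 3 && memcmp (data, bom, 3) == 0 ? 3 : 0;
}
```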
image
:commandName?
:VDPId
=> dataPath
These two elements represent an image in PNG format. They are equivalent and the
corpus contains examples of both. The only difference is the syntax: for object, the uri
attribute names the Zip member that contains a PNG file; for image, the text of the inner
dataPath element names the Zip member.
PSPP writes object in output but there is no strong reason to choose this form.
The corpus only contains PNG image files.
initial-page-number [Attribute]
The page number to put on the first page of printed output. Usually 1.
chart-size [Attribute]
One of the listed, self-explanatory chart sizes, quarter-height, or a localization (!)
of one of these (e.g. dimensione attuale, Wie vorgegeben).
margin-left [Attribute]
margin-right [Attribute]
margin-top [Attribute]
margin-bottom [Attribute]
Margin sizes, e.g. 0.25in.
paper-height [Attribute]
paper-width [Attribute]
Paper sizes.
reference-orientation [Attribute]
Indicates the orientation of the output page. Either 0deg (portrait) or 90deg
(landscape).
space-after [Attribute]
The amount of space between printed objects, typically 12pt.
type [Attribute]
Always text.
byte A byte.
bool A byte with value 0 or 1.
int16
be16 A 16-bit unsigned integer in little-endian or big-endian byte order, respectively.
int32
be32 A 32-bit unsigned integer in little-endian or big-endian byte order, respectively.
int64
be64 A 64-bit unsigned integer in little-endian or big-endian byte order, respectively.
double A 64-bit IEEE floating-point number.
float A 32-bit IEEE floating-point number.
string
bestring A 32-bit unsigned integer, in little-endian or big-endian byte order, respectively,
followed by the specified number of bytes of character data. (The encoding is
indicated by the Formats nonterminal.)
x? x is optional, e.g. 00? is an optional zero byte.
x*n x is repeated n times, e.g. byte*10 for ten arbitrary bytes.
x[name] Gives x the specified name. Names are used in textual explanations. They
are also used, again in brackets, to indicate counts, e.g. int32[n] byte*[n] for a
32-bit integer followed by the specified number of arbitrary bytes.
a|b Either a or b.
(x) Parentheses are used for grouping to make precedence clear, especially in the
presence of |, e.g. in 00 (01 | 02 | 03) 00.
count(x)
becount(x)
A 32-bit unsigned integer, in little-endian or big-endian byte order, respectively,
that indicates the number of bytes in x, followed by x itself.
v1(x) In a version 1 .bin member, x; in version 3, nothing. (The .bin header
indicates the version.)
v3(x) In a version 3 .bin member, x; in version 1, nothing.
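To make the notation concrete, here is a hand-written C sketch of the primitives int32, be32, and string (count(x) has the same length-prefixed shape as string). It is not PSPP's parser, which is generated from the grammar file mentioned below; struct cursor and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A read position within a light detail member. */
struct cursor
{
  const unsigned char *p;   /* Next unread byte. */
  size_t left;              /* Bytes remaining. */
};

/* int32: 32-bit unsigned little-endian.  Returns 0 if truncated. */
static int
get_int32 (struct cursor *c, uint32_t *out)
{
  if (c->left < 4)
    return 0;
  const unsigned char *p = c->p;
  *out = (uint32_t) p[0] | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
  c->p += 4;
  c->left -= 4;
  return 1;
}

/* be32: 32-bit unsigned big-endian.  Returns 0 if truncated. */
static int
get_be32 (struct cursor *c, uint32_t *out)
{
  if (c->left < 4)
    return 0;
  const unsigned char *p = c->p;
  *out = (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16
         | (uint32_t) p[2] << 8 | (uint32_t) p[3];
  c->p += 4;
  c->left -= 4;
  return 1;
}

/* string: an int32 byte count followed by that many bytes.  Stores a
   pointer into the buffer (not NUL-terminated) and the length.  On
   failure the cursor is left unchanged. */
static int
get_string (struct cursor *c, const unsigned char **s, uint32_t *len)
{
  struct cursor save = *c;
  uint32_t n;
  if (!get_int32 (c, &n) || c->left < n)
    {
      *c = save;
      return 0;
    }
  *s = c->p;
  *len = n;
  c->p += n;
  c->left -= n;
  return 1;
}
```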
PSPP uses this grammar to parse light detail members. See src/output/spv/light-binary.grammar
in the PSPP source tree for the full grammar.
Little-endian byte order is far more common in this format, but a few pieces of the
format use big-endian byte order.
Light detail members express linear units in two ways: points (pt), at 72/inch, and
“device-independent pixels” (px), at 96/inch. To convert from pt to px, multiply by 1.33
and round up. To convert from px to pt, divide by 1.33 and round down.
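With integer arithmetic, multiplying by 1.33 becomes multiplying by 133 and dividing by 100, which keeps the rounding exact. A sketch for non-negative values with hypothetical function names; for example, 10pt becomes 14px and 16px becomes 12pt.

```c
#include <stdint.h>

/* pt to px: ceil(pt * 1.33), for non-negative PT. */
static int64_t
pt_to_px (int64_t pt)
{
  return (pt * 133 + 99) / 100;
}

/* px to pt: floor(px / 1.33), for non-negative PX. */
static int64_t
px_to_pt (int64_t px)
{
  return px * 100 / 133;
}
```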
A “light” detail member .bin consists of a number of sections concatenated together,
terminated by an optional byte 01:
Table =>
2.2.1 Header
An SPV light member begins with a 39-byte header:
Header =>
01 00
(i1 | i3)[version]
bool[x0]
bool[x1]
bool[rotate-inner-column-labels]
bool[rotate-outer-row-labels]
bool[x2]
int32[x3]
int32[min-col-heading-width] int32[max-col-heading-width]
int32[min-row-heading-width] int32[max-row-heading-width]
int64[table-id]
version is a version number that affects the interpretation of some of the other data
in the member. We will refer to “version 1” and “version 3” later on and use v1(...) and
v3(...) for version-specific formatting (as described previously).
If rotate-inner-column-labels is 1, then column labels closest to the data are rotated
90° counterclockwise; otherwise, they are shown in the normal way.
If rotate-outer-row-labels is 1, then row labels farthest from the data are rotated
90° counterclockwise; otherwise, they are shown in the normal way.
min-col-heading-width, max-col-heading-width, min-row-heading-width, and
max-row-heading-width are measurements in 1/96 inch units (called “device independent
pixel” units in Windows) whose values influence column widths. For the purpose of
interpreting these values, a table is divided into the three regions shown below:
+------------------+-------------------------------------------------+
| | column headings |
| +-------------------------------------------------+
| corner | |
| and | |
| row headings | data |
| | |
| | |
+------------------+-------------------------------------------------+
min-col-heading-width and max-col-heading-width apply to the columns in the
column headings region. min-col-heading-width is the minimum width that any of these
columns will be given automatically. In addition, max-col-heading-width is the maximum
width that a column will be assigned to accommodate a long label in the column headings
cells. These columns will still be made wider to accommodate wide data values in the data
region.
min-row-heading-width is the minimum width that a column in the corner and row
headings region will be given automatically. max-row-heading-width is the maximum
width that a column in this region will be assigned to accommodate a long label. This region
doesn’t include data, so data values don’t affect column widths.
table-id is a binary version of the tableId attribute in the structure member that refers
to the detail member. For example, if tableId is -4122591256483201023, then table-id
would be 0xc6c99d183b300001.
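The conversion between tableId and table-id is a two's-complement reinterpretation of the same 64 bits. A minimal Python sketch follows; the function names are hypothetical, and table_id_bytes assumes that int64 in the light member is stored little-endian.

```python
import struct

MASK64 = (1 << 64) - 1

def table_id_from_attribute(table_id_attr: int) -> int:
    """Map the signed tableId attribute to the unsigned 64-bit table-id."""
    return table_id_attr & MASK64

def attribute_from_table_id(table_id: int) -> int:
    """Map the unsigned 64-bit table-id back to the signed tableId attribute."""
    return table_id - (1 << 64) if table_id >= (1 << 63) else table_id

def table_id_bytes(table_id_attr: int) -> bytes:
    """Encode table-id as 8 bytes (little-endian byte order is an assumption)."""
    return struct.pack("<q", table_id_attr)
```

For the example above, table_id_from_attribute(-4122591256483201023) gives 0xc6c99d183b300001.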
The meaning of the other variable parts of the header is not known. A writer may safely
use version 3, true for x0, false for x1, true for x2, and 0x15 for x3.
2.2.2 Titles
Titles =>
Value[title] 01?
Value[subtype] 01? 31
Value[user-title] 01?
(31 Value[corner-text] | 58)
(31 Value[caption] | 58)
The Titles follow the Header and specify the table’s title, caption, and corner text.
The user-title reflects any user editing of the title text or style. The title is the title
originally generated by the procedure. Both of these are appropriate for presentation and
localized to the user’s language. For example, for a frequency table, title and user-title
normally name the variable and c is simply “Frequencies”.
subtype is the same as the subType attribute in the table structure XML element that
referred to this member. See Section 2.1.6 [SPV Structure table Element], page 35, for
details.
The corner-text, if present, is shown in the upper-left corner of the table, above the row
headings and to the left of the column headings. It is usually absent. When row dimension
labels are displayed in the corner (see show-row-labels-in-corner), corner text is hidden.
The caption, if present, is shown below the table. caption reflects user editing of the
caption.
2.2.3 Footnotes
Footnotes => int32[n-footnotes] Footnote*[n-footnotes]
Footnote => Value[text] (58 | 31 Value[marker]) int32[show]
Each footnote has text and an optional custom marker (such as ‘*’).
The syntax for Value would allow footnotes (and their markers) to reference other
footnotes, but in practice this doesn’t work.
show is a 32-bit signed integer. It is positive to show the footnote or negative to hide
it. Its magnitude is often 1, and in other cases tends to be the number of references to the
footnote. It is safe to write 1 to show a footnote and -1 to hide it.
2.2.4 Areas
Areas => 00? Area*8
Area =>
byte[index] 31
string[typeface] float[size] int32[style] bool[underline]
int32[halign] int32[valign]
string[fg-color] string[bg-color]
bool[alternate] string[alt-fg-color] string[alt-bg-color]
v3(int32[left-margin] int32[right-margin] int32[top-margin] int32[bottom-margin])
Each Area represents the style for a different area of the table, in the following order:
title, caption, footer, corner, column labels, row labels, data, and layers.
index is the 1-based index of the Area, i.e. 1 for the first Area, through 8 for the final
Area.
typeface is the string name of the font used in the area. In the corpus, this is SansSerif
in over 99% of instances and Times New Roman in the rest.
size is the size of the font, in px (see Section 2.2 [SPV Light Detail Member Format],
page 40). The most common size in the corpus is 12 px. Even though size has a floating-
point type, in the corpus its values are always integers.
style is a bit mask. Bit 0 (with value 1) is set for bold, bit 1 (with value 2) is set for
italic.
underline is 1 if the font is underlined, 0 otherwise.
halign specifies horizontal alignment: 0 for center, 2 for left, 4 for right, 61453 for
decimal, 64173 for mixed. Mixed alignment varies according to type: string data is left-
justified, numbers and most other formats are right-justified.
valign specifies vertical alignment: 0 for center, 1 for top, 3 for bottom.
fg-color and bg-color are the foreground color and background color, respectively. In
the corpus, these are always #000000 and #ffffff, respectively.
alternate is 1 if rows should alternate colors, 0 if all rows should be the same color.
When alternate is 1, alt-fg-color and alt-bg-color specify the colors for the alternate
rows; otherwise they are empty strings.
left-margin, right-margin, top-margin, and bottom-margin are measured in px.
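As an illustration of the Area fields, the following Python sketch turns the already-read integers of one Area into readable attributes. It is only a sketch: the dictionary keys and function names are hypothetical, and resolve_mixed simplifies “numbers and most other formats” to “anything that is not a string”.

```python
# Area names in file order; Area index 1 is the title, index 8 the layers.
AREA_NAMES = ["title", "caption", "footer", "corner",
              "column labels", "row labels", "data", "layers"]
AREA_HALIGN = {0: "center", 2: "left", 4: "right", 61453: "decimal", 64173: "mixed"}
AREA_VALIGN = {0: "center", 1: "top", 3: "bottom"}

def describe_area(index, style, halign, valign):
    """Summarize one Area's style fields (index is the 1-based Area index)."""
    return {
        "area": AREA_NAMES[index - 1],
        "bold": bool(style & 1),     # bit 0
        "italic": bool(style & 2),   # bit 1
        "halign": AREA_HALIGN.get(halign, f"unknown({halign})"),
        "valign": AREA_VALIGN.get(valign, f"unknown({valign})"),
    }

def resolve_mixed(halign_name, is_string):
    """Mixed alignment: strings are left-justified, other data right-justified."""
    if halign_name != "mixed":
        return halign_name
    return "left" if is_string else "right"
```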
2.2.5 Borders
Borders =>
count(
ib1[endian]
be32[n-borders] Border*[n-borders]
bool[show-grid-lines]
00 00 00)
Border =>
be32[border-type]
be32[stroke-type]
be32[color]
all-layers is 1 to print all layers, 0 to print only the layer designated by current-layer
in TableSettings (see Section 2.2.7 [SPV Light Member Table Settings], page 46).
paginate-layers is 1 to print each layer at the start of a new page, 0 otherwise. (This
setting is honored only if all-layers is 1, since otherwise only one layer is printed.)
fit-width and fit-length control whether the table is shrunk to fit within a page’s
width or length, respectively.
n-orphan-lines is the minimum number of rows or columns to put in one part of a
table that is broken across pages.
If top-continuation is 1, then continuation-string is printed at the top of a page
when a table is broken across pages for printing; similarly for bottom-continuation and
the bottom of a page. Usually, continuation-string is empty.
2.2.8 Formats
Formats =>
int32[n-widths] int32*[n-widths]
string[locale]
int32[current-layer]
bool[x7] bool[x8] bool[x9]
Y0
CustomCurrency
count(
v1(X0?)
v3(count(X1 count(X2)) count(X3)))
Y0 => int32[epoch] byte[decimal] byte[grouping]
CustomCurrency => int32[n-ccs] string*[n-ccs]
If n-widths is nonzero, then the accompanying integers are column widths as manually
adjusted by the user.
locale is a locale including an encoding, such as en_US.windows-1252 or
it_IT.windows-1252. (locale is often duplicated in Y1, described below).
epoch is the year that starts the epoch. A 2-digit year is interpreted as belonging to the
100 years beginning at the epoch. The default epoch year is 69 years prior to the current
year; thus, in 2017 this field by default contains 1948. In the corpus, epoch ranges from
1943 to 1948, plus some contain -1.
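The 100-year window rule can be sketched in Python as follows. The function name is hypothetical, and the handling of the observed -1 (falling back to the default of 69 years before the current year) is an assumption, since the meaning of -1 is not documented.

```python
def expand_two_digit_year(yy, epoch, current_year):
    """Place a 2-digit year yy (0..99) in the window [epoch, epoch + 99]."""
    if epoch < 0:
        # Assumption: treat -1 as "use the default epoch".
        epoch = current_year - 69
    year = epoch - epoch % 100 + yy   # same century as the epoch
    if year < epoch:
        year += 100                   # before the window start: next century
    return year
```

With the 2017 default epoch of 1948, 48 maps to 1948 and 47 maps to 2047.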
decimal is the decimal point character. The observed values are ‘.’ and ‘,’.
grouping is the grouping character. Usually, it is ‘,’ if decimal is ‘.’, and vice versa.
Other observed values are ‘’’ (apostrophe), ‘ ’ (space), and zero (presumably indicating
that digits should not be grouped).
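To show how decimal and grouping interact, here is a Python sketch that formats a number with fixed decimal places and then substitutes the table's characters. The function name is hypothetical, and treating a grouping of chr(0) as “no grouping” follows the presumption stated above.

```python
def format_number(x, decimals, decimal=".", grouping=","):
    """Format x with the given decimal point and grouping characters."""
    s = f"{x:,.{decimals}f}"          # Python uses ',' for groups and '.' for the point
    int_part, has_point, frac = s.partition(".")
    sep = "" if grouping == "\0" else grouping   # zero byte: do not group
    int_part = int_part.replace(",", sep)
    return int_part + (decimal + frac if has_point else "")
```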
n-ccs is observed as either 0 or 5. When it is 5, the following strings are CCA through
CCE format strings. See Section “Custom Currency Formats” in PSPP. Most commonly
these are all -,,, but other strings occur.
A writer may safely use false for x7, x8, and x9.
X0
X0 only appears, optionally, in version 1 members.
X0 => byte*14 Y1 Y2
Y1 =>
string[command] string[command-local]
string[language] string[charset] string[locale]
bool[x10] bool[include-leading-zero] bool[x12] bool[x13]
Y0
Y2 => CustomCurrency byte[missing] bool[x17]
command describes the statistical procedure that generated the output, in English. It
is not necessarily the literal syntax name of the procedure: for example, NPAR TESTS
becomes “Nonparametric Tests.” command-local is the procedure’s name, translated into
the output language; it is often empty and, when it is not, sometimes the same as command.
include-leading-zero is the LEADZERO setting for the table, where false is OFF (the
default) and true is ON. See Section “SET LEADZERO” in PSPP.
missing is the character used to indicate that a cell contains a missing value. It is always
observed as ‘.’.
A writer may safely use false for x10 and x17 and true for x12 and x13.
X1
X1 only appears in version 3 members.
X1 =>
bool[x14]
byte[show-title]
bool[x16]
byte[lang]
byte[show-variables]
byte[show-values]
int32[x18] int32[x19]
00*17
bool[x20]
bool[show-caption]
lang may indicate the language in use. Some values seem to be 0: en, 1: de, 2: es, 3:
it, 5: ko, 6: pl, 8: zh-tw, 10: pt_BR, 11: fr.
show-variables determines how variables are displayed by default. A value of 1 means
to display variable names, 2 to display variable labels when available, 3 to display both
(name followed by label, separated by a space). The most common value is 0, which
probably means to use a global default.
show-values is a similar setting for values. A value of 1 means to display the value, 2
to display the value label when available, 3 to display both. Again, the most common value
is 0, which probably means to use a global default.
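The show-variables and show-values codes (and the per-Value show bytes described under Value) select between a name or value and its label. A Python sketch of that choice follows. The function name is hypothetical, and two behaviors are assumptions: a missing label falls back to the name or value, and an unresolved 0 (the global default) is not modeled.

```python
def render_with_show(show, text, label, table_default):
    """Choose display text: 1 = name/value, 2 = label, 3 = both, 0 = defer.

    text is the variable name or the value; table_default is the
    show-variables or show-values setting from X1.
    """
    if show == 0:
        show = table_default
    if show == 2 and label:
        return label
    if show == 3 and label:
        return f"{text} {label}"   # name (or value) followed by label
    # 1, a missing label, or a still-unresolved 0 all show the plain text here.
    return text
```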
show-title is 1 to show the title, 10 to hide it.
show-caption is true to show the caption, false to hide it.
A writer may safely use false for x14, false for x16, 0 for lang, -1 for x18 and x19, and
false for x20.
X2
X2 only appears in version 3 members.
X2 =>
int32[n-row-heights] int32*[n-row-heights]
int32[n-style-map] StyleMap*[n-style-map]
int32[n-styles] StylePair*[n-styles]
count((i0 i0)?)
StyleMap => int64[cell-index] int16[style-index]
If present, n-row-heights and the accompanying integers are row heights as manually
adjusted by the user.
The rest of X2 specifies styles for data cells. At first glance this is odd, because each
data cell can have its own style embedded as part of the data, but in practice X2 specifies
a style for a cell only if that cell is empty (and thus does not appear in the data at all).
Each StyleMap specifies the index of a blank cell, calculated the same way as in the Cells
(see Section 2.2.12 [SPV Light Member Cells], page 52), along with a 0-based index into
the accompanying StylePair array.
A writer may safely omit the optional i0 i0 inside the count(...).
X3
X3 only appears in version 3 members.
X3 =>
01 00 byte[x21] 00 00 00
Y1
double[small] 01
(string[dataset] string[datafile] i0 int32[date] i0)?
Y2
(int32[x22] i0 01?)?
small is a small real number. In the corpus, it overwhelmingly takes the value 0.0001,
with zero occasionally seen. Nonzero numbers with format 40 (see Section 2.2.13 [SPV
Light Member Value], page 53) whose magnitudes are smaller than small are displayed in
scientific notation. (Thus, a small of zero prevents scientific notation from being chosen.)
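The format 40 rule reduces to a single predicate. In the Python sketch below the names are hypothetical, and format40 uses Python's own E and f presentations only to illustrate the choice; the exact rendering SPSS produces is not specified here.

```python
def format40_uses_scientific(x, small):
    """Format 40 uses scientific notation iff x is nonzero and |x| < small."""
    return x != 0 and abs(x) < small

def format40(x, decimals, small):
    # Illustrative rendering only: Python's 'E' or 'f' presentation.
    spec = "E" if format40_uses_scientific(x, small) else "f"
    return f"{x:.{decimals}{spec}}"
```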
dataset is the name of the dataset analyzed to produce the output, e.g. DataSet1, and
datafile the name of the file it was read from, e.g. C:\Users\foo\bar.sav. The latter is
sometimes the empty string.
date is a date, as seconds since the epoch, i.e. since January 1, 1970. Pivot tables within
an SPV file often have dates a few minutes apart, so this is probably a creation date for the
table rather than for the file.
Sometimes dataset, datafile, and date are present and other times they are absent.
The reader can distinguish by assuming that they are present and then checking whether
the presumptive dataset contains a null byte (a valid string never will).
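The presence check can be sketched in Python as below. This assumes that string is a little-endian int32 byte count followed by that many bytes and that int32 is little-endian; treating a string length that runs past the end of the buffer as “absent” is an additional assumption. Function names are hypothetical.

```python
import struct

def _read_string(buf, off):
    """Read an int32 byte count followed by that many bytes."""
    (n,) = struct.unpack_from("<i", buf, off)
    off += 4
    if n < 0 or off + n > len(buf):
        raise ValueError("implausible string length")
    return buf[off:off + n], off + n

def try_read_dataset_info(buf, off):
    """Try string[dataset] string[datafile] i0 int32[date] i0 at off.

    Returns (dataset, datafile, date, new_off), or None if judged absent.
    """
    try:
        dataset, p = _read_string(buf, off)
        if b"\0" in dataset:
            return None          # a genuine string never contains a null byte
        datafile, p = _read_string(buf, p)
        _zero1, date, _zero2 = struct.unpack_from("<iii", buf, p)
    except (ValueError, struct.error):
        return None              # ran off the end: treat as absent
    return dataset, datafile, date, p + 12
```

When the group is absent, the caller continues parsing Y2 at the original offset.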
x22 is usually 0 or 2000000.
A writer may safely use 4 for x21 and omit x22 and the other optional bytes at the end.
Encoding
Formats contains several indications of character encoding:
• locale in Formats itself.
• locale in Y1 (in version 1, Y1 is optionally nested inside X0; in version 3, Y1 is nested
inside X3).
• charset in version 3, in Y1.
• lang in X1, in version 3.
charset, if present, is a good indication of character encoding, and in its absence the
encoding suffix on locale in Formats will work.
locale in Y1 can be disregarded: it is normally the same as locale in Formats, and it
is only present if charset is also.
lang is not helpful and should be ignored for character encoding purposes.
However, the corpus contains many examples of light members whose strings are encoded
in UTF-8 despite declaring some other character set. Furthermore, the corpus contains
several examples of light members in which some strings are encoded in UTF-8 (and contain
multibyte characters) and other strings are encoded in another character set (and contain
non-ASCII characters). PSPP treats any valid UTF-8 string as UTF-8 and only falls back
to the declared encoding for strings that are not valid UTF-8.
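A Python sketch of this strategy follows: pick the declared encoding (charset first, then the suffix of locale in Formats), then decode each string as UTF-8 when possible. The function names, the windows-1252 default when nothing is declared, and the use of errors="replace" are assumptions for illustration.

```python
def declared_encoding(formats_locale, charset=None):
    """Prefer charset from Y1; otherwise use the suffix after '.' in locale."""
    if charset:
        return charset
    _, dot, suffix = formats_locale.partition(".")
    return suffix if dot and suffix else None

def decode_spv_string(raw, encoding):
    """Decode as UTF-8 when valid, else fall back to the declared encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed default when nothing is declared; replace undecodable bytes.
        return raw.decode(encoding or "windows-1252", errors="replace")
```

An encoding name that Python does not recognize raises LookupError, which a real reader would need to handle.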
The pspp-output program’s strings command can help analyze the encoding in an
SPV light member. Use pspp-output --help-dev to see its usage.
2.2.9 Dimensions
A pivot table presents multidimensional data. A Dimension identifies the categories
associated with each dimension.
Dimensions => int32[n-dims] Dimension*[n-dims]
Dimension =>
Value[name] DimProperties
int32[n-categories] Category*[n-categories]
DimProperties =>
byte[x1]
byte[x2]
int32[x3]
bool[hide-dim-label]
bool[hide-all-labels]
01 int32[dim-index]
name is the name of the dimension, e.g. Variables, Statistics, or a variable name.
The meanings of x1 and x3 are unknown. x1 is usually 0 but many other values have
been observed. A writer may safely use 0 for x1 and 2 for x3.
x2 is 0, 1, or 2. For a pivot table with L layer dimensions, R row dimensions, and C
column dimensions, x2 is 2 for the first L dimensions, 0 for the next R dimensions, and 1
for the remaining C dimensions. This does not mean that the layer dimensions must be
presented first, followed by the row dimensions, followed by the column dimensions—on
the contrary, they are frequently in a different order—but x2 must follow this pattern to
prevent the pivot table from being misinterpreted.
If hide-dim-label is 00, the pivot table displays a label for the dimension itself. Because
usually the group and category labels are enough explanation, it is usually 01.
If hide-all-labels is 01, the pivot table omits all labels for the dimension, including
group and category labels. It is usually 00. When hide-all-labels is 01, hide-dim-label
is ignored.
dim-index is usually the 0-based index of the dimension, e.g. 0 for the first dimension,
1 for the second, and so on. Sometimes it is -1. There is no visible difference. A writer may
safely use the 0-based index.
2.2.10 Categories
Categories are arranged in a tree. Only the leaf nodes in the tree are really categories; the
others just serve as grouping constructs.
Category => Value[name] (Leaf | Group)
Leaf => 00 00 00 i2 int32[leaf-index] i0
Group =>
bool[merge] 00 01 int32[x23]
i-1 int32[n-subcategories] Category*[n-subcategories]
name is the name of the category (or group).
A Leaf represents a leaf category. The Leaf’s leaf-index is a nonnegative integer
unique within the Dimension and less than n-categories in the Dimension. If the user
does not sort or rearrange the categories, then leaf-index starts at 0 for the first Leaf
in the dimension and increments by 1 with each successive Leaf. If the user does sort or
rearrange the categories, then the order of categories in the file reflects that change and
leaf-index reflects the original order.
A dimension can have no leaf categories at all. A table that contains such a dimension
necessarily has no data at all.
A Group is a group of nested categories. Usually a Group contains at least one
Category, so that n-subcategories is positive, but Groups with zero subcategories have been
observed.
If a Group’s merge is 00, the most common value, then the group is really a distinct
group that should be represented as such in the visual representation and user interface. If
merge is 01, the categories in this group should be shown and treated as if they were direct
children of the group’s containing group (or if it has no parent group, then direct children of
the dimension), and this group’s name is irrelevant and should not be displayed. (Merged
groups can be nested!)
Writers need not use merged groups.
A Group’s x23 appears to be i2 when all of the categories within a group are leaf
categories that directly represent data values for a variable (e.g. in a frequency table or
crosstabulation, a group of values in a variable being tabulated) and i0 otherwise. A writer
may safely write a constant 0 in this field.
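Once categories have been parsed into a tree, two traversals are useful: listing the leaves in file (display) order, which need not match leaf-index order after the user rearranges categories, and expanding merged groups so their children appear in their place. A Python sketch with hypothetical in-memory classes (the Value in name is simplified to a string):

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Leaf:
    name: str
    leaf_index: int

@dataclass
class Group:
    name: str
    merge: bool
    subcategories: List["Category"] = field(default_factory=list)

Category = Union[Leaf, Group]

def leaves_in_file_order(categories):
    """Collect Leaf nodes depth-first, in the order they appear in the file."""
    out = []
    for c in categories:
        if isinstance(c, Leaf):
            out.append(c)
        else:
            out.extend(leaves_in_file_order(c.subcategories))
    return out

def visible_children(categories):
    """Replace each merged group by its (recursively expanded) children."""
    out = []
    for c in categories:
        if isinstance(c, Group) and c.merge:
            out.extend(visible_children(c.subcategories))
        else:
            out.append(c)
    return out
```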
2.2.11 Axes
After the dimensions comes the assignment of each dimension to one of the axes: layers,
rows, and columns.
Axes =>
int32[n-layers] int32[n-rows] int32[n-columns]
int32*[n-layers] int32*[n-rows] int32*[n-columns]
The values of n-layers, n-rows, and n-columns each specifies the number of dimensions
displayed in layers, rows, and columns, respectively. Any of them may be zero. Their values
sum to n-dimensions from Dimensions (see Section 2.2.9 [SPV Light Member Dimensions],
page 50).
The following n-dimensions integers, in three groups, are a permutation of the 0-based
dimension numbers. The first n-layers integers specify each of the dimensions represented
by layers, the next n-rows integers specify the dimensions represented by rows, and the final
n-columns integers specify the dimensions represented by columns. When there is more
than one dimension of a given kind, the inner dimensions are given first. (For the layer axis,
this means that the first dimension is at the bottom of the list and the last dimension is at
the top when the current layer is displayed.)
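A Python sketch of reading Axes and checking the permutation property follows. It assumes int32 is little-endian, and parse_axes is a hypothetical name; n_dims is the n-dims value from Dimensions.

```python
import struct

def parse_axes(buf, off, n_dims):
    """Parse Axes at off; return (layers, rows, columns, new_off).

    Each list holds 0-based dimension numbers, innermost dimension first.
    """
    n_layers, n_rows, n_columns = struct.unpack_from("<3i", buf, off)
    off += 12
    if min(n_layers, n_rows, n_columns) < 0 or n_layers + n_rows + n_columns != n_dims:
        raise ValueError("axis counts do not sum to n-dims")
    dims = list(struct.unpack_from(f"<{n_dims}i", buf, off))
    off += 4 * n_dims
    if sorted(dims) != list(range(n_dims)):
        raise ValueError("dimension numbers are not a permutation")
    layers = dims[:n_layers]
    rows = dims[n_layers:n_layers + n_rows]
    columns = dims[n_layers + n_rows:]
    return layers, rows, columns, off
```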
2.2.12 Cells
The final part of an SPV light member contains the actual data.
Cells => int32[n-cells] Cell*[n-cells]
Cell => int64[index] v1(00?) Value
A Cell consists of an index and a Value. Suppose there are $d$ dimensions, numbered
1 through $d$ in the order given in the Dimensions previously, and that dimension $i$ has $n_i$
categories. Consider the cell at coordinates $x_i$, $1 \le i \le d$, and note that $0 \le x_i < n_i$. Then
the index is calculated by the following algorithm:
let index = 0
for each i from 1 to d:
    index = (n_i × index) + x_i
For example, suppose there are 3 dimensions with 3, 4, and 5 categories, respectively.
The cell at coordinates (1, 2, 3) has index 5 × (4 × (3 × 0 + 1) + 2) + 3 = 33. Within a given
dimension, the index is the leaf-index in a Leaf.
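The algorithm above is a mixed-radix encoding, so it can be inverted with repeated division. A Python sketch (function names hypothetical), where counts holds $n_i$ in Dimensions order and coords holds each cell's leaf-index per dimension:

```python
def cell_index(counts, coords):
    """Encode per-dimension leaf indexes into a Cell index."""
    if len(counts) != len(coords):
        raise ValueError("one coordinate per dimension")
    index = 0
    for n, x in zip(counts, coords):
        if not 0 <= x < n:
            raise ValueError("coordinate out of range")
        index = n * index + x
    return index

def cell_coords(counts, index):
    """Inverse of cell_index: peel off the last dimension first."""
    coords = []
    for n in reversed(counts):
        index, x = divmod(index, n)
        coords.append(x)
    if index != 0:
        raise ValueError("index out of range")
    return coords[::-1]
```

For the example above, cell_index([3, 4, 5], [1, 2, 3]) is 33 and cell_coords([3, 4, 5], 33) recovers [1, 2, 3].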
2.2.13 Value
Value is used throughout the SPV light member format. It boils down to a number or a
string.
Value => 00? 00? 00? 00? RawValue
RawValue =>
01 ValueMod int32[format] double[x]
| 02 ValueMod int32[format] double[x]
string[var-name] string[value-label] byte[show]
| 03 string[local] ValueMod string[id] string[c] bool[fixed]
| 04 ValueMod int32[format] string[value-label] string[var-name]
byte[show] string[s]
| 05 ValueMod string[var-name] string[var-label] byte[show]
| 06 string[local] ValueMod string[id] string[c]
| ValueMod string[template] int32[n-args] Argument*[n-args]
Argument =>
i0 Value
| int32[x] i0 Value*[x] /* x > 0 */
There are several possible encodings, which one can distinguish by the first nonzero byte
in the encoding.
01 The numeric value x, intended to be presented to the user formatted according
to format, which is about the same as the format described for system files (see
[System File Output Formats], page 8). The exception is that format 40 is not
MTIME but instead approximately a synonym for F format with a different
rule for whether a value is shown in scientific notation: a value in format 40 is
shown in scientific notation if and only if it is nonzero and its magnitude is less
than small (see Section 2.2.8 [SPV Light Member Formats], page 47).
Most commonly, format has width 40 (the maximum).
An x with the maximum negative double value -DBL_MAX represents the system-
missing value SYSMIS. (HIGHEST and LOWEST have not been observed.) See
Chapter 1 [System File Format], page 2, for more about these special values.
02 Similar to 01, with the additional information that x is a value of variable
var-name and has value label value-label. Both var-name and value-label
can be the empty string, the latter very commonly.
show determines whether to show the numeric value or the value label. A
value of 1 means to show the value, 2 to show the label, 3 to show both, and
0 means to use the default specified in show-values (see Section 2.2.8 [SPV
Light Member Formats], page 47).
03 A text string, in two forms: c is in English, and sometimes abbreviated or
obscure, and local is localized to the user’s locale. In an English-language
locale, the two strings are often the same, and in the cases where they differ,
local is more appropriate for a user interface, e.g. c of “Not a PxP table for
MCN...” versus local of “Computed only for a PxP table, where P must be
greater than 1.”
c and local are always either both empty or both nonempty.
id is a brief identifying string whose form seems to resemble a programming
language identifier, e.g. cumulative_percent or factor_14. It is not unique.
fixed is 00 for text taken from user input, such as syntax fragments, expressions,
file names, data set names, and 01 for fixed text strings such as names of
procedures or statistics. In the former case, id is always the empty string; in
the latter case, id is still sometimes empty.
04 The string value s, intended to be presented to the user formatted according to
format. The format for a string is not too interesting, and the corpus contains
many clearly invalid formats like A16.39 or A255.127 or A134.1, so readers
should probably entirely disregard the format. PSPP only checks format to
distinguish AHEX format.
s is a value of variable var-name and has value label value-label. var-name
is never empty but value-label is commonly empty.
show has the same meaning as in the encoding for 02.
05 Variable var-name with variable label var-label. In the corpus, var-name is
rarely empty and var-label is often empty.
show determines whether to show the variable name or the variable label. A
value of 1 means to show the name, 2 to show the label, 3 to show both, and 0
means to use the default specified in show-variables (see Section 2.2.8 [SPV
Light Member Formats], page 47).
06 Similar to type 03, with fixed assumed to be true.
otherwise When the first byte of a RawValue is not one of the above, the RawValue starts
with a ValueMod, whose syntax is described in the next section. (A ValueMod
always begins with byte 31 or 58.)
This case is a template string, analogous to printf, followed by one or more
Arguments, each of which has one or more values. The template string is copied
directly into the output except for the following special syntax,
\%
\:
\[
\] Each of these expands to the character following ‘\’, to escape
characters that have special meaning in template strings. These
are effective inside and outside the [...] syntax forms described
below.
\n Expands to a new-line, inside or outside the [...] forms described
below.
^i Expands to a formatted version of argument i, which must have
only a single value. For example, ^1 expands to the first argument’s
value.
[:a:]i Expands a for each of the values in i. a should contain one or more
^j conversions, which are drawn from the values for argument i in
order. Some examples from the corpus:
[:^1:]1 All of the values for the first argument, concatenated.
[:^1\n:]1
Expands to the values for the first argument, each
followed by a new-line.
[:^1 = ^2:]2
Expands to x = y where x is the second argument’s first
value and y is its second value. (This would be used
only if the argument has two values. If there were more
values, the second and third values would be directly
concatenated, which would look funny.)
[a:b:]i This extends the previous form so that the first values are expanded
using a and later values are expanded using b. For an unknown
reason, within a the ^j conversions are instead written as %j. Some
examples from the corpus:
[%1:*^1:]1
Expands to all of the values for the first argument,
separated by ‘*’.
[%1 = %2:, ^1 = ^2:]1
Given appropriate values for the first argument,
expands to X = 1, Y = 2, Z = 3.
[%1:, ^1:]1
Given appropriate values, expands to 1, 2, 3.
The template string is localized to the user’s locale.
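The template rules above can be sketched in Python. This is one reading of the description, not PSPP's implementation. Several choices are assumptions: arguments are modeled as lists of already-formatted strings; one copy of a bracketed template consumes as many values as the highest index it references; the first template applies only to the first copy when it is nonempty; and out-of-range references expand to nothing. Together, these choices reproduce the behavior described for [:^1 = ^2:]2 with three values.

```python
def _read_int(t, i):
    """Parse decimal digits at t[i:]; return (value, next index), 0 if none."""
    j = i
    while j < len(t) and t[j].isdigit():
        j += 1
    return (int(t[i:j]) if j > i else 0), j

def _split_inner(t, i):
    """Return (text up to the next unescaped ':', index just past it)."""
    start = i
    while i < len(t):
        if t[i] == "\\" and i + 1 < len(t):
            i += 2
        elif t[i] == ":":
            return t[start:i], i + 1
        else:
            i += 1
    return t[start:i], i

def _expand_inner(t, escape, values, out):
    """Expand one copy of an inner template; return how many values it used."""
    used, i = 0, 0
    while i < len(t):
        if t[i] == "\\" and i + 1 < len(t):
            out.append("\n" if t[i + 1] == "n" else t[i + 1])
            i += 2
        elif t[i] == escape:
            n, i = _read_int(t, i + 1)
            if 1 <= n <= len(values):
                out.append(values[n - 1])
                used = max(used, n)
        else:
            out.append(t[i])
            i += 1
    return used

def expand_template(t, args):
    """Expand template t; args[k] holds the values of argument k + 1."""
    out, i = [], 0
    while i < len(t):
        c = t[i]
        if c == "\\" and i + 1 < len(t):
            out.append("\n" if t[i + 1] == "n" else t[i + 1])
            i += 2
        elif c == "^":
            n, i = _read_int(t, i + 1)
            if 1 <= n <= len(args) and args[n - 1]:
                out.append(args[n - 1][0])
        elif c == "[":
            first, i = _split_inner(t, i + 1)
            rest, i = _split_inner(t, i)
            if i < len(t) and t[i] == "]":
                i += 1
            n, i = _read_int(t, i)
            if not 1 <= n <= len(args):
                continue
            values, pos = args[n - 1], 0
            while pos < len(values):
                use_first = pos == 0 and first != ""
                used = _expand_inner(first if use_first else rest,
                                     "%" if use_first else "^",
                                     values[pos:], out)
                if used == 0:
                    break
                pos += used
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

For example, expand_template("[%1:, ^1:]1", [["1", "2", "3"]]) yields “1, 2, 3”.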
A writer may safely omit all of the optional 00 bytes at the beginning of a Value, except
that it should write a single 00 byte before a templated Value.
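The numeric encodings 01 and 02 carry a packed format and a double. The Python sketch below assumes that format is packed like system-file print and write formats, with the format type in bits 16 to 23, the width in bits 8 to 15, and the number of decimal places in bits 0 to 7. It also checks for SYSMIS, which is -DBL_MAX (sys.float_info.max is DBL_MAX in Python). Function names are hypothetical.

```python
import sys

SYSMIS = -sys.float_info.max   # -DBL_MAX

def decode_format(fmt):
    """Split a packed format int32 into (type, width, decimals)."""
    return (fmt >> 16) & 0xFF, (fmt >> 8) & 0xFF, fmt & 0xFF

def is_sysmis(x):
    """True if x is the system-missing value."""
    return x == SYSMIS
```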
2.2.14 ValueMod
A ValueMod can specify special modifications to a Value.
ValueMod =>
58
| 31
int32[n-refs] int16*[n-refs]
int32[n-subscripts] string*[n-subscripts]
v1(00 (i1 | i2) 00? 00? int32 00? 00?)
v3(count(TemplateString StylePair))
StylePair =>
(31 FontStyle | 58)
(31 CellStyle | 58)
FontStyle =>
bool[bold] bool[italic] bool[underline] bool[show]
string[fg-color] string[bg-color]
string[typeface] byte[size]
CellStyle =>
int32[halign] int32[valign] double[decimal-offset]
int16[left-margin] int16[right-margin]
int16[top-margin] int16[bottom-margin]
A ValueMod that begins with “31” specifies special modifications to a Value.
Each of the n-refs integers is a reference to a Footnote (see Section 2.2.3 [SPV Light
Member Footnotes], page 43) by 0-based index. Footnote markers are shown appended to
the main text of the Value, as superscripts or subscripts.
The subscripts, if present, are strings to append to the main text of the Value, as
subscripts. Each subscript text is a brief indicator, e.g. ‘a’ or ‘b’, with its meaning indicated
by the table caption. When multiple subscripts are present, they are displayed separated
by commas.
The id inside the TemplateString, if present, is a template string for substitutions
using the syntax explained previously. It appears to be an English-language version of the
localized template string in the Value in which the TemplateString is nested. A writer may
safely omit the optional fixed data in TemplateString.
FontStyle and CellStyle, if present, change the style for this individual Value. In
FontStyle, bold, italic, and underline control the particular style. show is ordinarily
1; if it is 0, then the cell data is not shown. fg-color and bg-color are strings in the
format #rrggbb, e.g. #ff0000 for red or #ffffff for white. The empty string is occasionally
observed also. The size is a font size in units of 1/128 inch.
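A Python sketch for interpreting the FontStyle color and size fields follows (function names hypothetical). It maps the occasionally observed empty color string to None, and converts 1/128-inch units to points at 72 points per inch.

```python
def parse_color(text):
    """'#rrggbb' -> (r, g, b); the empty string -> None."""
    if text == "":
        return None
    if len(text) != 7 or text[0] != "#":
        raise ValueError(f"unexpected color: {text!r}")
    return tuple(int(text[i:i + 2], 16) for i in (1, 3, 5))

def font_size_points(size):
    """Convert a FontStyle size in 1/128-inch units to points."""
    return size * 72 / 128
```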
In CellStyle, halign is 0 for center, 2 for left, 4 for right, 6 for decimal, 0xffffffad for
mixed. For decimal alignment, decimal-offset is the decimal point’s offset from the
right side of the cell, in pt (see Section 2.2 [SPV Light Detail Member Format], page 40).
valign specifies vertical alignment: 0 for center, 1 for top, 3 for bottom. left-margin,
right-margin, top-margin, and bottom-margin are in pt.
2.3.1 Metadata
Metadata =>
int32[n-values] int32[n-variables] int32[data-offset]
vAF(byte*28[source-name])
vB0(byte*64[source-name] int32[x])
A data source has n-variables variables, each with n-values data values.
source-name is a 28- or 64-byte string padded on the right with 0-bytes. The names that
appear in the corpus are very generic: usually tableData for pivot table data or source0
for chart data.
A given Metadata’s data-offset is the offset, in bytes, from the beginning of the
member to the start of the corresponding Data. This allows programs to skip to the beginning of
the data for a particular source. In every case in the corpus, the Data follow the Metadata
in the same order, but it is important to use data-offset instead of reading sequentially
through the file because of the exception described below.
One SPV file in the corpus has legacy binary members with version 0xb0 but a 28-
byte source-name field (and only a single source). In practice, this means that the 64-
byte source-name used in version 0xb0 has a lot of 0-bytes in the middle followed by the
variable-name of the following Data. As long as a reader treats the first 0-byte in the
source-name as terminating the string, it can properly interpret these members.
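A Python sketch of reading one Metadata record follows. It truncates source-name at its first 0-byte, as recommended above. Assumptions: integers are little-endian, version is the member's version byte (0xAF or 0xB0), and the dictionary keys are hypothetical.

```python
import struct

def parse_metadata(buf, off, version):
    """Parse one Metadata record at off; return (fields, offset after it)."""
    n_values, n_variables, data_offset = struct.unpack_from("<3i", buf, off)
    off += 12
    width = 64 if version == 0xB0 else 28
    raw_name = buf[off:off + width]
    off += width
    x = None
    if version == 0xB0:
        (x,) = struct.unpack_from("<i", buf, off)
        off += 4
    # Stop at the first 0-byte, which also copes with the 0xB0 member
    # that actually uses a 28-byte name followed by unrelated bytes.
    source_name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
    return {"n_values": n_values, "n_variables": n_variables,
            "data_offset": data_offset, "source_name": source_name, "x": x}, off
```

A reader would then seek to data_offset for each source instead of assuming the Data immediately follow.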
The meaning of x in version 0xb0 is unknown.
the numeric data, one double per datum. A double with the maximum negative double
-DBL_MAX represents the system-missing value SYSMIS.
src/output/spv/detail-xml.grammar in the PSPP source tree for the full grammar that
it uses for parsing.
The important elements of the detail XML format are:
• Variables. See Section 2.4.2 [SPV Detail Variable Elements], page 61.
• Assignment of variables to axes. A variable can appear as columns, or rows, or layers.
The faceting element and its sub-elements describe this assignment.
• Styles and other annotations.
This description is not detailed enough to write legacy tables. Instead, write tables in
the light binary format.
extension[visualization_extension]
:numRows=int?
:showGridline=bool?
:minWidthSet=(true)?
:maxWidthSet=(true)?
=> EMPTY
layerController
:source=(tableData)
:target=ref label?
=> EMPTY
The visualization element is the root of detail XML member. It has the following
attributes:
creator [Attribute]
The version of the software that created this SPV file, as a string of the form xxyyzz,
which represents software version xx.yy.zz, e.g. 160001 is version 16.0.1. The corpus
includes major versions 16 through 19.
date [Attribute]
The date on which the file was created, as a string of the form YYYY-MM-DD.
lang [Attribute]
The locale used for output, in Windows format, which is similar to the format used in
Unix with the underscore replaced by a hyphen, e.g. en-US, en-GB, el-GR, sr-Cryl-
RS.
name [Attribute]
The title of the pivot table, localized to the output language.
style [Attribute]
The base style for the pivot table. In every example in the corpus, the style element
has no attributes other than id.
type [Attribute]
A floating-point number. The meaning is unknown.
version [Attribute]
The visualization schema version number. In the corpus, the value is one of 2.4, 2.5,
2.7, and 2.8.
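The creator and date attributes have simple fixed layouts. The following Python sketch parses both; the function names are hypothetical.

```python
from datetime import date

def parse_creator(creator):
    """'160001' -> (16, 0, 1): two digits each for xx, yy, and zz."""
    if len(creator) != 6 or not creator.isdigit():
        raise ValueError(f"unexpected creator string: {creator!r}")
    return int(creator[0:2]), int(creator[2:4]), int(creator[4:6])

def parse_visualization_date(text):
    """YYYY-MM-DD is ISO 8601, which date.fromisoformat accepts."""
    return date.fromisoformat(text)
```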
The userSource element has no visible effect.
The extension element as a child of visualization has the following attributes.
numRows [Attribute]
An integer that presumably defines the number of rows in the displayed pivot table.
showGridline [Attribute]
Always set to false in the corpus.
minWidthSet [Attribute]
maxWidthSet [Attribute]
Always set to true in the corpus.
The extension element as a child of container has the following attribute.
combinedFootnotes [Attribute]
Meaning unknown.
A single variable’s data can be modified in two of the steps, if both valueMapEntry and
relabel are used. The following example from the corpus maps several integers to 2, then
maps 2 in turn to the string “Input”:
<derivedVariable categorical="true" dependsOn="dimension0categories"
id="dimension0group0map" value="map(dimension0group0)">
<stringFormat>
<relabel from="2" to="Input"/>
<relabel from="10" to="Missing Value Handling"/>
<relabel from="14" to="Resources"/>
<relabel from="0" to=""/>
<relabel from="1" to=""/>
<relabel from="13" to=""/>
</stringFormat>
<valueMapEntry from="2;3;5;6;7;8;9" to="2"/>
<valueMapEntry from="10;11" to="10"/>
<valueMapEntry from="14;15" to="14"/>
<valueMapEntry from="0" to="0"/>
<valueMapEntry from="1" to="1"/>
<valueMapEntry from="13" to="13"/>
</derivedVariable>
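The two-step mapping in this example can be sketched in Python with the standard xml.etree.ElementTree module. The function name is hypothetical, and comparing source values numerically as floats is an assumption.

```python
import xml.etree.ElementTree as ET

def derived_variable_mapper(xml_text):
    """Return a function that applies valueMapEntry, then relabel."""
    elem = ET.fromstring(xml_text)
    # Step 1: several source values (';'-separated) map to one value.
    value_map = {}
    for entry in elem.iter("valueMapEntry"):
        target = float(entry.get("to"))
        for source in entry.get("from").split(";"):
            value_map[float(source)] = target
    # Step 2: a value maps to its display string.
    relabels = {float(r.get("from")): r.get("to") for r in elem.iter("relabel")}

    def apply(x):
        x = value_map.get(x, x)
        return relabels.get(x, x)   # unmapped values pass through unchanged
    return apply
```

Applied to the example above, 5 becomes 2, which is then relabeled as “Input”.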
id [Attribute]
An id is always present because this element exists to be referenced from other
elements.
categorical [Attribute]
Always set to true.
source [Attribute]
Always set to tableData, the source-name in the corresponding tableData.bin
member (see Section 2.3.1 [SPV Legacy Member Metadata], page 57).
sourceName [Attribute]
The name of a variable within the source, corresponding to the variable-name in
the tableData.bin member (see Section 2.3.2 [SPV Legacy Member Numeric Data],
page 57).
label [Attribute]
The variable label, if any.
labelVariable [Attribute]
The variable-name of a variable whose string values correspond one-to-one with the
values of this variable and are suitable for use as value labels.
dependsOn [Attribute]
This attribute doesn’t affect the display of a table.
from [Attribute]
A source value, or multiple source values separated by semicolons, e.g. 0 or
13;14;15;16.
to [Attribute]
The target value, e.g. 0.
combinedFootnotes [Attribute]
Always set to true in the corpus.
from [Attribute]
An integer or a name like “dimension0”.
helpId [Attribute]
An identifier.
cellStyle [Attribute]
style [Attribute]
Each of these is the id of a style element (see Section 2.4.12 [SPV Detail style
Element], page 80). The former is the default style for individual cells, the latter for
the entire table.
part [Attribute]
The part of the table being located.
method [Attribute]
How the location is determined:
sizeToContent
Based on the natural size of the table. Observed only for parts height
and width.
attach Based on the location specified in target. Observed only for parts top
and bottom.
fixed Using the value in value. Observed only for parts top, bottom, and left.
same Same as the specified target. Observed only for part left.
min [Attribute]
Minimum size. Only observed with value 100pt. Only observed for part width.
target [Dependent]
Required when method is attach or same, not observed otherwise. This identifies an
element to attach to. Observed with the ID of title, footnote, graph, and other
elements.
value [Dependent]
Required when method is fixed, not observed otherwise. Observed values are 0%,
0px, 1px, and 3px on parts top and left, and 100% on part bottom.
layer
:variable=ref (sourceVariable | derivedVariable)
:value
:visible=bool?
:method[layer_method]=(nest)?
:titleVisible=bool?
=> EMPTY
The faceting element describes the row, column, and layer structure of the table.
Its cross child determines the row and column structure, and each layer child (if any)
represents a layer. Layers may appear before or after cross.
The cross element describes the row and column structure of the table. It has exactly
two children, the first of which describes the table’s columns and the second the table’s rows.
Each child is a nest element if the table has any dimensions along the axis in question,
otherwise a unity element.
A nest element contains one or more dimensions, listed from innermost to outermost,
each represented by variableReference child elements. Each variable in a dimension is
listed in order. See Section 2.4.2 [SPV Detail Variable Elements], page 61, for information
on the variables that comprise a dimension.
A nest can contain a single dimension, e.g.:
<nest>
<variableReference ref="dimension0categories"/>
<variableReference ref="dimension0group0"/>
<variableReference ref="dimension0"/>
</nest>
A nest can contain multiple dimensions, e.g.:
<nest>
<variableReference ref="dimension1categories"/>
<variableReference ref="dimension1group0"/>
<variableReference ref="dimension1"/>
<variableReference ref="dimension0categories"/>
<variableReference ref="dimension0"/>
</nest>
A nest may have no dimensions, in which case it still has one variableReference child,
which references a derivedVariable whose value attribute is constant(0). In the corpus,
such a derivedVariable has row or column, respectively, as its id. This is equivalent to
using a unity element in place of nest.
A variableReference element refers to a variable through its ref attribute.
Each layer element represents a dimension, e.g.:
<layer value="0" variable="dimension0categories" visible="true"/>
<layer value="dimension0" variable="dimension0" visible="false"/>
layer has the following attributes.
variable [Attribute]
Refers to a sourceVariable or derivedVariable element.
value [Attribute]
The value to select. For a category variable, this is always 0; for a data variable, it is
the same as the variable attribute.
visible [Attribute]
Whether the layer is visible. Generally, category layers are visible and data layers are
not, but sometimes this attribute is omitted.
method [Attribute]
When present, this is always nest.
tableLayout
:verticalTitlesInCorner=bool
:style=ref style?
:fitCells=(ticks both)?
=> EMPTY
The facetLayout element and its descendants control styling for the table.
Its tableLayout child has the following attributes:
verticalTitlesInCorner [Attribute]
If true, in the absence of corner text, row headings will be displayed in the corner.
style [Attribute]
Refers to a style element.
fitCells [Attribute]
Meaning unknown.
majorTicks
:labelAngle=int
:length=dimension
:style=ref style
:tickFrameStyle=ref style
:labelFrequency=int?
:stagger=bool?
=> gridline?
gridline
:style=ref style
:zOrder=int
=> EMPTY
Each facetLevel describes a variableReference or layer, and a table has one
facetLevel element for each such element. For example, an SPV detail member that
contains four variableReference elements and two layer elements will contain six
facetLevel elements.
In the corpus, facetLevel elements and the elements that they describe are always in
the same order. The correspondence may also be observed in two other ways. First, one
may use the level attribute, described below. Second, in the corpus, a facetLevel always
has an id that is the same as the id of the element it describes with _facetLevel appended.
One should not formally rely on this, of course, but it is usefully indicative.
level [Attribute]
A 1-based index into the variableReference and layer elements, e.g. a facetLevel
with a level of 1 describes the first variableReference in the SPV detail member,
and in a member with four variableReference elements, a facetLevel with a
level of 5 describes the first layer in the member.
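The level arithmetic can be sketched as follows. The helper name describe_facet_level is hypothetical, and it assumes the caller has already counted the variableReference and layer elements in the member.

```python
def describe_facet_level(level, n_variable_references, n_layers):
    """Map a 1-based facetLevel level to the element it describes.

    Returns ("variableReference", i) or ("layer", i), where i is 0-based.
    """
    index = level - 1
    if 0 <= index < n_variable_references:
        return ("variableReference", index)
    # Levels after the variableReference elements count the layers.
    index -= n_variable_references
    if 0 <= index < n_layers:
        return ("layer", index)
    raise ValueError(f"level {level} is out of range")
```

With four variableReference elements and two layer elements, describe_facet_level(5, 4, 2) returns ("layer", 0), matching the example in the text.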
gap [Attribute]
Always observed as 0pt.
Each facetLevel contains an axis, which in turn may contain a label for the
facetLevel (see Section 2.4.8 [SPV Detail label Element], page 69) and does contain a
majorTicks element.
labelAngle [Attribute]
Normally 0. The value -90 causes inner column or outer row labels to be rotated
vertically.
style [Attribute]
tickFrameStyle [Attribute]
Each refers to a style element. style is the style of the tick labels, tickFrameStyle
the style for the frames around the labels.
descriptionGroup
:target=ref faceting
:separator?
=> (description | text)+
text
:usesReference=int?
:definesReference=int?
:position=(subscript | superscript)?
:style=ref style
=> TEXT
This element represents a label on some aspect of the table.
style [Attribute]
textFrameStyle [Attribute]
Each of these refers to a style element. style is the style of the label text,
textFrameStyle the style for the frame around the label.
purpose [Attribute]
The kind of entity being labeled.
target [Attribute]
The id of an element being described. In the corpus, this is always faceting.
separator [Attribute]
A string to separate the description of multiple groups, if the target has more than
one. In the corpus, this is always a new-line.
Which Cells?
union => intersect+
where
:variable=ref (sourceVariable | derivedVariable)
:include
=> EMPTY
intersectWhere
:variable=ref (sourceVariable | derivedVariable)
:variable2=ref (sourceVariable | derivedVariable)
=> EMPTY
What Styles?
setStyle
:target=ref (labeling | graph | interval | majorTicks)
:style=ref style
=> EMPTY
setFormat
:target=ref (majorTicks | labeling)
:reset=bool?
=> format | numberFormat | stringFormat+ | dateTimeFormat | elapsedTimeFormat
setFrameStyle
:style=ref style
:target=ref majorTicks
=> EMPTY
The set* children of setCellProperties determine the styles to set.
When setCellProperties contains a setFormat whose target references a labeling
element, or if it contains a setStyle that references a labeling or interval element,
the setCellProperties sets the style for table cells. The format from the setFormat, if
present, replaces the cells’ format. The style from the setStyle that references labeling,
if present, replaces the label’s font and cell styles, except that the background color is taken
instead from the interval’s style, if present.
When setCellProperties contains a setFormat whose target references a majorTicks
element, or if it contains a setStyle whose target references a majorTicks, or if it contains
a setFrameStyle element, the setCellProperties sets the style for row or column
labels. In this case, the setCellProperties always contains a single where element whose
variable designates the variable whose labels are to be styled. The format from the
setFormat, if present, replaces the labels’ format. The style from the setStyle that references
majorTicks, if present, replaces the labels’ font and cell styles, except that the
background color is taken instead from the setFrameStyle’s style, if present.
When setCellProperties contains a setStyle whose target references a graph element,
and one that references a labeling element, and the union element contains
alternating, the setCellProperties sets the alternate foreground and background colors
for the data area. The foreground color is taken from the style referenced by the setStyle
that targets the graph, the background color from the setStyle for labeling.
A reader may ignore a setCellProperties that only contains setMetaData, as well as
setMetaData within other setCellProperties.
A reader may ignore a setCellProperties whose only set* child is a setStyle that
targets the graph element.
target [Attribute]
The id of an element whose style is to be set.
style [Attribute]
The id of a style element that identifies the style to set on the target.
target [Attribute]
Refers to an element whose style is to be set.
reset [Attribute]
If this is true, this format replaces the target’s previous format. If it is false, it
modifies the previous format.
whenNeeded
Use scientific notation when the number will not otherwise fit in the
available space.
true Always use scientific notation. Not observed in the corpus.
false Never use scientific notation. A number that won’t otherwise fit will be
replaced by an error indication (see the errorCharacter attribute). Not
observed in the corpus.
small [Attribute]
Only present when the scientific attribute is onlyForSmall, this is a numeric
magnitude below which the number will be formatted in scientific notation. The
values 0 and 0.0001 have been observed. The value 0 seems like a pathological
choice, since no real number has a magnitude less than 0; perhaps in practice such a
choice is equivalent to setting scientific to false.
prefix [Attribute]
suffix [Attribute]
Specifies a prefix or a suffix to apply to the formatted number. Only suffix has been
observed, with value ‘%’.
:quarterPrefix?
:quarterSuffix?
:showMonth=bool?
:monthFormat=(long | short | number | paddedNumber)?
:showWeek=bool?
:weekPadding=bool?
:weekSuffix?
:showDayOfWeek=bool?
:dayOfWeekAbbreviation=bool?
:dayPadding=bool?
:dayOfMonthPadding=bool?
:hourPadding=bool?
:minutePadding=bool?
:secondPadding=bool?
:showDay=bool?
:showHour=bool?
:showMinute=bool?
:showSecond=bool?
:showMillis=bool?
:dayType=(month | year)?
:hourFormat=(AMPM | AS_24 | AS_12)?
=> affix*
This element appears only in schema version 2.5 and earlier (see Section 2.4.1 [SPV
Detail visualization Element], page 59).
Data to be formatted in date formats is stored as strings in legacy data, in the format
yyyy-mm-ddTHH:MM:SS.SSS, and must be parsed and reformatted by the reader.
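A minimal Python sketch of the parsing step, assuming the string always includes the three-digit millisecond field shown above (parse_legacy_datetime is a hypothetical name):

```python
from datetime import datetime


def parse_legacy_datetime(text):
    """Parse a legacy date string such as 2020-09-15T22:15:30.250.

    The %f directive accepts the three-digit millisecond field and
    scales it to microseconds, so .250 becomes 250000 microseconds.
    """
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")
```

The resulting datetime can then be reformatted according to the attributes of the format element.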
The following attribute is required.
baseFormat [Attribute]
Specifies whether a date and time are both to be displayed, or just one of them.
Many of the attributes’ meanings are obvious. The following seem to be worth documenting.
separatorChars [Attribute]
Exactly four characters. In order, these are used for: decimal point, grouping, date
separator, time separator. Always ‘.,-:’.
mdyOrder [Attribute]
Within a date, the order of the days, months, and years. dayMonthYear is the only
observed value, but one would expect monthDayYear and yearMonthDay to be
reasonable as well.
showYear [Attribute]
yearAbbreviation [Attribute]
Whether to include the year and, if so, whether the year should be shown abbreviated,
that is, with only 2 digits. Each is true or false; only values of true and false,
respectively, have been observed.
showMonth [Attribute]
monthFormat [Attribute]
Whether to include the month (true or false) and, if so, how to format it.
monthFormat is one of the following:
long The full name of the month, e.g. in an English locale, September.
short The abbreviated name of the month, e.g. in an English locale, Sep.
number The number representing the month, e.g. 9 for September.
paddedNumber
A two-digit number representing the month, e.g. 09 for September.
Only values of true and short, respectively, have been observed.
dayType [Attribute]
This attribute is always month in the corpus, specifying that the day of the month
is to be displayed; a value of year is supposed to indicate that the day of the year,
where 1 is January 1, is to be displayed instead.
hourFormat [Attribute]
hourFormat, if present, is one of:
AMPM The time is displayed with an am or pm suffix, e.g. 10:15pm.
AS_24 The time is displayed in a 24-hour format, e.g. 22:15.
This is the only value observed in the corpus.
AS_12 The time is displayed in a 12-hour format, without distinguishing morning
or evening, e.g. 10:15.
hourFormat is sometimes present for elapsedTime formats, which is confusing since a
time duration does not have a concept of AM or PM. This might indicate a bug in the
code that generated the XML in the corpus, or it might indicate that elapsedTime
is sometimes used to format a time of day.
For a baseFormat of date, PSPP chooses a print format type based on the following
rules:
1. If showQuarter is true: QYR.
2. Otherwise, if showWeek is true: WKYR.
3. Otherwise, if mdyOrder is dayMonthYear:
a. If monthFormat is number or paddedNumber: EDATE.
b. Otherwise: DATE.
4. Otherwise, if mdyOrder is yearMonthDay: SDATE.
5. Otherwise, ADATE.
For a baseFormat of dateTime, PSPP uses YMDHMS if mdyOrder is yearMonthDay and
DATETIME otherwise. For a baseFormat of time, PSPP uses DTIME if showDay is true,
otherwise TIME if showHour is true, otherwise MTIME.
For a baseFormat of date, the chosen width is the minimum for the format type, adding
2 if yearAbbreviation is false or omitted. For other base formats, the chosen width is the
minimum for its type, plus 3 if showSecond is true, plus 4 more if showMillis is also true.
Decimals are 0 by default, or 3 if showMillis is true.
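The type, width, and decimal rules above can be summarized in the following sketch. It assumes the element's attributes are available as a dictionary of XML attribute strings, with an omitted boolean attribute treated as false. Because this section does not list the minimum width of each format type, min_width is supplied by the caller. The function names are hypothetical; this is an illustration of the rules, not PSPP's own code.

```python
def _is_true(attrs, name):
    # Attributes arrive as XML strings; an omitted attribute counts as false.
    return attrs.get(name) == "true"


def choose_format_type(base_format, attrs):
    """Pick a print format type name following the rules above."""
    if base_format == "date":
        if _is_true(attrs, "showQuarter"):
            return "QYR"
        if _is_true(attrs, "showWeek"):
            return "WKYR"
        order = attrs.get("mdyOrder")
        if order == "dayMonthYear":
            if attrs.get("monthFormat") in ("number", "paddedNumber"):
                return "EDATE"
            return "DATE"
        if order == "yearMonthDay":
            return "SDATE"
        return "ADATE"
    if base_format == "dateTime":
        if attrs.get("mdyOrder") == "yearMonthDay":
            return "YMDHMS"
        return "DATETIME"
    if base_format == "time":
        if _is_true(attrs, "showDay"):
            return "DTIME"
        if _is_true(attrs, "showHour"):
            return "TIME"
        return "MTIME"
    raise ValueError(f"unknown baseFormat {base_format!r}")


def choose_width_and_decimals(base_format, attrs, min_width):
    """Apply the width and decimal rules to the type's minimum width."""
    width = min_width
    decimals = 0
    if base_format == "date":
        # Add 2 when yearAbbreviation is false or omitted.
        if not _is_true(attrs, "yearAbbreviation"):
            width += 2
    else:
        if _is_true(attrs, "showSecond"):
            width += 3
            # The extra 4 applies only when showSecond is also true.
            if _is_true(attrs, "showMillis"):
                width += 4
        if _is_true(attrs, "showMillis"):
            decimals = 3
    return width, decimals
```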
baseFormat [Attribute]
Specifies whether a day and a time are both to be displayed, or just one of them.
The remaining attributes specify exactly how to display the elapsed time.
For baseFormat of time, PSPP converts this element to print format type DTIME; otherwise,
if showHour is true, to TIME; otherwise, to MTIME. The chosen width is the minimum
for the chosen type, adding 3 if showSecond is true, adding 4 more if showMillis is also
true. Decimals are 0 by default, or 3 if showMillis is true.
:weekSuffix?
:showDayOfWeek=bool?
:dayOfWeekAbbreviation=bool?
:hourPadding=bool?
:minutePadding=bool?
:secondPadding=bool?
:showDay=bool?
:showHour=bool?
:showMinute=bool?
:showSecond=bool?
:showMillis=bool?
:dayType=(month | year)?
:hourFormat=(AMPM | AS_24 | AS_12)?
:minimumIntegerDigits=int?
:maximumFractionDigits=int?
:minimumFractionDigits=int?
:useGrouping=bool?
:scientific=(onlyForSmall | whenNeeded | true | false)?
:small=real?
:prefix?
:suffix?
:tryStringsAsNumbers=bool?
:negativesOutside=bool?
=> relabel* affix*
This element is the union of all of the more-specific format elements. It is interpreted in
the same way as one of those format elements, using baseFormat to determine which kind
of format to use.
There are a few attributes not present in the more specific formats:
tryStringsAsNumbers [Attribute]
When this is true, it is supposed to indicate that string values should be parsed as
numbers and then displayed according to numeric formatting rules. However, in the
corpus it is always false.
negativesOutside [Attribute]
If true, the negative sign should be shown before the prefix; if false, it should be
shown after.
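To illustrate negativesOutside, this sketch assembles an already formatted magnitude with a prefix, a suffix, and a sign. The function apply_affixes is hypothetical, and the $ prefix in the usage below is only illustrative, since only a % suffix has been observed in the corpus.

```python
def apply_affixes(digits, negative, prefix="", suffix="", *, negatives_outside):
    """Combine a formatted magnitude such as "12.5" with its affixes."""
    sign = "-" if negative else ""
    if negatives_outside:
        # The negative sign goes before the prefix.
        return sign + prefix + digits + suffix
    # The negative sign goes after the prefix.
    return prefix + sign + digits + suffix
```

For example, apply_affixes("5", True, prefix="$", negatives_outside=True) returns "-$5", and the same call with negatives_outside=False returns "$-5".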
definesReference [Attribute]
This specifies the footnote number as a natural number: 1 for the first footnote, 2 for
the second, and so on.
position [Attribute]
Position for the footnote label. Always superscript.
suffix [Attribute]
Whether the affix is a suffix (true) or a prefix (false). Always true.
value [Attribute]
The text of the suffix or prefix. Typically a letter, e.g. a for footnote 1, b for footnote
2, and so on. The corpus contains other values: *, **, and a few that begin with at least
one comma: ,b, ,c, ,,b, and ,,c.
labeling
:style=ref style?
:variable=ref (sourceVariable | derivedVariable)
=> (formatting | format | footnotes)*
footnotes
:superscript=bool?
:variable=ref (sourceVariable | derivedVariable)
=> footnoteMapping*
Each footnoteMapping child of the footnotes element defines the footnote marker to be
its to attribute text for the footnote whose 1-based index is given in its definesReference
attribute.
color [Attribute]
In some cases, the text color; in others, the background color.
color2 [Attribute]
Not used.
labelAngle [Attribute]
Normally 0. The value -90 causes inner column or outer row labels to be rotated
vertically.
labelLocationHorizontal [Attribute]
Not used.
labelLocationVertical [Attribute]
The value positive corresponds to vertically aligning text to the top of a cell,
negative to the bottom, center to the middle.
generalProperties
:hideEmptyRows=bool?
:maximumColumnWidth=dimension?
:maximumRowWidth=dimension?
:minimumColumnWidth=dimension?
:minimumRowWidth=dimension?
:rowDimensionLabels=(inCorner | nested)?
=> EMPTY
footnoteProperties
:markerPosition=(superscript | subscript)?
:numberFormat=(alphabetic | numeric)?
=> EMPTY
any[cell_style]
:alternatingColor=color?
:alternatingTextColor=color?
=> style
style
:color=color?
:color2=color?
:font-family?
:font-size?
:font-style=(regular | italic)?
:font-weight=(regular | bold)?
:font-underline=(none | underline)?
:labelLocationVertical=(positive | negative | center)?
:margin-bottom=dimension?
:margin-left=dimension?
:margin-right=dimension?
:margin-top=dimension?
:textAlignment=(left | right | center | decimal | mixed)?
:decimal-offset=dimension?
=> EMPTY
any[border_style]
:borderStyleType=(none | solid | dashed | thick | thin | double)?
:color=color?
=> EMPTY
printingProperties
:printAllLayers=bool?
:rescaleLongTableToFitPage=bool?
:rescaleWideTableToFitPage=bool?
:windowOrphanLines=int?
:continuationText?
:continuationTextAtBottom=bool?
:continuationTextAtTop=bool?
:printEachLayerOnSeparatePage=bool?
=> EMPTY
The name attribute appears only in standalone .stt files (see Section 3.1 [SPSS
TableLook STT Format], page 84).
3.2.1 PTTableLook
PTTableLook =>
ff ff 00 00 "PTTableLook" (00|02)[version]
int16[flags]
00 00
bool[nested-row-labels] 00
bool[footnote-marker-subscripts] 00
i54 i18
In PTTableLook, version is 00 or 02. The only difference is that version 00 lacks
V2Styles (see Section 3.2.4 [V2Styles in SPSS TLO Files], page 87) and that version 02
includes it. Both TLO versions are seen in the wild.
Chapter 3: SPSS TableLook File Formats 85
3.2.2 PVSeparatorStyle
PVSeparatorStyle =>
ff ff 00 00 "PVSeparatorStyle" 00
Separator*4[sep1]
03 80 00
Separator*4[sep2]
Separator =>
case(
00 00
| 01 00 int32[color] int16[style] int16[width]
)[type]
PVSeparatorStyle contains eight Separators, in two groups. Each Separator represents
a border between pivot table elements. TLO and SPV files have the same concepts for
borders. See Section 2.2.5 [SPV Light Member Borders], page 44, for the treatment of
borders in SPV files.
A Separator’s type is 00 if the border is not drawn, 01 otherwise. For a border that is
drawn, color is the color that it is drawn in. style and width have the following meanings:
style = 0 and 0 ≤ width ≤ 3
An increasingly thick single line. SPV files only have three line thicknesses.
PSPP treats width 0 as a thin line, width 1 as a solid (normal width) line, and
width 2 or 3 as a thick line.
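The following sketch decodes one Separator and applies PSPP's treatment of style 0. It assumes little-endian integers and leaves other style values uninterpreted, since only style 0 is described here. The function names and the returned dictionary keys are hypothetical.

```python
import struct


def line_thickness(style, width):
    """PSPP's reading of style 0; other styles return None in this sketch."""
    if style == 0 and 0 <= width <= 3:
        return ("thin", "solid", "thick", "thick")[width]
    return None


def parse_separator(data, offset=0):
    """Parse one Separator starting at offset; return (border, new_offset).

    border is None when the border is not drawn.
    """
    kind, pad = data[offset], data[offset + 1]
    if pad != 0:
        raise ValueError("expected 00 after the Separator type byte")
    offset += 2
    if kind == 0:
        return None, offset
    if kind != 1:
        raise ValueError(f"unknown Separator type {kind:#04x}")
    # int32 color, int16 style, int16 width (assumed little-endian).
    color, style, width = struct.unpack_from("<Ihh", data, offset)
    border = {"color": color, "style": style, "width": width,
              "line": line_thickness(style, width)}
    return border, offset + 8
```

Reading sep1 or sep2 of PVSeparatorStyle would then amount to calling parse_separator four times, passing each returned offset to the next call.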
PVTextStyle =>
ff ff 00 00 "PVTextStyle" 00
AreaStyle[title-style] MostAreas*7[most-areas]
MostAreas =>
06 80
AreaColor[color] 08 80 00 AreaStyle[style]
These sections hold the styling and coloring for each of the 8 areas in a pivot table.
They are conceptually similar to the area style information in SPV light members (see
Section 2.2.4 [SPV Light Member Areas], page 44).
The styling and coloring for the title area is split between PVCellStyle and PVTextStyle:
the former holds title-color, the latter holds title-style. The style for the remaining
7 areas is in most-areas in PVTextStyle, in the following order: layers, corner, row labels,
column labels, data, caption, and footer.
AreaColor =>
00 01 00 int32[color10] int32[color0] byte[shading] 00
AreaColor represents the background color of an area. TLO files, but not SPV files,
describe backgrounds that are a shaded combination of two colors: shading of 0 is pure
color0, shading of 10 is pure color10, and values in between mix pixels of the two colors
in linear proportion. PSPP does not implement shading, so for 1 ≤ shading ≤ 9 it
interpolates RGB values between colors to arrive at an intermediate shade.
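A sketch of such an interpolation, assuming that each color stores one 8-bit channel in each of its three low-order bytes. Each byte is interpolated independently, so the result does not depend on channel order. Rounding to the nearest integer is also an assumption, and PSPP's exact arithmetic may differ. The function name blend_area_color is hypothetical.

```python
def blend_area_color(color0, color10, shading):
    """Approximate a shaded TLO background with a single color.

    Bits above the low 24 are ignored.
    """
    if not 0 <= shading <= 10:
        raise ValueError("shading must be between 0 and 10")
    result = 0
    for shift in (0, 8, 16):
        c0 = (color0 >> shift) & 0xFF
        c10 = (color10 >> shift) & 0xFF
        # Weighted average of the two channels, rounded to nearest.
        mixed = (c0 * (10 - shading) + c10 * shading + 5) // 10
        result |= mixed << shift
    return result
```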
AreaStyle =>
3.2.4 V2Styles
V2Styles =>
Separator*11[sep3]
byte[continuation-len] byte*[continuation-len][continuation]
int32[min-col-width] int32[max-col-width]
int32[min-row-height] int32[max-row-height]
This final, optional, part of the TLO file format contains some additional style information.
It begins with sep3, which represents the following borders within the pivot table, by
index:
0 Title.
1. . . 4 Left, right, top, and bottom inner frame.
5. . . 8 Left, right, top, and bottom outer frame.
9, 10 Left and top of data area.
When V2Styles is absent, the inner frame borders default to a solid line and the others
listed above to no line.
continuation is the string that goes at the top or bottom of a table broken across pages.
When V2Styles is absent, the default is (Cont.).
min-col-width is the minimum width that a column will be assigned automatically.
max-col-width is the maximum width that a column will be assigned to accommodate a
long column label. min-row-height and max-row-height are a similar range for the width
of row labels. All of these measurements are in points. When V2Styles is absent, the
defaults are 36 for min-col-width and min-row-height, 72 for max-col-width, and 120
for max-row-height.
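The part of V2Styles that follows sep3 could be read as shown below, together with the defaults that apply when V2Styles is absent. Little-endian integers and a Latin-1 continuation string are assumptions, as are the function name and dictionary keys.

```python
import struct

# Values that apply when V2Styles is absent.
V2_DEFAULTS = {
    "continuation": "(Cont.)",
    "min_col_width": 36,
    "max_col_width": 72,
    "min_row_height": 36,
    "max_row_height": 120,
}


def parse_v2styles_tail(data, offset):
    """Parse the fields that follow sep3; return (fields, new_offset)."""
    length = data[offset]
    offset += 1
    raw = data[offset:offset + length]
    if len(raw) != length:
        raise ValueError("continuation string runs past end of data")
    offset += length
    # Four int32 values, in points (assumed little-endian).
    min_col, max_col, min_row, max_row = struct.unpack_from("<4i", data, offset)
    fields = {
        "continuation": raw.decode("latin-1"),
        "min_col_width": min_col,
        "max_col_width": max_col,
        "min_row_height": min_row,
        "max_row_height": max_row,
    }
    return fields, offset + 16
```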
Example
Consider the password ‘pspp’. password is:
0000 70 73 70 70 00 00 00 00 00 00 00 00 00 00 00 00 |pspp............|
0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
cmac is:
0000 3e da 09 8e 66 04 d4 fd f9 63 0c 2c a8 6f b0 45
Chapter 4: Encrypted File Wrappers 90
4. Let al be the least significant 4 bits of a. Find the line in the table below that has al on
the left side. The right side of the line is a set of possible values for the least significant
4 bits of the decoded byte.
03cf ⇒ 0145
12de ⇒ 2367
478b ⇒ 89cd
569a ⇒ abef
5. Let bl be the least significant 4 bits of b. Find the line in the table below that has
bl on the left side. The right side of the line is a set of possible values for the least
significant 4 bits of the decoded byte. Together with the results of the previous step,
only a single possibility is left.
03cf ⇒ 028a
12de ⇒ 139b
478b ⇒ 46ce
569a ⇒ 57df
Example
Consider the encoded character pair ‘-|’. a is 0x2d and b is 0x7c, so ah is 2, bh is 7, al is
0xd, and bl is 0xc. ah means that the most significant four bits of the decoded character are
2, 3, 6, or 7, and bh means that they are 4, 6, 0xc, or 0xe. The single possibility in common
is 6, so the most significant four bits are 6. Similarly, al means that the least significant
four bits are 2, 3, 6, or 7, and bl means they are 0, 2, 8, or 0xa, so the least significant four
bits are 2. The decoded character is therefore 0x62, the letter ‘b’.
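Steps 4 and 5 amount to intersecting two candidate sets. The sketch below transcribes the two tables and reproduces the low-nibble part of the example above. It covers only the low nibble because the tables for the high nibble are not part of this excerpt. The names are hypothetical.

```python
# Step 4 and step 5 tables: (left-side nibbles, candidate low nibbles).
STEP4 = [("03cf", "0145"), ("12de", "2367"), ("478b", "89cd"), ("569a", "abef")]
STEP5 = [("03cf", "028a"), ("12de", "139b"), ("478b", "46ce"), ("569a", "57df")]


def _candidates(table, nibble):
    """Return the candidate set on the line whose left side has nibble."""
    for left, right in table:
        if nibble in {int(c, 16) for c in left}:
            return {int(c, 16) for c in right}
    raise ValueError(f"nibble {nibble:#x} not found in table")


def decode_low_nibble(a, b):
    """Recover the low 4 bits of a decoded byte from encoded bytes a and b."""
    common = _candidates(STEP4, a & 0xF) & _candidates(STEP5, b & 0xF)
    if len(common) != 1:
        raise ValueError("expected exactly one candidate")
    return common.pop()
```

For the pair ‘-|’, decode_low_nibble(0x2d, 0x7c) intersects {2, 3, 6, 7} with {0, 2, 8, 0xa} and returns 2.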
• Data.
Most records are identified by a single-character tag code. The file header and version
info record do not have a tag.
Other than these single-character codes, there are three types of fields in a portable file:
floating-point, integer, and string. Floating-point fields have the following format:
• Zero or more leading spaces.
• Optional asterisk (‘*’), which indicates a missing value. The asterisk must be followed
by a single character, generally a period (‘.’), but it appears that other characters may
also be possible. This completes the specification of a missing value.
• Optional minus sign (‘-’) to indicate a negative number.
• A whole number, consisting of one or more base-30 digits: ‘0’ through ‘9’ plus capital
letters ‘A’ through ‘T’.
• Optional fraction, consisting of a radix point (‘.’) followed by one or more base-30
digits.
• Optional exponent, consisting of a plus or minus sign (‘+’ or ‘-’) followed by one or
more base-30 digits.
• A forward slash (‘/’).
Integer fields take a form identical to floating-point fields, but they may not contain a
fraction.
String fields take the form of an integer field having value n, followed by exactly n
characters, which are the string content.
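The field syntax above can be sketched as a small parser over text that has already been translated from the file's character set. It assumes that the exponent scales the value by powers of 30. The function names are hypothetical.

```python
BASE30 = "0123456789ABCDEFGHIJKLMNOPQRST"


def _read_base30(text, pos):
    """Read one or more base-30 digits at pos; return (int_value, new_pos)."""
    start = pos
    value = 0
    while pos < len(text) and text[pos] in BASE30:
        value = value * 30 + BASE30.index(text[pos])
        pos += 1
    if pos == start:
        raise ValueError(f"expected a base-30 digit at offset {start}")
    return value, pos


def parse_pfm_float(text, pos=0):
    """Parse a floating-point field; return (value, new_pos).

    value is None for a missing value.
    """
    while pos < len(text) and text[pos] == " ":
        pos += 1
    if pos < len(text) and text[pos] == "*":
        # '*' plus one more character (usually '.') is a complete missing value.
        return None, pos + 2
    negative = pos < len(text) and text[pos] == "-"
    if negative:
        pos += 1
    whole, pos = _read_base30(text, pos)
    value = float(whole)
    if pos < len(text) and text[pos] == ".":
        frac_start = pos + 1
        frac, pos = _read_base30(text, frac_start)
        value += frac / 30 ** (pos - frac_start)
    if pos < len(text) and text[pos] in "+-":
        exp_negative = text[pos] == "-"
        exponent, pos = _read_base30(text, pos + 1)
        value *= 30.0 ** (-exponent if exp_negative else exponent)
    if pos >= len(text) or text[pos] != "/":
        raise ValueError(f"expected '/' at offset {pos}")
    return (-value if negative else value), pos + 1


def parse_pfm_int(text, pos=0):
    """Parse an integer field: float syntax without a fraction."""
    start = pos
    value, pos = parse_pfm_float(text, pos)
    if value is None or "." in text[start:pos] or value != int(value):
        raise ValueError(f"expected an integer field at offset {start}")
    return int(value), pos


def parse_pfm_string(text, pos=0):
    """Parse a string field: an integer n followed by n characters."""
    length, pos = parse_pfm_int(text, pos)
    if length < 0 or pos + length > len(text):
        raise ValueError("string field runs past end of input")
    return text[pos:pos + length], pos + length
```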
64–73
Digits ‘0’ through ‘9’.
74–99
Capital letters ‘A’ through ‘Z’.
100–125
Lowercase letters ‘a’ through ‘z’.
126
Space.
127–130
Symbols .<(+
131
Solid vertical pipe.
132–142
Symbols &[]!$*);^-/
143
Broken vertical pipe.
144–150
Symbols ,%_>?‘:
151
British pound symbol.
152–155
Symbols @’=".
156
Less than or equal symbol.
157
Empty box.
158
Plus or minus.
159
Filled box.
160
Degree symbol.
161
Dagger.
162
Symbol ‘~’.
Chapter 5: Portable File Format 95
163
En dash.
164
Lower left corner box draw.
165
Upper left corner box draw.
166
Greater than or equal symbol.
167–176
Superscript ‘0’ through ‘9’.
177
Lower right corner box draw.
178
Upper right corner box draw.
179
Not equal symbol.
180
Em dash.
181
Superscript ‘(’.
182
Superscript ‘)’.
183
Horizontal dagger (?).
184–186
Symbols ‘{}\’.
187
Cents symbol.
188
Centered dot, or bullet.
189–255
Reserved.
Symbols that are not defined in a particular character set are set to the same value as
symbol 64; i.e., to ‘0’.
The 8-byte tag string consists of the exact characters SPSSPORT in the portable file’s
character set, which can be used to verify that the file is indeed a portable file.
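A sketch of the tag check. It assumes charset is the file's 256-byte character set table, indexed by the positions listed above, so that capital letter k (counting A as 0) is at position 74 + k. The function names are hypothetical.

```python
def expected_tag(charset):
    """Return the 8-byte SPSSPORT tag as encoded by this file's charset."""
    if len(charset) != 256:
        raise ValueError("character set table must be 256 bytes")
    return bytes(charset[74 + ord(c) - ord("A")] for c in "SPSSPORT")


def looks_like_portable_file(charset, tag):
    """Compare the file's 8-byte tag with SPSSPORT in its character set."""
    return bytes(tag) == expected_tag(charset)
```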
Data elements are output in the same order as the variable records describing them.
String variables are output as string fields, and numeric variables are output as floating-
point fields.
The following sections describe the contents of each record, identified by the index into
the records array.
uint16 compressed;
Set to 0 if the data in the file is not compressed, 1 if the data is compressed
with simple bytecode compression.
uint16 nominal_case_size;
Number of data elements per case. This is the number of variables, except that
long string variables add extra data elements (one for every 8 bytes after the
first 8). String variables in SPSS/PC+ system files are limited to 255 bytes.
uint16 n_cases0;
uint16 n_cases1;
The number of cases in the data record. Both values are the same. Some files in
the corpus contain data for the number of cases noted here, followed by garbage
that somewhat resembles data.
uint16 weight_index;
0 if the file is unweighted; otherwise, a 1-based index into the data record of
the weighting variable, e.g. 4 for the first variable after the 3 system-defined
variables.
char creation_date[8];
The date that the file was created, in ‘mm/dd/yy’ format. Single-digit days and
months are not prefixed by zeros. The string is padded with spaces on right or
left or both, e.g. ‘_2/4/93_’, ‘10/5/87_’, and ‘_1/11/88’ (with ‘_’ standing in
for a space) are all actual examples from the corpus.
char creation_time[8];
The time that the file was created, in ‘HH:MM:SS’ format. Single-digit hours are
padded on the left with a space. Minutes and seconds are always written as two
digits.
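Because of the variable padding, these fields are easiest to parse after stripping spaces. A minimal sketch, assuming the fields have already been decoded to text. The two-digit year is returned as-is because the century is not recorded. The function names are hypothetical.

```python
def parse_creation_date(field):
    """Parse 'mm/dd/yy' with optional space padding into (month, day, yy)."""
    month, day, year = (int(part) for part in field.strip().split("/"))
    return month, day, year


def parse_creation_time(field):
    """Parse 'HH:MM:SS', where a single-digit hour is padded with a space."""
    hour, minute, second = (int(part) for part in field.strip().split(":"))
    return hour, minute, second
```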
char file_label[64];
File label declared by the user, if any (see Section “FILE LABEL” in PSPP
Users Guide). Padded on the right with spaces.
or less. String variables wider than 8 bytes have one instance for each 8 bytes, rounding up.
The first instance for a long string specifies the variable’s correct dictionary information.
Subsequent instances for a long string are generally filled with all-zero bytes, although the
missing field contains the numeric system-missing value, and some writers also fill in
var_label_ofs, format, and name, sometimes filling the latter with the numeric system-missing
value rather than a text string. Regardless of the values used, readers should ignore the
contents of these additional instances for long strings.
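The number of data elements per case follows directly from rounding each string width up to a multiple of 8 bytes. In this sketch a width of 0 stands for a numeric variable, a convention chosen here for illustration only. The function names are hypothetical.

```python
def data_elements(width):
    """Data elements used by one variable; width 0 marks a numeric variable."""
    if width == 0:
        return 1
    if not 1 <= width <= 255:
        raise ValueError("SPSS/PC+ string widths are limited to 1..255 bytes")
    return (width + 7) // 8  # one element per 8 bytes, rounding up


def nominal_case_size(widths):
    """Sum of data elements over all variables, as in nominal_case_size."""
    return sum(data_elements(w) for w in widths)
```

For the three system-defined variables alone ($CASENUM, $DATE with format A8, and $WEIGHT), nominal_case_size([0, 8, 0]) returns 3.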
uint32 value_label_start;
uint32 value_label_end;
For a variable with value labels, these specify offsets into the label record of
the start and end of this variable’s value labels, respectively. See Section 6.3
[Record 2 Labels Record], page 103, for more information.
For a variable without any value labels, these are both zero.
A long string variable may not have value labels.
uint32 var_label_ofs;
For a variable with a variable label, this specifies an offset into the label record.
See Section 6.3 [Record 2 Labels Record], page 103, for more information.
For a variable without a variable label, this is zero.
uint32 format;
The variable’s output format, in the same format used in system files. See
[System File Output Formats], page 8, for details. SPSS/PC+ system files only
use format types 5 (F, for numeric variables) and 1 (A, for string variables).
char name[8];
The variable’s name, padded on the right with spaces.
union { ... } missing;
A user-missing value. For numeric variables, missing.f is the variable’s user-
missing value. For string variables, missing.s is a string missing value. A
variable without a user-missing value is indicated with missing.f set to the
system-missing value, even for string variables (!). A long string variable may
not have a missing value.
In addition to the user-defined variables, every SPSS/PC+ system file contains, as its
first three variables, the following system-defined variables, in the following order. The
system-defined variables have no variable label, value labels, or missing values.
$CASENUM A numeric variable with format F8.0. Most of the time this is a sequence
number, starting with 1 for the first case and counting up for each subsequent
case. Some files skip over values, which probably reflects cases that were deleted.
$DATE A string variable with format A8. Same format (including varying padding) as
the creation_date field in the main header record (see Section 6.1 [Record 0
Main Header Record], page 100). The actual date can differ from creation_date
and from record to record. This may reflect when individual cases were
added or updated.
Chapter 6: SPSS/PC+ System File Format 103
$WEIGHT A numeric variable with format F8.2. This represents the case’s weight;
SPSS/PC+ files do not have a user-defined weighting variable. If weighting has
not been enabled, every case has value 1.0.
under this License. If a section does not fit the above definition of Secondary then it is
not allowed to be designated as Invariant. The Document may contain zero Invariant
Sections. If the Document does not identify any Invariant Sections then there are none.
The “Cover Texts” are certain short passages of text that are listed, as Front-Cover
Texts or Back-Cover Texts, in the notice that says that the Document is released under
this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may
be at most 25 words.
A “Transparent” copy of the Document means a machine-readable copy, represented
in a format whose specification is available to the general public, that is suitable for
revising the document straightforwardly with generic text editors or (for images composed
of pixels) generic paint programs or (for drawings) some widely available drawing
editor, and that is suitable for input to text formatters or for automatic translation to
a variety of formats suitable for input to text formatters. A copy made in an otherwise
Transparent file format whose markup, or absence of markup, has been arranged to
thwart or discourage subsequent modification by readers is not Transparent. An image
format is not Transparent if used for any substantial amount of text. A copy that is
not “Transparent” is called “Opaque”.
Examples of suitable formats for Transparent copies include plain ASCII without
markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly
available DTD, and standard-conforming simple HTML, PostScript or PDF designed
for human modification. Examples of transparent image formats include PNG, XCF
and JPG. Opaque formats include proprietary formats that can be read and edited
only by proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the machine-generated HTML,
PostScript or PDF produced by some word processors for output purposes only.
The “Title Page” means, for a printed book, the title page itself, plus such following
pages as are needed to hold, legibly, the material this License requires to appear in the
title page. For works in formats which do not have any title page as such, “Title Page”
means the text near the most prominent appearance of the work’s title, preceding the
beginning of the body of the text.
The “publisher” means any person or entity that distributes copies of the Document
to the public.
A section “Entitled XYZ” means a named subunit of the Document whose title either
is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in
another language. (Here XYZ stands for a specific section name mentioned below, such
as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To “Preserve
the Title” of such a section when you modify the Document means that it remains a
section “Entitled XYZ” according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that
this License applies to the Document. These Warranty Disclaimers are considered to
be included by reference in this License, but only as regards disclaiming warranties:
any other implication that these Warranty Disclaimers may have is void and has no
effect on the meaning of this License.
2. VERBATIM COPYING
Appendix A: GNU Free Documentation License 107
You may copy and distribute the Document in any medium, either commercially or
noncommercially, provided that this License, the copyright notices, and the license
notice saying this License applies to the Document are reproduced in all copies, and
that you add no other conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further copying of the copies
you make or distribute. However, you may accept compensation in exchange for copies.
If you distribute a large enough number of copies you must also follow the conditions
in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly
display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers) of
the Document, numbering more than 100, and the Document’s license notice requires
Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover. Both covers must also clearly and legibly identify you as the publisher
of these copies. The front cover must present the full title with all words of the title
equally prominent and visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve the title of the
Document and satisfy these conditions, can be treated as verbatim copying in other
respects.
If the required texts for either cover are too voluminous to fit legibly, you should put
the first ones listed (as many as fit reasonably) on the actual cover, and continue the
rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100,
you must either include a machine-readable Transparent copy along with each Opaque
copy, or state in or with each Opaque copy a computer-network location from which
the general network-using public has access to download using public-standard network
protocols a complete Transparent copy of the Document, free of added material. If
you use the latter option, you must take reasonably prudent steps, when you begin
distribution of Opaque copies in quantity, to ensure that this Transparent copy will
remain thus accessible at the stated location until at least one year after the last time
you distribute an Opaque copy (directly or through your agents or retailers) of that
edition to the public.
It is requested, but not required, that you contact the authors of the Document well
before redistributing any large number of copies, to give them a chance to provide you
with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions
of sections 2 and 3 above, provided that you release the Modified Version under precisely
this License, with the Modified Version filling the role of the Document, thus licensing
distribution and modification of the Modified Version to whoever possesses a copy of
it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the
Document, and from those of previous versions (which should, if there were any,
be listed in the History section of the Document). You may use the same title as
a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for
authorship of the modifications in the Modified Version, together with at least five
of the principal authors of the Document (all of its principal authors, if it has fewer
than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the
publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other
copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public
permission to use the Modified Version under the terms of this License, in the form
shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover
Texts given in the Document’s license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled “History”, Preserve its Title, and add to it an item
stating at least the title, year, new authors, and publisher of the Modified Version
as given on the Title Page. If there is no section Entitled “History” in the Document,
create one stating the title, year, authors, and publisher of the Document
as given on its Title Page, then add an item describing the Modified Version as
stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to
a Transparent copy of the Document, and likewise the network locations given in
the Document for previous versions it was based on. These may be placed in the
“History” section. You may omit a network location for a work that was published
at least four years before the Document itself, or if the original publisher of the
version it refers to gives permission.
K. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title
of the section, and preserve in the section all the substance and tone of each of the
contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and
in their titles. Section numbers or the equivalent are not considered part of the
section titles.
M. Delete any section Entitled “Endorsements”. Such a section may not be included
in the Modified Version.
N. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in
title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or appendices that qualify
as Secondary Sections and contain no material copied from the Document, you may at
your option designate some or all of these sections as invariant. To do this, add their
titles to the list of Invariant Sections in the Modified Version’s license notice. These
titles must be distinct from any other section titles.
You may add a section Entitled “Endorsements”, provided it contains nothing but
endorsements of your Modified Version by various parties—for example, statements of
peer review or that the text has been approved by an organization as the authoritative
definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up
to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified
Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be
added by (or through arrangements made by) any one entity. If the Document already
includes a cover text for the same cover, previously added by you or by arrangement
made by the same entity you are acting on behalf of, you may not add another; but
you may replace the old one, on explicit permission from the previous publisher that
added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission
to use their names for publicity for or to assert or imply endorsement of any Modified
Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License,
under the terms defined in section 4 above for modified versions, provided that you
include in the combination all of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your combined work in its license
notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical
Invariant Sections may be replaced with a single copy. If there are multiple Invariant
Sections with the same name but different contents, make the title of each such section
unique by adding at the end of it, in parentheses, the name of the original author or
publisher of that section if known, or else a unique number. Make the same adjustment
to the section titles in the list of Invariant Sections in the license notice of the combined
work.
In the combination, you must combine any sections Entitled “History” in the various
original documents, forming one section Entitled “History”; likewise combine any
sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You
must delete all sections Entitled “Endorsements.”
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released
under this License, and replace the individual copies of this License in the various
documents with a single copy that is included in the collection, provided that you
follow the rules of this License for verbatim copying of each of the documents in all
other respects.
You may extract a single document from such a collection, and distribute it individually
under this License, provided you insert a copy of this License into the extracted
document, and follow this License in all other respects regarding verbatim copying of
that document.