Max Core Value & Performance

The document provides guidelines on setting the max-core parameter for components like SORT, JOIN, and ROLLUP, emphasizing that the optimal value varies based on the specific graph and data. It explains the consequences of setting max-core too low or too high, including performance degradation and potential system failures. Additionally, it discusses memory usage for graphs, filesystem performance, and the importance of testing configurations to optimize performance.

What should you set it to?

What you should set max-core to, or whether you should leave it at its default setting, depends
on the component and the data you’re working with. For specific guidelines, see the
documentation for the individual components. If you happen to have a “just right” max-core
setting, or if the default value serves (it often does), that’s fine. But because any value you do
set must be an estimate, it’s important to know what happens when you set it too low or
too high.
Too low
Giving max-core a value less than what the component needs to complete the job entirely in
memory forces the component to write more temporary data to disk at runtime. Depending
on how much data is involved, this can slow performance. But this kind of controlled disk
activity is far preferable to the uncontrolled disk activity of paging or thrashing.
Too high
Allocating too much memory with max-core can have various effects, depending on the
circumstances:
Perhaps max-core is set to a value higher than needed by the component, but still within the
capacity of the system. In this case, no harm is done — unless the data size increases in later
runs, causing more memory to be allocated, with the result that system performance begins to be
affected (see “Conclusion”).
NOTE: If you set a SORT component’s max-core too low, the component may create too
many small temporary files. This can be a problem, and a good reason to increase max-core.
If max-core is set high enough that your graph’s working set no longer fits in physical memory,
the computer will have to start paging simply to run the graph. This will certainly have an
adverse effect on the graph’s performance, and on the performance of any other applications
running at the time.
If max-core is set so high that the computer’s swap space is exhausted, you can cause your own
graph, and possibly other applications, or even the computer itself, to fail. This has the worst
possible effect on performance.

=========================================

How does a single graph use memory?


In many cases, the working set of a graph is only a fraction of the total memory demands of all
its components and data, for the following reasons:
Every graph consists of one or more phases
A graph that has more than one phase runs one phase at a time, sequentially, to completion. All
data is written to disk at each phase break. The amount of memory needed by a multiphase graph
is thus equal only to the amount of memory needed by the graph’s single most memory-intensive
phase.
Memory demand is relevant only to separate individual computers
A graph with layouts that involve more than one computer will make memory demands on all
those computers. But the ultimate unit of available memory is the individual computer. You
should calculate each graph phase’s memory demands in terms only of each particular computer
that the phase is running on.
However, it’s important to remember that just because a graph is running in parallel, it isn’t
necessarily running on more than one machine. It could be running (for example) on an
SMP, with multiple processors but only one memory space.
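The two points above can be sketched as a small calculation. This is an illustrative example only (host and phase names and the memory figures are made up): because phases run sequentially, each machine's demand is its single most memory-intensive phase, not the sum of all phases.

```python
# Per-machine, per-phase memory demands in MB (hypothetical values).
phase_demands_mb = {
    "host_a": {"phase0": 250, "phase1": 900, "phase2": 400},
    "host_b": {"phase0": 120, "phase1": 300},
}

# Phases run one at a time, so each computer's working set is the
# maximum over its phases, computed separately per machine.
peak_per_machine = {host: max(phases.values())
                    for host, phases in phase_demands_mb.items()}
print(peak_per_machine)  # {'host_a': 900, 'host_b': 300}
```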

Filesystem layout and performance


A graph’s performance depends as much on the performance of the file systems it uses as on
its own computational efficiency. Graphs work best with an application file space that is
optimized for fast reading and writing of large contiguous blocks of data.
What determines the read/write efficiency of a filesystem is the number of independent disk
controllers and disks available, and whether the files are cached in the controller. These things
are often hidden, however, beneath the configured filesystem (for example, within a storage area
network (SAN)) as it appears to you at the user level. You only see them indirectly, in the
sometimes surprising effects they have on filesystem operations.
The simple tests described in this section will give you a good idea of the performance
capabilities of your filesystem. Three things are separately measured:
Write performance
Read performance
Write/read performance in the same graph
Filesystem performance testing should begin with serial operations. This gives you a set of
simple base observations. You can then go on to test parallel performance running with a
succession of multifiles using increasing numbers of partitions. When testing the performance of
a system or an application in various scenarios and configurations, it is critical to change only
one thing at a time. That way, you will always know the precise cause of any performance
change.
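A minimal serial write-then-read timing test along these lines might look like the following. This is a sketch, not an Ab Initio tool: the file path and sizes are assumptions, and you should point TEST_FILE at the application file space you want to measure and use a file larger than RAM so the OS page cache does not inflate the read figure.

```python
import os
import time

TEST_FILE = "/tmp/fs_perf_test.dat"   # change to the filesystem under test
BLOCK = b"\0" * (8 * 1024 * 1024)     # write in 8 MB blocks
N_BLOCKS = 16                         # 128 MB total (use far more in practice)

# Measure write throughput, including the time for data to reach disk.
start = time.time()
with open(TEST_FILE, "wb") as f:
    for _ in range(N_BLOCKS):
        f.write(BLOCK)
    f.flush()
    os.fsync(f.fileno())
write_secs = time.time() - start

# Measure read throughput over the same file.
start = time.time()
with open(TEST_FILE, "rb") as f:
    while f.read(len(BLOCK)):
        pass
read_secs = time.time() - start
os.remove(TEST_FILE)

size_mb = len(BLOCK) * N_BLOCKS / 2**20
print(f"write: {size_mb/write_secs:.0f} MB/s, read: {size_mb/read_secs:.0f} MB/s")
```

Run it once serially for a baseline, then repeat against multifile partitions, changing only one variable at a time as described above.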
Formula for calculating a component’s memory usage
As explained above, the memory used by a graph phase should be roughly equivalent to the
following:
Component_instance1 + Component_instance2 + Component_instance3 + . . .
where Component_instance1 (and so on) represent the memory requirements of each component
instance process in the graph phase. We say component instance because, in cases where a
component runs in parallel in a partitioned layout, each partition’s instance of the component
process must be added into the total. Thus, a component running four ways parallel is really four
separate instances of the component, and all four instances have to be counted in the memory
usage total.
Note that these are program components only: an INPUT FILE or OUTPUT FILE component
doesn’t count, although (for example) READ MULTIPLE FILES or WRITE MULTIPLE FILES
does.
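The summation above can be sketched as follows. The component names, per-instance sizes, and partition counts here are hypothetical; the point is that a component running n-ways parallel contributes n instances to the phase total.

```python
# Hypothetical phase: per-instance memory in MB and degree of parallelism.
components = [
    {"name": "SORT",     "per_instance_mb": 107, "partitions": 4},
    {"name": "JOIN",     "per_instance_mb": 57,  "partitions": 4},
    {"name": "REFORMAT", "per_instance_mb": 7,   "partitions": 1},
]

# Each partition's instance is a separate process, so multiply before summing.
phase_total_mb = sum(c["per_instance_mb"] * c["partitions"] for c in components)
print(phase_total_mb)  # 663
```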
The formula for (roughly) calculating the amount of memory required by one component
instance process is as follows:
base amount + lookups + max-core
where:
base amount is an amount of memory, usually equal to about 7 MB
The actual base amount varies by the particular component, platform and compilation mode, and
can be as low as 3 MB and as high as 10 MB. However, 7 MB is a good middle figure to use in
these calculations.
lookups is the amount of memory required by any lookup file referenced by the component
Lookup files use memory for the lookup data itself, and for the “indexes” used to do the lookup.
In some cases, the lookup data can be shared among components. In these cases, the memory
used by the data should only be counted once in a graph phase. See “Memory needs for lookup
files”.
max-core is an extra amount of memory specified by the component’s max-core parameter (if
any)
Certain components have extra memory needs which can vary, depending on the size of the data
involved, and other things. This parameter, if a component has it, allows you to specify how
much extra memory can be allocated to the component. See “Choosing the max-core setting”.
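The per-instance formula can be written as a one-line helper. The 7 MB base is the middle figure suggested above; the lookup and max-core values in the example call are made up.

```python
def instance_memory_mb(lookups_mb=0, max_core_mb=0, base_mb=7):
    """Rough memory for one component instance: base + lookups + max-core."""
    return base_mb + lookups_mb + max_core_mb

# e.g. a component with a 40 MB lookup and a 64 MB max-core setting:
print(instance_memory_mb(lookups_mb=40, max_core_mb=64))  # 111
```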

Memory needs for lookup files


You can assume that both parts of any lookup file (both the data and the indexes) are always
shared among the graph components that use it, unless:
The lookup file is remote. If (for example) you have two REFORMAT components in your
graph that access the same lookup file, they cannot share a copy of it if they run on two different
computers.
The lookup file is an MVS dataset.
The lookup file is of a type that does not have a precomputed index — for example, appendable
lookup files and updatable lookup files.
When you’re counting up the memory needs of components in a graph phase, count any
shared lookup file only once.
Two things, taken together, make up the size of a lookup file:
Lookup data size
This is the same as the size of the file itself. If the file is a multifile, and the component doing the
lookup is partitioned on more than one computer, then you should count only the data in a single
partition on the same computer.
Note also that if the lookup file will be growing over time, you should allow for this growth in
your memory estimate as well.
Index size
This is equal to:
number of records in the data * index entry size
The index entry size varies, depending on the types and numbers of key fields:
For a 32-bit Co>Operating System, an index entry will be about 20 bytes
Simpler keys (fixed length types, contiguous fields) will have entries of about 12 bytes
For a 64-bit Co>Operating System, an index entry will be about 32 bytes
Simpler keys (fixed length types, contiguous fields) will have entries of about 24 bytes
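The index-size formula can be sketched with the per-entry figures above. The record count in the example is made up; the entry sizes follow the approximate values just listed.

```python
# Approximate bytes per index entry, keyed by (word size, simple key?).
ENTRY_BYTES = {
    (32, False): 20, (32, True): 12,
    (64, False): 32, (64, True): 24,
}

def index_size_mb(n_records, bits=64, simple_key=False):
    """Index size = number of records * index entry size, in MB."""
    return n_records * ENTRY_BYTES[(bits, simple_key)] / 2**20

# 10 million records on a 64-bit Co>Operating System, complex key:
print(f"{index_size_mb(10_000_000):.0f} MB")  # 305 MB
```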

File table overflow


Question

What does the error message “File table overflow” mean?

Short answer

This error message indicates that the system-wide limit on open files has been exceeded. Either
there are too many processes running on the system, or the kernel configuration needs to be
changed.

Details

This error message might occur if the maximum number of open files allowed on the machine is
set too low, or if max-core is set too low in the components that are processing large amounts of
data. In the latter case, much of the data processed in a component (such as a SORT or JOIN
component) spills to disk, causing many files to be opened. Increasing the value of max-core is
an appropriate first step in the case of a sort, because it reduces the number of separate merge
files that must be opened at the conclusion of the sort.
NOTE: Because increasing max-core also increases the memory requirements of your
graph, be careful not to increase it too much (you might need to consider changing
the graph’s phasing to reduce memory requirements). It is seldom necessary to increase max-
core beyond 100 MB.
If the error still occurs, see your system administrator. Note that the kernel setting for the
maximum number of system-wide open files is operating system-dependent (for example, this is
the nfile parameter on Unix systems), and, on many platforms, requires a reboot in order to take
effect. See the Ab Initio Server Software Installation Guide for Unix for the recommended
settings.

Value for max-core

Question

What value should I set for the max-core parameter?

Short answer

The max-core parameter is found in the SORT, JOIN, and ROLLUP components, among others. There is
no single optimal value for the max-core parameter, because a “good” value depends on your particular
graph and the environment in which it runs, and on the data.

Details
The Sort, Rollup, Scan and Join components have a parameter max-core which determines the
maximum amount of memory they will consume per partition before they spill to disk. When the value
of max-core is exceeded, all of the input (in the case of Sort) or the excess input (in the case of the other
components) is dropped to disk in the form of temporary files. This can have a dramatic impact on
performance, but it does not mean that it is always better to increase the value of max-core in these
situations.

The higher you set the value of max-core, the more memory the component can use. Using more
memory generally improves performance — up to a point. Beyond this point, performance will not
improve and may even worsen. If the value of max-core is set too high, operating system swapping can
occur and the graph may fail if virtual memory on the machine is exhausted.

When setting the value for max-core, you can use the suffixes k, m, and g (uppercase is also supported)
to indicate powers of 1024. For max-core, the suffix k (kilobytes) means precisely 1024 bytes, not 1000.
Similarly, the suffix m (megabytes) means precisely 1,048,576 bytes (1024²), and g (gigabytes) means precisely
1024³ bytes. Note that the maximum allowed value for max-core is 2147483647 in 32-bit builds of the
Co>Operating System.
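The suffix scaling can be illustrated with a small parser. The parser itself is illustrative, not an Ab Initio API; it only demonstrates that k, m, and g multiply by powers of 1024.

```python
def parse_max_core(value):
    """Convert a max-core string like '100m' or '1G' to bytes (powers of 1024)."""
    factors = {"k": 1024, "m": 1024**2, "g": 1024**3}
    s = value.strip().lower()
    if s and s[-1] in factors:
        return int(s[:-1]) * factors[s[-1]]
    return int(s)

print(parse_max_core("100m"))  # 104857600
print(parse_max_core("1G"))    # 1073741824
```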

In general, using additional memory can improve the performance of in-memory Rollup or Join, but not
of Sort.

When spillage occurs, consider setting the configuration variable AB_SPILL_FILE_COMPRESSION_LEVEL.


This variable compresses the temporary files spilled to disk. It is most helpful when you have a fast CPU
but slow disk (which is common).

In-memory ROLLUP or JOIN


It is difficult to be precise about the amount of memory an in-memory Rollup or Join can use.

An in-memory Join tries to hold all its nondriving inputs in memory. Thus you should make the largest
input by volume the driving one by setting the driving parameter to the number of its port.

When the non-driving inputs fit in memory, the driving input is pipelined, resulting in pipeline
parallelism. Any spillage of the non-driving input (which happens incrementally when its size exceeds
the value of max-core) breaks the pipeline and eliminates the parallelism.

An in-memory Rollup component must have enough memory to hold the size of its keys, plus the size of
its temporaries, plus the size of any input fields required in finalize to produce the output. In practice, in
most Rollup components, this is simply the size of the output. In addition, some space is needed for the
in-memory index.

If the totality of this data exceeds the value of max-core, the component spills the excess to disk
incrementally.
You should always set max-core’s value in in-memory Rollup and Join components as a reference to a
sandbox input parameter declared with an appropriate default value. The input parameter’s value can
be changed at runtime if required.

NOTE: The Ab Initio Environment’s AI_GRAPH_MAX_CORE parameter is predefined for this
purpose. AI_GRAPH_MAX_CORE is defined in terms of declarations for AI_GRAPH_MAX_CORE_HALF
and AI_GRAPH_MAX_CORE_QUARTER, so you can easily divide the available max-core among
different in-memory components in a phase. The Ab Initio Environment checks that
$AI_GRAPH_MAX_CORE has a sensible value by comparing it to $AI_GRAPH_MAX_CORE_MIN and
$AI_GRAPH_MAX_CORE_MAX.

If two or more in-memory components each need most or all of the memory available for max-core, you
should put the components in separate phases, provided you have the disk space to hold the data at the
phase break.

Another use of phasing is to control the allocation of memory among in-memory components. When
there is a limited amount of memory available you can use phasing to make sure each in-memory
component gets a sufficient amount. Typically, only one to four in-memory components of significant
size should occupy the same phase, depending on memory availability and demands.

To compute a runtime estimate for max-core, take two thirds of the total memory available on the
machine and subtract any memory used by lookups and competing jobs, including other graphs, running
at the same time on the machine. This is the available memory. Divide this result by the number of
partitions to get your max-core estimate — max-core is measured per partition. The formula is thus:

AI_GRAPH_MAX_CORE = ((2/3 * total memory) - memory used elsewhere)/(number of partitions)
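The formula can be checked with example numbers. The inputs here are hypothetical (a 64 GB machine, 2 GB used by lookups and competing jobs, running 8-ways parallel); all figures are in MB.

```python
def max_core_estimate_mb(total_mb, used_elsewhere_mb, partitions):
    """Runtime max-core estimate: (2/3 of total - memory used elsewhere) / partitions."""
    available = (2 / 3) * total_mb - used_elsewhere_mb
    return available / partitions

# 64 GB machine, 2 GB committed elsewhere, 8 partitions:
print(max_core_estimate_mb(65536, 2048, 8))  # ≈ 5205 MB per partition
```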

SORT component
For the Sort component, 100 MB is the default value for max-core. This default works well for a wide
variety of situations, and you rarely need to change it.

You should increase max-core when the data volume is so large that the number of temporary (spillage)
files exceeds 1000 (approximately — the actual value depends on ulimit). In this case SORT writes to disk
twice, slowing performance significantly. You can estimate the number of temporary files by multiplying
the data volume being sorted by three and dividing by the value of max-core (because data is written to
disk in blocks that are one-third the size of the max-core setting). For example, suppose you are sorting
100 GB of data with the default max-core setting of 100 MB and the process is running in serial. The
number of temporary files that will be created is:

3 × 100000 MB / 100 MB = 3000 files

In this case, increasing max-core would reduce the number of temporary files. (Remember that if you
are using a multifile system, you should use the data volume per partition in this calculation.)
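The spill-file estimate above can be expressed as a one-liner. Remember that on a multifile system the data volume should be the per-partition figure; the 100 GB example matches the worked calculation above.

```python
def sort_spill_files(data_mb, max_core_mb=100):
    """Estimated temporary files: 3 * data volume / max-core (blocks are
    one-third the size of the max-core setting)."""
    return 3 * data_mb / max_core_mb

# 100 GB sorted in serial with the default 100 MB max-core:
print(sort_spill_files(100_000))  # 3000.0
```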
For other cases, where a SORT component is a critical bottleneck, you must experiment to determine
whether increasing max-core to keep more data in memory is worthwhile. To keep data in memory, you
need max-core to be 50% greater than the volume of data being sorted. However, increasing the max-
core to accommodate larger volumes of data in memory typically does not increase performance.
(Briefly, SORT divides the data into blocks; each block is one-third the size of max-core. When you
increase max-core, the size of each block increases. However, the time to sort each block increases
disproportionately, slowing performance even with fewer blocks to sort.)

Rarely, you may see a “Too many open files” error message. Most often this occurs when the sort
operation encounters the system limit for the number of open files. To avoid the error, decrease the
value of the configuration variable AB_MAX_SIMULTANEOUS_MERGE_FILES. For more information, see
“Too many open files in the system or Too many open files”.

NOTE: We recommend setting the value of max-core as a $ reference to a parameter (for example,
$AI_SORT_MAX_CORE) so you can easily adjust the value at runtime if required.
