
New Universal Law: Application of Tracy-Widom Theory for Construction Network

Schedule Resilience
Source: Catholic University of America
Contributed by: Malcomb, Armelle (Author); The Catholic University of America (Degree
granting institution); Lucko, Gunnar (Thesis advisor); Thompson, Rick (Committee
member); Agbelie, Bismark (Committee member)
Stable URL: https://www.jstor.org/stable/community.38760519

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.

This item is being shared by an institution as part of a Community Collection.


For terms of use, please refer to our Terms & Conditions at https://about.jstor.org/terms/#whats-in-jstor

Catholic University of America is collaborating with JSTOR to digitize, preserve and extend access to Catholic
University of America

This content downloaded from 182.255.0.242 on Wed, 19 Mar 2025 09:44:41 UTC
All use subject to https://about.jstor.org/terms
THE CATHOLIC UNIVERSITY OF AMERICA

New Universal Law: Application of Tracy-Widom Theory for Construction Network Schedule
Resilience

A DISSERTATION

Submitted to the Faculty of the

Department of Civil and Environmental Engineering

School of Engineering

Of The Catholic University of America

In Partial Fulfillment of the Requirements

For the Degree

Doctor of Philosophy

©
Copyright

All Rights Reserved

By

Armelle P. Malcomb

Washington, D.C.

2022

Abstract

New Universal Law: Application of Tracy-Widom Theory for Construction Network Schedule
Resilience

Armelle P. Malcomb, Ph.D.

Director: Gunnar Lucko, Ph.D.

A methodology based on random matrix theory (RMT) is proposed to investigate the underlying behavior of project network schedules. The approach relies on a devised mathematical model and three premises. The first assumption demands that the probabilistic activity durations follow an identical triangular distribution with known parameters. Repeated joint sampling of activity durations creates a sample data matrix 𝑿, using the identified scheme for translating a project network of size p into a random matrix via its dependency structure matrix. Although the joint sampling distribution was unknown, it served to draw each of the n rows of 𝑿. The second assumption is that the Tracy-Widom (TW1) distribution is the natural limiting distribution of this row-wise sampling of 𝑿. Interactions among the numerous parties participating in project management and construction place a project network schedule among complex systems, which are marked by a phase transition and a tipping point. In addition, the striking similarities between the fields of application of the TW distributions and those of project scheduling support this assumption. The last assumption is that a project network schedule with sufficient correlation in its structure, like that of complex systems, can be investigated within the framework of RMT. This assumption is justified by the interdependence structure defined by the various pairwise links between project activities. This assumption enabled the application of RMT's universality results

to project network schedules to investigate their underlying behavior. In RMT, the appropriately scaled eigenvalues of sample covariance matrices serve as test statistics for such a study.
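Under the first assumption, forming 𝑿 and its sample covariance matrix can be sketched as follows; p, n, and the triangular parameters a, m, b below are illustrative placeholders, not values from the study, and independent columns stand in for the network's actual dependency structure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative placeholders (not values from the study): p activities,
# n joint samples, and one shared triangular duration distribution.
p, n = 30, 200
a, m, b = 2.0, 5.0, 9.0          # lower limit, mode, upper limit

# Each of the n rows of X is one joint sample of the p activity durations.
X = rng.triangular(a, m, b, size=(n, p))

# Center and scale the columns of X, then form the sample covariance matrix.
Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
S = (Xc.T @ Xc) / (n - 1)

eigvals = np.linalg.eigvalsh(S)   # ascending order
l1 = eigvals[-1]                  # largest eigenvalue: the test statistic
```

With standardized columns, the diagonal of S is all ones, so its eigenvalues sum to p; the largest one is the statistic the abstract studies.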

As a result, a carefully engineered sample covariance matrix 𝑺 was developed, and two standardization approaches (Norm I and Norm II) for its eigenvalues were identified. Both standardization approaches relate to the universality of the TW1 limit law, which many authors (e.g., Soshnikov 2002; Péché 2008) have extended under relaxed assumptions to a broad class of matrices that are not necessarily Gaussian. Although some of these assumptions have been eased, others must still be met. Among these extra requirements, the formulation of 𝑺 was chosen. Its formulation necessitated centering and scaling the matrix 𝑿 consisting of n samples of the p early finish (EF) times of a project network's activities. In addition, it included the significance level α used to test the TW1 distributional assumption. The Kolmogorov-Smirnov (K-S) goodness-of-fit test with α values of 5, 10, and 20% was found suitable for this study.
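As a rough illustration of such a K-S test, the sketch below normalizes the largest eigenvalue with Johnstone's (2001) centering and scaling constants (the basis of "Norm I") and tests a white-noise Gaussian sample against TW1, using Chiani's (2014) shifted-gamma approximation to the TW1 CDF. The matrix dimensions and replication count are arbitrary; this is not the dissertation's own matrix 𝑺:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Chiani (2014): TW1 is closely approximated by a shifted gamma law,
# Gamma(k, theta) - alpha, with the fitted constants below.
K_TW1, THETA_TW1, SHIFT_TW1 = 46.446, 0.18605, 9.848

def tw1_cdf(x):
    """Approximate CDF of the Tracy-Widom TW1 distribution."""
    return stats.gamma.cdf(np.asarray(x) + SHIFT_TW1, a=K_TW1, scale=THETA_TW1)

def norm1_largest_eigenvalue(X):
    """Johnstone's (2001) centering and scaling of the largest
    eigenvalue of the unnormalized Wishart matrix X'X."""
    n, p = X.shape
    lam1 = np.linalg.eigvalsh(X.T @ X)[-1]
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
    sigma = ((np.sqrt(n - 1) + np.sqrt(p))
             * (1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3))
    return (lam1 - mu) / sigma

# White-noise check: 200 replications of a 100 x 20 Gaussian data matrix,
# then a K-S goodness-of-fit test of the normalized eigenvalues against TW1.
sample = [norm1_largest_eigenvalue(rng.standard_normal((100, 20)))
          for _ in range(200)]
ks_stat, p_value = stats.kstest(sample, tw1_cdf)
```

The gamma constants imply a mean of about −1.21, matching the TW1 mean; a large p_value here would fail to reject the TW1 hypothesis at the α levels above.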

Thirty-five project networks of diverse sizes and complexity values were identified from the study's 2,040 benchmark networks obtained from the Project Scheduling Problem Library (PSPLIB). Their sizes ranged from 30 to 120 activities, and their restrictiveness (RT) values from 0.18 to 0.69. Kelley's (1961) forward and backward passes of the critical path method (CPM) determined the EF times.
Using the devised methodology, the set of 100 simulations of network schedules yielded three significant findings. First, the scatterplot of 100 pairs of the normalized largest eigenvalue l₁ of 𝑺 against the sample size n revealed a distinct and consistent pattern: a concave-upward curve that steepens to the left and flattens to the right as n increases. Surprisingly, networks of varying sizes and complexity showed the same pattern regardless of the normalization method.

Using the distributional assumption on activity durations, the deviations Δμ of the empirical means of l₁ from the mean of the TW distribution (μ_TW) were determined from the same 100 outputs. They enabled the graphing of scatterplots of sample size n against Δμ. The resulting pattern highlighted the association between n and l₁. Similarly, the deviations Δσ² between the variances of l₁ and var_TW were calculated. The resulting pattern, also consistent across networks, helped determine an optimum sample size n_opt that maximizes the variance in a project network schedule's sampled durations. This sample size was found at the intersection of the mean deviation curve with the horizontal axis (n-axis). One may view n_opt as the required pixel count for high-quality printing. The size n_opt was found to be related to the network size p but not to its RT value. Moreover, an n_opt value was found for all 35 networks, and including α in the expression of 𝑺 was not necessary; still, leaving it out resulted in higher values of n_opt.
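Reading n_opt off the deviation curve amounts to locating a zero crossing on the n-axis; a sketch with invented Δμ values (the grid and deviations below are illustrative, not the study's outputs):

```python
import numpy as np

def optimum_sample_size(ns, deltas):
    """Locate where the mean-deviation curve crosses the n-axis by
    linear interpolation between the bracketing grid points."""
    ns = np.asarray(ns, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    for i in range(len(ns) - 1):
        if deltas[i] == 0.0:
            return ns[i]
        if deltas[i] * deltas[i + 1] < 0.0:   # sign change brackets the root
            t = deltas[i] / (deltas[i] - deltas[i + 1])
            return ns[i] + t * (ns[i + 1] - ns[i])
    return None                               # no crossing on this grid

# Illustrative deviation values over a grid of sample sizes n.
n_grid = [50, 100, 200, 400, 800]
d_mu = [0.9, 0.4, 0.1, -0.1, -0.25]
n_opt = optimum_sample_size(n_grid, d_mu)
```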

Subsequently, the derived n_opt was used in a series of 1,000 simulations to validate the distributional assumption on activity durations. The K-S test statistics were the normalized first- through fourth-largest eigenvalues l₁, l₂, l₃, and l₄ of the matrix 𝑺. Comparing results across the normalization approaches suggests that Baik et al. (1999) and Johansson (1998), Norm II, may be better suited to studying project network scheduling behavior than Johnstone (2001), Norm I. Under Norm I, 18 of the 35 project networks validated the null hypothesis when using l₁ and l₂ of their matrices 𝑺. Norm II supported the null hypothesis for 19 of the 21 networks evaluated when using l₁ and l₂ of the matrices 𝑺. This discovery is significant, and perhaps expected, since Baik et al. (1999) introduced Norm II while studying the length of the longest increasing subsequence of random permutations, which is governed by a TW limit law. The

empirical and theoretical distribution plots' agreement was displayed and compared using Q-Q plots and histograms. The graphs corroborated the K-S test results that the TW1 distribution is the limiting joint sampling distribution of project network schedules. The Q-Q plots also showed that proper normalization of the mth-largest eigenvalue should improve K-S test performance.

After validating the assumed limiting distribution for the durations of project schedules, another methodology was proposed to help design better project schedules. The intended methodology is formulated from the previous model and assumptions. So that the matrices' eigenvalues can be standardized, the methodology's assumptions limit the sample size to the n_opt established at a significance level α = 5%. At this α level, the TW1 distribution is the natural limiting distribution for the sampling of project activity durations. The suggested methodology relies on three rules to help choose which principal components (PCs) to keep. Simulations on four networks of various sizes yielded the following findings. First, using the scree plot rule and the proportion of each PC in the total variance of the sample covariance matrix 𝑺 or the population correlation matrix R, the study discovered a link between n_opt and PC retention for each of the networks. In addition, the eigenvalues of 𝑺 and R are very nearly equal. This is a significant result.
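One of the retention rules, the proportion-of-total-variance rule with an 80% cutoff, can be sketched as follows (the spectrum shown is illustrative, not from the study's networks):

```python
import numpy as np

def n_components_for(cutoff, eigvals):
    """Number of leading PCs needed so that their cumulative share of
    the total variance reaches the cutoff fraction."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    cum = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cum, cutoff) + 1)

# Illustrative spectrum: a few large 'spiked' eigenvalues over a flat bulk.
lam = [5.0, 2.5, 1.0, 0.5, 0.5, 0.3, 0.2]
k80 = n_components_for(0.80, lam)
```

Here the first three eigenvalues carry 85% of the total variance, so three PCs would be retained under the 80% criterion.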

Furthermore, the investigation demonstrated that Johnstone's (2001) spiked covariance model might forecast the limiting durations of project network activities via a PCA-based linear regression model. On scree plots, one or a few of the largest eigenvalues stood out from the rest. While the proportion-of-total-variance rule with an 80% cutoff criterion helped select the number of rth-ranked eigenvalues l_r to retain as PCs, the hypothesis-testing criterion based on TW p-values did not; the issue was the unavailability of TW p-value estimates for testing eigenvalues beyond the fourth largest. Finally, the

threshold value computation for each of the four networks indicated strong evidence of a phase transition in which project network schedules shift from stable to unstable. This discovery is critical because it may help practitioners determine when a construction project's schedule may become problematic. Since the empirical study involved only four networks, more study is needed of PCA-based models and of locating phase transitions in project network schedules.
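The phase transition in the spiked covariance model has a standard closed form (the Baik-Ben Arous-Péché result); the sketch below restates it as background to this argument, not as the dissertation's own threshold computation:

```python
import numpy as np

def bbp_threshold(n, p):
    """Critical spike size in the spiked covariance model: a population
    eigenvalue above 1 + sqrt(p/n) makes the top sample eigenvalue
    detach from the Marchenko-Pastur bulk (the BBP phase transition)."""
    return 1.0 + np.sqrt(p / n)

def bulk_edge(n, p):
    """Upper edge of the Marchenko-Pastur bulk for aspect ratio p/n."""
    return (1.0 + np.sqrt(p / n)) ** 2

def detached_eigenvalue(spike, n, p):
    """Almost-sure limit of the top sample eigenvalue when the
    population spike exceeds the BBP threshold."""
    gamma = p / n
    return spike * (1.0 + gamma / (spike - 1.0))
```

For example, with n = 400 samples of p = 100 variables, the threshold is 1.5: a population spike of 2.0 pushes the top sample eigenvalue to about 2.5, above the bulk edge of 2.25, while sub-threshold spikes stay hidden in the bulk and leave the TW fluctuation regime in force.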

In conclusion, while the uncovered universal pattern may not be suited to manual application, it can be incorporated as an add-on to project network scheduling software. Doing so would aid in simulating the necessary network schedules and determining the optimal sample size corresponding to the tipping point associated with a project network schedule. At that point, a project schedule may transition from a strong-coupling phase, with activities acting in concert, to a weak-coupling phase, with independent activities. In addition, because the optimal sample size corresponds to the maximum variance in the project activity durations, it may determine the limiting duration of each activity and the total project duration which, if exceeded, may result in schedule instability with unrecoverable delays. The proposed PCA-based linear regression model, based on Johnstone's (2001) spiked covariance model, is intended to forecast project limiting durations. These durations may aid practitioners in predicting resilient project schedules and costs. Finally, the significant discoveries of this pioneering study have resulted in proposals for contributions to the body of knowledge and recommendations for future research.

Signature Page

This dissertation by Armelle P. Malcomb fulfills the dissertation requirement for the doctoral
degree in Civil Engineering and Management, approved by Gunnar Lucko, Ph.D., as Director, and
by Richard C. Thompson, Jr., Ph.D., and Bismark Agbelie, Ph.D., as Readers.

Gunnar Lucko, Ph.D., Director

Richard C. Thompson, Jr., Ph.D., Reader

Bismark Agbelie, Ph.D., Reader

ii

Table of Contents
Title Page
Abstract

Signature Page............................................................................................................................... ii
Table of Contents ......................................................................................................................... iii
List of Figures ................................................................................................................................ x
List of Tables .............................................................................................................................. xiii
List of Equations ........................................................................................................................ xvi
List of Abbreviations ............................................................................................................... xxiv
Acknowledgment ...................................................................................................................... xxvi
Introduction ........................................................................................................... 1
Abstract ............................................................................................................ 1
1.1 Background ............................................................................................... 2
1.2 Network Schedules ..................................................................................... 5
1.2.1 Definitions .................................................................................................................... 5
1.2.2 Planning a Construction Project ................................................................................... 7
1.2.3 Visual Displaying a Construction Project Schedule ..................................................... 8
1.2.4 Scheduling a Construction Project ............................................................................. 11
1.2.5 Network Schedule Structures ..................................................................................... 11
1.3 Construction Scheduling Techniques .......................................................... 12
1.3.1 PERT .......................................................................................................................... 12
1.3.2 Critical Path Method (CPM)....................................................................................... 15
1.4 Vectors and Matrices ................................................................................. 22
1.4.1 Vector Definitions and Operations ............................................................................. 22
1.4.2 Matrix Definitions and Operations ............................................................................. 28
1.5 Descriptive Statistics and Inferential Statistics ............................................ 42
1.5.1 Prelude ........................................................................................................................ 42
1.5.2 Describing the Data: Sample and Population ............................................................. 43

1.5.3 Arranging Data in Tables or Arrays ........................................................................... 48
1.5.4 Plotting Data ............................................................................................................... 50
1.5.5 Describing Univariate Data in Numerical Form......................................................... 61
1.5.6 Describing Multiple Dimensional Data in Numerical Form ...................................... 66
1.5.7 Determining the Data Distribution through Hypothesis Testing ................................ 68
1.5.8 Covariance Matrices: Introduction ............................................................................. 79
1.6 Probability Distributions and Random Variables ......................................... 82
1.6.1 Preface ........................................................................................................................ 82
1.6.2 Univariate Random Variables and Probabilities ........................................................ 83
1.6.3 Multivariate Random Variables................................................................................ 100
1.7 Multivariate Data and Analysis ................................................................ 103
1.7.1 Preface ...................................................................................................................... 103
1.7.2 Introduction to Multivariate Analysis Techniques ..................................... 105
1.7.3 Notation Convention................................................................................................. 106
1.8 Null Hypothesis Significance Testing (NHST) .......................................... 106
1.9 Research Organization ............................................................................. 113
1.9.1 Chapter 1 – Introduction ........................................................................................... 113
1.9.2 Chapter 2 - An Investigation of the Underlying Behavior of Construction Project
Network Schedules ................................................................................................................ 114
1.9.3 Chapter 3 – Application of PCA for Data Reduction in Modeling Project Network
Schedules Based on the Universality Concept in RMT ......................................................... 115
1.9.4 Summary and Conclusions ....................................................................................... 116
References ..................................................................................................... 117

CHAPTER 2 An Investigation of the Underlying Behavior of Construction Project Network


Schedules.................................................................................................................................... 122
Abstract ......................................................................................................... 122
2.1 Introduction ............................................................................................ 126
2.2 Research Question ................................................................................... 128
2.3 Literature Review ................................................................................... 130

2.3.1 Random Matrix Models and Ensembles................................................................... 133
2.3.2 Elements and Properties of Wishart Random Matrices ............................................ 146
2.3.3 Bulk Spectrum Behaviors (Universality Theorems) ................................................ 156
2.3.4 Edge Spectrum and Universality of The Tracy Widom Distributions ..................... 164
2.3.5 The universality of the Tracy Widom Distributions ................................................. 174
2.3.6 Approximation of the Tracy Widom Distributions .................................................. 177
2.4 Research Methodology ............................................................................ 180
2.4.1 Research Objectives ................................................................................................. 183
2.4.2 Chapter 2’s Algorithm .............................................................................................. 184
2.4.3 Map and Match Conceptual Analogies Between Study Fields of Interests ............. 185
2.4.4 Data Collection and Preparation ............................................................................... 193
2.4.5 Formatting and Transforming Networks into Dependency Matrices ....................... 197
2.4.6 Model Development to Investigate Project Schedules Underlying Behaviors......... 223
2.5 Research Results ..................................................................................... 241
2.5.1 Benchmark Network Structure Analysis: Complexity Measures ............................. 241
2.5.2 Uncovering the Underlying Behavior of Project Network Schedules ...................... 252
2.6 Research Contributions and Recommendations .......................................... 295
2.6.1 Research Contribution 1 ........................................................................................... 296
2.6.2 Research Contribution 2 ........................................................................................... 296
2.6.3 Research Contribution 3 ........................................................................................... 296
2.6.4 Research Contribution 4 ........................................................................................... 296
2.6.5 Research Contribution 5 ........................................................................................... 297
2.7 Recommendations for Future Research ..................................................... 297
2.7.1 Recommendation 1 – Using Larger and Real-Life Project Network Schedules ...... 299
2.7.2 Recommendation 2 – Considering other Measures of Complexity .......................... 299
2.7.3 Recommendation 3 – Using a Different Normalization Approach .......................... 300
2.7.4 Recommendation 4 – Extending to Include the Next, Next Largest Eigenvalue ..... 300
2.8 Conclusion ............................................................................................. 300
References ..................................................................................................... 301

CHAPTER 3 Application of PCA for Data Reduction in Modeling Project Network
Schedules Based on the Universality Concept in RMT ......................................................... 311
Abstract ......................................................................................................... 311
3.1 Introduction and Research Questions ........................................................ 313
3.2 Literature Review ................................................................................... 314
3.2.1 PCA Methods ........................................................................................................... 315
3.2.2 PCA's Applications in Construction and Civil Engineering (CE) Fields ................. 318
3.3 The Fundamentals of Principal Component Analysis .................................. 321
3.3.1 Principal Components of the Population .................................................................. 322
3.3.2 Principal Components from Standardized Population.............................................. 325
3.3.3 Principal Components for Covariance Matrices with Special Structures................. 328
3.3.4 Principal Components in Random Variable Observations ....................................... 330
3.3.5 Standardizing the sample Principal Components ..................................................... 333
3.3.6 PCA in Terms of Singular Value Decomposition .................................................... 336
3.3.7 Geometric Interpretation of the Sample Principal Components ............................... 336
3.3.8 The Number of Principal Components ..................................................................... 339
3.3.9 Graphing the Principal Components and Regression Model.................................... 341
3.4 Principal Components in Regression ......................................................... 342
3.4.1 Classical Single Linear Regression Model ............................................................... 342
3.4.2 Least Square Estimation ........................................................................................... 344
3.4.3 Principal Components and Linear Regressions ........................................................ 345
3.5 Large Sample inferences .......................................................................... 347
3.5.1 Sphericity Test for Sample Covariance Matrices under Multinormality ................. 348
3.5.2 Sphericity Tests Based on TW1 p-Values................................................................. 349
3.5.3 Sphericity Test Applications .................................................................................... 352
3.5.4 The Spiked Model (Johnstone 2001, 2006) .............................................................. 356
3.5.5 Phase Transition and Tracy-Widom Distribution..................................................... 356
3.6 Research Objectives ................................................................................ 358
3.6.1 Research Objective 1 ................................................................................................ 358

3.6.2 Research Objective 2 ................................................................................................ 358
3.6.3 Research Objective 3 ................................................................................................ 358
3.6.4 Research Objective 4 ................................................................................................ 359
3.6.5 Research Objective 5 ................................................................................................ 359
3.7 Research Methodology ............................................................................ 359
3.7.1 Assumptions ............................................................................................................. 360
3.7.2 Analysis of Literature Review .................................................................................. 360
3.7.3 Procedure for Conducting a PCA for Construction Project Network Schedules ..... 361
3.7.4 Methodology Conclusion ......................................................................................... 362
3.8 Simulation Results .................................................................................. 362
3.8.1 Analysis of PCs Based on Graphical Analysis and Sample Variabilities ................ 363
3.8.2 PCA Based on Hypothesis Testing........................................................................... 371
3.8.3 Phase Transition ....................................................................................................... 373
3.9 Conclusions and Contributions to the Body of Knowledge ......................... 373
3.9.1 Research Contribution 1 ........................................................................................... 374
3.9.2 Research Contribution 2 ........................................................................................... 374
3.9.3 Research Contribution 3 ........................................................................................... 374
3.9.4 Research Contribution 4 ........................................................................................... 375
3.9.5 Recommendations for Future Research .................................................................... 375
3.10 Conclusion ........................................................................................ 375
References ..................................................................................................... 376

CHAPTER 4 Summary and Conclusions ............................................................................... 381


Chapter Summary ........................................................................................... 381
4.1 Conclusions and Contributions to the Body of Knowledge ......................... 381
4.1.1 Introduction (Chapter 1) ........................................................................................... 381
4.1.2 An Investigation of the Underlying Behavior of Construction Project Network
Schedules (Chapter 2) ............................................................................................................ 381
4.1.3 Application of PCA for Data Reduction in Modeling Project Network Schedules Based
on the Universality Concept in RMT (Chapter 3) .................................................................. 382

vii

This content downloaded from


182.255.0.242 on Wed, 19 Mar 2025 09:44:41 UTC
All use subject to https://about.jstor.org/terms
4.2 Recommendations for Future Research ..................................................... 383
4.2.1 An Investigation of the Underlying Behavior of Construction Project Network
Schedules (Chapter 2-Recommendations) ............................................................................. 383
4.2.2 Application of PCA for Data Reduction in Modeling Project Network Schedules Based
on the Universality Concept in RMT (Chapter 3-Recommendations)................................... 384
4.2.3 Conclusion ................................................................................................................ 385

APPENDIXES ........................................................................................................................... 386


Appendix A Original PSPLIB Files ....................................................................................... 387
Appendix A.1: PSPLIB Files Converted from .sm to .txt Format .......................... 387
Appendix A.2: Original j301_1.sm ................................................................... 401

Appendix B VBA/MATLAB Codes........................................................................................ 403


Appendix B.1: VBA Code ............................................................................... 403
Appendix B.2: MATLAB Code for Network Files’ Activity Entry Computations . 404

Appendix C Flowcharts ............................................................................................................ 408


Appendix C.0: Meanings of Flowchart Symbols ................................................ 408
Appendix C.1: Flowchart - CPM Forward Pass ................................................. 409
Appendix C.2: Flowchart - CPM Backward Pass ............................................... 410
Appendix C.3: Flowchart - Activity Float Calculations ...................................... 411
Appendix C.4: Flowchart - Formatting a Network for Activity Entry Computations
..................................................................................................................... 412
Appendix C.5: Flowchart - Probabilistic Duration Calculations .......................... 413
Appendix C.6: Flowchart - Network Path Determinations .................................. 414
Appendix C.7: Flowchart - Johnson Complexity Measure Calculation ................ 415

Appendix D Code Outputs/Network Activity Information................................................... 416


Appendix D.1: Converted ‘j30_1.sm’ to ‘J30_Tri_j301_1.txt’ ........................... 416
Appendix D.2: PSPLIB Network j3038-7 Input Data ......................................... 417
Appendix D.3: PSPLIB Network j902-4 Input Data ........................................... 418


This content downloaded from


182.255.0.242 on Wed, 19 Mar 2025 09:44:41 UTC
All use subject to https://about.jstor.org/terms
Appendix E Code Outputs/Network ...................................................................................... 421
Appendix E.1: All Paths from Source to Sink of Network j301 .......................... 421
Appendix E.2 – Network Complexity Measures ................................................ 422

Appendix F – Simulation Outputs – Networks’ Population Structure ................................ 455


Appendix F.1 – PSPLIB j30 – Scatterplots of Network's Normalized 1st Eigenvalues
against Sample Size (Networks j3024-8 and j3032-4) ........................................ 456
Appendix F.2 – Plots of Deviations ∆μ,100 versus Sample Size n Required to Construct
the Matrix Xn×p (1st Largest Eigenvalues) ........................................................ 457
Appendix F.3 – Plots of Deviations ∆σ²,100 versus Sample Size n Required to
Construct the Matrix Xn×p (1st Largest Eigenvalues) ......................................... 460
Appendix F.4 – Outputs of Means’ and Variances’ Deviations for the Set of j60
Networks ........................................................................................................ 463
Appendix F.5 – Outputs of Deviations ∆μ,100 and Slopes for Network j3032-4 .. 465
Appendix F.6 – Outputs of Deviations ∆μ,100 and Slopes for Network j3032-4 .. 466
Appendix F.7 – Outputs of KS Testing (2nd Largest Eigenvalue) ........................ 467
Appendix F.8 – Outputs of KS Testing (3rd Largest Eigenvalue) ........................ 469
Appendix F.9 – Outputs of KS Testing – Illustration of Untreated Results (1st Largest
Eigenvalue – Norm I – j90) ............................................................................. 471
Appendix F.10 – Q-Q Plots (3rd Largest Eigenvalue) ......................................... 472

Appendix G Different Formulations of the Sample Covariance Matrix Sp×p ................ 474

List of Figures
Chapter 1 – Introduction.............................................................................................................. 1
Figure 1.1: Dissertation Flowchart ...................................................................... 4
Figure 1.2: Gantt Chart Schedule for a Concrete Wall Construction ...................... 9
Figure 1.3: AON and AOA Network Diagrams .................................................... 10
Figure 1.4: Network Schedule with PERT Calculations ....................................... 12
Figure 1.5: General Representation of a CPM Activity Node ................................ 16
Figure 1.6: General Depiction of CPM Logic Constraints in an AON Diagram ...... 19
Figure 1.7: Vector Representation, Vector Summation......................................... 23
Figure 1.8: 2D and 3D Analytical Representations of Vector Components ............ 24
Figure 1.9: Dot Product, Vector Product, and Projection ..................................... 27
Figure 1.10: Illustration of a Population and its Sample Values ........................... 43
Figure 1.11: Illustrations of Histogram and Cumulative Distribution Graphs .......... 54
Figure 1.12: Illustration of a Q-Q Plot: Normal Distribution ................................ 59
Figure 1.13: Illustrations of P-P Plots in Genetics and Hydrologic Engineering .... 61
Figure 1.14: Illustration of the Percentage Points for a Chi-Square CDF ............... 73
Figure 1.15: Euclidean and Statistical Distance Illustrations ................................ 81
Figure 1.16: Probability Distribution and Mass (resp. Density) Functions Illustrations
for Discrete (resp. Continuous) Random Variables .............................................. 88
Figure 1.17: Uniform Probability Density and Distribution Functions’ Graphs ...... 92
Figure 1.18: Triangular Probability Density and Distribution Functions’ Graphs ... 94
Figure 1.19: Normal Probability Density and Distribution Functions’ Graphs ....... 96
Figure 1.20: Beta Probability Density and Distribution Functions’ Graphs ............ 99

Chapter 2 - An Investigation of the Underlying Behavior of Construction Project Network


Schedules.................................................................................................................................... 122
Figure 2.1: Tracy-Widom Distribution with Phase Transitions ........................... 128
Figure 2.2: Illustrations of The Wigner Semicircle Law: Random Matrices with
Normally and Uniformly Distributed Entries..................................................... 159

Figure 2.3: Marchenko-Pastur Law of Density Function g .................................. 162
Figure 2.4: Joint Density Functions f1, f2, and f4 of the Largest Eigenvalues
Associated with the TW Laws F1, F2, and F4 .................................................... 166
Figure 2.5: Illustrations of Theoretical Limiting Densities for the 1st (Rightmost Curve)
through 4th (Leftmost Curve) Largest Eigenvalue of 10⁴ Realizations of 10³ × 10³ GOE
Matrices ......................................................................................................... 174
Figure 2.6: Illustration of the Left and Right Tail Behavior of the TW Fβ ........... 177
Figure 2.7: Overall Research Methodology for Chapter 2 .................................. 182
Figure 2.8: Chapter 2’s Algorithm ................................................................... 184
Figure 2.9: Exemplar Activity-on-Node Diagram .............................................. 195
Figure 2.10: PSPLIB J301-1 Activity-On-Node Diagram ................................... 196
Figure 2.11: Exemplar Network: Activity Probabilistic Duration Plots ............... 200
Figure 2.12: Network Structure Representations ............................................... 211
Figure 2.13: Illustration of Activity Predecessors and Successors on a Network
Dependency Matrix ......................................................................................... 218
Figure 2.14: Different Schemes for Encoding a Project Network Schedule into a
Sample Data Matrix ........................................................................................ 227
Figure 2.15: Distributions of Network Complexity Measures ............................. 245
Figure 2.16: Scatterplots of Normalized 1st Largest Eigenvalues Versus X's n Rows
..................................................................................................................... 258
Figure 2.17: Deviations between Means of the Assumed PDF and Empirical PDF of a
Set of j30 Project Network Schedules' Largest Eigenvalues .............................. 263
Figure 2.18: Plots of Deviations between Variances of the Assumed PDF and
Empirical PDF of a Set of j30 Project Network Schedules' Largest Eigenvalues . 267
Figure 2.19: Illustrations of the K-S test Results for all Project Networks .......... 285
Figure 2.20: Q-Q Plots of Networks j3011-1 (Norm I) and j3038-5 (Norm II) ..... 289
Figure 2.21: Histograms of Networks j3011-1 (Norm I) and j3038-5 (Norm II) ... 290
Figure 2.22: Q-Q Plots of Networks j6015-1 (Norm I) and j12024-2 (Norm II) ... 291
Figure 2.23: Histograms of Networks j6015-1 (Norm I) and j12024-2 (Norm II) . 292

Chapter 3 - Application of PCA for Data Reduction in Modeling Project Network Schedules
Based on the Universality Concept in RMT ........................................................................... 311
Figure 3.1: Illustration of a Vector Maximizing the ith Population PC ................ 323
Figure 3.2: Illustration of Coefficient Vectors Maximizing the 1st and ith Sample
..................................................................................................................... 331
Figure 3.3: Geometric Illustration of the Sample Principal Components ............. 339
Figure 3.4: Illustration of a Scree Plot ............................................................. 340
Figure 3.5: Elements of Fitting a Model to Data Using Least Square Estimates ... 345
Figure 3.6: Illustration of Residual Vector (PCA) ............................................. 346
Figure 3.7: Illustration of Mauchly's (1940) Sphericity Test .............................. 354
Figure 3.8: Illustration of Sphericity Tests Based on TW F1 p-Values ................ 355
Figure 3.9: Illustrations of Transition between Two Distinct Phases–Strong and Weak
..................................................................................................................... 357
Figure 3.10: Scree Plot for Network j3037-6 (Norm II) ..................................... 365
Figure 3.11: Scree Plot for Network j6028-9 (Norm II) ..................................... 365
Figure 3.12: Scree Plot for Network j9010-5 (Norm II) ..................................... 366
Figure 3.13: Scree Plot for Network j12014-1 (Norm II) .................................... 367

List of Tables

Chapter 1 – Introduction.............................................................................................................. 1
Table 1-1: Activity Float Types and their Mathematical Equations ....................... 18
Table 1-2: Logic Constraints for Link Connections between Activities in a CPM
Network Schedule ............................................................................................. 20
Table 1-3: Tabular Presentation of n Measurements on p Variables ...................... 49
Table 1-4: Computations for Constructing a Q-Q Plot: Normal Distribution .......... 59
Table 1-5: Kolmogorov-Smirnov Test - Critical Values between Data Sample and
Hypothesized CDFs .......................................................................................... 78
Table 1-6: Illustrations of n and p-Values in Various Fields of Applications of
Multivariate Analysis ...................................................................................... 104
Table 1-7: Decision Rules When Testing H1 to Reject H0 Given α ...................... 109
Table 1-8: Type I Error versus Type II Error .................................................... 110

Chapter 2 - An Investigation of the Underlying Behavior of Construction Project Network


Schedules.................................................................................................................................... 122
Table 2-1: Summary of a Wishart Real Model ................................................... 156
Table 2-2: Components of the Marchenko-Pastur Law ..................................... 160
Table 2-3: Statistical Properties of F1/F4 for Various k Values ........................... 180
Table 2-4: Statistical Properties of F2 for Various k Values ............................... 180
Table 2-5: Conceptual Analogy: The Tracy-Widom Distribution Laws–A Synthesis
with References .............................................................................................. 186
Table 2-6: Conceptual Analogy: Applications of the Tracy-Widom Distributions 187
Table 2-7: Conceptual Analogy: Universality Summary of the Tracy-Widom
Distribution Laws ........................................................................................... 189
Table 2-8: Conceptual Analogy: Construction Scheduling Theory ...................... 190
Table 2-9: Conceptual Analogy: Construction Scheduling Theory and Applications
..................................................................................................................... 191

Table 2-10: Benchmark Schedule Information ................................................... 194
Table 2-11: Exemplar Network Activity Probabilistic Durations and Information 198
Table 2-12: Exemplar Network Dependency Matrix .......................................... 202
Table 2-13: All Paths from Source to Sink of the Exemplar Network .................. 206
Table 2-14: All Critical Paths of the Exemplar Network for 100 Simulations ...... 207
Table 2-15: All Critical Paths of Network j301 for 100 Simulations ................... 208
Table 2-16: Interpretation of the Complexity Measure (Cn) Values .................... 219
Table 2-17: Exemplar Network Reachability Matrix .......................................... 223
Table 2-18: Illustration of a Sample Data Matrix Derived from Early Finish Times of
Project Network Activity ................................................................................. 229
Table 2-19: A Project Network Scheduling Mathematical Model ........................ 231
Table 2-20: Normalization Methods for Scaling the mth Eigenvalue of Sample
Covariance Matrix SNET ................................................................................... 236
Table 2-21: Summary of Complexity Measure Computations ............................. 243
Table 2-22: Complexity Measure Values of a Few Networks ................................. 252
Table 2-23: Identified Benchmark Networks for Underlying Behavior Study ....... 254
Table 2-24: Sample Sizes and Numbers of Data Points Required for Project Networks
..................................................................................................................... 255
Table 2-25: Optimal Sample Size Predictions for all Networks of Interest .......... 271
Table 2-26: Optimum Sample Sizes of Networks of Equal Complexities ............. 272
Table 2-27: Effect of α on the Optimal Sample Size of Network j3032-4 ............ 273
Table 2-28: A Few Percentage Points of the Kolmogorov-Smirnov Test Statistics . 277
Table 2-29: Kolmogorov-Smirnov Test of Goodness of Fit for the 1st Largest
Eigenvalues of Project Networks ..................................................................... 279
Table 2-30: K-S Test of Goodness of Fit for the 4th Largest … .......................... 280
Table 2-31: K-S Test of Goodness of Fit – All Test Results with Norm I… ......... 283
Table 2-32: K-S Test of Goodness of Fit – All Test Results with Norm II… ....... 284

Chapter 3 - Application of PCA for Data Reduction in Modeling Project Network Schedules
Based on the Universality Concept in RMT ........................................................................... 311

Table 3-1: Identified Project Networks for PCA ................................................ 363
Table 3-2: Principal Components of the Project Network j3037-6 (Size 49 x 32) . 368
Table 3-3: Principal Components of the Project Network j3037-6 (Size 52 x 32) . 369
Table 3-4: Principal Components of the Project Network j6028-9 ...................... 369
Table 3-5: Principal Components of the Project Network j9010-5 ...................... 370
Table 3-6: Principal Components of the Project Network j12014-1 ..................... 371
Table 3-7: p-Values of the Tracy-Widom Distribution ....................................... 372
Table 3-8: Threshold Value for a Phase Transition (Baik et al. 2005) ................. 373

CHAPTER 4 - Summary and Conclusions............................................................................. 381

List of Equations
Chapter 1 – Introduction.............................................................................................................. 1
Equation 1.1: Expected Mean Duration General Expression ................................. 13
Equation 1.2: General PERT Expected Standard Deviation .................................. 13
Equation 1.3: PERT Expected Standard Deviation for α=0 ................................... 13
Equation 1.4: General Expression of the Standard Deviation and Variance ........... 14
Equation 1.5: Vector Notation as a Function of its Coordinates in 2D ................... 24
Equation 1.6: Vector Length and Direction as a Function of its Coordinates in 2D 24
Equation 1.7: Vector Notation as a Function of its Coordinates in 3D ................... 24
Equation 1.8: Vector Length and Direction as a Function of its Coordinates in 3D 25
Equation 1.9: Vector Summation ........................................................................ 25
Equation 1.10: Vector Product with a Scalar ....................................................... 25
Equation 1.11: Vector Product ........................................................................... 26
Equation 1.12: Vector Product Equation in 2D .................................................... 26
Equation 1.13: Vector Product Equation in 3D .................................................... 27
Equation 1.14: Linearly Dependent Vectors ........................................................ 28
Equation 1.15: Linearly Independent Vectors ...................................................... 28
Equation 1.16: General Term of a Matrix ........................................................... 29
Equation 1.17: Identity Matrix ........................................................................... 29
Equation 1.18: Zero Matrix ............................................................................... 30
Equation 1.19: Triangular and Diagonal Matrix Representations .......................... 30
Equation 1.20: Equal Matrices ........................................................................... 31
Equation 1.21: Product Matrix by a Constant ...................................................... 31
Equation 1.22: Matrix Product ........................................................................... 32
Equation 1.23: Determinant Expression of an n × n Matrix ............................... 33
Equation 1.24: Determinant Expression of a 2 × 2 Matrix .................................. 33
Equation 1.25: Singular Matrix Condition .......................................................... 34
Equation 1.26: Inverse Matrix Condition ............................................................ 35
Equation 1.27: Entries of the Inverse Matrix of a Given Matrix A ........................ 35

Equation 1.28: Inverse of a 2 × 2 Matrix ........................................................... 35
Equation 1.29: Inverse of a 3 × 3 Matrix ........................................................... 36
Equation 1.30: Trace of a Square Matrix ............................................................ 36
Equation 1.31: Eigenvalue and Eigenvector Relationship Equation ....................... 37
Equation 1.32: Spectral Decomposition of an n x n Matrix .................................. 37
Equation 1.33: Determinant Representation of a Matrix Characteristic Equation ... 38
Equation 1.34: Polynomial Representation of a Matrix Characteristic Equation ..... 38
Equation 1.35: Eigenvalues of a Triangular Matrix ............................................. 38
Equation 1.36: Eigenvalues of the Power Aᵏ of a Matrix...................................... 38
Equation 1.37: Determinant of a Matrix in Terms of its Eigenvalues ........................ 38
Equation 1.38: Trace of a Matrix as a Function of its Eigenvalues........................ 38
Equation 1.39: Orthogonal Matrix Reciprocal Results ......................................... 39
Equation 1.40: Matrix Expansion for Singular Value Decomposition .................... 40
Equation 1.41: Singular Value Decomposition – Diagonal Matrix Λ Entries ............ 40
Equation 1.42: Singular Value Decomposition of a Matrix in Terms of its Rank .... 40
Equation 1.43: Product Matrix AAᵀ as a Function of its Eigenvalues/Eigenvectors 40
Equation 1.44: Conditions on Singular Value Decomposition Positive Coefficients .. 41
Equation 1.45: Singular Value Decomposition Left and Right Singular Vectors .......... 41
Equation 1.46: Quadratic Form of a Symmetric and Positive Definite Matrix ........ 41
Equation 1.47: Matrix Notation: Array Representing n Measurements on p Variables
....................................................................................................................... 49
Equation 1.48: Cumulative Relative Frequency .................................................. 52
Equation 1.49: Q-Q Plot: Quantile of the Theoretical CDF .................................. 56
Equation 1.50: Q-Q Plot: Hypothesized Quantiles Corresponding to Observed CDFs
....................................................................................................................... 57
Equation 1.51: Probability Levels for a Standard Normal Random Variable ........... 58
Equation 1.52: Theoretical Cumulative Probability for Use in P-P Plots ............... 60
Equation 1.53: Sample Mean ............................................................................. 62
Equation 1.54: Sample Median .......................................................................... 62

Equation 1.55: Sample Variance Expression ....................................................... 63
Equation 1.56: Sample Skewness ....................................................................... 64
Equation 1.57: Sample Kurtosis ......................................................................... 64
Equation 1.58: Sample Mean x ........................................................................... 66
Equation 1.59: Sample Variance ........................................................................ 66
Equation 1.60: Sample Covariance ..................................................................... 67
Equation 1.61: Sample Correlation Coefficient ................................................... 67
Equation 1.62: Chi-Square Test: Observed CDF Expression ................................. 71
Equation 1.63: Chi-Square Test: Hypothesized CDF Expression ........................... 71
Equation 1.64: Chi-Square Test: Test Statistic Q² ............................................... 71
Equation 1.65: Chi-Square Test – Rejection/Critical Region Criterion .................. 72
Equation 1.66: K-S Test: Test Statistic D² .......................................................... 75
Equation 1.67: Kolmogorov-Smirnov – Rejection/Critical Region Criterion .......... 76
Equation 1.68: Kolmogorov-Smirnov Test – Observed CDF Expression ................. 77
Equation 1.69: K-S Test – Differences between Empirical and Assumed CDFs ...... 77
Equation 1.70: Straight-Line Distance and its Generalization ............................... 80
Equation 1.71: Statistical Distance of a Pair of Coordinate Variables ................... 82
Equation 1.72: Cumulative Probability Function of a Discrete Variable ................ 84
Equation 1.73: CDF Properties of a Discrete Random Variable ............................ 85
Equation 1.74: Probability Mass Function of a Random Variable ......................... 85
Equation 1.75: Expected Value of a Discrete Variable ......................................... 86
Equation 1.76: rth Moment of a Discrete Random Variable .................................. 86
Equation 1.77: Variance and Standard Deviation of a Random Variable ................ 86
Equation 1.78: Probability Distribution and Density Functions of a Continuous
Random Variable .............................................................................................. 88
Equation 1.79: Expected Value of a Continuous Variable .................................... 89
Equation 1.80: rth Moment of a Continuous Random Variable ............................. 89
Equation 1.81: Variance of a Continuous Random Variable ................................. 89
Equation 1.82: Skewness of a Continuous Random Variable ................................ 90

Equation 1.83: Kurtosis of a Continuous Random Variable .................................. 90
Equation 1.84: PDF, CDF, Mean, and Variance of a Uniform Distribution ............ 92
Equation 1.85: PDF, CDF, Mean, and Variance of a Triangular Distribution ......... 94
Equation 1.86: PDF, CDF, Mean, and Variance of a Normal Distribution ............. 95
Equation 1.87: Standard Normal Random Variable .............................................. 95
Equation 1.88: PDF, CDF, Mean, and Variance of a Beta Distribution .................. 98
Equation 1.89: Variance and Standard Deviation of a Random Variable ................ 98
Equation 1.90: PDF, Mean, and Variance of a Chi-Square Distribution ............... 100
Equation 1.91: Joint Distribution Functions for Random Variables ..................... 102
Equation 1.92: Joint Distribution Functions for Independent Variables ............... 102
Equation 1.93: Covariance of a Pair of Random Variables ................................. 103
Equation 1.94: Standard Deviation of a Random Variable .................................. 111
Equation 1.95: Expression of the p-Value ......................................................... 111

Chapter 2 - An Investigation of the Underlying Behavior of Construction Project Network


Schedules.................................................................................................................................... 122
Equation 2.1: Average Joint PDF of Coulomb Gas Particles ............................... 136
Equation 2.2: Joint Density Functions Pβ of Eigenvalues of Gaussian β-Ensemble
Matrices ......................................................................................................... 137
Equation 2.3: Quaternionic Entries of a Symplectic Matrix Q ............................. 140
Equation 2.4: Expression of a Quaternionic Matrix ............................................ 140
Equation 2.5: Sample Covariance Matrix S ....................................................... 142
Equation 2.6: General Sample Covariance Matrix SΣ ......................................... 142
Equation 2.7: Density Function of the Wishart Distribution Wp(n, Σ) .................. 143
Equation 2.8: Spectral Decomposition of the Sample Covariance Matrix S .......... 144
Equation 2.9: SVD of a Matrix X .................................................................... 144
Equation 2.10: General Joint Density Function of the Eigenvalues of S .............. 145
Equation 2.11: Multivariate Gamma Function ................................................... 145
Equation 2.12: General Form of a Joint Density Function of Eigenvalues ........... 145
Equation 2.13: Spectral Decomposition of Σ ..................................................... 145

Equation 2.14: Population Covariance Matrix ................................................... 147
Equation 2.15: Random Matrix Representation Σ .............................................. 148
Equation 2.16: Expression of a Sample Data Matrix X ....................................... 148
Equation 2.17: Quadratic Form of a Symmetric and Positive Definite Matrix ...... 149
Equation 2.19: Unbiased Sample Covariance Matrix S....................................... 149
Equation 2.20: Sample Correlation Matrix R .................................................... 150
Equation 2.21: Relation between the Sample Correlations R and Variances S ...... 150
Equation 2.22: Sample Data Matrix A .............................................................. 152
Equation 2.23: Joint Density of n Independent Samples ..................................... 152
Equation 2.24: Expression of the Sampling Mean of n Independent Samples ....... 153
Equation 2.25: Sampling Covariance Matrix (n-1)S ........................................... 153
Equation 2.26: Null Case: Simplified Integral Expression .................................. 155
Equation 2.27: Null Case: Density Function of Eigenvalues of S ........................ 155
Equation 2.28: Empirical Spectral Distribution (ESD) of a Random Matrix ......... 157
Equation 2.29: Wigner Semicircle Law, General Expression .............................. 158
Equation 2.30: Wigner Semicircle Law Defined on [-2, 2] .................................. 158
Equation 2.31: Wigner Semicircle Law on [-1, 1] .............................................. 159
Equation 2.32: Marchenko-Pastur Law ............................................................. 161
Equation 2.33: Asymptotic Behavior of the Largest Eigenvalue in Terms of the
Wigner Semicircle Law and Marchenko-Pastur Distributions’ Top Edge Support . 164
Equation 2.34: Distribution Function for the Largest Eigenvalue of a GOE/GSE/GUE
Matrix A ........................................................................................................ 165
Equation 2.35: Limiting Distribution the Largest Eigenvalue of a GOE/GSE/GUE
Matrix A ........................................................................................................ 165
Equation 2.36: Theoretical Expressions of the Tracy-Widom Distribution Laws F1,
F2, and F4 ....................................................................................................... 165
Equation 2.37: Painlevé II Equation Related to the TW Laws ............................. 165
Equation 2.38: Solution of the Painlevé II Equation in Terms of the Airy Equation
..................................................................................................................... 165

Equation 2.39: Johnstone's (2001) Centering and Scaling Constants μnp and σnp .... 167
Equation 2.40: Johnstone's (2001) Celebrated Theorem ....................................... 167
Equation 2.41: Johnstone's (2006) Ad Hoc Centering and Scaling Constants μnp and σnp
..................................................................................................................... 168
Equation 2.42: Soshnikov's (2002) Theorem ....................................................... 169
Equation 2.43: Expression of the Rescaled m th Eigenvalue of a GOE, GUE, or GSE
Matrix A ........................................................................................................ 171
Equation 2.44: Convergence of the Largest Eigenvalue of a GOE, GUE, or GSE
Matrix A ........................................................................................................ 171
Equation 2.45: Law of the m th Largest Eigenvalue in GOE as a Recurrence of F2s, m
..................................................................................................................... 172
Equation 2.46: Expression of the Argument D2s, λ as a Recurrence of F2s, m ....... 172
Equation 2.47: Law of the mth Largest Eigenvalue in GUE/GSE as a Recurrence of
Fβs, m with β = 1, 4 ......................................................................................... 173
Equation 2.48: Interlacing Property Between GOE/GSE in terms of F1 and F4 .... 173
Equation 2.49: Right Tail of the 𝛽-TW Distribution (Dumaz and Virág 2013) ..... 177
Equation 2.50: Approximation of the TW Equation: q0 ....................................... 178
Equation 2.51: Approximation of the TW Equation: q1 ....................................... 178
Equation 2.52: Criticality Index of an Activity ................................................. 209
Equation 2.53: Pascoe’s Network Complexity Measure ...................................... 215
Equation 2.54: Ratio of Network Paths ............................................................. 215
Equation 2.55: Johnson Network Complexity Measure ...................................... 216
Equation 2.56: Redundant Link Measure .......................................................... 219
Equation 2.57: Cn Network Complexity Measure ............................................... 220
Equation 2.58: Complexity Measure-Density (OS) ............................................ 221
Equation 2.59: RT Network Complexity Measure .............................................. 222
Equation 2.60: Standardized Sample Data Matrix .............................................. 232
Equation 2.61: Random Chi-Square Matrix R ................................................... 233
Equation 2.62: Constant Defining the Random Matrix R .................................... 233

Equation 2.63: Standardized Sample Data Matrix .............................................. 233
Equation 2.64: Sample Covariance Matrix (S) for Probabilistic Durations of a Project
Network ......................................................................................................... 234
Equation 2.65: Definition of the Constant c in the Expression of the Sample
Covariance Matrix S ....................................................................................... 234
Equation 2.66: Deviations between the Empirical Mean/Variance of the mth
Normalized Largest Eigenvalues and TW1 Mean/TW1 Variance ......................... 240
Equation 2.67: Hypothesis Formulation for Distributional Assumption Testing ... 277

Chapter 3 - Application of PCA for Data Reduction in Modeling Project Network Schedules
Based on the Universality Concept in RMT ........................................................................... 311
Equation 3.1: General Formulation of Population Principal Components ............ 322
Equation 3.2: Variance and Covariance of Population Principal Components ...... 322
Equation 3.3: ith Population PC (Result 1) ....................................................... 323
Equation 3.4: Variance and Covariance of Population PCs (Result 1) ................. 323
Equation 3.5: Link Between the Population Covariance Matrix and PCs ............... 324
Equation 3.6: Proportion of the Total Population Variance Due to k-th PC .......... 324
Equation 3.7: Correlation Coefficients between the Population PCs Y and Random
Variables X .................................................................................................... 325
Equation 3.8: PCs for Standardized Variables ................................................... 326
Equation 3.9: PCs for Standardized Population Variables (Matrix Notation) ....... 326
Equation 3.10: Covariance of Standardized Population PCs Z ............................ 326
Equation 3.11: ith Standardized PC (Result 4) .................................................. 327
Equation 3.12: Link Between Variances of Standardized Variables and PCs (Result 4) ... 327
Equation 3.13: Correlation Coefficients between Standardized Variables’ PCs and
Original Random Variables ............................................................................. 327
Equation 3.14: Proportion of the Total Standardized Population Variance Due to kth
PC ................................................................................................................. 328
Equation 3.15: Covariance Matrices with Special Structures (Example 1) ........... 328
Equation 3.16: Covariance Matrices with Special Structures (Example 2) ........... 329

Equation 3.17: PCs Information for Special Structures (Example 2) ................... 330
Equation 3.18: Quadratic Maximization of a Sample Covariance Matrix ............. 332
Equation 3.19: Elements Defining Sample Principal Components ....................... 332
Equation 3.20: Principal Components of Centered Sample Observations ............. 333
Equation 3.21: Standardization of a Sample Data Matrix (Example) ................... 334
Equation 3.22: Sample Mean Vector and Covariance Matrix of Standardized Data
(Example) ...................................................................................................... 334
Equation 3.23: ith Sample Principal Component of Standardized Observations ... 335
Equation 3.24: Proportion of the Total Standardized Sample Variance Due to kth PC
..................................................................................................................... 335
Equation 3.25: PCA in Terms of SVD .............................................................. 336
Equation 3.26: ith Principal Component Based on Geometric Interpretation ........ 337
Equation 3.27: Mahalanobis Distance from the Sample Mean ............................. 337
Equation 3.28: jth Observation Expressed as a Linear Combination of the Complete
Set of Eigenvectors of The Sample Covariance Matrix ...................................... 341
Equation 3.29: Effect of the Last PCs’ Magnitudes on the Data Predictions ........ 341
Equation 3.30: Multiple Linear Regression Model ............................................. 343
Equation 3.31: Linear Regression Assumptions ................................................. 343
Equation 3.32: Matrix Notation of a Multiple Linear Regression Model.............. 343
Equation 3.33: Residual Sum of Squares S (General Formula) ........................... 344
Equation 3.34: Vector of Residuals ε ................................................................ 345
Equation 3.35: Regression Model in Terms of PCs ............................................ 346
Equation 3.36: Residuals’ Covariance Matrix (PCA) ......................................... 347
Equation 3.37: Sphericity Test (PCA) .............................................................. 348
Equation 3.38: Mauchly’s Sphericity Test Statistic ........................................... 349
Equation 3.39: Mauchly’s Sphericity Test in Terms of Critical Region ............... 350
Equation 3.40: Sphericity Test: Johnstone (2001)’s Theorem ............................. 350

CHAPTER 4 - Summary and Conclusions............................................................................. 381

List of Abbreviations

SK Sherrington-Kirkpatrick
ADM Arrow diagramming method
AOA Activity-on-Arrow
AON Activity-on-Node
CDF Cumulative distribution function
CI Complexity index
CNC Coefficient of network complexity
CPM Critical Path Method
EF Early Finish
ES Early Start
ESD Empirical Spectral Distribution
FF Free float
FS Finish-to-start
FTS Finish-to-start
GOE Gaussian Orthogonal Ensemble
GPM Graphical Planning Method
GSE Gaussian Symplectic Ensemble
GUE Gaussian Unitary Ensemble
KLT Karhunen-Loève Transform
LDM Logic Diagramming Method
LF Late finish
LS Late start
LSM Linear scheduling method
NHST Null Hypothesis Significance Testing
OE Orthogonal ensemble
OS Order of Strength
PCA Principal Component Analysis

PDF Probability density function
PDT Project development team
PERT Program Evaluation and Review Technique
PSPLIB Project networks from the Project Scheduling Problem Library
RMM Random matrix model
RMT Random matrix theory
SE Symplectic ensemble
SF Start-to-finish
SVD Singular value decomposition
TF Total float
TW Tracy-Widom
UE Unitary ensemble
VBA Visual Basic for Applications
WBS Work breakdown structure

Acknowledgment

First and foremost, I would like to convey my heartfelt gratitude to my adviser, Dr. Gunnar Lucko,

who has guided, encouraged, and supported me.

I would like to convey my sincere appreciation to Dr. Richard C. Thompson, Jr., and Dr. Bismark

Agbelie, members of my committee. I appreciate their guidance and advice.

I am grateful to the Department of Civil Engineering at The Catholic University of America for

allowing me to remotely access many computers required to run simulations at the same time.

Finally, I thank God for the intelligence and patience He has bestowed upon me to complete my

dissertation!

Introduction

Abstract

This chapter aims to introduce the current research study and other background materials required

to understand the development of advanced concepts in subsequent chapters. The materials

primarily address conventional project scheduling strategies based on deterministic and

probabilistic approaches. In addition, background materials address concepts from applied

mathematics and probability. This chapter begins by introducing projects and their associated

planning and scheduling approaches. Following that, this chapter discusses vectors and matrices

in general. The chapter next discusses descriptive and inferential statistics in greater detail. It

then covers the prerequisites for probability distributions and random variables.

Additionally, it gives context for multivariate data and their analysis. Finally, the chapter

concludes with a synopsis of each subsequent chapter.


1.1 Background

For decades, practitioners in the construction industry have struggled with project delivery delays.

The risk variables that contribute to their occurrence have been thoroughly explored. For example,

Tafazzoli and Shrestha (2017) discovered fourteen primary reasons for delays in the United States

alone after surveying more than 10,000 construction specialists participating in various projects

with multiple delivery systems and ownerships. Among them, the authors cited "unrealistic

schedules (bid durations that are too short)" and modification orders, both of which have been

linked to increased schedule growth and delayed construction projects in numerous studies

(Tafazzoli and Shrestha 2017, pp. 144 and 117). To address project schedule delays, researchers

have undertaken extensive analysis using a variety of approaches to uncover delay drivers (Stumpf

2000, Lucko et al. 2018, Bagaya and Song 2016) and propose remedies (Youcef and Andersen

2018). On the one hand, several authors have developed methods based on traditional

deterministic and probabilistic Critical Path Method (CPM) scheduling techniques. On the other

hand, others (e.g., BD+C Staff 2018) have investigated artificial intelligence to forecast the future

condition of construction scheduling techniques.

Additionally, resiliency in project schedules has been a goal in resolving schedule-related delays.

Han and Bogus (2011) defended their use of these approaches: "A disrupted schedule becomes

resilient when the construction workers, instead of sitting idle, are reassigned to other tasks they

otherwise were not expecting." Thus, to incorporate resiliency into construction schedules, one

must account for resource availability and interactions between the various parties involved in the

construction activities. However, due to the unpredictable nature of these interactions, particularly


as the number of activities increases, analyzing construction network schedules becomes like

studying the Cuernavaca bus problem, which exhibits a ubiquitous pattern known as universality

(Baik et al. 2006). As a result, to assist in resolving persistent delays and mitigating their

consequences during the design and construction of projects, the current study intends to adapt and

adopt methodologies based on the Tracy-Widom distribution laws, which scholars observed

emerging in "[s]ystems of many interacting components" (Wolchover 2014, p. 2).

Furthermore, because network schedules are composed of construction activities, the Tracy-

Widom limit laws may aid in elucidating the underlying behavior of their complex interactions.

If so, this may help prevent delays, improve on-time delivery using resilient schedules, and propose

remedies that may aid in making more accurate forecasts of project duration and cost. Figure 1.1

depicts the operations that must be performed to accomplish the primary objective of this research

study, which is to develop realistic solutions to the delay problem in the construction of projects based

on their project network schedules.


Figure 1.1 Dissertation Flowchart



1.2 Network Schedules

This section will provide an overview of the elements employed in construction network schedules

given the scope of this study. The following development will place the greatest emphasis

on the time or duration of each activity involved with the construction of projects.

1.2.1 Definitions

Project and Activities

Harris (2006) defines a project as a set of operations or activities that need to be completed in

logical order. These activities, taken together, define the project goal, yet they cannot be performed

in an arbitrary order (Adeli and Karim 2001). Accordingly, all project activities must

be carefully planned by considering other project activities and constraints. For instance, in vertical

construction, the activity “pour and finish concrete slab” can only occur after the activity

“formwork for concrete” has been completed. Thus, a rigorous schedule of all project

activities and their relationships is necessary to achieve the project goal, which typically involves

major expenses to procure materials and equipment and compensate personnel.

Network

The Free Dictionary by Farlex defines a network, also known as a net, as an interconnected system

of things or people. Depending on the things or people being connected, its meaning varies across

different fields. For instance, in computer science, any system that allows an exchange of

information between interconnected computer systems or akin equipment is termed a network. In

biology (ecosystem), the term network refers to a system allowing random interactions between


species such as people or turtles. In construction management and engineering, project tasks or

activities that need to be executed in a specific order to complete the project successfully are known

as a network. Because their execution sequence is restricted by the project's available resources

and the construction methodology and practical considerations (Adeli and Karim 2001, p. 49), these

activities are methodically interconnected. Hence, project activities form networks.

Construction schedule

A schedule is generally a term used to refer to a plan for either performing work or achieving an

objective by following a specific order to complete each part within the allotted time. For instance,

a bus schedule lists all departure and arrival times of buses. Similarly, a construction schedule is

“traditionally defined as the timetable of the execution of tasks in a project” (Adeli and Karim

2001). Accordingly, it is a crucial document among the contract documents, one that is

designed to define each party’s responsibility to complete the project on time and within budget

and make available the project documents to all parties. Such participants in the construction

industry would be the owner of the project, the contractor or executor of the project, the project's

designer, financial institutions, and the project manager (PM). In addition, the project manager

may also assume the scheduler's role of developing and maintaining the schedule during the

project life cycle. Since scheduling a project is a painstaking task that can become cumbersome

for large projects with hundreds or thousands of activities, software such as Primavera P6 or

Microsoft Project allows schedulers to carry out this task effectively.


1.2.2 Planning a Construction Project

Developing a project schedule requires a collaborative effort between the scheduler in charge of

developing a project schedule and other project team members involved with the project's design

phase. This collaborative effort is crucial for the scheduler to become acquainted with the project

information, which is usually available to all parties involved. The project information includes

the project scope, specifications, plans, project execution plan, contracting and procurement plan,

equipment lists, and operations. Getting technical inputs and feedback from the project

development team (PDT) or subject matter experts is essential to creating sound and resilient

schedules. After understanding the project and collecting the relevant project data such as project

total duration, budget, and applicable construction methods, depending on the project complexity,

a project scheduler would either use software or an Excel spreadsheet to plan and schedule the

project activities by following five main steps.

In step 1, a project is broken down into individual activities. For this step, the work breakdown

structure (WBS) is used to dissect the project’s work into smaller and logical chunks. In step 2,

the duration, predecessors, and successors of each activity are established, and calculations of its

start and finish dates are performed. In step 3, resources such as people, equipment, or materials

for the execution of each activity are allocated. For instance, during this step, also known as

resource leveling when dealing with limited resources, activity start dates may be postponed until

resources become available. In step 4, each activity's actual progress is monitored, and the original

schedule is amended if necessary. In the last step, the consumption of resources is monitored, and

resources needed to complete the project are re-estimated (Harris 2006). Accordingly, planning and


scheduling a project is an elaborate task which can be successfully accomplished through a

collaborative effort between the project stakeholders and a great understanding of the project

intricacy. Since this study uses generated networks as benchmark schedules, planning a complete

project is not required except for schedule calculations performed in step 2 to determine activity

start and finish dates. This topic will be covered in a subsequent section.

1.2.3 Visual Displaying a Construction Project Schedule

1.2.3.1 Gantt or bar Charts

Pioneered by Henry L. Gantt and Frederick W. Taylor in the early 1900s, Gantt charts are diagrams

partitioned into rows and columns drawn to schedule a project or work. Columns are used as a

timescale to represent activity durations which may be expressed in either hours, days, weeks, or

months. In contrast, project activities are scheduled in rows as horizontal bars drawn within the

timeframe of the Gantt chart. As shown in Figure 1.2, each bar's start and endpoints correspond to

the beginning and end of successive events that must be carried out to complete an activity. For

instance, the activity “pour ready mix concrete” starts with the event “delivery of the ready-mix to

the project site,” followed by the event “on-site inspection and testing of the mix” to ensure the

engineered design is met…, and finishes with the event “allow a proper curing of concrete.” The bar

length represents the activity duration. In addition, horizontal bars can overlap, and the end point

of each horizontal bar, describing a project activity, indicates the relationship between that activity

and the following one (Uher and Zantis 2012). Due to their simplicity and ease of understanding,

Gantt charts would be appropriate for small to midsize projects or even portions of a large project.

However, Gantt charts would not be suitable for scheduling large projects that generally possess


many activities and complex relationships between activities. Moreover, according to Adeli and

Karim (2001), they “are not recommended as the primary tools for monitoring and managing

projects.”

[Figure 1.2 shows a bar chart on a timescale of 2 to 12 days for seven activities: approve shop
drawings by owner; mobilization; excavate soil for concrete work; prepare and install rebars and
formwork; pour ready mix concrete; strip forms; and clean up site/demobilization.]

Figure 1.2: Gantt Chart Schedule for a Concrete Wall Construction

1.2.3.2 Network Diagrams

Network diagrams are intensively used in different fields for scheduling or modeling purposes. In

the field of construction engineering and management, project planners and managers widely use

them to represent project schedules. They are composed of nodes, arrows, and numbering or lettering

to express activity resources and activity logic constraints. Depending on whether activities are

represented on nodes or arrows to form a network, the resulting network schedule is either an

activity-on-node (AON) diagram or an activity-on-arrow (AOA) diagram. For an AON diagram,

as illustrated in Figure 1.3(a), nodes and arrows serve to represent respectively activities and

precedence relationships between activities. Conversely, for an AOA diagram, such as the one

shown in Figure 1.3(b), nodes are used to represent events or times of importance while arrows


correspond to activities. A good example of times of importance would be the start time or finish

time of an activity. Although AON and AOA diagrams convey the same information, the AON

diagram is widely used mostly due to its compact appearance and ability to display more

information at once.

(a) AON network (b) AOA network

Figure 1.3: Activity-on-Node and Activity-on-Arrow Network Diagrams
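To make the node-and-link structure concrete, an AON network can be sketched in code as an adjacency list mapping each activity to its successors; this is a minimal illustration, and the activity names and durations below are hypothetical, not those of Figure 1.3.

```python
# Minimal sketch of an AON network as an adjacency list (activity -> successors).
# Activity names and durations are hypothetical, for illustration only.
durations = {"A": 3, "B": 5, "C": 2, "D": 4}
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# Precedence relationships can be recovered by inverting the successor map.
predecessors = {activity: [] for activity in durations}
for activity, succ_list in successors.items():
    for succ in succ_list:
        predecessors[succ].append(activity)

print(predecessors["D"])  # ['B', 'C']
```

Storing only the successor lists keeps the representation compact; predecessor lists, needed for schedule calculations, follow from a single inversion pass.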

1.2.3.3 Other Diagrams

Besides Gantt charts and network diagrams, a few other forms of graphs are used to represent project

activities. Among them is the Logic Diagramming Method (LDM), a hybrid diagramming method

of the precedence diagramming method and arrow diagramming method (ADM). This method is

used in conjunction with a Graphical Planning Method (GPM) to display activity sequences and

their timing and represent connections between activities in intuitive and versatile ways. LDM

activity notation resembles ADM but is defined on a time scale, while logic links have multiple

arrowheads (Ponce de Leon 2008). Another project planning and management technique

practitioners use to display project schedules is the Linear Scheduling Method (LSM). The LSM

uses a coordinate system with a time axis and another axis to indicate the amount of already

completed work (Lucko 2009).


1.2.4 Scheduling a Construction Project

Literature on scheduling techniques used by practitioners in the construction industry suggests that

there are various techniques. Among them is the Program Evaluation and Review Technique

(PERT), developed for the Polaris Missile Project of the U.S. Navy in 1958 (Lucko 2017). Another

technique is the CPM, which is well known and extensively used in the industry. Although Walker

of Du Pont and Kelley of Remington Rand later realized its full potential in the USA, the CPM

was first introduced in Great Britain in the mid-1950s during the construction of a

central electricity-generating complex (Uher and Zantis 2012). As devised, for PERT and CPM

scheduling techniques, project activities' precedence is used to calculate activity dates (a step

performed differently in each technique) and to display project schedules using AON network diagrams.

1.2.5 Network Schedule Structures

Beyond the number of its activities and precedence relationships, a project network can also be

characterized by its topological structure or morphology. Because of the additional information

that can be derived by studying its structure, many scholars have developed various measures to

gauge network structures. These measures allow not only network comparisons but also their

classifications. For instance, Vanhoucke et al. (2008) measured the topological network structures

to evaluate and compare network generators used to generate project scheduling instances. Hence,

its topological structure is excellent information to consider while developing a project schedule.


1.3 Construction Scheduling Techniques

1.3.1 PERT

PERT is a construction scheduling technique based on probabilistic activity durations instead of

deterministic ones. In other words, durations of activities are uncertain and variable rather than fixed

and certain. PERT is equipped with a probability distribution for the activity durations. Schedules

are drawn as AON network diagrams, as shown in Figure 1.4, where each activity carries three

duration values: optimistic, typical, and pessimistic durations. These durations may be determined

at various α-percentiles. However, it can be challenging to find those values. Practitioners find

reliable data from company records, experience, or industry averages. One of the advantages of

using PERT over the CPM scheduling technique is that one can use probability tables to answer the

following question: “What is the probability that the project is done in x days?” (Lucko 2017).

Figure 1.4: Network Schedule with PERT Calculations


Courtesy of Lucko (2017)

The following are durations used for the PERT scheduling technique. The first one is the optimistic

duration (a). An optimistic duration is defined as the duration of an activity if everything works


well. Thus, it represents the minimum possible duration of an activity. In general, a0, a1, and a5

denote the optimistic durations at the 0, 1, and 5 percentiles. It is unlikely that one has experienced

or ever will experience the 0 percentile. As a result, PERT practitioners have considered different percentiles

without revising the estimation equations. The second one is a typical duration denoted as m. It

represents the usual duration of an activity which is the average duration that an activity follows.

Its estimate may vary depending on whether percentile definitions of a and b are employed to find the

optimistic and pessimistic durations. The third one is the pessimistic duration, symbolized by the

letter b. The pessimistic duration is the duration of an activity if nothing works as planned.

Accordingly, it is the maximum duration of an activity and can be denoted b100, b99, b95 at 100, 99,

and 95 percentiles, respectively. The last one is termed the expected mean duration and is

denoted by μα. Given an activity, the expected mean duration μα at the α-percentile is the weighted

average of a, m, and b as provided in Equation 1.1. The expected mean duration of an entire project

is calculated by summing up the expected mean durations of all activities on the critical path.

μα = (aα + 4m + b100−α) / 6 [Eq. 1-1]

Given an α-percentile, for any activity on the network schedule, the expected standard deviation sα

or σα at the α-percentile can be found. As defined in Equation 1.2, σα represents the empirically scaled

difference between a and b. When α is zero, σα is simply denoted by s, which is also given by

Equation 1.3, where K = 3.2 for the 5-percentile and K = 2.7 for the 10-percentile.

σα = (b100−α − aα) / K [Eq. 1-2]

s = σ0 = (b − a) / 6 [Eq. 1-3]
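The estimates in Equations 1.1 through 1.3 can be sketched as follows; this is a minimal illustration, the function names are hypothetical, and the sample activity values (4, 6, and 10 days) are invented for demonstration.

```python
def pert_mean(a, m, b):
    """Expected mean duration: the weighted average (a + 4m + b) / 6 of Eq. 1-1."""
    return (a + 4 * m + b) / 6

def pert_std(a, b, K=6):
    """Expected standard deviation (b - a) / K of Eqs. 1-2 and 1-3:
    K = 6 at the 0-percentile, K = 3.2 for the 5-percentile,
    and K = 2.7 for the 10-percentile."""
    return (b - a) / K

# Hypothetical activity: optimistic 4, typical 6, and pessimistic 10 days.
print(pert_mean(4, 6, 10))  # 6.333...
print(pert_std(4, 10))      # 1.0
```

Summing such activity means along the critical path then yields the expected project duration described above.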


1.3.1.1 Probability Calculation for Completing a Project Using PERT

With the PERT scheduling technique, one can calculate the probability of completing a project in

x days. The calculation is performed as follows. First, the critical path for minima “a”

determines the minimum project duration and its associated critical path using each activity

minimum duration a. Second, the critical path for maxima “b” computes the maximum project

duration and its associated critical path using each activity maximum duration b. Both

project durations are used to calculate the expected project duration and related path. Third, the

expected project variance vE can be calculated by summing up the variances vi (or si²) initially

calculated or set for each activity on the critical path.

The expected project standard deviation of the entire project is the square root of the expected

variance vE; that is, sE = √vE. Last, given a target project duration x, its associated

normalized value z may be computed using Equation 1.4. Thus, a look-up table, such as the normal

distribution lookup table, can be used to find the probability of the project being completed in x

days.

z = (x − μ) / σ = (x − Σt) / s [Eq. 1-4]
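The final lookup step can be sketched as a small calculation in which the standard normal CDF, computed via the error function, stands in for the lookup table; the function name and the project figures (mean 30 days, standard deviation 2 days) are hypothetical.

```python
from math import erf, sqrt

def completion_probability(x, mu, sigma):
    """P(project completed within x days), assuming the project duration is
    normal with mean mu (sum of expected durations on the critical path) and
    standard deviation sigma (square root of the summed critical variances)."""
    z = (x - mu) / sigma                  # normalized value of Eq. 1-4
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF replaces the lookup table

# Hypothetical project: expected duration 30 days, standard deviation 2 days.
print(round(completion_probability(32, 30, 2), 4))  # 0.8413
```

At x equal to the expected duration, z is zero and the probability is 0.5, matching the symmetry of the normal distribution assumed by PERT.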

1.3.1.2 Mode and Shape of the Distribution in Question

The ratio (b − a)/σ in Equation 1.3 is sensitive to the shape of each distribution (beta, exponential,

normal, triangular, and uniform) and moderately sensitive to the location of the mode for the

triangular distribution. The normal distribution has a ratio of infinity when plotted with the mode


on the abscissa scale. Except for the normal distribution, this ratio ranges from 3.5 to 6. Conversely, the ratios

(b95 − a5)/σ and (b90 − a10)/σ respectively vary from 3.12 to 3.35 and from 2.52 to 2.77 for all distributions and mode

locations considered. Pearson and Tukey, who studied 29 distributions including Pearson, log-normal,

and non-central t curves, obtained similar results. From a statistical standpoint, the use of the 5 and

95 percentiles, or the 10 and 90 percentiles, provides a method for estimating the standard

deviation that is resilient to both mode location and distribution shape fluctuations (Williams

1992).

1.3.2 Critical Path Method (CPM)

1.3.2.1 Nodes and Links

Unlike PERT, the critical path method is deterministic in that it does not inherently support activity durations that are distributed as random variables (Lindeburg 2011). In other words, the CPM requires that single, fixed values be used to define the durations of activities, which are considered as chunks of work necessary to complete a project. The CPM uses an AON network diagram to

display project activity information and their precedence relationships using nodes and links.

Nodes show activity identifications and start and finish dates, whereas links describe the activities’

precedence relationships with other activities. Nodes are drawn as rectangular or square boxes to

include any activity scheduling information by following the general layout provided in Figure 1.5

below provided.


ID = Activity Identification
ES = Early Start Time
EF = Early Finish Time
LS = Late Start Time
LF = Late Finish Time
ST = Start Time
FT = Finish Time
TF = Total Float
D = Duration

Figure 1.5: General Representation of a CPM Activity Node


Adapted from Adeli and Karim (2001, p. 61)

Where:

Activity Identification (ID): an activity ID is required to identify the project activity whose

information is being provided in the box. The activity description – optional - may also be provided

in addition to its identification.

Duration (D): the duration of an activity under normal circumstances.

Early Start Time (ES): an activity ES time is the earliest time an activity may get started.

Early Finish Time (EF): an activity EF time is the sum of an activity ES time plus the activity

duration.

Late start time (LS): an activity LS time is the latest time upon which an activity may be started

without delaying the entire project completion date.

Late finish time (LF): an activity LF is the sum of the activity LS and the activity D.


Total float (TF): The amount of time an activity can be delayed without delaying the overall project completion date. It can be computed by subtracting the sum of its ES time and duration D from the activity LF time. When the TF equals zero, the activity is said to be critical.

Free float (FF): The amount of time a non-critical activity can be extended or postponed without affecting the ES times of its succeeding activities (Uher and Zantis 2012).

Starting float (StF): The starting float for an activity is the difference between the activity LS time and its ES time. If the starting float for this activity equals “zero,” then the start of the activity is said to be critical. In this case, any delay in the starting time of this activity would delay the entire project. Note that an activity with a critical starting time is not necessarily a critical activity, but the opposite is true.

Finishing float (FnF): The difference between an activity's LF time and its EF time represents the

finishing float for the activity. If the float is “zero,” then the finish time of the activity is critical.

In that case, there is no flexibility in the finishing time of the activity: it must finish by its

finishing time for the project to complete on time. Note that a critical activity has a critical finishing

float, but the opposite is not necessarily true (Adeli and Karim 2001).

Table 1-1 provides the mathematics of the float types defined above since, as Kelley (1961) pointed out, “several float measures have been tested … As a result, the following measures … have been found useful.” Those measures were the total, free, and independent floats.


Float type                                  Float Equation                       Float Highlights

Total float for activity “i”                TFi = LFi − (ESi + Di)               Amount = “0” for critical activities

Free float for activity “i” which           FFi = min(ESj1, …, ESjk) − EFi       Characteristic of non-critical activities
has “k” succeeding activities

Starting float for activity “i”             StFi = LSi − ESi                     Degree of flexibility for an activity start time

Finishing float for activity “i”            FnFi = LFi − EFi                     Degree of flexibility for an activity finish time

Table 1-1: Activity Float Types and their Mathematical Equations
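The four float equations in Table 1-1 can be checked with a small sketch; all times below are assumed values for a single hypothetical activity with two successors:

```python
# Float types from Table 1-1 for one hypothetical activity "i".
ES_i, D_i, LS_i = 4, 3, 6
EF_i = ES_i + D_i            # early finish
LF_i = LS_i + D_i            # late finish
ES_successors = [9, 11]      # early starts of the k succeeding activities

TF = LF_i - (ES_i + D_i)             # total float: 0 means critical
FF = min(ES_successors) - EF_i       # free float
StF = LS_i - ES_i                    # starting float
FnF = LF_i - EF_i                    # finishing float
print(TF, FF, StF, FnF)
```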

Unlike nodes, links are drawn using any of four different logic constraints, as shown in Figure 1.6(a) through Figure 1.6(d). These constraints establish precedence relationships between successive activities using activity start or finish times (T). In addition, there are two other types of constraints besides the logic constraints that are out of the scope of this study. One of them is the absolute constraint, used to indicate a time constraint on an activity's finish and/or start time. The other is the buffer constraint, defined between pairs of activities to maintain a minimum buffer between them while they are being executed (Adeli and Karim 2001).


Figure 1.6: General Representation of CPM Logic Constraints in an AON Diagram


Adapted from Adeli and Karim (2001, p. 61)

Where:

Indices “i” and “j”: are used to refer respectively to the preceding and succeeding activities.

Time lags/leads (L): are positive or negative slack times allowed in the constraint. A positive L is indicative of a lead time, while a negative L refers to a lag time. To illustrate this, one may consider the finish-to-start relationship depicted in Figure 1.6(a). If L is a lead time, activity “j” can only start L time units after the completion of activity “i.” Else, if L is a lag time, activity “j” may start up to L time units before the completion of activity “i.”

As part of the CPM scheduling process, constraints need to be specified because activities cannot be executed in an arbitrary sequence. Thus, constraints always exist between activities, even if some are omitted from a project network schedule. Table 1-2 provides the mathematical expressions of the four constraints in terms of inequalities.


Logic Constraints          Mathematical Expressions / Inequalities

Finish-to-start (FS)       Ti + Di + L ≤ Tj

Start-to-start (SS)        Ti + L ≤ Tj

Finish-to-finish (FF)      Ti + Di + L ≤ Tj + Dj

Start-to-finish (SF)       Ti + L ≤ Tj + Dj

Table 1-2: Logic Constraints for Link Connections between Activities in a CPM Network Schedule
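The four inequalities in Table 1-2 translate directly into predicates; the activity times below are assumed values for illustration:

```python
# Table 1-2 inequalities for a hypothetical pair of activities i and j:
# T is the start time, D the duration, L the lead/lag value.
def fs_ok(Ti, Di, L, Tj):            # finish-to-start
    return Ti + Di + L <= Tj

def ss_ok(Ti, L, Tj):                # start-to-start
    return Ti + L <= Tj

def ff_ok(Ti, Di, L, Tj, Dj):        # finish-to-finish
    return Ti + Di + L <= Tj + Dj

def sf_ok(Ti, L, Tj, Dj):            # start-to-finish
    return Ti + L <= Tj + Dj

# Activity i starts at day 0 and lasts 5 days; j starts at day 7.
print(fs_ok(0, 5, 2, 7))   # True: j starts exactly 2 days after i finishes
```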

1.3.2.2 Critical Path

A network schedule may have more than one path of critical and/or non-critical activities. The critical path is a path that sequentially connects all critical activities. A network may have one or more critical paths, but their number is typically smaller than the number of non-critical paths. Based on real projects studied, whether scheduled for their shortest or longest durations, Kelley (1961) suggested that less than 10 percent of the activities were critical. To explain this deduction, Kelley (1961) added, "This is probably an illustration of Pareto’s principle….” Nevertheless, since critical activities on a critical path have zero float, delaying any of them would delay the entire project completion date. Also, the duration of the critical path corresponds to the duration of the project (Adeli and Karim 2001).


1.3.2.3 Scheduling Algorithm

The scheduling algorithm is based on the forward and backward passes, each of which starts with an appropriate numbering of the nodes. Nodes are numbered consecutively, with the number assigned to the node located at the tail of an arrow always being smaller than the one assigned to the node at the head of the arrow.

Step 1: Node Numbering

Start with numbering all N nodes, including dummy activities, so that each number at the tail of

an arrow is always smaller than the one at the arrow’s head. Note that a dummy or fictitious activity

with a zero duration may be added at the beginning or end of a project network schedule to connect

all activities that begin or end the project. Next, prepare the schedule data for use in the forward

and backward passes and the calculations of activity floats.

Step 2: Forward Pass

Follow the flowchart provided in Appendix C.1 (p. 409) to conduct the forward pass. This process allows the determination of each activity's early times. In addition, it enables the calculation of the total project duration as the maximum of the activity EF times.

Step 3: Backward Pass

Follow the flowchart provided in Appendix C.2 (p. 410) to perform the backward pass. This process allows the determination of each activity's late times. In addition, it permits the calculation of the total project duration as the maximum of the activity LF times. Note that the total project duration calculated using the activity LF times is identical to that estimated in step 2 using EF times.


Step 4: Activity Float Calculations and Identification of the Critical Path

The network critical path is identified as the path that sequentially connects all its critical activities,

which is the path connecting all activities with no float. The activity float can be calculated

following the methodology depicted in the flowchart in Appendix C.3 (p. 411). As previously

mentioned, a network may have more than one critical path. The duration of the critical path equals

the total duration of the project. As a result, the critical path is also known as the network's longest

path.
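Steps 1 through 4 can be condensed into a minimal sketch. The four-activity network below is a hypothetical example (not one from the text), assuming finish-to-start links with zero lag:

```python
# Minimal CPM sketch: forward pass, backward pass, and total-float test.
durations = {"A": 3, "B": 2, "C": 4, "D": 2}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

order = ["A", "B", "C", "D"]           # topological order (tail before head)
ES, EF = {}, {}
for a in order:                        # forward pass: early times
    ES[a] = max((EF[p] for p in preds[a]), default=0)
    EF[a] = ES[a] + durations[a]
project_duration = max(EF.values())

succs = {a: [b for b in order if a in preds[b]] for a in order}
LS, LF = {}, {}
for a in reversed(order):              # backward pass: late times
    LF[a] = min((LS[s] for s in succs[a]), default=project_duration)
    LS[a] = LF[a] - durations[a]

critical = [a for a in order if LS[a] - ES[a] == 0]   # total float = 0
print(project_duration, critical)
```

The critical activities returned trace the longest path through the network.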

1.4 Vectors and Matrices

1.4.1 Vector Definitions and Operations

One often runs into numbers in science and engineering represented by quantities such as time, volume, mass, length, and temperature. These numbers are called scalars and stay constant even if their coordinate systems change. Alternatively, one may also encounter various fascinating physical quantities equipped with magnitudes and directions (Weber and Arfken 2005). Examples in this group include velocity, force, acceleration, and electric current. These quantities are known as vector quantities and are specified by a magnitude and a direction. For differentiation purposes, vector quantities are often written with either boldface letters (e.g., V) or arrows (e.g., V→). For this study, all vectors will be written with boldface letters except in a few subsequent illustrations proposed to better understand preliminary definitions. The following paragraphs define terms associated with vectors, such as vector length and the graphical and analytical representation of a vector by its components, followed by vector operations and properties worth understanding for later use in subsequent chapters.


1.4.1.1 Vector definition and Graphical Representation

Weber and Arfken (2005) define a vector as “a quantity having magnitude and direction.” Both magnitude and direction are generally used to represent a vector V geometrically. In Figure 1.7(a), the segment OA→ and its magnitude or length OA serve to indicate the direction and magnitude of the vector V. Conversely, the segment AO→ may serve to plot the vector −V, as shown in Figure 1.7(a). Similarly, more than one vector can be geometrically added together to form one vector, given their directions and magnitudes. Figure 1.7(b) shows the geometrical addition of a pair of vectors, V1 and V2, defined by their segments OA→ and AB→ and magnitudes OA and AB, to form the resulting vector V. Figure 1.7(c) depicts the addition of four vectors to obtain the vector V.

[Figure 1.7 sketches: (a) V = OA→ and −V = AO→; (b) V = V1 + V2 = OB→; (c) V = V1 + V2 + V3 + V4 = OD→]

Figure 1.7: Vector Representation, Vector Summation
Adapted from Newnan (2007, p. 28)

Unlike the vectors in Figure 1.7, Figure 1.8(a) depicts vector V in the x-y coordinate plane based on its horizontal and vertical components x and y along the OX and OY axes. If the vectors i and j


are unit vectors along OX and OY, and the components x and y have magnitudes a and b, respectively, then vector V may be expressed by Equation 1.5, and its magnitude and direction by Equation 1.6.

V = a i + b j [Eq. 1-5]

|V| = √(a² + b²); α = tan⁻¹(b/a) [Eq. 1-6]

where tan⁻¹ is the inverse of the trigonometric function “tangent,” whose argument is the ratio of the length of the side opposite to the angle α, that is b, to the length of the side adjacent to α, that is a.

[Figure 1.8 sketches: (a) vector V in the x-y plane with components x and y and direction angle α; (b) vector V in 3D with components x, y, and z and direction angles α, β, and γ]

Figure 1.8: 2D and 3D Analytical Representations of Vector Components
Adapted from Newnan (2007, p. 29)

A vector V in a three-dimensional space, denoted as 3D, can be represented by its components x, y, and z along three mutually perpendicular axes OX, OY, and OZ, as shown in Figure 1.8(b). If the components x, y, and z have magnitudes a, b, and c as given in Equation 1.7 and are directed along the unit vectors i, j, and k, then the magnitude |V| and the directions α, β, and γ of V may be determined as provided in Equation 1.8:
V = a i + b j + c k [Eq. 1-7]


|V| = √(a² + b² + c²); cos α = a/|V|; cos β = b/|V|; cos γ = c/|V| [Eq. 1-8]
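Equations 1.7 and 1.8 can be checked with assumed components a = 2, b = 3, c = 6, chosen so that the magnitude is exactly 7:

```python
import math

# Magnitude and direction cosines of a 3D vector V = a i + b j + c k
# (Eq. 1-7 and Eq. 1-8), with assumed components.
a, b, c = 2.0, 3.0, 6.0
mag = math.sqrt(a**2 + b**2 + c**2)
cos_alpha, cos_beta, cos_gamma = a / mag, b / mag, c / mag

# The three direction cosines always satisfy cos²α + cos²β + cos²γ = 1.
print(mag, cos_alpha, cos_beta, cos_gamma)
```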

1.4.1.2 Vector Operations

Just as four vectors were geometrically added in Figure 1.7(c), any number of vectors can be summed, and the direction and magnitude of the resulting vector sum V can be determined using Equation 1.9 below. Besides being added, vectors can also be either multiplied by scalars or multiplied among themselves to produce new vectors or scalars. Multiplying a vector V by a scalar s produces a vector sV with the same direction as V and a magnitude s times that of V, as given by Equation 1.10.

V = V1 + V2 + V3 + ⋯ [Eq. 1-9]
  = (a1 + a2 + a3 + ⋯)i + (b1 + b2 + b3 + ⋯)j + (c1 + c2 + c3 + ⋯)k

sV = s(a i + b j + c k) = (sa)i + (sb)j + (sc)k [Eq. 1-10]

In addition to the previous multiplication type, two vectors can be multiplied through a scalar product, also known as a dot product. The dot product of two vectors V1 and V2, denoted as V1 • V2, equals the product of the magnitudes of both vectors times the cosine of the angle “θ” between both vector directions. The dot product expression is as follows: V1 • V2 = |V1||V2| cos θ, where the angle “θ” is as shown in Figure 1.9(a). As a result, the dot product of two vectors is a scalar number. In addition, the dot product is commutative and distributive over vector addition; namely, V1 • V2 = V2 • V1 and V1 • (V2 + V3) = V1 • V2 + V1 • V3, respectively. Moreover, the dot product of a vector with itself is equal to the square of its magnitude. This can be expressed as V1 • V1 = |V1|².


Using the properties of the dot product and the fact that the unit vectors i, j, and k are mutually perpendicular in a plane or space, the following expressions can be derived: i • i = j • j = k • k = 1, and i • j = j • k = k • i = 0. Thus, the dot product V1 • V2 can be expressed in terms of the components of vectors V1 and V2 in a plane as V1 • V2 = a1a2 + b1b2, or in space as V1 • V2 = a1a2 + b1b2 + c1c2, with (a1, b1) and (a2, b2), or (a1, b1, c1) and (a2, b2, c2), being the coordinates of V1 and V2, respectively. Another type of vector multiplication is called the vector product. The vector product of V1 and V2, denoted as V1 × V2 and shown in Figure 1.9(b), may be computed using Equation 1.11 below, where p is the unit vector perpendicular to the plane containing V1 and V2 and directed by the right-hand rule, and θ is the angle oriented from V1 to V2. Unlike the dot product, the vector product results in another vector. In addition, the vector product is anticommutative and distributive over vector addition; namely, V1 × V2 = −(V2 × V1) and V1 × (V2 + V3) = V1 × V2 + V1 × V3, respectively. Moreover, the vector product of a vector with itself is equal to the null vector. That is, V1 × V1 = 0. Through the application of the properties of the vector product and the knowledge that the unit vectors i, j, and k are mutually perpendicular in a plane or space, the following expressions can be derived: i × i = j × j = k × k = 0, and i × j = k; j × k = i; k × i = j. From the previous expressions, the expressions of V1 × V2 in the x-y plane or in 3D may be derived as in Equation 1.12 and Equation 1.13.

V1 × V2 = p |V1||V2| sin θ [Eq. 1-11]

V1 × V2 = (a1b2 − a2b1)k [Eq. 1-12]


V1 × V2 = (b1c2 − b2c1)i − (a1c2 − a2c1)j + (a1b2 − a2b1)k [Eq. 1-13]
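Equations 1.11 through 1.13 reduce to component arithmetic. The two vectors below are assumed values, and the perpendicularity of the vector product to both factors serves as a check:

```python
# Dot product (scalar) and vector product (Eq. 1-13) in component form,
# for two assumed 3D vectors.
v1 = (1.0, 2.0, 3.0)
v2 = (4.0, 5.0, 6.0)

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]

def cross(u, v):                      # Eq. 1-13
    return (u[1]*v[2] - v[1]*u[2],
            -(u[0]*v[2] - v[0]*u[2]),
            u[0]*v[1] - v[0]*u[1])

d = dot(v1, v2)                       # 4 + 10 + 18 = 32
c = cross(v1, v2)
# The vector product is perpendicular to both factors:
print(d, c, dot(c, v1), dot(c, v2))
```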

[Figure 1.9 sketches: (a) Dot Product of V1 & V2; (b) Vector Product of V1 & V2; (c) Vector Projection]

Figure 1.9: Dot Product, Vector Product, and Projection
Adapted from Newnan (2007, p. 29-30)

1.4.1.3 Vector Projections, Linearly Dependent Vectors

The projection or shadow of V1 on V2, shown in Figure 1.9(c), is a vector that has the same direction as V2 and a magnitude |V1| cos θ. The concept of projection has a great application in mechanics, as it serves to define the moment of a vector force V1 about a point O located at the intersection of the directions of both vectors, whose arm length y = |V1| sin θ appears in the product of the two vectors V1 and V2. In addition to the concept of vector projections, linear combinations and linear spans of vectors are possible. A vector y is said to represent a linear combination of the set of vectors x1, x2, …, xk if there exist k constants such that y can be expressed as in Equation 1.14. The group of all linear combinations of x1, x2, …, xk is known as their linear span. The set of vectors x1, x2, …, xk is considered linearly dependent if at least one of the constants can be non-null when y is the zero vector; in this case, Equation 1.14 becomes Equation 1.15.


Otherwise, the set of vectors x1, x2, …, xk is said to be linearly independent, and the set is known as a basis for the vector space of all k-tuples of real numbers.

y = a1 x1 + a2 x2 + ⋯ + ak xk [Eq. 1-14]

a1 x1 + a2 x2 + ⋯ + ak xk = 0 [Eq. 1-15]

The following example illustrates a case of three linearly dependent vectors, given that any one of them can be written in terms of the other two.

x1 = (2, 1, 0)ᵀ, x2 = (2, 3, 1)ᵀ, x3 = (2, −3, −2)ᵀ, then 3x1 − 2x2 − x3 = 0, or x3 = 3x1 − 2x2
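The dependence relation can be verified directly (the signs of the third vector are reconstructed from the stated relation x3 = 3x1 − 2x2):

```python
# Verifying the linear dependence relation x3 = 3*x1 - 2*x2
# for the example vectors.
x1 = (2, 1, 0)
x2 = (2, 3, 1)
x3 = tuple(3*a - 2*b for a, b in zip(x1, x2))
print(x3)   # (2, -3, -2)
```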

1.4.2 Matrix Definitions and Operations

A matrix is a rectangular array of real (ℝ) or complex (ℂ) numbers arranged in horizontal rows and vertical columns and enclosed in brackets or parentheses. The matrix entries can also be functions taking values in ℝ or ℂ. A matrix can be described by its row and column dimensions: a matrix with m rows and n columns is an (m × n) matrix (read “m by n”). While a capital letter is generally used to name a matrix, lowercase letters with double subscripts, such as aij for the entry in the ith row and jth column of A in Equation 1.16, denote the matrix entries. Below the letter depicting a matrix, the dimension of the matrix can be indicated in parentheses, as shown in Equation 1.16. For example, matrix A entry a21, read “a sub two one,” indicates the entry in the second row and the first column. A general term is represented by aij, the notation indicating the entry in row i and column j (Dawkins 2007).

A(m×n) = [a11 a12 ⋯ a1n
          a21 a22 ⋯ a2n
          ⋮   ⋮   ⋱  ⋮
          am1 am2 ⋯ amn]   or   A(m×n) = (aij), aij ∈ ℝ [Eq. 1-16]

Entries a11, a22, …, ann in Equation 1.16 are called the main diagonal entries of A.

1.4.2.1 Square, Identity, and Zero Matrices

Square matrix: If the number of rows m equals the number of columns n, A is said to be a square matrix.

Identity matrix: An identity matrix is a square matrix that leaves any matrix unchanged when the two are multiplied. As provided in Equation 1.17, its main diagonal entries are all “1,” while entries elsewhere are zeros. It is usually denoted by In or I, where n is the matrix size. “The matrix I acts like “1” in ordinary multiplication (1 ∙ a = a ∙ 1 = a), so it is called the identity …” (Johnson and Wichern 2019, p.58).

I = [1 0 ⋯ 0
     0 1 ⋯ 0
     ⋮ ⋮  ⋱ ⋮
     0 0 ⋯ 1] [Eq. 1-17]

Zero matrix: Denoted by 0(m×n) or simply by 0, a zero matrix is an (m × n) matrix whose entries are all “0,” as its name implies. Equation 1.18 is its matrix representation.


0(m×n) = [0 0 ⋯ 0
          0 0 ⋯ 0
          ⋮ ⋮  ⋱ ⋮
          0 0 ⋯ 0] [Eq. 1-18]

1.4.2.2 Triangular, Diagonal, and Symmetric Matrices

Triangular and diagonal matrices: A triangular matrix is a square matrix denoted by “U” for upper triangular, when all the entries below the main diagonal are zeros, or by “L” for lower triangular, when all the entries above the main diagonal are zeros. Equation 1.19 represents both matrices. A special case of a triangular matrix is a diagonal matrix, where all entries uij or lij are zeros except for the diagonal entries uii or lii.

U = [u11 u12 u13 ⋯ u1n        L = [l11 0   0   ⋯ 0
     0   u22 u23 ⋯ u2n             l21 l22 0   ⋯ 0
     0   0   u33 ⋯ u3n             l31 l32 l33 ⋯ 0
     ⋮   ⋮   ⋮   ⋱  ⋮              ⋮   ⋮   ⋮   ⋱  ⋮
     0   0   0   ⋯ unn]            ln1 ln2 ln3 ⋯ lnn] [Eq. 1-19]

Symmetric matrix: A square matrix A, such as the one given in the example below, is symmetric if its entries verify the following condition:

A symmetric  ⇔  aij = aji ∀ i, j

While A in the example below is a symmetric matrix, B is not, since its entries do not verify the above condition.

A = [1 0 4     B = [1 0 4
     0 3 2          0 3 2
     4 2 5]         4 1 5]


1.4.2.3 Equal and Transpose Matrices

Equal matrices: Two matrices A and B of the same dimension (m × n) are equal only if their corresponding entries aij and bij are identical for all i and j, as specified in Equation 1.20.

A(m×n) = B(m×n)  ⇔  aij = bij ∀ i = 1, …, m; ∀ j = 1, …, n [Eq. 1-20]

Transpose matrix: The transpose A’ or Aᵀ of an (m × n) matrix A = (aij) is the (n × m) matrix whose elements are aji. As defined, the transpose of A is obtained by swapping the rows of A into columns in a way that the first row of A becomes the first column of Aᵀ, the second row of A becomes the second column of Aᵀ, and so on. The following is an illustration of a (2 × 3) matrix A whose transpose Aᵀ is a (3 × 2) matrix obtained by swapping the rows of A into columns.

A = [9 2 1     Aᵀ = [9 3
     3 0 4]          2 0
                     1 4]

1.4.2.4 Matrix Products and Additions

Matrix scalar multiplication: A matrix A may also be multiplied by a constant c to obtain the product matrix cA, whose elements are obtained by multiplying each entry of A by c, as provided in Equation 1.21.

cA = [ca11 ca12 ⋯ ca1n
      ca21 ca22 ⋯ ca2n
      ⋮    ⋮    ⋱  ⋮
      cam1 cam2 ⋯ camn] [Eq. 1-21]


Matrix addition/subtraction and product: Two matrices A and B of the same dimension (m × n) can be added or subtracted. The resulting matrix A ∓ B, with entries aij ∓ bij, is of the same order (m × n) as A or B. If the orders of A and B are such that A is (n × k) and B is (k × p), they can be multiplied to form a matrix AB of order (n × p). Elements of AB are obtained by performing the inner product of each row i of A with column j of B, as in Equation 1.22 for the element at the ith row and jth column of AB.

(AB)ij = ai1 b1j + ai2 b2j + ⋯ + aik bkj [Eq. 1-22]

Let A and B be the matrices provided below to apply the above definition of a matrix product. Their product is worked out to illustrate the above description regarding the product of two matrices.

If  A(3×3) = [3 1 −4     and  B(3×1) = [1
              3 0  2                    5
              6 2  1]                   2]

Then  A(3×3) B(3×1) = [3(1) + 1(5) − 4(2)     [ 0
                       3(1) + 0(5) + 2(2)  =    7
                       6(1) + 2(5) + 1(2)]     18] (3×1)
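Equation 1.22 can be sketched as a nested inner-product loop. The matrices reproduce the example above, with the sign of the a13 entry reconstructed so that the product matches the stated result:

```python
# Matrix product via inner products of rows of A with columns of B (Eq. 1-22).
A = [[3, 1, -4],
     [3, 0, 2],
     [6, 2, 1]]
B = [[1], [5], [2]]

def matmul(X, Y):
    # entry (i, j) is the inner product of row i of X with column j of Y
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

print(matmul(A, B))   # [[0], [7], [18]]
```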

1.4.2.5 Determinant of a Square Matrix

The renowned German mathematician and philosopher Gottfried Wilhelm von Leibniz introduced the concept of the determinant and its notation (Weber and Arfken 2005, p. 165). Given a (k × k) square matrix A, the determinant of A, denoted by det A or |A|, is the signed sum of the products of the first-row entries of A with the determinants of the ((k − 1) × (k − 1)) submatrices A1j of A. As defined, |A| is the quantity in Equation


1.23. Each matrix A1j is found by entirely discarding the entries in the first row and jth column of A. Equally, the submatrices Aij can be obtained by deleting the ith instead of the first row.

|A| = a11, for k = 1 [Eq. 1-23]

|A| = Σj a1j |A1j| (−1)^(1+j)  or  |A| = Σj aij |Aij| (−1)^(i+j), for k > 1

For larger matrices, it can be tedious to determine det(A) manually using Equation 1.23. In general, if A is (2 × 2), the expression in Equation 1.24 below may be used to compute det(A) manually, as performed in the following instance, which illustrates all the operations necessary to determine the determinant of a given matrix.


det A = |A| = |a11 a12|
              |a21 a22| = a11∙a22 (−1)^(1+1) + a12∙a21 (−1)^(1+2) = a11∙a22 − a12∙a21 [Eq. 1-24]

|1  2 4|
|2 −1 3| = 1∙|−1 3|∙(−1)² + 2∙|2 3|∙(−1)³ + 4∙|2 −1|∙(−1)⁴
|5  4 1|     | 4 1|           |5 1|           |5  4|

= 1(−1 − 12) − 2(2 − 15) + 4(8 + 5)

= −1(13) + 2(13) + 4(13) = 5(13) = 65
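Equation 1.23 is naturally recursive. The sketch below expands along the first row and reproduces the worked example (the signs of two entries are reconstructed from the arithmetic shown):

```python
# Recursive cofactor expansion along the first row (Eq. 1-23 / 1-24).
def det(M):
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += M[0][j] * det(minor) * (-1) ** j
    return total

A = [[1, 2, 4],
     [2, -1, 3],
     [5, 4, 1]]
print(det(A))   # 65
```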

1.4.2.6 Matrix Rank

The row rank of a matrix represents the maximum number of its independent rows, deemed as vectors. Likewise, the column rank of a matrix is the maximum number of its independent column vectors. Below is an example of the determination of the row and column ranks of a matrix, based on the illustration provided in the section on linearly dependent vectors. For the matrix A below, the columns


written as vectors are the vectors x1, x2, and x3 shown above to be linearly dependent. Thus, the column rank of matrix A is two, since columns 1 and 2 are linearly independent.

A = [2 2  2     x1 = [2   x2 = [2   x3 = [ 2
     1 3 −3          1         3         −3
     0 1 −2]         0]        1]        −2]   with x3 = 3x1 − 2x2

The row vectors are also linearly dependent, with rows 1 and 2 linearly independent. Hence, the row rank of matrix A is two, equal to the column rank. This result is not surprising, since a matrix's row and column ranks are always equal (Johnson and Wichern 2019).

(2, 2, 2) − 2(1, 3, −3) + 4(0, 1, −2) = (0, 0, 0)

The rank of a matrix is either its row or column rank.
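Row rank can be computed by Gaussian elimination. The sketch below uses exact fractions to avoid round-off and reproduces the rank of the example matrix (with the reconstructed signs in the third column):

```python
from fractions import Fraction

# Row rank by Gaussian elimination with exact arithmetic.
def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    r, rows, cols = 0, len(M), len(M[0])
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                     # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]  # move pivot row up
        for i in range(rows):
            if i != r and M[i][c] != 0:  # eliminate column c elsewhere
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[2, 2, 2],
     [1, 3, -3],
     [0, 1, -2]]
print(rank(A))   # 2; row rank equals column rank
```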

1.4.2.7 Singular and Nonsingular Matrices

Let A be an (n × n) square matrix. A is singular if A has a zero determinant. Alternatively, A is nonsingular if its determinant is different from zero. As a result, if A is nonsingular, its columns are linearly independent. Equation 1.25 justifies this result,

A(n×n) x(n×1) = 0(n×1)  ⇔  x1 a1 + x2 a2 + ⋯ + xn an = 0 [Eq. 1-25]

where ai is the ith column of A. For the condition in Equation 1.25 to hold, x must be an (n × 1) zero matrix. Otherwise, A is considered a singular matrix. Equally, a square matrix is deemed to be nonsingular if its rank equals the number of its rows or columns.


1.4.2.8 Invertible matrix

When multiplied with a given element, an element that produces the identity element is called an inverse of the given element. In terms of matrices, if a matrix A is a nonsingular square matrix of dimension (n × n), then a unique (n × n) matrix B is related to A through Equation 1.26 below.

AB = BA = I [Eq. 1-26]

where I is an identity matrix whose matrix notation is provided by Equation 1.17. Matrix B is called the inverse of matrix A, designated as A⁻¹. In general, given a matrix A whose entries are aij, the entries bij of A⁻¹ may be computed using Equation 1.27.

bij = (−1)^(i+j) |Aji| / |A| [Eq. 1-27]

where Aij is the matrix resulting from deleting the ith row and jth column of A.

The expressions in Equation 1.28 and Equation 1.29 are, respectively, the general formulas for the manual computation of the inverse of any (2 × 2) or (3 × 3) matrix. They are worth recalling here, since manual calculations of matrix inverses become tedious and cumbersome as the matrix sizes increase.

A(2×2) = [a11 a12, |A| ≠ 0  →  A⁻¹(2×2) = (1/|A|) [ a22 −a12 [Eq. 1-28]
          a21 a22]                                 −a21  a11]

A(3×3) = [a11 a12 a13
          a21 a22 a23, |A| ≠ 0 [Eq. 1-29]
          a31 a32 a33]

A⁻¹(3×3) = (1/|A|) [ (a22a33 − a23a32)  −(a12a33 − a13a32)   (a12a23 − a13a22)
                    −(a21a33 − a23a31)   (a11a33 − a13a31)  −(a11a23 − a13a21)
                     (a21a32 − a22a31)  −(a11a32 − a12a31)   (a11a22 − a12a21) ]

From the expressions of A⁻¹ given by Equation 1.28 or Equation 1.29, it is noticeable that for A⁻¹ to exist, the determinant of A as provided in Equation 1.24 must be a non-zero number. In the following example, the determinant of the (2 × 2) matrix A, computed by means of Equation 1.24, is equal to 8.

A = [ 2 1, |A| = 8 ≠ 0
     −2 3]

Thus, its inverse can be calculated utilizing Equation 1.28 as follows:

A⁻¹ = [3/8 −1/8
       1/4  1/4]

1.4.2.9 Matrix Trace

Denoted by tr(A), the trace of an (n × n) square matrix whose entries are aij is the sum of its diagonal entries aii. Equation 1.30 provides the mathematical expression of tr(A).

tr(A) = Σi aii [Eq. 1-30]

The matrix A in the above example has a trace of 5.
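Equations 1.28 and 1.30 can be checked against the 2 × 2 example above (the sign of the a21 entry is reconstructed so that |A| = 8):

```python
# Inverse of a 2x2 matrix by Eq. 1-28 and the trace by Eq. 1-30.
A = [[2, 1],
     [-2, 3]]
det_A = A[0][0]*A[1][1] - A[0][1]*A[1][0]        # 6 + 2 = 8
inv_A = [[ A[1][1]/det_A, -A[0][1]/det_A],
         [-A[1][0]/det_A,  A[0][0]/det_A]]
trace_A = A[0][0] + A[1][1]                      # 2 + 3 = 5
print(det_A, inv_A, trace_A)
```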


1.4.2.10 Eigendecomposition of a Matrix

Suppose that A is an (n × n) matrix, x is a non-zero vector from the set of real n-tuples ℝⁿ or complex n-tuples ℂⁿ, and λ is any scalar, so that they are all related through Equation 1.31,

Ax = λx  or  (A − λI)x = 0 [Eq. 1-31]

then x is said to be an eigenvector and λ an eigenvalue of A. Generally, x is called the eigenvector corresponding to (or associated with) λ, and λ is the eigenvalue corresponding to (or associated with) x. Each x and its corresponding λ occur as a pair (Dawkins 2007). The set of solutions of Equation 1.31 is referred to as the eigenspace of A corresponding to λ.

Spectral decomposition: If A is an (n × n) symmetric matrix, its spectral decomposition takes the form of Equation 1.32 below,

A(n×n) = λ1 u1 u1ᵀ + λ2 u2 u2ᵀ + ⋯ + λn un unᵀ [Eq. 1-32]

where λ1, λ2, ⋯, λn and u1, u2, ⋯, un are the eigenvalues of A and their corresponding normalized eigenvectors, respectively. Hence,

uiᵀ ui = 1, ∀ i = 1, ⋯, n

uiᵀ uj = 0, i ≠ j, ∀ i, j = 1, ⋯, n

Characteristic equation: For the eigenspace of A to contain vectors other than the zero vector, the determinant of the matrix (A − λI), as provided in Equation 1.33, must be null, resulting in Equation 1.34. The nth-degree polynomial in λ provided in this latter equation is an equivalent representation of the characteristic equation of A.


det (λI − A) = 0 [Eq. 1-33]

P(λ) = λⁿ + c(n−1) λ^(n−1) + ⋯ + c1 λ + c0 [Eq. 1-34]

If λ1, λ2, …, λn is the complete list of eigenvalues of A, including repeats, any λi that occurs precisely once is called a simple eigenvalue. In contrast, any λi that occurs k > 1 times is said to have a multiplicity of k.

Eigenvalue and eigenvector properties: If A is a triangular matrix such as the one provided in Equation 1.19, then its eigenvalues λ can be found by solving Equation 1.35, where a11, a22, ⋯, ann are the diagonal entries of A.

det(λI − A) = (λ − a11) ∙ (λ − a22) ⋯ (λ − ann) = 0 [Eq. 1-35]

Suppose that λ is an eigenvalue of the matrix A with a corresponding eigenvector x, if k is a

positive integer, then λk is an eigenvalue of the matrix Ak with matching eigenvector x as

represented by Equation 1.36.

𝐀∙𝐱 = λ∙𝐱 → 𝐀ᵏ∙𝐱 = λᵏ∙𝐱 [Eq. 1-36]

Let λ1, λ2, …, λn be the complete list of all A's eigenvalues, including repeats. Then, Equation 1.37

gives the determinant of A in terms of its eigenvalues.

det 𝐀 = λ₁∙λ₂⋯λₙ [Eq. 1-37]

Its trace may also be computed using Equation 1.38 instead of Equation 1.30.

tr 𝐀 = λ₁ + λ₂ + ⋯ + λₙ [Eq. 1-38]
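The eigenvalue properties of Equations 1.35, 1.37, and 1.38 can be verified numerically. The sketch below uses a hypothetical upper-triangular matrix, whose eigenvalues are simply its diagonal entries, to confirm that the determinant equals the product of the eigenvalues and the trace equals their sum.

```python
# Upper-triangular example (hypothetical entries); by Equation 1.35
# its eigenvalues are just the diagonal entries.
A = [[4.0, 2.0, 7.0],
     [0.0, 5.0, 1.0],
     [0.0, 0.0, 3.0]]
eigs = [A[i][i] for i in range(3)]          # [4.0, 5.0, 3.0]

# For a triangular matrix det(A) is the product of the diagonal,
# matching Equation 1.37 (product of eigenvalues).
det = 1.0
for lam in eigs:
    det *= lam
assert det == 60.0

# The trace equals the sum of the eigenvalues (Equation 1.38).
tr = sum(A[i][i] for i in range(3))
assert tr == sum(eigs) == 12.0
```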


1.4.2.11 Orthogonal Matrices

A square matrix A, whose rows are deemed vectors, is orthogonal if its row vectors not only have

unit lengths but also are mutually perpendicular. This is translated by the reciprocal results

provided in Equation 1.39, which is illustrated in the following example.

𝐀⁻¹ = 𝐀ᵀ ↔ 𝐀𝐀ᵀ = 𝐀ᵀ𝐀 = 𝐈 [Eq. 1-39]

Given the 3 × 3 matrix A below, the lengths of its rows and columns, considered as vectors, can be

calculated using Equation 1.8 to verify that they are all of unit length. In addition, the mutual

perpendicularity of any pair of rows or columns of matrix A may also be verified using Equation

1.13 to compute each pair's scalar (dot) product, which should be zero.

     ⎡  1/2    √3/2   0 ⎤
𝑨 = ⎢ −√3/2    1/2   0 ⎥
     ⎣   0      0     1 ⎦

With all the conditions met, it can be concluded that A is an orthogonal matrix. Therefore, by

calculating the inverse 𝐀⁻¹ of matrix A using Equation 1.29 and the transpose 𝐀ᵀ of A, obtained

by swapping the rows of A into columns (see Page 31), one can verify Equation 1.39.

            ⎡ 1/2   −√3/2   0 ⎤
𝐀⁻¹ = 𝐀ᵀ = ⎢ √3/2    1/2   0 ⎥
            ⎣  0      0     1 ⎦
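The orthogonality checks described above can be carried out in a few lines of pure Python. The sketch assumes the rotation-type sign placement of the example matrix (the minus sign is required for the rows to be mutually perpendicular) and verifies both the unit lengths and 𝐀ᵀ𝐀 = 𝐈 of Equation 1.39.

```python
import math

s = math.sqrt(3) / 2.0
# Rotation-type example matrix; the sign placement is assumed.
A = [[0.5,  s,   0.0],
     [-s,   0.5, 0.0],
     [0.0,  0.0, 1.0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Each row has unit length...
for row in A:
    assert abs(dot(row, row) - 1.0) < 1e-12

# ...and the rows are mutually perpendicular, so A'A = I (Equation 1.39).
n = 3
AtA = [[dot([A[r][i] for r in range(n)], [A[r][j] for r in range(n)])
        for j in range(n)] for i in range(n)]
for i in range(n):
    for j in range(n):
        assert abs(AtA[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```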
1.4.2.12 Singular Value Decomposition and Singular Values

Given an m × k matrix A of real numbers, there exist two orthogonal matrices U and V of orders,

respectively, m × m and k × k, such that A may be written as in Equation 1.40.


𝐀 = 𝐔⋀𝐕ᵀ [Eq. 1-40]

(m × k) = (m × m)·(m × k)·(k × k)

where ⋀ is an m × k diagonal matrix whose entries λᵢ, denoted as the singular values of A, are

defined in Equation 1.41.

λᵢ ≥ 0, ∀ 1 ≤ i ≤ min(m, k) [Eq. 1-41]

λᵢ = 0, ∀ min(m, k) < i ≤ k

Another way of expressing the singular value decomposition (SVD) of matrix A is in terms of its

rank “r” as a matrix expansion. Namely, there exist: r real positive coefficients λ₁, λ₂, …, λᵣ; r

orthogonal m × 1 unit vectors 𝐮₁, 𝐮₂, …, 𝐮ᵣ; and r orthogonal k × 1 unit vectors 𝐯₁, 𝐯₂, …, 𝐯ᵣ such

that A may also be expressed as in Equation 1.42 in terms of its rank r and coefficients λᵢ.

𝐀 = Σᵢ₌₁ʳ λᵢ 𝐮ᵢ 𝐯ᵢᵀ = 𝐔⋀𝐕ᵀ [Eq. 1-42]

where the columns of 𝐔, as provided below, are called the left-singular vectors, whereas the columns

of 𝐕, also as provided below, are called the right-singular vectors, and ⋀ is an (m × k) diagonal

matrix whose entries are λᵢ.

𝐔 = [𝐮₁, 𝐮₂, … , 𝐮ᵣ] and 𝐕 = [𝐯₁, 𝐯₂, … , 𝐯ᵣ]

Moreover, the product matrix 𝐀𝐀ᵀ has eigenvalue-eigenvector pairs (λᵢ², 𝐮ᵢ), such that 𝐀𝐀ᵀ can be

represented by Equation 1.43.

𝐀𝐀ᵀ 𝐮ᵢ = λᵢ² 𝐮ᵢ [Eq. 1-43]

where the r real and positive numbers λᵢ² satisfy the conditions of Equation 1.44.


λ₁² > 0, λ₂² > 0, ⋯ , λᵣ² > 0, ∀ m ≤ k [Eq. 1-44]

By combining the expressions of Equation 1.40, Equation 1.42, and Equation 1.43, one may

derive a different expression for the SVD of the matrix A in terms of the orthogonal left and right

vectors in the form of Equation 1.45.

𝐯ᵢ = λᵢ⁻¹ 𝐀ᵀ 𝐮ᵢ [Eq. 1-45]

As previously mentioned, the pairs (λᵢ², 𝐯ᵢ) are eigenvalues and eigenvectors of the matrix product

𝐀ᵀ𝐀. Alternatively, U, V, and ⋀ as given in Equation 1.42 have, respectively, “m” orthogonal

eigenvectors of 𝐀𝐀ᵀ as its columns, “k” orthogonal eigenvectors of 𝐀ᵀ𝐀 as its columns, and λᵢ as

its diagonal entries.
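The connection between singular values and the eigenvalues of 𝐀ᵀ𝐀 can be checked directly. The sketch below uses a hypothetical 2 × 2 matrix and solves the 2 × 2 characteristic equation by hand; the eigenvalues of 𝐀ᵀ𝐀 are the squared singular values, and their product recovers |det A|.

```python
import math

# Small non-symmetric example matrix (hypothetical entries).
A = [[3.0, 0.0],
     [4.0, 5.0]]

# Form A'A; its eigenvalues are the squared singular values of A.
AtA = [[sum(A[r][i] * A[r][j] for r in range(2)) for j in range(2)]
       for i in range(2)]
tr = AtA[0][0] + AtA[1][1]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
sv = sorted((math.sqrt((tr + disc) / 2.0), math.sqrt((tr - disc) / 2.0)),
            reverse=True)                     # singular values, descending

# |det A| equals the product of the singular values.
detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 15
assert abs(sv[0] * sv[1] - abs(detA)) < 1e-9
```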

1.4.2.13 Positive Definite Matrix

See below for the definition of a positive definite matrix.

1.4.2.14 Quadratic Form of a Positive Definite Matrix

Given a p-dimensional vector 𝒙, a p × p real and symmetric matrix 𝑴 is positive definite if the

square of the statistical distance (d), defined in Section 1.5.8.2, satisfies the condition provided in

Equation 1.46.

0 < d² = 𝒙ᵀ𝑴𝒙 for 𝒙 ∈ ℝᵖ \ {𝟎} [Eq. 1-46]

In the above inequality, the square of the distance d is known as the quadratic form of M. Quadratic

forms, also known as matrix products, and distances play a vital role in multivariate analysis. One

may refer to a multivariate statistical analysis book such as Johnson and Wichern (2019) for further

interest in both topics.
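The quadratic form of Equation 1.46 can be evaluated with a few lines of pure Python. The sketch below spot-checks positivity on a handful of nonzero vectors for a hypothetical 2 × 2 symmetric matrix; a proof of positive definiteness would instead examine the eigenvalues (here 1 and 3, both positive).

```python
# Symmetric 2x2 example matrix (hypothetical entries).
M = [[2.0, -1.0],
     [-1.0, 2.0]]

def quad_form(M, x):
    """Evaluate the quadratic form d^2 = x' M x of Equation 1.46."""
    y = [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return sum(xi * yi for xi, yi in zip(x, y))

# Spot-check x' M x > 0 on a few nonzero vectors.
for x in [[1.0, 0.0], [0.0, -1.0], [1.0, 1.0], [3.0, -2.0]]:
    assert quad_form(M, x) > 0.0
```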


1.5 Descriptive Statistics and Inferential Statistics

1.5.1 Prelude

Statistics is a branch of mathematics concerned with summarizing, analyzing, and interpreting a

collection of numbers or observations. For that reason, statistics provide context for, or meaning

to, existing observations. For instance, a college professor may survey students to ascertain their

satisfaction with (or dissatisfaction with) the course. Indeed, statistics inform and are ingrained in

one’s life, and statistics are interpretable. Researchers can collect multiple variables

simultaneously, such as several genes from human populations in hundreds of locations across a

continent or country. Countless measurements are often taken, and procedures for organizing,

summarizing, and making sense of these measurements have been developed. These procedures,

collectively referred to as descriptive statistics, describe or summarize numerical observations or

data. As defined, descriptive statistics are methods for synthesizing, organizing, and making sense

of a collection of scores or observations. Typically, descriptive statistics are presented visually,

tabulated (in tables), or as summary statistics (single values). Usually, data (plural) refers to

numerical measurements or observations. Alternatively, a datum (singular) is

a single measurement or observation, more commonly known as a score or raw score (SAGE

Campus). Unlike descriptive statistics, statistical inference is inferring population characteristics

from sample observations. The analysis considers the probability that an observed difference

occurred by chance. The z-test is the simplest type of statistical test; it compares a sample to the

population when the variable is a measured quantity (Norman and Streiner 2003). This section

discusses descriptive statistics and statistical inference components and procedures, respectively.


1.5.2 Describing the Data: Sample and Population

As a discipline, statistics deals with collecting, organizing, analyzing, explaining, and displaying

data. When dealing with data, it is typical to specify the extent of the data. For example, is it a

sample or population data? Nevertheless, this section will cover these aspects of statistics in the

same order.

1.5.2.1 Sample Data Versus Population Data

The statistical population is quite different from the mainstream population except for census data

or Gallup polls (Norman and Streiner 2003). From an investigator's viewpoint, a population, also

called the universe, is a set of things, individuals, or data from which the investigator can draw a

statistical sample to make statements. Thus, a sample is defined as a subset X₁, X₂, ⋯, Xₙ chosen

from a population X. The sample size n is determined by the number of elements it contains, and

its sampled values (observations) are denoted x₁, x₂, ⋯, xₙ. Figure 1.10 below illustrates the

three terms: population, sample, and observations.

Figure 1.10: Illustration of a Population and its Sample Values


In most studies, such as the one involving multivariate analysis, the parameters of the population

(e.g., mean and variance) are unknown. To carry out such a study, the investigator would

"randomly" draw a multivariate data sample—a portion—from the population to derive

the population characteristics necessary to make statistical inferences. Statistical inference

is “an estimate, a prediction, a decision, or a generalization about the population based on

information contained in a sample” (Ramachandran and Tsokos 2014, p. 5). As discussed in the

subsequent section, employing representative samples eliminates biases and errors and ensures

fairness in making statistical inferences. Hence, randomness in data sampling is critical. In other

words, randomness guarantees that potential differences arising between the data sample and its

population are due to chance alone but not to biases or something the researcher may do to

influence the experiment's outcome. The following are a couple of examples to help better

understand the terms population and sample.

Political polls: The subset of voters polled represents a sample rather than the entire population.

Laboratory experiment: An experiment can be repeated an endless number of times by a lab

technician, who records the results after each run. An experiment's final measurements are a

sample since they reflect a subset of the population as a whole.

Nevertheless, there are two types of population in the literature: finite and infinite. The elements

or units necessary to construct a random sample are selected without replacement for a finite

population. As a result, Ramachandran and Tsokos (2014, p.181) wrote that the resulting sample

“is not a random sample and Xᵢ’s [elements, variables, or characteristics of the population] are not

i.i.d. random variables.” Whereas an infinite population, having a finite population correction factor


of one, is the opposite of the finite population, whose correction factor is (N − n)/(N − 1),

where n is the sample size, and N is the population size.

1.5.2.2 Sampling Methods

This section briefly describes a preliminary plan of actions a researcher must prepare to draw

samples from a population of interest. This step is crucial in preventing sampling errors and

systematic biases and preparing reliable and accurate plans. While systematic bias is an error

caused by an inadequate sampling procedure (e.g., reporting data with biases), sampling error

refers to the random fluctuations in the sample estimates about the true population parameters

(Kothari, 2004). The goal is to devise a sampling method that produces smaller sampling errors

and control systematic biases. The literature on the topic suggests that there exist various sampling

methods. Accordingly, devising a sampling plan is recommended and usually results in selecting

the appropriate sampling method for the investigation. During the planning process, researchers

often consider the following points to decide the sampling method: type of population (finite or

infinite), sample size, parameters of interest, budgetary constraints, and relevant sampling

procedures. The following paragraphs describe only three main sampling designs that are relatively

pertinent to the current study.

Simple Random Sampling

In the literature, a sampling procedure is either non-probability or probability sampling. Non-

probability sampling allows a researcher to purposefully select specific items or units of the

population to form a sample. This sampling procedure (e.g., quota sampling for market surveys)

is convenient and inexpensive. Alternatively, probability sampling, also known as random or


chance sampling, is based on the law of statistical regularity to select samples, as Kothari (2004)

noted. Under this law, each sample will exhibit the same composition and characteristics as the

population of interest. In addition, each sample will have an equal chance of being selected. A

sample drawn this way is called a simple random sample. Although simple random sampling

seems to present great advantages, such as significantly reducing the investigator’s biases and

having relatively simple analytical computations such as sample size given an error level, it “may

not be effective in all situations” (Ramachandran and Tsokos 2014, p. 9). Hence, considering other

random sampling methods such as systematic or stratified sampling would be more appropriate to

select systematic samples or stratified samples.

Systematic Sampling

Under systematic sampling, samples are extracted at evenly spaced intervals. The ith item in the

sampling frame – a set of elements, units, or individuals to draw from to create a representative

sample – can be chosen only after an appropriate random start for the first item. Hence, this

sampling procedure requires some order in the population items in sampling the desired fraction

needed to create a systematic sample. Courtesy of Ramachandran and Tsokos (2014), below are the

steps to choose a systematic sample:

Step 1: Itemize from 1 to N the items in the sampling frame (e.g., N = 1000).

Step 2: Set the sample size; let n (e.g., n = 100, or 10% of the population) be that size.

Step 3: Select p = N/n (e.g., p = 10).

Step 4: Randomly choose an integer between 1 and p (e.g., 4).


Step 5: Draw (with replacement) every pth element (e.g., the selected items would be 4, 14, 24, …, 994).
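The five steps above can be sketched with Python's standard library; the frame size and sample size follow the running example (N = 1000, n = 100), while the random start is drawn afresh on each run.

```python
import random

# Systematic sampling per the five steps above.
N, n = 1000, 100
frame = list(range(1, N + 1))       # Step 1: itemized sampling frame
p = N // n                          # Step 3: p = N/n = 10
start = random.randint(1, p)        # Step 4: random start between 1 and p
sample = frame[start - 1::p]        # Step 5: every p-th item thereafter

# With start = 4 the sample would be 4, 14, 24, ..., 994.
assert len(sample) == n
```

Because the start is random, every item in the frame has the same probability 1/p of entering the sample, which is the property claimed in the following paragraph.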

Strictly speaking, a systematic sample may not entirely be comparable to a random sample. Still,

it is reasonable to treat it as one and thus consider it an improvement of a simple random sample

(Ramachandran and Tsokos 2014). This is because it provides individual population elements with

an equal probability of being selected to form a sample. In addition, it gives each combination of

samples an identical joint probability of being drawn. Thanks to its simplicity, systematic

sampling is widely used and appropriate when the items (variables) characterizing a population

are available and of sizeable length.

Stratified Sampling

Thanks to its formulation, stratified sampling is considered an improvement of either simple

random or systematic sampling (Ramachandran and Tsokos 2014). Under the stratified sampling

scheme, one subdivides the population of interest into homogeneous strata or subpopulations, then

samples independently from each stratum, whose sizes may vary from one another, to form a sample. To

choose a stratified sample, one may follow these steps:

Step 1: Decide on the stratification factors relevant to the research and define their criteria.

Step 2: Partition the whole population into strata or subpopulations, not necessarily of equal size,

according to the stratification criteria outlined in step 1.

Step 3: Choose the number of items to sample from each stratum using simple random or

systematic sampling. Note that this number may vary from stratum to stratum.
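The three steps above can be sketched as follows; the strata and per-stratum sample sizes are hypothetical, and Step 3 uses simple random sampling within each stratum.

```python
import random

# Step 2: strata (hypothetical, deliberately unequal in size).
population = {
    "stratum_A": list(range(0, 600)),
    "stratum_B": list(range(600, 900)),
    "stratum_C": list(range(900, 1000)),
}

# Step 3: per-stratum sample sizes, here chosen proportional to size.
sizes = {"stratum_A": 60, "stratum_B": 30, "stratum_C": 10}

sample = []
for name, items in population.items():
    # Simple random draw without replacement within each stratum.
    sample.extend(random.sample(items, sizes[name]))

assert len(sample) == 100
```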


Stratified samples are widely used and are appropriate in situations where at least one of the strata

has only a small prevalence relative to the other strata. Besides providing information on the entire

population, stratified sampling offers detailed information on subpopulations. Moreover,

compared to the other two sampling schemes, stratified sampling produces accurate, reliable, and

detailed information. While this section has provided helpful background information to formulate

sound sampling schemes, one may refer to the cited authors’ materials for further interest in

sampling methods and other terms.

1.5.3 Arranging Data in Tables or Arrays

A researcher often arranges multivariate data methodically to collect the data necessary for a study

successfully. Depending on the study's purpose or intended use, the researcher may explore

capturing data in an array format, as illustrated in Equation 1.47 or a tabular form, as presented in

Table 1-3. In either of the arrangements, n specifies the number of units or features. Alternatively,

p is the number of variables of interest for which the measurements xᵢⱼ on the ith unit of the jth

variable are being recorded. In general, the measurements xᵢⱼ are stored in an array format similar

to an n × p data matrix 𝐗 = [𝒙₁, 𝒙₂, … , 𝒙ₙ]ᵀ, with each 𝒙ᵢ = (xᵢ₁, xᵢ₂, ⋯, xᵢₚ). In contrast to the

n-dimensional observations 𝒙₁, 𝒙₂, … , 𝒙ₙ, the random variables X₁, X₂, ⋯, Xₚ represent the

theoretical elements that characterize the univariate distributions of each variable Xⱼ and their joint

distribution (Everitt and Hothorn 2011). The sampling distribution of the variables X₁, X₂, ⋯, Xₚ

as a collective is a critical topic in multivariate analysis and will be examined in Section 2.3.2.2.


                          Variables
Features    X₁     X₂    ⋯    Xⱼ    ⋯    Xₚ
   1        x₁₁    x₁₂   ⋯    x₁ⱼ   ⋯    x₁ₚ
   2        x₂₁    x₂₂   ⋯    x₂ⱼ   ⋯    x₂ₚ
   ⋮         ⋮      ⋮          ⋮          ⋮
   i        xᵢ₁    xᵢ₂   ⋯    xᵢⱼ   ⋯    xᵢₚ
   ⋮         ⋮      ⋮          ⋮          ⋮
   n        xₙ₁    xₙ₂   ⋯    xₙⱼ   ⋯    xₙₚ

Table 1-3: Tabular Presentation of n Measurements on p Variables

or,

      ⎡ x₁₁  x₁₂  ⋯  x₁ⱼ  ⋯  x₁ₚ ⎤
      ⎢ x₂₁  x₂₂  ⋯  x₂ⱼ  ⋯  x₂ₚ ⎥
      ⎢  ⋮    ⋮        ⋮        ⋮  ⎥
𝐗 =  ⎢ xᵢ₁  xᵢ₂  ⋯  xᵢⱼ  ⋯  xᵢₚ ⎥ [Eq. 1-47]
      ⎢  ⋮    ⋮        ⋮        ⋮  ⎥
      ⎣ xₙ₁  xₙ₂  ⋯  xₙⱼ  ⋯  xₙₚ ⎦

The following is a plausible justification for the array format preference for encoding observed

values on the p-variables. Measuring the entire set of variables on each unit induces correlations

among the variables. As a result, conducting a statistical analysis individually on each variable

may obscure the inherent structure of the entire data set. In other words, analyzing each variable

on its own may be a missed opportunity for the researcher attempting to uncover the primary

characteristics of the multivariate data and identify any interesting "patterns" hidden within the

data. Typically, multivariate statistical analysis reveals the entire data structure. A multivariate

statistical analysis may be defined as the simultaneous statistical analysis of a group of variables.

The primary objective is to improve univariate analyses of each variable performed independently

by incorporating information describing the variables' relationships (Everitt and Hothorn 2011).


1.5.4 Plotting Data

1.5.4.1 Prelude

Statistical data visualization has long been used in science and technology by users from various

fields. R.A. Fisher's first methods were diagrams, which he published in 1925. Since then, data

visualization has made tremendous progress, thanks primarily to rapid advancements in computer

systems. These advancements increasingly necessitate the automation of complex processes

involving significant technical information to solve many problems confronting today's fast-paced

society. Examples of visualization methods are bar graphs, Pareto charts, stem and leaf plots, pie

charts, scatterplots, and probability plots. A vital link is a visualization method, such as a graph,

whose purpose is first to encode quantitative and categorical information into a display medium,

then decode it through visual perception (Cleveland 1993). Still, as Cleveland (1993, p. 2) pointed

out, "no matter how clever the choice of information, and no matter how technologically

impressive the encoding, a visualization fails if the encoding fails."

Nevertheless, how one knows whether a given probability distribution is a good model for the data

is a critical question for statisticians or researchers. The reason is that several statistical models are

based on the assumption of a specific type of population distribution. As a result, statistical data

analysis to determine whether the data came from a particular probability distribution is commonly

used to validate such assumptions. Occasionally, the shape of the distribution can reveal

information about the underlying structure of the data. Hence, a graphical representation of the

data, say a histogram, may provide insight into the shape of the underlying distribution.


However, histograms are not always reliable predictors of distribution shape unless there is a

considerable sample size. Therefore, practitioners may use probability plots such as Q-Q and P-P

plots for small to moderate samples, which are more reliable than histograms. Probability plotting,

for example, is a subjective visual assessment of data that is used to determine whether a given

sample of data fits a conjectured distribution (Wilk and Gnanadesikan 1968). Even though there

is substantial literature on graphical techniques for statistical data analysis based on the empirical

cumulative distribution function and its implications, the following subsections are devoted to Q-

Q and P-P plots and histograms.

1.5.4.2 Frequency Distributions/Histograms

This section aims to provide background information on histograms and the frequency distribution

tables that serve as histograms' prerequisites. In the eighteenth century, the histogram was invented

as a data analysis tool for summarizing data (Chambers et al. 2018). As described in the following

sequel, a histogram condenses a data set into a compact image that illustrates the location of the

mean and modes of the data and the data's variation, mainly the range. In addition, it serves to

deduce patterns from data. A histogram is an excellent aggregate graph of a single variable that

should always be used as a starting point for determining the variability in the data (Ramachandran

and Tsokos 2014).

Frequency Distributions

A frequency distribution (or table) summarizes data more concisely than a stem-and-leaf diagram.

To create a frequency distribution, divide the data range into intervals, commonly referred to as

class intervals, cells, or bins. The bins should be similar in width to enhance the visual information


in the frequency distribution. As directed by Chambers et al. (2018, p. 41), for more information

on the topic, consult Diaconis and Freedman (1981) and Scott (1979) for their exciting reading on

optimal procedures for choosing the interval width. Nonetheless, some discretion must be

exercised in determining the number of bins to be used to create a pleasing display.

The number of bins is determined by the number of observations and the data's degree of scatter

or dispersion. A frequency distribution with too few or too many bins will be uninformative.

Choosing between five and twenty classes is a general guideline from various authors on the

subject. Montgomery and Runger went on to say that the number of bins should increase with n

and be chosen to be roughly equal to the square root of the number of observations, which often

works well in practice. Finally, the goal is to use enough classes to show the variation in the data,

but not so many that many of the classes have only a few data points (Ramachandran and Tsokos

2014). The frequency distribution table contains the relative frequency distribution in addition to

classes. The observed frequency fᵢ of a given bin, say i, divided by the total number of observations

n represents the bin's relative frequency, that is, the ratio fᵢ/n. The bin's cumulative relative

frequency, provided in Equation 1.48, is defined as the sum of the relative frequencies of all

classes up to and including class i.

Σⱼ₌₁ⁱ fⱼ/n [Eq. 1-48]
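A frequency distribution of the kind just described can be sketched in pure Python. The data and bin count below are hypothetical; the number of bins follows the square-root-of-n guideline mentioned above, and the last assertion confirms that the cumulative relative frequencies of Equation 1.48 end at 1.

```python
import math

# Hypothetical measurements; bins chosen near sqrt(n) per the guideline.
data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 33, 35, 38, 41, 44, 47]
n = len(data)
k = round(math.sqrt(n))                   # 4 equal-width bins
lo, hi = min(data), max(data)
width = (hi - lo) / k

freq = [0] * k                            # observed frequencies f_i
for x in data:
    i = min(int((x - lo) / width), k - 1) # clamp the maximum into the last bin
    freq[i] += 1

rel = [f / n for f in freq]               # relative frequencies f_i / n
cum = []                                  # cumulative relative frequencies
running = 0.0
for r in rel:
    running += r
    cum.append(running)

assert sum(freq) == n and abs(cum[-1] - 1.0) < 1e-12
```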

Histograms

A histogram is a graphical representation of the frequency distribution. Specifically, a histogram

is a graph where the horizontal axis represents classes, and the vertical axis represents frequencies,

relative frequencies, or percentages. Histograms may have equal or uneven bins. One advantage


of utilizing equal bin widths rather than unequal bin widths is that if the data contains few extreme

observations or outliers, employing a few equal-width bins results in practically all observations

falling within a few bins. On the other hand, using many equal-width bins results in many bins

with zero frequency. As a guideline, when the bins are different in width, the rectangle's area, not

the height, must be proportional to the bin frequency. This guideline entails that the rectangle's

height equals the bin frequency divided by the bin width. In addition to quantitative data, frequency

distributions and histograms can visualize qualitative or categorical data. Typically, the width of

the bins is equal in this scenario (Montgomery and Runger 2007). However, the steps for creating

a histogram with identical bin sizes are as follows.

Step 1: Label the bin (class interval) boundaries on a horizontal scale.

Step 2: Indicate and name the frequencies or relative frequencies on the vertical scale.

Step 3: Draw a rectangle over each bin with a height equal to the frequency (or relative frequency)

associated with that bin.

Some Issues with Histograms

Although histograms are commonly used to communicate distributional information to broad

audiences, they have several limitations as a data analysis tool (Chambers et al. 2018). Choosing

the appropriate interval width and the number of bins, as discussed previously, is one of the

problems. Montgomery and Runger (2007, p. 205) demonstrated this point with an experiment in

Minitab. They varied either or both the number of bins and the width of the same data on the

compressive strength of 80 aluminum-lithium alloy specimens. They found “that histograms may


be relatively sensitive to the number of bins and their width." In other words, for small data sets,

the appearance of histograms might alter substantially if the number of bins, the width of the

bins, or both are changed. Additionally, histograms are more reliable when used with larger

data sets, ideally 75 to 100 or greater. Figure 1.11(a) and (b) illustrate one of the histograms and

cumulative distributions of compressive strength data that they plotted in their experiments,

respectively.

Figure 1.11: Illustrations of a Histogram and a Cumulative Distribution Graph: (a) Histogram of
compressive strength for 80 aluminum-lithium alloy specimens; (b) Cumulative distribution plot of
the compressive strength data. Courtesy of Montgomery and Runger (2007)

Interpreting A Histogram / Cumulative Distribution

As indicated previously, Figure 1.11(b) depicts the compressive strength data's cumulative

frequency plot. The height of each bar represents the total number of observations that are less

than or equal to the bin's upper limit. Cumulative distributions are also helpful when interpreting

data. For instance, one may deduce immediately from that figure that around 70 observations

(out of 80) are less than or equal to 200 psi. In other words, 87.5 % of the time, an observation x


selected from the dataset will be less than 200 psi. A histogram can be defined as an approximation

to the probability density function f(x). The bar area represents the relative frequency

(percentage) of the measurements in each histogram interval. Thus, the relative frequency can be

viewed as a probability estimate for whether a measurement falls within an interval. Similarly, the

area beneath f(x) represents the true probability that a measurement falls in an interval

(Montgomery and Runger 2007).

Moreover, as previously stated, histograms may present some challenges. However, when the

sample size is sufficiently large, a histogram can be a relatively reliable indicator of the general

distribution of the population of measurements from which the sample was selected. When data

are symmetrical, the mean and median coincide in most cases. Additionally, the mean, median,

and mode coincide if the data are unimodal. However, the mean, median, and mode do not coincide

when the data are skewed (asymmetric, having a long tail to one side). Typically, given a right-

skewed distribution, the mean, median, and mode satisfy mode < median < mean. The three metrics

satisfy mode > median > mean with a left-skewed distribution. Refer to Montgomery and Runger's

book (2003) for more information and excellent illustrations on this subject.
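The ordering for a right-skewed distribution can be confirmed with Python's statistics module on a small hypothetical sample with a long right tail.

```python
from statistics import mean, median, mode

# A small right-skewed sample (long tail to the right); values are hypothetical.
data = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]

# For right-skewed data the ordering mode < median < mean holds:
# mode = 2, median = 2.5, mean = 3.6.
assert mode(data) < median(data) < mean(data)
```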

1.5.4.3 Quantile Probability Plots (Q-Q Plots)

The Q-Q plot is a highly effective visualization technique frequently used to produce a graphical

representation of the true pdf from which a given data may have originated. This plot represents

the quantiles of the empirical distribution of the provided data versus the quantiles of the

hypothesized true pdf under test graphically. Accordingly, it is referred to as a theoretical Q-Q plot

due to its distinction from an empirical Q-Q plot. If the resulting graph of these two distributions


is linear, the estimated pdf fits the given data reasonably well (Ramachandran and Tsokos 2014).

This result is Wilk and Gnanadesikan's brilliant finding; their invention could not be more

straightforward or elegant: comparing one distribution's quantiles against the other's quantiles

(Cleveland 2003).

To describe the construction of a Q-Q plot, let x₁, x₂, ⋯, xₙ be the “raw data” and F(x) be the

cumulative distribution function (CDF) of the theoretical distribution in question. To define the

quantiles q of F, let Q(q) denote a number, with 0 < q < 1, satisfying Equation 1.49(a) or

Equation 1.49(b), where F⁻¹ signifies the inverse function of the CDF F.

F(Q(q)) = q (a)
[Eq. 1-49]
Q(q) = F⁻¹(q) (b)

In other words, a portion q of the probability of the distribution occurs for x values less than or

equal to Qₜ(q), just as a fraction q of the data is less than or equal to Qₑ(q). The subscripts “e”

and “t” differentiate theoretical from empirical versions (Chambers et al. 1983). Now, construct a

Q-Q plot by following the steps outlined below.

Step 1: Rank the observations in the sample from smallest to largest. That is, the sample

x₁, x₂, ⋯, xₙ is arranged as x₍₁₎, x₍₂₎, ⋯, x₍ₙ₎, where x₍₁₎ is the smallest observation, x₍₂₎ is the

second smallest observation, and so forth, and x₍ₙ₎ is the largest observation. Note that when the

values x₍ⱼ₎ are distinct, exactly j observations are less than or equal to x₍ⱼ₎. Theoretically,


Step 2: Assign a cumulative probability qⱼ = (j − 0.5)/n to each ordered observation x₍ⱼ₎, also

known as an order statistic.

Step 3: Use Equation 1.50 to estimate the values x̂₍₁₎, x̂₍₂₎, ⋯, x̂₍ₙ₎ of the random variable X

associated with the assumed probability distribution Fₜ.

x̂₍ⱼ₎ = Qₜ(qⱼ) = Fₜ⁻¹(qⱼ) = Fₜ⁻¹((j − 0.5)/n) [Eq. 1-50]

Step 4: Graph the scatterplot of the pairs (x̂₍₁₎, x₍₁₎), (x̂₍₂₎, x₍₂₎), ⋯, (x̂₍ₙ₎, x₍ₙ₎). Some authors or

computer programs flip the axes, putting the observed quantiles x₍ⱼ₎ on the vertical axis and the

theoretical quantiles x̂₍ⱼ₎ on the horizontal. The interpretation remains the same in either case.

Step 5: Examine the "straightness" of the Q-Q plot to see how the points deviate from a straight

line. It is advantageous to draw a line by concentrating on spots close to the middle of a Q-Q plot

rather than on the plot’s extreme left and right. The data set conforms to the predicted probability

distribution if the overall pattern is nearly linear. On the other hand, the data is skewed and does

not conform to the expected probability distribution if the general pattern contains curves or

shelves (Ramachandran and Tsokos 2014).

Chambers et al. (1983) and Wilk and Gnanadesikan (1968) provide guidelines for identifying and interpreting Q-Q plots. Still, the following is a quick tip on drawing the straight line “chosen subjectively.” Montgomery and Runger (2007, p. 213) suggested a rule of thumb, which “is to draw the line approximately between the 25th and 75th percentile points.” By doing so, if the pairs of points $(x_{t(j)}, x_{(j)})$ lie very nearly along a straight line, then the notion that the sample data arise


from the hypothesized distribution would not be rejected. In other words, if the plotted points

deviate substantially from the straight line, the hypothesized model is not suitable.

For illustration, one may consider a well-illustrated example courtesy of Montgomery and Runger (2007, p. 215) on battery usage times in a portable personal computer. Below are its Q-Q plot and the necessary calculations based on ten ($n = 10$) observations of the effective battery service life ($x_{(j)}$), measured in minutes of battery usage. Measurements and calculations following the five steps outlined above are provided in Table 1-4. Alternatively, Figure 1.12 illustrates the Q-Q plot of the pairs $(x_{(j)}, z_{(j)})$, where $z_{(j)}$ represents the standardized normal scores (see page 94) satisfying Equation 1.51 below.

$P\left(Z \le z_{(j)}\right) = F\left(z_{(j)}\right) = \dfrac{j - 0.5}{n}$   [Eq. 1-51]

In assessing the closeness of the points to the straight line drawn on either one of the Q-Q plots in

Figure 1.12, it is apparent that most of the points can be covered by an imaginary “fat pencil”

placed along the straight line. With the points passing the fat pencil test, one can conclude that the

Gaussian distribution is indeed a suitable model for the data being studied. The authors also

provide other Q-Q plots showing a departure from the normal distribution. For more illustrations,

one may also refer to the book by Ramachandran and Tsokos (2014).
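The five construction steps above can be sketched in a few lines of code. The following is a minimal sketch (assuming NumPy and SciPy are available; the function name `qq_points` is illustrative), applied to the battery service-life data:

```python
import numpy as np
from scipy import stats

def qq_points(data, dist=stats.norm):
    """Return (theoretical, observed) quantile pairs for a Q-Q plot."""
    x = np.sort(np.asarray(data, dtype=float))   # Step 1: order the sample
    n = len(x)
    q = (np.arange(1, n + 1) - 0.5) / n          # Step 2: q_j = (j - 0.5)/n
    x_theory = dist.ppf(q)                       # Step 3: inverse-CDF quantiles
    return x_theory, x                           # Step 4: scatter these pairs

# Battery service-life data in minutes (Montgomery and Runger 2007, p. 215)
battery = [176, 183, 185, 190, 191, 192, 201, 205, 214, 220]
z, x_obs = qq_points(battery)
print(np.round(z, 2))   # standardized normal scores z_(j)
```

For Step 5, plotting `x_obs` against `z` and judging straightness (e.g., with the fat-pencil test) completes the assessment; `scipy.stats.probplot` offers a ready-made equivalent.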


Table 1-4: Computations for Constructing a Q-Q Plot: Normal Distribution

 j    x_(j)    (j - 0.5)/10    z_(j)
 1     176         0.05        -1.64
 2     183         0.15        -1.04
 3     185         0.25        -0.67
 4     190         0.35        -0.39
 5     191         0.45        -0.13
 6     192         0.55         0.13
 7     201         0.65         0.39
 8     205         0.75         0.67
 9     214         0.85         1.04
10     220         0.95         1.64

Figure 1.12: Illustration of a Q-Q Plot: Normal Distribution [figure not reproduced]

1.5.4.4 Probability Plots (P-P plots)

A P-P plot is a graphical tool used to determine how well a given data set fits a specified probability distribution. This figure compares the given data's empirical cumulative probability distribution to the assumed true cumulative probability distribution function. If the plot of these two distributions is approximately linear, it implies that the hypothesized true distribution fits the observed data reasonably well (Ramachandran and Tsokos 2014). To illustrate how a P-P plot is constructed, consider a random variable X with assumed true cumulative distribution function $F_X(x)$, and let $x_1, x_2, \cdots, x_n$ be a random sample of X. Then, one can create a P-P plot by following these steps.

Step 1: Rank the observations in the sample from smallest to largest. That is, the sample $x_1, x_2, \cdots, x_n$ is arranged as $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$.


Step 2: Assign a cumulative probability $\hat{p}_j = (j - 0.5)/n$ to each ordered observation $x_{(j)}$, with $j = 1, 2, \cdots, n$.

Step 3: Calculate through Equation 1.52 the theoretical cumulative probabilities $p_1, p_2, \cdots, p_n$ corresponding to the empirical cumulative probabilities of the ordered sample data.

$p_j = F_X\left(x_{(j)}\right) = P\left(X \le x_{(j)}\right), \quad \forall j = 1, 2, \cdots, n$   [Eq. 1-52]

Step 4: Construct the scatterplot of the pairs $(\hat{p}_1, p_1), (\hat{p}_2, p_2), \cdots, (\hat{p}_n, p_n)$.

Step 5: Examine the "straightness" of the P-P plot to see how the points deviate from a straight

line. Everything described previously regarding the interpretation of Q-Q plots also applies to P-P

plots, as they are related.
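The steps above can be sketched as follows (a minimal sketch assuming NumPy and SciPy; the function name `pp_points` and the sample values are illustrative):

```python
import numpy as np
from scipy import stats

def pp_points(data, dist=stats.norm):
    """Return (empirical, theoretical) cumulative-probability pairs."""
    x = np.sort(np.asarray(data, dtype=float))   # Step 1: order the sample
    n = len(x)
    p_hat = (np.arange(1, n + 1) - 0.5) / n      # Step 2: p_hat_j = (j - 0.5)/n
    p_theory = dist.cdf(x)                       # Step 3: F_X at each x_(j)
    return p_hat, p_theory                       # Step 4: scatter these pairs

# Illustrative data assumed to come from a standard normal distribution
sample = [-1.2, -0.4, 0.1, 0.3, 1.5]
p_hat, p_theory = pp_points(sample)
print(np.round(p_hat, 2), np.round(p_theory, 2))
```

For Step 5, a near-linear scatter of `p_hat` against `p_theory` along the 45-degree line indicates a reasonable fit.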

A couple of probability plots borrowed from different sources are provided for illustration. The P-P plot for a sample size of m = 200 and n = 50,000 markers is depicted in Figure 1.13(a) on the left of the panel, derived from Patterson et al. (2006). They found the fit to be excellent for demonstrating the appropriateness of the Johnstone normalization. Figure 1.13(b) on the right of the panel, derived from Dupuis (2010), depicts the P-P plot for the state of Nebraska, where X represents the duration of the dry period and Y represents the duration of the successive wet period. As can be seen from the graph, model 1 based on X and Y does not entirely capture the observed behavior.


Figure 1.13: Illustrations of P-P Plots in Genetics and Hydrologic Engineering. (a) Application to genetics in modeling population structure and conducting eigenanalysis; (b) application to modeling of the monthly Palmer Drought Severity Index / Gamma distribution. [figures not reproduced]

1.5.5 Describing Univariate Data in Numerical Form

After discussing some graphical and tabular ways for describing data sets in the prior section, this

section discusses certain numerical features of a collection of measurements. For example, assume

that one has a sample consisting of the values a, b, and c. This data set exhibits various

characteristics, including central tendency and variability. The sample's mean, median, or mode

measures central tendency, while the sample variance, standard deviation, or interquartile range

estimates dispersion or variability. Please note that the formulae and subjects discussed in this section are available in most statistics literature (e.g., Ramachandran and Tsokos 2014).


1.5.5.1 Measures of Central Tendency (Mean, Mode, Median)

Given a sample of n observations $x_1, x_2, \cdots, x_n$, the sample mean, also known as the empirical mean, is denoted by $\bar{x}$ and defined by Equation 1.53.

$\bar{x} = \dfrac{1}{n} \sum_{i=1}^{n} x_i$   [Eq. 1-53]

As a measure of the central location of the data, the sample mean is heavily influenced by extreme values or outliers. However, the trimmed mean is a more robust measure of central location, as it is relatively unaffected by outliers. How is the trimmed mean determined? Given $0 < \alpha < 1$, one can determine a $100\alpha\%$ trimmed mean as follows: (1) order the data, (2) discard the data values with the lowest and highest $100\alpha$ percentages, (3) determine the mean of the remaining data values. The notation for the $100\alpha\%$ trimmed mean is $\bar{x}_T$. To illustrate the trimmed mean concept, one may refer to Ramachandran and Tsokos (2014, p. 29).
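The three-step trimming procedure can be sketched numerically (assuming SciPy; the data values below are illustrative, not from the cited text):

```python
import numpy as np
from scipy import stats

data = [2, 4, 5, 6, 7, 8, 9, 100]   # illustrative sample with one outlier

# Ordinary sample mean: pulled upward by the outlier 100
mean = np.mean(data)
# 12.5% trimmed mean: order the data, drop the lowest and highest 12.5%
# (here one value from each end), then average the six remaining values
trimmed = stats.trim_mean(data, 0.125)
print(mean, trimmed)   # 17.625 vs. 6.5
```

The trimmed mean (6.5) sits near the bulk of the data, while the ordinary mean (17.625) is dragged toward the outlier.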

A sample median is the central value in a set of data; that is, the value that partitions the data set into two groups of the same size. Let $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$ be the observations $x_1, x_2, \cdots, x_n$ ranked in increasing order, from small to large. If n is odd, the median is the jth value $x_{(j)}$ with $j = (n + 1)/2$. Otherwise, n is even, and there are two values of $x_{(j)}$ equally close to the middle. Then, the interpolation rule suggests averaging them, as given by Equation 1.54:

$\text{median} = \dfrac{1}{2}\left(x_{(n/2)} + x_{(n/2 + 1)}\right)$   [Eq. 1-54]

Unlike the mean, the median is much less susceptible to outliers in data.


A sample mode represents the data value that occurs the most frequently. As a result, it shows

where the data tend to be the most concentrated. However, as Ramachandran and Tsokos (2014)

wrote, if all data values are distinct, the data set has no mode by definition.
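Both the median rules and the mode can be checked with the Python standard library (a small illustrative sketch):

```python
from statistics import median, multimode

odd_sample = [3, 1, 4, 1, 5]        # n odd: median is the middle order statistic
even_sample = [3, 1, 4, 1, 5, 9]    # n even: average the two middle values

print(median(odd_sample))           # sorted: 1 1 3 4 5 -> 3
print(median(even_sample))          # sorted: 1 1 3 4 5 9 -> (3 + 4)/2 = 3.5
print(multimode(odd_sample))        # 1 occurs most often -> [1]
```

For a sample of distinct values, `multimode` returns every value, which mirrors the remark that such a data set has no single mode.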

1.5.5.2 Measures of Dispersion

Upper and Lower Quartiles, Interquartile Range (IQR)

There are two quantiles worth mentioning, as they measure the spread of the data set. Those quantiles are the lower and upper quartiles, denoted by Q(.25) or $Q_1$ and Q(.75) or $Q_3$, respectively. They mark off 25% and 75% of the data, respectively. The interquartile range (IQR) is the distance between the first and third quartiles, $Q_3 - Q_1$. It can be used to determine the spread of the majority of the data.

Sample Variance and Standard Deviation

The sample variance, shorthand $s^2$ or var, is given by Equation 1.55 below.

$s^2 = \dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$   [Eq. 1-55]

The sample standard deviation, denoted s, is defined as the square root of the variance $s^2$. Both the sample variance $s^2$ and the sample standard deviation s are measurements of the variability or "scatteredness" of data values in the vicinity of the sample mean $\bar{x}$. The greater the variance, the wider the spread. It is worth noting that both $s^2$ and s are nonnegative.
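Equation 1.55 can be checked numerically; note the `ddof=1` argument, which selects the $n - 1$ divisor (a sketch assuming NumPy, using the battery data tabulated earlier in this section):

```python
import numpy as np

# Battery service-life data (minutes) from Table 1-4
data = np.array([176, 183, 185, 190, 191, 192, 201, 205, 214, 220])

s2 = np.var(data, ddof=1)   # sample variance with divisor n - 1 (Eq. 1-55)
s = np.std(data, ddof=1)    # sample standard deviation s = sqrt(s2)
print(round(float(s2), 2), round(float(s), 2))   # 196.9 and 14.03
```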


1.5.5.3 Statistical Skewness and Kurtosis

The skewness of the sample $x_1, x_2, \cdots, x_n$ is expressed by Equation 1.56 below.

$g_1 = \dfrac{n}{(n - 1)(n - 2)} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s}\right)^3$   [Eq. 1-56]

The kurtosis of the same sample can be computed using Equation 1.57 below.

$k_1 = \dfrac{n}{(n - 1)(n - 2)} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s}\right)^4$   [Eq. 1-57]

To interpret the values of 𝑔 and 𝑘 , one may refer to page 90, where both terms are discussed.
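For a quick numerical check, SciPy provides bias-corrected sample estimators (a sketch assuming SciPy and the battery data from Table 1-4; SciPy applies the standard adjusted finite-sample correction factors, which may differ slightly from the conventions above):

```python
import numpy as np
from scipy import stats

data = np.array([176, 183, 185, 190, 191, 192, 201, 205, 214, 220])

g1 = stats.skew(data, bias=False)       # bias-corrected sample skewness
k1 = stats.kurtosis(data, bias=False)   # bias-corrected excess kurtosis
print(round(float(g1), 3), round(float(k1), 3))
```

A mildly positive skewness and a negative excess kurtosis indicate a slightly right-tailed, flatter-than-normal sample.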

1.5.5.4 Percentiles and quantiles

In the classical sense, quantile is synonymous with percentile. On the one hand, the term

"percentile" refers to any of the 99 numbered points dividing an ordered collection of scores into

100 segments, each containing one-hundredth of the total scores. The percentile of a particular

number, say x, is determined by the percentage of values less than x. For example, a test score in

the 90th percentile is greater than 90% of the available scores but less than 10% of the remaining

scores. Quantile, on the other hand, is a phrase that refers to one of the groups of values in a variate

(random variable) that divides either the units (elements) of a sample into subgroups of the same

size and adjacent values or a probability distribution into elementary distributions of equal

probability. In other terms, a quantile is a value that divides a set of data into identical proportions.

Returning to the previous example, the 0.90 quantile of a set of data is the value that separates the data into two classes or groups such that a proportion of 0.90 of the observed values falls below and a fraction of 0.10 of the observed values falls above that


value. It is worth noting that the main distinction between percentile and quantile is that the former

refers to a percentage of the data set, while the latter refers to a fraction of the data set.

Because the terms percentile and quantile are synonymous, the focus will exclusively be on the

quantile. Quantiles may present some difficulties when computing them from a set of data, as

implied by their definition. For instance, suppose a data set contains ten observations. One can

only split off a fraction of the data: 0.1, 0.2, …, 0.9. If a fraction other than those provided, say 0.33, must be split off, there will be no value that separates off a fraction of 0.33.

Additionally, suppose one chooses to locate the split point at the nearest observation. They may

be unsure whether to count the observation in the lower or upper part of the observed data set's

scale. Practitioners employ a different but more practical operational definition of quantile to

circumvent these difficulties. The following step-by-step process is necessary for this definition of

quantile proposed by Ramachandran and Tsokos (2014).

Step 1: Consider first, for this new definition of quantile, a set of n raw data $x_1, x_2, \cdots, x_n$.

Step 2: Order the raw data in ascending order to obtain $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$.

Step 3: Denote with p any fraction between 0 and 1.

Step 4: Define the quantile $Q(p)$ corresponding to the fraction p as $Q(p_i) = x_{(i)}$ whenever p is one of the fractions $p_i = (i - 0.5)/n$, with $i = 1, \cdots, n$.


As a result, the data's quantiles $Q(p_i)$ are simply the ordered observations themselves, $x_{(i)}$. Several authors have proposed different expressions for $p_i$. For example, Wilk and Gnanadesikan (1968) provide different variations of the term $p_i$ (e.g., $p_i = i/(n + 1)$) and the reasoning behind each.
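The operational definition can be sketched in a few lines (NumPy assumed; the function name `quantile_points` is illustrative):

```python
import numpy as np

def quantile_points(data):
    """Pair each order statistic x_(i) with its fraction p_i = (i - 0.5)/n."""
    x = np.sort(np.asarray(data, dtype=float))   # Steps 1-2: order the data
    n = len(x)
    p = (np.arange(1, n + 1) - 0.5) / n          # Steps 3-4: fractions p_i
    return p, x                                  # Q(p_i) = x_(i)

p, q = quantile_points([7, 1, 5, 3, 9, 19, 13, 15, 17, 11])
print(p[2], q[2])   # p_3 = 0.25 and Q(0.25) = third smallest value = 5.0
```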

1.5.6 Describing Multiple Dimensional Data in Numerical Form

1.5.6.1 Sample Mean

Given a sample of n observations $\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n$ of a p-dimensional random vector X, the sample mean of the kth variable, also known as the empirical mean, is denoted $\bar{x}_k$ and defined by Equation 1.58.

$\bar{x}_k = \dfrac{1}{n} \sum_{j=1}^{n} x_{jk}, \quad k = 1, 2, \ldots, p$   [Eq. 1-58]

1.5.6.2 Sample Variance

The sample variance, which measures the spread of the observations of the kth variable of X around its mean $\bar{x}_k$, is depicted by Equation 1.59.

$s_k^2 = s_{kk} = \dfrac{1}{n} \sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2, \quad k = 1, 2, \ldots, p$   [Eq. 1-59]

1.5.6.3 Sample Covariance

Another measure of spread is termed the sample covariance $s_{ik}$. As defined in Equation 1.60, the sample covariance assesses the spread pairwise between the ith and kth variables of X.


$s_{ik} = \dfrac{1}{n} \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k), \quad i = 1, 2, \ldots, p; \; k = 1, 2, \ldots, p$   [Eq. 1-60]

The square root of the variance is known as the sample standard deviation; that is, $s_k = \sqrt{s_{kk}}$. It is worth noting that when i equals k, the sample covariance reduces to the variance.

1.5.6.4 Sample Correlation Coefficient

The final descriptive statistic is the sample correlation coefficient or Pearson’s product-moment

correlation coefficient, which measures the linear association between two variables. For the ith

and kth variables, the sample correlation coefficient $r_{ik}$ is given by Equation 1.61 below.

$r_{ik} = \dfrac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}} = \dfrac{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)}{\sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2}\,\sqrt{\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2}}$   [Eq. 1-61]

where $i = 1, 2, \ldots, p$ and $k = 1, 2, \ldots, p$.

While both $r_{ik}$ and $s_{ik}$ are adequate for determining the linear association of two variables, they may be less useful for other types of associations. Moreover, they can be unreliable when observations contain outliers, suggesting associations when there is barely one. Therefore, questionable observations must be detected and corrected if needed. The following are some properties of the sample correlation coefficient, simply denoted as r. First, values of r are inclusively bounded between $-1$ and $1$. Second, an r value of "0" signifies a lack of linear association between the variables, whereas a positive or negative sign of r indicates the direction of their association. Lastly, the value of r remains the same whether the factor $1/n$ or $1/(n - 1)$ is chosen to calculate $s_{ii}$, $s_{kk}$, and $s_{ik}$. More on sample correlation coefficients and


descriptive statistics can be found in any statistics book, such as the book by Johnson and Wichern

(2019).
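These descriptive statistics are one-liners in NumPy. The sketch below (with illustrative data) shows the attainable bounds of r; because r is a ratio, the $1/n$ versus $1/(n - 1)$ normalization cancels, as noted above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0      # perfect positive linear association with x

# Pearson's product-moment correlation coefficient r (Eq. 1-61)
r_pos = np.corrcoef(x, y)[0, 1]    # perfectly positive association
r_neg = np.corrcoef(x, -y)[0, 1]   # perfectly negative association
print(r_pos, r_neg)                # the bounds +1 and -1 of r
```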

1.5.7 Determining the Data Distribution through Hypothesis Testing

Researchers usually rely on mathematical models to study various real-world phenomena for

simulation. Most often, the formulated model serves to draw random samples $X_1, X_2, \cdots, X_n$ from the population of interest. The values of the samples, known as observations or measurements, are usually denoted by $x_1, \cdots, x_n$ and represent values of some subject of interest. For

example, these measurements could represent subway arrival times, household electricity

consumption, antenna spectrum sensing in cognitive radios, etc. In the field of probability and

statistics, to understand the behavior of these phenomena, one must first identify the probability

distribution from which the given data are drawn (Ramachandran and Tsokos 2014). This is

supported by the fact that, as stated by Chambers et al. (1983, p. 191), "at the heart of probabilistic

statistical analysis is the assumption that a set of data arises as a sample from a distribution in some

class of probability distributions."

In any case, there are several reasons for making distributional assumptions about data. First, if a

set of data can be described as a sample from a specific theoretical distribution, say a normal

distribution, then the data can be described more compactly. For example, in the normal case, the

data can be succinctly described by providing the mean and standard deviation and stating that the

normal distribution well approximates the data's empirical (sample) distribution. The use of

distributional assumptions can also lead to statistical procedures. The assumption that normal

probability distributions generate data, for example, leads to an analysis of variance and least


squares. Third, the assumptions enable one to characterize the sampling distribution of statistics

computed during the analysis, drawing conclusions and making probabilistic statements about

unknown aspects of the underlying distribution. For example, assuming the data is a sample from

a normal distribution, one can use the t-distribution to calculate confidence intervals for the

theoretical distribution's mean. A fourth reason for making distributional assumptions is that

understanding the distribution of a data set can sometimes shed light on the physical mechanisms

involved in data generation (Chambers et al. 1983).

Analyses based on specific data distributional assumptions are invalid if the premises are not met

to a reasonable degree. According to Chambers et al. (1983), “Garbage in, garbage out.” When

attempting to validate the assumption about the distribution of the sampled data, one is trying to

verify if the empirical distribution can sufficiently be approximated by the assumed one. Clearly

defining the task prompts the investigator to take action to shed light on the issue. For instance, it

may encourage the investigator to look closely at how the empirical distribution of a set of data

differs from the theoretical distribution.

In the previous sections, graphical displays served to validate distributional assumptions about data; in practice, however, investigators use goodness-of-fit testing to complement those displays. The testing

serves to identify a probability distribution function that would likely characterize the behavior of

the data or the phenomenon of interest. Due to its importance and relevancy to the current study,

this section discusses a couple of statistical tests (methods). These tests are the chi-square and

Kolmogorov-Smirnov goodness-of-fit tests, which are practical to use in determining how well the

data fits a specific probability distribution necessary to achieve one’s goal of identifying the


underlying probability distribution of a given data set. However, it should be noted that other

methods exist that do not rely on population distributional assumptions. These methods are known

as nonparametric or distribution-free tests. While the nonparametric tests are beyond the scope of

this study, to further one’s knowledge, refer to Ramachandran and Tsokos (2014) or Soong (2004),

for instance.

1.5.7.1 Chi-Squared Goodness-of-Fit Test (case of known parameters)

To study an unknown phenomenon's behavior, one must first collect a random sample of data via experiments or other means and test whether its empirical distribution fits a known probability distribution well. Pearson's (1900) chi-squared ($\chi^2$) goodness-of-fit test is one of the most popular and versatile tests designed for this purpose (Soong 2004). For instance, in applied statistics, one may refer to McAssey's (2013) paper for applications to multivariate distributions with known hypothesized distribution functions. Nevertheless, when conducting a Pearson's chi-square goodness-of-fit test ($\chi^2$-test), it is commonly assumed that the distribution of the population X of interest is known. The $\chi^2$-test is based on the test statistic $Q^2$, which quantifies the difference between a frequency graph (e.g., a histogram) built from the sample values and one constructed from the predicted distribution $F_0$ (Soong 2004).

Development of the $\chi^2$-Test

To define $Q^2$, let $x_1, \cdots, x_n$ be the observations of n random samples $X_1, X_2, \cdots, X_n$ drawn from a population X characterized by the CDF $F(x)$ with known parameters [for unknown parameters, e.g., see Soong (2004)]. One may divide the sampled data range of the population X into k


mutually exclusive intervals (classes or bins) $I_1, \cdots, I_k$ and let $O_j$ be the number of X falling into $I_j$, with $j = 1, 2, \cdots, k$. Note that $O_j$ is also referred to as the jth observed frequency, from which the observed probabilities $\hat{p}_j$ can be derived as follows in Equation 1.62:

$\hat{p}_j = \mathbb{P}\left(X \in I_j\right) = \dfrac{O_j}{n}, \quad j = 1, 2, \cdots, k$   [Eq. 1-62]

Moreover, if $F_0(x)$ denotes the hypothesized (expected) CDF of the data being tested, then the theoretical probability $p_j$ associated with the jth interval can be determined using Equation 1.63:

$p_j = \mathbb{P}\left(X \in I_j \mid F_0\right) = F_0(x_u) - F_0(x_l) = \dfrac{E_j}{n}, \quad j = 1, 2, \cdots, k$   [Eq. 1-63]

where $E_j$ is the jth expected (theoretical) frequency for the jth interval, expressed in terms of $F_0$ evaluated at the upper ($x_u$) and lower ($x_l$) bounds (limits) of the interval, and n is the sample size.

Then, the test statistic $Q^2$, provided in Equation 1.64, is a measure of deviation between the observed and expected outcome frequencies, expressed as the sum of the differences between the observed and expected frequencies (counts of observations), each squared and divided by the expectation.

$Q^2 = \sum_{j=1}^{k} \dfrac{(O_j - E_j)^2}{E_j}$   [Eq. 1-64]

Theorem: Assuming that the hypothesis below is valid, the distribution of the test statistic $Q^2$ is approximately a chi-square distribution with $(k - 1)$ degrees of freedom, denoted by $\chi^2_{k-1}$, as $n \to \infty$ [see Soong (2004) for a demonstration of this result].

$H_0$: the given data follow a specific probability distribution ($F_0$)


Versus

$H_1$: The data do not follow the specified probability distribution.

Through the above theorem, a test of the hypothesis $H_0$ versus $H_1$, also known as a significance test, can be formulated by assigning a probability $\alpha$ of a type-I error. As discussed in Section 1.8, one commits a type-I error when one rejects a null hypothesis that is actually true. Accordingly, if one aims to attain a type-I error probability of $\alpha$, the $\chi^2$-test recommends rejecting the null hypothesis of the one-sided test each time $Q^2$ satisfies the criterion given by Equation 1.65 below.

$\mathbb{P}\left(Q^2 > \chi^2_{\alpha,k-1}\right) = \alpha$   [Eq. 1-65]

Otherwise, accept the hypothesis. For the convenience of calculations, given the probability $\alpha$ of the type-I error, also referred to as the significance level, one can use a $\chi^2$ lookup table to find the $\chi^2_{\alpha,k-1}$ corresponding to the test statistic. On the lookup table [e.g., see Montgomery and Runger (2007, p. 655)], $\chi^2_{\alpha,k-1}$ corresponds to the upper $100\alpha$ percentage point of the chi-square distribution, as shown in Figure 1.14. In practice, the most common alpha values are 0.001, 0.01, and 0.05. A value between 5% and 1% is almost significant; a value between 1% and 0.01% is significant; and a value below 0.01% is very significant (Soong 2004).


Figure 1.14: Illustration of the Percentage Points for a Chi-Square CDF


Courtesy of Montgomery and Runger (2007, p. 308)

Equation 1.65 represents the critical region, also known as the rejection region: the set of values of the test statistic for which the hypothesis under the goodness-of-fit test is rejected.

Next, the following step-by-step procedure is used to perform a goodness-of-fit test where the hypothesized probability distribution $F_0$ is entirely known.

Step 1: Divide the range of values of the random variable X into k non-overlapping intervals $I_1, \cdots, I_k$. Let $O_j$ be the total number of the values of the n samples that fall in the interval $I_j$, with $j = 1, \cdots, k$. As a rule, if any $I_j$ has fewer than five values, merge $I_j$ with $I_{j-1}$ or $I_{j+1}$.

Step 2: Calculate each theoretical probability $p_j$, as in Equation 1.63, associated with each interval j in terms of $F_0$.

Step 3: Compute $Q^2$ using Equation 1.64.

Step 4: Select a value of $\alpha$ to construct the critical region as specified in Equation 1.65, since the test statistic is approximately distributed according to $\chi^2_{k-1}$.


Step 5: Conduct the $\chi^2$-test below under the assumption that n is sufficiently large (roughly greater than 50).

$H_0: F(x) = F_0(x)$

against the alternative

$H_1: F(x) \neq F_0(x)$

Step 6: Reject the hypothesis $H_0$ if $Q^2 > \chi^2_{\alpha,k-1}$, and conclude that the data does not follow or fit the specified probability distribution. Otherwise, accept $H_0$ and deduce that the data set fits the prescribed probability distribution at the significance level $\alpha$.

It is worth noting that if one uses statistical software, the software may compute a p-value based on the test statistic $Q^2$; the hypothesis $H_0$ is then accepted at all significance levels $\alpha$ less than the computed p-value. For more information on the $\chi^2$-test, including illustrations, one may refer to Soong (2004), Ramachandran and Tsokos (2014), and Montgomery and Runger (2007).
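As a worked sketch of the six steps (SciPy assumed; the die-roll counts are illustrative), one may test a discrete uniform $F_0$:

```python
import numpy as np
from scipy import stats

# Step 1: k = 6 bins with observed frequencies O_j from n = 120 die rolls
observed = np.array([18, 22, 16, 25, 19, 20])
# Step 2: under H0 (a fair die), p_j = 1/6, so E_j = n * p_j = 20 per face
expected = np.full(6, observed.sum() / 6)

# Step 3: Q2 = sum over j of (O_j - E_j)^2 / E_j, per Eq. 1-64
q2, p_value = stats.chisquare(observed, expected)

# Steps 4-6: compare with the upper 5% point of chi-square, k - 1 = 5 df
critical = stats.chi2.ppf(1 - 0.05, df=5)
print(q2, q2 > critical)   # 2.5, False -> do not reject H0 at alpha = 0.05
```

Since $Q^2 = 2.5$ falls well below the critical value (about 11.07), the fair-die hypothesis is not rejected at the 5% level.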

1.5.7.2 The Kolmogorov-Smirnov Goodness-of-Fit Test

The Kolmogorov-Smirnov goodness-of-fit test, denoted as the K-S test, is based on a statistic that measures the deviation of the empirical cumulative histogram from the assumed cumulative

distribution function. Examples of applications in various fields include chemometrics or the

science of deriving information from chemical systems by data-driven methods. For instance,

Saccenti et al. (2011, p. 648) assessed the equivalence of the empirical equipercentile function

with TW distribution using a K-S test.


K-S Test Development

Nevertheless, to define the test statistic of the K-S test, let $x_1, \cdots, x_n$ be the n realizations of the random samples $X_1, X_2, \cdots, X_n$ drawn from the population of interest X. Let $F(x)$ be the unknown true CDF of X, and $F_0(x)$ be the assumed CDF of X, whose parameters are entirely known. The goal is to verify through hypothesis testing whether $F_0(x)$ is the true CDF of X.

From the set of observed data $x_1, x_2, \cdots, x_n$, a cumulative histogram can be plotted in three steps:

Step 1: Rank the observations in the sample from smallest to largest. That is, the sample $x_1, x_2, \cdots, x_n$ is arranged as $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$.

Step 2: Assign an empirical CDF value $\hat{F}(x_{(j)}) = j/n$ to each ordered observation $x_{(j)}$, with $j = 1, 2, \cdots, n$. In this case, let $F(x)$ without a hat represent the unknown CDF of X, which is being tested for equality with $F_0$.

Step 3: Join the values of $\hat{F}(x_{(j)})$ by straight-line segments.

The equation below defines the test statistic to consider in this case.

$D_n = \max_{1 \le j \le n} \left| \hat{F}\left(X_{(j)}\right) - F_0\left(X_{(j)}\right) \right| = \max_{1 \le j \le n} \left| \dfrac{j}{n} - F_0\left(x_{(j)}\right) \right|$   [Eq. 1-66]

where $X_{(j)}$ represents the jth order statistic of the sample. As defined, $D_n$ thus quantifies the maximum of the absolute values of the n differences between the observed and hypothesized CDFs evaluated at the observed samples. When the parameters in the theorized distribution must be


estimated, the values for $F_0(X_{(j)})$ are calculated using the distribution's estimated parameter values. While obtaining the distribution of $D_n$ analytically is complicated, its distribution function at various values can be computed numerically and tabulated. It can be demonstrated that the probability distribution of $D_n$ is independent of the assumed distribution $F_0$ and is only a function of n, the sample size [e.g., see Massey (1951)]. At this point, the K-S test becomes similar to the $\chi^2$-test. At a given significance level $\alpha$, the operating rule is to reject the hypothesis $H_0$, as defined below, if $d_n > c_{n,\alpha}$, and accept $H_0$ otherwise.

$H_0$: The inherent CDF $F(x)$ of the sampled data is the hypothesized CDF $F_0(x)$,

versus

$H_1$: The theoretical CDF $F_0(x)$ is not the true CDF $F(x)$ of the empirical data.

In this case, $d_n$ is the sample value of $D_n$, and $c_{n,\alpha}$ represents the critical value of the maximum absolute difference between the sample and population CDFs, as defined by Equation 1.67.

$\mathbb{P}\left(D_n > c_{n,\alpha}\right) = \alpha$   [Eq. 1-67]

The values of $c_{n,\alpha}$ for $\alpha = 0.001$, $0.01$, and $0.05$ can be found numerically and tabulated as functions of n [e.g., see Soong (2004, p. 372)]. For instance, for larger n ($n > 40$) and $\alpha = 0.05$, $c_{n,\alpha} = 1.36/\sqrt{n}$.

Step-by-Step Procedure to Conduct a K–S Test

The following is a step-by-step procedure for carrying out the K–S test:


Step 1: Sort the sampled data $x_1, x_2, \cdots, x_n$ in increasing order to obtain $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$.

Step 2: Calculate the observed CDF $\hat{F}(x_{(j)})$ as provided in Equation 1.68 at each $x_{(j)}$.

$\hat{F}\left(x_{(j)}\right) = \dfrac{j}{n}, \quad j = 1, 2, \cdots, n$   [Eq. 1-68]

Step 3: Using the hypothesized distribution, compute the theoretical distribution function $F_0(x_{(j)})$ at each $x_{(j)}$. If necessary, the distribution $F_0$'s parameters are estimated from the data.

Step 4: Work out the differences between the empirical and assumed CDFs evaluated at each $x_{(j)}$, as Equation 1.69 specifies.

$\left| \hat{F}\left(x_{(j)}\right) - F_0\left(x_{(j)}\right) \right|, \quad j = 1, 2, \cdots, n$   [Eq. 1-69]

Step 5: Compute $d_n$, the sample value of $D_n$ defined in Equation 1.66. Note that plotting $\hat{F}(x)$ and $F_0(x)$ as functions of x and noting the location of the maximum by examination may save time.

Step 6: Select a value of $\alpha$ and find $c_{n,\alpha}$ by using a table such as Table 1-5, hereby provided. One may also refer to Massey (1951) for a complete table.

Step 7: Accept $H_0$ if $d_n \le c_{n,\alpha}$. Else, reject $H_0$.
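The seven steps collapse to a single call in SciPy (a sketch; the simulated sample and seed are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.standard_normal(100)   # n = 100 draws; H0: standard normal F0

# Steps 1-5: kstest orders the data and computes d_n = max |F_hat - F0|
d_n, p_value = stats.kstest(sample, "norm")

# Step 6: large-sample critical value at alpha = 0.05 is c = 1.36 / sqrt(n)
c_crit = 1.36 / np.sqrt(len(sample))

# Step 7: accept H0 when d_n <= c_{n, alpha}
print(d_n <= c_crit)
```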


Table 1-5: Kolmogorov-Smirnov Test - Critical Values between Data Sample and Hypothesized CDFs
Courtesy of Soong (2004, p. 372) [table values not reproduced]

Difference Between the $\chi^2$-Test and K-S Test

It is worth noting the significant differences between this test and the $\chi^2$-test. The K-S test is valid for all values of n, whereas the $\chi^2$-test is a large-sample test. Furthermore, the K-S test employs unaltered and unaggregated sample values, whereas data lumping is required in the execution of the $\chi^2$-test. On the downside, the K-S test is only strictly valid for continuous distributions. It should also be mentioned that the available tabulated $c_{n,\alpha}$ values are based on a completely specified hypothesized distribution. There is no rigorous method of adjustment available when the parameter values of the assumed CDF must be estimated. In these cases, the only thing that can be said is that the values of $c_{n,\alpha}$ should be reduced slightly (Soong 2004).

Illustrations of the K-S Test

To see an example of the K-S test in action, see Ramachandran and Tsokos’ (2014) book, which

includes an in-depth example. By following all of the steps outlined herein, the


authors make it easy to gain hands-on experience with this hypothesis testing approach. Additionally, one

may refer to another well-illustrated example by Soong (2004). Finally, see Saccenti et al. (2011) and Herzog et al. (2007) for real-world applications in chemometrics and structural equation modeling, respectively.

1.5.8 Covariance Matrices: Introduction

Practitioners in various fields, especially in probability and statistics, use covariance matrices to

investigate or quantify interdependencies between the items of a population. Indeed, covariance matrices are essential objects for multivariate statistical analysis. In most methods employed in

multivariate statistical analysis, the underlying structure of the population units is unknown or

assumed. Accordingly, theoretical covariance matrices play an enormous role by serving as objects

to describe the true interdependency structure of the units of the underlying population (Bejan

2005). Still, through statistical measurements collected from a population, the sample covariance

matrices typically help clarify, to “some extent,” the interdependence structure present in the

population items.

1.5.8.1 Meaning of a Covariance Matrix

Mathematically, a covariance matrix, also known as a variance-covariance matrix or variance

matrix, is a collection, in a square matrix, of the covariances between pairs of variables of a $p \times 1$ random vector $\mathbf{X} = (X_1, \cdots, X_p)'$. Intuitively, one may see a covariance matrix as representing the variation or spread between pairs of a collection of points in a multi-dimensional space. Geometrically, those variations can be illustrated by considering the p elements $x_1, x_2, \ldots, x_p$ of a


vector x as the realizations of the p random variables $X_1, \cdots, X_p$. In p-dimensional space, the elements of x can be interpreted as the coordinates of a point in that space. In addition, the distance of the point $\mathbf{x} = (x_1, x_2, \ldots, x_p)'$ from the origin of the space can be regarded in terms of the standard deviation, the square root of the variance, or a scaled difference between units. In this manner, the inherent uncertainty or variability in the observations should be accounted for. In addition, points of similar "uncertainty" should be considered as if they were at an equal distance from the origin of the space.
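As a concrete sketch of the above, a sample covariance matrix can be assembled directly from its definition. This is a pure-Python illustration with an invented helper name; a real analysis would rely on a statistics library:

```python
def sample_covariance(rows):
    """Sample covariance matrix S of n observations on p variables.

    rows is a list of n length-p observation vectors; S[i][j] is the
    covariance of variables i and j, so the diagonal holds the variances."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]

# Two perfectly dependent variables (x2 = 2 * x1): the off-diagonal entry
# reveals the interdependency between the two coordinates.
S = sample_covariance([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(S)  # [[1.0, 2.0], [2.0, 4.0]]
```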

1.5.8.2 Euclidian and Statistical Distances

Many multivariate techniques are based on the simple concept of distance. This section aims to

define the Euclidean distance and the non-ordinary statistical distance. Let the points P and Q, whose coordinates are defined by Equation 1.70(b), be represented in the Euclidian plane of origin O, as illustrated in Figure 1.15(a). Equation 1.70(a) depicts the straight-line distance between point P and origin O using the Pythagorean theorem. Equation 1.70(c) extends this expression to p dimensions, and Equation 1.70(d) expresses the straight-line distance between points P and Q.

$d(O, P) = \sqrt{x_1^2 + x_2^2}, \quad P = (x_1, x_2), \ O = (0, 0)$   (a)

$P = (x_1, x_2, \cdots, x_p), \quad Q = (y_1, y_2, \cdots, y_p)$   (b)

$d(O, P) = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}, \quad P = (x_1, x_2, \cdots, x_p), \ O = (0, 0, \cdots, 0)$   (c)   [Eq. 1-70]

$d(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}$   (d)


[Figure: (a) Euclidian distance $d(O, P) = \sqrt{x_1^2 + x_2^2}$; (b) ellipse of constant statistical distance $d^2(O, P) = \frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}} = c^2$]

Figure 1.15: Euclidian and Statistical Distance Illustrations
Adaptation from Johnson and Wichern (2019)

By transforming the coordinates of P and Q, the formulas provided above can be expressed in terms of statistical distances. One performs this transformation so that the variability in the $x_1$ direction is made comparable to that in the $x_2$ direction, and the measurements of $x_1$ and $x_2$ vary independently, before applying the Euclidian distance. In other words, all points P located at a constant squared distance from Q must lie on a hyperellipsoid centered at Q whose major and minor axes are parallel to the coordinate axes, as shown in Figure 1.15(b) for the case of two-dimensional points. As Johnson and Wichern (2019) derived, Equation 1.71(a) through Equation 1.71(c) give the generalized expression of the statistical distance.

$d(P, Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}} + \cdots + \frac{(x_p - y_p)^2}{s_{pp}}}$   (a)

$P = (x_1, x_2, \cdots, x_p)$   (b)   [Eq. 1-71]

$Q = (y_1, y_2, \cdots, y_p)$   (c)


where $s_{11}$, $s_{22}$, …, and $s_{pp}$ in Equation 1.71(a) are the sample variances constructed from the measurements on $x_1, x_2, \cdots, x_p$.
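The contrast between the two distances can be sketched numerically. In this illustration the sample variances $s_{11} = 9$ and $s_{22} = 16$ are assumed values, and the function names are invented:

```python
import math

def euclidian_distance(p, q):
    """Equation 1.70(d): ordinary straight-line distance."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def statistical_distance(p, q, s):
    """Equation 1.71(a): each squared coordinate difference is scaled by
    the sample variance s_kk of that coordinate before summing."""
    return math.sqrt(sum((pi - qi) ** 2 / skk for pi, qi, skk in zip(p, q, s)))

P, Q = (3.0, 4.0), (0.0, 0.0)
print(euclidian_distance(P, Q))                  # 5.0
print(statistical_distance(P, Q, (9.0, 16.0)))   # sqrt(1 + 1), about 1.414
```

Scaling by the variances shrinks the distance along directions with large spread, which is exactly the ellipse-of-constant-distance picture in Figure 1.15(b).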

1.6 Probability Distributions and Random Variables

1.6.1 Preface

In a broader sense, probability refers to the branch of mathematics concerned with the likelihood of events happening. It applies daily to assessing risks and devising mathematical models for calculations and predictions. More specifically, the term probability, as a probability measure, quantifies the possibility or chance (e.g., 25%) that a random event (e.g., rain) will occur. Thus, it is a statement about the likelihood of an event. A probability measure assigns to an event a value from the interval [0, 1], often expressed as a percentage. As defined, one may interpret the probability of an outcome as a subjective degree of belief that the outcome will occur. Another interpretation is that probability describes the long-run frequency in a theoretical model of repeated replicas of the random experiment. To compute the probability of an event, one needs first to define the sample space (Ω), which represents the collection of all outcomes or results of any given attribute or variable of the population in question. The rule of thumb when assigning probabilities to the outcomes making up the sample space is that the probabilities of all outcomes must sum to one (Montgomery and Runger 2007). One can construct subsets of Ω known as events from the sample space. An event (ℱ) represents a set of the experiment's possible outcomes given an investigation. One would say that this event has


occurred if the realized outcome of the experiment falls within ℱ; the collection of all events is closed under complements and countable unions.

A sample space can either be discrete or continuous. When the set of its results is finite or countably infinite, Ω is discrete. When the set of its outcomes falls in an interval of real numbers, Ω is continuous. A probability space (Ω, ℱ, ℙ) is a triple consisting of the sample space Ω, a collection ℱ of subsets of Ω (the events), and the probability measure ℙ. The following sections will discuss this topic in depth. The purpose, for now, is to provide some background information on probability distributions for random variables. While Section 1.6.2 will be concerned with univariate random variables, Section 1.6.3 will focus on multivariate random variables, the latter being a generalization as it involves more than one variable.

1.6.2 Univariate Random Variables and Probabilities

Everyday life is full of countless systems. An investigator trying to understand any of them can create a mathematical model to experiment with the system's events. Usually, the investigator uses random variables to analyze the system, and the results may serve in other applications. For example, in many experiments involving probability, knowing the expression of the function that relates an experiment to its possible outcomes is more informative than any single outcome itself. A random variable is that function, assigning a numerical value to each possible outcome of the experiment contained in the sample space Ω. For instance, in a coin-tossing example where Ω represents the set of all tails and heads, X may denote the number of times the coin shows a tail. In this case, X is a univariate discrete random variable, as opposed to a continuous one. The probability distribution of a random variable


X, discrete or continuous, represents the probabilities attributed to the possible values of X

(Montgomery and Runger 2007). Concerning both types of random variables, the following

sections describe the probability distributions of each variable and the parameters used to

summarize distributions.

1.6.2.1 Discrete Random Variables

Probability Distribution and Mass Functions

Let Ω be a discrete sample space and ℱ a subset of Ω containing the outcomes of a given experiment. In addition, let X be the random variable representing the outcomes contained in ℱ. Moreover, let X take the values $x_1, x_2, \cdots, x_n$, respectively associated with the probabilities $p_1, p_2, \cdots, p_n$. Each $p_j$ represents the distribution of the random variable X at its discrete value $x_j$. From the given probability information on X, one may formulate the probability of X in terms of a function as the probability that X takes on a value smaller than or equal to a preselected value x. This probability distribution corresponds to the sum of the probabilities $p_j$ of all $x_j \le x$, as provided in Equation 1.72 below,

$F(x) = P(X \le x) = \sum_{x_j \le x} p_j$   [Eq. 1-72]

where 𝐹 𝑥 represents X's cumulative distribution function (CDF) satisfying the following

properties given by Equation 1.73(a) and Equation 1.73(b).


$0 \le F(x) \le 1$   (a)   [Eq. 1-73]

If $x \le y$, then $F(x) \le F(y)$   (b)

It is customary to associate F(x), defined earlier by Equation 1.72, with a probability mass function f(x) as defined by Equation 1.74(c) and satisfying the conditions provided in Equation 1.74(a), Equation 1.74(b), and Equation 1.74(d), so that the total probability mass of the system represented by X is conserved.

$f(x_j) \ge 0$   (a)

$\sum_{j=1}^{n} f(x_j) = 1$   (b)   [Eq. 1-74]

$f(x_j) = P(X = x_j) = p_j$   (c)

$F(x) = P(X \le x) = \sum_{x_j \le x} f(x_j)$   (d)

For an illustration of the relationship between F(x) and f(x), one may refer to Figure 1.16. Meanwhile, the literature contains countless examples of discrete probability distributions. Among the classic ones is the discrete uniform distribution, where X is the simplest discrete variable assuming a finite number n of possible values, each with equal probability $f(x_j) = \frac{1}{n}$. For more examples on this topic and its applications to various fields, one may refer to the book by Montgomery and Runger (2007).


Summaries of the Probability Distribution of a Discrete Random Variable

The expected value of the discrete variable X, denoted by E(X) or E[X], represents the mean μ of X. Its expression is provided in Equation 1.75 as a weighted average of all possible outcomes of X.

$\mu = E[X] = \sum_{j=1}^{n} x_j p_j$   [Eq. 1-75]

Note that E[X] is not necessarily a value that X can assume and may be different from the most probable value of X. In addition, if one is interested in the expected value of any function g(X) of X, the following applies: let $X^r$ be such a function of X. Equation 1.76 then serves to determine the resulting expected value $E[X^r]$, which is referred to as the rth moment about the origin of X.

$\mu_r' = E[X^r] = \sum_{j=1}^{n} x_j^r f(x_j) = \sum_{j=1}^{n} x_j^r p_j$   [Eq. 1-76]

Also of interest as summaries of the probability distribution of X are the variance var(X) and standard deviation σ of X. As defined in Equation 1.77(b), σ represents the square root of the variance of X given in Equation 1.77(a).

$var(X) = \sigma^2(X) = E[(X - \mu)^2] = \sum_{j=1}^{n} (x_j - \mu)^2 p_j$   (a)   [Eq. 1-77]

$\sigma = \sqrt{var(X)}$   (b)

Both var(X) and σ measure the spread in the values of X. It is worthwhile noting that the variance of X represents the expected value of the random variable $(X - \mu)^2$. In addition, the variance $\sigma^2(X)$ does not have the same dimension as the values of X; instead, its dimension is the square of the dimension of the values taken by X, which is why the standard deviation σ is often reported instead. Skewness and kurtosis are other well-known measures of observed data distributions that can be determined using Equation 1.82 and Equation 1.83.
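The summaries in Equations 1.75 through 1.77 can be sketched for a fair six-sided die, the classic discrete uniform variable; the function names below are invented for this illustration:

```python
def mean(xs, ps):
    """Equation 1.75: mu = sum of x_j * p_j."""
    return sum(x * p for x, p in zip(xs, ps))

def moment_about_origin(xs, ps, r):
    """Equation 1.76: rth moment about the origin, E[X^r]."""
    return sum(x ** r * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """Equation 1.77(a): var(X) = E[(X - mu)^2]."""
    mu = mean(xs, ps)
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

xs = [1, 2, 3, 4, 5, 6]        # outcomes of a fair die
ps = [1 / 6] * 6               # discrete uniform: f(x_j) = 1/n
print(mean(xs, ps))                    # 3.5
print(variance(xs, ps))                # 35/12, about 2.9167
print(moment_about_origin(xs, ps, 2))  # E[X^2] = var + mu^2, about 15.1667
```

Note that the mean 3.5 is a value the die can never show, illustrating the remark that E[X] need not be an attainable value of X.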

1.6.2.2 Continuous Random Variable

Distribution and Density Functions

Unlike the discrete random variable in the previous section, a continuous random variable has a distinctly different distribution because its possible values are uncountable. Because an interval of real numbers represents the range of the values of X, one may regard this range as a continuum. Continuous distributions find extensive applications in various fields, including engineering, for describing physical systems. For instance, consider the density of a uniform load on a cantilever beam between points a and b (see Lindeburg 2011, p. A-120; Montgomery and Runger 2007, p. 99). As Montgomery and Runger (2007) wrote, the integral of the density function over the interval [a, b] represents the total loading in this interval. One may interpret the integral as the summation of infinitely many loading points over the interval [a, b]. Similar to discrete random variables, a density function f(x) and a probability distribution function F(x) serve to describe the probability of a continuous random variable X between the reals a and b. Both f(x) and F(x) satisfy the properties given in Equation 1.78(a) through Equation 1.78(d).


$f(x) \ge 0$   (a)

$\int_{-\infty}^{\infty} f(x)\,dx = 1$   (b)

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du, \quad \forall x \in \mathbb{R}$   (c)   [Eq. 1-78]

$P(a \le X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx, \quad \forall a, b \in \mathbb{R}$   (d)

In Equation 1.78(c) and Equation 1.78(d), F(x) or P(a ≤ X ≤ b) represents the area under f(x) between −∞ and x, or between a and b, as shown in Figure 1.16(b) below.

[Figure: (a) discrete random variable — probability mass function f(x) at each point of X; (b) continuous random variable — CDF F(x) of X as an area under the PDF f(x)]

Figure 1.16: Probability Distribution and Mass (resp. Density) Functions Illustrations for Discrete (resp. Continuous) Random Variables
Adapted from Montgomery and Runger (2007, p. 65, 99)


Summaries of the Probability Distribution of a Continuous Random Variable

Let X be a continuous random variable equipped with a PDF f(x) and a CDF F(x) as defined in Equation 1.78. The expected value E[X] of X, which also corresponds to the mean μ of X, is defined by Equation 1.79.

$\mu = E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$   [Eq. 1-79]

Whenever it exists, the expected value $E[X^r]$, denoted by $\mu_r'$, with $r = 2, 3, 4, \cdots$, of the continuous function $X^r$, also referred to as the rth moment about the origin of X, is given by Equation 1.80.

$\mu_r' = E[X^r] = \int_{-\infty}^{\infty} x^r f(x)\,dx$   [Eq. 1-80]

In addition, when it exists, the rth moment about the mean μ of X, also called the central rth moment of X, is defined as $E[(X - \mu)^r]$ and denoted by $\mu_r$, with $r = 2, 3, 4, \cdots$. In particular, $E[X] = \mu_1' = \mu$ and $\sigma^2(X) = \mu_2$, where σ(X) represents the standard deviation of X, defined in Equation 1.81 as the square root of the variance var(X).

$var(X) = \sigma^2(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$   [Eq. 1-81]

While the mean 𝜇 and standard deviation 𝜎 are useful descriptive statistics for locating the center

and describing the spread or dispersion of the probability density function f (X), they do not give

a detailed description of the distribution. For instance, two distributions may have identical mean

and variance but are different (Ramachandran and Tsokos 2014). Thus, to approximate the

probability distribution of a continuous random variable more accurately, the higher rth moments defined earlier serve to introduce two additional measures: skewness and kurtosis.


The third standardized moment about the mean μ, as described in Equation 1.82, is referred to as the skewness of the distribution of the random variable X.

$\alpha_3(X) = \frac{E[(X - \mu)^3]}{\sigma^3(X)} = \frac{\mu_3}{\mu_2^{3/2}}$   [Eq. 1-82]

Note that skewness measures a density function's asymmetry (lack of symmetry) around its mean. A distribution, or data set, is said to be symmetric if it appears identical to the left and right of the center point. If $\alpha_3(X) = 0$, the distribution is symmetric around the mean; if $\alpha_3(X) > 0$, the distribution has a longer right tail; and if $\alpha_3(X) < 0$, the distribution has a longer left tail. Thus, a normal distribution has no skewness.

The kurtosis, defined as the standardized fourth moment about the mean μ and given by Equation 1.83, indicates whether a distribution is peaked or flat in comparison to a normal distribution.

$\alpha_4(X) = \frac{E[(X - \mu)^4]}{\sigma^4(X)}$   [Eq. 1-83]

In addition, kurtosis is determined by the size of the tails of a distribution. When the excess kurtosis is positive, the tails of the distribution hold more observations than those of a normal distribution; when it is negative, they hold fewer. Leptokurtic distributions have substantial tails, while platykurtic distributions have negligible tails. Mesokurtic distributions have the same kurtosis as a normal distribution. The kurtosis of a standard normal distribution is known to be $\alpha_4(X) = 3$.
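Equations 1.82 and 1.83 can be verified numerically for the standard normal density, whose skewness should be 0 and kurtosis 3. The integration grid and function names below are choices made for this sketch:

```python
import math

def normal_pdf(x):
    """Standard normal density, mu = 0 and sigma = 1."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def central_moment(pdf, r, mu=0.0, lo=-10.0, hi=10.0, steps=20000):
    """E[(X - mu)^r] by the midpoint rule over a wide finite interval."""
    h = (hi - lo) / steps
    mids = (lo + (k + 0.5) * h for k in range(steps))
    return sum((x - mu) ** r * pdf(x) for x in mids) * h

m2 = central_moment(normal_pdf, 2)
alpha3 = central_moment(normal_pdf, 3) / m2 ** 1.5  # Equation 1.82
alpha4 = central_moment(normal_pdf, 4) / m2 ** 2    # Equation 1.83
print(abs(round(alpha3, 6)), round(alpha4, 4))      # 0.0 3.0
```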

1.6.2.3 Examples of Probability Distributions

Although the literature on probability distributions suggests numerous probability distribution

functions, the four examples provided in this section have been selected for relevance to the

proposed study. They are also among the most extensively used distributions in various


applications by users of different fields. The four identified distributions are uniform, triangular,

normal, and beta distributions. The following paragraphs cover each distribution by defining its CDF and PDF and the summary measures described in Section 1.6.2.2, devoted to continuous

random variables. In addition, each paragraph outlines a few examples of their applications to

construction engineering and management. For additional information on these or possibly other

distributions, see a probability and statistics book such as by Tijms (2007) or Montgomery and

Runger (2007).

Uniform Distribution

A continuous random variable X is said to exhibit a uniform distribution over the interval [a, b] if its probability density function f(x) and cumulative probability distribution function F(x) are as given in Equation 1.84(a) and Equation 1.84(b). Equation 1.84(c) and Equation 1.84(d) provide the variable's mean and variance.

$f(x) = \begin{cases} \frac{1}{b - a}, & \forall x \in [a, b] \\ 0, & \forall x \notin [a, b] \end{cases}$   (a)

$F(x) = \begin{cases} 0, & \forall x < a \\ \frac{x - a}{b - a}, & \forall a \le x \le b \\ 1, & \forall x > b \end{cases}$   (b)   [Eq. 1-84]

$\mu = E[X] = \frac{a + b}{2}$   (c)

$var(X) = \frac{(b - a)^2}{12}$   (d)


Below, provided in Figure 1.17 from left to right, are the graphical representations of f(x) and F(x) on [a, b].

[Figure: f(x) is constant at 1/(b − a) between a and b and zero elsewhere; F(x) rises linearly from 0 at a to 1 at b]

Figure 1.17: Uniform Probability Density and Distribution Functions’ Graphs

The uniform distribution is frequently used to simulate quantities that vary randomly and assume

values between a and b with little knowledge outside of a and b (Tijms 2007). For example, in

construction management and scheduling, when deciding which distribution to employ for project

activity durations, Williams (1992) noted that the uniform distribution is only relevant in rare

instances. Among those uncommon occurrences, this investigation discovered a small number of

studies that used the uniform distribution. De Reyck and Herroelen (1996) investigated the

potential use of both the coefficient of network complexity (CNC) and the complexity index on

networks with activity periods taken from the uniform distribution in the range [1, 10]. Dodin and

Elmaghraby (1985) used randomly generated networks with discrete and uniform distributions of

activities to approximate the criticality indices of activities in PERT Networks. Liu et al. (2006)

suggested an evolutionary method for minimizing the overall weighted tardiness of all jobs to be

performed using a single machine with random machine breakdowns. The algorithm was evaluated

using job parameters such as work weights and job release time produced from discrete uniform


distributions. Mehta (1999) used the same idea: job processing times are produced from a discrete

uniform distribution. For all activities required in designing software that builds project networks,

Agrawal et al. (1996) employed uniform distributions over specified ranges to randomly assign

values to the cost and resource parameters.
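A quick simulation sketch confirms the uniform summaries of Equation 1.84(c) and 1.84(d); the sample size and interval below are arbitrary choices:

```python
import random

random.seed(0)                       # fixed seed for reproducibility
a, b = 2.0, 10.0
draws = [random.uniform(a, b) for _ in range(200_000)]

m = sum(draws) / len(draws)                              # estimate of (a+b)/2 = 6
v = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)  # estimate of (b-a)^2/12, about 5.33
print(round(m, 2), round(v, 2))
```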

Triangular Distribution

The behavior of a random variable X is said to follow a triangular distribution if its probability density function f(x) and cumulative distribution function F(x) over an interval [a, b] with mode m are respectively given by Equation 1.85(a) and Equation 1.85(b). Equation 1.85(c) and Equation 1.85(d) provide the variable's mean and variance.

$f(x) = \begin{cases} \frac{2(x - a)}{(m - a)(b - a)}, & \forall a \le x \le m \\ \frac{2(b - x)}{(b - m)(b - a)}, & \forall m < x \le b \\ 0, & \forall x \notin [a, b] \end{cases}$   (a)

$F(x) = \begin{cases} 0, & \forall x < a \\ \frac{(x - a)^2}{(m - a)(b - a)}, & \forall a \le x \le m \\ 1 - \frac{(b - x)^2}{(b - a)(b - m)}, & \forall m < x \le b \\ 1, & \forall x > b \end{cases}$   (b)   [Eq. 1-85]

$\mu = E[X] = \frac{1}{3}(a + b + m)$   (c)

$var(X) = \frac{1}{18}(a^2 + b^2 + m^2 - ab - am - bm)$   (d)


Below, provided in Figure 1.18 from left to right, are the graphical representations of f(x) and F(x) on [a, b].

[Figure: f(x) rises from 0 at a to its peak 2/(b − a) at the mode m and falls back to 0 at b]

Figure 1.18: Triangular Probability Density and Distribution Functions’ Graphs

The values a, b, and m, carefully chosen, are the three parameters of the triangular distribution. Figure 1.18, from left to right, depicts the triangular probability density function f(x) and cumulative distribution function F(x). As shown, the density function f(x) is increasing on the interval [a, m] and decreasing on [m, b]. The triangular distribution is often recommended for modeling purposes when there is only a little information on the variable of interest, such as its likely lowest value a, mode m, and highest value b. Unlike the uniform distribution, the triangular distribution can be skewed positively or negatively. A positively skewed curve has its high curvature to the left and a tail to the right; a negatively skewed curve has its curvature to the right and its tail to the left (Holliday et al. 2008).
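Equation 1.85(c) and 1.85(d) can be checked against Python's built-in triangular sampler. The parameters a, m, and b below are arbitrary; note that `random.triangular` takes the mode as its third argument:

```python
import random

random.seed(1)
a, m, b = 1.0, 3.0, 8.0          # lowest value, mode, highest value
mu = (a + b + m) / 3                                 # Equation 1.85(c)
var = (a*a + b*b + m*m - a*b - a*m - b*m) / 18       # Equation 1.85(d)

draws = [random.triangular(a, b, m) for _ in range(200_000)]
mean_mc = sum(draws) / len(draws)
var_mc = sum((d - mean_mc) ** 2 for d in draws) / (len(draws) - 1)
print(round(mu, 3), round(mean_mc, 3))   # both near 4.0
print(round(var, 3), round(var_mc, 3))   # both near 2.167
```

With a = 1 and b = 8 but m = 3, the mean 4.0 sits below the interval midpoint 4.5, reflecting the positive skew of this choice of mode.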

Normal (Gaussian) Distribution


Let [a, b] be a real-valued interval. A continuous variable X is said to have a normal distribution if its probability density f(x) and cumulative probability F(x) functions are as given by Equation 1.86(a) and Equation 1.86(b). In these equations, the constants µ and σ (positive) are as provided in Equation 1.86(d) and Equation 1.86(e), representing the mean and variance parameters of the distribution.

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right), \quad -\infty < x < \infty$   (a)

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$   (b)

$P(a \le X \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) dx, \quad \forall a, b \in \mathbb{R}$   (c)   [Eq. 1-86]

$E[X] = \mu$   (d)

$var(X) = \sigma^2$   (e)

Below, provided in Figure 1.19 from the left (a) to the right (b), are illustrations of the graphical representations of f(x) and F(x) (Montgomery and Runger 2007). As depicted in Figure 1.19(a), the normal density curve is symmetric around its mean µ, where it peaks. Its median and mode are identical and coincide with the mean µ. In addition, any linear combination of normally distributed random variables is again a normally distributed variable (Tijms 2007, p. 144). For example, the standardized random variable Z of the normally distributed random variable X is a standard normal variable, as given in Equation 1.87.

$Z = \frac{X - \mu}{\sigma}, \quad E[Z] = 0, \quad var(Z) = 1$   [Eq. 1-87]


Z has a zero mean and unit variance. As defined, Z represents the distance of X from its mean μ in units of its standard deviation σ, as depicted in Figure 1.19(c) below. As Montgomery and Runger (2007, p. 113) said, using Z instead of X is “the key step to calculate a probability for an arbitrary normal random variable” such as X. For instance, Figure 1.19(d) depicts the probability that a measurement (x) of the variable X, with μ = 10 mA and σ² = 4 mA², will exceed 13 milliamps (mA).

Figure 1.19: Graphs of the Normal Probability Density and Distribution Functions

To find this probability, one first calculates the corresponding z-value, $z = (x - \mu)/\sigma = (13 - 10)/2 = 1.5$, obtained through standardization of x via Equation 1.87, and then uses a lookup table (e.g., Johnson and Wichern 2019, p. 758) to determine $P(Z > 1.5) = 1 - P(Z \le 1.5) = 0.06681$, represented by the unshaded area in Figure 1.19(d). Any plot of f(x), as in Figure 1.19(a), is

known as the bell or Gaussian curve named after Carl Friedrich Gauss, the famous mathematician


who lived from 1777 to 1855. The normal curve is a sort of natural law, whose formulation was made possible in part by the three well-known mathematical constants √2, π ≈ 3.141, and e ≈ 2.718 (Tijms 2007). In addition, the central limit theorem was derived from its additive property (see Section 2.3.2.3, page 153). The applications of the normal distribution are countless across different fields. Its popularity is undoubtedly due to its universality, which was first acknowledged by the Belgian statistician Adolphe Quetelet (1796-1874) while fitting a large number of data collected from different areas of science. Owing to this universality, many in the eighteenth and nineteenth centuries considered it a God-given law (Tijms 2007). The following are a few applications of the Gaussian distribution: the study of the height of randomly chosen persons, the annual rainfall in a specific area, and the time of occurrence between earthquakes in a given region. Unfortunately, in construction scheduling, the normal distribution is not favored among practitioners because it can assign probability to negative values, as Williams (1992) said, although this may be prevented.
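The 13 mA example above can be reproduced without a lookup table, since the standard normal CDF is expressible through the error function (a standard identity; the function name `phi` is ours):

```python
import math

def phi(z):
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 10.0, 2.0            # mean 10 mA, variance sigma^2 = 4
x = 13.0
z = (x - mu) / sigma             # Equation 1.87: standardization
p_exceed = 1.0 - phi(z)          # P(X > 13) = P(Z > 1.5)
print(z, round(p_exceed, 5))     # 1.5 0.06681
```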

Beta Distribution and Gamma Function

The beta family of distributions is traced to 1676, in a letter from Sir Isaac Newton to Henry Oldenburg. Because they can be fitted to nearly any data representing a system, beta distributions have been used extensively in statistical theory and practice for over a century in diverse fields (Nadarajah and Kotz 2007). A continuous random variable X governed by a beta density with positive shape parameters α and β has the probability density f(x) and distribution F(x) functions given by Equation 1.88(a) and Equation 1.88(b), and mean and variance as in Equation 1.88(c) and Equation 1.88(d),


$f(x) = \begin{cases} c\,x^{\alpha - 1}(1 - x)^{\beta - 1}, & \forall\ 0 < x < 1 \\ 0, & \forall x \notin (0, 1) \end{cases}$   (a)

$F(x) = \begin{cases} 0, & \forall x \le 0 \\ \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_{0}^{x} u^{\alpha - 1}(1 - u)^{\beta - 1}\,du, & \forall\ 0 < x < 1 \\ 1, & \forall x \ge 1 \end{cases}$   (b)   [Eq. 1-88]

$\mu = E[X] = \frac{\alpha}{\alpha + \beta}$   (c)

$var(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$   (d)

where the parameter c, expressed in terms of the gamma function Γ, whose already computed values are available in mathematical tables, is as follows:

$c = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}$   (a)   [Eq. 1-89]

$\Gamma(a) = \int_{0}^{\infty} e^{-y}\,y^{a - 1}\,dy, \quad \forall a > 0$   (b)

Based on the shape parameters α and β, the graph of the beta density function can take a wide range of different shapes, as depicted in Figure 1.20 (Tijms 2007) for different values of the pair (α, β) in (a) and (b) of this figure. Note that the extreme case that reduces the beta PDF to a constant function on the interval (0, 1) is obtained when α = β = 1.


[Figure: beta density curves f(x); (a) β = 2 with α ∈ {1, 0.8, 0.5, 0.2}; (b) (α, β) ∈ {(1,7), (2,6), (4,4), (6,2), (7,1)}, with α and β selected so that α + β = 8]

Figure 1.20: Beta Probability Density Functions’ Graphs
Adapted from Soong (2004, p. 222)

Nevertheless, thanks to its versatility over a finite interval, the beta distribution models many

physical quantities (Soong 2004), including random proportions (Tijms 2007). Areas of

application include tolerance limits, quality control, and reliability (Soong 2004, p. 223). However,

construction engineering and management practitioners seem not to favor this distribution. Indeed,

this is due to its four degrees of freedom which render the beta distribution complex to understand

and its parameters challenging to determine, as Williams (1992) asserted. He said, “If the

distribution should be unlimited, it is not applicable, but where it can be limited, the beta fails on

its understandability.” Yet, Schexnayder et al. (2005) used the beta distribution to develop a PDF

for construction simulation modeling applications. Their proposed technique assumed the

existence of a ratio relating the 75th percentile to the mode of a set of data durations given a

construction project activity.
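Equation 1.88 and Equation 1.89 can be sketched directly, since the gamma function is available in Python's standard library; the parameter values below are arbitrary:

```python
import math

def beta_pdf(x, alpha, beta):
    """Equation 1.88(a), with c = Gamma(a+b) / (Gamma(a) * Gamma(b))
    as in Equation 1.89(a)."""
    if not 0.0 < x < 1.0:
        return 0.0
    c = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return c * x ** (alpha - 1) * (1.0 - x) ** (beta - 1)

alpha, beta = 2.0, 6.0
mu = alpha / (alpha + beta)                                      # Eq 1.88(c)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))  # Eq 1.88(d)
print(mu, round(var, 5))        # 0.25 0.02083
print(beta_pdf(0.5, 1.0, 1.0))  # alpha = beta = 1 reduces to the uniform pdf: 1.0
```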


Chi-Square Distribution (χ²)

As Montgomery and Runger (2007, p. 131) wrote, “the chi-squared distribution is a special case

of the gamma distribution.” Although not covered in this section, the gamma probability

distribution has applications in various fields. For instance, in engineering, particularly in

hydrologic engineering, the gamma probability distribution has been used to estimate flood

quantiles in hydraulic design (Ashkar and Ouarda, 1998). Like its parent distribution, the Chi-

Square distribution is extensively used in various fields thanks to its usefulness in statistical

inference and confidence intervals. Hence, including it as an example is worthwhile. To define the chi-square (shorthand $\chi^2$) distribution, let n be a positive integer. A random variable X is said to have a $\chi^2$ distribution with n degrees of freedom if and only if X is a gamma random variable characterized by the parameters $\alpha = n/2$ and $\beta = 2$. This is denoted by $X \sim \chi^2(n)$. Accordingly, Equation 1.90 provides the expression of the $\chi^2$ distribution PDF with n degrees of freedom as well as the expressions of its mean μ and variance var(X) (Ramachandran and Tsokos 2014).

$f(x) = \begin{cases} \frac{1}{\Gamma(n/2)\,2^{n/2}}\, x^{n/2 - 1} \exp(-x/2), & \forall\ 0 < x < \infty \\ 0, & \forall x \le 0 \end{cases}$   (a)

$\mu = E[X] = n$   (b)   [Eq. 1-90]

$var(X) = 2n$   (c)
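A numerical sketch of Equation 1.90 integrates the χ² density with n = 4 degrees of freedom by the midpoint rule, confirming that it integrates to one and has mean n; the grid choices are ours:

```python
import math

def chi2_pdf(x, n):
    """Equation 1.90(a): the gamma density with alpha = n/2, beta = 2."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (math.gamma(n / 2) * 2 ** (n / 2))

n = 4
h = 0.001
grid = [h * (k + 0.5) for k in range(60_000)]     # midpoints over (0, 60)
total = sum(chi2_pdf(x, n) for x in grid) * h     # should be close to 1
mean = sum(x * chi2_pdf(x, n) for x in grid) * h  # Equation 1.90(b): close to n
print(round(total, 3), round(mean, 3))            # 1.0 4.0
```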

1.6.3 Multivariate Random Variables

So far, the previous sections from Section 1.6.2 focused solely on probability distributions of

univariate random variables. The following sections will discuss the probability distributions of


multivariate random variables whose applications abound in various probability and statistics-

related fields. As noted above, multivariate random variables arise when an investigator studies a system characterized by more than one univariate random variable. Accordingly, one may view this section as a generalization of the previous section in the sense that random variables X_1, X_2, …, X_p will be considered, where each X_i represents a univariate discrete or continuous random variable.

1.6.3.1 Joint Probability Distribution of Random Variables

Given a set of random variables X_1, X_2, …, X_p, the probabilities ℙ of events defined by the concurrent realizations of the p random variables can be determined in terms of the CDF F and PDF f of the p variables. One may view the given set as a p × 1 random vector X = (X_1, X_2, …, X_p)^T. Accordingly, each component X_i of X represents a random variable with its own defined marginal probability distribution f_i(x_i) and F_i(x_i). The same applies to their marginal means and variances μ_i = E(X_i) and σ_i² = E(X_i − μ_i)², with i = 1, …, p.

For simplification, let the p random variables be real-valued. Accordingly, for each set of real numbers x_1, …, x_p and using Equation 1.91(c), one can calculate ℙ as the probability of the p real numbers falling in a set ℱ in the p-dimensional Euclidean space. In addition, given the continuous

function F, through the applications of operational calculus involving differential operators (e.g.,

see Anderson 2003), one can compute the density function f as in Equation 1.91(b) and the

probability ℙ by either Equation 1.91(a) or Equation 1.91(c).


$$F(x_1, \dots, x_p) = \mathbb{P}(X_1 \le x_1, \dots, X_p \le x_p) \quad \text{(a)}$$

$$f(x_1, \dots, x_p) = \frac{\partial^p F(x_1, \dots, x_p)}{\partial x_1 \cdots \partial x_p} \quad \text{(b)} \qquad \text{[Eq. 1-91]}$$

$$\mathbb{P}\big((X_1, \dots, X_p) \in \mathcal{F}\big) = \int \cdots \int_{\mathcal{F}} f(x_1, \dots, x_p)\, dx_1 \cdots dx_p \quad \text{(c)}$$
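As an illustrative sketch of Equation 1-91(c) (with an assumed joint density, not one drawn from this study), the snippet below integrates the joint PDF f(x₁, x₂) = e^(−x₁) e^(−x₂) of two independent standard exponential variables over the rectangle ℱ = [0, 1] × [0, 2] and compares the result with the closed form (1 − e^(−1))(1 − e^(−2)):

```python
import math

def joint_pdf(x1, x2):
    """Assumed joint density of two independent standard exponentials."""
    return math.exp(-x1) * math.exp(-x2)

def double_integral(f, a1, b1, a2, b2, steps=400):
    """Midpoint rule over the rectangle [a1, b1] x [a2, b2], i.e., a
    numerical evaluation of the double integral in Eq. 1-91(c)."""
    h1, h2 = (b1 - a1) / steps, (b2 - a2) / steps
    total = 0.0
    for i in range(steps):
        x1 = a1 + (i + 0.5) * h1
        for j in range(steps):
            x2 = a2 + (j + 0.5) * h2
            total += f(x1, x2)
    return total * h1 * h2

prob = double_integral(joint_pdf, 0.0, 1.0, 0.0, 2.0)
exact = (1.0 - math.exp(-1.0)) * (1.0 - math.exp(-2.0))
print(round(prob, 4), round(exact, 4))
```

The agreement between the numeric and closed-form values illustrates that ℙ over a region of the Euclidean plane is obtained by integrating the joint density over that region.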


Moreover, if the random variables X_1, X_2, …, X_p are statistically independent, then one may express Equation 1.91(a) and Equation 1.91(b) in terms of the marginal probability distributions F_1(x_1), …, F_p(x_p) and densities f_1(x_1), …, f_p(x_p) of the p variables as in Equation 1.92(a) and Equation 1.92(b), respectively.

$$F(x_1, \dots, x_p) = F_1(x_1) \cdots F_p(x_p) = \mathbb{P}(X_1 \le x_1) \cdots \mathbb{P}(X_p \le x_p) \quad \text{(a)} \qquad \text{[Eq. 1-92]}$$

$$f(x_1, \dots, x_p) = f_1(x_1) \cdots f_p(x_p) \quad \text{(b)}$$
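The factorization in Equation 1-92(a) can also be checked empirically. The short simulation below is an illustrative sketch (the sample size and evaluation point are arbitrary choices): it draws independent Uniform(0, 1) pairs and confirms that the empirical joint CDF approximately equals the product of the empirical marginal CDFs.

```python
import random

random.seed(42)  # reproducible draws

# Draw p = 2 independent Uniform(0, 1) variables, n times.
n = 200_000
pairs = [(random.random(), random.random()) for _ in range(n)]

a, b = 0.3, 0.7  # evaluation point (x1, x2) for the CDFs

# Empirical joint CDF F(a, b) = P(X1 <= a, X2 <= b)
joint = sum(1 for x1, x2 in pairs if x1 <= a and x2 <= b) / n

# Empirical marginal CDFs F1(a) and F2(b)
f1 = sum(1 for x1, _ in pairs if x1 <= a) / n
f2 = sum(1 for _, x2 in pairs if x2 <= b) / n

# Under independence, Eq. 1-92(a) gives F(a, b) = F1(a) * F2(b) = 0.3 * 0.7
print(round(joint, 2), round(f1 * f2, 2))
```

With dependent variables this product rule would fail, which is precisely why Equation 1.92 requires statistical independence.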

Whether the p variables X_1, X_2, …, X_p are discrete or continuous, one may substitute each marginal probability distribution f_i(x_i) and F_i(x_i) by their appropriate expressions as provided in Equation 1.74 and Equation 1.78 to find the joint probability density and distribution functions of the p variables as a collective. In the meantime, one may need to describe the behavior of any pair of variables X_i and X_j by means of their joint probability function, which represents the linear association between the variables in terms of their covariance σ_ij as provided in Equation 1.93.


$$\sigma_{ij} = E\big[(X_i - \mu_i)(X_j - \mu_j)\big], \quad \forall\, i, j = 1, \dots, p \qquad \text{[Eq. 1-93]}$$

$$\sigma_{ij} = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j)\, f(x_i, x_j)\, dx_i\, dx_j \quad \text{(a) if } X_i, X_j \text{ are continuous with joint PDF } f(x_i, x_j)$$

$$\sigma_{ij} = \sum_{x_i} \sum_{x_j} (x_i - \mu_i)(x_j - \mu_j)\, p(x_i, x_j) \quad \text{(b) if } X_i, X_j \text{ are discrete with joint PMF } p(x_i, x_j)$$

$$\sigma_{ii} = E\big[(X_i - \mu_i)^2\big], \quad \forall\, i = 1, \dots, p$$
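For illustration of Equation 1-93 at the sample level (with hypothetical numbers, not data from this study), the snippet below computes a covariance matrix from first principles and shows that it is symmetric with the variances on its diagonal:

```python
def covariance_matrix(data):
    """Sample covariance matrix (divisor n) of an n x p list of rows,
    computed directly from Eq. 1-93: sigma_ij = E[(Xi - mu_i)(Xj - mu_j)]."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n
             for j in range(p)] for i in range(p)]

# n = 5 hypothetical observations on p = 2 variables, where X2 = X1 + 6,
# so cov(X1, X2) equals var(X1) = var(X2): a perfect linear association.
data = [[2.0, 8.0], [4.0, 10.0], [6.0, 12.0], [8.0, 14.0], [10.0, 16.0]]
S = covariance_matrix(data)
print(S)
```

Because the covariance here equals both variances, the implied correlation is exactly one, the strongest possible linear association between the pair.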

1.7 Multivariate Data and Analysis

1.7.1 Preface

To understand certain social or physical phenomena, a researcher would often collect data on more

than one attribute. In probability and statistics, as in any other field of application, multivariate data originates when a researcher experiments to observe and record values of

numerous random variables on several subjects or units. Usually, the researcher performs an

investigation as part of a study intended to understand the relationships between the variables of

interest. Consequently, the researcher decides on the number of variables p and the number n of

observations to collect on each variable. For illustration, Table 1-6 below identifies n and p for

studies performed in five different fields and provides an example for each study area.


Field | Study | Observed data (n) / number of variables (p) | Reference

Thermodynamics | Investigation of fluctuations of the critical temperature of the SK spin glass | n = number of couplings or interactions between spins; p = number of spins S | Castellana and Zarinelli (2011)

Genetics | Structure detection in genetic data | n = individuals; p = polymorphic markers (e.g., n = 100, p = 2000) | Patterson et al. (2006)

Electrical and computer engineering | Modeling and analysis of residential electricity consumption statistics | n = households; p = typical operation patterns in terms of electrical devices (e.g., n = 100, p = 2000) | Liao et al. (2020)

Wireless communication | Multiple antenna spectrum sensing in cognitive radios | n = number of receiving antennae; p = observed signals (e.g., n = M = 8, p = L = 32) | Taherpour et al. (2010)

Subway system | Analysis of subway arrival times | n = train arrival times; p = number of trains at a given station in the New York City subway (e.g., n = 700, p = 6) | Jagannath and Trogdon (2017)

Table 1-6: Illustrations of n and p-Values in Various Fields of Applications of Multivariate Analysis

Hence, a systematic multivariate data compilation is crucial for successfully recording data

necessary for the investigation. It is then imperative to record each variable as measured on a specific item or experimental unit (Johnson and Wichern 2019). For that reason, to display n values observed on p variables set forth for the study, the researcher would consider one of the approaches provided in Section 1.6.3 to tabulate the data. In addition, the researcher would indicate the extent of the data: sample or population data?


1.7.2 Introduction to Multivariate Analysis Techniques

As seen in the above table, some of the n and p-values can be quite large, consistent with the contemporary reality that multivariate analysis deals with a considerable amount of

data tabulated in matrices. Thus, in addition to the number of variables and sample sizes, a

researcher would, as Johnson and Wichern (2019) indicated, choose an acceptable multivariate

approach from the following:

Data reduction or structural simplification: For ease of interpretation, the investigated phenomenon is simplified without a trade-off of valuable information.

Sorting and grouping: Variables or objects are grouped according to defined classification rules based on their measured characteristics.

Investigation of the dependence among variables: The nature of the relationships between variables is examined, namely, mutual independence of the variables or the dependency of one or more on the others.

Prediction: Relationships between variables serve to predict the values of one or more variables in terms of the observations of the others.

Hypothesis construction and testing: In terms of population parameters, the formulation of specific statistical hypotheses serves to validate assumptions or reinforce beliefs.


1.7.3 Notation Convention

Applicable to this study, the following rules can be set regarding vectors and matrices used

throughout the remaining sections of this manuscript. The rules are as follows:

Rule 1: Vectors and matrices will be written in bold.

Rule 2: When a superscript T is added to a vector or matrix, it indicates its transpose (e.g., 𝑿 ).

Rule 3: All vectors are column vectors.

Rule 4: Vectors or matrices at the sample or observations level will be denoted by lowercase letters.

Rule 5: If x = (x_1, x_2, …, x_p)^T is a vector of observations whose variance-covariance structure is Σ, then x is said to be a realization of a random vector X = (X_1, ⋯, X_p)^T having the same covariance matrix Σ as x. In this case, X is called a population vector and x a vector of observations.

Rule 6: The notation x_ij will be used to denote the observed value of the jth variable X_j regarding the ith item, feature, or measurement.

1.8 Null Hypothesis Significance Testing (NHST)

Making inferences about the underlying characteristics of a population assumed large and of

unknown parameters based on a random sample drawn from it is called statistical inference

(Norman and Streiner 2003, Upton and Cook 2014). In statistics, researchers often formulate a null hypothesis, presumed a priori to be false, and then seek to reject it. A researcher is required to define both a null hypothesis H_0 and an alternative hypothesis H_1 to employ this process, also known as

the NHST. Thus, the researcher draws data from the population to compute the necessary test


statistic, which serves to decide whether the observed evidence rejects H_0. In this process, the researcher sets the critical regions of rejection of H_0 in terms of the test significance level α. If the gap between the observed and predicted values is statistically significant, the researcher rejects H_0, at the risk of committing a type I error should H_0 in fact be true. The decision is usually based on

a p-value determined according to the observed data (Rebba et al. 2006). Later in this section, one

will notice, as Warner (2012) pointed out, that the NHST implies the researcher draws samples from a well-defined population so that the test statistics derived are reasonable estimates of the actual population in question. Historically, Ronald Fisher was the first to introduce the p-value

concept in the 1920s as “an index of the evidence against the null hypothesis” (Vidgen and Yasseri

2016). As a guide in carrying out an NHST, the following development provides the necessary

steps to conduct an NHST by defining all the terms introduced in this paragraph.

Step 1: Build a Statistical Model

Although the null distribution is often unknown or assumed, the first step consists of constructing

a statistical model that characterizes the distribution of the population being studied. This is

necessary to ensure that either chance or random processes alone would determine the outcomes

of the required experiments. When the null hypothesis is true, the probability distribution of the test statistic (the quantity or single numerical statistic to which sample data is reduced so that it can be used to test the hypothesis) is called the null distribution. The model of the test statistic,

obtained through a random process, determines the statistical model known as the distribution

under the null hypothesis. In an NHST, the observed test statistics or results are compared against

the distribution under the null hypothesis, and the probability of obtaining those results is therefore

determined (Salkind 2006).


Step 2: Define a Null Hypothesis H_0 and its Alternative H_1

A null hypothesis H0 is an assertion or a guess regarding the population of interest, while H1 is a

contradictory assertion to the null hypothesis about the population of interest. Both hypotheses are mutually exclusive and used to make statistical conclusions on whether to reject or accept the null

hypothesis. The hypothesis is said to be "null" because every so often, it states a status quo belief

referring to a "no difference" or "no effect" resulting from a change or improvement in the

population of interest. Conversely, the alternative hypothesis refers to a case of an actual difference

or effect (Lindeburg 2011). For instance, after making changes to a course syllabus, an instructor

may be interested in detecting the impact of change in terms of the difference between students'

already known average scores and the new average computed.

Depending on its purpose, a test will be denoted as a two-tailed test or simply a nondirectional test if the researcher wishes to demonstrate a "no change" or "no effect" (e.g., H_0: μ = 12 and H_1: μ ≠ 12 for no change in the students' scores). Otherwise, depending on the direction of the change, it will be denoted as a directional one-tailed test, which could be a left-tailed test (e.g., H_0: μ ≥ 12 and H_1: μ < 12 for a decrease in the students' scores) or a right-tailed test (e.g., H_0: μ ≤ 12 and H_1: μ > 12 for an increase in the students' scores). Table 1-7 below, whose information is derived from Warner (2012), summarizes the decision rules to reject or accept H_0 when testing the alternative hypothesis H_1.


Test | Decision rules when testing H_1 to reject H_0 given α

Nondirectional/Two-tailed alternative hypothesis (≠) | The rejection region of H_0 contains outcomes in "both" the upper tail and lower tail of the sampling distribution of the sample statistic values.

Directional/Left-tailed test (< or ≤) | The decision to reject H_0 will be made "only" for values that are in the lower tail of the sampling distribution of the sample statistic values.

Directional/Right-tailed test (> or ≥) | The decision to reject H_0 will be made "only" for values that are in the upper tail of the sampling distribution of the sample statistic values.

Table 1-7: Decision Rules When Testing H_1 to Reject H_0 Given α

As one may notice, the term "tail" indicates the extreme regions of sample distributions, where the observed values lead to the rejection of H_0. Accordingly, asymmetric distributions that possess a single tail, such as the chi-squared distribution, are often used in one-tailed tests, whereas symmetric distributions, such as the normal distribution, are used for two-tailed tests.

Step 3: Set a Significance Level α

In general, the results of experiments are seldom 100% correct, so when designing hypotheses for

testing, researchers account for the risk of being wrong. Therefore, before conducting a hypothesis

test or a significance test, a researcher would decide on a threshold of probability 𝛼 by which to

either reject or accept the null hypothesis 𝐻 . That threshold is known as the significance level 𝛼

or "alpha risk or producer risk, as well as the probability of type I error" (Lindeburg 2011, p. 11-14). One commits a type I error when one rejects a null hypothesis that is in fact true. Conversely, if the null hypothesis is in fact false and the researcher fails to reject it, the


researcher would commit a type II error. However, the researcher will make the right decision with a probability of either (1 − α) when they do not reject H_0 and it is indeed true, or (1 − β) when they reject H_0 and it is indeed false. The term 1 − β represents the power of the test, or the probability of rejecting H_0 when H_0 is false.

Table 1-8 summarizes all the scenarios described above, where the probability β of committing a type II error is determined by the distribution of the test statistic under H_1 (Warner 2012).

                           Actual State of the World
Researcher Decision        H_0 is True                 H_0 is False
Reject H_0                 Type I error (α)            Correct decision (1 − β)
Do not reject H_0          Correct decision (1 − α)    Type II error (β)

Table 1-8: Type I Error versus Type II Error

A significance level of 0.05 (5%) or 0.01 (1%) is customary for hypothesis testing. For instance, when designing a one-tailed hypothesis test, selecting a 5% significance level suggests that the results obtained under the null hypothesis H_0 would have about a 5% chance of being wrong.

In other words, the results would be correct at a confidence level of 95% and said to be significant

(Lindeburg 2011).

Step 4: Take a Sample and Compute its Test Statistic

This step consists of drawing a random sample from the population of interest and then calculating the

test statistic value of the sample. This value will vary between samples according to the distribution


under the null hypothesis H_0. For example, in a two-tailed test where the observed data X are

normally distributed, using Equation 1.94, one can calculate the test statistic 𝑧.

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \qquad \text{[Eq. 1-94]}$$

where μ and σ represent the population mean and standard deviation, n is the sample size, and X̄ is the sample mean.
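As a numerical sketch of Equation 1-94, the snippet below computes the z test statistic for the syllabus example of Step 2, with the known average score μ = 12; the sample mean, standard deviation, and sample size used here are hypothetical values chosen for illustration, and the helper name z_statistic is likewise an assumption:

```python
import math

def z_statistic(x_bar, mu, sigma, n):
    """Test statistic of Eq. 1-94: z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

# Hypothetical example: claimed population mean mu = 12, sigma = 2,
# and a sample of n = 25 new scores with observed mean 12.8.
z = z_statistic(x_bar=12.8, mu=12.0, sigma=2.0, n=25)
print(round(z, 2))  # (12.8 - 12) / (2 / 5) = 2.0
```

This z value will vary from sample to sample, which is exactly the sampling variability the null distribution of Step 1 describes.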

Step 5: Calculate a p-value of a Test Statistic

Under the assumption of the null hypothesis H_0, as in most cases, a p-value or simply p represents the probability (ℙ) of getting a test statistic (t) value deemed extreme compared to the mainstream observed test results. In other words, a p-value serves to verify a statistical hypothesis by quantifying the concept of statistical significance of evidence, with the evidence being represented by the actual observed value. Accordingly, given an observed t value, one may define a p-value as the probability, computed under the assumption that H is true, of realizing a test statistic value at least as "extreme" as t, where H represents either H_0 or H_1. Depending on the direction of the statistical hypothesis test H, Equation 1.95(a) through Equation 1.95(c) represent the mathematical expression of the p-value.

$$p = \mathbb{P}(T \ge t \mid H) \quad \text{(a) used for right-sided tests} \qquad \text{[Eq. 1-95]}$$

$$p = \mathbb{P}(T \le t \mid H) \quad \text{(b) used for left-sided tests}$$

$$p = \mathbb{P}\big(\lvert T \rvert \ge \lvert t \rvert \,\big\vert\, H\big) \quad \text{(c) used for two-sided tests}$$
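When the null distribution of the test statistic is standard normal, the three cases of Equation 1-95 reduce to expressions in the normal CDF Φ, which Python's standard library exposes through math.erf. The sketch below (with an assumed observed value z = 2.0, not a result from this study) computes all three p-values:

```python
import math

def phi(z):
    """Standard normal CDF via the error function:
    Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(z, direction):
    """p-values of Eq. 1-95 for a standard normal test statistic."""
    if direction == "right":      # (a) p = P(T >= z)
        return 1.0 - phi(z)
    if direction == "left":       # (b) p = P(T <= z)
        return phi(z)
    if direction == "two-sided":  # (c) p = P(|T| >= |z|)
        return 2.0 * (1.0 - phi(abs(z)))
    raise ValueError(direction)

z = 2.0  # assumed observed test statistic
for d in ("right", "left", "two-sided"):
    print(d, round(p_value(z, d), 4))
```

At z = 2.0 the two-sided p-value comes out just under 0.05, consistent with the familiar two-sided critical value of 1.96 at α = 0.05.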


Using Equation 1.91(a), one may express any of the three above expressions of p in terms of a CDF.

As expressed, the direct computation of a p-value may be complex. Thanks to available statistical

software or look-up tables, one can determine p-values quickly. A p-value takes values between

zero and one as a probability, including zero and one. The rule of thumb when interpreting p-

values is that a very small p-value suggests an extreme observed event, one that would be unlikely to occur under H_0. In other words, a very small p-value indicates a highly statistically significant result, one that would lead to rejection even at a small significance level α, the value chosen by the researcher conducting the hypothesis testing (Warner 2012).

Although p-values are necessary for testing hypotheses, some researchers (e.g., Hwang et al. 1992,

Kim and Bang 2016, Vidgen and Yasseri 2016) have argued that they have logical flaws leading

to misleading conclusions (Rebba et al. 2006). Accordingly, being aware of their deficiencies, including misconceptions and limitations, is as crucial as using them. Among the misconceptions

that even an inexperienced researcher may be unaware of is that a p-value is not the probability that H_0 is true or H_1 is false, or vice versa. Bayesian hypothesis testing addresses such inquiries, where, for instance, one seeks to assess, as Rebba et al. (2006, p. 169) wrote, "how much evidence is there for the null hypothesis or for the model being right." This quest corresponds to determining the probability ℙ(H_0 | X) of the hypothesis H_0 given the observed data X, which differs from the probability ℙ(X | H_0) desired in classical hypothesis testing or NHST to answer questions such as whether there is evidence to reject H_0.


Step 6: Decide to Accept or Reject the Null Hypothesis H_0

In a one-sided test (resp. a two-sided test), one compares the p-value computed in Step 5 against the significance level α (resp. α/2). Depending on the direction of the test, one then decides to either accept or reject H_0. The decision rules provided in Table 1-7 may serve as a guide. Occasionally, one may fail to reject H_0. If so, that would not constitute evidence that H_0 is true. It just means that there is insufficient evidence to reject H_0. Be wary that hypothesis testing

can be frustrating when no samples yield the desired result after many trials. For that reason, in

addition to using p-values, researchers sometimes employ confidence intervals (CI) as p-value

companions. As Kim and Bang (2016, p. 5) wrote, “They both are complementary in the sense that

the p-value quantifies how 'significant' the association/difference is while the CI quantifies the

precision of the estimation and its potential values.”

1.9 Research Organization

The sections that follow provide an overview of each chapter in this dissertation.

1.9.1 Chapter 1 – Introduction

The introduction chapter is required for any research study since it lays the groundwork for the

task. In other words, the introductory chapter should serve as a repository for the background data

needed to comprehend the development of more advanced concepts in later chapters. The reason

is that this research study allows for the adaptation and implementation of concepts from

disciplines other than construction and engineering—the background resources for this research

study address concepts from applied mathematics and probability. Furthermore, the chapter covers


random vectors and matrices as inputs to multivariate techniques such as principal component

analysis, which will be explored in the third chapter. Because principal component analysis (PCA)

relies on hypothesis formulation to make inferences, the chapter also covers descriptive and

inferential statistics by presenting examples to assist readers in comprehending the subject. As a

result, the requirements for probability distributions and random variables are covered in this

chapter.

Furthermore, it provides context for multivariate data and their analysis. Finally, the chapter

finishes with an overview of the chapters to follow. However, before using its framework to

develop new ones, the notion of project schedules and their accompanying planning and scheduling methodologies should be presented as a refresher. The resources primarily address

traditional deterministic and probabilistic project scheduling methodologies.

1.9.2 Chapter 2 - An Investigation of the Underlying Behavior of Construction

Project Network Schedules

The topic of this chapter is ambitious because, if completed, it will pave the way for new methods

of scheduling construction and engineering projects. Not only is the Tracy-Widom distribution underutilized in project scheduling and management, but the subject is also daunting in and of itself. Nonetheless, this topic was motivated by a Czech physicist who discovered a characteristic pattern after charting thousands of collected bus departure times at a Cuernavaca bus stop and finding the bus timings to share the same behavior pattern (Wolchover 2014). This characteristic was nicknamed

"universality" by scientists, and it is frequently observed when random matrix theory (RMT) is

used to investigate the behaviors of complex systems. As a result, the goal is to adapt and use the


notion that has proven successful in random matrix theory applications to construction project

networks. This is because the project network schedule comprises nodes representing activities

linked by precedence connections. The interactions between the many parties involved in

developing a project allow it to fall into the category of complex systems. Modern mathematicians

have used RMT to simulate complex systems and analyze their behavior by employing the

eigenvalues of covariance matrices that describe a system. Because of these parallels, the new

universal law based on the Tracy-Widom distribution is expected to reveal insights into the

behavior of project networks. The upcoming literature review should aid in learning more about

the topic and establishing parallels between project network timelines and fields of application for

this new universal law. This will assist in its application to project networks.

1.9.3 Chapter 3 – Application of PCA for Data Reduction in Modeling Project

Network Schedules Based on the Universality Concept in RMT

The assumption that everything in nature, including things and persons, is governed by laws leads

one to believe there must be a universal law governing project network schedules. If this universal

law exists, it may aid project planning and management. Because the activities that comprise the

project contribute to defining the total project cost and time, this universal law or method of project

scheduling may be valuable to project managers, engineers, and the academic community. PCA

might be applied to project network schedules if the universality employed in RMT to describe the behavior of complex systems could also be used to describe and model project network schedules. All of these subjects are expected to be covered in Chapter 3, which then uses any significant discoveries to design procedures for applying PCA, a powerful multivariate statistical


analysis tool used for data reduction. Data reduction is typically performed by analyzing the principal components of specific matrices to select only a few variables describing a complex system, build new ones from them, and use those in developing models.

1.9.4 Summary and Conclusions

The research organization described in the preceding section brings this chapter to a close. With it, the chapter has accomplished its intended goal of providing background information for further use in the subsequent chapters and better preparing readers for the challenges that follow.

References

Adeli, H., and Karim, A. (2001). Construction scheduling, cost optimization and
management. CRC Press.
Agrawal, M. K., Elmaghraby, S. E., and Herroelen, W. S. (1996). "DAGEN: A generator of
testsets for project activity nets." Eur.J.Oper.Res., 90(2), 376-382.
Anderson, T. W. (2003). An introduction to multivariate statistical analysis. Wiley-Interscience, Hoboken, N.J.
Ashkar, F., and Ouarda, T. B. (1998). "Approximate confidence intervals for quantiles of gamma
and generalized gamma distributions." J.Hydrol.Eng., 3(1), 43-51.
Bagaya, O., and Song, J. (2016). "Empirical study of factors influencing schedule delays of
public construction projects in Burkina Faso." J.Manage.Eng., 32(5), 05016014.
Baik, J., Borodin, A., Deift, P., and Suidan, T. (2006). "A model for the bus system in
Cuernavaca (Mexico)." Journal of Physics A: Mathematical and General, 39(28), 8965.
BD+C Staff. (2018). "LIVE STREAMING: Watch select Accelerate Live! talks today." Building
Design & Construction.
Bejan, A. (2005). "Largest eigenvalues and sample covariance matrices. Tracy-Widom and
Painlevé II: computational aspects and realization in S-plus with applications." Preprint:
Http://Www.Vitrum.Md/Andrew/MScWrwck/TWinSplus.Pdf.
Berchie, H., Gilbert J., C., Beatrice Luchin, John A., C., Daniel, M., and Roger, D. (2008). Algebra 2. Glencoe/McGraw-Hill.
Castellana, M., and Zarinelli, E. (2011). "Role of Tracy-Widom distribution in finite-size
fluctuations of the critical temperature of the Sherrington-Kirkpatrick spin glass." Physical
Review B, 84(14).
Chambers, J. M., Cleveland, W. S., Kleiner, B., and Tukey, P. A. (1983). "Graphical Methods
for Data Analysis."
Cleveland, W. S. (1993). Visualizing data. AT & T Bell Laboratories, Murray Hill, New Jersey.
Cleveland, W. S., and McGill, R. (1987). "Graphical Perception: The Visual Decoding of
Quantitative Information on Graphical Displays of Data." Journal of the Royal Statistical
Society.Series A.General, 150(3), 192-229.
Dawkins, P. (2007). "Linear Algebra." (accessed 9/2/2017, 10:37:40 AM).

De Reyck, B., and Herroelen, W. (1996). "On the use of the complexity index as a measure of
complexity in activity networks." Eur.J.Oper.Res., 91(2), 347-366.
Dodin, B. M., and Elmaghraby, S. E. (1985). "Approximating the criticality indices of the
activities in PERT networks." Management Science, 31(2), 207-223.
Dupuis, D. J. (2010). "Statistical modeling of the monthly Palmer drought severity
index." J.Hydrol.Eng., 15(10), 796-807.
Eknoyan, G. (2008). "Adolphe Quetelet (1796–1874)—the average man and indices of
obesity." Nephrology Dialysis Transplantation, 23(1), 47-51.
Everitt, B., and Hothorn, T. (2011). An introduction to applied multivariate analysis with
R. Springer Science & Business Media.
Farlex. (2022). "https://www.thefreedictionary.com/." (Farlex, Inc, 2022).
Freedman, D., and Diaconis, P. (1981). "On the histogram as a density estimator: L 2
theory." Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete, 57(4), 453-476.
Han, F., and Bogus, S. M. (2018). "Evaluating construction work disruptions using resilient
short-interval scheduling: A case study." Proc., Construction Research Congress, 533-543.
Harris, P. E. (2006). Planning Using Primavera Project Planner P3 Version 3. 1 Revised
2006. Eastwood Harris Pty Ltd.
Herzog, W., Boomsma, A., and Reinecke, S. (2007). "The model-size effect on traditional and
modified tests of covariance structures." Structural Equation Modeling: A Multidisciplinary
Journal, 14(3), 361-390.
Holliday, B., Cuevas, C., Moore-Harris, D., and Carter, H. (2008). "Algebra 2, Glencoe."
Hwang, J. T., Casella, G., Robert, C., Wells, M. T., and Farrell, R. H. (1992). "Estimation of
accuracy in testing." The Annals of Statistics, 490-509.
Jagannath, A., and Trogdon, T. (2017). "Random matrices and the New York City subway
system." Physical Review E, 96(3), 030101.
Johnson, R. A., and Wichern, D. W. (2019). Applied Multivariate Statistical Analysis (Classic
Version), 6th Edition. Pearson Prentice Hall, Upper Saddle River, New Jersey.
Kelley, J. E., Jr. (1961). "Critical-Path Planning and Scheduling: Mathematical
Basis." Oper.Res., 9(3), 296-320.
Kim, J., and Bang, H. (2016). "Three common misuses of P values." Dent.Hypotheses, 7(3), 73-
80.

Kirk, P., Rolando, D. M., MacLean, A. L., and Stumpf, M. P. (2015). "Conditional random
matrix ensembles and the stability of dynamical systems." New Journal of Physics, 17(8),
083025.
Kothari, C. R. (2004). Research Methodology: Methods and Techniques. New Age International
Ltd, Daryaganj.
Liao, S., Wei, L., Kim, T., and Su, W. (2020). "Modeling and Analysis of Residential Electricity
Consumption Statistics: A Tracy-Widom Mixture Density Approximation." IEEE Access, 8, 163558-163567.
Lindeburg, M. R. (2011). Civil engineering reference manual for the P.E. exam. www.
ppi2pass.com.
Liu, L., Gu, H., and Xi, Y. (2007). "Robust and stable scheduling of a single machine with
random machine breakdowns." The International Journal of Advanced Manufacturing
Technology, 31(7), 645-654.
Lucko, G. (2017). "Course CE 489/589 Notes - Construction Scheduling Techniques."
Department of Civil Engineering of The Catholic University of America.
Lucko, G. (2009). "Productivity scheduling method: Linear schedule analysis with singularity
functions." J.Constr.Eng.Manage., 135(4), 246-253.
Massey, F. J., Jr. (1951). "The Kolmogorov-Smirnov test for goodness of fit." Journal of the
American Statistical Association, 46(253), 68-78.
McAssey, M. P. (2013). "An empirical goodness-of-fit test for multivariate
distributions." Journal of Applied Statistics, 40(5), 1120-1131.
Mehta, S. V. (1999). "Predictable scheduling of a single machine subject to
breakdowns." Int.J.Comput.Integr. Manuf., 12(1), 15-38.
Montgomery, D. C., and Runger, G. C. (2007). Applied statistics and probability for engineers
(With CD). John Wiley & Sons.
Nadarajah, S., and Kotz, S. (2007a). "Multitude of beta distributions with
applications." Statistics, 41(2), 153-179.
Nadarajah, S., and Kotz, S. (2007b). "Two generalized beta distributions." Appl.Econ., 39(14),
1743-1751.
Norman, G. R., and Streiner, D. L. (2003). PDQ statistics. PMPH USA.
Patterson, N., Price, A. L., and Reich, D. (2006). "Population structure and eigenanalysis." PLoS
Genet, 2(12), e190.

Pearson, K. (1900). "X. On the criterion that a given system of deviations from the probable in
the case of a correlated system of variables is such that it can be reasonably supposed to have
arisen from random sampling." The London, Edinburgh, and Dublin Philosophical Magazine
and Journal of Science, 50(302), 157-175.
Ponce de Leon, G. (2008). "Project planning using logic diagramming method." AACE
International Transactions, 1-6.
Ramachandran, K. M., and Tsokos, C. P. (2020). Mathematical statistics with applications in
R. Academic Press.
Ramachandran, K. M., and Tsokos, C. P. (2014). Mathematical Statistics with Applications in
R. Elsevier.
Rebba, R., Huang, S., Liu, Y., and Mahadevan, S. (2006). "Statistical validation of simulation
models." International Journal of Materials and Product Technology, 25(1-3), 164-181.
Saccenti, E., Smilde, A. K., Westerhuis, J. A., and Hendriks, M. M. (2011). "Tracy-Widom
statistic for the largest eigenvalue of autoscaled real matrices." J.Chemometrics, 25(12), 644-
652.
SAGE Campus. "Introduction to Statistics." https://campus.sagepub.com/all-courses (Jan. 29, 2022).
Salkind, N. J. (2006). Encyclopedia of measurement and statistics. SAGE publications.
Schexnayder, C., Knutson, K., and Fente, J. (2005). "Describing a Beta Probability Distribution
Function for Construction Simulation." J.Constr.Eng.Manage., 131(2), 221-229.
Scott, D. W. (1979). "On optimal and data-based histograms." Biometrika, 66(3), 605-610.
Soong, T. T. (2004). Fundamentals of probability and statistics for engineers. John Wiley &
Sons.
Tafazzoli, M., and Shrestha, P. (2017). "Factor Analysis of Construction Delays in the U.S.
Construction Industry." International Conference on Sustainable Infrastructure 2017, 111-122.
Taherpour, A., Nasiri-Kenari, M., and Gazor, S. (2010). "Multiple antenna spectrum sensing in
cognitive radios." IEEE Transactions on Wireless Communications, 9(2), 814-823.
Tijms, H. C. (2007). Understanding probability: chance rules in everyday life. Cambridge
University Press, Cambridge.
Uher, T., and Zantis, A. S. (2012). Programming and scheduling techniques. Routledge.
Upton, G., and Cook, I. (2014). A dictionary of statistics, 3rd ed. Oxford University Press.

Vanhoucke, M., Coelho, J., Debels, D., Maenhout, B., and Tavares, L. V. (2008). "An evaluation
of the adequacy of project network generators with systematically sampled
networks." Eur.J.Oper.Res., 187(2), 511-524.
Vidgen, B., and Yasseri, T. (2016). "P-values: misunderstood and misused." Frontiers in
Physics, 4, 6.
Warner, R. M. (2012). Applied statistics: From bivariate through multivariate techniques. Sage
Publications.
Weber, H., and Arfken, G. B. (2005). Mathematical methods for physicists. Elsevier Academic.
Wilk, M. B., and Gnanadesikan, R. (1968). "Probability Plotting Methods for the Analysis of
Data." Biometrika, 55(1), 1-17.
Williams, T. M. (1992). "Practical use of distributions in network
analysis." J.Oper.Res.Soc., 265-270.
Wolchover, N. (2014). "At the Far Ends of a New Universal Law." Sci. Am.
Yi, S., Lucko, G., and Thompson, R. C. (2018). "Application of Voting Theory to the Float
Ownership Problem." J.Constr.Eng.Manage., 144(1), 04017094.
Zidane, Y., and Andersen, B. (2018). "Causes of delay and their cures in major Norwegian
projects."

CHAPTER 2
An Investigation of the Underlying Behavior of Construction Project
Network Schedules

Abstract

A methodology based on Random Matrix Theory (RMT) has been proposed to aid in the

investigation of project network schedule underlying behavior. The methodology is based on three

assumptions. The first assumption demands that the probabilistic activity durations have an

identical triangular distribution with known parameters. A repetitive joint sampling of activity

durations serves to create a sample data matrix 𝑿 using one of the 13 strategies identified as

suitable for translating a project network of size p into a random matrix utilizing its dependency

structure matrix. Although this joint sampling distribution was unknown, it figuratively governed the

drawing of each of the n rows of 𝑿. The second assumption is that the Tracy-Widom (TW1) limit law is

the natural distribution from which each row of 𝑿 is sampled. The interactions between various parties

involved in managing and constructing projects, defined in pairwise linked activities, cause a

project network schedule to fall under complex systems marked by a phase transition with a tipping

point supporting this assumption. In addition, the striking similarities discovered in this study

between the fields of application of the TW1 distribution and those of project scheduling support

this assumption. The last assumption is that a project network schedule with sufficient correlation

in its structure, like that of complex systems, can be investigated within the framework of RMT.

This assumption is justified by the interdependence structure defined by the various

connections between project activities (thousands in large networks). In addition, it

enabled the application of RMT’s universal results, such as those associated with the TW

distributions, to the study of project schedules’ underlying behavior. In RMT, the appropriately

scaled eigenvalues of sample covariance matrices serve as test statistics for such a study.

As a result, a carefully engineered sample covariance matrix 𝑺^(n,p,α) was developed, along with

a couple of standardization approaches (Norm I and Norm II) for its eigenvalues. Both standardization

approaches relate to the universality of the TW limit law, which many authors have extended (e.g.,

Soshnikov 2002, Péché 2008) to a broad class of matrices that are not necessarily Gaussian under

relaxed assumptions. Although some of these assumptions have been eased, others must still be
met. Among these extra requirements, the formulation of 𝑺^(n,p,α) was chosen. Its formulation is

based on n samples of p centered and scaled EF times of the project network activities in question,

whose distributional assumption is to be tested at an α significance level. The α values of 5%, 10%,

and 20% were selected to conduct various experiments.

MATLAB and Visual Basic facilitated the various data preparation and empirical study of a

handful of project networks. 35 networks of diverse sizes and complexity were chosen from the

study's 2,040 benchmark networks obtained from the Project Scheduling Problem Library

(PSPLIB). The project networks ranged from 30 to 120 activities, with restrictiveness (RT) values

indicating complexity. Project networks were scheduled through Kelley's (1961) forward

and backward passes to determine activities' early finish (EF) times. The resulting largest
eigenvalue of 𝑺^(n,p,α) revealed three exciting conclusions. First, the scatterplot of 100 pairs of the

normalized largest eigenvalue (𝑙₁) of 𝑺^(n,p,α) and the sample size n revealed a distinct and


consistent trend. The pattern is a concave upward curve, like the stress-strain curve used in

materials science and engineering. The curve steepens to the left and flattens to the right as n

increases. Surprisingly, networks of varying sizes and complexity showed the same pattern

regardless of the normalization method.

The deviations ∆_mean of the empirical means of 𝑙₁ from the mean of the TW distribution (𝜇_TW)

were determined using the same 100 outputs. They enabled the graphing of scatterplots of sample

size n against ∆_mean. The resulting pattern highlighted the association between n and 𝑙₁ generated

from probabilistic project schedules. The deviations ∆_var between the variances of 𝑙₁ and 𝑣𝑎𝑟_TW

were calculated similarly. The resulting pattern, consistent across networks, helped determine an

optimum sample size n that would maximize variance in a project network schedule's sampled

data. This sample size corresponds to the abscissa at the mean deviation curve's intersection with

the horizontal axis (n-axis). One may view this ideal sample size as a stress curve yield point or

the required pixel count for high-quality printing. The optimum sample size (𝑛_opt) was found to

be related to the network size p but not its RT value. Moreover, any network that meets the study's
criteria has an 𝑛_opt. Also, α is not required in the expression of 𝑺^(n,p,α). For a project network,

leaving it out resulted in a considerable increase in the value of 𝑛_opt.

Subsequently, the derived 𝑛_opt was used in a series of 1000 simulations to validate the

distributional assumption on activity durations. The test statistics for the K-S test were the

normalized first- through fourth-largest eigenvalues 𝑙₁, 𝑙₂, 𝑙₃, and 𝑙₄ of the matrix 𝑺^(n,p,α).

According to a comparison of data from both normalization approaches, Baik et al. (1999) and


Johansson (1998)—Norm II may be better suited to studying project network scheduling behavior

than Johnstone (2001)—Norm I. Under Norm I, 18 of the 35 project networks validated the null

hypothesis when using two of these normalized eigenvalues of their matrices 𝑺^(n,p,α). Norm II

supported the null hypothesis for 19 of the 21 networks evaluated with the same statistics of the matrices 𝑺^(n,p,α). The null

hypothesis states that the Tracy-Widom of order 1 is the natural limiting distribution of the joint

sampling of project network activity durations. This conclusion is significant, perhaps expected

with the use of the CPM, since Baik et al. (1999) introduced Norm II while studying the length of

the longest increasing subsequence of random permutations, which is governed by a TW limit law.

Nevertheless, the empirical and theoretical distribution plots' agreement was displayed and

compared using Q-Q plots and histograms. The graphs corroborated the limiting distributional

assumption. Furthermore, the networks' Q-Q plots showed that appropriately rescaling and
recentering of the mth largest eigenvalue of the matrix 𝑺^(n,p,α) will increase their test performance.

In sum, the extensive empirical investigation validated evidence of a covariance structure 𝑺^(n,p,α) in

project network schedules. In addition, the study discovered a universal pattern for project network

schedules which assisted in establishing the Tracy-Widom distribution of order 1 as the natural

limiting joint distribution of project activity durations. The optimum sample size establishing this

significant result represents the tipping point likely to prevent delays.


2.1 Introduction

The construction of engineering projects is a complex task that requires systematic planning to

complete projects on time and within allocated budgets while avoiding delays. For project managers,

identifying the origin of delays is crucial to assigning their cost to the appropriate party. Various authors

have performed analyses to identify their causes and proposed solutions that mitigate

schedule-associated delays. Most resolutions revolve around improving construction

scheduling techniques that employ activity durations determined using deterministic, probabilistic,

or simulation approaches. Deterministic durations are employed in three forms of construction

scheduling: Bar charts introduced in the 1700s (Priestley 1764), the Critical Path Method (CPM)

introduced in the late 1950s by Kelley and Walker (1959, 1989), and the Linear Scheduling Method,

whose earlier versions derived from line-of-balance (Lumsden 1968) and flowline (Nezval 1958).

The CPM uses a three-step technique: forward pass, backward pass, and comparison. The last step

is to determine activity floats, i.e., how long activities can delay their start dates without impacting the project's total

duration. Critical activities have "0" floats. "Its seductive simplicity has earned CPM much

criticism, e.g., not facilitating planning and using unrealistic fixed durations" (Jaafari 1984). The

Linear Scheduling Method (LSM, Johnston 1981) or linear scheduling (Lucko et al. 2014) models

activities as progress curves on a two-dimensional space (work against time). Not widely used

worldwide (Seppänen 2009, Kemmer 2006), it has the weakness of stressing flow over the network

structure. Due to the criticisms of using fixed durations in deterministic scheduling, the Program

Evaluation and Review Technique (PERT) was introduced in 1958 (Malcolm et al. 1959) as a

project planning and cost control tool (Behan 1966).
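The forward pass, backward pass, and comparison steps described above can be sketched in a few lines of code; the four-activity network, its durations, and the dictionary-based representation are hypothetical illustrations, not data from this study.

```python
# Minimal CPM sketch (hypothetical four-activity network, not from this study).
# Forward pass: early start (ES) and early finish (EF).
# Backward pass: late start (LS) and late finish (LF).
# Comparison: total float = LS - ES; critical activities have zero float.

def cpm(durations, preds):
    order = list(durations)                     # assumed topologically ordered
    es, ef, ls, lf = {}, {}, {}, {}
    for a in order:                             # forward pass
        es[a] = max((ef[p] for p in preds[a]), default=0)
        ef[a] = es[a] + durations[a]
    total = max(ef.values())                    # project duration
    succs = {a: [b for b in order if a in preds[b]] for a in order}
    for a in reversed(order):                   # backward pass
        lf[a] = min((ls[s] for s in succs[a]), default=total)
        ls[a] = lf[a] - durations[a]
    floats = {a: ls[a] - es[a] for a in order}  # comparison step
    return ef, floats, total

durations = {"A": 3, "B": 2, "C": 4, "D": 2}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
ef, floats, total = cpm(durations, preds)
print(total)        # 9: the critical path is A -> C -> D
print(floats["B"])  # 2: B can slip two time units without delaying the project
```

The EF times produced by the forward pass are exactly the quantities that later serve as inputs to the sample data matrices discussed in the abstract.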


PERT is based on three duration estimates per activity and is designed to handle variability in activity

durations through an assumed beta distribution, which prompted criticisms right after its

introduction and after that (van Slyke 1963). Simulation approaches to determine inputs to

construction schedules have many advantages, including creating many observations, such as in a

Monte Carlo Simulation (MCS). Its applications are countless across scientific areas, such as risk

analysis in project scheduling (Chantaravarapan et al. 2004). Although it is helpful for statistical

analysis before an eigenvalue analysis, no system patterns can derive from an MCS. As Lee et al.

(2013) stated, simulation quality depends on its probability distribution for sampling durations. If

actual data are not available, uncertainty must be assessed (Schexnayder et al. 2005). Construction

studies have used various PDFs (Thompson et al. 2016).
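As a minimal sketch of such a simulation approach, the code below samples triangular activity durations for a hypothetical two-path network and summarizes the resulting project-duration distribution; all parameter values are invented for illustration only.

```python
# Monte Carlo sketch: sample triangular durations (low, mode, high all
# hypothetical) and collect the project-duration distribution of a toy
# network with two parallel paths A->C and B->C.
import random

random.seed(0)                       # reproducible illustration

N = 10_000
totals = []
for _ in range(N):
    d = {a: random.triangular(2.0, 8.0, 4.0) for a in ("A", "B", "C")}
    totals.append(max(d["A"], d["B"]) + d["C"])   # longest-path duration

totals.sort()
p50 = totals[N // 2]                 # median project duration
p90 = totals[int(0.9 * N)]           # 90th percentile, a common contingency level
print(round(p50, 1), round(p90, 1))
```

Unlike a closed-form PERT estimate, the retained sample yields any percentile of interest, which is why MCS is popular for schedule risk analysis.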

As a result, the primary goal of this study is to determine whether and how the recently discovered

Tracy-Widom distribution outperforms traditional ones in terms of preventing schedule delays.

This Quanta Magazine paper (Wolchover 2014) exemplifies its attraction: "At the Far Ends of a

New Universal Law: A powerful theory has emerged describing a mysterious statistical law that

arises throughout physics and mathematics." The Tracy-Widom distribution, introduced in the

field of random matrix theory (RMT) by C. Tracy and H. Widom (1993, 1994), is a continuous

mathematical function that characterizes the behavior of a system of independent but randomly

interacting entities that approaches a tipping point. In other words, it emerges when a phase

transition occurs between the two phases of a system's weakly linked versus strongly coupled

components (Majumdar and Schehr 2014). This behavior progresses from 'insufficient' to

'excessive' interactions, resulting in a characteristic asymmetric shape, as illustrated in Figure 2.1.


Figure 2.1: Tracy-Widom Distribution with Phase Transitions


Courtesy of Wolchover (2014)

The steeper left tail of the lopsided curve, which scales with 𝑁², describes the system's

energy during the strong-coupling phase. On the other hand, the right side illustrates the system's

energy in the weak coupling phase based solely on the number of system components N. It assures

continuity during phase transitions. As a result, it represents a second-order phase transition as per

the physicist Ehrenfest's (1880-1933) classification (Kadanoff 2006; Sauer 2017). This phase

transition type is characterized by correlation length and power-law decay of correlation near the

criticality zone. Nevertheless, since its discovery, the distribution has appeared in various systems.

2.2 Research Question

Can the interdependency structure established by the numerous links between project activities in

a project network schedule represent evidence of a covariance structure that may aid in the study

of their behavior as systems characterized by phase transitions with a tipping point? The question

is whether such a structure of population covariance exists. If such a structure exists, it may aid in


examining a project network schedule with linked activities that can shift from stable to unstable

(typical of delay) in the manner of a Tracy-Widom universality class system. Furthermore, around

the tipping point, the Tracy-Widom distribution may reflect a natural distribution for project

activity durations, which may prevent delays. Can the universality found in the minuscule margins

of the TW distribution characterizing the behaviors of systems with second-order phase transitions

from strong-coupling to weak-coupling phase help explain the behavior of construction project

network schedules with analogous phase transitions? As a result, the primary purpose of this

chapter is to use the Tracy-Widom distribution's universality (with regards to its phase transition)

to investigate the existence of a population covariance structure in project network schedules.

Thus, finding the natural distribution of project networks' durations in a critical zone would help

prevent delay.

Because of its pioneering nature, the following actions are planned to complete the study for this

chapter. This chapter begins with a survey of the literature to provide a current understanding of

the Tracy-Widom distributions. It does so by analyzing and evaluating supporting results and

contributions. The chapter will then present a research strategy for reaching the chapter's primary

goal. The study objectives, data collection and preparation, and any model development to execute

any pertinent approach from the literature review will all be part of the research methodology.

Subsequently, the research findings and conclusions will be provided. Finally, this chapter will

formulate the research contributions to the body of knowledge and make recommendations for

future research studies.


2.3 Literature Review

The advancement of technology has brought the world an enormous volume of complex data

generated by many devices in various disciplines, including but not limited to genomics,

communications, sciences, and economics. In most cases, researchers seek to study and plot such

data in a nominal high-dimensional coordinate system going beyond the limit of classical

multivariate analysis. Hence, random matrix theory (RMT) has become a prominent framework

allowing researchers to probe conceptual questions associated with the multivariate analysis of

high-dimensional data. According to Ergun (2007), statisticians first introduced the RMT in the

early 1900s. However, it saw significant progress in the hands of physicists such as Wigner, Mehta,

Gaudin, and Dyson throughout the 1960s. The RMT is a method used to study the statistical

behavior of large complex systems through the definition of an ensemble necessary to account for

all possible laws of interactions occurring within the system (Ergun 2007). Topics that have

emerged from the researchers’ inquiries in various fields of mathematics and physics include

universality, which significantly impacts the techniques developed to solve problems associated

with high-dimensional data. The term universality appears to originate from statistical

mechanics, where one can describe critical phenomena and phase transitions of the second order

in a small region of critical or transition points independently of all except a few properties of the

model (e.g., size, symmetry of interaction). Typically, the simplified description is obtained by

appropriately rescaling relevant variables in terms of universal or similar scaling functions to

express phase transitions in entirely different materials, such as a liquid-gas transition near the

boiling point or a decrease in magnetization of a ferromagnetic metal as the temperature

approaches the Curie temperature (Pastur and Shcherbina 2011). As Ergun (2007, p. 3) stated,


“The reason for such universality is not so clear but may be an outcome of a law of large numbers

operating in the background.”

Nevertheless, the concept of universality is crucial in RMT. It enables researchers to understand

the structure of systems exhibiting universality. Thanks to universality, a random matrix may serve

to describe the behavior of a system, its behavior acting like a signature that assures evidence of

complexity and sufficient correlation in its structure (Wolchover 2014).

Areas of applications of the RMT include statistical physics of disordered systems (Stein 2004),

finance (Ledoit 2004, Frahm 2004), telecommunication networks (Taherpour et al. 2010),

electrical and computer engineering (Liao et al. 2020), and number theory (Jakobson et al. 1999).

One usually solves problems in these fields by applying multivariate analysis techniques such as

PCA, hypothesis testing, regression analysis, and covariance estimation. In most cases, resolving

these problems involves studying the bulk or edge spectrum of eigenvalues of random matrices

used to represent the investigated systems. While the bulk spectrum focuses on exploring the local

properties of eigenvalues or the interactions between neighboring eigenvalues (Péché 2008), the

edge spectrum is concerned with studying the behavior of extreme eigenvalues. However, the

study of the largest eigenvalue λ_max of large random matrices has drawn particular attention

among scholars due to its suitability in addressing questions associated with its fluctuations.
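For white Wishart (sample covariance) matrices, such fluctuation questions are studied after the centering and scaling of Johnstone (2001), under which the largest eigenvalue is asymptotically Tracy-Widom (TW1) distributed. The sketch below uses arbitrary illustrative dimensions n and p, not this study's matrices.

```python
# Sketch: Johnstone (2001) centering/scaling of the largest eigenvalue of a
# white Wishart matrix X^T X (identity population covariance; sizes arbitrary).
import numpy as np

rng = np.random.default_rng(42)
n, p = 400, 100                          # samples x variables
X = rng.standard_normal((n, p))
l1 = np.linalg.eigvalsh(X.T @ X).max()   # largest eigenvalue, near (sqrt(n)+sqrt(p))^2

mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (
    1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3)
tw = (l1 - mu) / sigma                   # approximately TW1-distributed, O(1)
print(round(tw, 2))
```

Repeating this draw many times and histogramming `tw` would trace out the asymmetric TW1 shape.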

The questions of extreme value statistics “arise naturally in the statistical physics of complex and

disordered systems” (Majumdar and Schehr 2014, p.2). The work of May (1972) on probing the

stability of large complex ecosystems is the first known direct application of the statistics of λ_max

(Majumdar and Schehr 2014). Through his work, May (1972) found that these systems, connected


at random, are stable until they reach some critical level of connectance—the proportion of links

per species—and abruptly become unstable as connectance increases beyond it. By their behaviors, construction project

network schedules are like complex systems. The intricacy and pairwise links (thousands in large

projects) between activities making up a project network schedule may help explain this similarity.

Examples of such complex systems include the New York City subway system (Jagannath and

Trogdon 2017), bus departure time in Cuernavaca (Krbálek and Seba 2000), and stability in

ecosystems (May 1972).
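May's (1972) experiment can be replayed numerically: a random community matrix with self-regulating diagonal −1 is stable (all eigenvalue real parts negative) while σ√(nC) < 1, where C is the connectance, and abruptly loses stability past that threshold. All parameter values below are illustrative.

```python
# Sketch of May's (1972) stability-vs-connectance experiment (illustrative
# parameters). Interaction strength sigma and connectance C set the spectral
# radius sigma*sqrt(n*C); stability flips once it exceeds the self-regulation 1.
import numpy as np

rng = np.random.default_rng(7)

def is_stable(n, C, sigma):
    A = np.where(rng.random((n, n)) < C,            # each link present w.p. C
                 rng.normal(0.0, sigma, (n, n)), 0.0)
    np.fill_diagonal(A, -1.0)                       # self-regulation
    return np.linalg.eigvals(A).real.max() < 0      # stable iff spectrum in left half-plane

n, sigma, reps = 200, 0.1, 20
frac_low = np.mean([is_stable(n, 0.05, sigma) for _ in range(reps)])   # sigma*sqrt(nC) ~ 0.32
frac_high = np.mean([is_stable(n, 0.80, sigma) for _ in range(reps)])  # sigma*sqrt(nC) ~ 1.26
print(frac_low, frac_high)   # stability collapses past the threshold
```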

Establishing similarities between fields of interest in construction management and engineering

will allow this research study to adapt and adopt their techniques to the study of behaviors of

project network schedules. Among those techniques is the hypothesis testing technique widely

used to probe evidence of a covariance structure at the edge spectrum. Similarly, these techniques

may be suitable for probabilistic construction project schedules deemed complex. Proving the

existence of a covariance structure in probabilistic project schedules would be a significant finding

in construction engineering and management. Furthermore, it would conclude that construction

project schedules exhibit universality. Hence, they could be modeled, and their behaviors studied

like many other real-world systems. To devise an approach that will help establish universality in

project networks, it is necessary first to provide sufficient background literature on the topic of

RMT as it applies to the study of the behavior of complex systems. Thus, the following sections

introduce successively random matrix ensembles, random matrix formulation, bulk spectrum

behaviors, edge spectrum behaviors, some applications of the universality theorems, numerical

evaluations of the universal laws, and multivariate data analysis.


2.3.1 Random Matrix Models and Ensembles

In probability theory and statistics, when attempting to use random matrices—matrices filled with

random numbers—to model complex systems or random processes, various ensembles come into

play depending on the type of a researcher’s investigation. Usually, to describe all possible random

processes or interactions occurring in a complex system, an investigator would first consider a

sequence 𝑋₁, ⋯, 𝑋ₙ of independent and identically distributed (i.i.d.) random variables essential to

approximate a random process. Next, the investigator would use these random variables to

construct an 𝑛 × 𝑛 random matrix. The resulting matrices would then form a group of random

matrices. Subsequently, the investigator would design a probability measure on appropriate subsets of the

group. Following the three steps would create a random matrix model (RMM).
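A minimal sketch of this three-step recipe, with an illustrative Gaussian entry distribution (any i.i.d. law with suitable moments could be substituted):

```python
# Three-step random matrix model sketch: (1) draw i.i.d. random variables,
# (2) assemble them into an n x n matrix, (3) treat repeated draws as samples
# from the ensemble Omega, whose probability measure P is induced by the
# entry distribution (illustrative size and sample count).
import numpy as np

rng = np.random.default_rng(0)
n = 4

def draw_matrix():
    return rng.standard_normal((n, n))   # steps 1 and 2 combined

ensemble = [draw_matrix() for _ in range(1000)]          # step 3
mean_entry = np.mean([M[0, 1] for M in ensemble])        # a sample statistic over Omega
print(abs(mean_entry) < 0.2)  # entry averages settle near the population mean 0
```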

As defined in the literature, an RMM represents a triple object (Ω, ℙ, ℱ), where Ω is a set,

ensemble, or group of all possible matrices of interest, ℙ is the probability measure defined on Ω.

ℱ is a 𝜎-algebra on Ω – a collection of subsets of Ω that is closed under complements and countable

unions. When the group 𝛺 is compact or locally compact (endowed with a topological space—

e.g., Euclidean space—that is closed and bounded), the Haar measure ℙ is often used in

mathematical analysis to assign in terms of probabilities an invariant (see Section 2.3.1.2) volume

to subsets of Ω. Meanwhile, the literature on random matrix ensembles suggests the existence of

multiple ensembles. However, the following sections only introduce a few notable ones: the

Wigner ensemble, the Gaussian β-Ensembles, and the Wishart ensemble. For other ensembles, one

may refer to the book by Mehta (2004).


2.3.1.1 Wigner Random Matrix Ensemble

The Wigner random matrix ensemble is one of the most celebrated ensembles in RMT. This

ensemble was named after the physicist Eugene Wigner (1950-1955), who first introduced random

matrices in nuclear physics while studying the statistics of energy levels of systems of particles in

terms of the eigenvalues of matrices randomized from some ensembles (Péché 2008). By

considering the lines representing the spectrum of a heavy atom nucleus, Wigner hypothesized that

the spacings between successive wavelengths in the spectrum of nuclei of heavy atoms should be

analogous to those between the eigenvalues of a random matrix and depend solely on the symmetry

class of the complex nuclei (Mehta 2004). As defined in the literature, complex nuclei consist of

many strongly interacting particles, making their spectral properties nearly impossible to describe

using rudimentary computations. Remarkably, Wigner's approach led to the discovery that

revolutionized the study of the spectral properties of complex nuclei. This discovery allowed one

to identify the local statistical behavior of energy levels of charged particles, considered in a simple

sequence, with the eigenvalues of a large random matrix H.

Except for the simple sequence restriction requiring all levels to have the same spin, parity, or any

other strictly conversed quantities resulting from the symmetry of the system, there were no other

restrictions on the random elements of H sampled from a known distribution (e.g., Gaussian)

(Mehta 2004). However, in the attempt to prove the correctness of Wigner's hypothesis by various

researchers (e.g., Porter and Rosenzweig, Dyson, Moore), the following paragraph provides the

unanimous details and requirements on H's entries found in the literature (e.g., Péché 2008). While

empirical analysis (e.g., Monte Carlo) validated Wigner's hypothesis, significant findings resulted


from theoretical studies conducted for the same purpose. For instance, as Mehta (2004, p. 5) wrote,

"From a group theoretical analysis Dyson found that an irreducible ensemble of matrices, invariant

under a symmetry group G, necessarily belongs to one of the three classes, named by him,

orthogonal, unitary, and symplectic.” Section 2.3.1.2 will expand more on these three classes of

matrices.

Nevertheless, the elements of the Wigner ensemble are 𝑛 × 𝑛 complex Hermitian (resp. real

symmetric) matrices 𝑯 = (𝐻ᵢⱼ/√𝑛), 𝑖, 𝑗 = 1, ⋯, 𝑛, whose off-diagonal entries 𝐻ᵢⱼ, with 𝑖 < 𝑗, are i.i.d.

complex (resp. real random) variables with a centered probability distribution 𝜇 (resp. µ) on ℂ

(resp. ℝ) that has a finite variance 𝜎². Meanwhile, their diagonal entries 𝐻ᵢᵢ are i.i.d. real random

variables independent of the off-diagonal entries. Because 𝑯 is a complex Hermitian (resp. real

symmetric), its eigenvalues 𝜆ᵢ exist and are all real numbers. Therefore, by defining suitable

functions (e.g., level spacings) or some appropriate quantity, one may determine the statistical

properties of the sequence of the eigenvalues of H.

The distribution law of level spacings is known as the Wigner Surmise. One may refer to Mehta

(2004) or Basor et al. (1992) for its formulation and derivation in terms of energy levels E as well

as its applications (e.g., the Zeros of the Riemann Zeta function). In thermodynamics, the same
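The surmise can also be checked numerically: for a large GOE-type real symmetric matrix, nearest-neighbor eigenvalue spacings in the bulk, rescaled to unit mean, exhibit the level repulsion that p(s) = (πs/2)·exp(−πs²/4) predicts, with far fewer near-zero spacings than independent (Poisson) levels would show. The matrix size and bulk window below are illustrative, and only a crude mean rescaling is used in place of a full spectral unfolding.

```python
# Level-spacing sketch for a GOE-type matrix (illustrative size; crude
# bulk selection instead of a full spectral unfolding).
import numpy as np

rng = np.random.default_rng(1)
n = 800
A = rng.standard_normal((n, n))
H = (A + A.T) / np.sqrt(2 * n)              # real symmetric, semicircle support [-2, 2]
lam = np.sort(np.linalg.eigvalsh(H))

bulk = lam[n // 4: 3 * n // 4]              # central half of the spectrum
s = np.diff(bulk)
s = s / s.mean()                            # rescale to unit mean spacing

# Level repulsion: under the Wigner surmise only ~0.8% of spacings fall
# below 0.1, versus ~9.5% for uncorrelated (Poisson) levels.
print(round((s < 0.1).mean(), 3))
```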

law of the eigenvalues 𝜆ᵢ of H can be interpreted as the Gibbs-Boltzmann probability distribution

expressed in terms of 𝛽 = 1/(𝑘𝑇) and 𝐸(𝝀), representing respectively the inverse of the system's

equilibrium temperature T and the potential energy of the Coulomb gas of pairwise interacting

charged particles 𝑥₁, 𝑥₂, ⋯, 𝑥ₙ, where k is the Boltzmann constant. In the expression of 𝐸(𝝀),

𝜆ᵢ represents the position of the ith particle on a line via a two-dimensional Coulomb force. That


expression consists of two terms. One term, accounting for pairwise repulsions, tends to spread the charges apart, while the other, representing the external harmonic potential, tends to

aggregate the charges around the origin. Thus, both terms represent the competition taking place in the system of charges, which eventually stabilizes, on average, into an equilibrium configuration (Mehta 2004, Majumdar 2014). In that equilibrium configuration, the average density $\rho_n$ of the charges can be expressed in terms of angular brackets as in Equation 2.1, representing an average with respect to the joint PDF, with $\delta(\cdot)$ denoting the Dirac delta function.

$$\rho_n(\lambda) = \frac{1}{n}\left\langle \sum_{i=1}^{n} \delta(\lambda - \lambda_i) \right\rangle \qquad \text{[Eq. 2-1]}$$

As Mehta and Gaudin (1960) pointed out, the interactions in heavy nuclei are so numerous and complex that it is common to work with the average density when explaining average properties such as the level density and the distribution of level spacings, which enables the application of statistical theories.

2.3.1.2 Gaussian β-Ensembles

The rapid development in RMT, leading to the so-called "new kind of statistical mechanics,"

allowed a researcher to study a complex system as if it were a black box. He would ignore the knowledge of the system's inherent characteristics and substitute it with an ensemble of systems in which all possible laws of interaction were equally probable (Ergun 2007, Mehta 2004). To

institute the mathematical rigor ensuring equal probability of the laws of interactions for the new

statistical theories of random matrices, Dyson derived three universal classes for all random

matrices. These three classes of ensembles, introduced by Dyson in the attempt to unify all the


statistical theories of random matrix mechanics, are the orthogonal ensemble (OE), the unitary ensemble (UE), and the symplectic ensemble (SE). When the entries of the Wigner matrices $\mathbf{H}$ are Gaussian distributed, one refers respectively to the three classes as GOE, GUE, and GSE (Ergun 2007).

Because of the great interest in the eigenvalue distributions of large random matrices $\mathbf{H}$, depending on the probability measure $\mathbb{P}$ used to sample these matrices' entries, "one can extract different sub-ensembles from the ensemble of Wigner matrices" (Bejan 2005, p. 8). Among them

are the Gaussian (e.g., see Mehta 2004), circular (e.g., see Mehta 2004), deformed (e.g., see

Johansson 2007, Péché 2008), disordered (e.g., see Bohigas et al. 2009), and Ginibre ensembles

(Collins et al. 2014, Mays 2013, Ginibre 1965). Although Mehta (2004, p. 4) noted that the Gaussian ensembles are equivalent to the circular ensembles for large orders, this section focuses only on the classical Gaussian ensembles, also referred to as Gaussian β-ensembles, not only because these ensembles are the most common ones, but also because they are relevant to the scope of the current work. The Gaussian β-ensembles are probability spaces on n-tuples of eigenvalues $\boldsymbol{l} = (l_1, \cdots, l_n)$, with joint density functions $\mathbb{P}$ (Dieng 2005, Dieng and Tracy 2011) defined as in Equation 2.2, derived from Mehta (2004, p. 58) and provided below.

$$\mathbb{P}(l_1, \cdots, l_n) \equiv \mathbb{P}(\boldsymbol{l}) = C_{n\beta} \exp\left(-\frac{\beta}{2}\sum_{i=1}^{n} l_i^2\right) \prod_{1 \le i < j \le n} |l_i - l_j|^{\beta} \qquad \text{[Eq. 2-2]}$$

where the normalization constant $C_{n\beta}$ is defined as follows:

$$C_{n\beta} = (2\pi)^{-n/2}\, \beta^{\,n/2 + \beta n(n-1)/4} \prod_{j=1}^{n} \frac{\Gamma(1 + \beta/2)}{\Gamma(1 + \beta j/2)}$$


and the $l_i$, $i = 1, \cdots, n$, are eigenvalues of randomly selected matrices from their corresponding family of distributions. Because the expression of the law $\mathbb{P}$ depends on the same $\beta$ introduced in Section

2.3.1.1, this distribution law has a physical meaning. Depending on the argument of the weight function $\exp(\cdot)$ in Equation 2.2, the normalization constant $C_{n\beta}$ may differ from one author to another. For other formulations, see Dumaz and Virág (2013) and Borot et al. (2011). When $\beta = 1, 2, 4$, this family of β-ensembles corresponds to the GOE, the Gaussian Unitary Ensemble (GUE), and the Gaussian Symplectic Ensemble (GSE), respectively. For other values of β, see Mehta (2004) and Its and Prokhorov (2020). Additionally, for further interest in these

ensembles, one may refer to the works of Forrester (2005), Mehta (2004), Ergun (2007), and Tracy

and Widom (2008).
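As a plausibility check on the Gamma-product form of the normalization constant $C_{n\beta}$ quoted above (an illustrative sketch, not part of the cited sources; the variable names are mine), one can compare the formula against a Monte Carlo estimate of the normalizing integral for the smallest nontrivial case, $n = 2$, $\beta = 1$:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, beta = 2, 1.0

# Gamma-product formula for the normalization constant C_{n,beta}.
C_formula = (2 * math.pi) ** (-n / 2) * beta ** (n / 2 + beta * n * (n - 1) / 4)
for j in range(1, n + 1):
    C_formula *= math.gamma(1 + beta / 2) / math.gamma(1 + beta * j / 2)

# Monte Carlo estimate of the normalizing integral for n = 2, beta = 1:
# Z = \int exp(-(l1^2 + l2^2)/2) |l1 - l2| dl1 dl2 = 2*pi * E|L1 - L2|,
# with L1, L2 independent standard normals (here Z = 4*sqrt(pi) exactly).
L = rng.normal(size=(2, 1_000_000))
Z_mc = 2 * math.pi * np.mean(np.abs(L[0] - L[1]))
```

The formula value agrees with the reciprocal of the simulated integral, as the joint density in Equation 2.2 must integrate to one.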

Nevertheless, these three classical compact groups are known as invariant ensembles, since one can explicitly compute the joint densities of their distribution laws $\mathbb{P}$ with regard to the Haar measure of the groups. Moreover, it is worth adding that invariant ensembles, as Pastur and Shcherbina (2011, p. 3) stressed, "play an important role in [RMT] and its applications since the early 1960s when Dyson introduced them…to determine the basic Gaussian ensembles [GOE, GUE, and GSE] and their circular analogs (COE, CUE, and CSE…)." The following are brief introductory paragraphs on each of the three β-ensembles: GOE, GUE, and GSE.


Gaussian Orthogonal Ensemble (GOE, β = 1)

The GOE, denoted $\mathcal{O}_n$, is a probability space on the set $\Omega = \mathcal{H}_n$ of $n \times n$ real (symmetric) Wigner random matrices $\mathbf{H} = (H_{ij})$ whose Gaussian entries (up to a choice of mean and variance $\sigma^2$) satisfy the following conditions:

(i) the real entries $H_{ii}$ with $1 \le i \le n$, on the diagonal of H, are i.i.d. $\mathcal{N}(0, 2\sigma^2)$, and the entries $H_{ij}$ with $1 \le i < j \le n$, above the diagonal of H, are i.i.d. $\mathcal{N}(0, \sigma^2)$. For reference, see Dieng (2005, p. 128) and Bejan (2005, p. 9),

(ii) the probability $\mathbb{P}(\mathbf{H})\,d\mathbf{H}$ that a system of $\mathcal{O}_n$ will belong to the infinitesimal volume element $d\mathbf{H}$ is a unique measure invariant under all real orthogonal transformations of H. That is, the density of the probability measure $\mathbb{P}(\mathbf{H})$ is invariant under all orthogonal transformations $\mathbf{H} \to \mathbf{O}^{T}\mathbf{H}\mathbf{O}$, where O is any real orthogonal matrix (O is invertible, with $\mathbf{O}^{-1} = \mathbf{O}^{T}$).
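Condition (ii) can be probed numerically (an illustrative sketch only; it assumes the common convention of doubling the diagonal variance, and the seed and size are arbitrary). Conjugation by any orthogonal matrix is a similarity transformation, so the spectrum is exactly preserved:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100, 1.0

# GOE-type sample: off-diagonal entries N(0, sigma^2), diagonal N(0, 2*sigma^2)
# (the doubled diagonal variance is the convention assumed here).
H = np.zeros((n, n))
iu = np.triu_indices(n, k=1)
H[iu] = rng.normal(scale=sigma, size=len(iu[0]))
H = H + H.T
H[np.diag_indices(n)] = rng.normal(scale=np.sqrt(2.0) * sigma, size=n)

# A random orthogonal matrix via QR; conjugating preserves the spectrum.
O, _ = np.linalg.qr(rng.normal(size=(n, n)))
ev = np.linalg.eigvalsh(H)
ev_rot = np.linalg.eigvalsh(O.T @ H @ O)
```

That the two sorted spectra coincide is the deterministic counterpart of the distributional invariance stated in (ii).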

Gaussian Unitary Ensemble (GUE, β = 2)

The GUE, denoted $U(n)$ or $U_n$, is a probability space on the set $\Omega = \mathcal{H}_n^{*}$ of $n \times n$ complex (Hermitian) Wigner random matrices $\mathbf{H} = (H_{ij})_{i,j=1,\cdots,n}$ whose Gaussian entries (up to a choice of mean and variance $\sigma^2$) satisfy the following conditions:

(i) the real $\Re(H_{ij})$ and imaginary $\Im(H_{ij})$ parts of each entry $H_{ij}$ above the diagonal of H, with $1 \le i < j \le n$, are i.i.d. $\mathcal{N}(0, \sigma^2/2)$, and the entries $H_{ii}$ on the diagonal, with $1 \le i \le n$, are real i.i.d. $\mathcal{N}(0, \sigma^2)$. For reference, see Dieng (2005, p. 128) and Bejan (2005, p. 9),


(ii) the probability $\mathbb{P}(\mathbf{H})\,d\mathbf{H}$ that a system of $U_n$ will belong to the infinitesimal volume element $d\mathbf{H}$ is a unique measure invariant under all unitary transformations of $\mathbf{H}$. That is, the density of the probability measure $\mathbb{P}(\mathbf{H})$ is invariant under all unitary transformations $\mathbf{H} \to \mathbf{U}^{\dagger}\mathbf{H}\mathbf{U}$, where U is any unitary matrix.
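A parallel numerical sketch for the GUE (illustrative only; the split of the off-diagonal variance between real and imaginary parts is the convention assumed here) confirms that a Hermitian sample has a purely real spectrum that is unchanged by unitary conjugation:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 100, 1.0

# GUE-type sample: real diagonal N(0, sigma^2); above-diagonal entries with
# independent real and imaginary parts N(0, sigma^2 / 2) (assumed convention).
H = np.zeros((n, n), dtype=complex)
iu = np.triu_indices(n, k=1)
m = len(iu[0])
H[iu] = (rng.normal(scale=sigma / np.sqrt(2), size=m)
         + 1j * rng.normal(scale=sigma / np.sqrt(2), size=m))
H = H + H.conj().T
H[np.diag_indices(n)] = rng.normal(scale=sigma, size=n)

# A random unitary matrix via complex QR; conjugation preserves the spectrum.
U, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
ev = np.linalg.eigvalsh(H)
ev_rot = np.linalg.eigvalsh(U.conj().T @ H @ U)
```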

Gaussian Symplectic Ensemble (GSE, β = 4)

The GSE, denoted $Sp(n)$ or $Sp_n$, is the probability space on the set $\Omega$ of $2n \times 2n$ self-dual Hermitian Wigner-like random matrices Q. These matrices are self-dual and can be written as an $n \times n$ quaternionic matrix $\mathbf{H} = (H_{ij})_{i,j=1,\cdots,n}$ whose quaternion Gaussian elements $H_{ij} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, with $a, b, c, d \in \mathbb{C}$, correspond to the entry $q$ provided in Equation 2.3 below:

$$q = \frac{a+d}{2}\,e_0 + \frac{i(a-d)}{2}\,e_1 + \frac{b-c}{2}\,e_2 + \frac{i(b+c)}{2}\,e_3 \qquad \text{[Eq. 2-3]}$$

where $e_0, e_1, e_2, e_3$ form a basis of the quaternion algebra, of dimension four over $\mathbb{R}$, satisfying the following conditions: $e_0^2 = e_0$; $e_0 e_i = e_i e_0 = e_i$; $e_i^2 = -e_0$; and $e_i e_j = -e_j e_i = e_k$, where $1 \le i, j, k \le 3$ and $\binom{i\ j\ k}{1\ 2\ 3}$ is an even permutation. Through elaborate reasoning found in Dieng (2005, p. 129-136) or Mehta (2004, p. 38-41), H can be written as the matrix provided in Equation 2.4.

$$\mathbf{H} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ -\overline{\mathbf{B}} & \overline{\mathbf{A}} \end{pmatrix} \qquad \text{[Eq. 2-4]}$$

where A and B are $n \times n$ complex matrices satisfying $\mathbf{A}^{\dagger} = \mathbf{A}$ and $\mathbf{B}^{T} = -\mathbf{B}$. In addition, $\overline{\mathbf{A}}$ denotes the complex conjugate matrix of A, and $\mathbf{A}^{\dagger} = \overline{\mathbf{A}}^{T} = \mathbf{A}^{*}$ is the Hermitian conjugate of A.
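The block form of Equation 2.4 can be made concrete numerically (an illustrative sketch only; it assumes A Hermitian and B antisymmetric, one common convention for the self-dual form). A hallmark of such matrices is that every eigenvalue occurs twice, the Kramers-type doubling characteristic of the symplectic class:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30

# A Hermitian and B complex antisymmetric (conventions assumed here).
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (G + G.conj().T) / 2.0
K = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = (K - K.T) / 2.0

# 2n x 2n self-dual Hermitian matrix in the block form of Equation 2.4.
H = np.block([[A, B], [-B.conj(), A.conj()]])

# Each eigenvalue of H appears twice: adjacent entries of the sorted
# spectrum form degenerate pairs.
ev = np.linalg.eigvalsh(H)          # sorted ascending, length 2n
pairs = ev.reshape(n, 2)
```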


For more on the development leading to the expressions of Equation 2.3 and Equation 2.4, one may refer to Dieng (2005) and Mehta (2004) cited above. Nevertheless, up to a choice of mean and variance $\sigma^2$, the following are the conditions set on the elements of the GSE:

(i) the off-diagonal entries $H_{ij}$ of H with $1 \le i < j \le n$ are i.i.d. $\mathcal{N}(0, \sigma^2)$, and the diagonal entries $H_{ii}$ with $1 \le i \le n$ are real i.i.d. $\mathcal{N}(0, \sigma^2)$. For reference, refer to Dieng (2005, p. 136) and Bejan (2005, p. 9),

(ii) the probability $\mathbb{P}(\mathbf{H})\,d\mathbf{H}$ that a system of $Sp_n$ will belong to the infinitesimal volume element $d\mathbf{H}$ is a unique measure invariant under all symplectic transformations. That is, the density of the probability measure $\mathbb{P}(\mathbf{H})$ is invariant under all symplectic transformations $\mathbf{H} \to \mathbf{Q}^{R}\mathbf{H}\mathbf{Q}$, where Q is any symplectic matrix in $Sp_n$ and $\mathbf{Q}^{R}$ represents its dual matrix. A self-dual matrix Q is one verifying $\mathbf{Q}^{R} = \mathbf{Q}$.

In all three of the above cases of the β-ensemble, the invariance restriction on $\mathbb{P}(\mathbf{H})\,d\mathbf{H}$ requires:

$$P(\mathbf{H}) = \exp\left(-a\,\mathrm{tr}\,\mathbf{H}^2 + b\,\mathrm{tr}\,\mathbf{H} + c\right)$$

where a is a real and positive number, and b and c are real numbers. And the volume restriction on $\mathbb{P}(\mathbf{H})\,d\mathbf{H}$ requires, as Mehta (2004) wrote, $d\mathbf{H}$ to factor as follows:

$$d\mathbf{H} = \prod_{i \le j} dH_{ij}, \quad \text{for } \beta = 1$$

$$d\mathbf{H} = \prod_{i} dH_{ii} \prod_{i<j} d\,\Re(H_{ij})\; d\,\Im(H_{ij}), \quad \text{for } \beta = 2$$

$$d\mathbf{H} = \prod_{i} dH_{ii} \prod_{i<j} \prod_{\alpha=0}^{3} dH_{ij}^{(\alpha)}, \quad \text{for } \beta = 4$$


2.3.1.3 Wishart Ensemble (Laguerre/Jacobi)

The Wishart ensemble is named after its inceptor Wishart (1928), whose pioneering work laid

down the foundation of the theory of random matrices (Péché 2008, Paul and Aue 2014). In

classical RMT, to define a Wishart matrix, one first needs to specify a pair of sequences of integers n and p, where n represents the sample size and $p = p(n)$ the number of variables or dimensions of the sample, selected in such a way that $p \to \infty$ as $n \to \infty$ and $\lim_{n\to\infty} p/n = \gamma \in (0, \infty)$. Next, one needs to create a random matrix $\mathbf{X}_{n \times p}$, known as a sample data matrix, whose column entries $X_j = (X_{1j}, \cdots, X_{nj})$, $j = 1, \cdots, p$, are i.i.d. complex (or real) random variables with a centered probability distribution $\mu$ on $\mathbb{C}$ (resp. $\mathbb{R}$) that has a finite variance $\sigma^2$. Then, one can compute a $p \times p$ Hermitian (resp. symmetric) matrix $\mathbf{S}$ as provided in Equation 2.5.

$$\mathbf{S} = \frac{1}{n}\,\mathbf{X}^{*}\mathbf{X} \quad \left(\text{resp. } \mathbf{S} = \frac{1}{n}\,\mathbf{X}^{T}\mathbf{X}\right) \qquad \text{[Eq. 2-5]}$$

$\mathbf{S}$ is known as a white and uncentered sample covariance matrix (see Frahm 2004, p. 102 for an explanation). With the true population covariance matrix $\boldsymbol{\Sigma}$ of the samples assumed positive definite, one can compute the more general form of $\mathbf{S}$. This matrix, denoted by $\mathbf{S}_{\Sigma}$ and expressed in Equation 2.6, is called a non-white sample covariance matrix.

$$\mathbf{S}_{\Sigma} = \frac{1}{n}\,\boldsymbol{\Sigma}^{1/2}\mathbf{X}^{*}\mathbf{X}\,\boldsymbol{\Sigma}^{1/2} \quad \left(\text{resp. } \mathbf{S}_{\Sigma} = \frac{1}{n}\,\boldsymbol{\Sigma}^{1/2}\mathbf{X}^{T}\mathbf{X}\,\boldsymbol{\Sigma}^{1/2}\right) \qquad \text{[Eq. 2-6]}$$

Note that, by its matrix construction, the Wishart ensemble represents a subgroup of a larger class of covariance matrices. However, when the column entries of X are sampled from a normal distribution $\mathcal{N}(0, \boldsymbol{\Sigma})$, the unnormalized cross-product matrix $\boldsymbol{\mathcal{S}} = n\mathbf{S}$ is often referred to as a Wishart matrix. In addition, when $\boldsymbol{\Sigma} = \mathbf{I}$ and X's entries are complex (resp. real) i.i.d. Gaussian,


the Wishart ensemble is called LUE (resp. LOE), where ‘L’ stands for Laguerre. Alternatively, for

double Wishart matrices, beyond the scope of this study but found in the literature mostly related

to canonical analysis (e.g., Johnstone 2006 and 2009), their resulting Wishart ensembles are JUE

(resp. JOE), where ‘J’ stands for Jacobi (Paul and Aue 2014).
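To make the construction of Equation 2.5 concrete, the white (null) real case can be simulated (an illustrative sketch only; the dimensions and seed are arbitrary). With unit-variance entries, the sample covariance matrix concentrates around the identity as n grows:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 2000, 10

# Sample data matrix with i.i.d. centered unit-variance entries (null case).
X = rng.normal(size=(n, p))

# White sample covariance matrix S = X^T X / n (Equation 2.5, real case).
S = X.T @ X / n
```

The resulting `S` is symmetric, positive definite almost surely for n > p, and entrywise close to the identity, its expectation in the null case.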

The name Wishart matrix originated from the fact that if the columns' entries of X are i.i.d. according to $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ and $n \ge p$, then $\boldsymbol{\mathcal{S}}$ "is said to have Wishart distribution with n degrees of freedom and covariance matrix $\boldsymbol{\Sigma}$," denoted as $\boldsymbol{\mathcal{S}} \sim W_p(n, \boldsymbol{\Sigma})$ (Dieng and Tracy 2011, p. 4). Nonetheless, it is customary to center the X columns by removing the mean $\boldsymbol{\mu}$ through the operation $\mathbf{X} - \boldsymbol{\mu}$. Thus, there should be no concern if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Accordingly, for simplification purposes, researchers usually assume $\boldsymbol{\mu} = \mathbf{0}$. Nevertheless, the density function of this distribution, derived by Wishart in 1928, is expressed in Equation 2.7,

$$f(\boldsymbol{\mathcal{S}}) = C_{n,p}\, |\boldsymbol{\Sigma}|^{-n/2}\, |\boldsymbol{\mathcal{S}}|^{(n-p-1)/2} \exp\left(-\frac{1}{2}\,\mathrm{tr}\big(\boldsymbol{\Sigma}^{-1}\boldsymbol{\mathcal{S}}\big)\right) \qquad \text{[Eq. 2-7]}$$

where $\mathrm{tr}(\cdot)$ represents the trace of a matrix, $C_{n,p}$ is a normalizing constant, and n is assumed to be greater than $p - 1$ (e.g., see Johnstone 2006). Since its inception, the Wishart model $W_p(n, \boldsymbol{\Sigma})$ has

become the focal point of various studies in multivariate statistical analysis. Perhaps speaking to its popularity, as Johnstone (2006, p. 5) wrote, "this idealized model of independent draws from a

Gaussian is generally at best approximately true—but we may find some reassurance in the dictum

“All models are wrong, some are useful.” ”

To continue developing this section, the focus will henceforth be exclusively on real sample covariance matrices, which simplifies the development. Additionally, this is justified because the current research study


focuses solely on project activity durations, which are real numbers. Moreover, the general case of $\boldsymbol{\mathcal{S}}$ will be used, since one can reorganize the expression of $\mathbf{S}_{\Sigma}$ to obtain an expression like $\boldsymbol{\mathcal{S}}$. Now, because $\boldsymbol{\mathcal{S}}$ is a $p \times p$ symmetric matrix, one may write $\boldsymbol{\mathcal{S}}$ as in Equation 2.8,

$$\boldsymbol{\mathcal{S}} = \mathbf{X}^{T}\mathbf{X} = \sum_{i=1}^{p} l_i\, \mathbf{u}_i \mathbf{u}_i^{T} = \mathbf{U}\mathbf{L}\mathbf{U}^{T} \qquad \text{[Eq. 2-8]}$$

in terms of its p eigenvalue-eigenvector pairs $(l_i, \mathbf{u}_i)$, where the $l_i \ge 0$, $i = 1, \cdots, p$, represent the entries of the diagonal matrix L, and the orthogonal $p \times 1$ unit vectors $\mathbf{u}_i$ represent the columns of the orthogonal matrix $\mathbf{U} = (\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_p)$. Through the singular value decomposition (SVD) of the $n \times p$ sample data matrix X as in Equation 2.9,

$$\mathbf{X} = \sum_{i=1}^{r} d_i\, \mathbf{w}_i \mathbf{v}_i^{T} = \mathbf{W}\mathbf{D}\mathbf{V}^{T} \qquad \text{[Eq. 2-9]}$$

one can show that the singular values $d_i$, $i = 1, \cdots, r$, of X are the square roots ($d_i = \sqrt{l_i}$) of the eigenvalues $l_i$, $i = 1, \cdots, r$, of $\boldsymbol{\mathcal{S}}$, where r is the rank of X such that $r \le \min(n, p)$. In Equation 2.9, the components of $\mathbf{W} = (\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_r)$ are $n \times 1$ unit vectors $\mathbf{w}_i$, those of $\mathbf{V} = (\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_r)$ are orthogonal $p \times 1$ unit vectors, and those of the $r \times r$ diagonal matrix D are the $d_i$.
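The relation between the singular values of X and the eigenvalues of the cross-product matrix is exact and easy to verify numerically (an illustrative sketch; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 8

X = rng.normal(size=(n, p))

# Eigenvalues of the unnormalized cross-product matrix, descending order.
l = np.linalg.eigvalsh(X.T @ X)[::-1]

# Singular values of X (returned in descending order by convention);
# they should satisfy d_i = sqrt(l_i).
d = np.linalg.svd(X, compute_uv=False)
```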

In general, as Bejan (2005, p. 10) wrote, if a sample covariance matrix A possesses a density function $f$, such as the one provided in Equation 2.7 or the one expressed in terms of weight functions (e.g., see Johnstone 2006 and Paul and Aue 2014), then Equation 2.10, provided below, gives the joint density function of its eigenvalues $l_1 > l_2 > \cdots > l_p$, where


$$\frac{\pi^{p^2/2}}{\Gamma_p(p/2)} \prod_{i<j}(l_i - l_j) \int_{\mathcal{O}_p} f\big(\mathbf{H}\mathbf{L}\mathbf{H}^{T}\big)\, d\mathbf{H} \qquad \text{[Eq. 2-10]}$$

$\mathcal{O}_p$ is the orthogonal group of $p \times p$ matrices (see Section 2.3.1.2), and $\Gamma_p$ denotes (see Bejan 2005) the multivariate gamma function of a complex argument, as defined in Equation 2.11 below,

$$\Gamma_p(z) = \pi^{p(p-1)/4} \prod_{k=1}^{p} \Gamma\left(z - \frac{k-1}{2}\right), \qquad \Re(z) > \frac{p-1}{2}, \qquad \text{[Eq. 2-11]}$$

where $d\mathbf{H}$ represents the Haar invariant probability measure on $\mathcal{O}_p$, normalized in such a way that $\int_{\mathcal{O}_p} d\mathbf{H} = 1$.

When $\mathbf{A} \sim W_p(n, \boldsymbol{\Sigma})$ with $n \ge p + 1$, then the joint density of the eigenvalues $l_1 > l_2 > \cdots > l_p$ of A is given by Equation 2.12 (see Bejan 2005, Johnson and Wichern 2019) as follows,

$$\frac{\pi^{p^2/2}\; 2^{-pn/2}\, (\det \boldsymbol{\Sigma})^{-n/2}}{\Gamma_p(n/2)\, \Gamma_p(p/2)} \prod_{i<j}(l_i - l_j) \prod_{i=1}^{p} l_i^{(n-p-1)/2} \int_{\mathcal{O}_p} \exp\left(-\frac{1}{2}\,\mathrm{tr}\big(\boldsymbol{\Sigma}^{-1}\mathbf{H}\mathbf{L}\mathbf{H}^{T}\big)\right) d\mathbf{H} \qquad \text{[Eq. 2-12]}$$

Because the gamma function $\Gamma$ is related to the famous $\chi^2$ test introduced by Pearson (1900), it is worth mentioning that when $n \to \infty$, the sampling distribution in Equation 2.12 converges to the $\chi^2$ distribution (e.g., see Dieng and Tracy 2011). Nevertheless, like the spectral decomposition of $\boldsymbol{\mathcal{S}}$, one may express $\boldsymbol{\Sigma}$ in terms of its spectral decomposition as in Equation 2.13:

$$\boldsymbol{\Sigma} = \sum_{i=1}^{p} \lambda_i\, \mathbf{y}_i \mathbf{y}_i^{T} = \boldsymbol{\Upsilon}\boldsymbol{\Lambda}\boldsymbol{\Upsilon}^{T} \qquad \text{[Eq. 2-13]}$$

Note that $\boldsymbol{\mathcal{S}} = n\mathbf{S}$, up to a constant $1/(n-1)$, coincides theoretically with the unbiased estimator of the population covariance matrix $\boldsymbol{\Sigma}$. In most studies, $\boldsymbol{\Sigma}$ is unknown. Accordingly, a few exciting questions emerge from the study of the eigenvalues of the Wishart ensemble, especially when

$n \to \infty$. One of them is concerned with how the eigenvalues of a sample covariance matrix S, also called by Baik et al. (2005) the "sample eigenvalues," are related to the population eigenvalues (from $\boldsymbol{\Sigma}$). To answer this intriguing question, the following excerpt from Johnstone (2001, p. 2) perhaps provides a clue: "A basic phenomenon is that the sample eigenvalues $l_i$ are more spread out than the population eigenvalues $\lambda_i$." He goes on to add that the spreading effect is more pronounced in the null case ($\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$). The following sections attempt to answer this question by drawing on the extensive literature on RMT. Before that, and in light of their significance and relevance to this study, it is necessary to expand more on the subject of the Wishart matrices.
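Johnstone's spreading phenomenon is easy to observe in simulation (an illustrative sketch, not from the cited sources; the aspect ratio is arbitrary). In the null case every population eigenvalue equals 1, yet the sample eigenvalues scatter across an interval whose edges are, in the large-dimensional limit, those of the Marchenko-Pastur law discussed in Section 2.3.3:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 400, 100                 # aspect ratio gamma = p/n = 0.25

# Null case: population covariance is the identity, all lambda_i = 1.
X = rng.normal(size=(n, p))
l = np.linalg.eigvalsh(X.T @ X / n)

gamma = p / n
lower_edge = (1 - np.sqrt(gamma)) ** 2   # ~0.25 for gamma = 0.25
upper_edge = (1 + np.sqrt(gamma)) ** 2   # ~2.25 for gamma = 0.25
```

The sample eigenvalues straddle 1 and approximately fill the interval between the two edges, visibly more spread out than the population spectrum.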

2.3.2 Elements and Properties of Wishart Random Matrices

Due to the importance of Wishart matrices and models in RMT and their relevance to this study,

this section will elaborate on the subject of Wishart matrices and models discussed in the preceding

section. Moreover, it is necessary to understand the results derived from this class of matrices and

apply them to solve this study’s problems. Accordingly, the following subsections, devoted to the

Wishart matrices, define the basic terms associated with their formulation, state their well-known

properties and results, and conclude with a summary table of the elements of a Wishart model.

2.3.2.1 Definitions

Notations and Conventions

One may refer to Section 1.8.3 for the notations and symbols used throughout this chapter.


Population Variance-Covariance Matrix

Let $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denote a normal population of measurable characteristics represented by p variables $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_p$, where $\boldsymbol{\mu} = (\mu_1, \mu_2, \cdots, \mu_p)^{T}$ and $\boldsymbol{\Sigma}$ represent the population mean vector and covariance matrix, respectively. With this population, one may create a simple model to describe an occurrence of $\mathcal{X}$ as a random vector $(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_p)^{T}$ drawn with a probability law characterized by the population parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. In terms of the population variables and parameters, Equation 2.14(a) establishes the equivalence between $\boldsymbol{\Sigma}$ and the variance-covariance matrix of $\mathcal{X}$, denoted by $\mathrm{Cov}(\mathcal{X})$, where $\mathbb{E}[\cdot]$ is the operator of mathematical expectation. As expressed, $\mathrm{Cov}(\mathcal{X})$ is a square and symmetric $p \times p$ matrix whose diagonal $\sigma_{ii}$ and off-diagonal $\sigma_{ij}$ ($i \ne j$) entries, with $i, j = 1, \cdots, p$, are respectively the variances of the variables $\mathbf{X}_i$ and the covariances between pairs of variables $\mathbf{X}_i$ and $\mathbf{X}_j$. Equation 2.14(b) and Equation 2.14(c) provide respectively the expressions of $\sigma_{ij}$ and $\sigma_{ii}$.

$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathcal{X}) \triangleq \mathbb{E}\big[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^{T}\big] \qquad \text{(a)}$$

$$\sigma_{ij} = E\big[(\mathbf{X}_i - \mu_i)(\mathbf{X}_j - \mu_j)\big] \equiv \mathrm{cov}(\mathbf{X}_i, \mathbf{X}_j) \qquad \text{(b)} \qquad \text{[Eq. 2-14]}$$

$$\sigma_{ii} = E\big[(\mathbf{X}_i - \mu_i)^2\big] \equiv \mathrm{var}(\mathbf{X}_i) \qquad \text{(c)}$$

Sample Data Matrix

Now, let $\mathcal{X}_1, \mathcal{X}_2, \cdots, \mathcal{X}_n$ be a random sample of size n drawn independently from the p-variate normal population $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. From the n drawn samples, one can define, as in Equation 2.15, an $n \times p$ random matrix $\mathcal{X}$ to record future values of the p variables.


$$\mathcal{X} = (\mathbf{X}_1\ \mathbf{X}_2 \cdots \mathbf{X}_n)^{T} \triangleq \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1p} \\ X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{np} \end{bmatrix} \qquad \text{[Eq. 2-15]}$$

where the $\mathbf{X}_i = (X_{i1}, X_{i2}, \cdots, X_{ip})^{T} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, with $i = 1, \cdots, n$, are comparable to slots necessary to record future realizations of the random vector $\mathcal{X}$. Subsequently, once the random matrix $\mathcal{X}$ is observed, one can create, as provided in Equation 2.16, a sample data matrix $\mathbf{X} = (\mathbf{x}_i)_{i=1,\cdots,n}$, with $\mathbf{x}_i = (x_{i1}, x_{i2}, \cdots, x_{ip})^{T}$.

$$\mathbf{X}_{n \times p} = (\mathbf{x}_1\ \mathbf{x}_2 \cdots \mathbf{x}_n)^{T} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \qquad \text{[Eq. 2-16]}$$

Through Equation 2.16, one may view each row i of the matrix $\mathbf{X}_{n \times p}$ as a row vector $\mathbf{x}_i^{T}$ independently drawn from $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Hence, $\mathbf{x}_i$ is said to be equipped with the population's variance-covariance structure matrix $\boldsymbol{\Sigma}$. Meanwhile, with retrospect to Section 2.3.1.3, when $n \ge p$, by multiplying the matrix X by its transpose $\mathbf{X}^{T}$, one obtains a Wishart real matrix $\mathbf{A} = \mathbf{X}^{T}\mathbf{X}$ distributed according to $W_p(n, \boldsymbol{\Sigma})$. This matrix belongs to the Wishart ensemble, and it is referred to as the sample covariance matrix of X. The matrix A is said to possess a Wishart distribution with n degrees of freedom, denoted as $\mathbf{A} \sim W_p(n, \boldsymbol{\Sigma})$. When this distribution exists, the $p \times p$ symmetric matrix $\mathbf{A}$ is positive definite (Johnson and Wichern 2019).
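The positive definiteness of the cross-product matrix for n > p can be checked directly (an illustrative sketch; the equicorrelated population covariance is a hypothetical choice of mine, not from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 40, 5

# Hypothetical population covariance: unit variances, 0.5 correlations.
Sigma = 0.5 * np.eye(p) + 0.5 * np.ones((p, p))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Wishart-distributed cross-product matrix A = X^T X ~ W_p(n, Sigma).
A = X.T @ X

# With n > p, A is (almost surely) symmetric positive definite, so its
# Cholesky factor exists; cholesky() would raise LinAlgError otherwise.
chol = np.linalg.cholesky(A)
```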

Statistical Distance (d) And Quadratic Form of a Matrix

In mathematics, given a p-dimensional vector $\mathbf{x}$, a $p \times p$ real and symmetric matrix $\mathbf{M}$ is positive definite if the square of the statistical distance (d) satisfies the condition in Equation 2.17.


$$0 < d^2 = \mathbf{x}^{T}\mathbf{M}\mathbf{x} \quad \text{for } \mathbf{x} \in \mathbb{R}^{p}\setminus\{\mathbf{0}\} \qquad \text{[Eq. 2-17]}$$

In the above inequality, the square of the distance d is the quadratic form of M. Quadratic forms and distances play an essential role in multivariate analysis. One

may refer to a multivariate statistical analysis book such as Johnson and Wichern (2019) for further

interest in both topics.

Biased and Unbiased Sample Covariance Matrices $\mathbf{S}_n$ and $\mathbf{S}$

From the observations on the p variables $\mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_p$, one can compute the actual values of $\sigma_{ik}$ and $\sigma_{ii}$ in terms of the sample covariance and variance $s_{ik}$ and $s_{ii}$, as given by Equation 1.57 and Equation 1.56, respectively, and then record them in a matrix $\mathbf{S}_n$ as in Equation 2.18, provided below. The matrix $\mathbf{S}_n$ is known as the sample variance-covariance or simply covariance matrix.

$$\mathbf{S}_n = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1k} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2k} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots & & \vdots \\ s_{i1} & s_{i2} & \cdots & s_{ik} & \cdots & s_{ip} \\ \vdots & \vdots & & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pk} & \cdots & s_{pp} \end{bmatrix} \qquad \text{[Eq. 2-18]}$$

The sample covariance matrix $\mathbf{S}_n$ is a biased estimator of $\boldsymbol{\Sigma}$, whereas the one without a subscript, $\mathbf{S}$, related to $\mathbf{S}_n$ through Equation 2.19, is an unbiased estimator of $\boldsymbol{\Sigma}$.

$$\mathbf{S} = \frac{n}{n-1}\,\mathbf{S}_n, \qquad s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}\big(x_{ji} - \bar{x}_i\big)\big(x_{jk} - \bar{x}_k\big), \quad i = 1, \cdots, p,\; k = 1, \cdots, p \qquad \text{[Eq. 2-19]}$$


where $\bar{\mathbf{x}} = (\bar{x}_1\ \bar{x}_2 \cdots \bar{x}_p)^{T}$ in Equation 2.19 represents the observed mean vector of X, with each $\bar{x}_i$ as defined in Equation 1.50. In many multivariate test statistics, the definition of the sample covariance $\mathbf{S}$ is the one commonly used (Johnson and Wichern 2019).
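The relation of Equation 2.19 between the biased and unbiased estimators is exact, and the unbiased form is what standard numerical libraries compute by default (an illustrative sketch; dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 25, 4

X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)          # center each column at its sample mean

S_n = Xc.T @ Xc / n              # biased sample covariance matrix
S = Xc.T @ Xc / (n - 1)          # unbiased sample covariance matrix
```

NumPy's `np.cov(X, rowvar=False)` uses the n - 1 denominator, so it matches the unbiased `S` exactly, while `S = (n / (n - 1)) * S_n` reproduces Equation 2.19.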

Sample Correlation Matrices

With retrospect to Section 1.6.6.4, by standardizing each $s_{ik}$ in the expression of Equation 2.18 to obtain the sample correlation coefficient, also known as Pearson's product-moment $r_{ik}$ defined in Equation 1.58, one can create a matrix R defined by Equation 2.20. The resulting matrix R represents the sample correlation matrix or simply the sample correlations (Johnson and Wichern 2019).

$$\mathbf{R} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1k} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2k} & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots & & \vdots \\ r_{i1} & r_{i2} & \cdots & 1 & \cdots & r_{ip} \\ \vdots & \vdots & & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pk} & \cdots & 1 \end{bmatrix} \qquad \text{[Eq. 2-20]}$$

Note that by creating a diagonal matrix $\mathbf{D}^{1/2}$ with the square roots of the diagonal entries $s_{ii}$ of $\mathbf{S}$, one can derive the expression of Equation 2.21, defining R in terms of $\mathbf{S}$.

$$\mathbf{R} = \big(\mathbf{D}^{1/2}\big)^{-1}\,\mathbf{S}\,\big(\mathbf{D}^{1/2}\big)^{-1}, \qquad r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}} \qquad \text{[Eq. 2-21]}$$
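The standardization of Equation 2.21 can be sketched numerically (illustrative only; dimensions are arbitrary) and cross-checked against a library routine that computes correlations directly:

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 50, 4

X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)              # unbiased sample covariance

# R = (D^{1/2})^{-1} S (D^{1/2})^{-1}, with D^{1/2} = diag(sqrt(s_ii)).
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv_sqrt @ S @ D_inv_sqrt
```

By construction, `R` has a unit diagonal and agrees with `np.corrcoef(X, rowvar=False)`.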


2.3.2.2 Wishart Distribution: Sampling Distribution Law of the Sample Mean $\bar{\mathbf{X}}$ and Covariance Matrix S

Since their inceptions, scholars have intensively employed Wishart random matrices as a

framework for developing various multivariate analysis methods. This class of matrices certainly owes its popularity to its classical results, particularly those formulated for real matrices under the

Gaussian assumption. As mentioned earlier in Section 2.3.1.3, the Wishart distribution is named

after its inceptor and is known as the distribution law of the sample covariance matrices. In other

words, it represents the joint distribution law of independent and repetitive sampling of

multivariate random variables. Each sampled random variable is distributed according to the

multivariate normal distribution with mean $\boldsymbol{\mu} = \mathbf{0}$ and variance-covariance matrix $\boldsymbol{\Sigma}$. Given m multivariate random variables $\mathbf{Z}_j$ independently distributed according to $N(\mathbf{0}, \boldsymbol{\Sigma})$, the Wishart distribution with m degrees of freedom, denoted as $W_m(\cdot\,|\,\boldsymbol{\Sigma})$, is defined as the distribution of the sum of independent products of the variables $\mathbf{Z}_j$. As defined, it represents the joint distribution of the m independent observations. Its expression is given by the following equation:

$$W_m(\cdot\,|\,\boldsymbol{\Sigma}) = \{\text{joint distribution of } \mathbf{Z}_1, \cdots, \mathbf{Z}_m\} = \sum_{j=1}^{m} \mathbf{Z}_j \mathbf{Z}_j^{T}$$

The general expression of the probability density of the Wishart distribution $W_m(\cdot\,|\,\boldsymbol{\Sigma})$ has a complex form, which renders its evaluation intricate. It is nevertheless interesting to provide its expression, as it helps in understanding any other distribution function derived from it. To define this density function, let $\mathbf{A}$ be the positive definite matrix as

given in Equation 2.22.


$$\mathbf{A}_{\,p \times p} = \mathbf{X}^{T}_{\,p \times n}\; \mathbf{X}_{\,n \times p} \qquad \text{[Eq. 2-22]}$$

formulated based on n independent samples, with n greater than the number of variables p. Under these

conditions, there exists a joint density function $W_n(\mathbf{A}\,|\,\boldsymbol{\Sigma})$. This function, whose expression is given in Equation 2.23, represents the joint density function of the independent samples, which happen to represent the observations of the random row vectors $\mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_n$.

$$W_n(\mathbf{A}\,|\,\boldsymbol{\Sigma}) = \frac{|\mathbf{A}|^{(n-p-1)/2}\, e^{-\mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma}^{-1})/2}}{2^{np/2}\; \pi^{p(p-1)/4}\; |\boldsymbol{\Sigma}|^{n/2} \prod_{i=1}^{p} \Gamma\big((n+1-i)/2\big)} \qquad \text{[Eq. 2-23]}$$

Using the above expression, one can demonstrate the following basic properties of the Wishart distribution:

(i) if $\mathbf{A}_1$ and $\mathbf{A}_2$ are respectively governed by $W_{m_1}(\mathbf{A}_1\,|\,\boldsymbol{\Sigma})$ and $W_{m_2}(\mathbf{A}_2\,|\,\boldsymbol{\Sigma})$, with $\mathbf{A}_1$ independent of $\mathbf{A}_2$, then the matrix $\mathbf{A}_1 + \mathbf{A}_2$ is distributed according to $W_{m_1+m_2}(\mathbf{A}_1 + \mathbf{A}_2\,|\,\boldsymbol{\Sigma})$;

(ii) if $\mathbf{A}$ is distributed as $W_m(\mathbf{A}\,|\,\boldsymbol{\Sigma})$ and C is an arbitrary (conformable) matrix, then $\mathbf{C}\mathbf{A}\mathbf{C}^{T}$ follows the Wishart distribution $W_m(\mathbf{C}\mathbf{A}\mathbf{C}^{T}\,|\,\mathbf{C}\boldsymbol{\Sigma}\mathbf{C}^{T})$.
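Both properties have exact finite-sample counterparts at the level of the defining construction (an illustrative sketch; dimensions are arbitrary): stacking two independent samples realizes the sum in (i), and transforming the samples realizes the congruence in (ii):

```python
import numpy as np

rng = np.random.default_rng(11)
p, m1, m2 = 3, 20, 30

Z1 = rng.normal(size=(m1, p))
Z2 = rng.normal(size=(m2, p))
A1 = Z1.T @ Z1                 # ~ W_p(m1, I)
A2 = Z2.T @ Z2                 # ~ W_p(m2, I)

# Property (i): stacking the samples realizes A1 + A2 ~ W_p(m1 + m2, I).
Z = np.vstack([Z1, Z2])
A_sum = Z.T @ Z

# Property (ii): C A C^T is the cross-product of the transformed samples Z C^T.
C = rng.normal(size=(p, p))
A_transformed = (Z @ C.T).T @ (Z @ C.T)
```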

Given a random sample $\mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_n$ of size n drawn from a p-dimensional normal distribution of mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, that is, $\mathbf{X}_i \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $i = 1, \ldots, n$, the following are essential results for the sampling of sample covariance matrices:

(i) the sampling mean $\bar{\mathbf{X}}$, as expressed below in Equation 2.24, follows the normal distribution $N\big(\boldsymbol{\mu}, \tfrac{1}{n}\boldsymbol{\Sigma}\big)$.

$$\bar{\mathbf{X}} = \frac{\mathbf{X}_1 + \mathbf{X}_2 + \cdots + \mathbf{X}_n}{n} \qquad \text{[Eq. 2-24]}$$


(ii) the sampling covariance matrix $(n-1)\mathbf{S}$, as in Equation 2.25, has a Wishart distribution $W_{n-1}\big((n-1)\mathbf{S}\,|\,\boldsymbol{\Sigma}\big)$ with $n - 1$ degrees of freedom (d.f.), where S represents the unbiased estimator of $\boldsymbol{\Sigma}$.

$$(n-1)\,\mathbf{S}_{\,p \times p} = \big(\mathcal{X} - \mathbf{1}\bar{\mathbf{x}}^{T}\big)^{T}_{\,p \times n}\; \big(\mathcal{X} - \mathbf{1}\bar{\mathbf{x}}^{T}\big)_{\,n \times p} \qquad \text{[Eq. 2-25]}$$

(iii) the sampling mean $\bar{\mathbf{X}}$ and covariance matrix $\mathbf{S}$ are independent and sufficient statistics. According to Johnson and Wichern (2019), sufficient statistics means that all the information about the population mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ can be found in the observations of the sampling mean $\bar{\mathbf{x}}$ and covariance matrix S, regardless of the sample size n. However, "this generally is not true for nonnormal populations." (Johnson and Wichern 2019, p. 173).

Because of the connections between the distribution law of the sampling mean $\bar{\mathbf{X}}$ associated with Wishart real matrices and the celebrated law of large numbers and central limit theorem, it is worthwhile stating both herewith.

2.3.2.3 Law of Large Numbers and the Central Limit Theorem

Because they are well-known and intensively used in statistics, this section reminds readers of the

law of large numbers and the central limit theorem. To state the law of large numbers, let $Y_1, Y_2, \cdots, Y_n$ be n independent observations, each drawn from a population with mean $E(Y) = \mu$. Then, as n increases without bound, $\bar{Y}$ converges in probability to $\mu$. In other words, given an accuracy $\varepsilon > 0$, the probability $P(-\varepsilon < \bar{Y} - \mu < \varepsilon)$ approaches unity as $n \to \infty$. The proof of this


important result, whose direct consequence is that the $\bar{X}_i$, $i = 1, 2, \cdots, p$, converge in probability to the $\mu_i$, $i = 1, 2, \cdots, p$, can be found in the book by Johnson and Wichern (2019).

Alternatively, the central limit theorem is stated as follows. Given $\mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_n$, n independent observations drawn from a population with mean $\boldsymbol{\mu}$ and finite covariance matrix $\boldsymbol{\Sigma}$, then, for large sample sizes with n large relative to p, $\sqrt{n}\,(\bar{\mathbf{X}} - \boldsymbol{\mu})$ has an approximate distribution $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$. As for the previous results, one can also find the proof of this result in the book by Johnson and Wichern (2019, p. 176). From the central limit theorem, the following significant result is also derived: $n\,(\bar{\mathbf{X}} - \boldsymbol{\mu})^{T}\mathbf{S}^{-1}(\bar{\mathbf{X}} - \boldsymbol{\mu})$ is approximately distributed according to the chi-square distribution with p degrees of freedom, denoted $\chi^2_p$, for $n - p$ large. Here, $\chi^2_p$ is defined as the probability distribution of the sum $Z_1^2 + Z_2^2 + \cdots + Z_p^2$, where each $Z_i$, with $i = 1, \cdots, p$, represents an independent $\mathcal{N}(0, 1)$ random variable (Johnson and Wichern 2019, p. 163, Tijms 2007).
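The chi-square approximation can be checked by simulation (an illustrative sketch only; the population parameters are hypothetical choices of mine). Repeatedly drawing samples and computing the quadratic form should yield values whose mean is close to p and whose variance is close to 2p, the moments of $\chi^2_p$:

```python
import numpy as np

rng = np.random.default_rng(12)
p, n, reps = 3, 200, 2000
mu = np.array([1.0, -2.0, 0.5])          # hypothetical population mean
Sigma = np.diag([1.0, 2.0, 0.5])         # hypothetical population covariance

stats = np.empty(reps)
for r in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    diff = X.mean(axis=0) - mu
    S = np.cov(X, rowvar=False)
    # n (Xbar - mu)^T S^{-1} (Xbar - mu), approximately chi-square with p d.f.
    stats[r] = n * diff @ np.linalg.solve(S, diff)
```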

2.3.2.4 White Wishart Ensemble of Real Sample Covariance Matrices

In the literature, a population with, up to a constant, an identity covariance matrix ($\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$) and zero mean vector ($\boldsymbol{\mu} = \mathbf{0}$) is referred to as the "null case." The set of sample covariance matrices constructed from sample data matrices $\mathbf{X}$ drawn from the "null case" population is known as a white Wishart ensemble and denoted by $W_p(n, \boldsymbol{\Sigma})$, with $\boldsymbol{\mathcal{S}} = \mathbf{X}^{T}\mathbf{X}$ where $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$. The term "null

case” is, as Johnstone (2001) noted, “in analogy with time-series settings where a white spectrum

is one with the same variance at all frequencies.” In nuclear physics, the spectral properties of the


white Wishart matrices, mainly the white Wishart ensemble of complex matrices, are of long-lasting and tremendous interest (Bejan 2005).

Generally, it is tricky to evaluate the integral in Equation 2.12. However, in the null case, the integral in that equation can be simplified as in Equation 2.26:

$$\int_{\mathcal{O}_p} \mathrm{etr}\left(-\frac{1}{2}\,\boldsymbol{\Sigma}^{-1}\mathbf{H}\mathbf{L}\mathbf{H}^{T}\right) d\mathbf{H} = \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{p} l_i\right) \qquad \text{[Eq. 2-26]}$$

and the density becomes as in Equation 2.27, provided below.

$$\frac{\pi^{p^2/2}\; 2^{-pn/2}\, (\det \boldsymbol{\Sigma})^{-n/2}}{\Gamma_p(n/2)\, \Gamma_p(p/2)}\, \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{p} l_i\right) \prod_{i=1}^{p} l_i^{(n-p-1)/2} \prod_{i<j}(l_i - l_j) \qquad \text{[Eq. 2-27]}$$
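The simplification in Equation 2.26 rests on an exact trace identity: for $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$, the trace $\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{H}\mathbf{L}\mathbf{H}^{T})$ equals $\sum_i l_i/\sigma^2$ for every orthogonal H, so the integrand is constant over $\mathcal{O}_p$ and the Haar integral collapses. A quick numerical check (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(13)
p, sigma2 = 6, 2.0

l = rng.uniform(0.5, 3.0, size=p)      # arbitrary positive eigenvalues
L = np.diag(l)
Sigma_inv = np.eye(p) / sigma2         # null case: Sigma = sigma^2 * I

# tr(Sigma^{-1} H L H^T) = sum(l) / sigma^2 for any orthogonal H, because
# the trace is cyclic and H^T H = I.
H, _ = np.linalg.qr(rng.normal(size=(p, p)))
trace_value = np.trace(Sigma_inv @ H @ L @ H.T)
```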

For more on the density of the eigenvalues of Wishart real matrices, including the non-null case ($\boldsymbol{\Sigma} \ne \sigma^2\mathbf{I}$) with $\boldsymbol{\mu} = \mathbf{0}$, and other properties of $W_p(n, \boldsymbol{\Sigma})$, one may refer to the works of Muirhead (2009), Anderson (2003), Bejan (2005), Dieng and Tracy (2011), and Paul and Aue (2014).

2.3.2.5 Wishart Real Model $W_p(n, \boldsymbol{\Sigma})$

From all the elements introduced in the previous sections on Wishart real matrices, one may create a Wishart model by defining the triplet $(\Omega, \mathbb{P}, \mathcal{F})$ in terms of random samples $\mathcal{X}_i$, as summarized in Table 2-1 below.

System: population with parameters ($\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$) and p measurable features $\mathbf{X}_1, \ldots, \mathbf{X}_p$; random process representation $\mathcal{X} \triangleq (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_p)$.

Probability space/triplet $(\Omega, \mathbb{P}, \mathcal{F})$:
- $\Omega$ = sample space = a group of random matrices (e.g., the GOE of $n \times n$ real symmetric matrices, or the Wishart ensemble $W_p(n, \boldsymbol{\Sigma})$).
- $\mathbb{P}$ = probability measure on $\Omega$ (e.g., the TW limit law $F_\beta(t) = \mathbb{P}(\lambda_{\max} \le t)$): (i) its expression is known but complicated to compute or simplify; (ii) it is expressed as a function of the population covariance matrix $\boldsymbol{\Sigma}$, unknown in most cases; (iii) it is approximated using sampled matrices (unbiased estimator of $\boldsymbol{\Sigma}$) and their eigenvalues.
- $\mathcal{F}$ = σ-algebra on $\Omega$; defines the subsets of $\Omega$ closed under complements and countable unions.

Input: n independent samples $\mathcal{X}_1, \mathcal{X}_2, \cdots, \mathcal{X}_n$ from the population.
Output: $\mathbf{X} \triangleq$ sample data matrix with observed values $\mathbf{x}_1, \cdots, \mathbf{x}_n$ of $\mathcal{X}$, where $\mathbf{x}_i = (x_{i1}, \cdots, x_{ip})$ [e.g., $\mathbf{x}_i \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$], recorded as the $n \times p$ matrix $\mathbf{X}_{n \times p} = (x_{ij})$.

Examples of $\Omega$: $\Omega \equiv \mathcal{O}_n \triangleq$ GOE of $n \times n$ real symmetric matrices H; $\Omega \triangleq$ GUE of $n \times n$ complex Hermitian matrices H; $\Omega \triangleq$ GSE of $2n \times 2n$ Hermitian matrices H. Each of the three triplets $(\Omega, \mathbb{P}, \mathcal{F})$ is a probability space providing a model $F_\beta(t) = \mathbb{P}(\lambda_{\max} \le t)$, with $\beta = 1, 2, 4$, where $F_\beta$ is the limit law of $\lambda_{\max}$.

Table 2-1: Summary of a Wishart Real Model

2.3.3 Bulk Spectrum Behaviors (Universality Theorems)

2.3.3.1 Empirical spectral distribution (ESD) of eigenvalues

Suppose Y is a $p \times p$ random complex Hermitian (resp. real symmetric) matrix. As defined, the eigenvalues of Y exist, are always all real numbers, and can be sorted and ordered as $\lambda_1 \le \lambda_2 \le$

This content downloaded from


182.255.0.242 on Wed, 19 Mar 2025 09:44:41 UTC
All use subject to https://about.jstor.org/terms
157

⋯ 𝜆 . Thus, the ESD, also known as the spectral statistics of the eigenvalues of Y, is defined as

the random distribution function 𝐺 𝒀 whose expression is given by Equation 2.28.

1 [Eq. 2-28]
𝐺𝒀 𝑡 𝟏 ; ∀𝑡 ∈ℝ
𝑝

As Frahm (2004, p. 100) pointed out, “an eigenvalue of a random matrix is random but per se not

a random variable.” The reason is that there is no single-valued mapping 𝒀 ↦ 𝜆ᵢ with 𝑖 = 1, ⋯, 𝑝, but instead 𝒀 ↦ 𝜆(𝒀), where 𝜆(𝒀) represents the set of all eigenvalues of Y.
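Concretely, the ESD in Equation 2.28 is simply the empirical cumulative distribution function of the p eigenvalues. A minimal NumPy helper (an illustrative sketch, not part of the dissertation's codes) evaluates it directly:

```python
import numpy as np

def esd(eigenvalues, t):
    """Empirical spectral distribution G_Y(t) of Equation 2.28:
    the fraction of eigenvalues that are <= t, evaluated at each t."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # searchsorted(..., side="right") counts the eigenvalues <= each t
    return np.searchsorted(lam, t, side="right") / lam.size
```

For a Hermitian matrix Y, `esd(np.linalg.eigvalsh(Y), t)` evaluates 𝐺_𝒀(t) at the points t.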

Nevertheless, the asymptotic behavior of the ESD, reasonably well understood now, plays a crucial

role in studying the behavior of the random matrices in RMT. For example, as the dimension p of

the matrix increases, one of the key questions that usually emerges is: does the ESD of an

adequately scaled random matrix Y converge to a probability distribution?

In the context of the two fundamental classes of random matrices introduced in the previous

sections, the celebrated semicircle for the Wigner matrices and its counterpart known as the

Marchenko-Pastur law for the Wishart matrices help answer this question in part as they are

concerned with the bulk spectrum—properties of the whole set of eigenvalues. Hence, answering

the same question from the edge spectrum or extreme eigenvalues’ perspective will be crucial in

incorporating the entire range of eigenvalues. Accordingly, the following sections state both laws

and results pertaining to the edge spectrum.


2.3.3.2 Wigner Semicircle Law - Wigner (1952, 1958)

Considered one of the greatest discoveries in physics, the semicircle law paved the way for developing a new field

of mathematics and physics devoted to studying quantum chaos, disordered systems, and

fluctuations in mesoscopic systems (Péché 2008). The following statement summarizes a

significant law discovered by Wigner (1952, 1958) while studying the energy levels of complex

nuclei. Suppose an 𝑛 × 𝑛 Wigner matrix $\mathbf{H} = \left(H_{jk}/\sqrt{n}\right)_{1 \le j,k \le n}$ whose entries 𝐻ⱼₖ are i.i.d. with a centered distribution that is independent of n and has a finite variance 𝜎² (moment of order 2). Then, as n tends to infinity, it is well known that the ESD 𝐺_𝑯 of H approaches an n-independent limiting law 𝑔, which takes a semicircular shape on the compact subset [−2𝜎, 2𝜎] of the real line. Equation 2.29 provides the often-represented expression of the Wigner semicircle law 𝑔 (Paul and Aue 2014, Péché 2008, Frahm 2004).

$g_{\mathbf{H}}(t) = \frac{dG_{\mathbf{H}}(t)}{dt} \xrightarrow[n \to \infty]{} g(t) = \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - t^2}\; \mathbf{1}_{[-2\sigma,\, 2\sigma]}(t)$   [Eq. 2-29]

When the entries are centered (𝜇 = 0) and 𝜎 = 1, Equation 2.29 becomes Equation 2.30 below.

$g(t) = \frac{1}{2\pi}\sqrt{4 - t^2}\; \mathbf{1}_{[-2,\, 2]}(t)$   [Eq. 2-30]

Equation 2.29 represents the limit law for the empirical distribution function of the eigenvalues of the random matrix H considered as not being normalized (Frahm 2004). Accordingly, depending on the normalization of the random matrix, or the scaling of its eigenvalues so that they lie in a compact set, this law takes various expressions. Among other expressions of the Wigner semicircle law found in the literature is the one provided in Equation 2.31 below (see Tracy and Widom 1992 and Frahm 2004).

$g(t) = \frac{2}{\pi}\sqrt{1 - t^2}\; \mathbf{1}_{[-1,\, 1]}(t)$   [Eq. 2-31]

For instance, to illustrate the Wigner semicircle law in MATrix LABoratory (MATLAB), one can easily use the function “randn” to generate 25 symmetric matrices of size 100 × 100 whose entries are i.i.d. Gaussian random variables of mean 0 and variance 1, and plot the density of their normalized eigenvalues in the form of the histogram depicted in Figure 2.2(a). In addition, to verify the universality of this law, one can replace the Gaussian distribution with the uniform distribution on [-1, 1] (“rand”) and plot the density of the normalized eigenvalues to obtain the histogram in Figure 2.2(b), which shows that the law still holds.


Figure 2.2: Illustrations of the Wigner Semicircle Law: Random Matrices with Normally
and Uniformly Distributed Entries
Adapted from Tracy and Widom (1992)
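The same experiment can be sketched outside MATLAB. The following Python/NumPy fragment (an illustrative translation, not the original code; the function names are choices made here) generates symmetric Gaussian matrices, normalizes their eigenvalues by √n, and provides the semicircle density of Equation 2.30 for comparison:

```python
import numpy as np

def normalized_wigner_eigenvalues(n=100, trials=25, seed=0):
    """Eigenvalues of `trials` n-by-n symmetric matrices with i.i.d.
    N(0, 1) entries, scaled by 1/sqrt(n) so that the ESD approaches
    the semicircle law on [-2, 2]."""
    rng = np.random.default_rng(seed)
    eigs = []
    for _ in range(trials):
        a = rng.standard_normal((n, n))
        h = (a + a.T) / np.sqrt(2.0)   # symmetrize; off-diagonal variance 1
        eigs.append(np.linalg.eigvalsh(h) / np.sqrt(n))
    return np.concatenate(eigs)

def semicircle_density(t, sigma=1.0):
    """Wigner semicircle density of Equations 2.29/2.30."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 2 * sigma,
                    np.sqrt(np.clip(4 * sigma ** 2 - t ** 2, 0.0, None))
                    / (2 * np.pi * sigma ** 2), 0.0)
```

A histogram of `normalized_wigner_eigenvalues()` overlaid with `semicircle_density` reproduces the shape of Figure 2.2(a); swapping `standard_normal` for a centered uniform draw reproduces Figure 2.2(b).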


2.3.3.3 Marchenko-Pastur Law (1967)

Deemed an analog of the Wigner semicircle law for sample covariance matrices, the following is a significant result proved by Marchenko and Pastur (1967). To state this result for the white case and the general case of sample covariance matrices, let 𝑿 be an 𝑛 × 𝑝 random matrix whose entries 𝑋ᵢⱼ (𝑖 = 1, ⋯, 𝑛; 𝑗 = 1, ⋯, 𝑝) and associated sample covariance matrix 𝑺 are defined in Table 2-2 below. In addition, one may consider the two sequences of integers n, for the sample size, and 𝑝 = 𝑝(𝑛), for the number of variables or dimension, as follows: 𝑝 → ∞ as 𝑛 → ∞, and lim_{𝑛→∞} 𝑝/𝑛 = 𝛾 ∈ (0, ∞).


Cases | Sample Covariance Matrix Formulation | Null Hypothesis Expression

White case | $\mathbf{S}_{\mathbf{X}} = \frac{1}{n}\mathbf{X}^{T}\mathbf{X}$, where the 𝑋ᵢⱼ are centered (mean subtracted out) i.i.d. samples from a distribution with a finite variance 𝜎² | 𝜮 = 𝑰

Non-white case (general) | $\mathbf{S}(\boldsymbol{\Sigma}) = \frac{1}{n}\boldsymbol{\Sigma}^{1/2}\mathbf{X}^{T}\mathbf{X}\boldsymbol{\Sigma}^{1/2}$, where the 𝑋ᵢⱼ are centered (mean subtracted out) i.i.d. samples from a distribution with variance 1 | 𝜮 = 𝜎²𝑰

Table 2-2: Components of the Marchenko-Pastur Law

Then, as 𝑛 → ∞, the empirical distribution of the eigenvalues 𝐺_𝑺(𝑡) of the sample covariance matrix 𝑺 almost surely converges in distribution to the Marchenko-Pastur law denoted by 𝐺_𝛾 (see Johnstone 2001, Frahm 2004, Péché 2008, Paul and Aue 2014). If 𝐺_𝛾 has a p.d.f. 𝑔_𝛾, then one may derive the following Equation 2.32 from Equation 2.28.

$g_\gamma(t) = \frac{dG_\gamma(t)}{dt} = \frac{\sqrt{(b - t)(t - a)}}{2\pi\gamma t \sigma^2}\; \mathbf{1}_{[a,\, b]}(t)$   [Eq. 2-32]


where $a = \sigma^2 \left(1 - \sqrt{\gamma}\right)^2$ and $b = \sigma^2 \left(1 + \sqrt{\gamma}\right)^2$.

As Paul (2014) noted, and as shown in Figure 2.3 below, the implication of this result is the spreading of the eigenvalues of S around their population counterpart (all 𝜆₁ = 𝜆₂ = ⋯ = 𝜆ₚ = 1 when 𝜎² = 1), and the increase in the spread (𝑏 − 𝑎) as the ratio 𝛾 = 𝑝/𝑛 increases from 0 to 1 (see the inset in Figure 2.3). In addition, the larger 𝛾, the more spread out the eigenvalues; even asymptotically, the spread remains. Furthermore, when 𝛾 = 1, corresponding to 𝑛 = 𝑝, the greatest and least eigenvalues approach 4 and 0, respectively. Last, when 𝛾 → 0, both the largest (b) and the smallest (a) eigenvalues of S converge toward 1, and the limit law no longer holds because lim_{𝛾→0} 𝑔_𝛾(𝑡) → ∞.



Figure 2.3: Marchenko-Pastur Law: Plots of the Density Function 𝑔_𝛾(𝑡) for 𝛾 = 1.00 (n = p, [a, b] = [0, 4]), 𝛾 = 0.50 (n = 2p, [a, b] = [0.09, 2.91]), 𝛾 = 0.25 (n = 4p, [a, b] = [0.25, 2.25]), 𝛾 = 0.10 (n = 10p, [a, b] = [0.47, 1.73]), and 𝛾 = 0.05 (n = 20p, [a, b] = [0.6, 1.5])
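Equation 2.32 and its support endpoints a and b are straightforward to evaluate numerically. The sketch below (an illustrative helper with names chosen here, defaulting to σ² = 1) reproduces the curves of Figure 2.3 for any 0 < γ ≤ 1:

```python
import numpy as np

def mp_support(gamma, sigma2=1.0):
    """Support endpoints a = sigma^2 (1 - sqrt(gamma))^2 and
    b = sigma^2 (1 + sqrt(gamma))^2 of the Marchenko-Pastur law."""
    a = sigma2 * (1.0 - np.sqrt(gamma)) ** 2
    b = sigma2 * (1.0 + np.sqrt(gamma)) ** 2
    return a, b

def mp_density(t, gamma, sigma2=1.0):
    """Marchenko-Pastur density g(t) of Equation 2.32 (0 < gamma <= 1)."""
    a, b = mp_support(gamma, sigma2)
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    inside = (t > a) & (t < b)
    ti = t[inside]
    out[inside] = np.sqrt((b - ti) * (ti - a)) / (2.0 * np.pi * gamma * ti * sigma2)
    return out
```

For γ = 0.25, this gives [a, b] = [0.25, 2.25], matching the legend of Figure 2.3, and the density integrates to 1 over its support.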

Over the years, considerable literature has rediscovered and extended this limiting law (Johnstone 2001). For instance, the law has been extended to a much broader class of random sample covariance matrices, such as the non-white case 𝑺(𝜮) (Péché 2008), or 𝑺_𝑿 with X a non-symmetric but square sample data matrix (Frahm 2004). In addition, the law remains applicable even if the sample size n is smaller than the number of variables p (Frahm 2004), that is, 𝛾 > 1.

2.3.3.4 Final Thoughts on Bulk Behaviors

The study of the asymptotic behaviors or properties of the largest eigenvalues of the Wigner

random matrices and sample covariance matrices has several other exciting applications (e.g., see

Patterson et al. 2006 for application to genetics, Plerou et al. 2002 for applications to mathematical

finance). The Wigner semicircle and Marchenko-Pastur laws can be perceived as a so-called


universality result. They both apply to the study of global statistics of the spectrum of random

matrices (e.g., rates of convergence, large deviations) under the assumption of a finite variance of

the matrix entries. One may refer to Bai and Yin (1988) and Péché (2003) for more on the topic.

Nevertheless, as with the Central Limit Theorem (see Section 2.3.2.3), the asymptotic global behavior of the eigenvalues of Wigner random matrices and sample covariance matrices is not contingent on the characteristics of the sampling distribution law of the matrices' entries beyond 𝜮 and 𝜎² (Péché 2008). In addition, the asymptotic global behavior has already been proven by various

authors (e.g., see Deift and Gioev 2007).

Accordingly, the following sections will only be concerned with local properties of the spectrum of large random matrices; more specifically, with the asymptotic behavior of the largest eigenvalues, emphasizing their asymptotically universal properties. Why such a focus? For many scholars, such as Péché (2008) and Paul and

Aue (2014), the first motivation originates from mathematical statistics, which is concerned with

the largest eigenvalues of sample covariance matrices (high-dimensional data). For example, in

PCA, the behavior of the principal components as the number of variables 𝑝 → ∞, and the sample

size n is kept fixed is now well known. Unlike the traditional assumptions, the current trend has

been toward studying the case where n is of the same magnitude as p. The second motivation is

that the limiting behavior of the largest eigenvalues of the non-white Wishart random matrices

(S(𝛴)) is crucial for testing hypotheses on the population covariance matrix 𝛴. For instance, when

the tests involve the null hypothesis H0 and its alternative H1, one may propose a test of H0 to

study the asymptotic distribution of the extreme eigenvalues under H0.


Before providing the details on the asymptotic properties of the largest eigenvalues, it is important to make the following observation. If b denotes the common top edge of the support of the Wigner semicircle and Marchenko-Pastur distributions, then one can derive, as given by Equation 2.33, that almost surely:

$\liminf_{n \to \infty} \lambda_{\max} \ge b$   [Eq. 2-33]

As a result, the following significant questions arise: Does the largest eigenvalue almost surely converge to b? What, then, is the joint limiting distribution of the largest eigenvalues? Is this limiting distribution a universal law like the Wigner semicircle or the Marchenko-Pastur law? Finally, if so, what is the class (ensemble) of this limiting distribution law? Responses to these questions will be offered briefly in the following developments.

2.3.4 Edge Spectrum and Universality of the Tracy-Widom Distributions

This section discusses the joint limiting distribution of the largest eigenvalues of matrices belonging to the Gaussian ensembles (GOE, GUE, and GSE) and a broader class of matrices. The theorems chosen from various writers are noteworthy results that answer several of the questions posed at the end of Section 2.3.3. They are all based on the seminal work of Tracy and Widom (1993, 1994, 1996), who identified the limit laws for the theoretical distribution of the largest eigenvalue 𝜆ₘₐₓ (sometimes denoted by 𝑙₁) of GOE, GUE, and GSE matrices. The following is their key result concerning these distribution laws.

Let 𝑨 = (𝐴ⱼₖ) be a matrix that is an element of one of the ensembles GOE, GUE, or GSE specified in Section 2.3.1.2, and whose random entries 𝐴ⱼₖ are centered and normalized so as to obtain nontrivial limits. If the following function denotes the distribution function for the largest eigenvalue of A as given by Equation 2.34,

$F_{n,\beta}(t) := \mathbb{P}_{n,\beta}\left(\lambda_{\max} \le t\right), \quad \beta = 1, 2, 4,$   [Eq. 2-34]

then the limiting laws provided by Equation 2.35

$F_\beta(x) := \lim_{n \to \infty} F_{n,\beta}\!\left(2\sigma\sqrt{n} + \frac{\sigma x}{n^{1/6}}\right), \quad \beta = 1, 2, 4,$   [Eq. 2-35]

exist and are explicitly provided by Equation 2.36(a) to Equation 2.36(c).

(a)  $F_2(s) = \mathbb{P}(l_1 \le s) = \exp\!\left(-\int_s^{\infty} (x - s)\, q(x)^2\, dx\right)$   [Eq. 2-36]

(b)  $F_1(s) = \mathbb{P}(l_1 \le s) = \left[F_2(s)\right]^{1/2} \exp\!\left(-\frac{1}{2}\int_s^{\infty} q(x)\, dx\right)$

(c)  $F_4(s) = \mathbb{P}(l_1 \le s) = \left[F_2(s)\right]^{1/2} \cosh\!\left(\frac{1}{2}\int_s^{\infty} q(x)\, dx\right)$

With reference to Section 2.3.1.2, 𝜎 is the standard deviation of the Gaussian distribution on the

off-diagonal matrix elements in the above equations, and q denotes the unique solution to the

Painlevé II equation, the formula for which is provided by Equation 2.37.

$q''(x) = x\, q(x) + 2\, q(x)^3$   [Eq. 2-37]

in a way that 𝑞(𝑥) ~ 𝐴𝑖(𝑥) as 𝑥 → ∞, where 𝐴𝑖(𝑥) is the solution to the Airy equation, which decays at +∞ like the function provided in Equation 2.38:

$\frac{1}{2\sqrt{\pi}\, x^{1/4}} \exp\!\left(-\frac{2}{3} x^{3/2}\right)$   [Eq. 2-38]


Meanwhile, it is worth mentioning that the six Painlevé differential equations developed a century

ago have several applications in diverse branches of contemporary physics. The general solutions

to these equations, on the other hand, are transcendental. In other words, they cannot be expressed

in terms of any previously defined function, including any commonly used special functions (Zeng

and Hou, 2012). Nevertheless, as previously stated, the Gaussian ensembles are characterized by invariant measures. Therefore, the density functions 𝑓₁, 𝑓₂, and 𝑓₄ (with $f_\beta = \frac{dF_\beta}{dx}$) of the largest eigenvalues associated with the TW distributions (Lebesgue measures) 𝐹₁, 𝐹₂, and 𝐹₄ exist and are depicted in Figure 2.4.

Figure 2.4: Joint Density Functions f1, f2, and f4 of the Largest
Eigenvalues Associated with the TW Laws F1, F2, and F4
Courtesy of Dieng and Tracy (2011)

Note that these are only asymptotic graphs obtained with the approximations of 𝐹₁, 𝐹₂, and 𝐹₄ as 𝑥 → ±∞. This is referred to as the tail behavior or the edge scaling limit of these distributions; it, together with their statistics, will be discussed in further detail in the following sections.


Following the introduction of the Tracy-Widom distributions, the following are key theorems referred to as universality theorems (e.g., see Tracy and Widom 2008). As universality theorems, they relax the Gaussian and invariance assumptions required for applying the limit laws 𝐹₁, 𝐹₂, and 𝐹₄, thus extending their applicability to a variety of complex processes that are not necessarily Gaussian in nature.

2.3.4.1 Theorem 2.1 (Johnstone 2001)

Let 𝑨 be an element of the Wishart ensemble 𝑊ₚ(𝑛, 𝑰) with its eigenvalues ordered as follows: 𝑙₁ ≥ ⋯ ≥ 𝑙ₚ. For more on this class of matrices, one may refer to Section 2.3.1.3. In addition, let the centering and scaling constants 𝜇ₙₚ and 𝜎ₙₚ be as follows in Equation 2.39.

(a)  $\mu_{np} = \left(\sqrt{n-1} + \sqrt{p}\right)^2$   [Eq. 2-39]

(b)  $\sigma_{np} = \left(\sqrt{n-1} + \sqrt{p}\right)\left(\frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{p}}\right)^{1/3}$

The following result establishes, under the null hypothesis 𝐻₀: 𝜮 = 𝑰 (versus 𝐻₁: 𝜮 ≠ 𝑰) and the requirements on n and p specified below, that the largest eigenvalue 𝑙₁ of A converges in law to the edge eigenvalue distribution function 𝐹₁ for the GOE (see Equation 2.36(b)).

If 𝑛, 𝑝 → ∞ such that 𝑛/𝑝 → 𝛾, with 0 < 𝛾 < ∞, then Equation 2.40 defines the limit law of 𝑙₁.

$\frac{l_1 - \mu_{np}}{\sigma_{np}} \xrightarrow{\ \mathcal{D}\ } F_1(s, 1)$   [Eq. 2-40]


Karoui (2003) extended Johnstone (2001)'s result to 𝛾 ∈ ℝ₊ by demonstrating that the result is true regardless of whether 𝑝/𝑛 → ∞ or 𝑛/𝑝 → ∞. In his sequel, he changed n and p in 𝜇ₙₚ and 𝜎ₙₚ, as specified in Equation 2.39, to 𝑛₁ = max(𝑛, 𝑝) and 𝑝₁ = min(𝑛, 𝑝). As Johnstone (2006) stated, weakening the assumptions behind his 2001 breakthrough has significant statistical value since, in many situations, 𝑝 ≫ 𝑛. Later, Johnstone (2006) published an ad hoc modification to make a second-order correction to his 2001 result, which surprisingly resulted in an improvement in the accuracy of the Tracy-Widom approximation. The ad hoc modification merely altered the formulae for the scaling function constants 𝜇ₙₚ and 𝜎ₙₚ from Equation 2.39(a) and Equation 2.39(b) to Equation 2.41(a) and Equation 2.41(b), respectively.

(a)  $\mu_{np} = \left(\sqrt{n - \tfrac{1}{2}} + \sqrt{p - \tfrac{1}{2}}\right)^2$   [Eq. 2-41]

(b)  $\sigma_{np} = \left(\sqrt{n - \tfrac{1}{2}} + \sqrt{p - \tfrac{1}{2}}\right)\left(\frac{1}{\sqrt{n - \tfrac{1}{2}}} + \frac{1}{\sqrt{p - \tfrac{1}{2}}}\right)^{1/3}$
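In code, the two sets of centering and scaling constants are one-liners. The sketch below (an illustrative implementation; the function names are choices made here) computes the scaled largest eigenvalue of a Wishart matrix 𝑨 = 𝑿ᵀ𝑿 for use with the F₁ approximation of Theorem 2.1:

```python
import numpy as np

def johnstone_constants(n, p):
    """Centering/scaling constants mu_np, sigma_np of Equation 2.39."""
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
    sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (
        1.0 / np.sqrt(n - 1) + 1.0 / np.sqrt(p)) ** (1.0 / 3.0)
    return mu, sigma

def johnstone_constants_2006(n, p):
    """Second-order corrected constants of Equation 2.41."""
    mu = (np.sqrt(n - 0.5) + np.sqrt(p - 0.5)) ** 2
    sigma = (np.sqrt(n - 0.5) + np.sqrt(p - 0.5)) * (
        1.0 / np.sqrt(n - 0.5) + 1.0 / np.sqrt(p - 0.5)) ** (1.0 / 3.0)
    return mu, sigma

def scaled_largest_eigenvalue(x):
    """(l1 - mu_np) / sigma_np for A = X^T X, approximately TW(beta = 1)
    under H0: Sigma = I when n and p are large."""
    n, p = x.shape
    l1 = np.linalg.eigvalsh(x.T @ x)[-1]
    mu, sigma = johnstone_constants_2006(n, p)
    return (l1 - mu) / sigma
```

Under H₀ the statistic should fall in the bulk of the TW₁ law (roughly between −4 and 2); unusually large values argue against 𝜮 = 𝑰.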

Meanwhile, the following result by Soshnikov (2002) generalizes Johnstone (2001)'s theorem to the mth, also referred to as the next-largest, eigenvalues of A. For the mth largest eigenvalue distribution, one may refer to Section 2.3.4.4. It should be noted that Johnstone's work followed that of Johansson (2000), who proved a limit theorem for the largest eigenvalue of a complex Wishart matrix. However, due to the fundamental difference in constructing real and complex models, these two distinct models necessitated independent investigations. Soshnikov (2002) extended Johnstone and Johansson's findings by demonstrating that the same limiting laws apply to the covariance matrices of subgaussian real (complex) populations in the following mode of convergence: 𝑛 − 𝑝 = 𝒪(𝑝^{1/3}).

2.3.4.2 Theorem 2.2 (Soshnikov 2002)

If 𝑛, 𝑝 → ∞ such that 𝑛/𝑝 → 𝛾, with 0 < 𝛾 < ∞, then Equation 2.42 defines the Tracy-Widom distribution 𝐹₁ as the limiting distribution law of the mth largest eigenvalue, 𝑙ₘ, of the sample covariance matrix 𝑨.

$\frac{l_m - \mu_{np}}{\sigma_{np}} \xrightarrow{\ \mathcal{D}\ } F_1(s, m), \quad m = 1, 2, \cdots$   [Eq. 2-42]

Thanks to the latest results on the distribution of the mth largest eigenvalues for the GOE and GSE by Dieng (2005), Dieng and Tracy (2011) remarked that the additional assumption 𝑛 − 𝑝 = 𝒪(𝑝^{1/3}), under which Soshnikov proved his 2002 result, stated here as Theorem 2.2, could be removed. Note that the distribution of the mth largest eigenvalues for the GUE was already examined by Tracy and Widom (1994). Consequently, Karoui (2003) extended Theorem 2.2 to 0 ≤ 𝛾 ≤ ∞. As Dieng and Tracy (2011) pointed out, the extension is critical for modern statistics, which frequently encounters applications where 𝑝 ≫ 𝑛.

Furthermore, Soshnikov (2002) and Péché (2008) lifted the Gaussian assumption, thereby re-establishing an 𝐹₁ universality theorem. In other words, as Tracy and Widom (2009) specified, they assumed that the data matrix X's elements 𝑥ᵢⱼ are independent random variables with a common symmetric distribution and moments that grow no faster than Gaussian ones. For a description of the amended centering and norming constants similar to 𝜇ₙₚ and 𝜎ₙₚ necessary to generalize Soshnikov (2002)'s theorem, one may refer to Péché (2008). However, it is crucial to redefine matrix A and state the new assumptions in the case of real sample covariance matrices. Accordingly, let A be the sample covariance matrix of an 𝑛 × 𝑝 data matrix X, such that 𝑨 = 𝑿ᵀ𝑿, satisfying the following four conditions:

(i) 𝔼[𝑥ᵢⱼ] = 0 and 𝔼[𝑥ᵢⱼ²] = 1,

(ii) the random variables 𝑥ᵢⱼ follow symmetric distribution laws,

(iii) all even moments of 𝑥ᵢⱼ are finite, and their decay rate at infinity is at least as fast as Gaussian: 𝔼[𝑥ᵢⱼ²ᵐ] ≤ (𝑐𝑚)ᵐ, where 𝑐 is a constant,

(iv) 𝑛 − 𝑝 = 𝒪(𝑝^{1/3}).

Then, as stated in Section 2.3.4.2, the preceding Theorem 2.2 is restated using the same limit law as Equation 2.42, but under the four conditions (i) through (iv) stated above.

Moreover, Péché (2009)'s remarkable contribution to Soshnikov's work extended this result to the scenario where the ratio 𝛾 approaches an arbitrary finite number, with another strategy for the cases where 𝛾 tends to infinity or to zero.

In the meantime, Deift and Gioev (2007), who had already expanded on the early work of Tracy and Widom (1998, 1999) by establishing 𝐹₁ universality in the bulk for the GOE, also proved the universality at the edge of the GOE. They obtained their result by replacing the Gaussian weight function exp(−𝑥²) by exp(−𝑉(𝑥)) in the expression of the joint density function ℙ of the largest eigenvalues of randomly selected matrices from the GOE, where V stands for an even-degree polynomial with a positive leading coefficient (Deift and Gioev 2007, Dieng and Tracy 2011). Notably, the authors also established comparable results for the GSE and GUE.


2.3.4.3 Limiting Law in Terms of Baik et al. (1999) and Tracy and Widom (2000)

Let A be a randomly chosen matrix from one of the (finite n) GOE, GUE, or GSE. If the eigenvalues are ordered as 𝑙₁ ≥ 𝑙₂ ≥ ⋯, one can compute the rescaled mth eigenvalue 𝑙̂ₘ, measured from the edge of the spectrum, using Equation 2.43 below.

$\hat{l}_m = \frac{l_m - \sqrt{2n}}{2^{-1/2}\, n^{-1/6}}, \quad m = 1, 2, \cdots$   [Eq. 2-43]

It is worth noting where the scaling formula for 𝑙̂ₘ came from: Baik et al. (1999) and Johansson

(1998). Baik et al. (1999) derived the distribution of the length of the longest increasing sequence

of random permutations using the connection between Robinson-Schensted-Knuth (RSK) type

combinatorial probability and random matrix theory distribution functions. Later, Tracy and

Widom (2000), Dieng and Tracy (2011), and other scholars extended their findings to derive the

following critical result for only the Gaussian Ensembles.

For the largest eigenvalue in the β-ensembles (𝛽 = 1, 2, 4), it was proven that 𝑙̂₁ in Equation 2.43 is governed by the Tracy-Widom distributions, as stated in Equation 2.44 below (Tracy and Widom 2000, Dieng 2005, Dieng and Tracy 2011).

$\hat{l}_1 \xrightarrow{\ \mathcal{D}\ } F_\beta, \quad \beta = 1, 2, 4$   [Eq. 2-44]
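Equation 2.43's rescaling is easy to check by simulation. The following sketch (an illustrative experiment, using the GOE convention in which the spectrum edge sits near √(2n); the function name is a choice made here) produces samples whose histogram approaches the F₁ density of Figure 2.4:

```python
import numpy as np

def goe_scaled_largest(n=200, trials=50, seed=0):
    """Rescaled largest eigenvalues (Equation 2.43 with m = 1) of GOE
    matrices H = (A + A^T)/2 with A_ij ~ N(0, 1); the off-diagonal
    variance of 1/2 places the spectrum edge near sqrt(2 n)."""
    rng = np.random.default_rng(seed)
    out = np.empty(trials)
    for k in range(trials):
        a = rng.standard_normal((n, n))
        h = (a + a.T) / 2.0
        l1 = np.linalg.eigvalsh(h)[-1]
        out[k] = (l1 - np.sqrt(2.0 * n)) / (2.0 ** -0.5 * n ** (-1.0 / 6.0))
    return out
```

The sample mean of the output should sit near the TW₁ mean of about −1.21 for moderately large n and many trials.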

2.3.4.4 Distributions of the Next-Largest, Next-Next-Largest, and Subsequent Eigenvalues

Tracy and Widom (1994, 1996) showed that the theoretical distribution law of 𝑙̂₁ in Equation 2.44, depending on whether A is an element of the GOE, GUE, or GSE, is 𝐹₁ in Equation 2.36(b), 𝐹₂ in Equation 2.36(a), or 𝐹₄ in Equation 2.36(c), respectively. Dieng and Tracy (2011) present the distribution law of the mth largest eigenvalue in the GOE, GUE, and GSE. For the study of the largest eigenvalues, not only the first largest ones but also the subsequent largest ones are important. For instance, the second largest eigenvalue of a Ramanujan graph, which has critical applications in communication network theory, is well modeled by the 𝛽 = 1 Tracy-Widom distribution (Miller et al. 2008). These graphs are crucial as they enable the construction of superconcentrator and nonblocking networks in coding theory and cryptography. The following are the expressions of their distribution laws in terms of the Tracy-Widom distributions in the case of the GOE, GUE, and GSE.

Case 𝛽 = 2, GUE

Let 𝐹₂(𝑠, 0) ≡ 0; then Tracy and Widom (1994) derived the expression of the series in Equation 2.45 below,

$F_2(s, m+1) = F_2(s, m) + \frac{(-1)^m}{m!} \frac{d^m}{d\lambda^m} D_2(s, \lambda)\Big|_{\lambda=1}, \quad m \ge 0,$   [Eq. 2-45]

where 𝐷₂(𝑠, 𝜆), given in Equation 2.46, has the following Painlevé representation, in which 𝑞(𝑥, 𝜆) is the solution to Equation 2.37 such that 𝑞(𝑥, 𝜆) ~ √𝜆 𝐴𝑖(𝑥) as 𝑥 → ∞.

$D_2(s, \lambda) = \exp\!\left(-\int_s^{\infty} (x - s)\, q(x, \lambda)^2\, dx\right)$   [Eq. 2-46]

Case 𝛽 = 1, 4, GOE and GSE

Let 𝐹_𝛽(𝑠, 0) ≡ 0; then a combinatorial argument analogous to the one that led to the recurrence relation in Equation 2.45 helps derive the one in Equation 2.47 below,

$F_\beta(s, m+1) = F_\beta(s, m) + \frac{(-1)^m}{m!} \frac{d^m}{d\lambda^m} D_\beta(s, \lambda)\Big|_{\lambda=1}, \quad m \ge 0, \ \beta = 1, 4,$   [Eq. 2-47]

where one can obtain the expression of 𝐷_𝛽(𝑠, 𝜆), similar to that of 𝐷₂(𝑠, 𝜆) in Equation 2.46, by replacing 𝑞(𝑥) in Equation 2.36(b) and Equation 2.36(c) with 𝑞(𝑥, 𝜆).

Alternatively, Dieng (2005) exploited the interlacing property between the GOE and GSE: “In the appropriate scaling limit, … More generally, the joint distribution of every second eigenvalue in the GOE coincides with the joint distribution of all the eigenvalues in the GSE, with an appropriate number of eigenvalues” (Dieng and Tracy 2011, p. 13). He thereby derived, in the edge scaling limit, the limiting distributions for the mth largest eigenvalues in the GOE and GSE, with the corollary given in Equation 2.48.

$F_4(s, m) = F_1(s, 2m), \quad m \ge 1$   [Eq. 2-48]

For the implementation of this remarkable result to compute 𝐹₁(𝑠, 𝑚) and its related density functions 𝑓₁(𝑠, 𝑚), a numerical scheme developed in MATLAB by Dieng (2005) is available (see Section 2.3.6.2). Figure 2.5 shows plots created for validation purposes. As depicted, the solid curves are, from right to left, the theoretical limiting densities for the first through the fourth largest eigenvalue, overlaid on points obtained from 10⁴ realizations of 10³ × 10³ generated GOE matrices.


Figure 2.5: Illustrations of Theoretical Limiting Densities for the 1st (Rightmost Curve) through 4th (Leftmost Curve) Largest Eigenvalue of 10⁴ Realizations of 10³ × 10³ GOE Matrices
Courtesy of Dieng and Tracy (2011, p. 15)

The stunning graph above, borrowed from Dieng and Tracy (2011), ends this section on the limiting distribution of the largest eigenvalues of a particular class of matrices under the discussed assumptions. As this section has shown, numerous academics have established that the Tracy-Widom laws are the joint limiting distribution of the mth largest scaled and centered eigenvalues at the spectrum's edge for a broad class of matrices. Because this work contributed to relaxing numerous constraints, particularly the Gaussian assumptions imposed on earlier versions of the universality-type theorems, the following section elaborates on the concept of universality discussed here.

2.3.5 The Universality of the Tracy-Widom Distributions

This section aims to expand more on the concept of the universality of the Tracy-Widom laws

outlined previously. In relation to the previous section, significant work has been devoted in the


recent decade to attaining the universality of results on the behavior of random matrices'

eigenvalues. The term "universality" refers to the fact that the limiting behavior of the eigenvalue

statistics is independent of the distribution of the matrix’s entries (Paul and Aue 2014). While the

statement may not hold in all cases, in many cases, the behavior of both bulk and edge eigenvalues

is primarily determined by the first four moments of the distribution of the entries (e.g., see

Soshnikov 2002). The investigation into limiting ESDs conducted by various researchers,

including contemporary ones, demonstrated that the behavior is universal at the level of first-order

convergence. Their findings depended on the assumption that the entries of the sample data matrix are standardized independent random variables satisfying a Lindeberg-type condition.

With the work of Soshnikov, who proved the Tracy-Widom limit of the normalized largest

eigenvalues on Wigner (Soshnikov, 1999) and Wishart (Soshnikov, 2002) matrices, the more

refined characteristics, such as the limiting distribution of normalized extreme (or edge)

eigenvalues, began to receive increased attention. However, these results still required the

existence of all moments (particularly sub-Gaussian tails), symmetry of the entry distribution, and,

in the Wishart case, an assumption that the dimension to sample size ratio approaches one. As

noted in the previous section, Péché (2009) extended Soshnikov's (2002) results by allowing the

dimension-to-sample-size ratio to approach any nonnegative value. Significant progress has been

made in Péché and Soshnikov's relaxation of the symmetry requirement and Gaussian

assumptions. As expressed in terms of the limiting behavior of the correlation functions of the

eigenvalues, bulk universality has been achieved using various methods (e.g., Johansson 2001,

Ben Arous and Péché 2005). In addition, multiple authors, such as Erdős and Yau (2012) and Tao

and Vu (2010, 2012), have made significant new developments in the universality phenomenon.


They have been using analytical techniques to study bulk and edge universality questions. Through

their remarkable work, they have managed to remove many restrictions on the distribution of the

matrices’ entries.

For instance, Tao and Vu (2010), through a “four moments theorem,” effectively demonstrated the

universality of local eigenvalue statistics at the spectrum's edge for the Wigner and Wishart cases.

In another example, Erdős and Yau (2012) extended previous universality results based on a local

semicircle law for Wigner matrices to so-called generalized Wigner matrices. Feldheim and Sodin

(2010) and Bao et al. (2012) all investigated universality at the extremes of the spectrum of sample

covariance and correlation matrices. Benaych-Georges et al. (2012) investigated large deviations

of the extreme eigenvalues. Bao et al. (2015) established Tracy-Widom universality for suitable

normalized largest eigenvalues of generally distributed sample covariance matrices (non-white

case). Bao et al. (2015) examined their findings regarding their applications to statistical signal

detection and structure recognition of separable covariance matrices, as have most other

researchers in various fields. For more on this topic, one may refer to the exhaustive review by

Paul and Aue (2014). This section has expanded on the concept of the Tracy-Widom limit laws'

universality, which has dramatically aided researchers in various fields in explaining the behavior

of the eigenvalues of random matrices whose entries are not necessarily Gaussian, and thus the behavior of the complex processes they represent.


2.3.6 Approximation of the Tracy-Widom Distributions

2.3.6.1 Left and Right Tail Behavior of 𝑭𝜷

It is well known that for large n, the PDF of 𝜆ₘₐₓ “consists of a central part characterized by Tracy-Widom distributions edged on both sides by a couple of large deviations tails” (Majumdar and Schehr 2014). As depicted in Figure 2.6 below, these left and right tails correspond to very different physics when interpreted in terms of the underlying Coulomb gas: the left corresponds to a pushed Coulomb gas, whereas the right corresponds to a pulled Coulomb gas. The third chapter will cover both topics in detail, as they explain phase transitions in various physical problems. In this section, however, it is essential to include the right tail behavior of 𝐹_𝛽.

Figure 2.6: Illustration of the Left and Right Tail Behavior of the TW Fβ
Courtesy of Majumdar and Schehr (2014)

Below, in Equation 2.49, is the right tail of the 𝛽-TW distribution, which Dumaz and Virág (2013) derived from the Airy operator representation.

$1 - F_\beta(x) = x^{\mathcal{O}(1)} \exp\!\left(-\frac{2}{3}\beta x^{3/2}\right)$   [Eq. 2-49]


2.3.6.2 Tracy-Widom Distribution: Dieng’s Approach

Dieng (2005) provided in his Ph.D. thesis MATLAB codes that are greatly valuable for evaluating and plotting the TW density and distribution functions. Three arguments are required for each function: “the first is ‘beta,’ which is the beta of RMT, so it can be 1, 2, or 4; then ‘n,’ which is the eigenvalue needed; finally, ‘s,’ which is the value where you want to evaluate the function.” Equation 2.50 and Equation 2.51 are the asymptotic expansions of the function q needed to compute the Tracy-Widom distributions F₁, F₂, and F₄ depicted in Figure 2.4. These codes can be obtained by contacting the author. As a result, requested codes were received for this research project, courtesy of the author and Dr. Craig A. Tracy.

$q(t) \sim \sqrt{-\tfrac{t}{2}}\left(1 + \frac{1}{2^{3}t^{3}} - \frac{73}{2^{7}t^{6}} + \frac{10657}{2^{10}t^{9}} - \frac{13912277}{2^{15}t^{12}} + \mathcal{O}\!\left(\frac{1}{t^{15}}\right)\right), \quad t \to -\infty$ [Eq. 2-50]

$q(t) \sim \frac{\exp\!\left(-\tfrac{2}{3}t^{3/2}\right)}{2\sqrt{2\pi}\;t^{1/4}}\left(1 - \frac{17}{24\,t^{3/2}} + \cdots\right), \quad t \to +\infty$ [Eq. 2-51]

It is worth noting that besides Dieng's (2005) invaluable contribution to the work of Tracy and Widom (1992, 1994), other scholars, such as Bejan (2005), Bornemann (2009), and Chiani (2014), have also proposed numerical evaluations of the Tracy-Widom distribution function Fβ using different approaches.


2.3.6.3 Tracy-Widom Distribution: Chiani (2014)’s Approach

While studying the distribution of the largest eigenvalues of real Wishart and Gaussian random matrices, Chiani (2014) discovered a simple approximation of the Tracy-Widom limit law: the TW can be approximated accurately by an adequately scaled and shifted gamma distribution. Moreover, the exact CDF (𝐹) of the largest eigenvalue of finite-dimensional Wishart and Gaussian (GOE, GUE) matrices, even quite large ones, can be computed directly without the need for asymptotic approximations. For the CDF implementation, an algorithm in Mathematica is available in Chiani's (2014) paper.
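The gamma approximation can be reproduced by moment matching. The sketch below, in Python rather than Mathematica, fits a shifted gamma Γ(k, θ) − α to the first three TW2 moments taken from Table 2-4; the resulting parameters come out close to those reported by Chiani (2014), though the exact values there may differ slightly.

```python
import math

# First three moments of F2(1; s) (GUE largest eigenvalue), from Table 2-4.
mu, var, skew = -1.7710868074, 0.8131947928, 0.2240842036

# Match a shifted gamma, TW2 ~ Gamma(k, theta) - alpha, moment by moment:
k = (2.0 / skew) ** 2         # gamma skewness is 2 / sqrt(k)
theta = math.sqrt(var / k)    # gamma variance is k * theta^2
alpha = k * theta - mu        # gamma mean is k * theta, shifted to match mu
```

With these table moments the match gives k ≈ 79.66, θ ≈ 0.1010, and α ≈ 9.820, so approximate TW2 quantiles can then be read off any standard gamma CDF routine.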

2.3.6.4 Statistics of the Tracy-Widom Distributions

Courtesy of Bornemann (2009), Table 2-3 and Table 2-4 provide statistical properties of Fβ for the first six edge-scaled eigenvalues (k = 1, ..., 6) in all three β-ensembles, namely GOE (β=1), GUE (β=2), and GSE (β=4). Those statistical properties are the first four moments characterizing the CDF Fβ: mean (µ), variance (σ²), skewness (S), and kurtosis (K). Note that because F4(k; s) = F1(2k; s), values for F4 can be derived from Table 2-3, which provides values for F1, whereas those for F2(k; s) are provided in Table 2-4. In addition, each value is given with four more decimal digits of precision than those provided in Dieng (2005). For the numerical calculation of these high-precision values, the author developed and used a MATLAB toolbox, which he kindly made available for this study.


CDF Mean Variance Skewness Kurtosis Time (s)

F1 (1; s) −1.20653 35745 1.60778 10345 0.29346 45240 0.16524 29384 4.59

F1 (2; s) −3.26242 79028 1.03544 74415 0.16550 94943 0.04919 51565 12.45

F1 (3; s) −4.82163 02757 0.82239 01151 0.11762 14761 0.01977 46604 30.04

F1 (4; s) −6.16203 99636 0.70315 81054 0.09232 83954 0.00816 06305 51.24

F1 (5; s) −7.37011 47042 0.62425 23679 0.07653 98210 0.00245 40580 77.49

F1 (6; s) −8.48621 83723 0.56700 71487 0.06567 07705 −0.00073 42515 112

Table 2-3: Statistical Properties of F1/F4 For Various k Values

CDF Mean Variance Skewness Kurtosis Time (s)

F2 (1; s) −1.77108 68074 0.81319 47928 0.22408 42036 0.09344 80876 1.84

F2 (2; s) −3.67543 72971 0.54054 50473 0.12502 70941 0.02173 96385 5.44

F2 (3; s) −5.17132 31745 0.43348 13326 0.08880 80227 0.00509 66000 10.41

F2 (4; s) −6.47453 77733 0.37213 08147 0.06970 92726 −0.00114 15160 17.89

F2 (5; s) −7.65724 22912 0.33101 06544 0.05777 55438 −0.00405 83706 25.56

F2 (6; s) −8.75452 24419 0.30094 94654 0.04955 14791 −0.00559 98554 34.72

Table 2-4: Statistical Properties of F2 for Various k Values

2.4 Research Methodology

A methodology based on simulation and inferential approaches is adopted to ensure a systematic application of the Tracy-Widom law to Construction Network Schedules. While the simulation approach is used to build an artificial environment in which data can be generated, the inferential approach helps infer characteristics of, or relationships within, the population (Kothari 2004). Both


approaches are worth adopting to achieve the primary goal of this chapter. Paterson et al. (2006) applied the Tracy-Widom law, well known for its applications in various fields, to infer population structure from genetic data. The Tracy-Widom law, which first arises "as the limiting law for normalized largest eigenvalue in the GUE of Hermitian matrices" (Tracy and Widom 2001), is associated with large numbers.

Given a probability distribution, the deterministic duration of an activity on a network schedule can be used as an input to generate many probabilistic durations. For example, Fente et al. (2000) used the beta distribution in defining a probability distribution function for construction simulation. From this perspective, a methodology centered on the following points is adopted: (1) conceptual analogical discovery of similarities between knowledge areas applying the Tracy-Widom distributions and construction project network scheduling; (2) identification of appropriate TW methods for the study of project network schedules' underlying behaviors; (3) data collection of benchmark schedules; (4) analysis of data for numerical applications; and (5) simulation runs for the emergence of the TW behaviors in construction schedules and correlation analysis. Setting forth the objectives of this chapter, the 5-phase methodology is illustrated in Figure 2.7.
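The duration-generation idea just described can be sketched as follows. This minimal Python illustration uses the common PERT-style beta parameterization over an optimistic/most-likely/pessimistic range; the shape parameters and the example estimates a = 8, m = 10, b = 15 are assumptions for illustration, not the exact fit used by Fente et al. (2000).

```python
import random

def beta_duration(a, m, b, rng):
    """Sample one probabilistic activity duration from a beta distribution
    stretched over [a, b] with mode m (PERT-style shape parameters; an
    assumption, not the exact beta fit of Fente et al. 2000)."""
    alpha = 1.0 + 4.0 * (m - a) / (b - a)
    beta = 1.0 + 4.0 * (b - m) / (b - a)
    return a + (b - a) * rng.betavariate(alpha, beta)

rng = random.Random(42)
samples = [beta_duration(8.0, 10.0, 15.0, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)    # close to (a + 4*m + b) / 6 = 10.5
```

Repeating such draws for every activity, one deterministic schedule yields arbitrarily many probabilistic realizations, which is exactly the sampling step used in the simulations below.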


Research Objective 1
Methodologies: (1) Identify domains of application of the TW law; (2) map and match domains with construction scheduling; (3) discover similarities and constraints.
Result: Similarities between fields of application of the Tracy-Widom law and construction scheduling performed.

Research Objective 2
Methodology: (1) Collect information, such as the constraints and mathematics necessary for the application of TW methods in construction scheduling, for the study of construction schedules' underlying behaviors.
Result: Information on the appropriate TW methods for the study of network schedules' underlying behaviors uncovered.

Research Objective 3
Methodologies: (1) Acquire and organize networks from the Project Scheduling Problem Library (PSPLIB); (2) obtain additional networks from sources such as textbooks or scientific papers if necessary.
Result: Benchmark networks collected and prepared.

Research Objective 4
Methodologies: (1) Format networks for use in a numerical computing environment such as MATLAB; (2) code the transformation of networks into dependency matrices and the computation of activity probabilistic durations; (3) plot networks and calculate complexity measures.
Result: Benchmark schedules transformed and complexity measures calculated.

Research Objective 5
Methodologies: (1) Program the identified TW methods for application in construction scheduling; (2) run simulations and record outputs for analysis; (3) perform correlation analysis and draw conclusions.
Result: Benchmark schedules' underlying behavior revealed using TW methods and correlation analysis achieved.
Figure 2.7: Overall Research Methodology for Chapter 2


2.4.1 Research Objectives

This section provides the chapter's objectives, which are primarily based on adapting Tracy-Widom distribution applications from many other fields and implementing them for construction project network scheduling.

Research Objective 1: Parse scientific literature in other knowledge areas for Tracy-Widom law

applications to map and match their conceptual analogies with construction scheduling elements.

Research Objective 2: Classify methods to identify those that enable the study of network schedules' underlying behaviors according to their sizes and structures.

Research Objective 3: Acquire several benchmark schedules (e.g., project networks from the

Project Scheduling Problem Library (PSPLIB) database) of varying sizes from smaller to larger to

study their behaviors.

Research Objective 4: Transform construction network schedules into dependency structure

matrices to allow the generation of random matrices through a repetitive sampling of activity

probabilistic durations and computation of the network complexity measures.

Research Objective 5: Establish the existence or absence of correlations between the size, complexity, and number of simulations required to validate the Tracy-Widom limit law's universality.


2.4.2 Chapter 2’s Algorithm

Figure 2.8 outlines the sequence of procedures crucial to achieving the objectives of this chapter.

Goal: verify whether the universality of the TW limiting law is applicable to benchmark project network schedules.
Assumption: the triangular distribution governs project activity durations, with known parameters a, b, and c preselected as 90%, 100%, and 150% of their deterministic durations.

Algorithm Model for Simulations

Benchmark Project Activity Data
- PSPLIB project networks (sizes: 32, 62, 92, and 122 activities; total networks: 2040)
- Name, precedence links (FS), deterministic duration, distribution parameters

Calculate Network Complexity Measures
- Transform a network into a dependency matrix.
- Calculate the complexity measures of the network: CNC, Path ratio, D, Cn, OS, RT.
- Construct a histogram with 5 bins to classify the results of each complexity measure and calculate its statistics (min, max, mode, μ, σ).

Assorted Networks for Experimentations
- Select networks representative of each RT group and network size (approx. 20 total).
- Find pairs of networks of equal RTs and sizes, and of different sizes with approx. equal RTs.

Construct the Sample Data Matrix X
1. Sample the p activities' probabilistic durations.
2. Schedule the project network using the CPM to find the times (ES/EF/LS/LF).
3. Form the 1st row of X with the p durations.
4. Repeat steps 1 to 3 to form the n rows of X. Thus, X = [x1 x2 ... xp], with each xj = (EF1j ... EFnj) denoting the EFs of the jth activity.

Transform X into X~
5. Standardize X to obtain W = (w1, ..., wp), so each wj has μ = 0 and unit Euclidean norm ||wj||.
6. Synthesize a Gaussian data matrix X~ = RW = [x~1 x~2 ... x~p], where R is a random matrix distributed according to the chi-square (χ²) probability distribution.

Calculate the Sample Covariance Matrix S
7. S = transpose(X~ − X̄~)(X~ − X̄~).
8. Calculate and sort the eigenvalues of S as l1 ≥ ... ≥ lp.
9. Rescale the mth largest eigenvalues of S from the edge of the spectrum to obtain the test statistics l~m, m = 1, ..., 4.

Find the Optimum n for the Matrix X
10. Pick the total number of points n for the experiments.
11. Run 100 simulations for each n; for each simulation, follow steps 1 to 9 and retain l~m.
12. Assess the deviations Δ between the statistics of l~m and TW.
13. Plot n versus Δ to find n_opt by interpolation.

Conduct a Goodness-of-Fit Test for Validation of the Conjectured Probability Distribution (TW) for Project Activities' Durations
14. Run 10 simulations to collect values of l~m, m = 1, ..., 4, following steps 1 to 9 with n = n_opt.
15. Perform the Kolmogorov-Smirnov (KS) goodness-of-fit test at significance level α. H0: activities' probabilistic durations are governed by the TW limiting law; H1: activities' durations do not follow the TW.

Interpret Simulation Results for Conclusions
Make statements and devise guidelines.
Figure 2.8: Chapter 2’s Algorithm
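The core loop of the algorithm can be sketched in miniature. The Python sketch below (the dissertation's implementation is in MATLAB) uses a hypothetical four-activity network with the stated triangular-duration assumption, builds the n × p sample matrix X of early finishes, forms the sample covariance matrix, and extracts its largest eigenvalue; the standardization, chi-square synthesis, and edge-rescaling steps (5, 6, and 9) are omitted for brevity.

```python
import random

# Steps 1-4 and 7-8 in miniature for a hypothetical 4-activity network (FS links only).
pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}     # predecessors, topologically ordered
det = [4.0, 6.0, 3.0, 5.0]                    # deterministic durations (illustrative)
rng = random.Random(7)

def simulate_efs():
    """One run: triangular durations at 90/100/150 % of the deterministic
    value, then a CPM forward pass returning each activity's early finish."""
    ef = []
    for j, d in enumerate(det):
        dur = rng.triangular(0.9 * d, 1.5 * d, d)   # (low, high, mode)
        es = max((ef[i] for i in pred[j]), default=0.0)
        ef.append(es + dur)
    return ef

n, p = 500, len(det)
X = [simulate_efs() for _ in range(n)]        # n x p sample data matrix

# Sample covariance matrix S = (X - Xbar)^T (X - Xbar) / (n - 1)
means = [sum(row[j] for row in X) / n for j in range(p)]
S = [[sum((row[j] - means[j]) * (row[k] - means[k]) for row in X) / (n - 1)
      for k in range(p)] for j in range(p)]

def largest_eigenvalue(A, iters=200):
    """Power iteration: dominant eigenvalue of a symmetric PSD matrix."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(len(A))) for i in range(len(A))]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam

l1 = largest_eigenvalue(S)   # the statistic that is rescaled and tested against TW
```

Replacing the toy network with a PSPLIB network and repeating steps 1 to 9 many times yields the empirical distribution of the rescaled l1 that the KS test compares against TW.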


2.4.3 Map and Match Conceptual Analogies Between Study Fields of Interests

A methodology of conceptual analogy discovery is frequently used to map and match the elements

and restrictions among the various knowledge fields covered by a research investigation. This

methodology necessitates a thorough literature search and meticulous analysis to find previous

scientific applications in each area of interest and commonalities between them. For this research

study, information gathered from numerous sources, such as academic publications, conference

papers, and books, was synthesized and presented in tabular formats to highlight parallels between

the Tracy-Widom distribution laws and project network schedules. Although the literature on both

themes suggests numerous applications, only seven and eight applications, respectively, have been included in this study. While Table 2-5 provides a synopsis of a few Tracy-Widom distribution

applications in various fields and their sources, Table 2-6A and Table 2-6B provide further

specifics on these applications. The same applications are expressed in Table 2-7 in terms of the

Tracy-Widom laws' universality, as presented in Section 2.3.5. In addition to the Tracy-Widom

distribution laws, the following tables include the eight identified applications and concepts of

project scheduling utilized in construction and engineering.


Tracy-Widom Distribution Laws

1 - Permutation or Combinatory Models
Applications: Nonblocking networks, coding theory, and cryptography; airline boarding; combinatorics; matching and the birthday problems.
References: Miller et al. (2006); Bachmat et al. (2005, p.3); Bohigas et al. (2009, p.31117-1); Bejan (2005, p.5); DasGupta (2005, p.387); Baik et al. (1999); Tracy & Widom (2002, p.590-591).
Model/Description: Model variables are subsets of a set formed by selecting objects, quite often without replacement. The selection is called a permutation if the order is a factor; otherwise, it is called a combination.

2 - Oriented Digital Boiling (ODB) Models
Applications: Genetics; statistical mechanics; flight control systems, aircraft; polymer systems; thermodynamics.
References: Saccenti (2011, p.644); Imamura & Sasamoto (2007, p.2); Hajiyev (2012, p.192, 196); Dotsenko (2010); Gravner et al. (2001, 2002); Tracy and Widom (2002).
Model/Description: The model is an excitable medium in the presence of persistent random spontaneous excitation (e.g., a thermal/electrical stimulus). Visual features of DB dynamics resemble bubble formation/growth/annihilation in a boiling liquid; thus, the model's name.

3 - Ising Spin Glass Models / Disordered Magnetic Systems
Applications: Statistical mechanics of disordered systems; computer science; neural networks; protein folding; condensed matter physics.
References: Stein (2004, p.3); Binder (1986, p.802); Deift (2006, p.9); Castellana and Zarinelli (2011).
Model/Description: Mathematical model of ferromagnetism consisting of discrete variables portraying atomic (spin) magnetic dipole moments in one of two states (+1/−1). Spins are structured in a graph (lattice), allowing continuous/competitive interaction of each spin with its neighbors.

4 - Queueing Theory
Applications: Operations research; telecommunication; traffic engineering; the computing field of queueing theory; the bus problem in Cuernavaca.
References: Baryshnikov (2001); O'Connell (2002); Tracy and Widom (2002, p.592); Baik et al. (2006); Bachmat et al. (2005); Deift (2006, p.10-11); Jagannath & Trogdon (2017).
Model/Description: Model constructed to predict queue lengths and waiting times.

5 - Aztec Diamonds of Order/Size n
Applications: Combinatorial mathematics; computer graphics; statistical mechanics & physics; structural glasses; communications in mathematical physics.
References: Kaplan (2009, p.1); Fleming and Forrester (2011, p.442); Colomo & Pronko (2015); Garrahan et al. (2009, p.15209); Tracy & Widom (2002, p.592).
Model/Description: An Aztec diamond of order n is a geometrical figure of all lattice squares within a diamond-shaped region whose centers (x, y) verify |x| + |y| ≤ n. Various models are built on the appealing features of the tiling problem.

6 - Solitaire/Patience Card Games
Applications: Computer science; probabilistic combinatorics.
References: Bachmat et al. (2005); Deift (2006, p.10); Tracy and Widom (2002, p.591).
Model/Description: Patience sorting is a sorting algorithm inspired by/named after the card game patience. A modification of the algorithm calculates the length of the longest increasing subsequence in an array.

7 - Growth Models - Polynuclear Growth (PNG) Droplet Model
Applications: Statistical mechanics; biology and medical modeling; computer science applied to airline boarding.
References: Tracy and Widom (2002, p.591-592); Prähofer & Spohn (2000a, 2000b); Deift (2006, p.20-21); Dyke (2019, p.198).
Model/Description: A growth model is a simulation of a droplet geometry or interface development, such as the spread of water on a napkin or the advancing edge of a bacterial colony in a petri dish. The droplet model is a specific instance of the Polynuclear Growth (PNG) model, which describes a crystal forming layer by layer on a one-dimensional substrate.

Table 2-5: The Tracy-Widom Distribution Laws–A Synthesis with References
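The patience-sorting entry (row 6) is easy to make concrete. The following Python sketch computes the number of piles, which equals the length of the longest increasing subsequence (the lN of row 1); only the top card of each pile needs to be kept.

```python
import bisect

def patience_piles(deck):
    """Play patience sorting greedily: each card goes on the leftmost pile
    whose top card is >= it, else it starts a new pile on the right.
    The final number of piles equals the longest increasing subsequence."""
    tops = []                       # top card of each pile, left to right
    for card in deck:
        i = bisect.bisect_left(tops, card)
        if i == len(tops):
            tops.append(card)       # start a new pile
        else:
            tops[i] = card          # place the card on pile i
    return len(tops)
```

For example, patience_piles([3, 1, 4, 1, 5, 9, 2, 6]) returns 4, the length of the increasing subsequence 1, 4, 5, 9.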


Tracy-Widom Distributions

1 - Permutation/Combinatory Models (permutations)
Probability space / constraints: Groups SN of all permutations π of N positive integers.
Model/parameter/process (algorithm): ISπ: set of k (≤ N) ascending numbers having their permutations π in the same order; lN: length of the longest increasing subsequence ISπ of a group SN.
Probability distribution: Random permutations π are uniformly distributed; the lN are distributed according to an inverse function in N and n, with n being the greatest length.
Model approximation: Lengths lN statistically behave like the largest eigenvalues of a random Hermitian matrix with an eigenvalue density having the form of a discrete Coulomb gas on Z.

1 - Permutation/Combinatory Models (random words)
Probability space / constraints: Groups of random words (w) of length N from an alphabet of k letters (groups of permutations σ on N words).
Model/parameter/process (algorithm): Length of the longest weakly increasing subsequence in w, or length of the longest decreasing subsequence in w.
Probability distribution: Each w has a probability of k^(−N). The lengths are distributed according to a function in which the variables are N, k, and n, where n is the corresponding length for σ.
Model approximation: Increasing or decreasing lengths statistically behave like the largest or smallest eigenvalues of a random matrix that has trace zero.

2 - Oriented Digital Boiling (ODB) Models
Probability space / constraints: Space of occupied sets changing the system state with time in a growing interface in the two-dimensional lattice Z².
Model/parameter/process (algorithm): Fluctuations of the height function (h), characteristic of one-dimensional space-time, defined as a function of space-time variables and independent Bernoulli random variables (IBRV).
Probability distribution: Defined as a function of four variables: the space-time variables, the height function, and the probability related to the IBRV.
Model approximation: The ODB problem is approached as an increasing-path problem by studying the eigenvalues of random matrices with IBRV-related entries.

3 - Ising Spin Glass Models / Disordered Magnetic Systems
Probability space / constraints: Systems of Sherrington-Kirkpatrick (SK) Ising spin vectors defined by parameters xij for their energy configuration along the axis of rotations (Si = ±1; i = 1…N).
Model/parameter/process (algorithm): The total energy of a configuration as a function of spin components, coupling for external quenched disorder, and parameter J for the strength of the energy between spins.
Probability distribution: Parameters xij are normally distributed, and their eigenvalue density is defined by the inverse of the susceptibility matrix characteristic of the phase transition at critical temperature Tc.
Model approximation: Studying the finite-size fluctuations of the Tc of the SK model is reduced to studying the distribution of the largest eigenvalue of a random matrix, which depends on the sample xij.

Table 2-6A: Conceptual Analogy: Applications of the Tracy-Widom Distributions
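The approximation in row 1, that lN fluctuates like a largest eigenvalue, can be probed numerically. The sketch below samples uniform random permutations, computes lN by patience sorting, and applies the Baik-Deift-Johansson scaling (lN − 2√N)/N^(1/6); for moderate N the sample mean already sits in the neighborhood of the TW2 mean of about −1.77 from Table 2-4, though a noticeable finite-N bias remains (the permutation size and sample count are illustrative choices).

```python
import bisect, math, random

def lis_length(perm):
    """Longest increasing subsequence length via patience sorting."""
    tops = []
    for x in perm:
        i = bisect.bisect_left(tops, x)
        if i == len(tops):
            tops.append(x)
        else:
            tops[i] = x
    return len(tops)

rng = random.Random(0)
N, runs = 1000, 500
scaled = []
for _ in range(runs):
    perm = list(range(N))
    rng.shuffle(perm)
    # Baik-Deift-Johansson centering and scaling of l_N
    scaled.append((lis_length(perm) - 2.0 * math.sqrt(N)) / N ** (1.0 / 6.0))
sample_mean = sum(scaled) / runs   # slowly approaches the TW2 mean of -1.7711
```

The slow drift of this sample mean toward the limiting TW2 value mirrors the convergence question the dissertation asks of schedule eigenvalues: how many simulations, and how large a matrix, before the universal law emerges.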



Tracy-Widom Distributions

4 - Queueing Theory
Probability space / constraints: Series of n single-server queues, each with unlimited waiting space and first-in, first-out service.
Model/parameter/process (algorithm): The quantity of interest is D(k, n) ~ Dk, the departure time of customer k (the last customer to be served) from the last queue n.
Probability distribution: Service times are i.i.d. and follow the Poisson distribution V.
Model approximation: With some scaling, Dk is equal in distribution to the largest eigenvalue of a k × k Gaussian random matrix; Dk is independent of V.

5 - Aztec Diamonds of Order/Size n
Probability space / constraints: Sets of Aztec diamonds An. Each An has 2n rows of squares whose centers lie on a vertical line; the number of squares per row is 2k, with k = 1 to n from top to base and inversely; squares are tiled with dominoes, or 1×2 rectangles.
Model/parameter/process (algorithm): Domino height function per tiling; domino corners are connected to obtain paths; each An has four regular brick-wall-pattern regions and a central region of irregular tiling patterns, or temperate zone.
Probability distribution: A weight of "ω" or "1" for each vertical or horizontal domino; a tile τ has a weight of ω^m, where m is the number of vertical tiles in τ; τ is selected with a probability P as a function of ω given a tile and paths.
Model approximation: A random tiling of the Aztec diamond of size n can be analyzed using zig-zag paths in the tiling, which solves the longest increasing subsequence problem for random permutations.

6 - Solitaire/Patience Card Games (Floyd's Game)
Probability space / constraints: Groups of shuffled decks of cards σ = {i1, i2, …, iN}.
Model/parameter/process (algorithm): Reveal the 1st card i1, then the 2nd card i2; if i2 > i1 (rank), start a new pile to the right of i1, otherwise place i2 on i1. Reveal i3; if i3 > i1 and i2, start a new pile to the right of i2; if i1 > i3 and i2 > i3, place i3 on the smaller-ranked card. Play until all cards are revealed.
Probability distribution: Equals the number of piles at the end of the game started with deck σ; the PDF is defined as a function of N and a certain variable t.
Model approximation: A shuffled deck can be thought of as a random permutation; patience sorting is closely related to the problem of the longest increasing subsequences for permutations π ∈ SN = {1, 2, …, N}.

7 - Growth Models - PNG Droplet Model (continuous-time process)
Probability space / constraints: Set of plateaus of a crystal growing layer by layer through random deposition of particles to form islands that spread laterally with constant speed on a one-dimensional time-space.
Model/parameter/process (algorithm): Height (h) function on the time-space defined based on sets of nucleation events inside a rectangle; nucleation events result from the reunion of adjacent islands of the same level.
Probability distribution: Nucleation events occur independently and uniformly in space-time; events are Poisson distributed with density one.
Model approximation: Solving a PNG model problem is equivalent to solving the longest increasing subsequence of permutation problem with h as length.

Table 2-6B: Conceptual Analogy: Applications of the Tracy-Widom Distributions



As presented in Table 2-6A and Table 2-6B, the same applications are expressed in the following Table 2-7 in terms of the Tracy-Widom laws' universality.

Tracy-Widom Distribution Laws

1 - Permutation or Combinatory Models: As N → ∞ and with some scaling, the limiting distribution of lN is the Tracy-Widom distribution (TW).
2 - Oriented Digital Boiling (ODB) Models: As the space-time variables → ∞, with some scaling, the distribution of the height fluctuations is the TW.
3 - Ising Spin Glass Models / Disordered Magnetic Systems: In the thermodynamic limit, as N → ∞, the TW governs the pseudocritical temperatures for SK spin glass models regarding the disorder.
4 - Queueing Theory: As n → ∞, the distribution of the departure time Dk of customer k is the TW.
5 - Aztec Diamonds of Order/Size n: The asymptotic fluctuations of the shape of the temperate region or domino height in a random tiling are governed by the TW.
6 - Solitaire/Patience Card Games: As N → ∞, the number of piles, suitably centered and scaled, obtained in patience sorting is statistically governed by the TW.
7 - Growth Models - PNG Droplet Model: As t → ∞, the limiting distribution of the shape fluctuations of the PNG model, which grows in the form of a droplet, is the TW.

Table 2-7: Conceptual Analogy: Universality Summary of the Tracy-Widom Distribution Laws

Like the tables on the use of the Tracy-Widom distribution laws in various fields, the following tables contain several project scheduling applications and theories used in construction and engineering. While Table 2-8 summarizes eight applications from diverse sources, Table 2-9A and Table 2-9B go into greater detail about these applications.


Construction Scheduling Theory and Applications

1 - Program Evaluation and Review Technique (PERT) (Probabilistic)
Applications: Management systems (planning, controlling, and decision-making); network analysis; research & development of complex projects (Polaris missile system, NAVY); queueing and graph theories.
Reference: Malcolm et al. (1959).

2 - Critical Path Method (CPM) or Precedence Diagramming Method (PDM) Scheduling (Deterministic)
Applications: Design, procurement, and maintenance of several types of construction projects; planning, scheduling, and cost-control aspects of project work; new product launching; installation, programming, and debugging of computer systems.
Reference: Kelley (1961).

3 - Linear Scheduling Method (LSM)
Applications: Instrument for scheduling and monitoring of activities by field personnel; complement to a CPM by field personnel; roadways, construction projects.
Reference: Harmelink and Rowings (1998).

4 - Repetitive Scheduling Method (RSM) / Repetitive Project Model (RPM)
Applications: Multiunit projects with repetitive activities, such as vertical constructions (floors of multistory buildings, houses in housing developments) and horizontal constructions (meters in pipelines, stations on highways).
References: Harris and Ioannou (1998); Reda (1990).

5 - Construction Spatial Scheduling (three-dimensional scheduling)
Applications: Scheduling approach to minimize construction projects' total durations; project management tool to formalize the spatial-temporal needs of project activities and handle potential spatial conflicts.
References: Lucko et al. (2014); Said.

6 - Line of Balance Scheduling Method (LOB or LBSM) or Vertical Production Method (VPM)
Applications: Projects characterized by repetitive operations, such as highway construction, housing projects, and long bridges; scheduling, resource management, project analysis, and project control of such projects.
Reference: Sarraj (1990).

7 - Time-Space Scheduling Method
Applications: Large-scale and linear projects with space congestion, such as piping or paving operations; planning and control tools for unforeseen variations from planned schedules of linear construction projects.
References: (Roofigari Esfahan 2016); Riley and Sanvido (1997).

8 - Fuzzy Project Scheduling
Applications: Risk-averse project management tool; electronic product development projects; projects with minimum schedule risk; stochastic schedule models; flow-shop and job-shop scheduling problems.
Reference: Wang (2004).

Table 2-8: Construction Scheduling Theory and Applications – A Synthesis with References


Construction Scheduling Theory and Applications

1 - Program/Project Evaluation and Review Technique (PERT) (Probabilistic)
Probability space / constraints: Activities to complete a program or project; activity resources; optimistic, typical, and pessimistic durations to complete an activity; interdependency and sequence of activities (network).
Model/parameter/process (algorithm): Calculation of the expected mean duration to complete each activity using the three-time-estimates method; estimation of the expected standard deviation of each activity's expected duration; determination of floats; computation of project critical paths.
Probability distribution: Use of the Beta distribution to measure the potential variability of an activity's expected duration te.
Model approximation: NA.

2 - CPM or Precedence Diagramming Method (PDM) Scheduling (Deterministic)
Probability space / constraints: Project P, or set of: (n+1) ordered events labeled i, with "0" for the origin and "n" for the terminus; m activities (i, j) of duration yij; occurrence times ti; a schedule, or set of (n+1+m) elements (y, t) associated with the elements of P.
Model/parameter/process (algorithm): (i, j) is performed sometime between events i and j and defined based on six rules; yij is bounded above by Dij and below by dij for each (i, j) in P, where Dij and dij are, respectively, the normal and crash durations of (i, j); a feasible project schedule with t0 = 0 is the sum of the utilities of each activity (utility or linear function).
Probability distribution: NA.
Model approximation: NA.

3 - Linear Scheduling Method (LSM)
Probability space / constraints: Activities and completion sequences; activity resources; activity durations; project start time.
Model/parameter/process (algorithm): Plot of activities as lines with constant/changing slopes; slopes represent the amount of work completed per working time unit; modifications can be made to the proposed schedule based on the obtained project progress; resource allocation is monitored as with CPM.
Probability distribution: NA.
Model approximation: NA.

4 - Repetitive Scheduling Method (RSM) / Repetitive Project Model (RPM)
Probability space / constraints: Repetitive activities to complete project units; activity resources; activity duration and starting time; activity production rate (set from the company's data); work quantity in each activity.
Model/parameter/process (algorithm): Resource continuity constraints; interdependence between activities as in CPM; progress of a repeating activity or its production line plotted against time; converging or diverging production lines; project duration estimates.
Probability distribution: NA.
Model approximation: NA.

Table 2-9A: Conceptual Analogy: Construction Scheduling Theory and Applications
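The PERT row's three-time-estimates method reduces to two formulas per activity, te = (a + 4m + b)/6 and σ = (b − a)/6, with path statistics obtained by summing the te values and the variances along a path. A small sketch with hypothetical activity estimates:

```python
# PERT three-time-estimates method: expected duration and standard deviation
# per activity, then mean and sigma of a path (activity estimates are hypothetical).
activities = {                       # name: (optimistic a, most likely m, pessimistic b)
    "excavate": (3.0, 5.0, 9.0),
    "form":     (2.0, 4.0, 6.0),
    "pour":     (1.0, 2.0, 4.0),
}

def pert(a, m, b):
    te = (a + 4.0 * m + b) / 6.0     # expected activity duration
    sd = (b - a) / 6.0               # standard deviation (beta-distribution rule)
    return te, sd

path = ["excavate", "form", "pour"]
stats = [pert(*activities[x]) for x in path]
path_mean = sum(te for te, _ in stats)             # 11.5 for these estimates
path_sd = sum(sd ** 2 for _, sd in stats) ** 0.5   # variances add along a path
```

Summing variances (not standard deviations) along the path is what lets PERT attach a probability statement to the critical-path duration.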



Construction Scheduling Theory and Applications

5 - Construction Spatial Scheduling (three-dimensional scheduling)
Probability space / constraints: Construction activities defined by their spatial-temporal attributes on the building floor (stationary, one-directional, two-directional, and non-stationary activities); activities' fixed production rates; activities' varying duration times in terms of slopes.
Model/parameter/process (algorithm): Use of a space scheduling algorithm to schedule each activity and determine its buffer and 3D singularity function; determination of each activity's best direction; estimation of the project's optimum duration.
Probability distribution: NA.
Model approximation: NA.

6 - Line of Balance Scheduling Method (LOB or LBSM) or Vertical Production Method (VPM)
Probability space / constraints: Units required to construct a project of duration T; time periods to complete each unit made of activities; unit production rate; start, finish, and duration of each activity/unit; schedule of activities obtained using PERT; buffers between consecutive activities.
Model/parameter/process (algorithm): Computation of each activity's rate of output, smallest duration, and buffer times using an algorithm; determination of the slopes of the lines of the unit activities; optimal schedule found by solving conflicts that may result from intersecting lines.
Probability distribution: Use of the Beta distribution, since the LOB uses known scheduling methods such as PERT, CPM, and the bar chart.
Model approximation: NA.

7 - Time-Space Scheduling Method
Probability space / constraints: Linear activities; precedence relationships; activity coordinates (location, time) at various times; minimum and maximum activity production rates and the time required between succeeding activities; project deadline; space-time float polygons for congestion detection.
Model/parameter/process (algorithm): Optimization of project duration by minimizing potential congestion; activity uncertainty-aware productivity buffer estimation using a fuzzy inference system to schedule uncertainty; space-time floats for space-time constraints and flexibilities of activities' resource variations.
Probability distribution: NA.
Model approximation: NA.

8 - Fuzzy Project Scheduling
Probability space / constraints: Project P defined by: project ready time b; project deadline a; activities labeled i = 1 to n; activity duration and start time; activity q-vector of available resource types Ni = (ni1, …, niq).
Model/parameter/process (algorithm): The resources of activities in progress cannot exceed resource availability; precedence relationships / membership function; activity durations as six-point fuzzy numbers di for uncertainty in durations; fuzzy set or degree of satisfaction for the PM based on a, b, and the index of optimization β.
Probability distribution: No probability distribution; instead, factors of intuition and ambiguous judgment are used, and activity durations are modeled by membership functions based on possibility theory.
Model approximation: NA.

Table 2-9B: Conceptual Analogy: Construction Scheduling Theory and Applications


193

2.4.4 Data Collection and Preparation

2.4.4.1 Data Collection

Depending on the nature, breadth, or other variables of the research, researchers use a variety of

strategies to collect the data needed to answer their research questions. For example, while one

study may collect quantitative data through a series of experiments, another study may collect

qualitative data by polling a small group of individuals. In addition, researchers frequently use quantitative data to test hypotheses that explain observations or facts in statistics-driven domains (e.g., economics, health, and demography) concerned with gathering, organizing, analyzing, interpreting, and presenting data. Furthermore, researchers may use qualitative data to better grasp thoughts, experiences, or perspectives on a specific subject.

Because of the nature of the current investigation, quantifiable data on construction project

networks of various sizes and complexities is required. As a result, the systematically generated

PSPLIB developed by Kolisch and Sprecher (1997), which consisted of 2040 project networks of

various sizes and structures, represents an adequate sizeable collection of networks required to

investigate the underlying behaviors of project network schedules. Therefore, the investigation

will be conducted using the applied multivariate statistical techniques developed and introduced

in the subsequent sections. Furthermore, the PSPLIB networks are freely available electronically

in '.sm' format. Appendix A.1 contains a list of all gathered filenames. In contrast, Table 2-10

includes the total number of files collected for each set of project networks consisting of J30, J60,

This content downloaded from


182.255.0.242 on Wed, 19 Mar 2025 09:44:41 UTC
All use subject to https://about.jstor.org/terms
194

J90, or J120 activities referred to as jobs or tasks. For instance, a project network J60 consists of

60 activities, and there are a total of 480 projects of size 60.

PSPLIB Networks          J30    J60    J90    J120
Total number of files    480    480    480    600

Table 2-10: Benchmark Network Schedule Information

2.4.4.2 Data Preparation

After obtaining a benchmark set of project networks from the PSPLIB, it is critical to treat them

with care because they include critical project network data for subsequent computations. Each filename in the collection ends with the extension '.sm', which stands for Single-Mode Resource-Constrained Project Scheduling Problems (Kolisch and Sprecher 1997). Although this file format is viewable in Microsoft Word, it is not readable in MATLAB. As a result, converting each file from its original extension to a text file with the extension '.txt' was necessary before any of the files could be used in MATLAB. Given the volume of files, a procedure created in Visual Basic for Applications (VBA), included in Appendix B.1 on page 348, automates and manages the conversion process while preserving the integrity of the original file contents. Appendix A.2 (p. 346) contains the contents of the file "j301.1.sm."
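The conversion step described above can also be sketched outside of VBA. The following Python fragment is an illustration only (not the Appendix B.1 procedure; the folder layout is hypothetical): it copies every '.sm' file to a '.txt' twin while leaving the contents untouched.

```python
# Illustrative sketch: batch-convert PSPLIB '.sm' files to '.txt' by copying
# contents unchanged, mirroring the role of the VBA procedure in Appendix B.1.
# Directory names are hypothetical.
import shutil
from pathlib import Path

def convert_sm_to_txt(src_dir: str, dst_dir: str) -> int:
    """Copy every '.sm' file in src_dir to dst_dir with a '.txt' extension,
    preserving the original file contents. Returns the number of files copied."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for sm_file in sorted(src.glob("*.sm")):
        shutil.copyfile(sm_file, dst / (sm_file.stem + ".txt"))
        count += 1
    return count
```

Copying rather than renaming leaves the original benchmark files intact, which matches the integrity requirement stated above.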

To capture the precedence links between activities, the so-called AON diagram, as seen in Figure 2.9 or Figure 2.10, shows the structure of a project using nodes and

195

arcs to represent the project's activities and their precedence relationships. Additionally, a pair of dummy activities with zero durations, namely "1" and "J," is introduced to the project network.

They represent the unique initiating (source) and concluding (sink) activities. As described and

illustrated in Figure 2.10, a network is acyclic and numerically labeled (Kolisch and Sprecher

1997).

Figure 2.9: Exemplar Activity-on-Node Diagram

196

Figure 2.10: PSPLIB J301-1 Activity-On-Node Diagram

197

2.4.5 Formatting and Transforming Networks into Dependency Matrices

2.4.5.1 Formatting Networks

For this study, the triangular distribution is employed to simulate the probabilistic durations of project schedule activities. The distribution's parameters, or boundaries, minimum (a), mode (b), and maximum (c), are defined as 90%, 100%, and 150% of the deterministic duration of any activity on a project network, respectively. Due to the large number of network schedules obtained for this study, it is critical to automate the operations required to determine the triangular distribution parameters for each activity. Automation helps minimize errors and save time. Thus, the flowchart in Appendix C.4 details the approach for computing the triangular distribution parameters for each activity on a project schedule network, either manually or automatically. This methodology requires the project activity information to be stored in a text file tabulated and structured similarly to Appendix A.2 for the PSPLIB files.
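The parameter rule above is a one-line computation. As an illustrative sketch in Python (the study's implementation is in MATLAB), for a deterministic duration d the triangular bounds are (0.90d, d, 1.50d):

```python
# Sketch of the parameter rule in Section 2.4.5.1: for an activity with
# deterministic duration d, the triangular bounds are (a, b, c) =
# (0.90*d, 1.00*d, 1.50*d). Illustrative only; not the study's MATLAB code.
def triangular_params(duration: float) -> tuple:
    """Return (minimum a, mode b, maximum c) for a deterministic duration."""
    return (0.9 * duration, 1.0 * duration, 1.5 * duration)
```

Applied to activity Mob in Table 2-11 (d = 7), this rule yields the tabulated triple 6.3; 7; 10.5.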

For any other project networks, such as the exemplar network, a table in either a text file or an

Excel Spreadsheet can be created in the same manner as in Appendix D.2 to tabulate the project

activity information. Additionally, for the PSPLIB networks, an Excel Spreadsheet table with three

columns named 'ID,' 'Name ID,' and 'pd,' for identifying the probability distribution required for

probabilistic activity durations, must be created. While several probability distributions can be

used to sample activity durations within a schedule, this study employs a homogenous probability

distribution. Due to the large number of PSPLIB network files acquired for this study, a computer

program created in MATLAB and included in Appendix B.2 assisted in tabulating project network

data and exporting the resulting table to a text file format. More precisely, the MATLAB code

198

calculates the probability distribution parameters for the activities, adds logic constraints and time

lags between activities, and sends the information about the activity network to a new text file for

later use. To demonstrate the validity of this methodology, Table 2-11 contains the output data for the exemplar network; Appendix D.2 illustrates the corresponding output for the PSPLIB network j3038-7. Although there are four distinct CPM logic constraints, as seen in Figure 1.6, all relationships between activities are finish-to-start (FTS) with zero time lags (L=0) throughout this study.

ID  Name  Duration  pd  Parameters (a;b;c)  Successor(s)

1 Source 0.01 Tri 0.009;0.01;0.015 Mob-FTS-0


2 Mob 7 Tri 6.3;7;10.5 A-FTS-0;B-FTS-0;E-FTS-0
3 A 19 Tri 17.1;19;28.5 D-FTS-0;J-FTS-0
4 B 10 Tri 9;10;15 C-FTS-0
5 C 6 Tri 5.4;6;9 D-FTS-0;F-FTS-0;J-FTS-0
6 D 18 Tri 16.2;18;27 L-FTS-0
7 E 15 Tri 13.5;15;22.5 F-FTS-0;G-FTS-0
8 F 17 Tri 15.3;17;25.5 H-FTS-0;I-FTS-0;K-FTS-0
9 G 16 Tri 14.4;16;24 H-FTS-0;I-FTS-0;K-FTS-0
10 H 6 Tri 5.4;6;9 M-FTS-0
11 I 11 Tri 9.9;11;16.5 L-FTS-0
12 J 19 Tri 17.1;19;28.5 L-FTS-0
13 K 15 Tri 13.5;15;22.5 T/O-FTS-0
14 L 18 Tri 16.2;18;27 T/O-FTS-0
15 M 10 Tri 9;10;15 T/O-FTS-0
16 T/O 3 Tri 2.7;3;4.5 Sink-FTS-0
17 Sink 0.01 Tri 0.009;0.01;0.015 N/A

Table 2-11: Exemplar Network Activity Probabilistic Durations and Information

199

2.4.5.2 Computation of Activity Probabilistic Durations

After calculating the triangular distribution parameters of each activity, the flowchart in Appendix C.5 facilitated the numerical calculation of activity probabilistic durations. A MATLAB subroutine implementing the flowchart operations allows the computation and graphical depiction of probabilistic durations for any given project activity based on its minimum (a), mode (b), and maximum (c). The MATLAB function "makedist" creates the probability distribution object for the distribution name (triangular) given the parameters a, b, and c. Because each activity's probabilistic durations are sampled from a triangular distribution, their graphical representation should resemble a triangle with its peak at b. The graphs in Figure 2.11A and Figure 2.11B depict probabilistic durations of the exemplar network activities whose data set information is provided in Table 2-11. Each chart has a triangular shape peaking at "b," confirming that the triangular distribution governs the sampled durations. Knowing the distribution from which the durations are drawn, this outcome is not surprising. As a result, there is no need to plot activities' probabilistic durations if their underlying probability distribution is known.
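The sampling step can be sketched as follows. The study uses MATLAB's makedist; this Python equivalent (an illustration under the same parameterization) draws n durations from Triangular(a, b, c) with the standard library:

```python
# Illustrative sketch: sampling probabilistic durations from a triangular
# distribution, analogous to MATLAB's makedist('Triangular', a, b, c)
# followed by random(). Standard-library only; not the study's code.
import random

def sample_durations(a: float, b: float, c: float, n: int = 1000) -> list:
    """Draw n probabilistic durations from Triangular(min=a, mode=b, max=c).
    Note random.triangular takes (low, high, mode)."""
    return [random.triangular(a, c, b) for _ in range(n)]
```

Every draw falls in [a, c], and the sample mean approaches (a + b + c) / 3 as n grows, consistent with the triangular shapes in Figure 2.11A and Figure 2.11B.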

200

Figure 2.11A: Exemplar Network: Activity Probabilistic Duration Plots

201

Figure 2.11B: Exemplar Network: Activity Probabilistic Duration Plots

2.4.5.3 Creating a Network Dependency Matrix

Activity constraints are defined during the initial stages of designing a project schedule to establish

precedence relationships between consecutive activities. As seen in Figure 1.6, any of the four distinct types of logic constraints can be represented in an n-by-n square, binary matrix with entries of "0" or "1," where n is the network's activity count. A "1" indicates a precedence link between the pair of nodes (i, j) representing the project activities, and a "0" indicates the absence of such a link (Demeulemeester et al. 2003). The resulting matrix provides a condensed representation of the information flows between activities, allowing for a methodical mapping of network parts (Uma Maheswari et al. 2006). Table 2-12 illustrates this matrix, representing the exemplar network's dependency matrix. The matrix is square and upper triangular; the lower half has been purposely omitted because, for an acyclic and numerically labeled network, it carries no precedence entries. Take note that the three ones "1" in the row corresponding to activity F indicate its

202

immediate successors H, I, and K. The two "1"s in its column denote its immediate predecessors, C and E.

Activity Source Mob A B C D E F G H I J K L M T/O Sink

Source 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Mob 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0

A 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0

B 0 1 0 0 0 0 0 0 0 0 0 0 0 0

C 0 1 0 1 0 0 0 1 0 0 0 0 0

D 0 0 0 0 0 0 0 0 1 0 0 0

E 0 1 1 0 0 0 0 0 0 0 0

F 0 0 1 1 0 1 0 0 0 0

G 0 1 1 0 1 0 0 0 0

H 0 0 0 0 0 1 0 0

I 0 0 0 1 0 0 0

J 0 0 1 0 0 0

K 0 0 0 1 0

L 0 0 1 0

M 0 1 0

T/O 0 1

Sink 0

Table 2-12: Exemplar Network Dependency Matrix
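The construction of such a dependency matrix from each activity's successor list (as tabulated in Table 2-11) can be sketched as follows; this Python fragment is an illustration with hypothetical activity names, not the study's MATLAB code:

```python
# Illustrative sketch: build the binary dependency (adjacency) matrix of an
# AON network from immediate-successor lists, per Section 2.4.5.3.
def dependency_matrix(activities, successors):
    """activities: ordered list of names; successors: dict mapping a name to
    its list of successor names. Returns an n-by-n 0/1 matrix (list of lists)
    with M[i][j] = 1 when activity i immediately precedes activity j."""
    idx = {name: k for k, name in enumerate(activities)}
    n = len(activities)
    matrix = [[0] * n for _ in range(n)]
    for name, succs in successors.items():
        for s in succs:
            matrix[idx[name]][idx[s]] = 1
    return matrix
```

With topologically numbered activities, every "1" lands above the diagonal, which is exactly the upper-triangular pattern visible in Table 2-12.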

203

2.4.5.4 Identification of Network Paths

The methodology for identifying network paths is based on the CPM introduced in Section 1.4.2

and the procedures proposed in the previous sections. This methodology allows the determination

and identification of critical and non-critical paths of any given number of project networks.

However, it requires multiple simulations of a project schedule to find all project network paths that are likely to become critical during the execution of the project. Finding these paths necessitates simulating a given network several times, randomly sampling activity probabilistic durations to imitate scheduling issues that may occur during the construction phase. Thus, the total number of simulation runs per project and the total number of probabilistic durations to generate per activity need to be established before applying this methodology. Given the large number of benchmark schedules and the CPU time required to execute the operations automatically, simulating 100 schedules per project network and generating 1,000 probabilistic durations per network activity should be sufficient and feasible within the time and resources allocated to this study. As proposed, this methodology consists of the following four steps, illustrated in the flowchart found in Appendix C.6.

Step 1: Formatting of Network Files

Follow the methodology proposed in Section 2.4.5.1 to: (1) calculate the triangular distribution

parameters associated with each activity's fixed duration; (2) organize and store into a text file the

data set on the project activities (ID, Name, Fixed Duration, etc.). If necessary, repeat this step for

all the project networks being considered before moving to Step 2.

204

Step 2: Creation of Network Dependency Matrix

Follow the methodology proposed in Section 2.4.5.3 to create the dependency matrix of the project

network “N” using the project data set contained in the network file resulting from Step 1. This

step translates precedence relationships between consecutive network activities into a binary

matrix.

Step 3: Simulation of Network Schedules
Run 100 simulations to schedule the project network “N.” For each simulation, go to Step 3a.

Step 3a: Determination of Random Duration

Follow the methodology flowchart proposed in Appendix C.5 to generate 1000 probabilistic

durations for each activity on the network “N” using its triangular distribution parameters stored

in the network file. There is no need to plot the resulting data points; instead, draw randomly from them to obtain a probabilistic duration for the activity. Repeat this process for all activities to form a vector of random durations before moving to Step 3b.

Step 3b: Scheduling of Project network

Utilize the CPM to schedule the project network based on the random activity durations obtained

in Step 3a, assuming FTS constraints with zero time lags (L=0). The ES and EF times are determined during the forward pass of the CPM scheduling method, which is detailed in Appendix C.1 for the forward pass, Appendix C.2 for the backward pass, and Appendix C.3 for float calculations. The LS and LF times, on the other hand, are calculated during the backward pass. Additionally, the total duration of the project and each activity's TF are calculated. The TF of an activity specifies the amount of time the activity can be postponed without influencing the start

205

time (s) of its successor (s). Critical activities have TFs of zero, and their sequential connection

defines the network's critical path. A network may contain several critical paths. Lastly, calculate the frequency and the average project duration of each critical path detected across the total number of network scheduling simulations.
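The forward and backward passes of Step 3b can be sketched as follows. This Python illustration (the study's implementation is in MATLAB, per Appendices C.1 through C.3) assumes a topologically ordered activity list and FTS links with zero lags:

```python
# Illustrative sketch of Step 3b: CPM forward and backward passes over an
# acyclic AON network with finish-to-start links and zero lags.
# 'succ' maps each activity to its successor list; 'order' must be a
# topological ordering of the activities.
def cpm(durations, succ, order):
    predecessors = {a: [] for a in order}
    for a, successors in succ.items():
        for s in successors:
            predecessors[s].append(a)
    ES, EF = {}, {}
    for a in order:                       # forward pass: ES = max EF of preds
        ES[a] = max((EF[p] for p in predecessors[a]), default=0.0)
        EF[a] = ES[a] + durations[a]
    T = max(EF.values())                  # total project duration
    LS, LF, TF = {}, {}, {}
    for a in reversed(order):             # backward pass: LF = min LS of succs
        LF[a] = min((LS[s] for s in succ[a]), default=T)
        LS[a] = LF[a] - durations[a]
        TF[a] = LS[a] - ES[a]             # total float; critical if TF == 0
    return T, TF
```

Activities with TF = 0 then form the critical path for that simulation run, as described above.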

Step 4: Identification of Network Paths

Generate an exhaustive list of all possible paths connecting 'Source/start' to 'Sink/target' of the network "N." This may be accomplished with graph theory, given the network's dependency matrix, which by definition is independent of activity durations; MATLAB provides the required graph-theory functions. The generated list contains all network paths, including critical ones. If two or more networks are considered, save the path results of network "N" and go to Step 1 for the following network. Otherwise, this step completes the path identification process.
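Step 4's path enumeration can be sketched with a simple depth-first search, which terminates because the network is acyclic. This Python fragment is an illustration only; the study relies on MATLAB's graph-theory functions:

```python
# Illustrative sketch of Step 4: enumerate every path from 'Source' to 'Sink'
# by depth-first search over successor lists. Valid for acyclic networks.
def all_paths(succ, start="Source", end="Sink"):
    """succ: dict mapping each activity to its successor list.
    Returns a list of paths, each a list of activity names."""
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
        else:
            for s in succ.get(path[-1], []):
                stack.append(path + [s])
    return paths
```

Because the enumeration uses only the dependency structure, the same path list holds for every simulation run, exactly as noted above.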

For this study, automation of these methodology operations, as illustrated in the flowchart provided

in Appendix C.6, is necessary to derive the indicators of network morphologies crucial to the

analysis to be performed. As a result, a MATLAB code to computerize the methodology operations

was developed for application to the networks considered for this study. To validate this methodology, Table 2-13 and Table 2-14 provide outputs for the exemplar network. The tables

indicate that this network has thirteen paths, among which five have the potential to become

critical. In addition, from the hundred network schedule simulation runs outputted in Table 2-14,

it is most likely that seven out of the seventeen network activities would be critical. Moreover, the

critical path made of these activities happened to be one of the two longest paths of this network.

206

In addition, these results suggest a strong likelihood for the project network duration of 88.29 days

and very little chance for a duration of 84.65 days.

Network all possible paths from ‘Source’ to ‘Sink’


Path# Activities
1 Source Mob A D L T/O Sink
2 Source Mob A J L T/O Sink
3 Source Mob B C D L T/O Sink
4 Source Mob B C F H M T/O Sink
5 Source Mob B C F I L T/O Sink
6 Source Mob B C F K T/O Sink
7 Source Mob B C J L T/O Sink
8 Source Mob E F H M T/O Sink
9 Source Mob E F I L T/O Sink
10 Source Mob E F K T/O Sink
11 Source Mob E G H M T/O Sink
12 Source Mob E G I L T/O Sink
13 Source Mob E G K T/O Sink

Table 2-13: All Paths from Source to Sink of the Exemplar Network

207

Path Freq/100 runs    Project Avg. Duration    Critical Path per run
0.45 88.28667 Source Mob B C F I L T/O Sink

0.22 88.475 Source Mob E G I L T/O Sink

0.22 87.50273 Source Mob E F I L T/O Sink

0.08 84.64625 Source Mob A J L T/O Sink

0.03 84.84667 Source Mob A D L T/O Sink

Table 2-14: All Critical Paths of the Exemplar Network for 100 Simulations

Similarly, outputs for the PSPLIB project network J301-1 are provided in Appendix E.1 for all possible paths connecting activity 'Source' to activity 'Sink,' and in Table 2-15 for the project's critical paths and probabilistic durations. These results show that the network J301-1 has thirteen possible paths, of which only two have the potential to become critical. The most likely critical path comprises 34.4% of the network activities, is the network's longest path, and its activities have a 60% chance of being critical. In addition, its duration suggests that the project network would complete in 46.54 days.

208

Path Freq/100 runs    Project Avg. Duration    Critical Path per run
0.4 46.1698 Source 4 10 16 22 23 24 30 Sink
0.6 46.5443 Source 3 8 12 14 17 22 23 24 30 Sink

Table 2-15: All Critical Paths of Network j301 for 100 Simulations

2.4.5.5 Graphical Representations of Project Networks

Su et al. (2016) made this section's methodology possible by supplying some of the MATLAB

subroutines required to display project network schedule diagrams and incorporate critical project

activities based on their criticality indices determined using the CPM. In addition, global

complexity measures such as restrictiveness (RT) and density, which will be covered in the

subsequent section, can be calculated and added to the network diagrams. Although criticality

indices will be extensively discussed in a later chapter, it is necessary to introduce them briefly.

Assume that a network schedule's activity durations are random variables (Dodin and Elmaghraby 1985). The criticality index of an activity is then the probability that the activity falls on a critical path; the criticality index of a path, on the other hand, represents the probability that the path's duration will be larger than or equal to that of any other path in the network. A criticality index is a number that ranges from zero for a non-critical activity to one for a critical activity.

Additionally, activities having a criticality index greater than 0.5 will be denoted by a darker node

or box on the network diagram. Nevertheless, with few exceptions, the methodology for representing any network schedule diagram is similar to the one proposed in Section 2.4.5.4 to find all

critical and non-critical paths of any number of networks. The reason is that they both employ

209

random durations of project activities during the project scheduling process. With that in mind, the

methodology adopted to represent the network diagram is performed in five steps and as follows:

Step 1 and Step 2: Formatting of Network Files and Creation of Network Dependency Matrix

These steps are identical to the ones described in Section 2.4.5.4 on page 203.

Step 3: Simulation of Network Schedules

Run ten (10) simulations of the network schedule. For each simulation, follow Step 3a and Step

3b proposed in Section 2.4.5.4. For the same reason stated in the previous section, ten rather than

100 simulations should be sufficient to determine activity criticality indices. There is no need to

calculate the average total project duration and critical path frequencies. Instead, compute each

activity's criticality index (Crr) based on its TF amount as given in Equation 2.52(a). Go to Step

4 after all simulations have been run.

Crr = 1, if the activity's TF = 0
Crr = 0, if the activity's TF > 0                    [Eq. 2-52(a)]

Cr = sum(Crr) / total number of simulation runs      [Eq. 2-52(b)]

Step 4: Computation of Activity Criticality Indices

Compute the criticality index of each project activity on the network using Equation 2.52(b).
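Equation 2-52 amounts to counting the runs in which an activity's total float is zero. As an illustrative Python sketch (the study's implementation is in MATLAB):

```python
# Illustrative sketch of Equation 2-52: an activity's criticality index Cr is
# the fraction of simulation runs in which its total float TF equals zero.
def criticality_index(tf_per_run):
    """tf_per_run: list of the activity's TF values, one per simulation run."""
    runs = len(tf_per_run)
    return sum(1 for tf in tf_per_run if tf == 0) / runs
```

An activity whose index exceeds 0.5 would then be drawn as a darker node in Step 5.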

Step 5: Representing the Project Network Schedule Diagram and Project Critical Activities

Create a network schedule diagram and list all project network activities with criticality index values

greater than 0.5. If two or more networks are being studied, save the outputs for the first network

210

and proceed to Step 1 for the second network. Otherwise, the network schedule representation is

accomplished with this step.

To illustrate this methodology, a MATLAB computer program was developed to automate its operations. In addition to the exemplar's network diagram, a representation of each set of PSPLIB networks has been displayed. As shown in Table 2-10, each set represents several networks of identical size. Figure 2.12A(a) shows the exemplar network diagram and the nine project activities found critical out of ten simulation runs. These activities also happen to be the constituents of the network's most probable critical path, as outlined in Table 2-14. The exemplar network diagram is elongated in the x-axis direction because it requires three rows and five columns to represent the 17 project activities and their precedence relationships, symbolized by links between activities. Accordingly, the network possesses a sequential structure different from that of the diagram representing the PSPLIB network j12012-9 shown in Figure 2.12C. That network diagram necessitates 16 rows and 20 columns to describe the project's (124) activities, and its associated links possess a distinctively parallel structure. When comparing the four PSPLIB network representations, the diagram in Figure 2.12A(b), depicting PSPLIB network j3038-7, has the most serial shape, requiring a row-column ratio of two to represent the project's (34) activities. While the structures of the PSPLIB networks j3038-7 and j12012-9 can be visually identified from their diagrams, the structures of the other graphs are entirely hybrid.

211

(c)

(a)

(b)

Figure 2.12A: Network Structure Representations


212

Figure 2.12B: Network Structure Representations

Given that their column-row ratios of 1.23 and 1.38 are necessary to represent both the PSPLIB

networks j6010-5 in Figure 2.12A(c) and j902-4 in Figure 2.12B, one may conclude that both

network topologies are serial. However, to allow comparison, simply displaying them is

213

insufficient to identify network structures or to classify them according to their topological structures.

Figure 2.12C: Network Structure Representations

214

2.4.5.6 Network Complexity Measures

Based on the approach described in the preceding section, the following methodology is provided

for determining the study's network complexity metrics. The majority of the measures used were

derived from earlier research studies. While the literature demonstrates that a variety of complexity

measures can be used to evaluate the morphology of any given network, only six have been chosen.

Additionally, the methodology progresses from the most fundamental to the most advanced measure in terms of the data required to compute them. The following sections define the six complexity metrics utilized in this study, each followed by the technique used to calculate it.

Network Complexity Measure – Coefficient of Network (CNC/CM1)

An AON network's topological structure, or complexity, can be quantified by the number of links or arcs (a) connecting its (n) nodes or project activities in precedence relationships. This measure, known as the coefficient of network (CNC), was developed by Pascoe (1966) as the ratio of "a" over "n" (Nassar and Hegab 2006; Demeulemeester et al. 2003) and was later redefined by various academics to become one of the best-known complexity measures (Demeulemeester et al. 2003). CM1 will denote CNC in this study, and its calculation will only include non-dummy activities, as shown in Equation 2.53. The numbers of arcs (A) and nodes (N) are capitalized to indicate the exclusion of dummy activities. In the numerical implementation of CM1, the adjacency matrix can serve to calculate the number of arcs A, which is equal to the sum of all its elements. As a result, in the methodology provided in Section 2.4.5.4, a subroutine for summing up the dependency matrix components, minus dummies, can be added between Step 2 and Step 3. To

validate this methodology, refer to the AON diagram in Figure 2.9 of the exemplar network and

215

assign A and N values of 24 and 15, respectively. Doing so should lead to a CM1 value of 1.6 for

this network. Similarly, the network dependency matrix provided in Table 2-12 should be used to

get the same information.

CM1 = A / N = (total number of arcs excluding dummies) / (total number of nodes excluding dummies)     [Eq. 2-53]
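Numerically, Equation 2-53 reduces to summing the dependency matrix entries. As an illustrative Python sketch (assuming dummy rows and columns have already been removed, per the subroutine described above):

```python
# Illustrative sketch of Equation 2-53: CM1 (the coefficient of network, CNC)
# is the arc count over the node count, dummies excluded. The matrix passed
# in is assumed to have dummy rows/columns already removed.
def cm1(dep_matrix):
    arcs = sum(map(sum, dep_matrix))      # each '1' entry is one arc
    nodes = len(dep_matrix)
    return arcs / nodes
```

For the exemplar network's non-dummy submatrix (A = 24, N = 15), this ratio reproduces the CM1 value of 1.6 stated above.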

Network Complexity Measure – Paths Ratio (CM2)

As with the criticality index established in Section 2.4.5.5, a global CI can be constructed in terms

of the network paths connecting activities from 'Source/Start' to 'Sink/End.' As previously defined

and established, a network may have critical or non-critical paths. The overall number of critical

network paths is always less than the total number of non-critical network paths. Intuitively, the percentage of a network's critical paths, identified across many simulated schedules, out of all feasible paths connecting its first to its last activity may serve as a measure of complexity. With the same number of simulation runs, this complexity measure allows the comparison or classification of different project networks. CM2 is the abbreviation for this measure, which is specified in Equation 2.54.

CM2 = (total number of critical paths from simulated schedules / total number of all possible paths) × 100%     [Eq. 2-54]

Its numerical computation can be accomplished after completing Step 4 of the methodology presented in Section 2.4.5.4 (Identification of Network Paths). The numerical calculation of CM2 can be performed using the outputs of the exemplar network provided in Table

216

2-13 for all possible paths and Table 2-14 for all the critical paths acquired after 100 simulation

runs. A CM2 value of 38.46 % is determined for the exemplar network schedule with the resulting

information.
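Equation 2-54 can be sketched as follows; the Python fragment is illustrative only:

```python
# Illustrative sketch of Equation 2-54: CM2 is the share (in percent) of a
# network's possible paths that emerged as critical across simulated schedules.
def cm2(num_critical_paths: int, num_all_paths: int) -> float:
    return 100.0 * num_critical_paths / num_all_paths
```

With the exemplar network's 5 critical paths (Table 2-14) out of 13 possible paths (Table 2-13), the sketch reproduces the stated CM2 of 38.46%.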

Network Complexity Measure - Johnson Measure (D)/(CM3)

The Johnson measure denoted by D or CM3 in this study is another complexity measure based on

basic project network information, specifically project activity dependency relationships. Its formula, given by Equation 2.55, derives from the work of scholars such as Nassar and Hegab (2006, p. 556) and Boushaala (2010, p. 773), whose contributions to the study of project network complexities are invaluable.

D = Σ (i = 1 to N) max(0, s_i − p_i)     [Eq. 2-55]

where N is the total number of project activities, p_i represents the number of predecessors of activity i, and s_i denotes the number of its successors.

Numerically, CM3 can be derived from a network dependency matrix based on the binary

information it contains, with “1” symbolizing a relationship between activity “i” and “j” and “0”

a lack of precedence relationship. After determining the network dependency matrix in Step 2 of

the methodology provided in Section 2.4.5.4, one can computerize the operations of the flowchart

provided in Appendix C.7 to enable the calculation of CM3.

217

The dependency matrix of the exemplar network provided in Table 2-12 can be used to validate

this methodology. As structured, rows and columns of this dependency matrix are labeled with the

project network activities. The predecessor(s) of any project activity, inscribed at the upper row of

the dependency matrix as shown, can be found on the extreme left column of the matrix as follows:

first, localize the activity associated column; second, find all the “1s” in this column; last, starting

at each one, figuratively draw a horizontal line to cross the vertical line symbolically passing

through the extreme left column. At the intersection of the horizontal and vertical lines, the resulting activity represents one of the predecessors of the activity in consideration, whose total number is the sum of all the “1s” found in the activity's column. Likewise, the successor(s) of any project activity,

inscribed at the very left column of the dependency matrix, can be found on the upper row of the

matrix column.

As an example, the activity "H," which is shown in Figure 2.13, has activities F and G as its

predecessors. To find its successors, first, find "H" on the far left of the matrix, then find all the

"1s" on the matrix row that corresponds to "H." Finally, draw a vertical line from the only "1"

found in this row to reach the uppermost row of the dependency matrix. Activity "M" located at

the intersection of the vertical line and the horizontal line figuratively passing through the upper row

of the matrix represents the only successor of activity "H." In summary, "H" possesses two (2)

predecessors F and G, and one (1) successor, "M," which are also obtained when using the network

AON diagram shown in Figure 2.9. With the resulting information, the difference between the

total numbers of the predecessors of "H" and successors of "H" is equal to one. Similarly, the

predecessors and successors of all project activities may be identified, and the difference between


their predecessor and successor total numbers may be computed as follows. Note that adding or

removing dummy activities from the dependency matrix does not affect the final value of D.

D = max(0, 1 − 0) + max(0, 3 − 1) + ⋯ + max(0, 1 − 3) + max(0, 0 − 1) = 10

Successors of Activities in the Rows


Sce Mob A B C D E F G H I J K L M T/O Sink Sum
Sce 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Mob 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 3
Predecessors of Activities in the Columns

A 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2
B 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
C 0 1 0 1 0 0 0 1 0 0 0 0 0 3
D 0 0 0 0 0 0 0 0 1 0 0 0 1
E 0 1 1 0 0 0 0 0 0 0 0 2
F 0 0 1 1 0 1 0 0 0 0 3
G 0 1 1 0 1 0 0 0 0 3
H 0 0 0 0 0 1 0 0 1
I 0 0 0 1 0 0 0 1
J 0 0 1 0 0 0 1
K 0 0 0 1 0 1
L 0 0 1 0 1
M 0 1 0 1
T/O 0 1 1
Sink 0 0

Sum 0 1 1 1 1 2 1 2 1 2 2 2 2 3 1 3 1

Figure 2.13: Illustration of Activity Predecessors and Successors on a Network Dependency


Matrix

Network Complexity Measure - Normalized Complexity Measures (Cn) - (CM4)

Another complexity measure considered for the analysis of networks collected for this study is the

one that Nassar and Hegab (2006) developed. The number of activity nodes (n) and the number of

links or arcs (a) linking them serve as a measure of their complexity (identical to CM1). As


depicted in Equation 2.56, the developed measure, denoted by Cn or CM4 in this study, is
formulated in percentage (%), unlike most measures for network complexities which are expressed

as unitless coefficients. In addition, this measure may serve to rank and compare project

alternatives by indicating the option that may be simple to manage (Nassar and Hegab 2006).

Nassar and Hegab (2006, p. 557) advised that redundant links be removed from the network before

analyzing network complexity because existing and indirect links can replace them. Otherwise,

incorporating them would be misleading, as it would imply a higher level of complexity than exists

in the project network. Whenever possible, dummy activities "Source" and "Sink" will be omitted from the determination of "n" or "a" in this study, justifying the use of "N" and "A" in Equation 2.56, as in the CM1 determination.

Cₙ(%) = 100 ∙ log[A/(N − 1)] / log[(N² − 1)/(4(N − 1))]   if N is odd    [Eq. 2-56]
Cₙ(%) = 100 ∙ log[A/(N − 1)] / log[N²/(4(N − 1))]         if N is even

The authors recommended Table 2-16 below to aid in interpreting Cn values.

Complexity Measure (%)   Interpretation
70–100                   Seriously consider reviewing the schedule
50–69                    Consider reviewing the schedule
30–49                    Satisfactory, but may be improved
0–30                     Acceptable

Table 2-16: Interpretation of the Complexity Measure (Cn) Values


Courtesy of Nassar and Hegab (2006, p. 561)


Using the formula in Equation 2.56, Cn can be calculated directly or as a subroutine added to any of the previously described methods. One may use the following information on the exemplar network to validate the methodology: activity nodes (N = 15) and links or arcs (A = 24).

Since N equals 15, N is odd, Cn may be determined using Equation 2.57.

Cₙ = 100 ∙ log[24/(15 − 1)] / log[(15² − 1)/(4(15 − 1))] = 38.88 %   [Eq. 2-57]

According to Table 2-16, a Cn value of 38.88% is satisfactory, but the project network may be improved.
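A sketch of Equation 2-56 in Python follows (the function name is illustrative); it reproduces the exemplar value of 38.88% for N = 15 and A = 24. Note that the ratio of logarithms is base-invariant, so any logarithm base gives the same result.

```python
import math

def cn_percent(A, N):
    """Nassar & Hegab (2006) complexity measure Cn (Eq. 2-56), in percent.

    A : number of links/arcs; N : number of activity nodes, with redundant
    links and the dummy Source/Sink removed beforehand, as the authors advise.
    """
    num = math.log10(A / (N - 1))
    if N % 2 == 1:                                   # N odd
        den = math.log10((N ** 2 - 1) / (4 * (N - 1)))
    else:                                            # N even
        den = math.log10(N ** 2 / (4 * (N - 1)))
    return 100 * num / den

print(round(cn_percent(24, 15), 2))  # 38.88 for the exemplar network
```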

Network Complexity Measure – Density (OS)/ (CM5)

Another complexity measure considered for this study is density, also known as Order of Strength

(OS). The complexity measure OS, also referred to as CM5 in this study, is “defined as the number

of precedence relations (including the transitive ones [,] but not including the arcs connecting the

dummy start or end activity) divided by the theoretical maximum number of precedence relations

[n × (n - 1)/2], where n denotes the number of no[n-]dummy activities in the network” (Su et al. 2016).

As stated earlier in the paragraph devoted to the complexity measure CM1, n in the expression [n × (n − 1)/2] will be replaced with a capital letter to indicate the use of non-dummy activities in the expression of OS. In addition, the total number of precedence relations

can be calculated as the sum of the elements of the network precedence matrix without the dummy

activities. Accordingly, OS can be expressed by Equation 2.58, provided below, which may be used

for its numerical computation in a computer program.


OS = 2 ∙ ∑ᵢ ∑ⱼ DependencyMatrix(i, j) / [N(N − 1)]   [Eq. 2-58]

N represents the total number of non-dummy nodes in an AON network schedule in the above

equation. By adding a subroutine after ‘Creation of Network Dependency Matrix’ in Step 2 of the

methodology proposed in Section 2.4.5.4, one may calculate the density of any network. The

exemplar network, which has 15 non-dummy activities and a total of 24 precedence relations, can

serve once again as the illustration of this complexity measure. By substituting both values in

Equation 2.58, an OS value of 0.2286 [=24x2/(15x14)] can be found for the exemplar network.
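Equation 2-58 reduces to a one-line computation on the dummy-free dependency matrix. A minimal sketch (names illustrative):

```python
import numpy as np

def order_of_strength(dep):
    """Density / Order of Strength OS (Eq. 2-58).

    dep : 0/1 dependency matrix whose dummy rows/columns (Source, Sink)
    have already been removed; its total sum is the number of precedence
    relations, and N is the number of non-dummy activities.
    """
    dep = np.asarray(dep)
    N = dep.shape[0]
    return 2 * dep.sum() / (N * (N - 1))

# Hypothetical 3-activity chain A -> B -> C: 2 relations among 3 activities
dep = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]])
print(order_of_strength(dep))  # 2*2/(3*2) = 0.6667
```

With the exemplar network's 24 relations and 15 activities, the same formula yields 24 × 2 / (15 × 14) = 0.2286, as computed above.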

Network Complexity Measure - Restrictiveness (RT)/ (CM6)

The last complexity measure considered for this study is restrictiveness (RT). This measure was

derived from graph theory and first introduced by Thesen (1977). Nassar and Hegab (2006, p.557)

justified its application to construction schedules because “project network usually falls under a

special category in graph theory called directed acyclic graph.” Its determination requires not only

an addition of a couple of dummy activities, “Start/source” and “Finish/sink,” to the project

network schedule but also a known reachability matrix R of the network in question. Given the

network dependency matrix, also known as an adjacency matrix, one can extract from it a new

matrix called the reachability matrix R, whose entries rᵢⱼ are as follows: rᵢⱼ = 1 if the activities in row “i” and column “j” are reachable, i.e., connected by a path containing one or more arcs; otherwise, rᵢⱼ = 0 if no such path exists.

The value of RT can be calculated using Equation 2.59 expressed in terms of the elements of the

R matrix and the set of paths V connecting activity “i” to activity “j” of the network. RT values


vary between 0 and 1, with a value of 0 indicating a perfect parallel digraph (directed graph or

AON diagram) and a value of 1 indicating a serial digraph (Su et al. 2016, p.3-4). Su et al. (2016,

p.3) added that “[r]edundant arcs do not affect RT, since it is based on the

reachability matrix (the closure of the connectivity matrix)” (Latva-Koivisto 2001, p. 16).

RT = [2 ∑₍ᵢ,ⱼ₎∈V rᵢⱼ − 6(N + 1)] / [N(N − 1)]   [Eq. 2-59]

Numerically, RT can be calculated after the dependency matrix determination in Step 2 of the

methodology proposed in Section 2.4.5.4. The validation of the proposed methodology for the

calculation of RT can be performed by applying the exemplar network information from previous

sections. Using its dependency matrix, the reachability matrix of the exemplar network, as

provided in Table 2-17 below, can be derived. Thus, its RT value can be determined using Equation

2.59 as follows:

RT = [2 ∑ rᵢⱼ − 6(N + 1)] / [N(N − 1)] = [2 × 116 − 6(15 + 1)] / [15 × (15 − 1)] = 0.6476
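This computation can be sketched in Python: the reachability matrix is obtained from the dependency matrix by Warshall's transitive closure (taken reflexively, as in Table 2-17), and RT then follows from Equation 2.59. The function name is illustrative; the two extremes, RT = 1 for a serial network and RT = 0 for a fully parallel one, provide a quick check of the reconstruction.

```python
import numpy as np

def restrictiveness(dep):
    """RT (Eq. 2-59) from a dependency (adjacency) matrix that already
    includes the dummy Start/Source and Finish/Sink nodes.

    The reachability matrix R is taken reflexively (r_ii = 1), as in
    Table 2-17, and N is the number of non-dummy activities.
    """
    dep = np.asarray(dep, dtype=bool)
    n = dep.shape[0]
    reach = dep | np.eye(n, dtype=bool)
    # Warshall's transitive closure: r_ij = 1 iff a path links i to j
    for k in range(n):
        reach |= np.outer(reach[:, k], reach[k, :])
    N = n - 2                              # exclude Source and Sink
    return (2 * reach.sum() - 6 * (N + 1)) / (N * (N - 1))
```

For a serial chain Source → A → B → C → Sink this returns 1.0, and for Source → {A, B, C} → Sink in parallel it returns 0.0, matching the interpretation of RT given above.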


Sce Mob A B C D E F G H I J K L M T/O Sink


Sce 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Mob 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
A 1 0 0 1 0 0 0 0 0 1 0 1 0 1 1
B 1 1 1 0 1 0 1 1 1 1 1 1 1 1
C 1 1 0 1 0 1 1 1 1 1 1 1 1
D 1 0 0 0 0 0 0 0 1 0 1 1
E 1 1 1 1 1 0 1 1 1 1 1
F 1 0 1 1 0 1 1 1 1 1
G 1 1 1 0 1 1 1 1 1
H 1 0 0 0 0 1 1 1
I 1 0 0 1 0 1 1
J 1 0 1 0 1 1
K 1 0 0 1 1
L 1 0 1 1
M 1 1 1
T/O 1 1
Sink 1

Table 2-17: Exemplar Network Reachability Matrix

2.4.6 Model Development to Investigate Project Schedules Underlying Behaviors

As indicated in Section 1.3.2, construction project networks are built up of nodes representing

activities that are linked by precedence relationships. Each activity is a project task that must be

completed within the time range specified for the project to be completed in its entirety. Aside from project activities, the enormous interactions among the numerous parties involved in building a project, especially a large one, place construction projects in the category of complex systems. Scientists have utilized random matrices to simulate complex systems and have examined the distributions of their eigenvalues to understand their behavior better. This section aims to use the ubiquitous pattern known as "universality," which is based on RMT and has been used to


explore a variety of complex system characteristics. This is justified by the similarities between construction project networks and the fields in which the Tracy-Widom laws have been applied.

As a result, examining the distribution of the extreme eigenvalues of random matrices obtained

from probabilistic durations of project network activities could contain the key to identifying their

underlying probability distribution. Based on a sample of activity durations on p activities of a

project network, it is possible to derive reasonable inferences regarding activity durations in the

general population. Creating a mathematical model to draw samples is thus the first step toward

attaining the purpose of the section. The second stage is to develop a simulation technique for

randomly selecting the samples required for a thorough statistical investigation of the sample

covariance matrix's eigenvalues. The final step is to perform multivariate analysis, specifically,

hypothesis testing using the p correlated variables assessed jointly, to establish whether the Tracy-

Widom distribution is a reasonable distribution for the population of interest.

2.4.6.1 Model Preparation: Encoding a Project Network Schedule into a Matrix

This research is in part inspired by the discovery the Czech physicist Petr Šeba made after plotting,

on a computer, thousands of bus departure times that each bus driver’s paid spy collected at a bus

stop in Cuernavaca, Mexico. A bus driver would use this information to maximize his profits by

either slowing down if the bus ahead had just left so that passengers would accumulate at the next

stop or speeding up to pass other buses if the bus ahead had left long ago. The discovery confirmed

his suspicion about the chaotic interactions between drivers, causing the spacing between bus times

to have the same behavior pattern as earlier in his experiments involving chaotic systems in


quantum physics (Wolchover 2014). This ubiquitous pattern, which scientists referred to as

“universality,” has often appeared in investigating behaviors of complex systems using RMT.

RMT is concerned with the empirical distribution of the eigenvalues of random matrices. In

addition, the similarities between construction project networks and the fields of applications of

the Tracy-Widom distribution laws, as revealed in Section 2.4.3, have led to the belief that RMT

may be critical to discovering a natural probability distribution for the extreme eigenvalues of

random matrices for project networks. A random matrix is a collection of entries tabulated in

columns or rows. Each represents a random vector whose independently chosen observations are

taken from a multivariate population with a known or unknown distribution law characterizing

that population.

In construction management and engineering, a dependency matrix, whose entries encode with “1” or “0,” respectively, whether a link exists between a pair of activities, serves to describe the topological structure of a project network. Given project scheduling information, Section 2.4.5.3

and Section 2.4.5.2 provide the methodology for generating dependency matrices of project

networks and probabilistic computing durations of project activities. In addition, the derived data

can serve to encode the project network into a matrix. Figure 2.14A and Figure 2.14B illustrate

thirteen different schemes to encode a network into a matrix. Since the ultimate use of the encoded

network matrix is to study the behaviors of project networks using eigenvalues of the derived

matrix, the following reasons guided the selection process to discard some of the proposed

schemes.


As constructed, the first and foremost reason is that the first twelve matrices are not appropriate for conducting the research investigation. They are built with either fixed binary numbers “0” and “1” only, fixed binary numbers and random activity durations, or a combination of activity durations and their early (ES/EF) and late (LS/LF) times; therefore, they are not random matrices. However, the rectangular matrix in scheme thirteen, whose entries represent EF times of project activities computed from probabilistic activity durations sampled from a triangular distribution with known parameters, is random. The second reason is that all entries below or above the diagonal of

an upper or a lower triangular matrix are zeros. With this construction, there is a risk of

degeneration in standardizing these matrices, as most multivariate analysis techniques require. The

last reason is the redundancy of entries in some of the matrices. For example, keying activity durations or calculated times k+1 times, with k representing the total number of an activity's predecessors or successors, may also result in degenerate matrices.

Although the bulk of the initially considered schemes are not random matrices, they are worth mentioning to deter any future attempt to consider them in a similar investigation using multivariate statistical analysis. To sum up, among the thirteen ways of encoding a project network

matrices needed for this study. Therefore, the following section provides a methodology for

creating a mathematical model based on the matrix proposed in scheme thirteen.


Figure 2.14A: Different Schemes for Encoding a Project Network Schedule into a Sample Data Matrix

Figure 2.14B: Different Schemes for Encoding a Project Network Schedule into a Sample
Data Matrix

Table 2-18 shows an example of encoding a project network schedule into a matrix using the

selected scheme. The encoded project network schedule is for the exemplar network, which is

frequently utilized throughout this chapter to quickly show newly developed concepts and

associated approaches. Table 2-10 contains information about the project network, including a list

of activities and their defined durations. Finally, refer to Table 2-12 and Figure 2.9 for the project

network’s dependency matrix and AON diagram.


Network activities
Run
Act. 1 Act. 2 Act. 3 Act. 4 Act. 5 Act. 6 Act. 7 Act. 8 Act. 9 Act. 10 Act. 11 Act. 12 Act. 13 Act. 14 Act. 15 Act. 16 Act. 17
No.
1 0.011 8.65 35.88 22.69 28.97 56.79 29.61 46.46 51.28 59.84 64.50 59.48 72.45 91.33 73.58 94.66 94.67
2 0.011 8.45 26.45 17.79 25.47 49.31 22.69 46.99 42.31 55.47 61.13 46.65 61.45 77.99 68.20 81.94 81.96
3 0.012 7.85 25.89 17.27 23.10 49.41 23.32 41.04 37.80 49.06 54.64 49.28 61.22 79.75 60.42 82.69 82.70
4 0.013 9.13 26.91 20.43 27.60 52.16 23.96 47.85 39.05 56.09 60.35 47.39 62.08 77.40 70.65 80.47 80.49
5 0.011 10.34 31.36 19.72 27.33 53.56 32.00 49.35 53.96 60.62 70.28 57.99 73.87 88.80 69.71 91.88 91.89
6 0.010 9.90 37.38 20.90 27.99 53.61 26.19 48.24 43.11 56.10 64.15 60.98 64.12 90.38 65.48 94.21 94.22
7 0.012 10.33 30.53 22.49 31.17 49.69 31.27 47.98 49.87 58.43 64.42 53.94 68.86 81.46 68.87 84.21 84.22
8 0.009 8.62 27.09 21.15 28.79 53.06 24.91 53.44 46.82 59.20 65.19 56.23 67.94 89.92 68.81 94.01 94.02
9 0.014 10.23 29.35 19.35 27.75 50.66 24.71 43.96 40.21 49.97 60.28 49.71 62.85 85.68 60.43 89.69 89.70
10 0.011 9.06 30.62 18.77 27.70 52.42 27.13 46.31 43.42 54.82 57.98 53.87 64.77 84.58 65.44 87.76 87.77
11 0.012 7.90 27.80 20.96 27.30 54.39 25.88 46.96 45.73 54.57 62.19 54.94 61.90 79.35 65.45 83.14 83.16
12 0.015 6.55 34.99 19.56 27.18 52.78 20.32 48.32 35.00 56.28 59.63 55.00 70.44 77.79 70.07 80.57 80.58
13 0.015 7.44 25.93 20.08 27.80 53.45 29.01 44.86 50.70 59.20 63.78 45.76 65.02 82.15 72.87 85.58 85.59
14 0.010 9.39 28.20 23.42 30.78 57.36 23.57 55.61 41.42 61.25 71.88 55.29 72.89 94.52 72.33 97.62 97.63
15 0.010 7.88 34.97 19.53 27.22 55.36 28.91 52.03 44.52 57.76 66.29 56.38 66.50 89.23 70.49 93.57 93.58
16 0.012 8.73 37.03 20.85 27.50 63.85 31.16 55.60 52.68 63.40 70.34 59.10 72.70 88.90 77.69 93.29 93.31
17 0.010 6.55 25.52 18.72 24.37 41.83 20.88 45.87 35.82 53.63 59.57 43.46 61.54 79.56 63.39 83.59 83.60
18 0.009 9.17 36.31 19.01 27.33 55.25 22.89 48.04 46.27 56.21 60.62 61.60 65.88 82.56 70.87 86.30 86.32

Table 2-18: Illustration of a Sample Data Matrix Derived from Early Finish Times of Project Network Activity
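A sample data matrix like Table 2-18 can be reproduced by a small simulation: each run samples triangular activity durations and performs a CPM forward pass to obtain EF times, producing one row per run (encoding scheme thirteen). The sketch below is illustrative only; the function name, the hypothetical three-activity chain, and the (a, c, b) triangular parameters are assumptions, not the exemplar network's actual values.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_ef_matrix(dep, tri_params, n_runs):
    """Build an (n_runs x p) sample data matrix of early-finish (EF) times.

    dep : 0/1 matrix, dep[i, j] = 1 if activity i immediately precedes j,
    with activities already listed in topological order.
    tri_params : list of (min, mode, max) triangular parameters, one per
    activity. All parameter values here are illustrative.
    """
    dep = np.asarray(dep)
    p = dep.shape[0]
    X = np.zeros((n_runs, p))
    for r in range(n_runs):
        d = [rng.triangular(a, c, b) for (a, c, b) in tri_params]
        ef = np.zeros(p)
        for j in range(p):                       # CPM forward pass
            preds = np.nonzero(dep[:, j])[0]
            es = ef[preds].max() if preds.size else 0.0
            ef[j] = es + d[j]                    # EF = ES + duration
        X[r] = ef
    return X

# Hypothetical 3-activity chain A -> B -> C, five simulation runs
dep = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
X = sample_ef_matrix(dep, [(1, 2, 3), (2, 4, 6), (1, 1.5, 2)], n_runs=5)
```

Each column of X then plays the role of one of the p correlated variables analyzed in the following sections.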

2.4.6.2 Model Development: Sample Data Matrix Transformation

Repetitive sampling to make inferences about a population—set of people or items—is a widely

employed process to uncover the population's underlying behaviors in various fields of

applications of probability theory and statistics. Prior to making probability statements, the process

starts with the formulation of a statistical model 𝒳, also known as RMM, to describe all random

occurrences in the population of interest. For such a population, the true sample covariance matrix

𝜮, primarily unknown, characterizes the probability of these occurrences. As defined in Section 2.3.1, an RMM is a probability triple (𝛺, ℙ, ℱ), where 𝛺 represents a set, ensemble, or group of all possible matrices of interest, ℙ is a probability measure defined on 𝛺, and ℱ is a family of measurable subsets of 𝛺. Table 2-19, provided below, summarizes the model, whose elements will be defined in this section through a step-by-step procedure.


Model Definition/System: population with parameters (µ, 𝛴) and p features:
- 𝑿₁, 𝑿₂, …, 𝑿ₚ ≝ p variables, each representing EF times of project network activities (network of size p)
- 𝑋ᵢ ~ Triangular Distribution(a, b, c), 𝑖 = 1, ⋯, 𝑝

Probability space/Triplet (𝛺, ℙ, ℱ):
- 𝛺 = sample space = group of random matrices 𝑊ₚ(𝑛, 𝜮): the set of symmetric, positive definite random matrices 𝑺_NET = 𝑿ᵀ𝑿 ~ 𝑊ₚ(𝑛, 𝜮)
- ℱ = σ-algebra on 𝛺 (subsets of Ω)
- ℙ = probability measure on 𝛺: (i) known expression, but complicated to compute/simplify; (ii) assumed TW limit law 𝐹₁(𝑡); (iii) approximated using sampled matrices (unbiased estimator of 𝜮) in terms of eigenvalues, with unknown n

Random process representation: 𝒳 ≝ (𝑋₁, 𝑋₂, …, 𝑋ₚ)
- Input: n independent samples 𝒳₁, 𝒳₂, ⋯, 𝒳ₙ from the population
- Output: 𝑿ₙₓₚ ≝ sample data matrix with observed values 𝒙₁, ⋯, 𝒙ₚ of 𝒳, where 𝒙ⱼ = (𝑥₁ⱼ, ⋯, 𝑥ₙⱼ)ᵀ as in Scheme #13, in terms of EF times:

    𝑿ₙₓₚ = [ EF₁₁ EF₁₂ ⋯ EF₁ₚ ; EF₂₁ EF₂₂ ⋯ EF₂ₚ ; ⋮ ⋮ ⋱ ⋮ ; EFₙ₁ EFₙ₂ ⋯ EFₙₚ ]

Table 2-19: The Proposed Project Network Scheduling Mathematical Model

The development of the following procedure takes into account the conditions of application of the well-known "universality" results, mainly the celebrated Johnstone's theorem, extended by various authors such as Soshnikov and Péché to relax most of its requirements. This theorem has been extensively used to describe the limiting behaviors of complex systems through the adequately centered and scaled largest eigenvalues of the random matrices that represent them. Thus, doing so will help to fully define the model required to investigate the limiting behavior of probabilistic durations of project network activities and therefore find their true probability distribution, which for now is assumed to be the Tracy-Widom distribution 𝐹₁ for the reasons provided in the previous sections.


Accordingly, inspired by the ad hoc construction proposed by Johnstone (2001) and the formulation of the statistic 𝑇², also called Hotelling's 𝑇² (refer to Johnson and Wichern 2019), the following procedure serves to standardize the random matrix 𝑿 = [𝒙₁, ⋯, 𝒙ₚ], with each 𝒙ⱼ = (EF₁ⱼ, …, EFₙⱼ)ᵀ representing the EF times of the jth activity of a given benchmark project network. As is the practice in the fields of application of probability and statistics, before any descriptive and inferential statistical analysis, practitioners usually standardize raw data [e.g., Saccenti et al. (2011) and Forkman et al. (2019)]. This helps avoid degenerate matrices that fail to converge to nontrivial limits, especially in limit laws when n→∞.

Step 1: Standardize 𝑿 to create a new matrix 𝑾 = [𝒘₁, ⋯, 𝒘ₚ] such that each of its columns has mean zero and unit Euclidean norm (‖∙‖₂), as follows in Equation 2.60:

𝒘ⱼ = (𝒙ⱼ − 𝒙̄ⱼ) / ‖𝒙ⱼ − 𝒙̄ⱼ‖₂   [Eq. 2-60]
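Equation 2-60 can be sketched in a few lines of Python with NumPy (the function name is illustrative):

```python
import numpy as np

def standardize_columns(X):
    """Step 1 (Eq. 2-60): center each column of X to mean zero, then
    scale it to unit Euclidean norm."""
    Xc = X - X.mean(axis=0)                    # column-wise centering
    return Xc / np.linalg.norm(Xc, axis=0)     # column-wise unit norm

# Quick check on hypothetical raw EF data (12 runs, 4 activities)
X = np.random.default_rng(0).normal(50, 5, size=(12, 4))
W = standardize_columns(X)
```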

Step 2: Synthesize Gaussian-like data 𝑿̃ = 𝑹 ∘ 𝑾 = [𝒙̃₁ 𝒙̃₂ … 𝒙̃ₚ], whose entries 𝑥̃ᵢⱼ = rᵢⱼwᵢⱼ are obtained by multiplying the corresponding elements rᵢⱼ and wᵢⱼ of the matrices 𝑹 and 𝑾, respectively. The elements rᵢⱼ of the matrix 𝑹 are randomly selected according to a long-tail distribution such as the chi-square (𝜒²) distribution. This is crucial not only because of the interesting features of long-tail distributions but also to randomize the entries 𝑥̃ᵢⱼ of 𝑿̃, since those of 𝑾 are not random.

Step 3: Select a significance level α value to construct a confidence interval to encompass all

plausible random values of the test statistic that would not be rejected by the level α-test of the null

hypothesis. Otherwise, observed values may be too far from the hypothesized value. This process,


used by practitioners, helps to target observed values that lie in the 100(1 − α)% interval, which is also

referred to as the acceptance region for the designed test. One may refer to Johnson and Wichern

(2019) for more on this topic. Accordingly, let Equation 2.61 define the matrix 𝑹

𝑹 = [𝑛𝑢 ∙ 𝜒²rnd(𝑛, 𝑛, 𝑝)]^(1/2)   [Eq. 2-61]

where the constant 𝑛𝑢 is defined by Equation 2.62 below.

𝑛𝑢 = (𝑛 − 1) 𝜒²ₙ(1 − 𝛼/2) / [𝑛(𝑛 − 𝑝)]   [Eq. 2-62]

𝜒²rnd(𝑛, 𝑛, 𝑝) denotes a MATLAB function capable of generating an 𝑛 × 𝑝 random matrix with entries distributed according to the chi-square distribution 𝜒² with n degrees of freedom. 𝜒²ₙ(1 − 𝛼/2) represents the inverse CDF of 𝜒² with n degrees of freedom evaluated at a probability value in (0, 1).

Note that there is some leeway in constructing the expression of the above equation, with regard not only to the choice of α (here α/2) but also to the long-tail distribution. Appendix G provides a handful of expressions examined empirically before selecting the one provided here, as it produced satisfactory results for the current investigation.

Step 4: Create a real Wishart matrix or sample covariance matrix 𝑺 from the new data matrix 𝑿̃:

𝑺 = 𝑿̃ᵀ𝑿̃ / (𝑛 − 1)   [Eq. 2-63]


Substituting 𝑿̃ with its expression given in Step 2, and then 𝑹 with its expression provided by Equation 2.61, the expression of the sample covariance matrix 𝑺 becomes

𝑺 = (𝑹 ∘ 𝑾)ᵀ(𝑹 ∘ 𝑾)/(𝑛 − 1) = {[𝑛𝑢 ∙ 𝝌²rnd(𝑛, 𝑛, 𝑝)]^(1/2) ∘ 𝑾}ᵀ {[𝑛𝑢 ∙ 𝝌²rnd(𝑛, 𝑛, 𝑝)]^(1/2) ∘ 𝑾}/(𝑛 − 1)
  = [𝑛𝑢/(𝑛 − 1)] [𝝌²rnd(𝑛, 𝑛, 𝑝)^(1/2) ∘ 𝑾]ᵀ [𝝌²rnd(𝑛, 𝑛, 𝑝)^(1/2) ∘ 𝑾]

Since the matrix product in the above expression is obtained by straight multiplications of the corresponding entries of each matrix of interest, Equation 2.64 is the final derived expression of 𝑺. From now on, it will be referred to as 𝑺_NET, with the subscript NET added to avoid any confusion with the generic sample covariance matrix 𝑺 used throughout this manuscript up to this point, and the term sample covariance matrix will refer to 𝑺_NET, denoting the sample covariance matrix for probabilistic durations of a project network as defined by Equation 2.64.

𝑺_NET^(α,n,p) = 𝑐₍α,n,p₎ [𝝌²rnd(𝑛, 𝑛, 𝑝)^(1/2) ∘ 𝑾]ᵀ [𝝌²rnd(𝑛, 𝑛, 𝑝)^(1/2) ∘ 𝑾]   [Eq. 2-64]

with the constant 𝑐₍α,n,p₎ given by Equation 2.65 below:

𝑐₍α,n,p₎ = 𝜒²ₙ(1 − 𝛼/2) / [𝑛(𝑛 − 𝑝)]   [Eq. 2-65]
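Steps 2 through 5 can be sketched in Python with NumPy/SciPy. The constant c follows the reconstruction of Equations 2-62 and 2-65 adopted in this section, so treat it as an assumption rather than a definitive implementation; the function names are illustrative.

```python
import numpy as np
from scipy import stats

def sample_covariance_net(W, alpha=0.05, rng=None):
    """Steps 2-4: synthesize a randomized data matrix X~ = R o W and form
    the sample covariance matrix S_NET (Eq. 2-64).

    W : the n x p matrix of standardized EF times from Step 1.
    The constant c is the reconstruction of Eq. 2-65 used in this section.
    """
    n, p = W.shape
    assert n > p, "Eq. 2-64 requires n > p"
    rng = rng if rng is not None else np.random.default_rng()
    q = stats.chi2.rvs(df=n, size=(n, p), random_state=rng)   # chi2 entries
    c = stats.chi2.ppf(1 - alpha / 2, df=n) / (n * (n - p))   # Eq. 2-65
    Z = np.sqrt(q) * W                                        # elementwise product
    return c * (Z.T @ Z)                                      # p x p, Eq. 2-64

def sorted_eigenvalues(S):
    """Step 5: eigenvalues of the symmetric S_NET, largest first."""
    return np.linalg.eigvalsh(S)[::-1]
```

Because Z.T @ Z is positive semidefinite and c > 0, the resulting eigenvalues are nonnegative and can be sorted as l₁ ≥ l₂ ≥ ⋯ ≥ lₚ as Step 5 requires.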

Step 5: Compute the eigenvalues of 𝑺_NET^(α,n,p) and sort them as 𝑙₁ ≥ 𝑙₂ ≥ ⋯ ≥ 𝑙ₚ. Given the expression of 𝑺_NET^(α,n,p) provided in Equation 2.64, the sample size n must be greater than p; otherwise, because the constant 𝑐₍α,n,p₎ is undefined, there will be no sample covariance matrix 𝑺_NET^(α,n,p) associated with the sample data matrix 𝑿.


Step 6: Determine the number of eigenvalues and the universality theorem required to analyze the limiting behavior of the eigenvalues acquired in Step 5. Given the exploratory character of this study, it is prudent to examine the mth largest eigenvalue of 𝑺_NET rather than focusing exclusively on the largest eigenvalue. As a result, investigating the first through fourth largest eigenvalues of 𝑺_NET should be adequate to derive additional insight into the limiting behavior of project activity durations.

Recalling the universality of the TW laws (see Section 2.3.5), the sample covariance matrix 𝑺_NET, whose entries are not quite Gaussian, can serve to study the limiting behavior of probabilistic durations of project networks. According to its definition, the 𝑝 × 𝑝 matrix 𝑺_NET belongs to the class of sample covariance matrices drawn from a more general population that is not governed by the normal distribution, for which researchers such as Bao et al. (2015) have recently continued proving universality for the limiting behavior of the normalized largest eigenvalue under relaxed assumptions on the distribution of the 𝑥ᵢⱼ. As a result, Soshnikov's (2002) universality theorem, as extended by Péché (2008, 2009) and other authors by relaxing some of its assumptions, is appropriate for this study. This is justified by the design of the covariance matrix 𝑺_NET, which is based on tried-and-true strategies used by practitioners when dealing with non-Gaussian data. Consequently, it is simple to demonstrate that the class of sample covariance matrices constructed from sampled duration data of project networks meets the four requirements for applying this universality theorem.

Step 7: Standardize the first four largest eigenvalues 𝑙₁, 𝑙₂, 𝑙₃, and 𝑙₄ of 𝑺_NET^(α,n,p) to obtain their corresponding centered and rescaled values 𝑙̃ₘ and 𝑙̂ₘ, with 𝑚 = 1, 2, 3, 4, as provided in


Equation 2.42 and Equation 2.43, respectively. Let Norm I and Norm II denote these two normalizations, which will be employed separately to gain more insight into the limiting behavior of the largest eigenvalues, as stated earlier. The centering and norming constants 𝜇ₙₚ and 𝜎ₙₚ are defined in Equation 2.40(a) and Equation 2.40(b), respectively.

Standardization methods for scaling the eigenvalues 𝑙̃ₘ and 𝑙̂ₘ of 𝑺_NET^(α,n,p):

Norm I (Johnstone's (2001) limiting law):      𝑙̃ₘ = (𝑙ₘ − 𝜇ₙₚ) / 𝜎ₙₚ
Norm II (Tracy and Widom's (2000) limiting law): 𝑙̂ₘ = (𝑙ₘ − √(2𝑛)) / (2^(−1/2) 𝑛^(−1/6))

Table 2-20: Normalization Methods for Scaling the mth Eigenvalue of Sample Covariance
Matrix SNET
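Assuming Equation 2.40 uses Johnstone's (2001) standard centering and scaling constants, μₙₚ = (√(n−1) + √p)² and σₙₚ = (√(n−1) + √p)(1/√(n−1) + 1/√p)^(1/3), which is an assumption since Equation 2.40 is not reproduced here, the two normalizations can be sketched as:

```python
import numpy as np

def norm_i(l_m, n, p):
    """Norm I (Johnstone 2001): (l_m - mu_np) / sigma_np, with the
    standard centering/scaling constants (assumed form of Eq. 2.40)."""
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
    sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (
        1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3)
    return (l_m - mu) / sigma

def norm_ii(l_m, n):
    """Norm II (Tracy & Widom 2000): (l_m - sqrt(2n)) / (2^(-1/2) n^(-1/6))."""
    return (l_m - np.sqrt(2 * n)) / (2 ** -0.5 * n ** (-1 / 6))
```

Either function maps an observed eigenvalue lₘ to the scale on which it can be compared against the Tracy-Widom quantiles in Steps 9 and 10.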

Step 8: Decide on the number of simulations used to sample data from the population of activity durations and derive the necessary test statistics based on the observations of 𝑙̃ₘ and 𝑙̂ₘ. For the hypothesis test to be completed in Step 9, unless otherwise indicated, a total of 1000 simulations, denoted by N, will be performed to collect the order statistics 𝑙̃ₘ and 𝑙̂ₘ for any given identified network of size p. Researchers have used 1000 simulations in various studies comparable to the current one; accordingly, this number is justified.

Step 9: Conduct a simulation-based experiment to perform a goodness-of-fit test based on the order statistics 𝑙̃ₘ of 𝑺_NET to verify the assumption that the limiting distribution of probabilistic durations of project network activities is governed by the Tracy-Widom limit law 𝐹₁ (TW1) given by Equation 2.36(b). The Kolmogorov-Smirnov (K-S) Goodness-of-Fit Test (see Section 1.6.7.2)


is appropriate for this investigation, as other scholars, such as Saccenti et al.

(2011), have used it for validating distributional assumptions about their data. The test hypothesis

is formulated as follows:

H0: the Tracy-Widom limit law 𝐹 is the limiting probability distribution of project network

activities’ durations.

Versus the alternative

H1: the limiting probability distribution of project network activities’ durations is not distributed

according to the Tracy-Widom distribution 𝐹 .

As described in Section 1.6.4.3, a K-S test requires the computation of the quantiles 𝑙̃(qⱼ) and 𝑙̂(qⱼ) of the theoretical distribution function 𝐹₁(𝑥), evaluated at the cumulative probabilities qⱼ = (𝑗 − 0.5)/𝑁 corresponding to the jth order statistics 𝑙̃₍ⱼ₎ and 𝑙̂₍ⱼ₎ of 𝑺_NET, respectively. For their numerical evaluation, the MATLAB routines developed by Dieng (2005) aided in approximating the Tracy-Widom CDF 𝐹₁(𝑥). Depending on the tabulated critical value of the maximum absolute difference between the sample and population CDFs, 𝑐(𝑁, α), which is determined based on the significance level α, the test is rejected if 𝐷 > 𝑐(𝑁, α). When the K-S test results in the acceptance of the null hypothesis H0, a graphical representation of the data, such as a Q-Q plot, can validate the limiting distribution of the data obtained through hypothesis testing.
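The K-S statistic itself is simple to compute once a numerical CDF for 𝐹₁ is available (e.g., from Dieng's routines or tabulated values). A minimal sketch, with a generic cdf callable standing in for the Tracy-Widom CDF:

```python
import numpy as np

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D: the maximum absolute
    difference between the empirical CDF of the sorted observations and
    a theoretical CDF supplied as a callable."""
    x = np.sort(np.asarray(sample))
    N = x.size
    F = cdf(x)
    ecdf_hi = np.arange(1, N + 1) / N     # ECDF just after each point
    ecdf_lo = np.arange(0, N) / N         # ECDF just before each point
    return max(np.max(ecdf_hi - F), np.max(F - ecdf_lo))

# Usage: D = ks_statistic(normalized_eigenvalues, tw1_cdf), where tw1_cdf
# approximates F1; reject H0 at level alpha when D exceeds the tabulated
# critical value c(N, alpha).
```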

Step 10: Graph a Q-Q plot and/or histogram of the empirical distribution to compare it against the hypothesized distribution, as described in Section 1.6.4. That is, a scatterplot of the pairs


(𝑙̃₍ⱼ₎, 𝑙̃(qⱼ)) yields a Q-Q plot based on Norm I. Alternatively, construct the pairs (𝑙̂₍ⱼ₎, 𝑙̂(qⱼ)) to obtain a Q-Q plot based on Norm II.

With the ten steps devised in this section, one can create the mathematical model necessary to confirm, through a goodness-of-fit test such as the K-S, whether the Tracy-Widom F₁ is the limiting distribution of the durations of project activities of a given network of size p. The question now is how many samples (n) are required for this investigation. The following section provides a step-by-step procedure to find the optimum sample size n necessary to accept H₀.

2.4.6.3 Model Development: Finding the Optimum Sample Size n for the Data

This section proposes a procedure for finding the optimal sample size n for the project network

activity durations required to build a mathematical model to help understand their behavior. As

discussed in the previous section, this model is necessary to obtain samples from the population

needed to make inferences and probability statements about the unknown aspects of the underlying

distribution of project networks suspected to be governed by the Tracy-Widom laws thanks to their universality. The empirical procedure is based on universality theorems such as those formulated from the Tracy-Widom limit laws discussed in Section 2.3.4. The process proceeds as follows.

Step 1: Make a distributional assumption about the data. For this study, the assumption is that the Tracy-Widom limit law F₁ (TW1) governs the limiting probabilistic durations of project network activities, which should hold even more strongly for more extensive project networks.


Step 2: Choose the total number of data points n_pts required for the generation of sample data matrices for project networks X = (x₁, ⋯, x_p), with n > p and each x_j representing the EF times of the jth activity on a given project network. Each network chosen for this inquiry comprises p activities, as stated in the preceding sections, which also characterize their computational process. Notably, the more points, preferably sequential values with steady increments, the easier it is to estimate values between points using interpolation. The general rule is that approximately 15 points are required to obtain a decent interpolating curve.

Step 3: Follow the procedure described in the previous section to derive the sample covariance matrix S of the data based on X and, using Norm I and Norm II, determine its normalized first through fourth largest eigenvalues l_(m), with m = 1, 2, 3, 4, for each n_i value.
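For orientation, the centering and scaling most often used for the largest sample-covariance eigenvalue in the random-matrix literature (Johnstone 2001) can be sketched as below. The exact Norm I and Norm II definitions used in this study are those of Table 2-20, so these constants should be read as an assumption rather than the dissertation's formula:

```python
import math

def johnstone_normalize(l1, n, p):
    """Center and scale the largest eigenvalue l1 of a p x p sample
    covariance matrix built from n observations (Johnstone 2001)."""
    mu = (math.sqrt(n - 1) + math.sqrt(p)) ** 2
    sigma = (math.sqrt(n - 1) + math.sqrt(p)) * (
        1 / math.sqrt(n - 1) + 1 / math.sqrt(p)
    ) ** (1 / 3)
    return (l1 - mu) / sigma
```

Under the null model, the normalized value converges in distribution to the Tracy-Widom law as n and p grow together.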

Step 4: Construct the scatterplot of pairs (n_i, l_(1)) only for m = 1 instead of all four values of m; refer to Step 6 for the justification. As discussed in Section 1.6.4.1, the scatterplot as a visualization technique can help uncover interesting "patterns" hidden within the data and locate outliers or extreme observations.

Step 5: Calculate the deviations Δ between the empirical and hypothesized parameters using the known mean and variance of the assumed distribution. For every sample size n_i, the number of simulations required to calculate the empirical mean and variance must be specified. Because this must be done for each sample size n_i, 100 simulations are adequate to obtain the observed statistics. To formulate the expressions of the deviations of means and variances calculated at each simulation,


let the mean and variance of the Tracy-Widom distribution F₁ (respectively of the observed l_(m)) be denoted by μ_TW1 (resp. μ̂_m) and var_TW1 (resp. v̂ar_m). In addition, let Δ_{μ,m,j} and Δ_{var,m,j} be the deviations calculated based on the statistics of the mth observed eigenvalues l_(m) and the theoretical μ_TW1 and var_TW1, respectively. Equation 2-66 below provides their expressions:

\Delta_{\mu,m,j} = \frac{\hat{\mu}_m - \mu_{TW1}}{\mu_{TW1}} \qquad \text{(a)}

\Delta_{var,m,j} = \frac{\widehat{var}_m - var_{TW1}}{var_{TW1}} \qquad \text{(b)} \qquad \text{[Eq. 2-66]}

The values of μ_TW1 and var_TW1 are known and can be found in Table 2-3 as μ_TW1 = −1.2065335745 and var_TW1 = 1.6077810345.
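As a minimal sketch, the deviations of Eq. 2-66 can be computed from the simulated eigenvalues as follows; the relative-deviation form and the helper name are assumptions made for illustration:

```python
MU_TW1 = -1.2065335745   # mean of the Tracy-Widom F1 distribution (Table 2-3)
VAR_TW1 = 1.6077810345   # variance of the Tracy-Widom F1 distribution

def deviations(normalized_eigenvalues):
    """Relative deviations of the empirical mean and variance from TW1."""
    k = len(normalized_eigenvalues)
    mu_hat = sum(normalized_eigenvalues) / k
    var_hat = sum((x - mu_hat) ** 2 for x in normalized_eigenvalues) / (k - 1)
    delta_mu = (mu_hat - MU_TW1) / MU_TW1
    delta_var = (var_hat - VAR_TW1) / VAR_TW1
    return delta_mu, delta_var
```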

Step 6: Plot the pairs (n_i, Δ_{μ,1,j}) to obtain the smoothed curve of deviations of means on the one hand, and the pairs (n_i, Δ_{var,1,j}) on the other. Subsequently, find the intersection of each resulting curve with the n-axis, at Δ_{μ,1,j} = 0 (resp. Δ_{var,1,j} = 0 for the other curve). The obtained value of n is the optimal sample size n_opt to consider for verifying the distributional assumptions required to build a mathematical model for the project network identified for the investigation. Repeat the process for all networks selected for the study. Unless empirically investigated, there is no guarantee that both curves will yield the exact same value of n_opt for a given network.
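The zero-crossing in Step 6 can be located by linear interpolation between consecutive points of the deviation curve. The helper below is an illustrative sketch (it assumes the deviations change sign at most once), not the smoothing-based procedure itself:

```python
def optimal_n(ns, deltas):
    """Return the n where the piecewise-linear deviation curve crosses zero."""
    pts = list(zip(ns, deltas))
    for (n0, d0), (n1, d1) in zip(pts, pts[1:]):
        if d0 == 0.0:
            return n0
        if d0 * d1 < 0.0:  # sign change between n0 and n1: interpolate
            return n0 + (n1 - n0) * (-d0) / (d1 - d0)
    return None  # no crossing observed in the sampled range
```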


Note that there is no need to determine the deviations Δ_{μ,m,j} and Δ_{var,m,j} and plot curves for each of the four mth largest eigenvalues to determine n_opt. The reason is that they are eigenvalues of the same sample covariance matrix S derived from the generated sample data matrix X. Thus, n_opt obtained with the normalized largest eigenvalue (m = 1) alone is adequate.

This section concludes with Step 6, which completes the procedure for determining the ideal sample size n. As previously stated, the value of n_opt is essential for hypothesis testing to validate any distributional assumption with known parameters of the assumed probability distribution.

2.5 Research Results

2.5.1 Benchmark Network Structure Analysis: Complexity Measures

Given a complexity measure, whose computational methodology can be found in Section 2.4.5.6, the analysis of the values obtained for the benchmark networks collected for this study is facilitated by classifying the computed values into five categories or groups. This is performed by plotting a histogram of the complexity measure values to group them into five bins of equal length. Using descriptive statistics, the frequency distribution derived from the histogram characterizes the complexity measure. This distribution can be described by its measures of central tendency (mean, median, and mode) and its measures of dispersion, namely range and standard deviation (Norman and Streiner 2003). Values of all computed complexity measures are provided in Appendix H. However, due to the large number of collected networks (2040), only partial results could be included in this manuscript to limit its final number of pages. These results are also summarized in the series of tables provided in the subsequent sections, which present and analyze the complexity measures obtained for all 2040 network schedules. Last, to help interpret these complexity measures, the five networks selected to validate the methodologies proposed in Section 2.4.5 are revisited to relate them to the identified groups for each complexity measure.
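The five-bin grouping applied to every complexity measure in this section can be sketched as follows (equal-width bins spanning the observed range; the helper name and the choice to assign the top edge to the last group are illustrative):

```python
def five_group_counts(values):
    """Split values into five equal-width groups over [min, max] and count each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / 5.0
    counts = [0] * 5
    for v in values:
        k = min(int((v - lo) / width), 4)  # clamp the top edge into Group 5
        counts[k] += 1
    return width, counts
```

With the CNC values, for example, this reproduces the 0.14-wide groups of Table 2-21A (up to rounding).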

Network Complexity Measures - CNC/CM1 Results

Coefficients of network complexity (CNC) were computed for all 2040 PSPLIB network schedules generated

E.2.1 contains some of the CNC values. To characterize the CNC values, the findings are divided

into five groups designated Group 1 through Group 5. All groups have the same length, which

equals 0.14, as shown in Table 2-21A or Figure 2.14A(a), which depicts the frequency distribution

of the CNC values. The CNC values of all PSPLIB networks range between 1.5 and 2.19, with a

range of 0.69 and a standard deviation (σ) of 0.2286, according to the histogram of CNC values.

Furthermore, the mean, mode, and median of the CNC distribution are all 1.85 and belong to Group

3. As a result, this distribution is both symmetric and multimodal. Furthermore, the PSPLIB

network CNC values are evenly divided among three categories, denoted by Group 1, Group 3,

and Group 5. Each set has one-third of J30, J60, J90, and J120 networks, with CNC values ranging

from 1.50 to 1.64, 1.78 to 1.92, and 2.06 to 2.20, respectively. This implies that PSPLIB network

schedules were prepared in three categories based on CNC values. The CNC values of the PSPLIB

networks, whose structure representations are shown in Figure 2.11A through Figure 2.11C, are

as follows: J3038-7 (2.125, Group 5), J6010-5 (1.5, Group 1), J902-4 (1.5, Group 1), and J12012-


9 (1.5, Group 1). It is worth noting that the computed CNC values are of the same order of magnitude

as those proposed by Boushaala (2010, p. 774) for 50 building projects.

CNC Limits CM1 - Network Counts per Group and Size


Group Min Max j30 j60 j90 j120 Total %
1 1.5 1.64 160 160 160 200 680 33%
2 1.64 1.78 0 0 0 0 0 0%
3 1.78 1.92 160 160 160 200 680 33%
4 1.92 2.06 0 0 0 0 0 0%
5 2.06 2.2 160 160 160 200 680 33%
Total 480 480 480 600 2040

Table 2-21A: Summary of Complexity Measure Computations

Network Complexity Measures – Paths Ratios/CM2 Results

The methodology proposed on page 215 in Section 2.4.5.6 served to compute paths ratios of the

2040 PSPLIB network schedules collected for this study. A network paths ratio represents a

proportion of the network critical paths ─ obtained through 100 simulation runs ─ in all possible

paths connecting its start and end activities. To better describe the paths ratios of collected

networks, computed values are classified into five categories, Group 1 through Group 5, each of length 7. These values are provided in Appendix E.2.2 and summarized in Table

2-21B. Meanwhile, Figure 2.14A(b) depicts the frequency distribution of these ratios in %. It can

be concluded from its histogram that when simulating any PSPLIB network schedule 100 times,

the paths ratio will vary between 0.13% and 31.58%, with a range of 31.45% and a standard deviation (σ) of 4.2. Furthermore, the distribution possesses a median of 2.74%, a mode of 3.5%, and a mean of 5.26%, all of which fall in the first group or bin. These measures of central tendency characterizing the distribution of paths-ratio values corroborate the histogram plot showing a


unimodal and asymmetric distribution. Because of its asymmetric shape and the data mean value

being located toward the histogram tail, it can be concluded that this distribution is positively

skewed, which agrees with the histogram plot in Figure 2.14A (b).

Moreover, the results in Table 2-21B suggest that, for each of the 81.3% of PSPLIB network schedules in Group 1, only up to 7% of all possible paths can become critical. Each network has at least one critical path out of many non-critical paths. Furthermore, a network with few activities is likely to have more probabilistic critical paths than one with a larger number of activities. The exemplar network illustrates this well with a paths ratio of 38.46%, while the highest paths ratio among the PSPLIB networks is attributed to j306-4, with a ratio of 31.58%. Besides J30 networks, whose paths ratios fall under all groups, only the paths ratios of J60 networks (and a single J90 network) reach Group 3, while there are none for J90 or J120 networks in Group 4 or Group 5. From these results, 14% is the maximum frequency of paths that can become critical for a J120 network. These results agree with the PSPLIB networks whose structure representations are provided in Figure 2.12A through Figure 2.12C. Their paths ratios are as follows: J3038-7 (2.78%, Group 1), J6010-5 (2.44%, Group 1), J902-4 (3.51%, Group 1), and J12012-9 (5.13%, Group 1).
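A paths ratio as described here can be sketched as below: enumerate every start-to-end path of the (acyclic) activity-on-node network, then divide the number of distinct paths observed critical across the simulation runs by the total path count. The function names are illustrative, and exhaustive enumeration is only practical for small networks:

```python
def all_paths(successors, start, end):
    """Enumerate every start-to-end path in an acyclic network by DFS."""
    stack, paths = [(start, (start,))], []
    while stack:
        node, path = stack.pop()
        if node == end:
            paths.append(path)
        else:
            for nxt in successors.get(node, []):
                stack.append((nxt, path + (nxt,)))
    return paths

def paths_ratio(successors, start, end, observed_critical_paths):
    """Proportion of all paths that appeared as critical paths in simulation."""
    return len(set(observed_critical_paths)) / len(all_paths(successors, start, end))
```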


Ratio Limits (%) CM2 - Network Counts per Group and Size
Group Min Max j30 j60 j90 j120 Total %
1 0 7 250 386 442 581 1659 81.3%
2 7 14 146 82 37 19 284 13.9%
3 14 21 54 12 1 0 67 3.3%
4 21 28 24 0 0 0 24 1.2%
5 28 35 6 0 0 0 6 0.3%
Total 480 480 480 600 2040

Table 2-21B: Summary of Complexity Measure Computations

Figure 2.15A: Distributions of Network Complexity Measures

Network Complexity Measures – Johnson Complexity Measures (D)/(CM3) Results

The Johnson complexity measure (D) of all 2040 PSPLIB networks is computed using the methods

developed in Section 2.4.5.6 (see page 216). Figure 2.15B(a) depicts their frequency distribution, which groups the D values into five categories to assist in their analysis. Appendix E.2.3 contains partial results of the computed D values, whereas Table 2-21C below contains tabular information


derived from the distribution of the D values. According to the graph, the distribution of D values

is unimodal. This distribution has a median value of 46, a mean value of 48.82, and a mode value

of 52.5.

Additionally, the distribution values vary between 13 and 95, with a standard deviation of 22.55.

According to Table 2-21C, 97 percent of the J30 networks fit into Group 1 since their D values range between 13 and 26. Except for the J30 networks, the most frequently reported category of D values (27.8 percent, Group 3) comprises D values of the J60, J90, and J120 networks. Moreover, the J60 networks may be found on the left side of this category, whereas the J90 and J120 networks fall on the right side. Furthermore, Group 5 holds 45.5 percent of the J120 networks and is entirely composed of these networks, with D values ranging from 78 to 95. As a result, it is possible to conclude that the D values of PSPLIB networks increase proportionally with network size. It is worth noting that the exemplar network, with a total number of activities or size of 17, has a D value of 10, putting it in Group 1.

D Limits CM3 - Network Counts per Group and Size


Group Min Max j30 j60 j90 j120 Total %
1 10 27 464 0 0 0 464 22.7%
2 27 44 16 318 64 0 398 19.5%
3 44 61 0 162 219 186 567 27.8%
4 61 78 0 0 197 141 338 16.6%
5 78 95 0 0 0 273 273 13.4%
Total 480 480 480 600 2040

Table 2-21C: Summary of Complexity Measure Computations


Network Complexity Measures – Normalized Complexity Measure (Cn)/(CM4) Results

The methodology proposed in Section 2.4.5.6 (see page 218) assisted in computing the Cn values of the 2040 benchmark network schedules. The histogram of the networks' Cn values is plotted to group values into five distinct categories of equal intervals. Partial results of the Cn values are provided in Appendix E.2.4, together with a summary of all results from the first group through the last. Table 2-21D below presents tabulated information derived from the histogram of the Cn values shown in Figure 2.15B(b). The Cn values range from 12.075 to 38.581, with a range value of 26.505 and a standard deviation of 7.037. The median value is 20.709 (Group 2), the mode value is 19.000 (Group 2), and the mean value is 21.350 (Group 2). From these values, it can be derived that the distribution of the Cn values of the PSPLIB networks is asymmetric and positively skewed.

The Cn values of the J120 networks are the most frequent, representing 53% of the results in the modal group (Group 2). For the J90 networks, one-third each fall in Group 1, Group 2, and Group 3, with all Cn values ranging from 10 to 28. This observation is similar for the J120 networks, except that their results are not equally split between groups ─ Group 2, the modal group, gets 60% of their Cn values.

Regarding the Cn values of the J60 networks, they are not the most frequently reported. One-third of the J60 networks have Cn values between 10 and 16, whereas the remaining two-thirds have theirs between 22 and 28. For the J30 networks, one-third each fall in Group 2, Group 4, and Group 5; only Cn values of J30 networks are reported in Group 4 and Group 5. It can also be noticed that the Cn values of PSPLIB networks are inversely related to the network sizes. This is well expressed in Table 2-15, which provides the Cn values of the four networks j3038-7, j6010-5, j902-4, and j12012-9.


Cn Limits (%) CM4 - Network Counts per Group and Size


Group Min Max j30 j60 j90 j120 Total %
1 10 16 0 160 160 200 520 25.5%
2 16 22 160 0 160 361 681 33.4%
3 22 28 0 320 160 39 519 25.4%
4 28 34 160 0 0 0 160 7.8%
5 34 40 160 0 0 0 160 7.8%
Total 480 480 480 600 2040

Table 2-21D: Summary of Complexity Measure Computations

Figure 2.15B: Distributions of Network Complexity Measures

Network Complexity Measures – Order Strength (OS) / CM5 Results

The methodology provided in Section 2.4.5.6 (see page 220) enabled the calculation of the OS

values of all the benchmark networks considered for this study. A classification of the computed

OS values into five groups was necessary to facilitate the analysis. This is performed through the


histogram plot in Figure 2.14C(a) of the calculated OS values classifying results into five bins of

equal length. On page 443 of Appendix E.2.5, a summary of the results ranging per group can be

found. Due to many networks, only partial results are made available. However, Table 2-21E

provides an overview of all the OS values. As depicted, groups are not equals. All the PSPLIB

J120 and 2/3 of J90 networks, having the most significant numbers of activities, fall in Group 1,

representing 45.1% of the 2040 networks considered. The second group is made of 2/3 of the J60

networks and the remaining 1/3 of the J90 PSPLIB networks representing 23.5% of the total

networks. The third and fourth groups are equal frequencies and contain 1/3 of the J60 and 1/3 of

the J30 networks. The last group included OS values of only J30 networks, with 2/3 of the network

falling under this category.

The proportions of the networks across the different groups suggest that the OS values are inversely related to the network sizes. The J30 networks have the smallest sizes but the greatest OS values, ranging from 0.092 to 0.14 (Group 4 and Group 5), followed by the J60 networks, whose OS values fall in Group 2 and Group 3 (0.044 to 0.092). The J120 networks, with the greatest sizes (122 activities), have the lowest OS values, between 0.02 and 0.044 (Group 1). Overall, the OS values of PSPLIB networks range from 0.0248 to 0.1371, with a range value of 0.1123 and a standard deviation of 0.0355. The distribution of the OS values has a mode value of 0.032, a median value of 0.0592, and a mean value of 0.0621. As a result, the distribution of the OS values of the PSPLIB networks is unimodal and positively skewed.


OS Limits CM5 - Network Counts per Group and Size


Group Min Max j30 j60 j90 j120 Total %
1 0.02 0.044 0 0 320 600 920 45.1%
2 0.044 0.068 0 320 160 0 480 23.5%
3 0.068 0.092 0 160 0 0 160 7.8%
4 0.092 0.116 160 0 0 0 160 7.8%
5 0.116 0.14 320 0 0 0 320 15.7%
Total 480 480 480 600 2040

Table 2-21E: Summary of Complexity Measure Computations

Network Complexity Measures – Restrictiveness (RT)/ (CM6) Results

The methodology proposed in Section 2.4.5.6 (see page 221) enabled the numerical determination

of RT values for all 2040 PSPLIB networks. The resulting RT values are categorized into five

groups to assist in the analysis, and their histogram is presented in Figure 2.15C(b). The RT values

of the benchmark networks range from 0.1780 to 0.6875 in Table 2-21F, with a range of 0.5095

and a standard deviation of 0.1274. The average measures of the frequency distribution of the RT

values are 0.3804 for the mean value, 0.4000 for the mode value, and 0.5331 for the median value.

From these values, it can be concluded that the distribution of RT values is asymmetric and

negatively skewed. The repartition of networks across the five groups with respect to their sizes is consistent with the previous complexity measures: RT values and network sizes are inversely proportional. Networks with the largest number of activities have a maximum RT value of 0.46 (Group 3), while the networks with the smallest number of activities have a maximum RT value of 0.7 (Group 5). It is worth noting that the modal group contains networks of all four sizes considered for this study.


RT Limits (%) CM6 - Network Counts per Group and Size


Group Min Max j30 j60 j90 j120 Total %
1 0.1 0.22 0 0 13 179 192 9.4%
2 0.22 0.34 1 152 237 219 609 29.9%
3 0.34 0.46 148 176 213 202 739 36.2%
4 0.46 0.58 172 152 17 0 341 16.7%
5 0.58 0.7 159 0 0 0 159 7.8%
Total 480 480 480 600 2040

Table 2-21F: Summary of Complexity Measure Computations

Figure 2.15C: Distributions of Network Complexity Measures

Table 2-22 below summarizes all the complexity measures for the four PSPLIB networks and the exemplar network.


Network Name Exemplar j3038-7 j6010-5 j902-4 j12012-9


Network No. 0 308 486 1175 1470
Total Activities 17 32 62 92 122
CM1 26 68 93 138 183
CM2 13 72 41 57 78
CM3 10 27 30 44 57
CM4 22.848025 37.20747 15.29598 13.23389 12.07533

CM5 0.2285714 0.137097 0.04918 0.032967 0.024793


CM6 0.647619 0.580645 0.315706 0.231008 0.186154
CM7 0.3846154 0.027778 0.02439 0.035088 0.051282

Table 2-22: Complexity Measure Values of Few Networks

2.5.2 Uncovering the Underlying Behavior of Project Network Schedules

2.5.2.1 Data Information

To fully understand the underlying behavior of project network schedules, it is critical to choose a

diverse sample of networks from a pool of 2040 benchmark project network schedules. Utilizing

one of the six complexity measures described in Section 2.4.5.6 should assist in identifying a few

networks to consider. Note that these values have already been computed, classified, and analyzed

for each network in Section 2.5.1. Thus, the available data can aid in identifying appropriate project

networks to investigate their schedules' inherent behavior. Given that restrictiveness, denoted by

RT, is a widely used network structure analysis tool for project network schedules in construction

scheduling and graph theory, it would be appropriate for this task. Thus, based on their RT values, which range from 0.1780 to 0.6875, a few networks of varying sizes have been identified to


represent each of the five RT value classes [see Figure 2.15C(b)]. As a reminder from Section 2.4.5.6, an RT value near 0 indicates a perfectly parallel AON diagram, whereas a value of 1 indicates a fully serial AON diagram.

Along with obtaining representatives for each RT value category, other networks were included whenever possible to obtain pairs of networks with equal sizes and RT values. Additionally, networks of varying sizes but identical RT values have been identified. All the resulting project networks, with their descriptive characteristics, are arranged in Table 2-23. These carefully chosen networks should aid in gaining insight into the behavior of project network schedules as a function of their sizes and their complexity measured in terms of RT values. Yet can one predict the behavior of a network schedule based on its RT value and/or size? The following section will attempt to address these concerns.


Identified Network Information for Simulation Runs


j30 j60 j90 j120
RT RT RT RT
No. Network Value Class Network Value Class Network Value Class Network Value Class
1 j305_7* 0.3387 2 j602_7* 0.2538 2 j905_3* 0.2000 1 j12014_1* 0.1780 1
2 j3012_6* 0.3952 3 j6010_5 0.3157 2 j9010_5* 0.2174 1 j12012_9ᵛ 0.1862 1
3 j3011_1* 0.4597 3 j6015_1 0.3384 2 j901_3*ᵛ 0.2179 1 j1205_5 0.1901 1
4 j3024_8* 0.4597 3 j6020_7* 0.4030 3 j9014_5 0.2210 2 j1209_10 0.1901 1
5 j3032_4*ᵛᶺᵘᵐᵚ 0.5161 4 j6028_9* 0.4030 3 j902_4 0.2310 2 j12012_1* 0.2179 1
6 j3041_8* 0.5786 4 j6035_8 0.4595 3 j9031_1 0.3395 2 j1201_2 0.2203 2
7 j3038_7* 0.5806 5 j6040_5ᵛ 0.5161 4 j9045_1* 0.4599 3 j12024_2* 0.3397 2
8 j3034_10* 0.5907 5 j6042_6* 0.5738 4 j9037_7 0.4993 4 j12058_1 0.4489 3
9 j3038_5* 0.6169 5
10 j3048_2* 0.6512 5
11 j3037_6* 0.6875 5

Table 2-23: Benchmark Networks of Interest for the Study of their Underlying Behaviors
Table's annotations:
(1) For all 35 networks, their sample covariance matrices S are constructed based on the main formula with α_sim = 0.025, and their eigenvalues are normalized using Norm I.
(2) For the 22 networks denoted by a *, their simulated matrices S are created using the main formula with α_sim = 0.025, and their eigenvalues are normalized according to Norm II.
(3) For the four networks denoted by ᵛ or ᶺ, their simulated matrices S are based on the main formula, but with α_sim = 0.05 or α_sim = 0.1, respectively. Norm I normalizes the eigenvalues.
(4) For network j3032-4 with the suffix ᵘ, rather than 10³ (as for all other networks), 10⁴ simulations are based on the main formula in conjunction with Norm I.
(5) For network j3032-4 with the suffixes ᵐ or ᵚ, simulated networks are constructed using the main formula without the α_sim operator, and the eigenvalues are normalized according to Norm I or Norm II, respectively.
(6) Each simulation of S allows the collection of the first through the fourth largest eigenvalues. However, due to the limited time available for this investigation, only the first greatest eigenvalues have been collected for the 24 networks, j60 through j120, corresponding to Note (1).
(7) For each of the groups j30, j60, j90, or j120, the underlined RT values represent a pair of networks with identical RT values.
(8) Italicized and bolded RT values indicate identical RT values in intergroup networks.

2.5.2.2 Exploring Patterns in PSPLIB Data Set (Scatterplots)

This section aims to treat and analyze the results of a series of 100 simulations run to determine

the optimum size n of the sample necessary to create an appropriate sample data matrix serving as

input in studying the behavior of project network schedules of interest. A numerical

implementation of the proposed mathematical model and procedures described in the methodology

section (see Section 2.4.6) helped run the required simulations for the study. Like previous

sections, MATLAB aided in carrying out all simulations. For a given project network of size p (refer to Table 2-23 for the notes), and for each data point i related to the sample size n_i, Table 2-24 contains the necessary information to construct a sample data matrix X_{n_i}.

Information to construct an X_{n_i} matrix at each of 100 simulation runs

Applications                                  p     n_min   n_step   n_max   n_pts
All networks defined in Notes 1, 3, 4, and    32    10      40       350     32
network j3032_4ᵐ in Note 5                    62    50      100      1100    21
                                              92    100     100      1900    19
                                              122   250     250      5000    20
All networks defined in Note 2                32    10      40       350     32
                                              62    20      70       350     15
                                              92    25      100      450     15
                                              122   50      125      725     13
Network j3032_4ᵚ in Note 5                    32    10      40       460     43

Table 2-24: Sample Sizes and Numbers of Data Points Required for Project Networks

The random matrix X_{n_i} serves to derive the first largest eigenvalues l^I_(1) and l^II_(1) of the sample covariance matrix S_{n_i} at each simulation j in a series of 100 simulations. Reflecting again on


Section 2.4.6, the first largest eigenvalues l^I_(1) and l^II_(1) are normalized using the Norm I and Norm II methods, as specified in Table 2-20 of Section 0. The number of rows n_i of the matrix X_{n_i} varies incrementally, with a step of n_step, from n_min to n_max. Notably, each row of X_{n_i} contains p EF times, each of which is computed using the CPM to schedule the project network of interest based on independently sampled durations from a triangular distribution with parameters a, b, and c set for each activity on the network.
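The row-generation step just described can be sketched as a triangular-sampling CPM forward pass. The activity names, the (a, b, c) = (minimum, maximum, mode) convention passed to `random.triangular`, and the assumption that activities are listed in topological order are illustrative choices, not the dissertation's MATLAB implementation:

```python
import random

def early_finish_times(activities, predecessors, rng=random):
    """One CPM forward pass: sample each duration from a triangular
    distribution, then EF = max(EF of predecessors) + duration."""
    ef = {}
    for act, (a, b, c) in activities.items():  # assumes topological order
        duration = rng.triangular(a, b, c)     # a = min, b = max, c = mode
        early_start = max((ef[p] for p in predecessors.get(act, [])), default=0.0)
        ef[act] = early_start + duration
    return ef

# Degenerate triangles (a = b = c) make the pass deterministic for checking.
acts = {"A": (2, 2, 2), "B": (3, 3, 3), "C": (1, 1, 1)}
preds = {"B": ["A"], "C": ["A", "B"]}
```

Repeating such a pass n_i times stacks one row of EF times per run into the sample data matrix X_{n_i}.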

Determining the values of n_min, n_step, and n_max as provided in Table 2-24

Nonetheless, for each fixed value of p, the values of n_min were chosen under the requirement that n be greater than p to construct the sample covariance matrix S from a data matrix X (see Section 0, steps 4 and 5). Additionally, values of n too close to p resulted in positive and extremely large eigenvalues for S. The value of n_max, however, was found through trial and error. Given a normalization procedure and a size value p for each of the networks identified in Table 2-23, experimenting initially with just a few points picked at greater distances from one another can assist in determining n_max. More precisely, as will be implemented later, the statistics of the normalized first greatest eigenvalue of S collected at each of the 100 simulations served to select a suitable n_max. The objective was to determine, after a few trials, a value of n that would produce a negative average value of the normalized first eigenvalues smaller than the mean of the expected probability distribution (μ_TW1). Finally, n_step was chosen based on the total number of points n_pts desired between n_min and n_max.
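The n_max selection objective stated above can be sketched as a simple scan; the function name and the input shape (averages of the normalized largest eigenvalue per candidate n) are illustrative assumptions:

```python
MU_TW1 = -1.2065335745  # mean of the Tracy-Widom F1 distribution

def pick_n_max(candidate_ns, avg_l1_by_n):
    """First candidate n whose average normalized largest eigenvalue
    falls below the TW1 mean, per the trial-and-error objective."""
    for n in candidate_ns:
        if avg_l1_by_n[n] < MU_TW1:
            return n
    return None  # keep trying larger candidates
```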


For a more detailed examination of the values in Table 2-24, it is possible to observe that as p grows, the total number of points n_pts decreases. The reason is that this simulation study found that simulating a larger project network schedule takes longer than simulating a smaller one. Refer to step 3b of Section 2.4.5.5 for the numerical execution of the approach required to schedule a project network using Kelley's (1961) CPM. The flowcharts for the methodology's forward and backward passes are included in Appendices C1 and C2 on pages 409 and 410. Consequently, the simulation study produced more closely and evenly spaced points for smaller project networks than for larger ones. Finally, the values of n_min, n_step, and n_max used to calculate n_pts are highly dependent on the method employed to normalize S's largest eigenvalue. All three, particularly n_max, drop significantly when moving from Norm I to Norm II.

Constructing the scatterplots of pairs (n_i, l^I_(1)) and (n_i, l^II_(1))

The information provided in Table 2-23 and Table 2-24, necessary to run the required simulations, assisted in generating the outputs needed to scatterplot the pairs (𝑛, 𝑙₁,I) and (𝑛, 𝑙₁,II) according to Norm I and Norm II, respectively, with j varying from 1 to 100 and 𝑛 varying in increments of 𝑛_step from 𝑛_min to 𝑛_max. The scatterplots provided in Figure 2.16A and Figure 2.16B were created in MATLAB by categorizing project networks according to their sizes and the normalizing method used to determine 𝑙₁,I and 𝑙₁,II. Additionally, the scatterplots depicted in Figure 2.16A for the j30 and j60 networks and Figure 2.16B for the j90 and j120 networks were produced utilizing all 35 networks selected for this simulation investigation.
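As a rough sketch of how such scatterplot pairs can be generated, the snippet below normalizes the largest eigenvalue of a white-Wishart matrix using the classical Johnstone (2001) centering and scaling constants for the real (type-1) Tracy-Widom limit. This is an assumption-laden surrogate: the thesis builds its data matrices from simulated EF times and applies its own Norm I/Norm II variants, whereas standard Gaussian entries and the textbook constants are used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_largest_eig(n, p, rng):
    # Surrogate data matrix: standard Gaussian entries stand in for the
    # EF-time rows used in the thesis (an assumption for this sketch).
    X = rng.standard_normal((n, p))
    lam1 = np.linalg.eigvalsh(X.T @ X)[-1]               # largest eigenvalue of X'X
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2              # Johnstone centering
    sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (
        1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3)  # Johnstone scaling
    return (lam1 - mu) / sigma                           # approx. TW1-distributed

# One (n, l1) scatter pair per sample size, with p = 30 activities
pairs = [(n, normalized_largest_eig(n, 30, rng)) for n in range(40, 241, 40)]
print(pairs)
```

Plotting many such pairs per sample size reproduces the flattening cloud of markers described below for Figures 2.16A and 2.16B.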


Figure 2.16A: Scatterplots of Normalized 1st Largest Eigenvalues Versus X’s n Rows

Figure 2.16B: Scatterplots of Normalized 1st Largest Eigenvalues Versus X’s n Rows

In each of the previous figures, each graph's legend presents the networks in ascending order of

their complexity as measured by their RT values. Appendix F.1 on page 456 illustrates scatterplots

created using a single network in the case of j3024-8 or j3032-4. The following paragraphs look

at scatterplots on a global and individual level.

Discovering a Universal Pattern for Construction Project Networks

Constructing the scatterplots in the manner explained in the previous paragraphs enabled the

unexpected discovery of a distinct and consistent pattern associated with project networks of

various sizes, regardless of the method used to compute the largest eigenvalue of the sample

covariance matrices 𝑺 derived from the population of EF times for project activities. This

discovery is a crucial development since it provides insight into the fundamental behavior of

project network schedules. The smaller networks (j30 and j60), shown in Figure 2.16A, have a

higher data point density than their larger counterparts (j90 and j120), as depicted in Figure 2.16B.

By connecting all the scatterplots' markers, the slope of the resulting curve, defined as the ratio of vertical to horizontal change between any two distinct points on a line, may provide crucial

information. The curve begins steeper at the left and gradually becomes almost horizontal to the

right as the number n of samples increases. This trait of the curve indicates a significant degree of

collinearity in the data, which implies a very low variability in the largest eigenvalues of the sample

covariance matrices as the sample size n becomes larger. The following section provides a more

detailed study of this pattern based on its alternative patterns.


2.5.2.3 Using Patterns in the Data to Infer Structure in The PSPLIB Data Set

The data gathered from the simulation results discussed in the preceding section also aided in

computing the deviations ∆𝜇(𝑛, 𝑝) and ∆𝜎²(𝑛, 𝑝) between the means and variances of the empirical

and hypothesized distributions. While the statistics of the assumed distribution are well known,

the empirical statistics were computed using the observed mean and variance of the normalized first largest eigenvalue 𝑙₁,I or 𝑙₁,II of the sample covariance matrix 𝑺^(𝑛,𝑝). Given a sample size 𝑛, 100 replicas of a project network schedule of size p supplied the inputs required to construct the sample data matrix 𝑿^(𝑛,𝑝) and then derive the matrix 𝑺^(𝑛,𝑝). As previously indicated, 𝑛 varies from 𝑛_min to 𝑛_max in steps of 𝑛_step, generating 𝑛_points points.
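The construction of the sample data matrix and its covariance matrix described above can be sketched as follows. The cumulative-sum surrogate for EF times is invented for illustration; the thesis derives every row from an actual CPM simulation of a PSPLIB network.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 30        # sample size and network size (illustrative values)

# Each row holds the p activity EF times of one simulated schedule run.
# The cumulative sum is a crude invented surrogate for EF times, which
# increase along a schedule's activity sequence.
X = rng.standard_normal((n, p)).cumsum(axis=1)

Xc = X - X.mean(axis=0)            # center the columns
S = (Xc.T @ Xc) / (n - 1)          # p-by-p sample covariance matrix
lam1 = np.linalg.eigvalsh(S)[-1]   # its largest (first) eigenvalue
print(S.shape, round(float(lam1), 3))
```

Repeating this construction for each value of n between 𝑛_min and 𝑛_max yields the sequence of largest eigenvalues whose deviations from the hypothesized distribution are plotted next.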

The 100 needed simulations produced the outputs required to draw a total of 𝑛_points pairs of points (𝑛, ∆𝜇(𝑛, 𝑝)) or (𝑛, ∆𝜎²(𝑛, 𝑝)) in a diagram where the horizontal or x-axis represents the sample size n and the vertical or y-axis represents the deviation ∆𝜇 or ∆𝜎², respectively. Once

again, MATLAB assisted with the graphical depiction of the data, producing the smoothed curves

seen in Figure 2.17 and Figure 2.18, generated by connecting consecutive points with a

straight line. While the curves supplied here only reflect the set of j30 networks, the remaining

curves illustrating the sets of project networks j60, j90, and j120 described in Table 2-23 can be

found in Appendix F.2 and Appendix F.3. The following graphical analysis of the curves will be

conducted case-by-case, distinguishing between mean and variance deviance plots. For

simplification, the curves obtained using the deviations ∆𝜇 (resp. ∆𝜎²) will be referred to as mean (resp. variance) deviation curves.


Case 1: Graphical Analysis for Mean Deviation Curves

As with normal yield curves (finance), learning curves (construction and engineering cost

estimating), and stress-strain curves (materials science and engineering), a collection of mean

deviation curves provides a graphical representation of combined networks of equal size but

varying complexity. Each curve is a scatterplot of pairs (𝑛, ∆𝜇(𝑛, 𝑝)). For pairs of points obtained using the set of j60 networks, see page 463 of Appendix F.4. These curves are intended to assist in

determining the optimum sample size required to verify the limiting distributional assumption of

jointly sampling the durations of project network activities using a triangular distribution.

Additionally, as the sample size 𝑛 increases, taking values between 𝑛_min and 𝑛_max (see Table 2-24), the ideal sample size produced from either of these curves should aid in testing the hypothesis that the Tracy-Widom distribution of type 1 is the true probability distribution of the normalized extreme eigenvalue 𝑙₁,I or 𝑙₁,II of a sample covariance matrix 𝑺^(𝑛,𝑝). The following

paragraphs analyze the curves' trends in-depth based on their initial slopes, their yield points, and

finally, their asymptotic behavior as the sample size n increases.
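The individual activity durations feeding these simulations are randomized from a triangular distribution, as stated above. A minimal sketch follows; the three parameters are invented for illustration, not taken from PSPLIB.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented optimistic / most-likely / pessimistic durations for one activity
# (the thesis draws its parameters from the PSPLIB instances).
a, m, b = 2.0, 5.0, 9.0
samples = rng.triangular(a, m, b, size=10_000)

print(round(float(samples.mean()), 2))   # close to the theoretical (a+m+b)/3
```

The sample mean converges to the theoretical triangular mean (a + m + b)/3, which is why the parameters of each activity's distribution are treated as known throughout this analysis.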

What is remarkable about Figure 2.17 (or the other curves for the sets of j60, j90, and j120 networks in Appendix F.2) is the persistent and distinctive pattern, similar to that shown in Figure 2.16A and Figure 2.16B. However, unlike the previous pattern, this one is inverted: it takes the shape of a monotonically increasing, concave-down curve whose positive slopes decrease from left to right as the sample size n grows. As can be seen in each picture, regardless of the normalization method used (Norm I/Norm II), each upward-sloping curve begins with larger (negative) deviations and a sharper slope that levels off as the sample size increases.


Figure 2.17 (a), (b): Plots of Deviations between Means of the Assumed PDF and Empirical PDF of a Set of j30 Project Network Schedules’ Largest Eigenvalues


For example, using Norm I, the highest slope of 8.57 is seen near the beginning of the mean

deviation curve for Network j3032-4, between points with abscissa 40 and 50. Using Norm II, the

steepest slope of 5.60 is obtained with the same network and points.

Remarkably, the spots with the steepest slopes are located at the beginning of each curve and are

characterized by negative and highly significant deviations, which correspond to smaller sample

size values. After each curve reaches a certain point, which is surprisingly like the "yield point"

on a stress-strain curve, the rate of vertical to horizontal change decreases and eventually vanishes as

the sample size increases. Because of the striking similarity between the curves, it is worth

providing some facts about the yield points, which may help with the current study. A yield point

indicates the end of elastic behavior and the start of plastic behavior. In addition, at less than the

yield point, a material deforms elastically and returns to its original shape. Once the yield point is

passed, a portion of the deformation, termed plastic deformation, is permanent and irreversible.

Moreover, the concave mean deviation curve reaches its maximum curvature at this yield point, which corresponds to a zero deviation (∆𝜇 = 0).

This unique point, which exists for any of the curves, corresponds to the optimal sample size,

denoted by 𝑛_opt, that is most likely to result in the observed and postulated means being equal. For example, network j3011-1 has an 𝑛_opt value of 105 for Norm I, which happens to be the smallest value achieved with the set of j30 networks. With the same set of networks but with Norm II, the minimum 𝑛_opt value is 49, corresponding to network j3024-8. Given a set of networks and a normalization procedure, the closer the mean deviation curve is to the origin, the lower the optimum sample size 𝑛_opt. Conversely, the larger the value of 𝑛_opt, the further the curve is from


the origin. For Norm I, network j3032-4, depicted in red, has the highest 𝑛_opt value of 239 and is the farthest from the origin. For Norm II, the greatest 𝑛_opt value of 65 corresponds to the same network j3032-4, whose red curve again appears at the bottom of all the curves. However, because the networks are ordered from smallest to largest in the legend of each figure, it is worth noting that the values of 𝑛_opt are unrelated to the RT values. Furthermore, this pattern is constant over a range of different-sized networks.
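Reading the optimal sample size (𝑛_opt) off a mean deviation curve amounts to locating its zero crossing. The sketch below does this by linear interpolation on invented deviation values; the thesis reads the crossing from its MATLAB plots.

```python
import numpy as np

# Invented mean deviation curve: deviations start negative and rise toward
# zero as the sample size grows, as on the curves in Figure 2.17.
n_vals = np.array([40, 80, 120, 160, 200, 240])
dev_mu = np.array([-3.1, -1.4, -0.5, 0.1, 0.3, 0.4])

# n_opt: the sample size at which the curve crosses zero deviation,
# found by linear interpolation between the two bracketing points.
i = int(np.where(np.diff(np.sign(dev_mu)) > 0)[0][0])
n_opt = n_vals[i] - dev_mu[i] * (n_vals[i + 1] - n_vals[i]) / (dev_mu[i + 1] - dev_mu[i])
print(round(float(n_opt)))
```

With these invented values the crossing falls between the third and fourth points, illustrating why a finer 𝑛_step around the crossing sharpens the estimate.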

Finally, the mean deviation curves continue to rise after reaching the yield point 𝑛_opt, but at a

slower and more gradual rate as the sample size n grows. The inspection of any curves collection

reveals the steepest slopes at the beginning of the curves that gradually diminish in steepness as

the sample size increases. For example, regarding slopes, the curve associated with network j3032-4 begins at 8.57 and finishes at 0.074 with Norm I, and goes from 5.60 to 0.168 with Norm II. When

combining the graphical analysis and computed slopes of individual curves, it becomes clear that

the asymptotic rate of convergence to the x-axis is faster with the curves obtained using Norm I

than with the curves obtained using Norm II for the same sample size. Additionally, one can project that all the curves, representing a collection of equal-size networks of varying complexity, will eventually merge into a straight line. Appendix F.5 contains illustrations of the slopes computed

using Norm I and Norm II for the j3032-4 network. Finally, it is worth mentioning that the greater

the absolute value of ∆𝜇, the further away the statistic mean of the underlying joint sample distribution of project network activity durations lies from the hypothesized mean.


Case 2: Graphical Analysis for Variance Deviation Curves

A collection of variance deviation curves resembles an indifference map of indifference curves

used in economics to investigate customer choice (demand) and budget restrictions. It is a graph

reflecting combined networks of equal size but varying complexity. Each curve, in Figure 2.18 for

the j30 networks or Appendix F.3 for the j60, 90, and 120 networks, plots pairs of points

(𝑛, ∆𝜎²(𝑛, 𝑝)) in the Cartesian coordinate system, with the x-axis representing the number of rows 𝑛 of the sample data matrix 𝑿^(𝑛,𝑝) and the y-axis representing the deviations ∆𝜎²(𝑛, 𝑝) of the observed largest eigenvalues' variances from the anticipated distribution variance. For pairs of points obtained using the set of j60 networks, see page 464 of Appendix F.4. The largest eigenvalues 𝑙₁,I or 𝑙₁,II are determined from the sample covariance matrix 𝑺^(𝑛,𝑝), with 𝑛 taking values between 𝑛_min and 𝑛_max for a p-dimensional project network (see Table 2-24). These curves

serve a dual role. The first is to identify the optimal sample size n necessary for each network to

validate the limiting distributional assumption on the joint sampling distribution of project network

activities' durations, which are individually randomized using a triangular distribution. Another

objective is better to understand the limiting behavior of project network scheduling.

Given the comprehensive graphical analysis of the mean deviation curves (refer to case 1 on page

262), one may use the same approach to examine the current curve patterns according to their

characteristics graphically. As a result, the subsequent analysis will concentrate on the previously referenced figures' apexes. As with the previous pictures, an intriguing and recurrent pattern can be observed across multiple networks of varying sizes and complexity, regardless of the approach used to normalize the greatest eigenvalues of the sample covariance matrix 𝑺^(𝑛,𝑝).


Figure 2.18 (a), (b): Plots of Deviations between Variances of the Assumed PDF and Empirical PDF of a Set of j30 Project Network Schedules’ Largest Eigenvalues


However, the present tendency is in the opposite direction of the previous one. Overall, the trend

is downward and convex toward the origin. In addition, the slopes of the curves decrease from left

to right. However, the slopes do not diminish in a continuous or monotonic manner. There are irregularities at the point where the curvature of each curve peaks, just before its slopes begin to drop rapidly as the sample size increases. The figures represent these anomalies by zig-zag lines

linking pairs of points to generate any of the curves displayed. After computing and inspecting the

slopes of each curve as their values change from positive to negative, these irregularities become

readily apparent. Appendix F.6 contains illustrations of the slopes computed using Norm I and

Norm II for the j3032-4 network. Consequently, due to the absence of continuity between the

points that comprise these curves, this study determined that the variance deviation curves are

unsuitable for this investigation.
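The zig-zag irregularities described above can be detected numerically by checking the point-to-point slopes for sign changes, which is one way to formalize why the variance deviation curves were set aside. The two curves below are invented for the sketch.

```python
import numpy as np

def has_consistent_slope_sign(n_vals, dev):
    """True when the point-to-point slopes of a deviation curve never
    change sign, i.e. the polyline has no zig-zag irregularities."""
    slopes = np.diff(dev) / np.diff(n_vals)
    signs = np.sign(slopes)
    return bool(np.all(signs[:-1] == signs[1:]))

n_vals = np.array([40, 80, 120, 160, 200])
smooth = np.array([-3.0, -1.5, -0.7, -0.3, -0.1])   # monotone rise toward zero
zigzag = np.array([-3.0, -1.0, -1.6, -0.4, -0.9])   # alternating slope signs
print(has_consistent_slope_sign(n_vals, smooth),
      has_consistent_slope_sign(n_vals, zigzag))
```

A curve failing this check has no single, well-defined zero crossing, which is precisely the defect observed on the variance deviation curves.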

Despite their shortcomings, the variance deviation curves' figures still support some observations. Since networks are ranked in increasing order of complexity, irrespective of the normalizing method or network size, each figure illustrates that complexity is independent of the sample size associated with a zero deviation. Because this sample size is different from 𝑛_opt, let 𝑛_var refer to it from here on. The curve with the lowest 𝑛_var appears at the bottom of the variance deviation map in any figure. Alternatively, the one with the highest 𝑛_var is the one

that is the furthest from the origin. This arrangement is identical to the one seen with the mean

deviation curves. As n grows, the rates of change steadily drop until they reach an asymptotic value

of zero near the horizontal axis for Norm I and −1 for Norm II. With sufficiently large n values, all

curves converge to a single straight line asymptotically parallel to the x-axis. Convergence occurs

more rapidly with Norm I than with Norm II.


Comparing the Discovered Patterns: 𝑛_opt versus 𝑛_var and Norm I versus Norm II

Comparing the values of 𝑛_opt and 𝑛_var derived from the mean and variance deviation curves under both normalization methods, Norm I and Norm II, makes it clear that the two values differ for any given network, regardless of its size or the normalization method. The discrepancy is magnified when networks of a larger size are compared to networks of a smaller size. The value of 𝑛_var appears to be higher than the value of 𝑛_opt for the smaller networks j30 and j60, except for network j6010-5, whose 𝑛_var value of 137 is less than its 𝑛_opt value of 176. Except for network j901-3, with values of 177 and 276, the same observation established for smaller networks persists for larger ones under Norm II. For Norm I, the observation no longer applies to the j90 networks, where all 𝑛_opt values are significantly greater than the 𝑛_var values, while 𝑛_var values are considerably greater than 𝑛_opt values in the j120 networks.

Comparing the 𝑛_opt values obtained using the two normalization approaches leads to the following result: 𝑛_opt values calculated using Norm I are larger than those computed using Norm II. Additionally, the disparities become more pronounced as network sizes increase. For instance, the ratio of 𝑛_opt values obtained using Norm I to those obtained using Norm II varies between 2 and 4 for j30 networks but between 5 and 7 for j120 networks. As a result, when the first normalization approach (Norm I) is used, the resulting sample data matrices are larger than when the second normalization method (Norm II) is used. Concurrent observation of the mean and variance deviation curves for any given network size and normalization method demonstrates that choosing a sample size greater than 𝑛_opt results in small to zero variances for the highest eigenvalue of the sample covariance matrices. With


reference to the first chapter, sample variance measures the variability or dispersion of data values around the mean: the greater the variance, the wider the spread.

2.5.2.4 Using the Uncovered Pattern to Derive Important Results

Simulation Outputs

The following Table 2-25 summarizes the outputs of a series of 100 simulations necessary to plot

all the figures analyzed throughout the previous section, namely the mean and variance deviation curves provided in Figure 2.17 and Figure 2.18 and the ones in Appendix F.2 and Appendix F.3. This table focuses only on simulation outputs obtained with the mean deviation curves because this study found the variance deviation curves unsuitable for the network behavior

investigation. The single table categorizes simulation outputs by a set of networks of equal size

sorted in increasing order according to their complexity in terms of RT values. In addition, this

table helps verify the lack of correlation between the optimal sample size 𝑛_opt and complexity

across the set of networks. Moreover, this table is also valuable for learning more about the limiting

behavior of durations of project network activities by providing answers to the research's crucial

questions.


Crossing Points (𝑛_opt Predicted with 100 Simulations)

                          Norm I                  Norm II
Network        RT       0.05    0.1     0.2       0.05
j305-7* 0.3387 119 51
j3012-6* 0.3952 201 61
j3011-1* 0.4597 105 50
j3024-8* 0.4597 106 49
j3032-4* 0.5161 239 266 298 65
j3041-8* 0.5786 126 52
j3038-7* 0.5806 129 53
j3034-10* 0.5907 115 51
j3038-5* 0.6169 150 55
j3048-2* 0.6512 145 55
j3037-6* 0.6875 114 51
j602-7* 0.2538 371 103
j6010-5 0.3157 384
j6015-1 0.3384 835
j6020-7* 0.4030 840 137
j6028-9* 0.4030 619 124
j6035-8 0.4595 620
j6040-5 0.5161 507 542 590
j6042-6* 0.5738 500 117
j905-3* 0.2000 1032 177
j9010-5 0.2174 1009 175
j901-3* 0.2179 691 732 781 161
j9014-5 0.2210 1052
j902-4 0.2310 1198
j9031-1 0.3395 884
j9045-1* 0.4599 1806 217
j9037-7 0.4993 1502
j12014-1* 0.1780 1631 261
j12012-9 0.1862 1660 1744 1848
j1205-5 0.1901 1593
j1209-10 0.1901 1869
j12012-1* 0.2179 4318 349
j1201-2 0.2203 1671
j12024-2* 0.3397 2891 280
j12058-1 0.4489 4236

Table 2-25: Optimal Sample Size Predictions for All Networks of Interest


Analysis of Networks of Equal Complexity and of Equal/Different Sizes

This study found the following in conjunction with the various figures used to graphically analyze

the mean deviation curves and the summary Table 2-26 below. Regardless of the normalization method, networks of equal size and RT value do not necessarily have similar 𝑛_opt values, except for smaller project networks. With Norm I, the difference increases in magnitude as p increases. Since 𝑛_opt is an increasing function of p, pairs of intergroup networks such as (j3032-4, j6040-5) or (j901-3, j12012-1) having equal or approximately equal RT values will not necessarily end up with identical 𝑛_opt values.

𝑛_opt

           j30 (RT = 0.4597)     j60 (RT = 0.4030)     j90 (RT = 0.217)     j120 (RT = 0.1901)
Norm       j3011-1   j3024-8     j6020-7   j6028-9     j9010-5   j901-3     j1205-5   j1209-10
Norm I     106       106         840       619         1009      691        1593      1869
Norm II    50        49          137       124         175       161        No results

Table 2-26: Optimum Sample Sizes of Networks of Equal Complexities

Effect of the Significance Level (α) on the Optimum Sample Size 𝑛_opt

For the four networks j3032-4, j6040-5, j901-3, and j12012-9 selected for additional experiments as per Table 2-23, the results in Table 2-25 indicate that increasing the significance level α from 0.05 to 0.1 or 0.2 in the expression of the sample covariance matrix 𝑺^(𝑛,𝑝) increased the value of the sample size 𝑛_opt. Unfortunately, the experiment was limited to the normalization approach Norm I due to time constraints. For example, for network j3032-4, the value of 𝑛_opt increased from 239 to 266 (α = 0.1) or 298 (α = 0.2). These values are also recorded in Table 2-25, since additional


simulations were run for the same network. As a result, it can be inferred that 𝑛_opt is an increasing

function of the significance level α.

Effect of the Significance Level α on the Sample Covariance Matrix 𝑺^(𝑛,𝑝)

According to Table 2-23, another experiment was performed utilizing the small size network

j3032-4. Both normalization methods were used for this experiment, but α was left out of the

sample covariance matrix's expression (see Equation 2.64) by setting it to one. In each

normalizing scenario, the patterns of the three plotted graphs required to obtain the optimum

sample size were consistent with the preceding ones. The resulting values of 𝑛_opt, as given in Table 2-27 below, illustrate that the absence of α resulted in substantially greater values of 𝑛_opt for Norm I and higher values of 𝑛_opt for Norm II.

𝑛_opt for Network j3032-4

Norm       No α     α = 0.05   α = 0.1   α = 0.2
Norm I     449      239        266       298
Norm II    83       65

Table 2-27: Effect of α on the Optimal Sample Size of Network j3032-4

This finding is consistent with those obtained for larger networks j60, j90, and j120, not included

in this analysis. Experimenting with a representative of each of the four sets of networks was

necessary to determine the optimal formula for the sample covariance matrix, among others

available in the literature, some of which are in Appendix G. In the absence of a supercomputer

and given the time constraints associated with completing this simulation task, choosing a sample

covariance matrix expression 𝑺^(𝑛,𝑝) that required fewer runs while still producing solid results was critical. As a result, it was crucial to include α in constructing 𝑺^(𝑛,𝑝), as is done in practice when

designing confidence intervals; otherwise, running all the simulations required for this study would

have taken significantly longer.

Graphical Analysis Conclusion

The empirical investigation, which involved 100 simulation runs for each network chosen for this study, has uncovered a universal pattern in project network schedules.

This trend was unexpectedly discovered after plotting the scatterplots of the normalized largest

eigenvalue of sample covariance matrices against the sample sizes required to produce the matrices

from EF times of project network activities. While each activity's durations were independently

randomized from a triangular distribution with known parameters, the sampling of the activities’

joint durations required to generate a sample data matrix was unknown but assumed till proven to

be the Tracy-Widom limit law of type 1. The same sampling distribution governs the selection of

each row of the sample data matrix, which constructs the sample covariance matrix. Regardless of

the approach used to standardize the sample covariance matrix's first eigenvalue, the distinctive

uncovered pattern was consistent across networks of similar size and complexity. Remarkably,

regardless of the normalization approach used, the same trend was persistent with networks of

varying sizes and complexities. Subsequently, graphs of deviations of means or variances of the

observed first largest eigenvalue of sample covariance matrices from the hypothesized distribution

mean or variance versus the data matrix sample size revealed two distinct patterns consistent with

the previous one. While the unveiled pattern is concave upward for the deviation of means,


comparable to a stress-strain curve used in engineering to calculate the yield point of any given

material, the pattern is convex downward for the deviation of variances. The visual examination

of several curves illustrating both trends revealed that the mean deviation curves were more

reliable than those of variance deviation curves in determining the optimal sample sizes.

Additionally, the analysis of simulation data showed that the sample size increases as the network

size increases. Further, when the sample covariance matrix was a function of α, the study found

that the optimum sample size necessary to derive a sample covariance matrix whose largest

eigenvalue’s statistic mean coincides with the assumed distribution’s mean, is a rising function of

the significance level α required to test the distributional assumption set forth. Moreover, the

analysis found that the optimum sample sizes obtained using the normalizing approach Norm I

were considerably larger than those obtained using the normalization method Norm II,

particularly for larger networks. Finally, when the sample covariance matrix was no longer stated

in terms of α, the optimum sample size values grew considerably larger in the case of Norm I.


2.5.2.5 Goodness-of-Fit Test for Project Networks’ Distributional Assumption Testing

Preliminary

After conducting a graphical data analysis in the preceding part to determine the optimal sample

size n, this section uses n to calculate the test statistics necessary to verify the distributional

assumption made for project network activities’ durations. In other words, this section is concerned

with determining whether the limiting probability distribution of project network durations is

governed by the Tracy-Widom limit law of type 1. Step 8 of the devised methodology in Section 2.4.6.1 describes systematically running a series of 1000 or 10000 simulations to schedule a project network. Considering that a normalization method and a network of size

p have been selected, each simulation run allowed the creation of a sample data matrix which

served to compute the sample covariance matrix and then derive its four normalized first largest

eigenvalues. As previously stated, the two identified normalizing methods, based on the

universality of the Tracy-Widom distribution laws, aided in the standardization of eigenvalues of

interest. A significant number (1000 or 10000) of empirical order statistics produced from sample
, ,
covariance matrices 𝑺 helped invalidate the distributional assumption in a multivariate

statistical analysis based on hypothesis testing. This assumption is stated in terms of a null

hypothesis test in step 9 of the procedure provided in Section 2.4.6.1 and restated in Equation

2.67 supplied below:

𝐻₀: 𝐹(𝑥) = 𝐹₁(𝑥)

against the alternative                                                    Eq. 2-67

𝐻₁: 𝐹(𝑥) ≠ 𝐹₁(𝑥)


where F is the limiting true probability distribution of project network schedules in terms of

probabilistic EF times of its p activities, and 𝐹₁ is the Tracy-Widom distribution of type 1.

Based on four carefully selected significance levels α of 0.01, 0.05, 0.10, and 0.20, the well-known

Kolmogorov-Smirnov (K-S) goodness-of-fit test enabled the testing of the above hypotheses.

Regarding the selected four significance levels, their corresponding hypothesis tests have been

denoted by KS-test 1, KS-test 2, KS-test 3, and KS-test 4 in Table 2-28 below, providing the

critical value 𝑐_{𝑛,𝛼} of the K-S test statistic 𝐷₂.

Level of Significance (α) / Probability ℙ(𝐷₂ ≤ 𝑐_{𝑛,𝛼}) = 1 − 𝛼

              0.01 / 0.99      0.05 / 0.95      0.10 / 0.90      0.20 / 0.80
              KS-test 1        KS-test 2        KS-test 3        KS-test 4
Formula       1.63/√𝑁          1.36/√𝑁          1.22/√𝑁          1.07/√𝑁
Value         0.05155          0.04301          0.03858          0.03384

Table 2-28: A Few Percentage Points of the Kolmogorov-Smirnov Test Statistics


(Adapted from Massey 1951)
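The critical values in Table 2-28 follow directly from the asymptotic formulas with N = 1000 simulation runs, as the short computation below verifies.

```python
import math

# Asymptotic K-S critical values c_{N,alpha} = k_alpha / sqrt(N), with the
# k_alpha coefficients from Massey (1951) and N = 1000 simulation runs.
K_ALPHA = {0.01: 1.63, 0.05: 1.36, 0.10: 1.22, 0.20: 1.07}
N = 1000

crit = {alpha: k / math.sqrt(N) for alpha, k in K_ALPHA.items()}
for alpha, c in crit.items():
    print(f"alpha = {alpha:.2f}  c = {c:.5f}")
```

Rounding each value to five decimals reproduces the "Value" row of Table 2-28 exactly.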

This is an exploratory study using random samples, so conducting hypothesis testing on multiple α values would diversify the acceptance region of the test. Broadening the acceptance region would

figuratively increase the likelihood of tuning into the proper or “universal” radio station. At this

station, one can listen to wonderful and never-ending music audibly. The same logic applies to

determining the optimal sample size. Instead of conducting simulations with only the optimal

sample size, the study included two or four additional sample sizes to create a sequence of numbers

with the optimal sample size at the center. Adding numerals in one-digit increments before and


after the center aided in determining the sequence's remaining values. As a result, experiments

were conducted using five or three sample sizes, depending on the size and complexity of each

network. The preceding section's empirical investigation established that the computational time

required to execute a specified number of simulations on a particular network is proportional to its

size and complexity.

Identifying the Tables' Results

The following discussion clarifies the test findings presented in various tables in the context of the preceding

development. Due to the variety of parameters evaluated during the series of simulations, the

simulation outputs supplied in the tables have been categorized according to the normalization

approach used to standardize the sample covariance matrix's four largest eigenvalues.

Additionally, the findings have been grouped according to the matrix's first, second, and following

largest eigenvalues. For example, if the preceding section classified networks according to their

sizes, this section would organize simulation outputs into tables according to the normalization

technique and rank the largest eigenvalue of interest, regardless of the network's size or

complexity. As a result, a table with missing values for any given group of networks indicates that

the K-S test failed for at least one of those networks. In other words, each table contains only

networks that pass the K-S test.

For example, no satisfactory test results were seen with any of the j120 networks chosen for the

study in the accompanying Table 2-29 below, which provides test results for Norm I regarding the

first greatest eigenvalue of the sample covariance matrices of project networks. Additionally, the

absence of a table intended to contain the findings of a specific normalizing method for the first,


second, and subsequent greatest eigenvalue shows that the trials failed to produce valid test results.

The same argument applies to any identified networks for this investigation that are not included

in the following table. Each table is made up of eight columns.

Significance level α / Probability P


0.01 / 0.99 0.05 / 0.95 0.10 / 0.9 0.20 / 0.80
Network n RT D₂ KS-test 1 KS-test 2 KS-test 3 KS-test 4
j3037-6 114 0.6875 0.0405 1 1 0 0
j3048-2 146 0.6512 0.0469 1 0 0 0
j305-7 120 0.3387 0.0472 1 0 0 0
j6015-1 834 0.3384 0.0487 1 0 0 0
j6015-1 835 0.3384 0.0253 1 1 1 1
j6015-1 836 0.3384 0.0490 1 0 0 0
j6020-7 841 0.4030 0.0301 1 1 1 1
j6020-7 842 0.4030 0.0450 1 0 0 0
j6035-8 620 0.4595 0.0286 1 1 1 1
j6035-8 621 0.4595 0.0396 1 1 0 0
j9014-5 1053 0.2210 0.0422 1 1 0 0
j9014-5 1054 0.2210 0.0405 1 1 0 0
j902-4 1196 0.2310 0.0366 1 1 1 0

Selected Networks    H0 accepted

Table 2-29: Kolmogorov-Smirnov Test of Goodness of Fit for the 1st Largest Eigenvalue of
Project Networks

The first column, under 'Network,' contains a list of all networks that passed the K-S test. The blue

cells show networks for which more than one sample size out of five or three successfully

performed the K-S test. In this scenario, the value of n in the second column of the table refers to

the sample size that resulted in a greater number of positive tests out of the four tests labeled KS-

test 1 through KS-test 4 and a lower D₂ value.


1000 Simulations
Normalized 1st Eigenvalues (Norm II, 0.025)
Significance level α / Probability P
0.01 / 0.99 0.05 / 0.95 0.10 / 0.9 0.20 / 0.80
Network n RT D₂ KS-test 1 KS-test 2 KS-test 3 KS-test 4
j3011-1 48 0.4597 0.0325 1 1 1 1
j3011-1 49 0.4597 0.0162 1 1 1 1
j3011-1 50 0.4597 0.0308 1 1 1 1
j3024-8 47 0.4597 0.0249 1 1 1 1
j3024-8 48 0.4597 0.0168 1 1 1 1
j3024-8 49 0.4597 0.0211 1 1 1 1
j3034-10 50 0.5907 0.0444 1 0 0 0
j3034-10 51 0.5907 0.0423 1 1 0 0
j3034-10 52 0.5907 0.0256 1 1 1 1
j3034-10 53 0.5907 0.0218 1 1 1 1
j3037-6 49 0.6875 0.0269 1 1 1 1
j3037-6 50 0.6875 0.0494 1 0 0 0
j3037-6 51 0.6875 0.0471 1 0 0 0
j3038-5 53 0.6169 0.0219 1 1 1 1
j3038-5 54 0.6169 0.0285 1 1 1 1
j3038-5 55 0.6169 0.0363 1 1 1 0
j3038-5 56 0.6169 0.0367 1 1 1 0
j3038-5 57 0.6169 0.0454 1 0 0 0
j3041-8 50 0.5786 0.0490 1 0 0 0
j3041-8 52 0.5786 0.0479 1 0 0 0
j3041-8 53 0.5786 0.0306 1 1 1 1
j3041-8 54 0.5786 0.0321 1 1 1 1
j305-7 49 0.3387 0.0200 1 1 1 1
j305-7 50 0.3387 0.0384 1 1 1 0
j305-7 51 0.3387 0.0493 1 0 0 0
j9010-5 174 0.2174 0.0372 1 1 1 0
j9010-5 175 0.2174 0.0441 1 0 0 0
j9010-5 176 0.2174 0.0392 1 1 0 0
j905-3 178 0.2000 0.0403 1 1 0 0
j905-3 179 0.2000 0.0381 1 1 1 0
j905-3 180 0.2000 0.0419 1 1 0 0
j905-3 176 0.2000 0.0384 1 1 1 0
j905-3 178 0.2000 0.0353 1 1 1 0
j12024-2 279 0.3397 0.0348 1 1 1 0
j12024-2 280 0.3397 0.0316 1 1 1 1
j12024-2 281 0.3397 0.0371 1 1 1 0

Table 2-30: K-S Test of Goodness of Fit for the 4th Largest …


An orange, gold, blue, or green left brace, such as those in Table 2-30 above, has been added

to group rows belonging to a j30, j60, j90, or j120 network with more than one successful sample

size. A brace with dashes denotes an unintentional repetition of the trials for the specified network.

The study sought to conduct only one experiment per network for each chosen normalization

approach, with a total of 1000 or 10000 simulation runs. For example, only two of the five sample

sizes considered for network j9014-5 produced successful findings. Appendix E.9 illustrates

all five outputs for network j9014-5. Also, the same appendix contains a companion table that

summarizes each sample's median, mode, mean, variance, skewness, and kurtosis.

The third column, called 'RT,' gives additional information about the complexity of each network

in terms of its RT value, which will serve to analyze the K-S test findings. Finally, the fourth

column contains the values of the D₂ test statistic. The information in the table's final four columns was

generated by comparing each observed test statistic to the critical values in Table 2-28. In the

table's four columns, a green cell carrying a "1" indicates a successful outcome for the two-tailed

K-S test. An uncolored cell with a "0" in it, on the other hand, means rejection of the null

hypothesis H0. Referring back to Section 1.9 on the NHST, it is worth noting that rejecting H0

does not imply that H0 is false and H1 is true; it only indicates that the data are inconsistent

with H0 at the chosen significance level. Additionally, rejecting the null hypothesis H0 when it is in fact true

constitutes a type I error.
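As a minimal sketch of how each table row can be produced, the following code computes the two-sided K-S statistic for a sample of normalized eigenvalues and compares it against the standard asymptotic critical values at the four α levels. The standard normal CDF here stands in for the Tracy-Widom type 1 CDF, which has no closed form, and the critical coefficients are the textbook asymptotic values, not necessarily those tabulated in Table 2-28.

```python
import math
import numpy as np

def ks_two_sided(sample, cdf):
    """Two-sided K-S statistic D = sup_x |F_n(x) - F(x)|."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    F = np.array([cdf(v) for v in x])
    d_plus = np.max(np.arange(1, n + 1) / n - F)
    d_minus = np.max(F - np.arange(n) / n)
    return max(d_plus, d_minus)

# Textbook asymptotic two-sided critical coefficients c(alpha);
# H0 is not rejected when D <= c(alpha) / sqrt(n).
CRIT = {0.01: 1.63, 0.05: 1.36, 0.10: 1.22, 0.20: 1.07}

# Standard normal CDF as a stand-in for the TW1 CDF.
std_normal_cdf = lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

rng = np.random.default_rng(0)
sample = rng.normal(size=120)      # stand-in for 120 normalized eigenvalues
D = ks_two_sided(sample, std_normal_cdf)
row = {a: int(D <= c / math.sqrt(len(sample))) for a, c in CRIT.items()}
print(D, row)                      # 1 = H0 accepted at that alpha, 0 = rejected
```

Because the critical value shrinks as α grows, acceptance at a larger α always implies acceptance at a smaller one, which is the pattern of trailing zeros visible in the tables.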


Analyzing the results


Next, one may analyze the K-S test results on the null hypothesis provided by Equation 2.67's

expression and refer to Table 2-31, Table 2-32, and the ones provided in Appendix F.7 and

Appendix E.8. Each table shows the results of the first and second normalization methods (Norm

I and Norm II). These compact tables were made by encoding all results with the letters "A,"

"B," "C," and "D," such that they fit into a single table for each normalization method's results. The

letters "A," "B," "C," or "D" in the row of a given network under any of the columns labeled with

KS-test 1 through KS-test 4 signifies that the experimentation conducted with, respectively, the first

through fourth ranked largest eigenvalues of the sample covariance matrix has resulted in the acceptance of

the null hypothesis H0. The acceptance of the test at the 100(1 − α)% confidence level pertains only

to the network in question. It is important to note that the most significant eigenvalues were

normalized using Norm I or Norm II. For each successful K-S test, the parenthesized value

following any of the four letters indicates the sample size resulting in the acceptance of H0 for a

given network.

Regardless of the eigenvalue rank, each table categorizes results by network

size. The RT values are also included to facilitate the analysis. At first glance, with both tables

arranged side by side, Norm II produced more successful results than its counterpart did. Norm I

yielded mainly As and Bs, while Norm II mainly yielded Cs and Ds. The Cs recorded in the rows

of networks j3024-8 and j3041-8 for Norm I and the Bs reported in the row of network j3038-

5 for Norm II are exceptions to the rule.


Passed KS test (Norm I)


Significance level α / Probability P
0.01 / 0.99 0.05 / 0.95 0.10 / 0.9 0.20 / 0.80
Network RT KS-test 1 KS-test 2 KS-test 3 KS-test 4
j305-7 0.3387 A(120) B(118)
j3012-6 0.3952
j3011-1 0.4597 B(107) B(107) B(107) B(107)
j3024-8 0.4597 B(104), C(104) C(104) C(104) C(104)
j3032-4* 0.5161
j3041-8 0.5786 B(124), C(125) B(124), C(125) C(125) C(125)
j3038-7 0.5806 B(129)
j3034-10 0.5907
j3038-5 0.6169
j3048-2 0.6512 A(146), B(143)
j3037-6 0.6875 A(114), B(116) A(114)
j602-7 0.2538
j6010-5 0.3157
j6015-1 0.3384 A(835) A(835) A(835) A(835)
j6020-7 0.4030 A(841) A(841) A(841) A(841)
j6028-9 0.4030
j6035-8 0.4595 A(620) A(620) A(620) A(620)
j6040-5 0.5161
j6042-6 0.5738
j905-3 0.2000
j9010-5 0.2174
j901-3 0.2179
j9014-5 0.2210 A(1054) A(1054)
j902-4 0.2310 A(1196) A(1196) A(1196)
j9031-1 0.3395
j9045-1 0.4599
j9037-7 0.4993
j12014-1 0.1780
j12012-9 0.1862
j1205-5 0.1901
j1209-10 0.1901
j12012-1 0.2179
j1201-2 0.2203
j12024-2 0.3397
j12058-1 0.4489
A=1st Eigenvalue B=2nd Eigenvalue C=3rd Eigenvalue D=4th Eigenvalue

Table 2-31: K-S Test of Goodness of Fit – All Test Results with Norm I


Passed KS test (Norm II)


Significance level α / Probability P
0.01 / 0.99 0.05 / 0.95 0.10 / 0.9 0.20 / 0.80
Network RT KS-test 1 KS-test 2 KS-test 3 KS-test 4
j305-7 0.3387 D(49) D(49) D(49) D(49)
j3012-6 0.3952
j3011-1 0.4597 C(50), D(49) C(50), D(49) C(50), D(49) C(50), D(49)
j3024-8 0.4597 D(48) D(48) D(48) D(48)
j3032-4* 0.5161
j3041-8 0.5786 D(53) D(53) D(53) D(53)
j3038-7 0.5806 C(52) C(52)
j3034-10 0.5907 C(52), D(53) C(52), D(53) C(52), D(53) C(52), D(53)
j3038-5 0.6169 B(53), D(53) B(53), D(53) B(53), D(53) B(53), D(53)
j3048-2 0.6512
j3037-6 0.6875 C(52), D(49) C(52), D(49) D(49) D(49)
j602-7 0.2538
j6010-5 0.3157
j6015-1 0.3384
j6020-7 0.4030
j6028-9 0.4030 C(126) C(126) C(126)
j6035-8 0.4595
j6040-5 0.5161
j6042-6 0.5738 C(119) C(119) C(119) C(119)
j905-3 0.2000 D(178) D(178) D(178)
j9010-5 0.2174 C(175), D(174) C(175), D(174) D(174)
j901-3 0.2179
j9014-5 0.2210
j902-4 0.2310
j9031-1 0.3395
j9045-1 0.4599
j9037-7 0.4993
j12014-1 0.1780 C(261) C(261) C(261) C(261)
j12012-9 0.1862
j1205-5 0.1901
j1209-10 0.1901
j12012-1 0.2179 C(348) C(348)
j1201-2 0.2203
j12024-2 0.3397 D(280) D(280) D(280) D(280)
j12058-1 0.4489
A=1st Eigenvalue B=2nd Eigenvalue C=3rd Eigenvalue D=4th Eigenvalue

Table 2-32: K-S Test of Goodness of Fit – All Test Results with Norm II


Except for the j30 networks, only the first largest eigenvalue was assessed in the first normalization

approach (Norm I), which is why there are no letters other than "As" in any rows of the three sets

of networks j60, j90, and j120. Had the second through fourth largest eigenvalues also been

collected and tested under Norm I, Table 2-31 would likely have been more densely populated. Since all four most

significant Eigenvalues of j30 networks were evaluated, it was shown that the results obtained with

the networks under Norm II still exceeded those obtained with Norm I. As a result, the K-S test

yielded better results for Norm II than Norm I did. The graphs in Figure 2.19 depict all results.

[Figure: two bar charts, "Norm/Scaling I" and "Norm/Scaling II," tallying the counts of successful K-S tests of types A through D per network set (j30, j60, j90, j120) for each normalization method.]

Figure 2.19: Illustrations of the K-S test Results for all Project Networks


Regarding the set of networks whose tests resulted in more acceptance of the null hypothesis H0,

the j30 networks outperformed the largest networks for both normalization methods, with more

successful results obtained with Norm II. The previous statement is more accurate for the first

normalization method (Norm I) than for Norm II since only a few networks were tested using Norm II; these networks' names

carry an asterisk suffix. Note that for a given network, the sample size is not always equal to n

and may differ from one eigenvalue rank to another rank. For example, the following illustration

is regarding Norm II (see Table 2-32). While a sample size of 53 produced successful test results

for network j3038-5 with the second (B) and fourth (D) largest eigenvalues at all significance levels

of α, the sample size that produced successful test results for network j3037-6 fluctuated around

its optimum sample size.

The analysis of the results with emphasis on the α level under which any given K-S test was

performed suggests that including α in the expression of the sample covariance matrix to build an

acceptance region or confidence interval has a significant effect on the results. Linking the design

of the sample covariance matrix to the significance level α of the two-tailed hypothesis test

suggests that the results obtained under the Null Hypothesis H0 would be correct and significant

at the confidence level of (1-α) %. As a result, having more successful outcomes with tests

performed with a significance level smaller or equal to α is reasonable, which explains the direction

of the test results in Table 2-31 and Table 2-32. For instance, concerning Table 2-32 (Norm II),

the K-S test results for network j12012-1 were significant when performed at

the significance levels of 0.01 and 0.05, which are less than or equal to the level α of the test. Given the

construction of the sample covariance matrix, only the K-S test 2 should have been conducted.


Concerning networks of equal size and complexity, the results suggest that these equalities cannot

be used to infer the performance of a network's K-S test results, regardless of the normalization method and

rank of the largest eigenvalue. For instance, using either of the tables' results obtained with

networks of equal sizes and complexities j3011-1 and j3024-8 helps illustrate this conclusion.

Surprisingly, the test results suggest similar K-S test performances for Norm-I but not for Norm-

II for networks of similar complexity but different sizes. For instance, for the second normalization

method (Norm II), in the absence of test results with network j6040-5 for comparison with network

j3032-4, comparing the K-S test performances of network j901-3 with no result against network

j12012-1 with successful results should help to illustrate the finding.

Lastly, to verify the effect of either removing alpha from the expression of the sample covariance

matrix, changing alpha from 0.05 to 0.1 or 0.2 in the formula of the sample covariance matrix, or

increasing the simulation runs from 1000 to 10000 would have on the results, additional

experiments were performed independently using network j3032-4. Before providing their

outcomes, it is worth stating the reasons for choosing network j3032-4, among others, as an

excellent candidate for these experiments. First, no successful results were obtained with j3032-4

under either normalization method. Second, due to its small size, which requires less

computational time than the larger networks, the necessary simulations can be produced quickly.

Nevertheless, none of the three changes stated earlier resulted in any improvement in the

simulation results at any level. The test performances of the experiments performed under Norm I are

consistent with the observations made earlier. That is mainly of types A and B. The same applies

to the results obtained with the second normalization method (Norm II), mainly of types B, C, and

D.


2.5.2.6 Validations of K-S Test Results

Q-Q plots and histograms served to validate the successful results obtained with the K-S test.

Figure 2.20 and Figure 2.22 are samples of Q-Q plots obtained with Norm I and Norm II for a few

networks of different sizes and complexities. In addition, Figure 2.21 and Figure 2.23 depict their

associated histograms. More Q-Q plots will be provided in the following chapter to conduct

PCA. Nevertheless, all the plots corroborate the hypothesis testing results. In other words, with

proper scaling of the largest eigenvalues, the Tracy-Widom distribution of type 1 is the true

limiting distribution law of project network schedules in question.

For network j3032-4, which failed to accept H0 regardless of the normalization method used, the

Q-Q plots and histograms of the test statistics generated with each experiment were constructed to

understand the results better. The observed data frequencies were low compared to the

hypothesized probability distribution frequencies. Therefore, none of the stated changes could

improve the results.
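A Q-Q check of this kind can be sketched without plotting by regressing sorted sample values on the theoretical quantiles; agreement with the hypothesized law shows up as a fitted slope near 1 and an intercept near 0. In this sketch the standard normal again stands in for the Tracy-Widom type 1 law, whose quantile function is not available in the standard library.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
sample = np.sort(rng.normal(size=200))   # stand-in for 200 normalized eigenvalues

# Theoretical quantiles at plotting positions (i - 0.5)/n;
# a TW1 inverse CDF would replace NormalDist here.
p = (np.arange(1, len(sample) + 1) - 0.5) / len(sample)
theo = np.array([NormalDist().inv_cdf(q) for q in p])

# On a Q-Q plot the points (theo, sample) should hug the 45-degree line;
# a fitted slope near 1 and intercept near 0 corroborate the K-S result.
slope, intercept = np.polyfit(theo, sample, 1)
print(slope, intercept)
```

A poor fit, such as the low observed frequencies reported for j3032-4, would appear as points drifting systematically away from the 45-degree line.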


Figure 2.20: Q-Q Plots of Networks j3011-1 (Norm I) and j3038-5 (Norm II)

Figure 2.21: Histograms of Networks j3011-1 (Norm I) and j3038-5 (Norm II)


Figure 2.22: Q-Q Plots of Networks j6015-1 (Norm I) and j12024-2 (Norm II)

Figure 2.23: Histograms of Networks j6015-1 (Norm I) and j12024-2 (Norm II)


2.5.2.7 Simulation Study Conclusion

The simulation study conducted with the objective of elucidating the underlying behavior of

project network schedules produced three noteworthy discoveries. The first finding concerned the

uncovering of a universal pattern for project network schedules involving 100 simulation runs for

each network chosen for this study. This trend was unexpectedly discovered after plotting the

scatterplot of the largest eigenvalue of sample covariance matrices against the sample sizes

required to produce the matrices from EF times of project network activities. While each activity's

durations were independently randomized from a triangular distribution with known parameters,

the sampling of the activities’ joint durations required to generate a sample data matrix is unknown

but assumed till proven to be the Tracy-Widom limit law of type 1. The same sampling distribution

governs the selection of each row of the sample data matrix, which constructs the sample

covariance matrix. Regardless of the normalizing approach used to standardize the sample

covariance matrix's first eigenvalue, the distinctive uncovered pattern is consistent across networks

of similar size and complexity. Remarkably, regardless of the normalization approach used, the

same trend was persistent with networks of varying sizes and complexity. Subsequently, graphs of

deviations of means or variances of the observed first largest eigenvalue of sample covariance

matrices from the hypothesized distribution mean or variance versus the data matrix sample size

revealed two distinct patterns consistent with the previous one.
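The pipeline behind this finding can be sketched as follows: sample each activity's duration from its triangular distribution, run a CPM forward pass to obtain EF times, stack the runs into a sample data matrix, and extract the largest eigenvalue of its covariance matrix. The four-activity network and the triangular parameters below are hypothetical, chosen only to keep the sketch self-contained; they are not taken from the PSPLIB networks used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy network: activity -> list of predecessors (topological order).
PRED = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
TRI = {"A": (2, 4, 9), "B": (3, 5, 8), "C": (1, 2, 6), "D": (2, 3, 7)}  # (low, mode, high)
ORDER = ["A", "B", "C", "D"]

def simulate_ef(runs):
    """Forward pass: EF(a) = max EF(predecessors) + sampled triangular duration."""
    rows = []
    for _ in range(runs):
        ef = {}
        for a in ORDER:
            lo, mo, hi = TRI[a]
            start = max((ef[p] for p in PRED[a]), default=0.0)
            ef[a] = start + rng.triangular(lo, mo, hi)
        rows.append([ef[a] for a in ORDER])
    return np.array(rows)

X = simulate_ef(1000)                # sample data matrix: runs x activities
S = np.cov(X, rowvar=False)          # sample covariance matrix of EF times
lam = np.linalg.eigvalsh(S)[::-1]    # eigenvalues, largest first
print(lam[0])
```

Repeating this for increasing sample sizes and plotting the largest eigenvalue against the sample size is what exposed the universal pattern described above.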

Both patterns were derived from plotting sample sizes versus deviations of the empirical

distribution's mean of the sample covariance matrices' largest eigenvalues from the hypothesized

distribution's mean, and sample sizes versus deviations of the observed distribution's variance of


the sample covariance matrices' first largest eigenvalue from the hypothesized distribution's

variance. The study discovered that the pattern derived using the means of the empirical and

assumed distributions was more stable and adequate for determining the optimal sample size

required to validate the distributional assumption for any identified network. Due to the striking

similarities between a stress-strain curve and the newly discovered pattern, the point of intersection

between the mean deviation curve and the horizontal axis at zero deviation, whose abscissa is

referred to as the optimum sample size for a given network, may be regarded as the yield point on

a stress-strain curve. Given that the yield point, which is commonly used in materials science and

engineering, denotes the boundary between elastic and plastic behavior, the optimum sample size

appears to be acceptable for verifying the limiting distributional assumption stated for project

network scheduling.
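The zero-crossing reading of the mean-deviation curve can be sketched numerically; the deviation values below are invented solely to illustrate the interpolation, and the yield-point analogy simply identifies the abscissa at which the curve crosses zero deviation.

```python
import numpy as np

# Illustrative deviation curve: mean of observed largest eigenvalues minus
# the hypothesized TW1 mean, at increasing sample sizes (assumed values).
sizes = np.array([40, 45, 50, 55, 60])
mean_dev = np.array([0.31, 0.12, -0.05, -0.18, -0.27])

# Optimum sample size = abscissa of the zero crossing (linear interpolation),
# analogous to reading the yield point off a stress-strain curve.
i = np.where(np.diff(np.sign(mean_dev)) != 0)[0][0]
x0, x1, y0, y1 = sizes[i], sizes[i + 1], mean_dev[i], mean_dev[i + 1]
optimum = x0 + (0 - y0) * (x1 - x0) / (y1 - y0)
print(round(optimum, 2))
```

In this synthetic example the crossing falls between sample sizes 45 and 50, which would then be taken as the optimum sample size for the hypothetical network.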

The second conclusion addressed the normalizing procedure required to standardize the four most

significant eigenvalues of the sample covariance matrix utilized as test statistics and related to each

of the study's project networks. A comparative analysis of the normalization methods based on the

universality of the Tracy-Widom limit law of type 1 revealed that the scaling formulas developed

by Baik et al. (1999) and Johansson (1998) are more appropriate for studying the behavior of

project network schedules than the one devised by Johnstone (2001). This discovery is significant

because the chosen approach was obtained from studying the length of the longest increasing

subsequence of random permutations, which is analogous to the critical path sequentially connecting

all critical activities to form the project network's longest path.
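For reference, the two families of centering and scaling constants contrasted here can be written down directly. The first function follows Johnstone's (2001) formulas for the largest eigenvalue of a real white Wishart matrix (applied to the eigenvalue of X'X rather than of the covariance matrix itself), and the second follows the Baik-Deift-Johansson (1999) scaling for the length of the longest increasing subsequence; the numeric arguments below are arbitrary illustrations.

```python
import math

def johnstone_norm(lam_max, n, p):
    """Johnstone (2001) centering/scaling for the largest eigenvalue of a
    real Wishart matrix X'X (n observations, p variables); TW1 limit."""
    mu = (math.sqrt(n - 1) + math.sqrt(p)) ** 2
    sigma = (math.sqrt(n - 1) + math.sqrt(p)) * (
        1 / math.sqrt(n - 1) + 1 / math.sqrt(p)) ** (1 / 3)
    return (lam_max - mu) / sigma

def bdj_norm(l_n, n):
    """Baik-Deift-Johansson (1999) centering/scaling for the length of the
    longest increasing subsequence of a random permutation of size n."""
    return (l_n - 2 * math.sqrt(n)) / n ** (1 / 6)

print(johnstone_norm(250.0, 120, 30), bdj_norm(18, 100))
```

Both statistics converge to Tracy-Widom laws under their respective theorems; the study's comparison concerns which centering and scaling behaves better for project-network eigenvalues.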


The third significant finding of the study is that the sampling probability distribution of project

network schedules corresponds to the probability distribution of EF times for project network

activities. The Kolmogorov-Smirnov goodness-of-fit test was used to validate the distributional

assumption made about the sampling distribution. The test was created to determine whether the

Tracy-Widom limit law of type 1 is the natural sampling distribution of project network schedules

as sample size approaches a limit. This limiting size corresponds to the optimal sample size

determined from a curve with a universal pattern. When the third and fourth largest eigenvalues of

sample covariance matrices were considered, the null hypothesis was accepted for 19 of the 21

networks investigated. The test performed considerably worse with the first largest eigenvalue than

with the second largest eigenvalue. To corroborate these findings, the Q-Q plots and histograms

used to visualize the simulation data demonstrated that the TW of order 1 is a good fit for the

sample distribution of project network schedules. For the networks whose null hypothesis was

rejected, a graphical representation of their Q-Q plots revealed that suitable rescaling and

recentering of the mth greatest eigenvalue would likely improve their test performance. This

conclusion also applies to all 35 project networks identified for this investigation, of which only

18 resulted in the null hypothesis being accepted in the case of Johnstone (2001)’s normalization

method.

2.6 Research Contributions and Recommendations

The following sections summarize the chapter's primary contributions to the body of knowledge.

The chapter's objective was to evaluate the evidence for a population covariance structure in

project network schedules through an empirical analysis of probabilistic project network


schedules. Due to the network dependency structure created by the relationships between project

activities, construction management and engineering practitioners have always suspected this

covariance structure. The aims of this study have been met by applying well-known universality

theorems in mathematics and physics, resulting in the following contributions.

2.6.1 Research Contribution 1

Append construction project management and engineering to the list of fields where the Tracy-

Widom limit laws, based on Random Matrix Theory (RMT), have effectively analyzed high

dimensionality complex systems.

2.6.2 Research Contribution 2

Propose a mathematical model for project network schedules based on well-established results in

probability and statistics and project scheduling techniques that may be used to study their

behavior and thus improve existing scheduling techniques.

2.6.3 Research Contribution 3

Facilitate testing hypotheses about the durations of project network activities and the whole project

based on the newly discovered pattern for project network schedules.

2.6.4 Research Contribution 4

Devise a methodology based on multivariate statistical analysis and graphical methods for data

analysis that can be used to determine the limiting duration of a project and of each activity

comprising the project network schedule, beyond which any delay will be irreversible. The Tracy-


Widom distribution has been established for this study as the limiting distribution of project

network schedules.

2.6.5 Research Contribution 5

Initiate a research study of the connections between a measure of project network complexity and

the sample size required to draw an appropriate number of samples from a population of identically

distributed activity durations as a requirement for studying project networks using RMT.

2.7 Recommendations for Future Research

Researchers in construction management and engineering have been working intensively to

resolve the ubiquitous problem of project delays. To aid in their efforts, this research study, which

opened new avenues for studying the behavior of project network schedules, began with numerous

unknowns and ventured into unexplored ground. Aware of the limitations of deterministic

scheduling techniques that have been demonstrated to be ineffective at resolving the problem of

delays, this research study provided an opportunity to explore other related fields.

These fields have been utilizing modern mathematics to investigate the underlying behavior of

complex systems and devise practical solutions to persistent problems. In contemporary

mathematics, the study of the underlying behavior of any complex system begins with the

construction of a mathematical model that defines the sample space representing the collection of

all outcomes for any given measurable attribute (variable) of the population under study.

Subsequently, it establishes the probability distribution for drawing samples (events) from the

entire population to create sample data matrices and subsets.


In most cases, the joint sampling distribution of the population data is unknown, but the

distribution of individual population variables is always known (triangular distribution in the case

of this study). The majority of well-known RMT theorems necessitate the creation of sample data

matrices to compute sample covariance matrices crucial for determining the intrinsic covariance

matrix of the population representing the complex system under investigation. The real covariance

matrix of a system functions as a signature, a distinguishing feature that demonstrates its

complexity and sufficient correlation in its structure. Testing hypotheses based on the eigenvalues

of sample covariance matrices is a technique used in multivariate statistical analysis to determine

the system's actual covariance matrix.

When employing RMT, not all sample covariance matrices may be used to investigate a given

system's underlying behavior. Hence, its application requires proper standardization of sample data

matrices and formulation of the sample covariance matrices. The resulting matrix must belong to one

of the well-known classes of matrices (Wigner, β-ensembles) in order to apply well-established universal

laws in probability and statistics (e.g., the Wigner semi-circle law). Due to the study's primary

objective, utilizing the universality of the Tracy-Widom limit laws provides a potential method

for finding solutions to the delay problem. These laws specify the limiting probability distribution

of the first, second, third, or subsequent greatest eigenvalue of the sample covariance matrix that

is correctly normalized.
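As a minimal illustration of the well-known matrix classes mentioned here, the sketch below builds a Wigner matrix and checks that its scaled spectrum fills the interval predicted by the Wigner semicircle law, with the largest eigenvalue sitting at the soft edge where the Tracy-Widom fluctuations arise; the matrix size is an arbitrary choice.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
N = 400
A = rng.normal(size=(N, N))
W = (A + A.T) / math.sqrt(2 * N)   # symmetric Wigner matrix, entries ~ N(0, 1/N)
ev = np.linalg.eigvalsh(W)

# The semicircle law puts the bulk spectrum on [-2, 2]; the largest
# eigenvalue fluctuates around the soft edge +2 on the N**(-2/3) scale,
# which is where the Tracy-Widom law of type 1 appears for real matrices.
print(ev.min(), ev.max())
```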

Given a project network schedule, the universality of the TW distribution laws is likely to aid in

defining a duration threshold that should not be exceeded for a project to be completed on time

and under its assigned budget. However, as with any other law, the TW's universality restricts its


applicability (e.g., Gaussian entries for sample data matrices). Owing to the contributions of

numerous authors (e.g., Soshnikov 2002, Bao et al. 2015), most of the restrictions have been lifted.

Relaxing these assumptions enables the law to be applied to a broader class of matrices that are not

necessarily Gaussian or have a near-unit ratio of the total number of rows to the total number of

columns.

When combined with a suitable standardization of the largest eigenvalue of the sample covariance

matrix for project networks, the proposed model demonstrated that the universality of the TW limit

law of type 1 holds. As such, the following

recommendations will aid in research work.

2.7.1 Recommendation 1 – Using Larger and Real-Life Project Network Schedules

While the current study used only project network schedules from the Project Scheduling Problem

Library (PSPLIB), whose maximum size is 120, future research should extend the analysis to

include larger network schedules from fictitious and real-life projects.

2.7.2 Recommendation 2 – Considering other Measures of Complexity

Because this analysis revealed no correlation between restrictiveness (RT) and the number of

samples required to satisfy the conditions of applying TW-based universal theorems, future

research should investigate alternative complexity metrics, such as the other five identified by this

study.


2.7.3 Recommendation 3 – Using a Different Normalization Approach

While the normalization approach utilized in this study was specified as a function of n and p using

Johnstone's (2001) celebrated theorem, especially its ad hoc version, future studies might explore

employing a more extended formulation of the centering and scaling functions. Péché (2008), who

expanded on Soshnikov's work, provides an example of such formulation. Using such a

formulation is anticipated to help improve the performance of hypothesis testing necessary to

validate distributional assumptions in circumstances where the sample covariance matrix's first or

second largest eigenvalue and a supercomputer are available for speedy simulations.

2.7.4 Recommendation 4 – Extending to Include the Next, Next Largest Eigenvalue

Future research may include extending the study to at least the fifth and sixth greatest eigenvalues

when employing the normalization method derived from the work of Johansson (1998) and Baik

et al. (1999). Bornemann (2009) determined the numerical approximations to the Tracy-Widom

distribution functions (CDFs) up to the sixth greatest eigenvalue, allowing for this expansion.

2.8 Conclusion

This chapter concludes with suggestions for future research studies and a formulation of the current

study's contributions to the body of knowledge. The comprehensive empirical study

formulated by adopting and adapting proven methodologies from other areas of application of the

TW limit laws contributed to achieving the chapter's objectives, which were defined based on the

scope of its investigation.

References

Al Sarraj, Z. M. (1990). "Formal development of line-of-balance technique." J. Constr. Eng.
Manage., 116(4), 689-704.
Anderson, T. W. (2003). An introduction to multivariate statistical analysis. Wiley-
Interscience, Hoboken, N.J.
Bachmat, E., Berend, D., Sapir, L., Skiena, S., and Stolyarov, N. (2005). "Analysis of airplane
boarding via space-time geometry and random matrix theory." arXiv Paper: physics/0512020.
Bai, Z. D., and Yin, Y. Q. (1988). "Convergence to the semicircle law." The Annals of
Probability, 16(2), 863-875.
Baik, J., Borodin, A., Deift, P., and Suidan, T. (2006). "A model for the bus system in
Cuernavaca (Mexico)." Journal of Physics A: Mathematical and General, 39(28), 8965.
Baik, J., Deift, P., and Johansson, K. (1999). "On the distribution of the length of the longest
increasing subsequence of random permutations." Journal of the American Mathematical
Society, 12(4), 1119-1178.
Bao, Z., Pan, G., and Zhou, W. (2015). "Universality for the largest eigenvalue of sample
covariance matrices with general population." The Annals of Statistics, 43(1), 382-421.
Bao, Z., Pan, G., and Zhou, W. (2012). "Tracy-Widom law for the extreme eigenvalues of
sample correlation matrices." Electronic Journal of Probability, 17 1-32.
Baryshnikov, Y. (2001). "GUEs and queues." Probability Theory and Related Fields, 119(2),
256-274.
Basor, E. L., Tracy, C. A., and Widom, H. (1992). "Asymptotics of level-spacing distributions
for random matrices." Phys. Rev.Lett., 69(1), 5.
Behan, R. J. (1966). Cost reduction through short interval scheduling. Prentice-Hall.
Bejan, A. (2005). "Largest eigenvalues and sample covariance matrices. Tracy-Widom and
Painlevé II: computational aspects and realization in s-plus with applications." Preprint:
Http://Www.Vitrum.Md/Andrew/MScWrwck/TWinSplus.Pdf.
Ben Arous, G., and Péché, S. (2005). "Universality of local eigenvalue statistics for some sample
covariance matrices." Communications on Pure and Applied Mathematics: A Journal Issued by
the Courant Institute of Mathematical Sciences, 58(10), 1316-1357.

Benaych-Georges, F., Guionnet, A., and Maïda, M. (2012). "Large deviations of the extreme
eigenvalues of random deformations of matrices." Probability Theory and Related
Fields, 154(3), 703-751.
Binder, K. (1986). "Spin glasses: Experimental facts, theoretical concepts, and open
questions." Reviews of Modern Physics, 58(4), 801-976.
Bohigas, O., de Carvalho, J. X., and Pato, M. P. (2009). "Deformations of the Tracy-Widom
distribution." Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics, 79(3 Pt 1),
031117.
Bornemann, F. (2009). "On the numerical evaluation of distributions in random matrix theory: a
review." arXiv Preprint arXiv:0904.1581.
Borot, G., Eynard, B., Majumdar, S. N., and Nadal, C. (2011). "Large deviations of the maximal
eigenvalue of random matrices." Journal of Statistical Mechanics: Theory and
Experiment, 2011(11), P11024.
Boushaala, A. A. (2010). "Project complexity indices based on topology features."
Castellana, M., and Zarinelli, E. (2011). "Role of Tracy-Widom distribution in finite-size
fluctuations of the critical temperature of the Sherrington-Kirkpatrick spin glass." Physical
Review B, 84(14),.
Chantaravarapan, S., Gunal, A., and Williams, E. J. (2004). "On using Monte Carlo methods for
scheduling." Proceedings of the 2004 Winter Simulation Conference, 2004. IEEE, 1870-1875.
Chiani, M. (2014). "Distribution of the largest eigenvalue for real Wishart and Gaussian random
matrices and a simple approximation for the Tracy-Widom distribution." Journal of Multivariate
Analysis, 129 69-81.
Collins, B., Gawron, P., Litvak, A. E., and Życzkowski, K. (2014). "Numerical range for random
matrices." Journal of Mathematical Analysis and Applications, 418(1), 516-533.
Colomo, F., and Pronko, A. G. (2015). "Thermodynamics of the six-vertex model in an L-shaped
domain." Communications in Mathematical Physics, 339(2), 699-728.
DasGupta, A. (2005). "The matching, birthday and the strong birthday problem: a contemporary
review." Journal of Statistical Planning and Inference, 130(1), 377-389.
Deift, P. (2006). "Universality for mathematical and physical systems." arXiv Preprint Math-
Ph/0603038.
Deift, P., and Gioev, D. (2007). "Universality at the edge of the spectrum for unitary, orthogonal,
and symplectic ensembles of random matrices." Communications on Pure and Applied
Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 60(6), 867-
910.
Demeulemeester, E., Vanhoucke, M., and Herroelen, W. (2003). "RanGen: A random network
generator for activity-on-the-node networks." Journal of Scheduling, 6(1), 17-38.
Dieng, M. (2005). "Distribution functions for edge eigenvalues in orthogonal and symplectic
ensembles: Painlevé representations." International Mathematics Research Notices, 2005(37),
2263-2287.
Dieng, M., and Tracy, C. A. (2011). "Application of random matrix theory to multivariate
statistics." Random Matrices, Random Processes and Integrable Systems, Springer, 443-507.
Dodin, B. M., and Elmaghraby, S. E. (1985). "Approximating the criticality indices of the
activities in PERT networks." Management Science, 31(2), 207-223.
Dotsenko, V. (2010). "Replica Bethe ansatz derivation of the Tracy-Widom distribution of the
free energy fluctuations in one-dimensional directed polymers." Journal of Statistical
Mechanics: Theory and Experiment, 2010(07), P07010.
Dumaz, L., and Virág, B. (2013). "The right tail exponent of the Tracy-Widom $\beta$
distribution." Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, Institut Henri
Poincaré, 915-933.
Dyke, P. (2019). "The Prime Number Conspiracy by Thomas Lin." Leonardo, 52(5), 515-516.
Erdős, L., and Yau, H. (2012). "Universality of local spectral statistics of random
matrices." Bulletin of the American Mathematical Society, 49(3), 377-414.
Ergun, G. (2007). "An introduction to random matrix theory." Researchgate.Net, 1-30.
Feldheim, O. N., and Sodin, S. (2010). "A universality result for the smallest eigenvalues of
certain sample covariance matrices." Geometric and Functional Analysis, 20(1), 88-123.
Fente, J., Schexnayder, C., and Knutson, K. (2000). "Defining a probability distribution function
for construction simulation." J.Constr.Eng.Manage., 126(3), 234-241.
Fleming, B. J., and Forrester, P. J. (2011). "Interlaced particle systems and tilings of the Aztec
diamond." Journal of Statistical Physics, 142(3), 441-459.
Forkman, J., Josse, J., and Piepho, H. (2019). "Hypothesis tests for principal component analysis
when variables are standardized." Journal of Agricultural, Biological and Environmental
Statistics, 24(2), 289-308.
Forrester, P. J. (2005). "Spacing distributions in random matrix ensembles." London
Mathematical Society Lecture Note Series, 322 279.
Frahm, G. (2004). "Generalized elliptical distributions: theory and applications."

Garrahan, J. P., Stannard, A., Blunt, M. O., and Beton, P. H. (2009). "Molecular random tilings
as glasses." Proc.Natl.Acad.Sci. U.S.A., 106(36), 15209-15213.
Ginibre, J. (1965). "Statistical ensembles of complex, quaternion, and real matrices." Journal of
Mathematical Physics, 6(3), 440-449.
Gravner, J., Tracy, C. A., and Widom, H. (2001). "Limit Theorems for Height Fluctuations in a
Class of Discrete Space and Time Growth Models." Journal of Statistical Physics, 102(5), 1085-
1132.
Gravner, J., Tracy, C. A., and Widom, H. (2002). "A growth model in a random
environment." The Annals of Probability, 30(3), 1340-1368.
Hajiyev, C. (2012). "Tracy-Widom distribution based fault detection approach: Application to
aircraft sensor/actuator fault detection." ISA Trans., 51(1), 189-197.
Harmelink, D. J., and Rowings, J. E. (1998). "Linear scheduling model: Development of
controlling activity path." J.Constr.Eng.Manage., 124(4), 263-268.
Harris, P. E. (2006). Planning Using Primavera Project Planner P3 Version 3. 1 Revised
2006. Eastwood Harris Pty Ltd.
Harris, R. B., and Ioannou, P. G. (1998). "Scheduling projects with repeating
activities." J.Constr.Eng.Manage., 124(4), 269-278.
Hotelling, H. (1933). "Analysis of a complex of statistical variables into principal
components." J.Educ.Psychol., 24(6), 417.
Imamura, T., and Sasamoto, T. (2007). "Dynamics of a tagged particle in the asymmetric
exclusion process with the step initial condition." Journal of Statistical Physics, 128(4), 799-846.
Its, A., and Prokhorov, A. (2020). "On $\beta=6$ Tracy-Widom distribution and the second Calogero-Painlevé system." arXiv Preprint arXiv:2010.06733.
Jaafari, A. (1984). "Criticism of CPM for project planning
analysis." J.Constr.Eng.Manage., 110(2), 222-233.
Jagannath, A., and Trogdon, T. (2017). "Random matrices and the New York City subway
system." Physical Review E, 96(3), 030101.
Jakobson, D., Miller, S. D., Rivin, I., and Rudnick, Z. (1999). "Eigenvalue spacings for regular
graphs." Emerging Applications of Number Theory, Springer, 317-327.
Johansson, K. (2007). "From Gumbel to Tracy-Widom." Probability Theory and Related
Fields, 138(1), 75-112.
Johansson, K. (2000). "Shape fluctuations and random matrices." Communications in
Mathematical Physics, 209(2), 437-476.
Johansson, K. (1998). "The longest increasing subsequence in a random permutation and a
unitary random matrix model." Mathematical Research Letters, 5(1), 68-82.
Johansson, K. J. (2001). "Universality of the Local Spacing Distribution in Certain Ensembles
of Hermitian Wigner Matrices." Communications in Mathematical Physics, 215(3), 683-705.
Johnson, R. A., and Wichern, D. W. (2019). Applied Multivariate Statistical Analysis (Classic
Version), 6th Edition. Pearson Prentice Hall, Upper Saddle River, New Jersey.
Johnston, D. W. (1981). "Linear scheduling method for highway construction." Journal of the
Construction Division, 107(2), 247-261.
Johnstone, I. M. (2006). "High dimensional statistical inference and random matrices." arXiv
Preprint Math/0611589.
Johnstone, I. M. (2001). "On the distribution of the largest eigenvalue in principal components
analysis." Annals of Statistics, 295-327.
Johnstone, I. M. (2009). "Approximate Null Distribution of the Largest Root In Multivariate
Analysis." The Annals of Applied Statistics, 3(4), 1616-1633.
Kaplan, C. S. (2009). "Introductory tiling theory for computer graphics." Synthesis Lectures on
Computer Graphics and Animation, 4(1), 1-113.
Karoui, N. E. (2003). "On the largest eigenvalue of Wishart matrices with identity covariance
when n, p and p/n tend to infinity." arXiv Preprint Math/0309355.
Kelley Jr, J. E., and Walker, M. R. (1959). "Critical-path planning and scheduling." Papers
presented at the December 1-3, 1959, eastern joint IRE-AIEE-ACM computer conference, 160-
173.
Kelley, J., and Walker, M. (1989). "The origins of CPM: A personal history." PM Network, 3(2),
7-22.
Kelley, J. E., Jr. (1961). "Critical-Path Planning and Scheduling: Mathematical
Basis." Oper.Res., 9(3), 296-320.
Kemmer, S. L. (2006). "Análise de diferentes tempos de ciclo na formulação de planos de ataque de edifícios de múltiplos pavimentos."
Kolisch, R., and Sprecher, A. (1997). "PSPLIB-a project scheduling problem library: OR
software-ORSEP operations research software exchange program." Eur.J.Oper.Res., 96(1), 205-
216.
Krbálek, M., and Seba, P. (2000). "The statistical properties of the city transport in Cuernavaca
(Mexico) and random matrix ensembles." Journal of Physics A: Mathematical and
General, 33(26), L229.
Latva-Koivisto, A. M. (2001). "Finding a complexity measure for business process models."
Ledoit, O., and Wolf, M. (2004a). "Honey, I shrunk the sample covariance matrix." The Journal
of Portfolio Management, 30(4), 110-119.
Ledoit, O., and Wolf, M. (2004b). "A well-conditioned estimator for large-dimensional
covariance matrices." Journal of Multivariate Analysis, 88(2), 365-411.
Lee, D., Arditi, D., and Son, C. (2013). "The probability distribution of project completion times
in simulation-based scheduling." KSCE Journal of Civil Engineering, 17(4), 638-645.
Liao, S., Wei, L., Kim, T., and Su, W. (2020). "Modeling and Analysis of Residential Electricity
Consumption Statistics: A Tracy-Widom Mixture Density Approximation." IEEE Access, 8
163558-163567.
Lucko, G., Said, H. M., and Bouferguene, A. (2014). "Construction spatial modeling and
scheduling with three-dimensional singularity functions." Autom.Constr., 43 132-143.
Lumsden, P. (1968). The line-of-balance method. Pergamon Press, Industrial Training Division.
Majumdar, S. N., and Schehr, G. (2014). "Top eigenvalue of a random matrix: large deviations
and third order phase transition." Journal of Statistical Mechanics: Theory and
Experiment, 2014(1), P01012.
Malcolm, D. G., Roseboom, J. H., Clark, C. E., and Fazar, W. (1959). "Application of a
technique for research and development program evaluation." Oper.Res., 7(5), 646-669.
Marchenko, V. A., and Pastur, L. A. (1967). "Distribution of eigenvalues for some sets of
random matrices." Matematicheskii Sbornik, 114(4), 507-536.
Massey Jr, F. J. (1951). "The Kolmogorov-Smirnov test for goodness of fit." Journal of the
American Statistical Association, 46(253), 68-78.
May, R. M. (1972). "Will a large complex system be stable?" Nature, 238(5364), 413-414.
Mays, A. (2013). "A Real Quaternion Spherical Ensemble of Random Matrices." J Stat
Phys, 153(1), 48-69.
Mehta, M. L. (2004). Random matrices. Elsevier.
Mehta, M. L., and Gaudin, M. (1960). "On the density of eigenvalues of a random
matrix." Nuclear Physics, 18 420-427.
Miller, S. J., Novikoff, T., and Sabelli, A. (2008). "The distribution of the largest nontrivial
eigenvalues in families of random regular graphs." Experimental Mathematics, 17(2), 231-244.

Miller, S. J., Novikoff, T., and Sabelli, A. (2006). "The distribution of the second largest eigenvalue in families of Ramanujan graphs."
Muirhead, R. J. (2009). Aspects of multivariate statistical theory. John Wiley & Sons.
Nassar, K. M., and Hegab, M. Y. (2006). "Developing a complexity measure for project
schedules." J.Constr.Eng.Manage., 132(6), 554-561.
Nezval, J. (1958). "Foundations of Flow Production in Construction."
Norman, G. R., and Streiner, D. L. (2003). PDQ statistics. PMPH USA.
O'Connell, N. (2002). "Random matrices, non-colliding processes and queues." Séminaire De
Probabilités De Strasbourg, 36 165-182.
Pascoe, T. L. (1966). "Allocation of resources CPM." Revue Francaise De Recherche
Operationnele, 10(38), 31.
Pastur, L. A., and Shcherbina, M. (2011). Eigenvalue distribution of large random
matrices. American Mathematical Soc.
Patterson, N., Price, A. L., and Reich, D. (2006). "Population structure and eigenanalysis." PLoS
Genet, 2(12), e190.
Paul, D., and Aue, A. (2014). "Random matrix theory in statistics: A review." Journal of
Statistical Planning and Inference, 150 1-29.
Pearson, K. (1900). "X. On the criterion that a given system of deviations from the probable in
the case of a correlated system of variables is such that it can be reasonably supposed to have
arisen from random sampling." The London, Edinburgh, and Dublin Philosophical Magazine
and Journal of Science, 50(302), 157-175.
Péché, S. (2009). "Universality results for the largest eigenvalues of some sample covariance
matrix ensembles." Probability Theory and Related Fields, 143(3), 481-516.
Péché, S. (2008). "The edge of the spectrum of random matrices." Habilitation à diriger des recherches, Université Joseph Fourier Grenoble I, to be submitted.
Péché, S. (2003). "Universality of local eigenvalue statistics for random sample covariance matrices."
Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L. A. N., Guhr, T., and Stanley, H. E.
(2002). "Random matrix approach to cross correlations in financial data." Physical Review
E, 65(6), 066126.
Prähofer, M., and Spohn, H. (2000a). "Statistical self-similarity of one-dimensional growth
processes." Physica A: Statistical Mechanics and its Applications, 279(1), 342-352.
Prähofer, M., and Spohn, H. (2000b). "Universal distributions for growth processes in 1+1 dimensions and random matrices." Phys. Rev.Lett., 84(21), 4882.
Priestley, J. (1764). A Description of a Chart of Biography: By Joseph Priestley.... Printed at
Warrington.
Reda, R. M. (1990). "RPM: Repetitive project modeling." J.Constr.Eng.Manage., 116(2), 316-
330.
Riley, D. R., and Sanvido, V. E. (1997). "Space planning method for multistory building
construction." J.Constr.Eng.Manage., 123(2), 171-180.
Roofigari Esfahan, N. (2016). "A framework for spatio-temporal uncertainty-aware scheduling and control of linear projects."
Saccenti, E., Smilde, A. K., Westerhuis, J. A., and Hendriks, M. M. (2011). "Tracy-Widom
statistic for the largest eigenvalue of autoscaled real matrices." J.Chemometrics, 25(12), 644-
652.
Sauer, T. (2017). "A look back at the Ehrenfest classification." The European Physical Journal
Special Topics, 226(4), 539-549.
Schexnayder, C., Knutson, K., and Fente, J. (2005). "Describing a Beta Probability Distribution
Function for Construction Simulation." J.Constr.Eng.Manage., 131(2), 221-229.
Seppänen, O. (2009). "Empirical research on the success of production control in building
construction projects."
Soong, T. T. (2004). Fundamentals of probability and statistics for engineers. John Wiley &
Sons.
Soshnikov, A. (2002). "A note on universality of the distribution of the largest eigenvalues in
certain sample covariance matrices." Journal of Statistical Physics, 108(5), 1033-1056.
Soshnikov, A. (1999). "Universality at the Edge of the Spectrum in Wigner Random
Matrices." Communications in Mathematical Physics, 207(3), 697-733.
Stein, D. L. (2004). "Spin glasses: still complex after all these years?" Decoherence and Entropy
in Complex Systems, Springer, 349-361.
Su, Y., Lucko, G., and Thompson, R. C. (2016). "Evaluating performance of critical chain
project management to mitigate delays based on different schedule network complexities." 2016
Winter Simulation Conference (WSC), IEEE, 3314-3324.
Taherpour, A., Nasiri-Kenari, M., and Gazor, S. (2010). "Multiple antenna spectrum sensing in
cognitive radios." IEEE Transactions on Wireless Communications, 9(2), 814-823.

Tao, T., and Vu, V. (2012). "Random covariance matrices: Universality of local statistics of
eigenvalues." The Annals of Probability, 40(3), 1285-1315.
Tao, T., and Vu, V. (2010). "Random matrices: Universality of local eigenvalue statistics up to
the edge." Communications in Mathematical Physics, 298(2), 549-572.
Thesen, A. (1977). "Measures of the restrictiveness of project networks." Networks, 7(3), 193-
208.
Thompson Jr, R. C., Lucko, G., and Su, Y. (2016). "Reconsidering an Appropriate Probability
Distribution Function for Construction Simulations." Construction Research Congress
2016, 2522-2531.
Tijms, H. C. (2007). Understanding probability : chance rules in everyday life. Cambridge
University Press, Cambridge.
Tracy, C. A., and Widom, H. (2008). "A Fredholm Determinant Representation in
ASEP." Journal of Statistical Physics, 132(2), 291-300.
Tracy, C. A., and Widom, H. (2009). "The distributions of random matrix theory and their
applications." New Trends in Mathematical Physics, 753-765.
Tracy, C. A., and Widom, H. (2002). "Distribution functions for largest eigenvalues and their
applications." arXiv Preprint Math-Ph/0210034.
Tracy, C. A., and Widom, H. (2001). "On the distributions of the lengths of the longest
monotone subsequences in random words." Probability Theory and Related Fields, 119(3), 350-
380.
Tracy, C. A., and Widom, H. (1996). "On orthogonal and symplectic matrix
ensembles." Communications in Mathematical Physics, 177(3), 727-754.
Tracy, C. A., and Widom, H. (1994). "Level-spacing distributions and the Airy
kernel." Communications in Mathematical Physics, 159(1), 151-174.
Tracy, C. A., and Widom, H. (1993). "Introduction to random matrices." Geometric and
quantum aspects of integrable systems, Springer, 103-130.
Uma Maheswari, J., Varghese, K., and Sridharan, T. (2006). "Application of dependency
structure matrix for activity sequencing in concurrent engineering
projects." J.Constr.Eng.Manage., 132(5), 482-490.
Van Slyke, R. M. (1963). "Monte Carlo methods and the PERT problem." Oper.Res., 11(5), 839-
860.
Wang, J. (2004). "A fuzzy robust scheduling approach for product development
projects." Eur.J.Oper.Res., 152(1), 180-194.
Wishart, J. (1928). "The generalised product moment distribution in samples from a normal
multivariate population." Biometrika, 32-52.
Wolchover, N. (2014). "At the Far Ends of a New Universal Law." Sci. Am.
Zeng, X., and Hou, Z. (2012). "The universality of Tracy-Widom F2 distribution." Advances in Mathematics, 41(5).

CHAPTER 3
Application of PCA for Data Reduction in Modeling Project
Network Schedules Based on the Universality Concept in RMT

Abstract

Significant findings have been achieved by employing the proposed methodology for analyzing

the principal components of a sample covariance matrix or correlation matrix generated from

project network schedules. Project network schedules are composed of precedence relationships

between activities that can become complex as the number of pairwise links (thousands in large

projects) between activities grows, making them challenging to design and maintain. This study

demonstrated that applying well-known concepts from random matrix theory can help develop

better project schedules. The proposed methodology is based on the largest eigenvalue of sample

covariance and population correlation matrices derived from sampling durations of project

network activities from a triangular distribution with known parameters.

The methodology’s assumptions limit the sample size to an optimum sample size determined at a

significance level α and require an appropriate normalization procedure for standardizing the

matrices' eigenvalues. Under these conditions, it has been established that the Tracy-Widom

distribution of order 1 (TW1) is the joint sampling distribution of durations of a project network’s

activities at the significance level α. Moreover, the proposed methodology is based on three

identified rules that assisted in selecting the principal components (PCs) to retain.


The simulations performed on a small number of networks of varied sizes yielded the following results. First, applying the scree plot rule, the study found that there seems to be a direct correlation

between the optimum sample size and the number of PCs to retain for a given network.

Additionally, the analysis indicated that Johnstone's (2001) spiked covariance model is a viable

candidate for predicting the limiting durations of project network activities using a PCA-based

linear regression. The spiked model, an empirical derivation of Johnstone (2001), is a covariance

matrix with a specific structure used to characterize the behavior of a system having one or more

prominent eigenvalues that are easily differentiated from the rest of the data. Second, applying the

second rule based on hypothesis testing with TW p-values, the study found some limitations that

prevented the specified null hypothesis from being adequately evaluated. These limitations are essentially due to the limited availability of approximations of TW p-values.

Finally, the calculated threshold value indicates a phase transition in project network schedules.

This phase transition separates the TW distribution's left and right tails. Over this critical zone, the system transitions from the weak (stable) to the strong (unstable) coupling phase. This discovery is critical because it is likely to assist practitioners in identifying the location

at which a construction project schedule may become unstable. The empirical investigation

conducted only on a few project networks has yielded exciting results. Still, more research is

needed to consider project networks of various sizes and complexities to develop guidelines for

constructing PCA-based models and locating phase transitions in project network schedules.


3.1 Introduction and Research Questions

Various scientific and engineering disciplines (e.g., genetics, meteorology, agriculture, and

econometrics) rely on exploratory data analysis and visualization. The requirement to evaluate vast

amounts of multivariate data raises the fundamental dimensionality reduction problem as Roweis

and Saul (2000) inquired about: how to develop compact representations of high-dimensional data.

Due to digitization, a large volume of records is being generated across numerous sectors, making the reduction of high-dimensional data a critical problem in modern life. While the literature on the issue

recommends various strategies for data reduction, this study focuses on Principal Component

Analysis (PCA). As Naik (2017) wrote, PCA is a frequently used matrix factorization approach for

reducing the dimension of sets of random variables or measurements and identifying hidden

features beneath them. A review of dimensionality reduction strategies, such as the one undertaken by Sorzano et al. (2014), classified the diversity of available techniques and provided the mathematical foundations for each, making it an excellent source of well-known techniques. PCA

is concerned with reducing the dimensionality of a data set while keeping as much variance as

possible (Jolliffe 2002). This data set comprises many interrelated variables. Scholars generally

accomplish such a reduction by changing to a new collection of variables known as the principal

components (PCs), which are uncorrelated and ordered. The first few maintain most of the variance

inherent in the original variables.
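As a generic illustration of this standard construction (a minimal sketch under usual textbook assumptions, not the chapter's specific procedure), the transformation to uncorrelated, ordered components can be written as follows; the function name and simulated data are illustrative.

```python
import numpy as np

def principal_components(X, k):
    """Replace the n x p data matrix X with n scores on the first k
    principal components (eigenvectors of the sample covariance
    matrix, ordered by decreasing eigenvalue)."""
    Xc = X - X.mean(axis=0)                   # center each variable
    S = np.cov(Xc, rowvar=False)              # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :k]              # n x k reduced data set
    explained = eigvals[:k].sum() / eigvals.sum()
    return scores, explained

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 10))  # rank-3 data
scores, frac = principal_components(X, 3)
print(scores.shape)   # (100, 3): three PCs replace ten variables
```

Because the simulated data have rank three, the first three components capture essentially all of the variance, which is the situation in which the reduction is most useful.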

Having investigated the presence of a covariance structure in the sampled durations of

project network activities in the previous chapter using the universality of the Tracy-Widom laws,

the goal of this chapter is twofold. The initial goal will be to run a sphericity test to see if a specific


sample variance-covariance matrix (its realization) matches a population with a particular matrix

defining the population covariance structure. Following the determination of the population

covariance matrix (Σ), the second objective of this chapter will be to employ Σ to perform data

reduction and interpretation via PCA. A comprehensive literature study on the sphericity test and

PCA is necessary to fulfill the chapter's primary goal. The methodology that will help achieve these goals will follow.

number of identified project networks, followed by an analysis of the outcomes. This chapter will

end with the contributions to the body of knowledge and recommendations for further research.
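For the first goal, one classical sphericity check is Bartlett's test that the population correlation matrix is the identity. The sketch below is a generic textbook version under normality assumptions, not the dissertation's exact procedure, and the data are simulated for illustration.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test of sphericity. H0: the population correlation
    matrix is the identity (no covariance structure). Returns the
    chi-square statistic and its approximate p-value."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)          # p x p sample correlation
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))                    # uncorrelated data
stat, pval = bartlett_sphericity(X)                  # p-value typically large
Y = X.copy()
Y[:, 1] = Y[:, 0] + 0.01 * rng.standard_normal(500)  # inject strong correlation
stat2, pval2 = bartlett_sphericity(Y)                # p-value near zero
```

Rejecting the null here indicates a covariance structure worth summarizing with PCA, which is the premise of the second goal.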

3.2 Literature Review

Principal component analysis (PCA) is a popular matrix factorization technique for reducing

dimensionality and uncovering hidden elements in sets of random variables, signals, or

measurements. Its power comes from the basic assumptions that distinct physical processes

generate independent variables. The reduction (although there are numerous other applications of PCA; Jolliffe 2002, p. 63) can be accomplished by determining the orthogonal, variance-maximizing directions in the space containing the data (Bejan 2005). PCA is a multivariate

statistics technique based on correlations or covariances that dates back to Pearson (1900) and

Hotelling (1933). A large body of literature exists, from classic treatments such as Jolliffe (2002) to fascinating modern variations covering the most recent cutting-edge subjects in PCA (e.g., Naik 2017). In recent decades, the scale of data collection has increased. It

is no longer uncommon for the number of variables or features collected, p, to be on the order of,

or greater than, the number of instances (or sample size), n. Under specific assumptions on the


covariance structure of the data in this "high dimensional" scenario, the statistical features of PCA

display phenomena that are possibly unexpected when seen from the historically common

perspective of numerous samples and a fixed number of variables (Johnstone and Paul 2018). The

PCA is carried out for two main reasons: selecting the principal components that characterize a mathematical model, and checking for a covariance structure in the data. Both reasons are discussed in detail in the following sections.

3.2.1 PCA Methods

Although explaining the variance-covariance structure of a set of variables generally requires all of them, much of the overall system variability can frequently be accounted for by a few linear combinations of these variables, that is, by a small number k of the PCs. In this case, the k components contain (almost) as

much information as the original p variables. The p original variables can therefore be replaced

with k principal components. The original data set including n measurements on p variables can

likewise be replaced with the reduced data set containing n measurements on k principal

components. PC analysis frequently exposes previously unknown relationships, allowing for

interpretations that would not have occurred otherwise (Johnson and Wichern 2019). Although it

is preferable to deal with a smaller number of linear combinations of the p variables containing

most of the information, reducing a data set to fewer components is tricky. According to scholars

such as Forkman et al. (2019), the first few PCs usually indicate fascinating systematic patterns.

In contrast, the last may reflect random noise rather than a recurring pattern. As a result, the last

ones are often eliminated. For investigators, the critical question is how many PCs are statistically

significant. Practitioners retain only a few, depending on the fraction of variation explained by the


first few PCs. Because each PC is almost certainly a function of all p variables, the values of all p

variables are still necessary to calculate the PCs. The literature on the issue provides numerous

rules for determining an appropriate value of k, the majority of which are ad hoc. In practice, these

decision-making criteria are mainly based on the behavior of the sample covariance or correlation

matrix's largest eigenvalues rather than dealing with natural objects (Bejan 2005). The first three

rules for determining k are ad hoc rules of thumb. However, despite various attempts to formalize

them, as Jolliffe (2002) stated, they are intuitively logical and work in practice. These criteria are

(1) cumulative percentage of total variation—see Cangelosi and Goriely (2007) for application to

biology; (2) size of variances of principal components, also known as the Kaiser's rule (Kaiser

1960)—see Shin et al. 2012, and Kevric and Subasi 2014 for applications; (3) the scree graph and

its alternative the log-eigenvalue (or LEV) diagram introduced by Cattell (1966) and Craddock

and Flood (1969), respectively—e.g., see García-Alvarez (2009) for application in engineering

and Franklin et al. (1995) for references with regards to parallel analysis developed by Horn (1965)

as a modification of Cattell's scree diagram. The second set of criteria consists of formal

hypothesis tests.

However, according to Jolliffe (2002), they use distributional assumptions that are frequently

unrealistic and often maintain more variables than are necessary. The Bartlett (1950) test is an

example of this type. It contrasts the null hypothesis that the last p − k ranked eigenvalues (from k + 1 to p) are all equal with the alternative hypothesis that at least two of them are unequal. Because of the issues with these rules, the literature offers a list of further rules.

For example, for covariance matrices, a maximum likelihood ratio test (MLRT) using the null

distribution of the test statistic can be employed directly (e.g., Johnstone 2001; Choi et al. 2017,


Saccenti and Timmerman 2017). A further group consists of statistical rules, most of which do not require

distributional assumptions. The concept underlying these methods is quite similar to the

cumulative percentage of the total variation, except that each entry $x_{ij}$ of X is now predicted from an equation similar to the SVD but based on a submatrix of X that does not include $x_{ij}$. The PCs

are based on PRESS, which stands for PREdiction Sum of Squares and is derived from Allen's

(1974) comparable notion in regression. Examples of applications include two families of

bootstrapping approaches, one of which is based on bootstrapping residuals from a

multidimensional model (e.g., Forkman et al. 2019 for applications in biology and environmental

sciences) and the other on bootstrapping the data itself. Another application is related to the

jackknife estimation process. While these criteria for determining the number of variables to keep

have been supplied as background information, the mathematics underlying most of them and

illustrative examples may be found in Jolliffe's (2002) famous work. However, several of these

rules make use of computationally intensive approaches such as cross-validation.
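As a minimal numerical sketch of the first two ad hoc criteria above (cumulative percentage of total variation and Kaiser's rule), the following assumes NumPy is available; the synthetic data, the 80% threshold, and the function names are illustrative choices rather than part of the cited rules:

```python
import numpy as np

def retain_k(eigvals, threshold=0.80):
    """Smallest k whose k leading eigenvalues explain >= threshold of total variance."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cum = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cum, threshold) + 1)

def kaiser_k(eigvals):
    """Kaiser's rule: retain correlation-matrix components with eigenvalue above 1."""
    return int(np.sum(np.asarray(eigvals) > 1.0))

# Illustrative synthetic data with one strong linear dependency among columns
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
X[:, 3] = X[:, 0] + 0.1 * rng.standard_normal(200)
R = np.corrcoef(X, rowvar=False)
lam = np.linalg.eigvalsh(R)[::-1]           # descending eigenvalues of R

k80, k_kaiser = retain_k(lam), kaiser_k(lam)
```

Both rules operate only on the eigenvalue spectrum, which is why they are simple to apply yet, as noted above, remain ad hoc.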

The second reason may be illustrated by referring to a general example supplied by Bejan (2005).

Assume that one is interested in determining whether a particular data matrix has a particular

covariance structure. Then, according to standard statistical methodology, one could consider the

following. First, for a specific type of population distribution, find the distribution of some

statistics that is a function of the sample covariance matrix; then, construct a test based on this

distribution and use it whenever the data satisfy the test's conditions. Some tests employ this

methodology, for example, a test of sphericity in examinations of the data's covariance structure;

for references, see Kendall and Stuart (1968), Korin (1968), and Mauchly (1940). Naturally, this


is a pretty broad overview. However, given recent results in our field of interest, which will be

explored in greater detail in later sections, one may hope that such a statistic for assessing a

covariance structure is based on the largest sample eigenvalue. Indeed, such tests have evolved

and are now referred to in the literature as the largest root tests of $\Sigma = I$; see Roy (1953).

3.2.2 PCA's Applications in Construction and Civil Engineering (CE) Fields

Principal components analysis (PCA) is extensively utilized in various domains, including data

analysis, model compression, and multivariate process monitoring, with the primary goal of data

reduction. Its end-users include academic researchers and professionals from a variety of fields. In

addition, there are several applications in domains other than construction and engineering, some

of which can be found in various journal articles and theses. For example, Shubham (2021) recently

employed PCA in Computer and Mathematics Education to establish nine parameters for a

government control approach to preventing COVID-19 proliferation in India. The Indian second

wave SEIR model based on the nine parameters helped with the analysis. The PCA results ranked

wearing a mask first (90%), sneezing into a tissue second (65%), and using a sanitizer dispenser third.

Nonetheless, because the current research project is being undertaken in CE, a literature review on

the applications of PCA in this field is required. Doing so will aid in presenting the current

knowledge, including substantive findings and methodological contributions to applying PCA in

CE. According to the conducted literature survey, PCA is not new to the construction and

engineering communities. Various practitioners have used PCA to tackle engineering challenges

and problems faced during project construction in many parts of the world. For example, el-Kholy


(2021) evaluated the best models for Predicting Delay and Cost Overrun Percentages (PDCOP)

for highway projects using four accurate Artificial Neural Networks (ANN) paradigms in the field

of transportation engineering. The research methodology included applying various models to each

paradigm based on the Input Projection Algorithm, including rule and function. In addition, the

methodology included a sensitivity analysis to ensure the consistency of the results for the superior

models. According to the PCA paradigm, his best-proposed model outperformed previously

published models by having a Mean Absolute Percentage Error (MAPE) of 25.4 percent for

forecasting percent cost overrun, compared to 30.42% and 40.37% for models in the literature. Other

applications include Ghosh and Jintanapakanont's (2004) identification and assessment of significant risk variables in an underground rail project in Thailand.

Chai et al. (2015) used a Structural Equation Modeling (SEM) approach to examine the current

delay reduction measures in construction to help alleviate housing supply delays caused by

Malaysia's rapid growth and urbanization. They conducted research in Malaysia's 13 states and

three Federal Territories. The PCA identified 17 mitigating solutions, the most essential of which aimed at preventing delays in housing supply. In a similar vein, Tahir et al.

(2017) analyzed 69 responses to discover the fundamental causes of delay and cost overruns in the

Malaysian construction industry using PCA and factor analysis. Their analysis indicated that the

primary causes of delays and cost overruns were delays in design document creation, poor time

management, material delivery delays, a lack of awareness of different execution methods, labor

and material shortages, and changes in the scope of work. Moreover, Karji et al. (2020) used PCA

to identify the primary challenges to promoting sustainable construction in the United States.

Finally, Lam et al. (2005) used PCA as a tool in contractor qualification.


In Computing in Civil Engineering, Dao et al. (2017) used PCA to construct input for a project

complexity model that researchers and practitioners may use to examine a project's complexity

levels based on well-established complexity indicators. They used findings from the study

conducted by the Construction Industry Institute (CII) in 2016, which identified 37 project

complexity measures as statistically significant out of 44 historical project complexity indicators.

However, because having too many predictive variables can be detrimental to a regression model

(the number of parameters being greater than the number of observations), PCA was an appropriate

technique for reducing the number of original variables for the model. Consequently, out of the 37

explanatory variables, the PCA technique based on the Pearson Chi-square test with a significance

level of 0.1 resulted in 27 or fewer variables.

In hydraulic engineering, Nam et al. (2019) proposed an effective method for burst monitoring,

isolation, and sensor placement in water distribution networks (WDN) using PCA and other

methodologies. They employed a three-parameter second-order regression model represented by

Bernoulli's Equation to define pressure and flow rate patterns, which necessitated a pretreatment

procedure to normalize the data of varying size and variation and reduce the dimensionality of the

input data. As a result, a PCA was used to modify the input data to the k-means algorithm, which

was inefficient without the modification. The supervised k-means clustering methodology

uncovered natural features of the data based on similarities among them. Their proposed

monitoring method improved the isolation ratio by 10% compared to conventional systems, and

the sensor combination was 40% less expensive. Because their system was not designed to handle

complex real-world WDNs (for example, industrial use), they anticipated that deep learning

techniques could improve burst monitoring and isolation.


In hydrology, Arabzadeh et al. (2016) proposed a novel drought index (SDI) based on PCA as a

valuable tool for monitoring hydrological drought that is streamflow dependent. First, using

Kolmogorov-Smirnov and chi-square tests, they demonstrated that the streamflow time series does not

follow a normal distribution, motivating the use of other well-known distributions for fitting purposes.

Next, they used Bartlett's sphericity test (BST) to validate the PCA requirements (the presence of

strong correlations between variables known as hyper cloud's correlation). The test established the

sufficiency of the hyper cloud's correlation at each time scale at the 1% significance level using a

test statistic generated from the eigenvalues of correlation matrices. They then implemented PCA

based on scree plots to display the eigenvalues and cumulative variability against PCs' number and

other graphical methods. Their results revealed significant correlations between the SDI series of

the stations for specific time scales. Furthermore, the first principal component (PC1) explained 58–85% of the regional variability in the SDI series at the time scales specified. Additional

applications may be added, but the ones listed should illustrate how PCA is employed in

construction and civil engineering.

3.3 The Fundamentals of Principal Component Analysis

Principal components are linear combinations of the p random variables $X_1, \ldots, X_p$. Geometrically, these linear combinations represent the selection of a new coordinate system obtained by rotating the original system with coordinate axes $X_1, \ldots, X_p$. The new axes denote the directions of greatest variability and provide a more concise and straightforward representation of the covariance structure. As will be discussed, the covariance matrix or correlation matrix of the random variables $X_1, \ldots, X_p$ determines the PCs. According to a few researchers (Jolliffe 2002,


Johnson and Wichern 2019), their development does not necessitate a multivariate normal

assumption. However, it is worth mentioning that principal components derived for multivariate

normal populations can be usefully interpreted in constant density ellipsoids, allowing for

inferences about the population from sample components.

3.3.1 Principal Components of the Population

To enable the introduction of PCA, let $\mathbf{X}' = [X_1, \ldots, X_p]$ have the covariance matrix $\boldsymbol{\Sigma}$ with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$. One may consider the linear combinations specified in Equation 3.1 below.

$$\begin{cases} Y_1 = \mathbf{a}_1'\mathbf{X} = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p \\ Y_2 = \mathbf{a}_2'\mathbf{X} = a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p \\ \quad\vdots \\ Y_p = \mathbf{a}_p'\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p \end{cases} \quad \text{[Eq. 3-1]}$$

Thus, using well-established variance (var) and covariance (cov) properties of linear combinations of random variables, one can construct Equation 3.2(a) and Equation 3.2(b):

$$\begin{aligned} \mathrm{Var}(Y_i) &= \mathbf{a}_i'\boldsymbol{\Sigma}\mathbf{a}_i, & i &= 1, 2, \ldots, p & \text{(a)} \\ \mathrm{Cov}(Y_i, Y_k) &= \mathbf{a}_i'\boldsymbol{\Sigma}\mathbf{a}_k, & i, k &= 1, 2, \ldots, p,\ i \neq k & \text{(b)} \end{aligned} \quad \text{[Eq. 3-2]}$$

The principal components are those uncorrelated linear combinations $Y_1, Y_2, \ldots, Y_p$ whose variances in Equation 3.2(a) are as large as possible. The first PC $Y_1$ represents the linear combination with the greatest variance. It follows that the variance $\mathrm{Var}(Y_i) = \mathbf{a}_i'\boldsymbol{\Sigma}\mathbf{a}_i$ can be increased without bound by multiplying any vector $\mathbf{a}_i$ by


some constant. To avoid this indeterminacy, practitioners focus their attention on coefficient

vectors of unit length, as shown below in Figure 3.1.

ith principal component $Y_i$ = linear combination $\mathbf{a}_i'\mathbf{X}$ maximizing $\mathrm{Var}(\mathbf{a}_i'\mathbf{X})$ subject to $\mathbf{a}_i'\mathbf{a}_i = 1$ and $\mathrm{Cov}(\mathbf{a}_i'\mathbf{X}, \mathbf{a}_k'\mathbf{X}) = 0$ for $k < i$

Figure 3.1: Illustration of a Vector Maximizing the ith PC

With these settings, the following significant results are stated to aid in understanding the

fundamentals of PCA. Demonstrations omitted here can be found in multivariate statistical

analysis textbooks (e.g., Jolliffe 2002, Johnson and Wichern 2019).

3.3.1.1 Important General Results in PCA

Result 1: Assume that $\boldsymbol{\Sigma}$ is the covariance matrix associated with the random vector $\mathbf{X}' = [X_1, \ldots, X_p]$. Additionally, consider $(\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), \ldots, (\lambda_p, \mathbf{e}_p)$ to be the eigenvalue-eigenvector pairs of $\boldsymbol{\Sigma}$ such that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$. Hence, Equation 3.3 below provides the ith PC.

$$Y_i = \mathbf{e}_i'\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \quad i = 1, 2, \ldots, p \quad \text{[Eq. 3-3]}$$

As a result, Equation 3.4(a) and Equation 3.4(b) can be derived as follows:

$$\begin{aligned} \mathrm{Var}(Y_i) &= \mathbf{e}_i'\boldsymbol{\Sigma}\mathbf{e}_i = \lambda_i, & i &= 1, 2, \ldots, p & \text{(a)} \\ \mathrm{Cov}(Y_i, Y_k) &= \mathbf{e}_i'\boldsymbol{\Sigma}\mathbf{e}_k = 0, & i, k &= 1, 2, \ldots, p,\ i \neq k & \text{(b)} \end{aligned} \quad \text{[Eq. 3-4]}$$


If some of the $\lambda_i$ are equal, the choices of the corresponding coefficient vectors $\mathbf{e}_i$, and hence of the $Y_i$, are not unique. From this result, one may conclude that the principal components are uncorrelated and have variances equal to the eigenvalues of $\boldsymbol{\Sigma}$.
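Result 1 can be checked numerically. The sketch below assumes NumPy is available and uses an arbitrary illustrative positive-definite covariance matrix; it verifies that the PC variances equal the eigenvalues of $\boldsymbol{\Sigma}$ and that the PCs are uncorrelated:

```python
import numpy as np

# Illustrative positive-definite covariance matrix (values are arbitrary)
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

lam, E = np.linalg.eigh(Sigma)              # eigh returns ascending order
lam, E = lam[::-1], E[:, ::-1]              # reorder: lambda_1 >= ... >= lambda_p

# Result 1: Var(Y_i) = e_i' Sigma e_i = lambda_i and Cov(Y_i, Y_k) = 0 for i != k
V = E.T @ Sigma @ E                         # covariance matrix of the PCs Y = E'X
ok_var = np.allclose(np.diag(V), lam)       # variances match eigenvalues
ok_cov = np.allclose(V, np.diag(lam))       # off-diagonal covariances vanish
```

Because the columns of E are orthonormal eigenvectors, $E'\Sigma E$ is (numerically) diagonal, which is exactly the statement of Result 1.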

Result 2: Let $\boldsymbol{\Sigma}$ be the covariance matrix associated with the random vector $\mathbf{X}' = [X_1, \ldots, X_p]$. In addition, let $(\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), \ldots, (\lambda_p, \mathbf{e}_p)$ be the eigenvalue-eigenvector pairs of $\boldsymbol{\Sigma}$ such that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$. Moreover, let $Y_1 = \mathbf{e}_1'\mathbf{X}$, $Y_2 = \mathbf{e}_2'\mathbf{X}$, ..., $Y_p = \mathbf{e}_p'\mathbf{X}$ represent the PCs. Thus, Equation 3.5 establishes the link between the population covariance matrix and the PCs.

$$\sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \sum_{i=1}^{p} \mathrm{Var}(X_i) = \lambda_1 + \lambda_2 + \cdots + \lambda_p = \sum_{i=1}^{p} \mathrm{Var}(Y_i) \quad \text{[Eq. 3-5]}$$

Using Equation 3.5, Result 2 shows that the total population variance equals the sum of the variances of the variables $X_i$, which coincides with the sum of the eigenvalues of the population covariance matrix $\boldsymbol{\Sigma}$. As a result, Equation 3.6 gives the fraction or proportion of the total variance owing to, or explained by, the kth principal component.

$$\left(\begin{array}{c}\text{Proportion of the total population} \\ \text{variance due to the } k\text{th PC}\end{array}\right) = \frac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}, \quad k = 1, 2, \ldots, p \quad \text{[Eq. 3-6]}$$

The general rule is that if the first one, two, or three components account for the majority (80 to 90%) of the total population variance, these components can "replace" the original p variables with negligible information loss.


Additionally, the coefficient vectors $\mathbf{e}_i' = [e_{i1}, e_{i2}, \ldots, e_{ip}]$, with $i = 1, 2, \ldots, p$, need examination. The magnitude of $e_{ik}$, with $k = 1, 2, \ldots, p$, indicates the relative contribution of the kth variable to the ith principal component, regardless of the other variables. The coefficient $e_{ik}$ is specifically proportional to the correlation coefficient between $Y_i$ and $X_k$. The following result establishes the link between the two.

Result 3: If $Y_1 = \mathbf{e}_1'\mathbf{X}$, $Y_2 = \mathbf{e}_2'\mathbf{X}$, ..., $Y_p = \mathbf{e}_p'\mathbf{X}$ denote the principal components derived from the covariance matrix $\boldsymbol{\Sigma}$, and $(\lambda_1, \mathbf{e}_1), \ldots, (\lambda_p, \mathbf{e}_p)$ denote the eigenvalue-eigenvector pairs of $\boldsymbol{\Sigma}$, then Equation 3.7 below

$$\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i, k = 1, 2, \ldots, p \quad \text{[Eq. 3-7]}$$

specifies the correlation coefficients between the components $Y_i$ and the variables $X_k$.

While correlations between variables and PCs aid in the interpretation of components, they only

quantify the univariate contribution of an individual variable $X_k$ to a component $Y_i$. For that reason, some statisticians “recommend that only the coefficients $e_{ik}$, not the correlations, be used to interpret the components” (Johnson and Wichern 2019).
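Equation 3.7 can likewise be verified numerically. In the sketch below (NumPy assumed; the covariance matrix is an illustrative choice), the correlations are computed both directly, via $\mathrm{Cov}(\mathbf{Y}, \mathbf{X}) = E'\boldsymbol{\Sigma}$, and from the closed form of Result 3:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])          # illustrative covariance matrix
lam, E = np.linalg.eigh(Sigma)
lam, E = lam[::-1], E[:, ::-1]               # descending eigenvalues, matching columns

# Direct route: Cov(Y, X) = E' Sigma, so
# rho_{Y_i, X_k} = (E' Sigma)_{ik} / (sqrt(lam_i) * sqrt(sigma_kk))
rho_direct = (E.T @ Sigma) / np.sqrt(np.outer(lam, np.diag(Sigma)))

# Eq. 3-7: rho_{Y_i, X_k} = e_{ik} * sqrt(lam_i) / sqrt(sigma_kk)
rho_eq37 = E.T * np.sqrt(lam)[:, None] / np.sqrt(np.diag(Sigma))[None, :]
```

The two routes agree because $\mathbf{e}_i'\boldsymbol{\Sigma} = \lambda_i \mathbf{e}_i'$, which is the algebraic step behind Result 3.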

3.3.2 Principal Components from Standardized Population

While the preceding section discussed the principal components derived from the population variables,

this section discusses the PCs produced from standardized population variables such as those in

Equation 3.8.


$$Z_1 = \frac{X_1 - \mu_1}{\sqrt{\sigma_{11}}}, \quad Z_2 = \frac{X_2 - \mu_2}{\sqrt{\sigma_{22}}}, \quad \ldots, \quad Z_p = \frac{X_p - \mu_p}{\sqrt{\sigma_{pp}}} \quad \text{[Eq. 3-8]}$$

In matrix notation, Equation 3.8 may be written as in Equation 3.9(a), with the diagonal standard deviation matrix $\mathbf{V}^{1/2}$ defined as in Equation 3.9(b).

$$\mathbf{Z} = \left(\mathbf{V}^{1/2}\right)^{-1}(\mathbf{X} - \boldsymbol{\mu}) \quad \text{(a)} \qquad \mathbf{V}^{1/2} = \begin{bmatrix} \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}} \end{bmatrix} \quad \text{(b)} \quad \text{[Eq. 3-9]}$$

Clearly, the expectation $E(\mathbf{Z}) = \mathbf{0}$, and Equation 3.10 gives the covariance of $\mathbf{Z}$.

$$\mathrm{Cov}(\mathbf{Z}) = \left(\mathbf{V}^{1/2}\right)^{-1}\boldsymbol{\Sigma}\left(\mathbf{V}^{1/2}\right)^{-1} = \boldsymbol{\rho} \quad \text{[Eq. 3-10]}$$

From Equation 3.10, one can obtain the principal components of $\mathbf{Z}$ from the eigenvectors of the correlation matrix $\boldsymbol{\rho}$ of $\mathbf{X}$. All prior results apply from here, as the variance of each $Z_i$ equals one. To simplify, the notation $Y_i$ will refer to the ith PC of either $\boldsymbol{\rho}$ or $\boldsymbol{\Sigma}$. Nonetheless, the pairs $(\lambda_i, \mathbf{e}_i)$ derived from $\boldsymbol{\Sigma}$ are not, in general, identical to those derived from $\boldsymbol{\rho}$. The literature


contains numerous examples demonstrating how standardization significantly impacts PCs (e.g.,

see Johnson and Wichern 2019).
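A small sketch illustrates the point (NumPy assumed; matrix values are illustrative): for two variables on very different scales, the leading eigenvector of $\boldsymbol{\Sigma}$ is dominated by the high-variance variable, while the leading eigenvector of $\boldsymbol{\rho}$ weights both variables equally:

```python
import numpy as np

# Two variables on very different scales (illustrative values)
Sigma = np.array([[1.0,   4.0],
                  [4.0, 100.0]])
d = np.sqrt(np.diag(Sigma))
rho = Sigma / np.outer(d, d)                # corresponding correlation matrix

_, E_sig = np.linalg.eigh(Sigma)            # eigh ascends; last column = leading PC
_, E_rho = np.linalg.eigh(rho)

v_sig = E_sig[:, -1]                        # leading eigenvector of Sigma
v_rho = E_rho[:, -1]                        # leading eigenvector of rho
```

Here the leading eigenvector of $\boldsymbol{\Sigma}$ points almost entirely along the second (high-variance) variable, whereas the leading eigenvector of $\boldsymbol{\rho}$ is proportional to $[1, 1]$, demonstrating how strongly standardization can change the PCs.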

Result 4: Equation 3.11 defines the ith principal component of the standardized population $\mathbf{Z}' = [Z_1, Z_2, \ldots, Z_p]$ with $\mathrm{Cov}(\mathbf{Z}) = \boldsymbol{\rho}$.

$$Y_i = \mathbf{e}_i'\mathbf{Z} = \mathbf{e}_i'\left(\mathbf{V}^{1/2}\right)^{-1}(\mathbf{X} - \boldsymbol{\mu}), \quad i = 1, 2, \ldots, p \quad \text{[Eq. 3-11]}$$

Additionally, Equation 3.12 provides a relationship between the variances of the standardized variables and those of the PCs.

$$\sum_{i=1}^{p} \mathrm{Var}(Y_i) = \sum_{i=1}^{p} \mathrm{Var}(Z_i) = p \quad \text{[Eq. 3-12]}$$

Furthermore, Equation 3.13 provides the correlation coefficients between the PCs and the standardized variables $Z_k$.

$$\rho_{Y_i, Z_k} = e_{ik}\sqrt{\lambda_i}, \quad i, k = 1, 2, \ldots, p \quad \text{[Eq. 3-13]}$$

In these results, $(\lambda_1, \mathbf{e}_1), \ldots, (\lambda_p, \mathbf{e}_p)$ are the eigenvalue-eigenvector pairs of $\boldsymbol{\rho}$. As in Equation 3.6, a similar expression gives the fraction of total variance explained by the kth PC of $\mathbf{Z}$. This expression is given by Equation 3.14, with the $\lambda_k$ representing the eigenvalues of $\boldsymbol{\rho}$.


$$\left(\begin{array}{c}\text{Proportion of standardized population} \\ \text{variance due to the } k\text{th PC}\end{array}\right) = \frac{\lambda_k}{p}, \quad k = 1, 2, \ldots, p \quad \text{[Eq. 3-14]}$$

3.3.3 Principal Components for Covariance Matrices with Special Structures

There are specific patterned covariance and correlation matrices whose principal components can

be expressed in well-known and straightforward forms. PCA's structured matrices are sought after

because, for the most part, they allow inferences through hypothesis testing by relying on well-

established results, which is especially useful when dealing with big data matrices. Tridiagonal

matrices are an example of these matrices. To illustrate one of these structures due to its relevance

to our subject, let $\boldsymbol{\Sigma}$ be the diagonal matrix given by Equation 3.15(a).

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{pp} \end{bmatrix} \quad \text{(a)} \qquad \boldsymbol{\Sigma}\mathbf{e}_i = \sigma_{ii}\mathbf{e}_i \quad \text{(b)} \quad \text{[Eq. 3-15]}$$

By setting $\mathbf{e}_i' = [0, \ldots, 0, 1, 0, \ldots, 0]$, with 1 in the ith position, one can derive Equation 3.15(b) and conclude that $(\sigma_{ii}, \mathbf{e}_i)$ is the ith eigenvalue-eigenvector pair, with corresponding principal component $Y_i = \mathbf{e}_i'\mathbf{X} = X_i$. Because the linear combination defined by $\mathbf{e}_i'\mathbf{X}$ is simply $X_i$, the collection of principal components is essentially the


original set of uncorrelated random variables. As a result, extracting the PCs for a covariance

matrix with the pattern provided by Equation 3.15(a) yields no benefit. From another perspective,

if $\mathbf{X}$ is distributed as $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, the contours of constant density are ellipsoids with axes already

pointing in the direction of maximum variation. As a result, the coordinate system does not need

to be rotated. Finally, it is worth noting that standardization does not affect the situation in Equation

3.15(a). In such an instance, $\boldsymbol{\rho} = \mathbf{I}$, the $p \times p$ identity matrix. Additionally, the multivariate constant-density ellipsoids become spheroids in this case.

Another patterned covariance matrix, which is frequently used to describe the correspondence

between certain biological variables, such as the sizes of living organisms, has the general form of

Equation 3.16(a) for its covariance matrix, with the resulting correlation matrix being the same as

in Equation 3.16(b).

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma^2 & \rho\sigma^2 & \cdots & \sigma^2 \end{bmatrix} \quad \text{(a)} \qquad \boldsymbol{\rho} = \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix} \quad \text{(b)} \quad \text{[Eq. 3-16]}$$

In this situation, 𝝆 additionally represents the standardized variables' covariance matrix. Moreover,

Equation 3.16(b) implies that the variables $X_1, \ldots, X_p$ are equally correlated. One can demonstrate that the p eigenvalues of the correlation matrix in Equation 3.16(b) fall into two groups, whose expressions are provided in Equation 3.17(a). For their corresponding


eigenvectors, Equation 3.17(b) gives the expression of $\mathbf{e}_1$ associated with the largest eigenvalue $\lambda_1$; the remaining ones can be found in the literature (e.g., see Johnson and Wichern 2019). Meanwhile, Equation 3.17(c) and Equation 3.17(d) provide the first principal component of $\boldsymbol{\rho}$ and the proportion of the total variance explained by this component. Other components may be obtained from the same source as mentioned previously.

$$\lambda_1 = 1 + (p-1)\rho, \qquad \lambda_2 = \lambda_3 = \cdots = \lambda_p = 1 - \rho \quad \text{(a)}$$

$$\mathbf{e}_1' = \left[\frac{1}{\sqrt{p}}, \frac{1}{\sqrt{p}}, \ldots, \frac{1}{\sqrt{p}}\right] \quad \text{(b)}$$

$$Y_1 = \mathbf{e}_1'\mathbf{Z} = \frac{1}{\sqrt{p}}\sum_{i=1}^{p} Z_i \quad \text{(c)} \quad \text{[Eq. 3-17]}$$

$$\frac{\lambda_1}{p} = \rho + \frac{1-\rho}{p} \quad \text{(d)}$$
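The expressions in Equation 3.17 can be confirmed numerically for an illustrative choice of p and $\rho$ (NumPy assumed):

```python
import numpy as np

p, r = 5, 0.6                                    # illustrative dimension and common correlation
rho = (1 - r) * np.eye(p) + r * np.ones((p, p))  # equicorrelation matrix of Eq. 3-16(b)

lam = np.linalg.eigvalsh(rho)[::-1]              # descending eigenvalues

ok_top = np.isclose(lam[0], 1 + (p - 1) * r)     # Eq. 3-17(a), largest eigenvalue
ok_rest = np.allclose(lam[1:], 1 - r)            # Eq. 3-17(a), remaining eigenvalues

e1 = np.full(p, p ** -0.5)                       # Eq. 3-17(b), equal-weight unit vector
ok_vec = np.allclose(rho @ e1, lam[0] * e1)      # e1 is indeed the leading eigenvector
share = lam[0] / p                               # Eq. 3-17(d): rho + (1 - rho) / p
```

Note how, as $\rho$ grows, the first component (an equally weighted average of the standardized variables) explains an ever larger share of the total variance.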

3.3.4 Principal Components in Random Variable Observations

After discussing the principles of principal components analysis based on the population

covariance or correlation matrix, this section will discuss how to summarize the variation in n

measurements on p variables using a few well-chosen linear combinations. To facilitate the

development, let the data $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ be independent drawings from a p-dimensional population

with a mean vector μ and covariance matrix 𝛴. In the previous chapter, it was explained that the

sample mean vector $\bar{\mathbf{x}}$, sample covariance matrix S, and sample correlation matrix R can be derived

from these data. The goal of this part is now to build uncorrelated linear combinations of the


measured parameters that account for a large proportion of the variation in the sample. Hence, the

sample PCs will be the uncorrelated combinations with the greatest variances.

Through straightforward transformations, one may demonstrate that the n values of any linear combination given by the equation below have sample mean $\mathbf{a}_1'\bar{\mathbf{x}}$ and sample variance $\mathbf{a}_1'\mathbf{S}\mathbf{a}_1$.

$$\mathbf{a}_1'\mathbf{x}_j = a_{11}x_{j1} + a_{12}x_{j2} + \cdots + a_{1p}x_{jp}, \quad j = 1, 2, \ldots, n$$

In addition, the pairs of values $(\mathbf{a}_1'\mathbf{x}_j, \mathbf{a}_2'\mathbf{x}_j)$, for two linear combinations, have sample covariance $\mathbf{a}_1'\mathbf{S}\mathbf{a}_2$. With these settings, the sample principal components are defined as those linear combinations with maximum sample variance. As with the population quantities, the coefficient vectors $\mathbf{a}_i$ are restricted to satisfy the condition $\mathbf{a}_i'\mathbf{a}_i = 1$. More specifically, they are provided in Figure 3.2.

1st sample principal component = linear combination $\mathbf{a}_1'\mathbf{x}_j$ maximizing the sample variance of $\mathbf{a}_1'\mathbf{x}_j$ subject to $\mathbf{a}_1'\mathbf{a}_1 = 1$

ith sample principal component = linear combination $\mathbf{a}_i'\mathbf{x}_j$ maximizing the sample variance of $\mathbf{a}_i'\mathbf{x}_j$ subject to $\mathbf{a}_i'\mathbf{a}_i = 1$ and zero sample covariance for all pairs $(\mathbf{a}_i'\mathbf{x}_j, \mathbf{a}_k'\mathbf{x}_j)$, $k < i$

Figure 3.2: Illustration of Coefficient Vectors Maximizing the 1st and ith Sample PCs

Hence, the first principal component maximizes $\mathbf{a}_1'\mathbf{S}\mathbf{a}_1$, which translates to Equation 3.18.


$$l_1 = \max_{\mathbf{a}_1 \neq \mathbf{0}} \frac{\mathbf{a}_1'\mathbf{S}\mathbf{a}_1}{\mathbf{a}_1'\mathbf{a}_1} \quad \text{[Eq. 3-18]}$$

The maximum $l_1$ is attained by choosing $\mathbf{a}_1 = \mathbf{u}_1$, the eigenvector of S associated with its largest eigenvalue, by utilizing a result derived for the maximization of quadratic forms over points on the unit sphere (e.g., see Anderson 2003), applicable to a positive definite matrix with eigenvalues $l_1 \geq \cdots \geq l_p \geq 0$. Successive choices $\mathbf{a}_i = \mathbf{u}_i$ maximize the quadratic form subject to $\mathbf{a}_i$ being perpendicular to $\mathbf{u}_1, \ldots, \mathbf{u}_{i-1}$. As a result, analogous to the population results (1–3), the following result about sample principal components is obtained.

Result 5: If $\mathbf{S} = \{s_{ik}\}$ is a $p \times p$ sample covariance matrix with eigenvalue-eigenvector pairs $(l_1, \mathbf{u}_1), \ldots, (l_p, \mathbf{u}_p)$, Equation 3.19 gives the ith sample principal component.

$$\hat{y}_i = \mathbf{u}_i'\mathbf{x} = u_{i1}x_1 + u_{i2}x_2 + \cdots + u_{ip}x_p, \quad i = 1, 2, \ldots, p \quad \text{(a)}$$

$$\text{Sample variance}(\hat{y}_k) = l_k, \quad k = 1, 2, \ldots, p \quad \text{(b)}$$

$$\text{Sample covariance}(\hat{y}_i, \hat{y}_k) = 0, \quad i \neq k \quad \text{(c)} \quad \text{[Eq. 3-19]}$$

$$\text{Total sample variance} = \sum_{i=1}^{p} s_{ii} = l_1 + l_2 + \cdots + l_p \quad \text{(d)}$$

$$\text{Correlation coefficients: } r_{\hat{y}_i, x_k} = \frac{u_{ik}\sqrt{l_i}}{\sqrt{s_{kk}}}, \quad i, k = 1, 2, \ldots, p \quad \text{(e)}$$

where $l_1 \geq l_2 \geq \cdots \geq l_p \geq 0$, and $\mathbf{x}$ is an observation of the random variables $X_1, \ldots, X_p$. As with the population principal components, the sample principal components, denoted by $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_p$, are obtained from S, $\mathbf{S}_n$, or R when there is no ambiguity. Please note that the components derived


from each are not identical. Furthermore, the observations $\mathbf{x}_j$ are frequently centered by subtracting $\bar{\mathbf{x}}$, which does not influence the sample covariance matrix S. Equation 3.20(a) then defines the ith PC for a centered observation, with the scores given by Equation 3.20(b). It can be demonstrated, as stated in Equation 3.20(c), that the sample mean of each PC is zero.

$$\hat{y}_i = \mathbf{u}_i'(\mathbf{x} - \bar{\mathbf{x}}), \quad i = 1, 2, \ldots, p \quad \text{(a)}$$

$$\hat{y}_{ij} = \mathbf{u}_i'(\mathbf{x}_j - \bar{\mathbf{x}}), \quad j = 1, 2, \ldots, n \quad \text{(b)} \quad \text{[Eq. 3-20]}$$

$$\bar{\hat{y}}_i = 0 \quad \text{(c)}$$

The sample variances are still given by the $l_i$'s as in Equation 3.19.
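The sample results above can be sketched numerically (NumPy assumed; the data-generating matrix is an arbitrary illustrative choice): the PC scores of Equation 3.20(b) have zero sample means, variances $l_i$, zero covariances, and total variance equal to the trace of S:

```python
import numpy as np

rng = np.random.default_rng(42)
A = np.array([[2.0, 0.0, 0.0, 0.0],
              [1.0, 1.5, 0.0, 0.0],
              [0.0, 0.5, 1.0, 0.0],
              [0.0, 0.0, 0.3, 0.8]])
X = rng.standard_normal((100, 4)) @ A        # correlated illustrative sample, n=100, p=4

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                  # sample covariance matrix (divisor n-1)

l, U = np.linalg.eigh(S)
l, U = l[::-1], U[:, ::-1]                   # descending l_1 >= ... >= l_p

Y = (X - xbar) @ U                           # PC scores, Eq. 3-20(b)
ok_mean = np.allclose(Y.mean(axis=0), 0.0)                 # Eq. 3-20(c)
ok_cov = np.allclose(np.cov(Y, rowvar=False), np.diag(l))  # Eq. 3-19(b,c)
ok_total = np.isclose(l.sum(), np.trace(S))                # Eq. 3-19(d)
```

Centering before projecting is what makes the score means vanish; the covariance of the scores is $U'SU$, which the eigen-decomposition renders diagonal.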

3.3.5 Standardizing the Sample Principal Components

In general, sample PCs are not invariant with regard to scale changes. As indicated previously in

the discussion of population components, variables measured on different scales or a single scale

with widely varying ranges are frequently normalized (e.g., see Saccenti et al. 2011, Forkman et

al. 2019). The literature on the subject reveals a plethora of standardization strategies based on

various normalization formulas (e.g., standard deviation, Euclidean norms—Chapter 2 illustrates

this strategy). For illustration, standardization can be performed by creating a new sample $\mathbf{z}_j$, as in Equation 3.21(a), from the observations $\mathbf{x}_j$ of the random vector $\mathbf{X}$. Equation 3.21(b) provides the expression of the new $n \times p$ sample data matrix $\mathbf{Z}$.


$$\mathbf{z}_j = \mathbf{D}^{-1/2}(\mathbf{x}_j - \bar{\mathbf{x}}) = \begin{bmatrix} \dfrac{x_{j1} - \bar{x}_1}{\sqrt{s_{11}}} \\ \dfrac{x_{j2} - \bar{x}_2}{\sqrt{s_{22}}} \\ \vdots \\ \dfrac{x_{jp} - \bar{x}_p}{\sqrt{s_{pp}}} \end{bmatrix}, \quad j = 1, 2, \ldots, n \quad \text{(a)}$$

$$\mathbf{Z} = \begin{bmatrix} \mathbf{z}_1' \\ \mathbf{z}_2' \\ \vdots \\ \mathbf{z}_n' \end{bmatrix} = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1p} \\ z_{21} & z_{22} & \cdots & z_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{np} \end{bmatrix} = \begin{bmatrix} \dfrac{x_{11} - \bar{x}_1}{\sqrt{s_{11}}} & \dfrac{x_{12} - \bar{x}_2}{\sqrt{s_{22}}} & \cdots & \dfrac{x_{1p} - \bar{x}_p}{\sqrt{s_{pp}}} \\ \dfrac{x_{21} - \bar{x}_1}{\sqrt{s_{11}}} & \dfrac{x_{22} - \bar{x}_2}{\sqrt{s_{22}}} & \cdots & \dfrac{x_{2p} - \bar{x}_p}{\sqrt{s_{pp}}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{x_{n1} - \bar{x}_1}{\sqrt{s_{11}}} & \dfrac{x_{n2} - \bar{x}_2}{\sqrt{s_{22}}} & \cdots & \dfrac{x_{np} - \bar{x}_p}{\sqrt{s_{pp}}} \end{bmatrix} \quad \text{(b)} \quad \text{[Eq. 3-21]}$$

Equation 3.21 produces the sample mean vector $\bar{\mathbf{z}}$ and covariance matrix $\mathbf{S}_z$ in Equation 3.22.

$$\bar{\mathbf{z}} = \frac{1}{n}\mathbf{Z}'\mathbf{1} = \frac{1}{n}\begin{bmatrix} \displaystyle\sum_{j=1}^{n}\frac{x_{j1} - \bar{x}_1}{\sqrt{s_{11}}} \\ \vdots \\ \displaystyle\sum_{j=1}^{n}\frac{x_{jp} - \bar{x}_p}{\sqrt{s_{pp}}} \end{bmatrix} = \mathbf{0} \quad \text{(a)} \quad \text{[Eq. 3-22]}$$

$$\mathbf{S}_z = \frac{1}{n-1}\left(\mathbf{Z} - \frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{Z}\right)'\left(\mathbf{Z} - \frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{Z}\right) = \frac{1}{n-1}\left(\mathbf{Z} - \mathbf{1}\bar{\mathbf{z}}'\right)'\left(\mathbf{Z} - \mathbf{1}\bar{\mathbf{z}}'\right) = \frac{1}{n-1}\mathbf{Z}'\mathbf{Z} = \mathbf{R} \quad \text{(b)}$$


Equation 3.19 gives the sample principal components of the standardized data, where R replaces

S. Because the observations are already centered, there is no need to recenter them as in Equation 3.20(a). In the form of Result 6, the results for the standardized observations Z are as follows.

Result 6: The ith sample principal component is given by Equation 3.23 if $\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n$ are standardized observations with the covariance matrix R as defined in Equation 3.22(b).

$$\hat{y}_i = \mathbf{u}_i'\mathbf{z} = u_{i1}z_1 + u_{i2}z_2 + \cdots + u_{ip}z_p, \quad i = 1, 2, \ldots, p \quad \text{(a)}$$

$$\text{Sample variance}(\hat{y}_k) = l_k, \quad k = 1, 2, \ldots, p \quad \text{(b)}$$

$$\text{Sample covariance}(\hat{y}_i, \hat{y}_k) = 0, \quad i \neq k \quad \text{(c)} \quad \text{[Eq. 3-23]}$$

$$\text{Total (standardized) sample variance} = \mathrm{trace}(\mathbf{R}) = p = l_1 + l_2 + \cdots + l_p \quad \text{(d)}$$

$$\text{Correlation coefficients: } r_{\hat{y}_i, z_k} = u_{ik}\sqrt{l_i}, \quad i, k = 1, 2, \ldots, p \quad \text{(e)}$$

where $l_1 \geq l_2 \geq \cdots \geq l_p \geq 0$ and $\mathbf{z}$ represents an observation of the standardized variables $Z_1, \ldots, Z_p$.

The proportion of the total sample variance explained by the kth sample principal component, as defined by Equation 3.24, can be derived from Equation 3.23 as follows:

$$\left(\begin{array}{c}\text{Proportion of standardized sample} \\ \text{variance due to the } k\text{th PC}\end{array}\right) = \frac{l_k}{p}, \quad k = 1, 2, \ldots, p \quad \text{[Eq. 3-24]}$$

As a basic rule, one should preserve only those components with variances greater than unity, or,

more precisely, only those components that individually explain at least a proportion $1/p$ of the


total variance. However, this rule lacks a theoretical foundation and should not be applied

carelessly.
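A short sketch (NumPy assumed; the scales are illustrative) confirms Equations 3.21 and 3.22: the standardized observations have zero sample means, and their sample covariance matrix equals R, whose eigenvalues sum to p:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((60, 3)) * np.array([1.0, 5.0, 20.0])   # very different scales

xbar = X.mean(axis=0)
s = X.std(axis=0, ddof=1)                    # sample standard deviations sqrt(s_kk)

Z = (X - xbar) / s                           # Eq. 3-21(a) applied row by row
R = np.corrcoef(X, rowvar=False)             # sample correlation matrix of X

ok_mean = np.allclose(Z.mean(axis=0), 0.0)           # Eq. 3-22(a): z-bar = 0
ok_R = np.allclose(np.cov(Z, rowvar=False), R)       # Eq. 3-22(b): S_z = R

l = np.linalg.eigvalsh(R)[::-1]
keep = int(np.sum(l > 1.0))                  # components exceeding the 1/p share
ok_trace = np.isclose(l.sum(), X.shape[1])   # trace(R) = p
```

Since each eigenvalue of R competes against an average of exactly 1, the "variance greater than unity" rule and the "share above $1/p$" rule coincide for standardized data.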

3.3.6 PCA in Terms of Singular Value Decomposition

When performing a PCA, one searches for the successively orthogonal directions (eigenvalues $l_i$/eigenvectors $\mathbf{u}_i$) that maximally describe the variation in the data consisting of n measurements on the p random variables $X_1, \ldots, X_p$. Explicitly, PCA solves the problem represented by Equation 3.25 in terms of the sample covariance matrix S.

$$l_i = \max_{\substack{\mathbf{u} \neq \mathbf{0} \\ \mathbf{u}\, \perp\, \mathbf{u}_1, \ldots, \mathbf{u}_{i-1}}} \frac{\mathbf{u}'\mathbf{S}\mathbf{u}}{\mathbf{u}'\mathbf{u}}, \quad i = 1, \ldots, p \quad \text{[Eq. 3-25]}$$

Note that the eigenvalue-eigenvector pairs $(l_i, \mathbf{u}_i)$ may equivalently be derived from the singular value decomposition of the centered data matrix X. A variable drawn from the sample in the form of Equation 3.23(a) has the sample variance $\mathrm{Var}(\mathbf{u}'\mathbf{z}) = \mathbf{u}'\mathbf{S}\mathbf{u}$.
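This equivalence can be sketched as follows (NumPy assumed; data illustrative): the squared singular values of the centered data matrix, divided by n − 1, reproduce the eigenvalues $l_i$ of S, and the leading right singular vector solves Equation 3.25:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))             # illustrative data, n=50, p=4
Xc = X - X.mean(axis=0)                      # centered data matrix
n = Xc.shape[0]

S = Xc.T @ Xc / (n - 1)                      # sample covariance matrix
l = np.linalg.eigvalsh(S)[::-1]              # descending eigenvalues of S

# SVD route: Xc = U D V', so S = V (D^2 / (n-1)) V'; hence l_i = d_i^2 / (n - 1)
# and the rows of Vt (right singular vectors) are the eigenvectors u_i of S
_, d_sv, Vt = np.linalg.svd(Xc, full_matrices=False)
ok_vals = np.allclose(d_sv ** 2 / (n - 1), l)
ok_vec = np.allclose(S @ Vt[0], l[0] * Vt[0])   # leading direction solves Eq. 3-25
```

Working directly with the SVD of $X_c$ is the numerically preferred route in practice, since it avoids forming $S$ explicitly.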

3.3.7 Geometric Interpretation of the Sample Principal Components

There are various possible meanings for the sample principal components (e.g., see Jolliffe 2002,

Anderson 2003, Greenacre and Hastie 1987). To make this concept easier to grasp, assume the underlying distribution of $\mathbf{X}$ to be roughly Gaussian $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. The sample PCs specified in Equation 3.26(a) are then realizations of the population PCs specified in Equation 3.26(b), which have an $N_p(\mathbf{0}, \boldsymbol{\Lambda})$ distribution. The diagonal matrix $\boldsymbol{\Lambda}$ has entries $\lambda_1, \lambda_2, \ldots, \lambda_p$, where $(\lambda_i, \mathbf{v}_i)$ represent the eigenvalue-eigenvector pairs of $\boldsymbol{\Sigma}$.


$$\hat{y}_i = \mathbf{u}_i'(\mathbf{x} - \bar{\mathbf{x}}), \quad i = 1, 2, \ldots, p \quad \text{(a)} \qquad Y_i = \mathbf{v}_i'(\mathbf{X} - \boldsymbol{\mu}), \quad i = 1, 2, \ldots, p \quad \text{(b)} \quad \text{[Eq. 3-26]}$$

Also, from the sample values $\mathbf{x}_j$, one can approximate $\boldsymbol{\mu}$ by $\bar{\mathbf{x}}$ and $\boldsymbol{\Sigma}$ by S. If S is positive definite, the contour defined by all $p \times 1$ vectors $\mathbf{x}$ meeting Equation 3.27(a), the squared Mahalanobis distance from the sample mean, estimates the constant density contour of the underlying normal distribution defined by Equation 3.27(b).

$$(\mathbf{x} - \bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x} - \bar{\mathbf{x}}) = c^2 \quad \text{(a)} \qquad (\mathbf{x} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}) = c^2 \quad \text{(b)} \quad \text{[Eq. 3-27]}$$

The approximate contours can be drawn to illustrate the normal distribution that generated the data

(e.g., see Jolliffe 2002, p. 23). While the normality assumption is favorable for inference

approaches, it is unnecessary to derive the characteristics of the sample PCs given in Equation

3.19. Even when the normal assumption is questioned and the scatter plot deviates from an

elliptical shape, the eigenvalues of S can still be extracted to obtain the sample PCs. Geometrically,

the data can be plotted as n points in p-space. The data can then be expressed in new coordinates

corresponding to the contour axes of Equation 3.27(a). Hence, this equation defines a

hyperellipsoid centered on the sample mean $\bar{\boldsymbol{x}}$ and with axes defined by the eigenvectors of $\boldsymbol{S}$. These hyperellipsoid axes have lengths proportional to $\sqrt{l_1}, \sqrt{l_2}, \dots, \sqrt{l_p}$, where $l_1 \ge l_2 \ge \dots \ge l_p \ge 0$

are the eigenvalues of $\boldsymbol{S}$.

Since $\boldsymbol{u}_i$ has a length of 1, the absolute value of the $i$th principal component, $|y_i|$, corresponds to the length of the projection of the vector $\boldsymbol{x} - \bar{\boldsymbol{x}}$ onto the unit vector $\boldsymbol{u}_i$. Thus, as defined in Equation


3.26(a), the sample principal components $y_i$ lie along the hyperellipsoid's axes, and their absolute values are the lengths of the projections of $\boldsymbol{x} - \bar{\boldsymbol{x}}$ in the directions of the axes $\boldsymbol{u}_i$. As a result, the sample PCs can be considered as the result of translating the origin of the original coordinate system to $\bar{\boldsymbol{x}}$ and then rotating the coordinate axes until they pass through the scatter in the directions of maximum variation.

Figure 3.3 depicts the geometry of the sample PCs in two-dimensional space ($p = 2$). Figure 3.3(a) illustrates a constant-distance ellipse centered on $\bar{\boldsymbol{x}}$, with $l_1 > l_2$. The PCs of the sample are well defined: they lie along the ellipse's axes, in the directions of maximum sample variance. Figure 3.3(b) depicts a constant-distance ellipse centered on $\bar{\boldsymbol{x}}$ with $l_1 \approx l_2$. In this situation, the constant-distance contours are almost circular, the eigenvalues of $\boldsymbol{S}$ are approximately equal, and the sample variation is homogeneous in all directions. It is then not possible to display the data well in fewer than $p$ dimensions.


(a) $l_1 > l_2$   (b) $l_1 \approx l_2$

Figure 3.3: Geometric Illustration of the Sample Principal Components


Adaptation courtesy of Johnson and Wichern (2019, p.449)

The final few sample principal components can often be disregarded if the last few eigenvalues $l_i$ are small enough that the variance in the corresponding $\boldsymbol{u}_i$ directions is negligible. The data can then be effectively modeled by their representations in the space of the preserved components. For additional information on the geometrical interpretation of sample principal components, including the $p$-dimensional and $n$-dimensional views, one may consult PCA-related works such as Jolliffe (2002) and other authors.

3.3.8 The Number of Principal Components

The number of components to keep was discussed earlier in the sections related to the literature review and, as previously stated, there is no definitive answer to this question. Typically, practitioners complement the proportion of total


sample variation explained rule with the scree plot. A scree plot (Cattell 1966) is an effective visual aid for determining the appropriate number of significant components. A scree plot depicts $l_i$ versus $i$—the magnitude of an eigenvalue versus its index—with the eigenvalues arranged from largest to smallest. To identify the appropriate number of components, one examines the scree plot for an elbow (bend): the number of components is the value at which the remaining eigenvalues are all approximately equal in size. For instance, Figure 3.4, courtesy of Donald et al. (2009), reveals that the first 20 eigenvalues account for most of the variance. As a result, the dimensionality can be reduced from $(50 \times 1000)$ to $(50 \times 20)$ while retaining much of the variability in the data.

Figure 3.4: Illustration of a Scree Plot
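One simple, hedged way to automate the elbow reading (the tolerance and the synthetic spectrum below are illustrative choices, not a standard from the literature):

```python
import numpy as np

def elbow_count(eigenvalues, tol=2.0):
    """Heuristic elbow reader: keep components until the remaining eigenvalues
    are roughly equal, i.e., none exceeds `tol` times the mean of the tail."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    for k in range(lam.size):
        tail = lam[k:]
        if tail.max() <= tol * tail.mean():
            return k
    return lam.size

# Synthetic spectrum: three dominant components above a flat noise floor
lam = np.concatenate([[20.0, 12.0, 6.0], np.full(17, 0.5)])
k = elbow_count(lam)
```

For this synthetic spectrum the heuristic returns three components, mirroring the visual judgment one would make from the bend in a scree plot.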


3.3.9 Graphing the Principal Components and Regression Model

Principal component plots can identify questionable observations and verify distributional

assumptions (e.g., normality). Because PCs are linear combinations of the original variables, it is reasonable to expect them to be close to the expected distribution. When the first few PCs are to be

utilized as input data for future analyses, it is frequently required to check that they are

approximately distributed as the theoretical distribution. The last principal components can aid in

the identification of abnormal observations. Each observation $\boldsymbol{x}_j$ can be written as a linear combination of the entire collection of eigenvectors $\boldsymbol{u}_1, \boldsymbol{u}_2, \dots, \boldsymbol{u}_p$ of $\boldsymbol{S}$, as shown in the following Equation 3.28.

$$\boldsymbol{x}_j - \bar{\boldsymbol{x}} = \big[(\boldsymbol{x}_j - \bar{\boldsymbol{x}})^{\top}\boldsymbol{u}_1\big]\boldsymbol{u}_1 + \dots + \big[(\boldsymbol{x}_j - \bar{\boldsymbol{x}})^{\top}\boldsymbol{u}_p\big]\boldsymbol{u}_p = y_{j1}\boldsymbol{u}_1 + y_{j2}\boldsymbol{u}_2 + \dots + y_{jp}\boldsymbol{u}_p \qquad \text{[Eq. 3-28]}$$

As a result, the magnitudes of the last principal components determine how well the first few principal components fit the observations. The approximation of $\boldsymbol{x}_j - \bar{\boldsymbol{x}}$ by the first $q$ components, supplied in Equation 3.29(a), departs from $\boldsymbol{x}_j - \bar{\boldsymbol{x}}$ by the expression given in Equation 3.29(b). Equation 3.29(c) gives the squared length of this deviation along the last principal components. Suspicious observations will frequently have at least one large coordinate among $y_{j,q+1}, \dots, y_{jp}$ contributing to this squared length.

$y_{j1}\boldsymbol{u}_1 + y_{j2}\boldsymbol{u}_2 + \dots + y_{jq}\boldsymbol{u}_q$ (a)

$y_{j,q+1}\boldsymbol{u}_{q+1} + y_{j,q+2}\boldsymbol{u}_{q+2} + \dots + y_{jp}\boldsymbol{u}_p$ (b) [Eq. 3-29]

$y_{j,q+1}^2 + y_{j,q+2}^2 + \dots + y_{jp}^2$ (c)
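A minimal sketch of this diagnostic (Python/NumPy, synthetic data with one planted atypical observation in a low-variance direction):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 100, 5, 2                           # keep the first q components
X = rng.normal(size=(n, p)) * np.array([5.0, 3.0, 0.3, 0.3, 0.3])
X[0, 4] = 6.0                                 # atypical value in a low-variance direction

Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (n - 1)
l, U = np.linalg.eigh(S)                      # ascending eigenvalues
U = U[:, ::-1]                                # columns u_1..u_p by descending variance

Y = Xc @ U                                    # PC scores y_ji
sq_len = (Y[:, q:] ** 2).sum(axis=1)          # Eq. 3.29(c) for each observation
suspect = int(np.argmax(sq_len))              # index of the most unusual observation
```

The planted observation is nearly invisible in the first two PCs (which track the high-variance variables) but dominates the squared length of the last components, which is exactly what Equation 3.29(c) is designed to expose.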


In addition, to aid in identifying suspicious observations, it is common in practice to verify the

distributional assumption by creating scatter diagrams for pairs of the first few principal

components. Additionally, Q-Q plots can be created using the sample values generated by each

principal component.

3.4 Principal Components in Regression

An examination of PCs is more a means to an end than an end in itself. They are commonly used as
interim steps in much larger investigations. PCs, for example, might be used as inputs to a multiple

regression procedure. As a result, this section gives context for a multiple regression procedure

based on PCs.

3.4.1 Classical Single Linear Regression Model

Let a dependent (response) variable $\boldsymbol{Y} = (Y_1, \dots, Y_r)^{\top}$ depend on $p$ independent (predictor) variables $x_1, \dots, x_p$. Let $r$ be the number of observations drawn from the population in question. The basic linear regression model implies that the variable $Y$ consists not only of an expression that depends on the variables $x_i$ but also of a random error $\boldsymbol{\varepsilon} = (\varepsilon_1, \dots, \varepsilon_r)^{\top}$, as shown in Equation 3.30. The term "linear" refers to the fact that the expression for $Y$ is a linear function of the unknown parameters $\beta_0, \beta_1, \dots, \beta_p$. On the other hand, the behavior of the error $\boldsymbol{\varepsilon}$ is characterized by a set of distributional assumptions given in Equation 3.31.


$Y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_p x_{1p} + \varepsilon_1$

$Y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_p x_{2p} + \varepsilon_2$

$\vdots$ [Eq. 3-30]

$Y_r = \beta_0 + \beta_1 x_{r1} + \beta_2 x_{r2} + \dots + \beta_p x_{rp} + \varepsilon_r$

Response = mean (depending on predictors $x_1, x_2, \dots, x_p$) + error

where the error values are presumed to have the characteristics grouped below in Equation 3.31, with $\sigma^2$ representing an unknown parameter known as the error variance.

$E(\varepsilon_j) = 0, \ \forall j = 1, \dots, r$ (a)

$\mathrm{Var}(\varepsilon_j) = \sigma^2, \ \forall j = 1, \dots, r$ (b) [Eq. 3-31]

$\mathrm{Cov}(\varepsilon_j, \varepsilon_k) = 0, \ \forall j \neq k$ (c)

Expressed in matrix notation, Equation 3.30 becomes Equation 3.32:

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_r \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{r1} & x_{r2} & \cdots & x_{rp} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_r \end{bmatrix} \qquad \text{[Eq. 3-32]}$$

where $\boldsymbol{\beta}$ is the vector of unknown parameters and the multiplier of the constant term $\beta_0$ is the column of 1s in the first column of the design matrix $\boldsymbol{X}$. It should be noted that the assumptions on the error term specified in Equation 3.31 are, by themselves, too simplistic for confidence statements and hypothesis testing.


3.4.2 Least Square Estimation

Many research goals can be achieved with regression analysis. Developing an equation to help

determine the projected response given the values of the predictor variables is one of them. As a

result, it is critical to fit the model in Equation 3.30 to the observed values $y_j$ corresponding with the known measurements $1, x_{j1}, x_{j2}, \dots, x_{jp}$. Estimating both the regression coefficients and the error variance $\sigma^2$ from the available data will aid in achieving the regression analysis target. The least-squares approach entails choosing trial values $\boldsymbol{b}$ for the regression coefficients $\boldsymbol{\beta}$ in such a way that they minimize the sum $S$ of the squares of all the differences in Equation 3.33:

$$S(\boldsymbol{b}) = \sum_{j=1}^{r}\big(y_j - b_0 - b_1 x_{j1} - \dots - b_p x_{jp}\big)^2 = (\boldsymbol{y} - \boldsymbol{X}\boldsymbol{b})^{\top}(\boldsymbol{y} - \boldsymbol{X}\boldsymbol{b}) \qquad \text{[Eq. 3-33]}$$

with:

$b_0 + b_1 x_{j1} + \dots + b_p x_{jp} = \hat{E}(y_j)$, the fitted mean response for the observed value $y_j$.

Since the least squares criterion selects the coefficients $\boldsymbol{b}$, they are denoted as least squares estimates of the regression coefficients $\boldsymbol{\beta}$. To highlight their role as estimates, they are often written as $\hat{\boldsymbol{\beta}}$. As defined, they are consistent with the data in that they produce estimated or fitted mean responses whose deviations from the observations are as small as possible.

With the least squares estimates denoted as $\hat{\boldsymbol{\beta}}$, the corresponding deviations are called residuals, and they are represented by Equation 3.34.


$\hat{\varepsilon}_j = y_j - \hat{\beta}_0 - \hat{\beta}_1 x_{j1} - \dots - \hat{\beta}_p x_{jp}, \quad \forall j = 1, 2, \dots, r$
or [Eq. 3-34]
$\hat{\boldsymbol{\varepsilon}} = \boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}$

While this section aims to provide the background knowledge necessary to connect linear

regression to principal components, the literature contains much information. The elements

required to fit a model to data using least squares estimation are illustrated below.

[Figure: schematic linking the observed responses "y" and the predictors "x" from given data to the straight-line model "Y" and its parameters, yielding the model fitted to the observed responses.]

Figure 3.5: Elements of Fitting a Model to Data Using Least Square Estimates
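As a minimal sketch of the least squares fit (Python/NumPy, hypothetical data lying exactly on a line so that the residuals of Equation 3.34 vanish):

```python
import numpy as np

# Hypothetical data: r = 6 observations on a single predictor, lying on y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x

X = np.column_stack([np.ones_like(x), x])   # design matrix with the column of 1s
b = np.linalg.solve(X.T @ X, X.T @ y)       # least squares estimate (minimizes Eq. 3.33)

residuals = y - X @ b                       # Eq. 3.34; zero here by construction
```

Solving the normal equations $\boldsymbol{X}^{\top}\boldsymbol{X}\boldsymbol{b} = \boldsymbol{X}^{\top}\boldsymbol{y}$ recovers the intercept and slope exactly for noiseless data; with noisy data the same call yields the minimizer of the residual sum of squares.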

3.4.3 Principal Components and Linear Regressions

As previously stated, linear regression analysis may represent the ultimate application of PCA.

This section will provide some context information and significant and widely utilized results for

regression analysis using PCs. The following is a significant result worth including because it

establishes the process for determining the linear regression model's unknowns. Jolliffe (2002)

provides evidence for this conclusion. Assume that $\boldsymbol{X}$ is composed of $n$ observations $\boldsymbol{x}_j$ on $p$


predictor variables, each measured about its mean, and that the associated regression equation is given

by Equation 3.35,

$\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
[Eq. 3-35]

where y is the vector of n observations on the dependent variable, measured about the mean.

To facilitate the development, assume that the equation $\boldsymbol{Z} = \boldsymbol{X}\boldsymbol{B}$ transforms $\boldsymbol{X}$, where $\boldsymbol{B}$ is a $p \times p$ orthogonal matrix. The regression equation can be written as $\boldsymbol{y} = \boldsymbol{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}$, where $\boldsymbol{\gamma} = \boldsymbol{B}^{-1}\boldsymbol{\beta}$. The typical least squares estimator for $\boldsymbol{\gamma}$ is $\hat{\boldsymbol{\gamma}} = (\boldsymbol{Z}^{\top}\boldsymbol{Z})^{-1}\boldsymbol{Z}^{\top}\boldsymbol{y}$. The elements of $\hat{\boldsymbol{\gamma}}$ possess, sequentially, the smallest possible variances if $\boldsymbol{B} = \boldsymbol{A}$, the matrix whose $k$th column is the $k$th eigenvector of $\boldsymbol{X}^{\top}\boldsymbol{X}$, and hence the $k$th eigenvector of the sample covariance matrix $\boldsymbol{S} \propto \boldsymbol{X}^{\top}\boldsymbol{X}$. In this case, $\boldsymbol{Z}$ consists of the values of the sample principal components for $\boldsymbol{x}$.
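A brief sketch of regression on principal components under this setup (Python/NumPy, synthetic noiseless data; since $\boldsymbol{A}$ is orthogonal, $\boldsymbol{\beta} = \boldsymbol{A}\boldsymbol{\gamma}$ is recovered exactly when all components are kept):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                 # predictors measured about their means
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta                        # noiseless response for a clean check
y -= y.mean()

# Columns of A are the eigenvectors of X'X; Z holds the PC scores
_, A = np.linalg.eigh(X.T @ X)
A = A[:, ::-1]
Z = X @ A

# Regress y on Z: Z'Z is diagonal, so the gamma_k are estimated independently
gamma = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Since A is orthogonal, beta = A gamma recovers the original coefficients
beta_hat = A @ gamma
```

The diagonality of $\boldsymbol{Z}^{\top}\boldsymbol{Z}$ is the practical payoff: each $\hat{\gamma}_k$ can be computed (and retained or discarded) independently, which is what makes PC regression attractive when predictors are collinear.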

The residual vector can be estimated as part of the diagnostic established when verifying the

distributional assumptions for a multivariate multiple regression model. Indeed, after fitting any

model using any estimation approach, it is prudent to consider the following equality depicted by

Figure 3.6

[Figure: observation vector = vector of predicted (estimated) values + residual vector.]

Figure 3.6: Illustration of Residual Vector (PCA)

or in mathematical notation in the form of Equation 3.34. In this case, $r = n$.


For multivariate linear models, one can analyze the principal components derived from the

residuals’ covariance matrix, as provided in Equation 3.36, in the same way as those determined

from sample principal components discussed earlier.

$$\frac{1}{n - p}\sum_{j=1}^{n}\big(\hat{\boldsymbol{\varepsilon}}_j - \bar{\hat{\boldsymbol{\varepsilon}}}\big)\big(\hat{\boldsymbol{\varepsilon}}_j - \bar{\hat{\boldsymbol{\varepsilon}}}\big)^{\top} \qquad \text{[Eq. 3-36]}$$

One should keep in mind that because the residuals from linear regression analysis are linearly related, the last eigenvalues will be zero, to within rounding error.

3.5 Large Sample Inferences

The principal component analysis is defined as a method of determining the principal components

of a covariance matrix (S) or correlation matrix (R). A set of eigenvectors and eigenvalues defines

the maximum variance directions. The $p$-dimensionality is reduced when the first few eigenvalues are substantially greater than the rest. In practice, the quality of the principal component approximation is determined by the derived eigenvalue-eigenvector pairs $(l_i, \boldsymbol{u}_i)$. The pairs

will differ from their underlying population counterparts due to sampling variation. These

distributions are difficult to derive (Johnson and Wichern 2019, Johnstone 2001). While this

manuscript cannot include all large sample inference findings, the following are relevant to this

investigation and are based on the concepts already introduced in the previous section.


3.5.1 Sphericity Test for Sample Covariance Matrices under Multinormality

The population variance-covariance matrix (𝜮) characterizing the inherent uncertainty

(variabilities or distances of data points from the origin) in the observations of a given data set is frequently unknown in the field of multivariate statistical analysis. To find this matrix,

investigators use sphericity tests as inferential tools. First, they look for evidence of a matrix like

this by sampling a specific covariance matrix. Then, they test the matrix to ascertain whether a

population matrix is proportionally equal to the identity matrix. To put it another way, one would

like to know if there is any link between the population variables. For instance, answering this

question would entail testing the null hypothesis of the identity covariance matrix under Gaussian

assumptions. In this case, the null hypothesis will be defined in Equation 3.37.

$H_0: \boldsymbol{\Sigma} = \boldsymbol{I}_p$

against the alternative [Eq. 3-37]

$H_1: \boldsymbol{\Sigma} \neq \boldsymbol{I}_p$.

The hypothesis $H_0$ implies that $\boldsymbol{\Sigma}$ has a specific value in general. Another possibility is to test the

following hypothesis: Does a particular observed sample covariance matrix S correspond to the

population matrix? Kendall and Stuart (1968) developed a sphericity hypothesis to address this

challenge.

The new hypothesis is formed by linearly transforming $\boldsymbol{\Sigma}$ into a unit matrix, which is accomplished by multiplying $\boldsymbol{\Sigma}$ by its inverse $\boldsymbol{\Sigma}^{-1}$, resulting in $\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma} = \boldsymbol{I}_p$. The question that arises after this transformation is whether the new matrix $\boldsymbol{C} = \boldsymbol{\Sigma}^{-1}\boldsymbol{S}$ corresponds to the hypothetical


matrix $\sigma^2\boldsymbol{I}_p$, where $\sigma^2$ is unknown. The resulting test is referred to as the sphericity test (Bejan 2005). Equation 3.38 gives the test statistic under the multinormality assumption, following Mauchly's 1940 work, where $n$ and $p$ are the data dimensions. The quantity $-n\log l$ has a chi-square $\chi^2$ distribution with degrees of freedom $f = \frac{p(p+1)}{2} - 1$.

$$l = \frac{\det(\boldsymbol{C})}{\big(\mathrm{tr}(\boldsymbol{C})/p\big)^{p}} \qquad \text{[Eq. 3-38]}$$

3.5.2 Sphericity Tests Based on TW1 p-Values

Apart from the previous sphericity test, another critical test is derived from Johnstone's (2001)

celebrated theorem. This section aims to establish this sphericity test for white Wishart matrices

($\boldsymbol{\Sigma} = \sigma^2\boldsymbol{I}_p$). It is helpful to refer back to chapter 2 about the Wishart model $W_p(n, \boldsymbol{\Sigma})$ and its assumptions. That model approximates a random process using $p$ random variables $\mathcal{X}_1, \dots, \mathcal{X}_p$ characterizing a normally distributed population $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. In addition, each sample's observations are indicated by $\boldsymbol{X} = (x_{ij})$, where the $x_{ij}$ are i.i.d. from $N(0, 1)$. For the sake of simplification, $\boldsymbol{\mu} = \boldsymbol{0}$ and $\boldsymbol{\Sigma} = \boldsymbol{I}_p$. As discussed in chapter 2, the $p \times p$ matrix $\boldsymbol{A} = \boldsymbol{X}^{\top}\boldsymbol{X}$, an element of $W_p(n, \boldsymbol{\Sigma})$, has a $p$-variate Wishart distribution with $n$ degrees of freedom and represents (up to scaling) the sample covariance matrix of $\boldsymbol{X}$.

Let $l_1 \ge \dots \ge l_p$ denote the eigenvalues of $\boldsymbol{A}$ (a biased estimate of $\boldsymbol{\Sigma}$). Notice that, through the equality $l_i = n\ell_i$, the eigenvalues $l_i$ of $\boldsymbol{A}$ are associated with the eigenvalues $\ell_i$ of the unbiased estimator of $\boldsymbol{\Sigma}$, that is, $\boldsymbol{S} = \boldsymbol{A}/n$.


Under these conditions, the null hypothesis $H_0$ asserts that there are no associations between the $p$ variables; that is, $\boldsymbol{\Sigma} = \sigma^2\boldsymbol{I}_p$. Under $H_0$, all population eigenvalues equal one. However, there is a spread in the sample eigenvalues $l_i$, as has been known for some time; this subject was covered in chapter 2. To determine whether "large" observed eigenvalues support rejecting the null hypothesis, one evaluates the tail probability in Equation 3.39, whose limiting behavior was characterized by Tracy and Widom (1993, 1994).

$$\mathbb{P}\big(l_1 > t \,\big|\, H_0: \boldsymbol{A} \sim W_p(n, \sigma^2\boldsymbol{I}_p)\big) \qquad \text{[Eq. 3-39]}$$

In the following theorem, Johnstone (2001) established an approximation of the probability distribution of the largest eigenvalue in Equation 3.40.

Theorem 1:

$$\mathbb{P}\left(\frac{n\ell_1 - \mu_{np}}{\sigma_{np}} \le x \,\middle|\, H_0\right) \to F_1(x) \qquad \text{[Eq. 3-40]}$$

where $\ell_1$ is the largest eigenvalue of $\boldsymbol{S}$ and the limit is taken as $n \to \infty$, $p \to \infty$ such that $p/n \to \gamma \in (0, \infty)$; $F_1$ is the largest-eigenvalue distribution known as the Tracy-Widom limit law (see chapter 2). Under relaxed assumptions on $n$, $p$, and the entries of the data matrix $\boldsymbol{X}$, this theorem has been extended to the $m$th greatest eigenvalue of the sample covariance matrix $\boldsymbol{S}$, corresponding to a broader class of matrices. This is referred to as the universality of the TW distributions.

By adopting $TW(n, p)$ for the law of $\mu_{np} + \sigma_{np}W_1$, where $W_1 \sim F_1$, the largest eigenvalue $l_1 = n\ell_1$ approximately has the Tracy-Widom $TW(n, p)$ distribution. From this result, one may construct a


sphericity test for covariance matrices in terms of their largest eigenvalues. Specifically, under the

assumption of multinormality, one may consider testing the null hypothesis $H_0$ in the form provided in Equation 3.37.

Leaning on Saccenti and Timmerman (2017) provides greater insight into this subject. Johnstone's Theorem, which addresses the asymptotic distribution of the largest eigenvalue of random covariance matrices, is the main result illustrated here, because it is a realistic approximation even for small sample sizes $n$ and a small number of variables $p$. It can be used as a statistical test to determine the number of principal components of empirical data. As yet, there is no similar approach for standardized data (i.e., principal components based on correlations). Moreover, while Johnstone's Theorem has recently been extended to the greatest eigenvalue of random correlation matrices, this asymptotic solution requires very large $n$ and $p$ to arrive at an acceptable approximation. An approximate solution for the first largest eigenvalue has been proposed, which appears to apply to smaller $n$ and $p$. Still, no reasonable approach for verifying the number of principal components in the case of correlation matrices is available. To conclude this topic, Saccenti and Timmerman's (2017) assessment of Johnstone's Theorem is consistent with Johnstone's (2001) ad hoc proposal to include running PCA on standardized data in his theorem.

Nevertheless, given a significance level $\alpha$, the threshold of probability by which to reject the null hypothesis in a two-tailed test, the approximate $\alpha \cdot 100\%$ significance values to use for the sphericity test will be the corresponding $(\alpha/2) \cdot 100\%$ and $(1 - \alpha/2) \cdot 100\%$ quantiles of the Tracy-Widom $F_1$. The following is a quick illustration of the sphericity test based on the $TW_1$ distribution.


3.5.3 Sphericity Test Applications

Let $\boldsymbol{\Sigma}$ be a positive definite square matrix of size 10 to show the sphericity test under the Gaussian assumption. Consider a diagonal matrix whose elements are ten positive integers, such as those represented by the vector $\boldsymbol{V} = (5, 5, 1, 5, 4, 1, 2, 3, 5, 5)$. This is the simplest approach to constructing $\boldsymbol{\Sigma}$. The inverse $\boldsymbol{\Sigma}^{-1}$ of $\boldsymbol{\Sigma}$ is also a diagonal matrix, its elements created by inverting each element of $\boldsymbol{\Sigma}$. Then, given $\boldsymbol{\Sigma}$ and the mean vector $\boldsymbol{\mu} = \boldsymbol{0}$, the MATLAB function "mvnrnd" can be used to create $n = 40$ observations of the $p = 10$ random variables selected from $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. The sample covariance of the resulting data matrix gives $\boldsymbol{S}$, which may then be used to calculate $\boldsymbol{C} = \boldsymbol{\Sigma}^{-1}\boldsymbol{S}$, as well as the test statistics required to examine the null hypothesis $H_0$: does $\boldsymbol{C} = \sigma^2\boldsymbol{I}_p$ hold? Under the Gaussian assumption, either of the sphericity tests performed on $\boldsymbol{C}$—using the p-value computed from $l$ (Equation 3.38) or the greatest eigenvalue $l_1 = n\ell_1$ of $\boldsymbol{C}$ (Johnstone's Theorem)—should confirm the validity of $H_0$. The content for each test is provided in the subsections that follow.

3.5.3.1 Sphericity Test Based on the Mauchly (1940)’s Ratio

One may verify $H_0$ through the application of Equation 3.38. Using this equation, one can compute $l = 2.1869\mathrm{e}{-01}$ and then deduce the value of the test statistic $-n\log l = 58.3022$. Independently, the degrees of freedom can be calculated as $f = \frac{p(p+1)}{2} - 1 = 54$ and then used to find the 99% quantile of the $\chi^2$ distribution with 54 degrees of freedom. That is, $\chi^2_{0.99,\,54} = 81.069$, or simply written as $P(\chi^2 \le 81.069) = 0.99$. Note that given $\alpha$ and a degree of freedom (d.f.) $f$, one can either use a lookup table or the MATLAB function (chi2inv) to derive or calculate the


corresponding value of $\chi^2$. Since the theoretical sampling distribution of $-n\log l$ follows a $\chi^2$ distribution with mean $f = 54$, it is helpful to depict all the test results graphically in a graph like the one in Figure 3.7. This graph helps to make conjectures about the test in question, which by design is a one-tailed test for which obtaining an extreme test statistic $t$ such that $P(\chi^2 > t) < 0.01$ would result in the rejection of $H_0$. From Figure 3.7, it is evident that the test statistic 58.3022 is not significant since it is less than the value of $\chi^2_{0.99,\,54}$. Therefore, with a confidence level of 99%, there is no reason for rejecting the hypothesis $H_0$ that $\boldsymbol{S}$ is a sample covariance matrix drawn from the population with (known) true covariance matrix $\boldsymbol{\Sigma}$.
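A hedged re-creation of this example in Python with NumPy/SciPy (in place of MATLAB; the random draw differs from the source's, so $l$ and the statistic vary from run to run, while $f = 54$ and the 99% quantile match the values quoted above):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n, p = 40, 10
V = np.array([5, 5, 1, 5, 4, 1, 2, 3, 5, 5], dtype=float)
Sigma = np.diag(V)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # like MATLAB's mvnrnd
S = np.cov(X, rowvar=False)
C = np.linalg.inv(Sigma) @ S

l = np.linalg.det(C) / (np.trace(C) / p) ** p   # ratio of Eq. 3.38
stat = -n * np.log(l)                           # test statistic

f = p * (p + 1) // 2 - 1                        # degrees of freedom, here 54
crit = chi2.ppf(0.99, f)                        # 99% quantile, about 81.07
reject = stat > crit                            # one-tailed decision at alpha = 0.01
```

By the arithmetic-geometric mean inequality on the eigenvalues of $\boldsymbol{C}$, the ratio $l$ always lies in $(0, 1]$, so the statistic $-n\log l$ is nonnegative and large values signal departure from sphericity.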


Figure 3.7: Illustration of the Mauchly (1940)’s Sphericity Test

3.5.3.2 Sphericity Tests Based on TW F1

A similar verification of $H_0$ may be carried out using the largest eigenvalue of $\boldsymbol{C}$ in a two-tailed test with significance level $\alpha = 0.01$. Let an eigenvalue decomposition of $\boldsymbol{C}$ be performed to calculate its eigenvalue vector $eig(\boldsymbol{C})$, provided below, and derive the largest eigenvalue $\ell_1 = 2.0704$ of $\boldsymbol{C}$ as the maximum of $eig(\boldsymbol{C})$.

$eig(\boldsymbol{C}) = (2.0704, 1.7830, 1.4632, 1.4036, 1.0915, 0.3626, 0.8385, 0.7326, 0.4706, 0.5742)$

Then, substitute $n$ and $p$ by their respective values into Equation 2.41 to compute the centering and scaling constants $\mu_{np}$ and $\sigma_{np}$. This should lead to $\mu_{np} = 87.7427$ and $\sigma_{np} = 7.3523$. By


replacing each expression by its value in $(n\ell_1 - \mu_{np})/\sigma_{np}$, one should find a test statistic of $-0.6699$. Since the test statistic $(n\ell_1 - \mu_{np})/\sigma_{np}$ is known to have a Tracy-Widom $F_1$ distribution, one may draw a graph similar to Figure 3.7 to show all the results graphically. To determine the corresponding quantiles of the Tracy-Widom $F_1$ at 0.5% and 99.5%, one may either derive them from a lookup table such as Table C.1 proposed by Bejan (2005) or compute them using Dieng's (2005) MATLAB codes. Figure 3.8 provides all the information necessary to make inferences about $H_0$. This figure shows that the value $-0.6699$ does not fall in any rejection region located at the tails of the 1% double-sided significance regions of the Tracy-Widom $F_1$ distribution of the test statistic. Therefore, the null hypothesis $H_0$ can be accepted at the confidence level of 99%.
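The computation can be sketched as follows, assuming Equation 2.41 denotes Johnstone's (2001) centering and scaling constants with the $n - 1/2$, $p - 1/2$ refinement, which reproduces the constants quoted above to within rounding:

```python
import numpy as np

def tw_center_scale(n, p):
    """Centering mu_np and scaling sigma_np for the largest eigenvalue of a
    white Wishart matrix (Johnstone 2001; n - 1/2, p - 1/2 refinement assumed)."""
    a, b = np.sqrt(n - 0.5), np.sqrt(p - 0.5)
    mu = (a + b) ** 2
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return mu, sigma

n, p = 40, 10
mu, sigma = tw_center_scale(n, p)
ell1 = 2.0704                         # largest eigenvalue of C from the example
stat = (n * ell1 - mu) / sigma        # compare against Tracy-Widom F1 quantiles
```

The resulting statistic lies well inside the central acceptance region of $F_1$, matching the conclusion drawn from Figure 3.8.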

Figure 3.8: Illustration of Sphericity Tests Based on TW F1 p-Values


3.5.4 The Spiked Model (Johnstone 2001, 2006)

The following describes the spiked model, an empirical derivation by Johnstone (2001). Similar

to the covariance matrices with unique structures presented in Section 3.3.3, this model is much sought after in PCA. In practice, there are frequently one or more significant eigenvalues that are clearly distinguished from the rest of the data. This begs the question of how much, if there were only one or a small number of non-unit eigenvalues in the population, they would pull up the other values. Consider, for example, a "spiked" covariance model in Equation 3.41 with a fixed number $r$ of eigenvalues greater than one.

$$\boldsymbol{\Sigma} = \mathrm{diag}(\lambda_1, \dots, \lambda_r, 1, \dots, 1) \qquad \text{[Eq. 3-41]}$$

For this model, the author introduced the notation $\mathcal{L}(l_k \,|\, n, p, \boldsymbol{\Sigma})$ for the distribution of the $k$th largest eigenvalue of the sample covariance matrix $\boldsymbol{A} = \boldsymbol{X}^{\top}\boldsymbol{X}$, where the $n \times p$ matrix $\boldsymbol{X}$ is derived from $n$ independent draws from $N_p(\boldsymbol{0}, \boldsymbol{\Sigma})$. In fact, the $(r+1)$st sample eigenvalue behaves approximately like the largest eigenvalue in the null model with $p - r$ variables.
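A small simulation sketch of the spiked model (Python/NumPy; the spike value, $n$, and $p$ are hypothetical choices for illustration) shows the detached top eigenvalue against the bulk:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, r = 200, 50, 1
spike = 8.0                                            # one population eigenvalue > 1

Sigma_diag = np.concatenate([[spike], np.ones(p - r)]) # Eq. 3.41 spectrum
X = rng.normal(size=(n, p)) * np.sqrt(Sigma_diag)      # rows ~ N(0, Sigma)

S = X.T @ X / n
eigs = np.sort(np.linalg.eigvalsh(S))[::-1]

# Under the null the bulk's upper edge sits near (1 + sqrt(p/n))^2;
# a strong spike detaches cleanly from it
bulk_edge = (1 + np.sqrt(p / n)) ** 2
```

With a spike this strong, the largest sample eigenvalue separates far above the bulk edge, while the second eigenvalue stays near the null-model edge, consistent with the remark about the $(r+1)$st eigenvalue.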

3.5.5 Phase Transition and Tracy-Widom Distribution

When examining the Tracy-Widom test for the first component, one must consider that Baik and Silverstein (2006) established a detection limit. Equation 3.42 below gives the threshold $\lambda_{\text{crit}}$,

$$\lambda_{\text{crit}} = 1 + \sqrt{\frac{p}{n}} \qquad \text{[Eq. 3-42]}$$


where n is the sample size and p is the number of variables. If the first population eigenvalue is

less than the threshold, the sample eigenvalues are Tracy-Widom distributed, and hence the first

component cannot be distinguished from noise (Baik et al. 2005). As seen in Figure 3.9 below, a so-called phase transition occurs when the eigenvalues cross over the threshold (Saccenti et al. 2017).
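The threshold itself is a one-line computation (the sizes below are hypothetical):

```python
import numpy as np

def bbp_threshold(n, p):
    """Baik-Silverstein detection limit for the first population eigenvalue (Eq. 3.42)."""
    return 1.0 + np.sqrt(p / n)

# Hypothetical sizes: more observations per variable lower the threshold,
# so weaker spikes become distinguishable from the noise
t_small = bbp_threshold(n=100, p=100)
t_large = bbp_threshold(n=400, p=100)
```

Increasing the sample size relative to the number of variables shrinks the threshold toward one, so progressively weaker structure can be told apart from Tracy-Widom-distributed noise.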

Figure 3.9: Illustrations of Transition between Two Distinct Phases–Strong and Weak
Courtesy of Majumdar and Schehr (2014)

As shown above, a phase transition separates the TW distribution's left and right tails. It identifies

a key zone of crossover. Over this critical zone, the system transitions from the weak coupling

(stable) to the strong coupling (unstable) phase. It is analogous to May's model's third-order stable-

unstable transition. At finite N, the Tracy-Widom distribution precisely reflects the crossover

behavior of the free energy from one phase to the other (Majumdar and Schehr 2014). The concept

of phase transition is linked to the work of May (1972) on probing the stability of large complex ecosystems, which is the first known direct application of the statistics of the largest eigenvalue of covariance matrices. Through his work, May (1972) found that these systems, connected at random, are stable until they reach some critical level of connectance—the proportion of links per


species—beyond which they abruptly become unstable. By their behaviors, construction project

network schedules are like complex systems. The intricacy and pairwise associations (thousands

in large projects) between activities making up a project network schedule may help explain this

similarity.

3.6 Research Objectives

3.6.1 Research Objective 1

Examine the PCA literature for methods used to select the number of principal components for use in subsequent analyses, since PCA is a means to an end. Review the PCA literature for regression analysis applications based on the eigenvalues of covariance or correlation matrices.

3.6.2 Research Objective 2

Select a few networks varying in size and complexity from the benchmark schedules to apply the

suitable PCA algorithms discovered during the literature study and essential for data reduction.

This objective uses the available algorithms developed to transform a project network into a matrix

and use it for deriving simulated input for various applications.

3.6.3 Research Objective 3

Summarize and interpret findings, infer the number of components required for each project

network based on their sizes, complexity, and other characteristics (e.g., statistical feature of the

covariance matrix), and provide recommendations.


3.6.4 Research Objective 4

Conduct extensive simulations of test schedules to develop a regression model to predict project

activity durations and delays by selecting features that capture the maximum variance in the

activity duration data.

3.6.5 Research Objective 5

Relate the phase transition phenomenon in project schedules to the one observed in the minuscule

margins of the Tracy-Widom distribution to identify similarities, derive the phase transition zone

formula or location, and devise a method for constructing resilient project schedules.

3.7 Research Methodology

This part outlines the methods used to accomplish the Chapter's objectives. This methodology is built on the preceding Chapter's findings, which examined the fundamental behavior of construction project schedules; in other words, it uses the previous Chapter's discoveries to fulfill the current Chapter's objective. Additionally, the results of the substantial literature research undertaken as part of this Chapter's objectives are critical in developing the approach for this Chapter. The methodology is divided into three distinct sections. The first establishes a set of prerequisites for its application. The second analyzes the data acquired from the literature research to identify the PCA approaches used in this Chapter. The third utilizes the approaches described in the literature review to construct the procedures for acquiring all the results required for project network schedule analysis, closing with a brief concluding segment.


3.7.1 Assumptions

The following are conditions that must be met to use the methods supplied. First, the mathematical

model presented in Table 2-19 for project network schedules applies to PCA's prospective project

network schedule. Second, given the data matrix derived from the population of i.i.d. durations of

project network activities in question, there exists an optimum sample size 𝑛 at significance

level α (see Section 2.4.6). Third, this optimal sample size ensures sufficient correlation in the data

to apply the universal results required for hypothesis testing. Finally, according to the conclusions

of Chapter 2, when the sample covariance matrix obtained from the project network is normalized,

at least one of the largest eigenvalues follows the Tracy-Widom limit law of order 1.

3.7.2 Analysis of Literature Review

The analysis of the literature on principal component analysis for dimensionality reduction yielded the critical information required to develop the procedures of this methodology. First, the survey on selecting the number of statistically significant principal components provided criteria that can serve this purpose. After a thorough examination of PCA applications in the construction and engineering fields, the rules based on the cumulative percentage of the total variance, scree plots, and hypothesis testing were found appropriate for this study. Second, PCA based on correlation matrices rather than covariance matrices is applicable to project network scheduling data for determining which principal components to keep, because the methodology proposed in Section 2.4.6 requires the data to be standardized before use. That methodology offers two normalization procedures, Norm I and Norm II; according to Chapter 2, Norm II produced stronger results than Norm I. Third, because

project network schedules are not quite normally distributed, Chapter 2 revealed a covariance structure in these schedules when the optimal sample size n was used. A hypothesis test based on the TW distribution, such as Johnstone's (2001), is therefore appropriate. Fourth, a linear regression model can be developed from the retained significant principal components using the methods described in Section 3.4.3. Finally, the phase transition location in project network schedules can be established.

3.7.3 Procedure for Conducting a PCA for Construction Project Network Schedules

The following step-by-step processes are based on the preceding section's literature review

analysis.

First step: choose a normalization method for the largest eigenvalues of the sample covariance matrices produced from project network scheduling data. Although correlation matrices (R) will be employed, this step is critical for obtaining further data from simulation runs. Norm II will therefore serve as the normalization method.
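Chapter 2 defines Norm I and Norm II precisely; as an illustration only, the following Python sketch applies Johnstone's (2001) centering and scaling constants to the largest eigenvalue of a white-noise data matrix, which is the kind of normalization the Norm variants implement (the dimensions and variable names are placeholders, not taken from the benchmark networks):

```python
import numpy as np

def johnstone_normalize(l1, n, p):
    # Centering and scaling constants from Johnstone (2001): for white-noise
    # data, the normalized largest eigenvalue of X'X converges to the
    # Tracy-Widom law of order 1.  The Norm I / Norm II variants used in
    # Chapter 2 differ only in constants of this kind.
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
    sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3)
    return (l1 - mu) / sigma

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))       # i.i.d. stand-in for activity durations
l1 = np.linalg.eigvalsh(X.T @ X)[-1]  # largest eigenvalue of the Wishart matrix
stat = johnstone_normalize(l1, n, p)  # approximately TW1-distributed
```

For pure noise the statistic falls in the central region of the TW1 law, so a value far above its upper quantiles signals a genuine component rather than noise.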

Second step: identify the prospective networks for PCA. Chapter 2 already identified a handful of networks that yielded significant results, so selecting a set from those networks will be helpful for the analysis conducted in Chapter 3. Because running and collecting data from simulations demands considerable time and effort, the following networks will serve for the analysis: j3037-6, j6028-9, j9010-5, and j12014-1 (see Table 2-32 for significant results on these networks).

Third step: simulate a project network schedule to create a data matrix X whose sample size n is the optimal size determined in Chapter 2. Then, use the created sample data matrix to derive the standardized matrix W, from which the correlation matrix R is determined. Finally, based on the significance level α used to find n, determine the sample covariance matrix S. Chapter 2 provides all the necessary formulas.

Fourth step: calculate the necessary eigenvalues for the population correlation and sample

covariance matrices. R represents the population correlation matrix (see Johnstone 2001, p. 304).

Fifth step: construct the scree graphic using the correlation and sample covariance matrices'

eigenvalues. The eigenvalues are ordered decreasingly, and their rank number is shown.

Sixth step: analyze the results and make recommendations for further applications.

Seventh step: devise a linear regression model using the method proposed in Section 3.4 based on the retained significant principal components.

Eighth step: localize phase transition based on the expression of Equation 3.42.
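Steps three through seven above can be sketched compactly. The following Python fragment is a minimal illustration on synthetic data (not the benchmark networks): it standardizes a data matrix, orders the eigenvalues of its correlation matrix, retains components up to the 80% cumulative-variance cutoff applied later in this Chapter, and fits a least-squares regression on the retained component scores:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 120, 30
X = rng.standard_normal((n, p))                          # placeholder activity durations
y = X[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(n)  # stand-in for project delay

# Step 3: standardize the columns and form the correlation matrix R
W = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (W.T @ W) / (n - 1)

# Steps 4-5: eigenvalues in decreasing order (the scree-plot ordering)
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# Retain components up to the 80% cumulative-variance cutoff
cum = np.cumsum(eigval) / eigval.sum()
k = int(np.searchsorted(cum, 0.80)) + 1

# Step 7: least-squares regression of the response on the retained PC scores
scores = W @ eigvec[:, :k]
A = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
```

On real schedule data, X would be the simulated duration matrix of the third step and y the corresponding project delay; the retained count k plays the role of the number of principal components reported in Table 3-2 through Table 3-6.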

3.7.4 Methodology Conclusion

The final step of the approach provided in the preceding section completes the methodology

employed in Chapter 3 to conduct a PCA to be utilized as input in a linear regression model.

3.8 Simulation Results

Following the proposed technique to satisfy the objectives of Chapter 3, the results of simulations

of project network schedules required to undertake the analysis of principal components are as


follows. The networks described in Table 3-1 aided in the investigation. The greyed cells represent

the level of significance of the Kolmogorov-Smirnov Test. The significance level of 0.05 was

found to be the most appropriate for the distributional assumption validation on the probabilistic

durations of project networks and to determine the sample covariance matrix.

                              Significance level α / Probability P
Network     RT       0.01/0.99         0.05/0.95         0.10/0.90       0.20/0.80
                     KS-test 1         KS-test 2         KS-test 3       KS-test 4
j3037-6     0.6875   C(52), D(49)      C(52), D(49)      D(49)           D(49)
j6028-9     0.4030   C(126)            C(126)            C(126)
j9010-5     0.2174   C(175), D(174)    C(175), D(174)    D(174)
j12014-1    0.1780   C(261)            C(261)            C(261)          C(261)
Key: A = 1st Eig, B = 2nd Eig, C = 3rd Eig, D = 4th Eig

Table 3-1: Identified Project Networks for PCA

3.8.1 Analysis of PCs Based on Graphical Analysis and Sample Variabilities

This section examines the outcomes of the project network schedule replications. The simulations aided

in calculating the eigenvalues of the sample covariance matrix S and the population correlation

matrix R=W. Both matrices helped decide the number of principal components to retain for each

identified project network in Table 3-1. Figure 3.10 through Figure 3.13 show two scree plots

derived with the eigenvalues of a correlation matrix (left panel) or a covariance matrix (right panel)

for each network. Some networks feature four plots, whereas others have only two. This

number was determined from Table 3-1. For example, the K-S test accepted the distributional assumption on the durations of the activities of project network j3037-6. At a significance level of α = 0.05, the limiting distribution of the third (resp. fourth) greatest eigenvalue of the matrix R or S,

computed from the sample data matrix X of size 52 (resp. 49), is TW of order 1. Each figure shows that a small number of large sample eigenvalues clearly distinguish themselves from the rest. This behavior is known as a spiked covariance model, described by Equation 3.41.
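The spiked structure can be illustrated with a quick simulation: sampling from an identity covariance with a single inflated direction reproduces the pattern in the scree plots, where the leading sample eigenvalue separates cleanly from the noise bulk. (A hypothetical Python sketch; the spike size and dimensions are arbitrary, not estimated from the benchmark networks.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, spike = 200, 40, 8.0

# Spiked covariance: identity except for one inflated direction
cov = np.eye(p)
cov[0, 0] = spike

X = rng.multivariate_normal(np.zeros(p), cov, size=n)
eig = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

# The leading sample eigenvalue sits far above the noise bulk, whose upper
# edge is roughly (1 + sqrt(p/n))^2, about 2.1 for these dimensions
print(eig[:3])
```

A scree plot of `eig` would show the same single dominant point followed by a smoothly decaying bulk that appears in Figure 3.10 through Figure 3.13.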

Figure 3.10: Scree Plot for Network j3037-6 (Norm II)

Figure 3.11: Scree Plot for Network j6028-9 (Norm II)


Figure 3.12: Scree Plot for Network j9010-5 (Norm II)


Figure 3.13: Scree Plot for Network j12014-1 (Norm II)

Table 3-2 through Table 3-6 present information obtained from the eigenvalues of the population correlation (left) and covariance (right) matrices and the scree plots. Each table provides the ranks of the eigenvalues in the Number column, the eigenvalues in the Eigenv. column, the differences between consecutive eigenvalues in the Differ. column, the proportions of the eigenvalues in the Prop. column, and the cumulative sum of the proportions in the Cumm. column. The cutoff criterion for selection is 80%, and the partial tables show the eigenvalues chosen as principal components for each network: 6 to 8 for the j30 and j60 networks and 14 for the j90 and j120 networks.

Network - j3037-6 / Complexity 0.6875 Network - j3037-6 / Complexity 0.6875


Sample Data Matrix Size Sample Data Matrix Size
49 x 32 49 x 32
Population Correlation Matrix Sample Covariance Matrix
32 x 32 32 x 32
Number Eigenv. Differ. Prop. Cumm. Eigenv. Differ. Prop. Cumm.
1 10.7473 2.9122 0.3359 0.3359 11.2307 2.9451 0.3293 0.3293
2 7.8351 4.5787 0.2448 0.5807 8.2856 4.7167 0.2429 0.5722
3 3.2564 1.3128 0.1018 0.6825 3.5689 1.4342 0.1046 0.6769
4 1.9436 0.2865 0.0607 0.7432 2.1348 0.3330 0.0626 0.7395
5 1.6571 0.4132 0.0518 0.7950 1.8017 0.4535 0.0528 0.7923
6 1.2438 0.2443 0.0389 0.8339 1.3482 0.3391 0.0395 0.8318
7 0.9996 0.0891 0.0312 0.8651 1.0091 0.0151 0.0296 0.8614
8 0.9105 0.1653 0.0285 0.8935 0.9940 0.2025 0.0291 0.8906
9 0.7451 0.2834 0.0233 0.9168 0.7915 0.3070 0.0232 0.9138
10 0.4618 0.0630 0.0144 0.9313 0.4844 0.0629 0.0142 0.9280
11 0.3988 0.0457 0.0125 0.9437 0.4216 0.0350 0.0124 0.9404
12 0.3531 0.0832 0.0110 0.9548 0.3866 0.0818 0.0113 0.9517
13 0.2699 0.0322 0.0084 0.9632 0.3049 0.0219 0.0089 0.9606
14 0.2376 0.0163 0.0074 0.9706 0.2830 0.0419 0.0083 0.9689
15 0.2214 0.0637 0.0069 0.9775 0.2411 0.0628 0.0071 0.9760
16 0.1577 0.0351 0.0049 0.9825 0.1783 0.0578 0.0052 0.9812
17 0.1226 0.0181 0.0038 0.9863 0.1205 0.0102 0.0035 0.9848
18 0.1045 0.0089 0.0033 0.9896 0.1103 0.0242 0.0032 0.9880
19 0.0956 0.0319 0.0030 0.9925 0.0861 0.0167 0.0025 0.9905
20 0.0637 0.0182 0.0020 0.9945 0.0694 0.0063 0.0020 0.9925
21 0.0455 0.0092 0.0014 0.9960 0.0631 0.0212 0.0018 0.9944
22 0.0363 0.0093 0.0011 0.9971 0.0419 0.0051 0.0012 0.9956
23 0.0269 0.0085 0.0008 0.9979 0.0367 0.0081 0.0011 0.9967
24 0.0184 0.0031 0.0006 0.9985 0.0286 0.0062 0.0008 0.9975
25 0.0153 0.0022 0.0005 0.9990 0.0224 0.0034 0.0007 0.9982
26 0.0132 0.0064 0.0004 0.9994 0.0190 0.0048 0.0006 0.9988
27 0.0068 0.0023 0.0002 0.9996 0.0142 0.0129 0.0004 0.9992
28 0.0045 0.0011 0.0001 0.9998 0.0013 -0.0087 0.0000 0.9992
29 0.0033 0.0008 0.0001 0.9999 0.0100 0.0058 0.0003 0.9995
30 0.0025 0.0009 0.0001 0.9999 0.0041 -0.0028 0.0001 0.9996
31 0.0016 0.0010 0.0000 1.0000 0.0069 0.0010 0.0002 0.9998
32 0.0005 0.0000 1.0000 0.0059 0.0002 1.0000
Total 32 34.1050

Table 3-2: Principal Components of the Project Network j3037-6 (Size 49 x 32)

Network - j3037-6 / Complexity 0.6875 Network - j3037-6 / Complexity 0.6875


Sample Data Matrix Size Sample Data Matrix Size
52 x 32 52 x 32
Population Correlation Matrix Sample Covariance Matrix
32 x 32 32 x 32
Number Eigenv. Differ. Prop. Cumm. Eigenv. Differ. Prop. Cumm.
1 10.4668 3.0981 0.3271 0.3271 9.4003 2.6488 0.3225 0.3225
2 7.3687 4.7480 0.2303 0.5574 6.7515 4.3696 0.2316 0.5541
3 2.6207 0.6852 0.0819 0.6393 2.3820 0.6754 0.0817 0.6358
4 1.9355 0.1200 0.0605 0.6997 1.7066 0.0668 0.0585 0.6943
5 1.8156 0.4357 0.0567 0.7565 1.6398 0.3754 0.0563 0.7506
6 1.3799 0.1974 0.0431 0.7996 1.2644 0.1399 0.0434 0.7939
7 1.1825 0.2672 0.0370 0.8366 1.1244 0.2827 0.0386 0.8325
8 0.9153 0.1898 0.0286 0.8652 0.8418 0.1913 0.0289 0.8614
9 0.7255 0.1154 0.0227 0.8878 0.6504 0.1116 0.0223 0.8837
10 0.6101 0.1160 0.0191 0.9069 0.5389 0.0898 0.0185 0.9022

Table 3-3: Principal Components of the Project Network j3037-6 (Size 52 x 32)

Network - j6028-9 / Complexity 0.403 Network - j6028-9 / Complexity 0.403


Sample Data Matrix Size Sample Data Matrix Size
126 x 62 126 x 62
Population Correlation Matrix Sample Covariance Matrix
62 x 62 62 x 62
Number Eigenv. Differ. Prop. Cumm. Eigenv. Differ. Prop. Cumm.
1 24.0643 14.2798 0.3881 0.3881 15.8194 9.3723 0.3864 0.3864
2 9.7845 5.0139 0.1578 0.5459 6.4471 3.3344 0.1575 0.5438
3 4.7706 1.7991 0.0769 0.6229 3.1127 1.1622 0.0760 0.6199
4 2.9715 0.2529 0.0479 0.6708 1.9505 0.1508 0.0476 0.6675
5 2.7186 0.3806 0.0438 0.7147 1.7997 0.2658 0.0440 0.7114
6 2.3380 0.1668 0.0377 0.7524 1.5339 0.1019 0.0375 0.7489
7 2.1712 0.9309 0.0350 0.7874 1.4321 0.6160 0.0350 0.7839
8 1.2403 0.0934 0.0200 0.8074 0.8161 0.0464 0.0199 0.8038
9 1.1469 0.0131 0.0185 0.8259 0.7696 0.0219 0.0188 0.8226
10 1.1338 0.1631 0.0183 0.8442 0.7478 0.0938 0.0183 0.8409
11 0.9707 0.0764 0.0157 0.8598 0.6539 0.0581 0.0160 0.8568
12 0.8943 0.0916 0.0144 0.8743 0.5958 0.0571 0.0146 0.8714
13 0.8027 0.0273 0.0129 0.8872 0.5387 0.0265 0.0132 0.8846

Table 3-4: Principal Components of the Project Network j6028-9

Network - j9010-5 / Complexity 0.2174 Network - j9010-5 / Complexity 0.2174


Sample Data Matrix Size Sample Data Matrix Size
174 x 92 174 x 92
Population Correlation Matrix Sample Covariance Matrix
92 x 92 92 x 92
Number Eigenv. Differ. Prop. Cumm. Eigenv. Differ. Prop. Cumm.
1 21.9584 9.4440 0.2387 0.2387 17.9901 7.7230 0.2379 0.2379
2 12.5143 4.9819 0.1360 0.3747 10.2671 4.0650 0.1358 0.3737
3 7.5324 1.1156 0.0819 0.4566 6.2021 0.9361 0.0820 0.4557
4 6.4168 1.5304 0.0697 0.5263 5.2660 1.2656 0.0696 0.5253
5 4.8864 0.5722 0.0531 0.5794 4.0003 0.4719 0.0529 0.5782
6 4.3141 0.7759 0.0469 0.6263 3.5284 0.6227 0.0467 0.6248
7 3.5382 1.0975 0.0385 0.6648 2.9057 0.9078 0.0384 0.6633
8 2.4407 0.1709 0.0265 0.6913 1.9979 0.1302 0.0264 0.6897
9 2.2698 0.2713 0.0247 0.7160 1.8677 0.2104 0.0247 0.7144
10 1.9985 0.0888 0.0217 0.7377 1.6573 0.0895 0.0219 0.7363
11 1.9097 0.3767 0.0208 0.7585 1.5677 0.3161 0.0207 0.7570
12 1.5330 0.1037 0.0167 0.7751 1.2517 0.0703 0.0166 0.7736
13 1.4293 0.1974 0.0155 0.7907 1.1814 0.1582 0.0156 0.7892
14 1.2319 0.0495 0.0134 0.8041 1.0232 0.0408 0.0135 0.8027
15 1.1824 0.0328 0.0129 0.8169 0.9824 0.0348 0.0130 0.8157
16 1.1496 0.1209 0.0125 0.8294 0.9476 0.1099 0.0125 0.8283
17 1.0287 0.1520 0.0112 0.8406 0.8378 0.1137 0.0111 0.8393
18 0.8766 0.0032 0.0095 0.8501 0.7241 0.0171 0.0096 0.8489
19 0.8734 0.0482 0.0095 0.8596 0.7070 0.0281 0.0093 0.8583
20 0.8252 0.0504 0.0090 0.8686 0.6789 0.0465 0.0090 0.8672
21 0.7748 0.0544 0.0084 0.8770 0.6324 0.0360 0.0084 0.8756
22 0.7204 0.0310 0.0078 0.8848 0.5964 0.0306 0.0079 0.8835
23 0.6894 0.0461 0.0075 0.8923 0.5658 0.0301 0.0075 0.8910
24 0.6433 0.0460 0.0070 0.8993 0.5357 0.0393 0.0071 0.8980
25 0.5973 0.0530 0.0065 0.9058 0.4965 0.0477 0.0066 0.9046

Table 3-5: Principal Components of the Project Network j9010-5

Network - j12014-1 / Complexity 0.178 Network - j12014-1 / Complexity 0.178


Sample Data Matrix Size Sample Data Matrix Size
261 x 122 261 x 122
Population Correlation Matrix Sample Covariance Matrix
122 x 122 122 x 122
Number Eigenv. Differ. Prop. Cumm. Eigenv. Differ. Prop. Cumm.
1 30.0706 8.5257 0.2465 0.2465 20.1573 5.7448 0.2461 0.2461
2 21.5449 13.1694 0.1766 0.4231 14.4125 8.8081 0.1760 0.4221
3 8.3755 2.1461 0.0687 0.4917 5.6044 1.4146 0.0684 0.4905
4 6.2294 0.5517 0.0511 0.5428 4.1898 0.3793 0.0512 0.5417
5 5.6777 1.4191 0.0465 0.5893 3.8105 0.9586 0.0465 0.5882
6 4.2587 0.5772 0.0349 0.6242 2.8519 0.3783 0.0348 0.6230
7 3.6815 0.2354 0.0302 0.6544 2.4735 0.1621 0.0302 0.6532
8 3.4461 0.1727 0.0282 0.6827 2.3115 0.1205 0.0282 0.6815
9 3.2734 0.3661 0.0268 0.7095 2.1910 0.2269 0.0268 0.7082
10 2.9073 0.3983 0.0238 0.7333 1.9641 0.2770 0.0240 0.7322
11 2.5089 0.3331 0.0206 0.7539 1.6871 0.2305 0.0206 0.7528
12 2.1758 0.1236 0.0178 0.7717 1.4566 0.0840 0.0178 0.7706
13 2.0523 0.2563 0.0168 0.7885 1.3726 0.1640 0.0168 0.7873
14 1.7960 0.2116 0.0147 0.8033 1.2086 0.1480 0.0148 0.8021
15 1.5844 0.3039 0.0130 0.8163 1.0606 0.1955 0.0130 0.8151
16 1.2805 0.0801 0.0105 0.8267 0.8651 0.0650 0.0106 0.8256
17 1.2004 0.0765 0.0098 0.8366 0.8001 0.0466 0.0098 0.8354
18 1.1239 0.0781 0.0092 0.8458 0.7535 0.0572 0.0092 0.8446
19 1.0458 0.0829 0.0086 0.8544 0.6962 0.0504 0.0085 0.8531
20 0.9629 0.0665 0.0079 0.8623 0.6458 0.0421 0.0079 0.8610
21 0.8965 0.0312 0.0073 0.8696 0.6037 0.0240 0.0074 0.8683
22 0.8652 0.0296 0.0071 0.8767 0.5797 0.0206 0.0071 0.8754
23 0.8357 0.0601 0.0068 0.8836 0.5590 0.0413 0.0068 0.8822
24 0.7756 0.1030 0.0064 0.8899 0.5178 0.0638 0.0063 0.8886
25 0.6726 0.0411 0.0055 0.8954 0.4539 0.0230 0.0055 0.8941
26 0.6316 0.0072 0.0052 0.9006 0.4309 0.0082 0.0053 0.8994
27 0.6244 0.0602 0.0051 0.9057 0.4227 0.0452 0.0052 0.9045

Table 3-6: Principal Components of the Project Network j12014-1

3.8.2 PCA Based on Hypothesis Testing

The analysis based on the TW p-values was conducted to supplement the complementary analyses based on the scree plots and on the summary of eigenvalue variability.

As expressed by Equation 3.40, the double-sided hypothesis test allows each value in Table 3-2

through Table 3-6 to be tested at the significance level of 0.05 based on the Tracy-Widom p-value

provided in Table 3-7.

TW p-value
Probability    1st Eig.    2nd Eig.    3rd Eig.    4th Eig.
0.005 ‐4.1490 ‐5.7302 ‐7.0585 ‐8.2483
0.025 ‐3.5165 ‐5.1769 ‐6.5488 ‐7.7685
0.05 ‐3.1816 ‐4.8876 ‐6.2822 ‐7.5199
0.1 ‐2.7830 ‐4.5467 ‐5.9727 ‐7.2283
0.3 ‐1.9116 ‐3.8156 ‐5.3108 ‐6.6121
0.5 ‐1.2694 ‐3.2911 ‐4.8401 ‐6.1753
0.7 ‐0.5924 ‐2.7509 ‐4.3602 ‐5.7318
0.8 ‐0.1662 ‐2.4160 ‐4.0650 ‐5.4600
0.9 0.4495 ‐1.9431 ‐3.6489 ‐5.0793
0.95 0.9789 ‐1.5422 ‐3.3004 ‐4.7613
0.975 1.4530 ‐1.1893 ‐2.9948 ‐4.4829
0.99 2.0232 ‐0.7703 ‐2.6346 ‐4.1561
0.995 2.4217 ‐0.4810 ‐2.3875 ‐3.9317

Table 3-7: p-Values of the Tracy-Widom Distribution

It is worth noting that Equation 3.40 only describes the test for the first largest eigenvalue using

the Norm I normalization approach. Adjusting the hypothesis test statement for the successive

eigenvalues and the normalization procedure Norm II is necessary. The publication by Saccinti and Timmerman (2017) contains valuable information for redefining Equation 3.40 for both normalization approaches and for successive eigenvalues. The availability of TW p-values beyond the fourth eigenvalue may be a problem; the RMTFredholmToolbox implemented in Matlab (Bornemann, 2009, 2010) is an excellent tool for computing them for those familiar with Matlab.
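Given a normalized eigenvalue statistic, the double-sided test at α = 0.05 reduces to a comparison against the 0.025 and 0.975 quantiles in Table 3-7. A minimal sketch for the first eigenvalue (the function name is illustrative; the quantiles are copied from the table):

```python
# TW1 quantiles for the first eigenvalue, copied from Table 3-7
TW1_LOWER = -3.5165  # probability 0.025
TW1_UPPER = 1.4530   # probability 0.975

def tw_two_sided_reject(stat, lo=TW1_LOWER, hi=TW1_UPPER):
    """Reject the noise-only null at alpha = 0.05 when the normalized
    largest eigenvalue falls outside the central TW1 region."""
    return stat < lo or stat > hi

print(tw_two_sided_reject(2.8))   # clearly significant component -> True
print(tw_two_sided_reject(-1.2))  # typical TW1 value -> False
```

Tests for the second through fourth eigenvalues follow the same pattern using the corresponding columns of Table 3-7.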

3.8.3 Phase Transition

Investigating the Presence of a Phase Transition in Schedules for Construction Projects

If the first population eigenvalue is less than the threshold, the sample eigenvalues are Tracy-

Widom distributed, and hence the first component cannot be distinguished from noise (Baik et al.,

2005). As seen in Figure 3.9 below, a so-called phase transition occurs when the sample

eigenvalues are over the threshold (Saccinti et al. 2017).

Threshold Value for a Phase Transition

                                        First Eigenvalue
Network      Size        n     p     Correlation   Covariance    1 + √(p/n)
j3037-6      49 x 32     49    32    10.7473       11.2307       1.8081
j3037-6      52 x 32     52    32    10.4668       9.4003        1.7845
j6028-9      126 x 62    126   62    24.0643       15.8194       1.7015
j9010-5      174 x 92    174   92    21.9584       17.9901       1.7271
j12014-1     261 x 122   261   122   30.0706       20.1573       1.6837

Table 3-8: Threshold Value for a Phase Transition (Baik et al. 2005)
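The last column of Table 3-8 can be reproduced directly from the detectability condition of Baik et al. (2005), which compares the leading population eigenvalue to the threshold 1 + √(p/n). A short check against the tabulated values:

```python
import math

def bbp_threshold(n, p):
    # Baik-Ben Arous-Peche (2005): a leading population eigenvalue is
    # detectable above the noise only if it exceeds 1 + sqrt(p/n).
    return 1.0 + math.sqrt(p / n)

# (n, p, first correlation eigenvalue) taken from Table 3-8
networks = {
    "j3037-6 (49 x 32)": (49, 32, 10.7473),
    "j6028-9":           (126, 62, 24.0643),
    "j9010-5":           (174, 92, 21.9584),
    "j12014-1":          (261, 122, 30.0706),
}
for name, (n, p, lam1) in networks.items():
    thr = bbp_threshold(n, p)
    print(f"{name}: threshold {thr:.4f}, first eigenvalue {lam1}, above: {lam1 > thr}")
```

For every listed network the first eigenvalue lies well above the threshold, which is what signals the phase transition discussed in this section.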

3.9 Conclusions and Contributions to the Body of Knowledge

The eigenvalues of correlation and covariance matrices obtained from a few benchmark project network schedules provided greater insight into the limiting behaviors of probabilistic activity durations, an investigation begun in Chapter 2. The scree plots and the proportions of the total population (sample) variance attributed to each variable (component) aided in deciding the number of principal components to retain for each network considered in the experiment. Pending further research, the number of retained principal components appears to be proportional to the size of the project networks. Furthermore, the analysis indicated that Johnstone's (2001) spiked covariance model is a viable candidate for identifying the limiting durations of project network

activities. The spiked model, an empirical derivation of Johnstone (2001), is a covariance matrix

with a specific structure used to characterize the behavior of a system having one or more

prominent eigenvalues that are easily differentiated from the rest of the data. Furthermore, the

principal component analysis based on hypothesis testing with TW p-values had limitations that

prevented the specified null hypothesis from being adequately evaluated. Finally, the calculated

threshold value indicated the occurrence of phase transitions in the project network schedules.

3.9.1 Research Contribution 1

Develop a linear regression model using Johnstone's (2001) spiked covariance model. This model

is expected to help predict the total project duration by predicting the limiting duration of each

activity on the project network. This model should be a powerful tool for project managers,

especially for budgeting and tracking projects.

3.9.2 Research Contribution 2

Propose a method for determining the principal components based on the correlation or covariance

matrices derived from project network schedules governed by the universal law of the Tracy-

Widom distribution.

3.9.3 Research Contribution 3

Propose a process for determining the number of principal components to include in a predictive

model that can be used as a prelude to PCA-based models in construction project schedules.

3.9.4 Research Contribution 4

Provide guidance for incorporating the phase transition into construction project networks under the TW phase transition framework. The location of this phase transition is expected to assist practitioners in

identifying the moment at which a construction project schedule may reach a tipping point.

3.9.5 Recommendations for Future Research

The findings of the experiments with a few project networks show that more research is needed to

analyze the principal components of project networks of various sizes and complexities. This will

aid in the development of guidelines for constructing PCA-based models for project networks. In

addition, the spiked model and phase transition are additional subjects that need to be explored

further to develop project scheduling techniques.

3.10 Conclusion

The proposed methodology, which necessitated lengthy simulations, resulted in the discovery of a new model based on Johnstone's (2001) spiked covariance model. This model can be used to forecast

the limiting durations of project activities through a PCA-based linear regression approach.

Additionally, this study established the position of a phase transition in project network schedules

using the Tracy-Widom distribution's universality. A phase transition identifies a key zone of

crossover. Across this critical zone, the system transitions from a weak-coupling (stable) phase to a strong-coupling (unstable) phase. This discovery is critical because it is likely to assist practitioners in

identifying the moment at which a construction project schedule may become unstable.

References

Allen, D. M. (1974). "The relationship between variable selection and data augmentation and a
method for prediction." Technometrics, 16(1), 125-127.
Al-Sabah, R., Menassa, C. C., and Hanna, A. (2014). "Evaluating impact of construction risks in
the Arabian Gulf Region from perspective of multinational architecture, engineering and
construction firms." Constr.Manage.Econ., 32(4), 382-402.
Anderson, T. W. (2003). An introduction to multivariate statistical analysis. Wiley-Interscience, Hoboken, N.J.
Arabzadeh, R., Kholoosi, M. M., and Bazrafshan, J. (2016). "Regional hydrological drought
monitoring using principal components analysis." J.Irrig.Drain.Eng., 142(1), 04015029.
Baik, J., Arous, G. B., and Péché, S. (2005). "Phase transition of the largest eigenvalue for
nonnull complex sample covariance matrices." The Annals of Probability, 33(5), 1643-1697.
Baik, J., Deift, P., and Johansson, K. (1999). "On the distribution of the length of the longest
increasing subsequence of random permutations." Journal of the American Mathematical
Society, 12(4), 1119-1178.
Baik, J., and Silverstein, J. W. (2006). "Eigenvalues of large sample covariance matrices of
spiked population models." Journal of Multivariate Analysis, 97(6), 1382-1408.
Bartlett, M. S. (1950). "Tests of significance in factor analysis." Br. J. Psychol.
Bejan, A. (2005). "Largest eigenvalues and sample covariance matrices. Tracy-Widom and
Painlevé II: computational aspects and realization in s-plus with applications." Preprint:
Http://Www.Vitrum.Md/Andrew/MScWrwck/TWinSplus.Pdf, .
Bianchini, A. (2014). "Pavement Maintenance Planning at the Network Level with Principal
Component Analysis." J Infrastruct Syst, 20(2), 4013013.
Bornemann, F. (2010). "On the numerical evaluation of Fredholm determinants." Mathematics of
Computation, 79(270), 871-915.
Cangelosi, R., and Goriely, A. (2007). "Component retention in principal component analysis
with application to cDNA microarray data." Biology Direct, 2(1), 1-21.
Cattell, R. B. (1966a). "The scree test for the number of factors." Multivariate Behavioral
Research, 1(2), 245-276.
Cattell, R. B. (1966b). "The scree test for the number of factors." Multivariate Behavioral
Research, 1(2), 245-276.

Chai, C. S., Yusof, A. M., and Habil, H. (2015). "Delay mitigation in the Malaysian housing
industry: A structural equation modeling approach." Journal of Construction in Developing
Countries, 20(1), 65.
Choi, Y., Ng, C. T., and Lim, J. (2017). "Regularized LRT for large scale covariance matrices:
One sample problem." Journal of Statistical Planning and Inference, 180 108-123.
Craddock, J. M., and Flood, C. R. (1969). "Eigenvectors for representing the 500 mb
geopotential surface over the Northern Hemisphere." Q.J.R.Meteorol.Soc., 95(405), 576-593.
Dao, B., Anderson, S., and Esmaeili, B. (2017). "Developing a Satisfactory Input for Project
Complexity Model Using Principal Component Analysis (PCA)." Computing in Civil
Engineering, 125-131.
Donald, D. A., Everingham, Y. L., McKinna, L. W., and Coomans, D. (2009). "3.23 - Feature
Selection in the Wavelet Domain: Adaptive Wavelets." Comprehensive Chemometrics, S. D.
Brown, R. Tauler, and B. Walczak, eds., Elsevier, Oxford, 647-679.
Dyer, T. G. (1975). "The assignment of rainfall stations into homogeneous groups: an application
of principal component analysis." Q.J.R.Meteorol.Soc., 101(430), 1005-1013.
El-Kholy, A. M. (2021). "Exploring the best ANN model based on four paradigms to predict
delay and cost overrun percentages of highway projects." International Journal of Construction
Management, 21(7), 694-712.
Forkman, J., Josse, J., and Piepho, H. (2019). "Hypothesis tests for principal component analysis
when variables are standardized." Journal of Agricultural, Biological and Environmental
Statistics, 24(2), 289-308.
Franklin, S. B., Gibson, D. J., Robertson, P. A., Pohlmann, J. T., and Fralish, J. S. (1995).
"Parallel analysis: a method for determining significant principal components." Journal of
Vegetation Science, 6(1), 99-106.
García-Alvarez, D. (2009). "Fault detection using principal component analysis (PCA) in a
wastewater treatment plant (WWTP)." Proceedings of the International Student's Scientific
Conference, 55-60.
Ghosh, S., and Jintanapakanont, J. (2004). "Identifying and assessing the critical risk factors in
an underground rail project in Thailand: a factor analysis approach." Int.J.Project
Manage., 22(8), 633-643.
Gleser, L. J. (1966). "A note on the sphericity test." The Annals of Mathematical Statistics, 37(2),
464-467.
Greenacre, M., and Hastie, T. (1987). "The geometric interpretation of correspondence
analysis." Journal of the American Statistical Association, 82(398), 437-447.

Horn, J. L. (1965). "A rationale and test for the number of factors in factor
analysis." Psychometrika, 30(2), 179-185.
Hotelling, H. (1933). "Analysis of a complex of statistical variables into principal
components." J.Educ.Psychol., 24(6), 417.
Ikediashi, D. I., Ogunlana, S. O., and Alotaibi, A. (2014). "Analysis of project failure factors for
infrastructure projects in Saudi Arabia: A multivariate approach." Journal of Construction in
Developing Countries, 19(1), 35.
Johansson, K. (1998). "The longest increasing subsequence in a random permutation and a
unitary random matrix model." Mathematical Research Letters, 5(1), 68-82.
John, S. (1971). "Some optimal multivariate tests." Biometrika, 58(1), 123-127.
Johnson, R. A., and Wichern, D. W. (2019). Applied Multivariate Statistical Analysis (Classic
Version), 6th Edition. Pearson Prentice Hall, Upper Saddle River, New Jersey.
Johnstone, I. M. (2001). "On the distribution of the largest eigenvalue in principal components
analysis." Annals of Statistics, 295-327.
Johnstone, I. M., and Paul, D. (2018). "PCA in high dimensions: An orientation." Proc
IEEE, 106(8), 1277-1292.
Jolliffe, I. T. (2002). Principal Component Analysis, Springer, 2nd Edition.
Kaiser, H. F. (1960). "The application of electronic computers to factor analysis." Educational
and Psychological Measurement, 20(1), 141-151.
Karji, A., Namian, M., and Tafazzoli, M. (2020). "Identifying the key barriers to promote
sustainable construction in the United States: a principal component
analysis." Sustainability, 12(12), 5088.
Kendall, M. G., and Stuart, A. (1968). The Advanced Theory of Statistics, Vol. 3. Charles
Griffin & Co., London.
Kevric, J., and Subasi, A. (2014). "The effect of multiscale PCA de-noising in epileptic seizure
detection." J.Med.Syst., 38(10), 1-13.
Korin, B. P. (1968). "On the distribution of a statistic used for testing a covariance
matrix." Biometrika, 55(1), 171-178.
Lam, K. C., Hu, T. S., and Ng, S. T. (2005). "Using the principal component analysis method as
a tool in contractor pre‐qualification." Constr.Manage.Econ., 23(7), 673-684.
Li, T., Zhang, H., Yuan, C., Liu, Z., and Fan, C. (2012). "A PCA-based method for construction
of composite sustainability indicators." The International Journal of Life Cycle
Assessment, 17(5), 593-603.
Majumdar, S. N., and Schehr, G. (2014). "Top eigenvalue of a random matrix: large deviations
and third order phase transition." Journal of Statistical Mechanics: Theory and
Experiment, 2014(1), P01012.
Mauchly, J. W. (1940). "Significance test for sphericity of a normal n-variate distribution." The
Annals of Mathematical Statistics, 11(2), 204-209.
May, R. M. (1972). "Will a large complex system be stable?" Nature, 238(5364), 413-414.
McCord, J., McCord, M., Davis, P. T., Haran, M., and Rodgers, W. J. (2015). "Understanding
delays in housing construction: evidence from Northern Ireland." Journal of Financial
Management of Property and Construction.
Naik, G. R. (2017). Advances in principal component analysis: research and
development. Springer.
Nam, K., Ifaei, P., Heo, S., Rhee, G., Lee, S., and Yoo, C. (2019). "An efficient burst detection
and isolation monitoring system for water distribution networks using multivariate statistical
techniques." Sustainability, 11(10), 2970.
Palau, C. V., Arregui, F. J., and Carlos, M. (2012). "Burst Detection in Water Networks Using
Principal Component Analysis." J Water Res Plan Man, 138(1), 47-54.
Pearson, K. (1901). "LIII. On lines and planes of closest fit to systems of points in space." The
London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11), 559-
572.
Péché, S. (2009). "Universality results for the largest eigenvalues of some sample covariance
matrix ensembles." Probability Theory and Related Fields, 143(3), 481-516.
Péché, S. (2008). "The edge of the spectrum of random matrices." Habilitation à Diriger des
Recherches, Université Joseph Fourier Grenoble I, to be submitted.
Pillai, K., and Nagarsenker, B. N. (1971). "On the distribution of the sphericity test criterion in
classical and complex normal populations having unknown covariance matrices." The Annals of
Mathematical Statistics, 42(2), 764-767.
Reddy, G. T., Reddy, M. P. K., Lakshmanna, K., Kaluri, R., Rajput, D. S., Srivastava, G., and
Baker, T. (2020). "Analysis of dimensionality reduction techniques on big data." IEEE Access, 8,
54776-54788.
Robey, R. R., and Barcikowski, R. S. (1987). "Sphericity Tests and Repeated Measures Data."
Roweis, S. T., and Saul, L. K. (2000). "Nonlinear dimensionality reduction by locally linear
embedding." Science, 290(5500), 2323-2326.
Roy, S. N. (1953). "On a heuristic method of test construction and its use in multivariate
analysis." The Annals of Mathematical Statistics, 24(2), 220-238.
Saccenti, E., and Timmerman, M. E. (2017). "Considering Horn's parallel analysis from a
random matrix theory point of view." Psychometrika, 82(1), 186-209.
Shin, E., Hwang, C. E., Lee, B. W., Kim, H. T., Ko, J. M., Baek, I. Y., Lee, Y., Choi, J. S., Cho,
E. J., and Seo, W. T. (2012). "Chemometric approach to fatty acid profiles in soybean cultivars
by principal component analysis (PCA)." Preventive Nutrition and Food Science, 17(3), 184.
Shubham, K. (2021). "Covid-19 Data Analysis For Second Wave Indian Pandemic SEIR Model
By Using Principal Component Analysis Tool." Turkish Journal of Computer and Mathematics
Education (TURCOMAT), 12(9), 2907-2915.
Sorzano, C. O. S., Vargas, J., and Montano, A. P. (2014). "A survey of dimensionality reduction
techniques." arXiv preprint arXiv:1403.2877.
Tahir, M. M., Haron, N. A., Alias, A. H., and Diugwu, I. A. (2017). "Causes of delay and cost
overrun in Malaysian construction industry." Global Civil Engineering Conference, Springer, 47-
57.
Van Der Maaten, L., Postma, E., and Van den Herik, J. (2009). "Dimensionality reduction: a
comparative review." J Mach Learn Res, 10(66-71), 13.
Wang, Q., and Yao, J. (2013). "On the sphericity test with large-dimensional
observations." Electronic Journal of Statistics, 7, 2164-2192.
Zhang, X., Huang, S., Yang, S., Tu, R., and Jin, L. (2020). "Safety assessment in road
construction work system based on group AHP-PCA." Mathematical Problems in
Engineering, 2020.
Zou, C., Peng, L., Feng, L., and Wang, Z. (2014). "Multivariate sign-based high-dimensional
tests for sphericity." Biometrika, 101(1), 229-236.

CHAPTER 4
Summary and Conclusions

Chapter Summary

The following are essential conclusions of this study.

4.1 Conclusions and Contributions to the Body of Knowledge

4.1.1 Introduction (Chapter 1)

The introductory chapter of this study provided the framework for the subsequent work.

In addition, its framework may serve as a model for others, particularly for subjects that are being

applied for the first time in a particular discipline. The key is to select and present pertinent

background material that adequately prepares readers and researchers for the task ahead.

4.1.2 An Investigation of the Underlying Behavior of Construction Project Network

Schedules (Chapter 2)

The extensive empirical analysis, which was developed by adopting and adapting proven

procedures from other fields of application of the TW limiting laws, led to achieving the chapter's

objectives, which were defined based on the scope of its examination. The conclusions and

recommendations for this chapter are as follows. (1) Add construction project management and

engineering to the list of domains where the Tracy-Widom limit laws based on Random Matrix

Theory (RMT) have successfully examined large dimensionality complex systems; (2) Propose a

mathematical model for project network schedules based on well-established results in probability and

statistics and project-scheduling approaches, which can serve to investigate their behavior and improve
existing scheduling techniques; (3) Based on the newly established pattern for project network

schedules, it is possible to evaluate predictions regarding the durations of project network activities

and the entire project; (4) Create a methodology based on multivariate statistical and graphical data

analysis techniques, which assisted in demonstrating that the Tracy-Widom limit law of order 1

governs the joint sampling distribution of project activity durations that may be used to forecast the

project's limiting duration and the limiting duration of each activity comprising the project network

schedule, beyond which any delay will be irreversible; (5) Initiate a research study to investigate the

relationships between a measure of project network complexity and the sample size required to draw

an optimum number of samples from a population of identically distributed activity durations as a

prerequisite for studying project networks using RMT.
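The centering and scaling at the heart of conclusions (3) and (4) can be illustrated with a short sketch. The following is a minimal illustration, not the exact procedure of this dissertation; the function name and simulated data are assumptions. It applies Johnstone's (2001) centering and scaling constants to the largest eigenvalue of a sample covariance matrix so that the result can be compared against the Tracy-Widom distribution of order 1:

```python
import numpy as np

def tw_normalize_largest_eigenvalue(X):
    """Center and scale the largest sample-covariance eigenvalue with
    Johnstone's (2001) constants, so the statistic can be referred to
    the Tracy-Widom distribution of order 1 (illustrative sketch)."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)                    # p x p sample covariance
    l1 = (n - 1) * np.linalg.eigvalsh(S)[-1]       # largest eigenvalue of the Gram matrix
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2        # centering constant
    sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (
        1.0 / np.sqrt(n - 1) + 1.0 / np.sqrt(p)) ** (1.0 / 3.0)  # scaling constant
    return (l1 - mu) / sigma

# Hypothetical data: 200 samples of 50 i.i.d. standard normal "durations".
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
stat = tw_normalize_largest_eigenvalue(X)          # fluctuates on the TW1 scale
```

For null (white-noise) data such as this, the statistic typically falls within a few units of zero; a value far in the right tail would indicate a dominant eigenvalue inconsistent with the null hypothesis.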

4.1.3 Application of PCA for Data Reduction in Modeling Project Network Schedules

Based on the Universality Concept in RMT (Chapter 3)

In PCA, eigenvalues and eigenvectors serve to reduce the data. The eigenvalues

of correlation and covariance matrices derived from a few benchmark project network schedules

shed additional light on the limiting behaviors of probabilistic durations of project activities

discussed in Chapter 2. The scree plots and proportions of the total variance were used to decide

how many PCs to keep for each network in the experiment. Pending further research, the

number of retained PCs appears to be related to project network size. The investigation also

revealed that Johnstone's (2001) spiked covariance model could be used to detect project network

activity time limits. It is a covariance matrix with a specific structure that is used to characterize

the behavior of a system with one or more significant eigenvalues that stand out from the rest of

the data. Moreover, the principal component analysis using TW p-values had a few restrictions

that hindered evaluating the null hypothesis. Finally, the determined threshold value reflects the

phase changes of the project network schedule.
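The scree-plot and proportion-of-variance procedure used to decide how many PCs to keep can be sketched as follows; this is a hedged illustration with simulated data, and the 90% variance threshold and function name are assumptions rather than values prescribed by this study:

```python
import numpy as np

def num_pcs_to_keep(durations, threshold=0.90):
    """Retain the smallest number of principal components whose
    cumulative proportion of total variance reaches `threshold`
    (illustrative sketch of the scree/variance criterion)."""
    R = np.corrcoef(durations, rowvar=False)    # correlation matrix of activities
    eigvals = np.linalg.eigvalsh(R)[::-1]       # eigenvalues, descending
    cum = np.cumsum(eigvals) / eigvals.sum()    # cumulative variance proportions
    return int(np.searchsorted(cum, threshold) + 1)

# Hypothetical sample: 300 draws of 12 activity durations.
rng = np.random.default_rng(1)
samples = rng.standard_normal((300, 12))
k = num_pcs_to_keep(samples)                    # number of PCs to retain
```

Because uncorrelated data spread variance nearly evenly across components, most of the 12 PCs are retained here; strongly correlated durations would concentrate variance in the first few PCs and yield a much smaller count.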

The contributions are as follows: (1) Develop a linear regression model using Johnstone's (2001)

spiked covariance model. This model is expected to help predict the total project duration by

predicting the limiting duration of each activity on the project network. This model should be a

powerful tool for project managers, especially for budgeting and tracking projects; (2) Propose a

method for determining the principal components based on the correlation or covariance matrices

derived from project network schedules governed by the universal law of the Tracy-Widom

distribution; (3) Propose a process for determining the number of principal components to include

in a predictive model that can be used as a prelude to PCA-based models in construction project

schedules; (4) Provide guidance on how to incorporate the phase transition into construction project

networks under the TW phase transition framework.
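The phase transition referenced in contribution (4) can be made concrete with the detection threshold of the spiked-model literature (Baik, Ben Arous, and Péché). The sketch below is a hedged illustration under standard spiked-covariance assumptions, with hypothetical names and inputs: a population spike separates from the Marchenko-Pastur bulk only when it exceeds 1 + sqrt(p/n), in which case the top sample eigenvalue concentrates at a predictable location:

```python
import numpy as np

def spike_detectable(spike, n, p):
    """Check the spiked-model phase transition: a population eigenvalue
    `spike` (all others equal to 1) is detectable only above the
    threshold 1 + sqrt(p/n). Returns (detectable, predicted location
    of the top sample eigenvalue, or None below the transition)."""
    gamma = p / n
    threshold = 1.0 + np.sqrt(gamma)
    if spike <= threshold:
        return False, None                            # spike absorbed by the bulk
    predicted = spike * (1.0 + gamma / (spike - 1.0))
    return True, predicted

# Hypothetical setting: 400 samples, 100 activities, spike of size 3.
ok, lam = spike_detectable(spike=3.0, n=400, p=100)   # gamma = 0.25, threshold = 1.5
```

Here the spike of 3.0 exceeds the threshold of 1.5, so the top sample eigenvalue is predicted near 3.0 × (1 + 0.25/2) = 3.375; a spike of 1.2 would be swallowed by the bulk and the function would return (False, None).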

4.2 Recommendations for Future Research

4.2.1 An Investigation of the Underlying Behavior of Construction Project Network

Schedules (Chapter 2-Recommendations)

The recommendations resulting from Chapter 2 for future research are as follows: (1) While the

current study used only project network schedules from the Project Scheduling Problem Library

(PSPLIB), whose maximum size is 120, future research should extend the analysis to include larger

network schedules from fictitious and real-life projects; (2) Because this analysis revealed no

correlation between restrictiveness (RT) and the number of samples required to satisfy the

conditions of applying TW-based universal theorems, future research should investigate

alternative complexity metrics, such as the other five identified by this study; (3) While the

normalization approach utilized in this study was specified as a function of n and p using

Johnstone's (2001) celebrated theorem, especially its ad hoc version, future studies might explore

employing a more extended formulation of the centering and scaling functions. Péché (2008), who

expanded on Soshnikov's work, provides an example of such formulation. Using such a

formulation is anticipated to help improve the performance of hypothesis testing necessary to

validate distributional assumptions in circumstances where the sample covariance matrix's first or

second largest eigenvalue and a supercomputer are available for speedy simulations; (4) Future

research may include extending the study to at least the fifth and sixth greatest eigenvalues when

employing the normalization method derived from the work of Johansson (1998) and Baik et al.

(1999). Bornemann (2009) determined numerical approximations to the Tracy-Widom

distribution functions (CDFs) up to the sixth greatest eigenvalue, allowing for this expansion.

4.2.2 Application of PCA for Data Reduction in Modeling Project Network Schedules

Based on the Universality Concept in RMT (Chapter 3-Recommendations)

The findings of the experiments with a few project networks show that more research is needed to

analyze the principal components of project networks of various sizes and complexities. This will

aid in the development of guidelines for constructing PCA-based models for project networks. In

addition, the spiked model and phase transition are additional subjects that need further exploration

to develop project scheduling techniques.

4.2.3 Conclusion

This final chapter concludes the inquiry undertaken in this research study. The work yielded

fascinating discoveries and insights that will hopefully help enhance project scheduling

procedures and equip project managers with the modern techniques needed to meet the

challenges of the era of enormous data production.

APPENDIXES

Appendix A
Original PSPLIB Files
Appendix A.1: PSPLIB Files Converted from .sm to .txt Format

J30 Networks
No. File Name No. File Name No. File Name No. File Name
1 j3010_1.sm 44 j3014_3.sm 87 j3018_6.sm 130 j3021_9.sm
2 j3010_10.sm 45 j3014_4.sm 88 j3018_7.sm 131 j3022_1.sm
3 j3010_2.sm 46 j3014_5.sm 89 j3018_8.sm 132 j3022_10.sm
4 j3010_3.sm 47 j3014_6.sm 90 j3018_9.sm 133 j3022_2.sm
5 j3010_4.sm 48 j3014_7.sm 91 j3019_1.sm 134 j3022_3.sm
6 j3010_5.sm 49 j3014_8.sm 92 j3019_10.sm 135 j3022_4.sm
7 j3010_6.sm 50 j3014_9.sm 93 j3019_2.sm 136 j3022_5.sm
8 j3010_7.sm 51 j3015_1.sm 94 j3019_3.sm 137 j3022_6.sm
9 j3010_8.sm 52 j3015_10.sm 95 j3019_4.sm 138 j3022_7.sm
10 j3010_9.sm 53 j3015_2.sm 96 j3019_5.sm 139 j3022_8.sm
11 j3011_1.sm 54 j3015_3.sm 97 j3019_6.sm 140 j3022_9.sm
12 j3011_10.sm 55 j3015_4.sm 98 j3019_7.sm 141 j3023_1.sm
13 j3011_2.sm 56 j3015_5.sm 99 j3019_8.sm 142 j3023_10.sm
14 j3011_3.sm 57 j3015_6.sm 100 j3019_9.sm 143 j3023_2.sm
15 j3011_4.sm 58 j3015_7.sm 101 j301_1.sm 144 j3023_3.sm
16 j3011_5.sm 59 j3015_8.sm 102 j301_10.sm 145 j3023_4.sm
17 j3011_6.sm 60 j3015_9.sm 103 j301_2.sm 146 j3023_5.sm
18 j3011_7.sm 61 j3016_1.sm 104 j301_3.sm 147 j3023_6.sm
19 j3011_8.sm 62 j3016_10.sm 105 j301_4.sm 148 j3023_7.sm
20 j3011_9.sm 63 j3016_2.sm 106 j301_5.sm 149 j3023_8.sm
21 j3012_1.sm 64 j3016_3.sm 107 j301_6.sm 150 j3023_9.sm
22 j3012_10.sm 65 j3016_4.sm 108 j301_7.sm 151 j3024_1.sm
23 j3012_2.sm 66 j3016_5.sm 109 j301_8.sm 152 j3024_10.sm
24 j3012_3.sm 67 j3016_6.sm 110 j301_9.sm 153 j3024_2.sm
25 j3012_4.sm 68 j3016_7.sm 111 j3020_1.sm 154 j3024_3.sm
26 j3012_5.sm 69 j3016_8.sm 112 j3020_10.sm 155 j3024_4.sm
27 j3012_6.sm 70 j3016_9.sm 113 j3020_2.sm 156 j3024_5.sm
28 j3012_7.sm 71 j3017_1.sm 114 j3020_3.sm 157 j3024_6.sm
29 j3012_8.sm 72 j3017_10.sm 115 j3020_4.sm 158 j3024_7.sm
30 j3012_9.sm 73 j3017_2.sm 116 j3020_5.sm 159 j3024_8.sm
31 j3013_1.sm 74 j3017_3.sm 117 j3020_6.sm 160 j3024_9.sm
32 j3013_10.sm 75 j3017_4.sm 118 j3020_7.sm 161 j3025_1.sm
33 j3013_2.sm 76 j3017_5.sm 119 j3020_8.sm 162 j3025_10.sm
34 j3013_3.sm 77 j3017_6.sm 120 j3020_9.sm 163 j3025_2.sm
35 j3013_4.sm 78 j3017_7.sm 121 j3021_1.sm 164 j3025_3.sm
36 j3013_5.sm 79 j3017_8.sm 122 j3021_10.sm 165 j3025_4.sm
37 j3013_6.sm 80 j3017_9.sm 123 j3021_2.sm 166 j3025_5.sm
38 j3013_7.sm 81 j3018_1.sm 124 j3021_3.sm 167 j3025_6.sm
39 j3013_8.sm 82 j3018_10.sm 125 j3021_4.sm 168 j3025_7.sm

40 j3013_9.sm 83 j3018_2.sm 126 j3021_5.sm 169 j3025_8.sm
41 j3014_1.sm 84 j3018_3.sm 127 j3021_6.sm 170 j3025_9.sm
42 j3014_10.sm 85 j3018_4.sm 128 j3021_7.sm 171 j3026_1.sm
43 j3014_2.sm 86 j3018_5.sm 129 j3021_8.sm 172 j3026_10.sm
173 j3026_2.sm 219 j302_8.sm 265 j3034_4.sm 311 j3039_1.sm
174 j3026_3.sm 220 j302_9.sm 266 j3034_5.sm 312 j3039_10.sm
175 j3026_4.sm 221 j3030_1.sm 267 j3034_6.sm 313 j3039_2.sm
176 j3026_5.sm 222 j3030_10.sm 268 j3034_7.sm 314 j3039_3.sm
177 j3026_6.sm 223 j3030_2.sm 269 j3034_8.sm 315 j3039_4.sm
178 j3026_7.sm 224 j3030_3.sm 270 j3034_9.sm 316 j3039_5.sm
179 j3026_8.sm 225 j3030_4.sm 271 j3035_1.sm 317 j3039_6.sm
180 j3026_9.sm 226 j3030_5.sm 272 j3035_10.sm 318 j3039_7.sm
181 j3027_1.sm 227 j3030_6.sm 273 j3035_2.sm 319 j3039_8.sm
182 j3027_10.sm 228 j3030_7.sm 274 j3035_3.sm 320 j3039_9.sm
183 j3027_2.sm 229 j3030_8.sm 275 j3035_4.sm 321 j303_1.sm
184 j3027_3.sm 230 j3030_9.sm 276 j3035_5.sm 322 j303_10.sm
185 j3027_4.sm 231 j3031_1.sm 277 j3035_6.sm 323 j303_2.sm
186 j3027_5.sm 232 j3031_10.sm 278 j3035_7.sm 324 j303_3.sm
187 j3027_6.sm 233 j3031_2.sm 279 j3035_8.sm 325 j303_4.sm
188 j3027_7.sm 234 j3031_3.sm 280 j3035_9.sm 326 j303_5.sm
189 j3027_8.sm 235 j3031_4.sm 281 j3036_1.sm 327 j303_6.sm
190 j3027_9.sm 236 j3031_5.sm 282 j3036_10.sm 328 j303_7.sm
191 j3028_1.sm 237 j3031_6.sm 283 j3036_2.sm 329 j303_8.sm
192 j3028_10.sm 238 j3031_7.sm 284 j3036_3.sm 330 j303_9.sm
193 j3028_2.sm 239 j3031_8.sm 285 j3036_4.sm 331 j3040_1.sm
194 j3028_3.sm 240 j3031_9.sm 286 j3036_5.sm 332 j3040_10.sm
195 j3028_4.sm 241 j3032_1.sm 287 j3036_6.sm 333 j3040_2.sm
196 j3028_5.sm 242 j3032_10.sm 288 j3036_7.sm 334 j3040_3.sm
197 j3028_6.sm 243 j3032_2.sm 289 j3036_8.sm 335 j3040_4.sm
198 j3028_7.sm 244 j3032_3.sm 290 j3036_9.sm 336 j3040_5.sm
199 j3028_8.sm 245 j3032_4.sm 291 j3037_1.sm 337 j3040_6.sm
200 j3028_9.sm 246 j3032_5.sm 292 j3037_10.sm 338 j3040_7.sm
201 j3029_1.sm 247 j3032_6.sm 293 j3037_2.sm 339 j3040_8.sm
202 j3029_10.sm 248 j3032_7.sm 294 j3037_3.sm 340 j3040_9.sm
203 j3029_2.sm 249 j3032_8.sm 295 j3037_4.sm 341 j3041_1.sm
204 j3029_3.sm 250 j3032_9.sm 296 j3037_5.sm 342 j3041_10.sm
205 j3029_4.sm 251 j3033_1.sm 297 j3037_6.sm 343 j3041_2.sm
206 j3029_5.sm 252 j3033_10.sm 298 j3037_7.sm 344 j3041_3.sm
207 j3029_6.sm 253 j3033_2.sm 299 j3037_8.sm 345 j3041_4.sm
208 j3029_7.sm 254 j3033_3.sm 300 j3037_9.sm 346 j3041_5.sm
209 j3029_8.sm 255 j3033_4.sm 301 j3038_1.sm 347 j3041_6.sm
210 j3029_9.sm 256 j3033_5.sm 302 j3038_10.sm 348 j3041_7.sm
211 j302_1.sm 257 j3033_6.sm 303 j3038_2.sm 349 j3041_8.sm
212 j302_10.sm 258 j3033_7.sm 304 j3038_3.sm 350 j3041_9.sm
213 j302_2.sm 259 j3033_8.sm 305 j3038_4.sm 351 j3042_1.sm
214 j302_3.sm 260 j3033_9.sm 306 j3038_5.sm 352 j3042_10.sm

215 j302_4.sm 261 j3034_1.sm 307 j3038_6.sm 353 j3042_2.sm
216 j302_5.sm 262 j3034_10.sm 308 j3038_7.sm 354 j3042_3.sm
217 j302_6.sm 263 j3034_2.sm 309 j3038_8.sm 355 j3042_4.sm
218 j302_7.sm 264 j3034_3.sm 310 j3038_9.sm 356 j3042_5.sm
357 j3042_6.sm 403 j3047_2.sm 449 j306_8.sm
358 j3042_7.sm 404 j3047_3.sm 450 j306_9.sm
359 j3042_8.sm 405 j3047_4.sm 451 j307_1.sm
360 j3042_9.sm 406 j3047_5.sm 452 j307_10.sm
361 j3043_1.sm 407 j3047_6.sm 453 j307_2.sm
362 j3043_10.sm 408 j3047_7.sm 454 j307_3.sm
363 j3043_2.sm 409 j3047_8.sm 455 j307_4.sm
364 j3043_3.sm 410 j3047_9.sm 456 j307_5.sm
365 j3043_4.sm 411 j3048_1.sm 457 j307_6.sm
366 j3043_5.sm 412 j3048_10.sm 458 j307_7.sm
367 j3043_6.sm 413 j3048_2.sm 459 j307_8.sm
368 j3043_7.sm 414 j3048_3.sm 460 j307_9.sm
369 j3043_8.sm 415 j3048_4.sm 461 j308_1.sm
370 j3043_9.sm 416 j3048_5.sm 462 j308_10.sm
371 j3044_1.sm 417 j3048_6.sm 463 j308_2.sm
372 j3044_10.sm 418 j3048_7.sm 464 j308_3.sm
373 j3044_2.sm 419 j3048_8.sm 465 j308_4.sm
374 j3044_3.sm 420 j3048_9.sm 466 j308_5.sm
375 j3044_4.sm 421 j304_1.sm 467 j308_6.sm
376 j3044_5.sm 422 j304_10.sm 468 j308_7.sm
377 j3044_6.sm 423 j304_2.sm 469 j308_8.sm
378 j3044_7.sm 424 j304_3.sm 470 j308_9.sm
379 j3044_8.sm 425 j304_4.sm 471 j309_1.sm
380 j3044_9.sm 426 j304_5.sm 472 j309_10.sm
381 j3045_1.sm 427 j304_6.sm 473 j309_2.sm
382 j3045_10.sm 428 j304_7.sm 474 j309_3.sm
383 j3045_2.sm 429 j304_8.sm 475 j309_4.sm
384 j3045_3.sm 430 j304_9.sm 476 j309_5.sm
385 j3045_4.sm 431 j305_1.sm 477 j309_6.sm
386 j3045_5.sm 432 j305_10.sm 478 j309_7.sm
387 j3045_6.sm 433 j305_2.sm 479 j309_8.sm
388 j3045_7.sm 434 j305_3.sm 480 j309_9.sm
389 j3045_8.sm 435 j305_4.sm
390 j3045_9.sm 436 j305_5.sm
391 j3046_1.sm 437 j305_6.sm
392 j3046_10.sm 438 j305_7.sm
393 j3046_2.sm 439 j305_8.sm
394 j3046_3.sm 440 j305_9.sm
395 j3046_4.sm 441 j306_1.sm
396 j3046_5.sm 442 j306_10.sm
397 j3046_6.sm 443 j306_2.sm
398 j3046_7.sm 444 j306_3.sm

399 j3046_8.sm 445 j306_4.sm
400 j3046_9.sm 446 j306_5.sm
401 j3047_1.sm 447 j306_6.sm
402 j3047_10.sm 448 j306_7.sm

All J60 Networks

No. File Name No. File Name No. File Name No. File Name
1 j6010_1.sm 46 j6014_5.sm 91 j6019_1.sm 136 j6022_5.sm
2 j6010_10.sm 47 j6014_6.sm 92 j6019_10.sm 137 j6022_6.sm
3 j6010_2.sm 48 j6014_7.sm 93 j6019_2.sm 138 j6022_7.sm
4 j6010_3.sm 49 j6014_8.sm 94 j6019_3.sm 139 j6022_8.sm
5 j6010_4.sm 50 j6014_9.sm 95 j6019_4.sm 140 j6022_9.sm
6 j6010_5.sm 51 j6015_1.sm 96 j6019_5.sm 141 j6023_1.sm
7 j6010_6.sm 52 j6015_10.sm 97 j6019_6.sm 142 j6023_10.sm
8 j6010_7.sm 53 j6015_2.sm 98 j6019_7.sm 143 j6023_2.sm
9 j6010_8.sm 54 j6015_3.sm 99 j6019_8.sm 144 j6023_3.sm
10 j6010_9.sm 55 j6015_4.sm 100 j6019_9.sm 145 j6023_4.sm
11 j6011_1.sm 56 j6015_5.sm 101 j601_1.sm 146 j6023_5.sm
12 j6011_10.sm 57 j6015_6.sm 102 j601_10.sm 147 j6023_6.sm
13 j6011_2.sm 58 j6015_7.sm 103 j601_2.sm 148 j6023_7.sm
14 j6011_3.sm 59 j6015_8.sm 104 j601_3.sm 149 j6023_8.sm
15 j6011_4.sm 60 j6015_9.sm 105 j601_4.sm 150 j6023_9.sm
16 j6011_5.sm 61 j6016_1.sm 106 j601_5.sm 151 j6024_1.sm
17 j6011_6.sm 62 j6016_10.sm 107 j601_6.sm 152 j6024_10.sm
18 j6011_7.sm 63 j6016_2.sm 108 j601_7.sm 153 j6024_2.sm
19 j6011_8.sm 64 j6016_3.sm 109 j601_8.sm 154 j6024_3.sm
20 j6011_9.sm 65 j6016_4.sm 110 j601_9.sm 155 j6024_4.sm
21 j6012_1.sm 66 j6016_5.sm 111 j6020_1.sm 156 j6024_5.sm
22 j6012_10.sm 67 j6016_6.sm 112 j6020_10.sm 157 j6024_6.sm
23 j6012_2.sm 68 j6016_7.sm 113 j6020_2.sm 158 j6024_7.sm
24 j6012_3.sm 69 j6016_8.sm 114 j6020_3.sm 159 j6024_8.sm
25 j6012_4.sm 70 j6016_9.sm 115 j6020_4.sm 160 j6024_9.sm
26 j6012_5.sm 71 j6017_1.sm 116 j6020_5.sm 161 j6025_1.sm
27 j6012_6.sm 72 j6017_10.sm 117 j6020_6.sm 162 j6025_10.sm
28 j6012_7.sm 73 j6017_2.sm 118 j6020_7.sm 163 j6025_2.sm
29 j6012_8.sm 74 j6017_3.sm 119 j6020_8.sm 164 j6025_3.sm
30 j6012_9.sm 75 j6017_4.sm 120 j6020_9.sm 165 j6025_4.sm
31 j6013_1.sm 76 j6017_5.sm 121 j6021_1.sm 166 j6025_5.sm
32 j6013_10.sm 77 j6017_6.sm 122 j6021_10.sm 167 j6025_6.sm
33 j6013_2.sm 78 j6017_7.sm 123 j6021_2.sm 168 j6025_7.sm
34 j6013_3.sm 79 j6017_8.sm 124 j6021_3.sm 169 j6025_8.sm
35 j6013_4.sm 80 j6017_9.sm 125 j6021_4.sm 170 j6025_9.sm
36 j6013_5.sm 81 j6018_1.sm 126 j6021_5.sm 171 j6026_1.sm
37 j6013_6.sm 82 j6018_10.sm 127 j6021_6.sm 172 j6026_10.sm
38 j6013_7.sm 83 j6018_2.sm 128 j6021_7.sm 173 j6026_2.sm
39 j6013_8.sm 84 j6018_3.sm 129 j6021_8.sm 174 j6026_3.sm
40 j6013_9.sm 85 j6018_4.sm 130 j6021_9.sm 175 j6026_4.sm
41 j6014_1.sm 86 j6018_5.sm 131 j6022_1.sm 176 j6026_5.sm
42 j6014_10.sm 87 j6018_6.sm 132 j6022_10.sm 177 j6026_6.sm
43 j6014_2.sm 88 j6018_7.sm 133 j6022_2.sm 178 j6026_7.sm
44 j6014_3.sm 89 j6018_8.sm 134 j6022_3.sm 179 j6026_8.sm
45 j6014_4.sm 90 j6018_9.sm 135 j6022_4.sm 180 j6026_9.sm

181 j6027_1.sm 226 j6030_5.sm 271 j6035_1.sm 316 j6039_5.sm
182 j6027_10.sm 227 j6030_6.sm 272 j6035_10.sm 317 j6039_6.sm
183 j6027_2.sm 228 j6030_7.sm 273 j6035_2.sm 318 j6039_7.sm
184 j6027_3.sm 229 j6030_8.sm 274 j6035_3.sm 319 j6039_8.sm
185 j6027_4.sm 230 j6030_9.sm 275 j6035_4.sm 320 j6039_9.sm
186 j6027_5.sm 231 j6031_1.sm 276 j6035_5.sm 321 j603_1.sm
187 j6027_6.sm 232 j6031_10.sm 277 j6035_6.sm 322 j603_10.sm
188 j6027_7.sm 233 j6031_2.sm 278 j6035_7.sm 323 j603_2.sm
189 j6027_8.sm 234 j6031_3.sm 279 j6035_8.sm 324 j603_3.sm
190 j6027_9.sm 235 j6031_4.sm 280 j6035_9.sm 325 j603_4.sm
191 j6028_1.sm 236 j6031_5.sm 281 j6036_1.sm 326 j603_5.sm
192 j6028_10.sm 237 j6031_6.sm 282 j6036_10.sm 327 j603_6.sm
193 j6028_2.sm 238 j6031_7.sm 283 j6036_2.sm 328 j603_7.sm
194 j6028_3.sm 239 j6031_8.sm 284 j6036_3.sm 329 j603_8.sm
195 j6028_4.sm 240 j6031_9.sm 285 j6036_4.sm 330 j603_9.sm
196 j6028_5.sm 241 j6032_1.sm 286 j6036_5.sm 331 j6040_1.sm
197 j6028_6.sm 242 j6032_10.sm 287 j6036_6.sm 332 j6040_10.sm
198 j6028_7.sm 243 j6032_2.sm 288 j6036_7.sm 333 j6040_2.sm
199 j6028_8.sm 244 j6032_3.sm 289 j6036_8.sm 334 j6040_3.sm
200 j6028_9.sm 245 j6032_4.sm 290 j6036_9.sm 335 j6040_4.sm
201 j6029_1.sm 246 j6032_5.sm 291 j6037_1.sm 336 j6040_5.sm
202 j6029_10.sm 247 j6032_6.sm 292 j6037_10.sm 337 j6040_6.sm
203 j6029_2.sm 248 j6032_7.sm 293 j6037_2.sm 338 j6040_7.sm
204 j6029_3.sm 249 j6032_8.sm 294 j6037_3.sm 339 j6040_8.sm
205 j6029_4.sm 250 j6032_9.sm 295 j6037_4.sm 340 j6040_9.sm
206 j6029_5.sm 251 j6033_1.sm 296 j6037_5.sm 341 j6041_1.sm
207 j6029_6.sm 252 j6033_10.sm 297 j6037_6.sm 342 j6041_10.sm
208 j6029_7.sm 253 j6033_2.sm 298 j6037_7.sm 343 j6041_2.sm
209 j6029_8.sm 254 j6033_3.sm 299 j6037_8.sm 344 j6041_3.sm
210 j6029_9.sm 255 j6033_4.sm 300 j6037_9.sm 345 j6041_4.sm
211 j602_1.sm 256 j6033_5.sm 301 j6038_1.sm 346 j6041_5.sm
212 j602_10.sm 257 j6033_6.sm 302 j6038_10.sm 347 j6041_6.sm
213 j602_2.sm 258 j6033_7.sm 303 j6038_2.sm 348 j6041_7.sm
214 j602_3.sm 259 j6033_8.sm 304 j6038_3.sm 349 j6041_8.sm
215 j602_4.sm 260 j6033_9.sm 305 j6038_4.sm 350 j6041_9.sm
216 j602_5.sm 261 j6034_1.sm 306 j6038_5.sm 351 j6042_1.sm
217 j602_6.sm 262 j6034_10.sm 307 j6038_6.sm 352 j6042_10.sm
218 j602_7.sm 263 j6034_2.sm 308 j6038_7.sm 353 j6042_2.sm
219 j602_8.sm 264 j6034_3.sm 309 j6038_8.sm 354 j6042_3.sm
220 j602_9.sm 265 j6034_4.sm 310 j6038_9.sm 355 j6042_4.sm
221 j6030_1.sm 266 j6034_5.sm 311 j6039_1.sm 356 j6042_5.sm
222 j6030_10.sm 267 j6034_6.sm 312 j6039_10.sm 357 j6042_6.sm
223 j6030_2.sm 268 j6034_7.sm 313 j6039_2.sm 358 j6042_7.sm
224 j6030_3.sm 269 j6034_8.sm 314 j6039_3.sm 359 j6042_8.sm
225 j6030_4.sm 270 j6034_9.sm 315 j6039_4.sm 360 j6042_9.sm

361 j6043_1.sm 406 j6047_5.sm 451 j607_1.sm
362 j6043_10.sm 407 j6047_6.sm 452 j607_10.sm
363 j6043_2.sm 408 j6047_7.sm 453 j607_2.sm
364 j6043_3.sm 409 j6047_8.sm 454 j607_3.sm
365 j6043_4.sm 410 j6047_9.sm 455 j607_4.sm
366 j6043_5.sm 411 j6048_1.sm 456 j607_5.sm
367 j6043_6.sm 412 j6048_10.sm 457 j607_6.sm
368 j6043_7.sm 413 j6048_2.sm 458 j607_7.sm
369 j6043_8.sm 414 j6048_3.sm 459 j607_8.sm
370 j6043_9.sm 415 j6048_4.sm 460 j607_9.sm
371 j6044_1.sm 416 j6048_5.sm 461 j608_1.sm
372 j6044_10.sm 417 j6048_6.sm 462 j608_10.sm
373 j6044_2.sm 418 j6048_7.sm 463 j608_2.sm
374 j6044_3.sm 419 j6048_8.sm 464 j608_3.sm
375 j6044_4.sm 420 j6048_9.sm 465 j608_4.sm
376 j6044_5.sm 421 j604_1.sm 466 j608_5.sm
377 j6044_6.sm 422 j604_10.sm 467 j608_6.sm
378 j6044_7.sm 423 j604_2.sm 468 j608_7.sm
379 j6044_8.sm 424 j604_3.sm 469 j608_8.sm
380 j6044_9.sm 425 j604_4.sm 470 j608_9.sm
381 j6045_1.sm 426 j604_5.sm 471 j609_1.sm
382 j6045_10.sm 427 j604_6.sm 472 j609_10.sm
383 j6045_2.sm 428 j604_7.sm 473 j609_2.sm
384 j6045_3.sm 429 j604_8.sm 474 j609_3.sm
385 j6045_4.sm 430 j604_9.sm 475 j609_4.sm
386 j6045_5.sm 431 j605_1.sm 476 j609_5.sm
387 j6045_6.sm 432 j605_10.sm 477 j609_6.sm
388 j6045_7.sm 433 j605_2.sm 478 j609_7.sm
389 j6045_8.sm 434 j605_3.sm 479 j609_8.sm
390 j6045_9.sm 435 j605_4.sm 480 j609_9.sm
391 j6046_1.sm 436 j605_5.sm
392 j6046_10.sm 437 j605_6.sm
393 j6046_2.sm 438 j605_7.sm
394 j6046_3.sm 439 j605_8.sm
395 j6046_4.sm 440 j605_9.sm
396 j6046_5.sm 441 j606_1.sm
397 j6046_6.sm 442 j606_10.sm
398 j6046_7.sm 443 j606_2.sm
399 j6046_8.sm 444 j606_3.sm
400 j6046_9.sm 445 j606_4.sm
401 j6047_1.sm 446 j606_5.sm
402 j6047_10.sm 447 j606_6.sm
403 j6047_2.sm 448 j606_7.sm
404 j6047_3.sm 449 j606_8.sm
405 j6047_4.sm 450 j606_9.sm

All J90 Networks

No. File Name No. File Name No. File Name No. File Name
1 j9010_1.sm 46 j9014_5.sm 91 j9019_1.sm 136 j9022_5.sm
2 j9010_10.sm 47 j9014_6.sm 92 j9019_10.sm 137 j9022_6.sm
3 j9010_2.sm 48 j9014_7.sm 93 j9019_2.sm 138 j9022_7.sm
4 j9010_3.sm 49 j9014_8.sm 94 j9019_3.sm 139 j9022_8.sm
5 j9010_4.sm 50 j9014_9.sm 95 j9019_4.sm 140 j9022_9.sm
6 j9010_5.sm 51 j9015_1.sm 96 j9019_5.sm 141 j9023_1.sm
7 j9010_6.sm 52 j9015_10.sm 97 j9019_6.sm 142 j9023_10.sm
8 j9010_7.sm 53 j9015_2.sm 98 j9019_7.sm 143 j9023_2.sm
9 j9010_8.sm 54 j9015_3.sm 99 j9019_8.sm 144 j9023_3.sm
10 j9010_9.sm 55 j9015_4.sm 100 j9019_9.sm 145 j9023_4.sm
11 j9011_1.sm 56 j9015_5.sm 101 j901_1.sm 146 j9023_5.sm
12 j9011_10.sm 57 j9015_6.sm 102 j901_10.sm 147 j9023_6.sm
13 j9011_2.sm 58 j9015_7.sm 103 j901_2.sm 148 j9023_7.sm
14 j9011_3.sm 59 j9015_8.sm 104 j901_3.sm 149 j9023_8.sm
15 j9011_4.sm 60 j9015_9.sm 105 j901_4.sm 150 j9023_9.sm
16 j9011_5.sm 61 j9016_1.sm 106 j901_5.sm 151 j9024_1.sm
17 j9011_6.sm 62 j9016_10.sm 107 j901_6.sm 152 j9024_10.sm
18 j9011_7.sm 63 j9016_2.sm 108 j901_7.sm 153 j9024_2.sm
19 j9011_8.sm 64 j9016_3.sm 109 j901_8.sm 154 j9024_3.sm
20 j9011_9.sm 65 j9016_4.sm 110 j901_9.sm 155 j9024_4.sm
21 j9012_1.sm 66 j9016_5.sm 111 j9020_1.sm 156 j9024_5.sm
22 j9012_10.sm 67 j9016_6.sm 112 j9020_10.sm 157 j9024_6.sm
23 j9012_2.sm 68 j9016_7.sm 113 j9020_2.sm 158 j9024_7.sm
24 j9012_3.sm 69 j9016_8.sm 114 j9020_3.sm 159 j9024_8.sm
25 j9012_4.sm 70 j9016_9.sm 115 j9020_4.sm 160 j9024_9.sm
26 j9012_5.sm 71 j9017_1.sm 116 j9020_5.sm 161 j9025_1.sm
27 j9012_6.sm 72 j9017_10.sm 117 j9020_6.sm 162 j9025_10.sm
28 j9012_7.sm 73 j9017_2.sm 118 j9020_7.sm 163 j9025_2.sm
29 j9012_8.sm 74 j9017_3.sm 119 j9020_8.sm 164 j9025_3.sm
30 j9012_9.sm 75 j9017_4.sm 120 j9020_9.sm 165 j9025_4.sm
31 j9013_1.sm 76 j9017_5.sm 121 j9021_1.sm 166 j9025_5.sm
32 j9013_10.sm 77 j9017_6.sm 122 j9021_10.sm 167 j9025_6.sm
33 j9013_2.sm 78 j9017_7.sm 123 j9021_2.sm 168 j9025_7.sm
34 j9013_3.sm 79 j9017_8.sm 124 j9021_3.sm 169 j9025_8.sm
35 j9013_4.sm 80 j9017_9.sm 125 j9021_4.sm 170 j9025_9.sm
36 j9013_5.sm 81 j9018_1.sm 126 j9021_5.sm 171 j9026_1.sm
37 j9013_6.sm 82 j9018_10.sm 127 j9021_6.sm 172 j9026_10.sm
38 j9013_7.sm 83 j9018_2.sm 128 j9021_7.sm 173 j9026_2.sm
39 j9013_8.sm 84 j9018_3.sm 129 j9021_8.sm 174 j9026_3.sm
40 j9013_9.sm 85 j9018_4.sm 130 j9021_9.sm 175 j9026_4.sm
41 j9014_1.sm 86 j9018_5.sm 131 j9022_1.sm 176 j9026_5.sm
42 j9014_10.sm 87 j9018_6.sm 132 j9022_10.sm 177 j9026_6.sm
43 j9014_2.sm 88 j9018_7.sm 133 j9022_2.sm 178 j9026_7.sm
44 j9014_3.sm 89 j9018_8.sm 134 j9022_3.sm 179 j9026_8.sm
45 j9014_4.sm 90 j9018_9.sm 135 j9022_4.sm 180 j9026_9.sm

181 j9027_1.sm 226 j9030_5.sm 271 j9035_1.sm 316 j9039_5.sm
182 j9027_10.sm 227 j9030_6.sm 272 j9035_10.sm 317 j9039_6.sm
183 j9027_2.sm 228 j9030_7.sm 273 j9035_2.sm 318 j9039_7.sm
184 j9027_3.sm 229 j9030_8.sm 274 j9035_3.sm 319 j9039_8.sm
185 j9027_4.sm 230 j9030_9.sm 275 j9035_4.sm 320 j9039_9.sm
186 j9027_5.sm 231 j9031_1.sm 276 j9035_5.sm 321 j903_1.sm
187 j9027_6.sm 232 j9031_10.sm 277 j9035_6.sm 322 j903_10.sm
188 j9027_7.sm 233 j9031_2.sm 278 j9035_7.sm 323 j903_2.sm
189 j9027_8.sm 234 j9031_3.sm 279 j9035_8.sm 324 j903_3.sm
190 j9027_9.sm 235 j9031_4.sm 280 j9035_9.sm 325 j903_4.sm
191 j9028_1.sm 236 j9031_5.sm 281 j9036_1.sm 326 j903_5.sm
192 j9028_10.sm 237 j9031_6.sm 282 j9036_10.sm 327 j903_6.sm
193 j9028_2.sm 238 j9031_7.sm 283 j9036_2.sm 328 j903_7.sm
194 j9028_3.sm 239 j9031_8.sm 284 j9036_3.sm 329 j903_8.sm
195 j9028_4.sm 240 j9031_9.sm 285 j9036_4.sm 330 j903_9.sm
196 j9028_5.sm 241 j9032_1.sm 286 j9036_5.sm 331 j9040_1.sm
197 j9028_6.sm 242 j9032_10.sm 287 j9036_6.sm 332 j9040_10.sm
198 j9028_7.sm 243 j9032_2.sm 288 j9036_7.sm 333 j9040_2.sm
199 j9028_8.sm 244 j9032_3.sm 289 j9036_8.sm 334 j9040_3.sm
200 j9028_9.sm 245 j9032_4.sm 290 j9036_9.sm 335 j9040_4.sm
201 j9029_1.sm 246 j9032_5.sm 291 j9037_1.sm 336 j9040_5.sm
202 j9029_10.sm 247 j9032_6.sm 292 j9037_10.sm 337 j9040_6.sm
203 j9029_2.sm 248 j9032_7.sm 293 j9037_2.sm 338 j9040_7.sm
204 j9029_3.sm 249 j9032_8.sm 294 j9037_3.sm 339 j9040_8.sm
205 j9029_4.sm 250 j9032_9.sm 295 j9037_4.sm 340 j9040_9.sm
206 j9029_5.sm 251 j9033_1.sm 296 j9037_5.sm 341 j9041_1.sm
207 j9029_6.sm 252 j9033_10.sm 297 j9037_6.sm 342 j9041_10.sm
208 j9029_7.sm 253 j9033_2.sm 298 j9037_7.sm 343 j9041_2.sm
209 j9029_8.sm 254 j9033_3.sm 299 j9037_8.sm 344 j9041_3.sm
210 j9029_9.sm 255 j9033_4.sm 300 j9037_9.sm 345 j9041_4.sm
211 j902_1.sm 256 j9033_5.sm 301 j9038_1.sm 346 j9041_5.sm
212 j902_10.sm 257 j9033_6.sm 302 j9038_10.sm 347 j9041_6.sm
213 j902_2.sm 258 j9033_7.sm 303 j9038_2.sm 348 j9041_7.sm
214 j902_3.sm 259 j9033_8.sm 304 j9038_3.sm 349 j9041_8.sm
215 j902_4.sm 260 j9033_9.sm 305 j9038_4.sm 350 j9041_9.sm
216 j902_5.sm 261 j9034_1.sm 306 j9038_5.sm 351 j9042_1.sm
217 j902_6.sm 262 j9034_10.sm 307 j9038_6.sm 352 j9042_10.sm
218 j902_7.sm 263 j9034_2.sm 308 j9038_7.sm 353 j9042_2.sm
219 j902_8.sm 264 j9034_3.sm 309 j9038_8.sm 354 j9042_3.sm
220 j902_9.sm 265 j9034_4.sm 310 j9038_9.sm 355 j9042_4.sm
221 j9030_1.sm 266 j9034_5.sm 311 j9039_1.sm 356 j9042_5.sm
222 j9030_10.sm 267 j9034_6.sm 312 j9039_10.sm 357 j9042_6.sm
223 j9030_2.sm 268 j9034_7.sm 313 j9039_2.sm 358 j9042_7.sm
224 j9030_3.sm 269 j9034_8.sm 314 j9039_3.sm 359 j9042_8.sm
225 j9030_4.sm 270 j9034_9.sm 315 j9039_4.sm 360 j9042_9.sm

361 j9043_1.sm 406 j9047_5.sm 451 j907_1.sm
362 j9043_10.sm 407 j9047_6.sm 452 j907_10.sm
363 j9043_2.sm 408 j9047_7.sm 453 j907_2.sm
364 j9043_3.sm 409 j9047_8.sm 454 j907_3.sm
365 j9043_4.sm 410 j9047_9.sm 455 j907_4.sm
366 j9043_5.sm 411 j9048_1.sm 456 j907_5.sm
367 j9043_6.sm 412 j9048_10.sm 457 j907_6.sm
368 j9043_7.sm 413 j9048_2.sm 458 j907_7.sm
369 j9043_8.sm 414 j9048_3.sm 459 j907_8.sm
370 j9043_9.sm 415 j9048_4.sm 460 j907_9.sm
371 j9044_1.sm 416 j9048_5.sm 461 j908_1.sm
372 j9044_10.sm 417 j9048_6.sm 462 j908_10.sm
373 j9044_2.sm 418 j9048_7.sm 463 j908_2.sm
374 j9044_3.sm 419 j9048_8.sm 464 j908_3.sm
375 j9044_4.sm 420 j9048_9.sm 465 j908_4.sm
376 j9044_5.sm 421 j904_1.sm 466 j908_5.sm
377 j9044_6.sm 422 j904_10.sm 467 j908_6.sm
378 j9044_7.sm 423 j904_2.sm 468 j908_7.sm
379 j9044_8.sm 424 j904_3.sm 469 j908_8.sm
380 j9044_9.sm 425 j904_4.sm 470 j908_9.sm
381 j9045_1.sm 426 j904_5.sm 471 j909_1.sm
382 j9045_10.sm 427 j904_6.sm 472 j909_10.sm
383 j9045_2.sm 428 j904_7.sm 473 j909_2.sm
384 j9045_3.sm 429 j904_8.sm 474 j909_3.sm
385 j9045_4.sm 430 j904_9.sm 475 j909_4.sm
386 j9045_5.sm 431 j905_1.sm 476 j909_5.sm
387 j9045_6.sm 432 j905_10.sm 477 j909_6.sm
388 j9045_7.sm 433 j905_2.sm 478 j909_7.sm
389 j9045_8.sm 434 j905_3.sm 479 j909_8.sm
390 j9045_9.sm 435 j905_4.sm 480 j909_9.sm
391 j9046_1.sm 436 j905_5.sm
392 j9046_10.sm 437 j905_6.sm
393 j9046_2.sm 438 j905_7.sm
394 j9046_3.sm 439 j905_8.sm
395 j9046_4.sm 440 j905_9.sm
396 j9046_5.sm 441 j906_1.sm
397 j9046_6.sm 442 j906_10.sm
398 j9046_7.sm 443 j906_2.sm
399 j9046_8.sm 444 j906_3.sm
400 j9046_9.sm 445 j906_4.sm
401 j9047_1.sm 446 j906_5.sm
402 j9047_10.sm 447 j906_6.sm
403 j9047_2.sm 448 j906_7.sm
404 j9047_3.sm 449 j906_8.sm
405 j9047_4.sm 450 j906_9.sm

All J120 Networks

No. File Name No. File Name No. File Name No. File Name
1 j12010_1.sm 46 j12014_5.sm 91 j12019_1.sm 136 j12022_5.sm
2 j12010_10.sm 47 j12014_6.sm 92 j12019_10.sm 137 j12022_6.sm
3 j12010_2.sm 48 j12014_7.sm 93 j12019_2.sm 138 j12022_7.sm
4 j12010_3.sm 49 j12014_8.sm 94 j12019_3.sm 139 j12022_8.sm
5 j12010_4.sm 50 j12014_9.sm 95 j12019_4.sm 140 j12022_9.sm
6 j12010_5.sm 51 j12015_1.sm 96 j12019_5.sm 141 j12023_1.sm
7 j12010_6.sm 52 j12015_10.sm 97 j12019_6.sm 142 j12023_10.sm
8 j12010_7.sm 53 j12015_2.sm 98 j12019_7.sm 143 j12023_2.sm
9 j12010_8.sm 54 j12015_3.sm 99 j12019_8.sm 144 j12023_3.sm
10 j12010_9.sm 55 j12015_4.sm 100 j12019_9.sm 145 j12023_4.sm
11 j12011_1.sm 56 j12015_5.sm 101 j1201_1.sm 146 j12023_5.sm
12 j12011_10.sm 57 j12015_6.sm 102 j1201_10.sm 147 j12023_6.sm
13 j12011_2.sm 58 j12015_7.sm 103 j1201_2.sm 148 j12023_7.sm
14 j12011_3.sm 59 j12015_8.sm 104 j1201_3.sm 149 j12023_8.sm
15 j12011_4.sm 60 j12015_9.sm 105 j1201_4.sm 150 j12023_9.sm
16 j12011_5.sm 61 j12016_1.sm 106 j1201_5.sm 151 j12024_1.sm
17 j12011_6.sm 62 j12016_10.sm 107 j1201_6.sm 152 j12024_10.sm
18 j12011_7.sm 63 j12016_2.sm 108 j1201_7.sm 153 j12024_2.sm
19 j12011_8.sm 64 j12016_3.sm 109 j1201_8.sm 154 j12024_3.sm
20 j12011_9.sm 65 j12016_4.sm 110 j1201_9.sm 155 j12024_4.sm
21 j12012_1.sm 66 j12016_5.sm 111 j12020_1.sm 156 j12024_5.sm
22 j12012_10.sm 67 j12016_6.sm 112 j12020_10.sm 157 j12024_6.sm
23 j12012_2.sm 68 j12016_7.sm 113 j12020_2.sm 158 j12024_7.sm
24 j12012_3.sm 69 j12016_8.sm 114 j12020_3.sm 159 j12024_8.sm
25 j12012_4.sm 70 j12016_9.sm 115 j12020_4.sm 160 j12024_9.sm
26 j12012_5.sm 71 j12017_1.sm 116 j12020_5.sm 161 j12025_1.sm
27 j12012_6.sm 72 j12017_10.sm 117 j12020_6.sm 162 j12025_10.sm
28 j12012_7.sm 73 j12017_2.sm 118 j12020_7.sm 163 j12025_2.sm
29 j12012_8.sm 74 j12017_3.sm 119 j12020_8.sm 164 j12025_3.sm
30 j12012_9.sm 75 j12017_4.sm 120 j12020_9.sm 165 j12025_4.sm
31 j12013_1.sm 76 j12017_5.sm 121 j12021_1.sm 166 j12025_5.sm
32 j12013_10.sm 77 j12017_6.sm 122 j12021_10.sm 167 j12025_6.sm
33 j12013_2.sm 78 j12017_7.sm 123 j12021_2.sm 168 j12025_7.sm
34 j12013_3.sm 79 j12017_8.sm 124 j12021_3.sm 169 j12025_8.sm
35 j12013_4.sm 80 j12017_9.sm 125 j12021_4.sm 170 j12025_9.sm
36 j12013_5.sm 81 j12018_1.sm 126 j12021_5.sm 171 j12026_1.sm
37 j12013_6.sm 82 j12018_10.sm 127 j12021_6.sm 172 j12026_10.sm
38 j12013_7.sm 83 j12018_2.sm 128 j12021_7.sm 173 j12026_2.sm
39 j12013_8.sm 84 j12018_3.sm 129 j12021_8.sm 174 j12026_3.sm
40 j12013_9.sm 85 j12018_4.sm 130 j12021_9.sm 175 j12026_4.sm
41 j12014_1.sm 86 j12018_5.sm 131 j12022_1.sm 176 j12026_5.sm
42 j12014_10.sm 87 j12018_6.sm 132 j12022_10.sm 177 j12026_6.sm
43 j12014_2.sm 88 j12018_7.sm 133 j12022_2.sm 178 j12026_7.sm
44 j12014_3.sm 89 j12018_8.sm 134 j12022_3.sm 179 j12026_8.sm
45 j12014_4.sm 90 j12018_9.sm 135 j12022_4.sm 180 j12026_9.sm

181 j12027_1.sm 226 j12030_5.sm 271 j12035_1.sm 316 j12039_5.sm
182 j12027_10.sm 227 j12030_6.sm 272 j12035_10.sm 317 j12039_6.sm
183 j12027_2.sm 228 j12030_7.sm 273 j12035_2.sm 318 j12039_7.sm
184 j12027_3.sm 229 j12030_8.sm 274 j12035_3.sm 319 j12039_8.sm
185 j12027_4.sm 230 j12030_9.sm 275 j12035_4.sm 320 j12039_9.sm
186 j12027_5.sm 231 j12031_1.sm 276 j12035_5.sm 321 j1203_1.sm
187 j12027_6.sm 232 j12031_10.sm 277 j12035_6.sm 322 j1203_10.sm
188 j12027_7.sm 233 j12031_2.sm 278 j12035_7.sm 323 j1203_2.sm
189 j12027_8.sm 234 j12031_3.sm 279 j12035_8.sm 324 j1203_3.sm
190 j12027_9.sm 235 j12031_4.sm 280 j12035_9.sm 325 j1203_4.sm
191 j12028_1.sm 236 j12031_5.sm 281 j12036_1.sm 326 j1203_5.sm
192 j12028_10.sm 237 j12031_6.sm 282 j12036_10.sm 327 j1203_6.sm
193 j12028_2.sm 238 j12031_7.sm 283 j12036_2.sm 328 j1203_7.sm
194 j12028_3.sm 239 j12031_8.sm 284 j12036_3.sm 329 j1203_8.sm
195 j12028_4.sm 240 j12031_9.sm 285 j12036_4.sm 330 j1203_9.sm
196 j12028_5.sm 241 j12032_1.sm 286 j12036_5.sm 331 j12040_1.sm
197 j12028_6.sm 242 j12032_10.sm 287 j12036_6.sm 332 j12040_10.sm
198 j12028_7.sm 243 j12032_2.sm 288 j12036_7.sm 333 j12040_2.sm
199 j12028_8.sm 244 j12032_3.sm 289 j12036_8.sm 334 j12040_3.sm
200 j12028_9.sm 245 j12032_4.sm 290 j12036_9.sm 335 j12040_4.sm
201 j12029_1.sm 246 j12032_5.sm 291 j12037_1.sm 336 j12040_5.sm
202 j12029_10.sm 247 j12032_6.sm 292 j12037_10.sm 337 j12040_6.sm
203 j12029_2.sm 248 j12032_7.sm 293 j12037_2.sm 338 j12040_7.sm
204 j12029_3.sm 249 j12032_8.sm 294 j12037_3.sm 339 j12040_8.sm
205 j12029_4.sm 250 j12032_9.sm 295 j12037_4.sm 340 j12040_9.sm
206 j12029_5.sm 251 j12033_1.sm 296 j12037_5.sm 341 j12041_1.sm
207 j12029_6.sm 252 j12033_10.sm 297 j12037_6.sm 342 j12041_10.sm
208 j12029_7.sm 253 j12033_2.sm 298 j12037_7.sm 343 j12041_2.sm
209 j12029_8.sm 254 j12033_3.sm 299 j12037_8.sm 344 j12041_3.sm
210 j12029_9.sm 255 j12033_4.sm 300 j12037_9.sm 345 j12041_4.sm
211 j1202_1.sm 256 j12033_5.sm 301 j12038_1.sm 346 j12041_5.sm
212 j1202_10.sm 257 j12033_6.sm 302 j12038_10.sm 347 j12041_6.sm
213 j1202_2.sm 258 j12033_7.sm 303 j12038_2.sm 348 j12041_7.sm
214 j1202_3.sm 259 j12033_8.sm 304 j12038_3.sm 349 j12041_8.sm
215 j1202_4.sm 260 j12033_9.sm 305 j12038_4.sm 350 j12041_9.sm
216 j1202_5.sm 261 j12034_1.sm 306 j12038_5.sm 351 j12042_1.sm
217 j1202_6.sm 262 j12034_10.sm 307 j12038_6.sm 352 j12042_10.sm
218 j1202_7.sm 263 j12034_2.sm 308 j12038_7.sm 353 j12042_2.sm
219 j1202_8.sm 264 j12034_3.sm 309 j12038_8.sm 354 j12042_3.sm
220 j1202_9.sm 265 j12034_4.sm 310 j12038_9.sm 355 j12042_4.sm
221 j12030_1.sm 266 j12034_5.sm 311 j12039_1.sm 356 j12042_5.sm
222 j12030_10.sm 267 j12034_6.sm 312 j12039_10.sm 357 j12042_6.sm
223 j12030_2.sm 268 j12034_7.sm 313 j12039_2.sm 358 j12042_7.sm
224 j12030_3.sm 269 j12034_8.sm 314 j12039_3.sm 359 j12042_8.sm
225 j12030_4.sm 270 j12034_9.sm 315 j12039_4.sm 360 j12042_9.sm

361 j12043_1.sm 406 j12047_5.sm 451 j12051_1.sm 496 j12055_5.sm
362 j12043_10.sm 407 j12047_6.sm 452 j12051_10.sm 497 j12055_6.sm
363 j12043_2.sm 408 j12047_7.sm 453 j12051_2.sm 498 j12055_7.sm
364 j12043_3.sm 409 j12047_8.sm 454 j12051_3.sm 499 j12055_8.sm
365 j12043_4.sm 410 j12047_9.sm 455 j12051_4.sm 500 j12055_9.sm
366 j12043_5.sm 411 j12048_1.sm 456 j12051_5.sm 501 j12056_1.sm
367 j12043_6.sm 412 j12048_10.sm 457 j12051_6.sm 502 j12056_10.sm
368 j12043_7.sm 413 j12048_2.sm 458 j12051_7.sm 503 j12056_2.sm
369 j12043_8.sm 414 j12048_3.sm 459 j12051_8.sm 504 j12056_3.sm
370 j12043_9.sm 415 j12048_4.sm 460 j12051_9.sm 505 j12056_4.sm
371 j12044_1.sm 416 j12048_5.sm 461 j12052_1.sm 506 j12056_5.sm
372 j12044_10.sm 417 j12048_6.sm 462 j12052_10.sm 507 j12056_6.sm
373 j12044_2.sm 418 j12048_7.sm 463 j12052_2.sm 508 j12056_7.sm
374 j12044_3.sm 419 j12048_8.sm 464 j12052_3.sm 509 j12056_8.sm
375 j12044_4.sm 420 j12048_9.sm 465 j12052_4.sm 510 j12056_9.sm
376 j12044_5.sm 421 j12049_1.sm 466 j12052_5.sm 511 j12057_1.sm
377 j12044_6.sm 422 j12049_10.sm 467 j12052_6.sm 512 j12057_10.sm
378 j12044_7.sm 423 j12049_2.sm 468 j12052_7.sm 513 j12057_2.sm
379 j12044_8.sm 424 j12049_3.sm 469 j12052_8.sm 514 j12057_3.sm
380 j12044_9.sm 425 j12049_4.sm 470 j12052_9.sm 515 j12057_4.sm
381 j12045_1.sm 426 j12049_5.sm 471 j12053_1.sm 516 j12057_5.sm
382 j12045_10.sm 427 j12049_6.sm 472 j12053_10.sm 517 j12057_6.sm
383 j12045_2.sm 428 j12049_7.sm 473 j12053_2.sm 518 j12057_7.sm
384 j12045_3.sm 429 j12049_8.sm 474 j12053_3.sm 519 j12057_8.sm
385 j12045_4.sm 430 j12049_9.sm 475 j12053_4.sm 520 j12057_9.sm
386 j12045_5.sm 431 j1204_1.sm 476 j12053_5.sm 521 j12058_1.sm
387 j12045_6.sm 432 j1204_10.sm 477 j12053_6.sm 522 j12058_10.sm
388 j12045_7.sm 433 j1204_2.sm 478 j12053_7.sm 523 j12058_2.sm
389 j12045_8.sm 434 j1204_3.sm 479 j12053_8.sm 524 j12058_3.sm
390 j12045_9.sm 435 j1204_4.sm 480 j12053_9.sm 525 j12058_4.sm
391 j12046_1.sm 436 j1204_5.sm 481 j12054_1.sm 526 j12058_5.sm
392 j12046_10.sm 437 j1204_6.sm 482 j12054_10.sm 527 j12058_6.sm
393 j12046_2.sm 438 j1204_7.sm 483 j12054_2.sm 528 j12058_7.sm
394 j12046_3.sm 439 j1204_8.sm 484 j12054_3.sm 529 j12058_8.sm
395 j12046_4.sm 440 j1204_9.sm 485 j12054_4.sm 530 j12058_9.sm
396 j12046_5.sm 441 j12050_1.sm 486 j12054_5.sm 531 j12059_1.sm
397 j12046_6.sm 442 j12050_10.sm 487 j12054_6.sm 532 j12059_10.sm
398 j12046_7.sm 443 j12050_2.sm 488 j12054_7.sm 533 j12059_2.sm
399 j12046_8.sm 444 j12050_3.sm 489 j12054_8.sm 534 j12059_3.sm
400 j12046_9.sm 445 j12050_4.sm 490 j12054_9.sm 535 j12059_4.sm
401 j12047_1.sm 446 j12050_5.sm 491 j12055_1.sm 536 j12059_5.sm
402 j12047_10.sm 447 j12050_6.sm 492 j12055_10.sm 537 j12059_6.sm
403 j12047_2.sm 448 j12050_7.sm 493 j12055_2.sm 538 j12059_7.sm
404 j12047_3.sm 449 j12050_8.sm 494 j12055_3.sm 539 j12059_8.sm
405 j12047_4.sm 450 j12050_9.sm 495 j12055_4.sm 540 j12059_9.sm

541 j1205_1.sm 556 j12060_5.sm 571 j1207_1.sm 586 j1208_5.sm
542 j1205_10.sm 557 j12060_6.sm 572 j1207_10.sm 587 j1208_6.sm
543 j1205_2.sm 558 j12060_7.sm 573 j1207_2.sm 588 j1208_7.sm
544 j1205_3.sm 559 j12060_8.sm 574 j1207_3.sm 589 j1208_8.sm
545 j1205_4.sm 560 j12060_9.sm 575 j1207_4.sm 590 j1208_9.sm
546 j1205_5.sm 561 j1206_1.sm 576 j1207_5.sm 591 j1209_1.sm
547 j1205_6.sm 562 j1206_10.sm 577 j1207_6.sm 592 j1209_10.sm
548 j1205_7.sm 563 j1206_2.sm 578 j1207_7.sm 593 j1209_2.sm
549 j1205_8.sm 564 j1206_3.sm 579 j1207_8.sm 594 j1209_3.sm
550 j1205_9.sm 565 j1206_4.sm 580 j1207_9.sm 595 j1209_4.sm
551 j12060_1.sm 566 j1206_5.sm 581 j1208_1.sm 596 j1209_5.sm
552 j12060_10.sm 567 j1206_6.sm 582 j1208_10.sm 597 j1209_6.sm
553 j12060_2.sm 568 j1206_7.sm 583 j1208_2.sm 598 j1209_7.sm
554 j12060_3.sm 569 j1206_8.sm 584 j1208_3.sm 599 j1209_8.sm
555 j12060_4.sm 570 j1206_9.sm 585 j1208_4.sm 600 j1209_9.sm

Appendix A.2: Original j301_1.sm

************************************************************************
file with base data: j30_17.bas
initial value random generator: 28123
************************************************************************
projects: 1
jobs (incl. super source/sink): 32
horizon: 158
RESOURCES
- renewable: 4 R
- nonrenewable: 0 N
- doubly constrained: 0 D
************************************************************************
PROJECT INFORMATION:
pronr. #Jobs rel. date due date tard cost MPM-Time
1 30 0 38 26 38
************************************************************************
PRECEDENCE RELATIONS:
jobnr. #modes #successors successors
1 1 3 2 3 4
2 1 3 6 11 15
3 1 3 7 8 13
4 1 3 5 9 10
5 1 1 20
6 1 1 30
7 1 1 27
8 1 3 12 19 27
9 1 1 14
10 1 2 16 25
11 1 2 20 26
12 1 1 14
13 1 2 17 18
14 1 1 17
15 1 1 25
16 1 2 21 22
17 1 1 22
18 1 2 20 22
19 1 2 24 29
20 1 2 23 25
21 1 1 28
22 1 1 23
23 1 1 24
24 1 1 30
25 1 1 30
26 1 1 31
27 1 1 28
28 1 1 31
29 1 1 32
30 1 1 32
31 1 1 32
32 1 0
************************************************************************

REQUESTS/DURATIONS:
jobnr. mode duration R 1 R 2 R 3 R 4
------------------------------------------------------------------------
1 1 0 0 0 0 0
2 1 8 4 0 0 0
3 1 4 10 0 0 0
4 1 6 0 0 0 3
5 1 3 3 0 0 0
6 1 8 0 0 0 8
7 1 5 4 0 0 0
8 1 9 0 1 0 0
9 1 2 6 0 0 0
10 1 7 0 0 0 1
11 1 9 0 5 0 0
12 1 2 0 7 0 0
13 1 6 4 0 0 0
14 1 3 0 8 0 0
15 1 9 3 0 0 0
16 1 10 0 0 0 5
17 1 6 0 0 0 8
18 1 5 0 0 0 7
19 1 3 0 1 0 0
20 1 7 0 10 0 0
21 1 2 0 0 0 6
22 1 7 2 0 0 0
23 1 2 3 0 0 0
24 1 3 0 9 0 0
25 1 3 4 0 0 0
26 1 7 0 0 4 0
27 1 8 0 0 0 7
28 1 3 0 8 0 0
29 1 7 0 7 0 0
30 1 2 0 7 0 0
31 1 2 0 0 2 0
32 1 0 0 0 0 0
************************************************************************
RESOURCEAVAILABILITIES:
R 1 R 2 R 3 R 4
12 13 4 12
************************************************************************

Appendix B
VBA/MATLAB Codes
Appendix B.1: VBA Code

Sub Convert_sm2txt()
'This procedure changes the extension of each file in folder
'strFilePath from .sm to .txt

    Dim strFileName As String
    Dim strFilePath As String
    Dim oDoc As Document

    'Add the path to your files to convert
    strFilePath = "C:\Users\Armelle\Documents\My files\Networks\"
    'Get name of first .sm file from the directory
    strFileName = Dir$(strFilePath & "*.sm")
    While Len(strFileName) <> 0
        'Set error handler
        On Error Resume Next
        'Attempt to open the document
        Set oDoc = Documents.Open(FileName:=strFilePath & strFileName)
        'Save under the same name with a .txt extension, then close
        oDoc.SaveAs FileName:=strFilePath & strFileName & ".txt"
        oDoc.Close
        'Clear object variable
        Set oDoc = Nothing
        'Get next document from the specified directory
        strFileName = Dir$()
    Wend
    'page 161 code
End Sub
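Because the PSPLIB .sm files are plain text, the Word round-trip in the VBA procedure above can also be reproduced with a simple file copy. The following Python sketch is illustrative only (the function name and folder handling are hypothetical, not part of the dissertation's toolchain); note that it swaps the extension rather than appending ".txt" after ".sm".

```python
import shutil
from pathlib import Path

def convert_sm2txt(folder):
    """Copy every .sm file in `folder` to a .txt file with the same stem,
    mirroring the intent of the VBA procedure above (the PSPLIB .sm files
    are plain text, so a byte-for-byte copy suffices)."""
    converted = []
    for sm in sorted(Path(folder).glob("*.sm")):
        txt = sm.with_suffix(".txt")
        shutil.copyfile(sm, txt)
        converted.append(txt.name)
    return converted
```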

Appendix B.2: MATLAB Code for Network Files’ Activity Entry Computations

%This code reads the contents of each text file, representing a network
%initially converted from .sm format to .txt format using the VBA code and
%saved in the DirFileA folder. All file names are listed in the
%filenameListFile file. The information obtained serves to calculate the
%triangular probabilistic durations for each network. New files are saved
%to a new folder called DirFileC.

jn='30'; %change to '60', '90', '120' for others


num=1; %for alphanumeric activity names otherwise num=0;
DirFileA=['j',jn,'.sm\J',jn,'_txt_Files\'];
DirFileC=['j',jn,'.sm\J',jn,'_Tri_Durations_Num\'];
filenameB=['j',jn,'.sm\Act_Prob_Durations_J',jn,'_Num.xlsx'];
filenameListFile=['j',jn,'.sm\Copy_of_NetworkConvert_sm2txt-',jn,'.txt'];
%TFileName= readtable(filenameListFile,'Delimiter',' ');
TFileName= readtable(filenameListFile,'Delimiter',' ','ReadVariableNames',false);

DetermineRows; %subscript (DetermineRows.m), provided after the main script


MFileName=NameFileFnl;Tn1=n_act;
%nFileN1=size(TFileName);nFileN=nFileN1(1,1);
nFileN=size(MFileName);

for k=1:nFileN

filenameA=[DirFileA,MFileName{k}];

filenameC=[DirFileC,'\Tri_Dur_',MFileName{k}];
T1 = readtable(filenameA,'HeaderLines', 0, 'ReadVariableNames',false,'Format', '%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s');
%T1 = readtable(filenameA,'HeaderLines', 0, 'ReadVariableNames',false,'Format', '%s');
%T1 = readtable(filenameA);
%Tn=size(T1); %here calculate the size (row and column) of the table, then it can generate a matrix the same size as the excel
sheet=1;

ID1=xlsread(filenameB,sheet,['B2:B',num2str(Tn1+1)]);
nID=size(ID1);
Tn1=nID(1);Tn2=nID(2);
[~,pd1] = xlsread(filenameB,sheet);
NameID=xlsread(filenameB,sheet,['B2:B',num2str(Tn1+1)]);
n1=i2; n2=n1+Tn1-1;
n3=i4; n4=n3+Tn1-1;
aa=T1(n1:n2,1); %9x1 cell
aaD=T1(n3:n4,1);
AA=table2array(aaD);
AAp=table2array(aa);%cell Tn1x1
nq=length(aa{1,:});

Cp1=cell(Tn1,Tn2); Cp=cell(Tn1,6); %6
C1=cell(Tn1,Tn2); C=cell(Tn1,7);%7
if nq==1
for i=1:Tn1

u=transpose(split(AAp(i)));v=transpose(split(AA(i)));
mu=length(u);mv=length(v);
Cp(i,1:mu)=split(AAp(i));
C(i,1:mv)=split(AA(i));
end
else
for i=1:Tn1
Cp1(i,:)=AAp(i,:);
C1(i,:)=AA(i,:);
end
Cp=Cp1(:,1:6);
C=C1(:,1:7);
end
%rC=7;rCp=6;
CC=zeros(Tn1,7);CCp=zeros(Tn1,6);
for i=1:Tn1
if nq==1
for pq=1:6
if or (isequal(Cp{i,pq},{''}),isequal(Cp{i,pq},[]))
Cp{i,pq}=cell2mat({'0'});
end
CCp(i,pq)=str2num(Cp{i,pq});
end
for pq=1:7
if or (isequal(C{i,pq},{''}),isequal(C{i,pq},[]))
C{i,pq}=cell2mat({'0'}); %C{i,pq}={'0'};
end
CC(i,pq)=str2num(C{i,pq});
end
else
for pq=1:6
if or (isequal(Cp{i,pq},{''}),isequal(Cp{i,pq},[]))
Cp{i,pq}=cell2mat({'0'});
end
CCp(i,pq)=str2num( cell2mat(Cp(i,pq)) );
end
for pq=1:7
if or (isequal(C{i,pq},{''}),isequal(C{i,pq},[]))
C{i,pq}=cell2mat({'0'}); %C{i,pq}={'0'};
end
CC(i,pq)=str2num( cell2mat(C(i,pq)) );
end
end
end

Mat=CC;Matp=CCp;
szC=size(CC);rC=szC(1,2); szCp=size(CCp);rCp=szCp(1,2);

Dur=zeros(Tn1,4);
Dur(:,1)=Mat(:,3);
Index=cell(Tn1,1);
for i=1:Tn1
if or( cell2mat(pd1(i+1,3))=='tri', cell2mat(pd1(i+1,3))=='Tri')
if round(Dur(i,1),2)>0
Dur(i,2)=Dur(i,1)*0.9;
Dur(i,3)=Dur(i,1);

Dur(i,4)=Dur(i,1)*1.5;
Index{i}=[num2str(Dur(i,2)),';',num2str(Dur(i,3)),';',num2str(Dur(i,4))];
else
Index{i}=['0.009',';','0.01',';','0.15'];
end
end
end

Succe_Index=Matp(:,4:rCp);
Successor=cell(Tn1,1);
for i=1:Tn1
j=1;
a=nonzeros(Succe_Index(i,:));
na=length(a);
if na~=0
b='';
while j<=na
if j<na
b1=[num2str(Succe_Index(i,j)),'-FTS-0;'];
b=[b,b1];
else
b1=[num2str(Succe_Index(i,j)),'-FTS-0'];
b=[b,b1];
end
j=j+1;
end
else
b='N/A';
end
Successor{i}=b;
end
Duration=Dur(:,1);
pd=pd1(2:Tn1+1,3);
ID=ID1(:,1);
Table_Net=table(ID,NameID,Duration,pd,Index,Successor);

writetable(Table_Net,filenameC,'Delimiter','\t','WriteRowNames',true);
warning('off')
end

DetermineRows.m (subscript)
clc;
%This code finds the beginning of the 1st row of the activities/successors
%table. It also determines the total number of activities per file. It also
%finds the beginning of the 1st row of activities/durations
MFileName_1=table2cell(TFileName);
DirFileA=['j',jn,'.sm\J',jn,'_txt_Files\'];
nfile=length(MFileName_1);
%nfile=20;
pos=zeros(nfile,3);

for j=1:nfile

filenameA=[DirFileA,MFileName_1{j}];

T1 = readtable(filenameA,'HeaderLines', 0, 'ReadVariableNames',false,'Format', '%s%s%s%s%s%s%s%s');

i=1;c=0;cc=0;
n_act=str2num(jn)+2;

while c~=1
u=table2cell(T1(i,'Var1'));
v=cell2mat(u);
if isequal(v,'PRECEDENCE RELATIONS:')==1

i2=i+2;
%i3=i2+n_act-1;
c=0;
end

if isequal(v,'REQUESTS/DURATIONS:')==1
i4=i+3;
%i3=i4-u+1; c=1;
c=1;
end
i=i+1;
end
pos(j,:)=[i2,n_act,i4];
end
moda=mode(pos(:,1));
list1name=cell(nfile,1);list2name=cell(nfile,1);c1=0;c2=0;
for k=1:nfile
if pos(k,1)>moda
list1name(k)=MFileName_1(k);c1=c1+1;

else
list2name(k)=MFileName_1(k);c2=c2+1;k1=k;

end
end
nact=pos(k1,2);

NameFileIssue=cell(c1,1);NameFileFnl=cell(c2,1);c1=0;c2=0;
for k=1:nfile
if isequal(list2name(k),{''})
c1=c1+1;NameFileIssue(c1)=list1name(k);
end

if isequal(list1name(k),{''})
c2=c2+1;NameFileFnl(c2)=list2name(k);
end
end
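The duration rule implemented by the main script above can be stated compactly: each deterministic duration d becomes the triangular triple 0.9d; d; 1.5d, zero-duration milestones receive the fixed epsilon triple hard-coded in the script, and successor lists are rendered as finish-to-start ("FTS") strings with zero lag. A minimal Python sketch of these two formatting rules (function names are illustrative, not from the dissertation's code):

```python
def tri_triple(d):
    """Triangular duration triple low;mode;high with low = 0.9*d and
    high = 1.5*d; zero-duration (milestone) activities get the fixed
    epsilon triple used by the MATLAB script above."""
    if round(d, 2) > 0:
        return ";".join(f"{x:g}" for x in (0.9 * d, d, 1.5 * d))
    return "0.009;0.01;0.15"

def successor_string(successors):
    """Render a successor list as finish-to-start links with zero lag,
    e.g. [2, 3, 4] -> '2-FTS-0;3-FTS-0;4-FTS-0'."""
    return ";".join(f"{s}-FTS-0" for s in successors) if successors else "N/A"
```

For activity 2 of network j301_1 (duration 8, successors 6, 11, 15) these rules reproduce the Index entry 7.2;8;12 and the Successor entry 6-FTS-0;11-FTS-0;15-FTS-0 tabulated in Appendix D.1.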

Appendix C
Flowcharts
Appendix C.0: Meanings of Flowchart Symbols

Appendix C.1: Flowchart - CPM Forward Pass

Appendix C.2: Flowchart - CPM Backward Pass

Appendix C.3: Flowchart - Activity Float Calculations
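The flowcharts of Appendices C.1 through C.3 describe the standard CPM computations: a forward pass (ES/EF), a backward pass (LS/LF), and total float TF = LS - ES. As a compact textual counterpart, a Python sketch over a small hypothetical network (the activity data below are illustrative, not taken from the dissertation's networks):

```python
def cpm(acts):
    """acts: {id: (duration, [successor ids])} for an acyclic network.
    Returns {id: (ES, EF, LS, LF, TF)} via the standard CPM passes."""
    preds = {a: [] for a in acts}
    for a, (_, succ) in acts.items():
        for s in succ:
            preds[s].append(a)
    # Topological order so each activity is seen after its predecessors.
    order, seen = [], set()
    def visit(a):
        if a not in seen:
            seen.add(a)
            for p in preds[a]:
                visit(p)
            order.append(a)
    for a in acts:
        visit(a)
    # Forward pass: ES = max EF of predecessors (0 at the source).
    es, ef = {}, {}
    for a in order:
        es[a] = max((ef[p] for p in preds[a]), default=0)
        ef[a] = es[a] + acts[a][0]
    # Backward pass: LF = min LS of successors (project end at the sink).
    end = max(ef.values())
    ls, lf = {}, {}
    for a in reversed(order):
        lf[a] = min((ls[s] for s in acts[a][1]), default=end)
        ls[a] = lf[a] - acts[a][0]
    # Total float: TF = LS - ES (equivalently LF - EF).
    return {a: (es[a], ef[a], ls[a], lf[a], ls[a] - es[a]) for a in acts}

# Hypothetical 4-activity network: 1 -> {2, 3} -> 4.
net = {1: (0, [2, 3]), 2: (4, [4]), 3: (6, [4]), 4: (0, [])}
```

In this example activity 3 is critical (TF = 0) while activity 2 carries two units of total float.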

Appendix C.4: Flowchart - Formatting a Network for Activity Entry Computations

Appendix C.5: Flowchart - Probabilistic Duration Calculations

Appendix C.6: Flowchart - Network Path Determinations

Appendix C.7: Flowchart - Johnson Complexity Measure Calculation

Appendix D
Code Outputs/Network Activity Information

Appendix D.1: Converted ‘j301_1.sm’ to ‘J30_Tri_j301_1.txt’

ID NameID Duration pd Index Successor


1 1 0 Tri 0.009;0.01;0.015 2-FTS-0;3-FTS-0;4-FTS-0
2 2 8 Tri 7.2;8;12 6-FTS-0;11-FTS-0;15-FTS-0
3 3 4 Tri 3.6;4;6 7-FTS-0;8-FTS-0;13-FTS-0
4 4 6 Tri 5.4;6;9 5-FTS-0;9-FTS-0;10-FTS-0
5 5 3 Tri 2.7;3;4.5 20-FTS-0
6 6 8 Tri 7.2;8;12 30-FTS-0
7 7 5 Tri 4.5;5;7.5 27-FTS-0
8 8 9 Tri 8.1;9;13.5 12-FTS-0;19-FTS-0;27-FTS-0
9 9 2 Tri 1.8;2;3 14-FTS-0
10 10 7 Tri 6.3;7;10.5 16-FTS-0;25-FTS-0
11 11 9 Tri 8.1;9;13.5 20-FTS-0;26-FTS-0
12 12 2 Tri 1.8;2;3 14-FTS-0
13 13 6 Tri 5.4;6;9 17-FTS-0;18-FTS-0
14 14 3 Tri 2.7;3;4.5 17-FTS-0
15 15 9 Tri 8.1;9;13.5 25-FTS-0
16 16 10 Tri 9;10;15 21-FTS-0;22-FTS-0
17 17 6 Tri 5.4;6;9 22-FTS-0
18 18 5 Tri 4.5;5;7.5 20-FTS-0;22-FTS-0
19 19 3 Tri 2.7;3;4.5 24-FTS-0;29-FTS-0
20 20 7 Tri 6.3;7;10.5 23-FTS-0;25-FTS-0
21 21 2 Tri 1.8;2;3 28-FTS-0
22 22 7 Tri 6.3;7;10.5 23-FTS-0
23 23 2 Tri 1.8;2;3 24-FTS-0
24 24 3 Tri 2.7;3;4.5 30-FTS-0
25 25 3 Tri 2.7;3;4.5 30-FTS-0
26 26 7 Tri 6.3;7;10.5 31-FTS-0
27 27 8 Tri 7.2;8;12 28-FTS-0
28 28 3 Tri 2.7;3;4.5 31-FTS-0
29 29 7 Tri 6.3;7;10.5 32-FTS-0
30 30 2 Tri 1.8;2;3 32-FTS-0
31 31 2 Tri 1.8;2;3 32-FTS-0
32 32 0 Tri 0.009;0.01;0.015 N/A

Appendix D.2: PSPLIB Network j3038-7 Input Data

ID NameID Duration pd Index Successor


1 1 0 Tri 0.009;0.01;0.15 2-FTS-0;3-FTS-0;4-FTS-0
2 2 10 Tri 9;10;15 8-FTS-0;11-FTS-0;22-FTS-0
3 3 3 Tri 2.7;3;4.5 5-FTS-0;9-FTS-0;10-FTS-0
4 4 4 Tri 3.6;4;6 6-FTS-0;12-FTS-0;20-FTS-0
5 5 9 Tri 8.1;9;13.5 7-FTS-0;13-FTS-0;19-FTS-0
6 6 6 Tri 5.4;6;9 10-FTS-0;11-FTS-0;15-FTS-0
7 7 4 Tri 3.6;4;6 14-FTS-0;21-FTS-0;31-FTS-0
8 8 7 Tri 6.3;7;10.5 10-FTS-0;12-FTS-0;18-FTS-0
9 9 2 Tri 1.8;2;3 15-FTS-0;20-FTS-0;26-FTS-0
10 10 2 Tri 1.8;2;3 16-FTS-0;21-FTS-0;27-FTS-0
11 11 4 Tri 3.6;4;6 19-FTS-0;24-FTS-0;25-FTS-0
12 12 9 Tri 8.1;9;13.5 16-FTS-0;17-FTS-0;24-FTS-0
13 13 7 Tri 6.3;7;10.5 15-FTS-0;16-FTS-0;18-FTS-0
14 14 2 Tri 1.8;2;3 17-FTS-0;22-FTS-0;24-FTS-0
15 15 3 Tri 2.7;3;4.5 17-FTS-0;28-FTS-0;30-FTS-0
16 16 1 Tri 0.9;1;1.5 23-FTS-0;31-FTS-0
17 17 1 Tri 0.9;1;1.5 29-FTS-0
18 18 6 Tri 5.4;6;9 23-FTS-0;25-FTS-0
19 19 6 Tri 5.4;6;9 21-FTS-0;23-FTS-0;27-FTS-0
20 20 1 Tri 0.9;1;1.5 22-FTS-0;25-FTS-0;29-FTS-0
21 21 2 Tri 1.8;2;3 26-FTS-0
22 22 3 Tri 2.7;3;4.5 30-FTS-0
23 23 8 Tri 7.2;8;12 26-FTS-0
24 24 2 Tri 1.8;2;3 27-FTS-0
25 25 3 Tri 2.7;3;4.5 31-FTS-0
26 26 9 Tri 8.1;9;13.5 28-FTS-0
27 27 5 Tri 4.5;5;7.5 28-FTS-0;30-FTS-0
28 28 10 Tri 9;10;15 29-FTS-0
29 29 9 Tri 8.1;9;13.5 32-FTS-0
30 30 10 Tri 9;10;15 32-FTS-0
31 31 5 Tri 4.5;5;7.5 32-FTS-0
32 32 0 Tri 0.009;0.01;0.15 N/A

Appendix D.3: PSPLIB Network j902-4 Input Data

ID NameID Duration pd Index Successor


1 1 0 Tri 0.009;0.01;0.15 2-FTS-0;3-FTS-0;4-FTS-0
2 2 6 Tri 5.4;6;9 8-FTS-0;15-FTS-0;85-FTS-0
3 3 7 Tri 6.3;7;10.5 5-FTS-0;7-FTS-0;38-FTS-0
4 4 3 Tri 2.7;3;4.5 6-FTS-0;14-FTS-0
5 5 5 Tri 4.5;5;7.5 9-FTS-0;47-FTS-0
6 6 6 Tri 5.4;6;9 21-FTS-0;35-FTS-0
7 7 5 Tri 4.5;5;7.5 10-FTS-0;11-FTS-0;12-FTS-0
8 8 8 Tri 7.2;8;12 17-FTS-0
9 9 9 Tri 8.1;9;13.5 37-FTS-0;62-FTS-0
10 10 4 Tri 3.6;4;6 19-FTS-0;44-FTS-0
11 11 3 Tri 2.7;3;4.5 13-FTS-0;24-FTS-0;27-FTS-0
12 12 5 Tri 4.5;5;7.5 29-FTS-0;32-FTS-0
13 13 8 Tri 7.2;8;12 16-FTS-0;18-FTS-0;20-FTS-0
14 14 6 Tri 5.4;6;9 26-FTS-0
15 15 3 Tri 2.7;3;4.5 18-FTS-0;48-FTS-0
16 16 10 Tri 9;10;15 31-FTS-0;51-FTS-0
17 17 5 Tri 4.5;5;7.5 51-FTS-0;57-FTS-0;58-FTS-0
18 18 7 Tri 6.3;7;10.5 80-FTS-0
19 19 4 Tri 3.6;4;6 33-FTS-0;67-FTS-0
20 20 5 Tri 4.5;5;7.5 28-FTS-0
21 21 10 Tri 9;10;15 22-FTS-0
22 22 1 Tri 0.9;1;1.5 23-FTS-0;77-FTS-0
23 23 6 Tri 5.4;6;9 30-FTS-0
24 24 6 Tri 5.4;6;9 25-FTS-0;43-FTS-0;53-FTS-0
25 25 2 Tri 1.8;2;3 31-FTS-0;36-FTS-0;56-FTS-0
26 26 1 Tri 0.9;1;1.5 84-FTS-0
27 27 8 Tri 7.2;8;12 78-FTS-0
28 28 7 Tri 6.3;7;10.5 46-FTS-0;55-FTS-0;81-FTS-0
29 29 9 Tri 8.1;9;13.5 46-FTS-0
30 30 10 Tri 9;10;15 34-FTS-0;45-FTS-0;54-FTS-0
31 31 1 Tri 0.9;1;1.5 50-FTS-0
32 32 6 Tri 5.4;6;9 39-FTS-0;41-FTS-0
33 33 1 Tri 0.9;1;1.5 40-FTS-0
34 34 8 Tri 7.2;8;12 59-FTS-0
35 35 10 Tri 9;10;15 71-FTS-0;77-FTS-0
36 36 6 Tri 5.4;6;9 70-FTS-0;76-FTS-0
37 37 10 Tri 9;10;15 55-FTS-0
38 38 5 Tri 4.5;5;7.5 65-FTS-0;75-FTS-0

39 39 8 Tri 7.2;8;12 63-FTS-0
40 40 1 Tri 0.9;1;1.5 42-FTS-0;82-FTS-0
41 41 6 Tri 5.4;6;9 68-FTS-0
42 42 2 Tri 1.8;2;3 87-FTS-0
43 43 2 Tri 1.8;2;3 47-FTS-0;61-FTS-0;63-FTS-0
44 44 9 Tri 8.1;9;13.5 61-FTS-0
45 45 7 Tri 6.3;7;10.5 48-FTS-0;52-FTS-0
46 46 8 Tri 7.2;8;12 49-FTS-0;88-FTS-0
47 47 8 Tri 7.2;8;12 64-FTS-0
48 48 2 Tri 1.8;2;3 60-FTS-0
49 49 1 Tri 0.9;1;1.5 86-FTS-0
50 50 5 Tri 4.5;5;7.5 59-FTS-0
51 51 10 Tri 9;10;15 81-FTS-0
52 52 8 Tri 7.2;8;12 80-FTS-0
53 53 7 Tri 6.3;7;10.5 67-FTS-0;83-FTS-0
54 54 2 Tri 1.8;2;3 91-FTS-0
55 55 7 Tri 6.3;7;10.5 69-FTS-0
56 56 1 Tri 0.9;1;1.5 63-FTS-0
57 57 6 Tri 5.4;6;9 84-FTS-0
58 58 3 Tri 2.7;3;4.5 74-FTS-0;76-FTS-0
59 59 1 Tri 0.9;1;1.5 86-FTS-0
60 60 6 Tri 5.4;6;9 72-FTS-0;75-FTS-0
61 61 3 Tri 2.7;3;4.5 87-FTS-0
62 62 10 Tri 9;10;15 66-FTS-0
63 63 8 Tri 7.2;8;12 80-FTS-0
64 64 1 Tri 0.9;1;1.5 76-FTS-0
65 65 5 Tri 4.5;5;7.5 89-FTS-0
66 66 4 Tri 3.6;4;6 68-FTS-0
67 67 7 Tri 6.3;7;10.5 88-FTS-0
68 68 4 Tri 3.6;4;6 82-FTS-0
69 69 1 Tri 0.9;1;1.5 84-FTS-0
70 70 2 Tri 1.8;2;3 73-FTS-0
71 71 1 Tri 0.9;1;1.5 75-FTS-0
72 72 3 Tri 2.7;3;4.5 73-FTS-0
73 73 5 Tri 4.5;5;7.5 79-FTS-0
74 74 10 Tri 9;10;15 78-FTS-0
75 75 6 Tri 5.4;6;9 82-FTS-0
76 76 4 Tri 3.6;4;6 79-FTS-0;81-FTS-0;88-FTS-0
77 77 9 Tri 8.1;9;13.5 83-FTS-0;85-FTS-0
78 78 1 Tri 0.9;1;1.5 79-FTS-0
79 79 4 Tri 3.6;4;6 83-FTS-0
80 80 1 Tri 0.9;1;1.5 90-FTS-0

81 81 6 Tri 5.4;6;9 90-FTS-0
82 82 6 Tri 5.4;6;9 90-FTS-0
83 83 3 Tri 2.7;3;4.5 89-FTS-0
84 84 8 Tri 7.2;8;12 86-FTS-0
85 85 4 Tri 3.6;4;6 87-FTS-0
86 86 1 Tri 0.9;1;1.5 89-FTS-0
87 87 6 Tri 5.4;6;9 91-FTS-0
88 88 5 Tri 4.5;5;7.5 91-FTS-0
89 89 4 Tri 3.6;4;6 92-FTS-0
90 90 6 Tri 5.4;6;9 92-FTS-0
91 91 9 Tri 8.1;9;13.5 92-FTS-0
92 92 0 Tri 0.009;0.01;0.15 N/A

Appendix E
Code Outputs/Network
Appendix E.1: All Paths from Source to Sink of Network j301

Path# All possible paths of the network from ‘Source’ to ‘Sink’


1 Source 2 6 30 Sink
2 Source 2 11 20 23 24 30 Sink
3 Source 2 11 20 25 30 Sink
4 Source 2 11 26 31 Sink
5 Source 2 15 25 30 Sink
6 Source 3 7 27 28 31 Sink
7 Source 3 8 12 14 17 22 23 24 30 Sink
8 Source 3 8 19 24 30 Sink
9 Source 3 8 19 29 Sink
10 Source 3 8 27 28 31 Sink
11 Source 3 13 17 22 23 24 30 Sink
12 Source 3 13 18 20 23 24 30 Sink
13 Source 3 13 18 20 25 30 Sink
14 Source 3 13 18 22 23 24 30 Sink
15 Source 4 5 20 23 24 30 Sink
16 Source 4 5 20 25 30 Sink
17 Source 4 9 14 17 22 23 24 30 Sink
18 Source 4 10 16 21 28 31 Sink
19 Source 4 10 16 22 23 24 30 Sink
20 Source 4 10 25 30 Sink
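The twenty paths above can be reproduced mechanically from the successor lists of network j301_1 (Appendix D.1) with a depth-first search from the source (activity 1) to the sink (activity 32). A Python sketch (the function name is illustrative; the adjacency data are transcribed from Appendix D.1):

```python
# Successor lists of network j301_1, transcribed from Appendix D.1.
SUCC = {
    1: [2, 3, 4], 2: [6, 11, 15], 3: [7, 8, 13], 4: [5, 9, 10],
    5: [20], 6: [30], 7: [27], 8: [12, 19, 27], 9: [14], 10: [16, 25],
    11: [20, 26], 12: [14], 13: [17, 18], 14: [17], 15: [25],
    16: [21, 22], 17: [22], 18: [20, 22], 19: [24, 29], 20: [23, 25],
    21: [28], 22: [23], 23: [24], 24: [30], 25: [30], 26: [31],
    27: [28], 28: [31], 29: [32], 30: [32], 31: [32], 32: [],
}

def all_paths(succ, source=1, sink=32):
    """Enumerate every source-to-sink path by iterative depth-first
    search; pushing successors in reverse keeps the listing order of
    Appendix E.1."""
    paths, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for s in reversed(succ[node]):
            stack.append((s, path + [s]))
    return paths
```

This yields exactly the 20 paths tabulated above, beginning with Source-2-6-30-Sink and ending with Source-4-10-25-30-Sink.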

Appendix E.2 – Network Complexity Measures

Appendix E.2.1 - PSPLIB Networks - Ranges of Coefficient of Network Complexity (CNC)

Group #1 (CNC Values provided for only 120 of the 680 Networks) --------page 423

Group #2: None

Group #3 (CNC Values provided for only 120 of the 680 Networks) --------page 424

Group #4: None

Group #5 (CNC Values provided for only 120 of the 680 Networks) --------page 425
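If CNC is computed in the usual PSPLIB sense as the number of precedence arcs divided by the number of nodes (an assumption here; the dissertation's exact definition appears in the body text), the three group values in the following tables correspond to 32-node j30 instances with 48, 58, and 68 arcs respectively:

```python
def cnc(num_arcs: int, num_nodes: int) -> float:
    """Coefficient of Network Complexity, assumed here to be arcs per node."""
    return num_arcs / num_nodes

# 32 nodes = 30 activities plus the Source and Sink dummy nodes (j30 instances)
values = [cnc(arcs, 32) for arcs in (48, 58, 68)]  # [1.5, 1.8125, 2.125]
```

Under this assumption the discrete CNC levels in the tables (1.5, 1.8125, 2.125) fall out directly from the arc counts, which is why every network within a group shares one identical value.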

422

PSPLIB Networks - Ranges of CNC Values - Group #1: CNC ≥ 1.5
Net Net Name CNC Net Net Name CNC Net Net Name CNC
1 j3010_1 1.5 41 j3014_10 1.5 81 j302_3 1.5
2 j3010_10 1.5 42 j3014_2 1.5 82 j302_4 1.5
3 j3010_2 1.5 43 j3014_3 1.5 83 j302_5 1.5
4 j3010_3 1.5 44 j3014_4 1.5 84 j302_6 1.5
5 j3010_4 1.5 45 j3014_5 1.5 85 j302_7 1.5
6 j3010_5 1.5 46 j3014_6 1.5 86 j302_8 1.5
7 j3010_6 1.5 47 j3014_7 1.5 87 j302_9 1.5
8 j3010_7 1.5 48 j3014_8 1.5 88 j303_1 1.5
9 j3010_8 1.5 49 j3014_9 1.5 89 j303_10 1.5
10 j3010_9 1.5 50 j3015_1 1.5 90 j303_2 1.5
11 j3011_1 1.5 51 j3015_10 1.5 91 j303_3 1.5
12 j3011_10 1.5 52 j3015_2 1.5 92 j303_4 1.5
13 j3011_2 1.5 53 j3015_3 1.5 93 j303_5 1.5
14 j3011_3 1.5 54 j3015_4 1.5 94 j303_6 1.5
15 j3011_4 1.5 55 j3015_5 1.5 95 j303_7 1.5
16 j3011_5 1.5 56 j3015_6 1.5 96 j303_8 1.5
17 j3011_6 1.5 57 j3015_7 1.5 97 j303_9 1.5
18 j3011_7 1.5 58 j3015_9 1.5 98 j304_1 1.5
19 j3011_8 1.5 59 j3016_1 1.5 99 j304_10 1.5
20 j3011_9 1.5 60 j3016_10 1.5 100 j304_2 1.5
21 j3012_1 1.5 61 j3016_2 1.5 101 j304_3 1.5
22 j3012_10 1.5 62 j3016_3 1.5 102 j304_4 1.5
23 j3012_2 1.5 63 j3016_4 1.5 103 j304_5 1.5
24 j3012_3 1.5 64 j3016_5 1.5 104 j304_7 1.5
25 j3012_4 1.5 65 j3016_6 1.5 105 j304_9 1.5
26 j3012_5 1.5 66 j3016_7 1.5 106 j305_1 1.5
27 j3012_6 1.5 67 j3016_8 1.5 107 j305_10 1.5
28 j3012_7 1.5 68 j3016_9 1.5 108 j305_2 1.5
29 j3012_8 1.5 69 j301_1 1.5 109 j305_3 1.5
30 j3012_9 1.5 70 j301_10 1.5 110 j305_4 1.5
31 j3013_1 1.5 71 j301_2 1.5 111 j305_5 1.5
32 j3013_10 1.5 72 j301_3 1.5 112 j305_6 1.5
33 j3013_2 1.5 73 j301_4 1.5 113 j305_7 1.5
34 j3013_3 1.5 74 j301_5 1.5 114 j305_8 1.5
35 j3013_4 1.5 75 j301_6 1.5 115 j305_9 1.5
36 j3013_5 1.5 76 j301_7 1.5 116 j306_1 1.5
37 j3013_7 1.5 77 j301_8 1.5 117 j306_2 1.5
38 j3013_8 1.5 78 j301_9 1.5 118 j306_3 1.5
39 j3013_9 1.5 79 j302_1 1.5 119 j306_4 1.5
40 j3014_1 1.5 80 j302_2 1.5 120 j306_5 1.5

423

PSPLIB Networks - Ranges of CNC Values - Group #3: CNC > 1.78
Net Net Name CNC Net Net Name CNC Net Net Name CNC
1 j3017_1 1.8125 41 j3021_10 1.8125 81 j3025_4 1.8125
2 j3017_10 1.8125 42 j3021_2 1.8125 82 j3025_5 1.8125
3 j3017_2 1.8125 43 j3021_3 1.8125 83 j3025_6 1.8125
4 j3017_3 1.8125 44 j3021_4 1.8125 84 j3025_7 1.8125
5 j3017_4 1.8125 45 j3021_5 1.8125 85 j3025_8 1.8125
6 j3017_5 1.8125 46 j3021_6 1.8125 86 j3025_9 1.8125
7 j3017_6 1.8125 47 j3021_7 1.8125 87 j3026_1 1.8125
8 j3017_7 1.8125 48 j3021_8 1.8125 88 j3026_10 1.8125
9 j3017_8 1.8125 49 j3022_1 1.8125 89 j3026_2 1.8125
10 j3017_9 1.8125 50 j3022_10 1.8125 90 j3026_3 1.8125
11 j3018_1 1.8125 51 j3022_2 1.8125 91 j3026_4 1.8125
12 j3018_10 1.8125 52 j3022_3 1.8125 92 j3026_5 1.8125
13 j3018_2 1.8125 53 j3022_4 1.8125 93 j3026_6 1.8125
14 j3018_3 1.8125 54 j3022_5 1.8125 94 j3026_7 1.8125
15 j3018_4 1.8125 55 j3022_7 1.8125 95 j3026_8 1.8125
16 j3018_5 1.8125 56 j3022_8 1.8125 96 j3026_9 1.8125
17 j3018_6 1.8125 57 j3022_9 1.8125 97 j3027_1 1.8125
18 j3018_7 1.8125 58 j3023_1 1.8125 98 j3027_10 1.8125
19 j3018_8 1.8125 59 j3023_10 1.8125 99 j3027_2 1.8125
20 j3018_9 1.8125 60 j3023_2 1.8125 100 j3027_3 1.8125
21 j3019_1 1.8125 61 j3023_3 1.8125 101 j3027_4 1.8125
22 j3019_10 1.8125 62 j3023_4 1.8125 102 j3027_5 1.8125
23 j3019_2 1.8125 63 j3023_5 1.8125 103 j3027_6 1.8125
24 j3019_3 1.8125 64 j3023_6 1.8125 104 j3027_7 1.8125
25 j3019_4 1.8125 65 j3023_7 1.8125 105 j3027_8 1.8125
26 j3019_5 1.8125 66 j3023_8 1.8125 106 j3027_9 1.8125
27 j3019_6 1.8125 67 j3023_9 1.8125 107 j3028_1 1.8125
28 j3019_7 1.8125 68 j3024_10 1.8125 108 j3028_10 1.8125
29 j3019_9 1.8125 69 j3024_2 1.8125 109 j3028_2 1.8125
30 j3020_1 1.8125 70 j3024_3 1.8125 110 j3028_3 1.8125
31 j3020_10 1.8125 71 j3024_4 1.8125 111 j3028_4 1.8125
32 j3020_2 1.8125 72 j3024_5 1.8125 112 j3028_5 1.8125
33 j3020_3 1.8125 73 j3024_6 1.8125 113 j3028_6 1.8125
34 j3020_4 1.8125 74 j3024_7 1.8125 114 j3028_7 1.8125
35 j3020_5 1.8125 75 j3024_8 1.8125 115 j3028_8 1.8125
36 j3020_6 1.8125 76 j3024_9 1.8125 116 j3028_9 1.8125
37 j3020_7 1.8125 77 j3025_1 1.8125 117 j3029_1 1.8125
38 j3020_8 1.8125 78 j3025_10 1.8125 118 j3029_10 1.8125
39 j3020_9 1.8125 79 j3025_2 1.8125 119 j3029_2 1.8125
40 j3021_1 1.8125 80 j3025_3 1.8125 120 j3029_3 1.8125

424

PSPLIB Networks - Ranges of CNC Values - Group #5: CNC > 2.06
Net Net Name CNC Net Net Name CNC Net Net Name CNC
1 j3033_1 2.125 41 j3037_3 2.125 81 j3041_5 2.125
2 j3033_10 2.125 42 j3037_4 2.125 82 j3041_6 2.125
3 j3033_2 2.125 43 j3037_5 2.125 83 j3041_7 2.125
4 j3033_3 2.125 44 j3037_6 2.125 84 j3041_8 2.125
5 j3033_4 2.125 45 j3037_7 2.125 85 j3042_1 2.125
6 j3033_5 2.125 46 j3037_8 2.125 86 j3042_10 2.125
7 j3033_6 2.125 47 j3037_9 2.125 87 j3042_2 2.125
8 j3033_7 2.125 48 j3038_1 2.125 88 j3042_3 2.125
9 j3033_9 2.125 49 j3038_10 2.125 89 j3042_4 2.125
10 j3034_1 2.125 50 j3038_2 2.125 90 j3042_5 2.125
11 j3034_10 2.125 51 j3038_3 2.125 91 j3042_6 2.125
12 j3034_2 2.125 52 j3038_4 2.125 92 j3042_7 2.125
13 j3034_3 2.125 53 j3038_5 2.125 93 j3042_8 2.125
14 j3034_4 2.125 54 j3038_6 2.125 94 j3042_9 2.125
15 j3034_5 2.125 55 j3038_7 2.125 95 j3043_1 2.125
16 j3034_6 2.125 56 j3038_8 2.125 96 j3043_10 2.125
17 j3034_7 2.125 57 j3038_9 2.125 97 j3043_2 2.125
18 j3034_8 2.125 58 j3039_1 2.125 98 j3043_3 2.125
19 j3034_9 2.125 59 j3039_10 2.125 99 j3043_4 2.125
20 j3035_1 2.125 60 j3039_2 2.125 100 j3043_5 2.125
21 j3035_10 2.125 61 j3039_3 2.125 101 j3043_6 2.125
22 j3035_2 2.125 62 j3039_4 2.125 102 j3043_7 2.125
23 j3035_3 2.125 63 j3039_5 2.125 103 j3043_8 2.125
24 j3035_4 2.125 64 j3039_7 2.125 104 j3043_9 2.125
25 j3035_5 2.125 65 j3039_8 2.125 105 j3044_1 2.125
26 j3035_6 2.125 66 j3039_9 2.125 106 j3044_10 2.125
27 j3035_7 2.125 67 j3040_1 2.125 107 j3044_2 2.125
28 j3035_9 2.125 68 j3040_10 2.125 108 j3044_3 2.125
29 j3036_1 2.125 69 j3040_2 2.125 109 j3044_4 2.125
30 j3036_10 2.125 70 j3040_3 2.125 110 j3044_5 2.125
31 j3036_2 2.125 71 j3040_4 2.125 111 j3044_6 2.125
32 j3036_3 2.125 72 j3040_6 2.125 112 j3044_7 2.125
33 j3036_4 2.125 73 j3040_7 2.125 113 j3044_8 2.125
34 j3036_5 2.125 74 j3040_8 2.125 114 j3044_9 2.125
35 j3036_6 2.125 75 j3040_9 2.125 115 j3045_1 2.125
36 j3036_7 2.125 76 j3041_1 2.125 116 j3045_10 2.125
37 j3036_8 2.125 77 j3041_10 2.125 117 j3045_2 2.125
38 j3036_9 2.125 78 j3041_2 2.125 118 j3045_4 2.125
39 j3037_1 2.125 79 j3041_3 2.125 119 j3045_5 2.125
40 j3037_2 2.125 80 j3041_4 2.125 120 j3045_6 2.125

425

Appendix E.2.2 - PSPLIB Networks - Ranges of Paths Ratio Values

Group #1 (Ratios provided for only 135 of the 1659 Networks) --------page 427

Group #2 (Ratios provided for only 135 of the 284 Networks) --------page 428

Group #3 (Ratios provided for all 67 Networks) --------page 429

Group #4 (Ratios provided for all 24 Networks) --------page 430

Group #5 (Ratios provided for all 6 Networks) --------page 430

426

PSPLIB Networks - Paths Ratios (%) for 100 Simulation Runs - Group #1: Ratios > 0%

No. Net Name Ratio No. Net Name Ratio No. Net Name Ratio
1 j3043_8 0.64935 46 j3039_1 2.08333 91 j3040_9 3.19149
2 j3045_8 0.79365 47 j3029_9 2.12766 92 j3048_9 3.19149
3 j3037_9 0.87719 48 j3047_2 2.12766 93 j3028_9 3.22581
4 j3045_2 0.89286 49 j3046_2 2.15054 94 j3030_7 3.22581
5 j3035_8 0.95238 50 j3039_7 2.17391 95 j3043_5 3.22581
6 j3042_10 0.96154 51 j3037_4 2.19780 96 j3039_9 3.25203
7 j3036_3 0.99010 52 j3030_2 2.22222 97 j3034_7 3.30579
8 j3036_10 1.02041 53 j3046_6 2.24719 98 j3017_2 3.33333
9 j3042_5 1.02041 54 j3026_8 2.27273 99 j303_4 3.33333
10 j3045_6 1.02041 55 j3034_5 2.32558 100 j3035_1 3.36134
11 j3043_4 1.03093 56 j3046_1 2.32558 101 j3045_9 3.37079
12 j3033_4 1.05263 57 j3038_6 2.36220 102 j3032_1 3.40909
13 j3042_8 1.06383 58 j3019_3 2.38095 103 j3035_4 3.48837
14 j3035_3 1.07527 59 j3043_1 2.38095 104 j3043_3 3.52941
15 j3041_9 1.19048 60 j3047_1 2.45098 105 j3045_5 3.57143
16 j3044_1 1.20482 61 j3041_6 2.46914 106 j3039_5 3.61446
17 j3046_3 1.21951 62 j3034_4 2.50000 107 j3047_6 3.61446
18 j3041_3 1.25000 63 j3040_3 2.50000 108 j3018_4 3.63636
19 j3035_7 1.38889 64 j3048_7 2.50000 109 j3019_5 3.63636
20 j3042_9 1.38889 65 j3034_6 2.53165 110 j3034_9 3.63636
21 j3048_1 1.39860 66 j3042_1 2.53165 111 j3041_7 3.64964
22 j3038_4 1.51515 67 j3036_4 2.58621 112 j3040_6 3.67647
23 j3022_9 1.61290 68 j3048_2 2.58621 113 j3023_3 3.70370
24 j3040_5 1.63934 69 j3033_6 2.63158 114 j3033_9 3.70370
25 j3044_6 1.71429 70 j3035_6 2.63158 115 j3042_6 3.70370
26 j3039_2 1.72414 71 j3045_4 2.67857 116 j3018_3 3.77358
27 j3045_7 1.72414 72 j3031_1 2.70270 117 j3039_4 3.79747
28 j3028_5 1.75439 73 j3031_2 2.70270 118 j3043_7 3.79747
29 j3036_1 1.76991 74 j3043_6 2.70270 119 j3048_5 3.79747
30 j3037_8 1.76991 75 j3042_3 2.75229 120 j3039_8 3.80952
31 j3036_8 1.78571 76 j3023_8 2.77778 121 j3024_3 3.84615
32 j3043_10 1.81818 77 j3025_1 2.77778 122 j3037_6 3.84615
33 j3025_10 1.85185 78 j3038_7 2.77778 123 j3037_7 3.84615
34 j3038_1 1.86916 79 j3048_3 2.77778 124 j3025_5 3.92157
35 j3047_3 1.88679 80 j3044_9 2.81690 125 j3032_2 3.92157
36 j3035_5 1.90476 81 j3035_10 2.85714 126 j3040_2 4.00000
37 j3037_3 1.90476 82 j3045_1 2.85714 127 j3037_10 4.06504
38 j3042_7 1.90476 83 j3019_8 2.94118 128 j3028_6 4.08163
39 j3047_4 1.98020 84 j3046_9 3.00000 129 j3037_2 4.10959
40 j3021_3 2.00000 85 j3048_4 3.01205 130 j3023_1 4.16667
41 j3021_7 2.00000 86 j3026_1 3.03030 131 j303_3 4.16667
42 j3028_10 2.04082 87 j3040_8 3.03030 132 j3017_3 4.25532
43 j3024_4 2.08333 88 j3045_10 3.12500 133 j3030_5 4.25532
44 j3027_8 2.08333 89 j3046_8 3.17460 134 j3044_10 4.25532
45 j3036_5 2.08333 90 j3048_8 3.17460 135 j3041_5 4.26829

427

PSPLIB Networks - Paths Ratios (%) for 100 Simulation Runs - Group #2: Ratios ≥ 7%

No. Net Name Ratio No. Net Name Ratio No. Net Name Ratio
1 j3047_8 7.00000 46 j308_2 8.69565 91 j3026_10 10.34483
2 j3018_5 7.14286 47 j3023_9 8.82353 92 j3012_5 10.52632
3 j3027_2 7.14286 48 j3036_7 8.82353 93 j3015_1 10.52632
4 j3027_6 7.14286 49 j3026_5 8.88889 94 j302_2 10.52632
5 j3031_8 7.14286 50 j3032_6 8.88889 95 j3032_4 10.52632
6 j3028_7 7.27273 51 j3010_2 9.09091 96 j304_3 10.52632
7 j3023_6 7.31707 52 j3013_9 9.09091 97 j305_10 10.52632
8 j3032_8 7.31707 53 j3015_5 9.09091 98 j305_2 10.52632
9 j3021_2 7.35294 54 j3019_6 9.09091 99 j306_8 10.52632
10 j3038_2 7.50000 55 j304_2 9.09091 100 j307_6 10.52632
11 j3044_7 7.56303 56 j304_5 9.09091 101 j308_7 10.52632
12 j3018_6 7.69231 57 j3033_5 9.19540 102 j304_8 10.71429
13 j3023_4 7.69231 58 j3039_3 9.25926 103 j3018_10 10.81081
14 j3023_7 7.69231 59 j3018_8 9.30233 104 j3025_4 10.81081
15 j3024_2 7.69231 60 j3027_1 9.30233 105 j3012_6 11.11111
16 j309_10 7.69231 61 j3011_1 9.52381 106 j3017_4 11.11111
17 j3029_4 7.84314 62 j3012_3 9.52381 107 j3018_9 11.11111
18 j3044_3 7.86517 63 j3013_6 9.52381 108 j3020_10 11.11111
19 j3047_5 7.86517 64 j3015_3 9.52381 109 j3020_4 11.11111
20 j3019_10 7.89474 65 j301_7 9.52381 110 j3020_9 11.11111
21 j309_7 8.00000 66 j303_9 9.52381 111 j3023_10 11.11111
22 j3041_1 8.06452 67 j305_3 9.52381 112 j3024_9 11.11111
23 j3036_2 8.10811 68 j309_5 9.52381 113 j302_7 11.11111
24 j3047_7 8.13953 69 j3021_1 9.61538 114 j3032_10 11.11111
25 j3022_10 8.16327 70 j3022_8 9.75610 115 j3041_10 11.11111
26 j3019_7 8.33333 71 j3025_3 9.75610 116 j305_9 11.11111
27 j3024_6 8.33333 72 j3030_1 9.75610 117 j306_2 11.11111
28 j3025_6 8.33333 73 j3047_9 9.82143 118 j3022_5 11.62791
29 j302_4 8.33333 74 j3041_8 9.89011 119 j3020_3 11.76471
30 j3030_6 8.33333 75 j3010_5 10.00000 120 j3012_7 12.00000
31 j3041_4 8.43373 76 j3011_4 10.00000 121 j3016_5 12.00000
32 j3022_2 8.47458 77 j3014_7 10.00000 122 j3030_4 12.00000
33 j3025_7 8.51064 78 j301_1 10.00000 123 j3017_1 12.12121
34 j3042_2 8.53659 79 j301_3 10.00000 124 j3022_6 12.12121
35 j3022_7 8.57143 80 j3025_8 10.00000 125 j3018_2 12.16216
36 j3037_5 8.57143 81 j3025_9 10.00000 126 j3029_7 12.19512
37 j3043_2 8.57143 82 j3026_7 10.00000 127 j3013_3 12.50000
38 j3044_4 8.62069 83 j3032_3 10.00000 128 j303_8 12.50000
39 j3021_4 8.64198 84 j308_10 10.00000 129 j304_1 12.50000
40 j3014_3 8.69565 85 j3034_2 10.09174 130 j3027_5 12.76596
41 j3015_10 8.69565 86 j3046_10 10.11236 131 j3014_5 13.04348
42 j3016_9 8.69565 87 j3021_10 10.20408 132 j306_9 13.04348
43 j3022_1 8.69565 88 j3038_5 10.22727 133 j3019_1 13.15789
44 j303_10 8.69565 89 j3021_6 10.25641 134 j3019_4 13.15789
45 j307_4 8.69565 90 j3021_5 10.34483 135 j3029_5 13.15789

428

PSPLIB Networks - Paths Ratios (%) for 100 Simulation Runs - Group #3: Ratios ≥ 14%

No. Net Name Ratio No. Net Name Ratio No. Net Name Ratio
1 j3017_5 14.00000 24 j308_6 15.00000 47 j6012_6 16.66667
2 j3018_1 14.00000 25 j309_1 15.00000 48 j6012_9 16.66667
3 j3020_5 14.00000 26 j309_9 15.00000 49 j3022_4 17.07317
4 j3012_4 14.28571 27 j6011_1 15.00000 50 j3021_9 17.14286
5 j3013_4 14.28571 28 j608_2 15.15152 51 j3029_3 17.14286
6 j3016_2 14.28571 29 j3027_7 15.21739 52 j3011_5 17.39130
7 j306_5 14.28571 30 j3014_10 15.38462 53 j306_7 17.39130
8 j308_1 14.28571 31 j3010_10 15.78947 54 j3023_5 17.94872
9 j6014_1 14.28571 32 j3011_2 15.78947 55 j3010_4 18.18182
10 j6014_5 14.28571 33 j3013_5 15.78947 56 j3024_8 18.18182
11 j608_4 14.28571 34 j301_5 15.78947 57 j304_6 18.18182
12 j9015_2 14.58333 35 j302_5 15.78947 58 j3011_10 19.04762
13 j6016_9 14.63415 36 j3032_7 15.78947 59 j302_3 19.04762
14 j6013_5 14.70588 37 j307_7 15.78947 60 j3030_9 19.04762
15 j602_5 14.70588 38 j308_9 15.78947 61 j306_10 19.04762
16 j602_7 14.70588 39 j309_3 15.78947 62 j3029_6 19.51220
17 j3010_1 15.00000 40 j302_10 16.00000 63 j3013_8 20.00000
18 j3014_4 15.00000 41 j306_3 16.00000 64 j3015_2 20.00000
19 j301_9 15.00000 42 j3019_9 16.21622 65 j307_10 20.00000
20 j304_9 15.00000 43 j3028_3 16.21622 66 j6016_6 20.00000
21 j305_5 15.00000 44 j3015_9 16.66667 67 j3014_1 20.83333
22 j306_6 15.00000 45 j302_8 16.66667
23 j307_3 15.00000 46 j307_8 16.66667

429

PSPLIB Networks - Paths Ratios (%) for 100 Simulation Runs - Group #4: Ratios > 21%

No. Net Name Ratio No. Net Name Ratio No. Net Name Ratio
1 j3010_8 21.05263
2 j3012_2 21.05263
3 j3014_6 21.05263
4 j3016_3 21.05263
5 j3016_4 21.05263
6 j302_9 21.05263
7 j305_1 21.05263
8 j307_5 21.05263
9 j308_8 21.05263
10 j3011_7 22.22222
11 j301_2 22.22222
12 j305_6 22.22222
13 j307_2 22.72727
14 j3016_7 23.80952
15 j303_6 23.80952
16 j3020_6 24.48980
17 j301_10 25.00000
18 j3012_1 26.31579
19 j3014_2 26.31579
20 j305_4 26.31579
21 j301_6 26.92308
22 j3013_2 27.77778
23 j305_7 27.77778
24 j309_2 27.77778

PSPLIB Networks - Paths Ratios (%) for 100 Simulation Runs - Group #5: Ratios ≥ 28%

No. Net Name Ratio No. Net Name Ratio No. Net Name Ratio
1 j303_1 28.00000
2 j302_1 30.00000
3 j3013_1 30.43478
4 j303_5 30.43478
5 j3012_8 31.57895
6 j306_4 31.57895

430

Appendix E.2.3 - PSPLIB Networks - Ranges of Johnson Measure (D) Values

Group #1 (D Values provided for only 135 of the 480 Networks) --------page 432

Group #2 (D Values provided for only 135 of the 639 Networks) --------page 433

Group #3 (D Values provided for only 135 of the 361 Networks) --------page 434

Group #4 (D Values provided for only 135 of the 360 Networks) --------page 435

Group #5 (D Values provided for only 135 of the 200 Networks) --------page 436

431

PSPLIB Networks - Ranges of Johnson Measure (D) Values - Group #1: D > 10
No. Net Name D No. Net Name D No. Net Name D
1 j304_8 13 46 j302_3 16 91 j3016_3 17
2 j303_4 14 47 j302_6 16 92 j3016_4 17
3 j307_4 14 48 j303_10 16 93 j3016_6 17
4 j309_10 14 49 j303_2 16 94 j3016_8 17
5 j3010_6 15 50 j303_6 16 95 j301_1 17
6 j3012_4 15 51 j304_10 16 96 j301_10 17
7 j3014_1 15 52 j304_5 16 97 j301_3 17
8 j3014_3 15 53 j304_6 16 98 j301_4 17
9 j3015_5 15 54 j306_1 16 99 j301_5 17
10 j3016_5 15 55 j306_5 16 100 j302_1 17
11 j301_6 15 56 j306_7 16 101 j302_10 17
12 j302_4 15 57 j306_9 16 102 j302_2 17
13 j303_1 15 58 j307_10 16 103 j302_5 17
14 j304_1 15 59 j307_9 16 104 j302_9 17
15 j304_2 15 60 j308_1 16 105 j303_3 17
16 j304_4 15 61 j308_5 16 106 j303_5 17
17 j309_5 15 62 j309_4 16 107 j303_7 17
18 j309_6 15 63 j309_8 16 108 j303_8 17
19 j309_7 15 64 j3010_1 17 109 j303_9 17
20 j3010_3 16 65 j3010_10 17 110 j304_3 17
21 j3010_4 16 66 j3010_2 17 111 j304_7 17
22 j3010_7 16 67 j3010_5 17 112 j304_9 17
23 j3011_1 16 68 j3010_8 17 113 j305_1 17
24 j3011_4 16 69 j3010_9 17 114 j305_10 17
25 j3011_5 16 70 j3011_10 17 115 j305_2 17
26 j3011_9 16 71 j3011_2 17 116 j305_3 17
27 j3012_3 16 72 j3011_3 17 117 j305_4 17
28 j3012_9 16 73 j3011_6 17 118 j305_5 17
29 j3013_1 16 74 j3011_8 17 119 j305_8 17
30 j3013_3 16 75 j3012_1 17 120 j306_10 17
31 j3013_4 16 76 j3012_2 17 121 j306_3 17
32 j3013_6 16 77 j3012_5 17 122 j306_4 17
33 j3013_7 16 78 j3012_7 17 123 j306_6 17
34 j3013_9 16 79 j3012_8 17 124 j306_8 17
35 j3014_4 16 80 j3013_5 17 125 j307_2 17
36 j3014_7 16 81 j3013_8 17 126 j307_3 17
37 j3015_10 16 82 j3014_10 17 127 j307_5 17
38 j3015_3 16 83 j3014_2 17 128 j307_6 17
39 j3015_4 16 84 j3014_5 17 129 j307_7 17
40 j3015_7 16 85 j3014_6 17 130 j308_10 17
41 j3015_8 16 86 j3014_8 17 131 j308_2 17
42 j3016_7 16 87 j3014_9 17 132 j308_3 17
43 j3016_9 16 88 j3015_1 17 133 j308_6 17
44 j301_7 16 89 j3015_2 17 134 j308_7 17
45 j301_9 16 90 j3016_2 17 135 j308_8 17

432

PSPLIB Networks - Ranges of Johnson Measure (D) Values - Group #2: D ≥ 27
No. Net Name D No. Net Name D No. Net Name D
1 j3020_8 27 46 j602_2 29 91 j606_8 30
2 j3034_5 27 47 j602_4 29 92 j606_9 30
3 j3034_8 27 48 j602_6 29 93 j607_10 30
4 j3035_2 27 49 j605_1 29 94 j607_2 30
5 j3036_2 27 50 j605_2 29 95 j607_3 30
6 j3037_2 27 51 j606_10 29 96 j607_4 30
7 j3037_7 27 52 j607_7 29 97 j607_5 30
8 j3038_7 27 53 j607_8 29 98 j608_3 30
9 j3039_4 27 54 j608_1 29 99 j608_9 30
10 j3040_1 27 55 j608_10 29 100 j609_7 30
11 j3042_2 27 56 j608_4 29 101 j6010_10 31
12 j3042_4 27 57 j609_2 29 102 j6010_4 31
13 j3042_7 27 58 j609_5 29 103 j6010_7 31
14 j3046_10 27 59 j609_6 29 104 j6010_8 31
15 j3048_5 27 60 j609_9 29 105 j6011_1 31
16 j3036_9 28 61 j6010_1 30 106 j6011_7 31
17 j6014_10 27 62 j6010_3 30 107 j6011_8 31
18 j6016_2 27 63 j6010_5 30 108 j6012_5 31
19 j603_10 27 64 j6011_2 30 109 j6012_6 31
20 j603_9 27 65 j6011_3 30 110 j6013_1 31
21 j605_5 27 66 j6011_6 30 111 j6013_10 31
22 j605_9 27 67 j6012_1 30 112 j6013_2 31
23 j6010_6 28 68 j6012_3 30 113 j6013_3 31
24 j6014_2 28 69 j6012_7 30 114 j6013_7 31
25 j6016_10 28 70 j6012_8 30 115 j6013_8 31
26 j601_10 28 71 j6013_6 30 116 j6013_9 31
27 j604_10 28 72 j6014_1 30 117 j6014_7 31
28 j607_1 28 73 j6014_5 30 118 j6014_9 31
29 j609_8 28 74 j6014_6 30 119 j6015_2 31
30 j6010_2 29 75 j6014_8 30 120 j6015_5 31
31 j6011_5 29 76 j6015_4 30 121 j6015_6 31
32 j6012_10 29 77 j6016_7 30 122 j6015_7 31
33 j6012_4 29 78 j6016_9 30 123 j6015_9 31
34 j6012_9 29 79 j601_7 30 124 j6016_5 31
35 j6014_3 29 80 j602_10 30 125 j601_3 31
36 j6015_1 29 81 j602_3 30 126 j601_6 31
37 j6015_10 29 82 j603_4 30 127 j601_9 31
38 j6015_3 29 83 j603_5 30 128 j602_1 31
39 j6016_3 29 84 j603_8 30 129 j602_9 31
40 j6016_4 29 85 j604_4 30 130 j603_3 31
41 j6016_6 29 86 j604_5 30 131 j603_6 31
42 j601_1 29 87 j605_6 30 132 j603_7 31
43 j601_2 29 88 j605_7 30 133 j604_6 31
44 j601_5 29 89 j606_2 30 134 j604_9 31
45 j601_8 29 90 j606_6 30 135 j605_10 31

433

PSPLIB Networks - Ranges of Johnson Measure (D) Values - Group #3: D ≥ 44
No. Net Name D No. Net Name D No. Net Name D
1 j6019_3 44 46 j6027_6 46 91 j6042_8 47
2 j6021_10 44 47 j6031_5 46 92 j6043_10 47
3 j6023_4 44 48 j6034_10 46 93 j6043_5 47
4 j6024_4 44 49 j6034_5 46 94 j6043_8 47
5 j6025_7 44 50 j6035_3 46 95 j6044_6 47
6 j6026_3 44 51 j6035_4 46 96 j6044_8 47
7 j6028_10 44 52 j6035_6 46 97 j6045_9 47
8 j6028_2 44 53 j6036_10 46 98 j6046_1 47
9 j6028_4 44 54 j6037_5 46 99 j6046_7 47
10 j6032_7 44 55 j6038_2 46 100 j6047_7 47
11 j6033_1 44 56 j6038_8 46 101 j6048_1 47
12 j6033_9 44 57 j6039_2 46 102 j6033_10 48
13 j6034_7 44 58 j6039_6 46 103 j6033_2 48
14 j6035_9 44 59 j6039_9 46 104 j6033_7 48
15 j6037_1 44 60 j6040_10 46 105 j6033_8 48
16 j6037_3 44 61 j6040_2 46 106 j6035_2 48
17 j6039_10 44 62 j6040_7 46 107 j6036_5 48
18 j6041_10 44 63 j6041_4 46 108 j6038_1 48
19 j6041_5 44 64 j6041_7 46 109 j6038_9 48
20 j6041_8 44 65 j6042_1 46 110 j6039_4 48
21 j6041_9 44 66 j6042_9 46 111 j6040_8 48
22 j6042_3 44 67 j6043_2 46 112 j6042_4 48
23 j6043_1 44 68 j6043_3 46 113 j6044_1 48
24 j6044_7 44 69 j6043_4 46 114 j6044_10 48
25 j6046_5 44 70 j6044_2 46 115 j6044_4 48
26 j6047_5 44 71 j6045_2 46 116 j6044_9 48
27 j6018_3 45 72 j6045_4 46 117 j6045_5 48
28 j6032_1 45 73 j6046_3 46 118 j6045_7 48
29 j6035_1 45 74 j6046_8 46 119 j6046_4 48
30 j6036_1 45 75 j6047_3 46 120 j6047_9 48
31 j6036_4 45 76 j6047_4 46 121 j6048_4 48
32 j6036_9 45 77 j6047_8 46 122 j6048_7 48
33 j6037_2 45 78 j6048_2 46 123 j6048_9 48
34 j6037_9 45 79 j6048_5 46 124 j6033_6 49
35 j6038_7 45 80 j6048_8 46 125 j6034_2 49
36 j6039_7 45 81 j6034_3 47 126 j6034_9 49
37 j6041_2 45 82 j6034_6 47 127 j6035_10 49
38 j6042_5 45 83 j6036_3 47 128 j6036_6 49
39 j6043_9 45 84 j6037_7 47 129 j6036_7 49
40 j6045_3 45 85 j6038_3 47 130 j6038_10 49
41 j6046_6 45 86 j6038_5 47 131 j6038_4 49
42 j6048_3 45 87 j6039_3 47 132 j6038_6 49
43 j6025_8 46 88 j6040_4 47 133 j6039_8 49
44 j6026_1 46 89 j6041_1 47 134 j6040_3 49
45 j6026_2 46 90 j6042_7 47 135 j6040_5 49

434

PSPLIB Networks - Ranges of Johnson Measure (D) Values - Group #4: D ≥ 61
No. Net Name D No. Net Name D No. Net Name D
1 j9018_1 61 46 j9030_5 63 91 j9048_6 65
2 j9020_5 61 47 j9031_8 63 92 j9034_3 66
3 j9020_6 61 48 j9032_8 63 93 j9035_10 66
4 j9021_2 61 49 j9033_9 63 94 j9036_10 66
5 j9022_10 61 50 j9037_9 63 95 j9036_3 66
6 j9022_5 61 51 j9038_3 63 96 j9036_6 66
7 j9023_9 61 52 j9039_4 63 97 j9037_5 66
8 j9024_3 61 53 j9039_5 63 98 j9039_2 66
9 j9026_1 61 54 j9041_8 63 99 j9039_8 66
10 j9027_10 61 55 j9043_5 63 100 j9039_9 66
11 j9028_3 61 56 j9044_2 63 101 j9042_4 66
12 j9029_2 61 57 j9045_10 63 102 j9044_6 66
13 j9030_1 61 58 j9046_7 63 103 j9044_9 66
14 j9031_7 61 59 j9046_9 63 104 j9046_2 66
15 j9032_2 61 60 j9023_4 64 105 j9046_4 66
16 j9034_6 61 61 j9029_9 64 106 j9047_10 66
17 j9044_10 61 62 j9034_2 64 107 j9047_5 66
18 j9017_2 62 63 j9034_5 64 108 j9047_6 66
19 j9018_6 62 64 j9034_7 64 109 j9033_10 67
20 j9019_2 62 65 j9037_2 64 110 j9033_4 67
21 j9020_2 62 66 j9040_5 64 111 j9034_4 67
22 j9020_3 62 67 j9040_7 64 112 j9034_8 67
23 j9022_6 62 68 j9043_2 64 113 j9035_3 67
24 j9024_1 62 69 j9043_9 64 114 j9035_9 67
25 j9028_10 62 70 j9044_4 64 115 j9036_1 67
26 j9028_2 62 71 j9045_1 64 116 j9036_2 67
27 j9031_6 62 72 j9033_2 65 117 j9036_5 67
28 j9031_9 62 73 j9033_3 65 118 j9037_1 67
29 j9033_5 62 74 j9033_7 65 119 j9038_4 67
30 j9037_4 62 75 j9033_8 65 120 j9038_7 67
31 j9038_2 62 76 j9035_2 65 121 j9039_6 67
32 j9039_3 62 77 j9035_7 65 122 j9039_7 67
33 j9040_2 62 78 j9036_8 65 123 j9040_1 67
34 j9040_8 62 79 j9040_10 65 124 j9041_10 67
35 j9043_1 62 80 j9040_4 65 125 j9041_4 67
36 j9045_8 62 81 j9042_1 65 126 j9041_5 67
37 j9017_7 63 82 j9042_9 65 127 j9041_7 67
38 j9018_2 63 83 j9043_7 65 128 j9042_6 67
39 j9019_9 63 84 j9045_3 65 129 j9043_10 67
40 j9022_8 63 85 j9045_6 65 130 j9044_3 67
41 j9023_7 63 86 j9046_10 65 131 j9045_2 67
42 j9024_6 63 87 j9047_4 65 132 j9045_4 67
43 j9025_5 63 88 j9048_2 65 133 j9046_1 67
44 j9027_3 63 89 j9048_4 65 134 j9047_1 67
45 j9027_8 63 90 j9048_5 65 135 j9047_3 67

435

PSPLIB Networks - Ranges of Johnson Measure (D) Values - Group #5: D ≥ 78
No. Net Name D No. Net Name D No. Net Name D
1 j12021_8 78 46 j12040_7 79 91 j12053_6 83
2 j12022_8 78 47 j12023_10 80 92 j12054_10 83
3 j12024_1 78 48 j12030_6 80 93 j12057_1 83
4 j12024_7 78 49 j12033_7 80 94 j12060_7 83
5 j12025_10 78 50 j12033_9 80 95 j12024_10 84
6 j12025_4 78 51 j12035_3 80 96 j12041_10 84
7 j12025_7 78 52 j12035_8 80 97 j12042_9 84
8 j12027_4 78 53 j12039_1 80 98 j12044_1 84
9 j12027_9 78 54 j12040_2 80 99 j12044_4 84
10 j12028_1 78 55 j12050_3 80 100 j12044_5 84
11 j12028_8 78 56 j12052_1 80 101 j12044_7 84
12 j12028_9 78 57 j12021_1 81 102 j12045_1 84
13 j12029_5 78 58 j12025_8 81 103 j12045_2 84
14 j12029_6 78 59 j12026_10 81 104 j12046_3 84
15 j12031_5 78 60 j12030_2 81 105 j12046_9 84
16 j12032_7 78 61 j12031_3 81 106 j12047_10 84
17 j12032_8 78 62 j12031_4 81 107 j12047_4 84
18 j12033_3 78 63 j12032_3 81 108 j12047_6 84
19 j12033_4 78 64 j12037_7 81 109 j12047_9 84
20 j12033_5 78 65 j12040_1 81 110 j12048_9 84
21 j12034_2 78 66 j12040_4 81 111 j12052_2 84
22 j12034_9 78 67 j12053_1 81 112 j12052_9 84
23 j12035_1 78 68 j12058_1 81 113 j12053_7 84
24 j12035_6 78 69 j12022_4 82 114 j12053_9 84
25 j12036_2 78 70 j12024_9 82 115 j12054_6 84
26 j12036_6 78 71 j12030_5 82 116 j12055_4 84
27 j12037_3 78 72 j12031_6 82 117 j12056_1 84
28 j12040_8 78 73 j12039_4 82 118 j12056_4 84
29 j12057_6 78 74 j12042_3 82 119 j12057_10 84
30 j12022_2 79 75 j12044_2 82 120 j12059_8 84
31 j12022_3 79 76 j12051_3 82 121 j12060_3 84
32 j12022_5 79 77 j12056_5 82 122 j12022_6 85
33 j12023_1 79 78 j12057_4 82 123 j12039_9 85
34 j12023_6 79 79 j12058_3 82 124 j12041_3 85
35 j12024_8 79 80 j12021_9 83 125 j12042_6 85
36 j12026_5 79 81 j12031_9 83 126 j12042_8 85
37 j12027_3 79 82 j12036_7 83 127 j12043_2 85
38 j12028_10 79 83 j12043_8 83 128 j12043_7 85
39 j12029_7 79 84 j12047_5 83 129 j12044_10 85
40 j12029_9 79 85 j12048_2 83 130 j12044_3 85
41 j12031_2 79 86 j12048_6 83 131 j12045_9 85
42 j12033_10 79 87 j12049_4 83 132 j12046_10 85
43 j12037_6 79 88 j12050_5 83 133 j12046_5 85
44 j12038_1 79 89 j12050_7 83 134 j12048_1 85
45 j12038_10 79 90 j12052_3 83 135 j12049_10 85

436

Appendix E.2.4 - PSPLIB Networks - Ranges of Normalized Complexity Measures (Cn)

Group #1 (Cn Values provided for only 135 of the 520 Networks) --------page 438

Group #2 (Cn Values provided for only 135 of the 681 Networks) --------page 439

Group #3 (Cn Values provided for only 135 of the 519 Networks) --------page 440

Group #4 (Cn Values provided for only 135 of the 160 Networks) --------page 441

Group #5 (Cn Values provided for only 135 of the 160 Networks) --------page 442

437

PSPLIB Networks - Ranges of Normalized Complexity (Cn) in % - Group #1: Cn > 10%

No. Net Name Cn (%) No. Net Name Cn (%) No. Net Name Cn (%)
1 j6010_1 15.2960 41 j6014_8 15.2960 81 j603_10 15.2960
2 j6010_10 15.2960 42 j6014_9 15.2960 82 j603_2 15.2960
3 j6010_2 15.2960 43 j6015_1 15.2960 83 j603_3 15.2960
4 j6010_3 15.2960 44 j6015_10 15.2960 84 j603_4 15.2960
5 j6010_4 15.2960 45 j6015_2 15.2960 85 j603_5 15.2960
6 j6010_5 15.2960 46 j6015_3 15.2960 86 j603_6 15.2960
7 j6010_6 15.2960 47 j6015_4 15.2960 87 j603_8 15.2960
8 j6010_7 15.2960 48 j6015_5 15.2960 88 j604_1 15.2960
9 j6010_8 15.2960 49 j6015_7 15.2960 89 j604_10 15.2960
10 j6011_10 15.2960 50 j6015_8 15.2960 90 j604_2 15.2960
11 j6011_3 15.2960 51 j6015_9 15.2960 91 j604_3 15.2960
12 j6011_4 15.2960 52 j6016_1 15.2960 92 j604_4 15.2960
13 j6011_5 15.2960 53 j6016_10 15.2960 93 j604_5 15.2960
14 j6011_6 15.2960 54 j6016_2 15.2960 94 j604_6 15.2960
15 j6011_7 15.2960 55 j6016_3 15.2960 95 j604_7 15.2960
16 j6011_8 15.2960 56 j6016_4 15.2960 96 j604_9 15.2960
17 j6011_9 15.2960 57 j6016_5 15.2960 97 j605_1 15.2960
18 j6012_1 15.2960 58 j6016_6 15.2960 98 j605_10 15.2960
19 j6012_10 15.2960 59 j6016_7 15.2960 99 j605_2 15.2960
20 j6012_2 15.2960 60 j6016_8 15.2960 100 j605_3 15.2960
21 j6012_3 15.2960 61 j601_1 15.2960 101 j605_4 15.2960
22 j6012_5 15.2960 62 j601_10 15.2960 102 j605_5 15.2960
23 j6012_6 15.2960 63 j601_2 15.2960 103 j605_6 15.2960
24 j6012_7 15.2960 64 j601_3 15.2960 104 j605_7 15.2960
25 j6012_9 15.2960 65 j601_4 15.2960 105 j605_8 15.2960
26 j6013_1 15.2960 66 j601_5 15.2960 106 j605_9 15.2960
27 j6013_10 15.2960 67 j601_6 15.2960 107 j606_1 15.2960
28 j6013_2 15.2960 68 j601_7 15.2960 108 j606_10 15.2960
29 j6013_4 15.2960 69 j601_8 15.2960 109 j606_2 15.2960
30 j6013_5 15.2960 70 j601_9 15.2960 110 j606_3 15.2960
31 j6013_6 15.2960 71 j602_1 15.2960 111 j606_4 15.2960
32 j6013_7 15.2960 72 j602_10 15.2960 112 j606_5 15.2960
33 j6013_8 15.2960 73 j602_2 15.2960 113 j606_6 15.2960
34 j6013_9 15.2960 74 j602_3 15.2960 114 j606_7 15.2960
35 j6014_2 15.2960 75 j602_4 15.2960 115 j606_8 15.2960
36 j6014_3 15.2960 76 j602_5 15.2960 116 j606_9 15.2960
37 j6014_4 15.2960 77 j602_6 15.2960 117 j607_1 15.2960
38 j6014_5 15.2960 78 j602_7 15.2960 118 j607_10 15.2960
39 j6014_6 15.2960 79 j602_8 15.2960 119 j607_2 15.2960
40 j6014_7 15.2960 80 j603_1 15.2960 120 j607_3 15.2960

438

PSPLIB Networks - Ranges of Normalized Complexity (Cn) in % - Group #2: Cn > 16%

No. Net Name Cn (%) No. Net Name Cn (%) No. Net Name Cn (%)
1 j3010_1 20.7094 41 j3014_10 20.7094 81 j302_3 20.7094
2 j3010_10 20.7094 42 j3014_2 20.7094 82 j302_4 20.7094
3 j3010_2 20.7094 43 j3014_3 20.7094 83 j302_5 20.7094
4 j3010_3 20.7094 44 j3014_4 20.7094 84 j302_6 20.7094
5 j3010_4 20.7094 45 j3014_5 20.7094 85 j302_7 20.7094
6 j3010_5 20.7094 46 j3014_6 20.7094 86 j302_8 20.7094
7 j3010_6 20.7094 47 j3014_7 20.7094 87 j302_9 20.7094
8 j3010_7 20.7094 48 j3014_8 20.7094 88 j303_1 20.7094
9 j3010_8 20.7094 49 j3014_9 20.7094 89 j303_10 20.7094
10 j3010_9 20.7094 50 j3015_1 20.7094 90 j303_2 20.7094
11 j3011_1 20.7094 51 j3015_10 20.7094 91 j303_3 20.7094
12 j3011_10 20.7094 52 j3015_2 20.7094 92 j303_4 20.7094
13 j3011_2 20.7094 53 j3015_3 20.7094 93 j303_5 20.7094
14 j3011_3 20.7094 54 j3015_4 20.7094 94 j303_6 20.7094
15 j3011_4 20.7094 55 j3015_5 20.7094 95 j303_7 20.7094
16 j3011_5 20.7094 56 j3015_6 20.7094 96 j303_8 20.7094
17 j3011_6 20.7094 57 j3015_7 20.7094 97 j303_9 20.7094
18 j3011_7 20.7094 58 j3015_9 20.7094 98 j304_1 20.7094
19 j3011_8 20.7094 59 j3016_1 20.7094 99 j304_10 20.7094
20 j3011_9 20.7094 60 j3016_10 20.7094 100 j304_2 20.7094
21 j3012_1 20.7094 61 j3016_2 20.7094 101 j304_3 20.7094
22 j3012_10 20.7094 62 j3016_3 20.7094 102 j304_4 20.7094
23 j3012_2 20.7094 63 j3016_4 20.7094 103 j304_5 20.7094
24 j3012_3 20.7094 64 j3016_5 20.7094 104 j304_7 20.7094
25 j3012_4 20.7094 65 j3016_6 20.7094 105 j304_9 20.7094
26 j3012_5 20.7094 66 j3016_7 20.7094 106 j305_1 20.7094
27 j3012_6 20.7094 67 j3016_8 20.7094 107 j305_10 20.7094
28 j3012_7 20.7094 68 j3016_9 20.7094 108 j305_2 20.7094
29 j3012_8 20.7094 69 j301_1 20.7094 109 j305_3 20.7094
30 j3012_9 20.7094 70 j301_10 20.7094 110 j305_4 20.7094
31 j3013_1 20.7094 71 j301_2 20.7094 111 j305_5 20.7094
32 j3013_10 20.7094 72 j301_3 20.7094 112 j305_6 20.7094
33 j3013_2 20.7094 73 j301_4 20.7094 113 j305_7 20.7094
34 j3013_3 20.7094 74 j301_5 20.7094 114 j305_8 20.7094
35 j3013_4 20.7094 75 j301_6 20.7094 115 j305_9 20.7094
36 j3013_5 20.7094 76 j301_7 20.7094 116 j306_1 20.7094
37 j3013_7 20.7094 77 j301_8 20.7094 117 j306_2 20.7094
38 j3013_8 20.7094 78 j301_9 20.7094 118 j306_3 20.7094
39 j3013_9 20.7094 79 j302_1 20.7094 119 j306_4 20.7094
40 j3014_1 20.7094 80 j302_2 20.7094 120 j306_5 20.7094

439

PSPLIB Networks - Ranges of Normalized Complexity (Cn) in % Group#3: Cn > 22%

No. Net Name Cn (%) No. Net Name Cn (%) No. Net Name Cn (%)
1 j6017_1 22.0386 41 j6021_6 22.0386 81 j6026_2 22.0386
2 j6017_10 22.0386 42 j6021_8 22.0386 82 j6026_4 22.0386
3 j6017_2 22.0386 43 j6021_9 22.0386 83 j6026_5 22.0386
4 j6017_3 22.0386 44 j6022_1 22.0386 84 j6026_6 22.0386
5 j6017_4 22.0386 45 j6022_10 22.0386 85 j6026_7 22.0386
6 j6017_5 22.0386 46 j6022_2 22.0386 86 j6026_8 22.0386
7 j6017_6 22.0386 47 j6022_3 22.0386 87 j6026_9 22.0386
8 j6017_7 22.0386 48 j6022_4 22.0386 88 j6027_1 22.0386
9 j6017_9 22.0386 49 j6022_5 22.0386 89 j6027_10 22.0386
10 j6018_10 22.0386 50 j6022_6 22.0386 90 j6027_3 22.0386
11 j6018_2 22.0386 51 j6022_7 22.0386 91 j6027_4 22.0386
12 j6018_3 22.0386 52 j6022_8 22.0386 92 j6027_5 22.0386
13 j6018_4 22.0386 53 j6022_9 22.0386 93 j6027_6 22.0386
14 j6018_5 22.0386 54 j6023_1 22.0386 94 j6027_7 22.0386
15 j6018_6 22.0386 55 j6023_10 22.0386 95 j6027_8 22.0386
16 j6018_7 22.0386 56 j6023_2 22.0386 96 j6027_9 22.0386
17 j6018_8 22.0386 57 j6023_3 22.0386 97 j6028_1 22.0386
18 j6018_9 22.0386 58 j6023_5 22.0386 98 j6028_10 22.0386
19 j6019_1 22.0386 59 j6023_6 22.0386 99 j6028_2 22.0386
20 j6019_10 22.0386 60 j6023_8 22.0386 100 j6028_3 22.0386
21 j6019_2 22.0386 61 j6023_9 22.0386 101 j6028_4 22.0386
22 j6019_3 22.0386 62 j6024_1 22.0386 102 j6028_5 22.0386
23 j6019_4 22.0386 63 j6024_10 22.0386 103 j6028_7 22.0386
24 j6019_5 22.0386 64 j6024_3 22.0386 104 j6028_8 22.0386
25 j6019_6 22.0386 65 j6024_4 22.0386 105 j6028_9 22.0386
26 j6019_7 22.0386 66 j6024_5 22.0386 106 j6029_1 22.0386
27 j6019_8 22.0386 67 j6024_6 22.0386 107 j6029_10 22.0386
28 j6019_9 22.0386 68 j6024_7 22.0386 108 j6029_2 22.0386
29 j6020_10 22.0386 69 j6024_8 22.0386 109 j6029_3 22.0386
30 j6020_2 22.0386 70 j6024_9 22.0386 110 j6029_4 22.0386
31 j6020_3 22.0386 71 j6025_1 22.0386 111 j6029_6 22.0386
32 j6020_5 22.0386 72 j6025_2 22.0386 112 j6029_7 22.0386
33 j6020_7 22.0386 73 j6025_3 22.0386 113 j6029_8 22.0386
34 j6020_8 22.0386 74 j6025_4 22.0386 114 j6030_1 22.0386
35 j6020_9 22.0386 75 j6025_5 22.0386 115 j6030_10 22.0386
36 j6021_1 22.0386 76 j6025_6 22.0386 116 j6030_2 22.0386
37 j6021_10 22.0386 77 j6025_8 22.0386 117 j6030_3 22.0386
38 j6021_3 22.0386 78 j6025_9 22.0386 118 j6030_4 22.0386
39 j6021_4 22.0386 79 j6026_1 22.0386 119 j6030_5 22.0386
40 j6021_5 22.0386 80 j6026_10 22.0386 120 j6030_6 22.0386

440

PSPLIB Networks - Ranges of Normalized Complexity (Cn) in % Group#4: Cn > 28%

No. Net Name Cn (%) No. Net Name Cn (%) No. Net Name Cn (%)
1 j3017_1 29.6731 41 j3021_10 29.6731 81 j3025_4 29.6731
2 j3017_10 29.6731 42 j3021_2 29.6731 82 j3025_5 29.6731
3 j3017_2 29.6731 43 j3021_3 29.6731 83 j3025_6 29.6731
4 j3017_3 29.6731 44 j3021_4 29.6731 84 j3025_7 29.6731
5 j3017_4 29.6731 45 j3021_5 29.6731 85 j3025_8 29.6731
6 j3017_5 29.6731 46 j3021_6 29.6731 86 j3025_9 29.6731
7 j3017_6 29.6731 47 j3021_7 29.6731 87 j3026_1 29.6731
8 j3017_7 29.6731 48 j3021_8 29.6731 88 j3026_10 29.6731
9 j3017_8 29.6731 49 j3022_1 29.6731 89 j3026_2 29.6731
10 j3017_9 29.6731 50 j3022_10 29.6731 90 j3026_3 29.6731
11 j3018_1 29.6731 51 j3022_2 29.6731 91 j3026_4 29.6731
12 j3018_10 29.6731 52 j3022_3 29.6731 92 j3026_5 29.6731
13 j3018_2 29.6731 53 j3022_4 29.6731 93 j3026_6 29.6731
14 j3018_3 29.6731 54 j3022_5 29.6731 94 j3026_7 29.6731
15 j3018_4 29.6731 55 j3022_7 29.6731 95 j3026_8 29.6731
16 j3018_5 29.6731 56 j3022_8 29.6731 96 j3026_9 29.6731
17 j3018_6 29.6731 57 j3022_9 29.6731 97 j3027_1 29.6731
18 j3018_7 29.6731 58 j3023_1 29.6731 98 j3027_10 29.6731
19 j3018_8 29.6731 59 j3023_10 29.6731 99 j3027_2 29.6731
20 j3018_9 29.6731 60 j3023_2 29.6731 100 j3027_3 29.6731
21 j3019_1 29.6731 61 j3023_3 29.6731 101 j3027_4 29.6731
22 j3019_10 29.6731 62 j3023_4 29.6731 102 j3027_5 29.6731
23 j3019_2 29.6731 63 j3023_5 29.6731 103 j3027_6 29.6731
24 j3019_3 29.6731 64 j3023_6 29.6731 104 j3027_7 29.6731
25 j3019_4 29.6731 65 j3023_7 29.6731 105 j3027_8 29.6731
26 j3019_5 29.6731 66 j3023_8 29.6731 106 j3027_9 29.6731
27 j3019_6 29.6731 67 j3023_9 29.6731 107 j3028_1 29.6731
28 j3019_7 29.6731 68 j3024_10 29.6731 108 j3028_10 29.6731
29 j3019_9 29.6731 69 j3024_2 29.6731 109 j3028_2 29.6731
30 j3020_1 29.6731 70 j3024_3 29.6731 110 j3028_3 29.6731
31 j3020_10 29.6731 71 j3024_4 29.6731 111 j3028_4 29.6731
32 j3020_2 29.6731 72 j3024_5 29.6731 112 j3028_5 29.6731
33 j3020_3 29.6731 73 j3024_6 29.6731 113 j3028_6 29.6731
34 j3020_4 29.6731 74 j3024_7 29.6731 114 j3028_7 29.6731
35 j3020_5 29.6731 75 j3024_8 29.6731 115 j3028_8 29.6731
36 j3020_6 29.6731 76 j3024_9 29.6731 116 j3028_9 29.6731
37 j3020_7 29.6731 77 j3025_1 29.6731 117 j3029_1 29.6731
38 j3020_8 29.6731 78 j3025_10 29.6731 118 j3029_10 29.6731
39 j3020_9 29.6731 79 j3025_2 29.6731 119 j3029_2 29.6731
40 j3021_1 29.6731 80 j3025_3 29.6731 120 j3029_3 29.6731

441

PSPLIB Networks - Ranges of Normalized Complexity (Cn) in % Group#5: Cn > 34%

No. Net Name Cn (%) No. Net Name Cn (%) No. Net Name Cn (%)
1 j3033_1 37.2075 41 j3037_3 37.2075 81 j3041_5 37.2075
2 j3033_10 37.2075 42 j3037_4 37.2075 82 j3041_6 37.2075
3 j3033_2 37.2075 43 j3037_5 37.2075 83 j3041_7 37.2075
4 j3033_3 37.2075 44 j3037_6 37.2075 84 j3041_8 37.2075
5 j3033_4 37.2075 45 j3037_7 37.2075 85 j3042_1 37.2075
6 j3033_5 37.2075 46 j3037_8 37.2075 86 j3042_10 37.2075
7 j3033_6 37.2075 47 j3037_9 37.2075 87 j3042_2 37.2075
8 j3033_7 37.2075 48 j3038_1 37.2075 88 j3042_3 37.2075
9 j3033_9 37.2075 49 j3038_10 37.2075 89 j3042_4 37.2075
10 j3034_1 37.2075 50 j3038_2 37.2075 90 j3042_5 37.2075
11 j3034_10 37.2075 51 j3038_3 37.2075 91 j3042_6 37.2075
12 j3034_2 37.2075 52 j3038_4 37.2075 92 j3042_7 37.2075
13 j3034_3 37.2075 53 j3038_5 37.2075 93 j3042_8 37.2075
14 j3034_4 37.2075 54 j3038_6 37.2075 94 j3042_9 37.2075
15 j3034_5 37.2075 55 j3038_7 37.2075 95 j3043_1 37.2075
16 j3034_6 37.2075 56 j3038_8 37.2075 96 j3043_10 37.2075
17 j3034_7 37.2075 57 j3038_9 37.2075 97 j3043_2 37.2075
18 j3034_8 37.2075 58 j3039_1 37.2075 98 j3043_3 37.2075
19 j3034_9 37.2075 59 j3039_10 37.2075 99 j3043_4 37.2075
20 j3035_1 37.2075 60 j3039_2 37.2075 100 j3043_5 37.2075
21 j3035_10 37.2075 61 j3039_3 37.2075 101 j3043_6 37.2075
22 j3035_2 37.2075 62 j3039_4 37.2075 102 j3043_7 37.2075
23 j3035_3 37.2075 63 j3039_5 37.2075 103 j3043_8 37.2075
24 j3035_4 37.2075 64 j3039_7 37.2075 104 j3043_9 37.2075
25 j3035_5 37.2075 65 j3039_8 37.2075 105 j3044_1 37.2075
26 j3035_6 37.2075 66 j3039_9 37.2075 106 j3044_10 37.2075
27 j3035_7 37.2075 67 j3040_1 37.2075 107 j3044_2 37.2075
28 j3035_9 37.2075 68 j3040_10 37.2075 108 j3044_3 37.2075
29 j3036_1 37.2075 69 j3040_2 37.2075 109 j3044_4 37.2075
30 j3036_10 37.2075 70 j3040_3 37.2075 110 j3044_5 37.2075
31 j3036_2 37.2075 71 j3040_4 37.2075 111 j3044_6 37.2075
32 j3036_3 37.2075 72 j3040_6 37.2075 112 j3044_7 37.2075
33 j3036_4 37.2075 73 j3040_7 37.2075 113 j3044_8 37.2075
34 j3036_5 37.2075 74 j3040_8 37.2075 114 j3044_9 37.2075
35 j3036_6 37.2075 75 j3040_9 37.2075 115 j3045_1 37.2075
36 j3036_7 37.2075 76 j3041_1 37.2075 116 j3045_10 37.2075
37 j3036_8 37.2075 77 j3041_10 37.2075 117 j3045_2 37.2075
38 j3036_9 37.2075 78 j3041_2 37.2075 118 j3045_4 37.2075
39 j3037_1 37.2075 79 j3041_3 37.2075 119 j3045_5 37.2075
40 j3037_2 37.2075 80 j3041_4 37.2075 120 j3045_6 37.2075

442

Appendix E.2.5 - PSPLIB Networks - Ranges of Order Strength (OS) Values

Group #1 (OS Values provided for 135 Networks only out of 920) --------page 444

Group #2 (OS Values provided for 135 Networks only out of 480) --------page 445

Group #3 (OS Values provided for 135 Networks only out of 160) --------page 446

Group #4 (OS Values provided for 135 Networks only out of 160) --------page 447

Group #5 (OS Values provided for 135 Networks only out of 320) --------page 448
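The OS values tabulated in the following pages can be reproduced from the PSPLIB precedence data. This dissertation's exact computation is not restated here; the sketch below assumes the standard project-scheduling definition of Order Strength, i.e. the number of precedence relations in the transitive closure of the network divided by the n(n-1)/2 theoretically possible relations. The function name `order_strength` and the toy four-activity chain are illustrative, not taken from the appendix.

```python
from itertools import product

def order_strength(n, arcs):
    """Order Strength (OS), assuming the standard literature definition:
    OS = |transitive closure of precedence relations| / (n * (n - 1) / 2).
    `arcs` is a list of direct precedence pairs (i, j) over activities 0..n-1.
    """
    # Boolean reachability matrix, closed with a Floyd-Warshall-style pass.
    reach = [[False] * n for _ in range(n)]
    for i, j in arcs:
        reach[i][j] = True
    for k, i, j in product(range(n), repeat=3):  # k is the outermost loop
        if reach[i][k] and reach[k][j]:
            reach[i][j] = True
    closed = sum(row.count(True) for row in reach)
    return closed / (n * (n - 1) / 2)

# Toy chain 0 -> 1 -> 2 -> 3: all 6 activity pairs are ordered, so OS = 1.0.
print(order_strength(4, [(0, 1), (1, 2), (2, 3)]))  # → 1.0
```

A serial network yields OS = 1 and a fully parallel one yields OS = 0, which is why identically structured PSPLIB instances within a group share a single OS value in the tables below.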

443

PSPLIB Networks - Ranges of Density/Order Strength (OS) Group#1: OS > 0.02

No. Net Name OS No. Net Name OS No. Net Name OS
1 j9010_1 0.032967 46 j9014_5 0.032967 91 j903_1 0.032967
2 j9010_10 0.032967 47 j9014_6 0.032967 92 j903_10 0.032967
3 j9010_2 0.032967 48 j9014_7 0.032967 93 j903_2 0.032967
4 j9010_3 0.032967 49 j9014_8 0.032967 94 j903_3 0.032967
5 j9010_4 0.032967 50 j9014_9 0.032967 95 j903_4 0.032967
6 j9010_5 0.032967 51 j9015_1 0.032967 96 j903_5 0.032967
7 j9010_6 0.032967 52 j9015_10 0.032967 97 j903_6 0.032967
8 j9010_7 0.032967 53 j9015_2 0.032967 98 j903_7 0.032967
9 j9010_8 0.032967 54 j9015_3 0.032967 99 j903_8 0.032967
10 j9010_9 0.032967 55 j9015_4 0.032967 100 j903_9 0.032967
11 j9011_1 0.032967 56 j9015_5 0.032967 101 j904_1 0.032967
12 j9011_10 0.032967 57 j9015_6 0.032967 102 j904_10 0.032967
13 j9011_2 0.032967 58 j9015_7 0.032967 103 j904_2 0.032967
14 j9011_3 0.032967 59 j9015_8 0.032967 104 j904_3 0.032967
15 j9011_4 0.032967 60 j9015_9 0.032967 105 j904_4 0.032967
16 j9011_5 0.032967 61 j9016_1 0.032967 106 j904_5 0.032967
17 j9011_6 0.032967 62 j9016_10 0.032967 107 j904_6 0.032967
18 j9011_7 0.032967 63 j9016_2 0.032967 108 j904_7 0.032967
19 j9011_8 0.032967 64 j9016_3 0.032967 109 j904_8 0.032967
20 j9011_9 0.032967 65 j9016_4 0.032967 110 j904_9 0.032967
21 j9012_1 0.032967 66 j9016_5 0.032967 111 j905_1 0.032967
22 j9012_10 0.032967 67 j9016_6 0.032967 112 j905_10 0.032967
23 j9012_2 0.032967 68 j9016_7 0.032967 113 j905_2 0.032967
24 j9012_3 0.032967 69 j9016_8 0.032967 114 j905_3 0.032967
25 j9012_4 0.032967 70 j9016_9 0.032967 115 j905_4 0.032967
26 j9012_5 0.032967 71 j901_1 0.032967 116 j905_5 0.032967
27 j9012_6 0.032967 72 j901_10 0.032967 117 j905_6 0.032967
28 j9012_7 0.032967 73 j901_2 0.032967 118 j905_7 0.032967
29 j9012_8 0.032967 74 j901_3 0.032967 119 j905_8 0.032967
30 j9012_9 0.032967 75 j901_4 0.032967 120 j905_9 0.032967
31 j9013_1 0.032967 76 j901_5 0.032967 121 j906_1 0.032967
32 j9013_10 0.032967 77 j901_6 0.032967 122 j906_10 0.032967
33 j9013_2 0.032967 78 j901_7 0.032967 123 j906_2 0.032967
34 j9013_3 0.032967 79 j901_8 0.032967 124 j906_3 0.032967
35 j9013_4 0.032967 80 j901_9 0.032967 125 j906_4 0.032967
36 j9013_5 0.032967 81 j902_1 0.032967 126 j906_5 0.032967
37 j9013_6 0.032967 82 j902_10 0.032967 127 j906_6 0.032967
38 j9013_7 0.032967 83 j902_2 0.032967 128 j906_7 0.032967
39 j9013_8 0.032967 84 j902_3 0.032967 129 j906_8 0.032967
40 j9013_9 0.032967 85 j902_4 0.032967 130 j906_9 0.032967
41 j9014_1 0.032967 86 j902_5 0.032967 131 j907_1 0.032967
42 j9014_10 0.032967 87 j902_6 0.032967 132 j907_10 0.032967
43 j9014_2 0.032967 88 j902_7 0.032967 133 j907_2 0.032967
44 j9014_3 0.032967 89 j902_8 0.032967 134 j907_3 0.032967
45 j9014_4 0.032967 90 j902_9 0.032967 135 j907_4 0.032967

444

PSPLIB Networks - Ranges of Density/Order Strength (OS) Group#2: OS > 0.044

No. Net Name OS No. Net Name OS No. Net Name OS
1 j6010_1 0.04918 46 j6014_5 0.04918 91 j603_1 0.04918
2 j6010_10 0.04918 47 j6014_6 0.04918 92 j603_10 0.04918
3 j6010_2 0.04918 48 j6014_7 0.04918 93 j603_2 0.04918
4 j6010_3 0.04918 49 j6014_8 0.04918 94 j603_3 0.04918
5 j6010_4 0.04918 50 j6014_9 0.04918 95 j603_4 0.04918
6 j6010_5 0.04918 51 j6015_1 0.04918 96 j603_5 0.04918
7 j6010_6 0.04918 52 j6015_10 0.04918 97 j603_6 0.04918
8 j6010_7 0.04918 53 j6015_2 0.04918 98 j603_7 0.04918
9 j6010_8 0.04918 54 j6015_3 0.04918 99 j603_8 0.04918
10 j6010_9 0.04918 55 j6015_4 0.04918 100 j603_9 0.04918
11 j6011_1 0.04918 56 j6015_5 0.04918 101 j604_1 0.04918
12 j6011_10 0.04918 57 j6015_6 0.04918 102 j604_10 0.04918
13 j6011_2 0.04918 58 j6015_7 0.04918 103 j604_2 0.04918
14 j6011_3 0.04918 59 j6015_8 0.04918 104 j604_3 0.04918
15 j6011_4 0.04918 60 j6015_9 0.04918 105 j604_4 0.04918
16 j6011_5 0.04918 61 j6016_1 0.04918 106 j604_5 0.04918
17 j6011_6 0.04918 62 j6016_10 0.04918 107 j604_6 0.04918
18 j6011_7 0.04918 63 j6016_2 0.04918 108 j604_7 0.04918
19 j6011_8 0.04918 64 j6016_3 0.04918 109 j604_8 0.04918
20 j6011_9 0.04918 65 j6016_4 0.04918 110 j604_9 0.04918
21 j6012_1 0.04918 66 j6016_5 0.04918 111 j605_1 0.04918
22 j6012_10 0.04918 67 j6016_6 0.04918 112 j605_10 0.04918
23 j6012_2 0.04918 68 j6016_7 0.04918 113 j605_2 0.04918
24 j6012_3 0.04918 69 j6016_8 0.04918 114 j605_3 0.04918
25 j6012_4 0.04918 70 j6016_9 0.04918 115 j605_4 0.04918
26 j6012_5 0.04918 71 j601_1 0.04918 116 j605_5 0.04918
27 j6012_6 0.04918 72 j601_10 0.04918 117 j605_6 0.04918
28 j6012_7 0.04918 73 j601_2 0.04918 118 j605_7 0.04918
29 j6012_8 0.04918 74 j601_3 0.04918 119 j605_8 0.04918
30 j6012_9 0.04918 75 j601_4 0.04918 120 j605_9 0.04918
31 j6013_1 0.04918 76 j601_5 0.04918 121 j606_1 0.04918
32 j6013_10 0.04918 77 j601_6 0.04918 122 j606_10 0.04918
33 j6013_2 0.04918 78 j601_7 0.04918 123 j606_2 0.04918
34 j6013_3 0.04918 79 j601_8 0.04918 124 j606_3 0.04918
35 j6013_4 0.04918 80 j601_9 0.04918 125 j606_4 0.04918
36 j6013_5 0.04918 81 j602_1 0.04918 126 j606_5 0.04918
37 j6013_6 0.04918 82 j602_10 0.04918 127 j606_6 0.04918
38 j6013_7 0.04918 83 j602_2 0.04918 128 j606_7 0.04918
39 j6013_8 0.04918 84 j602_3 0.04918 129 j606_8 0.04918
40 j6013_9 0.04918 85 j602_4 0.04918 130 j606_9 0.04918
41 j6014_1 0.04918 86 j602_5 0.04918 131 j607_1 0.04918
42 j6014_10 0.04918 87 j602_6 0.04918 132 j607_10 0.04918
43 j6014_2 0.04918 88 j602_7 0.04918 133 j607_2 0.04918
44 j6014_3 0.04918 89 j602_8 0.04918 134 j607_3 0.04918
45 j6014_4 0.04918 90 j602_9 0.04918 135 j607_4 0.04918

445

PSPLIB Networks - Ranges of Density/Order Strength (OS) Group#3: OS > 0.068

No. Net Name OS No. Net Name OS No. Net Name OS
1 j6033_1 0.069276 46 j6037_5 0.069276 91 j6042_1 0.069276
2 j6033_10 0.069276 47 j6037_6 0.069276 92 j6042_10 0.069276
3 j6033_2 0.069276 48 j6037_7 0.069276 93 j6042_2 0.069276
4 j6033_3 0.069276 49 j6037_8 0.069276 94 j6042_3 0.069276
5 j6033_4 0.069276 50 j6037_9 0.069276 95 j6042_4 0.069276
6 j6033_5 0.069276 51 j6038_1 0.069276 96 j6042_5 0.069276
7 j6033_6 0.069276 52 j6038_10 0.069276 97 j6042_6 0.069276
8 j6033_7 0.069276 53 j6038_2 0.069276 98 j6042_7 0.069276
9 j6033_8 0.069276 54 j6038_3 0.069276 99 j6042_8 0.069276
10 j6033_9 0.069276 55 j6038_4 0.069276 100 j6042_9 0.069276
11 j6034_1 0.069276 56 j6038_5 0.069276 101 j6043_1 0.069276
12 j6034_10 0.069276 57 j6038_6 0.069276 102 j6043_10 0.069276
13 j6034_2 0.069276 58 j6038_7 0.069276 103 j6043_2 0.069276
14 j6034_3 0.069276 59 j6038_8 0.069276 104 j6043_3 0.069276
15 j6034_4 0.069276 60 j6038_9 0.069276 105 j6043_4 0.069276
16 j6034_5 0.069276 61 j6039_1 0.069276 106 j6043_5 0.069276
17 j6034_6 0.069276 62 j6039_10 0.069276 107 j6043_6 0.069276
18 j6034_7 0.069276 63 j6039_2 0.069276 108 j6043_7 0.069276
19 j6034_8 0.069276 64 j6039_3 0.069276 109 j6043_8 0.069276
20 j6034_9 0.069276 65 j6039_4 0.069276 110 j6043_9 0.069276
21 j6035_1 0.069276 66 j6039_5 0.069276 111 j6044_1 0.069276
22 j6035_10 0.069276 67 j6039_6 0.069276 112 j6044_10 0.069276
23 j6035_2 0.069276 68 j6039_7 0.069276 113 j6044_2 0.069276
24 j6035_3 0.069276 69 j6039_8 0.069276 114 j6044_3 0.069276
25 j6035_4 0.069276 70 j6039_9 0.069276 115 j6044_4 0.069276
26 j6035_5 0.069276 71 j6040_1 0.069276 116 j6044_5 0.069276
27 j6035_6 0.069276 72 j6040_10 0.069276 117 j6044_6 0.069276
28 j6035_7 0.069276 73 j6040_2 0.069276 118 j6044_7 0.069276
29 j6035_8 0.069276 74 j6040_3 0.069276 119 j6044_8 0.069276
30 j6035_9 0.069276 75 j6040_4 0.069276 120 j6044_9 0.069276
31 j6036_1 0.069276 76 j6040_5 0.069276 121 j6045_1 0.069276
32 j6036_10 0.069276 77 j6040_6 0.069276 122 j6045_10 0.069276
33 j6036_2 0.069276 78 j6040_7 0.069276 123 j6045_2 0.069276
34 j6036_3 0.069276 79 j6040_8 0.069276 124 j6045_3 0.069276
35 j6036_4 0.069276 80 j6040_9 0.069276 125 j6045_4 0.069276
36 j6036_5 0.069276 81 j6041_1 0.069276 126 j6045_5 0.069276
37 j6036_6 0.069276 82 j6041_10 0.069276 127 j6045_6 0.069276
38 j6036_7 0.069276 83 j6041_2 0.069276 128 j6045_7 0.069276
39 j6036_8 0.069276 84 j6041_3 0.069276 129 j6045_8 0.069276
40 j6036_9 0.069276 85 j6041_4 0.069276 130 j6045_9 0.069276
41 j6037_1 0.069276 86 j6041_5 0.069276 131 j6046_1 0.069276
42 j6037_10 0.069276 87 j6041_6 0.069276 132 j6046_10 0.069276
43 j6037_2 0.069276 88 j6041_7 0.069276 133 j6046_2 0.069276
44 j6037_3 0.069276 89 j6041_8 0.069276 134 j6046_3 0.069276
45 j6037_4 0.069276 90 j6041_9 0.069276 135 j6046_4 0.069276

446

PSPLIB Networks - Ranges of Density/Order Strength (OS) Group#4: OS > 0.092

No. Net Name OS No. Net Name OS No. Net Name OS
1 j3010_1 0.096774 46 j3014_5 0.096774 91 j303_1 0.096774
2 j3010_10 0.096774 47 j3014_6 0.096774 92 j303_10 0.096774
3 j3010_2 0.096774 48 j3014_7 0.096774 93 j303_2 0.096774
4 j3010_3 0.096774 49 j3014_8 0.096774 94 j303_3 0.096774
5 j3010_4 0.096774 50 j3014_9 0.096774 95 j303_4 0.096774
6 j3010_5 0.096774 51 j3015_1 0.096774 96 j303_5 0.096774
7 j3010_6 0.096774 52 j3015_10 0.096774 97 j303_6 0.096774
8 j3010_7 0.096774 53 j3015_2 0.096774 98 j303_7 0.096774
9 j3010_8 0.096774 54 j3015_3 0.096774 99 j303_8 0.096774
10 j3010_9 0.096774 55 j3015_4 0.096774 100 j303_9 0.096774
11 j3011_1 0.096774 56 j3015_5 0.096774 101 j304_1 0.096774
12 j3011_10 0.096774 57 j3015_6 0.096774 102 j304_10 0.096774
13 j3011_2 0.096774 58 j3015_7 0.096774 103 j304_2 0.096774
14 j3011_3 0.096774 59 j3015_8 0.096774 104 j304_3 0.096774
15 j3011_4 0.096774 60 j3015_9 0.096774 105 j304_4 0.096774
16 j3011_5 0.096774 61 j3016_1 0.096774 106 j304_5 0.096774
17 j3011_6 0.096774 62 j3016_10 0.096774 107 j304_6 0.096774
18 j3011_7 0.096774 63 j3016_2 0.096774 108 j304_7 0.096774
19 j3011_8 0.096774 64 j3016_3 0.096774 109 j304_8 0.096774
20 j3011_9 0.096774 65 j3016_4 0.096774 110 j304_9 0.096774
21 j3012_1 0.096774 66 j3016_5 0.096774 111 j305_1 0.096774
22 j3012_10 0.096774 67 j3016_6 0.096774 112 j305_10 0.096774
23 j3012_2 0.096774 68 j3016_7 0.096774 113 j305_2 0.096774
24 j3012_3 0.096774 69 j3016_8 0.096774 114 j305_3 0.096774
25 j3012_4 0.096774 70 j3016_9 0.096774 115 j305_4 0.096774
26 j3012_5 0.096774 71 j301_1 0.096774 116 j305_5 0.096774
27 j3012_6 0.096774 72 j301_10 0.096774 117 j305_6 0.096774
28 j3012_7 0.096774 73 j301_2 0.096774 118 j305_7 0.096774
29 j3012_8 0.096774 74 j301_3 0.096774 119 j305_8 0.096774
30 j3012_9 0.096774 75 j301_4 0.096774 120 j305_9 0.096774
31 j3013_1 0.096774 76 j301_5 0.096774 121 j306_1 0.096774
32 j3013_10 0.096774 77 j301_6 0.096774 122 j306_10 0.096774
33 j3013_2 0.096774 78 j301_7 0.096774 123 j306_2 0.096774
34 j3013_3 0.096774 79 j301_8 0.096774 124 j306_3 0.096774
35 j3013_4 0.096774 80 j301_9 0.096774 125 j306_4 0.096774
36 j3013_5 0.096774 81 j302_1 0.096774 126 j306_5 0.096774
37 j3013_6 0.096774 82 j302_10 0.096774 127 j306_6 0.096774
38 j3013_7 0.096774 83 j302_2 0.096774 128 j306_7 0.096774
39 j3013_8 0.096774 84 j302_3 0.096774 129 j306_8 0.096774
40 j3013_9 0.096774 85 j302_4 0.096774 130 j306_9 0.096774
41 j3014_1 0.096774 86 j302_5 0.096774 131 j307_1 0.096774
42 j3014_10 0.096774 87 j302_6 0.096774 132 j307_10 0.096774
43 j3014_2 0.096774 88 j302_7 0.096774 133 j307_2 0.096774
44 j3014_3 0.096774 89 j302_8 0.096774 134 j307_3 0.096774
45 j3014_4 0.096774 90 j302_9 0.096774 135 j307_4 0.096774

447

PSPLIB Networks - Ranges of Density/Order Strength (OS) Group#5: OS > 0.116

No. Net Name OS No. Net Name OS No. Net Name OS
1 j3017_1 0.116935 46 j3021_5 0.116935 91 j3026_1 0.116935
2 j3017_10 0.116935 47 j3021_6 0.116935 92 j3026_10 0.116935
3 j3017_2 0.116935 48 j3021_7 0.116935 93 j3026_2 0.116935
4 j3017_3 0.116935 49 j3021_8 0.116935 94 j3026_3 0.116935
5 j3017_4 0.116935 50 j3021_9 0.116935 95 j3026_4 0.116935
6 j3017_5 0.116935 51 j3022_1 0.116935 96 j3026_5 0.116935
7 j3017_6 0.116935 52 j3022_10 0.116935 97 j3026_6 0.116935
8 j3017_7 0.116935 53 j3022_2 0.116935 98 j3026_7 0.116935
9 j3017_8 0.116935 54 j3022_3 0.116935 99 j3026_8 0.116935
10 j3017_9 0.116935 55 j3022_4 0.116935 100 j3026_9 0.116935
11 j3018_1 0.116935 56 j3022_5 0.116935 101 j3027_1 0.116935
12 j3018_10 0.116935 57 j3022_6 0.116935 102 j3027_10 0.116935
13 j3018_2 0.116935 58 j3022_7 0.116935 103 j3027_2 0.116935
14 j3018_3 0.116935 59 j3022_8 0.116935 104 j3027_3 0.116935
15 j3018_4 0.116935 60 j3022_9 0.116935 105 j3027_4 0.116935
16 j3018_5 0.116935 61 j3023_1 0.116935 106 j3027_5 0.116935
17 j3018_6 0.116935 62 j3023_10 0.116935 107 j3027_6 0.116935
18 j3018_7 0.116935 63 j3023_2 0.116935 108 j3027_7 0.116935
19 j3018_8 0.116935 64 j3023_3 0.116935 109 j3027_8 0.116935
20 j3018_9 0.116935 65 j3023_4 0.116935 110 j3027_9 0.116935
21 j3019_1 0.116935 66 j3023_5 0.116935 111 j3028_1 0.116935
22 j3019_10 0.116935 67 j3023_6 0.116935 112 j3028_10 0.116935
23 j3019_2 0.116935 68 j3023_7 0.116935 113 j3028_2 0.116935
24 j3019_3 0.116935 69 j3023_8 0.116935 114 j3028_3 0.116935
25 j3019_4 0.116935 70 j3023_9 0.116935 115 j3028_4 0.116935
26 j3019_5 0.116935 71 j3024_1 0.116935 116 j3028_5 0.116935
27 j3019_6 0.116935 72 j3024_10 0.116935 117 j3028_6 0.116935
28 j3019_7 0.116935 73 j3024_2 0.116935 118 j3028_7 0.116935
29 j3019_8 0.116935 74 j3024_3 0.116935 119 j3028_8 0.116935
30 j3019_9 0.116935 75 j3024_4 0.116935 120 j3028_9 0.116935
31 j3020_1 0.116935 76 j3024_5 0.116935 121 j3029_1 0.116935
32 j3020_10 0.116935 77 j3024_6 0.116935 122 j3029_10 0.116935
33 j3020_2 0.116935 78 j3024_7 0.116935 123 j3029_2 0.116935
34 j3020_3 0.116935 79 j3024_8 0.116935 124 j3029_3 0.116935
35 j3020_4 0.116935 80 j3024_9 0.116935 125 j3029_4 0.116935
36 j3020_5 0.116935 81 j3025_1 0.116935 126 j3029_5 0.116935
37 j3020_6 0.116935 82 j3025_10 0.116935 127 j3029_6 0.116935
38 j3020_7 0.116935 83 j3025_2 0.116935 128 j3029_7 0.116935
39 j3020_8 0.116935 84 j3025_3 0.116935 129 j3029_8 0.116935
40 j3020_9 0.116935 85 j3025_4 0.116935 130 j3029_9 0.116935
41 j3021_1 0.116935 86 j3025_5 0.116935 131 j3030_1 0.116935
42 j3021_10 0.116935 87 j3025_6 0.116935 132 j3030_10 0.116935
43 j3021_2 0.116935 88 j3025_7 0.116935 133 j3030_2 0.116935
44 j3021_3 0.116935 89 j3025_8 0.116935 134 j3030_3 0.116935
45 j3021_4 0.116935 90 j3025_9 0.116935 135 j3030_4 0.116935

448

Appendix E.2.6 - PSPLIB Networks - Ranges of RT Values

Group #1 (RT Values provided for 135 Networks only out of 480) --------page 450

Group #2 (RT Values provided for 135 Networks only out of 639) --------page 451

Group #3 (RT Values provided for 135 Networks only out of 361) --------page 452

Group #4 (RT Values provided for 135 Networks only out of 360) --------page 453

Group #5 (RT Values provided for 135 Networks only out of 200) --------page 454

449

PSPLIB Networks - Ranges of RT Values Group#1: RT > 0.1

No. Net Name RT No. Net Name RT No. Net Name RT
1 j905_3 0.199952 46 j1208_5 0.191844 91 j12011_8 0.201463
2 j9013_8 0.209986 47 j1208_8 0.192386 92 j12010_2 0.201599
3 j901_1 0.210463 48 j1208_10 0.192521 93 j12015_2 0.20187
4 j904_8 0.210702 49 j12013_10 0.192657 94 j1204_1 0.20187
5 j9015_2 0.213091 50 j12016_8 0.192657 95 j12015_6 0.202005
6 j9015_4 0.213808 51 j1204_6 0.192792 96 j12020_9 0.202005
7 j908_8 0.214286 52 j1209_7 0.193063 97 j12016_3 0.202276
8 j907_6 0.215002 53 j12011_9 0.193334 98 j1201_3 0.202547
9 j906_4 0.215958 54 j12010_7 0.193605 99 j1204_4 0.202547
10 j9010_7 0.216675 55 j1202_8 0.193605 100 j12017_8 0.202683
11 j9010_5 0.217391 56 j12019_4 0.193876 101 j12010_9 0.203224
12 j9013_10 0.217391 57 j1206_8 0.194012 102 j12015_3 0.20336
13 j901_3 0.217869 58 j12020_7 0.194147 103 j12020_6 0.203495
14 j12014_1 0.178025 59 j1202_2 0.194689 104 j1207_5 0.203495
15 j1206_10 0.180599 60 j12013_8 0.194825 105 j12018_4 0.203902
16 j1202_1 0.18087 61 j12018_9 0.19496 106 j1202_6 0.203902
17 j1208_6 0.18087 62 j1201_6 0.19496 107 j1207_1 0.203902
18 j1209_3 0.182631 63 j1202_3 0.19496 108 j1203_9 0.204173
19 j12016_9 0.183444 64 j12019_2 0.195366 109 j12012_6 0.204444
20 j1202_10 0.183579 65 j12011_10 0.195637 110 j1209_2 0.204444
21 j12013_5 0.183986 66 j12014_10 0.195637 111 j12013_4 0.20485
22 j12017_2 0.184528 67 j12010_10 0.196179 112 j1205_2 0.204986
23 j12017_10 0.184799 68 j12017_7 0.196992 113 j1206_2 0.205121
24 j12020_3 0.184934 69 j12016_7 0.197534 114 j12019_5 0.205257
25 j12014_7 0.185612 70 j12018_3 0.19767 115 j1201_10 0.205257
26 j12016_6 0.186018 71 j1205_1 0.19767 116 j1209_8 0.205257
27 j12012_9 0.186154 72 j1206_3 0.198347 117 j1208_3 0.205392
28 j1208_7 0.18656 73 j1205_3 0.198483 118 j12014_4 0.205528
29 j12011_2 0.187508 74 j12012_2 0.198889 119 j1203_2 0.205528
30 j1204_5 0.187644 75 j1204_8 0.199431 120 j12014_2 0.205799
31 j1206_5 0.18805 76 j12012_7 0.199566 121 j12011_1 0.205934
32 j1203_4 0.188457 77 j1206_1 0.199566 122 j12012_3 0.205934
33 j12020_10 0.189134 78 j12012_10 0.200244 123 j1202_7 0.206476
34 j12010_3 0.189812 79 j12015_7 0.200244 124 j1203_3 0.206476
35 j12015_1 0.189947 80 j12013_9 0.200379 125 j1202_5 0.206747
36 j1205_5 0.190083 81 j12015_4 0.200515 126 j1207_10 0.206747
37 j1209_10 0.190083 82 j12016_1 0.200515 127 j12018_7 0.206883
38 j12012_8 0.190489 83 j1205_10 0.200515 128 j12013_7 0.207018
39 j1205_7 0.190625 84 j12017_3 0.200786 129 j1208_1 0.207018
40 j1205_4 0.190896 85 j12017_4 0.200786 130 j12020_1 0.207695
41 j12017_6 0.191031 86 j1204_9 0.200786 131 j12014_8 0.207831
42 j12017_1 0.191302 87 j1205_8 0.200786 132 j12019_10 0.207831
43 j12015_5 0.191573 88 j12011_3 0.200921 133 j12019_9 0.208102
44 j1207_8 0.191708 89 j12013_2 0.201328 134 j1207_2 0.208102
45 j12020_5 0.191844 90 j1201_4 0.201328 135 j12020_8 0.208237

450

PSPLIB Networks - Ranges of RT Values Group#2: RT > 0.22

No. Net Name RT No. Net Name RT No. Net Name RT
1 j305_7 0.33871 46 j603_8 0.285563 91 j6016_10 0.304072
2 j602_7 0.253834 47 j609_7 0.285563 92 j606_2 0.304601
3 j604_3 0.257536 48 j606_7 0.286092 93 j6011_2 0.30513
4 j603_7 0.258593 49 j6010_1 0.288207 94 j6014_2 0.30513
5 j6014_9 0.259122 50 j602_1 0.288207 95 j607_6 0.30513
6 j604_6 0.26018 51 j605_3 0.288207 96 j6015_8 0.305658
7 j6013_5 0.265468 52 j6011_9 0.289265 97 j6013_8 0.306187
8 j602_5 0.268641 53 j6010_10 0.290323 98 j601_2 0.306187
9 j602_8 0.268641 54 j6010_4 0.290323 99 j603_3 0.306187
10 j6012_6 0.269699 55 j6016_9 0.290323 100 j609_8 0.306716
11 j6011_10 0.270227 56 j601_3 0.290323 101 j6012_4 0.307245
12 j6013_6 0.270227 57 j6015_9 0.290851 102 j607_7 0.308302
13 j604_7 0.270227 58 j6015_6 0.29138 103 j608_8 0.308302
14 j6015_7 0.271285 59 j6011_7 0.291909 104 j6016_1 0.308831
15 j601_7 0.271285 60 j604_9 0.292438 105 j601_4 0.308831
16 j608_9 0.271285 61 j6014_5 0.292967 106 j601_9 0.308831
17 j6012_5 0.272343 62 j603_2 0.292967 107 j6015_5 0.30936
18 j605_4 0.272871 63 j609_10 0.292967 108 j6016_5 0.30936
19 j6012_2 0.2734 64 j6011_1 0.293496 109 j6016_8 0.309889
20 j6015_3 0.2734 65 j605_7 0.293496 110 j6014_6 0.310418
21 j607_9 0.274458 66 j606_8 0.293496 111 j602_4 0.311475
22 j6013_7 0.274987 67 j6011_8 0.294024 112 j609_5 0.311475
23 j601_6 0.274987 68 j602_9 0.294024 113 j6016_4 0.312533
24 j6013_3 0.276044 69 j6013_1 0.294553 114 j604_4 0.312533
25 j602_6 0.276044 70 j6010_8 0.295611 115 j604_5 0.313591
26 j605_8 0.276044 71 j608_3 0.29614 116 j6015_2 0.31412
27 j606_6 0.276044 72 j604_8 0.296668 117 j6013_9 0.315177
28 j608_2 0.276044 73 j605_6 0.296668 118 j601_5 0.315177
29 j609_1 0.276573 74 j606_10 0.297726 119 j6010_5 0.315706
30 j606_9 0.27816 75 j607_2 0.297726 120 j6012_3 0.315706
31 j602_10 0.278689 76 j609_3 0.297726 121 j601_10 0.315706
32 j606_1 0.278689 77 j6010_7 0.298255 122 j602_2 0.315706
33 j608_6 0.278689 78 j6012_9 0.298784 123 j609_6 0.317292
34 j601_1 0.279746 79 j6015_10 0.298784 124 j6016_2 0.317821
35 j607_3 0.280804 80 j6013_10 0.299313 125 j606_3 0.317821
36 j6012_1 0.281333 81 j6014_1 0.299313 126 j606_5 0.31835
37 j6013_4 0.281333 82 j606_4 0.299841 127 j6011_3 0.318879
38 j6011_5 0.281861 83 j607_4 0.299841 128 j607_10 0.318879
39 j6013_2 0.28239 84 j602_3 0.30037 129 j6010_9 0.319408
40 j605_10 0.282919 85 j6010_3 0.300899 130 j6014_7 0.319408
41 j608_7 0.283977 86 j6011_6 0.300899 131 j603_9 0.319408
42 j604_1 0.284506 87 j604_2 0.301428 132 j6012_7 0.320465
43 j608_5 0.284506 88 j605_1 0.301428 133 j6016_6 0.320465
44 j6010_2 0.285034 89 j603_1 0.301957 134 j608_1 0.321523
45 j603_6 0.285034 90 j609_4 0.302485 135 j6012_8 0.322581

451

PSPLIB Networks - Ranges of RT Values Group#3: RT > 0.34

No. Net Name RT No. Net Name RT No. Net Name RT
1 j3011_7 0.356855 46 j305_2 0.399194 91 j3012_9 0.423387
2 j3012_5 0.356855 47 j3011_2 0.40121 92 j3014_4 0.423387
3 j308_3 0.358871 48 j3011_6 0.40121 93 j3011_5 0.425403
4 j308_9 0.358871 49 j3016_6 0.40121 94 j3013_9 0.425403
5 j3012_8 0.360887 50 j301_5 0.40121 95 j3014_3 0.425403
6 j302_2 0.360887 51 j303_2 0.40121 96 j3014_8 0.425403
7 j306_8 0.362903 52 j304_3 0.40121 97 j303_5 0.425403
8 j301_2 0.368952 53 j305_1 0.40121 98 j304_7 0.425403
9 j3010_5 0.370968 54 j308_8 0.40121 99 j306_5 0.425403
10 j301_8 0.370968 55 j303_8 0.403226 100 j309_1 0.425403
11 j3010_10 0.372984 56 j304_9 0.403226 101 j309_8 0.425403
12 j3010_9 0.372984 57 j308_4 0.403226 102 j3011_8 0.427419
13 j302_9 0.372984 58 j3012_7 0.405242 103 j301_9 0.427419
14 j3013_2 0.375 59 j302_4 0.405242 104 j302_6 0.427419
15 j3016_8 0.375 60 j305_3 0.405242 105 j303_9 0.427419
16 j302_3 0.375 61 j309_6 0.405242 106 j307_7 0.427419
17 j305_4 0.375 62 j3013_10 0.407258 107 j3016_4 0.429435
18 j307_6 0.375 63 j3015_2 0.407258 108 j307_9 0.429435
19 j302_7 0.377016 64 j3010_1 0.409274 109 j3015_5 0.431452
20 j305_6 0.381048 65 j3010_7 0.409274 110 j303_3 0.431452
21 j3012_10 0.383065 66 j3016_2 0.409274 111 j306_1 0.431452
22 j303_7 0.383065 67 j301_4 0.409274 112 j308_2 0.431452
23 j309_3 0.385081 68 j305_9 0.409274 113 j3013_8 0.433468
24 j3014_6 0.387097 69 j307_4 0.409274 114 j3015_3 0.433468
25 j3016_10 0.387097 70 j3010_6 0.41129 115 j309_2 0.433468
26 j302_1 0.387097 71 j3013_3 0.41129 116 j301_3 0.435484
27 j307_5 0.387097 72 j3013_6 0.41129 117 j308_1 0.435484
28 j308_6 0.387097 73 j3014_9 0.41129 118 j3013_1 0.4375
29 j3013_5 0.389113 74 j3016_7 0.41129 119 j3015_7 0.4375
30 j306_4 0.389113 75 j304_5 0.41129 120 j306_10 0.4375
31 j3010_8 0.391129 76 j309_9 0.41129 121 j3013_7 0.439516
32 j309_4 0.391129 77 j3016_3 0.413306 122 j305_5 0.439516
33 j3015_1 0.393145 78 j301_1 0.413306 123 j306_3 0.439516
34 j308_7 0.393145 79 j3015_4 0.415323 124 j3014_7 0.441532
35 j3012_1 0.395161 80 j3015_9 0.415323 125 j3015_8 0.441532
36 j3012_3 0.395161 81 j3016_1 0.415323 126 j306_2 0.441532
37 j3012_6 0.395161 82 j301_10 0.415323 127 j3016_9 0.443548
38 j307_2 0.395161 83 j3014_2 0.417339 128 j3017_1 0.443548
39 j3012_2 0.397177 84 j307_1 0.417339 129 j301_7 0.443548
40 j305_10 0.397177 85 j307_3 0.417339 130 j302_5 0.443548
41 j306_6 0.397177 86 j3012_4 0.419355 131 j308_5 0.443548
42 j307_8 0.397177 87 j305_8 0.419355 132 j3010_4 0.445565
43 j3011_10 0.399194 88 j309_7 0.419355 133 j3013_4 0.445565
44 j3015_6 0.399194 89 j3016_5 0.421371 134 j3014_10 0.447581
45 j302_8 0.399194 90 j304_10 0.421371 135 j304_6 0.451613

PSPLIB Networks – Ranges of RT Values – Group #4: RT > 0.46
No. Net Name RT No. Net Name RT No. Net Name RT
1 j3010_3 0.461694 46 j3029_1 0.495968 91 j3019_5 0.516129
2 j3014_5 0.461694 47 j3015_10 0.497984 92 j3020_2 0.516129
3 j3022_5 0.461694 48 j3021_6 0.497984 93 j3023_2 0.516129
4 j3010_2 0.46371 49 j3022_6 0.497984 94 j3024_1 0.516129
5 j3020_3 0.46371 50 j3027_2 0.497984 95 j3030_6 0.516129
6 j3020_4 0.46371 51 j3017_7 0.5 96 j3032_4 0.516129
7 j3025_8 0.465726 52 j3019_7 0.5 97 j3017_9 0.518145
8 j3031_1 0.465726 53 j3022_4 0.5 98 j3027_5 0.518145
9 j304_8 0.465726 54 j3021_9 0.502016 99 j3031_9 0.518145
10 j3030_9 0.467742 55 j3024_10 0.502016 100 j3017_10 0.520161
11 j3011_9 0.469758 56 j3025_5 0.502016 101 j3023_1 0.520161
12 j3031_2 0.469758 57 j3028_8 0.502016 102 j3023_9 0.520161
13 j304_2 0.469758 58 j3027_1 0.504032 103 j3029_4 0.520161
14 j3019_1 0.47379 59 j3029_9 0.504032 104 j3017_5 0.522177
15 j3023_8 0.47379 60 j3031_3 0.504032 105 j3023_5 0.522177
16 j3031_6 0.475806 61 j3020_1 0.506048 106 j3031_8 0.522177
17 j3022_10 0.477823 62 j3020_8 0.506048 107 j3020_10 0.524194
18 j3025_3 0.477823 63 j3022_3 0.506048 108 j3024_2 0.524194
19 j303_1 0.477823 64 j3023_6 0.506048 109 j3027_10 0.524194
20 j308_10 0.479839 65 j3025_1 0.506048 110 j3028_2 0.524194
21 j3019_10 0.481855 66 j3029_5 0.506048 111 j3029_7 0.52621
22 j3030_8 0.481855 67 j3032_3 0.506048 112 j3021_7 0.530242
23 j3018_10 0.483871 68 j3032_5 0.506048 113 j3030_2 0.530242
24 j3026_4 0.483871 69 j3020_6 0.508065 114 j3032_6 0.530242
25 j3026_9 0.483871 70 j3026_10 0.508065 115 j3021_5 0.532258
26 j3029_8 0.483871 71 j3032_10 0.508065 116 j3026_5 0.532258
27 j3020_7 0.485887 72 j3032_2 0.508065 117 j3027_3 0.532258
28 j3028_3 0.485887 73 j3027_7 0.510081 118 j3028_10 0.532258
29 j3029_2 0.485887 74 j3026_8 0.512097 119 j3031_7 0.532258
30 j3030_1 0.485887 75 j3029_3 0.512097 120 j3032_7 0.532258
31 j3025_4 0.487903 76 j3029_6 0.512097 121 j3017_3 0.534274
32 j3026_2 0.487903 77 j3030_10 0.512097 122 j3026_3 0.534274
33 j3031_4 0.487903 78 j304_1 0.512097 123 j304_4 0.534274
34 j3019_9 0.489919 79 j3017_8 0.514113 124 j3018_5 0.53629
35 j3024_5 0.489919 80 j3018_1 0.514113 125 j3019_6 0.53629
36 j3019_8 0.491935 81 j3018_6 0.514113 126 j3028_1 0.53629
37 j3024_6 0.491935 82 j3018_8 0.514113 127 j3030_3 0.53629
38 j3025_6 0.491935 83 j3018_9 0.514113 128 j3030_5 0.53629
39 j3032_8 0.493952 84 j3020_9 0.514113 129 j3017_2 0.538306
40 j3011_4 0.495968 85 j3021_10 0.514113 130 j3031_10 0.538306
41 j3014_1 0.495968 86 j3022_1 0.514113 131 j3022_7 0.540323
42 j3019_2 0.495968 87 j3023_3 0.514113 132 j3026_7 0.542339
43 j3022_8 0.495968 88 j3027_9 0.514113 133 j3027_8 0.542339
44 j3024_4 0.495968 89 j3028_4 0.514113 134 j3031_5 0.542339
45 j3024_9 0.495968 90 j3028_7 0.514113 135 j3017_4 0.544355

PSPLIB Networks – Ranges of RT Values – Group #5: RT > 0.58
No. Net Name RT No. Net Name RT No. Net Name RT
1 j3018_7 0.580645 46 j3039_7 0.600806 91 j3048_4 0.618952
2 j3036_9 0.580645 47 j3048_9 0.600806 92 j3033_2 0.620968
3 j3038_7 0.580645 48 j3033_4 0.602823 93 j3033_3 0.620968
4 j3021_4 0.582661 49 j3039_8 0.602823 94 j3033_6 0.620968
5 j3027_4 0.584677 50 j3040_7 0.602823 95 j3036_10 0.620968
6 j3035_2 0.584677 51 j3046_1 0.602823 96 j3045_6 0.620968
7 j3033_5 0.586694 52 j3046_6 0.602823 97 j3034_9 0.622984
8 j3036_2 0.586694 53 j3047_6 0.602823 98 j3039_3 0.622984
9 j3040_3 0.586694 54 j3037_1 0.604839 99 j3046_7 0.622984
10 j3043_2 0.586694 55 j3037_3 0.604839 100 j3047_3 0.622984
11 j3044_1 0.586694 56 j3039_5 0.604839 101 j3048_6 0.622984
12 j3018_2 0.58871 57 j3041_6 0.604839 102 j3038_8 0.625
13 j3034_1 0.58871 58 j3042_5 0.604839 103 j3040_10 0.625
14 j3035_4 0.58871 59 j3046_5 0.604839 104 j3035_5 0.627016
15 j3037_2 0.58871 60 j3033_10 0.606855 105 j3036_3 0.627016
16 j3048_10 0.58871 61 j3041_3 0.606855 106 j3038_1 0.627016
17 j3034_10 0.590726 62 j3043_4 0.606855 107 j3044_10 0.627016
18 j3040_2 0.590726 63 j3044_7 0.606855 108 j3037_10 0.629032
19 j3042_4 0.590726 64 j3045_9 0.606855 109 j3046_2 0.629032
20 j3023_4 0.592742 65 j3021_1 0.608871 110 j3046_8 0.629032
21 j3034_5 0.592742 66 j3034_6 0.608871 111 j3047_4 0.629032
22 j3034_8 0.592742 67 j3036_1 0.608871 112 j3047_9 0.629032
23 j3037_7 0.592742 68 j3038_9 0.608871 113 j3033_8 0.631048
24 j3040_8 0.592742 69 j3039_1 0.608871 114 j3042_10 0.631048
25 j3042_1 0.592742 70 j3039_2 0.608871 115 j3035_10 0.633065
26 j3043_3 0.592742 71 j3043_1 0.608871 116 j3040_5 0.633065
27 j3044_3 0.592742 72 j3044_2 0.608871 117 j3041_2 0.633065
28 j3038_3 0.594758 73 j3047_8 0.608871 118 j3045_2 0.633065
29 j3033_7 0.596774 74 j3047_2 0.610887 119 j3045_7 0.633065
30 j3040_1 0.596774 75 j3034_7 0.612903 120 j3046_4 0.633065
31 j3042_2 0.596774 76 j3037_4 0.612903 121 j3033_1 0.635081
32 j3043_7 0.596774 77 j3038_2 0.612903 122 j3037_5 0.635081
33 j3043_9 0.596774 78 j3041_7 0.612903 123 j3042_3 0.635081
34 j3046_10 0.596774 79 j3045_4 0.612903 124 j3042_8 0.635081
35 j3046_3 0.596774 80 j3047_10 0.612903 125 j3036_4 0.637097
36 j3047_5 0.596774 81 j3037_8 0.614919 126 j3039_6 0.639113
37 j3035_8 0.59879 82 j3040_9 0.614919 127 j3042_7 0.639113
38 j3036_6 0.59879 83 j3041_4 0.614919 128 j3037_9 0.641129
39 j3041_9 0.59879 84 j3038_5 0.616935 129 j3043_10 0.641129
40 j3044_8 0.59879 85 j3043_5 0.616935 130 j3048_8 0.641129
41 j3046_9 0.59879 86 j3045_1 0.616935 131 j3038_10 0.643145
42 j3047_7 0.59879 87 j3048_7 0.616935 132 j3039_10 0.643145
43 j3033_9 0.600806 88 j3035_3 0.618952 133 j3042_6 0.643145
44 j3035_1 0.600806 89 j3036_7 0.618952 134 j3045_3 0.643145
45 j3036_5 0.600806 90 j3040_4 0.618952 135 j3048_1 0.643145

Appendix F – Simulation Outputs – Networks’ Population Structure

Appendix F.1 – PSPLIB j30 – Scatterplots of Network's Normalized 1st Eigenvalues against Sample Size (Networks j3024-8 and j3032-4)

[Scatterplots not reproduced; both panels plot the normalized eigenvalue l(n) against sample size n.]
Appendix F.2 – Plots of Deviations ∆μ,100 versus Sample Size n Required to Construct the Matrix X(n×p) (1st Largest Eigenvalues)

PSPLIB j60s

PSPLIB j90s

PSPLIB j120s

Appendix F.3 – Plots of Deviations ∆σ²,100 versus Sample Size n Required to Construct the Matrix X(n×p) (1st Largest Eigenvalues)

PSPLIB j60s

PSPLIB j90s

PSPLIB j120s

Appendix F.4 – Outputs of Means’ and Variances’ Deviations for the Set of j60 Networks

Deviations ∆μ,100

Deviations (∆μ,100) of l̄₁ from μ, each found with 100 simulated project network matrices S

Project Networks PSPLIB j60


n j602-7 j6010-5 j6015-1 j6020-7 j6028-9 j6035-8 j6040-5 j6042-6
100 -99.986 -108.242 -222.772 -224.327 -167.290 -168.746 -135.713 -139.888
150 -44.490 -48.129 -118.061 -118.122 -83.595 -85.398 -66.690 -68.583
200 -26.709 -29.376 -84.541 -84.662 -58.555 -58.491 -43.714 -44.197
250 -15.530 -17.631 -66.310 -67.059 -44.032 -43.849 -30.870 -31.152
300 -7.951 -10.218 -54.105 -54.184 -32.866 -33.321 -21.990 -21.961
350 -2.113 -3.542 -44.534 -45.025 -25.191 -25.763 -15.043 -14.603
400 2.644 1.294 -36.918 -37.376 -19.102 -19.120 -9.532 -9.162
450 6.935 5.448 -30.501 -31.100 -13.901 -13.833 -4.449 -4.358
500 10.716 9.424 -24.718 -25.580 -9.341 -8.852 -0.537 -0.063
550 13.883 12.575 -20.190 -20.719 -5.142 -4.903 3.267 3.734
600 16.813 15.901 -16.027 -16.403 -1.417 -1.282 6.880 7.205
650 19.552 18.625 -12.112 -12.545 2.105 1.861 9.917 10.469
700 22.247 21.080 -8.207 -8.840 5.184 5.179 12.701 12.952
750 24.515 23.754 -4.987 -5.317 8.237 8.069 15.409 15.925
800 26.938 25.981 -2.045 -2.415 10.877 10.824 17.931 18.343
850 29.034 28.108 0.872 0.533 13.245 13.365 20.413 20.536
900 31.083 30.418 3.690 3.304 16.083 15.730 22.413 23.018
950 33.016 32.382 6.244 5.770 18.100 18.150 24.727 25.097
1000 34.865 34.148 8.851 8.262 20.417 20.362 26.787 27.236
1050 36.705 36.055 11.106 10.788 22.575 22.553 28.838 29.411
1100 38.585 37.821 13.330 13.172 24.584 24.437 30.606 31.101

Deviations ∆σ²,100 (or ∆var,100)

Deviations (∆σ²,100) of the sample variance of l₁ from σ², each found with 100 simulated project network matrices S

Project Networks PSPLIB j60


n j602-7 j6010-5 j6015-1 j6020-7 j6028-9 j6035-8 j6040-5 j6042-6
100 61.089 73.191 156.766 134.854 186.495 101.846 111.689 108.234
150 16.168 11.586 33.952 37.779 40.885 20.375 33.883 25.154
200 5.216 5.966 14.173 14.287 20.160 8.000 18.146 10.696
250 3.289 3.318 9.981 9.970 11.513 4.757 9.258 8.872
300 1.924 1.831 6.028 6.891 8.634 3.078 4.713 6.167
350 0.997 1.070 3.960 3.262 4.367 1.498 4.242 2.900
400 0.650 0.822 3.098 3.750 3.807 1.551 2.702 2.327
450 0.039 0.214 2.259 2.217 3.368 0.321 1.455 2.206
500 0.363 -0.074 1.650 1.694 1.263 0.385 1.568 1.177
550 -0.232 -0.153 1.088 0.489 2.128 0.448 1.316 1.034
600 -0.220 -0.150 0.820 0.838 1.173 0.106 0.360 0.865
650 -0.380 -0.554 0.536 0.382 1.171 -0.079 0.788 0.408
700 -0.283 -0.353 0.674 0.649 0.435 -0.306 0.400 0.750
750 -0.487 -0.649 0.328 0.179 0.312 -0.316 0.318 0.305
800 -0.496 -0.529 0.167 -0.069 0.112 -0.416 -0.034 0.323
850 -0.593 -0.516 -0.116 -0.052 -0.028 -0.446 -0.232 0.016
900 -0.651 -0.687 -0.282 0.126 -0.045 -0.545 -0.158 0.108
950 -0.721 -0.763 -0.395 -0.283 -0.364 -0.624 -0.136 -0.069
1000 -0.725 -0.647 -0.378 -0.332 -0.251 -0.650 -0.324 -0.436
1050 -0.710 -0.696 -0.294 -0.423 -0.282 -0.611 -0.488 -0.332
1100 -0.796 -0.709 -0.469 -0.387 -0.094 -0.698 -0.441 -0.311

Appendix F.5 – Outputs of Deviations ∆μ,100 and Slopes for Network j3032-4

Network j3032-4 – Mean Deviations / Slopes / Norm I and Norm II


                         Norm I                  Norm II
No.   Sample Size (n)   Deviations   Slopes    Deviations   Slopes
1 40 -170.2164 0.0000 -80.9100 0.0000
2 50 -84.4687 8.5748 -24.9148 5.5995
3 60 -57.8888 2.6580 -6.3305 1.8584
4 70 -44.3811 1.3508 3.9608 1.0291
5 80 -35.9859 0.8395 11.0058 0.7045
6 90 -30.0721 0.5914 16.4467 0.5441
7 100 -25.4776 0.4595 20.9862 0.4540
8 110 -22.1067 0.3371 24.8268 0.3841
9 120 -18.8607 0.3246 28.3943 0.3567
10 130 -16.2857 0.2575 31.5891 0.3195
11 140 -14.0736 0.2212 34.5429 0.2954
12 150 -12.1901 0.1883 37.2959 0.2753
13 160 -10.3208 0.1869 39.9472 0.2651
14 170 -8.7413 0.1579 42.4452 0.2498
15 180 -7.1155 0.1626 44.8810 0.2436
16 190 -5.6932 0.1422 47.2066 0.2326
17 200 -4.4174 0.1276 49.4441 0.2237
18 210 -3.1559 0.1262 51.6281 0.2184
19 220 -2.1828 0.0973 53.7078 0.2080
20 230 -0.9677 0.1215 55.7955 0.2088
21 240 0.0229 0.0991 57.8007 0.2005
22 250 1.1296 0.1107 59.7934 0.1993
23 260 1.9278 0.0798 61.6958 0.1902
24 270 2.9221 0.0994 63.6051 0.1909
25 280 3.8017 0.0880 65.4654 0.1860
26 290 4.5510 0.0749 67.2767 0.1811
27 300 5.3870 0.0836 69.0790 0.1802
28 310 6.2067 0.0820 70.8555 0.1776
29 320 7.0305 0.0824 72.6106 0.1755
30 330 7.7180 0.0688 74.3225 0.1712
31 340 8.4351 0.0717 76.0198 0.1697
32 350 9.1771 0.0742 77.7026 0.1683
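The Slopes column in the table above can be reproduced as a finite difference of consecutive deviations over the sample-size step, slope_i = (∆_i − ∆_{i−1}) / (n_i − n_{i−1}). A short Python check against the first four Norm I rows (this interpretation is inferred from the table, which it reproduces exactly):

```python
# Slopes as finite differences of successive deviations with respect to n.
n = [40, 50, 60, 70]
dev = [-170.2164, -84.4687, -57.8888, -44.3811]  # Norm I deviations, j3032-4

slopes = [0.0] + [(dev[i] - dev[i - 1]) / (n[i] - n[i - 1])
                  for i in range(1, len(n))]
print([round(s, 4) for s in slopes])  # [0.0, 8.5748, 2.658, 1.3508]
```

For example, (−84.4687 − (−170.2164)) / (50 − 40) = 8.5748, matching row 2 of the table.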

Appendix F.6 – Outputs of Deviations ∆σ²,100 and Slopes for Network j3032-4

Network j3032-4 - Variance Deviations


                         Norm I                  Norm II
No.   Sample Size (n)   Deviations   Slopes    Deviations   Slopes
1 40 90.9073 0.0000 25.9473 0.0000
2 50 23.9490 -6.6958 4.5067 -2.1441
3 60 9.3052 -1.4644 0.8145 -0.3692
4 70 4.0490 -0.5256 -0.2625 -0.1077
5 80 3.0473 -0.1002 -0.4956 -0.0233
6 90 1.4109 -0.1636 -0.7382 -0.0243
7 100 1.1953 -0.0216 -0.7889 -0.0051
8 110 0.4145 -0.0781 -0.8779 -0.0089
9 120 0.1343 -0.0280 -0.9112 -0.0033
10 130 0.1476 0.0013 -0.9178 -0.0007
11 140 0.1623 0.0015 -0.9233 -0.0005
12 150 -0.0681 -0.0230 -0.9429 -0.0020
13 160 -0.3611 -0.0293 -0.9635 -0.0021
14 170 -0.3903 -0.0029 -0.9674 -0.0004
15 180 -0.4324 -0.0042 -0.9714 -0.0004
16 190 -0.2356 0.0197 -0.9636 0.0008
17 200 -0.5156 -0.0280 -0.9782 -0.0015
18 210 -0.5729 -0.0057 -0.9817 -0.0004
19 220 -0.6026 -0.0030 -0.9838 -0.0002
20 230 -0.5624 0.0040 -0.9829 0.0001
21 240 -0.7093 -0.0147 -0.9891 -0.0006
22 250 -0.7195 -0.0010 -0.9900 -0.0001
23 260 -0.7558 -0.0036 -0.9916 -0.0002
24 270 -0.6949 0.0061 -0.9899 0.0002
25 280 -0.7202 -0.0025 -0.9911 -0.0001
26 290 -0.7510 -0.0031 -0.9923 -0.0001
27 300 -0.7945 -0.0043 -0.9939 -0.0002
28 310 -0.7829 0.0012 -0.9937 0.0000
29 320 -0.7842 -0.0001 -0.9940 0.0000
30 330 -0.8684 -0.0084 -0.9964 -0.0002
31 340 -0.8137 0.0055 -0.9951 0.0001
32 350 -0.8537 -0.0040 -0.9962 -0.0001

Appendix F.7 – Outputs of KS Testing (2nd Largest Eigenvalue)

Norm I

1000 Simulations
Normalized 2nd Eigenvalues (Norm I, 0.025)
Significance level α / Probability P
0.01 / 0.99 0.05 / 0.95 0.10 / 0.9 0.20 / 0.80
Network n RT D₂ KS-test 1 KS-test 2 KS-test 3 KS-test 4
j3011-1 103 0.46 0.02 1 1 1 1
j3011-1 104 0.46 0.024 1 1 1 1
j3011-1 105 0.46 0.021 1 1 1 1
j3011-1 106 0.46 0.025 1 1 1 1
j3011-1 107 0.46 0.012 1 1 1 1
j3024-8 104 0.46 0.045 1 0 0 0
j3024-8 105 0.46 0.051 1 0 0 0
j3037-6 116 0.688 0.051 1 0 0 0
j3038-7 127 0.581 0.046 1 0 0 0
j3038-7 128 0.581 0.039 1 1 0 0
j3038-7 129 0.581 0.036 1 1 1 0
j3038-7 131 0.581 0.049 1 0 0 0
j3041-8 124 0.579 0.051 1 0 0 0
j3041-8 128 0.579 0.045 1 0 0 0
j3041-8 124 0.579 0.043 1 1 0 0
j3041-8 125 0.579 0.044 1 0 0 0
j3041-8 127 0.579 0.045 1 0 0 0
j3048-2 143 0.651 0.023 1 1 1 1
j3048-2 144 0.651 0.039 1 1 0 0
j3048-2 145 0.651 0.026 1 1 1 1
j3048-2 146 0.651 0.032 1 1 1 1
j3048-2 147 0.651 0.034 1 1 1 0
j305-7 118 0.339 0.036 1 1 1 0
j305-7 119 0.339 0.048 1 0 0 0

Selected Networks Ho accepted
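The accept/reject pattern in the KS-test columns (1 = H₀ accepted, 0 = rejected) is consistent with the standard asymptotic one-sample Kolmogorov–Smirnov decision rule, D ≤ c(α)/√N, for N = 1000 simulations. A minimal Python sketch of that rule; the c(α) coefficients are the usual asymptotic constants, assumed here rather than taken from the dissertation:

```python
import math

# Asymptotic one-sample KS critical values: accept H0 if D <= c(alpha)/sqrt(N).
C_ALPHA = {0.01: 1.628, 0.05: 1.358, 0.10: 1.224, 0.20: 1.073}
N = 1000  # number of simulations, as in the tables

def ks_accepts(D, alpha):
    """Return 1 if H0 (the Tracy-Widom fit) is accepted at level alpha."""
    return int(D <= C_ALPHA[alpha] / math.sqrt(N))

# Network j3038-7, n = 129, D = 0.036 (row above):
print([ks_accepts(0.036, a) for a in (0.01, 0.05, 0.10, 0.20)])  # [1, 1, 1, 0]
```

This reproduces, for instance, the j3038-7 row at n = 129: accepted at α = 0.01, 0.05, and 0.10, rejected at 0.20.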

Norm II

1000 Simulations
Normalized 2nd Eigenvalues (Norm II, 0.025)
Significance level α / Probability P
0.01 / 0.99 0.05 / 0.95 0.10 / 0.9 0.20 / 0.80
Network n RT D₂ KS-test 1 KS-test 2 KS-test 3 KS-test 4
j3038-5 53 0.617 0.0311 1 1 1 1
j3038-5 54 0.617 0.0482 1 0 0 0
j3038-5 55 0.617 0.0361 1 1 1 0
j3038-5 56 0.617 0.0495 1 0 0 0
j3038-5 57 0.617 0.0476 1 0 0 0

Selected Networks Ho accepted

Appendix F.8 – Outputs of KS Testing (3rd Largest Eigenvalue)

Norm I

1000 Simulations
Normalized 3rd Eigenvalues (Norm I, 0.025)
Significance level α / Probability P
0.01 / 0.99 0.05 / 0.95 0.10 / 0.9 0.20 / 0.80
Network n RT D₃ KS-test 1 KS-test 2 KS-test 3 KS-test 4
j3024-8 104 0.46 0.017 1 1 1 1
j3024-8 105 0.46 0.022 1 1 1 1
j3024-8 106 0.46 0.018 1 1 1 1
j3024-8 107 0.46 0.029 1 1 1 1
j3024-8 108 0.46 0.023 1 1 1 1
j3041-8 124 0.579 0.018 1 1 1 1
j3041-8 125 0.579 0.014 1 1 1 1
j3041-8 126 0.579 0.028 1 1 1 1
j3041-8 127 0.579 0.02 1 1 1 1
j3041-8 128 0.579 0.019 1 1 1 1

Selected Networks Ho accepted

Norm II

1000 Simulations
Normalized 3rd Eigenvalues (Norm II, 0.025)
Significance level α / Probability P
0.01 / 0.99 0.05 / 0.95 0.10 / 0.9 0.20 / 0.80
Network n RT D₃ KS-test 1 KS-test 2 KS-test 3 KS-test 4
j3011-1 49 0.4597 0.0396 1 1 0 0
j3011-1 50 0.4597 0.027 1 1 1 1
j3011-1 51 0.4597 0.0317 1 1 1 1
j3012-6 59 0.3952 0.0512 1 0 0 0
j3034-10 50 0.5907 0.0451 1 0 0 0
j3034-10 51 0.5907 0.0343 1 1 1 0
j3034-10 52 0.5907 0.0207 1 1 1 1
j3034-10 53 0.5907 0.0376 1 1 1 0
j3037-6 50 0.6875 0.0425 1 1 0 0
j3037-6 51 0.6875 0.0418 1 1 0 0
j3037-6 52 0.6875 0.0393 1 1 0 0
j3037-6 53 0.6875 0.0398 1 1 0 0
j3038-7 51 0.5806 0.044 1 0 0 0
j3038-7 52 0.5806 0.0403 1 1 0 0
j3038-7 53 0.5806 0.0468 1 0 0 0
j6028-9 122 0.403 0.0461 1 0 0 0
j6028-9 124 0.403 0.0424 1 1 0 0
j6028-9 125 0.403 0.0411 1 1 0 0
j6028-9 126 0.403 0.0343 1 1 1 0
j6042-6 115 0.5738 0.0316 1 1 1 1
j6042-6 116 0.5738 0.0355 1 1 1 0
j6042-6 117 0.5738 0.0304 1 1 1 1
j6042-6 118 0.5738 0.0343 1 1 1 0
j6042-6 119 0.5738 0.027 1 1 1 1
j9010-5 174 0.2174 0.0467 1 0 0 0
j9010-5 175 0.2174 0.043 1 1 0 0
j9010-5 176 0.2174 0.0502 1 0 0 0
j12012-1 348 0.2179 0.0393 1 1 0 0
j12012-1 349 0.2179 0.0435 1 0 0 0
j12012-1 350 0.2179 0.0498 1 0 0 0
j12014-1 260 0.178 0.0377 1 1 1 0
j12014-1 261 0.178 0.0331 1 1 1 1
j12014-1 262 0.178 0.0452 1 0 0 0

Selected Networks Ho accepted

Appendix F.9 – Outputs of KS Testing – Illustration of Untreated Results (1st Largest Eigenvalue – Norm I – j90)

K-S TEST
Test 1 Test 2 Test 3 Test 4
Network n D2 0.01 0.05 0.1 0.2
j9010_5 1007 0.1653 0 0 0 0
1008 0.1388 0 0 0 0
1009 0.1459 0 0 0 0
1010 0.1378 0 0 0 0
1011 0.1410 0 0 0 0
j9014_5 1050 0.1003 0 0 0 0
1051 0.0789 0 0 0 0
1052 0.0544 0 0 0 0
1053 0.0422 1 1 0 0
1054 0.0405 1 1 0 0
j901_3 689 0.0994 0 0 0 0
690 0.0936 0 0 0 0
691 0.0664 0 0 0 0
692 0.0716 0 0 0 0
693 0.0927 0 0 0 0

j90 - Norm I
Sample Statistics
Network n Median Mode Mean Variance Skewness Kurtosis
j9010_5 1007 -0.9146 -8.2506 -0.9090 4.3014 -0.0341 0.0358
1008 -1.0735 -7.0683 -1.0699 4.5727 0.1007 -0.0994
1009 -1.1035 -7.6716 -1.0869 4.7206 -0.0567 -0.1518
1010 -1.2844 -7.9919 -1.2241 4.4182 -0.0166 -0.1697
1011 -1.1573 -7.2551 -1.2111 4.5736 -0.0573 0.0222
j9014_5 1050 -1.0046 -5.5804 -0.9743 2.0825 0.0540 0.1337
1051 -1.0393 -5.9595 -1.0577 1.8915 -0.0614 -0.0233
1052 -1.1111 -4.7403 -1.0891 1.6937 0.0611 -0.2390
1053 -1.1600 -5.1489 -1.1970 1.7254 -0.0636 -0.1756
1054 -1.2869 -5.5193 -1.2866 1.7904 0.0286 0.2135
j901_3 689 -1.1022 -5.7798 -0.9368 2.6187 0.5464 0.4098
690 -1.1653 -5.6746 -1.0335 2.6891 0.4323 0.6950
691 -1.3215 -6.5441 -1.1254 2.9931 0.7407 1.3939
692 -1.2752 -6.1249 -1.1782 2.5958 0.3315 0.2867
693 -1.3921 -5.6574 -1.2924 2.7591 0.3605 0.1375

Appendix F.10 – Q-Q Plots (3rd Largest Eigenvalue)

Appendix G
Different Formulations of the Sample Covariance Matrix S(p×p)

X(n×p) = [x₁ x₂ … x_p], with each column vector x_j = (EF_1j, …, EF_nj)^T representing the observed EF times of project activity j.

Standardizing X yields a new matrix S = X^T · X, where:

w_j = (x_j − x̄_j) / ‖x_j − x̄_j‖, and x̄_j denotes the mean of x_j.

X = RW = [x₁ x₂ … x_p] is needed to synthesize a Gaussian data matrix, with R defined as:

R = χ²(n, n, p) · Ct(α, n, p)

Ct(α, n, p) is a function of n, p, and α, defined as:

No. 1: Ct(α,n,p) = [(n−1) / (n(n−p))] · χ²(α/2), with α = 5%.
Notes: Used for all conducted simulations. Only one J90 and one J120 are left due to their larger sizes. Replacing α/2 in the formula of Ct(α,n,p) with α and with 2α was also tried, but it did not improve the results for those networks that are not good TW fits. Running 10,000 simulations on one of the J30 networks was also tried, but did not work either.

No. 2: Ct(α,n,p) = [(n−1) / (n(n−p))] · F(p,n−p)(α/2).
Notes: Related to Hotelling's T² in multivariate analysis (the multivariate analogue of Student's t-distribution in univariate analysis); Ref: 6th edition of Johnson and Wichern (2020). Simulations are currently running to see whether this helps obtain a better fit for Net = j1201_2 / j12014_1. The crossing points must be determined first, then the 1,000 simulations run.

No. 3: Ct(α,n,p) = 1.
Notes: As per Johnstone (2001)'s suggestion. One simulation was run for Net3 = j9045_1.

No. 4: Ct(α,n,p) = (n−1) / (n(n−p)).
Notes: A variant of formulas 1 and 2 without α.

α is the significance level of the hypothesis testing conducted to confirm evidence of a covariance structure in construction project networks and that their behavior is governed by TW. Including α helps build a confidence or acceptance region/interval, as is done by investigators in statistics- and probability-related fields.

χ²(n, n, p) is an n×p matrix of random numbers sampled from the chi-square (χ²) distribution with n degrees of freedom. F(p,n−p)(α) denotes the upper 100α-th percentile of the F(p,n−p) distribution.
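A minimal Python sketch of how this data-matrix construction might be assembled is given below. It assumes formulation No. 4 (the α-free variant of Ct), interprets the product RW elementwise, and uses randomly generated placeholders for the observed EF times; it illustrates the construction, not the dissertation's actual implementation:

```python
import numpy as np

# Hypothetical sketch: R = chi2(n, n, p) * Ct, X = R (*) W, S = X^T X.
rng = np.random.default_rng(0)
n, p = 200, 30

# R: n x p matrix of chi-square(n) random numbers, scaled by Ct (formula No. 4).
Ct = (n - 1) / (n * (n - p))
R = rng.chisquare(df=n, size=(n, p)) * Ct

# W: columns of the (placeholder) EF-time matrix, centered and normalized
# to unit length: w_j = (x_j - mean(x_j)) / ||x_j - mean(x_j)||.
EF = rng.normal(size=(n, p))          # stand-in for observed EF times
EFc = EF - EF.mean(axis=0)
W = EFc / np.linalg.norm(EFc, axis=0)

X = R * W                             # synthesized data matrix (elementwise product)
S = X.T @ X                           # p x p sample covariance matrix
l1 = np.linalg.eigvalsh(S)[-1]        # largest eigenvalue (eigvalsh sorts ascending)
print(S.shape, l1 > 0.0)
```

The largest eigenvalue l1 is the quantity that, after the centering and scaling used throughout the simulations, is compared against the Tracy-Widom distribution.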

