Efficient distributed event driven simulations of multiple-loop networks
Simulating asynchronous multiple-loop networks is commonly considered a difficult task for parallel programming. This paper presents two examples of asynchronous multiple-loop networks: a stylized queuing system and an Ising model. The network topology ...
Performance evaluation for multiprocessors programmed using monitors
We present a classification of synchronization delays inherent in multiprocessor systems programmed using the monitor paradigm. This characterization is useful in relating performance of such systems to algorithmic parameters in subproblems such as ...
Queueing analysis of finite buffer token networks
This paper introduces analytic models for evaluating demand assignment protocols in realistic finite buffer/finite station network configurations. We present a solution for implicit and explicit token passing systems enabling us to model local area ...
Performance modelling of a HSLAN slotted ring protocol
The slotted ring protocol which is evaluated in this paper is suitable for use at very large transmission rates. In terms of modelling it is a multiple cyclic server system. A few approximative analytical models of this protocol are presented and ...
A case study of DECnet applications and protocol performance
This paper is a study based on measurements of network activities of a major site of Digital's world-wide corporate network. The study yields two kinds of results: (1) DECnet protocol performance information and (2) DECnet session statistics. Protocol ...
Asymptotic analysis of large heterogeneous queueing systems
As a simple example of a large heterogeneous queueing system, we consider a single queue with many servers with differing service rates. In the limit of infinitely many servers, we identify a queue control policy that minimizes the average system delay. ...
The limited performance benefits of migrating active processes for load sharing
Load sharing in a distributed system is the process of transparently sharing workload among the nodes in the system to achieve improved performance. In non-migratory load sharing, jobs may not be transferred once they have commenced execution. In load ...
From local to global: an analysis of nearest neighbor balancing on hypercube
This paper will focus on the issue of load balancing on a hypercube network of N processors. We will investigate a typical nearest neighbor balancing strategy - in which workloads among neighboring processors are averaged at discrete time steps. The ...
Application level modeling of parallel machines
In this paper, we consider the application level performance modeling of parallel machines consisting of a large number of processing elements (PE's) connected in some regular structure such as mesh, tree, hypercube, etc. There are K problem types, each ...
Analytic derivation of processor potential utilization in straight line, ring, square mesh, and hypercube networks
In multicomputer architectures, in which processors communicate through message-passing, the overhead encountered because of the need to relay messages can significantly affect performance. Based upon some simplifying assumptions including the rate at ...
Scheduling in multiprogrammed parallel systems
Processor scheduling on multiprocessor systems that simultaneously run concurrent applications is currently not well understood. This paper reports a preliminary investigation of a number of fundamental issues which are important in the context of ...
On hot-spot contention in interconnection networks
A major component of a parallel machine is its interconnection network, which provides concurrent communication between the processing elements. It is common to use a multi-stage interconnection network (MIN) which is constructed using crossbar switches ...
Performance analysis of multipath multistage interconnection networks
This paper closely examines the performance analysis for unbuffered multipath multistage interconnection networks. A critical discussion of commonly used analysis is provided to identify a basic flaw in the model. A new analysis based on the grouping of ...
Modelling and performance evaluation of multiprocessor based packet switches
This paper presents an approximate analytic model for the performance analysis of a class of multiprocessor based packet switches. For these systems, processors and common memory modules are grouped in clusters, each of them composed of several ...
A manufacturing capacity planning experiment through functional workload decomposition
In this paper, we describe an experiment to evaluate a distributed architecture via functional database workload decomposition. A workload in a circuit pack assembly environment was decomposed and mapped onto a frontend/backend distributed computer ...
Comparison of dataflow control techniques in distributed data-intensive systems
In dataflow architectures, each dataflow node (i.e., operation) is typically executed on a single physical node. We are concerned with distributed data-intensive systems, in which each base (i.e., persistent) set of data has been declustered over many ...
A mean-value performance analysis of a new multiprocessor architecture
This paper presents a preliminary performance analysis of a new large-scale multiprocessor: the Wisconsin Multicube. A key characteristic of the machine is that it is based on shared buses and a snooping cache coherence protocol. The organization of the ...
Sensitivity analysis of reliability and performability measures for multiprocessor systems
Traditional evaluation techniques for multiprocessor systems use Markov chains and Markov reward models to compute measures such as mean time to failure, reliability, performance, and performability. In this paper, we discuss the extension of Markov ...
Design of partially replicated distributed database systems: an integrated methodology
The objective of this research is to develop and integrate tools for the design of partially replicated distributed database systems. Many existing tools are inappropriate for designing large-scale distributed databases due to their large computational ...
Monitoring and performance measuring distributed systems during operation
This paper describes an integrated tool for monitoring distributed systems continuously during operation. A hybrid monitoring approach is used. As special hardware support, a test and measurement processor (TMP) was designed, which is part of each node ...
The use of microcode instrumentation for development, debugging and tuning of operating system kernels
We have developed a tool based on microcode modifications to a VAX 8600 which allows a wide variety of operating system measurements to be taken with minimal perturbation and without the need to modify any operating system software. A trace of ...
Memory-reference characteristics of multiprocessor applications under MACH
Shared-memory multiprocessors have received wide attention in recent times as a means of achieving high performance cost-effectively. Their viability requires a thorough understanding of the memory access patterns of parallel processing applications and ...
Characterising program behaviour with phases and transitions
A detailed quantitative study of program behaviour is described. Reference strings from a representative set of programs were decomposed into phases and transitions. Referencing behaviour is studied at both the macro level (program-wide) and the micro ...
Adaptive storage control for page frame supply in large scale computer systems
A real storage management algorithm called Adaptive Control of Page-frame Supply (ACPS) is described. ACPS employs three strategies: prediction of the demand for real page frames, page replacement based on the prediction, and working set control. ...
On the properties of approximate mean value analysis algorithms for queueing networks
This paper presents new formulations of the approximate mean value analysis (MVA) algorithms for the performance evaluation of closed product-form queueing networks. The key to the development of the algorithms is the derivation of vector nonlinear ...
Optimal allocation of multiple class resources in computer systems
A class-constrained resource allocation problem is considered. In this problem, a set of M heterogeneous resources is to be allocated optimally among a set of L users belonging to K user classes. A set of class allocation constraints, which limit the ...
PAM - a noniterative approximate solution method for closed multichain queueing networks
Approximate MVA algorithms for separable queueing networks are based upon an iterative solution of a set of modified MVA formulas. Although each iteration has a computational time requirement of O(MK^2) or less, many iterations are typically needed for ...