|
Distributed and Parallel Computing Hesham El-Rewini and Ted G. Lewis 1997 | 469 pages ISBN: 1884777511 |
|||
| $60.00 | Softbound print book | Out of print (?) | |
Introduction
If a single computer can solve a problem in ten seconds, can ten computers solve the same problem in one second? These and similar questions have intrigued computer scientists since the earliest days of electronic computers. Pioneers such as John von Neumann (1949) and Gene Amdahl (1967) dreamed of ever faster computing machines created by harnessing the power of multiple processing elements. But their dreams have been slow to materialize. In fact, the design and construction of massively parallel machines peaked in the early 1990s. Nearly all such machines have since become extinct. Even modestly parallel machines have had difficulty holding their own against advancing single-chip designs. Why?Parallel computers turned out to be more difficult to build and use within mainstream computing than anyone previously thought. For at least the first three decades (1950-1970) of the troubled history of parallel computing, design and construction of the hardware were extremely challenging. Early machines such as the Illiac IV (University of Illinois and Burroughs Corporation) were so unreliable that they rarely ran for more than a few hours before breaking down. And the hardware costs were so high that nearly everyone ignored the unsolved problems of software.
Another two decades (1970-1980) passed before hardware became reliable enough and inexpensive enough to make the production of commercial machines feasible. By the 1980s, microprocessor-based parallel machines looked like a sure thing. But their designers overlooked a critical piece of the puzzle: software. By 1990, almost anyone could build extremely fast and reliable machines, but few people could program them. The parallel computer industry hit a wall--writing parallel programs was simply too difficult and time-consuming. The entire industry hit a brick wall called "software."
The parallel computing discipline retrenched by the early 1990s and prepared for a long siege. The problem of parallel programming would take longer to solve than anyone had imagined. Most everyone went back to the drawing board. Parallel computing returned to the relative safety of academic research. With some exceptions, that is where it has remained. Research into programming parallel machines has become a challenging research topic, much as the research into parallel hardware was in the 1980s.
In the meantime, while most people were concentrating on parallel computing in the form of internally linked processors, another community was concentrating on external parallelism in the form of networked computers. Instead of putting everything in a box and tightly coupling processors to memory, the Internet achieved a kind of parallelism by loosely connecting everything outside of the box. Thus, distributed computing was reborn as a kind of "lazy parallelism." A network of computers could team up to solve many problems at once, rather than one problem at a much higher speed. In a sense, distributed computing is parallel computing applied to several problems at the same time.
In fact, distributed and parallel computing are extremes in a spectrum of concurrent computing with everything in-between. A distributed computer system may be loosely coupled, but it is parallel. A parallel computer may connect processing elements and memories via high-speed busses instead of network links, but it is a kind of distributed system. In many systems, it is difficult to decide whether the hardware is distributed or parallel. Programming models are more or less unrelated to the underlying hardware models. Everything is merging together.
Today's distributed and parallel system is a collection of hardware and software components that optimizes per-problem performance, multiple-problem throughput, and reliability--or a combination of these. It is perhaps no longer relevant which paradigm is used. However, many of the old problems remain: Programming is still difficult, and getting the maximum performance out of all the processors all the time remains a challenge. Clustering lots of processors together via communication links often disappoints the user. Performance often falls far short of promises made by vendors. The techniques that work for writing serial programs often fail when applied to distributed programs. The return on effort is frequently not rewarded by better throughput, response time, or reliability.
To get the most out of a distributed and parallel computer system, designers and software developers must understand the interaction between hardware and software parts of the system. This is the reason we wrote this book. We want the reader to understand the power and limitations of distributed and parallel systems. Rather than oversell the technology, our goal is to apprise the reader of both the beneficial and challenging aspects of parallelism.
After many decades of progress, we know a lot about distributed and parallel computers. We know that some problems lend themselves to parallel hardware and others work better on distributed hardware. We know that parallel programming remains difficult, and special programming languages are required. We have an array of theoretical and practical results. We have a variety of languages and tools. We can, with some considerable effort, build highly distributed and parallel systems. This book surveys all aspects of distributed and parallel computing. Hopefully it clarifies the field.
But much remains to be accomplished. There is no universal parallel programming system. Parallel programming involves all of the challenges serial programming--plus additional challenges, such as data partitioning, task partitioning, task scheduling, parallel debugging, and synchronization. Unlike single-processor systems, interconnection bandwidth and message latency dominate the performance of distributed and parallel systems. There is no clear-cut way to predict what the performance of a new system will be. Therefore, it is difficult to predict what the benefits of a distributed or parallel system will be prior to an investment of time and effort.
The Internet has changed everything, and yet there are few tools for writing distributed programs that run over an Internet of computers. Java and related technologies promise these tools, but Java is a very low level tool--for example, the only mechanism in Java for writing distributed programs is the socket. This mechanism is far too low-level to be useful in large-scale programming. Java must mature and incorporate broadcast mechanisms, such as IP multicast, and distributed memory facilities, such as global shared memory. To know what Java needs, you need to know more about distributed and parallel computing. Similarly, Java's thread mechanism provides a very low level and dangerous mechanism for writing parallel programs. Because of its inadequacies as a programming language for highly parallel systems, Java can become a dangerous weapon in the hands of an unskilled programmer.
The next steps in the evolution of distributed and parallel computing will take place on both fronts: inside and outside the box. Inside, parallelism will continue to be used by hardware designers to increase processor performance. Intel's MMX (multimedia extensions) technology, which implements a small-scale form of data parallelism, is one example. Out-of-order instruction execution by superscalar processors, such as the Digital Alpha microprocessor is another example of internal parallelism. Thus, the decade-old dream of the pioneers may yet come true.
Outside, distributed processing will continue to progress as network computing becomes mainstream. Java applet technology and other distributed component technology will mature until nearly every program written is automatically "network aware. The field of distributed and parallel computing will grow and become synonymous with mainline computing. This is why the field is so exciting. Distributed and parallel computing is all of computing.
how to use this book
This book has been designed for several different audiences. It can be used for either an upper-division undergraduate textbook or a first-year graduate textbook in engineering, science, and computer science degree programs. It may also be used by practitioners working as programmers, managers, and technologists. Portions of the book can be used to teach short courses to practitioners.
Different chapters might be used to offer courses with different flavors--for example, a course can be designed to briefly cover the different programming languages, while another may focus on only one language in great depth. Additionally, chapters 4 and 5, which cover theoretical aspects, may be taught at the graduate level.
We suggest the following course outlines. Each of the outlines, Survey, Programming, and Graduate, describes a one-semester course. The final five suggestions are short courses designed for Practitioners.
Survey
This course teaches the fundamentals and the different programming paradigms.
- Chapters 1-3, 6, 8-10,12
This course teaches fundamentals, scheduling techniques, and a single programming paradigm in detail. Select one of the following.
- Chapters 1-3, 6-7, 10-11 (MPI)
- Chapters 1-3, 6-8 (Parallaxis)
- Chapters 1-3, 6-7, 9 (PVM)
- Chapters 1-3, 6-7, 12 (Java)
Graduate
This course teaches the fundamentals, theoretical models and algorithms, program design, scheduling techniques, and a single programming language. This course is appropriate for graduate students.
- Chapters 1-7 and one of the following chapters: 8, 9, 11, or 12
Select one of the following.
- Fundamentals of distributed and parallel computing
- Chapters 1-3 - Distributed and parallel programming in parallaxis
- Chapters 1, 6, 8 - Distributed and Parallel Programming in PVM
- Chapters 1, 6, 9
- Distributed and Parallel Programming in MPI
- Chapters 1, 10-11 - Distributed and Parallel Programming in Java
- Chapters 1, 6, 12
This book spans four major areas of the field: performance and architecture, theory and complexity analysis of parallel algorithms, design and scheduling of distributed and parallel tasks, and programming languages and systems for writing distributed and parallel programs.
Cutting across these broad topical areas are the various programming paradigms--for example, data parallel, control parallel, and distributed programming. After developing these fundamental concepts, we illustrate them in a wide variety of algorithms and programming languages and environments. This approach gives the reader a broad, yet insightful, view of the field. The material is organized in 12 chapters, as follows.
Chapter 1 is a survey of the field of distributed and parallel computing at an introductory level. We first study the evolution of computing and the changes that have led to distributed computing. We then provide a classification of distributed and parallel systems based on the grain size and system architecture.
Chapter 2 is about performance. How should we characterize the performance of a computer system when, in effect, distributed computing redefines traditional measures such as million instructions per second (MIPS) and million floating-point operations per second (MFLOPS)? New measures of performance, such as speedup, are needed. This chapter examines several versions of speedup, as well as other performance measures and benchmarks.
Chapter 3 surveys the general architectures of distributed and parallel systems. It offers a taxonomy of interconnection networks and covers different types of systems: beginning with the simplest (shared memory), followed by the more elaborate message-passing classification, and ending with the Internet. The chapter also covers the emerging distributed processing software platforms (distributed objects, distributed compound documents, etc.).
Chapters 4 and 5 cover theoretical models, algorithms, and complexity analysis. In chapter 4, we discuss a shared-memory abstract model, which can be used to study parallel algorithms and evaluate their complexities. Abstract models and algorithms for message-passing systems are studied in chapter 5.
In chapter 6, we show how to develop programs for distributed and parallel systems. We give two distributed program development methods--one for construction of new programs and the other for reverse-engineering of existing sequential programs. The spiral model for developing distributed programs is also discussed.
Chapter 7 addresses the scheduling problem in many of its variations. We survey a number of solutions to this important problem. We cover program and system models, optimal algorithms, heuristic algorithms, scheduling versus allocation techniques, and homogeneous versus heterogeneous environments.
Chapters 8-12 cover various programming languages and environments. In chapter 8, we study an SIMD data-parallel programming paradigm in which parallelism is exploited by applying simultaneous operations across large sets of data. We present a machine-independent language called Parallaxis.
Chapter 9 illustrates the parallel virtual machine (PVM) programming system. It shows how to write programs on a network of heterogeneous machines.
Chapters 10 and 11 cover the message-passing interface (MPI) standard in which portable distributed parallel programs can be developed. The extensions to the MPI standard (called MPI-2) are covered in chapter 11.
Finally, chapter 12 gives an overview of the Java programming language and shows how Java can be used to write distributed and parallel programs. We show how to use Java threads to implement internal parallelism (tightly coupled multiprocessors) and sockets to implement external parallelism (loosely coupled Internet-based network of processors).
DESCRIPTION
Distributed and Parallel Computing is a comprehensive survey of the state-of-the-art in concurrent computing. It covers four major aspects:
- Architecture and performance
- Theory and complexity analysis of parallel algorithms
- Programming languages and systems for writing parallel and distributed programs
- Scheduling of parallel and distributed tasks
Many books on parallel computing have been published during the last 10 years or so. Most are already outdated since the themes and technologies in this area are changing very rapidly. Particularly, the notion that parallel and distributed computing are two separate fields is now beginning to fade away; technological advances have been bridging the gap. Distributed and Parallel Computing is unique in recognizing how distributed systems are utlized for parallel computations. The coverage and the contents represent the state of the art.
Distributed and Parallel Computing is designed for computer science and engineering students as well as professionals in the field (programmers, managers, technologists).
