Complicated Programming Models
While most computers have one or two processors, high-performance computing often uses computers with many more. And as hardware prices decrease, this model is starting to become more widespread. Having more than one processor means that the threads model in Figure 197 changes to something similar to that shown in Figure 198.
Figure 198, Four-Processor Computer
This figure shows four cores in one computer, each running three threads. (Only four cores are shown, although a chip could hold many more.) This architecture extends the model that links more than one computer together. Its advantage is that the processors don’t need to communicate with one another over a network, because the computer is completely self-contained.
The next step is to join many multi-processor computers together. Figure 199 shows five computers, each with four processors, with each processor running three threads. If this figure shows the execution of one program, then the program is using 5 × 4 × 3 = 60 threads.
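The thread count above can be checked with a short sketch. This is only an illustration of the arithmetic; the function name and the (computer, processor, thread) numbering are assumptions made for the example and are not part of TotalView.

```python
from itertools import product


def enumerate_threads(computers, processors_per_computer, threads_per_processor):
    """Return one (computer, processor, thread) triple per thread.

    The zero-based numbering is hypothetical and used only for illustration.
    """
    return list(product(range(computers),
                        range(processors_per_computer),
                        range(threads_per_processor)))


# The configuration described for Figure 199:
# five computers, four processors each, three threads per processor.
all_threads = enumerate_threads(5, 4, 3)
total_threads = len(all_threads)
print(total_threads)
```

Each triple identifies one thread, which hints at how quickly the number of things to keep track of grows as computers and processors are added.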
 
Figure 199, Four Processors on a Network
This figure depicts only processors and threads. It doesn’t have any information about the nature of the programs and threads or even whether the programs are copies of one another or represent different executables.
At any time, it is next to impossible to guess which threads are executing and what a thread is actually doing. To complicate matters further, many multi-processor programs begin by invoking a process such as mpirun or IBM poe, whose function is to distribute and control the work being performed. In this kind of environment, a program is using another program to control the workflow across processors.
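The idea of one program distributing and controlling work done by others can be sketched in a few lines. This is a loose analogy only: it uses threads inside a single Python process as stand-ins for the processes that a real launcher starts, and it does not show how mpirun or poe work. The names worker and launch, and the use of a "rank" number for each worker, are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def worker(rank, chunk):
    """Hypothetical unit of work: sum the numbers assigned to this rank."""
    return rank, sum(chunk)


def launch(data, num_workers):
    """Split data into num_workers chunks and hand each chunk to a worker.

    Returns the order in which workers finished and each worker's result.
    """
    chunks = [data[i::num_workers] for i in range(num_workers)]
    completion_order = []
    partial_sums = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(worker, rank, chunk)
                   for rank, chunk in enumerate(chunks)]
        # Results arrive in whatever order the workers happen to finish.
        for future in as_completed(futures):
            rank, value = future.result()
            completion_order.append(rank)
            partial_sums[rank] = value
    return completion_order, partial_sums


completion_order, partial_sums = launch(list(range(100)), 4)
print("completion order:", completion_order)
print("total:", sum(partial_sums.values()))
```

The total is the same on every run, but the completion order is not guaranteed to be. Even in this small sketch, the controlling code cannot know in advance which worker is running at a given moment.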
In this model, traditional debuggers and solutions don’t work. TotalView, on the other hand, organizes this mass of executing procedures for you, distinguishing the threads and processes that the operating system uses from those that your program uses.