This section steps you through a few examples of compiling code using -mp. The following command line

    % f77 -mp foo.f

compiles and links the Fortran program foo.f into a multiprocessor executable.

The Fortran routines in the file snark.f are compiled with multiprocess code generation enabled. A standard snark.o binary is produced, which must be linked:

    % f77 -mp -o boojum snark.o bellman.o

Here, the -mp flag signals the linker to use the Fortran multiprocessing library. The file bellman.o need not have been compiled with the -mp flag (although it could have been).

After linking, the resulting executable can be run like any standard executable. Creating multiple execution threads, running and synchronizing them, and terminating the task are all handled automatically.

When an executable has been linked with -mp, the Fortran initialization routines determine how many parallel threads of execution to create. This determination occurs each time the task starts; the number of threads is not compiled into the code. The default is to use the number of processors that are on the machine (the value returned by the system call sysmp(MP_NAPROCS); see the sysmp(2) man page). The default can be overridden by setting the shell environment variable MP_SET_NUMTHREADS. If it is set, Fortran tasks will use the specified number of execution threads regardless of the number of processors physically present on the machine. MP_SET_NUMTHREADS can be an integer from 1 to 16.

After converting a program, you need to examine execution profiles to judge the effectiveness of the transformation. Good execution profiles of the program are crucial to help you focus on the loops consuming the most time.

IRIX provides profiling tools that can be used on Fortran parallel programs. Both pixie(1) and pc-sample profiling can be used. On jobs that use multiple threads, both these methods will create multiple profile data files, one for each thread. The standard profile analyzer prof(1) can be used to examine this output.

The profile of a Fortran parallel job is different from a standard profile. As mentioned in “Analyzing Data Dependencies for Multiprocessing”, to produce a parallel program, the compiler pulls the parallel DO loops out into separate subroutines, one routine for each loop. Each of these loops is shown as a separate procedure in the profile. Comparing the amount of time spent in each loop by the various threads shows how well the workload is balanced.

In addition to the loops, the profile shows the special routines that actually do the multiprocessing. The mp_simple_sched routine is the synchronizer and controller. Slave threads wait for work in the routine mp_slave_wait_for_work. The less time they wait, the more time they work. This gives a rough estimate of how parallel the program is. “Parallel Programming Exercise” contains several examples of profiling output and how to use the information it provides.

Debugging a multiprocessed program is much harder than debugging a single-processor program. For this reason, do as much debugging as possible on the single-processor version. Try to isolate the problem as much as possible. Ideally, try to reduce the problem to a single C$DOACROSS loop.

Before debugging a multiprocessed program, change the order of the iterations on the parallel DO loop on a single-processor version. If the loop can be multiprocessed, then the iterations can execute in any order and produce the same answer. If the loop cannot be multiprocessed, changing the order frequently causes the single-processor version to fail, and standard single-process debugging techniques can be used to find the problem.

Once you have narrowed the bug to a single file, use -g -mp_keep to save debugging information and to save the file containing the multiprocessed DO loop Fortran code that has been moved to a subroutine. -mp_keep will store the compiler-generated subroutines in the following file name:

Check the LOCAL variables when the code runs correctly as a single process but fails when multiprocessed.
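
The MP_SET_NUMTHREADS override gives a quick way to find out whether a program that runs correctly as a single process fails when multiprocessed. The following is a minimal sketch, not part of the IRIX tools: compare_runs is a hypothetical helper name. It assumes a POSIX sh shell and a program that writes deterministic results to standard output. It runs the same executable once with one thread and once with several, then compares the two outputs.

```shell
#!/bin/sh
# Hypothetical helper (not an IRIX utility): run one -mp executable with a
# single thread and again with several threads, then compare the outputs.
# Usage: compare_runs program nthreads
compare_runs() {
    prog=$1
    nthreads=$2

    # MP_SET_NUMTHREADS is the override named in the text, which gives its
    # valid range as 1 to 16. The assignment applies to this one run only.
    single=$(MP_SET_NUMTHREADS=1 "$prog") || return 2
    multi=$(MP_SET_NUMTHREADS=$nthreads "$prog") || return 2

    if [ "$single" = "$multi" ]; then
        echo "same"
    else
        # A mismatch suggests the parallel loops are the place to look.
        echo "differ"
        return 1
    fi
}
```

For example, compare_runs ./boojum 4 prints "same" when both runs agree. A result of "differ" is only a hint: it points at the parallel loops, where the LOCAL lists are the first thing to check, but it does not say which loop is at fault.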