Physically, each processor in a computer shares access to the same RAM, and the threads running on the processors interact with each other via shared variables in the common address space of a single process.

Architecture
- UMA (Uniform Memory Access)
    - Data access time from memory is the same for every processor

Methodology
- Multi-threaded Programming
    - Multiple threads are created which execute a piece of code concurrently
    - Can be achieved in Linux with the `pthread_create` call (the `fork` system call creates a separate process rather than a thread), as in the sketch below
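A minimal sketch of Linux thread creation with `pthread_create`; the worker function and the thread count of 4 are illustrative choices:

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical worker: each thread runs this function concurrently. */
void *worker(void *arg) {
    printf("Hello from thread %d\n", *(int *)arg);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int ids[4];
    for (int i = 0; i < 4; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);  /* wait for all threads to finish */
    return 0;
}
```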
Comparison with Message Passing Model
- Message Passing requires extra effort to introduce parallelism, and messages over the network are usually sent sequentially
- Incremental Parallelism is not supported in Message Passing
- The Shared Memory model is better in terms of performance
OpenMP
Shared Memory Programming is commonly implemented through OpenMP
- Open Multi-Processing
- An API for writing multithreaded applications
Implementation
- Available in C / C++ through the ==pragma== directive
    - Pragma stands for Pragmatic Information
    - A way for the programmer to communicate with the compiler
    - The compiler is free to ignore a pragma it does not recognise
- Example: in `#pragma omp parallel for`, the `for` instructs the compiler to parallelise the C / C++ for loop that follows, as in the sketch below
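A minimal sketch of the directive in context; the array and loop bounds are placeholders:

```c
#include <omp.h>

int main(void) {
    int a[100];
    // The directive below tells the compiler to split the iterations
    // of the following for loop across the available threads.
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        a[i] = i * i;
    }
    return 0;
}
```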
Clauses and their Functions:
- `parallel`
    - Function: Indicates that the following block of code should be executed in parallel.
    - Syntax: see the sketch below.
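A sketch of the `parallel` syntax; the function name is illustrative:

```c
#include <stdio.h>
#include <omp.h>

void hello(void) {
    #pragma omp parallel    // the enclosed block runs once on every thread
    {
        printf("Hello from thread %d\n", omp_get_thread_num());
    }
}
```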
- `for`
    - Function: Splits the iterations of a loop among the available threads.
    - Syntax: see the sketch below.
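A sketch of the `for` directive inside a parallel region; the work done per iteration is a placeholder:

```c
#include <omp.h>

void square_all(int *a, int n) {
    #pragma omp parallel    // create a team of threads
    {
        #pragma omp for     // split the loop iterations among them
        for (int i = 0; i < n; i++) {
            a[i] = i * i;
        }
    }
}
```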
- `num_threads`
    - Function: Specifies the number of threads to be used for parallel execution.
    - Syntax: see the sketch below.
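A sketch of `num_threads`; the count of 4 is arbitrary:

```c
#include <omp.h>

void fill(int *a, int n) {
    // request a team of 4 threads for this region
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; i++) {
        a[i] = i;
    }
}
```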
- `shared`
    - Function: Shares variables among all threads. Any modification by one thread affects the value seen by other threads.
    - Syntax: see the sketch below.
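A sketch of `shared`; the variables here are illustrative:

```c
#include <omp.h>

void add_offset(int *a, int n) {
    int offset = 10;
    // a, n and offset each name one instance visible to all threads
    #pragma omp parallel for shared(a, n, offset)
    for (int i = 0; i < n; i++) {
        a[i] = a[i] + offset;
    }
}
```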
- `private`
    - Function: Specifies that each thread should have its own private instance of the variable.
    - Syntax: see the sketch below.
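A sketch of `private`; `tmp` is a placeholder variable:

```c
#include <omp.h>

void squares(int *a, int n) {
    int tmp = 0;
    // every thread works on its own copy of tmp (uninitialised on entry)
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = i * i;
        a[i] = tmp;
    }
}
```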
- `reduction`
    - Function: Performs a reduction operation on a variable across all threads. Commonly used for summing or finding the maximum of a set of values.
    - Syntax: see the sketch below.
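A sketch of `reduction` summing an array; the function name is illustrative:

```c
#include <omp.h>

int sum(const int *a, int n) {
    int total = 0;
    // each thread keeps a private partial sum; the partials are
    // combined with + into total when the loop finishes
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```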
- `critical`
    - Function: Allows only one thread at a time to execute a specific block of code.
    - Syntax: see the sketch below.
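A sketch of `critical` guarding a shared counter; the predicate is illustrative:

```c
#include <omp.h>

void count_positive(const int *a, int n, int *hits) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] > 0) {
            // only one thread at a time may execute this block
            #pragma omp critical
            {
                (*hits)++;
            }
        }
    }
}
```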
- `ordered`
    - Function: Ensures that the marked parts of the loop execute in the order of their index.
    - Syntax: see the sketch below.
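A sketch of `ordered`; the printed values are illustrative:

```c
#include <stdio.h>
#include <omp.h>

void print_squares(int n) {
    // the loop must carry the ordered clause for the
    // ordered region inside it to be legal
    #pragma omp parallel for ordered
    for (int i = 0; i < n; i++) {
        int sq = i * i;      // computed in parallel
        #pragma omp ordered
        printf("%d\n", sq);  // printed in index order
    }
}
```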
- `schedule`
    - Function: Specifies how iterations of a loop should be divided among threads.
    - Syntax: `schedule(type[, chunk_size])`, as sketched below.
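A sketch of the general form; `static, 4` is one arbitrary choice of type and chunk size:

```c
#include <omp.h>

void demo(int *a, int n) {
    // general form: schedule(type[, chunk_size])
    #pragma omp parallel for schedule(static, 4)
    for (int i = 0; i < n; i++) {
        a[i] = i;
    }
}
```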
    - Where `static | dynamic | guided | auto | runtime` is the scheduling type, and `chunk_size` (optional) is the number of iterations each thread should handle at a time
    - The `schedule` clause in OpenMP controls how iterations of a loop are divided among the threads in a parallel region. It lets you specify the way work is distributed among threads, which can have an impact on performance and load balancing.
    - Types of Scheduling:
        - Static Scheduling (`schedule(static[, chunk_size])`)
            - In static scheduling, iterations of the loop are divided into equal-sized chunks before the loop executes and distributed among the threads.
            - Each thread is assigned a chunk of iterations to work on.
            - If `chunk_size` is not specified, the iterations are divided evenly among the threads.
            - Example: see the sketch below.
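A sketch matching the description; the loop body is a placeholder:

```c
#include <omp.h>

void static_demo(int *a, int n) {
    // with n == 100 and 4 threads, each thread receives a
    // contiguous block of 25 iterations
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        a[i] = i * i;
    }
}
```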
            - In the example, if `n` is 100 and there are 4 threads, each thread handles 25 iterations (assuming no `chunk_size` is specified).
        - Dynamic Scheduling (`schedule(dynamic[, chunk_size])`)
            - In dynamic scheduling, iterations are distributed dynamically among the threads at runtime.
            - Each thread takes a chunk of iterations to work on, and when it finishes, it requests more work until all iterations are complete.
            - The `chunk_size` specifies the number of iterations assigned to a thread at a time.
            - Example: see the sketch below.
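A sketch with a chunk size of 10, matching the description; the loop body is a placeholder:

```c
#include <omp.h>

void dynamic_demo(int *a, int n) {
    // each thread grabs 10 iterations at a time and requests
    // another chunk whenever it finishes one
    #pragma omp parallel for schedule(dynamic, 10)
    for (int i = 0; i < n; i++) {
        a[i] = i * i;
    }
}
```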
            - In this example, each thread initially gets 10 iterations to work on. When a thread finishes its chunk, it requests more iterations until all are done.
        - Guided Scheduling (`schedule(guided[, chunk_size])`)
            - Guided scheduling is similar to dynamic scheduling, but the chunk size decreases over time.
            - Chunks start large and shrink as the loop progresses; `chunk_size` sets the minimum chunk size (except possibly the final chunk).
            - This can be useful for tasks where the amount of work per iteration varies widely.
            - Example: see the sketch below.
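A sketch with a minimum chunk size of 20; the loop body is a placeholder:

```c
#include <omp.h>

void guided_demo(int *a, int n) {
    // chunks start large and shrink as the loop progresses,
    // but never below 20 iterations (except possibly the last)
    #pragma omp parallel for schedule(guided, 20)
    for (int i = 0; i < n; i++) {
        a[i] = i * i;
    }
}
```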
            - In this example, chunks start large and shrink as iterations are completed, but never below 20 iterations.
        - Auto Scheduling (`schedule(auto)`)
            - The `auto` scheduling option allows the compiler and/or runtime to decide the best scheduling type based on heuristics.
            - It aims to balance the workload among threads and reduce scheduling overhead.
            - Example: see the sketch below.
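A sketch of `schedule(auto)`; the loop body is a placeholder:

```c
#include <omp.h>

void auto_demo(int *a, int n) {
    // the compiler and/or runtime picks the schedule
    #pragma omp parallel for schedule(auto)
    for (int i = 0; i < n; i++) {
        a[i] = i * i;
    }
}
```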
            - The actual scheduling type chosen may vary depending on the system and workload.
        - Runtime Scheduling (`schedule(runtime)`)
            - With `schedule(runtime)`, the scheduling type and chunk size are determined at runtime through environment variables or API calls.
            - This provides flexibility to change the scheduling behaviour without recompiling the code.
            - Example: see the sketch below.
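A sketch of `schedule(runtime)`; the environment-variable value shown in the comment is one possible choice:

```c
#include <omp.h>

void runtime_demo(int *a, int n) {
    // schedule chosen at run time, e.g. via:
    //   export OMP_SCHEDULE="dynamic,8"
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < n; i++) {
        a[i] = i * i;
    }
}
```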
            - The scheduling type and chunk size can be set using environment variables like `OMP_SCHEDULE`.
        - Summary:
            - Static: Divide iterations into equal-sized chunks before the loop executes.
            - Dynamic: Distribute iterations dynamically at runtime, with a specified chunk size.
            - Guided: Decrease the chunk size over time; useful for varying workloads.
            - Auto: Let the compiler and/or runtime decide the best scheduling type.
            - Runtime: Determine scheduling at runtime using environment variables or API calls.
- `nowait`
    - Function: Specifies that threads do not need to wait for the completion of the loop before proceeding, removing the implicit barrier at its end.
    - Syntax: see the sketch below.
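A sketch of `nowait` on a work-sharing loop inside a parallel region; the two loops are placeholders:

```c
#include <omp.h>

void nowait_demo(int *a, int *b, int n) {
    #pragma omp parallel
    {
        // nowait removes the implicit barrier at the end of this loop,
        // so threads move straight on to the second loop
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }

        #pragma omp for
        for (int i = 0; i < n; i++) {
            b[i] = 2 * i;
        }
    }
}
```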
These are just a few of the many clauses available in OpenMP for C++. They allow you to control the behaviour of parallel sections of code, specifying things like how many threads to use, how to handle shared data, and how to split work among threads. Remember to always check the OpenMP documentation for the most up-to-date information and for additional clauses and options.
- In Visual Studio, OpenMP support can be turned on under the project's C / C++ properties (the /openmp compiler option)