Nanos++ Copies and Directory System
The copies and directory system of the Nanos++ runtime supports executing pieces of work on devices that have a separate address space (accelerators). The program gives the runtime information about the data used and produced when a piece of work is executed on an accelerator. This information is known as CopyData (copies), and each copy represents a part of the host's memory that will be accessed by a given task (WorkDescriptor).
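To make the idea of a copy more concrete, here is a minimal C++ sketch of how a task could describe the host memory it touches. The `CopyData` fields, the `WorkDescriptor` stand-in and its `addCopy`, `bytesIn` and `bytesOut` helpers are assumptions made for illustration only; they are not the actual Nanos++ declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one region of host memory used by a task.
struct CopyData {
    std::uintptr_t hostAddress; // start of the region in host memory
    std::size_t    size;        // length of the region in bytes
    bool           input;       // the task reads the region
    bool           output;      // the task writes the region
};

// Hypothetical stand-in for a WorkDescriptor: a task plus its copies.
struct WorkDescriptor {
    std::vector<CopyData> copies;

    void addCopy(const void* ptr, std::size_t size, bool input, bool output) {
        assert(input || output); // a copy must be read, written, or both
        copies.push_back({reinterpret_cast<std::uintptr_t>(ptr), size, input, output});
    }

    // Bytes that must travel host -> device before the task runs.
    std::size_t bytesIn() const {
        std::size_t total = 0;
        for (const CopyData& c : copies) {
            if (c.input) total += c.size;
        }
        return total;
    }

    // Bytes that must travel device -> host after the task finishes.
    std::size_t bytesOut() const {
        std::size_t total = 0;
        for (const CopyData& c : copies) {
            if (c.output) total += c.size;
        }
        return total;
    }
};
```

With a description like this, a runtime knows before the task starts how much data has to move in each direction, without inspecting the task's code.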
Each WorkDescriptor has a Directory that defines a memory scope in which its children register their copies. The Directory tracks the uses of all copies and records which caches hold each one, where the newest version is, and whether it is dirty. When the scheduler picks a WorkDescriptor for execution, its copies have to be taken to the device. This is done by the Accelerator (the ProcessingElement on which the WorkDescriptor will run): the data is allocated, the input copies are performed (from the host's memory to the device's) and, after execution, data is copied back.
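The bookkeeping a Directory performs can be sketched roughly as follows. This is a simplified model under stated assumptions: the `DirectoryEntry` layout, the `kHost` id, and the `registerAccess` and `flushToHost` methods are hypothetical names, and the real runtime may track versions and holders differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical per-region record kept by the directory.
struct DirectoryEntry {
    unsigned      version = 0;   // incremented on every write
    std::set<int> holders;       // ids of the memories holding the newest version
    bool          dirty = false; // true when the host copy is out of date
};

constexpr int kHost = -1; // assumed id for the host's own memory

class Directory {
public:
    // Record that cache `cacheId` accesses the region tagged `addr`.
    // Returns true if that cache did not already hold the newest version.
    bool registerAccess(std::uintptr_t addr, int cacheId, bool isOutput) {
        DirectoryEntry& e = entries_[addr];
        if (e.holders.empty()) e.holders.insert(kHost); // data starts on the host
        const bool wasMissing = e.holders.count(cacheId) == 0;
        if (isOutput) {
            // A write creates a new version and invalidates every other copy.
            ++e.version;
            e.holders.clear();
            e.holders.insert(cacheId);
            e.dirty = (cacheId != kHost);
        } else {
            e.holders.insert(cacheId);
        }
        return wasMissing;
    }

    // Record that the newest version was copied back to the host.
    void flushToHost(std::uintptr_t addr) {
        DirectoryEntry& e = entries_.at(addr);
        e.holders.insert(kHost);
        e.dirty = false;
    }

    const DirectoryEntry& entry(std::uintptr_t addr) const {
        return entries_.at(addr);
    }

private:
    std::map<std::uintptr_t, DirectoryEntry> entries_;
};
```

In this model a read adds the reading cache to the set of holders, while a write leaves the writer as the only holder and marks the region dirty until it is flushed to the host.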
A Device represents the architecture-dependent behavior of the accelerator. It implements an interface to allocate, free and copy data from the host to the accelerator and vice versa, which is used by the cache. Each device's memory has an associated cache (through the Accelerator) that controls all copies to the memory of the device for all WorkDescriptors executed on that Accelerator. Caches can have different policies (WriteThrough or WriteBack) and support asynchronous copies. Controlling the copies through the cache allows us to implement optimizations such as prefetching, data reuse, and overlapping computation with communication.
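The split between a Device and its cache, and the difference between the two policies, can be illustrated with a small synchronous C++ sketch. Everything here is an assumption made for the example: the `Device` interface, the `FakeDevice` that uses heap memory in place of a real accelerator, and the `DeviceCache` class with its `acquire`, `markWritten` and `flush` methods are hypothetical and do not reproduce the Nanos++ classes. Asynchronous copies are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>

// Hypothetical architecture-dependent interface: allocate, free, copy in/out.
class Device {
public:
    virtual ~Device() = default;
    virtual void* allocate(std::size_t size) = 0;
    virtual void  release(void* devPtr) = 0;
    virtual void  copyIn(void* devPtr, const void* hostPtr, std::size_t size) = 0;
    virtual void  copyOut(void* hostPtr, const void* devPtr, std::size_t size) = 0;
};

// Stand-in device that uses ordinary heap memory as "device memory".
class FakeDevice : public Device {
public:
    int copiesOut = 0; // number of device -> host transfers performed
    void* allocate(std::size_t size) override { return std::malloc(size); }
    void  release(void* p) override { std::free(p); }
    void  copyIn(void* d, const void* h, std::size_t n) override { std::memcpy(d, h, n); }
    void  copyOut(void* h, const void* d, std::size_t n) override {
        std::memcpy(h, d, n);
        ++copiesOut;
    }
};

enum class CachePolicy { WriteThrough, WriteBack };

class DeviceCache {
public:
    DeviceCache(Device& dev, CachePolicy policy) : dev_(dev), policy_(policy) {}
    ~DeviceCache() { for (auto& kv : lines_) dev_.release(kv.second.devPtr); }
    DeviceCache(const DeviceCache&) = delete;
    DeviceCache& operator=(const DeviceCache&) = delete;

    // Return the device copy of a host region, transferring it only on a miss.
    void* acquire(void* hostPtr, std::size_t size) {
        auto it = lines_.find(hostPtr);
        if (it == lines_.end()) {
            void* devPtr = dev_.allocate(size);
            assert(devPtr != nullptr);
            dev_.copyIn(devPtr, hostPtr, size);
            it = lines_.emplace(hostPtr, Line{devPtr, size, false}).first;
        }
        return it->second.devPtr;
    }

    // Called after a task has written the region on the device.
    void markWritten(void* hostPtr) {
        Line& line = lines_.at(hostPtr);
        if (policy_ == CachePolicy::WriteThrough) {
            dev_.copyOut(hostPtr, line.devPtr, line.size); // host updated at once
        } else {
            line.dirty = true; // copy deferred until flush()
        }
    }

    // Copy every dirty region back to the host.
    void flush() {
        for (auto& kv : lines_) {
            if (kv.second.dirty) {
                dev_.copyOut(kv.first, kv.second.devPtr, kv.second.size);
                kv.second.dirty = false;
            }
        }
    }

private:
    struct Line { void* devPtr; std::size_t size; bool dirty; };
    Device&               dev_;
    CachePolicy           policy_;
    std::map<void*, Line> lines_;
};
```

In this sketch, two tasks that write the same region cause two transfers to the host under WriteThrough but only one under WriteBack, once `flush` is called. A second `acquire` of a region already in the cache transfers nothing, which is the data reuse mentioned above.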
Future extensions in the Cache and Directory System will offer services to allow other optimizations like locality-aware scheduling by providing useful information to the schedulers about data location in the caches.
- lmartinell's blog