Memory Management
TL;DR:
The only functions you need to call from the outside to handle dynamic memory are:
//! Obtain gMemoryPtr to dynamic memory based on the name:
template<bool onDevice> static gMemoryPtr<onDevice> MemoryManagement::getMemAt(const std::string& name, size_t size = 0);
//! Print a summary of the current state of the dynamic memory:
static void MemoryManagement::memorySummary();
//! Retrieve or modify the content of the dynamic memory allocation:
template<class floatT> __device__ __host__ inline void MemoryAccessor::setValue(const size_t isite, const floatT value);
template<class floatT> __device__ __host__ inline void MemoryAccessor::getValue(const size_t isite, floatT &value);
//! Manipulate the gMemory object itself via gMemoryPtr->... :
void MemoryManagement::gMemory::swap(gMemoryPtr<onDevice> &src);
void MemoryManagement::gMemory::memset(int value);
template<bool onDeviceSrc> void MemoryManagement::gMemory::copyFrom(const gMemoryPtr<onDeviceSrc> &src, size_t sizeInBytes, size_t offsetSelf = 0, size_t offsetSrc = 0);
bool MemoryManagement::gMemory::adjustSize(size_t sizeBytes);
size_t MemoryManagement::gMemory::getSize() const;
...
Motivation for making our own pointer class
The memory demands of lattice calculations can grow with the lattice size in a nontrivial way, depending on the algorithm being used. To help avoid surpassing hardware limitations, we would like a convenient way to keep track of all dynamically allocated memory at any given time. Additionally, to avoid memory leaks, dynamically allocated memory should be automatically deallocated when appropriate. Since our code uses GPUs, memory allocation and deallocation must be handled through the API appropriate to the hardware. Finally, dynamic memory allocation and deallocation can take a non-negligible amount of time, so one can gain a performance boost by allowing a large chunk of temporary memory to be shared between multiple objects.
How it works
The idea of the MemoryManagement class is to have a central object that knows about, and can manipulate, all dynamic memory allocated in the code. The MemoryManagement is never explicitly instantiated; we allow only certain methods of this class to be callable by the user by making them static. It works as follows: Each instance of dynamic memory is enclosed by a gMemory object and exposed to the user through one or more corresponding gMemoryPtr objects. The content of the dynamic memory can only be accessed through the MemoryAccessor class, and the memory object itself can be manipulated through its public member functions, which are reached by dereferencing the corresponding gMemoryPtr’s.
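To make this concrete, here is a minimal sketch of writing and reading a value through MemoryAccessor, using the setValue/getValue signatures listed in the TL;DR. The names are illustrative, and how the accessor is constructed from the allocation (here via a getPointer() call on the dereferenced gMemoryPtr) is an assumption; check the class itself for the exact interface:
/// Sketch only: allocate room for 10 floats on the host and access them.
gMemoryPtr<false> mem = MemoryManagement::getMemAt<false>("AccessorExample", 10*sizeof(float));
MemoryAccessor acc(mem->getPointer()); //! assumed construction from the raw pointer
acc.setValue<float>(0, 3.14f);         //! write 3.14 to site 0
float val;
acc.getValue<float>(0, val);           //! read site 0 back into val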
A gMemory object contains the raw pointer and the size in bytes of the allocated memory, as well as wrappers for GPU functions needed to allocate memory on GPUs, and functions for copying, swapping, and resizing. From the outside you only interact with these objects via gMemoryPtr’s. The MemoryManagement class is the only thing in the code allowed to create gMemory objects directly, and we should strive not to use any other kind of dynamically allocated memory when we code. If we allocate our own dynamic memory independent of the MemoryManagement, then it does not know about it, which defeats part of the purpose.
Within the MemoryManagement, gMemory objects are stored in containers, which are implemented through std::map. Using std::map we associate to each gMemory object a name (std::string). (For those of you familiar with Python, we essentially have a dictionary where the keys are the names and the values are the gMemory objects.) There are separate containers for the device and the host. The MemoryManagement enforces that if getMemAt(name) is called a second time with the same name, the returned gMemoryPtr will point to its own gMemory, separate from the first time getMemAt(name) was called. (Hence we call this a SmartName.)
Internally the name is appended with a unique “tag” (just a number starting at zero). For some very basic examples of how this works, please read through main_memManTest.cpp in the src/testing folder and compile+run it to see the output of MemoryManagement::memorySummary().
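As a quick illustration of these naming rules (the names and the tag format in the comments are illustrative), requesting the same name twice yields two independent allocations:
gMemoryPtr<true> a = MemoryManagement::getMemAt<true>("Work"); //! internally e.g. "Work" + tag 0
gMemoryPtr<true> b = MemoryManagement::getMemAt<true>("Work"); //! a separate gMemory, e.g. "Work" + tag 1
a->adjustSize(1024);
std::cout << a->getSize() << std::endl; //! 1024
std::cout << b->getSize() << std::endl; //! 0, since b has its own gMemory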
If the name begins with “SHARED_”, no tag is appended, and getMemAt will refer to the same memory every time you call it. The dynamic memory allocation is then only changed when the size needs to be increased. Many dynamic memory allocations in the code base are shared by default (for example halo buffers and lattice containers).
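By contrast with the tagged names above, a SHARED_ name resolves to the same allocation on every call. A sketch of this documented behavior, again with illustrative names:
gMemoryPtr<true> s1 = MemoryManagement::getMemAt<true>("SHARED_buffer");
s1->adjustSize(4096);
gMemoryPtr<true> s2 = MemoryManagement::getMemAt<true>("SHARED_buffer"); //! same gMemory as s1
std::cout << s2->getSize() << std::endl; //! 4096, because the allocation is shared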
The gMemoryPtr objects let you interact with the gMemory of a specific name and can be used just like real pointers. gMemoryPtr objects are special in that they comply with all this name/container functionality. Again, you will never interact with a gMemory object directly in your code; instead you will interact with (one of) its associated gMemoryPtr’s.
The MemoryManagement keeps track of how many gMemoryPtr’s to any given gMemory are alive. Once every gMemoryPtr that points to a specific gMemory object is destroyed, the gMemory itself will be destroyed and the dynamic memory freed. In this way we don’t need to keep track of the dynamic memory ourselves, and no memory leaks should occur.
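A short sketch of this lifetime rule (the name "Scratch" is illustrative), using memorySummary() to observe the bookkeeping:
{
    gMemoryPtr<true> tmp = MemoryManagement::getMemAt<true>("Scratch");
    tmp->adjustSize(2048);
    MemoryManagement::memorySummary(); //! "Scratch" shows up in the summary
} //! tmp, the last gMemoryPtr to this gMemory, is destroyed here
MemoryManagement::memorySummary();     //! the corresponding memory has been freed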
Here is an example of how you can allocate and manipulate dynamic memory on GPUs:
/// Allocate some memory; call its pointer mem_1, and label it DescriptiveName
gMemoryPtr<true> mem_1 = MemoryManagement::getMemAt<true>("DescriptiveName");
std::cout << mem_1->getSize() << std::endl;
/// Change the size of the memory to which it points to 2024 bytes.
mem_1->adjustSize(2024); //! This calls a public method of the gMemory object
std::cout << mem_1->getSize() << std::endl;
/// Copy construct another gMemoryPtr:
gMemoryPtr<true> mem_2 = mem_1;
std::cout << mem_2->getSize() << std::endl;
The output will be as follows:
0
2024
2024
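The other gMemory member functions from the TL;DR are used the same way. For example, here is a sketch (with illustrative names) of zero-initializing device memory and copying it to the host, using the memset and copyFrom signatures listed above:
gMemoryPtr<true>  dev  = MemoryManagement::getMemAt<true>("CopyExampleDev");
gMemoryPtr<false> host = MemoryManagement::getMemAt<false>("CopyExampleHost");
dev->adjustSize(1024);
host->adjustSize(1024);
dev->memset(0);            //! set all 1024 bytes on the device to zero
host->copyFrom(dev, 1024); //! copy 1024 bytes device -> host; both offsets default to 0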