Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Defer buffer release when no host memory to be updated #6837

Merged
merged 68 commits into from
Dec 8, 2022

Conversation

KseniyaTikhomirova
Copy link
Contributor

@KseniyaTikhomirova KseniyaTikhomirova commented Sep 21, 2022

SYCL2020 4.7.2.3. Buffer synchronization rules states that "A buffer can be constructed from a range (and without a hostData pointer). The memory management for this type of buffer is entirely handled by the SYCL system. The destructor for this type of buffer does not need to block, even if work on the buffer has not completed. Instead, the SYCL system frees any storage required for the buffer asynchronously when it is no longer in use in queues."
This commit implements this behavior for sycl::buffer.

This feature introduced more resources to be released in the end of program if there was no chance to release them earlier. This commit implements WA of known issues with global object destruction based on thread_local usage, thread_local variables destroy earlier than global variables that allow us to do release resources earlier.

Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
sycl/CMakeLists.txt Outdated Show resolved Hide resolved
sycl/unittests/buffer/CMakeLists.txt Outdated Show resolved Hide resolved
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
@KseniyaTikhomirova KseniyaTikhomirova marked this pull request as ready for review October 3, 2022 20:18
@KseniyaTikhomirova KseniyaTikhomirova requested a review from a team as a code owner October 3, 2022 20:18
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
std::unique_ptr<ResourceHandler> &MObj;
static std::atomic_bool MReleaseCalled;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify why this is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it was used to handle the following case:
main thread exits and MCounter becomes equal to 0, we call release resources and start joining thread pool threads. They pass check !MCounter and MObj since scheduler is still alive and call release resources again which has no sense.
Although, after your question I think that I may sort it out without extra variable mReleaseCalled and do it like this:
if (!MIncrementCounter)
return; //no actions at all for thread pool threads
MCounter--;
if (!MCounter && MObj)
MObj->releaseResources();

Will update patch shortly.

Signed-off-by: Tikhomirova, Kseniya <[email protected]>
sycl/plugins/hip/pi_hip.cpp Outdated Show resolved Hide resolved
sycl/source/detail/event_impl.cpp Outdated Show resolved Hide resolved
Comment on lines 36 to 38
// MObj and MReleaseCalled is extra protection needed to handle case when main
// thread finished but thread_pool is still running and we will join that
// threads in releaseResources call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// MObj and MReleaseCalled is extra protection needed to handle case when main
// thread finished but thread_pool is still running and we will join that
// threads in releaseResources call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 71e9048

@@ -47,7 +80,24 @@ T &GlobalHandler::getOrCreate(InstWithLock<T> &IWL, Types... Args) {
return *IWL.Inst;
}

Scheduler &GlobalHandler::getScheduler() { return getOrCreate(MScheduler); }
void GlobalHandler::attachScheduler(Scheduler *Scheduler) {
// The method is for testing purposes. Do not protect with lock since
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The method is for testing purposes. Do not protect with lock since
// The method is used in unittests only. Do not protect with lock since

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-) fixed in 71e9048

@@ -141,9 +191,18 @@ void GlobalHandler::unloadPlugins() {
GlobalHandler::instance().getPlugins().clear();
}

void GlobalHandler::drainThreadPool() {
if (MHostTaskThreadPool.Inst)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we lock MHostTaskThreadPool.Lock here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if my understanding is wrong but I thought that lock in InstWithLock exists to protect during getOrCreate call to avoid data race for object creation. Drain call is done in releaseResources called on program exit and do not introduce any data races related to it.

@@ -444,9 +450,13 @@ class Scheduler {
const QueueImplPtr &getDefaultHostQueue() const { return DefaultHostQueue; }

static MemObjRecord *getMemObjRecord(const Requirement *const Req);
// Virtual for testing purposes only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Virtual for testing purposes only
// Virtual for testing purposes only

Is it still relevant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, fixed in 71e9048


Scheduler();
~Scheduler();
void releaseResources();
inline bool isDeferredMemObjectsEmpty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify why inline is needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 71e9048
you are right, compiler will decide

@@ -91,7 +91,7 @@ void SYCLMemObjT::updateHostMemory() {
// If we're attached to a memory record, process the deletion of the memory
// record. We may get detached before we do this.
if (MRecord)
Scheduler::getInstance().removeMemoryObject(this);
assert(Scheduler::getInstance().removeMemoryObject(this));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, do not put assert around this call because in this case removeMemoryObject is not called in the build without asserts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 71e9048

@@ -30,13 +30,16 @@ class ThreadPool {
std::mutex MJobQueueMutex;
std::condition_variable MDoSmthOrStop;
std::atomic_bool MStop;
std::atomic_uint MJobsInExecution;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::atomic_uint MJobsInExecution;
std::atomic_uint MNumOfJobs;

Or maybe MJobsInPool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 71e9048

std::unique_lock<std::mutex> Lock(MJobQueueMutex);

std::thread::id ThisThreadId = std::this_thread::get_id();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::thread::id ThisThreadId = std::this_thread::get_id();

It seems this line is not needed anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 71e9048

@@ -91,7 +102,7 @@ class ThreadPool {
std::lock_guard<std::mutex> Lock(MJobQueueMutex);
MJobQueue.emplace(Func);
}

MJobsInExecution++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the counter be incremented in the another version of submit as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, missed that we have two versions and got hang(
fixed in 06e2608

Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

assert(
Result &&
"removeMemoryObject should not return false in mem object destructor");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be a warning saying that Result is unused. And will turn into an error when building with -werror.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, fixed in ceea7f8

@KseniyaTikhomirova
Copy link
Contributor Author

HIP fails with UNRESOLVED is known issue and reported here #7634

@romanovvlad romanovvlad merged commit 894ce25 into intel:sycl Dec 8, 2022
KseniyaTikhomirova added a commit to KseniyaTikhomirova/llvm that referenced this pull request Dec 8, 2022
steffenlarsen pushed a commit that referenced this pull request Dec 8, 2022
@KseniyaTikhomirova
Copy link
Contributor Author

the second fix for post commit (win symbols) #7705

pvchupin pushed a commit that referenced this pull request Dec 8, 2022
KseniyaTikhomirova added a commit to KseniyaTikhomirova/llvm that referenced this pull request Jan 3, 2023
bader pushed a commit that referenced this pull request Jan 6, 2023
aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request Jan 29, 2024
intel#6837 enabled asynchronous
buffer destruction for buffers constructed without host data. However,
initial fallback assert implementation in
intel#3767 predates it and as such had to
place the buffer inside `queue_impl` to avoid unintended synchronization
point. I don't know if there was the same crash observed on the
end-to-end test added as part of this PR prior to
intel#3767, but it doesn't even matter
because the "new" implementation is both simpler and doesn't result in a
crash.

I suspect that without it (with the buffer for fallback assert
implementation being a data member of `sycl::queue_impl`) we had a
cyclic dependency somewhere leading to resource leak and ultimately to
the assert in `DeviceGlobalUSMMem::~DeviceGlobalUSMMem()`.
aelovikov-intel added a commit that referenced this pull request Jan 31, 2024
#6837 enabled asynchronous buffer
destruction for buffers constructed without host data. However, initial
fallback assert implementation in
#3767 predates it and as such had to
place the buffer inside `queue_impl` to avoid unintended synchronization
point. I don't know if there was the same crash observed on the
end-to-end test added as part of this PR prior to
#3767, but it doesn't even matter
because the "new" implementation is both simpler and doesn't result in a
crash.

I suspect that without it (with the buffer for fallback assert
implementation being a data member of `sycl::queue_impl`) we had a
cyclic dependency somewhere leading to resource leak and ultimately to
the assert in `DeviceGlobalUSMMem::~DeviceGlobalUSMMem()`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants