Description
Hi there,
I was running cuBERT_benchmark.py and noticed that CuBERT does not utilize all threads on a machine with multiple CPUs, even when MKL_NUM_THREADS and OMP_NUM_THREADS are set. In my case only CPU#1 is fully utilized, while CPU#2 is almost idle (see attached image). Is there a reason for this behaviour?
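For reference, here is a minimal sketch of how the thread settings and the CPU set available to the process could be checked before the benchmark runs. It is a diagnostic only and does not call CuBERT. The helper names `allowed_cpus` and `thread_settings` are hypothetical, and `os.sched_getaffinity` is only available on Linux, so the sketch falls back to `os.cpu_count()` elsewhere.

```python
import os

# Environment variables mentioned above. OpenMP/MKL runtimes typically read
# them once at initialization, so they should be set before the native
# library is loaded.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")


def allowed_cpus():
    """Return the sorted logical CPU ids this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux only: reflects restrictions from taskset, numactl, cgroups.
        return sorted(os.sched_getaffinity(0))
    # Fallback (assumption): no affinity information, assume all CPUs.
    return list(range(os.cpu_count() or 1))


def thread_settings():
    """Return the current values of the thread-count variables (or None)."""
    return {name: os.environ.get(name) for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    cpus = allowed_cpus()
    print("logical CPUs visible to the OS:", os.cpu_count())
    print("logical CPUs allowed for this process:", len(cpus), cpus)
    print("thread settings:", thread_settings())
```

If the allowed set covers only the cores of one socket, the process is pinned and the second CPU would stay idle regardless of the thread-count variables. If it covers both sockets, the cause lies elsewhere.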
For comparison, I ran TF-BERT, and it utilizes all threads on both CPUs.
Also, I am trying to use CuBERT in another application that uses multiprocessing as well. Is it possible that Python's multiprocessing is interfering with CuBERT's multi-threading? In this application CuBERT runs slower than TF-BERT and uses only some of the threads, in an irregular pattern, whereas it is faster than TF-BERT when I run the benchmark.
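To make the multiprocessing setup concrete, here is a minimal sketch of one common way to avoid oversubscription when several worker processes each run a multi-threaded native library: divide the available cores among the workers and set the thread-count variables in each worker's initializer. The helpers `threads_per_worker`, `init_worker`, and `make_pool` are hypothetical names, and nothing here is CuBERT-specific. It also assumes the native library is first loaded inside the worker. If the parent process has already initialized the OpenMP/MKL runtime before forking, changing the variables afterwards may have no effect.

```python
import os
from multiprocessing import Pool

THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")


def threads_per_worker(total_cpus, n_workers):
    """Split the cores evenly among workers, with at least one thread each."""
    return max(1, total_cpus // n_workers)


def init_worker(n_threads):
    """Pool initializer: cap the thread count inside each worker process.

    Runs once per worker, before any task. The native library should be
    imported or loaded only after this point (assumption, see above).
    """
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)


def make_pool(n_workers, total_cpus=None):
    """Create a Pool whose workers share the machine's cores."""
    total = total_cpus or os.cpu_count() or 1
    n_threads = threads_per_worker(total, n_workers)
    return Pool(processes=n_workers,
                initializer=init_worker,
                initargs=(n_threads,))
```

With 32 cores and 4 workers, for example, each worker would be limited to 8 threads, so the workers together do not request more threads than there are cores.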
Thanks for your help