In your config/hrdt_pretrain.yaml file, I saw the following description:
"
dataset:
We will extract the data from raw dataset
and store them in the disk buffer by producer
When training, we will read the data
randomly from the buffer by consumer
The producer will replace the data which has been
read by the consumer with new data
The path to the buffer (at least 400GB)
buf_path: /mnt/data/yinzhenhan/buffer
The number of chunks in the buffer
"
However, the code does not seem to use this disk buffer. Instead, it appears to read the data directly from the raw dataset in real time.
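
For reference, this is roughly how I understood the buffer described in those comments. It is only a minimal Python sketch of a producer/consumer disk buffer: `DiskBuffer`, `produce`, `consume`, `stale_chunks`, and the file layout are hypothetical names of my own, not taken from your code, and the JSON chunks stand in for whatever format the real data would use.

```python
import json
import os
import random
import tempfile


class DiskBuffer:
    """Hypothetical disk buffer: a fixed number of chunk files,
    each with an optional marker file meaning 'already read'."""

    def __init__(self, buf_path, num_chunks):
        self.buf_path = buf_path
        self.num_chunks = num_chunks
        os.makedirs(buf_path, exist_ok=True)

    def _chunk_file(self, idx):
        return os.path.join(self.buf_path, f"chunk_{idx}.json")

    def _flag_file(self, idx):
        return os.path.join(self.buf_path, f"chunk_{idx}.consumed")

    def stale_chunks(self):
        # A chunk must be (re)filled if it is missing or was already read.
        return [
            i for i in range(self.num_chunks)
            if not os.path.exists(self._chunk_file(i))
            or os.path.exists(self._flag_file(i))
        ]

    def produce(self, idx, samples):
        # Producer side: write to a temp file, then rename it into place
        # so the consumer never sees a half-written chunk.
        tmp = self._chunk_file(idx) + ".tmp"
        with open(tmp, "w") as f:
            json.dump(samples, f)
        os.replace(tmp, self._chunk_file(idx))
        if os.path.exists(self._flag_file(idx)):
            os.remove(self._flag_file(idx))

    def consume(self, rng=random):
        # Consumer side: pick a random fresh chunk, read it, mark it as read.
        stale = set(self.stale_chunks())
        fresh = [i for i in range(self.num_chunks) if i not in stale]
        if not fresh:
            return None
        idx = rng.choice(fresh)
        with open(self._chunk_file(idx)) as f:
            samples = json.load(f)
        open(self._flag_file(idx), "w").close()
        return idx, samples


if __name__ == "__main__":
    buf = DiskBuffer(tempfile.mkdtemp(), num_chunks=4)
    for i in buf.stale_chunks():          # producer fills the empty buffer
        buf.produce(i, [f"sample_{i}"])
    idx, samples = buf.consume()          # consumer reads one random chunk
    buf.produce(idx, ["new_sample"])      # producer replaces the read chunk
```

With a design like this, training would only ever touch `buf_path`, which is why I expected the data loading code to reference it.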