Hello, thanks for the great paper.
In the ResNet version of SparK, sparse convolution and sparse batch normalization are used together. This seems to heavily restrict the flow and mixing of global semantic information: the sparse operations effectively mask part of the receptive field, and batch norm provides no interaction across channels. I would expect this information to struggle to propagate, especially in shallower networks with smaller receptive fields such as ResNet50. The paper shows empirically that ResNet50 benefited the least from SparK, failing to match the performance of supervised ResNet101. I was wondering whether the authors or anyone else have tried sparse group normalization with ResNet, so that there would be some interaction across feature channels to better support learning high-level features. Masked autoencoder pretraining has shown a lot of promise for data-limited tasks in medical imaging, and ResNet50 is commonly used by practitioners, so understanding how to use SparK pretraining most effectively has big implications for many in the field.
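To make the question concrete, here is a minimal NumPy sketch of what "sparse group normalization" could mean: statistics are computed per sample and per channel group, using only the visible (unmasked) positions, and masked positions are kept at zero. This is only an illustration, not code from the SparK repository. The function name `masked_group_norm`, the mask layout `(N, 1, H, W)` with 1 = visible, and the omission of the learnable scale and shift are all assumptions.

```python
import numpy as np


def masked_group_norm(x, mask, num_groups, eps=1e-5):
    """Group normalization over visible positions only (hypothetical helper).

    x:    float array of shape (N, C, H, W)
    mask: array of shape (N, 1, H, W), 1 = visible, 0 = masked (assumed layout)
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0, "C must be divisible by num_groups"
    per_group = c // num_groups

    # Split channels into groups: (N, G, C/G, H, W).
    xg = x.reshape(n, num_groups, per_group, h, w)
    m = mask.reshape(n, 1, 1, h, w).astype(x.dtype)

    # Number of visible elements in each (sample, group).
    count = m.sum(axis=(2, 3, 4), keepdims=True) * per_group
    count = np.maximum(count, 1.0)  # avoid division by zero if all is masked

    # Mean and variance are shared by all channels in a group,
    # which is where the cross-channel interaction comes from.
    mean = (xg * m).sum(axis=(2, 3, 4), keepdims=True) / count
    var = (((xg - mean) ** 2) * m).sum(axis=(2, 3, 4), keepdims=True) / count

    out = (xg - mean) / np.sqrt(var + eps)
    out = out * m  # keep masked positions at zero, as sparse ops would
    return out.reshape(n, c, h, w)
```

Unlike batch norm, where each channel gets its own statistics, every channel in a group here shares one mean and variance computed over the visible positions, so the normalization itself couples channels. Whether that helps SparK pretraining on ResNet50 is exactly the open question.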