Hi, thanks a lot for the great work and the clean implementation.
I have a question regarding how the temperature (logit_scale) is handled when two CLIP losses are computed within a single training step. Specifically, do you use a single shared temperature that is updated by both CLIP losses, or do you maintain separate temperature parameters for each loss?
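
To make the question concrete, here is a minimal, framework-free sketch of the two options I have in mind. It is only an illustration under my own assumptions, not code from this repository: I assume `logit_scale` is stored in log space and initialised to log(1/0.07) as in the original CLIP, and `clip_loss`, `numerical_grad`, `normalize` and the toy embeddings are hypothetical names invented for this example.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def clip_loss(img, txt, logit_scale):
    """Symmetric contrastive loss over unit-norm embeddings.
    Assumption: logit_scale is in log space, so logits are scaled by exp(logit_scale)."""
    n = len(img)
    scale = math.exp(logit_scale)
    logits = [[scale * sum(a * b for a, b in zip(img[i], txt[j]))
               for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # Mean cross-entropy where the correct class of row i is index i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_sum_exp - row[i]
        return total / n

    text_to_image = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(text_to_image))

def numerical_grad(f, x, eps=1e-5):
    """Central finite difference, standing in for autograd in this sketch."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Toy embeddings for the two CLIP losses computed in one step (made-up values).
img_a = [normalize([1.0, 0.2]), normalize([0.1, 1.0])]
txt_a = [normalize([0.9, 0.1]), normalize([0.3, 1.0])]
img_b = [normalize([1.0, 0.5]), normalize([0.4, 1.0])]
txt_b = [normalize([0.6, 1.0]), normalize([1.0, 0.3])]

init = math.log(1 / 0.07)

# Option 1: one shared temperature that receives gradients from both losses.
grad_shared = numerical_grad(
    lambda s: clip_loss(img_a, txt_a, s) + clip_loss(img_b, txt_b, s), init)

# Option 2: a separate temperature per loss, each updated only by its own loss.
grad_a = numerical_grad(lambda s: clip_loss(img_a, txt_a, s), init)
grad_b = numerical_grad(lambda s: clip_loss(img_b, txt_b, s), init)

print("shared:", grad_shared)
print("separate:", grad_a, grad_b)
```

My understanding is that in option 1 the shared parameter's gradient is the sum of the two per-loss gradients, so a single temperature is pulled by both objectives, whereas in option 2 each temperature can settle at a different value. I would like to know which of these matches your setup.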
Thanks