log_rank0.txt
[2024-09-24 07:03:09 ViT-B/16] (main.py 276): INFO working dir: ./
[2024-09-24 07:03:09 ViT-B/16] (main.py 280): INFO AUG:
  COLOR_JITTER: 0.8
  CUTMIX: 1.0
  GRAY_SCALE: 0.2
  LABEL_SMOOTH: 0.1
  MIXUP: 0.8
  MIXUP_SWITCH_PROB: 0.5
BASE: ['']
DATA:
  DATASET: fit101
  INPUT_SIZE: 224
  LABEL_LIST: xclip/data/videos/labels.csv
  NUM_CLASSES: 5
  NUM_FRAMES: 8
  ROOT: xclip/data/videos
  TRAIN_FILE: xclip/data/videos/train.txt
  VAL_FILE: xclip/data/videos/val.txt
LOCAL_RANK: 0
MODEL:
  ARCH: ViT-B/16
  DROP_PATH_RATE: 0.0
  FIX_TEXT: True
  PRETRAINED: None
  RESUME: None
OUTPUT: ./
PRINT_FREQ: 50
SAVE_FREQ: 10
SEED: 1024
TEST:
  NUM_CLIP: 1
  NUM_CROP: 1
  ONLY_TEST: False
TRAIN:
  ACCUMULATION_STEPS: 4
  AUTO_RESUME: False
  BATCH_SIZE: 5
  EPOCHS: 50
  LR: 2e-06
  LR_SCHEDULER: cosine
  OPTIMIZER: adamw
  OPT_LEVEL: O1
  USE_CHECKPOINT: False
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.001
[2024-09-24 07:08:04 ViT-B/16] (main.py 276): INFO working dir: ./
[2024-09-24 07:08:04 ViT-B/16] (main.py 280): INFO AUG:
  COLOR_JITTER: 0.8
  CUTMIX: 1.0
  GRAY_SCALE: 0.2
  LABEL_SMOOTH: 0.1
  MIXUP: 0.8
  MIXUP_SWITCH_PROB: 0.5
BASE: ['']
DATA:
  DATASET: fit101
  INPUT_SIZE: 224
  LABEL_LIST: data/videos/labels.csv
  NUM_CLASSES: 5
  NUM_FRAMES: 8
  ROOT: data/videos
  TRAIN_FILE: data/videos/train.txt
  VAL_FILE: data/videos/val.txt
LOCAL_RANK: 0
MODEL:
  ARCH: ViT-B/16
  DROP_PATH_RATE: 0.0
  FIX_TEXT: True
  PRETRAINED: None
  RESUME: None
OUTPUT: ./
PRINT_FREQ: 50
SAVE_FREQ: 10
SEED: 1024
TEST:
  NUM_CLIP: 1
  NUM_CROP: 1
  ONLY_TEST: False
TRAIN:
  ACCUMULATION_STEPS: 4
  AUTO_RESUME: False
  BATCH_SIZE: 5
  EPOCHS: 50
  LR: 2e-06
  LR_SCHEDULER: cosine
  OPTIMIZER: adamw
  OPT_LEVEL: O1
  USE_CHECKPOINT: False
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.001
[2024-09-24 07:16:27 ViT-B/16] (main.py 276): INFO working dir: ./
[2024-09-24 07:16:27 ViT-B/16] (main.py 280): INFO AUG:
  COLOR_JITTER: 0.8
  CUTMIX: 1.0
  GRAY_SCALE: 0.2
  LABEL_SMOOTH: 0.1
  MIXUP: 0.8
  MIXUP_SWITCH_PROB: 0.5
BASE: ['']
DATA:
  DATASET: fit101
  INPUT_SIZE: 224
  LABEL_LIST: data/videos/labels.csv
  NUM_CLASSES: 5
  NUM_FRAMES: 8
  ROOT: data/videos
  TRAIN_FILE: data/videos/train.txt
  VAL_FILE: data/videos/val.txt
LOCAL_RANK: 0
MODEL:
  ARCH: ViT-B/16
  DROP_PATH_RATE: 0.0
  FIX_TEXT: True
  PRETRAINED: None
  RESUME: None
OUTPUT: ./
PRINT_FREQ: 50
SAVE_FREQ: 10
SEED: 1024
TEST:
  NUM_CLIP: 1
  NUM_CROP: 1
  ONLY_TEST: False
TRAIN:
  ACCUMULATION_STEPS: 4
  AUTO_RESUME: False
  BATCH_SIZE: 5
  EPOCHS: 50
  LR: 2e-06
  LR_SCHEDULER: cosine
  OPTIMIZER: adamw
  OPT_LEVEL: O1
  USE_CHECKPOINT: False
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.001
[2024-09-24 07:16:31 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0135, -0.0061, -0.0123, ..., -0.0247, -0.0076, 0.0007],
[ 0.0461, -0.0247, 0.0471, ..., -0.0038, 0.0997, -0.0373],
[-0.0315, -0.0412, -0.0571, ..., -0.0540, 0.0156, -0.0402],
...,
[-0.0654, 0.0093, -0.0926, ..., -0.0535, 0.0033, -0.0565],
[-0.0405, -0.0076, 0.0250, ..., 0.0361, -0.0558, -0.0060],
[ 0.0184, -0.0079, 0.0142, ..., 0.0654, -0.0140, -0.0554]],
requires_grad=True)
[2024-09-24 07:16:31 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0949, 0.0449, 0.0531, ..., -0.0693, -0.0635, 0.0004],
[-0.0219, 0.0381, 0.0310, ..., 0.0187, 0.0006, 0.0336],
[-0.1189, -0.1230, -0.0177, ..., -0.0302, -0.0377, 0.0074],
...,
[-0.0280, 0.0341, -0.0259, ..., -0.0814, 0.0661, 0.0119],
[-0.0408, 0.0081, 0.0331, ..., -0.0055, -0.0620, -0.0524],
[ 0.0272, 0.0447, -0.0829, ..., 0.0094, 0.0563, -0.0323]],
requires_grad=True)
[2024-09-24 07:16:31 ViT-B/16] (xclip.py 215): INFO load pretrained CLIP: _IncompatibleKeys(missing_keys=['prompts_visual_proj', 'prompts_generator.alpha', 'prompts_generator.norm.weight', 'prompts_generator.norm.bias', 'prompts_generator.decoder.0.cross_attn.q_proj.weight', 'prompts_generator.decoder.0.cross_attn.k_proj.weight', 'prompts_generator.decoder.0.cross_attn.v_proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.bias', 'prompts_generator.decoder.0.norm1.weight', 'prompts_generator.decoder.0.norm1.bias', 'prompts_generator.decoder.0.norm3.weight', 'prompts_generator.decoder.0.norm3.bias', 'prompts_generator.decoder.0.mlp.0.weight', 'prompts_generator.decoder.0.mlp.0.bias', 'prompts_generator.decoder.0.mlp.3.weight', 'prompts_generator.decoder.0.mlp.3.bias', 'prompts_generator.decoder.1.cross_attn.q_proj.weight', 'prompts_generator.decoder.1.cross_attn.k_proj.weight', 'prompts_generator.decoder.1.cross_attn.v_proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.bias', 'prompts_generator.decoder.1.norm1.weight', 'prompts_generator.decoder.1.norm1.bias', 'prompts_generator.decoder.1.norm3.weight', 'prompts_generator.decoder.1.norm3.bias', 'prompts_generator.decoder.1.mlp.0.weight', 'prompts_generator.decoder.1.mlp.0.bias', 'prompts_generator.decoder.1.mlp.3.weight', 'prompts_generator.decoder.1.mlp.3.bias', 'mit.positional_embedding', 'mit.resblocks.0.attn.in_proj_weight', 'mit.resblocks.0.attn.in_proj_bias', 'mit.resblocks.0.attn.out_proj.weight', 'mit.resblocks.0.attn.out_proj.bias', 'mit.resblocks.0.ln_1.weight', 'mit.resblocks.0.ln_1.bias', 'mit.resblocks.0.mlp.c_fc.weight', 'mit.resblocks.0.mlp.c_fc.bias', 'mit.resblocks.0.mlp.c_proj.weight', 'mit.resblocks.0.mlp.c_proj.bias', 'mit.resblocks.0.ln_2.weight', 'mit.resblocks.0.ln_2.bias', 'visual.transformer.resblocks.0.message_fc.weight', 'visual.transformer.resblocks.0.message_fc.bias', 
'visual.transformer.resblocks.0.message_ln.weight', 'visual.transformer.resblocks.0.message_ln.bias', 'visual.transformer.resblocks.0.message_attn.in_proj_weight', 'visual.transformer.resblocks.0.message_attn.in_proj_bias', 'visual.transformer.resblocks.0.message_attn.out_proj.weight', 'visual.transformer.resblocks.0.message_attn.out_proj.bias', 'visual.transformer.resblocks.1.message_fc.weight', 'visual.transformer.resblocks.1.message_fc.bias', 'visual.transformer.resblocks.1.message_ln.weight', 'visual.transformer.resblocks.1.message_ln.bias', 'visual.transformer.resblocks.1.message_attn.in_proj_weight', 'visual.transformer.resblocks.1.message_attn.in_proj_bias', 'visual.transformer.resblocks.1.message_attn.out_proj.weight', 'visual.transformer.resblocks.1.message_attn.out_proj.bias', 'visual.transformer.resblocks.2.message_fc.weight', 'visual.transformer.resblocks.2.message_fc.bias', 'visual.transformer.resblocks.2.message_ln.weight', 'visual.transformer.resblocks.2.message_ln.bias', 'visual.transformer.resblocks.2.message_attn.in_proj_weight', 'visual.transformer.resblocks.2.message_attn.in_proj_bias', 'visual.transformer.resblocks.2.message_attn.out_proj.weight', 'visual.transformer.resblocks.2.message_attn.out_proj.bias', 'visual.transformer.resblocks.3.message_fc.weight', 'visual.transformer.resblocks.3.message_fc.bias', 'visual.transformer.resblocks.3.message_ln.weight', 'visual.transformer.resblocks.3.message_ln.bias', 'visual.transformer.resblocks.3.message_attn.in_proj_weight', 'visual.transformer.resblocks.3.message_attn.in_proj_bias', 'visual.transformer.resblocks.3.message_attn.out_proj.weight', 'visual.transformer.resblocks.3.message_attn.out_proj.bias', 'visual.transformer.resblocks.4.message_fc.weight', 'visual.transformer.resblocks.4.message_fc.bias', 'visual.transformer.resblocks.4.message_ln.weight', 'visual.transformer.resblocks.4.message_ln.bias', 'visual.transformer.resblocks.4.message_attn.in_proj_weight', 
'visual.transformer.resblocks.4.message_attn.in_proj_bias', 'visual.transformer.resblocks.4.message_attn.out_proj.weight', 'visual.transformer.resblocks.4.message_attn.out_proj.bias', 'visual.transformer.resblocks.5.message_fc.weight', 'visual.transformer.resblocks.5.message_fc.bias', 'visual.transformer.resblocks.5.message_ln.weight', 'visual.transformer.resblocks.5.message_ln.bias', 'visual.transformer.resblocks.5.message_attn.in_proj_weight', 'visual.transformer.resblocks.5.message_attn.in_proj_bias', 'visual.transformer.resblocks.5.message_attn.out_proj.weight', 'visual.transformer.resblocks.5.message_attn.out_proj.bias', 'visual.transformer.resblocks.6.message_fc.weight', 'visual.transformer.resblocks.6.message_fc.bias', 'visual.transformer.resblocks.6.message_ln.weight', 'visual.transformer.resblocks.6.message_ln.bias', 'visual.transformer.resblocks.6.message_attn.in_proj_weight', 'visual.transformer.resblocks.6.message_attn.in_proj_bias', 'visual.transformer.resblocks.6.message_attn.out_proj.weight', 'visual.transformer.resblocks.6.message_attn.out_proj.bias', 'visual.transformer.resblocks.7.message_fc.weight', 'visual.transformer.resblocks.7.message_fc.bias', 'visual.transformer.resblocks.7.message_ln.weight', 'visual.transformer.resblocks.7.message_ln.bias', 'visual.transformer.resblocks.7.message_attn.in_proj_weight', 'visual.transformer.resblocks.7.message_attn.in_proj_bias', 'visual.transformer.resblocks.7.message_attn.out_proj.weight', 'visual.transformer.resblocks.7.message_attn.out_proj.bias', 'visual.transformer.resblocks.8.message_fc.weight', 'visual.transformer.resblocks.8.message_fc.bias', 'visual.transformer.resblocks.8.message_ln.weight', 'visual.transformer.resblocks.8.message_ln.bias', 'visual.transformer.resblocks.8.message_attn.in_proj_weight', 'visual.transformer.resblocks.8.message_attn.in_proj_bias', 'visual.transformer.resblocks.8.message_attn.out_proj.weight', 'visual.transformer.resblocks.8.message_attn.out_proj.bias', 
'visual.transformer.resblocks.9.message_fc.weight', 'visual.transformer.resblocks.9.message_fc.bias', 'visual.transformer.resblocks.9.message_ln.weight', 'visual.transformer.resblocks.9.message_ln.bias', 'visual.transformer.resblocks.9.message_attn.in_proj_weight', 'visual.transformer.resblocks.9.message_attn.in_proj_bias', 'visual.transformer.resblocks.9.message_attn.out_proj.weight', 'visual.transformer.resblocks.9.message_attn.out_proj.bias', 'visual.transformer.resblocks.10.message_fc.weight', 'visual.transformer.resblocks.10.message_fc.bias', 'visual.transformer.resblocks.10.message_ln.weight', 'visual.transformer.resblocks.10.message_ln.bias', 'visual.transformer.resblocks.10.message_attn.in_proj_weight', 'visual.transformer.resblocks.10.message_attn.in_proj_bias', 'visual.transformer.resblocks.10.message_attn.out_proj.weight', 'visual.transformer.resblocks.10.message_attn.out_proj.bias', 'visual.transformer.resblocks.11.message_fc.weight', 'visual.transformer.resblocks.11.message_fc.bias', 'visual.transformer.resblocks.11.message_ln.weight', 'visual.transformer.resblocks.11.message_ln.bias', 'visual.transformer.resblocks.11.message_attn.in_proj_weight', 'visual.transformer.resblocks.11.message_attn.in_proj_bias', 'visual.transformer.resblocks.11.message_attn.out_proj.weight', 'visual.transformer.resblocks.11.message_attn.out_proj.bias', 'prompts_visual_ln.weight', 'prompts_visual_ln.bias'], unexpected_keys=[])
[2024-09-24 07:33:58 ViT-B/16] (main.py 276): INFO working dir: ./
[2024-09-24 07:33:58 ViT-B/16] (main.py 280): INFO AUG:
  COLOR_JITTER: 0.8
  CUTMIX: 1.0
  GRAY_SCALE: 0.2
  LABEL_SMOOTH: 0.1
  MIXUP: 0.8
  MIXUP_SWITCH_PROB: 0.5
BASE: ['']
DATA:
  DATASET: fit101
  INPUT_SIZE: 224
  LABEL_LIST: data/videos/labels.csv
  NUM_CLASSES: 5
  NUM_FRAMES: 8
  ROOT: data/videos
  TRAIN_FILE: data/videos/train.txt
  VAL_FILE: data/videos/val.txt
LOCAL_RANK: 0
MODEL:
  ARCH: ViT-B/16
  DROP_PATH_RATE: 0.0
  FIX_TEXT: True
  PRETRAINED: None
  RESUME: None
OUTPUT: ./
PRINT_FREQ: 50
SAVE_FREQ: 10
SEED: 1024
TEST:
  NUM_CLIP: 1
  NUM_CROP: 1
  ONLY_TEST: False
TRAIN:
  ACCUMULATION_STEPS: 4
  AUTO_RESUME: False
  BATCH_SIZE: 5
  EPOCHS: 50
  LR: 2e-06
  LR_SCHEDULER: cosine
  OPTIMIZER: adamw
  OPT_LEVEL: O1
  USE_CHECKPOINT: False
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.001
[2024-09-24 07:34:02 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0135, -0.0061, -0.0123, ..., -0.0247, -0.0076, 0.0007],
[ 0.0461, -0.0247, 0.0471, ..., -0.0038, 0.0997, -0.0373],
[-0.0315, -0.0412, -0.0571, ..., -0.0540, 0.0156, -0.0402],
...,
[-0.0654, 0.0093, -0.0926, ..., -0.0535, 0.0033, -0.0565],
[-0.0405, -0.0076, 0.0250, ..., 0.0361, -0.0558, -0.0060],
[ 0.0184, -0.0079, 0.0142, ..., 0.0654, -0.0140, -0.0554]],
requires_grad=True)
[2024-09-24 07:34:02 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0949, 0.0449, 0.0531, ..., -0.0693, -0.0635, 0.0004],
[-0.0219, 0.0381, 0.0310, ..., 0.0187, 0.0006, 0.0336],
[-0.1189, -0.1230, -0.0177, ..., -0.0302, -0.0377, 0.0074],
...,
[-0.0280, 0.0341, -0.0259, ..., -0.0814, 0.0661, 0.0119],
[-0.0408, 0.0081, 0.0331, ..., -0.0055, -0.0620, -0.0524],
[ 0.0272, 0.0447, -0.0829, ..., 0.0094, 0.0563, -0.0323]],
requires_grad=True)
[2024-09-24 07:34:02 ViT-B/16] (xclip.py 215): INFO load pretrained CLIP: _IncompatibleKeys(missing_keys=['prompts_visual_proj', 'prompts_generator.alpha', 'prompts_generator.norm.weight', 'prompts_generator.norm.bias', 'prompts_generator.decoder.0.cross_attn.q_proj.weight', 'prompts_generator.decoder.0.cross_attn.k_proj.weight', 'prompts_generator.decoder.0.cross_attn.v_proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.bias', 'prompts_generator.decoder.0.norm1.weight', 'prompts_generator.decoder.0.norm1.bias', 'prompts_generator.decoder.0.norm3.weight', 'prompts_generator.decoder.0.norm3.bias', 'prompts_generator.decoder.0.mlp.0.weight', 'prompts_generator.decoder.0.mlp.0.bias', 'prompts_generator.decoder.0.mlp.3.weight', 'prompts_generator.decoder.0.mlp.3.bias', 'prompts_generator.decoder.1.cross_attn.q_proj.weight', 'prompts_generator.decoder.1.cross_attn.k_proj.weight', 'prompts_generator.decoder.1.cross_attn.v_proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.bias', 'prompts_generator.decoder.1.norm1.weight', 'prompts_generator.decoder.1.norm1.bias', 'prompts_generator.decoder.1.norm3.weight', 'prompts_generator.decoder.1.norm3.bias', 'prompts_generator.decoder.1.mlp.0.weight', 'prompts_generator.decoder.1.mlp.0.bias', 'prompts_generator.decoder.1.mlp.3.weight', 'prompts_generator.decoder.1.mlp.3.bias', 'mit.positional_embedding', 'mit.resblocks.0.attn.in_proj_weight', 'mit.resblocks.0.attn.in_proj_bias', 'mit.resblocks.0.attn.out_proj.weight', 'mit.resblocks.0.attn.out_proj.bias', 'mit.resblocks.0.ln_1.weight', 'mit.resblocks.0.ln_1.bias', 'mit.resblocks.0.mlp.c_fc.weight', 'mit.resblocks.0.mlp.c_fc.bias', 'mit.resblocks.0.mlp.c_proj.weight', 'mit.resblocks.0.mlp.c_proj.bias', 'mit.resblocks.0.ln_2.weight', 'mit.resblocks.0.ln_2.bias', 'visual.transformer.resblocks.0.message_fc.weight', 'visual.transformer.resblocks.0.message_fc.bias', 
'visual.transformer.resblocks.0.message_ln.weight', 'visual.transformer.resblocks.0.message_ln.bias', 'visual.transformer.resblocks.0.message_attn.in_proj_weight', 'visual.transformer.resblocks.0.message_attn.in_proj_bias', 'visual.transformer.resblocks.0.message_attn.out_proj.weight', 'visual.transformer.resblocks.0.message_attn.out_proj.bias', 'visual.transformer.resblocks.1.message_fc.weight', 'visual.transformer.resblocks.1.message_fc.bias', 'visual.transformer.resblocks.1.message_ln.weight', 'visual.transformer.resblocks.1.message_ln.bias', 'visual.transformer.resblocks.1.message_attn.in_proj_weight', 'visual.transformer.resblocks.1.message_attn.in_proj_bias', 'visual.transformer.resblocks.1.message_attn.out_proj.weight', 'visual.transformer.resblocks.1.message_attn.out_proj.bias', 'visual.transformer.resblocks.2.message_fc.weight', 'visual.transformer.resblocks.2.message_fc.bias', 'visual.transformer.resblocks.2.message_ln.weight', 'visual.transformer.resblocks.2.message_ln.bias', 'visual.transformer.resblocks.2.message_attn.in_proj_weight', 'visual.transformer.resblocks.2.message_attn.in_proj_bias', 'visual.transformer.resblocks.2.message_attn.out_proj.weight', 'visual.transformer.resblocks.2.message_attn.out_proj.bias', 'visual.transformer.resblocks.3.message_fc.weight', 'visual.transformer.resblocks.3.message_fc.bias', 'visual.transformer.resblocks.3.message_ln.weight', 'visual.transformer.resblocks.3.message_ln.bias', 'visual.transformer.resblocks.3.message_attn.in_proj_weight', 'visual.transformer.resblocks.3.message_attn.in_proj_bias', 'visual.transformer.resblocks.3.message_attn.out_proj.weight', 'visual.transformer.resblocks.3.message_attn.out_proj.bias', 'visual.transformer.resblocks.4.message_fc.weight', 'visual.transformer.resblocks.4.message_fc.bias', 'visual.transformer.resblocks.4.message_ln.weight', 'visual.transformer.resblocks.4.message_ln.bias', 'visual.transformer.resblocks.4.message_attn.in_proj_weight', 
'visual.transformer.resblocks.4.message_attn.in_proj_bias', 'visual.transformer.resblocks.4.message_attn.out_proj.weight', 'visual.transformer.resblocks.4.message_attn.out_proj.bias', 'visual.transformer.resblocks.5.message_fc.weight', 'visual.transformer.resblocks.5.message_fc.bias', 'visual.transformer.resblocks.5.message_ln.weight', 'visual.transformer.resblocks.5.message_ln.bias', 'visual.transformer.resblocks.5.message_attn.in_proj_weight', 'visual.transformer.resblocks.5.message_attn.in_proj_bias', 'visual.transformer.resblocks.5.message_attn.out_proj.weight', 'visual.transformer.resblocks.5.message_attn.out_proj.bias', 'visual.transformer.resblocks.6.message_fc.weight', 'visual.transformer.resblocks.6.message_fc.bias', 'visual.transformer.resblocks.6.message_ln.weight', 'visual.transformer.resblocks.6.message_ln.bias', 'visual.transformer.resblocks.6.message_attn.in_proj_weight', 'visual.transformer.resblocks.6.message_attn.in_proj_bias', 'visual.transformer.resblocks.6.message_attn.out_proj.weight', 'visual.transformer.resblocks.6.message_attn.out_proj.bias', 'visual.transformer.resblocks.7.message_fc.weight', 'visual.transformer.resblocks.7.message_fc.bias', 'visual.transformer.resblocks.7.message_ln.weight', 'visual.transformer.resblocks.7.message_ln.bias', 'visual.transformer.resblocks.7.message_attn.in_proj_weight', 'visual.transformer.resblocks.7.message_attn.in_proj_bias', 'visual.transformer.resblocks.7.message_attn.out_proj.weight', 'visual.transformer.resblocks.7.message_attn.out_proj.bias', 'visual.transformer.resblocks.8.message_fc.weight', 'visual.transformer.resblocks.8.message_fc.bias', 'visual.transformer.resblocks.8.message_ln.weight', 'visual.transformer.resblocks.8.message_ln.bias', 'visual.transformer.resblocks.8.message_attn.in_proj_weight', 'visual.transformer.resblocks.8.message_attn.in_proj_bias', 'visual.transformer.resblocks.8.message_attn.out_proj.weight', 'visual.transformer.resblocks.8.message_attn.out_proj.bias', 
'visual.transformer.resblocks.9.message_fc.weight', 'visual.transformer.resblocks.9.message_fc.bias', 'visual.transformer.resblocks.9.message_ln.weight', 'visual.transformer.resblocks.9.message_ln.bias', 'visual.transformer.resblocks.9.message_attn.in_proj_weight', 'visual.transformer.resblocks.9.message_attn.in_proj_bias', 'visual.transformer.resblocks.9.message_attn.out_proj.weight', 'visual.transformer.resblocks.9.message_attn.out_proj.bias', 'visual.transformer.resblocks.10.message_fc.weight', 'visual.transformer.resblocks.10.message_fc.bias', 'visual.transformer.resblocks.10.message_ln.weight', 'visual.transformer.resblocks.10.message_ln.bias', 'visual.transformer.resblocks.10.message_attn.in_proj_weight', 'visual.transformer.resblocks.10.message_attn.in_proj_bias', 'visual.transformer.resblocks.10.message_attn.out_proj.weight', 'visual.transformer.resblocks.10.message_attn.out_proj.bias', 'visual.transformer.resblocks.11.message_fc.weight', 'visual.transformer.resblocks.11.message_fc.bias', 'visual.transformer.resblocks.11.message_ln.weight', 'visual.transformer.resblocks.11.message_ln.bias', 'visual.transformer.resblocks.11.message_attn.in_proj_weight', 'visual.transformer.resblocks.11.message_attn.in_proj_bias', 'visual.transformer.resblocks.11.message_attn.out_proj.weight', 'visual.transformer.resblocks.11.message_attn.out_proj.bias', 'prompts_visual_ln.weight', 'prompts_visual_ln.bias'], unexpected_keys=[])
[2024-09-24 09:35:18 ViT-B/16] (main.py 280): INFO working dir: ./
[2024-09-24 09:35:18 ViT-B/16] (main.py 284): INFO AUG:
  COLOR_JITTER: 0.8
  CUTMIX: 1.0
  GRAY_SCALE: 0.2
  LABEL_SMOOTH: 0.1
  MIXUP: 0.8
  MIXUP_SWITCH_PROB: 0.5
BASE: ['']
DATA:
  DATASET: fit101
  INPUT_SIZE: 224
  LABEL_LIST: data/videos/labels.csv
  NUM_CLASSES: 5
  NUM_FRAMES: 8
  ROOT: data/videos
  TRAIN_FILE: data/videos/train.txt
  VAL_FILE: data/videos/val.txt
LOCAL_RANK: 0
MODEL:
  ARCH: ViT-B/16
  DROP_PATH_RATE: 0.0
  FIX_TEXT: True
  PRETRAINED: None
  RESUME: None
OUTPUT: ./
PRINT_FREQ: 50
SAVE_FREQ: 10
SEED: 1024
TEST:
  NUM_CLIP: 1
  NUM_CROP: 1
  ONLY_TEST: False
TRAIN:
  ACCUMULATION_STEPS: 4
  AUTO_RESUME: False
  BATCH_SIZE: 5
  EPOCHS: 50
  LR: 2e-06
  LR_SCHEDULER: cosine
  OPTIMIZER: adamw
  OPT_LEVEL: O1
  USE_CHECKPOINT: False
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.001
[2024-09-24 09:35:22 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0135, -0.0061, -0.0123, ..., -0.0247, -0.0076, 0.0007],
[ 0.0461, -0.0247, 0.0471, ..., -0.0038, 0.0997, -0.0373],
[-0.0315, -0.0412, -0.0571, ..., -0.0540, 0.0156, -0.0402],
...,
[-0.0654, 0.0093, -0.0926, ..., -0.0535, 0.0033, -0.0565],
[-0.0405, -0.0076, 0.0250, ..., 0.0361, -0.0558, -0.0060],
[ 0.0184, -0.0079, 0.0142, ..., 0.0654, -0.0140, -0.0554]],
requires_grad=True)
[2024-09-24 09:35:22 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0949, 0.0449, 0.0531, ..., -0.0693, -0.0635, 0.0004],
[-0.0219, 0.0381, 0.0310, ..., 0.0187, 0.0006, 0.0336],
[-0.1189, -0.1230, -0.0177, ..., -0.0302, -0.0377, 0.0074],
...,
[-0.0280, 0.0341, -0.0259, ..., -0.0814, 0.0661, 0.0119],
[-0.0408, 0.0081, 0.0331, ..., -0.0055, -0.0620, -0.0524],
[ 0.0272, 0.0447, -0.0829, ..., 0.0094, 0.0563, -0.0323]],
requires_grad=True)
[2024-09-24 09:35:22 ViT-B/16] (xclip.py 215): INFO load pretrained CLIP: _IncompatibleKeys(missing_keys=['prompts_visual_proj', 'prompts_generator.alpha', 'prompts_generator.norm.weight', 'prompts_generator.norm.bias', 'prompts_generator.decoder.0.cross_attn.q_proj.weight', 'prompts_generator.decoder.0.cross_attn.k_proj.weight', 'prompts_generator.decoder.0.cross_attn.v_proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.bias', 'prompts_generator.decoder.0.norm1.weight', 'prompts_generator.decoder.0.norm1.bias', 'prompts_generator.decoder.0.norm3.weight', 'prompts_generator.decoder.0.norm3.bias', 'prompts_generator.decoder.0.mlp.0.weight', 'prompts_generator.decoder.0.mlp.0.bias', 'prompts_generator.decoder.0.mlp.3.weight', 'prompts_generator.decoder.0.mlp.3.bias', 'prompts_generator.decoder.1.cross_attn.q_proj.weight', 'prompts_generator.decoder.1.cross_attn.k_proj.weight', 'prompts_generator.decoder.1.cross_attn.v_proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.bias', 'prompts_generator.decoder.1.norm1.weight', 'prompts_generator.decoder.1.norm1.bias', 'prompts_generator.decoder.1.norm3.weight', 'prompts_generator.decoder.1.norm3.bias', 'prompts_generator.decoder.1.mlp.0.weight', 'prompts_generator.decoder.1.mlp.0.bias', 'prompts_generator.decoder.1.mlp.3.weight', 'prompts_generator.decoder.1.mlp.3.bias', 'mit.positional_embedding', 'mit.resblocks.0.attn.in_proj_weight', 'mit.resblocks.0.attn.in_proj_bias', 'mit.resblocks.0.attn.out_proj.weight', 'mit.resblocks.0.attn.out_proj.bias', 'mit.resblocks.0.ln_1.weight', 'mit.resblocks.0.ln_1.bias', 'mit.resblocks.0.mlp.c_fc.weight', 'mit.resblocks.0.mlp.c_fc.bias', 'mit.resblocks.0.mlp.c_proj.weight', 'mit.resblocks.0.mlp.c_proj.bias', 'mit.resblocks.0.ln_2.weight', 'mit.resblocks.0.ln_2.bias', 'visual.transformer.resblocks.0.message_fc.weight', 'visual.transformer.resblocks.0.message_fc.bias', 
'visual.transformer.resblocks.0.message_ln.weight', 'visual.transformer.resblocks.0.message_ln.bias', 'visual.transformer.resblocks.0.message_attn.in_proj_weight', 'visual.transformer.resblocks.0.message_attn.in_proj_bias', 'visual.transformer.resblocks.0.message_attn.out_proj.weight', 'visual.transformer.resblocks.0.message_attn.out_proj.bias', 'visual.transformer.resblocks.1.message_fc.weight', 'visual.transformer.resblocks.1.message_fc.bias', 'visual.transformer.resblocks.1.message_ln.weight', 'visual.transformer.resblocks.1.message_ln.bias', 'visual.transformer.resblocks.1.message_attn.in_proj_weight', 'visual.transformer.resblocks.1.message_attn.in_proj_bias', 'visual.transformer.resblocks.1.message_attn.out_proj.weight', 'visual.transformer.resblocks.1.message_attn.out_proj.bias', 'visual.transformer.resblocks.2.message_fc.weight', 'visual.transformer.resblocks.2.message_fc.bias', 'visual.transformer.resblocks.2.message_ln.weight', 'visual.transformer.resblocks.2.message_ln.bias', 'visual.transformer.resblocks.2.message_attn.in_proj_weight', 'visual.transformer.resblocks.2.message_attn.in_proj_bias', 'visual.transformer.resblocks.2.message_attn.out_proj.weight', 'visual.transformer.resblocks.2.message_attn.out_proj.bias', 'visual.transformer.resblocks.3.message_fc.weight', 'visual.transformer.resblocks.3.message_fc.bias', 'visual.transformer.resblocks.3.message_ln.weight', 'visual.transformer.resblocks.3.message_ln.bias', 'visual.transformer.resblocks.3.message_attn.in_proj_weight', 'visual.transformer.resblocks.3.message_attn.in_proj_bias', 'visual.transformer.resblocks.3.message_attn.out_proj.weight', 'visual.transformer.resblocks.3.message_attn.out_proj.bias', 'visual.transformer.resblocks.4.message_fc.weight', 'visual.transformer.resblocks.4.message_fc.bias', 'visual.transformer.resblocks.4.message_ln.weight', 'visual.transformer.resblocks.4.message_ln.bias', 'visual.transformer.resblocks.4.message_attn.in_proj_weight', 
'visual.transformer.resblocks.4.message_attn.in_proj_bias', 'visual.transformer.resblocks.4.message_attn.out_proj.weight', 'visual.transformer.resblocks.4.message_attn.out_proj.bias', 'visual.transformer.resblocks.5.message_fc.weight', 'visual.transformer.resblocks.5.message_fc.bias', 'visual.transformer.resblocks.5.message_ln.weight', 'visual.transformer.resblocks.5.message_ln.bias', 'visual.transformer.resblocks.5.message_attn.in_proj_weight', 'visual.transformer.resblocks.5.message_attn.in_proj_bias', 'visual.transformer.resblocks.5.message_attn.out_proj.weight', 'visual.transformer.resblocks.5.message_attn.out_proj.bias', 'visual.transformer.resblocks.6.message_fc.weight', 'visual.transformer.resblocks.6.message_fc.bias', 'visual.transformer.resblocks.6.message_ln.weight', 'visual.transformer.resblocks.6.message_ln.bias', 'visual.transformer.resblocks.6.message_attn.in_proj_weight', 'visual.transformer.resblocks.6.message_attn.in_proj_bias', 'visual.transformer.resblocks.6.message_attn.out_proj.weight', 'visual.transformer.resblocks.6.message_attn.out_proj.bias', 'visual.transformer.resblocks.7.message_fc.weight', 'visual.transformer.resblocks.7.message_fc.bias', 'visual.transformer.resblocks.7.message_ln.weight', 'visual.transformer.resblocks.7.message_ln.bias', 'visual.transformer.resblocks.7.message_attn.in_proj_weight', 'visual.transformer.resblocks.7.message_attn.in_proj_bias', 'visual.transformer.resblocks.7.message_attn.out_proj.weight', 'visual.transformer.resblocks.7.message_attn.out_proj.bias', 'visual.transformer.resblocks.8.message_fc.weight', 'visual.transformer.resblocks.8.message_fc.bias', 'visual.transformer.resblocks.8.message_ln.weight', 'visual.transformer.resblocks.8.message_ln.bias', 'visual.transformer.resblocks.8.message_attn.in_proj_weight', 'visual.transformer.resblocks.8.message_attn.in_proj_bias', 'visual.transformer.resblocks.8.message_attn.out_proj.weight', 'visual.transformer.resblocks.8.message_attn.out_proj.bias', 
'visual.transformer.resblocks.9.message_fc.weight', 'visual.transformer.resblocks.9.message_fc.bias', 'visual.transformer.resblocks.9.message_ln.weight', 'visual.transformer.resblocks.9.message_ln.bias', 'visual.transformer.resblocks.9.message_attn.in_proj_weight', 'visual.transformer.resblocks.9.message_attn.in_proj_bias', 'visual.transformer.resblocks.9.message_attn.out_proj.weight', 'visual.transformer.resblocks.9.message_attn.out_proj.bias', 'visual.transformer.resblocks.10.message_fc.weight', 'visual.transformer.resblocks.10.message_fc.bias', 'visual.transformer.resblocks.10.message_ln.weight', 'visual.transformer.resblocks.10.message_ln.bias', 'visual.transformer.resblocks.10.message_attn.in_proj_weight', 'visual.transformer.resblocks.10.message_attn.in_proj_bias', 'visual.transformer.resblocks.10.message_attn.out_proj.weight', 'visual.transformer.resblocks.10.message_attn.out_proj.bias', 'visual.transformer.resblocks.11.message_fc.weight', 'visual.transformer.resblocks.11.message_fc.bias', 'visual.transformer.resblocks.11.message_ln.weight', 'visual.transformer.resblocks.11.message_ln.bias', 'visual.transformer.resblocks.11.message_attn.in_proj_weight', 'visual.transformer.resblocks.11.message_attn.in_proj_bias', 'visual.transformer.resblocks.11.message_attn.out_proj.weight', 'visual.transformer.resblocks.11.message_attn.out_proj.bias', 'prompts_visual_ln.weight', 'prompts_visual_ln.bias'], unexpected_keys=[])
[2024-09-24 09:36:48 ViT-B/16] (main.py 188): INFO Train: [0/50][0/6] eta 0:08:30 lr 0.000000000 time 85.0619 (85.0619) tot_loss 0.4772 (0.4772) mem 0MB
[2024-09-24 09:41:29 ViT-B/16] (main.py 280): INFO working dir: ./
[2024-09-24 09:41:29 ViT-B/16] (main.py 284): INFO AUG:
  COLOR_JITTER: 0.8
  CUTMIX: 1.0
  GRAY_SCALE: 0.2
  LABEL_SMOOTH: 0.1
  MIXUP: 0.8
  MIXUP_SWITCH_PROB: 0.5
BASE: ['']
DATA:
  DATASET: fit101
  INPUT_SIZE: 224
  LABEL_LIST: data/videos/labels.csv
  NUM_CLASSES: 5
  NUM_FRAMES: 8
  ROOT: data/videos
  TRAIN_FILE: data/videos/train.txt
  VAL_FILE: data/videos/val.txt
LOCAL_RANK: 0
MODEL:
  ARCH: ViT-B/16
  DROP_PATH_RATE: 0.0
  FIX_TEXT: True
  PRETRAINED: None
  RESUME: None
OUTPUT: ./
PRINT_FREQ: 50
SAVE_FREQ: 10
SEED: 1024
TEST:
  NUM_CLIP: 1
  NUM_CROP: 1
  ONLY_TEST: False
TRAIN:
  ACCUMULATION_STEPS: 4
  AUTO_RESUME: False
  BATCH_SIZE: 5
  EPOCHS: 50
  LR: 2e-06
  LR_SCHEDULER: cosine
  OPTIMIZER: adamw
  OPT_LEVEL: O1
  USE_CHECKPOINT: False
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.001
[2024-09-24 09:41:32 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0135, -0.0061, -0.0123, ..., -0.0247, -0.0076, 0.0007],
[ 0.0461, -0.0247, 0.0471, ..., -0.0038, 0.0997, -0.0373],
[-0.0315, -0.0412, -0.0571, ..., -0.0540, 0.0156, -0.0402],
...,
[-0.0654, 0.0093, -0.0926, ..., -0.0535, 0.0033, -0.0565],
[-0.0405, -0.0076, 0.0250, ..., 0.0361, -0.0558, -0.0060],
[ 0.0184, -0.0079, 0.0142, ..., 0.0654, -0.0140, -0.0554]],
requires_grad=True)
[2024-09-24 09:41:32 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0949, 0.0449, 0.0531, ..., -0.0693, -0.0635, 0.0004],
[-0.0219, 0.0381, 0.0310, ..., 0.0187, 0.0006, 0.0336],
[-0.1189, -0.1230, -0.0177, ..., -0.0302, -0.0377, 0.0074],
...,
[-0.0280, 0.0341, -0.0259, ..., -0.0814, 0.0661, 0.0119],
[-0.0408, 0.0081, 0.0331, ..., -0.0055, -0.0620, -0.0524],
[ 0.0272, 0.0447, -0.0829, ..., 0.0094, 0.0563, -0.0323]],
requires_grad=True)
[2024-09-24 09:41:32 ViT-B/16] (xclip.py 215): INFO load pretrained CLIP: _IncompatibleKeys(missing_keys=['prompts_visual_proj', 'prompts_generator.alpha', 'prompts_generator.norm.weight', 'prompts_generator.norm.bias', 'prompts_generator.decoder.0.cross_attn.q_proj.weight', 'prompts_generator.decoder.0.cross_attn.k_proj.weight', 'prompts_generator.decoder.0.cross_attn.v_proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.bias', 'prompts_generator.decoder.0.norm1.weight', 'prompts_generator.decoder.0.norm1.bias', 'prompts_generator.decoder.0.norm3.weight', 'prompts_generator.decoder.0.norm3.bias', 'prompts_generator.decoder.0.mlp.0.weight', 'prompts_generator.decoder.0.mlp.0.bias', 'prompts_generator.decoder.0.mlp.3.weight', 'prompts_generator.decoder.0.mlp.3.bias', 'prompts_generator.decoder.1.cross_attn.q_proj.weight', 'prompts_generator.decoder.1.cross_attn.k_proj.weight', 'prompts_generator.decoder.1.cross_attn.v_proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.bias', 'prompts_generator.decoder.1.norm1.weight', 'prompts_generator.decoder.1.norm1.bias', 'prompts_generator.decoder.1.norm3.weight', 'prompts_generator.decoder.1.norm3.bias', 'prompts_generator.decoder.1.mlp.0.weight', 'prompts_generator.decoder.1.mlp.0.bias', 'prompts_generator.decoder.1.mlp.3.weight', 'prompts_generator.decoder.1.mlp.3.bias', 'mit.positional_embedding', 'mit.resblocks.0.attn.in_proj_weight', 'mit.resblocks.0.attn.in_proj_bias', 'mit.resblocks.0.attn.out_proj.weight', 'mit.resblocks.0.attn.out_proj.bias', 'mit.resblocks.0.ln_1.weight', 'mit.resblocks.0.ln_1.bias', 'mit.resblocks.0.mlp.c_fc.weight', 'mit.resblocks.0.mlp.c_fc.bias', 'mit.resblocks.0.mlp.c_proj.weight', 'mit.resblocks.0.mlp.c_proj.bias', 'mit.resblocks.0.ln_2.weight', 'mit.resblocks.0.ln_2.bias', 'visual.transformer.resblocks.0.message_fc.weight', 'visual.transformer.resblocks.0.message_fc.bias', 
'visual.transformer.resblocks.0.message_ln.weight', 'visual.transformer.resblocks.0.message_ln.bias', 'visual.transformer.resblocks.0.message_attn.in_proj_weight', 'visual.transformer.resblocks.0.message_attn.in_proj_bias', 'visual.transformer.resblocks.0.message_attn.out_proj.weight', 'visual.transformer.resblocks.0.message_attn.out_proj.bias', 'visual.transformer.resblocks.1.message_fc.weight', 'visual.transformer.resblocks.1.message_fc.bias', 'visual.transformer.resblocks.1.message_ln.weight', 'visual.transformer.resblocks.1.message_ln.bias', 'visual.transformer.resblocks.1.message_attn.in_proj_weight', 'visual.transformer.resblocks.1.message_attn.in_proj_bias', 'visual.transformer.resblocks.1.message_attn.out_proj.weight', 'visual.transformer.resblocks.1.message_attn.out_proj.bias', 'visual.transformer.resblocks.2.message_fc.weight', 'visual.transformer.resblocks.2.message_fc.bias', 'visual.transformer.resblocks.2.message_ln.weight', 'visual.transformer.resblocks.2.message_ln.bias', 'visual.transformer.resblocks.2.message_attn.in_proj_weight', 'visual.transformer.resblocks.2.message_attn.in_proj_bias', 'visual.transformer.resblocks.2.message_attn.out_proj.weight', 'visual.transformer.resblocks.2.message_attn.out_proj.bias', 'visual.transformer.resblocks.3.message_fc.weight', 'visual.transformer.resblocks.3.message_fc.bias', 'visual.transformer.resblocks.3.message_ln.weight', 'visual.transformer.resblocks.3.message_ln.bias', 'visual.transformer.resblocks.3.message_attn.in_proj_weight', 'visual.transformer.resblocks.3.message_attn.in_proj_bias', 'visual.transformer.resblocks.3.message_attn.out_proj.weight', 'visual.transformer.resblocks.3.message_attn.out_proj.bias', 'visual.transformer.resblocks.4.message_fc.weight', 'visual.transformer.resblocks.4.message_fc.bias', 'visual.transformer.resblocks.4.message_ln.weight', 'visual.transformer.resblocks.4.message_ln.bias', 'visual.transformer.resblocks.4.message_attn.in_proj_weight', 
'visual.transformer.resblocks.4.message_attn.in_proj_bias', 'visual.transformer.resblocks.4.message_attn.out_proj.weight', 'visual.transformer.resblocks.4.message_attn.out_proj.bias', 'visual.transformer.resblocks.5.message_fc.weight', 'visual.transformer.resblocks.5.message_fc.bias', 'visual.transformer.resblocks.5.message_ln.weight', 'visual.transformer.resblocks.5.message_ln.bias', 'visual.transformer.resblocks.5.message_attn.in_proj_weight', 'visual.transformer.resblocks.5.message_attn.in_proj_bias', 'visual.transformer.resblocks.5.message_attn.out_proj.weight', 'visual.transformer.resblocks.5.message_attn.out_proj.bias', 'visual.transformer.resblocks.6.message_fc.weight', 'visual.transformer.resblocks.6.message_fc.bias', 'visual.transformer.resblocks.6.message_ln.weight', 'visual.transformer.resblocks.6.message_ln.bias', 'visual.transformer.resblocks.6.message_attn.in_proj_weight', 'visual.transformer.resblocks.6.message_attn.in_proj_bias', 'visual.transformer.resblocks.6.message_attn.out_proj.weight', 'visual.transformer.resblocks.6.message_attn.out_proj.bias', 'visual.transformer.resblocks.7.message_fc.weight', 'visual.transformer.resblocks.7.message_fc.bias', 'visual.transformer.resblocks.7.message_ln.weight', 'visual.transformer.resblocks.7.message_ln.bias', 'visual.transformer.resblocks.7.message_attn.in_proj_weight', 'visual.transformer.resblocks.7.message_attn.in_proj_bias', 'visual.transformer.resblocks.7.message_attn.out_proj.weight', 'visual.transformer.resblocks.7.message_attn.out_proj.bias', 'visual.transformer.resblocks.8.message_fc.weight', 'visual.transformer.resblocks.8.message_fc.bias', 'visual.transformer.resblocks.8.message_ln.weight', 'visual.transformer.resblocks.8.message_ln.bias', 'visual.transformer.resblocks.8.message_attn.in_proj_weight', 'visual.transformer.resblocks.8.message_attn.in_proj_bias', 'visual.transformer.resblocks.8.message_attn.out_proj.weight', 'visual.transformer.resblocks.8.message_attn.out_proj.bias', 
'visual.transformer.resblocks.9.message_fc.weight', 'visual.transformer.resblocks.9.message_fc.bias', 'visual.transformer.resblocks.9.message_ln.weight', 'visual.transformer.resblocks.9.message_ln.bias', 'visual.transformer.resblocks.9.message_attn.in_proj_weight', 'visual.transformer.resblocks.9.message_attn.in_proj_bias', 'visual.transformer.resblocks.9.message_attn.out_proj.weight', 'visual.transformer.resblocks.9.message_attn.out_proj.bias', 'visual.transformer.resblocks.10.message_fc.weight', 'visual.transformer.resblocks.10.message_fc.bias', 'visual.transformer.resblocks.10.message_ln.weight', 'visual.transformer.resblocks.10.message_ln.bias', 'visual.transformer.resblocks.10.message_attn.in_proj_weight', 'visual.transformer.resblocks.10.message_attn.in_proj_bias', 'visual.transformer.resblocks.10.message_attn.out_proj.weight', 'visual.transformer.resblocks.10.message_attn.out_proj.bias', 'visual.transformer.resblocks.11.message_fc.weight', 'visual.transformer.resblocks.11.message_fc.bias', 'visual.transformer.resblocks.11.message_ln.weight', 'visual.transformer.resblocks.11.message_ln.bias', 'visual.transformer.resblocks.11.message_attn.in_proj_weight', 'visual.transformer.resblocks.11.message_attn.in_proj_bias', 'visual.transformer.resblocks.11.message_attn.out_proj.weight', 'visual.transformer.resblocks.11.message_attn.out_proj.bias', 'prompts_visual_ln.weight', 'prompts_visual_ln.bias'], unexpected_keys=[])
[2024-09-24 10:41:54 ViT-B/16] (main.py 280): INFO working dir: ./
[2024-09-24 10:41:54 ViT-B/16] (main.py 284): INFO AUG:
  COLOR_JITTER: 0.8
  CUTMIX: 1.0
  GRAY_SCALE: 0.2
  LABEL_SMOOTH: 0.1
  MIXUP: 0.8
  MIXUP_SWITCH_PROB: 0.5
BASE: ['']
DATA:
  DATASET: fit101
  INPUT_SIZE: 224
  LABEL_LIST: data/videos/labels.csv
  NUM_CLASSES: 5
  NUM_FRAMES: 8
  ROOT: data/videos
  TRAIN_FILE: data/videos/train.txt
  VAL_FILE: data/videos/val.txt
LOCAL_RANK: 0
MODEL:
  ARCH: ViT-B/16
  DROP_PATH_RATE: 0.0
  FIX_TEXT: True
  PRETRAINED: None
  RESUME: None
OUTPUT: ./
PRINT_FREQ: 50
SAVE_FREQ: 10
SEED: 1024
TEST:
  NUM_CLIP: 1
  NUM_CROP: 1
  ONLY_TEST: False
TRAIN:
  ACCUMULATION_STEPS: 4
  AUTO_RESUME: False
  BATCH_SIZE: 5
  EPOCHS: 50
  LR: 2e-06
  LR_SCHEDULER: cosine
  OPTIMIZER: adamw
  OPT_LEVEL: O1
  USE_CHECKPOINT: False
  WARMUP_EPOCHS: 5
  WEIGHT_DECAY: 0.001
[2024-09-24 10:41:58 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0135, -0.0061, -0.0123, ..., -0.0247, -0.0076, 0.0007],
[ 0.0461, -0.0247, 0.0471, ..., -0.0038, 0.0997, -0.0373],
[-0.0315, -0.0412, -0.0571, ..., -0.0540, 0.0156, -0.0402],
...,
[-0.0654, 0.0093, -0.0926, ..., -0.0535, 0.0033, -0.0565],
[-0.0405, -0.0076, 0.0250, ..., 0.0361, -0.0558, -0.0060],
[ 0.0184, -0.0079, 0.0142, ..., 0.0654, -0.0140, -0.0554]],
requires_grad=True)
[2024-09-24 10:41:58 ViT-B/16] (xclip.py 169): INFO attr: Parameter containing:
tensor([[ 0.0949, 0.0449, 0.0531, ..., -0.0693, -0.0635, 0.0004],
[-0.0219, 0.0381, 0.0310, ..., 0.0187, 0.0006, 0.0336],
[-0.1189, -0.1230, -0.0177, ..., -0.0302, -0.0377, 0.0074],
...,
[-0.0280, 0.0341, -0.0259, ..., -0.0814, 0.0661, 0.0119],
[-0.0408, 0.0081, 0.0331, ..., -0.0055, -0.0620, -0.0524],
[ 0.0272, 0.0447, -0.0829, ..., 0.0094, 0.0563, -0.0323]],
requires_grad=True)
[2024-09-24 10:41:58 ViT-B/16] (xclip.py 215): INFO load pretrained CLIP: _IncompatibleKeys(missing_keys=['prompts_visual_proj', 'prompts_generator.alpha', 'prompts_generator.norm.weight', 'prompts_generator.norm.bias', 'prompts_generator.decoder.0.cross_attn.q_proj.weight', 'prompts_generator.decoder.0.cross_attn.k_proj.weight', 'prompts_generator.decoder.0.cross_attn.v_proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.weight', 'prompts_generator.decoder.0.cross_attn.proj.bias', 'prompts_generator.decoder.0.norm1.weight', 'prompts_generator.decoder.0.norm1.bias', 'prompts_generator.decoder.0.norm3.weight', 'prompts_generator.decoder.0.norm3.bias', 'prompts_generator.decoder.0.mlp.0.weight', 'prompts_generator.decoder.0.mlp.0.bias', 'prompts_generator.decoder.0.mlp.3.weight', 'prompts_generator.decoder.0.mlp.3.bias', 'prompts_generator.decoder.1.cross_attn.q_proj.weight', 'prompts_generator.decoder.1.cross_attn.k_proj.weight', 'prompts_generator.decoder.1.cross_attn.v_proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.weight', 'prompts_generator.decoder.1.cross_attn.proj.bias', 'prompts_generator.decoder.1.norm1.weight', 'prompts_generator.decoder.1.norm1.bias', 'prompts_generator.decoder.1.norm3.weight', 'prompts_generator.decoder.1.norm3.bias', 'prompts_generator.decoder.1.mlp.0.weight', 'prompts_generator.decoder.1.mlp.0.bias', 'prompts_generator.decoder.1.mlp.3.weight', 'prompts_generator.decoder.1.mlp.3.bias', 'mit.positional_embedding', 'mit.resblocks.0.attn.in_proj_weight', 'mit.resblocks.0.attn.in_proj_bias', 'mit.resblocks.0.attn.out_proj.weight', 'mit.resblocks.0.attn.out_proj.bias', 'mit.resblocks.0.ln_1.weight', 'mit.resblocks.0.ln_1.bias', 'mit.resblocks.0.mlp.c_fc.weight', 'mit.resblocks.0.mlp.c_fc.bias', 'mit.resblocks.0.mlp.c_proj.weight', 'mit.resblocks.0.mlp.c_proj.bias', 'mit.resblocks.0.ln_2.weight', 'mit.resblocks.0.ln_2.bias', 'visual.transformer.resblocks.0.message_fc.weight', 'visual.transformer.resblocks.0.message_fc.bias', 
'visual.transformer.resblocks.0.message_ln.weight', 'visual.transformer.resblocks.0.message_ln.bias', 'visual.transformer.resblocks.0.message_attn.in_proj_weight', 'visual.transformer.resblocks.0.message_attn.in_proj_bias', 'visual.transformer.resblocks.0.message_attn.out_proj.weight', 'visual.transformer.resblocks.0.message_attn.out_proj.bias', 'visual.transformer.resblocks.1.message_fc.weight', 'visual.transformer.resblocks.1.message_fc.bias', 'visual.transformer.resblocks.1.message_ln.weight', 'visual.transformer.resblocks.1.message_ln.bias', 'visual.transformer.resblocks.1.message_attn.in_proj_weight', 'visual.transformer.resblocks.1.message_attn.in_proj_bias', 'visual.transformer.resblocks.1.message_attn.out_proj.weight', 'visual.transformer.resblocks.1.message_attn.out_proj.bias', 'visual.transformer.resblocks.2.message_fc.weight', 'visual.transformer.resblocks.2.message_fc.bias', 'visual.transformer.resblocks.2.message_ln.weight', 'visual.transformer.resblocks.2.message_ln.bias', 'visual.transformer.resblocks.2.message_attn.in_proj_weight', 'visual.transformer.resblocks.2.message_attn.in_proj_bias', 'visual.transformer.resblocks.2.message_attn.out_proj.weight', 'visual.transformer.resblocks.2.message_attn.out_proj.bias', 'visual.transformer.resblocks.3.message_fc.weight', 'visual.transformer.resblocks.3.message_fc.bias', 'visual.transformer.resblocks.3.message_ln.weight', 'visual.transformer.resblocks.3.message_ln.bias', 'visual.transformer.resblocks.3.message_attn.in_proj_weight', 'visual.transformer.resblocks.3.message_attn.in_proj_bias', 'visual.transformer.resblocks.3.message_attn.out_proj.weight', 'visual.transformer.resblocks.3.message_attn.out_proj.bias', 'visual.transformer.resblocks.4.message_fc.weight', 'visual.transformer.resblocks.4.message_fc.bias', 'visual.transformer.resblocks.4.message_ln.weight', 'visual.transformer.resblocks.4.message_ln.bias', 'visual.transformer.resblocks.4.message_attn.in_proj_weight', 
'visual.transformer.resblocks.4.message_attn.in_proj_bias', 'visual.transformer.resblocks.4.message_attn.out_proj.weight', 'visual.transformer.resblocks.4.message_attn.out_proj.bias', 'visual.transformer.resblocks.5.message_fc.weight', 'visual.transformer.resblocks.5.message_fc.bias', 'visual.transformer.resblocks.5.message_ln.weight', 'visual.transformer.resblocks.5.message_ln.bias', 'visual.transformer.resblocks.5.message_attn.in_proj_weight', 'visual.transformer.resblocks.5.message_attn.in_proj_bias', 'visual.transformer.resblocks.5.message_attn.out_proj.weight', 'visual.transformer.resblocks.5.message_attn.out_proj.bias', 'visual.transformer.resblocks.6.message_fc.weight', 'visual.transformer.resblocks.6.message_fc.bias', 'visual.transformer.resblocks.6.message_ln.weight', 'visual.transformer.resblocks.6.message_ln.bias', 'visual.transformer.resblocks.6.message_attn.in_proj_weight', 'visual.transformer.resblocks.6.message_attn.in_proj_bias', 'visual.transformer.resblocks.6.message_attn.out_proj.weight', 'visual.transformer.resblocks.6.message_attn.out_proj.bias', 'visual.transformer.resblocks.7.message_fc.weight', 'visual.transformer.resblocks.7.message_fc.bias', 'visual.transformer.resblocks.7.message_ln.weight', 'visual.transformer.resblocks.7.message_ln.bias', 'visual.transformer.resblocks.7.message_attn.in_proj_weight', 'visual.transformer.resblocks.7.message_attn.in_proj_bias', 'visual.transformer.resblocks.7.message_attn.out_proj.weight', 'visual.transformer.resblocks.7.message_attn.out_proj.bias', 'visual.transformer.resblocks.8.message_fc.weight', 'visual.transformer.resblocks.8.message_fc.bias', 'visual.transformer.resblocks.8.message_ln.weight', 'visual.transformer.resblocks.8.message_ln.bias', 'visual.transformer.resblocks.8.message_attn.in_proj_weight', 'visual.transformer.resblocks.8.message_attn.in_proj_bias', 'visual.transformer.resblocks.8.message_attn.out_proj.weight', 'visual.transformer.resblocks.8.message_attn.out_proj.bias', 
'visual.transformer.resblocks.9.message_fc.weight', 'visual.transformer.resblocks.9.message_fc.bias', 'visual.transformer.resblocks.9.message_ln.weight', 'visual.transformer.resblocks.9.message_ln.bias', 'visual.transformer.resblocks.9.message_attn.in_proj_weight', 'visual.transformer.resblocks.9.message_attn.in_proj_bias', 'visual.transformer.resblocks.9.message_attn.out_proj.weight', 'visual.transformer.resblocks.9.message_attn.out_proj.bias', 'visual.transformer.resblocks.10.message_fc.weight', 'visual.transformer.resblocks.10.message_fc.bias', 'visual.transformer.resblocks.10.message_ln.weight', 'visual.transformer.resblocks.10.message_ln.bias', 'visual.transformer.resblocks.10.message_attn.in_proj_weight', 'visual.transformer.resblocks.10.message_attn.in_proj_bias', 'visual.transformer.resblocks.10.message_attn.out_proj.weight', 'visual.transformer.resblocks.10.message_attn.out_proj.bias', 'visual.transformer.resblocks.11.message_fc.weight', 'visual.transformer.resblocks.11.message_fc.bias', 'visual.transformer.resblocks.11.message_ln.weight', 'visual.transformer.resblocks.11.message_ln.bias', 'visual.transformer.resblocks.11.message_attn.in_proj_weight', 'visual.transformer.resblocks.11.message_attn.in_proj_bias', 'visual.transformer.resblocks.11.message_attn.out_proj.weight', 'visual.transformer.resblocks.11.message_attn.out_proj.bias', 'prompts_visual_ln.weight', 'prompts_visual_ln.bias'], unexpected_keys=[])