Retreat notes and progress until 05.09.2022 - Chair of Cyber-Physical-Systems

Agenda

Next steps after AAAI submission

Normalize total loss
What is the performance of CR-VAE with ResNet architecture on MNIST and CIFAR-10 Datasets?
What is the performance of MoCo on MNIST and CIFAR-10 Datasets?
How does CR-VAE-BIG compare with MoCo?
What is better, SGD or Adam? Why?
What is better, E2E or Modular? Why?
How can we train on ImageNet? Maybe alternative datasets?
New architecture: decoder input -> concatenated latent representations from q and k encoders.
Can we incorporate all representation techniques into one?
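
Sketch for the "new architecture" item above: a minimal illustration of building the decoder input by concatenating the latents from the q and k encoders, sample by sample. The function name and the list-of-lists batch layout are assumptions for illustration only; the actual model would do this on tensors.

```python
def decoder_input(z_q, z_k):
    """Concatenate per-sample latent vectors from the q and k encoders.

    z_q, z_k: batches given as lists of latent vectors (hypothetical layout).
    Returns one vector per sample of length len(z_q[i]) + len(z_k[i]).
    """
    if len(z_q) != len(z_k):
        raise ValueError("q and k batches must have the same number of samples")
    return [list(a) + list(b) for a, b in zip(z_q, z_k)]


# Two samples with 2-dimensional latents from each encoder.
z = decoder_input([[0.1, 0.2], [0.3, 0.4]], [[1.0, 2.0], [3.0, 4.0]])
```

Note that the decoder's first layer would then need an input width equal to the sum of the two latent sizes.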

The KL divergence computation was wrong. After the fix, performance changed.
With a weight factor of 1 for the KL divergence, the classification performance of the learned features degrades.
This report shows this problem.
Same behavior for CR-VAE. Until the reconstruction and contrastive losses are on the same scale as the KLD loss, performance will continue to deviate. This happens because the KLD numerically dominates the total loss.
Way to mitigate it:
- Descending beta value
  - currently exploring different scheduling techniques
  - report
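
Sketch of two possible descending beta schedules (linear and cosine), to make the idea above concrete. These are illustrative candidates, not necessarily the schedules being explored; the function names, `beta_start`, `beta_end`, and `total_steps` are assumptions.

```python
import math


def _progress(step, total_steps):
    """Fraction of training completed, clamped to [0, 1]."""
    return min(max(step / total_steps, 0.0), 1.0)


def linear_beta(step, total_steps, beta_start=1.0, beta_end=0.0):
    """Beta decays linearly from beta_start to beta_end."""
    t = _progress(step, total_steps)
    return beta_start + t * (beta_end - beta_start)


def cosine_beta(step, total_steps, beta_start=1.0, beta_end=0.0):
    """Beta follows a half cosine: slow decay at the start and end, faster in between."""
    t = _progress(step, total_steps)
    return beta_end + 0.5 * (beta_start - beta_end) * (1.0 + math.cos(math.pi * t))


# The KLD term would then be weighted per step, e.g.
# total = recon + contrastive + linear_beta(step, total_steps) * kld
```

Both schedules start at `beta_start` and end at `beta_end`; they differ only in how quickly the KLD weight is reduced in between.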
Note: CR-VAE does not seem novel now
Normalizing the total loss (weighting each loss inversely to its magnitude) might lead to better performance
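
Sketch of the inverse-magnitude weighting idea, assuming each loss term is available as a plain scalar value. The function names, the `eps` guard, and the example numbers are made up for illustration. In an actual training loop the weights would presumably be computed from detached loss values so that no gradient flows through them.

```python
def inverse_magnitude_weights(losses, eps=1e-8):
    """Return weights proportional to 1 / |loss|, normalized to sum to 1.

    losses: dict mapping a loss name to its current scalar value.
    """
    inv = {name: 1.0 / (abs(value) + eps) for name, value in losses.items()}
    norm = sum(inv.values())
    return {name: w / norm for name, w in inv.items()}


def normalized_total_loss(losses, eps=1e-8):
    """Weighted sum in which every term contributes roughly equally."""
    weights = inverse_magnitude_weights(losses, eps)
    return sum(weights[name] * losses[name] for name in losses)


# Hypothetical values where the KLD is two orders of magnitude larger.
example = {"reconstruction": 2.0, "contrastive": 0.5, "kld": 200.0}
weights = inverse_magnitude_weights(example)
contributions = {name: weights[name] * example[name] for name in example}
```

With this weighting, the KLD no longer dominates: each entry of `contributions` is approximately the same, regardless of the raw scale of the term.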