Retreat notes and progress until 05.09.2022
Agenda
Next steps after AAAI submission
Upcoming research questions to answer
- Normalize total loss
- What is the performance of CR-VAE with ResNet architecture on MNIST and CIFAR-10 Datasets?
- What is the performance of MoCo on MNIST and CIFAR-10 Datasets?
- How does CR-VAE-BIG compare with MoCo?
- What is better, SGD or Adam? Why?
- What is better, E2E or Modular? Why?
- How can we train on ImageNet? Maybe alternative datasets?
- New architecture: feed the decoder the concatenated latent representations from the q and k encoders (see the sketch after this list).
- Can we incorporate all representation techniques into one?
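A minimal PyTorch sketch of the concatenated-latent decoder idea from the architecture bullet above, assuming `z_q` and `z_k` are the latent vectors produced by the q and k encoders; the class name `ConcatLatentDecoder`, `latent_dim`, and the layer sizes are illustrative placeholders, not the actual CR-VAE implementation.

```python
import torch
import torch.nn as nn

class ConcatLatentDecoder(nn.Module):
    """Decoder that reconstructs from the concatenation of the q- and k-encoder latents.

    Hypothetical sketch: latent_dim and the layer sizes are placeholders, sized
    here for 32x32 images (e.g. CIFAR-10).
    """
    def __init__(self, latent_dim: int = 128, out_channels: int = 3):
        super().__init__()
        self.fc = nn.Linear(2 * latent_dim, 512 * 4 * 4)  # 2x: z_q and z_k concatenated
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, out_channels, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z_q: torch.Tensor, z_k: torch.Tensor) -> torch.Tensor:
        z = torch.cat([z_q, z_k], dim=1)    # concatenate along the feature axis
        h = self.fc(z).view(-1, 512, 4, 4)  # project and reshape to a spatial map
        return self.deconv(h)               # upsample 4x4 -> 32x32 reconstruction
```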
Post paper submission setbacks
- The KL divergence computation was wrong. After fixing it, the performance changed.
- With a weight factor of 1 on the KL divergence, the classification performance of the learned features diminishes.
- This report shows the problem.
- The same behavior occurs for CR-VAE. Until the reconstruction and contrastive losses are on the same scale as the KLD loss, the performance will continue to deviate, because the KLD term numerically dominates the total loss (see the loss sketch after this list).
- Ways to mitigate it:
  - Descending beta value (see the beta-scheduling sketch after this list)
    - Currently exploring different scheduling techniques
    - Report
- Note: CR-VAE no longer seems novel
- Normalizing the total loss (weighting each loss term inversely to its magnitude) might lead to better performance (see the inverse-magnitude weighting sketch below).
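As a reference for the KL divergence fix and the scale issue above, a sketch of the standard closed-form KL term for a diagonal Gaussian posterior against a standard normal prior, plugged into a beta-weighted total loss; `recon_loss`, `contrastive_loss`, `beta`, and `gamma` are placeholder names, and the weighting shown is an assumption rather than the paper's exact formulation.

```python
import torch

def kl_divergence(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian posterior:
    -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2), summed over latent dimensions
    and averaged over the batch."""
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    return kld.mean()

def total_loss(recon_loss: torch.Tensor, kld: torch.Tensor, contrastive_loss: torch.Tensor,
               beta: float = 1.0, gamma: float = 1.0) -> torch.Tensor:
    """Beta-weighted total loss (sketch). With beta = 1 the KLD term can numerically
    dominate the reconstruction and contrastive terms when they are on smaller scales."""
    return recon_loss + beta * kld + gamma * contrastive_loss
```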
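A sketch of one possible descending beta schedule (linear decay), as mentioned under the mitigation item above; the start/end values and the number of steps are illustrative, not tuned settings.

```python
def descending_beta(step: int, total_steps: int,
                    beta_start: float = 1.0, beta_end: float = 1e-3) -> float:
    """Linearly decay beta from beta_start to beta_end over total_steps,
    then hold it at beta_end."""
    if step >= total_steps:
        return beta_end
    frac = step / total_steps
    return beta_start + frac * (beta_end - beta_start)

# Hypothetical usage inside a training loop:
# beta = descending_beta(global_step, total_steps=50_000)
# loss = recon_loss + beta * kld + contrastive_loss
```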
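A sketch of the total-loss normalization idea: weight each loss term inversely to a detached running estimate of its magnitude so that no single term dominates numerically. The exponential-moving-average mechanism and the class name `InverseMagnitudeWeighter` are assumptions for illustration, not an established procedure.

```python
import torch

class InverseMagnitudeWeighter:
    """Track an exponential moving average of each loss term's magnitude and
    weight the term by the inverse of that (detached) average."""
    def __init__(self, names, momentum: float = 0.99, eps: float = 1e-8):
        self.momentum = momentum
        self.eps = eps
        self.running = {name: None for name in names}

    def __call__(self, losses: dict) -> torch.Tensor:
        total = 0.0
        for name, loss in losses.items():
            value = loss.detach().abs()
            if self.running[name] is None:
                self.running[name] = value
            else:
                self.running[name] = self.momentum * self.running[name] + (1 - self.momentum) * value
            total = total + loss / (self.running[name] + self.eps)  # inverse-magnitude weight
        return total

# Hypothetical usage:
# weighter = InverseMagnitudeWeighter(["recon", "kld", "contrastive"])
# loss = weighter({"recon": recon_loss, "kld": kld, "contrastive": contrastive_loss})
```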