Meta's AI research team released a paper introducing a model named ImageBind, which binds six modalities into a single embedding space. It outperforms prior specialist models trained for a particular modality.
It joins data from:
- Images
- Text
- Audio
- Depth
- Thermal
- IMU data
It builds on large-scale vision-language models and extends their zero-shot capabilities to new modalities simply by using each modality's natural pairing with images, such as video-audio and image-depth data, to learn a single joint embedding space.
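At its core, this alignment is a contrastive (InfoNCE-style) objective over image-paired examples. Below is a minimal PyTorch sketch of that alignment step; the embedding dimension, temperature, and batch setup are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def infonce_loss(image_emb: torch.Tensor, other_emb: torch.Tensor,
                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning image embeddings with a paired modality.

    image_emb, other_emb: (batch, dim) embeddings of paired samples,
    e.g. a video frame and the audio clip recorded with it.
    """
    # Normalize so the dot product becomes a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    other_emb = F.normalize(other_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares image i with modality sample j.
    logits = image_emb @ other_emb.t() / temperature

    # The true pair for each image sits on the diagonal.
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Symmetric loss: image -> other modality and other modality -> image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

if __name__ == "__main__":
    # Random tensors stand in for encoder outputs (assumed 512-dim).
    img = torch.randn(8, 512)
    aud = torch.randn(8, 512)
    print(infonce_loss(img, aud).item())
```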
The paper shows that not all combinations of paired data are required to train such a joint embedding; image-paired data alone is sufficient to bind the modalities together. As a result, modalities that were never paired during training, such as text and audio, can still be compared in the shared space.
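A sketch of that emergent cross-modal comparison using the released checkpoint is shown below. It follows the quickstart in the facebookresearch/ImageBind repository; the package layout, function names, and the local audio file are taken from that README and may differ across versions.

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained ImageBind (huge) model.
model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

text_list = ["a dog barking", "a car engine", "rain falling"]
audio_paths = ["dog_bark.wav"]  # hypothetical local audio file

inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}

with torch.no_grad():
    embeddings = model(inputs)

# Text and audio were never paired during training, yet their embeddings
# live in the same space because both were bound to images.
scores = torch.softmax(
    embeddings[ModalityType.AUDIO] @ embeddings[ModalityType.TEXT].T, dim=-1
)
print(scores)
```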