This is a PyTorch/GPU implementation of the paper "Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation", referred to as VFMTok, which establishes new state-of-the-art ...
VFMTok directly utilizes the features from the frozen ...