Fake entities appeared in 38% to 51% of reports when agents retrieved a manipulated page, rising to 62% when they retrieved multiple manipulated pages.
Abstract: Pre-trained vision-language models (VLMs) and language models (LMs) have recently garnered significant attention due to their remarkable ability to represent textual concepts, opening up new ...