Abstract: Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results