Latest AI news, expert analysis, bold opinions, and key trends — delivered to your inbox.
ViperGPT is a framework that leverages code-generation models to compose vision-and-language models into subroutines to answer complex visual queries. Unlike end-to-end models, ViperGPT explicitly differentiates between visual processing and reasoning, making it more interpretable and generalizable. It achieves state-of-the-art results across various complex visual tasks without requiring further training.