experiments to understand LLMs/VLMs and how we may tame them.
models of small worlds: LLMs are terrible at visual reasoning and fail even at basic visual puzzles. There is a million dollar prize (the ARC Prize) for anyone who can get them to 85% accuracy on puzzles that are very simple for humans. These puzzles build scenarios in a small world out of simple concepts such as copying, moving, colliding, and large-eats-small. To solve one, we need to infer the precise program that generated the demonstration scenario and apply it to another instance of that world. This blog has my ideas and attempts at this problem; a rough sketch of the idea follows.
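Here's a minimal sketch of the "infer the program, then apply it" framing, assuming puzzles are integer grids and the world's program is a composition of a few primitives. The primitive set and the brute-force search below are my own toy illustrations, not the actual ARC format or any known solver:

```python
import itertools
import numpy as np

# Hypothetical primitives over integer grids; a real small world
# would need a much richer DSL than these few transformations.
PRIMITIVES = {
    "identity": lambda g: g,
    "flip_h": lambda g: np.fliplr(g),
    "flip_v": lambda g: np.flipud(g),
    "rot90": lambda g: np.rot90(g),
    "shift_right": lambda g: np.roll(g, 1, axis=1),
    "recolor_max": lambda g: np.where(g == g.max(), 1, g),  # toy "large eats small"
}

def search_program(train_pairs, max_depth=3):
    """Brute-force search for a composition of primitives that maps
    every demonstration input grid to its output grid."""
    names = list(PRIMITIVES)
    for depth in range(1, max_depth + 1):
        for combo in itertools.product(names, repeat=depth):
            def program(g, combo=combo):
                for name in combo:
                    g = PRIMITIVES[name](g)
                return g
            if all(np.array_equal(program(x), y) for x, y in train_pairs):
                return combo, program
    return None, None

# usage: infer the program from one demonstration, apply to a new instance
train = [(np.array([[1, 2], [3, 4]]), np.array([[2, 1], [4, 3]]))]
combo, prog = search_program(train)
if prog is not None:
    print(combo)                              # ('flip_h',)
    print(prog(np.array([[5, 6], [7, 8]])))   # solve the test instance
```

The brute-force loop is only there to make the framing concrete: the space of compositions explodes with depth, which is exactly why the interesting work is in guiding the search, whether by neural program synthesis or by having an LLM propose candidate programs.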
agent see agent do: how to build agents that learn by watching you work