A Quest For Visual Commonsense: Scene Understanding By Functional And Physical Reasoning