Visual question answering with contextualized commonsense knowledge