Per-exemplar analysis with MFoM fusion learning for multimedia retrieval and recounting