Data-driven approaches to heterogeneous datasets