Pushing the limits of long-context LLM inference via KV cache compression