Near-Optimal Cache Sharing through Co-Located Parallel Scheduling of Threads (Full Report)