Towards high scalability and fine-grained parallelism on distributed HPC platforms (Full Report)