deep learning, domain-specific accelerator, hardware-software co-optimization, locality, parallelism.