Optimize the compiler output for larger cache blast cases that are
common for DMA-based networking.
On ar71xx, I measured a routing throughput increase of ~8%
Signed-off-by: Ben Menchaca <ben.menchaca@qca.qualcomm.com>
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>