https://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf
Abstract
Stream fusion [6] is a powerful technique for automatically transforming high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, still do not perform well within the standard stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the framework at all.
In this paper we introduce generalized stream fusion, which solves these issues. The key insight is to bundle together multiple stream representations, each tuned for a particular class of stream consumer. We also describe a stream representation suited for efficient computation with SSE instructions. Our ideas are implemented in modified versions of the GHC compiler and vector library. Benchmarks show that high-level Haskell code written using our compiler and libraries can produce code that is faster than both compiler- and hand-vectorized C.
Things like this should be done in assembly. Also, over here http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?test=all&lang=gcc&lang2=ghc&data=u64q Haskell loses to C on almost every benchmark. And I wouldn't say the Haskell code is much shorter than the C code.