A user recently posted a note on the Help forum, suggesting that FastFormat was up to 100 times slower than MFC's CString::Format(). Thankfully that's not the case, and FastFormat appears to be faster than CString::Format() in even the simple case suggested.
It appeared to be so because the test program reused the same CString instance in each call to fastformat::fmt(), so every call appended to the previous result and the string grew to a huge size. The fault here is no doubt mine, for not having made the nature of FastFormat's formatting clearer in the documentation. So I'll try to do so briefly now.
All FastFormat formatting is appending, and this is a deliberate design decision. In part, this is to achieve consistency between immediate sinks, such as strings, and stream sinks (such as std::cout, stdout, ...). Also, it's useful to be able to break apart large or complex formatting operations into several statements, without sacrificing performance or expressiveness. (Note: there'll always be a small performance penalty for such things, but it will be insignificant in most cases where the size and complexity of the statement demand breaking up.) Consider the following example:
std::string s;
ff::fmt(s, "{0};{1};{2}|{3};{4};{5}|{6};{7};{8}|{9};{10};{11}|", a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11);
The same can be achieved as follows:
std::string s;
ff::fmt(s, "{0};{1};{2}|", a0, a1, a2);
ff::fmt(s, "{0};{1};{2}|", a3, a4, a5);
ff::fmt(s, "{0};{1};{2}|", a6, a7, a8);
ff::fmt(s, "{0};{1};{2}|", a9, a10, a11);
In real-world cases, such clarity may be worth paying a few extra cycles for.
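
To see why the forum benchmark was misleading, here is a minimal sketch of what happens when one sink is reused across iterations. It does not use FastFormat itself: append_fmt() is a hypothetical stand-in that I've written only to mimic appending semantics with a std::string sink, and the two helper functions and their names are likewise assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an appending formatter (NOT FastFormat's code).
// Replaces each "{N}" in fmt with args[N] and appends the result to sink;
// whatever sink already holds is left in place.
void append_fmt(std::string& sink, const std::string& fmt,
                const std::vector<std::string>& args)
{
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '{') {
            const std::size_t close = fmt.find('}', i);
            if (close != std::string::npos) {
                // Assumes the text between the braces is a decimal index.
                const std::size_t index =
                    std::stoul(fmt.substr(i + 1, close - i - 1));
                assert(index < args.size());
                sink += args[index];
                i = close;  // skip past the closing brace
                continue;
            }
        }
        sink += fmt[i];  // literal character
    }
}

// Mimics the benchmark mistake: one sink shared by every iteration,
// so each call makes the string longer.
std::size_t length_when_reused(int iterations)
{
    std::string s;
    for (int i = 0; i < iterations; ++i) {
        append_fmt(s, "{0}-{1};", {"abc", "def"});
    }
    return s.size();
}

// One possible fix: empty the sink before each call.
std::size_t length_when_cleared(int iterations)
{
    std::string s;
    for (int i = 0; i < iterations; ++i) {
        s.clear();
        append_fmt(s, "{0}-{1};", {"abc", "def"});
    }
    return s.size();
}
```

With these definitions, length_when_reused(1000) returns 8000, while length_when_cleared(1000) returns 8. A timing loop written the first way ends up measuring the growth of an ever-larger string as well as the formatting, so a fair comparison should clear the sink (or use a fresh one) on each iteration.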