On some devices, such as earlier generations of GPUs and some mobile devices, there is no hardware double and float is all you get. What should you do when double precision is required?
I remembered seeing a method somewhere last year for emulating a double with two floats. After some desperate searching I finally found it: in the Mandelbrot example of the CUDA SDK, there are functions that use pairs of floats to emulate double-precision arithmetic. Even double on the GTX 280 is handled in a similar fashion, and it is surprisingly slow, running at only 1/8 the speed of float.
First, here is the function dsmul, which emulates double-precision multiplication:
// This function multiplies DS numbers A and B to yield the DS product C.
__device__ inline void dsmul(float &c0, float &c1,
    const float a0, const float a1, const float b0, const float b1)
{
    // This splits dsa(1) and dsb(1) into high-order and low-order words.
    float cona = a0 * 8193.0f;
    float conb = b0 * 8193.0f;
    float sa1 = cona - (cona - a0);
    float sb1 = conb - (conb - b0);
    float sa2 = a0 - sa1;
    float sb2 = b0 - sb1;

    // Multiply a0 * b0 using Dekker's method.
    float c11 = a0 * b0;
    float c21 = (((sa1 * sb1 - c11) + sa1 * sb2) + sa2 * sb1) + sa2 * sb2;

    // Compute a0 * b1 + a1 * b0 (only the high-order word is needed).
    float c2 = a0 * b1 + a1 * b0;

    // Compute (c11, c21) + c2 using Knuth's trick, also adding the low-order product.
    float t1 = c11 + c2;
    float e = t1 - c11;
    float t2 = ((c2 - e) + (c11 - (t1 - e))) + c21 + a1 * b1;

    // The result is t1 + t2, after normalization.
    c0 = e = t1 + t2;
    c1 = t2 - (e - t1);
} // dsmul
On the NVIDIA forum, an expert also posted a whole set of such computation functions: dsmath.h. Although it is written for CUDA, it is easy to adapt to other languages and environments.