long double

In C and related programming languages, long double refers to a floating-point data type that is often more precise than double precision, though the language standard only requires it to be at least as precise as double. As with C's other floating-point types, it does not necessarily map to an IEEE format.

`long double` in C

History

The long double type was present in the original 1989 C standard,[1] but support was improved by the 1999 revision of the C standard, or C99, which extended the standard library to include functions operating on long double, such as sinl() and strtold().

Long double constants are floating-point constants suffixed with "L" or "l" (lower-case L), e.g., 0.3333333333333333333333333333333333L or 3.1415926535897932384626433832795029L for quadruple precision. Without a suffix, a floating constant has type double, although its value may be evaluated with greater range and precision, as indicated by FLT_EVAL_METHOD.

Implementations

On the x86 architecture, most C compilers implement long double as the 80-bit extended precision type supported by x86 hardware (generally stored as 12 or 16 bytes to maintain data structure alignment), as specified in Annex F (IEC 60559 floating-point arithmetic) of the C99 and C11 standards. An exception is Microsoft Visual C++ for x86, which makes long double a synonym for double.[2] The Intel C++ Compiler on Microsoft Windows supports extended precision, but requires the /Qlong-double switch for long double to correspond to the hardware's extended precision format.[3]

Compilers may also use long double for the IEEE 754 quadruple-precision binary floating-point format (binary128). This is the case on HP-UX,[4] Solaris/SPARC,[5] MIPS with the 64-bit or n32 ABI,[6] 64-bit ARM (AArch64)[7] (on operating systems using the standard AAPCS calling conventions, such as Linux), and z/OS with FLOAT(IEEE)[8][9][10]. Most implementations are in software, but some processors have hardware support.

On some PowerPC systems,[11] long double is implemented as double-double arithmetic, where a long double value is regarded as the exact sum of two double-precision values, giving at least 106 bits of precision; with such a format, the long double type does not conform to the IEEE floating-point standard. On other platforms, long double is simply a synonym for double (double precision), e.g. on 32-bit ARM,[12] 64-bit ARM (AArch64) (on Windows[13] and macOS[14]) and on 32-bit MIPS[15] (old ABI, a.k.a. o32).

With the GNU C Compiler, long double is 80-bit extended precision on x86 processors regardless of the physical storage used for the type (which can be either 96 or 128 bits).[16] On some other architectures, long double can be double-double (e.g. on PowerPC[17][18][19]) or 128-bit quadruple precision (e.g. on SPARC[20]). Since GCC 4.3, quadruple precision is also supported on x86, but as the nonstandard type __float128 rather than long double.[21]

Although the x86 architecture, and specifically the x87 floating-point instructions on x86, supports 80-bit extended-precision operations, it is possible to configure the processor to automatically round operations to double (or even single) precision. Conversely, in extended-precision mode, extended precision may be used for intermediate compiler-generated calculations even when the final results are stored at a lower precision (i.e. FLT_EVAL_METHOD == 2).

With GCC on Linux, 80-bit extended precision is the default; on several BSD operating systems (FreeBSD and OpenBSD), double-precision mode is the default, and long double operations are effectively reduced to double precision.[22] (NetBSD 7.0 and later, however, default to 80-bit extended precision.[23]) This default can be overridden within an individual program via the FLDCW ("floating-point load control word") instruction.[22] On x86_64, the BSDs default to 80-bit extended precision. Microsoft Windows with Visual C++ also puts the processor in double-precision mode by default, but this too can be overridden within an individual program (e.g. by the _controlfp_s function in Visual C++[24]). The Intel C++ Compiler for x86, on the other hand, enables extended-precision mode by default.[25] On IA-32 OS X, long double is 80-bit extended precision.[26]

Other specifications

In CORBA (as of specification 3.0, which uses "ANSI/IEEE Standard 754-1985" as its reference), "the long double data type represents an IEEE double-extended floating-point number, which has an exponent of at least 15 bits in length and a signed fraction of at least 64 bits". The GIOP/IIOP Common Data Representation (CDR), whose floating-point types "exactly follow the IEEE standard formats for floating point numbers", marshals this type in what appears to be the IEEE 754-2008 binary128 (quadruple precision) format, although it does not use that name.

References

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] ANSI/ISO 9899-1990 American National Standard for Programming Languages - C, section 6.1.2.5.

[2] "Long Double". learn.microsoft.com. Retrieved 2022-10-06.

[3] Intel Developer Site

[4] Hewlett Packard (1992). "Porting C Programs". HP-UX Portability Guide - HP 9000 Computers (PDF) (2nd ed.). pp. 5-3 and 5-37.

[5] "IEEE Arithmetic". docs.oracle.com. Retrieved 2022-10-06.

[6] "MIPSpro™ N32 ABI Handbook" (PDF). 1999. Retrieved 2020-05-26.

[7] "Procedure Call Standard for the Arm® 64-bit Architecture (AArch64)". GitHub. 2020-10-01. Archived (PDF) from the original on 2020-10-02.

[8] "Floating-point types". IBM. 2020-10-09. Retrieved 2020-10-09.

[9] Schwarz, Eric (June 22, 2015). "The IBM z13 SIMD Accelerators for Integer, String, and Floating-Point" (PDF). Retrieved July 13, 2015.

[10] Schwarz, E. M.; Krygowski, C. A. (September 1999). "The S/390 G5 floating-point unit". IBM Journal of Research and Development. 43 (5/6): 707–721. CiteSeerX 10.1.1.117.6711. doi:10.1147/rd.435.0707.

[11] "The saga of the Power ISA 128-bit long double". 2018-12-22. Retrieved 2021-12-26.

[12] "ARM® Compiler toolchain Compiler Reference, Version 5.03" (PDF). 2013. Section 6.3 Basic data types. Retrieved 2019-11-08.

[13] "llvm/llvm-project". GitHub. Retrieved 2020-09-03.

[14] "llvm/llvm-project". GitHub. Retrieved 2020-09-03.

[15] "System V Application Binary Interface: MIPS(r) Processor Supplement" (PDF) (3rd ed.). 1996. Retrieved 2020-05-26.

[16] "x86 Options (Using the GNU Compiler Collection (GCC))". gcc.gnu.org. Retrieved 2022-10-06.

[17] "RS/6000 and PowerPC Options (Using the GNU Compiler Collection (GCC))". gcc.gnu.org. Retrieved 2022-10-06.

[18] Inside Macintosh - PowerPC Numerics Archived 2012-10-09 at the Wayback Machine

[19] 128-bit long double support routines for Darwin

[20] "SPARC Options (Using the GNU Compiler Collection (GCC))". gcc.gnu.org. Retrieved 2022-10-06.

[21] "GCC 4.3 Release Series — Changes, New Features, and Fixes - GNU Project". gcc.gnu.org. Retrieved 2022-10-06.

[22] Brian J. Gough and Richard M. Stallman, An Introduction to GCC, section 8.6 Floating-point issues (Network Theory Ltd., 2004).

[23] "Significant changes from NetBSD 6.0 to 7.0".

[24] "Visual Studio 2005 Retired documentation". Microsoft Download Center. Retrieved 2022-10-06.

[25] Intel C++ Compiler Documentation, Using the -fp-model (/fp) Option.

[26] "IA-32 Function Calling Conventions".
