V3.79: quad-double arithmetic type (#437)
* Setting SEMVER to v3.79

* enabling CI on the new v3.79 release branch

* making fixpnt trivially constructible, disabling complex<fixpnt> regression

* WIP: identified double rounding as cause for complex<fixpnt> failures

* labeling the complex<CustomType> use case: we'll need to create our own

* fixpnt is now trivially constructible; any implicit initialization assumptions in the regression tests need to be fixed

* adding include <cstdint> to satisfy gcc builds

* dd bug fix and enhancement: supporting compiler environments prior to C++20

* recorded bug in bfloat16 that converts a float qNaN to a bfloat sNaN

* remove bfloat16 separate addition regression test, and replace with increased intensity of randoms

* enhancing the dfloat class API as first step in implementation

* WIP: adding native trig functions to double-double

* bug fix: trigonometry functions

* WIP: adding trigonometric function verification suites

* bug fix: both shim and native sqrt function needed fixing

* bug fix of inverse trigonometry functions for double-double

* removing meta_programming experiment as it is causing compilation issues

* removing meta_programming from build

* adding clang17/18 builder containers, which now include the gdb debugger

* standardizing on double-double documentation

* code hygiene double-double

* adding three_sum2 version of three_sum

* code hygiene

* initial check-in of quad-double skeleton

* rewriting three_sums to reflect Li/Bailey LBNL paper

* changing behavior of to_binary for double-double and quad-double

* unifying the to_binary() algorithm for double-double and quad-double

* implementing addition and subtraction for quad-double

* reclassifying some helper class methods to be protected

* adding multiplication to the quad-double

* adding division to quad-double arithmetic

* generating and testing the quad-precision constants

* enabling native log() functions for double-double type

* adding sub() and div() methods to double-double

* adding classification regression test for quad-double

* adding error/gamma regression test for quad-double

* adding exponent regression test for quad-double: exp algorithm is way off

* compilation fix for gcc

* completing the double add/sub/mul/div promotion to double-double API example

* adding fmod/remainder regression test for quad-double

* adding hyperbolic functions regression test for quad-double

* adding hypot function regression test for quad-double

* adding logarithmic function regression tests for quad-double: approximations are way off

* adding min/max function regression tests for quad-double

* adding nextafter/toward regression tests to quad-double

* adding pow function regression test to quad-double

* adding truncating functions regression tests to quad-double

* adding trigonometry regression tests to quad-double

* adding sqrt regression test to quad-double
Ravenwater committed Aug 15, 2024
1 parent 15ce7cd commit 9aeed89
Showing 124 changed files with 8,353 additions and 1,369 deletions.
2 changes: 1 addition & 1 deletion .devcontainer/devcontainer.json
@@ -1,3 +1,3 @@
{
-"image": "stillwater/builders:clang16builder"
+"image": "stillwater/builders:clang18builder"
}
2 changes: 1 addition & 1 deletion .github/workflows/cmake.yml
@@ -2,7 +2,7 @@ name: CMake

on:
push:
-branches: [ v3.78, dev, main ]
+branches: [ v3.79, dev, main ]
pull_request:
branches: [ main ]

18 changes: 9 additions & 9 deletions CMakeLists.txt
@@ -25,7 +25,7 @@ if(NOT DEFINED UNIVERSAL_VERSION_MAJOR)
set(UNIVERSAL_VERSION_MAJOR 3)
endif()
if(NOT DEFINED UNIVERSAL_VERSION_MINOR)
-set(UNIVERSAL_VERSION_MINOR 78)
+set(UNIVERSAL_VERSION_MINOR 79)
endif()
if(NOT DEFINED UNIVERSAL_VERSION_PATCH)
set(UNIVERSAL_VERSION_PATCH 1)
@@ -130,8 +130,8 @@ option(BUILD_NUMBER_FIXPNTS "Set to ON to build static fixed-point
option(BUILD_NUMBER_BFLOATS "Set to ON to build static bfloat tests" OFF)
option(BUILD_NUMBER_CFLOATS "Set to ON to build static cfloat tests" OFF)
option(BUILD_NUMBER_DFLOATS "Set to ON to build static dfloat tests" OFF)
-option(BUILD_NUMBER_DDS "Set to ON to build static double-double tests" OFF)
-option(BUILD_NUMBER_QDS "Set to ON to build static quad-double tests" OFF)
+option(BUILD_NUMBER_DOUBLE_DOUBLE "Set to ON to build static double-double tests" OFF)
+option(BUILD_NUMBER_QUAD_DOUBLE "Set to ON to build static quad-double tests" OFF)
option(BUILD_NUMBER_AREALS "Set to ON to build static areal tests" OFF)
option(BUILD_NUMBER_UNUM1S "Set to ON to build static unum type 1 tests" OFF)
option(BUILD_NUMBER_UNUM2S "Set to ON to build static unum type 2 tests" OFF)
@@ -662,8 +662,8 @@ if(BUILD_NUMBER_STATICS)
set(BUILD_NUMBER_BFLOATS ON)
set(BUILD_NUMBER_CFLOATS ON)
set(BUILD_NUMBER_DFLOATS ON)
-set(BUILD_NUMBER_DDS ON)
-set(BUILD_NUMBER_QDS ON)
+set(BUILD_NUMBER_DOUBLE_DOUBLE ON)
+set(BUILD_NUMBER_QUAD_DOUBLE ON)
set(BUILD_NUMBER_AREALS ON)
set(BUILD_NUMBER_UNUM1S ON)
set(BUILD_NUMBER_UNUM2S ON)
@@ -827,14 +827,14 @@ add_subdirectory("static/dfloat")
endif(BUILD_NUMBER_DFLOATS)

# double-double floats
-if(BUILD_NUMBER_DDS)
+if(BUILD_NUMBER_DOUBLE_DOUBLE)
add_subdirectory("static/dd")
-endif(BUILD_NUMBER_DDS)
+endif(BUILD_NUMBER_DOUBLE_DOUBLE)

# quad-double floats
-if(BUILD_NUMBER_QDS)
+if(BUILD_NUMBER_QUAD_DOUBLE)
add_subdirectory("static/qd")
-endif(BUILD_NUMBER_QDS)
+endif(BUILD_NUMBER_QUAD_DOUBLE)

# conversion tests suites
if(BUILD_NUMBER_CONVERSIONS)
1 change: 1 addition & 0 deletions docker/Dockerfile.clang11builder
@@ -14,6 +14,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends -V \
curl \
vim \
gdb \
gdbserver \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

1 change: 1 addition & 0 deletions docker/Dockerfile.clang12builder
@@ -14,6 +14,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends -V \
curl \
vim \
gdb \
gdbserver \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

1 change: 1 addition & 0 deletions docker/Dockerfile.clang13builder
@@ -14,6 +14,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends -V \
curl \
vim \
gdb \
gdbserver \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

1 change: 1 addition & 0 deletions docker/Dockerfile.clang14builder
@@ -14,6 +14,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends -V \
curl \
vim \
gdb \
gdbserver \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

1 change: 1 addition & 0 deletions docker/Dockerfile.clang15builder
@@ -14,6 +14,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends -V \
curl \
vim \
gdb \
gdbserver \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

1 change: 1 addition & 0 deletions docker/Dockerfile.clang16builder
@@ -14,6 +14,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends -V \
curl \
vim \
gdb \
gdbserver \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

1 change: 1 addition & 0 deletions docker/Dockerfile.clang17builder
@@ -14,6 +14,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends -V \
curl \
vim \
gdb \
gdbserver \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

1 change: 1 addition & 0 deletions docker/Dockerfile.clang18builder
@@ -14,6 +14,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends -V \
curl \
vim \
gdb \
gdbserver \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

4 changes: 4 additions & 0 deletions docker/build_build_containers.sh
@@ -28,3 +28,7 @@ docker build --target clang15builder -t stillwater/builders:clang15builder -f Do
docker push stillwater/builders:clang15builder
docker build --target clang16builder -t stillwater/builders:clang16builder -f Dockerfile.clang16builder .
docker push stillwater/builders:clang16builder
docker build --target clang17builder -t stillwater/builders:clang17builder -f Dockerfile.clang17builder .
docker push stillwater/builders:clang17builder
docker build --target clang18builder -t stillwater/builders:clang18builder -f Dockerfile.clang18builder .
docker push stillwater/builders:clang18builder
2 changes: 1 addition & 1 deletion docker/build_release_container.sh
@@ -5,7 +5,7 @@
# example would be to strace an executable to find its dependencies

MAJOR=v3
-MINOR=78
+MINOR=79
VERSION="$MAJOR.$MINOR"

if [[ $# == 0 ]]; then
2 changes: 1 addition & 1 deletion docker/build_test_container.sh
@@ -11,7 +11,7 @@
# example would be to strace an executable to find its dependencies

MAJOR=v3
-MINOR=78
+MINOR=79
VERSION="$MAJOR.$MINOR"

if [[ $# == 0 ]]; then
3 changes: 2 additions & 1 deletion include/universal/blas/blas.hpp
@@ -7,10 +7,11 @@
//
// Super-simple BLAS implementation to aid application,
// numerical, and reproducibility examples.

#ifndef _UNIVERSAL_BLAS_LIBRARY
#define _UNIVERSAL_BLAS_LIBRARY

#include <cstdint>

// aggregation types for serialization
constexpr uint32_t UNIVERSAL_AGGREGATE_SCALAR = 0x1001;
constexpr uint32_t UNIVERSAL_AGGREGATE_VECTOR = 0x2002;
117 changes: 101 additions & 16 deletions include/universal/native/error_free_ops.hpp
@@ -110,22 +110,85 @@ namespace sw { namespace universal {
return s;
}

-// ThreeSum
+// ThreeSum enumerations

/// <summary>
-/// three_sum computes the relationship a + b + c = s + r
+/// three_sum computes the relationship x + y + z = r0 + r1 + r2
/// </summary>
-/// <param name="a">input</param>
-/// <param name="b">input</param>
-/// <param name="c">input value, output residual</param>
-inline void three_sum(volatile double& a, volatile double& b, volatile double& c) {
-volatile double t1, t2, t3;
+/// <param name="x">input, yields output r0 (==sum)</param>
+/// <param name="y">input, yields output r1</param>
+/// <param name="z">input, yields output r2</param>
+inline void three_sum(volatile double& x, volatile double& y, volatile double& z) {
+volatile double u, v, w;

u = two_sum(x, y, v);
x = two_sum(z, u, w); // x = r0 (==sum)
y = two_sum(v, w, z); // y = r1, and z = r2
}

/// <summary>
/// three_sum2 computes the relationship x + y + z = r0 + r1
/// </summary>
/// <param name="x">input, yields output r0 (==sum)</param>
/// <param name="y">input, yields output r1</param>
/// <param name="z">input</param>
inline void three_sum2(volatile double& x, volatile double& y, double z) {
volatile double u, v, w;

u = two_sum(x, y, v);
x = two_sum(z, u, w); // x = r0 (==sum)
y = v + w; // y = r1
}

-t1 = two_sum(a, b, t2);
-a = two_sum(c, t1, t3);
-b = two_sum(t2, t3, c);
/// <summary>
/// three_sum3 computes the relationship x + y + z = r0
/// just the sum of (x, y, z) without any residuals
/// </summary>
/// <param name="x">input</param>
/// <param name="y">input</param>
/// <param name="z">input</param>
/// <returns>the (rounded) sum of (x + y + z)</returns>
inline double three_sum3(double x, double y, double z) {
double u = x + y;
return u + z; // traditional information loss if z << (x + y) and/or y << x
}

/* */

/// <summary>
/// quick_three_accumulate calculates the relationship a + b + c = s + r
/// s = quick_three_accum(a, b, c) adds c to the dd-pair (a, b).
/// If the result does not fit in two doubles, then the sum is
/// output into s and (a, b) contains the remainder.Otherwise
/// s is zero and (a, b) contains the sum.
/// </summary>
/// <param name="a"></param>
/// <param name="b"></param>
/// <param name="c"></param>
/// <returns></returns>
inline double quick_three_accumulation(volatile double& a, volatile double& b, double c) {
volatile double s;
bool za, zb;

s = two_sum(b, c, b);
s = two_sum(a, s, a);

za = (a != 0.0);
zb = (b != 0.0);

if (za && zb)
return s;

if (!zb) {
b = a;
a = s;
}
else {
a = s;
}

return 0.0;
}

// Split
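
Reviewer note: the three_sum family above chains two_sum calls so that the rounding error of each addition is captured instead of discarded. The following is a minimal, self-contained sketch of that idea, not the library's code. It assumes two_sum is the classic Knuth TwoSum, and it uses plain `double&` parameters rather than the `volatile double&` used in error_free_ops.hpp. The `sketch` namespace and `three_sum_demo` are hypothetical names introduced for illustration. It also assumes strict IEEE double arithmetic (no `-ffast-math`).

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Knuth's TwoSum: returns s = fl(a + b) and writes the rounding error to err,
// so that a + b == s + err holds exactly (assuming no overflow).
inline double two_sum(double a, double b, double& err) {
    double s  = a + b;
    double bb = s - a;
    err = (a - (s - bb)) + (b - bb);
    return s;
}

// In-place three-way sum: on return x + y + z equals the original x + y + z,
// with x holding the rounded sum r0 and y, z holding the residuals r1, r2.
inline void three_sum(double& x, double& y, double& z) {
    double u, v, w;
    u = two_sum(x, y, v);
    x = two_sum(z, u, w);  // x = r0
    y = two_sum(v, w, z);  // y = r1, z = r2
}

// Small self-check: terms too small to survive a plain double addition
// are preserved in the residuals.
inline void three_sum_demo() {
    double x = 1.0, y = std::ldexp(1.0, -60), z = std::ldexp(1.0, -120);
    assert((x + y) + z == 1.0);   // naive sum drops both small terms
    three_sum(x, y, z);
    assert(x == 1.0);
    assert(y == std::ldexp(1.0, -60));
    assert(z == std::ldexp(1.0, -120));
}

} // namespace sketch
```

Calling `sketch::three_sum_demo()` exercises the case 1 + 2^-60 + 2^-120, where a plain double sum returns 1.0 and the sketch keeps the two small terms as residuals.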

@@ -165,8 +228,7 @@ namespace sw { namespace universal {
/// <param name="b">input</param>
/// <param name="r">reference to the residual</param>
/// <returns>the product of a * b</returns>
-inline double two_prod(double a, double b, volatile double& r)
-{
+inline double two_prod(double a, double b, volatile double& r) {
volatile double p = a * b;
if (std::isfinite(p)) {
#if defined( QD_FMS )
@@ -192,8 +254,7 @@ namespace sw { namespace universal {
/// <returns>the square product of a</returns>
inline double two_sqr(double a, volatile double& r) {
volatile double p = a * a;
-if (std::isfinite(p))
-{
+if (std::isfinite(p)) {
#if defined( QD_FMS )
err = QD_FMS(a, a, p);
#else
@@ -208,6 +269,30 @@ namespace sw { namespace universal {
}


// Computes the nearest integer to d
inline double nint(double d) {
if (d == std::floor(d)) return d;
return std::floor(d + 0.5);
}

// Computes the truncated integer
inline double aint(double d) {
return (d >= 0.0) ? std::floor(d) : std::ceil(d);
}

/* These are provided to give consistent
interface for double with double-double and quad-double. */
inline void sincosh(double t, double& sinh_t, double& cosh_t) {
sinh_t = std::sinh(t);
cosh_t = std::cosh(t);
}

// square of argument t
inline double sqr(double t) {
return t * t;
}


/// <summary>
/// renorm adjusts the quad-double to a canonical form
/// A quad-double number is an unevaluated sum of four IEEE double numbers.
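
Reviewer note: the nint() and aint() helpers added in this hunk are worth a quick illustration, because nint() rounds halfway cases toward positive infinity, which differs from std::round (half away from zero) for negative inputs. The sketch below copies the two definitions from the diff into a hypothetical `sketch` namespace and checks a few values. `rounding_demo` is an illustrative name, not part of the library.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Same body as the nint() in this hunk: integers pass through unchanged,
// everything else is rounded via floor(d + 0.5).
inline double nint(double d) {
    if (d == std::floor(d)) return d;
    return std::floor(d + 0.5);
}

// Same body as the aint() in this hunk: truncate toward zero.
inline double aint(double d) {
    return (d >= 0.0) ? std::floor(d) : std::ceil(d);
}

inline void rounding_demo() {
    assert(nint(2.5) == 3.0);
    assert(nint(-2.5) == -2.0);        // half rounds up, toward +infinity
    assert(std::round(-2.5) == -3.0);  // std::round goes away from zero
    assert(nint(-2.6) == -3.0);
    assert(aint(2.7) == 2.0);
    assert(aint(-2.7) == -2.0);        // matches std::trunc
}

} // namespace sketch
```

The early return for integral inputs presumably guards against large values where `d + 0.5` is not exactly representable. That reading is an inference from the code, not something the commit states.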
@@ -225,7 +310,7 @@ namespace sw { namespace universal {
/// <param name="a2"></param>
/// <param name="a3"></param>
inline void renorm(volatile double& a0, volatile double& a1, volatile double& a2, volatile double& a3) {
-volatile double s0, s1, s2 = 0.0, s3 = 0.0;
+volatile double s0, s1, s2{ 0.0 }, s3{ 0.0 };

if (std::isinf(a0)) return;

@@ -274,7 +359,7 @@ namespace sw { namespace universal {
/// <param name="a3">reference to a3</param>
/// <param name="a4">reference to a4</param>
inline void renorm(volatile double& a0, volatile double& a1, volatile double& a2, volatile double& a3, volatile double& a4) {
-volatile double s0, s1, s2 = 0.0, s3 = 0.0;
+volatile double s0, s1, s2{ 0.0 }, s3{ 0.0 };

if (std::isinf(a0)) return;

16 changes: 0 additions & 16 deletions include/universal/native/extract_fields.hpp
@@ -257,21 +257,5 @@ template<typename Real>
}
return bIsInf;
}

-inline void setFields(float& value, bool s, uint64_t rawExponentBits, uint64_t rawFractionBits) noexcept {
-float_decoder decoder;
-decoder.parts.sign = s;
-decoder.parts.exponent = rawExponentBits & 0xFF;
-decoder.parts.fraction = rawFractionBits & 0x7FFFFF;
-value = decoder.f;
-}
-
-inline void setFields(double& value, bool s, uint64_t rawExponentBits, uint64_t rawFractionBits) noexcept {
-double_decoder decoder;
-decoder.parts.sign = s;
-decoder.parts.exponent = rawExponentBits & 0x7FF;
-decoder.parts.fraction = rawFractionBits & 0xF'FFFF'FFFF'FFFF;
-value = decoder.d;
-}

}} // namespace sw::universal
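
Reviewer note: the setFields overloads removed here assemble a native float or double from a sign bit, raw exponent bits, and raw fraction bits. Since ieee754.hpp now includes set_fields.hpp, they have presumably moved there, although that file is not visible in this excerpt. The sketch below shows the same field composition for binary32. It swaps the union-with-bitfields decoder used by the removed code for an explicit shift-and-mask plus std::memcpy. `compose_float` and the `sketch` namespace are hypothetical names, and the layout assumes `float` is IEEE-754 binary32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace sketch {

// Compose a binary32 value from its three raw fields:
// 1 sign bit, 8 exponent bits, 23 fraction bits.
inline float compose_float(bool sign, std::uint32_t rawExponentBits,
                           std::uint32_t rawFractionBits) noexcept {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits = (static_cast<std::uint32_t>(sign) << 31)
                       | ((rawExponentBits & 0xFFu) << 23)
                       | (rawFractionBits & 0x7FFFFFu);
    float value;
    std::memcpy(&value, &bits, sizeof(value));  // well-defined bit copy
    return value;
}

inline void compose_demo() {
    assert(compose_float(false, 127, 0) == 1.0f);          // biased exponent 127 -> 2^0
    assert(compose_float(true, 128, 0x400000u) == -3.0f);  // -(1.5 * 2^1)
    assert(compose_float(false, 0, 0) == 0.0f);
}

} // namespace sketch
```

The masks mirror the ones in the removed code (`0xFF` and `0x7FFFFF`), so out-of-range inputs are truncated to the field width rather than spilling into neighboring fields.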
1 change: 1 addition & 0 deletions include/universal/native/ieee754.hpp
@@ -33,6 +33,7 @@
// constexpr compatible bit casts, otherwise
// fallback to nonconstexpr bit casts.
#include <universal/native/extract_fields.hpp>
#include <universal/native/set_fields.hpp>

// functions that do not need to be constexpr
#include <universal/native/nonconst_bitcast.hpp>
1 change: 1 addition & 0 deletions include/universal/native/ieee754_parameter.hpp
@@ -5,6 +5,7 @@
// SPDX-License-Identifier: MIT
//
// This file is part of the universal numbers project, which is released under an MIT Open Source license.
#include <cstdint>

namespace sw { namespace universal {

6 changes: 3 additions & 3 deletions include/universal/native/manipulators.hpp
@@ -297,7 +297,7 @@ namespace sw { namespace universal {
template<typename Real,
typename = typename std::enable_if< std::is_floating_point<Real>::value, Real >::type
>
-inline std::string color_print(Real number) {
+inline std::string color_print(Real number, bool nibbleMarker = false) {
std::stringstream s;

bool sign{ false };
@@ -322,7 +322,7 @@ namespace sw { namespace universal {
uint64_t mask = (1 << (ieee754_parameter<Real>::ebits - 1));
for (int i = (ieee754_parameter<Real>::ebits - 1); i >= 0; --i) {
s << cyan << ((rawExponent & mask) ? '1' : '0');
-// if (i > 0 && i % 4 == 0) s << cyan << '\'';
+if (nibbleMarker && i > 0 && i % 4 == 0) s << cyan << '\'';
mask >>= 1;
}
}
@@ -333,7 +333,7 @@ namespace sw { namespace universal {
uint64_t mask = (uint64_t(1) << (ieee754_parameter<Real>::fbits - 1));
for (int i = (ieee754_parameter<Real>::fbits - 1); i >= 0; --i) {
s << magenta << ((rawFraction & mask) ? '1' : '0');
-// if (i > 0 && i % 4 == 0) s << magenta << '\'';
+if (nibbleMarker && i > 0 && i % 4 == 0) s << magenta << '\'';
mask >>= 1;
}
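
Reviewer note: the new `nibbleMarker` parameter turns the previously commented-out separator logic into an opt-in feature. The sketch below isolates that loop without the color codes, to show where the `'` separators land. `to_bits` and the `sketch` namespace are hypothetical names for illustration only, and `nbits` is assumed to be between 1 and 64.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace sketch {

// Render the low nbits bits of raw, most significant bit first.
// With nibbleMarker set, a ' is emitted after every bit whose index is a
// non-zero multiple of 4, so groups are counted from the least significant bit.
inline std::string to_bits(std::uint64_t raw, int nbits, bool nibbleMarker = false) {
    std::string s;
    std::uint64_t mask = std::uint64_t(1) << (nbits - 1);
    for (int i = nbits - 1; i >= 0; --i) {
        s += (raw & mask) ? '1' : '0';
        if (nibbleMarker && i > 0 && i % 4 == 0) s += '\'';
        mask >>= 1;
    }
    return s;
}

inline void nibble_demo() {
    assert(to_bits(127, 8) == "01111111");
    assert(to_bits(127, 8, true) == "0111'1111");         // 8-bit float exponent of 1.0f
    assert(to_bits(0x400, 11, true) == "100'0000'0000");  // leading group is short
}

} // namespace sketch
```

Because grouping is anchored at the least significant bit, fields whose width is not a multiple of four (the 11-bit double exponent, the 23-bit float fraction) get a short leading group. Defaulting the flag to `false` keeps existing callers of color_print unchanged.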
