How would you store a number like 3.1415 (𝝿) or 9.109 × 10⁻³¹ (the mass of the electron in kg) in a memory that is limited to a finite number of ones and zeroes (aka bits)? Let's say we have 16 bits (2 bytes) to store the number. In 16 bits we may store the integers in a range of [0, 65535]:
(0000000000000000)₂ = (0)₁₀

(0000000000010001)₂ = (1 × 2⁴) + (0 × 2³) + (0 × 2²) + (0 × 2¹) + (1 × 2⁰) = (17)₁₀

(1111111111111111)₂ = (1 × 2¹⁵) + (1 × 2¹⁴) + (1 × 2¹³) + (1 × 2¹²) + (1 × 2¹¹) + (1 × 2¹⁰) + (1 × 2⁹) + (1 × 2⁸) + (1 × 2⁷) + (1 × 2⁶) + (1 × 2⁵) + (1 × 2⁴) + (1 × 2³) + (1 × 2²) + (1 × 2¹) + (1 × 2⁰) = (65535)₁₀
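The conversions above can be double-checked with JavaScript's built-in base-2 parsing and formatting:

```javascript
// parseInt with radix 2 converts a binary string to its decimal value,
// and toString(2) goes the other way (leading zeros are dropped).
const seventeen = parseInt('0000000000010001', 2);  // -> 17
const maxUint16 = parseInt('1111111111111111', 2);  // -> 65535

console.log(seventeen, maxUint16);  // -> 17 65535
console.log((17).toString(2));      // -> "10001"
```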
If we also need to store negative numbers, we may shift the range of [0, 65535] towards the negative numbers. In this case, our 16 bits would represent the numbers in a range of [-32768, +32767].
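This 16-bit signed behavior can be demonstrated with JavaScript's `Int16Array`, which stores exactly this kind of value (a quick sketch, not part of the converter below):

```javascript
const int16 = new Int16Array(3);

int16[0] = 32767;      // fits: the maximum 16-bit signed value
int16[1] = 32768;      // one too large: wraps around to -32768
int16[2] = -27.15625;  // the fraction is ignored: stored as -27

console.log(int16[0]); // -> 32767
console.log(int16[1]); // -> -32768
console.log(int16[2]); // -> -27
```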
With this approach, a number like -27.15625 could only be stored as -27 (the numbers after the decimal point are just being ignored). To store fractions, floating-point formats use a kind of binary scientific notation with 2 as a base.

| Floating-point format | Total bits | Sign bits | Exponent bits | Fraction bits | Base |
| --- | --- | --- | --- | --- | --- |
| Half-precision | 16 | 1 | 5 | 10 | 2 |
| Single-precision | 32 | 1 | 8 | 23 | 2 |
| Double-precision | 64 | 1 | 11 | 52 | 2 |
For example, a signed 32-bit integer variable has a maximum value of 2³¹ − 1 = 2,147,483,647, whereas an IEEE 754 32-bit base-2 floating-point variable has a maximum value of ≈ 3.4028235 × 10³⁸.
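That difference can be checked directly in JavaScript (a sketch; `Float32Array` stores IEEE 754 single-precision values):

```javascript
const maxInt32 = 2 ** 31 - 1;  // -> 2147483647

// Single-precision can hold magnitudes far beyond the 32-bit integer range.
const f32 = new Float32Array(1);
f32[0] = 3.4028235e38;         // close to the single-precision maximum

console.log(maxInt32);             // -> 2147483647
console.log(Number.isFinite(f32[0])); // -> true
```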
Take half-precision as an example: its 5 exponent bits can encode values in a range of [0, 31] (all values are positive here). But if we subtract the value of 15 from it, the range will be [-15, 16]. The number 15 is called the bias, and it is calculated by the following formula:

exponent_bias = 2 ^ (k−1) − 1

where k is the number of exponent bits.
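Applying this formula to the three formats from the table above gives the standard bias values:

```javascript
// exponent_bias = 2 ** (k - 1) - 1, where k is the number of exponent bits.
const exponentBias = (exponentBitsCount) => 2 ** (exponentBitsCount - 1) - 1;

console.log(exponentBias(5));   // -> 15   (half-precision)
console.log(exponentBias(8));   // -> 127  (single-precision)
console.log(exponentBias(11));  // -> 1023 (double-precision)
```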
Check out the 👉🏻 interactive version of this diagram to play around with setting bits on and off, and see how that influences the final result.
The smallest and the largest exponent values are reserved for special numbers such as -0, -∞, +∞ and NaN (not a number).

| Floating-point format | Exp min | Exp max | Range | Min positive |
| --- | --- | --- | --- | --- |
| Half-precision | −14 | +15 | ±65,504 | 6.10 × 10⁻⁵ |
| Single-precision | −126 | +127 | ±3.4028235 × 10³⁸ | 1.18 × 10⁻³⁸ |
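The effect of these reserved values can be observed through a `Float32Array` (a sketch in plain JavaScript):

```javascript
const f32 = new Float32Array(1);

f32[0] = 1e-46;       // below the smallest positive single-precision value
console.log(f32[0]);  // -> 0 (underflow)

console.log(1 / -Infinity);        // -> -0
console.log(Object.is(-0, 0));     // -> false: -0 has its own bit pattern
console.log(Number.isNaN(0 / 0));  // -> true
```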
const singlePrecisionBytesLength = 4; // 32 bits
const doublePrecisionBytesLength = 8; // 64 bits
const bitsInByte = 8;
/**
* Converts the float number into its IEEE 754 binary representation.
* @see: https://en.wikipedia.org/wiki/IEEE_754
*
* @param {number} floatNumber - float number in decimal format.
* @param {number} byteLength - number of bytes to use to store the float number.
* @return {string} - binary string representation of the float number.
*/
function floatAsBinaryString(floatNumber, byteLength) {
  let numberAsBinaryString = '';

  const arrayBuffer = new ArrayBuffer(byteLength);
  const dataView = new DataView(arrayBuffer);

  const byteOffset = 0;
  const littleEndian = false;

  if (byteLength === singlePrecisionBytesLength) {
    dataView.setFloat32(byteOffset, floatNumber, littleEndian);
  } else {
    dataView.setFloat64(byteOffset, floatNumber, littleEndian);
  }

  for (let byteIndex = 0; byteIndex < byteLength; byteIndex += 1) {
    // toString(2) drops leading zeros, so pad each byte back to 8 bits.
    const bits = dataView.getUint8(byteIndex).toString(2).padStart(bitsInByte, '0');
    numberAsBinaryString += bits;
  }

  return numberAsBinaryString;
}
/**
* Converts the float number into its IEEE 754 64-bits binary representation.
*
* @param {number} floatNumber - float number in decimal format.
* @return {string} - 64 bits binary string representation of the float number.
*/
function floatAs64BinaryString(floatNumber) {
  return floatAsBinaryString(floatNumber, doublePrecisionBytesLength);
}
/**
* Converts the float number into its IEEE 754 32-bits binary representation.
*
* @param {number} floatNumber - float number in decimal format.
* @return {string} - 32 bits binary string representation of the float number.
*/
function floatAs32BinaryString(floatNumber) {
  return floatAsBinaryString(floatNumber, singlePrecisionBytesLength);
}
// Usage example
floatAs32BinaryString(1.875); // -> "00111111111100000000000000000000"
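To see how the three parts of that string map back to 1.875, here is a small decoding sketch (not part of the converter above; it hard-codes the 32-bit output for self-containment):

```javascript
const bits = '00111111111100000000000000000000'; // floatAs32BinaryString(1.875)

const sign = bits[0] === '1' ? -1 : 1;                 // 1 sign bit
const exponent = parseInt(bits.slice(1, 9), 2) - 127;  // 8 exponent bits, bias 127
const fractionBits = bits.slice(9);                    // 23 fraction bits

// The fraction encodes the binary digits after the implicit leading "1.".
let mantissa = 1;
for (let i = 0; i < fractionBits.length; i += 1) {
  if (fractionBits[i] === '1') {
    mantissa += 2 ** -(i + 1);
  }
}

console.log(sign * mantissa * 2 ** exponent); // -> 1.875
```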