Here are some of the data types that may be used.

float32

float16 - IEEE 754 half-precision binary floating-point

  • format: binary16
  • sign bit: 1 bit
  • exponent: 5 bits
  • significand precision: 11 bits (10 explicitly stored)

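* `exponent` is the value of the 5-bit exponent field read as an unsigned binary integer; the backticked bits are the 10 stored significand bits, also read as an unsigned integer.
* rule1 (exponent field is 00000, subnormal numbers): value = (-1)^sign × 2^(-14) × (significand/1024)
   * 0 00000 `1111111111` = (-1)^0 × 2^(-14) × (`1023`/1024) ≈ 0.0000609756
* rule2 (exponent field between 00001 and 11110, normal numbers): value = (-1)^sign × 2^(exponent-15) × (1 + significand/1024)
  * 0 01111 `0000000001` = (-1)^0 × 2^(15-15) × (1 + `1`/1024) = 1.0009765625
  * 0 01101 `0101010101` = (-1)^0 × 2^(13-15) × (1 + `341`/1024) = 0.333251953125 ≈ 1/3
* rule3 (exponent field is 11111): infinity when the significand bits are all zero, NaN otherwise
    * 0 11111 0000000000 = infinity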
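The three rules above can be checked with a few lines of Python. This is only a minimal sketch: the function names `decode_binary16` and `decode_with_struct` are made up for this note, and the input is assumed to be a string of 16 bits with optional spaces between the fields. The second function uses the standard library's `struct` module (format code `e`, half precision) as an independent cross-check.

```python
import struct


def decode_binary16(bits: str) -> float:
    """Decode a 16-bit pattern such as '0 01111 0000000001' by hand."""
    bits = bits.replace(" ", "")
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 16 binary digits")

    sign = int(bits[0], 2)        # 1 bit
    exponent = int(bits[1:6], 2)  # 5 bits, read as an unsigned integer
    fraction = int(bits[6:], 2)   # 10 stored significand bits
    sign_factor = -1.0 if sign else 1.0

    if exponent == 0:
        # rule1: subnormal, no implicit leading 1, fixed exponent of -14
        return sign_factor * 2.0 ** -14 * (fraction / 1024)
    if exponent == 31:
        # rule3: infinity when the fraction is zero, NaN otherwise
        return sign_factor * float("inf") if fraction == 0 else float("nan")
    # rule2: normal, implicit leading 1 and an exponent bias of 15
    return sign_factor * 2.0 ** (exponent - 15) * (1 + fraction / 1024)


def decode_with_struct(bits: str) -> float:
    """Cross-check: let the struct module interpret the same 16 bits."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(2, "big")
    return struct.unpack(">e", raw)[0]


# The example bit patterns from the rules above.
for pattern in (
    "0 00000 1111111111",
    "0 01111 0000000001",
    "0 01101 0101010101",
    "0 11111 0000000000",
):
    print(pattern, decode_binary16(pattern), decode_with_struct(pattern))
```

Both functions should agree on every pattern, which is a quick way to confirm that the hand-written rules match what the hardware format actually stores.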

symmetric int8

asymmetric uint8

symmetric int16

asymmetric uint16

