This section describes the relationship between arithmetic operations and fixed-point scaling, and some basic recommendations that may be appropriate for your fixed-point design. For each arithmetic operation:

The general slope/bias encoding scheme described in Scaling is used.
The scaling of the result is automatically selected based on the scaling of the two inputs. In other words, the scaling is inherited.
Scaling choices are based on:
- Minimizing the number of arithmetic operations needed to compute the result.
- Maximizing the precision of the result.
Additionally, radix point-only scaling is presented as a special case of the general encoding scheme.

In embedded systems, the scaling of variables at the hardware interface (the ADC or DAC) is fixed. However, for most other variables, you can choose the scaling that gives the best design. When scaling fixed-point variables, it is important to remember that:

Your scaling choices depend on the particular design you are simulating.
There is no best scaling approach. All choices have associated advantages and disadvantages. It is the goal of this section to expose these advantages and disadvantages to you.

Addition

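Consider the addition of two real-world values.

$$V_a = V_b + V_c$$

These values are represented by the general slope/bias encoding scheme described in Scaling.

$$V_i \approx \tilde{V}_i = F_i 2^{E_i} Q_i + B_i$$

In a fixed-point system, the addition of values results in finding the variable Q_a.

$$F_a 2^{E_a} Q_a + B_a = \left(F_b 2^{E_b} Q_b + B_b\right) + \left(F_c 2^{E_c} Q_c + B_c\right)$$

$$Q_a = \frac{F_b}{F_a} 2^{E_b - E_a} Q_b + \frac{F_c}{F_a} 2^{E_c - E_a} Q_c + \frac{B_b + B_c - B_a}{F_a 2^{E_a}}$$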

This formula shows:

Q_a is not computed through a simple addition of Q_b and Q_c.
In general, there are two multiplies of a constant and a variable, two additions, and some additional bit shifting.
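The general addition formula can be checked numerically with a minimal Python sketch. It assumes each scaling is passed as a hypothetical (F, E, B) tuple and uses exact `fractions.Fraction` arithmetic with Python's round-to-nearest as a stand-in for whatever rounding mode a real target uses; the function names are illustrative only.

```python
from fractions import Fraction as Fr

def real_value(q, scal):
    """Slope/bias encoding: V ~= F * 2**E * Q + B, with scal = (F, E, B)."""
    f, e, b = scal
    return Fr(f) * Fr(2) ** e * q + Fr(b)

def add_stored(qb, qc, b_scal, c_scal, a_scal):
    """Stored integer Q_a for V_a = V_b + V_c under general slope/bias scaling."""
    fb, eb, bb = b_scal
    fc, ec, bc = c_scal
    fa, ea, ba = a_scal
    two = Fr(2)
    qa = (Fr(fb) / fa) * two ** (eb - ea) * qb \
        + (Fr(fc) / fa) * two ** (ec - ea) * qc \
        + (Fr(bb) + bc - ba) / (Fr(fa) * two ** ea)
    # The powers of two correspond to bit shifts on a target.
    return round(qa)  # round half to even, an illustrative choice

b = (Fr("1.25"), -4, 3)    # V_b = 6.125 when Q_b = 40
c = (Fr("1.5"), -3, -1)    # V_c = 1.25 when Q_c = 12
a = (Fr("1.25"), -3, 2)
qa = add_stored(40, 12, b, c, a)
print(qa, float(real_value(qa, a)))  # 34 7.3125 (exact sum is 7.375)
```

The result differs from the exact sum by less than half of the output slope, which is the best any rounding choice can do at this scaling.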

Inherited Scaling for Speed

In the process of finding the scaling of the sum, one reasonable goal is to simplify the calculations. Simplifying the calculations should reduce the number of operations, thereby increasing execution speed. The following choices can help to minimize the number of arithmetic operations:

Set B_a = B_b + B_c. This eliminates one addition.
Set F_a = F_b or F_a = F_c. Either choice eliminates one of the two constant times variable multiplies.

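The resulting formulas are

$$Q_a = 2^{E_b - E_a} Q_b + \frac{F_c}{F_b} 2^{E_c - E_a} Q_c \qquad (F_a = F_b)$$

$$Q_a = \frac{F_b}{F_c} 2^{E_b - E_a} Q_b + 2^{E_c - E_a} Q_c \qquad (F_a = F_c)$$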

These equations appear to be equivalent. However, your choice of rounding and precision may make one choice stand out over the other. To further simplify matters, you could choose E_a = E_c or E_a = E_b. This will eliminate some bit shifting.
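As a minimal sketch of the speed-oriented choice F_a = F_b, B_a = B_b + B_c, plus the optional E_a = E_b, the following Python code (hypothetical function names, scalings as (F, E, B) tuples) shows that Q_b passes through untouched and only Q_c needs one multiply by a precomputed constant.

```python
from fractions import Fraction as Fr

def inherit_sum_scaling(b_scal, c_scal):
    """Output scaling chosen for speed: F_a = F_b, E_a = E_b, B_a = B_b + B_c."""
    fb, eb, bb = b_scal
    _, _, bc = c_scal
    return (Fr(fb), eb, Fr(bb) + bc)

def add_stored_fast(qb, qc, b_scal, c_scal):
    """Q_a = Q_b + (F_c / F_b) * 2**(E_c - E_b) * Q_c under the scaling above."""
    fb, eb, _ = b_scal
    fc, ec, _ = c_scal
    k = Fr(fc) / Fr(fb) * Fr(2) ** (ec - eb)  # constant, computed offline
    return qb + round(k * qc)                 # one multiply, one addition

b = (Fr("1.25"), -4, 3)
c = (Fr("1.5"), -3, -1)
print(inherit_sum_scaling(b, c)[1:], add_stored_fast(40, 12, b, c))
```

Rounding here happens only on the Q_c term, which is one reason the two equivalent-looking choices can behave differently in practice.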

Inherited Scaling for Maximum Precision

In the process of finding the scaling of the sum, one reasonable goal is maximum precision. The maximum precision scaling can be determined if the range of the variable is known. As shown in Example: Maximizing Precision, the range of a fixed-point operation can be determined from max(Ṽ_a) and min(Ṽ_a). For a summation, the range can be determined from

$$\max(\tilde{V}_a) = \max(\tilde{V}_b) + \max(\tilde{V}_c), \qquad \min(\tilde{V}_a) = \min(\tilde{V}_b) + \min(\tilde{V}_c)$$

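The maximum precision slope can now be derived. Write S_i = F_i 2^{E_i} for each slope and ws_i for each word size in bits, and assume each input spans its full stored-integer range, so that max(Ṽ_i) − min(Ṽ_i) = S_i(2^{ws_i} − 1). Then

$$S_a = \frac{\max(\tilde{V}_a) - \min(\tilde{V}_a)}{2^{ws_a} - 1} = \frac{(2^{ws_b} - 1) S_b + (2^{ws_c} - 1) S_c}{2^{ws_a} - 1}$$

In most cases the input and output word sizes are much greater than one, and the slope becomes

$$S_a \approx S_b 2^{ws_b - ws_a} + S_c 2^{ws_c - ws_a}$$

which depends only on the input slopes and the sizes of the input and output words. The corresponding bias is

$$B_a = \min(\tilde{V}_a) - S_a \min(Q_a) = B_b + B_c + S_b \min(Q_b) + S_c \min(Q_c) - S_a \min(Q_a)$$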

The value of the bias depends on whether the inputs and output are signed or unsigned numbers.

If the inputs and output are all unsigned, then the minimum stored integer values min(Q_a), min(Q_b), and min(Q_c) are all zero, and the bias reduces to a particularly simple form.

$$B_a = B_b + B_c$$

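If the inputs and the output are all signed (two's complement, so min(Q_i) = −2^{ws_i − 1}), then the bias becomes

$$B_a = B_b + B_c - S_b 2^{ws_b - 1} - S_c 2^{ws_c - 1} + S_a 2^{ws_a - 1} = B_b + B_c + \tfrac{1}{2}\left(S_a - S_b - S_c\right)$$

where the second form holds for the exact maximum precision slope S_a.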
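A minimal Python sketch of the maximum precision calculation for a sum: it takes the extreme stored integers of each word, adds the resulting real-world extremes, and fits the output word exactly onto that range. The function names and argument order are hypothetical; inputs are assumed to use their full stored-integer range, as in the derivation.

```python
from fractions import Fraction as Fr

def q_limits(ws, signed):
    """Smallest and largest stored integer for a ws-bit word (two's complement if signed)."""
    return (-2 ** (ws - 1), 2 ** (ws - 1) - 1) if signed else (0, 2 ** ws - 1)

def max_precision_sum(sb, bb, wsb, sc, bc, wsc, wsa, signed=True):
    """Return (S_a, B_a) mapping the output word exactly onto the range of V_b + V_c."""
    qb_lo, qb_hi = q_limits(wsb, signed)
    qc_lo, qc_hi = q_limits(wsc, signed)
    qa_lo, _ = q_limits(wsa, signed)
    v_min = sb * qb_lo + bb + sc * qc_lo + bc   # min(V_b) + min(V_c)
    v_max = sb * qb_hi + bb + sc * qc_hi + bc   # max(V_b) + max(V_c)
    sa = Fr(v_max - v_min) / (2 ** wsa - 1)
    ba = v_min - sa * qa_lo
    return sa, ba

# Two 8-bit signed inputs summed into a 16-bit signed output.
sa, ba = max_precision_sum(Fr(1, 16), Fr(0), 8, Fr(1, 8), Fr(1), 8, 16)
print(float(sa), float(ba))
```

For these inputs S_a lands very close to the large-word approximation S_b 2^{ws_b − ws_a} + S_c 2^{ws_c − ws_a} = 3/4096.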

Radix Point-Only Scaling

For radix point-only scaling, finding Q_a results in this simple expression.

$$Q_a = 2^{E_b - E_a} Q_b + 2^{E_c - E_a} Q_c$$

This scaling choice results in only one addition and some bit shifting. The avoidance of any multiplications is a big advantage of radix point-only scaling.
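To make "one addition and some bit shifting" concrete, here is a small Python sketch of radix point-only addition on stored integers. The helper names are hypothetical; Python's `>>` on negative integers rounds toward minus infinity, which matches an arithmetic right shift on a two's-complement machine but is only one of several possible rounding behaviors.

```python
def shift(q, n):
    """Scale integer q by 2**n: left shift if n >= 0, arithmetic right shift otherwise."""
    return q << n if n >= 0 else q >> -n

def add_radix(qb, eb, qc, ec, ea):
    """Q_a = 2**(E_b - E_a) * Q_b + 2**(E_c - E_a) * Q_c: two shifts and one addition."""
    return shift(qb, eb - ea) + shift(qc, ec - ea)

print(add_radix(40, -4, 12, -3, -4))  # 64, i.e. 64 * 2**-4 = 4.0 = 2.5 + 1.5
print(add_radix(40, -4, 12, -3, -2))  # 16, i.e. 16 * 2**-2 = 4.0
```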

The subtraction of values produces results that are analogous to those produced by the addition of values.

Accumulation

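The accumulation of values is closely associated with addition.

$$V_{a_{\text{new}}} = V_{a_{\text{old}}} + V_b$$

Because V_a appears on both sides with the same scaling, the output bias B_a cancels:

$$Q_{a_{\text{new}}} = Q_{a_{\text{old}}} + \frac{F_b}{F_a} 2^{E_b - E_a} Q_b + \frac{B_b}{F_a 2^{E_a}}$$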

Finding Q_{a_new} involves one multiply of a constant and a variable, two additions, and some bit shifting.

The important difference for fixed-point implementations is that the scaling of the output is identical to the scaling of the first input.

Radix Point-Only Scaling

For radix point-only scaling, finding Q_{a_new} results in this simple expression.

$$Q_{a_{\text{new}}} = Q_{a_{\text{old}}} + 2^{E_b - E_a} Q_b$$

This scaling option only involves one addition and some bit shifting.
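Both accumulation forms can be sketched in a few lines of Python. This is an illustration under stated assumptions: hypothetical function names, scalings as (F, E, B) tuples, exact `Fraction` arithmetic with round-to-nearest for the general case, and an arithmetic shift for the radix point-only case. Note that the output bias is never used, because it cancels.

```python
from fractions import Fraction as Fr

def accumulate(qa, qb, fa, ea, b_scal):
    """Q_a_new = Q_a_old + (F_b/F_a) * 2**(E_b - E_a) * Q_b + B_b / (F_a * 2**E_a)."""
    fb, eb, bb = b_scal
    two = Fr(2)
    step = Fr(fb) / fa * two ** (eb - ea) * qb + Fr(bb) / (Fr(fa) * two ** ea)
    return qa + round(step)

def accumulate_radix(qa, qb, ea, eb):
    """Radix point-only case: Q_a_new = Q_a_old + 2**(E_b - E_a) * Q_b."""
    n = eb - ea
    return qa + (qb << n if n >= 0 else qb >> -n)

b = (Fr("1.5"), -3, Fr("0.25"))  # V_b = 1.0 when Q_b = 4
qa = 0                            # with B_a = 0.5 the accumulator starts at V_a = 0.5
for _ in range(3):
    qa = accumulate(qa, 4, Fr(1), -4, b)
print(qa)  # 48, i.e. V_a = 48 * 2**-4 + 0.5 = 3.5
```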

The negative accumulation of values produces results that are analogous to those produced by the accumulation of values.

Multiplication

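Consider the multiplication of two real-world values.

$$V_a = V_b V_c$$

These values are represented by the general slope/bias encoding scheme described in Scaling.

$$V_i \approx \tilde{V}_i = F_i 2^{E_i} Q_i + B_i$$

In a fixed-point system, the multiplication of values results in finding the variable Q_a.

$$F_a 2^{E_a} Q_a + B_a = \left(F_b 2^{E_b} Q_b + B_b\right)\left(F_c 2^{E_c} Q_c + B_c\right)$$

$$Q_a = \frac{F_b F_c}{F_a} 2^{E_b + E_c - E_a} Q_b Q_c + \frac{F_b B_c}{F_a} 2^{E_b - E_a} Q_b + \frac{F_c B_b}{F_a} 2^{E_c - E_a} Q_c + \frac{B_b B_c - B_a}{F_a 2^{E_a}}$$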

This formula shows:

Q_a is not computed through a simple multiplication of Q_b and Q_c.
In general, there is one multiply of a constant and two variables, two multiplies of a constant and a variable, three additions, and some additional bit shifting.
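A minimal Python sketch of the general multiplication formula, again with hypothetical names, (F, E, B) tuples, and exact `Fraction` arithmetic followed by round-to-nearest. The four terms mirror the constant-times-two-variables product, the two constant-times-variable products, and the constant offset.

```python
from fractions import Fraction as Fr

def mul_stored(qb, qc, b_scal, c_scal, a_scal):
    """Stored integer Q_a for V_a = V_b * V_c under general slope/bias scaling."""
    fb, eb, bb = b_scal
    fc, ec, bc = c_scal
    fa, ea, ba = a_scal
    two = Fr(2)
    qa = (Fr(fb) * fc / fa) * two ** (eb + ec - ea) * qb * qc \
        + (Fr(fb) * bc / fa) * two ** (eb - ea) * qb \
        + (Fr(fc) * bb / fa) * two ** (ec - ea) * qc \
        + (Fr(bb) * bc - ba) / (Fr(fa) * two ** ea)
    return round(qa)

b = (Fr("1.25"), -4, 3)    # V_b = 6.125 when Q_b = 40
c = (Fr("1.5"), -3, -1)    # V_c = 1.25 when Q_c = 12
a = (1, -5, 0)             # radix point-only output, slope 2**-5
print(mul_stored(40, 12, b, c, a))  # 245, i.e. 245 * 2**-5 = 7.65625 = 6.125 * 1.25
```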

Inherited Scaling for Speed

The number of arithmetic operations can be reduced with these choices:

Set B_a = B_b B_c. This eliminates one addition operation.
Set F_a = F_b F_c. This simplifies the triple multiplication, which is certainly the most difficult part of the equation to implement.
Set E_a = E_b + E_c. This eliminates some of the bit-shifting.

The resulting formula is

$$Q_a = Q_b Q_c + \frac{B_c}{F_c} 2^{-E_c} Q_b + \frac{B_b}{F_b} 2^{-E_b} Q_c$$
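Under the inherited choices B_a = B_b B_c, F_a = F_b F_c, and E_a = E_b + E_c, the product of stored integers needs no scaling constant, and the two remaining constants can be computed offline. This Python sketch (hypothetical names, exact `Fraction` arithmetic, round-to-nearest) reuses the same example inputs as the general case.

```python
from fractions import Fraction as Fr

def mul_stored_fast(qb, qc, b_scal, c_scal):
    """Q_a = Q_b*Q_c + (B_c/F_c)*2**-E_c * Q_b + (B_b/F_b)*2**-E_b * Q_c,
    valid only for B_a = B_b*B_c, F_a = F_b*F_c, E_a = E_b + E_c."""
    fb, eb, bb = b_scal
    fc, ec, bc = c_scal
    two = Fr(2)
    kb = Fr(bc) / fc * two ** (-ec)   # offline constant multiplying Q_b
    kc = Fr(bb) / fb * two ** (-eb)   # offline constant multiplying Q_c
    return round(qb * qc + kb * qb + kc * qc)

b = (Fr("1.25"), -4, 3)
c = (Fr("1.5"), -3, -1)
print(mul_stored_fast(40, 12, b, c))  # 727; output scaling is F=15/8, E=-7, B=-3
```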

Inherited Scaling for Maximum Precision

The maximum precision scaling can be determined if the range of the variable is known. As shown in Example: Maximizing Precision, the range of a fixed-point operation can be determined from max(Ṽ_a) and min(Ṽ_a).

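For multiplication, the range can be determined from

$$\max(\tilde{V}_a) = \max(P_1, P_2, P_3, P_4), \qquad \min(\tilde{V}_a) = \min(P_1, P_2, P_3, P_4)$$

where

$$P_1 = \min(\tilde{V}_b)\min(\tilde{V}_c), \quad P_2 = \min(\tilde{V}_b)\max(\tilde{V}_c), \quad P_3 = \max(\tilde{V}_b)\min(\tilde{V}_c), \quad P_4 = \max(\tilde{V}_b)\max(\tilde{V}_c)$$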
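The four corner products can be evaluated directly, and the resulting range then fixes a maximum precision slope and bias. This is a minimal Python sketch; the function names are hypothetical, and the output word is assumed to be signed two's complement unless `signed=False`.

```python
from fractions import Fraction as Fr

def product_range(b_lo, b_hi, c_lo, c_hi):
    """min/max of V_b * V_c; the extremes occur at the corners P_1..P_4."""
    p = [b_lo * c_lo, b_lo * c_hi, b_hi * c_lo, b_hi * c_hi]
    return min(p), max(p)

def max_precision_scaling(v_lo, v_hi, ws, signed=True):
    """Slope and bias mapping the full stored-integer range onto [v_lo, v_hi]."""
    q_lo = -2 ** (ws - 1) if signed else 0
    s = Fr(v_hi - v_lo) / (2 ** ws - 1)
    return s, v_lo - s * q_lo

lo, hi = product_range(-2, 3, -5, 4)
print(lo, hi)  # -15 12
s, b = max_precision_scaling(lo, hi, 8)
```

Mixed-sign inputs are why all four products are needed: here the maximum comes from the two negative minima rather than from the two maxima.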

Radix Point-Only Scaling

For radix point-only scaling, finding Q_a results in this simple expression.

$$Q_a = 2^{E_b + E_c - E_a} Q_b Q_c$$
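With radix point-only scaling, multiplication reduces to one integer multiply and one shift. A short Python sketch, with a hypothetical function name and Python's arithmetic right shift standing in for the target's rounding:

```python
def mul_radix(qb, eb, qc, ec, ea):
    """Q_a = 2**(E_b + E_c - E_a) * Q_b * Q_c: one integer multiply and one shift."""
    n = eb + ec - ea
    p = qb * qc
    return p << n if n >= 0 else p >> -n

print(mul_radix(40, -4, 12, -3, -5))  # 120, i.e. 120 * 2**-5 = 3.75 = 2.5 * 1.5
```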

Gain

Consider the multiplication of a constant and a variable

$$V_a = K V_b$$

where K is a constant called the gain. Since V_a results from the multiplication of a constant and a variable, finding Q_a is a simplified version of the general fixed-point multiply formula.

$$Q_a = \left(\frac{K F_b}{F_a} 2^{E_b - E_a}\right) Q_b + \left(\frac{K B_b - B_a}{F_a 2^{E_a}}\right)$$

Note that the terms in the parentheses can be calculated offline. Therefore, there is only one multiplication of a constant and a variable and one addition.

To implement the above equation without changing it to a more complicated form, the constants need to be encoded using a radix point-only format. For each of these constants, the range is the trivial case of only one value. Despite the trivial range, the radix point formulas for maximum precision are still valid. The maximum precision representations are the most useful choices unless there is an overriding need to avoid any shifting. The encoding of the constants is

$$\frac{K F_b}{F_a} 2^{E_b - E_a} \approx Q_{C_1} 2^{E_{C_1}}, \qquad \frac{K B_b - B_a}{F_a 2^{E_a}} \approx Q_{C_2} 2^{E_{C_2}}$$

resulting in the formula

$$Q_a = Q_{C_1} Q_b 2^{E_{C_1}} + Q_{C_2} 2^{E_{C_2}}$$
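The offline step can be sketched in Python: compute the two constants exactly, then encode each one radix point-only with the smallest exponent whose stored integer still fits a signed word (maximum precision). All names are hypothetical, the default 16-bit constant word size is an assumption, and the final round-to-nearest stands in for the shifts and rounding a target would use.

```python
import math
from fractions import Fraction as Fr

def encode_radix(c, ws):
    """Encode c as Q * 2**E, Q a signed ws-bit integer, E as small as possible."""
    c = Fr(c)
    if c == 0:
        return 0, 0
    lo, hi = -2 ** (ws - 1), 2 ** (ws - 1) - 1
    e = math.floor(math.log2(abs(c))) - ws - 1   # deliberately too small to fit
    while not lo <= round(c / Fr(2) ** e) <= hi:
        e += 1                                    # |Q| shrinks as E grows
    return round(c / Fr(2) ** e), e

def gain_constants(k, b_scal, a_scal, ws=16):
    """C1 = K*F_b/F_a * 2**(E_b - E_a) and C2 = (K*B_b - B_a) / (F_a * 2**E_a)."""
    fb, eb, bb = b_scal
    fa, ea, ba = a_scal
    two = Fr(2)
    c1 = Fr(k) * fb / fa * two ** (eb - ea)
    c2 = (Fr(k) * bb - ba) / (Fr(fa) * two ** ea)
    return encode_radix(c1, ws), encode_radix(c2, ws)

def gain_stored(qb, consts):
    """Q_a = Q_C1 * Q_b * 2**E_C1 + Q_C2 * 2**E_C2 (rounded to nearest here)."""
    (qc1, ec1), (qc2, ec2) = consts
    two = Fr(2)
    return round(qc1 * qb * two ** ec1 + qc2 * two ** ec2)

consts = gain_constants(3, (Fr(1), -4, Fr(1)), (Fr(1), -2, Fr(0)))
print(consts, gain_stored(8, consts))  # ((24576, -15), (24576, -11)) 18
```

In this example V_b = 8 · 2^−4 + 1 = 1.5, so V_a = 4.5 and Q_a = 4.5 / 2^−2 = 18.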

Inherited Scaling for Speed

The number of arithmetic operations can be reduced with these choices:

Set B_a = K B_b. This eliminates one constant term.
Set F_a = K F_b and E_a = E_b. This sets the other constant term to unity.

The resulting formula is simply Q_a = Q_b.

If the input and output word sizes differ, then handling potential overflows or performing sign extensions are the only operations involved.

Inherited Scaling for Maximum Precision

The scaling for maximum precision does not need to be different from the scaling for speed unless the output has fewer bits than the input. In that case, avoid saturation by multiplying the slope by 2 (increasing E_a by one) for each lost bit. This prevents saturation but causes rounding to occur.
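A minimal Python sketch of inherited gain scaling with a shorter output word: F_a = K F_b and B_a = K B_b keep Q_a = Q_b, and each lost bit raises E_a by one while the low bits of Q_b are dropped. The function name is hypothetical, and the arithmetic right shift (round toward minus infinity) is one possible rounding choice; it has the property that the full signed input range always fits the shorter word.

```python
from fractions import Fraction as Fr

def gain_inherited(qb, b_scal, k, ws_b, ws_a):
    """Return (Q_a, (F_a, E_a, B_a)) for V_a = K * V_b with inherited scaling.
    One extra unit of E_a (a doubled slope) per bit the output word loses."""
    fb, eb, bb = b_scal
    lost = max(ws_b - ws_a, 0)
    a_scal = (Fr(k) * fb, eb + lost, Fr(k) * bb)
    qa = qb >> lost   # drops low bits: rounding instead of saturation
    return qa, a_scal

qa, a = gain_inherited(1024, (Fr(1), -4, Fr(1)), 3, 16, 8)
print(qa)  # 4, i.e. V_a = 3 * 2**4 * 4 + 3 = 195 = 3 * (1024 * 2**-4 + 1)
```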

Division

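Division of values is an operation that should be avoided in fixed-point embedded systems, but it sometimes cannot be avoided. Therefore, consider the division of two real-world values.

$$V_a = \frac{V_b}{V_c}$$

These values are represented by the general slope/bias encoding scheme described in Scaling.

$$V_i \approx \tilde{V}_i = F_i 2^{E_i} Q_i + B_i$$

In a fixed-point system, the division of values results in finding the variable Q_a.

$$F_a 2^{E_a} Q_a + B_a = \frac{F_b 2^{E_b} Q_b + B_b}{F_c 2^{E_c} Q_c + B_c}$$

$$Q_a = \frac{F_b 2^{E_b - E_a}}{F_a} \cdot \frac{Q_b}{F_c 2^{E_c} Q_c + B_c} + \frac{B_b 2^{-E_a}}{F_a} \cdot \frac{1}{F_c 2^{E_c} Q_c + B_c} - \frac{B_a}{F_a 2^{E_a}}$$

This formula shows:

Q_a is not computed through a simple division of Q_b by Q_c.
In general, there are two multiplies of a constant and a variable, three additions (counting the subtraction of the output bias term), one division of a variable by a variable, one division of a constant by a variable, and some additional bit shifting.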
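The general division formula can be checked with a Python sketch in the same style as the other operations: hypothetical names, (F, E, B) tuples, exact `Fraction` arithmetic, and round-to-nearest. A zero real-world denominator makes `Fraction` raise `ZeroDivisionError`; a target must define its own behavior for that case.

```python
from fractions import Fraction as Fr

def div_stored(qb, qc, b_scal, c_scal, a_scal):
    """Stored integer Q_a for V_a = V_b / V_c under general slope/bias scaling."""
    fb, eb, bb = b_scal
    fc, ec, bc = c_scal
    fa, ea, ba = a_scal
    two = Fr(2)
    den = Fr(fc) * two ** ec * qc + bc                     # real-world V_c
    qa = (Fr(fb) / fa * two ** (eb - ea)) * (qb / den) \
        + (Fr(bb) / fa * two ** (-ea)) / den \
        - Fr(ba) / (Fr(fa) * two ** ea)                   # output bias term
    return round(qa)

b = (Fr("1.25"), -4, 3)    # V_b = 6.125 when Q_b = 40
c = (Fr("1.5"), -3, -1)    # V_c = 1.25 when Q_c = 12
a = (Fr(1), -3, Fr(1))
print(div_stored(40, 12, b, c, a))  # 31, i.e. 31 * 2**-3 + 1 = 4.875 (exact 4.9)
```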

Inherited Scaling for Speed

The number of arithmetic operations can be reduced with these choices:

Set B_a = 0. This eliminates one addition operation.
If B_c = 0, then set the fractional slope F_a = F_b/F_c. This eliminates one constant times variable multiplication.

The resulting formula is

$$Q_a = 2^{E_b - E_c - E_a} \frac{Q_b}{Q_c} + \frac{B_b}{F_b} 2^{-E_c - E_a} \frac{1}{Q_c}$$

If B_c ≠ 0, then no clear recommendation can be made.
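When B_a = 0, B_c = 0, and F_a = F_b/F_c, only a shifted variable-by-variable quotient and a constant-by-variable quotient remain. This Python sketch (hypothetical name, exact `Fraction` arithmetic, round-to-nearest) refuses inputs that violate the B_c = 0 assumption rather than silently computing a wrong result.

```python
from fractions import Fraction as Fr

def div_stored_fast(qb, qc, b_scal, c_scal, ea):
    """Q_a = 2**(E_b - E_c - E_a) * Q_b/Q_c + (B_b/F_b) * 2**(-E_c - E_a) / Q_c,
    assuming B_a = 0, B_c = 0 and F_a = F_b / F_c."""
    fb, eb, bb = b_scal
    _fc, ec, bc = c_scal
    if bc != 0:
        raise ValueError("this simplification assumes B_c = 0")
    two = Fr(2)
    k = Fr(bb) / fb * two ** (-ec - ea)                # offline constant
    return round(two ** (eb - ec - ea) * Fr(qb, qc) + k / qc)

b = (Fr("1.25"), -4, 3)    # V_b = 6.125 when Q_b = 40
c = (Fr("1.5"), -3, 0)     # V_c = 1.5 when Q_c = 8
print(div_stored_fast(40, 8, b, c, -4))  # 78; slope F_a * 2**E_a = (5/6) * 2**-4
```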

Inherited Scaling for Maximum Precision

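The maximum precision scaling can be determined if the range of the variable is known. As shown in Example: Maximizing Precision, the range of a fixed-point operation can be determined from max(Ṽ_a) and min(Ṽ_a). For division, the range can be determined from

$$\max(\tilde{V}_a) = \max(P_1, P_2, P_3, P_4), \qquad \min(\tilde{V}_a) = \min(P_1, P_2, P_3, P_4)$$

where, provided the range of the denominator does not include zero,

$$P_1 = \frac{\min(\tilde{V}_b)}{\min(\tilde{V}_c)}, \quad P_2 = \frac{\min(\tilde{V}_b)}{\max(\tilde{V}_c)}, \quad P_3 = \frac{\max(\tilde{V}_b)}{\min(\tilde{V}_c)}, \quad P_4 = \frac{\max(\tilde{V}_b)}{\max(\tilde{V}_c)}$$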
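The quotient range follows the same corner rule as multiplication, with an explicit check that the denominator range excludes zero. A minimal Python sketch with a hypothetical function name:

```python
from fractions import Fraction as Fr

def quotient_range(b_lo, b_hi, c_lo, c_hi):
    """min/max of V_b / V_c from the corner quotients P_1..P_4."""
    if c_lo <= 0 <= c_hi:
        raise ValueError("denominator range contains zero; V_a is unbounded")
    p = [Fr(b_lo) / c_lo, Fr(b_lo) / c_hi, Fr(b_hi) / c_lo, Fr(b_hi) / c_hi]
    return min(p), max(p)

print(quotient_range(2, 6, -4, -1))  # (Fraction(-6, 1), Fraction(-1, 2))
```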

Radix Point-Only Scaling

For radix point-only scaling, finding Q_a results in this simple expression.

$$Q_a = 2^{E_b - E_c - E_a} \frac{Q_b}{Q_c}$$

For the last two formulas involving Q_a, division by zero and zero divided by zero are both possible. In these cases, the hardware gives some default behavior, but you must make sure that these default responses give meaningful results for the embedded system.
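One way to make the zero-denominator behavior explicit is sketched below in Python for radix point-only division. The policy shown (saturate on x/0, return 0 for 0/0, saturate any out-of-range quotient, truncate toward zero as C integer division does) is a hypothetical example, not the behavior of any particular hardware or of the blockset.

```python
def c_div(n, d):
    """Integer division truncating toward zero, as in C."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q

def div_radix(qb, eb, qc, ec, ea, ws):
    """Q_a = 2**(E_b - E_c - E_a) * Q_b / Q_c, saturated to a signed ws-bit word."""
    lo, hi = -2 ** (ws - 1), 2 ** (ws - 1) - 1
    if qc == 0:                      # hypothetical policy for x/0 and 0/0
        return 0 if qb == 0 else (hi if qb > 0 else lo)
    n = eb - ec - ea
    q = c_div(qb << n, qc) if n >= 0 else c_div(qb, qc << -n)
    return max(lo, min(hi, q))

print(div_radix(40, -4, 12, -3, -5, 16))  # 53, i.e. 53 * 2**-5 ~= 2.5 / 1.5
```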

Fixed-Point Blockset

Recommendations for Arithmetic and Scaling
