Homework 1 - CSCI-564 Advanced Computer Architecture

CWID: 10719035

\(CPI_{alu} = 1.1\)

\(CPI_{branch} = 3.0\)

\(Cache\_Hitratio = 60\%\)

\(CPI_{hit} = 1\)

\(CPI_{miss} = 120\)

\(Percentage_{branch} = 20\%\) of the instructions are branches

\(Percentage_{load} = 22\%\) of the instructions are loads

\(Percentage_{stores} = 12\%\) of the instructions are stores

The loads & stores are impacted by the cache hit ratio.

To find the \(CPI_{total}\) we use the weighted sum \[CPI_{total} = \sum_{i=1}^{n} {CPI_i \times Percentage_i}\] which for this instruction mix expands to \(CPI_{total} = (CPI_{alu} \times Percentage_{alu}) + (CPI_{branch} \times Percentage_{branch}) + (CPI_{load} \times Percentage_{load}) + (CPI_{store} \times Percentage_{store})\)

- First
To calculate the \(CPI_{load}\) and \(CPI_{store}\) we determine the \(CPI\) based on the hit ratio of the cache. We’re assuming here that loads and stores access the cache and memory with the same distribution. \(CPI_{cache} = (CPI_{hit} \times Cache\_Hitratio) + (CPI_{miss} \times (100\% - Cache\_Hitratio)) = (1 \times 60\%) + (120 \times 40\%) = 48.6\)

- Second
Next we need to determine the percentage of ALU instructions which is calculated as:

\(Percentage_{alu} = 100\% - (Percentage_{branch} + Percentage_{load} + Percentage_{store}) = 100\% - (20\% + 22\% + 12\%) = 46\%\)

- Third
Now that we have the \(CPI_{cache}\) we can calculate the \(CPI_{total}\). Each CPI is multiplied by the percentage of executed instructions of that type. The \(CPI_{load}\) and \(CPI_{store}\) are both replaced with the \(CPI_{cache}\).

\(CPI_{total} = (1.1 \times 46\%) + (3 \times 20\%) + (48.6 \times 22\%) + (48.6 \times 12\%) = \textbf{17.63}\)
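The three steps above can be checked with a short script. This is a minimal sketch that only restates the given values; the function and constant names are hypothetical and chosen for illustration.

```python
# Weighted-average CPI for the instruction mix given in the problem.
# All names below are illustrative; the values come from the problem statement.
CPI_ALU = 1.1
CPI_BRANCH = 3.0
CPI_HIT = 1.0
CPI_MISS = 120.0
HIT_RATIO = 0.60

FRAC_BRANCH = 0.20
FRAC_LOAD = 0.22
FRAC_STORE = 0.12


def memory_cpi(hit_ratio, cpi_hit, cpi_miss):
    """Average CPI of a memory instruction, weighted by the cache hit ratio."""
    return hit_ratio * cpi_hit + (1.0 - hit_ratio) * cpi_miss


def total_cpi():
    """Sum of CPI_i * fraction_i over ALU, branch, load, and store instructions."""
    cpi_cache = memory_cpi(HIT_RATIO, CPI_HIT, CPI_MISS)
    # Whatever is not a branch, load, or store is treated as an ALU instruction.
    frac_alu = 1.0 - (FRAC_BRANCH + FRAC_LOAD + FRAC_STORE)
    return (CPI_ALU * frac_alu
            + CPI_BRANCH * FRAC_BRANCH
            + cpi_cache * FRAC_LOAD
            + cpi_cache * FRAC_STORE)


if __name__ == "__main__":
    print(f"CPI_cache = {memory_cpi(HIT_RATIO, CPI_HIT, CPI_MISS):.1f}")
    print(f"CPI_total = {total_cpi():.2f}")
```

Keeping `memory_cpi` separate makes it easy to see how sensitive the total is to the hit ratio, since the miss penalty dominates the result.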

When comparing two processor designs we have to make assumptions so that the comparison is fair. In this problem we have two processor options: processor **A**, which runs at *1 GHz* with an average CPI of 1.2, and processor **B**, which runs at *2 GHz* with a CPI of 2.

To compare two processors we would typically have to compare cost, performance, and energy/power consumption. These three factors carry different weights depending on the intended use of the processor. For example, energy carries a high weight if the intended use is a mobile device.

The problem does not provide enough detail about cost, energy, or intended use. That leaves performance as the only basis for comparison, and for it we use the Performance Equation \[{L = IC \times CPI \times CT}\] where \(L\) is the latency (execution time), \(IC\) the instruction count, \(CPI\) the cycles per instruction, and \(CT\) the cycle time.

We can assume that the workload is the same on both processors, which means that \(IC\) is constant and can be dropped from the comparison. That leaves \(CPI\) and \(CT\): \[{L = CPI \times CT}\]

Since the clock is given as a frequency in GHz, we replace \(CT\), the clock cycle time in seconds, with the clock frequency \(f = \frac{1}{CT}\). The equation becomes \(L = \frac{CPI}{f}\). Table one shows the results of calculating \(L\) for each processor.

Processor | Latency Calc | Latency (s per instruction) |
---|---|---|
A | \(\frac{1.2}{1 \times 10^{9}}\) | \(1.2 \times 10^{-9}\) |
B | \(\frac{2}{2 \times 10^{9}}\) | \(1 \times 10^{-9}\) |

To determine the improvement of one over the other, we use the speedup equation \[{Speedup =\frac{Latency_{A}}{Latency_{B}} = \frac{1.2 \times 10^{-9}}{1 \times 10^{-9}} = 1.2}\] Processor B therefore provides a **1.2x** speedup over processor A.
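The latency and speedup calculation can be sketched in a few lines. This assumes, as the text does, that both processors execute the same instruction count; the function names are hypothetical.

```python
def latency_per_instruction(cpi, freq_hz):
    """Seconds per instruction: CPI * cycle time, i.e. CPI / f."""
    return cpi / freq_hz


def speedup(latency_old, latency_new):
    """How many times faster the new design is than the old one."""
    return latency_old / latency_new


# Values from the problem: A runs at 1 GHz with CPI 1.2, B at 2 GHz with CPI 2.
latency_a = latency_per_instruction(1.2, 1e9)
latency_b = latency_per_instruction(2.0, 2e9)

if __name__ == "__main__":
    print(f"Latency A: {latency_a:.2e} s")
    print(f"Latency B: {latency_b:.2e} s")
    print(f"Speedup of B over A: {speedup(latency_a, latency_b):.2f}x")
```

Because the instruction count cancels in the ratio, the speedup is the same whether it is computed per instruction or for the whole program.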

Factors such as workload and architecture design are assumed not to differ between the processors. This matters because, if the architectures differed, the bandwidth of the two processors could differ as well. Given the problem, these are assumed to be the same and are not factored into the decision. Additionally, cost and power are not considered, although both can affect the decision.

This leaves only the comparison of performance, for which the Performance Equation was used. Comparing the latency of the two processors gives processor B a speedup of **1.2x**, so based on performance alone processor B is the better choice.

After memory is sped up by a factor of \(s = 3.5\), memory accounts for 50% of the new latency, or \(CPU_{time}\). Using absolute execution times, Amdahl’s law is in terms of \(L_{new}\), the execution time after an improvement; \(L_{memory}\), the original execution time affected by the improvement; \(s\), how many times faster the improved part runs, or its speedup; and \(L_{unaffected}\), the execution time unaffected by the improvement. In these terms, Amdahl’s Law states that: \[{L_{new}=\frac{L_{memory}}{s} + L_{unaffected}}\] If we assume that \(L_{new} = 100\), then since memory is 50% of the new latency, \(\frac{L_{memory}}{s} = 50\) and \(L_{unaffected} = 50\). Using these numbers in Amdahl’s law we get the following: \[{100=\frac{L_{memory}}{3.5} + 50}\] Solving for \(L_{memory}\) gives \(3.5 \times 50 = 175\). This gives a total \(L_{old} = 175 + 50 = 225\). Now we can calculate the percentage that \(L_{memory}\) is of the original latency: \(\frac{175}{225} \times 100 = \textbf{77.78}\%\)
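The same back-calculation can be written as a small function. This is a sketch under the assumption used above, that the new latency is normalized to an arbitrary value and split into an improved part and an unaffected part; the function name is hypothetical.

```python
def original_memory_fraction(s, new_memory_fraction):
    """Fraction of the ORIGINAL run time spent in memory.

    s: speedup applied to the memory portion.
    new_memory_fraction: share of the NEW run time spent in memory.
    """
    l_new = 1.0                                # normalized new latency
    l_memory_new = new_memory_fraction * l_new  # L_memory / s
    l_unaffected = l_new - l_memory_new
    l_memory_old = s * l_memory_new             # undo the speedup
    l_old = l_memory_old + l_unaffected
    return l_memory_old / l_old


if __name__ == "__main__":
    fraction = original_memory_fraction(3.5, 0.50)
    print(f"Memory share of original latency: {fraction * 100:.2f}%")
```

The result does not depend on the value chosen for the normalized latency, since it cancels in the final ratio.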

Calculate the Speed up of having a dual-core using Amdahl’s law for parallel processors.

\[Speedup = \frac{1}{\frac{x}{p}+(1-x)}\]

Where \(x\) is the fraction of the program that can be parallelized, which is 60%, and \(p\) is the number of processors. This gives us a \(Speedup = \frac{1}{\frac{0.6}{2} + 0.4} = \frac{1}{0.7} = \textbf{1.42857}\).

Assuming performance scales linearly with clock frequency, the dual-core \(frequency\) only needs to be \(\frac{1}{1.42857} = \textbf{0.7}\) (70%) of the original. This means we can reduce the frequency by **30%** and still maintain the same performance.
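A minimal sketch of this calculation follows. It assumes that performance scales linearly with clock frequency, so the frequency can drop by the inverse of the speedup; the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law for a program whose parallel part scales across processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (parallel_fraction / processors + serial_fraction)


def matching_frequency_fraction(parallel_fraction, processors):
    """Fraction of the original frequency that keeps performance unchanged.

    Assumes performance is directly proportional to clock frequency.
    """
    return 1.0 / amdahl_speedup(parallel_fraction, processors)


if __name__ == "__main__":
    s = amdahl_speedup(0.60, 2)
    f = matching_frequency_fraction(0.60, 2)
    print(f"Dual-core speedup: {s:.5f}")
    print(f"Required frequency: {f * 100:.0f}% of original")
    print(f"Allowed reduction: {(1.0 - f) * 100:.0f}%")
```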
