
Computational Graph Introduction

What is a computational graph?

A computational graph is a directed graph that represents the computation of a mathematical function: each node represents an elementary mathematical operation, and each edge represents the flow of data (intermediate values) between operations.

Why use a computational graph? Breaking a complicated function into elementary operations makes both evaluation and differentiation mechanical: the forward pass computes the value of every node, and the backward pass applies the chain rule node by node to obtain the gradient of the output with respect to every input. This is the mechanism behind backpropagation in deep learning frameworks.

https://miro.medium.com/v2/resize:fit:938/1*k2NQ1NPQxgBMp8fvoiMY4w.jpeg

How to use a computational graph?

To use a computational graph, first run a forward pass: traverse the graph from the inputs to the output, computing the value of each node from the values of its inputs. Then run a backward pass: starting from the output, multiply each node's local derivative by the gradient flowing in from above (the chain rule) to obtain the gradient of the output with respect to every intermediate value and every input.
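
For intuition, here is a minimal sketch, assuming nothing beyond plain Python and using illustrative names, of this two-pass procedure for the function \(f(a, b, c) = ab + c\) (this is not one of the exercise functions):

```python
# A minimal sketch of one forward and one backward pass through the tiny
# graph for f(a, b, c) = a*b + c.

def forward(a, b, c):
    g = a * b              # multiply node
    f = g + c              # add node
    return f

def backward(a, b, upstream=1.0):
    # Chain rule: each local derivative is multiplied by the gradient
    # arriving from the node above (the "upstream" gradient).
    df_dg = 1.0 * upstream     # the add node passes the gradient through
    df_dc = 1.0 * upstream
    df_da = b * df_dg          # the multiply node routes each input's gradient
    df_db = a * df_dg          # through the value of the *other* input
    return df_da, df_db, df_dc

print(forward(2.0, 3.0, -1.0))     # 5.0
print(backward(2.0, 3.0))          # (3.0, 2.0, 1.0)
```

The exercises below apply exactly this pattern by hand: evaluate every node in the forward pass, then walk backwards multiplying local derivatives by upstream gradients.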

Exercise 1: Simple computational graph

Let's assume we have a simple function \(f(x, y, z) = (x + y) z\). We can break this up into the equations \(q = x + y\) and \(f(x, y, z) = qz\). Using this simplified notation, we can represent the function as a computational graph:

./img/cg_1.png

Now let's assume that we are evaluating this function at \(x = -2, y = 5, z = -4\). In addition, let the upstream gradient (the gradient of the loss with respect to our function, \(\frac{\partial L}{\partial f}\)) equal \(1\). These values are filled out for you in the computational graph.

Question

Solve for the following values, both symbolically (without plugging in specific values of \(x, y, z\)) and numerically at \(x = -2, y = 5, z = -4\) and \(\frac{\partial L}{\partial f} = 1\):

  1. \(\frac{\partial f}{\partial q} =\)

  2. \(\frac{\partial q}{\partial x} =\)

  3. \(\frac{\partial q}{\partial y} =\)

  4. \(\frac{\partial f}{\partial z} =\)

  5. \(\frac{\partial f}{\partial x} =\)

  6. \(\frac{\partial f}{\partial y} =\)

⚠️ Write both the analytical expression and the numerical solution!
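
If you want to verify your numerical answers, a minimal finite-difference check is sketched below (plain Python; `numerical_grad` is an illustrative helper, not a library call). It only approximates the gradients at the given point and does not replace the symbolic derivation:

```python
# Check numerical answers with centered finite differences.

def f(x, y, z):
    q = x + y
    return q * z

def numerical_grad(fn, args, i, eps=1e-6):
    # Perturb the i-th argument by +/- eps and take the centered difference.
    plus = list(args)
    minus = list(args)
    plus[i] += eps
    minus[i] -= eps
    return (fn(*plus) - fn(*minus)) / (2 * eps)

point = (-2.0, 5.0, -4.0)                    # x = -2, y = 5, z = -4
for i, name in enumerate(["x", "y", "z"]):
    print(f"df/d{name} ~ {numerical_grad(f, point, i):.4f}")
```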

Exercise 2: Backpropagation in a computational graph

Now let's perform backpropagation through a single neuron of a neural network with a sigmoid activation. Specifically, we define the pre-activation \(z = w_0x_0 + w_1x_1 + w_2\) and the activation value \(\alpha = \sigma(z) = 1 / (1 + e^{-z})\). The computational graph is visualized below:

./img/cg_2.png

In the graph we've filled out the forward activations, shown on top of the edges, as well as the upstream gradient (the gradient of the loss with respect to our neuron, \(\frac{\partial L}{\partial \alpha}\)). Use this information to compute the rest of the gradients (labelled with question marks) throughout the graph.

Question

Solve for the following values, both symbolically (without plugging in specific values of \(x_0, x_1, w_0, w_1, w_2\)) and numerically with the green values shown in the graph:

  1. \(\frac{\partial \alpha}{\partial x_0} =\)

  2. \(\frac{\partial \alpha}{\partial w_0} =\)

  3. \(\frac{\partial \alpha}{\partial x_1} =\)

  4. \(\frac{\partial \alpha}{\partial w_1} =\)

  5. \(\frac{\partial \alpha}{\partial w_2} =\)

⚠️ Use a calculator for the numerical solution; do not waste time on manual calculations.
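
As with Exercise 1, you can sanity-check your numerical answers with finite differences. The sketch below uses placeholder values for \((w_0, x_0, w_1, x_1, w_2)\); substitute the green values from the graph before comparing. Recall also the identity \(\sigma'(z) = \sigma(z)\,(1 - \sigma(z))\), which keeps the symbolic expressions compact.

```python
import math

# Finite-difference check for the sigmoid neuron alpha = sigmoid(w0*x0 + w1*x1 + w2).
# The values in `point` are placeholders -- replace them with the green values
# from the graph.

def alpha(w0, x0, w1, x1, w2):
    z = w0 * x0 + w1 * x1 + w2
    return 1.0 / (1.0 + math.exp(-z))

def numerical_grad(fn, args, i, eps=1e-6):
    plus = list(args)
    minus = list(args)
    plus[i] += eps
    minus[i] -= eps
    return (fn(*plus) - fn(*minus)) / (2 * eps)

point = (1.0, 1.0, 1.0, 1.0, 1.0)            # placeholder (w0, x0, w1, x1, w2)
for i, name in enumerate(["w0", "x0", "w1", "x1", "w2"]):
    print(f"d(alpha)/d{name} ~ {numerical_grad(alpha, point, i):.4f}")
```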

Exercise 3: Backpropagation with a focus on dimensions and derivatives

Let's assume we have a two-layer neural network, as defined below:

\[z_1 = W_1 x^{(i)} + b_1\]
\[a_1 = \mathrm{ReLU}(z_1)\]
\[z_2 = W_2 a_1 + b_2\]
\[\hat{y}^{(i)} = \sigma(z_2)\]
\[L^{(i)} = y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})\]
\[J = -\frac{1}{m} \sum_{i=1}^{m} L^{(i)}\]

Note that \(x^{(i)}\) represents a single input example, and is of shape \(D_x \times 1\). Further, \(y^{(i)}\) is a single output label and is a scalar. There are \(m\) examples in our dataset. We will use \(D_{a_1}\) nodes in our hidden layer; that is, \(z_1\)'s shape is \(D_{a_1} \times 1\).
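
Before tackling the questions, it can help to run the forward pass with concrete sizes and check that every shape is consistent. The sketch below uses NumPy with arbitrary placeholder dimensions \(D_x = 4\) and \(D_{a_1} = 3\); the weight and bias shapes chosen here are one consistent possibility, which you should verify against your own answer to Question 1.

```python
import numpy as np

# Shape sanity check for the two-layer network above, for a single example.
# D_x and D_a1 are arbitrary placeholder sizes; the weight/bias shapes below
# are one consistent choice -- verify them against your answer to Question 1.

D_x, D_a1 = 4, 3
rng = np.random.default_rng(0)

x  = rng.normal(size=(D_x, 1))        # x^{(i)}: (D_x, 1)
W1 = rng.normal(size=(D_a1, D_x))     # W_1
b1 = rng.normal(size=(D_a1, 1))       # b_1
W2 = rng.normal(size=(1, D_a1))       # W_2
b2 = rng.normal(size=(1, 1))          # b_2

z1 = W1 @ x + b1                      # (D_a1, 1)
a1 = np.maximum(0, z1)                # ReLU keeps the shape of z1
z2 = W2 @ a1 + b2                     # (1, 1)
y_hat = 1.0 / (1.0 + np.exp(-z2))     # sigmoid output, (1, 1)

print(z1.shape, a1.shape, z2.shape, y_hat.shape)
```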

Questions

  1. What are the shapes of \(W_1, b_1, W_2, b_2\)? If we were vectorizing this network across multiple examples, what would the shapes of the weights/biases be instead? If we were vectorizing across multiple examples, what would the shapes of \(X\) and \(Y\) be instead?

  2. What is \(\frac{\partial J}{\partial \hat{y}^{(i)}}\)? Refer to this result as \(\delta_1^{(i)}\). Using this result, what is \(\frac{\partial J}{\partial \hat{y}}\)?

  3. What is \(\frac{\partial \hat{y}^{(i)}}{\partial z_2}\)? Refer to this result as \(\delta_2^{(i)}\).

  4. What is \(\frac{\partial z_2}{\partial a_1}\)? Refer to this result as \(\delta_3^{(i)}\).

  5. What is \(\frac{\partial a_1}{\partial z_1}\)? Refer to this result as \(\delta_4^{(i)}\).

  6. What is \(\frac{\partial z_1}{\partial W_1}\)? Refer to this result as \(\delta_5^{(i)}\).

  7. What is \(\frac{\partial J}{\partial W_1}\)? It may help to reuse work from the previous parts.

    ⚠️ Hint: Be careful with the shapes!
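
Once you have a symbolic expression for \(\frac{\partial J}{\partial W_1}\) (Question 7), you can compare it against a numerical gradient at a random point. The sketch below uses a single example (so \(m = 1\)) and placeholder sizes; `forward_loss` is an illustrative helper, not part of any library.

```python
import numpy as np

# Finite-difference check for dJ/dW1, with a single example (m = 1).
# Sizes, values, and the helper name `forward_loss` are placeholders.

def forward_loss(W1, b1, W2, b2, x, y):
    z1 = W1 @ x + b1
    a1 = np.maximum(0, z1)
    z2 = W2 @ a1 + b2
    y_hat = 1.0 / (1.0 + np.exp(-z2))
    L = y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)
    return -L.item()                   # J = -(1/m) * sum_i L^{(i)}, here m = 1

rng = np.random.default_rng(0)
D_x, D_a1 = 4, 3
x, y = rng.normal(size=(D_x, 1)), 1.0
W1, b1 = rng.normal(size=(D_a1, D_x)), rng.normal(size=(D_a1, 1))
W2, b2 = rng.normal(size=(1, D_a1)), rng.normal(size=(1, 1))

eps = 1e-6
grad = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad[i, j] = (forward_loss(Wp, b1, W2, b2, x, y)
                      - forward_loss(Wm, b1, W2, b2, x, y)) / (2 * eps)

print(grad)   # compare against your analytical dJ/dW1 evaluated at the same point
```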