Computational Graph Introduction
What is a computational graph?
A computational graph is a directed graph that represents the computation of a mathematical function: each node represents a mathematical operation, and each edge represents the flow of data between operations.
Why use a computational graph?
Breaking a complex function into primitive operations makes its structure explicit: the value of the function can be computed with a single forward pass through the graph, and its gradients with respect to every input can be computed with a single backward pass using the chain rule. This is the mechanism behind backpropagation in neural network frameworks.
How to use a computational graph?
First perform a forward pass: evaluate the nodes in topological order, storing each intermediate value. Then perform a backward pass: starting from the output with an upstream gradient of \(1\), visit the nodes in reverse order, multiplying the upstream gradient by each node's local gradient and accumulating the result at its inputs.
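To make this concrete, here is a minimal sketch of a scalar computational graph in Python. The `Value` class and its method names are illustrative, not a reference implementation; for clarity it propagates gradients recursively along every path, which is fine for small graphs.

```python
# A minimal sketch of a scalar computational graph.
# Each node records its parents and local gradients during the forward
# pass; backward() applies the chain rule to fill in dL/d(node).

class Value:
    """A scalar node in a computational graph."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                  # value from the forward pass
        self.grad = 0.0                   # dL/d(this node), filled in backward
        self._parents = parents           # nodes feeding this one
        self._local_grads = local_grads   # d(this)/d(parent) for each parent

    def __add__(self, other):
        # Local gradients of x + y are 1 w.r.t. both inputs.
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # Local gradients of x * y are y w.r.t. x and x w.r.t. y.
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self, upstream=1.0):
        # Chain rule: gradient flowing to each parent is
        # (upstream gradient) * (local gradient).
        self.grad += upstream
        for parent, local in zip(self._parents, self._local_grads):
            parent.backward(upstream * local)


x, y, z = Value(1.0), Value(2.0), Value(3.0)
q = x + y          # forward: q = 3
f = q * z          # forward: f = 9
f.backward()       # backward: fills x.grad, y.grad, z.grad
print(x.grad, y.grad, z.grad)  # 3.0 3.0 3.0
```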
Exercise 1: Simple computational graph
Let's assume we have a simple function \(f(x, y, z) = (x + y)z\). We can break this up into the equations \(q = x + y\) and \(f(x, y, z) = qz\). Using this simplified notation, we can also represent this equation as a computation graph:
Now let's assume that we are evaluating this function at \(x = -2, y = 5, z = -4\). In addition, let the value of the upstream gradient (the gradient of the loss with respect to our function, \(\frac{\partial L}{\partial f}\)) equal \(1\). These values are filled out for you in the computation graph.
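Recall that gradients with respect to \(x\) and \(y\) flow through the intermediate variable \(q\), so the chain rule applies:

\[
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q}\,\frac{\partial q}{\partial x}, \qquad \frac{\partial f}{\partial y} = \frac{\partial f}{\partial q}\,\frac{\partial q}{\partial y}.
\]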
Question
Solve for the following values, both symbolically (without plugging in specific values of \(x, y, z\)) and evaluated at \(x = -2, y = 5, z = -4\), and \(\frac{\partial L}{\partial f} = 1\):
- \(\frac{\partial f}{\partial q} =\)
- \(\frac{\partial q}{\partial x} =\)
- \(\frac{\partial q}{\partial y} =\)
- \(\frac{\partial f}{\partial z} =\)
- \(\frac{\partial f}{\partial x} =\)
- \(\frac{\partial f}{\partial y} =\)
⚠️ Write the analytical expression and the numerical solution!
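Once you have your analytical expressions, a quick way to sanity-check the numbers is a central finite-difference estimate. The sketch below is illustrative (the helper `numerical_grad` is ours, not part of the exercise); since \(\frac{\partial L}{\partial f} = 1\), the finite-difference estimates should match your answers directly.

```python
# Numerically check gradients of f(x, y, z) = (x + y) * z with
# central finite differences.

def f(x, y, z):
    return (x + y) * z

def numerical_grad(fn, args, i, eps=1e-6):
    """Central finite-difference estimate of d(fn)/d(args[i])."""
    plus, minus = list(args), list(args)
    plus[i] += eps
    minus[i] -= eps
    return (fn(*plus) - fn(*minus)) / (2 * eps)

point = (-2.0, 5.0, -4.0)  # the values used in the exercise
for i, name in enumerate(("x", "y", "z")):
    print(f"df/d{name} = {numerical_grad(f, point, i):.4f}")
```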
Exercise 2: Backpropagation in a computational graph
Now let's perform backpropagation through a single neuron of a neural network with a sigmoid activation. Specifically, we will define the pre-activation \(z = w_0x_0 + w_1x_1 + w_2\) and we will define the activation value \(\alpha = \sigma(z) = 1 / (1 + e^{-z})\). The computation graph is visualized below:
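One identity that makes the backward pass through the sigmoid much cleaner (a standard result, worth deriving once) is that the derivative of the sigmoid can be written in terms of the sigmoid itself:

\[
\frac{d\sigma}{dz} = \frac{e^{-z}}{(1 + e^{-z})^2} = \sigma(z)\bigl(1 - \sigma(z)\bigr).
\]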
In the graph we've filled out the forward activations (shown above the edges) as well as the upstream gradient (the gradient of the loss with respect to our neuron, \(\frac{\partial L}{\partial \alpha}\)). Use this information to compute the rest of the gradients (labelled with question marks) throughout the graph.
Question
Solve for the following values, both symbolically (without plugging in specific values of \(x_0, x_1, w_0, w_1, w_2\)) and evaluated using the green values in the graph:
- \(\frac{\partial \alpha}{\partial x_0} =\)
- \(\frac{\partial \alpha}{\partial w_0} =\)
- \(\frac{\partial \alpha}{\partial x_1} =\)
- \(\frac{\partial \alpha}{\partial w_1} =\)
- \(\frac{\partial \alpha}{\partial w_2} =\)
⚠️ Use a calculator for the numerical solution; do not waste time on manual calculations.
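If you want to double-check your numbers, the sketch below runs the forward and backward pass through the neuron. The input values are placeholders (the actual green values live in the graph above and are not reproduced here); substitute yours before running.

```python
# Forward and backward pass through a single sigmoid neuron.
import math

# Placeholder values: replace with the green values from the graph.
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass
z = w0 * x0 + w1 * x1 + w2           # pre-activation
alpha = 1.0 / (1.0 + math.exp(-z))   # sigmoid activation

# Backward pass, with upstream gradient dL/dalpha = 1
dalpha_dz = alpha * (1.0 - alpha)    # sigmoid local gradient
grads = {
    "x0": dalpha_dz * w0,  # dz/dx0 = w0
    "w0": dalpha_dz * x0,  # dz/dw0 = x0
    "x1": dalpha_dz * w1,
    "w1": dalpha_dz * x1,
    "w2": dalpha_dz * 1.0, # dz/dw2 = 1
}
print(f"z = {z}, alpha = {alpha:.4f}")
for name, g in grads.items():
    print(f"dalpha/d{name} = {g:.4f}")
```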
Exercise 3: Backpropagation with a focus on dimensions and derivatives
Let's assume we have a two-layer neural network, as defined below:
Note that \(x^{(i)}\) represents a single input example and is of shape \(D_x \times 1\). Further, \(y^{(i)}\) is a single output label and is a scalar. There are \(m\) examples in our dataset. We will use \(D_{a_1}\) nodes in our hidden layer; that is, \(z_1\)'s shape is \(D_{a_1} \times 1\).
Questions
- What are the shapes of \(W_1, b_1, W_2, b_2\)? If we were vectorizing this network across multiple examples, what would the shapes of the weights/biases be instead, and what would the shapes of \(X\) and \(Y\) be?
- What is \(\frac{\partial J}{\partial \hat{y}^{(i)}}\)? Refer to this result as \(\delta_1^{(i)}\). Using this result, what is \(\frac{\partial J}{\partial \hat{y}}\)?
- What is \(\frac{\partial \hat{y}^{(i)}}{\partial z_2}\)? Refer to this result as \(\delta_2^{(i)}\).
- What is \(\frac{\partial z_2}{\partial a_1}\)? Refer to this result as \(\delta_3^{(i)}\).
- What is \(\frac{\partial a_1}{\partial z_1}\)? Refer to this result as \(\delta_4^{(i)}\).
- What is \(\frac{\partial z_1}{\partial W_1}\)? Refer to this result as \(\delta_5^{(i)}\).
- What is \(\frac{\partial J}{\partial W_1}\)? It may help to reuse work from the previous parts.
⚠️ Hint: Be careful with the shapes!
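To verify your shape answers, you can run a minimal numpy sketch. The layer definitions here (affine layers with sigmoid activations: \(z_1 = W_1 x^{(i)} + b_1\), \(a_1 = \sigma(z_1)\), \(z_2 = W_2 a_1 + b_2\), \(\hat{y}^{(i)} = \sigma(z_2)\)) are our assumption about the network in the figure; adjust them if your definition differs.

```python
# Shape check for the two-layer network, assuming affine + sigmoid layers.
import numpy as np

D_x, D_a1, m = 4, 3, 5  # arbitrary sizes: input dim, hidden dim, examples

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Single-example case: x is (D_x, 1) and y_hat is a scalar, so the shapes
# of the weights and biases below follow from the matrix products.
W1, b1 = np.zeros((D_a1, D_x)), np.zeros((D_a1, 1))
W2, b2 = np.zeros((1, D_a1)), np.zeros((1, 1))

x = np.zeros((D_x, 1))
z1 = W1 @ x + b1               # (D_a1, 1)
a1 = sigmoid(z1)               # (D_a1, 1)
z2 = W2 @ a1 + b2              # (1, 1)
y_hat = sigmoid(z2)            # (1, 1)
print(z1.shape, a1.shape, z2.shape, y_hat.shape)

# Vectorized across m examples: stack inputs as columns, so X is (D_x, m);
# the weights keep their shapes and the biases broadcast across columns.
X = np.zeros((D_x, m))
Z1 = W1 @ X + b1               # (D_a1, m)
A1 = sigmoid(Z1)
Y_hat = sigmoid(W2 @ A1 + b2)  # (1, m)
print(Z1.shape, Y_hat.shape)
```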