up9rade: mainly focused on machine learning and deep learning in the domain of image processing, interacting with those topics in a visualized way. Each series presents 9+ grades of challenges.

PCA and related (2019-04-20)
<h1 id="pca-and-related">PCA and related</h1>
<p>Principal component analysis (PCA) is a method widely used for dimensionality reduction.</p>
<p>PCA relies on an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly independent / uncorrelated variables, called principal components.</p>
<h4 id="pca--linear-regression">PCA & Linear Regression</h4>
<p>PCA and linear regression optimize different objectives. For the latter, the cost function is:</p>
<script type="math/tex; mode=display">\arg\min_\Theta \frac{1}{2}(h(\Theta)-y)^2</script>
<p>The measure for PCA is different. The optimization aims to achieve:</p>
<ul>
<li>maximum variance (which remains maximum original data information)</li>
<li>minimum covariance (which eliminates correlated dimensions)</li>
</ul>
<p><img src="http://up9rade.com/img/pca01.png" /></p>
<p>In my personal view, the cost functions and optimization methods could in principle be swapped between the two; the conventional pairing is mainly a matter of established usage and ease of implementation.</p>
<p>First of all, we center all the data to zero mean.</p>
<script type="math/tex; mode=display">Var(a) = \frac{1}{n}\sum_{i=1}^{n}(a_i - \bar{a})^2</script>
<script type="math/tex; mode=display">Cov(a, b) = \frac{1}{n}\sum_{i=1}^{n}(a_i - \bar{a})(b_i-\bar{b}) = \frac{1}{n}\sum_{i=1}^{n}a_ib_i</script>
<p>Our objective is to maximize Var(a) and minimize Cov(a, b).</p>
<p>And there is one matrix that captures both at once:</p>
<script type="math/tex; mode=display">\frac{1}{n}XX^T</script>
<p>By definition, the covariance matrix holds the variances of the variables along its main diagonal and the covariances between pairs of variables in the remaining positions.</p>
<p>So the goal is now clear:
try to maximize the values along the main diagonal (sorted in descending order), and minimize the off-diagonal values throughout the matrix.</p>
<p>Though PCA tends to minimize information loss, it is still a "lossy compression".</p>
<h4 id="steps-of-calculating-pca">Steps of calculating PCA</h4>
<ol>
<li>Center the dataset to zero mean;</li>
<li>Compute the covariance matrix;</li>
<li>Compute its eigenvectors and eigenvalues;</li>
<li>Decide on the top K components.</li>
</ol>
<p>A common practice for deciding K is to keep the smallest K such that:</p>
<script type="math/tex; mode=display">\frac{\sum_{i=1}^{K}\lambda_i}{\sum_{i=1}^{n}\lambda_i} \geqslant 0.99</script>
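<p>The four steps above, plus this K-selection rule, can be sketched in a few lines of NumPy. This is an illustration only: the function and variable names are my own, and note that NumPy convention here puts samples in rows, so the covariance is effectively <script type="math/tex">\frac{1}{n}X^TX</script> after centering.</p>

```python
import numpy as np

def pca(X, var_kept=0.99):
    """X: (n_samples, n_features), samples in rows. Returns projected data and K."""
    X = X - X.mean(axis=0)                   # step 1: zero mean
    cov = np.cov(X, rowvar=False)            # step 2: covariance matrix
    vals, vecs = np.linalg.eigh(cov)         # step 3: eigenvalues / eigenvectors (ascending)
    vals, vecs = vals[::-1], vecs[:, ::-1]   # re-sort in descending order
    ratio = np.cumsum(vals) / np.sum(vals)   # cumulative explained-variance ratio
    k = int(np.searchsorted(ratio, var_kept) + 1)  # step 4: smallest K reaching the threshold
    return X @ vecs[:, :k], k

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.1]])  # correlated toy data
Z, k = pca(data)
print(Z.shape, k)
```

<p>For the strongly correlated toy data, a single component already keeps over 99% of the variance.</p>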
<h4 id="pca--svd">PCA & SVD</h4>
<table>
<thead>
<tr>
<th> </th>
<th>PCA</th>
<th>SVD</th>
</tr>
</thead>
<tbody>
<tr>
<td>purpose</td>
<td>dimensionality reduction</td>
<td>matrix decomposition</td>
</tr>
<tr>
<td>sparse data</td>
<td>not supported; requires dense, zero-mean data</td>
<td>supports sparse matrix</td>
</tr>
<tr>
<td>computation</td>
<td>less efficient; needs to form and solve the covariance matrix</td>
<td>efficient</td>
</tr>
<tr>
<td>reduction</td>
<td>one-sided: dimensionality only</td>
<td>bilateral: dimensionality (right singular vectors) and sample count (left singular vectors)</td>
</tr>
</tbody>
</table>
<p>Towards bilateral dimensionality reduction, assume an input data set X of size m × n. According to SVD:</p>
<script type="math/tex; mode=display">X_{m,n} \approx U_{m,r}\Sigma_{r,r}V_{r,n}^T</script>
<script type="math/tex; mode=display">New_{r,n} = U_{m,r}^TX_{m,n}</script>
<p>By this transformation we get a new matrix of size r × n: we have reduced the number of rows of the original data set, which is the other direction of data set reduction.</p>
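<p>As a quick NumPy sketch of this row reduction (toy sizes and variable names are my own):</p>

```python
import numpy as np

m, n, r = 6, 4, 2
rng = np.random.default_rng(1)
X = rng.normal(size=(m, n))                 # input data set, m x n

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_r = U[:, :r]                              # keep the top-r left singular vectors, m x r
X_new = U_r.T @ X                           # r x n: rows reduced from m to r
print(X_new.shape)                          # (2, 4)
```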
<h4 id="pca-is-more-than-dimensionality-reduction">PCA is more than dimensionality reduction</h4>
<p>Other than dimensionality reduction, PCA was introduced into image augmentation with "ImageNet Classification with Deep Convolutional Neural Networks", aka AlexNet.</p>
<p>According to the paper, Krizhevsky et al. reduced the top-1 error rate on ImageNet by over 1%, which was pretty amazing.</p>
<p>The method is:</p>
<script type="math/tex; mode=display">NewImage = Image + [p_1, p_2, p_3][\alpha_1\lambda_1,\alpha_2\lambda_2,\alpha_3\lambda_3]^T</script>
<p>where the $p_i$ are the eigenvectors of the 3×3 RGB covariance matrix, the $\lambda_i$ the corresponding eigenvalues, and the $\alpha_i$ Gaussian random variables with zero mean and standard deviation 0.1.</p>
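<p>A minimal sketch of this augmentation, often called "fancy PCA" or PCA color augmentation. The function name and array layout are my own assumptions; the image is taken as floats in [0, 1].</p>

```python
import numpy as np

def pca_color_augment(img, rng):
    """img: (H, W, 3) float array in [0, 1]; returns a color-jittered copy."""
    pixels = img.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)          # 3x3 RGB covariance matrix
    lam, p = np.linalg.eigh(cov)                # eigenvalues lam, eigenvectors p (columns)
    alpha = rng.normal(0.0, 0.1, size=3)        # Gaussian, zero mean, std 0.1
    delta = p @ (alpha * lam)                   # [p1, p2, p3] [a1*l1, a2*l2, a3*l3]^T
    return np.clip(img + delta, 0.0, 1.0)       # add the same offset to every pixel

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
out = pca_color_augment(img, rng)
```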
<p>[post status: fresh start]</p>
Coding, Binary Search [grade 1] (2019-04-05)
<h1 id="coding-binary-search">Coding, Binary Search</h1>
<h2 id="grade-1-binary-search">grade 1: binary search</h2>
<p>There is a saying that roughly 90% of binary search implementations come out wrong, with mistakes ranging from infinite loops to incorrect boundary checks.</p>
<p>Question:
Given a sorted, non-descending array arr, find the minimum i such that arr[i] >= n. If no such i exists, return -1.</p>
<p>Python version:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">bi_search</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
    <span class="n">arr_len</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">arr</span><span class="p">)</span>
    <span class="c"># boundary check</span>
    <span class="k">if</span> <span class="n">arr_len</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span>
    <span class="k">if</span> <span class="n">arr</span><span class="p">[</span><span class="n">arr_len</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o"><</span> <span class="n">n</span><span class="p">:</span>
        <span class="k">return</span> <span class="o">-</span><span class="mi">1</span>
    <span class="c"># initialization</span>
    <span class="n">first</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">last</span> <span class="o">=</span> <span class="n">arr_len</span>
    <span class="k">while</span> <span class="n">first</span><span class="o"><</span><span class="n">last</span><span class="p">:</span> <span class="c"># sector 1</span>
        <span class="n">mid</span> <span class="o">=</span> <span class="n">first</span> <span class="o">+</span> <span class="p">(</span><span class="n">last</span><span class="o">-</span><span class="n">first</span><span class="p">)</span><span class="o">//</span><span class="mi">2</span> <span class="c"># sector 2</span>
        <span class="k">if</span> <span class="n">arr</span><span class="p">[</span><span class="n">mid</span><span class="p">]</span> <span class="o"><</span> <span class="n">n</span><span class="p">:</span> <span class="c"># sector 3</span>
            <span class="n">first</span> <span class="o">=</span> <span class="n">mid</span> <span class="o">+</span> <span class="mi">1</span> <span class="c"># sector 4</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">last</span> <span class="o">=</span> <span class="n">mid</span>
    <span class="k">return</span> <span class="n">first</span>
<span class="n">arr</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">7</span><span class="p">]</span>
<span class="n">bi_search</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<p>Decipher on the implementation:</p>
<h3 id="boundary-check">boundary check</h3>
<p>First, the boundary checks: if the array is empty, or the last (largest) element is still smaller than n, no satisfactory index exists and we can stop early.</p>
<h3 id="sector-1">sector 1</h3>
<p>Now to the search part, in sector 1:
we loop over a half-open range of the array, just as we usually write loops like this:</p>
<p><code class="highlighter-rouge">for (i = 0; i < len; i++)</code></p>
<p>And here is an intelligible explanation:
<a href="https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html">Why numbering should start at zero</a></p>
<h3 id="sector-2">sector 2</h3>
<p>About the mid-point part,</p>
<p><code class="highlighter-rouge">mid = l + (r - l)//2</code></p>
<p>There’s an alternative way:</p>
<p><code class="highlighter-rouge">mid = (l + r)//2</code></p>
<p>This also works in Python, but in C and Java it risks integer overflow, because l + r may exceed the integer range when the two numbers are big enough.</p>
<p>And why //2 instead of /2? In Python 3, / is the floating-point division operator and // is integer (floor) division.</p>
<h3 id="sector-3">sector 3</h3>
<p>Here, we start with:</p>
<p><code class="highlighter-rouge">arr[mid] < n</code> instead of <code class="highlighter-rouge">arr[mid] > n</code></p>
<p>This also fits our initial logic: let the program search from left/first to right/last; each time arr[mid] < n holds, it moves the lower end of the search range forward, step by step.</p>
<h3 id="sector-4">sector 4</h3>
<p>Why using:</p>
<p><code class="highlighter-rouge">first = mid +1</code> instead of <code class="highlighter-rouge">first = mid</code> here?</p>
<p>To avoid an infinite loop. Consider: when would the <code class="highlighter-rouge">while first < last</code> condition stay true forever (i.e. an infinite loop)?</p>
<p>The answer is, when:</p>
<p><code class="highlighter-rouge">first = last = mid</code></p>
<p>Binary search keeps squeezing the search space [first, last). Once the range has been squeezed down to a single element, and that element happens to be smaller than the given n, setting first = mid would leave the range unchanged, and the program would be trapped in an infinite loop.</p>
<p>With <code class="highlighter-rouge">first = mid + 1</code>, this situation is avoided.</p>
<p>You might say: wait, how do we know an element won't be skipped when we add 1 to mid? It can't happen: look at the condition, <code class="highlighter-rouge">arr[mid] < n</code> rather than <code class="highlighter-rouge">arr[mid] <= n</code>, so whenever we set first = mid + 1 we already know arr[mid] cannot be the answer.</p>
<p>The loop ends when <code class="highlighter-rouge">first == last</code> and the search range [first, last) becomes empty; at that moment, returning first or last makes no difference, since they share the same value.</p>
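<p>For reference, Python's standard library already provides this behavior: <code class="highlighter-rouge">bisect.bisect_left</code> returns the first index at which n could be inserted while keeping the array sorted, i.e. exactly the first i with arr[i] >= n. Only the "not found" case needs mapping to -1 ourselves (a small sketch; the wrapper name is my own):</p>

```python
import bisect

def bi_search_std(arr, n):
    i = bisect.bisect_left(arr, n)    # first index with arr[i] >= n
    return i if i < len(arr) else -1  # no such element: return -1

arr = [1, 2, 2, 3, 4, 5, 5, 5, 7]
print(bi_search_std(arr, 2))  # 1
print(bi_search_std(arr, 8))  # -1
```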
<p>Following are 2 other implementations.</p>
<p>JavaScript version</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">bi_search</span><span class="p">(</span><span class="nx">arr</span><span class="p">,</span> <span class="nx">n</span><span class="p">){</span>
    <span class="kd">var</span> <span class="nx">sl</span> <span class="o">=</span> <span class="nx">arr</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="nx">sl</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="nx">arr</span><span class="p">[</span><span class="nx">sl</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o"><</span> <span class="nx">n</span><span class="p">)</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="kd">var</span> <span class="nx">l</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">r</span> <span class="o">=</span> <span class="nx">sl</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="nx">l</span><span class="o"><</span><span class="nx">r</span><span class="p">){</span>
        <span class="kd">var</span> <span class="nx">mid</span> <span class="o">=</span> <span class="nx">l</span> <span class="o">+</span> <span class="nb">parseInt</span><span class="p">((</span><span class="nx">r</span><span class="o">-</span><span class="nx">l</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="nx">arr</span><span class="p">[</span><span class="nx">mid</span><span class="p">]</span> <span class="o"><</span> <span class="nx">n</span><span class="p">)</span>
            <span class="nx">l</span> <span class="o">=</span> <span class="nx">mid</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">else</span>
            <span class="nx">r</span> <span class="o">=</span> <span class="nx">mid</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="nx">r</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">arr</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">7</span><span class="p">];</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">bi_search</span><span class="p">(</span><span class="nx">arr</span><span class="p">,</span> <span class="mi">2</span><span class="p">));</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">bi_search</span><span class="p">(</span><span class="nx">arr</span><span class="p">,</span> <span class="mi">2</span><span class="p">));</span>
</code></pre></div></div>
<p>C Version</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
</span>
<span class="cp">#define ARR_LEN(arr) (sizeof(arr) / sizeof(arr[0]))
</span>
<span class="kt">int</span> <span class="n">bi_search</span><span class="p">(</span><span class="kt">int</span> <span class="n">arr</span><span class="p">[],</span> <span class="kt">int</span> <span class="n">len</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">){</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">arr</span><span class="p">[</span><span class="n">len</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o"><</span> <span class="n">n</span><span class="p">)</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">l</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="n">len</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">l</span><span class="o"><</span><span class="n">r</span><span class="p">){</span>
        <span class="kt">int</span> <span class="n">mid</span> <span class="o">=</span> <span class="n">l</span> <span class="o">+</span> <span class="p">(</span><span class="n">r</span><span class="o">-</span><span class="n">l</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">arr</span><span class="p">[</span><span class="n">mid</span><span class="p">]</span> <span class="o"><</span> <span class="n">n</span><span class="p">)</span>
            <span class="n">l</span> <span class="o">=</span> <span class="n">mid</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">else</span>
            <span class="n">r</span> <span class="o">=</span> <span class="n">mid</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">l</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">n</span><span class="p">,</span> <span class="n">rst</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">arr</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">7</span><span class="p">};</span>
    <span class="kt">int</span> <span class="n">len</span> <span class="o">=</span> <span class="n">ARR_LEN</span><span class="p">(</span><span class="n">arr</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">Please input a number: "</span><span class="p">);</span>
    <span class="n">scanf</span><span class="p">(</span><span class="s">"%d"</span><span class="p">,</span> <span class="o">&</span><span class="n">n</span><span class="p">);</span>
    <span class="n">rst</span> <span class="o">=</span> <span class="n">bi_search</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="n">n</span><span class="p">);</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Answer is %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">rst</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>[post status: almost done]</p>
What is image file [grade 4] (2019-03-10)
<h1 id="what-is-image-file-behind-a-pixel">What is image file: behind a pixel</h1>
<h2 id="grade-4-behind-a-pixel">grade 4: behind a pixel</h2>
<p>Let's dig into what's behind an image on a popular website, e.g. Amazon or Facebook.</p>
<p>Right click the mouse, choose "open image in new tab", and you will see a long URL like the following:
<a href="https://images-na.ssl-images-amazon.com/images/G/01/US-hq/2019/img/Certified_Refurbished/XCM_Manual_1161174_renewed_card_resize_260x260_Certified_Refurbished_XCM_Manual_1161174_us_certified_refurbished_renewed_card_resize_260x260_1550730012_jpg._CB454883446_SY260_.jpg">https://images-na.ssl-images-amazon.com/images/…_260x260_1550730012_jpg</a></p>
<p>First of all, the image comes with a height & width in pixels, e.g. the 260x260 we see from Amazon.</p>
<p>What's the actual file size, i.e. how much disk space does it take to store a 260x260-pixel image?</p>
<p>Image size = 260 × 260 pixels × 3 bytes ÷ 1024 ≈ 198 KB</p>
<p>A little bit explanation on the equation:</p>
<p>JPEG images on the web normally consist of 3 color channels: RGB, "red, green, blue".
Each channel is denoted by a number between 0 and 255, which takes 8 bits to store, each bit representing 0 or 1.</p>
<p>Hence 1 pixel of an image can use 3 bytes to denote its 3 color channels.</p>
<p>1 KB = 1024 bytes.</p>
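<p>The arithmetic above as a tiny sketch (the helper name is my own):</p>

```python
def raw_rgb_size_kb(width, height, bytes_per_pixel=3):
    """Uncompressed size of an RGB image: 1 byte per channel, 3 channels per pixel."""
    return width * height * bytes_per_pixel / 1024  # in KB

print(round(raw_rgb_size_kb(260, 260)))  # 198
```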
<p>This is the same size Photoshop reports if you open the image.
But wait: if you check the image file info in your operating system, Mac or Windows, you will find the "size" is not the 198 KB we calculated.</p>
<p>What’s going on?</p>
<p>With compressed image formats, the file size is not determined by the pixel count alone. As we introduced in a previous post, <a href="http://up9rade.com/2018/08/26/what-is-image-jpeg-heif-grade-1.html">What is image file</a>, JPEG is the commonly used lossy compression format for images.</p>
Convolution in forward and backward (2019-03-04)
<h1 id="convolution-in-forward-and-backward">Convolution in forward and backward</h1>
<p>A convolutional neural network is a specific type of MLP (multi-layer perceptron), and it complies with the MLP's core operation:
<script type="math/tex">y = w*x + b</script></p>
<p>With x propagated forward, we obtain a cost function (which is a scalar):</p>
<p><script type="math/tex">J(\theta, x, y)</script>
We then backpropagate through J, using the gradient descent methodology, to compute:
<script type="math/tex">\frac{\partial J(\theta, x, y)}{\partial \theta}</script></p>
<p>Once the optimization of the cost function J is done, we have obtained the representation parameters for the given training data x and label data y.</p>
<p>The following is a simplified convolutional network: assume an image input with 1 channel, a kernel filter with 1 channel, and padding on the input layer so that the output size equals the input size.</p>
<p><img src="http://up9rade.com/img/nn027.png" /></p>
<p>We have: $\frac{\partial L}{\partial y}$ already known.</p>
<p>Two unknowns are to be solved:</p>
<script type="math/tex; mode=display">\frac{\partial L}{\partial w} \qquad and \qquad \frac{\partial L}{\partial x}</script>
<h3 id="part-1">Part 1:</h3>
<p>According to “chain rule” in calculus,</p>
<script type="math/tex; mode=display">\frac{\partial L}{\partial w} = \frac{\partial L}{\partial y} * \frac{\partial y}{\partial w}</script>
<script type="math/tex; mode=display">\frac{\partial L}{\partial w} = \frac{\partial L}{\partial y} * \frac{\partial {(x * w)}}{\partial w} = \frac{\partial L}{\partial y} * x^T</script>
<p>How does this magic $x^T$ come about?</p>
<p>Well, let's review the traditional and im2col convolution processes again; you can easily find more references about both on the web.</p>
<p><img src="http://up9rade.com/img/nn028.png" /></p>
<p>Here we'd like to emphasize that, through the im2col method, we can transform convolution into a matrix product.</p>
<p>And if we simplify down to a single kernel, the derivative actually turns into a "Jacobian matrix"; by doing that, the kernel and the output both turn into "column vectors".</p>
<p>The following is an intuitive explanation of why an $x^T$ appears.</p>
<script type="math/tex; mode=display">x = \begin{bmatrix}
x_1, x_2, x_3
\end{bmatrix}
\quad
w = \begin{bmatrix}
w_1 \\ w_2 \\ w_3
\end{bmatrix}
\\
y = xw = x_1w_1 + x_2w_2 + x_3w_3
\\
\frac{\partial y}{\partial w} = \begin{bmatrix}
\frac{\partial y}{\partial w_1} \\ \frac{\partial y}{\partial w_2} \\ \frac{\partial y}{\partial w_3}
\end{bmatrix} = \begin{bmatrix}
x_1 \\ x_2 \\ x_3
\end{bmatrix}
= x^T</script>
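<p>We can check this toy derivation numerically. A sketch with made-up numbers and a made-up scalar loss $L = y^2$ (so $\frac{\partial L}{\partial y} = 2y$), comparing the analytic $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial y} \cdot x^T$ against central finite differences:</p>

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0]])           # row vector x, 1x3
w = np.array([[0.5], [-1.0], [2.0]])      # column vector w, 3x1

def loss(w):
    """Toy loss L = y^2 with y = x w (an arbitrary choice, just for the check)."""
    return ((x @ w) ** 2).item()

y = (x @ w).item()
dL_dy = 2.0 * y                           # analytic dL/dy for L = y^2
analytic = dL_dy * x.T                    # dL/dw = dL/dy * x^T, a 3x1 column

# central finite differences on each component of w
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(3):
    wp, wm = w.copy(), w.copy()
    wp[i, 0] += eps
    wm[i, 0] -= eps
    numeric[i, 0] = (loss(wp) - loss(wm)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```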
<h3 id="part-2">Part 2:</h3>
<p>According to “chain rule” in calculus,</p>
<script type="math/tex; mode=display">\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} * \frac{\partial y}{\partial x}</script>
<p>We have $\frac{\partial L}{\partial y}$ already known, and going to figure out $\frac{\partial y}{\partial x}$.</p>
<p>In saying $\frac{\partial y}{\partial x}$, the question behind the scenes is: what scope of impact will a one-unit change in "x" have on "y"?</p>
<p>Let’s take a look at the convolution procedure again:</p>
<p><img src="http://up9rade.com/img/nn029.png" /></p>
<p>Take a pixel from upper left of the input, say “p”. What’s the scope of impact of “p”, will put on output “y”?</p>
<p>Through the 2*2 kernel, we can see that 1 pixel from "x" will affect 2*2 pixels in "y", in the sequence "pd, pc, pb, pa".</p>
<p>A few details on how we get the "pd, pc, pb, pa":</p>
<p>With a 2*2 kernel and stride 1, the output value of y at the upper left equals p*d + f(remaining 3 pixels * remaining 3 weights), in short p*d + f(c).</p>
<p>Two notes: to keep the output size equal to the input x, we added padding to x, denoted by the dashed lines. And f(c) is not important, because it is eliminated when calculating the derivative $\frac{\partial y_p}{\partial x_p}$.</p>
<p>Now let’s move back to the implementation, we have:</p>
<script type="math/tex; mode=display">p * kernel(a, b, c, d) = \{pd, pc, pb, pa\}</script>
<p>This is exactly:</p>
<script type="math/tex; mode=display">p * kernel^{flip} = \{pd, pc, pb, pa\}</script>
<p>What an interesting finding!</p>
<p>By solving $\frac{\partial y_p}{\partial x_p}$, we now get:</p>
<script type="math/tex; mode=display">\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} * w^{flip}</script>
<p>Let’s find out more about the multiplication:</p>
<p>Assume the input size is $N_1 * N_2$ and the kernel size is $K_1 * K_2$; with padding, the output size is the same as the input: $N_1 * N_2$.</p>
<script type="math/tex; mode=display">|\frac{\partial L}{\partial y}| = N_1 * N_2
\qquad
|w^{flip}| = K_1 * K_2</script>
<p>Hence, this is another convolution.</p>
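<p>A numerical sketch of this conclusion (plain NumPy loops; the forward pass is written as the cross-correlation used by CNN frameworks, and all names are my own): pad $\frac{\partial L}{\partial y}$ by K-1 on each side, correlate it with the flipped kernel, and compare against finite differences on the input.</p>

```python
import numpy as np

def corr2d(x, k):
    """Valid 2-D cross-correlation (the 'convolution' used in CNN layers)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))
w = rng.normal(size=(2, 2))
dL_dy = rng.normal(size=(3, 3))            # pretend upstream gradient

# backward w.r.t. x: "full" correlation of dL/dy with the flipped kernel
pad = np.pad(dL_dy, 1)                     # pad by kh-1 = kw-1 = 1 on each side
dL_dx = corr2d(pad, w[::-1, ::-1])         # flip the kernel in both axes

# finite-difference check against L = sum(dL_dy * corr2d(x, w))
eps = 1e-6
num = np.zeros_like(x)
for i in range(4):
    for j in range(4):
        xp, xm = x.copy(), x.copy()
        xp[i, j] += eps
        xm[i, j] -= eps
        num[i, j] = (np.sum(dL_dy * corr2d(xp, w)) - np.sum(dL_dy * corr2d(xm, w))) / (2 * eps)

print(np.allclose(dL_dx, num, atol=1e-4))  # True
```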
<p>To recap: to calculate $\frac{\partial L}{\partial w}$, we transform the convolution into a matrix multiplication via the im2col method.</p>
<p>To calculate $\frac{\partial L}{\partial x}$, we take one pixel from the input and observe its consequences in the output.</p>
<p>With this in mind, let’s come to the conclusion:</p>
<h3 id="conclusion">Conclusion:</h3>
<p>The forward process is convolution.</p>
<p>The backward process is also convolution, with flipped kernel.</p>
<p>[post status: almost done]</p>
Gramian Matrix [series 2] (2019-01-13)
<h1 id="gramian-matrix">Gramian Matrix</h1>
<h2 id="series-1-gramian-matrix">series 1: Gramian Matrix</h2>
<p>Given a set V of m vectors, the Gram matrix G is simply the matrix of all pairwise inner products of V.</p>
<script type="math/tex; mode=display">G_{ij} = V^T_i V_j</script>
<p>T denotes the transpose.</p>
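<p>In NumPy terms, a minimal sketch with the m vectors stacked as rows of V (toy values of my own):</p>

```python
import numpy as np

V = np.array([[1.0, 0.0],    # v1
              [0.0, 2.0],    # v2
              [1.0, 1.0]])   # v3
G = V @ V.T                  # G[i, j] = <v_i, v_j>
print(G)
```

<p>The diagonal holds each vector's squared norm, and G is symmetric by construction.</p>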
<p>Gram matrix plays a key role in the task of style transfer in deep neural network.</p>
<p>A pre-trained model, usually VGG19, is introduced into the transfer process. The reasoning behind using a pre-trained model is the assumption that a model able to classify many versatile categories should be able to understand the content of the images we give it, though this assumption can be challenged as well.</p>
<p>Getting back to the Gram matrix: when comparing the pastiche image against a given image with a certain "style", how should the "style" of an image be represented mathematically?</p>
<p>As we know, a Gram matrix results from multiplying a matrix by its transpose, i.e. every row is multiplied with every column of the matrix. We can treat this process as a way of finding correlations: big values multiplied by big values get bigger, and small ones get smaller.</p>
<p>A value in the Gram matrix close to zero denotes that the corresponding two features in the given layer of the style image do not activate simultaneously, and vice versa. Hence the Gram matrix can reveal the activation pattern of the style image.</p>
<p>In the 2015 neural style transfer paper, Gatys et al. found that better results were obtained by combining both shallow and deep layers of the pre-trained VGG network.</p>
Volume of Cone [series 1] (2019-01-02)
<h1 id="volume-of-cone">Volume of Cone</h1>
<h2 id="series-1-volume-of-cone">series 1: Volume of Cone</h2>
<p>We will calculate the volume by cross sections, with the help of the definite integral.</p>
<p>Let's first look at horizontal cross sections:</p>
<p><img src="http://up9rade.com/img/math01.png" /></p>
<p>Let's denote the radius of the cross section at position x by "y"; now we figure out its formula:</p>
<script type="math/tex; mode=display">\tan\theta = \frac yx = \frac rh</script>
<script type="math/tex; mode=display">y = \frac rh * x</script>
<script type="math/tex; mode=display">\int_{x=0}^{x=h}\pi(\frac rh * x)^2dx = \pi(\frac rh)^2\int_{x=0}^{x=h}x^2dx</script>
<script type="math/tex; mode=display">V = \pi(\frac rh)^2\frac {h^3} {3} = \frac 13\pi r^2h</script>
<p>The above procedure seems nice & neat.</p>
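<p>We can sanity-check the integral numerically by literally stacking thin discs, a quick sketch with toy values of my own:</p>

```python
import numpy as np

r, h = 2.0, 5.0
n = 100_000
xs = (np.arange(n) + 0.5) * (h / n)   # midpoints of n thin slabs along the axis
ys = (r / h) * xs                     # radius of the cross section at each x
V = np.sum(np.pi * ys**2) * (h / n)   # sum the disc areas pi*y^2 times the slab thickness
print(V, np.pi * r**2 * h / 3)        # the two values nearly coincide
```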
<p>Now let’s challenge ourselves by asking:</p>
<p>Why should we calculate the volume by figuring out "y" instead of "x"?</p>
<p>Or: why shouldn't we calculate it by vertical cross sections?</p>
<p><img src="http://up9rade.com/img/math02.png" /></p>
<script type="math/tex; mode=display">\tan\theta = \frac xy = \frac hr</script>
<script type="math/tex; mode=display">x = \frac hr * y</script>
<script type="math/tex; mode=display">\int_{y=0}^{y=r}\frac 12 * y * \frac hr * y * 2 * 2 \, dy = 2 (\frac hr)\int_{y=0}^{y=r}y^2dy</script>
<script type="math/tex; mode=display">V = \frac 23 * h * r^2</script>
<p>Compare with previous one:</p>
<script type="math/tex; mode=display">\frac 13 \pi r^2 * h \neq \frac 23 * h * r^2</script>
<p>So, where is the error?</p>
<p>Following is the cite of <a href="https://en.wikipedia.org/wiki/Conic_section">Conic Section</a> from wikipedia:</p>
<p><img src="http://up9rade.com/img/math03.png" /></p>
<p>So don't be misled by the wrong inference above: the vertical cross sections (other than the one through the apex) are hyperbolas, not triangles.</p>
<p>Or think in this way:</p>
<p>Under that mistaken assumption, we were actually calculating the volume of a pyramid; just think of the radius "r" as describing a special case of a pyramid:</p>
<p><img src="http://up9rade.com/img/math04.png" /></p>
<p>We can see the base area of the pyramid in the above case is r * r * 2.
So the volume turns from:</p>
<script type="math/tex; mode=display">\frac 23 h r^2</script>
<p>into:</p>
<script type="math/tex; mode=display">\frac 13 h S</script>
<p>in which S is the area of the base of the pyramid; this is exactly the formula for the volume of a pyramid.</p>
<p>Math is fun!</p>
<p>[post status: almost done]</p>
Neural Network [grade 9] (2018-12-10)
<h1 id="receptive-field-with-cnn">Receptive Field with CNN</h1>
<h2 id="grade-9-receptive-field-with-cnn">grade 9: Receptive field with CNN</h2>
<p>Following is a description cited from <a href="https://en.wikipedia.org/wiki/Receptive_field">wikipedia</a>:</p>
<p>The receptive field of an individual sensory neuron is the particular region of the sensory space (e.g., the body surface, or the visual field) in which a stimulus will modify the firing of that neuron.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/68/Conv_layer.png/231px-Conv_layer.png" /></p>
<p>Borrowed from neuroscience: given a specific layer, the receptive field is how much area a neuron in that layer can see of the input layer.</p>
<p>The process of calculating the receptive field might be a little confusing, yet we can make it easy to interpret.</p>
<p>Let's start with convolution. Given a simple black & white image input, for easier interpretation we flatten the input from 2 dimensions into 1.</p>
<p>Assume there are 8 inputs, followed by a 3 * 3 conv with stride 1, then 2 * 2 pooling with stride 2, then another 3 * 3 conv with stride 1.</p>
<p>We may construct the network as following:</p>
<p>There are 6 outputs in layer 1, then 3 in layer 2 and 1 in layer 3.</p>
<p><img src="http://up9rade.com/img/nn025.png" /></p>
<p>This is a top down approach.</p>
<p>We will use a bottom-up approach to describe the receptive field. Let’s start with the single neuron in layer 3. Since layer 3 was generated by a 3 * 3 conv with stride 1, we can trace back to the 3 neurons in layer 2 that feed this neuron; this is the “receptive area” that layer 3 can see on layer 2.</p>
<p>Moving on, each of these neurons connects to 2 neurons in layer 1, because layer 2 was constructed by 2 * 2 pooling with stride 2.</p>
<p>After mapping those 6 neurons to 8 neurons in the input layer, we are done building the network bottom up.</p>
<p>Now take a look at the green neuron of layer 1: how much can it see of the input layer? 3. Which means the receptive field of layer 1 is 3.</p>
<p>Then look at the red neuron of layer 2: how many input-layer neurons can it see? 4, so the receptive field of layer 2 is 4.</p>
<p>Finally, how many can a neuron in layer 3 see? 8, so the receptive field of layer 3 is 8. Layer 3 can see 3 neurons in layer 2, 6 in layer 1, and 8 in the input layer.</p>
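<p>These numbers can be checked with a small bottom-up recursion (my own sketch; the kernel/stride pairs are the ones assumed in the example above):</p>

```python
def rf_bottom_up(layers):
    """Receptive field after each layer; layers = [(kernel, stride), ...].

    Each new kernel extends the RF by (kernel - 1) * jump input elements,
    where jump is the product of all strides below it.
    """
    rf, jump = 1, 1
    rfs = []
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
        rfs.append(rf)
    return rfs

# 3 * 3 conv stride 1, then 2 * 2 pooling stride 2, then 3 * 3 conv stride 1
print(rf_bottom_up([(3, 1), (2, 2), (3, 1)]))  # [3, 4, 8]
```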
<p>The receptive field is not that hard to interpret.</p>
<p>The convolution process is like distilling a sea into a drop of water, while calculating the RF (receptive field) is like seeing the whole sea from a drop of water.</p>
<p>Convolution abstracts higher-level info from a picture; receptive field calculation figures out how many input elements are counted in one single neuron.</p>
<p><img src="http://up9rade.com/img/nn026.png" /></p>
<p>Now, what is calculating the receptive field good for?</p>
<ol>
<li>Estimating parameters in a network, to make the network computationally cheaper while maintaining similar representation power. Looking at the network above, with a 3 * 3 conv, then 2 * 2 pooling, then a 3 * 3 conv, we can see it has the same receptive field as a single 8 * 8 conv.</li>
<li>When building a network for a classification task, we pay attention to the receptive field of the deepest layer: can it see the whole input picture? If not, performance will suffer because the receptive field is not big enough.</li>
</ol>
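<p>A quick parameter count behind point 1 (single input channel, single filter, biases ignored; my own arithmetic, not from the post):</p>

```python
# Stacked version: 3 * 3 conv, 2 * 2 pooling (no weights), 3 * 3 conv
stacked = 3 * 3 + 3 * 3   # 18 weights
# A single 8 * 8 conv covering the same receptive field
single = 8 * 8            # 64 weights
print(stacked, single)    # 18 64
```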
<p>We gave an easy-to-interpret explanation of the receptive field; in reality we face very deep networks, and under these circumstances we need to calculate receptive fields programmatically.</p>
<p>Following is the recursive formula for computing it, applied from the top layer back down to the input:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">receptive_field</span> <span class="o">=</span> <span class="n">stride</span> <span class="o">*</span> <span class="n">rf_size_output</span> <span class="o">+</span> <span class="n">kernel_size</span> <span class="o">-</span> <span class="n">stride</span>
</code></pre></div></div>
<p>The complete code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>net_struct = {
    'alexnet': {
        # each entry: [kernel_size, stride, padding]
        'net': [[11,4,0],[3,2,0],[5,1,2],[3,2,0],[3,1,1],[3,1,1],[3,1,1],[3,2,0]],
        'name': ['conv1','pool1','conv2','pool2','conv3','conv4','conv5','pool5']},
}

def receptive_field(net, layer_num):
    RF = 1
    for layer in reversed(range(layer_num)):
        fsize, stride, pad = net[layer]  # padding does not change the RF size
        RF = (RF - 1) * stride + fsize
    return RF

for net in net_struct.keys():
    print('************ network %s ************' % net)
    for i in range(len(net_struct[net]['net'])):
        rf = receptive_field(net_struct[net]['net'], i + 1)
        print('Layer Name = %s, RF size = %3d' % (net_struct[net]['name'][i], rf))
</code></pre></div></div>
<p>Tensorflow provides a receptive field calculation util; you might want to look at its <a href="https://github.com/tensorflow/tensorflow/tree/092a49a2bf181a3571a5b1994b6b9305313a0403/tensorflow/contrib/receptive_field">source code</a> to learn more.</p>
<p>[post status: almost done]</p>Receptive Field with CNNNeural Network [grade 8]2018-12-09T00:00:00+00:002018-12-09T00:00:00+00:00http://up9rade.com/2018/12/09/neural-network-grade-8<h1 id="convolution-with-cnn">Convolution with CNN</h1>
<h2 id="grade-8-convolution-with-cnn">grade 8: Convolution with CNN</h2>
<p>There’s a very comprehensive introduction on convolutional neural networks from <a href="http://cs231n.github.io/convolutional-networks/">cs231n</a>.</p>
<p>We will focus on how to implement a cnn from scratch (with numpy) and apply it to the mnist dataset.</p>
<p>Key steps for implementation:</p>
<ol>
<li>On the feed forward, the convolutional operation;</li>
<li>On the back propagation, there are a couple of items that differ significantly from the deep neural networks we introduced before:
<ul>
<li>the pooling layer does not have an activation function like relu/sigmoid, as the convolution layer does;</li>
<li>the pooling layer takes a down-sampling strategy on the forward pass, which means data is compressed;</li>
<li>a dnn applies matrix multiplication to each layer directly, while a cnn applies sums of convolutional operations at each layer;</li>
<li>in a word, it is vectors that participate in dnn operations, and tensors that participate in cnn operations.</li>
</ul>
</li>
</ol>
<p>We need to handle these differences to implement back propagation in a cnn.</p>
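<p>As a sketch of key step 1, here is a naive numpy “valid” convolution (strictly speaking a cross-correlation, as CNNs usually implement it); this is only an illustration, not the post’s final implementation:</p>

```python
import numpy as np

def conv2d_valid(x, w, stride=1):
    """Naive 2-D valid convolution of a single-channel input x with kernel w."""
    H, W = x.shape
    kH, kW = w.shape
    out_h = (H - kH) // stride + 1
    out_w = (W - kW) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # elementwise product of the kernel with the current window, summed
            patch = x[i*stride:i*stride+kH, j*stride:j*stride+kW]
            out[i, j] = np.sum(patch * w)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
print(conv2d_valid(x, w).shape)  # (2, 2)
```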
<p>[post status: in writing]</p>Convolution with CNNPhotorealistic Style Transfer2018-11-18T00:00:00+00:002018-11-18T00:00:00+00:00http://up9rade.com/2018/11/18/photorealistic-style-transfer<h1 id="photorealistic-style-transfer">Photorealistic Style Transfer</h1>
<p>In this series we are going to discuss the image-to-image translation problem.</p>
<p>Photorealistic style transfer aims to transfer the style of a reference image while eliminating distortions in the generated image, keeping the result photorealistic; this process can be treated as a special kind of image-to-image translation.</p>
<p>However, this transfer process does not need a set of training data or labelling.</p>
<p>Gatys et al. introduced neural style transfer in 2015.
The state-of-the-art works are by:
Luan et al.: Deep Photo Style Transfer, in 2017.
Li et al.: A Closed-form Solution to Photorealistic Image Stylization, in 2018.</p>
<p>They took different approaches:</p>
<p>Luan et al. treat the style transfer as an affine transform in color space from input to output, while Li et al. reckon the transfer should not be limited to color only, but should also introduce patterns from the style image into the output.</p>
<p>To achieve the color transfer:</p>
<p>Luan et al. implemented the following: use an affine function that maps the input RGB values onto their output counterparts, with the style loss augmented by semantic segmentation.</p>
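<p>To make “an affine function in color space” concrete, here is a toy numpy sketch (not Luan et al.’s actual optimization; the data here is synthetic): fit out = A @ rgb + b between two sets of pixel colors by least squares, which recovers the affine map exactly when one exists.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
src = rng.random((100, 3))                     # source RGB values in [0, 1]
A_true = np.array([[0.9, 0.05, 0.0],
                   [0.0, 1.1,  0.0],
                   [0.1, 0.0,  0.8]])
b_true = np.array([0.05, -0.02, 0.1])
dst = src @ A_true.T + b_true                  # target RGB values

# Append a constant 1 per pixel so the bias is fitted jointly with A
X = np.hstack([src, np.ones((100, 1))])
coef, *_ = np.linalg.lstsq(X, dst, rcond=None)
A_fit, b_fit = coef[:3].T, coef[3]
print(np.allclose(A_fit, A_true))  # True
```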
<p>Li et al. take 2 steps: style transfer based on a refined WCT (whitening and coloring transform), and smoothing based on a graph-based ranking algorithm.</p>
<p>The latter is much more computationally efficient than the former.</p>
<h2 id="grade-7-mini-rnn">grade 7: mini RNN</h2>
<p>In the following we will implement a simple version of an RNN.
We will come back to CNNs in more detail later. : )</p>
<p>While a CNN overcomes the computational complexity of traditional fully connected networks, it has its own shortcoming: it has no memory of previous events.</p>
<p>For instance, if a CNN robot sat in a cinema watching a movie, most likely it would not understand what is going on or what the story is; unlike a human, the CNN won’t memorize previous scenes to help understand what is happening now and what will come next.</p>
<p>With that said, an RNN is not that mysterious; it can still be looked at as a kind of CNN, in some way.</p>
<p>It’s easy for human beings to predict what comes next in a “say hi” dialog scenario, and so it is for an RNN.
It’s difficult for people to infer the ending of a story book using only the information given at the beginning, and so it is for a plain RNN. The “refined” RNN, the Long Short-Term Memory network (LSTM), aims to resolve this.</p>
<p>[post status: in writing]</p>Mini RNN