<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.3">Jekyll</generator><link href="https://kgrz.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://kgrz.io/" rel="alternate" type="text/html" /><updated>2024-06-05T13:04:12+05:30</updated><id>https://kgrz.io/feed.xml</id><title type="html">Kashyap’s blog</title><subtitle>Kashyap&apos;s Blog</subtitle><author><name>Kashyap Kondamudi</name></author><entry><title type="html">Rewrites outside location blocks in Nginx are bad!</title><link href="https://kgrz.io/rewrites-outside-location-in-nginx-bad.html" rel="alternate" type="text/html" title="Rewrites outside location blocks in Nginx are bad!" /><published>2024-06-05T00:00:00+05:30</published><updated>2024-06-05T00:00:00+05:30</updated><id>https://kgrz.io/rewrites-outside-location-in-nginx-bad</id><content type="html" xml:base="https://kgrz.io/rewrites-outside-location-in-nginx-bad.html"><![CDATA[<p>When we ran our recent performance tests on one of our nginx clusters, I noticed something odd: the CPU was choking at a request rate far too low for a system like that. It’s a static proxy server running vanilla nginx, and the downstream servers were doing okay in terms of latency. The CPU on this system shouldn’t choke before those downstream systems saturate, but it did. <code class="language-plaintext highlighter-rouge">perf</code> reports on the process showed most of the samples occupied by symbols related to <code class="language-plaintext highlighter-rouge">rewrite</code> and <code class="language-plaintext highlighter-rouge">ngx_http_regex_exec</code>. While we have a lot of <code class="language-plaintext highlighter-rouge">location</code> rules in this codebase, and many of them are regular-expression matchers, it <em>seemed</em> like way too much time was spent in these routines. 
What’s worse is that this happened even when we ran the benchmarks with known-wrong URLs or triggered the block/rate-limiting configurations, which <em>should</em> bypass most of the <code class="language-plaintext highlighter-rouge">location</code> matching anyway. At one point I noticed that nginx (depending on compilation flags) supports a <a href="https://nginx.org/en/docs/ngx_core_module.html#pcre_jit" title="pcre_jit directive documentation">PCRE JIT engine</a> configuration that promises faster regular expression matching. Turning this <code class="language-plaintext highlighter-rouge">on</code> did improve the situation quite a bit, but it wasn’t anything spectacular. The regex symbols still made up a large share of the <code class="language-plaintext highlighter-rouge">perf</code> samples for every kind of URL we tried. Further debugging pointed to a combination of two issues:</p>

<ol>
  <li>We have lots of <code class="language-plaintext highlighter-rouge">rewrite</code> rules within a <code class="language-plaintext highlighter-rouge">server</code> block outside of <code class="language-plaintext highlighter-rouge">location</code> blocks.</li>
  <li>The bypass routines for known errors and/or rate limiting used relative paths in the config, which nginx handles as internal redirects.</li>
</ol>

<p>Our config had roughly this shape:</p>

<figure class="highlight">
  <pre><code class="language-nginx" data-lang="nginx"><span class="k">http</span> <span class="p">{</span>
  <span class="kn">server</span> <span class="p">{</span>
    <span class="kn">server_name</span> <span class="s">x.example.com</span> <span class="s">localhost</span><span class="p">;</span>
    <span class="kn">listen</span> <span class="mi">80</span><span class="p">;</span>

    <span class="kn">error_page</span>  <span class="mi">404</span> <span class="n">/404.html</span><span class="p">;</span>
    <span class="kn">error_page</span>  <span class="mi">403</span> <span class="n">/403.html</span><span class="p">;</span>

    <span class="c1"># hundreds of rewrite rules go here, intended to "normalize" matching URLs</span>
    <span class="c1"># rewrite {regex} ...</span>
    <span class="c1"># rewrite {regex} ...</span>
    <span class="c1"># rewrite {regex} ...</span>
    <span class="c1"># ... and so on</span>

    <span class="kn">location</span> <span class="n">/404.html</span> <span class="p">{</span>
      <span class="kn">root</span> <span class="n">/srv/www/html</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kn">location</span> <span class="n">/403.html</span> <span class="p">{</span>
      <span class="kn">root</span> <span class="n">/srv/www/html</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kn">location</span> <span class="n">/</span> <span class="p">{</span>
      <span class="kn">proxy_pass</span> <span class="s">http://upstream</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="c1"># ... and so on</span>
  <span class="p">}</span>
<span class="p">}</span></code></pre>
</figure>

<p class="caption">In our case the configuration is spread over hundreds of files, and the <code class="language-plaintext highlighter-rouge">rewrite</code> rules shown here all live in a separate file, included right after the main <code class="language-plaintext highlighter-rouge">server</code> block directives and before any <code class="language-plaintext highlighter-rouge">location</code> block definitions start.</p>

<p>A long time ago on the project, a decision was made to add a file containing a few URL matching rules that had to run in order for every URL, before any <code class="language-plaintext highlighter-rouge">location</code> matching, as a sort of pre-processor to “normalize” URLs.</p>

<p>This is <em>okay</em> as long as it doesn’t get abused. But slowly, over time, rules were added there that should simply have been <code class="language-plaintext highlighter-rouge">location</code> blocks instead. For the uninitiated, <code class="language-plaintext highlighter-rouge">location</code> matching tends to be more efficient, because nginx builds a lookup tree of prefix locations at startup rather than testing rules serially one by one, which is effectively what our rewrite file had become. The file turned into a kitchen sink of sorts for every “wrong” URL; at the time of debugging, the number of such rules ran into the hundreds. These rules get executed multiple times if there are internal redirects, and <code class="language-plaintext highlighter-rouge">rewrite</code> rules themselves can be internal redirects if they don’t use one of the bypassing <a href="https://nginx.org/en/docs/http/ngx_http_rewrite_module.html#rewrite" title="rewrite directive documentation"><code class="language-plaintext highlighter-rouge">flags</code></a> like <code class="language-plaintext highlighter-rouge">break</code>, <code class="language-plaintext highlighter-rouge">redirect</code> or <code class="language-plaintext highlighter-rouge">permanent</code>, which further exacerbates the problem. This was true in our case: since the intent was to run these in order, <code class="language-plaintext highlighter-rouge">last</code> and <code class="language-plaintext highlighter-rouge">break</code> couldn’t be used by definition.</p>
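<p>To make the “serially one by one” cost concrete, here is a toy model in Go. It is not nginx code and not how nginx builds its location tree. The hypothetical <code class="language-plaintext highlighter-rouge">buildRules</code>, <code class="language-plaintext highlighter-rouge">applySerial</code> and <code class="language-plaintext highlighter-rouge">applyByPrefix</code> helpers assume the rules are anchored at the start of the URI; the rules in our config were unanchored, so grouping them by prefix would be a behavioural change to verify first. Every request pays one regex evaluation per naked rule, while a lookup on the first path segment only pays for the rules that could match.</p>

<figure class="highlight">
  <pre><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a toy stand-in for one `rewrite regex replacement;` line.
type Rule struct {
	re          *regexp.Regexp
	replacement string
}

// buildRules is a hypothetical helper: one anchored rule per name, kept both
// as a flat ordered list and grouped by the literal path prefix it needs.
func buildRules(names []string) ([]Rule, map[string][]Rule) {
	var ordered []Rule
	byPrefix := map[string][]Rule{}
	for _, name := range names {
		prefix := "/" + name + "/"
		r := Rule{regexp.MustCompile("^" + prefix + "(.*)"), "/unknown/$1"}
		ordered = append(ordered, r)
		byPrefix[prefix] = append(byPrefix[prefix], r)
	}
	return ordered, byPrefix
}

// applySerial tests every rule, in order, against uri, like naked
// server-level rewrites: each rule costs one regex evaluation whether or not
// it matches. A match replaces the whole URI with the expanded replacement.
func applySerial(rules []Rule, uri string) (string, int) {
	evals := 0
	for _, r := range rules {
		evals++
		if m := r.re.FindStringSubmatchIndex(uri); m != nil {
			uri = string(r.re.ExpandString(nil, r.replacement, uri, m))
		}
	}
	return uri, evals
}

// applyByPrefix looks up the first path segment in a map first, so only the
// rules filed under that prefix are evaluated.
func applyByPrefix(byPrefix map[string][]Rule, uri string) (string, int) {
	first := strings.SplitN(strings.TrimPrefix(uri, "/"), "/", 2)[0]
	return applySerial(byPrefix["/"+first+"/"], uri)
}

func main() {
	names := []string{"unknown1", "unknown2", "unknown3", "unknown4"}
	ordered, byPrefix := buildRules(names)

	for _, uri := range []string{"/main", "/unknown3/page"} {
		s, sEvals := applySerial(ordered, uri)
		p, pEvals := applyByPrefix(byPrefix, uri)
		fmt.Printf("%-16s serial: %s (%d evals)  prefix: %s (%d evals)\n",
			uri, s, sEvals, p, pEvals)
	}
}</code></pre>
</figure>

<p>With four rules, <code class="language-plaintext highlighter-rouge">/main</code> costs four regex evaluations in the serial version and none after the prefix lookup; with hundreds of rules the serial cost grows linearly for every single request. One caveat of the sketch: in the serial version a rewritten URI can go on to match later rules, which the prefix version does not model.</p>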

<p>Secondly, we used relative paths for all the <a href="https://nginx.org/en/docs/http/ngx_http_core_module.html#error_page" title="error_page directive documentation"><code class="language-plaintext highlighter-rouge">error_page</code></a> configurations, mostly as a carry-over from <a href="https://nginx.org/en/docs/example.html" title="Nginx configuration example">nginx configurations</a> that are documented pretty much everywhere<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. So when an error status is triggered, nginx performs an internal redirect to that path and redoes the processing from the beginning, server-level <code class="language-plaintext highlighter-rouge">rewrite</code> rules included. In isolation this is not a problem, and I can understand why the default documentation snippets use this pattern. In our case, though, the two problems combined into a cascading effect: when testing our rate limiting and error handling checks, which <em>should’ve</em> bypassed the relatively costly <code class="language-plaintext highlighter-rouge">location</code> matching, <em>every</em> rewrite rule got run <strong>twice</strong>, which made the performance pathologically worse!</p>
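<p>The cascading effect can be sketched as another toy model in Go, written only to count how often the server-level rewrite rules run. The <code class="language-plaintext highlighter-rouge">server</code> type, <code class="language-plaintext highlighter-rouge">newServer</code> and <code class="language-plaintext highlighter-rouge">handle</code> are hypothetical. The model assumes nginx’s documented behaviour that an <code class="language-plaintext highlighter-rouge">error_page</code> with a path target is an internal redirect (which goes through the server-level rewrites again), while a named location target jumps straight to that location, and it mimics the default <code class="language-plaintext highlighter-rouge">recursive_error_pages off</code> by applying <code class="language-plaintext highlighter-rouge">error_page</code> at most once.</p>

<figure class="highlight">
  <pre><code class="language-go" data-lang="go">package main

import (
	"fmt"
	"strings"
)

// server is a toy model of one nginx server block. It exists only to count
// how often the server-level rewrite rules run; it is not how nginx works
// internally.
type server struct {
	rewriteRules int            // number of naked server-level rewrite rules
	locations    map[string]int // location URI -> status it responds with
	errorPages   map[int]string // status -> error_page target
	evals        int            // server-level rewrite evaluations so far
}

// newServer is a hypothetical helper mirroring the example config: four
// naked rewrites and a single error_page for 404.
func newServer(notFoundTarget string) *server {
	s := new(server)
	s.rewriteRules = 4
	s.locations = map[string]int{"/main": 200, "/nonexistent": 404, "/404.html": 200}
	s.errorPages = map[int]string{404: notFoundTarget}
	return s
}

// handle processes one pass of a request and returns the location that ends
// up producing the response. errorPageAllowed mimics the default
// recursive_error_pages off: error_page is applied at most once.
func (s *server) handle(uri string, errorPageAllowed bool) string {
	// Every pass starts by trying all server-level rewrite rules.
	s.evals += s.rewriteRules

	status, found := s.locations[uri]
	if !found {
		status = 404 // no matching location in this toy model
	}
	target, hasPage := s.errorPages[status]
	if !hasPage || !errorPageAllowed {
		return uri
	}
	if strings.HasPrefix(target, "@") {
		// Named location: jump straight to it, skipping the server-level
		// rewrites and location matching.
		return target
	}
	// Path target: an internal redirect, so processing restarts from the top.
	return s.handle(target, false)
}

func main() {
	for _, target := range []string{"/404.html", "@404.html"} {
		s := newServer(target)
		served := s.handle("/nonexistent", true)
		fmt.Printf("error_page 404 %s: served by %s after %d rewrite evaluations\n",
			target, served, s.evals)
	}
}</code></pre>
</figure>

<p>For <code class="language-plaintext highlighter-rouge">/nonexistent</code> this counts 8 rewrite evaluations with the path target and 4 with the named target, which lines up with the doubled log lines in the example below.</p>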

<p>So here’s a PSA of sorts:</p>

<ol>
  <li>Try not to have <code class="language-plaintext highlighter-rouge">rewrite</code> rules outside of a <code class="language-plaintext highlighter-rouge">location</code> block</li>
  <li>Prefer named locations (like <code class="language-plaintext highlighter-rouge">@notfound</code>) over normal URLs/paths as targets for “jumps” such as the bypass internal redirects. This would’ve avoided the second execution of the <code class="language-plaintext highlighter-rouge">rewrite</code> rules. Something like the snippet below:</li>
</ol>

<figure class="highlight">
  <pre><code class="language-nginx" data-lang="nginx"><span class="k">error_page</span> <span class="mi">404</span> <span class="s">@notfound</span><span class="p">;</span>

<span class="k">location</span> <span class="s">@notfound</span> <span class="p">{</span>
  <span class="kn">try_files</span> <span class="n">/404.html</span> <span class="p">=</span><span class="mi">500</span><span class="p">;</span>
<span class="p">}</span></code></pre>
</figure>

<h3 id="example">Example</h3>

<p>Just to demonstrate this with an example, I’m going to use this configuration in nginx, which is deliberately close to what we had structurally:</p>

<figure class="highlight">
  <pre><code class="language-nginx" data-lang="nginx"><span class="k">events</span> <span class="p">{</span>  <span class="p">}</span>

<span class="k">error_log</span>   <span class="n">/dev/stderr</span> <span class="s">notice</span><span class="p">;</span>

<span class="k">http</span> <span class="p">{</span>
    <span class="kn">include</span>         <span class="n">/etc/nginx/mime.types</span><span class="p">;</span>
    <span class="kn">default_type</span>    <span class="nc">application/octet-stream</span><span class="p">;</span>
    <span class="kn">access_log</span>      <span class="n">/dev/stdout</span> <span class="s">combined</span><span class="p">;</span>
    <span class="kn">rewrite_log</span>     <span class="no">on</span><span class="p">;</span>

    <span class="c1"># The 404 page definition is copied verbatim from the example config that</span>
    <span class="c1"># every debian Nginx package ships with.</span>
    <span class="c1">#</span>
    <span class="c1"># Culprit 1:</span>
    <span class="kn">error_page</span> <span class="mi">404</span> <span class="n">/404.html</span><span class="p">;</span>
    <span class="kn">error_page</span> <span class="mi">403</span> <span class="n">/403.html</span><span class="p">;</span>

    <span class="kn">server</span> <span class="p">{</span>
        <span class="kn">listen</span>      <span class="mi">80</span><span class="p">;</span>
        <span class="kn">server_name</span> <span class="s">localhost</span><span class="p">;</span>
        <span class="kn">root</span> <span class="n">/usr/share/nginx/html</span><span class="p">;</span>

        <span class="c1"># Culprit 2: Naked rewrite rules that try to match for every route, even</span>
        <span class="c1"># internal redirects</span>
        <span class="kn">rewrite</span> <span class="n">/unknown1/</span><span class="s">(.*)</span>  <span class="n">/unknown/</span><span class="nv">$1</span><span class="p">;</span>
        <span class="kn">rewrite</span> <span class="n">/unknown2/</span><span class="s">(.*)</span>  <span class="n">/unknown/</span><span class="nv">$1</span><span class="p">;</span>
        <span class="kn">rewrite</span> <span class="n">/unknown3/</span><span class="s">(.*)</span>  <span class="n">/unknown/</span><span class="nv">$1</span><span class="p">;</span>
        <span class="kn">rewrite</span> <span class="n">/unknown4/</span><span class="s">(.*)</span>  <span class="n">/unknown/</span><span class="nv">$1</span><span class="p">;</span>

        <span class="c1"># This will try to send out the file /usr/share/nginx/html/index.html or</span>
        <span class="c1"># else respond with a 404 error page.</span>
        <span class="kn">location</span> <span class="p">=</span> <span class="n">/main</span> <span class="p">{</span>
            <span class="kn">try_files</span> <span class="s">index.html</span> <span class="n">/index.html</span> <span class="p">=</span><span class="mi">404</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="kn">location</span> <span class="n">/notauthorized</span> <span class="p">{</span>
            <span class="kn">return</span> <span class="mi">403</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="kn">location</span> <span class="n">/nonexistent</span> <span class="p">{</span>
            <span class="kn">return</span> <span class="mi">404</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="c1"># Only the 50x.html template is present in the latest nginx container,</span>
        <span class="c1"># so using that as a generic error page to keep things simple.</span>
        <span class="kn">location</span> <span class="n">/404.html</span> <span class="p">{</span> <span class="kn">try_files</span> <span class="s">50x.html</span> <span class="n">/50x.html</span> <span class="p">=</span><span class="mi">500</span><span class="p">;</span> <span class="p">}</span>
        <span class="kn">location</span> <span class="n">/403.html</span> <span class="p">{</span> <span class="kn">try_files</span> <span class="s">50x.html</span> <span class="n">/50x.html</span> <span class="p">=</span><span class="mi">500</span><span class="p">;</span> <span class="p">}</span>

        <span class="c1"># Always go to /main by issuing an internal rewrite</span>
        <span class="kn">location</span> <span class="n">/</span> <span class="p">{</span>
            <span class="kn">rewrite</span> <span class="s">^/.*</span>$ <span class="n">/main</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span></code></pre>
</figure>

<p>This sets up four main user-facing routes: <code class="language-plaintext highlighter-rouge">/</code>, <code class="language-plaintext highlighter-rouge">/main</code>, <code class="language-plaintext highlighter-rouge">/notauthorized</code> and <code class="language-plaintext highlighter-rouge">/nonexistent</code>. <code class="language-plaintext highlighter-rouge">/</code> redirects to <code class="language-plaintext highlighter-rouge">/main</code> internally, so the user won’t see any 3xx, while <code class="language-plaintext highlighter-rouge">/notauthorized</code> returns a 403 response and <code class="language-plaintext highlighter-rouge">/nonexistent</code> a 404. For the uninitiated, <a href="https://nginx.org/en/docs/http/ngx_http_core_module.html#try_files" title="try_files directive documentation"><code class="language-plaintext highlighter-rouge">try_files</code></a> (as used here) checks the paths given to it within the <code class="language-plaintext highlighter-rouge">root</code> path, or else returns the status code mentioned at the end. <a href="https://nginx.org/en/docs/http/ngx_http_core_module.html#error_page" title="error_page directive documentation"><code class="language-plaintext highlighter-rouge">error_page</code></a> configures an extra route that nginx uses when it has to respond with a particular status code. 
Effectively, when a <code class="language-plaintext highlighter-rouge">location</code> block returns an error status (with <a href="https://nginx.org/en/docs/http/ngx_http_rewrite_module.html#return" title="return directive documentation"><code class="language-plaintext highlighter-rouge">return</code></a>, as used in this case) and the matching <code class="language-plaintext highlighter-rouge">error_page</code> rule has a path as its target, nginx performs an internal redirect to that path, so the routing and rule execution behaviour is going to be the same for the 404 and 403 cases.</p>

<p>I’ll try the following four routes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl localhost/
curl localhost/main
curl localhost/notauthorized
curl localhost/nonexistent
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">/</code> and <code class="language-plaintext highlighter-rouge">/main</code> runs are just to demonstrate the extra <code class="language-plaintext highlighter-rouge">rewrite</code> between them. <code class="language-plaintext highlighter-rouge">rewrite_log on;</code> does what it says on the tin, and here’s a filtered snippet from the logs:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GET /
"/unknown1/(.*)" does not match "/"
"/unknown2/(.*)" does not match "/"
"/unknown3/(.*)" does not match "/"
"/unknown4/(.*)" does not match "/"
"^/.*$" matches "/"
rewritten data: "/main", args: ""

GET /main
"/unknown1/(.*)" does not match "/main"
"/unknown2/(.*)" does not match "/main"
"/unknown3/(.*)" does not match "/main"
"/unknown4/(.*)" does not match "/main"

GET /notauthorized
"/unknown1/(.*)" does not match "/notauthorized"
"/unknown2/(.*)" does not match "/notauthorized"
"/unknown3/(.*)" does not match "/notauthorized"
"/unknown4/(.*)" does not match "/notauthorized"
"/unknown1/(.*)" does not match "/403.html"
"/unknown2/(.*)" does not match "/403.html"
"/unknown3/(.*)" does not match "/403.html"
"/unknown4/(.*)" does not match "/403.html"

GET /nonexistent
"/unknown1/(.*)" does not match "/nonexistent"
"/unknown2/(.*)" does not match "/nonexistent"
"/unknown3/(.*)" does not match "/nonexistent"
"/unknown4/(.*)" does not match "/nonexistent"
"/unknown1/(.*)" does not match "/404.html"
"/unknown2/(.*)" does not match "/404.html"
"/unknown3/(.*)" does not match "/404.html"
"/unknown4/(.*)" does not match "/404.html"
</code></pre></div></div>

<p>Both the <code class="language-plaintext highlighter-rouge">/</code> route and <code class="language-plaintext highlighter-rouge">/main</code> work as expected: the naked <code class="language-plaintext highlighter-rouge">rewrite</code> rules run once, but for the other two they get executed twice. With the current config it’s a bit hard to demonstrate, but the same pathological case happens when a 403 or 404 arises naturally, for example for an undefined location. Using named locations, this is the rewritten (no pun intended) config:</p>

<figure class="highlight">
  <pre><code class="language-nginx" data-lang="nginx"><span class="k">events</span> <span class="p">{</span>  <span class="p">}</span>

<span class="k">error_log</span>   <span class="n">/dev/stderr</span> <span class="s">notice</span><span class="p">;</span>

<span class="k">http</span> <span class="p">{</span>
    <span class="kn">include</span>         <span class="n">/etc/nginx/mime.types</span><span class="p">;</span>
    <span class="kn">default_type</span>    <span class="nc">application/octet-stream</span><span class="p">;</span>
    <span class="kn">access_log</span>      <span class="n">/dev/stdout</span> <span class="s">combined</span><span class="p">;</span>
    <span class="kn">rewrite_log</span>     <span class="no">on</span><span class="p">;</span>

<span class="hll">    <span class="kn">error_page</span> <span class="mi">404</span> <span class="s">@404.html</span><span class="p">;</span>
</span><span class="hll">    <span class="kn">error_page</span> <span class="mi">403</span> <span class="s">@403.html</span><span class="p">;</span>
</span>
    <span class="kn">server</span> <span class="p">{</span>
        <span class="kn">listen</span>      <span class="mi">80</span><span class="p">;</span>
        <span class="kn">server_name</span> <span class="s">localhost</span><span class="p">;</span>
        <span class="kn">root</span> <span class="n">/usr/share/nginx/html</span><span class="p">;</span>

        <span class="c1"># Culprit 2: Naked rewrite rules that try to match for every route, even</span>
        <span class="c1"># internal redirects</span>
        <span class="kn">rewrite</span> <span class="n">/unknown1/</span><span class="s">(.*)</span>  <span class="n">/unknown/</span><span class="nv">$1</span><span class="p">;</span>
        <span class="kn">rewrite</span> <span class="n">/unknown2/</span><span class="s">(.*)</span>  <span class="n">/unknown/</span><span class="nv">$1</span><span class="p">;</span>
        <span class="kn">rewrite</span> <span class="n">/unknown3/</span><span class="s">(.*)</span>  <span class="n">/unknown/</span><span class="nv">$1</span><span class="p">;</span>
        <span class="kn">rewrite</span> <span class="n">/unknown4/</span><span class="s">(.*)</span>  <span class="n">/unknown/</span><span class="nv">$1</span><span class="p">;</span>

        <span class="c1"># This will try to send out the file /usr/share/nginx/html/index.html or</span>
        <span class="c1"># else respond with a 404 error page.</span>
        <span class="kn">location</span> <span class="p">=</span> <span class="n">/main</span> <span class="p">{</span>
            <span class="kn">try_files</span> <span class="s">index.html</span> <span class="n">/index.html</span> <span class="p">=</span><span class="mi">404</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="kn">location</span> <span class="n">/notauthorized</span> <span class="p">{</span>
            <span class="kn">return</span> <span class="mi">403</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="kn">location</span> <span class="n">/nonexistent</span> <span class="p">{</span>
            <span class="kn">return</span> <span class="mi">404</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="c1"># Only the 50x.html template is present in the latest nginx container,</span>
        <span class="c1"># so using that as a generic error page to keep things simple.</span>
<span class="hll">        <span class="kn">location</span> <span class="s">@404.html</span> <span class="p">{</span> <span class="kn">try_files</span> <span class="s">50x.html</span> <span class="n">/50x.html</span> <span class="p">=</span><span class="mi">500</span><span class="p">;</span> <span class="p">}</span>
</span><span class="hll">        <span class="kn">location</span> <span class="s">@403.html</span> <span class="p">{</span> <span class="kn">try_files</span> <span class="s">50x.html</span> <span class="n">/50x.html</span> <span class="p">=</span><span class="mi">500</span><span class="p">;</span> <span class="p">}</span>
</span>
        <span class="c1"># Always go to /main by issuing an internal rewrite</span>
        <span class="kn">location</span> <span class="n">/</span> <span class="p">{</span>
            <span class="kn">rewrite</span> <span class="s">^/.*</span>$ <span class="n">/main</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span></code></pre>
</figure>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GET /
"/unknown1/(.*)" does not match "/"
"/unknown2/(.*)" does not match "/"
"/unknown3/(.*)" does not match "/"
"/unknown4/(.*)" does not match "/"
"^/.*$" matches "/"
rewritten data: "/main", args: ""

GET /main
"/unknown1/(.*)" does not match "/main"
"/unknown2/(.*)" does not match "/main"
"/unknown3/(.*)" does not match "/main"
"/unknown4/(.*)" does not match "/main"

GET /notauthorized
"/unknown1/(.*)" does not match "/notauthorized"
"/unknown2/(.*)" does not match "/notauthorized"
"/unknown3/(.*)" does not match "/notauthorized"
"/unknown4/(.*)" does not match "/notauthorized"

GET /nonexistent
"/unknown1/(.*)" does not match "/nonexistent"
"/unknown2/(.*)" does not match "/nonexistent"
"/unknown3/(.*)" does not match "/nonexistent"
"/unknown4/(.*)" does not match "/nonexistent"
</code></pre></div></div>

<p>As expected, the <code class="language-plaintext highlighter-rouge">rewrite</code> rules now run only once per request. That said, the proper fix would be to refactor the rewrites into <code class="language-plaintext highlighter-rouge">location</code> blocks to improve the matching performance a little further.</p>

<hr />

<h4 id="footnotes">Footnotes</h4>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>A similar configuration also ships with the default Debian package, at least as of Debian 11, and with the official Nginx container, at least as of <a href="https://hub.docker.com/_/nginx/" title="Nginx official Dockerfile source"><code class="language-plaintext highlighter-rouge">1.27.0</code></a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Kashyap Kondamudi</name></author><category term="nginx" /><summary type="html"><![CDATA[Here's a PSA of sorts: 1. Try not to have `rewrite` rules outside of a location block in Nginx. 2. Prefer named routes for jumps or internal redirects in configs like `error_page` instead of normal URLs/paths. These cause multiple rounds of regex matches that tank system performance even (and especially) in the simplest cases.]]></summary></entry><entry><title type="html">Go can only read 1GiB per Read call</title><link href="https://kgrz.io/go-file-read-max-size-buffer.html" rel="alternate" type="text/html" title="Go can only read 1GiB per Read call" /><published>2024-02-07T00:00:00+05:30</published><updated>2024-02-07T00:00:00+05:30</updated><id>https://kgrz.io/go-file-read-max-size-buffer</id><content type="html" xml:base="https://kgrz.io/go-file-read-max-size-buffer.html"><![CDATA[<p>UPDATE: I don’t mean to say that this is a bad choice, or that it’s a bug, or even that it has performance implications. It’s just a choice that seemed a bit opaque without all the history spelunking I did here, and it’s interesting to see the reasoning behind it.</p>

<p>There’s a 1GiB limit for a single <code class="language-plaintext highlighter-rouge">Read</code> call for an <code class="language-plaintext highlighter-rouge">os.File</code> entity (object? struct?) in Go, even though the native <code class="language-plaintext highlighter-rouge">read</code> syscall can transfer close to 2GiB in one call (as tested on my ARM macOS and Intel Linux machines). I ran into this when looking at a pprof profile of a sample word count program I was writing, which showed the program was spending way too much time in the <code class="language-plaintext highlighter-rouge">syscall</code> module. In this context that can only mean one thing: way too many <code class="language-plaintext highlighter-rouge">read</code> syscalls were being made. Something like this would show this behaviour:</p>

<figure class="highlight">
  <pre><code class="language-go" data-lang="go"><span class="n">f</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">os</span><span class="o">.</span><span class="n">Open</span><span class="p">(</span><span class="s">"superlargefile.txt"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
    <span class="n">log</span><span class="o">.</span><span class="n">Fatal</span><span class="p">(</span><span class="s">"error opening input file: "</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">defer</span> <span class="n">f</span><span class="o">.</span><span class="n">Close</span><span class="p">()</span>

<span class="n">buf</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span> <span class="m">1024</span><span class="o">*</span><span class="m">1024</span><span class="o">*</span><span class="m">1024</span><span class="o">*</span><span class="m">2</span><span class="p">)</span> <span class="c">// 2GiB buffer</span>
<span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="s">"buffer size"</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span>

<span class="k">for</span> <span class="n">iter</span> <span class="o">:=</span> <span class="m">1</span><span class="p">;</span> <span class="p">;</span> <span class="n">iter</span> <span class="o">+=</span> <span class="m">1</span> <span class="p">{</span>
    <span class="n">n</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">f</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">err</span> <span class="o">==</span> <span class="n">io</span><span class="o">.</span><span class="n">EOF</span> <span class="p">{</span>
            <span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="s">"done"</span><span class="p">)</span>
            <span class="k">break</span>
        <span class="p">}</span>

        <span class="n">log</span><span class="o">.</span><span class="n">Fatal</span><span class="p">(</span><span class="s">"error reading input file: "</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
    <span class="p">}</span>

    <span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="s">"bytes read: "</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
    <span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="s">"iter: "</span><span class="p">,</span> <span class="n">iter</span><span class="p">)</span>
<span class="p">}</span></code></pre>
</figure>

<p>That, on a 2.5G file would output something like:</p>

<figure class="highlight">
  <pre><code class="language-text" data-lang="text">buffer size 2147483648
bytes read:  1073741824
iter:  1
bytes read:  1073741824
iter:  2
bytes read:  490442752
iter:  3
done</code></pre>
</figure>

<p>Even though the initialised buffer size is 2GiB, only 1GiB is read into the buffer per iteration. Upon digging into the source code, it looks like this is a deliberate choice. The main change logs from the history point to the following:</p>

<ol>
  <li><a href="https://codereview.appspot.com/89900044">https://codereview.appspot.com/89900044</a> as a fix for <a href="https://github.com/golang/go/issues/7812">golang/go#7812</a>. This fixed failing reads on file sizes greater than or equal to 2GiB on macOS and FreeBSD by capping each <code class="language-plaintext highlighter-rouge">read</code> syscall at 2GiB-1 bytes. On the other operating systems there was no cap at this point.</li>
  <li><a href="https://codereview.appspot.com/94070044">https://codereview.appspot.com/94070044</a> as a follow-up to 1, where the limit was lowered to 1GiB for all operating systems, with the explanation that this keeps subsequent reads aligned, as opposed to an odd number that might not line up with the page cache (my understanding).</li>
</ol>

<p>Note that a lot has changed since that changeset, and the current file reference for that <code class="language-plaintext highlighter-rouge">_unix.go</code> file in the changeset is <a href="https://github.com/golang/go/blob/release-branch.go1.22/src/internal/poll/fd_unix.go#L132-L137">src/internal/poll/fd_unix.go</a>.</p>

<h3 id="aside-system-limits">Aside: System limits</h3>

<p>As per the Linux <a href="https://www.man7.org/linux/man-pages/man2/read.2.html#NOTES"><code class="language-plaintext highlighter-rouge">read</code> syscall documentation</a>, a single call transfers at most 0x7ffff000 (2,147,479,552) bytes, which is 4KiB short of 2GiB. I tested this out with rudimentary programs in Rust and C. The Rust program is taken verbatim from the example for <a href="https://doc.rust-lang.org/std/io/trait.Read.html#method.read_to_end"><code class="language-plaintext highlighter-rouge">read_to_end()</code></a>. Running it under <code class="language-plaintext highlighter-rouge">strace</code> produces the following output (truncated here):</p>

<figure class="highlight">
  <pre><code class="language-text" data-lang="text">read(3, ..., 6594816000) = 2147479552
read(3, ..., 4447336448) = 2147479552
read(3, ..., 2299856896) = 2147479552
read(3, ..., 152377344) = 152377344
read(3, "", 32)         = 0</code></pre>
</figure>

<p>A similarly simple C program that calls <code class="language-plaintext highlighter-rouge">read</code> in a loop until the whole file is consumed produces similar output:</p>

<figure class="highlight">
  <pre><code class="language-text" data-lang="text">SSIZE_MAX: 9223372036854775807 # outputting the limits.h constant
bytes read: 2147479552
bytes read: 2147479552
bytes read: 2147479552
bytes read: 152377344</code></pre>
</figure>

<p>Although that’s neither here nor there, it’s still interesting that Go first capped reads at 2GiB-1 and then at 1GiB, with the second change motivated by how odd the first size was.</p>]]></content><author><name>Kashyap Kondamudi</name></author><category term="go," /><category term="til" /><summary type="html"><![CDATA[There's a 1GiB limit for a single `Read` call for an `os.File` entity (object? struct?) in Go, and this seems to be a deliberate choice.]]></summary></entry><entry><title type="html">classnames library composes well!</title><link href="https://kgrz.io/composing-classnames.html" rel="alternate" type="text/html" title="classnames library composes well!" /><published>2023-05-02T00:00:00+05:30</published><updated>2023-05-02T00:00:00+05:30</updated><id>https://kgrz.io/composing-classnames</id><content type="html" xml:base="https://kgrz.io/composing-classnames.html"><![CDATA[<p>This is an unpublished draft from 6 years ago. Unpublished until now, that is.</p>

<p>The <a href="https://www.npmjs.com/package/classnames">classnames</a> library is a <em>very</em> handy tool to apply CSS classes conditionally in JavaScript components. Since the function returns
a plain string, its output composes well across multiple conditionals layered in various
parts of the code.</p>

<p>For example, consider the following:</p>

<figure class="highlight">
  <pre><code class="language-javascript" data-lang="javascript"><span class="k">import</span> <span class="nx">cx</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">classnames</span><span class="dl">'</span><span class="p">;</span>

<span class="k">switch </span><span class="p">(</span><span class="nx">type</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">case</span> <span class="dl">'</span><span class="s1">textarea</span><span class="dl">'</span><span class="p">:</span>
    <span class="kd">const</span> <span class="nx">textareaClassNames</span> <span class="o">=</span> <span class="nf">cx</span><span class="p">(</span><span class="dl">'</span><span class="s1">text-area</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">text-input</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span> <span class="na">invalid</span><span class="p">:</span> <span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">state</span><span class="p">.</span><span class="nx">valid</span> <span class="p">});</span>
    <span class="k">return</span> <span class="o">&lt;</span><span class="nx">textarea</span> <span class="nx">className</span><span class="o">=</span><span class="p">{</span><span class="nx">textareaClassNames</span><span class="p">}</span> <span class="sr">/</span><span class="err">&gt;
</span>  <span class="k">default</span><span class="p">:</span>
    <span class="kd">const</span> <span class="nx">inputClassNames</span> <span class="o">=</span> <span class="nf">cx</span><span class="p">(</span><span class="dl">'</span><span class="s1">text-input</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span> <span class="na">invalid</span><span class="p">:</span> <span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">state</span><span class="p">.</span><span class="nx">valid</span> <span class="p">});</span>
    <span class="k">return</span> <span class="o">&lt;</span><span class="nx">input</span> <span class="nx">type</span><span class="o">=</span><span class="p">{</span><span class="nx">type</span><span class="p">}</span> <span class="nx">className</span><span class="o">=</span><span class="p">{</span><span class="nx">inputClassNames</span><span class="p">}</span> <span class="sr">/</span><span class="err">&gt;
</span><span class="p">}</span></code></pre>
</figure>

<p>The class names are the same except for one extra item in the case of
the textarea input field. Until today, I would’ve written something like
the above example, since I never bothered to look at the actual output
of the call. A quick glance at the library’s source code made it
evident that it accepts the output of another classnames call (a plain
string) as an argument, so the calls compose. That code can therefore be
simplified to:</p>

<figure class="highlight">
  <pre><code class="language-javascript" data-lang="javascript"><span class="k">import</span> <span class="nx">cx</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">classnames</span><span class="dl">'</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">className</span> <span class="o">=</span> <span class="nf">cx</span><span class="p">(</span><span class="dl">'</span><span class="s1">text-input</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span> <span class="na">invalid</span><span class="p">:</span> <span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">state</span><span class="p">.</span><span class="nx">valid</span> <span class="p">});</span>

<span class="k">switch </span><span class="p">(</span><span class="nx">type</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">case</span> <span class="dl">'</span><span class="s1">textarea</span><span class="dl">'</span><span class="p">:</span>
    <span class="k">return</span> <span class="o">&lt;</span><span class="nx">textarea</span> <span class="nx">className</span><span class="o">=</span><span class="p">{</span><span class="nf">cx</span><span class="p">(</span><span class="nx">className</span><span class="p">,</span> <span class="dl">'</span><span class="s1">text-area</span><span class="dl">'</span><span class="p">)}</span> <span class="sr">/</span><span class="err">&gt;
</span>  <span class="k">default</span><span class="p">:</span>
    <span class="k">return</span> <span class="o">&lt;</span><span class="nx">input</span> <span class="nx">type</span><span class="o">=</span><span class="p">{</span><span class="nx">type</span><span class="p">}</span> <span class="nx">className</span><span class="o">=</span><span class="p">{</span><span class="nx">className</span><span class="p">}</span> <span class="sr">/</span><span class="err">&gt;
</span><span class="p">}</span></code></pre>
</figure>

<p>Much better.</p>]]></content><author><name>Kashyap Kondamudi</name></author><category term="React" /><summary type="html"><![CDATA[I love classnames library!]]></summary></entry><entry><title type="html">Node has native CLI argument parsing</title><link href="https://kgrz.io/node-has-native-arg-parsing.html" rel="alternate" type="text/html" title="Node has native CLI argument parsing" /><published>2023-02-09T00:00:00+05:30</published><updated>2023-02-09T00:00:00+05:30</updated><id>https://kgrz.io/node-has-native-arg-parsing</id><content type="html" xml:base="https://kgrz.io/node-has-native-arg-parsing.html"><![CDATA[<p>I knew this was in the works, but wasn’t aware this was shipped with v16!
(v16.17.0, to be precise, released in 2022). I was playing with TypeScript code transforms and wanted to
update the source file after the transformation based on a flag. The script was
standalone, so I didn’t want to pull in external dependencies
like <code class="language-plaintext highlighter-rouge">argparse</code>. The API I was aiming for was:</p>

<figure class="highlight">
  <pre><code class="language-bash" data-lang="bash">node enum-to-const-object.mjs source.ts <span class="o">[</span>...]
node enum-to-const-object.mjs <span class="nt">-w</span> source.ts <span class="o">[</span>...]
node enum-to-const-object.mjs <span class="nt">--write</span> source.ts <span class="o">[</span>...]</code></pre>
</figure>

<p>The first invocation would print out the result to standard out, whereas the
latter two would update the source file in-place, exactly how
<a href="https://prettier.io/"><code class="language-plaintext highlighter-rouge">prettier</code></a> works. The standard library API is pretty neat for such
a simple interface:</p>

<figure class="highlight">
  <pre><code class="language-javascript" data-lang="javascript"><span class="c1">// file: enum-to-const-object.mjs</span>
<span class="c1">// Note the mjs extension, which is why I'm able to use import. Otherwise,</span>
<span class="c1">// you'll have to use require in place of the following line</span>
<span class="k">import</span> <span class="p">{</span> <span class="nx">parseArgs</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">node:util</span><span class="dl">'</span><span class="p">;</span>

<span class="kd">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span>
  <span class="na">write</span><span class="p">:</span> <span class="p">{</span>
    <span class="na">type</span><span class="p">:</span> <span class="dl">'</span><span class="s1">boolean</span><span class="dl">'</span><span class="p">,</span>
    <span class="na">short</span><span class="p">:</span> <span class="dl">'</span><span class="s1">w</span><span class="dl">'</span><span class="p">,</span>
    <span class="na">default</span><span class="p">:</span> <span class="kc">false</span>
  <span class="p">}</span>
<span class="p">};</span>

<span class="kd">const</span> <span class="p">{</span> <span class="nx">values</span><span class="p">,</span> <span class="nx">positionals</span> <span class="p">}</span> <span class="o">=</span> <span class="nf">parseArgs</span><span class="p">({</span> <span class="nx">options</span><span class="p">,</span> <span class="na">allowPositionals</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="c1">// values is of the shape { write: &lt;flag value&gt; }</span>
<span class="c1">// positionals: [ source.ts, ... ]</span></code></pre>
</figure>

<p>The <code class="language-plaintext highlighter-rouge">options</code> object is the configuration the parser uses for the flags. Its keys are the long-form flag names, and the <code class="language-plaintext highlighter-rouge">short</code> property on each one adds a single-character alias (here, <code class="language-plaintext highlighter-rouge">-w</code> for <code class="language-plaintext highlighter-rouge">--write</code>).</p>

<p>Besides the flag definitions, there is one more option I had to configure: <code class="language-plaintext highlighter-rouge">allowPositionals</code>. By default, <code class="language-plaintext highlighter-rouge">parseArgs</code> throws on any argument that isn’t a flag; with this option, those arguments (in my case, the files I wanted to transform) are collected instead. Once <code class="language-plaintext highlighter-rouge">parseArgs</code> is called with this configuration, it parses <code class="language-plaintext highlighter-rouge">process.argv</code> by default and returns two things: <code class="language-plaintext highlighter-rouge">values</code>, an object mapping each flag to its value, and <code class="language-plaintext highlighter-rouge">positionals</code>, which contains the file list.</p>

<p><a href="https://nodejs.org/docs/latest-v16.x/api/util.html#utilparseargsconfig">Docs Link</a></p>]]></content><author><name>Kashyap Kondamudi</name></author><category term="nodejs" /><summary type="html"><![CDATA[I knew this was in the works, but wasn't aware this was shipped with v16! (released in 2022). This is so useful for small cli scripts.]]></summary></entry><entry><title type="html">Using CSP in report-only and enforcement mode</title><link href="https://kgrz.io/multiple-csp.html" rel="alternate" type="text/html" title="Using CSP in report-only and enforcement mode" /><published>2023-01-17T00:00:00+05:30</published><updated>2023-01-17T00:00:00+05:30</updated><id>https://kgrz.io/multiple-csp</id><content type="html" xml:base="https://kgrz.io/multiple-csp.html"><![CDATA[<p>I recently came across this strategy which uses the standardised <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP">Content
Security Policy</a> for both enforcement and script monitoring on a web
page for security. We use CSP for enforcement already, but I was under the
assumption that report-only mode and enforcement mode are exclusive. That is,
if the <code class="language-plaintext highlighter-rouge">Content-Security-Policy</code> header was used with a few rules, I thought
the <code class="language-plaintext highlighter-rouge">Content-Security-Policy-Report-Only</code> can’t be used; or perhaps it doesn’t
work if we send both. But, in hindsight, this was wrong. For example, let’s say
a page returns the following headers, and there’s an <code class="language-plaintext highlighter-rouge">&lt;img&gt;</code> tag on this page
trying to load images from <code class="language-plaintext highlighter-rouge">example.com</code>.</p>

<figure class="highlight">
  <pre><code class="language-text" data-lang="text">Content-Security-Policy: default-src 'self' images.kgrz.io; report-uri /report-block
Content-Security-Policy-Report-Only: default-src 'self'; report-uri /report-only</code></pre>
</figure>

<p>The enforced policy only lets resources load from the page’s own origin and
<code class="language-plaintext highlighter-rouge">images.kgrz.io</code>, so the <code class="language-plaintext highlighter-rouge">example.com</code> image is blocked. I had always
assumed that the blocked load would send a single report to the <code class="language-plaintext highlighter-rouge">/report-block</code> path.
But this is not the case: two reports are sent, one to <code class="language-plaintext highlighter-rouge">/report-block</code>
and one to <code class="language-plaintext highlighter-rouge">/report-only</code>.</p>

<p>The sample application that demonstrates this example is hosted at
<a href="/apps/csp">/apps/csp</a>. You may need a modern-ish browser to use it,
since I’m using no build pipeline for the JS on that page (anything
that supports <code class="language-plaintext highlighter-rouge">&lt;script type="module"&gt;</code> should work).</p>]]></content><author><name>Kashyap Kondamudi</name></author><category term="web" /><summary type="html"><![CDATA[I wanted to test out and understand how multiple CSPs on a single page work, and this post is about that. Not only can you use multiple CSPs, but it can be used to kind of pseudo-monitor everything that happens on the page.]]></summary></entry><entry><title type="html">TIL: Vim’s search is backed by a register!</title><link href="https://kgrz.io/vim-register-search.html" rel="alternate" type="text/html" title="TIL: Vim’s search is backed by a register!" /><published>2023-01-01T00:00:00+05:30</published><updated>2023-01-01T00:00:00+05:30</updated><id>https://kgrz.io/vim-register-search</id><content type="html" xml:base="https://kgrz.io/vim-register-search.html"><![CDATA[<p>When you search for a pattern in Vim, it’s stored in the <code class="language-plaintext highlighter-rouge">/</code> register. This can then be used to store the query in a variable for some Vim command, or to paste the pattern as text. For example, the following invocation in normal mode:</p>

<figure class="highlight">
  <pre><code class="language-vim" data-lang="vim"><span class="c">"/p</span></code></pre>
</figure>

<p>pastes the search pattern into the current buffer. This same register value is used for repeat searches (<code class="language-plaintext highlighter-rouge">n</code> in normal mode). I’m not sure when I’m ever going to use this, but at least I’ll start to understand shortcuts on vim.fandom.com or SO that have an odd <code class="language-plaintext highlighter-rouge">/</code> in the command.</p>]]></content><author><name>Kashyap Kondamudi</name></author><category term="vim," /><category term="til" /><summary type="html"><![CDATA[When you search for a pattern in Vim, it’s stored to the / register. This then can be used to store the query in some variable for some Vim command, or perhaps paste the pattern as text. For example, the following invocation in normal mode:]]></summary></entry><entry><title type="html">The state in Ansible’s docker container module</title><link href="https://kgrz.io/ansible-docker-container-state.html" rel="alternate" type="text/html" title="The state in Ansible’s docker container module" /><published>2022-08-24T00:00:00+05:30</published><updated>2022-08-24T00:00:00+05:30</updated><id>https://kgrz.io/ansible-docker-container-state</id><content type="html" xml:base="https://kgrz.io/ansible-docker-container-state.html"><![CDATA[<p>I spent roughly an hour on a stupid misunderstanding I had with the documentation for the <a href="https://docs.ansible.com/ansible/2.5/modules/docker_container_module.html">docker_container</a> module today. The module has a <code class="language-plaintext highlighter-rouge">state</code> option that turns a few knobs. The two values I got confused between are <code class="language-plaintext highlighter-rouge">present</code> and <code class="language-plaintext highlighter-rouge">started</code>. In hindsight, using <code class="language-plaintext highlighter-rouge">present</code> when I meant “I want to start the container” was an obvious mistake. 
But I did, and spent time trying to debug what the heck Ansible’s module was doing differently from <code class="language-plaintext highlighter-rouge">docker run</code>. The playbook I was writing had a bunch of <a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_delegation.html#delegating-tasks">local_action</a>s to build an nginx-based image, spawn a container off of it, and trigger a few tests against it. Obviously, for the tests to succeed, the container had to be up, but since I was using <code class="language-plaintext highlighter-rouge">state: present</code>, it looked as if the container booted up but got shut down immediately. The <code class="language-plaintext highlighter-rouge">docker inspect</code> output in this case won’t have any hint as to <em>why</em> the container is not actually running. Turns out that <code class="language-plaintext highlighter-rouge">state: present</code> doesn’t run the container at all; it just creates it.</p>

<p>So, here’s a small reminder for future-me: if the container has to be in the <code class="language-plaintext highlighter-rouge">running</code> state, <code class="language-plaintext highlighter-rouge">state</code> should be set to <code class="language-plaintext highlighter-rouge">started</code>. <code class="language-plaintext highlighter-rouge">present</code> only ensures the container exists (it shows up in <code class="language-plaintext highlighter-rouge">docker ps -a</code>), not that it is running. That is, ignoring all the other interactions this parameter has with other options, <code class="language-plaintext highlighter-rouge">state: present</code> is effectively <a href="https://docs.docker.com/engine/reference/commandline/create/"><code class="language-plaintext highlighter-rouge">docker create</code></a>, whereas <code class="language-plaintext highlighter-rouge">state: started</code> is analogous to <code class="language-plaintext highlighter-rouge">docker run</code>.</p>

<h4 id="aside-why-use-ansible-to-run-docker">Aside: Why use ansible to run docker?</h4>

<p>This whole setup might seem convoluted, but there’s a reason why I chose it. Before I had an M1-based MacBook, I was using Vagrant for local testing of the Ansible pipeline. Every time I made changes to the server configuration, I’d run the entire Ansible playbook end to end to set up the local VM provisioned by Vagrant, followed by basic <code class="language-plaintext highlighter-rouge">curl</code>-based tests that check status codes and cache headers against the server. My Vagrant setup used VirtualBox to run the virtual machines, which is not supported on M1 chips. Converting this exact setup to use Docker instead hasn’t been straightforward in my experience so far. Running <code class="language-plaintext highlighter-rouge">docker build</code> and <code class="language-plaintext highlighter-rouge">docker run</code> directly won’t work in all cases, since I use Ansible templates that interpolate variables into the nginx configuration files, and those have to be rendered before I actually build the Docker image. 
So the short-term alternative I was trying out was to use another playbook that runs every Ansible task in it locally, which includes letting Ansible take care of building, running the server container and the tests.</p>]]></content><author><name>Kashyap Kondamudi</name></author><category term="ansible" /><summary type="html"><![CDATA[present in Ansible's state setting in docker_container module means create container, started means run the container]]></summary></entry><entry><title type="html">Minus zero in Ruby and JavaScript</title><link href="https://kgrz.io/negative-zero-ruby-and-javascript.html" rel="alternate" type="text/html" title="Minus zero in Ruby and JavaScript" /><published>2021-03-16T00:00:00+05:30</published><updated>2021-03-16T00:00:00+05:30</updated><id>https://kgrz.io/negative-zero-ruby-and-javascript</id><content type="html" xml:base="https://kgrz.io/negative-zero-ruby-and-javascript.html"><![CDATA[<p>From Daniel Lemire’s recent-ish <a href="https://lemire.me/blog/2021/03/04/how-does-your-programming-language-handle-minus-zero-0-0/">blog post on this topic</a>:</p>

<blockquote>
  <p>The ubiquitous IEEE floating-point standard defines two numbers to represent zero, the positive and the negative zeros. You also have the positive and negative infinity. If you compute the inverse of the positive zero, you get the positive infinity. If you compute the inverse of the negative zero, you get the negative infinity.</p>
</blockquote>

<p>I wanted to check this out for the most frequent languages I tend to use—Ruby and JavaScript.</p>

<p>First up, Ruby:</p>

<figure class="highlight">
  <pre><code class="language-ruby" data-lang="ruby"><span class="n">minus_zero</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.0</span>
<span class="n">plus_zero</span> <span class="o">=</span> <span class="mf">0.0</span>

<span class="n">converted</span> <span class="o">=</span> <span class="s2">"-0.0"</span><span class="p">.</span><span class="nf">to_f</span>

<span class="nb">puts</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="n">minus_zero</span>
<span class="nb">puts</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="n">plus_zero</span>
<span class="nb">puts</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="n">converted</span></code></pre>
</figure>

<p>Output: -Infinity, Infinity, -Infinity</p>

<p>Next, JavaScript:</p>

<figure class="highlight">
  <pre><code class="language-javascript" data-lang="javascript"><span class="kd">const</span> <span class="nx">minus_zero</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.0</span>
<span class="kd">const</span> <span class="nx">plus_zero</span> <span class="o">=</span> <span class="mf">0.0</span>

<span class="kd">const</span> <span class="nx">converted</span> <span class="o">=</span> <span class="nf">parseFloat</span><span class="p">(</span><span class="dl">"</span><span class="s2">-0.0</span><span class="dl">"</span><span class="p">)</span>

<span class="nx">console</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="mf">1.0</span> <span class="o">/</span> <span class="nx">minus_zero</span><span class="p">)</span>
<span class="nx">console</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="mf">1.0</span> <span class="o">/</span> <span class="nx">plus_zero</span><span class="p">)</span>
<span class="nx">console</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="mf">1.0</span> <span class="o">/</span> <span class="nx">converted</span><span class="p">)</span></code></pre>
</figure>

<p>Output: -Infinity, Infinity, -Infinity</p>

<p>Both languages (Ruby v. 2.7.1, JavaScript(NodeJS) v. 12.14.x &amp; Chrome 91.x) return the values described in the post.</p>]]></content><author><name>Kashyap Kondamudi</name></author><category term="general" /><summary type="html"><![CDATA[From Daniel Lemire’s recent-ish blog post on this topic:]]></summary></entry><entry><title type="html">Safari custom user agent CSS overrides using webfonts</title><link href="https://kgrz.io/safari-custom-user-agent-css-overrides-using-webfonts.html" rel="alternate" type="text/html" title="Safari custom user agent CSS overrides using webfonts" /><published>2020-11-18T00:00:00+05:30</published><updated>2020-11-18T00:00:00+05:30</updated><id>https://kgrz.io/safari-custom-user-agent-css-overrides-using-webfonts</id><content type="html" xml:base="https://kgrz.io/safari-custom-user-agent-css-overrides-using-webfonts.html"><![CDATA[<p>I like to use better monospace fonts as default fonts in browsers. In Chrome and Firefox, this is pretty straightforward: go to Preferences, and you’ll see a menu to change the default fonts. This is a bit harder on Safari; its Intelligent Tracking Protection disables loading user-installed local fonts by default.</p>

<p>A very simple way to circumvent this restriction is by using a webfont. The simplest option is to use an <code class="language-plaintext highlighter-rouge">@import</code> statement and load a webfont from, say, Google fonts or some other service (your own web server too!).</p>

<p>Example stylesheet:</p>

<figure class="highlight">
  <pre><code class="language-css" data-lang="css"><span class="k">@import</span> <span class="sx">url('https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:ital@0;1&amp;display=swap')</span><span class="p">;</span>

<span class="nt">pre</span><span class="o">,</span> <span class="nt">code</span> <span class="p">{</span>
	<span class="nl">font-family</span><span class="p">:</span> <span class="s2">'IBM Plex Mono'</span><span class="p">,</span> <span class="nb">monospace</span><span class="p">;</span>
<span class="p">}</span></code></pre>
</figure>

<p>Then, in Preferences &gt; Advanced &gt; Style sheet, load this file from disk. This trick, plus a little bit of default zoom, makes the IETF spec docs much, much better to read.</p>

<p><img class="image" loading="lazy" src="/public/images/preferences.png" alt="Preferences pane in Safari to upload the default stylesheet override" /></p>

<p>Sample screenshot of one of the IETF spec docs I’m reading:</p>

<p><img class="image" loading="lazy" src="/public/images/ietf.png" alt="Cookies: HTTP State Management Mechanism                     draft-ietf-httpbis-rfc6265bis-05 sample screenshot" /></p>

<p>This is not perfect, but for the basic use case it goes a long way without needing extra plugins like Stylebot or Cascadea.</p>]]></content><author><name>Kashyap Kondamudi</name></author><category term="misc" /><summary type="html"><![CDATA[Safari doesn't allow loading local user-installed fonts, so you can't add a custom stylesheet with that font. Easy way to circumvent this is to use a webfont.]]></summary></entry><entry><title type="html">Flattening and Filtering JSON for Cleaner Types in Go</title><link href="https://kgrz.io/go-json-flatten-filter-cleaner-types.html" rel="alternate" type="text/html" title="Flattening and Filtering JSON for Cleaner Types in Go" /><published>2020-07-30T00:00:00+05:30</published><updated>2020-07-30T00:00:00+05:30</updated><id>https://kgrz.io/go-json-flatten-filter-cleaner-types</id><content type="html" xml:base="https://kgrz.io/go-json-flatten-filter-cleaner-types.html"><![CDATA[<p>Before I grokked the <code class="language-plaintext highlighter-rouge">Unmarshaler</code> interface, it was hard to know how to parse a complex <span class="smallcaps">JSON</span> string into a type in one shot, with or without preprocessing. There are many good <a href="https://blog.golang.org/json">blog</a> <a href="https://blog.gopheracademy.com/advent-2016/advanced-encoding-decoding/">posts</a> on techniques to parse <span class="smallcaps">JSON</span> in Go, but I had to learn this by experimentation to finally wrap my head around it.</p>

<p>I’ll use an example from GitHub’s <code class="language-plaintext highlighter-rouge">/commits</code> REST API, using PR <a href="https://github.com/ruby/ruby/pull/3365">ruby/ruby#3365</a>. I’ve <a href="https://github.com/kgrz/json-parsing-post/blob/master/commits.json">saved the response</a> in the <a href="https://github.com/kgrz/json-parsing-post">repo</a> where I’ve added the full implementation of the example used in this post. The commits response from GitHub’s REST API is <em>very</em> verbose (how verbose depends on the PR size) and nested more than one level deep. In the hypothetical application that I’m writing, I need a list of “objects” that have the following information:</p>

<figure class="highlight">
  <pre><code class="language-go" data-lang="go"><span class="k">type</span> <span class="n">MetaData</span> <span class="k">struct</span> <span class="p">{</span>
	<span class="n">Author</span> <span class="kt">string</span>
	<span class="n">Committer</span> <span class="kt">string</span>
	<span class="n">SHA</span> <span class="kt">string</span>
	<span class="n">Message</span> <span class="kt">string</span>
<span class="p">}</span></code></pre>
</figure>

<p>That is, I want to parse <a href="https://github.com/kgrz/json-parsing-post/blob/master/commits.json">this response</a> into a <code class="language-plaintext highlighter-rouge">[]MetaData</code> slice. I <strong>do not</strong> want to traverse the structs in the format of the responses in my main “business logic”, as that makes it hard to follow the important bits. I don’t want to use <code class="language-plaintext highlighter-rouge">interface{}</code> as a placeholder either. A better trade-off, in my opinion and for my use case, is to do as much as possible during the parse phase to massage the data into the structure you want<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. I’m positive that this is a common use case. I ended up learning one way to do this cleanly almost by accident. First, the components involved:</p>

<h4 id="use-anonymous-structs">Use anonymous structs</h4>

<p>Anonymous structs let you skip defining and naming a concrete type for one-off use cases. They’re used heavily in parsing and marshalling code paths, and in tests. In our case, this technique can be used to define a “dirty” struct on the fly inside the <code class="language-plaintext highlighter-rouge">UnmarshalJSON</code> method, and use that for parsing the <span class="smallcaps">JSON</span>.</p>
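<p>A minimal standalone sketch of the idea, assuming a commit-shaped payload: the anonymous struct below lives only inside <code class="language-plaintext highlighter-rouge">authorName</code> (a hypothetical helper, not part of the example repo) and mirrors just the <code class="language-plaintext highlighter-rouge">commit.author.name</code> path. Every other key in the input is silently ignored by <code class="language-plaintext highlighter-rouge">encoding/json</code>.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// authorName is a hypothetical helper. It declares a one-off anonymous
// struct that mirrors only commit.author.name; encoding/json ignores all
// keys in the input that have no matching field.
func authorName(raw []byte) (string, error) {
	var v struct {
		Commit struct {
			Author struct {
				Name string `json:"name"`
			} `json:"author"`
		} `json:"commit"`
	}
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", err
	}
	return v.Commit.Author.Name, nil
}

func main() {
	// Hand-written sample input, not taken from the real API response.
	raw := []byte(`{"sha":"abc123","commit":{"author":{"name":"Jane Doe","email":"jane@example.com"}}}`)
	name, err := authorName(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // Jane Doe
}
```

<p>No named type leaks out of the function, which is what makes this a good fit for the throwaway “dirty” struct inside <code class="language-plaintext highlighter-rouge">UnmarshalJSON</code>.</p>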

<h4 id="implementing-unmarshaler-interface">Implementing Unmarshaler interface</h4>

<p>Any type that has an <code class="language-plaintext highlighter-rouge">UnmarshalJSON([]byte) error</code> method implements the <code class="language-plaintext highlighter-rouge">json.Unmarshaler</code> interface. Such a type can then be used as the target for parsing a <span class="smallcaps">JSON</span> subtree, or the entire <span class="smallcaps">JSON</span> document itself!</p>

<h3 id="implementation">Implementation</h3>

<p>The first step is to mock out the <code class="language-plaintext highlighter-rouge">main</code> function:</p>

<figure class="highlight">
  <pre><code class="language-go" data-lang="go"><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="c">// This variable contains the raw json bytes that resulted from the</span>
	<span class="c">// API call. I'm not adding the code for the actual network fetch</span>
	<span class="c">// for now, but in the example repository, I read the commits</span>
	<span class="c">// response from a file</span>
	<span class="k">var</span> <span class="n">jsonb</span> <span class="p">[]</span><span class="kt">byte</span>
	<span class="n">jsonb</span> <span class="o">=</span> <span class="n">JSONFromSomewhere</span><span class="p">()</span>

	<span class="k">var</span> <span class="n">metadatas</span> <span class="p">[]</span><span class="n">MetaData</span>
	<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">json</span><span class="o">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">jsonb</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">metadatas</span><span class="p">);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="n">log</span><span class="o">.</span><span class="n">Fatalln</span><span class="p">(</span><span class="s">"error parsing JSON"</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
	<span class="p">}</span>

	<span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="n">metadatas</span><span class="p">)</span>
<span class="p">}</span></code></pre>
</figure>

<p>The <span class="smallcaps">JSON</span> response of the <code class="language-plaintext highlighter-rouge">/commits</code> endpoint is a list of <code class="language-plaintext highlighter-rouge">commit</code> objects, so I’m using a slice of <code class="language-plaintext highlighter-rouge">MetaData</code> values to match that shape. For each commit item in the <span class="smallcaps">JSON</span> array, <code class="language-plaintext highlighter-rouge">json.Unmarshal</code> passes that item’s raw bytes as the argument to the <code class="language-plaintext highlighter-rouge">UnmarshalJSON</code> method on <code class="language-plaintext highlighter-rouge">MetaData</code>.</p>

<p>The next step is to implement the <code class="language-plaintext highlighter-rouge">UnmarshalJSON</code> method, using an anonymous struct as the target for parsing the raw commit object <span class="smallcaps">JSON</span>:</p>

<figure class="highlight">
  <pre><code class="language-go" data-lang="go"><span class="k">func</span> <span class="p">(</span><span class="n">m</span> <span class="o">*</span><span class="n">MetaData</span><span class="p">)</span> <span class="n">UnmarshalJSON</span><span class="p">(</span><span class="n">buf</span> <span class="p">[]</span><span class="kt">byte</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
	<span class="k">var</span> <span class="n">commit</span> <span class="k">struct</span> <span class="p">{</span>
		<span class="n">SHA</span>    <span class="kt">string</span> <span class="s">`json:"sha"`</span>
		<span class="n">Commit</span> <span class="k">struct</span> <span class="p">{</span>
			<span class="n">Author</span> <span class="k">struct</span> <span class="p">{</span>
				<span class="n">Name</span> <span class="kt">string</span> <span class="s">`json:"name"`</span>
			<span class="p">}</span> <span class="s">`json:"author"`</span>
			<span class="n">Committer</span> <span class="k">struct</span> <span class="p">{</span>
				<span class="n">Name</span> <span class="kt">string</span> <span class="s">`json:"name"`</span>
			<span class="p">}</span> <span class="s">`json:"committer"`</span>
			<span class="n">Message</span> <span class="kt">string</span> <span class="s">`json:"message"`</span>
		<span class="p">}</span> <span class="s">`json:"commit"`</span>
	<span class="p">}</span>

	<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">json</span><span class="o">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">commit</span><span class="p">);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="k">return</span> <span class="n">errors</span><span class="o">.</span><span class="n">Wrap</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="s">"parsing into MetaData failed"</span><span class="p">)</span>
	<span class="p">}</span>

	<span class="c">// continued</span>
<span class="p">}</span></code></pre>
</figure>

<p>The final step is to process the <code class="language-plaintext highlighter-rouge">commit</code> struct and set the appropriate fields on the <code class="language-plaintext highlighter-rouge">MetaData</code> struct:</p>

<figure class="highlight">
  <pre><code class="language-go" data-lang="go"><span class="k">func</span> <span class="p">(</span><span class="n">m</span> <span class="o">*</span><span class="n">MetaData</span><span class="p">)</span> <span class="n">UnmarshalJSON</span><span class="p">(</span><span class="n">buf</span> <span class="p">[]</span><span class="kt">byte</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
	<span class="c">// same as above</span>

	<span class="n">m</span><span class="o">.</span><span class="n">Author</span> <span class="o">=</span> <span class="n">commit</span><span class="o">.</span><span class="n">Commit</span><span class="o">.</span><span class="n">Author</span><span class="o">.</span><span class="n">Name</span>
	<span class="n">m</span><span class="o">.</span><span class="n">Committer</span> <span class="o">=</span> <span class="n">commit</span><span class="o">.</span><span class="n">Commit</span><span class="o">.</span><span class="n">Committer</span><span class="o">.</span><span class="n">Name</span>
	<span class="n">m</span><span class="o">.</span><span class="n">SHA</span> <span class="o">=</span> <span class="n">commit</span><span class="o">.</span><span class="n">SHA</span>
	<span class="n">m</span><span class="o">.</span><span class="n">Message</span> <span class="o">=</span> <span class="n">commit</span><span class="o">.</span><span class="n">Commit</span><span class="o">.</span><span class="n">Message</span>

	<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span></code></pre>
</figure>

<p>That’s it! An additional advantage of narrow types like this is that they’re easier to test.</p>

<hr />

<h3 id="bonus-filtering-the-slice-further">Bonus: Filtering the slice further</h3>

<p>For bonus points, I want to skip certain <code class="language-plaintext highlighter-rouge">[]MetaData</code> elements based on a condition. One way to do this, keeping the same principles in mind, is to define a named type over <code class="language-plaintext highlighter-rouge">[]MetaData</code> that implements the <code class="language-plaintext highlighter-rouge">Unmarshaler</code> interface:</p>

<figure class="highlight">
  <pre><code class="language-go" data-lang="go"><span class="k">type</span> <span class="n">MetaDatas</span> <span class="p">[]</span><span class="n">MetaData</span>

<span class="k">func</span> <span class="p">(</span><span class="n">ms</span> <span class="o">*</span><span class="n">MetaDatas</span><span class="p">)</span> <span class="n">UnmarshalJSON</span><span class="p">(</span><span class="n">buf</span> <span class="p">[]</span><span class="kt">byte</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
	<span class="c">// []MetaData is not the same as MetaDatas, and this difference is</span>
	<span class="c">// important!</span>
	<span class="k">var</span> <span class="n">metadatas</span> <span class="p">[]</span><span class="n">MetaData</span>

	<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">json</span><span class="o">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">metadatas</span><span class="p">);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="k">return</span> <span class="n">errors</span><span class="o">.</span><span class="n">Wrap</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="s">"parsing into MetaDatas failed"</span><span class="p">)</span>
	<span class="p">}</span>

	<span class="c">// filtering without allocations</span>
	<span class="c">// https://github.com/golang/go/wiki/SliceTricks#filtering-without-allocating</span>
	<span class="n">cleanedms</span> <span class="o">:=</span> <span class="n">metadatas</span><span class="p">[</span><span class="o">:</span><span class="m">0</span><span class="p">]</span>
	<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">metadata</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">metadatas</span> <span class="p">{</span>
		<span class="k">if</span> <span class="o">!</span><span class="n">strings</span><span class="o">.</span><span class="n">HasPrefix</span><span class="p">(</span><span class="n">metadata</span><span class="o">.</span><span class="n">Message</span><span class="p">,</span> <span class="s">"WIP"</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">cleanedms</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">cleanedms</span><span class="p">,</span> <span class="n">metadata</span><span class="p">)</span>
		<span class="p">}</span>
	<span class="p">}</span>
	<span class="o">*</span><span class="n">ms</span> <span class="o">=</span> <span class="n">cleanedms</span>

	<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span></code></pre>
</figure>

<p>Like before, I’m parsing into a temporary variable whose type matches our main type. Then I clean out the slice based on a condition: I want to skip all the commits whose message starts with <code class="language-plaintext highlighter-rouge">WIP</code>. Note that the <code class="language-plaintext highlighter-rouge">metadatas</code> variable inside the <code class="language-plaintext highlighter-rouge">UnmarshalJSON</code> method is declared as <code class="language-plaintext highlighter-rouge">[]MetaData</code> and not as <code class="language-plaintext highlighter-rouge">MetaDatas</code>. Declaring it as <code class="language-plaintext highlighter-rouge">MetaDatas</code> would make <code class="language-plaintext highlighter-rouge">json.Unmarshal</code> call this same <code class="language-plaintext highlighter-rouge">UnmarshalJSON</code> method again, recursing until the stack overflows. By design, <code class="language-plaintext highlighter-rouge">var metadatas MetaDatas</code> and <code class="language-plaintext highlighter-rouge">var metadatas []MetaData</code> are not the same type, and <code class="language-plaintext highlighter-rouge">[]MetaData</code> does not carry the <code class="language-plaintext highlighter-rouge">UnmarshalJSON</code> method.</p>

<p>Finally, the filtered slice is assigned through the <code class="language-plaintext highlighter-rouge">ms</code> pointer to the value that the <span class="smallcaps">JSON</span> is being parsed into.</p>

<hr />

<h4 id="a-note-about-performance">A note about performance</h4>

<p>In these examples, the parse flow creates the entire <code class="language-plaintext highlighter-rouge">[]MetaData</code> slice, even though we filter out many of the elements afterwards. As far as I know, this is a necessary hit to take with <code class="language-plaintext highlighter-rouge">json.Unmarshal</code>. I’m not aware of a way to pre-process the incoming bytes to avoid the allocation in the first place. My thinking is that if we didn’t filter or clean up the <span class="smallcaps">JSON</span> data, it would allocate all the objects anyway, so this may not be a huge difference in allocations per se, but that’s just my opinion at this point.</p>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">

      <p>This may not apply everywhere. There are valid cases where parsing
should be as fast and light as possible. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Kashyap Kondamudi</name></author><category term="go" /><summary type="html"><![CDATA[Before I grokked the `Unmarshaler` interface, it was hard to know how to parse a complex JSON string into a type in one-shot, with or without preprocessing. I go through an example to demonstrate one technique.]]></summary></entry></feed>