Chrome at the Command-Line to Dump Website Structure

You can use Chrome to dump the DOM, PDF, or screenshot of a webpage, or do a number of other cool things:

https://developers.google.com/web/updates/2017/04/headless-chrome

Open a website in a Chrome process headless-mode, which you can then query from a client process or another browser:

$ chrome --headless --disable-gpu --remote-debugging-port=9222 https://www.chromestatus.com

Print the site HTML:

$ chrome --headless --disable-gpu --dump-dom https://www.chromestatus.com

Capture a PDF:

$ chrome --headless --disable-gpu --print-to-pdf https://www.chromestatus.com

Capture a PNG screenshot:

$ chrome --headless --disable-gpu --screenshot https://www.chromestatus.com

Open a REPL console in which to run JavaScript expressions against the DOM:

$ chrome --headless --disable-gpu --repl https://www.chromestatus.com
Advertisements

Use GPG to Quickly Encrypt at the Command-Line


$ echo "cleartext" | gpg --passphrase "some-passphrase" -c --no-use-agent > text.encrypted
$ cat text.encrypted | gpg --passphrase "passphrase" --no-use-agent 2>/dev/null
$ cat text.encrypted | gpg --passphrase "some-passphrase" --no-use-agent 2>/dev/null
cleartext

Without “–no-use-agent”, you might very well be prompted by some system keyring/agent every time.

Resolving Go Import URLs

The package path that you import may not directly correlate to a repository URL. If Go can not find the package in your GOPATH, it will load the URL, redirecting as necessary, and will either look for a “meta” tag with a “name” attribute having value “go-import” or take whatever URL we are at when no more HTTP redirects are necessary (if any).

To speed things up, I wrote a short Python tool to do this and stashed it in a gist.

Make sure to install the dependencies:

  • requests
  • beautifulsoup4

Source:

import sys
import argparse
import urllib.parse

import requests
import bs4

def _get_cycle(url):
    while 1:
        print("Reading: {}".format(url))

        r = requests.get(url, allow_redirects=False)
        r.raise_for_status()

        bs = bs4.BeautifulSoup(r.text, 'html.parser')

        def meta_filter(tag): 
            # We're looking for meta-tags like this:
            #
            # <meta name="go-import" content="googlemaps.github.io/maps git https://github.com/googlemaps/google-maps-services-go">
            
            return \
                tag.name == 'meta' and \
                tag.attrs.get('name') == 'go-import'

        for m in bs.find_all(meta_filter):
            phrase = m.attrs['content']
            _, vcs, repo_url_root = phrase.split(' ')
            if vcs != 'git':
                continue

            return repo_url_root

        next_url = r.headers.get('Location')
        if next_url is None:
            break

        p = urllib.parse.urlparse(next_url)
        if p.netloc == '':
            # Take the schema, hostname, and port from the last URL.
            p2 = urllib.parse.urlparse(url)
            updated_url = '{}://{}{}'.format(p2.scheme, p2.netloc, next_url)
            print("  [{}] => [{}]".format(next_url, updated_url))

            url = updated_url
        else:
            url = next_url

    return url

def _main():
    description = "Determine the import URL for the given Go import path"
    parser = argparse.ArgumentParser(description=description)
    
    parser.add_argument(
        'import_path',
        help='Go import path')

    args = parser.parse_args()

    initial_url = "https://{}".format(args.import_path)
    final_url = _get_cycle(initial_url)

    print("Final URL: [{}]".format(final_url))

if __name__ == '__main__':
    _main()

Best Argument-Processing for .NET/C#

After trying NDesk.Options and Fluent, I am nothing but impressed with CLAP (“Command-Line Auto-Parser”). It completely relies on reflection and parameter attributes (usually just one or two) to automatically marshal your values, assign defaults, enforce requiredness, and provide command-line documentation. It’s beautiful and, so far, flawless. Well done.

using CLAP;

namespace MyNamespace
{
    class Program
    {
        [Verb(IsDefault = true, Description = &quot;Print the current version of the given package and, optionally, increment it.&quot;)]
        public void Version(
            [Description(&quot;Project path&quot;)]
            [Required]
            string projectPath,

            [Description(&quot;Package name&quot;)]
            [Required]
            string packageName,

            [Description(&quot;Base version to increment from (if lower than current, else use current)&quot;)]
            string baseVersion = null,

            [Description(&quot;Increment the version before returning&quot;)]
            bool increment = false
        )
        {
            // ...
        }
    }
}

If you don’t decorate with the “Required” attribute and don’t provide a default value the parameter will default to null. I explicitly set baseVersion to a default of null because I prefer being explicit.