The Programmer's Compendium

Broken out into:

  • Recipes for setting up or using particular software/services
  • Details about particular topics
  • Rundown of Linux
  • Miscellaneous information about particular domains
  • General guidelines on developing software
  • Underlying fundamentals of programming

Other resources include:

  • My Setup - How to set up a computer for software development.
  • Learn It - The predecessor to this book.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Nginx

Config Generator: https://nginxconfig.io/

Nginx is a web server and reverse proxy; as a reverse proxy, it allows a single port to host multiple websites.

Global configurations

Global configuration (http block-level) can be placed inside /etc/nginx/conf.d/*.conf.

TLS

# tls.conf
# Configure TLS security (requires setting parameters in server blocks to activate)

# USAGE:
# You must generate a dhparam.pem file.
# You must set ssl_certificate, ssl_certificate_key, and ssl_trusted_certificate.
# You must NOT use add_header inside your server {} block at all.

ssl_session_cache shared:SSL:10M;
ssl_session_timeout 180m;
ssl_session_tickets off;

# Secure SSL config - https://mozilla.github.io/server-side-tls/ssl-config-generator/
ssl_protocols TLSv1.2 TLSv1.3;
ssl_dhparam /etc/ssl/certs/dhparam.pem;  # Must generate this manually
ssl_ciphers ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;

# Already set in /etc/nginx/nginx.conf
# ssl_prefer_server_ciphers on;

# SSL OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4;

# Force SSL to this domain (+subdomains) for 6 months (+ preload list)
add_header Strict-Transport-Security "max-age=15768000; includeSubDomains; preload" always;

Security Headers

There are several headers used for security, which can be set by Nginx or by your application. The following global configuration file will set a default if upstream has not provided one.

# security_headers.conf
# Ensure HTTP headers that are important for security are set.
# We set some defaults which can be overridden by the upstream server.

# USAGE:
# You must use proxy_hide_header/uwsgi_hide_header to hide the upstream
#   headers for all headers that we list below.
# You must NOT use add_header inside your server {} block at all.

map $upstream_http_referrer_policy $referrer_policy {
    default $upstream_http_referrer_policy;
    '' "strict-origin-when-cross-origin";
}

map $upstream_http_x_content_type_options $x_content_type_options {
    default $upstream_http_x_content_type_options;
    '' "nosniff";
}

map $upstream_http_x_frame_options $x_frame_options {
    default $upstream_http_x_frame_options;
    '' "DENY";
}

map $upstream_http_x_xss_protection $x_xss_protection {
    default $upstream_http_x_xss_protection;
    '' "1; mode=block";
}

add_header Referrer-Policy $referrer_policy always;
add_header X-Content-Type-Options $x_content_type_options always;
add_header X-Frame-Options $x_frame_options always;
add_header X-XSS-Protection $x_xss_protection always;

Note: X-XSS-Protection has been superseded by Content Security Policies (CSPs)

Note: X-Frame-Options has been superseded by the frame-ancestors CSP directive

TODO: Deny the use of browser features (Feature Policies)

TODO: Deny the use of cross-site features (Content Security Policies)

Servers

Servers can be configured inside /etc/nginx/sites-available/, but a symlink should be created inside /etc/nginx/sites-enabled/ for them to become active.

Default Server - Redirect to HTTPS

server {
    listen {{ public_ipv4 }}:80 default_server;
    listen [{{ public_ipv6 }}]:80 default_server;

    # Redirect all HTTP requests to HTTPS with a 301 Moved Permanently response.
    return 301 https://$host$request_uri;
}

Default Server - Prevent non-SNI Traffic

# HTTPS default server for non-SNI requests
# Prevents an otherwise random public server certificate being leaked
# Also captures SNI requests where we are not hosting that domain here
server {
    listen {{ public_ipv4 }}:443 default_server;
    listen [{{ public_ipv6 }}]:443 default_server;
    server_name _;
    ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;
    return 444;
}

An HTTPS Server

server {
    listen {{ public_ipv4 }}:443 ssl http2;
    listen [{{ public_ipv6 }}]:443 ssl http2;
    server_name {{ domain }};
    charset utf-8;

    # Check if this certificate is really served for this server_name
    # https://serverfault.com/questions/578648/properly-setting-up-a-default-nginx-server-for-https
    if ($host != $server_name) {
        return 444;
    }

    # Certificate
    ssl_certificate /etc/letsencrypt/live/{{ domain }}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/{{ domain }}/privkey.pem;
    ssl_trusted_certificate /etc/letsencrypt/live/{{ domain }}/chain.pem;

    # Let's Encrypt SSL cert renewal
    location ~ /.well-known/acme-challenge {
        allow all;
        root /var/www/letsencrypt;
    }
}
  • Ubuntu's Nginx configuration does a lot of things itself including: sendfile, tcp_nopush, tcp_nodelay, keepalive_timeout, gzip, basic SSL configuration.
  • Doing listen [::]:443 ipv6only=off does not seem to work well (maybe due to use of IP addresses on other servers?). It is also Linux-only.

Serving Files & Upstream Proxies

uWSGI Proxy

upstream uwsgicluster {
    server unix:///run/uwsgi/app/APP_NAME/socket;
}

server {
    ...

    # Proxying connections to application servers
    location / {
        include         uwsgi_params;
        uwsgi_pass      uwsgicluster;

        uwsgi_param     Host $host;
        uwsgi_param     X-Real-IP $remote_addr;
        uwsgi_param     X-Forwarded-For $proxy_add_x_forwarded_for;
        uwsgi_param     X-Forwarded-Host $server_name;
        uwsgi_param     X-Forwarded-Proto $scheme;
        uwsgi_param     X-Forwarded-Port $server_port;

        # Correct handling of fallbacks for HTTP headers
        uwsgi_hide_header   Referrer-Policy;
        uwsgi_hide_header   X-Content-Type-Options;
        uwsgi_hide_header   X-Frame-Options;
        uwsgi_hide_header   X-XSS-Protection;
    }
}

HTTP Proxy

server {
    ...

    # Proxying connections to application servers
    location / {
        proxy_pass         http://localhost:8080;
        # Required to rewrite "Location" header for Jenkins
        proxy_redirect     http://localhost:8080 https://{{ domain }};
        proxy_read_timeout 60;
        proxy_http_version 1.1;

        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
        proxy_set_header   X-Forwarded-Proto $scheme;
        proxy_set_header   X-Forwarded-Port $server_port;

        # Correct handling of fallbacks for HTTP headers
        proxy_hide_header  Referrer-Policy;
        proxy_hide_header  X-Content-Type-Options;
        proxy_hide_header  X-Frame-Options;
        proxy_hide_header  X-XSS-Protection;
    }
}

Serving Static Files

server {
    ...

    # Serve static files
    rewrite ^/robots.txt /static/$request_uri last;
    location /static/ {
        alias /var/www-data/static/;
        disable_symlinks if_not_owner;  # Extra-security
        gzip_static on;

        # Performance
        # access_log off;
        open_file_cache         max=1000;
        open_file_cache_errors  on;
    }
}

A neat command to compress static files: find -L . -type f ! -iname "*.gz" ! -iname "*.png" ! -iname "*.jpg" ! -iname "*.jpeg" ! -iname "*.gif" ! -iname "*.webp" ! -iname "*.heif" -exec gzip --best -kf "{}" \;

Performance/Tuning

  • sendfile - directly from kernel to network socket - covered by Ubuntu, but consider adding sendfile_max_chunk
    • Note that this does not work with gzip!
  • open_file_cache - do not recheck filesystem for file on every request
  • gzip - covered by Ubuntu for HTML only...
  • gzip_static - do not compress on the fly, serve pre-generated .gz files
  • limit_req - consider for rate limiting number-of-requests by IP
  • (limit_conn - consider for rate limiting by number of concurrent connections - an alternative to the above?)
  • limit_rate - consider for limiting an individual request by capping its network speed
# Rate limit all requests for a server by IP address
limit_req_zone $binary_remote_addr zone=myzone:10m rate=1r/s;

server {
    ...

    limit_req_status 429;
    limit_req zone=myzone burst=60 nodelay;
}

uWSGI

gunicorn is an alternative that may be simpler to use.

The documentation likes telling stories. uWSGI has a stupidly large number of features.

uWSGI is best combined with a web server like Nginx: it is better at serving static files, the reverse proxy buffers requests before forwarding them to avoid wasting app workers' time (Slowloris DoS), and it possibly offers better security. Other app servers may not have the latter issue.

[uwsgi]
; Increased efficiency for larger number of processes(/threads) (no reason not to).
thunder-lock = True

; (Threads disabled)
processes = 2
harakiri = 30

; uwsgitop <stats-socket>
stats = /run/uwsgi/app/APP_NAME/stats-socket
memory-report = True
; Clear environment on exit
vacuum = True

; Require REPLACE_ME_HOST_NAME otherwise return HTTP 421 Misdirected Request (security)
route-if-not = equal:${HTTP_HOST};REPLACE_ME_HOST_NAME return:421
  • Due to uWSGI's default pre-forking behaviour, you may want lazy-apps or a @postfork fix-up function when running more than one process.
  • Note on harakiri and "post buffering": the web server should take the entire request body before passing it on, otherwise a slow upload could be killed by the harakiri timer.
  • Using threads, e.g. threads = 2, automatically sets enable-threads = true.

Features

Python-Specific Configuration

uWSGI has an easy interface (uwsgidecorators) for:

  • Background tasks (spooler, mules, cron, timers, and generic execute-task-in-background-thread)
  • Communication between all processes using signals
  • Locks
  • In-memory cache (with periodic disk sync)

They are shared only within that master uWSGI instance.

; Python
plugin = python3
virtualenv = /home/APP_NAME/virtualenv/
chdir = /home/APP_NAME/APP_DIR/
module = APP_NAME.wsgi:application

; Background Tasks
spooler = /home/APP_NAME/tasks/
import = APP_NAME.tasks

; Real time tracebacks (inc. for harakiri'd requests)
; uwsgi --connect-and-read /run/uwsgi/app/APP_NAME/tracebacker1
py-tracebacker = /run/uwsgi/app/APP_NAME/tracebacker

; Only when running a single application
; single-interpreter = true
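
A quick sketch of the uwsgidecorators interface (this only runs under uWSGI, and assumes a spooler is configured as above; names are illustrative):

import uwsgi
from uwsgidecorators import spool, timer

@timer(60)
def every_minute(signum):
    # Runs every 60 seconds via a uWSGI signal.
    print("tick")

@spool
def send_email(arguments):
    # Runs in a spooler process; arguments is a dict of strings.
    print(arguments["recipient"])
    return uwsgi.SPOOL_OK

# From request code: send_email.spool(recipient="someone@example.com")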

Kubernetes

Commands

Running as a different user

Workaround for K8S not supporting the --user flag: https://github.com/kubernetes/kubernetes/issues/30656

kubectl get pods
kubectl describe pod <POD-NAME> | grep Node
gcloud compute ssh "<NODE-NAME>"
sudo su - root
docker ps | grep <POD-NAME>
docker exec -it -uroot <ID> /bin/bash

Google Cloud

  • Get databases: gcloud sql instances list
  • Get secrets: kubectl get secret
    • Get password: kubectl get secret KEY -o jsonpath --template {.data.password}

Postgres

Executing queries against the replica

ERROR: canceling statement due to conflict with recovery

Detail: User query might have needed to see row versions that must be removed

The preferred solution is to allow the replica to pause applying the received WAL logs while a query executes:

# /etc/postgresql/10/main/postgresql.conf on a slave
max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s

This is a cumulative delay - if multiple queries are running, the last one might be terminated after just a second because the total time limit has been reached.


Python

  • Use pipx to install environment management tools and scripts. (Execute binaries installed in isolated virtual environments.)
  • Use Poetry to manage packaging and dependencies.

Hey, there's already a Hitchhiker's Guide to Python! It is quite comprehensive on the basics of Python development.

GIL

The GIL is required to run bytecode, but not when waiting on I/O - the interpreter can switch to a different thread. CPython has macros to release and re-acquire the GIL, which an extension can use if it does not need to run Python bytecode or do any kind of ref-count changes (e.g. sleep, reading/writing files, network sockets, numpy, image processing(?)). The GIL cannot be shared in CPU-bound code; CPU-bound threads "check" every 100 ticks to allow switching. Note: I/O doesn't often block unless you are flushing - the OS buffers. Note: threads/processes incur CPU context switches, while async does not.
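
A minimal sketch of the consequence (timings indicative; names illustrative): threads overlap for I/O-bound work because the GIL is released while blocked, but serialise for CPU-bound work.

import time
from concurrent.futures import ThreadPoolExecutor

def io_bound():
    time.sleep(1)  # GIL released while sleeping

def cpu_bound():
    sum(i * i for i in range(5_000_000))  # GIL held while running bytecode

for func in (io_bound, cpu_bound):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(4):
            pool.submit(func)
    # io_bound: ~1s total (overlapped); cpu_bound: ~4x a single run.
    print(func.__name__, round(time.perf_counter() - start, 2))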

Pure CPU: Process <---> Threads <---> Async: Pure I/O (Many "Connections"/Slow I/O).

Async I/O clearly delineates the context switch locations, in theory.

  • Processes - costly.
  • Threads - each thread has its own stack; shared memory can result in race conditions.
  • Async I/O - no CPU context switches; shared stack; no race conditions... usually; no deadlocks... usually. Uses an executor pool to run sync tasks in a background thread (this uses additional resources).

Python 2 -> 3: Strings/Bytes, Print, Super() - new style classes, division.

Linting

  • black (how-to)
  • flake8
    • pycodestyle (formerly pep8)
    • pyflakes
    • mccabe
  • pylint
  • bandit/dodgy/safety(pipenv)
  • prospector
  • isort
  • Layer Linter (not tried)
  • pydocstyle
  • mypy
  • vulture
  • pyroma for libraries (setup.py)

History

Python 3.8

  • assignment expressions if (n := len(a)) > 10:
  • improved f-strings f"{x*9 + 15=}"
  • positional and keyword only parameters
  • multiprocessing.shared_memory
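
The first two in action (a quick sketch):

a = list(range(20))
if (n := len(a)) > 10:  # assignment expression: bind and test in one step
    print(f"{n=}")      # f-string debugging: prints "n=20"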

Python 3.7:

  • dataclasses TBD: talk link.
  • contextvars
  • breakpoint()
  • postponed evaluation of type annotations
  • dicts officially respect insertion-order
  • time - nanosecond resolution functions
  • f"{number:,}

Python 3.6:

  • f-string literals
  • underscores in numeric literals
  • extended variable annotations
  • (async generators & comprehensions)
  • Local Time Disambiguation
  • secrets module

Python 3000:

  • unicode vs bytes
  • print() vs print
  • division float vs int
  • new-style vs classic classes
  • relative imports (?)
  • views and iterators vs lists (e.g. dict.items() == dict.iteritems())
  • nonlocal
  • extended iterable unpacking
  • set literals
  • removed tuple parameter unpacking

Libraries

  • passlib - high-level secrets library
  • pysnooper - line-by-line print debugging
  • boltons - "boltons should be builtins" (stdlib additions)
  • freezegun - freeze/mock datetimes in tests

Pytest

See my cheat sheet.

Useful plugins:

  • pytest-randomly
  • pytest-datadir
  • pytest-bdd

Stop network calls:

import socket

import pytest

@pytest.fixture(autouse=True)
def stop_network_calls(monkeypatch):
    def _socket(*_, **__):
        raise RuntimeError("Network access disabled during tests")
    monkeypatch.setattr(socket, "socket", _socket)

Randomise time-zones:

import random

import pytest
from _pytest.monkeypatch import MonkeyPatch

@pytest.fixture(scope="session")
def monkeysession():
    mpatch = MonkeyPatch()
    yield mpatch
    mpatch.undo()

@pytest.fixture(autouse=True, scope="session")
def randomise_timezone(monkeysession):
    monkeysession.setenv("TZ", random.choice(["Europe/London"]))

SQLAlchemy

  • Remember to index on ForeignKeys. Postgres does not do this automatically. MySQL always does this.
  • Remember to set onupdate and ondelete cascades on ForeignKeys.
  • To prevent spurious joins: (this cannot be used with NULLable outer joins)
    session.query(Model).options(
        sa.orm.joinedload("child").joinedload("grandchild"),
        sa.orm.raiseload("*"),
    )
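
A minimal sketch of the first two points (declarative style; names illustrative):

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parents'
    id = sa.Column(sa.Integer, primary_key=True)

class Child(Base):
    __tablename__ = 'children'
    id = sa.Column(sa.Integer, primary_key=True)
    parent_id = sa.Column(
        sa.Integer,
        sa.ForeignKey('parents.id', onupdate='CASCADE', ondelete='CASCADE'),
        index=True,  # Postgres will not index this automatically
        nullable=False,
    )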
    

Pandas

Understanding SettingWithCopyWarning.

Raising it as an error (setup.cfg)

[tool:pytest]
filterwarnings =
    error::pandas.core.common.SettingWithCopyWarning
    error::FutureWarning

Gotchas

Probably the last one you will learn:

x = [1,2,3,4]
lambdas = [
   lambda: x[i]
   for i in range(4)
]

[f() for f in lambdas]
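
The result is [4, 4, 4, 4], not [1, 2, 3, 4]: each lambda closes over the variable i itself, which is 3 by the time any of them is called. The usual fix is to bind the current value with a default argument:

lambdas = [
   lambda i=i: x[i]
   for i in range(4)
]

[f() for f in lambdas]  # [1, 2, 3, 4]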

Web

Learning: https://hpbn.co/

Google's Style Guide has basic formatting and other guidance.

Minimal HTML

<!DOCTYPE html>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Saving bytes</title>
<link rel="stylesheet" href="styles.css">
<h1>Let's go</h1>
<p>Annddd we're done

Tools

Web Security

  • Same-Origin Policy (SOP). Security. (Related CORS - not really security, just DRM.)
    • The Same Origin Policy (SOP) applies only to scripts accessing data from another origin.
    • CORS can be used to relax restrictions for scripts.
    • Websockets do not use SOP, the server must verify the Origin: header.
    • SOP does not prevent "Simple Requests" (e.g. HTML forms), but does prevent reading the response. This means that to prevent CSRF, the server must verify the Origin: header (see the sketch after this list).
    • Older browsers do not send the Origin: header.
  • Content Security Policies let website owners control which resources they allow themselves [the links on the page] to load.
    • HTTP Headers and HTML meta tag intersect.
  • Feature Policies let website owners control which permissions they allow themselves [the scripts on the page] to use.
  • Cross-Origin Resource Sharing (CORS) lets website owners control which of their resources they allow others to load.
    • With exceptions that allow hotlinking for legacy resources: images, scripts, CSS, and audio/video.
    • SOP applies to XHR, web fonts, and a couple of other things, and CORS can be used to relax these restrictions.
  • CSRF - the server accepting requests it believes came from the user (exploits the server's trust in the client)
  • XSS - injecting scripts into the attacked website to bypass SOP (exploits the client's trust in the server - for XSS originating from the server)
  • Clickjacking
  • Storage access (cookies, sessionStorage, localStorage, IndexDB)
  • Remember: many (all?) of these headers are only needed on rendered content (HTML, not images)
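
A sketch of the Origin check from the list above (framework-agnostic; names are illustrative):

# Server-side CSRF defence by verifying the Origin header.
ALLOWED_ORIGINS = {"https://example.com"}

def state_changing_request_allowed(headers):
    origin = headers.get("Origin")
    if origin is None:
        # Older browsers may omit Origin; fall back to a CSRF token
        # (or the Referer header) rather than allowing by default.
        return False
    return origin in ALLOWED_ORIGINS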

Git

Sometimes it feels like git doesn't make sense.

Squash merging is terrible and awful. It destroys histories and trashes attempts at re-using a branch (https://stackoverflow.com/a/14343784). It is an especially bad idea on a large branch because git blame becomes far less usable. It is useful for ensuring all commits pass tests for git bisect, though. git slap everyone who introduced it in the GitHub/BitBucket UIs.

There's a neat interactive branching tutorial somewhere...

Tips & Tricks

Because git is huge.

  • git log --first-parent show only the first parent commit for merge commits.
  • git blame --ignore-rev 2e0ee159c OR git blame --ignore-revs-file <file> to ignore something like a great reformatting commit.

Security

Sign tags not commits: http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-td2582986.html

https://github.com/NicoHood/gpgit

(Interesting read: https://mikegerwitz.com/papers/git-horror-story.html)

Comparison

Concept        Git        Mercurial  SVN
Branches       Branches   ...        ...
Stash
Working Area
History

Networking

Private networks:

  • 10.0.0.0/8 (255.0.0.0)
  • 192.168.0.0/16 (255.255.0.0)
  • 172.16.0.0/12 (255.240.0.0)

Connections(?):

https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking/

  • Bridge (Like a Physical Network Switch)
    • One side of bridge gets IP addresses 10.0.0.2+. The bridge gateway would be 10.0.0.1
    • The other side is the host network, e.g. the host IP 192.168.1.2+
    • NAT Port mapping required
    • Anything connected to the bridge can communicate with each other
  • MACVLAN
    • Connect directly to network, get IP address on network
    • MAC address is virtualised on the NIC - requires promiscuous mode, usually not allowed on cloud
    • High Performance (no NAT/bridge)
  • IPVLAN

Crypto

You already know that you should never roll your own crypto.

Password Hashing (One-way Key Derivation Functions)

The hash may follow a scheme which encodes the KDF, for example $method$params$salt$hash.

  1. 2018 state-of-the-art: Argon2 (not standardised yet).
    • CPU, Memory, Parallelism-hard
  2. Scrypt
  3. Bcrypt
  4. PBKDF2

Passwords are low-entropy data; you need to slow down attackers who have access to vast parallel computation.
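
For example, the $method$params$salt$hash scheme as produced by the argon2-cffi package (a sketch; the parameters shown are just the library's defaults):

from argon2 import PasswordHasher

ph = PasswordHasher()
encoded = ph.hash("correct horse battery staple")
print(encoded)  # e.g. $argon2id$v=19$m=65536,t=3,p=4$<salt>$<hash>
ph.verify(encoded, "correct horse battery staple")  # raises on mismatch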

Cryptographic Hashing

Designed to be fast, while making collisions and reversal computationally infeasible.

These are suitable for file integrity, message authentication, and inputs for cryptographic signatures.

There is yet another category of hashing functions used for dictionaries/hash tables :)

2FA/TOTP/U2F

Entropy

Entropy is calculated using the password-generation function, not the password!
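
For example (assuming uniform, independent choices; pool sizes illustrative):

import math

# 4 words drawn uniformly from a 7776-word Diceware list:
print(4 * math.log2(7776))  # ~51.7 bits
# 8 characters drawn uniformly from a 62-symbol alphabet:
print(8 * math.log2(62))    # ~47.6 bits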

Time

Use RFC 3339 for dates and times, it is simpler than ISO 8601. However, it does not support durations.

There is a good article that I need to find. I think the rules were:

  • Be liberal in what you accept.
  • Store time as UTC
  • Store the symbolic time zone (i.e. "Europe/London") if you need to know about daylight savings for further calculations.
  • Return time in UTC, and let the presentation layer convert it to the user's local time.

Examples: http://www.creativedeletion.com/2015/01/28/falsehoods-programmers-date-time-zones.html
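
A sketch of rules 2-4 (Python 3.9+ for zoneinfo):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Store UTC plus the symbolic zone (not a fixed offset).
event_utc = datetime.now(timezone.utc)
event_zone = "Europe/London"

# The presentation layer converts for display; DST is handled by the zone.
print(event_utc.astimezone(ZoneInfo(event_zone)).isoformat())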

Bash (sh) Scripts

Bash Template

Use the following options for safer bash scripts:

set -Eeuo pipefail

Explanation:

set -o errtrace      # Functions, substitutions & sub-shells inherit traps
set -o errexit       # Stops the script if any command fails
set -o nounset       # Error when using unset variables
set -o pipefail      # Error when a command in a pipeline fails

The xtrace option is also useful for debugging:

set -x
set -o xtrace        # Print each executed command

XML

xpath

Find tags ignoring namespaces

"//*[local-name()='reservationOriginatorCode']"

Filter nodes based on a sub-node text, then find a sibling element:

<rf5:ResGuests>
  <ResGuest>
    <reservationID>12345</reservationID>
    <ReservationReferences>
      <ReservationReference type="PMSID" legNumber="1" />
    </ReservationReferences>
  </ResGuest>
</rf5:ResGuests>
leg_numbers = set(
    xml.getroottree().xpath(
        # 1. Get ResGuest corresponding to this reservation
        # 2. Get the Leg Number from that ResGuest
        """
        ./rf5:ResGuests/rf5:ResGuest[rf5:reservationID[text()[normalize-space(.)='12345']]]
        //rf5:ReservationReference[@type='PMSID']/@legNumber
        """,
        namespaces=NAMESPACES_MAP,
    )
)


Databases

  • Clustering organises the data on the disk.
  • Use pgcli over psql
    • It needs to group tables together, and use colours to more easily identify columns/tables.
  • Long Table Names suck:
    • They make queries so much longer
    • Seriously.

UUID Foreign Keys

I liked the idea.

UUIDs suck:

  • Visually they take up more space (e.g. result of queries)
  • Harder to just remember the ID in your head when you need to
  • Lose automatic (basic) ordering of rows
  • Copying and Pasting is much more difficult because double-click to copy does not include the quotes

SQL

https://pgexercises.com/

http://tatiyants.com/pev/ - Explain Visualiser

  • DDL - Data Definition Language: CREATE / ALTER / DROP / TRUNCATE / RENAME
  • DML - Data Manipulation Language: SELECT / INSERT / UPDATE / DELETE
  • DCL - Data Control Language: GRANT / REVOKE
  • TCL - Transaction Control Language: BEGIN / COMMIT / ROLLBACK / SAVEPOINT

Query Summary

The key actions are:

SELECT [DISTINCT] ... FROM ...
UNION
[INNER|LEFT|RIGHT|OUTER] JOIN ... ON ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ... [ASC|DESC]
LIMIT ... OFFSET ...

INSERT INTO ... VALUES ...
UPDATE ... SET ... [WHERE ...]
DELETE ...

SELECT can be used with AS to rename a table or a column.

UNION adds the rows of different queries

HAVING allows aggregate functions. WHERE filters individual rows before the GROUP BY, while HAVING filters group rows created by the GROUP BY.

The order is in fact:

  1. FROM [UNION/JOIN]
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY
  7. LIMIT

ORDER BY clause = e.g. ORDER BY x DESC, y NULLS FIRST

SQL:2008 LIMIT clause = OFFSET n ROWS FETCH FIRST m ROWS ONLY

Operations

We have the arithmetic operators +, -, *, /, and %.

Logical/Comparison operators include:

IS (NOT) NULL
UNIQUE?
=, <, >
<>, !=
NOT, AND, OR
BETWEEN ... AND ...
EXISTS
IN ...
LIKE ...
... ANY, ALL

BETWEEN can be numbers, texts, or dates. In the case of text it seems to use alphabetical ordering (Unicode considerations)?

EXISTS tests whether a subquery returns any values (boolean existence check).

IN can use a tuple of values ('George', 'Clooney'), or a subquery.

LIKE uses two wildcards % (0+ characters), and _ (exactly one character). For example, _r% would mean values with "r" in the 2nd position.

ANY and ALL can be used with an operator against a subquery, e.g. WHERE column = ANY (SELECT ...).

Aggregate Functions

We have aggregate functions that can act on values:

MIN
MAX
COUNT
AVG
SUM

Be aware that an aggregate selects on the column only; it will not select the entire row.

Misc.

-- comment
/* block comment */

Joins start off as a cross-product, and predicates ("ON") filter the rows down.

  • LEFT OUTER JOIN - always return all rows for A, with anything that happens to match in B (i.e. B could be null).
  • RIGHT OUTER JOINs are fairly useless - just do a left join for intuitiveness (a possible exception for 3+ tables?).
  • FULL OUTER JOIN - either side could be null.
  • INNER JOIN - return only rows with matching records on both sides.

Trust in your database

Transactions. TBD: Serialisation levels

CAP

Choose two: consistency, availability, or partition tolerance

Alt: PACELC.

ACID

Traditional RDBMS drop availability

  • Atomicity - transaction is all or nothing
  • Consistency - database-level rules (e.g. constraints) will always be met
  • Isolation - differing levels! Essentially nobody else sees your changes until you want them to
  • Durability - when it says it's done, you can nuke the power cable, machine, or data centre

BASE

Basically Available, Soft state, Eventual consistency.

Often chosen by NOSQL databases. Drops consistency.

Durability concerns. Consistency concerns.

Migrations

My talk: http://qasimk.io/talks/2019/database-migrations-2#1

An excellent article: https://benchling.engineering/move-fast-and-migrate-things-how-we-automated-migrations-in-postgres-d60aba0fc3d4

Cliffnotes for Postgres

  • Adding a NOT NULL field, or one with a dynamic default value (NB: < Postgres 11), causes the entire table and its indexes to be rewritten
    • Do multi-step deploy
  • Adding a new NOT NULL with DEFAULT applies default to all existing rows
    • Add default after column is created applies it to new rows only
  • Removing a NOT NULL field can cause the ORM to read/write a non-existent field
    • Migrate to NULL before removing code
  • Creating an index locks the entire table, preventing writes
    • Use CREATE INDEX CONCURRENTLY
  • Creating constraints (inc. NOT NULL) locks the entire table.
    • Use 2-step creation: ... NOT VALID and VALIDATE CONSTRAINT - NOT VALID uses ACCESS EXCLUSIVE, but VALIDATE only uses SHARE UPDATE EXCLUSIVE and a full table scan.
  • Most ALTER TABLE operations (including ADD COLUMN, DROP COLUMN) require an ACCESS EXCLUSIVE lock which waits for ALL other queries to complete (including SELECT!)
    • Set a very low lock_timeout and statement_timeout (< 10 seconds) to allow migration to fail without affecting normal operation.
    • Reduce size of transactions/queries; beware of SQLAlchemy's automatic start of transaction even on a first select.
  • Default ORM queries select all columns to populate the full object; this is a problem because...
  • DROP COLUMN hides column, and gradually, as writes occur, replaces it with NULLs to free up space.

Versions

  • v11 - you can add a new column with a default without causing a full table re-write

Automated Checks

  • Check indexes are created concurrently
  • Check constraints are created in the background
  • Check new columns are NULLABLE (without a default value)
  • Check removed columns are NULLABLE
  • In tests, check queries against a new deprecated_column to ensure they are not being used before removal
  • Use a renamed_to column-thing to automatically create SQL triggers to populate data bi-directionally
  • (SQLAlchemy) Check every foreign key is covered by an index

Snippets

Postgres

Get cell as file

Use With CSV for CSV files.

\copy (Select * From foo) To 'test.csv' With BINARY;

For XMLs, manually remove the | from the front and end of the file:

\copy (SELECT xml_field FROM table_name) TO 'output.xml' WITH CSV QUOTE AS '|';

Disk Space

Free up disk space:

vacuumdb --all --full --freeze

List total table sizes (including indexes):

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 20;

Bulk Insert with two columns matching sequence value

Note: this doesn't work, LOL. func.nextval('outbound_messages_transaction_id_seq') + 1 may need to be -1, and I don't know how you can work out which one (+1 or -1) it should be. Apparently Postgres has a column order, but it doesn't seem to be respected.

I had a table with two columns: transaction_id and xml, but the xml text also contains the transaction_id. If you generate the transaction_id value client-side, then you can of course set the two columns on your INSERT.

However, I wanted the transaction_id value to be a "semi-unique" identifier. It is semi-unique because an external system requires a value between 1-999,999,999 only. So, for that column I use a bounded Integer Sequence that will cycle back to 1 when it reaches the maximum value of 999,999,999.

Then, the problem is how to generate the XML without knowing the transaction_id that will be generated when the database does the INSERT?

One approach that I tried was to do the bulk insert, returning the transaction_ids, and then to do another UPDATE replacing the XMLs. I found the built-in replace was good enough if the placeholder was sufficiently unique, but I had to cast the integer transaction_id to a string for it to work (no auto-casts for you!). It has to be wrapped in a transaction, of course.

(I had originally cast it to CHAR, but that just results in a single character...)

insert_records = outbound_message_df.to_dict('records')
with session.begin_nested():
    result = session.execute(
        cls.__table__.insert(
            values=insert_records
        ).returning(
            cls.id,
        )
    )
    ids = [row[0] for row in result.fetchall()]

    fix_transaction_ids = '''
        UPDATE
            outbound_messages
        SET
            xml = replace(xml, 'X_override_transaction_id', transaction_id::TEXT)
        WHERE
            id in :ids
    '''
    session.execute(fix_transaction_ids, params={'ids': ids})

This worked, but I thought that it seemed inefficient. The solution that I came to was to replace the XML string on the INSERT using database functions:

INSERT INTO  
    outbound_messages (xml)   
VALUES  
    (replace('<xml version...' , 'X_override_transaction_id', currval('outbound_messages_transaction_id_seq')::TEXT))  
;

This SQL inserts the real XML string, but replaces the placeholder X_override_transaction_id with the correct value of transaction_id!

Or with SQLAlchemy, doing a simultaneous multi-insert:

@classmethod
def build_insert(cls, outbound_message_df):  # hypothetical wrapper method
    def substitute_func(xml):
        return func.replace(
            xml,
            'X_override_transaction_id',
            cast(func.currval('outbound_messages_transaction_id_seq'), String),
        )

    outbound_message_df['xml'] = outbound_message_df['xml'].map(substitute_func)
    insert_records = outbound_message_df.to_dict('records')
    return cls.__table__.insert(values=insert_records)

However, this fails when giving multiple values because the current value is fixed across all the rows! It seems to be very difficult to get a sequence during an insert per row-value. Instead, we can get the next value for the sequence, which happens sequentially for the rows, and offset it by one.

INSERT INTO  
    outbound_messages (xml, created_utc, updated_utc)
VALUES  
    (replace('A-OVERRIDE' , 'OVERRIDE', (nextval('outbound_messages_transaction_id_seq') + 1)::text), now(), now()),
    (replace('B-OVERRIDE' , 'OVERRIDE', (nextval('outbound_messages_transaction_id_seq') + 1)::text), now(), now()),
    (replace('B-OVERRIDE' , 'OVERRIDE', (nextval('outbound_messages_transaction_id_seq') + 1)::text), now(), now())
;
SELECT transaction_id, xml from outbound_messages;

With this SQL query, the database obtains the value for xml using the next value of the sequence, e.g. 1. Then it obtains the next value for the column transaction_id, i.e. 2. This is done row-by-row, so we get consistent values. We offset the first value to match the second, so we get 2, 2.

We note that this sequence cycles, so it's important that when we add one to the sequence value for xml, that it wraps around appropriately. This will be fine so long as when you add one, you do not need to wrap.

So we must use something like: minvalue=2, maxvalue=99, cycle=True. Start on an even number and end on an odd number! (Or the other way around if you want to start at 2 rather than 3.)

The equivalent SQLAlchemy:

transaction_id_seq = Sequence(
    'outbound_messages_transaction_id_seq',
    minvalue=2,
    maxvalue=999999999,
    cycle=True,
    metadata=Base.metadata,
)
transaction_id = Column(
    Integer,
    # Do NOT set the value client-side because it messes up the bulk insert.
    # default=transaction_id_seq,
    server_default=transaction_id_seq.next_value(),
    nullable=False,
)

@classmethod
def build_insert(cls, outbound_message_df):  # hypothetical wrapper method
    def substitute_func(xml):
        return func.replace(
            xml,
            'X_override_transaction_id',
            cast(func.nextval('outbound_messages_transaction_id_seq') + 1, String),
        )

    outbound_message_df['xml'] = outbound_message_df['xml'].map(substitute_func)
    insert_records = outbound_message_df.to_dict('records')
    return cls.__table__.insert(values=insert_records)

Note: SQLAlchemy's comment on server-side vs client-side sequences.

Placement of the Sequence in both the Python-side and server-side default generation contexts ensures that the “primary key fetch” logic works in all cases. Typically, sequence-enabled databases also support RETURNING for INSERT statements, which is used automatically by SQLAlchemy when emitting this statement. However if RETURNING is not used for a particular insert, then SQLAlchemy would prefer to “pre-execute” the sequence outside of the INSERT statement itself, which only works if the sequence is included as the Python-side default generator function.

You could possibly go to the effort of making the sequence sequential (1, 2, 3, ...).


Linux Overview/Development

Development

Packaging

If you ever want to package something for Linux, Repology covers where.

Web Tools

Shell

  • Consider an alias like alias q="ls | grep --color -i".
  • The history command (history search).
  • Process management: & fg bg jobs kill disown nohup (CTRL+C/Z).
  • kill -SIGSTOP and kill -SIGCONT for pause/resume.
  • pgrep to search for PID by name.
  • script to record shell session in file (and cat or less -R to view).
  • From less to edit, press v.
  • Use type -a, not which, as the latter is problematic.

Alternative Tools

  • ncdu (du -smh, df -l)
  • nnn - Terminal File Browser

Makefiles

Template:

MAKEFLAGS += --warn-undefined-variables
MAKEFLAGS += --no-builtin-rules

##     help:   This.
.PHONY: help
.DEFAULT: help
help: Makefile
#       Find all double comments and treat them as docstrings
        @echo "make <command>"
        @sed -n 's/^##//p' $<

##     watch:  Hot-reload web-app server.
.PHONY: watch
watch:
#       Build files in case they changed while we were not watching
        $(MAKE) build
        watchman-make -p '*.py' 'Makefile' 'Dockerfile.web' '.dockerignore' -t build


##
##Run make with VERBOSE=1 for additional output.
$(VERBOSE).SILENT:
# Delete targets on failure
.DELETE_ON_ERROR:

Systemd Files

Ubuntu 16.04 does not support user SystemD services.

"The only thing more complicated and confusing than systemd is the planet's ecosystem." - Me

This page is useless.

For example, create /etc/systemd/system/my-ssh-monitor.service

[Unit]
Description=My SSH monitor
Requires=network-online.target
Wants=sshd.service

[Service]
User=..
Group=..
WorkingDirectory=/home/my-app/
ExecStart=/bin/my-monitor
ExecStop=/bin/kill -HUP $MAINPID
PrivateTmp=true

[Install]
WantedBy=default.target

systemctl daemon-reload to activate the unit file.

sudo systemctl start|stop|reload|restart|reload-or-restart|enable|disable|status|is-failed

Types of Services

Type=simple - the default; must not fork, and is considered started immediately.

Targets

What do they do?

default.target

network-online.target

multi-user.target

Timers

Shells

There is a difference between a shell, a terminal, a console, and a command line.

Fish:
  • Alt-L : ls
  • Alt-P : append "| less"
  • Alt-V : edit command in editor
  • Alt-W : what is the current command?
  • history

Bash:
  • history

SSH

The full reference (Arch Wiki).

Common commands:

# Create new secure key with comment (User@Device#Realm)
ssh-keygen -t ed25519 -C "Qasim@PC#QasimK"
# Change passphrase of existing key
ssh-keygen -f ~/.ssh/id_ed25519 -p
# Add your credentials to remote server to allow you to login
ssh-copy-id -i ~/.ssh/id_ed25519.pub username@remote-server.domain

(ssh-copy-id adds your SSH key to the remote server user's authorized_keys file.)

For your ~/.ssh/config file, take a look at my-setup, but here is a reference:

# Add all new SSH key passphrases to ssh agent (doesn't seem to work)
AddKeysToAgent [yes|ask|confirm|no]
# This is default and unnecessary
IdentityFile ~/.ssh/id_rsa

Host MACHINE1
   IdentitiesOnly yes
   IdentityFile ~/.ssh/id_ed25519_MACHINE1

TODO: SSH agent

https://wiki.archlinux.org/index.php/SSH_keys#Start_ssh-agent_with_systemd_user

.pam_environment may require reboot !!! (or manual source?)

SSHD

Test config with sudo sshd -T before restarting the daemon.

The following settings are ordered starting from the most significant, least invasive, and easiest to set up:

  • Prevent root login
  • Use SSH keys only
  • Use a less common port, e.g. 24
  • Only allow particular groups/users to login (such as wheel, the administrative group)
PermitRootLogin no
PasswordAuthentication no
Port 24
AllowGroups wheel
AllowUsers qasim
  • Rate-limit attempts: sudo ufw limit OpenSSH (NB: check auto correct port?)
  • Use ssh-geoip (blacklist IPs rather than whitelist to prevent lockout) (not tested; IPv6?)
  • Use an SSH bastion.
  • Use fail2ban (not needed with SSH keys; lockout risk)
  • Require 2FA: libpam-google-authenticator (longer setup; not tested; has backup codes)

Mosh

  • Mosh uses SSH for initial authentication.
  • Requires UDP ports 60000–61000 to be open (you can get away with 60000-60010).

Containers

  1. Programming-level, e.g. Python's Virtualenv
  2. Kernel-level
    1. Application-encapsulating, e.g. Docker (ephemeral, separate storage)
    2. OS encapsulating, e.g. LXC, LXD, and kind-of chroot (UNIX)
  3. Hardware virtualisation
    1. Heavyweight OS (Type 2 hosted hypervisor), e.g. Vagrant (depending on back-end), Virtualbox
    2. (Type 1 bare metal) Hypervisor, e.g. KVM, Hyper-V

Docker, LXC, and LXD (which uses LXC) all use Linux cgroups and namespaces (net, user, pid, ipc, cgroup) on Linux.

Kubernetes can be used for orchestration.

You also have things like Firejail, Snap/Flatpak, and AppImage.

Containers may conflate:

  • Security via isolation
  • Packaging dependencies
  • Efficient resource utilisation

Developing with Vagrant

Consider LXD as a lightweight, non-portable alternative.

This is useful if you are working on conflicting projects, or want to keep your computer tidy. Keep your setup on your host (text editor and IDE applications and their config files), and run the project (any executables) within the container. A shared folder can be used to store the repository.

Vagrant is easy if you use the default Virtualbox provider. And, apparently, impossible with vagrant-lxc :/

Plugins (they have been a disaster for me):

  • vagrant-vbguest to keep VirtualBox's Guest Additions up-to-date on the Vagrant box that started failing
  • vagrant-notify-forwarder for filesystem event forwarding that is not reliable
  • vagrant-disksize to easily increase the size of your Virtualbox disk
    • config.disksize.size = '20GB'
  • vagrant-share to be able to share your container with others that I've never used
  • vagrant-lxc for LXC boxes that don't work

Create a merging custom Vagrantfile in ~/.vagrant.d/.

Many applications inside the virtual machine will bind to localhost, making them difficult to connect to from the host. We can configure a SOCKS5 proxy with SSH, and use a new Firefox profile which always connects via that proxy.

config.vm.network "private_network", ip: "172.16.3.2"
config.ssh.extra_args = ["-D", "1632"]

Using the Firefox extension SmartProxy, add the SOCKSv5 Proxy Server (Vagrant; SOCKS5; 127.0.0.1; 1632). Then when browsing to a particular localhost:<port>, click on the toolbar icon and enable "Enable proxy on localhost:<port>".

It is also possible to connect the VM's shared private network directly on 172.16.3.2.

This is easier than forwarding each individual application (which you may not know in advance) with:

config.vm.network "forwarded_port", guest: 8000, host: 8000, host_ip: "127.0.0.1"

Issue with vagrant-vbguest

  if Vagrant.has_plugin?("vagrant-vbguest")
    config.vbguest.auto_update = false
  end

Issue with Kubectl

If you encounter an issue with double port-forwarding (i.e. a port-forward inside the guest and then using Vagrant's port-forward to forward it to your host): https://stackoverflow.com/questions/49940964/windows-host-vagrant-kubectl-port-forward-stuck-inside-vagrant. TODO: I have no idea what that is doing ATM.

# Port-forward from local port 8000 to remote port 80, listening on all
# addresses so that Vagrant's port forwarding works.

kubectl port-forward --address 0.0.0.0 <POD-NAME> 8000:80

Deploying with Docker

  • Use tini for your applications to handle signals (article).
  • Set ENV PYTHONUNBUFFERED 1 to ensure all logs are always forwarded.
  • Compile the Python files: python -m compileall .
  • Use multi-stage builds for very large Python projects.

Recommended: https://pythonspeed.com/docker/

https://blog.realkinetic.com/building-minimal-docker-containers-for-python-applications-37d0272c52f3

https://cloud.google.com/solutions/best-practices-for-building-containers

https://cloud.google.com/solutions/best-practices-for-operating-containers

Best Practices

  • Use hadolint, a Dockerfile linter
  • Use tini as the correct init (forwards signals and reaps zombies) (article)

Terminal Pro

NNN

The terminal file manager

XDG-Mime

Default applications.

$ xdg-mime query filetype myfile.md
text/plain
$ xdg-mime query default text/plain
$ ls /usr/share/applications
... org.gnome.gedit.desktop ...
$ xdg-mime default org.gnome.gedit.desktop text/plain
$ xdg-mime query default text/plain
org.gnome.gedit.desktop


Accounting

Two good resources:

  1. High level overview: https://martin.kleppmann.com/2011/03/07/accounting-for-computer-scientists.html

  2. https://beancount.github.io/docs/the_double_entry_counting_method.html

Audio

https://xiph.org/

Generally, don't re-encode audio with a different codec unless you are starting from the original lossless source.

Quality

Ogg is a container format for Vorbis and Opus codecs.

  • Vorbis is deprecated; Opus generally beats it.
  • Opus is low-latency, switches between speech and music modes, and is very good at low bitrates.

See comparison: https://opus-codec.org/comparison/

FFMPEG

Downcode to x264 1080p (copy audio and subtitles)

ffmpeg -i input.wmv \
       -c:v libx264 -preset veryslow -tune film -crf 22 -vf scale=-2:1080 \
       -c:a copy -c:s copy \
       output.mp4

Transcode audio: -c:a libfdk_aac -b:a 128k

-tune parameter: https://superuser.com/a/564404/229283


Principles

These guiding principles reduce bugs and increase implementation speed of new features or modifications.

Remember the scope and context of the project. Is it a script? Is it huge? Is it one-off? Is it mission-critical?

YAGNI/KISS

"You aren't gonna need it" and "keep it simple, stupid" tell you when to do something. The other principles tell you how to do something.

SOLID

SOLID applies to object-orientated programming.

  • Single Responsibility Principle
    • Domain-dependent
    • Think about who - the role - will be causing what changes, and isolate that code together
    • Higher cohesion (object has greater focus)
    • Lower coupling (object does not connect to or require knowledge of other object)
  • Open/Closed Principle
    • "An object should be open for extension, but closed for modification"
    • This means the design should facilitate new functionality or changes in requirements without modifying existing code
    • The object becoming closed means it now has a well-defined interface and may be used by others
    • This limits the risk of introducing bugs into already developed & tested code
    • In a dynamic language it also means you don't dynamically reach into and modify an object
    • Violation smells: switch/if-else statements that can be extended
    • Implementation notes: use abstraction and interfaces
    • Be aware: Pure inheritance introduces coupling if subclasses depend on implementation details of their parent class.
    • When to use: You cannot predict the future so "modifications may be done once, maybe twice, but not thrice before I used OCP"
  • Liskov Substitution Principle
  • Interface Segregation Principle
    • ???
  • Dependency Inversion Principle
    • ???

Composition over Inheritance

Starting point: https://www.youtube.com/watch?v=3MNVP9-hglc

Inheritance is difficult with Liskov's substitution principle anyway.

Law of Demeter

The principle of least knowledge, in particular for OOD, says to loosely couple the design by having objects only talk to their immediate "friends".

Code Smell: self.friend.internal_variable.dobla()

Better: self.friend.dobla()
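
A minimal sketch of the delegation (names illustrative):

class InternalThing:
    def dobla(self):
        return "done"

class Friend:
    def __init__(self):
        self.internal_variable = InternalThing()

    def dobla(self):
        # Delegate, so callers never reach through to the internals.
        return self.internal_variable.dobla()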

Tech Debt

TBD: https://engineering.riotgames.com/news/taxonomy-tech-debt

Design Patterns

Design patterns are boilerplate that is indicative of missing language features. They are useful for communicating with peers.

Anti-patterns

The Lava Layer

The lava layer is the mishmash that results from an evolving codebase that lacks a singular vision and continuity of development.

Testing anti-patterns

Testing

Tests can be divided by size:

  • unit-tests - tiny, no IO, no other components involved (small) - fast
  • integration tests - test combined units (medium-large) - slow
  • system tests - test multiple high-level stages (large) - slow

They can be divided by methodology:

  • functional tests - functional programming, test output based on input (NB: also used to refer to testing features)
  • acceptance tests - the final end result (black-box) tested at the highest, outer-most level (e.g. using your web app with a web browser)

They can be divided by purpose:

  • smoke tests - a subset of tests that quickly reveal severe failures
  • regression tests - tests that catch bugs that come back, or features that stop working

They can be divided by who is doing them:

  • end-to-end tests - performed by QA involving multiple distinct steps to accomplish a goal
  • user acceptance tests - performed by non-technical business users or otherwise domain-experts to test whether they accept the software in their heart

The naming and boundaries are fairly fuzzy so don't worry about it.

Tests accomplish two main goals:

  • What you've written now is right, including incorporating business requirements directly as tests.
  • What you've written stays right, including allowing refactoring with confidence (increases development speed; reduces regressions). It leaves a bad impression (on clients) if a bug keeps reappearing, and it demoralises developers.

Software Quality Assurance (SQA) looks at the development process itself to reduce the number of issues.

You also have operational acceptance tests, which cover things like failover, backups, recoveries, performance, monitoring, and alerting.

Principles

  • The tests can help guide you towards a nice API, e.g. write the final candidate API with its integration tests and write the lower level details to advance the integration test as needed with unit-tests alongside it. i.e. Write what a thing is supposed to do
  • Found a bug? Write a test that verifies it, then fix it. (Usually.)
  • Arrange-Act-Assert (AAA): set up the test, execute your code, check against expected results. I usually do Arrange-Assert-Act-Assert.
    • This is also known as Given-When-Then in Behaviour-Driven Development (BDD) which uses the friendly english language to express the requirements.
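
A tiny sketch of Arrange-Act-Assert in pytest style (names illustrative):

class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount

def test_withdraw_reduces_balance():
    # Arrange
    account = Account(balance=100)
    # Act
    account.withdraw(30)
    # Assert
    assert account.balance == 70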

Automated Code Review

Who? The machine runs it for everyone

What?

  • Static code analysis
  • Data-flow analysis?
  • Code coverage

When?

  • Git pre-commit hook
  • Local (manual run)
  • Code review interface
  • Continuous Integration (merger) runner

How to Mock

Don't. Ref: https://www.youtube.com/watch?v=Xu5EhKVZdV8

https://www.youtube.com/watch?v=3MNVP9-hglc "The End Of Object Inheritance & The Beginning Of A New Modularity" here?

Stubs - Fake/Shortcutted implementations, for testing the state of the object

Mocks - Empty "implementations", for testing interactions with the object

How to Test: https://www.destroyallsoftware.com/talks/boundaries

Imperative shell = integration tests of the shell which tie together all the dependencies (no logic; no code paths to test) - that's what they're good at.

Functional core = unit-tests of all that messy logic (that's what they're good at)
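
A minimal sketch of the split (names illustrative):

def total_of(lines):
    # Functional core: pure logic - unit-test the messy cases here.
    return sum(int(line) for line in lines if line.strip())

def main(path):
    # Imperative shell: ties the dependencies together, no logic to
    # speak of - cover it with a single integration test.
    with open(path) as f:
        print(total_of(f))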

Architecture Patterns

APIs

RESTful API, HTTP/Web API, TCP/Socket API - RPCs

REST

A set of patterns and guidelines (a software architectural style). The transport protocol is not defined, but universally REST is used as REST-over-HTTP.

A RESTful API[1] is about resources[2], where resources are hypermedia. Note that URLs are resources (not verbs/actions). You should spend your time defining the media types used for resource representations, which includes the link relations that the resource may contain.

The resources should not depend on the underlying domain/implementation objects. Create resources that make sense in the context of the API - they can be anything you can think of, for example a process can be a resource, especially if you want to ask questions about the state of the process.

HATEOAS means clients know one entry point (the bookmark) and the media types used for resource representations beforehand. What does the entry point return? In theory this allows you to control your namespace at will (i.e. changing object hierarchies or resource names at will); in practice clients may hardcode URLs, and you need backwards-compatibility during the transition to a new URL namespace anyway. It also allows you to specify actions that may dynamically change depending on the resource's state. You may want to use links in headers. In practice, decisions are made when the API integration code is written, not at run-time, as opposed to how we use links on the web. HATEOAS requires hypermedia-aware media types such as HTML, Atom and SVG - hyperlinks - which XML and JSON don't define, though there are certain extensions to these that do. It genuinely puts theory over practice, and as it is uncommon it is actually slightly developer-unfriendly.

Version via content-type headers: application/vnd.mycompany.myapp.myresource+json; version=1.0 (or just a completely custom header); or inside the URL (hostname, path, or query parameter). Note that application/json is indeed a media type, but it is incomplete because it only describes the data format, while (almost always) special processing is required for the content. Roy Fielding says to version inside the hostname, as the next version is a new system.

Pragmatic Advice

  • Don't use HATEOAS.
  • Version inside the URL path, /api/v1/.
  • Use the Accept HTTP header (and perhaps a query parameter) to change the media type of the response, e.g. XML. (Perhaps even an extension /123.json.)
  • Requests should be URL-encoded by default, JSON-encoded requests should require Content-Type: application/json.
  • Use HTTP status codes. Consider: 200, 201, 202, 204, 301, 304, 400, 401, 403, 404, 405, 409, 410, 412, 415, 422, 429.
  • Use the Location header for 201.
  • Specify a consistent error response on 4xx which includes a code, a human-readable message, and in the case of validation errors, a list of them.
  • Use explicit Idempotency-Key HTTP headers for POST when necessary. (NB: every write should be idempotent due to network partitions.)
  • Prevent race-conditions with ETag and If-Match headers.
  • Always use plural forms for collections even when it doesn't make sense, e.g. /people.
  • Allow nested objects to be expanded, e.g. with an ?expand[]=friends.name query parameter.
  • Allow fields to be explicitly specified, e.g. with a ?fields[]=age query parameter.
  • Handle filtering, sorting and searching on collections via query parameters, e.g. /people?q=john&state=active&sort[]=name,-date. (i.e. not hierarchical sub-paths)
  • Consider aliasing query common parameters as a sub-resource path.
  • Avoid envelopes - HTTP already has them. For example, for pagination use links-in-headers, e.g. Link: <https://example.com/v1/people?page=3>; rel="next", <https://example.com/v1/people?page=50>; rel="last" (first, last, next, previous), and a custom HTTP header for the total count like X-Total-Count .
  • How to paginate exactly?
  • Use POST/PUT/PATCH appropriately.
  • Use POST on collections to insert. Use PUT (idempotent) on specific resource path to create/update, and PATCH for partial updates.
  • PUT/POST/PATCH should return the new resource representation. In the case of HTTP 201, use the Location header.
  • If rate-limiting, use headers like X-Rate-Limit-Limit , X-Rate-Limit-Remaining , and X-Rate-Limit-Reset (in seconds).
  • Use HTTP Basic Auth, if possible.
  • Use HTTP caching, e.g. ETag , and Last-Modified (with the appropriate vary and privacy settings).
  • Return pretty (indented) responses (gzip covers size concerns).
  • Return a request ID used for support queries as an HTTP header.
  • Documentation should show examples of complete request-response cycles, including copy-and-pastable code.
  • Clients should be informed of deprecation schedules (in the docs) and of updates (through blogs/changelogs/mailing lists).
  • When all else fails, and you really need an action, PATCH a field on the resource, use a sub-resource path, or, in extreme cases, create a new non-RESTful endpoint.

Alternative: GraphQL (see GitHub's blog post).

TBD: REST without PUT and PATCH. The idea that clients shouldn't even manipulate the resource representation directly, but instead signal intents via new creation or update resources, e.g. an intent to change the resource, ChangeOfName vs PATCH person.name. The former allows for clear auditing/question asking of the process, eventual consistency and

Representational State Transfer (REST):

  • separates concerns - data storage from user interface - client-server, allowing components to evolve separately.
  • is stateless - the request must contain all the information and cannot take advantage of stored context on the server, allowing for increased reliability (eases recovery), visibility (eases debugging) and scalability (no persistent state on server to preserve)
  • has a cache
  • uniform interface - simplifies development despite inefficiency due to standardised formats
  • layered (e.g. load-balancers)

Examples

Safety

Safe: No side-effects (no server state change beyond trivial things like logging); Idempotent: Safely Repeatable.

Pure Read [REST:GET] - Safe

Stateful Reads [GET] - pure functions that need computational resources of server; cursor in databases; logging/analytics/paywalls; rate limiting APIs.

Idempotent Write [PUT/DELETE] - you can safely throw these at the server.

Non-idempotent Write (dangerous) [POST-made idempotent with UUID] - in the event of a network partition you could have a serious problem.

Versioning Pattern

Adapt new response to old response - pipeline, infinite support. See stripe <TBD>
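
A sketch of such a pipeline (hypothetical field names; each function downgrades the response by one version step):

def downgrade_v3_to_v2(resp):
    resp["name"] = f"{resp.pop('given_name')} {resp.pop('family_name')}"
    return resp

def downgrade_v2_to_v1(resp):
    resp.pop("created_at", None)
    return resp

# Newest first; apply every downgrade above the client's pinned version.
DOWNGRADES = [("v3", downgrade_v3_to_v2), ("v2", downgrade_v2_to_v1)]

def render(resp, client_version):
    for version, downgrade in DOWNGRADES:
        if client_version < version:  # lexicographic is fine for v1..v9
            resp = downgrade(resp)
    return resp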

1. Originally described by Roy Fielding in his dissertation in 2000.

2. The exact meanings of words like "representation" are described in Fielding's REST data elements table.

"Agile"

Refactoring

Notes from Refactoring: Improving the Design of Existing Code, Second Edition.

"The whole purpose of refactoring is to make us program faster, producing more value with less effort."
Refactoring p56, Martin Fowler

"for each desired change, make the change easy (warning: this may be hard), then make the easy change"
Kent Beck, Twitter

paraphrasing: going 20 miles north, 100 miles east on the motorway, and back down is faster than going 100 miles east through the woods. "Sometimes you need to say, 'wait, I need to check the map and find the quickest route'."

Refactoring p51, Jessica Kerr

Key Ideas

  • A refactoring is a small change to the structure of the code that makes it easier to understand and cheaper to modify, without changing its observable behaviour.
  • This means the codebase is never left in a broken state, and you can stop refactoring at any point, even if you haven't finished.
  • Change-Compile-Test-Commit.
  • (When you succumb to the temptation of making a larger multi-step refactoring in one go and the tests fail, simply revert the code and redo it as smaller steps!)
  • The rule of three: the first time you just do it, the second time you wince but you do it anyway, and the third time you refactor.
  • Design Stamina Hypothesis.
  • Two hats: two mutually exclusive activities of "adding functionality" and "refactoring". The former mostly involves adding new tests and getting them to work, the latter doesn't need new tests to be added (unless you find some were missing).
  • The distinct types of opportunistic refactoring (following the boy-scout principle):
    • Preparatory refactoring - making it easier to add a feature
    • Comprehension refactoring - making code easier to understand
    • Litter-pickup refactoring - coming across bad code

Don't justify refactoring in terms of "clean code", "good engineering practice", or other moral reasons. Instead, communicate with the core economic benefits: it makes us faster to add features and faster to fix bugs.

Documentation

This: https://www.divio.com/blog/documentation/

Sorting

Strings

Reading Materials

A collection of further, high-quality reading material:

Books:

  • Refactoring (2nd Edition) by Martin Fowler. (Not yet published at the time of writing.)

The Great References List

Talk Notes

  • The Clean Architecture in Python - Brandon Rhodes @ PyOhio 2014

    • A soft introduction to domain-driven design, imperative shell/functional core (see the sketch after these talk notes)
    • Burying I/O into a sub-procedure hides the complexity of the procedure
    • Decoupling I/O above your procedure simplifies the implementation, allows for easier unit-testing, better composability
    • The latter decouples the I/O; the former merely hides it.
    • Comments are information that can be moved into the code itself (self-documenting)
    • Two methods of avoiding decoupled I/O: dependency injection and mocking
    • Dependency injection (Martin Fowler, 2004: make the I/O library or function a parameter) leads to DI frameworks, because all the dependencies need to be injected all the way down into the very depths of your code (the depths could use anything they want) - the parameters just need to be passed along.
    • The other problem with DI for testing is that you are not testing the real library
    • Mocking is similar.
    • If DI/mocks make you feel that you are fighting the structure of your program, that's a code smell.
    • A symptom of coupling - testing combinations of parameters
    • In DDD the deeper you go into a program, the closer to policies you should get. The shell should be mechanisms.
    • Isolated, simple data structures are passed across boundaries.
    • "An imperative shell, that wraps and uses, your functional core"
    • Procedural code: "output as you run"; functional code: stages producing data, with the output at the end.
    • The biggest advantage of functional programming is that you can see the data between stages. "Show code and hide models and I will be mystified; show models and hide code and it will be obvious"
    • By working in stages - linking simple programs together - you can picture the output of each stage. It continually surfaces the result which can be easily checked. Our minds can picture the data easily.
    • "The coder can concentrate on one section of the program at a time, without the overall detailed program continually intruding"
    • Immutable data structures lead to distributed computing: if you modify memory, you cannot move that code to another machine, but if you just accept data and return data then you can move that to another machine.
    • A function that takes an opaque, complex data structure is actually a disguised method.
    • Brief Python-specific details at the very end (generators, iterators, context managers).
  • Microservices + Events + Docker = A Perfect Trio - Chris Richardson @ DockerCon 2016

    • Monolithic applications become large, and you lose team autonomy and fast development because of conflicts between teams that contribute to the same codebase.
    • So, accelerate development with microservices.
    • But, microservices mean a distributed system: communication between services, partial failures, consistency between databases (no distributed transactions), testing is complicated, and DevOps is complex.
    • A shared database is not microservices because of tight coupling - conflicts between teams result in slowdowns and loss of team autonomy.
    • So, use event-driven architecture to maintain data consistency.
    • But you still have a dual-write problem (write to database and write to message broker).
    • So, use Event-Sourcing: it resolves data consistency, gives you reliable event publishing, gives you auditing, and eliminates ORM problems.
    • But, querying the event store directly can be extremely challenging.
    • So, resolve with Command Query Responsibility Segregation (CQRS): split the commands on the aggregate, which produce events, from materialised views, which consume events and provide a query API (each backed by a type of database suited to its queries); see the sketch after these notes.
    • Use Docker to simplify development and deployment.
    • https://microservices.io

The Cultural History of Programming

  • Magnetic-core memory, used as RAM between 1955 and 1975, is the source of the terms "core memory" and "core dump".
  • Socat - Create a listening service that forwards to a service inside a network namespace (socat is amazing): /usr/bin/socat UNIX-LISTEN:/run/transmission.sock,fork exec:'ip netns exec NETNS socat STDIO tcp-connect\:127.0.0.1\:8080',nofork

    • HTTP Client -> [Unix Socket Server] ---- | Network Namespace Barrier | ----> HTTP Server (localhost listening)
  • strace -p PID - Look at calls between the process and the kernel

  • gdb, python-dbg - Look at stack traces live

  • TIL: https://news.ycombinator.com/item?id=17083879 "My registrar suspended my domain because an abusive user was using a subdomain for phishing"

  • Make Vim respect XDG - https://tlvince.com/vim-respect-xdg

  • Write scripts, just flipping write them! https://www.youtube.com/watch?v=Jd8ulMb6_ls "Solve Your Problem with Sloppy Python" @Larry Hastings

  • A product manager can buy software from their team, but they can also buy information like, "Is X possible?" or "How many points to build Y?" or "How much support would be reduced if we did Z?".

  • Fira Code for font ligatures: https://medium.com/@docodemore/an-alternative-to-operator-mono-font-6e5d040e1c7e

  • http://learnyouanagda.liamoc.net/pages/introduction.html

  • camelCase, PascalCase, snake_case, kebab-case, CAPS_CASE.

  • Sudoers: pcuser ALL=(ALL:ALL) NOPASSWD:/path/to/command ""

    • pcuser - the user (or %group) we are giving permission to.
    • ALL= - the host (ALL works unless you are sharing the file across hosts).
    • (ALL:ALL) - (user-we-can-run-command-as:group-we-can-run-command-as), i.e. sudo -u user -g group. If omitted, only root. Can use just (ALL).
    • NOPASSWD: - a tag; the command may be "ALL".
    • "" - prevents command parameters (do not use when specifying parameters).

    -> User Host = (Runas) Command

    • Example (placed after any non-NOPASSWD rules, since the last matching rule wins): %wheel ALL=(root) NOPASSWD:/usr/bin/pacmatic -Syu

Python Ecosystem

https://github.com/ambv/black - automatic code formatting (output conforms to pycodestyle)

JavaScript Ecosystem

Personality

When apologising, don't say "if", don't say "I regret", and don't use the passive voice. Say how you're going to make sure this doesn't happen again.