The Programmer's Compendium

Broken out into:

Recipes for setting up or using particular software/services
Python because I do so much of it
Details about particular Topics
Rundown of Linux
Miscellaneous information about particular Domains
General guidelines on Software Development
Underlying Fundamentals of programming

Other resources include:

My Setup - How to setup a computer for software development.
Learn It - The predecessor to this book.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Nginx

Config Generator: https://nginxconfig.io/

Nginx is a reverse proxy, allowing a single port to host multiple websites.

add_header does not inherit properly.

Global configurations

Global configuration (http block-level) can be placed inside /etc/nginx/conf.d/*.conf.

TLS

# tls.conf
# Configure TLS security (requires setting parameters in server blocks to activate)

# USAGE:
# You must generate a dhparam.pem file.
# You must set ssl_certificate, ssl_certificate_key, and ssl_trusted_certificate.
# You must NOT use add_header inside your server {} block at all.

ssl_session_cache shared:SSL:10M;
ssl_session_timeout 180m;
ssl_session_tickets off;

# Secure SSL config - https://mozilla.github.io/server-side-tls/ssl-config-generator/
ssl_protocols TLSv1.2 TLSv1.3;
ssl_dhparam /etc/ssl/certs/dhparam.pem;  # Must generate this manually
ssl_ciphers ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;

# Already set in /etc/nginx/nginx.conf
# ssl_prefer_server_ciphers on;

# SSL OSCP Stapling
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4;

# Force SSL to this domain (+subdomains) for 6 months (+ preload list)
add_header Strict-Transport-Security "max-age=15768000; includeSubDomains; preload" always;

Security Headers

There are several headers used for security, which can be set by Nginx or by your application. The following global configuration file will set a default if upstream has not provided one.

# security_headers.conf
# Ensure HTTP headers that are important for security are set.
# We set some defaults which can be overridden by the upstream server.

# USAGE:
# You must use proxy_hide_header/uwsgi_hide_header to hide the upstream
#   headers for all headers that we list below.
# You must NOT use add_header inside your server {} block at all.

map $upstream_http_referrer_policy $referrer_policy {
    default $upstream_http_referrer_policy;
    '' "strict-origin-when-cross-origin";
}

map $upstream_http_x_content_type_options $x_content_type_options {
    default $upstream_http_x_content_type_options;
    '' "nosniff";
}

map $upstream_http_x_frame_options $x_frame_options {
    default $upstream_http_x_frame_options;
    '' "DENY";
}

map $upstream_http_x_xss_protection $x_xss_protection {
    default $upstream_http_x_xss_protection;
    '' "1; mode=block";
}

add_header Referrer-Policy $referrer_policy always;
add_header X-Content-Type-Options $x_content_type_options always;
add_header X-Frame-Options $x_frame_options always;
add_header X-XSS-Protection $x_xss_protection always;

Note:: X-XSS-Protection has been superseded with Content Security Policies (CSPs)

Note: X-Frame-Options has been superseded with the frame-ancestors CSP

TODO: Deny the use of browser features (Feature Policies)

TODO: Deny the use of cross-site features (Content Security Policies)

Servers

Servers can be configured inside /etc/nginx/sites-available/, but a symlink should be created inside /etc/nginx/sites-enabled/ for them to become active.

Default Server - Redirect to HTTPS

server {
    listen {{ public_ipv4 }}:80 default_server;
    listen [{{ public_ipv6 }}]:80 default_server;

    # Redirect all HTTP requests to HTTPS with a 301 Moved Permanently response.
    return 301 https://$host$request_uri;
}

Default Server - Prevent non-SNI Traffic

# HTTPS default server for non-SNI requests
# Prevents an otherwise random public server certificate being leaked
# Also captures SNI requests where we are not hosting that domain here
server {
    listen {{ public_ipv4 }}:443 default_server;
    listen [{{ public_ipv6 }}]:443 default_server;
    server_name _;
    ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;
    return 444;
}

A HTTPS Server

server {
    listen {{ public_ipv4 }}:443 ssl http2;
    listen [{{ public_ipv6 }}]:443 ssl http2;
    server_name {{ domain }};
    charset utf-8;

    # Check if this certificate is really served for this server_name
    # https://serverfault.com/questions/578648/properly-setting-up-a-default-nginx-server-for-https
    if ($host != $server_name) {
        return 444;
    }

    # Certificate
    ssl_certificate /etc/letsencrypt/live/{{ domain }}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/{{ domain }}/privkey.pem;
    ssl_trusted_certificate /etc/letsencrypt/live/{{ domain }}/chain.pem;

    # Lets Encrypt SSL Cert renewal
    location ~ /.well-known/acme-challenge {
        allow all;
        root /var/www/letsencrypt;
    }
}

Ubuntu's Nginx configuration does a lot of things itself including: sendfile, tcp_nopush, tcpnodelay, keepalive_timeout, gzip, basic SSL configuration.
Doing listen [::]:443 ipv6only=off does not seem to work well (maybe due to use of IP addresses on other servers?). It is also Linux-only.

Serving Files & Upstream Proxies

uWSGI Proxy

upstream uwsgicluster {
    server unix:///run/uwsgi/app/APP_NAME/socket;
}

server {
    ...

    # Proxying connections to application servers
    location / {
        include         uwsgi_params;
        uwsgi_pass      uwsgicluster;

        uwsgi_param     Host $host;
        uwsgi_param     X-Real-IP $remote_addr;
        uwsgi_param     X-Forwarded-For $proxy_add_x_forwarded_for;
        uwsgi_param     X-Forwarded-Host $server_name;
        uwsgi_param     X-Forwarded-Proto $scheme;
        uwsgi_param     X-Forwarded-Port $server_port;

        # Correct handling of fallbacks for HTTP headers
        uwsgi_hide_header   Referrer-Policy;
        uwsgi_hide_header   X-Content-Type-Options;
        uwsgi_hide_header   X-Frame-Options;
        uwsgi_hide_header   X-XSS-Protection;
    }
}

HTTP Proxy

server {
    ...

    # Proxying connections to application servers
    location / {
        proxy_pass         http://localhost:8080;
        # Required to rewrite "Location" header for Jenkins
        proxy_redirect     http://localhost:8080 https://{{ domain }};
        proxy_read_timeout 60;
        proxy_http_version 1.1;

        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
        proxy_set_header   X-Forwarded-Proto $scheme;
        proxy_set_header   X-Forwarded-Port $server_port;

        # Correct handling of fallbacks for HTTP headers
        proxy_hide_header  Referrer-Policy;
        proxy_hide_header  X-Content-Type-Options;
        proxy_hide_header  X-Frame-Options;
        proxy_hide_header  X-XSS-Protection;
    }
}

Serving Static Files

server {
    ...

    # Serve static files
    rewrite ^/robots.txt /static/$request_uri last;
    location /static/ {
        alias /var/www-data/static/;
        disable_symlinks if_not_owner;  # Extra-security
        gzip_static on;

        # Performance
        # access_log off;
        open_file_cache         max=1000;
        open_file_cache_errors  on;
    }
}

A neat command to compress static files: find -L . -type f ! -iname "*.gz" ! -iname "*.png" ! -iname "*.jpg" ! -iname "*.jpeg" ! -iname "*.gif" ! -iname "*.webp" ! -iname "*.heif" -exec gzip --best -kf "{}" \;

Performance/Tuning

sendfile - directly from kernel to network socket - covered by Ubuntu, but consider adding sendfile_max_chunk
- Note that this does not work with gzip!
open_file_cache - do not recheck filesystem for file on every request
gzip - covered by Ubuntu for HTML only...
gzip_static - do not compress on the fly, serve pre-generated .gz files
limit_req - consider for rate limiting number-of-requests by IP
(limit_conn - consider for rate limiting number-of-requests by connections - alternative to above?)
limit_rate - consider for limiting a individual request by limiting the network speed

# Rate limit all requests for a server by IP address
limit_req_zone $binary_remote_addr zone=myzone:10m rate=1r/s;

server {
    ...

    limit_req_status 429;
    limit_req zone=myzone burst=60 nodelay;
}

uWSGI

gunicorn is an alternative that may be simpler to use.

The documentation that likes telling stories. uWSGI has a stupid insane number of features.

uWSGI is best combined with a web-server like Nginx: better at serving static files, the reverse-proxy buffers requests before forwarding to avoid wasting app-workers' time (Slowloris DOS), and possibly better security. Other app-servers may not have the latter issue.

[uwsgi]
; Increased efficiency for larger number of processes(/threads) (no reason not to).
thunder-lock = True

; (Threads disabled)
processes = 2
harakiri = 30

; uwsgitop <stats-socket>
stats = /run/uwsgi/app/APP_NAME/stats-socket
memory-report = True
; Clear environment on exit
vacuum = True

REPLACE_ME_HOST_NAME otherwise return HTTP 421 Misdirected Request (security)
route-if-not = equal:${HTTP_HOST};REPLACE_ME_HOST_NAME return:421

Due to uWSGI's default pre-forking behaviour, you may want lazy-apps or a @postfork fix function when running more than one process.
Note on harakiri and "post buffering" - the web server should take entire request body before passing it on, otherwise a slow upload could be killed due to the harakiri timer.
When using threads, e.g.threads=2, will automatically set enable-threads = true.

Features

Standard HTTP/TLS/SNI/WebSocket/Static Files/Async Apps
Notifications
Remote Procedure Call
Stats Server
Metrics Collection
App Security Jails (Namespaces)
App Resource Limits (Cgroups)
Auto-scaling (Zergs, Broodlord, Cheaper)
Memory deduplication
Custom Signals (target: workers/spoolers/mules/farms; event: custom/filesystem/timers/cron)
Caches (SharedMemory)
Queues
Background tasks (mules, spoolers)
- Cron & Timers
Locks
Routing (requests, cache)
Legion (Failover?)

Python-Specific Configuration

uWSGI has an easy interface (uwsgidecorators) to:

Background tasks (spooler, mules, cron, timers, and generic execute-task-in-background-thread)
Communcation between all processes using signals
Locks
In-memory cache (with periodic disk sync)

They are shared only within that master uWSGI instance.

; Python
plugin = python3
virtualenv = /home/APP_NAME/virtualenv/
chdir = /home/APP_NAME/APP_DIR/
module = APP_NAME.wsgi:application

; Background Tasks
spooler = /home/APP_NAME/tasks/
import = APP_NAME.tasks

; Real time tracebacks (inc. for harakiri'd requests)
; uwsgi --connect-and-read /run/uwsgi/app/APP_NAME/tracebacker1
py-tracebacker = /run/uwsgi/app/APP_NAME/tracebacker

; Only when running  application
; single-interpreter = true

Kubernetes

Commands

Running as a different user

Work around for K8S not supporting --user flag: https://github.com/kubernetes/kubernetes/issues/30656

kubectl get pods
kubectl describe pod <POD-NAME> | grep Node
gcloud compute ssh "<NODE-NAME>"
sudo su - root
docker ps | grep <POD-NAME>
docker exec -it -uroot <ID> /bin/bash

API Deployment

Use k8syaml.com

spec:
    replicas: 2
    # How long before deployment of a new pod is considered failed  (default: 600)
    progressDeadlineSeconds: 180

    # Rolling updates
    strategy:
        type: RollingUpdate
        rollingUpdate:
            # (Surge is rounded up)
            maxSurge: 25%
            maxUnavailable: 0

    # Container(s)
    template:
        spec:
            # List of containers
            containers:
              - image: <url>
                imagePullPolicy: IfNotPresent

                # Health checks: startupProbe, readinessProbe, livenessProbe
                # startupProbe = when static initialDelaySeconds is bad fit

                # readinessProbe = should kubelet direct traffic?
                readinessProbe:
                    httpGet:
                        path: /healthz
                        port: 5000
                        scheme: HTTP
                    initialDelaySeconds: 10
                    periodSeconds: 10
                    successThreshold: 1
                    failureThreshold: 3
                    timeoutSeconds: 5

                # livenessProbe = should kubelet restart pod?
                livenessProbe:
                    httpGet:
                        path: /healthz
                        port: 5000
                        scheme: HTTP
                    initialDelaySeconds: 10
                    periodSeconds: 10
                    successThreshold: 1
                    failureThreshold: 3
                    timeoutSeconds: 10

                resources:
                    requests:
                        memory: "192Mi"
                        cpu: 0.1
                    limits:
                        memory: "256Mi"
                        cpu: 1

                command: []

            restartPolicy: Always (default)

            # How long between SIGTERM and SIGKILL (default: 30)
            terminationGracePeriodSeconds: 10
---
# Protect against voluntary disruptions (node stuff)
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
spec:
    maxUnavailable: 1
---
# Horizontal Auto-scaler
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
spec:
    minReplicas: 1
    maxReplicas: 4
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: <name>
    targetCPUUtilizationPercentage: 50
---
# Load balancer / Service

Tips:

livenessProbe - do not check dependencies
- be wary of temp. long responses causing one restart => cascading restarts
- use timeout of client timeout and forgiving failureThreshold
- set initialDelaySeconds so that containers always have enough time
readinessProbe - check dependencies [or not, see below]
- if shared dependency (database) try not to fail probe because all pods will fail!
  - especially have a large timeout for these checks
- if truly independent, you're okay
- probably don't need one anyway for upstream servies
  - let application return error immediately (503)
  - better than timeout/404/etc.
- note: still need readinessProbe to fail deployment of new ReplicaSet!
pod can be ready:0/1 with status:RUNNING but not accepting traffic!

How to handle ready checking dependencies that not all endpoints use? * probably better not to check dependencies in readinessProbe at all * for shared dependencies, how likely is it that just this pod has a problem

Using readiness probe to check for misconfiguration? * Better to use startupProbe, but what happens if dependency itself is down? * You don't want to block new deployments due to dependency!

Google Cloud

Get databases: gcloud sql instances list
Get secrets: kubectl get secret
- Get password: kubectl get secret KEY -o jsonpath --template {.data.password}

Postgres

Executing queries against the replica

ERROR: canceling statement due to conflict with recovery

Detail: User query might have needed to see row versions that must be removed

The preferred solution is to allow the replica to pause applying the received WAL logs while a query executes:

# /etc/postgresql/10/main/postgresql.conf on a slave
max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s

This is a cumulative delay - if multiple queries a running, then the last one might get terminated after just a second because the total time limit has been reached.

Python

Use pipx to install environment management tools and scripts. (Execute binaries installed in isolated virtual environments.)
Use pdm to manage packaging and dependencies.
Use a PYTHONSTARTUP

Hey, there's already a Hitchhiker's Guide to Python! This is quite comprehensive on the Python development basics.

Libraries

Linting

ruff
mypy
vulture
bandit/dodgy/safety
Layer Linter (not tried)

Development

cqd for debugging (PYTHONSTARTUP it)
pysnooper - line-by-line print debugging

Libraries

passlib - high-level secrets library
boltons - "boltons should be buitins" (stdlib additions)
freezegun

Pandas

Understanding SettingWithCopyWarning.

Raising it as an error (setup.cfg)

[tool:pytest]
filterwarnings =
    error::pandas.core.common.SettingWithCopyWarning
    error::FutureWarning

Gotchas

Probably the last one you will learn:

x = [1,2,3,4]
lambdas = [
   lambda: x[i]
   for i in range(4)
]

[f() for f in lambdas]

GIL

The GIL is required to run bytecode, but not when waiting on I/O - Python can switch to different thread.
CPython has macros to release and re-aquire GIL which can be used if extension does not need to run python bytecode/do any kind of ref-count changes (e.g. sleep, read/write files, network socket, numpy).
The GIL cannot be shared in CPU-bound code. CPU-bound "check" every 100 ticks to allow switching.
I/O doesn't often block unless you are flushing - OS buffers
Threads/Processes result in CPU context switches, while async does not.

CPU-bound : Process ←→ Threads ←→ Async : I/O-bound.

Async I/O clearly delineates the context switch locations, in theory.

	Processes	Threads	Async I/O
Context Switch	Yes	Yes	No
Race Conditions	Unlikely	Yes (shared memory)	Unlikely
Dead Locks	Unlikely	Yes	Unlikely
Memory	High	Medium, new stack	Low, shared stack

Python Profiling & Performance

Profiling Methods

python -m cProfile -o out.prof do.py

Visualisation

Use snakeviz.

Install pipx install snakeviz.

snakeviz out.prof

Or, use tuna which is slightly less misleading.

Other

Austin can attach to running programs.

Austin.

austin -f -o out.prof python3 do.py

I haven't worked out how to visualise this.

Python History

Python 3.12

more flexible f-strings
simpler type annotations for generic classes class Car[T]
unambiguous type aliases type cars = list[str]
typing check decorator @override
type annotation for **kwargs: Unpack[MyTypedDict]
python -m uuid
python -m sqlite3
debugging values in pdb $_retval and $_exception

Python 3.11

exception groups and except* syntax
asyncio task groups TaskGroup
add context to exceptions using add_note
datetime.UTC alias
date and time support for fromisoformat
StrEnum
enum verification checks verify
operator.call
type hints Self and LiteralString
typing assert_never and reveal_type

Python 3.10

structural pattern matching match
context managers can be defined across multiple lines
union operator for type hints X | Y
type hint guarding TypeGuard
itertools pairwise

Python 3.9

time zone support built-in with zoneinfo module
in-place union |= of dictionary-like objects
combined annotations Annotated[float, "seconds"]
type hint containers directly list[float]
more flexible decorator syntax
string removeprefix and removesuffix
maths gcd and lcm

Python 3.8

assignment expressions if (n := len(a)) > 10:
improved f-strings f"{x*9 + 15=}"
positional and keyword only parameters def f(a, /, b, *, c)
multiprocessing.shared_memory
functools.singledispatchmethod
typing Literal and Final
(xml.etree.ElementTree wildcard stuff — super humorous & annoying timing!)`

Python 3.7:

dataclasses TBD: talk link.
contextvars
breakpoint()
postponed evaluation of type annotations
dicts officially respect insertion-order
time - nanosecond resolution functions
f"{number:,}

Python 3.6:

f-string literals
underscores in numeric literals
extended variable annotations
(async generators & comprehensions)
Local Time Disambiguation
secrets module

Python 3.1 - 3.5

Skipped

Python 3000:

unicode vs bytes
print() vs print
division float vs int
new-style vs classic classes
relative imports (?)
views and iterators vs lists (e.g. dict.items() == dict.iteritems())
nonlocal
extended iterable unpacking
set literals
removed tuple parameter unpacking

Using SQLAlchemy in Python

SQLAlchemy is an absolutely outstanding, best-in-class ORM library.

However, the default setup is not perfect.

Idle in Transaction

The default behaviour in SQLAlchemy is to open a transaction for read queries.

However, read queries (inside a transaction), will block all database migrations.

Considering, that migrations block all subsequent query requests, this can take down an application.

Thus, even small transactions can become a serious problem.

In particular, consider when sessions are tied to web requests—a transaction could be open for several seconds.

Recommendation

When using the usual READ COMMITTED isolation level:

use auto-commit for reads
switch to explicit transactions for writes

Problem description and alternate solution: https://www.gorgias.com/blog/prevent-idle-in-transaction-engineering https://www.oddbird.net/2014/06/14/sqlalchemy-postgres-autocommit/

Weakly Referenced Objects

SQLAlchemy weakly references ORM instances. This means, when you convert them into a core type, they are dropped from the identity map.

The impact is that when it comes to updates, SQLAlchemy will have to inefficiently query the database for the object again.

Recommendation

Use SQLAlchemy's recipe for strong references.

Good to Know

SQLAlchemy has an identity map that caches queried objects
SQLAlchemy using mutation tracking and updates the precise fields that changed

Other tips

Remember to index on ForeignKeys. Postgres does not do this automatically. MySQL always does this.
Remember to set onupdate and ondelete cascades on ForeignKeys.

To prevent spurious joins: (this cannot be used with NULLable outer joins)

session.query(Model).options(
    sa.orm.joinedload("child").joinedload("grandchild"),
    sa.orm.raiseload("*"),
)

Web

Learning: https://hpbn.co/

Google's Style Guide has basic formatting and other guidance.

Minimal HTML

<!DOCTYPE html>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Saving bytes</title>
<link rel="stylesheet" href="styles.css">
<h1>Let's go</h1>
<p>Annddd we're done

Tools

Mozilla Observatory - web security checks (also links to a couple of the below)
SSL Labs - TLS security checker
Hardenize - domain/email/web security checks
SecurityHeaders - security checks on HTTP headers

Web Security

Same-Origin Policy (SOP). Security. (Related CORS - not really security, just DRM.)
- The Same Origin Policy (SOP) applies only to scripts accessing data from another origin.
- CORS can be used to relax restrictions for scripts.
- Websockets do not use SOP, the server must verify the Origin: header.
- SOP does not prevent "Simple Requests" (e.g. HTML forms), but does prevent reading the response. This means to prevent CSRF, the server must verify the Origin: header.
- Older browsers do not send the Origin: header.
Content Security Policies let website owners control which resources they allow themselves [the links on the page] to load.
- HTTP Headers and HTML meta tag intersect.
Feature Policies let website owners control which permissions they allow themselves [the scripts on the page] to use.
Cross-Origin Resource Sharing let website owners allow which resources they allow others to load.
- With exceptions that allow hotlinking for legacy resources images, scripts, css, and audio/video.
- SOP applies to XHR, web fonts, and a couple of other things, and CORS can be used to relax these restrictions.
CSRF - server accepting requests it believe came from the user (exploit server's trust in client)
XSS - inject scripts into the attacked website to bypass SOP (exploit client's trust in server - for XSS originating from server)
- Reflected XSS, Persistent XSS, Self-XSS.
- This game might be helpful: https://xss-game.appspot.com/
Clickjacking
Storage access (cookies, sessionStorage, localStorage, IndexDB)
Remember: many (all?) of these headers are only needed on rendered content (HTML, not images)

Git

Sometimes it feels like git doesn't make sense.

Squash merging is terrible and awful. It destroys histories and trashes attempts at re-using a branch https://stackoverflow.com/a/14343784. It is especially a bad idea on a large branch because git blame is likely far less usable. It is useful to ensure all commits pass tests for git bisect though. git slap everyone who introduced it in GitHub/BitBucket UIs.

There's a neat interactive branching tutorial somewhere...

Tips & Tricks

Because git is huge.

git log --first-parent show only the first parent commit for merge commits.
git blame --ignore-rev 2e0ee159c OR git blame --ignore-revs-file <file> to ignore something like a great reformatting commit.

Security

Sign tags not commits: http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-td2582986.html

https://github.com/NicoHood/gpgit

(Interesting read: https://mikegerwitz.com/papers/git-horror-story.html)

Comparison

Concept	Git	Mercurial	SVN
Branches	Branches	...	...
Stash
Working Area
History

Networking

Private networks:

10.0.0.0/8 (255.0.0.0)
192.168.0.0/16 (255.255.0.0)
172.16.0.0/12 (255.240.0.0)

Connections(?):

https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking/

Bridge (Like a Physical Network Switch)
- One side of bridge gets IP addresses 10.0.0.2+. The bridge gateway would be 10.0.0.1
- The other side is, i.e. the host IP 192.168.1.2+
- NAT Port mapping required
- Anything connected to the bridge can communicate with each other
MACVLAN
- Connect directly to network, get IP address on network
- MAC address is virtualised on the NIC - requires promiscuous mode, usually not allowed on cloud
- High Performance (no NAT/bridge)
IPVLAN *

Crypto

You already know that you should never roll out your own crypto.

Password Hashing (One-way Key Derivation Functions)

The hash may follow a scheme which encodes the KDF, for example $method$params$salt$hash.

2018 state-of-the-art: Argon2 (not standardised yet).
- CPU, Memory, Parallelism-hard
Scrypt
Bcrypt
PBKDF2

Passwords are low-entropy data; you need to slow down attackers with access to vash parallel computation.

Cryptographic Hashing

Designed to be fast while guaranteeing no collisions and no way of reversing the hash.

These are suitable for file integrity, message authentication, inputs for cryptographic signatures.

There is yet another category of hasing functions used for dictionaries/hash tables :)

2FA/TOTP/U2F

Entropy

Entropy is calculated using the password-generation function, not the password!

Time

Use RFC 3339 for dates and times, it is simpler than ISO 8601. However, it does not support durations.

There is a good article that I need to fine. I think the rules were:

Be liberal in what you accept.
Store time as UTC
Store the symbolic time zone (i.e. "Europe/London") if you need to know about daylight savings for further calculations.
Return time in UTC, and let the presentation layer convert it to the user's local time.

Examples: http://www.creativedeletion.com/2015/01/28/falsehoods-programmers-date-time-zones.html

Bash (sh) Scripts

Use shellcheck to lint

Bash Template

Use the following options for safer bash scripts:

set -Eeuo pipefail

Explanation:

set -o errtrace      # Functions, substitutions & sub-shells inherit traps
set -o errexit       # Stops script if any command fail
set -o nounset       # Error when using unset variables
set -o pipefail      # Error when a command in a pipeline fails

The xtrace option is also useful for debugging:

set -x
set -o xtrace        # Print each executed command

XML

xpath

Find tags ignoring namespaces

"//*[local-name()='reservationOriginatorCode']"

Filter nodes based on a sub-node text, then find a sibling element:

<rf5:ResGuests>
  <ResGuest>
    <reservationID>12345</reservationID>
    <ReservationReferences>
      <ReservationReference type="PMSID" legNumber="1" />
    </ReservationReferences>
  </ResGuest>
<rf5:ResGuests>

leg_numbers = set(
    xml.getroottree().xpath(
        # 1. Get ResGuest corresponding to this reservation
        # 2. Get the Leg Number from that ResGuest
        """
        ./rf5:ResGuests/rf5:ResGuest[rf5:reservationID[text()[normalize-space(.)='12345']]]
        //rf5:ReservationReference[@type='PMSID']/@legNumber
        """,
        namespaces=NAMESPACES_MAP,
    )
)

Databases

Clustering organises the data on the disk:
- An example of how this can improve performance: https://bclennox.com/the-postgres-cluster-command
Long Table Names suck:
- They make queries so much longer
- Seriously.

UUID Foreign Keys

I liked the idea.

UUIDs suck:

Visually they take up more space (e.g. result of queries)
Harder to just remember the ID in your head when you need to
Lose automatic (basic) ordering of rows
Copying and pasting is much more difficult because double-click to copy does not include the quotes

SQL

https://pgexercises.com/

http://tatiyants.com/pev/ - Explain Visualiser

Data Description Language (DDL)	Data Manipulation Language (DML)	Data Control Language (DCL)	Transaction Control Language (TCL)
CREATE/ALTER/DROP/TRUNCATE/RENAME	SELECT/INSERT/UPDATE/DELETE	GRANT/REVOKE	BEGIN/COMMIT/ROLLBACK/SAVEPOINT

Query Summary

The key actions are:

SELECT [DISTINCT] ... FROM ...
UNION
[INNER|LEFT|RIGHT|OUTER] JOIN ... ON ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ... [ASC|DESC]
LIMIT ... OFFSET ...

INSERT INTO ... VALUES ...

UPDATE ... SET ... [WHERE ...]

DELETE ...

SELECT can be used with AS to rename a table or a column.

UNION adds the rows of different queries

WHERE determines which rows will be processed by the query and filters rows before the GROUP BY.

HAVING comes in at the end of a query after the GROUP BY and determines which rows will be sent to the client. It allows aggregate functions by filtering group rows created by the GROUP BY.

The difference between WHERE and HAVING.

The order is in fact:

FROM [UNION/JOIN]
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT

ORDER BY clause example: ORDER BY x DESC, y NULLS FIRST.

SQL:2008 The LIMIT clause is OFFSET n ROWS FETCH FIRST m ROWS ONLY.

Operations

We have the arithmetic operators +, -, *, /, and %.

Logical/Comparison operators include:

IS (NOT) NULL
UNIQUE?
=, <, >
<>, !=
NOT, AND, OR
BETWEEN ... AND ...
EXISTS
IN ...
LIKE ...
... ANY, ALL

BETWEEN can be numbers, texts, or dates. In the case of text it seems to use alphabetical ordering (Unicode considerations)?

EXISTS tests whether a subquery returns any values (boolean existence check).

IN can use a tuple of values ('George', 'Clooney'), or a subquery.

LIKE uses two wildcards % (0+ characters), and _ (exactly one character). For example, _r% would mean values with "r" in the 2nd position.

ANY and ALL can be used with an operator against a subquery, e.g. WHERE column = ANY (SELECT ...).

Aggregate Functions

We have aggregate functions that can act on values:

MIN
MAX
COUNT
AVG
SUM

Be aware, that the aggregate selects on the column, and will not select the entire row.

Common Table Expressions (CTEs)

Create composable queries using query-specific views. I call these WITH clauses.

WITH
    my_view AS (
        SELECT x
        FROM y
    )
    -- Remove invalid things
  , my_filter AS (
        SELECT x
        FROM y
        WHERE x IS NOT NULL
    )

SELECT x
FROM my_filter
;

Correlated Subqueries

A query nested inside another query that uses values from the outer query.

The subquery is (usually) evaluated once for each outer query row.

In the SELECT clause:

SELECT
    a
  , (
        SELECT AVG(x2.a)
        FROM x AS x2
        WHERE x2.b = x1.b
    )
FROM x AS x1

In the WHERE clause:

SELECT a
FROM x AS x1
WHERE
    a > (
        SELECT AVG(a)
        FROM x AS x2
        WHERE x2.b = x1.b
    )
;

Window functions

https://thoughtbot.com/blog/postgres-window-functions

Misc.

-- comment
/* block comment */

Joins start off as a cross-product, and predicates ("ON") filter the rows down.

LEFT OUTER JOIN - always return all rows for A, with anything that happens to match in B (ie. B could be null).
RIGHT OUTER JOINs are fairly useless - just do a left join for intuitiveness (possible exception for 3+ tables?).
FULL OUTER JOIN - either side could be null.
INNER JOIN - return only rows with matching records on both sides.

Lateral Joins

A lateral join is like a for-each loop. It can join multiple rows for each original row.

Contrast this with a correlated subquery which can only return one row.

Each lateral join subquery in the FROM clause can reference columns from preceding FROM subqueries. Without the lateral, each subquery is evaluated independently.

Trust in your database

Excellent article on race conditions:

https://ketanbhatt.com/db-concurrency-defects/

CAP

Choose two: consistency, availability or partition tolerence

Alt: PACELC.

ACID

Traditional RDBMS drop availability

Atomicity - transaction is all or nothing
Consistency - database-level rules (e.g. constraints) will always be met
Isolation - differing levels! Essentially nobody else sees your changes until you want them to
Durability - when it says its done you can either nuke the power cable, machine, or data centre

BASE

Basically Available, Soft state, Eventual consistency.

Often chosen by NOSQL databases. Drops consistency.

Durability concerns. Consistency concerns.

Migrations

My talk: http://qasimk.io/talks/2019/database-migrations-2#1

An excellent article: https://benchling.engineering/move-fast-and-migrate-things-how-we-automated-migrations-in-postgres-d60aba0fc3d4

Cliffnotes for Postgres

Adding a NOT NULL field , or one with a dynamic default value (NB: < Postgres 11) causes the entire table and its indexes to be rewritten
- Do multi-step deploy
Adding a new NOT NULL with DEFAULT applies default to all existing rows
- Add default after column is created applies it to new rows only
Removing a NOT NULL field can cause the ORM to read/write a non-existant field
- Migrate to NULL before removing code
Creating an index locks the entire table, preventing writes
- Use CREATE INDEX CONCURRENTLY
Creating constraints (inc. NOT NULL) locks the entire table.
- Use 2-step creation: ... NOT VALID and VALIDATE CONSTRAINT - NOT VALID uses ACCESS EXCLUSIVE, but VALIDATE only uses SHARE UPDATE EXCLUSIVE and a full table scan.
Most ALTER TABLE operations (including ADD COLUMN, DROP COLUMN) require an ACCESS EXCLUSIVE lock which waits for ALL other queries to complete (including SELECT!)
- Set a very low lock_timeout and statement_timeout (< 10 seconds) to allow migration to fail without affecting normal operation.
- Reduce size of transactions/queries; beware of SQLAlchemy's automatic start of transaction even on a first select.
Default ORM queries select all columns to populate the full object, this is a problem because...
DROP COLUMN hides column, and gradually, as writes occur, replaces it with NULLs to free up space.

Versions

v11 - you can add a new column with a default without causing a full table re-write

Automated Checks

Check indexes are created concurrently
Check constraints are created in the background
Check new columns are NULLABLE [[without a default value]]
Check removed columns are NULLABLE
In tests, check queries against a new deprecated_column to ensure they are not being used before removal
Use a renamed_to column-thing to automatically create SQL triggers to populate data bi-directionally
(SQLAlchemy) Check every foreign key is covered by an index

Migrations

☚ Back to Book

HOW-TO: Avoid Dangerous Database Migrations

For PostgreSQL versions 9.5 - 12

Database migrations can seem safe on tables with little read and write activity.
However, in production the exact same migration can be deadly for busy tables.

This is only a HOW-TO guide; a more detailed understanding can be obtained from the references.

<p>
    Consult the table of contents before creating an automated migration.

<details>
    <summary><strong>General Advice...</strong></summary>
    <ol>
        <li>Deployments should do the following things in order:
            <ol>
                <li>Migrate the database successfully
                <li>Then, and only then, deploy the new code
            </ol>
        <li>Application code should be forwards-compatible with the new schema.
        <li>Database schemas should be backwards-compatible with the old code.
        <li>Use a statement timeout for all automated migrations:
            <br>
            <code>
                SET statement_timeout = "5000";<br>
            </code>
        <li>Optionally, additionally, use a lock timeout for all automated migrations:
            <br>
            <code>
                SET lock_timeout = "2000";<br>
            </code>
    </ol>
</details>

Create a NULL Column with a DEFAULT

ALTER TABLE table ADD COLUMN column INT DEFAULT 0;

(Setting a default value after adding the column is different—it won't apply the default to existing rows.)

This is unsafe.

This will cause the entire table to be re-written to disk.

Safe alternative:

ALTER TABLE table ADD COLUMN column INT;
Set the value on existing rows in batches:
UPDATE table SET column = 0 WHERE id >= 0 AND id < 1000;

Since PostgreSQL 11 this is safe.

Create a NOT NULL Column

ALTER TABLE table ADD COLUMN column INT DEFAULT 0 NOT NULL;

(A default value is required.)

This is much better than: NULL ➜ Backfill Default ➜ NOT NULL.

It is literally better to use an incorrect default value.

<div data-pg-versions="9.5 9.6 10">
    <p>
    <details>
        <summary><mark class="warning">This is unsafe.</summary>
        <p>This will cause the entire table to be re-written to disk.
        <p>An <code>ACCESS_EXCLUSIVE</code> lock is acquired during this.
        <p>This blocks <strong>all</strong> queries.
        <p>Alternative: make it NULL first&mdash;still slow because a full table scan is required (faster than a table re-write).
    </details>
    <p>There is no way to do this without downtime.
</div>
<div data-pg-versions="11 12">
    <p>
        <mark class="good">
            Since PostgreSQL 11
            <a href="https://brandur.org/postgres-default">
                this is safe.
            </a>
        </mark>
</div>

Make a NULL Column NOT NULL

ALTER TABLE table ALTER column SET NOT NULL;

<div data-pg-versions="9.5 9.6 10 11">
    <p>
    <details>
        <summary><mark class="warning">This is unsafe.</summary>
        <p>This will cause a full table scan.
        <p>An <code>ACCESS_EXCLUSIVE</code> lock is acquired during this.
        <p>This blocks <strong>all</strong> queries.
    </details>
    <p>The closest alternative is to create table constraint instead.
    <p>The downsides are: 0.5-1% performance hit, and the NOT NULL is fairly invisible.
    <ol>
        <li>
            <code>
                SET statement_timeout = "3000";<br>
                SET lock_timeout = "3000";<br>
                ALTER TABLE table
                ADD CONSTRAINT ck_constraint_name
                CHECK (column IS NOT NULL)
                NOT VALID;
            </code>
        <li>
            <code>
                SET statement_timeout = "0";<br>
                SET lock_timeout = "0";<br>
                VALIDATE CONSTRAINT ck_constraint_name;
            </code>
    </ol>

    <p><small><mark>Upgrading to PostgreSQL 12 allows the suboptimal constraint to be replaced.</mark></small>
</div>
<div data-pg-versions="12">
    <p>
    <details>
        <summary><mark class="warning">This is unsafe.</summary>
        <p>An <code>ACCESS_EXCLUSIVE</code> lock is acquired during this.
        <p>This blocks <strong>all</strong> queries.
        <p>A full table scan is required which takes a <em>long</em> time.
    </details>
    <p>Since PostgreSQL 12 it is possible to use an existing constraint to help.
    <p>Safe alternative:
    <ol>
        <li>
            <code>
                SET statement_timeout = "3000";<br>
                SET lock_timeout = "3000";<br>
                ALTER TABLE table
                ADD CONSTRAINT ck_constraint_name
                CHECK (column IS NOT NULL)
                NOT VALID;
            </code>
        <li>
            <code>
                SET statement_timeout = "0";<br>
                SET lock_timeout = "0";<br>
                VALIDATE CONSTRAINT ck_constraint_name;
            </code>
        <li>
            <code>
                ALTER TABLE table ALTER column SET NOT NULL;
            </code>
        <li>
            <code>
                SET statement_timeout = "3000";<br>
                SET lock_timeout = "3000";<br>
                ALTER TABLE table DROP CONSTRAINT ck_constraint_name;
            </code>
</div>

Create an Index

CREATE INDEX name_idx ON table (column);

CREATE UNIQUE INDEX name_idx ON table (column);

<div data-pg-versions="9.5 9.6 10 11 12">
    <p>
        <details>
            <summary><mark class="warning">This is unsafe.</summary>
            <p>A <code>SHARE</code> lock is acquired for this operation.
            <p>This blocks <strong>writes</strong> on the table.
            <p>A full table scan is required which takes a <em>long</em> time.
        </details>
    <p>A good alternative is to create the index concurrently.
    <p>The downside is that more total work is performed and it takes longer overall.
    <p>There are other caveats in the PostgreSQL documentation, but I feel they are unlikely to be a problem.
    <ol>
        <li>
            <code>
                SET statement_timeout = "0";<br>
                SET lock_timeout = "0";<br>
                CREATE INDEX CONCURRENTLY name_idx ON table (column);
            </code>
        <li>Check the index was created successfully:
            <br>
            <code>
                \d table
            </code>
            <br>
            Invalid indexes are marked with <code>INVALID</code>.
        <li>If it was unsuccessful, try again after dropping the index:
            <br>
            <code>
                DROP INDEX name_idx;
            </code>
    </ol>
</div>

Create a Constraint

NOT VALID, then VALIDATE CONSTRAINT

Not tried.

Create a Foreign Key

Not tried.

Requires lock on foreign key target.

Drop a NOT NULL Column

Problem with ORM reading and writing field.

Safe alternative: make null first, then follow drop a column.

Drop a Column

Problem with ORM reading and writing field(?)

Problem with ACCESS EXCLUSIVE lock (waiting issue)

Rename a Column

...

References

Postgres-specific features

Shell

Use pgcli with pipx install pgcli because it offers syntax highlighting and auto-completion.

Watch

Execute a query then type \watch.

Wide tables

Display tables vertically \x auto.

Time queries

Activate before executing \timing on.

Get cell as file

Use With CSV for CSV files.

\copy (Select * From foo) To 'test.csv' With BINARY;

XMLs, manually remove | from the front and end of the file:

\copy (SELECT xml_field FROM table_name) TO 'output.xml' WITH CSV QUOTE AS '|';

Disk Space

Free up disk space:

vacuumdb --all --full --freeze

List total table sizes (including indexes):

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 20;

Linux Overview/Development

The filesystem hierarchy
The linux DNS lookup
tl;dr how-to guide to commands
- Alt: bropages
- Alt: cheat
- Alt: pet
thefuck corrects your previous console command
How to control the proliferation of dotfiles
(A way-too-big list of tools to make an awesome shell.)
ip cheat sheat (pdf)
A check list of things to know about the command line

Development

direnv - manage environment variables per-project
fish/xiki - alternative shells
tmux - remote window management - cheatsheat
spacer - put spaces between command output

Packaging

If you ever want to package something for Linux, Repology covers where.

Web Tools

crobtab.guru - write cron schedules
explainshell - explain (long) shell commands

Shell

Consider an alias like alias q="ls | grep --color -i".
The history command (history search).
Process management: & fg bg jobs kill disown nohup (CTRL+C/Z).
kill -SIGSTOP and kill -SIGCONT for pause/resume.
pgrep to search for PID by name.
script to record shell session in file (and cat or less -R to view).
From less to edit, press v.
Use type -a not which as the latter is problematic

Alternative Tools

ncdu (du -smh, df -l)
nnn - Terminal File Browser

Makefiles

I advise avoiding makefiles to run scripts.

Template:

MAKEFLAGS += --warn-undefined-variables
MAKEFLAGS += --no-builtin-rules

##     help:   This.
.PHONY: help
.DEFAULT: help
help: Makefile
#       Find all double comments and treat them as docstrings
        @echo "make <command>"
        @sed -n 's/^##//p' $<

##     watch:  Hot-reload web-app server.
.PHONY: watch
watch:
#       Build files in case they changed while we were not watching
        $(MAKE) build
        watchman-make -p '*.py' 'Makefile' 'Dockerfile.web' '.dockerignore' -t build


##
##Run make with VERBOSE=1 for additional output.
$(VERBOSE).SILENT:
# Delete targets on failure
.DELETE_ON_ERROR:

Systemd Files

"The only thing more complicated and confusing than systemd is the planet's ecosystem." - Me

This page is currently useless.

For example, create /etc/systemd/system/my-ssh-monitor.service

[Unit]
description=My SSH monitor
Requires=network-online.target
Wants=sshd.service

[Service]
User=..
Group=..
WorkingDirectory=/home/my-app/
ExecStart=/bin/my-monitor
ExecStop=/bin/kill -HUP $MAINPID
PrivateTmp=true

[Install]
WantedBy=default.target

systemctl daemon-reload to activate the unit file.

Types of Services

Type=simple - default, must not fork, considered to started immediately.

Targets

What do they do they do?

default.target

network-online.target

multi-user.target

Timers

Shells

There difference between a shell, terminal, console and command line.

Fish	Bash
Alt-L : ls
Alt-P : append "\| less"
Alt-V : edit command in editor
Alt-W : what is the current command?
history	history

SSH

Escape sequences

<Enter> ~. - terminate connection (and any multiplexed sessions)
<Enter> ~C - open command console
<Enter> ~? - help message

Managing Keys

The full reference (Arch Wiki).

Common commands:

# Create new secure key with comment (User@Device#Realm)
ssh-keygen -t ed25519 -C Qasim@PC#QasimK

# Change passphrase of existing key
ssh-keygen -f ~/.ssh/id_ed25519 -p

# Add your credentials to remote server to allow you to login
ssh-copy-id -i ~/.ssh/id_ed25519.pub <host>

ssh-copy-id adds your SSH key to the remote user's authorized_keys file.

Port Forwarding (TCP-only)

Requests to a local port are sent to a remote host/port:

The default bind address is localhost.
host can be localhost which means sshserver or a remote host.

Requests to a remote port are sent to a "local" port:

The default bind address is localhost on sshserver. (Changing this requires GatewayPorts in sshd_config.)
The "local" port can be on your device or another host.

To cancel port forwarding:

In a multiplexed session, run ssh -O cancel <port forward command> <host>.
Otherwise enter the control console and type -KL <port forward command> (or -KR or -KD).

Autossh

autossh can restart an SSH command.

Recommendation: use -M0 with ServerAliveInterval and ServerAliveCountMax.

SSHD

Test config with sudo sshd -T before restarting the daemon.

The following settings are ordered starting from the most significant, least invasive and easiest to setup:

Prevent root login
Use SSH keys only
Use a less common port, e.g. 24
Only allow particular groups/users to login (such as wheel, the administrative group)

PermitRootLogin no
PasswordAuthentication no
Port 23
AllowGroups wheel
AllowUsers qasim

Rate-limit attempts: sudo ufw limit OpenSSH (NB: check auto correct port?)
Use ssh-geoip (blacklist IPs rather than whitelist to prevent lockout) (not tested; IPv6?)
Use an SSH bastion.
Use fail2ban (not needed with SSH keys; lockout risk)
Require 2FA: libpam-google-authenticator (longer setup; not tested; has backup codes)

Mosh

Mosh uses SSH for initial authentication.
Requires UDP ports 60000–61000 to be open (you can get away with 60000-60010).

Containers

Programming-level, e.g. Python's Virtualenv
Kernel-level
1. Application encapsulating:, e.g.Docker (ephemeral, separate storage)
2. OS encapsulating, e.g. LXC, LXD, and kind-of chroot (UNIX)
Hardware virtualisation
1. Heavyweight OS (Type 2 hosted hypervisor), e.g. Vagrant (depending on back-end), Virtualbox
2. (Type 1 bare metal) Hypervisor, e.g. KVM, Hyper-V

Docker, LXC, LXD (uses LXC) all use, on Linux, Linux cgroups and namespaces(net, user, pid, ipc, cgroup).

Kubernetes can be used for orchestration.

You also have things like Firejail, Snap/Flatpak, and AppImage.

Containers may conflate:

Security via isolation
Packaging dependencies
Efficient resource utilisation

Developing with Vagrant

Consider LXD as a lightweight, non-portable alternative.

This is useful if you are working on conflicting projects, or want to keep your computer tidy. Keep your setup on your host (text editor and IDE appliations and their config files), and run the project (any executables) within the container. A shared folder can be used to store the repository.

Vagrant is easy if you use the default Virtualbox provider. And, apparently, impossible with vagrant-lxc :/

Plugins (they have been a disaster for me):

vagrant-vbguest to keep VirtualBox's Guest Additions up-to-date on the Vagrant box that started failing
vagrant-notify-forwarder for filesystem event forwarding that is not reliable
vagrant-disksize to easily increase the size of your Virtualbox disk
- config.disksize.size = '20GB'
vagrant-share to be able to share your container with others that I've never used
vagrant-lxc for LXC boxes that don't work

Create a merging custom Vagrantfile in ~/vagrant.d/.

Many applications inside the virtual machine will bind to localhost making them difficult to connect to from the host. We can configure a SOCKS5 proxy with SSH, and use a new profile of Firefox which always connect via that proxy.

config.vm.network "private_network", ip: "172.16.3.2"
config.ssh.extra_args = ["-D", "1632"]

Using the Firefox extension SmartProxy, add the SOCKSv5 Proxy Server (Vagrant; SOCKS5; 127.0.0.1; 1632). Then when browsing to a particular localhost:<port>, click on the toolbar icon and enable "Enable proxy on localhost:<port>".

It is also possible to connect the VM's shared private network directly on 172.16.3.2.

This is easier than forwarding each individual application (which you may not know in advance) with:

config.vm.network "forwarded_port", guest: 8000, host: 8000, host_ip: "127.0.0.1"

Issue with vagrant-vbguest

  if Vagrant.has_plugin?("vagrant-vbguest")
    config.vbguest.auto_update = false
  end

Issue with Kubectl

If you encounter an issue with double port-forwarding (i.e. a port-forward inside the guest and then using Vagrant's port-forward to forward it to your host): https://stackoverflow.com/questions/49940964/windows-host-vagrant-kubectl-port-forward-stuck-inside-vagrant. TODO: I have no idea what that is doing ATM.

# Port Forward from local port 8000 to remote port 80, listening on all addresses so that Vagrant's port forwarding works.

kubectl port-forward --address 0.0.0.0 8000:80

Deploying with Docker

Use tini for your applications to handle signals (article).
Set ENV PYTHONUNBUFFERED 1 to ensure all logs are always forwarded.
Compile the Python files python -m compileall ..
Use multi-stage builds for very large Python projects.

https://blog.realkinetic.com/building-minimal-docker-containers-for-python-applications-37d0272c52f3

https://cloud.google.com/solutions/best-practices-for-building-containers

https://cloud.google.com/solutions/best-practices-for-operating-containers

Best Practises

Use hadolint, a docker file linter
Use tini as the correct init (forwards signals and reaps zombies) (article)

Terminal Pro

NNN

The terminal file manager

XDG-Mime

Default applications.

$ xdg-mime query filetype myfile.md

xdg-mime query default text/plain
text/plain
$ ls /usr/share/applications
... org.gnome.gedit.desktop ...
$ xdg-mime default org.gnome.gedit.desktop text/plain
$ xdg-mime query default text/plain
org.gnome.gedit.desktop

Two good resources

High level overview: https://martin.kleppmann.com/2011/03/07/accounting-for-computer-scientists.html
https://beancount.github.io/docs/the_double_entry_counting_method.html

Audio

https://xiph.org/

Generally don't re-encode the audio with a different codec unless you are using the original lossless.

Quality

Ogg is a container format for Vorbis and Opus codecs.

Vorbis is deprecated. Opus generally beats it.
Opus is low-latency, switches between speech and music, very good and low bitrates.

See comparison: https://opus-codec.org/comparison/

FFMPEG

ffmpeg cheatsheet: https://amiaopensource.github.io/ffmprovisr/

Downcode to x264 1080p (copy audio and subtitles)

ffmpeg -i input.wmv \
       -c:v libx264 -preset veryslow -tune film -crf 22 -vf scale=-2:1080 \
       -c:a copy -c:s copy \
       output.mp4

Transcode audio: -c:a libfdk_aac -b:a 128k

-tune parameter: https://superuser.com/a/564404/229283

Principles

These guiding principles reduce bugs and increase implementation speed of new features or modifications.

Remember the scope and context of the project. Is it a script? Is it huge? Is it one-off? Is it mission-critical?

YAGNI/KISS

You aren't gonna need it/keep it simple stupid tells you when to do something. The other principles tell you how to do something.

SOLID

SOLID applies to object-orientated programming.

Single Responsibility Principle
- Domain-dependent
- Think about who - the role - that will be causing what changes and isolate that code together
- Higher cohesion (object has greater focus)
- Lower coupling (object does not connect to or require knowledge of other object)
Open/Closed Principle
- "An object should be open for extension, but closed for modification"
- This means the design should facilitate new functionality or changes in requirements without modifying existing code
- The object becoming closed means it now has a well-defined interface and may be used by others
- This limits the risk of introducing bugs into already developed & tested code
- In a dynamic language it also means you don't dynamically reach into and modify an object
- Violation smells: switch/if-else statements that can be extended
- Implementation notes: use abstraction and interfaces
- Be aware: Pure inheritance introduces coupling if subclasses depend on implementation details of their parent class.
- When to use: You cannot predict the future so "modifications may be done once, maybe twice, but not thrice before I used OCP"
Liskov Substitution Principle
- "A type may be replaced by its sub-types without causing any harm to the program"
- Otherwise your method must alter its behaviour by investigating the type of object it has been given
- It works slightly differently in dynamic languages.
Interface Segregation Principle
- ???
Dependency Inversion Principle
- ???

Composition over Inheritance

Starting point: https://www.youtube.com/watch?v=3MNVP9-hglc

Inheritance is difficult with Liskov's substitution principle anyway.

Law of Demeter

The principle of least knowledge, in particular for OOD, says to loosely couple the design by having objects only talk to their immediate "friends".

Code Smell: self.friend.internal_variable.dobla()

Better: self.friend.dobla()

Tech Debt

TBD: https://engineering.riotgames.com/news/taxonomy-tech-debt

Design Patterns

Design patterns are boilerplate that is indicative of missing language features. They are useful for communicating with peers.

Anti-patterns

The Lava Layer

The lava layer is the mismash that results from an evolving codebase that lacks singular vision and continuity of development.

Testing anti-patterns

Testing

I am not happy with this page right now.

There are many different names and people use different names to mean different things. The following will likely not confirm to what you think:

Tests can be divided by size:

unit-tests - tiny, no IO, no other components involved (small) - fast
integration tests - test combined units (medium-large) - slow
system tests - test multiple high-level stages (large) - slow

They can be divided by methodology:

functional tests - functional programming, test output based on input (NB: also used to refer to testing features)
acceptance tests - the final end result (black-box) tested at the highest, outer-most level (e.g. using your web app with a web browser)

They can be divided by purpose:

smoke tests - a subset of tests that quickly reveal severe failures
regression tests - tests that catch bugs that come back, or features that stop working

They can be divided by who is doing them:

end-to-end tests - performed by QA involving multiple distinct steps to accomplish a goal
user acceptance tests - performed by non-technical business users or otherwise domain-experts to test whether they accept the software in their heart

Tests accomplish two main goals:

What you've written now is right, including incorporating business requirement directly as a test.
What you've written stays right, including refactoring with confidence (increase development speed; reduce regressions). It leaves a bad impression (on clients) if a bug keeps reappearing, and demoralises developers.

Software Quality Assurance (SQA) looks at the development process itself to reduce the number of issues.

You also have operation acceptance tests, which test things like failover, backups, recoveries, performance, monitoring, and alerting.

Principles

The tests can help guide you towards a nice API, e.g. write the final candidate API with its integration tests and write the lower level details to advance the integration test as needed with unit-tests alongside it. i.e. Write what a thing is supposed to do
Found a bug? Write a test that verifies it, then fix it. (Usually.)
Arrange-Act-Assert (AAA): set up the test, execute your code, check against expected results. I usually do Arrange-Assert-Act-Assert.
- This is also known as Given-When-Then in Behaviour-Driven Development (BDD) which uses the friendly english language to express the requirements.

Automated Code Review

Who? The machine runs it for everyone

What?

Static code analysis
Data-flow analysis?
Code coverage

When?

Git pre-commit hook
Local (manual run)
Code review interface
Continuous Integration (merger) runner

How to Mock

Don't. Ref: https://www.youtube.com/watch?v=Xu5EhKVZdV8

https://www.youtube.com/watch?v=3MNVP9-hglc "The End Of Object Inheritance & The Beginning Of A New Modularity" here?

How to Test: https://www.destroyallsoftware.com/talks/boundaries

Imperative shell = integration tests of the shell which tie together all the dependencies (no logic; no code paths to test) - that's what they're good at.

Functional core = unit-tests of all that messy logic (that's what they're good at)

Test Doubles

https://scribe.rip/m/global-identity-2?redirectUrl=https%3A%2F%2Fblog.pragmatists.com%2Ftest-doubles-fakes-mocks-and-stubs-1a7491dfa3da

Dummy — anything not used
Fakes — working implementation with shortcuts (test for state)
Stubs — answers queries with predefined data
Spies — stubs that record how they were called
Mocks — verify calls against a specification (test for behaviour)

How to choose between these?

There's different schools:

Don't use test doubles for what you own. Only mock/fake external dependencies.

You own your own database, so you can use the real thing for tests.

You don't own a 3rd party API, so you need to use a fake for it (or mock it).

“Mocking introduces assumptions, which introduces risk”. You are assuming the client library is implemented right, you are assuming all boundaries are solid, you are assuming you know how the library actually behaves.

More ideas

https://www.jamesshore.com/v2/projects/nullables/testing-without-mocks

Architecture Patterns

Functional core, imperative shell/stateful surface (Boundaries)
- Soft introduction: The Clean Architecture in Python [Talk Notes]
imperative and procedural vs oo.

APIs

RESTful API, HTTP/Web API, TCP/Socket API - RPCs

REST

I would suggest RESTful APIs for CRUD-based applications, not for DDD-based applications.

A set of patterns and guidelines (a software architectural style). The transport protocol is not defined, but universally REST is used as REST-over-HTTP.

A RESTful API¹ is about resources², where resources are hypermedia. Note that URLs are resources (not verbs/actions). You should spend your time defining the media types used for resource representations, which includes the link relations that the resource may contain.

The resources should not depend on the underlying domain/implementation objects. Create resources that make sense in the context of the API - they can be anything you can think of, for example a process can be a resource, especially if you want to ask questions about the state of the process.

HATEOAS means clients know one entry point (the bookmark) and the media types used for resource representations beforehand. What does the entry point return? In theory this allows you to control your namespace at will (i.e. changing object hierarchies or resource names at will); in practise clients may hardcode URLs and you need backwards-compatibility during the transition to a new URL namespace anyway. It also allows you to specify actions that may dynamically change depending on the resource's state. You may want to use links in headers. In practise, decisions are made when the api integration code is written, not at run-time, as opposed to how we use links in web. HATEOAS requires hypermedia-aware media types such as HTML, Atom and SVG - hyperlinks, which XML and JSON don't define though there certain extensions to these that do. It genuinely puts theory over practise, and as it is uncommon it is actually slightly developer unfriendly.

Version via content-type headers: application/vnd.mycompany.myapp.myresource+json; version=1.0 (or just a completely custom header); or inside the URL (hostname, path or query parameter). Note that, application/json is indeed a media type, but it is incomplete because it would only describe the data format, while (almost always) special processing is required for the content. Roy Fieldings says to version inside the hostname as the next version is a new system.

Pragmatic Advice

Don't use HATEOAS.
Version inside the URL path, /api/v1/.
Use the Accept HTTP header (and perhaps a query parameter) to change the media type of the response, e.g. XML. (Perhaps even an extension /123.json.)
Requests should be URL-encoded by default, JSON-encoded requests should require Content-Type: application/json.
Use HTTP status codes. Consider: 200, 201, 202, 204, 301, 304, 400, 401, 403, 404, 405, 409, 410, 412, 415, 422, 429.
Use the Location header for 201.
Specify a consistent error response on 4xx which includes a code, a human-readable message, and in the case of validation errors, a list of them.
Use explicit Idempotency-Key HTTP headers for POST when necessary. (NB: every write should be idempotent due to network partitions.)
Prevent race-conditions with ETag and If-Match headers.
Always use plural forms for collections even when it doesn't make sense, e.g. /people.
Allow nested objects to be expanded, e.g. with an ?expand[]=friends.name query parameter.
Allow fields to be explicitly specified, e.g. with a ?fields[]=age query parameter.
Handle filtering, sorting and searching on collections via query parameters, e.g. /people?q=john&state=active&sort[]=name,-date. (i.e. not hierarchical sub-paths)
Consider aliasing query common parameters as a sub-resource path.
Avoid envelopes - HTTP already has them. For example, for pagination use links-in-headers, e.g. Link: <https://example.com/v1/people?page=3>; rel="next", <https://example.com/v1/people?page=50>; rel="last" (first, last, next, previous), and a custom HTTP header for the total count like X-Total-Count .
How to paginate exactly?
Use POST/PUT/PATCH appropriately.
Use POST on collections to insert. Use PUT (idempotent) on specific resource path to create/update, and PATCH for partial updates.
PUT/POST/PATCH should return the new resource representation. In the case of HTTP 201, use the Location header.
If rate-limiting, use headers like X-Rate-Limit-Limit , X-Rate-Limit-Remaining , and X-Rate-Limit-Reset (in seconds).
Use HTTP Basic Auth, if possible.
Use HTTP caching, e.g. ETag , and Last-Modified (with the appropriate vary and privacy settings).
Return pretty (indented) responses (gzip covers size concerns).
Return a request ID used for support queries as a HTTP header.
Documentation should show examples of complete request-response cycles, including copy-and-pastable code.
Clients should be informed of deprecation schedules (in the docs) and of updates (through blogs/changelogs/mailing lists).
When all else fails, and you really need an action, PATCH a field on the resource, use a sub-resource path, or, in extreme cases, create a new non-RESTful endpoint.

Alternative: GraphQL (see GitHub's blog post).

TBD: REST without PUT and PATCH. The idea that clients shouldn't even manipulate the resource representation directly, but instead signal intents via new creation or update resources, e.g. an intent to change the resource, ChangeOfName vs PATCH person.name. The former allows for clear auditing/question asking of the process, eventual consistency and

Representational State Transfer (REST):

separates concerns - data storage from user interface - client-server, allowing components to evolve separately.
is stateless - the request must contain all the information and cannot take advantage of stored context on the server, allowing for increased reliability (eases recovery), visibility (eases debugging) and scalability (no persistent state on server to preserve)
has a cache
uniform interface - simplifies development despite inefficiency due to standardised formats
layered (e.g. load-balancers)

Examples

Twilio
Stripe

Safety

Safe: No side-effects (no server state change beyond trivial things like logging); Idempotent: Safely Repeatable.

Pure Read [REST:GET] - Safe

Stateful Reads [GET] - pure functions that need computational resources of server; cursor in databases; logging/analytics/paywalls; rate limiting APIs.

Idempotent Write [PUT/DELETE] - you can safely throw these at the server.

Non-idempotent Write (dangerous) [POST-made idempotent with UUID] - in the event of a network partition you could have a serious problem.

Versioning Pattern

Adapt new response to old response - pipeline, infinite support. See stripe <TBD>

Originally described by Roy Fielding in his dissertation in 2000

The exact meanings of words like "representation" are described in Fielding's REST data elements table

"Agile"

Split stories into vertical slices to deliver fast

Refactoring

Notes from Refactoring: Improving the Design of Existing Code, Second Edition.

"The whole purpose of refactoring is to make us program faster, producing more value with less effort."
Refactoring p56, Martin Fowler

"for each desired change, make the change easy (warning: this may be hard), then make the easy change"
Kent Beck, Twitter

paraphrasing: going 20 miles north, 100 miles east on the motorway and back down is faster than going 100 miles east through the woods. "sometimes you need to say, 'wait, I need to ceck the map and find the quickest route'."

Refactoring p51, Jessica Kerr

Key Ideas

A refactoring is a small change to the structure of the code to make it easier to understand and cheaper to modify without modifying its observable behaviour.
This means, the codebase is never left in a broken state, and you can always stop refactoring even if you haven't finished.
Change-Compile-Test-Commit.
(When you succumbed to the temptation of making a larger multi-step refactoring in one go and the tests fail, simply revert the code and do the smaller steps!)
The rule of three: the first time you just do it, the second time you wince but you do it anyway, and the third time you refactor.
Design Stamina Hypothesis.
Two hats: two mutually exclusive activities of "adding functionality" and "refactoring". The former mostly involves adding new tests and getting them to work, the latter doesn't need new tests to be added (unless you find some were missing).
The distinct types of opportunistic refactoring (following the boy-scout principle):
- Preparatory refactoring - making it easier to add a feature
- Comprehension refactoring - making code easier to understand
- Litter-pickup refactoring - coming across bad code

Don't justify refactoring in terms of "clean code", "good engineering practice", or other moral reasons. Instead, communicate with the core economic benefits: it makes us faster to add features and faster to fix bugs.

Documentation

This: https://www.divio.com/blog/documentation/

Estimating

Hofstadter's Law It always takes longer than you expect, even when you take into account Hofstadter's Law.

You can, to a certain degree, trade time for more accuracy. Propose that to your manager.

How can we increase the accuracy?

Estimate hierarchically, break down tasks until the leaf nodes are 4 hours or less.
Use how long similar past tasks took to estimate new tasks.
Include time for other work: meetings, code review, writing tests, debugging, presenting, etc.
Derisk unpredictable things ahead of time using spikes.

Domain-Driven Development

I need to shove these links somewhere:

https://github.com/pyeventsourcing/eventsourcing

https://www.cosmicpython.com/

https://abdullin.com/post/event-sourcing-specifications/

https://dzone.com/articles/cqrs-is-an-anti-pattern-for-ddd

Sorting

Strings

Avoid using Makefiles as a Script Runner

The purpose of a makefile is to create a directed acyclic graph of dependencies to build files.

it is yet another custom syntax with its own caveats to learn
long shell scripts must be separate files or badly inlined
most targets will be .PHONY defeating the purpose of the tool
makefiles use mtime to detect changes which can be unreliable for when a team develops using git branches
makefiles usually need additional configuration (help, .PHONY, flags, failure)
make may require explicit installation on a developer's system

Instead, just use bash scripts in a folder.

Django ORM

Django has tight integration between relational databases, HTML, and REST APIs. By defining one model, you get a web interface to modify it, a database interface to query or modify it, and HTML interfaces (forms) to query/modify it. This is fantastic for rapid development.

So, if you're in that tightly-coupled ecosystem and legitimately making use of it, then bear those pros in mind.

I want to talk about when you're not using Django's Admin interface, and HTML forms and templates. This is much more likely for any large application.

The pain

Django's ORM follows

I want to talk about Django's ORM and my explain my thoughts on why I think it should be avoided.

It's not SQL

Django goes out of its way to hide SQL, database principles and Python's standard interface to databases. This means less transferable knowledge is learnt, and actual understanding about databases is harder to come by.

SQLAlchemy maps much more directly onto

Joins

Let's do a simple join:

Books.objects.filter(author__name="django")

It's not obvious at all that a join is happening.

Aggregates

Book.objects.annotate(Count('chapters'))

SELECT book.*, count(chapter.id)
FROM book
JOIN chapter using (book_id);

It's not Python's DBAPI

...

It hides the global state

From anywhere you can call the database:

Book.objects.create()

That is actually pretty bad, but further this demonstrates that the database connection, cursor, session, transaction are all hidden away.

with Session(engine) as session:
    session.add(book)
    session.commit()

Further reading: https://docs.sqlalchemy.org/en/14/orm/session_transaction.html?highlight=savepoint#session-level-vs-engine-level-transaction-control

It uses autocommit

By default, Django uses autocommit where every statement is immediately committed. This is against the PEP 249 standard and usually not what you want.

Complex applications often perform complex queries and mutations that must be kept consistent, and thus

To use a transaction:

with transaction.atomic(durable=True):
    obj1.save()
    obj2.save()

SQLAlchemy ORM:

with session.begin():
    session.add(obj1)
    session.add(obj2)

Further reading: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#library-level-but-not-driver-level-autocommit-removed-from-both-core-and-orm

It mixes together too many things

Model fields can define blank which is a client-side validation.
Model fields can define verbose_name which is for displaying to users.

It's not clear when queries are executed

There are a lot of rules on when a query is actually executed. I'll grant they're pretty straightforward, but there are edge cases to ensure multiple queries are not executed by accident.

results = Book.objects.all()
if len(results) > 2:
    print(results)

In comparison, with SQLAlchemy there is an explicit execute:

results = session.execute(select(Book))

Complex queries are nasty

SQLAlchemy has strong support for niche features in comparison to Django.

Transactions are a mess

Though in Django 3.2, they finally added transaction(durable=True), the whole transaction system is opaque.

It's not DDD-compatible

Django forces you to define your models in the root of an "app", while with a good code structure the database models will be hidden away.

Read-Modify-Write Patterns in Python

Read-Modify-Write can cause "lost updates". Solutions, in order of preference:

Use calculated updates
Use optimistic row-level locks (version column)
Use pessimistic row-level locks
Use database locks for higher-level issues
Use external locks
Use serialisable transactions

Calculated updates

UPDATE table SET a = a + 1 WHERE id = 1;

These are ideal because no read occurs and they are atomic.

If there are zero updated columns, then you know someone else beat you to it.

It's possible to do more complicated variants that are safe to other kinds of race conditions:

UPDATE balance
SET balance = balance - 100
WHERE id = 1 AND balance >= 100;

In python...

Use optimistic row-level locks (version column)

These are better in read-heavy situations (which is typically the case).

A variant on relative updates with an explicit version column.

UPDATE book
SET
    name = :name,
    version = version + 1
WHERE id = :id AND version = :current_version

Note: only one version of an entity exists at any time, otherwise other kinds of race conditions are possible!

In Python, SQLAlchemy has good built-in support for this https://docs.sqlalchemy.org/en/14/orm/versioning.html and can raise StaleDataError.

Use pessimistic row-level locks

These are better in write-heavy situations. In general, they should be avoided because they create long-running transactions which interfere with database migrations.

In Python, the pattern for these would be:

def create(session: Session: book_new: BookNew) -> BookOut:
    model = convert_book_new_to_model(book_new)
    result = session.add(model)
    return _convert_one(result)

def get_by_id(session: Session, id_: int) -> BookOut:
    result = session.execute(Book.id=id_)
    return _convert_one(result)

def get_locked_by_id(session: Session, id_: int) -> Tuple[BookOut, BookUpdate]:
    result = session.execute(
        select(Book)
        .filter(Book.id=id_)
        .with_for_update()
        .one_or_none()
    )
    return _convert_one(result)

def save(
    session: Session, old_book_out: BookOut, book_update: BookUpdate
) -> BookOut:
    assert book_out.id == book_update.id
    model = _convert_book_update_to_model(book_update)
    new_book_out = session.add(model)
    publish_book_updated(session, old_book_out, new_book_out)
    return _convert_one(new_book_out)


# Usage
book_out, book_update = get_locked_by_id(session, 1)
book_update.name = "Super Book"
save(session, book_out, book_update)

Avoid locks where they are unnecessary
Impossible to update objects without getting a lock
Impossible to update objects without publishing an event (if desired)
Published events have access to old and new state
Transactions are managed outside these methods

Note that where there are child objects (say of a DDD aggregate), then the row-level lock can just be taken on the root entity, trusting your database-access layer to consistently take locks.

Reading Materials

A collection of further, high-quality reading material:

Git Branching Tutorial - learn git properly
Git Book - learn git thoroughly
Networking - basics of networks from the perspective of your home
Game Programming Patterns - useful for more than just games
Linux Workstation Security - a checklist for security (should adapt for your own situation)

Books:

Refactoring (2nd Edition) by Martin Fowler.. Not even out.

The Great References List

http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html
https://waveform.org.uk/presentations/sql/#/
https://testing.googleblog.com/2010/12/test-sizes.html
http://istqbexamcertification.com/what-is-acceptance-testing/
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
http://www.butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod
https://code.tutsplus.com/tutorials/solid-part-2-the-openclosed-principle--net-36600
http://www.informit.com/articles/article.aspx?p=1566460
https://githubengineering.com/the-github-graphql-api/ #rest #graphql #api
https://capnproto.org/ #api
https://restful-api-design.readthedocs.io/ #rest #api
https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/ #rest #api
https://www.vinaysahni.com/best-practices-for-a-pragmatic-restful-api #rest #api
https://medium.com/@rdsubhas/pitiful-restful-urls-5d576ffccb98 #rest #api
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven #rest #api
http://blog.codepipes.com/testing/software-testing-antipatterns.html #tdd #antipattern
https://www.programmableweb.com/news/rest-api-design-put-type-content-type/2011/11/18 #rest #api
https://www.thoughtworks.com/insights/blog/rest-api-design-resource-modeling #rest #api
[https://github.com/tfredrich/RestApiTutorial.com/blob/master/media/RESTful Best Practices-v2_0.odt](https://github.com/tfredrich/RestApiTutorial.com/blob/master/media/RESTful Best Practices-v2_0.odt) #rest #api
http://mark-kirby.co.uk/2013/creating-a-true-rest-api/ #rest #api
https://sookocheff.com/post/testing/improving-test-coverage-with-exploratory-outcomes/ #tdd
https://www.youtube.com/watch?v=Api6dFMlxAA (presentation on tiling window managers) #wms #twms
https://www.youtube.com/watch?v=S1bJpsS27Qk (configuring VyOS - open source router) #router
https://www.youtube.com/watch?v=wf-BqAjZb8M ("Beyond PEP8" @Raymond Hettinger) #python
https://www.youtube.com/watch?v=9zinZmE3Ogk ("Keynote on Concurrency" @Raymond Hettinger) #python
https://www.youtube.com/watch?v=EiOglTERPEo ("Super Considered Super" @Raymond Hettinger) #python
Once: https://www.youtube.com/watch?v=T-TwcmT6Rcw ("Dataclasses" @Raymond Hettinger) #python
Once: https://www.youtube.com/watch?v=p33CVV29OG8 ("Modern Dictionaries" @Raymond Hettinger) #python
https://software-carpentry.org/lessons/ #learning
https://swcarpentry.github.io/make-novice/ #learning
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying #data #data #data
https://www.obeythetestinggoat.com/book/preface.html #tdd #book
https://www.youtube.com/watch?v=DJtef410XaM ("The Clean Architecture in Python" @Brandon Rhodes) #architecture #python
https://www.youtube.com/watch?v=PpyPa92r44s ("Practical Design Patterns in Docker Networking") #docker #networking
https://www.youtube.com/watch?v=sSm2dRarhPo ("Microservices + Events + Docker = A PerfectTrio") #DDD #microservices #architecture
https://www.youtube.com/watch?v=KVKufdTphKs ("Python's Infamous GIL" @Larry Hastings PyCon 2015) #python

Talk Notes

The Clean Architecture in Python - Brandon Rhodes @ PyOhio 2014
- A soft introduction to domain-driven development, imperative shell/functional core
- Burying I/O into a sub-procedure hides the complexity of the procedure
- Decoupling I/O above your procedure simplifies the implementation, allows for easier unit-testing, better composability
- The latter decouples the I/O, the former
- Comments are information that can be moved into the code itself (self-documenting)
- Two methods of avoiding decoupled I/O: dependency injection and mocking
- Dependency injection (2004 Martin Fowler: make the I/O library or function a parameter) leads to DI frameworks because all the dependencies need to be injected all the way down into the very depths of your code (they could use anything they want). The paramters just need to be passed along.
- The other problem with DI for testing is that you are not testing the real library
- Mock is similar.
- DI/mock make you feel that you are fighting the structure of your program? Code smell.
- A symptom of coupling - testing combinations of parameters
- In DDD the deeper you go into a program, the closer to policies you should get. The shell should be mechanisms.
- Isolated, simple data structures are passed across boundaries.
- "An imperative shell, that wraps and uses, your functional core"
- Procedural code: "output as you run"; functional code: stages producing data, that outputs at the end.
- The biggest advantage of functional programming is that you can see the data between stages. "Show code and hide models and I will be mystified; show models and hide code and it will be obvious"
- By working in stages - linking simple programs together - you can picture the output of each stage. It continually surfaces the result which can be easily checked. Our minds can picture the data easily.
- "The coder can concentrate on one section of the program at a time, without the overall detailed program continually intruding"
- Immutable data structures lead to distributed computing: if you modify memory, you cannot move that code to another machine, but if you just accept data and return data then you can move that to another machine.
- A function that takes an opaque, complex data structure is actually a disguised method.
- Brief Python-specific details at the very end (generators, iterators, context managers).
Microservices + Events + Docker = A Perfect Trio - Chris Richardson @ DockerCon 2016
- Monolithic applications become large, and you lose team autonomy and fast development because of conflicts between teams that contribute to the same codebase.
- So, accelerate development with microservices.
- But, microservices mean a distributed system: communication between services, partial failures, consistency between databases (no distributed transactions), testing is complicated, and DevOps is complex.
- A shared database is not microservices because of tight coupling - conflicts between teams result in slowdowns and loss of team autonomy.
- So, use event-driven architecture to maintain data consistency.
- But you still have a dual-write problem (write to database and write to message broker).
- So, use Event-Sourcing: it resolves data consistency, gives you reliable event publishing, gives you auditing, and eliminates ORM problems.
- But, querying the event store directly can be extremely challenging.
- So, resolve with Command Query Responsibility Segregation (CQRS) by splitting commands on the aggregate which produce events, from materialised views which consume events and provide a query API (backed with type of database)
- Use Docker to simplify development and deployment.
- https://microservices.io

The Cultural History of Programming

Magnetic-Core Memory used as RAM between 1955-1975 is the source of the term "core memory" or "core dumps".

Socat - Create a listening service that forwards to a service inside a network namespace (socat is amazing): /usr/bin/socat UNIX-LISTEN:/run/transmission.sock,fork exec:'ip netns exec NETNS socat STDIO tcp-connect\:127.0.0.1\:8080',nofork
- HTTP Client -> [Unix Socket Server] ---- | Network Namespace Barrier | ----> HTTP Server (localhost listening)
strace -p PID - Look at calls between the process and the kernel
gdb, python-dbg - Look at stack traces live
TIL: https://news.ycombinator.com/item?id=17083879 "My registrar suspended my domain because an abusive user was using a subdomain for phishing"
Make Vim respect XDG - https://tlvince.com/vim-respect-xdg
Write scripts, just flipping write them! https://www.youtube.com/watch?v=Jd8ulMb6_ls "Solve Your Problem with Sloppy Python" @Larry Hastings
A product manager can buy software from their team, but they can also buy information like, "Is X possible?" or "How many points to build Y?" or "How much support would be reduced if we did Z?".
Fira code for this ligatures: https://medium.com/@docodemore/an-alternative-to-operator-mono-font-6e5d040e1c7e
http://learnyouanagda.liamoc.net/pages/introduction.html
camelCase, PascalCase, snake_case, kebab-case, CAPS_CASE.
Sudoers:pccuser ALL=(ALL:ALL) NOPASSWD:/path/to/command "" pcuser = user or %group we are giving permission to ALL= is the host (ALL works unless you are sharing file across hosts) (ALL:ALL) is (user-we-can-command-as:group-we-can-command-as), i.e. sudo -u user -g group. If omitted only root. Can use just (ALL) NOPASSWD: is tags command may be "ALL". "" prevents command parameters (do not use when specifying parameters)

-> User Host = (Runas) Command
- Example (placed after NO-NOPASSWD): %wheel ALL=(root) NOPASSWD:/usr/bin/pacmatic -Syu
microservices.io

Python Ecosystem

stdlib 3.7 dataclasses (https://www.youtube.com/watch?v=T-TwcmT6Rcw "Dataclasses" @Raymond Hettinger)
- previously and still alternatively: attrs lib (has validators, converters and a few other features)
pipenv (https://www.youtube.com/watch?v=GBQAKldqgZs "Pipenv" @Kenneth Reitz)
pytest
typing: (mypy or pyre, MonkeyType) (https://www.youtube.com/watch?v=pMgmKJyWKn8 "Real World" @Carl Meyer)
Instagram 2018 1.5mil lines of Python
linting complete: https://jeffknupp.com/blog/2016/12/09/how-python-linters-will-save-your-large-python-project/
- Create based on project Daedalus.
https://awesome-python.com/

https://github.com/ambv/black - pycodestyle auto

JavaScript Ecosystem

https://prettier.io/ (there was another opinionated one?)
- https://standardjs.com/

Personality

Be likeable - https://news.ycombinator.com/item?id=16883882
Don't be a hero - https://al3x.net/posts/2010/01/09/dont-be-a-hero.html
On how to build an engineering culture - http://firstround.com/review/why-firing-brilliant-assholes-is-required-to-build-a-great-engineering-culture/
How to apologise(!). Source
An expression of regret, i.e. "I'm sorry"
An explanation (but not a justification)
An acknowledgement of responsibility (most important)
A declaration of repentance
An offer of repair
A request for forgiveness (least importance)

Don't say: "IF", "I regret", Don't use passive voice. Say how you're going to make sure this doesn't happen again.