lm – list manual pages

I have wanted this for… probably half of my life?

lm (see https://woozle.org/papers/plan9.html) apparently existed in Plan 9 many years ago; it wraps apropos (and is similar to man -k) so that instead of just listing names and sections of manual pages, it sets up the line to have the man section whatever command at the start of the line, so that the gentle user may copy and paste this to a command line, because that’s almost certainly what the gentle user intends to do next.

I reimplemented it as a zsh function, because, well, why not?

(|N/A:default)cbbrowne@cbbrowne2 /tmp> lm ()
  apropos -l "$@" | sed 's/(.) ((.)) * - /man \2 \1 # /'

So, how does this work?

(|N/A:default)cbbrowne@cbbrowne2 /> lm dockerfile
man 1 docker-build # Build an image from a Dockerfile
man 1 docker-builder-build # Build an image from a Dockerfile
man 1 docker-image-build # Build an image from a Dockerfile
man 5 Dockerfile # automate the steps of creating a Docker image
(|N/A:default)cbbrowne@cbbrowne2 />

Awesome, no?

CFEngine Alternatives

I have been using CFEngine 2 (which is substantially different from version 3) for a great many years to manage various aspects of my home system environments, making use of such things as:

  • Copying files, to do simplistic backups where that works
  • Editing files to have particular content such as SSH keys, cron jobs
  • Restarting processes that I want to keep running (syncthing, dropbox, …)
  • Running shell commands on particular hosts
    • To run backups
    • To run cleanup jobs
  • Setting up symlinks to configuration files, so that I have authoritative configuration in a git repository, and then rc files in $HOME or $HOME/.config or such reference them
  • Ensuring ssh keys have appropriately non-revelatory permissions
  • Making sure new servers have my set of favorite directories

I had used cfengine2 to build system management tools with a “PostgreSQL flair”, where the point was to help manage database instances, doing things like:

  • Deploying PostgreSQL binaries and libraries (our custom builds included Slony-I, for instance)
  • Rotating database logs
  • Building out the filesystem environment for database clusters, thus
    • Setting up needed directories for PGDATA
    • Setting up database log directories
    • Setting up symlinks for the latest binaries, alongside the above “deploying” of the binaries

Eventually, others took this over, ultimately replacing CFEngine with newer tools like Puppet and Ansible, so these uses fell out of my hands.

I never made the migration from CFEngine 2 to CFEngine 3; the latter is apparently a fair bit more featureful, but I found myself unhappy with how the authors decided that having decently trackable logging was something they felt should be a proprietary extra-price extension.

Perhaps ten years later, now, I’m finding that builds of cfengine2 are getting sparse in Linux package management systems.

I started looking around at the sorts of systems that are considered to be successors to CFEngine. My encounters with Puppet have left me with no desire to take that on for systems I’m operating for myself; it seems slow-running and tedious. The short list of plausible alternatives I found of most interest were Ansible and Salt Stack. But as I started poking further, I found that none of these actually reflected the ways in which I have been using CFEngine.

Systems like Puppet, Ansible, and Salt Stack are intended for deploying services and applications, along with their configuration. That’s largely not what I’m doing. (Perhaps I should be looking at it more that way, but it certainly hasn’t been…)

It looks like none of these are what I’m needing for my usual use cases. I am doing some replacements with more modern bits of technology, but with only partial migration away from CFEngine2.


The situations where I was having CFEngine launch, and keep running, certain processes are looking, these days, like what systemd does. I am not especially a lover of systemd, but nor am I one of the haters. I am unhappy with the steady scope creep it seems to undergo, but I do like the way that Unit files provide a declarative way of describing services, their semantics, and their relationships.

For the various services that I want operating, I have set up systemd user unit files. This has led to more CFEngine2 configuration, curiously enough:

  • I create Unit files for services in my favorite Git repo that manages my configuration
  • Configuration files for the service reside in that repo, too.
  • I added CFEngine link targets to point $(HOME)/.config/systemd/user/$SERVICE.service to the unit file in my git repo, and, typically, more to point $HOME/.config to the configuration for the service
  • I added CFEngine process rules that check for service processes that should be running, and run /bin/systemctl --user start $SERVICE if they are not running

It means there’s a few more CFEngine rules, but basically of just two sorts:

  • Process rules, to manage the service process (and it’s using systemd tooling, which is pretty “native,” no horrendous hackishness), and
  • Link rules, to link files in the Git repo into the places where they need to be deployed.


A lot of what I now have left in CFEngine is a set of rules for establishing symlinks.

There has been an outgrowth of tools for doing this sort of thing, and, to be more precise, tools for managing “dotfiles”. There is an awesome-dotfiles repository linking to numerous tools that have been established to help with this.

There are two that elicited the most interest from me:

  • dot-templater, a Rust-based tool with a system for customizing which files (and content) are exposed on each system
  • chezmoi, a more sophisticated system that has a "chezmoi” command for interactively attaching dotfiles to one’s configuration repository

Sadly, they are all so much more sophisticated than symlinks that it has, thus far, seemed simpler just to add a few more link entries to my main CFEngine script.

The direction I am thinking of is to take my “hive” of CFEngine link lines, which, in truth, are decently terse and declarative, and write a little shell-based parser that can read and apply that. Actually, there’s several approaches:

  • Read the link rules, and directly apply them
  • Read the link rules, and generate commands for one or another of the “dotfile manager” tools to put the files under management

Cron Jobs

My use of CFEngine has gone through various bits of evolution over time.

  • Originally, I set up shellcommand rules to run interesting processes periodically, so that my crontab would run cfengine some number of times per hour, and the shellcommand rules would invoke the processes.
    This is well and fine, but means that there are two sources of truth as to what is running, namely what is in the crontab, and what is in my cfengine script. Two sources of truth is not particularly fun.
  • As a part of the “Managing Database Servers” thing, years back, I had recognized that the above was not nice, and so wrote up a script that would capture one’s crontab into a file that would be captured in a specific place, complete with history. It would therefore check the current crontab against the previous version, capturing new versions any time there was a change. This is an output-only approach to things, but nevertheless very useful for tracking history of crontab over time.
    I had never applied this at home.
  • I determined that I needed to fix my “two sources of truth” problem, so took measures to Do Better.
    • A first step was to capture, on each host, the current contents of users’ crontabs, and, as a better thing than before, capturing this in a versioned fashion into a Git repository. This provides the history that Managing Database Servers had done, but as it resides, version-controlled within Git, even better.
      pushd ${CRONHOME}
      echo "Saving crontab to ${CRONTABOUTPUT}"
      crontab -l > ${CRONHOME}/${CRONTABOUTPUT}
      git add ${CRONTABOUTPUT}
      git commit -m "Saving crontab for user ${USERNAME} on host ${HOST}" ${CRONTABOUTPUT}
  • The new, still better step was to use editfiles to compute what I wanted to have in my crontabs. This would construct new files, $(CRONTABS)/$(hostname).$(username).wanted
    consisting of everything that my CFEngine script decided ought to be running on this host, for this user. Thus, the CFEngine script represents the Single Point Of Truth as to what is supposed to be in my crontab.
    I ran this, and in the interest of some lack of trust ;-), did not immediately automate application of this as a new crontab.
    • I did a nice manual run across each of my hosts, comparing the dumped crontab output with what is thought wanted, namely $(CRONTABS)/$(hostname).$(username).wanted
    • There were discrepancies (and since it wasn’t automatically applied, no consternation!), so some modifications were done to rectify shortcomings
    • When I concluded that everything matched my desires, it’s apropos to run crontab against $(CRONTABS)/$(hostname).$(username).wanted so that this is automatically applied
    • Now we have a series of single points of truth:
      • The captured-in-git history files document actual states of crontab over time
      • If I want to add or remove jobs, that takes place by modifying the CFEngine code to add/remove editfiles rules.

This is not exactly a “migration away from CFEngine”, but it does make for a way better controlled set of cron jobs.

I am quite sure that I am not sure what would be much better. I have looked into cron alternatives both small and large. At one point, we did a Proof of Concept at work looking at Dollar Universe (now a Computer Associates product), at the really sophisticated end. That would, personally, be ridiculous overkill, but there are places where it’s going to be a good choice.

Cron has a number of weaknesses:

  • Not very easily auditable
  • Not good at handling “flow control” where a system may be getting overloaded by the set of cron jobs getting invoked
  • No in-system awareness of jobs that should be mutually exclusive or that should be ordered. (“Don’t run A and B simultaneously; make sure to only run B after having run A”)

Nevertheless, for small-ish tasks where exact timing isn’t too critical and where conflicts may be addressed by running jobs in separate hours of the day, it isn’t worth looking to a job scheduling system that is way more complex to manage and heavier weight to run.

One would-be alternative to cron that looks somewhat interesting is pg_timetable which has its data store backed by PostgreSQL and which has a notion of “task chains.”

At one point, I did a bit of work creating a “pg_cron,” which had loosely similar requirements. It never reached the point of working; the place where I was pointedly short on answers was on how to establish the working environment for tasks. The environment needs to be “portable” in a number of ways; you’d want to be able to control tasks running on remote hosts, too. David Tilbrook’s QEF environment seemed to have relevance; it had ways of managing the launching of work agents with tight control over the environment they would receive. Unfortunately, time just hasn’t permitted experimenting more deeply with that.

PostgreSQL URIs versus Unix Domain Sockets

I recently saw https://mydbanotebook.org/post/cant-connect/g/post/cant-connect/ which presents a nice little flowchart for debugging why you might not be able to connect to your PostgreSQL database.

I was recently struggling with setting database connections inside the context of Gitlab CI, where my regression test needed to connect to a “sidecar” PostgreSQL instance. (See the repo… https://gitlab.com/cbbrowne/mahout/-/blob/master/.gitlab-ci.yml)

I have been trying to migrate my connection usages to exclusively use URIs (where possible)… The concept is nicely documented in the standard documentation here https://www.postgresql.org/docs/12/libpq-connect.html#LIBPQ-CONNSTRING, where a URI commonly looks like: postgresql://host1:123,host2:456/somedb?target_session_attrs=any&application_name=myapp

In the Docker context where I was doing this, I needed to use Unix Domain Sockets, where the URI will omit the host, doing something like: postgresql://%2Fvar%2Flib%2Fpostgresql/dbname

Something about this would maddeningly lead to refusals to connect. I could instead use a “traditional connection string,” and ultimately gave up on trying to use URIs in this context. My URI is "host=postgres user=runner port=5432 dbname=${POSTGRES_DB}" for anyone interested 😉

fasd – a smarter cd

Once upon a time I used to use https://github.com/wting/autojump as a way for my systems to help me quickly navigate to my favorite Directories Of Interest. Basically, it did (and similar tools also do) the following:

  • cd is instrumented to capture directory locations each time one visits a new directory, and store them in a wee sort of database
  • an alternative “cd” command is introduced that attempts to Do What I Mean. It takes the input, and sees what stored directory best matches, with a bias towards frequently used directories

autojump was written in Python, which is no grand problem; I did some poking around, and discovered a newer tool, https://github.com/clvv/fasd, which has similar capabilities, perhaps more, and has a slightly smaller footprint, being implemented in “POSIX shell,” so it can happily run on common shells such as Bash (and my fave) zsh.

So far, I have just been using the “zz” functionality that picks the seemingly-most-relevant directory. It does a fine job of this.

It is doubtless a good idea to poke some more at this; running “fasd a b c” searches recent directories for highest-relevance files containing “a” “b” and “c”, fairly successfully. Throwing multiple strings pulls up an interesting list:

cbbrowne@karush ~> fasd tmux conf
1 /home/cbbrowne/.tmux.conf
12 /home/cbbrowne/GitConfig/InitFiles/tmux/tmux-home.conf

Without much effort, this locates tmux configuration files; that’s looking pretty attractive…

Warring with My Tasks

The local LUG had a talk recently about Task Warrior, which inspired me to give the tool a poke.

I have had excessively fragmentary handlings of outstanding ToDo items, where I have assortedly used:

  • Index cards to note things, but this is really ephemeral; I tend to turf them quickly, and only the most immediate things would get captured here, and evaporate as quickly. These can get clever; I didn’t get enough into that, I’m not hipster enough for that!
  • For a while (and this has gotten to be pretty distant in the past) I used todo.txt to capture things to do. Unfortunately, there’s not much of a synchronization mechanism, so I at one point ran the iOS app on my (still around somewhere) iPod Touch, later on Android phones, with occasional copying onto Unix desktop. But because coordinating versions amounted to doing by-hand git patching, this was way less useful than wanted.
  • For quite some time, I used Org Mode files for my grocery list, syncing between desktop and mobile devices using SyncThing. This was decently viable, actually, allowing me to manage the document on desktop (with a lovely big keyboard and big screen for mass editing) and on mobile device (so the list is handy to mark off purchases). Once in a while, I would push completed items into a side file for posterity. I also copy this data to a Git repository (for arguably “more posterity”); it is not as automated as it ought to be, trying to automate Git checkins was more troublesome than it ought to be.

But in November, at the above mentioned talk, I installed Task Warrior on my phone and decided to give it a try. FYI, I run it in a Termux session on my phone. There do exist a couple of TaskWarrior apps, but I have not yet been able to get them to sync to my taskd server. I am happy enough with the Termux-based CLI access. Perhaps I should set up a web server as an alternative interface? We’ll see…

Overall Configuration

I have the app (apt-get install taskwarrior) installed on a variety of hosts:

  • Work desktop, which syncs against central taskd
  • Chromebook laptop, syncs against central taskd
  • Home server Karush, which hosts taskd and has client syncing against taskd
  • OnePlus 5 (Android phone), where termux hosts the app, syncing against taskd

I installed taskd on a server at home. This was a bit of a pain in the neck; setting up users and certificates is fairly fiddly as also is setup of each client. It took a few tries to get it all working, and I went through a couple of user UUIDs before I was done. It comes with a systemd unit file; I have not thus far had that work, so I have to browse through history (boo, hiss!) to find the right command to restart it properly upon system reboot it took some effort to get that working properly.

One interesting thing I noticed; when syncing got “deranged” and I wound up on a new user UUID, I found that, in order to get things unstuck, I had to edit ~/.task/backlog.data. Note that this file contains the UUID of the user that it intends to sync against. (I’m not 100% sure; this may be the “local” idea of the UUID of the user…) The removal of the UUID at the top of that file led to the local instance of task generating a new UUID and proceeding.


I basically started out by tossing in all sorts of tasks that popped up, without too much intentionality, just making sure that knowledge about upcoming Things To Do got captured. I wasn’t sure what projects or tags to use; it is out of seeing a bunch of tasks that need to be classified that the patterns will emerge. I am basically just 3 months into this, so while some patterns have emerged, there are plenty more to come.

  • It turns out that tagging with +home and +work is mighty useful, for the simple reason that it allows en-masse deferral of tasks. At the end of the work day, I find it handy to simply defer items to another day thus:
    task +work status:pending wait:tomorrow
    It would probably be valuable to defer things further, so that my list of things to do immediately does not get randomly cluttered.
  • COVID-19 has changed the above a bit; work from home means that the separation is entirely less obvious
  • I have been adding in lots of periodic tasks as they pop up:
    • Paperwork tasks such as filing copies of pay stubs, bank statements, tax documents, and bills of importance
    • Preparations for annual events
    • Reminders for mailing list moderation
  • Some projects have been emerging, but not quickly or well. It is easier to think about adding tags, and occasionally a tag emerges as being important enough to call it a project.
  • I am (still!) not using dependencies nearly much as I probably ought to.
  • As “wishful thinking,” I’d like it if I could have grocery items dependent on a “go to grocery store” task, and have the children pop up as current the moment I arrive at the store and mark that first task done. That also means I’d like it if the children were being “masked” as not ready to proceed (ergo not listed) until I get there.
    • In reviewing Tomas’ presentation, I found A Better Way to deal with this, which is to use contexts. If my grocery items all have +metro as the locational tag (my nearby grocery store is called Metro), then I can define the relevant context:
      task context define metro +metro
      task context metro
      More inclusions and exclusions could be done; in any case, it is clearly useful to use some contexts so when in a particular place, the set of tasks are restricted to those of relevance.
  • Projects (indicated by adding project:taxes.2019 or project:bugzilla.20721 or project:website) are evidently more useful than I had thought, once I start using the dotted notation to allow hierarchical subprojects. They had appeared to be way less useful than labels, but hierarchy changes that. Both are good things (in moderation!) and are good used together.

Future Items

  • Another few months of patterns hopefully leads me to get a bit smarter about how I’m using this, particularly with regards to deferring items I can’t do immediately.
  • I need to get the “sorta graphical” Android client working; need to fight with the configuration to get that working.
    Update 2020-05-07, I finally found documentation that helped me on this… https://www.swalladge.net/archives/2018/03/17/taskwarrior-taskserver-syncing-tutorial/ had the crucial aspect that I needed to copy a trio of .pem files (certificate data for my user and for my taskd server) into /Internal Storage/Android/data/kvj.taskw/files/one-of-the-subdirectories
  • I find, regrettably, that I don’t very much like the Android client
  • There are some interesting analytical reports such as task burndown to get some longer term patterns out of it. For that to provide value requires more data collection.
  • I imagine I should be automating some task management, such as having things like the following:
    • TaskWarrior should draw a feed of tasks from bug reports. There’s an extension to pull from Github
    • We’re apparently getting into Scrum work; it would be neat to pull Jira tasks into TaskWarrior automatically
  • There’s an Emacs mode; wait, wait, that’s actually comparatively complete, despite being exceeding brief. It works, and is already useful.
    It probably would be worth extending this to allow operations other than ‘a’ (to add a task) and ‘g’ (to refresh the list), to have a set of interactions one might perform on visible items. The Kubernetes interaction mode for Emacs has some relevant examples.
  • I’m told hooks are interesting, and certainly grasp the broad concept from the way that Emacs uses hooks really really a lot…
    At first glance, it seems less interesting than I had expected…
    • One use case is to automatically commit changes to Git; that is broadly interesting, but I think I can live with that taking place pretty periodically rather than continuously. Especially in that I switch clients a lot, so that keeping consistency would require a lot of Git synchronization.
    • Another usage is to rewrite tasks.
      An interesting example was to change notations, so @projectname would be used to specify project, which is shorter than project:projectname. As above, this needs to run “everywhere” which seems less than convenient. (Again, needs Git repo synchronization, this time for the repo containing the hooks.)


I have been happy enough with my experiences with TaskWarrior, and will be continuing to use it. There are clearly plenty of features I am not using yet, some of which may make sense to poke at further.

A wee jot about Mosh

I have been using Mosh for quite a number of years now; it is a notionally “mobile” shell that nicely supports devices with intermittent connectivity. On occasion, I have used it as an alternative protocol to ssh when using my laptops/tablets/phones to connect to shell sessions.

Its main merits (to me) are that:

  • Sessions can survive even fairly long connectivity outages. The more I use tmux to manage sessions on servers, the less that matters, but it is still a useful convenience particularly with connections from my phone.
  • Rather than replaying every keystroke (or every receipt of a character of a log file /bin/cat’ed to stdout), it maintains the state of the screen, so it can refresh the screen, skipping over long-irrelevant output, which is an extraordinary network performance improvement if one is browsing server logs…

Curiously, every so often, and this is why I thought to blog about this, I periodically still get forwarded notifications that people continue to report on issue #98 which I helpt report on back in 2012. I was a bit nonplussed this week to notice another update to this that indicates that people are continue to use (or at least reference) my circa-2012 workaround to issues getting Mosh to connect across systems with slightly differing ideas of UTF-8. I suppose I should be proud that my workaround (which is to explicitly pass LANG and LC_ALL values to mosh client and server programs) continues to seem a relevant solution. I have shell scripts lurking around that are almost 8 years old for doing mosh connections in my local environments that use this. I am, however, a wee bit disappointed that nearly 8 years of further development hasn’t made it unnecessary to tweak these environment aspects.

It is a somewhat happy thing that Mosh’s code base is stable enough (and I note it’s included in numerous Linux and BSD distributions, as well as having support in Android apps such as JuiceSSH) that it is, of late, seeing new commits only every few months.

Oh Shell

I have been poking for a while at the Oh Shell, presented at the 2018 BSDCan. It observes that there are a bunch of things about shells that tend to be painful, which has led to a whole bunch of shells coming out that are (exceedingly thin) veils over other programming languages, which then naturally attends them being of no general interest.

Here are a set of problems that Michael Macinnis pokes at:

  • Undefined variables – bash has an option to gripe about such, but it’s no default
  • varadic functions
  • word splitting versus lists
  • return values 0-255 – somewhat intentional, so that functions look like processes
  • global variables are mostly all that’s available
  • little modularity is possible because everything is in the global environment. This is somewhat like a worse version of Lisp dynamic scope
  • tortured syntax, particularly for variable expansions/rewrites

He presents a variety of mechanisms to try to solve these problems:

  • same syntax for code and data (conses)
  • richer set of data types (strings, symbols, number tower, lists, and some more sophisticated bits
  • first class environment via define/export
  • Kernel like Fexprs – enabled by first class environment. See John N Shutt’s thesis, vau: the ultimate abstraction
  • support dynamic communication patterns – see Squeak (Pike and Cardelli)

The shell is implemented in Go, making it pretty natural to do the “dynamic communication pattern” bit via GoRoutines. It is mostly an implementation of Scheme, with a largely Unix-y syntax. The main places where it deviates towards Scheme (that I have thus far noticed) are:

  • It has a preference for prefix notation for arithmetic rather than infix
  • The “:” character indicates subsumed evaluation of blocks, which is loosely like a (let (()) ()) structure.

I’m not yet sure that it’s enough “cleaner” than other shells that it is worth going in on to any special degree. The modularity aspects would prove quite interesting, if libraries of code using them were to emerge. The absence of libraries for the existing shells is unfortunate. You can certainly find a zillion extensions for Bash and Zsh, but in the absence of deep modularity, the connections can only be very shallow. You’re just a captured environment variable away from making that extension blow up…

Happy 2020

It sure has been a while since the last time I did up a blog entry…

A thing for 2020 is to do so slightly more frequently, perhaps somewhat systematically. I suppose I’m one of the exceedingly independent “non-herdable cats” of the https://indieweb.org/ movement. I’m not especially following anyone else; just following the loose principles that…

  • I should generate my own content
  • On my own web site
  • Hosted on my own domain

Rather than depending on the vagaries of others’ platforms. If you’re depending on Google Plus to publicize your material, oops, it’s gone! And the same is true for other platforms like Facebook or centralized “syndication” systems.

I won’t be getting rich by having someone’s ads on my site, but, again, that’s not a stable source of monies for much the same reasons suggested about material syndication.

The above is all pretty “meta”, and shouldn’t interest people terribly much. What I probably ought to be writing about that might be somewhat interesting would be about things like the following:

  • I have been fooling around with TaskWarrior, a somewhat decentralized ToDo/Task manager, which is allowing me to track all sorts of things I ought to be doing.
    The interesting bit of this is that I’m capturing a whole lot of “things to research”, which tends to point at software I probably ought to consider using, adapting, or, just as likely, ignoring, due to it not being interesting enough.
  • My web site nearby is managed using SGML/DocBook, which is a toolset that is getting increasingly creaky. I’d quite like to switch to another data format that is easier to work with. Some ideas include OrgMode and TeXinfo. I did some poking around to try to find tools to convert DocBook into such; the tools seem to only be suitable for reasonably small documents, and I have 122K lines of SGML, which makes that choke…
  • I have been fooling around with Oh shell (it’s written in Go, and essentially implements Scheme behind the scenes) as a possibly better shell. I’m trying to collect better thoughts as to why that might be a good idea. (I’m not sure Oh is the right shell though)
  • My cfengine 2 configuration management scripts are getting mighty creaky. Initial research focusing on SaltStack and Ansible showed off that those sorts of tools are totally not suitable to the problem I am solving, which is that of managing configuration (e.g. – dotfiles) and the differences needed in differing environments (e.g. – home versus work, servers versus laptop)
  • I’m poking at using Tmux more extensively. I started using GNU Screen in the early 20-noughts, and switched to somewhat simpler to manage tmux a few years ago. There are now tools like tmuxinator for managing sophisticated tmux-based environments, and it looks like that could be quite useful.
  • The big “work” thing I have been gradually chipping away at is Kubernetes. I tend to build batch processes, so this functions quite differently than usual documented use cases.
  • Apparently I should look at some “scrum” tools for task boards; some searching found a bunch of tools where research tasks are queued up in TaskWarrior to get dropped on me at some point…
  • I need to revisit my EmacsConf 2019 notes to see what sorts of things are worth poking at more.

Emacsconf 2019

Operated at https://emacsconf.org/2019/, this conference on “all matters Emacs” went quite well; I was very pleased to have noticed it a couple of weeks beforehand.

They had some struggles publishing video; whomever it was that had the idea of pre-recording the lightning talks was really onto something, as that gave them material to present whilst getting the glitches dealt with. Hopefully some lessons were gleaned from the struggles so that the organizers do not wind up prematurely aged 🙂

The other thing that was a fantastic thing was the https://emacsconf.org/2019/pad “Pad” where they collected a stream of comments. In the world of social media, this sort of collection seems to head into wildly awful places. But this particular comment stream was sheer gold, collecting sets of URLs and viewers’ notes that were somewhat better than the notes I was trying to take, and which collected URLs, questions, and answers.

Actually, in going back and looking, the talk on “Emacs as My Go To Scripting Language” led to adding a discussion on the Pad on a wide set of generative approaches to building regular expressions (for an overview of ideas, see https://gist.github.com/aterweele/11bdefcac0255baa3a8a71d498236d0d ) which was a thought-provoking addition that wasn’t remotely part of the talk. It’s a nice sort of equivalent to the in-person “Hall Discussions Track” that is often the best part of a conference.

Many thanks to the organizers, I hope they have recovered! 🙂

PGCon 2018 Unconference

I attended the PGCon 2018 Developer Unconference, on May 30, 2018, which had, as always, a goodly mixture of discussions of things in progress.

Schema Deployment

I proposed a discussion of schema deployment methods; it fell just short of attracting enough interest to get a room. A few people asked me in passing about what I’d had in mind; I’ll point at the Github project, Mahout, which indicates the set of “agenda” that I have in the matter.

Mahout is a lightweight tool (e.g. – needs Bash, psql, and the most sophisticated shell thing needed is tsort), which, being PostgreSQL-specific, could readily be extended to support PG-specific tooling such as Slony.

In the documentation for Mahout, I describe my goals for it, and the things I hope it solves more satisfactorily than some of the other tools out there. I’d be particularly keen on getting any input as to useful purposes that point to changes in functionality.

JIT Compilation

– Andres Freund
– Biggest benefit comes in aggregate processing, that is where there is huge computational work
– planning is presently too simplistic
– no cacheing of compiled JIT code
– code is somewhat slow in some cases
– JIT compilation of COPY would probably be helpful
– COPY, cuts out a lot of presently kludgy C code
– Sorts
– hashing for hash aggregates and hash joins, there is already a prototype…
– interesting idea to capture generated code in a table (Jeff)
– either C or LLVM bitcode
– bitcode may be architecture-dependent
– want better EXPLAIN [ANALYZE] output (Teodor)
– better code generation for expression evaluation (Andres)
– Presently no Windows support
– Once you have cacheing, OLTP performance will improve, but we’re certainly not there now
– local cache, initially; eventually a global cache
– LRU cache
– If I generated much the same code last time, can reuse the compiled code
– would move some work from executor to the planner, but this is a pretty deep architectural change for now
– can definitely get degenerate optimization cases; gotta watch out for that
– generated code is way denser than the executor plans, so there are cases of significant improvements in memory usage
– Incremental JIT compilation (do it in background, after starting query, but before execution)
– impact of threading? Worker backends + data marshalling?

Connection pooler – Odyssey

– Multithreaded connection pooler and request router
– Open source release
Yandex/Odyssey @ GitHub
– Multithreaded via worker thread
– each thread arranges authentication and proxying of client-to-server and server-to-client requests
– worker threads share global server connection pools
– SSL/TLS supported
– tracks transaction state
– can emit CANCEL on connections and ROLLBACK of abandoned transactions before returning connection to pool
– Pools defined as pair of DB/User
– Each pool can authenticate separately and have different mode/limit settings
– UUID identifiers for each connection
– Log events and client error responses include the UUID


– Greg Stark
– Splunk with alerts based on log data
– Nice to have things actively exported by PostgreSQL
– Exposing
– Aggregating
– Reporting
– Error reporting
– Log files too chatty, lot of data all together
– Could different data be separated?
– But how about when it needs to be correlated?
– Sensitive data…
– Too much parsing needs to be done
– Loading into DB makes it structured, but recursive problems, and can overload the DB
– Metrics
– start with What You Want Measured…
– Rates of many things
– vacuum activity
– WAL generation
– error rates
– index usages
– violation statistics, rates of rollbacks and errors
– Nice to have…
– pg stats with numbers of violations and contentions
– let the stats collector collect a bit more data
– connection usage statistics
– Some tools
– Jaeger – JaegerTracing
Zipkin – distributed tracing system to find latency issues in microservice architectures
Opentracing – vender neutral APIs and instrumentation for distributed tracing
– Can there be a “pg stat user table” indicating bloat information?

Query Optimization with Partitioned Tables

– Planned improvements in PG11
– Partition wise pairs
– Partition wise aggregation
– Partition pruning
– Planning time
– Runtime

TDE – Transparent Data Encryption

– Inshung Moon
– Buffer level encryption/decryption
– Per table encryption
– Perhaps should be per-column???
– 2-tier encryption key management
– Working with external key management services (KMS)
– WAL encryption
– only doing encryption on parts other than header
– Nice to have it on LOB (large objects API) too, but no easy way…
– Log file data needs to be encrypted before submission to destinations


– Encryption of indexes is troublesome
– You lose the usefulness of ordering of disk
– Table added with a per-table/column private key
– What if some data seems to be exposed? Need to generate new key
and rewrite? This would be arbitrarily expensive…
– Changing master key is easy, as long as the function for
generating the private symmetric (per-table key) is symmetric

Threat model

– Translucent Databases
– Peter Wayner
– Order preserving encryption
– Agrawal encryption scheme
Order Preserving Encryption for Numeric Data, by Agrawal, Kernan, Srikant, Xu
IBM Almaden
SIGMOD paper