PostgreSQL URIs versus Unix Domain Sockets

I recently saw https://mydbanotebook.org/post/cant-connect/ which presents a nice little flowchart for debugging why you might not be able to connect to your PostgreSQL database.

I was recently struggling with setting up database connections in the context of GitLab CI, where my regression test needed to connect to a “sidecar” PostgreSQL instance. (See the repo… https://gitlab.com/cbbrowne/mahout/-/blob/master/.gitlab-ci.yml)

I have been trying to migrate my connection usages to exclusively use URIs (where possible)… The concept is nicely documented in the standard documentation here https://www.postgresql.org/docs/12/libpq-connect.html#LIBPQ-CONNSTRING, where a URI commonly looks like: postgresql://host1:123,host2:456/somedb?target_session_attrs=any&application_name=myapp

In the Docker context where I was doing this, I needed to use Unix Domain Sockets, where the URI will omit the host, doing something like: postgresql://%2Fvar%2Flib%2Fpostgresql/dbname

Something about this would maddeningly lead to refusals to connect. I could instead use a “traditional connection string,” and ultimately gave up on trying to use URIs in this context. The connection string that works, for anyone interested, is "host=postgres user=runner port=5432 dbname=${POSTGRES_DB}" 😉
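
For reference, the competing spellings look like this (a sketch; “mydb” stands in for the real database name, and the socket directory matches the example above):

    # keyword/value form -- the one that ended up working for me in CI
    psql "host=postgres user=runner port=5432 dbname=mydb"
    # URI form over TCP, which ought to be equivalent
    psql "postgresql://runner@postgres:5432/mydb"
    # URI forms over a Unix socket: percent-encode the directory in the
    # host slot, or pass it unencoded as a ?host= query parameter
    psql "postgresql://%2Fvar%2Flib%2Fpostgresql/mydb"
    psql "postgresql:///mydb?host=/var/lib/postgresql"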

fasd – a smarter cd

Once upon a time I used to use https://github.com/wting/autojump as a way for my systems to help me quickly navigate to my favorite Directories Of Interest. Basically, it did (and similar tools also do) the following:

  • cd is instrumented to capture directory locations each time one visits a new directory, and store them in a wee sort of database
  • an alternative “cd” command is introduced that attempts to Do What I Mean. It takes the input, and sees what stored directory best matches, with a bias towards frequently used directories

autojump was written in Python, which is no grand problem; but I did some poking around, and discovered a newer tool, https://github.com/clvv/fasd, which has similar capabilities, perhaps more, and a slightly smaller footprint, being implemented in “POSIX shell,” so it happily runs under common shells such as Bash and (my fave) zsh.

So far, I have just been using the “zz” functionality that picks the seemingly-most-relevant directory. It does a fine job of this.

It is doubtless a good idea to poke some more at this; running “fasd a b c” searches the recorded history for the highest-relevance files matching “a”, “b”, and “c”, and does so fairly successfully. Throwing multiple strings at it pulls up an interesting list:

cbbrowne@karush ~> fasd tmux conf
1 /home/cbbrowne/.tmux.conf
12 /home/cbbrowne/GitConfig/InitFiles/tmux/tmux-home.conf

Without much effort, this locates tmux configuration files; that’s looking pretty attractive…
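
Wiring fasd in is tiny, for anyone inclined to try it (a sketch assuming zsh; the “v” alias is a stock suggestion from the fasd README):

    # in ~/.zshrc -- set up fasd's hooks and stock aliases (a, s, d, f, z, zz)
    eval "$(fasd --init auto)"
    # optional extra: "v conf" opens the best-matching file in vim
    alias v='f -e vim'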

Warring with My Tasks

The local LUG had a talk recently about Task Warrior, which inspired me to give the tool a poke.

My handling of outstanding ToDo items has been excessively fragmentary; I have assortedly used:

  • Index cards to note things, but these are really ephemeral; I tend to turf them quickly, so only the most immediate things got captured here, and they evaporated just as quickly. Index card systems can get clever; I never got far enough into that, as I’m not hipster enough!
  • For a while (and this is getting to be pretty distant in the past) I used todo.txt to capture things to do. Unfortunately, there’s not much of a synchronization mechanism, so I at one point ran the iOS app on my (still around somewhere) iPod Touch, and later on Android phones, with occasional copying onto the Unix desktop. But because coordinating versions amounted to by-hand git patching, this was way less useful than I wanted.
  • For quite some time, I used Org Mode files for my grocery list, syncing between desktop and mobile devices using Syncthing. This was decently viable, actually, allowing me to manage the document on the desktop (with a lovely big keyboard and big screen for mass editing) and on the mobile device (so the list is handy for marking off purchases). Once in a while, I would push completed items into a side file for posterity. I also copy this data to a Git repository (for arguably “more posterity”); that is not as automated as it ought to be, since automating the Git checkins proved more troublesome than expected.

But in November, at the above-mentioned talk, I installed TaskWarrior on my phone and decided to give it a try. FYI, I run it in a Termux session on my phone. There do exist a couple of TaskWarrior Android apps, but I have not yet been able to get them to sync with my taskd server. I am happy enough with the Termux-based CLI access. Perhaps I should set up a web server as an alternative interface? We’ll see…

Overall Configuration

I have the app (apt-get install taskwarrior) installed on a variety of hosts:

  • Work desktop, which syncs against central taskd
  • Chromebook laptop, syncs against central taskd
  • Home server Karush, which hosts taskd and has client syncing against taskd
  • OnePlus 5 (Android phone), where termux hosts the app, syncing against taskd
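
Each client carries the same little sync stanza in its ~/.taskrc (a sketch; the server host, organization name, certificate file names, and the UUID are placeholders for my real ones):

    taskd.server=karush.example.com:53589
    taskd.credentials=MyOrg/cbbrowne/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    taskd.ca=~/.task/ca.cert.pem
    taskd.certificate=~/.task/cbbrowne.cert.pem
    taskd.key=~/.task/cbbrowne.key.pem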

I installed taskd on a server at home. This was a bit of a pain in the neck; setting up users and certificates is fairly fiddly, as is the setup of each client. It took a few tries to get it all working, and I went through a couple of user UUIDs before I was done. It comes with a systemd unit file; I have not thus far gotten that to work, so after a system reboot I have to browse through shell history (boo, hiss!) to find the right command to restart it properly.
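
For the record, the incantation that shell history keeps coughing up is roughly this (a sketch; the data directory is whatever was handed to taskd init, and /var/taskd is just my assumption):

    # run the task server against its data directory; --daemon backgrounds it
    export TASKDDATA=/var/taskd
    taskd server --data $TASKDDATA --daemon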

One interesting thing I noticed: when syncing got “deranged” and I wound up on a new user UUID, I found that, in order to get things unstuck, I had to edit ~/.task/backlog.data. Note that this file contains the UUID of the user that it intends to sync against. (I’m not 100% sure; this may be the “local” idea of the user’s UUID…) Removing the UUID at the top of that file led to the local instance of task generating a new UUID and proceeding.
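
The blunt version of that edit, for posterity (assuming, as in my case, that the UUID sits on the first line; back the file up first):

    # drop the first line of backlog.data so task mints a fresh sync UUID
    cp ~/.task/backlog.data ~/.task/backlog.data.bak
    sed -i '1d' ~/.task/backlog.data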

Usage

I basically started out by tossing in all sorts of tasks as they popped up, without too much intentionality, just making sure that knowledge about upcoming Things To Do got captured. I wasn’t sure what projects or tags to use; the patterns only emerge out of seeing a bunch of tasks that need classifying. I am basically just 3 months into this, so while some patterns have emerged, there are plenty more to come.

  • It turns out that tagging with +home and +work is mighty useful, for the simple reason that it allows en-masse deferral of tasks. At the end of the work day, I find it handy to simply defer items to another day thus:
    task +work status:pending modify wait:tomorrow
    It would probably be valuable to defer things further, so that my list of things to do immediately does not get randomly cluttered.
  • COVID-19 has changed the above a bit; working from home means the separation is rather less obvious
  • I have been adding in lots of periodic tasks as they pop up:
    • Paperwork tasks such as filing copies of pay stubs, bank statements, tax documents, and bills of importance
    • Preparations for annual events
    • Reminders for mailing list moderation
  • Some projects have been emerging, but not quickly or well. It is easier to think about adding tags, and occasionally a tag emerges as being important enough to call it a project.
  • I am (still!) not using dependencies nearly as much as I probably ought to.
  • As “wishful thinking,” I’d like it if I could have grocery items dependent on a “go to grocery store” task, and have the children pop up as current the moment I arrive at the store and mark that first task done. That also means I’d like it if the children were being “masked” as not ready to proceed (ergo not listed) until I get there.
    • In reviewing Tomas’ presentation, I found A Better Way to deal with this, which is to use contexts. If my grocery items all have +metro as the locational tag (my nearby grocery store is called Metro), then I can define the relevant context:
      task context define metro +metro
      then
      task context metro
      More inclusions and exclusions could be done; in any case, it is clearly useful to have some contexts so that, when in a particular place, the set of tasks is restricted to those of relevance. (A couple of follow-on commands appear after this list.)
  • Projects (indicated by adding project:taxes.2019 or project:bugzilla.20721 or project:website) are evidently more useful than I had thought, once I started using the dotted notation for hierarchical subprojects. They had appeared way less useful than tags, but hierarchy changes that. Both are good things (in moderation!) and work well together.
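
The follow-on commands mentioned above are stock TaskWarrior:

    # arriving at the store: narrow everything to the grocery tasks
    task context metro
    # leaving again: drop the filter
    task context none
    # dotted projects match by prefix, so this lists taxes.2019 and friends
    task project:taxes list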

Future Items

  • Another few months of patterns hopefully leads me to get a bit smarter about how I’m using this, particularly with regard to deferring items I can’t do immediately.
  • I need to get the “sorta graphical” Android client working, which will mean fighting with the configuration a bit.
    Update 2020-05-07, I finally found documentation that helped me on this… https://www.swalladge.net/archives/2018/03/17/taskwarrior-taskserver-syncing-tutorial/ had the crucial aspect that I needed to copy a trio of .pem files (certificate data for my user and for my taskd server) into /Internal Storage/Android/data/kvj.taskw/files/one-of-the-subdirectories
  • I find, regrettably, that I don’t very much like the Android client
  • There are some interesting analytical reports such as task burndown to get some longer term patterns out of it. For that to provide value requires more data collection.
  • I imagine I should be automating some task management, such as having things like the following:
    • TaskWarrior should draw a feed of tasks from bug reports. There’s an extension to pull from Github
    • We’re apparently getting into Scrum work; it would be neat to pull Jira tasks into TaskWarrior automatically
  • There’s an Emacs mode; wait, wait, that’s actually comparatively complete, despite being exceedingly brief. It works, and is already useful.
    It probably would be worth extending this to allow operations other than ‘a’ (to add a task) and ‘g’ (to refresh the list), to have a set of interactions one might perform on visible items. The Kubernetes interaction mode for Emacs has some relevant examples.
  • I’m told hooks are interesting, and I certainly grasp the broad concept from the way that Emacs uses hooks really, really a lot…
    At first glance, it seems less interesting than I had expected…
    • One use case is to automatically commit changes to Git; that is broadly interesting, but I think I can live with that taking place pretty periodically rather than continuously. Especially in that I switch clients a lot, so that keeping consistency would require a lot of Git synchronization.
    • Another usage is to rewrite tasks.
      An interesting example was to change notation, so that @projectname specifies the project, which is shorter than project:projectname. As above, this needs to run “everywhere,” which seems less than convenient. (Again, this needs Git repo synchronization, this time for the repo containing the hooks.) A rough sketch of the idea follows this list.
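
Here is the promised sketch of that @projectname hook; the file name is my invention, and the sed one-liner stands in for proper JSON handling:

    #!/bin/sh
    # ~/.task/hooks/on-add.project-shorthand (must be executable)
    # TaskWarrior feeds on-add hooks the new task as one line of JSON on
    # stdin; whatever the hook prints on stdout becomes the stored task.
    read -r task
    # naively pull "@word" out of the description and emit it as project:
    printf '%s\n' "$task" |
      sed -E 's/"description":"([^"]*) ?@([A-Za-z0-9.]+)/"project":"\2","description":"\1/'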

Conclusions

I have been happy enough with my experiences with TaskWarrior, and will be continuing to use it. There are clearly plenty of features I am not using yet, some of which may make sense to poke at further.

A wee jot about Mosh

I have been using Mosh for quite a number of years now; it is a notionally “mobile” shell that nicely supports devices with intermittent connectivity. On occasion, I have used it as an alternative protocol to ssh when using my laptops/tablets/phones to connect to shell sessions.

Its main merits (to me) are that:

  • Sessions can survive even fairly long connectivity outages. The more I use tmux to manage sessions on servers, the less that matters, but it is still a useful convenience particularly with connections from my phone.
  • Rather than replaying every keystroke (or every receipt of a character of a log file /bin/cat’ed to stdout), it maintains the state of the screen, so it can refresh the screen, skipping over long-irrelevant output, which is an extraordinary network performance improvement if one is browsing server logs…

Curiously, every so often (and this is why I thought to blog about this), I still get forwarded notifications that people continue to report on issue #98, which I helped report back in 2012. I was a bit nonplussed this week to notice another update indicating that people continue to use (or at least reference) my circa-2012 workaround for getting Mosh to connect across systems with slightly differing ideas of UTF-8. I suppose I should be proud that my workaround (which is to explicitly pass LANG and LC_ALL values to the mosh client and server programs) continues to be a relevant solution. I have shell scripts lurking around, almost 8 years old, that use this for mosh connections in my local environments. I am, however, a wee bit disappointed that nearly 8 years of further development hasn’t made it unnecessary to tweak these environment aspects.
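
For the record, the workaround is roughly this shape (the locale name is whatever both ends actually have installed; en_US.UTF-8 is just my assumption here):

    # force both ends of the connection to agree on a UTF-8 locale
    LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
      mosh --server="LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 mosh-server" myserver.example.com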

It is a somewhat happy thing that Mosh’s code base is stable enough (and I note it’s included in numerous Linux and BSD distributions, as well as having support in Android apps such as JuiceSSH) that it is, of late, seeing new commits only every few months.

Oh Shell

I have been poking for a while at the Oh shell, presented at BSDCan 2018. It observes that there are a bunch of things about shells that tend to be painful, which has led to a whole bunch of shells coming out that are (exceedingly thin) veneers over other programming languages, which naturally consigns them to no general interest.

Here are a set of problems that Michael Macinnis pokes at:

  • Undefined variables – bash has an option to gripe about these, but it’s not the default
  • variadic functions
  • word splitting versus lists
  • return values 0-255 – somewhat intentional, so that functions look like processes
  • global variables are mostly all that’s available
  • little modularity is possible because everything is in the global environment. This is somewhat like a worse version of Lisp dynamic scope
  • tortured syntax, particularly for variable expansions/rewrites
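
That last point deserves an example; these are stock Bash expansions (my illustrations, not from the talk):

    file=/path/to/archive.tar.gz
    echo "${file##*/}"        # longest match stripped from the front: archive.tar.gz
    echo "${file%%.*}"        # longest match stripped from the back: /path/to/archive
    echo "${file/%.gz/.bz2}"  # suffix rewrite: /path/to/archive.tar.bz2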


He presents a variety of mechanisms to try to solve these problems:

  • same syntax for code and data (conses)
  • richer set of data types (strings, symbols, a number tower, lists, and some more sophisticated bits)
  • first class environment via define/export
  • Kernel-like fexprs – enabled by the first-class environment. See John N. Shutt’s thesis, “$vau: the ultimate abstraction”
  • support dynamic communication patterns – see Squeak (Pike and Cardelli)

The shell is implemented in Go, making it pretty natural to do the “dynamic communication pattern” bit via GoRoutines. It is mostly an implementation of Scheme, with a largely Unix-y syntax. The main places where it deviates towards Scheme (that I have thus far noticed) are:

  • It has a preference for prefix notation for arithmetic rather than infix
  • The “:” character indicates subsumed evaluation of blocks, which is loosely like a (let (()) ()) structure.

I’m not yet sure that it’s enough “cleaner” than other shells to be worth going in on to any special degree. The modularity aspects would prove quite interesting if libraries of code using them were to emerge. The absence of libraries for the existing shells is unfortunate: you can certainly find a zillion extensions for Bash and Zsh, but in the absence of deep modularity, the connections can only be very shallow. You’re just one captured environment variable away from making an extension blow up…

Happy 2020

It sure has been a while since the last time I did up a blog entry…

A thing for 2020 is to do so slightly more frequently, perhaps somewhat systematically. I suppose I’m one of the exceedingly independent “non-herdable cats” of the https://indieweb.org/ movement. I’m not especially following anyone else; just following the loose principles that…

  • I should generate my own content
  • On my own web site
  • Hosted on my own domain

Rather than depending on the vagaries of others’ platforms. If you were depending on Google Plus to publicize your material, oops, it’s gone! And the same is true for other platforms like Facebook or centralized “syndication” systems.

I won’t be getting rich by having someone’s ads on my site, but, again, that’s not a stable source of monies for much the same reasons suggested about material syndication.

The above is all pretty “meta”, and shouldn’t interest people terribly much. What I probably ought to be writing about that might be somewhat interesting would be about things like the following:

  • I have been fooling around with TaskWarrior, a somewhat decentralized ToDo/Task manager, which is allowing me to track all sorts of things I ought to be doing.
    The interesting bit of this is that I’m capturing a whole lot of “things to research”, which tends to point at software I probably ought to consider using, adapting, or, just as likely, ignoring, due to it not being interesting enough.
  • My web site nearby is managed using SGML/DocBook, which is a toolset that is getting increasingly creaky. I’d quite like to switch to another data format that is easier to work with. Some ideas include OrgMode and TeXinfo. I did some poking around to try to find tools to convert DocBook into such; the tools seem to only be suitable for reasonably small documents, and I have 122K lines of SGML, which makes that choke…
  • I have been fooling around with Oh shell (it’s written in Go, and essentially implements Scheme behind the scenes) as a possibly better shell. I’m trying to collect better thoughts as to why that might be a good idea. (I’m not sure Oh is the right shell though)
  • My cfengine 2 configuration management scripts are getting mighty creaky. Initial research focusing on SaltStack and Ansible showed that those sorts of tools are not really suited to the problem I am solving, namely managing configuration (e.g. dotfiles) and the differences needed across environments (e.g. home versus work, servers versus laptop)
  • I’m poking at using tmux more extensively. I started using GNU Screen in the early 20-noughts, and switched to the somewhat simpler-to-manage tmux a few years ago. There are now tools like tmuxinator for managing sophisticated tmux-based environments, and it looks like that could be quite useful.
  • The big “work” thing I have been gradually chipping away at is Kubernetes. I tend to build batch processes, so this functions quite differently from the usual documented use cases.
  • Apparently I should look at some “scrum” tools for task boards; some searching turned up a bunch of tools, and research tasks for them are now queued up in TaskWarrior to get dropped on me at some point…
  • I need to revisit my EmacsConf 2019 notes to see what sorts of things are worth poking at more.

Emacsconf 2019

Run at https://emacsconf.org/2019/, this conference on “all matters Emacs” went quite well; I was very pleased to have noticed it a couple of weeks beforehand.

They had some struggles publishing video; whoever had the idea of pre-recording the lightning talks was really onto something, as that gave them material to present whilst the glitches got dealt with. Hopefully some lessons were gleaned from the struggles so that the organizers do not wind up prematurely aged 🙂

The other fantastic thing was the https://emacsconf.org/2019/pad “Pad” where they collected a stream of comments. In the world of social media, this sort of collection tends to head into wildly awful places. But this particular comment stream was sheer gold, collecting URLs, questions, answers, and viewers’ notes that were somewhat better than the notes I was trying to take.

Actually, in going back and looking, the talk on “Emacs as My Go To Scripting Language” led to a discussion on the Pad of a wide set of generative approaches to building regular expressions (for an overview of ideas, see https://gist.github.com/aterweele/11bdefcac0255baa3a8a71d498236d0d ), which was a thought-provoking addition that wasn’t remotely part of the talk. It’s a nice equivalent to the in-person “hall discussions track” that is often the best part of a conference.

Many thanks to the organizers, I hope they have recovered! 🙂

PGCon 2018 Unconference

I attended the PGCon 2018 Developer Unconference, on May 30, 2018, which had, as always, a goodly mixture of discussions of things in progress.

Schema Deployment

I proposed a discussion of schema deployment methods; it fell just short of attracting enough interest to get a room. A few people asked me in passing about what I’d had in mind; I’ll point at the GitHub project, Mahout, which indicates the agenda I have in the matter.

Mahout is a lightweight tool (it needs just Bash and psql; the most sophisticated shell tool involved is tsort) which, being PostgreSQL-specific, could readily be extended to support PG-specific tooling such as Slony.
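
(The tsort mention is the fun bit: ordering schema scripts by dependency is just topological sort. This is not Mahout’s actual invocation, merely the underlying idea:)

    # each line says "X must be applied before Y"; tsort emits a valid order
    printf '%s\n' "tables functions" "functions views" | tsort
    # => tables
    #    functions
    #    views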

In the documentation for Mahout, I describe my goals for it, and the things I hope it solves more satisfactorily than some of the other tools out there. I’d be particularly keen on getting any input as to useful purposes that point to changes in functionality.

JIT Compilation

– Andres Freund
– Biggest benefit comes in aggregate processing; that is where there is huge computational work
– planning is presently too simplistic
– no caching of compiled JIT code
– code is somewhat slow in some cases
– JIT compilation of COPY would probably be helpful
– COPY cuts out a lot of presently kludgy C code
– Sorts
– hashing for hash aggregates and hash joins, there is already a prototype…
– interesting idea to capture generated code in a table (Jeff)
– either C or LLVM bitcode
– bitcode may be architecture-dependent
– want better EXPLAIN [ANALYZE] output (Teodor)
– better code generation for expression evaluation (Andres)
– Presently no Windows support
– Once you have caching, OLTP performance will improve, but we’re certainly not there now
– local cache, initially; eventually a global cache
– LRU cache
– If I generated much the same code last time, can reuse the compiled code
– would move some work from executor to the planner, but this is a pretty deep architectural change for now
– can definitely get degenerate optimization cases; gotta watch out for that
– generated code is way denser than the executor plans, so there are cases of significant improvements in memory usage
– Incremental JIT compilation (do it in background, after starting query, but before execution)
– impact of threading? Worker backends + data marshalling?
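
As an aside (my own illustration, not from the session): PG11 will already show you the JIT machinery at work in EXPLAIN output if you force the cost thresholds down:

    # force JIT on a toy aggregate so the plan ends with a "JIT:" summary
    psql -c "SET jit = on;
             SET jit_above_cost = 0;
             EXPLAIN (ANALYZE) SELECT sum(i) FROM generate_series(1,1000000) i;"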

Connection pooler – Odyssey

– Multithreaded connection pooler and request router
– Open source release: Yandex/Odyssey @ GitHub
– Multithreaded via worker thread
– each thread arranges authentication and proxying of client-to-server and server-to-client requests
– worker threads share global server connection pools
– SSL/TLS supported
– tracks transaction state
– can emit CANCEL on connections and ROLLBACK of abandoned transactions before returning connection to pool
– Pools defined as pair of DB/User
– Each pool can authenticate separately and have different mode/limit settings
– UUID identifiers for each connection
– Log events and client error responses include the UUID

Monitoring

– Greg Stark
– Splunk with alerts based on log data
– Nice to have things actively exported by PostgreSQL
– Exposing
– Aggregating
– Reporting
– Error reporting
– Log files too chatty, lot of data all together
– Could different data be separated?
– But how about when it needs to be correlated?
– Sensitive data…
– Too much parsing needs to be done
– Loading into DB makes it structured, but recursive problems, and can overload the DB
– Metrics
– start with What You Want Measured…
– Rates of many things
– vacuum activity
– WAL generation
– error rates
– index usages
– violation statistics, rates of rollbacks and errors
– Nice to have…
– pg stats with numbers of violations and contentions
– let the stats collector collect a bit more data
– connection usage statistics
– Some tools
– Jaeger – JaegerTracing
– Zipkin – distributed tracing system to find latency issues in microservice architectures
– OpenTracing – vendor-neutral APIs and instrumentation for distributed tracing
– Can there be a pg_stat_user_tables field indicating bloat information?

Query Optimization with Partitioned Tables

– Planned improvements in PG11
– Partition-wise joins
– Partition-wise aggregation
– Partition pruning
– Planning time
– Runtime

TDE – Transparent Data Encryption

– Inshung Moon
– Buffer level encryption/decryption
– Per table encryption
– Perhaps should be per-column???
– 2-tier encryption key management
– Working with external key management services (KMS)
– WAL encryption
– only doing encryption on parts other than header
– Nice to have it on LOB (large objects API) too, but no easy way…
– Log file data needs to be encrypted before submission to destinations

Concerns

– Encryption of indexes is troublesome
– You lose the usefulness of ordering on disk
– Table added with a per-table/column private key
– What if some data seems to be exposed? Need to generate a new key and rewrite? This would be arbitrarily expensive…
– Changing the master key is easy, as long as the function for generating the private symmetric (per-table) key is symmetric

Threat model

– Translucent Databases
– Peter Wayner
– Order preserving encryption
– Agrawal encryption scheme
– “Order Preserving Encryption for Numeric Data,” by Agrawal, Kiernan, Srikant, and Xu (IBM Almaden); a SIGMOD paper

Spamalicious times

Hmmph. Google sent me a “nastygram” indicating that one of my blog entries had something suggestive of content injection.

I poked around, and it was by no means evident that this was really so. The one suspicious posting was http://linuxdatabases.info/blog/?p=99, which legitimately has some stuff that looks like labels, as it contains a bunch of sample SQL code. I’m suspicious that they’re counting that as being evil…

But it pointed me at a couple of mostly-irritating things…

  1. I haven’t generated a blog entry since 2013. Well, I’m not actually hugely worried about that.
  2. I reviewed proposed comments going back to probably about 2013. Wow, oh wow, was that ever spam-filled. Literally several thousand attempts to get me to publish various and sundry advertising links. It was seriously a pain to get rid of them all, as I could only trim out about 150 at a time. And hopefully there weren’t many “real” proposed comments; it’s almost certain I’ll have thrown those away. (Of course, proposed comments about things I said in 2013… how relevant could they still be?)