PGCon 2018 Unconference

I attended the PGCon 2018 Developer Unconference, on May 30, 2018, which had, as always, a goodly mixture of discussions of things in progress.

Schema Deployment

I proposed a discussion of schema deployment methods; it fell just short of attracting enough interest to get a room. A few people asked me in passing what I’d had in mind; I’ll point at the GitHub project, Mahout, which captures the “agenda” that I have in the matter.

Mahout is a lightweight tool (it needs only Bash and psql; the most sophisticated shell utility it relies on is tsort) which, being PostgreSQL-specific, could readily be extended to support PG-specific tooling such as Slony.

In the documentation for Mahout, I describe my goals for it, and the things I hope it handles more satisfactorily than some of the other tools out there. I’d be particularly keen on input about use cases that point to changes in functionality.
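To give a flavour of the kind of machinery involved (this is not Mahout’s actual file format, just a sketch of how tsort plus psql can apply dependent schema scripts in a safe order):

# hypothetical dependency list: each line is "prerequisite dependent"
cat > deps.txt <<'EOF'
schema.sql tables.sql
tables.sql indexes.sql
tables.sql functions.sql
EOF

# tsort emits the scripts in dependency order; apply each one, stopping on error
for script in $(tsort deps.txt); do
    psql -v ON_ERROR_STOP=1 -f "$script" || exit 1
done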

JIT Compilation

– Andres Freund
– Biggest benefit comes in aggregate processing, which is where the heavy computational work is
– planning is presently too simplistic
– no caching of compiled JIT code
– code is somewhat slow in some cases
– JIT compilation of COPY would probably be helpful
– COPY cuts out a lot of presently kludgy C code
– Sorts
– hashing for hash aggregates and hash joins, there is already a prototype…
– interesting idea to capture generated code in a table (Jeff)
– either C or LLVM bitcode
– bitcode may be architecture-dependent
– want better EXPLAIN [ANALYZE] output (Teodor; see the example after these notes)
– better code generation for expression evaluation (Andres)
– Presently no Windows support
– Once you have caching, OLTP performance will improve, but we’re certainly not there now
– local cache, initially; eventually a global cache
– LRU cache
– If I generated much the same code last time, can reuse the compiled code
– would move some work from executor to the planner, but this is a pretty deep architectural change for now
– can definitely get degenerate optimization cases; gotta watch out for that
– generated code is way denser than the executor plans, so there are cases of significant improvements in memory usage
– Incremental JIT compilation (do it in background, after starting query, but before execution)
– impact of threading? Worker backends + data marshalling?
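For reference, in PostgreSQL 11 (built with LLVM support) you can already poke at this; something along these lines, with big_table being a made-up name:

$ psql -c "SET jit = on; SET jit_above_cost = 0;
           EXPLAIN (ANALYZE) SELECT sum(x) FROM big_table;"

With ANALYZE, the plan gains a “JIT:” section summarizing how many functions were compiled and how much time went into generation, inlining, optimization, and emission; that is the output the discussion above wanted to see improved.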

Connection pooler – Odyssey

– Multithreaded connection pooler and request router
– Open source release
Yandex/Odyssey @ GitHub
– Multithreaded via worker threads
– each thread arranges authentication and proxying of client-to-server and server-to-client requests
– worker threads share global server connection pools
– SSL/TLS supported
– tracks transaction state
– can emit CANCEL on connections and ROLLBACK of abandoned transactions before returning connection to pool
– Pools defined as pair of DB/User
– Each pool can authenticate separately and have different mode/limit settings
– UUID identifiers for each connection
– Log events and client error responses include the UUID

Monitoring

– Greg Stark
– Splunk with alerts based on log data
– Nice to have things actively exported by PostgreSQL
– Exposing
– Aggregating
– Reporting
– Error reporting
– Log files too chatty, lot of data all together
– Could different data be separated?
– But how about when it needs to be correlated?
– Sensitive data…
– Too much parsing needs to be done
– Loading into the DB makes it structured, but that gets recursive, and can overload the DB
– Metrics
– start with What You Want Measured…
– Rates of many things
– vacuum activity
– WAL generation
– error rates
– index usages
– violation statistics, rates of rollbacks and errors
– Nice to have…
– pg stats with numbers of violations and contentions
– let the stats collector collect a bit more data
– connection usage statistics
– Some tools
– Jaeger – JaegerTracing
– Zipkin – distributed tracing system to find latency issues in microservice architectures
– OpenTracing – vendor-neutral APIs and instrumentation for distributed tracing
– Can there be a “pg_stat_user_tables”-style view indicating bloat information? (n_dead_tup, queried below, gets part of the way there)
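Some of the rates wished for above can already be computed from the statistics views with plain SQL; a rough sketch (the views and columns are standard, the choice of columns is just illustrative):

$ psql <<'SQL'
-- commit vs. rollback counts per database (diff these over time for rates)
SELECT datname, xact_commit, xact_rollback
  FROM pg_stat_database;

-- vacuum activity, plus dead tuples as a crude stand-in for bloat
SELECT relname, n_live_tup, n_dead_tup,
       vacuum_count, autovacuum_count, last_autovacuum
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 10;
SQL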

Query Optimization with Partitioned Tables

– Planned improvements in PG11
– Partition-wise joins
– Partition-wise aggregation
– Partition pruning (a small sketch follows these notes)
– Planning time
– Runtime
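A minimal sketch of what the pruning work buys, using PostgreSQL 11 syntax (table and values invented):

$ psql <<'SQL'
CREATE TABLE measurements (ts timestamptz, val int) PARTITION BY RANGE (ts);
CREATE TABLE measurements_2018_05 PARTITION OF measurements
    FOR VALUES FROM ('2018-05-01') TO ('2018-06-01');
CREATE TABLE measurements_2018_06 PARTITION OF measurements
    FOR VALUES FROM ('2018-06-01') TO ('2018-07-01');

-- Plan-time pruning: only the May partition should show up in the plan.
EXPLAIN SELECT * FROM measurements WHERE ts = '2018-05-30';

-- Run-time pruning: the comparison value is only known at execution time,
-- so the other partition is skipped during execution rather than planning.
EXPLAIN ANALYZE SELECT * FROM measurements
 WHERE ts = (SELECT '2018-05-30'::timestamptz);
SQL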

TDE – Transparent Data Encryption

– Insung Moon
– Buffer level encryption/decryption
– Per table encryption
– Perhaps should be per-column???
– 2-tier encryption key management
– Working with external key management services (KMS)
– WAL encryption
– only doing encryption on parts other than header
– Nice to have it on LOB (large objects API) too, but no easy way…
– Log file data needs to be encrypted before submission to destinations

Concerns

– Encryption of indexes is troublesome
– You lose the usefulness of ordering on disk
– Table added with a per-table/column private key
– What if some data seems to be exposed? Need to generate a new key and rewrite? This would be arbitrarily expensive…
– Changing the master key is easy, as long as the function generating the private symmetric (per-table) keys is symmetric (a conceptual sketch of such key wrapping follows)
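To make the two-tier idea concrete, here is a purely conceptual sketch using openssl (OpenSSL 1.1.1+ for -pbkdf2; this is not what PostgreSQL itself would do): data is encrypted with per-table keys, and only those small key files are wrapped with the master key, so rotating the master key means re-wrapping keys rather than rewriting tables.

# generate a per-table data key and wrap it with the current master key
openssl rand -out table_foo.key 32
openssl enc -aes-256-cbc -pbkdf2 -salt \
        -pass file:master.key -in table_foo.key -out table_foo.key.enc

# rotating the master key: unwrap with the old key, re-wrap with the new one;
# the table data itself never has to be touched
openssl enc -d -aes-256-cbc -pbkdf2 -pass file:master_old.key \
        -in table_foo.key.enc |
openssl enc -aes-256-cbc -pbkdf2 -salt -pass file:master_new.key \
        -out table_foo.key.enc.new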

Threat model

– Translucent Databases
– Peter Wayner
– Order preserving encryption
– Agrawal encryption scheme
– “Order Preserving Encryption for Numeric Data”, by Agrawal, Kiernan, Srikant, and Xu (IBM Almaden), SIGMOD 2004

Spamalicious times

Hmmph. Google sent me a “nastygram” indicating that one of my blog entries had something suggestive of content injection.

I poked around, and it was by no means evident that this was really so. The one suspicious posting was http://linuxdatabases.info/blog/?p=99, which legitimately has some stuff that looks like labels, as it contains a bunch of sample SQL code. I suspect that is what’s being accounted as evil…

But it pointed me at a couple of mostly-irritating things…

  1. I haven’t generated a blog entry since 2013. Well, I’m not actually hugely worried about that.
  2. I reviewed proposed response posts going back to probably about 2013. Wow, oh wow, was that ever spam-filled: literally several thousand attempts to get me to publish various and sundry advertising links. It’s seriously a pain to get rid of them all, as I could only trim out about 150 at a time. And hopefully there weren’t many “real” proposed postings; it’s almost certain I’ll have thrown those away. (Of course, proposed postings about things I said in 2013… how relevant could they still be?)

Nexus 7 on CyanogenMod

At last…

I had been lazy, leaving it all alone.

In February, I figured I was heading off for a chunk of the month on a cruise, and wanted the tablet for multimedia, but without network access; so it wasn’t the time to spend fiddling with configuration and risk mussing things up.

Alas, the OTA upgrade to Jelly Bean did a certain chunk of mussing of its own… It busted SuperUser access, thereby breaking Titanium Backup. No backups had gone through properly since :-(.

So, today seemed like the right time. I wanted backups, and needed root, the latter looking like a fight. Ah well, go for the gusto, and see what we get without it…

I had to upgrade adb to support the latest Android… Got ClockworkMod Recovery in place, and zip files for CM10.1 and Google Apps…
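For my own future reference, the sequence was roughly the following (file names are from memory and will vary by build; “grouper” is the 2012 Nexus 7):

# copy the installation zips over while Android is still booted
adb push cm-10.1-grouper.zip /sdcard/
adb push gapps.zip /sdcard/

# flash ClockworkMod Recovery from the bootloader
adb reboot bootloader
fastboot flash recovery recovery-clockwork-grouper.img

# then boot into recovery and install both zips from /sdcard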

The last backup was from February 16, but happily the files still remained after the fresh CM10.1 installation, so I could recover a good chunk of the apps; in plenty of cases the data was basically network configuration, so apps would update their own data upon startup. Sweet!

Superuser is nicely integrated into CM10, also sweet, no extra installation process.

I’ll need to reconfigure the launcher, due to the shift from ADW (I had a license) to built-in Trebuchet on CM10, but that seems like the “worst” irritation, and one I can well live with.

I’m not sure I can readily identify big differences between stock Android and CM10, but there are nice small creature comforts my CM10 phone has gotten me used to, like a quick “turn on/off WiFi” directly on notification screens.  Small but I like it.

Mailman subscriber lists

As part of “due diligence” for some mailing lists I am involved with (for Slony, see slony-backups), I discovered the need to dump out Mailman mailing list subscribers.

There is a script to do this, written in Python, mentioned on the Mailman wiki, accessible as mailman-subscribers.py

I’d kind of rather have something a bit more version-tracked, so I poked around at GitHub, and found larsks / mailman-subscribers

That was a little out of date; the last code was from a couple of years back, so I forked it, updated it, and suggested that “larsks” pull the changes, which he did quite quickly.

The “kudos” bit is that I noticed a bit of a blemish: the mailing list password had to be passed on the command line, making it visible to anyone with access to /usr/bin/ps on the system. I submitted a feature request, and Lars was kind enough to add a password-in-file option so quickly that by the time I had the prototype of my Slony “subscriber backup” script working, I immediately needed to change it to use the lovely new feature. Nice!

Installing git-annex from Debian unstable

Installing git-annex from unstable

I happen to be a supporter of Joey Hess’ Git Annex Kickstarter project; no big bucks, but it seemed a good thing to help out.

I got the stickers that were my “project reward,” and figured I should start playing with the new results. I’m particularly keen on the planned Android client, but I should make some use of it before that becomes available.

There’s good news, and bad news:

Good news
He has added an assistant to provide interactive help in setting up repositories. It’s included in Debian unstable, in a version released September 24th.
Bad news
I generally prefer using packages from Debian testing, which has a version released July 24th, well before any of this, and thus without any of Joey’s recent enhancements.

Fortunately, drawing in the September (unstable) version isn’t too terribly difficult. My /etc/apt/preferences.d/simple configuration has Pin-Priority values that prefer stable over testing, testing over unstable, and unstable over experimental (where enormous potential for breakage lies!).
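The file looks something like this (the exact priorities are just illustrative; higher values win):

$ cat /etc/apt/preferences.d/simple
Package: *
Pin: release a=stable
Pin-Priority: 900

Package: *
Pin: release a=testing
Pin-Priority: 800

Package: *
Pin: release a=unstable
Pin-Priority: 700

Package: *
Pin: release a=experimental
Pin-Priority: 600

Passing -t unstable to apt-get then raises unstable to target-release priority for that one command, which is why the install below works without editing the pins.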

As a consequence, installing the unstable version is pretty easy, albeit involving an option I had to go looking for:

root@cbbrowne:~# apt-get -t unstable install git-annex
... leads to loading ...
Get:1 http://ftp.us.debian.org/debian/ unstable/main git-annex amd64 3.20120924 [7,411 kB]

And, with a run of % git annex webapp, it’s up and running!

Worth observing: the documentation tree includes the entirety of Joey’s blog documenting his development efforts. Possibly excessive, but it certainly can’t be called inadequate documentation.

Netboot via PXE

Some notes, from 2012-03-13, on getting network booting via PXE working.

To get this to work, you need…

BIOS ROM that supports PXE
True for most modern motherboards and/or NICs
DHCP server
To manage passing out configuration such as IP addresses and the next-server attribute.
TFTP server
With images
???
It looks for images based on most-to-least specific configuration

  • MAC address
  • IP subnet
  • Default

Some things PXE doesn’t support

It was created as a standard in 1999, and hasn’t been updated much since, so there are things that postdate it, and that are thus not supported.

WIFI
Likely to be troublesome anyways, as you surely want some authentication to get onto a WIFI network
IPv6
It wasn’t clear that it yet mattered in 1999…
DNS
It works with IP addresses only

DHCP discussion

  • Go look for the next-server attribute (an illustrative dhcpd fragment follows)
  • Some discussion of sharing subnets across a redundant set of DHCP servers
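For ISC dhcpd, the relevant bits look roughly like this (addresses and boot filename invented):

# /etc/dhcp/dhcpd.conf -- illustrative fragment
subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.100 192.168.1.200;
  next-server 192.168.1.10;        # the TFTP server holding the boot images
  filename "pxelinux.0";           # what the PXE ROM fetches via TFTP
}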

More worth looking at

Inquisitor
OSS hardware testing tool that’s better than memtest
gPXE
OSS bootloader

  • Supports DNS, so it can direct requests potentially anywhere
  • Can transfer data across additional protocols, such as HTTP, HTTPS, SAN (iSCSI, AoE)
  • Can support WIFI
  • Possibly IPv6

Subversion “deprecation”

I was a bit tickled by the characterization I saw today in the new Subversion release announcement, describing the deprecation of version 1.5:

The Subversion 1.5.x line is no longer supported. This doesn't mean
that your 1.5 installation is doomed; if it works well and is all you
need, that's fine. "No longer supported" just means we've stopped
accepting bug reports against 1.5.x versions, and will not make any
more 1.5.x bugfix releases.

They aren’t telling us the world will end for anyone using version 1.5, just that they don’t intend to provide support anymore.

Which seems like a fine thing. Version 1.5 is three years old, and, given that they seem to release about one version per year (1.0 in 2004, 1.7 in 2011), three years of backwards support doesn’t seem dramatically insufficient. Particularly when, once support goes away, you’re not inherently doomed!

PostgreSQL 9.1 now available

Making for some reasonably good news on 9/11, the next version of PostgreSQL, version 9.1, has been released.

Major enhancements include:

Synchronous replication
continuing the enhancements to built-in WAL-based replication
Per-column collations
to support linguistically-correct sorting down to the column level
Unlogged tables
improving performance for the handling of ephemeral data (e.g., caches)
K-Nearest-Neighbor Indexing
indexing on distances for geographical and text-search queries
Serialized Snapshot Isolation
implementing “true serializability”
Writable Common Table Expressions
recursive and similar queries can now update data (a small example follows this list)
Security Enhanced Postgres
Similar to SE-Linux, providing Mandatory Access Controls for higher grade security
Foreign Data Wrappers
attach to other databases and data sources
Extensions
managing deployment of additional database features
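As a small taste of one of these, a writable CTE lets a single statement move rows from one table to another (table names invented):

$ psql <<'SQL'
WITH archived AS (
    DELETE FROM events
     WHERE created < now() - interval '90 days'
     RETURNING *
)
INSERT INTO events_archive SELECT * FROM archived;
SQL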

Many of these continue the trend of enhancing features added in earlier versions (e.g., synchronous replication, KNN, writable CTEs).

Some introduce new kinds of functionality (e.g., SE-Postgres, FDW, Extensions), sowing new seeds that we may expect to flower into further features in future versions.

Work on version 9.2 continues apace; I’m particularly excited about Range Types, which weren’t quite ready for 9.1.

Music Playing

My latest “musical experiment” is with Clementine, which was recently added to Debian.

I should note things that I have used in the past, and some areas of past pain:

XMMS
Which has often been nice enough, but which has grown long in the tooth.
XMMS2
Which takes the desirable step of being a client/server system that allows for a variety of backends. When using it, I have tended to prefer the shell backend.
Amarok
An “all singing, all dancing” option…

  • It uses KDE, which I’m historically not terribly keen on
  • It has libraries that are evidently clever enough to pull music off my iPod Touch as long as it’s plugged into a USB dock
  • It has the “KDE integration” that seems to want widgets integrated into some “KDE-compliant” window manager. I’m running StumpWM, which is decidedly not a KDE thing, so controlling Amarok always seems like a bit of a crapshoot…
  • I have played a bit with the “playlist” functionality; it hasn’t yet agreed with me…

At any rate, I saw Clementine listed as “new in Debian,” so thought I’d take a peek. I’m liking what I see thus far:

  • Onscreen widgets for all the sorts of things that need to be controlled, including
    • Managing music library, so as to add things
  • Like Amarok, it can see my iPod whenever it’s plugged in, and can play that music through the computer
  • It easily grabbed album covers (I’m not sure what service it’s using) for most of my music
  • Onscreen controls seem pretty reasonable, though I kind of wish the volume control were larger, as that’s something one most frequently wants to fiddle with.
  • There’s a cool visualization widget (think “equalizer”)

Seems pretty likable thus far…