Oh Shell

I have been poking for a while at the Oh shell, presented at BSDCan 2018. The talk observes that a bunch of things about shells tend to be painful, which has led to a whole bunch of shells coming out that are (exceedingly thin) veils over other programming languages, and which naturally wind up being of no general interest.

Here is the set of problems that Michael MacInnis pokes at:

  • Undefined variables – bash has an option to gripe about such, but it is not the default
  • variadic functions
  • word splitting versus lists
  • return values 0-255 – somewhat intentional, so that functions look like processes
  • global variables are mostly all that’s available
  • little modularity is possible because everything is in the global environment. This is somewhat like a worse version of Lisp dynamic scope
  • tortured syntax, particularly for variable expansions/rewrites


He presents a variety of mechanisms to try to solve these problems:

  • same syntax for code and data (conses)
  • richer set of data types (strings, symbols, number tower, lists, and some more sophisticated bits)
  • first class environment via define/export
  • Kernel-like fexprs – enabled by first class environment. See John N. Shutt’s thesis, vau: the ultimate abstraction
  • support dynamic communication patterns – see Squeak (Pike and Cardelli)

The shell is implemented in Go, making it pretty natural to do the “dynamic communication pattern” bit via goroutines. It is mostly an implementation of Scheme, with a largely Unix-y syntax. The main places where the syntax leans towards Scheme rather than shell conventions (that I have thus far noticed) are:

  • It has a preference for prefix notation for arithmetic rather than infix
  • The “:” character indicates subsumed evaluation of blocks, which is loosely like a (let (()) ()) structure.

I’m not yet sure that it’s enough “cleaner” than other shells that it is worth going in on to any special degree. The modularity aspects would prove quite interesting, if libraries of code using them were to emerge. The absence of libraries for the existing shells is unfortunate. You can certainly find a zillion extensions for Bash and Zsh, but in the absence of deep modularity, the connections can only be very shallow. You’re just a captured environment variable away from making that extension blow up…

Happy 2020

It sure has been a while since the last time I did up a blog entry…

A thing for 2020 is to do so slightly more frequently, perhaps somewhat systematically. I suppose I’m one of the exceedingly independent “non-herdable cats” of the https://indieweb.org/ movement. I’m not especially following anyone else; just following the loose principles that…

  • I should generate my own content
  • On my own web site
  • Hosted on my own domain

Rather than depending on the vagaries of others’ platforms. If you’re depending on Google Plus to publicize your material, oops, it’s gone! And the same is true for other platforms like Facebook or centralized “syndication” systems.

I won’t be getting rich by having someone’s ads on my site, but, again, that’s not a stable source of monies for much the same reasons suggested about material syndication.

The above is all pretty “meta”, and shouldn’t interest people terribly much. What I probably ought to be writing about that might be somewhat interesting would be about things like the following:

  • I have been fooling around with TaskWarrior, a somewhat decentralized ToDo/Task manager, which is allowing me to track all sorts of things I ought to be doing.
    The interesting bit of this is that I’m capturing a whole lot of “things to research”, which tends to point at software I probably ought to consider using, adapting, or, just as likely, ignoring, due to it not being interesting enough.
  • My web site nearby is managed using SGML/DocBook, which is a toolset that is getting increasingly creaky. I’d quite like to switch to another data format that is easier to work with. Some ideas include OrgMode and TeXinfo. I did some poking around to try to find tools to convert DocBook into such; the tools seem to only be suitable for reasonably small documents, and I have 122K lines of SGML, which makes that choke…
  • I have been fooling around with Oh shell (it’s written in Go, and essentially implements Scheme behind the scenes) as a possibly better shell. I’m trying to collect better thoughts as to why that might be a good idea. (I’m not sure Oh is the right shell though)
  • My cfengine 2 configuration management scripts are getting mighty creaky. Initial research focusing on SaltStack and Ansible showed that those sorts of tools are totally not suitable to the problem I am solving, which is that of managing configuration (e.g. – dotfiles) and the differences needed in differing environments (e.g. – home versus work, servers versus laptop)
  • I’m poking at using tmux more extensively. I started using GNU Screen in the early 20-noughts, and switched to the somewhat simpler-to-manage tmux a few years ago. There are now tools like tmuxinator for managing sophisticated tmux-based environments, and it looks like that could be quite useful.
  • The big “work” thing I have been gradually chipping away at is Kubernetes. I tend to build batch processes, so this functions quite differently than usual documented use cases.
  • Apparently I should look at some “scrum” tools for task boards; some searching found a bunch of tools, and research tasks for them are now queued up in TaskWarrior to get dropped on me at some point…
  • I need to revisit my EmacsConf 2019 notes to see what sorts of things are worth poking at more.

EmacsConf 2019

Operated at https://emacsconf.org/2019/, this conference on “all matters Emacs” went quite well; I was very pleased to have noticed it a couple of weeks beforehand.

They had some struggles publishing video; whoever it was that had the idea of pre-recording the lightning talks was really onto something, as that gave them material to present whilst getting the glitches dealt with. Hopefully some lessons were gleaned from the struggles so that the organizers do not wind up prematurely aged 🙂

The other fantastic thing was the “Pad” (https://emacsconf.org/2019/pad), where they collected a stream of comments. In the world of social media, this sort of collection seems to head into wildly awful places. But this particular comment stream was sheer gold, collecting URLs, questions, answers, and viewers’ notes that were somewhat better than the notes I was trying to take.

Actually, in going back and looking, the talk on “Emacs as My Go To Scripting Language” led to adding a discussion on the Pad on a wide set of generative approaches to building regular expressions (for an overview of ideas, see https://gist.github.com/aterweele/11bdefcac0255baa3a8a71d498236d0d ) which was a thought-provoking addition that wasn’t remotely part of the talk. It’s a nice sort of equivalent to the in-person “Hall Discussions Track” that is often the best part of a conference.

Many thanks to the organizers; I hope they have recovered! 🙂

PGCon 2018 Unconference

I attended the PGCon 2018 Developer Unconference, on May 30, 2018, which had, as always, a goodly mixture of discussions of things in progress.

Schema Deployment

I proposed a discussion of schema deployment methods; it fell just short of attracting enough interest to get a room. A few people asked me in passing about what I’d had in mind; I’ll point at the GitHub project, Mahout, which indicates the set of “agenda” that I have in the matter.

Mahout is a lightweight tool (e.g. – needs Bash, psql, and the most sophisticated shell thing needed is tsort), which, being PostgreSQL-specific, could readily be extended to support PG-specific tooling such as Slony.
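
To illustrate the role that tsort plays in a tool of this sort, here is a minimal sketch of ordering schema changes by their dependencies. It is written in Python rather than shell so that it is self-contained; the migration names and the `requires` mapping are hypothetical, and this is not Mahout's actual control file format.

```python
from graphlib import TopologicalSorter

# Hypothetical migrations: each name maps to the set of migrations it requires.
requires = {
    "base_schema": set(),
    "add_customers": {"base_schema"},
    "add_orders": {"base_schema", "add_customers"},
    "add_order_index": {"add_orders"},
}

def deployment_order(requires):
    """Return migrations in an order where every prerequisite comes first."""
    return list(TopologicalSorter(requires).static_order())

order = deployment_order(requires)
print(order)
```

The appeal of leaning on a topological sort is that each change only needs to name its direct prerequisites; the deployment order falls out of that, and a circular dependency is reported as an error rather than silently mis-ordered.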

In the documentation for Mahout, I describe my goals for it, and the things I hope it solves more satisfactorily than some of the other tools out there. I’d be particularly keen on getting any input as to useful purposes that point to changes in functionality.

JIT Compilation

– Andres Freund
– Biggest benefit comes in aggregate processing, that is where there is huge computational work
– planning is presently too simplistic
– no caching of compiled JIT code
– code is somewhat slow in some cases
– JIT compilation of COPY would probably be helpful
– COPY, cuts out a lot of presently kludgy C code
– Sorts
– hashing for hash aggregates and hash joins, there is already a prototype…
– interesting idea to capture generated code in a table (Jeff)
– either C or LLVM bitcode
– bitcode may be architecture-dependent
– want better EXPLAIN [ANALYZE] output (Teodor)
– better code generation for expression evaluation (Andres)
– Presently no Windows support
– Once you have caching, OLTP performance will improve, but we’re certainly not there now
– local cache, initially; eventually a global cache
– LRU cache
– If I generated much the same code last time, can reuse the compiled code
– would move some work from executor to the planner, but this is a pretty deep architectural change for now
– can definitely get degenerate optimization cases; gotta watch out for that
– generated code is way denser than the executor plans, so there are cases of significant improvements in memory usage
– Incremental JIT compilation (do it in background, after starting query, but before execution)
– impact of threading? Worker backends + data marshalling?
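
The caching idea in the notes above (an LRU cache, reusing compiled code when much the same code was generated last time) can be sketched roughly as follows. This is purely illustrative and says nothing about how PostgreSQL would actually implement it; the class name, the choice to key on a hash of the generated code, and the use of Python's built-in compile() as a stand-in for an expensive compiler are all assumptions of mine.

```python
import hashlib
from collections import OrderedDict

class CompiledCodeCache:
    """Small LRU cache keyed by a hash of the generated code text."""

    def __init__(self, capacity, compile_fn):
        self.capacity = capacity
        self.compile_fn = compile_fn      # hypothetical "expensive" compiler
        self.entries = OrderedDict()      # key -> compiled artifact, oldest first
        self.compilations = 0

    def get(self, generated_code):
        key = hashlib.sha256(generated_code.encode()).hexdigest()
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        compiled = self.compile_fn(generated_code)
        self.compilations += 1
        self.entries[key] = compiled
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return compiled

# Stand-in for real compilation: Python's built-in compile() on an expression.
cache = CompiledCodeCache(capacity=2, compile_fn=lambda src: compile(src, "<jit>", "eval"))
cache.get("1 + 2")
cache.get("1 + 2")   # same generated code: reused rather than recompiled
print(cache.compilations)
```

A per-backend (local) cache would look much like this; a global cache shared across backends raises the additional questions of shared memory and invalidation that the discussion only touched on.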

Connection pooler – Odyssey

– Multithreaded connection pooler and request router
– Open source release: Yandex/Odyssey @ GitHub
– Multithreaded via worker thread
– each thread arranges authentication and proxying of client-to-server and server-to-client requests
– worker threads share global server connection pools
– SSL/TLS supported
– tracks transaction state
– can emit CANCEL on connections and ROLLBACK of abandoned transactions before returning connection to pool
– Pools defined as pair of DB/User
– Each pool can authenticate separately and have different mode/limit settings
– UUID identifiers for each connection
– Log events and client error responses include the UUID
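
A toy model of a few of the points above (pools keyed by database/user pair, a UUID per client connection that shows up in log events, and a ROLLBACK of abandoned transactions before a connection returns to its pool) might look like this. It is a sketch under my own assumptions, with hypothetical names throughout, and does not reflect Odyssey's actual code or configuration.

```python
import uuid
from collections import defaultdict

class Pooler:
    """Toy pooler: one pool of server connections per (database, user) pair."""

    def __init__(self):
        self.idle = defaultdict(list)   # (database, user) -> idle server connections
        self.log = []

    def attach(self, database, user):
        client_id = str(uuid.uuid4())   # identifier carried into every log line
        pool = self.idle[(database, user)]
        # Reuse an idle server connection if there is one, else "open" a new one.
        server = pool.pop() if pool else {"db": database, "user": user, "in_txn": False}
        self.log.append(f"{client_id} attached to {database}/{user}")
        return client_id, server

    def release(self, client_id, server):
        if server["in_txn"]:
            # Abandoned transaction: roll back before the connection is reused.
            server["in_txn"] = False
            self.log.append(f"{client_id} ROLLBACK of abandoned transaction")
        self.idle[(server["db"], server["user"])].append(server)

pooler = Pooler()
cid, conn = pooler.attach("appdb", "alice")
conn["in_txn"] = True                      # client goes away mid-transaction
pooler.release(cid, conn)
cid2, conn2 = pooler.attach("appdb", "alice")
```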

Monitoring

– Greg Stark
– Splunk with alerts based on log data
– Nice to have things actively exported by PostgreSQL
– Exposing
– Aggregating
– Reporting
– Error reporting
– Log files too chatty, lot of data all together
– Could different data be separated?
– But how about when it needs to be correlated?
– Sensitive data…
– Too much parsing needs to be done
– Loading into DB makes it structured, but recursive problems, and can overload the DB
– Metrics
– start with What You Want Measured…
– Rates of many things
– vacuum activity
– WAL generation
– error rates
– index usages
– violation statistics, rates of rollbacks and errors
– Nice to have…
– pg stats with numbers of violations and contentions
– let the stats collector collect a bit more data
– connection usage statistics
– Some tools
– Jaeger – JaegerTracing
– Zipkin – distributed tracing system to find latency issues in microservice architectures
– OpenTracing – vendor-neutral APIs and instrumentation for distributed tracing
– Can there be a “pg stat user table” indicating bloat information?

Query Optimization with Partitioned Tables

– Planned improvements in PG11
– Partition wise joins
– Partition wise aggregation
– Partition pruning
– Planning time
– Runtime

TDE – Transparent Data Encryption

– Inshung Moon
– Buffer level encryption/decryption
– Per table encryption
– Perhaps should be per-column???
– 2-tier encryption key management
– Working with external key management services (KMS)
– WAL encryption
– only doing encryption on parts other than header
– Nice to have it on LOB (large objects API) too, but no easy way…
– Log file data needs to be encrypted before submission to destinations

Concerns

– Encryption of indexes is troublesome
– You lose the usefulness of ordering on disk
– Table added with a per-table/column private key
– What if some data seems to be exposed? Need to generate new key and rewrite? This would be arbitrarily expensive…
– Changing master key is easy, as long as the function for generating the private symmetric (per-table key) is symmetric

Threat model

– Translucent Databases
– Peter Wayner
– Order preserving encryption
– Agrawal encryption scheme: “Order Preserving Encryption for Numeric Data”, by Agrawal, Kiernan, Srikant, and Xu (IBM Almaden; SIGMOD paper)

Spamalicious times

Hmmph. Google sent me a “nastygram” indicating that one of my blog entries had something suggestive of content injection.

I poked around, and it was by no means evident that it was really so. The one suspicious posting was http://linuxdatabases.info/blog/?p=99 which legitimately has some stuff that looks like labels, as it contains a bunch of sample SQL code. I’m suspicious that they’re accounting that as being evil…

But it pointed me at a couple of mostly-irritating things…

  1. I haven’t generated a blog entry since 2013. Well, I’m not actually hugely worried about that.
  2. I reviewed the proposed response posts that had piled up since, probably, about 2013. Wow, oh wow, was that ever spam-filled. Literally several thousand attempts to get me to publish various and sundry advertising links. It’s seriously a pain to get rid of them all, as I could only trim out about 150 at a time. And hopefully there weren’t many “real” proposed postings; it’s almost certain I’ll have thrown those away. (Of course, proposed postings about things I said in 2013… How relevant could they still be???)

Nexus 7 on CyanogenMod

At last…

I had been lazy, leaving all alone.

In February, I figured I was heading off for a chunk of the month on a cruise, hence wanting tablet for multimedia, but without network, so it was timely not to spend time fiddling with configuration with possible risk of mussing such up.

Alas, the OTA upgrade to JellyBean did a certain chunk of mussing…  It busted SuperUser access, thereby breaking Titanium Backup.  No backups went properly since :-(.

So, today seemed right timing.  I wanted backup, and needed root, the latter looking like a fight.  Ah well, go for gusto, see what we get without it…

I had to upgrade adb to support latest Android…   Got Clockwork Recovery in place, and zip files for CM10.1 and Google Apps…

The last backup was Feb 16, but happily the files still remained after fresh CM10.1 installation, so I could do a good chunk of recovery of apps, and in plenty of cases, this was basically network configuration, so apps would update their own data upon startup.  Sweet!

Superuser is nicely integrated into CM10, also sweet, no extra installation process.

I’ll need to reconfigure the launcher, due to the shift from ADW (I had a license) to built-in Trebuchet on CM10, but that seems like the “worst” irritation, and one I can well live with.

I’m not sure I can readily identify big differences between stock Android and CM10, but there are nice small creature comforts my CM10 phone has gotten me used to, like a quick “turn on/off WiFi” directly on notification screens.  Small but I like it.

Mailman subscriber lists

As part of “due diligence” for some mailing lists I am involved with (for Slony, see slony-backups ), I discovered the need to dump out Mailman mailing list subscribers.

There is a script to do this, written in Python, mentioned on the Mailman wiki, accessible as mailman-subscribers.py

I’d kind of rather have something a bit more version-tracked, so I poked around at GitHub, and found larsks / mailman-subscribers

That was a little out of date; the last code was from a couple of years ago, so I forked, updated to the latest, and suggested that “larsks” pull it, which he did, quite quickly.

The “kudos” bit is that I noticed a bit of a blemish, in that the mailing list password was required to be on the command line, thereby making it visible to anyone with access to /usr/bin/ps on one’s system. I submitted a feature request, and Lars was so kind as to have this feature added so quickly that by the time I had the prototype of my Slony “subscriber backup” script working, I immediately needed to change it to make use of the lovely new password-in-file feature. Nice!
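
The general shape of the password-in-file approach can be sketched as below. This is not the code from mailman-subscribers, and the function name and the permission check are my own assumptions; it just shows why a file is preferable to an argument that anyone can see via ps.

```python
import os
import stat
import tempfile

def read_password_file(path):
    """Read a password from a file, refusing files that group/others can access."""
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} is accessible to group/others; chmod 600 it")
    with open(path, encoding="utf-8") as handle:
        return handle.readline().rstrip("\n")

# Demonstration with a throwaway file standing in for the real password file.
with tempfile.TemporaryDirectory() as tmp:
    secret_path = os.path.join(tmp, "list-password")
    with open(secret_path, "w", encoding="utf-8") as handle:
        handle.write("s3cret\n")
    os.chmod(secret_path, 0o600)
    password = read_password_file(secret_path)
```

The command line then only carries the file's path, and the file's permissions determine who gets to see the secret.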

Installing git-annex from Debian unstable

I happen to be a supporter of Joey Hess’ Git Annex Kickstarter project; no big bucks, but it seemed a good thing to help out.

I got the stickers that were my “project reward,” and figured I should start playing with the new results. I’m particularly keen on the planned Android client, but I should make some use of it before that becomes available.

There’s good news, and bad news:

Good news: He has added in an assistant to provide interactive help in setting up repositories. It’s included in Debian unstable, in a version released September 24th.
Bad news: I generally prefer using packages from Debian testing, and it has a version released July 24th, well before any of this, and without any of Joey’s recent enhancements.

Fortunately, drawing in the September (unstable) version isn’t too terribly difficult. My /etc/apt/preferences.d/simple configuration has Pin-Priority values that prefer stable over testing, testing over unstable, and unstable over experimental (where enormous potential for breakage lies!).

As a consequence, installing the unstable version is pretty easy, albeit involving an option I had to go looking for:

root@cbbrowne:~# apt-get -t unstable install git-annex
... leads to loading ...
Get:1 http://ftp.us.debian.org/debian/ unstable/main git-annex amd64 3.20120924 [7,411 kB]

And, with a run of % git annex webapp, it’s up and running!

Worth observing… The documentation tree includes the entirety of Joey’s blog documenting his development efforts.  Possibly excessive, but it’s certainly not to be called inadequate documentation.

Netboot via PXE

2012-03-13 Tue

Some notes

To get this to work, you need…

BIOS ROM that supports PXE: True for most modern motherboards and/or NICs
DHCP server: To manage passing out configuration such as IP addresses and the next-server attribute.
TFTP server: With images
???: It looks for images based on most-to-least specific configuration

  • MAC address
  • IP subnet
  • Default
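
As one concrete instance of that most-to-least specific lookup, the PXELINUX bootloader tries a sequence of configuration file names along the following lines. This sketch assumes the notes were describing PXELINUX-style behaviour (the notes do not say so), and it omits the client-UUID name that PXELINUX tries first when one is supplied; the function name is hypothetical.

```python
import ipaddress

def pxelinux_config_candidates(mac, ip):
    """List config file names from most to least specific, PXELINUX-style."""
    # MAC address, lower case, dash-separated, prefixed with ARP type 01 (Ethernet).
    candidates = ["01-" + mac.lower().replace(":", "-")]
    # IPv4 address as eight upper-case hex digits.
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    # Drop one hex digit at a time, so shorter names match ever larger address ranges.
    candidates += [hex_ip[:length] for length in range(8, 0, -1)]
    candidates.append("default")
    return candidates

names = pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91")
for name in names:
    print("pxelinux.cfg/" + name)
```

The truncated hex names are what stand in for “IP subnet” here: a file named C0A802 would apply to every client in 192.168.2.0/24, since the match falls on 4-bit boundaries.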

Some things PXE doesn’t support

It was created as a standard in 1999, and hasn’t been updated much since, so there are things that postdate it, and that are thus not supported.

WiFi: Likely to be troublesome anyways, as you surely want some authentication to get onto a WiFi network
IPv6: It wasn’t clear that it yet mattered in 1999…
DNS: It works with IP addresses only

DHCP discussion

  • Go look for next-server attribute
  • Some discussion of handling sharing subnets across a redundant set of DHCP servers

More worth looking at

Inquisitor: OSS hardware testing tool that’s better than memtest
gPXE: OSS bootloader

  • Supports DNS, so it can forward requests potentially anywhere
  • Can transfer data across additional protocols, such as HTTP, HTTPS, SAN (iSCSI, AoE)
  • Can support WIFI
  • Possibly IPv6