I’ve been wrestling with some thoughts about mechanical music and musical machines lately, prompted by two things that came across the ol’ Facebook transom in the last few months: an MIT Technology Review piece from December about musical composition and machine learning and my first exposure to a most marvelous form of orchestrion, the magical rarity that is the Hupfeld Phonoliszt Violina. This is, of course, only the beginning of a thought. My aim in this post is to explore some examples and see where they take me. I’m most interested in the different roles that coding and proceeding by algorithm play in my examples.
Before I turn to those examples, I want to make it clear that I’m not talking about the kinds of machine music created by more recent electronic instruments (see Peter Manning’s Electronic and Computer Music for that sort of thing, if you’re curious). I’m talking specifically about music originally composed and played by human beings (in this case J.S. Bach and Chopin) on the usual orchestral and recital instruments that happens to have been either adapted for self-playing instruments (orchestrions, pianolas/player-pianos) or recreated algorithmically by machine-learning systems. I’m also not going to deal with music originally composed for self-playing instruments in this post, although there are quite a few of these compositions; I will almost certainly have to say something about them later as a contrast case, as they are an important part of the puzzle that I’m curious about. 
Bach to the Future
Johann Sebastian Bach (and he is hardly alone in this, as Baroque folk go) looks at first glance like a good example of a composer whose work is transparently algorithmic in nature. Indeed, this is exactly what drew Hadjeres & Pachet (2016) to Bach’s short chorales, which they selected for their machine-learning project, DeepBach, because a) there are enough of them (389) to provide a useful sample, and b) they are helpfully homogeneous:
All these short pieces (approximately one minute long) are written for a four-part chorus (soprano, alto, tenor and bass) using similar compositional principles: the composer takes a well-known (at that time) melody from a Lutheran hymn and harmonizes it i.e. he composes the three lower parts (alto, tenor and bass) to be heard while the soprano (the highest part) sings the hymn (Hadjeres & Pachet, 2016, p. 1).
The hard part, from the point of view of both composers/musicians and programmers trying to create a system that can generate a piece of music like this, “comes from the intricate interplay between harmony (notes sounding at the same time) and voice movements (how a single voice evolves through time),” and also requires some way to embody the unique features of each voice in the piece (Hadjeres & Pachet, 2016, p. 2). The short version: there’s a lot of math involved, worth exploring at another time. It’s…well, it’s not messy, but it is tricky, and the result they get appears to be a model that can create convincing and original Bach-like compositions difficult to differentiate from the style of ol’ Bach himself.
Here’s an example of DeepBach’s work:
As some of the commenters on the MIT post point out, it’s an interesting attempt, but DeepBach does make some serious mistakes (especially voice-leading errors) that a human composer familiar with the relevant music-theoretical practices would not have made. One thing DeepBach’s creators appear to have done reasonably well (voicing issues aside) is their handling of the Baroque use of the fermata to mark phrases: DeepBach’s composition appears to breathe fairly naturally in more or less appropriate style. Compare the DeepBach sample to some actual Bach (an example the authors offer in the paper: Wer nur den lieben Gott lässt walten, BWV 434):
Interestingly, one of the problems that DeepBach’s programmers were trying to overcome was a matter of expert knowledge of harmonic practice and other compositional norms: earlier attempts at Bach generators seemed to them to run aground on the difficulties of generating the rule base (which would need to be pretty detailed) and handling the fact that the results just didn’t sound much like Bach even when they did follow the rules (Hadjeres & Pachet, 2016, p. 2). DeepBach, with a less restrictive rule base that required no expert knowledge, was nonetheless able to generate results that were original (they tested for plagiarism from the learning samples) and, when tested with listeners, seem to have sounded sufficiently Bach-like to be convincing (Hadjeres & Pachet, 2016, pp. 8-12).
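To give a sense of what a single entry in a hand-written rule base might look like, here is a minimal sketch of one textbook voice-leading rule: no two voices should move in the same direction from one perfect fifth (or octave) to another. This is my own illustration, not anything from Hadjeres & Pachet’s paper; the function name, the MIDI-number encoding, and the assumption that voices never cross are all mine.

```python
from typing import List, Tuple

# One chord = MIDI pitches for (soprano, alto, tenor, bass), highest voice first.
# Assumption for this sketch: the voices never cross.
Chord = Tuple[int, int, int, int]
VOICES = ("soprano", "alto", "tenor", "bass")


def parallel_perfects(prev: Chord, curr: Chord) -> List[Tuple[str, str, str]]:
    """Return (upper, lower, kind) for each voice pair that moves in the same
    direction from a perfect fifth to a fifth, or from an octave to an octave."""
    found = []
    for i in range(4):
        for j in range(i + 1, 4):
            # Interval between the two voices, reduced to within one octave.
            before = (prev[i] - prev[j]) % 12
            after = (curr[i] - curr[j]) % 12
            # Both voices must actually move, and in the same direction.
            same_direction = (curr[i] - prev[i]) * (curr[j] - prev[j]) > 0
            if before == after and before in (0, 7) and same_direction:
                kind = "fifths" if before == 7 else "octaves"
                found.append((VOICES[i], VOICES[j], kind))
    return found


# C major to D minor with every voice sliding upward: a classic beginner's error.
c_major = (72, 67, 64, 48)
d_minor = (74, 69, 65, 50)
print(parallel_perfects(c_major, d_minor))
# [('soprano', 'bass', 'octaves'), ('alto', 'bass', 'fifths')]

# C major to G major with the bass moving against the upper voices: no complaints.
g_major = (71, 67, 62, 55)
print(parallel_perfects(c_major, g_major))
# []
```

Even this one rule has to look at the vertical intervals and the horizontal motion of each voice at the same time, and a usable rule base would need a great many checks of this kind. DeepBach was handed nothing like this and still produced results that listeners found convincing.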
I confess that I’m not entirely sure what to make of this. While the MIT post appears to suggest that DeepBach’s ability to “fool” listeners in the study’s sample is a promising and impressive development for the use of machine learning to analyze music, I’m not convinced that this is actually what the listening test shows. While there are Turing-test relevant reasons to be a little bit impressed here, it’s not clear that being unable to tell one rule-governed composition apart from another that more or less follows the same rules, in the absence of expert knowledge, really signifies anything other than that carefully following certain rules generates predictably similar results. I’m curious about whether the same sample group could consistently tell Bach apart from a human composer who tried to follow his style carefully. I don’t mean to reduce Bach to rules here, only to point out that revealing and following those rules doesn’t necessarily get us as far as we might like.
Just the same, I think there’s another side of this worth thinking about: how do we evaluate algorithmic players, and how do we code for them?
Hupfeld vs. Rabin
Have a look at and a listen to this magnificent beast of a musical machine, The Hupfeld Phonoliszt Violina:
The Violina (created by the Ludwig Hupfeld company (see also) in the early part of the 20th C.) is an orchestrion that includes a piano and three violins. The violins are played by using a rotating horsehair ring “bow” and some mechanical “fingers,” plus what is basically a whammy bar attachment on the tailpieces to create vibrato. It plays music on interchangeable paper rolls, in which the music is arranged and encoded by punching holes of various lengths in the paper. Automatic instruments of this kind often included additional tempo controls (beyond whatever changes were encoded in the music roll or cylinder), and some of them, like the later pianolas that used hand-played rolls and allowed a human musician to join the fun, could be “played” while playing in a way that allowed further personalization of the performance.
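For the programmers in the audience, the punched-roll idea can be sketched in a few lines. This is a toy model and emphatically not Hupfeld’s actual roll format: the one-track-per-note layout, the `Hole` and `Note` names, the `lowest_pitch` default, and the assumption that the tempo control works by changing how fast the paper moves are all mine, for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hole:
    track: int        # column across the roll's width (hypothetical: one track per note)
    start_mm: float   # distance from the start of the roll to where the hole begins
    length_mm: float  # a longer hole means a longer note


@dataclass
class Note:
    pitch: int        # MIDI-style note number
    onset_s: float    # when the note starts, in seconds
    duration_s: float


def decode_roll(holes: List[Hole], speed_mm_per_s: float, lowest_pitch: int = 55) -> List[Note]:
    """Turn punched holes into timed notes for a given paper speed."""
    notes = [
        Note(
            pitch=lowest_pitch + h.track,
            onset_s=h.start_mm / speed_mm_per_s,
            duration_s=h.length_mm / speed_mm_per_s,
        )
        for h in holes
    ]
    return sorted(notes, key=lambda n: n.onset_s)


# Two holes: a short one followed by one twice as long, a fifth higher.
holes = [Hole(track=0, start_mm=0.0, length_mm=30.0),
         Hole(track=7, start_mm=30.0, length_mm=60.0)]

for note in decode_roll(holes, speed_mm_per_s=30.0):
    print(note)

# Running the same roll twice as fast halves every onset and duration.
for note in decode_roll(holes, speed_mm_per_s=60.0):
    print(note)
```

The point of the toy is that the roll fixes the proportions between notes while the paper speed scales everything at once. Anything subtler than that has to be punched into the hole positions themselves, and a real roll would also need to carry control information of some kind, which this sketch ignores.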
The main reason I mention the tempo controls here is that tempo changes are an obvious place to look for something like “style” in an instrument of this kind. Tempo changes, breaks, breaths, etc. can be used to tell the listener where a phrase ends, as well as creating dramatic effects for emphasis, for mood, etc. in the piece, often alongside changes in dynamics. In the video of the Violina above, while there’s not much variation in dynamics (they seem to have set the whole thing to “blare” and, well, damn the torpedoes…), whoever arranged and coded the Chopin roll seems to have tried to build in some style using controlled tempo changes. The effect is a bit strange, especially compared to the work of a human player in the kind of bel canto mode that the Chopin roll coder/arranger is trying to capture. Consider Michael Rabin’s performance of the same piece, for example:
As I listen to both performances, I find the Violina less pleasing and rather awkward, although it remains an impressive engineering achievement. The Violina can do tempo changes, and it can create vibrato using the whammy bar attachment, although it is unconvincing when it does so. But why? Is this just a matter of Rabin being a “better” violinist than the Violina (inclusive of both its mechanical properties/technique and the way the roll was coded)? I’m not quite sure what that might mean. Is it an instrument quality issue? The Dutch workshop that restored the Violina in the video used recent European-built factory violins; Rabin played the “Kubelik” Guarneri del Gesù (now also known as the “Rabin”) for a number of years, although I haven’t yet had time to find out whether or not that was the instrument he played for this particular recording. Is it a side effect of the properties of the circular “bow” and the limitations it imposes on play? Maybe, but I don’t think these concerns are especially helpful as explanatory factors.
In order to make sense of my impressions in this case, I need to be able to sort tone, technique, and tempo out here as separate dimensions, especially where the violin and piano parts interact. What I think requires the closest examination is the difference between measured, algorithmically generated tempo choices and the way in which live players make these moves together. There’s something odd about how the Violina attacks a note and makes the transition between one tempo and the next. Perhaps it’s too regular? Could a better roll coder/arranger solve this problem? Maybe, but I suspect that the need for a certain amount of mechanical regularity (bound by the paper roll system design itself) would confound the attempt. The kind of suspension in time that can happen in a live performance, even if the players are trying to keep a careful rhythm, is almost never perfectly regular: the players are responding to each other in that moment, in that performance, and while they may be really consistent about their choices (having practiced), those choices are not purely mechanical in nature. The difference in style between live player(s) and orchestrion might be easier to disguise in something like Baroque phrase-marking fermata, but Chopin’s going for something quite different, and I suspect that a machine-learning system with different physical system limitations might not be able to capture it any better than the paper roll coder/arranger did.
What finally strikes me upon reflection is a possibility: could a better player make DeepBach’s efforts “sound more like Bach,” and if that were the case, what would it mean? Is the issue a matter of not having the right understanding of how composition and performance are related? What would that “right” understanding look like?
- Stravinsky, for example, studied the possibilities of the pianola or player piano quite seriously, and adapted his own compositions for piano roll in addition to writing an etude specifically for the pianola. Also worth mentioning: other composers wrote experimental music specifically for the pianola and related mechanical instruments (Paul Hindemith, among many others, in the early 1900s, and later Conlon Nancarrow from the 1940s on). At least some of the more recent composers inspired by Nancarrow specifically wrote their pianola material to be “unplayable” by human musicians.