Sonification and the Datini Letter Meta-data

Written by Adam Franklin-Lyons (History professor at Marlboro College) and Logan Davis (Research and Development Engineer at Pairity, a software company in the greater Boston area)

Which means what exactly?  It’s like a visualization, but instead of something you see, it’s something you hear.  Let me start with a little background…

A couple of years ago, we attempted a couple of “sonifications” (renderings of complex data in sound) using the metadata from the letters sent by the Datini Company in 14th- and 15th-century Italy. (“We” in this context are Adam Franklin-Lyons, professor of history at Marlboro College, and Logan Davis, a skilled programming student, now alum, at Marlboro with a strong background in music and sound.) The Datini collection contains over 100,000 letters with multiple variables, including origin, destination, sender, receiver, travel time, and others. There is an earlier blogpost with more about Datini and some regular old visualizations from a conference talk. We made a few preliminary experiments, often connecting individual people to a timbre and moving the pitch when that person changed locations. Here is a short version of one of our experiments in which three different individuals each “inhabit” an octave of space as they move around – we made both a midi-piano version and a synth-sound version. The sounds are built using a python sound generator that attaches certain pieces of data (in this case, the locations of three named agents of the Datini company: Boni, Tieri, and Gaddi) to numeric markers, which the generator then translates into specific pitches, timbres, decay lengths, and so on. What follows are some of our thoughts about what sonification is and how you might create your own. This post does not go into specific tools, which can be complicated, but is more of a general introduction to the idea. Hopefully future posts will also cover the technical side of things.
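The octave-per-agent idea can be sketched in a few lines of Python. Everything here is illustrative: the agent-to-octave table, the location list, and the function name are assumptions for the sake of the sketch, not our actual generator code.

```python
# Sketch: each agent "owns" one octave, and the agent's current location
# picks the pitch within that octave. MIDI note numbers C3=48, C4=60,
# C5=72 mark the base of each agent's octave.
AGENT_OCTAVES = {"Boni": 48, "Tieri": 60, "Gaddi": 72}

# Hypothetical location list; the real data has many more places.
LOCATIONS = ["Prato", "Florence", "Pisa", "Genoa", "Barcelona",
             "Valencia", "Palma", "Avignon", "Bruges"]

def pitch_for(agent, location):
    """Map an agent's current location to a MIDI pitch inside that agent's octave."""
    base = AGENT_OCTAVES[agent]
    step = LOCATIONS.index(location) % 12  # wrap so the pitch stays in one octave
    return base + step

# When Tieri moves from Prato to Barcelona, the pitch jumps within his octave:
pitch_for("Tieri", "Prato")      # 60
pitch_for("Tieri", "Barcelona")  # 64
```

A real generator would then hand these note numbers to a synthesizer along with timbre and decay settings, but the mapping step is the conceptual core.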

Although sonification is not widely used, you are probably already familiar with the basic idea. Several well-known modern tools (the Geiger counter is the most widely cited example) use a sonic abstraction to portray data inputs that we cannot otherwise sense: for the Geiger counter, beeps or clicks indicate the quantity of radiation emitted; basic metal detectors work similarly. In contrast, researchers portray vast amounts of data in visual forms – graphs, charts, maps, videos, and so on. Perhaps this is because of the dominance of visual input for most people, perhaps not. Either way, the goal is the same: how do you take a large quantity of data and distill or organize it into a form that reveals patterns or meaningful structures to the person trying to understand the data?

Fields like statistics and data science teach and use visualization constantly, drawing on many established methods for comparing data sets, measuring variance, or testing changes over time. Researchers have also studied the reliability of different types of visualizations. For one example, visual perception measures distance much better than area; thus people consistently get more accurate data from bar graphs than from pie charts. The goals of sonification thus present one important question: what types of patterns or structures in the data would actually become clearer when heard rather than seen? Are there particular types of patterns that lend themselves better to abstraction in audio than in visuals? (And I will be honest here – I have talked to at least a couple of people who do stats work who have said, “well, there probably aren’t any. Visual is almost bound to be better.” But admittedly, neither of them were particularly “auditory” people anyway – they do not, for instance, share my love of podcasts…their loss.)

Thus, the most difficult aspect is not simply duplicating what visualizations already do well – a sonification of communication practices in which the volume matches the number of messages, getting louder and louder over the course of a 45-second clip and then dropping off precipitously, doesn’t actually communicate more than a standard bar graph. It would take less than 45 seconds to grasp the same concept in its visual form. Visualizations employ color, saturation, pattern, size, and other visual aspects to encode multiple variables. Combining aspects like the attack and decay of notes, pitch level, and volume could similarly allow multiple related pieces of data to become part of even a fairly simple sonic line. As with visualizations, certain forms of sound patterns will catch our attention better or provide a more accurate rendition of the data. Researchers have not studied the advantages and disadvantages of sound to the same extent, making these questions ripe for exploration.

So what are some examples? There is at least one professional group that has been dedicated to this research for a number of years: The International Community for Auditory Display. Their website has a number of useful links and studies (look particularly at the examples). Although these are not the most recent, there is a good handbook from 2011 and a review article from 2005 that describe some of the successes and failures of sonification.  Many of their examples and suggestions recommend reducing the quantity of data or not overloading the auditory output, much as you would not want to draw thousands of lines of data on a single graph. However, at least a couple of recent experiments have moved towards methods of including very large quantities of data. While promotional in nature, here is a video demonstrating the concept as used by Robert Alexander to help NASA look at solar wind data.

So, how to proceed? First, the work of sonification does not escape the day-to-day tasks of data science, especially data normalization. If your pipeline cannot reasonably handle minor syntactic differences in the data (i.e., “PRATO” vs. “prato” vs. “Prato, Italy”), then your ability to leverage your dataset will be limited, just as it would be with visualizations. A little legwork at the beginning makes normalization, and the choices it involves, far more efficient.
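A minimal sketch of that normalization step, assuming a hand-built alias table; the variant spellings are the ones mentioned above, and the function name is ours:

```python
# Alias table collapsing known variant spellings to one canonical form.
ALIASES = {
    "prato": "Prato",
    "prato, italy": "Prato",
}

def normalize_place(raw):
    """Collapse case and punctuation variants of a place name."""
    key = raw.strip().lower()
    # Fall back to simple title-casing for names not in the alias table.
    return ALIASES.get(key, key.title())

normalize_place("PRATO")         # 'Prato'
normalize_place("Prato, Italy")  # 'Prato'
```

In practice the alias table grows as you audit the data; the point is that every downstream step (visual or sonic) sees one name per place.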

Like visualizations, sonifications should be tailored to the data set at hand. You will then have to make choices about which aspects of sound relate to which data points. This is the main intellectual question of sonification. What are we voicing? What does time represent? What does timbre (or voice – different wave forms) give us here? Timbre and pitch nicely convey the proper nouns and verbs of a data set. Timbre has a far more accessible (articulated) range of possible expressions for higher-dimensional data (though for a particularly trained ear, micro-tonalism may erase a great deal of that advantage). Decay, in my experience, can carry interesting metadata, such as the confidence or freshness of a fact; the action of the tone relates to how concretely we know something in the data.
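One way to encode that mapping is a small function turning each data record into sound parameters: who speaks chooses the timbre, where they are chooses the pitch, and a confidence score shapes the decay. All field names, waveforms, and scales here are illustrative assumptions, not a fixed scheme.

```python
# Hypothetical agent-to-waveform assignments (timbre encodes "who").
WAVEFORMS = {"Boni": "sine", "Tieri": "square", "Gaddi": "sawtooth"}

def note_for(record):
    """Turn one data record into parameters for a simple synth voice."""
    return {
        "timbre": WAVEFORMS[record["agent"]],                    # who -> voice
        "pitch": 48 + record["location_index"],                  # where -> pitch
        "decay": round(0.2 + 0.8 * record["confidence"], 3),     # certainty -> decay (0.2-1.0 s)
    }

note_for({"agent": "Tieri", "location_index": 5, "confidence": 0.5})
# {'timbre': 'square', 'pitch': 53, 'decay': 0.6}
```

A shakier fact dies away quickly; a well-attested one rings out, which is one concrete reading of "the action of the tone relates to how concretely we know something."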

After cleaning and assigning pitch, timbre, decay, and the rest, you listen. Much of what sonification is good for is finding hot spots in data sets. What stands out? Are there motifs or harmonic patterns that seem especially prevalent? Some of these questions, obviously, will relate to how the data has been coded, but every time we have tried this, there have also been at least a few surprising elements. And finally, is it beautiful? (A question becoming more popular in visualization circles, too…) Particularly when working with some of the wild data sets available today, what is the sound world created? Are there tweaks to the encoding that will make observations clearer while also making the sound more enjoyable to listen to? When creating an auditory representation of data, you are quite literally choosing what parts are worth hearing.

Mapping the Communication Network of the Datini Company

[This project had significant technical assistance from Pavel Oleinikov at Wesleyan University and Logan Davis, Marlboro class of 2017.]

I have been working with the Datini metadata and letter collection for about a year, but so far I have had more ideas and questions than actual analyses and observations.  Just this semester, often in discussions with students, I have started to hit on productive lines of analysis within the data.  At the Social Science History Association’s annual meeting in Montreal (Nov. 2-5, 2017), I presented a few preliminary observations – the charts and notes below are all taken from the talk.  First off, if you are not familiar with the Datini company’s letters, the project description includes a short introduction with a number of links to the archive itself and some further readings.

What is a “Normal” Travel Time?

One of the persistent difficulties with medieval travel and communication derives from the fact that baselines are very hard to create.  “Normal” travel, even along short distances, was highly irregular.  Moderate trips could take days or weeks depending on weather, brigands, political difficulties, or any number of other delays.  In Fernand Braudel’s The Mediterranean, he described this as a continuing feature even after 1500: “The essential point to note here is this very variety, the wide range of times taken to travel the same journey: it is a structural feature of the century.” (Braudel, 360).  One of the best advantages of the large body of letters in the Datini collection is that it allows for broad generalizations about travel time and what we might count as “normal” communication.

Attempting to create a sense of average travel times, I selected a series of city pairs that have a relatively high volume of communication – generally at least 500 letters.  This allowed the errors and vicissitudes present in the data (letters with impossible travel times, both too short and too long; negative travel times; journeys that were simply unexpected; etc.) to be outweighed by useful data, making generalizations possible.  I first noticed that the graphs of travel times between cities tended to have either a relatively tight set of times with a clear center or a messier, more irregular set of times.  The graphs below give one good example of each type.  On the trip between Barcelona and Valencia (≈350km), the vast majority of trips took six or seven days; some took five or eight, but many of the trips that fell outside this narrow window probably involved some form of delay or even an error in documentation.  On the other hand, the trip between Palma de Mallorca and Florence had large numbers of letters taking anywhere from thirty to forty-five days.  The trip was somewhat longer (≈1000km), but significantly less reliable.
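The filtering-and-summarizing step can be sketched with the standard library. The travel times below are invented for illustration, and the cutoffs (at least 1 day, at most 60) stand in for whatever thresholds actually separate plausible trips from documentation errors:

```python
from statistics import mean, median

def summarize_route(days_list, min_days=1, max_days=60):
    """Drop impossible travel times, then summarize what remains."""
    clean = [d for d in days_list if min_days <= d <= max_days]
    return {"n": len(clean), "median": median(clean), "mean": round(mean(clean), 1)}

# Barcelona-to-Valencia-style data: a tight cluster around 6-7 days,
# plus a negative time and an absurdly long one that get filtered out.
raw = [6, 7, 6, 7, 5, 8, 6, 7, -2, 190]
summarize_route(raw)  # {'n': 8, 'median': 6.5, 'mean': 6.5}
```

With 500+ letters per route, a handful of such errors barely moves the summary statistics, which is exactly why the high-volume pairs were chosen.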

Measuring Reliability with Standard Deviation

My first thought was that this might simply be a product of distance.  To test this, I needed a generalized measure of multiple comparable trips.  Using the standard deviation gave a rough number of days within which the majority of the trips occurred: a higher standard deviation means a less reliable journey (for the above examples, Barcelona/Valencia has a standard deviation of just over 4, Palma/Florence is over 12 – so far so good).  I then produced a map of all of the high-frequency travel routes (this came out to roughly a dozen examples with 500-1000 letters, another dozen from 1000-2000 and a final dozen from 2000-5000).  I color coded the routes by high, medium, and low reliability (green is a standard deviation under 5, blue is 5-10, red is over 10.)
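The color coding above amounts to a small classification function. The 5-day and 10-day cutoffs come from the map legend; the sample travel times are invented:

```python
from statistics import stdev

def reliability(days_list):
    """Classify a route by the standard deviation of its travel times."""
    sd = stdev(days_list)
    if sd < 5:
        return "green"   # reliable route
    elif sd <= 10:
        return "blue"    # moderately reliable
    return "red"         # unreliable

reliability([6, 7, 6, 7, 5, 8])        # 'green' (tight Barcelona/Valencia-style cluster)
reliability([25, 45, 30, 50, 28, 55])  # 'red'   (scattered Palma/Florence-style times)
```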

While distance seems to play a role, the larger distinction appears to be between land and sea routes: land routes are on the whole more reliable and present much less variability (a lower standard deviation) than any route which required a ship.  In some respects, this is difficult to measure, because many journeys (Valencia to Marseille, Pisa to Genoa, etc.) could be accomplished either on ship or on land.  However, it is quite notable that trips that require a ship (all communication with Palma de Mallorca, for example) routinely have a higher standard deviation than equivalent or even longer-distance trips accomplished on land.  This map also confirms one of Federigo Melis’s own earlier arguments: that Datini’s representatives largely separated the sending of information from the sending of goods.  Even the long trip from Bruges to Barcelona happened almost entirely over land, meaning the information the couriers carried was worth the cost of a completely separate trip unconnected to the Italian galleys routinely sent around Spain to the low countries and England for wool.

Graphing Specific Travel Routes

To get a better sense of what the standard deviation was telling me about these various trips, I chose a representative set of cities to graph all of their trips over time – a more granular visualization than the full map.  This would indicate both if there was any change in the speed and method of travel over time, but also provide a different way to visualize the reliability of land travel.  The following three graphs show the travel times of all known letters sent from Barcelona to Valencia, from Bruges to Barcelona, and from Palma de Mallorca to Florence.

Previous research has already confirmed that Barcelona to Valencia is certainly land based: it is fast, highly consistent, and has a strong “floor” to the speed, meaning that most trips not only happened quickly, but took close to the minimum possible amount of time.  Palma to Florence – necessarily a sea trip – has no clear floor and only a hazy average time.  No two sea trips are alike, and the average is a good bit higher than the fastest possible trips.  Finally, the graph of travel from Bruges to Barcelona cements the clear impression that most communication came by land.  The graph has a strong floor with a fairly narrow band of travel times, similar to the shape of the Barcelona to Valencia graph.  The graphs also indicate that the travel times of each journey stayed relatively consistent over the (relatively short – 1370-1415) time period of the Datini letters – so nothing notable to report there.

And finally: Speed…sort of

The last visualization I created attempts to get a sense of speed along these many routes.  Measuring time is straightforward enough: I used the median travel time for each route.  Distance, however, is an entirely different problem.  For land routes: did people always travel the shortest distance?  Surely they did not travel as the crow flies.  Were they more likely to travel through known cities even if the route was longer?  Were there portions of river travel interspersed with travel on foot?  For trips in the Mediterranean, it is impossible to know whether ships followed the coastline or sailed across open seas.  Did the ship stop at any intermediary ports to take on supplies?  Did this add two or three days to the trip that could change our estimation of the speed of travel?  Difficulties abound – most of which medievalists are quite aware of.

Despite these questions, I estimated rough distances between all of the cities, taking into consideration whether I thought the trip was on foot or on ship (in part derived from the standard deviation score compared with the distance).  The results demonstrate that travelers on foot were not only more reliable, but generally faster than messages sent by ship.  While there are surely extraordinary voyages in which a ship managed a very high speed, in the aggregate of thousands of journeys ships are routinely slower than land-based travel.  Additionally, even across long distances (again, note Bruges to Barcelona), the travel times imply a fast clip (40km+ per day) sustained over many days.  There are a few outliers on the map, likely caused by pairs of cities sitting at a distance that represents a tipping point.  I.e., if location A is 50 kilometers from location B, it is highly possible that many letters are recorded as taking two days – a speed of 25 kilometers per day (quite slow).  However, it is just as likely that the courier traveled at 40 kilometers per day and arrived early on the morning of the second day – a fact which is almost never recorded in the documents.  These problems get smoothed out at longer distances, so the outliers tend to be on short routes.
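The short-route rounding problem can be put in numbers. The 50 km trip is the hypothetical example from the text; the longer trip is an invented contrast:

```python
def implied_speed(distance_km, recorded_days):
    """Speed implied by a travel time recorded only in whole days."""
    return distance_km / recorded_days

# A 50 km trip recorded as "2 days" implies a slow courier...
implied_speed(50, 2)    # 25.0 km/day
# ...even though a 40 km/day courier also arrives "on day 2",
# just early in the morning. Over a longer route the whole-day
# rounding matters much less:
implied_speed(500, 12)  # ~41.7 km/day, close to the courier's true pace
```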

Moving Forward

The most notable commentary to come from the conference is that I neglected to investigate the role of seasonality – Mediterranean shipping is notoriously seasonal because of prevailing winds and winter storms.  The data initially suggests there is less seasonality than expected, but I do not have very solid analyses yet.  A couple of brief looks at the data suggest that the different travel in summer and winter did not influence land-based travel, but did slow down ocean going travel.  A further question follows from the seasonality question: if ship travel slowed down significantly in the winter, did this influence the number of letters sent in winter?  Did certain routes that could use either ships or land lean towards land in the winter in response to harder ocean crossings?  If we split the summer and winter trips of an ocean going voyage apart and graph them separately, do two clearer time bands emerge?
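Splitting an ocean route's letters by season, as proposed above, is mechanically simple; the hard part is the analysis that follows. The month boundaries (April through October as the sailing season) and the sample (month, travel-days) pairs below are assumptions for illustration:

```python
def split_by_season(letters):
    """Partition (month, travel_days) records into sailing-season vs. winter trips."""
    summer = [d for m, d in letters if 4 <= m <= 10]   # Apr-Oct sailing season
    winter = [d for m, d in letters if m < 4 or m > 10]
    return summer, winter

letters = [(5, 32), (7, 30), (9, 34), (12, 44), (1, 48), (2, 45)]
summer, winter = split_by_season(letters)
# summer == [32, 30, 34]; winter == [44, 48, 45]
```

Graphing the two lists separately (or comparing their standard deviations) would show directly whether two clearer time bands emerge.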

Beyond this specific question, there are a couple of other ways to move forward.  The first is to figure out a better way to model and visualize the structure of the Datini communication network.  This will probably involve breaking the usual planimetric accuracy of many thematic maps and creating some form of topological map (think the London Tube map) or cartogram that makes routes more important than the angles and structures of the coastline.  And second, I hope to review Melis’s work on communication in the Datini company more thoroughly and begin to get a sense of who was actually doing the work of moving information around – a prosopography of messengers, if you will.  The mercantile communication of the Datini company will make a compelling comparison to the urban and royal communication systems (which are much less studied than Datini) that make up the “Couriers in the Crown of Aragon” project at the Travelers Lab.