Constantinopolitana: A Database of East Rome

Constantinopolitana: A Database of East Rome is a relational database of everything known about Constantinople. The objects, people, buildings, and events are spatially and temporally linked together to create what is essentially a digital map of Constantinople using the research environment nodegoat. To conceptualize potential ways to link the data, the categories places, people, literature, manuscripts, statues, and seals were created. The places category was based on a previous iteration of this project, Constantinople as Palimpsest (CPal), which was originally created using ArcGIS before being transferred to nodegoat [see below].

The places group is concerned with the geography, structures, and cisterns of Constantinople. We were tasked with creating a geographic visualization of the Great Palace of Constantinople, the Hagia Sophia, and other notable landmarks from medieval Constantinople overlaid atop modern Istanbul. We used various sources to best approximate sizes and locations, and were able to piece together a map on which the other aspects of the project can be positioned. Then, using the GeoJSON format, we mapped the coordinates so that they could be uploaded to the database. By the end of the semester, we were able to initiate the first linkage with the other aspects of the project by connecting to the statues database, thereby allowing for a spatial visualization of the statues’ locations. The greatest challenge in the project arose from the limited information regarding the exact position of structures that no longer exist.
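As a rough illustration of the GeoJSON step, the sketch below builds a single point feature and wraps it in a feature collection. It is not the project's actual export: the helper names, the property fields, and the coordinates (an approximate position for the Hagia Sophia) are assumptions made for the example, and real building footprints would more likely be polygons than points.

```python
import json


def building_feature(name, lon, lat, notes=""):
    """Return a GeoJSON Feature with a Point geometry for one building.

    The property names ("name", "notes") are illustrative, not the
    project's actual schema.
    """
    return {
        "type": "Feature",
        # GeoJSON positions are ordered [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "notes": notes},
    }


def feature_collection(features):
    """Wrap a list of features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Approximate, illustrative coordinates only.
hagia_sophia = building_feature(
    "Hagia Sophia", 28.980, 41.008, notes="approximate position"
)
collection = feature_collection([hagia_sophia])

# Serialized text that could be saved to a .geojson file.
geojson_text = json.dumps(collection, indent=2)
```

Keeping longitude before latitude matters here, since swapping the two is the most common reason a feature lands in the wrong place on the map.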

We used “Buildings.Spaces” as the main object in nodegoat, which captures each space through the fields name, notes, references, and within buildings. By linking it to the sub-object in Statues.Statues, we were able to establish the relationship between statues and buildings.

Over the course of this semester, we plan to expand and refine the existing visualization, and eventually to begin connecting it to the other databases that constitute the project. This expansion will be done in partnership with, and under the oversight of, Professor Alice McMicheal from Michigan State University.

by Ruishi Wang and Will Markowitz


The “People” object in the database is made up of references to other prosopographical databases covering Constantinople: we are not doing our own prosopographical research for this prototype, as the goal is to combine existing data. To start, we used data from the PMBZ (Prosopography of the Middle Byzantine Period Online). We began with a list of people received from Ivan Maric, a Byzantine Postdoctoral Research Fellow from the Seeger Center for Hellenic Studies. Those people will be used to build the events that Maric studies into the database. However, to build connections between the different categories, we also needed to find and connect people that other objects referenced. For example, the Manuscripts object might have a copyist, and the Literature object might have an author. To figure out how we were going to relate different objects in the database, we needed people from each category. Unfortunately, the PMBZ only covers the years 641-1025 AD, so we also used four other databases: the PBW (Prosopography of the Byzantine World), Pinakes, VIAF (the Virtual International Authority File), and the DO (Dumbarton Oaks) Byzantine Seals Collection. Cross-referencing different scholarly mentions of these people and entering them into the database (there are 90 entries as of January 2024) stretched into winter break. The next step for the People category is to add a system of occupations, and at that point it will be largely finished for the purposes of the prototype stage.

by Alex Williams


For literature, the first step was to identify an initial group of documents that would be the “test run.” This initial group was based on the manuscript project’s “Dated Lake Manuscripts” spreadsheet. I took the Pinakes URL for each manuscript and created a new spreadsheet that would become the initial repository for the data from Pinakes. In Pinakes (an online database of pre-16th century Greek texts), the “content” section at the bottom of each manuscript’s page was copied into this spreadsheet. The Pinakes website is entirely in French, so the automatic Google Translate feature in Google Chrome was used. Once all data from Pinakes was found, the works were searched for in the Thesaurus Linguae Graecae, or “TLG” (a digital collection of all surviving Greek literature from antiquity to the present). This led to challenges, as the names of both the works and the authors were largely not standardized, and some of the titles of the works in Pinakes were not specific enough to determine the match in the TLG. This spreadsheet was then sent to the outside researcher working on this project, Rue Taylor, for assistance in resolving the inconsistencies. While waiting for Rue, I uploaded the spreadsheet to nodegoat, creating a single data model, “Lit.Literature,” that included the title in the original language (either Greek or Latin), the title in English, the author’s name according to Pinakes/TLG, the standardized author’s name (usually found through searching the Pinakes/TLG name on Google), the TLG reference number, and the URLs from both Pinakes and TLG. Not all of these were filled out for each work, however. I think the biggest challenge was the beginning of the project, when I was still working out what made literature and manuscripts different projects. I eventually decided to just get the information and determine the best way to present it on nodegoat and to not focus as much on the abstract concepts, which made the project a lot easier to work with.

by Olivia Keyes


For manuscripts, we created a dataset to organize three types of data. The first is the descriptive information about the manuscript itself, specifically the date, call number, and primary contents: in short, the record of the manuscript’s existence. These manuscripts are identified in their sources by a Lake number and are collected in the Pinakes database, where each can be found through a Pinakes URI. The second data type is the location, often the institution, that these manuscripts belong to. Along with the latitude and longitude, we record the date of creation of these locations, usually religious buildings, which helps us identify the period that a manuscript belongs to. Finally, the manuscript copyist makes up a set of information, identified by their Scribe Name, that is, the name attached to the manuscript which they have copied. These three elements describe the intellectual links between people, institutions, period, and geography. The next step in this area is to integrate “copyists” with “people,” and “manuscript places” with “buildings” in general, in order to give a more holistic view of the connections that manuscripts make between people, period, and place.
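The three data types described above could be sketched as linked records. This is only an illustration in Python, not the nodegoat data model itself: the class and field names are assumptions drawn from the description, and the example values are placeholders rather than real catalogue data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManuscriptPlace:
    """The institution or building a manuscript belongs to."""
    name: str
    founded_year: Optional[int]  # date of creation of the location, if known
    latitude: float
    longitude: float


@dataclass
class Copyist:
    """The scribe, identified by the name attached to the manuscript."""
    scribe_name: str


@dataclass
class Manuscript:
    """Descriptive information about the manuscript itself."""
    lake_number: str
    pinakes_uri: str
    call_number: str
    date: str
    primary_contents: str
    place: Optional[ManuscriptPlace] = None
    copyist: Optional[Copyist] = None


# Placeholder values only; none of these are real records.
example_place = ManuscriptPlace("Example Monastery", None, 0.0, 0.0)
example_manuscript = Manuscript(
    lake_number="Lake 000",
    pinakes_uri="https://example.org/placeholder",
    call_number="MS 0",
    date="unknown",
    primary_contents="placeholder contents",
    place=example_place,
    copyist=Copyist("Example Scribe"),
)
```

Holding the place and the copyist as separate records, rather than as plain text fields on the manuscript, is what would later allow them to be merged with the broader “buildings” and “people” datasets.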

by Charlotte Seal


CDER has been looking to create a repository of important statues from Constantinople. In nodegoat, we have created data models that express relationships between statues and their specific locations within the city. We have been using Sarah Bassett’s The Urban Image of Late Antique Constantinople for entering descriptions about the statues. These include commentary from primary sources and from Bassett herself, who often refers to other scholars in her analysis. Our data model takes note of this difference between primary and scholarly sources in Bassett’s text. The statues are located in various Constantinople landmarks, such as the Hippodrome or Chalke Gate. This has allowed us to seamlessly relate our work to another project that specifically deals with historic spaces in Constantinople.

by Zaray Dewan


We organized the CDER dataset to include most of the elements in the already existing Byzantine seals database at Dumbarton Oaks, organized by Professor Jonathon Shea, who has also been advising this portion of the project. We started our object dataset by including most of the DO database’s information, but as the seals’ connection to the “People” and “Places” objects became evident we focused on the descriptions that linked the seals to those categories. Most seals carry inscriptions and images featuring people holding offices of various kinds, as they were often used in a manner of authentication (of documents, identity, etc.). We began inputting the seals by filtering those which came from offices located within the Great Palace (as suggested by Prof. Shea) and then cross referencing the people depicted to the PMBZ (Prosopography of the Middle Byzantine Period Online) database, as the Dumbarton Oaks database has started to do this with some seals but not the majority. To continue organizing the dataset, the eventual category of “Offices” will be useful in pinpointing more exact locations of offices within the Great Palace and beyond, and we will be expanding to a catalog of coins as well.

by Arushi Khare

Data Insights: A Student Speaker Series

On April 2, 2024, the Travelers Lab had the honor of presenting to esteemed professors and members of the College of Letters (COL) and QAC departments. Our presentation centered on summarizing the CDER project, providing an opportunity to showcase our research achievements and outline our future objectives.

Olivia began with an overview of the project and its previous iterations, and discussed the issues and challenges with the ways that the data was initially presented, focusing on the geographical inaccuracies and the lack of readability for a wide audience. Will provided context for what nodegoat is: a relational database that allows for temporal and spatial connections between objects.

To show how this works for the CDER project, a screen recording of last semester’s data model was played. Moving on to the data itself, Zaray pointed out how this project is, at this point in time, creating data from documents and other databases that aren’t necessarily quantifiable in a table. We are also approaching the data input process from a historical perspective, focusing on what historians would be interested in from each category of data. Ruishi then spoke about how we see connections between the data within nodegoat, and showed this [see below] screenshot of the connection visualizer in nodegoat.

To conclude the presentation, Alex explained the interactions between history and data collection, noting that one of the goals of this project is to create a database that can serve as a model for data usage in a historical context and presentation. Daniel Feldman, part of the Chronicles: Text to Data project team alongside Arla Hoxha, Yinka Vaughn, and Diana Tran, then provided a comprehensive update on their ongoing project. He discussed their use of ArcGIS for analyzing historical events by extracting information from textual references such as events, individuals, and chronicles.

By Olivia Keyes
Class of 2025

Travelers Lab Presents at the Fall 2020 New England Historical Association Meetings

At the Fall 2020 meeting of the New England Historical Association, held virtually, members of the lab provided the papers for a panel called “Traveling in the Middle Ages: Using Digital Methods and Spatial Analysis for Historical Research.” Chaired by Ella Howard of the Wentworth Institute of Technology in Boston, it was notable for featuring a co-written paper by Wesleyan and Lab alumnus Connor Cobb ’18 as well as three other lab members. The papers were “Women at the Common Law: Travel and Gender in Thirteenth-Century English Courts,” by Gary Shaw and Connor Cobb, Wesleyan University; “The Camino de Santiago: Student Researchers and Creating a Database for Spatial Analysis,” by Sean Perrone of Saint Anselm College, who organized the panel and is in fact the President of the New England Historical Association; and “Medieval Travel as a Big Data Problem,” by Adam Franklin-Lyons, who is now notably based at Emerson College in Boston. All told, the session exemplified the lab’s collaborative character, the goal of integrating students (and former students) into our work, and our interest in combining innovation in pedagogy, interest in the theoretical and methodological challenges of such work, and an ongoing commitment to classical historical scholarship.

Contextualizing COVID-19 in the NY City Region: a Comparative Approach

Text and data visualizations by Jesse W. Torgerson; data collection by Rachel Chung (’20) and Ezra Kohn (’20); mapping by Grant van Inwegen (’20) and Jesse Simmons (’21) with assistance from Kim Diver.

Please see our introductory post. The Traveler’s Lab at Wesleyan University has applied our movement-focused approach to medieval history to contextualizing New York City’s tragic status as the “epicenter” of the global COVID-19 pandemic.

The COVID-19 pandemic has created a “tidal wave of data.” The struggle for scientists and the public at large is not how to find this data but how to contextualize and understand it. A common answer is comparison, but comparison is methodologically tricky. Our use of comparison is informed by our lab’s engagement with the Practices of Comparing project at Bielefeld University (see Johannes Grave’s reflection on this “incomparable” pandemic).

In this post we work steadily through a series of comparisons to draw a sobering conclusion. Even as New York City and the “Tri-State” region immediately surrounding it (denoted in the map below) have seen significant and steady declines in the day-by-day tallies of new COVID-19 positive tests and fatalities, the region’s levels of infection still remain at or above the peak levels found in the two countries most deeply hit by COVID-19: Italy and Spain. We conclude that it would be wise to take these comparisons into consideration as the states of New York, Connecticut, New Jersey, and Pennsylvania each consider imminent re-openings.

Click for interactive map.

Our previous post on COVID-19 in the NYC Commuting Region focused on our creation of the map visualization linked above. We described how we chose to collect and analyze COVID-19 testing and confirmed fatality data from a region defined by patterns of travel and movement rather than political borders. Our comparative map presented relative infection rates in each county to neighboring counties, and between past and present.

Our second approach, analyzed here, identified a means of comparing our region as a whole to outbreaks of COVID-19 in other regions that might be considered similar. We identified the nearly-contemporary COVID-19 outbreaks in Italy and Spain as the best comparisons (or, comparata).

In what follows we first present visualizations of the data behind our map. We will then suggest how this data might become coherent and comprehensible by comparing it to the same data from Spain and Italy. Finally, we note that differences in testing rates and in methods for counting fatalities make all these data technically incommensurate. Nevertheless, historical comparisons are ultimately always associative rather than predicative: historians cannot ever achieve the perfect tertium comparationis demanded by Early Modern logicians. Even so, comparisons serve the historical goal of providing us with context, which is desperately needed to make sense of a present that seems unprecedented.

Visualizing Testing and Fatality Data from the NYC Commuting Region

We should first define our region of study. Instead of conforming to political or natural geographic boundaries, we defined our region by human travel. Beginning with a map of freeways and rail lines, we identified a list of counties within an approximately 2-hour drive or train ride from New York City as the outer “commuting region” of the metropolis. Since the virus spreads through human movement and contact, rather than according to any other factor, this seemed to us the only reasonable means of defining a region in which to study the impact of a virus on a population. The list of counties this encompassed can be found below, in our footnotes. In total the region we have defined as the New York City Commuting Region consists of 33,929,534 residents within 28,893.3 mi².

Our first visualization presents a cumulative tally by day of the total number of positive COVID-19 tests, colored by the state to which each county in our region belongs.

Click image for an interactive graph.

This visualization reveals apparently positive news. The use of stay-at-home and social-distancing measures to “flatten the curve” is working: the steep climb of the bars in the three weeks between April 4 and April 25 (from 150k to 400k) has notably slowed in the three weeks since (from 400k to 510k).

This same data rearranged to show only each day’s newly reported cases shows an even more encouraging trend:

Click image for an interactive graph.

From a peak of nearly 23,000 new cases on the single day of April 5, on May 9 and May 10 our region saw two successive days with fewer than 5,000 new cases. Nevertheless, within this positive trend, comparing the number of cases between the New York and New Jersey counties indicates that the decrease is largely driven by the success of New York counties, while Connecticut, Pennsylvania, and especially New Jersey are still discovering relatively similar numbers of new cases each day. Update: from April 26, New Jersey’s counties have matched or surpassed those we are studying in New York in terms of confirmed daily new COVID-19 cases (see here).  As a whole 1.6% of the region’s population has tested positive for COVID-19 (543k out of 33.9m).
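The two calculations used in this section, turning a cumulative tally into daily new cases and expressing positive tests as a share of the population, can be sketched as follows. This is a minimal illustration rather than the lab's actual workflow: the function names are hypothetical and the short cumulative series is made up, while the regional totals (543k positive tests, 33,929,534 residents) are the figures quoted in the text.

```python
def daily_new(cumulative):
    """Turn a cumulative tally into per-day counts of newly reported cases.

    The first day's count is taken to be the first cumulative value,
    i.e. the tally is assumed to start from zero.
    """
    new_counts = []
    previous = 0
    for total in cumulative:
        new_counts.append(total - previous)
        previous = total
    return new_counts


def percent_positive(positive_tests, population):
    """Positive tests as a percentage of the total population."""
    return 100 * positive_tests / population


# Made-up cumulative series, for illustration only.
example_cumulative = [100, 250, 450, 500]
example_new = daily_new(example_cumulative)

# Regional figures quoted in the text.
region_share = percent_positive(543_000, 33_929_534)
print(f"{region_share:.1f}% of the region has tested positive")
```

A cumulative chart can only rise or flatten, so the differenced series is the one that makes a decline in new cases visible.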

In other words, this data would suggest that the success of New York City proper in limiting the spread of COVID-19 may be masking the fact that the region as a whole remains deeply infected. This conclusion is backed up by the more absolute numbers of COVID-19 attributed fatalities.

Click image for an interactive graph.

The counties in New York State have seen a dramatic decrease in what was for weeks a horrific number of new daily fatalities. Nevertheless, the number of daily fatalities reported by New Jersey continues at a relatively steady pace (see chart). If we consider our integrated region as a unit, the situation remains much more serious than when we imagine states as isolated from one another. As of May 10, 2020, the counties of our designated region have confirmed 33,442 COVID-19 fatalities. To contextualize where this places us in relation to other regions devastated by the virus, we turn to our comparisons with Italy and Spain.

Visualizing Comparisons between the NYC Commuting Region, Italy, and Spain

Without a comparative context it is impossible to understand what the numbers we have discussed mean. Even though we can identify a downward regional trend in terms of both new cases per day, and in terms of fatalities, where are we compared to other regions that have experienced a large portion of the population testing positive for COVID-19?

To try to answer that question we will compare the NYC commuting region (population 33.9m) to two others: Italy (60.4m) and Spain (46.9m).

These regions are most comparable for all having suffered dramatically from a staggering number of COVID-19 infections. They are significantly different in terms of land mass, population, and population density, and so all comparisons must be discussed with at least this caveat in mind (leaving aside all discussion of differences in health care systems, demographics, etc.).

The New York City commuting region we have focused on is both smaller and more densely populated than Italy, and especially more so than Spain. Our region has a total population of 33.9 million within 28,893 mi², or 1,174 persons/mi². Spain is far less densely populated, with approximately 240 persons per each of its 195,363 mi². Even the most densely populated region of these European countries, Northern Italy, only has 599 persons/mi² (27.8 million persons within 46,430 mi²).
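The density figures above follow directly from dividing population by land area. The short sketch below reproduces them from the populations and areas quoted in this post; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
def persons_per_sq_mile(population, area_sq_mi):
    """Population density in persons per square mile."""
    return population / area_sq_mi


# (population, area in square miles), as quoted in the text.
regions = {
    "NYC commuting region": (33_929_534, 28_893.3),
    "Spain": (46_900_000, 195_363),
    "Northern Italy": (27_800_000, 46_430),
}

densities = {
    name: round(persons_per_sq_mile(population, area))
    for name, (population, area) in regions.items()
}

for name, value in densities.items():
    print(f"{name}: {value} persons per square mile")
```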

Thus, the primary reason for comparing these regions is not their demographic comparability, but for the need to demonstrate just how devastatingly the COVID-19 virus has set into the population of the New York City region. COVID-19 has had a devastating impact on the two entire countries of Italy and Spain, both larger than our region in terms of population and land mass. Nevertheless, the absolute reported numbers of infections and fatalities from these countries pale in comparison to those from the much smaller region surrounding New York City.

Click image for an interactive graph.

While the downward trend of new cases for the New York region is still evident, it is only in the previous week (since May 4) that the number of new daily cases has consistently dipped below the peak number of cases per day reported by Spain (which is, in turn, even higher than Italy’s peak). Italy and Spain have only seen peaks of daily new cases hovering around 5,000 or 8,000 per day, respectively. On the other hand our NYC region has seen many days of over 15,000 new cases per day, only consistently maintaining numbers below 8,000 since May 2. Our region’s numbers of new daily cases still remain beyond the levels of Italy’s peak. This is a staggering comparison when we consider that New York State is beginning to “reopen” on May 15.

These much larger numbers of new cases are certainly connected to the significantly larger number of COVID-19 fatalities that the New York City commuting region has suffered.

Click image for an interactive graph.

On ten different days, our region has seen over one thousand fatalities in a single day. Neither Italy nor Spain, with their much larger populations, ever experienced a single day with this level of devastation from COVID-19. The cumulative result is a staggering and sobering comparison.

Click image for an interactive graph.

As time moves on, the New York City Region’s smaller geography, its much more dense and interconnected population, the much deeper degree to which it has become infected, and its own looser social-distancing approach to “lockdown” will likely mean that the spread of the virus through the population continues to behave uniquely.

Even as reports have for weeks declared that New York State alone has more reported cases than any country (besides the United States itself), and that the wider New York City area is the current global epicenter of the spread of COVID-19, the scope of the spread of the disease in the region immediately surrounding New York City remains unrecognized and under-reported.

Reporting continues to fall along traditional state-by-state divisions, but as demonstrated in our map and the visualizations above the spread of COVID-19 through counties that can be considered within “commuting distance” of New York City (~2 hours) reveals quite clearly that tallying and reporting cases and fatalities according to state borders is an inhibition to understanding the true spread of this virus.

We have presented our studies not to strike fear, but to be an aid to productive action. We hope that better understanding of the data we have collected concerning the ongoing spread of COVID-19 will result in soberness and seriousness in addressing how to slow and stop the toll which COVID-19 has and will continue to wreak in our area. We urge our readers to action, and compassion.

Conclusions: the Problems with Comparison, and its Value

Positive tests seem to be perfectly commensurate: different countries use different tests, but these have not been reported to be significantly more or less effective in producing accurate results. On the other hand, the sums of positive tests are not comparable because Spain and Italy appear to have a positive test rate of approximately 8-12%, whereas the counties in our study have positive testing rates of anywhere from 23% to 45%. This means that the population of the NYC commuting region is still not being tested at a high enough rate to determine what comparing our total numbers of positive tests to either Italy or Spain demonstrates. If the states of New York, New Jersey, Connecticut, or Pennsylvania were administering enough tests to have only an 8-12% positive rate, how much higher would our total number of positive tests be than in those countries?

Conversely, comparing fatality rates seems to be initially problematic. Even within the four US states that we drew our data from, fatalities are reported differently. New York and New Jersey report COVID-19 fatalities only when the individual received a positive COVID-19 test prior to dying. Connecticut counts these situations as well as when the death certificate lists COVID-19, which can be a discretionary decision. Pennsylvania has seen its fatality counts swing down or up as it revisits its approach to reporting. The CDC continues to allow states discretion in making these decisions. Of course, the same problems apply to the recording of data from Italy and Spain. Nevertheless, it is all but certain that by any reasonable method of attributing fatalities to COVID-19, we are currently in no danger of overcounting the number of deaths this virus has brought.

This is supported by comparing our two sets of data to each other. Recent studies indicate that the “true” fatality rate of COVID-19 is around 1.3% (counting all those infected, including asymptomatic cases). If we consider the COVID-19 attributed fatality rate per positive COVID-19 test of the counties we have studied, we can get an idea of whether each county truly knows how deeply infected its own population is. For instance, in the Pennsylvania counties we have studied, a 2.2% fatality rate per positive test indicates that there are likely nearly twice as many cases as have been identified (2.2 / 1.3 ≈ 1.7). On the other end of the spectrum, New Jersey’s 6.7% fatality rate indicates that there are likely five times as many cases as have been recorded (6.7 / 1.3 ≈ 5.2).
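The reasoning in the paragraph above can be made explicit with a small calculation. If fatalities are counted more completely than infections, then the ratio of the observed fatality rate per positive test to the assumed “true” rate of about 1.3% approximates how many infections exist for every confirmed case. The sketch below is an illustration under that assumption only; the function name is hypothetical, and the rates are the ones quoted in the text.

```python
# Assumed "true" fatality rate across all infections, in percent (from the text).
ASSUMED_TRUE_FATALITY_RATE = 1.3


def implied_undercount(observed_rate_pct, true_rate_pct=ASSUMED_TRUE_FATALITY_RATE):
    """Estimate infections per confirmed case.

    With D deaths, confirmed cases are D / observed_rate and total
    infections are D / true_rate, so their ratio is observed / true.
    """
    return observed_rate_pct / true_rate_pct


pennsylvania = implied_undercount(2.2)  # roughly 1.7
new_jersey = implied_undercount(6.7)    # roughly 5.2

print(f"Pennsylvania counties: about {pennsylvania:.1f}x the confirmed cases")
print(f"New Jersey: about {new_jersey:.1f}x the confirmed cases")
```

The estimate is only as good as the assumed true rate and the completeness of the fatality counts, so it is best read as an order-of-magnitude check rather than a measurement.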

Comparison cannot give us an absolute or exact explanation of what is happening around us. Nevertheless, this is not the goal of comparison. Comparison gives us a relative idea of what we are experiencing, or what has happened. In this case, working through several comparisons seems to make it quite clear that the population of the New York City region is as yet much more deeply infected with COVID-19 than Italy, Spain, or any other comparably sized population in the world. Furthermore, our practice of comparison has also made it clear that even this sobering statement is not sober enough. Until we have a much lower percentage of positive to negative COVID-19 test results, we will not know how deeply the population surrounding New York City is infected. There is no perfect tool, but comparison can help us to understand how deeply the virus has established itself, and to formulate what that might mean for current behaviors, future planning, and not least of all to speak to our desire to contextualize what we are all experiencing: to mourn our losses, to commemorate those who have fallen, and to work together to save those we can.

Footnotes: Sources and Notes

Since March 24th we have collaboratively collected our data of county-by-county confirmed COVID-19 cases on a daily basis from the official updates provided by the Departments of Health of  New York, New Jersey, Connecticut, and Pennsylvania. We periodically check for errors by back-checking our data against the data published by the New York Times, and by the crowd-sourced reporting of 1Point3Acres. Data for positive tests and fatalities for Italy and Spain were drawn from Our data on the population and size of each US county was derived from the US Census data contained in the Esri feature layer “USA Counties.”

Notes on idiosyncrasies in the data.

From March 1-March 16 our map records the total number of cases in New York City, as reported by New York State, without making distinctions for the five counties (boroughs) within the City: Bronx, Kings (Brooklyn), New York (Manhattan), Queens, Richmond (Staten Island). For these dates, we noted all NYC fatalities and cases in New York County (Manhattan).

For March 17 to April 5 we used the data provided by New York City Health to distinguish the confirmed cases for each of the five New York City counties. This meant, however, that for this range of dates, we drew numbers for these five counties with a different time stamp than the published state data on the number of confirmed cases in NYC as a whole. We decided that the greater articulation achieved was more valuable than perfectly matching numbers.

From April 6 on, New York State began including distinct data for the five New York City counties (rather than grouping them all together as “New York City”), and we thus returned to using this state data for our tabulations rather than the New York City data.

For April 16 and April 17, New York State produced no fatality counts, except for a total number of fatalities from NYC for each of those days. In keeping with the practice noted above, we added these numbers to New York County (Manhattan). This results in wildly varying county-specific statistics for New York County from April 15 through April 18.

List of counties studied:
New York State: Bronx, Dutchess, Kings (Brooklyn), New York (Manhattan), Nassau, Orange, Putnam, Queens, Richmond (Staten Island), Rockland, Suffolk, Sullivan, Ulster, Westchester
New Jersey: All
Connecticut: All
Pennsylvania: Berks, Bucks, Carbon, Chester, Dauphin, Delaware, Lackawanna, Lancaster, Lebanon, Lehigh, Luzerne, Monroe, Montgomery, Northampton, Philadelphia, Pike, Schuylkill, Wayne

COVID-19 in the NYC Commuting Region: A Travelers’ Lab Study

Updated: April 30, 2020. Follow-up post (5/12/2020) here.

by Rachel Chung, Grant van Inwegen, Ezra Kohn, Jesse Simmons, and Jesse W. Torgerson

The Traveler’s Lab at Wesleyan University studies the movement of people and objects during the Middle Ages. Our focus on travel brings overlooked and unrecognized realities to standard historical narratives. In mid-March, as Wesleyan’s campus shut down in response to the COVID-19 pandemic, the Theophanes Project began applying our movement-focused approach to the present crisis. What follows are some results of our collaborative work. Data collection by Rachel Chung (’20), Ezra Kohn (’20), and Prof. Jesse Torgerson. Mapping by Grant van Inwegen (’20) and Jesse Simmons (’21) with assistance from Prof. Kim Diver.

New York City’s Battle with COVID-19 is not New York City’s Battle Alone

Click the image to enter our visualization of the regional context of the spread of COVID-19 around New York City. Following is an explanation and a brief guide.

As early as March 22 New York City was declared an epicenter of the global COVID-19 pandemic. However, reporting has almost entirely covered New York City in isolation from its surrounding region, or at best in the context of New York state as a whole. All other “epicenters” of the pandemic (Hubei Province, Northern Italy, Central and Northern Spain) are studied and reported on as regions, not single cities.

The numbers of cases and fatalities from New York City alone give an inaccurate picture of the scope and nature of the region’s affliction. While New York City may be the center, it is the travel region immediately surrounding the city that provides the true context of how COVID-19 has spread and is spreading to, and from, the City. We created a regional context for NYC based on human movement rather than artificial political borders. Ignoring the borders of states entirely, we picked counties that were within an approximately 2-hour driving time from NYC, incorporating counties from not only New York, but New Jersey, Connecticut, and Pennsylvania (see our footnotes below for a full list of counties depicted).

The Traveler’s Lab COVID-19 map addresses how this pandemic has settled into the entire commuting region around New York City by providing three aspects either unavailable or unclear in visualizations such as the Johns Hopkins, UCONN, or NY Times maps:
(1) a time slider that re-presents the development of the crisis since March 1, 2020
(2) counties shaded by percentage of the population tested positive for COVID-19 rather than bubbles representing total case counts
(3) a demonstration of how this pandemic has actively spread along lines of travel rather than according to political boundaries

Traveler’s Lab research projects start with the premise that cities are never isolated, static collections of residents. Cities are centers of exchange and travel networks. Informed by geographic and historical methods, this approach provides a truer context for human interactions. Our geographic visualization of the spread of COVID-19 thus began with road and rail networks. Onto these we overlaid official county-by-county data on confirmed COVID-19 cases. Since our approach is also historical rather than journalistic, our “time slider” map preserves data from the first reported case in New York City on March 1, 2020, up to the present (updated every three days).

Features of the Traveler’s Lab COVID-19 Map

Since our methodology is based on human travel rather than political regions, we depicted major freeways, highways, and commuter rail lines out of New York City rather than state borders. Ideally we would have shaded infection rates by municipality rather than county, but such specificity surpassed our data collection abilities. To avoid an overly-cluttered map we only identified a few cities for purposes of orientation. Please see our metadata here.

The dynamic features of the map permit historical and comparative study of the counties in this region. One of the notable conclusions of our comparison is that while the hospitals of New York City are certainly the most overwhelmed with the daily influx of a massive total number of newly infected patients, as of April 24th the counties of Rockland, Richmond, Westchester, and Nassau are in fact more deeply afflicted when considering infection rates relative to population. Similarly, Hudson (NJ), Passaic (NJ), Union (NJ), and Orange (NY) Counties are as deeply afflicted as the communities of Queens and Staten Island.

Users may pursue their own comparative questions about the day-by-day spread of the pandemic through the region by clicking “play” on the time slider (bottom right).

To toggle to a particular day, click pause and either click the forward or backwards buttons, or slide the cursor forward or backward as desired. To change how quickly each new day loads, click the “1x” speed toggle.

In addition to visualizing the spread of COVID-19 in the NYC region since March 1, users may also study the daily data for every county with infections present.

When a particular county is clicked for any particular day (as Nassau for April 24, above), users will see: (a) new cases reported in that county on that day; (b) total cases for that county since March 1; (c) number of residents with confirmed COVID-19 per 10,000 residents; (d) total fatalities for that county since March 1; (e) the percentage of confirmed cases that have resulted in fatalities; (f) the county’s population as of 2019.
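The two derived figures in the county pop-up, items (c) and (e), follow directly from the raw counts. A minimal sketch of that arithmetic in Python is given here; the function names are hypothetical and the numbers are made up for illustration, so this is not the map's actual code or data:

```python
def cases_per_10k(total_cases: int, population: int) -> float:
    """Confirmed cases per 10,000 residents (item c)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_cases * 10_000 / population


def fatality_percentage(total_fatalities: int, total_cases: int) -> float:
    """Percentage of confirmed cases that resulted in fatalities (item e)."""
    if total_cases == 0:
        return 0.0  # no confirmed cases yet, so no meaningful percentage
    return total_fatalities * 100 / total_cases


# Illustrative values only; these are NOT figures for any real county.
example_rate = cases_per_10k(total_cases=2_500, population=1_000_000)
example_cfr = fatality_percentage(total_fatalities=50, total_cases=2_500)
```

Normalizing by population in this way is what lets a small county be compared directly with New York City, which raw case totals alone cannot do.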

Addressing the Spread of COVID-19 by Facilitating a Collaborative Regional Approach

COVID-19 is spreading through our single population rather than through distinct states. Despite the best intentions of local representatives, governors, and media outlets, the extent to which the regions immediately surrounding New York City are all enveloped in a single crisis is not yet fully recognized. Governors Andrew Cuomo of New York, Ned Lamont of Connecticut, Phil Murphy of New Jersey, and Tom Wolf of Pennsylvania have all made extensive efforts to sustain dialogue and work collaboratively. Even so, we believe it remains urgent to continue to develop and sustain unified policies and plans that are systematically enforced across the region.

We hope that our visualization can contribute to enhancing the focus of collaborative efforts by making the inter-connectedness of our localities evident, especially as we move into a period where regions begin considering how to “reopen.” By presenting historic data rather than only live data we also hope to provide our fellow residents the ability to maintain a historical perspective on how this situation has developed, and continues to develop. Finally, we hope the context of an accurate bigger picture might serve as evidence of the need for collaboration and mutual aid in other regions where the crisis is only beginning to unfold.

Footnotes: Sources and Notes

Since March 24th we have collaboratively collected our data of county-by-county confirmed COVID-19 cases on a daily basis from the official updates provided by the Departments of Health of New York, New Jersey, Connecticut, and Pennsylvania. We periodically check for errors by back-checking our data against the data published by the New York Times, and by the crowd-sourced reporting of 1Point3Acres. Our data on the population and size of each county was derived from the US Census data contained in the Esri feature layer “USA Counties.”

Notes on idiosyncrasies in the data.

From March 1-March 16 our map records the total number of cases in New York City, as reported by New York State, without making distinctions for the five counties (boroughs) within the City: Bronx, Kings (Brooklyn), New York (Manhattan), Queens, Richmond (Staten Island). For these dates, we noted all NYC fatalities and cases in New York County (Manhattan).

For March 17 to April 5 we used the data provided by New York City Health to distinguish the confirmed cases for each of the five New York City counties. This meant, however, that for this range of dates, we drew numbers for these five counties with a different time stamp than the published state data on the number of confirmed cases in NYC as a whole. We decided that the greater articulation achieved was more valuable than perfectly matching numbers.

From April 6 on, New York State began including distinct data for the five New York City counties (rather than grouping them all together as “New York City”), and we thus returned to using this state data for our tabulations rather than the New York City data.

For April 16 and April 17, New York State produced no fatality counts, except for a total number of fatalities from NYC for each of those days. In keeping with the practice noted above, we added these numbers to New York County (Manhattan). This results in wildly varying county-specific statistics for New York County from April 15 through April 18.

List of counties studied:
New York State: Bronx, Dutchess, Kings (Brooklyn), New York (Manhattan), Nassau, Orange, Putnam, Queens, Richmond (Staten Island), Rockland, Suffolk, Sullivan, Ulster, Westchester
New Jersey: All
Connecticut: All
Pennsylvania: Berks, Bucks, Carbon, Chester, Dauphin, Delaware, Lackawanna, Lancaster, Lebanon, Lehigh, Luzerne, Monroe, Montgomery, Northampton, Philadelphia, Pike, Schuylkill, Wayne

Rewriting the Historical Geography of Rome with the Chronicle of Theophanes

Text by Rachel Chung (’20), Grant van Inwegen (’20), Ezra Kohn (’20), Nathan Krieger (’20), and Jonah Skolnik (’21).
Mapping by Jesse Simmons (’21) and Grant van Inwegen (’20).
Data visualizations by Weiliang Song (’20) with assistance from Rachel Chung (’20).

This blog post presents a paper that the Theophanes Project team wrote to present at the 2020 CTW (Connecticut College, Wesleyan University, Trinity College) Undergraduate Symposium in the Arts and Humanities, originally planned to be held at Wesleyan University on March 28, 2020. The global pandemic of COVID-19 necessitated that the symposium be cancelled. In lieu of presenting our paper, we have converted it into this blog post. The paper here summarizes the goals and issues of the Geography and Narrative in the Chronicle of Theophanes project as it currently stands, presenting some of our most recent work and analyses. For a description of the project’s origins and evolution, please read the previous blog posts.

Nathan Krieger and Ezra Kohn with Jonah Skolnik

Standard narratives of the Roman empire tend to rely on strict periodization, and though this can take several forms, each brings with it a strong perspective on how to read history and Rome. Many periodizations carry with them certain biases, or at least don’t show the full picture. Since these ways of categorizing eras are decided by modern historians, there is a certain amount of hindsight bias that must be recognized.  

Resulting maps of the Roman empire, regardless of time period, then display these biases. Choices such as how to split up eras, which cities to highlight prominently, or even how to ‘crop’ the map (i.e. which section of the world to put into focus) all reflect certain ways or schools of reading Roman history. One of the main goals of our project is to find a way of taking the histories and chronicles written during the Roman empire and to translate how those histories of the time narrated their world into forms that we can understand.

Given the fact that the empire did not write narratives in the way we write them nor draw maps in the way we draw them, we end up confronting problems of historical translation. The questions we have to ask are, how would medieval Byzantines conceptualize the world they were living in? And, how can we represent this understanding from a modern viewpoint? 

To better understand the empire from a medieval perspective, we turn to arguably the most comprehensive historical text of the time: the Chronicle of Theophanes, a tome of a book recording each year between the years of 284 and 813. The chronicle was only part of a greater narrative written by George Syncellus, beginning its timeline at the Garden of Eden. Each section of the chronicle records the emperor of that time; the reigning bishops in the cities of Rome, Constantinople, Jerusalem, Alexandria, and Antioch; the political events of the year; the outcome of wars; the construction and destruction of cities and monuments; extreme weather; and supernatural phenomena, as well as even more detailed and esoteric information.

One can’t really read the chronicle as literature or a standard textbook, because it’s almost impossible for any one person to track the trends, names, place names, or biases of such an overwhelmingly dense source. To solve this problem, our research lab uses digital methods to visualize the chronicle’s content, converting otherwise undetectable trends into understandable mediums. As the chronicle includes geographical information for every single year between 284 and 813, we can construct exact visualizations charting change in the empire over time. 

How does the Chronicle’s Historical Geography differ from our Textbooks?
Grant van Inwegen with Jonah Skolnik

Something to consider when attempting to understand the medieval perspective of the Roman Empire is that people living in it did not have an aerial map of their empire’s territory in their minds. Romans identified themselves with their cities rather than understanding themselves as living in the territorial domain of the Roman Empire. Romans didn’t think of space in terms of distance so much as travel time. Despite its closer proximity to Rome, Ravenna seemed further away than Carthage because it took a much longer time to travel across Italy on foot than it did to sail to Carthage.

So, to better understand how people living in the Roman Empire thought of themselves, we tracked mentions of cities in the entire Chronicle of Theophanes. Cities are important because they were the primary way of noting specific locations. For example, the chronicle might describe the location of a battle in terms of which city it took place near.

Cities were also significant to Romans because they used cities as their primary system of governance. In the medieval period, the church used cities as its primary means of organization. To reflect this, we included titles such as “the Bishop of Antioch” in our tracking of city mentions throughout the chronicle.

When we tracked cities, we paired each mention with the year in which it appeared in the chronicle. This is a useful approach for displaying where the drama in historical narratives is occurring over time. For example, we found most of the mentions of Carthage throughout the chronicle in the year 533 A.D. This was because the Byzantine Empire fought the Vandals for control of Carthage in that year, marking the beginning of Justinian’s reconquest of the West.

When comparing traditional textbook maps of the Roman Empire to the maps we made based on city mentions, we notice different themes.

Map 1: A standard historical map of the Roman Empire under Constantine I (r. 306-337)

Map 2: Cities mentioned in the Chronicle of Theophanes during its account of the reign of Constantine I (r. 306-337). Map by Jesse Simmons with Grant van Inwegen.

Traditional textbook maps depicting the period of the rise of Constantine and subsequent “decline of Western Rome” put most of their focus on the West. Constantinople is typically the Easternmost city that is depicted as relevant in this period. However, our maps displaying city mentions in this period show a large clustering of cities in Asia Minor, the Holy Land, and Egypt. In contrast to traditional maps of this period, only a handful of cities West of Greece are mentioned in the chronicle.

Map 3: A standard map of the Roman Empire under Justinian I (r. 527-565), taken from George Ostrogorsky, History of the Byzantine State

The second period we looked at was the reign of Justinian, who was a polarizing figure in textbooks as well as one of the most mentioned emperors in the chronicle. He is often thought of in textbooks as one of the last “Roman” emperors of late antiquity because the empire changes so drastically over the course of his reign to become “Byzantine.” He is well known for his reconquest of the Western Empire. This textbook map heavily emphasizes the Eastern empire surrounding Constantinople, but also gives emphasis to reconquered cities in Italy and Spain.

Map 4: Cities mentioned in the Chronicle of Theophanes during its account of the reign of Justinian I (r. 527-565). Map by Jesse Simmons with Grant van Inwegen.

While our map showed a similar emphasis on the Eastern empire, we found that the drama of the chronicle was emphasized in the clustering of cities in Northern Africa. The textbook map would seem to make Justinian’s reconquest of Africa much less significant (only three cities are labeled) than in the geography discussed by the Chronicle of Theophanes.

The final period we looked at was the period of the so-called iconoclast (or, Isaurian) emperors Leo III and Constantine V (r. 717-775), often thought of as the dark ages of Byzantine history. By this period, the territorial re-conquests of Justinian had all been lost. Standard textbook maps of this period (like Warren Treadgold’s) focus almost exclusively on Western Anatolia, the Southern Balkans, and Greece.

Map 5: A standard map of the Roman Empire under the Iconoclast or Isaurian emperors (717-775), taken from Warren Treadgold, History of the Byzantine State and Society

Our map was fairly similar, only our map also had a significant clustering of cities in the Holy Land. Treadgold didn’t even bother to include cities from the Holy Land on his map, displaying a contrast between traditional historical narratives and the narrative of contemporary Byzantines.

Map 6: Cities mentioned in the Chronicle of Theophanes during its account of the reigns of the Iconoclast or Isaurian emperors (r. 717-775). Map by Jesse Simmons; formatted for the blog with Grant van Inwegen.

A general theme we noticed on our maps that differed from traditional textbook maps was a focus on the frontiers of the empire. Most notably, the Middle East and North Africa seemed to be important regions that our ninth-century history of the Roman Empire, the Chronicle of Theophanes, gave much more significant attention to than did modern textbook accounts.

Map 7: Time sensitive map of all cities and settlements mentioned in the Chronicle of Theophanes with provinces or themata overlaid, and cities colored to show emperor reigning when mentioned. Map by Jesse Simmons with Grant van Inwegen.

To counter typical periodizations of the history of the Roman Empire, we decided to create a time sensitive map that displays cities throughout the empire in the year that they are mentioned in the Chronicle of Theophanes. This map allows the viewer to see a more natural progression of change throughout the empire rather than an abrupt sequence of drastically different maps. The cities are sized by total mentions to display their prominence within the chronicle.

Our map is not a perfect visualization of the narrative. One issue with this map is that the cities are sized based on total mentions throughout the chronicle, when they ought to be sized by mentions within their respective emperor’s reign. This could confuse the viewer who might think that the sizing is based on mentions in that period. Another issue with our approach is that mentions of cities throughout the chronicle do not necessarily denote importance. The writing style of the author might mention a certain city multiple times in a single sentence, when another city that is only mentioned one time in a sentence could hold a similar level of narrative significance.
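The sizing issue described above comes down to which span of years the mentions are summed over. The following is a minimal sketch, assuming a simple list of (city, AD year, count) records; the record layout, the function name, and all counts are hypothetical and do not reflect the project's actual data or GIS workflow:

```python
from collections import Counter

# Reign dates as given in the map captions above.
REIGNS = {
    "Constantine I": (306, 337),
    "Justinian I": (527, 565),
}

# Hypothetical records: (city, AD year of the entry, mentions in that entry).
# Made-up numbers for illustration, not the project's data.
MENTIONS = [
    ("Carthage", 533, 12),
    ("Carthage", 310, 1),
    ("Nicaea", 325, 4),
    ("Nicaea", 540, 1),
]


def mentions_in_span(records, start, end):
    """Sum mentions per city for entries dated within [start, end]."""
    totals = Counter()
    for city, year, count in records:
        if start <= year <= end:
            totals[city] += count
    return totals


# Sizing by the whole Chronicle (what the current map does).
total_sizes = mentions_in_span(MENTIONS, 284, 813)
# Sizing by a single reign (what the map ought to do).
justinian_sizes = mentions_in_span(MENTIONS, *REIGNS["Justinian I"])
```

With these invented records, Nicaea would be drawn larger than its single mention under Justinian warrants if sized by `total_sizes`, which is the confusion the paragraph above warns about.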

Alternative Means of Visualizing Cities and Settlements in the Chronicle
text: Rachel Chung; visualizations: Weiliang Song

Presenting the settlements mentioned by the Chronicle on a GIS map makes visible the geography through which the text’s narrative progresses. However, presenting medieval places on a modern map is the visual equivalent of a translation from Medieval Greek into Modern English. Our GIS maps, as helpful as they are, show us the geography of the Chronicle on our terms. One way we might think about the geography of the Chronicle on its own terms is–as discussed in our first post on “Geography and Narrative in the Chronicle of Theophanes”–to put places mentioned onto a medieval map of the Mediterranean world, such as surviving maps in medieval copies of Ptolemy’s Geography. We have not yet done this, and even if we were to do so it remains an open question how many medievals would have thought of their experience of the world in terms of such maps. We use aerial maps on a daily basis to navigate through errands and trips. So far as we know a medieval person only ever utilized such maps for theoretical, not practical, discussions.

Thus, a fundamental challenge of our project–to re-imagine the geography that the narrative of the Chronicle of Theophanes would create in the minds of its ninth-century readers–is to determine what visualizations could possibly capture that “imagined” geography. 

One method we use to read and represent the geography embedded in the Chronicle is to forgo geographic maps entirely, and to consider the geography as data visualizations. Using data visualizations, we can analyze our geographical information from different contexts and perspectives, whether that’s zeroing in on mentions of bridges during Nikephoros’s reign or taking a bird’s-eye view of all mentions of settlements in the Chronicle.

The geographic data we have collected is grouped into three categories: mentions of individual people, mentions of people groups, and mentions of geography. Within our category of “geography,” we group natural geography (mountains, rivers, etc.), political geography (regions and provinces), and civic geography (settlements of all kinds, and their infrastructure). Our settlement data consists of an index of all the settlements mentioned in the Chronicle and tallies of the frequency with which these settlements are mentioned in each annual entry of the Chronicle. In this blog post, we have only been considering our data on civic geography, or settlements (but not the data on the infrastructure of those settlements, which we will discuss in a subsequent post). 

It should be noted that all visualizations to follow are based on a provisional version of our data, which is still undergoing correction and checking against the original text of the Chronicle. These visualizations were generated out of our data as it stood in December 2019.

Figure 1, below, shows one way in which we began to consider how to understand our data on settlements and the frequency with which the Chronicle mentions each of them. This visualization is a “tree-map.”

Figure 1: Tree Map of Settlements mentioned in the Chronicle of Theophanes. Size reflects number of mentions; colors merely differentiate distinct boxes. Visualization by Weiliang Song.

In the “tree map” above, each box represents a settlement mentioned in the Chronicle. The size of the box reflects the number of times that settlement is mentioned. The majority of the settlements mentioned are only mentioned one or two times across the 530 annual entries of the Chronicle. These fall into the unlabeled squares in the bottom right quadrant of the visualization. A select number of cities–Constantinople, Alexandria, Antioch, Jerusalem, and Rome–clearly dwarf all the other cities in mentions. Another group of cities–including the times Constantinople is called Byzantium, as well as Edessa, Nicaea, Chalcedon, and others–have a notable but not enormous presence in the Chronicle.

There is much that might still improve this visualization. First of all, as with all of our current visualizations, this was produced in order to allow us to continue exploring and checking our data. One surprise from this visualization was the apparent frequency with which a small and relatively insignificant settlement, Damatrys, appeared (below Carthage, above). This surprise prompted us to return to our data, where we discovered that most of the supposed mentions of Damatrys were actually mis-recorded mentions of Damasus, the bishop of Rome. This visualization, like all of our visualizations, remains a means of exploring the data and continuing to improve its reliability.

Nevertheless, even in the still in-progress state of our work, this rudimentary visualization can be thought of as giving us a geography of the Chronicle’s settlements in terms of their familiarity to a reader. We might imagine the “center” of the reader’s point of view to be the most frequently-mentioned settlement in the upper left quadrant: Constantinople. From this point of view, the “nearness” or clarity with which a reader would view the other settlements mentioned in the Chronicle is expressed by their relative size and nearness to the “center” of Constantinople. The visualization in Figure 1 can thus help us see and discover which places might be perceived as “near” to a reader in terms of how familiar the narrative of the Chronicle would make them seem.

On the whole, Figure 1 presents one way of looking at all of our data at once. As such it is overwhelming, at least as a static image. One way to get a better sense of how we might explore this data and get into the experience of a reader is to limit our field or range to the most-mentioned settlements. Figure 2 is a bar-chart that does this. The bar chart uses the exact same data that produced the visualization in Figure 1, but is limited to only the top 15 most-mentioned settlements. 

Figure 2: Bar Chart comparing the number of mentions over the entire Chronicle for the fifteen most-mentioned cities. Visualization: Weiliang Song. Rachel Chung prepared an interactive version of this visualization for the web here.

Mentions of Constantinople outnumber the second-most mentioned settlement, Alexandria, by 288 mentions (making Constantinople’s total 188% of Alexandria’s). Constantinople’s true frequency is actually even higher than this: when labeled Byzantium, it is also the sixth-most mentioned settlement, indicating the outsized prominence of this location in the Chronicle. The division between two groups of cities in accordance with their mentions is accentuated in this graph. The top 5 most mentioned settlements (counting “Byzantium” and “Constantinople” as a single settlement) all have between 200 and 600 mentions. The remaining nine settlements (Chalcedon, Carthage, etc.) in this group each have between 25 and 100 mentions.
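Counting “Byzantium” and “Constantinople” as a single settlement before ranking can be sketched as follows. This is only an illustration in Python: the alias table, the function names, and every tally are assumptions made up for the example, not the project's counts or tooling.

```python
from collections import Counter

# Alternate names that refer to the same place (from the discussion above).
ALIASES = {"Byzantium": "Constantinople"}

# Made-up tallies for illustration; NOT the project's counts.
raw_counts = {
    "Constantinople": 50,
    "Alexandria": 30,
    "Byzantium": 8,
    "Chalcedon": 5,
}


def merge_aliases(counts, aliases):
    """Fold each alias's mentions into its canonical settlement name."""
    merged = Counter()
    for name, n in counts.items():
        merged[aliases.get(name, name)] += n
    return merged


def top_n(counts, n):
    """Return the n most-mentioned settlements as (name, mentions) pairs."""
    return Counter(counts).most_common(n)


merged = merge_aliases(raw_counts, ALIASES)
ranking = top_n(merged, 3)
```

Keeping the alias table separate from the tallies means the original labels stay intact in the data, and the merge can be switched on or off depending on the question being asked.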

Given that Constantinople is our most highly mentioned settlement, we wanted to look deeper into how mentions of that city varied over the progression of the Chronicle. Figure 3 is a scatter plot layered with a line chart. Here the number of mentions for Constantinople is on the x-axis and the year of each annual entry in the Chronicle is on the y-axis. Mentions are fairly evenly distributed across the book’s chapters with occasional spikes, including the nine times when there are six or more mentions of Constantinople in a single year’s entry.

Figure 3: Scatter plot layered with a line chart of number of mentions of “Constantinople” in each annual entry over the entire Chronicle. Years refer to the “Annus Mundi” (“AM”) or Year-of-the-World under which each mention appears. The Chronicle covers AM 5777 – AM 6305, which corresponds to our AD 284-813 (for simplicity the Chronicle’s “Preface” was labeled AM 5776). Visualization by Weiliang Song. Rachel Chung prepared an interactive version of this visualization for the web here. Notes: dots do not indicate exact values, as jitter was applied to create visual distinctions.
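The correspondence stated in this caption (AM 5777 to AM 6305 for AD 284 to 813) can be expressed as a small conversion. The sketch below rests on an assumption not spelled out in the caption: that each Annus Mundi year straddles two AD years, so that AM 5777 covers AD 284/285. The function name and the offset constant are introduced here for illustration only.

```python
# Assumption: AM year N begins in AD (N - 5493) and ends in AD (N - 5492).
AM_TO_AD_OFFSET = 5493


def am_to_ad_range(am_year: int) -> tuple[int, int]:
    """Return the two AD years that an Annus Mundi year overlaps."""
    start = am_year - AM_TO_AD_OFFSET
    return (start, start + 1)


def matches(am_year: int, ad_year: int) -> bool:
    """True if the AD year falls within the span of the given AM year."""
    return ad_year in am_to_ad_range(am_year)


first_entry = am_to_ad_range(5777)
last_entry = am_to_ad_range(6305)
```

Under this assumption a single AM year can legitimately be glossed with either of two adjacent AD years, which is worth keeping in mind when comparing AD dates across the figures.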

One immediate observation from this data is that while Constantinople is mentioned in many of the annual entries, it is not mentioned in every entry. Nevertheless, the distribution of Constantinople’s mentions is relatively even across the entire Chronicle.

This observation led us to wonder whether this is also the case with the other most frequently-mentioned cities. In order to compare the relative frequency and density of mentions of Constantinople to the other top ten most-mentioned settlements (the same group as in Figure 2, above), we made an area plot, as below. 

Figure 4: Area plot of ten (the chart is incorrectly labeled) most-mentioned cities in the Chronicle of Theophanes by Annus Mundi entry. AM 5800 = AD 307; AM 6300 = AD 807. Visualization by Weiliang Song. Rachel Chung prepared an interactive version of this visualization for the web here. 

In this Area Plot, the number of mentions of each of the top ten (the chart is incorrectly labeled) settlements in each annual entry are color-coded and stacked (alphabetically) on top of each other. We can build on what we have already observed from previous visualizations to learn some additional points. First, while we saw that Constantinople acquired its many mentions through a fairly steady frequency over the course of the Chronicle, many of the other most-mentioned settlements are emphasized in one portion of the Chronicle, but less so or not at all in other portions. For instance, Rome’s pink color is much more frequent in the first half of the Chronicle (up to around AM 6050 / AD 558). Antioch and Alexandria seem to be mentioned in comparable patterns. The city of Carthage is an extreme example of acquiring a great number of mentions in a short portion of the narrative, accounting almost single-handedly for the massive “spike” in a single entry near the middle of the Chronicle (in AM 6022 / AD 530) during the emperor Justinian I’s reconquest of Vandal North Africa. More generally, all of these cities are much more frequently mentioned in the first half of the Chronicle than in the second half.

None of these visualizations offers a comprehensive account of how a reader might experience or process the mentions of settlements that accrue over the course of reading the Chronicle. However, they do present a variety of approaches, and a set of observations from which we might pursue additional, more focused investigations into the world that the Chronicle creates for its reader to imagine the progress of more than five hundred years of medieval Roman history.

Jesse W. Torgerson

This adapted conference paper presents well the current multi-pronged approach of our research on the geographic data in the Chronicle of Theophanes. Behind our research is an insistence on always remembering for what we can use our source, the text of the Chronicle. The Chronicle is an indelibly early ninth-century work. As such, it does not tell us what a Roman of the fourth century thought of their empire: it tells us what a Roman of the ninth century thought of the past of their empire. Keeping this stricture in mind, our approach already offers to add to our historical knowledge in two important ways.

First, as outlined by Grant van Inwegen, our own textbook historical mapped representations of Rome emphasize political “borders” of empire and guess at “true” political or military importance of different locations at the time they purport to represent. These maps may be “correct” in these representations. However, it is truly fascinating to see how differently we understand the “action” of the reign of Constantine (for instance), as being largely located in the West until it comes to center on the focused “Eastern” stage of Constantinople and Nicaea. But in the ninth-century historical image created by Theophanes’ text, the regions of Eastern Asia Minor, Egypt, and especially Syria-Palestine, are the most densely articulated with a local civic geography. The Western theatre of Constantine’s reign was little animated in comparison to the stage of the far Eastern end of the realm. Similarly, under the Isaurian emperors of the eighth century, Treadgold’s map indicated a fairly heavy civic articulation through the empire, but gave no sense of the fact that the dominant historical narrative of that period–our Chronicle–continued to tell its story through a much larger region, essentially the size of Justinian’s empire pre-expansion. In practice we use our own historical maps as shorthand means to presume what geography individuals might have thought of if they thought of themselves as “belonging” as citizens of the Roman Empire of the time. In comparison to what we have seen from the geography invoked by the Chronicle, our presumptions fail to recognize that medieval Romans understood their empire within a much larger context than what we might think of as its “borders.”

Second, as outlined by Rachel Chung and Weiliang Song, the actual historical world that the Chronicle presented to its readers was complex, and changed over time. Besides clarifying how consistently the Chronicle maintained its narrative focus on the city of Constantinople, there is no simple formula for visualizing the geography it understood as “Roman” for the period it covered (AD 284-AD 813). However, many avenues for future research continue to be suggested by creative and exploratory visualizations of the data. Figure 4, for instance, would seem to indicate something of a “dark age” in the years between approximately AD 620-700 (ca. AM 6120-6200), when the top fifteen cities see a steep decline in their mentions. This can be compared with Map 6 (above) of the cities mentioned in the following 80 years, to AD 775. Though it appears from Figure 4 that there are still not many mentions of the largest metropoleis of the empire, perhaps the total mentions of cities and settlements remains somewhat constant since there is such a dense articulation of settlements for that period as compared to even the reigns of Constantine I and Justinian I.

There is much ongoing work to do. Nevertheless, we are all encouraged with the progress we continue to make in discovering how to know something about the past that we have not previously known. The collection and visualization of geographic data from historical texts can indeed be used to articulate the world imagined (whether in terms of narrative progression, or in terms of a mental image of places in the world) by the humans of the past. We cannot yet say what it looked like. But we can say that it looked, and was looked at, very differently.

Geography and Narrative in Chronicle of Theophanes: 2018-2019 Resumé

by Nathan Krieger (Wesleyan ’20)

This project, using quantitative methods to study the role of geography through the narrative of the ninth-century Chronicle of Theophanes, took some significant steps in 2018-2019. Our aim has been to analyze this text using new tools and new methodologies including MAXQDA, Recogito, and both the online and desktop versions of ArcGIS. Over the 2018-2019 year we worked towards three goals: (1) completing and then cleaning our data set; (2) adding descriptive information to the items in that data set; (3) beginning to visualize our data set so that other scholars and students can use the data we have created to ask new questions.

Because it has been some time since this project has been updated, and readers new to our work may be finding this post first, we will briefly explain the history of the project before moving on to discuss the new steps we’ve made in the past year as well as our plan for moving forward in the 2019-2020 academic year (see here for all posts).

The project began with the task of assembling a set of ‘tags’ marking individual words, places, people, and events that we considered worth tracking throughout the Chronicle. We defined our interests broadly as “geography” but also tracked references to many key figures in the text (emperors, generals, bishops, etc.). Since every entry of the Chronicle begins with the phrase “In this year…” (or something similar), years are the most granular way of splitting up the text. Thus, after using the software MAXQDA to mark (or “tag”) every time one of our terms of interest appeared in the text, we also entered that information as data into a parallel spreadsheet organized by the complete list of terms (vertical rows) and the years in which those terms appeared (horizontal columns). Over the course of 2018-2019 we have worked to turn this spreadsheet (which we call our “Years-Over-Place” file) into a verified database.
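The layout of the Years-Over-Place file can be sketched in a few lines of Python. This is only a minimal illustration, not our actual workflow: the `tags` list, its contents, and the function name `years_over_place` are hypothetical, and a real export from MAXQDA would need its own parsing step first.

```python
from collections import Counter

# Hypothetical sample of tagged mentions: (term, year AD) pairs.
# These values are invented for illustration only.
tags = [
    ("Constantinople", 330),
    ("Constantinople", 330),
    ("Antioch", 330),
    ("Constantinople", 331),
    ("Nicaea", 325),
]


def years_over_place(tag_pairs):
    """Return (terms, years, table) where table[term][year] is a mention count."""
    counts = Counter(tag_pairs)  # missing (term, year) pairs count as 0
    terms = sorted({term for term, _ in tag_pairs})
    years = sorted({year for _, year in tag_pairs})
    table = {
        term: {year: counts[(term, year)] for year in years}
        for term in terms
    }
    return terms, years, table


terms, years, table = years_over_place(tags)

# Print one row per term and one column per year.
print("term".ljust(16) + "".join(str(y).rjust(6) for y in years))
for term in terms:
    print(term.ljust(16) + "".join(str(table[term][y]).rjust(6) for y in years))
```

Each term becomes a row and each year a column, so a zero simply means the term was not tagged in that year's entry.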

The goal was and is to arrange this database so that queries can be made as to how frequently and where certain terms in the text appeared, and so that those results can be compared to the results for other terms. For example, tagging every emperor in the text might allow us to see the legacy of certain rulers by charting how often they are mentioned in the text after their rule, or by putting the data on emperors in conversation with that of bishops and other priests. It might also, in theory, allow us to ask more abstract questions, such as about the role of Christianity in the text and thus in the empire. For more information on the types of items we chose to tag and how, as well as why we chose MAXQDA, see previous blog posts written by Jesse W. Torgerson and a previous lab member, Ethan Yaro (especially here and here).

The Years-Over-Place file is a large spreadsheet with information on every single item we tagged in the text, from the city of Abydos to Zilgbi, King of the Huns. This amounts to 1,804 different items tagged over the course of the 526 years which the Chronicle describes, AD 284 – AD 813. The Years-Over-Place spreadsheet contains much of the data that we extracted from our MAXQDA tagging. It is unwieldy and impossible to “read” even for those of us in the lab who created it, and certainly incomprehensible to anyone outside our small team who might want to use the data we have collected.

Since completing our “reading” of the Chronicle of Theophanes in 2018 and thus completing the Years-Over-Place spreadsheet, the goal has been to transform this spreadsheet into something new that is more user friendly both for us and, more importantly, for any future users who might not be as intimately familiar with the spreadsheet as we are. We decided that our new database would in fact be three sets of databases.

Even when collecting the base data we added our own metadata categories to each item by determining what “type” of item it was. As can be seen from the above screenshot, we originally noted this information by color-coding the items we were tagging. After spending a great deal of time in discussions and working with some basic descriptive statistics and data visualizations, we came up with eight overall categories for our items, and grouped these eight categories into three sets. Below is the graph that ultimately helped us to see the data in this way. Instead of showing each single year as a distinct bar, we grouped years into reigns of emperors. Here each bar is a different emperor’s reign.
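Grouping years into reigns is a simple aggregation, and a minimal sketch of it might look like the following. The reign dates are the conventional ones for these two emperors, but the mention counts and the function name `mentions_by_reign` are invented for illustration; this is not the code that produced our graph.

```python
# Conventional reign dates (AD), used here only as illustrative bins.
reigns = [
    ("Constantine I", 306, 337),
    ("Justinian I", 527, 565),
]

# Hypothetical mention counts per year for one category of item.
mentions_by_year = {312: 4, 325: 7, 330: 9, 532: 6, 540: 3, 700: 2}


def mentions_by_reign(mentions, reign_list):
    """Sum yearly mention counts into one total per reign (end year inclusive)."""
    totals = {name: 0 for name, _, _ in reign_list}
    for year, count in mentions.items():
        for name, start, end in reign_list:
            if start <= year <= end:
                totals[name] += count
                break  # assign each year to the first matching reign only
    return totals


totals = mentions_by_reign(mentions_by_year, reigns)
for name, total in totals.items():
    print(f"{name}: {total}")
```

Years that fall outside every listed reign (AD 700 in this sample) are silently dropped, so a full version would need a complete list of reigns and a decision about years in which the throne changed hands.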

The three sets are essential for the analytical work we want to do as we move forward. The form of the data we have been collecting on every item is the same (i.e. which years it is mentioned in and how frequently), but the types of questions that can be asked of this data depend on what kind of an item each is. As a result, we’ve begun to separate out these three different sets from the original complete Years-Over-Place spreadsheet in order to produce three different but usable databases. As we develop these databases each will come to look somewhat different depending on the types of items. The three sets are now as follows.

  1. Geography. In the above graph these are the green bars. Sample contents: cities, regions, and natural geography such as rivers, mountain ranges, etc.
  2. Prosopography. In the above graph these are the blue and purple bars. Sample contents: individual people such as bishops, emperors, kings.
  3. Ethnography. In the above graph these are the yellow bars. Sample contents: people groups, both ethnic (“Scythians”) and religious (“Christians” or “Arians”).

Dividing our data into these three sets enabled us to zero in on the types of data that are and will be the most useful to collect. This is important to have decided as we expand the databases to include more information than just frequency and years mentioned. For example, we need to include latitude and longitude in our database for items like Antioch, but not for items like Constantine the Great. Similarly we need to include information like length of reign for Constantine the Great, but not for Antioch. The splitting of our data into the three sets described above allows us to give each item the appropriate descriptors and specificity that we need in order to move forward with analysis.

For each of the three sets we are creating two separate spreadsheets. The first spreadsheet in each set is almost exactly the same as the original “Years over Place” file which we have described above. Each of these three Years-Over-Place spreadsheets will have a vertical Y-axis of all of the items that fit within it, and a horizontal X-axis of the years in which they were mentioned. The only difference is that the spreadsheet is now split into these three sections to make the enormous file usable and coherent.

The second file in each set contains entirely new information, which we have been gathering this past Spring 2019 semester. We have been referring to these files as “descriptive” spreadsheets. They serve to help understand and interpret the data collected in the Years-Over-Place spreadsheets. The Y-axis for each of these three descriptive spreadsheets will be identical to its Years-Over-Place pair, but instead of the X-axis being the years in which each item is mentioned, it will be a series of descriptors that help describe and specify the item.

For example, in the descriptive spreadsheet for the Chronicle’s Geography, these columns are things like latitude and longitude, the type of geographic item (city, region, etc.) and the larger item it may be contained within (for example, the Hippodrome is within Constantinople). It makes little to no sense to include these columns in the descriptive spreadsheets for Ethnography and Prosopography, which have their own set of unique characteristics to keep track of. We have spent a great deal of the semester finding all of this information and creating hierarchies of descriptive categories within which to organize each. We will have a follow-up blog post on some interesting analysis that has arisen as a result of this process on cities mentioned in the Chronicle whose infrastructure is also described (such as Constantinople and Antioch).
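To show how a Years-Over-Place sheet and its descriptive pair fit together, here is a small sketch in which both are keyed by the same item name. The column names (`lat`, `lon`, `type`, `within`), the mention counts, and both helper functions are hypothetical, and the coordinates are rounded approximations rather than values from our files.

```python
# Years-Over-Place rows: item -> {year: mention count}. Counts are invented.
years_over_place = {
    "Constantinople": {330: 2, 331: 1},
    "Hippodrome": {331: 1},
    "Antioch": {330: 1},
}

# Descriptive rows: the same item keys, but the columns are descriptors.
# Coordinates are rounded approximations; column names are hypothetical.
descriptive = {
    "Constantinople": {"lat": 41.01, "lon": 28.98, "type": "city", "within": None},
    "Hippodrome": {"lat": 41.01, "lon": 28.98, "type": "structure",
                   "within": "Constantinople"},
    "Antioch": {"lat": 36.20, "lon": 36.16, "type": "city", "within": None},
}


def combine(item):
    """Merge the descriptors and the total mention count for one item."""
    record = dict(descriptive[item])
    record["total_mentions"] = sum(years_over_place[item].values())
    return record


def mentions_including_contained(item):
    """Total mentions of an item plus everything recorded as lying within it."""
    total = sum(years_over_place[item].values())
    for other, info in descriptive.items():
        if info["within"] == item:
            total += sum(years_over_place[other].values())
    return total


print(combine("Antioch"))
print(mentions_including_contained("Constantinople"))
```

Because the two sheets share their row labels, the `within` column is enough to roll the mentions of a structure such as the Hippodrome up into the city that contains it.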

Once all of this work is finished we will have a set of six spreadsheets that in combination will tell someone anything they need to know about the data we have extracted from the Chronicle. In this form no one of these six makes sense or can stand on its own without its pair. By combining the ‘hard’ data of when each item is mentioned with how frequently the item appears and its characteristics, we have significantly expanded the number and types of research questions that can be asked of our data. Not only will we be able to get overall pictures of the Chronicle’s narrative based on our major categories, but scholars will also be able to query items in any number of ways: by geographic region (isolating certain latitudes and longitudes), by a person’s religious affiliation, by the part of the Chronicle in which items most frequently appear, by how references to different regions wax or wane over the course of the narrative, and so on.
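As one example of such a query, isolating a geographic region by latitude and longitude amounts to a bounding-box filter. The sketch below assumes a hypothetical list of place records; the coordinates are rounded approximations, the mention counts are invented, and the box is only a rough stand-in for Syria-Palestine.

```python
# Hypothetical place records; coordinates are approximate, counts are invented.
places = [
    {"name": "Constantinople", "lat": 41.01, "lon": 28.98, "mentions": 120},
    {"name": "Antioch", "lat": 36.20, "lon": 36.16, "mentions": 45},
    {"name": "Alexandria", "lat": 31.20, "lon": 29.92, "mentions": 30},
]


def in_box(place, south, north, west, east):
    """True if the place lies inside the latitude/longitude bounding box."""
    return south <= place["lat"] <= north and west <= place["lon"] <= east


# A rough, illustrative box: latitudes 30-38 N, longitudes 34-42 E.
selected = [p for p in places if in_box(p, 30.0, 38.0, 34.0, 42.0)]
selected_names = [p["name"] for p in selected]
selected_mentions = sum(p["mentions"] for p in selected)

print(selected_names, selected_mentions)
```

A rectangular box is a blunt instrument for historical regions, so a fuller version would probably test against region polygons instead.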

This brings us to the last step of what we’ve been working on this year: going public. It has long been the goal of the Traveler’s Lab as a whole to get our projects up on GitHub and into the public sphere. GitHub is an online development platform designed to allow people to share their work. Though it is mostly used by software programmers to work on and share their code, we think it could be a really great way of making our databases public so they can be used by anyone. We want other researchers to see what we’ve been working on, to use our data, but also to actively contribute to our project. In the immediate future we are working towards getting the first of our three sets sufficiently corrected for this “GitHub Migration.” This will be the “Geography” set, including cities and settlements along with political regions and natural geographic features. By the end of Spring 2019 we had very nearly completed the cities and settlements portions of this set.

Sonification and the Datini Letter Meta-data

Written by Adam Franklin-Lyons (History professor at Marlboro College) and Logan Davis (Research and Development Engineer at Pairity Greater Boston Area Computer Software)

Which means what exactly?  It’s like a visualization, but instead of something you see, it’s something you hear.  Let me start with a little background…

A couple of years ago, we attempted a couple of “sonifications” (rendering complex data in sound) using the metadata from the letters sent by the Datini Company in 14th and 15th century Italy. (We in this context are Adam Franklin-Lyons, professor of history at Marlboro College, and Logan Davis, a skilled programming student, now alum, at Marlboro with a strong background in music and sound.) The Datini data collection contains over 100,000 letters with multiple variables including origin, destination, sender, receiver, travel time, and others. There is an earlier blogpost with more about Datini and some regular old visualizations from a conference talk. We made a few preliminary experiments, often connecting individual people to a timbre and moving the pitch when that person changed locations. Here is a short version of one of our experiments where three different individuals each “inhabit” an octave of space as they move around – we made both a MIDI piano version and a synth-sound version. The sounds are built using a Python sound generator and attaching certain pieces of data (in this case, the locations of three named agents of the Datini company, Boni, Tieri, and Gaddi) to numeric markers that the generator then translates into specific pitches, timbres, decay lengths, etc. What follows here are some of our thoughts about what sonification is, and how you might create your own. This post does not go into specific tools, which can be complicated, but is more of a general introduction to the idea. Hopefully in the future we will include another couple of posts that also talk about the technical side of things.
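To give a flavor of the idea of one octave per person, here is a minimal sketch that turns (agent, location) pairs into MIDI note numbers. It is not the generator we used: the octave assignments, the scale steps per location, and the movement records are all invented for illustration. Only the three agent names come from our experiment.

```python
# Hypothetical mapping: each agent gets an octave (a MIDI base note),
# and each location gets a scale step within that octave.
AGENT_BASE_NOTE = {"Boni": 48, "Tieri": 60, "Gaddi": 72}
LOCATION_STEP = {
    "Prato": 0, "Florence": 2, "Pisa": 4,
    "Genoa": 5, "Avignon": 7, "Barcelona": 9,
}


def midi_note(agent, location):
    """Pitch = the agent's octave plus the location's step within it."""
    return AGENT_BASE_NOTE[agent] + LOCATION_STEP[location]


def frequency_hz(note):
    """Equal-tempered frequency, with A4 (MIDI note 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Invented movement records in chronological order.
movements = [
    ("Boni", "Prato"),
    ("Tieri", "Florence"),
    ("Boni", "Pisa"),
    ("Gaddi", "Avignon"),
]

notes = [midi_note(agent, place) for agent, place in movements]
for (agent, place), note in zip(movements, notes):
    print(f"{agent} in {place}: MIDI {note}, {frequency_hz(note):.1f} Hz")
```

Because every location step is smaller than twelve semitones, the three agents never leave their own octave, which keeps their lines distinguishable by ear.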

Although sonification is not intensively used, you are probably already familiar with the basic idea. Several well-known modern tools (the Geiger counter is the most widely cited example) use a sonic abstraction to portray data inputs that we cannot otherwise sense. (For the Geiger counter, beeps or clicks indicate the quantity of radioactivity emitted. Basic metal detectors work similarly.) In contrast, researchers portray vast amounts of data in visual forms – graphs, charts, maps, videos, and so on. Perhaps this is because of the dominance of visual input for most people, perhaps not. Either way, the goal is the same: how do you take a large quantity of data and distill or organize it into a form that demonstrates patterns or meaningful structures to the person trying to understand the data?

Fields like statistics and data science teach and use visualization constantly, including using many known methods of comparing data sets, measuring variance, or testing changes over time. Researchers have also studied the reliability of different types of visualizations. For one example, visual perception can measure distance much better than area. Thus people consistently get more accurate data from bar graphs than from pie charts. The goals of sonification thus present one important question: what are types of patterns or structures in the data that would actually become clear when heard rather than seen? Are there particular types of patterns that lend themselves well to abstraction in audio rather than visually? (And I will be honest here – I have talked to at least a couple of people who do stats work who have said, “well, there probably aren’t any. Visual is almost bound to be better.” But admittedly, neither of them were particularly “auditory” people anyway – they do not, for instance, share my love of podcasts…their loss.)

Thus, the most difficult aspect is finding something beyond a simple duplication of what visualizations already do well. A sonification of communication practices where the volume matches the number of messages, getting louder and louder over the course of a 45 second clip and then dropping off more precipitously, doesn’t actually communicate more than a standard bar graph. It would take less than 45 seconds to grasp the same concept in its visual form. Visualizations employ color, saturation, pattern, size, and other visual aspects for multiple variables. Combining aspects like attack and decay of notes, pitch level, and volume could potentially allow for multiple related pieces of data to become part of even a fairly simple sonic line. As with visualizations, certain forms of sound patterns will catch our attention better or provide a more accurate rendition of the data. Researchers have not studied the advantages and disadvantages of sound to the same extent, making these questions ripe for exploration.

So what are some examples? There is at least one professional group that has been dedicated to this research for a number of years: The International Community for Auditory Display. Their website has a number of useful links and studies (look particularly at the examples). Although these are not the most recent, there is a good handbook from 2011 and a review article from 2005 that describe some of the successes and failures of sonification.  Many of their examples and suggestions recommend reducing the quantity of data or not overloading the auditory output, much as you would not want to draw thousands of lines of data on a single graph. However, at least a couple of recent experiments have moved towards methods of including very large quantities of data. While promotional in nature, here is a video demonstrating the concept as used by Robert Alexander to help NASA look at solar wind data.

So, how to proceed? First, the work of sonification does not escape from the day-to-day tasks of data science, especially the normalization of data. If your sonification cannot reasonably handle minor syntactic differences in data (i.e. “PRATO” vs “prato” vs “Prato, Italy”), then your ability to leverage your dataset will be limited, just as it would be with visualizations. The work to normalize and the choices you make in the normalization may be made far more efficient with a little legwork in the beginning.
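A minimal sketch of that kind of normalization follows. The function name `normalize_place` is hypothetical, and the rule of discarding everything after the first comma is an assumption that suits this example but would wrongly merge places that share a name.

```python
def normalize_place(raw):
    """Trim, collapse whitespace, drop any ', Country' suffix, and casefold."""
    name = raw.split(",")[0]       # assumption: text after a comma is a qualifier
    name = " ".join(name.split())  # strip ends and collapse internal whitespace
    return name.casefold()


variants = ["PRATO", "prato", "Prato, Italy", "  Prato "]
normalized = {normalize_place(v) for v in variants}
print(normalized)
```

All four spellings collapse to the single key `prato`, so they would be voiced as one location rather than four.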

Like visualizations, sonifications should be tailored to the data-set at hand. Then you will have to make choices about which aspects of sound you will relate to which data points. This is the main intellectual question of sonification. What are we voicing? What is time representing? What does timbre (or voice – different wave forms) give us here? Timbre and pitch nicely convey proper nouns and verbs in data-sets. Timbre has a far more accessible (articulated) range of possible expressions for data with higher dimensions (though for a particularly trained ear, micro-tonalism may erase a great deal of that advantage). Decay, in my experience, can contain interesting metadata, such as confidence or freshness of a fact; the action of the tone relates to how concretely we know something in the data.
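As a sketch of how decay might carry confidence, the following generates raw samples for a sine tone whose envelope fades faster when the underlying fact is less certain. The linear rule in `decay_rate`, the sample rate, and the confidence values are assumptions made for illustration; this is not the encoding we used.

```python
import math

SAMPLE_RATE = 8000  # samples per second; kept low so the example stays small


def decay_rate(confidence):
    """Hypothetical rule: confident facts ring longer (slower decay).

    confidence 1.0 gives a rate of 1.0 per second, confidence 0.0 gives 10.0.
    """
    return 1.0 + 9.0 * (1.0 - confidence)


def tone(frequency_hz, confidence, seconds=1.0):
    """Return samples of a sine wave under an exponentially decaying envelope."""
    rate = decay_rate(confidence)
    count = int(SAMPLE_RATE * seconds)
    return [
        math.exp(-rate * i / SAMPLE_RATE)
        * math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE)
        for i in range(count)
    ]


def peak_in_last_tenth(samples):
    """Largest absolute amplitude in the final tenth of the tone."""
    tail = samples[-(len(samples) // 10):]
    return max(abs(s) for s in tail)


sure = tone(440.0, confidence=0.9)
unsure = tone(440.0, confidence=0.2)
print(peak_in_last_tenth(sure), peak_in_last_tenth(unsure))
```

The samples could be written out with Python's standard `wave` module or handed to a synthesizer; the point here is only that a single parameter of the sound can stand in for a piece of metadata.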

After cleaning, pitch, timbre, decay assignments, etc., you listen. Much of what you will find sonification good for is finding hot-spots in data sets. What stands out? Are there motifs or harmonic patterns that seem especially prevalent? Some of these questions, obviously, will relate to how the data has been coded, but every time we have tried this, there are also at least a few surprising elements. And finally, is it beautiful? (A question becoming more popular in visualization circles, also…) Particularly when intersecting with some of the wild data-sets available today, what is the sound world created? Are there tweaks to the encoding that will both make data observations clearer and make the sound more enjoyable to listen to? When creating an auditory representation of data, you are quite literally choosing what parts are worth hearing.

A New GitHub Data-Set

Written by Adam Franklin-Lyons

In earlier articles on this blogroll, we have written a couple of times about extracting, analyzing and organizing medieval itineraries as a source of data for doing geographic studies of medieval movement and travel (see: “Itineraries, Gazetteers, and Roads” and “Notes on the Margins“). Currently we have compiled and organized an increasing number of itinerary data-sets including digitizing older itineraries compiled in the 19th century or early 20th century. These data-sets include multiple royal itineraries from the Crown of Aragon and dozens of episcopal itineraries from England. We are planning to expand this project to include new travel itineraries from new places around Europe. To facilitate this expansion, we have moved a portion of our data onto GitHub – a site that specializes in version control usually used for software development.

The GitHub site currently hosts several of our itineraries along with a small amount of code (written in Python with the Pandas library) that allows for the conversion of itineraries to trip sets, compilation of itineraries, looking up of existing names in other itineraries or in, and other transformations that will assist in data collection and visualization. This should help us better organize all of the diverse data points, especially by linking each location (when possible) with a geonames id, a broadly cross-referenced and computer legible reference point. This leaves the door open to connecting our projects to linked open data sometime in the future (Wikidata is probably the most famous version of a linked open data project). There are also instructions available for how to ensure that the geonames id is correct, not to mention what to do when there is no obviously available geonames referent.
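For readers curious what converting an itinerary to a trip set involves, here is a plain-Python sketch. It is not the code in the repository (which uses Pandas): the sample itinerary, the field names, and the function `to_trips` are all invented for illustration.

```python
from datetime import date

# Invented itinerary: dated attestations of one traveler's location.
itinerary = [
    (date(1380, 3, 1), "Barcelona"),
    (date(1380, 3, 4), "Barcelona"),
    (date(1380, 3, 6), "Girona"),
    (date(1380, 3, 9), "Perpignan"),
]


def to_trips(stops):
    """Turn consecutive attestations at different places into trip records."""
    trips = []
    for (d1, p1), (d2, p2) in zip(stops, stops[1:]):
        if p1 != p2:  # two attestations in the same place are a stay, not a trip
            trips.append({
                "origin": p1,
                "destination": p2,
                "last_seen_at_origin": d1,
                "first_seen_at_destination": d2,
                "max_days": (d2 - d1).days,
            })
    return trips


trips = to_trips(itinerary)
for trip in trips:
    print(trip["origin"], "->", trip["destination"], trip["max_days"], "days at most")
```

The day count is an upper bound: the traveler was attested in the origin on one date and in the destination on another, but may well have left later or arrived earlier.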

The medium term goal is to compile a larger bibliography along with a set of usable assignments that other scholars and teachers could use as samples in digital humanities courses or as a digital example in an appropriate history course. These assignments will provide a two-fold benefit. First, each assignment undertaken will add to the scope of the overall data-set. The bibliography will provide other opportunities for history students to create their own data which we can then add to the collection. Second, because of the scope of the data already present, students will be able to more quickly use digital tools on a larger data-set to see the potential of geographic and statistical analyses. Eventually, we will add some of the more successful visualizations along with instructions to try with new data or to modify and expand on.

It is a key difficulty in teaching digital humanities that data collection on a scope large enough to produce compelling results must often be balanced with the class time needed to actually learn the digital tools used to run the analyses (GIS tools, stats packages like R, and other tools all involve whole courses in their own right). By creating an iterative platform with model analyses and large amounts of already usable data, students and professors can participate in each stage of the project only for as long as they have available for their course without sacrificing the ability of students to practice all the steps along the way and produce satisfying historical results. Ideally, the project components on GitHub will allow courses to contribute incomplete data that other groups can pick up and continue. This makes concrete how we have attempted to run the lab in the past – with students able to work on a project at multiple stages, passing it off to a new group of students to move further when they are done. However, the instructions, future work, and goals have generally lived in the individual heads of the professors overseeing each project rather than in more publicly usable formats.

The long-term goal is to structure the data into an SQL database for easier querying, but also to increase substantially the number of trips and itineraries available in the data-set. We are also aiming to add data from a number of letter collections, including some data-sets we have already worked on (see: “Parsing the Past“). Eventually, there should be enough data about small movements to be able to ask broader questions about European mobility in the late medieval period. If we can reach a few hundred thousand individual data points (known short trips of less than a couple days), we will be able to ask systemic questions about the nature of movement. We could look for patterns such as seasonality, the influence of topography, linguistic boundaries, or observe potential regional differences across Europe.
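A seasonality question of this kind could be asked of an SQL database roughly as follows. This is a sketch using Python's built-in `sqlite3` module with an in-memory database; the table name, column names, and every row are invented, and the eventual schema may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips ("
    "origin TEXT, destination TEXT, depart_date TEXT, days INTEGER)"
)

# Invented sample rows; dates are ISO strings so SQLite can parse them.
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        ("Barcelona", "Girona", "1380-03-04", 2),
        ("Girona", "Perpignan", "1380-03-06", 3),
        ("Prato", "Florence", "1395-07-10", 1),
        ("Florence", "Pisa", "1395-12-02", 2),
    ],
)

# Count trips and average their duration by month of departure.
rows = conn.execute(
    "SELECT CAST(strftime('%m', depart_date) AS INTEGER) AS month, "
    "COUNT(*), AVG(days) "
    "FROM trips GROUP BY month ORDER BY month"
).fetchall()
conn.close()

for month, trip_count, mean_days in rows:
    print(month, trip_count, mean_days)
```

With a few hundred thousand rows instead of four, the same query would begin to show whether travel thinned out in winter months.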

The extended vision of the project could resemble the Stanford Orbis project – an interactive map of the ancient world for which there is no medieval equivalent. However, while Orbis is based on original research and primary sources, the sources are more diaphanous and descriptive. The map is built on algorithms that encode assumptions based on those sources, whereas a large enough data-set of individual trips would allow for algorithms that can give travel times and methods as inductive statistical guesses. These inductive guesses would be built on an extensive underlying source base of individual trips.
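An inductive guess in this sense could be as simple as the median of the observed durations on a route, reported together with the number of observations behind it. The sketch below assumes a hypothetical list of observed trips; the routes, durations, and function name are invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Invented observations: (origin, destination, days taken).
observed = [
    ("Barcelona", "Girona", 2),
    ("Barcelona", "Girona", 3),
    ("Barcelona", "Girona", 2),
    ("Girona", "Perpignan", 3),
]


def estimate_travel_times(trips):
    """Median observed duration per route, with the sample size behind it."""
    by_route = defaultdict(list)
    for origin, destination, days in trips:
        by_route[(origin, destination)].append(days)
    return {
        route: {"median_days": median(days), "n": len(days)}
        for route, days in by_route.items()
    }


estimates = estimate_travel_times(observed)
for (origin, destination), est in estimates.items():
    print(f"{origin} -> {destination}: {est['median_days']} days (n={est['n']})")
```

Keeping the sample size next to each estimate matters, since a route attested once should not be trusted as much as one attested hundreds of times.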

So, for the moment, we have tens of thousands, not quite hundreds of thousands. If you want to contribute to the bibliography or have good suggestions for primary sources that could reasonably produce an itinerary, get in touch so we can get it on the site. If you are planning on teaching a digital history course of some sort, use the data, try out the instructions, or create your own itinerary that can complement the data already available. If you do try out some of the methods on the site, please let us know if some portion of the instructions is hard to follow or does not work as you move through it. We are always looking to update and improve the usability of the data.

So Check It Out!