by Nathan Krieger (Wesleyan ’20)
This project, which uses quantitative methods to study the role of geography in the narrative of the ninth-century Chronicle of Theophanes, took some significant steps in 2018-2019. Our aim has been to analyze this text using new tools and methodologies, including MAXQDA, Recogito, and both the online and desktop versions of ArcGIS. Over the 2018-2019 year we worked towards three goals: (1) completing and then cleaning our data set; (2) adding descriptive information to the items in that data set; (3) beginning to visualize our data set so that other scholars and students can use the data we have created to ask new questions.
Because it has been some time since this project was last updated, and readers new to our work may be finding this post first, we will briefly explain the history of the project before discussing the steps we’ve taken in the past year and our plan for the 2019-2020 academic year (see here for all posts).
The project began with the task of assembling a set of ‘tags’ marking individual words, places, people, and events that we considered worth tracking throughout the Chronicle. We defined our interests broadly as “geography” but also tracked references to many key figures in the text (emperors, generals, bishops, etc.). Since every entry of the Chronicle begins with the phrase “In this year…” (or something similar), years are the most granular way of splitting up the text. Thus, after using the software MAXQDA to mark (or “tag”) every time one of our terms of interest appeared in the text, we also entered that information as data into a parallel spreadsheet, with the complete list of terms running down the rows and the years in which those terms appeared running across the columns. Over the course of 2018-2019 we have worked to turn this spreadsheet (which we call our “Years-Over-Place” file) into a verified database.
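To make the shape of the Years-Over-Place file concrete, here is a minimal Python sketch of how a list of tagged occurrences could be turned into a terms-by-years table. It assumes each tag has been exported as a simple (term, year) pair; that format, the function names `years_over_place` and `to_rows`, and the counts are all hypothetical and illustrative, not the lab's actual MAXQDA export or data.

```python
from collections import Counter, defaultdict

# Hypothetical tag records: one (term, year AD) pair per tagged occurrence.
# Illustrative values only, not counts from the Chronicle.
tags = [
    ("Antioch", 526),
    ("Antioch", 526),
    ("Antioch", 540),
    ("Constantinople", 330),
    ("Constantinople", 540),
]

def years_over_place(tags):
    """Return {term: Counter({year: count})}: rows are terms, columns are years."""
    matrix = defaultdict(Counter)
    for term, year in tags:
        matrix[term][year] += 1
    return matrix

def to_rows(matrix, years):
    """Flatten to spreadsheet rows: a header, then one row per term (0 = no mention)."""
    rows = [["term"] + [str(y) for y in years]]
    for term in sorted(matrix):
        rows.append([term] + [matrix[term].get(y, 0) for y in years])
    return rows

for row in to_rows(years_over_place(tags), [330, 526, 540]):
    print(row)
```

Storing counts per term first and only flattening at the end keeps the sparse data compact: most of the 1,804 items are mentioned in only a few years.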
The goal was and is to arrange this database so that queries can be made as to how frequently and where certain terms appear in the text, and so that those results can be compared with the results for other terms. For example, tagging every emperor in the text might allow us to see the legacy of certain rulers by charting how often they are mentioned after their rule, or by putting the data on emperors in conversation with the data on bishops and other priests. In theory, it could also let us ask more abstract questions, such as the role of Christianity in the text and thus in the empire. For more information on the types of items we chose to tag and how, as well as why we chose MAXQDA, see previous blog posts written by Jesse W. Torgerson and a previous lab member Ethan Yaro (especially here and here).
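As a rough illustration of the "legacy" question, the Python sketch below takes one row of a Years-Over-Place table (year mapped to number of mentions) and counts how many of a ruler's mentions fall after the end of his reign. The function names and the mention counts are hypothetical; only the idea of comparing mentions during and after a reign comes from the project.

```python
def mentions_after(row, reign_end):
    """Total mentions in years strictly later than reign_end."""
    return sum(count for year, count in row.items() if year > reign_end)

def posthumous_share(row, reign_end):
    """Fraction of all mentions that fall after the reign (0.0 if never mentioned)."""
    total = sum(row.values())
    return mentions_after(row, reign_end) / total if total else 0.0

# Illustrative counts only, not data from the Chronicle.
constantine = {312: 3, 324: 2, 330: 4, 337: 1, 451: 1, 787: 1}
print(mentions_after(constantine, 337))  # 2
print(round(posthumous_share(constantine, 337), 3))
```

Using a share rather than a raw count makes rulers with very different overall mention totals easier to compare, for example emperors against bishops.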
The Years-Over-Place file is a large spreadsheet with information on every single item we tagged in the text, from the city of Abydos to Zilgbi, King of the Huns. This amounts to 1,804 different items tagged over the course of the 526 years which the Chronicle describes, AD 284 – AD 813. The Years-Over-Place spreadsheet contained much of the data that we had extracted from our MAXQDA tagging. It is unwieldy and impossible to “read” even for those of us in the lab who created it, and certainly incomprehensible to anyone outside our small team who might want to use the data we had collected.
Since completing our “reading” of the Chronicle of Theophanes in 2018 and thus completing the Years-Over-Place spreadsheet, our goal has been to transform this spreadsheet into something new that is more user-friendly both for us and, more importantly, for any future users who might not be as intimately familiar with the spreadsheet as we are. We decided that our new database would in fact consist of three sets of databases.
Even when collecting the base data, we added our own metadata to each item by determining what “type” of item it was. As can be seen from the above screenshot, we originally recorded this information by color-coding the items we were tagging. After a great deal of discussion and some work with basic descriptive statistics and data visualizations, we arrived at eight overall categories for our items and grouped these eight categories into three sets. Below is the graph that ultimately helped us see the data in this way. Instead of showing each year as a distinct bar, we grouped years into the reigns of emperors, so each bar is a different emperor’s reign.
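Grouping years into reigns amounts to assigning each year column to a reign and summing. The Python sketch below assumes each year belongs to the reign that began most recently on or before it, which is a simplification (a transition year such as 337 goes entirely to the successor). The reign list and the counts are hypothetical and cover only a few fourth-century emperors for illustration.

```python
import bisect
from collections import Counter

# Hypothetical reign start years (AD), sorted, with matching labels.
# Simplifying assumption: a year belongs to the latest reign starting on or before it.
REIGN_STARTS = [306, 337, 361, 363]
REIGN_NAMES = ["Constantine I", "Constantius II", "Julian", "Jovian"]

def reign_for(year):
    """Return the reign label for a year, or None if it precedes every listed reign."""
    i = bisect.bisect_right(REIGN_STARTS, year) - 1
    return REIGN_NAMES[i] if i >= 0 else None

def mentions_by_reign(year_counts):
    """Collapse {year: mentions} into {reign: mentions}, one bar per reign."""
    totals = Counter()
    for year, count in year_counts.items():
        reign = reign_for(year)
        if reign is not None:
            totals[reign] += count
    return totals

# Illustrative counts for one category of items, not data from the Chronicle.
print(mentions_by_reign({310: 4, 337: 2, 350: 5, 362: 3}))
```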
The three sets are essential for the analytical work we want to do going forward. The form of the data we have collected on every item is the same (i.e., the years in which it is mentioned and how frequently), but the types of questions that can be asked of this data depend on what kind of item each is. As a result, we’ve begun to separate these three sets out of the original, complete Years-Over-Place spreadsheet in order to produce three different but usable databases. As we develop these databases, each will come to look somewhat different depending on the types of items it contains. The three sets are now as follows.
- Geography. In the above graph these are the green bars. Sample contents: cities, regions, and natural geography such as rivers, mountain ranges, etc.
- Prosopography. In the above graph these are the blue and purple bars. Sample contents: individual people such as bishops, emperors, kings.
- Ethnography. In the above graph these are the yellow bars. Sample contents: people groups, both ethnic (“Scythians”) and religious (“Christians” or “Arians”).
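Splitting the master file into these three sets can be sketched as a lookup from each item's fine-grained category to its set. The post does not list the lab's eight category names, so the categories in this Python sketch are hypothetical placeholders; items with no recognized category are set aside for manual review rather than dropped.

```python
# Hypothetical category names; the lab's actual eight categories are not listed here.
CATEGORY_TO_SET = {
    "city": "Geography",
    "region": "Geography",
    "natural feature": "Geography",
    "ruler": "Prosopography",
    "cleric": "Prosopography",
    "ethnic group": "Ethnography",
    "religious group": "Ethnography",
}

def split_into_sets(rows, categories):
    """Split Years-Over-Place rows ({item: {year: count}}) into the three sets.

    categories maps each item to its fine-grained category; items with a
    missing or unmapped category go to "Unassigned" for review.
    """
    sets = {"Geography": {}, "Prosopography": {}, "Ethnography": {}, "Unassigned": {}}
    for item, row in rows.items():
        target = CATEGORY_TO_SET.get(categories.get(item), "Unassigned")
        sets[target][item] = row
    return sets

rows = {"Antioch": {526: 2}, "Constantine the Great": {330: 4}, "Arians": {325: 3}}
categories = {"Antioch": "city", "Constantine the Great": "ruler", "Arians": "religious group"}
print(sorted(split_into_sets(rows, categories)["Geography"]))  # ['Antioch']
```

Each resulting set keeps exactly the same year-by-year rows as the original file, which is why the split changes usability without changing the underlying data.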
Dividing our data into these three sets enabled us to zero in on the types of data that are and will be the most useful to collect. This decision matters as we expand the databases to include more information than just frequency and years mentioned. For example, we need to include latitude and longitude in our database for items like Antioch, but not for items like Constantine the Great. Similarly, we need to include information like length of reign for Constantine the Great, but not for Antioch. Splitting our data into the three sets described above allows us to give each item the appropriate descriptors and specificity that we need in order to move forward with analysis.
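One way to picture "appropriate descriptors" is a separate record type per set, so that a place simply has no reign fields and a person has no coordinates. The Python dataclasses below are a hypothetical sketch, not the lab's schema; the field names are assumptions, and Antioch's coordinates are approximate modern values.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Place:  # Geography set
    name: str
    latitude: float
    longitude: float
    place_type: str  # e.g. "city", "region", "river"

@dataclass
class Person:  # Prosopography set
    name: str
    role: str  # e.g. "emperor", "bishop"
    reign_start: Optional[int] = None  # only meaningful for rulers
    reign_end: Optional[int] = None

    @property
    def reign_length(self) -> Optional[int]:
        """Years between start and end of reign, or None if either is unknown."""
        if self.reign_start is None or self.reign_end is None:
            return None
        return self.reign_end - self.reign_start

@dataclass
class Group:  # Ethnography set
    name: str
    kind: str  # e.g. "ethnic" or "religious"

antioch = Place("Antioch", 36.20, 36.16, "city")
constantine = Person("Constantine the Great", "emperor", 306, 337)
print(constantine.reign_length)  # 31
```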
For each of the three sets we are creating two separate spreadsheets. The first spreadsheet in each set is almost exactly the same as the original Years-Over-Place file described above. Each of these three Years-Over-Place spreadsheets will have a vertical Y-axis listing all of the items that fit within it and a horizontal X-axis of the years in which they were mentioned. The only difference is that the original file is now split into these three sections, which makes the enormous file usable and coherent.
The second file in each set contains entirely new information, which we worked on gathering during the Spring 2019 semester. We have been referring to these as “descriptive” spreadsheets. They serve to help understand and interpret the data collected in the Years-Over-Place spreadsheets. The Y-axis of each of these three descriptive spreadsheets will be identical to that of its Years-Over-Place pair, but instead of the X-axis being the years in which each item is mentioned, it will be a series of descriptors that help describe and specify the item.
For example, in the descriptive spreadsheet for the Chronicle’s Geography, the columns include latitude and longitude, the type of geographic item (city, region, etc.), and the larger item it may be contained within (for example, the Hippodrome is within Constantinople). These columns would make little to no sense in the descriptive spreadsheets for Ethnography and Prosopography, which have their own unique characteristics to keep track of. We have spent a great deal of the semester finding all of this information and creating hierarchies of descriptive categories within which to organize it. We will have a follow-up blog post on some interesting analysis that has arisen from this process, concerning cities mentioned in the Chronicle whose infrastructure is also described (such as Constantinople and Antioch).
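A "contained within" column turns the Geography items into a hierarchy that can be walked upward. The Python sketch below follows those links and then credits each item's mentions to every place that contains it, so that a mention of the Hippodrome also counts toward Constantinople. The Hippodrome-in-Constantinople link comes from the post; placing Constantinople within Thrace, the function names, and the counts are illustrative assumptions.

```python
# Hypothetical "contained within" column: item -> larger item (None at the top).
CONTAINED_IN = {
    "Hippodrome": "Constantinople",
    "Constantinople": "Thrace",
    "Thrace": None,
}

def containment_chain(item, contained_in):
    """Return [item, parent, grandparent, ...], stopping at the top or at a cycle."""
    chain, seen = [], set()
    while item is not None and item not in seen:
        chain.append(item)
        seen.add(item)
        item = contained_in.get(item)
    return chain

def rollup(counts, contained_in):
    """Credit each item's mentions to itself and to every place containing it."""
    totals = {}
    for item, n in counts.items():
        for place in containment_chain(item, contained_in):
            totals[place] = totals.get(place, 0) + n
    return totals

print(containment_chain("Hippodrome", CONTAINED_IN))
print(rollup({"Hippodrome": 3, "Constantinople": 10}, CONTAINED_IN))  # illustrative counts
```

The cycle check is a guard against data-entry mistakes (two items each recorded as containing the other), which would otherwise loop forever.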
Once all of this work is finished, we will have a set of six spreadsheets that in combination will tell someone everything they need to know about the data we have extracted from the Chronicle. In this form, none of the six makes sense or can stand on its own without its pair. By combining the ‘hard’ data of when and how frequently each item is mentioned with its characteristics, we have significantly expanded the number and types of research questions that can be asked of our data. Not only will we be able to get overall pictures of the Chronicle’s narrative based on our major categories, but scholars will also be able to query items in any number of ways: by geographic region (by isolating certain latitudes and longitudes), by a person’s affiliated religion, by the part of the Chronicle in which they most frequently appear, by how references to different regions wax or wane over the course of the narrative, and so on.
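A query like "isolate certain latitudes and longitudes" needs both files of a pair at once: the descriptive sheet supplies coordinates and the Years-Over-Place sheet supplies mentions. The Python sketch below joins the two on the item name, keeps items inside a latitude/longitude box, and sums their mentions within a year range. The data structures, function name, and mention counts are hypothetical; the coordinates are approximate modern values.

```python
# Hypothetical paired spreadsheets for the Geography set, keyed by item name.
years_over_place = {
    "Antioch": {526: 2, 540: 1},
    "Constantinople": {330: 1, 540: 1},
    "Alexandria": {415: 1},
}
descriptive = {
    "Antioch": {"lat": 36.20, "lon": 36.16, "type": "city"},
    "Constantinople": {"lat": 41.01, "lon": 28.98, "type": "city"},
    "Alexandria": {"lat": 31.20, "lon": 29.92, "type": "city"},
}

def mentions_in_box(years, desc, lat_range, lon_range, year_range):
    """Sum mentions (within an inclusive year range) for items inside a lat/lon box.

    Only items present in the descriptive sheet can be located, so items
    lacking coordinates are silently left out of the result.
    """
    (lat_lo, lat_hi), (lon_lo, lon_hi), (y_lo, y_hi) = lat_range, lon_range, year_range
    result = {}
    for item, info in desc.items():
        if lat_lo <= info["lat"] <= lat_hi and lon_lo <= info["lon"] <= lon_hi:
            row = years.get(item, {})
            result[item] = sum(n for y, n in row.items() if y_lo <= y <= y_hi)
    return result

# Roughly northern Syria, AD 500-600.
print(mentions_in_box(years_over_place, descriptive, (34, 38), (35, 40), (500, 600)))
```

Running the same box query over successive year ranges is one simple way to see references to a region wax or wane across the narrative.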
This brings us to the last step of what we’ve been working on this year: going public. It has long been the goal of the Traveler’s Lab as a whole to get our projects up on GitHub and into the public sphere. GitHub is an online development platform designed to allow people to share their work. Though it is mostly used by software programmers to work on and share their code, we think it could be a great way of making our databases public so they can be used by anyone. We want other researchers not only to see what we’ve been working on and to use our data, but also to actively contribute to our project. In the immediate future we are working towards getting the first of our three sets sufficiently corrected for this “GitHub Migration.” This will be the Geography set, including cities and settlements along with political regions and natural geographic features. By the end of Spring 2019 we had very nearly completed the cities and settlements portions of this set.
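The post does not say what file format the sets will use on GitHub. One plausible option, offered here only as an assumption, is plain CSV: GitHub displays CSV files as tables, and because each item is one line, a contributor's correction shows up as a small, reviewable diff. This Python sketch serializes a Years-Over-Place set that way using the standard `csv` module; the function name and sample data are hypothetical.

```python
import csv
import io

def to_csv(matrix, years):
    """Serialize {item: {year: count}} as CSV text: header row, then one sorted row per item."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # "\n" keeps diffs clean across platforms
    writer.writerow(["item"] + years)
    for item in sorted(matrix):
        writer.writerow([item] + [matrix[item].get(y, 0) for y in years])
    return buf.getvalue()

# Illustrative data only.
geography = {"Antioch": {526: 2}, "Abydos": {}}
print(to_csv(geography, [526, 540]), end="")
```

Sorting rows by item name keeps the file order stable between exports, so a diff shows only the cells that actually changed.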