Converting to Github: Any Problems? Lab Meeting October 26, 2017

On October 26, 2017 the Traveler’s Lab met on Wesleyan Campus to walk through the initial stages of converting and migrating our data sharing and storage activities to Github. Led virtually by our own Prof. Adam Franklin-Lyons (Marlboro College), students and professors walked through the various organizational models, and decided on how to structure our “organization,” as well as the various projects that each professor is leading.

To Github or not to Github? Lab Meeting October 17, 2017

On October 17, 2017 the Traveler’s Lab gathered to hear presentations from Elliot Williams (’18) and Elizaveta Kravchenko (’19) on their proposal for the lab to move from an eclectic and individual approach to data storage and shared working files on Google Drive and Dropbox to a single site model on the widely used Github platform.

We were joined by Prof. Stephanie Koscak (Wake Forest University), a visiting faculty fellow for the year at Wesleyan’s Center for the Humanities.

Consider Github. Adobe thinks it works pretty well:

What’s a Network? How “to Database”? Lab Meeting Sept 28, 2017

The Traveler’s Lab met on Wesleyan Campus, September 28 2017 to welcome our new student members, and begin working on defining and transmitting one of the core skill sets we employ: turning a variety of historical sources into databases that represent networks of travel and communication.

Prof. Pavel Oleinikov gave us a presentation entitled “Networks in History,” on the history and theory behind use of network analysis for historical research questions.

Traveler’s Lab students Elizaveta Kravchenko (’19), Brendan McGlone (’18) and Ilana Newman (’18) then gave us working examples of how they have turned, and are still turning, historical sources into spreadsheets of data.

We were joined by Prof. Christine Axen (Fordham University), in person, and virtually (via Google Hangouts) by a half dozen students of Prof. Kathryn Jasper from Illinois State University.
Prof. Jasper and her students are exploring starting either a satellite of, or their own version of the Traveler’s Lab.

Meeting in action:

Chronography’s Geography: To Organize Geographic References

By Ethan Yaro

Note: This is the fifth in a series devoted to the project “Narrative and Geography in the Chronicle of Theophanes the Confessor”. First post here; second here; third here; fourth here.

The chronicle is geographically dense. After completely coding only half of the text, we have reached over ten thousand data points.

This immense amount of data, unsorted, represents an impenetrable mass, with little meaning for either the casual observer or someone already well versed in the text. For this reason we developed categories into which we could sort this multitude of geographic references.

Learning how to Categorize Our Data

My creation of the Geography in Theophanes database began with an Excel sheet. Initially, when developing the Excel-sheet index, I created a few general categories in which to sort all of the geographic references or tags. There were only 11, and I initially imagined that this would do a pretty good job organizing the data.

As I moved the project into MAXQDA the number of data points that had been coded in the text steadily climbed into the hundreds and then thousands. It became clear that there had to be a more in-depth organizing principle for all the different types of codes.

Oddly enough, the first step in separating out the different types of data was creating fewer distinct archetypes (or super categories): rather than the initial eleven categories, I boiled the data down to four main types of geographic data within the text. These were:

1: Explicit Geography – References to geographical places, such as Jerusalem, Africa, or Hagia Sofia.
2: Geographical Titles – References to geography that are not a place, but someone associated with a place, such as The Persian Emperor (associated with Persia), The Bishop of Constantinople (associated with Constantinople), or The Dux of Palestine (associated with Palestine).
3: Geographically Related People Groups – References to groups of people that have a distinct geographical association, such as The Citizens of Constantinople (associated with Constantinople), the Bulgars (associated with Bulgaria), and Romans (associated with Rome).
4: Geographically Related Events – References to occurrences that are geographically tied, all of which are synods and councils, such as the Holy Ecumenical Synod of Chalcedon (associated with Chalcedon).

It should be noted that the last three categories of references are all dependent on the existence of the first. Many references in these categories are also references to the actual geographical place with which they are associated (see our fourth blog post in this series to see how this nesting works).

From these categories I then generated a multitude of different stemma into which I would sort the data.

Making Friends with MaxQDA

Initially, I thought of these four different groupings in terms of ArcGIS. ArcGIS separates geographical data into three different kinds: polygons, lines, and points. Deserts, some bodies of water (lakes, oceans, etc.), continents, and regions were thought of as polygons. Other bodies of water (rivers, streams, etc.) and roads were thought of as lines. Cities, forts, and monasteries were thought of as points.

This way of thinking gave me a problematic structure. Once the number of places within cities grew, it seemed illogical to think of these (place) points as being within (city) points. Cities could have become polygons, but it would have been impossible to plot out such polygons for all cities. This classification scheme was soon dropped in favor of MaxQDA’s “way of thinking” about the data.

MaxQDA is efficient for sorting and resorting. The code groups one generates are easily movable and can be made subsets of other codes. Often these subset chains are three or four levels deep. For example I made Hagia Sofia a subset of Constantinople, which in turn is a subset of Cities, which is in turn a subset of Explicit Geography.

It should also be noted that, as described in our second post and as demonstrated above, we made the decision to adopt a capacious concept of “geography.” One value of MaxQDA is that it easily allows us to select only particular tags or references. Thus, if we want, we can easily choose to run analysis only for “explicit geography” and suppress references which are more subjectively geographic.
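The nesting and selection described above can be pictured as a small tree of codes. What follows is only an illustrative sketch in Python, not how MaxQDA stores its codes: the `PARENT` table, both function names, and the sample tags are assumptions made for the example, using the chain Hagia Sofia, Constantinople, Cities, Explicit Geography given above.

```python
# Hypothetical miniature of the code tree: each code maps to its parent code.
# Top-level categories have no parent (None).
PARENT = {
    "Explicit Geography": None,
    "Geographical Titles": None,
    "Cities": "Explicit Geography",
    "Constantinople": "Cities",
    "Hagia Sofia": "Constantinople",
    "The Bishop of Constantinople": "Geographical Titles",
}


def path_to_root(code):
    """Return the chain of codes from `code` up to its top-level category."""
    chain = [code]
    while PARENT.get(chain[-1]) is not None:
        chain.append(PARENT[chain[-1]])
    return chain


def select_by_category(tags, category):
    """Keep only the tags whose chain of parents includes `category`."""
    return [tag for tag in tags if category in path_to_root(tag)]


tags = ["Hagia Sofia", "The Bishop of Constantinople", "Constantinople"]
print(path_to_root("Hagia Sofia"))
print(select_by_category(tags, "Explicit Geography"))
```

Selecting by "Explicit Geography" keeps Hagia Sofia and Constantinople and suppresses the bishop's title, which is the kind of filtering the paragraph above describes.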

Now, using a portion of the category Explicit Geography as an example, I will follow one of these larger code groups down to its smaller parts to demonstrate how the sorting process works for our project.

Note, for this and all the images that follow, that there are some categories and items which have few or even zero instances. This is due to the fact that these are screen shots of in-process coding, and due to the fact that MaxQDA has some difficulty with the amount of data I am working with, I work with small sections of the text at a time. Items with “0” tags noted are there because they are holdovers from previously-coded sections of the text.

In the above example, “Explicit Geography” has one direct subcode, which is “The World.” “The World” is the largest, most all-encompassing data point of Explicit Geography, and correspondingly, all the other geographical data within Explicit Geography has been made a subset of the world. Within “The World” are the subcodes Deserts, Bodies of Water, Cardinal Regions, Cities, Continents, Forts, Monasteries (that are not in cities, as monasteries in cities become subcodes of the city), Mountains, and Regions.

Unlike “The World,” which not only exists as a category I created, but as a “geographic reference” in the text (i.e., the Chronicle does talk about “The World”), some of these subcodes (such as “Cities”) have no independent tags of their own, and so will also show “0”.

Within all of these are more subcodes. In order not to be tedious, I will only examine one single subset – “Cities” – within “The World.” “Cities” contains good examples of how the smaller subcode structures often work.

As can be seen below, the subcodes within “Cities” are specific cities. These cities are sorted alphabetically (except for Constantinople which, as the axis around which the text revolves, I made accessible to expedite coding within Constantinople).

As indicated by this small selection of the subcodes within cities (many being hapax legomena), we currently have hundreds of distinct cities mentioned by the Chronicle.

Codes within Codes: Constantinople

Let’s look one subcode level lower. I will use Constantinople as the example, since it has the most fleshed out set of subcodes of any city in the text.

While we could sort everything Constantinopolitan together (all could be conceived of as equivalent points on the map, and sorted as similar data), there are certain subsets within Constantinople which seemed distinct enough to separate from each other.

Separating all items by type allows more comparisons. Furthermore (as we will see in a future post), developing these categories allows us to activate MaxQDA’s analytical capabilities. But I did make editorial decisions.

Within “Constantinople,” I sorted items into subcode groups by type when, alternatively, they could have been organized into other groupings, such as regions. Thus, “churches” is a subcode group, instead of sorting all the churches into the districts that they are actually in. Getting all the data together by type at the smaller levels is useful for our interest in comparing different data groups.

On the other hand, in the case of certain buildings (The Hippodrome and The Great Palace), I made them their own unique subcode groups, because this seemed more logical than creating other subgroups for “statues” for instance.

Geographic Misnomers or Comparative Categories?

Setting up the data for analysis in this way has also meant that there are a few items that still found a place in our code system even though they do not necessarily fit into the category of geography (even with the wide net that we have cast over that concept, as described in post ?? of this series).

The two most significant groups are the Eastern Emperors (within Geographical Titles) and Religious People Groups (within Geographically Related People Groups). Emperors can be conceived of as having geographical significance (the emperor calls to mind the territory over which he is emperor), but they have been included predominantly as a tool for analysis. In the Chronicle of Theophanes, the change between Byzantine emperors is a significant textual marker: the emperors are the most important figures in the chronicle’s dating system, and to some degree each emperor represents a different temporal period.

Religious people groups too can be conceived of in a geographical way—Christians would call to the mind of the reader the Christian world, whereas Muslims would call to mind the territories of the border and beyond — but they have primarily been included for analytical purposes of comparison, rather than for the strength of their geographical reference. We eventually want to ask questions comparing the geographies associated with these different groups of people.

We coded religious groups so that we could locate where and when the text creates different geographic associations with different religious groupings (Christian or otherwise), as well as which emperors have passages filled with criticism, and which emperors are lauded as virtuous and pious, and if particular geographies are consistently associated with either category.

Conclusions: our reading of the Chronicle

It should be clear by this point that while there is a logic for sorting all of these codes the way that I have, it should not be taken as absolutist, normative, or prescriptive. Our categories arose from our reading of the text itself, and the particular research questions we anticipate wanting to ask.

This process should also recall our principle that the text is its own geography. We made our analytical categories derive from this principle.

This decision and method means that though our decision process and rationale should provide a helpful model for other similar projects, we have not developed a universal system. Our coding structure will not necessarily work well for another project. In fact, it would be strange if it did. The decisions outlined above were made because they were practical for this research project: the tagging pattern fits the text.

Our system of coding is itself a reading of the chronicle.

From Theory to Practice: Conference 9/7-9/8

The Traveler’s Lab hosted an international workshop conference,

From Theory to Practice: Digital Methods in Research and Teaching

at Wesleyan University, from Thursday 9/7 through Friday 9/8.

Wesleyan University’s Olivia Drake attended a portion of the proceedings and produced a write-up of the conference and the Traveler’s Lab. (Please click through to see our workshop schedule Digital Methods – Schedule – 2017.09.07)

The conference was held at the Allbritton Center and at the Center for Humanities.

It was attended by scholars and students from Wesleyan University, Lafayette College (PA), Illinois State University (IL), Marlboro College (VT), Binghamton University (NY), and Exeter University (UK).

We are very thankful to all presenters and participants for some outstandingly productive presentations, discussions, and especially the connections made for future projects and networking.
We are particularly thankful to our Wesleyan University sponsors: the Center for the Humanities, the Allbritton Center, the Department of History, and the Quantitative Analysis Center.

Spring 2017 Meetings: a chronicle

Traveler’s Lab collective gatherings and workshops, held between February & May 2017

Feb 6 (Monday) – Informational and Planning Meeting

Feb 20 (Monday) – Brainstorming Meeting: goals and ideas for the semester

Mar 3 (Friday) – Basic Text Analysis using R
(by Matthew Jockers)

Mar 6 (Monday) – Using MaxQDA and Recogito to read the Chronicle of Theophanes
(Torgerson Labs)

Apr 3 (Monday) – Cracking the Datini Archive
(Franklin-Lyons Labs)

Apr 18 (Monday) – Roads and Paths: How can we find them, how can we map them?
(Shaw Labs)

May 9 (Tuesday) – Professor Grimmer-Solem and the “Cartography Lab”; the Semester in Review

Parsing the Past: Data Extraction in Medieval Text

by Daniel Gordon

 

Letter collections can answer many of our questions. They can give valuable information about the speed at which letters moved, the time taken between towns, and whether letters were an effective and timely form of communication. These collections can be quite large, with hundreds, if not thousands, of individual documents, and manually searching every item can take an excessive amount of time. To get around this, I resolved to write a program that could automatically search every document, pulling out key information like names and dates, and tagging important parts of the letters for data gathering and quick examination. I based this on the Cely Letter Collection, since it has only about 150 entries, which allows the work to be checked manually. If the program works in this case, then with slight modification it could be applied to other collections, providing a valuable tool for parsing information. There is, though, one major roadblock: the Middle English dialect in which the letters are written.

Due to the Middle English dialect, a word can be spelled a variety of different ways, sometimes within the same letter. This can create problems when we need to find specific words or phrases. Though writing a program to account for these things can be difficult, I used three techniques to get around it, often in combination.

  • Create a list of possible spellings for a given word, so that whenever one of those spellings comes up, you know it’s that word. The advantage is that you’ll never pick up a wrong word by accident, so you can be 100% sure that every match is a word you want. The drawbacks are that if you miss a spelling you won’t pick it up, and that it can take time to sift through the letters to find the spellings.
  • Look for patterns within the word. In my case, the word “September” reliably begins with the letters “Sept”, and not many other words begin that way. By looking for words beginning with those letters, you can find spellings of September. Looking for capitalization can also be a useful tool. Problems arise if you have spellings that break the pattern, or if your pattern isn’t specific enough to that word.
  • Look for patterns around the word. This is connected to the last strategy but not exactly the same. In the Cely letters, dates are almost always presented in the form “the (number designation) day of (month)”, as in “the 3rd day of March”. What sticks out is the phrase “day of”, which doesn’t really have an alternate spelling aside from “daye of”, and that word combination doesn’t really appear in other contexts. Thus, by looking for it, we can find when a date is mentioned in a letter. The problems are the same as with the last strategy, with the addition that these kinds of patterns are rarer.
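The three techniques above can be sketched in a few lines of Python. This is an illustration rather than the program actually used on the Cely letters: the spelling variants of “haste”, the function names, and the sample sentence are all invented for the example.

```python
import re

# Technique 1: a hand-built spelling list (these variants are illustrative).
HASTE_SPELLINGS = {"haste", "hast", "hayste"}

# Technique 2: a pattern inside the word, here a capitalised "Sept" prefix.
SEPTEMBER = re.compile(r"\bSept\w*")

# Technique 3: a pattern around the word, "day of" or "daye of",
# capturing the word that follows (usually the month).
DAY_OF = re.compile(r"\bdaye? of (\w+)", re.IGNORECASE)


def words(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def mentions_haste(text):
    """True if any known spelling of 'haste' appears in the text."""
    return any(word in HASTE_SPELLINGS for word in words(text))


# Invented sample sentence, not a quotation from the letters.
sample = "Wryt at London in hayste the iij daye of Septembyr."
print(mentions_haste(sample))
print(SEPTEMBER.findall(sample))
print(DAY_OF.findall(sample))
```

In practice the techniques combine: the “day of” pattern locates a date, and the prefix pattern or a spelling list then identifies which month was captured.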

There are three other methods I considered but never implemented. The first was to look at string distance. There are programs in Python that can tell you the ratio of how much two words differ. Hypothetically, you could decide that if a word is 80% similar to your target, it’s that word. However, some of the spellings can be drastically different from their modern day ones, and if you lower the ratio too much you start picking up words outside of what you’re looking for. Since it’s too vague, I favored more specific methods.
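For readers who want to see the trade-off, here is a minimal sketch of the string-distance idea using `difflib.SequenceMatcher` from Python’s standard library. It is one possible way to compute a similarity ratio, not the method used in this project (which never implemented the idea); the candidate spellings and both function names are invented.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Ratio between 0 and 1 of how closely two words match, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_matches(target, candidates, threshold=0.8):
    """Return the candidates at least `threshold` similar to `target`."""
    return [word for word in candidates if similarity(target, word) >= threshold]


# Invented candidate words: some plausible variants, some unrelated words.
candidates = ["Septembyr", "Setember", "December", "sent"]

# A strict threshold keeps the variants and rejects the other month.
print(fuzzy_matches("September", candidates, threshold=0.8))

# A looser threshold starts to let "December" through as well.
print(fuzzy_matches("September", candidates, threshold=0.6))
```

The second call shows the problem described above: once the threshold is lowered far enough to catch unusual spellings, an unrelated word such as “December” is accepted too.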

 

The second was a sort of “translation” program. I noticed that the differences in spellings tend to follow certain rules. C is replaced by s, certain letters get doubled, and the letter e gets added to the end of words. Theoretically, if you reverse these rules for any Middle English word, you get the modern spelling, making data extraction significantly easier. The program would look at a word, make a list of possible rules that could be applied to it, then try every combination of the rules, checking to see if the result of a given combination was in the dictionary. It would take a while, but eventually you’d translate the letter, making it much easier to parse. I decided to try other techniques because I was unsure how effective a program like that would be, and wanted to try other methods before resorting to it.
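A rough sketch of what such a “translation” program might look like follows. Since it was never written for this project, everything here is an assumption: the three rule functions are guesses at how the rules named above could be reversed, the tiny `DICTIONARY` stands in for a real modern word list, and the test spellings are invented.

```python
import re
from itertools import combinations

# Hypothetical stand-in for a modern English dictionary.
DICTIONARY = {"set", "place", "good", "letter"}


def undouble(word):
    """Collapse doubled letters: 'sett' becomes 'set'."""
    return re.sub(r"(.)\1", r"\1", word)


def drop_final_e(word):
    """Remove an added final e: 'goode' becomes 'good'."""
    return word[:-1] if word.endswith("e") else word


def s_to_c(word):
    """Undo the c-to-s substitution: 'plase' becomes 'place'."""
    return word.replace("s", "c")


RULES = [undouble, drop_final_e, s_to_c]


def modernise(word):
    """Try every combination of rules; return the first dictionary hit or None."""
    word = word.lower()
    for size in range(len(RULES) + 1):
        for combo in combinations(RULES, size):
            candidate = word
            for rule in combo:
                candidate = rule(candidate)
            if candidate in DICTIONARY:
                return candidate
    return None


for spelling in ["plase", "sette", "goode", "wryt"]:
    print(spelling, "->", modernise(spelling))
```

Trying the smallest combinations first means a word is changed as little as possible. Even so, a larger dictionary would make false matches more likely, which is part of the uncertainty mentioned above.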

 

The third has to do with more sophisticated data extraction techniques. Using programs designed for data extraction, such as the Natural Language Toolkit (NLTK), a third-party module for Python, you could theoretically make the job easier. This gets complicated by the fact that these programs rely on tagging sentences with parts of speech and extracting information based on the structure of the sentence, a process hampered by the Middle English spellings. In addition, the NLTK module can be difficult to download and implement. Thus, I put it aside.

 

 

After developing methods to look at the letters, we can begin extracting information. The format of the letters can vary, and sometimes the information we want is simply not in the letter. But often, patterns hold true. This enables us to look for the first important piece of information: location. In other words, where the letter was sent, and where it came from. The former is a simple matter: there is an address line at the end of every letter, preceded by the word “Addressed:” (because of its consistent spelling, I expect this is a modern addition). Find that word and the phrase after it, and you have an end location. The sending location can be a bit more complex. Though there is no 100% consistent place for this information in the letters, at the end of letters there is often a “Writ at” statement, such as “Wryt at London.” As a bonus, these statements often have dates associated with them, such as “Wryt at London on the 3rd day of March.” By looking for the phrase “Writ at”, you can find this information. True, the phrase can appear in other parts of the letter, but the statement we’re looking for is usually at the end, so if we start looking from the end, we can reliably find what we’re looking for.
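The two location searches can be sketched as follows. This is a simplified illustration, not the original program: the regular expressions assume that “Writ” varies only in i/y and in its ending, the function names are hypothetical, and the sample letter is invented.

```python
import re

# The address line: everything after "Addressed:" up to the end of that line.
ADDRESSED = re.compile(r"Addressed:\s*(.+)")

# "Writ at <place>", allowing assumed variants such as "Wryt" or "Wryten".
WRIT_AT = re.compile(r"\bWr[iy]t\w*\s+at\s+(\w+)", re.IGNORECASE)


def destination(letter):
    """Return the text of the address line, or None if there is none."""
    match = ADDRESSED.search(letter)
    return match.group(1).strip() if match else None


def origin(letter):
    """Return the place in the LAST 'Writ at' statement, or None."""
    places = WRIT_AT.findall(letter)
    return places[-1] if places else None


# Invented sample letter, not a quotation from the collection.
letter = """I understand by your letter writ at Calais that the wool is sold.
Wryt at London on the 3rd day of March.
Addressed: To my brother at Calais"""

print(origin(letter))
print(destination(letter))
```

Taking the last match is how this sketch “starts looking at the end”: the earlier “writ at Calais” refers to a different letter and is passed over in favor of the closing statement.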

 

Now that we have a sense of distance and timing, what else gives us relevant information? Looking for mentioned dates can give us a sense of time frame. As I’ve said, we have a reliable way to find dates by looking for “Day of” statements, and by using combinations of techniques one and two we can find different mentions of months. Another helpful piece is looking for words that imply urgency, or reference sequences of events. Words like “Haste” and “Tidings” are good candidates. I also found looking for “Understand” to be a good method, since it is often used to describe events, such as, “I understand that you’re trying to make a deal.” We can write a program that returns the number of appearances of the words, and use that to create statistics. For the Cely Letters we got these results:

‘Haste’ mentions: 33/147

‘Tidings’ mentions: 22/147

‘Understand’ mentions: 85/147

Tidings and understand mentions within same letter: 15/147

If we want, we can also write a program that returns the words around each of these words (in this case I went with five words ahead and behind) to get a sense of the context of the word within the sentence.
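As a sketch of how such counts and context windows might be produced, consider the following. The spelling sets, function names, and three sample “letters” are invented for illustration, so the numbers printed have nothing to do with the Cely results above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def letters_mentioning(letters, *spelling_sets):
    """Count letters containing at least one spelling from EVERY given set."""
    count = 0
    for letter in letters:
        tokens = set(tokenize(letter))
        if all(tokens & spellings for spellings in spelling_sets):
            count += 1
    return count


def context(text, spellings, width=5):
    """Return each hit with up to `width` words before and after it."""
    tokens = tokenize(text)
    return [
        " ".join(tokens[max(0, i - width): i + width + 1])
        for i, token in enumerate(tokens)
        if token in spellings
    ]


# Illustrative spelling sets and invented letters.
HASTE = {"haste", "hast"}
TIDINGS = {"tidings", "tydyngs"}
UNDERSTAND = {"understand", "understond"}
letters = [
    "I understand that ye be in grete hast to selle the wool",
    "I have no tydyngs to send you",
    "I understand ye wold have tydyngs in hast",
]

print(f"Haste mentions: {letters_mentioning(letters, HASTE)}/{len(letters)}")
print(f"Tidings and understand: "
      f"{letters_mentioning(letters, TIDINGS, UNDERSTAND)}/{len(letters)}")
print(context(letters[1], TIDINGS, width=2))
```

Passing two spelling sets to `letters_mentioning` gives the “within same letter” kind of statistic, since a letter is counted only when both words appear in it.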

 

The last thing I looked for is something I call “Receive statements”. Sometimes, a letter is written in response to another letter, or the sender wants the receiver to know that they were told a specific piece of information. To acknowledge this, a letter will often have a phrase along the lines of “I received your letter written at x place on y day”. This gives us a direct sense of time periods, especially when compared to when the letter in question was written. We can find these statements by looking for the various spellings of “received”, then returning a text chunk that begins there and ends at either a date or after an arbitrary number of words (I chose 15). This way we account for formatting irregularities.
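One way to sketch the “Receive statement” search is shown below. It is an illustration under stated assumptions: the spellings of “received” are invented, a date is recognised only through the “day of” pattern discussed earlier, and the helper names and sample sentence are hypothetical.

```python
# Illustrative spellings of "received"; the real list would come from the letters.
RECEIVED_SPELLINGS = {"received", "receyved", "resayved"}
MAX_WORDS = 15  # cut-off used when no date is found


def clean(token):
    """Strip surrounding punctuation and lowercase a token."""
    return token.strip(".,;:").lower()


def receive_statements(text, max_words=MAX_WORDS):
    """Return chunks that start at 'received' and end at a date or the cut-off."""
    tokens = text.split()
    statements = []
    for i, token in enumerate(tokens):
        if clean(token) not in RECEIVED_SPELLINGS:
            continue
        end = min(i + max_words, len(tokens))
        # Stop early if "day of <month>" falls inside the window.
        for j in range(i, end - 2):
            if clean(tokens[j]) in {"day", "daye"} and clean(tokens[j + 1]) == "of":
                end = j + 3  # keep the word after "of", usually the month
                break
        statements.append(" ".join(tokens[i:end]))
    return statements


# Invented sample sentence.
sample = ("I receyved your letter writ at Calais the 3rd day of March "
          "and I thank you for it.")
print(receive_statements(sample))
```

When no date turns up within the window, the chunk simply runs to the word limit (or the end of the text), which is how the sketch copes with irregular formatting.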

After going through the process of extracting the data, we come away with a wealth of information about distance and timing, ready to be critically analyzed. On the surface the letters can be tedious and confusing to work through, and the use of programming to parse them allows us to pick up things that had a large chance of being missed. In addition, though no letter collection has the exact same format as the Cely letters, others share great similarities, and even if the programs already written cannot be directly applied, the techniques can be reimplemented to allow for quick and efficient information extraction. Overall, the use of programming languages can greatly aid our examination of letters and texts, teaching us more about travel in the medieval world.

Chronography’s Geography: What counts as Geographic Reference?

By Jesse W. Torgerson and Ethan Yaro

Note: This is the fourth in a series devoted to the project “Narrative and Geography in the Chronicle of Theophanes the Confessor”. First post here; second here; third here.

The approach we are describing in detail here allows us to artificially reconstruct, in our database, something of the passive geography that a ninth-century Byzantine (reading about her or his own City and Empire) would have been relying upon to follow the narrative of the Chronography.

Though the process of revealing this geography — and of explaining our methodology! — is painstaking, we find the direct impact of these decisions upon our results makes each of them quite fascinating. In this post, we continue the explication of our methodology for capturing the geography – or, to be more exact, the geographic references – of the Chronography of Synkellos and Theophanes.

Having described how – into what sort of sections – we decided to divide up the text content of the Chronography, here we explain what items we decided to “tag” as geographic references.

What to tag as a “geographic reference”?

A careful reading of the text immediately revealed that geographic references manifest themselves in a number of ways, some more explicit than others.

  1. Explicit Geographic References

Many geographic references are simple, explicit references to “mappable” locations: places such as cities & buildings; landforms such as mountains & rivers; political zones such as regions & districts, etc.

To give an example of how we would “tag” such items, consider the one-sentence example from the Chronography cited in the previous post:

AM 5796

Diocletian lived privately in his own city at Salon in Dalmatia while Maximianus Herculius lived in Lykaonia.

We have already “tagged” Diocletian as reigning emperor. If we were to now tag this sample sentence for its geography, we would tag (or code) the explicit geographic references: “Lykaonia”, “Salon” (in Dalmatia), and “Dalmatia” (itself).

AM 5796

Diocletian lived privately in his own city at Salon in Dalmatia while Maximianus Herculius lived in Lykaonia.

Even with these “explicit” geographic references, we had to make a subjective decision. Does “Salon in Dalmatia” count as a single reference, or as two?

Our project goals led us to tag “Salon” (in Dalmatia), and “Dalmatia” as two distinct references. Since our primary goal is to track how the text works with a reader’s mental associations, it is undeniable that the text calling attention to Salon’s Dalmatian location brings Dalmatia to mind;  for us it is not sufficient to exclusively specify the correct (Dalmatian) “Salon,” but not Dalmatia.

  2. Indirect Geographic References

A significant percentage of the geographic references we tag are not nearly so straightforward.

Some of these are indirect references like “the city,” or “that region,” in which reading the surrounding sentences determines exactly which city, or region is referred to.

An important secondary consideration here is the strength of an evocation. When the Chronography states “the city,” and means “Constantinople,” is that just as much of a geographic reference as if it had stated “Constantinople”? Should we somehow rate indirect geographic references lower since they do not set a specific place name before the reader’s eyes? Or, should we rate them higher since the reader needs to make the stronger mental effort to retain in memory which city or region is being discussed?

Our practice has been: having determined the place that these references mean, we tag indirect references just as though the text had stated the place itself. We have not differentiated for “strength of reference” in our database. “Salon,” “Constantinople,” or “that city” count for the exact same “weight” of geographic reference.
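To make the “equal weight” practice concrete, here is a small sketch of how tags could be counted. It does not describe the MaxQDA database itself: the record layout and the two sample records for AM 5937 (one naming Constantinople, one saying only “the City”) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical tag records: (entry, phrase in the text, place it refers to).
tags = [
    ("AM 5796", "Salon", "Salon"),
    ("AM 5796", "Dalmatia", "Dalmatia"),
    ("AM 5796", "Lykaonia", "Lykaonia"),
    ("AM 5937", "Constantinople", "Constantinople"),
    ("AM 5937", "the City", "Constantinople"),  # indirect reference
]

# Every tag adds exactly one to its place, whether direct or indirect.
weights = Counter(place for _entry, _phrase, place in tags)

for place, count in weights.most_common():
    print(place, count)
```

Because only the resolved place is counted, “the City” and “Constantinople” contribute identically to the total for Constantinople.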

  3. Vague Geographic References

A similar issue arises with vague, gesturing geographic references.

Consider two sentences from one entry:

AM 5885, AD 392/393

In this year the pious emperor Theodosius fought bravely against Eugenios at the passes to the Alps, and, after capturing him alive, executed him. … The most Christian emperor … ordering that bishops from the East should come to Rome for this, among whom was sent Akakios of Beroia.

“Passes to the Alps” surely counts as an explicit geographic reference, and it is easy enough to “tag” in MaxQDA as a reference to a mountain range. However – as we will discuss in detail in a future post – this sort of entry is extremely perplexing for one of our desired outputs: a map of all geographic references. What does “the passes to the Alps” look like on a map?

An even more difficult example arises in the second sentence. Plotting “Rome,” or “Beroia” is simple enough, but what about “bishops from the East”? Surely this is a “geographic” reference in that it evokes a region in the mind of the reader. But it is exponentially more difficult to map “the East” than even “passes to the Alps”: should we think of “the East” as shading the entire Eastern Mediterranean and Persia on a map? We also have to be able to decide what kind of a geographic reference this is in order to tag and categorize this reference within MaxQDA. For now we will also set this issue of categorization aside as the subject of our next post.

A second type of “vague” reference is when the current state of scholarship does not allow us to know exactly what is being referred to: we simply cannot be certain what each reference evokes.

For instance, consider this statement in the Chronography:

AM 5887 (AD 394/5)

In this year Arkadios, on being appointed autokrator, built the big portico opposite the Praetorium.

The Praetorium of Constantinople is understood to be located in the Southern curve of the round Forum of Constantine, but there is no known “Portico of Arkadios” in Constantinople.

There may well have been a portico, as yet unexcavated, heading South out of the Forum of Constantine; one could imagine this portico described as starting “opposite the Praetorium.” However, Arkadios’ known building activity within Constantinople is focused much further to the West, dominated by his Forum and famous Column. Thus the Chronography’s translator, Cyril Mango – a leading expert on the archaeology of Constantinople – was so doubtful as to the existence of an otherwise unattested Portico of Arkadios that he suggested:

Since Theophanes was making considerable use of his Alexandrian material at this period, this too may well have come from the Alexandrian source and so refer to Alexandria rather than Constantinople.
(p. 113, footnote 2)

What to do? Not only are we unclear where within Constantinople to place this reference to a “Portico of Arkadios,” we don’t actually know if this is Constantinople, Alexandria, or elsewhere. In this case, we decided to follow Mango’s lead, and mark this as a reference to Alexandria. There are many similar judgment calls that we have had to make in creating our database of geographic references.

One more such example – another instance where the simple lack of historical information we possess requires us to take significant interpretive liberties – is worth considering:

AM 5878,

… a small basilica … built at the old Basilica, near the Great one

The city of Antioch on the Orontes, the location of these buildings, is one of the most important and populous cities of the Eastern Roman empire and, as such, is often mentioned in detail in the Chronography. However, archaeology has had little chance to recover its topography.

We can deduce that this sentence almost certainly describes an extension to the pre-Constantinian basilica church (“the old Basilica”), near the famous Constantinian Octagonal Church (“the Great One”).

In these phrases we have, in total, three distinct references to the internal topography of Antioch on the Orontes. The “Old” (pre-Constantinian) basilica would receive two tags, thus:

  • the phrase “a small basilica” (tagged as “The Old Basilica” as an extension on said church)
  • the phrase “the old Basilica” itself (tagged of course as “The Old Basilica”)
  • the phrase “the Great one” (tagged as the “New (Constantinian) Basilica”)

By now the point should be clear: the work of tagging geographic references in a narrative text is much more heavily interpretative than might initially be supposed.

  4. People and Events as Geographic References

This category of interpretive decisions captures several different types of items that we have determined are geographic references, but which other readers may think are not.

Potentially the least controversial of these decisions was to tag events tied to specific places.

The most obvious examples of these are church councils, such as the Council of Nicaea in 325. It is true that a mention of the “Council of Nicaea” in an entry hundreds of years after it happened is not a direct reference to the physical city of Nicaea. Nevertheless, while we grant that the reference is indirect, we find the point that “the Council of Nicaea” recalls the city of Nicaea to the mind of the reader compelling enough to tag such phrases as geographic references.

Our decision to identify people groups as geographic references opens up a second category of interpretative tagging. Some might consider people groups as only very tangentially “geographic,” but we have taken instances of “the Gauls” (for example) as geographic references to Gaul (Gallia).

Our justification is that in almost all cases the name of a people group (Gauls/Gallia, Sklavenoi/Sklavinia, Khazars/Khazaria, etc.) is bound up with the name of the land in which that people lives.

As with our earlier discussion, the reasoning here is based on our central goal: to capture the place-based references that the text would evoke in the mind of its reader.

A third category of borderline decisions was to tag all references to titles that were themselves tied to a location. Most of these references are explicit, such as citations of episcopal figures, whose very titles are an undeniable reference to a place: “Peter, Bishop of Alexandria.” Thus, every mention of a bishop (and many mentions of priests or monks) ends up counting as a geographic reference. This reasoning process also applies to a number of secular officials.

As a final example of how the decision to include all these sorts of items in our database works in practice, we can consider the first sentence in the following entry.

AM 5937

In this year Kyros, the City prefect and praetorian prefect, a very learned and competent man, who had both built the city walls and restored all Constantinople, was acclaimed by the Byzantines in the Hippodrome, in the presence and hearing of the emperor [follows]: “Constantine built [the city], but Kyros restored the City!”

“Kyros the City Prefect” is tagged with the office of “City Prefect,” since this office cannot be understood without reference to Constantinople itself. Likewise, Kyros’ second posting as “Praetorian Prefect” inevitably evokes the Praetorium within Constantinople: in our understanding a reference to a location that is just as strong as “the city walls” or “the Hippodrome.” Furthermore, and more controversially, the final reference to “Kyros” alone is given two geographic references. Reasoning that the reader now understands “Kyros” as “Kyros the City Prefect and Praetorian Prefect,” this second reference to Kyros would be tagged as both “City Prefect” and “Praetorian Prefect.”

We can also use this example to tie in some of this post’s previous points.

“The Emperor” here would be tagged as Theodosius II (the reigning emperor).
“Constantine” would be tagged as Constantine I. Even though Constantine is not the reigning emperor, tagging emperors comprehensively allows us to track their relative importance throughout the Chronography.
“The City” and “Constantinople” would be tagged identically.
“The Byzantines” would also be a reference to “Constantinople” since it evokes the people that live in that location, the city of Byzantium.
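One way to picture these decisions is as a list of phrases, each carrying one or more tags. The Python sketch below is purely illustrative: the `TAGGED_PHRASES` table, the `count_tags` helper, and the tag labels are hypothetical shorthand, not an export from MaxQDA, and only the phrases whose tags are spelled out in the discussion above are listed.

```python
from collections import Counter

# Hypothetical shorthand: (phrase, kind, tags). Only phrases whose tags are
# named in the discussion of AM 5937 appear here; this is not MaxQDA output.
TAGGED_PHRASES = [
    ("Kyros, the City prefect", "geography", ["City Prefect"]),
    ("praetorian prefect", "geography", ["Praetorian Prefect"]),
    # The later, bare mention of Kyros carries both offices.
    ("Kyros", "geography", ["City Prefect", "Praetorian Prefect"]),
    ("Constantinople", "geography", ["Constantinople"]),
    ("the City", "geography", ["Constantinople"]),
    ("the Byzantines", "geography", ["Constantinople"]),
    ("the emperor", "emperor", ["Theodosius II"]),
    ("Constantine", "emperor", ["Constantine I"]),
]


def count_tags(phrases, kind):
    """Count each tag of the given kind once per phrase, with equal weight."""
    counts = Counter()
    for _phrase, phrase_kind, tags in phrases:
        if phrase_kind == kind:
            counts.update(tags)
    return counts


geography = count_tags(TAGGED_PHRASES, "geography")
emperors = count_tags(TAGGED_PHRASES, "emperor")
print(geography)
print(emperors)
```

In this sketch an indirect reference such as “the Byzantines” adds exactly as much to the “Constantinople” count as the word “Constantinople” itself.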

It should now be clear how we arrived at the statement with which we began our previous post, that approximately 20% of the text can be categorized as making geographic references. In the above example of 51 words, we tagged 16 words (31%) as “geography,” and 3 words as references to an emperor.
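The percentages here are simple proportions of tagged words to total words. As a minimal sketch, using only the counts reported for this one example (the `share` function and the variable names are hypothetical):

```python
def share(tagged_words: int, total_words: int) -> float:
    """Return the tagged share of a passage as a percentage."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100 * tagged_words / total_words


# Counts as reported in the text for the AM 5937 example.
TOTAL_WORDS = 51
GEOGRAPHY_WORDS = 16
EMPEROR_WORDS = 3

geography_share = share(GEOGRAPHY_WORDS, TOTAL_WORDS)
emperor_share = share(EMPEROR_WORDS, TOTAL_WORDS)

print(f"geography: {geography_share:.0f}%")  # geography: 31%
print(f"emperors: {emperor_share:.0f}%")     # emperors: 6%
```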

Conclusion: the mind of the Reader

As a transition into our next post, in which we will break down how each of these tags would be categorized, and why, here is an image of these overlapping tags in our MaxQDA database:

In all of our decisions about indirect, vague, and other implicit geographic references, we have opted to tag an item as “geography” when we think it is viable to assume that an attentive reader would make a connection between a word (or phrase), and a place. The image above provides an analytical map of how we are using MaxQDA to try to capture something of the associative, overlapping references to place and space that the mind of an attentive reader would categorize as they proceed through the text. Our procedures are directly derived from our primary goal: to capture all of the “place-references” swirling about the mind of a reader of the Chronography.

It is worth recalling an important point made earlier in this post: once we have determined that a phrase is a geographic reference it receives the same tagging “weight” as any other reference, no matter how “indirect” it may seem. In our database, all geographic references are created equal.

The driving principle behind our methodology is to tease out references that would otherwise be lost upon the modern reader. By running the risk of possibly over-emphasizing geography, we believe we gain a more careful reading and a fuller appreciation of the density of references that can become a hazy fog for even the most seasoned Byzantinist. Our approach allows us to artificially reconstruct, in our database, something of the passive geography that a ninth-century Byzantine, reading about her or his own City and Empire, would have been relying upon to follow the narrative of the Chronography.

Chronography’s Geography: Software & Database Structure

By Jesse W. Torgerson and Ethan Yaro

Note: This is the third in a series devoted to the project “Narrative and Geography in the Chronicle of Theophanes the Confessor.” Our first post considered what the question of place in narrative means for historical research, and our second the question of mapping ‘space’ vs. ‘place’. A subsequent post will explain what we consider ‘geography’ in the Chronography.

When we began this project, we had a vague inkling that it might prove productive to analyze the geographical content of the Chronography of George Synkellos and Theophanes the Confessor.

Despite having read the Chronography many times, when we began to actually hunt, line by line, for “geography,” we quickly realized that we had under-estimated the extent to which the Chronography hung on such references. We also realized how difficult it was to determine what, exactly, counted as a geographic reference.

In a previous post we hinted at what we have already discovered, stating “in an exploratory attempt to determine the percentage of the text’s words that were explicitly devoted to ‘geography,’ we came up with the shockingly high figure of 20%.”
We then promised to explain what we meant by this and how we arrived at this number.

The next three posts on our Narrative and Geography project constitute that explanation. We will attempt to explicate our methodology for capturing the way geography works – or, to be more exact, the way geographic references work – over the course of the narrative of the Chronography of Synkellos and Theophanes.

Choosing an Analytic Software

Based on the advice of lab “network” member Jason Simms (Lafayette College), we opted to use MaxQDA to “capture” the geography in the Chronography, and then to perform initial analysis on this data.

Using MaxQDA, we set out to:

  • tag (in MaxQDA’s terminology, to “code”) all geographic references
  • categorize each reference
  • track where references occurred in a way conducive to comparative analyses

MaxQDA’s selling point for this project was the degree of flexibility it allows us in manually coding each section of the text, from extended sections down to specific one-word references, in exactly the way we wanted. This has proven analytically productive especially for the second goal (above).

The Goal: Tracking Geographic References

As argued in the previous post, we started with the premise that a chronography establishes its own geography for a reader. That is, while a Chronography may look to us, today, like some form of a chronological encyclopedia (“I wonder what happened in …”), we believe the text rewards readers who (at the very least) read significant sections straight through and – even more – actually read the work from cover to cover as though it contained a narrative and argument that could be, or need be, followed.

With this premise, our goal in tracking geographic references is to better follow, or re-create, a ninth-century reader’s experience with the Chronography. If a ninth-century Constantinopolitan sat down and read through the Chronography, what regions of the empire would be consistently dwelt upon? What regions would be gradually abandoned? What regions would come into focus? Which regions would be associated with which historical characters or emperors? Which regions would be associated with which conflicts – whether military or philosophical or political? Where, in short, would a reader see, in their mind’s eye, the different parts of the story play out?

We thus designed our methods with the over-arching goal: to make the mass of place-specific references coherent to twenty-first century readers in something closer to the way they would have been for a ninth-century reader, to better approximate the mental image that the Chronography might have formed in an attentive reader’s mind.

What our methodology cannot do – of course – is to recreate the associations a reader would already have had with any specific place. Our methodology seeks to simply plot the associations that the Chronography makes internally, for itself, as though in isolation, all to find out:

What is the geographic world that the Chronography actively created for its readers?

Questions and Procedures

In order to determine what proportion of the text was concerned with geography, our initial task was to determine what constituted a geographical reference. This project began in the Summer of 2016, and so our thinking has evolved somewhat as we carried out the research.

In describing our current methodology, we can now distinguish three central issues:

First, how – into what sort of sections – do we divide up the text content?

Second, how do we decide what items we “tag” as geographic references?

Third, how do we go about categorizing these “tagged” references?

We will deal with the first in this post, the second and third in the posts that follow.

How to break down the text and group the geographic references?

Before actually tagging any specific geographic references, we had to decide how we would group (or, from another perspective, separate) them, once we had them.

What constitutes a “textual unit” or “section” of the text that we can use for comparative analysis (i.e., that would allow us to viably compare a section X of the text with a section Y)?

Deciding how to divide the text, how to group the geographic references, is a decision with consequences for the entire project, ultimately determining the research questions our database can answer.

Realizing that the analytical questions we will be able to ask were at stake, we focused on what we conceived to be our ultimate goal.

Since our goal, as described above, is to better understand how the text works with the mind of its reader (reading with, rather than against, the grain of the text), we wanted our groupings to reflect the most explicit divisions of the text itself.

  • Group by Yearly Entry

The most obvious way to divide the Chronography, and thus the geographic references we find, is by the Chronography’s own yearly entries.

What does this mean for our data-gathering process?

To use a one-sentence example from the chronicle:

AM 5796
Diocletian lived privately in his own city at Salon in Dalmatia
while Maximianus Herculius lived in Lykaonia.

In this citation, any geographic references (e.g., to Salon, and Lykaonia) would be linked by falling under AM 5796.*

*As a brief aside for those who have not read the work, the Chronography organized entries primarily by “Years of the World” (Greek: κόσμου ἔτη), conventionally expressed in scholarship by the abbreviation “AM” from the Latin “Anni Mundi.”

This seemed to us a fairly straightforward and uncontroversial decision.
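Grouping by yearly entry amounts to attaching a year label to every reference and then collecting references under that label. The following Python sketch assumes a hypothetical list of (year, place) pairs; the names `references`, `group_by_year`, and `years_mentioning` are ours for illustration and do not describe MaxQDA’s internals.

```python
from collections import defaultdict

# Hypothetical records: (yearly entry, place named in that entry).
references = [
    ("AM 5796", "Salon"),
    ("AM 5796", "Dalmatia"),
    ("AM 5796", "Lykaonia"),
]


def group_by_year(refs):
    """Collect place references under the yearly entry in which they occur."""
    by_year = defaultdict(list)
    for year, place in refs:
        by_year[year].append(place)
    return dict(by_year)


def years_mentioning(refs, place):
    """Return the sorted yearly entries that mention a given place."""
    return sorted({year for year, name in refs if name == place})


print(group_by_year(references))
print(years_mentioning(references, "Salon"))
```

The same records answer both questions: which places appear under a year, and under which years a place appears.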

As an added benefit, there are some significant differences in what content falls under which years between the earliest Greek manuscripts (Paris Grec 1710 vs. Oxford Christ Church College Wake Greek 5 vs. Vaticanus Graecus 155). Dividing geographic references by year will allow us, in the future, to tweak the database to reflect the content of each of these individual manuscripts and so compare whether the change in reckoning between these manuscripts changes the function of the geographic references in each.

  • Group by Reigning Emperor

The science of late antique and medieval chronography was primarily built around coordinating reigns of emperors, kings, and bishops.

It was only once these lists of reigns had been coordinated that a “Year of the World,” or a “Universal Year” could be asserted.

Thus, the most obvious way to establish a comparative division of the Chronography was to also divide the text by reigning emperor.

In practice, this meant that not only did we divide the text into the sections that corresponded to each Roman emperor’s reign, we also tagged each mention of each emperor in the text itself, in the same way that we “tagged” places. This allows us to establish a “geography” for each emperor on two levels.

First, there is the general geography for each emperor’s reign, in which all geographic references under, for instance, Diocletian, are simply a single group.

Second, by tagging each emperor as a historical character, MaxQDA’s analytical functions allow us to track the specific geography with which these “main characters” of the narrative are most closely associated.

This second method allows us to also apply our “geographic references” data as supplements to more narrative analyses that might want to, for instance, ask whether there are certain geographic trends that correspond to a praiseworthy or blameworthy emperor.

Thus, by tagging emperors in these two manners, we are able to track how geographic references change, compare, or contrast between emperors’ reigns, between emperors as characters in the narrative, as well as between all specific yearly entries.

To Conclude:

If we consider the example sentence, above, the entire sentence (and the rest of the entry) would first be tagged as “AM 5796.” This means any specific geographic reference is also coded for this year: if we pulled all references to Salon (for example), we would also know that one reference occurred here, in AM 5796.

In addition, this entry and all other entries for the reign of Diocletian (AM 5777-5796 inclusive), would be tagged as “Diocletian.” This means we are also tracking all geographic references made under Diocletian’s reign as a coherent group, attributing them all to that emperor’s reign. This allows MaxQDA to immediately give us a picture of the “geography” used to tell the story of Diocletian’s reign.

Finally, the appearance of Diocletian’s name in the text proper would mean we tag this single word in AM 5796, “Diocletian,” as a direct reference to the reigning emperor. When we pull references with a close association of grammatical proximity to “Diocletian,” we would find Salon, Lykaonia, and Dalmatia among the results.
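The three layers of tagging just described (yearly entry, reign, and emperor as a character) can be sketched as fields on a single record. This is a hypothetical model, not MaxQDA’s data format: the `Sentence` class and the three query functions are illustrative names, and “grammatical proximity” is simplified here to “named in the same sentence,” which is an assumption of this sketch. Only Diocletian is listed as a tagged emperor, since his is the only such tag discussed for this sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    year: str    # yearly entry code, e.g. "AM 5796"
    reign: str   # emperor whose reign covers the entry
    emperors: list[str] = field(default_factory=list)  # emperors named in the sentence
    places: list[str] = field(default_factory=list)    # geographic references in the sentence


# The example sentence from AM 5796, encoded by hand.
CORPUS = [
    Sentence(
        year="AM 5796",
        reign="Diocletian",
        emperors=["Diocletian"],
        places=["Salon", "Dalmatia", "Lykaonia"],
    ),
]


def places_in_year(corpus, year):
    """All geographic references coded under one yearly entry."""
    return [p for s in corpus if s.year == year for p in s.places]


def places_in_reign(corpus, reign):
    """All geographic references falling within one emperor's reign."""
    return [p for s in corpus if s.reign == reign for p in s.places]


def places_near(corpus, emperor):
    """Geographic references in sentences that name the emperor directly."""
    return [p for s in corpus if emperor in s.emperors for p in s.places]


print(places_in_year(CORPUS, "AM 5796"))
print(places_in_reign(CORPUS, "Diocletian"))
print(places_near(CORPUS, "Diocletian"))
```

With a single sentence all three queries return the same places; across a whole reign they would diverge, since most sentences within a reign do not name the emperor directly.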

We believe these analytical divisions not only correspond to the explicit way in which the Chronography is organized, but also correspond to the substantial content, much of which has to do with assigning praise or blame to specific emperors. This latter connection will allow our tagging of geographic references to not only tell us something about how geography – in and of itself – works in the Chronography, but will allow us to incorporate these findings in arguments about how to interpret, or read, the text and its polemic.

Having established our means of dividing up the text of the Chronography, in our next post on methodology we will turn to how we determined which words and phrases to count as geographic references.