Connecticut Digital Humanities Conference: Traveler’s Lab Presentations

By Vasilia Yordanova

On February 22, students and professors collaborating on the CDER project and other digital humanities research projects at Wesleyan presented their work at the 2025 Connecticut Digital Humanities Conference at Central Connecticut State University. Will Markowitz, Arushi Khare, Akram Elkouraichi, and Professor Torgerson spoke about their progress on the CDER project. They began by explaining its goal: accumulating data from many sources into one digital platform, making relationships between data easier to access and facilitating interaction between academic disciplines. Students research sources, incorporate them into the platform, and compile data there, with Professor Torgerson's guidance.

Will explained the NodeGoat platform, which the CDER project uses to accumulate and centralize data in an accessible digital format. Then, Akram discussed linked data and his work on modeling the Istanbul walls. Will spoke about relational data and examining relationships between different kinds of data, including geographical relationships (as the project is space-based). He demonstrated how these relationships appear visually in NodeGoat. Arushi described the work of linking seals from the Dumbarton Oaks collection in Georgetown to an open database of people living in Constantinople and to official buildings where offices would have been stationed. The goal is to link information extracted from sources back to those sources in NodeGoat. 

Arla Hoxha and Zaray Dewan spoke about their work on the Chronicles project, and students and professors working on the Life of Milarepa and Chinese language theaters in North America also presented on their recent work at the conference.

CDER Project update: Mapping Byzantine Seals and People

By Alex Williams

At the time of our last blog post (May 2024) about the Constantinopolitana: A Database of East Rome (CDER) project, we celebrated the creation of a prototype database consisting of multiple types of objects and artifacts from across Constantinople. Each object type had about 20-100 data points (to read more about this process, check out our last blog post!). Of course, much of this data is integrated into CDER directly from preexisting datasets, many of which have thousands of entries.

With permission from the owners, we want to develop a methodology for adding these items to CDER, to help find wider connections than any one institution could create or develop on its own. This involves combining large datasets, a more technical task than the prototype, as it requires extensive data cleaning and preparation.

This summer, I worked on combining existing data on Byzantine lead seals with existing prosopographic data (data describing individuals alive during the Byzantine empire). 

Datasets and Process

Byzantine lead seals were used to seal letters sent across the empire. These seals contain information about the sender, such as their name, title, and occupation, offering insights into people and the bureaucracy. Imperial titles, known as dignities, indicated a seal owner's place in the imperial hierarchy. An owner's office, or occupation, also appears on a seal.

For this stage of the project, we worked with roughly 16,000 seals from the Dumbarton Oaks (DO) Byzantine Seal collection, in collaboration with Jonathan Shea. This collection has already been digitized. A crucial part of integrating this data into our model was making the seals data relational, meaning that some parts of the data (in this case dignities and offices) get tables separate from the seals table.
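The idea of splitting dignities and offices into their own tables can be sketched in a few lines of Python. The field names and values below are hypothetical illustrations, not the actual DO catalog schema:

```python
# Illustrative sketch of making flat seal records relational: dignities
# and offices each become a lookup table, and seals reference them by key.
flat_seals = [
    {"id": "seal-1", "dignity": "patrikios", "office": "strategos"},
    {"id": "seal-2", "dignity": "patrikios", "office": "dioiketes"},
    {"id": "seal-3", "dignity": "anypatos",  "office": "strategos"},
]

# Each distinct dignity/office appears once, with a numeric key.
dignities = {d: i for i, d in enumerate(sorted({s["dignity"] for s in flat_seals}))}
offices   = {o: i for i, o in enumerate(sorted({s["office"]  for s in flat_seals}))}

# The seals table now stores keys instead of repeated strings.
seals_table = [
    {"id": s["id"],
     "dignity_id": dignities[s["dignity"]],
     "office_id":  offices[s["office"]]}
    for s in flat_seals
]
```

Storing dignities and offices once, rather than as repeated text on every seal row, is what lets a platform like NodeGoat treat "all seals with the dignity patrikios" as a single linked relationship rather than a string search.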

We chose to work on data regarding people and seals in conjunction because of the large, preexisting, and already partially connected datasets. Byzantine prosopographic data has a wide overlap with seals, some of which is already documented. This is because the prosopographies use seals as evidence for the existence of people. For example, we might know new information about a person from a seal, or that might be the only record that a person existed. With this project, our goal was not to create insights or establish new ‘readings’ of the seals to create connections with people, but rather to digitize and store existing connections in a database, in addition to creating a model for incorporating other types of data into CDER. 

For the data on people, I worked with the Prosopography of the Byzantine World (PBW) database, covering 1025-1180 AD. I also did some experimenting (hopefully more to come) with the Prosopography of the Middle Byzantine Period (PMBZ), covering 641-1025 AD.

There is already some data overlap between the DO seals collection and the PBW. Much of my time was spent cleaning the data and making it consistent and understandable to our model: each row has to contain the same columns, and each column must hold its information in a similar format across rows. We also used a range of text analysis strategies to extract information from how it was originally formatted in a sentence, phrase, paragraph, or description, and learned methods for importing into Nodegoat (the digital humanities software we are using to store the data), as well as methods for making connections technically.
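As a small, hedged example of that kind of extraction, a regular expression can pull structured fields out of a free-text description. The description format below is invented for illustration; the real DO records are formatted differently:

```python
import re

# Hypothetical catalog-style description, used only to illustrate
# pattern-based extraction during data cleaning.
description = "Seal of John, patrikios and strategos of Thrace (11th century)."

name_match    = re.search(r"Seal of (\w+)", description)
century_match = re.search(r"\((\d+)(?:st|nd|rd|th) century\)", description)

owner   = name_match.group(1) if name_match else None
century = int(century_match.group(1)) if century_match else None
```

Once extracted this way, the owner and century land in their own columns in a consistent format, which is exactly what the model requires of every row.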

Visualizations

Below are some graphs, with descriptions of what is possible with this data as a database or dataset, that point toward the next steps for analysis.

I want to preface that these graphs do not give a perfect, or even good, representation of what's going on in history, or even in the seals dataset. First, the number of seals is not necessarily representative of the actual number of offices or dignities of a certain type, or of the number of people in these roles. Some offices might have been sending letters more often (so that office would be overrepresented on seals). In addition, visualization is complicated by the way seals are dated. We know some of the seals are from a certain set of years, and many can be dated to a single century. But for some seals we are not sure of the exact century, and so they are dated to two or more centuries. In the graphs below, I wanted to avoid double counting the seals, so the date used is the first date in the estimate. For example, both a seal we know is from the eighth century and one we know is from either the eighth or the ninth century would be represented in these graphs as part of the eighth century.
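The counting rule described above can be captured in a short helper. The date-label format here is illustrative, not the DO catalog's own notation:

```python
import re

def first_century(date_label):
    """Map a seal's date estimate to a single century for plotting.

    A seal labeled "8th c." and one labeled "8th-9th c." both count
    toward the eighth century, so no seal is counted twice.
    """
    centuries = [int(n) for n in re.findall(r"(\d+)(?:st|nd|rd|th)", date_label)]
    return min(centuries)
```

Taking the first (earliest) century of the range keeps every seal in exactly one bin, at the cost of skewing uncertain seals slightly early, which is part of why these graphs are offered as starting points rather than conclusions.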

Figure 1

Figure 1 depicts the number of seals over time, with colors representing the offices of the seal’s owner. Figure 2 shows the seal count for the top six overall dignities over different centuries.

Figure 2

 

Figure 3

Figure 3 focuses on the dignity 'Patrikios', which was high ranking in the 8th-10th centuries before losing importance through the 11th (Shea, 2020; Kazhdan, 1991). It shows the different offices on seals with the dignity Patrikios.

A historian can use these visualizations to understand how to develop their inquiry. For example, a historian interested in the hierarchy of a specific office such as the dioiketes could use a visualization such as Fig. 3 to understand which other offices ranked similarly over time. 

Figure 5

Figure 5 shows a section of a network graph for the dignity Patrikios. The largest other nodes are the dignities 'imperial protospatharios' and 'anypatos', as well as the office 'strategos', the name for a military general. You can also see tight clusters of seals in the graph: these are made up of parallel and related seals, some of which are identical. Some broader clusters might be hints to explore certain connections further. The connection with people (white dots in Fig. 5) could allow historians to better date seals, if there are other sources about a specific person, or to understand an individual's trajectory through offices and dignities with more context.

Next Steps

There are still some technical steps to finish up this part of the project, consisting of changing data types and storage methods to store values more efficiently. We are also planning to connect the seals data to the PMBZ. These connections are slightly more complicated, as the PMBZ does not contain direct references to the new format of seals, so bibliographic information on each seal involved has to be extracted, normalized, and then matched to bibliographic information in the DO collection. For the lab as a whole, this semester we will work on building structure for other objects (statues), as well as continuing our focus on buildings and incorporating location and geographical information into the data.

Note: This blog post focuses more on the conceptual aspects of the project. If you are interested in any technical details, reach out to apwilliams@wesleyan.edu. 

Summer 2024 Chronicles Project Update

By Arla Hoxha

During summer 2024, Lab Manager Arla Hoxha ’25 continued developing the Traveler’s Lab Comparing Chronicles Project through a QAC (Quantitative Analysis Center) Summer Fellowship. Throughout the summer, she experimented with different statistical methods and software and explored how new methods might be used to better compare different manuscripts of the Annals of Fulda. The summer research culminated in a poster presentation session summarizing the progress of Chronicles up to that point, along with visual representations of our work.

The presentation started by defining the event unit for a general audience unfamiliar with our work. In recent years, there has been an effort in the field to shift the focus from the chronological development of historical texts toward the development of the narrative. Events do not always appear in a narrative in chronological order; they are compiled by an author who chooses which events to include and how to arrange them. Studying chronicles using 'years' as units reveals little about the chronicle and even less about its author, as the year is too broad a unit to capture the nuance of meaning in the language and the way the text is ordered. The intervention of the Chronicles Project so far is an alternative unit of observation: the event, defined as a string spanning from a few words to several paragraphs, with a central theme (event type) and consistent named entity tags (characters and setting), terminating with a change of temporal identifiers or agents in the narrative. We expand on these definitions in our previous methodologies.

Using a narrative-focused approach to the study of chronicles and the unit of the 'event', we explore chronicle entries from the Annals of Fulda (the text which continues to be our main focus) and determine how events differ at the manuscript level. The focus of the summer research was reducing events to their differentiating components and using different text analysis tools to compare these components across events to determine event similarity. We experimented with Python text mining libraries such as spaCy, an open-source library for Natural Language Processing (NLP), to filter the entries. The program we developed takes a CSV file with the event titles, years, and manuscripts. It splits each title into its components, uses spaCy's tokenization to tag each component with its part of speech, and then tries to match entries based on named entities, verbs, and so on. This kind of filtering can also easily be accomplished through Nodegoat. The idea behind using event titles for the comparison is that they are supposed to capture the event and use specific, representative verbs from it.
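The shape of that pipeline can be sketched without spaCy itself. In the stand-in below, a hand-made entity list replaces spaCy's named entity recognition, and titles are grouped by year and entity to surface candidate matches across manuscripts; the rows and entities are toy examples, not our actual data:

```python
import csv
import io
from collections import defaultdict

# Toy CSV of event titles; in the real pipeline the rows come from a
# Nodegoat export and spaCy supplies the tagging.
data = (
    "title,year,manuscript\n"
    "Northmen attacked,885,2\n"
    "Northmen plundered,885,3\n"
    "Pope Hadrian dies,885,2\n"
)

KNOWN_ENTITIES = {"Northmen", "Pope Hadrian"}  # stand-in for NER

def leading_entity(title):
    # Naive replacement for spaCy NER: match a known entity at the start.
    for ent in KNOWN_ENTITIES:
        if title.startswith(ent):
            return ent
    return None

# Group titles by (year, entity); any group with more than one row is a
# candidate for the same event appearing in different manuscripts.
groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(data)):
    groups[(row["year"], leading_entity(row["title"]))].append(row)

candidates = {key: rows for key, rows in groups.items() if len(rows) > 1}
```

In the real program, spaCy's part-of-speech tags separate the entity from the verb instead of the startswith check used here.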

An interesting function spaCy offers is comparing two words and returning their similarity as a score (a cosine similarity model), which can be used to compare two similar verbs in event titles. This could match events that are potentially the same although they have been labeled differently, sidestepping the issue of human error. 'Northmen attacked' and 'Northmen plundered' do not have the same label, nor are their passages textually the same. But we can filter events by year, look for different manuscripts, and differentiate between events based on their type categorizations; checking for a cosine similarity above a certain threshold can help us determine that 'plundered' and 'attacked' have similar meanings, and therefore that 'Northmen plundered' and 'Northmen attacked' refer to the same event. However, this functionality comes with its own problems: it is limited by the library's vocabulary, and its usefulness is undermined by a lack of accuracy. Moreover, even though the verbs used in the titles are important, this method overemphasizes the way events are titled over other elements. Accuracy can be increased by training a model on data specific to our project. For now, we determine whether two verbs are similar enough with a human reader, a slower but more accurate process. Although this is beyond the scope of the project for now, it might be interesting in the future to train a model specific to the Chronicles Project, which could prove useful in automating part of the process of detecting the same event.
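The underlying computation is simple to show with toy vectors. spaCy's word vectors are learned and high-dimensional, but the similarity score is the same cosine formula; the three-dimensional vectors and the 0.8 threshold below are invented for illustration:

```python
import math

# Toy three-dimensional "word vectors" standing in for spaCy's.
vectors = {
    "plundered": [0.9, 0.1, 0.3],
    "attacked":  [0.8, 0.2, 0.4],
    "died":      [0.1, 0.9, 0.2],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

THRESHOLD = 0.8  # illustrative cutoff, not a value settled on by the project

def same_action(verb_a, verb_b):
    return cosine(vectors[verb_a], vectors[verb_b]) > THRESHOLD
```

With real word vectors the scores are noisier, which is the accuracy problem discussed above: the threshold that merges 'plundered' and 'attacked' may also merge verbs a human reader would keep apart.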

Returning to the main topic of comparing events: two entries are the same event as long as they describe the same occurrence, that is, they share the same event type (see Diana Tran's Event Type methodology here) and the same named entities. Whether they are textually identical is less important. We found events describing the same occurrence by filtering (through the method described above as well as Nodegoat filters) for events with the same title occurring in the same year but in different manuscripts. An instance of this is 'Pope Hadrian dies' in year entry 885 for both manuscripts 2 and 3 of the Annals of Fulda:

The passage differs between manuscripts, but the central idea captured in the title remains: 'Pope Hadrian died.' Despite the first event including more details surrounding the death of Pope Hadrian, and the difference in length between the two entries, both refer to the same event. Beyond the categorization under the same title, the events in both manuscripts are cataloged under the event type 'Birth/Death', and both have 'Pope Hadrian' listed as a principal actor. The way the event is named helps communicate the ideas of the passage and identify events correctly. Here we see how Event Types can be a powerful tool in determining event similarity as well as in understanding the distribution of events across time. We believe that categorizing events by event type will increase the accuracy of determining 'same' events.

Of course, over-relying on Event Types carries the caveat of the human error built into the categorization. We observe that chronicle year entries with largely the same events can have different distributions of event type tags:

The data in use is the cross-referenced chronicle transcript from the different manuscripts of the Annals of Fulda stored through the Nodegoat environment. The parsing of the data and visuals were completed using Google Sheets for data management and plotting libraries in R.

What is the importance of determining which events are the same across manuscripts and chronicles? By pointing out the similarities between events, we are able to discern their differences as well, and to start asking questions about authorship and the context in which different manuscripts emerged. The same methodology we have been using on events from different manuscripts of Fulda will in the future be applied to events from different but overlapping chronicles.

Through this summer's work on determining event similarities, we found overlap in a few events from different manuscripts. We attempted to tag text from different manuscripts describing the same event under the same event tag, listing the passages from different manuscripts under the same event object entry. Perhaps the most important result of the summer was noticing the problems with our old model of event categorization. The old model required the passage to be written out in the event description; for an event appearing in two manuscripts we would have to do this for both passages, and so on, depending on the number of manuscripts describing the same event. A future goal is to cross-reference the English text with the original Latin, which would require listing yet another passage under the event entry. We realized that this methodology would become increasingly unsustainable. Moreover, the work of comparing chronicles this way made certain redundancies in our data apparent. Most notably, named entities appeared twice: as tags in the chronicle object and cross-referenced in the object descriptors of events.

To reduce the redundancy in our data and to make the process more efficient moving forward, we implemented a mass restructuring. The new objects are as follows:

Chronicle Entry is now the object that contains passages. Passage is the object where named entities are tagged (previously this was done in Chronicle Entry). Events are now linked to the chronicle entry through the Passage object. The passage is cross-referenced in the object description of the event, but the event is not tagged in Passage, to reduce redundancy. Named entities, on the other hand, appear as tags in Passage but are not cross-referenced in Event, because Passage already contains them and Passage is cross-listed in Event.
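The direction of those links can be made explicit in a conceptual sketch. Nodegoat objects are configured through its interface, not through code, so the classes and field names below are only an illustration of the structure, not part of the system:

```python
from dataclasses import dataclass, field

@dataclass
class Passage:
    text: str
    entry_id: str                  # link back to the containing Chronicle Entry
    named_entities: list = field(default_factory=list)  # tagged here, not in Event

@dataclass
class Event:
    title: str
    passages: list = field(default_factory=list)  # Event references Passage;
                                                  # Passage does not reference Event

# One event can now gather parallel passages from several manuscripts.
p = Passage("Pope Hadrian died...", "AF-885-ms2", ["Pope Hadrian"])
e = Event("Pope Hadrian dies", [p])
```

Keeping the reference one-directional (Event to Passage, never the reverse) is what removes the duplication the old model suffered from.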

Here is an example of the Passage object: 'in entry' is the link to the Chronicle Entry, which in turn is linked to the object 'Chronicle', which stores the references to the source texts, i.e. the English translations of the Annals of Fulda. The text is the passage, which is also the text of the event in question. The passage number shows where the passage/event occurs in relation to the other events in the same year, emphasizing the progression of the narrative over chronological progression, one of our main goals and something we have previously had trouble representing in Nodegoat.

The event no longer has description tags for 'Places' and 'Person' because those are tagged in Passage and can be cross-referenced through it. This also makes referencing multiple passages under the same Event much easier; instead of writing multiple passages into the description of the Event object, they can be cross-referenced. This will also aid us in the eventual transition to Latin. The only elements listed directly under Event are the event title, the Chronicle Entry, and, in the sub-object, the Event Type.

Other smaller but not insignificant improvements to the model included cleaning up the data in the Person and Places objects, where each entry was filled out with relevant information that was previously missing, such as birth and death dates for Person and coordinates for Places. Also in progress is the development of location types to categorize Places.

The Event-Based Narrative in the Annals of Fulda: Results

The Fulda project, a quantitative-qualitative analysis of the chronicle the Annals of Fulda through the platform Nodegoat, resulted in a fully fledged database of chronicle entries, people, places, and events. The model used to map events is a novelty in the field of chronicle studies, and one we hope will continue to be replicated and improved upon. We hope our database will aid scholars reflecting on this time period or thinking about questions of narrative and the anatomy of the chronicle: why are chronicles put together in a certain way? Enriching the model and data and refining our processes are next for our team at Traveler's Lab. Following in the footsteps of previous Lab projects, and even though we started Fulda from zero, many of our goals were realised during this summer, some of which are outlined in this article.

The obvious, and maybe the most important, achievement was the database itself, with the chronicle fully uploaded to Nodegoat. Anyone with access to the database can find, categorize, and visualize elements of the chronicle, such as the people or places in it. The chronicle entries are tagged by the chronicle and manuscript they belong to, and the text is fully mapped with object tags. This makes it easy to analyze the chronicle based on the elements that interest a researcher. Through the Nodegoat configuration, it is possible to see how all the data is linked: what events take place where, who is involved, how many times a name is referenced in a chronicle entry, comparisons between multiple entries or events, and more.

A great feature of the database is the link to ancient locations through Pleiades. All events have a location tag, which allows us to visualize the events on a geographical map. Interconnectivity is one of the best things about this model: not only do we have data on different places and events, but we know what happened where and who was involved.

Creating a model for determining events that allows us to follow the logic of the narrative was an important achievement this summer. The process involved much trial and error and remains a work in progress, but we were able to refine the bulk of it, as explained in the last article. In creating the event objects for the database, we made sure to use the text's own language and to date the events based on the sequence of the narrative rather than historical time (although a descriptor was provided for the latter, in case a specific date was present in the text). All this was done to shift the focus toward the narrative and to follow the logic of the chronicle and what the text deems important, as opposed to our reading of it. We hope this model will inspire and allow for more thorough analysis that leaves less room for misinterpretation.

As a future ambition for expanding the project, we hope to use the comparative tools provided by Nodegoat and the construction of the model to run comparisons between different manuscripts, as well as between the English and Latin versions of the text. Onboarding a scholar of Latin to work with the team is an aspiration that would further enrich the Fulda project.

As stated before, we hope to expand our model to include data from other Carolingian chronicles, such as The Royal Frankish Annals. We hope to inspire other scholars to use quantitative methods, especially those that centre the narrative, in their research of chronicles from all time periods. 

Although much progress was made this summer, there is always room for improvement. In the upcoming semester, we anticipate fixing our problem of accessing, through an API, locations not covered by the Pleiades ancient locations database. We also hope to automate as much of the text-tagging process as possible. Some of this has already been done through the Reconciliation system in Nodegoat, but we wish to refine the process further. AI shows great promise in this regard, as we discovered during an experimental session using OpenAI to tag people in the text. Implementing and integrating this process into Nodegoat object tagging is one of our goals for the future.

The Event-Based Narrative in the Annals of Fulda: Methodology

Introduction

In line with other Traveler's Lab projects, this undertaking was the beginning of a long exploration of using quantitative methods in the study of medieval chronicles by following the logic of the text through its narration rather than that of chronology. The project, drawing on the 9th-century Carolingian chronicle the Annals of Fulda, served as an experimental model that we hope will inspire similar practices in the way chronicles are studied. Describing a whole summer's work, this article focuses on the methods used to study the Annals of Fulda, including the constructed models we hope will have a wider impact.

Methodology

The whole text of the chronicle the Annals of Fulda was parsed, scanned, and uploaded to Nodegoat, a web platform that allows for data modeling and contextualization through spatial and temporal elements. Nodegoat allowed us to create our own objects to map our data (from the text), such as Person (the historical people taking part in an event) and Places (the geographical area where an event happened). The text was systematically mapped with tags of the Person, Places, and Event objects. A new object added for Fulda was the Religious tag, used to map religious celebrations, such as Easter or Christmas, that occur throughout the text. Starting to map Fulda without having used the platform before was made easier by following the example of Fulda's sister project, The Royal Frankish Annals, modeled by Daniel Feldman. Many of the objects were therefore already set up and only needed to be furnished with the new data. In order for both projects to share the same object database, we created the Chronicle object to differentiate between them as well as between different manuscripts of Fulda.

The team had already started thinking about new ways to express events, so that they would help us better understand the narrative. The way the project defines the event is different from how we might ordinarily think of events. An event is not only a battle or a coronation, or an 'important' happening; anything can be an event. In fact, everything is. Every few sentences focusing on a specific narrative (following certain guidelines for time and place) were mapped as an event.

Determining what constitutes an event and creating the event dataset was a challenging experience and a process we are still refining. With the intention of fully capturing the text of the chronicle, we started developing a model in which every sentence would be an event, but soon realized that this would not fully capture the scope of the narrative. We then opted for a more narrative-focused definition, in which an event terminates with a change of temporal identifiers or of agents in the narrative. To avoid bypassing the text (as the short titles do not allow for detail), we decided to add a 'Passage' descriptor, where the text of the particular event is disclosed.

The event object was the most important yet most difficult to develop. We went through a long process of trial and error in figuring out which descriptors to attach to the object so that it would be useful but not redundant. The event object is now linked to the chronicle entry (the text of the chronicle by year) and to the Person and Places objects, and has a sub-object denoting time.

The Places object is connected to Pleiades, a database of ancient locations (along with their longitude, latitude, and Pleiades ID), which we imported into Nodegoat. The location identifiers in Places allow us to visualize the ancient locations where the mapped events happened.

Dating the events was another issue, since only some of them have a time identifier. We decided that instead of following a chronological logic (using estimates and the dates the text provides to date events), we would follow a narrative logic by not 'dating' the events per se. Instead, they are connected to each other sequentially, as dictated by the narrative determined by the chronicler; narrative and historical time are not always interchangeable. To preserve the information the text provides, we added a descriptor for 'exact dates', to be used when the text supplies one.

Having now created a database of objects, we can use Nodegoat's Reconciliation feature to map objects such as Places and Person onto the remaining chronicle entries. Although not a flawless process, Reconciliation allows for semi-speedy execution of an otherwise laborious task. We are still working on ways to automate the text-tagging process and potentially extend it to other objects, such as events.