On February 22, students and professors collaborating on the CDER project and other digital humanities research projects at Wesleyan presented their work at the 2025 Connecticut Digital Humanities Conference at Central Connecticut State University. Will Markowitz, Arushi Khare, Akram Elkouraichi, and Professor Torgerson spoke about their progress on the CDER project, beginning by explaining its goal: accumulating data from many sources into one digital platform, making relationships between data easier to access, and facilitating interaction between academic disciplines. Students, guided by Professor Torgerson, research sources, incorporate them into the platform, and compile the data.
Will explained the NodeGoat platform, which the CDER project uses to accumulate and centralize data in an accessible digital format. Then, Akram discussed linked data and his work on modeling the Istanbul walls. Will spoke about relational data and examining relationships between different kinds of data, including geographical relationships (as the project is space-based). He demonstrated how these relationships appear visually in NodeGoat. Arushi described the work of linking seals from the Dumbarton Oaks collection in Georgetown to an open database of people living in Constantinople and to official buildings where offices would have been stationed. The goal is to link information extracted from sources back to those sources in NodeGoat.
Arla Hoxha and Zaray Dewan spoke about their work on the Chronicles project, and students and professors working on the Life of Milarepa and Chinese language theaters in North America also presented on their recent work at the conference.
At the time of our last blog post (May 2024) about the Constantinopolitana: A Database of East Rome (CDER) project, we celebrated the creation of a prototype database consisting of multiple types of objects and artifacts from across Constantinople. Each object type had about 20-100 data points (to read more about this process, check out our last blog post!). Of course, much of this data is integrated directly into CDER from preexisting datasets, many of which contain thousands of entries.
With permission from the owners, we want to develop a methodology for adding these items to CDER, helping to surface wider connections than any one institution could create or develop on its own. This means combining large datasets, a more technical task than building the prototype, since it involves a great deal of data cleaning and preparation.
This summer, I worked on combining existing data on Byzantine lead seals with existing prosopographic data (data describing individuals alive during the Byzantine Empire).
Datasets and Process
Byzantine lead seals were used to seal letters sent across the empire. These seals contain information about the sender, such as their name, title, and occupation, allowing insights into both individual people and the bureaucracy. Imperial titles, known as dignities, indicated a seal owner's place in the imperial hierarchy. An owner's office, or occupation, also appears on a seal.
For this stage of the project, we worked with roughly 16,000 seals from the Dumbarton Oaks (DO) Byzantine Seal collection, in collaboration with Jonathan Shea. This collection has already been digitized. A crucial part of integrating this data into our model was turning the seals data into relational data, meaning that some parts of the data (in this case, dignities and offices) get tables separate from the seals table.
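To give a concrete sense of what "making the data relational" means, here is a minimal sketch in Python using pandas. The column names and values are illustrative placeholders, not the actual DO schema:

```python
import pandas as pd

# Hypothetical flat export of the seals catalogue; the columns and
# values are invented for illustration, not the real DO schema.
seals = pd.DataFrame({
    "seal_id": [1, 2, 3],
    "owner": ["Niketas", "Leo", "Basil"],
    "dignity": ["patrikios", "protospatharios", "patrikios"],
    "office": ["strategos", "dioiketes", "strategos"],
})

# Pull each repeated vocabulary out into its own lookup table...
dignities = (seals[["dignity"]].drop_duplicates().reset_index(drop=True)
             .rename_axis("dignity_id").reset_index())
offices = (seals[["office"]].drop_duplicates().reset_index(drop=True)
           .rename_axis("office_id").reset_index())

# ...then replace the text values on each seal with foreign keys.
seals = (seals.merge(dignities, on="dignity")
              .merge(offices, on="office")
              .drop(columns=["dignity", "office"]))
print(seals)  # seal_id, owner, dignity_id, office_id
```

The payoff of this structure is consistency: a correction to a dignity's spelling or metadata happens in exactly one row of the dignities table, and every seal that carries it stays in sync.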
We chose to work on data regarding people and seals in conjunction because of the large, preexisting, and already partially connected datasets. Byzantine prosopographic data overlaps widely with the seals, and some of that overlap is already documented, because the prosopographies use seals as evidence for the existence of people. For example, we might know new information about a person from a seal, or a seal might be the only record that a person existed. With this project, our goal was not to create insights or establish new 'readings' of the seals to create connections with people, but rather to digitize and store existing connections in a database, in addition to creating a model for incorporating other types of data into CDER.
There is already some data overlap between the DO seals collection and the Prosopography of the Byzantine World (PBW). Much of my time was spent cleaning up the data and making it consistent and legible to our model: each row has to contain the same columns, and each column must hold its information in a similar format across rows. We also used many text analysis strategies to extract information from its original formatting in a sentence, phrase, paragraph, or description, and we learned methods for importing data into Nodegoat (the digital humanities software we are using to store the data), as well as methods for making connections technically.
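As a rough illustration of the kind of text extraction involved, the sketch below pulls a dignity, an office, and century estimates out of free-text catalogue entries. The sample descriptions, vocabularies, and patterns are simplified stand-ins for the real data:

```python
import re

# Hypothetical catalogue entries; real DO descriptions vary far more.
descriptions = [
    "Niketas, patrikios and strategos of Thrakesion (8th century)",
    "Leo, imperial protospatharios and dioiketes (9th/10th century)",
]

# Known vocabularies let us anchor the search instead of guessing.
DIGNITIES = ["imperial protospatharios", "patrikios", "anypatos"]
OFFICES = ["strategos", "dioiketes"]

for text in descriptions:
    lowered = text.lower()
    dignity = next((d for d in DIGNITIES if d in lowered), None)
    office = next((o for o in OFFICES if o in lowered), None)
    # Keep every century in an estimate like "9th/10th century"
    # so the dating rule can be applied later.
    centuries = [int(c) for c in re.findall(r"(\d{1,2})(?:st|nd|rd|th)", text)]
    print(dignity, office, centuries)
```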
Visualizations
Below are some graphs, with descriptions, that show what is possible with this data as a database or dataset and that point towards the next steps for analysis.
I want to preface these by saying that the graphs do not give a perfect, or even good, representation of what's going on in history, or even in the seals dataset. First, the number of seals is not necessarily representative of the actual number of offices or dignities of a certain type, or of the number of people in these roles: some offices might have sent letters more often, so those offices would be overrepresented on seals. In addition, visualization is complicated by the way seals are dated. We know some seals are from a certain set of years, and many can be dated to a single century; but for some seals we are not sure of the exact century, so they are dated to two or more centuries. In the graphs below, I wanted to avoid double counting seals, so the date used is the first date in the estimate. For example, a seal we know is from the eighth century and one we know is from either the eighth or ninth century would both be represented in these graphs as part of the eighth century.
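The dating rule is simple enough to state in a few lines of code. This is a minimal sketch of the convention used for the graphs, assuming each estimate is stored as an (earliest, latest) century pair:

```python
def first_century(date_range):
    """Return the earliest century in a dating estimate.

    A seal firmly dated to a single century comes in as (8, 8); an
    uncertain '8th or 9th century' seal as (8, 9). Both are counted
    once, under the first century, matching the rule described above.
    """
    earliest, _latest = date_range
    return earliest

assert first_century((8, 8)) == 8  # firmly 8th century
assert first_century((8, 9)) == 8  # 8th or 9th -> counted as 8th
```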
Figure 1
Figure 1 depicts the number of seals over time, with colors representing the offices of the seal’s owner. Figure 2 shows the seal count for the top six overall dignities over different centuries.
Figure 2
Figure 3
Figure 3 focuses on the dignity 'Patrikios', which was high ranking in the 8th-10th centuries before losing importance through the 11th (Shea, 2020; Kazhdan, 1991). It shows the different offices appearing on seals with the dignity Patrikios.
A historian can use these visualizations to understand how to develop their inquiry. For example, a historian interested in the hierarchy of a specific office such as the dioiketes could use a visualization such as Fig. 3 to understand which other offices ranked similarly over time.
Figure 5
Figure 5 shows a section of a network graph for the dignity Patrikios. The largest other nodes are the dignities 'imperial protospatharios' and 'anypatos', as well as the office 'strategos', the name for a military general. You can also see tight clusters of seals in the graph: these are made up of parallel and related seals, some of which are identical. Some broader clusters might hint at connections worth exploring further. The connection with people (white dots in Fig. 5) could allow historians to better date seals, if there are other sources about a specific person, or to understand an individual's trajectory through offices and dignities with more context.
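For readers curious how a graph like this might be assembled, here is a toy sketch using the networkx library. The nodes and edges are invented for illustration and mirror only the structure of the figure, not the real dataset:

```python
import networkx as nx

G = nx.Graph()
G.add_node("Patrikios", kind="dignity")
G.add_node("strategos", kind="office")
G.add_node("person_42", kind="person")  # hypothetical PBW person

for seal in ["seal_001", "seal_002"]:
    G.add_node(seal, kind="seal")
    G.add_edge(seal, "Patrikios")    # the seal carries the dignity
    G.add_edge(seal, "strategos")    # ...and the office

G.add_edge("seal_001", "person_42")  # seal attributed to a known person

# Parallel seals share the same neighbours, which is what produces
# the tight clusters visible in Fig. 5.
print(nx.degree(G))
```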
Next Steps
There are still some technical steps to finish up this part of the project, consisting of changing data types and storage methods to store values more efficiently. We are also planning to connect the seals data to the Prosopographie der mittelbyzantinischen Zeit (PMBZ). These connections are slightly more complicated, as the PMBZ does not contain direct references to the new format of the seals, so bibliographic information on each seal involved has to be extracted, normalized, and then matched to bibliographic information in the DO collection. For the lab as a whole, this semester we are going to work on building structure for other objects (statues), as well as continuing our focus on buildings and incorporating location and geographical information into the data.
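To illustrate the normalization step, here is a minimal sketch of the kind of cleanup that lets two differently formatted citations match. The function and the sample strings are hypothetical, not drawn from either catalogue:

```python
import re
import unicodedata

def normalize_ref(ref: str) -> str:
    """Normalize a bibliographic string for matching.

    Illustrative only: strips accents, lowercases, and collapses
    whitespace and punctuation so that small formatting differences
    between two renderings of a citation do not block a match.
    """
    ref = unicodedata.normalize("NFKD", ref)
    ref = "".join(c for c in ref if not unicodedata.combining(c))
    ref = re.sub(r"[^\w\s]", " ", ref.lower())
    return re.sub(r"\s+", " ", ref).strip()

# Two hypothetical renderings of the same citation match after cleanup.
assert normalize_ref("Zacos–Veglery, no. 1234") == \
       normalize_ref("Zacos - Veglery no 1234")
```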
Note: This blog post focuses more on the conceptual aspects of the project. If you are interested in any technical details, reach out to apwilliams@wesleyan.edu.
The Fulda project, a quantitative-qualitative analysis of the chronicle the Annals of Fulda through the platform Nodegoat, resulted in a fully-fledged database of chronicle entries, people, places, and events. The model used to map events is a novelty in the field of chronicle studies, and one we hope will continue to be replicated and improved upon. We hope our database will aid scholars reflecting on this time period or thinking about questions of narrative and the anatomy of the chronicle: why are chronicles put together in a certain way? Enriching the model and data and refining our processes are the next steps for our team at Traveler's Lab. Even though we started Fulda from zero, by following in the footsteps of previous projects in the Lab, many of our goals were realised during this summer; some of them are outlined in this article.
The most obvious, and maybe the most important, achievement was the database itself, with the chronicle fully uploaded to Nodegoat. Anyone with access to the database can find, categorize, and visualize elements of the chronicle, such as the people or places in it. The chronicle entries are tagged by the chronicle and manuscript they belong to, and the text is fully mapped with object tags. This makes it easy to analyze the chronicle based on the elements that interest a researcher. Through the Nodegoat configuration, it is possible to see how all the data is linked together: what events take place where, who is involved, how many times a name is referenced in a chronicle entry, comparisons between multiple entries or events, and more.
A great feature of the database is the link to ancient locations through Pleiades. All events have a location tag, which allows us to visualize the events on a geographical map. Interconnectivity is one of the best things about this model: not only do we have data on different places and events, but we know what happened where and who was involved.
Creating a model for determining events that allows us to follow the logic of the narrative was an important achievement this summer. The process involved much trial and error and remains a work in progress, but we were able to refine the bulk of it, as explained in the last article. In creating the event objects for the database, we made sure to use the text's language and to date the events based on the sequence of the narrative rather than historical time (although a descriptor was provided for the latter, in case a specific date was present in the text). All this was done to shift the focus towards the narrative and follow the logic of the chronicle and what the text deems important, as opposed to our reading of it. We hope this model will inspire and allow for more thorough analysis that leaves less room for misinterpretation.
Looking ahead to expanding the project, we hope to use the comparative tools provided by Nodegoat and the construction of the model to run comparisons between different manuscripts, as well as between the English and Latin versions of the text. Onboarding a scholar of Latin to work with the team is an aspiration that would further enrich the Fulda project.
As stated before, we hope to expand our model to include data from other Carolingian chronicles, such as The Royal Frankish Annals. We hope to inspire other scholars to use quantitative methods, especially those that centre the narrative, in their research of chronicles from all time periods.
Although much progress was made this summer, there is always room for improvement. In the upcoming semester, we anticipate fixing the problem of accessing, through an API, locations not covered by the Pleiades ancient locations database. We also hope to automate as much of the text-tagging process as possible. Some of this has already been done through the Reconciliation system in Nodegoat, but we wish to refine the process further. The use of AI shows great promise here, as we discovered during an experimental session using OpenAI to tag people in the text. Implementing and integrating this process into Nodegoat's object tagging is one of our goals for the future.
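As a sketch of what that experimental session looked like, the snippet below asks a model to list personal names in a chronicle entry using the OpenAI Python client. The model name and prompt wording are placeholders, not our settled pipeline:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def suggest_person_tags(entry_text: str) -> str:
    """Ask a model to list personal names in a chronicle entry.

    A sketch of the experiment described above, not a production
    pipeline; the model name and prompt are illustrative choices.
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "List every personal name mentioned in the "
                        "passage, one per line. Output names only."},
            {"role": "user", "content": entry_text},
        ],
    )
    return response.choices[0].message.content

# Suggested names would still be reconciled against existing Person
# objects in Nodegoat before any tag is actually written.
```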
In line with other Traveler's Lab projects, this undertaking was the beginning of a long exploration of using quantitative methods in the study of medieval chronicles, following the logic of the text through its narration rather than that of chronology. The project, drawing on the 9th-century Carolingian chronicle the Annals of Fulda, served as an experimental model that we hope will inspire similar practices in the way chronicles are studied. Describing the work of a whole summer, this article focuses on the methods used to study the Annals of Fulda, including the constructed models we hope will have a wider impact.
Methodology
The whole text of the chronicle the Annals of Fulda was parsed, scanned, and uploaded to Nodegoat, a web platform that allows for data modeling and contextualization through spatial and temporal elements. Nodegoat allowed us to create our own objects to map our data from the text, such as Person (the historical people taking part in an event) and Places (the geographical area where an event happened). The text was systematically mapped with tags of Person, Places, and Event objects. A new object added for Fulda was the Religious tag, used to map religious celebrations, such as Easter or Christmas, that occur throughout the text. Starting to map Fulda without having used the platform before was made easier by following the example of Fulda's sister project, the Royal Frankish Annals, modeled by Daniel Feldman: many of the objects were already set up and only needed to be furnished with the new data. In order to have both projects share the same object database, we created the Chronicle object to differentiate between them, as well as between different manuscripts of Fulda.
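For readers who think best in code, here is a rough sketch of that object model as Python dataclasses. Nodegoat defines these objects through its web interface rather than in code, so the field names here are only illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Chronicle:
    name: str  # e.g. "Annals of Fulda", distinguishing manuscripts too

@dataclass
class Person:
    name: str

@dataclass
class Place:
    name: str
    pleiades_id: int | None = None  # link to the Pleiades gazetteer

@dataclass
class Event:
    chronicle_entry: str                 # the annal year it belongs to
    passage: str                         # the event's text (the 'Passage'
                                         # descriptor discussed below)
    people: list[Person] = field(default_factory=list)
    places: list[Place] = field(default_factory=list)
    exact_date: str | None = None        # only when the text supplies one
```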
The team had already started thinking about new ways to express events, in a way that would help us better understand the narrative. The way the project defines an event is different from how we might ordinarily think of one. An event is not only a battle or a coronation or an 'important' happening; anything can be an event. In fact, everything is: every couple of sentences focusing on a specific narrative (following certain guidelines for time and place) was mapped as an event.
Determining what constitutes an event and creating the event dataset was a challenging experience, and it is a process we are still refining. With the intention of fully capturing the text of the chronicle, we started developing a model where every sentence would be an event, but soon realized that this would not fully capture the scope of the narrative. We then opted for a more narrative-focused definition, where an event terminates when the temporal identifiers or the agents in the narrative change, as sketched below. To avoid bypassing the text (as the short titles do not allow for detail), we decided to add a 'Passage' descriptor, where the text of the particular event is disclosed.
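Here is a deliberately crude sketch of that segmentation rule. The marker list is an invented stand-in for our actual guidelines, which also track changes of agent:

```python
import re

# Start a new candidate event at each temporal marker (illustrative list).
MARKERS = r"(?=\b(?:Meanwhile|Afterwards|In the same year|At that time)\b)"

def segment_events(entry: str) -> list[str]:
    """Split a chronicle entry into candidate events at temporal shifts."""
    return [part.strip() for part in re.split(MARKERS, entry) if part.strip()]

# An invented entry, for demonstration only:
entry = ("The king crossed the Rhine and ravaged the countryside. "
         "Meanwhile his brother gathered an army. In the same year "
         "a synod was held at Mainz.")
for event in segment_events(entry):
    print("-", event)  # three candidate events
```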
The event object was the most important yet the most difficult to develop. We went through a long trial-and-error process figuring out which descriptors to attach to the object so that it would be useful but not redundant. The event object is now linked to the chronicle entry (the text of the chronicle by year) and to the Person and Places objects, and it has a sub-object denoting time.
The Places object is connected to Pleiades, a database of ancient locations (along with their longitude, latitude, and Pleiades ID), which we imported into Nodegoat. The location identifiers in Places allow us to visualize the ancient locations where the mapped events happened.
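Pleiades publishes each place as JSON, which makes this kind of import straightforward to script. A minimal sketch, assuming the 'title' and 'reprPoint' fields (a [longitude, latitude] pair) are present, as they are for most located places:

```python
import requests

def fetch_pleiades_place(pleiades_id: int) -> dict:
    """Fetch one place record from Pleiades' public JSON endpoint."""
    url = f"https://pleiades.stoa.org/places/{pleiades_id}/json"
    data = requests.get(url, timeout=30).json()
    lon, lat = data["reprPoint"]  # representative point: [lon, lat]
    return {"pleiades_id": pleiades_id, "name": data["title"],
            "longitude": lon, "latitude": lat}

# e.g. fetch_pleiades_place(423025)  # Roma
```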
Dating the events was another issue, since only some of them carry a time identifier. We decided that instead of following a chronological logic, using estimates and dates the text provides to date events, we would follow a narrative logic and not 'date' the events per se. Instead, they connect to each other sequentially, as dictated by the narrative the chronicler determined; narrative and historical time are not always interchangeable. To preserve the information the text does provide, we added an 'exact date' descriptor, to be used whenever the text supplies one.
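A small sketch of what this sequential, narrative-order linking looks like as a data structure; the names and date here are invented placeholders:

```python
from dataclasses import dataclass

@dataclass
class NarrativeEvent:
    title: str
    exact_date: str | None = None             # filled only if the text gives one
    previous: "NarrativeEvent | None" = None  # link to the preceding event

# Events are ordered by the chronicler's narrative, not by timestamps.
coronation = NarrativeEvent("King crowned", exact_date="876")
campaign = NarrativeEvent("Campaign across the river", previous=coronation)

# Walking the chain recovers narrative order without inventing dates.
event = campaign
while event is not None:
    print(event.title)
    event = event.previous
```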
Now that we have created a database of objects, we can use Nodegoat's Reconciliation feature to map objects such as Places and Person onto the remaining chronicle entries. Although not a flawless process, Reconciliation allows for the semi-speedy execution of an otherwise laborious task. We are still working on ways to automate the text-tagging process further and potentially extend it to other objects, such as events.
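To convey the idea behind reconciliation without reproducing Nodegoat's internals, here is a stand-in sketch using simple fuzzy matching; the names and threshold are illustrative:

```python
import difflib

# Person objects already in the database (invented example names).
known_people = ["Ludowicus rex", "Karlomannus", "Hludowicus iunior"]

def reconcile(candidate: str, threshold: float = 0.8) -> str | None:
    """Suggest an existing Person object for a name found in the text.

    A stand-in for the idea behind Nodegoat's Reconciliation feature:
    fuzzy-match new mentions against existing objects and surface only
    confident suggestions for human review.
    """
    matches = difflib.get_close_matches(candidate, known_people,
                                        n=1, cutoff=threshold)
    return matches[0] if matches else None

print(reconcile("Ludowicus"))  # suggests "Ludowicus rex"
```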