Reviewing Sorting Phase Data: Visual Characteristics

Colons, diacritics, and seals — oh, my! What did the #genizascribes find in the Cairo Geniza fragments?
To celebrate our volunteers’ hard work & review the data produced in phase 1, we’re sharing a series of blog posts that answer some of the project’s big questions.

To celebrate our volunteers’ hard work and review the data produced in the sorting phase, we’re sharing a series of blog posts that answer some of the project’s big questions. Part 1 reviews the question of whether a subject was in Hebrew or Arabic script. Part 2 reviews the question of whether a subject was written in formal or informal script. This part, Part 3, reviews the presence of various visual characteristics. Part 4 reviews the classification tags from the Talk boards.

What characteristics did volunteers identify, and why are these characteristics important?

At the start of the project, we asked volunteers to use the point tools to identify the following visual characteristics on a given fragment:

Screenshots of instructions for identifying visual characteristics after selecting Hebrew (left) or Arabic (right) for script
  • Diagonal and/or Perpendicular Text in the Margin. This typically indicates a text that reflects everyday life, such as a letter.
  • Seals (only if the volunteer had sorted the fragment as Arabic). Seals were impressions made by signet rings and were used on official documents. A seal typically indicates an official state document.
  • Horizontal Line Above Word (only if the volunteer had sorted the fragment as Hebrew). This indicates the fragment likely contains a literary text.
  • Use of a Colon in the Text (only if the volunteer had sorted the fragment as Hebrew). The colon “:” symbol written on these fragments demarcates the end of a line or verse, suggesting this fragment may have been part of a prayer book or poetry.

In June 2018, we stopped asking volunteers to identify a horizontal line above a word. We decided this was too specific for our sorting purposes, and instead replaced it with the following:

  • Diacritic, also referred to as Dot, Vowel, or Diacritic (only if the volunteer had sorted the fragment as Hebrew). These markings give us information about how to pronounce and accentuate each word in each phrase, how to sing each word, and where to pause in the phrase while reading. Their presence suggests that the fragment was meant to be read aloud.

At the same time, we asked volunteers to identify evidence of additional characteristics on a given fragment:

  • Evidence of Binding indicates this fragment was likely part of a book.
  • Justified Margins indicate this may be a literary fragment, as documentary fragments tend to have more irregularity to their margins.
  • Top Corner Page Wear indicates the fragment may have been turned while reading.

These characteristics help us discern whether a fragment was used for everyday purposes or religious purposes. Not all of these features were available as options at every stage of the project. Volunteers also had the option to select “None”. Volunteers could identify the visual characteristics on either side of the subject.


How many subjects were identified as having evidence of these visual characteristics?


9,108 subjects (22%) were classified as having evidence of diagonal and/or perpendicular text in the margin, which means at least one volunteer identified it as such. 187 of these subjects were eventually sorted into the Arabic transcription workflows, and 8,792 were eventually sorted into the Hebrew transcription workflows.

Subject 12602783 features perpendicular text in the right margin. (ENA NS 34, Library of the Jewish Theological Seminary)

416 subjects (1%) were classified as having evidence of seals, which means at least one volunteer identified it as such. Even combined with the subjects tagged #seals on the Talk boards, this may seem like a small number, but these subjects are incredibly important for identifying official state documents written in Arabic.

Five volunteers noted that Subject 11584200 has two seals on its verso. (Halper 359, University of Pennsylvania, Herbert D. Katz Center for Advanced Judaic Studies Library, Cairo Genizah Collection)

3,734 subjects (9%) were classified as having evidence of a horizontal line above a word, which means at least one volunteer identified it as such.

Notice the line above the word on the last line of the right side? So did 7 other volunteers! (Subject 11526102: ENA NS 76 326, Library of the Jewish Theological Seminary)

11,978 subjects (29.8%) were classified as having evidence of a colon in the text, which means at least one volunteer identified it as such.

There are several colons on Subject 21708066, the first at the end of line 2. (MS-MOSSERI-I-00034, Genizah Research Unit, Cambridge University Library)

4,398 subjects (10.9%) were classified as having evidence of a dot, vowel, or diacritic, which means at least one volunteer identified it as such.

Subject 21708429: MS-MOSSERI-II-00243–00002, Genizah Research Unit, Cambridge University Library

How many subjects were identified as having evidence of the additional characteristics?

6,457 subjects (16%) were classified as having evidence of justified margins by at least one volunteer. 225 of these subjects were eventually sorted into the Arabic transcription workflows, and 6,163 were eventually sorted into the Hebrew transcription workflows.

Subject 11583957 features justified margins. (Halper 252, University of Pennsylvania, Herbert D. Katz Center for Advanced Judaic Studies Library, Cairo Genizah Collection)

5,958 subjects (14.8%) were classified as having evidence of binding by at least one volunteer. 308 of these subjects were eventually sorted into the Arabic transcription workflows, and 5,593 were eventually sorted into the Hebrew transcription workflows.

Subject 12603390 features evidence of binding — note the ties along the center crease. (ENA NS 51, Library of the Jewish Theological Seminary)

3,707 subjects (9%) were classified as having evidence of top corner page wear by at least one volunteer. 170 of these subjects were eventually sorted into the Arabic transcription workflows, and 3,503 were eventually sorted into the Hebrew transcription workflows.

Subject 21708631 features top corner page wear in the upper left corner, and evidence of binding along the center. (MS-MOSSERI-IV-00119–00001, Genizah Research Unit, Cambridge University Library)

Did volunteers identify visual characteristics correctly?

As noted in previous posts, these counts don’t mean that the subjects definitively have these visual characteristics. They mean only that, based on the set of instructions given, volunteers identified features of the fragment as such. In the above dataset, we considered a subject to have evidence of a visual characteristic if at least one volunteer marked it as such. As our content specialists review this list, we hope to improve the field guide so that volunteers can identify these characteristics more effectively and accurately.
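For readers who want to see how the "at least one volunteer" rule plays out, here is a minimal Python sketch. It is only an illustration, not the project's actual data pipeline. The record layout, the volunteer names, the characteristic labels, and the two function names are all assumptions made for this example. Only the subject numbers come from this post, and the marks attached to them are invented.

```python
from collections import defaultdict

# Hypothetical classification records: (subject_id, volunteer, marked characteristics).
# The subject IDs appear in this post; volunteers and marks are invented for illustration.
classifications = [
    (12602783, "volunteer_a", {"diagonal_or_perpendicular_text"}),
    (12602783, "volunteer_b", set()),  # this volunteer selected "None"
    (11584200, "volunteer_a", {"seal"}),
    (11584200, "volunteer_c", {"seal"}),
    (11583957, "volunteer_b", set()),
]


def subjects_with_evidence(records):
    """Map each characteristic to the subjects that at least one volunteer marked with it."""
    evidence = defaultdict(set)
    for subject_id, _volunteer, marks in records:
        for characteristic in marks:
            # A single mark is enough; repeat marks on the same subject do not add to the count.
            evidence[characteristic].add(subject_id)
    return dict(evidence)


def share_of_subjects(records, characteristic):
    """Fraction of all classified subjects flagged for the given characteristic."""
    all_subjects = {subject_id for subject_id, _volunteer, _marks in records}
    flagged = subjects_with_evidence(records).get(characteristic, set())
    return len(flagged) / len(all_subjects)


if __name__ == "__main__":
    for name, subjects in subjects_with_evidence(classifications).items():
        print(name, sorted(subjects), f"{share_of_subjects(classifications, name):.0%}")
```

Because one mark is enough to count a subject, this rule is deliberately inclusive: a subject marked by one volunteer and passed over by several others is still counted. That is why the totals above are best read as candidates for review rather than confirmed identifications.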