Writing communicates information via markings and has two independent classes of data: (i) semasiography, which uses images without recourse to spoken language, and (ii) lexigraphy, which uses icons/symbols to embody a verbal language. Within an archaeological context, semasiography is considered to have three categories: (i) primitive art, such as the images found in Lascaux, (ii) descriptive–representational devices, such as the pictorial event messages written by the Plains Indians, and (iii) identifying mnemonics that identify things such as individuals and places. The first two categories of semasiography, being pictorially based, tend not to have a strong, consistent directionality in the image placement. Some identifying mnemonics, such as heraldic shields, can have consistent directionality. Lexigraphic writing, based on speech, has an implied and consistent directionality. Thus unknown systems written with a consistent, implied directionality of marking placement may be either identifying-mnemonic semasiography or lexigraphic writing (Powell 2009).
The paper (Lee et al. 2010) describes how to determine whether a set of unknown symbols, written with a consistent, implied directionality, is identifying-mnemonic semasiography or lexigraphic writing by using two variables, Cr and Ur. Directionality is needed so that pairs of symbols (digrams) can be identified and counted. Cr (based partly upon the proportion of the digrams that appear only once in the text) differentiates between semasiography and lexigraphy. Ur differentiates between the types of lexigraphic unit (letters, syllables, words) and is based upon the level of information, or complexity, encoded by pairs of symbols. The level of information encoded by the pairs of symbols is influenced by two factors: (i) the transliteration used (or level of detail converted) between the observed symbols and those included in the calculations (e.g. if ‘p’ and ‘q’ are both transliterated as ‘p’, then information has been lost compared with the correct transliteration) and (ii) the type of information unit of the underlying language that the symbols represent or encode (e.g. words, syllables, letters). Owing to the different syntax and spelling rules, this encoding is independent of the variety of symbol used (e.g. logograms or letter combinations for words) but strongly affects the numbers and types of symbol pairs observed. The technique employed a very wide dataset of writing, language and script types so that its classification capability was comprehensive. Owing to space constraints, only a single example describing the effect of changing the level of information encoded in a script was outlined. However, the mathematics remains the same and thus the conclusions drawn in the paper can be applied to all the other linguistic forms covered by the data.
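The digram counting that underlies both variables can be illustrated with a minimal Python sketch. It only counts ordered symbol pairs within each inscription and reports the fraction of distinct digrams that occur exactly once; it does not reproduce the definitions of Cr or Ur given in Lee et al. (2010). The function names and the toy inscriptions are hypothetical, and reading 'appear only once' as a fraction of distinct digram types is an assumption.

```python
from collections import Counter


def count_digrams(inscriptions):
    """Count ordered adjacent symbol pairs, never pairing across inscriptions."""
    counts = Counter()
    for symbols in inscriptions:
        counts.update(zip(symbols, symbols[1:]))
    return counts


def singleton_fraction(counts):
    """Fraction of distinct digrams seen exactly once (assumed reading, not Cr itself)."""
    if not counts:
        return 0.0
    once = sum(1 for n in counts.values() if n == 1)
    return once / len(counts)


# Hypothetical toy data: each inner list is one inscription read in a fixed direction.
toy = [["A", "B"], ["A", "B"], ["C", "D"], ["A", "B", "E"]]
digrams = count_digrams(toy)
fraction_once = singleton_fraction(digrams)
```

A consistent reading direction matters here only because it fixes which symbol of each pair is counted first; without it the pairs could not be counted consistently.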
Defective systems (word endings discarded leaving only a stem) are included within the dataset reported. For example, the Roman inscriptions are written in monumental Latin, which foreshortens many common words to a single stem (e.g. manibus, menses, marci, marcus are all foreshortened to ‘m’). The important point here is that the stems encode information at the level of words and follow word syntax, so they produce digram frequencies characteristic of words, and this is what is observed. The effect of using word stems, as opposed to word stems plus different endings, is to reduce complexity and thus constrain the word/stem symbol lexicon, reducing Ur. This is shown in figure 1 for full names reduced to diminutive/familiar names consisting only of a base stem: Ur decreases but the symbols are still classified as words. The effect at the letter level is shown in fig. 7 in the paper (Lee et al. 2010).
Egyptian hieroglyphs represent consonants, but being lexigraphic they predominantly encode syllable-like speech units, for which the vowels were not written (Zauzich 2004). A reader would insert the vowels between the consonants (ths sntnc shws th ffct). Since the hieroglyphs in the inscriptions predominantly encode information at the syllabic level, they show symbol pair frequencies characteristic of syllables and Ur is representative of syllabic systems. Likewise, when transliterated in combination so that they encode information at the word level, the hieroglyphs give a Ur representative of words.
The technique classifies lexigraphic writing at the level at which the majority of symbols are encoding. Thus, English text encoded at the word level can contain symbols such as ‘a’ that also encode at the letter level. The Egyptian inscriptions used employ hieroglyphs encoding mainly at the syllabic level but with some encoding at the word or determinative level. Likewise, over 90 per cent of the Chinese logogram characters are complex, containing a semasiographic device coupled to a phonetic sign, which together make a single logogram that encodes a specific word (Powell 2009). In all these cases, the technique correctly classifies the text type based on the level at which the majority of symbols are encoding.
Heraldic symbols are a useful non-lexigraphic comparator since their placement rules lead to an implied directionality. This implied directionality is one of the reasons why the Pictish symbols have been proposed as heraldic symbols. Heraldic symbols consist of the base symbols (e.g. lion), modified by detail (e.g. a lion rampant versus a lion passant) and colour. The dataset used included four types of transliteration: (i) full symbols with colour (the correct transliteration), (ii) full symbols without colour, (iii) base symbols with colour, and (iv) base symbols without colour. Incorrect transliterations reduce the level of information encoded by the symbols, resulting in a constrained (smaller/simpler) symbol lexicon, less uncertainty and a lower mean value of Ur (figure 1). This is true whether the symbols are heraldry or lexigraphic systems such as Egyptian hieroglyphs. Consequently, the paper’s conclusion that reducing the complexity encoded in the text constrains the symbol lexicon and lowers Ur holds for all types of directional writing systems (semasiography or lexigraphy) even though Ur has not been used to classify the directional semasiographic writing.
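The way a coarser transliteration constrains the symbol lexicon can be sketched in Python under stated assumptions. The shields below are invented toy data, not the heraldic dataset used in the paper, and the Shannon entropy of the digram distribution is used only as a stand-in for 'level of information'; it is not the paper's Ur. All names in the block are hypothetical.

```python
import math
from collections import Counter

# Hypothetical charges written as (base symbol, detail, colour).
shields = [
    [("lion", "rampant", "red"), ("eagle", "displayed", "black")],
    [("lion", "passant", "gold"), ("eagle", "displayed", "black")],
    [("lion", "rampant", "gold"), ("cross", "plain", "red")],
    [("lion", "passant", "red"), ("cross", "plain", "blue")],
]


def transliterate(charge, keep_detail, keep_colour):
    """Drop detail and/or colour to mimic a coarser transliteration."""
    base, detail, colour = charge
    return (base, detail if keep_detail else None, colour if keep_colour else None)


def digram_entropy(sequences):
    """Shannon entropy (bits) of the digram distribution; a stand-in, not Ur."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def summarise(keep_detail, keep_colour):
    """Return (lexicon size, digram entropy) for one transliteration choice."""
    coded = [[transliterate(c, keep_detail, keep_colour) for c in s] for s in shields]
    lexicon = {symbol for s in coded for symbol in s}
    return len(lexicon), digram_entropy(coded)


full = summarise(keep_detail=True, keep_colour=True)         # full symbols with colour
base_only = summarise(keep_detail=False, keep_colour=False)  # base symbols without colour
```

In this toy case the lexicon shrinks from seven symbols to three and the digram entropy falls, which mirrors the direction of the effect described above without implying anything about its size in real data.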
The variable used to discriminate between semasiography and lexigraphy is Cr, not Ur. The different heraldry transliterations showed no discernible effect upon the Cr observed; that is to say, the underlying degree of repetition used to encode identifying-mnemonic semasiography (e.g. heraldry) is the dominant determinant of this variable rather than the level of complexity.
The Pictish symbol stones generally contain very few symbols but they do have a strong, consistent, implied directionality. Using the Mack corpus, the class I stones (largest set) have a mean of 2.03 symbols/stone with the median being 2 symbols/stone (the majority of the stones contain pairs of symbols, including class II stones, with the remainder predominantly containing either a single symbol or a symbol pair plus the mirror and/or comb symbol) (Mack 1997). Usually, the symbol pair is placed one symbol above the other, although on some stones the symbols are side by side. For the vertically placed symbols, this naturally leads to a vertical direction of transcribing. Whether one reads top–bottom, bottom–top, left–right or right–left is unknown but, so long as a consistent direction is used, the calculations will give the same conclusion. In these short inscriptions the relative frequency of the symbol pairs (digrams) remains the same regardless of direction (i.e. AB=BA in terms of relative frequency), so the same level of information is encoded, just as a sentence contains the same level of information when written backwards. The level of information is resilient to changes in syntax or symbol order. For example, the Egyptian inscriptions have no spelling rules for words; hieroglyph order depends upon the scribe’s view of the most beautiful combination. Modern transliterations use a standardized spelling order, yet both transliterations, with their different syntax, classified the hieroglyphs as syllabic. Similarly, there are wide variations in syntax between languages, yet the technique correctly classifies the level of information encoded at the word level for all the languages examined.
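The direction-independence argument can be checked with a short Python sketch. Reading every inscription in the opposite direction turns each digram AB into BA with the same count, so the frequency distribution of digrams is unchanged. The symbol sequences below are hypothetical toy data, not entries from the Mack corpus, and the function name is illustrative.

```python
from collections import Counter


def digram_counts(inscriptions):
    """Count ordered adjacent symbol pairs within each inscription."""
    counts = Counter()
    for seq in inscriptions:
        counts.update(zip(seq, seq[1:]))
    return counts


# Hypothetical toy stones, each read in one consistent direction.
stones = [
    ["crescent", "fish"],
    ["crescent", "fish"],
    ["fish", "mirror"],
    ["eagle", "crescent", "comb"],
]

forward = digram_counts(stones)
backward = digram_counts([list(reversed(s)) for s in stones])

# Each forward digram (a, b) should reappear as (b, a) with the same count.
mirrored = Counter({(b, a): n for (a, b), n in forward.items()})
same_distribution = sorted(forward.values()) == sorted(backward.values())
```

Any quantity that depends only on the digram frequency distribution therefore takes the same value for either consistent reading direction; mixing directions between stones would break this correspondence.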
The Mack corpus is a simple transliteration of the Pictish symbols. It differentiates between a symbol and a symbol modified by an adjunct such as a Z-rod or V-rod. It makes no differentiation for other possible modifications. Using Mack gives the simplest transcription of the Pictish symbols and hence the base case. Unfortunately, there is no agreed corpus of symbol types, nor any agreement on how each symbol with its different internal and directional attributes may be modified or should be transliterated, nor on whether the symbols were painted different colours, like the heraldic charges. All of these factors might encode information and thus change Ur. Investigating these parameters would need further research aimed at a different audience, namely the Pictish archaeological, art historical and linguistic community. Such work would involve (i) detailing the possible symbol types, (ii) giving a visual catalogue of each symbol (700+) along with possible symbol type(s) for comparison, and (iii) including a catalogue of possible internal carving modifiers. The effect of these different parameters in the transliteration upon Ur and Cr would need to be investigated. A paper covering these parameters is currently being reviewed, but the summary is that adding any information to the symbol parameters beyond that which Mack uses increases the complexity and increases Ur (just as expected from the data on heraldry and lexigraphy) while not changing the Cr classification (lexigraphy).
Class I stones contain only Pictish symbols, class II stones contain Pictish symbols, a cross and often other imagery, and class III stones contain a cross and imagery found on class II stones but no Pictish symbols. The paper reports that the classification of the symbols is the same on class I stones (where they appear without any other possible information) as it is on class II stones (where they appear with descriptive–representational semasiographic devices, namely the cross and the other imagery), thus confirming that the level of information that the symbols (rather than the other imagery) encode is the same whichever class of stone they are found upon. The paper makes no attempt to investigate the descriptive–representational semasiography since it does not have a strong, consistent, implied direction of reading, making the identification of image pairs extremely hard (and inconsistent). For this reason, the other imagery found on class II Pictish symbol stones such as Eassie or Aberlemno 2 is not included in the analysis. This is similar to the protocol that Egyptologists use with mixed material containing hieroglyphs, such as the Narmer Palette. The Narmer Palette is recognized as a descriptive–representational semasiographic device that includes a few early hieroglyphs encoding information at the lexigraphic level (Powell 2009). The Kinord Pictish stone is a class III stone containing a beautifully carved cross but no other imagery. In order to discuss the syntax and complexity of descriptive–representational semasiographic devices quantitatively, a different technique is needed, one that would map the different images and then apply a multi-spatial analysis to the information encoded. To our knowledge this technique does not currently exist.
The technique in the paper investigated the Pictish symbols but it can be applied to other unknown scripts. Taking Fischer’s reading (Fischer 1997) of the Phaistos Disk and using all the symbols and the bars gives a first-order entropy that falls outside the random ellipse. The values for Cr and Ur are such that the technique classifies the symbols as syllabic. Rerunning the analysis without the bars gives a similar result. These results are similar to those already arrived at by linguists.
The paper’s conclusion, reflected in its title, is that the Pictish symbols are lexigraphs; that is to say, they are language-based writing and not random or semasiography. This conclusion was drawn using the simplest and least complex coding available, the Mack corpus of symbol types, and is therefore robust. Thus, the paper reaches its conclusion based on the method that it applies. Since there is no agreed corpus of symbol types or other possible modifiers/descriptions (such as internal carving), the paper cannot offer a definite classification of whether the symbols are words, letters or syllables. The only scientifically valid proposition is to offer an inference of what might be likely, until such time as there is agreement within Pictish scholarship as to what constitutes the actual symbol corpus and its grammar.
The accompanying Comment can be viewed at http://dx.doi.org/doi:10.1098/rspa.2010.0189.
- Received June 20, 2010.
- Accepted July 9, 2010.
- © 2010 The Royal Society