Fig. 1: A modern representation of the Agamemnon, Siracusa (Italy). Copyright E. Schembri

At line 1100 of Aeschylus’ Agamemnon, after Cassandra’s visions have begun to unfold progressively, the murderous plot that is about to be accomplished against the king is hinted at for the first time. The captive seer cries in anguish:

ἰὼ πόποι, τί ποτε μήδεται;

Cassandra’s question is thus translated by Sommerstein (2008, which I have consulted for all the translations of Aeschylus here): “Ió, popoi! What is being schemed?” The change in the syntactic construction from the active in the original to the passive is not without reason. Although it is not difficult to replicate the powerful dramatic effect of the original in those languages where the subject can be left out, as it is in Greek, the challenges of rendering the line in English, for example, are pointed out by Fraenkel (1950: vol. 2, 498 n.1) with perfect clarity:

In some modern languages, e.g. in English and German, it is hardly possible to retain the suspense of the riddle without using the passive, where the Greek has the deponent.

What Fraenkel has acutely remarked is that a syntactic phenomenon, the deliberate omission of the subject, contributes decisively to the meaning of this sentence in its context. If we add a subject pronoun and translate: “what is she scheming?”, we give away too much and too soon about the author of the plot [1]!

As we have seen in the previous post, modern annotated corpora can embed information on morphology and syntax, and thus allow scholars to search and retrieve data on these aspects of language. But as we move to the level of the meanings and of communicative processes, a sentence like Aesch. Ag. 1100 exemplifies the many challenges and subtleties that we have to face. Is there a way to capture the peculiar meaning conveyed by the ellipsis of the subject and to integrate it in our annotated corpus, along with the other phenomena that contribute to the general sense?

When the readers of the play ask questions like: “who is the subject of μήδεται? Is it recoverable from the context?”, they evoke two important linguistic phenomena that must be taken into account.

First of all, it is important to ask why we need to look for a subject. Obviously, it is not only for the sake of translation. Even speakers of languages where the subject can be omitted can feel that the verb form μήδεται needs to be supplemented with some information for the communication to be complete. In fact, as we know from grammars, a subject pronoun can be left out in Greek only when the referent that it would point to can be clearly inferred from the context. When this is not the case, the communication malfunctions. Indeed, although Cassandra’s language in 1100-4 is remarkably plain for the Agamemnon, the Chorus rightly complains that this part of the prophecy is totally unintelligible to them. “I am incapable of decoding this prophecy” (τούτων ἄιδρίς εἰμί τῶν μαντευμάτων), they say in their answer (1105). By contrast (cf. 1106), they know only too well who the “babies lamenting their slaughter, and the roast flesh their father devoured” of the first part of Cassandra’s vision are (vv. 1095-7), even if this legend is hinted at with high poetic diction and far more obscure syntax [2].

At a more abstract level, a verb like μήδεσθαι, or its English equivalent “contrive”, depicts a scene where at least two “actors” are involved. Syntactically, they will be expressed as the object and subject of the verb. Semantically, we will identify an agent and a result of the action described [3]. Additional information on the circumstances (e.g. a space and a time when the plot is schemed) may always be added. Often in Greek, another actor, a beneficiary, is expressed too [4]. But whereas these optional circumstantial complements and supplementary arguments can be omitted, agent and result will always be implied, even if only by (elliptical) reference to the context.

The fact that words, and predicates in particular, require a number of mandatory arguments in order to complete their sense is variously defined in linguistic theories, but it is mainly known by the name of valency. The locus classicus for the definition of valency (valence) is Tesnière (1959: 102), where the author distinguishes between adjuncts (circonstants) and arguments (actants), which are the real actors in the “drama” (“petit drame”) expressed by the verb [5]. Different schools use different theoretical frameworks to record the arguments required by each verb sense, as can be seen by comparing three lexical resources in the Unified Verb Index where the structure of a given English verb, e.g. “contrive”, is analyzed (PropBank, FrameNet, VerbNet), or the Valency Lexicon of Czech (PDT Vallex) for the roughly equivalent verb “vymyslet”.

An implication of the distinction between obligatory modifiers and optional adjuncts is the fact that, when we try to diagram the meaning of a sentence, we have to make room for the obligatory arguments, even when they are left out, as with the subject of μήδεται in Ag. 1100. These positions are left empty because a pronoun has been “dropped” in the communication, because the argument has been left intentionally unexpressed (as in the line that Aeschylus wrote for the character of Cassandra), or because it is a general argument that, grammatically, need not be filled (as the agent of the verb “to cook” in such a sentence as: “cooking is fun”).
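To make the idea of "making room" for obligatory arguments more concrete, here is a minimal sketch in Python. It is not taken from any of the lexical resources mentioned above: the class `ValencyFrame`, the function `fill_frame`, the role names "agent", "result" and "beneficiary" (borrowed from the discussion above) and the placeholder `#Unspecified` are all assumptions made for the sake of illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValencyFrame:
    """Hypothetical valency frame: a lemma with obligatory and optional roles."""
    lemma: str
    obligatory: tuple[str, ...]
    optional: tuple[str, ...] = ()


def fill_frame(frame: ValencyFrame, expressed: dict[str, str]) -> dict[str, str]:
    """Map every obligatory role to a filler, using a placeholder when the
    role is not expressed; optional roles appear only if they are expressed."""
    unknown = set(expressed) - set(frame.obligatory) - set(frame.optional)
    if unknown:
        raise ValueError(f"roles not licensed by {frame.lemma}: {sorted(unknown)}")
    filled = {role: expressed.get(role, "#Unspecified") for role in frame.obligatory}
    for role in frame.optional:
        if role in expressed:
            filled[role] = expressed[role]
    return filled


# Illustrative frame for μήδομαι, following the roles named in the post.
MEDOMAI = ValencyFrame("μήδομαι", obligatory=("agent", "result"),
                       optional=("beneficiary",))

# Ag. 1100: only the "result" (τί) is expressed; the agent slot is kept open.
ag_1100 = fill_frame(MEDOMAI, {"result": "τί"})

# Il. 7.478: agent, result and beneficiary are all expressed.
il_7_478 = fill_frame(MEDOMAI, {"agent": "Ζεύς", "result": "κακά",
                                "beneficiary": "σφιν"})

print(ag_1100)
print(il_7_478)
```

The point of the sketch is only that the empty agent slot of Ag. 1100 is still a slot: it is present in the output, while the unexpressed optional beneficiary simply does not appear.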

The second phenomenon I mentioned is co-reference resolution. Whenever we read a line like Ag. 1100, we are naturally induced not only to supplement a subject mentally, but also to clarify which person or object this pronoun refers to. Even in the case where a pronoun is expressed, this linguistic “pointer” must be decoded for the meaning of the sentence to be understood. This operation of decoding is another necessary part of the communication process. It is at this level that the problems of the Chorus with Cassandra’s question start: they understand that a subject of “contrive” is implied (as speakers of Greek, they are aware of the valency of μήδομαι), but they are unable to refer the unexpressed subject to any entity in the world of the discourse or of reality that constitutes the context.

If we keep these two phenomena in mind, we may try to formalize our reading of Ag. 1100 using a dependency representation in this way: the verb μήδομαι governs a direct object, the interrogative pronoun «what?» (τί). In the fictional world of the play, the subject is present only to the hallucinated mind of the seer; in the discursive context of tragic performance, the impossibility for one of the parties involved in the communication to reconstruct the intended reference is deliberately used by the playwright to build up the “riddling” effect that so magnificently contributes to the tension of this scene. Therefore, the other element that is obligatorily demanded by the valency of the verb is reconstructed with a stub that we may label as “Unspecified” and that we will not link to any other node of the context with a co-reference arrow.

Fig 2: Surface syntax (left) and tectogrammatical tree (right) of Ag. 1100

Leaving all the many technicalities aside, we may note that the right tree in Fig. 2 visualizes precisely this reconstruction. On the other hand, a “plain” syntactic tree that diagrams the relations between the words that are actually attested in the sentence is reproduced on the left. In the language of the Prague Dependency Treebank, and in the theoretical framework of Functional Generative Description of the Prague School that lies behind it (Sgall et al. 1986), such diagrams as ours on the right are called tectogrammatical trees. Whereas a treebank of “surface syntax” is already available for Ancient Greek (see our previous post), tectogrammatical annotation of Greek is still an unexplored new frontier.
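For readers who think more easily in data structures than in diagrams, the core of the right tree in Fig. 2 can be approximated as follows. This is a deliberately simplified sketch and not the format of the Prague Dependency Treebank: the class `TNode`, its fields, the function `unresolved` and the labels "predicate", "agent" and "result" are assumptions of mine, and the interjection and the particle ποτε are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TNode:
    """Hypothetical node of a simplified tectogrammatical tree."""
    lemma: str
    functor: str                      # illustrative semantic label
    generated: bool = False           # True if the node has no surface word
    coref: Optional["TNode"] = None   # antecedent, if the reference is resolved
    children: list["TNode"] = field(default_factory=list)


def unresolved(node: TNode) -> list[TNode]:
    """Collect the reconstructed nodes in a subtree that point to no antecedent."""
    found = [node] if node.generated and node.coref is None else []
    for child in node.children:
        found.extend(unresolved(child))
    return found


# Ag. 1100: the agent is reconstructed as a stub and deliberately left
# without a co-reference link; the interrogative pronoun fills the other slot.
ag_1100 = TNode("μήδομαι", "predicate", children=[
    TNode("#Unspecified", "agent", generated=True),
    TNode("τίς", "result"),
])

print([n.lemma for n in unresolved(ag_1100)])  # prints ['#Unspecified']
```

A query of this kind would retrieve exactly the cases where, as with the Chorus, the hearer knows that an argument is required but cannot link it to anything in the context.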

Those who are interested in the technical aspects of the formalism can refer to this clear introduction by Jan Hajič to the multi-layer Prague Dependency Treebank of Czech, where this kind of annotation was implemented and developed for the first time. What is important to stress here is that the complex layer of tectogrammatics is a formalism that integrates the major phenomena that impact on the meaning of a sentence in its context. In the Tectogrammatical Sentence Representation, the meaning is produced by a combination of four different factors, all of which are represented in the trees:

  1. the semantically relevant words and the dependency structure formed by them; a detailed set of semantic-syntactic labels (the so-called “functors”) is used to describe these relations; special attention is paid to the relations implied by the verb valency;
  2. the semantic information that is expressed through the morphology (number, gender, aspect etc.) and is represented as properties of the words (with the so-called “grammatemes”);
  3. the co-reference of the linguistic “pointers” (pronouns and reconstructed nodes);
  4. topic-focus articulation and word-order.

A fifth level that we may add is ellipsis resolution. Very often, some information is left unexpressed for the sake of brevity: this happens not only with subject pronouns as in our example, but, even more frequently, with coordinated structures (as in e.g. Mary likes lilies and Anna [likes] roses). In such cases, the omitted nodes must be reconstructed in the tectogrammatical representation.
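The coordination example can be sketched in a few lines of Python. The sketch assumes a toy representation in which each conjunct is a dictionary and a missing predicate is marked with `None`; the function name `restore_gapped_predicates` and the keys "predicate", "agent", "patient" and "generated" are hypothetical and do not correspond to any existing annotation scheme.

```python
def restore_gapped_predicates(conjuncts):
    """Copy the most recent expressed predicate into conjuncts that lack one,
    flagging the copies as generated (i.e. absent from the surface text)."""
    restored = []
    last_predicate = None
    for clause in conjuncts:
        clause = dict(clause)  # work on a copy, leave the input untouched
        if clause.get("predicate") is None:
            if last_predicate is None:
                raise ValueError("no earlier predicate to copy from")
            clause["predicate"] = last_predicate
            clause["generated"] = True
        else:
            last_predicate = clause["predicate"]
            clause["generated"] = False
        restored.append(clause)
    return restored


# "Mary likes lilies and Anna [likes] roses"
sentence = [
    {"predicate": "like", "agent": "Mary", "patient": "lilies"},
    {"predicate": None, "agent": "Anna", "patient": "roses"},
]

for clause in restore_gapped_predicates(sentence):
    print(clause)
```

Real cases are of course far less mechanical than this: deciding what exactly has been omitted is an act of interpretation by the annotator, which the sketch reduces to copying the preceding verb.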

The information structure (topic and focus articulation) is a crucial and rather difficult subject that will require a separate post. For the moment, it is important to note its relevance for the meaning of the sentence: the same sentence can have radically different interpretations according to the changing distribution of Topic and Focus. In this video, E. Hajičová discusses a few examples of this kind of ambiguity.

In a tectogrammatical tree, the sentence may be significantly reshaped from its surface realization: some nodes for more “technical” words (like prepositions, conjunctions or the auxiliary parts of compound verbs) are left out and represented as attributes of the semantically relevant units that carry the actual (lexical) meaning. Other nodes, on the other hand, may be introduced, as we have seen with the obligatory arguments of the valency frame.
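The first half of this reshaping, the absorption of “technical” words into the content words, can be illustrated with a small Python sketch. It rests on two simplifying assumptions that I should state plainly: each token is already marked as a function word or not, and every function word depends directly on the content word it belongs to (a convention that differs from the surface annotation of the Prague-style treebanks, where a preposition governs its noun). The function `collapse_function_words` and the dictionary keys are hypothetical.

```python
def collapse_function_words(tokens):
    """Keep only content words as nodes and record each function word
    as an attribute ("aux") of the content word it depends on."""
    nodes = {
        t["id"]: {"form": t["form"], "head": t["head"], "aux": []}
        for t in tokens
        if not t["function"]
    }
    for t in tokens:
        if t["function"]:
            # Assumption: the head of a function word is always a content word.
            nodes[t["head"]]["aux"].append(t["form"])
    return nodes


# "Mary has walked in the garden" (head 0 marks the root of the sentence)
surface = [
    {"id": 1, "form": "Mary",   "head": 3, "function": False},
    {"id": 2, "form": "has",    "head": 3, "function": True},
    {"id": 3, "form": "walked", "head": 0, "function": False},
    {"id": 4, "form": "in",     "head": 6, "function": True},
    {"id": 5, "form": "the",    "head": 6, "function": True},
    {"id": 6, "form": "garden", "head": 3, "function": False},
]

for node_id, node in collapse_function_words(surface).items():
    print(node_id, node)
```

Six surface tokens are reduced to three nodes, and nothing is lost: the auxiliary, the preposition and the article survive as properties of the words whose meaning they modify.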

Tectogrammatical trees may seem awfully complicated to read and generate, but the advantages of using such a layer of representation are manifold.

Firstly, the relations between the different lexical elements with their semantic properties are formalized and visualized in the form of a dependency tree, with the predicate in the dominant position. The structure of the Treebank as a 3-level scenario (morphology, surface syntax and tectogrammatical layer) allows the user to shift back and forth from the tectogrammatical tree to the level of surface syntax. In Fig. 2, the two trees can be seen side by side and compared; technically, it is always possible to move from the one to the other.

Secondly, this complex tree structure is able to make room for the most important components of the meaning that we have seen: valency, coreference, topic-focus articulation, and ellipsis.

Thirdly, the model has been developed and tested for a language like Czech that shares a number of important features with (Ancient) Greek: it is a pro-drop language, with a highly flexible word-order, where topic and focus articulation plays an important role. Tectogrammatical sentence representation allows a very sophisticated interplay between linguistic theory and data annotation: ultimately, it serves as the perfect “playground” where linguistic theories can be tested on the corpus evidence with complex multifactorial approaches (see especially Hajičová and Sgall 2006).


[1] It is only at line 1107, when Cassandra addresses the future killer directly with the words: ἰὼ τάλαινα (“oh wretched woman”) that we learn that the person in her vision is indeed a woman.

[2] See Denniston and Page (1957) ad v. 1096.

[3] Most of the time, in active sentences, semantic agents and syntactic subjects will coincide. But there is at least one complication to this equation: in the case of a passive construction (e.g. “a plan is contrived”), the distribution is inverted and the semantic result becomes the syntactic subject.

[4] As in Il. 7. 478: σφιν κακὰ μήδετο μητίετα Ζεύς. Cf. LSJ s.v. μήδομαι A.2.

[5] A brief introduction to the history of valency, along with a first attempt to build a valency lexicon for Medieval Latin based on the corpus of Thomas Aquinas, can be found in McGillivray and Passarotti (2009, especially 45-6).

  1. Francesco — isn’t interpunction also something that could be covered by tectogrammatical diagramming? The Aeschylus sentence that you analysed has modern punctuation; in manuscript (or inscribed on stone) it would look different, but we still know that this interpunction has to be there. You’re considering linguistically ideal cases, I know, but I’m thinking as a philologist: imagine that you want to compare tree diagrams of textual variants of your sentence, and that each variant is punctuated differently: we would, of course, see that something is missing, but we won’t be seeing what the missing part is doing (and that it’s doing it even absent from the text).

    • Francesco Mambrini

      Thank you for your comment, Neven! Punctuation, as you say, is very fascinating in philology. Punctuation is a standardized way to make some syntactic phenomena explicit in print, and the conventions vary widely in time and space. With ancient (and originally unpunctuated) texts, it is also a way for editors to put their interpretation of the syntax of what they are editing into print. As you say, a comparison between the different punctuations adopted can be very instructive indeed!
      For treebanks, however, it is important to note a point. Standard syntactic treebanks maintain and annotate the punctuation of the digital text that is used as the basis for annotation. In the AGDT (as in PDT), we distinguish two functions for punctuation (if we leave the final period or semicolon aside, which is less interesting): punctuation marks can be either heads of coordinated structures (like in the dummy sentence: Mary likes roses, Ann [likes] violets, above), or they can be “auxiliaries” that e.g. separate subordinate clauses from the main clause. Thus, you will find a comma in the left tree above for Aeschylus’ sentence, which separates the exclamative “Io popoi” from the rest; the comma is attached to πόποι as an auxiliary node AuxX and tagged as ‘u’, which is precisely the abbreviation for ‘punctuation’ in the tagset.
      But punctuation is normally not retained in tectogrammatical trees, the only exception being (for technical reasons) when it is the only coordinating element. The logic is the following. In a sentence like: “as for the kids, they are doing fine” the comma is used to separate the (contrastive) topic from the rest. In a tectogrammatical tree, we signal that “kids” is a contrastive topic, and get rid of the graphical convention that expresses that. A linguist who is interested in the history of punctuation conventions will still be able to infer this usage from the interaction between the two layers: he will find that “kids” is a contrastive topic in the tectogrammatical tree on the right, and find that it is separated by a comma from the main clause in the tree on the left. That is why the punctuation disappears in tectogrammatical trees. I hope it is clear!

