Outline of the presentation:
1. Introduction
2. Textual Syntax
3. The Evolution of Visual Grammar
4. Visual Syntax in pictures
5. Visual Grammar and the realisation of meaning by Kress and van Leeuwen
6. Contextual Representational Meaning
7. Contextual Interactive Meaning
8. Contextual Compositional Meaning
9. Criticisms of Visual Grammar
10. Conclusion


The messages we receive daily on different social media platforms come in different forms. Apart from words, visuals are also an important resource for making meaning. The use of visuals in sending messages is developing rapidly with the growth of social media; this partly explains why Goodman (1996) asserts that “it is difficult these days to find a single text which uses solely verbal English”. This assertion draws attention to the fact that, apart from words, messages are transmitted through pictures, graphics, drawings and paintings.

The messages we receive daily on our different platforms comprise words, pictures, videos and so on. Have we wondered how the pictures convey meaning(s) to us? The knowledge and ability to decode semiotic elements, in whatever form they present themselves, will enable viewers to recover more of the text producer’s intended meaning.

As Kostelnick (1993: 244) notes, pictures represent an understanding of the world acquired by members of a certain group, and thus the meaning readers construct from a given picture may depend largely on knowledge they share with group members. The saying that ‘a picture is worth a thousand words’ conveys the compactness and complexity contained in pictures.

Until recently, verbal and written forms of language have been the most analysed. Yet pictures crystallise important ideas about society. Scholars like Norris (2004), Cohn (2013) and Forceville (2019) have argued that in a multimodal text, the picture tends to catch attention before the words.

Textual Syntax 

Textual syntax is the structural relationship between linguistic items in a construction. Syntax, as a grammar (or system of rules), structures a language. Without grammar, language cannot exist. With the ubiquity of communicative channels on social media platforms, many semiotic elements are now used in the transmission of messages.

Messages may be single mode (written text alone, or picture alone), double mode (say, written text and picture) or multiple mode (videos, TikTok clips, Graphics Interchange Format (GIF) images, etc.). A structure like ‘the man opened the door’ has the subject (actor), the predicate and the object (goal). When the action moves from the actor to the goal, we say it is transitive. In other words, a transitive verb takes an object.

Another structure can be ‘the man smiled’, in which the actor (the man) is followed by an intransitive verb (smiled), which does not take an object. The meaning of a written text is, to a large extent, easy to decode through the syntactic arrangement of words. How, then, do we read meaning in other modes, such as the pictures which pervade the internet? This question has been answered partly by scholars like Kress and van Leeuwen (1996/2006), Cohn (2013) and Forceville (2019). Cohn and Forceville, in separate studies, investigated the affordances of comics and found similar elements and patterns. Cohn (2013) points out that the building blocks in visuals that enhance meaning constitute a visual language; as a result, he suggests that in reading all forms of images, the term ‘visual language’ should be adopted instead of Visual Grammar (VG, hereafter).

Forceville (2019) partly agrees with Cohn, pointing out that pictograms constitute genres of visuals that have language-like properties and can be read just like linguistic texts. These views reinforce Kress and van Leeuwen’s (1996/2006) position, from which most of the arguments in this paper draw their impetus, that meaning resides in semiotic modes other than language. Before moving on to the analysis proper, let us briefly look at the evolution of Visual Grammar.

The Evolution of Visual Grammar 

In the early 1950s, Panofsky (1953) asserted that pictures are windows of the world through which subjective points of view are expressed. Later, pictures were used as surfaces on which to make marks, and according to Arnheim (1974: 274), “the world in the picture was experienced as a direct continuation of the observer's own space”. Arnheim (1974), however, considered pictures mostly in formalist and aesthetic terms, not as contributing to meaning.

Traces of Panofsky’s and Arnheim’s views are evident in K and vL’s (1996/2006) propositions in *Reading Images …*. Kress and van Leeuwen (2006), adopting a social semiotic approach to language as a framework, follow Halliday (1978) in recognizing three main language functions, which are always performed simultaneously. The high premium placed on language as a mode of transmitting meaning, over other semiotic modes, perhaps charted the way for K and vL’s VG.

They use ‘system networks’, which derive from the work of Halliday (1978), whose linguistic theories have influenced many fields of language and visual analysis. Kress and van Leeuwen (2006) use different terms to explain how meanings are constructed in images: ‘representational’ instead of ‘ideational’; ‘interactive’ instead of ‘interpersonal’; and ‘compositional’ instead of ‘textual’. Kress and van Leeuwen point out that pictures not only represent the world (whether in abstract or concrete ways), but also play a part in some interaction and, with or without accompanying text, constitute a recognizable kind of text. They equally suggest that there is a need for a term that can encompass images in all communicative settings; as a result, they chose ‘Visual Grammar’ to cover “the grammar of contemporary visual design …, an account of the explicit and implicit knowledge and practices around a resource, consisting of the elements and rules underlying a culture-specific form of visual communication” (Kress and van Leeuwen 2006: 3).

VG focuses, in particular, on the affordances of visuals; in other words, on how the various elements in pictures are combined into a meaningful whole. Just as the words of a language are combined by the grammar and lexis into meaningful clauses and discourses, so also the affordances or semiotic elements in a picture can generate meaning. By inference, VG shows how cultural meanings are encoded in different semiotic elements.

We now turn to Visual Syntax in pictures.

Visual syntax in pictures 

Visual structures in pictures do not simply reproduce the structures of ‘reality’, as pointed out by Midalia (1999). On the contrary, they produce images of reality which are bound up with the interests of the social institutions within which the pictures are produced, circulated and read. Pictures, like words, can be used to serve different interests. The organisation of structures or affordances in a picture enables viewers to interpret it or draw conclusions from it. Without proper organisation, most pictures would not make sense to the viewer.

Visual syntax, in this sense, is therefore the grammar of visual images. For instance, suppose that at the height of the fear of the corona virus pandemic, or even now, one wakes up in the morning and sees the picture below, without any written text, posted on one’s window. The affordance of this prototype picture of the corona virus tells the viewer a story.

We may not have seen the picture or image before the outbreak of the corona virus pandemic, but with the knowledge of the pandemic, and the shared knowledge that the virus is deadly, the affordance of the picture assists us to read it correctly: it represents the deadly corona virus. If the picture were just a round object without the protruding spikes, we would read it differently, and probably call it a football. Without yet going into the concepts postulated by K and vL (2006) to assist us in reading meaning in pictures, the affordances in the picture (roundish shape, colour, protruding spikes) have given the reader some form of visual syntax (unity of meaning): the picture represents the deadly corona virus.

Next we turn to:

Visual Grammar and the realisation of meaning by Kress and van Leeuwen

Kress and van Leeuwen (1996) work on the assumption that there should be an underlying pattern or structure that people can rely on to interpret the meaning of visual texts. They note that what can be communicated in words can also be communicated in images, pictures, signposts and so on. The argument by Kress and van Leeuwen is important today, given the widespread use of different semiotic resources for making meaning.

Knowledge of visual syntax denotes the ability to explain, interpret, describe, negotiate and make meaning from information presented in the form of an image. Kress and van Leeuwen (1996) define two components of visual discourse: represented participants and interactive participants. The papers on Multimodality presented on this platform some time ago focused on meaning in multiple semiotic elements. The focus here is only on reading meanings in pictures, which can be arrived at through the conscious or unconscious combination of the affordances in pictures. People or things that are mapped on an image are represented participants (I refer to represented participants as affordances), while interactive participants are the producers, readers and viewers. Relations exist between both classes of participants (including context, background knowledge and so on, which help to pass across or read meaning).

The relation between the represented participants can be considered syntactic; that between the represented and the interactive participants, semantic; and that between the interactive participants, pragmatic (Kress & van Leeuwen, 1996: 119). They further point out that interactive participants are […] real people who produce and make sense of images in the context of social institutions which, to different degrees and in different ways, regulate what may be “… said with images, and how it should be said and how it should be interpreted”.

As noted by Midalia (1999: 131), visual images, like all representations, “are never innocent or neutral reflections of reality... they re-present for us: that is, they offer not a mirror of the world but an interpretation of it”. This means that, like written or verbal texts, pictures and images can be used or manipulated to suit the producers’ aims.

Apart from K and vL’s postulations on meaning in semiotic resources other than language, Ping (2018) observes that focusing on language alone in analysis obscures the importance of other modes in meaning making. Ping (2018) further notes that, with the development of technology, pure (language-only) discourse is gradually decreasing; this supports K and vL’s (2001: 2) view that multimodality, that is, the ubiquity of images and other semiotic resources, is a feature of modern society. Forceville (2019) equally notes the importance of K and vL’s (2006) proposals for the classification and interpretation of images, and their relevance to the currently emerging ‘cognitivist’ paradigm of the 21st century.

Given these views and arguments, to adequately read a picture, the viewer needs more knowledge than the base provided by the picture. In addition to the base knowledge provided by the affordances of the picture, we need to know its context and some background knowledge of what led to its emergence. This covert knowledge allows us to view such pictures/images as assemblages of information that we need to interpret meaningfully. I define visual syntax as the mental arrangement of affordances in pictures/images to achieve meaningful interpretation.

Visual Grammar (VG) is the theory that has developed concepts for studying visual communication. As Kress and van Leeuwen (1996:137) observe, ‘‘socially determined viewpoints could, in this way, be naturalised and presented as studies of nature’’. Physical objects, images and pictures to a large extent represent nature.  This is another key emphasis of the social semiotic approach: semiotic resources are at once the products of cultural histories and the cognitive resources we use to create meaning in the production and interpretation of visuals and other messages.

Below is a schema which captures the three strata of meaning realisation in VG:

Figure 1 is the schematic representation of meaning in Visual Grammar by K and vL (2006). Visual Grammar, as mentioned before, has a tripartite division (representational meaning, interactive meaning and compositional meaning); meaning in each is realised through the components on the right-hand side. The visual syntax can be realised through any one of the three divisions, or through a simultaneous combination of the concepts.

In the modified model, context is included in the realisation of meaning in each division. Context here includes the situated context of each picture and the common knowledge shared by the producer of a picture and its viewers.

In the next section, we take the three strata of meaning in VG with examples:

Contextual Representational Meaning

Contextual Representational meaning explores the meaning of objects/entities represented by the image producer(s) in the real world. Contextual Representational meaning is first of all conveyed by the (abstract or concrete) ‘participants’ (people, places or things) depicted. Kress and van Leeuwen (2006: 79) propose that Representational meaning is realized by two processes as depicted in the schema above (Figure 2): narrative process and conceptual process. Narrative images (or scenes within pictures) are recognized by the presence of a vector.

Since actions presuppose human or human-like agency, vectorial patterns are called “narrative”. In a visual representation, the vector is the equivalent of the verb. Kress and van Leeuwen (1996) introduced the concept of the “vector” as the pictorial equivalent of the action verb (recall the example I gave under textual syntax). Real or virtual lines between human elements in a picture function in ways similar to verbs, describing relations between what in Hallidayan grammar are referred to as “Actors” and “Goals”. The vector is a line that connects participants; it expresses a dynamic, ‘doing’ or ‘happening’ kind of relation, like the verb in textual syntax. The presence of a vector is the criterion for identifying whether the affordances in an image qualify it as a narrative or a conceptual process. “Vectors are the marks of narrative process” (Kress & van Leeuwen, 2006: 82).

Narrative representations relate participants in terms of “doings” and “happenings”, of the unfolding of actions, events, or processes of change. In Narrative Representation, the concepts of Action, Reaction, Transactive and Non-Transactive aid in the realisation of meaning. Just as the subject and the verb aid interpretation in textual syntax, the affordances in the picture/image guide its interpretation. This is the visual equivalent of lexis; “syntax” in pictures is a matter of affordance sequencing.

Sample 2 is an example of a picture that has an actor, a vector and a goal. When the gaze of the represented participants is not directed at the viewer, its direction forms a special kind of vector (Kress and van Leeuwen 2006: 82). It creates a reaction rather than an action. Such a reaction can be transactive or non-transactive. When the viewer sees both the person who is looking and the object of his or her gaze, the reaction is transactive. The affordances of the picture, such as the direction of the gaze of the represented participants, the corona virus sign and the handshake, are tools which aid the viewer in reaching a fine construction of the visual syntax.

The vector connects the participants. Action goes from the man to the lady. If the corona virus image were not in the picture, the visual syntax would read something like: the man shook hands with the lady. But based on the common ground knowledge of the corona virus pandemic, and of what the virus looks like (recall the prototype picture of sample 1), the visual syntax reads something like: the man who has corona virus shook hands with the lady. The man does something to the woman. The ‘actor’, the ‘doer’ of the action, is the man. The meaning the picture gives has been created using visual language.

The represented participants in the picture (the man and the woman) interact with the viewers through the reading path created by the natural affordances in the picture. We can go further with the analysis to say that the “vector” is bi-directional (that is, the woman's hands also form a vector which ‘acts on’ the man). When a picture or a scene within a picture has both an actor and a goal, it is ‘transactive’, representing an action taking place between two parties. The picture provides visual evidence of views in the society. But when a picture is just ‘there’ as used by the producer, it is a non-transactional image. A picture can as well be used by a producer to depict an actor not carrying out any action, similar to intransitive constructions.

Sample 3 has a single represented participant, who is just looking. When a picture has just one represented participant, the human image is the Actor and there is no Goal. The action is not aimed at anyone or anything. The non-transactional action process is therefore analogous to the intransitive verb in language (Kress and van Leeuwen 2006: 61-62). The look of the represented participant is not directed at anyone. The picture itself provides the reading path which guides its interpretation. The picture makes meaning to the viewers in the context of the corona virus pandemic, which requires the wearing of a mask as a way to avoid contracting the virus. The affordances of the picture, especially the face mask, guide the visual syntax, which could read: the masked man looks. The viewers choose the useful affordances in the picture and organise them mentally to read the picture and form the visual syntax.

We turn next to the second level of Representational meaning, referred to by K and vL as Conceptual process:

Pictures which depict a conceptual process represent participants “in terms of their class, structure or meaning, in other words, in terms of their generalized and more or less stable and timeless essence” (K and vL 1996: 56). The represented participants are the carriers of the message. A participant (‘Carrier’) is depicted as made up of a number of parts (‘Possessive Attributes’), and the structure is interpreted as showing all the parts from which the whole is made up. Individual represented participants complement one another to gradually build up the whole. The conceptual process contains relational process and existential process. Pictures which do not contain vectors are ‘conceptual’. They visually ‘define’, ‘analyse’ or ‘classify’ people and things, both physical and abstract. One kind of conceptual pattern is the classification structure; this brings different people, places or things together in one picture, distributing them symmetrically across the picture space to show that they have something in common, that they belong to the same class, as illustrated below:

Sample 4 depicts the represented participants as not performing any action, but the unifying corona virus sign, made salient through colour, and the standing positions of the represented participants guide the viewer(s) to the knowledge that they share something in common. The composition of the picture, read against the background knowledge of the pandemic, allows viewers to recognize the social significance of the picture. There are traces of social facts embedded in the picture, as well as evidence of the social conventions and organisational practices that underpin its production and interpretation. The affordances bring to mind the warning from healthcare experts that people should maintain social distance in public places. The viewer simply gets a mental picture of what social distancing entails. The likely visual syntax of the picture, based on the understanding of the corona virus pandemic, is: maintain social distance when in public places.

Contextual Interactive Meaning

Contextual Interactive meaning is associated with the social relations between actors, and the evaluative orientations that participants adopt towards each other and towards the represented world (Kress & van Leeuwen, 1996: 110). Pictures can be used to enact relationships between producers and viewers. Contextual Interactive meaning, as shown in Figure 2, can be realized through contact, attitude, distance and modality. Pictures can create particular relations between viewers and the world depicted inside the frame; in this way they interact with viewers and suggest the attitude viewers should take towards what is being represented. According to K and vL (2006: 121), the speech functions allowed in VG are either “demand” or “offer”. If a vector formed by eye gaze appears in a visual, it implies that the represented participants are addressing the viewers with a visual “You”, demanding something from them, namely their attention. Relationships can be depicted in different ways. The questions to ask are: how does the picture relate to the viewer, and what message is the picture trying to pass across to the viewer?

When the represented participants in a picture look directly at the viewer, they tend to “demand” something from the viewer. In this way they “make contact” with the viewers and establish an (imaginary) relation with them. Kress and van Leeuwen (2006: 118) call such pictures “demand” pictures: the people in the picture symbolically “demand” something from the viewer through direct eye contact. In sample 5, the corona virus image’s active involvement with the viewer, through the resources of laughter, hands on the shoulder of the carrier, and gaze, “demands” the viewers’ attention, just as, in written expression, the linguistic elements draw the reader into the process of meaning making. Unlike the picture of sample 3, which has only an actor, sample 5 has both actor and goal. The contact between the represented participants and the interactive participants is enacted through direct address. The picture addresses the viewers directly, “demanding” whatever the viewer can infer. The situational context allows the inferential meaning: “You” can contract the virus through close associates. The hands of the girls on the shoulder of the actor show close affinity. The visual syntax reads something like: your close friends/relatives can transfer the virus to you. But when there is no ‘imaginary contact’, the represented participant(s) are viewed quite differently.

The represented participant of Sample 6 does not engage actively with the viewers; the role of the represented participant is confined, according to K and vL (2006: 119), to that of “items of information” or “specimens in a display case”. The realized speech function is then interpreted as an ‘offer’: an “offer of information” is being made. Offer pictures position the viewer as an observer only, and “offer” the represented participants as ‘information’ to be taken in by the viewer. Knowledge of the corona virus pandemic helps the viewer to get the likely piece of information ‘offered’. The likely information ‘offered’ by sample 6 is: the Ooni wears a face mask too. The picture interacts with the viewer through the information “offered”.

The terms “demand” and “offer” were taken from Halliday (1985), who uses them to distinguish among four speech functions: ‘offer information’, ‘offer goods-and-services’, ‘demand information’ and ‘demand goods-and-services’. Kress and van Leeuwen point out that images can ‘demand’ ‘goods-and-services’ that realize a particular social relation; images can also primarily ‘offer’ information.

Visual Grammar aids in describing the kinds of meaning which allow producers and viewers to create relations between pictures, producers/viewers and the people, places or things depicted in pictures.

Contextual Compositional Meaning

Contextual Compositional Meaning refers to the whole layout of a picture. Information value, framing and salience are the concepts through which compositional meaning is achieved. Meaning can be achieved through each of the concepts individually, or through a combination of them. In relation to Contextual Compositional meaning, “information value” refers to the placement of particular information, such as given and new information, in a picture or image. Sometimes, a picture has both natural and arbitrary selections of semiotic resources.

The composite structure of a picture connects with the reader through framing, to give a coherent interpretation of the picture. Framing indicates that elements of a composition can either be given separate identities or be represented as belonging together; framing ‘connects’ or ‘disconnects’ elements. Salience refers to the different degrees of prominence given to elements, for example through placement in the ‘background’ or ‘foreground’. Some elements can be made more eye-catching than others. This can be done in many different ways, for example through size or colour contrast. Here is an example of how salience can guide visual syntax:

In Sample 7, the corona virus image, made prominent by the producer, creates a reading path for the viewer and sets the overall theme of the picture. Again, based on background knowledge, we can look into the picture to explore the informational content provided by the arrangement of the affordances of the represented participants. The interpretation drawn from the represented participants guides the viewer to view the picture not as an object of fascination, but as a collective meaning-making resource. This is enabled by the ensemble of the affordances of the picture. The producer draws attention to the corona virus pandemic, and to the importance of people staying at home, through salience (achieved through colour contrast). Some of the represented participants have contracted the corona virus, while some have not. The composite meaning of the visual syntax the viewers get is: the corona virus can easily be spread by people walking about.

The viewer is able to construct an understandable visual syntax, aided by the affordance of the corona virus image and the different positions of the represented participants. In the picture, the viewer sees not individual participants, but the way participants fit together to make up a larger whole. According to Kress and van Leeuwen (1996: 219), ‘salience’ creates reading paths in a text. The most salient represented participants in the composition guide the viewers to the likely meaning of the picture. The picture plays a role in reminding viewers that failure to maintain social distance spreads the virus faster. The layout, together with the placement and relative salience of the corona virus image, does the compositional work that allows viewers in this context to recognize the picture as a warning for people to stay at home.

Kress and van Leeuwen have argued that VG dwells more on images in western societies. This argument is not completely true: pictures/images from non-western societies also depict events and happenings in the society, and can also be subjected to analysis using the template provided by K and vL. Most of the examples used in this paper, especially samples 1, 2, 4, 5 and 7, are not society-specific. In a corona virus awareness campaign, they can fit into any society and still make meaning. The analysis shows that, apart from verbal warnings, pictures deployed at the height of the corona virus pandemic can, to a large extent, also draw viewers into the process of signification. As Kostelnick (1993: 244) puts it, “reading pictures involves not only what we see but what we know”. The common knowledge shared by producers and viewers guides the viewers in piecing together the affordances in the pictures to reach the producers’ likely meaning.

Criticisms of Visual Grammar

Visual Grammar, like most theoretical paradigms, is not without criticisms; it has been criticised for not paying much attention to the interaction between pictures and (con)text. For instance, Forceville (1996) points out that although K and vL acknowledge the importance of context, their concepts and models do not specify how context must be incorporated. Visual Grammar has also been criticised for introducing new terminology in its framework which may be vague to audiences not familiar with the terms and which, as such, requires elaborate explanation each time it is used in analysis. Haught (2012) also has some reservations about the reliability of some hypotheses presented in K and vL (1996). Though he agrees with some of the hypotheses raised by K and vL, he suggests that qualitative and quantitative research should be done to test them. Kress and van Leeuwen have also been criticised for not comparing visual structures with the mental processes of which both language and images are the perceptible manifestations. To me, the benefits of K and vL’s (2006) postulations outweigh the criticisms. The concepts are efficient in reading the meanings of the semiotic elements in pictures.

In this presentation, I have discussed how pictures can mean, using a modified model of Kress and van Leeuwen’s three strata of meaning in VG. Pictures play a role which goes far beyond the mere illustration of what is communicated in language. Pictures too, to a large extent, can be read as language, because they have structures that produce meanings. As the analyses in this paper show, the concepts provided by VG can be applied to pictures in non-western societies. While some studies have utilised the concepts of VG in the explication of meaning in texts with multiple semiotic resources, there is still a need to apply it in the analysis of meanings in other visuals. The three levels of analysis in VG offer a wealth of concepts and aspects to study; the examples I have presented here are glimpses of what each stratum of meaning making entails.

I observe also that K and vL did not solely use pictures in their analyses; most of their examples are picture-cum-text. I have used examples that are only pictures, to show how affordances in pictures and context can guide interpretation. As I have mentioned earlier, some of the pictures used as samples are not culture-specific.

I have discussed the three strata of meaning and some concepts in VG and, in part, how to apply the concepts in analysis.

