IST 8A, Fall 2008
Vision Systems II
Through the case of Dee Fletcher, who had selective brain damage, Milner and Goodale developed a picture of the visual system as specialized into two subsystems with very different purposes: vision for action and vision for perception. In turn, each of these subsystems is further specialized into modules designed to extract different features of the visual scene (in the vision for perception stream) or to implement different motor actions based on visual information (in the vision for action stream).
This lecture will summarize the modularity of the visual system, and talk further about the computational methods used by each subsystem to accomplish its purposes.
Two visual systems and the tasks they perform
The authors discuss two main visual systems and the tasks they perform, in primates (which have been studied extensively) and humans (for whom the evidence comes from patients with lesions, such as Dee Fletcher). From our previous discussion we know that there are more than two systems, but the rest of the discussion in Chapter 4 is focused mainly on vision for action and vision for perception.
Vision for perception
According to the authors, one defining characteristic of vision for perception is that it creates internal representations of the visual world, which can be accessed “off-line” for various purposes. These are our conscious visual perceptions. The pathways of vision for perception are not linked to motor areas of the brain. Instead they include areas with specialized abilities to recognize edges, textures, faces, familiar objects and places. These areas all lie within the ventral temporal lobe. From there the pathways link to other areas involved in memory, emotion and decision making.
Vision for action
These pathways are concerned with acting in the present: taking fast action without reflection. Memory and emotion do not enter, but fast motor response and control do. The dorsal stream proceeds from V1 into the parietal lobe. An evolutionarily older stream bypasses the LGNd and V1, going instead through the superior colliculus. In primates this stream controls head and eye movements, but unlike in the frog it is now refined by the presence of the dorsal stream. This means that primate head and eye movements are much more sophisticated than those of a frog. But control of these motions is not accessible to consciousness, any more than it would be in the frog (to the extent that a frog is conscious!).
Evidence for separate systems
Prior to Dee Fletcher, lesion studies with monkeys dating back to the 1800s had suggested the existence of two separate vision systems. Monkeys with damage to what we would now call the dorsal stream would fumble when trying to pick up objects, even though they could see the objects perfectly well. And monkeys whose ventral stream was damaged could not be trained to recognize new objects, but could dexterously pluck flies out of the air!
More recently, techniques were developed to monitor the firing of individual neurons. When applied to ventral stream areas, these techniques revealed that some neurons fire preferentially for edges shown to the monkey at specific angles. And we already know about the fusiform gyrus, which has neurons that fire selectively for faces. A common trait of all these neurons is that they are quite “particular” about which visual features cause them to fire, but also quite insensitive to other features such as viewpoint or lighting.
This discovery of preferential firing was the beginning of our current conception that the ventral stream’s job is to represent features of the visual world as opposed to reacting to them. And it gives a clue to the nature of our conscious perceptions: when viewing from a variety of vantage points, we still recognize the same scene.
Modules – the building blocks of the visual streams
The idea is that the broad specialization of visual streams into two main tasks – vision for action and vision for perception – is implemented by sub modules belonging to each stream. The existence of modules is indicated by imaging and electrode recording studies, and also by lesions: localized damage can result in specific gaps in cognition or behavior.
In the vision for perception stream there are sub modules for recognition of objects, places and faces. In the vision for action stream there are areas for reaching, grasping, and eye-flick motions (saccades), among others. A deficit in one of the ventral stream modules leads to some kind of “agnosia” (= “not knowing”), e.g. prosopagnosia (face blindness) or topographical agnosia (inability to recognize familiar landmarks).
The concept for isolating these areas and abilities is the same as that for the broader visual streams themselves. It is termed double dissociation. Generally, double dissociation means that in separate patients, it can happen that ability A is lost in one patient while ability B is retained; and in the second patient ability A is retained but B is lost. This means that these abilities are dissociated or unconnected in the brain. Double dissociation is what leads to the concept of brain modularity.
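The logic of double dissociation can be written down as a simple check. What follows is only a minimal sketch in Python, not a method from the text: the patient records, the ability labels and the function name `is_double_dissociation` are all hypothetical, and each ability is reduced to a plain retained/lost flag.

```python
from typing import Dict

# Hypothetical records: True means the ability is retained, False means lost.
# The ability labels are illustrative assumptions, not terms from the text.
Patient = Dict[str, bool]


def is_double_dissociation(p1: Patient, p2: Patient, a: str, b: str) -> bool:
    """Return True if abilities a and b dissociate in both directions.

    One patient must have lost a while retaining b, and the other patient
    must have lost b while retaining a (in either order).
    """
    p1_loses_a_only = (not p1[a]) and p1[b]
    p2_loses_b_only = p2[a] and (not p2[b])
    p1_loses_b_only = p1[a] and (not p1[b])
    p2_loses_a_only = (not p2[a]) and p2[b]
    return (p1_loses_a_only and p2_loses_b_only) or (
        p1_loses_b_only and p2_loses_a_only
    )


# Two made-up patients following the pattern described in these notes:
# one with a ventral stream deficit, one with a dorsal stream deficit.
ventral_patient = {"recognize_shape": False, "grasp_object": True}
dorsal_patient = {"recognize_shape": True, "grasp_object": False}

print(is_double_dissociation(
    ventral_patient, dorsal_patient, "recognize_shape", "grasp_object"))
```

Note that a single patient who has lost A but kept B would not pass this check. Without the second patient showing the reverse pattern, the result could simply mean that A is the harder task.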
The dorsal stream, in contrast to the ventral stream, has areas that fire only when visual information is accompanied by action. For example, there are separate areas that fire when a monkey reaches for a target, or just flicks its eyes toward it.
Terminology for deficits due to damage in each stream
Here is a summary of terminology that describes the different conditions of dorsal or ventral stream damage:
Agnosia: “not knowing” aspects of the visual field; blindness to at least some features. Results from damage to the ventral stream (Dee Fletcher).
Ataxia: incoordination of the muscle movements involved in grasping objects; inability to reach for something or to align the hand appropriately to pick it up. Results from damage to the dorsal stream.
This section reviews the organization of the visual streams, starting with the areas in the occipital cortex (the “V’s”) along with their connections to the areas for recognition and action.
The next figure shows the locations of the v-areas in the occipital cortex.
Figure 1. Diagram of the visual areas in the occipital cortex, seen from medial (left) and lateral (right) viewpoints. Front of the brain is to the left in each picture.
Representations of visual data in the ventral stream
This section will talk about some aspects of visual representation, starting in primary visual cortex (V1) and continuing into the ventral stream. Here there are neurons that selectively fire for certain kinds of input. This means that they encode or represent certain kinds of data but not others.
The next figure is an example of what takes place in V1 itself. This area has neurons specialized for encoding lines and edges at various orientations. Here is a striking example!
Figure 2. Encoding a grid pattern in V1
“Downstream” from V1, in the inferior temporal lobe, are areas we have talked a lot about! These areas are more “fussy” (to use the words of M & G, Chapter 4) about which stimuli they fire for. Thus they encode higher order features of a scene, such as faces.
As the authors state, representation is typical of the action in the ventral stream. These representations have complex interconnections with other brain areas involving memory and emotion, for example. Thus, our vision for perception is heavily connected to other areas involved in consciousness.
Activations in the dorsal stream
The authors also state that visual data reaching the dorsal stream typically causes firing only when there is interaction with an object. This has been determined in single cell recordings of monkeys, which show activation if the monkey reaches for an object, makes a saccade (flicks its eyes toward the object), or grasps the object. Thus, there is not a visual representation in the same way as in the ventral stream. Instead, if you want, there are codings for action in response to visual stimuli. One key difference is that the dorsal stream has direct connections to motor areas of the brain, whereas the ventral stream does not.
The next diagram illustrates some of these ideas.
Figure 3. Organization of the two streams into subcomponents. Source for Figures 2 and 3:
Next is a diagram of the main visual streams and some important modules in each stream. The ventral stream (vision for perception) goes through the inferior temporal lobe (IT). The dorsal stream (vision for action) is mainly found in the intraparietal sulcus (IPS). The “v-areas”, including V1 (the primary visual area), lie in the occipital lobe.
The diagram shows the organization of the two streams and their sub modules as described in the book. This leads to the following points:
Computational tasks of each system
The perception and action streams were revealed through lesions that resulted in behavioral deficits. And the deficits delineated the nature of each stream. It became apparent that the perception stream was for making representations that could be processed “off line” as in thinking, remembering and planning, while the action stream was for guiding motion “on line”. These concepts are summarized by the following points:
Experiments to reveal the different types of computation
The text describes several interesting types of experiments to reveal the different computational abilities of each system. The perception system needs to form judgments of relative closeness or distance between objects. It uses cues such as object occlusion (one object partially blocking another) or relative size to make these inferences. The action system needs to compute absolute positions of objects in order to interact with them.
This is why the perception system is more easily fooled by illusions. The common factor in the illusions cited by the text is that context makes us perceptually misjudge the size of one component. Our perception of an object’s size can be measured by asking us to indicate how big it is by opening our fingers. Typically subjects will open to the wrong width when influenced by context. But when asked to grasp the component (if the illusion is implemented in 3D), their hands open to the appropriate width. Thus our action system is not fooled. This shows that it depends on a different kind of computation that ignores contextual cues.
Pantomime is the process of consciously (i.e. using the perceptual system) mimicking motions that are normally carried out by the action system. It is the difference between pretending to pick up a non-existent cup and actually picking up a real cup. Most normal people perform worse in pantomime than in executing the real actions. This is because pantomimed actions depend on the very different computation in the perceptual system, which favors generalities and context over precision.
For example, in one experiment the subject sees an object, and then the object is removed. After a short delay, the subject is asked to pantomime picking up the object. To do so, the person has to consciously remember and visualize the object, then adjust the hands and arm reach accordingly. Normal people do this crudely, but Dee Fletcher cannot do it at all, even though she can pick up the object in real time. On the other hand, people with dorsal stream damage (ataxia) cannot pick up objects in real time. But their performance actually improves when they are asked to pantomime!
Distance computations in vision for action
We have already mentioned how the perception system uses cues to make contextual judgments about a scene. That is why it is so easily fooled. But how does the action system avoid these traps? How does it compute exact metrics for picking up an object or threading a needle?
The answer is binocular vision. In Dee Fletcher’s case, she computes distance to an object by monitoring (unconsciously, of course) the degree to which both her eyes must converge to fixate on the object. This is revealed by the fact that she does poorly on motor tasks when one eye is covered.
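To see how convergence of the eyes can carry absolute distance information, here is a minimal geometric sketch in Python. It is an idealized illustration, not a model from the text: it assumes the target lies straight ahead, so that the two lines of sight and the line joining the eyes form an isosceles triangle. The interpupillary distance of 0.063 m and the function names are assumptions chosen for the example.

```python
import math

# Assumed adult interpupillary distance in metres (illustrative value only,
# not taken from the text).
INTERPUPILLARY_DISTANCE_M = 0.063


def distance_from_vergence(vergence_rad: float,
                           ipd_m: float = INTERPUPILLARY_DISTANCE_M) -> float:
    """Distance to a fixated target straight ahead, given the vergence angle.

    The vergence angle is the angle between the two lines of sight.
    For a target on the midline: distance = (ipd / 2) / tan(vergence / 2).
    """
    if not 0 < vergence_rad < math.pi:
        raise ValueError("vergence angle must lie between 0 and pi radians")
    return (ipd_m / 2) / math.tan(vergence_rad / 2)


def vergence_from_distance(distance_m: float,
                           ipd_m: float = INTERPUPILLARY_DISTANCE_M) -> float:
    """Inverse relation: the vergence angle needed to fixate at a distance."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 2 * math.atan((ipd_m / 2) / distance_m)


# Print the vergence angle required at a few reaching and viewing distances.
for d in (0.25, 0.5, 1.0, 2.0):
    angle = vergence_from_distance(d)
    print(f"{d:4.2f} m -> {math.degrees(angle):5.2f} degrees of vergence")
```

In this simplified geometry the vergence angle changes quickly at near distances and very little at far ones, so the cue is most informative within reaching range. With one eye covered there is no vergence angle to read, which fits the observation that performance drops in that condition.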
-- Evan Fletcher