Vision

The actual “capture” presents no problem. Electron guns

cover the rows of pixels at 25 sweeps per second,

refreshing and changing intensities. 2⁸ gray values

make feature extraction the challenge. Efficient decision trees

collapse in the combinatorial explosion.

Digitizing brightness variation will reveal

reflective properties, texture and compositions.

The attractively human stereo systems still are troubled

by the correspondence problem: one wrench seen twice is two;

they cannot correct for viewpoint. Structured light, though it can slip

into the angle between steel sheets, is vulnerable

to shadows. In windowing, shifts in resolution

distribute the labor. The master scans the field

coarsely, for promising features—intrusions, protrusions, holes—

then moves in slaves for a detailed investigation.

A PUMA at the University of Rhode Island

now solves several bin-picking problems. It quickly selects

for graspability, retrieving a sequence of parts

from an overlapping mass. The arm’s parallel jaws

know when a part is between them, and close around it gently.

A bottom-up system learns like a newborn. The first flexible net,

SOPHIA, used 12 SLAMS and had only one discriminator.

The commercially available WISARD has no state structure,

yet it achieves 100% discrimination

among target faces, through its powers of generalization.

Trained and tested on live images, it is not disconcerted

by changes in light, spectacles, grimaces

or false mustaches. Even a single-layer net displays

certain features of intentionality: the sudden

catastrophic leap from estimate to decision, when

the first burst of feedback confirms recognition

of a dubious pattern. The high-resolution window

homes in on the key feature, without camera-shake: the dot

or cross-bar blown up on top of the vertical slash,

the suspect’s face, framed and magnified over the crowd.