Archive for the ‘Epistemology’ Category

The Fallacy of the Right Answer

The Fallacy of the Right Answer is everywhere. With regards to education technology, it dates back at least to BF Skinner.

Skinner saw education as a series of definite, discrete, linear steps along a fixed, straight road; today this is called a curriculum. He referred to a child who guesses the password as “being right”. Khan Academy uses similar gatekeeping techniques in its exercises, limiting the context. Students must meet one criterion before proceeding to the next, being spoon-fed knowledge and seeing through a peephole not unlike Skinner’s machines. Furthermore, these steps are claimed to be objective, universal and emotionless. Paul Lockhart calls this the “ladder myth”, the conception of mathematics as a clear hierarchy of dependencies. But the learning hierarchy is tangled, replete with strange loops.

It is fallacious yet popular to think that a concept, once learned, is never forgotten. But most educated adults I know (including myself) find value in rereading old material, and make connections back to what they already have learned. What was once understood narrowly or mechanically can, when revisited, be understood in a larger or more abstract context, or with new cognitive tools. There are two words for “to know” in French. Savoir means to know a fact, while connaitre means to be familiar with, comfortable with, to know a person. The Right Answer loses sight of the importance, even the possibility, of knowing a piece of information like an old friend, to find pleasure in knowing, to know for knowing’s sake, because you want to. Linear teaching is workable for teaching competencies but not for teaching insights, things like why those mechanical methods work, how they can be extended, and how they can fail.

Symbol manipulation according to fixed rules is not cognition but computation. The learners take on the properties of the machines, and those who programmed them. As Papert observed, the computer programs the child, not the other way around (as he prefers). Much of this mechanical emphasis is driven by the SAT and other unreasonable standardized tests which are nothing more than timed high-stakes guessing games. They are gatekeepers to the promised land of College. Proponents of education reform frequently cite distinct age-based grades as legacy of the “factory line model” dating back to the industrial revolution. This model permeates not only how we raise children, but more importantly, what we raise them to do, what we consider necessary of an educated adult. Raising children to work machinery is the same as, or has given way to, raising them to work like machinery. Tests like the SAT emphasize that we should do reproducible de-individualized work, compared against a clear, ideal, unachievable standard. Putting this methodology online does not constitute a revolution or disruption.

patent_portrait

(source)

Futurists have gone as far to see the brain itself as programmable, in some mysteriously objective sense. At some point, Nicholas Negroponte veered off his illustrious decades-long path. Despite collaborating with Seymour Papert at the Media Lab, his recent work has been dropping tablets into rural villages. Instant education, just add internet! It’s great that the kids are teaching themselves, and have some autonomy, but who designed the apps they play with? What sort of biases and fallacies do they harbor? Do African children learning the ABCs qualify as cultural imperialism? His prediction for the next thirty years is even more troublesome: that we’ll acquire knowledge by ingesting it. Shakespeare will be encoded into some nano-molecular device that works its way through the blood-brain barrier, and suddenly: “I know King Lear!”. Even if we could isolate the exact neurobiological processes that constitute reading the Bard, we all understand Shakespeare in different ways. All minds are unique, and therefore all brains are unique. Meanwhile, our eyes have spent a few hundred million years of evolutionary time adapting to carry information from the outside world into our mind at the speed of an ethernet connection. Knowledge intake is limited not by perception but by cognition.

Tufte says, to simplify, add context. Confusion is not a property of information but of how it is displayed. He said these things in the context of information graphics but they apply to education as well. We are so concerned with information overload that we forget information underload, where our brain is starved for detail and context. It is not any particular fact, but the connections between them, that constitute knowledge.  The fallacy of reductionism is to insist that every detail matters: learn these things and then you are educated! The fallacy of holism is to say that no details matter: let’s just export amorphous nebulous college-ness and call it universal education! Bret Victor imagines how we could use technology to move from a contrived, narrow problem into a deeper understanding about generalized, abstract notions, much as real mathematicians do. He also presents a mental model for working on a difficult problem:

I’m trying to build a jigsaw puzzle. I wish I could show you what it will be, but the picture isn’t on the box. But I can show you some of the pieces… If you are building a different puzzle, it’s possible these pieces won’t mean much to you. You might not have a spot for them to fit, or you might not yet. On the other hand, maybe some of these are just the pieces you’ve been looking for.

One concern with Skinner’s teaching machines and their modern-day counterparts is that they isolate each student and cut off human interaction. We learn from each other, and many of the things that we learn fall outside of the curriculum ladder. Learning to share becomes working on a team; show-and-tell becomes leadership. Years later, in college, many of the most valuable lessons are unplanned, a result of meeting a person with very different ideas, or hearing exactly what you needed to at that moment. I found that college exposed to me brilliant people, and I could watch them analyze and discuss a problem. The methodology was much more valuable than the answer it happened to yield.

The hallmark of an intellectual is do create daily what has never existed before. This can be an engineer’s workpiece, an programmer’s software, a writer’s novel, a researcher’s paper, or an artist’s sculpture. None of these can be evaluated by comparing them to a correct answer, because the correct answer is not known, or can’t even exist. The creative intellectual must have something to say and know how to say it; ideas and execution must both be present. The bits and pieces of a curriculum can make for a good technician (a term I’ve heard applied to a poet capable of choosing the exact word). It’s not so much that “schools kill creativity” so much as they replace the desire to create with the ability to create. Ideally schools would nurture and refine the former (assuming something-to-say is mostly innate) while instructing the latter (assuming saying-it-well is mostly taught).

What would a society look like in which everyone was this kind of intellectual? If everyone is writing and drawing, who will take out the trash, harvest food, etc? Huxley says all Alphas and no Epsilons doesn’t work. Like the American South adjusting to an economy without slaves, elevating human dignity leaves us with the question of who will do the undignified work. As much as we say that every child deserves an education, I think that the creative intellectual will remain in an elite minority for years to come, with society continuing to run on the physical labor of the uneducated. If civilization ever truly extends education to all, then either we will need to find some equitable way of sharing the dirty work (akin to utopian socialist communes), or we’ll invent highly advanced robots. Otherwise, we may need to ask ourselves a very unsettling question: can we really afford to extend education to all, given the importance of unskilled labor to keep society running?

 


If you liked this post, you should go read everything Audrey Watters has written. She has my thanks.

Prefer Verbs to Nouns

My principle, v0.2

Prefer verbs to nouns.

When Bret Victor introduced the concept of a principle, he said a good principle can be applied “in a fairly objective way”. This is the biggest problem with my first draft, which took several sentences to define what a powerful way of thinking was. A principle must be general enough to apply to many situations, but also able to operationalize to find meaning in any specific situation. Devising or crafting a principle requires inductive reasoning (specific to general), but applying it demands deductive reasoning (general to specific). Forging a principle resembles Paul Lockhart’s vision of mathematics: an idea that at first may be questioned and refined, but at some point begins “talk back”, instructing its creator rather than being shaped by it.

I could have formulated the principle as verbs, not nouns or similar, but the principle itself demands a verb. I have chosen prefer, but I fear that may not be active enough; something closer to choose or emphasize verbs over nouns may more fitting. As the principle predicts, identifying a dichotomy and even choosing one side is easy compared to selecting the verb to encompass the process and relationship. This principle retains status as a draft, although unlike its predecessor it does not have the glaring flaw of subjective application. The verb (and preposition serving it) are still to be determined, and the possibility of cutting a new principle from whole cloth also remains open.

All of this without a discussion of the principle itself! Human language is endlessly versatile and adaptive, and therefore (in hindsight!) it is quite fitting that I use the terms of language itself. Of course the principle does not apply specifically to language, but any field that involves structures and the relationships between them, which is to say, any field at all. It can apply to essays, presentations, or works of art. Finding the verbs and nouns of a particular field is often easy, even if it is difficult to abstract the process. With that said, verbs are not always grammatically verbs; -ing and -tion nouns can be fine verbs for the purpose of the principle.

The verbs should be emphasized to your audience, but the setting will determine how you craft their experience. Most of the liberal arts require grappling with verbs directly; a good thesis is architected around a verb that relates otherwise disparate observations or schools of thought. By emphasizing the verbs, one communicates causal mechanisms, transformations, relationships, and differences across time, location, demographics, and other variables. The goal is not merely to show that the nouns differ (“the a had x but the b had y”), but why, what acted on them to cause the differences. Frequently the base material (often historical events or written works) are already known to your audience, and you need to contribute more than just a summary. You need to justify a distinction.

However, in the presence of detailed, substructured, and numeric nouns, it is often best to let them speak directly. Often the evidence itself is novel, such as a research finding, and you want to present it objectively. In such cases, more frequent in science and engineering, placing your audience’s focus on verbs requires that you place yours on presenting the nouns. The more nouns you have, the more ways they can relate to each other; the more detailed the nouns, the more nuanced those relationships can be. When the nouns are shown correctly, your audience will have a wide array of verbs available to them; Edward Tufte gives examples (Envisioning Information, 50):

select, edit, single out, structure, highlight, group, pair, merge, harmonize, synthesize, focus, organize, condense, reduce, boil down, choose, categorize, catalog, classify, list, abstract, scan, look into, idealize, isolate, discriminate, distinguish, screen, pigeonhole, pick over, sort, integrate, blend, inspect, filter, lump, skip, smooth, chunk, average, approximate, cluster, aggregate, outline, summarize, itemize, review, dip into, flip through, browse, glance into, leaf through, skim, refine, enumerate, glean, synopsize, winnow the wheat from the chaff, and separate the sheep from the goats

The ability to act in these ways is fragile.  Inferior works destroy verb possibilities (science and engineering) or never present them at all (liberal arts). Verbs are the casualties of PowerPoint bullets; nouns can often be picked out from the shrapnel but the connections between them are lost. But conversely, a focus on verbs promotes reason and the human intellect. Verbs manifest cognition and intelligence. Emphasizing verbs is a proxy and litmus test for cogent thought.

Infographics and Data Graphics

I’d like to set the record straight about two types of graphical documents floating around the internet. Most people don’t make a distinction between infographics and data graphics. Here are some of each – open them in new tabs and see if you can tell them apart.

No peeking!

No, really, stop reading and do it. I can wait.

Okay, had a look and made your categorizations? As I see it, dog food, energy, and job titles are infographics, and Chicago buildings, movie earnings, and gay rights are data graphics. Why? Here are some distinctions to look for, which will make much more sense now that you’ve seen some examples. Naturally these are generalizations and some documents will be hard to classify, but not as often as you might think.

Infographics emphasize typography, aesthetic color choice, and gratuitous illustration.
Data graphics are pictorially muted and focused; color is used to convey data.

Infographics have many small paragraphs of text communicate the information.
Data graphics are largely wordless except for labels and an explanation of the visual encoding.

In infographics, numeric data is scant, sparse, and piecemeal.
In data graphics, numeric data is plentiful, dense, and multivariate.

Infographics have many components that relate different datasets; sectioning is used.
Data graphics have single detailed image, or less commonly multiple windows into the same data.

An infographic is meant to be read through sequentially.
A data graphic is meant to be scrutinized for several minutes.

In infographics, the visual encoding of numeric information is either concrete (e.g. world map, human body), common (e.g. bar or pie charts), or nonexistent (e.g. tables).
In data graphics, the visual encoding is abstract, bespoke, and must be learned.

Infographics tell a story and have a message.
Data graphics show patterns and anomalies; readers form their own conclusions.

You may have heard the related term visualization – a data graphic is a visualization on steroids. (An infographic is a visualization on coffee and artificial sweetener.) A single bar, line, or pie chart is most likely a visualization but not a data graphic, unless it takes several minutes to absorb. However, visualizations and infographics are both generated automatically, usually by code. It should be fairly easy to add new data to a visualization or data graphic; not so for infographics.

If you look at sites like visual.ly which collects visualizations of all stripes, you’ll see that infographics far outnumber data graphics. Selection bias is partially at fault. Data graphics require large amounts of data that companies likely want to keep private. Infographics are far better suited to marketing and social campaigns, so they tend to be more visible. Some datasets are better suited to infographics than data graphics. However, even accounting for those facts, I think we have too many infographics and too few data graphics. This is a shame, because the two have fundamentally different worldviews.

An infographic is meant to persuade or inspire action. Infographics drive an argument or relate a story in a way that happens to use data, rather than allowing the user to infer more subtle and multifaceted meanings. A well-designed data graphic can be an encounter with the sublime. It is visceral, non-verbal, profound; a harmony of knowledge and wonder.

Infographics already have all the answers, and serve only to communicate them to the reader. A data graphic has no obvious answers, and in fact no obvious questions. It may seem that infographics convey knowledge, and data graphics convey only the scale of our ignorance, but in fact the opposite is true. An infographic offers shallow justifications and phony authority; it presents that facts as they are. (“Facts” as they “are”.) A data graphic does not foster any conclusion upon its reader, but at one level of remove, provides its readers with tools to draw conclusions. Pedagogically, infographics embrace the fundamentally flawed idea that learning is simply copying knowledge from one mind to another. Data graphics accept that learning is a process, which moves from mystery to complexity to familiarity to intuition. Epistemologically, infographics ask that knowledge be accepted on little to no evidence, while data graphics encourage using evidence to synthesize knowledge, with no prior conception of what this knowledge will be. It is akin to memorizing a fact about the world, or accepting the validity of the scientific method.

However, many of the design features that impart data graphics with these superior qualities can be exported back to infographics, with compelling results. Let’s take this example about ivory poaching. First off, it takes itself seriously: there’s no ostentatious typography and the colors are muted and harmonious. Second, subject matter is not a single unified dataset but multiple datasets that describe a unified subject matter. They are supplemented with non-numeric diagrams and illustrations, embracing their eclectic nature. Unlike most infographics, this specimen makes excellent use of layout to achieve density of information. Related pieces are placed in close proximity rather than relying on sections; the reader is free to explore in any order. This is what an infographic should be, or perhaps it’s worthy of a different and more dignified name, information graphic. It may even approach what Tufte calls “beautiful evidence”.

It’s also possible to implement a data graphic poorly. Usually this comes down to a poor choice of visual encoding, although criticism is somewhat subjective. Take this example of hurricanes since 1960. The circular arrangement is best used for months or other cyclical data. Time proceeds unintuitively counterclockwise. The strength of hurricanes is not depicted, only the number of them (presumably – the radial axis is not labeled!). The stacked bars make it difficult to compare hurricanes from particular regions. If one wants to compare the total number of hurricanes, one is again stymied by the polar layout. Finally, the legend is placed at the bottom, where it will be read last. Data graphics need to explain their encoding first; even better is to explain the encoding on the diagram itself rather than in a separate legend. For example, if the data were rendered as a line chart (in Cartesian coordinates), labels could be placed alongside the lines themselves. (Here is a proper data graphic on hurricane history.)

An infographic typically starts with a message to tell, but designers intent on honesty must allow the data to support their message. This is a leap of faith, that their message will survive first contact with the data. The ivory poaching information graphic never says that poaching is bad and should be stopped, in such simple words. Rather it guides us to that conclusion without us even realizing it. Detecting bias in such a document becomes much more difficult, but it also becomes much more persuasive (for sufficiently educated and skeptical readers). Similarly, poor data graphics obscure the data, either intentionally because they don’t support the predecided message, or unintentionally because of poor visual encoding. In information visualization, as in any field, we must be open to the hard process of understanding the truth, rather than blithely accepting what someone else wants us to believe.

I know which type of document I want to spend my life making.

Critical Complexity

Here’s a task for you: draw a circle radius three around the origin.

What system do you use? Well, you could use an intuitive system like Piaget’s turtle. Walk out three, turn ninety degrees, and then walk forward while turning inward. By identifying as a specific agent, you take advantage of having a brain that evolved to control a body. If it doesn’t seem intuitive, that’s because you’ve been trained to use other systems. Your familiarity is trumping what comes naturally, at least to children.

You’re probably thinking in Cartesian coordinates. You may even recall that x^2 + y^2 = 3^2 will give you the circle I asked for. But that’s only because you memorized it. Why this formula? It’s not obvious that it should be a circle. It doesn’t feel very circular, unless you fully understand the abstraction beneath it (in this case, the Pythagorean theorem) and how it applies to the situation.

Turtle geometry intuitively fits the human, but it’s limited and naive. Cartesian geometry accurately fits your monitor or graph paper, the technology, but it’s an awkward way to express circles. So let’s do something different. In polar coordinates, all we have to say is r=3 and we’re done. It’s not a compromise between the human and the technology, it’s an abstraction – doing something more elegant and concise than either native form. Human and technology alike  stretch to accommodate the new representation. Abstractions aren’t fuzzy and amorphous. Abstractions are crisp, and stacked on top of each other, like new shirts in a store.

We’ve invented notation that, for this problem, compresses the task as much as possible. The radius is specified; the fact that it’s a circle centered around the origin are implicit in the conventional meaning of r and the lack of other information. It’s been maximally compressed (related technical term: Kolmogorov complexity).

Compression is one of the best tools we have for fighting complexity. By definition, compression hides the meaningless while showing the meaningful. It’s a continuous spectrum, on which sits a point I’ll call critical complexity. Critical complexity is the threshold above which a significant abstraction infrastructure is necessary. But that definition doesn’t mean much to you — yet.

Think of knowledge as terrain. To get somewhere, we build roads, which in our metaphor are abstraction. Roads connect to each other, and take us to new places. It was trivial to abstract Cartesian coordinates into polar by means of conversions. This is like building a road, with one end connecting to the existing street grid and another ending somewhere new. It’s trivial to represent a circle in polar coordinates. This is what we do at the newly accessible location. We’ve broken a non-trivial problem into two trivial pieces – although it wasn’t a particularly hard problem, as otherwise we wouldn’t have been able to do that.

Delivering these words to your machine is a hard problem. You’re probably using a webbrowser, which is written in software code, which is running on digital electronics, which are derived from analog electronics obeying Maxwell’s equations, and so on. But the great thing about abstractions is that you only need to understand the topmost one. You can work in polar coordinates without converting back to Cartesian, and you can use a computer without obtaining multiple engineering degrees first. You can build your own network of roads about how to operate a computer, disconnected from your road network about physics.

Or perhaps not disconnected, but connected by a tunnel through the mountain of what you don’t understand. A tunnel is a way to bypass ignorance to learn about other things based on knowledge you don’t have, but don’t need. Of course, someone knows those things – they’ve laboriously built roads over the mountain so that you can cruise under it. These people, known as scientists and engineers, slice hard problems into many layers of smaller ones. A hard problem may have so many layers that, even if each is trivial on its own, they are non-trivial collectively. That said, some problems are easier than they look because our own sensemaking abstractions blind us.

If you want to write an analog clock in JavaScript, your best bet is to configure someone else’s framework. That is, you say you want a gray clockface and a red second hand, and the framework magically does it. The user, hardly a designer, is reduced to muttering incantations at a black box hoping the spell will work as expected. Inside the box is some 200 lines or more, most of it spent on things not at all related to the high-level description of an analog clock. The resulting clock is a cul-de-sac at the end of a tunnel, overlooking a precipice.

By contrast, the nascent Elm language provides a demo of the analog clock. Its eight lines of code effectively define the Kolmogorov complexity: each operation is significant. Almost every word or number defines part of the dynamic drawing in some way. To the programmer, the result is liberating. If you want to change the color of the clockface, you don’t have to ask the permission of a framework designer, you just do it. The abstractions implicit in Elm have pushed analog clocks under the critical complexity, which is the point above which you need to build a tunnel.

There’s still a tunnel involved, though: the compiler written in Haskell that converts Elm to JavaScript. But this tunnel is already behind us when we set out to make an analog clock. Moreover, this tunnel leads to open terrain where we can build many roads and reach many places, rather than the single destination offered by the framework. What’s important isn’t the avoidance of tunnels, but of tunnels to nowhere. Each abstraction should have a purpose, which is to open up new terrain where abstractions are not needed, because getting around is trivial.

However, the notion of what’s trivial is subjective. It’s not always clear what’s a road and what’s a tunnel. Familiarity certainly makes any abstraction seem simpler. Though we gain a better grasp on an abstraction by becoming familiar with it, we also lose sight of the underlying objective nature of abstractions: some are more intuitive or more powerful than others. Familiarity can be born both by understanding where an idea comes from and how it relates to others, and by practicing using the idea on its own. I suspect that better than either one is both together. With familiarity comes automaticity, where we can quickly answer questions by relying on intuition, because we’ve seen them or something similar before. But depending on the abstraction, familiarity can mean never discarding naïveté (turtle), contorting into awkward mental poses (Cartesian) – or achieving something truly elegant and powerful.

It’s tempting to decry weak or crippling abstractions, but they too serve a purpose. Like the fancy algorithms that are slow when n is small, fancy abstractions are unnecessary for simple problems. Yes, one should practice using them on simple problems as to have familiarity when moving into hard ones. But before that, one needs to see for oneself the morass weak or inappropriately-chosen abstractions create. Powerful abstractions, I am increasingly convinced, cannot be be constructed on virgin mental terrain. For each individual, they must emerge from the ashes of an inferior system that provides both experience and motivation to build something stronger.