Archive for the ‘Epistemology’ Category

The Fallacy of the Right Answer

The Fallacy of the Right Answer is everywhere. With regard to education technology, it dates back at least to B. F. Skinner.

Skinner saw education as a series of definite, discrete, linear steps along a fixed, straight road; today this is called a curriculum. He referred to a child who guesses the password as “being right”. Khan Academy uses similar gatekeeping techniques in its exercises, limiting the context. Students must meet one criterion before proceeding to the next, being spoon-fed knowledge and seeing through a peephole not unlike Skinner’s machines. Furthermore, these steps are claimed to be objective, universal and emotionless. Paul Lockhart calls this the “ladder myth”, the conception of mathematics as a clear hierarchy of dependencies. But the learning hierarchy is tangled, replete with strange loops.

It is fallacious yet popular to think that a concept, once learned, is never forgotten. But most educated adults I know (including myself) find value in rereading old material, and make connections back to what they already have learned. What was once understood narrowly or mechanically can, when revisited, be understood in a larger or more abstract context, or with new cognitive tools. There are two words for “to know” in French. Savoir means to know a fact, while connaître means to be familiar with, comfortable with, to know a person. The Right Answer loses sight of the importance, even the possibility, of knowing a piece of information like an old friend, to find pleasure in knowing, to know for knowing’s sake, because you want to. Linear teaching is workable for teaching competencies but not for teaching insights, things like why those mechanical methods work, how they can be extended, and how they can fail.

Symbol manipulation according to fixed rules is not cognition but computation. The learners take on the properties of the machines, and those who programmed them. As Papert observed, the computer programs the child, not the other way around (as he prefers). Much of this mechanical emphasis is driven by the SAT and other unreasonable standardized tests which are nothing more than timed high-stakes guessing games. They are gatekeepers to the promised land of College. Proponents of education reform frequently cite distinct age-based grades as a legacy of the “factory line model” dating back to the industrial revolution. This model permeates not only how we raise children, but more importantly, what we raise them to do, what we consider necessary of an educated adult. Raising children to work machinery is the same as, or has given way to, raising them to work like machinery. Tests like the SAT emphasize that we should do reproducible de-individualized work, compared against a clear, ideal, unachievable standard. Putting this methodology online does not constitute a revolution or disruption.



Futurists have gone so far as to see the brain itself as programmable, in some mysteriously objective sense. At some point, Nicholas Negroponte veered off his illustrious decades-long path. Despite collaborating with Seymour Papert at the Media Lab, his recent work has been dropping tablets into rural villages. Instant education, just add internet! It’s great that the kids are teaching themselves, and have some autonomy, but who designed the apps they play with? What sort of biases and fallacies do they harbor? Do African children learning the ABCs qualify as cultural imperialism? His prediction for the next thirty years is even more troublesome: that we’ll acquire knowledge by ingesting it. Shakespeare will be encoded into some nano-molecular device that works its way through the blood-brain barrier, and suddenly: “I know King Lear!”. Even if we could isolate the exact neurobiological processes that constitute reading the Bard, we all understand Shakespeare in different ways. All minds are unique, and therefore all brains are unique. Meanwhile, our eyes have spent a few hundred million years of evolutionary time adapting to carry information from the outside world into our mind at the speed of an ethernet connection. Knowledge intake is limited not by perception but by cognition.

Tufte says, to simplify, add context. Confusion is not a property of information but of how it is displayed. He said these things in the context of information graphics but they apply to education as well. We are so concerned with information overload that we forget information underload, where our brain is starved for detail and context. It is not any particular fact, but the connections between them, that constitute knowledge.  The fallacy of reductionism is to insist that every detail matters: learn these things and then you are educated! The fallacy of holism is to say that no details matter: let’s just export amorphous nebulous college-ness and call it universal education! Bret Victor imagines how we could use technology to move from a contrived, narrow problem into a deeper understanding about generalized, abstract notions, much as real mathematicians do. He also presents a mental model for working on a difficult problem:

I’m trying to build a jigsaw puzzle. I wish I could show you what it will be, but the picture isn’t on the box. But I can show you some of the pieces… If you are building a different puzzle, it’s possible these pieces won’t mean much to you. You might not have a spot for them to fit, or you might not yet. On the other hand, maybe some of these are just the pieces you’ve been looking for.

One concern with Skinner’s teaching machines and their modern-day counterparts is that they isolate each student and cut off human interaction. We learn from each other, and many of the things that we learn fall outside of the curriculum ladder. Learning to share becomes working on a team; show-and-tell becomes leadership. Years later, in college, many of the most valuable lessons are unplanned, a result of meeting a person with very different ideas, or hearing exactly what you needed to at that moment. I found that college exposed me to brilliant people, and I could watch them analyze and discuss a problem. The methodology was much more valuable than the answer it happened to yield.

The hallmark of an intellectual is to create daily what has never existed before. This can be an engineer’s workpiece, a programmer’s software, a writer’s novel, a researcher’s paper, or an artist’s sculpture. None of these can be evaluated by comparing them to a correct answer, because the correct answer is not known, or can’t even exist. The creative intellectual must have something to say and know how to say it; ideas and execution must both be present. The bits and pieces of a curriculum can make for a good technician (a term I’ve heard applied to a poet capable of choosing the exact word). It’s not so much that “schools kill creativity” as that they replace the desire to create with the ability to create. Ideally schools would nurture and refine the former (assuming something-to-say is mostly innate) while instructing the latter (assuming saying-it-well is mostly taught).

What would a society look like in which everyone was this kind of intellectual? If everyone is writing and drawing, who will take out the trash, harvest food, etc? Huxley says all Alphas and no Epsilons doesn’t work. Like the American South adjusting to an economy without slaves, elevating human dignity leaves us with the question of who will do the undignified work. As much as we say that every child deserves an education, I think that the creative intellectual will remain in an elite minority for years to come, with society continuing to run on the physical labor of the uneducated. If civilization ever truly extends education to all, then either we will need to find some equitable way of sharing the dirty work (akin to utopian socialist communes), or we’ll invent highly advanced robots. Otherwise, we may need to ask ourselves a very unsettling question: can we really afford to extend education to all, given the importance of unskilled labor to keep society running?


If you liked this post, you should go read everything Audrey Watters has written. She has my thanks.


Prefer Verbs to Nouns

My principle, v0.2

Prefer verbs to nouns.

When Bret Victor introduced the concept of a principle, he said a good principle can be applied “in a fairly objective way”. This is the biggest problem with my first draft, which took several sentences to define what a powerful way of thinking was. A principle must be general enough to apply to many situations, but also able to operationalize to find meaning in any specific situation. Devising or crafting a principle requires inductive reasoning (specific to general), but applying it demands deductive reasoning (general to specific). Forging a principle resembles Paul Lockhart’s vision of mathematics: an idea that at first may be questioned and refined, but at some point begins to “talk back”, instructing its creator rather than being shaped by it.

I could have formulated the principle as verbs, not nouns or similar, but the principle itself demands a verb. I have chosen prefer, but I fear that may not be active enough; something closer to choose or emphasize verbs over nouns may be more fitting. As the principle predicts, identifying a dichotomy and even choosing one side is easy compared to selecting the verb to encompass the process and relationship. This principle retains status as a draft, although unlike its predecessor it does not have the glaring flaw of subjective application. The verb (and preposition serving it) are still to be determined, and the possibility of cutting a new principle from whole cloth also remains open.

All of this without a discussion of the principle itself! Human language is endlessly versatile and adaptive, and therefore (in hindsight!) it is quite fitting that I use the terms of language itself. Of course the principle does not apply specifically to language, but any field that involves structures and the relationships between them, which is to say, any field at all. It can apply to essays, presentations, or works of art. Finding the verbs and nouns of a particular field is often easy, even if it is difficult to abstract the process. With that said, verbs are not always grammatically verbs; -ing and -tion nouns can be fine verbs for the purpose of the principle.

The verbs should be emphasized to your audience, but the setting will determine how you craft their experience. Most of the liberal arts require grappling with verbs directly; a good thesis is architected around a verb that relates otherwise disparate observations or schools of thought. By emphasizing the verbs, one communicates causal mechanisms, transformations, relationships, and differences across time, location, demographics, and other variables. The goal is not merely to show that the nouns differ (“the a had x but the b had y”), but why, what acted on them to cause the differences. Frequently the base material (often historical events or written works) is already known to your audience, and you need to contribute more than just a summary. You need to justify a distinction.

However, in the presence of detailed, substructured, and numeric nouns, it is often best to let them speak directly. Often the evidence itself is novel, such as a research finding, and you want to present it objectively. In such cases, more frequent in science and engineering, placing your audience’s focus on verbs requires that you place yours on presenting the nouns. The more nouns you have, the more ways they can relate to each other; the more detailed the nouns, the more nuanced those relationships can be. When the nouns are shown correctly, your audience will have a wide array of verbs available to them; Edward Tufte gives examples (Envisioning Information, 50):

select, edit, single out, structure, highlight, group, pair, merge, harmonize, synthesize, focus, organize, condense, reduce, boil down, choose, categorize, catalog, classify, list, abstract, scan, look into, idealize, isolate, discriminate, distinguish, screen, pigeonhole, pick over, sort, integrate, blend, inspect, filter, lump, skip, smooth, chunk, average, approximate, cluster, aggregate, outline, summarize, itemize, review, dip into, flip through, browse, glance into, leaf through, skim, refine, enumerate, glean, synopsize, winnow the wheat from the chaff, and separate the sheep from the goats

The ability to act in these ways is fragile. Inferior works destroy verb possibilities (science and engineering) or never present them at all (liberal arts). Verbs are the casualties of PowerPoint bullets; nouns can often be picked out from the shrapnel but the connections between them are lost. But conversely, a focus on verbs promotes reason and the human intellect. Verbs manifest cognition and intelligence. Emphasizing verbs is a proxy and litmus test for cogent thought.

Infographics and Data Graphics

I’d like to set the record straight about two types of graphical documents floating around the internet. Most people don’t make a distinction between infographics and data graphics. Here are some of each – open them in new tabs and see if you can tell them apart.

No peeking!

No, really, stop reading and do it. I can wait.

Okay, had a look and made your categorizations? As I see it, dog food, energy, and job titles are infographics, and Chicago buildings, movie earnings, and gay rights are data graphics. Why? Here are some distinctions to look for, which will make much more sense now that you’ve seen some examples. Naturally these are generalizations and some documents will be hard to classify, but not as often as you might think.

Infographics emphasize typography, aesthetic color choice, and gratuitous illustration.
Data graphics are pictorially muted and focused; color is used to convey data.

Infographics have many small paragraphs of text that communicate the information.
Data graphics are largely wordless except for labels and an explanation of the visual encoding.

In infographics, numeric data is scant, sparse, and piecemeal.
In data graphics, numeric data is plentiful, dense, and multivariate.

Infographics have many components that relate different datasets; sectioning is used.
Data graphics have a single detailed image, or less commonly multiple windows into the same data.

An infographic is meant to be read through sequentially.
A data graphic is meant to be scrutinized for several minutes.

In infographics, the visual encoding of numeric information is either concrete (e.g. world map, human body), common (e.g. bar or pie charts), or nonexistent (e.g. tables).
In data graphics, the visual encoding is abstract, bespoke, and must be learned.

Infographics tell a story and have a message.
Data graphics show patterns and anomalies; readers form their own conclusions.

You may have heard the related term visualization – a data graphic is a visualization on steroids. (An infographic is a visualization on coffee and artificial sweetener.) A single bar, line, or pie chart is most likely a visualization but not a data graphic, unless it takes several minutes to absorb. However, visualizations and data graphics are both generated automatically, usually by code. It should be fairly easy to add new data to a visualization or data graphic; not so for infographics.

If you look at sites that collect visualizations of all stripes, you’ll see that infographics far outnumber data graphics. Selection bias is partially at fault. Data graphics require large amounts of data that companies likely want to keep private. Infographics are far better suited to marketing and social campaigns, so they tend to be more visible. Some datasets are better suited to infographics than data graphics. However, even accounting for those facts, I think we have too many infographics and too few data graphics. This is a shame, because the two have fundamentally different worldviews.

An infographic is meant to persuade or inspire action. Infographics drive an argument or relate a story in a way that happens to use data, rather than allowing the user to infer more subtle and multifaceted meanings. A well-designed data graphic can be an encounter with the sublime. It is visceral, non-verbal, profound; a harmony of knowledge and wonder.

Infographics already have all the answers, and serve only to communicate them to the reader. A data graphic has no obvious answers, and in fact no obvious questions. It may seem that infographics convey knowledge, and data graphics convey only the scale of our ignorance, but in fact the opposite is true. An infographic offers shallow justifications and phony authority; it presents the facts as they are. (“Facts” as they “are”.) A data graphic does not foist any conclusion upon its reader, but at one level of remove, provides its readers with tools to draw conclusions. Pedagogically, infographics embrace the fundamentally flawed idea that learning is simply copying knowledge from one mind to another. Data graphics accept that learning is a process, which moves from mystery to complexity to familiarity to intuition. Epistemologically, infographics ask that knowledge be accepted on little to no evidence, while data graphics encourage using evidence to synthesize knowledge, with no prior conception of what this knowledge will be. The difference is akin to that between memorizing a fact about the world and accepting the validity of the scientific method.

However, many of the design features that impart data graphics with these superior qualities can be exported back to infographics, with compelling results. Let’s take this example about ivory poaching. First off, it takes itself seriously: there’s no ostentatious typography and the colors are muted and harmonious. Second, the subject matter is not a single unified dataset but multiple datasets that describe a unified subject matter. They are supplemented with non-numeric diagrams and illustrations, embracing their eclectic nature. Unlike most infographics, this specimen makes excellent use of layout to achieve density of information. Related pieces are placed in close proximity rather than relying on sections; the reader is free to explore in any order. This is what an infographic should be, or perhaps it’s worthy of a different and more dignified name, information graphic. It may even approach what Tufte calls “beautiful evidence”.

It’s also possible to implement a data graphic poorly. Usually this comes down to a poor choice of visual encoding, although criticism is somewhat subjective. Take this example of hurricanes since 1960. The circular arrangement is best used for months or other cyclical data. Time proceeds unintuitively counterclockwise. The strength of hurricanes is not depicted, only the number of them (presumably – the radial axis is not labeled!). The stacked bars make it difficult to compare hurricanes from particular regions. If one wants to compare the total number of hurricanes, one is again stymied by the polar layout. Finally, the legend is placed at the bottom, where it will be read last. Data graphics need to explain their encoding first; even better is to explain the encoding on the diagram itself rather than in a separate legend. For example, if the data were rendered as a line chart (in Cartesian coordinates), labels could be placed alongside the lines themselves. (Here is a proper data graphic on hurricane history.)

An infographic typically starts with a message to tell, but designers intent on honesty must allow the data to support their message. This is a leap of faith, that their message will survive first contact with the data. The ivory poaching information graphic never says that poaching is bad and should be stopped, in such simple words. Rather it guides us to that conclusion without us even realizing it. Detecting bias in such a document becomes much more difficult, but it also becomes much more persuasive (for sufficiently educated and skeptical readers). Similarly, poor data graphics obscure the data, either intentionally because they don’t support the predecided message, or unintentionally because of poor visual encoding. In information visualization, as in any field, we must be open to the hard process of understanding the truth, rather than blithely accepting what someone else wants us to believe.

I know which type of document I want to spend my life making.

Critical Complexity

Here’s a task for you: draw a circle radius three around the origin.

What system do you use? Well, you could use an intuitive system like Papert’s turtle. Walk out three, turn ninety degrees, and then walk forward while turning inward. By identifying as a specific agent, you take advantage of having a brain that evolved to control a body. If it doesn’t seem intuitive, that’s because you’ve been trained to use other systems. Your familiarity is trumping what comes naturally, at least to children.
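The turtle instructions can be sketched in a few lines of Python. This is only a minimal simulation, not Logo itself: the function name `turtle_circle`, the step count, and the choice to split each turn into two halves around the step (so the path stays on the circle instead of drifting outward) are assumptions made for illustration.

```python
import math

def turtle_circle(radius=3.0, steps=360):
    """Trace a circle the turtle's way: walk forward while turning inward."""
    x, y = radius, 0.0           # "walk out three" along the x-axis
    heading = math.pi / 2        # "turn ninety degrees" to face along the circle
    turn = 2 * math.pi / steps   # total turning over one lap is a full turn
    # Assumed stride: the chord between neighbouring points on the circle.
    stride = 2 * radius * math.sin(turn / 2)
    path = [(x, y)]
    for _ in range(steps):
        heading += turn / 2      # turn inward a little...
        x += stride * math.cos(heading)
        y += stride * math.sin(heading)
        heading += turn / 2      # ...and finish the turn after the step
        path.append((x, y))
    return path

path = turtle_circle()
# Distance from the origin at every recorded position.
distances = [math.hypot(px, py) for px, py in path]
```

Every entry of `distances` stays at three up to rounding error, even though the turtle never mentions x, y, or an equation while walking.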

You’re probably thinking in Cartesian coordinates. You may even recall that x^2 + y^2 = 3^2 will give you the circle I asked for. But that’s only because you memorized it. Why this formula? It’s not obvious that it should be a circle. It doesn’t feel very circular, unless you fully understand the abstraction beneath it (in this case, the Pythagorean theorem) and how it applies to the situation.

Turtle geometry intuitively fits the human, but it’s limited and naive. Cartesian geometry accurately fits your monitor or graph paper, the technology, but it’s an awkward way to express circles. So let’s do something different. In polar coordinates, all we have to say is r=3 and we’re done. It’s not a compromise between the human and the technology, it’s an abstraction – doing something more elegant and concise than either native form. Human and technology alike  stretch to accommodate the new representation. Abstractions aren’t fuzzy and amorphous. Abstractions are crisp, and stacked on top of each other, like new shirts in a store.
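To see that r=3 and x^2 + y^2 = 3^2 describe the same curve, here is a small Python sketch. The helper names `polar_to_cartesian` and `circle_points` are hypothetical, and twelve sample points is an arbitrary choice.

```python
import math

def polar_to_cartesian(r, theta):
    """Convert polar coordinates (r, theta in radians) to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def circle_points(r, count=12):
    """Sample `count` evenly spaced points on the polar curve r = constant."""
    return [polar_to_cartesian(r, 2 * math.pi * k / count) for k in range(count)]

points = circle_points(3)
# Each sampled point also satisfies the Cartesian equation x^2 + y^2 = 3^2.
on_circle = all(math.isclose(x * x + y * y, 9.0) for x, y in points)
```

The polar description is the single argument `3`; the Cartesian equation only reappears when we check the result.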

We’ve invented notation that, for this problem, compresses the task as much as possible. The radius is specified; the fact that it’s a circle centered around the origin is implicit in the conventional meaning of r and the lack of other information. It’s been maximally compressed (related technical term: Kolmogorov complexity).

Compression is one of the best tools we have for fighting complexity. By definition, compression hides the meaningless while showing the meaningful. It’s a continuous spectrum, on which sits a point I’ll call critical complexity. Critical complexity is the threshold above which a significant abstraction infrastructure is necessary. But that definition doesn’t mean much to you — yet.

Think of knowledge as terrain. To get somewhere, we build roads, which in our metaphor are abstractions. Roads connect to each other, and take us to new places. It was trivial to abstract Cartesian coordinates into polar by means of conversions. This is like building a road, with one end connecting to the existing street grid and another ending somewhere new. It’s trivial to represent a circle in polar coordinates. This is what we do at the newly accessible location. We’ve broken a non-trivial problem into two trivial pieces – although it wasn’t a particularly hard problem, as otherwise we wouldn’t have been able to do that.

Delivering these words to your machine is a hard problem. You’re probably using a web browser, which is written in software code, which is running on digital electronics, which are derived from analog electronics obeying Maxwell’s equations, and so on. But the great thing about abstractions is that you only need to understand the topmost one. You can work in polar coordinates without converting back to Cartesian, and you can use a computer without obtaining multiple engineering degrees first. You can build your own network of roads about how to operate a computer, disconnected from your road network about physics.

Or perhaps not disconnected, but connected by a tunnel through the mountain of what you don’t understand. A tunnel is a way to bypass ignorance to learn about other things based on knowledge you don’t have, but don’t need. Of course, someone knows those things – they’ve laboriously built roads over the mountain so that you can cruise under it. These people, known as scientists and engineers, slice hard problems into many layers of smaller ones. A hard problem may have so many layers that, even if each is trivial on its own, they are non-trivial collectively. That said, some problems are easier than they look because our own sensemaking abstractions blind us.

If you want to write an analog clock in JavaScript, your best bet is to configure someone else’s framework. That is, you say you want a gray clockface and a red second hand, and the framework magically does it. The user, hardly a designer, is reduced to muttering incantations at a black box hoping the spell will work as expected. Inside the box is some 200 lines or more, most of it spent on things not at all related to the high-level description of an analog clock. The resulting clock is a cul-de-sac at the end of a tunnel, overlooking a precipice.

By contrast, the nascent Elm language provides a demo of the analog clock. Its eight lines of code effectively define the Kolmogorov complexity: each operation is significant. Almost every word or number defines part of the dynamic drawing in some way. To the programmer, the result is liberating. If you want to change the color of the clockface, you don’t have to ask the permission of a framework designer, you just do it. The abstractions implicit in Elm have pushed analog clocks under the critical complexity, which is the point above which you need to build a tunnel.

There’s still a tunnel involved, though: the compiler written in Haskell that converts Elm to JavaScript. But this tunnel is already behind us when we set out to make an analog clock. Moreover, this tunnel leads to open terrain where we can build many roads and reach many places, rather than the single destination offered by the framework. What’s important isn’t the avoidance of tunnels, but of tunnels to nowhere. Each abstraction should have a purpose, which is to open up new terrain where abstractions are not needed, because getting around is trivial.

However, the notion of what’s trivial is subjective. It’s not always clear what’s a road and what’s a tunnel. Familiarity certainly makes any abstraction seem simpler. Though we gain a better grasp on an abstraction by becoming familiar with it, we also lose sight of the underlying objective nature of abstractions: some are more intuitive or more powerful than others. Familiarity can be born both by understanding where an idea comes from and how it relates to others, and by practicing using the idea on its own. I suspect that better than either one is both together. With familiarity comes automaticity, where we can quickly answer questions by relying on intuition, because we’ve seen them or something similar before. But depending on the abstraction, familiarity can mean never discarding naïveté (turtle), contorting into awkward mental poses (Cartesian) – or achieving something truly elegant and powerful.

It’s tempting to decry weak or crippling abstractions, but they too serve a purpose. Like the fancy algorithms that are slow when n is small, fancy abstractions are unnecessary for simple problems. Yes, one should practice using them on simple problems so as to have familiarity when moving into hard ones. But before that, one needs to see for oneself the morass weak or inappropriately-chosen abstractions create. Powerful abstractions, I am increasingly convinced, cannot be constructed on virgin mental terrain. For each individual, they must emerge from the ashes of an inferior system that provides both experience and motivation to build something stronger.

The Top 5 Things Done Wrong in Math Class

Sorry to jump on the top-n list bandwagon, as Vi Hart deliciously parodies, but that’s just how this one shakes out. Some of the reasons why these things are done wrong are pretty advanced, but if you’re a high school student who stumbled upon this blog, please stay and read. Know that it’s okay that you won’t get everything.

All of these gripes stem from the same source: they obfuscate what ought to be clear and profound ideas. They’re why math is hard. Like a smudge on a telescope lens, these practices impair the tool used to explore the world beyond us.

EDIT: This list focuses on notation and naming. There are other “things” done wrong in math class that any good teacher will agonize over with far more subtlety and care than this or any listicle.

5. Function Composition Notation

Specifically f \circ g, which is the same as g(f(x)). No wait, f(g(x)). Probably. This notation comes with a built-in “gotcha”, which requires mechanical memorization apart from the concept of function composition itself. The only challenge is to translate between conventions. In this case, nested parentheses are ready-made to represent composition without requiring any new mechanistic knowledge. They exploit the overloading of parentheses for both order of operations and function arguments; just work outwards as you’ve always done. We should not invent new symbols to describe something adequately described by the old ones.
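A short Python sketch makes the order issue concrete. The names `compose`, `add_one`, and `double` are hypothetical, and `compose` assumes the usual convention that f \circ g applies g first.

```python
def compose(f, g):
    """Return f after g: apply g first, then f (the usual reading of f ∘ g)."""
    return lambda x: f(g(x))

def add_one(x):
    return x + 1

def double(x):
    return 2 * x

f_after_g = compose(add_one, double)   # add_one(double(x))
g_after_f = compose(double, add_one)   # double(add_one(x))

# Same input, different order, different results.
results = (f_after_g(3), g_after_f(3))  # (7, 8)
```

With nested parentheses, `add_one(double(3))` says the same thing as `f_after_g(3)` without any convention to remember.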

Nested parentheses lend themselves to function iteration, f(f(x)). These functions are described using exponents, which play nice with the parens to make the critical distinction between f^2(x) = f(f(x)) and f(x)^2 = (f(x))^2 = f(x)f(x). This distinction becomes critical when we say arcsine aka \sin^{-1} and cosecant aka \frac {1}{\sin} are both the inverses of sine. Of course, things get confusing again when we drop the parens and get \sin^2x = (\sin x)^2 because \sin x^2 = \sin (x^2). This notation also supports  first-class functions: once we define a doubling function d(x) = 2x, what is meant by d(f)? I’d much rather explore this idea, which is “integral” to calculus (and functional programming), than quibble over a symbol.
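The distinctions in this paragraph can be tried out directly in Python. This is a sketch under stated assumptions: `iterate` is a hypothetical helper, and treating d(f) as "the function that doubles whatever f returns" is only one possible reading of the open question, not a settled answer.

```python
import math

def iterate(f, n):
    """Return f^n in the iteration sense: f applied n times."""
    def iterated(x):
        for _ in range(n):
            x = f(x)
        return x
    return iterated

def add_one(x):
    return x + 1

f_squared = iterate(add_one, 2)          # f^2(x) = f(f(x))
iterated_value = f_squared(3)            # 5
squared_value = add_one(3) ** 2          # f(x)^2 = 16

# The two "inverses" of sine are different things entirely.
arcsine = math.asin(0.5)                 # undoes sine; equals pi / 6
cosecant = 1 / math.sin(0.5)             # reciprocal of sine

def d(arg):
    """Doubling that accepts a number or a function (assumed reading of d(f))."""
    if callable(arg):
        return lambda x: 2 * arg(x)
    return 2 * arg

doubled_function = d(add_one)            # x -> 2 * (x + 1)
```

Here `doubled_function(3)` gives 8, while `d(3)` gives 6; the same name acts on numbers and on functions.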

4. The Word “Quadratic”

I’m putting “quadratic” where it belongs: number four. The prefix quadri- means four in every other context, dating back to Latin. (The synonym tetra- is Greek.) So why is x^2 called “quadratic”? Because of a quadrilateral, literally a four-sided figure. But the point isn’t the number of sides, it’s the number of dimensions. And dimensionality is tightly coupled with the notion of the right angle. And since x equals itself, then we’re dealing with not just an arbitrary quadrilateral but a right-angled one with equal sides, otherwise known as a square. So just as x^3 is cubic growth, x^2 should be called squared growth. No need for any fancy new adjectives like “biatic”, just start using “square”. (Adverb: squarely.) It’s really easy to stop saying four when you mean two.

3.14 Pi

Unfortunately, there is a case when we have to invent a new term and get people to use it. We need to replace pi, because pi isn’t the circle constant. It’s the semicircle constant.

The thrust of the argument is that circles are defined by their radius, not their diameter, so the circle constant should be defined off the radius as well. Enter tau, \tau = \frac{C}{r}. Measuring radians in tau simplifies the unit circle tremendously. A fraction times tau is just the fraction of the total distance traveled around the circle. This wasn’t obvious with pi because the factor of 2 canceled half the time, producing \frac{5}{4}\pi instead of \frac{5}{8}\tau.
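Python’s standard library already ships the constant as `math.tau` (Python 3.6 and later), so the "fraction of a turn" reading can be checked directly. The helper name `turns_to_radians` is hypothetical.

```python
import math

def turns_to_radians(fraction_of_turn):
    """Convert a fraction of a full turn into radians."""
    return fraction_of_turn * math.tau

# Five eighths of the way around the circle.
angle = turns_to_radians(5 / 8)

# The same angle written with pi hides the fraction behind a factor of two.
same_angle_in_pi = (5 / 4) * math.pi
```

Both expressions give the same number; only the tau version shows the 5/8 that describes where you are on the circle.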

If you’ve never heard of tau before, I highly recommend you read Michael Hartl’s Tau Manifesto. But my personal favorite argument comes from integrating in spherical space. Just looking at the integral bounds for a sphere radius R:

\int_{\theta=0}^{2\pi} \int_{\phi=0}^{\pi} \int_{\rho=0}^{R}

It’s immediately clear that getting rid of the factor of two for the \theta (theta) bound will introduce a factor of one-half for the \phi (phi) bound:

\int_{\theta=0}^{\tau} \int_{\phi=0}^{\frac{\tau}{2}} \int_{\rho=0}^{R}

However, theta goes all the way around the circle (think of a complete loop on the equator). Phi only goes halfway (think north pole to south pole). The half emphasizes that phi, not theta, is the weird one. It’s not about reducing the number of operations, it’s about hiding the meaningless and showing the meaningful.
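For the curious, here is the volume of the sphere worked through with the tau bounds. The integrand \rho^2 \sin\phi is the standard spherical volume element; it is my addition, since only the bounds are shown above.

```latex
V = \int_{\theta=0}^{\tau} \int_{\phi=0}^{\frac{\tau}{2}} \int_{\rho=0}^{R} \rho^2 \sin\phi \, d\rho \, d\phi \, d\theta
  = \left( \int_{0}^{\tau} d\theta \right)
    \left( \int_{0}^{\frac{\tau}{2}} \sin\phi \, d\phi \right)
    \left( \int_{0}^{R} \rho^2 \, d\rho \right)
  = \tau \cdot 2 \cdot \frac{R^3}{3}
  = \frac{2}{3} \tau R^3
  = \frac{4}{3} \pi R^3
```

The full turn contributes \tau, the pole-to-pole half turn contributes 2, and the radius contributes \frac{R^3}{3}.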

2. Complex Numbers

This is a big one. My high school teacher introduced imaginary numbers as, well, imaginary. “Let’s just pretend negative one has a square root and see what happens.” This method is backwards. If you’re working with polar vectors, you’re working with complex numbers, whether you know it or not.

Complex addition is exactly the same as adding vectors in the xy plane. It’s also the same as just adding two numbers and then another two numbers, and then writing i afterwards. In this case, you might as well just work in R^2. (Oh hey, another use of exponents.) You can use the unit vectors \hat{x} and \hat{y}, rather than i and j which will get mixed up with the imaginary unit, and besides, you defined that hat to mean a unit vector. Use the notation you define, or don’t define it.

Complex numbers are natively polar. Every high school student (and teacher) should read and play through Steven Wittens’ jaw-dropping exploration of rotating vectors. (Again students, the point isn’t to understand it all, the point is to have your mind blown.) Once we’ve defined complex multiplication – angles add, lengths multiply – then 1 \angle 90^{\circ} falls out as the square root of 1 \angle 180^{\circ} completely naturally. You can’t help but define it. And moreover, (1 \angle -90^{\circ})^2 goes around the other way, and its alternate representation (1 \angle 270^{\circ})^2 goes around twice, but they all wind up at negative one. Complex numbers aren’t arbitrary and forced; they’re a natural consequence of simple rules.
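If you want to poke at the "angles add, lengths multiply" rule yourself, here is a minimal sketch using Python’s built-in complex numbers. The helpers from_polar and multiply_polar are names I made up for this example; cmath.rect and cmath.isclose are part of the standard library.

```python
import cmath
import math


def from_polar(length, degrees):
    """Build a complex number from a length and an angle in degrees."""
    return cmath.rect(length, math.radians(degrees))


def multiply_polar(a, b):
    """Multiply two (length, degrees) pairs: lengths multiply, angles add."""
    return (a[0] * b[0], a[1] + b[1])


quarter_turn = (1, 90)

# Squaring a quarter turn by the polar rule lands on a half turn.
length, degrees = multiply_polar(quarter_turn, quarter_turn)

# The built-in complex arithmetic agrees, up to floating-point rounding.
i_squared = from_polar(1, 90) ** 2
other_way = from_polar(1, -90) ** 2
long_way = from_polar(1, 270) ** 2

for value in (i_squared, other_way, long_way):
    print(cmath.isclose(value, -1))
```

All three squares agree with negative one to within rounding error, even though the polar rule reports different total angles for each.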

Even complex conjugates work better with angles. Instead of an algebraic argument and a formula to memorize, we can geometrically see that we need to add an angle that brings us back to horizontal, which is just the negative of the angle we already have. This is mathematically equivalent to changing the sign on the imaginary component of the vector, but cognitively it’s very different. You can, with clarity and precision, see what you are doing in a way numerals can never express.
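The two views of the conjugate can be checked against each other in a few lines. This is only a sketch; the sample number (length 2, angle 30 degrees) is an arbitrary choice of mine.

```python
import cmath
import math

# An arbitrary example: length 2 at 30 degrees.
z = cmath.rect(2, math.radians(30))

# The algebraic view: flip the sign of the imaginary component.
conjugate = z.conjugate()

# The geometric view: same length, negated angle.
length_z, angle_z = cmath.polar(z)
length_c, angle_c = cmath.polar(conjugate)

# Multiplying adds the angles, which cancel and bring us back to horizontal.
product = z * conjugate

print(math.degrees(angle_z), math.degrees(angle_c), product)
```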

1. Boxplots

Boxplots make the top of the list because they’re taught at a young age and never challenged. They are brought up as a standard way to visualize data, when the boxplot was a relatively recent invention of one statistician, John Tukey. Edward Tufte has proposed variants which dramatically reduce the ink on the page. They are much easier to draw, which is important when you want to convince children that math isn’t about meticulous marks on the page. They have no horizontal component, so in addition to being more compact, they also do not encode non-information in their width.


Boxplots infuriate me because they indoctrinate the idea that there is one way to do it, and that it is not up for discussion. More time is spent on where to draw the lines than why quartiles are important, or how to read what a boxplot says about the data. Boxplots epitomize math as a recipe book, where your ideas are invalid by default and improvisation is prohibited. Nothing could be further from the truth. Moreover, boxplots slap a one-size-fits-all visualization on the data without bothering to ask what other things we could do with them. Tukey’s plots don’t just obscure the data, they obscure data science.
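For reference, the numbers a boxplot encodes can be computed without drawing anything. Below is a minimal sketch with made-up scores; five_number_summary is my own helper name, while statistics.quantiles is in the standard library from Python 3.8. Note that I had to pick a quartile convention (method="inclusive"), and the other built-in option can give different values on the same data, so even here there is more than one way to do it.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Invented data, purely for illustration.
scores = [2, 4, 4, 5, 7, 8, 9, 11, 15]

summary = five_number_summary(scores)
low, q1, median, q3, high = summary

# The middle half of the data lives between q1 and q3.
interquartile_range = q3 - q1

print(summary, interquartile_range)
```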

Abstraction and Standardization

What is the future of art? What media will it use? Computers, obviously. Information technology is very good at imitating old media: drawing programs, music programs, word processors designed for playwrights or authors. But none of these tap into the intrinsic strengths of the computer, able to do something no other medium can: simulate. Bret Victor, the man so demanding of user interfaces he left Apple, is dissatisfied with the tools available to artists that allow them to simulate. So he made his own, and gave a one-hour talk on it.

Those interested should definitely take the time to watch it, but to summarize, he demonstrates the power of simulation in creating art that is part animation and part performance, with the human and computer reacting to one another. He then lifts the curtain and shows us the tools he used to simulate the characters in the scene, and it’s not code. Instead, it’s a drawing program, with lines and shapes, that he uses to define behavior. Code, he points out, is based on algebra, but his system is based on geometry. Finally, he concludes with a short performance that he built with these tools. Higher is the story of earth, from the stars to cells to civilization to space travel back to the stars.

What blew my mind about Higher is that a few years ago, I had independently created a short film on exactly that topic, with exactly the same background music (Kyle Gabler’s Best Of Times from World of Goo). Victor’s piece was far more polished, but we had both been inspired by the same music to express the same idea, the journey of life to the stars. Remember when I complained about not finding people who shared my narrative? So this is what that feels like.

What drove Victor to create his tools was the belief that art is an attempt to communicate that which cannot be put into words. By binding simulation to lingual code, we make it inaccessible and unsuitable for art and artists. Direct manipulation of the art, which is how art has been created going back to cave paintings, allows the artist to interact with and lend emotion to the art in ways not possible through code’s layer of indirection, of abstraction.

The reason artists’ needs have been neglected by developers is that, for the rest of the world, code works just fine. As I’ve previously blogged, language is one of humankind’s most powerful inventions. The direct manipulation that is liberating to the artist is confining to the engineer. Language is how we manage many layers of abstraction at once; without it we are reduced to pointing and grunting. It’s harder to communicate with a computer in code than a well-designed direct manipulation interface, but code is more powerful. In the sciences, a good result is consistent with what is already known; in art, a good piece is unexpected and shakes our established worldview. More fundamentally, the sciences observe and record some objective outside truth; art looks inward to offer one of many interpretations of the subjective human experience.

This tension that we see between science and art also shows up in schools. In a recent TED talk, Sir Ken Robinson extols diversity as a fundamental human trait, which schools attempt to erase and replace with standardization. We agree that standardization has its place, but I personally think he downplays its importance. Standardization is writing, is language; those things can’t happen without common ways of thinking. At first, children need to explore concepts and use their own terms, without a top-down lesson plan imposed by school administrators. Nevertheless, the capstone is always learning what the rest of the world calls it. That isn’t smashing creativity, but rather empowering the child to learn more about the topic from others and from reference sources. It’s creating a minimum level of knowledge common to every adult member of society, which is assumed by all media. Being able to communicate facts with others isn’t just the result of education, it’s what makes education possible in the first place. With language, groups of people can unambiguously refer to things not present, a shared imagination. Verbalization is a form of abstraction.

Let’s get back to the role of diversity in school. Students should be able to explore what interests them, but the converse is not true: some topics must be taught to everyone, even if some people do not find them interesting. This is especially true before high school. I know you’re not passionate about fractions, Little Johnny, but you need to learn them. Society expects everyone to have a minimum level of competence in every subject. Additionally, passion for a field isn’t always “love at first sight”. The future mathematician isn’t always the first in the class to get basic arithmetic.

Although the curriculum needs to be largely standardized, the pedagogy does not. The neglect of diversity in schools is most heavily felt not in what kids are or are not learning, but how they are learning it. The inflexibility imposed on lesson plans is degrading to teachers and failing our kids. Teachers should be trusted to adapt lessons to their class, and empowered with testing results they find useful, early enough to use them. Standardized testing as it exists today does not fit the bill. Every student needs to achieve the same core competencies, but the paths to doing so will be as diverse as the children themselves. A broad exposure to both methods and topics promotes the development not just of knowledge, but of personality and identity. The reason to have art in school isn’t to improve test scores but because it’s part of being human.

To be more precise, we should distinguish between “the arts” and “art”. The arts are how to create with the media classically used for art: paint, music, poetry, drama, dance, and so on. Like any other discipline, the arts require a standardized language to record and transfer this knowledge. Sometimes it’s plain English, sometimes it’s jargon, sometimes it’s symbols, but it’s still an agreed-upon abstraction. Diversity of ideas expressed in the language is inventive and healthy; diversity of the language itself is nonstandard and chaotic. With this in mind, the arts take their place at one end of a spectrum of knowledge: mathematics, natural science, social science, and history. And the arts.

But art is something entirely different. It is the personal and emotional perception of an experience that communicates without words. Art is direct and concrete; it is subjective and sublime. Much of the arts attempt to create art. Victor’s tools advance the arts; what he creates with them is art.

It’s a defensible position to say that art, because it does not rely on language as all the other fields of knowledge do, is not knowledge at all. But I’ll indulge Victor and say that not all knowledge can be verbalized. That doesn’t mean that art is beyond classification; Victor and I saw the same artistic ideas in the same piece of lyricless music. Conversely, just because something is written down doesn’t mean it’s standardized or useful knowledge. Recently, the mathematics community has been bewildered by an inscrutable set of papers which claim to prove a fundamental piece of number theory. No one can decipher them to tell if the proof is valid, and their author has not been forthcoming with an oral explanation. So in extreme cases, the analogy between language and standardization breaks down. The wordless expression is more coherent than words.

For all the knowledge that abstract language has brought us, ineffable art remains part of the human experience. It is important for our children to learn about art to become mature and thoughtful adults. It is equally important for us to provide tools that support the nonverbal side of thought, to engage the visual and auditory parts of our brains in ways words never can. Neglecting either is the same failure: the refuge in abstraction, the desire to have everything neat and orderly and predictable. Art exists to explore ambiguity and paradox; it does not demand simple answers but asks complex questions.

A lot of futurists imagine a time when technology makes everything easy. There is a faith in technological convergence, where everything speaks the same language and interacts intelligently and flawlessly. But historically we see technologies become incompatible. If there’s an open standard underneath, such as email, you still get dozens of providers and clients; and if there’s not, you get the walled gardens of social media, loosely tied together by third-party “integration”. What’s important to realize is that the path of technology is not fixed. Our gadgets don’t have to make us more productive and connected; they can make us more artistic and provide privacy, if we design them so. We should stop aspiring to a monoculture of technology because, not only will it not happen for technical and economic reasons, it shouldn’t happen. Standardized technology leads to standardized thinking, especially when coupled with standardized social institutions. Creativity is not only what drives technology further, but art and humanity as well.

Sherlock Holmes and Hard Problems

“With a few simple lines of computer code,” boasts Moriarty in BBC Sherlock, “I can crack any bank, open any door”. (Paraphrased from memory, shh.) Without any spoilers, I can tell you that Sherlock’s nemesis is portrayed as controlling every detail, foreseeing every possibility, and manipulating a web of individuals through blackmail, bribery, snipers, and sowing distrust. And, he makes vague claims of having the ultimate computer hack, stronger than any security system.

What kind of software would this be? Most of computer security relies on mathematics that is computationally hard. Consider a traditional padlock. If you know the combination, it takes almost no time to open the lock. If you don’t know the combination, you have to try every possible code. The combination is easy to check, but difficult to discover.
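To make the asymmetry concrete, here is a toy padlock in Python. It is only a sketch: make_lock and brute_force are names I invented, and a three-dial lock is far smaller than anything real security relies on.

```python
from itertools import product


def make_lock(combination):
    """Return a checker that only answers yes or no, like a real padlock."""
    secret = tuple(combination)

    def opens(guess):
        return tuple(guess) == secret

    return opens


def brute_force(opens, dials=3, symbols=range(10)):
    """Try every combination in order; return the winner and the attempt count."""
    for attempts, guess in enumerate(product(symbols, repeat=dials), start=1):
        if opens(guess):
            return guess, attempts
    raise ValueError("no combination opened the lock")


lock = make_lock((9, 9, 9))

# Checking a known combination is a single comparison.
knows_it = lock((9, 9, 9))

# Discovering it means walking the whole space in the worst case.
found, attempts = brute_force(lock)

print(knows_it, found, attempts)
```

Adding a dial multiplies the search by ten while the check stays a single comparison, which is the gap between easy to check and difficult to discover.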

A completely general computer hack of the sort Moriarty claims to have would be like being able to open a padlock without the combination as fast as you could with the combination. Sherlock operates in much the same way. Anyone can verify his string of deductions after he’s made them; his genius is to devise them in the first place.

So that’s what separates fact from fiction. These portrayals of genius are unrealistic because they take the same amount of time to produce a solution as it takes to verify one. Right?

Not quite. “Can any answer that can be checked quickly also be created quickly?” is one of the great unsolved problems of computer science. We don’t know.