The meme is “explain like I’m five.”1
It’s used when somebody wants a simpler explanation of a complicated topic, so if someone were explaining something to me, and I failed to understand, I might say, “Explain like I’m five,” hoping that they might simplify.
It’s a cute phrase, but it’s not particularly helpful from the perspective of the person doing the explaining. Real five-year-olds stop listening really fast, and most of them don’t care about complicated concepts. I speak from experience here. While I don’t have a five-year-old at this very moment, I do have a 6.5-year-old and a 3.5-year-old, and their two ages average to five, so I declare myself an expert!2
The art of explaining complicated concepts is on my mind because in recent years, and especially in the last six months or so, I’ve gotten several questions from friends and acquaintances about Artificial Intelligence, or AI. For example, the other day I got a question from a student in seminary, studying to be a priest. He was looking for a gentle introduction to AI, and he wanted to know if there was anything he could consult to get a “good enough” understanding of the topic.
Unfortunately, I think the answer is “no.” Everything that comes to mind is either really technical, trying to sell something of questionable usefulness, engaged in full-throated fear mongering, written by confused journalists, or a profanity-laced takedown of the journalists’ confusion.
For what it’s worth, I think the first and last links are pretty close to the truth, but for different reasons both of them are pretty hard to follow if you don’t already have some background in the area.
Which brings me to the purpose of this post. It’s an attempt to “Explain like I’m a seminarian.” Specifically, I’m aiming to explain some of the concepts of AI to someone who is:3
Smart and already well-educated, but aware of the limits of his knowledge
Striving to bear greater responsibility for the emotional and spiritual well-being of his community
Relatively unfamiliar with advanced statistics and computing
In short, a Seminarian. Where to begin?
Machine Learning
Artificial Intelligence, or AI, has been all over the news in the past few months. The proximate reason for the news is that computer science researchers have made some remarkable breakthroughs within a field called “Machine Learning,” and especially in a subfield called “Deep Learning.” What’s confusing is that nobody ever seems to explain what these fields are. I hope this post can provide a little glimpse into these esoteric concepts.4
Machine learning and deep learning (and AI for that matter) are somewhat confusingly named. The word “learning” was chosen to differentiate between what you might call traditional and statistical computer science. Programmers traditionally write out logical commands or instructions and carefully feed them to a computer to get an output. A key feature of traditional computer science is that programmers (at least in theory) know exactly what they’re telling their computers to do, and the computers faithfully follow the instructions.
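To make that concrete, here’s a tiny, made-up example of the traditional style, written in Python. The numbers are invented; the point is only that every step is spelled out explicitly:

```python
# Traditional programming: I spell out every step, and the computer follows them exactly.
prices = [3.50, 2.25, 4.00]   # made-up prices for a shopping cart
tax_rate = 0.07               # a made-up tax rate

total = sum(prices) * (1 + tax_rate)
print(f"Total with tax: ${total:.2f}")
```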
As computers grew in capability, some mathematicians noticed that they could use computers to solve math problems that are hard to do by hand. In particular, two such math problems were linear regression and matrix algebra.
Linear regression is a statistical technique for making estimates about unknown numerical values based on other, known values. For example, let’s say you meet a kid, and he dares you to guess how old he is. But there’s a catch! You’re only allowed to know that he’s a boy and that he’s 5’6”. It turns out that linear regression can help you come up with a surprisingly good guess. To guess his age using linear regression you would:
Gather up and list out the ages, genders, and heights of as many other kids as you can find (remember, you’re not allowed to know the age of the kid in question because he’s daring you to guess it). The lists you compile are usually called “data.”
Do a bunch of math. The solution to this particular math problem is usually called a “regression model,” or simply a “model.”
Plug the kid’s gender and height into your new model, and it’ll give you as good a guess as mid-20th century math can provide. The guess is usually called a “prediction.” Incidentally, this is why I usually prefer to use the term “predictive model” rather than “machine learning” when I build models like this.
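For the curious, here’s a rough sketch of what those three steps might look like in Python. The data is made up, and I’m using the popular scikit-learn library, but any statistics package would do:

```python
from sklearn.linear_model import LinearRegression

# Step 1: the "data" -- heights (in inches), genders, and ages of kids whose ages we already know.
# All of these numbers are invented for illustration.
heights = [40, 45, 52, 58, 62, 48, 55, 66]
genders = [0,  1,  0,  1,  0,  1,  0,  1]   # 0 = girl, 1 = boy
ages    = [5,  7,  9, 11, 13,  8, 10, 14]

features = [[h, g] for h, g in zip(heights, genders)]

# Step 2: "do a bunch of math" -- fit the regression model.
model = LinearRegression().fit(features, ages)

# Step 3: plug in the mystery kid (a boy who is 5'6", i.e. 66 inches) to get a "prediction."
guess = model.predict([[66, 1]])
print(f"My guess: he's about {guess[0]:.0f} years old")
```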
Matrix algebra, on the other hand, is an analytic technique for solving multiple related equations at the same time. In classical algebra, you solve equations one at a time. For example, if you wanted to solve an equation like 5x + 2 = 17, you could do some classical algebra to figure out that x = 3. But if you were trying to solve a lot of related equations at the same time, you would need more advanced math, and that advanced math is called linear algebra, or matrix algebra.5
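If you’re curious what that looks like when a computer does the work, here’s a small sketch using the numpy library. The pair of equations is made up, but the idea, solving several related equations at once, is exactly what matrix algebra is for:

```python
import numpy as np

# Two made-up equations to solve at the same time:
#   5x + 2y = 17
#    x +  y =  4
A = np.array([[5, 2],
              [1, 1]])    # the coefficients, arranged as a matrix
b = np.array([17, 4])     # the right-hand sides

solution = np.linalg.solve(A, b)
print(solution)  # [3. 1.]  ->  x = 3, y = 1
```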
When the mathematicians started using computers to do these math problems, the computer scientists had to build the tools to help them. The resulting combination of math and computing turned into what I’m calling statistical programming, and what the computer scientists ended up calling “machine learning,” primarily to differentiate it from their traditional day jobs.
So the word “learning” in this context means something hugely different from what people usually mean when they use the term. Machine “learning” means building computer programs to solve math problems like regression and linear algebra. The computer "learns" from the information it’s given in the sense that it solves the equations (semi-)automatically rather than being directly instructed, as is the case in traditional programming.
Deep Learning
Over the years, as mathematicians and computer scientists played around with these techniques, they came up with all sorts of fun6 ways to push the boundaries of what types of math problems they could solve with computers. One of the concepts they developed during this time turned into something called Neural Networks.
Unfortunately neural nets are so complicated that I can’t think of a simple way to explain them. The best I can do is to describe them as “regressions stacked on top of one another?” They’re frequently visualized in diagrams like the one below, but including it here makes me uncomfortable because it took me years before I understood what the circles and lines really meant.
[Figure: a typical neural network diagram, columns of circles connected by lines]
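If code reads more easily to you than circles and lines, here’s one (over-)simplified way to picture “regressions stacked on top of one another.” The numbers are made up, and a real neural network learns its numbers from data rather than having them typed in by hand:

```python
import numpy as np

def layer(inputs, weights, bias):
    # Each "layer" is basically a little regression: multiply, add, then squash.
    return np.maximum(0, inputs @ weights + bias)

# Invented weights; a real network would learn millions of these from data.
x = np.array([1.0, 2.0])                       # the input: two numbers
hidden = layer(x, np.array([[0.5, -1.0],
                            [1.5,  0.3]]), np.array([0.1, 0.2]))  # one stack of "regressions"
output = hidden @ np.array([2.0, -0.5]) + 0.3  # and a final regression on top
print(output)
```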
These models are so complicated that I’ve never found them particularly useful. Other analytic techniques are much simpler, and those simpler techniques were (and still are) good enough for the types of problems that turn up in my day-to-day work, so to this day I almost never use neural networks for anything.
But neural networks have turned out to be the key to getting computers to perform tasks that had previously been impossible. As I discuss in more detail in my previous post about ChatGPT, it’s really hard to get computers to meaningfully process “unstructured data,” i.e. images, video, audio, and text.
“Processing unstructured data” is so easy for people to do that it feels silly to use such a highfalutin phrase to describe human behavior. Honestly, the phrase just refers to the ability to recognize stuff. For example, take a look at this image:
If you were driving and came across this intersection, you would know in an instant that you had the right-of-way because the light is green. Really simple stuff.
But computers can’t do this, at least not very well. It turns out that telling a computer the rules of the road (e.g. “Green means go.”) is a lot easier than telling it “this is what a stoplight looks like.”
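To put that difference in programming terms: the rule is a one-liner, but the recognition step has no obvious rule to write down. Here’s an illustrative sketch (both function names are hypothetical):

```python
# The "rules of the road" part is easy to tell a computer:
def should_go(light_color):
    return light_color == "green"

# The "this is what a stoplight looks like" part is the hard bit. There's no short
# list of instructions that turns millions of pixels into the word "green" --
# that's the job deep learning models have (partly) figured out how to do.
def color_of_light(photo_pixels):
    raise NotImplementedError("No simple recipe of instructions does this.")
```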
For various reasons that are beyond the scope of this post, computer scientists have finally gotten a foothold in this domain by extending machine learning, specifically neural nets, to such an extent that these models have grown into an entire subfield called “Deep Learning.” The exact structure of deep learning models isn’t important now.7 All we need to care about is that deep learning is the name for the types of models that can (start to) process unstructured data.
Before moving on, I want to emphasize how big of a deal this is in the realm of academic computer science. Computer science researchers have been trying to extend the capabilities of computers into processing unstructured data for decades, and the fact that they are now even partly successful is truly amazing!
That said…
So what?
I’m not a computer scientist. You’re not a computer scientist.8 Seems neat and all, but these breakthroughs appear to be pretty firmly ensconced within university computer science and mathematics departments?
Is there a reason you should care? Also, why is this in the news all of a sudden, and why does it seem to be freaking out some of the members of your congregation?
Honestly, a big part of me is as confused as you are. Scientific breakthroughs happen all the time, but you don’t hear about most of them because the only people that care are the scientists themselves. It’s more than a little jarring for me to see my field of analytics blow up to such a degree that I feel compelled to write multiple several-thousand-word essays just to keep my bearings.
That said, I do have a guess as to why it’s blown up to the degree it has, but before I get there, a big caveat: I am hugely uncertain about the next section. I’m reasonably confident in what I’ve said up until now because I do this type of work all the time, but I want to be abundantly clear that what follows should be thought of as speculation, not explanation.
But you’re not here for hemming and hawing. You’re here for answers!
Here’s the hottest take I can give:
I blame crappy names, ‘90s movies, and Nick Bostrom for the current AI hysteria.
Let’s unpack this a bit.
Names, the Nineties, and Nicky B
One thing I’m really bad at is naming things.
Usually when I finish the “math phase” of an analysis, I have to figure out a way to explain it to my non-mathematical coworkers. I find the best way to explain the technical details is to turn the analysis into a story, a narrative with characters, plot, tension, and so on. Part of crafting any story involves naming your characters, and I suck at it.
To give one (semi-fictionalized) example, a data scientist on my team once built some models to identify which of our customers were the most likely to be receptive to some new products we were launching. The models turned out great, and we had to figure out how to use them. We decided to use these models for targeted sales campaigns, to direct our sales team’s attention towards the customers most likely to buy. The campaigns needed names, and we ended up calling them “sprints.”
In the subsequent months, we pushed the sales team to go after these high-value targets, but they wouldn’t listen! What was going on? The sales people presumably want to sell more, right? And according to our models, these are the very customers that are clamoring to be sold to!
I eventually learned that the sales team hated the name “sprint.” It made them feel like we were forcing nonsense corporate speak on them or something. We’ve now dropped the name, we’re referring to the model results simply as “targets,” and we’re trying again. I’m not sure if “targets” will be any better, but I now know for sure that the name “sprint” was a mistake.
I say this because I’m beginning to suspect that I’m not alone, that I’m not the only one who’s bad at naming things. I wonder if our mathematical and computer science forebears were just as bad.
At the time these techniques were developed, the terms Artificial Intelligence, Machine Learning, Neural Networks, and Deep Learning probably seemed like entirely reasonable names. After all, who’s ever going to care outside of our field? Anybody who uses these words will be sufficiently steeped in the math that the names won’t matter. Plus, they sound like something straight out of Star Wars! 🤓
But now everybody is using these names, and very few people know what they really refer to. All of a sudden, these names are causing a lot of confusion because of how easily they slide into a particular popular narrative, regardless of what’s going on in the math.
Speaking of popular narratives, I just realized that if you’re in seminary right now, you were probably born after the 1990s, so you might not know a lot about ‘90s movies. Maybe you haven’t even seen Star Wars! 😱
So I guess it’s time for a history lesson.
I grew up in the ‘90s, and I loved movies. The other day I was looking at a list of the highest grossing movies of the 1990s, as one does, and I was struck by how many of them revolve around unstoppable forces, especially robots, threatening to destroy the world.
Such films include: Jurassic Park, Independence Day, The Lost World: Jurassic Park, Men in Black, Armageddon, Twister (sort of), The Mummy, Godzilla (probably? I’ve never seen it), Deep Impact, and especially Terminator 2: Judgment Day and The Matrix. It’s even sort of in Titanic, the #1 movie of the decade.
I don’t know why this was such a common theme. Maybe it was the coming millennium? Who knows, but holy crap that’s a lot of movies! I’m not sure if this is a particular quirk of the ‘90s, or if apocalypse films are always popular, but those of us who grew up in the ‘90s were clearly inundated with them.
And the movies had a real impact on us. When I was in college in the early ‘00s, conversations with friends frequently drifted toward ideas of futuristic technology, robots, and the possibility of out-of-control disasters.
There’s one such conversation I remember particularly well. A friend of mine laid out an argument that, given a few seemingly-reasonable assumptions, you can conclude that the world isn’t real, or to be more precise, that we’re almost certainly living in a simulation. I’m not going to lay out the argument here, but if you’re interested, it’s not hard to find.
It turns out that this argument came from a philosopher named Nick Bostrom. He makes very deep, serious arguments about these types of things.9 He and several other philosophers and writers have become famous over the past 20 years by taking arguments like these to their logical conclusions, and many folks, especially people working in tech, have fun (despite occasionally freaking themselves out) exploring them in nerdy internet forums.
I really don’t know what to think about these arguments from a philosophical perspective. I’m but a humble data analyst, so please don’t come to me for commentary on them. I’m simply pointing out that a lot of (digital) ink has been spilled in the past couple decades, playing around with concepts like sentient robots and unstoppable technology destroying the world.
The Perfect Storm
So here we are in 2023. Those of us who grew up on disaster movies, and who have been playing around with weird theories about technological apocalypses for a while, are getting old enough to have real influence on things, whether we realize it or not. Meanwhile, academic researchers are finally able to get computers to recognize stoplights and stuff, using techniques with clunky names like “Artificial Intelligence.” Add to this the myriad marketers, PR teams, journalists, politicians, and outright fraudsters who are always out to capitalize on any trend that comes along, and it’s The Perfect Storm.

When you combine tech nerds who love debating science fiction with real breakthroughs in esoteric, hard-to-understand fields that happen to have names that sound like they came out of science fiction, it’s almost impossible to avoid framing the breakthroughs as if popular science fiction is happening “in real life.” And even worse, there’s a huge overlap between the nerds doing the debating and the academics doing the researching. So not only would you be forgiven for lumping these two groups together at first glance, but they (we?) appear to be the very authorities on these topics, and it looks like they’re the ones who are most freaked out!
I guess to that end, maybe this whole thing is a bit of a mea culpa? To whatever extent I count as a member of these groups, I admit that we have not done a good job explaining ourselves (to say the least), and it’s causing a lot of confusion. Maybe this counts as some weird kind of atonement or something. Please forgive me, (soon-to-be) Father, for I have sinned.10
Finally, I recognize that this theory is entirely self-centered. But I can’t help but feel like this stuff really is centered on these groups that I'm at least sort of a member of. Heck, the whole reason I started this blog was because I was seeing all sorts of inaccurate information about this stuff, and everybody keeps asking me to explain AI to them!11
Back to seminary
If you really are a seminarian, or just somebody trying to cut through the static about AI, I hope this is a helpful window into my world. “Real AI” is the extension of mathematics to try to get computers to process images, video, audio, and text. Academic researchers are making remarkable, surprising strides in this area, but at its core, it’s just regression, not magic.
More speculatively, because the new mathematical techniques have these science-fictiony names, and because the names of the techniques (as opposed to the actual mathematics) are related to culturally-ubiquitous depictions of the apocalypse, the popular narrative tells itself. I genuinely wonder if all this would be happening if these techniques had more innocuous names.
So if I have any advice about AI, it’s this: don’t worry about it. The models are amazing, and it’s really cool that deep learning works at all.
But it’s not the apocalypse. It’s just nerds on the internet.
In the context of this post, a meme is just a joke on the internet. There’s a lot of stuff on the internet that’s annoyingly obtuse: concepts/phrases/ideas that are almost intentionally designed to be hard to understand, and the word “meme” is one of them. Also, it’s pronounced “meem,” not “MayMay.”
Technically, as of the date of publication, my two kids are 6.8 and 3.7 years old, so if I were using significant figures properly (as is implied by the decimals) I would have to say their ages average to 5.3, actually. But come on! We’re talking about kids here, and kids love to say stuff like, “I’m not three. I’m three-and-a-half!”
I use the masculine pronoun here just because most clergy are men, but I absolutely encourage everyone to learn more about this stuff!
I’m going to try to describe these fields in a sort of “history,” but it’s such a huge simplification that you shouldn’t even read it as history. Rather than thinking of this as history, think of it as a window into the world of data science.
Please don’t let the word “matrix” scare you. This is not The Matrix (though we will come back to it, believe it or not). Real matrices are just a method for simplifying complicated notation, and to be honest, I’ve never been very good at them. I think I got a C- when I took linear algebra in college.
Fun for me at least.
But they are really interesting! If you want to dig into the details, d2l.ai is a great resource. #NotAnAd
Ok, well maybe you, dear reader, happen to be a computer scientist, but remember, I’m writing this for our hypothetical seminarian.
Bostrom is so serious that I kind of feel bad referring to him as “Nicky B” above. To be clear, nobody calls him that (at least to my knowledge). I just thought it made for a catchy section title.
I’m really sorry if this admittedly jokey phrasing is sacrilegious or something. I don’t know your faith tradition at all, and I’m barely even scratching the surface of my own at the moment. I sincerely mean no mockery or disrespect.
I want to be clear that I’m not complaining about the burden of explanation here. I love explaining AI and data science. It’s just really hard, especially when there’s so much close-to- but not-quite-right information flying around.
Lucid!