The Art of Asking Good Questions
5 min read • April 23, 2025
In data science, we often obsess over the perfect algorithm: the flashiest model, the deepest neural net, the smartest pipeline. But none of that matters if you’re solving the wrong problem.
In 2012, Target’s marketing model flagged a teenage girl as pregnant before her father knew. The kicker? It was right.
How could this happen? It all started with a poorly framed question.
Why Questions Matter More Than Models
“A problem well stated is a problem half-solved.” – Charles Kettering
Before we ask how to solve a problem, we should ask what the problem really is. In the rush to model and analyze, the foundational step is often skipped: framing the right question.
Even a perfect model can’t fix a poorly framed problem.
An elegant model trained on misunderstood objectives is like building a GPS that optimizes for shortest aerial distance — great in theory, but you might end up in a lake.
Funny in hindsight. Frustrating in production.
🚨 When Good Data Science Goes Bad
Let’s walk through some infamous real-world cases where asking the wrong question — or misunderstanding the data — led to costly mistakes.
🛍️ Case Study 1: Target’s Pregnancy Prediction Incident
Target built a model to predict which customers were pregnant, using shopping behavior like unscented lotion or vitamin supplements.
The model was incredibly accurate — and that was the problem.
They sent maternity-related coupons to a teenage girl… before she had told her family she was pregnant. The model worked, but the question wasn’t just “Can we predict pregnancy?”
It should have been, “Should we? And how do we communicate it?”
Lesson: Ethics and timing are part of the question. Predicting something isn’t the same as solving a business problem thoughtfully.
🚗 Case Study 2: Predicting Taxi Demand in New York
A team built a powerful ML model to predict yellow cab demand in NYC. Accuracy was solid. But when deployed, the model failed miserably. Why?
It had learned from past taxi pickups, not from real-time demand. But people were already switching to Uber and Lyft. So the model was optimizing for a shrinking ecosystem.
Lesson: The question wasn’t “Where were taxis picked up last year?”
It should’ve been, “Where do people want rides now, and how can we measure that?”
👩‍⚕️ Case Study 3: COVID Models That Misled
Early COVID-19 projections varied wildly. Some models overestimated death tolls; others missed the asymptomatic transmission problem. Why?
Models were based on poor or incomplete data — not the wrong math, but the wrong assumptions.
Lesson: When your input data doesn’t reflect reality, your questions (and answers) drift far from truth. Garbage in, gospel out.
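Selection bias like this is easy to demonstrate. Here’s a minimal simulation (all numbers are hypothetical, purely for illustration): if severe cases are far more likely to get tested early in an outbreak, then the severity rate observed in the tested sample wildly overstates the true rate in the population.

```python
import random

random.seed(42)

# Toy outbreak (hypothetical numbers): 2% of infections are severe.
# Severe cases are tested 90% of the time; mild cases only 5%.
def simulate(n=100_000):
    people = []
    for _ in range(n):
        severe = random.random() < 0.02
        tested = random.random() < (0.9 if severe else 0.05)
        people.append((severe, tested))
    return people

people = simulate()
tested = [severe for severe, was_tested in people if was_tested]

# True severity rate in the whole population: ~2%
true_rate = sum(severe for severe, _ in people) / len(people)

# Severity rate among the tested: ~27% — grossly inflated,
# because the sample over-represents severe cases.
observed_rate = sum(tested) / len(tested)
```

Same math, same code, wrong sample: the "answer" is off by more than a factor of ten before a single model is fit.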
So What Do Better Questions Look Like?
Each of the failures discussed above had one thing in common: a missing or misaligned question at the start. That means they’re preventable.
The secret isn’t better data pipelines or fancier models — it’s asking smarter questions before we begin.
The Data Science Funnel
| Layer | Question to Ask | Common Pitfall |
|---|---|---|
| Business Goal | What do we really want to achieve? | Misaligned stakeholder expectations |
| Problem | Is the problem clearly defined and scoped? | Solving the wrong thing |
| Data | Does the data represent the reality we care about? | Missing context, bias, gaps |
| Model | What assumptions are baked into the algorithm? | Overfitting, blind trust in accuracy |
| Metrics | Are we measuring the right outcomes? | Optimizing vanity metrics, misleading KPIs |
Here’s how to reframe problems more effectively:
- What’s really being optimized?
  If your model recommends products, is it optimizing for click-through, purchase, long-term retention, or customer trust? Small differences in question → big differences in outcome.
- What does success actually look like?
  Instead of “Can we reduce churn?”, ask:
  - Which kind of churn is more damaging?
  - Can we distinguish voluntary exits from poor onboarding?
  - Is some churn actually healthy?
- What might I be missing?
  - Is there selection bias?
  - Who is being excluded from the dataset?
  - What happens if we succeed at this prediction?
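To make the first question concrete, here’s a toy sketch (the item names and rates are hypothetical) showing how the same candidate recommendations rank differently depending on which objective you pick:

```python
# Three candidate recommendations with made-up engagement stats.
candidates = {
    "clickbait_gadget": {"click_rate": 0.12, "purchase_rate": 0.01},
    "useful_accessory": {"click_rate": 0.05, "purchase_rate": 0.04},
    "premium_upgrade":  {"click_rate": 0.03, "purchase_rate": 0.03},
}

# Optimizing for click-through picks the flashy item...
best_by_clicks = max(candidates, key=lambda k: candidates[k]["click_rate"])

# ...while optimizing for purchases picks a different one entirely.
best_by_purchases = max(candidates, key=lambda k: candidates[k]["purchase_rate"])

print(best_by_clicks)     # clickbait_gadget
print(best_by_purchases)  # useful_accessory
```

One word in the question ("clicks" vs. "purchases") flips which item the system promotes; scale that up, and it flips what the whole product rewards.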
🤖 Ask LLMs Better, Get Smarter Answers
With large language models, the danger isn’t just hallucination — it’s asking vague, leading, or incomplete questions.
LLMs are confident — even when they’re wrong.
Asking “Why is this stock a good investment?” assumes it is a good investment.
A better prompt: “Based on recent data, what are the risks and opportunities associated with this stock?”
| | ❌ Bad Question | ✅ Better Question |
|---|---|---|
| Prompt | “Why is this stock a good investment?” | “What are the risks and opportunities associated with this stock?” |
| Assumption | Assumes the investment is good | Keeps the inquiry open and neutral |
| Effect | Model finds biased evidence to support the assumption | Model explores a broader, balanced view |
| Risk | Overconfidence, hallucination | Nuanced, multi-sided analysis |
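In practice, this reframing can be baked into your prompting code. The sketch below is a hypothetical helper (not tied to any particular LLM API) that always frames a topic as an open, two-sided question rather than a leading one:

```python
def neutral_prompt(topic: str) -> str:
    """Build an open, two-sided question about a topic,
    instead of a leading one that presumes the answer."""
    return (
        f"Based on recent data, what are the risks and opportunities "
        f"associated with {topic}? Please cover both sides."
    )

# Leading version presumes the conclusion; the neutral version doesn't.
leading = "Why is this stock a good investment?"
neutral = neutral_prompt("this stock")
```

A template like this is a cheap guardrail: it makes the neutral framing the default, so a biased question has to be written deliberately rather than by habit.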
TL;DR: Your Question Toolkit
- 🎯 Clarify the objective before you code
- 🧩 Challenge assumptions early
- 🔁 Talk to domain experts
- 🔎 Measure the right success metric
- 🛑 Pause before automating bad questions
💬 What’s the best (or worst) question you’ve seen in a data project?
Let’s swap war stories and build a culture of curiosity, not just cleverness.