The Art of Asking Good Questions
5 min read • April 23, 2025
In data science, we often obsess over the perfect algorithm: the flashiest model, the deepest neural net, the smartest pipeline. But none of that matters if you’re solving the wrong problem.
In 2012, Target’s marketing model flagged a teenage girl as pregnant before her father knew. The kicker? It was right.
How could this happen? It all started with a poorly framed question.
Why Questions Matter More Than Models
“A problem well stated is a problem half-solved.” – Charles Kettering
Before we ask how to solve a problem, we should ask what the problem really is. In the rush to model and analyze, the foundational step is often skipped: framing the right question.
Even a perfect model can’t fix a poorly framed problem.
An elegant model trained on misunderstood objectives is like building a GPS that optimizes for shortest aerial distance — great in theory, but you might end up in a lake.
Funny in hindsight. Frustrating in production.
🚨 When Good Data Science Goes Bad
Let’s walk through some infamous real-world cases where asking the wrong question — or misunderstanding the data — led to costly mistakes.
🛍️ Case Study 1: Target’s Pregnancy Prediction Incident
Target built a model to predict which customers were pregnant, using shopping behavior like unscented lotion or vitamin supplements.
The model was incredibly accurate — and that was the problem.
They sent maternity-related coupons to a teenage girl… before she had told her family she was pregnant. The model worked, but the question wasn’t just “Can we predict pregnancy?”
It should have been, “Should we? And how do we communicate it?”
Lesson: Ethics and timing are part of the question. Predicting something isn’t the same as solving a business problem thoughtfully.
🚗 Case Study 2: Predicting Taxi Demand in New York
A team built a powerful ML model to predict yellow cab demand in NYC. Accuracy was solid. But when deployed, the model failed miserably. Why?
It had learned from past taxi pickups, not from real-time demand. But people were already switching to Uber and Lyft. So the model was optimizing for a shrinking ecosystem.
Lesson: The question wasn’t “Where were taxis picked up last year?”
It should’ve been, “Where do people want rides now, and how can we measure that?”
👩‍⚕️ Case Study 3: COVID Models That Misled
Early COVID-19 projections varied wildly. Some models overestimated death tolls; others missed the asymptomatic transmission problem. Why?
Models were based on poor or incomplete data — not the wrong math, but the wrong assumptions.
Lesson: When your input data doesn’t reflect reality, your questions (and answers) drift far from truth. Garbage in, gospel out.
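Selection bias like this is easy to demonstrate. Here’s a minimal simulation (all numbers are hypothetical, purely for illustration): if severe cases are far more likely to get tested early in an outbreak, then the severity rate observed in the tested sample wildly overstates the true rate in the population.

```python
import random

random.seed(42)

# Toy outbreak (hypothetical numbers): 2% of infections are severe.
# Severe cases are tested 90% of the time; mild cases only 5%.
def simulate(n=100_000):
    people = []
    for _ in range(n):
        severe = random.random() < 0.02
        tested = random.random() < (0.9 if severe else 0.05)
        people.append((severe, tested))
    return people

people = simulate()
tested = [severe for severe, was_tested in people if was_tested]

# True severity rate in the whole population: ~2%
true_rate = sum(severe for severe, _ in people) / len(people)

# Severity rate among the tested: ~27% — grossly inflated,
# because the sample over-represents severe cases.
observed_rate = sum(tested) / len(tested)
```

Same math, same code, wrong sample: the "answer" is off by more than a factor of ten before a single model is fit.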
So What Do Better Questions Look Like?
Each of the failures discussed above had one thing in common: a missing or misaligned question at the start. That means they’re preventable.
The secret isn’t better data pipelines or fancier models — it’s asking smarter questions before we begin.
The Data Science Funnel
| Layer | Question to Ask | Common Pitfall |
|---|---|---|
| Business Goal | What do we really want to achieve? | Misaligned stakeholder expectations |
| Problem | Is the problem clearly defined and scoped? | Solving the wrong thing |
| Data | Does the data represent the reality we care about? | Missing context, bias, gaps |
| Model | What assumptions are baked into the algorithm? | Overfitting, blind trust in accuracy |
| Metrics | Are we measuring the right outcomes? | Optimizing vanity metrics, misleading KPIs |
Here’s how to reframe problems more effectively:
- What’s really being optimized?
  If your model recommends products, is it optimizing for click-through, purchase, long-term retention, or customer trust? Small differences in question → big differences in outcome.
- What does success actually look like?
  Instead of “Can we reduce churn?”, ask:
  - Which kind of churn is more damaging?
  - Can we distinguish voluntary exits from poor onboarding?
  - Is some churn actually healthy?
- What might I be missing?
  - Is there selection bias?
  - Who is being excluded from the dataset?
  - What happens if we succeed at this prediction?
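To make the first question concrete, here’s a toy sketch (the item names and rates are hypothetical) showing how the same candidate recommendations rank differently depending on which objective you pick:

```python
# Three candidate recommendations with made-up engagement stats.
candidates = {
    "clickbait_gadget": {"click_rate": 0.12, "purchase_rate": 0.01},
    "useful_accessory": {"click_rate": 0.05, "purchase_rate": 0.04},
    "premium_upgrade":  {"click_rate": 0.03, "purchase_rate": 0.03},
}

# Optimizing for click-through picks the flashy item...
best_by_clicks = max(candidates, key=lambda k: candidates[k]["click_rate"])

# ...while optimizing for purchases picks a different one entirely.
best_by_purchases = max(candidates, key=lambda k: candidates[k]["purchase_rate"])

print(best_by_clicks)     # clickbait_gadget
print(best_by_purchases)  # useful_accessory
```

One word in the question ("clicks" vs. "purchases") flips which item the system promotes; scale that up, and it flips what the whole product rewards.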
🤖 Ask LLMs Better, Get Smarter Answers
With large language models, the danger isn’t just hallucination — it’s asking vague, leading, or incomplete questions.
LLMs are confident — even when they’re wrong.
Asking “Why is this stock a good investment?” assumes it is a good investment.
A better prompt: “Based on recent data, what are the risks and opportunities associated with this stock?”
| | ❌ Bad Question | ✅ Better Question |
|---|---|---|
| Prompt | “Why is this stock a good investment?” | “What are the risks and opportunities associated with this stock?” |
| Assumption | Assumes the investment is good | Keeps the inquiry open and neutral |
| Effect | Model finds biased evidence to support the assumption | Model explores a broader, balanced view |
| Risk | Overconfidence, hallucination | Nuanced, multi-sided analysis |
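In practice, this reframing can be baked into your prompting code. The sketch below is a hypothetical helper (not tied to any particular LLM API) that always frames a topic as an open, two-sided question rather than a leading one:

```python
def neutral_prompt(topic: str) -> str:
    """Build an open, two-sided question about a topic,
    instead of a leading one that presumes the answer."""
    return (
        f"Based on recent data, what are the risks and opportunities "
        f"associated with {topic}? Please cover both sides."
    )

# Leading version presumes the conclusion; the neutral version doesn't.
leading = "Why is this stock a good investment?"
neutral = neutral_prompt("this stock")
```

A template like this is a cheap guardrail: it makes the neutral framing the default, so a biased question has to be written deliberately rather than by habit.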
TL;DR: Your Question Toolkit
- 🎯 Clarify the objective before you code
- 🧩 Challenge assumptions early
- 🔁 Talk to domain experts
- 🔎 Measure the right success metric
- 🛑 Pause before automating bad questions
💬 What’s the best (or worst) question you’ve seen in a data project?
Let’s swap war stories and build a culture of curiosity, not just cleverness.