This is a representative report from a fictional respondent, Alex Chen. The structure, scoring logic, evidence handling, and developmental feedback are identical to what a real customer receives. Only the name and the response text are fabricated.
Sample only. Alex Chen is not a real person. A real report includes the candidate's actual responses, the dimensions selected for that session, and a downloadable PDF.
Alex Chen
Sample Report
Assessed 2026-04-22
Strong
3.33 / 5
Six-dimension average
Capacity summary
Your responses demonstrate skilled reasoning within each situation. The frontier is reasoning about the structure of the situation itself, not just within it: what kind of problem are you actually solving, and what does that reveal?
Capacity distribution
Each axis is one of the six capacity dimensions, scored 1 to 5. The shaded polygon is this respondent's profile across all six.
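As a rough check on the headline figure, the sketch below recomputes the six-dimension average from the per-dimension scores listed in this sample report. It assumes the headline is an unweighted mean rounded to two decimals; the report does not state the formula, and the `scores` mapping and `headline_average` helper are hypothetical names used only for illustration.

```python
from statistics import mean

# Scores copied from the six dimension cards in this sample report.
scores = {
    "Reasoning Under Ambiguity": 4,
    "Learning Velocity": 3,
    "Communication Architecture": 4,
    "Creative Problem Decomposition": 3,
    "Judgment Calibration": 3,
    "Stakeholder Navigation": 3,
}


def headline_average(dimension_scores):
    """Unweighted mean of the dimension scores, rounded to two decimals.

    Assumption: equal weighting across dimensions; the actual scoring
    formula is not given in the report.
    """
    return round(mean(list(dimension_scores.values())), 2)


average = headline_average(scores)
print(f"{average:.2f} / 5")  # prints "3.33 / 5"
```

Under that assumption the six scores sum to 20, and 20 / 6 rounds to 3.33, which matches the figure shown at the top of the report.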
Per-dimension detail
Each card shows the score, the level label from the rubric, the scorer's rationale grounded in evidence quotes pulled from the response, the strengths the response demonstrated, the gaps that kept it from the next level, and concrete next steps.
Dimension 1 of 6
Reasoning Under Ambiguity
Task: The Vendor Status Question
4/5
Weighs alternatives and commits with explicit conditions
Rationale
The response generates two competing readings of the situation (the vendor is genuinely behind schedule versus the project manager's reporting is sloppy) and surfaces the cost implications of each. It commits to a sequenced plan with an explicit decision rule (the 48-hour evidence threshold for the three highest-priority deliverables), and hedges with a partially reversible step (parallel sourcing without canceling). It names the conditions under which the choice would be revisited. These behaviors match the Level 4 anchors directly. What keeps the response from Level 5 is the absence of any reasoning about the structure of the uncertainty itself: the response treats this as one tractable case rather than asking whether the inability to read vendor status is a category-level signal worth a separate move.
Evidence from the response
Two readings of the situation are consistent with what we have: either the deliverable is actually behind schedule, or the project manager's status reports are inaccurate.
I would send a short request today asking for evidence of progress on the three highest-priority deliverables.
If they cannot produce that evidence within 48 hours, I would start parallel sourcing without canceling the contract yet, because parallel sourcing is reversible and cancellation is not.
This gives us low-cost optionality while we learn which reading is correct.
Strengths
Generates two distinct alternative readings of the situation and articulates the different cost profiles each implies.
Identifies the specific evidence (progress on the three highest-priority deliverables) that would discriminate between the readings.
Hedges the immediate move with reversibility (parallel sourcing without cancellation) so the decision can be revisited as evidence arrives.
Gaps
Does not step outside the immediate vendor question to ask whether the standing inability to tell from status reports is itself a signal about vendor-portfolio visibility, not just this one vendor.
Does not distinguish the tractable part of the uncertainty (this specific deliverable) from a potential irreducible part (visibility into vendor work as a category).
Recommended next steps
After naming alternatives and committing to a discriminating evidence check, ask explicitly: 'What does this situation tell me about the broader information environment I am working in?'
Practice articulating both a local decision rule (for this vendor) and a category-level move (for the vendor portfolio) when the local situation hints at a wider pattern.
Rubric version 1.0.0
Dimension 2 of 6
Learning Velocity
Task: The New AI Tool Evaluation
3/5
Identifies what to learn first but does not build from prior knowledge
Rationale
The response produces a structured, time-sequenced plan that goes beyond a generic study plan. It names specific things to look for (inputs/outputs, what would break it), commits to a minimum viable approach (three or four representative pieces of work), and identifies the cheapest evidence source (the two peers at the other company). These behaviors match Level 3 anchors. However, the response does not reach Level 4: it does not identify a specific analogous prior domain and use it as a working scaffold for the new one, and it does not state a working model in falsifiable form, with named evidence that would disprove it.
Evidence from the response
Days 1-3 I would build a working mental model of the tool by reading the documentation and running it on three or four representative pieces of work picked for variety.
I would also reach out to the two peers at the other company immediately, because their three months of experience is the cheapest evidence I will get.
Days 8-10 I would write the recommendation, naming the conditions under which it would be wrong.
Strengths
Immediately prioritizes the cheapest, highest-signal evidence source (the two peers) rather than defaulting to documentation alone.
Commits to a minimum viable test set (three or four representative pieces of work chosen for variety) instead of trying to cover everything.
Applies epistemic discipline to the final recommendation by requiring it to name the conditions under which it would be wrong.
Gaps
Does not identify a specific prior evaluation framework or analogous domain to use as a working scaffold for this new tool.
Does not state a falsifiable working model of the tool that could be confirmed or revised during the evaluation period itself.
Recommended next steps
Before Day 1, write down your current best guess about how this tool works and what its failure modes are; treat the first tests as evidence that confirms or breaks that guess.
Identify a prior evaluation you have done in a different domain and explicitly map which parts of that structure carry over to this AI tool evaluation and which do not.
Rubric version 0.1.0
Dimension 3 of 6
Communication Architecture
Task: The Same Finding, Three Audiences
4/5
Architects the message so the receiver can act before reaching the end
Rationale
The response demonstrates clear Level 4 behavior. It identifies each receiver concretely, names what each will do with the information, and places the decision-relevant claim at the top for each audience. The CEO section leads with the action and ends with a single ask. The CFO section leads with the dollar figure and explicitly names what the model does not know, an anticipatory move against a predictable pushback. The data science section sequences the working session around the questions the team will need answered to extend the model. What keeps this from Level 5 is the absence of reasoning about the communication environment beyond the receiver: no consideration of channel, no thought about whether the message will be forwarded, no proposal that different artifacts might be needed for different downstream uses.
Evidence from the response
The CEO in five minutes needs the headline and the action.
I would lead with the dollar figure: what is the annual revenue from the customers the model flags as at-risk, what fraction is recoverable, and what does intervention cost.
Acknowledge what the model does not know (we are predicting cancellation, not the reason behind it).
The noun in the first sentence changes: customers, dollars, model.
Strengths
Each audience section leads with the decision-relevant unit for that receiver (action for the CEO, dollars for the CFO, modeling choices for the data team).
Anticipates a predictable CFO pushback by naming what the model does not know.
The closing meta-observation ("the noun in the first sentence changes") names the structural principle being applied, showing architectural self-awareness.
Gaps
Does not consider the communication channel or reading conditions for any of the three audiences (verbal vs. slides vs. working session).
Does not propose different artifacts (a one-pager, a financial model, a technical memo) that might be needed for different downstream uses.
Recommended next steps
Before structuring each message, name the channel and the reading conditions; let those shape format choices.
Ask what happens after each conversation; if the CEO will brief leadership, design the message to survive that forwarding intact.
Rubric version 0.1.0
Dimension 4 of 6
Creative Problem Decomposition
Task: The Retention Problem
3/5
Identifies the load-bearing sub-problem and concentrates effort there
Rationale
The response clearly rises above Level 2 by identifying four sub-problems and explicitly naming which is load-bearing (segmentation) with a concrete reason (it bounds the search space for the others). It explains how segmentation discriminates between the two leadership-team camps. These behaviors match Level 3 directly. The response does not reach Level 4 because it produces only one decomposition along the diagnostic-categories axis and does not test it against an alternative decomposition (by failure mode, by decision-maker, by time horizon), nor identify a non-obvious recombination across sub-problems.
Evidence from the response
Fix retention is a desired outcome, not a problem statement.
I would start with segmentation because it is cheapest and bounds the others.
Concentrated in one manager is a local problem. Spread across the org is structural.
Of the seniors who stayed, what is different? Without comparing stayers and leavers, the data is just half a picture.
Strengths
Reframes the task: identifies that "fix retention" is an outcome, not a problem statement.
Names a load-bearing sub-problem (segmentation) and gives a concrete reason for prioritizing it.
Adds a counterfactual sub-problem (comparing stayers to leavers) that most conventional decompositions omit.
Gaps
Produces only one decomposition; does not generate a second along a different axis to compare what each surfaces.
No non-obvious recombination across sub-problems is identified (e.g., timing plus destination jointly pointing to a specific external event).
Recommended next steps
After building your first decomposition, deliberately construct a second one along a different axis (e.g., by who is making the leave decision) and compare what each framing makes visible.
Look for recombinations: ask whether any two sub-problems, when solved together, reveal something neither reveals alone.
Rubric version 0.1.0
Dimension 5 of 6
Judgment Calibration
Task: The Forecast and the Caveat
3/5
Differentiates confidence across claims but does not test it against evidence
Rationale
The response clearly differentiates confidence across claims and correctly diagnoses that the existing process is miscalibrated (it issues single-number forecasts even though realized results have varied by plus or minus 12 percent). It identifies specific conditions that would make the forecast wrong by 20 percent or more and names a revision trigger. However, the response does not reach Level 4: it does not explain what evidence supports each confidence level it assigns, and it does not distinguish between claims based on direct experience and claims based on extrapolation.
Evidence from the response
A more sophisticated model on Option A would just produce a more confident wrong number with no diagnostic for when it is wrong.
If certain leading indicators move outside a stated range before week six, I would re-run the forecast and tell the CFO.
My forecast should be communicated as a 90 percent confidence band, not a single number.
Strengths
Correctly diagnoses the existing process as overconfident relative to the evidence, grounding the critique in the observed historical variance.
Proposes a concrete revision trigger (leading indicators outside a stated range before week six).
Distinguishes between categories of forecast risk rather than treating all uncertainty as uniform.
Gaps
Does not explain what evidence specifically supports each confidence level it assigns (e.g., why two out of four quarters at plus or minus 12 percent justify a 90 percent confidence band).
Does not distinguish claims based on direct observation from claims based on extrapolation.
Recommended next steps
For each major claim in your assessment, explicitly state what evidence supports that confidence level and what observation would raise or lower it.
Distinguish observations from inferences in your writing; treat the inferences as more revisable.
Rubric version 0.1.0
Dimension 6 of 6
Stakeholder Navigation
Task: The Silent Skeptic
3/5
Reads stated positions and proposes a workable next step
Rationale
The response identifies that the peer leader's shifting concerns suggest they want a different overall outcome rather than a specific data fix. It reads the sponsor's silence as information and proposes a deliberate sequence (engage the skeptic first, then the sponsor). These behaviors match Level 3 anchors. The response stops short of Level 4 because it does not consider whether the peer leader's opposition is situational (specific to this project) or structural (rooted in role incentives), and does not design moves that account for second-order responses or how the proposed sequence shifts the underlying stakeholder field.
Evidence from the response
The shifting substance suggests they are not stuck on a specific concern; they want a different overall outcome.
I would request a 30-minute conversation framed as a working-session check-in.
The sequence matters. Talk to the skeptic first, then to the sponsor.
Strengths
Correctly identifies that the peer leader's shifting concerns signal an underlying interest in a different outcome, not a specific data gap.
Reads the sponsor's silence as information with multiple interpretations rather than treating it as neutral.
Sequences the two conversations explicitly and explains the logic.
Gaps
Does not consider whether the peer leader's opposition is situational or structural, which would change how much investment to make in converting them.
Does not design moves that anticipate the peer leader's likely response to a request for a working-session check-in.
Recommended next steps
After mapping the immediate move, ask explicitly: is this person's opposition situational or structural, and how does that change my approach?
Practice naming what you would do if the skeptic refuses the working-session check-in; the second-order plan is the Level 4 behavior.
Rubric version 0.1.0
How to use this report
SPARK is a developmental feedback instrument. The scores describe capacity behaviors the response demonstrated on the day it was written, against a public, versioned rubric. The report identifies what the response did well, what it did not do, and what to try next.
SPARK is not a hiring credential and not a clinical assessment. Scores are not validated for employment selection decisions and should not be used as the sole basis for hiring, promotion, or termination. See Methodology for the full statement of intended use.
Take the assessment.
The full SPARK assessment runs in about 30 minutes and produces the same report structure you just read, calibrated to your own responses.