The methodology of what to collect — mixed-method, longitudinal, pre/post — and the design choices that make multi-language, offline, skip-logic collection produce reports your funder will actually read.
By Unmesh Sheth · Sopact
§ 2.0 · Where this chapter sits
Where this chapter sits
Designing for the whole record.
Chapter 01 gave you the spine. This chapter is about feeding it well —
deciding what to measure, when to measure it, and how to make every form,
every interview, and every document arrive clean.
Chapters in Beyond the Survey
00 · Introduction · 8 pages
01 · Workflow · 22 pages
02 · Data Design · you are here
03 · Data Collection · next chapter
04 · Intelligent Suite · ~18 pages
05 · Actionable Insight · ~18 pages
The library
Book 01 · this book
Beyond the Survey
The foundational field guide — methodology for the AI era.
One unified intelligence layer across many programs.
2
CHAPTER · 02
Data Design.
Three design lenses, every choice made before a respondent ever touches
a form. Plus the field considerations — offline, skip logic, multi-language —
that decide whether your report covers everyone or just the easy half.
What you'll learn
01. Why design beats collection-in-the-moment
02. The three lenses: mixed-method · longitudinal · pre/post
03. Designing for the field — offline, skip logic, multi-language
04. A fellowship measurement plan, worked end-to-end
Time to read
14 min
17 pages · 28 illustrations
3
§ 2.1 · Why design
Chapter 02 · §2.1
Most reports fail at the design phase.
By the time you're staring at an export trying to make sense of 450 rows,
the report has already been decided — for better or worse. The architecture
of clean reporting is built upstream, in the design choices nobody saw you make.
The "we'll figure it out" path
Six instruments, no shared spine. Pre-program in one tool, follow-up in
another. Match by hand later. Headers don't reconcile. Six weeks of cleanup
on the back end.
The "designed first" path
Four touchpoints, one ID, one Dictionary. Every wave links automatically.
The report exists as the architecture, ready the morning the last
response closes. No reconstruction.
Report quality is decided upstream. No amount of editorial polish
after collection can recover evidence the architecture never captured.
4
§ 2.2 · Three lenses
Chapter 02 · §2.2
Every measurement plan, three design lenses.
Before you pick instruments, you make three choices. They aren't methods —
they're lenses. Every credible measurement design is some combination of them.
LENS 01
Mixed-method
Numbers + stories in the same record. Numbers tell you what changed.
Stories tell you why and for whom.
LENS 02
Longitudinal
Same people, multiple moments. The only way to detect real change
rather than a snapshot of who happened to be in the room.
LENS 03
Pre / post
Two waves, one delta. The cleanest way to measure short-term change —
as long as you don't overclaim what caused it.
Most credible designs use two lenses at once.
Pre/post + mixed-method is the working baseline. Longitudinal + mixed-method
is the gold standard. All three together is what the next ten pages teach you.
5
§ 2.3 · Mixed-method
Lens 01 · Mixed-method
Numbers tell you what. Stories tell you why.
Quantitative data answers "how much changed?" Qualitative answers "what changed
and for whom?" Neither alone is sufficient. The trick is keeping both in the
same record — joined by participant ID — so a single response can be read
both as a number and as a quote.
EXAMPLE
Confidence × test score
Rubric score (quant) joined to "tell us about a moment when you felt stuck"
(qual). Was the score real or rote?
EXAMPLE
Wage gain × narrative
$1,100/wk after placement (quant) joined to "how has work changed your week?"
(qual). Wage gain ≠ life gain — these tell you both.
EXAMPLE
NPS × open-ended
Promoter score = 9 (quant) joined to "what would you change?" (qual).
The number is the headline; the quote is the meaning.
6
§ 2.4 · Longitudinal
Lens 02 · Longitudinal
Same people. Multiple moments.
Most "impact" data is a snapshot. Longitudinal design tracks the same individuals
across time — and only that lets you separate program effect from "who happened
to be in the room this week."
⚠
The attrition trap
Longitudinal designs lose people. Half your T0 cohort may not respond at T3.
Plan for it — track who drops, keep their T0 records, and report response
rates per wave so readers can weigh the non-response for themselves.
7
§ 2.5 · Pre / post
Lens 03 · Pre / post
Two waves. One honest delta.
The cheapest credible measurement is a pre/post survey of the same people on
the same instrument. It's the ground floor of measurement — but the floor is
important, and most teams skip it.
PRE · T0 · 5-point confidence scale, 47 respondents
POST · T1 · same instrument, same 47 participants
DELTA · +1.4
DESIGN DISCIPLINE
Three rules to keep pre/post honest
Same instrument at T0 and T1. Changed wording = different measurement.
Same individuals. Comparing a cohort-level pre mean to a cohort-level post mean is not pre/post — it's two cross-sections.
Don't claim causation without a comparison group. You measured change; you didn't prove the program caused it.
8
§ 2.6 · Designing for the field
Chapter 02 · §2.6
Field conditions decide who's in your data.
The lenses tell you what to measure. The field tells you whether anyone can answer.
Three design choices, made before a single form goes out, decide whether your
sample reflects your program — or just the easy half.
CHOICE 01
Offline
Will your respondents always have connectivity? Rural communities,
field staff in low-bandwidth settings, refugee camps, school visits — connectivity
can't be assumed.
· Capture on device, sync when connected
· Photos + voice memos as data, not attachments
· Persistent ID survives the offline session
CHOICE 02
Skip logic
Will respondents see questions that don't apply? A 60-question form
becomes a 12-question form for any individual respondent if you branch
correctly — and completion rates triple.
· AND/OR conditions on any prior answer
· Show/hide sections, not just questions
· Validation rules per branch path
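The branching above reduces to evaluating AND/OR conditions against prior answers. A minimal sketch — field names and values are illustrative, not any particular form tool's API:

```python
def show_question(conditions, answers, mode="AND"):
    """Decide whether a question is shown.
    conditions: list of (field, expected_value) pairs over prior answers."""
    checks = [answers.get(field) == expected for field, expected in conditions]
    return all(checks) if mode == "AND" else any(checks)

answers = {"employed": "yes", "region": "rural"}

# Asked only of employed rural respondents (AND).
print(show_question([("employed", "yes"), ("region", "rural")], answers))
# Asked if either condition holds (OR).
print(show_question([("employed", "no"), ("region", "rural")], answers, "OR"))
```

Section-level show/hide is the same logic applied once per section instead of once per question.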
CHOICE 03
Multi-language
Will your stakeholders speak the language of your instrument? A form that
requires English can filter out half your population before you notice.
· Collect in any language · 40+ supported
· AI analysis in source language
· Reports in funder's language
All three are design choices — not collection mechanics.
Decide them before you build your first form. The next page goes deep on
the most under-considered: multi-language, end-to-end.
9
§ 2.7 · Multi-language end-to-end
Chapter 02 · §2.7
Collect in Swahili. Analyze in English. Report in Portuguese.
Most teams accept English as the limiting factor and shrink their sample to
match. Designed correctly, language doesn't have to be a constraint at all:
your instrument, your AI prompts, and your output reports can each live in
different languages without anyone translating by hand.
THREE INDEPENDENT LAYERS · COMPOSE ANY WAY
1
Collection
The form is rendered in the respondent's preferred language. Skip logic,
validation messages, and even error states all translate.
EN · "What barriers did you face?"
SW · "Ulikabili changamoto gani?"
FR · "Quels obstacles avez-vous rencontrés?"
2
Prompts
AI prompts read responses in their original language — not after a
lossy translation step. Themes and codes come out structured.
PROMPT (any UI lang)
Extract barrier themes from the response in its source language. Tag with: transport · childcare · stigma · cost · info.
RESPONSE (Swahili) → tag: childcare · transport
3
Reports
The funder reads it in their language. The community partner reads it in
theirs. The same evidence, generated three times from the same dataset.
REPORT v1 (EN) · "Cohort confidence rose 38%…"
REPORT v2 (PT) · "A confiança da coorte subiu 38%…"
REPORT v3 (FR) · "La confiance de la cohorte a augmenté 38%…"
The end-to-end flow
Swahili response → analyzed in source → English structured tags → Portuguese funder report
40+ languages supported on collection. Any combination on output. The
participant never sees a translated question they didn't ask for.
The funder never reads a translated quote without source-language verification.
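The three-layer decoupling can be sketched as data: the response keeps its source language, analysis emits language-neutral tags, and each report renders the same evidence per audience. All strings, tags, and field names below are illustrative:

```python
# One response: text stays in its source language, tags are neutral codes.
response = {
    "participant_id": "P-31",
    "source_lang": "sw",
    "text": "Ulezi wa watoto ulinizuia kuhudhuria.",
    "tags": ["childcare"],  # language-neutral codes from analysis
}

# Illustrative per-audience renderings of the same headline.
headline = {
    "en": "Top barrier: childcare",
    "pt": "Principal barreira: cuidado infantil",
    "fr": "Obstacle principal : garde d'enfants",
}

def render_report(response, audience_lang):
    """Same evidence, rendered per audience; the quote stays
    traceable to the language it was given in."""
    return {
        "headline": headline[audience_lang],
        "evidence_tags": response["tags"],
        "quote_source_lang": response["source_lang"],
    }

print(render_report(response, "pt"))
```

Because the tags are language-neutral, any output language can be added later without retranslating the dataset.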
10
§ 2.8 · Worked example
Chapter 02 · §2.8
A fellowship measurement plan. From design choice to final report.
A global fellowship: 80 fellows, 18 countries, 6 working languages, 12 months
of programming. Watch how the three lenses + three field choices compose into
one coherent measurement plan.
COHORT
80
fellows · global
COUNTRIES
18
across 4 continents
LANGUAGES
6
EN · ES · FR · PT · SW · AR
PROGRAM
12mo
+6mo follow-up
01
Lens · Longitudinal + Mixed-method
Four waves (T0/T1/T2/T3). Each wave = rubric + interview, joined on fellow_id.
T0 intake · T1 month-4 · T2 month-12 · T3 +6mo
02
Lens · Pre/post on confidence rubric
Same 8-dimension instrument at T0 and T2. The delta is the headline.
8-dim rubric · 1–5 scale · same instrument
03
Field · Offline-first for South Sudan + Yemen cohort members
Captured on phone, synced when connected. Photos + voice memos as supporting data.
8 of 80 fellows · offline-capable
04
Field · Skip logic by region + program track
Six tracks × five regions = 30 paths. No fellow sees more than 14 questions.
60Q form → ≤14Q per fellow
05
Field · Multi-language collection + reporting
Collect in 6 languages. Analyze in source. Reports: EN for board, FR for francophone partners, AR for regional convening.
6 collect langs · 3 report langs
The result
One measurement plan, every design choice made before recruitment opens.
At T2 + 6 months, the final report writes itself from the accumulated record —
in English for the board, French for partners, Arabic for the regional convening.
Every quote in every report sources back to the language it was given in.
11
§ 2.9 · Gallery
Chapter 02 · §2.9
Five design archetypes. One spine.
Most measurement plans match one of five archetypes — or a combination of two.
Recognizing yours short-cuts the design phase from weeks to hours.
Pre / post cohort · p. 13
Longitudinal w/ attrition · p. 14
Qual-primary mixed · book 05
Treatment + control · book 03
Continuous pulse · book 04
When each archetype fits
Pre/post cohort — your default when the program has a clear start and end,
and you can measure the same individuals twice. Most workforce-training, fellowship,
and skills programs land here.
Longitudinal w/ attrition — when you need to see effects months after
the program ends. Plan for attrition by design: at T3 you'll have 60–80% of T0.
Qual-primary mixed — participatory, ethnographic, or community-led
evaluations where stories carry more weight than scales. Numbers exist as
triangulation, not headline.
Treatment + control — when you need to claim causation. Higher
rigor, higher cost. Most impact investors live here.
The next two pages walk through the two most common archetypes in detail. The
others get full treatment in their domain-specific books.
12
§ 2.9.1 · Pre/post cohort
Archetype 01 of 05
Pre / post cohort.
The most common archetype for cohort-based programs. Same individuals,
same instrument, two waves. The delta is the headline; the qual is the
explanation.
Best for
Cohort programs · 3–12 mo
Sample
20–200 participants
Min waves
2 (T0 + T1)
THE SHAPE
Design rules
Identical instrument at T0 and T1. Any wording change invalidates the comparison.
Same individuals. Cohort-mean change ≠ individual change. Link by participant_id.
Mix quant + qual at both waves. Quant gives you the delta; qual gives you the why.
Watch-outs
Don't claim causation without a comparison group. You measured change.
Response shift bias — what "3" means at T0 may differ from "3" at T1. Anchor with examples.
Selection effects — your sample is who finished, not who started. Report both.
13
§ 2.9.2 · Longitudinal w/ attrition
Archetype 02 of 05
Longitudinal · w/ attrition.
When you need to see post-program effects — wage gain at +6mo, retention at
+12mo, civic engagement at +24mo — you design for the long view. And you
design for the fact that not everyone will reply.
Best for
Programs with downstream outcomes
Waves
3–5 (T0 → T3+)
Expected attrition
20–40% by T3
THE SHAPE · 4 WAVES, NARROWING SAMPLE
The attrition discipline
Track who drops, not just who stays. Compare T3 respondents' T0 records to T3 non-respondents'.
Plan capture for the long view — at T0, get an alternate contact and an opt-in for +6 / +12 / +24 month outreach.
Report response rate per wave in the final report. Funders trust transparency.
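The discipline above can be sketched directly: per-wave response rates, plus a comparison of T0 baselines between T3 respondents and non-respondents. IDs and scores are illustrative:

```python
# Illustrative T0 baseline scores and per-wave respondent sets.
t0_baseline = {"P-01": 2, "P-02": 4, "P-03": 3, "P-04": 2, "P-05": 5}
responded = {
    "T1": {"P-01", "P-02", "P-03", "P-04"},
    "T2": {"P-01", "P-02", "P-04"},
    "T3": {"P-02", "P-04"},
}

def mean(xs):
    return sum(xs) / len(xs)

# Response rate per wave, for the transparency table in the final report.
for wave, ids in responded.items():
    print(f"{wave}: {len(ids)}/{len(t0_baseline)} responded")

# Compare T0 baselines: who stayed to T3 vs. who dropped.
t3_stayers = responded["T3"]
t3_dropped = set(t0_baseline) - t3_stayers
print("T0 mean, stayers:", mean([t0_baseline[p] for p in t3_stayers]))
print("T0 mean, dropped:", mean([t0_baseline[p] for p in t3_dropped]))
```

If the two T0 means diverge sharply, your T3 findings describe a different population than your T0 cohort — and the report should say so.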
What you gain
Sustained-effect evidence — the kind funders renew on, not the kind they doubt.
Sleeper effects show up at T3 that weren't visible at T1.
A reusable cohort — every renewal cycle starts smarter than the last.
14
§ 2.10 · The accelerant
Chapter 02 · §2.10
How Sopact Sense handles all of this.
The design lenses and field choices in this chapter are methodology. Sopact
Sense is where they get implemented — without three different tools, three
different spreadsheets, or three different consultants.
THE PLATFORM
Sopact Sense
The data design choices in this chapter all run inside one platform. Contacts,
Forms, and Relationships keep every wave linked to the same participant.
Contacts
CRM-style cohort with unique IDs assigned at first contact.
Forms · 12 question types · skip logic
AND/OR conditions, advanced validation, save-progress for long forms.
Relationships
Every form ties to Contacts. T0 + T1 + T2 + T3 all link automatically.
Multi-language · 40+
Forms rendered, responses captured, AI analysis in source language.
Offline + sync
Mobile capture, syncs when connected, persistent ID survives.
THE ACCELERANT
Skills
Prepackaged playbooks for data-design decisions. They shorten the time from blank
page to a measurement plan that funders trust.
{ }study-design-advisor
Recommends the lens combo for your context.
{ }pre-post-validator
Checks T0 and T1 instruments are truly identical.
{ }cohort-balancer
Flags demographic imbalances at intake.
{ }instrument-translator
Renders the form in 40+ languages with branch-aware skip logic preserved.
These built-in Skills run inside Sopact Sense. Your team's custom Skills compose on top.
↑
Why this compounds
The first cohort teaches Sense your instrument. The second cohort starts
with the validated version. By cohort five your team isn't designing the
measurement plan from scratch — they're starting from the best version of last cohort's plan.
15
§ 2.11 · Recap + Up Next
Chapter 02 · §2.11
Six lessons to carry forward.
1
Report quality is decided upstream.
No editorial polish recovers evidence the architecture never captured.
Design before you collect.
2
Three lenses. Compose, don't pick one.
Mixed-method + longitudinal + pre/post. Most credible designs use at least
two; the gold standard uses all three.
3
Numbers tell what. Stories tell why.
Neither alone is sufficient. Keep both in the same record, joined by
participant_id, so a single response reads both as a number and as a quote.
4
The field decides who's in your data.
Offline · skip logic · multi-language — three design choices, made before
a respondent touches the form. Get them right or your sample reflects the
easy half only.
5
Language is three layers, not one.
Collect in source · analyze in source · report in audience language. Decouple
them and you reach everyone without translation lag.
6
Pick an archetype before you build.
Five archetypes cover most measurement plans. Match yours and short-cut
design from weeks to hours.
UP NEXT
Chapter 03 · Collection
You've designed the plan. Now you collect — online, offline, from documents,
from transcripts — and keep all four channels joined on one participant_id.
03
16
End of Chapter 02
END OF CHAPTER 02 · BOOK 01
Six books. One spine. Built for the AI era.
Design done. Collection next. Then transformation. Then reports. Pick the
industry guide that matches your world — or continue straight to Chapter 03.
BOOK 01
Beyond the Survey
You are here
BOOK 03
Grant Management
Industry guide
BOOK 04
Impact Investment
Industry guide
BOOK 05
Workforce Training
Industry guide
BOOK 05
Nonprofit Programs
Industry guide
BOOK 06
Application Management
Industry guide
"Report quality is decided upstream. The design phase is where the report
already exists — or doesn't."