The Sopact Intelligence Library
Book 01 of 06 · Chapter 02

Data
Design.

The methodology of what to collect — mixed-method, longitudinal, pre/post — and the design choices that make multi-language, offline, skip-logic collection produce reports your funder will actually read.

DESIGN · what to collect | 3 LENSES · mixed / long / pre-post | THE FIELD · offline · multi-lang | SKIP LOGIC · smart flows | REPORTS · in any language
By Unmesh Sheth · Sopact
§ 2.0 · Where this chapter sits
Where this chapter sits

Designing for the
whole record.

Chapter 01 gave you the spine. This chapter is about feeding it well — deciding what to measure, when to measure it, and how to make every form, every interview, and every document arrive clean.

Chapters in Beyond the Survey

00 · Introduction · 8 pages
01 · Workflow · 22 pages
02 · Data Design · you are here
03 · Data Collection · next chapter
04 · Intelligent Suite · ~18 pages
05 · Actionable Insight · ~18 pages

The library

Book 01 · this book
Beyond the Survey
The foundational field guide — methodology for the AI era.
Book 02 · industry guide
Application Management
Pitch comps, fellowships, scholarships, accelerators.
Book 03 · industry guide
Grant Intelligence
For program officers and foundation teams.
Book 04 · industry guide
Impact Intelligence
Portfolio outcomes with 5 Dimensions and IRIS+.
Book 05 · industry guide
Training Intelligence
Learner outcomes from enrollment to wage gain.
Book 06 · industry guide
Nonprofit Programs
One unified intelligence layer across many programs.
CHAPTER · 02

Data
Design.

Three design lenses, every choice made before a respondent ever touches a form. Plus the field considerations — offline, skip logic, multi-language — that decide whether your report covers everyone or just the easy half.

What you'll learn
  • 01 · Why design beats collection-in-the-moment
  • 02 · The three lenses: mixed-method · longitudinal · pre/post
  • 03 · Designing for the field — offline, skip logic, multi-language
  • 04 · A fellowship measurement plan, worked end-to-end
Time to read
14 min
17 pages · 28 illustrations
§ 2.1 · Why design
Chapter 02 · §2.1

Most reports fail
at the design phase.

By the time you're staring at an export trying to make sense of 450 rows, the report has already been decided — for better or worse. The architecture of clean reporting is built upstream, in the design choices nobody saw you make.

The "we'll figure it out" path
survey v1 → survey v2 → interview → follow-up → post-event retro
Six instruments, no shared spine. Pre-program in one tool, follow-up in another. Match by hand later. Headers don't reconcile. Six weeks of cleanup on the back end.
The "designed first" path
T0 baseline → T1 midpoint → T2 endpoint → T3 +6mo · one persistent participant_id · same fields · same calc · same dictionary → report writes itself at T3
Four touchpoints, one ID, one Dictionary. Every wave links automatically. The report exists as the architecture, ready the morning the last response closes. No reconstruction.

Report quality is decided upstream. No amount of editorial polish after collection can recover evidence the architecture never captured.

§ 2.2 · Three lenses
Chapter 02 · §2.2

Every measurement plan,
three design lenses.

Before you pick instruments, you make three choices. They aren't methods — they're lenses. Every credible measurement design is some combination of them.

LENS 01

Mixed-method

Numbers + stories in the same record. Numbers tell you what changed. Stories tell you why and for whom.

LENS 02

Longitudinal

Same people, multiple moments. The only way to detect real change rather than a snapshot of who happened to be in the room.

LENS 03

Pre / post

Two waves, one delta. The cleanest way to measure short-term change — as long as you don't overclaim what caused it.

Most credible designs use two lenses at once. Pre/post + mixed-method is the working baseline. Longitudinal + mixed-method is the gold standard. All three together is what the next ten pages teach you.

§ 2.3 · Mixed-method
Lens 01 Mixed-method

Numbers tell you what.
Stories tell you why.

Quantitative data answers "how much changed?" Qualitative answers "what changed and for whom?" Neither alone is sufficient. The trick is keeping both in the same record — joined by participant ID — so a single response can be read both as a number and as a quote.

QUANT · Likert scales · rubric scores · counts · rates · demographics
QUAL · open responses · transcripts · case notes · field photos
→ the joined story, on participant_id
EXAMPLE
Confidence × test score

Rubric score (quant) joined to "tell us about a moment when you felt stuck" (qual). Was the score real or rote?

EXAMPLE
Wage gain × narrative

$1,100/wk after placement (quant) joined to "how has work changed your week?" (qual). Wage gain ≠ life gain — these tell you both.

EXAMPLE
NPS × open-ended

Promoter score = 9 (quant) joined to "what would you change?" (qual). The number is the headline; the quote is the meaning.
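The join behind all three examples is a merge on a shared key. A minimal sketch in Python, assuming invented participant IDs and field names (`participant_id`, `rubric_score`, `open_response` are illustrative, not a fixed Sopact schema):

```python
# Two halves of the same record, keyed by the same participant ID.
# All IDs, scores, and quotes here are invented for illustration.
quant = {
    "p-001": {"rubric_score": 4, "nps": 9},
    "p-002": {"rubric_score": 2, "nps": 6},
}
qual = {
    "p-001": {"open_response": "I felt stuck until the mentor session."},
    "p-002": {"open_response": "The pace was too fast for me."},
}

def joined_story(pid):
    """Read one response as both a number and a quote."""
    return {"participant_id": pid, **quant.get(pid, {}), **qual.get(pid, {})}

for pid in quant:
    row = joined_story(pid)
    print(row["participant_id"], row["rubric_score"], "·", row["open_response"])
```

The design choice this encodes: the number and the quote never live in separate files, so "was the score real or rote?" is a lookup, not a reconciliation project.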

§ 2.4 · Longitudinal
Lens 02 Longitudinal

Same people.
Multiple moments.

Most "impact" data is a snapshot. Longitudinal design tracks the same individuals across time — and only that lets you separate program effect from "who happened to be in the room this week."

T0 · Baseline (enrollment): survey + interview, demographics + goals
T1 · Mid-program (week 6): pulse check, at-risk signals
T2 · Endpoint (graduation): exit survey, capstone reflection
T3 · +6 months: retention check + wage / placement
One participant_id across all four waves.

The attrition trap

Longitudinal designs lose people. Half your T0 cohort may not respond at T3. Plan for it: track who drops, compare their T0 records to those who stayed, and state each wave's response rate in the report rather than glossing over it.

§ 2.5 · Pre / post
Lens 03 Pre / post

Two waves.
One honest delta.

The cheapest credible measurement is a pre/post survey of the same people on the same instrument. It's the ground floor of measurement — but the floor is important, and most teams skip it.

PRE · T0: baseline mean 2.8 (5-point confidence scale, 47 respondents)
12 weeks · Δ +1.4
POST · T1: post-program mean 4.2 (same instrument, same 47 participants)

DESIGN DISCIPLINE

Three rules to keep pre/post honest

  1. Same instrument at T0 and T1. Changed wording = different measurement.
  2. Same individuals. Cohort-level pre means matched to cohort-level post means is not pre/post — it's two cross-sections.
  3. Don't claim causation without a comparison group. You measured change; you didn't prove the program caused it.
§ 2.6 · Designing for the field
Chapter 02 · §2.6

Field conditions decide
who's in your data.

The lenses tell you what to measure. The field tells you whether anyone can answer. Three design choices, made before a single form goes out, decide whether your sample reflects your program — or just the easy half.

CHOICE 01

Offline

Will your respondents always have connectivity? Rural communities, field staff in low-bandwidth settings, refugee camps, school visits — connectivity can't be assumed.

  • Capture on device, sync when connected
  • Photos + voice memos as data, not attachments
  • Persistent ID survives the offline session
CHOICE 02

Skip logic

Will respondents see questions that don't apply? A 60-question form becomes a 12-question form for any individual respondent if you branch correctly — and completion rates triple.

  • AND/OR conditions on any prior answer
  • Show/hide sections, not just questions
  • Validation rules per branch path
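A branch rule of this kind is just a small predicate over prior answers. A minimal sketch, where the question names and the rule encoding are invented for illustration (this is not Sopact's actual rule syntax):

```python
# Rule = (operator, [(field, required_value), ...]); all names invented.
def section_visible(rule, answers):
    """Decide whether a form section renders, given answers so far."""
    op, conds = rule
    checks = [answers.get(field) == value for field, value in conds]
    return all(checks) if op == "and" else any(checks)

answers = {"employed": "no", "caregiver": "yes", "region": "east"}

# Show the childcare-barriers section only to unemployed caregivers (AND);
# show the regional-services section to east OR north respondents (OR).
childcare_rule = ("and", [("employed", "no"), ("caregiver", "yes")])
regional_rule = ("or", [("region", "east"), ("region", "north")])

print(section_visible(childcare_rule, answers))  # True
print(section_visible(regional_rule, answers))   # True
```

Evaluated per respondent, rules like these are what shrink a 60-question instrument to the dozen questions that actually apply.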
CHOICE 03

Multi-language

Will your stakeholders speak the language of your instrument? A form that requires English filters out half your population without you realizing.

  • Collect in any language · 40+ supported
  • AI analysis in source language
  • Reports in funder's language

All three are design choices — not collection mechanics. Decide them before you build your first form. The next page goes deep on the most under-considered: multi-language, end-to-end.

§ 2.7 · Multi-language end-to-end
Chapter 02 · §2.7

Collect in Swahili.
Analyze in English.
Report in Portuguese.

Most teams accept English as the limiting factor and shrink their sample to match. Designed correctly, language doesn't have to be a constraint at all: your instrument, your AI prompts, and your output reports can each live in different languages without anyone translating by hand.

THREE INDEPENDENT LAYERS · COMPOSE ANY WAY
1

Collection

The form is rendered in the respondent's preferred language. Skip logic, validation messages, and even error states all translate.

EN · "What barriers did you face?"
SW · "Ulikabili changamoto gani?"
FR · "Quels obstacles avez-vous rencontrés?"
2

Prompts

AI prompts read responses in their original language — not after a lossy translation step. Themes and codes come out structured.

PROMPT (any UI language) · Extract barrier themes from the response in its source language. Tag with: transport · childcare · stigma · cost · info.
RESPONSE (Swahili) → tag: childcare · transport
3

Reports

The funder reads it in their language. The community partner reads it in theirs. The same evidence, generated three times from the same dataset.

REPORT v1 (EN) · "Cohort confidence rose 38%…"
REPORT v2 (PT) · "A confiança da coorte subiu 38%…"
REPORT v3 (FR) · "La confiance de la cohorte a augmenté de 38%…"

The end-to-end flow

Swahili response → analyzed in source → English structured tags → Portuguese funder report

40+ languages supported on collection. Any combination on output. The participant never sees a translated question they didn't ask for. The funder never reads a translated quote without source-language verification.

§ 2.8 · Worked example
Chapter 02 · §2.8

A fellowship
measurement plan.
From design choice to final report.

A global fellowship: 80 fellows, 18 countries, 6 working languages, 12 months of programming. Watch how the three lenses + three field choices compose into one coherent measurement plan.

COHORT · 80 fellows, global
COUNTRIES · 18, across 4 continents
LANGUAGES · 6 (EN · ES · FR · PT · SW · AR)
PROGRAM · 12 mo, +6 mo follow-up
01 · Lens: Longitudinal + mixed-method
Four waves (T0/T1/T2/T3). Each wave = rubric + interview, joined on fellow_id.
T0 intake · T1 month-4 · T2 month-12 · T3 +6mo

02 · Lens: Pre/post on confidence rubric
Same 8-dimension instrument at T0 and T2. The delta is the headline.
8-dim rubric · 1–5 scale · same instrument

03 · Field: Offline-first for South Sudan + Yemen cohort members
Captured on phone, synced when connected. Photos + voice memos as supporting data.
8 of 80 fellows · offline-capable

04 · Field: Skip logic by region + program track
Six tracks × five regions = 30 paths. No fellow sees more than 14 questions.
60Q form → ≤14Q per fellow

05 · Field: Multi-language collection + reporting
Collect in 6 languages. Analyze in source. Reports: EN for board, FR for francophone partners, AR for regional convening.
6 collect langs · 3 report langs
The result

One measurement plan, every design choice made before recruitment opens. At T2 + 6 months, the final report writes itself from the accumulated record — in English for the board, French for partners, Arabic for the regional convening. Every quote in every report sources back to the language it was given in.

§ 2.9 · Gallery
Chapter 02 · §2.9

Five design archetypes.
One spine.

Most measurement plans match one of five archetypes — or a combination of two. Recognizing yours short-cuts the design phase from weeks to hours.

When each archetype fits

Pre/post cohort — your default when the program has a clear start and end, and you can measure the same individuals twice. Most workforce-training, fellowship, and skills programs land here.
Longitudinal w/ attrition — when you need to see effects months after the program ends. Plan for attrition by design: at T3 you'll have 60–80% of T0.
Qual-primary mixed — participatory, ethnographic, or community-led evaluations where stories carry more weight than scales. Numbers exist as triangulation, not headline.
Treatment + control — when you need to claim causation. Higher rigor, higher cost. Most impact investors live here.

The next two pages walk through the two most common archetypes in detail. The others get full treatment in their domain-specific books.

§ 2.9.1 · Pre/post cohort
Archetype 01 of 05

Pre / post cohort.

The most common archetype for cohort-based programs. Same individuals, same instrument, two waves. The delta is the headline; the qual is the explanation.

Best for
Cohort programs · 3–12 mo
Sample
20–200 participants
Min waves
2 (T0 + T1)
THE SHAPE
T0 · Day 1: rubric ×8 · open: goals · demographics
T1 · Final day: rubric ×8 · open: change · capstone reflection
12 weeks · same instrument · same fellows → Δ confidence · Δ skill · Δ network
Design rules
  • Identical instrument at T0 and T1. Any wording change invalidates the comparison.
  • Same individuals. Cohort-mean change ≠ individual change. Link by participant_id.
  • Mix quant + qual at both waves. Quant gives you the delta; qual gives you the why.
Watch-outs
  • Don't claim causation without a comparison group. You measured change.
  • Response shift bias — what "3" means at T0 may differ from "3" at T1. Anchor with examples.
  • Selection effects — your sample is who finished, not who started. Report both.
§ 2.9.2 · Longitudinal w/ attrition
Archetype 02 of 05

Longitudinal · w/ attrition.

When you need to see post-program effects — wage gain at +6mo, retention at +12mo, civic engagement at +24mo — you design for the long view. And you design for the fact that not everyone will reply.

Best for
Programs with downstream outcomes
Waves
3–5 (T0 → T3+)
Expected attrition
20–40% by T3
THE SHAPE · 4 WAVES, NARROWING SAMPLE
T0 Intake: 100 → T1 Mid (mo 6): 92 → T2 End (mo 12): 78 → T3 +6mo: 62
Sample shrinks · design for it · report it
The attrition discipline
  • Track who drops, not just who stays. Compare T3 respondents' T0 records to T3 non-respondents'.
  • Plan capture for the long view — at T0, get an alternate contact and an opt-in for +6 / +12 / +24 month outreach.
  • Report response rate per wave in the final report. Funders trust transparency.
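The first two disciplines reduce to a short computation once every wave shares an ID: compare the T0 records of T3 respondents against non-respondents, and report the rate. A sketch with invented baseline scores:

```python
# Invented cohort: T0 baseline confidence per participant (1–5 scale),
# plus the subset who answered at T3.
t0_baseline = {"p-01": 2, "p-02": 2, "p-03": 3, "p-04": 3,
               "p-05": 3, "p-06": 4, "p-07": 4, "p-08": 5}
t3_respondents = {"p-03", "p-05", "p-06", "p-07", "p-08"}

def mean(xs):
    return sum(xs) / len(xs)

stayed  = [s for p, s in t0_baseline.items() if p in t3_respondents]
dropped = [s for p, s in t0_baseline.items() if p not in t3_respondents]

print(f"T3 response rate: {len(stayed)}/{len(t0_baseline)}")
print(f"T0 mean, T3 respondents:     {mean(stayed):.2f}")
print(f"T0 mean, T3 non-respondents: {mean(dropped):.2f}")
# A gap between the two T0 means is the attrition-bias signal to report:
# in this invented data the dropouts started with lower confidence, so the
# surviving sample overstates the cohort's baseline.
```

The honest final report shows both numbers, not just the mean of whoever was still answering at T3.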
What you gain
  • Sustained-effect evidence — the kind funders renew on, not the kind they doubt.
  • Sleeper effects show up at T3 that weren't visible at T1.
  • A reusable cohort — every renewal cycle starts smarter than the last.
§ 2.10 · The accelerant
Chapter 02 · §2.10

How Sopact Sense
handles all of this.

The design lenses and field choices in this chapter are methodology. Sopact Sense is where they get implemented — without three different tools, three different spreadsheets, or three different consultants.

THE PLATFORM

Sopact Sense

The data design choices in this chapter all run inside one platform. Contacts, Forms, and Relationships keep every wave linked to the same participant.

  • Contacts
    CRM-style cohort with unique IDs assigned at first contact.
  • Forms · 12 question types · skip logic
    AND/OR conditions, advanced validation, save-progress for long forms.
  • Relationships
    Every form ties to Contacts. T0 + T1 + T2 + T3 all link automatically.
  • Multi-language · 40+
    Forms rendered, responses captured, AI analysis in source language.
  • Offline + sync
    Mobile capture, syncs when connected, persistent ID survives.
THE ACCELERANT

Skills

Prepackaged playbooks for data-design decisions. They shorten the time from blank page to a measurement plan that funders trust.

  • { } study-design-advisor
    Recommends the lens combo for your context.
  • { } pre-post-validator
    Checks T0 and T1 instruments are truly identical.
  • { } cohort-balancer
    Flags demographic imbalances at intake.
  • { } instrument-translator
    Renders the form in 40+ languages with branch-aware skip logic preserved.

These built-in Skills run inside Sopact Sense. Your team's custom Skills compose on top.

Why this compounds

The first cohort teaches Sense your instrument. The second cohort starts with the validated version. By cohort five your team isn't designing the measurement plan from scratch — they're starting from the best version of last cohort's plan.

§ 2.11 · Recap + Up Next
Chapter 02 · §2.11

Six lessons
to carry forward.

1

Report quality is decided upstream.

No editorial polish recovers evidence the architecture never captured. Design before you collect.

2

Three lenses. Compose, don't pick one.

Mixed-method + longitudinal + pre/post. Most credible designs use at least two; the gold standard uses all three.

3

Numbers tell what. Stories tell why.

Neither alone is sufficient. Keep both in the same record, joined by participant_id, so a single response reads both as a number and as a quote.

4

The field decides who's in your data.

Offline · skip logic · multi-language — three design choices, made before a respondent touches the form. Get them right or your sample reflects the easy half only.

5

Language is three layers, not one.

Collect in source · analyze in source · report in audience language. Decouple them and you reach everyone without translation lag.

6

Pick an archetype before you build.

Five archetypes cover most measurement plans. Match yours and short-cut design from weeks to hours.

UP NEXT
Chapter 03 · Collection

You've designed the plan. Now you collect — online, offline, from documents, from transcripts — and keep all four channels joined on one participant_id.

03
End of Chapter 02
END OF CHAPTER 02 · BOOK 01

Six books.
One spine.
Built for the AI era.

Design done. Collection next. Then transformation. Then reports. Pick the industry guide that matches your world — or continue straight to Chapter 03.

BOOK 01 · Beyond the Survey · You are here
BOOK 02 · Application Management · Industry guide
BOOK 03 · Grant Intelligence · Industry guide
BOOK 04 · Impact Intelligence · Industry guide
BOOK 05 · Training Intelligence · Industry guide
BOOK 06 · Nonprofit Programs · Industry guide

"Report quality is decided upstream. The design phase is where the report already exists — or doesn't."

THE SOPACT INTELLIGENCE LIBRARY · 2026