The Sopact Intelligence Library
Book 01 of 06 · Chapter 02

Data
Design.

The methodology of what to collect — mixed-method, longitudinal, pre/post — and the design choices that make multi-language, offline, skip-logic collection produce reports your funder will actually read.

DESIGN · what to collect | 3 LENSES · mixed / long / pre-post | THE FIELD · offline · multi-lang | SKIP LOGIC · smart flows | REPORTS · in any language
By Unmesh Sheth · Sopact
§ 2.0 · Where this chapter sits
Where this chapter sits

Designing for the
whole record.

Chapter 01 gave you the spine. This chapter is about feeding it well — deciding what to measure, when to measure it, and how to make every form, every interview, and every document arrive clean.

Chapters in Beyond the Survey

00 · Introduction · 8 pages
01 · Workflow · 22 pages
02 · Data Design · you are here
03 · Data Collection · next chapter
04 · Intelligent Suite · ~18 pages
05 · Actionable Insight · ~18 pages

The library

Book 01 · this book
Beyond the Survey
The foundational field guide — methodology for the AI era.
Book 02 · industry guide
Application Management
Pitch comps, fellowships, scholarships, accelerators.
Book 03 · industry guide
Grant Intelligence
For program officers and foundation teams.
Book 04 · industry guide
Impact Intelligence
Portfolio outcomes with 5 Dimensions and IRIS+.
Book 05 · industry guide
Training Intelligence
Learner outcomes from enrollment to wage gain.
Book 06 · industry guide
Nonprofit Programs
One unified intelligence layer across many programs.
CHAPTER · 02

Data
Design.

Three design lenses, every choice made before a respondent ever touches a form. Plus the field considerations — offline, skip logic, multi-language — that decide whether your report covers everyone or just the easy half.

What you'll learn
  • 01 · Why design beats collection-in-the-moment
  • 02 · The three lenses: mixed-method · longitudinal · pre/post
  • 03 · Designing for the field — offline, skip logic, multi-language
  • 04 · A fellowship measurement plan, worked end-to-end
Time to read
14 min
17 pages · 28 illustrations
§ 2.1 · Why design
Chapter 02 · §2.1

Most reports fail
at the design phase.

By the time you're staring at an export trying to make sense of 450 rows, the report has already been decided — for better or worse. The architecture of clean reporting is built upstream, in the design choices nobody saw you make.

The "we'll figure it out" path
survey v1 → survey v2 → interview → follow-up → post-event retro
Six instruments, no shared spine. Pre-program in one tool, follow-up in another. Match by hand later. Headers don't reconcile. Six weeks of cleanup on the back end.
The "designed first" path
T0 baseline → T1 midpoint → T2 endpoint → T3 +6mo · one persistent participant_id · same fields · same calc · same dictionary → report writes itself at T3
Four touchpoints, one ID, one Dictionary. Every wave links automatically. The report exists as the architecture, ready the morning the last response closes. No reconstruction.

Report quality is decided upstream. No amount of editorial polish after collection can recover evidence the architecture never captured.

§ 2.2 · Three lenses
Chapter 02 · §2.2

Every measurement plan,
three design lenses.

Before you pick instruments, you make three choices. They aren't methods — they're lenses. Every credible measurement design is some combination of them.

LENS 01

Mixed-method

Numbers + stories in the same record. Numbers tell you what changed. Stories tell you why and for whom.

LENS 02

Longitudinal

Same people, multiple moments. The only way to detect real change rather than a snapshot of who happened to be in the room.

LENS 03

Pre / post

Two waves, one delta. The cleanest way to measure short-term change — as long as you don't overclaim what caused it.

Most credible designs use two lenses at once. Pre/post + mixed-method is the working baseline. Longitudinal + mixed-method is the gold standard. All three together is what the next ten pages teach you.

§ 2.3 · Mixed-method
Lens 01 Mixed-method

Numbers tell you what.
Stories tell you why.

Quantitative data answers "how much changed?" Qualitative answers "what changed and for whom?" Neither alone is sufficient. The trick is keeping both in the same record — joined by participant ID — so a single response can be read both as a number and as a quote.

QUANT · Likert scales · rubric scores · counts · rates · demographics
QUAL · open responses · transcripts · case notes · field photos
→ the joined story, on participant_id
EXAMPLE
Confidence × test score

Rubric score (quant) joined to "tell us about a moment when you felt stuck" (qual). Was the score real or rote?

EXAMPLE
Wage gain × narrative

$1,100/wk after placement (quant) joined to "how has work changed your week?" (qual). Wage gain ≠ life gain — these tell you both.

EXAMPLE
NPS × open-ended

Promoter score = 9 (quant) joined to "what would you change?" (qual). The number is the headline; the quote is the meaning.
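The join behind all three examples is a merge on a shared key. A minimal sketch in Python, assuming invented participant IDs and field names (`participant_id`, `rubric_score`, `open_response` are illustrative, not a fixed Sopact schema):

```python
# Two halves of the same record, keyed by the same participant ID.
# All IDs, scores, and quotes here are invented for illustration.
quant = {
    "p-001": {"rubric_score": 4, "nps": 9},
    "p-002": {"rubric_score": 2, "nps": 6},
}
qual = {
    "p-001": {"open_response": "I felt stuck until the mentor session."},
    "p-002": {"open_response": "The pace was too fast for me."},
}

def joined_story(pid):
    """Read one response as both a number and a quote."""
    return {"participant_id": pid, **quant.get(pid, {}), **qual.get(pid, {})}

for pid in quant:
    row = joined_story(pid)
    print(row["participant_id"], row["rubric_score"], "·", row["open_response"])
```

The design choice this encodes: the number and the quote never live in separate files, so "was the score real or rote?" is a lookup, not a reconciliation project.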

§ 2.4 · Longitudinal
Lens 02 Longitudinal

Same people.
Multiple moments.

Most "impact" data is a snapshot. Longitudinal design tracks the same individuals across time — and only that lets you separate program effect from "who happened to be in the room this week."

T0 · Baseline (enrollment): survey + interview, demographics + goals
T1 · Mid-program (week 6): pulse check, at-risk signals
T2 · Endpoint (graduation): exit survey, capstone reflection
T3 · +6 months: retention check + wage / placement
One participant_id across all four waves.

The attrition trap

Longitudinal designs lose people. Half your T0 cohort may not respond at T3. Plan for it: track who drops, compare their T0 records to those who stayed, and state each wave's response rate in the report rather than glossing over it.

§ 2.5 · Pre / post
Lens 03 Pre / post

Two waves.
One honest delta.

The cheapest credible measurement is a pre/post survey of the same people on the same instrument. It's the ground floor of measurement — but the floor is important, and most teams skip it.

PRE · T0: baseline mean 2.8 (5-point confidence scale, 47 respondents)
12 weeks · Δ +1.4
POST · T1: post-program mean 4.2 (same instrument, same 47 participants)

DESIGN DISCIPLINE

Three rules to keep pre/post honest

  1. Same instrument at T0 and T1. Changed wording = different measurement.
  2. Same individuals. Cohort-level pre means matched to cohort-level post means is not pre/post — it's two cross-sections.
  3. Don't claim causation without a comparison group. You measured change; you didn't prove the program caused it.
§ 2.6 · Designing for the field
Chapter 02 · §2.6

Field conditions decide
who's in your data.

The lenses tell you what to measure. The field tells you whether anyone can answer. Three design choices, made before a single form goes out, decide whether your sample reflects your program — or just the easy half.

CHOICE 01

Offline

Will your respondents always have connectivity? Rural communities, field staff in low-bandwidth settings, refugee camps, school visits — connectivity can't be assumed.

  • Capture on device, sync when connected
  • Photos + voice memos as data, not attachments
  • Persistent ID survives the offline session
CHOICE 02

Skip logic

Will respondents see questions that don't apply? A 60-question form becomes a 12-question form for any individual respondent if you branch correctly — and completion rates triple.

  • AND/OR conditions on any prior answer
  • Show/hide sections, not just questions
  • Validation rules per branch path
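A branch rule of this kind is just a small predicate over prior answers. A minimal sketch, where the question names and the rule encoding are invented for illustration (this is not Sopact's actual rule syntax):

```python
# Rule = (operator, [(field, required_value), ...]); all names invented.
def section_visible(rule, answers):
    """Decide whether a form section renders, given answers so far."""
    op, conds = rule
    checks = [answers.get(field) == value for field, value in conds]
    return all(checks) if op == "and" else any(checks)

answers = {"employed": "no", "caregiver": "yes", "region": "east"}

# Show the childcare-barriers section only to unemployed caregivers (AND);
# show the regional-services section to east OR north respondents (OR).
childcare_rule = ("and", [("employed", "no"), ("caregiver", "yes")])
regional_rule = ("or", [("region", "east"), ("region", "north")])

print(section_visible(childcare_rule, answers))  # True
print(section_visible(regional_rule, answers))   # True
```

Evaluated per respondent, rules like these are what shrink a 60-question instrument to the dozen questions that actually apply.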
CHOICE 03

Multi-language

Will your stakeholders speak the language of your instrument? A form that requires English filters out half your population without you realizing.

  • Collect in any language · 40+ supported
  • AI analysis in source language
  • Reports in funder's language

All three are design choices — not collection mechanics. Decide them before you build your first form. The next page goes deep on the most under-considered: multi-language, end-to-end.

§ 2.7 · Multi-language end-to-end
Chapter 02 · §2.7

Collect in Swahili.
Analyze in English.
Report in Portuguese.

Most teams accept English as the limiting factor and shrink their sample to match. Designed correctly, language doesn't have to be a constraint at all: your instrument, your AI prompts, and your output reports can each live in different languages without anyone translating by hand.

THREE INDEPENDENT LAYERS · COMPOSE ANY WAY
1

Collection

The form is rendered in the respondent's preferred language. Skip logic, validation messages, and even error states all translate.

EN · "What barriers did you face?"
SW · "Ulikabili changamoto gani?"
FR · "Quels obstacles avez-vous rencontrés?"
2

Prompts

AI prompts read responses in their original language — not after a lossy translation step. Themes and codes come out structured.

PROMPT (any UI language) · Extract barrier themes from the response in its source language. Tag with: transport · childcare · stigma · cost · info.
RESPONSE (Swahili) → tag: childcare · transport
3

Reports

The funder reads it in their language. The community partner reads it in theirs. The same evidence, generated three times from the same dataset.

REPORT v1 (EN) · "Cohort confidence rose 38%…"
REPORT v2 (PT) · "A confiança da coorte subiu 38%…"
REPORT v3 (FR) · "La confiance de la cohorte a augmenté de 38%…"

The end-to-end flow

Swahili response → analyzed in source → English structured tags → Portuguese funder report

40+ languages supported on collection. Any combination on output. The participant never sees a translated question they didn't ask for. The funder never reads a translated quote without source-language verification.

§ 2.8 · Worked example
Chapter 02 · §2.8

A fellowship
measurement plan.
From design choice to final report.

A global fellowship: 80 fellows, 18 countries, 6 working languages, 12 months of programming. Watch how the three lenses + three field choices compose into one coherent measurement plan.

COHORT · 80 fellows, global
COUNTRIES · 18, across 4 continents
LANGUAGES · 6 (EN · ES · FR · PT · SW · AR)
PROGRAM · 12 mo, +6 mo follow-up
01 · Lens: Longitudinal + mixed-method
Four waves (T0/T1/T2/T3). Each wave = rubric + interview, joined on fellow_id.
T0 intake · T1 month-4 · T2 month-12 · T3 +6mo

02 · Lens: Pre/post on confidence rubric
Same 8-dimension instrument at T0 and T2. The delta is the headline.
8-dim rubric · 1–5 scale · same instrument

03 · Field: Offline-first for South Sudan + Yemen cohort members
Captured on phone, synced when connected. Photos + voice memos as supporting data.
8 of 80 fellows · offline-capable

04 · Field: Skip logic by region + program track
Six tracks × five regions = 30 paths. No fellow sees more than 14 questions.
60Q form → ≤14Q per fellow

05 · Field: Multi-language collection + reporting
Collect in 6 languages. Analyze in source. Reports: EN for board, FR for francophone partners, AR for regional convening.
6 collect langs · 3 report langs
The result

One measurement plan, every design choice made before recruitment opens. At T2 + 6 months, the final report writes itself from the accumulated record — in English for the board, French for partners, Arabic for the regional convening. Every quote in every report sources back to the language it was given in.

§ 2.9 · Gallery
Chapter 02 · §2.9

Five design archetypes.
One spine.

Most measurement plans match one of five archetypes — or a combination of two. Recognizing yours short-cuts the design phase from weeks to hours.

When each archetype fits

Pre/post cohort — your default when the program has a clear start and end, and you can measure the same individuals twice. Most workforce-training, fellowship, and skills programs land here.
Longitudinal w/ attrition — when you need to see effects months after the program ends. Plan for attrition by design: at T3 you'll have 60–80% of T0.
Qual-primary mixed — participatory, ethnographic, or community-led evaluations where stories carry more weight than scales. Numbers exist as triangulation, not headline.
Treatment + control — when you need to claim causation. Higher rigor, higher cost. Most impact investors live here.

The next two pages walk through the two most common archetypes in detail. The others get full treatment in their domain-specific books.

§ 2.9.1 · Pre/post cohort
Archetype 01 of 05

Pre / post cohort.

The most common archetype for cohort-based programs. Same individuals, same instrument, two waves. The delta is the headline; the qual is the explanation.

Best for
Cohort programs · 3–12 mo
Sample
20–200 participants
Min waves
2 (T0 + T1)
THE SHAPE
T0 · Day 1: rubric ×8 · open: goals · demographics
T1 · Final day: rubric ×8 · open: change · capstone reflection
12 weeks · same instrument · same fellows → Δ confidence · Δ skill · Δ network
Design rules
  • Identical instrument at T0 and T1. Any wording change invalidates the comparison.
  • Same individuals. Cohort-mean change ≠ individual change. Link by participant_id.
  • Mix quant + qual at both waves. Quant gives you the delta; qual gives you the why.
Watch-outs
  • Don't claim causation without a comparison group. You measured change.
  • Response shift bias — what "3" means at T0 may differ from "3" at T1. Anchor with examples.
  • Selection effects — your sample is who finished, not who started. Report both.
§ 2.9.2 · Longitudinal w/ attrition
Archetype 02 of 05

Longitudinal · w/ attrition.

When you need to see post-program effects — wage gain at +6mo, retention at +12mo, civic engagement at +24mo — you design for the long view. And you design for the fact that not everyone will reply.

Best for
Programs with downstream outcomes
Waves
3–5 (T0 → T3+)
Expected attrition
20–40% by T3
THE SHAPE · 4 WAVES, NARROWING SAMPLE
T0 Intake: 100 → T1 Mid (mo 6): 92 → T2 End (mo 12): 78 → T3 +6mo: 62
Sample shrinks · design for it · report it
The attrition discipline
  • Track who drops, not just who stays. Compare T3 respondents' T0 records to T3 non-respondents'.
  • Plan capture for the long view — at T0, get an alternate contact and an opt-in for +6 / +12 / +24 month outreach.
  • Report response rate per wave in the final report. Funders trust transparency.
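The first two disciplines reduce to a short computation once every wave shares an ID: compare the T0 records of T3 respondents against non-respondents, and report the rate. A sketch with invented baseline scores:

```python
# Invented cohort: T0 baseline confidence per participant (1–5 scale),
# plus the subset who answered at T3.
t0_baseline = {"p-01": 2, "p-02": 2, "p-03": 3, "p-04": 3,
               "p-05": 3, "p-06": 4, "p-07": 4, "p-08": 5}
t3_respondents = {"p-03", "p-05", "p-06", "p-07", "p-08"}

def mean(xs):
    return sum(xs) / len(xs)

stayed  = [s for p, s in t0_baseline.items() if p in t3_respondents]
dropped = [s for p, s in t0_baseline.items() if p not in t3_respondents]

print(f"T3 response rate: {len(stayed)}/{len(t0_baseline)}")
print(f"T0 mean, T3 respondents:     {mean(stayed):.2f}")
print(f"T0 mean, T3 non-respondents: {mean(dropped):.2f}")
# A gap between the two T0 means is the attrition-bias signal to report:
# in this invented data the dropouts started with lower confidence, so the
# surviving sample overstates the cohort's baseline.
```

The honest final report shows both numbers, not just the mean of whoever was still answering at T3.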
What you gain
  • Sustained-effect evidence — the kind funders renew on, not the kind they doubt.
  • Sleeper effects show up at T3 that weren't visible at T1.
  • A reusable cohort — every renewal cycle starts smarter than the last.
§ 2.10 · The accelerant
Chapter 02 · §2.10

How Sopact Sense
handles all of this.

The design lenses and field choices in this chapter are methodology. Sopact Sense is where they get implemented — without three different tools, three different spreadsheets, or three different consultants.

THE PLATFORM

Sopact Sense

The data design choices in this chapter all run inside one platform. Contacts, Forms, and Relationships keep every wave linked to the same participant.

  • Contacts
    CRM-style cohort with unique IDs assigned at first contact.
  • Forms · 12 question types · skip logic
    AND/OR conditions, advanced validation, save-progress for long forms.
  • Relationships
    Every form ties to Contacts. T0 + T1 + T2 + T3 all link automatically.
  • Multi-language · 40+
    Forms rendered, responses captured, AI analysis in source language.
  • Offline + sync
    Mobile capture, syncs when connected, persistent ID survives.
THE ACCELERANT

Skills

Prepackaged playbooks for data-design decisions. They shorten the time from blank page to a measurement plan that funders trust.

  • { } study-design-advisor
    Recommends the lens combo for your context.
  • { } pre-post-validator
    Checks T0 and T1 instruments are truly identical.
  • { } cohort-balancer
    Flags demographic imbalances at intake.
  • { } instrument-translator
    Renders the form in 40+ languages with branch-aware skip logic preserved.

These built-in Skills run inside Sopact Sense. Your team's custom Skills compose on top.

Why this compounds

The first cohort teaches Sense your instrument. The second cohort starts with the validated version. By cohort five your team isn't designing the measurement plan from scratch — they're starting from the best version of last cohort's plan.

§ 2.11 · Recap + Up Next
Chapter 02 · §2.11

Six lessons
to carry forward.

1

Report quality is decided upstream.

No editorial polish recovers evidence the architecture never captured. Design before you collect.

2

Three lenses. Compose, don't pick one.

Mixed-method + longitudinal + pre/post. Most credible designs use at least two; the gold standard uses all three.

3

Numbers tell what. Stories tell why.

Neither alone is sufficient. Keep both in the same record, joined by participant_id, so a single response reads both as a number and as a quote.

4

The field decides who's in your data.

Offline · skip logic · multi-language — three design choices, made before a respondent touches the form. Get them right or your sample reflects the easy half only.

5

Language is three layers, not one.

Collect in source · analyze in source · report in audience language. Decouple them and you reach everyone without translation lag.

6

Pick an archetype before you build.

Five archetypes cover most measurement plans. Match yours and short-cut design from weeks to hours.

UP NEXT
Chapter 03 · Collection

You've designed the plan. Now you collect — online, offline, from documents, from transcripts — and keep all four channels joined on one participant_id.

03
End of Chapter 02
END OF CHAPTER 02 · BOOK 01

Six books.
One spine.
Built for the AI era.

Design done. Collection next. Then transformation. Then reports. Pick the industry guide that matches your world — or continue straight to Chapter 03.

BOOK 01 · Beyond the Survey · You are here
BOOK 02 · Application Management · Industry guide
BOOK 03 · Grant Intelligence · Industry guide
BOOK 04 · Impact Intelligence · Industry guide
BOOK 05 · Training Intelligence · Industry guide
BOOK 06 · Nonprofit Programs · Industry guide

"Report quality is decided upstream. The design phase is where the report already exists — or doesn't."

THE SOPACT INTELLIGENCE LIBRARY · 2026