The Sopact Intelligence Library
Book 01 of 06 · Chapter 03

Data
Collection.

Four channels — online, offline, documents, transcripts — and the one architectural choice that joins them into a single participant record instead of four disconnected exports.

[Cover diagram — four channels: ONLINE (web forms) · OFFLINE (mobile + sync) · DOCUMENTS (PDFs · OCR) · TRANSCRIPTS (auto · speakers) → ONE stakeholder_id]
By Unmesh Sheth · Sopact
§ 3.0 · Where this chapter sits
Where this chapter sits

From design
to data flowing in.

Chapter 02 told you what to measure. This chapter is the mechanics of getting it in — including the three channels traditional survey tools ignore.

Chapters in Beyond the Survey

00 · Introduction · 8 pages
01 · Workflow · 22 pages
02 · Data Design · 17 pages
03 · Data Collection · you are here
04 · Intelligent Suite · next chapter
05 · Actionable Insight · ~18 pages

The library

Book 01 · this book
Beyond the Survey
The foundational field guide — methodology for the AI era.
Book 02 · industry guide
Application Management
Pitch comps, fellowships, scholarships, accelerators.
Book 03 · industry guide
Grant Intelligence
For program officers and foundation teams.
Book 04 · industry guide
Impact Intelligence
Portfolio outcomes with 5 Dimensions and IRIS+.
Book 05 · industry guide
Training Intelligence
Learner outcomes from enrollment to wage gain.
Book 06 · industry guide
Nonprofit Programs
One unified intelligence layer across many programs.
CHAPTER · 03

Data
Collection.

Surveys are one channel. Real collection has four — and pretending the other three don't exist is why "impact data" is usually missing the most important parts of what people actually said.

What you'll learn
  • 01 · Why "survey" is the smallest of four channels
  • 02 · The four channels — online · offline · documents · transcripts
  • 03 · How unique-link-per-respondent joins them all
  • 04 · One cohort, four channels, one record — end-to-end
Time to read
12 min
16 pages · 22 illustrations
§ 3.1 · Why "survey" is too small
Chapter 03 · §3.1

Most impact data
isn't in the survey.

Application essays, exit interviews, financial documents, partner audits, field photos, voice memos from rural visits. These are the most evidence-rich parts of any program — and traditional survey tools can't accept any of them.

What a survey tool sees
Q1
Rate confidence 1–5
Q2
Select your demographic
Q3
Open text · 200 char max

A flat schema of typed cells. Anything else is "out of scope."

What's actually in the program
📝
Web form
scales + open-ended
📱
Mobile offline
photos + voice
📄
Application PDFs
essays · recs · transcripts
🎙
Exit interviews
transcripts + tags
📊
Quarterly metrics
structured CSVs
🗂
Social audits
3rd-party PDFs

Six input shapes, one record. Each format is data — not exhaust.

If your tool can only handle questions and answers, you're collecting maybe 30% of what your program actually produces.

§ 3.2 · Four channels
Chapter 03 · §3.2

Four channels.
One stakeholder ID.

The unlock isn't accepting more formats — it's keeping them all linked to the same person. A unique stakeholder ID assigned at first contact survives across every channel that comes after.

[Diagram — ONLINE (web forms · unique links) · OFFLINE (mobile · sync · photos) · DOCUMENTS (PDFs · OCR · extraction) · TRANSCRIPTS (auto · speakers · timestamps) → ONE PERSON (stakeholder_id) via contacts × forms × relationships]

The architectural choice: persistent ID from first contact.

Same person fills a web form in March, gets interviewed in June, submits a PDF in September. All three land on the same record. No reconciliation, no VLOOKUPs, no consultant gluing exports together.
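In code terms, the claim is simple: when every event carries the same stakeholder_id, joining channels is an append, not a reconciliation. A minimal sketch (illustrative only, not Sopact's implementation):

```python
from collections import defaultdict

def build_records(events):
    """Group channel events by stakeholder_id into one record per person."""
    records = defaultdict(list)
    for event in events:
        records[event["stakeholder_id"]].append(event)
    return dict(records)

# The March form, June interview, and September PDF all carry the same ID.
events = [
    {"stakeholder_id": "p_a7f3", "channel": "online",     "month": "March",     "kind": "web form"},
    {"stakeholder_id": "p_a7f3", "channel": "transcript", "month": "June",      "kind": "interview"},
    {"stakeholder_id": "p_a7f3", "channel": "document",   "month": "September", "kind": "PDF"},
]

records = build_records(events)
# All three touchpoints land on one record — no VLOOKUPs, no export gluing.
```

The whole "join" is a dictionary lookup; that is what a persistent ID buys you.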

§ 3.3 · Online
Channel 01 Online

Web forms,
but unique-link.

Online surveys are familiar — the catch is what most tools do wrong: one generic URL for the whole cohort. A unique link per respondent is the difference between "we got 200 responses" and "we know which 200 people they were and what each of them said the last time too."

Generic URL · the old way
survey.example.com/q3-feedback
5 responses · who's who?

Identity collected inside the form (if at all). Email retypes. Duplicates pile up. Pre/post linkage by hand.

Unique link · designed
sense.app/f/q3?id=p_a7f3
a7f3 b2c1 c9e2 d4a8 e1f5

Identity in the URL. Form pre-fills what's known. Respondent can edit later via the same link. Pre/post linkage is a calculation, not a project.

EMBED
Iframe into any LMS, website, or partner portal. Same unique-link logic.
SAVE-PROGRESS
Long applications resume where the respondent left off. Days later, on any device.
SUBMISSION ALERT
Email triggered on submit with full payload — route to staff or downstream system.
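The unique-link pattern can be sketched in a few lines. This is an illustrative mock, not Sopact's implementation: `issue_links` and `prefill` are hypothetical names, and a real system would use longer tokens and a persistent store.

```python
import secrets

def issue_links(contacts, base_url="https://sense.app/f/q3"):
    """Assign each contact a unique, unguessable URL; return a link -> contact map."""
    links, used = {}, set()
    for contact in contacts:
        token = secrets.token_hex(2)   # short for readability; real tokens are longer
        while token in used:           # regenerate on the (rare) collision
            token = secrets.token_hex(2)
        used.add(token)
        links[f"{base_url}?id=p_{token}"] = contact
    return links

def prefill(links, url):
    """Opening a unique link resolves identity before the form renders."""
    return links.get(url)

links = issue_links(["Amina", "Ben", "Carla"])
```

Because identity lives in the URL, a pre/post pair is just the same link opened twice — no email retyping, no duplicate reconciliation.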
§ 3.4 · Offline
Channel 02 Offline

Capture now.
Sync when connected.

The respondents you most need to hear from often have the worst connectivity: rural farmers, refugee settlements, field staff on partner visits. Mobile offline-first capture is the difference between hearing them and writing them out of your data.

[Diagram — IN THE FIELD · NO BARS: survey responses queue in local storage (47 responses) → SYNC when the device reconnects → CLOUD · UNIFIED RECORD: 47 records linked to stakeholder_id automatically; AI runs on photos + voice + text]
PHOTO
Camera-roll attached to a response. AI describes contents at sync time. Evidence, not exhibit.
VOICE
Press-hold to record a 30s voice memo in any language. Transcribed + tagged after sync.
GPS / TIMESTAMP
Optional location + timestamp on each response. Useful for field-monitor accountability.
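The queue-then-sync flow above reduces to a small state machine. A minimal in-memory sketch — not the mobile client's real storage — where `upload` stands in for whatever sync endpoint the app would call:

```python
class OfflineQueue:
    """Capture always succeeds locally; sync flushes when connectivity returns."""

    def __init__(self, upload):
        self.upload = upload   # callable invoked once per response at sync time
        self.pending = []      # stand-in for on-device local storage

    def capture(self, response):
        """No network required — the response just joins the local queue."""
        self.pending.append(response)

    def sync(self):
        """Flush queued responses in capture order once the device reconnects."""
        sent = 0
        while self.pending:
            self.upload(self.pending.pop(0))
            sent += 1
        return sent

cloud = []
queue = OfflineQueue(upload=cloud.append)
for i in range(47):                       # a field day with no bars
    queue.capture({"stakeholder_id": f"p_{i:03d}", "pulse": "week 3"})
synced = queue.sync()                     # back on the commute: 47 records land
```

Each queued response already carries its stakeholder_id, so sync is delivery, not matching.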
§ 3.5 · Documents
Channel 03 Documents

PDFs are
data, not attachments.

Application essays, financial statements, social audits, grantee reports — these arrive as documents. Treated as "attachments" they sit at the bottom of the record unread. Treated as data, every page becomes searchable evidence with a citation you can click.

PDF input
Sustainability Report 2025
… committed to net-zero emissions by 2035 through a 40% renewable energy mix by year-end 2026 …
… diversity on the board grew from 28% to 41% women-identifying members …
page 12 of 47
extract page-level cite
Structured output
net_zero_year → 2035 · p.12
renewable_pct_target → 40% · p.12
board_diversity_pct → 41% · p.12
prior_year_diversity → 28% · p.12

Every value clicks back to the page it came from. No "trust me" extracts.

Common extracts
  • Numbers (spend, runway, headcount) with units
  • Claims + commitments, tagged by framework
  • Demographics from rec letters or essays
  • Compliance items checked against checklists
What survives
  • Page-level citation per extracted value
  • Original source quote, in source language
  • Confidence score on each extraction
  • Human-override path when AI gets it wrong
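The "what survives" list implies a record shape like the following. This is an illustrative schema, not Sopact's actual data model; the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Extraction:
    """One extracted value plus everything needed to trust (or correct) it."""
    field: str
    value: str
    page: int                        # page-level citation
    source_quote: str                # original wording, in source language
    confidence: float                # 0.0 - 1.0
    override: Optional[str] = None   # human correction, if any

    @property
    def final_value(self) -> str:
        """A human override always wins over the AI extract."""
        return self.override if self.override is not None else self.value

net_zero = Extraction(
    field="net_zero_year", value="2035", page=12,
    source_quote="committed to net-zero emissions by 2035", confidence=0.93,
)
corrected = Extraction(
    field="renewable_pct_target", value="14%", page=12,
    source_quote="a 40% renewable energy mix", confidence=0.41, override="40%",
)
```

A low confidence score plus a non-null `override` is the audit trail for "AI got it wrong, a human fixed it" — nothing is silently replaced.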
§ 3.6 · Transcripts
Channel 04 Transcripts

Interviews become
queryable.

A 30-minute exit interview used to be "we'll listen to it later." Now it's auto-transcribed with speaker labels and timestamps before the call ends — and every line is joined to the same participant record as their survey.

Auto-transcript · timestamped
[00:02:14] Interviewer: Walk me through the moment you realized the program was working for you.
[00:02:24] Participant: Probably week 6. I was helping a peer debug an API call and I didn't have to look anything up.
[00:04:08] Interviewer: What changed for you outside of the technical skills?
[00:04:18] Participant: My partner could finally not ask "did you fix anything today?" like it was a joke.
[00:08:42] Participant: Confidence in interviews is real now. Not faked.
Structured output
THEMES (3)
technical confidence peer recognition family validation
EVIDENCE QUOTES (3)
  • "didn't have to look anything up" 02:24
  • "partner could finally not ask…" 04:18
  • "confidence in interviews is real" 08:42
JOIN KEY
participant_id = p_a7f3
SPEAKER LABELS
AI distinguishes interviewer from interviewee. Multi-party calls are split per speaker.
TIMESTAMP JOIN
Every claim in a report clicks back to the second of the recording it came from.
SOURCE LANGUAGE
Interview in Swahili? Transcribe in Swahili, tag in English, report in Portuguese.
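A transcript in the bracketed-timestamp format shown above can be turned into queryable rows with a few lines of parsing. A minimal sketch under stated assumptions — the regex and field names are illustrative, not the product's importer:

```python
import re

# Matches lines like "[00:02:24] Participant: quote text"
LINE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\]\s+(\w+):\s+(.*)")

def parse_transcript(text, participant_id):
    """Turn speaker-labeled, timestamped lines into rows keyed to one person."""
    rows = []
    for raw in text.strip().splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        h, mnt, s, speaker, quote = m.groups()
        rows.append({
            "participant_id": participant_id,   # join key back to the survey record
            "seconds": int(h) * 3600 + int(mnt) * 60 + int(s),
            "speaker": speaker,
            "quote": quote,
        })
    return rows

sample = """
[00:02:24] Participant: I didn't have to look anything up.
[00:08:42] Participant: Confidence in interviews is real now.
"""
rows = parse_transcript(sample, "p_a7f3")
```

The `seconds` field is what lets a quote in a report click back to the exact moment in the recording.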
§ 3.7 · Worked example
Chapter 03 · §3.7

One cohort.
Four channels.
One participant record.

A coding bootcamp cohort, 60 learners, 14 weeks. Watch how each of the four channels delivers different data — and how all four land on the same record without anyone joining them by hand.

01
ONLINE · WEEK 0
Intake form (unique link per learner)

Demographics · goals · prior experience · accommodations needed

60 / 60 responses
stakeholder_id assigned
02
DOCUMENTS · WEEK 0–1
Application portfolio + rec letters

PDFs extracted into structured fields · joined on stakeholder_id automatically

180 PDFs read
page-cited evidence
03
OFFLINE · WEEKS 1–14
Weekly mobile pulse-checks

14 quick check-ins per learner · sync on commute · early at-risk signals

~840 pulses
99% sync rate
04
TRANSCRIPTS · WEEK 14
Exit interviews · 30 min each

Auto-transcribed with speaker labels · themed in real-time · joined on id

54 interviews
~1620 quote-tags
The result · one record per learner, four channels deep

Day-1 demographics from the form. Application evidence from the PDFs. Weekly pulse data from the phone. Closing reflections from the interview. Same stakeholder_id, four data shapes, zero reconciliation. The cohort report writes itself the morning week 14 ends.
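With one stakeholder_id across waves, "pre/post is a calculation" is literal. A toy sketch with made-up confidence scores:

```python
def confidence_delta(week0, week14):
    """Per-learner change, computed only for IDs present in both waves."""
    return {
        sid: week14[sid] - week0[sid]
        for sid in week0.keys() & week14.keys()
    }

week0  = {"p_a7f3": 2, "p_b2c1": 3, "p_c9e2": 1}
week14 = {"p_a7f3": 5, "p_b2c1": 4}   # p_c9e2 skipped the exit survey

deltas = confidence_delta(week0, week14)
# deltas == {"p_a7f3": 3, "p_b2c1": 1}
```

The set intersection also makes attrition explicit: learners missing from one wave simply drop out of the delta instead of corrupting it.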

§ 3.8 · Collection patterns
Chapter 03 · §3.8

Five patterns,
by program type.

Different programs lean on different channel mixes. Recognizing your pattern short-cuts the architecture phase from weeks to hours.

Two patterns in detail

Workforce training (§3.8.1) — online intake + weekly offline pulse + exit transcript. Heaviest on volume of small data points across many weeks. Pulse data is the differentiator versus old-school pre/post-only.
Application-driven (§3.8.2) — document-heavy intake (essays, recs, financials) + structured online review forms. Each applicant generates 4–6 documents, all joined to one applicant_id.

The remaining three patterns get full treatment in their domain books.

§ 3.8.1 · Workforce training
Pattern 01 of 05

Workforce training.

Online intake at week 0, mobile pulses every week, document-light, transcript at the end. Continuous signal — not just a two-wave snapshot.

Cohort size
30–80 learners
Cadence
weekly pulse
Primary channels
Online + Offline
01
Online intake
  • Unique-link form, week 0
  • Demographics + goals + prior skill
  • Accommodations + language preference
  • stakeholder_id assigned here
02
Mobile pulse
  • 30-second weekly check-in
  • Confidence + blocker + 1 photo
  • Captured offline, syncs on commute
  • Voice memo optional, in source lang
03
Capstone artifact
  • Project PDF or repo link
  • Extracted: stack, complexity, themes
  • Linked to same stakeholder_id
  • Reviewer rubric joined on submit
04
Exit interview
  • 30-min auto-transcribed call
  • Speaker-labeled, time-coded
  • Themed against pulse history
  • Joined to T0 record automatically
05
+6mo follow-up
  • Same unique link as week 0
  • Wage / placement / retention
  • Open: "what's changed since?"
  • Pre/post + longitudinal in one shot
The win

Pulse data surfaces at-risk learners by week 3. Capstone evidence is read, not skimmed. Exit interviews are queryable by theme. +6mo response rate is 77% — because the same unique link still works.

§ 3.8.2 · Application-driven
Pattern 02 of 05

Application-driven.

Scholarships, accelerators, fellowships, pitch competitions. Document-heavy intake, structured rubric reviews, decision-supporting reports.

Volume
100–2000 applicants
Docs / app
3–8 PDFs
Primary channels
Documents + Online
01
Application portal
  • Save-progress online form
  • Document uploads inline
  • Skip logic by application track
  • Multi-language form rendering
02
Document extraction
  • Essay themes + sentiment
  • Rec letter signal extraction
  • Financials → structured numbers
  • All joined to applicant_id
03
AI-pre-scored brief
  • One-page summary per applicant
  • Rubric-aligned scoring with citations
  • Outliers flagged for panel attention
  • Time per app: 15 min → 3 min
04
Panel review grid
  • Sortable, citation-backed
  • Multi-reviewer rubric blending
  • Decision audit trail
  • Equity-audit-ready
05
Decisions + report
  • Accept / waitlist / decline tagging
  • Rationale captured per decision
  • Panel-ready evidence report
  • Cohort onboarding ready immediately
The win

500 scholarship applications reviewed in two days instead of three weeks. Every decision auditable, every score citation-backed. Selected cohort flows straight into the pattern-01 workforce-training channel mix without re-entering data.

§ 3.9 · The accelerant
Chapter 03 · §3.9

How Sopact Sense
handles all four channels.

Four channels could mean four tools. In Sopact Sense it's one — built around Contacts, Forms, and Relationships, with Skills handling the channel-specific work that traditional tools can't.

THE PLATFORM

Sopact Sense

Four channels, one platform. Contacts hold the unique IDs. Forms handle the structured input. Relationships keep documents and transcripts joined to the right person.

  • Online · web forms with unique links
    Embed, save-progress, submission alerts, 12 question types, validation.
  • Offline · mobile capture with sync
    Local storage, photos, voice memos, GPS, 99%+ sync rates.
  • Documents · PDF extraction
    OCR, structured field extraction, page-level citation, confidence scoring.
  • Transcripts · auto-transcribe
    Speaker labels, timestamps, theming, source-language preservation.
  • Relationships keep all four joined
    One stakeholder_id, four channel feeds, zero reconciliation.
THE ACCELERANT

Skills

Prepackaged playbooks for the channel-specific moves that take a lot of configuration to get right the first time — and zero configuration on every subsequent cohort.

  • { } unique-link-router
    Generates per-respondent URLs and pre-fills known fields.
  • { } offline-sync-monitor
    Tracks sync state across field devices; flags missing data.
  • { } document-extractor
    Pulls structured fields from PDFs with page-level citations.
  • { } transcript-importer
    Brings audio/video into the record with speaker labels and themed quotes.

These Skills run inside Sopact Sense. They aren't shipped as standalone files.

Why this compounds

Cohort 1's transcripts teach Sense your theming vocabulary. Cohort 2 inherits that vocabulary and adds nuance. By cohort 5 your team is starting from the best transcript pipeline you've ever had — not configuring channel mechanics from scratch.

§ 3.10 · Recap + Up Next
Chapter 03 · §3.10

Five lessons
to carry forward.

1

"Survey" is the smallest of four channels.

Online forms are one part. Documents, transcripts, and offline mobile capture cover the other 70% of what your program actually produces.

2

Persistent ID is the architectural choice.

Unique stakeholder_id from first contact survives every channel that comes after. Pre/post becomes a calculation, not a project.

3

Documents and transcripts are data, not attachments.

Every PDF becomes structured fields with page citations. Every interview becomes themed quotes with timestamps. Both join on stakeholder_id.

4

Offline-first or you lose your hardest-to-reach.

Rural, field-staff, low-bandwidth participants are the ones funders most want evidence on. Mobile capture + sync makes them part of your data, not absent from it.

5

Pattern-match before you architect.

Five patterns cover most programs. Find yours, lift the channel mix, short-cut weeks of design work.

UP NEXT
Chapter 04 · Intelligent Suite

Four channels of data arrive on one record. Now: the AI features that analyze them — cell, row, column, grid — and the four canonical report types they produce.

END OF CHAPTER 03 · BOOK 01

Six books.
One spine.
Built for the AI era.

Collection done across all four channels. Transformation next — where the Intelligent Suite turns this record into reports your funder will read.

BOOK 01
Beyond
the Survey
You are here
BOOK 02
Application
Management
Industry guide
BOOK 03
Grant
Intelligence
Industry guide
BOOK 04
Impact
Intelligence
Industry guide
BOOK 05
Training
Intelligence
Industry guide
BOOK 06
Nonprofit
Programs
Industry guide

"Four channels. One stakeholder ID. Same record growing across every form, every document, every interview."

THE SOPACT INTELLIGENCE LIBRARY · 2026