The Sopact Intelligence Library
Book 01 of 06 · Chapter 03

Data
Collection.

Four channels — online, offline, documents, transcripts — and the one architectural choice that joins them into a single participant record instead of four disconnected exports.

[Cover diagram — four channels: ONLINE (web forms) · OFFLINE (mobile + sync) · DOCUMENTS (PDFs · OCR) · TRANSCRIPTS (auto · speakers) → ONE stakeholder_id]
By Unmesh Sheth · Sopact
§ 3.0 · Where this chapter sits
Where this chapter sits

From design
to data flowing in.

Chapter 02 told you what to measure. This chapter is the mechanics of getting it in — including the three channels traditional survey tools ignore.

Chapters in Beyond the Survey

00 · Introduction · 8 pages
01 · Workflow · 22 pages
02 · Data Design · 17 pages
03 · Data Collection · you are here
04 · Intelligent Suite · next chapter
05 · Actionable Insight · ~18 pages

The library

Book 01 · this book
Beyond the Survey
The foundational field guide — methodology for the AI era.
Book 02 · industry guide
Application Management
Pitch comps, fellowships, scholarships, accelerators.
Book 03 · industry guide
Grant Intelligence
For program officers and foundation teams.
Book 04 · industry guide
Impact Intelligence
Portfolio outcomes with 5 Dimensions and IRIS+.
Book 05 · industry guide
Training Intelligence
Learner outcomes from enrollment to wage gain.
Book 06 · industry guide
Nonprofit Programs
One unified intelligence layer across many programs.
CHAPTER · 03

Data
Collection.

Surveys are one channel. Real collection has four — and pretending the other three don't exist is why "impact data" is usually missing the most important parts of what people actually said.

What you'll learn
  • 01 · Why "survey" is the smallest of four channels
  • 02 · The four channels — online · offline · documents · transcripts
  • 03 · How unique-link-per-respondent joins them all
  • 04 · One cohort, four channels, one record — end-to-end
Time to read
12 min
16 pages · 22 illustrations
§ 3.1 · Why "survey" is too small
Chapter 03 · §3.1

Most impact data
isn't in the survey.

Application essays, exit interviews, financial documents, partner audits, field photos, voice memos from rural visits. These are the most evidence-rich parts of any program — and traditional survey tools can't accept any of them.

What a survey tool sees
Q1
Rate confidence 1–5
Q2
Select your demographic
Q3
Open text · 200 char max

A flat schema of typed cells. Anything else is "out of scope."

What's actually in the program
📝
Web form
scales + open-ended
📱
Mobile offline
photos + voice
📄
Application PDFs
essays · recs · transcripts
🎙
Exit interviews
transcripts + tags
📊
Quarterly metrics
structured CSVs
🗂
Social audits
3rd-party PDFs

Six input shapes, one record. Each format is data — not exhaust.

If your tool can only handle questions and answers, you're collecting maybe 30% of what your program actually produces.

§ 3.2 · Four channels
Chapter 03 · §3.2

Four channels.
One stakeholder ID.

The unlock isn't accepting more formats — it's keeping them all linked to the same person. A unique stakeholder ID assigned at first contact survives across every channel that comes after.

[Diagram — ONLINE (web forms · unique links) · OFFLINE (mobile · sync · photos) · DOCUMENTS (PDFs · OCR · extraction) · TRANSCRIPTS (auto · speakers · timestamps) → ONE PERSON (stakeholder_id) via contacts × forms × relationships]

The architectural choice: persistent ID from first contact.

Same person fills a web form in March, gets interviewed in June, submits a PDF in September. All three land on the same record. No reconciliation, no VLOOKUPs, no consultant gluing exports together.
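In code terms, the claim is simple: when every event carries the same stakeholder_id, joining channels is an append, not a reconciliation. A minimal sketch (illustrative only, not Sopact's implementation):

```python
from collections import defaultdict

def build_records(events):
    """Group channel events by stakeholder_id into one record per person."""
    records = defaultdict(list)
    for event in events:
        records[event["stakeholder_id"]].append(event)
    return dict(records)

# The March form, June interview, and September PDF all carry the same ID.
events = [
    {"stakeholder_id": "p_a7f3", "channel": "online",     "month": "March",     "kind": "web form"},
    {"stakeholder_id": "p_a7f3", "channel": "transcript", "month": "June",      "kind": "interview"},
    {"stakeholder_id": "p_a7f3", "channel": "document",   "month": "September", "kind": "PDF"},
]

records = build_records(events)
# All three touchpoints land on one record — no VLOOKUPs, no export gluing.
```

The whole "join" is a dictionary lookup; that is what a persistent ID buys you.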

§ 3.3 · Online
Channel 01 Online

Web forms,
but unique-link.

Online surveys are familiar — the catch is what most tools do wrong: one generic URL for the whole cohort. A unique link per respondent is the difference between "we got 200 responses" and "we know which 200 people they were and what each of them said the last time too."

Generic URL · the old way
survey.example.com/q3-feedback
5 responses · who's who?

Identity collected inside the form (if at all). Email retypes. Duplicates pile up. Pre/post linkage by hand.

Unique link · designed
sense.app/f/q3?id=p_a7f3
a7f3 b2c1 c9e2 d4a8 e1f5

Identity in the URL. Form pre-fills what's known. Respondent can edit later via the same link. Pre/post linkage is a calculation, not a project.

EMBED
Iframe into any LMS, website, or partner portal. Same unique-link logic.
SAVE-PROGRESS
Long applications resume where the respondent left off. Days later, on any device.
SUBMISSION ALERT
Email triggered on submit with full payload — route to staff or downstream system.
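The unique-link pattern can be sketched in a few lines. This is an illustrative mock, not Sopact's implementation: `issue_links` and `prefill` are hypothetical names, and a real system would use longer tokens and a persistent store.

```python
import secrets

def issue_links(contacts, base_url="https://sense.app/f/q3"):
    """Assign each contact a unique, unguessable URL; return a link -> contact map."""
    links, used = {}, set()
    for contact in contacts:
        token = secrets.token_hex(2)   # short for readability; real tokens are longer
        while token in used:           # regenerate on the (rare) collision
            token = secrets.token_hex(2)
        used.add(token)
        links[f"{base_url}?id=p_{token}"] = contact
    return links

def prefill(links, url):
    """Opening a unique link resolves identity before the form renders."""
    return links.get(url)

links = issue_links(["Amina", "Ben", "Carla"])
```

Because identity lives in the URL, a pre/post pair is just the same link opened twice — no email retyping, no duplicate reconciliation.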
§ 3.4 · Offline
Channel 02 Offline

Capture now.
Sync when connected.

The respondents you most need to hear from often have the worst connectivity: rural farmers, refugee settlements, field staff on partner visits. Mobile offline-first capture is the difference between hearing them and writing them out of your data.

[Diagram — IN THE FIELD · NO BARS: survey responses queue in local storage (47 responses) → SYNC when the device reconnects → CLOUD · UNIFIED RECORD: 47 records linked to stakeholder_id automatically; AI runs on photos + voice + text]
PHOTO
Camera-roll attached to a response. AI describes contents at sync time. Evidence, not exhibit.
VOICE
Press-hold to record a 30s voice memo in any language. Transcribed + tagged after sync.
GPS / TIMESTAMP
Optional location + timestamp on each response. Useful for field-monitor accountability.
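The queue-then-sync flow above reduces to a small state machine. A minimal in-memory sketch — not the mobile client's real storage — where `upload` stands in for whatever sync endpoint the app would call:

```python
class OfflineQueue:
    """Capture always succeeds locally; sync flushes when connectivity returns."""

    def __init__(self, upload):
        self.upload = upload   # callable invoked once per response at sync time
        self.pending = []      # stand-in for on-device local storage

    def capture(self, response):
        """No network required — the response just joins the local queue."""
        self.pending.append(response)

    def sync(self):
        """Flush queued responses in capture order once the device reconnects."""
        sent = 0
        while self.pending:
            self.upload(self.pending.pop(0))
            sent += 1
        return sent

cloud = []
queue = OfflineQueue(upload=cloud.append)
for i in range(47):                       # a field day with no bars
    queue.capture({"stakeholder_id": f"p_{i:03d}", "pulse": "week 3"})
synced = queue.sync()                     # back on the commute: 47 records land
```

Each queued response already carries its stakeholder_id, so sync is delivery, not matching.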
§ 3.5 · Documents
Channel 03 Documents

PDFs are
data, not attachments.

Application essays, financial statements, social audits, grantee reports — these arrive as documents. Treated as "attachments" they sit at the bottom of the record unread. Treated as data, every page becomes searchable evidence with a citation you can click.

PDF input
Sustainability Report 2025
… committed to net-zero emissions by 2035 through a 40% renewable energy mix by year-end 2026 …
… diversity on the board grew from 28% to 41% women-identifying members …
page 12 of 47
extract page-level cite
Structured output
net_zero_year → 2035 · p.12
renewable_pct_target → 40% · p.12
board_diversity_pct → 41% · p.12
prior_year_diversity → 28% · p.12

Every value clicks back to the page it came from. No "trust me" extracts.

Common extracts
  • Numbers (spend, runway, headcount) with units
  • Claims + commitments, tagged by framework
  • Demographics from rec letters or essays
  • Compliance items checked against checklists
What survives
  • Page-level citation per extracted value
  • Original source quote, in source language
  • Confidence score on each extraction
  • Human-override path when AI gets it wrong
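The "what survives" list implies a record shape like the following. This is an illustrative schema, not Sopact's actual data model; the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Extraction:
    """One extracted value plus everything needed to trust (or correct) it."""
    field: str
    value: str
    page: int                        # page-level citation
    source_quote: str                # original wording, in source language
    confidence: float                # 0.0 - 1.0
    override: Optional[str] = None   # human correction, if any

    @property
    def final_value(self) -> str:
        """A human override always wins over the AI extract."""
        return self.override if self.override is not None else self.value

net_zero = Extraction(
    field="net_zero_year", value="2035", page=12,
    source_quote="committed to net-zero emissions by 2035", confidence=0.93,
)
corrected = Extraction(
    field="renewable_pct_target", value="14%", page=12,
    source_quote="a 40% renewable energy mix", confidence=0.41, override="40%",
)
```

A low confidence score plus a non-null `override` is the audit trail for "AI got it wrong, a human fixed it" — nothing is silently replaced.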
§ 3.6 · Transcripts
Channel 04 Transcripts

Interviews become
queryable.

A 30-minute exit interview used to be "we'll listen to it later." Now it's auto-transcribed with speaker labels and timestamps before the call ends — and every line is joined to the same participant record as their survey.

Auto-transcript · timestamped
[00:02:14] Interviewer: Walk me through the moment you realized the program was working for you.
[00:02:24] Participant: Probably week 6. I was helping a peer debug an API call and I didn't have to look anything up.
[00:04:08] Interviewer: What changed for you outside of the technical skills?
[00:04:18] Participant: My partner could finally not ask "did you fix anything today?" like it was a joke.
[00:08:42] Participant: Confidence in interviews is real now. Not faked.
Structured output
THEMES (3)
technical confidence peer recognition family validation
EVIDENCE QUOTES (3)
  • "didn't have to look anything up" 02:24
  • "partner could finally not ask…" 04:18
  • "confidence in interviews is real" 08:42
JOIN KEY
participant_id = p_a7f3
SPEAKER LABELS
AI distinguishes interviewer from interviewee. Multi-party calls are split per speaker.
TIMESTAMP JOIN
Every claim in a report clicks back to the second of the recording it came from.
SOURCE LANGUAGE
Interview in Swahili? Transcribe in Swahili, tag in English, report in Portuguese.
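A transcript in the bracketed-timestamp format shown above can be turned into queryable rows with a few lines of parsing. A minimal sketch under stated assumptions — the regex and field names are illustrative, not the product's importer:

```python
import re

# Matches lines like "[00:02:24] Participant: quote text"
LINE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\]\s+(\w+):\s+(.*)")

def parse_transcript(text, participant_id):
    """Turn speaker-labeled, timestamped lines into rows keyed to one person."""
    rows = []
    for raw in text.strip().splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        h, mnt, s, speaker, quote = m.groups()
        rows.append({
            "participant_id": participant_id,   # join key back to the survey record
            "seconds": int(h) * 3600 + int(mnt) * 60 + int(s),
            "speaker": speaker,
            "quote": quote,
        })
    return rows

sample = """
[00:02:24] Participant: I didn't have to look anything up.
[00:08:42] Participant: Confidence in interviews is real now.
"""
rows = parse_transcript(sample, "p_a7f3")
```

The `seconds` field is what lets a quote in a report click back to the exact moment in the recording.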
§ 3.7 · Worked example
Chapter 03 · §3.7

One cohort.
Four channels.
One participant record.

A coding bootcamp cohort, 60 learners, 14 weeks. Watch how each of the four channels delivers different data — and how all four land on the same record without anyone joining them by hand.

01
ONLINE · WEEK 0
Intake form (unique link per learner)

Demographics · goals · prior experience · accommodations needed

60 / 60 responses
stakeholder_id assigned
02
DOCUMENTS · WEEK 0–1
Application portfolio + rec letters

PDFs extracted into structured fields · joined on stakeholder_id automatically

180 PDFs read
page-cited evidence
03
OFFLINE · WEEKS 1–14
Weekly mobile pulse-checks

14 quick check-ins per learner · sync on commute · early at-risk signals

~840 pulses
99% sync rate
04
TRANSCRIPTS · WEEK 14
Exit interviews · 30 min each

Auto-transcribed with speaker labels · themed in real-time · joined on id

54 interviews
~1620 quote-tags
The result · one record per learner, four channels deep

Day-1 demographics from the form. Application evidence from the PDFs. Weekly pulse data from the phone. Closing reflections from the interview. Same stakeholder_id, four data shapes, zero reconciliation. The cohort report writes itself the morning week 14 ends.
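With one stakeholder_id across waves, "pre/post is a calculation" is literal. A toy sketch with made-up confidence scores:

```python
def confidence_delta(week0, week14):
    """Per-learner change, computed only for IDs present in both waves."""
    return {
        sid: week14[sid] - week0[sid]
        for sid in week0.keys() & week14.keys()
    }

week0  = {"p_a7f3": 2, "p_b2c1": 3, "p_c9e2": 1}
week14 = {"p_a7f3": 5, "p_b2c1": 4}   # p_c9e2 skipped the exit survey

deltas = confidence_delta(week0, week14)
# deltas == {"p_a7f3": 3, "p_b2c1": 1}
```

The set intersection also makes attrition explicit: learners missing from one wave simply drop out of the delta instead of corrupting it.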

§ 3.8 · Collection patterns
Chapter 03 · §3.8

Five patterns,
by program type.

Different programs lean on different channel mixes. Recognizing your pattern short-cuts the architecture phase from weeks to hours.

Two patterns in detail

Workforce training (§3.8.1) — online intake + weekly offline pulse + exit transcript. Heaviest on volume of small data points across many weeks. Pulse data is the differentiator versus old-school pre/post-only.
Application-driven (§3.8.2) — document-heavy intake (essays, recs, financials) + structured online review forms. Each applicant generates 4–6 documents, all joined to one applicant_id.

The remaining three patterns get full treatment in their domain books.

§ 3.8.1 · Workforce training
Pattern 01 of 05

Workforce training.

Online intake at week 0, mobile pulses every week, document-light, transcript at the end. Continuous signal — not just a two-wave snapshot.

Cohort size
30–80 learners
Cadence
weekly pulse
Primary channels
Online + Offline
01
Online intake
  • Unique-link form, week 0
  • Demographics + goals + prior skill
  • Accommodations + language preference
  • stakeholder_id assigned here
02
Mobile pulse
  • 30-second weekly check-in
  • Confidence + blocker + 1 photo
  • Captured offline, syncs on commute
  • Voice memo optional, in source lang
03
Capstone artifact
  • Project PDF or repo link
  • Extracted: stack, complexity, themes
  • Linked to same stakeholder_id
  • Reviewer rubric joined on submit
04
Exit interview
  • 30-min auto-transcribed call
  • Speaker-labeled, time-coded
  • Themed against pulse history
  • Joined to T0 record automatically
05
+6mo follow-up
  • Same unique link as week 0
  • Wage / placement / retention
  • Open: "what's changed since?"
  • Pre/post + longitudinal in one shot
The win

Pulse data surfaces at-risk learners by week 3. Capstone evidence is read, not skimmed. Exit interviews are queryable by theme. +6mo response rate is 77% — because the same unique link still works.

§ 3.8.2 · Application-driven
Pattern 02 of 05

Application-driven.

Scholarships, accelerators, fellowships, pitch competitions. Document-heavy intake, structured rubric reviews, decision-supporting reports.

Volume
100–2000 applicants
Docs / app
3–8 PDFs
Primary channels
Documents + Online
01
Application portal
  • Save-progress online form
  • Document uploads inline
  • Skip logic by application track
  • Multi-language form rendering
02
Document extraction
  • Essay themes + sentiment
  • Rec letter signal extraction
  • Financials → structured numbers
  • All joined to applicant_id
03
AI-pre-scored brief
  • One-page summary per applicant
  • Rubric-aligned scoring with citations
  • Outliers flagged for panel attention
  • Time per app: 15 min → 3 min
04
Panel review grid
  • Sortable, citation-backed
  • Multi-reviewer rubric blending
  • Decision audit trail
  • Equity-audit-ready
05
Decisions + report
  • Accept / waitlist / decline tagging
  • Rationale captured per decision
  • Panel-ready evidence report
  • Cohort onboarding ready immediately
The win

500 scholarship applications reviewed in two days instead of three weeks. Every decision auditable, every score citation-backed. Selected cohort flows straight into the pattern-01 workforce-training channel mix without re-entering data.

§ 3.9 · The accelerant
Chapter 03 · §3.9

How Sopact Sense
handles all four channels.

Four channels could mean four tools. In Sopact Sense it's one — built around Contacts, Forms, and Relationships, with Skills handling the channel-specific work that traditional tools can't.

THE PLATFORM

Sopact Sense

Four channels, one platform. Contacts hold the unique IDs. Forms handle the structured input. Relationships keep documents and transcripts joined to the right person.

  • Online · web forms with unique links
    Embed, save-progress, submission alerts, 12 question types, validation.
  • Offline · mobile capture with sync
    Local storage, photos, voice memos, GPS, 99%+ sync rates.
  • Documents · PDF extraction
    OCR, structured field extraction, page-level citation, confidence scoring.
  • Transcripts · auto-transcribe
    Speaker labels, timestamps, theming, source-language preservation.
  • Relationships keep all four joined
    One stakeholder_id, four channel feeds, zero reconciliation.
THE ACCELERANT

Skills

Prepackaged playbooks for the channel-specific moves that take a lot of configuration to get right the first time — and zero configuration on every subsequent cohort.

  • { } unique-link-router
    Generates per-respondent URLs and pre-fills known fields.
  • { } offline-sync-monitor
    Tracks sync state across field devices; flags missing data.
  • { } document-extractor
    Pulls structured fields from PDFs with page-level citations.
  • { } transcript-importer
    Brings audio/video into the record with speaker labels and themed quotes.

These Skills run inside Sopact Sense. They aren't shipped as standalone files.

Why this compounds

Cohort 1's transcripts teach Sense your theming vocabulary. Cohort 2 inherits that vocabulary and adds nuance. By cohort 5 your team is starting from the best transcript pipeline you've ever had — not configuring channel mechanics from scratch.

§ 3.10 · Recap + Up Next
Chapter 03 · §3.10

Five lessons
to carry forward.

1

"Survey" is the smallest of four channels.

Online forms are one part. Documents, transcripts, and offline mobile capture cover the other 70% of what your program actually produces.

2

Persistent ID is the architectural choice.

Unique stakeholder_id from first contact survives every channel that comes after. Pre/post becomes a calculation, not a project.

3

Documents and transcripts are data, not attachments.

Every PDF becomes structured fields with page citations. Every interview becomes themed quotes with timestamps. Both join on stakeholder_id.

4

Offline-first or you lose your hardest-to-reach.

Rural, field-staff, low-bandwidth participants are the ones funders most want evidence on. Mobile capture + sync makes them part of your data, not absent from it.

5

Pattern-match before you architect.

Five patterns cover most programs. Find yours, lift the channel mix, short-cut weeks of design work.

UP NEXT
Chapter 04 · Intelligent Suite

Four channels of data arrive on one record. Now: the AI features that analyze them — cell, row, column, grid — and the four canonical report types they produce.

END OF CHAPTER 03 · BOOK 01

Six books.
One spine.
Built for the AI era.

Collection done across all four channels. Transformation next — where the Intelligent Suite turns this record into reports your funder will read.

BOOK 01
Beyond
the Survey
You are here
BOOK 02
Application
Management
Industry guide
BOOK 03
Grant
Intelligence
Industry guide
BOOK 04
Impact
Intelligence
Industry guide
BOOK 05
Training
Intelligence
Industry guide
BOOK 06
Nonprofit
Programs
Industry guide

"Four channels. One stakeholder ID. Same record growing across every form, every document, every interview."

THE SOPACT INTELLIGENCE LIBRARY · 2026