Isidore Mikorey-Nilsson

How I Made a Voice-First Todo List That's Actually Fast (And Why I Rewrote Half of It)

TL;DR: I built a todo app. Then I made it slower by over-engineering it. Then I made it 6x faster by embracing platform capabilities. Here's what I learned about premature optimization, when to actually optimize, and why sometimes the "boring" solution is the right one.


Screenshot of WhisperPlan main interface with voice recording

The Problem: Why Another Todo App? (Besides the Obvious Answer: "I Have ADHD")

Look, I know what you're thinking. "Another todo app? Really? In 2025?"

But hear me out. I built WhisperPlan for one simple reason: I hate typing on my phone. Like, really hate it. The autocorrect betrayals, the tiny keyboard, the context switching from "thing I need to do" to "person actively typing on a tiny keyboard."

And here's the thing about ADHD (which I have, and which I'm building for): the friction between "I should do this" and "I have written this down" needs to be as close to zero as possible. Because if it takes more than 3 seconds, my brain has already moved on to something else. Probably a Wikipedia deep-dive about Byzantine architecture.

So I built WhisperPlan: a voice-first todo list. You press a button, you talk, and boom—tasks appear. Like magic, except it's actually OpenAI's Whisper and Google's Gemini having a conversation about your life.

Recording a voice note and seeing tasks appear magically

The app includes:

  • Voice-to-tasks in seconds: Record → Transcribe → AI extracts tasks with due dates, priorities, and projects
  • ADHD-friendly gamification: Confetti, celebrations, streaks (because dopamine matters)
  • Focus Mode: One task at a time with a Pomodoro timer (hello, Dynamic Island!)
  • Calendar Integration: Seeing what you have to do this week just by looking at your calendar is magical!

This is the story of how I built it, realized it was too slow, tore half of it out, and made it 6x faster. Buckle up.


The Stack: Choosing Your Weapons (Or: "The Tech I Already Knew")

Let me be honest with you: I didn't choose this stack after careful consideration of all options, extensive benchmarking, and architectural decision records. I chose it because I already knew these technologies and I wanted to ship something, not write a PhD thesis.

Here's what I went with:

Frontend: iOS (SwiftUI + SwiftData)

Why native? Because I wanted widgets, Dynamic Island integration, and that buttery 120fps scroll. Also, my target audience is "people who own iPhones and hate typing," which feels like a pretty specific demographic.

SwiftUI is delightful once you stop fighting it. SwiftData is CoreData without the trauma. Together they're like peanut butter and chocolate—if peanut butter occasionally crashed your app because you forgot to mark something @Published.

Backend: NestJS on Google Cloud Run

Why NestJS? TypeScript comfort zone. Decorators make me feel fancy. Dependency injection makes me feel like a "real" backend developer.

Why Cloud Run? Because I wanted serverless-ish (scale to zero, pay for what you use) but I also wanted to deploy a Docker container and not think about it. Plus, the cold start is under 1 second, which is faster than me making decisions in the morning.

Database: Cloud Firestore

Plot twist: This choice would later save my performance bacon. But initially, I treated it like a boring NoSQL database I had to use because I was already in the Firebase ecosystem.

Spoiler alert: Firestore's real-time listeners and offline support are chef's kiss.

AI: OpenAI Whisper + Google Gemini

  • Whisper (specifically gpt-4o-transcribe): For transcription. It's scary good. It handles my fast-talking, my accent, and even my tendency to say "um" 47 times per recording.
  • Gemini (gemini-2.5-flash): For extracting structured tasks from transcriptions. Turns "I need to call mom tomorrow at 2pm and also buy milk" into two properly formatted tasks.

Auth: Firebase Authentication

Sign in with Apple. Because it's 2025 and nobody wants another password.

Architecture diagram v1 - Backend-heavy approach

The "boring technology" crowd would approve of most of this. Except maybe the part where I initially over-engineered the backend. We'll get to that.


The First Architecture: Everything Through the Backend (AKA "Look Ma, I'm A Real Backend Developer!")

Here's how I initially architected WhisperPlan:

iOS App → NestJS Backend → Firestore → NestJS Backend → iOS App

Every. Single. Operation. Went. Through. The. Backend.

Reading tasks? Backend API call.
Creating tasks? Backend API call.
Updating a task? Backend API call.
Marking a checkbox? You guessed it—Backend API call.

Why Did This Make Sense At The Time?

Look, I had reasons:

  1. API keys need to be secret: OpenAI and Gemini keys can't live in the iOS app. ✅ Valid!
  2. Quota management for freemium: I needed to enforce "20 tasks/month on free plan" somehow. ✅ Also valid!
  3. Backend as source of truth: The backend should control everything, right? That's what we learned in school! ❓ Questionable...
  4. I just really like writing TypeScript: ❌ Not a good reason, Isidore.

So I built this beautiful, over-engineered sync service. 477 lines of TypeScript that handled:

  • Bidirectional sync (client → server, server → client)
  • Conflict resolution
  • Change tracking
  • Incremental updates
  • A complex state machine

Here's a simplified version of what that looked like:

// backend/src/sync/sync.controller.ts
@Post()
async incrementalSync(
  @CurrentUser() user: CurrentUserData,
  @Body() dto: IncrementalSyncDto,
) {
  const results = [];

  // Process all local changes from the client
  for (const change of dto.changes) {
    if (change.action === 'create') {
      const created = await this.tasksService.create(user.uid, change.data);
      results.push({ changeId: change.id, serverId: created.id });
    } else if (change.action === 'update') {
      await this.tasksService.update(user.uid, change.serverId, change.data);
      results.push({ changeId: change.id, action: 'updated' });
    } else if (change.action === 'delete') {
      await this.tasksService.delete(user.uid, change.serverId);
      results.push({ changeId: change.id, action: 'deleted' });
    }
  }

  // Fetch all changes from the server since last sync
  const serverChanges = await this.syncService.getChanges(
    user.uid,
    dto.lastSyncAt
  );

  // Check for conflicts (same resource modified in both places)
  const conflicts = this.syncService.detectConflicts(
    dto.changes,
    serverChanges
  );

  return {
    results,
    conflicts,
    serverChanges,
    syncedAt: new Date().toISOString(),
  };
}

This endpoint did EVERYTHING. It was my baby. I was so proud of it.

It was also completely unnecessary.

Sequence diagram of the original sync flow showing way too many arrows

The iOS side was equally complex:

// The old BackendSyncService.swift: 477 lines of pain
class BackendSyncService: ObservableObject {
    func syncWithBackend() async throws {
        // Collect all local changes
        let changes = try await collectLocalChanges()

        // Send to backend
        let response = try await apiClient.post("/sync", body: changes)

        // Process results
        for result in response.results {
            try await applyResult(result)
        }

        // Handle conflicts (oh god the conflicts)
        for conflict in response.conflicts {
            try await resolveConflict(conflict) // 🔥 This is fine 🔥
        }

        // Apply server changes
        for change in response.serverChanges {
            try await applyServerChange(change)
        }

        // Update last sync timestamp
        lastSyncAt = response.syncedAt
    }
}

Beautiful, right? Complex, sophisticated, enterprise-grade!

Also: slow. Very, very slow.


The Wake-Up Call: Performance Metrics (Or: "Why Is My App So Slow?")

I launched a beta. People used it. People... waited. A lot.

Then I actually measured things (novel concept, I know):

  • App startup: 2-3 seconds

    • Why? Full sync on every cold start
    • User experience? Staring at a loading spinner
  • Task completion: 500-1000ms

    • Why? API call → backend update → response → local update
    • User experience? "Did I tap that? Let me tap it again."
  • Network requests per session: 20-30 requests

    • Why? Because EVERYTHING went through the backend
    • User experience? RIP their cellular data
  • Data transferred: ~100KB per session

    • Why? Full task lists on every sync
    • User experience? Not great, Bob

The moment of clarity came when I sat with the app open and actually used it. I completed a task. I waited. The checkbox hung there in limbo. Then—half a second later—it actually checked.

Half a second to mark a checkbox.

I was making THREE API CALLS just to check a box:

  1. PATCH /tasks/:id to update the task
  2. POST /sync to fetch updated data
  3. GET /tasks to reload the list (just to be safe!)

This was ridiculous. This was over-engineering. This was... exactly the kind of thing I make fun of other developers for doing.

Time to fix it.


The Pivot: Hybrid Architecture (Or: "Wait, Firestore Has An iOS SDK?")

Here's the thing about epiphanies: they usually involve realizing you were doing something dumb all along.

I was treating Firestore like a dumb database that I had to protect behind an API. But Firestore isn't dumb. Firestore is smart. It has:

  • Real-time listeners (automatic sync)
  • Offline support (with a local cache)
  • Security rules (server-side enforcement)
  • Native SDKs (for iOS, Android, web)

I had all of this available and I was... not using it? Because of some vague notion that "backends should control everything"?

The Aha Moment

I was reading the Firestore documentation (procrastinating, really) and I saw this:

"Firestore SDKs include built-in support for offline data persistence. This feature caches a copy of the Cloud Firestore data that your app is actively using, so your app can access the data when the device is offline."

Wait. WHAT?

You mean I can:

  • Read data directly from Firestore (instant, even offline)
  • Update data directly in Firestore (instant, automatic sync)
  • Use security rules to enforce permissions (no need for backend middleware)
  • Let Firestore handle all the real-time sync magic

And I've been... wrapping everything in a REST API... for no reason?

The New Rules

I rewrote the architecture with a simple principle: Use the backend only for things that REQUIRE the backend.

Backend ONLY for:
✅ Transcription (needs OpenAI API key)
✅ AI extraction (needs Gemini API key)  
✅ Creating tasks (needs quota management)
✅ Deleting tasks (needs quota decrement)
✅ Subscription verification (needs server validation)

Firestore Direct Access for:
✅ Reading tasks/projects (real-time listeners)
✅ Updating tasks/projects (instant, offline-first)
✅ Marking tasks complete (no round-trip needed)

This is what software architects call a "hybrid approach." I call it "using the right tool for the job."
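
As a rough sketch, the split can be written down as a lookup from operation to route. The operation names and the routeFor helper are hypothetical, chosen to mirror the list above rather than taken from WhisperPlan's actual code:

```typescript
// Hypothetical operation names that mirror the list above.
type Operation =
  | "transcribe"
  | "extractTasks"
  | "createTask"
  | "deleteTask"
  | "verifySubscription"
  | "readTasks"
  | "updateTask"
  | "completeTask";

type Route = "backend" | "firestore";

// Anything that needs a secret key, a quota check, or server validation.
const BACKEND_ONLY: Operation[] = [
  "transcribe",
  "extractTasks",
  "createTask",
  "deleteTask",
  "verifySubscription",
];

// Everything else talks to Firestore directly from the client.
function routeFor(operation: Operation): Route {
  return BACKEND_ONLY.indexOf(operation) !== -1 ? "backend" : "firestore";
}

console.log(routeFor("completeTask")); // prints firestore
console.log(routeFor("createTask")); // prints backend
```

Keeping the backend list short and explicit makes it easy to spot when a new feature is about to sneak another round-trip into the hot path.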

Code Example: Before and After

Here's what completing a task looked like before:

// BEFORE: The slow way
func completeTask(_ task: TaskItem) async throws {
    // 1. Optimistic update (for perceived speed)
    task.isCompleted = true

    // 2. Send to backend
    let response = try await apiClient.patch("/tasks/\(task.serverId)", [
        "isCompleted": true,
        "completedAt": Date().ISO8601Format()
    ])

    // 3. Wait for response (500ms later...)
    task.isCompleted = response.isCompleted
    task.completedAt = response.completedAt

    // 4. Trigger sync to update other fields that might have changed
    try await backendSyncService.syncWithBackend()

    // Total time: 500-1000ms
    // Network requests: 2-3
}

And here's the new way:

// AFTER: The fast way
func completeTask(_ task: TaskItem) async throws {
    // 1. Update locally
    task.isCompleted = true
    task.completedAt = Date()
    try modelContext.save()

    // 2. Update Firestore directly
    try await firestoreService.updateTask(
        userId: userId,
        taskId: task.serverId,
        updates: [
            "isCompleted": true,
            "completedAt": task.completedAt?.iso8601String ?? NSNull(),
            "updatedAt": Date().iso8601String
        ]
    )

    // That's it! Firestore syncs to other devices automatically.
    // Total time: 50-100ms
    // Network requests: 1
}

50-100ms. 10x faster. And it works offline!

The Security Layer: Firestore Rules Are Magic

Now, you might be thinking: "But wait! If clients can write directly to Firestore, what about security? What about quota enforcement?"

This is where Firestore security rules save the day. They're basically server-side validators that run on every request:

// firestore.rules
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Helper function: Check if the user owns this resource
    function isOwner(userId) {
      return request.auth != null && request.auth.uid == userId;
    }

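    // Helper function: Validate task updates
    function validTaskUpdate() {
      let allowedFields = ['title', 'description', 'dueDate', 
                           'priority', 'isCompleted', 'completedAt',
                           'isArchived', 'updatedAt'];
      // Diff the incoming document against the stored one and require
      // that every changed key is on the allow-list
      return request.resource.data.diff(resource.data).affectedKeys()
        .hasOnly(allowedFields);
    }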

    match /users/{userId}/tasks/{taskId} {
      // Read: only if you own it
      allow read: if isOwner(userId);

      // Update: only if you own it AND the update is valid
      allow update: if isOwner(userId) && validTaskUpdate();

      // Create/Delete: NOPE! Must go through backend
      allow create, delete: if false;
    }

    match /users/{userId}/projects/{projectId} {
      allow read: if isOwner(userId);
      allow update: if isOwner(userId);
      allow create, delete: if false; // Backend only!
    }
  }
}

This is brilliant because:

  1. Users can update their own tasks (instant, offline-capable)
  2. Users CANNOT create or delete tasks (must go through backend for quota checks)
  3. Security is enforced server-side (no client can bypass this)
  4. Validation happens automatically (malformed updates are rejected)
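
To see what an allow-list check has to do, here is a plain TypeScript model of the logic: find the keys that differ between the stored and the incoming document, then require all of them to be on the allow-list. This only illustrates the idea (inside security rules, diff(), affectedKeys() and hasOnly() do this job). The affectedKeys and isValidTaskUpdate helpers are hypothetical, and comparing values via JSON.stringify is a simplification:

```typescript
type Doc = Record<string, unknown>;

const ALLOWED_FIELDS = [
  "title", "description", "dueDate", "priority",
  "isCompleted", "completedAt", "isArchived", "updatedAt",
];

// Keys that were added, removed, or changed between the two documents.
function affectedKeys(before: Doc, after: Doc): string[] {
  const allKeys = Object.keys(before).concat(Object.keys(after));
  const unique = allKeys.filter((key, index) => allKeys.indexOf(key) === index);
  return unique.filter(
    (key) => JSON.stringify(before[key]) !== JSON.stringify(after[key]),
  );
}

// Valid only if every changed key is on the allow-list.
function isValidTaskUpdate(before: Doc, after: Doc): boolean {
  return affectedKeys(before, after).every(
    (key) => ALLOWED_FIELDS.indexOf(key) !== -1,
  );
}

const stored: Doc = { title: "Buy milk", isCompleted: false, ownerId: "u1" };
console.log(isValidTaskUpdate(stored, { ...stored, isCompleted: true })); // true
console.log(isValidTaskUpdate(stored, { ...stored, ownerId: "u2" })); // false
```

The important detail is that the check looks at what changed, not at which fields happen to exist on the document.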

The backend still handles creates/deletes so I can enforce the freemium quota:

// backend/src/tasks/tasks.service.ts
async create(userId: string, dto: CreateTaskDto) {
  // Check quota for free users
  const user = await this.usersService.findOne(userId);

  if (user.plan === 'free') {
    const stats = await this.usersService.getUsageStats(userId);
    if (stats.activeTasksCount >= 25) {
      throw new ForbiddenException('Free plan limit reached');
    }
  }

  // Create task in Firestore
  const taskRef = await this.firestore
    .collection(`users/${userId}/tasks`)
    .add({
      ...dto,
      isCompleted: false,
      createdAt: FieldValue.serverTimestamp(),
      updatedAt: FieldValue.serverTimestamp(),
    });

  // Increment counter
  await this.firestore
    .doc(`users/${userId}`)
    .update({
      activeTasksCount: FieldValue.increment(1)
    });

  return { id: taskRef.id, ...dto };
}

Beautiful. Secure. Fast.

Architecture diagram v2 - Hybrid approach showing direct Firestore access

Firestore security rules visualization showing the layers

The New Sync Service: From 477 Lines to Real-Time Listeners

Remember that 477-line sync service? Gone. Replaced with this:

// ios/whisperPlan/Services/FirestoreService.swift
class FirestoreService: ObservableObject {
    private var tasksListener: ListenerRegistration?
    private var projectsListener: ListenerRegistration?

    func observeTasks(userId: String, modelContext: ModelContext) {
        tasksListener = db.collection("users/\(userId)/tasks")
            .addSnapshotListener { snapshot, error in
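                guard let documents = snapshot?.documents else {
                    // Log listener failures instead of dropping them silently
                    print("Task listener failed: \(String(describing: error))")
                    return
                }

                for document in documents {
                    let data = document.data()
                    // Copy the ID into a local constant so #Predicate can capture it
                    let documentID = document.documentID

                    // Check if task already exists locally
                    let predicate = #Predicate<TaskItem> { 
                        $0.serverId == documentID 
                    }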
                    let existing = try? modelContext.fetch(
                        FetchDescriptor(predicate: predicate)
                    ).first

                    if let task = existing {
                        // Update existing task
                        task.title = data["title"] as? String ?? ""
                        task.isCompleted = data["isCompleted"] as? Bool ?? false
                        // ... update other fields
                    } else {
                        // Create new task
                        let task = TaskItem(from: data, serverId: document.documentID)
                        modelContext.insert(task)
                    }
                }

                try? modelContext.save()
            }
    }

    func updateTask(userId: String, taskId: String, 
                    updates: [String: Any]) async throws {
        try await db.document("users/\(userId)/tasks/\(taskId)")
            .updateData(updates)
        // Done! Firestore will notify all listeners automatically
    }
}

That's it. Real-time sync. Offline support. Automatic conflict resolution (last-write-wins: whichever write reaches Firestore last is the one every listener ends up with). All in about 100 lines.
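
Last-write-wins itself is simple to picture. The sketch below compares updatedAt values and keeps the newer copy. It illustrates the policy rather than Firestore's internals; the TaskSnapshot shape and the lastWriteWins helper are assumptions, and comparing client timestamps only works as well as the device clocks agree:

```typescript
interface TaskSnapshot {
  title: string;
  isCompleted: boolean;
  updatedAt: string; // ISO 8601 timestamp, e.g. "2025-01-05T10:00:00Z"
}

// Keeps whichever copy has the later updatedAt; on a tie the remote copy wins.
function lastWriteWins(local: TaskSnapshot, remote: TaskSnapshot): TaskSnapshot {
  return Date.parse(local.updatedAt) > Date.parse(remote.updatedAt) ? local : remote;
}

const phone: TaskSnapshot = {
  title: "Call mom",
  isCompleted: true,
  updatedAt: "2025-01-05T10:05:00Z",
};
const tablet: TaskSnapshot = {
  title: "Call mom",
  isCompleted: false,
  updatedAt: "2025-01-05T10:00:00Z",
};

console.log(lastWriteWins(phone, tablet).isCompleted); // true (the phone wrote later)
```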

The BackendSyncService still exists, but it's now down to 150 lines and only handles:

  • Creating tasks (via backend API)
  • Deleting tasks (via backend API)
  • Creating projects (via backend API)
  • Deleting projects (via backend API)

Everything else? Direct Firestore access.


The Results: Numbers Don't Lie (Finally, Some Good News!)

After rewriting the architecture, I measured again:

Metric            | Before      | After     | Improvement
App startup       | 2-3s        | 0.5s      | 4-6x faster
Task completion   | 500-1000ms  | 50-100ms  | 10x faster
Data transferred  | ~100KB      | ~5KB      | 95% less
Network requests  | 20-30       | 3-5       | 80% less
Cloud Run costs   | $20-30/mo   | $5-10/mo  | 70% savings

But the numbers don't tell the whole story. The app feels different now:

Offline Mode Just Works™

Because Firestore's SDK has built-in persistence, offline mode is basically automatic:

  1. User modifies a task (no internet)
  2. Firestore writes to local cache
  3. UI updates instantly
  4. When internet returns, Firestore syncs automatically
  5. Other devices get updates via real-time listeners

No complex queue system. No manual retry logic. No sync conflicts to resolve manually. It just works.

Real-Time Sync Between Devices

The Firestore listeners mean that if I complete a task on my iPhone, it appears as completed on my iPad instantly. No polling. No manual refresh. Magic.

// This is all you need for real-time sync:
firestoreService.observeTasks(userId: userId, modelContext: modelContext)

// Firestore handles:
// - Initial data load
// - Real-time updates
// - Conflict resolution
// - Offline caching
// - Automatic reconnection
// - Everything

The first time I saw this work, I literally said "woah" out loud like I was in The Matrix.


Lessons Learned for Indie Devs

1. Start Simple, But Not Too Simple

My mistake wasn't building a backend—it was building TOO MUCH backend.

What I should have done from the start:

  • Use Firestore's native SDKs for CRUD operations
  • Use backend only for things that require secrets or business logic
  • Embrace platform capabilities instead of abstracting them away

The trap I fell into:
"If I'm building a backend, everything should go through the backend!"

No. Just because you have a hammer doesn't mean everything is a nail. Sometimes things are screws. Or maybe they don't need fastening at all and you're just adding complexity.

2. Measure Before Optimizing (But Also Actually Measure)

I violated both parts of this:

  1. I didn't measure initially (just assumed it was fine)
  2. I didn't optimize until it was obviously slow

The right approach:

  • Add basic performance monitoring from day 1
  • Set acceptable targets (e.g., "task completion < 200ms")
  • Measure again after major changes
  • Let data guide your decisions

Tools I wish I'd used earlier:

  • Firebase Performance Monitoring (literally free)
  • Xcode Instruments (already installed)
  • Backend latency logging (one line of code)
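
Backend latency logging really can be tiny. A minimal sketch of a timing wrapper with a budget, where timed and budgetMs are hypothetical names and Date.now() is used for simplicity (millisecond resolution only):

```typescript
interface Timed<T> {
  result: T;
  elapsedMs: number;
}

// Runs fn, measures wall-clock time, and flags calls that exceed the budget.
async function timed<T>(
  label: string,
  budgetMs: number,
  fn: () => Promise<T>,
): Promise<Timed<T>> {
  const start = Date.now();
  const result = await fn();
  const elapsedMs = Date.now() - start;
  const verdict = elapsedMs <= budgetMs ? "ok" : "SLOW";
  console.log(`[perf] ${label}: ${elapsedMs}ms (budget ${budgetMs}ms) ${verdict}`);
  return { result, elapsedMs };
}

// Usage: wrap any async call, here a stand-in task completion with a 200ms target.
timed("completeTask", 200, async () => "done");
```

Once the budget lives next to the call, a regression shows up in the logs long before it shows up in a beta tester's complaint.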

3. Security Rules Are Your Friend (And Surprisingly Powerful)

Firestore security rules are essentially a DSL for server-side validation. They're:

  • Type-aware: You can reject wrong field types with explicit checks (for example: request.resource.data.title is string)
  • Composable: Functions can call other functions
  • Testable: Firebase Emulator lets you test them locally
  • Fast: Run on Google's infrastructure, not your backend

This snippet alone saved me from having to write an entire middleware layer:

function validTaskUpdate() {
  // Only allow specific fields to be updated
  let allowedFields = ['title', 'description', 'isCompleted', 
                       'completedAt', 'priority', 'dueDate',
                       'isArchived', 'updatedAt'];
  // Keys that differ between the stored document and the incoming one
  let changedFields = request.resource.data.diff(resource.data).affectedKeys();
  return changedFields.hasOnly(allowedFields);
}

4. The Freemium Quota Problem (Solved!)

Challenge: How do you enforce "25 tasks max" when clients write directly to Firestore?

Solution: Split operations by privilege level:

  • Creates/Deletes: Must go through backend (quota enforcement)
  • Reads/Updates: Direct Firestore access (no quota needed)

Backend code for enforcement:

async create(userId: string, dto: CreateTaskDto) {
  const user = await this.usersService.findOne(userId);

  // Check quota
  if (user.plan === 'free' && user.activeTasksCount >= 25) {
    throw new ForbiddenException({
      error: 'task_limit_reached',
      message: 'Free plan allows 25 active tasks. Upgrade to Pro for unlimited tasks.',
      currentCount: user.activeTasksCount,
      limit: 25
    });
  }

  // Create task
  const taskRef = await this.firestore
    .collection(`users/${userId}/tasks`)
    .add(dto);

  // Increment counter atomically
  await this.firestore.doc(`users/${userId}`).update({
    activeTasksCount: FieldValue.increment(1)
  });

  return { id: taskRef.id };
}

This way:

  • Free users can't bypass the limit (creates must use backend)
  • Users get instant updates (no need for backend roundtrips)
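  • The counter increment is atomic (the quota check and the create are still separate steps, so wrap them in a transaction if simultaneous creates at the limit worry you)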
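
One caveat worth sketching: an atomic increment does not make a check-then-create sequence atomic. Below is a minimal in-memory model of checking the quota and claiming a slot in a single step. The UserDoc shape and reserveTaskSlot are hypothetical; with the Admin SDK, the same check and increment would typically live inside firestore.runTransaction(...):

```typescript
interface UserDoc {
  plan: "free" | "pro";
  activeTasksCount: number;
}

const FREE_PLAN_LIMIT = 25;

// Checks the quota and claims a slot in one step, so two concurrent
// creates cannot both pass the check before either one increments.
function reserveTaskSlot(user: UserDoc): boolean {
  if (user.plan === "free" && user.activeTasksCount >= FREE_PLAN_LIMIT) {
    return false; // limit reached: the caller should respond with 403
  }
  user.activeTasksCount += 1;
  return true;
}

const freeUser: UserDoc = { plan: "free", activeTasksCount: 24 };
console.log(reserveTaskSlot(freeUser)); // true (24 -> 25)
console.log(reserveTaskSlot(freeUser)); // false (already at 25)
```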

5. Offline-First Is Easier Than You Think

I thought offline support meant:

  • Complex queueing system
  • Manual conflict resolution
  • Edge case nightmares

Reality:

// Persistence is on by default on iOS; this makes it explicit and lifts the cache size cap
let settings = FirestoreSettings()
settings.isPersistenceEnabled = true
settings.cacheSizeBytes = FirestoreCacheSizeUnlimited
db.settings = settings

// That's it. You now have offline support.

Firestore handles:

  • Local caching
  • Offline writes (stored locally)
  • Automatic sync when online
  • Conflict resolution (last-write-wins by default; reach for transactions or security rules when you need something stricter)

Combined with SwiftData for app-level caching, the user never sees "no connection" errors. They just... use the app.


The AI Integration: Voice to Tasks (The Fun Part!)

Okay, let's talk about the actual point of WhisperPlan: turning voice into tasks.

Step 1: Transcription (OpenAI Whisper)

When the user records audio, the iOS app sends it to my backend:

// ios/whisperPlan/Services/AudioRecordingService.swift
func transcribeRecording() async throws -> String {
    guard let audioData = try? Data(contentsOf: recordingURL) else {
        throw RecordingError.fileNotFound
    }

    // Send to backend
    let response = try await apiClient.post("/transcription", 
        multipart: [
            "file": audioData,
            "language": preferredLanguage
        ]
    )

    return response.text
}

Backend forwards it to OpenAI:

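// backend/src/transcription/transcription.service.ts
// Imports this method relies on (the class wrapper is omitted here)
import axios from 'axios';
import FormData from 'form-data'; // the npm package that provides getHeaders()
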
async transcribe(audioBuffer: Buffer, filename: string, language?: string) {
  const formData = new FormData();
  formData.append('file', audioBuffer, { filename });
  formData.append('model', 'gpt-4o-transcribe'); // The good stuff
  formData.append('response_format', 'json');

  if (language) {
    formData.append('language', language);
  }

  const response = await axios.post(
    'https://api.openai.com/v1/audio/transcriptions',
    formData,
    {
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        ...formData.getHeaders()
      },
      timeout: 120000 // 2 minutes for large files
    }
  );

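  return {
    text: response.data.text,
    // With response_format 'json' only `text` is guaranteed; treat the
    // next two as optional, since they may come back undefined
    language: response.data.language,
    duration: response.data.duration
  };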
}

The gpt-4o-transcribe model is scary good. It handles:

  • Multiple languages (automatic detection)
  • Accents and speech patterns
  • Background noise
  • Um's, uh's, and filler words (filtered out)
  • Natural pauses and context

Step 2: Task Extraction (Google Gemini)

Now I have text: "I need to call mom tomorrow at 2pm and also buy milk and don't forget to finish the blog post by Friday"

Time to turn that into structured tasks. Enter Google Gemini:

// backend/src/tasks/tasks.service.ts (simplified)
async extractTasksFromTranscription(
  userId: string,
  transcription: string,
  language: string
) {
  // Get user's existing projects for context
  const projects = await this.projectsService.findAll(userId);
  const projectNames = projects.map(p => p.name);

  // Build a context-aware prompt with:
  // - Current date and time (for relative date parsing)
  // - User's project list (for automatic categorization)
  // - Language preference (for better understanding)
  // - Clear JSON schema definition
  // - Few-shot examples (3-5 examples work best)
  const prompt = buildTaskExtractionPrompt({
    transcription,
    currentDate: new Date(),
    projects: projectNames,
    language,
  });

  const response = await this.geminiService.generate(prompt);
  const extracted = JSON.parse(response.text);

  // Validate and sanitize the output
  return this.validateExtractedTasks(extracted.tasks, projectNames);
}
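
One fragile spot in the snippet above is the bare JSON.parse: it throws on anything that is not clean JSON, and models sometimes wrap their answer in a Markdown code fence. A defensive parser might look like the sketch below. parseModelJson is a hypothetical helper, not part of any Gemini SDK, and treating bad output as "no tasks" is just one possible policy:

```typescript
interface RawExtraction {
  tasks: unknown[];
}

// Parses model output defensively: strips an optional Markdown code fence
// (three backticks, written as \x60 in the patterns below) and never throws.
function parseModelJson(text: string): RawExtraction {
  const unfenced = text
    .trim()
    .replace(/^\x60{3}(?:json)?\s*/i, "")
    .replace(/\s*\x60{3}$/, "");
  try {
    const parsed: unknown = JSON.parse(unfenced);
    if (typeof parsed === "object" && parsed !== null) {
      const tasks = (parsed as { tasks?: unknown }).tasks;
      if (Array.isArray(tasks)) {
        return { tasks };
      }
    }
  } catch {
    // Malformed JSON falls through to the empty result below.
  }
  return { tasks: [] };
}

console.log(parseModelJson('{"tasks":[{"title":"Buy milk"}]}').tasks.length); // 1
console.log(parseModelJson("Sorry, I could not find any tasks.").tasks.length); // 0
```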

Prompt Engineering Lessons I Learned the Hard Way:

The quality of task extraction lives or dies by your prompt. Here's what actually matters:

1. Context is King
Don't just send the transcription. Send:

  • Current date/time: LLMs need this to parse "tomorrow", "next Friday", "in 2 hours"
  • User's projects: Helps the AI categorize tasks automatically
  • Language: Even if it can auto-detect, being explicit helps
  • Time zone: If your users are global, this matters for "tomorrow at 9am"
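
A prompt builder that carries this context might look like the following. It is only a sketch: buildPromptSketch, its input shape, and the wording of the prompt are assumptions, not the actual buildTaskExtractionPrompt used in WhisperPlan:

```typescript
interface PromptInput {
  transcription: string;
  currentDate: Date;
  timeZone: string; // IANA zone name, e.g. "Europe/Berlin"
  projects: string[];
  language: string;
}

// Puts the context the model cannot guess (date, time zone, projects)
// ahead of the transcription itself.
function buildPromptSketch(input: PromptInput): string {
  const projects = input.projects.length > 0 ? input.projects.join(", ") : "(none)";
  return [
    "Extract actionable tasks from the transcription below.",
    `Current date/time (UTC): ${input.currentDate.toISOString()}`,
    `User time zone: ${input.timeZone}`,
    `Language: ${input.language}`,
    `Existing projects: ${projects}`,
    'Respond with JSON only, shaped as {"tasks": [...]}.',
    "Transcription:",
    input.transcription,
  ].join("\n");
}

console.log(
  buildPromptSketch({
    transcription: "call mom tomorrow at 2pm and buy milk",
    currentDate: new Date("2025-01-05T10:00:00Z"),
    timeZone: "Europe/Berlin",
    projects: ["Family", "Groceries"],
    language: "en",
  }),
);
```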

2. Structure Your Output Schema Clearly
Be extremely specific about the JSON structure you want. I use TypeScript-style definitions right in the prompt:

{
  "tasks": Array<{
    title: string;           // Max 100 chars
    description?: string;    // Optional, max 500 chars
    dueDate?: "YYYY-MM-DD";  // ISO format only
    priority: "low" | "normal" | "high";
    // ... etc
  }>
}

3. Few-Shot Examples Are Worth 1000 Words
Include 3-5 examples showing:

  • Simple case: "Buy milk" → single task, no date
  • Complex case: "Call John tomorrow at 2pm and email the report by Friday" → two tasks with different due dates
  • Edge cases: Ambiguous priorities, vague timings, multiple projects mentioned

4. Be Explicit About Edge Cases
Tell the LLM what to do when:

  • No actionable items exist ("Just thinking out loud...")
  • Dates are ambiguous ("Friday" when it's currently Thursday)
  • Projects don't match existing ones (create new vs. ignore)
  • Priority isn't mentioned (default to "normal")
  • Multiple tasks are crammed into one sentence
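
Some of these cases can also be pinned down in code rather than left to the model. For the weekday example, one possible policy (an assumption, not necessarily what WhisperPlan does) is that a bare weekday name always means its next occurrence strictly after today. nextWeekday is a hypothetical helper and works in UTC for simplicity:

```typescript
// dayOfWeek uses the Date.getUTCDay() convention: 0 = Sunday ... 6 = Saturday.
// Returns the next occurrence strictly after `from`, as YYYY-MM-DD (UTC).
function nextWeekday(from: Date, dayOfWeek: number): string {
  // Same weekday as today resolves to a week later, never to today.
  const daysAhead = (dayOfWeek - from.getUTCDay() + 7) % 7 || 7;
  const result = new Date(
    Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate() + daysAhead),
  );
  return result.toISOString().slice(0, 10);
}

const thursday = new Date("2025-01-02T09:00:00Z"); // a Thursday
console.log(nextWeekday(thursday, 5)); // 2025-01-03 (Friday is tomorrow)
console.log(nextWeekday(thursday, 4)); // 2025-01-09 (Thursday means next week)
```

Whatever policy you pick, state the same rule in the prompt so the model and the validator agree.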

5. Validate Everything
LLMs hallucinate. Your code should:

function validateExtractedTasks(tasks, validProjects) {
  return tasks
    .filter(task => task.title && task.title.length > 0)
    .map(task => ({
      ...task,
      // Clamp priority to valid values
      priority: ['low', 'normal', 'high'].includes(task.priority) 
        ? task.priority 
        : 'normal',
      // Validate project exists
      project: validProjects.includes(task.project) 
        ? task.project 
        : null,
      // Ensure date is valid
      dueDate: isValidDate(task.dueDate) ? task.dueDate : null,
    }));
}
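
The validator above calls isValidDate, which is not shown. One way it could be written, assuming dates arrive as YYYY-MM-DD strings as the schema requires (this is a sketch, not the app's actual helper):

```typescript
// Accepts only real calendar dates in strict YYYY-MM-DD form.
function isValidDate(value: unknown): value is string {
  if (typeof value !== "string") return false;
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) return false;
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  // Date.UTC normalises overflow (Feb 30 becomes Mar 2), so build the date
  // and check that it round-trips to the same year, month, and day.
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isValidDate("2025-02-28")); // true
console.log(isValidDate("2025-02-30")); // false
console.log(isValidDate("next Friday")); // false
```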

6. Iterate Based on Real Usage
My first prompt worked 60% of the time. After analyzing 100+ failed extractions, I discovered patterns:

  • Users say "urgent" but mean "high priority"
  • "This week" is ambiguous (does it include today?)
  • British vs American date formats cause confusion
  • Some users dictate entire emails, not just tasks

Each discovery led to prompt tweaks and validation rules.

The Result:

With good prompt engineering and validation, the system now handles:

  • ✅ Multiple tasks in one recording
  • ✅ Relative date parsing ("tomorrow", "next week", "in 3 days")
  • ✅ Time extraction ("at 2pm", "in the morning", "by end of day")
  • ✅ Priority detection (from context like "urgent", "important", "when you have time")
  • ✅ Project categorization (matches against existing projects)
  • ✅ Natural language variations (handles different phrasings of the same intent)

Example flow:

User says: "Call mom tomorrow at 2pm, buy milk, 
            and finish that blog post by Friday - make it urgent"

Transcription: "call mom tomorrow at 2pm and buy milk and 
                finish that blog post by Friday make it urgent"

AI extracts: 3 tasks with proper structure
  ↓ Task 1: "Call mom" - tomorrow, 2pm, normal priority
  ↓ Task 2: "Buy milk" - no date, normal priority  
  ↓ Task 3: "Finish blog post" - Friday, high priority

Validation: Check dates are valid, projects exist, priorities are sane

Result: 3 properly structured tasks saved to database

Success rate after optimization: ~95%

The remaining 5% are usually edge cases like:

  • Very long, rambling recordings with no clear tasks
  • Heavy background noise affecting transcription
  • Extremely vague task descriptions ("do that thing")
  • Uncommon date formats or ambiguous references

For these cases, users can manually edit the extracted tasks before saving.

The AI pipeline: Voice recording → Whisper transcription → Gemini extraction → Structured tasks in Firestore → Real-time sync to app. The whole process takes 3-5 seconds.

Cost Considerations:

As an indie dev, API costs matter. Here's what I learned:

  • Whisper (gpt-4o-transcribe): ~$0.006 per minute of audio

    • Tip: Limit recordings to 2 minutes to keep costs predictable
    • Most task lists can be dictated in under 30 seconds
  • Gemini (gemini-2.5-flash): ~$0.00001 per request

    • Super cheap, even with long prompts
    • Flash model is fast enough (200-500ms response time)
  • Total cost per transcription: ~$0.01 on average

    • With 20 free transcriptions/month, that's $0.20 per free user
    • Totally sustainable for a freemium model
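
The arithmetic behind these numbers, as a quick sketch using the per-minute and per-request prices quoted above (the function names are made up for illustration). With the 2-minute cap it puts the ceiling for 20 recordings at about $0.24, and at about $0.06 if recordings average 30 seconds:

```typescript
const TRANSCRIBE_USD_PER_MINUTE = 0.006; // transcription price quoted above
const GEMINI_USD_PER_REQUEST = 0.00001; // extraction price quoted above

// Cost of one recording: transcription scales with audio length,
// extraction is a flat per-request amount.
function costPerRecording(audioSeconds: number): number {
  return (audioSeconds / 60) * TRANSCRIBE_USD_PER_MINUTE + GEMINI_USD_PER_REQUEST;
}

function monthlyCostPerFreeUser(recordings: number, audioSeconds: number): number {
  return recordings * costPerRecording(audioSeconds);
}

// Worst case: 20 recordings at the 2-minute cap.
console.log(monthlyCostPerFreeUser(20, 120).toFixed(4)); // 0.2402
// Typical case: 20 recordings of about 30 seconds.
console.log(monthlyCostPerFreeUser(20, 30).toFixed(4)); // 0.0602
```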

Is it perfect? No. Does it work 95% of the time? Yes. And that remaining 5% can be edited manually—which is exactly the right tradeoff for a v1 product.


The ADHD-Friendly Features (Because Dopamine Matters)

Building for ADHD meant focusing on:

  1. Reducing friction (voice input, instant feedback)
  2. Gamification (dopamine hits for completing tasks)
  3. Focus (one thing at a time, hide distractions)
  4. Consistency (daily briefings, streaks, reminders)

Celebrations & Confetti

When you complete a task, you get:

  • ✨ Confetti animation
  • 💬 Encouraging message ("You're crushing it!" or "One down, more to go!")
  • 📳 Haptic feedback (varies by intensity—first task of the day gets a gentle tap, completing everything gets a double-tap)
  • 📊 Streak counter (maintain your momentum!)
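
// Services/CelebrationService.swift
/// Scales the celebration to the moment: a small acknowledgement for a
/// routine completion, full confetti when everything is done.
func celebrate(for task: TaskItem, context: CelebrationContext) {
    let intensity = determineIntensity(context)

    switch intensity {
    case .light:
        // Haptic and a short message only, no confetti
        triggerHaptic(.medium)
        showMessage("Nice work!")
    case .medium:
        // Success haptic plus a standard confetti burst
        triggerHaptic(.success)
        showConfetti()
        showMessage("You're on fire! 🔥")
    case .intense:
        // Everything is done: double haptic and extra confetti
        triggerHaptic(.doubleSuccess)
        showConfetti(amount: .lots)
        showMessage("ALL DONE! Take a break, you earned it! 🎉")
    }
}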

Confetti animation when completing a task
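The `determineIntensity(context)` call in `celebrate` isn't shown in this post. A minimal sketch of how it might work, assuming a `CelebrationContext` that carries today's counts (the field names and the exact rules here are hypothetical):

```swift
// Hypothetical types: the post names CelebrationContext and the three
// intensity cases but does not show their definitions.
enum CelebrationIntensity {
    case light, medium, intense
}

struct CelebrationContext {
    var completedToday: Int   // tasks completed today, including this one
    var remainingToday: Int   // tasks still open for today
}

func determineIntensity(_ context: CelebrationContext) -> CelebrationIntensity {
    // Finishing everything for the day earns the biggest celebration.
    if context.remainingToday == 0 { return .intense }
    // The first completion of the day stays gentle.
    if context.completedToday <= 1 { return .light }
    // Anything in between gets the standard confetti.
    return .medium
}
```

Checking "everything done" first means a day with a single task still ends in the full celebration instead of the gentle one.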

Daily Briefing Notifications

Every morning at 8am (customizable), WhisperPlan sends a notification:

"Good morning! You have 5 tasks today. Top priority: Finish blog post"

Tap the notification → opens daily briefing view with:

  • Weather-appropriate greeting
  • Task count
  • Top 3 priority tasks
  • "Start Focus Mode" button
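
// Services/NotificationService.swift
import UserNotifications

func scheduleDailyBriefing(time: DateComponents) {
    let content = UNMutableNotificationContent()
    content.title = "Good morning!"
    // The body is fixed when the request is scheduled, so call this again
    // whenever today's tasks change to keep the count and top task current.
    content.body = "You have \(taskCount) tasks today. Top priority: \(topTask.title)"
    content.sound = .default

    // Set only hour and minute in `time` so the trigger fires every day.
    let trigger = UNCalendarNotificationTrigger(
        dateMatching: time,
        repeats: true
    )

    // Reusing the identifier replaces any previously scheduled briefing.
    let request = UNNotificationRequest(
        identifier: "daily-briefing",
        content: content,
        trigger: trigger
    )

    UNUserNotificationCenter.current().add(request) { error in
        if let error = error {
            print("Failed to schedule daily briefing: \(error)")
        }
    }
}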
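How the briefing picks its "top priority" and "top 3" tasks could look something like the sketch below. It uses only Foundation so it runs outside the app. `BriefingTask`, `Priority`, the tie-breaking rule, and the empty-day message are assumptions for illustration.

```swift
import Foundation

// Hypothetical model; the app's real task type is not shown in this post.
enum Priority: Int, Comparable {
    case low = 0, normal, high
    static func < (lhs: Priority, rhs: Priority) -> Bool { lhs.rawValue < rhs.rawValue }
}

struct BriefingTask {
    let title: String
    let priority: Priority
    let dueDate: Date?
}

/// Highest priority first; ties go to the earliest due date, undated tasks last.
func topTasks(_ tasks: [BriefingTask], limit: Int = 3) -> [BriefingTask] {
    let sorted = tasks.sorted { a, b in
        if a.priority != b.priority { return a.priority > b.priority }
        return (a.dueDate ?? .distantFuture) < (b.dueDate ?? .distantFuture)
    }
    return Array(sorted.prefix(limit))
}

/// Builds the notification body shown in the morning briefing.
func briefingBody(for tasks: [BriefingTask]) -> String {
    guard let top = topTasks(tasks, limit: 1).first else {
        return "Nothing scheduled today. Enjoy the breathing room!"
    }
    let noun = tasks.count == 1 ? "task" : "tasks"
    return "You have \(tasks.count) \(noun) today. Top priority: \(top.title)"
}

// 8:00 every day: only hour and minute are set, so a repeating
// calendar trigger matches once per day.
var briefingTime = DateComponents()
briefingTime.hour = 8
briefingTime.minute = 0
```

A `DateComponents` value built this way is what a function like `scheduleDailyBriefing(time:)` would take, and the customizable time is just a different hour and minute.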

What's Next: The Roadmap

WhisperPlan Beta is live on TestFlight now! You can find it at https://testflight.apple.com/join/5XCdyGDr

What's next:

  1. Better analytics: Time-tracking insights, productivity patterns, weekly/monthly summaries
  2. Collaboration: Share projects with others, assign tasks, real-time updates
  3. More AI features: Smart scheduling, task priority suggestions, context-aware reminders
  4. Apple Watch app: Quick voice recording, timer control, task completion from wrist
  5. Siri Shortcuts: "Hey Siri, add task" → voice recording → tasks created

Questions for the community:

  • What features would you want in a voice-first todo app?
  • How do you handle task organization? (Projects? Tags? Contexts?)
  • What's your biggest pain point with existing todo apps?

Drop your thoughts in the comments! I'm actively building based on feedback.


Conclusion: Build, Measure, Rebuild (And That's Okay!)

Here's what I learned building WhisperPlan:

1. It's okay to get the architecture wrong the first time.
I built a slow, over-engineered backend-heavy app. Then I measured it. Then I fixed it. That's not failure—that's iteration.

2. Use platform capabilities instead of abstracting them away.
Firestore has an iOS SDK. SwiftData has offline support. Firebase has real-time listeners. I could have saved myself weeks by using these from the start.

3. Measure everything.
I didn't measure initially. That was dumb. When I finally measured, I found obvious problems. Now I measure everything.

4. Security rules are underrated.
Firestore security rules let you have the speed of direct database access with the security of backend validation. This is the secret sauce of the hybrid architecture.

5. The joy of indie development: you can rewrite everything.
No committees. No architectural review boards. No "but we've always done it this way." Just you, your code, and the freedom to say "this is dumb, let's make it better."

6. Build for yourself first.
I built WhisperPlan because I needed it. I have ADHD. I hate typing on my phone. I forget things constantly. Every feature is something I wanted. And that authenticity shows.

WhisperPlan is live. It's fast. It's voice-first. It has confetti.

Try it out: whisperplan.app

And if you're building your own indie app, remember: it's okay to rewrite half of it. Sometimes that's exactly what you need to do.


Acknowledgments

Thanks to:

  • Everyone who beta tested and gave feedback
  • The SwiftUI community for endless Stack Overflow answers
  • Firebase for building such amazing tools
  • My therapist for helping me manage the ADHD that inspired this app
  • Coffee. So much coffee.

If you made it this far, you're amazing. Go build something cool. Or take a nap. Both are valid choices.

— Isidore

P.S. If you have ADHD and this app sounds useful, try it out! And if you don't have ADHD but you hate typing on your phone, also try it out! And if you're a developer curious about hybrid architectures, I hope this post was helpful!

P.P.S. The backend code is not open source, but if you have specific questions about the architecture, hit me up in the comments.
