Noted Team

Meeting Transcription: Cloud vs On-Device Privacy Guide

Understand the privacy tradeoffs between cloud and on-device meeting transcription. Learn how Whisper AI enables local processing and why data sovereignty matters.

Meeting transcription privacy is not a feature checkbox. It is an architectural decision that determines who has access to your most sensitive conversations. In 2026, the gap between cloud-processed and on-device transcription has never been wider. This guide explains what happens to your audio in both models and why it matters.

How Cloud Transcription Works

When you use most AI meeting tools, your audio takes this journey:

  1. Capture. Your microphone (or a bot in the meeting) records raw audio
  2. Upload. The audio file is transmitted over the internet to the provider's cloud infrastructure
  3. Processing. A speech-to-text model (typically hosted on AWS, GCP, or Azure) converts audio to text
  4. AI Enhancement. The transcript is sent to another service (often OpenAI's API) for summarization, action item extraction, and other intelligence
  5. Storage. The transcript and sometimes the raw audio are stored in the provider's database
  6. Delivery. The processed results are sent back to your device

At minimum, your audio passes through two different cloud platforms, where it must be decrypted for processing. In many cases, three or more organizations handle your data before you see a transcript.
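To make that concrete, here is a toy Python model of the journey above. The step and party names are illustrative assumptions, not any specific vendor's architecture; the point is simply counting how many distinct organizations touch the data before results come back:

```python
# Toy model of the cloud transcription data flow described above.
# Party names are illustrative, not a specific vendor's stack.
CLOUD_PIPELINE = [
    ("capture",        "you"),                   # raw audio recorded locally
    ("upload",         "meeting-tool vendor"),   # audio leaves your machine
    ("speech-to-text", "cloud host (e.g. AWS)"), # decrypted for processing
    ("ai-enhancement", "LLM API provider"),      # transcript sent onward
    ("storage",        "meeting-tool vendor"),   # transcript (and often audio) retained
    ("delivery",       "you"),
]

def organizations_handling_data(pipeline):
    """Distinct third parties (everyone but you) that handle the data."""
    return {party for _, party in pipeline if party != "you"}
```

Even this minimal six-step flow involves three separate organizations, and each one is a place the data can leak, be subpoenaed, or be retained.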

The Risks Are Not Hypothetical

Cloud processing creates concrete risks:

  • Data breaches. Any server that holds your audio is a target. Meeting recordings are high-value data. They contain strategy discussions, financial projections, personnel decisions, and intellectual property.
  • Subpoena exposure. Data stored on US servers is subject to US law enforcement access, regardless of where the meeting participants are located.
  • Training data concerns. Some providers use customer data to improve their models. Even with opt-out policies, the legal language often leaves room for anonymized usage.
  • Third-party API sharing. When a meeting tool uses OpenAI's API for summarization, your transcript travels to yet another company's infrastructure. OpenAI's API data retention policy has changed multiple times since 2023.
  • Employee access. Cloud providers typically have internal access controls, but employees can and do access customer data, whether for debugging, quality assurance, or malicious purposes.

How On-Device Transcription Works

On-device meeting transcription takes a fundamentally different approach:

  1. Capture. System audio and microphone input are recorded locally
  2. Processing. A speech-to-text model running on your hardware converts audio to text
  3. AI Enhancement. Local language models extract summaries, action items, and intelligence
  4. Storage. Everything stays in a local database on your machine

No upload. No third-party processing. No server-side storage. The audio never touches a network interface.
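The four steps above fit in a few lines. This is a minimal sketch, not Noted's actual code: the function names and in-memory "database" are illustrative stubs, and the transcription step is where a real app would invoke a local model such as Whisper.

```python
# Minimal sketch of an on-device transcription pipeline.
# All names are illustrative; nothing below opens a network connection.

def capture_audio() -> bytes:
    """Step 1: record system audio + microphone locally (stubbed)."""
    return b"raw-pcm-audio"

def transcribe_locally(audio: bytes) -> str:
    """Step 2: run a speech-to-text model on this machine (stubbed).
    A real app would call a local model here, e.g. Whisper."""
    return "We agreed to ship the beta on Friday."

def extract_intelligence(transcript: str) -> dict:
    """Step 3: local models pull out summaries and action items (stubbed)."""
    return {"summary": transcript, "action_items": ["ship the beta"]}

LOCAL_DB: list[dict] = []  # Step 4: stands in for an on-disk SQLite database

def run_pipeline() -> dict:
    audio = capture_audio()
    transcript = transcribe_locally(audio)
    record = extract_intelligence(transcript)
    LOCAL_DB.append(record)  # stays on this machine
    return record
```

Every function runs on local hardware; there is no upload step to omit, because the pipeline never had one.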

What Makes On-Device Possible in 2026

On-device transcription was impractical five years ago. Two developments changed that:

Apple Silicon performance. The M-series chips (M1 through M4) include dedicated Neural Engine cores designed for machine learning inference. An M1 MacBook Air can run Whisper's base model in real time. An M3 Pro handles the large model comfortably.

Whisper by OpenAI. Released as open-source in 2022, Whisper is a speech recognition model that runs locally. It supports over 90 languages and approaches human-level accuracy on clear audio. The model weights are freely available, so anyone can run it without an API call.

Together, these make on-device transcription not just possible but practical for everyday use on modern Macs.

Whisper Model Sizes and Tradeoffs

Whisper comes in several sizes. Larger models are more accurate but require more processing power:

| Model | Size | Speed Factor | Best For |
|--------|--------|---------------|----------|
| Tiny | 75 MB | 32x real-time | Quick drafts, low-power devices |
| Base | 142 MB | 16x real-time | Daily use on any Apple Silicon Mac |
| Small | 466 MB | 6x real-time | Good accuracy for most meetings |
| Medium | 1.5 GB | 2x real-time | High accuracy, technical vocabulary |
| Large | 3 GB | 1x real-time | Maximum accuracy, M2 Pro+ recommended |

Speed factor means how much faster than real-time the model processes audio on a typical M2 Mac. A 60-minute meeting processed at 6x takes about 10 minutes with the Small model.
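The arithmetic is just meeting length divided by the speed factor. A quick sanity check using the typical-M2 figures from the table above (real throughput varies by machine and audio quality):

```python
# Estimate local processing time from the speed factors quoted above.
# Figures are the typical-M2 numbers from the table; actual speed varies.
SPEED_FACTOR = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large": 1}

def processing_minutes(meeting_minutes: float, model: str) -> float:
    """Approximate wall-clock minutes to transcribe a meeting locally."""
    return meeting_minutes / SPEED_FACTOR[model]

# A 60-minute meeting with the Small model: 60 / 6 = 10 minutes.
```

The same 60-minute meeting takes under four minutes with Base, or the full hour with Large.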

Data Sovereignty: Why It Matters

Data sovereignty is the principle that data is subject to the laws of the country where it is stored. When your meeting audio goes to a US-based cloud service:

  • It falls under US jurisdiction, including the CLOUD Act (which allows US law enforcement to compel disclosure of data stored abroad by US companies)
  • GDPR compliance becomes complex. Transferring EU citizen data to US servers requires specific legal mechanisms that have been repeatedly challenged in European courts
  • Industry regulations like HIPAA (healthcare), SOX (financial), and attorney-client privilege create additional constraints on where conversation data can reside

On-device processing sidesteps all of this. Your data stays on your hardware in your jurisdiction. There is nothing to subpoena from a server because there is no server.

The Hybrid Approach

Some tools offer a middle ground. Granola, for example, captures audio locally and avoids bots, but sends transcript data to OpenAI's GPT-4 for AI enhancement. This is better than full cloud processing (no raw audio is uploaded), but your transcript text still travels to a third-party service.

Noted takes the fully local approach. Both transcription (via Apple Speech or local Whisper) and AI processing happen on your Mac. If you enable optional cloud features like cross-device sync, only processed text summaries are transmitted. Never audio, never raw transcripts.

When Cloud Processing Makes Sense

To be fair, cloud processing has legitimate advantages:

  • Scale. Processing thousands of hours of meeting audio across an organization is easier in the cloud
  • Cross-device. Cloud tools work on any device with a browser, while on-device tools depend on local hardware
  • Continuous improvement. Cloud models can be updated instantly without user downloads
  • Collaborative features. Real-time shared transcripts across participants are easier with a central server

For large enterprises with dedicated security teams and compliance infrastructure, managed cloud solutions can be appropriate. The risk calculus is different when you have a CISO and a legal team reviewing vendor contracts.

When On-Device Processing Is the Right Choice

On-device is the right choice when:

  • You handle sensitive conversations: client meetings, board discussions, HR, legal, medical
  • You work in a regulated industry where data residency matters
  • You value personal privacy and do not want third parties accessing your conversations
  • You need to work offline: planes, remote locations, unreliable internet
  • You want transparency: with open-source on-device models, you can verify exactly what the software does

How Noted Handles Meeting Transcription

Noted is built around the on-device model:

  1. System audio capture records any meeting without a bot
  2. Apple Speech or local Whisper transcribes on your Mac
  3. On-device AI extracts summaries, action items, decisions, and commitments
  4. GRDB (SQLite) stores everything locally on your machine
  5. Optional cloud sync transmits only text summaries, encrypted end-to-end
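Noted's actual storage layer is GRDB, a Swift wrapper over SQLite. As a rough analogue of the local-only storage step, here is a sketch using Python's standard-library sqlite3; the table and column names are illustrative, not Noted's schema:

```python
import sqlite3

# Local-only storage: an SQLite database on disk, no server involved.
# Schema is illustrative, not Noted's actual GRDB schema.
conn = sqlite3.connect(":memory:")  # in practice, a file path like "meetings.db"
conn.execute("""
    CREATE TABLE IF NOT EXISTS meetings (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        transcript TEXT NOT NULL,
        summary TEXT
    )
""")
conn.execute(
    "INSERT INTO meetings (title, transcript, summary) VALUES (?, ?, ?)",
    ("Q3 planning", "full transcript text", "Agreed to ship the beta Friday."),
)
conn.commit()

row = conn.execute("SELECT title, summary FROM meetings").fetchone()
```

Because the database is an ordinary local file, backing up, exporting, or deleting your meeting history is entirely under your control.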

You get the intelligence of cloud-based tools with the privacy of a local notebook. Check the features page for the full capability breakdown, or explore the meeting transcription use case in depth.


Your meeting conversations are some of the most valuable and sensitive data you produce. The transcription architecture you choose determines who else has access to them. Choose deliberately.

Download Noted and keep your conversations on your machine.