People have always valued privacy. Developments of the past decades — the internet, social networks, targeted advertising — turned data into an asset. The AI wave multiplies what can be inferred from crumbs. As a predominantly mobile engineer, I’ve worked on apps that manage email inboxes, upload photos and videos to the cloud, and provide encrypted messaging. Phones and apps are integral to people’s lives. Some users keep everything on their phones; others are more restrictive. Privacy shouldn’t rely on user awareness alone: developers should provide the first line of defence and the tools that protect a user’s right to privacy.
From an engineering perspective, privacy is a constraint like any other. Use a simple lifecycle — collect → store → process → send → delete — and treat each stage as a surface with budgets (fields, TTLs, queue sizes) and controls (consent, encryption, redaction, access). Even if you already deal with most of these pieces daily, I want to share my mental model — how I frame decisions with checklists and a few concrete examples from practice.
1. Privacy, precisely
What is privacy exactly? There are dozens of definitions, and I won’t pick a single one — most of us have a natural feel for it. One note that will help as we go: privacy is not security. We won’t dive into specific encryption algorithms or attacks; assume the mathematical and technical foundations work. The question is when and why to use them. My oversimplified view of privacy on mobile: limit why we process data, how much of it we move, and how long we keep it. The lifecycle below isn’t a strict timeline; a feature can hit “send” more than once. The point is to treat each stage as a privacy surface. We’ll use that model throughout — threats first, then what counts as personal data, lawful grounds, minimisation/retention, and telemetry boundaries.

1.1 Start with a threat model
Threats decide where we spend effort. Imagine we’re building a secure chat client. Attackers aren’t only people trying to break the cryptographic protocol. Most incidents are boring: a debug log prints a phone number; the app‑switcher snapshot shows a private chat; a crash report scoops up payload fragments; a WebView cache outlives a user’s logout. Add two more realities: insiders with too much access to analytics dashboards, and lawful requests where scope and auditability matter. We also consider external factors such as device theft, rooted/jailbroken phones, and network attacks, though some are hard to prevent entirely. Day to day, the biggest wins come from removing accidental leak paths and proving we can answer narrowly when required. In the table below, I highlight which parts of the lifecycle are more prone to a given threat (read it as “watch closely during reviews”).
| Threats / surfaces | Collect | Store | Process | Send | Delete |
|---|---|---|---|---|---|
| Accidental leak | ●● | ●●● | ●● | ●● | ●●● |
| Insider overreach | ● | ●● | ●●● | ●●● | ● |
| Attacker (device/net) | ● | ●●● | ●● | ●●● | ● |
| Lawful request | ● | ● | ● | ●●● | ●●● |
1.2 What counts as personal data (so we speak the same language)
This is a regulated space, and I’m not giving legal advice. Fortunately, even though regulators differ (UK/EU GDPR, India’s DPDP, US state laws), a safe engineering stance is consistent:
Personal data: anything that relates to an identifiable person, directly or indirectly. On mobile, assume most telemetry qualifies.
Sensitive data: some fields (e.g. health, precise location, biometrics used for unique identification) need extra care.
Pseudonymised: data is still personal if we can re‑link. In a chat client, a mapping table or login token keeps it in scope.
Anonymous: rare in practice. Don’t label analytics “anonymous” unless you genuinely can’t re‑identify them.

“Is this personal?” quick tree.
When I say “treat as personal”, include it in consent, minimisation, retention and deletion decisions.
1.3 Lawful grounds to process
Data flows from users through your systems and back again. Regardless of complexity, every data flow needs a reason the law accepts. In apps, three show up again and again:
Contract — necessary to deliver the service the user asked for. In a chat client: sending messages, push delivery, device tokens for contactability. In a photo app: geo‑location to tag photos.
Consent — a clear, affirmative choice the user can withdraw as easily as they gave. In a chat client: optional diagnostics, marketing, experimental models that aren’t essential. In a photo app: allowing on‑device training of vision models.
Legitimate interests — a documented balance test (purpose, necessity, impact). In a chat client: anti‑abuse signals, coarse reliability telemetry.
You don’t need to be a lawyer to use these well — though carefully listing all attributes can make you feel like one — but you do need to write them down per data flow. It keeps debates concrete.
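One way to keep those write-ups concrete is to put them next to the code. A minimal Kotlin sketch, where LawfulGround, DataFlow and flowRegistry are hypothetical house conventions rather than any standard API:

```kotlin
// A registry of data flows and their lawful grounds, kept in-repo so it
// doubles as documentation and a review artefact. All names hypothetical.
enum class LawfulGround { CONTRACT, CONSENT, LEGITIMATE_INTERESTS }

data class DataFlow(
    val name: String,           // e.g. "push_delivery"
    val purpose: String,        // why we process at all
    val ground: LawfulGround,   // which lawful basis covers it
    val liaRef: String? = null  // pointer to the balance test, if LEGITIMATE_INTERESTS
)

val flowRegistry = listOf(
    DataFlow("message_send", "deliver messages the user asked us to send", LawfulGround.CONTRACT),
    DataFlow("optional_diagnostics", "opt-in crash diagnostics", LawfulGround.CONSENT),
    DataFlow("anti_abuse_signals", "rate-limit spam senders", LawfulGround.LEGITIMATE_INTERESTS, liaRef = "LIA-123")
)
```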
1.4 Minimisation and retention in practice
Here we move closer to execution — specifically, how this shows up in code. In your app, minimisation might mean:
Contact discovery without uploading full address books (hashes or private‑set intersection).
Event schemas that don’t include message content or full phone numbers; store a reference instead (see the sketch after this list).
On‑device ML for spam detection, sending only model updates or coarse aggregates if you need fleet‑level learning.
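Picking up the first two items, a minimal Kotlin sketch of hash-based contact discovery and a content-free event. Names are hypothetical; note that hashing alone is brute-forceable over the small phone-number space, which is why private-set intersection is the stronger option:

```kotlin
import java.security.MessageDigest

// Hash-based contact discovery: the server sees a salted hash, never the
// raw number. Assumes a server-provided salt rotated per discovery round
// (hypothetical). The result is still pseudonymous personal data.
fun discoveryToken(rawNumber: String, serverSalt: ByteArray): String {
    val normalized = rawNumber.filter { it.isDigit() }   // crude E.164-style normalisation
    val digest = MessageDigest.getInstance("SHA-256")
    digest.update(serverSalt)
    return digest.digest(normalized.toByteArray())
        .joinToString("") { "%02x".format(it) }
}

// The telemetry schema stores a reference, never the number or the content.
data class DeliveryEvent(
    val contactRef: String,   // discoveryToken(...), not a phone number
    val result: String,       // "success" | "timeout" | "error"
    val latencyMs: Int
)
```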
Retention time‑boxes everything:
Messages cached on device until delivered or a short TTL; after that, wiped.
Encrypted offline queue with a maximum age and size; old items are dropped rather than shipped late (see the sketch after this list).
Crash/diagnostic data expire locally; server retention mirrors the client.
Logout and account deletion wipe Keychain/Keystore entries, databases, caches, and WebView storage, not just app tables.
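A minimal Kotlin sketch of the time-boxed queue from the list above; encryption at rest is assumed to live in the (hypothetical) store underneath, and the limits echo the budgets used later:

```kotlin
// Drops rather than ships late: hard caps on both age and size.
class BoundedQueue<T>(
    private val maxItems: Int = 500,
    private val maxAgeMs: Long = 7L * 24 * 60 * 60 * 1000,   // 7-day TTL
    private val now: () -> Long = System::currentTimeMillis
) {
    private data class Entry<E>(val item: E, val enqueuedAt: Long)
    private val entries = ArrayDeque<Entry<T>>()

    fun offer(item: T) {
        if (entries.size >= maxItems) entries.removeFirst()   // size cap: oldest goes first
        entries.addLast(Entry(item, now()))
    }

    fun drain(): List<T> {
        val cutoff = now() - maxAgeMs
        return entries.filter { it.enqueuedAt >= cutoff }     // stale items never ship
            .map { it.item }
            .also { entries.clear() }
    }
}
```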

Deletion is the most important part of retention and the easiest to overlook, which is why it needs an owner. Ship a small deletion orchestrator that wipes local state and coordinates server calls. It clears Keychain/Keystore, the encrypted DB, caches and WebView storage, and drops undelivered telemetry past TTL. Features don’t hand‑roll erasure; they call the orchestrator. After the server ack, the wipe finishes within seconds and the app broadcasts a “data erased” signal so modules purge in‑memory state too.
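A minimal Kotlin sketch of that shape, with hypothetical interfaces (Wipeable, the store list) standing in for real modules:

```kotlin
// Features never hand-roll erasure; they call wipeAll().
interface Wipeable { suspend fun wipe() }

class DeletionOrchestrator(
    private val stores: List<Wipeable>,     // Keychain/Keystore, encrypted DB, caches, WebView storage
    private val telemetryQueue: Wipeable,   // drops undelivered items past TTL
    private val onErased: () -> Unit        // broadcasts "data_erased" so modules purge in-memory state
) {
    suspend fun wipeAll() {
        telemetryQueue.wipe()               // first: nothing ships post-erasure
        stores.forEach { it.wipe() }        // then local state, store by store
        onErased()                          // finally: in-memory caches go too
    }
}
```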
1.5 Telemetry boundaries (logs are data)
Telemetry is where strong designs die by a thousand cuts. From small start‑ups to giants, teams are data‑driven, and decisions ride on logs and analytical events. If your app brokers encrypted conversations between two people, a single “helpful” log line can link the parties and make traffic analysis trivial. Treat dynamic values as private by default. Redact at the call‑site. Keep sampling and kill‑switches on device so you can turn noisy families off without a new build. Give telemetry the same deletion guarantees as product data: queues age out; debug captures don’t survive a day; test‑only trust stores never ship in Release.

2. Why it matters

I get the urge to be first to market: ship fast, sort the details later. With that mindset, privacy can feel like a nice‑to‑have. But privacy isn’t just ethics; it’s how we keep shipping without surprises. We’ll stay with the chat client. We use a robust cryptographic protocol for end‑to‑end messages. That’s necessary, but not sufficient. If the client leaks around the edges, the protocol won’t save us.
Two common failure modes:
Listing vs reality. The store page says we don’t collect contact details, yet a diagnostics SDK includes emails in crash payloads. That’s review ping‑pong at best; a block at worst.
Consent drift. Analytics start sending before the user decides. Our standards say “ask first”; the build says otherwise. Strong cryptography, weak discipline.
Platform rules set the floor
We don’t build in a vacuum. We ship on iOS and Android, with their policies, forms and runtime checks. Treat these as engineering constraints.
On iOS, the “App Privacy” section must match what the binary does — first‑party code and SDKs. If they diverge, submission stalls.
On Android, “Data safety” and the privacy policy must accurately describe collection, sharing and protections. If the app behaves differently to the form, you’ll feel it at submission time.
Design with that in mind: if the chat client logs a push_token, it’s personal data and must be declared. If contact discovery uploads hashed numbers, it’s still collection. If diagnostics are optional, consent lives in the UI before the first send.
Third‑party SDK intake. SDKs go through the same constraints as our code. Be at least twice as cautious with third parties: vulnerabilities in your code are usually unintentional; with an SDK they can be both accidental and deliberate. Each SDK gets a one‑page intake card: purpose + lawful ground, exact fields collected, retention on their side, consent gate and kill‑switch, endpoints/pinning, and whether store disclosures change. Wrap the SDK behind your interface and route its sends through the same interceptors and encrypted queue. Until consent, the wrapper is a no‑op.
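A minimal Kotlin sketch of the wrapper; Analytics, ConsentStore and the "diagnostics" purpose key are hypothetical names, not any vendor’s API:

```kotlin
interface Analytics { fun track(event: String, fields: Map<String, String>) }
interface ConsentStore { fun granted(purpose: String): Boolean }

class GatedAnalytics(
    private val consent: ConsentStore,
    private val delegate: () -> Analytics           // lazily constructs the vendor SDK
) : Analytics {
    private val sdk by lazy(delegate)               // SDK isn't even initialised pre-consent

    override fun track(event: String, fields: Map<String, String>) {
        if (!consent.granted("diagnostics")) return // no-op until the user opts in
        sdk.track(event, fields)                    // real sends still route via our interceptors
    }
}
```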
It pays back in delivery speed
We shouldn’t block creativity or delivery speed with red tape. The trick is to enrich workflows so checks are cheap. A small Privacy Block in the PR sets the purpose, grounds, budgets and deletion path. SDKs go through a repeatable intake. Telemetry defaults are safe by design. The store forms are easy because they reflect the same source of truth. Fewer “what is this event?” threads. Fewer last‑minute fixes.
Privacy SLOs help. Track a few budgets like performance: queue TTL ≤ 7 days; max queue size ≤ 500; diagnostics default ≤ 1%; kill‑switch propagation < 10 minutes. Secrets only in Keychain/Keystore. No Release build contains test trust roots. Tune numbers to your product and enforce them in CI/canary.
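A minimal sketch of what “enforce them in CI” can look like: a plain validation over the shipped telemetry config, using the budget numbers above (the config shape is hypothetical):

```kotlin
// Loaded from the same source of truth as the store disclosures.
data class TelemetryBudgets(
    val queueTtlDays: Int,
    val maxQueueItems: Int,
    val defaultSampling: Double
)

// Run in CI against the release config; over-budget values fail the build.
fun validateBudgets(b: TelemetryBudgets) {
    require(b.queueTtlDays <= 7) { "queue TTL over budget: ${b.queueTtlDays}d" }
    require(b.maxQueueItems <= 500) { "queue size over budget: ${b.maxQueueItems}" }
    require(b.defaultSampling <= 0.01) { "diagnostics sampling over budget: ${b.defaultSampling}" }
}
```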
3. Mobile specifics

Client‑side privacy patterns look similar across user‑facing apps, but modern phones don’t trail desktops — they often exceed them in what they can collect. A phone lives in your pocket, not on a desk. It carries sensors (location, camera, mic, Bluetooth), share sheets and pasteboard, and background tasks that run when the screen is off. That creates more ways to collect, more places to stash data, and more chances to leak.
3.1 On‑device data: pick the right home
Most data starts life on the device. Treat it like hazardous material: move less of it, and keep it where escape is least likely.
If we go back to our chat client, that translates into:
Secrets and credentials — identity keys, session keys, refresh tokens — live in Keychain/Keystore with the tightest accessibility you can tolerate. Where hardware‑bound keys and biometry are available, require them for user‑initiated actions (e.g. revealing a recovery phrase). Keep secrets out of shared app‑group containers; share handles, not keys (there’s an Android sketch after this list).
Business records — message metadata, device records, delivery receipts — belong in an encrypted database (Core Data/Room + encryption, or an encrypted file store). You get indexing and sync‑friendly queries without turning the keychain into a database.
Derived/temporary — thumbnails, model features, prefetch caches — go to ephemeral caches with TTLs. If you can regenerate it, don’t back it up.
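For the first item, a minimal Android sketch of a hardware-backed, non-exportable AES key in the Keystore; the alias and parameters are illustrative, and the auth requirement should match what your flows can tolerate:

```kotlin
import android.security.keystore.KeyGenParameterSpec
import android.security.keystore.KeyProperties
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey

// A hardware-backed AES key for the encrypted chat DB. The key material
// never leaves the Keystore; "chat_db_key" is an illustrative alias.
fun createChatDbKey(): SecretKey {
    val spec = KeyGenParameterSpec.Builder(
        "chat_db_key",
        KeyProperties.PURPOSE_ENCRYPT or KeyProperties.PURPOSE_DECRYPT
    )
        .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
        .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
        .setUserAuthenticationRequired(true)   // gate use behind unlock/biometry
        .build()

    return KeyGenerator.getInstance(KeyProperties.KEY_ALGORITHM_AES, "AndroidKeyStore")
        .apply { init(spec) }
        .generateKey()
}
```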

Backups and snapshots. Exclude caches and generated artefacts from backup. On Android, use backup rules; on iOS, set the do‑not‑backup flag for throwaway files. Obscure sensitive screens in the app switcher (Android: FLAG_SECURE; iOS: blur/hide the background snapshot). Avoid pasteboard leaks on chat screens.
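A minimal Android sketch of the snapshot point; FLAG_SECURE is the real platform flag, the activity is illustrative:

```kotlin
import android.app.Activity
import android.os.Bundle
import android.view.WindowManager

class ChatActivity : Activity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // Blocks screenshots and blanks this screen in the app-switcher snapshot.
        window.setFlags(
            WindowManager.LayoutParams.FLAG_SECURE,
            WindowManager.LayoutParams.FLAG_SECURE
        )
    }
}
```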
Work profiles and Direct Boot. On Android, secrets should live in credential‑encrypted storage, not device‑encrypted space available before first unlock. In work‑profile deployments, respect cross‑profile boundaries; don’t leak personal identifiers into the work side.
3.2 Logging & telemetry: don’t let observability undo privacy
Redact at source. Treat dynamic values as private by default. If you must print identifiers, mask or hash at the call‑site. Structured events with typed fields beat ad‑hoc strings (see the sketch after this list).
Never join both ends. Don’t record sender_id and recipient_id together in one event. When pairing is genuinely required, use synthetic correlation that rotates often.
Budgets live on device. Sampling rates, queue TTLs and a global kill‑switch should be switchable without a new build. Noisy families die fast.
Crash reports & traces obey the same rules. Scrub payloads at SDK boundaries; redact request/response bodies before persistence; keep local captures short‑lived.
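A minimal Kotlin sketch of call-site redaction with typed fields; Redacted and logEvent are hypothetical house-style helpers, not a library API:

```kotlin
// Dynamic values are private by default: Redacted prints masked unless a
// reviewed accessor is called explicitly.
@JvmInline
value class Redacted(private val value: String) {
    override fun toString() = "<redacted>"   // safe default for any log sink
    fun last4() = value.takeLast(4)          // explicit, reviewable escape hatch
}

// Structured events with typed fields instead of ad-hoc strings.
fun logEvent(name: String, vararg fields: Pair<String, Any>) {
    println("$name ${fields.joinToString { (k, v) -> "$k=$v" }}")
}

// The phone number cannot hit the log unredacted by accident.
fun onDelivery(contactPhone: Redacted, latencyMs: Int) {
    logEvent("delivery_result", "contact" to contactPhone, "latency_ms" to latencyMs)
}
```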
3.3 Network boundaries
When people think “network”, they jump to security: HTTPS, certificates, proxies. Through this prism, I see the network as the moment data leaves the device (there’s a sketch at the end of this subsection).
Interceptors first. Everything outbound flows through interceptors that enforce redaction, apply idempotency keys, and write to an encrypted offline queue. That queue has hard limits: age, size, and retry budget with back‑off and jitter. If the budget is exhausted, drop rather than ship stale data.
Pin to an identity you can rotate. Use TLS properly and pin to a stable identity (SPKI or an intermediate), not a single leaf cert. Ship overlapping pinsets so you can add the new one, then retire the old one later. Never rotate in a single step.
mTLS sparingly. Mutual TLS helps when you must strongly bind a device to a private API, but it complicates rotation and recovery. Use it when the benefit is clear; otherwise rely on least‑privilege tokens with strong pinning.
Debug without foot‑guns. Proxies and custom trust stores stay Debug‑only. Ship toggles that enable extra logs or local capture only on non‑production builds. Never add test roots to Release.
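A minimal sketch tying these together, assuming OkHttp 4.x on Android: sends pass an interceptor first, and pinning targets SPKI hashes with an overlapping pinset so rotation is two-step. The host, header and pin values are placeholders:

```kotlin
import okhttp3.CertificatePinner
import okhttp3.Interceptor
import okhttp3.OkHttpClient

// Overlapping SPKI pinset: the current intermediate plus its successor,
// so rotation never happens in a single step. Pins are placeholders.
val pinner = CertificatePinner.Builder()
    .add("api.example.com", "sha256/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=")
    .add("api.example.com", "sha256/BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=")
    .build()

// Every outbound call passes redaction before transport sees it.
val redactionInterceptor = Interceptor { chain ->
    val request = chain.request().newBuilder()
        .removeHeader("X-Debug-User")   // hypothetical: strip anything off the allow-list
        .build()
    chain.proceed(request)
}

val client = OkHttpClient.Builder()
    .addInterceptor(redactionInterceptor)
    .certificatePinner(pinner)
    .build()
```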

3.4 ML on device: keep learning local
ML in apps isn’t new, and with the AI boom we’ll see even more of it. I’m especially interested in on‑device models — they have real potential to be your private helper, compared to large server models that also feed on inputs. Examples I’ve shipped or reviewed: spam scoring, media quality adaptation, keyboard suggestions.
Inference first. Run models on device. Inputs never leave; outputs are small, derived signals (e.g. “spam probability 0.91”, not message text). Cache features with short TTLs.
Updates are code. Treat model files like binaries: sign, verify, stage, activate, and keep a rollback. Canary new versions to a small percentage before full rollout.
If you must learn across devices. Prefer federated approaches where updates (not raw data) are aggregated. Add guard‑rails: minimum cohort sizes, noise where appropriate, and strict privacy budgets for any evaluation telemetry. Never ship gradients or metrics that can be tied back to message content or a single user.
Model drift without hoovering data. Track coarse, non‑identifying signals: version adoption, on‑device confidence histograms, rate of “disagree” events (user marks “not spam”). Enough to detect regressions without collecting inputs.
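A minimal Kotlin sketch of the confidence-histogram idea: on-device scores fold into coarse buckets, and only the counts ever leave:

```kotlin
// Drift telemetry that never sees inputs: spam scores become bucket counts.
class ConfidenceHistogram(private val buckets: Int = 10) {
    private val counts = IntArray(buckets)

    fun record(score: Double) {                // score in [0, 1] from the local model
        val i = (score * buckets).toInt().coerceIn(0, buckets - 1)
        counts[i]++
    }

    fun snapshotAndReset(): IntArray =         // shipped as coarse, non-identifying telemetry
        counts.copyOf().also { counts.fill(0) }
}
```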

4. Checklists & templates (copy‑paste ready)
Short and opinionated.
4.1 Design doc snippet
## Privacy summary (for this design)
**Threats in scope:** [accidental logs, snapshots, cache residue, lawful request scope]
**Grounds:** [Contract for delivery; Legitimate interests for anti‑abuse (LIA‑xxx); Consent for diagnostics]
**Minimisation:** [fields reduced; no message content in telemetry; phone numbers not stored, only salted hashes]
**Retention:** [queue TTL 7d; diagnostics 30d; local caches TTL 24h; server mirrors client]
**Send gating:** [all sends through interceptors → redaction → encrypted queue → retry/backoff]
**Deletion owner:** [Deletion orchestrator clears keychain/keystore, DB, caches, WebView; drops stale queue items]
4.2 SDK intake card
# SDK Intake
**Name & version:**
**Owner:**
**Purpose:**
**Grounds:** [Contract / Consent / Legitimate interests (LIA link)]
## Data contract
| Field | PII class | Purpose | Retention (SDK side) | Notes |
|------|-----------|---------|----------------------|-------|
**Runtime controls:** consent gate? per‑country toggles? kill‑switch path?
**Routing:** domains; proxy? pinning policy? uses our interceptors? [Yes/No]
**Access:** secrets? [None] tokens? [short‑lived only]
**Store impact:** privacy details / Data safety update required? [Yes/No]
**Test plan:** block until consent; verify no sends with Debug off; verify kill‑switch
4.3 Telemetry event spec (schema template)
event: delivery_result
version: 3
purpose: reliability_metrics
grounds: legitimate_interests # LIA-123
sampling: 0.01
retention_days: 30
fields:
  - name: sender_id
    type: string
    pii: pseudonymous
    join_with: forbidden   # never with recipient_id
  - name: result
    type: enum { success, timeout, error }
    pii: none
  - name: latency_ms
    type: int
    pii: none
kill_switch: events.delivery.*
notes: "No content, no phone numbers, no emails"

Lint rules (house style): dynamic values default to private; redaction at call‑site. No event may include both sides of a conversation. Every event needs sampling, retention_days, and a kill_switch pattern.
4.4 “Which store when?” cheat sheet
| Data shape | Where it lives | Why |
|---|---|---|
| Secrets/credentials (keys, tokens) | Keychain/Keystore (hardware‑bound if available) | Non‑exportable, access control, right unlock semantics |
| Records (metadata, receipts) | Encrypted DB (Core Data/Room or encrypted file store) | Queryable, sync‑friendly, rotation without key loss |
| Derived/temporary (thumbnails, features) | Ephemeral cache with TTL, not backed up | Safe to drop, cheap to rebuild |
Backups/snapshots: exclude caches from backup; secure the app‑switcher snapshot; avoid pasteboard leaks.
4.5 Network belt (interceptors → queue → transport)
Interceptors apply redaction, idempotency keys, and write to an encrypted queue.
Queue has TTL and max items; stale drops instead of “eventually sending”.
Transport enforces TLS, pinning to SPKI/intermediate, and retry with back‑off + jitter.
Debug roots/proxies never ship in Release.
Pin rotation playbook (one paragraph): ship pinset A+B → confirm adoption → ship B+C, which retires A → once C is proven, retire B the same way.
4.6 Deletion orchestrator (client wipe)
**Triggers:** logout, account deletion, DSAR erase
**Steps:**
1) Cancel background tasks; freeze queues.
2) Wipe keychain/keystore entries (access group aware).
3) Drop encrypted DB / tables that hold personal data.
4) Clear caches (files, image stores); secure snapshot artefacts.
5) Clear WebView storage and cookies.
6) Drop queued telemetry older than TTL.
7) Broadcast "data_erased" to modules; clear in‑memory caches.
8) Verify empty via quick audit hooks; resume with a clean state.

4.7 Release checklist (privacy)
[ ] App Store privacy details reflect reality (incl. SDKs).
[ ] Play “Data safety” + privacy policy updated from the data dictionary.
[ ] No debug trust roots or proxy bypass in Release.
[ ] Pin rotation overlap active; rollback plan tested.
[ ] Kill‑switch toggle observed in prod canary within <10 min.
[ ] Deletion orchestrator e2e tested (local wipe + server).
[ ] Snapshot test passes; logs show only redacted data.
[ ] SDK intake cards reviewed; toggles/consent verified.
4.8 DSAR (access/erasure) quick script (client responsibilities)
**Access:** fetch user‑visible artefacts (profile, devices, settings). Do not resurrect deleted caches while viewing exports.
**Erasure:** run deletion orchestrator; confirm server ack; show status to the user; ensure queues don’t send post‑erasure.
**Edge cases:** offline device at deletion time; work profile; app re‑install. Document behaviour for each.

4.9 “Ten smells” that trigger a privacy review
New SDK added without an intake card.
Leaf‑cert pinning and a calendar reminder to rotate.
Event schema missing retention_days.
Event joins both ends of a private interaction.
Secrets appear in an app‑group container or DB.
“Temporary” files that end up in backups.
Debug trust roots in a Release artefact.
Consent buried in Settings.
Crash reports contain request/response bodies.
Deletion means “delete server row” only.
Collect less, move less, keep less — and delete on time. Guard rails in the client beat reminders in docs. Push every change through the same belt: consent, redaction, budgets, storage choice, network gate, deletion. If that sounds boring, good — boring privacy is exactly what we want.
