In January 2024, a finance employee at Arup, the multinational engineering firm, received what looked like a routine video call invitation. The invitation followed a phishing email about a secret transaction, which the employee had found suspicious. Rather than raise the alarm, he joined the video call, and what he saw there dissolved his doubts: the CFO was on screen, several familiar colleagues were present, and there was an urgent wire transfer request on the agenda. Everything looked normal. Everything sounded normal.
None of it was real.
Every person on that call was a deepfake: the CFO, the colleagues, the entire meeting. All of it was AI-generated synthetic video fed in real time. The attacker had not bypassed any platform access controls; the employee joined the call voluntarily after being brought there through social engineering. By the time the fraud was discovered, HK$200 million (approximately US$25 million) had been transferred out of the company's accounts. It remains the largest confirmed case of deepfake video call fraud against a corporate target.
The Arup case didn't just make headlines. It changed how security professionals think about video conferencing. If a trained finance professional can be deceived into authorising a $25 million transfer by a synthetic video call, the question is no longer whether your organisation could fall victim to this kind of attack. The question is whether your video platform and your processes are built to stop it.
This article breaks down how deepfake threats work in video environments, why your current defences may have a critical gap, and what genuinely effective protection looks like in 2026 and beyond.
The Arup case wasn't an isolated incident. It was a preview.
The deepfake detection market tells the story in numbers. Valued at $5.5 billion, it is projected to reach $15.7 billion by 2026, a 42 per cent compound annual growth rate, according to figures Deloitte cited in a November 2024 analysis. That level of investment doesn't happen unless the threat is real and growing.
The human side of the equation is more alarming. Research from Keepnet found that people correctly identify deepfakes only 24.5 per cent of the time. That's worse than a coin flip, and it means your employees are the wrong last line of defence against a deepfake fraud video call.
Enterprise exposure has accelerated sharply. Resemble AI tracked 980 corporate infiltration cases involving synthetic media in Q3 2025 alone, drawn from global media monitoring across that period. These weren't phishing emails or smishing attacks; they were coordinated attempts to infiltrate businesses through AI-generated personas on video calls. Meanwhile, Gartner has projected that by 2027, 50 per cent of enterprises will be investing in disinformation security products and strategies, up from less than 5 per cent just a few years ago, having recognised that traditional defences don't hold up against generative AI.
If your organisation runs video calls for onboarding, executive approvals, financial authorisations, or compliance sign-offs, that threat is directly relevant to you.
Can you fake a video call? The uncomfortable answer in 2026 is yes. You can do it convincingly, in real time, and at relatively low cost.
There are three primary attack vectors in a deepfake video call environment:

- Real-time face swapping, where an attacker's live webcam feed is overlaid frame by frame with a target's synthesised face.
- Voice cloning, where a model trained on short samples of a target's speech reproduces their voice live on the call.
- Injected synthetic streams, where a fully AI-generated audio-video feed is inserted directly into the video pipeline, typically via a virtual camera, bypassing the physical webcam entirely.
These capabilities power several categories of real-world attack:

- Executive impersonation fraud, where a synthetic 'CFO' or other senior leader pressures staff into urgent wire transfers, as in the Arup case.
- Synthetic personas in hiring and onboarding, where AI-generated candidates infiltrate organisations through remote interviews.
- Impersonation of colleagues or vendors to extract credentials, approvals, or sensitive information on routine-looking calls.
Video calls are uniquely vulnerable to all of this for a simple reason: we've been trained to trust what we see and hear on a video call in a way we never would with an email. A suspicious email gets scrutinised. A confident, visually convincing 'CFO' on screen gets believed, especially when the request is framed as urgent and confidential.
Many organisations, after reading about these threats, immediately think about their encryption posture. End-to-end encryption, TLS in transit, AES-256 at rest. Surely that covers it?
Encryption protects the channel. It does not verify who is on the other end of it.
Think of it this way: a sealed envelope guarantees that nobody opened the letter in transit. But it tells you nothing about whether the person who sent it is who they claim to be. In video conferencing, encryption prevents a third party from intercepting your call. It does nothing to prevent an attacker who has already synthesised the CFO's face from participating in that call as an authenticated participant.
This is the authentication gap, and it's where most enterprise video security postures have a real blind spot.
Two broad approaches have emerged to close it:

- AI-based detection, which analyses audio and video during the call for signs of synthetic generation.
- Cryptographic identity verification, which confirms before and during the call that each participant holds a credential tied to a verified identity.
The strongest security postures combine both. But if you're choosing where to invest first, the cryptographic layer is the more reliable foundation.
A category of dedicated deepfake detection tools has emerged to address the real-time identification problem. These platforms analyse audio and video streams during a call and flag artefacts characteristic of synthetic generation.
Zoom has also been rolling out built-in deepfake detection as part of its Workplace platform, including an integration with Pindrop for contact centre use cases announced in early 2026.
These tools are improving rapidly, but they carry inherent limitations. Detection accuracy degrades as generation quality improves. They typically require additional integration into existing conferencing workflows, and they generate false positives that create friction for legitimate participants, which is a real concern in regulated environments where executive calls cannot afford interruption.
As one layer in a defence stack, they add real value. As your primary control, they're not sufficient.
Solutions built around cryptographic identity verification address a different part of the problem. Rather than analysing what someone looks like during a call, cryptographic verification confirms that the person joining has already passed a verified identity check and holds a valid, unforgeable session credential.
This is implemented through token-based authentication systems where identity is asserted before the call begins. A participant cannot join a session without a cryptographically signed token issued to a verified identity. If someone attempts to impersonate a colleague using a synthetic face, they won't have that token, and they won't get in.
Token authentication has a clear limit, though. It verifies the credential at entry, not the face on screen during the call. Once a legitimately credentialled participant has joined, the token layer cannot detect a face-swap running on their device. An insider with a valid token, or an attacker who has obtained one through social engineering, could still conduct an in-session impersonation. Token authentication is a strong first control; it is not the complete answer on its own.
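To make the token model concrete, here is a minimal sketch of signed session credentials using only the Python standard library. The function names, claim fields, and HMAC scheme are illustrative assumptions for this post, not any vendor's actual API; production systems would typically use an established format such as JWT with asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side signing key"  # held only by the token issuer, never the client

def issue_token(verified_user_id, room, role, ttl=300):
    """Sign a short-lived session credential for an already-verified identity."""
    payload = json.dumps({
        "sub": verified_user_id,
        "room": room,
        "role": role,
        "exp": int(time.time()) + ttl,
    }).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + sig).decode()

def verify_token(token, room):
    """Admit a participant only if the signature, room, and expiry all check out."""
    raw = base64.urlsafe_b64decode(token.encode())
    payload, sig = raw[:-32], raw[-32:]  # SHA-256 HMAC is always 32 bytes
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # forged or tampered credential
    claims = json.loads(payload)
    if claims["room"] != room or claims["exp"] < time.time():
        return None  # wrong session or expired
    return claims
```

The point of the design is that an attacker who has synthesised a face but never passed the issuer's identity check simply has no valid credential to present, regardless of how convincing their video feed is.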
The Coalition for Content Provenance and Authenticity (C2PA) standard is backed by founding members including Adobe, Arm, BBC, Intel, Microsoft, and Truepic. It provides a framework for cryptographically signing media at the point of capture, creating a verifiable chain of provenance that links a video stream back to a specific, authenticated device. Applied to video conferencing, this would allow platforms to attest that a stream originated from a genuine device rather than a synthetic generator.
C2PA adoption in live video conferencing is still at an early stage. C2PA 2.3, released in December 2025, extended the standard to live streaming, but implementation in conferencing clients remains experimental. There is also a known limitation: many platforms strip embedded metadata during transcoding, which can break the provenance chain. These are solvable problems, and C2PA represents the most promising long-term architectural direction for deepfake video call detection at scale.
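The provenance idea behind C2PA can be illustrated with a heavily simplified hash chain: each media segment is bound to its predecessor and signed with a key held by the capture device. This is a sketch of the concept only; real C2PA manifests use X.509 certificates and a defined claim format, not the HMAC stand-in shown here.

```python
import hashlib
import hmac

DEVICE_KEY = b"capture-device key"  # stands in for a device attestation key

def sign_segment(prev_sig, segment_bytes):
    """Chain each media segment to its predecessor, so any splice breaks the chain."""
    digest = hashlib.sha256(prev_sig + segment_bytes).digest()
    return hmac.new(DEVICE_KEY, digest, hashlib.sha256).digest()

def verify_stream(segments, signatures):
    """Re-derive the chain; True only if every segment is intact and in order."""
    prev = b"\x00" * 32  # agreed initial value for the first segment
    for seg, sig in zip(segments, signatures):
        if not hmac.compare_digest(sig, sign_segment(prev, seg)):
            return False
        prev = sig
    return True
```

Because every signature depends on all preceding segments, substituting a synthetic segment mid-stream invalidates the rest of the chain, which is exactly the property a provenance-aware conferencing client would check, provided transcoding does not strip the signatures along the way.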
Liveness detection systems require participants to perform random physical actions (tracking a moving object, turning their head to a specific angle, blinking on cue) that generative models cannot anticipate and synthesise in real time. Combined with challenge-response protocols, liveness detection raises the cost of AI video call impersonation attacks.
That said, liveness detection is most effective against presentation attacks, where someone holds a photo or plays a video to the camera. It is weaker against the injected-stream attacks described earlier in this post, where a synthetic feed is inserted directly into the video pipeline and can be engineered to respond to challenges. Treat it as one useful layer, not a standalone defence.
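The challenge-response pattern can be sketched in a few lines: the server picks an unpredictable action and measures how quickly the participant complies. The challenge list and latency threshold below are illustrative assumptions; real systems also analyse the video itself to confirm the action was performed.

```python
import secrets
import time

CHALLENGES = ["turn head left", "blink twice", "look up", "raise right hand"]

def issue_challenge():
    """Pick an unpredictable action so it cannot be pre-rendered by a generator."""
    return secrets.choice(CHALLENGES), time.monotonic()

def check_response(issued_at, responded_at, max_latency=3.0):
    """A genuine participant reacts within a tight window; a pipeline that must
    synthesise the requested action frame by frame tends to respond late."""
    return (responded_at - issued_at) <= max_latency
```

The timing bound is what raises the attacker's cost: a human reacts in well under a second, while a rendering pipeline adds generation latency on top of the network round trip.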
The Zero Trust principle, 'never trust, always verify', translates directly to video conferencing security. A Zero Trust video identity framework means:

- No participant is trusted by default; every join requires a verified identity and a valid credential, regardless of network or device.
- Roles and permissions are granted at the minimum level required and enforced server-side, not client-side.
- Session integrity is verified continuously, not just at the moment of entry.
Video call identity verification at Digital Samba is built on a fundamentally different model from AI-based detection. The approach is architectural: prevent unverified participants from joining in the first place, rather than attempting to identify synthetic media after it has appeared on screen.
Digital Samba's end-to-end encryption implementation includes security verification codes, which are short cryptographic fingerprints derived from the session's encryption keys. When two participants compare their verification codes out-of-band (by voice, by message, or visually), they can confirm cryptographically that no man-in-the-middle is present and that both parties are genuinely connected to the same encrypted session.
That's not AI video call analysis. It's a mathematical proof. If the codes match, the session is authentic. The check cannot be spoofed by a synthetic video feed, because the attacker would need to compromise the cryptographic keys to generate a matching code, not just replicate someone's face.
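The principle behind such verification codes can be shown in a few lines: both parties derive a short, human-comparable code from the shared session key and compare it out-of-band. The derivation below is a simplified illustration, not Digital Samba's actual scheme.

```python
import hashlib

def verification_code(session_key, digits=8):
    """Derive a short, human-comparable fingerprint from the shared E2EE key.

    Both participants compute this from their own copy of the key. A
    man-in-the-middle holds different keys with each side, so the codes
    the two victims read out would not match.
    """
    digest = hashlib.sha256(b"verify-code|" + session_key).digest()
    num = int.from_bytes(digest[:8], "big") % (10 ** digits)
    return f"{num:0{digits}d}"
```

Matching codes are evidence about the keys themselves, which is why no amount of synthetic video can forge the check: the attacker would have to break the key exchange, not the camera.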
Every Digital Samba session can be configured to require a signed authentication token for entry. These tokens are issued by the platform to participants who have been pre-verified by the host application. A participant without a valid, unexpired token simply cannot join.
In practice, deepfake defence starts at your user management layer. Whoever issues the token controls who gets in. If your onboarding, HR, or financial systems issue tokens only to verified identities, synthetic participants cannot obtain the credentials needed to join your calls. This does assume your identity management layer is secure upstream; token authentication is as strong as the issuance process behind it.
Digital Samba's RBAC system is enforced server-side. Participants join with a specific role (host, moderator, or participant) and cannot escalate their permissions through client-side manipulation. This matters in AI impersonation scenarios where an attacker might try to gain host or moderator privileges to manipulate meeting content, remove legitimate participants, or access sensitive shared resources.
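Server-side enforcement boils down to one rule: authorisation decisions read the role from the server-verified credential and ignore anything the client asserts about itself. The permission table and function below are a hypothetical sketch of that pattern, not Digital Samba's implementation.

```python
# Hypothetical permission table: role -> allowed actions
PERMISSIONS = {
    "host": {"mute_others", "remove_participant", "share_screen", "speak"},
    "moderator": {"mute_others", "share_screen", "speak"},
    "participant": {"speak"},
}

def authorise(token_claims, requested_action, client_claimed_role=None):
    """Decide from the server-verified token only. Whatever role the client
    claims for itself is deliberately ignored, so it cannot escalate
    privileges through client-side manipulation."""
    role = token_claims.get("role", "participant")  # never client_claimed_role
    return requested_action in PERMISSIONS.get(role, set())
```

An impersonator who talks their way onto a call as a participant still cannot remove people or take over screen sharing, because the server checked the signed role, not the persona on camera.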
Digital Samba runs all AI-powered features (transcription, live captions, meeting summaries) only on self-hosted models. No meeting audio, video, or content is sent to third-party AI providers for processing.
For security-conscious organisations, this matters for data containment: platforms that route meeting content through external AI services create exposure to infrastructure you don't control or audit. Digital Samba's approach keeps meeting data within the platform's own infrastructure, and the same principle will apply to any future AI-based identity verification features as the capability matures.
A video call deepfake scam targeting your organisation is unlikely to be stopped by any single control. The most resilient approach is layered:

- Cryptographic access control: signed, short-lived tokens issued only to verified identities, so unverified participants never join.
- Session integrity: end-to-end encryption with out-of-band verification codes to rule out man-in-the-middle interference.
- Server-side role enforcement: permissions that cannot be escalated from the client.
- Real-time detection and liveness checks as a supplementary in-call layer.
- Human protocols: mandatory out-of-band callback verification for any high-value or unusual request, no matter how convincing the person on screen appears.
The video call deepfake scam threat will not diminish. Generation technology is becoming faster, cheaper, and more accessible every month. The organisations that will be resilient are those that treat video call identity as a security domain, not just a technical convenience.
The Arup case established a proof of concept that the security community cannot ignore: a convincing enough deepfake video call can deceive even trained professionals into authorising catastrophic financial decisions. The technology that enabled it has only become more accessible and more convincing since.
The answer is not to distrust video calls, because they're too valuable to abandon. The answer is to secure them the same way you secure any other high-stakes communication channel: with verified identity at the point of access, cryptographic session integrity, and layered controls that don't depend on human visual perception alone.
Digital Samba's approach is based on token authentication before joining, E2EE with cryptographic verification codes, server-side RBAC, and self-hosted AI processing. Together, these address the platform layer. Paired with clear human protocols for out-of-band verification, they cover both the technical and process failures the Arup case exposed.
Download our Security Whitepaper for full technical architecture details including encryption specifications, access control implementation, and audit logging.
Talk to our team to discuss your organisation's video security requirements and see these features in action.