The Problem Group Mode Solves
Translation apps are generally built for two people. One person speaks, one person listens, the phone passes back and forth. That works fine for a one-on-one conversation, but it breaks down the moment a third voice enters.
Group Mode is Puente’s answer to multilingual groups: up to 8 participants, one phone placed in the center, each person receiving captions and audio in their own language. The technical challenge that makes this hard is speaker diarization — knowing who said what — and Puente handles it automatically.
What Speaker Diarization Actually Does
Imagine a recording of a meeting where three people are talking. If you transcribe the audio without diarization, you get a wall of text with no indication of who said which sentence. Diarization adds speaker labels: “Speaker 1 said this, Speaker 2 said that.”
This sounds straightforward but it’s technically demanding. Voices need to be distinguished by acoustic properties — pitch, cadence, timbre — and those distinctions have to hold up across interruptions, background noise, and participants who speak similar languages. Puente’s Group Mode builds this into the real-time translation pipeline, so each voice gets its own translation output rather than having the conversation collapsed into a single mixed stream.
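To make the idea concrete, here is a minimal sketch of how speaker labels can be assigned from acoustic features. It is not Puente's implementation, which the article does not describe. It assumes each audio segment has already been reduced to a numeric "voice embedding" (a hypothetical feature vector standing in for pitch, cadence, and timbre), and it labels segments by cosine similarity against a running average per speaker. The class name, the threshold value, and the embeddings are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimpleDiarizer:
    """Toy online diarizer: one running-mean embedding (centroid) per speaker.

    Hypothetical illustration only. The threshold is an assumed value, and
    max_speakers mirrors the 8-participant limit mentioned in the article.
    """

    def __init__(self, threshold=0.8, max_speakers=8):
        self.threshold = threshold
        self.max_speakers = max_speakers
        self.centroids = []  # one averaged embedding per known speaker
        self.counts = []     # number of segments folded into each centroid

    def assign(self, embedding):
        """Return a 1-based speaker label for this segment's embedding."""
        best_idx, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine_similarity(embedding, centroid)
            if sim > best_sim:
                best_idx, best_sim = i, sim

        can_add = len(self.centroids) < self.max_speakers
        if best_idx is None or (best_sim < self.threshold and can_add):
            # No known voice is close enough: register a new speaker.
            self.centroids.append(list(embedding))
            self.counts.append(1)
            return len(self.centroids)

        # Close enough to a known voice: fold the segment into its running mean.
        n = self.counts[best_idx]
        self.centroids[best_idx] = [
            (c * n + e) / (n + 1)
            for c, e in zip(self.centroids[best_idx], embedding)
        ]
        self.counts[best_idx] = n + 1
        return best_idx + 1


# Four segments from two alternating voices (made-up embeddings).
diarizer = SimpleDiarizer()
segments = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.9, 0.2, 0.0], [0.1, 0.9, 0.3]]
labels = [diarizer.assign(seg) for seg in segments]
print(labels)  # [1, 2, 1, 2]
```

Real systems have to cope with overlapping speech, noise, and similar-sounding voices, which is where a fixed threshold like this one stops being enough.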
Setting Up a Group Mode Session
1. Confirm Pro access. Group Mode requires a Pro license or Day Pass. If you’re on the free tier, you’ll be prompted to upgrade.
2. Select Group mode. Open Puente, tap the mode selector, and choose Group. The interface shifts to a multi-participant layout.
3. Add participants and assign languages. Tap the + Add Participant button for each person joining. Assign each participant their language from the 109-language list. Names are optional but help with on-screen attribution.
4. Place the phone in the center. Set the phone flat on the table, screen up, with the microphone exposed. Puente’s Group Mode is optimized for table placement, so the microphone picks up voices from all directions.
5. Start the session. Tap Start. As each person speaks, Puente identifies the voice, transcribes it, translates it, and displays the translated caption to all participants in their respective languages. If audio output is enabled, the translated speech plays through the speaker or connected audio device.
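The flow in these steps (add participants, assign languages, then fan each utterance out to every language in the room) can be sketched in a few lines. This is an illustrative model only, not Puente's code or API. `GroupSession`, `Participant`, and `placeholder_translate` are hypothetical names, and the translation function is a stub that tags text instead of translating it. The 8-participant cap comes from the article.

```python
from dataclasses import dataclass

MAX_PARTICIPANTS = 8  # limit stated in the article


@dataclass
class Participant:
    name: str
    language: str  # language code such as "en" or "fr"


def placeholder_translate(text, source, target):
    """Stand-in for a real translation engine (hypothetical); it only tags the text."""
    if source == target:
        return text
    return f"[{source}->{target}] {text}"


class GroupSession:
    """Toy model of a one-phone session: one caption per language in the room."""

    def __init__(self, translate=placeholder_translate):
        self.participants = []
        self.translate = translate

    def add_participant(self, name, language):
        if len(self.participants) >= MAX_PARTICIPANTS:
            raise ValueError(f"a session holds at most {MAX_PARTICIPANTS} participants")
        participant = Participant(name, language)
        self.participants.append(participant)
        return participant

    def captions_for(self, speaker, text):
        """Map each distinct session language to a caption of the speaker's words."""
        captions = {}
        for p in self.participants:
            if p.language not in captions:
                captions[p.language] = self.translate(text, speaker.language, p.language)
        return captions


session = GroupSession()
lead = session.add_participant("Project lead", "en")
client = session.add_participant("Client", "fr")
print(session.captions_for(lead, "Welcome"))
# {'en': 'Welcome', 'fr': '[en->fr] Welcome'}
```

Translating once per distinct language, not once per person, keeps the work bounded when several participants share a language.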
Business Meeting Use Case
A consulting firm is running a kickoff meeting with clients from three countries. The project lead speaks English, one client speaks French, and another speaks Mandarin. A fourth participant, a local partner, speaks Spanish.
With Group Mode active, the project lead places their iPhone in the center of the conference table. Each participant sees captions in their own language as the project lead speaks. When the French client responds, everyone else sees that response translated into their language simultaneously. The meeting proceeds without any participant waiting for a human interpreter or struggling with a second language they’re not fully comfortable in.
A one-hour meeting in four languages used to require either four interpreters or a compromise where everyone spoke the group’s weakest common language. Group Mode removes both constraints.
Construction Safety Briefing Use Case
A site foreman is conducting a daily safety briefing for a crew that includes workers who speak English, Spanish, and Haitian Creole. Compliance requires that every worker understands the hazard protocols for that day’s work — it isn’t optional, and “they got the gist” isn’t good enough when the task involves working at height or with high-voltage equipment.
The foreman opens Group Mode, assigns languages to each worker’s profile, and conducts the briefing while the phone sits on a crate between them. Each worker reads captions in their language as the foreman speaks. If a worker has a question, they speak up and the foreman sees the translation immediately.
This is the same goal behind Title VI language-access rules (meaningful access for people with limited English proficiency), pursued with a $9.99 app rather than a scheduled interpreter who may or may not be available.
Multilingual Sports Team Scenario
A youth soccer club has players from four countries. The coach speaks English. Several players are more comfortable in Portuguese, Spanish, or Korean. Post-game debriefs and tactical discussions have historically been shallow because the coach couldn’t reach all players equally.
With Group Mode running during team meetings, the coach speaks English and every player sees the debrief in their language. Players can ask questions or respond in their own language and the coach receives English captions immediately. Team cohesion — the part that depends on everyone actually understanding what’s being communicated — improves in direct proportion to how clearly the message lands.
Live Captions vs. Audio Playback
Group Mode displays live captions by default, which are readable by all participants simultaneously on the screen. If the group prefers audio, translated speech can play through the phone’s speaker — though this creates an overlapping audio situation that can get confusing when multiple people speak in quick succession. For most settings, captions as the primary output and audio as supplementary is the most practical configuration.
For groups where participants prefer private audio, each person can connect earbuds to their own device and use Remote Mode to join a session, but that requires each participant to have their own phone. True one-phone Group Mode with shared audio works best in quieter settings with good turn-taking.
Related: Puente for Construction — pre-shift OSHA briefings · Puente for Teachers — multilingual classrooms · Puente for First Responders — multi-agency coordination · Remote Mode if participants aren’t in the same room · Profession Packs add domain vocabulary to group sessions