As Large Language Models (LLMs) continue to reshape the way we interact with technology, it's crucial that developers prioritize user experience. In this article, we'll explore two essential techniques for creating an engaging, responsive user experience in LLM applications.

The Problem

The rapid growth of LLMs has led to a surge in intelligent applications, but speed is often sacrificed for intelligence, and the result is an underwhelming user experience. To bridge the gap between intelligence and usability, we'll focus on two techniques: optimizing time-to-first-token and showing intermediate LLM steps.

User Experience Techniques

To create a seamless experience, developers must consider how long the system takes to respond with its first token. This metric, known as time-to-first-token, is critical to the overall user experience: reducing this latency gives users immediate feedback and creates a sense of engagement.

Time-To-First-Token

Traditionally, applications measure latency and response time. LLM applications, however, have a different metric that's essential to the user's experience: time-to-first-token, the time it takes for the system to send back the first token of the response. Optimizing this metric can significantly improve the user experience.
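To make the metric concrete, here is a minimal sketch of how time-to-first-token can be measured against a streamed response. The `fake_stream` generator is a hypothetical stand-in for a real streamed model call; any streaming client that yields tokens can be dropped in instead.

```python
import time

def fake_stream(tokens, delay_s=0.01):
    """Hypothetical stand-in for a streamed LLM response."""
    for tok in tokens:
        time.sleep(delay_s)  # simulate per-token generation latency
        yield tok

def time_to_first_token(stream):
    """Measure the seconds elapsed until the first token arrives."""
    start = time.perf_counter()
    first = next(stream)
    ttft = time.perf_counter() - start
    return first, ttft

stream = fake_stream(["Hello", ",", " world"])
first, ttft = time_to_first_token(stream)
print(f"first token {first!r} after {ttft * 1000:.1f} ms")
```

Note that the measurement stops at the *first* token, not the full response; the rest of the stream can still be consumed afterwards.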

Ways to optimize this metric include:

Response Streaming

Response streaming is a powerful technique for reducing latency. It allows the system to return partial responses, providing users with immediate feedback and creating a sense of engagement. This approach has contributed to ChatGPT's fast response times, even when dealing with complex answers.
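The idea can be sketched with a generator that yields partial chunks as they are produced, so the UI can render the answer as it grows rather than waiting for the whole completion. `stream_completion` below is a hypothetical model call (real APIs expose the same pattern, e.g. a `stream=True` flag that returns an iterator of chunks).

```python
def stream_completion(prompt):
    """Hypothetical model call that yields tokens as they are
    generated, instead of returning one final string."""
    answer = f"Echoing: {prompt}"
    for token in answer.split(" "):
        yield token + " "

def render(prompt):
    """Flush each partial chunk to the user immediately."""
    shown = ""
    for chunk in stream_completion(prompt):
        shown += chunk
        print(shown, end="\r")  # the user watches the answer grow
    return shown.strip()

result = render("hello")
```

The key design point is that the consumer never blocks on the full answer: every chunk is displayed the moment it arrives.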

Backend Parallelization

Modern LLM tooling enables developers to parallelize requests in the backend. This technique is particularly useful for complex applications that rely on multiple agents to generate intermediate outputs. By parallelizing these operations, you can reduce latency and create a more responsive user experience.
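A minimal sketch of this idea, assuming the agent calls are independent: with `asyncio.gather`, three hypothetical agent requests run concurrently, so total latency approaches that of the slowest call rather than the sum of all three.

```python
import asyncio

async def call_agent(name, delay_s):
    """Hypothetical async agent call; stands in for an LLM request."""
    await asyncio.sleep(delay_s)  # simulate network + generation time
    return f"{name} done"

async def main():
    # Independent calls are launched together instead of back-to-back,
    # so the wall-clock cost is max(delays), not sum(delays).
    return await asyncio.gather(
        call_agent("retriever", 0.05),
        call_agent("summarizer", 0.05),
        call_agent("critic", 0.05),
    )

results = asyncio.run(main())
```

This only helps when the intermediate outputs don't depend on each other; calls with data dependencies still have to run in sequence.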

Showing Intermediate Steps

As users interact with LLM applications, they often wonder what's happening behind the scenes. To address this pain point, we recommend showcasing intermediate steps in the LLM pipeline. This approach provides transparency and keeps users informed about the current system status.

The Nielsen Norman Group's guidelines for this heuristic emphasize the importance of keeping users informed about system status:

  • The design should always keep users informed about what is going on, through appropriate feedback within a reasonable amount of time.
  • When users know the current system status, they learn the outcome of their prior interactions and determine next steps. Predictable interactions create trust in the product as well as the brand.

LLM systems can be transparent by providing insights into their inner mechanisms. By streaming responses back and using modern tooling to get the status of the system, you can provide users with a "pizza tracker experience."
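One simple way to wire this up is a status callback that each pipeline stage reports to, which the frontend can render as a progress view. The pipeline below is a hypothetical sketch with placeholder retrieval and synthesis steps; in a real system the callback would push events to the client (e.g. over server-sent events or a websocket).

```python
def run_pipeline(query, on_status):
    """Hypothetical multi-step LLM pipeline that reports each stage,
    enabling a 'pizza tracker' style progress view in the UI."""
    on_status("Searching sources...")
    docs = ["doc-a", "doc-b"]        # placeholder retrieval step
    on_status(f"Reading {len(docs)} documents...")
    summary = " + ".join(docs)       # placeholder synthesis step
    on_status("Writing answer...")
    return f"Answer based on {summary}"

events = []
answer = run_pipeline("example query", events.append)
```

Even when the final answer takes a while, the user sees each stage complete, which keeps the system from feeling stalled.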

Real-World Examples

Domino's Pizza Tracker is an excellent example of providing transparency into an otherwise opaque process. Similarly, Perplexity, an LLM-based search engine, uses this approach to keep users informed during complex, long-running queries.

By implementing these strategies, developers can create LLM applications that prioritize user experience and provide a seamless interaction.