5️⃣LLM Responses

Assumed audience: You’re a product designer working on an LLM powered feature. You’re familiar with how LLMs work. Use this as a starting point for further exploration and innovation.

👋 Introduction

When designing for LLMs, there are a few additional states that a designer would need to consider. These are peculiarities specific to how LLMs work (eg. slower load time, streaming capability), and common interaction patterns we’re noticing across products (eg: empty states doubling down on educating users, explicit triggers, feedback). We’ll dive into the details of each state in this article.


With LLMs, this can be broken down into 2 steps—loading and streaming.


A traditional loading state where an API call is being made and we wait for the response.

🤞 As of Oct 2023, LLMs are not particularly fast. Be sure to test your implementation, and see if you need to do the right expectation setting for your users, and get creative with your loading states.


Responses are not generated by LLMs as complete blocks of text. They simply generate the next token (i.e. a single word, or set of words). This means we can actually show the progress in real time, and don’t need to wait for the entire text to load.

🤞 Streaming is a relatively uncommon pattern and there can be a lot of room to innovate with micro-interactions here.

💬 Response

This could be a summarisation of a long piece of text, an edit of a selected text, or the result of an API call facilitated by the LLM.

Generating a meaningful summary for a general use-case is challenging. We’d recommend you narrow your use-case, and work with your PMs to come up with a structure for your summary.

To get to the structure, you will typically:

  1. Identify types of questions: List down all the use cases and related questions your user may ask. Go broad, and then categorise them.

  2. Understand your sources: How are your sources structured? Based on this you may need to define some logic for what type of source is pulled for what type of question, and so on.

  3. Write ideal responses: What is the leanest summary that will be the most meaningful for your user? Write these for all the different types of questions.

  4. Identify a structure: Find patterns in your ideal responses, and create your structure.

  5. Create a dataset: Now use this structure to create a dataset of 50-100 ideal summaries to fine-tune your model (As of Oct 2023, fine-tuning is a more effective method than few-shot prompting)

🤞 Try using markdown to style the summaries in your fine-tuning dataset. The model will learn the styling as well, and your summaries will not look like walls of text.

🔄 Follow up

LLM features can be collaborative. Your user may not be satisfied with the first response, and may want to re-try, or make some tweaks to the query. And if they’re satisfied, the Follow Up should allow them to make use of the Response easily.

This also raises the question of whether you need to build a full-blown chat interface to facilitate this back-and-forth. A few questions to think about:

  1. Do you really need a name and persona to the LLM? Related, does it really need an avatar?

  2. Does it need to sound conversational, or does it need to just respond with the answer you seek?

  3. Is there a use-case for maintaining chat history at all?

🤞 Be intentional about the elements of a chat interface that are useful to you, and build out your own lean version of it.

👍🏼 Feedback

This one is relatively simple. Ask your engineering team what kind of feedback will help them improve the model, and capture that.

🤞 Progressive disclosure is the lightest way to go.

Last updated