As mobile app users navigate the App Store, they rely heavily on ratings and reviews to inform their download decisions. With the introduction of review summaries in iOS 18.4, users can now quickly grasp the overall sentiment around an app without reading individual reviews. The feature is powered by an LLM-based system that periodically summarizes user feedback.

Challenges in Review Summarization

To ensure high-quality summaries that accurately reflect user opinions, our approach addresses several key challenges:

  • Timeliness: App reviews constantly evolve due to new releases and updates, requiring summaries to dynamically adapt to stay relevant.
  • Diversity: Reviews vary significantly in length, style, and informativeness, necessitating a system that can capture this diversity while providing both detailed and high-level insights.
  • Accuracy: Not all reviews are directly focused on the app's experience, and some may include off-topic comments, making it essential to filter out noise and produce trustworthy summaries.

LLM-Based Review Summarization Model

To address these challenges, we developed a multi-step approach built on generative AI. Three primary modules prepare the input for summary generation:

  1. Insight Extraction: We fine-tuned an LLM with LoRA adapters to efficiently distill each review into key insights, capturing specific aspects and sentiments.
  2. Dynamic Topic Modeling: Another fine-tuned language model helps group similar themes from user reviews and identify the most prominent topics discussed, while avoiding a fixed taxonomy.
  3. Topic & Insight Selection: A set of topics is automatically selected for summarization, prioritizing topic popularity and incorporating additional criteria to ensure balance, relevance, helpfulness, and freshness.
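The selection step can be illustrated with a minimal sketch. It assumes each insight already carries a topic label from the topic-modeling step, and it approximates "popularity" and "freshness" with a recency-weighted count. The Insight class, the function names, and the half-life weighting are all hypothetical choices made for illustration, not the production criteria, and the balance, relevance, and helpfulness criteria are not modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Insight:
    text: str          # one distilled statement from a review
    topic: str         # topic label assigned by the topic-modeling step
    review_date: date  # date of the source review


def select_topics(insights, today, max_topics=3, half_life_days=30.0):
    """Rank topics by recency-weighted popularity and keep the top few.

    Assumption: each insight contributes a weight that halves every
    `half_life_days`, so topics raised in recent reviews outrank
    equally popular but older ones.
    """
    scores = defaultdict(float)
    for ins in insights:
        age_days = (today - ins.review_date).days
        scores[ins.topic] += 0.5 ** (age_days / half_life_days)
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:max_topics]


def pick_insights(insights, topics, per_topic=2):
    """Keep the most recent `per_topic` insights for each selected topic."""
    chosen = {}
    for topic in topics:
        matching = [i for i in insights if i.topic == topic]
        matching.sort(key=lambda i: i.review_date, reverse=True)
        chosen[topic] = [i.text for i in matching[:per_topic]]
    return chosen
```

With this weighting, a topic mentioned in a handful of reviews from the past week can outrank one mentioned many times a year ago, which is one simple way to keep a summary aligned with the current release.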

Summary Generation

The selected insights are then used by a third LLM fine-tuned with LoRA adapters to generate a summary tailored to the desired length, style, voice, and composition. We first fine-tuned this model on reference summaries written by human experts, then continued fine-tuning with preference alignment, using Direct Preference Optimization (DPO).
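For readers unfamiliar with Direct Preference Optimization, the sketch below computes the standard DPO loss for a single preference pair. It takes the summed log-probabilities that the model being trained and a frozen reference model assign to a preferred and a rejected summary. The function name, argument names, and the default beta value are illustrative assumptions; the training setup used for the App Store model is not described in this post.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the log-probability of a full response given the
    prompt. `beta` controls how far the policy may drift from the
    reference model; 0.1 is an illustrative default, not a known setting.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), computed in a numerically stable way
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls below log 2 once the policy favors the preferred summary more strongly than the reference model does, and rises above it in the opposite case.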

Evaluation

To assess the effectiveness of our approach, sample summaries were reviewed by human raters using four criteria: Safety, Groundedness, Composition, and Helpfulness. Each summary was evaluated for its ability to faithfully represent the input reviews, adhere to Apple's voice and style, and assist users in making informed download or purchase decisions.
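As a rough illustration of how ratings on the four criteria could be aggregated, the sketch below computes per-criterion pass rates across raters and checks them against thresholds. The pass/fail rating format, the function names, and the thresholds are assumptions made for this example; the actual rating scale and acceptance rules are not specified here.

```python
CRITERIA = ("Safety", "Groundedness", "Composition", "Helpfulness")


def pass_rates(ratings):
    """Fraction of raters who passed the summary on each criterion.

    `ratings` is a list of dicts, one per rater, mapping each criterion
    name to True (pass) or False (fail). This format is hypothetical.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    return {c: sum(r[c] for r in ratings) / len(ratings) for c in CRITERIA}


def meets_bar(ratings, thresholds):
    """True if every criterion reaches its (hypothetical) minimum pass rate."""
    rates = pass_rates(ratings)
    return all(rates[c] >= thresholds[c] for c in CRITERIA)
```

A stricter threshold for Safety and Groundedness than for Composition and Helpfulness would be one plausible configuration, since an unsafe or unfaithful summary is a harder failure than a stylistic one.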

Together, these components let our review summarization system give App Store users a more efficient way to gauge overall sentiment and discover new apps that meet their needs.