Autocomplete

Kevin Wayne
Princeton University
wayne@princeton.edu

Overview

Students write a program to process autocomplete queries for a given set of suggestion strings and associated weights. That is, given a query string, find all suggestion strings that start with the query string, sorted by weight. Autocomplete is a pervasive user-interface feature in many modern apps and services, ranging from text messaging to IDEs to web search. For example, Google uses autocomplete to display suggestions in real time, as the user types a web search query.
Google autocomplete

In one version of the assignment, students implement autocomplete in two stages:

The system sort suffices for sorting. However, the variant of binary search required is nonstandard, so students must design and implement their own version. Along the way, student must define three custom orders.

The assigment is nifty because autocomplete is a quintessential application of sorting that is familiar, authentic, and engaging. Autocomplete also exemplifies how applying a core HCI principle can dramatically improve user-interface design. Armed with only sorting, binary search, and a hundred lines of code, students can compose an effective solution to an important problem.


Resources


Metadata

Summary Students compose a program to process autocomplete queries for a set of suggestions.
Topics sorting, binary search, arrays, string processing, HCI
Audience late CS1 or early CS2
Difficulty easy to moderate
Strengths
  • Spotlights a familiar and important application, for which performance is critical.
  • Brings sorting and binary search to life—two topics that are often dull and insufficiently motivated in CS1 and CS2.
  • The accompanying interactive GUI and real-world datasets add concreteness, authenticity, and amusement.
  • Requires only about 100 lines of code (and no starter code).
Weaknesses
  • Implementing binary search is tricky.
  • Defining a custom order is awkward in Java 7.
  Dependencies   Requires a basic I/O library to read text from a file, ideally one that supports Unicode.
Variants Many variations of this assignments are suitable for CS1 and CS2.
  • Simplify the assignment to allow use of system binary search.
  • Refocus the assignment on lists/arrays and string processing by replacing binary search with sequential search (suitable for small datasets or small numbers of queries).
  • Achieve improved performance using fancier data structures (such as tries, suffix arrays, segment trees, and Cartesian trees).
  • Extend the autocomplete data type to support adding/removing suggestion strings or changing the associated weights.
  • Delve into Unicode and related internationalization issues (such as using a diacritic-insensitive order).
We also believe this assignment has great potential for adaption in several upper-level courses (HCI, Networks, Algorithms, and ML).