Avoiding duplicate objects in Django querysets

Dev comments explode: 'Use Exists!', 'Write SQL!', and 'DISTINCT is fine'—choose your fighter

TLDR: The post recommends using an Exists check to avoid duplicate results without heavy deduping, while comments split into camps: some praise Exists, others say performance varies and sometimes DISTINCT wins, and a loud crew says just write SQL. It matters because the wrong fix can slow apps and muddy code.

Django devs are seeing double—and arguing about it. The article says duplicates pop up when you search across linked data, and the clean fix is Exists: a quick “does one match exist?” check instead of hauling every match back and deduping later. That alone would be tidy… except the comments turned it into a three‑way brawl.

On one side, fans like augusteo cheer, “Exists is the cleanest,” and warn that blindly slapping .distinct() everywhere is a red flag for deeper query mistakes. On another side, performance pragmatists like jiaaro pour cold water on any one‑size‑fits‑all: sometimes DISTINCT (the “only show each thing once” button) is faster, sometimes Exists is—it depends on table sizes and how many hits you expect. And then there’s Team SQL, led by tecoholic and echoed by a very relatable ducdetronquito who admits after a decade with Django, “ORMs still feel weird—SQL is easier.” Their vibe: if you’re fighting your tool, drop the gloves and write the query by hand.

Cue the memes: “Choose your fighter—Exists vs DISTINCT vs Raw SQL,” with commenters swapping war stories about weird ordering rules and database quirks. The only thing everyone agrees on? Duplicates are annoying. The fix? That’s where the drama—and the fun—begins.

Key Points

  • Duplicate objects occur in Django querysets when filtering across relationships due to SQL JOINs.
  • distinct() removes duplicates but can be slow because it compares all selected fields, including large ones.
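  • PostgreSQL’s DISTINCT ON via distinct(*fields) can be cheaper because it compares only the named fields, but it restricts ordering: order_by() must start with the same fields, in the same order, or the query raises a database error.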
  • A workaround for DISTINCT ON ordering issues is to select distinct ids in a subquery and then order in an outer query.
  • Using an Exists subquery with OuterRef is the recommended approach, offering clarity, performance, and flexible ordering.
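
To make the points above concrete, here is a minimal sketch of the SQL shapes involved, run against an in-memory SQLite database rather than through Django. The author and book tables, their columns, and the sample rows are all hypothetical, invented for illustration. The comments show roughly which ORM call would produce each shape, assuming hypothetical Author and Book models where Book has a ForeignKey to Author with related_name="books"; the SQL Django actually emits will differ in detail.

```python
import sqlite3

# Hypothetical schema and data: one author with two matching books,
# one with a single matching book, and one with no match.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id),
        title TEXT
    );
    INSERT INTO author VALUES (1, 'Zed'), (2, 'Amy'), (3, 'Bob');
    INSERT INTO book VALUES
        (1, 1, 'Django Basics'),
        (2, 1, 'Django Advanced'),
        (3, 2, 'Django Recipes'),
        (4, 3, 'Flask Notes');
""")


def author_names(sql):
    """Run a query that selects (id, name) and return just the names."""
    return [name for _id, name in conn.execute(sql)]


# Roughly: Author.objects.filter(books__title__startswith="Django")
# The JOIN yields one row per matching book, so Zed appears twice.
joined = author_names("""
    SELECT a.id, a.name FROM author a
    JOIN book b ON b.author_id = a.id
    WHERE b.title LIKE 'Django%'
    ORDER BY a.name
""")

# Roughly: the same queryset with .distinct() appended.
# The database compares every selected column to drop repeats.
distinct = author_names("""
    SELECT DISTINCT a.id, a.name FROM author a
    JOIN book b ON b.author_id = a.id
    WHERE b.title LIKE 'Django%'
    ORDER BY a.name
""")

# Roughly: Author.objects.filter(pk__in=<Book queryset>.values("author_id"))
# Dedupe ids in a subquery, then order freely in the outer query.
id_subquery = author_names("""
    SELECT a.id, a.name FROM author a
    WHERE a.id IN (
        SELECT DISTINCT b.author_id FROM book b
        WHERE b.title LIKE 'Django%'
    )
    ORDER BY a.name
""")

# Roughly: Author.objects.filter(
#     Exists(Book.objects.filter(author=OuterRef("pk"),
#                                title__startswith="Django")))
# No JOIN in the outer query, so each author can appear at most once.
exists = author_names("""
    SELECT a.id, a.name FROM author a
    WHERE EXISTS (
        SELECT 1 FROM book b
        WHERE b.author_id = a.id AND b.title LIKE 'Django%'
    )
    ORDER BY a.name
""")

print(joined)       # the duplicate shows up here
print(distinct, id_subquery, exists)
```

On this toy data the last three queries return the same authors; which one is fastest on real tables depends on sizes, indexes, and the database planner, which is the point the performance camp in the comments makes.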

Hottest takes

I use Django daily for 10 years but I don’t understand the ORM besides basic CRUD. — ducdetronquito
if ORM abstraction “distinct()” is a performance issue, then it’s probably time to switch to SQL. — tecoholic
Whether or not it's faster than distinct depends on the rest of the query. — jiaaro