Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

If I recall correctly (and it has been a while, so I'm not saying I am), the issue was with concurrent transactions inserting a record into tableA and another into tableB which has a foreign key constraint to tableA. The issue was likely specific to `ON CONFLICT DO NOTHING` and not `ON CONFLICT DO UPDATE`.

For example, let's saying you're building an index of open source packages and have two tables: package_type(id, name) and package(id, type_id, namespace, name).

If you receive two concurrent requests for `maven://log4j:log4j` and `maven://io.quarkus:quarkus`, a naive implementation to insert both "maven" and the packages if they don't exist might look something like this:

   WITH type_id AS (
     INSERT INTO package_type(name)
     VALUES (:type)
     RETURNING id
     ON CONFLICT DO NOTHING
   )
   INSERT INTO package (type_id, namespace, name)
   SELECT type_id, :namespace, :name
   FROM type
   ON CONFLICT DO NOTHING;
However, one or both inserts can fail intermittently because the primary key for `package_type` will be auto-incremented and thus the foreign key won't be valid. Also, as mentioned in another comment[0] this won't work if `maven` already exists in the `package_type` table.

[0]: https://news.ycombinator.com/item?id=36031790



Atomicity doesn't mean "doesn't fail", it means "either fails or succeeds, but does not succeed halfway".

There is nothing about what you are describing that is different from the behavior you'd get from a regular insert or update. If two transactions conflict, a rollback will occur. That isn't violating atomicity. In fact, it is the way by which atomicity is guaranteed.

The behavior of sequence values getting incremented and not committed, resulting in gaps in the sequence, is a separate matter, not specific to Postgres or to upsert.


I don't think this has to with concurrency; this query is fundamentally broken (besides typos), in that the CTE won't return anything if the package_type already exists.

You have two options:

(1) ON CONFLICT DO UPDATE, with dummy update:

    WITH
      type AS (
        INSERT INTO package_type (name)
        VALUES ($1)
        ON CONFLICT (name) DO UPDATE SET name = excluded.name
        RETURNING id
      )
    INSERT INTO package (type_id, namespace, name)
    SELECT id, $2, $3
    FROM type
    ON CONFLICT (type_id, namespace, name) DO UPDATE SET name = excluded.name
    RETURNING id;
(2) Separate statements with ON CONFLICT DO NOTHING (could be in a UDF if desired):

    INSERT INTO package_type (name)
    VALUES ($1)
    ON CONFLICT DO NOTHING;

    INSERT INTO package (type_id, namespace, name)
    SELECT type_id, $2, $3
    FROM package_type
    WHERE name = $1
    ON CONFLICT DO NOTHING;

    SELECT id
    FROM package
    WHERE (type_id, namespace, name) = ($1, $2, $3);


I think that actually more to the root of the problem, as other folks have noted, is that `ON CONFLICT DO NOTHING` means the RETURNING clause returns no rows if there is a conflict, which in my experience is rarely what people want. So instead people do `ON CONFLICT DO UPDATE` with a no-op update which has performance/locking implications, otherwise they need to do a complicated query (search stack overflow).

I wish that postgres would add some sort of backwards compatible option like `ON CONFLICT DO NOOP` or `ON CONFLICT DO RETURN` so that you got the semantics of `DO NOTHING` except that the conflicted rows are returned.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: