While everybody is going to be rightfully excited about the logical replication, for me personally, CREATE STATISTICS and the new ROW syntax for UPDATE amount to the additions that have the probably biggest effect on me ever since I moved to postgres exclusively when 7.1 was released.
Most of the time, the planner does an ok job, but sometimes you have tables in a peculiar shape and if you're unlucky, the planner will opt into horribly bad plans. Whenever this happened for me, it was due to one of the two things CREATE STATISTICS allows me to tune in the future.
Thank you, thank you, thank you to whoever gave us this wonderful feature
Like you, I think CREATE STATISTICS is huge. I work with a sharded PostgreSQL set-up where we roll our own sharding based on customer data. This means that most of our tables have compound primary keys where the identifying account data is part of the identifier.
Just speculating off hand, but this sounds like a schema-defined column dependency which is doubly-troublesome since, like I said, this is within our primary key index. I am super excited to see just how much a difference CREATE STATISTICS will improve overall performance.
It's difficult to say whether the extended statistics could help with your schema, particularly in Pg10 where we only implemented two simple statistics types - functional dependencies and ndistinct (GROUP BY) coefficients.
Maybe the changes in 9.6 that allow using foreign key constraints during estimation would help, though?
Has anything actually changed with the ROW syntax for UPDATE? Maybe I'm missing something but it looks like the functionality has always been there, but now you're allowed to type the word "ROW". E.g. compare [1] and [2].
EDIT: Turns out the below is not true, though you can achieve the same effect with the sub-select syntax.
The row value can be a single value from some other query. (Rather than having to pick apart each column from the row value.) That said I think the feature has been there for a while, and now simply the "ROW" keyword is optionally allowed.
Actually now that I re-read the grammar, sub-selects are yet another accepted syntax. Looks like the ROW syntax doesn't support row expressions like I thought. Not really sure the benefit beside making it easier to programmatically construct UPDATEs.
I don't quite understand the commit message, but I think the new syntax resolves some ambiguity in the grammar in an edge case. From the diff of src/test/regress/sql/update.sql:
-- *-expansion should work in this context:
UPDATE update_test SET (a,b) = ROW(v.*) FROM (VALUES(21, 100)) AS v(i, j)
WHERE update_test.a = v.i;
-- you might expect this to work, but syntactically it's not a RowExpr:
UPDATE update_test SET (a,b) = (v.*) FROM (VALUES(21, 101)) AS v(i, j)
WHERE update_test.a = v.i;
It's for when you want to update more than one column with values from the same record in another table, eg:
UPDATE books
SET (title, isbn) = (
SELECT title, isbn FROM other_books
WHERE other_books.foo = books.bar);
You can also do it with UPDATE ... FROM (which may be more efficient), but that's a PostgreSQL extension, while the added-in-9.5 syntax is SQL standard.
UPDATE books
SET title = other_books.title, isbn=other_books.isbn
FROM other_books
WHERE other_books.foo = books.bar;
The second option, "row" syntax (to which the optional "ROW" keyword was recently added), doesn't allow for a row expression, only column expressions. The sub-select syntax suffices for row expressions (as your example demonstrates).
I think it might also simplify the syntax if you want to update multiple columms from a composite type (which is essentially a ROW) so that writing sub query is not needed anymore.
As exciting as this is (and it really is) it's yet another thing that explodes the number of possibilities for optimization. It's getting towards the point of unmanageability and I hope there will be a movement towards, if not auto-tuning databases, assistive tools for exploring both this and indexing possibilities.
Yes, at this point it's manual optimization task - you have to decide which statistics you need (e.g. by looking for poor estimates in query plans) and then create them. Pretty much just like indexes.
The ultimate goal is to allow detecting this automatically, and either create the statistics right away or at least providing some suggestions to the users. That's not in Pg10 though.
I think the best way to handle this situation is for the default install to be generally performant on any given task, with the option to flip switches for those use cases where extreme performance are needed due to the specific workload.
If that's the case (which I suspect it is, having used prior PSQL versions), then most people will never need to optimize it, but those who have an extreme use case that can benefit from such tweaks will have such options available.
Is that related to the cool "hash compression" technology I hear about? Apparently it can compress an arbitrarily large file into just a few bytes, amazing!
Postgresql has compression by default on, for all large text and other large fields that get great benefit of compression. From documentation -" The technique is affectionately known as TOAST (or "the best thing since sliced bread"). "-
https://www.postgresql.org/docs/8.0/static/storage-toast.htm...
I am interested and I appreciate the thought and link, however these appear to be the slides to a talk without the actual talk. If so they are of limited use because the slides never have all the information the presenter has, and that information is often just a summary.
I know it's pretty popular to hate on Mongodb now (even more so than it was to love on Mongodb 4 years ago), but there are still areas where it's better than a relational db. In game development, it's extremely helpful (especially as an "indie") to change the structure on a whim so easily. Also based on the design of the game I'm working on, I believe the document structure captures the structure of the data so much better than if I was forced to make a bunch of tables. This aides in understanding the representation of our game's data, and I believe (but haven't tested) it will be faster than a relational db for my use case, but that's an ancillary benefit anyway.
But the posters above you said that JSON and JSONB types in Postgres, and functionality around them, eliminated the need to use other databases for document type data.
What you are describing can be done with PostgreSQL. One thing that is missing is better client libraries that make use of those data types. Morphia wins for now in that regard.
My claim wasn't that it didn't fulfill their needs, it was that it doesn't fulfill all needs (gamedev is one example that I'm familiar with).
Postgres storing JSON types != All mongo functionality
I'm sure I could achieve everything I'm doing in mongo by some roundabout way in Postgres, but if you're doing a large amount of reading/modifying partial fields within JSON structure, it's the exact use case for mongo.
1) it's a proof of concept that others have improved upon to show that you can indeed replace the functionality of mongo that most developers tend to rely on.
2) neither has mongodb at that level
3) that's correct, you need to actually use the term SELECT when you use the functions
and you are correct, pg is missing the random data loss that comes with mongo. it will never be 1-1 in regards to that.
EDITING TO ADD:
the important part is the note (the bulk of the comment) that I'm updating partial JSONB data on a very regular basis, and do it using PLV8, getting rid of the need for an unreliable database and instead using exactly what this news story is about.
A small independent game is where I've used it before as well. And currently I'm working on an app/game sort of thing where it makes a lot of sense because we're iterating often and fast. I like that it kinda gets out of my way and just works, though perhaps I would not so much had I experienced the data loss others speak of.
Yeah I've read some horror stories about that too. I think that the improvements to concurrency (ala WiredTiger), will help wth that, as well as making sure to think about possible concurrency issues from the outset, as I've tried to do.
Have you heard of ToroDB? FWIW they have benchmarks that claim it's running faster on top of Postgres than MongoDB does natively.
"ToroDB Server
It is a MongoDB-compatible server that supports speaks the MongoDB Wire Protocol (and therefore can be used with the same drivers used to connect to any standard MongoDB server) but stores your data into a reliable and trusted ACID database."
Using MySQL/Mongo for some years, major problems with MongoDB Cloud Manager (several hours site down due to Cloud Manager removing the mongo binary), for some time now I use pg for new projects and think it's great.
Two gripes:
1. Client libraries in Mongo work nicer with documents than Postgres libraries with JSONB (e.g. Scala Option[] mapping to non existing/existing fields)
2. Why is the Postgres JSON syntax so different? Why not just support SELECT document.field.field instead of (inconsistent) document->'field'. Imho pg JSON syntax is hard to read and new to learn.
Re 2: That syntax is already in use by `SELECT schema.column`. I do agree that the syntax is a bit cumbersome and harder to learn, but I'm not sure if they could have done much better while being consistent with SQL.
Why can't pg same the same syntax? I'm sure it could detect if it's a table column or a document field - or am I missing something? Why not handle documents and columns the same, with documents a a kind of hierarchical columns.
I would say that it isn't configured to scale like Mongo out of the box...but that doesn't mean it can't.
You can go outside of Postgres core to get multi-master solutions with easy sharding and clustering...with the open sourcing of CitusDB and 2nd Quadrant's pglogical and BDR extensions there are options out there.
You can also roll your own (if you really want)...and it is relatively approachable to do so using built-in features like partitioning.
And, of course, with the 10 Beta it would seem that logical replication is being brought into core which sets the foundation for future replication features such as BDR to also be brought into core.
I would also point out that since Mongo's BI connector fiasco, it would seem more and more Mongo users are finding more reasons to just use Postgres (where relational interfaces are desirable for BI): https://www.linkedin.com/pulse/mongodb-32-now-powered-postgr...
None of what you posted is built in and thus supported by the vendor.
That may not matter to you but it's a deal breaker for those of us in enterprises. We can't just be rolling our own versions of PostgreSQL and we can't use CitusDB when it is not supported by other vendors for use with their products.
The point still remains that after all these year PostgreSQL's scalability story is still a mess.
I think you might be confused somewhat...PostgreSQL doesn't have "a vendor". The history of Postgres starts at UC Berkeley and now has the PostgreSQL Global Development Group which is a mixture of contributors both community and corporate sponsored:
Of that group, both 2nd Quadrant and CitusDB are represented...so in a way you could say their support is "by the vendor". Not to mention EnterpriseDB which also has support options.
> we can't use CitusDB when it is not supported by other vendors for use with their products.
CitusDB is no longer a fork, it's an extension...this means so long as your other vendors products support Postgres, they support CitusDB. More over, CitusDB itself will sell you an Enterprise package.
> The point still remains that after all these year PostgreSQL's scalability story is still a mess.
If by mess you mean specifically there is no knob and dial arrangement in the core of Postgres I would agree.
But within the core of PostgreSQL there are primitives which make scaling approachable (I myself am working on a data ingestion process that utilizes table partitioning and hand rolled sharding for PostgreSQL).
And there are now a plethora of extensions and tools provided by core contributors such as 2nd Quadrant and CitusDB to offer somewhat out of the box solutions and they even come with support.
PostgreSQL is not the right tool for every job and the replication/clustering area has been a a sore spot for Postgres in the past. But it certainly isn't bereft of options now...and the inclusion of logical replication in this beta is only the first step in bringing these options closer/into the core.
This is absolutely true. I don't get the irrational downvotes when it comes to postgres.
It's a great database but it is sorely behind with scalability features and is just now finally getting single-node parallelism and logical replication. It's still hampered by the requirement on 3rd party tools to get connection scaling, decent backups, HA and distributed clustering. The future looks interesting but the other databases aren't sitting around idly either.
PostgreSQL is an amazing project. A no-nonsense database that delivers what it promises. I'm amazed at what a talented group of people can accomplish when they are driven and put their mind to it.
Thanks for a wonderful product.
Plus their documentation is top notch. After using Oracle for 15 years, I nearly cried when I moved to Postgres recently and saw how wonderful their documentation is.
I've often said that I didn't learn English grammar until I studied German, I didn't understand Linux until I read the Arch documentation, and I didn't comprehend relational data until I read the Postgres documentation.
A friend of mine was doing an MBA and was planning a paper on open source collaboration models. She asked me for my opinion on well-run projects and Postgres was the first thing that crossed my mind.
They have an excellent combination of goals, structure and expertise, and have created a remarkably low-drama, functional community around delivering excellent tools. I really think there's a lot to emulate there.
My short story regarding community. A few years ago I wanted to add a feature to PL/pgSQL that was on their TODO list. I asked about the feature on a public list and while I was unknown to the community I immediately got a private email from a developer who contributed to that part of code. We discussed the required changes and I prepared a patch. He reviewed it and later was supporting during formal review.
The biggest news for me is ICU support for collations (text sorting).
Previous versions of PostgreSQL relied only on strcoll, which is horribly broken on BSD and macOS. On platforms where it wasn't completely broken, it had the potential for subtle data corruption bugs (eg. an update to glibc might change sort order, causing indexes to become corrupt).
Now, you can optionally use ICU for collations, which gives you reliable, versioned collations. This is a big step forward!
ICU collations are not the default, you need to add them with CREATE COLLATION. You have a lot more collations available to choose from, but I think it's not yet possible to change any of the advanced settings that ICU provides.
(Also, when I tried it, it seems that the ICU collations are case insensitive -- but I think case insensitive collations aren't fully supported yet.)
The native table partitioning makes me so happy. I'd been doing this for years with really hacky external modules and tons of triggers. Sadly, even then there were always weird edge cases.
Postgres really has become the most versatile database out there. I cringe whenever I have to work with MySQL again...
That's very condescending of you. They may be just well informed of their use cases and know which database fits them well.
Poor Wikipedia and FB with their MySQL setups, they have no idea how much better of they'd be with postgres…
I tried to switch, once. I actually gave up when, after spending entirely too much time trying to find the cli client, I couldn't figure out how to actually send queries (or it may have been the "show database" part–it's been a while)
I tend to think the HN groupthink is strong on this subject, and 90%+ wouldn't ever see an effect beyond placebo from switching. MySQL (or MariaDB, which I actually use these days) has also changed drastically since those 200x-years where most of today's folk wisdom originates.
create table user_email (
email text not null
);
-- create a index on the lowercase form
-- of the email
create unique index user_email_case_idx
on user_email (lower(email));
-- select using the index, with the lowercase form.
select 1
from user_email
where lower(email)=lower('Foo@foo.com');
That's what per-column collations are for. Ideally you should be able to choose from case sensitive and case insensitive collations. Unfortunately PostgreSQL doesn't support case insensitive collations (for some reason the string comparison routines use memcmp as a tie-breaker when the collation says strings are equal).
If you want to do case-insensitive for all languages you can do this:
1. first install the following (be sure to replace [your schema]:
CREATE EXTENSION pg_trgm with schema extension;
CREATE EXTENSION unaccent with schema extension;
CREATE OR REPLACE FUNCTION insensitive_query(text) RETURNS text AS $func$
SELECT lower([your schema].unaccent('[your schema].unaccent', $1))
$func$ LANGUAGE sql IMMUTABLE;
2. then in your query you can use:
where insensitive_query(my_table.name) LIKE insensitive_query('Bob')
That will not work for all languages. Look at https://www.w3.org/International/wiki/Case_folding for an explanation of why this problem is nontrivial. The lower function is sufficient: it handles case-folding properly, using the configured locale. Explicitly stripping accents can actually be the wrong choice depending on the locale
1. You don't always have control over all the queries that have been written against your database.
2. You would probably lose the ability to use ORMs without a moderate amount of customization.
3. If you're migrating from a different database, you may have checksums on your data that would all need to be recalculated if you change case on everything stored.
4. Doing runtime lowercase() on everything adds a bit of overhead, doesn't it?
citext on postgresql seems a decent option - the citext docs even mention drawbacks of some of the other recommended options.
Is there a single ORM out there that doesn't support the lower() function? I googled "case insensitive search" + a couple ORMs and each of them could implement it as a one-liner.
And doing the runtime lower() on everything will generally not be slower than citext. If you look at the source for the citext comparison (https://github.com/postgres/postgres/blob/aa9eac45ea868e6dda...) you'll see it is internally converting the values to lowercase and comparing them. All it saves you is the overhead of a sql function invocation, and you'd have to do a lot of comparisons to make that difference measurable. But if you're doing a lot of those comparisons, unless you're just running the calculation on the same couple values over and over, the memory and disk latency will dominate performance, not the minimal overhead of the sql function invocation.
> 4. Doing runtime lowercase() on everything adds a bit of overhead, doesn't it?
Maybe MySQL has special sauce for doing this comparison without lowercasing the query string? But there must be some overhead relative to exact search?
Same here and, yeah, it really does. When preparing for upgrading a server recently I took a lvm snapshot of the disk and copied MySQL from it (to test in a VM that the server upgrade would work) assuming it would start just fine after running recovery, but instead it complained some about corruption and then segfaulted. So at least that version of MySQL cannot be trusted with your data in case of a power outage or kernel panic.
Hmmm, did you ensure the database had flushed outstanding writes + froze the filesystem before snapshotting it?
I don't touch MySQL much, but as a generalisation it's a good idea to flush outstanding writes + freeze the filesystem(s) for pretty much any database before snapshotting.
From previous experience with Oracle (years ago), that specifically would explode dramatically if things weren't flushed first.
I was expecting that presentation to be stuff you were missing, but the presentation is saying that essentially every feature they describe is implemented in PostgreSQL. Of the SQL:2011 ones, only one wasn't (temporal tables), and even the SQL:2016 features had partial support (with the summary slide at the end of other features they didn't do in detail having stuff that looks familiar in a PostgreSQL context). Do you have a reference of features not in PostgreSQL you are hoping to use?
This lets me materialize a view and then inspect deltas between refreshes, with history, and even update the materialized view from triggers and have those changes recorded in a history table automatically.
I know the feeling of having extremely limited time, i'm in the same boat. But I use Postgres every day with my startup, and we lean very heavy on the database (not by volume of data, just by complexity of the problems). I've reviewed a few patches that I wanted to see get into Postgres, and contributed to discussions on the mailing list. That's about the extent I am able to do at the moment, but it's better than nothing. More reviewers are never a bad thing.
Will having logical replication make doing a DB version upgrade in production easier? We're using Postgres 9.4 on RDS right now, and there doesn't seem to be an upgrade path that doesn't involve some downtime.
> Will having logical replication make doing a DB version upgrade in production easier?
Yes, that's one of the major goals.
> We're using Postgres 9.4 on RDS right now, and there doesn't seem to be an upgrade path that doesn't involve some downtime.
Unfortunately on RDS your options are a bit more limited than on plain postgres. Otherwise you could use londiste, pglogical, .. to keep the time to switch over to something very small.
I guess I remember 9.3 -> 9.4 on RDS sucked big time for us (no migration option whatsoever except pg_dump/pg_restore), but since 9.4 they had it pretty much nailed. Update worked flawlessly during the small failover maintenance window you have every now and then anyway. (You do run a HA setup, right?)
YMMV of course, better have your fingers crossed. If you don't trust it, do it the hard way: Switch to write-only mode (easiest: remove write rights for all DB users), notify your users, make a snapshot with pg_dump, restore to a fresh new DB copy and switch your services over. Worked for us, different story if you can't take any write downtime of course.
The most important (to me) thing about logical replication is that with it I can have additional schema elements in replicas. This is important for some use cases where one uses another group's DB and needs to add schema elements that the other DB's admins do not want to host (e.g., because they might think them risky) or where you don't want to have to be constrained by the other DB's maintenance schedules (e.g., for updating your schema elements).
> A dump/restore using pg_dumpall, or use of pg_upgrade, is. > required for those wishing to migrate data from any previous > release.
(https://www.postgresql.org/docs/devel/static/release-10.html)
This means no, I guess. This is what I don't get. How can anyone with a sizeable db in production do that?
That is only if you want to upgrade in-place. Upgrading using logical replication is different: it requires running another instance and switching over to it when all the data has been replicated.
Yes, but I still do not think it will be easy, just easier than it used to be. The support added here seems to be more aimed at people who want to only replicate certain tables.
The new features related to clustering looks exciting. Also SCRAM support and Full text search for JSON and JSONB . Postgres is cementing itself as not only the safe choice for Databases but also the Swiss Army knife .
As someone with little to no Postgres experience, it seems like they are heading in the direction of providing the type of massively parallel, scale out features that Citus provides.
Would love to hear thoughts from someone with real expertise.
Citus is building functionality on PostgreSQL to provide sharding for data sets too large for a single machine. There aren't really any PG 10 features that duplicate that. Some of the new PG features such as increased query parallelism allow for better utilization of single-machine resources. Other features such as logical replication may allow for some horizontal scaling by splitting read workloads across replicas, but none of them provide any sharding capabilities.
Aggregate push down to foreign servers seems to overlap with Citus. It's not as transparent, but you could have a dataset sharded across several PG instances, mount them all into a master with foreign data wrappers, and UNION them (for some queries) to aggregate across the set.
Any enhancement to PostgreSQL is also an enhancement to Citus, or rather, the PostgreSQL ecosystem as a whole. For example, PostgreSQL 10's declarative partitioning feature will help enable sharding+partitioning in Citus, which is one of the most frequently requested features.
PostgreSQL 10 also gives you the possibility of setting up a partitioned table in which the partitions are postgres_fdw tables, which allows a basic form of manual sharding without Citus. However, as we've learned over the years, there's a huge difference between the ability to distribute a table across multiple servers and addressing a use case.
A sweet spot for Citus is multi-tenant (SaaS) workloads, in which all queries and transactions are specific to a particular tenant. In that case, you can typically distribute most of your tables by tenant ID, and use (replicated) reference tables for data that is shared across tenants. Citus makes sure that data for the same tenant is automatically co-located and as long as your query filters by a particular tenant and joins by tenant, you get full SQL pushdown (with parallelism in PG10), and ACID transactions. At the same time, you can perform parallel DDL commands across all tenants to enable migrations, bulk load data through COPY, perform parallel rollups or transformations through INSERT..SELECT, and run parallel analytical queries. Overall, the combination of these features and the trade-offs that Citus makes ensure that if you need to scale out a multi-tenant app, sharding through Citus solves it. In many cases, the only changes you need to make in your app are adding a tenant_id column to your tables [1], being explicit about the tenant in your queries or ORM [2], and adding create_distributed_table calls.
None of the Citus features that allow you to scale out a multi-tenant app are available if you do sharding through partitioning+postgres_fdw so far. I also wouldn't expect core postgres to make aggressive trade-offs to optimise for specific use cases. The PostgreSQL way is to make everything pluggable and let extensions specialise.
I wrote the initial patch for the UUID support! It's my only Postgres contribution so far (except for docs), and I really only write C a few times a year. I made a few revisions but as I fell behind others picked it up and got it over the finish line. I'm pretty proud to say I participated, and maybe others can be encouraged to get involved too. Most of my day I write Ruby and Javascript. :-)
The issue with that is performance and index size. At first that was my solution, but as more data got into the tables that just wasn't working. For UUID's I had to move to casting to bytea using the uuid_send(uuid) function.
For enums, I had to create a function to cast an enum to a unique integer, and used that in my exclusion constraints. It's hacky as hell and not something I like, but it's worked for the past 3 years.
I'll be really really happy to get rid of those hacks come PG 10 though.
I know this sounds icky to some, but what I really want from Postgres is a proper equivalent to MSSQL's FILESTREAM.
I know, I know, "databases are bad for files" - but let's take something like an ECM suite where images and documents are literally part of a transaction, having to synchronize those between filesystem and database breaks the Atomic constraint in so many ways. PostgreSQL has LOB support, but oid's being 32-bits severely limits the usefulness of the feature without using convoluted workarounds (multiple databases).
> but what I really want from Postgres is a proper equivalent to MSSQL's FILESTREAM.
What sizes of files and such are you interested in? What kind of read/write patterns?
I do think we need some improvements in the area. Not enough that I'll drop the other stuff I'm working on, which I think is higher priority, but enough to discuss approaches and review patches.
It'd be cool if you could comment on the pgsql-hackers list.
My major issue with bytea is two fold, the same oid issue that you have with LOB's (we have 10's of billions of pages) so you're forced to less than optimal solutions like table inheritance for large numbers of records, plus it severely bloats the size of the heap file making a VACCUM FULL take a century if needed.
The native table partitioning removes one of the headaches of inheritance, but it mostly only relieves some of the administrative headaches - the foreign key issues remain.
fdw's are great, but they don't participate in replication - which is a pretty big issue. Unfortunately, the "proper" way to do this would be with a custom type, but I can't find any way you could write one with the extension API that wouldn't be stored in the database heap (you could create some native functions similar to the existing LOB functions, but then you run into atomicity issues again).
Honestly, this would be easy to write if the extension API supported off-heap storage of custom types - though I understand there's a ton of technical complexity in implementing such a feature.
To be clear—you want Postgres to manage the data (so that it gets replicated through Postgres replication), but you also want the data to exist in somewhere other than the DB heap?
As far as I can tell, these two conditions together form a bit of a bind: for Postgres to manage the data in a way that would enable replication under MVCC, it has to have said data mmap(3)ed and indexed and have it participate in the WAL log and so forth. Just the space overhead of this management metadata will then be prohibitively costly in memory, if you have "tens of billions of files" to keep under management.
Personally, I think the pragmatic solution would be to replicate the files themselves outside of Postgres, and have Postgres just hold opaque references to their paths, which you would join through an FDW to fill a column with their contents.
As you say, the files are Write-Once-Read-Many—so you (presumably) aren't worried about how Isolated or Consistent writes to the file data would be without Postgres. Plain filesystems provide Atomicity and Durability guarantees all by themselves. It's less convenient to do things this way—you have to manage e.g. "WAL-E + a separate rsync" rather than just WAL-E—but it's not particularly bad in terms of the engineering trade-offs.
> To be clear—you want Postgres to manage the data (so that it gets replicated through Postgres replication), but you also want the data to exist in somewhere other than the DB heap?
> As far as I can tell, these two conditions together form a bit of a bind: for Postgres to manage the data in a way that would enable replication under MVCC, it has to have said data mmap(3)ed and indexed and have it participate in the WAL log and so forth. Just the space overhead of this management metadata will then be prohibitively costly in memory, if you have "tens of billions of files" to keep under management.
There's really not too much of a technical problem here. What the sub-op is complaining about mainly just some work that needs to be put into parts of postgres that have been designed long ago. Not entirely trivial due to compatibility concerns, but also not super hard. The first thing would be to have a separate type of toast table with 64bit ids (can't drop support for 32 bit ids without making in-place upgrades impossible), and to have per table toast id. Then use same infrastructure for the LOB piece.
Btw, postgres doesn't mmap() data. On some systems it uses mmap(MAP_ANONYMOUS) to allocate its buffer cache, but that's just a configurable size.
Right, I didn't mean to suggest that Postgres currently mmap(3)s all data; rather, I meant to get across that—in the scenario where you really want to keep all this data laying around as files rather than as LOBs—Postgres would need to have some way to guarantee that the data that's sitting around in those files is always in the state Postgres thinks it's in, if the corresponding file-streams are going to join in the same MVCC transactions as everything else. From what I can tell, that would require Postgres to do whatever it does for each table backing-file it keeps, to each of those files: both on the OS level (at least fopen(3) + flock(3)), and in terms of book-keeping in the system catalog.
Let me put it this way: how much overhead do you think there would be if you split your Postgres data across "ten billion" tablespaces? Because that's essentially what's being talked about here—little tablespaces containing one {oid, LOB} table, with one row, with a "storage engine" that represents that tablespace as a file.
...of course, if you do just (fix and then) use LOBs rather than insist on externally-visible files, none of that matters. :)
A filesystem may guarantee atomic and durable transactions, but then you're in a bind trying to keep things in sync with the database - either potentially ending up with orphans or missing data. In theory you could use a 2-phase commit approach, but that comes with its own headaches.
Consistency is actually a huge issue for DR purposes, right now with the DB+filesystem approach it's nearly impossible to get a clean restore from backups if needed. We could alleviate this a bit by using something like GlusterFS' geo-replication support and replicas offsite, then verifying the tail of the data on-disk, but it would be nice to have a solution that could just work correctly with PostgreSQL's built-in replication (just like FILESTREAM does with MSSQL).
Edit: I should add, PostgreSQL's existing LOB support already works great with binary replication. The biggest issue with them is simply the 4B record limit, due to the use of OID's. It'd be nice if it was hidden away automatically with a special BYTEA-variant like FILESTREAM does with MSSQL, but it's "good enough" without that limitation.
Databases and filesystems duals. If "databases are bad for files", that's mostly just an implementation issue. The only serious impedance mismatches between databases and filesystems have to do with transactional semantics (which are much looser in POSIX and friends than in ACID DBs), and byte-range locking (which I'm thinking can't be advisory, only mandatory, in ACID DBs).
Is there an OSS version of DTC (Distributed transaction coordinator)? That gives you the transactionality you want from the file system and other systems.
unrelated to this particular release, but since we're talking about postgres... how do people find the ability to write stored procedures and functions in languages other than just SQL? I'm coming from a MSSQL shop and curious how writing a query in python for example has benefited anyone if at all -- did you / can you use modules like numpy or pandas in a postgres python procedure?
Python under PG can do anything python can do.. but that said, just because you CAN import numpy/pandas and go to town, doesn't mean you SHOULD. In fact you probably really shouldn't :)
That said, we use python inside of PG, and it's awesome. It's not really any more powerful than PG/SQL(their built-in language) i.e. in terms of what you can do in the DB itself.
Our main app is also python, so we have a continuity of language from front-end to back-end. We do have some DB functions that take advantage of python's flexibilities, but like I said, we try to keep external modules and craziness out of the PG backend if possible. Sometimes there is no help for it, and you do what you must, and it's very handy that you can just go do it.
Sometimes I'd rather do python magic than self join correlated sub-queries or write window functions and have it all become a convoluted mess. But I see your point.
edit: i'm still trying to train myself to think in a relational way.
relations are awesome, but it def. takes training! I'd suggest avoiding window functions for pagination.. see http://use-the-index-luke.com/no-offset as one method.
Yes, it was a reference to that. My comment is currently sitting at -2. I'm still getting the hang of Hacker News' sense of humor. On Slashdot, I would be at +5. Ho hum.
Anyway, just to make my intentions clear, I love Postgres, have used it for over a decade, and have yet to use MongoDB or any other NoSQL database.
In general, strive for substantive and constructive comments on HN. If you've got that, a little humor added in can be appreciated. Comments that are submitted only for humor value (which you knew yours was, given your parenthetical addendum) are likely to be less appreciated on HN than on other sites.
Here's a recent thread where this has been discussed:
So what I want to know is how the members of Hacker News act as one. Presumably these are the same people who appreciate humor on other sites, upvote it, and participate in it. But on Hacker News they somehow all know to downvote it. It's eerie.
I'm not sure I follow. People often behave differently depending on the context (e.g., work, home, school, out with friends), so acting differently on different sites doesn't seem very surprising. Also, different sites have different, though perhaps overlapping, populations, so the "average" behavior or culture is going to be different.
One example (mentioned in the HN guidelines) is Reddit. There are people who frequent both HN and Reddit, yet it's clear that members on HN—even those who use Reddit as well—don't want HN and Reddit to be the same. As such, they behave differently on each site. That's not to say one is better than any other: they're just different.
And HN members don't act as one—just as they don't act as one on any other site. If they did, your comment would have been downvoted to the point that it was flagged dead (given it's likely been seen by hundreds, if not thousands) or not flagged at all. Given it's current shade of gray, I suspect you've received only a few downvotes. (Edit to add: I see you did say it is at -2.)
Does that make sense? Or am I misreading you?
Anyway, this is quite off-topic now. I posted these in the hopes of providing a bit more insight into the HN community. I hope they've been more helpful than frustrating.
So you want something that covers all types of database systems? I don't think any such book exists, but maybe look at the database-oriented courses at CMU, Berkeley, Stanford, MIT ... Those are typically comprehensive and include links to source books/papers.
In particular, look at Joseph M. Hellerstein at CMU: db.cs.berkeley.edu/jmh/ (doesn't load for me at the moment, not sure what's wrong).
Also, the Redbook (prepared by Hellerstein, Bailis and Stonebraker) is great: http://www.redbook.io/ but it's more an update on various topics than introduction.
Postgres is awesome and thankfully is much easier to get set up with high availability and replication baked-in (remember pgpool anyone?). AWS Cloud Formation Docker Swarm stack + postgres primary/replica compose file + datadog and you're good to go.
Especially CREATE STATISTICS (wonderful explanation here https://www.postgresql.org/docs/10.0/static/multivariate-sta...) is the one single knob I wanted to be able to turn so many times in the past now (especially the n-distinct counts).
Most of the time, the planner does an ok job, but sometimes you have tables in a peculiar shape and if you're unlucky, the planner will opt into horribly bad plans. Whenever this happened for me, it was due to one of the two things CREATE STATISTICS allows me to tune in the future.
Thank you, thank you, thank you to whoever gave us this wonderful feature