Hacker News

story

SQL:2023 has been released (iso.org)

270 points, posted by MarkusWinand a year ago

136 Comments:

MarkusWinand said a year ago:

The major news is:

- SQL/PGQ - A Graph Query Language

- JSON improvements (a JSON type, simplified notations)

Peter Eisentraut gives a nice overview here: https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-fin...

dang said a year ago:

Discussed here:

SQL:2023 is finished: Here is what's new - 35562430 - April 2023 (153 comments)

bokchoi said a year ago:

Thanks! Lots of neat little improvements in there, like accessing JSON values using dot and array syntax:

    SELECT t.j.foo.bar[2], ... FROM tbl t ...
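
A self-contained sketch of the new notation (hypothetical table and column names, assuming the new JSON type):

    CREATE TABLE tbl (id INT, j JSON);

    -- dots step into objects, brackets index arrays:
    SELECT t.j.foo.bar[2]
    FROM tbl AS t;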
sverhagen said a year ago:

I'm a pretty average SQL user, but I've heard expert consultants say before that they could do many more things with SQL databases that developers like me would have maybe grabbed a different tool for, like a graph database. So this then makes me wonder, once there's even broader adoption through PGQ, is that going to be a killer for niche databases like Neo4j, in favor of, say, Postgres?

derefr said a year ago:

Graph databases are about as different from RDBMSes, storage-wise, as column-stores are from row-stores. It comes down to how you plan to shard data and distribute queries when data doesn't fit on a single node.

Using a graph DB with many underlying KV-store nodes, you can have a single graph spread over many machines representing e.g. Facebook's social graph, and run a query which "chases around" edges between vertices that live on different nodes, to solve that query, while ensuring that as little of that has to happen as possible — both by rebalancing vertices so that data is sharded at low-connection-degree points in the graph; and by consolidating the steps of queries that occur on the same node into single batch queries, such that the whole thing becomes (close to) a single map/reduce step.

There's nothing in Postgres that knows how to do that; if you had e.g. a graph stored in a distributed Citus table, and did a recursive CTE over it to do graph search, then you'd get pretty dang bad perf.

MarkusWinand said a year ago:

> It comes down to how you plan to shard data and distribute queries when data doesn't fit on a single node.

A problem everybody would love to have but pretty much nobody actually has.

jupp0r said a year ago:

> A problem everybody would love to have

Except the people who do have it and need to keep their business running off of one postgres instance.

throwaway894345 said a year ago:

You can have data that fits on one machine and still run multiple instances of Postgres in a failover configuration, which will probably cover just about everyone (depending on your filesystem, disk for a single instance is essentially infinite, so I'm not actually sure what bottleneck would motivate you to scale beyond this configuration).

jupp0r said a year ago:

> I'm not actually sure what bottleneck would motivate you to scale beyond this configuration

It's usually not that the data doesn't fit on one machine but that load on the database exceeds what one machine can serve. A failover configuration might enable you to use the spares for some read operations and take a little load off the primary, but you lose ACID semantics when you do that, and it generally doesn't help you for long.

threeseed said a year ago:

There is a reason the world moved on from failover architectures.

a) At some point you will have more data or more users than one instance can handle. And instead of simply adding another node, you need to throttle usage in order to do a rolling upgrade. Which is far easier said than done, and impacts the business.

b) With distributed databases you are constantly testing that everything works in Dev, Test, etc. environments. With failover you really only test it every now and again, usually before you deploy to Production. And in most companies, which are hopeless at this, the testing is guaranteed to be inadequate.

c) Vendors lie. They promise that failover will just work but in my experience it very often doesn't. Which is another reason why b) is so important: to validate their claims.

throwaway894345 said a year ago:

I mean, I don't like that Postgres is not infinitely scalable, but the whole point is that (a) is not generally true--most companies could probably get by with a single machine's worth of data, or rather if they have more than one machine's worth of data, those systems probably aren't talking to each other such that they need to be on the same box. Regarding (b) and (c), do you not need to test failure conditions for distributed databases (this isn't rhetorical, I've only ever used cloud providers' offerings)?

pooper said a year ago:

> most companies could probably get by with a single machine's worth of data, or rather if they have more than one machine's worth of data those systems probably aren't talking to each other such that they need to be on the same box

one of the things I would like to see in my lifetime is somehow it should be easier to "run the whole enterprise" from one box. Sure, it will probably be seriously underpowered and I can't do all things at once but for most small to mid-size companies, it should be possible to run all our "code" from one machine.

I think of this as some kind of development or pre-qa environment. It really shouldn't be that big of an ask...

I am thinking most, if not all, companies will be able to fit their entire enterprise on a Supermicro A1+ server with two 96-core processors. Sure, there is no machine in the world that can fit all of YouTube's videos, but there is no reason why we can't have YouTube, with a limited set of non-production data, running from just one box. Thoughts?

throwaway894345 said a year ago:

> one of the things I would like to see in my lifetime is somehow it should be easier to "run the whole enterprise" from one box. Sure, it will probably be seriously underpowered and I can't do all things at once but for most small to mid-size companies, it should be possible to run all our "code" from one machine.

I think that's the mainframe idea. There's probably some interesting philosophical question in there about whether or not a data center is just a big mainframe. It sort of feels like it verges on semantics.

> Sure, there is no machine in the world that can fit all of YouTube videos but there is no reason why we can't have YouTube, with a limited set of non-production data, running from just one box. Thoughts?

I'm not sure what you mean exactly, but if you're streaming YouTube's traffic through one box (even if that box isn't directly connected to the disks on which the video is stored) you'll run into I/O bottlenecks--such a machine would need to push terabytes per second, which is probably not trivial. Moreover, having a single machine that can handle YouTube's peak traffic probably means you're underutilizing it most of the time.

pooper said a year ago:

Sorry I should have been clearer. I didn't mean to replace production. I meant like somewhere I can change maybe a few lines of code and run the entire application end to end.

Basically, somewhere I can run something and feel safe knowing this is production code except this one change I have made.

jupp0r said a year ago:

You have a misconception about the workload of your typical database server. It's not about the amount of data it's storing, it's about:

1. compute and memory bandwidth to serve complicated queries

2. IO

You can't scale memory bandwidth beyond some pretty low limit on one machine. You can't scale IO bandwidth beyond some limit. To give you an example, I've seen database servers with 20GB of data being so overloaded by compute requirements of complex queries that they needed to be scaled horizontally.

pooper said a year ago:

But like, this server has two 96-core processors. Moreover, we are talking about a development environment for a single developer and maybe one or two users trying out their changes. It should be good enough, no?

I mean I expect things to be slower I guess but to test for correctness and spec?

justinclift said a year ago:

Interestingly, when a place does get to the point where the single instance has capacity issues (after upgrading to EPYC and lots of flash drives) then other non-obvious stuff shows up too.

For example, at one place just over a year ago they were well into this territory. One of the weird problems for them was pgBadger's memory usage (https://github.com/darold/pgbadger). That's written in Perl, which doesn't seem to do garbage collection well. So even on a reporting node with a few hundred GB of RAM, it could take more than 24 hours to do a "monthly" reporting run to analyse PG usage for the time period.

There wasn't a solution in place at the time I left, so they're probably still having the issue... ;)

threeseed said a year ago:

I consult for a lot of companies and I've never heard of or seen a database that wasn't horizontally scaled.

It's not for scalability reasons, it's for high availability.

Which, as cloud adoption has increased and server uptime has decreased, is even more important.

Spooky23 said a year ago:

Some of these arguments and “common knowledge” things are getting old. Everybody scaled up twenty years ago - hell, Amazon used to brag that they used an HP Superdome or whatever.

Anyone with dogmatic opinions about this stuff needs to be taken with a grain of salt. If you scale out PeopleSoft, your accounting system will exceed the value of your company. If you’re worried about webscaling your random app, that’s more navel-gazing than accomplishing anything! :)

jahewson said a year ago:

Why shard when you can just replicate?

lmm said a year ago:

Because replica failover is rarely seamless (and often doesn't actually work at all, IME).

mikepurvis said a year ago:

Instinctively that's surprising... replica failover should be far simpler technically, shouldn't it?

lmm said a year ago:

No? Replication tends to be a bodged-on mess throughout, full of undertested edge cases, of which failover is definitely one. If you build the system so that nodes joining and leaving is a natural and normal part of operation, well, it naturally works a lot better.

WorldMaker said a year ago:

The PGQ part of SQL is designed in such a way that an RDBMS can certainly optimize its storage of graph tables (eventually). ("Graph tables" used by PGQ have DDL setup, per the Eisentraut overview: CREATE PROPERTY GRAPH graphname…) If Postgres embraces PGQ, it is easy to imagine it also adopting some graph-database-style storage to back its indexes and query planner.
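
For illustration, a hedged sketch of that DDL plus a GRAPH_TABLE query, loosely adapted from the Eisentraut overview (all names invented; shipping implementations may differ in the details):

    CREATE TABLE person (id INT PRIMARY KEY, name TEXT);
    CREATE TABLE knows  (a INT REFERENCES person, b INT REFERENCES person);

    CREATE PROPERTY GRAPH social
        VERTEX TABLES (person)
        EDGE TABLES (knows
            SOURCE KEY (a) REFERENCES person (id)
            DESTINATION KEY (b) REFERENCES person (id));

    -- everyone one hop away from Alice:
    SELECT friend
    FROM GRAPH_TABLE (social,
        MATCH (p IS person WHERE p.name = 'Alice') -[IS knows]-> (f IS person)
        COLUMNS (f.name AS friend));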

zozbot234 said a year ago:

> There's nothing in Postgres that knows how to do that

Postgres uses foreign data wrappers and table partitions to achieve sharding of relational databases over many nodes. It's certainly possible to make the FDW layer smarter wrt. being able to distribute certain queries to the shards, including some recursive queries that happen to be of practical interest.
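
A rough sketch of those building blocks (postgres_fdw plus declarative partitioning; hosts and names invented):

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.internal', dbname 'app');
    -- (a USER MAPPING is also needed for real use)

    CREATE TABLE events (id BIGINT, region TEXT, payload JSONB)
        PARTITION BY LIST (region);

    -- this partition physically lives on the remote shard:
    CREATE FOREIGN TABLE events_eu PARTITION OF events
        FOR VALUES IN ('eu') SERVER shard1;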

dathinab said a year ago:

if you are Facebook scale, yes. But then you will use a dedicated graph database anyway.

But a lot of graph databases are of a size where they can be fully replicated on each node, and some small internal optimizations for the graph are good enough. Given the design of the graph query language (and what you can query; hint: not everything, arbitrarily), there is no reason Postgres can't gain some additional optimizations to do that effectively, as long as it's not Facebook scale.

SQL now having syntax for querying graphs means graph databases can support SQL, at least a subset of it, in a standardized way.

quantified said a year ago:

A big problem (IMHO) with graph databases is building "the" graph model, and the fact that it's easy to be faced with problems that don't suit a graph database. Something as simple as returning the distinct set of values for an attribute and the count of vertices containing each value requires going outside the graph model, so it doesn't compose very well in a property graph system. (There are other graphs besides property graphs; they will have their time someday.)
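
(For contrast, in plain SQL that aggregation is a single GROUP BY; sketch with invented names:)

    SELECT color, COUNT(*) AS vertex_count
    FROM vertices
    GROUP BY color;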

What you really want is to apply graph processing to data as it is. The SQL 2023 additions are a step in the right direction. I need to find a good detailed description of the constraints and semantics to assess how good it is.

pphysch said a year ago:

"Kill" is a strong word, as Postgres's solid JSON support technically obsoleted MongoDB for most use cases, but Mongo is still around for various reasons.

I suspect if Postgres had a solid implementation of SQL/PGQ it would be a similar story for Neo4j.

czx4f4bd said a year ago:

I wonder if there's been any observable correlation between JSON support in the major SQL databases and the decreased (or increased?) adoption of NoSQL document databases like MongoDB. It would be interesting to do some bulk analysis on GitHub commits to compare their use over time.

hn_throwaway_99 said a year ago:

Just one bit of personal experience, but for me it was a significant reason. In most cases you want objects to have highly structured data (e.g. for joins and queries) and in other cases you just want "a bunch of semi-structured stuff". Sure, DBs always had blobs and text, but JSON is really what you want a lot of the time.

There's also a good article by Martin Fowler about how "NoSQL" was really "NoDBA" for a lot of folks, and I definitely saw that dynamic. JSON fields can also be a good middle ground here, where a DBA can ensure good "structural integrity" of your schema, but you don't need to go through the hassle of adding a new column and a schema update if you're just adding some "trivial" bit of data.

tracker1 said a year ago:

The canonical example for me is when you want to store/use additional payment processor details for a transaction, whether it's direct CC, PayPal, Amazon Payments, etc. Relationally you only really care that the amount of the transaction was sent/received/accepted. But you may want to store the additional details, without a series of specific tables per payment processor. If you need to see the extra details, that can still be done at runtime.

Another good example is generalized classified ads: different categories may have additional details, but you don't necessarily want to create a plethora of tables to store said additional details.
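
A hedged sketch of that shape in Postgres-flavored SQL (names invented):

    CREATE TABLE payment (
        id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        amount    NUMERIC(12,2) NOT NULL,  -- the part you care about relationally
        processor TEXT NOT NULL,           -- 'cc', 'paypal', 'amazon', ...
        details   JSONB                    -- raw processor payload; shape varies
    );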

throwaway894345 said a year ago:

Honestly, I pretty much always want structure. The reasons I've opted for NoSQL are almost always that cloud providers offer it for practically free while managed SQL databases are wayyyy more expensive. The nice thing about JSON is that it's a lot more ergonomic, but not because of the lack of typing--I would absolutely use a database that let me write reasonable type constraints for JSON columns. (I realize that you're talking about why most people use NoSQL and I'm remarking about why I use NoSQL).

Some other controversial thoughts: SQL itself is a really unergonomic query language, and the lack of any decent Rust-like enum typing is really unfortunate. I know lots of people think that databases aren't for typing, but (1) clearly SQL aspires toward that but gives up halfway and (2) that's a shame because they have a lot of potential in that capacity. Also, while you can sort of hack together something like sum types / Rust enums, it's a lot of work to do it reasonably well and even then there are gaps.

hn_throwaway_99 said a year ago:

Not sure I understand what you mean; rather, all of this appears to be available in Postgres.

pg_jsonschema is a Postgres extension that implements schema validation for JSON columns. I'm not particularly familiar with Rust, so not sure exactly what you mean by "Rust-like enum typing", but Postgres has enums, composite types, array types, and custom scalars, so not sure what's missing.
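
Something like this, if I recall the extension's interface correctly (table and schema invented):

    CREATE EXTENSION pg_jsonschema;

    CREATE TABLE event (
        payload JSONB,
        CHECK (jsonb_matches_schema(
            '{"type": "object", "required": ["kind"]}',
            payload
        ))
    );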

throwaway894345 said a year ago:

By "Rust-like enums", I mean "sum types" or "algebraic data types". In general, it's a way of saying that a piece of data can have one of several different types/shapes (whereas a Postgres enum is basically just a label backed by an int). But yeah, with jsonschema you can probably express sum types, but jsonschema is disappointing for a bunch of reasons and needing an extension is also not great.

munk-a said a year ago:

Every ecosystem I've ever worked in has had good tooling for managing DB migrations (and in some cases I've been the one to add it). It's trivial to write a migration to ALTER TABLE bar ADD COLUMN foo and I think keeping structure explicit is generally quite beneficial for data safety even if you're not doing fancy things. DBAs are great but most companies simply don't need one - you can just get by with some pretty rudimentary SQL and skill up as needed.

Assuming you've got good integration test coverage of the database, schema alterations end up taking a minuscule amount of time; and if you lack test coverage, then please reconsider and add more tests.

hn_throwaway_99 said a year ago:

Completely disagree. The issue is not really about how hard or easy it is to run migrations (every project I've worked on has also used migration files); it's that, depending on the data, it can just be a total waste of time.

Sibling comment, "is when you want to store/use additional payment processor details for a transaction", is a great example IMO. I've dealt with card processing systems where the card transaction data can be reams of JSON. Now, to be clear, there are a lot of subfields here that are important that I do pull out as columns, but a lot of them are just extra custom metadata specific to the card network. When I'm syncing data from another API, it's awesome that I can just dump the whole JSON blob in a single field, and then pull out the columns that I need. Even more importantly, by sticking the API object blob in a single field, unchanged, it guarantees that I have the full set of data from the API. If I only had individual columns, I'd be losing that audit trail of the API results, and if, for example, the processor added some fields later, I wouldn't be able to store them without updating my DB, too.

Before JSON columns were really standard, I saw lots of cases where people would pull down external APIs into something like Mongo, then sync that to a relational DB. Tons of overhead for a worse solution, when instead I can just keep the source JSON blob right next to my structured data in Postgres.

tracker1 said a year ago:

I think the point where you really need/want a DBA is when you either need redundancy/scale or have to remain up. Most developers aren't going to become that familiar with the details of maintenance and scale for any number of different database platforms. I think MS-SQL does better than most at enabling the developer+DBA role, but even then there's a lot of relatively specialized knowledge. More so with the likes of Oracle or DB2.

psidebot said a year ago:

Of course, if you're using Oracle or DB2 you have other/bigger problems.

threeseed said a year ago:

MongoDB remains the 5th most popular database: https://db-engines.com/en/ranking

And there are four major reasons still to choose MongoDB over something like PostgreSQL.

a) PostgreSQL has terrible support for horizontal scalability. Nothing is built-in, proven or supported.

b) MongoDB has superior ability to manipulate and query the JSON.

c) MongoDB is significantly faster for document-attribute updates.

d) MongoDB has better tooling for those of us that prefer to manage our schema in the application layer.

paulddraper said a year ago:

By the time you need to shard PostgreSQL (billions of records?), you have lots and lots of resources to overcome that difficulty, a la Notion.

lmm said a year ago:

If you want to be high-availability then you need sharding or something like it from day 1. There's still no first-class way of running PostgreSQL that doesn't give you at least a noticeable write outage from a single-machine failure.

hans_castorp said a year ago:

> If you want to be high-availability then you need sharding or something like it from day 1

Sharding has nothing to do with high-availability.

threeseed said a year ago:

You horizontally scale for high availability as well as scalability.

And primary-secondary failover in my experience is rarely without issues.

There is a reason almost every new database aims to be distributed from the beginning.

dikei said a year ago:

>> There is a reason almost every new database aims to be distributed from the beginning.

That's partly because you can't compete with the existing RDBMSs if you're single node: they are good enough already. Nobody will buy your database if you don't introduce something more novel than PostgreSQL, whether that novelty is worth it or not.

paulddraper said a year ago:

Primary-secondary is simple and robust. If I had a dollar for every time I saw split-brain clusters....

---

And to respond to the sibling comment about "noticeable" downtime....

Primary-secondary failover in <1m is very feasible. And each minute of downtime is a mere 0.002% for the month.

Primary-secondary isn't what is hurting your availability.

threeseed said a year ago:

The experience for at least some of us is that failover is not robust. At all. And that < 1m is a best-case scenario that still requires a person to be monitoring the process.

And the fact that the entire industry has moved to a distributed model despite its complexity gives you a hint as to which way the wind has been blowing for the last decade.

ibotty said a year ago:

You don't need to be that arrogant. The number-one reason why there are no new (No)SQL databases for one node is that the existing databases are great and you can't monetize them.

Failover is automatic for PG when using e.g. Patroni. Of course you lose active transactions and that might be a showstopper, but monitoring failover? I am curious when you'll have to do that.

throwaway2990 said a year ago:

a) not true

b) not true

c) not true

d) not true

e) a lot of people have no idea JSON support exists in PostgreSQL.

olavgg said a year ago:

Agreed, when you see the index size in Mongo vs PostgreSQL, you will quickly understand that a single PostgreSQL instance can outscale a huge Mongo cluster.

pjmlp said a year ago:

PostgreSQL isn't the only RDBMS to choose from.

sverhagen said a year ago:

You would then have to tell the decreased adoption of NoSQL due to JSON support in major SQL databases apart from the decreased adoption of NoSQL due to the hype being over...

pphysch said a year ago:

See also [1] for how this (might) relate to PostgreSQL

In particular it is nice to see that a core dev views JSON dot accessing and PGQ as "sensible" future additions to Postgres.

[1] - https://peter.eisentraut.org/blog/2023/04/18/postgresql-and-...

wslh said a year ago:

Basic question: is it correct to assume that having PGQ involves a big change in the database engine?

pphysch said a year ago:

AFAICT the idea is that you are not directly querying the tables as a graph, but you construct a graph "view" from the tables, and then query that graph using PGQ.

zubiaur said a year ago:

Correct. The one RDBMS I know of that has implemented this "graph view" concept is Oracle's. They did it first with PGQL extensions on top of, say, 19c. Interesting, but the execution seemed a bit unpolished.

Now with 23c, they are adopting PGQ and one should be able to interface through the regular JDBC connectors. We'll see how it shapes up.

zozbot234 said a year ago:

It's just a different language and a simple "property" layer over the existing data. No changes to the internals are necessary.

WorldMaker said a year ago:

No changes are required by PGQ but some engines may wind up using PGQ as a signal to bring in new types of indexes or disk storage formats to back it.

It may be interesting to see if real world usage of PGQ pushes the database engines to do interesting things.

thanatos519 said a year ago:

PGQ looks neat - create a "property graph" from a relational model, then query it via Cypher-like expressions. The best or the worst of both worlds, depending on implementation quality.

justinclift said a year ago:

As a thought, it might be better to use the https:// link to Peter's overview. :)

MarkusWinand said a year ago:

Fixed. I wonder why Google sent me to http...

mariuz said a year ago:

And here is the article on the status of SQL:2023 support in PostgreSQL https://peter.eisentraut.org/blog/2023/04/18/postgresql-and-...

Zpalmtree said a year ago:

I like the UNIQUE NULLS DISTINCT / NOT DISTINCT option; I was wanting this feature just a few weeks ago

MarkusWinand said a year ago:

That particular one is already available in PostgreSQL 15.

https://modern-sql.com/caniuse/unique-nulls-not-distinct
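
The syntax there:

    CREATE TABLE t (
        val TEXT,
        UNIQUE NULLS NOT DISTINCT (val)  -- NULLs compare equal: at most one NULL row
    );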

bionhoward said a year ago:

Which SQL DBs support these features now? Who is almost there? I’m definitely excited to try it!

ksec said a year ago:

I wonder when or even if MySQL will adopt any of these.

jchw said a year ago:

One thing that has always agitated me about SQL is that although it's standardized, and the standard seems to encompass a shit-ton, in practice there isn't any meaningful interoperability for practical uses among the world's most popular database engines.

For example, OK, I realize auto-incrementing IDs are not the most important thing in the world, and arguably not even a good approach in many cases. But sometimes you want them, and helpfully almost every database engine I know of has some kind of support for this, even if the semantics may differ. It's a super basic thing to want a unique ID that roughly counts upward on a table. You might have specific needs about re-using numbers and whatnot, but the general idea is very simple.

However: in practice, there is not an excellent way to do it that I can see. The closest thing I could find is `GENERATED BY DEFAULT AS IDENTITY` which, well, works. However, none of SQLite3, MSSQL, or MariaDB support this to my knowledge.

This is relentlessly annoying.
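
To make the divergence concrete, the per-engine spellings of the same logical feature, as I understand them:

    -- Standard SQL (PostgreSQL, Oracle, Db2, ...):
    CREATE TABLE t (id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY);

    -- MySQL / MariaDB:
    CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY);

    -- SQLite:
    CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT);

    -- MSSQL:
    CREATE TABLE t (id INT IDENTITY(1,1) PRIMARY KEY);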

Is it the standard's fault, or the implementations'? I honestly can't say. However, I definitely find this annoying, since I was really hoping that by this time, we'd at least have a nice clean subset of standard SQL you could count on anywhere, for popular database engines. Unfortunately, it's not quite there yet, necessitating ugly hacks to this day.

I assume this new standard doesn't really change anything in this regard, since it's the desync with implementations that is the problem, and the standards committee does not seem to care too much about this kind of thing. (I could be wrong, though, as I am saying this based on feel and not evidence.)

mdaniel said a year ago:

> Is it the standards fault, or the implementations?

My mental model is that it's a mixture, but my life experience has been that a "standard" without a test harness or (at bare minimum) a reference implementation is just a bunch of navel gazing. For SQL specifically, that problem is even worse given the number of existing engines that move faster than the specification, so in the absence of leadership they just make stuff up

Natural language is also a catastrophically horrible specification mechanism, since your black/blue is my white/gold

setr said a year ago:

The way I see it, the fundamental problem is that SQL is not a proper composable language, and the standard never defines such a thing. So every feature added to an RDBMS is added as an extension to the language, rather than as an update to a standard library like in any sane modern programming language. SQL as a language still operates with the mentality of COBOL -- if anything reusable is going to be provided, it will be provided by the RDBMS manufacturer. The user is only expected to produce highly business-specific logic for their own needs. (RDBMSs do often offer standard PL language support, but these typically hook into database internals, are intended for highly specific scenarios, cannot be transferred to any other RDBMS, and are generally meant for writing stored procs -- non-reusable functions.)

As a result, for databases to compete on features, they must arbitrarily extend the SQL language standard; these modifications to the language then get backfilled into the standard, run headfirst into backwards compatibility, and suddenly no one agrees on what precisely the feature should be, so it becomes an optional part of the standard, which really just means that it isn't standardized.

In any sane language, you wouldn't need different databases to add specific support for GENERATED BY DEFAULT AS IDENTITY, and especially not 15 different syntaxes used in 15 different databases to specify kinda sorta not really the same thing -- it would simply be a function, one you could write yourself, or one provided by the standard library. It wouldn't be up to the RDBMS to offer support beyond actual language features -- it'd just be up to you to update your libraries.

galaxyLogic said a year ago:

> SQL is not a proper composable language,

Very good point. Non-composable means non-scalable (as a language).

I wonder if it would be possible to define a very small, robust, standard subset of SQL with a standard for extending it. But that would take away the customer lock-in which all big DB vendors desire.

dmux said a year ago:

SQLite does support auto-incrementing: https://www.sqlite.org/autoinc.html. Is your gripe with the lack of "GENERATED BY DEFAULT AS IDENTITY" syntax specifically?

jchw said a year ago:

Yes, sorry. My gripe is that we don't have a single general way to say "I don't really care about the specifics, I just want a number that goes roughly upward" for all database engines. SQLite is interesting in that it supports at least two distinct sets of semantics for auto-incrementing rows, but neither of them are the standard syntax...
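
Concretely, the two SQLite flavors (per the autoinc docs linked above):

    CREATE TABLE a (id INTEGER PRIMARY KEY);                -- rowid alias; deleted ids may be reused
    CREATE TABLE b (id INTEGER PRIMARY KEY AUTOINCREMENT);  -- strictly increasing, never reused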

richardwhiuk said a year ago:

The gripe is that there's not a standard way of defining this in SQL.

SQLite SQL != MySQL SQL != ISO SQL

justinclift said a year ago:

Ugh, it's CHF 208.00 (about US$230.00).

---

As @rgbgraph points out below, the price is actually several times that. There are several parts to the standard, and that US$230 is per part.

MarkusWinand said a year ago:

Luckily pretty much nobody needs the standard documents. It's actually my aim at https://modern-sql.com/ to make the relevant information more accessible — in particular including support-matrices ("Can I Use").

chrismorgan said a year ago:

I’ve never had access to the SQL standard, but in things like HTML and CSS I know I reach for the specs extremely regularly, and that when working with SQL (mostly PostgreSQL or SQLite, including sometimes having to support both) I get frustrated at the general poor state of SQL documentation (and certainly a lot of this is because of engine diversity), and have often wished I had access to the SQL Standard (even though I know engine diversity significantly lowers its value for users of particular databases—my ideal would probably be a version of the spec augmented with engine support and links to each engine’s documentation of the matter). Certainly the table of contents for this spec sounds delightful.

Not all specs are particularly accessible (e.g. ECMAScript is often hard to follow if you haven’t spent a fair bit of time around it—it’s mostly natural-language code that works with a ECMAScript virtual machine), but most of the time, I would much rather have the actual spec over someone’s digested summary that covers what they thought was important, but regularly lacks details important for my situation. Some specs are absolutely magnificent. The HTML Standard is my very favourite as both a piece of spec work and as reference material.

Seriously, specs are really good stuff and it makes me sad how people often ignore them because they assume they’ll be inscrutable. (Similar remarks apply to legislative texts. They’re normally pretty easy to understand, and you find all kinds of sometimes fun and sometimes useful gems that potted summaries exclude.)

MarkusWinand said a year ago:

I've been studying the SQL standard for years now, and compared to other standards that I know (XSLT, a little CSS, decades ago POSIX, C and C++), the SQL standard is really hard to make sense of. You might overestimate the value of having access to it.

Having said that: free would be better.

momirlan said a year ago:

Agreeing here, Markus: for all practical purposes all one needs to know are the features supported by a particular product, not the possible features. Big fan of your site!

nurettin said a year ago:

I have not read any of 9075 (SQL), but 14882 (C++) is surprisingly comprehensive, and WG21 regularly publishes prerelease versions for free.

paulddraper said a year ago:

The good news (not really) is that the SQL standard isn't really followed.

So unless you are writing a database yourself, the DBMS documentation is going to be more relevant.

patrec said a year ago:

Just in case anyone who isn't familiar with Markus's work gets a slightly w3schools vibe when clicking the above link: his book SQL Performance Explained [1] is probably the single most useful book a backend developer can read.

[1] https://sql-performance-explained.com/

rekabis said a year ago:

As a web developer: your site is clean and crisp, with an almost brutal simplicity which makes it very attractive. Honestly, I like it a lot. It could use one or two very subtle tweaks in the super-fine details, but for a technical information-dissemination site, it’s bloody awesome.

My only issue is with the width. While whitespace between the sides and the centre content is very useful, this isn’t the 1990s anymore with its 1024×768 monitors. You _can_ make the centre column responsive to the overall width of the screen. Doing so can also give you a lot more room to do things, and make it easier to read. Your line-height is already great, and is perfect for text blocks a good half again wider if not twice as wide. Even on my vertical monitors, which are only 1500px wide, that centre column is pretty much claustrophobically narrow.

MarkusWinand said a year ago:

The rationale behind the width is more the ease of reading than anything else: https://en.wikipedia.org/wiki/Line_length

But thanks for the feedback. There is a rework pending anyway.

munk-a said a year ago:

One piece of good news is that you just need to drop #page-wrapper's width (or adjust it to preference); the rest of the CSS all appears to be sensibly responsive.

fhoffa said a year ago:

I love this, thanks!

How could we add Snowflake to the compatibility matrices?

It's popular enough to deserve it, IMHO.

https://db-engines.com/en/ranking

LispSporks22 said a year ago:

Cool nice site man. I bought the book a while back and really enjoy the articles.

sosodev said a year ago:

I've certainly needed them before :(

rgbgraph said a year ago:

For one part. IIRC, there are 11 "new" parts this year.

Which is even more hilarious, considering the ISO is already being funded by the tax dollars of member countries.

A bit like having to pay a journal to get access to research papers: ridiculous.

justinclift said a year ago:

Oh, wow. Yeah, that's even worse.

Trying to throw together a list of them all, I'm only finding 5 though:

https://www.iso.org/standard/76583.html

https://www.iso.org/standard/76584.html

https://www.iso.org/standard/76585.html

https://www.iso.org/standard/76586.html

https://www.iso.org/standard/76587.html

Any idea about the others?

rgbgraph said a year ago:
justinclift said a year ago:

Thanks. The pricing for this stuff seems to be truly crazy.

Wonder if Elsevier is involved somewhere? It's how they do things too.

nologic01 said a year ago:

they say the Standard contributes to SDG 9...

"Inclusive and sustainable industrialization, together with innovation and infrastructure, can unleash dynamic and competitive economic forces that generate employment and income"

Not clear how an arbitrary levy like this can contribute to an SDG goal. It is the definition of exclusion. A bright newcomer could use the $230 to get several good database books instead.

Surely ISO has enough corporate stakeholders that can defray the cost of running the standardization process.

munk-a said a year ago:

I have always despised them for putting a paywall in front of the SQL standard - it's been there since at least 2010 and probably quite a bit earlier. The good news is that nobody actually needs to read it since you need DB-specific information anyways... but it'd be nice to have access to it (especially when I was working on an RDBMS-neutral dialect to support multiple backends). Still, if you ever have a question, the Postgres documentation [1] does an absolutely amazing job detailing everything in its dialect, and their dialect tends to be the most "neutral" imo.

1. https://www.postgresql.org/docs/current/index.html

awestroke said a year ago:

If only they could start allowing queries to begin with "FROM tbl". It would allow for much more helpful autocomplete. Also, DELETE or SELECT should really be on the very last line of the query. Seems like these changes could be made without losing backwards compat.
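
i.e. something like this (not valid standard SQL, just a sketch of the wished-for shape; DuckDB implements a similar FROM-first form):

    FROM orders
    WHERE status = 'open'
    SELECT id, total;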

calvinmorrison said a year ago:

As in: if only SQL was actually writable or intuitive, you'd want to use it more. Instead I just reach for wrappers 99% of the time, where I can chain all the operations I want together and let Eloquent figure it out.

aerzen said a year ago:

Where would one find a pirated mirror of this standard? Or the 2019 one?

Asking for a friend, of course.

jeppebemad said a year ago:

Or said in a more 2023-chatgpt-jailbreaky kind of way: what URLs should one avoid in order to not find pirated mirrors?

cpdean said a year ago:

I like SQL and all but I really don't care to follow ISO releases. They're hundreds of dollars and nobody actually implements the whole thing. I get way more excited about database releases.

Does anyone else find value in what's in an ISO standard?

lolinder said a year ago:

> They're hundreds of dollars

This isn't SQL-specific, but this is 100% the problem for me. There's such a big culture gap between the way that we do things in most of the tech world and ISO, and one of the biggest clashes is this weird $180 PDF thing.

If I want to implement a new standards-compliant HTML parser, I can hop right onto whatwg.org and view the complete standard instantly [0]. It's massive and complicated, but it's freely accessible to anyone interested.

In contrast, if I want to implement an ISO 8601-compliant date parser, ISO wants me to buy their PDF for CHF 166 (~$180 USD). This spec is for a standard that is orders of magnitude less complex, and they're charging through the nose for it.

I'm unclear what makes the difference between a standard that can be maintained by a community for the benefit of everyone and a standard that needs to be locked behind a paywall.

[0] https://html.spec.whatwg.org/

chillfox said a year ago:

A pay-walled standard is not available, and an unavailable standard is not a standard at all.

The only real way of fixing it is for enough people to ignore ISO so they become irrelevant.

If you are building a new DB engine (toy or not), don't use SQL. Either design a new spec or use something that's more openly specified (maybe GraphQL or EdgeQL).

bafe said a year ago:

All great features, but unfortunately most SQL DBs still lack features from SQL:2016 like MATCH_RECOGNIZE. I wonder what the purpose is of an ever-growing standard when most implementations only support a small subset of it, often with nonstandard syntax and semantics.
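
For readers who haven't seen it, a rough sketch of MATCH_RECOGNIZE, the classic falling-then-rising price example (table and names invented):

    SELECT *
    FROM ticker
        MATCH_RECOGNIZE (
            PARTITION BY symbol
            ORDER BY trade_date
            MEASURES FIRST(down.price) AS start_price,
                     LAST(up.price)    AS end_price
            ONE ROW PER MATCH
            PATTERN (down+ up+)        -- a falling run followed by a rising run
            DEFINE down AS price < PREV(price),
                   up   AS price > PREV(price)
        );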

hashhar said a year ago:
bafe said a year ago:

Great, thanks! If I understand it right, instead of being a new database engine, Trino compiles the statement into the query languages of the different backends and runs those queries in a distributed way?

hashhar said a year ago:

Exactly. It provides an API with which you can build connectors to whatever systems you want. E.g. here's a connector for the GitHub API https://github.com/nineinchnick/trino-rest/tree/master/trino....

It doesn't have to be SQL based systems on the other end - the most used connector with Trino is to query files on object storage (S3/GCS/Azure Blob).

Disclaimer: I'm one of the maintainers of the project.

bafe said a year ago:

That's great! How about performance? Say I want to use MATCH_RECOGNIZE on a PostgreSQL backend: is the compiled query going to be significantly less performant compared to the same query on a database that supports that statement natively (e.g. Oracle)?

hashhar said a year ago:

For complex things like MATCH_RECOGNIZE (and CASTs), whose syntax and semantics differ across the underlying systems, the result unfortunately is that some data is going to be pulled and processed in Trino, so it'll be slower than native. If you are only dealing with a single data source (and it is an RDBMS rather than, say, files on S3 or an API), I'd say Trino is not needed and would slow you down.

The rule of thumb is that Trino aims to provide a uniform layer over whatever sits underneath. So operations which, when "pushed down" to the source, give the same results as when executed within Trino do get "pushed down", i.e. executed on the source. But in cases where the results might differ, or it's complex to push down the operation, the operation runs within Trino, i.e. Trino pulls the minimum needed data from the source and then performs the operation itself.

Note that it's not an all-or-nothing case, e.g. a query like:

    SELECT n.name, r.name
    FROM postgresql.tpch.nation n
    LEFT JOIN postgresql.tpch.region r ON r.regionkey = n.regionkey
    WHERE n.name > 'A'
        AND r.name = 'ASIA'
will result in the following query to Postgres:

    SELECT n.name, r.name FROM tpch.nation n LEFT JOIN tpch.region r ON r.regionkey = n.regionkey WHERE r.name = 'ASIA'
The rest of the query (n.name > 'A') would be applied in Trino to the results fetched from Postgres, because the collation in Postgres would affect the results if we pushed the complete query down, and they may not match the results if the entire query were processed by Trino. With a single data source this is not easily appreciated, but e.g. if the second table in the query came from SQL Server, you'd want consistent comparison logic regardless of the source of each table.

This is a very simplified example though.

bafe said a year ago:

Thanks for the clarification! I'll read more about it, I'm quite curious to see how these clauses are implemented in Trino

hashhar said a year ago:

This is the initial PR which implemented support for this - https://github.com/trinodb/trino/pull/6550

minroot said a year ago:

What's the point of a standard if it takes money to read?

blacklion said a year ago:

What do you think about all the MPEG and WiFi standards? You need to pay to read, and you need to pay to implement, even if it is a clean-room implementation.

Ridiculous.

gigatexal said a year ago:

It really is utter bullshit that we have to buy these standards. What are the business models of these standards bodies anyway?

skissane said a year ago:

Other (computing industry) standards bodies make their standards freely available–e.g. ECMA, W3C, OMG, The Open Group, IETF, etc. Given they can't make money from selling the standards documents, they need to find other funding sources. Many of them rely on corporate patrons who fund the standards process, and in exchange often get greater input into moulding the standard to meet their interests. The Open Group makes some of its money by selling trademark licenses and proprietary test suites.

ISO defends its own model by arguing that it produces greater independence from vendors, being less reliant on them for funding and thus making it easier for them to say "no" to them. One big difference is ISO is not an industry-specific standards body, it has standards for all kinds of things that have nothing to do with computers – screw threads, metallurgy, analytical chemistry, oil and gas pipelines, you name it. Possibly their defence makes more sense for some of those other industries than it does for ours.

If the major SQL players got together (major proprietary vendors and leading open source projects), they could create their own SQL standards process to supersede the ISO one, and release the standard freely. The ISO standard could still exist, but it could turn into one of these ISO standards where ISO just adopts the text of an existing standards organisation - e.g. the ECMAScript standard is developed by ECMA. Initially ISO republished ECMA's standards under their own number (ISO/IEC 16262), now instead they publish a 3 page standard which just incorporates ECMA's by reference (ISO/IEC 22275). They offer it for free download [0], but are also willing to sell you a copy if you are desperate to give them money. ISO's SQL standard could turn into the same thing – but, that would require the SQL community to decide to push for that, and I'm not sure any of the players feel sufficiently motivated to do it.

[0] https://standards.iso.org/ittf/PubliclyAvailableStandards/in...

galaxyLogic said a year ago:

As noted by a post above, SQL is not a composable language. That means new features cannot be implemented as (standard, user-extensible) libraries. And that makes the standard more bloated with every new feature. The DB vendors benefit from the situation: when the standard is bloated, not everybody is likely to be fully compliant with it, which causes customer lock-in.

Somewhat ironically the standards body also benefits from such bloat, because the standard is the product they are selling. The bigger the standard, the more money they can ask for it.

massysett said a year ago:

I’m astounded at the outrage at having to buy them. Producing standards takes time and labor. Editing them takes time and labor. So does posting them and maintaining them. So it’s no surprise that one way to pay for that time and labor is to charge money. I doubt anyone here complaining about buying standards works for free. That other standards-producers post their standards for free does not mean that charging for them is automatically an outrageous business model.

Finally, for the folks who need professional access to this document, such as RDBMS implementors or professional developers using an RDBMS, the price is a pittance. To anyone for whom the price is a problem - perhaps someone writing a free software application - the lack of access to the standard isn’t a problem, because what they really need is documentation on how their implementations work, and two superb implementations - PostgreSQL and SQLite - have voluminous documentation and they’re 100% free of charge.

scrollaway said a year ago:

I don’t think anyone is outraged at the idea that the standard authors should be remunerated.

The problem is that the standards have to be bought. They should be in the public domain. But someone can still be paying for it (e.g. a government, or an international organization funded by various public bodies).

gigatexal said a year ago:

This 100% sums up my point. If it's a standard then it should be freely available. Standards bodies or governments or whoever can pay for the work that goes into it, but how is anyone supposed to study and learn from a standard, and maybe even avoid creating a new one when one already exists, if everything is behind a paywall?

edit: was referring to scrollaway's comment

nologic01 said a year ago:

There is a serious cottage industry around standards (not just ISO). Selling the standards themselves is part of an overall architecture for selling "certification" of compliance with the standards, ancillary consulting and training services and, in some instances, standard-compliant solutions.

While obviously standards don't have a virgin birth (some costs are involved), it does smack of rent extraction: by definition there is only one standard, so you can't count on competitive pressures to lower the price.

jacobsenscott said a year ago:

Selling the standards documents.

tofflos said a year ago:
nologic01 said a year ago:

Is Oracle's PGQL (e.g. 2.0) more or less the same as SQL/PGQ?

It might be interesting to have a comparison of where major databases stand (or plan to be) with respect to SQL/PGQ

la_fayette said a year ago:

PGQ's MATCH syntax seems interesting and reminds me of writing SPARQL. I wonder if any RDBMS will support this?

gatvol said a year ago:

Not a standard if access requires payment.

blacklion said a year ago:

What do you think about all the MPEG and WiFi standards? You need to pay to read, and you need to pay to implement, even if it is a clean-room implementation.

Ridiculous.

galaxyLogic said a year ago:

The irony of a for-profit (= for-pay) standard is this: if someone provides a product for a price, then others should be able to produce a similar but better product for a similar or better price. In the case of SQL that would mean there should be alternative standards provided by different vendors. But then if there are multiple standards, it can hardly be called a "standard".

I think taxpayer money should pay for standards, because they benefit us all. It is like the highway system, or clean air and water.

xucheng said a year ago:

A related question: what is the state of support for the SQL standard among the popular RDBMSs? It seems that almost all database engines use their own custom syntax.

jsmith45 said a year ago:

I can say that none of Oracle, Sybase, or Microsoft SQL Server really aim at conforming to the standard. While they will often try to use standard syntax for new features if such syntax exists, there is a ton of old non-conforming syntax that there seems to be no real effort to address, even by adding new options, etc. Some of this means really common features deviate significantly from what the standard requires.

PostgreSQL does mostly aim at conforming to the standard. They will invent new syntax when needed, but compared to those previously mentioned, Postgres seems to prefer to stick closer to the standard whenever possible, including adding standard syntax for existing features whenever possible.

PostgreSQL does have some places where there is deliberate non-conformance (beyond just incompletely implemented features). They document many deliberate deviations and whether they think they can be fixed in the future: https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard . Looking at the list, I'd say the only ones especially likely to bite a developer are the default escape character for LIKE clauses and the non-standard trailing-space behavior for character(n) datatypes (but who uses fixed-length character datatypes instead of varchar or text?). And obviously not-yet-implemented features could bite people, but many such features are optional to implement anyway, so...
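
For instance, the LIKE deviation in practice (PostgreSQL treats backslash as a default escape character; the standard defines no default escape):

    SELECT 'a_b' LIKE 'a\_b';             -- true in PostgreSQL without any ESCAPE clause
    SELECT 'a_b' LIKE 'a!_b' ESCAPE '!';  -- the portable, standard-conforming spelling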

I cannot speak about MySQL or MariaDB, due to insufficient familiarity.

MarkusWinand said a year ago:

This is one of the questions I try to answer at https://modern-sql.com/

bafe said a year ago:

Your website is great and I regularly check it to see what's new in various implementations. Unfortunately it seems that many databases don't support many modern SQL features yet. Any ideas as to why?

mdaniel said a year ago:

> Unfortunately it seems that many databases don't support many modern SQL features yet. Any ideas as to why?

I'd guess the incentive structure is the opposite of what you're implying; the same reason why every cordless drill manufacturer has their own battery connector: vendor lock in fuels private planes and shareholder reports, versus being compatible means no one is forced to buy your batteries and thus profits are `$total - $forced_purchases`

This situation gets even worse in the case of a standard without any objective way of knowing one is in compliance. Having a badge on the mysql.com website saying "now featuring SQL:2023 compliance!11" sells how many more support contracts exactly?

bafe said a year ago:

That's a good point. Additionally, the standard isn't freely available, and I doubt most of the developers of existing SQL DBs participate in drafting new standards. It seems doomed to diverge even further, which raises the question of whether the SQL standard is relevant at all anymore.

qalmakka said a year ago:

As much as I try to understand it, I honestly don't see the point of this standard. It sounds like some weird fanfiction made by delusional people who think SQL is actually a single language and not a hodgepodge of incompatible dialects.

It makes no sense to have a standard SQL when nonsensical implementations like MSSQL or MySQL exist.

sdflhasjd said a year ago:

Ah, a sequel for SQL

Alifatisk said a year ago:

Good one