This is really a question of economics. The biggest organizations with the most ability to hire engineers have need for technologies that can solve their existing problems in incremental ways, and thus we end up with horrible technologies like Hadoop and Iceberg. They end up hiring talented engineers to work on niche problems, and a lot of the technical discourse ends up revolving around technologies that don't apply to the majority of organizations, but still cause FOMO amongst them. I, for one, am extremely happy to see technologies like DuckDB come along to serve the long tail.
I just watched the author of this feature and blog post give a talk at the DataCouncil conference in Oakland, and it is obvious what a huge amount of craft, ingenuity, and care went into building it. Congratulations to Hamilton and the MotherDuck team for an awesome launch!
Not yet, but I believe the DataCouncil staff recorded it and will post it to their YouTube channel sometime in the next few weeks: https://www.youtube.com/@DataCouncil/videos
I agree 100% that this needs to be more of a thing. For data engineers building data pipelines, queries are like functions, and table schemas are like types. There needs to be a way to write a query that runs on an abstract interface, rather than an actual table. To do this, most folks rely on string templating in Python or Jinja, which makes the development process really cumbersome. As a result, most teams end up in scenarios where data pipelines are always a big mess of spaghetti SQL, or they are stuck maintaining complex frameworks that abstract away common logic, but are inscrutable to the average user.
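To make the "queries are functions, schemas are types" point concrete, here is a minimal sketch of the string-templating approach the comment describes. All table and column names are hypothetical, and it uses Python's `str.format` with stdlib `sqlite3` rather than Jinja and a real warehouse, but the shape of the workaround is the same:

```python
import sqlite3

# A "query as function": the table and column names are template
# parameters, filled in before execution. Names here are hypothetical.
DAILY_TOTALS = """
SELECT {date_col} AS day, SUM({amount_col}) AS total
FROM {table}
GROUP BY {date_col}
ORDER BY {date_col}
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.0)])

# Bind the "abstract" query to a concrete table and schema.
sql = DAILY_TOTALS.format(table="orders", date_col="order_date",
                          amount_col="amount")
rows = con.execute(sql).fetchall()
print(rows)  # [('2024-01-01', 15.0), ('2024-01-02', 7.0)]
```

The cumbersomeness shows up immediately: nothing checks that the substituted columns exist or have sensible types until the query actually runs, which is exactly the gap a real "abstract interface" for tables would close.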
I think your blog post frames the problem very well!
Seeing that people working on both PRQL and Malloy replied, and that this is an understood pain for both of you, makes me feel a lot better about the future of these tools! When talking about this with people who are not that deep into the problem, it is often hard to convey the difference between this kind of composability and the composability that today's tools offer, and the implications that come with it.
At a past startup I had the good fortune to work on a system similar to what I am looking for: packageable, reusable relational algebra inspired by Substrait. It had the downside that its implementation was quite tied to RDF and SPARQL, and now I'm chasing something similar in the SQL world :D
I have used CTEs with dynamic query stitching to solve this problem (specifically, my business operates over two very similar but distinct domains which we keep in separate buckets). If you build the majority of your logic into a CTE that processes named columns coming out of a prior chunk, you can swap out which actual DB columns are mapped into those named columns by changing that earlier CTE's definition. It may be possible to make this more magical using pl/pgsql, but I've found that dynamic query stitching at the CTE level is a degree of fiddliness I'm comfortable building into resilient products.
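A minimal sketch of the CTE-stitching pattern described above, using stdlib `sqlite3` and hypothetical table/column names (the comment's actual schema isn't given): the shared logic only sees normalized column names, and each domain supplies its own first CTE mapping real columns onto them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE retail_sales (sold_on TEXT, gross REAL)")
con.execute("CREATE TABLE wholesale_sales (shipped_on TEXT, invoice_total REAL)")
con.execute("INSERT INTO retail_sales VALUES ('2024-01-01', 10.0)")
con.execute("INSERT INTO wholesale_sales VALUES ('2024-01-01', 100.0)")

# Shared logic operates only on the normalized names (day, amount);
# the domain-specific CTE is the part that gets swapped in.
SHARED_LOGIC = "SELECT day, SUM(amount) AS total FROM normalized GROUP BY day"

def totals(domain_select: str):
    # Stitch the domain-specific mapping CTE in front of the shared logic.
    sql = f"WITH normalized AS ({domain_select}) {SHARED_LOGIC}"
    return con.execute(sql).fetchall()

retail = totals("SELECT sold_on AS day, gross AS amount FROM retail_sales")
wholesale = totals("SELECT shipped_on AS day, invoice_total AS amount "
                   "FROM wholesale_sales")
print(retail)     # [('2024-01-01', 10.0)]
print(wholesale)  # [('2024-01-01', 100.0)]
```

The fiddliness the comment mentions lives in `totals`: the stitching is plain string assembly, so nothing verifies that a domain's CTE actually produces `day` and `amount` until the query runs.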
I work with complex data models and keeping all that structure in my brain takes enough effort that I want to keep my queries as simple as possible because when it's time to debug one there's no way I'm carrying over _any_ memory from when I originally wrote it.
> There needs to be a way to write a query that runs on an abstract interface, rather than an actual table.
Proper use of SQL inverts control. Instead of parameterizing the query by table, you write a query, and at the actual use site you join it onto the table you need by the fields your query provides. VIEWs allow you to not repeat yourself too often.
Best thing is that you do not need to even mention that "abstract interface table" as a parameter at all.
> VIEWs allow you to not repeat yourself too often.
No, they don't. They only solve the problem of "many different predicates for a few tables"; they don't solve "a few similar predicates for many different tables", since a view, by its very declaration, is already tied to a single table.
The argument is more than that -- namely that in addition to having sophisticated AI, Google also controls the OS (Android) and hardware (Pixel). Being able to integrate best-in-class AI at every level of the stack is a tremendous advantage. OpenAI can't do this because they don't control the OS, and will always need to go through an app. Apple can play since they control OS and hardware, but at the moment they appear pretty far behind in the AI aspect.
I agree. Apple needs to wait and see which team offers a good alternative to OpenAI and buy them; I hope they do not buy and shut down an open-source-friendly team.
I know that I'm part of a minority, but I'll avoid touching an "AI" enabled smartphone as long as possible. All privacy issues and tech issues aside, so far I have simply not found a single use case in which AI-tools would make my life easier.
As for privacy - Apple does most, if not all, AI processing on-device. E.g. Siri understands voice commands when offline (although it may work worse than I think).
A good on-device assistant would be super useful:
- instead of searching for a setting, you just tell it to change a device setting
- decent appointment/reminder setting ("remind me to invite joe to my next birthday party" -> and it knows who Joe is, and how to set up such a reminder)
- all kinds of search ("what was that book about startups that someone recommended to me a year ago?", "open up a tracking page for that thing I ordered last week", "show me a photo of XX I took last summer")
- managing e-mails the way old-school secretaries did - "any important messages?", or "reply to everyone I'm out of office, unless it's related to X"
As I said, while this is useful for others, I have zero need for any of this. Settings? I know where to find those. Setting alerts? Use the standard calendar and timer apps. Internet search? Type it.
I make a point of not sharing my search history with anyone, or any other data, as much as possible. And I won't start doing so to enable some AI gadget I am more than happy to live without.
"Pixie, book that car service for me and remind them they promised a free brake fluid replacement"
"Pixie order all the ingredients I need to make my wife her favourite dish"
"Pixel, buy everyone in my family birthday presents"
I will 100% pay for that capability, and as someone that uses GPT4 to basically do my job for me, it really does not seem that far off. And I do think Google has an advantage here over Apple or anyone else, not just in Hardware, but in the enormous amount of data and information they have about both people in general, and specifically me.
I won't let any AI read my e-mails, let alone reply to them. I discuss car services when I drop off the car at the workshop, and the workshop confirms the details. I don't want an AI to know anyone's favorite dish, and delivery services suck for fresh ingredients anyway. And I think about presents before buying them; that is a crucial part of gifting things to people, especially if you care about them.
By the way, if I used ChatGPT for my job, let alone let it do my job for me, I'd be fired; worst case, I'd go to jail. And even if not, what would prevent my employer from firing me anyway, if a basically free-to-use web service can do the work at the same level I can?
I think you'll find yourself in the minority soon.
My grandma said, "I won't send any email through a computer, I prefer to write it by hand and have it delivered by a human being".
Although maybe it will be even weirder with AI assistants sending birthday greetings to other people's AI assistants and then your AI assistant summarizing who sent you birthday greetings.
The fact is, the next generation moves on and uses new tools. I don't think AI will replace our jobs, but people who are good at their jobs and use AI can outcompete people who are just as good at their jobs but don't use AI.
The parent you were replying to may currently be able to automate what they are doing with the help of AI but I don't think that will last long as jobs end up requiring a mix of human and AI capabilities.
Replying to email is one thing - a PITA to implement securely.
But reading? On-premise? You already allow basic algorithms to read your mail, for spam control and whenever you search through your messages. Having a better algorithm (as long as it's still on-device) makes no difference in terms of safety.
AI would be much better at code _review_ than code generation.
AI would be much better at auto-completing whole sentences and paragraphs and suggesting rephrasing etc in docs and presentations, than at answering questions about the world.
But the race seems to be to answer questions about the world based on a 2021 dump of the internet, so ... :)
Oh, as a tool I see a ton of use: personally for image processing (sharpening, noise reduction), even if I can perfectly live without it; professionally I see a ton of potential in optimizing planning (supply chain, production, scheduling, network planning...) by proposing scenarios and supporting the people doing said planning. I don't need AI to search the internet for me, summarize a book (if it is worthwhile or important enough, I just read it), or do all the other stuff AI is currently doing.
Just to be that picky guy;
> Android phones might even momentarily be more desirable than iphone
Most of the world is already there, and it's not momentary. iPhones only dominate the market in North America - Android has the majority basically everywhere else
As for docs, I can see them pushing for more Gemini integration depending on how M$ Copilot goes when it reaches saturation
> Just to be that picky guy; > Android phones might even momentarily be more desirable than iphone
> Most of the world is already there, and it's not momentary. iPhones only dominate the market in North America - Android has the majority basically everywhere else
Market share is not necessarily proportional to desirability. Cars make a useful analogy here. I think a Ferrari is more desirable than a Volkswagen, but Volkswagen’s market share is much higher.
Android is there because it’s what the cheapest devices use and most of the world does not want to pay even iPhone SE pricing. That’s relevant because the play described of making the Pixel lineup more compelling will hit the same problem if it’s tied to more capable hardware which is outside of the budget for many people.
OpenAI has close ties to Microsoft and we've already seen them integrate AI into Bing; MS may not be a player in phones but they make a major OS and some of the best hardware money can buy. I'm really not convinced vertical integration matters all that much, but if it does, they can do it.
>Apple can play since they control OS and hardware, but at the moment they appear pretty far behind in the AI aspect.
I disagree strongly with this, and it has everything to do with how we conceptualize AI.
I get in my car and plug in my iPhone. CarPlay immediately causes the Maps app to pop up and route me to my next meeting. I can say "hey siri, set a reminder to call Mr. Jones at 3:00" and she will gladly comply. If my buddy texts me while I'm driving to the meeting and asks if I'm free for golf tomorrow, she will automatically try to pin that on my calendar. I can throw out lots of examples here, but you get the idea.
Now granted, voice recognition in Siri has been pretty bad. She struggles with a lot of basic things, like putting on the music I request. But, there's no question in my mind that these augmented reality moments are where AI is actually making a difference in our lives and represent the actual business opportunity bridgeheads. In Apple's case, they not only already control the hardware (the phone, the watch, the earbuds, the tablet) but they've also figured out how to start bridging this into other hardware like a vehicle.
The impressive ML is ML that’s part of daily life. Even as I tap on this iPhone’s keyboard, button regions (tappable, not visible) slightly change in size depending on what it predicts the next letter would be. No one calls this “AI” yet it’s the same tech, and arguably more beneficial for the society as a whole than “AI” as a dedicated commercial service purpose-built to launder copyrighted creative works for profit (which is what AI is in the eyes of an ordinary person these days).
And yet the iPhone keyboard has gotten worse than it was a few years ago. I used an iPhone 8 years ago, and when I got an SE last year the keyboard was too smart, and so I disabled all the AI features.
Conversely, I've recently had Apple software mess up in a whole bunch of different ways:
• Maps doesn't understand that I don't own a car, defaults to driving sometimes
• Autocorrupt rather than autocorrect
• Calendar suggestions only work for the simplest of dates, so it suggested an event for the wrong month
• One case where it seemed to think the only timezone in the world was California
(It's not all negatives: for me, Apple has the least wrong voice transcription AI, and I do like their computational photography and definitely the ability to select text in images and Safari's website translation — but even then I don't think they're way ahead of the rest with these things, and website translation was definitely behind).
I suggest you try the assistant features on a Pixel 8 Pro. They have all the features you mentioned (except the creepy eavesdropping golf one) and the interaction is miles ahead, especially the text to speech.
I do not have any Android products; I only use Google Assistant through my Sonos speakers. Do you know if that code is different from what runs on the Pixel? (Because my experience of Google Assistant is far from good/useful: it struggles with basic tasks, I have to overpronounce to be sure it can differentiate "lights on" from "lights off", etc. etc.)
I don't know, because I haven't used Sonos for this purpose. But it's easy to imagine that a mobile handset with an array of microphones that you tend to hold near yourself would be more suited at the hardware level.
He does a wonderful job of taking very dense mathematical notation and explaining it in ways that anyone can understand. He derives the basic concepts of the lambda calculus from the ground up using Python. Super fun to follow along with.
That looks interesting, although it is 3 hours long. I like this ~45-minute intro to the concept, which seems a little more accessible, in that it shows how you can implement addition and multiplication in the system, which is a lot:
My understanding of Cube is that iterating on the data model requires the user to (1) write SQL to develop a metric (2) edit YAML or JS config to incorporate the new metric (3) issue API request to Cube server and (4) compare results to raw SQL. Am I mistaken? Does Cube offer a smoother way to do this exploration/iteration?
Hey, Carlin! Nice to see you here in comments! (Waving "hi" to the Malloy team.)
Usually, the experience looks like this: one develops the data model directly in YAML (with only bits of SQL, if needed) and instantly explores metrics. No need to start with SQL in a separate tool/place (1), no need to use the API to check metrics (3) (for that, we have Playground, an interactive UI tool), and thus no need to compare results to raw SQL (4). You iterate by changing the data model and seeing the metrics in an instant, quite similar to how you work with Malloy, if I may.
Glad you asked! It is certainly usable for writing apps. We publish an npm package, and the Malloy VSCode extension [0] is one such example of an app built on top of it. There's also a demo of a toy CLI app that showcases the simplest possible use of the SDK: https://github.com/malloydata/malloy-demo-bq-cli
Fair critique on the presentation. That's mostly a function of the circles I run in, which is heavily weighted towards data analysts, data engineers and "analytics engineers". But to your point, I think that cohort is unlikely to be early adopters of a tool like Malloy, and that developers are a much easier sell.
The CLI demo reads very nicely! Most impressive, looks very straightforward. I'd have some questions, or some playing around to do, to understand how best to use this from a web server where we'd be making multiple concurrent queries... Should runtimes be pooled, or are they safe to use across multiple in-flight requests? Details. Overall very impressed with the code. Definitely a huge boost of faith for me & greatly accelerates my interest in trying to bring this in as a dev.
We can't be everything to everyone, not a blocker, but I do want to just raise the topic & see what comes out, Malloy seems purely to be a querying tool. So we'd probably have parallel Knex.js code to write data, then Malloy to read it? I should check your road map (yay you have a road map!) to see if there's anything far off here, but it was one of the major other thoughts I had about trying to get onto Malloy.
Hats off on the GitHub vscode notebook. Really, really nice malleable, tactile way to show the project off. It seems like there is a DuckDB (itself very promising new tech) instance that it relies on, which I think the Malloy vscode extension somehow provides & loads the data into automatically... Is this running in my browser, or on GitHub? How? What's the magic that makes this happen? It was really cool seeing the markdown notebook be so tactile, a great feat; really well done!
Thanks for the kind words! Admittedly, a lot of the stuff you're asking about is still a work in progress, and we don't have good answers for it all just yet. The upside of that is you have the potential to influence which direction we take next. If you're thinking about building on top of Malloy, join our community Slack channel, and we'd be happy to provide guidance or take your suggestions/feedback! https://join.slack.com/t/malloy-community/shared_invite/zt-1...
Semantic layers, also known as metrics layers or “headless BI”, have become a popular topic in the online data community in recent years. For all the hype, the idea hasn't seen much traction. In this blog post, I hypothesize why not, and describe the Malloy language and why I think it has a better chance at succeeding.
It looks promising, but I think it's a one-sided solution. I would like to see something like Malloy crossed with CubeJS. We need a consistent and flexible language like Malloy plus the interfaces that CubeJS provides: REST, SQL, GraphQL.
The way I see it, the business defines the processes. Those processes store and manipulate data in storage. Business processes can change quickly, so we need a way to quickly model and adapt to match. The analyst needs a way to slice and dice the data to optimize the business. The end user can get limited access to the data via an NLP interface, such as a chatbot. Applications and RPA need to interface with the data as well.
To make semantic layers work, I think we need to think holistically, not just making it easy for a group of users. It has to be for analysts, developers, and applications/APIs.
It's a semantic layer with an integrated query language that makes a ton of improvements on SQL. Full disclosure, I actually joined the team a couple of weeks ago :)
MotherDuck has been making the rounds with a big funding announcement [1], and a lot of posts like this one. As a life-long data industry person, I agree with nearly all of what Jordan and Ryan are saying. It all tracks with my personal experience on both the customer and vendor side of "Big Data".
That being said, what's the product? The website says "Commercializing DuckDB", but that doesn't give much of an idea of what they're offering. DuckDB is already super easy to use out of the box, so what's their value-add? It's still a super young company, so I'm sure all that is being figured out as we speak, but if any MotherDuckers are on here, I'd love to hear more about the actual thing that you're building.
We're being a bit hand-wavy with the offering while we're in "build" mode, because we don't want to sell vaporware. DuckDB is easy to use out of the box, but so is Postgres, and there are plenty of folks building interesting cloud services using Postgres, from Aurora to Neon. And as many people will point out, DuckDB is not a data platform on its own.
For a preview of what we're doing, on the technical side, a couple of our engineers gave a talk at DuckCon last week in Brussels, it is on youtube here: https://www.youtube.com/watch?v=tNNaG7e8_n8
(for context I'm the author of this blog post and co-founder of MotherDuck)
Deliberately speculating so someone will correct it: I'd guess they'll build a bunch of enterprise tools to do things like enable access and sync data in ways that comply with various policies, encrypt/tokenize/hide certain columns, monitor queries, ensure data is encrypted at rest, stuff like that.
Assuming the above is true: I'll bet the reason they aren't so loud about exactly what they're doing is that they want a head start on it. In theory, anyone can build this stuff around DuckDB. From a marketing perspective, the clever thing to do would be to drive up usage of DuckDB while they build out all this functionality, and then the minute corporates start seeing problems with their people using it (compliance etc.), they have the solutions.
I'd wager you're right. All the "boring" stuff that's actually very complicated/difficult, and without which no large enterprise will adopt a technology.
Especially since enterprise companies hate the idea of shifting large amounts of highly sensitive company data onto commonly lost and misplaced work laptops.
If you're going to do that you better have your security and governance on point.