Amazon Timestream – Fast, scalable, fully managed time series database (amazon.com)
285 points by irs on Nov 28, 2018 | 124 comments



At my day job, I build a lot of machine learning systems that require data to be fed in a time series manner[1].

Often this means building systems to analyze terabytes of logs [semi]-realtime. All I have to say is - thank god! This is going to make my job a lot easier, and likely empower us to remove our current infrastructure setup.

I know at one point we actually considered building our own time series database. Instead, we ended up utilizing a Kafka queue with an SQL-based backend after we parsed and pared down the data, because it was the only one quick enough to do the queries.

Should make a lot of the modeling I've worked on a bit easier[1].

[1] https://medium.com/capital-one-tech/batch-and-streaming-in-t...


I wouldn't jump the gun on this. I've been working within Amazon cloud for years and every year they make massive claims about new services at re:invent. Not saying this isn't going to be a good product, just saying it will probably take a while to be as useful as you're hoping.


I agree w/ this.

They make insane promises, but the promises don't live up to expectations.

Kinesis Analytics, for example, can aggregate data across a time window (sliding window) from a stream (Kinesis). A huge issue that isn't documented or stated is that when Kinesis Analytics restarts because the process dies (being migrated, binpacked, etc.), the ENTIRE time window has to get re-aggregated. So your count drops to 0.

Really unacceptable if you're using it to generate KPIs which you alert on. We ended up switching to a system which pushes the stream data into Influx and runs the aggregations there via queries.

Dealing w/ AWS during this entire process was a huge pain.


Similar examples have been Fargate and Lambda, which have surprisingly long cold start times depending on runtimes and VPC configurations. IMHO it is a big part of AWS expertise to know these things and be able to choose the right services not just according to the marketing brochures, but according to how they really work for each use case. Having said that, I'm glad to now have learned about that Kinesis Analytics restart issue.


Have you looked at Clickhouse for timeseries data? It's the one database I've found that can scale and can query in near-realtime.

I've loaded 100 billion rows into a 5-shard database and can do full queries across the whole dataset in under 10 seconds. It also natively consumes multiple Kafka topics.


Specialized tools are specialized: just remember the limitations!

- Bad with heterogeneous hardware (Cloudflare experience)

- Non-throttled recovery (source replicas flooded with replication load)

- No real delete/update support, and no transactions

- No secondary keys

- Own protocol (no MySQL protocol support)

- Limited SQL support, and the joins implementation is different. If you are migrating from MySQL or Spark, you will probably have to re-write all queries with joins.

- No window functions


ClickHouse isn't a general-purpose DBMS. It is the best tool for collecting and near-online analysis of huge amounts of events with many properties. We were successfully using a ClickHouse cluster with 10 shards for collecting up to 3M events per second with 50 properties each (properties translate to columns). Each shard was running on an n1-highmem-16 instance in Google Cloud. The cluster was able to scan tens of billions of rows per second for our queries. The scan performance was 100x better than on the previous highly tuned system built on PostgreSQL.

ClickHouse may be used as a timeseries backend, but currently it has a few drawbacks compared to specialized solutions:

- It has no efficient inverted index for fast metrics lookup by a set of label matchers.

- It doesn't support delta coding yet - https://github.com/yandex/ClickHouse/issues/838 .

Learn how we created a startup - VictoriaMetrics - that builds on performance ideas from ClickHouse and solves the issues mentioned above - https://medium.com/devopslinks/victoriametrics-creating-the-... . Currently it has the highest performance/cost ratio compared to its competitors.


> I've loaded 100 billion rows

Have you done any load tests that would more closely mirror a production environment such as performing queries while clickhouse is handling a heavy insert load?


I'm working on developing benchmarking tools for internal testing, but both Yandex and Cloudflare use ClickHouse for realtime querying. I'm still in the development phase for my product, but I'll make sure to post information & results here when we launch.

https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

But I've spent a long time looking at the various solutions out there, and while ClickHouse is not perfect, I think it's the best multi-purpose database out there for large volumes of data. TimescaleDB is another one, but until they get sharding it's dead on arrival.


I'd say kdb (from Kx Systems) is the best database for this problem space, but it is prohibitively expensive for the HTTP analytics use case. It is also a pain to query, but unbelievable what it can do.


Very cool, I'll check this out!


It's a quirky piece of software and has limitations that need to be considered when standing up a production cluster - such as the fact that you cannot currently reshard it. If you have a 3-node cluster, it's messy and requires downtime to add another node.


Still a bit messy, but the clickhouse-copier utility helps a bit: https://github.com/yandex/ClickHouse/issues/2579


I have open issues about it on GitHub. It does not work correctly at this time. If you dig through the issues, there are statements by the devs saying the tool has been neglected.


We load millions of rows per second and use a handful (or more?) of materialized views to build appropriate summaries. Various clients make queries to the "raw data" and the views and it all works fine, basically.


> it all works fine, basically

The "basically" is what intrigues me :D


Do you have a schema available publicly? I would like to build a similar system using custom software + S3 + Parquet + Athena for this task and see if it works.


The schema is

    CREATE TABLE points (
        Timestamp DateTime,
        Client String,
        Path String,
        Value Float32,
        Tags Nested(Key String, Value String)
    ) ENGINE = MergeTree()
    ORDER BY (Client, Timestamp, Path)
    PARTITION BY toStartOfDay(Timestamp)
And this is a query like the one I was using:

    SELECT
        (intDiv(toUInt32(Timestamp), 15) * 15) * 1000 AS t,
        Path,
        Value AS c
    FROM points_dist
    WHERE Path LIKE 'tst_val1'
        AND Tags.Value[indexOf(Tags.Key, 'server')] = 'node'
        AND Timestamp >= toDateTime(1543421708)
    GROUP BY t, Path, Value
    ORDER BY t, Path
This table was made on 5 servers via a distributed table partitioned on the timestamp, so distribution was even.


Thanks! How big is the 100B rows in your system?


Takes up 200 GB or so across 5 servers (this is according to ClickHouse's query stats). Actual disk usage might be a bit higher.


Thanks!


Have you considered columnar databases like 1010data?


I recall why this looks familiar. Their chief scientist put up a job posting on the subreddit for array languages. It sounds like they know how to use some pretty powerful tools.

https://www.reddit.com/r/apljk/comments/42uf2f/not_exactly_a...


Keira, nice article. It will be a long time before we are allowed to use this new service in C1


This is not cheap for the "DevOps" use case.

Imagine you have 1000 servers submitting data to 100 timeseries each minute. That's 100,000 writes a minute (unless they support batch writes across series). At $0.50 per million writes, that's $72 a day or ~$26k a year.

Now imagine you want to alert on that data. Say you have 100 monitors that each evaluate 1GB of data once a minute. At $10 per TB of data scanned, that's $1,440 a day or $525k a year!
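
A quick back-of-the-envelope sketch of that arithmetic (the $0.50/million-writes and $10/TB-scanned rates are the ones quoted above; the server, series, and monitor counts are the hypothetical scenario, and gb_per_eval is a parameter precisely because the estimate hinges on how much data each alert evaluation actually scans):

    # Back-of-the-envelope Timestream cost for the hypothetical "DevOps" scenario above.
    # Rates as quoted in the comment: $0.50 per million writes, $10 per TB scanned.
    SERVERS = 1000
    SERIES_PER_SERVER = 100
    MINUTES_PER_DAY = 24 * 60
    WRITE_PRICE_PER_MILLION = 0.50   # USD
    SCAN_PRICE_PER_TB = 10.0         # USD

    # One write per series per minute (assuming no batching across series).
    writes_per_day = SERVERS * SERIES_PER_SERVER * MINUTES_PER_DAY
    write_cost_per_day = writes_per_day / 1e6 * WRITE_PRICE_PER_MILLION

    # 100 monitors, each scanning gb_per_eval of data once a minute.
    def alert_cost_per_day(monitors=100, gb_per_eval=1.0):
        tb_scanned = monitors * gb_per_eval * MINUTES_PER_DAY / 1000  # decimal TB
        return tb_scanned * SCAN_PRICE_PER_TB

    print(f"writes: ${write_cost_per_day:,.0f}/day, ${write_cost_per_day * 365:,.0f}/yr")
    print(f"alerts: ${alert_cost_per_day():,.0f}/day, ${alert_cost_per_day() * 365:,.0f}/yr")

Dropping gb_per_eval to 0.001 (i.e. 1 MB per evaluation, as discussed further down the thread) brings the query side down to about $1.44/day.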


Between RDS and these proprietary database products, AWS is now its own biggest competitor. That's potentially fine, but there is an inherent conflict of interest there and that needs to be properly managed, and while that may be happening it didn't come across in the keynote.

The only way I can have trust in Amazon's proprietary products is if RDS continues to get less expensive every year, since that is effectively the BATNA to these new products. It's been a while now since the last RDS cost reductions, and unless we continue to see more of those it's hard to have confidence that Amazon will continue to treat their customers of these new proprietary services fairly over the long term.


That's convoluted... From your example, there are 100K writes/minute, while you assume 1GB of data evaluated per minute per alarm. That is, you're assuming ~10KB per item per time series for each alarm, while reality is going to be closer to 10-100 bytes per item per time series, which cuts down the expense by two or three orders of magnitude.


That's a really good point, I was guesstimating very quickly when I wrote that. Depending on what you're doing for the alarm and how they measure reads, it may well be a lot less.

Let's say the read is 1MB instead of 1GB, that's now $1.44 a day and $525 a year. Query pricing becomes not so bad.

From my own experience, the 100 metrics per server estimate I was giving above is pretty low, though. Once you factor in different combinations of tags closer to 1000 is more realistic. That potentially brings up the write pricing quite a bit.


Well, that depends on what you consider cheap. Hiring someone to manage a time series system like graphite or prometheus is going to cost you a whole lot more than $26k a year


You're ignoring half of my example scenario. $26k writing the data, $525k for querying it just for alerting, plus whatever it costs to store and to query ad-hoc. That's over half a million dollars. Even if you hire someone for $250k, you can self-host your time series system for cheaper than that.

Self-hosting isn't the only option though. For example, that hypothetical 1000 server scenario would cost $180k a year at list pricing on Datadog or SignalFX.


It's priced for government contracts.


I'm actually impressed at how incredibly expensive they made this. $0.50 per million 1KB writes, which is 20x what aurora charges, since aurora allows 8KB writes. And Aurora is already expensive if you actually read/write to it.


> $0.50 per million 1KB writes, which is 20x what aurora charges, since aurora allows 8KB writes.

That's a weird comparison. 20x is only true if you write 8KB with every entry, and you haven't included the storage and instance savings.

It's not hard to come up with suboptimal scenarios where this is more expensive, but that's missing the point. It's optimized for a specific kind of usage pattern.


The pricing lines up with CloudWatch Logs, 50 cents/GB, 3 cents/GB/month.

Curious to see what the query language is for this, wonder if they're just exposing the backing store for CloudWatch as a service now.


I really don't think it's 20x the cost of Aurora in general, considering that Aurora costs are not that simple, but I don't have time to run the numbers, so let's go with that for a moment. Do you really think Amazon would introduce a product that's 20x as expensive if they didn't know there was a market for it?


I get the feeling this is for important data (banking etc) so I have a feeling this is 200x cheaper than whatever else is available.


That's not what it says in the release:

"With Timestream, you can easily store and analyze log data for DevOps, sensor data for IoT applications, and industrial telemetry data for equipment maintenance."


Quite excited for this! We are currently experimenting with DynamoDB and managing our own rollups of our incoming data (previously on RDS, which is not a good choice for this kind of data).

---

I've seen a lot of people complain about pricing, so I thought I'd share a little why we are excited about this:

We have approximately 280 devices out, monitoring production lines, sending aggregated data every 5 seconds via MQTT to AWS IoT. The average number of messages published that we see is around ~2 million a day (equipment is often turned off when not producing). The packet size is very small, and highly compressible, each below 1KB, but let's just make it 1KB.

We then currently funnel this data into Lambda, which processes it, and puts it into DynamoDB and handles rollups. The costs of that whole thing is approximately $20 a day (IoT, DynamoDB, Lambda and X-Ray), with Lambda+DynamoDB making up $17 of that cost.

Finally, our users look at this data, live, on dashboards, usually looking at the last 8 hours of data for a specific device. Let's throw around that there will be 10,000 queries each day, looking at the data of the day (2GB/day / 280devices = 0.007142857 GB/device/day).

---

Now, running the same numbers on the AWS Timestream pricing[0] (daily cost):

- Writes: 2million * $0.5/million = $1

- Memory store: 2 GB * $0.036 = $0.072

- SSD store: (2GB * 7days) * $0.01 (GB/day) * 7days = $0.98

- Magnetic store: (2 GB * 30 days) * $0.03 (GB/month) = $1.8

- Query: 10,000 queries * 0.007142857GB/device/day --> 71GB = free until day 14, where it'll cost $10, so $20 a month.

Giving us: $1 + $0.072 + $0.98 + $1.8 + ($20/30) = $4.5/day.

From these (very) quick calculations, this means we could lower our cost from ~$20/day to ~$4.5/day. And that's not even taking into account that it removes our need to create/maintain our own custom solution.

I am probably missing some details, but it does look bright!

[0] https://aws.amazon.com/timestream/pricing/


It's got to be a rough day for the team at https://www.influxdata.com/ . This could become serious competition for their InfluxCloud hosted offering.


We've been expecting this for two years. It was just a matter of when. It validates our space. AWS did this to Elastic, they have competing products with NewRelic, Splunk, SumoLogic, and countless others. All of whom still have healthy businesses.

Our goal remains the same: build the best possible product that optimizes for developer productivity and happiness. And open source as much as we possibly can while maintaining a healthy business.


Can't wait to see your new cloud offering :)


Amazon is the best partner until you prove worth cannibalizing.


This space is growing like mad. They would be remiss if they did not expect something like this.


and for timescaledb, streamlio, two sigma, kdb+ , quasardb ( and perhaps pipelinedb)


QuasarDB employee chiming in - we’re not actually worried, our clients are typically operating at a scale that would make costs very prohibitive for this AWS product.

Having said that, I can definitely see this be an interesting product for people doing less than 10k inserts per second.


> make costs very prohibitive for this AWS product.

Competing with AWS on just cost sounds worrying to me.


Actually, not really. We have migrated several customers to AWS and our experience with AWS services is kind of all over the place. S3 is super cheap and reliable and there is almost nothing that beats it, but CloudWatch, for example, is extremely expensive for a large-scale operation and is easy to beat with custom software (like Prometheus or something similar). I guess even Datadog would beat them (and they have much more advanced features and integrations). This means that multiple software vendors can exist in the same space even when Amazon has an offering in that space.


At what point are open source projects going to change their licensing to prevent the major cloud providers from just stealing their products? I highly doubt AWS built this from scratch. Amazon, Google, and Microsoft are going to choke the life out of these projects

Redis and MongoDB at least seem to have woken up

https://www.geekwire.com/2018/open-source-companies-consider...


> At what point are open source projects going to change their licensing to prevent the major cloud providers from just stealing their products?

There's a lot to unravel in there.

I prefer 'free software' to 'open source' as it has a clearer meaning, especially in this context. Even so, no one can steal free / open source software (or, as you say, product -- though that term strongly implies a commercial offering).

By definition you can't really stop anyone from using your free software, unless perhaps you start naming companies explicitly, but I can't imagine it'd be an easy process, or have a happy outcome, if you started targeting 'major cloud providers' for special conditions.

Note that I am not an apologist for AWS, Google, Microsoft, etc - but it feels like the fundamental problem here is not massive corporations charging other people to access free software.


Free software and open source aren't always the same thing though. Free Software is software that through the license enforces a philosophy.

Open source is software that through the license enforces the source code to remain open.

I'm not a fan of RMS or his attitudes on most things, but am a strong OSS fan as it is the best way to develop and maintain software.


> Free software and open source aren't always the same thing though.

Entirely agree, hence I drew the distinction. I eschew 'open source' as it's highly ambiguous, and mostly misses the point.

> Free Software is software that through the license enforces a philosophy.

I would disagree. Free software ensures the user has certain freedoms.

> Open source is software that through the license enforces the source code to remain open.

This is a very circular definition -- open source is open.

> I'm not a fan of RMS or his attitudes on most things, but am a strong OSS fan as it is the best way to develop and maintain software.

As it happens, rms is no fan of OSS.


RMS has said publicly that proprietary software should be illegal, and that free software guarantees user freedoms. This is a philosophy, and one that assumes every user wants to become a developer. For things like emacs, and much of GNU, this is true; for most of the rest of the world, it is not.

I eschew free software because I'm not about forcing my views on others (which is literally the mission of GNU). I'm about developing software to be the best it can be, and maybe meeting some friends along the way. Open source, being the best software development model overall, allows me to meet that goal. You could almost say some of RMS's more extreme quirks border on authoritarian (see the example with the abortion joke in the libc manual he FORBADE removal of and demanded be re-added when a dev simply overruled him). He's not acting in a manner that encourages "freedom", but as a simple and obvious dictator of all things GNU or claiming to be GNU. He has frequently tried to shape the path of GNOME (of which I am a former foundation member and was on the sysadmin team) in areas he literally has no business weighing in on. Then there are some more gross personality problems, like his sexism, his tendency to actually eat his toenail gunk[1], or his refusal to ever be wrong on anything, even when an entire community disagrees with him.

Dr Stallman has done a great deal of good for the world with Free Software; however, like the VAX and PDP-11, his time has passed. Open source won just like Linux won over GNU/Hurd. It is ok that he won, by losing.

[1] https://www.youtube.com/watch?v=I25UeVXrEHQ


rms's toenails aside, I'd suggest that every licence is about enforcing a set of views on others. There are plenty of licences to choose from, so it's fairly easy to find one compatible with your own views.

In the context of GP's (beginningguava) comment about 'open source projects' needing to change their licensing to prevent corporations making money by SaaSing various tools, my point was twofold - first, by definition you can't have free software with restrictions like that, and second you'd be merely fighting the symptoms (with little chance of success).

Aside - I'm curious what you mean by the 'open source software development model', as I don't think that's actually a thing.


Fair enough, the open source software development model is no different in reality from the free software development model generally speaking.

It goes back to ESR's The Cathedral and the Bazaar and is what he deems "the bazaar model" or "bazaar style", before he, Larry Augustin, and Bruce Perens (if memory serves) went on to coin the phrase "open source". Even if you don't necessarily agree with ESR (I see him in a similar vein as RMS, fwiw), his thoughts on software development models have, generally speaking, been proven true.


> Free Software is software that through the license enforces a philosophy.

This is incorrect; licenses like MIT or BSD are also free software licenses because they afford the 4 freedoms to the user (even though they don't enforce them on derivative works).

Licenses like the Redis one are open-source but not free software because they place limitations on the user (can't sell hosted service, IIRC)


Redis is BSD licensed. You mean some Redis module developed independently by Redis Labs that are not part of Redis itself.


Why not? InfluxDB does not have these capabilities.


> I highly doubt AWS built this from scratch

Unfounded doubt.


Nice to see, this has felt like a gap in cloud offerings for a while... and the open source options have difficulties.

From the little that was said, going to guess this uses something like Beringei (https://code.fb.com/core-data/beringei-a-high-performance-ti...) under the hood


The financial read cost of this database makes it practically unusable for customer facing dashboards, disappointing.


A place to put the timestamped data they download from yesterday's Amazon Ground Station.


Been searching for years for a good alternative to Postgres for storing gobs of weather timeseries data. So far we have been running a Postgres system for many years in production and have hired multiple contractors to implement a 'real timeseries solution'. All of which have been utter shit and complete failures. The AWS services are expensive as all hell. With a little bit of imagination we created a unique schema for timeseries data that doesn't require terabytes of space, processes billions of data points a day, and has blazing fast queries into said data.


I moved a decade's worth of weather time series data from well indexed Sqlite to InfluxDB and was nothing but pleased. It ended up taking an order of magnitude less storage and so much faster to query that I didn't even bother to benchmark it. You can probably write a simple query to your Postgres database to cough out the text file to load InfluxDB to see how it works for you. Then it comes down to how easy it is to replace your query and insert functions… So it's all easy except for the hard part.
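
For anyone wanting to try that route, here is a minimal sketch of the kind of export described above. The weather table and column names (station, observed_at, temperature_c) are invented for illustration; it writes InfluxDB line protocol to a text file that can then be loaded via InfluxDB's /write endpoint:

    # Minimal sketch: dump a hypothetical Postgres table of weather observations
    # to InfluxDB line protocol. Table and column names are illustrative only.
    import psycopg2

    conn = psycopg2.connect("dbname=weather_db")   # adjust the connection string
    cur = conn.cursor()
    cur.execute("SELECT station, observed_at, temperature_c "
                "FROM weather ORDER BY observed_at")

    with open("weather.lp", "w") as out:
        # Line protocol: measurement,tag=value field=value timestamp_in_ns
        for station, observed_at, temperature_c in cur:
            ts_ns = int(observed_at.timestamp() * 1e9)
            # Assumes station names contain no spaces or commas; escape them otherwise.
            out.write(f"weather,station={station} temperature_c={temperature_c} {ts_ns}\n")

    cur.close()
    conn.close()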



I once handled gobs of web tracking data in Cassandra, using Hadoop over Cassandra to build queryable rollups in MySQL, and Pig for on-the-fly analysis.

Neither tech works on its own, but together (substituting Cassandra columns for HDFS) it was magic for the specific data configuration & use case.


So how will this compare with Boundary, SignalFx, Stackdriver, and similar earlier services...

I'll have to go look into this, because AWS's historic pricing for any large-volume stream quickly becomes untenable.

It's very easy to have gobs and gobs of time series points... AWS might make using this way too expensive for anything at relative scale for a small startup?


It appears that ingestion alone is more expensive than the commercial metric services. Might not matter for small scale.


I wonder how this compares to KDB


Since kdb is SO good with time series data, I would need a lot of tech details to even know if AWS is worth testing. It's a LOT of work to test a time series db at scale.


The core of Amazon's timeseries db likely doesn't fit into a CPU's L1 cache. It does with KDB :)


>The core of Amazon's timeseries db likely doesn't fit into a CPU's L1 cache. It does with KDB :)

Does it?

https://kx.com/discover/in-memory-computing/ seems to indicate that it takes up ~600 Kb (I'm not sure if this is bits or bytes, but even if it's bits, that turns into 75KB)

L1 cache is per core. Skylake Xeons have a 64KB cache per core, 32KB for data and 32KB for instructions. Even with an even split there, you're not fitting 75KB (or 600KB) into the L1 cache.

Bits would be a weird measurement to use when talking about memory utilization, so I'm pretty sure that it's 600 kilobytes. You're not anywhere close to fitting that into the L1 cache. L2 cache, sure. But you get the relatively spacious 1 megabyte for L2.

I'm also not sure that the "core" fitting into the CPU cache is particularly meaningful for performance anyway. It doesn't say anything about how much outside of the core gets used, how big the working set size is for your workload, how much meaningful work is done on that working set of data, etc. If you're frequently using parts of the software that don't fit in the cache, or getting evicted from it for other code, or your working set of data doesn't fit in the cache and you're constantly going to main memory for the data you're working on, the "core" fitting in L1 cache (or L2 cache, which looks more realistic) is going to be basically meaningless.


Gah, I meant L2 cache, but was being entirely too smug. I remember a presentation a KX rep gave at our office a few jobs ago where this was one of their bullet points. I found it amusing, and a bit odd.


Gotcha! Definitely an interesting marketing point. I probably would have had the same reaction :)


Same for me:)


Seems positioned to compete with Azure Data Explorer (MSFT's log/time-series-optimized service). I know Azure runs a lot of services on top of Data Explorer (previously called Kusto). I wonder if this is a true internal battle-tested product or a me-too offering.


I use Kusto daily and it is by far my favorite dev tool. It's incredibly fast and actually fun to use. I'm interested to compare it to Timeseries.

Kusto is still the name of the query language and the desktop application (Kusto.Explorer), the service was just renamed to Azure Data Explorer.


Kusto is architecturally closer to Dremel (or BigQuery). It's a columnar compressed datastore with a nice query language. Not the most efficient way to store and query timeseries data though.

Back then (internally) we actually had a lot of issues with ingesting and querying time series data at scale.


I might be mistaken, but wouldn't Data Explorer be more similar to AWS CloudWatch, which has been around for a long time?


Azure Data Explorer / Kusto is more of a database that is optimized for the log use case than a service. There is a front-end tool and a lot of the use cases are around log management, but it is a database you can do general SQL or KQL things with. Time series is one of the core use cases for it also, but it has less marketing around it.


Sounds more like AWS CloudWatch Log Insights launched yesterday.


Seems like this could be a great remote storage backend for Prometheus.


Oh yeah! Would that need a custom storage provider in Prometheus?


Yes [0]. I haven't had time to fully look into the details. But from looking at the existing integration -- it would be fantastically convenient to have something cloud native [1].

[0] https://prometheus.io/docs/prometheus/latest/storage/#remote...

[1] https://prometheus.io/docs/operating/integrations/#remote-en...


Honest question: when dealing with time-series data, do you actually need every data point? Is that level of granularity really necessary?

IMO, it makes way more sense to decide the aggregations you want ahead of time (e.g. "SELECT customer, sum(value) FROM purchases GROUP BY customer"). That way, you deal with substantially less data and everything becomes a whole lot simpler.
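
A minimal sketch of what deciding the aggregations ahead of time can look like on the ingest path (the names and the one-minute bucket are made up for illustration): raw points update a running rollup and are never stored individually.

    # Streaming pre-aggregation sketch: keep per-customer, per-minute sums
    # instead of every raw point. Names and bucket size are illustrative only.
    from collections import defaultdict

    BUCKET_SECONDS = 60
    rollups = defaultdict(float)   # (customer, bucket_start_epoch) -> running sum

    def ingest(customer: str, ts_epoch: float, value: float) -> None:
        bucket = int(ts_epoch // BUCKET_SECONDS) * BUCKET_SECONDS
        rollups[(customer, bucket)] += value   # the raw point itself is discarded

    # Three raw points collapse into two stored rows.
    ingest("acme", 1543421708, 12.5)
    ingest("acme", 1543421712, 3.0)
    ingest("acme", 1543421799, 7.0)
    print(rollups)

The obvious trade-off, raised in the replies below, is that you can never go back and group by a dimension or granularity you didn't keep.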


Really depends on the use case. Working in healthcare, vital signs can be modeled as time series points, but are lower frequency than, say, metrics from servers. However we want to store every point so a spike is not missed. One could argue an unsustained spike is noise, but in the healthcare domain there may be a correlation with some external event (the person is surprised and their heart rate spikes).


The clever thing to do in this scenario would be to keep every spike but delete all the data between similar data points after storing. So you get low granularity for identical/nearly-the-same data points and high granularity when something interesting happens. I don't have any experience with time-series data so maybe this is commonplace.


That would be impossible to run any new analyses on.

What some would do is record in blocks where every point after the earliest is stored as a delta. Then each block is more compressible as it contains a lot of 0s.
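
A rough sketch of that block/delta idea (function names are illustrative): the first value in a block is stored as-is and every later value as the difference from its predecessor, so flat stretches turn into runs of zeros that compress well.

    # Delta-encode a block of samples; flat stretches become runs of zeros.
    from typing import List

    def delta_encode(block: List[int]) -> List[int]:
        if not block:
            return []
        return [block[0]] + [b - a for a, b in zip(block, block[1:])]

    def delta_decode(deltas: List[int]) -> List[int]:
        out, acc = [], 0
        for i, d in enumerate(deltas):
            acc = d if i == 0 else acc + d
            out.append(acc)
        return out

    samples = [70, 70, 70, 71, 71, 140, 72, 72]   # heart-rate-like series with a spike
    encoded = delta_encode(samples)                # [70, 0, 0, 1, 0, 69, -68, 0]
    assert delta_decode(encoded) == samples

Production time series formats (e.g. Facebook's Gorilla/Beringei, linked elsewhere in this thread) push the same idea further with delta-of-delta timestamps and XOR-compressed float values.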


In finance it can be critical.

Some tasks actually require absolute granularity, up to 6 decimal places of precision and thereafter reliance on atomic order of arrival, for deterministic results on data from high frequency trading.

Without absolute knowledge of the order or if there's aggregation the best you can do is approximate, which often is considered suboptimal when the real solution is available.


Sure, you can do that if you're really sure that you won't need to group by something else later. You wouldn't want to store more granularity than necessary, but you can't go back in time to get a data point you didn't store.


In that case, can you just store each data point in a data lake somewhere and do a batch-job? Apache Flink supports this use case as well as real-time.


Yes, but I guess the point of using a time series DB rather than just a lake that doesn't necessarily have a time structure is that if you know time is going to be important, then you probably want to organize and query it that way.

What I am doing with one program could almost be called a data lake because it is just a bunch of JSONL files that have really varied data in them. But it's organized by date and hour per day as well as predefined keys, since I know I will need to query it that way.


We had applications where we were tracking guests in a venue through various means. We tried a number of queuing systems to manage the flood of events, but they'd all fall over. I'd love to run my old "venue simulator" through this and see if it can stand up to actual guest load as they walk around, ride, and purchase things.


I'm wondering if this shares any technology with the CloudWatch metrics backend. They've been making improvements there all year, and most of them generally align with what's announced here.

CloudWatch metrics are also very expensive for what you get, so that's another similarity to Timestream ;)


I couldn't tell from the page: is this SQL-based, similar to TimescaleDB, or more similar to InfluxDB?


This is looking like a managed Druid... that would be very nice to have.


Anyone know if this is what CloudWatch Insights uses? If so, it doesn't even come close to competing with Elasticsearch performance (with a tiny cluster), it seemed quite slow.


There have been a lot of amazon links this week


AWS re:Invent is happening. Kind of like how there's a lot of Google links during I/O or a lot of Apple links during WWDC.


Re:invent week. Expect a lot more.


Where are the docs?


There are at least 7 Amazon-related stories on the HN front page right now, what’s going on?



AWS re:Invent


Merry christmas?


[flagged]


Most big tech companies bundle their announcements on their annual conference day. It happens several times a year and these same complaints show up every time, but spamming the threads with them is excessive.

https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...


A quick look at the Hacker News frontpage shows a bit of a problem,

    1.  Amazon Timestream (amazon.com)
    3.  Amazon Quantum Ledger Database (amazon.com)
    8.  Amazon FSx for Lustre (amazon.com)
    13. AWS DynamoDB On-Demand (amazon.com)
    14. Amazon's homegrown Graviton processor was very nearly an AMD Arm CPU (theregister.co.uk)
    21. Building an Alexa-Powered Electric Blanket (shkspr.mobi)
    30. Amazon FSx for Windows File Server (amazon.com)



Ah, sorry. I should've known you'd already be on it. Thanks.


Whether we are fans or not, keeping up with (and discussing) new AWS offerings is extremely relevant for practitioners.


AWS Re:Invent is happening in Las Vegas this week. AWS is launching new services all this week. People here are interested in cloud services, so it makes sense that people share and upvote those announcements. I don't see the problem that you're talking about.


jesus christ six amazon articles in a day? AWS is undeniably the body of christ for HN but am i missing something? FSX, blockchain, timestream, Graviton, ground station, and cloudwatch... all of these articles are advertisements for mundane shit.


It's AWS re:Invent day/week. I've never been, but I get the impression that it's like Apple's keynote, or Google I/O, in which big product announcements are made. On those days, you'll see multiple submissions about the respective conferences too.


This is exactly what re:Invent is. Most teams dream of launching a new AWS product at re:Invent (and not missing their date and launching at a later time)


It'd be an interesting blog post topic to look at how AWS re:Invent (or I/O, or F8, etc.) product threads on HN compare to actual product impact. I remember Rekognition getting decent discussion 2 years ago [0], but not along the angles or magnitude of how Rekognition is usually discussed in recent months. OTOH, other things I've been interested in as a data geek, I've barely heard of since reading about them on HN -- e.g. Athena [1]

[0] https://news.ycombinator.com/item?id=13072956

[1] https://news.ycombinator.com/item?id=13072245


https://reinvent.awsevents.com/

It happens every year during Google, Apple, Amazon, and Facebook events.


thanks. Just glanced at the shop calendar and i guess the holidays are right around the corner too.


There will be a lot more. The keynote today still has an hour left, and there are two more hours of keynotes tomorrow.

(Written from keynote floor)


Where can I see your keynote-notes/updates?


I don’t usually do that, I just share my thoughts as comments on HN. There’s enough people doing that other stuff. :)


So.... whats your biggest takeaway thus far then? ;-)

EDIT: Is mobot a dead project?


AI AI AI AI.


> all of these articles are advertisements for mundane shit

What do you think hacker news is supposed to be? Anything that isn't mundane to a lot of people isn't interesting enough for those in the target audience. Amazon is having its annual AWS conference and thus has a lot of announcements, of course they have a lot of new niche products.


I count 7 separate Amazon posts on the front of HN. Is this some conspiracy? #NotAmused #ShouldBeBundled


The conspiracy is called re:Invent



