tnorgaard's comments

Viking Link, the 765 km HVDC (VSC-based) link rated at 1400 MW between England and Denmark, has a rated loss of 3.7% [0].

[0] https://www.viking-link.com/auction-faqs


I believe that Solaris (OpenSolaris) Zones predates LXC by around 3 years. Even while working with k8s and Docker every day, I still find what OpenSolaris had in 2009 superior. Crossbow and ZFS tied it all together so neatly. What OpenSolaris could have been in another world. :D


Answer: Materialized Views.

On an unrelated note: still hoping for those automatically refreshed materialized views in PostgreSQL, à la what VoltDB has.


I've been looking at Materialize for a while (https://materialize.com/). It can handle automatically refreshed materialized views. Last time I checked, it didn't support some Postgres SQL constructs that I use often, but I'm really looking forward to it.


> Still hoping for those automatically refreshed materialized views in PostgreSQL, à la what VoltDB has.

Not exactly what you're hoping for and you probably already follow this pattern. pg_cron can help (and is now available in AWS RDS).

```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;

CREATE MATERIALIZED VIEW IF NOT EXISTS activeschema.some_thing_cached AS ...;

-- Refresh every 5 minutes (standard cron syntax):
SELECT cron.schedule('some_thing_cached', '*/5 * * * *', $CRON$ REFRESH MATERIALIZED VIEW some_thing_cached; $CRON$);
```


I think the problem is when you have a materialized view that takes hours to refresh. We are lucky that 99% of our traffic happens between 07:00 and 19:00 on weekdays, so we can just refresh at night, but that won't work for others.

I don't know much about how PostgreSQL works internally, so I probably just don't understand the constraints. Anyway, as I understand it, there are two ways to refresh: either you refresh a view concurrently or you don't.

If not, then Postgres rebuilds the view from its definition on the side, and at the end some internal structures are switched from the old to the new query result. Seems reasonable, but for some reason, which I don't understand due to my limited knowledge, an ACCESS EXCLUSIVE lock is held for the entire duration of the refresh and all read queries are blocked, which doesn't work for us.

If you refresh concurrently, Postgres rebuilds the view from its definition and compares the old and the new query result with a full outer join to compute a diff. The diff is then applied to the old data (like regular table INSERT/UPDATE/DELETE, I assume), so I think you get away with just an EXCLUSIVE lock and reads still work. There are two downsides to this: first, it requires a unique index for the join; second, the full outer join is a lot of additional work.
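For reference, the concurrent variant looks roughly like this (view and column names are illustrative):

```sql
-- CONCURRENTLY requires a unique index on the materialized view:
CREATE UNIQUE INDEX ON some_thing_cached (id);

-- Reads keep working while the diff is computed and applied:
REFRESH MATERIALIZED VIEW CONCURRENTLY some_thing_cached;
```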

I never had the time to test Materialize, but it seems to do what I want with its continuous refresh.

I also thought about splitting the materialized view in two: one for rarely changing data and another for the smaller part of the data which changes daily. Then I would only have to refresh the smaller view and UNION ALL both materialized views in a regular view. Not sure how well that would work with the Postgres query planner.
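A sketch of that split (all table, view and column names are made up; whether the planner can skip one side depends on your query predicates):

```sql
-- The big, rarely changing bulk:
CREATE MATERIALIZED VIEW stats_stable AS
  SELECT day, SUM(amount) AS total FROM events
  WHERE day < date_trunc('month', now()) GROUP BY day;

-- The small, daily-changing part:
CREATE MATERIALIZED VIEW stats_recent AS
  SELECT day, SUM(amount) AS total FROM events
  WHERE day >= date_trunc('month', now()) GROUP BY day;

-- Queries go through a plain view gluing the two together:
CREATE VIEW stats AS
  SELECT * FROM stats_stable
  UNION ALL
  SELECT * FROM stats_recent;

-- Only the small view needs frequent refreshes:
REFRESH MATERIALIZED VIEW stats_recent;
```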


Not sure how that would work with the PG query planner either, but splitting rarely changing data from rapidly changing data is basically the Lambda data architecture, so probably a good call!


If it's a one-shot data compilation, you could use something like Postgres' NOTIFY to trigger a listening external app.
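A minimal sketch of that pattern (channel, function and table names are made up); an external worker would run LISTEN refresh_jobs and do the refresh when a notification arrives:

```sql
CREATE OR REPLACE FUNCTION notify_refresh() RETURNS trigger AS $$
BEGIN
  -- Tell listeners which table changed:
  PERFORM pg_notify('refresh_jobs', TG_TABLE_NAME);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER base_table_changed
  AFTER INSERT OR UPDATE OR DELETE ON base_table
  FOR EACH STATEMENT EXECUTE FUNCTION notify_refresh();
```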


There's one gotcha with this approach: if another DDL operation runs simultaneously with REFRESH MATERIALIZED VIEW, you can get an internal Postgres error.

You cannot be sure that refresh won't coincide with a grant on all tables in the schema, for example.


Given how well they work on any non-specialised DBMS, I prefer that Postgres take its time and do it right (AKA, differently from everybody else).


TimescaleDB (a Postgres extension) has these (continuous aggregates), though specific to time-series data.

https://docs.timescale.com/timescaledb/latest/how-to-guides/...


MS SQL Server has "indexed views", which are automatically updated instantly... But they destroy your insert/update performance, and their requirements are so draconian as to be completely impossible to ever actually use (no left joins, no subqueries, no self-joins, etc...).
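For reference, a minimal indexed-view sketch in T-SQL (table and column names are made up); the unique clustered index is what makes the view materialized and synchronously maintained:

```sql
CREATE VIEW dbo.order_totals WITH SCHEMABINDING AS
  SELECT customer_id,
         COUNT_BIG(*) AS order_count,  -- COUNT_BIG (not COUNT) is required here
         SUM(amount)  AS total_amount
  FROM dbo.orders
  GROUP BY customer_id;
GO

CREATE UNIQUE CLUSTERED INDEX ix_order_totals
  ON dbo.order_totals (customer_id);
```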


Yes, views are nice, but there is also the fair concept of not needlessly bogging down a table. Sure, they were making up data, but a flat table with stats, profile data and other data that could easily live elsewhere is just bloat. Once you have an id, the static fields can be retrieved from other services/data stores.


I'm not sure I am following. Aren't materialized views just formal, cached results of a query? That wouldn't bog down a table.


I think their point is more ‘don’t store all that junk in your primary database and then do all your work on it there too if you can just stuff it somewhere else’. Which has pros and cons and depends a lot on various scaling factors.


Materialized views are persistent tables that are typically updated when the underlying data is updated.

Typically.


I'm pretty sure most engines use the term "materialized views" for eventually consistent tables. The only DB I've seen with that kind of ACID materialized view is MS SQL Server, which calls them "indexed views".


Perhaps he means it will bog down on refresh.


Maybe? Not sure.


Another thing I'm waiting for in Postgres is the lifting of the connection limit, decoupling connections from backend processes...


If one wanted to do server-side rendering in Java with something like Turbolinks, in 2020 - what would one use? JSP? Grails? JSF? Or just hit the bar instead? :-)


It depends!

JSF is really meant for quickly building internal applications that don't have to withstand "web scale" loads. It's focused on churning out data-driven applications quickly. Add something like BootsFaces or PrimeFaces and you can produce these things in very little time. That's not to say you couldn't use JSF for a "web scale" project, but you would have to dive pretty far into your server to carefully watch state management and session creation. Not impossible, just probably not its primary purpose.

For external-facing applications that need to withstand a "web scale" load, Eclipse Krazo (aka the MVC 1.2 spec) and JSP are what you're looking for. These things are lightning fast and give you a lot of control over session creation by default. Render times are usually under a few ms. This is probably the fastest and least resource-intensive stack available (no benchmark provided, take it for what you paid for the comment).


According to this IBM study [2]/Wikipedia article [0], the probability is much higher: per the study, with 128 GiB of memory it is one error every 1 h 25 min.

There was an anecdote [1] about a Power Mac cluster at a university which basically couldn't get past boot because the memory errors were so frequent.

[0] https://en.wikipedia.org/wiki/Cosmic_ray#Effect_on_electroni...

[1] https://spectrum.ieee.org/computing/hardware/how-to-kill-a-s...

[2] http://www.pld.ttu.ee/IAF0030/curtis.pdf


I see there are a few posts repeating the common interpretation that glyphosate is not dangerous because it only targets a metabolic pathway that animals lack, so for the sake of discussion here is another viewpoint: https://www.youtube.com/watch?v=kVolljHmqEs (disregard the clickbait title). Summary: glyphosate is not bad for your body, but it does kill everything in your stomach, and that is not so awesome.


Krste Asanovic in https://www.youtube.com/watch?v=KxuQW8HWBXI shows numbers that the Berkeley Rocket implementation of RISC-V is both faster and has smaller die size than the ARM Cortex-A5 and that the Berkeley BOOM is faster and smaller than the ARM Cortex-A9.


The Cortex-A series are big processors. They're generally used when compute power is more important than power consumption. While it's a promising benchmark, RISC-V will need to compete with the Cortex-M series (and other 8/16-bit cores) to break into the IoT market.


As impressive as that is, I doubt there's much room for RISC-V in Cortex-A's target market (phones, smart TVs, etc etc). I explicitly mentioned Cortex-M.


We run ZFS over LUKS-encrypted volumes in production on AWS ephemeral disks and have done so for over two years on Ubuntu 14.04 and 16.04. The major issue for us has been getting the startup order right, as timing issues do occur once you have many instances. To solve this, we use upstart (14.04) and systemd (16.04) together with Puppet to control the ordering.

Performance-wise it does fairly well; our benchmarks show a ~10-15% decrease on random 8 KB IO (14.04).

We are definitely looking forward to ZFS native encryption!


What is the right order?


Since ZFS will run on block-level devices and you want the ZFS benefits of snapshots/compression/(deduplication), in my opinion it makes sense to do the encryption at the block level, i.e. LUKS has to provide decrypted block devices before ZFS searches for its zpools. When ZFS native encryption is available on Linux this will be different, since you then have much finer control over what to encrypt and you keep all ZFS features.

So:

1. First decrypt LUKS (we are doing this in GRUB)

2. Then mount the zpool(s)
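On systemd (16.04), a sketch of that ordering as a drop-in override for the ZFS import unit (the file path is illustrative; adjust to whichever import unit your setup uses):

```
# /etc/systemd/system/zfs-import-cache.service.d/after-luks.conf
[Unit]
# Make sure LUKS volumes are open before ZFS scans for pools:
After=cryptsetup.target
Requires=cryptsetup.target
```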


Please stop spreading this misinformed statement. I assume you are referring to the ZFS ARC (Adaptive Replacement Cache). It works in much the same way as the regular Linux page cache. It does not take much more memory (if you disable prefetch) and will only use what is available/idle. We use Linux with ZFS on production systems with as little as 1 GB of memory. We stopped counting the times it has saved the day. :-)

ECC is nice to have, but ZFS does not have special requirements over, say, a regular page cache. The only difference is that ZFS will discover bit-flips instead of just ignoring them as ext4 or XFS would do.
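If memory is tight, the ARC can also be capped explicitly; a sketch, assuming ZFS on Linux (the 512 MiB figure is arbitrary):

```
# Temporarily, via the module parameter:
echo 536870912 > /sys/module/zfs/parameters/zfs_arc_max

# Persistently, applied at module load time:
echo "options zfs zfs_arc_max=536870912" >> /etc/modprobe.d/zfs.conf
```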


> ECC is nice to have.

Actually it seems ECC is important for ZFS filesystems see:

http://louwrentius.com/please-use-zfs-with-ecc-memory.html


To be clear, it is not ZFS that requires or even mandates ECC. Since ZFS uses data as present in memory and has checks for everything post that, it is prudent to have memory checks at the hardware level.

Thus, if one is using ZFS for data reliability, one ought to use ECC memory as well.


> Actually it seems ECC is important for ZFS filesystems see:

The way the previous comment was phrased tends to lead people to think ECC RAM is needed for ZFS specifically. As the blog post you link to points out, it's equally applicable to all filesystems.


It's not required, but it doesn't make sense to use ZFS but not to use ECC memory. That's the point. It's like locking the backdoor but leaving the front door wide open.


Interesting.

That's right the kind of hardware I was referring to: 1 GB of plain RAM. Honestly, I haven't tested ZFS yet for that reason; I've always read that ZFS has big requirements, so I refrained from trying it. It seems I should give it a try. ;)

Btrfs is another story; I've used it for years and I'd prefer not to have to use it anymore until it becomes "stable" and "performant". :)


FreeNAS != ZFS. The former is a specialised storage system that has to meet a very different set of criteria than a lightweight server with 1GB ram.


Is ZFS able to repair corruption when there is only a single copy of the data?

My main issue is being able to repair "silent" data corruption on a single-drive machine. Can I use x% of my "partition" for data repair, or do I need to use another partition/drive to mirror/RAID it?

If I understand right, ZFS can detect bitrot ("not really" a big deal) but without any redundant copy it can't self-heal.

My use case is an ARM A20 SoC (Lime2) used to store local backups among other things, so I need something that detects and repairs silent data corruption at rest by itself (using a single drive).

A poor man's NAS/server. ;)


Not sure if it will fit your needs or not, but for long-term storage on single HDs (and back in the day on DVDs), I would create par files with about 5-10% redundancy to guard against data loss due to bad sectors: http://parchive.sourceforge.net/ Total drive failure of course means loss of data, but the odd bad sector or corrupted bit would be correctable on a single disk. This was very popular back in the binary Usenet days...
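With par2cmdline that looks roughly like this (file names are made up):

```
# Create recovery data with 10% redundancy alongside the backups:
par2 create -r10 backups.par2 backup-*.tar

# Later: check integrity, and repair if bad sectors corrupted something:
par2 verify backups.par2
par2 repair backups.par2
```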


You can create a nested ZFS file system and set the number of copies of the various blocks to be two or more. This will take more space, but there'll be multiple copies of the same block of data.

Ideally, though, please add an additional disk and set it up as a mirror.

ZFS can detect silent data corruption during data access or during a zpool scrub (which can be run on a live production server). If there are multiple copies, ZFS can use one of the intact copies to repair the corrupted one.
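A sketch of the multiple-copies approach on a single disk (pool and dataset names are made up):

```
# Store every block of this dataset twice within the same pool:
zfs create tank/backups
zfs set copies=2 tank/backups   # applies to data written from now on

# Detect (and, with copies=2, usually repair) silent corruption:
zpool scrub tank
zpool status -v tank
```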


Got it, but not for my use case then, because I don't want to halve my storage capacity.

Anyway, I will try it on my main PC, which has several disks, and continue to use my solution for single-disk machines (laptop, VPS, SoC...). :)


Note it won't necessarily halve the capacity. Selectively enable it for the datasets requiring it, and avoid the overhead for the rest.


No, but parity archives solve a different problem: with only some percent of wasted storage you can survive bit errors in your dataset. It's like Reed-Solomon for files.

In order to achieve the same with ZFS you would have to run RAID-Z2 on sparse files.


We have been running ZFS on Linux in production since April 2015 on over 1500 instances in AWS EC2 with Ubuntu 14.04 and 16.04. Only one kernel panic observed so far, on a Jenkins/CI instance, but that was due to Jenkins doing magic on ZFS mounts, believing they were Solaris ZFS mounts.

In our opinion, when we made the switch, it was much more important to be able to trust the integrity of the data than to avoid any possible kernel panic.

