Of course, the first commenter "willy" repeats the canard that statelessness makes no sense:
> The very notion of a stateless filesystem is ridiculous. Filesystems exist to store state.
It's the protocol that's stateless, not the filesystem. I thought the article made a reasonable attempt to explain that.
Overall the article is reasonable, but it omits one of the big issues with NFSv2, which is synchronous writes. Those Sun NFS implementations were based on Sun's RPC system; the server was required not to reply until the write had been committed to stable storage. There was a mount option to disable this, but using it exposed you to data corruption. Certain vendors (SGI, if I recall correctly) at some point claimed their NFS was faster than Sun's, but it implemented asynchronous writes. This resulted in the expected arguments over protocol compliance and reliability vs. performance.
This phenomenon led to various hardware "NFS accelerator" solutions that put an NVRAM write cache in front of the disk in order to speed up synchronous writes. I believe Legato and the still-existing NetApp were based on such technology. Eventually the synchronous writes issue was resolved, possibly by NFSv3, though the details escape me.
NFS is basically the original S3. Both are useful for similar scenarios (maybe a slightly narrower subset for NFS, especially in later incarnations), and the semantics of both break down in similar ways.
I've always just presumed the development of EFS recapitulated the evolution of NFS, in many cases quite literally, considering the EFS protocol is a flavor of NFS. S3 buckets are just blobs with GUIDs in a flat namespace, which is literally what stateless NFS is--every "file" has a persistent UID (GUID if you assume host identifiers are unique), providing a simple handle for submitting idempotent block-oriented read and write operations. Theoretically, EFS could just be a fairly simple interface over S3, especially if you can implicitly wave away many of the caveats (e.g. wrt shared writes) by simply pointing out they have existed and mostly been tolerated in NFS environments for decades.
S3 and EFS actually are quite different. Files on EFS are update-able, rename-able and link-able (i.e. what's expected from a file system), while S3 objects are immutable once they are created. This comes from the underlying data structures: EFS uses inodes and directories, while S3 is more of a flat map.
Protocol-wise EFS uses standard NFS 4.1. We added some optional innovations outside the protocol that you can use through our mount helper (mount.efs). These include in-transit encryption with TLS (you can basically talk TLS to our endpoint and we will detect that automatically), and support for strong client auth using SigV4 over x509 client certificates.
> Will EFS be updated to use the NFS-TLS RFC once it settles down some?
I can't commit on a public forum for obvious reasons but we'll definitely take a serious look at this, especially when the Linux client starts supporting this. We did consult with the authors of that draft RFC earlier and it should be relatively easy for us to adopt this.
> It looked like work on this had stopped. Is there still hope that it might become a published RFC?
I don't know, I hope it will.
Not to go on too much of a tangent, and at the risk of sounding like my employer's fanboy, but one of the great things about working at AWS (I'm being honest, and yes we are hiring SDEs and PMs) is that we 100% focus on the customer. When our customers told us they needed encryption in transit, we figured out we could simply offer them transport-level TLS independent from the application-level RPC protocol. It may not have been the standards-compliant approach, but our customers have been enjoying fast reliable encryption for over 4 years now [1]. It solves a real problem because customers have compliance requirements.
"This one had to be paused for a bit to work out some issues around using a
wider type to hold the epoch value, to accommodate some DTLS-SCTP use cases
involving associations expected to remain up for years at a time.
https://github.com/tlswg/dtls13-spec/issues/249 ends up covering most of
the topics, though the discussion is a bit jumbled.
We have a proposed solution with almost all the signoffs needed, and should
be attempting to confirm this approach at the session at IETF 112 next
week...
"I'm sorry that these have been taking so long; these delays were
unexpected."
The sibling comment is correct. The EFS mount helper starts up and manages an stunnel process. We have not seen a significant impact on latency from the stunnel process.
Yes, Legato's first product was the Prestoserve NFS accelerator card (in 1989!).[0] NetApp's implementation mirrored the cache across two servers in a cluster with an interconnect.
NFSv3 "fixed" the write issue by adding a separate COMMIT RPC:
"The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns. This specification defines a method for committing these unsafe writes to stable storage in a reliable way."[1]
“Why NFS Sucks” (2006), picking on a protocol that was over 20 years old at that point. Also cites “The Unix-Haters Handbook” in the abstract. Two strikes against its credibility already.
However, I did skim the paper, and it seems halfway reasonable, so I suppose I should read the whole thing. Of course nothing is above criticism, and there are many valid criticisms of NFS; but leading with “sucks” is just lazy.
If you think that was bad, just listen to what Theo de Raadt had to say.
"NFSv4 is a gigantic joke on everyone....NFSv4 is not on our roadmap. It is a ridiculous bloated protocol which they keep adding crap to. In about a decade the people who actually start auditing it are going to see all the mistakes that it hides.
"The design process followed by the NFSv4 team members matches the methodology taken by the IPV6 people. (As in, once a mistake is made, and 4 people are running the test code, it is a fact on the ground and cannot be changed again.) The result is an unrefined piece of trash."
Trash or not, the demand for the features is there. OpenBSD enjoys the luxury of simply telling people who need more sophisticated features to piss-off, at least until the time a protocol or interface has been hashed out and settled into a static target.
Notably, OpenBSD has an IPv6 and IPSec (including IKE) stack second to none. If OpenBSD developers actually had a need for the features provided by NFSv4, I'm sure OpenBSD would have an exceptionally polished and refined--at least along the dimensions they care about--implementation. But they don't. What they do have are relatively well-maintained NFSv3 and YP stacks (not even NIS!), because those things are important to Theo, especially for (AFAIU) maintaining the build farm and related project infrastructure.
YP is NIS. It was renamed by Sun due to the trademark on Yellow Pages. Maybe you're thinking of NIS+ (which was an abomination). TBH, they are both horrible for their own reasons.
I also had in mind that OpenBSD deliberately and rigorously only refers to "YP" ("Yellow Pee"). Google "OpenBSD" and "NIS" and most of the hits you'll see directly from the OpenBSD project are from commit logs for patches removing accidental usages of "NIS" in initial YP-related feature commits. I'm not quite sure why they do that. I've kind of assumed it's to make clear that they have little interest in addressing vendor compatibility issues, and to emphasize that YP support, such as it is, is narrowly tailored to supporting the needs of the OpenBSD project itself. That's quite different from IPv6, IPSec/IKE, and even NFSv3, where cross-vendor interoperability is a concern (within reason).
Speaking of YP (which I always thought sounded like a brand of moist baby poop towelettes), BSD, wildcard groups, SunRPC, and Sun's ingenuous networking and security and remote procedure call infrastructure, who remembers Jordan Hubbard's infamous rwall incident on March 31, 1987?
>On March 31, 1987 Hubbard executed an rwall command expecting it to send a message to every machine on the network at University of California, Berkeley, where he headed the Distributed Unix Group. The command instead began broadcasting Hubbard's message to every machine on the internet and was stopped after Hubbard realised the message was being broadcast remotely after he received complaints from people at Purdue University and University of Texas. Even though the command was terminated, it resulted in Hubbard receiving 743 messages and complaints, including one from the Inspector General of ARPAnet.
I was logged in on my Sun workstation "tumtum" when it happened, so I received his rwall too, and immediately sent him a humorous email with the subject of "flame flame flame" which I've lost in the intervening 35 years, but I still have a copy of his quick reply:
From: Jordan K. Hubbard <jkh%violet.Berkeley.EDU@berkeley.edu>
Date: Tue, Mar 31, 1987, 11:02 PM
To: Don Hopkins <don@tumtum.cs.umd.edu>
Subject: re: flame flame flame
Thanks, you were nicer than most.. Here's the stock letter I've been
sending back to people:
Thank you, thank you..
Now if I can only figure out why a lowly machine in a basement somewhere
can send broadcast messages to the entire world. Doesn't seem *right*
somehow.
Yours for an annoying network.
Jordan
P.S. I was actually experimenting to see exactly now bad a crock RPC was.
I'm beginning to get an idea. I look forward to your flame.
Jordan
Here's the explanation he sent to hackers_guild, and some replies from old net boys like Milo Medin (who reported that Dennis G. Perry, the program manager of the Arpanet in DARPA's Information Science and Technology Office, said they would kick UCB off the Arpanet if it ever happened again), Mark Crispin (who presciently proposed cash rewards for discovering and disclosing security bugs), and Dennis G. Perry himself:
From: Jordan K. Hubbard <jkh%violet.Berkeley.EDU@berkeley.edu>
Date: April 2, 1987
Subject: My Broadcast
By now, many of you have heard of (or seen) the broadcast message I sent to
the net two days ago. I have since received 743 messages and have
replied to every one (either with a form letter, or more personally
when questions were asked). The intention behind this effort was to
show that I wasn't interested in doing what I did maliciously or in
hiding out afterwards and avoiding the repercussions. One of the
people who received my message was Dennis Perry, the Inspector General
of the ARPAnet (in the Pentagon), and he wasn't exactly pleased.
(I hear his Interleaf windows got scribbled on)
So now everyone is asking: "Who is this Jordan Hubbard, and why is he on my
screen??"
I will attempt to explain.
I head a small group here at Berkeley called the "Distributed Unix Group".
What that essentially means is that I come up with Unix distribution software
for workstations on campus. Part of this job entails seeing where some of
the novice administrators we're creating will hang themselves, and hopefully
prevent them from doing so. Yesterday, I finally got around to looking
at the "broadcast" group in /etc/netgroup which was set to "(,,)". It
was obvious that this was set up for rwall to use, so I read the documentation
on "netgroup" and "rwall". A section of the netgroup man page said:
...
Any of three fields can be empty, in which case it signifies
a wild card. Thus
universal (,,)
defines a group to which everyone belongs. Field names that ...
...
Now "everyone" here is pretty ambiguous. Reading a bit further down, one
sees discussion on yellow-pages domains and might be led to believe that
"everyone" was everyone in your domain. I know that rwall uses point-to-point
RPC connections, so I didn't feel that this was what they meant, just that
it seemed to be the implication.
Reading the rwall man page turned up nothing about "broadcasts". It doesn't
even specify the communications method used. One might infer that rwall
did indeed use actual broadcast packets.
Failing to find anything that might suggest that rwall would do anything
nasty beyond the bounds of the current domain (or at least up to the IMP),
I tried it. I knew that rwall takes awhile to do its stuff, so I left
it running and went back to my office. I assumed that anyone who got my
message would let me know.. Boy, was I right about that!
After the first few mail messages arrived from Purdue and Utexas, I begin
to understand what was really going on and killed the rwall. I mean, how
often do you expect to run something on your machine and have people
from Wisconsin start getting the results of it on their screens?
All of this has raised some interesting points and problems.
1. Rwall will walk through your entire hosts file and blare at anyone
and everyone if you use the (,,) wildcard group. Whether this is a bug
or a feature, I don't know.
2. Since rwall is an RPC service, and RPC doesn't seem to give a damn
who you are as long as you're root (which is trivial to be, on a work-
station), I have to wonder what other RPC services are open holes. We've
managed to do some interesting, unauthorized, things with the YP service
here at Berkeley, I wonder what the implications of this are.
3. Having a group called "broadcast" in your netgroup file (which is how
it comes from sun) is just begging for some novice admin (or operator
with root) to use it in the mistaken belief that he/she is getting to
all the users. I am really surprised (as are many others) that this has
taken this long to happen.
4. Killing rwall is not going to solve the problem. Any fool can write
rwall, and just about any fool can get root priviledge on a Sun workstation.
It seems that the place to fix the problem is on the receiving ends. The
only other alternative would be to tighten up all the IMP gateways to
forward packets only from "trusted" hosts. I don't like that at all,
from a standpoint of reduced convenience and productivity. Also, since
many places are adding hosts at a phenominal rate (ourselves especially),
it would be hard to keep such a database up to date. Many perfectly well-
behaved people would suffer for the potential sins of a few.
I certainly don't intend to do this again, but I'm very curious as to
what will happen as a result. A lot of people got wall'd, and I would think
that they would be annoyed that their machine would let someone from the
opposite side of the continent do such a thing!
Jordan Hubbard
jkh@violet.berkeley.edu (ucbvax!jkh)
Computer Facilities & Communications.
U.C. Berkeley
From: Milo S. Medin <medin@orion.arpa>
Date: Apr 6, 1987, 5:06 AM
Actually, Dennis Perry is the head of DARPA/IPTO, not a pencil pusher
in the IG's office. IPTO is the part of DARPA that deals with all
CS issues (including funding for ARPANET, BSD, MACH, SDINET, etc...).
Calling him part of the IG's office on the TCP/IP list probably didn't
win you any favors. Coincidentally I was at a meeting at the Pentagon
last Thursday that Dennis was at, along with Mike Corrigan (the man
at DoD/OSD responsible for all of DDN), and a couple other such types
discussing Internet management issues, when your little incident
came up. Dennis was absolutely livid, and I recall him saying something
about shutting off UCB's PSN ports if this happened again. There were
also reports about the DCA management types really putting on the heat
about turning on Mailbridge filtering now and not after the buttergates
are deployed. I don't know if Mike St. Johns and company can hold them
off much longer. Sigh... Mike Corrigan mentioned that this was the sort
of thing that gets networks shut off. You really pissed off the wrong
people with this move!
Dennis also called up some VP at SUN and demanded this hole
be patched in the next release. People generally pay attention
to such people.
Milo
From: Mark Crispin <MRC%PANDA@sumex-aim.stanford.edu>
Date: Mon, Apr 6, 1987, 10:15 AM
Dan -
I'm afraid you (and I, and any of the other old-timers who
care about security) are banging your head against a brick wall.
The philsophy behind Unix largely seems quite reminiscent of the
old ITS philsophy of "security through obscurity;" we must
entrust our systems and data to a open-ended set of youthful
hackers (the current term is "gurus") who have mastered the
arcane knowledge.
The problem is further exacerbated by the multitude of slimy
vendors who sell Unix boxes without sources and without an
efficient means of dealing with security problems as they
develop.
I don't see any relief, however. There are a lot of
politics involved here. Some individuals would rather muzzle
knowledge of Unix security problems and their fixes than see them
fixed. I feel it is *criminal* to have this attitude on the DDN,
since our national security in wartime might ultimately depend
upon it. If there is such a breach, those individuals will be
better off if the Russians win the war, because if not there will
be a Court of Inquiry to answer...
It may be necessary to take matters into our own hands, as
you did once before. I am seriously considering offering a cash
reward for the first discoverer of a Unix security bug, provided
that the bug is thoroughly documented (with both cause and fix).
There would be a sliding cash scale based on how devastating the
bug is and how many vendors' systems it affects. My intention
would be to propagate the knowledge as widely as possible with
the express intension of getting these bugs FIXED everywhere.
Knowledge is power, and it properly belongs in the hands of
system administrators and system programmers. It should NOT be
the exclusive province of "gurus" who have a vested interest in
keeping such details secret.
-- Mark --
PS: Crispin's definition of a "somewhat secure operating system":
A "somewhat secure operating system" is one that, given an
intelligent system management that does not commit a blunder that
compromises security, would withstand an attack by one of its
architects for at least an hour.
Crispin's definition of a "moderately secure operating system": a
"moderately secure operating system" is one that would withstand
an attack by one of its architects for at least an hour even if
the management of the system are total idiots who make every
mistake in the book.
-------
From: Dennis G. Perry <PERRY@vax.darpa.mil>
Date: Apr 6, 1987, 3:19 PM
Jordan, you are right in your assumptions that people will get annoyed
that what happened was allowed to happen.
By the way, I am the program manager of the Arpanet in the Information
Science and Technology Office of DARPA, located in Roslin (Arlington), not
the Pentagon.
I would like suggestions as to what you, or anyone else, think should be
done to prevent such occurances in the furture. There are many drastic
choices one could make. Is there a reasonable one? Perhaps some one
from Sun could volunteer what there action will be in light of this
revelation. I certainly hope that the community can come up with a good
solution, because I know that when the problem gets solved from the top
the solutions will reflect their concerns.
Think about this situation and I think you will all agree that this is
a serious problem that could cripple the Arpanet and anyother net that
lets things like this happen without control.
dennis
-------
OK, I read the Olaf Kirch article, and the "NFS Sucks" title is mostly clickbait. There are indeed a bunch of shortcomings in NFS that he points out, that are partially addressed by NFSv4. He also admits that (as of 2006) there isn't anything better.
Locking has historically always been a problem in NFS. Kirch mentions that NLM was designed for Posix semantics only. I frankly don't know if NLM is related to `rpc.lockd`, which appeared in SunOS 4 and possibly even SunOS 3 (the mid 1980s at this point), well before anything having to do with Posix. Part of the problem is the confused state of file locking in the Unix world, even for local files. There were BSD-style `flock` and SYSV-style `lockf`, and there may even have been multiple versions of those. Implementing these in a distributed system would have been terribly complex. Even at Sun, at least through the mid 1990s, the conventional wisdom was to avoid file locking. If you really needed something that supported distributed updates, it was better to use a purpose-built network protocol.
One thing "willy" got right in his comment is that NFS is an example of "worse is better". In its early version, it had the benefit of being relatively simple, as acknowledged in the LWN article. This made it easy to port and reimplement and thus it became widespread.
Of course being simple means there are lots of tradeoffs and shortcomings. To address these you need to make things more complex, and now things are "ridiculous" and "bloated". Oh well.
The funny thing is that we're running our large NFS servers on OmniOS (with Linux NFS clients), as plugins for a certain large PHP-based blogging platform love to sprinkle LOCK_EX flocks all over the place.
Sadly, with a Linux NFS server, lock state eventually corrupts itself to extinction, but OmniOS can tick along past 300 days of uptime without a problem.
Of course... these issues only show up under production levels of load, and have never been distilled into a reproducible test case.
As others have said I think the title was more clickbait-y. Olaf was the person who wrote the in-kernel NFS server so I think he at least appreciated NFS somewhat. The paper makes a few reasonable criticisms many of which are addressed now.
My favorite criticism from that paper is that NFS clients reused the source port so that the server can detect whether a new connection is the same client or not. This confuses stateful packet filtering on the network because both connections now have the same 5-tuple and packets on the new connection can look like out of window packets on the old connection. This can get connections blackholed depending on the network. This was fixed a few years ago in the Linux client for NFS v4.1, since that version of the protocol already has a different way to identify clients. Before this was fixed, EFS had to document a workaround.
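For anyone who hasn't seen it, the client behavior in question is simply binding a fixed local port before reconnecting; a minimal sketch of why that collides with connection tracking (not the actual kernel client code; port 665 is just an arbitrary example):

    import socket

    def reconnect_to_nfs(server, src_port=665):
        # NFS clients traditionally bind a reserved source port (< 1024, so
        # this needs root). Reusing the same source port for every reconnect
        # means the old and new TCP connections share an identical 5-tuple,
        # so a middlebox still holding state for the old connection can treat
        # the new connection's packets as out-of-window and drop them.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", src_port))
        s.connect((server, 2049))
        return s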
Stewart, I think "sucks" is a pretty fair description of a protocol that actually trusted the client to tell the server what its host name is, before the server checked that the host name appears in /etc/exports, without verifying the client's ip address. On a system that makes /etc/exports easily publicly readable via tftp by default.
Did you find my criticism of how X-Windows sucks in the Unix Haters Handbook as unfair and un-credible and lazy as you found the book's criticism of NFS? Or my criticism of OLWM and XBugTool, which also both sucked?
Did you ever fix those high priority OLWM bugs I reported with XBugTool that OLWM unnecessarily grabbed the X11/NeWS server all the time and caused the input queue to lock up so you couldn't do anything for minutes at a time? And that related bug caused by the same grab problem that the window system would freeze if you pressed the Help key while resizing a window? Or manage to get OLWM's showcase Open Look menus to pin up without disappearing for an instant then reappearing in a different place, with a totally different looking frame around it, and completely different mouse tracking behavior? That unnecessary song and dance completely ruined the "pinned menu" user experience and pinned menu metaphor's illusion that it was the same menu before and after pinning. While TNT menus simply worked and looked perfectly and instantly when you pinned them, because it WAS the same menu after you pinned it, so it didn't have to flicker and change size and location and how it looked and behaved. Ironically, the NeWS Toolkit was MUCH better at managing X11 windows than the OLWM X11 window manager ever was, because our NeWS based X11 window manager "OWM" was deeply customizable and had a lot more advanced features like multiple rooms, scrolling virtual desktops, tabbed windows supporting draggable tabs on all four edges, resize edges, custom resize rubber banding animation, and pie menus, as well. It also never grabbed and froze the window server, and it took a hell of a lot less time and resources to develop than OLWM, which never lifted a finger to support TNT the way TNT bent over backwards to support X11.
NeWS Tab Window Demo -- Demo of the Pie Menu Tab Window Manager for The NeWS Toolkit 2.0. Developed and demonstrated by Don Hopkins:
>I39L window management complicates pinned menus enormously. TNT menus pin correctly, so that when you push the pin in, the menu window simply stays up on the screen, just like you'd expect. This is not the case with XView or even OLWM. Under an I39L window manager, the Open Look pinned menu metaphor completely breaks down. When you pin an X menu, it dissappears from the screen for an instant, then comes back at a different place, at a different size, with a different look and feel. If you're not running just the right window manager, pinned menus don't even have pins! There is no need for such "ICCCM compliant" behavior with TNT menus. When they're pinned, they can just stay there and manage themselves. But were TNT windows managed by an external I39L window manager, they would have to degenerate to the level of X menus.
>I could go on and on, but I just lost my wonderful xbugtool, because I was having too much fun way too fast with those zany scrolling lists, so elmer the bug server freaked out and went off to la-la land, causing xbugtool to lock the windows and start "channeling", at the same time not responding to any events, so when I clicked on the scrolling list, the entire window system froze up and I had to wait for the input queue lock to break, but by the time the lock finally broke (it must have been a kryptonite), xbugtool had made up its mind, decided to meet its maker, finished core dumping, and exited with an astoundingly graceful thrash, which was a totally "dwim" thing for it to do at the time, if you think about it with the right attitude, since I had forgotten what I wanted to file a bug against in the first place anyway, and totally given up the idea of ever using bugtool to file a bug against itself, because bugtool must be perfect to have performed so splendidly!
From the news-makers archive:
>From: Skip Montanaro <crdgw1!montnaro@uunet.uu.net> Date: Feb 16, 1990
>Charles Hedrick writes concerning XNeWS problems. I have a couple of
comments on the XNeWS situation.
>The olwm/pswm interface appears (unfortunately) to be stable as far as Sun
is concerned. During XNeWS beta testing I complained about the lack of
function key support, but was told it was an OpenLook design issue. (NeWS1.1
supported function keys, and you could do it in PostScript if you like.) Sun
likes to tout how OpenLook is standard, and was designed by human factors
types. As far as I'm concerned, nobody has had enough experience with good
user interfaces to sit down and write a (horribly large, hard-to-read) spec
from which a window manager with a "good" look-and-feel will be created. I'm
convinced you still have to experiment with most user interfaces to get them
right.
>As a simple example, consider Don Hopkins' recent tabframes posting. An
extra goody added in tabframes is the edge-stretch thingies in the window
borders. You can now stretch one edge easily, without inadvertently
stretching the other edge connected to your corner-stretch thingie. Why did
the OpenLook designers never think of this? SunView had that basic
capability, albeit without visible window gadgetry. It wasn't like the idea
was completely unheard of.
>I agree that running the XNeWS server with an alternate window manager is a
viable option. Before I got my SPARCStation I used XNeWS in X11ONLY mode
with gwm, which was the only ICCCM-compliant window manager I had available
to me at the time. If you choose to use twm with XNeWS, I recommend you at
least try the X11R4 version.
>From:
William McSorland - Sun UK - Tech Support <will@willy.uk> Date: May 14, 1991 Subject: 1059370: Please evaluate
>Bug Id: 1059370
Category: x11news
Subcategory: olwm
Bug/Rfe: rfe
Synopsis: OLWM does a Server Grab while the root menu is being displayed.
Keywords: select, frame_busy, presses, left, mouse, server, grabbed
Severity: 5
Priority: 5
Description:
>Customer inisisted on having this logged as a RFE and so:-
>When bringing up the root menu inside OW2.0 the window manager
does a Server Grab hence forcing all its client applications
output to be queued by the server, but not displayed.
>The customer recommends that this should be changed to
make olwm more friendly.
>Apparently a number of other window managers don't do
a server grab while the root menu is being displayed.
>From: Don Hopkins <hopkins@sun.com> Subject: 1059974: Bug report created
>Bug Id: 1059974
Category: x11news
Subcategory: server
Bug/Rfe: bug
Synopsis: I have no mouse motion and my input focus is stuck in xbugtool!!!
Keywords: I have no mouth and I must scream [Harlan Ellison]
Severity: 1
Priority: 1
Description:
>This is my worst nightmare! None of my TNT or XView applications are
getting any mouse motion events, just clicks. And my input focus is
stuck in xbugtool, of all places!!! When I click in cmdtool, it gets
sucked back into xbugtool when I release the button! And I'm not using
click-to-type! I can make selections from menus (tnt, olwm, and xview) if
I click them up instead of dragging, but nobody's receiving any mouse
motion!
>I just started up a fresh server, ran two jets and a cmdtool, fired up
a bugtool from one of the jets (so input focus must have been working
then), and after xbugtool had throbbed and grunted around for a while
and finally put up its big dumb busy window, I first noticed something
was wrong when I could not drag windows around!
>Lucky thing my input focus ended up stuck in xbugtool!
>The scrollbar does not warp my cursor either... I can switch the input focus
to any of xbugtool's windows, but I can't ... -- oomph errrgh aaaaahhh! There, yes!
>Aaaaah! What a relief! It stopped! I can move my mouse again!!
Hurray!!! It started working when I opened a "jet" window, found I
could type into it, so I moved the mouse around, the cursor
disappeared, I typed, there were a couple of beeps, I still couldn't
find the cursor, so I hit the "Open" key, the jet closed to an icon,
and I could type to xbugtool again! And lo and behold now I can type
into the cmdtool, too! Just by moving my cursor into it! What a
technological wonder! Now I can start filing bug reports against
cmdtool, which was the only reason I had the damn thing on my screen in
the first place!!! I am amazed at the way the window system seems to
read my mind and predict my every move, seeming to carry out elaborate
practical jokes to prevent me from filing bugs against it. I had no
idea the Open Windows desktop had such sophisticated and well
integrated interclient communication!
>From: Don Hopkins <hopkins@sun.com> Subject: 1059976: Bug report created Date: May 21, 1991
>Bug Id: 1059976
Category: x11news
Subcategory: olwm
Bug/Rfe: bug
Synopsis: OLWM menus are inconsistant with the rest of the desktop,
Keywords: pinned menus, defaults, tracking, inconsistant look and feel, yet another open look toolkit
Severity: 2
Priority: 2
Description:
>You can't set the default of a pinned menu by holding down the control key
and clicking over an item.
>Pressing the middle button over the default of a pinned menu erases the
default ring.
>You can't set the default of a unpinned menu by pressing the control key then
the popping it up by pressing the MENU button on the mouse.
>When you're tracking a menu, and press the control key, the highlighting
changes properly, from depressed to undepressed with a default ring, but
when you release the control key before releasing the MENU button on
the mouse, the highlighting does not change back to depressed without a
default ring. Instead it stays the same, then changes to un-depressed
without a default ring at the next mouse movement, and you have to move
out and back into the menu item to see it depressed again.
>When you're dragging over a menu, then press the control key to set the
default, then release the mouse button to make a selection, without
releasing the control key, OLWM menus are stuck in default-setting mode,
until the next time it sees a control key up transition while it is
tracking a menu.
>Clicking the SELECT button on the abbreviated menu button on the upper
left corner of the window frame (aka the close box or shine mark)
should select the frame menu default, instead of always closing the
window.
>The tracking when you press and release the control key over a menu pin
is strange, as well. Push-pins as menu defaults are a dubious concept, and
the HIT people should be consulted for an explaination or a correction.
>When you press the menu button in a submenu, it does not set the default of
the menu that popped up the submenu, the way XView and TNT menus do. This
behaviour also needs clarification from the HIT team.
>Pinned OLWM menus do not track the same way as unpinned menus. When you
press down over an item, and drag to another item, the highlighting does
not follow the mouse, instead the original item stays highlighted. The item
un-highlights when the mouse exits it, but the menu highlighting should track
the item underneath the cursor when the user is holding the mouse button down,
just like they do with a non-pinned menu. The current behavior misleads you
that the item would be selected if the button were released even though the
cursor is not over the menu item, and is very annoying when you press down over
a pinned menu item and miss the one you want, and expect to be able to simply
drag over to the one you meant to hit.
>If we are crippling our menus this way on purpose, because we are
afraid Apple is going to sue us, then Apple Computer has already done
damage to Sun Microsystems without even paying their lawyers to go to
court. We licensed the technology directly from Xerox, and we should
not make our users suffer with an inferior interface because we are
afraid of boogey-men.
>In general, OLWM is yetanother OpenLook toolkit, and its menus are
unlike any other menus on the desktop. This is a pity because the user
interacts so closely with OLWM, and the conspicuous inconsistancy
between the window manager and the Open Look applications that it
frames leads people to expect inconsistancy and gives the whole system
a very unreliable, unpredictable feel.
>Status: Desktop integration issue. Marked in bug traq as evaluated.
>This is an X-and-NeWS integration issue and is a terribly complicated problem. X11 does not expect that server-internal locking will ever time out. X11 has similar situations to the one mentioned in the bug report where a client grabs the server and then a passive grab triggers. And it works fine. The difference is that the effect of the passive grab doesn't time-out, thereby causing an inconsistent state.
>One possibility cited for a fix is to change the server to stop distributing events to synchronous NeWS interests while an X client has the server grabbed. But this might only result in moving the problem around and not in solving the real problem.
>According to Stuart Marks, olwm could grab the keyboard and mouse before grabbing the server and that might get around this particular problem.
>ACTION: Stuart Marks should be supported in making this change.
You've posted two extremely long posts about this here. If you helped write the Unix Hater's Handbook, that makes this argument older than a decent chunk of the people here, myself included. That's an impressively long time to hold a grudge.
The topic of this discussion is "NFS: The Early Years", so if the Unix Haters Handbook is older than you are, then the topic of this discussion, The Early Years of NFS, is even older still.
That's an impressively long time for anyone born and working professionally for Sun Microsystems before The Early Years of NFS to hold the incorrect opinion that NFS doesn't suck. ;) So when smarks makes the provably false claim that NFS doesn't suck, and accuses me of being "lazy" for disagreeing with that, I'm glad I was diligent enough to keep the receipts, and generous enough to share them.
I just don't like being called "lazy" for saying "NFS Sucks" by the same guy whose window manager was so lazy it unnecessarily grabbed the X11 server all the time and locked up the window system for minutes at a time, and whose menus flickered and moved and resized and drew and tracked differently when you pinned them, since I've fairly and un-lazily written in great detail about NFS and other Sun security issues numerous times, and un-lazily implemented OPEN LOOK menus and a TNT X11/NeWS window manager that didn't suffer from all those usability problems.
Speaking of lazy menus: Correctly implemented Open Look pinned menus actually had to support two identical hot-synced menus existing and updating at the same time, in case you pinned a menu, then popped the same menu up again from the original location. The TNT menus would lazily create a second popup menu clone only when necessary (when it was already pinned and you popped it up again), and correctly supported tracking and redrawing either menu, setting the menu default item with the control key and programmatically changing other properties by delegating messages to both menus, so it would redraw the default item ring highlighting on both menus when you changed the default, or any other property.
Object oriented programming in The NeWS Toolkit was a lot more like playing with a dynamic Smalltalk interpreter, than pulling teeth with low level X11 programming in C with a compiler and linker plowing through mountains of include files and frameworks, so it was actually interactively FUN, instead of excruciatingly painful, and we could get a lot more work done in the same amount of time than X11 programmers.
Consequently, TNT had a much more thorough and spec-consistent implementation of pinned menus than OLWM, XView, OLIT, or MOOLIT, because NeWS was simply a much better window system than X11, and we were not lazy and didn't choose to selectively ignore or reinterpret the more challenging parts of the Open Look spec, like the other toolkits did because X-Windows and C made life so difficult.
See the comments in the "Clone" method and refs to the "PinnedCopy" instance variable in the PostScript TNT menu source code:
% Copy this menu for pinning. Factored out to keep the pinning code
% easier to read. The clone has a few important differences, such as
% no pin or label regardless of the pin/label of the original, but is
% otherwise as close a copy as we can manage.
> Object oriented programming in The NeWS Toolkit was a lot more like playing with a dynamic Smalltalk interpreter, than pulling teeth with low level X11 programming in C
Funnily enough I've been writing a pure-Smalltalk X11 protocol implementation recently, for Squeak, and it starts to have some of the feel you describe. It generates Smalltalk code from the XML xcbproto definitions. It's at the point now where you can send X11 requests interactively in a Workspace, etc., which is fun ("playing with a dynamic Smalltalk interpreter"), and I'm working on integrating it with Morphic. Anyway, thought you might enjoy the idea.
The NFS protocol wasn't just stateless, but also securityless!
Stewart, remember the open secret that almost everybody at Sun knew about, in which you could tftp a host's /etc/exports (because tftp was set up by default in a way that left it wide open to anyone from anywhere reading files in /etc) to learn the names of all the hosts a server allowed to mount its file systems, and then in a root shell simply go "hostname foo ; mount remote:/dir /mnt ; hostname `hostname`" to temporarily change the CLIENT's hostname to the name of a host that the SERVER allowed to mount the directory, then mount it (claiming to be an allowed client), then switch it back?
That's right, the server didn't bother checking the client's IP address against the host name it claimed to be in the NFS mountd request. That's right: the protocol itself let the client tell the server what its host name was, and the server implementation didn't check that against the client's ip address. Nice professional protocol design and implementation, huh?
Yes, that actually worked, because the NFS protocol laughably trusted the CLIENT to identify its host name for security purposes. That level of "trust" was built into the original NFS protocol and implementation from day one, by the geniuses at Sun who originally designed it. The network is the computer is insecure, indeed.
And most engineers at Sun knew that (and many often took advantage of it). NFS security was a running joke, thus the moniker "No File Security". But Sun proudly shipped it to customers anyway, configured with terribly insecure defaults that let anybody on the internet mount your file system. (That "feature" was undocumented, of course.)
While I was a summer intern at Sun in 1987, somebody at Sun laughingly told me about it, explaining that was how everybody at Sun read each other's email. So I tried it out by using that technique to mount remote NFS directories from Rutgers, CMU, and UMD onto my workstation at Sun. It was slow but it worked just fine.
I told my friend Ron Natalie at Rutgers, who was Associate Director of CCIS at the time, that I was able to access his private file systems over the internet from Sun, and he rightfully freaked out, because as a huge Sun customer in charge of security, nobody at Sun had ever told him about how incredibly insecure NFS actually was before, despite all Sun's promises. (Technically I was probably violating the terms of my NDA with Sun by telling him that, but tough cookies.)
For all Sun's lip service about NFS and networks and computers and security, it was widely known internally at Sun that NFS had No File Security, which was why it was such a running inside joke. Yet Sun knowingly shipped it to customers with flagrantly terrible defaults, and didn't care to tell anyone who followed their advice and used their software that they were leaving their file systems wide open.
Here is an old news-makers email from Ron from Interop88 that mentions mounting NFS directories over the internet. By then I'd told him about NFS's complete lack of security, so he'd probably somewhat secured his own servers by overriding the tftp defaults, and he was able to mount the directory because he remembered one of the host names in /etc/exports and didn't need to fetch the file with tftp to discover it:
>From: Ron Natalie <elbereth.rutgers.edu!ron.rutgers.edu!ron@rutgers.edu>
Date: Wed, Oct 5, 1988, 4:09 AM
To: NeWS-makers@brillig.umd.edu
>I love a trade show that I can walk into almost any booth and
get logged in at reasonable speed to my home machine. One
neat experiment was that The Wollongong Group provided a Sun
3/60C for a public mail reading terminal. It was lacking a
windowing system, so I decided to see if I could start up NeWS
on it. In order to do that, I NFS mounted the /usr partition
from a Rutgers machine and Symlinked /usr/NeWS to the appropriate
directory. This worked amazingly well.
>(The guys from the Apple booth thought that NeWS was pretty neat,
I showed them how to change the menus by just editing the user.ps
file.)
>DonHopkins on Sept 28, 2019, on: A developer goes to a DevOps conference
>I love the incredibly vague job title "Member, Technical Staff" I had at Sun. It could cover anything from kernel hacking to HVAC repair!
>At least I had root access to my own workstation (and everybody else's in the company, thanks to the fact that NFS actually stood for No File Security).
>[In the late 80's and early 90's, NFSv2 clients could change their hostname to anything they wanted before doing a mount ("hostname foobar; mount server:/foobar /mnt ; hostname original"), and that name would be sent in the mount request, and the server trusted the name the client claimed to be without checking it against the ip address, then looked it up in /etc/exports, and happily returned a file handle.
>If the NFS server or any of its clients were on your local network, you could snoop file handles by putting your ethernet card into promiscuous mode.
>And of course NFS servers often ran TFTP servers by default (for booting diskless clients), so you could usually read an NFS server's /etc/exports file to find out what client hostnames it allowed, then change your hostname to one of those before mounting any remote file system you wanted from the NFS server.
>And yes, TFTP and NFS and this security hole you could drive the space shuttle through worked just fine over the internet, not just the local area network.]
Sun's track record on network security isn't exactly "stellar" and has "burned" a lot of people (pardon the terrible puns, which can't hold a candle to IBM's "Eclipse" pun). The other gaping security hole at Sun I reported was just after the Robert T Morris Worm incident, as I explained to Martha Zimet:
>Oh yeah, there was that one time I accidentally hacked sun.com’s sendmail server, the day after the Morris worm.
>The worm was getting in via sendmail’s DEBUG command, which was usually enabled by default.
>One of the first helpful responses that somebody emailed around was a suggestion for blocking the worm by editing your sendmail binary, searching for DEBUG, and replacing the D with a NULL character.
>Which the genius running sun.com apparently did.
>That had the effect of disabling the DEBUG command, but enabling the zero-length string command!
>So as I often did, I went “telnet sun.com 25” to EXPN some news-makers email addresses that had been bouncing, and first hit return a couple of times to flush the telnet negotiation characters it sends, so the second return put it in debug mode, and the EXPN returned a whole page full of diagnostic information I wasn’t expecting!
>I reported the problem to postmaster@sun.com and they were like “sorry oops”.
An unpopular opinion, but NFS is super handy and useful on reliable private networks with centralised authentication. Sure, it has its downsides that are being worked on in newer versions of the protocol (4+) with additional complexity, but it sure is useful in closely controlled setups like HPC clusters.
I ran an HPC cluster for a university, and relied upon good old NFSv3 for shared file storage (both home directories and research datasets). In addition, I also built out a big set of software, compiled on one server and made available to the entire cluster via a read-only NFS mount point. The whole thing works reliably without any hiccups whatsoever. To overcome the limitations of authentication and authorisation with NFS storage, we use a centralised FreeIPA server that allows all machines in the cluster to have the same UID/GID mapping everywhere.
As the cream on top, the storage we expose over NFS is ZFS, which integrates nicely with NFS.
Update 1:
Yes, data security is a bit of an afterthought with NFS, as anybody on my network with physical access can mount my central storage on another server and access the data as long as they can recreate the UID/GID locally... but if I let someone do that, I have bigger problems to deal with first.
> I ran an HPC cluster for a university, and relied upon good old NFSv3 for shared file storage (both home directories and research datasets).
Used in lots of places if they don't want to go GPFS, Lustre, maybe CephFS nowadays. Dell-EMC Isilon is used in lots of places for NFS (and SMB): I worked at a place that had >10PB in one file system/namespace (each node both serves traffic and has disk/flash, replicated over a back-end).
> […] we use a centralised FreeIPA server that allows all machines in the cluster to have the same UID/GID mapping everywhere.
(Open)LDAP is still very handy as well and used in many places. (AD is technically LDAP+Kerberos.)
Pretty much the same setup we run at the university I work for, though for the whole department instead of just one cluster. The combination of ZFS exporting shares with its built-in server and controlling autofs mounts from FreeIPA makes it a pretty easy-to-use system.
Out of curiosity, did you ever try Kerberized NFS for extra security? We tested it out a while back (and still use it in some small circumstances) but never got it stable enough for production use.
Side-note: I wouldn't be surprised if LDAP+NFS is still pretty common across universities, either as a holdover from Sun days or just out of practicality.
We (well, the IT group at a previous job) used kerberized NFS with Ubuntu (16.04 and 18.04 IIRC) and netapp filers, worked fine.
> Side-note: I wouldn't be surprised if LDAP+NFS is still pretty common across universities, either as a holdover from Sun days or just out of practicality.
Yes, absolutely. Most large enterprises, be it universities or big companies, have some kind of centralized directory (nowadays probably Microsoft AD), and machines (servers and end user clients) are then configured to lookup user and group info from there.
I can't believe they didn't toss in the "Not Working Filesystem" joke as a freebie.
Another way to have thumb-twiddling fun was for all the machines in a building to reboot - power failure, whatever - and then spend a lot of time waiting for each other's cross-mounted NFS shares to come up.
P.S.
> such as AFS (the Andrew File System)
That one prompted a senior engineer, on learning that it was to be deployed to undergrad's lab machines, to let slip "Why inflict it on the hapless livestock?"
I wasn't directly involved, but I think the AFS design was OK, the problem was that they were deploying "diskless" workstations (just the OS, no data on the local HD).
Back in the day of 1 MHz machines, with 3 Mbps net, having your home dir on a network file system was a tad hopeful - or hopeless ...
Under NFSv2 and NFSv3, the numeric user and group id is used to determine permission, and these must be aligned between the client and server. I have an oracle uid 60 on an older system that maps as elcaro on an NFS client (because I have a different oracle user there as uid 54321).
Under NFSv4, the direct uid/gid is no longer used; instead the rpc.idmapd process handles the identity mapping. I'm not really sure how it works beyond continuing to work when uid/gid synchronization is in place for NFSv3 and the connection is upgraded.
There is also an NFS ACL standard, but I don't know anything about it.
> Under NFSv2 and NFSv3, the numeric user and group id is used to determine permission, and these must be aligned between the client and server.
Technically the server doesn't need to have a UID/GID database that's aligned with the client; what's required is that all clients of the same server are aligned with each other. The server will take the numerical UID/GIDs from the RPC sent by the client and perform Posix-style permission checks using the owner UID, owner GID, and mode bits stored in the inode of the file or directory. The server doesn't need to know what user the UID corresponds to.
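A sketch of that check, read permission only (the real server also applies export options such as root squashing, and the cred_gids list stands in for the supplementary groups carried in the AUTH_SYS credential):

    import stat

    def may_read(cred_uid, cred_gids, inode_uid, inode_gid, mode):
        # The uid/gids come straight from the RPC credential; no names involved.
        if cred_uid == 0:
            return True                      # root, modulo root squashing
        if cred_uid == inode_uid:
            return bool(mode & stat.S_IRUSR)
        if inode_gid in cred_gids:
            return bool(mode & stat.S_IRGRP)
        return bool(mode & stat.S_IROTH)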
Right. At least at Sun through the 1990s, when everybody had their own workstations, many network nodes had local filesystems, so they were both NFS clients and NFS servers. For this to work well it pretty much required that UIDs/GIDs were globally consistent.
This was maintained using YP/NIS. But Sun was too big for a single YP/NIS domain, so there was a hack where each YP/NIS master was populated via some kind of uber-master database. At least at one point, this consisted of plain text files on a filesystem that was NFS-mounted by every YP/NIS master....
This was all terribly insecure. Since everybody had root on their own workstations, you could `su root` and then `su somebody` to get processes running with their UID, and then you could read and write all their files over NFS. But remember, this was back in the day when we sent passwords around in the clear, we used insecure tools like telnet and ftp and BSD tools like rsh/rcp/rlogin. So NFS was "no more insecure" than anything else running on the network. But that was ok, because everything was behind a firewall. (Some sarcasm in those last bits, in case it wasn't obvious.)
Sun did have a firewall by the early 90's. It had application-level proxies, and you'd have to configure applications to bounce through it if you wanted to get to the Internet. In many ways, this was more secure than today's default for firewalls where you can make any outbound connection you want but only the inbound connections are filtered.
Note that I'm not arguing that Sun was a leader in security, but they did make some efforts that other companies didn't.
If you've got centralized account management, it can work. Sending a fixed length 16-bit numeric id rather than a variable width username is a lot easier.
I've worked somewhere with a lot of NFS, and they had centralized account management, so everything was fine other than actual security, at least until we hit the limit of 16-bit uids. That place had a different centralized account management for production, so uids weren't consistent between corp and prod, but NFS in prod was very limited. (And you wouldn't nfs between corp and prod either)
I worked somewhere else without real centralized management of accounts on prod, and it was a PITA to bring that back under control, when it started becoming important. Even without intentional use of uids, it's convenient that they all line up on all servers; and it's a pain to change a uid that already exists on the system.
At the time, Sun NFS clients would receive equivalents of `/etc/passwd` over the network, using the YP service (later renamed NIS).
Like much of Unix, it was worse-is-better, and pretty productive for a site. (Well, until there was a problem reaching the NFS server, or until there was a problem with an application license manager that everyone needed.)
> (Seriously, though, could someone tell me why this was supposed to make sense?)
Think about the environment it was originally used in — large organizations, computers which cost as much as a car, LANs which aren't easily accessible (e.g. the Unix people have access but laptops are an expensive oddity and the sales people are probably sitting in front of a DOS box or shelled into that Unix server), etc. It's more defensible when your unix administrator is going to configure each of the servers to use the same NIS user directory.
All of that broke down when IP networking became the default, every desk in the building had a network port, and things like WiFi and laptops completely blew away the idea that the clients were managed by a single administrative group.
The NFSv4 ACL standard is tangentially related to NFS; the TL;DR is that it replicates the kind of ACLs you can create under Windows (i.e., separate "Write to File" and "Append to File" into different permission bits, make inheritance configurable, etc.).
The TrueNAS people (ixsystems) have a patch to bring it to Linux and ZFS; though from what I've heard upstream LKML lists aren't too enthused since they'd rather see this being used by an in-kernel filesystem.
For Linux there's richacls (https://en.wikipedia.org/wiki/Richacls), which was an attempt to add NFSv4/Windows-style ACLs to the Linux VFS. It never went upstream though, AFAICT largely because the VFS maintainer thought such ACLs are stupid.
The VFS maintainers are still not very warm about the idea, the current stipulation, IIRC, is that an in-tree FS should support it before it gets support from the VFS layer.
As mentioned, there is working glue code in form of the TrueNAS patches, which you can apply yourself if you need to. The Solaris support for NFS v4 ACLs is in my experience... subpar. It doesn't really integrate with any of the tooling (not that Solaris tooling is great to begin with).
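For reference, the evaluation model behind those NFSv4/Windows-style ACLs is roughly: walk the ACEs in order, let a DENY entry veto any requested bit it covers that hasn't already been granted, and let ALLOW entries accumulate grants until every requested bit is covered. A simplified sketch (who-matching, inheritance flags and special principals are ignored; the three mask values are the ones defined in the NFSv4 RFCs):

    # A few of the fine-grained NFSv4 access mask bits
    READ_DATA, WRITE_DATA, APPEND_DATA = 0x1, 0x2, 0x4

    def check_access(aces, requested):
        # aces: ordered list of ("ALLOW" or "DENY", mask) entries that match
        # the requesting principal.
        allowed = 0
        for ace_type, mask in aces:
            if ace_type == "DENY" and (requested & ~allowed & mask):
                return False          # a still-ungranted requested bit is denied
            if ace_type == "ALLOW":
                allowed |= (mask & requested)
            if requested & ~allowed == 0:
                return True           # every requested bit has been allowed
        return False                  # the protocol leaves this case to the server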
A fun glitch with NFS, with Linux serving out ext2/ext3, was getting -ENOENT when calling unlink() on a very large file (for the time).
The kernel on the server would do the work of unlinking the file, which might take many seconds. In the meantime, the client would time out and make another NFS call to unlink. ext2/3 would have already removed the path from the visible filesystem namespace, even though the unlink hadn't completed, so this second call would return ENOENT. Somewhat confusing to users!
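A sketch of the retransmit behaviour behind that confusion, with a hypothetical rpc_remove() standing in for the NFS REMOVE call (in reality this retry loop lives in the kernel's RPC layer, not in applications):

    import errno

    def unlink_with_retry(rpc_remove, dir_fh, name, retries=3):
        for attempt in range(retries):
            try:
                return rpc_remove(dir_fh, name)
            except TimeoutError:
                continue              # the server is still grinding through the unlink
            except OSError as e:
                if e.errno == errno.ENOENT and attempt > 0:
                    return            # very likely the effect of our own first attempt
                raise
        raise TimeoutError(name)

Servers commonly mitigate this class of problem with a duplicate request cache, but that cache is bounded and best-effort; NFSv4.1 sessions were the first protocol-level fix.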
I don’t understand the “as both the client and server must maintain the same list of locks for each client” part of:
“NLM (the Network Lock Manager). This allows the client to request a byte-range lock on a given file (identified using an NFS file handle), and allows the server to grant it (or not), either immediately or later. Naturally this is an explicitly stateful protocol, as both the client and server must maintain the same list of locks for each client.”
There is no “the client”, so if clients have to maintain that information, how is it distributed to all clients (including those that will make their first request in the future)? How does the server even know all clients, given the statelessness of the protocol? Or does that locking only work for requests from the same server? Or does the client keep that information only so that it can unlock the ranges when it discovers the process that locked the range exits/crashed? Is it even correct to assume such range locks can’t be created by another process than the one that will delete them (say after the first process forked)?
> One way is to generate a string that will be unique across all clients — possibly with host name, process ID, and timestamp — and then create a temporary file with this string as both name and content. This file is then (hard) linked to the name for the lock file. If the hard-link succeeds, the lock has been claimed. If it fails because the name already exists, then the application can read the content of that file. If it matches the generated unique string, then the error was due to a retransmit and again the lock has been claimed. Otherwise the application needs to sleep and try again.
I seem to recall that checking the link count on the temporary file was all that was needed: if the link() appears to fail but the temporary file's link count has gone up to 2, the lock was in fact claimed.
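Either way the shape of the code is the same. A minimal sketch in Python of the scheme described above (the function name, token format, and retry delay are mine, purely illustrative):

    import os
    import socket
    import time

    def acquire_nfs_lock(lockfile, retry_delay=1.0):
        # Unique token (host + pid + timestamp) doubles as the temp file's content.
        token = "%s.%d.%f" % (socket.gethostname(), os.getpid(), time.time())
        tmp = "%s.%s" % (lockfile, token)
        with open(tmp, "w") as f:
            f.write(token)
        while True:
            try:
                os.link(tmp, lockfile)          # atomic "create if absent", even over NFS
            except FileExistsError:
                # Either someone else holds the lock, or our own link() succeeded
                # but the reply was lost and the request was retransmitted.
                if os.stat(tmp).st_nlink == 2:  # the link did happen, so it's ours
                    os.unlink(tmp)
                    return
                try:
                    with open(lockfile) as f:   # content check, as described above
                        mine = (f.read() == token)
                except FileNotFoundError:
                    mine = False                # the holder just released it; retry
                if mine:
                    os.unlink(tmp)
                    return
                time.sleep(retry_delay)
            else:
                os.unlink(tmp)                  # lock claimed; release by unlinking lockfile
                return

Releasing is just unlinking the lock file; the token/link-count dance exists only to tell a lost reply to link() apart from a genuine conflict.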
I noticed that NFS became a lot less important to me around the time that client/server version control software (specifically CVS) became popular. Suddenly the main reason I was using NFS, to share our RCS source code repo, disappeared.
My least favorite NFS thing I've dealt with was the integrated client in Oracle. Yes, really. Seemed almost impossible to figure out anything when stuff was broken.
The thing I hated in NFSv2/3 (and fixed, finally, in NFSv4) is the use of random port numbers, which makes it difficult to put an NFS server behind a firewall. Yes, you can set 5 (!) environment variables in an obscure file to pin the port numbers. NFSv4 routes everything over port 2049, but unfortunately breaks user mapping; I guess we can't have everything.
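For what it's worth, the obscure file is distribution-specific; on RHEL-style systems it was /etc/sysconfig/nfs, with knobs roughly like the ones below (variable names and port values from memory, so treat them as illustrative rather than authoritative). With an NFSv4-only server, a single firewall rule for port 2049 is enough.

    # /etc/sysconfig/nfs (illustrative): pin the NFSv3 side services
    LOCKD_TCPPORT=32803
    LOCKD_UDPPORT=32769
    MOUNTD_PORT=892
    STATD_PORT=662
    STATD_OUTGOING_PORT=2020

    # NFSv4-only: just open 2049
    iptables -A INPUT -p tcp --dport 2049 -j ACCEPT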
> To provide a greater degree of compatibility with NFSv3, which identified users and groups by 32-bit unsigned user identifiers and group identifiers, owner and group strings that consist of ASCII-encoded decimal numeric values with no leading zeros can be given a special interpretation by clients and servers that choose to provide such support. The receiver may treat such a user or group string as representing the same user as would be represented by an NFSv3 uid or gid having the corresponding numeric value.
I'm not sure how common this extension is, at least the Linux server and client support it out of the box. Isilon also supports it, but it must be explicitly enabled.
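The rule itself is mechanical; a minimal sketch in Python of the interpretation the spec describes (the helper name and return convention are hypothetical, not from any implementation):

    def owner_string_to_id(owner):
        # Apply the numeric-string rule quoted above: a string of ASCII decimal
        # digits with no leading zeros may be treated as a plain NFSv3-style
        # uid/gid; anything else is a name@domain principal for the idmapper.
        is_numeric = owner != "" and all(c in "0123456789" for c in owner)
        if is_numeric and not (len(owner) > 1 and owner[0] == "0"):
            return int(owner)   # e.g. "1000" -> 1000
        return None             # e.g. "alice@example.com" -> resolve via idmapd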
> I'm not sure how common this extension is, at least the Linux server and client support it out of the box. Isilon also supports it, but it must be explicitly enabled.
It is very common. I’m not aware of a v4 server that does not support this.
Originally v4 went all in on Kerberos (GSSAPI technically) to provide strong multi user auth. This is the reason that users and groups are represented as strings.
This approach works reasonably well on Windows with SMB since you have AD there giving you Kerberos and a shared (LDAP) directory. AD is also deeply integrated in the OS for things like credential management and provisioning.
The approach did not work so well on Linux, where not everyone is running AD or even any kind of directory. This caused the protocol designers to make Kerberos optional in v4.1. I guess the spec authors already knew that Kerberos was going to be difficult, because I just checked and the numeric-string-as-uid workaround was already present in the original 4.0 spec.
Are you running idmapd? Are your uid/gids unified between the client and server?
Everything gets squashed in NFSv4 until idmapd is configured on both the client and the server, and they are set to the same "domain" (by default everything in the FQDN except for the simple host name).
Assuming this is up, everything will be unsquashed, and it will behave like NFSv3.
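For reference, the knob in question lives in /etc/idmapd.conf on typical Linux systems; a minimal sketch (example.com is a placeholder, and by default the value is derived from the host's FQDN):

    # /etc/idmapd.conf -- must be identical on client and server
    [General]
    Domain = example.com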
One of the reasons that NFS was so popular is that Sun gave away the protocol implementation. It's really interesting to see the technology behind that though. The whole protocol (both versions 2 and 3) is defined in a single file.[0] You run that file through the rpcgen program which spits out some .c and .h files which you can just link into your program. Of course on the server side you actually need to implement the code to handle those procedures, but on the client side it's all there, ready to use.
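For the curious, the workflow was roughly the following (the input file name here follows Sun's convention; rpcgen's exact output names depend on the input, so treat this as illustrative):

    rpcgen nfs_prot.x
    # typically yields:
    #   nfs_prot.h        data types and procedure prototypes
    #   nfs_prot_xdr.c    XDR (de)serialization routines
    #   nfs_prot_clnt.c   client stubs, ready to link
    #   nfs_prot_svc.c    server skeleton to fill in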
Did anybody use Auspex NFS servers in the late 1990's? My first job in 1997 was all Sun / SPARC based (semiconductor design) and we had a bunch of Auspex file servers. It looks like they went under in 2003 and NetApp became the dominant company.
Guy Harris worked at Sun from 1985 to 1988, then on NFS at Auspex from 1988 to 1994, then at NetApp from 1994 to 2003. After that he worked at Apple, and he's also been hacking on Wireshark for more than 23 years, since 1998!
We got a lot of value out of ours in an academic research context but switched around 2002 when they just couldn't manage to be price-competitive. We ended up with Spinnaker, Isilon, and NetApp boxes which were all Linux-based, although with some custom extensions (if memory serves, we found a bunch of correctness issues running fsx against Isilon because their filesystem implementation wasn't as robust as the stock Linux filesystems).
I remember an issue that its security model involved trusting the client. If you exported a file system to a PC, somebody could reboot the PC with Linux to get root and ignore the user permissions.
NFS can be...interesting to configure. Most of your enterprise storage systems support it (as well as the smaller stuff like Synology) and there's all kinds of 'but what about if we need to...'
We currently don't support Windows as a client. The main reason is that the builtin Windows NFS client is v3, while EFS is v4+. Some people have reported success mounting EFS on Windows using a third party NFS client, but I cannot comment on how well this works.
SMB1 was slow - very slow. Novell's NCP over IPX/SPX was far faster.
SMB2 changed the protocol to batch multiple operations into a single packet, but did not introduce encryption (and Microsoft ignored other SMB encryption schemes). It is a LOT faster.
SMB3 finally adds encryption, but only runs on Windows 8 and above.
NFS is a bit messy on the question of encryption, but is a much more open and free set of tools.
Just looked it up. It looks like the NFS server inside Netware was twice as fast as SCO on the same hardware.
I wonder if it would maintain a speed advantage today.
"NetWare dominated the network operating system (NOS) market from the mid-1980s through the mid- to late-1990s due to its extremely high performance relative to other NOS technologies. Most benchmarks during this period demonstrated a 5:1 to 10:1 performance advantage over products from Microsoft, Banyan, and others. One noteworthy benchmark pitted NetWare 3.x running NFS services over TCP/IP (not NetWare's native IPX protocol) against a dedicated Auspex NFS server and an SCO Unix server running NFS service. NetWare NFS outperformed both 'native' NFS systems and claimed a 2:1 performance advantage over SCO Unix NFS on the same hardware."
Novell NCP was faster in all contexts, as far as I know.
"SMB1 is an extremely chatty protocol, which is not such an issue on a local area network (LAN) with low latency. It becomes very slow on wide area networks (WAN) as the back and forth handshake of the protocol magnifies the inherent high latency of such a network. Later versions of the protocol reduced the high number of handshake exchanges."
It at least doesn't lock anything up that has a file open when the network goes down. NFS is a nightmare with that. NFS is more idiomatic on *nix but still a huge pain when dealing with matching file perms across systems.
> It at least doesn't lock anything up that has a file open when the network goes down.
I must admit I feel quite a bit of irrational fury when this happens (similarly, when DNS lookups hang). That some other computer is down should never prevent me from doing, closing, or killing anything on my computer. Make the system call return an error immediately! Remove the process from the process table! Do anything! I can power cycle the computer to get out of it, so clearly a hanging NFS server is not some kind of black hole in our universe from which no escape is possible.
> I must admit I feel quite a bit of irrational fury when this happens (similarly, when DNS lookups hang).
Neither of those reactions is in any way irrational. In fact, they're not only perfectly reasonable and understandable but felt by a great many of us here on HN.
This is not the fault of NFS. The same thing would happen if a local filesystem suddenly went missing. The kernel treats NFS mounts as just another filesystem. You can in fact mount shares as soft or interruptible if you want.
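The relevant mount options are documented in nfs(5); something along these lines (server name, paths, and timeout values are placeholders):

    # fail I/O with an error once the retries are exhausted
    mount -t nfs -o soft,timeo=100,retrans=3 server:/export /mnt

    # retry forever (the default); processes block until the server comes back
    mount -t nfs -o hard server:/export /mnt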
> It at least doesn't lock anything up that has a file open when the network goes down. NFS is a nightmare with that.
Yeah, we've been bitten by this too, around once a year, even with our fairly reliable and redundant network. It's a PITA: your processes just hang and there's no way to even kill them except restarting the server.
This is too bad. The sweet spot was "hard,intr", at least when I was last using NFS on a daily basis (mid 1990s). Hard mounts make sense for programs, which will happily wait indefinitely while blocked in I/O. This worked well for things like doing a build over NFS, which would hang if the server crashed and then pick up right where it left off when the server rebooted.
Of course this is irritating if you're blocked waiting for something incidental, like your shell doing a search of PATH. In those cases you could just control-C and continue doing what you wanted to do (as long as it didn't actually need that NFS server).
However I can see that it would be difficult to implement interruptibility in various layers of the kernel.
I think the current implementation comes reasonably close to the old "intr" behavior.
AFAICT the problem with "intr" wasn't that it was impossible to implement in the kernel, but rather an application correctness issue, as few applications are prepared to handle EINTR from any I/O syscall. With "nointr", however, the process would be blocked in uninterruptible sleep and would be impossible to kill.
However, if the process is about to be killed by the signal anyway, then not handling EINTR is irrelevant. Thus in 2.6.25 a new process state, TASK_KILLABLE, was introduced (https://lwn.net/Articles/288056/), which is a bit like TASK_UNINTERRUPTIBLE except that the task can be interrupted by a fatal signal, and the NFS client code was converted to use it in https://lkml.org/lkml/2007/12/6/329 . So the end result is that the process can be killed with Ctrl-C (as long as it hasn't installed a handler for SIGINT, so the signal is still fatal), but doesn't need to handle EINTR for all I/O syscalls.
Depends on the use case. SMB auth is more robust and easier to integrate with AD, but NFS is simpler and typically faster for file access and transfer speeds. SMB is good for shares used by end users, NFS is good for shares used by services.
I've found NFSv4 to be more stable and performant than SMB when using it between Linux machines. Seems to handle multiple concurrent clients well, too.
Another reason that NFS sucks: Anyone remember the Gator Box? It enabled you to trick NFS into putting slashes into the names of files and directories, which seemed to work at the time, but came back to totally fuck you later when you tried to restore a dump of your file system.
The NFS protocol itself didn't disallow slashes in file names, so the NFS server would accept them without question from any client, silently corrupting the file system without any warning. Thanks, NFS!
Oh and here's a great party trick that will totally blow your mind:
On a Mac, use the Finder to create a folder or file whose name is today's date, like "2022/06/21", or anything with a slash in it. Cool, huh? Bet you didn't think you could do that!
Now open a shell and "ls -l" the directory containing the file you just created with slashes in its name. What just happened there?
Now try creating a folder or file whose name is the current time, or anything with colons, like "10:35:43". Ha ha!
Don't worry, it's totally harmless and won't trash your file system or backups like NFS with a Gator Box would.
DonHopkins on May 25, 2019, on: Why Does Windows Really Use Backslash as Path Sepa...
There used to be a bug in the GatorBox Mac Localtalk-to-Ethernet NFS bridge that could somehow trick Unix into putting slashes into file names via NFS, which appeared to work fine, but then down the line Unix "restore" would totally shit itself.
That was because Macs at the time (1991 or so) allowed you to use slashes (and spaces of course, but not colons, which it used as a path separator), and of course those silly Mac people, being touchy feely humans instead of hard core nerds, would dare to name files with dates like "My Spreadsheet 01/02/1991".
UFS allows any character in a filename except for the slash (/) and the ASCII NUL character. (Some versions of Unix allow ASCII characters with the high-bit, bit 8, set. Others don't.)
This feature is great — especially in versions of Unix based on Berkeley's Fast File System, which allows filenames longer than 14 characters. It means that you are free to construct informative, easy-to-understand filenames like these:
1992 Sales Report
Personnel File: Verne, Jules
rt005mfkbgkw0.cp
Unfortunately, the rest of Unix isn't as tolerant. Of the filenames shown above, only rt005mfkbgkw0.cp will work with the majority of Unix utilities (which generally can't tolerate spaces in filenames).
However, don't fret: Unix will let you construct filenames that have control characters or graphics symbols in them. (Some versions will even let you build files that have no name at all.) This can be a great security feature — especially if you have control keys on your keyboard that other people don't have on theirs. That's right: you can literally create files with names that other people can't access. It sort of makes up for the lack of serious security access controls in the rest of Unix.
Recall that Unix does place one hard-and-fast restriction on filenames: they may never, ever contain the magic slash character (/), since the Unix kernel uses the slash to denote subdirectories. To enforce this requirement, the Unix kernel simply will never let you create a filename that has a slash in it. (However, you can have a filename with the 0200 bit set, which does list on some versions of Unix as a slash character.)
Never? Well, hardly ever.
Date: Mon, 8 Jan 90 18:41:57 PST
From: sun!wrs!yuba!steve@decwrl.dec.com (Steve Sekiguchi)
Subject: Info-Mac Digest V8 #35
I've got a rather difficult problem here. We've got a Gator Box running the NFS/AFP conversion. We use this to hook up Macs and Suns. With the Sun as a AppleShare File server. All of this works great!
Now here is the problem, Macs are allowed to create files on the Sun/Unix fileserver with a "/" in the filename. This is great until you try to restore one of these files from your "dump" tapes, "restore" core dumps when it runs into a file with a "/" in the filename. As far as I can tell the "dump" tape is fine.
Does anyone have a suggestion for getting the files off the backup tape?
Thanks in Advance,
Steven Sekiguchi Wind River Systems
sun!wrs!steve, steve@wrs.com Emeryville CA, 94608
Apparently Sun's circa 1990 NFS server (which runs inside the kernel) assumed that an NFS client would never, ever send a filename that had a slash inside it and thus didn't bother to check for the illegal character. We're surprised that the files got written to the dump tape at all. (Then again, perhaps they didn't. There's really no way to tell for sure, is there now?)