SMB has an extension mechanism, and SMB 1 has had support for Unix extensions for over 15 years - I was the author of the original Unix extensions spec. You can get full Unix semantics using them (links etc).
The predominant form of extension is an "info level": somewhat analogous to the data structure returned from stat, the numeric info level controls what structure is returned (or supplied). Microsoft had a tendency to add new info levels that correspond to whatever the in-kernel data structures were in a particular release, rather than aiming for longer-term good design.
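To make that concrete, here's a rough sketch of the idea - hypothetical level numbers and field layouts, not the real SMB wire format - showing how a numeric level picks which structure a server packs up from an os.stat() result:

    import struct

    # Hypothetical info levels - real SMB level numbers and layouts differ.
    SMB_INFO_STANDARD = 1        # basic size and times
    SMB_INFO_UNIX_BASIC = 0x200  # Unix extensions: uid/gid/mode etc.

    def pack_file_info(level, st):
        """Pack an os.stat() result into whichever structure the client asked for."""
        if level == SMB_INFO_STANDARD:
            # e.g. just file size and modification time
            return struct.pack("<Qq", st.st_size, int(st.st_mtime))
        if level == SMB_INFO_UNIX_BASIC:
            # the Unix extensions expose ownership and permissions too
            return struct.pack("<QIII", st.st_size, st.st_uid, st.st_gid, st.st_mode)
        raise ValueError("unsupported info level 0x%x" % level)

Each new info level means another branch like this on both client and server, which is how the pile grows.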
The general chattiness comes from their terrible clients, like Windows Explorer (akin to Finder for Mac folk). I once did a test opening a zip file using Explorer. If you hand-crafted the requests it would take about five of them: open the file, get the size, read the zip directory from the end of the file, close it. Windows XP sent 1,500 requests and waited synchronously for each one to finish. Windows Vista sent 3,000, but the majority were asynchronous so the total elapsed time was similar.
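For comparison, this is roughly what the hand-crafted sequence looks like written against local file APIs (a sketch using Unix-only os.pread; over SMB each step maps onto one protocol request - open, query info, read, close):

    import os

    def read_zip_directory(path, tail_size=64 * 1024):
        fd = os.open(path, os.O_RDONLY)        # 1. open the file
        try:
            size = os.fstat(fd).st_size        # 2. get the size
            # 3. one read from the end of the file, where the zip
            #    central directory lives
            offset = max(0, size - tail_size)
            return os.pread(fd, size - offset, offset)
        finally:
            os.close(fd)                       # 4. close it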
I worked on WAN accelerators for a while, where you cache, read ahead and write behind in order to provide LAN performance despite going over WAN links. In one example a 75KB Word memo was opened over a simulated link between Indonesia and California. It took over two minutes - while being near-instantaneous with a WAN accelerator. The I/O block size with SMB is 64KB, so the client could have fetched the entire file in two reads, but didn't.
If anyone is curious about what it was like writing an SMB server in the second half of the nineties, I wrote about it at http://www.rogerbinns.com/visionfs.html
Do you know the cause of the 3k requests which Vista made? Do you have a sane theory why these were occurring? Also, do you have any suggestions for better clients to use?
> Do you know the cause of the 3k requests which Vista made? Do you have a sane theory why these were occurring?
Backwards compatibility and layers of indirection.
Microsoft has always made great efforts at backwards compatibility - Raymond Chen's blog is a good source of stories. Quite simply, if you upgraded Windows and apps stopped working, you'd blame Windows. Of course it is almost always the apps relying on undocumented behaviour, ignoring documentation, depending on implementation artifacts etc. This means a lot of code to detect and work around problems in other components.

For a networked filesystem client, the simplest approach is to send lots of requests and pick out the results of interest based on what comes back. Networked filesystem servers also work around client problems in various ways - e.g. they may return smaller block sizes than the client requested because that client is known to have occasional problems. All of this builds up layers and layers of workarounds, workarounds to workarounds, having to test against OS/2 etc. SMB2 was an attempt to wipe the slate clean (no more OS/2!) but of course the crud starts building up again.
Despite appearances, Explorer isn't just a program that displays files and directories. There are layers and layers of abstractions, with parts provided by COM etc. The code that knows it wants to display the listing of a zip file is many layers away from the code that generates network requests. It is always easier to write code that does more than strictly needed than to write the absolute minimum necessary.
Time to air some dirty laundry. I worked for one of their competitors - Riverbed was set up after we were successful with the intention of beating us. (They eventually did mainly because we were acquired by a big company who essentially threw away the $300m they spent on us.)
But Riverbed's SMB implementation was done by people who didn't understand it, and who had a dangerous attitude. Essentially a WAN optimizer looks at the commands and responses going by and performs a beneficial man-in-the-middle attack based on that data. One technical issue is deciding how you handle the unknown - e.g. a client or server speaking a dialect you haven't tested, or a command you haven't seen before or developed support for. Our attitude was always that it invalidated any caches, and in the worst case would disable acceleration on that connection. Riverbed just let it fly by.
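As a sketch of that design difference (hypothetical opcodes and class names, not either product's code), the cautious approach treats anything it doesn't recognise as a reason to stop guessing:

    KNOWN_OPCODES = {"OPEN", "READ", "WRITE", "CLOSE"}  # commands we've developed support for

    class Connection:
        def __init__(self):
            self.cache = {}            # optimizer's view of file contents
            self.accelerate = True     # whether we still dare answer from cache

        def on_command(self, opcode, payload):
            if opcode not in KNOWN_OPCODES:
                # An unknown command may change server state in ways we
                # can't model, so drop the cache and stop accelerating.
                self.cache.clear()
                self.accelerate = False
            # Either way, forward the command unmodified to the real server.
            return ("FORWARD", opcode, payload)

Letting the unknown "fly by" means keeping the cache and continuing to answer from it, which is where the corruption comes from.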
An example of how that breaks things: there is something similar to an ioctl to set ranges of a file to be zeroed out. Riverbed didn't know about it, and would keep returning the old cached contents. Similarly they didn't know about alternate data streams, and especially how they are named, which breaks a naive filename-caching implementation. At one point I sat down and came up with 5 separate demonstrations of how Riverbed corrupted data (i.e. 5 different areas of the protocol they messed up). The first one got published and Riverbed threatened to sue, as there was some Oracle-inspired clause in their legal agreements! Our lawyers were chickens and that was the last of it.
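The alternate-data-stream problem boils down to what you use as a cache key. A naive cache keyed on the bare path treats "file.txt" and "file.txt:summary" (an NTFS named stream) as the same object. A sketch of the fix, with a hypothetical helper:

    def cache_key(smb_path):
        # NTFS alternate data streams are addressed as "name:stream"
        # (e.g. "report.doc:secret"). A cache keyed only on "name" would
        # hand back the wrong stream's contents.
        name, _, stream = smb_path.partition(":")
        return (name, stream)   # "" means the default, unnamed data stream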
My own view is that customer data is sacrosanct and I made sure we always did the right thing. They played fast and loose. However most people would blame Microsoft if there are issues rather than realising it was Riverbed's attitude causing corruption.
Riverbed did many other things right. They didn't get acquired like most in the industry, so they didn't have to deal with being squelched by an acquirer. Their marketing focussed a lot on the low end - when people already have two devices they are likely to buy more of the same (sunk cost fallacy). And they did TCP only (we did IP and TCP). TCP only makes it far easier to configure, load balance and do auto-discovery.
If it had been up to me I would have publicised the threat to sue - Riverbed's correct reaction should have been to acknowledge the issue and fix it. They didn't know I had a whole bunch more lined up, and that I only stopped because I got bored.
> Time Machine, only works over a LAN with destinations that support AFP. This is at least in part because of Time Machine's reliance on Unix hard links, and also in part because it has to be able to ensure that any OS X files with HFS+ specific metadata are correctly preserved.
This is not the reason. Time Machine does support hard links, legacy Mac metadata, and other Unix features. It does this by writing all the data into large blobs (a sparse bundle) with an embedded filesystem of its choosing (i.e. HFS+). It can use any destination filesystem for the blobs, including FAT.
In particular, Time Machine makes large use of hard links to directories, which not many filesystems support. With HFS+ Apple can be sure that support is always there.
Actually I think you'll find it makes use of hard links to files. It's basically a reimplementation of rdiff-backup, or it might be the other way round. I can assure you that no directories get hard linked, and I'm sure someone will furnish the obligatory xkcd.
Edit -- I stand corrected! It does in fact link folders as well.
Also:
http://xkcd.com/981/
It hard-links directories, which is non-standard but supported by HFS+. It's kind of crazy to do in general, but in this specific use case it's a great idea.
I even remember using Time Machine with an SMB share in Tiger. You just had to enable a configuration option to make it work. Did later versions of OS X break that functionality?
by "large blobs" you mean sparsebundles, right? Sparsebundles (as opposed to diskimages) can be diff'd allowing Time Machine to not only treat them as an HFS capable FS, but to isolate changes to a single block, reducing network traffic and time to backup.
>backups are only done in whole-file increments, not at a block level
OP seems to refer to how the Time Machine destination backup file is chunked when using a sparse bundle, such that individual chunks can be updated in isolation, which is how Time Machine came to support network backups. Previously, Time Machine's destination file was a monolithic disk image that had to be updated as a whole, which would be unrealistic for network backups.
At a filesystem level, a sparsebundle is just a directory and the chunks are individual data files, which the OS coalesces into a single logical file. Sparsebundles appear as individual "files" in the Finder but you can right click to see the individual chunk files. A sparsebundle is effectively a "blob" as GP states, since it is designed to contain arbitrary data, and since it's implemented with the most rudimentary filesystem components it works across platforms as GP also states.
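You can poke at this yourself. A quick sketch (the Info.plist "band-size" key and the "bands" folder are how sparse bundles are typically laid out; only bands that actually contain data exist on disk):

    import os
    import plistlib

    def describe_sparsebundle(bundle_path):
        # A .sparsebundle is just a directory: an Info.plist plus a "bands"
        # folder of chunk files that the OS coalesces into one logical image.
        with open(os.path.join(bundle_path, "Info.plist"), "rb") as f:
            info = plistlib.load(f)
        bands = os.listdir(os.path.join(bundle_path, "bands"))
        print("%d bands of %s bytes each" % (len(bands), info.get("band-size")))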
But when discussing network transfers it makes sense to think of it as a simple collection of files rather than a "blob." (OP describes the chunks as "blocks", but that implies hard drive blocks, which is not the case.)
Sparse bundles were always chunked (and were introduced at the same time as Time Machine). Sparse images were single files, but could grow to accommodate additional data. IIRC, networked Time Machine backups (say, to a Time Capsule) have always used sparse bundles.
Back in the early nineties I worked at Miramar Systems on an AFP server and actually a full AppleTalk stack that ran on Windows 3.11 (VxDs!) and OS/2. Macs could run full AFP and whatever the printer protocol was called to a network of PCs.
IBM sold a version of our stuff that was called LanServer for Macintosh so back then Macs and AFP were covered!
It was quite a popular product at the time. Although I never enjoyed working on Macs I thought that AFP was pretty cool. We all had "Inside AppleTalk" pretty much memorised - what a great book.
I would have preferred NFSv4 over SMB2. They are quite similar technically, but the former has less chance of veering off into supporting strange Windowsisms which will be hard to translate to a POSIX client. That said, SMB2 is widely deployed and Microsoft is innovating in SMB faster than NFS is improving.
Fortunately OS X does not use Samba as their SMB2 client.
Most users are going to have a Mac and a Windows machine, so SMB makes far more sense. You're going to see NFS in enterprise situations, and Apple does not really target that market.
From a functionality point of view, yes - you have always been able to connect to a Linux/BSD box using NFS, SMB(1) (if Samba is installed on it) or AFP (if netatalk is installed on it).
What this is saying is that SMB2 support has been added (Mountain Lion does not support SMB2), and that it seems to have simultaneously been made the default for connecting to servers that support it (hopefully only if you don't explicitly specify what to use - and presumably the SMB server in OS X Mavericks does support it).
Buggy is not nearly strong enough - supporting lots of mixed environments has given me ragequit levels of stress, precisely because Apple dropped samba and decided to write their own.
Why are people doing so much peer level file sharing anyway? Performance? Security? In a company it would be a lot better to have centralized servers with high availability, probably with some kind of web-based CMS to store files, something like Confluence or Sharepoint.
> Why are people doing so much peer level file sharing anyway? Performance? Security? In a company it would be a lot better to have centralized servers with high availability, probably with some kind of web-based CMS to store files, something like Confluence or Sharepoint.
You have to assume that the majority of Apple's users are home/edu/SMB users, who don't have centralised infrastructure.
They just want to share files between each other, or from a small NAS. Those who have Macs in an enterprise environment likely have a working solution via other methods.
How so? FS-level sharing is easy to use (it's just a drive that's the same on everyone's machine) and the workflow is a lot more efficient if you're using a conventional application to edit the files. It performs pretty well, and integrates nicely with AD/Kerberos for security. As long as there's a decent versioning/backup policy in place (which can be handled by the server admins, users don't have to worry about it) what's the problem?
Because not everyone is running a server cabinet filled to the brim with high performance kit and a gigabit internet connection. A lot of small companies have limited resources and limited experience and it turns out that peer level file sharing is the easiest way to do what they need, which is share files between themselves.
I did need to tweak some registry setting on my Win7 box to allow my MacBook to access its shares properly. Not very user-friendly, but after that it's worked fine.
Can someone chime in with the pros and cons of each network filesystem? And which is a good fit for Linux - or rather for those OSes that don't need to cooperate with Windows? Was NFS ever updated - or replaced? How much of SMB is now open after court rulings? And is there one that is technically better than another?
My experience with Macs connecting to Linux file servers is that OS X NFS client performance is fine (110 MB/s over Gigabit Ethernet), while SMB performance sucks badly (70 MB/s on GigE, comparable to poor old cranky Windows XP). AFP performance with netatalk is comparable to NFS, but much more resource-intensive on the server. Therefore I always use NFS shares between Linux and Macs.
NFS was never especially good - many a system admin spent hours upon hours trying to fix it when it malfunctioned for no apparent reason - but it was the only viable option for many years. The alternatives were either research projects or proprietary protocols like the ones Novell used.
SMB isn't so much better as more widely supported.
That's a configurable option (mount -o soft changes it). On Ubuntu, the nfs(5) manpage has more information. The default (hard) behaviour is a good thing in a strict client/server scenario where the client should not lose data even if the server is rebooted. Years ago I watched an NFS mount in an old broadcast video setup recover unimpeded after its target system was rebooted.
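For reference, a sketch of a soft mount (Python only wraps the mount command here; soft, timeo in tenths of a second, and retrans are standard Linux NFS mount options, and the export/mount point are placeholders):

    import subprocess

    # soft: fail I/O with an error after retries instead of hanging forever
    # timeo is in tenths of a second, retrans is the retry count
    subprocess.run(
        ["mount", "-t", "nfs",
         "-o", "soft,timeo=100,retrans=3",
         "fileserver:/export/projects", "/mnt/projects"],   # placeholder paths
        check=True,
    )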
OS X's interchangeability with PCs is actually even more badly broken than this. This is mind-boggling, because if Apple could get this one thing right more people would be willing to buy a Mac Mini and put it on their home networks. I recently tried to use an external device full of NTFS-formatted hard drives on a Mac Mini. The first thing I discovered was that OS X can't natively write to NTFS-formatted drives. Even after you discover and purchase third-party apps that enable writing to NTFS-formatted volumes, OS X can't share them via SMB. This is because Apple's own SMB implementation, the one they replaced Samba with, is broken. So you have to disable it and install the open-source SMB server anyway. There are quite a few hoops to jump through to accomplish this.
So there is no built-in way to share your external drives connected to Mac Mini on network if they are NTFS formatted.
I'm hoping this results in vastly improved SMB support which, as other commenters have said and I fully agree, has been infuriating since Apple decided to roll their own. I frequently hop to my Windows machine to manage my Windows Home Server even though I'm just doing simple SMB communication and file cleanups that should work fine in OS X, but don't.
Related: I take it there is no maintained open-source SMB server that isn't GPLv3 these days? Sucks, since Apple abandoned samba2. How stupid would it be to use Apple's old samba2 for an appliance? (guess: very?)
I'd guess polshaw wants to sell/distribute an appliance containing proprietary software (hence the reluctance to use a GPLv3 licensed component), not just install it on his own device.
No religious objection. The problem for me with GPLv3 is that it is not compatible with (privately) signed code. If it is possible to run unsigned code on my appliance then my proprietary code would not be secure, putting the entire business in jeopardy. If you can square this circle then I'd love to use it.
I'd be interested to see a list of shipping appliances (meaning not open hardware platforms) with GPLv3 if you know of any.
>No religious objection. The problem for me with GPLv3 is that it is not compatible with (privately) signed code. If it is possible to run unsigned code on my appliance then my proprietary code would not be secure, putting the entire business in jeopardy. If you can square this circle then I'd love to use it.
Your code is still under the full protection of the law. And no signing mechanism will prevent a competitor from simply dumping the flash and reading your code off there, if they really want to - if anything this is probably easier than running their own code on the system. So I don't see what using the GPLv3 changes.
If you're really paranoid, how about running samba in a chroot/jail/etc. where it has access to the data files it needs to serve/store, but not your code? (Your code can operate on the same data from outside the chroot). As long as you make it possible for the user to upgrade samba (which should be fine - you don't care what code runs inside this chroot, because it only has access to the same files the user could access via samba anyway, so the samba that runs in the chroot doesn't have to be signed) you're compliant with the GPL but haven't exposed the rest of your system.
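A rough sketch of the idea (the jail path and the smbd invocation are illustrative; in practice the jail also needs the smbd binary, its libraries and config copied in, and you'd use your init system rather than raw fork/exec):

    import os

    JAIL = "/srv/share-jail"   # contains only the files samba may serve

    def launch_jailed_smbd():
        pid = os.fork()
        if pid == 0:
            # Child: confine ourselves to the jail, then run samba in the
            # foreground. It can read and write files under JAIL but cannot
            # see the proprietary code living outside it.
            os.chroot(JAIL)
            os.chdir("/")
            os.execv("/usr/sbin/smbd", ["smbd", "--foreground", "--no-process-group"])
        return pid   # parent: keep the child pid for supervision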
If it is a straight rip-off then the law should protect me (at least in the West), but if it were just used for learning or adapting from, it could be exceedingly hard to prove or even know about. I suppose what I would be most worried about is it being leaked such that anyone could use it on any platform without paying. Who would buy an Apple TV if you could run it off your Raspberry Pi? (I know the analogy doesn't quite work - the aTV is decent value as hardware - but as a start-up I will have higher costs and so higher prices.)
>flash dump
This is why you encrypt the private data on your flash :) Decryption keys can be stored in the processor (it's been a while since I looked at the system - I'll have to look again, but it seemed solid). So that means they'd have to either de-solder the RAM while somehow keeping it freezing cold, or use an electron microscope or something on the CPU. If they are that capable then I'm sure they could just rewrite the code themselves without my 'help'. I'm not sure how much security compilation would offer, and if the details of that matter, that's something I should look into further. But the above seems pretty solid AFAICT.
>samba in a chroot/jail/etc
Thanks! This is a great idea. IIRC it is possible to break out of a chroot, but (IIRC again) not BSD jails.. so that could be a great option down the line if I am able to use BSD. It adds a fair amount of complexity legally (although it seems sound at first thought) and technically though (can they be hacked?), so perhaps one for later.
By "signed" do you mean a DRM-locked down platform ? There's no problem with signed code, there is a problem with trying to claim ownership of a device that the customer owns :-).
There are many appliances shipping with GPLv3 Samba - Netgear, Drobo, IOmega, Synology, just off the top of my head.
Of course none of these are trying to control what the customer does with the appliance.
If you want to control what customers do with their own hardware, write your own SMB3 server. Good luck with that..
Well, thanks for replying I suppose. The reality is that the alternative is that customers get no SMB server, and will have to use other file interfaces.

The appliances you mention... well, you didn't actually mention any specific models, but going by the brands it sounds like you are referring to NAS boxes, in which case they are selling only hardware - they have no valuable software of their own to protect; 99% of it is just Linux + Samba, plus maybe a trivial control-panel backend they wrote. Show me a device that gets a competitive advantage from its own software and uses GPLv3 code.

'Signed code only' does not have to mean customers aren't free with the hardware (I'd be quite happy to help wipe the device so they can do whatever they want); it just means potential competitors aren't free to steal my code and waste the investment of xyz man-hours that went into it. As a consequence, those who mean no malice lose freedom with my mix of software on the hardware they own, but they remain free with their own software on the purchased hardware (as above - except for GPLv3).
There are several cloud filesystem gateway appliances that contain considerable proprietary code on the appliance, and use Samba GPLv3 software to provide gateway services from SMB clients into the cloud.
I'm not at liberty to name them the way I can the NAS boxes, as most of them are not forthcoming about their use of Samba to anyone but their customers (to whom they provide replaceable source code, of course), whereas the NAS vendors are well-known users of GPLv3 Samba.
You seem to be under the impression that avoiding GPLv3 code prevents competitors from buying your box, rendering it down to components (including your precious software), and figuring out any trade secrets you may have.
This is a strange and incorrect impression.
If you genuinely want to use Samba in your proprietary appliance, email me (I'm easy to find). I help companies do this every day as part of my job.
You won't be able to use Samba outside of the terms of GPLv3 of course, but most companies not requiring DRM seem to be perfectly comfortable with that.