The current design was built by developers, not designers, and will be replaced once the professional design is ready.
We put all our effort into backend and storage subsystem development and completely forgot about our temporary public website.
The site images were taken from various public stock image services; I hope we didn't violate any rights by using them.
Like I asked: can you let me know which stock image services you used? I really like the cloud that Salesforce uses and would love to be able to use it myself.
Also, if you go to http://rest.s3for.me/ (the URL you use to calculate uptime) the error message in the document tree says 'The AWS Access Key Id you provided does not exist in our records'.
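For anyone who wants to see it themselves, here's a minimal sketch using Python's standard library (the exact response body may differ):

```python
# Minimal sketch: fetch the bare endpoint and print the response body.
# S3-style services typically answer anonymous requests with an XML <Error>
# document; the exact contents may differ from what's quoted above.
import urllib.error
import urllib.request

try:
    with urllib.request.urlopen("http://rest.s3for.me/") as resp:
        print(resp.read().decode())
except urllib.error.HTTPError as e:
    print(e.code)             # likely 403 for an unsigned request
    print(e.read().decode())  # the XML error body quoted above
```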
You mentioned that S3 stands for "Storage Should be Simple"; what does AWS stand for at your company? Unfortunately cloud software isn't my forte, and I would love to know what that acronym stands for.
My naivete leads me to believe that your status check is hosted with Amazon, and that your uptime checker would in fact be checking Amazon's uptime, but I am surely mistaken.
These headers are part of the S3 protocol implementation, and we left them in to avoid any possible problems with S3 clients.
Some clients are very strict about protocol validation and refuse to work otherwise.
The purpose of these headers is to track all requests so that problems with any particular request can be traced. You must include them when contacting support about a problem with a request. This is true for both Amazon and S3For.Me.
Why is your key id called 'AWSAccessKeyId'? Why does your error message say 'AWS Access Key Id does not exist in our records'? Why is your error message identical, to the letter, to what S3 would return? Surely the error message text isn't required for protocol validation.
This looks like you just took an open source S3 REST API clone (there are many) and stuck it on a Hetzner server without bothering to change any variable names.
There are a lot of open questions here; the most obvious tell is that I highly doubt Salesforce would use a stock image as the logo for their cloud database solution, considering the level of investment they made in it.
'AWSAccessKeyId' is part of the XML schema and cannot be changed.
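For example, it is the query parameter that signature v2 clients put in presigned URLs. A rough sketch, with placeholder credentials and our endpoint assumed:

```python
# Rough sketch of S3 signature v2 query-string authentication, showing where
# the AWSAccessKeyId parameter appears on the wire. Credentials below are
# placeholders; the endpoint is assumed for illustration.
import base64
import hashlib
import hmac
import time
from urllib.parse import quote

access_key = "AKIAEXAMPLEKEY"     # placeholder
secret_key = "example-secret"     # placeholder
bucket, key = "mybucket", "photo.jpg"
expires = int(time.time()) + 3600

# Method, empty Content-MD5, empty Content-Type, Expires, canonical resource.
string_to_sign = f"GET\n\n\n{expires}\n/{bucket}/{key}"
signature = base64.b64encode(
    hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
).decode()

url = (f"http://rest.s3for.me/{bucket}/{key}"
       f"?AWSAccessKeyId={access_key}&Expires={expires}&Signature={quote(signature)}")
print(url)
```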
You are right about the AWS text in the error messages; I removed it, thank you for noticing.
That was indeed the first thought: take an open source S3 REST API clone, install it on Hetzner servers, and work that way. But it is not the case; none of the available solutions fit us, for different reasons. We use open source software a lot, but the core of S3For.Me was built by our team, from the first line to the last.
I've checked the Salesforce site and don't see anything similar to our logo. It will be replaced in the near future anyway.
Seems like they should get ready for the trademark infringement letter.
Also, the math they use to get their 99.9999% durability is a bit sketchy: the 1% failure rate of an HDD isn't independent of its age, stress level, temperature, batch number, etc., and likely doesn't include the odds of corruption as opposed to outright failure. At scale you can't simply rely on your vendors' claims.
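For reference, the naive version of that math presumably goes something like this (a sketch assuming a 1% annual failure rate and fully independent, never-repaired drives):

```python
# Naive durability math: assume each of the 3 replicas sits on a drive with
# a 1% annual failure rate, and (wrongly) assume failures are independent
# and never repaired during the year.
annual_failure_rate = 0.01
replicas = 3

p_lose_all = annual_failure_rate ** replicas  # 1e-06
durability = 1 - p_lose_all                   # 0.999999, i.e. "six nines"
print(f"{durability:.4%}")                    # 99.9999%
```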
I can claim anything I want and refund you for the time I was offline. How will that help anyone?
>We guarantee 99.9999% storage durability and 99.99% availability over a given year. Each byte is stored on three separate servers in two datacenters at the same time to achieve this level.
99.99% means you are violating your promise if you are down for more than 53 minutes over a given year[0]. What happens when (not if) this happens? If you would credit me back a couple of bucks and call it even, the whole four-nines thing is useless. I don't understand the promise of high availability: on one hand it seems difficult to achieve, and on the other there seems to be little penalty for breaking the promise.
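For the record, the downtime budget works out like this:

```python
# Back-of-envelope: how much downtime per year does 99.99% availability allow?
minutes_per_year = 365.25 * 24 * 60
allowed_minutes = minutes_per_year * (1 - 0.9999)
print(f"{allowed_minutes:.1f} minutes")  # ~52.6, hence the "53 minutes"
```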
I don't think that is the main problem with that calculation. They assume they know the risk of failure over one year. Having three separate copies would only justify estimating the risk the way they do if they don't intend to monitor for failures on shorter timescales than a year (which would be a very, very bad idea). The actual risk is much smaller, since one would presume they do something about it when the first hard drive fails...
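To put a rough number on it (a sketch, assuming for illustration the same 1% annual failure rate and a one-day detect-and-replace window):

```python
# Refined back-of-envelope: data is lost only if all three replicas fail
# within the same repair window T. Both numbers below are assumptions.
afr = 0.01      # assumed 1% annual failure rate per drive
T = 1 / 365     # assumed repair window: one day, as a fraction of a year

p_one_window = (afr * T) ** 3     # all three fail inside a single window
p_year = p_one_window * (1 / T)   # windows per year
print(f"{p_year:.1e}")            # ~7.5e-12, far below the naive 1e-06
```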
As noted elsewhere this is hardly the only source of potential data loss however.
Hmmm... Well you may want to actually put that on your website. And you probably want to be careful, since you've posted comments on here like "all the rest documentation can be found at Amazon S3 documentation site, it is basically the same with small differences" and you have "Amazon" on the sign, which is above the "fold" on my screen and pretty much the first connection I made on S3. If your entire website doesn't scream "we are just like Amazon S3" then I must be missing something.
You are reading that guideline incorrectly. That's a contract that gives permission to use the listed trademarks under certain conditions; it is not a guide to what Amazon may claim as trademarks.
If this company had registered the "S3" trademark before Amazon had registered "Amazon S3", then this company probably could have gotten a settlement amount from Amazon... In the actual case I'd expect the company to get a C&D from Amazon at some point, with eventual negotiations being either to change the name or add a prominent header noting the lack of relation to Amazon S3.
They say that they store data on three separate servers in two data centers. Three HDDs must fail between two data accesses or scrubs to lose data to HDD failures.
For 99.9999% durability, their way of doing things is good enough to protect against HDD failures. Of course, there are other failure scenarios.
I'm not sure who this is targeting. As a (very) small-scale user of S3 I hardly pay anything per month and would not even consider switching.
Large companies on the other hand need to have the reliability and security of a big name behind the storage. Why would they even consider moving to a no-name company that hasn't proven itself? Sorry, it just seems like the wrong niche to step in to.
I'm actually facing a decision right now in building a new app that has relatively heavy needs for data storage and number crunching. Each paying customer would have a relatively large amount of data per person compared to your average CRUD app.
I'd like to use DigitalOcean or Linode VPS machines for processing and as web servers, but then I need a large object store like S3. Rackspace and AWS have both object storage and VPS, but their VPS machines are underpowered for the price.
So ideally I would use Rackspace-Files or Amazon-S3 for storage, and DigitalOcean for number crunching, but I'd get killed on transfer rates from S3 to DigitalOcean (for example). Amazon and Rackspace have a trump card with free data transfer within their data centers. You have to use their slow machines to realistically use their object stores in this type of a use case.
So that's the 'S3' problem I need solved, which still isn't quite what S3ForMe is going after: extremely cheap data transfers to/from major VPS providers.
Why not use Rackspace-Files and get dedicated servers from Rackspace? That would solve your transfer rate issue and likely get you far better performance than Linode or DigitalOcean.
I've been happy with unmanaged servers from Incero. Their servers use ECC RAM, their support seems competent, and they offer buy-down prices for things like RAM.
Back to pgrove's point, though: there are trade-offs to using a dedicated server. The storage isn't as reliable as S3, and the server itself is likely less reliable than a VPS (since some hardware problems on a VPS can be solved by migrating to new hardware with only a small downtime, rather than the large downtime of restoring from backup).
A major advantage to dedicated servers that I don't hear much about is simplicity. Dropping everything on a dedicated server means I can use local services like the file system rather than having to deal with S3 and latency and bandwidth costs.
Edit: I just realized that the idea was to use Rackspace files + one of their servers rather than S3 + EC2. Somehow I missed that bit and was thinking about just storing everything on the dedicated server.
Let's see: it has less durability, unknown reliability (it's a new company), and is housed in only two data centers in one region. That means two of your three replicas are probably sitting on racks right next to each other.
Alright, I won't be on the haters' and trolls' side this time. I'm actually very interested in your product, as I was looking for an Amazon S3 replacement.
Pro:
- Servers are located in Europe (WOOOW, fuck spying govs)
Not so good:
- I think you had a great idea, got one (maybe two or three) developers working on it for some time, and rolled out this product, which is obviously quite unfinished (from a business point of view; I have no idea how the technical side works, as I've never used it).
So, a question for you guys:
How the hell does invoicing work? As you can guess, most of us work for or own companies, and we need freaking invoices for every cent we spend. No billing, no invoices, nothing like that in the private area. Ideas?
The US has more legal ability to spy outside the country than inside. If you host in Europe but do business in the US, it is my understanding that both the NSA and the CIA can snoop on you, whereas if you are solely US-based, you do not have to worry about the CIA.
Considering the number of downvotes, I guess at least three Americans were hurt by my comment. Sorry about that. All I know is that if a business is based in Germany, accounting is a lot easier, and I might be able to recover VAT and other things. Weird US invoices cause problems on this side.
That said, if I get (as an example) a search warrant from a US court and I'm based in the EU, I will simply ignore it. That is quite enough.
S3for.me is hosting at hetzner.de, one of the largest German hosting companies. Unfortunately, they have to provide a wiretapping API to the police, the secret service (BND, Bundesnachrichtendienst), and other governmental institutions. They are not allowed to talk about this, on pain of punishment.
Recent revelations from Mr. Snowden show that the BND regularly sniffs traffic and provides data to the NSA.
> The obligated party shall tolerate the installation and operation of equipment of the Federal Intelligence Service in its premises which shall only be installed and serviced by Federal Intelligence Service staff specially authorised and shall meet the following requirements: …
I'm interested in more detail on this. How does it work? Object storage at scale is not a trivial problem to solve. Are they using their own storage engine or an open-source one? Where are the docs? "Just look at the S3 docs" is a little vague, since so much of S3 has to do with AWS product integration rather than just storage (eg billing, regions, identity/ACLs, etc).
We are using our own self-written software to implement the S3 protocol. We do not support all the enterprise features like extended ACLs, regions, or encryption; this makes our software much simpler and easier to develop.
You can find basic documentation here: https://my.s3for.me/client/docs/index. The rest of the documentation can be found on the Amazon S3 documentation site; it is basically the same, with small differences.
Edit: can't post comments for some time (You're submitting too fast...), will reply later.
Can you share more about your implementation? (Disclosure: I'm very interested since I work on OpenStack Swift.)
Starter questions:
How does data placement work?
How is data checked for correctness?
How do you do listings at scale?
During hardware failures, do you still make any durability or availability guarantees?
How do you handle hot content?
How do you resolve conflicts (eg concurrent writes)?
> How does data placement work?
Each object bigger than a certain size is split into small parts; these parts are linked to a metadata object with all the information, such as name, bucket, size, date, checksum, etc.
All data is split into server groups; each group is at least 3 mirrored servers holding no more than 5TB of data, to keep the system flexible. Server groups can be added to increase system capacity or removed to decrease it.
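Roughly, the placement works like the following sketch (the chunk size, group selection policy, and all names here are illustrative, not our actual code):

```python
# Illustrative sketch of the placement scheme as described: large objects are
# split into chunks, every chunk is written to every server of one mirrored
# group, and a metadata record ties the chunks together. All details below
# (chunk size, group selection) are assumptions.
from dataclasses import dataclass

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size

@dataclass
class ServerGroup:
    servers: list       # e.g. 3 mirrored servers, modelled here as dicts
    capacity: int       # soft limit, e.g. 5 TB
    used: int = 0

@dataclass
class ObjectMeta:
    bucket: str
    name: str
    size: int
    chunk_count: int

def place_object(groups, bucket, name, data):
    # Pick any group with enough free space (the real policy is unknown).
    group = next(g for g in groups if g.used + len(data) <= g.capacity)
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    for n, chunk in enumerate(chunks):
        for server in group.servers:   # mirrored: every server gets every chunk
            server[f"{bucket}/{name}.{n}"] = chunk
    group.used += len(data)
    return ObjectMeta(bucket, name, len(data), len(chunks))

groups = [ServerGroup(servers=[{}, {}, {}], capacity=5 * 10**12)]
meta = place_object(groups, "photos", "cat.jpg", b"x" * (10 * 1024 * 1024))
print(meta.chunk_count)  # 3 chunks of up to 4 MiB each
```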
> How is data checked for correctness?
With checksums. Once the data is uploaded, the user receives its checksum and must compare it with a locally computed checksum to make sure the data was transferred and stored correctly. The same checksum is used to ensure server-side data correctness.
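In client terms it looks something like this sketch (the MD5-based ETag here is an illustrative assumption, though it is common for simple S3-style PUTs):

```python
# Sketch of the client-side check described: compute a local checksum and
# compare it to the one the service returned. Assumes an MD5-based ETag,
# which is common for simple S3-style PUTs.
import hashlib

def verify_upload(local_bytes: bytes, returned_etag: str) -> None:
    local = hashlib.md5(local_bytes).hexdigest()
    if local != returned_etag.strip('"'):
        raise ValueError(f"checksum mismatch: {local} != {returned_etag}")

verify_upload(b"hello", '"5d41402abc4b2a76b9719d911017c592"')  # passes
```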
> How do you do listings at scale?
There is a trick: we support only one delimiter (/), which means we can use a very simple listing algorithm that scales very easily.
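The idea is roughly this (an illustrative sketch, not our actual code): with keys kept sorted, a prefix listing becomes a range scan, and "directories" fall out as collapsed common prefixes.

```python
# Sketch of delimiter listing over sorted keys: a prefix listing is a range
# scan, and keys containing the delimiter collapse into common prefixes.
import bisect

def list_keys(sorted_keys, prefix, delimiter="/"):
    results, common = [], set()
    start = bisect.bisect_left(sorted_keys, prefix)
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: nothing after this can match
        rest = key[len(prefix):]
        i = rest.find(delimiter)
        if i == -1:
            results.append(key)                  # a direct "file"
        else:
            common.add(prefix + rest[:i + 1])    # a "directory"
    return results, sorted(common)

keys = sorted(["a/1", "a/b/2", "a/b/3", "c"])
print(list_keys(keys, "a/"))  # (['a/1'], ['a/b/'])
```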
> During hardware failures, do you still make any durability or availability guarantees?
Yes, all data is split into server groups of 3 servers each. If one of the 3 servers fails, the group keeps running like nothing happened, though some in-flight requests may fail. If 2 servers fail at the same time, the group and all the data in it are put into read-only mode to avoid any possible data damage.
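In other words, the policy is roughly this (a trivial sketch of the behaviour just described):

```python
# Sketch of the failure policy as described: a 3-server group tolerates one
# failure with no change, and drops to read-only when two servers are down.
def group_mode(alive: int, total: int = 3) -> str:
    if alive >= total - 1:
        return "read-write"
    if alive >= 1:
        return "read-only"
    return "offline"

for alive in (3, 2, 1, 0):
    print(alive, group_mode(alive))
```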
> How do you handle hot content?
It is cached in RAM by the OS; we do not take any additional measures. The OS does a pretty good job.
> How do you resolve conflicts (eg concurrent writes)?
Some conflicts are resolved by the software where possible. Unrecoverable conflicts are returned to the user with HTTP 400 or 500 errors, to let them know that something is wrong and the request must be retried.
For concurrent writes we use a simple rule: the last one wins.
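Roughly like this sketch (whether "last" is decided by wall-clock time or a sequence number is an assumption here):

```python
# Sketch of last-write-wins conflict resolution: each write carries a
# sequence number (or timestamp), and the highest one becomes visible.
def resolve(versions):
    return max(versions, key=lambda v: v["seq"])

print(resolve([{"seq": 7, "data": b"old"}, {"seq": 9, "data": b"new"}]))
```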
Interesting, and thanks for the response. If I may probe a little further, I have a couple of follow-up questions.
1) Server groups of at least 3 mirrored servers, with a max of 5TB.
This seems like an interesting design choice. What do you mean by "at least"? Does this mean you'll have some data with more replicas? Are these server pools filled up and then powered down until they are needed? How do you choose which server pool to send the data to? And since you have a mirrored set of servers, when do you send a response back to the client?
Is the 5TB number something that is a limit for the storage server (ie 15TB total for a cluster of 3)? That seems rather low. It also doesn't divide evenly into common drive sizes available from HDD vendors today. So what kind of density are you getting in your storage servers? How many drives per CPU, and how many TB per rack? Since you're advertising on low price, I'd think very high density would be pretty important.
2) You say you split data into smaller chunks if it crosses some threshold. Let's suppose you split it into 1MB objects, once the data is bigger than 5MB. And each 1MB chunk is then written to some server pool which has replicated storage (via the mirroring). How do you tie the chunks back to the logical object? Do you have a centralized metadata layer that stores placement information? If so, how do you deal with the scaling issue there? If not, another option would be to store a manifest object that contains the set of chunks. But in either case, you've got a potentially very high number of servers that are required to be available at the time of a read request in order to serve the data.
Just as an example (and using some very conservative numbers), suppose I have a 100MB file I want to store and you chunk at 10MB. That means there are now 10 chunks replicated in your system, across a total of 30 unique drives. Now when I read the data, your system needs to find which 10 server pools have my chunks and then establish a connection with one of the servers in each group. This seems like a lot of extra networking overhead for read requests. What benefits does it provide that offset the network connection overhead?
And what happens when one of the chunks is unavailable? Can you rebuild it from the remaining pieces (which would essentially be some sort of erasure encoding)?
Overall, the chunking and mirroring design choices seem to me like they would introduce a lot of extra complexity into the system. I'd love to hear more about how you arrived at these choices, and what you see as their advantages.
In order to not make my long post even longer, I'll not pursue more questions around listings, failures, or hot content.
1) I made a typo: not "at least" but "at most", meaning that each server can store up to 5TB of data; these are servers with 2x3TB hard drives. The density is very low because we use cheap hardware that fails regularly, and a small data amount per server means high recovery speed. 5TB is a soft limit and can differ between server groups, but it doesn't at the moment. Each group of 3 servers has a total capacity of 5TB because the data is mirrored.
2) We have a centralised, replicated metadata layer which is stored on the same servers as the data itself. All chunks of an object are stored in one server group, so there is no need to connect to multiple groups to serve a file; it is enough to connect to one server from the group to get all the data. The metadata may be stored in a different server group, though.
All chunks are replicated to the 3 servers at the same time using a sequential append-only log, to ensure that all servers have the same data. This may introduce replication lag; if the lag gets too big for some server, that server is removed from the server group until its replication lag returns to normal (usually 1-3 seconds).
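A rough sketch of that lag rule (the threshold and bookkeeping here are illustrative, not our actual code):

```python
# Sketch of lag-based eviction on a sequential append-only log: each mirror
# applies log entries in order; a mirror whose oldest unapplied entry is too
# old is dropped from the group until it catches up. Threshold is assumed.
import time

MAX_LAG_SECONDS = 3.0  # assumed threshold

class Mirror:
    def __init__(self, name: str):
        self.name = name
        self.applied = 0     # number of log entries applied so far
        self.in_group = True

def check_lag(log, mirror):
    if mirror.applied < len(log):
        oldest_pending_ts = log[mirror.applied][0]
        if time.time() - oldest_pending_ts > MAX_LAG_SECONDS:
            mirror.in_group = False  # removed until it catches up

# Log entries are (timestamp, payload), appended sequentially.
log = [(time.time() - 10, b"chunk-1"), (time.time(), b"chunk-2")]
slow = Mirror("server-3")
check_lag(log, slow)
print(slow.in_group)  # False: lagging more than 3 seconds
```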
Actually, it is much simpler than I explained: the data layer, with its replication, consistency, and sharding, is completely transparent to the application layer, and it is really small and simple. Email me at support@s3for.me and I'll share the software details with you; you will see how simple it is.
Minor advice: on your interactive chart, I'd arrange it so that s3forme is first or last, and so that the values steadily increase (or decrease).
Also, I'd consider changing the font/color of s3forme so that it 'pops' a bit more compared to your competitors. As it is, my guess is that the user's eye is drawn to Rackspace.
Our prices will always be lower than Amazon S3's, I can tell you that for sure.
But our main advantage is support: you can always open a ticket and get an answer to your question within 24 hours, and usually much faster.
Presumably, Amazon gets enormous volume discounts for the disk drives that would be a main cost of running S3, and they run their business with notoriously tiny margins. It would seem nearly impossible to undercut them for the same product at the same level of service.
Each file is stored on 3 different servers at the same time. Hot files are moved to RAM automatically and are served very fast without even touching the disks. Our tests show that each server group has about 500Mb/s of free bandwidth capacity at the moment, and it can be increased to several Tb/s in a matter of days if needed.
Additional servers can be added to a server group within 2-3 days if necessary.
Which is all well and good if you think you are better than the half-dozen proven open source systems (such as OpenStack Swift) that are in production on large public clouds.
Right now you are an unknown player with an unknown product running on top of a bargain-basement server provider.
I wish you the best of luck but I'll be surprised if you don't slam into a growth wall and have to raise prices.
Yes, of course we have our reasons. The most important are:
- We didn't like the internal storage implementations of the existing systems; we believe we can do better. Very self-confident, I know, but I still think so after almost a year of development and production use.
- A production deployment from the start would have required a lot of hardware investment, and we decided to put that investment into development instead.
Personally, I think we made the right choice and I don't regret it. We learned a lot about cloud storage, scalability, performance, and the possible problems, and I believe this knowledge is very important for every team working in the cloud storage business.
You write your own controller and disk drive firmware? (That's not a facetious question. That was the single biggest source of pain on the storage product I worked on at Sun, and that pain was usually extremely difficult to work around satisfactorily.)
I would seriously reconsider putting that button on a page more than once or twice. It makes it look unprofessional.
It looks like an interesting storage service, but like others mentioned, you should consider renaming it. I was convinced you were reselling Amazon S3 storage when I visited the site.
So, there's no company info on the website at all. Why should I trust an unknown to host my data? Who's liable? How do I know you won't go out of business next week?
Last I checked, Mega does not support the Amazon S3 protocol nor even a RESTful one. IIRC, the majority of Mega magic is done in browser using a ton of JavaScript.
Yours: http://www.s3for.me/images/s3forme1.png
Salesforce: http://blog.database.com/wp-content/uploads/2012/11/intro-db...
Your S3: http://www.s3for.me/images/s3forme8.png
S3 from 2011: http://themetest.hollywoodtools.com/files/2012/06/s32.png
You: http://www.s3for.me/images/s3forme6.png
Unbreakable IT (2011): http://www.unbreakableit.com/uit/wp-content/themes/unbreakab...
Your roadmap: http://www.s3for.me/images/s3forme4.png
Oblaksoft (Nov 2012): http://www.oblaksoft.com/wp-content/uploads/2012/11/cloud-ch...
Though this next one could be a stock image:
Your keyboard: http://www.s3for.me/images/s3forme5.jpg
Cloudonlinebusiness (2010): http://www.cloudonlinebusiness.com/wp-content/uploads/2013/0...
Edit: Your TOS is Hetzner's almost verbatim. https://www.hetzner.de/en/hosting/legal/agb