
bmw.de. It's a URL that people from the US typically don't hit, so it's a good check of DNS and connectivity.


Have had mine for just over 3 months, no problems whatsoever.


Given the number of times people mentioned BBQ and smell in the announcement a few minutes ago, I think you had two strikes against you.

buuuurrnnnnn


OP resident here too.

Given the recent Google Voice integration with Sprint, and investment in Clearwire, I'm sure the proximity of Sprint doesn't hurt.


Hi from Sprint - it works great for me, I've been testing for a couple of weeks. This is actual back-end call routing as opposed to VOIPing the call through the app on the phone.

http://bit.ly/sprintgooglevoice


Sprint landing page - www.sprint.com/googlevoice


I worked on the Sprint side of this a bit, had the same question early on, and the answer I got was that it didn't change minutes-of-usage calculations... so no change to how numbers behave from a billing perspective. If you're calling a mobile number and have AMA, it goes in that bottomless bucket. If you're calling a landline, it uses Anytime minutes. Same story for shared lines-- no change to how minutes are used.


Thanks for the info! So if someone calls my current Google voice number which gets forwarded to my Sprint number it would not be AMA, but if I switch my Sprint number to Google Voice control and they call that it is still AMA, yes?

Sounds like a winner!


yep, that's my understanding. If you actually sign up for GV as your Sprint number, it acts just like your Sprint number. No forwarding involved.

Also, you get to keep your GV number for... I believe 6 months is the plan. It behaves the same way as your Sprint number during that time. That's my understanding right now, anyway.


I'm not going to pretend I know what Mach is, but around here, (big company that you're familiar with), rebooting/bouncing the servers is pretty much how issues are dealt with. "Response times outside of SLA: bounced the server." "Database connections timing out: bounced the server." "Users experiencing high load times for pages: restarted JVMs. Then bounced the servers."

Root cause seems to be "server up too long."


At my last sysadmin job, we had a policy of bouncing a server if its uptime exceeded one year, not to clean up any resources, but to make sure the config in the files matched the config that was running. Of course, not only were we running Linux (Gentoo, in fact), but a lot of our stuff was DJB services, including his service manager.
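
Not from the parent, but as a rough sketch of what that kind of uptime-policy check might look like (the one-year threshold and the Linux /proc/uptime path are assumptions, purely illustrative):

    # Flag a host whose uptime exceeds a policy threshold (illustrative only).
    UPTIME_LIMIT_DAYS = 365  # hypothetical one-year policy from the comment above

    def uptime_days():
        """Read uptime in days from /proc/uptime (Linux-specific)."""
        with open("/proc/uptime") as f:
            return float(f.read().split()[0]) / 86400

    if __name__ == "__main__":
        days = uptime_days()
        if days > UPTIME_LIMIT_DAYS:
            print(f"Uptime {days:.0f} days exceeds policy; schedule a reboot to "
                  "confirm the on-disk config matches the running config.")
        else:
            print(f"Uptime {days:.0f} days; within policy.")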

I'd argue that a 5 Whys analysis of "server up too long" leads to "server wasn't written well." YMMV


Richard P. Gabriel makes that point somewhere - that one of the downsides of an 'organic' system like a Lisp or Smalltalk image where you can rewrite or do anything dynamically is that you tend to do just that, and can wind up in a situation where you are no longer able to stop using the image because it has too many valuable changes which are too difficult to reverse-engineer.


That's a bit of an understatement. Basically all Squeak and VisualWorks images have been running since some time in the late '70s. They've been migrated from 16- to 32- to 64-bits and across chip architectures, but it's the same image. Other dialects were bootstrapped more recently, and some of them have retained the ability to create an image from scratch.

I guess bootstrapping is more common in the Lisp world.


I remember believing (though without any supporting evidence) that this was one reason why Smalltalk couldn't possibly catch on commercially. The system was so mutable that any kind of support was impossible; there was no way the support person could know what your system configuration was.


Now this is an excellent reason and timeframe for doing periodic reboots.

It's basically a scheduled sanity check for your infrastructure. It boils down to robustness and assurance. If parts of your system won't survive a reboot this is an unknown and therefore a risk.

If a system can't survive a reboot, it's better to find that out during a scheduled reboot, when personnel are on hand to deal with it, rather than relying on on-call personnel responding to monitoring alarms at some point.

Apart from the repair time being longer when relying on on-call technicians, there is also the general observation that production systems tend to become more valuable, or have more exposure, as their usage increases. A failure at a later point is therefore also more costly.


> Now this is an excellent reason and timeframe for doing periodic reboots.

Most of the time you don't need to do complete reboots. Restarting services usually takes care of the majority of the sanity checks.


> Restarting services usually takes care of the majority of the sanity checks.

Sure, that usually takes care of most checks, but the point is to eliminate unusual scenarios as much as possible. A service-level restart will catch service config mismatches, but it won't catch, for example, manual interface or routing table updates. It also won't catch the 'fragile root' problem, where the device the kernel is loaded from has become corrupt.

Bad things happen in the strangest places if you run enough systems.
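
As a rough illustration (my sketch, not anything the parent actually runs): diffing the live routing table against a baseline captured at deploy time will surface exactly the kind of manual drift a service restart never touches. Assumes Linux with iproute2; the baseline path is made up.

    import subprocess

    BASELINE = "/etc/baseline/ip-route.txt"  # hypothetical baseline file

    def live_routes():
        # Current kernel routing table via iproute2.
        out = subprocess.run(["ip", "route", "show"],
                             capture_output=True, text=True, check=True)
        return {line.strip() for line in out.stdout.splitlines() if line.strip()}

    def saved_routes():
        # Baseline captured when the box was last deployed/configured.
        with open(BASELINE) as f:
            return {line.strip() for line in f if line.strip()}

    if __name__ == "__main__":
        live, saved = live_routes(), saved_routes()
        for route in sorted(live - saved):
            print(f"unexpected route (manual change?): {route}")
        for route in sorted(saved - live):
            print(f"route missing vs. baseline: {route}")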


True. You would still need to reboot from time to time; it's just that you can restart services much more frequently, even automatically, if your humans know what is going down today and what they can expect if things go wrong.

But I agree - the objective is to detect those situations when things have gone wrong a long time ago and we just don't know it yet. ;-)


Mainframe shops regularly reboot their systems on a rolling basis.

A common pattern is that each logical partition gets rebooted once per month, on a rotating basis. Say you have 4 LPARs: LPAR 1 is rebooted on the first week, LPAR 2 on the second and so on.

The goal is to demonstrate that, in case of catastrophic failure, the system will come back up. If it doesn't come back up during the scheduled reboot, you have the advantage that you are rebooting outside production hours and have some time to fix the problem, minus the frantic phone calls.

Meanwhile, the other LPARs will keep running as if nothing is going on.
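
Purely as a sketch of that rotation (partition names invented; real shops drive this from their job scheduler, not a script like this):

    import datetime

    LPARS = ["LPAR1", "LPAR2", "LPAR3", "LPAR4"]  # hypothetical partition names

    def lpar_due_this_week(today=None):
        # One LPAR per week, so each gets bounced roughly once a month.
        today = today or datetime.date.today()
        week_of_month = (today.day - 1) // 7   # 0..4
        if week_of_month >= len(LPARS):
            return None                        # partial fifth week: nothing scheduled
        return LPARS[week_of_month]

    if __name__ == "__main__":
        due = lpar_due_this_week()
        print(f"Scheduled for reboot this week: {due or 'none'}")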


Yahoo reboots IM servers daily. Easier than bothering to find all the horrible leaks.

It's a business decision. Not worth getting emotionally involved; just weigh the cost of fixing vs. the cost of rebooting.


See, that's not actually solving the problem. That's more like pretending the problem doesn't exist. Instead of investigating and fixing it permanently so that customers don't experience problems, they'd rather just keep letting the problems fester.


Not everyone is "pretending that the problem doesn't actually exist." There are limited resources available to any and every organization. Many times, a proper fix can cost more than a band aid, sometimes even over time.

Here's a hypothetical, but not ridiculous, scenario. Which is a better way to spend resources? Track down and stamp out an extremely difficult resource leakage bug? Or simply bounce the server? The costs associated with the former may be HUGE. The costs associated with the latter may be small in comparison (cost of potential down time, minimal labor cost of bouncing the server(s), and costs associated with user reliability).
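
To put rough numbers on that hypothetical (every figure below is invented purely for illustration):

    # Back-of-the-envelope fix-vs-bounce comparison; all numbers are assumptions.
    engineer_hourly_cost = 150        # fully loaded $/hour
    hours_to_find_leak = 80           # deep RCA on a nasty resource leak
    fix_cost = engineer_hourly_cost * hours_to_find_leak

    reboots_per_week = 3              # how often the box gets bounced
    minutes_per_reboot = 10           # operator time plus brief degradation
    weekly_bounce_cost = reboots_per_week * (minutes_per_reboot / 60) * engineer_hourly_cost

    print(f"Fix once: ${fix_cost:,.0f}; keep bouncing: ${weekly_bounce_cost:,.0f}/week")
    print(f"Break-even after ~{fix_cost / weekly_bounce_cost:.0f} weeks")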

I wouldn't always chalk it up to whatever it is that your comment implies (laziness? willful ignorance?).


Yeah I'm absolutely implying something by my comment. I understand the time:value relationship and can expect some degree of RCA to go away if rebooting a server solves a problem for X amount of time, where X is less than a few days. But if you're spending the time opening an alarm, calling people to a bridge, agreeing to bounce a server, and then bouncing it, at some point that equation comes out in favor of actually doing some sysadmin/RCA and making it so you're not bouncing a server every 48/72 hours.

It also looks really JV to see those alerts day in/day out. Like, come on. We can't just fix the problem?


I really doubt anybody would accept that. The solution in many cases is simply to reboot the servers every night.

Sure, we would all love to fix the bugs, but if there are no easy clues and a regular reboot fixes it - who really cares when there are features to build?


Mach is a microkernel. That means it's a way to design an OS such that most of the things application programs rely on are outside the kernel, provided by servers that can be restarted and replaced without rebooting the whole system. For example, the majority of the filesystem is provided by a filesystem server, most of the networking code is in a networking server, and so on.

You begin to see why a microkernel that needs to be bounced more often than, say, Linux (which is not a microkernel) loses some of the appeal of having a microkernel in the first place.

(Aside: Mac OS X is based on Mach, but practically all of the stuff applications see is provided by a single server which (to the best of my knowledge) can't be restarted without taking down the whole system. It's a hybrid design.)

Also, just to confirm a stereotype, do you use Windows for your servers?


> Also, just to confirm a stereotype, do you use Windows for your servers?

Related tangent: There was a time when the US military's COTS (Commercial Off-The-Shelf) initiative resulted in AEGIS missile cruisers running largely on machines running Windows NT. Yes, the ships responsible for screening carrier task forces against attack had to be rebooted on a weekly schedule.

To be fair, the unmolested NT kernel is rock solid. I know people used to run RAID arrays with it, but for the desktop, Microsoft allowed the crossing of certain architectural lines.


And if the NT machines running your missile cruiser don't come back up, then you have real problems. During trials, the USS Yorktown had to be repeatedly towed into port after systems failures...

http://lists.essential.org/1998/am-info/msg03829.html


From what I understand the NT kernel was rock solid, if a little picky about hardware. The problem was that graphics performance suffered due to the strict isolation of the drivers from the kernel. MS relaxed those restrictions in later versions of their OS (either NT4 or Windows 2000), gaining better graphics performance at the cost of kernel stability.


> the unmolested NT kernel is rock solid.

which, if memory serves me right, directly translates to "OS/2 kernel is rock solid"


No, I don't think NT is a derivative of OS/2.

"Given these issues [with OS/2], Microsoft started to work in parallel on a version of Windows which was more future-oriented and more portable. The hiring of Dave Cutler, former VMS architect, in 1988 created an immediate competition with the OS/2 team, as Cutler did not think much of the OS/2 technology and wanted to build on his work at Digital rather than creating a "DOS plus". His "NT OS/2," was a completely new architecture." http://en.wikipedia.org/wiki/OS/2


Also closely related to "VAX kernel is rock solid"


Windows NT is more closely related to VMS than OS/2, but most directly related to DEC's unreleased Mica OS for their unreleased PRISM CPU architecture.

This is unsurprising: They're all designed by the same guy, Dave Cutler.


> the unmolested NT kernel is rock solid

Too bad it resents running programs on it so much...


It's not that. It's more boneheaded stuff like injecting components supporting the GUI into the kernel.

When that happens, it's not so much that it resents those programs as that it's been violated and turned into a warped version of itself. Sort of like Jeff Goldblum's character in The Fly.


Mach isn't a microkernel. For example, NeXTSTEP was based on Mach 2.5, which is a monolithic kernel. Mach 3.0 was the only microkernel version, and to my knowledge, no one has adopted it with any success.


Mach 3.0 was folded into XNU in roughly the same way that earlier versions of Mach were used, but with more of the BSD functionality ported down into Mach rather than hosted POE-style in kernel space.

So 3.0 was adopted, with success, but not as a microkernel hosting user-space servers.


Does "POE" stand for something? POSIX Operating Environment, maybe? A quick Googling and a look at the Mach Wikipedia page doesn't reveal the answer.


I poked around various mailing lists and even an archive of the poe source from a CMU server, but I couldn't find an authoritative expansion.


Proof of example.


Thanks for the great explanation, too, btw.


Hmm, nope. Solaris of some flavor I think.


Oh boy...

That's what you get when you assign the management of Solaris servers to MCSAs...


So if Windows systems are unreliable it's Windows' fault, and if other OSes are unreliable it's also Windows' fault?


No. It's just that Solaris admins (and every other kind of Unix admin) know very well not to bounce servers when they don't know what's wrong; they find out what's causing the problem and correct it. There is a very rich set of tools to make that easy.

Bouncing a server may even make the problem go away, but it does little to enlighten you on why it happened in the first place.

And yes, if your MCSAs tend to bounce servers all the time, that's because their experience has shown this works for Windows.


There are plenty of tools for troubleshooting problems on Windows. And your statement that "every kind of Unix admin knows not to do that" seems plainly at odds with the reality here.


> seems plainly at odds with the reality here.

You have a human resources problem, not a technical one.

