Visa Systems Issues (state.gov)
171 points by _0nac on June 22, 2015 | 111 comments



Does anyone else think that, all conspiracy theory aside, the information published is only partly true?

If it's a purely hw issue, say a piece of highly specialized hardware (hw crypto, piece of old mainframe, etc), it can take a very long time to source it (although a SPOF is surprising in such critical infrastructure), but it does not take 100 people to work on it, 24/7. It takes a dozen guys making angry phone calls to some suppliers every 2 hours...

If it really takes 100 people 24/7, the most likely explanation to me is that it's software-related and they have to rewrite a critical piece of software in record time. Causes:

-bugfix (which may be entirely unrelated to security)

-emergency migration from one hw vendor to another, for sourcing reasons, which entails rewriting part of the system.


I think you're taking that a bit too literally: I interpreted that as "we've got 100 tech guys and we've told them to fix this ASAP". Which, this being government, likely means 99 of them frantically trying to cover their asses and 1 guy doing the work.


Lol, at least one is working, unlike in Spain (;

Edit: forgot to put a reference. This is the official website to renew the National Identity Document:

- https://www.citapreviadnie.es/

You also got the big warning notice saying 'This Connection is Untrusted' on Firefox and Chrome? Yep, it's been a feature for many years.


I've worked for the government before & that is more like it.


Let me describe the structure of a project I witnessed but thankfully was never part of (and see if I can get the formatting right)

Subcontractor which did the work:

- 2 devs
- 1 accountant
- 1 project manager

Contracted under:

- 1 project lead
- 1 dev lead
- 4 QA engineers
- 1 lead QA engineer
- 3 accountants
- 1 lead accountant
- 2 document reviewers
- 1 lead document reviewer

The actual people doing the work spent 1 month writing documentation and then changed ...10 lines of code. This software was not mission critical, no lives were at stake, and if there was a bug the danger was negligible. That's government in action.

I wholeheartedly believe that there are 100 people at work here.


That sounds typical for any large legacy codebase serving hundreds of customers. It's not a government thing, it's just the nature of large projects. It's also not as simple as saying that it's waste. Changing 10 lines of code in an app that has been live for years can be the result of months of work and careful ruling out of 10,000 lines of code you decided not to change.


> 1 project lead - 1 dev lead - 4 QA engineers - 1 lead QA engineer - 3 accountants - 1 lead accountant - 2 document reviewers - 1 lead document reviewer

Were all these people 100% on the project? My experience is that these roles are usually overhead shared across lots of projects.


100% that project alone. Remember, QA people on government contracts are often less concerned with QA'ing the code and more with QA'ing the process within which the code is developed.


"100 people to work on it, 24/7" sounds like manual data entry to me - who wants to bet against a loss of a db without usable backups available, and a bunch of people now hand entering hardcopy forms and data out of emails back into some critical database?


> sounds like manual data entry to me

It could also be manual verification. Say they had two or more database nodes, a hardware failure on one could cause them to go out of sync and then they need to verify all the divergent changes.
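A minimal sketch of where that verification could start, assuming each node's rows can be pulled into a Python dict keyed by primary key (the function names and sample rows are hypothetical, not anything State is known to run): fingerprint every row and bucket the keys into "only on A", "only on B" and "on both but different".

```python
import hashlib
from typing import Dict, Hashable, List, Tuple

Rows = Dict[Hashable, Tuple]  # primary key -> row values (hypothetical shape)


def row_digest(row: Tuple) -> str:
    """Fingerprint one row. In a real setup each node would compute these
    locally so that only the hashes travel over the network."""
    return hashlib.sha256(repr(row).encode("utf-8")).hexdigest()


def divergent_keys(primary: Rows, replica: Rows) -> Dict[str, List]:
    """Bucket the keys whose rows differ between two database nodes."""
    only_primary = sorted(primary.keys() - replica.keys())
    only_replica = sorted(replica.keys() - primary.keys())
    changed = sorted(
        key for key in primary.keys() & replica.keys()
        if row_digest(primary[key]) != row_digest(replica[key])
    )
    return {"only_primary": only_primary, "only_replica": only_replica, "changed": changed}


if __name__ == "__main__":
    node_a = {1: ("Bloggs", "approved"), 2: ("Doe", "pending"), 3: ("Roe", "pending")}
    node_b = {1: ("Bloggs", "approved"), 2: ("Doe", "approved"), 4: ("Poe", "pending")}
    print(divergent_keys(node_a, node_b))
    # {'only_primary': [3], 'only_replica': [4], 'changed': [2]}
```

Finding the divergent keys is the cheap part; deciding which version of each one is correct is what would need people.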


... which means that somebody somewhere didn't have backups of some database server that died.


I'm guessing a backup technically existed (to satisfy any requirements or boss's-orders to the "letter of the law"), but nobody bothered giving any thought to restoring everything from that backup.

Of course, a backup that can't be restored (or nobody knows how to restore) is more or less equivalent to not having a backup, so this distinction probably doesn't matter.

// always remember to test your restore plan
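For anyone wondering what "testing the restore" means in the smallest possible case, here's a sketch using Python's built-in sqlite3 module, which is obviously not the Oracle setup discussed in this thread. It takes a backup with `Connection.backup`, reopens the copy as if the original were gone, runs `PRAGMA integrity_check`, and compares every table row by row. The `visa_case` table is a made-up example.

```python
import os
import sqlite3
import tempfile


def backup_and_verify(src: sqlite3.Connection, backup_path: str) -> bool:
    """Back up `src` to `backup_path`, then check that the copy restores cleanly.

    "Restore" here just means opening the backup file as a fresh database and
    comparing it with the source: the bare minimum a restore drill should do.
    """
    dest = sqlite3.connect(backup_path)
    try:
        src.backup(dest)  # online backup API: sqlite3.Connection.backup (Python 3.7+)
    finally:
        dest.close()

    restored = sqlite3.connect(backup_path)
    try:
        # 1. The restored file must pass SQLite's own consistency check.
        (status,) = restored.execute("PRAGMA integrity_check").fetchone()
        if status != "ok":
            return False
        # 2. Every source table must exist in the copy with identical rows.
        #    (Assumes ordinary rowid tables; WITHOUT ROWID tables would need ORDER BY pk.)
        tables = [name for (name,) in src.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        for table in tables:
            query = f'SELECT * FROM "{table}" ORDER BY rowid'
            if src.execute(query).fetchall() != restored.execute(query).fetchall():
                return False
        return True
    except sqlite3.DatabaseError:
        # A missing table or an unreadable file means the backup is not usable.
        return False
    finally:
        restored.close()


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE visa_case (id INTEGER PRIMARY KEY, surname TEXT)")
    source.executemany("INSERT INTO visa_case (surname) VALUES (?)",
                       [("Bloggs",), ("Doe",)])
    source.commit()
    with tempfile.TemporaryDirectory() as tmp:
        print(backup_and_verify(source, os.path.join(tmp, "backup.db")))  # True
```

A real drill would restore onto separate hardware from the actual backup media, and time it, but the principle is the same: the backup isn't done until the restore has been checked.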


Indeed - "usable" backups - they are ones you've tested restoring from (recently enough to be assured they still work on the latest version of your system).

I _hope_ they don't have someone saying "we had a backup - it was on a RAID set!".

It would surprise me less to discover the reported migration from Oracle on Windows to Oracle on Linux was still only partly done, and they were taking solid, reliable, usable backups - of the old not-yet-decommissioned Windows DB servers...

(For the record, I've made both of those mistakes (and more) in my career... Fortunately neither represented weeks of 24x7 remedial work by 100s of people.)


Damn, I wish this would happen at the IRS. I'm tired of paying taxes to an agency that does nothing to help. They spend hundreds of millions on computer systems but can't figure out how to do a backup.


I would be surprised if they didn't use OCR with possibly human checking.


Deploying OCR in an emergency seems like an insane thing to do, to me. Especially if you CAN just hire 100 people to do it.


"More than 100 experts across the country are working on this problem 24/7" is PR-speak.

There's a pile of folks between a few contractor firms, and the staff in the offices that manage those contracts, whose current top priority is fixing whatever the issue is.

They said specifically that embassies and consulates were having problems doing biometric checks, which I would imagine requires a State system to talk to a number of other federal/intel/defense systems to do records look-ups. If we do some further supposing, that web of interconnected systems may have had some underlying issues and I could definitely see it taking some time to get each firm involved in building or maintaining those systems together to figure out where the issues are and to figure out how to fix it.


Not necessarily; replacing the physical hardware is just one step in getting a system back to normal operation. They could have a spare already, but they haven't been keeping up their disaster recovery drills and now have to figure out how to set it up again from scratch. It doesn't even have to be custom hardware; it could just be a hard drive failure on a commodity server that was never backed up properly.


Or the system has a long and complex workflow and some parts of it involve human labour. When any part of the system has a backlog, the laborious part needs people thrown at it to clear the backlog.

This certainly is the case with the UK Passport office when it has issues.


I'm betting on the "we told all our IT workers to solve it" explanation, but their phrasing could still be literally true.

Imagine that software depends on complex hardware, but the hardware is not manufactured anymore, or the gov cannot legally buy it anymore (contract expired). If it fails, the software must be ported ASAP, which can take that kind of work.

I've never seen something like this happen, but I've seen enough instances of it being possible to imagine it would happen once in a while.


Maybe there are 2,147,483,647 rows in a table whose id is a signed int32? 2 billion isn't such a big number.
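To make the int32 idea concrete: a signed 32-bit id tops out at 2,147,483,647, and if nothing checks for overflow, the next value wraps around to -2,147,483,648. A sketch in Python, which has arbitrary-precision ints, so the wraparound is simulated with modular arithmetic; `next_id` is a hypothetical naive id generator, not anything from the CCD.

```python
INT32_MIN = -2**31       # -2,147,483,648
INT32_MAX = 2**31 - 1    #  2,147,483,647


def wrap_int32(value: int) -> int:
    """Map any Python int onto the signed 32-bit two's-complement range,
    i.e. what silent wraparound in a fixed-width integer produces."""
    return (value + 2**31) % 2**32 - 2**31


def next_id(current: int) -> int:
    """Hypothetical id generator with no overflow check."""
    return wrap_int32(current + 1)


if __name__ == "__main__":
    ids = [INT32_MAX - 1]
    for _ in range(3):
        ids.append(next_id(ids[-1]))
    print(ids)  # [2147483646, 2147483647, -2147483648, -2147483647]
```

As far as I know, most SQL databases refuse the insert with an error rather than wrapping silently, which just means the table stops accepting rows instead. Oracle's NUMBER-based integer columns have no 32-bit ceiling, so in an Oracle system the more plausible place for this bug would be application code holding ids in 32-bit ints. For scale: at the ~35 thousand visa cases a day quoted from the PIA further down, a one-id-per-case counter would take about 61,000 days (roughly 168 years) to reach 2,147,483,647; the risk comes from tables with many rows per case.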


Hmm, given they already had 'billions of rows' [1] (though probably distributed across many tables), they might have been wise enough to plan for this.

[1] https://news.ycombinator.com/item?id=9756534


Apparently last year's major failure was caused by their Oracle data warehouse going down hard:

http://foia.state.gov/_docs/PIA/ConsularConsolidatedDatabase...

"The Consular Consolidated Database (CCD) is one of the largest Oracle based data warehouses in the world that holds current and archived data from the Consular Affairs (CA) domestic and post databases around the world. As of December 2009, it contains over 100 million visa cases and 75 million photographs, utilizing billions of rows of data, and has a current growth rate of approximately 35 thousand visa cases every day"

Unclear what's gone wrong this time, but the mention of "biometrics" (like photographs) makes me suspect it's the same system.


An interview with the director responsible for the system, props to @kalleboo for finding this:

http://fcw.com/Articles/2014/10/20/State-Department-database...

Typical quote: "We knew we could run it on one node. We needed to have one very powerful node."


> Typical quote: "We knew we could run it on one node. We needed to have one very powerful node."

As someone who has done government contracting... it's always alarming when this path is suggested. The last time, it was a beefy server with about 50TB of RAM to keep all the data in memory, with multiple hard drives to hold backups. Ugh.


Mind talking more about what the challenges are in running a 50TB relational database?

I, and I'm sure many others, would be very interested in a blog post or long-ish comment with anecdotes and lessons learnt...


> Mind talking more about what the challenges are in running a 50TB relational database?

I wish I could but it wasn't my direct project so I wasn't part of the team implementing it. I'm not even sure it was successfully delivered.


The WOPR? How about a nice game of chess?


One of the problems with proprietary databases is that licensing issues are another barrier to creating proper clusters (and to have equal environments on development, testing and production).


I'd assume Oracle would still have a century long contract to ensure they still have a monopoly.


They did mention it's not the same thing this time.

"This is not the same problem we had with the CCD last year, which was a problem with the database caused by a software patch. This is a hardware failure, and we are working to restore system functions."


If you read that carefully, that does not preclude it from being a different problem with the CCD.


A Visa application system doesn't strike me as an obvious candidate for a relational database. I would rather store all the information related to a visa application in a document store, with a relational database used only as a sort of index. I wonder whether lots of systems are built on relational databases just out of habit, and then bump into these problems.


Say you're the tech lead for the project, built before 2001 [1], and you need to hire a vendor that can provide a cluster that won't break with 100TB+ and guarantee long term support, possibly for decades.

Who would you call?

[1] http://www.law.umaryland.edu/marshall/crsreports/crsdocument...


I don't remember the exact timeframes, but I suspect Digital and Sun Microsystems were likely-looking choices around that time. (Sun was the default choice for telco billing systems around then...)


Why do you need a cluster?


Back in 2001, 100TB was massive. The first 1 TB HDD didn't roll out until 2007.


It's my understanding that most of this is pictures and not "actual rows", the database without the pictures should be much smaller. You could put the pictures on a SAN.


To a conservative government purchaser working with a conservative software vendor on a system that doesn't do anything remotely fancy or magic, a relational database (Oracle, specifically) was most certainly the right choice 10 years ago, only moderately less so today.

And just to be clear, that's (mostly) a good thing.


> I would rather store all the information related to a visa application in a document store with a relational database only used as a sort of index.

Now you have two problems.


I assume the technical foundations predate document databases becoming popular in the enterprise and reliable at scale.


Why not? There should only be at most a few billion rows (since you can't have more visa applications than humans).


Never bumped into the "beautiful flexibility" of EAV (entity-attribute-value) databases, where the app devs assume all responsibility over things that database designers _really_ should be saying "Hell no!" to?

  +------------+------------+--------------------------------+
  | asset_id   | asset_name | asset_data                     |
  +------------+------------+--------------------------------+
  | 2147483646 | firstName  | Joseph                         |
  | 2147483647 | lastName   | Bloggs                         |
  |-2147483648 | phoneNumber| +1 415 555 1234                |
  |-2147483647 | email      | joebloggs@gmal.com             |
  +------------+------------+--------------------------------+
(I think I still have brain damage from trying to get "too smart" with a Magento eCommerce site once...)
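For anyone who hasn't met the entity-attribute-value pattern in the table above: each field of a record becomes its own row, so reading a record back means pivoting, and the schema can no longer enforce types or required fields. A small sketch with Python's sqlite3; the extra `entity_id` column (needed to group rows into one person), the pivot query and the data are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical pivot: one output column per known attribute name.
PIVOT_SQL = """
SELECT entity_id,
       MAX(CASE WHEN asset_name = 'firstName'   THEN asset_data END) AS first_name,
       MAX(CASE WHEN asset_name = 'lastName'    THEN asset_data END) AS last_name,
       MAX(CASE WHEN asset_name = 'phoneNumber' THEN asset_data END) AS phone_number,
       MAX(CASE WHEN asset_name = 'email'       THEN asset_data END) AS email
FROM asset
GROUP BY entity_id
ORDER BY entity_id
"""


def build_demo_db() -> sqlite3.Connection:
    """EAV table in the spirit of the one above, plus a hypothetical entity_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE asset (
            asset_id   INTEGER PRIMARY KEY,
            entity_id  INTEGER NOT NULL,  -- which person this attribute belongs to
            asset_name TEXT NOT NULL,     -- attribute name as free text: no typo protection
            asset_data TEXT               -- every value is stored as text
        )""")
    conn.executemany(
        "INSERT INTO asset (entity_id, asset_name, asset_data) VALUES (?, ?, ?)",
        [(1, "firstName", "Joseph"), (1, "lastName", "Bloggs"),
         (1, "phoneNumber", "+1 415 555 1234"), (1, "email", "joebloggs@gmal.com"),
         (2, "firstName", "Jane")])  # person 2 is missing three fields; nothing complains
    return conn


def fetch_people(conn: sqlite3.Connection) -> List[Tuple]:
    """Reassemble one row per entity from the attribute rows."""
    return conn.execute(PIVOT_SQL).fetchall()


if __name__ == "__main__":
    for person in fetch_people(build_demo_db()):
        print(person)
```

With an ordinary `person` table, a `last_name TEXT NOT NULL` column would reject the second record outright; here it silently comes back with `None` in three columns.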


The flexibility typically comes from transactions and joins, something document store proponents typically shy away from. And yes, if you give stupid devs powerful tools they will shoot themselves.


It's not only flexibility, but also discipline and taking away choices (for bad modelling) that comes from following the relational normal forms.


Of course you can, you might need to apply for a visa multiple times if you want to visit multiple times. In fact, most tourists have to do that. It's still about the same order of magnitude though.


Yes, that's what I meant.


The database has way more than Visa applications, it also centralizes a bunch of other records.


What happened to the "No one was fired for using Oracle or Microsoft" meme?


The meme doesn't imply that those products are good or never break, just that you won't get fired for choosing them. We'll have to wait and see if anybody involved in selecting Oracle gets fired over this.


I think it was "Nobody ever got fired for buying IBM" :p


and IBM as well, true :)


Was the State Department still running the whole Oracle database cluster on a single hardware node since last year's issues? http://fcw.com/Articles/2014/10/20/State-Department-database...


10g on Windows 2003, can't make this shit up.


This apparently started on June 10th, so down for 12 days and counting!


In entirely unrelated news:

June 4th - OPM breach announced. June 12th - OPM confirms security clearance records exposed.

I'm glad I don't have an urgent need for a US visa or passport, or any vaguely federal security, residency or travel-related paperwork to go through.


So they're saying "we're terrible and we don't care that much." Wow.

If this were a business, they'd be losing customers... but thankfully, they're a government apparatus that holds people over a barrel and doesn't have to provide decent enough service to offer enough visas to meet the demand, hence 12-30 million undocumented immigrants.


Most good companies have an extensive interview process for hiring new employees; this is the most direct comparison with admitting new immigrants to the country. The equivalent of having 12-30 million illegals in the US would be Caltrain deciding not to eject drunk hobos living under an overpass for trespassing. Instead of having immigrants compete on education/skills/job experience/employee referrals (aka relatives) like Canada, Australia or NZ do, the US allows unprofitable, illiterate and low-skill immigrants into the country.


The US has a process for highly skilled workers as well, not an easy one but it exists.

I don't have the numbers, but looking at the amount of illegal immigration into Europe, and at the "jobs" held by most of the semi-legal arrivals (refugees, asylum seekers, etc.), who also outnumber the high-skill workers coming into Europe, I wouldn't say this is a US-specific problem.

I would also suspect that Canada, with its points-based system, has the same issue. Their immigration system is just more publicized, and it was given priority especially during the late '90s and early 2000s, since they felt they were losing competitiveness to US-based companies.

Heck, I've been to Canada 3 times in the past 5-6 years, and the amount of what I would assume is "illegal" immigration in some of the cities there also seems quite high: quite a high percentage of workers of Asian and African descent who don't speak English or French and seem very wary of people in general.


What is government but a bunch of provided services? We could replace it with well written software. No?


The keystone service is cartel enforcer. That's probably not something that should ever take the human out of the decision-making loop. When the service is essentially, "I hold the same gun to everyone's head to make sure that everyone follows the same rules," you really don't want that automated.

Leaving aside the issue of whether such a service needs to be centralized or to exist at all, if you substitute human evaluation for a set of algorithmic rules, a systems cracker can corrupt the entire cartel more easily than someone individually subverting possibly thousands of independent human actors.

In terms of many of the other services typically provided by government, yes, those could potentially be replaced by software.


Technically, sure (assuming you also build the robot hardware) but we're quite far from having software that can negotiate treaties, figure out how to regulate water rights in California, etc...


Well of course I don't mean to completely replace the government.

But how about making it more lean? How about modernizing with the times? How about making it all much more transparent and accountable? Couldn't open source software do all of this?

We could start with systems that are used by smaller, poorer countries & grow from there.

A visa processing system might be a good start.


I completely agree with this. First, we need to convince lawmakers that such a system is necessary, which is, understandably, a bigger challenge.


I've been trying to get my marriage visa paperwork completed since May (http://nvc.state.gov/). The payment system was down for almost the entire month of May. After finally getting through step 2, I've been blocked constantly. The entire Social Security website was down one night this month. I currently need my tax transcripts from the IRS. The online system for printing them out is down indefinitely, and even the form to request they be sent by mail kept throwing up a "technical error has occurred" message.


I know that pain. It took my wife and me almost 5 years, nearly $2,000 and three interviews to get her green card. And that was with her already being in the States when we met and got married.

Keep at it. The feeling when it's finally over is great.


These stories always fascinate me. At that point why even bother validating the grounds for the visa - anyone who would put up with a process like that deserves it!

I immigrated to Japan - a country often held up as an example of xenophobia and resistance to immigration, and it only took a week for my spouse visa paperwork to be examined and approved. The only cost was for translations of some paperwork from home ($40 at the embassy) and then $20 for the residence card once I was approved.


My wife and I are trying to do the same thing; we're very close to applying for Adjustment of Status (for her), and are scrambling to make sure we don't miss any nitty-gritty details, which is proving to be quite the headache. Can you elaborate on why it took as long as it did?


On a slightly related note, the USA doesn't allow airport transit without a visa, which is taken for granted in the rest of the world. That basically makes Latin America an island, since most flights there are routed via the USA, its gatekeeper.


FBI site for background checks also has implementation issues:

http://www.fbi.gov/about-us/cjis/identity-history-summary-ch...

So they can record all our conversations in real-time, but it takes 3-4 months to do a simple query of criminal records.


We always knew that the FBI surveillance project was a complete and utter waste of money, designed simply to make it appear that the government is indeed doing something.


Different departments. The NSA and FBI don't talk to each other; NSA has all of the technological innovations for monitoring wide spreads of internet bandwidth.


"The NSA and FBI don't talk to each other"

You clearly have not read any of the Snowden docs.

http://cryptome.org/2013/11/snowden-tally.htm


You're both wrong, in a way. They interact, but that doesn't mean they are cohesive in utilizing each other's technologies/competencies.



What's funny is they used to do this stuff with much much slower computer systems back in the 70s and 80s.


That was before it became a political imperative to drown the government in a bathtub.


I'm waiting for advance parole (moved to the US in January) and can't get it in time to travel internationally to my best friend's wedding in the UK. Applied in Feb. Got told last week they need another 2 months to process it. I wonder if this is why?


Schedule a personal meeting at your closest USCIS (or whatever the immigration office is called these days) and explain it to the officer. They can give you a temporary "paper pass" you can use to travel until paperwork completes if they find your reason sufficient.

Best friend's wedding might not qualify as a good enough reason, but combined with the long delay it might - definitely worth your time talking to them.

I know someone who got this kind of pass to visit her sick father.


Took 14 months with Congressional intervention just to get my partner an immigrant visa -- which is presumably the easiest.

America has been so obsessed with thinking of itself as the "best country in the world!(TM)" for so long, it has meanwhile regressed into a failed state.


American Exceptionalism, etc. bother me too (an American), but it seems a bit incongruous to call it a failed state while simultaneously relating the story of someone desiring to immigrate to it so badly that they'd wait 14 months and get Congress to intervene. If it's so failed, your partner can leave and I'm sure someone else will be happy to come.


Of course I am a hypocrite. My partner wants to visit the US; I would prefer to live elsewhere. But, sometimes compromise is needed in relationships.

'failed state' is certainly hyperbole here -- there are large elements of society which continue to function, despite the problems at the national government level. It would truly be a failed state if, for example, the national government's performance level were propagated to the whole society.


As much as the US has inefficiencies, the hyperbole is a bit strong. There's quite a difference between being the "best country in the world" and a "failed state."


Why would you bring your partner into a "failed state?"


Certainly the mindset/passive prejudice that America is the "best country in the world (TM)" did not become a thing because of our once-fast Visa processing times.


The U.S. is only dysfunctional if you compare it to a small minority of handpicked countries, mostly Western/Northern European ones.

So this amounts to saying that the U.S. is dysfunctional compared to the most developed part of the world, which is almost a tautology.


Whoa, easy there. It's not very hard to find a failed state. And US is not one of them.


Those types of visa issues are common in many countries, especially in Western Europe.

Not to say that the US hasn't made a lot of stupid laws and policies around immigration, but they're not alone.


> There is no evidence the problem is cyber-security related.

have they fixed their data-leak into archive.org yet? (LOL) http://blog.valbonne-consulting.com/2015/05/20/misconfigurat...


Government holding biometric data is unsafe for citizens. There will always be problems, both with the security of the data and with the security of citizens.


Every visitor to the US is fingerprinted these days.


Except Canadians.

Unless they're NEXUS pass holding Canadians, in which case they have been fingerprinted. I just got fingerprinted this weekend for a NEXUS pass; my fingerprints are apparently "perfect". The US CBP agent taking them advised me not to take up a life of crime, as I would be caught very quickly. Or at least to wear gloves.


When did they start doing this?

I arrived two weeks ago, no fingerprint required.


http://www.immihelp.com/visas/usvisit.html

If you're Canadian, probably not.


They probably already have your fingerprints.


No, they do not. I had never been fingerprinted as a Canadian until I got a work visa for the USA. Many Canadians who visit the USA for tourism or similar will never have their fingerprints in a Canadian or US database under current laws.


And photographed


US visa processing is back online (same link).

"The Bureau of Consular Affairs reports that the database responsible for handling biometric clearances has been rebuilt and is being tested. 39 posts, representing more than two-thirds of our normal capacity, are now online and issuing visas. We are working to restore full biometric data processing."


Just to add to the conspiracy theory, wonder whether this is related: http://www.reuters.com/article/2015/06/08/us-g7-summit-obama...


It is natural for issues like this to happen from time to time. If it's not technology, it's something else. Australia's visa offices (DIBP) are on industrial action at the moment.


So, when was the last time a Fortune 500 company's IT systems failed so hard they couldn't do anything for two weeks?


On a technical level it's possible; on a practical level it's not. You can't get 100% uptime because of the unforeseen problems that will hit once in a while.

If the downtime were anticipated, it could always be averted. Even a Fortune 500 company can expect some downtime over a long enough period.

Take Sony earlier this year. The US government itself the year before.


I'm glad that they're at least prioritizing H-2A agricultural workers.


What a Muppet Show this entire situation is.


Statler and Waldorf are especially hilarious.


I wish others would also appreciate a little bit of humor on HN, but everything like that gets downvoted all the time.


Makes me glad that I accidentally put my RFID-enabled passport in the microwave, typed in 1 second and hit start.


It must be fun waiting in the non-RFID queue at immigration.


On 18 June 2015 the Lufthansa site looked like this: %templateSomethingVar/somethingVar1%, with lots of variables like that (not sure what template system they were using). Now it's okay.

But Polish Airlines is down too (unrelated systems, I guess)... could this be something coordinated?

Ah... probably not :)


Lufty's site has always had bugs like that.


HA-clustered microservices: 1

Dated megaliths: 0

(Edit: Sheesh, what's with all the downvotes? Hardware failures are a route-aroundable thing that should never cause downtime... as long as you don't adopt an outdated "here sits the one source of truth, and its name is [fat server manually configured]" psychology. HA clustering is at the point where it's a generic, drop-in thing for arbitrary services. If you fail to recognize the above, go do some learning.)
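A toy illustration of "routing around" a dead node, assuming the client can reach several interchangeable replicas: try each in turn and fail only if all of them do. The node functions are hypothetical stand-ins; a real HA setup also has to handle replication lag, health checks and split-brain, none of which this sketch attempts.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when no replica could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each replica in order and return the first successful result."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:  # treat a connection error as a dead node
            errors.append(exc)
    raise AllReplicasFailed(errors)


if __name__ == "__main__":
    def dead_node() -> str:
        raise ConnectionError("node-a: disk controller failure")

    def healthy_node() -> str:
        return "visa case 12345: approved"

    print(call_with_failover([dead_node, healthy_node]))  # visa case 12345: approved
```

The hard part in practice is not this loop but keeping the replicas' data consistent enough that any of them is safe to answer.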



