Hacker News new | past | comments | ask | show | jobs | submit login

> Let Office do the heavy work for you. Word and Excel have extremely complete object models, available via COM Automation ... You have a web-based application that’s needs to output existing Word files in PDF format. Here’s how I would implement that: a few lines of Word VBA code loads a file and saves it as a PDF using the built in PDF exporter in Word 2007. You can call this code directly, even from ASP or ASP.NET code running under IIS. It’ll work.

It'll work until it doesn't [1]. Like if you want to do two things at the same time.

> Considerations for server-side Automation of Office

> ...

> Reentrancy and scalability: Server-side components need to be highly reentrant, multi-threaded COM components that have minimum overhead and high throughput for multiple clients. Office applications are in almost all respects the exact opposite. Office applications are non-reentrant, STA-based Automation servers that are designed to provide diverse but resource-intensive functionality for a single client. The applications offer little scalability as a server-side solution ... Developers who plan to run more than one instance of any Office application at the same time need to consider "pooling" or serializing access to the Office application to avoid potential deadlocks or data corruption.

And if you want to use this functionality to service the requests of anonymous users, make sure to read up until "[u]sing server-side Automation to provide Office functionality to unlicensed workstations is not covered by the End User License Agreement (EULA)."

[1] https://support.microsoft.com/en-au/kb/257757




I have seen a production system (which is still live) using much like Joel's suggestion 2 - a Windows 2003 server that uses Excel to produce reports (inside an otherwise Linux shop). It's fragile, no one wants to touch it, it's costly to migrate, but too used to deprecate. It dies about once a month and has to be restarted.

I guess it works (although the output isn't particularly good), but it's drained quite a bit of maintenance time and even more morale.


I designed a system like that for the place I currently work. I'm positive the person who follows me in my position will curse me as the Devil. However, before I set up that system, documents were created with weekly batch merges that took one person two days to complete. Now a batch can be run in about 2 hours. I'm a huge Linux nerd, but it was easier to get something working with the existing infrastructure using COM than design something entirely new with PDFs and typesetting software, or whatever. Working in Windows drains my morale, sure, but making everyone else at my company that much more effective is a huge morale boost for myself, and it boosts the morale of everyone else here too. It also allows us to serve our constituents better, which is rewarding as well.

That being said, I wish I never had to program a mail merge in the COM ever again.


I think this is where the fact that Joel has not worked for MS for a long time shows.


Then it has shown that for the last 8 years, considering it was written in 2008


For over 10 years I've written and maintained Excel automation that integrates with a web-based research database. Beyond not being supported by the EULA, Microsoft flat out says in its documentation do not do this COM is not designed for this purpose.


It's not COM that's the problem. When you instantiate Excel you're firing up an out of process instance of Excel which you communicate with over DCOM.

The problem for web apps (for example written in Classic ASP or ASP.NET) is that each request can create its own instance of Excel, and if the request that did this didn't complete and cleanly close down Excel then you end up with loads of orphaned Excel instances lurking in the background. When this happens eventually the web server runs out of memory.

Excel is definitely not intended for this type of use. Some people get lucky because they know what they're doing, 99% of the rest should just stay away.


Can't you just write it as a Windows service then and use some sort of an asynchronous queue (like a file folder)?


Worked with this kind of tech for a few years... And trash it to go with Aspose components.

It's slow, and really hard to debug when it's not working (not only pdf output but merge fields).


Aspose is a complete no brainer, yet so many clients complain about the extra cost. Somehow they don't see the hours of their own time debugging this stuff as invariably costing far more.


Close. You don't even need a Windows service. Just us a serialized queue using MSMQ. Write the queue listener in Javascript and register with COM+. Excel automation on the server is easy if you do it right, and impossible if you do it wrong.


Most of the devs I've encountered over the years who want to use Excel server side have never heard of queuing.

> Just us a serialized queue using MSMQ. Write the queue listener in Javascript and register with COM+.

Sure, but now you have MSMQ and COM+ to deal with...and debugging/fixing when it breaks (under load), or isn't secured properly.

The smart folks shell out for a copy of Aspose Cells and get the job finished on time.


Subsequent to my MSMQ approach, I did a project with Aspose. It's a pretty good toolkit, and I would recommend it over the MSMQ/COM+ complexity - Unless you absolutely have to automate Office, which is sometimes true.

On more recent projects in last ~12 months, I have used OpenXML.


I use Java RMI to connect AIX to W2K3 running a service that uses Jacob COM to call Excel. It reads a CSV file off a Samba share on the AIX box and saves back an XLS.

The lock file is on the AIX side because Win files aren't atomic.

It's a bit Rube Goldberg-ish and has low traffic, but it's rock solid.



I have done similar crazy things with Lync, back before Microsoft opened up their APIs on the Office 365 side so that they were marginally usable.

I wouldn't recommend it...


Just out of interest, have you explored using the Apps for Office functionality? I think it could provide a more robust solution (robust in the sense that it's supported by Microsoft) for integrating web interfaces into Excel. Basically it allows you to build Excel extensions using HTML/CSS/JS. I could understand using COM before Apps for Office was available.

http://dev.office.com/docs/add-ins/overview/office-add-ins


I nearly had to support one of these - we were using embedded OpenOffice 3.2 on Solaris then Ubuntu to translate DOC into PDF. Someone suggested we use embedded Office; we said "certainly! you will need to pay for a Windows admin." We had two customers for this failed project, so they didn't go for it ...

the tale: https://reddragdiva.dreamwidth.org/599841.html




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: