Fixing the Ethernet Board from a Vintage Xerox Alto

linker3000 · on Nov 6, 2017

Noted that Ken had an issue when a 74S TTL part was replaced with a 74LS because the characteristics are different.

Back in the day (mid 1980s), when I was going through an electronic engineering apprenticeship with a flight simulator company and many boards were end-to-end TTL, it was generally understood that power-hungry 74S series TTL were used for a reason (generally speed) and should not be substituted lest 'funny things' happen.

Mind you, there was one time when debugging a glitch led to one chip being replaced with the same type from a different family - something like a 74LS replacing a 74F - and the propagation delay difference was enough to fix an edge-case timing issue. Due to project constraints, the root cause wasn't tracked down and so the board bill of materials made it clear that the 'odd one out' was correct and should not be changed.

PS: Ken - I think I still have a quantity of 74S TTL 'pulls' from the day, so if all else fails sourcing a part, look me up!

WalterBright · on Nov 7, 2017

Back in the 70s when I designed TTL circuits, the rule was to do the math to work out all the worst case timings. The design was done with the expectation that any part could be replaced with a faster part, and the design would still work. (Because parts got faster over time, not slower, and the slower parts would become unavailable.)
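
As a rough illustration of the worst-case arithmetic described above, here is a minimal Python sketch. All delay figures, the `Stage` type, and the function names are hypothetical and chosen only for illustration; they are not taken from any datasheet. The setup check uses the slowest delays, which is why a faster replacement part still passes it. The hold check uses the fastest delays, which is the one place a faster part can cause trouble.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One gate in a register-to-register path (illustrative values only)."""
    name: str
    t_min_ns: float  # fastest propagation delay
    t_max_ns: float  # slowest propagation delay


def setup_slack_ns(path, clock_period_ns, clk_to_q_max_ns, setup_ns):
    """Slowest case: data must arrive setup_ns before the next clock edge."""
    slowest = clk_to_q_max_ns + sum(s.t_max_ns for s in path)
    return clock_period_ns - setup_ns - slowest


def hold_slack_ns(path, clk_to_q_min_ns, hold_ns):
    """Fastest case: data must stay stable for hold_ns after the clock edge."""
    fastest = clk_to_q_min_ns + sum(s.t_min_ns for s in path)
    return fastest - hold_ns


# Hypothetical path: flip-flop -> XOR -> NAND -> flip-flop, 50 ns clock.
fast_path = [Stage("xor_fast", 3.0, 10.5), Stage("nand", 2.0, 5.0)]
slow_path = [Stage("xor_slow", 10.0, 40.0), Stage("nand", 2.0, 5.0)]

fast_setup = setup_slack_ns(fast_path, 50.0, clk_to_q_max_ns=9.0, setup_ns=3.0)
slow_setup = setup_slack_ns(slow_path, 50.0, clk_to_q_max_ns=9.0, setup_ns=3.0)
fast_hold = hold_slack_ns(fast_path, clk_to_q_min_ns=2.0, hold_ns=2.0)

print(f"setup slack with fast XOR: {fast_setup} ns")
print(f"setup slack with slow XOR: {slow_setup} ns")
print(f"hold slack with fast XOR:  {fast_hold} ns")
```

A negative slack means the path misses its deadline under the assumed numbers. In this made-up example the slower XOR fails the setup check and the faster one passes both.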

Declanomous · on Nov 6, 2017

Wouldn't replacing the 74S86 with a 74VHC86 have worked? I'm a hobbyist, so my knowledge is pretty limited[1], but I was under the impression that the VHC series chips were drop-in replacements for the S series in both timing and logic levels.

[1] Read as 'I know just enough to be dangerous, but not enough to understand why what I'm doing is dangerous.'

CodeWriter23 · on Nov 6, 2017

The way to confirm your assumption is to look at the different typical timing waveform charts from the data sheet for each part.

I once traced hundreds of thousands of dollars worth of bunk boards back to some RAM chips substituted on the basis of being a "drop in replacement" by the procurement department. The timing diagrams revealed the replacements possessed different timing characteristics. Enough so that the chips would occasionally leave the bus floating when the memory was read from the host side.

katastic · on Nov 6, 2017

"It's not a hardware problem, it's a human problem!"

ChrisGammell · on Nov 6, 2017

Had a chance to chat with Ken, really enjoyed hearing about the history of the Alto and the process they take for troubleshooting what is effectively priceless hardware (given the fact that so few are remaining): https://theamphour.com/361-an-interview-with-ken-shirriff/

sillysaurus3 · on Nov 6, 2017

I started probing the Ethernet board's input circuit with the oscilloscope. The board was receiving the input okay, but a few gates later the signals looked kind of sketchy, as you can see above.

Is it even possible to do this kind of debugging on modern hardware? Or is this a lost art?

SAI_Peregrinus · on Nov 6, 2017

It's possible, but harder. A modern Ethernet card would probably be doing this step entirely on one chip. A quick search shows the Microchip ENC28J60 as a pretty typical part. The result is that you'd be able to probe the input to the chip and the output, and see the failure at the output, and replace the chip. Effectively the same as what was done here, just with more components built into the chip. Louis Rossmann[1] does component-level MacBook logic board repairs, though 99% of the time he doesn't need to break out the oscilloscope to find the problem. It's almost always a power supply or liquid damage problem, or something similar that can be found with nothing more than a multimeter.

[1] https://www.youtube.com/playlist?list=PLkVbIsAWN2lsHdY7ldAAg...

sillysaurus3 · on Nov 6, 2017

https://www.youtube.com/watch?v=dK9pfbGSE4A is incredibly cool. Thanks for this series!

It's so neat to watch a pro in a completely different domain. I know some of the basic concepts, but it's mesmerizing to watch him reason through voltage levels and pinpoint a failed resistor. That's magical to me.

heywire · on Nov 6, 2017

Agreed. I became aware of Louis' work after looking for information on the infamous Mid-2010 MacBook Pro 15 GPU kernel panic. After watching this video [1], I was able to replace the same capacitor for ~$3 and restore a previously unusable MacBook.

[1] https://www.youtube.com/watch?v=DzcgT_fiVTA

sillysaurus3 · on Nov 6, 2017

https://youtu.be/xRwfJpNvWI0?t=1247

Bad idea to eat cereal while watching his videos... That was hilarious.

I love that he leaves in all the little mistakes. Most videos are polished and presented. Tutorials are nice, but you get so much more raw data from watching this. Plus it feels like having a conversation.

Looks like Ken has some videos like this too: https://www.youtube.com/watch?v=adEr2aRwHnI

heywire · on Nov 6, 2017

> I love that he leaves in all the little mistakes

In fact, in the video I mentioned, someone who was watching his livestream pointed out that he installed a polarized capacitor backwards...

emeraldd · on Nov 6, 2017

    "That's not unprofessional... This is unprofessional... "

That made my day! Thanks for pointing this out.

WalterBright · on Nov 7, 2017

I kept my old computers from the 80's to today. A couple years ago I tried to fire them up. None older than 10 years would come up. Some would crackle and belch smoke, others weird POST errors.

Oh well.

katastic · on Nov 6, 2017

I thought you needed a many-GHz scope/logic probe to capture modern Ethernet.

SAI_Peregrinus · on Nov 7, 2017

Not really. For example the chip I mentioned (and most others) includes JTAG test capability, and several programmable modes. That makes it possible to test the chip more easily. Also some of the most common errors (overvoltage/overcurrent applied to inputs) will likely result in a short to ground, so they'll show up when checking the chip's power input with a multimeter: it will be shorted to ground! Other errors can cause other symptoms, but needing to actually view the signal is pretty rare.

dfox · on Nov 6, 2017

Debugging this problem in this way was not possible in the time of the Alto. It became possible in the late 80's (with LeCroy DSOs), really practical in the early to mid 90's (HP/Agilent), and cheap only relatively recently (Chinese DSOs). In the days before that you relied on a CRT-based scope and tried to make the fault frequent enough to be visible on it.

To some extent this is easier today (but might not be so in a few years) because you can buy a realtime DSO that is fast enough to capture almost any computer interface used today with usable resolution (albeit you can probably find some successful small-ish hardware startup that would cost you less than such a scope).

The art in this kind of debugging is IMHO in using woefully inadequate test equipment to come up with a meaningful idea of what might be broken. And this is still done today, with TV repair shops fixing OLED TVs with nothing but a DVM and an RLC bridge, me looking at 400Mbps+ serial LVDS on a customer site with a 50MHz portable DSO (they didn't have anything else), and such.

labcomputer · on Nov 7, 2017

I don't see anything in the write-up that requires a DSO. A storage tube (analog) oscilloscope should suffice.

Most storage tube oscilloscopes supported single sweep capture in bi-stable mode. That is, you could clear the screen, reset the timebase, and capture the trace of a single trigger event. In bi-stable mode, the image should last hours. Some models (like the Tek 7313) even allowed the top and bottom of the screen to be set to storage or live independently.

On top of that, most scope camera shutters could be set up to automatically open during the forward sweep (called "single sweep" on Tek cameras). That allows you to set up the scope, and walk away while you wait for the rare event to occur. Tektronix even sold delay lines (big coils of hardline coax) with trigger pick-offs for the express purpose of displaying before-the-trigger waveforms.

No question that DSOs make this all easier, but saying it was impossible is a bit of a stretch.

sigstoat · on Nov 6, 2017

> Is it even possible to do this kind of debugging on modern hardware? Or is this a lost art?

EE's still have to diagnose hardware problems. Keysight will be happy to sell you diagnostic equipment for investigating problems with your 100Gbps ethernet devices. Your bank probably won't care to fund the necessary loans, though.

al2o3cr · on Nov 6, 2017

Sorta? Not all the signals are available, depending on the level of integration. OTOH, modern hardware also frequently has test support via things like JTAG, which can give you a ton of information and/or control the device directly. For instance, there's ChipScope from Xilinx (https://www.xilinx.com/products/design-tools/chipscopepro.ht...) which allows you to debug your FPGA hardware from outside without wasting a ton of pins. It's halfway between poking the board with an oscilloscope and running GDB, as it uses the metadata from your design to label things.

speleo_engr · on Nov 6, 2017

The input even on modern hardware is received by an Ethernet PHY which then has logic-level lines going into the SoC. So this level of debugging would be possible. Inside the SoC if something has failed, well, you are kind of screwed.

srcmap · on Nov 6, 2017

The fundamental engineering debugging process is similar.

I recently debugged packet issues after port speed changes from 100 Gbps to 25 Gbps to 10 Gbps. Break the issue down into sub-system components: check PHY and MAC loopback, check link status on both sides of the QSFP connections before and after speed changes, check counters (A LOT of them), enable debug packets to be sent to the CPU with a special command. Check all the VLAN and port settings and configuration commands over and over again. Enable debugging in the kernel driver to track down every byte and bit of every packet. Use tcpdump on the Linux socket layer when one gets to that point.

Instead of an oscilloscope, today's SoCs have a lot of counters inside that help one identify issues.
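
The counter-checking step can be sketched roughly as follows: snapshot the counters before and after an event such as a speed change, then look only at what moved. This is a minimal Python sketch under stated assumptions. The counter names and values are made up for illustration, and how the snapshots are actually collected (driver tool, sysfs, vendor CLI) is left out because it depends on the hardware.

```python
def counter_deltas(before, after):
    """Return {name: after - before} for every counter whose value changed."""
    deltas = {}
    for name, new_value in after.items():
        change = new_value - before.get(name, 0)
        if change != 0:
            deltas[name] = change
    return deltas


def suspicious(deltas, keywords=("err", "drop", "crc", "discard")):
    """Keep only changed counters whose name hints at a fault (heuristic)."""
    return {
        name: change
        for name, change in deltas.items()
        if any(key in name.lower() for key in keywords)
    }


# Hypothetical snapshots taken before and after a port speed change.
before = {"rx_packets": 1000, "rx_crc_errors": 0, "tx_packets": 900, "tx_dropped": 2}
after = {"rx_packets": 1500, "rx_crc_errors": 37, "tx_packets": 900, "tx_dropped": 2}

deltas = counter_deltas(before, after)
faults = suspicious(deltas)
print("changed:", deltas)
print("look here first:", faults)
```

The keyword filter is only a heuristic for narrowing down a long counter list; real devices name their counters differently.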

For complex issue, one does get tremendous high when the issue was ID and resolved.

speleo_engr · on Nov 7, 2017

Sure, that is very true during development or debugging a complex production issue as you mentioned. I meant troubleshooting and reworking along the lines of "this board used to work, now it doesn't". Most reworks I see of boards involve replacing parts, which in a modern Ethernet design would consist of the jack, some magnetics, PHY, and SoC.

nerpderp83 · on Nov 7, 2017

> one does get tremendous high when the issue was ID and resolved

I find getting a little high really helps with long debugging sessions.

subway · on Nov 6, 2017

If you can find the relevant documentation, sure. Modern hardware is made from black boxes on top of black boxes.

sgt · on Nov 6, 2017

That is a very interesting use of a BeagleBone. He made an interface to the Alto's 3Mbit/sec Ethernet using the BeagleBone's PRU (a programmable real-time unit of which I believe it has two). Someone also used the PRU to build a video card for a Macintosh SE: https://hackaday.com/2014/02/05/the-30th-anniversary-macinto...
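
To get a feel for why the PRU suits this job, here is a back-of-the-envelope cycle budget in Python. It assumes the PRU's 200 MHz clock with roughly one instruction per cycle, takes the 3 Mbit/sec figure from the comment above at face value, and assumes a Manchester-style line code that needs attention every half bit. The constant names are illustrative only.

```python
PRU_CLOCK_HZ = 200_000_000   # assumed PRU clock, about one instruction per cycle
LINE_RATE_BPS = 3_000_000    # nominal rate quoted in the comment above

# Whole PRU cycles available per bit and per half bit (assumed Manchester coding).
cycles_per_bit = PRU_CLOCK_HZ // LINE_RATE_BPS
cycles_per_half_bit = PRU_CLOCK_HZ // (2 * LINE_RATE_BPS)

# Bit time in nanoseconds, for comparison with scope traces.
bit_time_ns = 1e9 / LINE_RATE_BPS

print(f"bit time: {bit_time_ns:.1f} ns")
print(f"PRU cycles per bit: {cycles_per_bit}")
print(f"PRU cycles per half bit: {cycles_per_half_bit}")
```

Under these assumptions there are a few dozen deterministic instructions per half bit, which is tight but workable for bit-banging. A general-purpose core running Linux cannot promise that kind of timing.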

CodeWriter23 · on Nov 6, 2017

A word from experience: an extender card can introduce noise, which can lead to observations of self-induced problems due to the introduced noise. I don't think this is the case for your situation, but symptoms like the runt edges you observed could be caused by the extender.

Also, I have on occasion observed the magic of an extender card causing a flaky card to function properly.

If you get stumped, you might want to depopulate some non-essential cards to open up a space to get your hands and probes on the card under investigation, while it is directly plugged into the slot. Excuse me, I mean get your hands and probes in there while the system is powered off.

purplezooey · on Nov 7, 2017

What job can I get where people pay me to restore vintage gear like this? :)

(answer: "already be rich")

ChuckMcM · on Nov 7, 2017

Actually if you have the skills people will seek you out. The trick is to repair things and demonstrate said skills. Since most of the vintage gear is easily covered by a 100 MHz oscilloscope, a DMM, and a 10 MHz signal generator you can do a lot of that. Fixing things you pick up at thrift shops will help you develop the mindset and 'bodging' skills, then work your way up to more interesting things. You will be amazed at what you find in the bin at some places.