Sunday, 10 June 2018

Back from the Grave: Commodore VIC-20 (Part 3: blank, dark screen)


The Commodore VIC-20 - the "Friendly Computer".  Friendly perhaps to everyone except those who wanted to make it do anything significant.  The default RAM configuration meant it only had enough free RAM to hold a single page of 11-point text, making it perfectly adequate for documents which would fit but utterly dreadful for those that wouldn't.  Its 16 colour palette completely lacked any greys, and only 2 of the 4 colours in multicolour mode could even access the second 8.  Even its sound was limited by virtue of each of the three melody channels only having minor overlap in their range.

Yet, despite all this, it was a hit.  Even to this day people are still forcing it to do things never even dreamed of at the time of its creation.  In 2013, even Doom was ported to it - it didn't have all the levels, all the enemies, all the guns or all the music, and ran in a reduced size while requiring 32KB of RAM (all bar 1 expansion), but it is still there, playable, and relatively faithful to the original.  As yet the same cannot be said of the C64, which currently uses a co-processor to do the same (a 65c816, similar to that of a Super Nintendo/Famicom, running at 20MHz with 16MB of its own RAM) and still isn't nearly as complete (e.g. it has no sound).

This time, I'll be touching on the remaining one of the two rescue units from part 1.  In the interest of brevity, I'll carry on from where I left off previously, having just tested the unit with the substitute VIC chip and found the garbled screen readout.

I was actually reluctant to post this at first - hence the delay.  While we may have gotten it working, the solution feels very "brute force".  It may well have been the right course of action but it leaves very little room for learning from the experience, since we can't be exactly sure what part of the system was at fault.

As we now had a fully working system, diagnostics could be performed easily by way of substitution.  Essentially chips from the non-working system could be removed, one by one, and installed to see if the same issue could be replicated.  However, there was no point in just removing chips randomly - not only would it potentially take longer, but it wouldn't help us in future diagnostics.

Our first step was testing the ROMs.  Since we were aiming only to have the system boot at this time, we only looked at the BASIC and kernel ROM chips.  On boot, the VIC-20 would ordinarily read in the kernel, then BASIC, so we started with the latter, as the faulty machine seemed to be doing at least something.

Removing the BASIC ROM from the working machine would show that this was unlikely to be the cause.  Without the BASIC ROM, the screen was ready but blank - no visible corruption.  We then went on to confirm our suspicions by installing the chip from the not-working machine, and found that it worked perfectly.

The kernel ROM was much more obviously unlikely to be the problem.  Without this chip, the working system would simply show a blank screen.  No initialisation was taking place.  Once more, installing the chip from the not-working machine would show that it too worked perfectly.  While we unfortunately didn't have a solution, it did mean it wasn't a proprietary chip that was at fault, so it would be cheaper and easier to repair.

Our next target was the RAM.  As these are Revision D units, the RAM is divided into 5 chips - two 2K-byte chips and three 1K-nibble chips, with one of the 1K-nibble chips being used for colour RAM.  Removing one of the two system RAM 1K-nibble chips would result in the system blank-screening, so these would seem to be working; though again this was more food for thought for the 3rd unit.  We did, of course, confirm that these chips worked by substituting them into the working machine.
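As a quick sanity check on that chip count, the arithmetic can be sketched as below.  This is only an illustration: the chip sizes come from the description above, while the names (CHIPS, capacity_bits and so on) are made up for the example.

```python
# Chip inventory as described in the post:
# (count, words per chip, bits per word, role)
CHIPS = [
    (2, 2048, 8, "system"),  # two 2K x 8-bit chips
    (2, 1024, 4, "system"),  # two 1K x 4-bit chips, paired to form 1K bytes
    (1, 1024, 4, "colour"),  # one 1K x 4-bit chip holding colour nibbles
]


def chip_count():
    """Total number of RAM chips on the board."""
    return sum(count for count, _words, _width, _role in CHIPS)


def capacity_bits(role):
    """Total storage, in bits, of all chips with the given role."""
    return sum(
        count * words * width
        for count, words, width, chip_role in CHIPS
        if chip_role == role
    )


SYSTEM_RAM_BYTES = capacity_bits("system") // 8    # 8 bits per byte
COLOUR_RAM_NIBBLES = capacity_bits("colour") // 4  # 4 bits per nibble

if __name__ == "__main__":
    print(f"{chip_count()} chips")
    print(f"{SYSTEM_RAM_BYTES} bytes ({SYSTEM_RAM_BYTES // 1024}K) of system RAM")
    print(f"{COLOUR_RAM_NIBBLES} nibbles of colour RAM")
```

The two nibble-wide system chips only make up a single kilobyte between them, which is why losing either one takes out a whole 1K block rather than half of one.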

The two 2K-byte chips thus remained the most likely cause of failure - which makes sense, given the display data is stored at the end of system RAM.  Removing one of the chips from the working unit resulted in a completely garbled, but continually changing display.  Not quite what we were seeing.  Removing the other one, however, produced a result identical to the fault on the not-working unit.  It would appear we had a single non-functional RAM chip, which we confirmed by installing a known-working chip in the not-working system.
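To put rough numbers on "the end of system RAM", here is a small sketch.  The screen address ($1E00) and the 22 x 23 character layout are the standard defaults for an unexpanded VIC-20 rather than anything we measured, and the way the two 2K chips split the $1000-$1FFF block is purely my assumption for illustration; I haven't traced it on the board.

```python
# Default screen memory on an unexpanded VIC-20 (standard values, not measured).
SCREEN_BASE = 0x1E00
SCREEN_COLS, SCREEN_ROWS = 22, 23

# ASSUMED mapping of address ranges to RAM chips (hypothetical, for illustration).
REGIONS = [
    (0x0000, 0x03FF, "1K nibble pair (zero page, stack, system variables)"),
    (0x1000, 0x17FF, "2K byte chip, lower half of the 4K block"),
    (0x1800, 0x1FFF, "2K byte chip, upper half of the 4K block"),
]


def screen_range():
    """First and last address of the character screen."""
    size = SCREEN_COLS * SCREEN_ROWS  # one byte per character cell
    return SCREEN_BASE, SCREEN_BASE + size - 1


def region_for(addr):
    """Label of the assumed RAM region holding addr, or None if unpopulated."""
    for start, end, label in REGIONS:
        if start <= addr <= end:
            return label
    return None


if __name__ == "__main__":
    first, last = screen_range()
    print(f"screen: ${first:04X}-${last:04X}")
    print(f"held by: {region_for(first)}")
```

If that split is right, the whole screen sits inside just one of the two 2K chips, which would fit with the two removals producing different kinds of garbage.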

It was at this point we started to realise that there may be a more subtle issue with the system.  While it was now "working", it wasn't fixed.  It would run, but invariably it would glitch and stall.  Sometimes it would take seconds, other times it would run for around a minute, but it would always happen.

It was also at this point that we had serviced a faulty PS3 using a heat gun, which started giving me desperate ideas.

Initially, we simply put a bit of heat onto the rear side of the PCB.  This unit had very pronounced "bubbling" under the solder mask, which is simply a manufacturing defect that typically causes no problems beyond the aesthetic.  My line of thought was that one of these thicker traces may well have a small fracture, which could cause intermittent failure such as we were experiencing.

As it happens, this seemed to be a step in the right direction.  Following a brief, uncontrolled heating, it was "working" better: it would no longer freeze on start-up, and on occasion it would run for substantially longer before crashing.  In fact, it took so long to crash the first time that we thought it had been fixed.

Eventually we decided to do a second, controlled round of heating.  Removing all the chips we could (while damage was not expected, it's best to eliminate the possibility entirely), we proceeded to heat the board from the rear once more.  This time we kept applying the heat until the entire PCB had reached a surface temperature of 175°C (measured with an IR thermometer on the rear of the board), at which point it was left to cool overnight while lying as flat and level as we could get it.

The following day, testing resumed.  It fired up perfectly (as was normal by this time) so it at least wasn't any worse, and it started running.  And it kept running.  It looped the Donkey Kong demo for around an hour, successfully completed two levels of Doom and ran a memory test on 24KB of expanded RAM, all without even the slightest hint of a problem.

So ends the third and final part of this three-part BftG.  As I mentioned, I'm not entirely happy with this solution.  Clearly there was some kind of soldering damage in there, be it a loose component or perhaps one of the built-up traces as I suspected, but we have no way of knowing for certain which of those is the case or where on the board it was.  All that really matters, though, is that it's working at last, which isn't bad for a curb-side find.

Remember that electricity can be dangerous to both property and life.  If you don't respect it you can wind up seriously hurting or even killing yourself or others, and chances are it's going to be painful while it's happening.  Unless you're sure of what you're doing, avoid tampering with any electrical circuit - especially when it's live or mains powered - and always ensure you have a nearby assistant who can contact emergency services.
