N-SPC reference: https://wiki.superfamicom.org/nintendo-music-format-(n-spc)
DSO's notes: http://forum.metroidconstruction.com/index.php?topic=3184

Nomenclature
{
    So I *think* this is how it is.
    
    The SNES APU (audio processing unit) is called the S-SMP (written on the chip itself).
    The S-SMP uses the SPC700 instruction set (also found in Sony's CXP8nnnn microchips), inspired by the 6502 ISA.
    For this reason, the S-SMP is also called the SPC700 CPU or just "the SPC700" (when it's clear from the context).
    There's also the S-DSP, a separate chip for digital signal processing that's accessible from the APU.
    
    The software that drives the music and sound effects is called the SPC engine.
    The SPC engine commonly used in Nintendo games is called the N-SPC engine.
    
    DIR refers to the sample table (presumably short for "directory").
    The sample table is an array of pointers to the sample data, indexed by instrument number.
    The sample data is an array of samples stored in the BRR format (bit rate reduction).
    
    An echo period is 16 ms.
    
    A piece of music is called a tracker.
    A tracker is composed of tracks, which are played on different channels / voices.
    Both trackers and tracks are lists of instructions / commands interpreted by the SPC engine.
}

IO registers
{
    $F0: Testing functions (write only). Software should never change this register
    $F1: APU options (write only)
    {
        v = r0Ll0cba
        a: Enable timer 0
        b: Enable timer 1
        c: Enable timer 2
        l: Reset port $F4/$F5 input latches to 0
        L: Reset port $F6/$F7 input latches to 0
        r: Reading $FFC0..FF reads from ROM (the boot program). Note that writes always affect RAM
    }
    $F2: DSP register address
    $F3: DSP register data. Read/write forwards from/to the DSP register selected by $F2
    $F4..F7: CPU IO 0..3. Read/write forwards from/to CPU's APU IO registers (and vice versa)
    $F8..F9: Unused
    $FA..FC: Timer 0..2 dividers (write only). 0 = divide by 100h. Timers 0 and 1 have 8kHz sources, timer 2 has a 64kHz source
    $FD..FF: Timer 0..2 outputs (read only). Range 0..Fh. Cleared after read or if timer is disabled (by clearing bit in $F1)
}
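The timer dividers above can be turned into tick rates with a little arithmetic. This is a minimal sketch, assuming the nominal 8 kHz / 64 kHz sources listed for $FA..FC; the function name `timer_tick_hz` is made up for illustration.

```python
def timer_tick_hz(timer: int, divider: int) -> float:
    """Rate in Hz at which the output counter $FD+timer increments.

    `divider` is the byte written to $FA+timer; 0 means divide by 100h.
    """
    if timer not in (0, 1, 2):
        raise ValueError("timer must be 0, 1 or 2")
    if not 0 <= divider <= 0xFF:
        raise ValueError("divider must fit in one byte")
    # Timers 0 and 1 use the 8 kHz source, timer 2 uses the 64 kHz source.
    source_hz = 64000 if timer == 2 else 8000
    effective_divider = divider if divider != 0 else 0x100
    return source_hz / effective_divider
```

For example, `timer_tick_hz(0, 0x10)` gives a 500 Hz tick (one tick every 2 ms).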

DSP registers
{
    $v0: Left volume for voice v. Samples are multiplied by ±volume / 128
    $v1: Right volume for voice v. Samples are multiplied by ±volume / 128
    $v2..v3: Pitch scaler * 1000h for voice v (16-bit, low byte in $v2). Range 0..3FFFh. Defines the sample frequency (does not apply if noise enabled)
    {
        400h = 8 kHz (decrease pitch by two octaves)
        800h = 16 kHz (decrease pitch by one octave)
        1000h = 32 kHz (no adjustment)
        2000h = 64 kHz (increase pitch by one octave)
        3FFFh = ~128 kHz (increase pitch by just under two octaves)
    }
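The pitch table above is linear in the register value, so it can be sketched as a one-line conversion. The helper names `pitch_to_hz` and `pitch_to_octaves` are hypothetical.

```python
import math


def pitch_to_hz(pitch: int) -> float:
    """Sample playback rate in Hz for a 14-bit pitch value (1000h = 32 kHz)."""
    if not 0 <= pitch <= 0x3FFF:
        raise ValueError("pitch must be in 0..3FFFh")
    return 32000 * pitch / 0x1000


def pitch_to_octaves(pitch: int) -> float:
    """Octave shift relative to the unadjusted rate (1000h)."""
    if not 0 < pitch <= 0x3FFF:
        raise ValueError("pitch must be in 1..3FFFh")
    return math.log2(pitch / 0x1000)
```

Note that 3FFFh comes out slightly below 128 kHz, since the full two octaves would need 4000h.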
    $v4: Sample table index for voice v. AKA source/instrument number. Read when voice is keyed on or looped
    $v5..v6: ADSR settings for voice v (16-bit, low byte in $v5)
    {
        SSSsssssedddaaaa
        a: Attack rate
        {
            0: 64 ms
            1: 40 ms
            2: 24 ms
            3: 16 ms
            4: 10 ms
            5: 6 ms
            6: 4 ms
            7: 2.5 ms
            8: 1.5 ms
            9: 1 ms
            Ah: 0.625 ms
            Bh: 0.375 ms
            Ch: 0.25 ms
            Dh: 0.15625 ms
            Eh: 0.09375 ms
            Fh: 0.03125 ms (every sample)
        }
        d: Decay rate
        {
            0: 2 ms
            1: 1.25 ms
            2: 0.75 ms
            3: 0.5 ms
            4: 0.3125 ms
            5: 0.1875 ms
            6: 0.125 ms
            7: 0.0625 ms (every other sample)
        }
        e: Enable ADSR, otherwise gain is used
        s: Sustain rate
        {
            0: Forever
            1: 64 ms
            2: 48 ms
            3: 40 ms
            4: 32 ms
            5: 24 ms
            6: 20 ms
            7: 16 ms
            8: 12 ms
            9: 10 ms
            Ah: 8 ms
            Bh: 6 ms
            Ch: 5 ms
            Dh: 4 ms
            Eh: 3 ms
            Fh: 2.5 ms
            10h: 2 ms
            11h: 1.5 ms
            12h: 1.25 ms
            13h: 1 ms
            14h: 0.75 ms
            15h: 0.625 ms
            16h: 0.5 ms
            17h: 0.375 ms
            18h: 0.3125 ms
            19h: 0.25 ms
            1Ah: 0.1875 ms
            1Bh: 0.15625 ms
            1Ch: 0.125 ms
            1Dh: 0.09375 ms
            1Eh: 0.0625 ms
            1Fh: 0.03125 ms (every sample)
        }
        S: Sustain level
        
        When a voice is keyed on, its envelope is set to zero and is put into attack mode.
        Whilst in attack mode, the voice's envelope is increased periodically (the period is given by the attack rate).
        The envelope is increased by 20h, except for attack rate = Fh, where the envelope is increased by 400h.
        
        Once the voice's envelope reaches 7E0h+, it's capped to 7FFh and the voice is put into decay mode.
        Whilst in decay mode, the voice's envelope is decreased periodically (the period is given by the decay rate).
        The envelope is decreased by 1 + (envelope - 1) / 100h.
        
        Once the voice's envelope reaches (sustain level + 1) * 100h or less, it is put into sustain mode.
        Sustain mode is the same as decay mode, except using the sustain rate (which can be infinite) in place of the decay rate.
        
        When a voice is keyed off, it is put into release mode.
        Whilst in release mode, the voice's envelope is decreased every sample by 8 (or muted by BRR loop and mute).
    }
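The attack and decay rules above can be sketched as a small simulation. This is a simplified model, not a cycle-accurate one: it only counts envelope updates and multiplies by the period, ignoring any phase offset of the hardware rate counter. It assumes (from comparing the tables above) that attack rate a uses the same period as 5-bit rate 2a + 1 and decay rate d uses the same period as 5-bit rate 2d + 10h. All names are made up for illustration.

```python
# Period of each 5-bit rate in samples (ms values from the sustain rate table
# above, times 32 samples per ms). Entry 0 means "forever".
RATE_PERIOD = [0, 2048, 1536, 1280, 1024, 768, 640, 512, 384, 320, 256, 192,
               160, 128, 96, 80, 64, 48, 40, 32, 24, 20, 16, 12, 10, 8, 6, 5,
               4, 3, 2, 1]


def decode_adsr(word: int) -> dict:
    """Split the 16-bit ADSR word SSSsssssedddaaaa into its fields."""
    return {
        "attack": word & 0xF,
        "decay": (word >> 4) & 7,
        "enable": bool(word & 0x80),
        "sustain_rate": (word >> 8) & 0x1F,
        "sustain_level": (word >> 13) & 7,
    }


def attack_time_ms(attack: int) -> float:
    """Approximate time from key on until the envelope reaches 7E0h."""
    step = 0x400 if attack == 0xF else 0x20
    period = RATE_PERIOD[attack * 2 + 1]  # assumed mapping, see lead-in
    env, updates = 0, 0
    while env < 0x7E0:
        env += step
        updates += 1
    return updates * period / 32


def decay_time_ms(decay: int, sustain_level: int) -> float:
    """Approximate time spent in decay mode, starting from envelope 7FFh."""
    period = RATE_PERIOD[decay * 2 + 0x10]  # assumed mapping, see lead-in
    env, updates = 0x7FF, 0
    while env > (sustain_level + 1) * 0x100:
        env -= 1 + ((env - 1) >> 8)
        updates += 1
    return updates * period / 32
```

Under this model, attack rate 0 takes 63 updates of 64 ms each, so roughly 4 seconds to reach decay mode.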
    $v7: Gain settings for voice v
    {
        0eeeeeee ; Direct gain
        1mmrrrrr ; Custom gain
        e: Voice envelope = e * 10h
        m: Gain mode
        {
            0: Linear decrease
            1: Exponential decrease
            2: Linear increase
            3: Bent increase
        }
        r: Gain rate
        {
            0: Forever
            1: 64 ms
            2: 48 ms
            3: 40 ms
            4: 32 ms
            5: 24 ms
            6: 20 ms
            7: 16 ms
            8: 12 ms
            9: 10 ms
            Ah: 8 ms
            Bh: 6 ms
            Ch: 5 ms
            Dh: 4 ms
            Eh: 3 ms
            Fh: 2.5 ms
            10h: 2 ms
            11h: 1.5 ms
            12h: 1.25 ms
            13h: 1 ms
            14h: 0.75 ms
            15h: 0.625 ms
            16h: 0.5 ms
            17h: 0.375 ms
            18h: 0.3125 ms
            19h: 0.25 ms
            1Ah: 0.1875 ms
            1Bh: 0.15625 ms
            1Ch: 0.125 ms
            1Dh: 0.09375 ms
            1Eh: 0.0625 ms
            1Fh: 0.03125 ms (every sample)
        }
        
        Using direct gain, when a voice is keyed on, its envelope is set according to e.
        
        Using custom gain, when a voice is keyed on, its envelope is set to zero.
        The voice's envelope is modified periodically according to the gain mode (the period is given by the gain rate).
        Linear increase/decrease increases/decreases the voice envelope by 20h.
        Exponential decrease decreases the voice envelope by 1 + (envelope - 1) / 100h.
        Bent increase increases the envelope by 20h until the envelope is 600h+, then increases the envelope by 8.
        
        When a voice is keyed off, it is put into release mode.
        Whilst in release mode, the voice's envelope is decreased every sample by 8 (or muted by BRR loop and mute).
    }
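The gain byte and its four custom modes can be sketched as follows. The clamp to 0..7FFh at the end of `gain_step` is an assumption (the notes above only mention the 7FFh cap for ADSR attack), and the function names are hypothetical.

```python
GAIN_MODES = ["linear decrease", "exponential decrease",
              "linear increase", "bent increase"]


def decode_gain(value: int) -> tuple:
    """Decode a gain byte.

    Returns ("direct", envelope) for 0eeeeeee,
    or (mode name, 5-bit rate) for 1mmrrrrr.
    """
    if value & 0x80 == 0:
        return ("direct", (value & 0x7F) * 0x10)
    return (GAIN_MODES[(value >> 5) & 3], value & 0x1F)


def gain_step(mode: str, env: int) -> int:
    """Apply one periodic custom-gain update to an 11-bit envelope."""
    if mode == "linear decrease":
        env -= 0x20
    elif mode == "exponential decrease":
        env -= 1 + ((env - 1) >> 8)
    elif mode == "linear increase":
        env += 0x20
    elif mode == "bent increase":
        env += 0x20 if env < 0x600 else 8
    else:
        raise ValueError("unknown gain mode: " + mode)
    # Assumption: the envelope saturates at both ends of its 11-bit range.
    return max(0, min(env, 0x7FF))
```

For instance, `decode_gain(0xDF)` yields linear increase at rate 1Fh, i.e. one 20h step every sample.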
    $v8: Current envelope value / 10h for voice v (read only)
    $v9: Current sample value / 80h for voice v (read only)
    
    $0C: Left channel master volume. Samples are multiplied by ±volume / 128
    $1C: Right channel master volume. Samples are multiplied by ±volume / 128
    $2C: Left channel echo volume. Samples are multiplied by ±volume / 128
    $3C: Right channel echo volume. Samples are multiplied by ±volume / 128
    $4C: Key on flags
    $5C: Key off flags
    $6C: FLG (noise frequency, reset, mute and echo buffer write disable)
    {
        rmefffff
        f: Noise frequency
        {
            0: 0 Hz
            1: 15.625 Hz
            2: 20.8333 Hz
            3: 25 Hz
            4: 31.25 Hz
            5: 41.666 Hz
            6: 50 Hz
            7: 62.5 Hz
            8: 83.33 Hz
            9: 100 Hz
            Ah: 125 Hz
            Bh: 166.6 Hz
            Ch: 200 Hz
            Dh: 250 Hz
            Eh: 333 Hz
            Fh: 400 Hz
            10h: 500 Hz
            11h: 666 Hz
            12h: 800 Hz
            13h: 1000 Hz
            14h: 1333 Hz
            15h: 1600 Hz
            16h: 2000 Hz
            17h: 2666 Hz
            18h: 3200 Hz
            19h: 4000 Hz
            1Ah: 5333 Hz
            1Bh: 6400 Hz
            1Ch: 8000 Hz
            1Dh: 10666 Hz
            1Eh: 16000 Hz
            1Fh: 32000 Hz (every sample)
        }
        e: Disable echo buffer writes
        m: Mute amplifier
        r: Soft reset. Key off all voices and set envelopes to 0
    }
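The noise frequencies above are just 32000 Hz divided by the same per-rate periods used by the envelope tables, which the following sketch exploits when decoding the rmefffff byte. The period table is derived from the values in these notes; `decode_flg` and its field names are made up for illustration.

```python
# Period of each 5-bit rate in samples (32000 Hz / listed noise frequency).
# Entry 0 means the noise generator never updates.
RATE_PERIOD = [0, 2048, 1536, 1280, 1024, 768, 640, 512, 384, 320, 256, 192,
               160, 128, 96, 80, 64, 48, 40, 32, 24, 20, 16, 12, 10, 8, 6, 5,
               4, 3, 2, 1]


def decode_flg(value: int) -> dict:
    """Decode the $6C byte (rmefffff) into named fields."""
    period = RATE_PERIOD[value & 0x1F]
    return {
        "noise_hz": 32000 / period if period else 0.0,
        "echo_write_disabled": bool(value & 0x20),
        "mute": bool(value & 0x40),
        "soft_reset": bool(value & 0x80),
    }
```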
    $7C: Voice end flags. Reading gives the sample data end flags for each voice. Writing clears this register
    
    $0D: Echo feedback volume. Samples are multiplied by ±volume / 128
    $2D: Pitch modulation enable flags. Generates a frequency sweep effect
    {
        gfedcba0
        a: Modulate voice 1's pitch by voice 0's amplitude
        b: Modulate voice 2's pitch by voice 1's amplitude
        c: Modulate voice 3's pitch by voice 2's amplitude
        d: Modulate voice 4's pitch by voice 3's amplitude
        e: Modulate voice 5's pitch by voice 4's amplitude
        f: Modulate voice 6's pitch by voice 5's amplitude
        g: Modulate voice 7's pitch by voice 6's amplitude
    }
    $3D: Noise enable flags. When enabled, noise is output instead of sample data. Note that samples are still decoded and reaching the end of one can still mute the voice.
    $4D: Echo enable flags
    $5D: Sample table address / 100h
    $6D: Echo buffer address / 100h
    $7D: Echo delay (echo buffer size). Range 0..Fh. 0 = one sample (4 bytes). Otherwise delay = v * 16 ms (v * 2 KB)
    
    $cF: Echo FIR filter coefficient c (finite impulse response, basically just convolution)
    
    $xA..xB: Unused
    $1D: Unused
    $xE: Unused
}
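The echo buffer placement implied by $6D and $7D can be worked out as below. This is a minimal sketch; the function and parameter names are hypothetical, and it does not model what happens when the buffer runs past the end of the 64 KB address space.

```python
def echo_buffer(address_page: int, delay: int) -> tuple:
    """Return (start address, size in bytes, delay in ms).

    `address_page` is the value of $6D, `delay` the value of $7D.
    """
    if not 0 <= address_page <= 0xFF:
        raise ValueError("address page must fit in one byte")
    if not 0 <= delay <= 0xF:
        raise ValueError("delay must be in 0..Fh")
    start = address_page * 0x100
    # A delay of 0 still uses one 4-byte sample; otherwise 2 KB per 16 ms.
    size = 4 if delay == 0 else delay * 0x800
    return start, size, delay * 16
```

A delay of Fh needs 7800h bytes, almost half of the 64 KB of APU RAM, so the buffer address has to be chosen with care.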

Sample table format:
{
    Sample table contains up to 100h entries of four bytes:
        ssss llll
        s: Start address (used when voice is keyed on)
        l: Loop address (used when end of sample data is reached)
}
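Looking up a sample table entry could be sketched as follows, assuming `ram` is a 64 KB dump of APU RAM and that both addresses are stored as 16-bit little-endian words (the SPC700 is little-endian). The function name is made up for illustration.

```python
import struct


def read_sample_table_entry(ram: bytes, table_page: int, index: int) -> tuple:
    """Return (start address, loop address) for one sample table entry.

    `table_page` is the value of DSP register $5D (table address / 100h),
    `index` is the value of $v4 for the voice.
    """
    if not 0 <= index <= 0xFF:
        raise ValueError("index must be in 0..FFh")
    base = table_page * 0x100 + index * 4
    start, loop = struct.unpack_from("<HH", ram, base)
    return start, loop
```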

Sample data format (BRR):
{
    Sample data is written in 9-byte blocks: a header byte followed by 8 bytes holding sixteen 4-bit signed samples (high nybble first).
    Header byte has the format:
        aaaaiill
        a: Amplifier amount. The sample is shifted left by a - 1 into a 15-bit sample
        i: Interpolation mode
        {
            0: Output sample as is
            1: Output sample + p * 15/16
            2: Output sample + p * 61/32 - P * 15/16
            3: Output sample + p * 115/64 - P * 13/16
            
            Where p is the previous output sample and P is the output sample before that
        }
        l: Loop mode
        {
            0/2: Continue to next sample data block
            1: Loop and mute (jump to loop address from sample table, set voice end flag ($7C), release, envelope = 0)
            3: Loop (jump to loop address from sample table, set voice end flag ($7C))
        }
    
    Note that hardware glitches occur if the output sample is not within -3FFAh..+3FF8h.
}
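Decoding one block of the format above could be sketched as follows. This is a simplified decoder under several assumptions that the notes do not confirm: fractions are rounded with floor division, the output is simply clamped to 15 bits rather than reproducing the hardware glitches, and amplifier amounts above 12 are treated like any other shift. All names are hypothetical.

```python
def decode_brr_block(block: bytes, p: int = 0, P: int = 0) -> tuple:
    """Decode one 9-byte BRR block.

    `p` and `P` are the two most recent output samples from the previous
    block. Returns (samples, loop_mode, p, P) with the updated history.
    """
    if len(block) != 9:
        raise ValueError("a BRR block is exactly 9 bytes")
    header = block[0]
    shift = header >> 4          # a: amplifier amount
    mode = (header >> 2) & 3     # i: interpolation mode
    loop_mode = header & 3       # l: loop mode
    samples = []
    for byte in block[1:]:
        for nybble in (byte >> 4, byte & 0xF):  # high nybble first
            s = nybble - 16 if nybble & 8 else nybble  # sign-extend 4 bits
            s = (s << shift) >> 1                      # shift left by a - 1
            if mode == 1:
                s += p * 15 // 16
            elif mode == 2:
                s += p * 61 // 32 - P * 15 // 16
            elif mode == 3:
                s += p * 115 // 64 - P * 13 // 16
            # Assumption: plain saturation to a signed 15-bit range.
            s = max(-0x4000, min(s, 0x3FFF))
            samples.append(s)
            P, p = p, s
    return samples, loop_mode, p, P
```

To decode a whole sample, call this repeatedly on consecutive blocks, feeding the returned `p` and `P` back in, until a block reports loop mode 1 or 3.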