N-SPC reference: https://wiki.superfamicom.org/nintendo-music-format-(n-spc)
DSO's notes: http://forum.metroidconstruction.com/index.php?topic=3184

Nomenclature {
    So I *think* this is how it is.

    The SNES APU (audio processing unit) is called the S-SMP (written on the chip itself).
    The S-SMP uses the SPC700 instruction set (also found in Sony's CXP8nnnn microchips), inspired by the 6502 ISA.
    For this reason, the S-SMP is also called the SPC700 CPU or just "the SPC700" (when it's clear from the context).
    There's also the S-DSP, a separate chip for digital signal processing that's accessible from the APU.

    The software that drives the music and sound effects is called the SPC engine.
    The SPC engine commonly used in Nintendo games is called the N-SPC engine.

    DIR refers to the sample table (presumably short for "directory").
    The sample table is an array of pointers to the sample data, indexed by instrument number.
    The sample data is an array of samples stored in the BRR format (bit rate reduction).
    An echo period is 16 ms.

    A piece of music is called a tracker.
    A tracker is composed of tracks, which are played on different channels / voices.
    Both trackers and tracks are lists of instructions / commands interpreted by the SPC engine.
}

IO registers {
    $F0: Testing functions (write only). Software should never change this register
    $F1: APU options (write only) {
        v = r0Ll0cba
        a: Enable timer 0
        b: Enable timer 1
        c: Enable timer 2
        l: Reset port $F4/$F5 input latches to 0
        L: Reset port $F6/$F7 input latches to 0
        r: Reading $FFC0..FF reads from ROM (the boot program). Note that writes always affect RAM
    }
    $F2: DSP register address
    $F3: DSP register data. Read/write forwards from/to DSP RAM
    $F4..F7: CPU IO 0..3. Read/write forwards from/to CPU's APU IO registers (and vice versa)
    $F8..F9: Unused
    $FA..FC: Timer 0..2 dividers (write only). 0 = divide by 100h. Timers 0 and 1 have 8 kHz sources, timer 2 has a 64 kHz source
    $FD..FF: Timer 0..2 outputs (read only). Range 0..Fh.
        Cleared after read or if the timer is disabled (by clearing its bit in $F1)
}

DSP registers {
    $v0: Left volume for voice v. Samples are multiplied by ±volume / 128
    $v1: Right volume for voice v. Samples are multiplied by ±volume / 128
    $v2..v3: Pitch scaler * 1000h for voice v. Range 0..3FFFh. Defines the sample frequency (does not apply if noise enabled) {
        400h = 8 kHz (decrease pitch by two octaves)
        800h = 16 kHz (decrease pitch by one octave)
        1000h = 32 kHz (no adjustment)
        2000h = 64 kHz (increase pitch by one octave)
        3FFFh = ~128 kHz (increase pitch by two octaves)
    }
    $v4: Sample table index for voice v. AKA source/instrument number. Read when voice is keyed on or looped
    $v5..v6: ADSR settings for voice v {
        SSSsssssedddaaaa
        a: Attack rate {
            0: 64 ms
            1: 40 ms
            2: 24 ms
            3: 16 ms
            4: 10 ms
            5: 6 ms
            6: 4 ms
            7: 2.5 ms
            8: 1.5 ms
            9: 1 ms
            Ah: 0.625 ms
            Bh: 0.375 ms
            Ch: 0.25 ms
            Dh: 0.15625 ms
            Eh: 0.09375 ms
            Fh: 0.03125 ms (every sample)
        }
        d: Decay rate {
            0: 2 ms
            1: 1.25 ms
            2: 0.75 ms
            3: 0.5 ms
            4: 0.3125 ms
            5: 0.1875 ms
            6: 0.125 ms
            7: 0.0625 ms (every other sample)
        }
        e: Enable ADSR, otherwise gain is used
        s: Sustain rate {
            0: Forever
            1: 64 ms
            2: 48 ms
            3: 40 ms
            4: 32 ms
            5: 24 ms
            6: 20 ms
            7: 16 ms
            8: 12 ms
            9: 10 ms
            Ah: 8 ms
            Bh: 6 ms
            Ch: 5 ms
            Dh: 4 ms
            Eh: 3 ms
            Fh: 2.5 ms
            10h: 2 ms
            11h: 1.5 ms
            12h: 1.25 ms
            13h: 1 ms
            14h: 0.75 ms
            15h: 0.625 ms
            16h: 0.5 ms
            17h: 0.375 ms
            18h: 0.3125 ms
            19h: 0.25 ms
            1Ah: 0.1875 ms
            1Bh: 0.15625 ms
            1Ch: 0.125 ms
            1Dh: 0.09375 ms
            1Eh: 0.0625 ms
            1Fh: 0.03125 ms (every sample)
        }
        S: Sustain level

        When a voice is keyed on, its envelope is set to zero and it is put into attack mode.
        Whilst in attack mode, the voice's envelope is increased periodically (the period is given by the attack rate).
        The envelope is increased by 20h, except for attack rate = Fh, where the envelope is increased by 400h.
        Once the voice's envelope reaches 7E0h+, it's capped to 7FFh and the voice is put into decay mode.
        Whilst in decay mode, the voice's envelope is decreased periodically (the period is given by the decay rate).
        The envelope is decreased by 1 + (envelope - 1) / 100h.
        Once the voice's envelope reaches (sustain level + 1) * 100h or less, it is put into sustain mode.
        Sustain mode is the same as decay mode, except using the sustain rate (which can be infinite) in place of the decay rate.
        When a voice is keyed off, it is put into release mode.
        Whilst in release mode, the voice's envelope is decreased every sample by 8 (or muted by BRR loop and mute).
    }
    $v7: Gain settings for voice v {
        0eeeeeee ; Direct gain
        1mmrrrrr ; Custom gain
        e: Voice envelope = e * 10h
        m: Gain mode {
            0: Linear decrease
            1: Exponential decrease
            2: Linear increase
            3: Bent increase
        }
        r: Gain rate {
            0: Forever
            1: 64 ms
            2: 48 ms
            3: 40 ms
            4: 32 ms
            5: 24 ms
            6: 20 ms
            7: 16 ms
            8: 12 ms
            9: 10 ms
            Ah: 8 ms
            Bh: 6 ms
            Ch: 5 ms
            Dh: 4 ms
            Eh: 3 ms
            Fh: 2.5 ms
            10h: 2 ms
            11h: 1.5 ms
            12h: 1.25 ms
            13h: 1 ms
            14h: 0.75 ms
            15h: 0.625 ms
            16h: 0.5 ms
            17h: 0.375 ms
            18h: 0.3125 ms
            19h: 0.25 ms
            1Ah: 0.1875 ms
            1Bh: 0.15625 ms
            1Ch: 0.125 ms
            1Dh: 0.09375 ms
            1Eh: 0.0625 ms
            1Fh: 0.03125 ms (every sample)
        }

        Using direct gain, when a voice is keyed on, its envelope is set according to e.
        Using custom gain, when a voice is keyed on, its envelope is set to zero.
        The voice's envelope is modified periodically according to the gain mode (the period is given by the gain rate).
        Linear increase/decrease increases/decreases the voice envelope by 20h.
        Exponential decrease decreases the voice envelope by 1 + (envelope - 1) / 100h.
        Bent increase increases the envelope by 20h until the envelope is 600h+, then increases the envelope by 8.
        When a voice is keyed off, it is put into release mode.
        Whilst in release mode, the voice's envelope is decreased every sample by 8 (or muted by BRR loop and mute).
    }
    $v8: Current envelope value / 10h for voice v (read only)
    $v9: Current sample value / 80h for voice v (read only)
    $0C: Left channel master volume.
        Samples are multiplied by ±volume / 128
    $1C: Right channel master volume. Samples are multiplied by ±volume / 128
    $2C: Left channel echo volume. Samples are multiplied by ±volume / 128
    $3C: Right channel echo volume. Samples are multiplied by ±volume / 128
    $4C: Key on flags
    $5C: Key off flags
    $6C: FLG (noise frequency, reset, mute and echo buffer write disable) {
        rmefffff
        f: Noise frequency {
            0: 0
            1: 15.625 Hz
            2: 20.8333 Hz
            3: 25 Hz
            4: 31.25 Hz
            5: 41.666 Hz
            6: 50 Hz
            7: 62.5 Hz
            8: 83.33 Hz
            9: 100 Hz
            Ah: 125 Hz
            Bh: 166.6 Hz
            Ch: 200 Hz
            Dh: 250 Hz
            Eh: 333 Hz
            Fh: 400 Hz
            10h: 500 Hz
            11h: 666 Hz
            12h: 800 Hz
            13h: 1000 Hz
            14h: 1333 Hz
            15h: 1600 Hz
            16h: 2000 Hz
            17h: 2666 Hz
            18h: 3200 Hz
            19h: 4000 Hz
            1Ah: 5333 Hz
            1Bh: 6400 Hz
            1Ch: 8000 Hz
            1Dh: 10666 Hz
            1Eh: 16000 Hz
            1Fh: 32000 Hz (every sample)
        }
        e: Disable echo buffer writes
        m: Mute amplifier
        r: Soft reset. Key off all voices and set envelopes to 0
    }
    $7C: Voice end flags. Reading gives the sample data end flags for each voice. Writing clears this register
    $0D: Echo feedback volume. Samples are multiplied by ±volume / 128
    $2D: Pitch modulation enable flags. Generates a frequency sweep effect {
        gfedcba0
        a: Modulate voice 1's pitch by voice 0's amplitude
        b: Modulate voice 2's pitch by voice 1's amplitude
        c: Modulate voice 3's pitch by voice 2's amplitude
        d: Modulate voice 4's pitch by voice 3's amplitude
        e: Modulate voice 5's pitch by voice 4's amplitude
        f: Modulate voice 6's pitch by voice 5's amplitude
        g: Modulate voice 7's pitch by voice 6's amplitude
    }
    $3D: Noise enable flags. When enabled, noise is output instead of sample data.
        Note that samples are still decoded and reaching the end of one can still mute the voice.
    $4D: Echo enable flags
    $5D: Sample table address / 100h
    $6D: Echo buffer address / 100h
    $7D: Echo delay (echo buffer size). Range 0..Fh. 0 = one sample (4 bytes).
        Otherwise delay = v * 16 ms (v * 2 KB)
    $cF: Echo FIR filter coefficient c (finite impulse response, basically just convolution)
    $xA..xB: Unused
    $1D: Unused
    $xE: Unused
}

Sample table format: {
    Sample table contains up to 100h entries of four bytes:
        ssss llll
        s: Start address (used when voice is keyed on)
        l: Loop address (used when end of sample data is reached)
}

Sample data format (BRR): {
    Sample data is written in 9 byte blocks, a header byte followed by 8 bytes of 4-bit signed samples (high nybble first).
    Header byte has the format:
        aaaaiill
        a: Amplifier amount. The sample is shifted left by a - 1 into a 15-bit sample
        i: Interpolation mode {
            0: Output sample as is
            1: Output sample + p * 15/16
            2: Output sample + p * 61/32 - P * 15/16
            3: Output sample + p * 115/64 - P * 13/16
            Where p is the previous sample and P is the sample before
        }
        l: Loop mode {
            0/2: Continue to next sample data block
            1: Loop and mute (jump to loop address from sample table, set voice end flag ($7C), release, envelope = 0)
            3: Loop (jump to loop address from sample table, set voice end flag ($7C))
        }

    Note that hardware glitches occur if the output sample is not within -3FFAh..+3FF8h.
}
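
To make the BRR block layout above concrete, here is a rough Python sketch of decoding one 9-byte block. The names (decode_brr_block, sign_extend_nybble) are made up for illustration. The rounding (floor division) is an assumption, and the clamping / glitch behaviour of the real hardware is not modelled at all, so treat this as a reading aid rather than an accurate decoder.

```python
def sign_extend_nybble(n):
    """Interpret a 4-bit value as a signed number in -8..7."""
    return n - 16 if n & 8 else n


def decode_brr_block(block, p=0, P=0):
    """Decode one 9-byte BRR block (hypothetical helper).

    p is the previous output sample, P the one before it.
    Returns (samples, loop_mode) with 16 samples per block.
    """
    if len(block) != 9:
        raise ValueError("a BRR block is exactly 9 bytes")

    header = block[0]
    amplifier = header >> 4            # aaaa
    interpolation = (header >> 2) & 3  # ii
    loop_mode = header & 3             # ll

    samples = []
    for byte in block[1:]:
        for nybble in (byte >> 4, byte & 0xF):  # high nybble first
            # "Shift left by a - 1", written as << a then >> 1 so a = 0 works
            s = (sign_extend_nybble(nybble) << amplifier) >> 1
            # Interpolation modes from the notes; floor rounding is an assumption
            if interpolation == 1:
                s += p * 15 // 16
            elif interpolation == 2:
                s += p * 61 // 32 - P * 15 // 16
            elif interpolation == 3:
                s += p * 115 // 64 - P * 13 // 16
            samples.append(s)
            P, p = p, s
    return samples, loop_mode
```

To decode consecutive blocks, pass samples[-1] and samples[-2] of the previous block as p and P. The returned loop mode tells the caller whether to continue (0/2), loop and mute (1) or loop (3).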
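
The ADSR attack and decay rules from the DSP registers section can be traced step by step. The sketch below (adsr_trace is a hypothetical name) lists the envelope after each update from key on until the voice would enter sustain mode. It only counts updates, so timing is left to the rate tables above, and it assumes integer division in the decay formula.

```python
def adsr_trace(attack_rate, sustain_level):
    """Return [(mode, envelope)] after each envelope update, from key on
    until the envelope reaches the sustain threshold (hypothetical helper)."""
    trace = []
    env = 0  # envelope is zeroed on key on

    # Attack: +20h per update, or +400h for attack rate Fh
    attack_step = 0x400 if attack_rate == 0xF else 0x20
    while True:
        env += attack_step
        if env >= 0x7E0:
            env = 0x7FF  # capped, then decay mode begins
            trace.append(("attack", env))
            break
        trace.append(("attack", env))

    # Decay: subtract 1 + (envelope - 1) / 100h until the sustain threshold
    threshold = (sustain_level + 1) * 0x100
    while env > threshold:
        env -= 1 + (env - 1) // 0x100
        trace.append(("decay", env))
    return trace
```

With any attack rate other than Fh the attack phase takes 63 updates, so attack rate 0 (64 ms per update) gives roughly 4 seconds from key on to full envelope.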
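
A few of the DSP register encodings above reduce to simple arithmetic. The helpers below are a sketch only: the function names are invented, 2 KB is taken to mean 2048 bytes, and the 32 kHz base rate comes from the pitch table (1000h = 32 kHz).

```python
SAMPLE_RATE_HZ = 32000  # pitch 1000h plays samples at 32 kHz


def voice_register(voice, index):
    """DSP address of per-voice register $v<index> (voice 0..7, index 0..9)."""
    if not 0 <= voice <= 7 or not 0 <= index <= 9:
        raise ValueError("voice must be 0..7 and index 0..9")
    return (voice << 4) | index


def pitch_to_hz(pitch):
    """Sample frequency for a 14-bit pitch value (pitch scaler * 1000h)."""
    return (pitch & 0x3FFF) * SAMPLE_RATE_HZ / 0x1000


def echo_delay(edl):
    """Return (delay in ms, echo buffer size in bytes) for a $7D value."""
    edl &= 0xF
    if edl == 0:
        return 1000 / SAMPLE_RATE_HZ, 4  # one sample, 4 bytes
    return edl * 16, edl * 2048          # assumes 2 KB = 2048 bytes
```

For example, voice_register(3, 4) gives 34h (voice 3's sample table index) and echo_delay(5) gives an 80 ms delay using 10240 bytes of echo buffer.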