If MSR had been coded in assembly, it would likely have been a lot smaller. There are also some things I would do differently if I were starting from scratch. For example, the BRAM functions require a single, contiguous region of memory, but MSR uses a lot of separate arrays and variables. To save, all of those variables and arrays have to be copied into one large array, and to load, that large array has to be split back up. That eats up a LOT of code space. Going back and changing all of this would be a tremendous effort, at least two weeks of work on its own, so it's not really worth the trouble, and it would set the beta testing back considerably. It would have been better to keep all the data in one large char array from the start and implement a custom number system on top of it; I think that would have saved a lot of code space. But the data layout was settled before BRAM access was even added, and by then the code was too far along to go back and modify. If HuC had struct support, it would have been a simple case of writing the struct to BRAM, and that would be the end of it. But alas...
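
To give a rough idea of the packing and unpacking described above, here is a minimal sketch in plain C. It is not MSR's actual code: the variable names (`player_gold`, `player_level`, `inventory`), the sizes, and the byte layout are all made up for illustration, and the BRAM call itself is left out. The point is only that every separate variable needs its own copy code in both directions.

```c
/* Hypothetical game state, spread across separate globals. */
#define INVENTORY_SIZE 8
#define SAVE_SIZE (2 + 1 + INVENTORY_SIZE)  /* gold (2 bytes) + level + inventory */

unsigned int  player_gold;                  /* assumed to fit in 16 bits */
unsigned char player_level;
unsigned char inventory[INVENTORY_SIZE];

/* The one contiguous block that would be handed to the BRAM routines. */
unsigned char save_buf[SAVE_SIZE];

/* Copy every variable into the contiguous buffer before saving. */
void pack_save(void)
{
    unsigned char *p;
    int i;

    p = save_buf;
    *p++ = (unsigned char)(player_gold & 0xFF);         /* low byte  */
    *p++ = (unsigned char)((player_gold >> 8) & 0xFF);  /* high byte */
    *p++ = player_level;
    for (i = 0; i < INVENTORY_SIZE; i++)
        *p++ = inventory[i];
}

/* Split the buffer back into the separate variables after loading. */
void unpack_save(void)
{
    unsigned char *p;
    int i;

    p = save_buf;
    player_gold  = (unsigned int)p[0] | ((unsigned int)p[1] << 8);
    p += 2;
    player_level = *p++;
    for (i = 0; i < INVENTORY_SIZE; i++)
        inventory[i] = *p++;
}
```

With only three variables this is small, but the copy code grows with every array and variable in the game, which is where the code space goes.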

HuC is powerful, but knowing how to use it is the trick. Array access, for example, is extremely slow, which makes arrays difficult to use in fast-paced action games, exactly where they are normally required. What you do instead is get a pointer to the array (still slow, but it only has to be done once) and use that pointer for all references in the code. That saves a ton of time. Using globals is also really a "must" because of HuC's slow stack method for passing function arguments; this defies "normal" C coding logic, where globals are best used sparingly.
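
A minimal sketch of both tricks, in plain C. The names (`shot_x`, `shot_count`, `g_dx`, `move_shots`) are hypothetical and not taken from MSR, and how much time this saves depends on the code HuC generates, so treat it as an illustration of the pattern rather than a measured result.

```c
#define MAX_SHOTS 16

/* Hypothetical game data, kept global so nothing goes through the stack. */
int shot_x[MAX_SHOTS];
int shot_count;
int g_dx;   /* acts as the "argument" to move_shots() */

/* Move every active shot by g_dx. */
void move_shots(void)
{
    int *p;
    int i;

    p = shot_x;                 /* get the array pointer once */
    for (i = 0; i < shot_count; i++) {
        *p += g_dx;             /* no indexed access inside the loop */
        p++;
    }
}
```

The caller sets `g_dx` and `shot_count` and then calls `move_shots()` with no parameters, instead of writing something like `move_shots(shot_x, shot_count, dx)`.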