Assuming you know where the tile is in the BAT, you can read its value via n = vram[addr], and rewrite it via vram[addr] = n. All in HuC.
It's a cool programming-metaphor, but it's actually (slightly) slower than the dedicated __fastcall functions that do the same thing, so FYI, I removed those vram[] arrays in the new HuC.
But much quicker in assembler....
fill_vram() {
#asm
  ... assembler code here ...
#endasm
}
We do things like that quite a bit; the function is callable from HuC, and runs at assembler speed (because it is assembly).
Keep in mind, though, this is assuming you are not using the HuC map functions, and will handle scrolling / re-writing the BAT yourself.
Yep ... HuC is OK, as long as you're using the built-in library functions (written in assembly-language) to do stuff ... but anything that it doesn't directly support is only going to run fast if you write your own optimized assembly language code for it.
Arkhan and TheOldMan have spent a long time learning just what does, and does not, need that kind of optimization in their games.

As always, please excuse my ignorance but I love hearing and clarifying this stuff.
There's never anything wrong with asking questions!

max 256 tiles per map,
Is this the conflict with VRAM limitations? My next question could answer this one, but is this why there was no room in VRAM for even a single additional tile in the screenies in question?
It's because the map in HuC uses a byte to store the tile number, rather than a 16-bit word/int.
That halves the amount of space needed for the map ... but it means that you're limited to 256 tiles.
And there's no room in the map to store a palette number either, so the palette number to use is looked-up from the tile number, instead of just being a part of the map-data.
It's a design trade-off ... reducing the size of stuff in RAM/HuCard, but causing a limitation in the graphics.
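To make the trade-off concrete, here is a minimal sketch in plain C (not HuC's actual library code) of a byte-per-cell map where the palette is looked up from the tile number. The names map_data, tile_palette, bat_word and map_cell_to_bat are made up for illustration, the 32x32 map size is arbitrary, and for simplicity it treats each map cell as a single BAT word, ignoring that a 16x16 tile really covers four BAT cells. The BAT word layout (palette in the top 4 bits, pattern number in the low 12) is the hardware format.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_W 32   /* hypothetical map width in cells  */
#define MAP_H 32   /* hypothetical map height in cells */

/* One byte per map cell: a tile number 0..255, with no room for a palette. */
static uint8_t map_data[MAP_W * MAP_H];

/* Because the map cannot hold a palette, it is looked up per tile number. */
static uint8_t tile_palette[256];

/* Build one 16-bit BAT word: palette in bits 15..12, pattern in bits 11..0. */
static uint16_t bat_word(uint8_t palette, uint16_t pattern)
{
    return (uint16_t)(((palette & 0x0F) << 12) | (pattern & 0x0FFF));
}

/* Convert the map cell at (x, y) into a BAT word.
   tile_base is the pattern number of tile 0 in VRAM (an assumption). */
static uint16_t map_cell_to_bat(unsigned x, unsigned y, uint16_t tile_base)
{
    uint8_t tile;

    assert(x < MAP_W && y < MAP_H);
    tile = map_data[y * MAP_W + x];
    return bat_word(tile_palette[tile], (uint16_t)(tile_base + tile));
}
```

Note how the same tile number always ends up with the same palette: to show one tile in two palettes you'd have to spend two of your 256 tile numbers on it.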
Consider ... 256 16x16 tiles take up 32KB of VRAM ... that probably seemed like a sensible limit when HuC was written.
OTOH ... 256 8x8 tiles only take up 8KB of VRAM, which severely limits the graphic quality.
Design trade-offs are like that ... there's no one-size-fits-all solution!
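If you want to check the VRAM figures above yourself, the arithmetic is simple enough to sketch in plain C. The helper name tileset_bytes is made up; the only hardware fact it relies on is that an 8x8 tile at 4 bits per pixel occupies 32 bytes.

```c
#include <assert.h>

#define BYTES_PER_8X8 32u  /* 8 * 8 pixels * 4 bits per pixel / 8 bits per byte */

/* VRAM bytes needed for `count` tiles of w x h pixels (multiples of 8). */
static unsigned long tileset_bytes(unsigned count, unsigned w, unsigned h)
{
    assert(w % 8 == 0 && h % 8 == 0);
    return (unsigned long)count * (w / 8) * (h / 8) * BYTES_PER_8X8;
}
```

tileset_bytes(256, 16, 16) comes to 32768 bytes (32KB), and tileset_bytes(256, 8, 8) comes to 8192 bytes (8KB).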

max 256x256 map size,
Does that mean 4096 is the max size for a mappy (256x256 16x16 tiles)?
As in 4096x4096 pixels maximum map size (256 tiles x 16 pixels per tile)? Well yep ... kind-of.
Except that HuC also has a secondary mechanism for using multiple 256-tile-x-256-tile maps within a 16384-tile-x-16384-tile area.
But that's getting really complicated, and I suspect that you'd be better-off writing your own map functions if you needed large maps like that.
The low-RAM (i.e. HuCard) 4th-gen stuff that I've done used the tile/block/map design.
There the map is still byte-per-entry, but the byte is a block number (0..255).
Each block is then a 4-word 2x2 tile entry, allowing up to 1024 tiles, and allowing each tile in each block to specify its own palette.
Still limited ... but it allows for the reuse of tiles with different palettes, and it allows for more tiles (i.e. better graphics).
But again, it has its own design trade-offs.
BTW ... if you get sneaky, you can use certain map codes to trigger the loading/swapping of the block definitions to work around the 256-block limit.
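Here is a rough sketch in plain C of how the tile/block/map lookup described above could work. This is an illustration under my own assumptions, not the code from any actual game: the names Block, blocks, map_data and bat_word_at are invented, the 64x64 map size is arbitrary, and each block stores four ready-made 16-bit BAT words (palette plus tile number) in top-left, top-right, bottom-left, bottom-right order.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_W 64   /* hypothetical map width in blocks  */
#define MAP_H 64   /* hypothetical map height in blocks */

/* A block is a 2x2 group of 8x8 tiles, stored as four 16-bit BAT words,
   so every tile in the block carries its own palette bits. */
typedef struct {
    uint16_t cell[4];  /* top-left, top-right, bottom-left, bottom-right */
} Block;

/* The map is still one byte per entry, but the byte is a block number. */
static uint8_t map_data[MAP_W * MAP_H];

/* 256 blocks * 4 cells = up to 1024 distinct tile references. */
static Block blocks[256];

/* Return the BAT word for the 8x8 tile at (tx, ty), in 8x8-tile units. */
static uint16_t bat_word_at(unsigned tx, unsigned ty)
{
    const Block *b;

    assert(tx < MAP_W * 2 && ty < MAP_H * 2);
    b = &blocks[map_data[(ty >> 1) * MAP_W + (tx >> 1)]];
    /* Low bit of ty picks the row, low bit of tx picks the column. */
    return b->cell[((ty & 1) << 1) | (tx & 1)];
}
```

The map costs the same RAM as before, and the extra indirection costs 8 bytes per block (2KB for all 256), which is where the trade-off comes from.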

When you have more RAM, like on the PCE CD, you can use 16 bits per entry in the map, with up to 4096 tiles, and full palette usage ... and then sectorize and compress everything and just decompress the sectors that are close to the player.
That gives you the greatest flexibility ... but it comes at the cost of complex art-tools and runtime code.
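For the 16-bit case, a minimal sketch in plain C of packing and unpacking a map entry, assuming the entry simply mirrors the hardware BAT word (12-bit tile number, 4-bit palette). The function names are made up, and the sectorizing/compression side is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a tile number (0..4095) and palette (0..15) into one 16-bit entry. */
static uint16_t pack_entry(unsigned tile, unsigned palette)
{
    assert(tile < 4096u && palette < 16u);
    return (uint16_t)((palette << 12) | tile);
}

/* Extract the 12-bit tile number from an entry. */
static unsigned entry_tile(uint16_t entry)
{
    return entry & 0x0FFFu;
}

/* Extract the 4-bit palette number from an entry. */
static unsigned entry_palette(uint16_t entry)
{
    return (unsigned)(entry >> 12);
}
```

With this layout any tile can appear with any of the 16 palettes, at the cost of doubling the map size compared to the byte-per-entry formats.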
with palette defined by the tile number.
So to my previous question, there's not a practical/fast way to flood a screen with a single tile, then change each row's color to create a gradient for a background?
I can't think of a *simple* way, within the design of the HuC library, particularly if you're scrolling the screen.
If you drop down into ASM, you could do a nasty hack ... but if you're writing in ASM, then you'd just redesign the map functions and avoid the limit altogether.
So yeah, you don't even need tiles to make a gradient in the sky, and if you did it that way the gradient wouldn't move together with the BG if it moved vertically, thus making a bit of parallax, but again, I don't know if HuC supports that...
Good point!

That might be the sane way to do something with little cost, but it might need custom assembly-language code and, if so, could easily cause lots of problems with HuC's existing split-screen hblank-interrupt code.
IMHO, it would be a risky thing to try to add to Henshin Engine at this point.