Replace chained adder in rptr_empty/wptr_full with parallel
pre-computation (rbin+1, rbin+2) and mux selection. This reduces
the critical path from ~9 to ~5-6 logic levels, improving clk_pixel
Fmax from 120.8 to 166.7 MHz (+38%).

Add build.sh/build.tcl for headless CLI builds via gw_sh with
timing-driven PnR and increased placement/routing effort.
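
The parallel pre-computation described above can be sketched roughly
as follows. This is an illustration, not the code from this commit:
the module name, the advance input (assumed to be rinc & ~rempty),
rbinahead, and ADDRSIZE are hypothetical; only rbin comes from the
description.

```verilog
// Hypothetical sketch; every name except rbin is an assumption.
module rptr_next_sketch #(
    parameter ADDRSIZE = 4
) (
    input  wire [ADDRSIZE:0] rbin,      // registered binary read pointer
    input  wire              advance,   // assumed: rinc & ~rempty
    output wire [ADDRSIZE:0] rbinnext,  // pointer value after this cycle
    output wire [ADDRSIZE:0] rbinahead  // one entry past rbinnext
);
    // Both increments start from the registered rbin, so the two adders
    // sit side by side instead of one feeding the other.
    wire [ADDRSIZE:0] rbin_p1 = rbin + 1'b1;
    wire [ADDRSIZE:0] rbin_p2 = rbin + 2'd2;

    // Chained form this stands in for (two adders in series):
    //   rbinnext  = rbin + advance;
    //   rbinahead = rbinnext + 1'b1;
    assign rbinnext  = advance ? rbin_p1 : rbin;
    assign rbinahead = advance ? rbin_p2 : rbin_p1;
endmodule
```

The mux outputs equal the chained results for both values of advance;
only the depth of the adder logic changes.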

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix blocking/non-blocking assignment mismatches: use = in combinational
blocks (hdmi.v hsync/vsync) and <= in sequential blocks (hsdaoh_core.v
fifo_read_en). Add explicit PLL clock definitions and asynchronous clock
group declarations to all SDC files to eliminate false cross-domain
timing violations.
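
A minimal sketch of the assignment convention, assuming made-up port
names and placeholder sync timing values; it is not copied from hdmi.v
or hsdaoh_core.v:

```verilog
// Hypothetical sketch; cx, fifo_empty, fifo_read_req and the constants
// 656/752 are placeholders, not values from the actual sources.
module assign_style_sketch (
    input  wire       clk,
    input  wire [9:0] cx,             // assumed horizontal pixel counter
    input  wire       fifo_empty,
    input  wire       fifo_read_req,
    output reg        hsync,
    output reg        fifo_read_en
);
    // Combinational block: blocking (=) so the value updates immediately
    // within the block, matching how the synthesized logic behaves.
    always @* begin
        hsync = (cx >= 10'd656) && (cx < 10'd752);
    end

    // Sequential block: non-blocking (<=) so all registers sample their
    // inputs on the clock edge before any of them update.
    always @(posedge clk) begin
        fifo_read_en <= fifo_read_req && !fifo_empty;
    end
endmodule
```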

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>