It's a little late, but there's an M5Stack Plus module now that includes a battery (400 or 500 mAh; the official page can't make up its mind), a rotary encoder and a microphone. It still adds a bit of bulk, but if that other little microphone didn't work out, it might be worth it. I think the rotary encoder would be a great addition to a watch: you get extra controls (rotation, and I think the little wheel can be pressed and used as another button), and using the wheel to set the time and alarms is probably faster and more enjoyable than using buttons.
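To give a rough idea of why the wheel is nice for setting the time, here's a minimal sketch in plain Python (not actual M5Stack code). The function name `adjust_time` and the idea of getting a signed count of encoder "detents" since the last read are my own assumptions for illustration; how you actually read the encoder depends on the module's library.

```python
def adjust_time(hour, minute, detents, step_minutes=1):
    """Return the new (hour, minute) after turning the wheel.

    `detents` is a hypothetical signed click count from the encoder:
    positive for clockwise, negative for counter-clockwise.
    The time wraps around midnight in both directions.
    """
    minutes_per_day = 24 * 60
    total = (hour * 60 + minute + detents * step_minutes) % minutes_per_day
    return divmod(total, 60)


if __name__ == "__main__":
    # Five clicks forward from 23:58 wraps past midnight.
    print(adjust_time(23, 58, 5))
    # One click back from 00:00 wraps to the previous minute.
    print(adjust_time(0, 0, -1))
```

With buttons you'd need one press per minute (or some hold-to-repeat logic); with the wheel a quick spin covers a lot of ground, and a press of the wheel could switch between hours and minutes.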
Btw, from what I've read and heard, the speaker audio isn't amazing, because the built-in DAC on an ESP32 is only 8-bit, nowhere near the resolution of a good audio DAC. Fine for some alarms and stuff, and probably even usable as a memo recorder, but not something you'd use for music.
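To put a rough number on the resolution difference, here's a small Python sketch comparing an 8-bit DAC (which is what the original ESP32 has built in) with a 16-bit one, which I'm assuming as a stand-in for a "good audio DAC". It uses the textbook ideal-quantizer rule of thumb (about 6.02 dB per bit plus 1.76 dB for a full-scale sine), so treat it as a best case; real-world output from the ESP32 and a small speaker will likely be worse. The function names are just made up for this example.

```python
def ideal_snr_db(bits):
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine."""
    return 6.02 * bits + 1.76


def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits output levels."""
    top_code = 2 ** bits - 1
    code = round((sample + 1.0) / 2.0 * top_code)
    return code / top_code * 2.0 - 1.0


if __name__ == "__main__":
    for bits in (8, 16):
        error = abs(quantize(0.3, bits) - 0.3)
        print(f"{bits}-bit: {2 ** bits} levels, "
              f"ideal SNR about {ideal_snr_db(bits):.1f} dB, "
              f"error on a 0.3 sample {error:.6f}")
```

So even in the ideal case the 8-bit DAC has only 256 output levels and roughly 50 dB of signal-to-noise, versus roughly 98 dB for 16 bits, which is why it's okay for beeps and voice but not great for music.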