You have to capture a JPEG from the camera frame buffer and pass it to the Module LLM... An example could be:
uint8_t* out_jpg = NULL;
size_t out_jpg_len = 0;

// Grab a frame, convert it to JPEG, then release the frame buffer
if (CoreS3.Camera.get()) {
    frame2jpg(CoreS3.Camera.fb, 80, &out_jpg, &out_jpg_len);  // JPEG quality 0-100
    CoreS3.Camera.free();

    module_llm.yolo.inferenceAndWaitResult(
        yolo_work_id, out_jpg, out_jpg_len, [](String& result) {
            /* do something with result */
        }, 2000, "ID");

    free(out_jpg);  // frame2jpg malloc()s the output buffer
}
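This assumes module_llm and yolo_work_id already exist from your setup(). I don't know your exact sketch, so the following is just the usual M5Module-LLM pattern; the serial pins and defaults are assumptions, compare with the library's YOLO example for your board:

#include <M5CoreS3.h>
#include <M5ModuleLLM.h>

M5ModuleLLM module_llm;
String yolo_work_id;

void setup() {
    CoreS3.begin();
    CoreS3.Camera.begin();

    // UART to the Module LLM; RX 18 / TX 17 is what the library examples
    // use on CoreS3 (assumption, adjust for your stacking)
    Serial2.begin(115200, SERIAL_8N1, 18, 17);
    module_llm.begin(&Serial2);

    module_llm.sys.reset();                 // optional: reset the module to a clean state
    yolo_work_id = module_llm.yolo.setup(); // load the default YOLO detection model
}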
The result string passed to the callback will contain JSON like:
{"bbox":["195.86","197.75","319.00","280.27"],"class":"keyboard","confidence":"0.69"}
HTH