Generalization vs Governance

9/6/2025

Observation

Sometimes I design a very complex workflow to finish a task, where each component, especially the LLM-backed ones, has a very niche, vertical function. Eventually I found that, instead of running the big workflow, letting the LLM work on the task directly can sometimes produce a better answer. The more limitations and constraints you add (because you don’t trust the LLM), the more you lower the LLM’s capacity to generalize.

The key thing is that we need to balance the LLM’s generalization against its controllability. We can rely on a single prompt to the LLM to produce the outcome instead of a complex agentic workflow, but we accept the downside that many runs of the same prompt may generate slightly, or even wildly, different results. On the other hand, with a complex workflow we can guarantee that each execution on the same context produces a similar outcome, but we leave no room for novelty. Think of generalization as creativity, and controllability as governance.

Hybrid approach

I can think of a potential hybrid approach to address this balancing problem.

For cases that we understand pretty well (where we have a bunch of know-how), we design the workflow, so that we can:

  • Have more control, at the cost of less room for creativity.
  • Reason about both the well-handled and the badly handled cases.

For cases that remain unknown, or that we don’t understand that well, we hand them over to the LLM, so that we can:

  • Handle more cases, with less chance of system errors.
  • Fully leverage the generalization of the LLM to handle unknown cases.

    Pre-condition: the cases unknown to us must be known to the LLM’s training data (I believe the generalization of LLMs comes from similar cases being present in the training data sets, or in the prompt); otherwise, we are essentially rolling the dice and praying for a good response.
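The hybrid idea above can be sketched as a simple router: known cases go through a deterministic workflow (governance), and everything else falls through to a direct LLM call (generalization). This is a minimal illustration, not an implementation; all names here (KNOWN_PLAYBOOKS, classify, run_workflow, call_llm) are hypothetical, and call_llm is just a stub standing in for a real single-prompt LLM call.

```python
# Hypothetical hybrid router: governed workflows for known cases,
# direct LLM fallback for unknown ones. All names are made up.

KNOWN_PLAYBOOKS = {
    "refund request": ["validate_order", "check_policy", "issue_refund"],
    "password reset": ["verify_identity", "send_reset_link"],
}

def classify(task: str):
    """Map a task to a known case, if any.

    A real system might use rules, embeddings, or a small classifier;
    simple substring matching is enough for this sketch.
    """
    for case in KNOWN_PLAYBOOKS:
        if case in task.lower():
            return case
    return None

def run_workflow(case: str, task: str) -> str:
    # Deterministic path: the same context yields the same steps. Governance.
    steps = KNOWN_PLAYBOOKS[case]
    return f"workflow[{case}]: " + " -> ".join(steps)

def call_llm(task: str) -> str:
    # Stub for a direct single-prompt LLM call. Creativity / generalization.
    return f"llm: free-form answer for {task!r}"

def handle(task: str) -> str:
    case = classify(task)
    if case is not None:
        return run_workflow(case, task)  # known case: governed workflow
    return call_llm(task)                # unknown case: lean on the LLM
```

One design note: the router itself is the place where the pre-condition matters. If an unknown case is also unlikely to be covered by the LLM’s training data, falling through to `call_llm` is exactly the dice roll described above.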