Despite their sophistication, LLMs often falter in tasks requiring autonomous planning. For example, benchmarks like ...