Head-to-head: Reorganizing a large document

(Un)fortunately for me, a few months before the final draft was due, I made a great connection about how I might explain the need for Amplify Good Work.

It HAD to go in Part 1. But where? And what could it replace?

I started somewhat naively by asking ChatGPT 5.2 Pro to just figure it out. The request, approximately: look at the manuscript and try rearranging the text I have. If I need to add a transition, let me know, but don’t rewrite anything.

After it failed twice in a row (literally, “Something went wrong.”), I realized I needed a different approach.

So, I put Part 1 in a Word doc and started labeling it with components, for example [NEDA STORY] and [CHATBOT ROLES]. I did about five of them, then uploaded the doc to ChatGPT 5.2 Pro with the following request:

“I've attached Part 1 as I currently have it. I have added a new idea that I like, but I think makes other pieces of the content repetitive. I want to revisit what pieces belong and the order in which they are presented. I'd like to start by labeling the pieces.

I've started labeling the pieces with names in brackets and all caps. For example, [NEDA STORY] and [TENSION: DROPPING THE ROPE]. Subsections should get labels if they are self-contained and could be moved neatly with just the addition of a transition. For example, I want to be able to consider returning to the team story in a later chapter, so I put the introduction and resolution under separate labels. Can you please return this document with labels throughout, and return a list of all the labels you've made in the chat?”

After about 20 minutes, I got impatient. I wondered, could ChatGPT 5.2 Thinking take care of this? What about Claude? Gemini?

So I asked all of them. Here’s what I learned.

Speed.

Gemini won, with Claude Opus and 5.2 Thinking right behind.

5.2 Pro was literally still going a half hour later. (It finally wrapped up after 36 minutes.) And it reverted itself to the Thinking model at some point, so BOO.

Quantity.

Gemini Fast came back with 20-something labels.

Gemini Thinking gave me 43 labels.

Claude Opus gave me 74 (and counted them. Thanks, Claude.)

ChatGPT 5.2 Thinking returned 112 (!!)

ChatGPT 5.2 Pro-reverted-to-Thinking returned 90.

Quality.

Each model did things a little differently.

Both Gemini models gave me labels and summaries of the labels. For example:

[DEFINITIONS: LLMS AND GENERATIVE AI]: Explanation of chatbots, prediction, and integration.

[ALIGNMENT AND GUARDRAILS]: Discussion on how model makers try to make AI "useful" (RLHF and system prompts).

Neither reconsidered the labels I’d put in originally, and instead of returning a document, both pasted the response into the chat. I rarely use Gemini, so I wondered whether this was a limitation of the model, but it doesn’t seem to be.

Gemini Fast did not separate out all of the subsections when a larger section was self-contained. All of the other models put a label on every single strength and weakness of AI, which isn’t necessary.

However, it missed A LOT of stuff. Like, all the content about human limitations, human biases, LLM roles, selecting a consultant… it missed so much that the output is not useful.

Gemini Thinking gave me labels and explanations, and it seems to have covered the entire section: good. It pretty consistently gave a heading and subhead to the labels, like [AI BASICS: LLMS], [AI BASICS: ALIGNMENT & RLHF], and [AI BASICS: AGENTIC AI]. I like this, and I think it will be helpful for later steps.

Claude reconsidered the labels I wrote, changed some, and explained how:
“I renamed [TENSION: TEAM STORY] to [TENSION: TEAM STORY INTRO] and created [TENSION: TEAM STORY RESOLUTION] for the follow-up, as you requested. I also added [TENSION: TWO AXES] for the "two separate axes" reframe that sits between those two pieces.”

Claude also gave me consistent [HEAD: SUBHEAD] labels.

ChatGPT Thinking explained how it made its 100+ labels: “a new “piece” starts when the topic clearly shifts (often marked by a short standalone heading like “AI’s Core Strengths,” a divider like * * *, or a paragraph that introduces a new claim, example, or exercise).”

It often gave me [HEAD: SUBHEAD] labels, but only in areas where it was breaking down a larger section. It pulled out quick tips as well, which I hadn’t thought of and appreciated.

Verdict?

I preferred Claude’s list, but liked that ChatGPT pulled out Quick Tips, so I returned to Claude and asked:
“This is great. Can you add Quick Tips as their own labels, and then return the list and the document again?”
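One small aside from this workflow: once a model returns the labeled document, you can sanity-check its label list without rereading everything. A few lines of Python (a minimal sketch; it assumes labels are bracketed, all-caps, and may contain digits, spaces, colons, ampersands, apostrophes, or hyphens) will pull every label out of the text:

```python
import re

def extract_labels(text):
    """Return every bracketed all-caps label, e.g. [NEDA STORY] or [AI BASICS: LLMS]."""
    # A label starts with a capital letter or digit and may contain
    # capitals, digits, spaces, colons, ampersands, apostrophes, and hyphens.
    return re.findall(r"\[([A-Z0-9][A-Z0-9 :&'\-]*)\]", text)

sample = "intro [NEDA STORY] ... [TENSION: DROPPING THE ROPE] ... [AI BASICS: LLMS]"
labels = extract_labels(sample)
print(len(labels))  # 3
print(labels)       # ['NEDA STORY', 'TENSION: DROPPING THE ROPE', 'AI BASICS: LLMS']
```

Comparing this extracted list against the list the model pasted in the chat is a quick way to catch labels it mentioned but never placed in the document, or vice versa.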

What We Learned

For starters, even chain-of-thought models can still benefit from step-by-step breakdowns. This is important: I think it’s easy to assume they can sort out the steps on their own.

Second, Gemini Fast is not suitable for a task like this. If I had only tried the task with Fast, I would have been working with an incomplete list.

Finally, running the same task across multiple models can help you identify a few different ways of doing the task, and you can combine their approaches with additional requests.

I wrote this one because it would have taken forever to explain to an LLM.
