In this post we will compare ChatGPT, Gemini, Grok, DeepSeek, and Claude across several use cases to see which one best suits your needs, using only the freely available models.

We will test the following tasks:

  • Code generation
  • Content generation
  • Problem solving

We will compare the following aspects:

  • Generation Speed
  • Code/Content Quality/Plagiarism
  • Limitations
  • Robustness
  • Readability
  • Bugs/Issues

As a reminder, all the tests are executed using free models only.

AI Code Generation

I decided to use a short and simple, yet challenging, prompt for code generation, with Python as the target language since it’s quite popular.

The prompt asks for a CSV parser script that uses no external libraries, so it should be straightforward.

Prompt for all models:

Create a python script that can parse CSV's, without using an external library.
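For reference, the core of such a parser is a small state machine that walks each line character by character. The sketch below is my own illustration, not any model’s actual output; like several of the generated scripts, it handles quoted fields and escaped quotes but not newlines inside quotes:

```python
def parse_csv_line(line, delimiter=","):
    """Split one CSV line into fields, honoring double-quoted fields
    and the "" escape for a literal quote. No newline-in-quotes support."""
    fields, current, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')  # escaped quote: "" -> "
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                current.append(ch)       # anything goes inside quotes
        elif ch == '"':
            in_quotes = True             # opening quote
        elif ch == delimiter:
            fields.append("".join(current))  # field boundary
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))      # last field
    return fields

sample = 'name,comment\nAda,"Hello, world"\nBob,"He said ""hi"""'
rows = [parse_csv_line(line) for line in sample.splitlines()]
print(rows)
# [['name', 'comment'], ['Ada', 'Hello, world'], ['Bob', 'He said "hi"']]
```

A dict-per-row output (keyed by the header line) or a custom delimiter are easy extensions, and those are exactly the features that separated the models in the table below.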

Results/Comparison

Some models returned a script very quickly, while others took up to five minutes to “think” 🥴 about it. All scripts, however, executed without errors, which is nice.

| Model | Generation Speed | Code Quality | Limitations | Robustness | Readability | Bugs/Issues |
|---|---|---|---|---|---|---|
| ChatGPT-4o | 19 seconds | Good, basic | No custom delimiter; only lists (not by column name) | Handles tricky CSVs with newlines | Clear, straightforward | May fail with broken CSV quotes |
| ChatGPT-4o Think | 25 seconds | Excellent | Doesn’t handle newlines inside quotes | Great for normal CSVs | Very clear, well-documented | None obvious, warns about limits |
| Claude Sonnet 4 | 30 seconds | Good, friendly | Doesn’t support newlines in quotes; no dict output | Works for standard files | Beginner-friendly | Warns if row lengths don’t match |
| Claude Sonnet 4 Think | 37 seconds | Good, detailed | No newlines in quotes; basic output | Works for most files | Detailed, lots of examples | Warns about data mismatches |
| DeepSeek | 62 seconds | Very good | No column name access; uses more memory for big files | Handles complex cases, newlines OK | Clean, simple | None obvious; may be slower on large files |
| DeepSeek Think | 653 seconds | Simple | Basic only; no headers; can’t handle newlines in quotes | Works for simple CSVs only | Minimal, easy to follow | May break on complex CSVs |
| Gemini 1.5 Pro | 29 seconds | Very good | Doesn’t handle newlines in quotes | Warns/skips broken rows | Friendly, tidy | Skips malformed lines |
| Grok-3 | 9 seconds | Basic | No headers; can’t handle newlines in quotes | OK for basic CSVs | Simple, short | No error handling for broken files |
| Grok-3 Think | 87 seconds | Basic | Very simple; no headers or complex cases | Fine for small/simple files | Short, readable | Minimal errors shown |

Overall Best

In my opinion, the winner among these models is ChatGPT-4o (Think): it strikes a balance between decent generation time and output quality.

  • Most practical, especially for “normal” CSV files (no weird newlines inside cells).
  • Offers both dictionary (column names) and list output.
  • Customizable delimiter, good error handling, and clear feedback.
  • Very easy to read, extend, and integrate.
  • Works from command line or as an imported function.

If you want beautiful code to learn from, or code to use in your own scripts, ChatGPT-4o (Think) is the best choice. For most people and most files, it’s the winner!

The scripts are available to download and view on my repository here.

AI Content Generation

Code generation is far from the only thing AIs are used for nowadays; more and more people use AI to generate content, ranging from blog posts and documents to emails and more.

Below we will test two categories: Email and Academic Style Writing.

Email Generation

For our Email content generation test, I will ask for a simple email pitch with the following prompt:

Create an email pitch about my new floral shop. I sell cut flowers, often arranged into bouquets or floral designs. I also offer custom arrangements, provides daily or weekly flower deliveries, and may offer services like wedding or events styling.

Results/Comparison

Every model generated its content almost instantly, so I have left the Generation Speed column out of the table below.

| Model/Variant | Content Quality | Limitations | Robustness | Readability | Bugs/Artifacts |
|---|---|---|---|---|---|
| ChatGPT-4o | Very natural, clean | Slightly generic | Very strong | Excellent | None |
| ChatGPT-4o Think | Creative, clear, human | Safe, but adds subtle marketing | Excellent | Very high | None |
| Claude Sonnet 4 | Professional, warm | Slightly formal, a bit long | Very strong | Very high | None |
| Claude Sonnet 4 Think | Detailed, sectioned | Overly verbose, too “website-like” | Robust | High | None, but too long for a pitch email |
| DeepSeek | Friendly, clear | Adds “P.S.”, slight template feel | Strong | Very high | None, but slightly generic |
| DeepSeek Think | Multiple subject/body options | Multiple full emails in one file | Good | Good | Did NOT follow “one email” rule; too many choices |
| Gemini 1.5 Pro | Polished, professional | Three emails in one (for diff. clients) | Good | High | Ignored “one email” rule; too much per file |
| Grok 3 | Warm, direct | Slightly repetitive language | Good | Good | None, but a bit formulaic |
| Grok 3 Think | Friendly, clear, sectioned | Long intro, slightly “chunky” format | Good | Good | None, just a bit segmented |

Overall Best

In my opinion ChatGPT-4o (both versions) wins because of:

  • Producing a single, ready-to-use, natural email per file.
  • No formatting oddities, no AI artifacts, no excessive length, and high readability.

It wrote the most natural, easiest-to-read, and most professional-sounding email. It followed my instructions exactly (one email per file, no extra formatting or AI mistakes), so you can use its email pitch right away with just a few personal details added.

The contents generated are available to download and view on my repository here.

Academic Style Writing

It’s no surprise that more and more people are using AI to generate academic-style writing; however, much of the generated content will fail plagiarism checks, either because the AI does not produce unique content or because of the prompt input.

For our second test, I will ask for a short essay with the following prompt:

Create a short essay (max 1000 words) about the evolution of CPU's, using academic style writing and unique content. Do not use content from already existing essays or sources. Include references where appropriate.

Results/Comparison

| Model/Variant | Content Quality | Limitations | Robustness | Readability | Bugs/Artifacts |
|---|---|---|---|---|---|
| ChatGPT-4o | Structured, academic, concise | Slightly formulaic; no narrative flair | Strong, up-to-date | High (for technical readers) | None |
| Claude Sonnet 4 | Encyclopedic, narrative | Verbose, minor repetition | Comprehensive | Smooth, accessible | None; slightly wordy |
| DeepSeek | Concise, factual, survey-like | Less context, abrupt transitions | Focused, accurate | Moderate (technical) | None |
| Grok-3 | Engaging, thematic | Occasional generalization | Broad, accessible | Very high | None; minor cliché |
| Gemini 2.5 Pro | Technical, rigorous | Dense, expects technical background | Very robust | Lower (non-technical) | None |

Overall Best

For academic-style writing, the winner in my opinion is Grok 3.

  • Uses storytelling and accessible metaphors (“technological odyssey”), making it pleasant to read for non-specialists.
  • Walks the reader chronologically through CPU history while covering modern themes (multi-core, specialization, the future).
  • Explains key concepts without overwhelming the reader with jargon or dry technicalities.
  • You don’t need a deep technical background to follow and enjoy it.

Plagiarism Check

I used the Plagiarism Checker from Grammarly to check the content and see where each AI stands.

You can see the results in the following table:

| Model/Variant | Plagiarism | Grammar | Spelling | Punctuation | Conciseness | Readability |
|---|---|---|---|---|---|---|
| ChatGPT-4o | 8 writing issues | OK | FAIL | FAIL | FAIL | OK |
| Claude Sonnet 4 | 8 writing issues | FAIL | OK | OK | FAIL | OK |
| DeepSeek | 2 writing issues | OK | OK | FAIL | OK | OK |
| Grok-3 | 8 writing issues | OK | OK | OK | FAIL | OK |
| Gemini 2.5 Pro | 22 writing issues | FAIL | FAIL | OK | FAIL | OK |

The winner is clearly Grok 3. Even though it’s not quite perfect, you can fix the small issues yourself and have an “award-winning” essay 😊 (lol).

AI Problem Solving

A + B Integral Problem

It’s well known by now that AIs have vast amounts of computing power and knowledge, but how do they compare to one another?

Let’s use this popular math quiz, of the kind given in some high schools:

A = integral from 0 to 1 of e^(2x) dx
B = integral from 1 to e^2 of ln(√x) dx
Find A + B

Results/Comparison

This is where every AI’s issues start to show. Every single one struggled to provide a ready copy-paste solution, which is the simplest part of the “problem”. The math was done, but the output wasn’t actually copiable, so I had to ask several times, in several formats, before I could save the answers as .txt files for you.

| Model/Variant | Code Quality | Limitations | Robustness | Readability | Copy-Paste Friendliness |
|---|---|---|---|---|---|
| Claude 4 Sonnet | Excellent | None | Very High | Excellent | Best (easy, markdown, stepwise) |
| DeepSeek | Excellent | Slight header overuse | Very High | Excellent | Excellent |
| ChatGPT-4o | Excellent | None | Very High | Excellent | Excellent |
| Grok 3 | Good | Verbose, slightly cluttered | High | Good | Good |
| Gemini 1.5 Pro | Adequate | No exact symbolic answer | High | Excellent | Good (but summary only) |

The one that stands out here is Claude Sonnet 4, which is the winner in my opinion. For a quick numeric result, Gemini is fastest, but for complete clarity and reusability, stick with Claude, DeepSeek, or ChatGPT-4o.
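Reading the quiz as A = integral from 0 to 1 of e^(2x) dx and B = integral from 1 to e² of ln(√x) dx (my interpretation of the problem statement), the answer works out to exactly e²: ln(√x) is the inverse function of e^(2x), so the two areas together tile the 1 × e² rectangle. A quick numerical sanity check in Python, using a hand-rolled Simpson’s rule, agrees:

```python
import math

def simpson(f, a, b, n=1000):
    # Composite Simpson's rule; n must be even.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

A = simpson(lambda x: math.exp(2 * x), 0.0, 1.0)          # integral of e^(2x)
B = simpson(lambda x: math.log(math.sqrt(x)), 1.0, math.e ** 2)  # integral of ln(√x)

print(f"A + B = {A + B:.4f}")  # 7.3891, i.e. e**2
```

Directly: A = (e² − 1)/2 and B = (e² + 1)/2, which also sum to e².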

Broken Code Problem

Let’s kick things up a notch and see how the AIs stack up against each other by asking them to fix a piece of broken plain C code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* copy_string(const char* src) {
    char* dest;
    strcpy(dest, src);
    return dest;
}

int main() {
    char* original = "Hello, world!";
    char* copy = copy_string(original);

    printf("Copied string: %s\n", copy);

    return 0;
}

What’s wrong with the above code? Let me explain:

  • dest in copy_string is used uninitialized, no memory allocated.
  • Using strcpy(dest, src) with an uninitialized pointer causes undefined behavior and likely a crash.
  • The memory for the copy (if it had been allocated) is never freed: a potential memory leak.
  • The code prints the copy without checking for success.

With the above code, we formulate the prompt as follows:

Fix the following code for me and provide summary of fixes:
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* copy_string(const char* src) {
    char* dest;
    strcpy(dest, src);
    return dest;
}

int main() {
    char* original = "Hello, world!";
    char* copy = copy_string(original);

    printf("Copied string: %s\n", copy);

    return 0;
}
```

Results/Comparison

Honestly, every model produced a correct and professional fix.

| Model/Variant | Content Quality | Limitations | Robustness | Readability | Bugs/Artifacts |
|---|---|---|---|---|---|
| Claude 4 Sonnet | Excellent (edge-case handling, clear) | Slightly verbose summary | Checks for NULL input and allocation; frees memory | Very clear, neat | None |
| DeepSeek | Excellent (succinct, correct) | Slightly less verbose on input validation | Checks allocation; error handling; frees memory | Clear, concise | None |
| ChatGPT-4o | Excellent (concise, covers all) | No NULL input check (for src) | Checks allocation; error handling; frees memory | Very readable | None |
| Grok 3 | Excellent (thorough, professional) | Exits on alloc fail (not best for libraries); no NULL input check | Handles allocation error; frees memory | Slightly verbose | None |
| Gemini 1.5 Pro | Excellent (professional, extra detail) | No explicit input NULL check; lots of comments | Handles alloc errors, sets pointer NULL after free | Very readable | None |

Claude 4 Sonnet went above and beyond with its edge-case handling and explanation, but every answer is solid and suitable for copy-paste into a C project. No model introduced any new errors.

The codes generated are available to download and view on my repository here.

Conclusion

After testing the latest generation of AI models on three very different tasks (academic essay writing, business email marketing, and hands-on coding), I found that no single AI rules them all. Instead, each model brings its own strengths, quirks, and ideal use cases.

But Who Wins Overall?

It depends on what you need:

  • Ready to automate or build something serious? Go with ChatGPT-4o or Claude 4 Sonnet for code.
  • Need friendly, customer-focused communication? ChatGPT-4o.
  • Want to inform and delight readers? Grok-3 is your best friend.

There is no single “best” AI, just the right tool for the right job.

The smartest way to use AI is to match the model to your mission, because as this experiment shows, even the most advanced bots have their own personalities and strengths.

Thank you for taking the time to read my article and please feel free to share it with friends.