In this post we will compare ChatGPT vs Gemini vs Grok vs DeepSeek vs Claude across several use cases, using the available free models, to understand which one best suits your needs.
What we will test are the following things:
- Code generation
- Content generation
- Problem solving
What we will compare are the following aspects:
- Generation Speed
- Code/Content Quality/Plagiarism
- Limitations
- Robustness
- Readability
- Bugs/Issues
As a reminder, all the tests are executed using free models only.
AI Code Generation
I decided to use a short and simple, yet challenging, prompt for code generation, with Python as the target language since it's quite popular.
The prompt asks for a CSV parser script that uses no external libraries, so it should be straightforward.
Prompt for all models:
Create a python script that can parse CSV's, without using an external library.
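To set a baseline before looking at the models' output, here is a rough idea of what such a parser involves. This is my own minimal sketch (not one of the generated scripts), covering quoted fields, escaped quotes, and newlines inside quotes:

```python
def parse_csv(text, delimiter=","):
    """Parse CSV text into a list of rows (lists of strings).

    Handles quoted fields, escaped quotes ("") and newlines inside quotes.
    """
    rows, row, field = [], [], []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(text) and text[i + 1] == '"':
                    field.append('"')  # "" inside quotes is an escaped quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)       # keep any character, even newlines
        elif ch == '"':
            in_quotes = True           # opening quote
        elif ch == delimiter:
            row.append("".join(field))
            field = []
        elif ch == "\n":
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        elif ch != "\r":               # tolerate \r\n line endings
            field.append(ch)
        i += 1
    if field or row:                   # flush a final line without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows
```

For example, `parse_csv('a,b\n1,"x,y"')` returns `[['a', 'b'], ['1', 'x,y']]`. Keep in mind this is a baseline sketch, not a benchmark entry; the models' actual scripts are linked further below.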
Results/Comparison
Some models returned a script very quickly, while others took as long as five minutes to “think” 🥴 about it. All scripts executed without errors, however, which is nice.
Model | Generation Speed | Code Quality | Limitations | Robustness | Readability | Bugs/Issues |
---|---|---|---|---|---|---|
ChatGPT-4o | 19 seconds | Good, basic | No custom delimiter; only lists (not by column name) | Handles tricky CSVs with newlines | Clear, straightforward | May fail with broken CSV quotes |
ChatGPT-4o Think | 25 seconds | Excellent | Doesn’t handle newlines inside quotes | Great for normal CSVs | Very clear, well-documented | None obvious, warns about limits |
Claude Sonnet 4 | 30 seconds | Good, friendly | Doesn’t support newlines in quotes; no dict output | Works for standard files | Beginner-friendly | Warns if row lengths don’t match |
Claude Sonnet 4 Think | 37 seconds | Good, detailed | No newlines in quotes; basic output | Works for most files | Detailed, lots of examples | Warns about data mismatches |
DeepSeek | 62 seconds | Very good | No column name access; uses more memory for big files | Handles complex cases, newlines OK | Clean, simple | None obvious; may be slower on large files |
DeepSeek Think | 653 seconds | Simple | Basic only; no headers; can’t handle newlines in quotes | Works for simple CSVs only | Minimal, easy to follow | May break on complex CSVs |
Gemini 1.5 Pro | 29 seconds | Very good | Doesn’t handle newlines in quotes | Warns/skips broken rows | Friendly, tidy | Skips malformed lines |
Grok-3 | 9 seconds | Basic | No headers; can’t handle newlines in quotes | OK for basic CSVs | Simple, short | No error handling for broken files |
Grok-3 Think | 87 seconds | Basic | Very simple; no headers or complex cases | Fine for small/simple files | Short, readable | Minimal errors shown |
Overall Best
In my opinion, the winner among these models is ChatGPT-4o (Think): it strikes a good balance between generation time and output quality.
- Most practical, especially for “normal” CSV files (no weird newlines inside cells).
- Offers both dictionary (column names) and list output.
- Customizable delimiter, good error handling, and clear feedback.
- Very easy to read, extend, and integrate.
- Works from command line or as an imported function.
If you want clean code to learn from or to use in your own scripts, ChatGPT-4o (Think) is the best choice. For most people and most files, it's the winner!
The scripts are available to download and view on my repository here.
AI Content generation
Code generation is far from the only thing AIs are used for nowadays, as more and more people use AI to generate content. This ranges from blog posts and documents to emails and more.
Below we will test two categories: Email and Academic Style Writing.
Email Generation
For our Email content generation test, I will ask for a simple email pitch with the following prompt:
Create an email pitch about my new floral shop. I sell cut flowers, often arranged into bouquets or floral designs. I also offer custom arrangements, provides daily or weekly flower deliveries, and may offer services like wedding or events styling.
Results/Comparison
Every model generated the content ridiculously fast, so I will not include a Generation Speed column in the table below.
Model/Variant | Content Quality | Limitations | Robustness | Readability | Bugs/Artifacts |
---|---|---|---|---|---|
ChatGPT-4o | Very natural, clean | Slightly generic | Very strong | Excellent | None |
ChatGPT-4o Think | Creative, clear, human | Safe, but adds subtle marketing | Excellent | Very high | None |
Claude Sonnet 4 | Professional, warm | Slightly formal, a bit long | Very strong | Very high | None |
Claude Sonnet 4 Think | Detailed, sectioned | Overly verbose, too “website-like” | Robust | High | None, but too long for a pitch email |
DeepSeek | Friendly, clear | Adds “P.S.”, slight template feel | Strong | Very high | None, but slightly generic |
DeepSeek Think | Multiple subject/body options | Multiple full emails in one file | Good | Good | Did NOT follow “one email” rule; too many choices |
Gemini 1.5 Pro | Polished, professional | Three emails in one (for diff. clients) | Good | High | Ignored “one email” rule; too much per file |
Grok 3 | Warm, direct | Slightly repetitive language | Good | Good | None, but a bit formulaic |
Grok 3 Think | Friendly, clear, sectioned | Long intro, slightly “chunky” format | Good | Good | None, just a bit segmented |
Overall Best
In my opinion ChatGPT-4o (both versions) wins because of:
- Producing a single, ready to use, natural email per file.
- No formatting oddities, no AI artifacts, no excessive length, and high readability.
It wrote the most natural, easy-to-read, and professional-sounding email, and it followed my instructions exactly (one email per file, no extra formatting or AI artifacts), so you can use its email pitch right away with just a few personal details added.
The contents generated are available to download and view on my repository here.
Academic Style Writing
It’s no surprise that more and more people are using AIs to generate academic-style writing; however, much of the generated content will fail plagiarism checks, either because the AI does not produce unique content or because of the prompt input.
For our second test, I will ask for a short essay with the following prompt:
Create a short essay (max 1000 words) about the evolution of CPU's, using academic style writing and unique content. Do not use content from already existing essays or sources. Include references where appropriate.
Results/Comparison
Model/Variant | Content Quality | Limitations | Robustness | Readability | Bugs/Artifacts |
---|---|---|---|---|---|
ChatGPT-4o | Structured, academic, concise | Slightly formulaic; no narrative flair | Strong, up-to-date | High (for technical readers) | None |
Claude Sonnet 4 | Encyclopedic, narrative | Verbose, minor repetition | Comprehensive | Smooth, accessible | None; slightly wordy |
DeepSeek | Concise, factual, survey-like | Less context, abrupt transitions | Focused, accurate | Moderate (technical) | None |
Grok-3 | Engaging, thematic | Occasional generalization | Broad, accessible | Very high | None; minor cliché |
Gemini 2.5 Pro | Technical, rigorous | Dense, expects technical background | Very robust | Lower (non-technical) | None |
Overall Best
The winner is Grok 3 in my opinion for academic style writing.
- Uses storytelling and accessible metaphors (“technological odyssey”), making it pleasant to read for non-specialists.
- Walks the reader chronologically through CPU history while covering modern themes (multi-core, specialization, the future).
- Explains key concepts without overwhelming the reader with jargon or dry technicalities.
- You don’t need a deep technical background to follow and enjoy it.
Plagiarism Check
I used the Plagiarism Checker from Grammarly to check the content and see where each AI stands.
You can see the results in the following table:
Model/Variant | Writing Issues | Grammar | Spelling | Punctuation | Conciseness | Readability |
---|---|---|---|---|---|---|
ChatGPT-4o | 8 writing issues | OK | FAIL | FAIL | FAIL | OK |
Claude Sonnet 4 | 8 writing issues | FAIL | OK | OK | FAIL | OK |
DeepSeek | 2 writing issues | OK | OK | FAIL | OK | OK |
Grok-3 | 8 writing issues | OK | OK | OK | FAIL | OK |
Gemini 2.5 Pro | 22 writing issues | FAIL | FAIL | OK | FAIL | OK |
The winner is clearly Grok 3. Even though it’s not quite perfect, you can fix the few small issues yourself and have an “award-winning” essay 😊 (lol)
AI Problem Solving
A + B Integral Problem
It’s well known by now that AIs have vast amounts of computing power and knowledge, but how do they compare to one another?
Let’s use this popular math quiz given in some high schools:
A = Integral from 0 to 1 of e^(2x) dx
B = Integral from 1 to e^2 of ln(√x) dx
Find A + B
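For reference: if the integrands are read as e^(2x) and ln(√x), the two functions are inverses of each other and the exact answer works out to A + B = e^2, since A = (e^2 − 1)/2 and B = (e^2 + 1)/2. Here is a quick numeric sanity check of that reading (my own script, not any model's output):

```python
import math

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

A = integrate(lambda x: math.exp(2 * x), 0, 1)                 # exact: (e^2 - 1) / 2
B = integrate(lambda x: math.log(math.sqrt(x)), 1, math.e**2)  # exact: (e^2 + 1) / 2

print(round(A + B, 4), round(math.e**2, 4))  # both print 7.3891
```

The clean result is an instance of the identity ∫ from a to b of f(x) dx + ∫ from f(a) to f(b) of f⁻¹(y) dy = b·f(b) − a·f(a) for an increasing f, which is likely the trick this quiz is testing.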
Results/Comparison
This is where every AI’s issues start to show. Every single AI struggled to provide a ready copy/paste solution, which should be the simplest part of the “problem”. The math was done correctly, but the output wasn’t actually copyable, so I had to ask several times in several formats until I could save the solutions as .txt files for you.
Model/Variant | Solution Quality | Limitations | Robustness | Readability | Copy-Paste Friendliness |
---|---|---|---|---|---|
Claude 4 Sonnet | Excellent | None | Very High | Excellent | Best (easy, markdown, stepwise) |
DeepSeek | Excellent | Slight header overuse | Very High | Excellent | Excellent |
ChatGPT-4o | Excellent | None | Very High | Excellent | Excellent |
Grok 3 | Good | Verbose, slightly cluttered | High | Good | Good |
Gemini 1.5 Pro | Adequate | No exact symbolic answer | High | Excellent | Good (but summary only) |
The one that stands out here is Claude Sonnet 4, which is the winner in my opinion. For a quick numeric result, Gemini is fastest, but for complete clarity and reusability, stick with Claude, DeepSeek, or ChatGPT-4o.
Broken Code Problem
Let’s kick things up a notch and see how smart the AIs really are by asking them to fix a piece of broken plain C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* copy_string(const char* src) {
    char* dest;
    strcpy(dest, src);
    return dest;
}

int main() {
    char* original = "Hello, world!";
    char* copy = copy_string(original);

    printf("Copied string: %s\n", copy);

    return 0;
}
What’s wrong with the above code? Let me explain:
- dest in copy_string is used uninitialized; no memory is allocated for it.
- Calling strcpy(dest, src) with an uninitialized pointer causes undefined behavior and likely a crash.
- The memory for the copy (if it had been allocated) is never freed: a potential memory leak.
- The code prints the copy without checking for success.
With the above code, we formulate the prompt as follows:
Fix the following code for me and provide summary of fixes:
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* copy_string(const char* src) {
    char* dest;
    strcpy(dest, src);
    return dest;
}

int main() {
    char* original = "Hello, world!";
    char* copy = copy_string(original);

    printf("Copied string: %s\n", copy);

    return 0;
}
```
Results/Comparison
Honestly, every model produced a correct and professional fix.
Model/Variant | Content Quality | Limitations | Robustness | Readability | Bugs/Artifacts |
---|---|---|---|---|---|
Claude 4 Sonnet | Excellent (edge-case handling, clear) | Slightly verbose summary | Checks for NULL input and allocation; frees memory | Very clear, neat | None |
DeepSeek | Excellent (succinct, correct) | Slightly less verbose on input validation | Checks allocation; error handling; frees memory | Clear, concise | None |
ChatGPT-4o | Excellent (concise, covers all) | No NULL input check (for src) | Checks allocation; error handling; frees memory | Very readable | None |
Grok 3 | Excellent (thorough, professional) | Exits on alloc fail (not best for libraries); no NULL input check | Handles allocation error; frees memory | Slightly verbose | None |
Gemini 1.5 Pro | Excellent (professional, extra detail) | No explicit input NULL check; lots of comments | Handles alloc errors, sets pointer NULL after free | Very readable | None |
Claude 4 Sonnet went above and beyond with its edge-case handling and explanation, but every answer is solid and suitable for copy/paste into a C project. No model introduced any new errors.
The codes generated are available to download and view on my repository here.
Conclusion
After testing the latest generation of AI models on three very different tasks: academic essay writing, business email marketing, and hands-on coding, I found that no single AI rules them all. Instead, each model brings its own strengths, quirks, and ideal use cases.
But, Who Wins Overall?
It depends on what you need:
- Ready to automate or build something serious? Go with ChatGPT-4o or Claude 4 Sonnet for code.
- Need friendly, customer focused communication? Grok-3.
- Want to inform and delight readers? Grok-3 is your best friend.
There is no single “best” AI, just the right tool for the right job.
The smartest way to use AI is to match the model to your mission, because as this experiment shows, even the most advanced bots have their own personalities and strengths.
Thank you for taking the time to read my article and please feel free to share it with friends.