Expert writing, ChatGPT-like tools and originality

In this blog article we discuss how tools like ChatGPT impact expert writing, i.e., writing about some subject matter. Typical expert writing is academic writing, such as papers or theses, but also technical reports in R&D, sometimes mails in a company, and in a certain sense also source code. We use McEnerney's model to distinguish between writing for thinking and writing for the text, and ask how generative AI impacts each of them. Finally, we discuss what we may and may not expect from the output of LLM-based AI systems like ChatGPT, and how this is relevant to their utilization as tools.

Introduction

Seldom in the history of innovation has a singular event had such a ground-shaking impact on such a diversity of aspects in the private and professional lives of so many people as the advent of ChatGPT and the follow-up large language models (LLMs) embodied in different tools. I will gladly refrain from the endeavor of sketching a complete picture of the effects as they unfold, but rather would like to focus on one particular niche: scientific writing or, more generally, expert writing.

Note that many aspects of this type of writing and its process take place not only in academia, but in a way also in journalism, in corporate R&D, and in a sense also in coding. By “scientific writing” I mean the process of writing that assists the author in thinking a subject matter through, typically carries some novelty, and typically is done to communicate the condensate of that thinking in some form. So even when we discuss “scientific writing” in a strict interpretation, many of the arguments to follow can be transferred to other domains.

We have all gained hands-on experience with ChatGPT, but it seems that after a couple of years we are still finding out what good practice and good utilization look like, and maybe also where the pitfalls are. In this article I try to order some thoughts in this context and provide a few models that help to add structure to the discussion.

The nature of expert writing

We first have to understand that scientific writing is very different from other forms of writing. One who conveys this difference well is Larry McEnerney, who in his lecture on Writing Beyond the Academy explains what “expert writing” is.

Experts write in their respective domain for other experts. These writing experts think in one form or another about aspects of the world. And when these aspects are sufficiently difficult and reach a certain technical level, they aid their thinking by writing. That is, they are in a tight loop of writing-thinking-writing-thinking and so on. The figure shown in the lecture is (mostly) this one:

Expert writing

Expert writing in the sense of McEnerney.

This is also what I personally do, and indeed beyond scientific writing: even for business mails I sometimes bring order to my mind during the process of writing them. Writing leverages my thinking. This is also what is happening right now. I keep rereading what I wrote and constantly check whether the thread of thought makes sense, or whether the paragraphs reflect the structure of my thoughts.

The writing produces text about the world. The text is read by readers. The function of the text is to change what the reader thinks about the world. And this is effectively why we use words like “however” or “but” in the motivation or introduction section of papers: to contrast with the reader's current belief about the world (aka the state of the art).

What makes the model above interesting is that it explains a phenomenon every PhD student probably experiences at some point when writing papers: You have an idea, you think about the world, you are in the writing-thinking loop, and the outcome is some text. Let's go and publish it. But it does not work; the text somehow does not work and cannot be published as is. What happened? Well, the first writing has a different function (assisting thinking) than the eventual paper (changing the reader's view). And this is why we write the eventual paper again from scratch, but now with the reader in mind.

Summary: In expert writing, “writing” fulfills two functions: writing for yourself to assist thinking, and writing for the reader to change the way the world is viewed.

And again note that the above discussion is not confined to scientific writing, but applies to all sorts of expert writing.

Learning expert writing and the purpose of a thesis

The craft of expert writing needs to be learned. Too often, it is learned implicitly by doing and gaining experience, not in explicit courses. This has two bad effects: First, those who are good at expert writing often cannot articulate what makes them successful because the skill is implicit. Secondly, writing is often perceived as a tedious burden instead of a powerful thinking technique.

The advent of ChatGPT and similar tools puts a big question mark over the sense of writing theses. I believe that the discussion of this question is often based on a misconception of the purpose of writing a thesis and mixes in organizational difficulties.

Regarding the latter, the question of how to grade a thesis should per se be separated from the purpose of writing a thesis, just like law enforcement and law making are per se different.

Regarding the former, the purpose of writing the thesis is often believed to lie in the thesis as “product”, i.e., in the text in the above figure that you can hold in your hand. And often, especially in engineering disciplines, the thesis is a valuable artifact, the outcome of a project as such. But this point of view ignores a very important aspect – and I would like to think the actual core aspect –, namely that the value of writing the thesis lies in the experience, practice and demonstration of the thinking-writing loop, i.e., in the left part of the above figure, not in the right part.

It is a thinking technique to be learned and trained, and it is useful far beyond academia. Writing a thesis fulfills many purposes; this is one of them.

Modes of writing: Impact and integration of AI

Now that we have ChatGPT and similar tools, we have to ask how expert writing has changed, or how we best integrate ChatGPT into the writing process.

Again referring to the above figure, we shall distinguish between

  • writing to assist thinking and
  • writing to produce text for the reader.

Writing for thinking with AI

In the first case, where writing and thinking are in a tight feedback loop, where the written manifestation of thoughts and the continuous reflection on them helps to sharpen arguments, phrasing, the choice of words, helps to construct a logical, linear order of statements that gives a clear whole, in this intimate, delicate, floating process, any outside intervention is interrupting, disrupting and eventually terminating the genesis of a thought complex.1

This is why programmers hate to be interrupted. This is why we find it hard to form complex thoughts in a meeting, and most statements come by heart, not by thought. And ChatGPT or any form of editor-integrated Copilot is no different. This is why I have no AI running right now: because I am still in the process of thinking.

Writing for the text with AI

In the second case, where we write to produce the text with the reader in mind, AI tools can be quite handy. For instance, I typically write my professional texts in LaTeX. With Copilot turned on in my editor, the typesetting of tables or TikZ figures can be quite a speed-up. Especially if I get a bit rusty in this or that task in terms of syntax, AI tools help. The NeurIPS LLM policy even says that authors do not need to document the use of spell checkers, grammar suggestions, or programming aids for editing purposes.

When we write up text for readers, we typically have a higher-level strategy in mind, a story line that is capable of transporting the messages we want to convey. Depending on how important the text is, the implementation of this strategy in text can be very thoughtful and delicate. In these cases, AI tools may again be less supportive and maybe even harmful.

Writing with AI to substitute writing-thinking

With AI, there is another mode of writing that is not shown in the figure above: We skip or strip the thinking part at the left and go straight to AI-assisted writing of the text. The goal is to substitute the writing-thinking loop, i.e., we instruct AI tools to generate text or let them auto-complete from context. Depending on the mode in which AI tools are used, the user may engage in a dialog to iteratively refine the AI output, and at some point pulls the output over or manually refines it.

The NeurIPS LLM policy says: “Therefore, while authors are welcome to use any tool they wish for preparing and writing the paper, they must ensure that all content is correct and original.”

The first part is the easy one: Whatever you take over, you are responsible for its correctness. The second part is more delicate: The content has to be original when submitted to NeurIPS. This raises the question of whether LLMs produce original output. By “original” we mean that it is not derived. Well, each of our thoughts is derived, but the question is rather what the level of originality is, i.e., how far an original thought takes us from the cloud of previous thoughts.

Bloom’s taxonomy

A very popular model for learning levels – and in a sense also cognition levels – is Bloom's taxonomy, originating in the 1950s. In its revised form it is illustrated below, as taken from Wikipedia. We use this model frequently to reflect on and assess what levels we aim to reach in different courses, e.g., what and how we address these in lectures, in labs, in student projects, at internships or in theses. For instance, in a lecture I primarily address “remember” and “understand”, mostly by lecturing. In labs we take this further and go to “apply” and “analyze” for the exercises at home, and in the group we then strive to reach “evaluate” by making the students defend their outcome. When I grade a Bachelor or Master thesis, I can ask in what sense and to what extent the student demonstrated the ability to “understand” (background & state-of-the-art section), “apply”, “analyze”, “evaluate” and “create”.

Bloom's revised taxonomy

Bloom's revised taxonomy. Taken from Wikipedia, created by Tidema.

Likewise we can ask what ChatGPT actually does for us in this context. The AI-assisted answer now incorporated in Google Search gives us results on the level of “remember”: it presents facts, and the rhetoric mixes in “understand”. Often people use a ChatGPT dialog to assist their “analyze” process, also their “evaluate” process, but also at the “understand” level and in principle also at the “apply” level. When we use Copilot to write code or when we generate images, we also reach the “create” level in the sense that something is assembled, constructed, developed.2

These days, just like with paper submissions at conferences, students and authors have to declare their utilization of AI during the preparation of their text. However, it seems crucial to me that an essential aspect here is in which “mode of writing” the AI has been utilized, as discussed above. Because what is sent in is the final text, and we do not see the genesis in the result. We only see the genesis when we closely monitor it.

But let us get back to the question we raised from the NeurIPS LLM policy: Certainly LLMs and other generative AI tools create new work in the sense of the above taxonomy, but do they create original work?

Originality

This question has recurred since the dawn of AI and comes in various forms. A very famous treatment is by John Searle, Minds, Brains and Science, from 1984. It starts with the classic mind-body problem, moves on to the question of whether computers can think, then to cognitive science, and ends with a discussion of the freedom of the will. In chapter two, Searle argues that symbol-manipulating machines – computers – operate at the level of syntax, but for our mind to reason it needs meaning:

“If my thoughts are to be about anything, then the strings must have a meaning which makes the thoughts about those things. In a word, the mind has more than a syntax, it has a semantics. The reason that no computer program can ever be a mind is simply that a computer program is only syntactical, and minds are more than syntactical. Minds are semantical, in the sense that they have more than a formal structure, they have a content.”

Searle then presents the thought experiment of the Chinese room, which can be read as a critique of the Turing test. In chapter three, Searle also makes the point that throughout history we have kept using our latest technology as a model to understand our thinking:

In my childhood we were always assured that the brain was a telephone switchboard. (‘What else could it be?’) I was amused to see that Sherrington, the great British neuroscientist, thought that the brain worked like a telegraph system. Freud often compared the brain to hydraulic and electro-magnetic systems. Leibniz compared it to a mill, and I am told that some of the ancient Greeks thought the brain functions like a catapult. At present, obviously, the metaphor is the digital computer.

This was 1984. Now we have large language models and we again see these metaphors. For instance, let us look into the inner workings of ChatGPT: In order for the transformer architecture based on multi-head attention to perform sequence-to-sequence learning, we have to translate text – a sequence of tokens – into vectors. We do so with powerful word embeddings, which aim to express semantics, i.e., where vector addition commutes with semantic operations. If we denote by \(\varphi: W \to \mathbb{R}^n\) the embedding from the token/word space \(W\) to the vector space \(\mathbb{R}^n\), then Wikipedia gives the example where

\[\varphi(\text{Paris}) - \varphi(\text{France}) = \varphi(\text{Berlin}) - \varphi(\text{Germany}).\]
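To make the metaphor a bit more tangible, here is a minimal sketch of this analogy arithmetic in Python. The tiny hand-made vectors and the helper function `closest` are purely illustrative assumptions; real embeddings (e.g., word2vec- or GloVe-style) are learned and have hundreds of dimensions.

```python
import numpy as np

# Hypothetical, hand-made 3-dimensional "embeddings", for illustration only.
phi = {
    "Paris":   np.array([0.9, 0.1, 0.3]),
    "France":  np.array([0.5, 0.0, 0.3]),
    "Berlin":  np.array([0.8, 0.2, 0.7]),
    "Germany": np.array([0.4, 0.1, 0.7]),
}

def closest(vec, vocab):
    """Return the word whose embedding has the highest cosine similarity to vec."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(phi[w], vec))

# "Paris is to France as ? is to Germany"
query = phi["Paris"] - phi["France"] + phi["Germany"]
print(closest(query, ["Paris", "France", "Berlin", "Germany"]))  # Berlin
```

With real, learned embeddings the equality above of course only holds approximately; one asks for the nearest neighbor of the query vector, just as the toy `closest` function does.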

But we should note that the notion of semantics on \(\mathbb{R}^n\) is a metaphor here. Also, when your agentic AI shows a spinning icon saying “thinking”, this is a metaphor; the “chain of thought” in LLM systems is a metaphor, not least because “thought” was a metaphor in the first place.

I think this discussion is important, but actually refers to a deeper question concerning strong AI and consciousness and the like.

But we were asking whether the output of an LLM can be original. There is another tangent here in a similar discussion, namely machine ethics. Catrin Misselhorn wrote a nice book with the title Grundfragen der Maschinenethik. Based on this book, I published a blog article with the title Ethik künstlicher intelligenter Systeme some four years ago. Here we also have the aspect of originality, namely concerning the existence of ethical agents, or more precisely: Genuine action (genuines Handeln) encompasses rationality and self-originality (Selbstursprünglichkeit). Misselhorn then refers to Floridi and Sanders, who name three criteria for self-originality: interaction with the environment, a certain independence of the environment, and the ability to adapt (upon changes in the environment).

I think, at least in a weak form, we can agree that the three criteria mentioned here are fulfilled. The ability to adapt is maybe the most interesting one, which also shows in the more recent and diverse techniques to augment the context on which LLMs operate, whether it is RAG (retrieval-augmented generation), agentic AI, or adding further modules to your ChatGPT to improve performance on math-like problems or to incorporate current information from the internet.

So let us for now agree that some form of originality is present in LLMs.

Understanding

Let us revisit Bloom's taxonomy. The second layer, quite at the bottom, is called “understand”. For humans to create original work, a prerequisite is to understand the subject matter. That is, we read Bloom's taxonomy as a hierarchy of levels. However, even if we agree that ChatGPT creates original work, this does not imply that we also believe that LLMs have an “understanding” of what they produce. This brings us back again to Searle's discussion on semantics versus syntax.

There is a paper by Mitchell and Krakauer, The debate over understanding in AI's large language models, from 2023, which discusses exactly this. They cite a survey from 2022, in which 480 active NLP researchers were asked whether they agree with the statement “Some generative model trained only on text, given enough data and compute, could understand natural language in some nontrivial sense” [a bit rephrased]: 51% agreed, 49% disagreed. And again, 40 years after Searle, the arguments are more or less those of the Chinese room:

While “humanlike understanding” does not have a rigorous definition, it does not seem to be based on the kind of massive statistical models that today’s LLMs learn; instead, it is based on concepts—internal mental models of external categories, situations, and events and of one’s own internal state and “self”. In humans, understanding language (as well as nonlinguistic information) requires having the concepts that language (or other information) describes beyond the statistical properties of linguistic symbols. Indeed, much of the long history of research in cognitive science has been a quest to understand the nature of concepts and how understanding arises from coherent, hierarchical sets of relations among concepts that include underlying causal knowledge (39, 40). These models enable people to abstract their knowledge and experiences in order to make robust predictions, generalizations, and analogies; to reason compositionally and counterfactually; to actively intervene on the world in order to test hypotheses; and to explain one’s understanding to others (41–47). Indeed, these are precisely the abilities lacking in current AI systems, including state-of-the-art LLMs, although ever larger LLMs have exhibited limited sparks of these general abilities. It has been argued that understanding of this kind may enable abilities not possible for purely statistical models (48–52). While LLMs exhibit extraordinary formal linguistic competence—the ability to generate grammatically fluent, humanlike language—they still lack the conceptual understanding needed for humanlike functional language abilities—the ability to robustly understand and use language in the real world (53). An interesting parallel can be made between this kind of functional understanding and the success of formal mathematical techniques applied in physical theories (54). For example, a long-standing criticism of quantum mechanics is that it provides an effective means of calculation without providing conceptual understanding.

The analogy to quantum mechanics reminds me of a pun that I move to a footnote to limit distraction3. Instead I would like to keep the focus on Searle's Chinese room, because what is argued above in 2022 boils down exactly to Searle's argument from 1984. Ironically, the advent of LLMs allows us to make his argument even clearer, if you allow me to augment the original thought experiment:

A Chinese room in 2025. The Chinese room goes as follows: Suppose you do not speak Chinese. You are confined in an indefinitely large library filled with Chinese books. Somebody enters, hands you a letter in Chinese, and you are supposed to formulate a response letter. Now, in principle, we could imagine that with the help of this endless number of Chinese books, by carefully looking at all these symbols, and given all the time you ask for, you might be able to arrange Chinese symbols on a response letter in a meaningful way. Now the question: Does this make you understand Chinese?

Well, this experiment was quite hypothetical, at least in 1984. But now we have LLMs; you can go to YouTube and learn how to build LLMs from scratch. That allows us to augment4 the original experiment as follows: Assume we also have a huge pile of paper and an endless pen. On this paper we perform all the calculations to train the LLM. It is linear algebra, trigonometric functions, and numerically computed gradients5. In principle, with indefinite time, paper and pen, we can actually answer a Chinese letter; we can even phrase answers on all sorts of subject matters, like quantum physics, the music of Mozart, the theories of Sigmund Freud, the work of Socrates, whatever, in Chinese, just like ChatGPT. Now the question: Do we understand Chinese? I do not think so. Does the pile of paper understand Chinese? Silly question. But this pile of paper is no different from ChatGPT, which is just the pile of paper in digital form, plus a hard-coded processor that acts like a clockwork to execute the pile of paper.
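To give a feeling for how mundane these pen-and-paper calculations are, here is a minimal sketch of a single Adam update step (cf. footnote 5) in Python. The toy quadratic loss is purely an illustrative stand-in; LLM training applies exactly this kind of elementary arithmetic, just to billions of parameters.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: a handful of elementary operations per parameter."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2    # second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias corrections
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimize f(theta) = theta^2, i.e., the gradient is 2*theta.
theta = np.array([2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # close to 0
```

Nothing in these lines requires anything but pen, paper and patience.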

To sum up, it appears pretty clear that (LLM-based) AI systems provide originality, without understanding. And this can be perfectly fine, depending on how you utilize it. Just do not fall in love and ask for a deeper mutual understanding.

Novelty

I would like to discuss another aspect concerning the abilities of LLMs that goes beyond what we discussed in terms of originality and understanding, and I call it novelty here, to somehow differentiate. I start with another standard reference on AI, namely Russell & Norvig's textbook Artificial Intelligence: A Modern Approach. In the introduction chapter, they differentiate AI along different dimensions, one of which is the following:

  • AI that acts or thinks human-like
  • AI that acts or thinks rationally

For the latter we could think of a chess computer or various reinforcement learning (RL) control algorithms. For the former we could think of classifying images of, say, cancerous tissue or handwritten digits. We can make this more precise: Reinforcement learning is of the second type, because the optimization goal is literally to maximize the long-term reward, which quantifies the rationale. Image classification is of the first type when we apply supervised learning with training data labeled by humans, because then we literally make the model resemble human judgment.

Now what are LLMs? Well, prima vista an LLM is a transformer architecture trained on human-created data, and hence of the first type. There is also RL-based fine-tuning, and LLM-based AI systems may add further components that complement the LLM engine (and which might be of the second type), but in essence it is of the first type.

However, this implies that LLMs will not produce anything beyond the training set. We can make this a little bit more precise: They will be able to “interpolate” between the training data, i.e., sort of pick from the “convex hull” of the training data, but they will not extrapolate in a notable way, i.e., go beyond this “convex hull”.6 In other words, they will not produce something novel beyond what is gained from limited stochasticity.
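As a toy illustration of this interpolation-versus-extrapolation point (with “convex hull” read loosely, as in footnote 6), the following sketch fits a small polynomial model to samples of a sine wave on a bounded interval; the polynomial is just an illustrative stand-in for any purely data-driven model, and the specific numbers are arbitrary assumptions.

```python
import numpy as np

# Training data: samples of sin(x) on [0, 2*pi] -- the model's "cloud of experience".
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 2 * np.pi, 200)
y_train = np.sin(x_train)

# Fit a degree-9 polynomial as a stand-in for a generic data-driven model.
model = np.poly1d(np.polyfit(x_train, y_train, deg=9))

x_inside = np.pi / 3     # inside the training range: interpolation
x_outside = 4 * np.pi    # far outside the training range: extrapolation

print("inside :", model(x_inside), "true:", np.sin(x_inside))    # close to the truth
print("outside:", model(x_outside), "true:", np.sin(x_outside))  # typically far off
```

Inside the training range the fit is good; outside it the prediction quickly becomes useless, because nothing in the data constrains the model there.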

That is, we should not expect truly, globally innovative ideas to come out of ChatGPT. This would require a paradigm shift towards RL at its foundation. It does not mean, however, that an individual cannot learn something interesting through interaction with ChatGPT.

The other thing is what Ilya Sutskever said in his NeurIPS 2024 talk in Vancouver for the test-of-time award for sequence-to-sequence learning: Pre-training – the “P” in ChatGPT – as we know it will come to an end, because compute is growing but data is not. We only have one internet, which is by now more or less entirely used for training, making data sort of the “fossil fuel of AI”. In other words, we cannot just train ever larger LLMs because we lack the data.

Metaphors of thinking and Kahneman

Before finally coming to an end, I would like to pick up the metaphors of thinking once more. When it comes to thinking in the human mind, there is the famous book Thinking, Fast and Slow by Daniel Kahneman (Nobel Memorial Prize 2002). The core model of this book describes two modes of thinking, System 1 and System 2. He mentions that he would have liked to give them more descriptive names, to avoid – let's say – unintended metaphorical interpretations. In a brief summary:

  • System 1 is fast, automatic, and uses less energy; e.g., easy driving on the motorway, adding up small numbers, reading text on a billboard.

  • System 2 is slow, takes effort, and is logical and conscious; e.g., reverse parking a car, choosing a chess move, calculating with larger numbers, hurrying through the airport, listening under noise.

Note that there is some analogy to AI methods. Symbolic AI – like SMT solvers, search, planning and scheduling algorithms, et cetera – sort of resembles System 2 thinking. Often it is related to NP-hard algorithmic problems. Sub-symbolic AI, i.e., machine learning, on the other hand, sort of resembles System 1 thinking. It is statistical, not deductive. System 1 speaks from experience, it is prone to prejudices, it happens automatically, it does not think things through, in contrast to System 2. We need System 1 thinking because System 2 thinking all day long would quickly exhaust us.

I mention this here because when we ask whether this or that AI system or paradigm would sort of give us human-like thinking capabilities, we should not forget how multi-faceted human thinking is. Having said that, Kahneman's model, too, is still just a model, whose function is to simplify reality.

But there is also a certain irony here: LLM-based AI systems are like System 1 thinkers with an enormous amount of experience, but no System 2 capabilities. Chain-of-thought mechanisms, too, are only techniques to iteratively interact with the System-1-type LLM core; they still do not make it System-2-like. So if you ask for advice on a really important question: Do you want your advisor to think in System 1 or System 2?

Conclusion

This blog article has now become quite lengthy, but I tried to summarize various angles of discussion concerning the application of LLM-based AI tools in expert writing.

In my personal experience, I avoided integrating AI tools into my favorite editor, neovim, for quite some time, then added GitHub's Copilot and later a few other addons, and used them quite intensively, especially in summer 2024, when I prepared a fresh set of chapters in the lecture notes for a brand new lecture. I already had a very clear idea of what I was going to write about, so the Copilot utilization was only concerned with writing-for-the-text.

I use neovim for all of my text files – whether source code, configs, prose, mails, you name it. The AI addons are configured to be opt-in, i.e., they are turned off until I explicitly turn them on. For some time – probably because everything new is exciting – I kept turning them on for all sorts of use cases, including writing-for-thinking. It impaired my performance in these cases.

Indeed, I observe that I have gradually decreased my usage of these tools over time. Just now, while writing this article, I deliberately did not turn any AI tool on. This allows me to genuinely reflect on how this article might have looked differently had I turned them on.

I just now opened ChatGPT in the browser, so it would not know what I have just written, and prompted:

I would like to write a blog article about scientific writing and the usage of ChatGPT. I think I would use ChatGPT in different modes of writing.

And ChatGPT gives the usual patronizing response: What a great idea, the title could be “How to Supercharge Your Scientific Writing with ChatGPT — Without Losing Your Voice” and the introduction section could be “Introduction: The New Era of Scientific Writing”. We also need the usual nod to intellectual, critical thinking, so it suggests: “Emphasize that they don’t replace critical thinking or domain expertise — but they can enhance productivity, clarity, and creativity.” And something about ethical considerations and best practices, et cetera. Bullshit from beginning to end, with a lot of words and superficial pseudo-structure.

But well, this is maybe not the best prompting.7 And in a year or so, various tech firms will have stolen my article here anyhow, and then maybe, when you try it out, they will give you a better result. Until then, you can maybe seriously try it out yourself.

  1. No LLM would ever have created such a long sentence without instruction, I guess. But I felt like it here. 

  2. If you want to know more in this direction, there is an education report by Anthropic on the use of their Claude model within Bloom’s taxonomy. 

  3. If you allow me the pun and another analogy: Every time we measure a KPI – be it business numbers in a company, source code metrics, or whatever quality number you find in whatever organization – there is the big danger of mistaking this number for the thing per se. Numbers have the powerful property that we can do calculations with them, but we shall not forget that they are not the real thing that was measured (e.g., quality of whatever). We trade tangibility against truth. 

  4. Note that our augmentation has purely didactic value. It adds no argumentative strength compared to the original version from 1984. 

  5. The Adam algorithm is really only a few lines of basic calculations. 

  6. Here we mean “convex hull” in a metaphorical sense, but actually, there is some deeper sense to it. 

  7. I then pointed the system to McEnerney’s model of expert writing. More lengthy bullshit answers followed.