Practical text generation and writing assistants

January 8, 2019 — March 23, 2023

Friendly end-user interfaces for text generation with large language models.

NB: since I wrote this, ChatGPT has come online and is changing the world. I will not comment on it here, because it is a phenomenon with its own momentum and has already received plenty of input from the internet commentariat.

This domain is worth a listicle because it is full of an exponential amount of spam: many writing-assistant tools are written by spammers to assist spammers. Secondary markets serve grey-area applications such as university essay generators. As someone with a stated position against university essays, I do not find these purposes morally distinct. They are all spam. Nonetheless, as with any taboo-adjacent market, it is hard to filter its entrants for quality.

To be clear, I do not want to write spam or essays (although, what isn’t spam? Surely history is the judge of that), but I do want a writing assistant with a smooth and helpful UI.

Figure 1

1 Hacks

See LLM hacks for some tricks to work around the limitations of the current state of the art.

2 Text generation tools

Spammy recommendations from Reddit, all essentially obsolete now.

3 For science in particular

Colleagues recommend the generic tool QuillBot AI to ease the writing of papers.

As far as I know, there is only one science-specific text-generating large language model (Taylor et al. 2022), and it has caused a public furore. See the Galactica saga.

A lot of the coverage has been negative. Perhaps I am missing something, but I do not get why.

Here is some stridently negative coverage: Why Meta’s latest large language model only survived three days online.

Galactica was supposed to help scientists. Instead, it mindlessly spat out biased and incorrect nonsense.

I guess the presumption here is that large language models should do science, rather than help us write science. I think I missed the memo announcing that a large enough neural network would deduce the laws of physics, sociology, etc.

Unfortunately, real scientists spit out biased and incorrect nonsense all the time, as do people impersonating real scientists, and we already spend a lot of time addressing that. Lowering the cost of producing mindless bigotry might be a problem, I suppose, if we are concerned about conference and journal reviewers being overwhelmed by low-quality pseudo-research… but I cannot really imagine that being a huge problem: if a given researcher regularly produces crap, they can easily be blacklisted via their institutional email address, etc. What am I missing?

Is it that members of the public might be confused by spammy, science-impersonating PDFs? I suppose that is a concern, in which case it is another argument for reducing the importance of journal publisher paywalls. After all, academic publishers justify their existence largely in terms of their gatekeeping and verification functions.

4 Running LLMs at home

5 Grammar and style checking

6 References

Stiennon, Ouyang, Wu, et al. 2020. “Learning to Summarize with Human Feedback.” In Advances in Neural Information Processing Systems.
Taylor, Kardas, Cucurull, et al. 2022. “Galactica: A Large Language Model for Science.”