By Christian Prokopp on 2022-12-04
OpenAI's ChatGPT has made the news recently as a next-generation conversational agent. Its surprising breadth made me wonder: could OpenAI generate specific technology content that is good enough to post, and what would that imply for the future?
First, I tried the research preview of ChatGPT.
Me: What is AWS Athena?
ChatGPT: Amazon Athena is a serverless query service that allows you to analyze data in Amazon S3 using SQL. Athena is easy to use, and you can get started with it quickly. It allows you to query data stored in Amazon S3 using SQL, and can be used to analyze a wide variety of data such as logs, clickstream data, and IoT data. With Athena, there is no need to set up, manage, or scale a data warehouse, which makes it a cost-effective and convenient option for querying data stored in S3.
Nice. ChatGPT could be immediately useful for querying documentation or even the internet (hey Google) in a better way. But how about something more complex, like writing technical content? If you read Medium or other sources, you know that there is a wide range of human-written content, from the novice and incomprehensible to simple tutorials and analysis to deep, insightful technical domain expertise.
Last week, I wrote about AWS Athena, and in the past, teams I worked with discussed the pros and cons of Athena versus Redshift. They are different tools for different purposes but have some overlap. It is a specialised technical discussion. Surely, a general ML text model will be unable to generate anything meaningful about the topic.
I used the beta website for OpenAI's GPT-3 (text-davinci-003) and prompted it to write a blog post about the subject.
Prompt: A blog post on AWS Athena and why it is better than AWS Redshift.
OpenAI content: As businesses increasingly move their data and tracking systems to the cloud, there is an increasing need for powerful and efficient ways of querying this stored data. This is where AWS Athena comes in.
AWS Athena is a query service built on top of Amazon’s cloud infrastructure that allows users to easily query and analyze data stored in the cloud in a variety of different formats. With Athena, users can run complex queries against their data without having to manage a separate cluster or rely on managing their own hardware.
One of the primary advantages that sets Athena apart from other cloud query services is its ability to quickly query data stored in the cloud with low latency. Athena queries are able to execute in a matter of seconds, which makes it much faster than other services like AWS Redshift, which can take minutes or even hours to execute. This means that users can quickly access the data they need without having to wait for long periods of time.
Another benefit is that Athena provides users with a wide range of data sources that can be queried, including Amazon S3, Amazon RDS, and other cloud storage services. This makes it much easier to query data from multiple sources in one place.
Finally, Athena is also much cheaper to use than AWS Redshift. With Athena, users only pay for the queries they run, allowing them to save significant amounts of money. By contrast, with Redshift, users have to pay for the cluster as well as the queries, which can add up to significant costs.
In conclusion, AWS Athena provides businesses with a powerful and efficient way of querying data stored in the cloud in a variety of different formats. With its ability to execute query quickly with low latency, support for a range of data sources, and lower costs than AWS Redshift, Athena is a great choice for businesses looking to efficiently query and analyze their cloud data.
Impressive. Some details are either inaccurate or depend on circumstance, but this was mostly correct and had some valuable insight. It is as good (or bad) as some posts I have read thrown together by the odd human, but it is far from what an eloquent domain expert could write. Importantly, it does not come across as artificial or nonsensical.
As I wrote, it is an open argument, so let's see what happens if we favour Redshift in the prompt as the better solution.
Prompt: A blog post about AWS Redshift and why it is better than AWS Athena.
OpenAI content: AWS Redshift is a cloud-based data warehouse service that is part of Amazon's larger suite of cloud-based products. It provides businesses with a scalable, secure, and cost-effective way to store and analyze their data. It offers features such as columnar storage, automatic compression, advanced query optimization, and much more.
Compared to AWS Athena, Redshift offers a number of advantages that make it a better choice for businesses looking to store and analyze their data.
First, Redshift is much faster than Athena. With Redshift, it is easy to quickly analyze large amounts of data. It offers advanced query optimization, allowing queries to be processed faster and more efficiently. Additionally, Redshift offers accelerated databases, which can increase query performance even further.
Second, Redshift is more secure than Athena. It offers improved security measures such as encryption at rest and in transit, role-based access control, and data auditing. With these features, businesses can rest assured that their data is safe and protected.
Third, Redshift offers much more scalability than Athena. Redshift's elasticity and scalability allows businesses to easily scale up or down their data analysis needs as the business grows. This makes it easier to manage and control costs associated with storing and analyzing data.
Finally, Redshift is much more cost-effective than Athena. It offers businesses flexible pricing options, making it easy to select the best plan for their particular needs.
The two posts contradict each other: each claims its service is the faster and cheaper one. With enough know-how and the right scenario, though, you could argue either way. Is OpenAI that smart, or is it simply repeating common unsubstantiated claims from mediocre articles it learned from? I doubt it is that clever. Still, the point is that the output is good enough that you can argue about it rather than immediately dismiss it as false or artificial.
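To make the "either way" point about cost concrete, here is a minimal sketch of a break-even calculation. All prices are illustrative assumptions rather than quotes: Athena is modelled as a fee per terabyte scanned and Redshift as a flat hourly fee for an always-on cluster. The sketch ignores storage, data transfer, reserved pricing and serverless options, so treat it as a thought experiment and check the current AWS pricing pages for real figures.

```python
# Illustrative assumptions only; real AWS prices vary by region and over time.
ATHENA_USD_PER_TB_SCANNED = 5.00  # assumed pay-per-query rate
REDSHIFT_USD_PER_NODE_HOUR = 0.25  # assumed price of one small always-on node
HOURS_PER_MONTH = 730


def athena_monthly_cost(tb_scanned_per_month: float) -> float:
    """Pay only for the data the queries scan."""
    return tb_scanned_per_month * ATHENA_USD_PER_TB_SCANNED


def redshift_monthly_cost(nodes: int = 1) -> float:
    """Pay for the provisioned cluster regardless of query volume."""
    return nodes * REDSHIFT_USD_PER_NODE_HOUR * HOURS_PER_MONTH


def cheaper_option(tb_scanned_per_month: float, nodes: int = 1) -> str:
    """Name the cheaper service under the simplified model above."""
    athena = athena_monthly_cost(tb_scanned_per_month)
    redshift = redshift_monthly_cost(nodes)
    return "Athena" if athena < redshift else "Redshift"


if __name__ == "__main__":
    for tb in (1, 10, 100):
        print(f"{tb} TB scanned per month: {cheaper_option(tb)} is cheaper")
```

Under these assumed numbers, a team scanning a few terabytes a month pays less with Athena, while a team scanning a hundred terabytes pays less with the always-on cluster. Change the assumptions and the answer flips, which is why both generated posts can sound plausible.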
Sticking with OpenAI, I created the image of the robot with the typewriter at the beginning of this post using DALL-E. Here is the kicker: the two blog posts and the image cost only 20 cents and minutes of machine time. How much money and time would it cost to hire a mediocre technical writer and artist to create equivalent content?
Highly qualified humans can still create much more insightful and correct content. However, things are about to change on the lower end of content creation, which is about the art of good enough, cost and scale. And where that lower end begins will move rapidly as machine learning models improve.
As a pessimist, I imagine a new wave of low-cost spam content, indistinguishable from mediocre human content, that regurgitates existing content in new variations.
As an optimist, I imagine new content generation that synthesises unique insight from multiple data sources or makes information more accessible, for example by translating documentation into compelling articles and posts.
Whatever the future holds, visual and text content generation will change dramatically.
Christian Prokopp, PhD, is an experienced data and AI advisor and founder who has worked with Cloud Computing, Data and AI for decades, from hands-on engineering in startups to senior executive positions in global corporations. You can contact him at christian@bolddata.biz for inquiries.