By Christian Prokopp on 2023-02-03
ChatGPT can combine data with natural language and has extensive information about most subjects. That lends itself to novel applications like creating informative data dictionaries.
Let us ask ChatGPT for a public dataset we can use for this how-to.
> List public CSV datasets with links that I could use to demonstrate your ability to create a data dictionary from a CSV file.
Next, I downloaded one CSV file from the wine dataset and took a sample. ChatGPT can easily create a simple data dictionary table from it. But if we expand the question with some thought, it can make some valuable additions. For example, we can add SQL types, units of measure, descriptions expanded by ChatGPT's general know-how, and a summary for the table.
> Create a data dictionary from the wine quality dataset for the red wine quality. Add a column for SQL data types and favour DECIMAL over FLOAT. Add a column for the Unit of Measure. Create description fields using your knowledge of red wine for each column with at least two sentences each. Make them sound natural and not repetitive. Precede the data dictionary table with a summary paragraph for data users.
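If you would rather paste the sample into the prompt than rely on ChatGPT knowing the dataset, the sampling step can be scripted. The following is a minimal sketch, not the exact procedure used for this post: the function names `sample_csv` and `build_prompt` are hypothetical, the semicolon delimiter is an assumption you should check against your file, and the rows in the usage part are illustrative placeholders.

```python
import csv
import io


def sample_csv(csv_text, n_rows=5, delimiter=";"):
    """Return the header line plus the first n_rows data rows as CSV text."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    rows = []
    for index, row in enumerate(reader):
        if index > n_rows:  # index 0 is the header row
            break
        rows.append(row)
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


def build_prompt(sample, instructions):
    """Combine the written instructions with the CSV sample into one prompt."""
    return f"{instructions}\n\nCSV sample:\n{sample}"


# Illustrative placeholder data, not the real dataset contents.
csv_text = "pH;alcohol;quality\n3.5;9.4;5\n3.2;9.8;5\n3.3;10.1;6\n"
prompt = build_prompt(
    sample_csv(csv_text, n_rows=2),
    "Create a data dictionary from this CSV sample. Add a column for SQL data types.",
)
print(prompt)
```

A handful of rows is usually enough for column names and value formats, and it keeps the prompt short.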
The output is remarkable. ChatGPT added three of the four columns using context and its knowledge base. Naturally, you would want to verify the details to ensure they fit your purpose, but it is an impressive first draft.
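One cheap verification step is to check that the generated dictionary covers exactly the columns in the CSV header. The sketch below assumes ChatGPT returned the dictionary as a Markdown pipe table with the column name in the first cell, which may not hold for every response; `table_first_column` and `compare_columns` are hypothetical helper names, and the table in the usage part is an illustrative placeholder.

```python
def table_first_column(markdown_table):
    """Extract the first cell of every data row from a Markdown pipe table."""
    names = []
    for line in markdown_table.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prose around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if set(cells[0]) <= set("-: "):
            continue  # skip the separator row (and empty first cells)
        names.append(cells[0])
    return names[1:]  # drop the table's own header row


def compare_columns(csv_header, dictionary_names):
    """Report CSV columns missing from the dictionary, and unexpected extras."""
    csv_set = {name.strip().lower() for name in csv_header}
    dict_set = {name.strip().lower() for name in dictionary_names}
    return {
        "missing": sorted(csv_set - dict_set),
        "unexpected": sorted(dict_set - csv_set),
    }


# Illustrative placeholder table, not actual ChatGPT output.
table = """
| Column | SQL Type |
|---|---|
| pH | DECIMAL(3,2) |
| alcohol | DECIMAL(4,2) |
"""
report = compare_columns(["pH", "alcohol", "quality"], table_first_column(table))
print(report)
```

This only catches structural gaps. The descriptions, units, and SQL types still need a human read.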
Christian Prokopp, PhD, is an experienced data and AI advisor and founder who has worked with Cloud Computing, Data and AI for decades, from hands-on engineering in startups to senior executive positions in global corporations. You can contact him at christian@bolddata.biz for inquiries.