
20 ChatGPT Prompts for Data Analysis to Get Faster, Sharper Insights

Use these 20 ChatGPT prompts for data analysis to explore datasets, find patterns, write Python and SQL, and communicate insights clearly and quickly.
Aiden Smith
Mar 31, 2026 ・ 16 mins read

Data analysis is where most of the real work happens — cleaning messy datasets, finding patterns, writing queries, and turning numbers into decisions. AI models like Claude, GPT-5, and Gemini can compress hours of exploratory work into minutes when you know how to prompt them well. The right ChatGPT prompts for data analysis do not just ask for output — they give the AI enough context to act like a skilled analyst working alongside you: understanding your data structure, your business question, and the level of rigour you need.

These 20 prompts cover the full analysis workflow: understanding and cleaning data, exploratory analysis, SQL and Python code generation, statistical interpretation, visualisation, and communicating findings to stakeholders. Each prompt is built to be dropped in directly — or adapted for your specific dataset and context.

How to Get the Most Out of AI for Data Analysis

The single most important thing you can do is give the AI your actual data structure. Paste in a sample of your dataset, describe your column names, or share a schema. An AI working from a concrete example will produce code that runs and insights that are relevant — an AI working from a vague description will produce generic output that needs extensive editing. The prompts in this collection include placeholders for this context; the more you fill them in, the better your results.

Chat Smith gives you access to Claude, GPT-5, Gemini, and more AI models in one place — so you can run the same data analysis prompt across multiple models and compare their approaches, or switch between models depending on the task.

Stage 1: Understanding and Cleaning Your Data

Before analysis comes data quality. These prompts help you understand what you are working with and surface problems before they contaminate your results.

Prompt 1: Initial Data Audit

"I have a dataset with the following columns and sample rows: [paste sample]. Act as a senior data analyst. Audit this dataset and tell me: (1) what each column likely represents, (2) which columns have data quality issues such as nulls, inconsistencies, or likely errors, (3) what questions this data can and cannot reliably answer, and (4) what cleaning steps I should take before analysis."

Why it works: this prompt treats the AI as a collaborator reviewing your raw data rather than a tool executing a specific task. The four-part structure forces a comprehensive audit rather than a surface-level scan.

Prompt 2: Data Cleaning Code

"Write Python (pandas) code to clean this dataset: [paste sample or describe structure]. The cleaning should: (1) handle missing values in [column names] using [strategy: drop / fill with median / fill with mode / forward fill], (2) standardise inconsistent values in [column] — for example [list known inconsistencies], (3) convert [column] to the correct data type, and (4) remove duplicate rows based on [key columns]. Add a comment explaining each cleaning step."

Why it works: specifying the cleaning strategy for each column prevents the AI from making assumptions that could distort your data. The request for comments makes the output auditable.
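
To illustrate the kind of output this prompt produces, here is a minimal pandas sketch; the file and column names ("orders.csv", "order_value", "region", "signup_date", "order_id") are placeholders, not part of the prompt:

```python
import pandas as pd

df = pd.read_csv("orders.csv")  # placeholder file and columns throughout

# (1) Fill missing numeric values with the column median
df["order_value"] = df["order_value"].fillna(df["order_value"].median())

# (2) Standardise inconsistent category labels
df["region"] = (
    df["region"].str.strip().str.lower()
    .replace({"u.k.": "uk", "united kingdom": "uk"})
)

# (3) Convert to the correct dtype; unparseable dates become NaT for review
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# (4) Remove duplicate rows based on the business key
df = df.drop_duplicates(subset=["order_id"])
```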

Prompt 3: Outlier Detection

"I have a dataset with numerical columns [list columns]. Write Python code to: (1) detect outliers in each column using both the IQR method and Z-score method, (2) print a summary showing how many outliers each method flags in each column, and (3) explain the difference between the two methods and when I should use each one. My dataset has [number] rows and represents [describe what the data is]."

Why it works: using two methods and asking for a comparison lets you make a more informed judgement about whether flagged values are genuine outliers or legitimate extreme data points, a distinction that depends heavily on your domain.
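
A sketch of the two methods side by side, assuming pandas and NumPy; the 1.5×IQR and |z| > 3 thresholds are the common defaults:

```python
import numpy as np
import pandas as pd

def flag_outliers(s: pd.Series) -> pd.DataFrame:
    """Flag values with both the IQR rule and the Z-score rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    iqr_mask = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
    z_mask = np.abs((s - s.mean()) / s.std()) > 3
    return pd.DataFrame({"iqr_outlier": iqr_mask, "zscore_outlier": z_mask})

# Compare how many points each method flags on a toy column
s = pd.Series([10, 12, 11, 13, 12, 95])
print(flag_outliers(s).sum())
```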

Prompt 4: Feature Engineering

"I am analysing a dataset about [describe subject: e.g. customer transactions, website sessions, employee records]. My columns are: [list columns]. Suggest 5 to 8 new features I could engineer from the existing columns that would be useful for [describe your goal: e.g. predicting churn, understanding seasonal patterns, segmenting customers]. For each feature, explain what it captures and write the pandas code to create it."

Why it works: feature engineering suggestions grounded in a specific goal and dataset are far more actionable than generic advice. The code alongside each suggestion lets you immediately test whether the feature adds value.
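
For a transactions dataset, the engineered features often look something like this minimal sketch; the column names are hypothetical:

```python
import pandas as pd

# Hypothetical transaction-level data
tx = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2025-01-05", "2025-03-10", "2025-02-01"]),
    "amount": [40.0, 60.0, 25.0],
})

features = tx.groupby("customer_id").agg(
    total_spend=("amount", "sum"),
    order_count=("amount", "size"),
    last_order=("order_date", "max"),
)
# Recency and average order value, two staples for churn analysis
features["days_since_last_order"] = (tx["order_date"].max() - features["last_order"]).dt.days
features["avg_order_value"] = features["total_spend"] / features["order_count"]
print(features)
```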

Stage 2: Exploratory Data Analysis

Exploratory analysis is where you discover what your data is actually telling you. These prompts help you surface patterns, relationships, and anomalies efficiently.

Prompt 5: EDA Plan

"I am starting exploratory data analysis on a dataset about [describe subject]. My business question is: [state your question]. The dataset has these columns: [list columns with types]. Write a structured EDA plan that covers: (1) univariate analysis for key columns, (2) bivariate analysis for relationships most relevant to my business question, (3) the visualisations I should create and why, and (4) the hypotheses I should test. Then write the Python code to execute the plan."

Why it works: anchoring EDA to a specific business question focuses the analysis. Without this anchor, exploratory analysis can become a fishing expedition that generates outputs without answering anything useful.

Prompt 6: Correlation Analysis

"My dataset has these numerical columns: [list columns]. My target variable is [column name]. Write Python code to: (1) compute a correlation matrix, (2) identify the top 5 features most correlated with the target, (3) create a heatmap using seaborn, and (4) explain what the correlation values mean in plain language, including what high correlation does and does not tell me about causation."

Why it works: requesting the plain-language explanation alongside the code prevents the common mistake of treating correlation as causation — and makes the output immediately presentable to non-technical stakeholders.
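
A minimal sketch of parts (1) to (3), assuming a placeholder file and target column:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")   # placeholder path
target = "revenue"             # placeholder target column

corr = df.select_dtypes("number").corr()
# Top 5 features by absolute correlation with the target
print(corr[target].drop(target).abs().sort_values(ascending=False).head(5))

sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation matrix")
plt.show()
```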

Prompt 7: Segmentation and Grouping

"I want to understand how [metric: e.g. revenue, engagement, churn rate] varies across different segments of my data. My dataset has these grouping variables: [list categorical columns]. Write Python code to: (1) calculate [metric] by each grouping variable individually, (2) calculate [metric] by the most interesting two-way combinations, (3) flag any segments that are significantly above or below average, and (4) produce a clear summary table of findings."

Why it works: segmentation is where most actionable insights live. Asking the AI to flag statistically notable segments rather than just producing tables means you get analysis, not just data aggregation.
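
The flagging step in part (3) can be as simple as comparing each segment mean to the overall mean; a sketch with placeholder names:

```python
import pandas as pd

df = pd.read_csv("data.csv")         # placeholder
metric, group = "revenue", "region"  # placeholder columns

overall = df[metric].mean()
by_segment = df.groupby(group)[metric].agg(["mean", "count"])
# Flag segments more than 20% above or below the overall average
by_segment["notable"] = (by_segment["mean"] - overall).abs() / overall > 0.20
print(by_segment.sort_values("mean", ascending=False))
```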

Prompt 8: Time Series Exploration

"I have time series data with a date column [column name] and a metric column [column name]. The data covers [date range] and represents [describe what the metric is]. Write Python code to: (1) resample the data at daily, weekly, and monthly levels, (2) plot the trend for each level, (3) calculate and plot a 7-day and 30-day rolling average, (4) identify the top 5 peaks and troughs and label them on the chart, and (5) describe what patterns are visible and what might explain them."

Why it works: looking at multiple time granularities simultaneously reveals patterns that a single-level view misses. Asking for labelled peaks and troughs turns the chart into a starting point for root cause analysis.
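
A condensed sketch of the rolling-average and peak-labelling steps, with placeholder file and column names:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("metric.csv", parse_dates=["date"]).set_index("date")  # placeholders
daily = df["value"].resample("D").sum()

fig, ax = plt.subplots()
daily.plot(ax=ax, alpha=0.4, label="daily")
daily.rolling(7).mean().plot(ax=ax, label="7-day avg")
daily.rolling(30).mean().plot(ax=ax, label="30-day avg")

# Label the single largest peak as a starting point for root-cause work
ax.annotate("peak", xy=(daily.idxmax(), daily.max()))
ax.legend()
plt.show()
```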

Stage 3: SQL Query Generation

SQL is where many data questions get answered. These prompts generate queries that go beyond basic SELECT statements to deliver real analytical value.

Prompt 9: Complex Aggregation Query

"I am working with a SQL database. My schema is: [describe tables and key columns, or paste CREATE TABLE statements]. Write a SQL query to [describe what you want to calculate: e.g. calculate monthly revenue by product category, with month-over-month growth rate and a 3-month rolling average]. Use [SQL dialect: PostgreSQL / BigQuery / MySQL / Snowflake]. Add comments explaining each section of the query and flag any assumptions you have made about the schema."

Why it works: specifying the SQL dialect matters — window function syntax and date functions vary significantly across databases. Requesting flagged assumptions means you can catch schema mismatches before running the query.

Prompt 10: Cohort Analysis Query

"Write a SQL cohort analysis query using my tables: [describe schema]. I want to: (1) group users by the month they first [performed action: e.g. made a purchase, signed up], (2) track what percentage of each cohort returned in months 1, 2, 3, 6, and 12 after their first action, and (3) output a cohort retention grid I can export to a spreadsheet. Use [SQL dialect]. Explain how cohort analysis works and what I should look for in the results."

Why it works: cohort queries are notoriously difficult to write from scratch. Getting working code alongside an explanation of what to look for means you can interpret results correctly even if you are new to cohort analysis.
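
If you want to sanity-check the SQL output, the same retention grid is easy to prototype in pandas; a sketch assuming placeholder "user_id" and "order_date" columns:

```python
import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["order_date"])  # placeholder schema

# Cohort = month of each user's first order
orders["order_month"] = orders["order_date"].dt.to_period("M")
orders["cohort"] = orders.groupby("user_id")["order_month"].transform("min")
orders["months_since"] = (orders["order_month"] - orders["cohort"]).apply(lambda d: d.n)

counts = orders.groupby(["cohort", "months_since"])["user_id"].nunique().unstack()
retention = counts.div(counts[0], axis=0)  # divide each row by its cohort size
print(retention.round(2))
```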

Prompt 11: Query Optimisation

"I have this SQL query that is running slowly: [paste query]. My database is [SQL dialect] and the tables involved have approximately [row counts]. Review the query and: (1) identify what is likely causing the performance issue, (2) rewrite it to be more efficient, (3) suggest any indexes that would speed up this type of query, and (4) explain the changes you made and why they improve performance."

Why it works: query optimisation requires understanding both the logic and the execution plan. Asking for explanations alongside the fix helps you build the knowledge to write better queries yourself rather than just fixing this one.

Stage 4: Statistical Analysis and Interpretation

These prompts move from description to inference — helping you draw statistically valid conclusions from your data and avoid common analytical mistakes.

Prompt 12: Hypothesis Testing

"I want to test whether [describe hypothesis: e.g. customers who received the discount have higher average order value than those who did not]. My dataset has [describe relevant columns and sample sizes]. Recommend the appropriate statistical test for this hypothesis, explain why it is the right choice, write Python code to run the test, and interpret the results in plain language including what the p-value means for my business decision."

Why it works: most analysts know they need a statistical test but are unsure which one to apply. Asking the AI to justify its recommendation builds your statistical intuition alongside delivering the answer.
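
For the discount example, one common recommendation is Welch's two-sample t-test; a minimal SciPy sketch with placeholder column names:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("orders.csv")  # placeholder: boolean "got_discount", numeric "order_value"
a = df.loc[df["got_discount"], "order_value"]
b = df.loc[~df["got_discount"], "order_value"]

# Welch's t-test: does not assume equal variances between groups
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```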

Prompt 13: A/B Test Analysis

"I ran an A/B test with the following setup: control group size [n], treatment group size [n], control conversion rate [%], treatment conversion rate [%], test duration [days]. Analyse this A/B test and tell me: (1) whether the result is statistically significant and at what confidence level, (2) the practical significance — is the effect size large enough to matter for my business, (3) whether there are any concerns about the test design that might invalidate the result, and (4) what I should do next."

Why it works: statistical significance and practical significance are different things. Many A/B test analyses focus only on the p-value and miss the question of whether the effect is actually large enough to justify the cost of the change.
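
The significance check in part (1) is commonly done with a two-proportion z-test; a sketch with statsmodels, using illustrative numbers only:

```python
from statsmodels.stats.proportion import proportions_ztest

# Illustrative numbers, not real test results
conversions = [130, 162]   # control, treatment
sizes = [2000, 2000]

z_stat, p_value = proportions_ztest(conversions, sizes)
lift = conversions[1] / sizes[1] - conversions[0] / sizes[0]
print(f"z = {z_stat:.2f}, p = {p_value:.4f}, absolute lift = {lift:.3f}")
```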

Prompt 14: Regression Analysis

"I want to understand what drives [target variable] in my dataset. My features are: [list columns]. Write Python code to: (1) run a linear regression with [target] as the dependent variable and [feature list] as independent variables, (2) check the key regression assumptions (linearity, homoscedasticity, normality of residuals, multicollinearity), (3) interpret the coefficients in plain language, and (4) identify which features have the most predictive power and flag any that should be removed."

Why it works: regression without assumption checking produces unreliable results. Building the assumption tests into the prompt ensures you get a valid model rather than just a model.
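
A minimal statsmodels sketch covering the fit, the summary diagnostics, and the multicollinearity check; the feature names are placeholders:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("data.csv")                         # placeholder
X = sm.add_constant(df[["feature_a", "feature_b"]])  # placeholder features
model = sm.OLS(df["target"], X).fit()
print(model.summary())  # coefficients, R-squared, residual diagnostics

# Multicollinearity check: VIF above roughly 5-10 is a common warning sign
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))
```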

Stage 5: Visualisation and Reporting

Analysis that cannot be communicated clearly has limited value. These prompts help you build charts that make your findings impossible to ignore.

Prompt 15: Chart Selection and Code

"I want to visualise [describe what you want to show: e.g. the distribution of customer ages by product category, the trend in weekly revenue over the past year, the relationship between marketing spend and conversions]. My dataset has these relevant columns: [list]. Recommend the best chart type for each visualisation and explain why. Then write Python code using matplotlib and seaborn to create each chart with proper titles, axis labels, and a colour scheme that works for business presentations."

Why it works: chart type selection is a genuinely difficult decision that most people make by habit rather than deliberate choice. Asking for justification alongside code means you understand why a particular chart works, which helps you make better choices independently.
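
For the distribution-by-category example, a box plot is a common recommendation; a minimal sketch with placeholder column names:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")  # placeholder file and columns

fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(data=df, x="product_category", y="customer_age", ax=ax)
ax.set_title("Customer age by product category")
ax.set_xlabel("Product category")
ax.set_ylabel("Age (years)")
plt.tight_layout()
plt.show()
```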

Prompt 16: Dashboard Design Plan

"I am building a data dashboard for [describe audience: e.g. the sales leadership team, the marketing team, the board]. Their primary question is: [state the question]. My available data covers: [describe datasets and key metrics]. Design a dashboard structure that includes: (1) the 3 to 5 most important KPIs to show at the top, (2) the supporting charts and breakdowns, (3) the filters and controls users will need, and (4) the layout and visual hierarchy. Explain why each element earns its place on the dashboard."

Why it works: most dashboards fail because they show everything instead of answering a specific question. Forcing the design to start from the audience's primary question produces dashboards that actually get used.

Prompt 17: Insight Narrative

"I have completed an analysis of [describe subject] and found the following key results: [paste your findings or summary statistics]. Write an executive summary of these findings for [describe audience: e.g. the CFO, the product team, the board]. The summary should: (1) open with the most important finding, (2) explain what it means for the business in plain language, (3) highlight 2 to 3 supporting findings, (4) identify the key uncertainty or caveat in the analysis, and (5) close with a clear recommendation. Keep it under 300 words."

Why it works: translating analytical findings into business language is one of the hardest parts of the analyst role. Structuring the narrative to open with the most important finding — not the methodology — mirrors how executives actually want to consume information.

Stage 6: Advanced and Specialist Analysis

These prompts cover more specialist analytical tasks that come up frequently in data-intensive roles.

Prompt 18: Customer Segmentation

"I want to segment my customers based on their behaviour. My dataset has these customer-level features: [list features, e.g. total spend, purchase frequency, days since last purchase, product categories purchased]. Write Python code to: (1) normalise the features, (2) use K-means clustering to segment customers into 3 to 6 groups, (3) determine the optimal number of clusters using the elbow method, (4) profile each segment — what makes them distinct — and (5) suggest a name and a business action for each segment."

Why it works: segmentation is only useful if it leads to action. Asking for a name and business action for each segment forces the analysis to produce something operationally useful, not just a statistical artefact.
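
A condensed scikit-learn sketch of the scaling, elbow check, and profiling steps; the feature names are placeholders:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("customers.csv")                    # placeholder file
cols = ["total_spend", "frequency", "recency_days"]  # placeholder features
X = StandardScaler().fit_transform(df[cols])

# Elbow method: plot inertia for k = 2..8 and look for the bend
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(2, 9)]
plt.plot(range(2, 9), inertias, marker="o")
plt.xlabel("k"); plt.ylabel("inertia"); plt.show()

# Fit the chosen k and profile each segment by its feature means
df["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(df.groupby("segment")[cols].mean())
```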

Prompt 19: Anomaly Detection

"I need to detect anomalies in my [describe data: e.g. daily transaction volume, server response times, weekly sales figures]. My dataset has [describe structure and time range]. Write Python code to: (1) apply Isolation Forest to detect anomalies, (2) visualise the results showing normal and anomalous points, (3) print a table of the top 10 most anomalous records with their key values, and (4) explain what kinds of anomalies this method is good at detecting and what it might miss."

Why it works: anomaly detection can surface fraud, system errors, or unexpected business events. Asking about what the method might miss is important — Isolation Forest has known limitations that matter depending on your use case.
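
A minimal scikit-learn sketch for parts (1) and (3), assuming a single placeholder "volume" column:

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_csv("daily_volume.csv")  # placeholder file and column
model = IsolationForest(contamination=0.01, random_state=0)  # expect ~1% anomalies
df["anomaly"] = model.fit_predict(df[["volume"]]) == -1      # -1 marks anomalies
df["score"] = model.decision_function(df[["volume"]])        # lower = more anomalous

print(df[df["anomaly"]].nsmallest(10, "score"))  # ten most anomalous records
```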

Prompt 20: Analysis Review and Critique

"I have conducted the following analysis: [describe or paste your analysis, methodology, and findings]. Act as a critical peer reviewer. Identify: (1) any methodological weaknesses or invalid assumptions, (2) alternative explanations for the findings I may have overlooked, (3) confounding variables I should have controlled for, (4) whether my conclusions are supported by the evidence or overstate it, and (5) what additional analysis would strengthen or challenge my conclusions."

Why it works: this is the most valuable prompt on the list. Using AI to stress-test your own analysis before presenting it is like having a senior analyst review your work. It catches errors that confirmation bias would cause you to miss.

Tips for Better Data Analysis Prompts

Always include your data structure, even if you cannot share the actual data. Column names, data types, and a few sample rows give the AI enough to produce code that is close to correct rather than generically illustrative. Specify your stack — Python version, key libraries, SQL dialect — so the output runs in your environment. Ask for explanations alongside code. The explanation is not filler; it is how you catch errors and learn the reasoning so you can adapt the output. And always review AI-generated code before running it on production data.

Final Thoughts

The best ChatGPT prompts for data analysis treat the AI as a skilled collaborator rather than a code generator. Give it context, be specific about what you need, and ask it to explain its reasoning. The prompts in this collection cover the full analytical workflow — use them as templates and adapt them to your specific data and questions.

Frequently Asked Questions

1. Can I use these prompts with Claude and Gemini as well as ChatGPT?

Yes. These prompts work well across all major AI models including Claude, GPT-5, and Gemini. Different models have different strengths for data analysis — some produce cleaner code, others give better statistical explanations. Chat Smith lets you access all of them in one place so you can choose the right model for each task or compare outputs directly.

2. How much of my actual data should I share with an AI model?

Share the minimum needed for accurate output. A sample of 5 to 20 rows with realistic values is usually enough for the AI to understand your data structure and produce relevant code. For sensitive data, use anonymised or synthetic samples that preserve the structure but remove identifying information. Check your organisation's data governance policies before sharing any data externally.

3. Can I use Chat Smith to run these data analysis prompts?

Yes. Chat Smith gives you access to multiple leading AI models — Claude, GPT-5, Gemini, and more — in one interface. You can save your most effective data analysis prompts as templates, switch between models to compare their outputs, and build a personal library of prompts that work for your specific datasets and analytical workflows.
