P-Value Calculator
Calculate p-values from test statistics to determine statistical significance. Enter a z-score or t-statistic and select one-tailed or two-tailed test to get the p-value and significance assessment.
How to Use This P-Value Calculator
This calculator determines p-values from test statistics to help you assess statistical significance. Follow these steps based on your test type:
For Z-Score (Standard Normal Distribution):
- Enter your calculated z-score from your statistical test
- Select the test type: two-tailed (testing for any difference), left-tailed (testing if less than), or right-tailed (testing if greater than)
- Choose your significance level (alpha) for comparison
- Click Calculate to see the p-value and significance determination
For T-Statistic (Student's T-Distribution):
- Enter your calculated t-statistic
- Enter the degrees of freedom (typically n-1 for one-sample t-tests, or n1+n2-2 for pooled-variance two-sample tests)
- Select your test type and significance level
- Click Calculate to see results
The calculator displays the p-value and clearly indicates whether your result is statistically significant based on your chosen alpha level.
What is a P-Value?
The p-value is one of the most important yet frequently misunderstood concepts in statistics. It represents the probability of obtaining test results at least as extreme as what you observed, assuming the null hypothesis is true. In simpler terms, it answers the question: if there really is no effect, how likely would we be to see results this extreme by random chance?
A smaller p-value provides stronger evidence against the null hypothesis. When the p-value falls below your predetermined significance level (alpha), you reject the null hypothesis and conclude that your results are statistically significant. However, statistical significance does not automatically mean practical significance or that the effect is large or important.
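To illustrate the gap between statistical and practical significance, here is a minimal sketch. It assumes a two-sample z-test with equal group sizes and a known standard deviation of 1, so the test statistic is z = d * sqrt(n / 2), where d is the standardized mean difference (Cohen's d). The function name and the sample sizes are illustrative choices, not part of the calculator.

```python
import math

def two_tailed_p_for_effect(d: float, n_per_group: int) -> float:
    """Two-tailed p-value when the observed standardized difference is d.

    Assumes two equal groups of size n_per_group with known unit variance,
    so the test statistic is z = d * sqrt(n_per_group / 2).
    """
    z = d * math.sqrt(n_per_group / 2)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - CDF(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# The same tiny effect (d = 0.02) at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_tailed_p_for_effect(0.02, n))
```

The effect size never changes, yet the p-value shrinks as the sample grows, which is why a small p-value alone says little about how large or important an effect is.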
Common interpretations of p-values:
- p less than 0.001: Very strong evidence against the null hypothesis. Results are highly significant.
- p less than 0.01: Strong evidence against the null hypothesis.
- p less than 0.05: Moderate evidence against the null hypothesis. This is the most common threshold for significance in many fields.
- p between 0.05 and 0.10: Weak evidence, sometimes called marginally significant or approaching significance.
- p greater than 0.10: Insufficient evidence to reject the null hypothesis.
Important caveats: A p-value is NOT the probability that the null hypothesis is true. It is also NOT the probability that your results occurred by chance. The p-value only tells you about the probability of the data given the null hypothesis, not the probability of the hypothesis given the data.
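One way to make "the probability of the data given the null hypothesis" concrete is to simulate the null hypothesis directly. The sketch below assumes the test statistic is a z-score that follows a standard normal distribution when the null is true. The function name, simulation count, and seed are illustrative assumptions.

```python
import random

def simulated_two_tailed_p(z_observed: float, n_sims: int = 200_000, seed: int = 0) -> float:
    """Estimate a two-tailed p-value by simulating the null hypothesis.

    Under the null, the z statistic is assumed to follow a standard normal
    distribution. The p-value is the share of simulated statistics that are
    at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    threshold = abs(z_observed)
    extreme = sum(1 for _ in range(n_sims) if abs(rng.gauss(0.0, 1.0)) >= threshold)
    return extreme / n_sims

estimate = simulated_two_tailed_p(1.96)
print(f"simulated p for z = 1.96: {estimate:.4f}")  # close to 0.05, varies with seed
```

Every simulated value here was generated with the null hypothesis true, so the result describes how often such data arise under the null. It says nothing about how likely the null hypothesis itself is.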
One-Tailed vs Two-Tailed Tests
Choosing between one-tailed and two-tailed tests depends on your research hypothesis and should be decided before collecting data:
Two-tailed test: Tests whether the parameter is different from the hypothesized value in either direction (greater than OR less than). Use this when you want to detect any deviation from the null hypothesis without specifying direction. The p-value accounts for extreme values in both tails of the distribution. This is the more conservative and commonly used approach.
Left-tailed test: Tests whether the parameter is less than the hypothesized value. Use when your alternative hypothesis specifically predicts a decrease or lower value. The p-value only considers the left tail of the distribution.
Right-tailed test: Tests whether the parameter is greater than the hypothesized value. Use when your alternative hypothesis specifically predicts an increase or higher value. The p-value only considers the right tail of the distribution.
One-tailed tests have more statistical power to detect effects in the specified direction but cannot detect effects in the opposite direction. Only use one-tailed tests when you have strong theoretical justification for predicting the direction before seeing the data.
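The power trade-off described above can be sketched numerically. This example assumes the z statistic is normally distributed with standard deviation 1 around a hypothetical true shift, and compares how often a right-tailed and a two-tailed test reject at the same alpha. The function names and the shift values are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def power_right_tailed(true_shift: float, alpha: float = 0.05) -> float:
    """Chance a right-tailed z-test rejects when z ~ Normal(true_shift, 1)."""
    critical = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(critical - true_shift)

def power_two_tailed(true_shift: float, alpha: float = 0.05) -> float:
    """Chance a two-tailed z-test rejects when z ~ Normal(true_shift, 1)."""
    critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper = 1 - STD_NORMAL.cdf(critical - true_shift)  # rejections in the right tail
    lower = STD_NORMAL.cdf(-critical - true_shift)     # rejections in the left tail
    return upper + lower

# A true effect in the predicted direction, then one in the opposite direction.
for shift in (2.5, -2.5):
    print(shift, power_right_tailed(shift), power_two_tailed(shift))
```

With a shift of 2.5 the right-tailed test rejects more often than the two-tailed test. With a shift of -2.5 it almost never rejects, while the two-tailed test is unaffected by the sign.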
Formulas for P-Value Calculation
Two-tailed Z-test: p = 2 * (1 - CDF(|z|))
Right-tailed Z-test: p = 1 - CDF(z)
Left-tailed Z-test: p = CDF(z)
Here CDF is the cumulative distribution function of the standard normal distribution.
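The three z-test formulas can be written out in a few lines of Python. This is a sketch using only the standard library, not the calculator's actual implementation. The function names and the `tail` argument are assumptions made for illustration.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def z_test_p_value(z: float, tail: str = "two") -> float:
    """P-value for a z statistic; tail is 'two', 'left', or 'right'."""
    if tail == "two":
        # 2 * (1 - CDF(|z|)), using CDF(-|z|) = 1 - CDF(|z|) by symmetry
        return 2 * normal_cdf(-abs(z))
    if tail == "right":
        # 1 - CDF(z) = CDF(-z)
        return normal_cdf(-z)
    if tail == "left":
        return normal_cdf(z)
    raise ValueError("tail must be 'two', 'left', or 'right'")

def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Significance decision: reject the null when p is below alpha."""
    return p_value < alpha

p = z_test_p_value(1.96, "two")
print(p, is_significant(p))
```

Using the symmetry of the normal distribution avoids computing 1 minus a number very close to 1, which would lose precision for large |z|.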
For t-tests, the same logic applies but using the t-distribution CDF with the appropriate degrees of freedom. The t-distribution has heavier tails than the normal distribution, especially for small sample sizes, resulting in larger p-values for the same test statistic value.
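For the t-distribution, a standard-library sketch can integrate the density numerically. The version below uses Simpson's rule, which is a simple choice made for illustration and is only intended for moderate values of |t|. Statistical libraries typically use more robust methods. All function names here are assumptions.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t: float, df: float, intervals: int = 2000) -> float:
    """CDF via Simpson's rule on [0, |t|]; intended for moderate |t|."""
    if t == 0:
        return 0.5
    upper = abs(t)
    h = upper / intervals  # intervals must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area

def t_test_p_value(t: float, df: float, tail: str = "two") -> float:
    """P-value for a t statistic; tail is 'two', 'left', or 'right'."""
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t), df))
    if tail == "right":
        return 1 - t_cdf(t, df)
    if tail == "left":
        return t_cdf(t, df)
    raise ValueError("tail must be 'two', 'left', or 'right'")

print(t_test_p_value(1.96, df=10))    # larger than the z-test value of about 0.05
print(t_test_p_value(1.96, df=1000))  # approaches the z-test value as df grows
```

Comparing the two printed values shows the heavier tails at work: the same statistic gives a larger p-value at 10 degrees of freedom than at 1000.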
Use the Z-Score Calculator to compute z-scores from raw data. The Confidence Interval Calculator provides another approach to statistical inference. Plan your study with the Sample Size Calculator for adequate statistical power.
Frequently Asked Questions
What does a p-value of 0.03 actually mean?
It means there's a 3% probability of observing your test statistic (or one more extreme) if the null hypothesis were true. Since 0.03 is less than the conventional 0.05 alpha threshold, you would typically reject the null and call the result statistically significant. The p-value is NOT the probability that the null hypothesis is true.
Should I use a one-tailed or two-tailed test?
Use a two-tailed test (the default) when you only care that a difference exists, regardless of direction. Use a one-tailed test only when your hypothesis specifies a direction in advance (e.g., "the new drug lowers blood pressure"). When the observed effect is in the predicted direction, a one-tailed test gives half the p-value of the equivalent two-tailed test, so a z of 1.96 produces p = 0.025 one-tailed or p = 0.05 two-tailed.
Why is p < 0.05 the standard threshold?
It's a historical convention popularized by Ronald Fisher in the 1920s, not a mathematical law. Stricter thresholds are used in some settings: p < 0.01 in some medical research, and p < 0.0000003 (5σ) for discovery claims in particle physics. For exploratory work, p < 0.10 is sometimes treated as acceptable, while some researchers have proposed p < 0.005 for confirmatory claims.
Common Mistakes to Avoid
- Treating p = 0.051 as "not significant" and p = 0.049 as "significant": The thresholds are arbitrary. Report exact p-values and effect sizes rather than dichotomizing.
- P-hacking by testing many hypotheses: If you run 20 independent tests at α = 0.05 and every null hypothesis is true, you expect 1 false positive on average, and the chance of at least one false positive is about 64%. Use Bonferroni correction (divide α by number of tests) or false discovery rate methods.
- Confusing statistical and practical significance: With a large enough sample, trivial differences become statistically significant. Always report effect size (Cohen's d, odds ratio) alongside the p-value.
- Choosing one-tailed after seeing data: Switching from two-tailed to one-tailed post hoc to halve your p-value is a form of p-hacking. Pick your test type before running the analysis.
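The multiple-testing arithmetic behind the p-hacking warning can be checked with a short sketch. It assumes the tests are independent and that every null hypothesis is true. The function names and the example p-values are made up for illustration.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha

def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests

def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which p-values pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_alpha(len(p_values), alpha)
    return [p < threshold for p in p_values]

print(expected_false_positives(20))   # average count of false positives
print(familywise_error_rate(20))      # chance of one or more false positives
print(bonferroni_alpha(20))           # corrected per-test threshold

# Hypothetical p-values from four tests; the corrected threshold is 0.05 / 4.
print(bonferroni_significant([0.001, 0.02, 0.04, 0.30]))
```

In the hypothetical example, three of the four p-values are below 0.05, but only the first is below the corrected threshold of 0.0125.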
Quick Reference
| Alpha Level (α) | Interpretation | Critical Z (two-tailed) |
|---|---|---|
| 0.10 | Marginal evidence | ±1.645 |
| 0.05 | Standard significance | ±1.96 |
| 0.01 | Strong evidence | ±2.576 |
| 0.005 | Very strong evidence | ±2.807 |
| 0.001 | Highly significant | ±3.291 |
| 3 × 10⁻⁷ (5σ) | Particle physics discovery | 5.0 (one-tailed convention) |
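
The critical values in the table can be reproduced with Python's standard library. This is a sketch with illustrative function names. Note that the 5σ figure corresponds to a one-tailed probability, so it is computed separately from the two-tailed values.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def critical_z_two_tailed(alpha: float) -> float:
    """Positive critical value c such that P(|Z| > c) = alpha."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

def one_tailed_p(z: float) -> float:
    """Upper-tail probability P(Z > z) for the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for alpha in (0.10, 0.05, 0.01, 0.005, 0.001):
    print(f"alpha = {alpha}: +/-{critical_z_two_tailed(alpha):.3f}")

# One-tailed probability beyond 5 standard deviations, roughly 3 in 10 million.
print(one_tailed_p(5.0))
```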