10 Open Datasets for Linear Regression
- Author Limarc Ambalina
- Published April 30, 2020
Most data scientists will need to perform linear regression and predictive modeling at some point in their studies or career. For those looking to learn more about the topic or complete some sample assignments, this article introduces 10 open datasets for linear regression. Some of the datasets on this list also include regression tasks for you to complete with the data.
Linear Regression Datasets for Machine Learning
- Cancer Linear Regression
This dataset includes data taken from cancer.gov about deaths due to cancer in the United States. Along with the dataset, the author includes a full walkthrough on how they sourced and prepared the data, their exploratory analysis, model selection, diagnostics, and interpretation.
- CDC Data: Nutrition, Physical Activity, Obesity
From the Behavioral Risk Factor Surveillance System at the CDC, this dataset includes information about physical activity, weight, and average adult diet.
- Fish Market Dataset
Built for multiple linear regression and multivariate analysis, the Fish Market Dataset contains information about common fish species in market sales. The dataset includes the fish species, weight, length, height, and width.
- Medical Insurance Costs
This dataset was inspired by the book Machine Learning with R by Brett Lantz. The data contains medical information and costs billed by health insurance companies. It contains 1338 rows of data and the following columns: age, gender, BMI, children, smoker, region, insurance charges.
- New York Stock Exchange Dataset
Created as a resource for technical analysis, this dataset contains historical data from the New York stock market. The dataset comes in four CSV files: prices, prices-split-adjusted, securities, and fundamentals. Using this data, you can experiment with predictive modeling, rolling linear regression, and more.
- OLS Regression Challenge
The OLS regression challenge tasks you with predicting cancer mortality rates for US counties. The dataset contains data from cancer.gov, clinicaltrials.gov, and the American Community Survey. It is in CSV format and includes the following information about cancer in the US: death rates, reported cases, US county name, income per county, population, demographics, and more.
- Real Estate Price Prediction
This real estate dataset was built for regression analysis, linear regression, multiple regression, and prediction models. It includes the date of purchase, house age, location, distance to the nearest MRT station, and house price per unit area.
- Red Wine Quality
From the UCI Machine Learning Repository, this dataset can be used for regression modeling and classification tasks. The dataset includes information about the chemical properties of different types of wine and how they relate to overall quality.
- Vehicle Dataset from CarDekho
A useful dataset for price prediction, this vehicle dataset includes information about cars and motorcycles listed on CarDekho.com. The data is in a CSV file which includes the following columns: model, year, selling price, showroom price, kilometers driven, fuel type, seller type, transmission, and number of previous owners.
- WHO Statistics on Life Expectancy
This dataset contains information compiled by the World Health Organization and the United Nations to track factors that affect life expectancy. The data contains 2938 rows and 22 columns. The columns include: country, year, developing status, adult mortality, life expectancy, infant deaths, alcohol consumption per capita, the country’s expenditure on health, immunization coverage, BMI, deaths of children under five years old, deaths due to HIV/AIDS, GDP, population, body condition, income information, and education.
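To make the kind of task these datasets support more concrete, here is a minimal sketch of multiple linear regression fitted by ordinary least squares with NumPy. It does not load any of the datasets above. Instead it generates synthetic data whose column names (age, BMI, smoker) loosely mirror the Medical Insurance Costs columns. The coefficients used to generate the data are invented for illustration and say nothing about real insurance charges, and the helper names `fit_ols` and `r_squared` are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    Returns (intercept, coefficients).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted weight is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def r_squared(X, y, intercept, coefs):
    """Coefficient of determination of the fitted model on (X, y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    pred = intercept + X @ coefs
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (ASSUMPTION: made-up numbers, not a real dataset).
rng = np.random.default_rng(0)
n = 200
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30, 5, size=n)
smoker = rng.integers(0, 2, size=n)  # 0/1 encoding of a yes/no column
true_coefs = np.array([250.0, 300.0, 20000.0])  # illustrative only
noise = rng.normal(0, 1000, size=n)

X = np.column_stack([age, bmi, smoker])
charges = X @ true_coefs + noise

intercept, coefs = fit_ols(X, charges)
score = r_squared(X, charges, intercept, coefs)
print("intercept:", intercept)
print("coefficients (age, bmi, smoker):", coefs)
print("R^2:", score)
```

With a real CSV file, the same function could be applied after loading the data (for example with pandas) and converting categorical columns such as smoker or region to numeric indicator columns. The score here is computed on the training data itself, so a held-out test split would give a more honest picture of predictive performance.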