Rent Estimator — Capstone Project
Lightweight ML for fair‑market rent • Python • Ridge regression • One‑hot encoding • CLI & web demo
Overview
A compact machine‑learning tool that estimates monthly rent from basic property attributes (zip code, bedrooms, bathrooms, unit square footage, price per square foot, and listed price). Built to be transparent, fast, and easy to run, it helps small landlords and affordable‑housing teams benchmark prices without costly, opaque platforms.
Key Features
Clean & Reproducible Data
- IQR trimming for Area, PPSq, and RentEstimate.
- Caps on bedroom (≤6) and bathroom (≤5) counts; nulls removed with simple rules.
- Cleaned dataset saved as cleaned.csv for consistent training and evaluation.
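The cleaning rules above can be sketched in pandas. This is a minimal sketch under stated assumptions: it uses the conventional 1.5 × IQR fence, it drops rows above the bedroom/bathroom caps rather than clipping them, and it reads "simple rules" for nulls as dropping incomplete rows. The function names `iqr_trim` and `clean` are hypothetical, not taken from clean_data.py.

```python
import pandas as pd

# Hypothetical column names, following the feature list in this write-up.
REQUIRED = ["Zipcode", "Bedroom", "Bathroom", "Area", "PPSq", "ListedPrice", "RentEstimate"]


def iqr_trim(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `col` lies within [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 assumed)."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[col].between(q1 - k * iqr, q3 + k * iqr)]


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, apply the count caps, then IQR-trim three columns."""
    out = df.dropna(subset=REQUIRED)  # assumed null rule: drop any incomplete row
    out = out[(out["Bedroom"] <= 6) & (out["Bathroom"] <= 5)]  # caps as filters (assumed)
    for col in ["Area", "PPSq", "RentEstimate"]:
        out = iqr_trim(out, col)  # each fence uses rows that survived earlier steps
    return out.reset_index(drop=True)
```

Trimming runs one column after another, so each fence is computed on the rows that survived the previous step; computing all fences on the same frame is an equally reasonable choice. A typical call would be `clean(pd.read_csv(path)).to_csv("cleaned.csv", index=False)`, where `path` points at the raw data.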
Simple, Trustworthy Model
- Ridge regression (α=10) for stability without sacrificing interpretability.
- One‑hot encoding for Zipcode to capture neighborhood effects.
- Clear command‑line interface; the web helper scales a multifamily listing's price by the unit‑to‑building square‑footage ratio.
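One way to wire one-hot Zipcode encoding and Ridge (α=10) together in scikit-learn is a `ColumnTransformer` feeding a `Ridge` estimator inside a `Pipeline`. This is a sketch, not train_data.py: `build_model`, `fit_model`, the column lists, and `handle_unknown="ignore"` (which maps an unseen zip code to an all-zero vector, so that prediction falls back on the numeric features and the intercept) are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column lists, named after the features described above.
NUMERIC = ["Bedroom", "Bathroom", "Area", "PPSq", "ListedPrice"]
CATEGORICAL = ["Zipcode"]
FEATURES = NUMERIC + CATEGORICAL


def build_model(alpha: float = 10.0) -> Pipeline:
    """One-hot encode Zipcode, pass numeric columns through, then fit Ridge."""
    transformer = ColumnTransformer(
        [
            # Unseen zip codes at prediction time become an all-zero vector.
            ("zip", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            ("num", "passthrough", NUMERIC),
        ]
    )
    return Pipeline([("encode", transformer), ("ridge", Ridge(alpha=alpha))])


def fit_model(df: pd.DataFrame, target: str = "RentEstimate") -> Pipeline:
    """Fit the pipeline on the feature columns and the rent target."""
    return build_model().fit(df[FEATURES], df[target])
```

Ridge's L2 penalty is sensitive to feature scale. The write-up does not say whether numeric columns are standardized, so this sketch passes them through unchanged; adding a `StandardScaler` to the numeric branch is a common alternative.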
Practical Performance
- Hold‑out metrics: R² ≈ 0.63, RMSE ≈ $451.
- Predictions for sample homes fall within a few percent of public benchmarks.
- Fast to run and easy to retrain with new data.
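The hold-out figures are standard regression metrics: R² = 1 − SS_res / SS_tot (the share of rent variance explained) and RMSE = √(mean((y − ŷ)²)), which is in dollars because the target is monthly rent. A minimal sketch of computing both on a random split follows; the 80/20 split, the fixed seed, and the helper name `evaluate_holdout` are assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_holdout(model, X, y, test_size: float = 0.2, seed: int = 42) -> dict:
    """Fit on a training split and report R² and RMSE on the held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return {
        "r2": float(r2_score(y_te, pred)),
        # RMSE is in the target's units (dollars of monthly rent here).
        "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
    }
```

Any scikit-learn estimator, such as a Ridge pipeline, can be passed as `model`. Scores from a single random split shift with the seed; k-fold cross-validation gives a steadier estimate if the dataset is small.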
Designed for Accessibility
- Lightweight dependencies and local execution.
- Transparent inputs/outputs to support human judgment.
- Clear README and modular scripts for cleaning, training, and inference.
Data & Methods
- Data cleaning: IQR filtering, caps, and null handling in clean_data.py.
- Modeling: train_data.py fits Ridge on encoded features; saves model & transformer.
- Inference: predict_rent_web.py applies unit‑level price scaling for multifamily inputs and runs the trained model.
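Saving the fitted transformer alongside the model matters because inference must reuse the exact zip-code columns learned during training. A sketch with joblib, assuming two separate files as described above; the file names and the helpers `save_artifacts` and `load_and_predict` are hypothetical.

```python
import joblib
import pandas as pd


def save_artifacts(model, transformer,
                   model_path: str = "model.joblib",
                   transformer_path: str = "transformer.joblib") -> None:
    """Persist the fitted Ridge model and the fitted feature transformer."""
    joblib.dump(model, model_path)
    joblib.dump(transformer, transformer_path)


def load_and_predict(features: pd.DataFrame,
                     model_path: str = "model.joblib",
                     transformer_path: str = "transformer.joblib"):
    """Reload both artifacts and predict rent for new rows."""
    model = joblib.load(model_path)
    transformer = joblib.load(transformer_path)
    # transform(), not fit_transform(): refitting would change the one-hot columns.
    return model.predict(transformer.transform(features))
```

joblib files are pickles, so they should only be loaded from trusted sources and with the same scikit-learn version used for training.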
Inputs & Features
- Core features: Zipcode (one‑hot), Bedroom, Bathroom, Area (unit sqft), PPSq, ListedPrice.
- CLI: prompts for property details. Web helper: computes PPSq and adjusts ListedPrice for multifamily listings.
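The multifamily adjustment can be sketched as plain arithmetic, assuming the helper prorates the building's listed price by floor area and derives PPSq from the adjusted price and the unit's square footage. The function names and the `building_sqft=None` convention for single-unit listings are hypothetical.

```python
from typing import Optional


def unit_listed_price(building_price: float, unit_sqft: float, building_sqft: float) -> float:
    """Prorate a multifamily building's listed price to one unit by floor-area share."""
    if unit_sqft <= 0 or building_sqft <= 0:
        raise ValueError("square footage must be positive")
    if unit_sqft > building_sqft:
        raise ValueError("unit square footage cannot exceed building square footage")
    return building_price * unit_sqft / building_sqft


def web_features(zipcode: str, bedrooms: int, bathrooms: float, unit_sqft: float,
                 listed_price: float, building_sqft: Optional[float] = None) -> dict:
    """Build one model-ready row; building_sqft is None for single-unit listings."""
    if unit_sqft <= 0:
        raise ValueError("unit square footage must be positive")
    if building_sqft is not None:
        listed_price = unit_listed_price(listed_price, unit_sqft, building_sqft)
    return {
        "Zipcode": zipcode,
        "Bedroom": bedrooms,
        "Bathroom": bathrooms,
        "Area": unit_sqft,
        "PPSq": listed_price / unit_sqft,  # price per square foot of the unit
        "ListedPrice": listed_price,
    }
```

For example, a 1,000 sqft unit in a 3,000 sqft building listed at $900,000 gets a ListedPrice of $300,000 and a PPSq of 300. Under pure floor-area proration, the unit's PPSq equals the building's price per square foot, so the adjustment mainly changes ListedPrice.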
Intended as a transparent guide for pricing decisions, not an appraisal. Results depend on data quality and market conditions.
How It Works
- Load & clean the dataset; export cleaned.csv.
- Encode & train using Ridge with one‑hot Zipcode; persist model & transformer.
- Predict via CLI or web helper, returning an estimated monthly rent.
- Iterate by refreshing data and re‑training as markets change.
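The CLI prediction step can be sketched as a prompt routine that parses and validates answers before they reach the model. The prompt wording, the helper name `prompt_features`, deriving PPSq from price and area instead of asking for it, and rejecting counts above the cleaning caps (6 bedrooms, 5 bathrooms) are all assumptions; injecting the `ask` function keeps the sketch testable without a terminal.

```python
from typing import Callable


def prompt_features(ask: Callable[[str], str] = input) -> dict:
    """Collect one property's details interactively and return a feature row."""
    zipcode = ask("Zip code: ").strip()  # kept as text so leading zeros survive
    bedrooms = int(ask("Bedrooms: "))
    bathrooms = float(ask("Bathrooms: "))
    area = float(ask("Unit square footage: "))
    listed_price = float(ask("Listed price ($): "))
    # Assumed guard: the model never saw homes beyond the cleaning caps.
    if not 0 <= bedrooms <= 6 or not 0 <= bathrooms <= 5:
        raise ValueError("bedroom/bathroom counts are outside the training range")
    if area <= 0:
        raise ValueError("square footage must be positive")
    return {
        "Zipcode": zipcode,
        "Bedroom": bedrooms,
        "Bathroom": bathrooms,
        "Area": area,
        "PPSq": listed_price / area,  # assumed: derived rather than prompted
        "ListedPrice": listed_price,
    }
```

Reading Zipcode as a string preserves leading zeros (e.g. 02139), which matters because it is one-hot encoded as a category. An interactive run would call `prompt_features()` with the default `input`, wrap the result as `pd.DataFrame([row])`, and pass that to the trained model's `predict`.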
Technologies & Design
- Python, pandas, scikit‑learn, joblib
- Modular scripts for cleaning, training, and prediction; emphasis on clarity and maintainability.
Ethics & Limitations
No personal data is used; the tool is positioned as a decision aid. Estimates depend on the coverage and quality of the source data and should be weighed alongside local knowledge.
Author: Joel J Gerard • projects.jjg.dev