HubSpot lead scoring not working? 6 reasons it'll fail within six months

Published May 5, 2026

If you searched “HubSpot lead scoring not working,” there’s a decent chance one of three things just happened.

Your sales team stopped opening the alerts. Or your pipeline filled up with leads scoring 80+ that go nowhere. Or you ran a quick audit, found the scoring properties hadn’t been updated in eight months, and asked yourself why nobody noticed.

You are not alone, and the cause usually isn’t HubSpot itself. The cause is that lead scoring looks like a weekend project. It’s not.

We built kenbun after a decade in marketing automation, and we still got this wrong. We started kenbun thinking: scoring is just weighted attributes, so why hasn’t anyone nailed this? Months in, we finally got it. Scoring isn’t a math problem. It’s a context problem.

What we mean by that: the meaning of any given signal isn’t fixed. The same /pricing visit means something different coming from the CRO of a target account, a competitor doing research, or a job candidate prepping for an interview. A demo request from three days ago is a different signal from the same request six months later. The same fit rule that worked when you sold to mid-market SaaS becomes the wrong rule the day you move upmarket. The math doesn’t change. The context does. Any scoring approach that treats inputs as static integers and runs a formula on top is going to drift the moment that context shifts, which is most weeks.

The way native HubSpot scoring is set up almost guarantees that this drift compounds within six months.

Here are the six reasons it breaks, plus what to build instead.

Why HubSpot’s scoring works in week one and breaks by month six

Quick context check, since most teams reading this just lived through it. HubSpot retired the legacy HubSpot Score property on August 31, 2025; existing scores stopped updating that day, and the migration to the new Lead Scoring tool was manual. If you’ve been on HubSpot for more than a year, you either rebuilt your model under duress in mid-2025, or you’re still running on a half-migrated setup with rules that were ported badly. Either way, your current scoring is younger than your sales team’s expectations of it.

The new tool is more capable than the legacy property. It produces separate Engagement, Fit, and (on Enterprise) Combined scores instead of a single number, supports decay, and has a more flexible rule builder. It also inherits the same structural choices that made native HubSpot scoring fragile in the first place.

Three of those choices are worth naming up front:

Scores are flat integers per property. The new tool exposes Engagement, Fit, and Combined as separate properties, but each one is still a single number. There’s no per-rule breakdown sitting next to the score, no inline audit trail of which events contributed, and the threshold property is just a band label (High/Medium/Low or A1–C3) painted onto the same integer.
Rule weights are hand-tuned. Whoever set them up made a guess in 2025, and that guess is still running today.
Decay is coarse-grained. HubSpot’s lead scoring tool does support decay (you can configure a group of events to lose a fixed percentage over a fixed period), but the unit is the event group, not the event. Every signal in a group decays at the same flat rate, and you have to choose between decay and a hard timeframe filter for that group. There’s no concept of a half-life that lets different signal types age at different speeds.

Layer the realities of a growing B2B business on top of those three choices, and the model goes wrong in six specific ways.

A note on Professional Hub

If you’re on Marketing Hub Professional, the August 2025 migration didn’t just force you to rebuild your scoring. It took capabilities away. On Pro, the new tool:

Caps total scores at 100 points (the legacy property allowed up to 300).
Doesn’t support compound conditions (AND/OR logic) inside a single rule. You can’t write “less than 5 page visits AND more than 3 emails opened” as one criterion.
Doesn’t expose count-based engagement signals like number of page visits or marketing emails opened.
Restricts you to either an Engagement score or a Fit score per object. The Combined score is Enterprise-only.

Several of the failure modes below get sharper on Pro as a result. Where it matters, we’ll flag the specific Pro-tier gap.

1. Fit isn’t a checklist

Most HubSpot scoring models start with a fit rule that looks like this: “if Industry = SaaS AND Company Size = 50–500 AND Country = United States, add 25 points.”

That feels precise. It isn’t.

“Mid-market B2B SaaS in North America” sounds clean until you check your closed-won deals from the last two quarters and find that 40% of them are technically SMB. Or that a third of them came from companies whose HubSpot Industry field is “Marketing & Advertising” because that’s how their HubSpot admin set up the dropdown. Or that your fastest-growing segment, usage-based PLG companies, doesn’t fit the Industry taxonomy at all.

Static fit rules can’t handle the fuzzy reality of a market that’s drifting under you. Your ICP at the moment you wrote the rules and your ICP today are not the same. Every time you raise prices, move upmarket, launch a new product, or change positioning, the old fit rules start scoring the wrong leads.

Native HubSpot has no way to flag that drift. The scoring property keeps producing numbers. You keep trusting them. Sales keeps closing fewer of them. Nobody connects the two.

What better looks like: calibration tooling that compares fit rules against your closed-won data on demand, plus explicit segmentation so you know which slice of the market each rule covers. If your ICP changes, you should be able to tell which leads scored under the old definition versus the new one, not just see one number that absorbed both.
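Here is a minimal sketch of that kind of check: take the static fit rule from the example above and measure what share of closed-won deals it would have rewarded. The `Deal` fields, the function names, and the sample deals are all hypothetical; map them to whatever your own CRM export contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deal:
    # Hypothetical fields; rename to match your own closed-won export.
    industry: str
    employees: int
    country: str


def matches_fit_rule(deal: Deal) -> bool:
    """The static rule from the example: SaaS, 50-500 employees, United States."""
    return (
        deal.industry == "SaaS"
        and 50 <= deal.employees <= 500
        and deal.country == "United States"
    )


def fit_rule_coverage(closed_won: list[Deal]) -> float:
    """Share of closed-won deals that the fit rule would have rewarded."""
    if not closed_won:
        return 0.0
    return sum(matches_fit_rule(d) for d in closed_won) / len(closed_won)


# Made-up sample data, for illustration only.
closed_won = [
    Deal("SaaS", 120, "United States"),
    Deal("SaaS", 30, "United States"),                      # technically SMB
    Deal("Marketing & Advertising", 200, "United States"),  # industry dropdown mismatch
    Deal("SaaS", 400, "United States"),
]
coverage = fit_rule_coverage(closed_won)
print(f"Fit rule covers {coverage:.0%} of closed-won deals")
```

A low coverage number doesn’t say which part of the rule is wrong, only that the rule and your actual wins have drifted apart. Re-running the same check per clause (industry, size, country) is one way to narrow it down.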

2. Intent signals lie

This is the failure mode most teams underestimate.

The native HubSpot scoring guides assume that more activity = more intent. They tell you to give 5 points for a pricing page visit, 10 points for a demo request, 15 points for filling out a form. Stack enough of those and you have a “hot” lead.

It doesn’t work that way in practice. A lead hitting /pricing four times in a week might be a buyer running a vendor comparison. It might also be a competitor doing the same thing. A lead who reads one engineering documentation page and never returns might be your next $50K deal, or might be a curious developer with no purchasing authority.

Volume of activity is not buying intent. The relationship between the two depends on context: which page, the visitor’s role, the company’s stage, whether there’s an open opportunity, and how recent the activity is.

Native scoring can’t express that context. A pricing page view scores the same five points whether the visitor is the CRO at your dream account or a researcher at a competitor. Multiplied across thousands of leads, that flattening turns the score into noise.

What better looks like: intent rules that weight by event quality, not event count, and are honest about what they don’t know. A whitepaper download means hours of digestion and warrants weeks of warmth. A pricing page view means thirty seconds of curiosity and decays in days. Treating those the same is how good leads get buried under low-intent noise.

Pro Hub note: the new tool on Professional doesn’t expose count-based engagement properties at all (number of page views, number of marketing emails opened). You can score “viewed pricing page” as a binary yes/no, but you can’t score “viewed pricing page 4+ times in 7 days” without dropping into custom workflows. That makes the volume-vs-intent trap easier to fall into on Pro, not harder.
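To make the “4+ times in 7 days” condition concrete, here is a rough sketch of the logic a custom workflow or external job would have to implement. It is plain Python, not a HubSpot API; the function names, thresholds, and timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta


def count_recent(timestamps: list[datetime], now: datetime, window_days: int) -> int:
    """Count events that happened within the last `window_days` days before `now`."""
    cutoff = now - timedelta(days=window_days)
    return sum(1 for ts in timestamps if cutoff <= ts <= now)


def repeated_pricing_interest(
    pricing_views: list[datetime],
    now: datetime,
    min_views: int = 4,
    window_days: int = 7,
) -> bool:
    """True when the lead viewed pricing at least `min_views` times in the window."""
    return count_recent(pricing_views, now, window_days) >= min_views


# Made-up timestamps: four views in the last week, one from 40 days ago.
now = datetime(2026, 5, 5)
views = [now - timedelta(days=d) for d in (1, 2, 3, 6, 40)]
recent_views = count_recent(views, now, window_days=7)
is_repeat_visitor = repeated_pricing_interest(views, now)
```

Note that this only detects repeated activity. As the section argues, it still can’t tell a buyer from a competitor; that takes the fit and role context around the lead.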

3. The black-box trust gap

Even when the rules are right, there’s a second failure mode that catches almost everyone: your sales team doesn’t trust the score, because they can’t see why a lead got it.

Native HubSpot shows you a number. If you click into the property, you can see the value. You usually cannot see, in any glanceable way:

Which specific rules contributed
Which events on this lead triggered them
How recently each contribution happened
What the score would be if you removed any one signal

That gap matters because trust in scoring is operational, not philosophical. An SDR working a queue of 30 leads needs to make a decision in five seconds: call this one, or skip to the next? If the scoring system says “87” and the SDR has no idea what’s behind it, they fall back on the heuristics they trust: recency, company name recognition, whether the lead reminds them of a recent close. The score becomes decorative.

We’ve watched this happen at three companies. The score gets built. Sales nods politely in the launch meeting. Within a month, every rep has gone back to working the inbound list by hand, sorted by company size or just in chronological order. The score keeps incrementing in a property nobody reads.

This is the worst kind of failure because it is silent. The scoring still produces numbers. Dashboards still chart them. Nobody flags it as broken.

What better looks like: every score should be auditable down to the event. When a lead is an 87, you should be able to click in and see exactly which rules fired, which events triggered them, and what each one contributed. Without auditability there is no trust. Without trust there is no follow-up.
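As a sketch of what “auditable down to the event” can mean as a data structure: store the score as a list of contributions instead of a single integer, and derive the total from it. The `Contribution` type, the rule names, and the point values below are hypothetical; this is not how HubSpot or any specific tool stores scores.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contribution:
    rule: str       # which rule fired
    event: str      # which event on the lead triggered it
    occurred: date  # when it happened
    points: int     # what it added to the score


def total_score(contributions: list[Contribution]) -> int:
    """The headline number is just the sum of the visible contributions."""
    return sum(c.points for c in contributions)


def score_without(contributions: list[Contribution], rule: str) -> int:
    """What the score would be if one rule's contributions were removed."""
    return sum(c.points for c in contributions if c.rule != rule)


# Made-up breakdown for a lead that shows up as an 87.
breakdown = [
    Contribution("ICP fit", "company enriched", date(2026, 4, 2), 25),
    Contribution("Demo request", "form submitted", date(2026, 4, 28), 40),
    Contribution("Pricing page view", "/pricing visit", date(2026, 5, 1), 12),
    Contribution("Webinar attended", "live session joined", date(2026, 3, 15), 10),
]
score = total_score(breakdown)
score_minus_demo = score_without(breakdown, "Demo request")
```

With this shape, the four questions in the list above (which rules, which events, how recently, what happens if a signal is removed) are all simple lookups on data the rep can see.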

4. Decay is too coarse to be useful

There are two decay problems most teams hit, and HubSpot only solves one of them.

Problem 1: rule decay. When your business changes, your rules go stale. New product line, price increase, move upmarket, shift in target geography. Every one of those moves the goalposts. The scoring property keeps running on whichever rule set was last edited. If you launched a new product six weeks ago, the leads coming in for that product are still being graded by rules tuned for last quarter’s product mix. Most teams discover this only when their MQL → SQL conversion rate has been quietly dropping for two quarters, and the win rate on “high-fit” leads no longer matches the old benchmark. HubSpot’s account audit log records when an admin edits the rules, but there’s no built-in way to replay a lead’s score under a previous rule set or to correlate rule changes with downstream conversion movement.

Problem 2: event decay. Even if your rules are right, the events they reference get stale, and not at the same speed. A pricing page view from yesterday is a meaningful signal. A pricing page view from eight months ago, sitting on a contact who hasn’t been seen since, is just noise. A whitepaper download from six months ago is somewhere in between.

HubSpot’s modern lead scoring tool does support score decay: you can configure a group of events to lose a percentage of their score over a fixed period (e.g. “lose 25% after 30 days”). That’s a real improvement over the legacy single-property HubSpot Score. The remaining limitations:

Decay is set per event group, not per event. Every action in the group decays at the same flat rate.
Decay is a flat percentage step, not a curve. The score drops by the configured amount as soon as the configured time elapses, not smoothly over time.
You have to choose between decay or a hard timeframe filter for any given group. You can’t say “include this event for 6 months but decay it as it ages.” Pick one.

If you want different events to decay at different speeds (a whitepaper holding 80% of its weight after a month, a pricing page losing 90% in a week, a demo no-show losing nothing for the first 14 days then dropping fast), you can approximate it by splitting events across multiple groups, each with its own decay setting. That works, but the configuration cost grows quickly with the number of distinct activities you care about, and the underlying math is still stepwise rather than continuous.

This is the gap kenbun fills with per-event half-life decay. Every activity gets its own exponential decay curve, parameterized by a half-life that reflects how durable that specific signal is. Whitepaper downloads have a long half-life because reading a whitepaper is a real time investment. Pricing page views have a short half-life because thirty seconds of curiosity isn’t a durable signal of intent. The math is exponential rather than stepwise, so leads age out smoothly rather than dropping by a fixed percentage on day 30.
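To show the difference in math, here is a minimal sketch of half-life decay next to a percentage step. It uses the standard exponential-decay formula, points × 0.5^(age / half-life), and derives half-lives from the illustrative figures above (a whitepaper holding 80% after a month, a pricing view losing 90% in a week). The function names are ours for this example, the step function assumes the drop applies once, and none of this is kenbun’s or HubSpot’s actual implementation.

```python
import math


def half_life_from_retention(retained: float, after_days: float) -> float:
    """Half-life (in days) such that `retained` of the weight is left after `after_days`."""
    return after_days * math.log(0.5) / math.log(retained)


def decayed_points(points: float, age_days: float, half_life_days: float) -> float:
    """Exponential decay: the value halves every `half_life_days`."""
    return points * 0.5 ** (age_days / half_life_days)


def step_decayed_points(
    points: float, age_days: float, drop_fraction: float, after_days: float
) -> float:
    """Percentage step such as 'lose 25% after 30 days' (assumed to apply once)."""
    return points * (1 - drop_fraction) if age_days >= after_days else points


# Half-lives implied by the illustrative figures in the text.
whitepaper_half_life = half_life_from_retention(0.80, 30)  # roughly 93 days
pricing_half_life = half_life_from_retention(0.10, 7)      # roughly 2 days

# A 10-point signal, aged under each model.
whitepaper_after_month = decayed_points(10, 30, whitepaper_half_life)
pricing_after_week = decayed_points(10, 7, pricing_half_life)
step_on_day_29 = step_decayed_points(10, 29, 0.25, 30)
step_on_day_30 = step_decayed_points(10, 30, 0.25, 30)
```

The step version is unchanged on day 29 and 25% lower on day 30. The exponential version loses a little every day, so two signal types with very different shelf lives only need two different half-life values, not two separately configured groups.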

What better looks like: decay that’s per-event and curve-based, not group-wide and step-based, so signal types with very different shelf lives can coexist in the same scoring model. Plus rule-change audit logs, so you can see when a weight moved and why.
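For the rule-change side, a rough sketch of what such a log could look like: an append-only list of weight changes, plus a lookup for the weight that was in force on a given date. The `RuleChange` type, the rule name, and the entries are hypothetical. Being able to ask “what was this rule worth on date X” is the building block for replaying a score under an earlier rule set.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RuleChange:
    rule: str
    new_weight: int
    changed_on: date
    reason: str  # the "why", recorded at the time of the change


def weight_as_of(log: list[RuleChange], rule: str, on: date) -> Optional[int]:
    """Weight in force for `rule` on date `on`, or None if the rule did not exist yet."""
    applicable = [c for c in log if c.rule == rule and c.changed_on <= on]
    if not applicable:
        return None
    return max(applicable, key=lambda c: c.changed_on).new_weight


# Made-up log entries, for illustration only.
log = [
    RuleChange("pricing_page_view", 5, date(2025, 9, 1), "initial migration"),
    RuleChange("pricing_page_view", 3, date(2026, 2, 1), "too many competitor visits"),
]
weight_last_december = weight_as_of(log, "pricing_page_view", date(2025, 12, 1))
weight_this_march = weight_as_of(log, "pricing_page_view", date(2026, 3, 1))
```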

5. You can’t tell if your score is actually working

The most overlooked problem: HubSpot has no built-in conversion lift report.

Score validation, the practice of asking “do my A1 leads actually convert at higher rates than my C3 leads?”, is the single most important QA loop in any scoring system. When the answer is yes, you have a working model. When the answer is no, or when A1 and C3 are converging, you have a drift problem that’s about to cost you pipeline.

You can construct this analysis manually. Build a custom report on MQL → SQL conversion or closed-won rate, segment it by score band (or by score threshold), and compare the bands’ conversion rates. The math is straightforward. The catch is that almost nobody does it, because:

It’s a custom report, not a built-in dashboard or scheduled check.
It needs to be re-run regularly to be useful, since drift happens silently between runs.
Most HubSpot admins inherited the scoring rules from someone else and have no baseline conversion rate to compare against.
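The band comparison itself is only a few lines once the data is exported. Here is a minimal sketch, assuming you can pull one (band, converted) pair per lead from a report; the function names and the sample numbers are made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, Optional


def conversion_by_band(leads: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Map each score band to its conversion rate."""
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for band, converted in leads:
        totals[band] += 1
        wins[band] += int(converted)
    return {band: wins[band] / totals[band] for band in totals}


def lift(rates: dict[str, float], top_band: str, bottom_band: str) -> Optional[float]:
    """How many times better the top band converts than the bottom band.

    Returns None when the bottom band has no conversions, because the
    ratio is undefined in that case.
    """
    if rates[bottom_band] == 0:
        return None
    return rates[top_band] / rates[bottom_band]


# Made-up sample: 10 leads per band, 4 A1 conversions and 1 C3 conversion.
leads = (
    [("A1", True)] * 4 + [("A1", False)] * 6
    + [("C3", True)] * 1 + [("C3", False)] * 9
)
rates = conversion_by_band(leads)
a1_vs_c3 = lift(rates, "A1", "C3")
```

A lift well above 1 means the top band is earning its label. A lift that trends toward 1 over successive runs is the convergence described below. With samples this small the number is noisy, so treat it as a prompt to look closer, not as proof.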

The result: scoring models that quietly degrade. The A1 band starts converting at the same rate as C3. Sales notices first (they always do), but they don’t have the data to prove the score is broken, so the score just gets ignored. It keeps producing numbers. Reports keep charting them. Nobody flags it.

This is the gap kenbun’s calibration tooling fills. We compare scoring predictions against actual closed-won outcomes, so you can see exactly how predictive each rule is, where the score is over- or under-weighting fit signals, and which rules contribute lift versus which are dead weight. When your score stops working, you find out from data, not from a CRO meeting.

What better looks like: scoring that includes built-in conversion lift measurement, a regular check on whether higher scores actually convert at higher rates. Without that loop, you’re tuning a model with the dashboard turned off.

6. Inputs should compose, not stack

The last failure mode is the one most teams reach for when scoring stops working: add more inputs.

The story usually goes like this. The model is producing weak signals, so somebody adds Bombora intent data. Then 6sense. Then ZoomInfo enrichment. Then a website chat tool. Each new source brings more events into HubSpot, each event gets a few points, and the total scores climb. Six months later the scoring is even less useful than it was, because everyone is now showing up as “hot.”

Adding more inputs doesn’t make scoring smarter. It usually makes it noisier.

Scoring isn’t the sum of its inputs. It’s the weighted composition of four kinds of signal: who the person is, who their company is, what they’re doing right now, and whether there’s an open opportunity. Each is a distinct dimension. Adding more sources within one dimension inflates that dimension; it doesn’t tell you anything new about the other three.

A lead with a high engagement score and zero profile fit isn’t a hot lead. It’s a curious researcher. A lead with high profile fit and zero engagement isn’t a hot lead either. It’s an account you should be marketing to, not pitching.
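One way to picture “compose, not stack” is to keep the dimensions separate and decide the play from their combination instead of their sum. The sketch below is an assumption-heavy illustration: the `DimensionScores` type, the threshold of 50, and the play labels (including how an open deal is handled) are hypothetical, not a recommended configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    engagement: int      # what they're doing right now
    profile_fit: int     # who the person is
    account_fit: int     # who their company is
    has_open_deal: bool  # whether there's an open opportunity


def suggested_play(s: DimensionScores, threshold: int = 50) -> str:
    """Pick a play from the combination of dimensions, not from a summed score."""
    engaged = s.engagement >= threshold
    fits = s.profile_fit >= threshold and s.account_fit >= threshold
    if s.has_open_deal:
        return "route to deal owner"  # hypothetical handling of deal context
    if engaged and fits:
        return "sales outreach"
    if engaged:
        return "curious researcher, do not pitch"
    if fits:
        return "market to, do not pitch"
    return "no action"


# Made-up leads: each would look "warm" if the dimensions were simply added up.
researcher = suggested_play(DimensionScores(90, 0, 0, False))
target_account = suggested_play(DimensionScores(0, 80, 85, False))
hot_lead = suggested_play(DimensionScores(85, 80, 85, False))
```

Adding a new data source changes one field of `DimensionScores`. It can raise engagement, for example, but it can’t turn a researcher into a fit.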

HubSpot’s modern lead scoring tool does separate Engagement from Fit at the contact level, which is a real improvement over the legacy single HubSpot Score property. Two dimensions still aren’t enough, though. Profile fit (the person) and account fit (the company) are different questions and behave differently as a market shifts; HubSpot’s Fit score collapses them. Deal context, the question of which of these leads have an open opportunity attached, lives in a separate property entirely if you’ve configured it. To get a four-dimension picture, you have to assemble it yourself across multiple objects.

Pro Hub note: if you’re on Marketing Hub Professional, the gap is structural, not just operational. HubSpot restricts Pro accounts to one Engagement score or one Fit score per object. The Combined score, the only built-in way to merge fit and engagement into a single composite, is Enterprise-only. So if you want a working multi-dimensional model on Pro, you’re either upgrading to Enterprise, building it yourself in workflows, or replacing the scoring engine.

What better looks like: scoring that exposes the underlying dimensions instead of hiding them. kenbun uses a four-dimension model (engagement, profile fit, account fit, deal context) and shows the score per dimension as well as the composite. You can sort or segment by the dimension that matters for the play.

So what does “lead scoring that actually works” look like?

Strip the six failure modes back, and you get five design rules.

Score across dimensions, not as a single number. Engagement, profile fit, account fit, and deal context are four different questions. Don’t average them away. Show each one.

Make every score auditable. A rep should be able to click any score and see which rules fired, which events triggered them, and how recently. If you can’t show why, sales won’t act on it.

Decay events on their own clocks, not in lockstep. A whitepaper download warrants weeks of warmth. A pricing page view warrants days. HubSpot’s percentage-step decay is a fix for the legacy “events live forever” problem; the next step is per-event half-lives so signals with very different durability can coexist in one scoring model.

Audit-log rule changes. Track when weights moved and why, so you can correlate scoring changes with movement in your conversion rates.

Measure conversion lift. Run a regular check that compares actual conversion rates by score band. If your A1 band isn’t outperforming your C3 band, the score isn’t predictive and the rest doesn’t matter.

These aren’t novel. They’re how scoring works inside the better-resourced enterprise stacks. The reason most B2B teams don’t have them is that retrofitting them onto native HubSpot scoring takes weeks of workflow surgery, and the result is fragile. Native HubSpot scoring wasn’t designed for any of this.

How to fix HubSpot lead scoring without rebuilding from scratch

You have three paths, in roughly increasing order of effort.

Path 1: audit and prune. Pull a list of every contact above your hot-lead threshold and look at how many have actually engaged in the last 30 days. If more than a third are stale, you don’t have a scoring problem; you have a decay problem. Add a workflow that subtracts points for inactivity over time, and accept that the result will still be approximate.

Path 2: rebuild on HubSpot Operations Hub. With custom-coded actions and the Properties API, you can persist a per-dimension score and a rule-trigger log. This works, and we know teams who run sophisticated models on it. The cost is engineering time, both to build it and to keep it healthy as HubSpot evolves and as your ICP shifts.

Path 3: drop in a layer that does this natively. This is what we built kenbun to be. Connect in minutes; kenbun reads your HubSpot data and writes scores back as properties on contacts and deals, scores across the four dimensions, ships per-event decay out of the box, and exposes every score as an auditable breakdown. You keep HubSpot as the system of record. You stop fighting it as the scoring engine.
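The Path 1 audit is simple enough to run on an export. Here is a minimal sketch, assuming you have the last-activity date for every contact above your hot-lead threshold; the function name and the sample dates are made up.

```python
from datetime import date, timedelta


def stale_share(last_activity: list[date], today: date, window_days: int = 30) -> float:
    """Share of hot leads with no engagement in the last `window_days` days."""
    if not last_activity:
        return 0.0
    cutoff = today - timedelta(days=window_days)
    stale = sum(1 for d in last_activity if d < cutoff)
    return stale / len(last_activity)


# Made-up last-activity dates for six contacts above the hot-lead threshold.
today = date(2026, 5, 5)
hot_leads_last_activity = [
    date(2026, 5, 1),
    date(2026, 4, 20),
    date(2026, 4, 10),
    date(2026, 2, 14),   # stale
    date(2025, 11, 3),   # stale
    date(2025, 9, 22),   # stale
]
share = stale_share(hot_leads_last_activity, today)
has_decay_problem = share > 1 / 3  # the rule of thumb from Path 1
```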

If your scoring already isn’t working, the worst path is the one most teams take by default: a few more rules, a few more workflows, hope it gets better. It usually doesn’t. The structure that produced the failure is still there.

TL;DR

If your HubSpot lead scoring isn’t working, it’s almost certainly one of these:

Your fit rules describe last year’s ICP.
Your intent rules confuse activity volume with buying intent.
Your sales team can’t see why a score is what it is, so they ignore it.
Your decay is too coarse (group-wide percentage steps instead of per-event half-lives), so stale signals keep inflating scores.
You can’t tell if your score is actually working, because there’s no built-in conversion lift report.
You’ve stacked more inputs into a single number instead of separating dimensions.

The fix is structural: score across dimensions, audit every score, decay events on their own clocks, audit-log rule changes, and measure conversion lift continuously. The structural change is harder than another workflow tweak, but it’s the only thing that keeps scoring useful past month six.

If you’d rather not build that yourself: kenbun ships these defaults out of the box on top of HubSpot. Book a 30-minute call and we’ll run the structural fix on your actual HubSpot data, with the per-dimension breakdown, decay curves, and audit trail visible the same day.