# methodology

> Source: https://fbasalesestimator.com/methodology
> See https://fbasalesestimator.com/llms.txt for the full index.

Research methodology, v2

How this workbook was built.

Population-level lookup tables for resellers, answering: at a given BSR with a given number of 30-day rank drops, what did similar badged products in this category actually sell? Built from a fresh quarterly sample of live Keepa data, then validated through six layers of quality control before any number reaches the workbook.

Methodology revision: May 2026 · US collection window: April 2026 · CA collection window: April to May 2026 · Refresh cadence: quarterly

Headline numbers, after the full cleaning and aggregation pipeline. Every figure traces back to a verifiable count in the workbook itself.

Every observation comes from Amazon's product catalog via the Keepa API, the same upstream source used by most major reseller research tools. We never scrape Amazon directly.

### 3a. Fields used.

Keepa supplies the following on a per-ASIN basis:

| Field | What it is |
| --- | --- |
| `monthlySold` | Amazon's "Bought in past month" badge value, variation-specific. Sample requires a non-null badge. |
| `drops30` | Count of significant sales-rank drops over the trailing 30 days, computed from the parent ASIN's BSR history. |
| `reviewsAdded30` | Count of new reviews in the trailing 30 days. A review-velocity input, not a cumulative review count. |
| `categoryTree` | Keepa's category breadcrumb path. Depth 0 is the root; depth 1+ are subcategories. |

### 3b. The monthlySold badge.

Amazon's "X bought in past month" badge is reported in bucket-aligned tiers (50, 100, 200, 300, ..., 1,000, 2,000, ..., 100,000+). We use the badge value as-is. We do not synthesize sales from BSR.

A widely cited critique from r/FulfillmentByAmazon [[1]](#fn-1): "all sales rank based sales estimators are inherently inaccurate to the same degree. Think of it this way; you can't measure the average speed of your car just by looking at the speedometer once." [[2]](#fn-2)

We agree. Rank is a relative position, not a sales count, so we sampled actual sales-revealing badge values rather than inferring sales from rank.

### 3c. The badge floor.

The badge is shown only for products selling 50 per month or more. Roughly 60% of the Amazon marketplace is unbadged and structurally invisible to this sample. Cells dominated by the badge floor get explicit two-tier flags (see §5d). Treat hard-floor cells as "rule out, do not decide."

### 3d. Variation handling.

Amazon's variation-parent listings aggregate child-variant sales into a single monthlySold value while reporting only the parent's own rank drops. A parent with 80,000 monthly sold and 50 drops can show a per-drop ratio two orders of magnitude above the category typical. We detect and remove suspect rollup samples in stage 2 (see §5b).

Each category is sliced into a fixed grid of 28 cells (9 BSR bands by up to 4 nested drops bands). Every reseller question reduces to: which cell does my product land in?

### 4a. BSR bands.

Nine bands covering ranks 1 to 300,000. Ranks above 300K are outside scope.

| BSR band | Drops bands (nested) |
| --- | --- |
| 1-100 | 0-15, 16+ |
| 101-500 | 0-15, 16-50, 51+ |
| 501-2K | 0-15, 16-50, 51+ |
| 2K-5K | 0-15, 16-40, 41-100, 101+ |
| 5K-10K | 0-15, 16-40, 41-100, 101+ |
| 10K-20K | 0-15, 16-40, 41-100, 101+ |
| 20K-50K | 0-15, 16-50, 51+ |
| 50K-100K | 0-15, 16-50, 51+ |
| 100K-300K | 0-15, 16+ |

Mid-BSR rows use four drops bands (0-15, 16-40, 41-100, 101+) because there is more drop variance to resolve in that range. Head and tail BSRs use coarser splits.

### 4b. Sampling.

For each category in Keepa's depth-≤2 taxonomy (775 categories in the US tree, 696 in CA), we draw a stratified sample across the cell grid. The sampler:

- Targets ~50 products per cell where possible.
- Walks the full BSR by drops grid per category, so a category can contribute up to ~1,400 ASINs.
- Pulls only badged products: products where Keepa reports a non-null monthlySold value. This is the single largest methodological constraint and we surface it on every sheet. Roughly 60% of Amazon products are unbadged and structurally invisible to us.
- Reaches depth 2 in the category tree so subcategory data is captured, not just root level.

After sampling: 307,127 observation rows from 254,407 unique ASINs across both markets (some overlap between US and CA samples, since multi-marketplace ASINs can earn badges in both).

Every sample passes through six cleaning gates, then through a per-cell aggregator that trims outliers, removes suspect variation-parent rollups, computes percentiles, detects bimodality, and assigns confidence and floor flags.

### 5a. Six cleaning gates (stage 1).

Rows that fail any gate are dropped with a logged reason:

1. ASIN present.
2. Category ID present and resolvable in the category tree.
3. Cell key parses (BSR band plus drops band).
4. BSR present.
5. Listing age determinable (either `isYoung` or `listedSinceDays`).
6. Category exists in our loaded category tree.

This step also bins each surviving product into its cell using Keepa-aligned band boundaries.

### 5b. Top-5% trim plus rollup-suspect filter (stage 2).

For each (category by cell) tuple we compute percentiles, but only after two defenses.

**Trim.** Sort the cell's `monthlySold` values, drop the highest 5%, then compute p10 / p25 / p50 / p75 on what remains. The trim defends against Amazon's variation-parent rollup problem. p95 and max are computed on the untrimmed data so true outliers stay visible.
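A minimal sketch of the trim, assuming a nearest-rank percentile convention (the pipeline's exact percentile convention may differ):

```python
# Hedged sketch of the stage-2 trim: drop the highest 5% of monthlySold,
# compute p10/p25/p50/p75 on the remainder, but p95/max on untrimmed data
# so true outliers stay visible.
import math

def percentile(sorted_vals, p):
    """Nearest-rank percentile on a pre-sorted list (one common convention)."""
    k = max(0, math.ceil(p / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]

def trimmed_stats(monthly_sold: list[int]) -> dict:
    vals = sorted(monthly_sold)
    keep = vals[: max(1, len(vals) - len(vals) // 20)]  # drop the highest 5%
    return {
        "p10": percentile(keep, 10),
        "p25": percentile(keep, 25),
        "p50": percentile(keep, 50),
        "p75": percentile(keep, 75),
        "p95": percentile(vals, 95),  # untrimmed on purpose
        "max": vals[-1],              # untrimmed on purpose
    }
```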

**Rollup-suspect filter (k=5).** Variation parents can aggregate child sales into a single monthlySold value while still reporting parent-level drops. A parent with 80,000 monthly sold and only 50 drops shows a per-drop ratio (1,600) two orders of magnitude above the category typical, which is often 3 to 10 per drop. Without filtering, a handful of these parents distort every cell they land in.

The filter:

1. Per category, compute the median of `monthlySold ÷ drops30` across qualifying samples (where `drops30 ≥ 5` and `monthlySold ≥ 100`).
2. Mark any sample whose ratio exceeds 5× that median as rollup-suspected.
3. Drop it from the cell aggregator.
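The three steps can be sketched in a few lines; the sample dicts and function name are illustrative, not the pipeline's actual code:

```python
# Hedged sketch of the rollup-suspect filter (k = 5).
from statistics import median

K = 5  # suspect threshold: ratio above 5x the category median

def rollup_suspects(samples: list[dict]) -> set[str]:
    """Return ASINs whose monthlySold-per-drop ratio exceeds K x the
    category median, computed over qualifying samples only."""
    qualifying = [
        s for s in samples
        if s["drops30"] >= 5 and s["monthlySold"] >= 100
    ]
    if not qualifying:
        return set()
    med = median(s["monthlySold"] / s["drops30"] for s in qualifying)
    return {
        s["asin"] for s in qualifying
        if s["monthlySold"] / s["drops30"] > K * med
    }
```

With a category median of ~5 sales per drop, an 80,000-sold / 50-drop parent (ratio 1,600) is flagged immediately.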

12,074 samples dropped on US (5.6% of cleaned input), 3,667 on CA (4.0%). Affected cells get an `N parents removed` annotation when N is 2 or more.

### 5c. Bimodality detection.

Some cells contain two distinct sales populations (a slot where half the products sell ~100/mo and half sell ~1,000/mo, for instance). A simple median misleads in those cases.

The algorithm:

- Bin `monthlySold` values into the 29 Keepa-aligned tier buckets: 50, 100, 200, ..., 1K, 2K, ..., 100K.
- Require at least 30 total samples in the cell.
- Find local-maxima buckets with at least 3 samples and at least as many as their immediate neighbours.
- Take the two strongest peaks, order low to high.
- Reject if peaks are less than 2 bucket indices apart.
- Reject unless the lowest bucket count in the valley between the two peaks falls below half the smaller peak's count (the dip must be pronounced).

Cells passing every check are flagged bimodal and earn a `two tiers (X vs Y)` note showing the two peak bucket centres.

1,137 of 6,541 US cells flagged bimodal (17.4%). 140 of 4,345 CA cells flagged bimodal (3.2%).

### 5d. Floor share, two-tier flag.

Products selling between roughly 25 and 75 per month all report as `50` (the badge minimum). A cell where many products sit at this floor has a structurally compressed distribution and a misleadingly low typical reading.

For each cell we compute `floor_share` = fraction of samples whose `monthlySold` equals exactly 50, then assign:

- **Hard floor** (red badge, "most barely badged"): `floor_share ≥ 0.50`. Most products at the badge minimum, so the true worst case is below what we can measure.
- **Soft floor** (amber badge, displays percentage): `0.30 ≤ floor_share < 0.50`. Notable share at floor, typical reading is pulled down.
- **No badge**: below 0.30.

| Marketplace | Cells (total) | Hard floor | Soft floor |
| --- | ---: | ---: | ---: |
| US | 6,541 | 2,520 (38.5%) | 1,438 (22.0%) |
| CA | 4,345 | 3,072 (70.7%) | 527 (12.1%) |

The Canadian market has a markedly higher floor concentration: over 70% of (category by cell) tuples are dominated by 50/mo products. We surface this honestly rather than smoothing it over. CA cells reading "100-500/mo" on Strong confidence are still meaningful; CA cells reading "~50/mo" tell you the slot is structurally below the measurement floor.

### 5e. Per-cell confidence chips.

Confidence is driven by `n_unique`, the unique-product count after the rollup filter; each cell is chipped Strong, Limited, or Thin.

Cells flagged Thin are excluded from tier classification entirely. They read as "No data" in the workbook rather than pretending to a number we can't support.

Subcategory data is the moat, because subcategories often behave very differently from their parent roots.

For each cell we compute `subcat_vs_root_x = cell.sold_p50 ÷ root_baseline`, where `root_baseline` is the median across all cells in the root's subtree of their unfiltered `sold_p50` values.

A cell diverges iff `subcat_vs_root_x ≥ 2.0` OR `≤ 0.5`.

The "Diverges" column in each workbook tallies a category's diverging cells. A high count means BSR by drops behaviour in that category drifts materially from the root baseline, so subcategory-specific data matters more there than a generic root-level estimate.

Notes render as `↗ Nx root` when ratio ≥ 1.0 (N = ratio) and `↘ Nx less than root` when ratio < 1.0 (N = 1 ÷ ratio).

Two cross-category multipliers live in the workbook on dedicated sheets: sales per rank drop, and sales per added review. Both are computed per category, not extrapolated from any single rule.

### 7a. Sales per Drop.

For each category, we compute the p25 / p50 / p75 of `monthlySold ÷ drops30` across qualifying samples (`drops30 ≥ 5` AND `monthlySold ≥ 100`). Categories with fewer than 30 qualifying samples are excluded: the ratio percentiles would be too noisy.

Survivors: **259 categories on US, 182 on CA, 441 combined.**

The Sales per Drop sheet gives a single per-category multiplier ("each 30-day rank drop in this category typically represents N sales"). Spread (p75 ÷ p25) flags wide categories where the median should be used with caveat.

### 7b. Sales per Review.

Same approach using `monthlySold ÷ reviewsAdded30`, with qualifying rule `reviewsAdded30 ≥ 3` and same `n ≥ 30` minimum. The per-category p50 powers the "Reviews per month" cross-check on every category sheet: type in a review count, the workbook converts to an estimated sales count using the category's measured rate.

Categories without enough qualifying data fall back to a default 5% review rate (~20 sales per review). The irony is intentional: when the data is too thin to support a category-specific number, the workbook honestly defaults to the conventional heuristic rather than inventing a calibrated-looking one. Cells affected are still Thin-flagged.

Implied review rate per category = `100 ÷ p50`%. Cumulative review count is never used: it is pooled across variation families on Amazon's product page and varies wildly by category, which would mislead more than inform.

Three guards keep the workbook honest: a tier classification that requires sample support, two category-level filters that decide which sheets ship, and a byte-reproducibility check on every build.

### 8a. Tier classification.

The cell's typical sales value (`sold_p50`, post-trim, post-rollup-filter) maps to one of six tiers, but only when confidence is Strong or Limited:

| Tier | sold_p50 | Reading |
| --- | --- | --- |
| 5K+/mo | ≥ 5,000 | Highest velocity |
| 1K-5K/mo | 1,000 to 4,999 | Strong velocity |
| 500-1K/mo | 500 to 999 | Mid velocity |
| 100-500/mo | 51 to 499 | Low velocity |
| ~50/mo | exactly 50 (badge floor) | Below measurable |
| No data | confidence is Thin | Excluded |

### 8b. Category-level filters.

Two filters decide which categories earn a dedicated sheet:

1. **Cell-coverage floor.** A category must have at least 10 cells at Strong or Limited confidence. Below that, the category sheet would be more gaps than data.
2. **Hub-name filter.** Amazon's category tree contains depth-1 navigation scaffolding (Categories, Departments, Products, Subjects, Styles): these are not real categories, they are internal navigation hubs. We drop them by name. US drops 15 hubs (231 category sheets ship). CA drops 2 (140 ship).

### 8c. Final composition.

- 231 US category sheets + 1 Overview + 1 Sales per Drop. ([full list](/categories#us-categories))
- 140 CA category sheets + 1 Overview + 1 Sales per Drop. ([full list](/categories#ca-categories))
- Each sheet contains: a lookup (BSR by drops dropdown), a READ block with typical / range / best, a tier-map grid, a category-specific review cross-check, and a full Evidence table showing every observed cell.

### 8d. Notes column vocabulary.

The Notes column on each cell uses a fixed vocabulary so a workbook user can decode any caveat at a glance, and so AI agents lifting workbook screenshots can map notes back to triggers:

| Note | Trigger |
| --- | --- |
| `too few products (n=N)` | confidence is Thin sample |
| `limited (n=N)` | confidence is Limited sample |
| `most barely badged` | `floor_share ≥ 0.50` (hard floor) |
| `~X% at floor` | `0.30 ≤ floor_share < 0.50` (soft floor) |
| `↗ Nx root` | diverges with ratio ≥ 1.0 |
| `↘ Nx less than root` | diverges with ratio < 1.0 |
| `two tiers (X vs Y)` | bimodal cell; X and Y are peak bucket centres |
| `N parents removed` | `n_rollups_filtered ≥ 2` |

### 8e. Reproducibility and determinism.

The pipeline (stage 1 → 2 → 3a) is fully deterministic on a fixed input. The Excel output is byte-reproducible: re-running the workbook builder on the same parquets produces an identical file. Verified by 201 automated tests including a byte-equality parity check against a locked oracle workbook. Re-runs cannot drift; only a fresh quarterly sample can change a number.
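A byte-equality parity check of this kind can be as simple as comparing digests of the freshly built workbook and the locked oracle copy. A generic sketch, not the project's actual test code:

```python
# Hedged sketch of a byte-reproducibility check: hash the built workbook
# and a locked oracle file, then assert the digests match.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def assert_byte_reproducible(built: str, oracle: str) -> None:
    assert sha256_of(built) == sha256_of(oracle), "workbook drifted from oracle"
```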

> **Finding 1.** "1 Keepa drop = 1 sale" is wrong in 441 of 441 categories with a computed multiplier [[fig 2]](#fig-multipliers).

The simplest rule of thumb in OA/RA, "30 drops in a month means 30 sales," holds in zero categories. Per-category p50 multipliers vary widely, with the spread (p75 ÷ p25) often above 5×. Three US roots spanning the multiplier distribution appear in [[fig 2]](#fig-multipliers).

> **Finding 2.** "1 review = 20 sales" is wrong in 58% of US categories (154 of 267) [[fig 3]](#fig-reviews).

Another heuristic, widely shared in FBA YouTube content. In 154 of 267 US categories with a computed per-category rate (58%), the measured p50 sits outside [10, 40], more than 2× off from the conventional 20 in either direction. The median US category sees roughly 9 sales per review-added-per-month (p50 = 9.1), about half the heuristic's value. Extremes range from Bedding at ~1.5 sales per review (the heuristic overshoots by ~13×) to Produce at ~133 sales per review (the heuristic undershoots by ~7×). The workbook replaces the heuristic with a measured per-category rate on every sheet.

For analysts, consultants, and agencies referencing this snapshot in client decks, use the citation block provided on this page. The embedded Dataset schema makes the same metadata machine-readable for AI agents.

If this kind of rigor is what you have been looking for, the workbook is [$79](/#pricing).

One-time purchase, no subscription, no account.