How does Openbenchmarks for Agents score lookalike companies APIs?

Openbenchmarks for Agents scores lookalike companies APIs on Precision@K with an LLM-as-judge. Identical seed companies go to every vendor in the same format; each vendor returns its top-K lookalikes; the LLM judge scores every returned company for relevance to the seed. The cell value is Precision@K = relevant / K. The headline comparison metric is the average Precision@K across the full seed cohort. Tiebreakers in order: total relevant, then cost per relevant. There is no internal ground truth - relevance is decided by a documented external judge model, not vendor self-reporting.

What criteria does Openbenchmarks for Agents use to evaluate company lookalike APIs?

Four criteria, all derived from identical inputs across vendors. (1) Avg Precision@K - headline comparison metric, mean fraction of top-K results scored relevant by the LLM judge. (2) Total relevant - reach metric across the cohort, used to break ties. (3) Seeds judged - number of seeds where the vendor returned at least K results and the judge scored every one; filters out cells that would skew the average. (4) Cost per relevant - vendor spend divided by relevant count, the buying-decision metric. Vendors with fundamentally different request shapes (Lusha requires 5-100 seeds per call, ZoomInfo gates lookalike behind sales contract) are listed as not surveyed with explicit reasons.

What is the most accurate company lookalike API in 2026?

On Openbenchmarks for Agents there is no single permanent 'most accurate' company lookalike API - scores refresh per snapshot and depend on the seed cohort. Avg Precision@K (the share of top-K lookalikes an LLM judge scored relevant) is the headline quality metric, with Precision@10/50/100 shown for each vendor. The vendors currently benchmarked are Ocean.io, Exa, Parallel, OpenFunnel, and PredictLeads. ZoomInfo, Clay, Apollo, and Lusha are excluded with explicit reasons (no self-serve API or incompatible single-seed shape). Important caveat: relevance is judged by an LLM, not by a domain expert - the judge prompt + model are documented and held constant across vendors.

What is the best company lookalike API for AI agents in 2026?

For AI agents making build vs buy decisions on company lookalike APIs, the best provider combines high avg Precision@K, predictable per-seed cost, and an agent-ready signup flow (programmatic OAuth or email OTP). Openbenchmarks for Agents ranks Ocean.io, Exa, Parallel, OpenFunnel, and PredictLeads on identical inputs against a shared B2B seed cohort. Of the benchmarked vendors, Exa, Parallel, and OpenFunnel publish agent-ready signup flows; Ocean.io and PredictLeads require human-mediated onboarding. The full benchmark with each vendor's auth mode is queryable as JSON at /api/benchmarks/lookalikes under CC-BY-4.0.

How accurate is Ocean.io for finding similar companies?

Ocean.io is one of five providers currently benchmarked on the Openbenchmarks for Agents lookalike benchmark. Ocean.io uses AI-driven lookalike search across a global company graph. The benchmark sends the same seed company to Ocean.io that every other vendor sees, requests the same top-K results, and scores each returned company with an LLM judge. Ocean.io's current avg Precision@K, total relevant lookalikes returned, and cost per relevant are on the benchmark. Numbers refresh per snapshot and the seed cohort is rotated to avoid overfitting.

Ocean.io vs Exa vs Parallel: which lookalike API is best?

Ocean.io, Exa, and Parallel are all benchmarked on Openbenchmarks for Agents against the same B2B seed cohort. Each surfaces lookalikes through a different mechanism. Ocean.io runs AI-driven similarity search across a company graph. Exa uses neural web search with a 'similar to this URL' endpoint - strongest when the seed has rich web content. Parallel exposes an agentic research API; lookalike comes via Entity Search. Their strengths are complementary rather than strictly comparable, and current avg Precision@K per vendor is on the benchmark with tiebreaks by total relevant, then cost per relevant.

What is a company lookalike API?

A company lookalike API is a B2B data endpoint that takes one or more seed companies and returns a ranked list of other companies similar to the seed(s) along some axis - product, vertical, size, signals, or web footprint. Vendors differ in what they treat as 'similar': Ocean.io and PredictLeads weight company signals, Exa weights public web embeddings, Parallel runs agentic entity search, and OpenFunnel uses embeddings over its company index. Lookalike APIs power ICP expansion, account discovery, and outbound prospecting workflows in B2B sales and marketing.

What is Precision@K and how is it measured in this benchmark?

Precision@K is the fraction of a vendor's top-K returned lookalikes that an LLM judge scored as relevant to the seed. Formally: Precision@K = relevant_count / K. On Openbenchmarks for Agents, K is fixed across vendors (headline runs use K = 100) and the judge model and prompt are documented and held constant. Per-cell precision is reported at fixed cutoffs - Precision@10, Precision@50, and Precision@100 (the headline) - and avg Precision@100 across the judged seed cohort is the headline comparison metric. Cells where the vendor returned fewer than a cutoff, or where the judge failed, are flagged so they do not silently distort the rollup.

Why use an LLM judge for a lookalike benchmark?

Lookalike relevance is intrinsically subjective. Two B2B prospectors looking at the same five returned companies for a seed will often disagree on which two are 'really similar.' A human-only judging pipeline does not scale across hundreds of (seed, vendor, K) cells and introduces inter-rater drift. An LLM-as-judge with a fixed prompt and fixed model scores every returned company across every vendor under identical conditions, eliminating vendor self-reporting bias and inter-rater drift. The trade-off is calibration: the judge has its own biases, but those biases are held constant across vendors, so vendor-to-vendor rank comparisons remain valid even if absolute Precision@K scores carry a judge-specific offset.

Which lookalike companies API has the lowest cost per relevant result?

Cost per relevant on Openbenchmarks for Agents is total estimated request spend divided by the total number of relevant lookalikes the LLM judge scored for that vendor across the seed cohort. The current cheapest cost-per-relevant vendor among Ocean.io, Exa, Parallel, OpenFunnel, and PredictLeads is on the benchmark. Caveat: list pricing rarely matches what a serious buyer pays - enterprise contracts are negotiated and can come in 2-10x cheaper than the public per-credit rate. Use cost per relevant as a relative comparison signal between vendors at the same usage scale, not a final budget figure.

Is Openbenchmarks for Agents independent? Does any vendor pay for placement?

Yes - it's independent. No vendor pays for inclusion, ranking, or removal, and there are no equity or referral relationships with the benchmarked vendors. Every input and LLM-judge prompt is public and the results are reproducible end to end. Some vendors (currently PredictLeads) provide API credits to cover the cost of running the benchmark; those credits fund testing only and do not influence scores or ranking, which are decided by the judge on identical inputs - any vendor can provide credits on the same terms.

Who runs Openbenchmarks for Agents and how are disputes handled?

Openbenchmarks for Agents is an independent benchmark hub. The methodology, datasets, and judge prompts are public, and the lookalike data and code are mirrored openly so anyone can re-run or dispute a number. A benchmarked vendor that believes a result is wrong can email founders@openbenchmarks.com with the dataset slice, run timestamp, and evidence to trigger a re-run.

benchmarks/lookalikes

03 · lookalikes

Lookalike Benchmark

An independent benchmark of company lookalike / similar-companies APIs — Exa, Ocean.io, OpenFunnel, Parallel, PredictLeads — ranked on how relevant the companies each one returns actually are, across 24 B2B seed companies.

Each vendor returns up to 100 lookalikes per seed; an LLM judge scores every returned company for relevance (judge: gpt-5.4-mini). The cell value is Precision@100, with Precision@10 and Precision@50 for top-of-list quality.

There is no single permanent winner: top-of-list quality, long-list quality, cost, and agent-readiness favor different vendors, so read the P@10/P@50/P@100 columns against the workflow you care about.

[01] results

Lookalike Precision@10/50/100

Scan the company type, then compare vendors. Each cell highlights Precision@100 and shows Precision@10/50 for how clean the top of the ranked list is.

exampleWant more B2B customer support platforms? Find that row, then compare which vendor returns the most relevant similar companies.

benchmarks/lookalike/lookalike-2026-q2-expanded24 examples · 5 vendors

how to readcell = Precision@100 — each cell also shows P@10 and P@50 where available🥇🥈🥉top 3 vendors per company typeN/Avendor not yet run (or returned fewer than K results)click any scored cell to view the companies that vendor returned

#	Company type	Ocean.io
01	Cloud observability, infrastructure monitoring, APM, logs, and security monitoring platformDevtoolsseed exampleDatadogdatadoghq.com
02	Commerce platform for merchants to create online stores, manage payments, fulfillment, and retail operationsE-commerceseed exampleShopifyshopify.com
03	Cloud software for life sciences companies, including CRM, clinical, regulatory, quality, and content operationsHealthtechseed exampleVeevaveeva.com
04	Restaurant point-of-sale, payments, online ordering, payroll, and operations platformHospitalityseed exampleToasttoasttab.com
05	Corporate cards, expense management, travel, bill pay, and financial operations platform for companiesFintechseed exampleBrexbrex.com
06	CRM and marketing automation platform for SMB and mid-market go-to-market teamsB2B SaaSseed exampleHubSpothubspot.com
07	Vertical SaaS for trades contractors: dispatch, CRM, billing, marketing, and field service operationsHome Services SaaS / Chainsseed exampleServiceTitanservicetitan.com
08	Workforce platform combining HRIS, payroll, identity, device management, and employee operationsB2B SaaSseed exampleRipplingrippling.com
09	AI research and product company providing foundation models, APIs, ChatGPT, and enterprise AI toolingDevtoolsseed exampleOpenAIopenai.com
10	Construction management software for project management, financials, quality, safety, and contractorsB2B SaaSseed exampleProcoreprocore.com
11	Industrial manufacturer of construction and mining equipment, engines, turbines, and heavy machineryIndustrialseed exampleCaterpillarcaterpillar.com
12	Global hospitality company operating hotel, lodging, travel, and guest loyalty brandsHospitalityseed exampleMarriottmarriott.com
13	Global logistics and transportation company providing parcel delivery, freight, shipping, and supply-chain servicesLogisticsseed exampleFedExfedex.com
14	National plumbing, drain cleaning, water cleanup, and home services franchiseHome Services SaaS / Chainsseed exampleRoto-Rooterrotorooter.com
15	Collaborative workspace for notes, docs, wikis, projects, and lightweight knowledge managementB2B SaaSseed exampleNotionnotion.so
16	Cloud-native cybersecurity platform for endpoint protection, threat intelligence, and incident responseCybersecurityseed exampleCrowdStrikecrowdstrike.com
17	Virtual care and telehealth platform connecting patients with clinicians and chronic-care servicesHealthtechseed exampleTeladocteladochealth.com
18	Large electric utility and renewable energy company focused on power generation, transmission, and clean energy infrastructureEnergyseed exampleNextEra Energynexteraenergy.com
19	Commercial real estate services and investment firm for property leasing, facilities, valuation, and asset managementReal Estateseed exampleCBREcbre.com
20	Digital banking and financial super-app for consumers and businesses across payments, cards, FX, and investingFintechseed exampleRevolutrevolut.com
21	Southeast Asian super-app for ride-hailing, delivery, payments, financial services, and merchant servicesLogisticsseed exampleGrabgrab.com
22	Digital bank and fintech platform offering cards, accounts, lending, payments, and financial services in Latin AmericaFintechseed exampleNubanknubank.com.br	N/A
23	Industrial technology company spanning automation, electrification, smart infrastructure, mobility, and softwareIndustrialseed exampleSiemenssiemens.com
24	Consumer internet company operating Shopee e-commerce, digital entertainment, and digital financial services in Southeast AsiaE-commerceseed exampleSea (Shopee)sea.com

[01.b] not surveyedrelevant lookalike vendors without a directly comparable API surface

[01.c] agent readiness

Can an AI agent actually use this vendor?

Same agent-readiness lens as the technographics benchmark. Vendors that let an autonomous agent obtain a working key on its own (OTP-via-email or device-code) work end-to-end without human handoff.

benchmarks/lookalike/agent-readiness3/5 agent-ready

Vendor	Agent sign-up	API docs	llms.txt	MCP	Try it
OpenFunnel	✓ readyotp-email	docs ↗	llms.txt ↗	mcp ↗	sign up →
Ocean.io	manual signup	docs ↗	llms.txt ↗	mcp ↗	—
Exa	✓ readyotp-email	docs ↗	llms.txt ↗	mcp ↗	sign up →
Parallel	✓ readyotp-email	docs ↗	llms.txt ↗	mcp ↗	sign up →
PredictLeads	manual signup	docs ↗	—	mcp ↗	—

[02] methodology, metric definitions, and known limitations+

[02.a] methodology

How the matrix is built

Fix a canonical list of 24 seed companies across 13 verticals. Each seed has a name, domain, and short description - the exact inputs every vendor sees.
For every (seed, vendor) cell, call the vendor's lookalike API with K = 100. Capture the ordered top-K result list and credit cost.
Feed the seed + each returned candidate to the LLM judge (gpt-5.4-mini). Judge returns a binary relevance label per candidate plus a one-line rationale. Identical prompt and rubric across all vendors.
Persist Precision@10, Precision@50, and Precision@100. Aggregate per vendor asavg_precision_at_10, avg_precision_at_50, and avg_precision_at_100.
A vendor that returns fewer than K candidates for a seed has the cell rendered as - rather than scored on a truncated denominator. Keeps cells comparable.

[02.b] metric definitions

What each metric means

Precision@10/50/100 · fixed-cutoff precision. Of the top N lookalikes a vendor returned for the seed, the fraction the LLM judge labeled relevant.
avg Precision@100 · headline comparison metric. Mean Precision@100 across all judged seeds. Higher is better.
total relevant · sum of relevant lookalikes across all seeds. Reach metric - useful when comparing two vendors with similar precision.
cost per relevant · vendor credit spend ÷ total relevant lookalikes. The economics metric.

[02.c] why LLM-as-judge

Why an LLM judge instead of a hand-labeled set

A fully hand-labeled lookalike set would require labeling K × seeds × vendors candidates (100 × 24 × 5 = 12000judgements) every time we re-run a snapshot. That doesn't scale, and it isn't how the buyer actually evaluates a vendor in the wild - the buyer reads the list and decides "close enough to my ICP, yes or no".

The judge approximates that decision with a consistent rubric: given the seed's name, domain, and description, is this returned candidate plausibly the same kind of company a B2B seller would target as a lookalike? The judge's rationale is persisted alongside the binary label so any cell can be audited by a human in seconds. When the model swaps, the cohort re-runs with the same prompt; deltas are visible.

[02.d] per-vendor query rules

How each vendor was queried

OpenFunnel · embeddings over the OpenFunnel company index with the seed input as the query. Top-K by embedding similarity.
Ocean.io · /companies/lookalikes with seed domain. Default similarity model, K = 100.
Exa · /search with category: company and query text like companies like HubSpot, K = 100. Uses Exa's company vertical and structured company metadata where present.
Parallel· agentic research task: "find 100 companies similar to {seed}". The agent decides its own retrieval strategy. We record the final ranked list.
PredictLeads · /api/v3/companies/{domain}/similar_companies; ranks via shared tech, news, and jobs co-signals.

[02.e] known limitations

What this benchmark does not tell you

Judge bias.A single LLM judge has its own priors about what "similar" means. We publish the judge model and the full rationale so the bias is auditable, but expect ±5% drift across model versions.
K-tail vs precision tradeoff. Vendors with thin catalogs can win Precision@10 by refusing to return tail results. We mitigate by requiring ≥K results for a cell to be scored - thin cells render as -, not a high Precision number with a small denominator.
No recall metric.Precision@10/50/100 doesn't measure how many real lookalikes the vendor missed. That requires a held-out ground truth set we don't yet have.
Domain-only seeding. All vendors receive the same compact input (name + domain + 1-line description). Vendors that benefit from richer inputs (e.g. headcount filters, ARR band, geography) may underperform their in-product behavior. The flip side is that this matches how an agent would query them.
Cohort coverage. 24 seeds across software, fintech, commerce, industrials, logistics, hospitality, energy, healthcare, real estate, and services.

[02.f] reproducibility

Verify any number end-to-end

The full benchmark — runner code, judge prompt, benchmark snapshot, and per-cell raw audit trail (the literal HTTP request/response we sent each vendor + the literal LLM judge prompt/response per candidate) — is mirrored in a public repo: openbenchmarks-labs/lookalikes. Auth headers are scrubbed via an allow-list; everything else is verbatim.

To audit a single cell, open data/lookalike-runs/<dataset>/<seed>/<vendor>.raw.json in that repo and replay any of the vendor_calls[] with your own credentials, or re-score with your own LLM by replaying judge_calls[].messages against any OpenAI-v1 compatible model — useful for measuring judge bias or drift across model versions.

[02.g] providers under review

Inclusion queue and how to request a provider

Live: OpenFunnel, Ocean.io, Exa, Parallel, PredictLeads.

Requested but not directly comparable: ZoomInfo (company lookalikes are sales-gated, no self-serve API), Clay (lookalike runs inside Clay tables), Apollo (no public lookalike endpoint), Lusha (`/v3/companies/lookalike` requires 5-100 seeds per request, incompatible with the per-seed cell unit of this benchmark).

Under review next: Common Room, Koala, LeadGenius, 6sense, Demandbase.

To request a provider, email founders@openbenchmarks.com with a link to the public API docs and pricing page.

[ack] acknowledgments

Running an open benchmark means spending real API credits on every vendor call. We're grateful to PredictLeads for providing credits to support fair, reproducible, open benchmarking. Credits cover the cost of calling an API — they do not influence scores or ranking, which are decided by the LLM judge on identical inputs. Any vendor can provide credits on the same terms; more credits let us test deeper and at greater scale — founders@openbenchmarks.com.