跪拜 Guibai

A Chinese Business Data Site Has Zero Anti-Scraping — Here's the Full Batch Extraction Code

By 倔强的石头_ · Originally published on juejin.cn

The post offers a rare look at how exposed some Chinese business data platforms still are. For Western developers doing market research, lead generation, or data cleaning on Chinese companies, it is a low-friction entry point, but one that could vanish overnight. It also illustrates a broader trend: as Chinese data regulation tightens, open windows like this one are closing fast.

Summary

Jinghai Data (kqdaas.com) is a Chinese business information lookup site that, as of this writing, has essentially no anti-scraping protections. A developer found that a plain requests.get call returns complete HTML containing company registration data, with no CAPTCHA, redirect, IP block, or rate-limit warning.
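
The check described above can be sketched roughly as follows. This is a minimal illustration, not the developer's code: the `ProbeResult` type, the `looks_unblocked` helper, and the block-page marker strings are all assumptions, since the post does not say what a blocked response from the site would look like.

```python
from dataclasses import dataclass

# Hypothetical markers: the source does not describe what a block page contains.
BLOCK_MARKERS = ("captcha", "验证码", "access denied")


@dataclass
class ProbeResult:
    status_code: int
    redirected: bool
    body: str


def looks_unblocked(result: ProbeResult) -> bool:
    """Return True when the response shows none of the usual anti-bot signals."""
    if result.status_code != 200 or result.redirected:
        return False
    lowered = result.body.lower()
    return not any(marker in lowered for marker in BLOCK_MARKERS)


def probe(url: str, timeout: float = 10.0) -> ProbeResult:
    """Fetch one page with requests.get and record the signals worth checking."""
    # Third-party dependency, imported lazily so the pure check above
    # can be used and tested without it.
    import requests

    response = requests.get(url, timeout=timeout)
    return ProbeResult(
        status_code=response.status_code,
        redirected=bool(response.history),  # non-empty history means a redirect happened
        body=response.text,
    )
```

Calling `looks_unblocked(probe(url))` on a few pages at intervals gives a cheap signal for when the site's behaviour changes.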

The site exposes fields like company name, unified social credit code, legal representative, registered capital, establishment date, address, operating status, industry, and business scope. The developer published a Python scraper that uses requests + BeautifulSoup to batch search by keywords (e.g., "tech", "information", "data") and outputs results to CSV. The scraper handles the site's Next.js Server Action POST endpoint, parsing RSC stream responses to extract records.
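
The CSV output step might look something like the sketch below. The English column names are illustrative keys chosen for the nine fields listed above; the original scraper's actual column names are not given in this summary.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Illustrative English keys for the nine fields the site exposes.
FIELDS = [
    "company_name",
    "unified_social_credit_code",
    "legal_representative",
    "registered_capital",
    "establishment_date",
    "address",
    "operating_status",
    "industry",
    "business_scope",
]


def write_records(records: Iterable[Mapping[str, str]], out: TextIO) -> int:
    """Write records as CSV rows and return how many rows were written.

    Missing fields become empty cells; unknown keys are dropped.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count


# Usage with an in-memory buffer; the company name is a made-up example.
buffer = io.StringIO()
write_records([{"company_name": "示例科技有限公司"}], buffer)
```

When writing to a real file, opening it with `newline=""` and `encoding="utf-8-sig"` helps spreadsheet tools such as Excel display Chinese text correctly.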

The post notes this lax state is likely temporary and advises acting quickly. It also warns about compliance — check robots.txt, add polite delays, and use data only for lawful purposes. For deeper data like judicial risks, bidding records, or IP filings, the site offers a RESTful API with 1,000 free credits on registration.
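
One way to act on the compliance advice is to filter URLs through robots.txt and pause between requests. The sketch below uses the standard library's `urllib.robotparser`; the helper names, the one-second default, and the example.com URLs are assumptions for illustration, and nothing here reflects the actual contents of the site's robots.txt.

```python
import time
from typing import Callable, Iterable, Iterator
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(
    urls: Iterable[str],
    robots: RobotFileParser,
    user_agent: str = "*",
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield only URLs robots.txt allows, pausing between consecutive yields."""
    first = True
    for url in urls:
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt, skip it
        if not first:
            sleep(delay_seconds)  # polite delay before every request after the first
        first = False
        yield url


# Usage with a made-up robots.txt and placeholder URLs.
robots = build_robots("User-agent: *\nDisallow: /private/\n")
for url in polite_urls(["https://example.com/search?q=a"], robots, delay_seconds=0):
    pass  # fetch the URL here
```

Passing `sleep` as a parameter keeps the delay easy to shorten in tests without changing the production default.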

Takeaways
Jinghai Data (kqdaas.com) returns full HTML with no CAPTCHA, IP blocking, or rate limiting.
Basic fields available: company name, unified social credit code, legal representative, registered capital, establishment date, address, operating status, industry, business scope.
A Python scraper using requests and BeautifulSoup can batch search by keywords and output to CSV.
The site uses a Next.js Server Action POST endpoint; the scraper must parse RSC stream responses.
The developer ran large batches without hitting any restrictions.
The post advises acting quickly because the lax state is likely temporary.
Compliance warnings: check robots.txt, add polite delays (time.sleep(1)), and use data only for lawful purposes.
For deeper data (judicial risks, bidding, IP), the site offers a RESTful API with 1,000 free credits on registration.
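
For readers unfamiliar with Server Actions, a call can be described roughly as below. This is a sketch under several assumptions: the action ID is a placeholder that would have to be read from the site's own requests, the argument list (a single keyword) is a guess, and the header set reflects how Next.js clients generally issue these calls rather than anything confirmed about this site.

```python
import json
from typing import Any, Dict, List


def build_server_action_request(page_url: str, action_id: str, args: List[Any]) -> Dict[str, Any]:
    """Describe a Server Action call as keyword arguments for requests.post."""
    return {
        "url": page_url,
        "headers": {
            "Accept": "text/x-component",  # ask for an RSC stream response
            "Content-Type": "text/plain;charset=UTF-8",
            "Next-Action": action_id,  # identifies which server function to run
        },
        # Arguments are sent as a JSON array; the shape here is hypothetical.
        "data": json.dumps(args, ensure_ascii=False).encode("utf-8"),
    }


# Placeholder URL and action ID, for illustration only.
request_kwargs = build_server_action_request(
    "https://example.com/search", "ACTION_ID_PLACEHOLDER", ["科技"]
)
```

The resulting dictionary can be passed as `requests.post(**request_kwargs)`. Action IDs tend to change when a site is redeployed, so a scraper built this way is likely to need periodic maintenance.
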
Conclusions

The complete absence of anti-scraping on a live business data platform is unusual and suggests either a new or neglected deployment.

The developer's choice to publish a full working scraper alongside a promotional link to the site's paid API creates an interesting tension between free data extraction and commercial upsell.

The reliance on Next.js Server Actions and RSC streaming for search responses means the scraper must reverse-engineer a modern frontend framework's internal protocol, not just parse a standard API.
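
To make the parsing step concrete, here is a minimal sketch of reading an RSC stream. It assumes each row has the form `ID:payload` and that company records appear as JSON objects somewhere inside a payload. The `companyName` key in the usage example is hypothetical, and rows whose payload is not plain JSON (module references, length-prefixed text rows) are simply skipped, so this is not a complete decoder.

```python
import json
from typing import Any, Dict, Iterator


def parse_rsc_rows(stream_text: str) -> Dict[str, Any]:
    """Map each row ID to its decoded JSON payload, skipping non-JSON rows."""
    rows: Dict[str, Any] = {}
    for line in stream_text.split("\n"):
        row_id, sep, payload = line.partition(":")
        if not sep or not payload:
            continue  # blank line or no "ID:" prefix
        try:
            rows[row_id] = json.loads(payload)
        except json.JSONDecodeError:
            # Rows such as module references carry a type tag before the
            # JSON and are ignored in this sketch.
            continue
    return rows


def find_records(node: Any, required_key: str) -> Iterator[dict]:
    """Walk nested payloads and yield every dict that contains required_key."""
    if isinstance(node, dict):
        if required_key in node:
            yield node
        for value in node.values():
            yield from find_records(value, required_key)
    elif isinstance(node, list):
        for item in node:
            yield from find_records(item, required_key)


# Usage with a made-up stream; real field names must be read from the site.
sample = '0:{"ok":true}\n1:{"data":{"list":[{"companyName":"X"}]}}\n'
records = list(find_records(parse_rsc_rows(sample), "companyName"))
```

Searching by a required key rather than a fixed path makes the extraction less sensitive to where the framework happens to nest the data.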

The 1,000 free API credits being 'universal across the entire platform' is a notable selling point — most Chinese data APIs offer only limited free tiers on specific endpoints.

The post's timing advice ('act before it's gone') reflects a pragmatic understanding that Chinese data regulation and platform security are rapidly evolving.

Concepts & terms
Next.js Server Action
A feature of the Next.js framework that lets client components call server-side functions directly. The client sends a POST request to the page URL with a Next-Action header that identifies the function. The response typically arrives in RSC (React Server Components) stream format, with content type text/x-component, rather than as standard JSON.
RSC (React Server Components) stream
A serialization format React uses to send server-rendered component output to the client. It is not a single JSON document but a line-delimited protocol: each row starts with an ID and a colon, followed by a payload that is usually JSON and is sometimes prefixed by a type tag.
Unified Social Credit Code (统一社会信用代码)
An 18-character alphanumeric identifier assigned to every legal entity in China, similar to a tax ID or DUNS number. It is the primary key for business registration data.
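
Because the code serves as the primary key, it can be worth validating scraped values before deduplicating on them. The sketch below checks the length, the character set, and the final check character. The alphabet and weights reflect my understanding of the GB 32100-2015 scheme and should be verified against the standard before being relied on; the function name is an assumption.

```python
# 31 symbols: digits plus uppercase letters, excluding I, O, S, V, and Z.
ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUWXY"
# Weights for the first 17 characters (powers of 3 modulo 31).
WEIGHTS = [1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28]


def is_valid_uscc(code: str) -> bool:
    """Check length, alphabet, and check character of an 18-character code."""
    code = code.strip().upper()
    if len(code) != 18 or any(ch not in ALPHABET for ch in code):
        return False
    total = sum(ALPHABET.index(ch) * weight for ch, weight in zip(code[:17], WEIGHTS))
    check_value = (31 - total % 31) % 31
    return ALPHABET[check_value] == code[17]
```

A failed check usually points to a scraping or transcription error, so flagging such rows for review is safer than silently dropping them.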
Source: juejin.cn