# How do I generate list-page URLs in advance and then extract content?

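> Normally the spider starts from the entry pages (scan\_urls) and discovers list pages through the list-page rules (list\_url\_regexes); content pages are discovered the same way. Many third-party sites, however, try to prevent scraping. Some load their list pages via AJAX, so the list-page links are generated by JavaScript instead of appearing in the page source. Others link only the first 10 pages, since an ordinary user rarely needs to browse further. This page looks at how to crawl in these two situations.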

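For sites that only show the first 10 pages, we can generate the list-page URLs ourselves and add them to the queue before crawling starts: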

```
$configs = array(
    // other members of configs
    ...
    'scan_urls' => array(
        'https://www.itjuzi.com/investfirm?user_id=305129'
    ),
    'list_url_regexes' => array(
        "https://www.itjuzi.com/investfirm\?user_id=305129&page=\d+"
    ),
    ...
);

$spider->on_start = function ($spider) 
{
    // Generate the list-page URLs and add them to the queue
    for ($i = 0; $i <= 652; $i++) 
    {
        $url = "https://www.itjuzi.com/investfirm?user_id=305129&page={$i}";
        $spider->add_url($url);
    }
};
```
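
Before handing generated URLs to the spider, it can help to confirm that each one matches the list-page rule, since a URL that does not match would presumably not be treated as a list page. The following is a minimal standalone sketch that does not use phpspider at all: `build_list_urls` is a hypothetical helper introduced for illustration, and the `#` regex delimiters are an assumption about how the pattern is wrapped before matching. The page range 0 to 652 is taken from the example above.

```php
<?php
// Hypothetical helper (not part of phpspider): builds paginated list-page URLs.
function build_list_urls(string $base, array $query, int $first, int $last): array
{
    $urls = array();
    for ($page = $first; $page <= $last; $page++) {
        // http_build_query joins the parameters with "&" by default
        $urls[] = $base . '?' . http_build_query($query + array('page' => $page));
    }
    return $urls;
}

$urls = build_list_urls(
    'https://www.itjuzi.com/investfirm',
    array('user_id' => 305129),
    0,
    652
);

// Same pattern as list_url_regexes above, with the dots escaped.
// Assumption: the pattern is wrapped in "#" delimiters for preg_match.
$pattern = '#https://www\.itjuzi\.com/investfirm\?user_id=305129&page=\d+#';

// Collect any generated URL that would not be recognised as a list page.
$unmatched = array();
foreach ($urls as $url) {
    if (preg_match($pattern, $url) !== 1) {
        $unmatched[] = $url;
    }
}

echo count($urls), " URLs generated, ", count($unmatched), " unmatched\n";
```

Inside `on_start`, each element of `$urls` could then be passed to `$spider->add_url($url)` exactly as in the loop above.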
