# First demo

The crawler is written in PHP. Using Qiushibaike (糗事百科) as an example, here is what one of our crawlers looks like:

#### Installation <a href="#an-zhuang" id="an-zhuang"></a>

**1. Download from GitHub**

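```php
<?php
// Load phpspider's own autoloader from the cloned repository.
// The relative path assumes this script sits one directory below the repo root
// (e.g. in its demo/ folder); adjust it to match your layout.
require_once __DIR__ . '/../autoloader.php';
use phpspider\core\phpspider;
```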

**2. Install with Composer**

```
composer require owner888/phpspider
```

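```php
<?php
// Load Composer's autoloader; __DIR__ makes the path independent of the
// current working directory (assumes the script sits next to vendor/).
require __DIR__ . '/vendor/autoload.php';
use phpspider\core\phpspider;
```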

**3. Add an annoying comment. Don't ask me why, I'm just that annoying ^\_^**

```
/* Do NOT delete this comment */
/* 不要删除这段注释 */
```

```php
<?php
/* Do NOT delete this comment */
/* 不要删除这段注释 */

// Load phpspider (Composer install; use the autoloader from step 1 if you cloned from GitHub)
require __DIR__ . '/vendor/autoload.php';
use phpspider\core\phpspider;

$configs = array(
    'name' => '糗事百科', // Qiushibaike
    'domains' => array(
        'qiushibaike.com',
        'www.qiushibaike.com'
    ),
    'scan_urls' => array(
        'http://www.qiushibaike.com/'
    ),
    'content_url_regexes' => array(
        "http://www.qiushibaike.com/article/\d+"
    ),
    'list_url_regexes' => array(
        "http://www.qiushibaike.com/8hr/page/\d+\?s=\d+"
    ),
    'fields' => array(
        array(
            // Extract the article body from the content page
            'name' => "article_content",
            'selector' => "//*[@id='single-next-link']",
            'required' => true
        ),
        array(
            // Extract the article author from the content page
            'name' => "article_author",
            'selector' => "//div[contains(@class,'author')]//h2",
            'required' => true
        ),
    ),
);
$spider = new phpspider($configs);
$spider->start();
```

That is the overall structure of a crawler: first define a `$configs` array describing the site to crawl, then configure and start the crawler by calling `$spider = new phpspider($configs);` and `$spider->start();`.
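To make the `domains`, `content_url_regexes` and `list_url_regexes` entries less abstract, here is a minimal sketch in plain PHP (no phpspider required) of how a URL could be checked against them. It assumes that URLs outside `domains` are ignored, that content-page patterns are tried before list-page patterns, and that each pattern must match from the start of the URL. The helpers `matches_any()` and `classify_url()` are hypothetical, the sample URLs are made up, and phpspider's actual matching logic may differ in its details.

```php
<?php
// Sketch only: shows how the demo's URL settings could classify a URL.
// This is NOT phpspider's internal code; matches_any() and classify_url()
// are hypothetical helpers, and the sample URLs are made up.

$configs = array(
    'domains' => array(
        'qiushibaike.com',
        'www.qiushibaike.com'
    ),
    'content_url_regexes' => array(
        "http://www.qiushibaike.com/article/\d+"
    ),
    'list_url_regexes' => array(
        "http://www.qiushibaike.com/8hr/page/\d+\?s=\d+"
    ),
);

// True if $url matches any of the patterns. Patterns are wrapped in '#'
// delimiters (they contain '/') and anchored with '^' (an assumption here).
function matches_any($url, array $regexes)
{
    foreach ($regexes as $regex) {
        if (preg_match('#^' . $regex . '#', $url) === 1) {
            return true;
        }
    }
    return false;
}

// Assumed order: off-domain URLs are skipped, then content pages, then list pages.
function classify_url($url, array $configs)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!in_array($host, $configs['domains'], true)) {
        return 'skipped';
    }
    if (matches_any($url, $configs['content_url_regexes'])) {
        return 'content';
    }
    if (matches_any($url, $configs['list_url_regexes'])) {
        return 'list';
    }
    return 'other';
}

$urls = array(
    'http://www.qiushibaike.com/article/118471012',
    'http://www.qiushibaike.com/8hr/page/2?s=4960927',
    'http://www.qiushibaike.com/hot/',
    'http://example.com/article/1',
);
foreach ($urls as $url) {
    echo classify_url($url, $configs), "\t", $url, PHP_EOL;
}
```

Running it prints one line per URL: the article URL is labelled `content`, the paginated URL `list`, `/hot/` `other`, and the `example.com` URL `skipped`. Note that the regexes are written in double-quoted PHP strings; this works because PHP leaves unknown escapes such as `\d` and `\?` untouched.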

**Here is what it looks like while running:**

![](/files/3BpzKNRdnfX8fFcb9QIf)

How to define the `$configs` array is covered in detail later. ^\_^
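As a small preview of the `fields` entries, the sketch below evaluates the demo's two XPath selectors against an invented HTML snippet using PHP's built-in `DOMDocument` and `DOMXPath`, not phpspider itself. The markup is hypothetical (the real Qiushibaike pages may look different), and treating `'required' => true` as "drop the whole record when this field comes back empty" is an assumption of this sketch.

```php
<?php
// Sketch only: runs the demo's XPath selectors with PHP's DOM extension.
// The HTML below is invented for illustration; phpspider's own selector
// engine is not used here.

$html = <<<HTML
<html><body>
  <div class="author clearfix"><h2>some_user</h2></div>
  <div class="content"><a id="single-next-link" href="#">A short joke.</a></div>
</body></html>
HTML;

// Same selectors as in the demo's 'fields'
$fields = array(
    array('name' => 'article_content', 'selector' => "//*[@id='single-next-link']", 'required' => true),
    array('name' => 'article_author', 'selector' => "//div[contains(@class,'author')]//h2", 'required' => true),
);

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$result = array();
foreach ($fields as $field) {
    $nodes = $xpath->query($field['selector']);
    // Take the text of the first matching node, or '' if nothing matched
    $value = ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->textContent) : '';
    if ($value === '' && $field['required']) {
        // Assumption: a missing required field discards the whole record
        $result = null;
        break;
    }
    $result[$field['name']] = $value;
}

var_dump($result);
```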

