# configs in detail: field

A `field` defines one item to extract. A `field` can specify the following:
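
For orientation, the fragment below is a minimal sketch of where a `field` sits in the overall configuration. It assumes the usual layout in which field definitions are listed under a `fields` key of a `$configs` array. The `title` field and its XPath are hypothetical and only there to show that several fields can sit side by side.

```php
// Sketch only: each element of 'fields' is one field definition.
$configs = array(
    'fields' => array(
        array(
            'name' => "title",        // hypothetical field, for illustration
            'selector' => "//h1",     // hypothetical XPath
        ),
        array(
            'name' => "content",
            'selector' => "//*[@id='single-next-link']",
        ),
    ),
);
```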

#### `name` <a href="#name" id="name"></a>

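> The variable name for this piece of data.\
> The name must not contain `.`.\
> If the scraped data is to be published to a website as an article or a Q&A post (WeCenter, WordPress, Discuz!, etc.), name the fields as in the two complete demos; otherwise publishing will fail.

**String type, must not be empty**

For example:

Name the `field` `content`: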

```
array(
    'name' => "content",
    'selector' => "//*[@id='single-next-link']"
)
```

#### `selector` <a href="#selector" id="selector"></a>

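> Defines the extraction rule; XPath is used by default.\
> To use another rule type, set `selector_type`.

**String type, must not be empty**

For example:\
Use XPath to extract the joke content from Qiushibaike; the value of `selector` is the XPath of that content.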

```
array(
    'name' => "content",
    'selector' => "//*[@id='single-next-link']"
)
```

#### `selector_type` <a href="#selectortype" id="selectortype"></a>

The type of the extraction rule.

> Currently available: [xpath](http://www.w3school.com.cn/xpath/index.asp), [jsonpath](http://www.cnblogs.com/draem0507/p/5111002.html), [regex](http://www.runoob.com/regexp/regexp-tutorial.html)\
> Default: `xpath`

**Enum type**

Example 1:\
`selector` uses XPath by default

```
array(
    'name' => "content",
    'selector' => "//*[@id='single-next-link']" // XPath extraction rule
)
```

Example 2:\
Use a regular expression to extract data

```
array(
    'name' => "content",
    'selector_type' => 'regex',
    'selector' => '#<div\sclass="content">([^/]+)</div>#i' // regex extraction rule
)
```
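
The third rule type, `jsonpath`, follows the same pattern. The fragment below is a sketch under the assumption that the response body is JSON; the field name `items` and the path `$.data.items` are hypothetical. Single quotes are used so that PHP never treats the `$` in the path as the start of a variable.

```php
array(
    'name' => "items",                // hypothetical field name
    'selector_type' => 'jsonpath',
    'selector' => '$.data.items'      // hypothetical JSONPath extraction rule
)
```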

#### `required` <a href="#required" id="required"></a>

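> Whether a value for this `field` is mandatory; defaults to `false`.\
> If set to `true` and nothing is extracted for this `field`, the entire record that the field belongs to is discarded.

**Boolean type**

For example: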

```
array(
    'name' => "content",
    'selector' => "//*[@id='single-next-link']",
    'required' => true
)
```
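
The discard rule can be pictured with a small stand-alone sketch. `apply_required` is a hypothetical helper written for this illustration, not part of the crawler's API, and it assumes that "nothing extracted" means `null`, an empty string, or an empty array.

```php
<?php
// Hypothetical helper: returns null when a required field is empty,
// mimicking "the whole record is discarded".
function apply_required(array $fields, array $record)
{
    foreach ($fields as $field) {
        $name = $field['name'];
        $value = isset($record[$name]) ? $record[$name] : null;
        // Assumption: null, '' and an empty array all count as "not extracted".
        $is_empty = ($value === null || $value === '' || $value === array());
        if (!empty($field['required']) && $is_empty) {
            return null; // drop the entire record
        }
    }
    return $record;
}

$fields = array(
    array('name' => "title", 'selector' => "//h1"), // hypothetical field
    array(
        'name' => "content",
        'selector' => "//*[@id='single-next-link']",
        'required' => true,
    ),
);

// Both fields present: the record is kept.
$kept = apply_required($fields, array('title' => "A", 'content' => "text"));
// The required field is empty: the record is dropped.
$dropped = apply_required($fields, array('title' => "B", 'content' => ""));
```

Here `$kept` is the unchanged record and `$dropped` is `null`, even though `title` was extracted successfully.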

#### `repeated` <a href="#repeated" id="repeated"></a>

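> Whether this `field` may match multiple items; defaults to `false`.\
> If set to `true`, the extracted result is always an array, whether or not the `field` actually matched more than one item.

**Boolean type**

For example:\
The page being crawled contains multiple comments, so set `repeated` to `true` when extracting them.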

```
array(
    'name' => "comments",
    'selector' => "//*[@id='zh-single-question-page']//a[contains(@class,'zm-item-tag')]",
    'repeated' => true
)
```
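
To make the difference in result shape concrete, here is a small sketch. `shape_result` is a hypothetical helper, not the crawler's implementation. The behaviour without `repeated` (keeping only the first match) is an assumption made for illustration.

```php
<?php
// Hypothetical helper illustrating the result shape only.
function shape_result(array $matches, $repeated)
{
    if ($repeated) {
        return $matches; // always an array, even for zero or one match
    }
    // Assumption: without 'repeated', only the first match is kept.
    return isset($matches[0]) ? $matches[0] : null;
}

$as_array = shape_result(array("only comment"), true);
$as_value = shape_result(array("only comment"), false);
```

With a single match, `$as_array` is a one-element array while `$as_value` is the bare string, so downstream code that loops over comments should rely on `repeated` being `true`.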

#### `children` <a href="#children" id="children"></a>

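> Defines sub-items for this `field`.\
> The sub-items are themselves defined as a `fields` array.\
> In other words, fields form a tree.

**Array type**

For example:\
Scrape Qiushibaike comments, extracting the content and the like count of each comment.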

```
array(
    'name' => "article_comments",
    'selector' => "//div[contains(@class,'comments-wrap')]",
    'children' => array(
        array(
            'name' => "replay",
            'selector' => "//div[contains(@class,'replay')]",
            'repeated' => true,
        ),
        array(
            'name' => "report",
            'selector' => "//div[contains(@class,'report')]",
            'repeated' => true,
        )
    )
)
```
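
Because the definition is a tree, the extracted record can be expected to be nested in the same way. The fragment below sketches an assumed result shape for the configuration above. The string values are placeholders rather than real scraped data, and the exact structure the crawler produces may differ.

```php
// Assumed shape of one extracted record (placeholder values only).
$data = array(
    'article_comments' => array(
        'replay' => array("placeholder 1", "placeholder 2"), // repeated => array
        'report' => array("placeholder 1", "placeholder 2"), // repeated => array
    ),
);
```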

#### `source_type` <a href="#sourcetype" id="sourcetype"></a>

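The data source for this field. By default, data is extracted from the current page.\
Choose `attached_url` to issue a new request and extract from the data it returns.\
Choose `url_context` to extract from the data attached to the current page's URL (see the worked example on "URL attached data").

**Enum type**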

#### `attached_url` <a href="#attachedurl" id="attachedurl"></a>

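When `source_type` is set to `attached_url`, this defines the URL of the new request.

**String type**

For example:\
Use `attached_url` when some of the page's content is loaded by asynchronous requests. For instance, the comments on a Zhihu answer are fetched with an AJAX request.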

```
array(
    'name' => "comment_id",
    'selector' => "//div/@data-aid",
),
array(
    'name' => "comments",
    'source_type' => 'attached_url',
    // "comments" is extracted from the data returned by the asynchronous "attached_url" request
    // "attached_url" can reference fields already extracted in the current context; here it references "comment_id" from above
    'attached_url' => "https://www.zhihu.com/r/answers/{comment_id}/comments",
    'selector_type' => 'jsonpath',
    'selector' => "$.data",
    'repeated' => true,
    'children' => array(
        ...
    )
)
```
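
The `{comment_id}` reference can be read as a placeholder that is filled from previously extracted fields. The sketch below shows one way such a substitution could work. `fill_attached_url` is a hypothetical helper, not the crawler's own code, and the id `12345` is a made-up value.

```php
<?php
// Hypothetical helper: replaces {field_name} placeholders in a URL template
// with values from already extracted fields.
function fill_attached_url($template, array $extracted)
{
    return preg_replace_callback(
        '/\{(\w+)\}/',
        function ($m) use ($extracted) {
            // Leave unknown placeholders untouched; URL-encode known values.
            return isset($extracted[$m[1]])
                ? rawurlencode((string) $extracted[$m[1]])
                : $m[0];
        },
        $template
    );
}

$url = fill_attached_url(
    "https://www.zhihu.com/r/answers/{comment_id}/comments",
    array('comment_id' => "12345") // made-up id for illustration
);
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a missing field easy to spot in the resulting URL.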
