channel > item` and comments are
* stored in `rss > channel > item > wp:comment`.
*
* The `$wxr->next_entity()` method stream-parses the next entity from the WXR document and
* exposes it to the API consumer via `$wxr->get_entity()`, which carries the entity type and data.
* The next call to `$wxr->next_entity()` remembers where the parsing has stopped and parses
* the next entity after that point.
*
* Example:
*
*     $reader = WXREntityReader::create();
*
*     // Add data as it becomes available
*     $reader->append_bytes( fread( $file_handle, 65536 ) );
*
*     // Process entities
*     while ( $reader->next_entity() ) {
*         // get_type() is assumed here to be the ImportEntity accessor for the entity type.
*         switch ( $reader->get_entity()->get_type() ) {
*             case 'post':
*                 // ... process post ...
*                 break;
*
*             case 'comment':
*                 // ... process comment ...
*                 break;
*
*             case 'site_option':
*                 // ... process site option ...
*                 break;
*
*             // ... process other entity types ...
*         }
*     }
*
*     // Check if we need more input
*     if ( $reader->is_paused_at_incomplete_input() ) {
*         // Add more data and continue processing
*         $reader->append_bytes( fread( $file_handle, 65536 ) );
*     }
*
* The next_entity() -> fread -> break usage pattern may seem a bit tedious. This is expected. Even
* if the WXR parsing part of the WP_WXR_Entity_Reader offers a high-level API, working with byte streams
* requires reasoning on a much lower level. The StreamChain class shipped in this repository will
* make the API consumption easier with its transformation-oriented API for chaining data processors.
*
* Similarly to `WP_XML_Processor`, the `WP_WXR_Entity_Reader` enters a paused state when it doesn't
* have enough XML bytes to parse the entire entity.
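*
* As a rough sketch of that pause-and-resume cycle, the feeding loop could be wrapped
* in a helper like the one below. `drain_wxr_reader()` and its parameters are
* hypothetical and not part of this class; the sketch only assumes the reader methods
* defined in this file and a readable file handle:
*
* ```php
* // Hypothetical helper: feeds $file_handle to $reader chunk by chunk and passes
* // every parsed entity to $on_entity. Returns the number of entities seen.
* function drain_wxr_reader( $reader, $file_handle, callable $on_entity, $chunk_size = 65536 ) {
*     $entities_seen  = 0;
*     $input_finished = false;
*     while ( true ) {
*         while ( $reader->next_entity() ) {
*             $on_entity( $reader->get_entity() );
*             ++$entities_seen;
*         }
*         // Stop unless the reader is only waiting for more bytes.
*         if ( $input_finished || ! $reader->is_paused_at_incomplete_input() ) {
*             break;
*         }
*         $chunk = fread( $file_handle, $chunk_size );
*         if ( false === $chunk || '' === $chunk ) {
*             // Nothing left to read: tell the reader that no more input is coming.
*             $reader->input_finished();
*             $input_finished = true;
*             continue;
*         }
*         $reader->append_bytes( $chunk );
*     }
*     return $entities_seen;
* }
* ```
*
* The `$input_finished` flag guards against looping forever if the reader still
* reports a paused state after the input has been marked as finished.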
*
* ## Caveats
*
* ### Extensibility
*
* `WP_WXR_Entity_Reader` ignores any XML elements it doesn't recognize. The WXR format is extensible,
* so in the future the reader may start supporting registration of custom handlers for unknown
* tags.
*
* ### Nested entities intertwined with data
*
* `WP_WXR_Entity_Reader` flushes the current entity whenever another entity starts. The upside is
* simplicity and a tiny memory footprint. The downside is that it's possible to craft a WXR
* document where some information would be lost. For example:
*
* ```xml
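* <rss>
*     <channel>
*         <item>
*             <title>Page with comments</title>
*             <link>http://wpthemetestdata.wordpress.com/about/page-with-comments/</link>
*             <wp:postmeta>
*                 <wp:meta_key>_wp_page_template</wp:meta_key>
*                 <wp:meta_value></wp:meta_value>
*             </wp:postmeta>
*             <wp:post_id>146</wp:post_id>
*         </item>
*     </channel>
* </rss>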
* ```
*
* `WP_WXR_Entity_Reader` would accumulate post data until the `wp:postmeta` tag. Then it would emit a
* `post` entity and accumulate the meta information until the `</wp:postmeta>` closer. Then it
* would advance to `<wp:post_id>` and **ignore it**.
*
* This was not a problem in any of the `.wxr` files I have seen. Still, it is important to note this limitation.
* It is possible there is a `.wxr` generator somewhere out there that intertwines post fields with post
* meta and comments. If this ever comes up, we could:
*
* * Emit the `post` entity first, then all the nested entities, and then emit a special `post_update` entity.
* * Do multiple passes over the WXR file – one for each level of nesting, e.g. 1. Insert posts, 2. Insert Comments, 3. Insert comment meta
*
* Buffering all the post meta and comments seems like a bad idea – there might be gigabytes of data.
*
* ## Remaining work
*
* @TODO:
*
* - Revisit the need to implement the Iterator interface.
*
* @since WP_VERSION
*/
class WXREntityReader implements EntityReader {
/**
* The XML processor used to parse the WXR file.
*
* @since WP_VERSION
* @var WP_XML_Processor
*/
private $xml;
/**
* The name of the XML tag containing information about the WordPress entity
* currently being extracted from the WXR file.
*
* @since WP_VERSION
* @var string|null
*/
private $entity_tag;
/**
* The name of the current WordPress entity, such as 'post' or 'comment'.
*
* @since WP_VERSION
* @var string|null
*/
private $entity_type;
/**
* The data accumulated for the current entity.
*
* @since WP_VERSION
* @var array
*/
private $entity_data;
/**
* The byte offset of the current entity in the original input stream.
*
* @since WP_VERSION
* @var int
*/
private $entity_opener_byte_offset;
/**
* Whether the current entity has been emitted.
*
* @since WP_VERSION
* @var bool
*/
private $entity_finished = false;
/**
* The number of entities read so far.
*
* @since WP_VERSION
* @var int
*/
private $entities_read_so_far = 0;
/**
* The attributes from the last opening tag.
*
* @since WP_VERSION
* @var array
*/
private $last_opener_attributes = array();
/**
* The ID of the last processed post.
*
* @since WP_VERSION
* @var int|null
*/
private $last_post_id = null;
/**
* The ID of the last processed comment.
*
* @since WP_VERSION
* @var int|null
*/
private $last_comment_id = null;
/**
* Buffer for accumulating text content between tags.
*
* @since WP_VERSION
* @var string
*/
private $text_buffer = '';
/**
* Stream to pull bytes from when the input bytes are exhausted.
*
* @var ByteReadStream|null
*/
private $upstream;
/**
* Whether the reader has finished processing the input stream.
*
* @var bool
*/
private $is_finished = false;
/**
* Mapping of WXR tags representing site options to their WordPress options names.
* These tags are only matched if they are children of the `<channel>` element.
*
* @since WP_VERSION
* @var array
*/
private $known_site_options = array();
/**
* Mapping of WXR tags to their corresponding entity types and field mappings.
*
* @since WP_VERSION
* @var array
*/
private $known_entities = array();
public static function create( ?ByteReadStream $upstream = null, $cursor = null, $options = array() ) {
$xml_cursor = null;
if ( null !== $cursor ) {
$cursor = json_decode( $cursor, true );
// json_decode() returns null, not false, when the input cannot be decoded.
if ( ! is_array( $cursor ) ) {
_doing_it_wrong(
__METHOD__,
'Invalid cursor provided for WP_WXR_Entity_Reader::create().',
null
);
return false;
}
$xml_cursor = $cursor['xml'];
}
$xml = XMLProcessor::create_for_streaming( '', $xml_cursor );
$reader = new WXREntityReader( $xml, $options );
if ( null !== $cursor ) {
$reader->last_post_id = $cursor['last_post_id'];
$reader->last_comment_id = $cursor['last_comment_id'];
}
if ( null !== $upstream ) {
$reader->connect_upstream( $upstream );
if ( null !== $cursor ) {
if ( ! isset( $cursor['upstream'] ) ) {
// No upstream cursor means we've processed the
// entire input stream.
$xml->input_finished();
$xml->next_token();
} else {
$upstream->seek( $cursor['upstream'] );
}
}
}
return $reader;
}
/**
* Constructor.
*
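* When `$options` contains `known_entities` or `known_site_options`, the built-in WXR
* mappings are skipped and only the provided ones are used. Keys follow the
* `{namespace}local_name` format, or just `local_name` for tags without a namespace.
* A minimal sketch of building such options; `wxr_namespaced_fields()` is a
* hypothetical helper that is not part of this class:
*
* ```php
* // Hypothetical helper: expands local tag names into `{namespace}local_name` keys
* // for every namespace variant that should be recognized.
* function wxr_namespaced_fields( array $namespaces, array $local_names_to_fields ) {
*     $fields = array();
*     foreach ( $namespaces as $namespace ) {
*         foreach ( $local_names_to_fields as $local_name => $field ) {
*             $fields[ '{' . $namespace . '}' . $local_name ] = $field;
*         }
*     }
*     return $fields;
* }
*
* $options = array(
*     'known_site_options' => array(),
*     'known_entities'     => array(
*         'item' => array(
*             'type'   => 'post',
*             'fields' => array_merge(
*                 array( 'title' => 'post_title' ),
*                 wxr_namespaced_fields(
*                     array( 'http://wordpress.org/export/1.2/' ),
*                     array( 'post_id' => 'post_id' )
*                 )
*             ),
*         ),
*     ),
* );
* ```
*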
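* @param XMLProcessor $xml     The XML processor to use.
* @param array        $options Optional. May contain the `known_site_options` and `known_entities` mappings.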
*
* @since WP_VERSION
*/
protected function __construct( XMLProcessor $xml, $options = array() ) {
$this->xml = $xml;
if ( isset( $options['known_site_options'] ) || isset( $options['known_entities'] ) ) {
$this->known_site_options = isset( $options['known_site_options'] ) ? $options['known_site_options'] : array();
$this->known_entities = isset( $options['known_entities'] ) ? $options['known_entities'] : array();
return;
}
// Every XML element is a combination of a long-form namespace and a
// local element name, e.g. a `<wp:post_id>` syntax could actually refer
// to a (https://wordpress.org/export/1.0/, post_id) element.
//
// Namespaces are paramount for parsing XML and cannot be ignored. Elements
// must be matched based on both their namespace and local name.
//
// Unfortunately, different WXR files define the `wp` namespace in different ways.
// Folks use a mixture of HTTP vs HTTPS protocols and version numbers. We must
// account for all possible options to parse these documents correctly.
$wxr_namespaces = array(
'http://wordpress.org/export/1.0/',
'https://wordpress.org/export/1.0/',
'http://wordpress.org/export/1.1/',
'https://wordpress.org/export/1.1/',
'http://wordpress.org/export/1.2/',
'https://wordpress.org/export/1.2/',
);
$this->known_entities = array(
'item' => array(
'type' => 'post',
'fields' => array(
'title' => 'post_title',
'link' => 'link',
'guid' => 'guid',
'description' => 'post_excerpt',
'pubDate' => 'post_published_at',
'{http://purl.org/dc/elements/1.1/}creator' => 'post_author',
'{http://purl.org/rss/1.0/modules/content/}encoded' => 'post_content',
'{http://wordpress.org/export/1.0/excerpt/}encoded' => 'post_excerpt',
'{http://wordpress.org/export/1.1/excerpt/}encoded' => 'post_excerpt',
'{http://wordpress.org/export/1.2/excerpt/}encoded' => 'post_excerpt',
),
),
);
foreach ( $wxr_namespaces as $wxr_namespace ) {
$this->known_site_options = array_merge(
$this->known_site_options,
array(
'{' . $wxr_namespace . '}base_blog_url' => 'home',
'{' . $wxr_namespace . '}base_site_url' => 'siteurl',
'title' => 'blogname',
)
);
$this->known_entities['item']['fields'] = array_merge(
$this->known_entities['item']['fields'],
array(
'{' . $wxr_namespace . '}post_id' => 'post_id',
'{' . $wxr_namespace . '}status' => 'post_status',
'{' . $wxr_namespace . '}post_date' => 'post_date',
'{' . $wxr_namespace . '}post_date_gmt' => 'post_date_gmt',
'{' . $wxr_namespace . '}post_modified' => 'post_modified',
'{' . $wxr_namespace . '}post_modified_gmt' => 'post_modified_gmt',
'{' . $wxr_namespace . '}comment_status' => 'comment_status',
'{' . $wxr_namespace . '}ping_status' => 'ping_status',
'{' . $wxr_namespace . '}post_name' => 'post_name',
'{' . $wxr_namespace . '}post_parent' => 'post_parent',
'{' . $wxr_namespace . '}menu_order' => 'menu_order',
'{' . $wxr_namespace . '}post_type' => 'post_type',
'{' . $wxr_namespace . '}post_password' => 'post_password',
'{' . $wxr_namespace . '}is_sticky' => 'is_sticky',
'{' . $wxr_namespace . '}attachment_url' => 'attachment_url',
)
);
$this->known_entities = array_merge(
$this->known_entities,
array(
'{' . $wxr_namespace . '}comment' => array(
'type' => 'comment',
'fields' => array(
'{' . $wxr_namespace . '}comment_id' => 'comment_id',
'{' . $wxr_namespace . '}comment_author' => 'comment_author',
'{' . $wxr_namespace . '}comment_author_email' => 'comment_author_email',
'{' . $wxr_namespace . '}comment_author_url' => 'comment_author_url',
'{' . $wxr_namespace . '}comment_author_IP' => 'comment_author_IP',
'{' . $wxr_namespace . '}comment_date' => 'comment_date',
'{' . $wxr_namespace . '}comment_date_gmt' => 'comment_date_gmt',
'{' . $wxr_namespace . '}comment_content' => 'comment_content',
'{' . $wxr_namespace . '}comment_approved' => 'comment_approved',
'{' . $wxr_namespace . '}comment_type' => 'comment_type',
'{' . $wxr_namespace . '}comment_parent' => 'comment_parent',
'{' . $wxr_namespace . '}comment_user_id' => 'comment_user_id',
),
),
'{' . $wxr_namespace . '}commentmeta' => array(
'type' => 'comment_meta',
'fields' => array(
'{' . $wxr_namespace . '}meta_key' => 'meta_key',
'{' . $wxr_namespace . '}meta_value' => 'meta_value',
),
),
'{' . $wxr_namespace . '}author' => array(
'type' => 'user',
'fields' => array(
'{' . $wxr_namespace . '}author_id' => 'ID',
'{' . $wxr_namespace . '}author_login' => 'user_login',
'{' . $wxr_namespace . '}author_email' => 'user_email',
'{' . $wxr_namespace . '}author_display_name' => 'display_name',
'{' . $wxr_namespace . '}author_first_name' => 'first_name',
'{' . $wxr_namespace . '}author_last_name' => 'last_name',
),
),
'{' . $wxr_namespace . '}postmeta' => array(
'type' => 'post_meta',
'fields' => array(
'{' . $wxr_namespace . '}meta_key' => 'meta_key',
'{' . $wxr_namespace . '}meta_value' => 'meta_value',
),
),
'{' . $wxr_namespace . '}term' => array(
'type' => 'term',
'fields' => array(
'{' . $wxr_namespace . '}term_id' => 'term_id',
'{' . $wxr_namespace . '}term_taxonomy' => 'taxonomy',
'{' . $wxr_namespace . '}term_slug' => 'slug',
'{' . $wxr_namespace . '}term_parent' => 'parent',
'{' . $wxr_namespace . '}term_name' => 'name',
),
),
'{' . $wxr_namespace . '}tag' => array(
'type' => 'tag',
'fields' => array(
'{' . $wxr_namespace . '}term_id' => 'term_id',
'{' . $wxr_namespace . '}tag_slug' => 'slug',
'{' . $wxr_namespace . '}tag_name' => 'name',
'{' . $wxr_namespace . '}tag_description' => 'description',
),
),
'{' . $wxr_namespace . '}category' => array(
'type' => 'category',
'fields' => array(
'{' . $wxr_namespace . '}category_nicename' => 'slug',
'{' . $wxr_namespace . '}category_parent' => 'parent',
'{' . $wxr_namespace . '}cat_name' => 'name',
'{' . $wxr_namespace . '}category_description' => 'description',
),
),
)
);
}
}
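/**
* Returns a JSON-encoded cursor that can later be passed to `create()` to resume
* parsing at the opener of the current entity.
*
* The cursor layout is an internal detail and may change. For debugging, it could
* be inspected roughly as follows; this is only a sketch, and
* `summarize_wxr_cursor()` is a hypothetical helper that is not part of this class:
*
* ```php
* // Hypothetical debugging helper: extracts the resume offset and the last seen IDs.
* function summarize_wxr_cursor( $cursor ) {
*     $decoded = json_decode( $cursor, true );
*     if ( ! is_array( $decoded ) ) {
*         // Not a cursor produced by get_reentrancy_cursor().
*         return null;
*     }
*     return array(
*         'resume_at_byte'  => isset( $decoded['upstream'] ) ? $decoded['upstream'] : null,
*         'last_post_id'    => isset( $decoded['last_post_id'] ) ? $decoded['last_post_id'] : null,
*         'last_comment_id' => isset( $decoded['last_comment_id'] ) ? $decoded['last_comment_id'] : null,
*     );
* }
* ```
*
* @return string The JSON-encoded cursor.
*/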
public function get_reentrancy_cursor() {
/**
* @TODO: Instead of adjusting the XML cursor internals, adjust the get_reentrancy_cursor()
* call to support $bookmark_name, e.g. $this->xml->get_reentrancy_cursor( 'last_entity' );
* If the cursor internal data was a part of every bookmark, this would have worked
* even after evicting the actual bytes where $last_entity is stored.
*/
$xml_cursor = $this->xml->get_reentrancy_cursor();
$xml_cursor = json_decode( base64_decode( $xml_cursor ), true ); // phpcs:ignore WordPress.PHP.DiscouragedPHPFunctions.obfuscation_base64_decode
$xml_cursor['upstream_bytes_forgotten'] = $this->entity_opener_byte_offset;
$xml_cursor = base64_encode( json_encode( $xml_cursor ) ); // phpcs:ignore WordPress.PHP.DiscouragedPHPFunctions.obfuscation_base64_encode
return json_encode(
array(
'xml' => $xml_cursor,
'upstream' => $this->entity_opener_byte_offset,
'last_post_id' => $this->last_post_id,
'last_comment_id' => $this->last_comment_id,
)
);
}
/**
* Gets the data for the current entity.
*
* @return ImportEntity|false The entity, or false if no entity is being processed.
* @since WP_VERSION
*/
public function get_entity() {
if ( ! $this->get_entity_type() ) {
return false;
}
return new ImportEntity(
$this->get_entity_type(),
$this->entity_data
);
}
/**
* Gets the type of the current entity.
*
* @return string|false The entity type, or false if no entity is being processed.
* @since WP_VERSION
*/
private function get_entity_type() {
if ( null !== $this->entity_type ) {
return $this->entity_type;
}
if ( null === $this->entity_tag ) {
return false;
}
if ( ! array_key_exists( $this->entity_tag, $this->known_entities ) ) {
return false;
}
return $this->known_entities[ $this->entity_tag ]['type'];
}
/**
* Gets the ID of the last processed post.
*
* @return int|null The post ID, or null if no posts have been processed.
* @since WP_VERSION
*/
public function get_last_post_id() {
return $this->last_post_id;
}
/**
* Gets the ID of the last processed comment.
*
* @return int|null The comment ID, or null if no comments have been processed.
* @since WP_VERSION
*/
public function get_last_comment_id() {
return $this->last_comment_id;
}
/**
* Appends bytes to the input stream.
*
* @param string $bytes The bytes to append.
*
* @since WP_VERSION
*/
public function append_bytes( string $bytes ): void {
$this->xml->append_bytes( $bytes );
}
/**
* Marks the input as finished.
*
* @since WP_VERSION
*/
public function input_finished(): void {
$this->xml->input_finished();
}
/**
* Checks if processing is finished.
*
* @return bool Whether processing is finished.
* @since WP_VERSION
*/
public function is_finished(): bool {
return $this->is_finished;
}
/**
* Checks if processing is paused waiting for more input.
*
* @return bool Whether processing is paused.
* @since WP_VERSION
*/
public function is_paused_at_incomplete_input(): bool {
return $this->xml->is_paused_at_incomplete_input();
}
/**
* Gets the last error that occurred.
*
* @return string|null The error message, or null if no error occurred.
* @since WP_VERSION
*/
public function get_last_error(): ?string {
return $this->xml->get_last_error();
}
public function get_xml_exception(): ?XMLUnsupportedException {
return $this->xml->get_exception();
}
/**
* Advances to the next entity in the WXR file.
*
* @return bool Whether another entity was found.
* @since WP_VERSION
*/
public function next_entity() {
if ( $this->is_finished ) {
return false;
}
while ( true ) {
if ( $this->read_next_entity() ) {
return true;
}
// If the read failed because of incomplete input data,
// try pulling more bytes from upstream before giving up.
if ( $this->is_paused_at_incomplete_input() ) {
if ( $this->pull_upstream_bytes() ) {
continue;
} else {
break;
}
}
$this->is_finished = true;
break;
}
return false;
}
/**
* Reads the next entity from the bytes buffered so far, without pulling more bytes from upstream.
*
* @return bool Whether another entity was found.
* @since WP_VERSION
*/
private function read_next_entity() {
if ( $this->xml->is_finished() ) {
$this->after_entity();
return false;
}
if ( $this->xml->is_paused_at_incomplete_input() ) {
return false;
}
/**
* This is the first call after emitting an entity.
* Remove the previous entity details from the internal state
* and prepare for the next entity.
*/
if ( $this->entity_type && $this->entity_finished ) {
$this->after_entity();
// If we finished processing the entity on a closing tag, advance the XML processor to
// the next token. Otherwise the array_key_exists( $tag_with_namespace, $this->known_entities )
// branch below will cause an infinite loop.
if ( $this->xml->is_tag_closer() ) {
if ( false === $this->xml->next_token() ) {
return false;
}
}
}
/**
* Main parsing loop. It advances the XML parser state until a full entity
* is available.
*/
do {
$breadcrumbs = $this->xml->get_breadcrumbs();
// Don't process anything outside the `<rss> <channel>` hierarchy.
if (
count( $breadcrumbs ) < 2 ||
array( '', 'rss' ) !== $breadcrumbs[0] ||
array( '', 'channel' ) !== $breadcrumbs[1]
) {
continue;
}
/*
* Buffer text and CDATA sections until we find the next tag.
* Each tag may contain multiple text or CDATA sections so we can't
* just assume that a single `get_modifiable_text()` call would get
* the entire text content of an element.
*/
if (
'#text' === $this->xml->get_token_type() ||
'#cdata-section' === $this->xml->get_token_type()
) {
$this->text_buffer .= $this->xml->get_modifiable_text();
continue;
}
// We're only interested in tags after this point.
if ( '#tag' !== $this->xml->get_token_type() ) {
continue;
}
if ( count( $breadcrumbs ) <= 2 && $this->xml->is_tag_opener() ) {
$this->entity_opener_byte_offset = $this->xml->get_token_byte_offset_in_the_input_stream();
}
$tag_with_namespace = $this->xml->get_tag_namespace_and_local_name();
/**
* Custom adjustment: the Accessibility WXR file uses a non-standard
* wp:wp_author tag.
*
* @TODO: Should WP_WXR_Entity_Reader care about such non-standard tags when
* the regular WXR importer would ignore them? Perhaps a warning
* and an upstream PR would be a better solution.
*/
if ( '{http://wordpress.org/export/1.2/}wp_author' === $tag_with_namespace ) {
$tag_with_namespace = '{http://wordpress.org/export/1.2/}author';
}
/**
* If the tag is a known entity root, assume the previous entity is
* finished, emit it, and start processing the new entity the next
* time this function is called.
*/
if ( array_key_exists( $tag_with_namespace, $this->known_entities ) ) {
if ( $this->entity_type && ! $this->entity_finished ) {
$this->emit_entity();
return true;
}
$this->after_entity();
// Only tag openers indicate a new entity. Closers just mean
// the previous entity is finished.
if ( $this->xml->is_tag_opener() ) {
$this->set_entity_tag( $tag_with_namespace );
$this->entity_opener_byte_offset = $this->xml->get_token_byte_offset_in_the_input_stream();
}
continue;
}
/**
* We're inside of an entity tag at this point.
*
* The following code assumes that we'll only see three types of tags:
*
* * Empty elements that we'll ignore.
* * XML element openers with only text nodes inside them.
* * XML element closers.
*
* Specifically, we don't expect to see any nested XML elements such as the
* following (the tag names are illustrative):
*
* <wp:meta_value>
*     <title>Pygmalion</title>
*     <content>Long time ago...</content>
* </wp:meta_value>
*
* The semantics of such a structure are not clear. The WP_WXR_Entity_Reader will
* enter an error state when it encounters such a structure.
*
* Such nesting wasn't found in any WXR files analyzed when building
* this class. If it actually is a part of the WXR standard, every
* supported nested element will need a custom handler.
*/
/**
* Buffer the XML tag opener attributes for later use.
*
* In WXR files, entity attributes come from two sources:
* * XML attributes on the tag itself
* * Text content between the opening and closing tags
*
* We store the XML attributes when encountering an opening tag,
* but wait until the closing tag to process the entity attributes.
* Why? Because only at that point we have both the attributes
* and all the related text nodes.
*/
if ( $this->xml->is_tag_opener() ) {
$this->last_opener_attributes = array();
// Get non-namespaced attributes.
$names = $this->xml->get_attribute_names_with_prefix( '', '' );
foreach ( $names as list($namespace, $name) ) {
$this->last_opener_attributes[ $name ] = $this->xml->get_attribute( $namespace, $name );
}
$this->text_buffer = '';
$is_site_option_opener = (
3 === count( $this->xml->get_breadcrumbs() ) &&
$this->xml->matches_breadcrumbs( array( 'rss', 'channel', '*' ) ) &&
array_key_exists( $this->xml->get_tag_namespace_and_local_name(), $this->known_site_options )
);
if ( $is_site_option_opener ) {
$this->entity_opener_byte_offset = $this->xml->get_token_byte_offset_in_the_input_stream();
}
continue;
}
/**
* At this point we're looking for the nearest tag closer so we can
* turn the buffered data into an entity attribute.
*/
if ( ! $this->xml->is_tag_closer() ) {
continue;
}
if (
! $this->entity_finished &&
array( array( '', 'rss' ), array( '', 'channel' ) ) === $this->xml->get_breadcrumbs()
) {
// Look for site options in children of the `<channel>` tag.
if ( $this->parse_site_option() ) {
return true;
} else {
// Keep looking for an entity if none was found in the current tag.
continue;
}
}
/**
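* Special handling to accumulate categories stored inside the
* `<category>` tag found inside `<item>` tags.
*
* For example, we want to convert this:
*
* <item>
*     <category domain="category" nicename="">Uncategorized</category>
*     <category domain="category" nicename="WordPress">WordPress</category>
* </item>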
*
* Into this:
*
* 'terms' => [
* [ 'taxonomy' => 'category', 'slug' => '', 'description' => 'Uncategorized' ],
* [ 'taxonomy' => 'category', 'slug' => 'WordPress', 'description' => 'WordPress' ],
* ]
*/
if (
'post' === $this->entity_type &&
'category' === $this->xml->get_tag_local_name() &&
array_key_exists( 'domain', $this->last_opener_attributes ) &&
array_key_exists( 'nicename', $this->last_opener_attributes )
) {
$this->entity_data['terms'][] = array(
'taxonomy' => $this->last_opener_attributes['domain'],
'slug' => $this->last_opener_attributes['nicename'],
'description' => $this->text_buffer,
);
$this->text_buffer = '';
continue;
}
/**
* Store the text content of known tags as the value of the corresponding
* entity attribute as defined by the $known_entities mapping.
*
* Ignores tags unlisted in the $known_entities mapping.
*
* The WXR format is extensible so this reader could potentially
* support registering custom handlers for unknown tags in the future.
*/
if ( ! isset( $this->known_entities[ $this->entity_tag ]['fields'][ $tag_with_namespace ] ) ) {
continue;
}
$key = $this->known_entities[ $this->entity_tag ]['fields'][ $tag_with_namespace ];
$this->entity_data[ $key ] = $this->text_buffer;
$this->text_buffer = '';
} while ( $this->xml->next_token() );
if ( $this->is_paused_at_incomplete_input() ) {
return false;
}
/**
* Emit the last unemitted entity after parsing all the data.
*/
if (
$this->xml->is_finished() &&
$this->entity_type &&
! $this->entity_finished
) {
$this->emit_entity();
return true;
}
return false;
}
/**
* Emits a site option entity from known children of the
* `<channel>` tag, e.g. `<wp:base_blog_url>` or `<title>`.
*
* @return bool Whether a site_option entity was emitted.
*/
private function parse_site_option() {
if ( ! array_key_exists( $this->xml->get_tag_namespace_and_local_name(), $this->known_site_options ) ) {
return false;
}
$this->entity_type = 'site_option';
$this->entity_data = array(
'option_name' => $this->known_site_options[ $this->xml->get_tag_namespace_and_local_name() ],
'option_value' => $this->text_buffer,
);
$this->emit_entity();
return true;
}
/**
* Connects a byte stream to automatically pull bytes from once
* the last input chunk has been processed.
*
* @param ByteReadStream $stream The upstream stream.
*/
public function connect_upstream( ByteReadStream $stream ) {
$this->upstream = $stream;
}
/**
* Appends another chunk of bytes from upstream if available.
*/
private function pull_upstream_bytes() {
if ( ! $this->upstream ) {
return false;
}
if ( $this->upstream->reached_end_of_data() ) {
$this->input_finished();
return false;
}
$available_bytes = $this->upstream->pull( 65536 );
$this->append_bytes( $this->upstream->consume( $available_bytes ) );
return true;
}
/**
* Marks the current entity as emitted and updates tracking variables.
*
* @since WP_VERSION
*/
private function emit_entity() {
if ( 'post' === $this->entity_type ) {
// Not all posts have a `<wp:post_id>` tag.
$this->last_post_id = isset( $this->entity_data['post_id'] ) ? $this->entity_data['post_id'] : null;
} elseif ( 'post_meta' === $this->entity_type ) {
$this->entity_data['post_id'] = $this->last_post_id;
} elseif ( 'comment' === $this->entity_type ) {
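// Not all comments are guaranteed to have a comment ID tag.
$this->last_comment_id = isset( $this->entity_data['comment_id'] ) ? $this->entity_data['comment_id'] : null;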
$this->entity_data['post_id'] = $this->last_post_id;
} elseif ( 'comment_meta' === $this->entity_type ) {
$this->entity_data['comment_id'] = $this->last_comment_id;
} elseif ( 'tag' === $this->entity_type ) {
$this->entity_data['taxonomy'] = 'post_tag';
} elseif ( 'category' === $this->entity_type ) {
$this->entity_data['taxonomy'] = 'category';
}
$this->entity_finished = true;
++$this->entities_read_so_far;
}
/**
* Sets the current entity tag and type.
*
* @param string $tag_with_namespace The entity tag name.
*
* @since WP_VERSION
*/
private function set_entity_tag( string $tag_with_namespace ) {
$this->entity_tag = $tag_with_namespace;
if ( array_key_exists( $tag_with_namespace, $this->known_entities ) ) {
$this->entity_type = $this->known_entities[ $tag_with_namespace ]['type'];
}
}
/**
* Resets the state after processing an entity.
*
* @since WP_VERSION
*/
private function after_entity() {
$this->entity_tag = null;
$this->entity_type = null;
$this->entity_data = array();
$this->entity_finished = false;
$this->text_buffer = '';
$this->last_opener_attributes = array();
}
}