My now-outdated XML parser class



# 11546 9 years ago on Sat, Jan 16 2016 at 11:23 pm

I would strongly recommend using PHP's built-in xml_parse_into_struct() function nowadays (or, if you're ingesting a huge XML document, putting together a more robust, incremental SAX-based solution).
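For comparison, here is a minimal sketch of the built-in route. It assumes PHP's bundled xml extension is enabled (it usually is) and uses a made-up two-item feed; `parseFlat()` is just an illustrative helper name. Note that `xml_parse_into_struct()` returns a flat list of open/complete/close entries plus an index of where each tag name appears, not a nested tree.

```php
<?php
// Sketch: parse a small document with xml_parse_into_struct(). $values gets
// one flat entry per open/complete/close event; $index maps each tag name
// to its positions in $values.
function parseFlat(string $xml): array
{
    $parser = xml_parser_create();
    // Keep tag names as written instead of upper-casing them (the default)
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    $values = [];
    $index = [];
    if (!xml_parse_into_struct($parser, $xml, $values, $index)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return [$values, $index];
}

[$values, $index] = parseFlat('<feed><item id="1">First</item><item id="2">Second</item></feed>');

// Each <item> is reported once as a "complete" entry carrying its text and attributes
foreach ($index['item'] as $pos) {
    echo $values[$pos]['attributes']['id'], ': ', $values[$pos]['value'], "\n";
}
// prints "1: First" and then "2: Second"
```

Because the output is flat, you would still need a small pass over it if you want nested arrays like the ones the class below produces.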

With that said, here is an XML parser class I put together and no longer use. It parses XML into a multi-dimensional associative array. It's far from perfect, but if you need relatively quick and simple parsing of XML data into a usable array, here it is:

NOTE: The auto-censor might filter out some of the code in this forum. If you want to see the unaltered code, be sure you're signed in and temporarily disable "hide profanity" in your profile.

/**
* This class provides functionality for importing XML feeds by parsing
* them into a multi-dimensional associative array.
*/

class xml{

  public $data = [];        // Array to hold parsed data
  public $tag_counts = [];  // Zero-based index of the latest tag of each name on each level
  public $parent_tags = []; // Current list of parent tags, keyed by depth
  public $depth = 0;        // Current depth in the XML structure (applied to each tag)
  public $pointer;          // Reference to the current position in the parsed data array
  public $parser;           // Parser object
  public $cdata = '';       // Holder for detected character data

  /**
  * Reads an XML string and parses it into a usable multi-dimensional array.
  * @param string $xml the string of XML to parse.
  * @return array|false the parsed XML data, or false if the XML is malformed.
  */
  function parse($xml){
    // Reset state so one instance can parse more than one document
    $this->data = [];
    $this->tag_counts = [];
    $this->parent_tags = [];
    $this->depth = 0;
    $this->cdata = '';
    // Set up the XML parser; array callables bind the handlers to this object
    $this->parser = xml_parser_create();
    xml_parser_set_option($this->parser, XML_OPTION_SKIP_WHITE, 1);
    xml_set_element_handler($this->parser, [$this, 'xmlStartTag'], [$this, 'xmlEndTag']);
    xml_set_character_data_handler($this->parser, [$this, 'xmlCharacterData']);
    // Parse; xml_parse() returns 1 on success and 0 on failure
    $ok = xml_parse($this->parser, $xml, true);
    return $ok ? $this->data : false;
  }

  // Handles XML start tags
  function xmlStartTag($parser, $name, $attributes){
    $name = strtolower($name);
    // Increase tag depth
    $this->depth++;
    // A new element has no children yet, so restart the counters one level down.
    // Otherwise same-named children of a later sibling would get skipped indexes.
    unset($this->tag_counts[$this->depth + 1]);
    // Give this tag the next zero-based index among same-named siblings
    $tag_count = isset($this->tag_counts[$this->depth][$name]) ? ($this->tag_counts[$this->depth][$name] + 1) : 0;
    $this->tag_counts[$this->depth][$name] = $tag_count;
    // Add current tag name to list of parent tags
    $this->parent_tags[$this->depth] = $name;
    // Drop any text seen before this tag (e.g. a parent's text before its first child)
    $this->cdata = '';
    // Set the pointer in the parsed data structure
    $this->setPointer();
    // Store the tag's attributes
    foreach($attributes as $n => $v){
      $this->pointer['attributes'][strtolower($n)] = $v;
    }
  }

  // Handles XML end tags
  function xmlEndTag($parser, $name){
    // Record the character data for the current tag. Comparing against ''
    // (rather than using empty()) keeps a value of "0".
    $text = trim($this->cdata);
    if($text !== ''){
      $this->pointer['contents'] = $text;
    }
    // Clear the character data holder
    $this->cdata = '';
    // Remove tag from list of parent tags and decrease tag depth
    unset($this->parent_tags[$this->depth]);
    $this->depth--;
    // Point back at the parent, so text after this child is stored on the parent
    if($this->depth > 0){
      $this->setPointer();
    }
  }

  // Handles XML character data. The parser may deliver one text node in
  // several pieces (e.g. around entities like &amp;), so append, don't overwrite.
  function xmlCharacterData($parser, $data){
    $this->cdata .= $data;
  }

  // Dynamically points $this->pointer at the current tag's slot in $this->data.
  // Note that the tag names end up inside the eval()'d code.
  function setPointer(){
    $path = '';
    foreach($this->parent_tags as $depth => $tag_name){
      $path .= '[\''.$tag_name.'\']['.$this->tag_counts[$depth][$tag_name].']';
    }
    $eval_string  = 'if(!isset($this->data'.$path.')){$this->data'.$path.' = [];}';
    $eval_string .= '$this->pointer = &$this->data'.$path.';';
    eval($eval_string);
  }

  // Returns the per-depth tag index counters as they stood when parsing finished.
  function getTagCounts(){
    return $this->tag_counts;
  }

  // Returns the parsed data
  function getData(){
    return $this->data;
  }

}

(This post was edited 9 years ago on Sunday, January 17th, 2016 at 10:29 pm)

73's, KD8FUD


# 11554 9 years ago on Sun, Jan 17 2016 at 12:38 am

Hmmm. I'm not a PHP guy, but PHP is similar enough to C that I can follow what you're doing here. It's an interesting use of what I'm assuming are PHP's built-in XML functions rather than functions you've written elsewhere.

Using eval() is generally considered dangerous, but as long as you're not using it on user-supplied strings, it should be fine. If, however, you were ingesting XML from an untrusted source, I'd be awfully wary, since you're passing the XML tag names into that eval() call.
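One way to sidestep the eval() concern entirely is to walk PHP references instead of building a string of code. The sketch below uses a hypothetical helper, `pointerFor()`, that takes the same `$parent_tags` and `$tag_counts` layout the class keeps; tag names only ever become array keys, so nothing from the XML is executed.

```php
<?php
// Hypothetical helper (not part of the original class): returns a reference
// to $data[tag1][i1][tag2][i2]..., creating missing levels on the way, by
// walking PHP references instead of building code for eval().
function &pointerFor(array &$data, array $parentTags, array $tagCounts)
{
    $node = &$data;
    foreach ($parentTags as $depth => $tagName) {
        $i = $tagCounts[$depth][$tagName];
        if (!isset($node[$tagName][$i])) {
            $node[$tagName][$i] = [];
        }
        // Step one level deeper; $node keeps pointing into $data
        $node = &$node[$tagName][$i];
    }
    return $node;
}

// Same shapes the class keeps in $parent_tags and $tag_counts
$data = [];
$ptr = &pointerFor($data, [1 => 'feed', 2 => 'item'], [1 => ['feed' => 0], 2 => ['item' => 1]]);
$ptr['contents'] = 'Second';
// $data is now ['feed' => [0 => ['item' => [1 => ['contents' => 'Second']]]]]
```

Inside the class, setPointer() could then presumably shrink to `$this->pointer = &pointerFor($this->data, $this->parent_tags, $this->tag_counts);`.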

Without analyzing it too deeply, I think setPointer() might cause your code to skip certain tags. You're probably better off using that built-in PHP function and then massaging/arranging the data however you like.
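A rough sketch of that approach, assuming the nested `$data[tag][index]` layout from the class above is the shape you're after; `structToTree()` is a hypothetical name. Sibling indexes come from how many same-named children the parent already holds, so each parent numbers its children from 0 without gaps. Two simplifications: text between child elements (the "cdata" entries) is ignored, and tag names keep their original case rather than being lowercased.

```php
<?php
// Sketch: rebuild xml_parse_into_struct()'s flat list into nested
// $data[tag][index] arrays, using a stack of references to the open nodes.
function structToTree(string $xml): array
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    $values = [];
    if (!xml_parse_into_struct($parser, $xml, $values)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }

    $data = [];
    $stack = [&$data]; // references to the currently open nodes
    foreach ($values as $v) {
        $parent = &$stack[count($stack) - 1];
        if ($v['type'] === 'open' || $v['type'] === 'complete') {
            // Next index among this parent's same-named children
            $i = isset($parent[$v['tag']]) ? count($parent[$v['tag']]) : 0;
            $node = [];
            if (isset($v['attributes'])) {
                $node['attributes'] = $v['attributes'];
            }
            if (isset($v['value']) && trim($v['value']) !== '') {
                $node['contents'] = trim($v['value']);
            }
            $parent[$v['tag']][$i] = $node;
            if ($v['type'] === 'open') {
                // Children of this element go into the node just created
                $stack[] = &$parent[$v['tag']][$i];
            }
        } elseif ($v['type'] === 'close') {
            array_pop($stack);
        }
        unset($parent); // break the reference before the next iteration
    }
    return $data;
}
```

For example, `structToTree('<root><item><name>a</name></item><item><name>b</name></item></root>')` puts "b" at `['root'][0]['item'][1]['name'][0]['contents']`.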

One more thing: if you're parsing a really large XML file, building the whole thing into an in-memory array like this is going to run into memory issues, no matter what language you code in.
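If memory is the concern, the usual fix is not to build the tree at all: feed the parser the file in chunks with `xml_parse($parser, $chunk, false)` and act on each element as it closes. A minimal sketch, assuming the records of interest are `<item>` elements (a placeholder name) and that you only need their text; `streamItems()` is a hypothetical helper.

```php
<?php
// Sketch: stream a large XML file through the event-based parser in
// fixed-size chunks and hand each <item>'s text to $onItem as soon as the
// element closes. "item" is a placeholder element name.
function streamItems(string $path, callable $onItem, int $chunkSize = 8192): void
{
    $parser = xml_parser_create();
    // Keep element names as written (the default folds them to upper case)
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    $inItem = false;
    $text = '';

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$inItem, &$text) {
            if ($name === 'item') {
                $inItem = true;
                $text = '';
            }
        },
        function ($p, $name) use (&$inItem, &$text, $onItem) {
            if ($name === 'item') {
                $inItem = false;
                $onItem(trim($text));
            }
        }
    );
    // Text may arrive in several pieces, so accumulate it
    xml_set_character_data_handler($parser, function ($p, $data) use (&$inItem, &$text) {
        if ($inItem) {
            $text .= $data;
        }
    });

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (!feof($fh)) {
            $chunk = fread($fh, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Read error on $path");
            }
            // $is_final = false: more input follows
            if (!xml_parse($parser, $chunk, false)) {
                throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
            }
        }
        // Final call tells the parser the input is complete, so truncation is reported
        if (!xml_parse($parser, '', true)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    } finally {
        fclose($fh);
    }
}
```

Memory use then stays roughly proportional to the chunk size plus one item's text, rather than to the size of the whole file.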

"Dangerous toys are fun, but you could get hurt!"

