javascript - How to convert HTML to JSON using PHP? -


I can convert JSON to HTML using the jsontohtml library. Now I need to convert existing HTML to JSON, as shown on the site. When I looked into the code, I found the following script:

    <script>
    $(function(){

        // HTML to JSON
        $('#btn-render-json').click(function() {

            // set the HTML output
            $('#html-output').html( $('#html-input').val() );

            // process to JSON and format it for consumption
            $('#html-json').html( formatjson(totransform($('#html-output').children())) );
        });

    });

    // convert an obj or array to a transform
    function totransform(obj) {

        var json;

        if( obj.length > 1 )
        {
            json = [];

            for(var i = 0; i < obj.length; i++)
                json[json.length++] = objtotransform(obj[i]);
        } else
            json = objtotransform(obj);

        return(json);
    }

    // convert an obj to a transform
    function objtotransform(obj) {
        // get the DOM element
        var el = $(obj).get(0);

        // add the tag element
        var json = {'tag':el.nodeName.toLowerCase()};

        for (var attr, i=0, attrs=el.attributes, l=attrs.length; i<l; i++){
            attr = attrs[i];
            json[attr.nodeName] = attr.value;
        }

        var children = $(obj).children();

        if( children.length > 0 ) json['children'] = [];
        else json['html'] = $(obj).text();

        // add the children
        for(var c = 0; c < children.length; c++)
            json['children'][json['children'].length++] = totransform(children[c]);

        return(json);
    }

    // format JSON (with indents)
    function formatjson(odata, sindent) {
        if (arguments.length < 2) {
            var sindent = "";
        }
        var sindentstyle = "  ";
        var sdatatype = realtypeof(odata);

        // open object
        if (sdatatype == "array") {
            if (odata.length == 0) {
                return "[]";
            }
            var shtml = "[";
        } else {
            var icount = 0;
            $.each(odata, function() {
                icount++;
                return;
            });
            if (icount == 0) { // object is empty
                return "{}";
            }
            var shtml = "{";
        }

        // loop through items
        var icount = 0;
        $.each(odata, function(skey, vvalue) {
            if (icount > 0) {
                shtml += ",";
            }
            if (sdatatype == "array") {
                shtml += ("\n" + sindent + sindentstyle);
            } else {
                shtml += ("\"" + skey + "\"" + ":");
            }

            // display relevant data type
            switch (realtypeof(vvalue)) {
                case "array":
                case "object":
                    shtml += formatjson(vvalue, (sindent + sindentstyle));
                    break;
                case "boolean":
                case "number":
                    shtml += vvalue.toString();
                    break;
                case "null":
                    shtml += "null";
                    break;
                case "string":
                    shtml += ("\"" + vvalue + "\"");
                    break;
                default:
                    shtml += ("typeof: " + typeof(vvalue));
            }

            // loop
            icount++;
        });

        // close object
        if (sdatatype == "array") {
            shtml += ("\n" + sindent + "]");
        } else {
            shtml += ("}");
        }

        // return
        return shtml;
    }

    // get the type of the obj (can be replaced by the jQuery type function)
    function realtypeof(v) {
      if (typeof(v) == "object") {
        if (v === null) return "null";
        if (v.constructor == (new Array).constructor) return "array";
        if (v.constructor == (new Date).constructor) return "date";
        if (v.constructor == (new RegExp).constructor) return "regex";
        return "object";
      }
      return typeof(v);
    }
    </script>


Now I need to use the above functions in PHP. I can get the HTML data; what I need is to convert the JavaScript functions into PHP functions. Is this possible? My major doubts are as follows:

  • The primary input of the JavaScript function totransform() is an object. Is it possible to convert HTML to an object via PHP?

  • Are all the functions used in this particular JavaScript available in PHP?

Please suggest an idea.

When I tried to convert a script tag to JSON as per the answer given, I got errors. When I tried the same input on the json2html site, it produced output (shown in a screenshot that is not reproduced here). How can I achieve the same result?

If you are able to obtain a DOMDocument object representing your HTML, then you just need to traverse it recursively and construct the data structure that you want.

Converting your HTML document into a DOMDocument should be as simple as this:

    function html_to_obj($html) {
        $dom = new DOMDocument();
        $dom->loadHTML($html);
        return element_to_obj($dom->documentElement);
    }

Then, a simple traversal of $dom->documentElement that gives the kind of structure you described could look like this:

    function element_to_obj($element) {
        // start with the tag name, then copy every attribute
        $obj = array( "tag" => $element->tagName );
        foreach ($element->attributes as $attribute) {
            $obj[$attribute->name] = $attribute->value;
        }
        // text nodes become "html"; anything else is treated as a child element
        foreach ($element->childNodes as $subelement) {
            if ($subelement->nodeType == XML_TEXT_NODE) {
                $obj["html"] = $subelement->wholeText;
            }
            else {
                $obj["children"][] = element_to_obj($subelement);
            }
        }
        return $obj;
    }

test case

    $html = <<<EOF
    <!DOCTYPE html>
    <html lang="en">
        <head>
            <title> test </title>
        </head>
        <body>
            <h1> working? </h1>  
            <ul>
                <li> yes </li>
                <li> no </li>
            </ul>
        </body>
    </html>

    EOF;

    header("Content-Type: text/plain");
    echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);

output

    {
        "tag": "html",
        "lang": "en",
        "children": [
            {
                "tag": "head",
                "children": [
                    {
                        "tag": "title",
                        "html": " test "
                    }
                ]
            },
            {
                "tag": "body",
                "html": "  \n        ",
                "children": [
                    {
                        "tag": "h1",
                        "html": " working? "
                    },
                    {
                        "tag": "ul",
                        "children": [
                            {
                                "tag": "li",
                                "html": " yes "
                            },
                            {
                                "tag": "li",
                                "html": " no "
                            }
                        ],
                        "html": "\n        "
                    }
                ]
            }
        ]
    }

Answer to updated question

The solution proposed above does not work with the <script> element, because its content is parsed not as a DOMText, but as a DOMCharacterData object. This is because the DOM extension in PHP is based on libxml2, which parses your HTML as HTML 4.0, and in HTML 4.0 the content of <script> is of type CDATA and not #PCDATA.

You have two solutions for this problem.

  1. The simple but not very robust solution would be to add the LIBXML_NOCDATA flag to DOMDocument::loadHTML. (I am actually not 100% sure whether this works for the HTML parser.)

  2. The more difficult but, in my opinion, better solution is to add an additional test when you are testing $subelement->nodeType before the recursion. The recursive function would become:

    function element_to_obj($element) {
        $obj = array( "tag" => $element->tagName );
        foreach ($element->attributes as $attribute) {
            $obj[$attribute->name] = $attribute->value;
        }
        foreach ($element->childNodes as $subelement) {
            if ($subelement->nodeType == XML_TEXT_NODE) {
                $obj["html"] = $subelement->wholeText;
            }
            // <script> content arrives as a CDATA section, not as a text node
            elseif ($subelement->nodeType == XML_CDATA_SECTION_NODE) {
                $obj["html"] = $subelement->data;
            }
            else {
                $obj["children"][] = element_to_obj($subelement);
            }
        }
        return $obj;
    }

If you hit another bug of this type, the first thing you should do is check what type of node $subelement is, because there are many other possibilities that my short example function does not deal with.
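One way to do that check is to print the type of every child node before deciding how to handle it. The snippet below is only a sketch: node_type_name() is a hypothetical helper that is not part of the answer's code, and the HTML fragment is made up for illustration. It names the node types most likely to show up, including comments, which the short function would otherwise send into the element branch.

```php
<?php
// Hypothetical helper (not from the answer): map a node to a readable type name.
function node_type_name(DOMNode $node): string
{
    switch ($node->nodeType) {
        case XML_ELEMENT_NODE:       return 'element';
        case XML_TEXT_NODE:          return 'text';
        case XML_CDATA_SECTION_NODE: return 'cdata';
        case XML_COMMENT_NODE:       return 'comment';
        default:                     return 'other (' . $node->nodeType . ')';
    }
}

// Made-up fragment mixing a text node, a comment and an element.
$dom = new DOMDocument();
$dom->loadHTML('<div>text<!-- note --><b>bold</b></div>');

$div = $dom->getElementsByTagName('div')->item(0);

// Collect and print the type of each direct child of the <div>.
$names = array();
foreach ($div->childNodes as $child) {
    $names[] = node_type_name($child);
}
echo implode("\n", $names), "\n";
```

For this fragment the loop should report a text node, a comment and an element, in that order. Any type that the recursive function has no branch for is a candidate for a new elseif, or for being skipped.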

Additionally, you will notice that libxml2 has to fix mistakes in your HTML in order to be able to build a DOM for it. This is why <html> and <head> elements appear even if you don't specify them. You can avoid this by using the LIBXML_HTML_NOIMPLIED flag.
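As a rough sketch of that last point, the flag can be passed as the second argument of loadHTML. The fragment and variable names here are made up for illustration, and LIBXML_HTML_NODEFDTD is an extra flag that the answer does not mention; it is assumed here only to stop a default doctype from being added. Both flags require PHP 5.4 or later with a sufficiently recent libxml2, and the approach is only dependable when the fragment has a single root element.

```php
<?php
// Hypothetical fragment with one root element and no <html>/<body> wrapper.
$fragment = '<ul class="menu"><li>yes</li><li>no</li></ul>';

$dom = new DOMDocument();
// LIBXML_HTML_NOIMPLIED: do not add implied <html>/<body> elements.
// LIBXML_HTML_NODEFDTD: do not add a default doctype (an assumption beyond the answer).
$dom->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// The document element is now the fragment's own root element.
$root = $dom->documentElement;
echo $root->tagName, "\n";
echo $root->getAttribute('class'), "\n";
```

Passing $root to element_to_obj() would then give JSON that starts at the fragment's own root rather than at an added <html> element.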

Test case for script

    $html = <<<EOF
            <script type="text/javascript">
                alert('hi');
            </script>
    EOF;

    header("Content-Type: text/plain");
    echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);

output

    {
        "tag": "html",
        "children": [
            {
                "tag": "head",
                "children": [
                    {
                        "tag": "script",
                        "type": "text\/javascript",
                        "html": "\n            alert('hi');\n        "
                    }
                ]
            }
        ]
    }
