How to convert HTML to JSON using PHP?
I can convert JSON to HTML using the jsontohtml library. Now I need to convert the present HTML to JSON, as shown on the site. When I looked at the code, I found the following script:
<script>
$(function () {
    // HTML to JSON
    $('#btn-render-json').click(function () {
        // set the HTML output
        $('#html-output').html($('#html-input').val());
        // process it to JSON and format it for consumption
        $('#html-json').html(formatJSON(toTransform($('#html-output').children())));
    });
});

// convert an obj or array to a transform
function toTransform(obj) {
    var json;
    if (obj.length > 1) {
        json = [];
        for (var i = 0; i < obj.length; i++)
            json.push(objToTransform(obj[i]));
    } else
        json = objToTransform(obj);
    return json;
}

// convert an obj to a transform
function objToTransform(obj) {
    // get the DOM element
    var el = $(obj).get(0);
    // add the tag element
    var json = { 'tag': el.nodeName.toLowerCase() };
    for (var attr, i = 0, attrs = el.attributes, l = attrs.length; i < l; i++) {
        attr = attrs[i];
        json[attr.nodeName] = attr.value;
    }
    var children = $(obj).children();
    if (children.length > 0) json['children'] = [];
    else json['html'] = $(obj).text();
    // add the children
    for (var c = 0; c < children.length; c++)
        json['children'].push(toTransform(children[c]));
    return json;
}

// format JSON (with indents)
function formatJSON(oData, sIndent) {
    if (arguments.length < 2) {
        sIndent = "";
    }
    var sIndentStyle = " ";
    var sDataType = realTypeOf(oData);
    var sHTML;
    // open the object
    if (sDataType == "array") {
        if (oData.length == 0) {
            return "[]";
        }
        sHTML = "[";
    } else {
        var iCount = 0;
        $.each(oData, function () {
            iCount++;
            return;
        });
        if (iCount == 0) { // the object is empty
            return "{}";
        }
        sHTML = "{";
    }
    // loop through the items
    var iCount = 0;
    $.each(oData, function (sKey, vValue) {
        if (iCount > 0) {
            sHTML += ",";
        }
        if (sDataType == "array") {
            sHTML += ("\n" + sIndent + sIndentStyle);
        } else {
            sHTML += ("\"" + sKey + "\"" + ":");
        }
        // display the relevant data type
        switch (realTypeOf(vValue)) {
            case "array":
            case "object":
                sHTML += formatJSON(vValue, (sIndent + sIndentStyle));
                break;
            case "boolean":
            case "number":
                sHTML += vValue.toString();
                break;
            case "null":
                sHTML += "null";
                break;
            case "string":
                sHTML += ("\"" + vValue + "\"");
                break;
            default:
                sHTML += ("typeof: " + typeof(vValue));
        }
        // next item
        iCount++;
    });
    // close the object
    if (sDataType == "array") {
        sHTML += ("\n" + sIndent + "]");
    } else {
        sHTML += ("}");
    }
    return sHTML;
}

// get the type of the obj (can replace jQuery.type)
function realTypeOf(v) {
    if (typeof(v) == "object") {
        if (v === null) return "null";
        if (v.constructor == (new Array).constructor) return "array";
        if (v.constructor == (new Date).constructor) return "date";
        if (v.constructor == (new RegExp).constructor) return "regex";
        return "object";
    }
    return typeof(v);
}
</script>
Now I need to use this function in PHP, where I have the HTML data. I need to convert the JavaScript function into a PHP function. Is that possible? My major doubts are as follows:
1. The primary input to the JavaScript function toTransform() is an object. Is it possible to convert HTML into an object via PHP?
2. Are the functions used in this particular JavaScript available in PHP?
Please suggest an approach.
When I tried to convert a <script> tag to JSON as per the answer given, I got errors. When I tried it on the json2html site, it showed this: .. How can I achieve the same result?
If you are able to obtain a DOMDocument object representing your HTML, you just need to traverse it recursively and construct the data structure you want.
Converting your HTML document into a DOMDocument should be as simple as this:
function html_to_obj($html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    return element_to_obj($dom->documentElement);
}
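loadHTML() reports markup that libxml2's HTML 4 parser does not recognize (for example HTML5 elements such as <nav>) as PHP warnings, which end up mixed into the JSON output. A minimal sketch, assuming you would rather buffer those messages than print them; the helper name load_html_quietly is hypothetical, while libxml_use_internal_errors() and libxml_clear_errors() are standard PHP functions:

```php
<?php
// Hypothetical helper: parse HTML into a DOMDocument without letting
// libxml2 warnings (e.g. about unknown HTML5 tags) leak into the output.
function load_html_quietly(string $html): DOMDocument
{
    // Buffer libxml errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // libxml_get_errors() would return the buffered messages here;
    // this sketch simply discards them.
    libxml_clear_errors();

    // Restore whatever error-handling mode was active before.
    libxml_use_internal_errors($previous);

    return $dom;
}
```

The returned document can then be handed to element_to_obj() exactly like the one built in html_to_obj().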
Then, a simple traversal of $dom->documentElement that gives the kind of structure you described could look like this:
function element_to_obj($element) {
    $obj = array( "tag" => $element->tagName );
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }
    foreach ($element->childNodes as $subElement) {
        if ($subElement->nodeType == XML_TEXT_NODE) {
            $obj["html"] = $subElement->wholeText;
        }
        else {
            $obj["children"][] = element_to_obj($subElement);
        }
    }
    return $obj;
}
Test case
$html = <<<EOF
<!DOCTYPE html>
<html lang="en">
    <head>
        <title> test </title>
    </head>
    <body>
        <h1> working? </h1>
        <ul>
            <li> yes </li>
            <li> no </li>
        </ul>
    </body>
</html>
EOF;

header("Content-Type: text/plain");
echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);
Output
{
    "tag": "html",
    "lang": "en",
    "children": [
        {
            "tag": "head",
            "children": [
                {
                    "tag": "title",
                    "html": " test "
                }
            ]
        },
        {
            "tag": "body",
            "html": " \n ",
            "children": [
                {
                    "tag": "h1",
                    "html": " working? "
                },
                {
                    "tag": "ul",
                    "children": [
                        {
                            "tag": "li",
                            "html": " yes "
                        },
                        {
                            "tag": "li",
                            "html": " no "
                        }
                    ],
                    "html": "\n "
                }
            ]
        }
    ]
}
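In element_to_obj() every text node overwrites $obj["html"], so only the last text node of an element survives; that is why "body" and "ul" end up with whitespace-only values in the output. If you need all of an element's text, one possible variant (a sketch, not the answer's method; the name element_to_obj_concat is hypothetical) concatenates every text/CDATA child, ignores other non-element nodes, and drops whitespace-only text:

```php
<?php
// Hypothetical variant of element_to_obj(): concatenate every text/CDATA
// child instead of keeping only the last one, and skip whitespace-only text.
function element_to_obj_concat(DOMElement $element): array
{
    $obj = ["tag" => $element->tagName];
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }
    $text = "";
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMText) {
            // DOMCdataSection (e.g. <script> content) extends DOMText,
            // so it is collected here too.
            $text .= $child->data;
        } elseif ($child instanceof DOMElement) {
            $obj["children"][] = element_to_obj_concat($child);
        }
        // Comments and other node types are ignored in this sketch.
    }
    if (trim($text) !== "") {
        $obj["html"] = $text;
    }
    return $obj;
}
```

Note the trade-off: an element with no text at all no longer gets an "html" key, which differs from the original output format.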
Answer to the updated question
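The solution proposed above does not work for the <script> element, because its content is not parsed as a plain text node (XML_TEXT_NODE) but as a CDATA section node (XML_CDATA_SECTION_NODE, a DOMCharacterData subclass). This is because the DOM extension in PHP is based on libxml2, which parses your HTML as HTML 4.0, and in HTML 4.0 the content of <script> is of type CDATA, not #PCDATA.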
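To check how your own PHP/libxml2 build represents <script> content, a small diagnostic can list the node types of the script's children. This is a sketch; the helper name script_child_types is hypothetical, and the exact node type may vary with the libxml2 version, which is why the test below accepts either text or CDATA:

```php
<?php
// Hypothetical diagnostic helper: return the nodeType of every child of the
// first <script> element, to see how libxml2 represented its content.
function script_child_types(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $script = $dom->getElementsByTagName('script')->item(0);

    $types = [];
    foreach ($script->childNodes as $child) {
        // XML_TEXT_NODE is 3, XML_CDATA_SECTION_NODE is 4.
        $types[] = $child->nodeType;
    }
    return $types;
}

print_r(script_child_types("<script>alert('hi');</script>"));
```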
You have two solutions to this problem.
The simple but not robust solution is to add the LIBXML_NOCDATA flag to DOMDocument::loadHTML. (I am not 100% sure whether this works with the HTML parser.)

The more difficult but, in my opinion, better solution is to add an additional test when checking $subElement->nodeType before the recursion. The recursive function would become:
function element_to_obj($element) {
    $obj = array( "tag" => $element->tagName );
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }
    foreach ($element->childNodes as $subElement) {
        if ($subElement->nodeType == XML_TEXT_NODE) {
            $obj["html"] = $subElement->wholeText;
        }
        elseif ($subElement->nodeType == XML_CDATA_SECTION_NODE) {
            // <script> content is parsed as a CDATA section
            $obj["html"] = $subElement->data;
        }
        else {
            $obj["children"][] = element_to_obj($subElement);
        }
    }
    return $obj;
}
If you hit a bug of this kind, the first thing to check is the type of the node $subElement, because there are many other possibilities that my short example function does not deal with.
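For example, an HTML comment would fall into the final else branch and be recursed into as if it were an element, although a DOMComment has no tagName. A sketch of a more defensive version that dispatches on every node type explicitly (the name element_to_obj_safe is hypothetical; skipping comments and processing instructions is a design choice of this sketch):

```php
<?php
// Sketch of element_to_obj() that switches on the node type instead of
// assuming "not text => element".
function element_to_obj_safe(DOMElement $element): array
{
    $obj = ["tag" => $element->tagName];
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }
    foreach ($element->childNodes as $subElement) {
        switch ($subElement->nodeType) {
            case XML_TEXT_NODE:
            case XML_CDATA_SECTION_NODE:
                // Plain text and CDATA (e.g. <script> content).
                $obj["html"] = $subElement->data;
                break;
            case XML_ELEMENT_NODE:
                $obj["children"][] = element_to_obj_safe($subElement);
                break;
            default:
                // Comments, processing instructions, etc. are skipped.
                break;
        }
    }
    return $obj;
}
```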
Additionally, notice that libxml2 has to fix mistakes in your HTML in order to build a DOM from it. This is why the <html> and <head> elements appear even if you don't specify them. You can avoid this by using the LIBXML_HTML_NOIMPLIED flag.
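A minimal sketch of passing that flag, combined with LIBXML_HTML_NODEFDTD so that no default doctype is added either; both constants exist in PHP 5.4+ built against a recent enough libxml2. The helper name load_fragment is hypothetical:

```php
<?php
// Hypothetical helper: parse an HTML fragment without the implied
// <html>/<head>/<body> wrapper and without a default doctype.
function load_fragment(string $html): DOMDocument
{
    $dom = new DOMDocument();
    // LIBXML_HTML_NOIMPLIED: don't add implied <html>/<body> elements.
    // LIBXML_HTML_NODEFDTD: don't add a default doctype when none is present.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    return $dom;
}

$dom = load_fragment('<div id="root"><span>hi</span></div>');
echo $dom->documentElement->tagName, "\n";
```

With such a document, the traversal starts at the fragment's own root element instead of <html>. This works most predictably when the fragment has a single root element; with several top-level siblings, libxml2 may nest them unexpectedly.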
Test case with a script
$html = <<<EOF
    <script type="text/javascript">
        alert('hi');
    </script>
EOF;

header("Content-Type: text/plain");
echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);
Output
{
    "tag": "html",
    "children": [
        {
            "tag": "head",
            "children": [
                {
                    "tag": "script",
                    "type": "text\/javascript",
                    "html": "\n alert('hi');\n "
                }
            ]
        }
    ]
}
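The "text\/javascript" in this output is not corruption: json_encode() escapes forward slashes by default, and both forms decode to the same string. If you prefer the literal form, the standard JSON_UNESCAPED_SLASHES flag (PHP 5.4+) turns the escaping off, as this small sketch shows:

```php
<?php
// json_encode() escapes "/" as "\/" by default; JSON_UNESCAPED_SLASHES
// turns that off. Both encodings decode to the same PHP value.
$obj = ["tag" => "script", "type" => "text/javascript"];

$escaped   = json_encode($obj);
$unescaped = json_encode($obj, JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT);

echo $escaped, "\n";
echo $unescaped, "\n";
```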