
kme : html   316

text processing - how to massage or format html in order to parse with xmstarlet? - Unix & Linux Stack Exchange
Pretty key when the input is HTML but not XHTML:
<code class="language-bash">xmlstarlet fo -H -R </code>
xmlstarlet  malformed  html  webdevel  textprocessing  commandline  cli  solution 
11 days ago by kme
text processing - How to parse hundred html source code files in shell? - Unix & Linux Stack Exchange
Reference: https://www.w3.org/Tools/HTML-XML-utils/
<code class="language-bash">hxselect '#the_div_id' <file</code>

<code class="language-bash">pup '#the_div_id' < file.html</code>
webdevel  cssselectors  commandline  cli  html  parser  fuckina  alternativeto  xmlstarlet 
11 days ago by kme
Index of /Tools/HTML-XML-utils
Via https://stackoverflow.com/questions/22021494/how-to-xmlstarlet-to-extract-html-data-by-id

<code>cexport (1) - create headerfile of exported declarations from a C file
hxaddid (1) - add ID's to selected elements
hxcite (1) - replace bibliographic references by hyperlinks
hxcite-mkbib (1) - expand references and create bibliography
hxcopy (1) - copy an HTML file while preserving relative links
hxcount (1) - count elements and attributes in HTML or XML files
hxextract (1) - extract selected elements
hxclean (1) - apply heuristics to correct an HTML file
hxprune (1) - remove marked elements from an HTML file
hxincl (1) - expand included HTML or XML files
hxindex (1) - create an alphabetically sorted index
hxmkbib (1) - create bibliography from a template
hxmultitoc (1) - create a table of contents for a set of HTML files
hxname2id - move some ID= or NAME= from A elements to their parents
hxnormalize (1) - pretty-print an HTML file
hxnum (1) - number section headings in an HTML file
hxpipe (1) - convert XML to a format easier to parse with Perl or AWK
hxprintlinks (1) - number links & add table of URLs at end of an HTML file
hxremove (1) - remove selected elements from an XML file
hxtabletrans (1) - transpose an HTML or XHTML table
hxtoc (1) - insert a table of contents in an HTML file
hxuncdata (1) - replace CDATA sections by character entities
hxunent (1) - replace HTML predefined character entities to UTF-8
hxunpipe (1) - convert output of pipe back to XML format
hxunxmlns (1) - replace "global names" by XML Namespace prefixes
hxwls (1) - list links in an HTML file
hxxmlns (1) - replace XML Namespace prefixes by "global names"
asc2xml, xml2asc (1) - convert between UTF8 and &#nnn; entities
hxref (1) - generate cross-references
hxselect (1) - extract elements that match a (CSS) selector
</code>
webdevel  cssselectors  commandline  cli  html  parser  alternativeto  xmlstarlet 
11 days ago by kme
xml - how to? xmlstarlet to extract HTML data by id - Stack Overflow
Essential tip for namespaced HTML, otherwise you get... NOTHING out of 'xmlstarlet'

Just passing the HTML through 'xmlstarlet fo -H -R' (installed as 'xml' on some systems; -H parses the input as HTML, -R recovers as much as possible) is enough to get un-namespaced HTML that is also valid XML (source: https://unix.stackexchange.com/a/382928/278323).

The html data has a default namespace that you have to declare in the xmlstarlet command:
<code class="language-bash">
xmlstarlet sel \
-N n="http://www.w3.org/1999/xhtml" \
-t \
-c "/n:html/n:body/n:table[@id='test_table']/descendant::*/text()" \
htmlfile 2>/dev/null
</code>

UPDATE: I didn't know it, but as the error message says, there is no need to declare the namespace when it is the default one; the '_' prefix refers to it, so this also works:
<code class="language-bash">
xmlstarlet sel \
-t \
-c "/_:html/_:body/_:table[@id='test_table']/descendant::*/text()" \
htmlfile 2>/dev/null
</code>
xml  xmlstarlet  textprocessing  malformed  html  reference  namespaced  xhtml  solution  fuckina 
11 days ago by kme
shell - Unable to locate and replace an element with its classname using bash script? - Stack Overflow
The actual solution is to clean it up into valid XHTML with tidy.

I used
<code class="language-bash">tidy -f /dev/null -w 0 -n -q -asxhtml</code>
in a pipe to suppress all the extraneous warnings, and get XML that something like XMLStarlet could handle.
html  xhtml  xml  tidy  importexport  datamunging  solution  dammitbrain  fuckina 
september 2019 by kme
ndmitchell/tagsoup: Haskell library for parsing and extracting information from (possibly malformed) HTML/XML documents
haskell  html  parser  tagsoup  malformedhtml  library  webdevel 
may 2019 by kme
benibela/xidel: A command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern templates. It can also create new or transformed XML/HTML/JSON documents.
This tool seems to be able to deal with malformed HTML that 'xmllint' and 'xmlstarlet' choke on (even after a pass through 'tidy').
xml  html  webscraping  webdevel  api  testing  alternativeto  xmllint  xmlstarlet 
may 2019 by kme
'target' : Blank or New? · Andrew Chilton
Note that when using target="_blank" to point to an untrusted website, you should also add rel="noopener". This ensures that the site being opened won’t have access to the window.opener property, and hence cannot use its JavaScript to find out information about your site.
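A minimal sketch of that rule as a function, in case it helps when generating links programmatically. The name `safeRel` is hypothetical (it is not from the linked post), and it only handles the plain `_blank` keyword:

```javascript
// Hypothetical helper: given a link's `target` and existing `rel` values,
// return the `rel` value the link should carry.
function safeRel(target, rel) {
  // Split the existing rel value into its space-separated tokens.
  const tokens = (rel || "").split(/\s+/).filter(Boolean);
  // The _blank keyword is matched case-insensitively.
  const opensNewContext = (target || "").toLowerCase() === "_blank";
  if (opensNewContext && !tokens.includes("noopener")) {
    tokens.push("noopener");
  }
  return tokens.join(" ");
}

console.log(safeRel("_blank", ""));         // noopener
console.log(safeRel("_blank", "nofollow")); // nofollow noopener
console.log(safeRel("_self", "nofollow"));  // nofollow
```

Existing tokens are preserved and `noopener` is only appended once, so the function can be applied repeatedly to the same link.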
webdevel  html  links  bestpractice 
march 2019 by kme
necolas/normalize.css: A modern alternative to CSS resets | https://github.com/
A modern alternative to CSS resets. Contribute to necolas/normalize.css development by creating an account on GitHub.
html  css  webdesign  cssreset  boilerplate 
february 2019 by kme
Indeterminate Checkboxes | CSS-Tricks | https://css-tricks.com/
<code class="language-html"><!-- Inline click handler, just for demo -->
<input type="checkbox" id="cb1" onclick="ts(this)"></code>

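<code class="language-javascript">
// Cycle on each click: unchecked -> checked -> indeterminate -> unchecked.
// readOnly doubles as a flag meaning "currently indeterminate".
function ts(cb) {
  if (cb.readOnly) cb.checked = cb.readOnly = false;
  else if (!cb.checked) cb.readOnly = cb.indeterminate = true;
}</code>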
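To see the order of states this handler produces, here is a rough simulation that runs outside a browser. The plain object and `simulateClick` are stand-ins I am assuming for a real checkbox: on a click, the browser flips `checked` and clears `indeterminate` before the handler runs.

```javascript
// Same handler as in the bookmark above.
function ts(cb) {
  if (cb.readOnly) cb.checked = cb.readOnly = false;
  else if (!cb.checked) cb.readOnly = cb.indeterminate = true;
}

// Assumed stand-in for a browser click: toggle `checked`, clear
// `indeterminate`, then call the click handler.
function simulateClick(cb) {
  cb.checked = !cb.checked;
  cb.indeterminate = false;
  ts(cb);
}

// Human-readable name for the current state.
function label(cb) {
  if (cb.indeterminate) return "indeterminate";
  return cb.checked ? "checked" : "unchecked";
}

const cb = { checked: false, indeterminate: false, readOnly: false };
const seen = [];
for (let i = 0; i < 4; i++) {
  simulateClick(cb);
  seen.push(label(cb));
}
console.log(seen.join(" -> "));
// checked -> indeterminate -> unchecked -> checked
```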
webdesign  webdevel  html  jquery  css  javascript  checkbox  tristate  tipsandtricks 
may 2018 by kme
Extension:LinkTarget - MediaWiki | https://www.mediawiki.org/
Allows specifying external link targets with a specific class on the parent div.
mediawiki  linktarget  link  html  extension  essential  movein 
march 2018 by kme
xml_grep2 - search.cpan.org
This is called "App::Xml_grep2" on CPAN.
-t, --text-only

Return the result as text (using the XPath value of nodes). Results are stripped of newlines and output 1 per line.

Results are in the original encoding for the document.
perl  xml  grep  xpath  html  webdevel  textprocessing 
december 2017 by kme
GitHub - mganss/HtmlSanitizer: Cleans HTML to avoid XSS attacks
Cleans HTML to avoid XSS attacks. Contribute to mganss/HtmlSanitizer development by creating an account on GitHub.
dotnet  html  sanitizer  library  webdevel  security 
november 2017 by kme
javascript - SYNTAX_ERR: DOM Exception 12 - Hmmm - Stack Overflow | https://stackoverflow.com/
You are using illegal id attributes (illegal before HTML5) inside the document, e.g. 2-slide. Fix them.

To explain: to work around the known misbehaviour of element.querySelectorAll(), the selector .slide is internally rewritten (using the id of the element). This results in something like:

#2-slide .moreselectors

...which triggers the error, because an ID selector may not start with a digit.
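If renaming the ids is not an option, the leading digit can be escaped in the selector instead. The sketch below uses a hypothetical helper, `idSelector`, that handles only that one case and assumes the rest of the id contains nothing else needing escaping; in a browser, the built-in `CSS.escape()` covers the general case.

```javascript
// Hypothetical helper: build an ID selector, escaping a leading digit.
// Not a full replacement for CSS.escape().
function idSelector(id) {
  const first = id.charAt(0);
  if (first >= "0" && first <= "9") {
    // "\3N " is the CSS hex escape for the digit N (U+0030 to U+0039);
    // the trailing space terminates the escape sequence.
    return "#\\3" + first + " " + id.slice(1);
  }
  return "#" + id;
}

console.log(idSelector("2-slide")); // #\32 -slide
console.log(idSelector("slide-2")); // #slide-2
```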
javascript  queryselector  html  solution 
october 2017 by kme