LibreOffice HTML tidy-up app
Steve Pampling (1551) 8155 posts |
Looks more like a lady’s reproductive system to me. Just showed the larger version (in the profile page) to the wife, and she independently came up with the same reproductive system comment. (Minor edit, rubbish word use) |
Rick Murray (539) 13806 posts |
Hey – dragons are cool. Three million Welsh and 1.4 billion Chinese can’t be wrong…
Oh, Elvira, dear, Halloween was last weekend. |
John WILLIAMS (8368) 493 posts |
Would that be “lady’s”? Or is it a composite? Well, I can’t possibly comment on that without referring to the recent American election. But I didn’t spot that one myself, despite being still active in that arena – same age as the “former” president! But not so (insert own adverbial pejorative term here!)! |
Rick Murray (539) 13806 posts |
This is amusing for all the wrong reasons:
TMI, John, T. M. I. |
Clive Semmens (2335) 3276 posts |
For Authentic Steve: http://clive.semmens.org.uk/RISCOS/XP1deBloat is ready for service. It goes through a file copying everything until it finds ‘ Or if you want a standard HTML file containing just the links, rather than a NetScape Bookmarks file, that’s a slightly bigger job but not hard.
This was a very quick and easy hack compared to most of my other apps. You could easily alter it to do completely different tricks. It’s pretty easy to edit the runfile yourself, as long as you don’t want to add options. All the actual work on the files is near the top of the program, and all the wimp stuff is out of the way down the bottom. If you don’t need to add options, there’s no need to mess with the wimp stuff at all. |
Clive Semmens (2335) 3276 posts |
Just to show what an easy little hack this is, here’s the guts of the program (the rest is bog standard WIMP stuff to drag and drop files on and off the iconbar icon):
|
Jeff Doggett (257) 234 posts |
|
Clive Semmens (2335) 3276 posts |
Cheers Jeff! Done it a different way already – and lost my indents… will try your way, see if it preserves them! Great. It does. |
Jeff Doggett (257) 234 posts |
Textile is hard, I still can’t work out how to show the < /code> tag without having an extra space to make it show! |
Chris Mahoney (1684) 2165 posts |
<code> works outside a <pre> element, but I don’t know how to do it inside one. |
Steve Pampling (1551) 8155 posts |
Part curiosity about how much the “dross” amounts to in a bookmarks file, part interest in seeing whether memory use reduces and whether things load faster etc.
I think that, theoretically, the notextile tag does that – assuming textile isn’t having one of its quirky days. " <code> works outside a < pre> element" Except when textile decides you’re wrong, because “y” |
GavinWraith (26) 1563 posts |
Sorry, cannot resist it:
Drag the above into a StrongED window with the bookmark stuff in it. |
Clive Semmens (2335) 3276 posts |
8~) Such is life. I suspect PERL would be good for this job, too. But better the language one knows than the language one half knows (PERL) or doesn’t know at all (Lua). I’m not a StrongED user, either. Whereas Zap and BASIC are very familiar. Years ago, I wrote an app in BASIC, with an assembler core for the heavy lifting, that could do several hundred overlapping search and replaces like this in parallel, to extract the content from a wide variety of different files from different word processors that were around in those days, and put them into Impression’s DDF (Document Description Format). This was for the Physiological Society’s journals – dozens of academic papers every month. My app processed papers to a reasonable state ready for the copy editors in a few seconds – versus overnight runs on a Mac for a similar process used by another publisher we knew publishing similar volumes of stuff. I’ve never bothered to update the assembler core to 32-bit it… I’m slowly getting more familiar with PERL, because it’s what I’ve got easy access to on the Mac. |
David J. Ruck (33) 1629 posts |
Perl – bah! Python if you want to retain a modicum of sanity. BTW: I’m going to have to change my icon now aren’t I? |
Chris Hall (132) 3554 posts |
“BTW: I’m going to have to change my icon now aren’t I?” Yes – at the size it is shown, it appears to be a crucifixion scene. |
Clive Semmens (2335) 3276 posts |
I’d have to acquire some before I could retain it. Given that I know BASIC very well, I’m not sure I can be bothered with Python on the Pi. PERL is good on the Mac for doing search and replaces across numerous files all over a directory tree, which is all I’ve wanted it for so far. Anything I want to do on a single file that’s more than Atom’s search and replace can do easily, I just flip over to the Pi and do it in BASIC (or Zap). Lua looks quite interesting, but again, I doubt I’ll bother. |
GavinWraith (26) 1563 posts |
If you are using the latest version of RiscLua the script can in fact be written in four lines: because the filehandle f , having been declared as a to-be-closed local variable, will be closed automatically when it goes out of scope. However, the puritanical may disapprove of such safety nets for the forgetful. That local variables can have attributes like this (const and close are the only ones available so far) is a new feature of Lua 5.4. I mention it for your entertainment.
|
Steve Drain (222) 1620 posts |
It’s not as slick as Lua but:
*BasaltInit
ARGS "source,output" TO source$,output$
(strand$)=LOAD$(source$)
regex$=" ICON=""\a*"""
SEARCH (strand$),regex$
WHILE GROUPS
(strand$)=REPLACE$("")
SEARCH (strand$),regex$
ENDWHILE
SAVE (strand$),output$
DELETE (strand$)
QUIT
;-) Edit: added DELETE |
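For comparison, the same ICON-stripping job might look roughly like this in Python. This is only a sketch: the function names are made up for illustration, and it assumes each icon is stored as an attribute of the form ICON="..." with no double quote inside the value, which is a guess at what the Basalt pattern above is matching.

```python
import re

# Assumption: icons appear as  ICON="..."  with no double quote in the value.
ICON_ATTR = re.compile(r' ICON="[^"]*"')


def strip_icons(text: str) -> str:
    """Return text with every ICON="..." attribute removed."""
    return ICON_ATTR.sub("", text)


def strip_icons_file(source: str, output: str) -> None:
    """Read the bookmarks file at source and write a de-bloated copy to output."""
    # errors="replace" keeps the sketch from failing on odd bytes in the file.
    with open(source, encoding="utf-8", errors="replace") as f:
        text = f.read()
    with open(output, "w", encoding="utf-8") as f:
        f.write(strip_icons(text))
```

Like the scripts above, this is not a drag-and-drop app; it only does the file processing.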
Clive Semmens (2335) 3276 posts |
Those suggesting alternative languages might perhaps note that this simple problem wasn’t what the thread was about. It was the more complex (but still not terribly difficult) problem of tidying up HTML files generated by LibreOffice. When LibreOffice saves HTML, it gives detailed formatting information that makes the output match what was in LibreOffice – whereas a web page probably doesn’t want most of that. It’s actually worse than that, in that LibreOffice has a tendency to leave matched pairs of tags lying around with nothing between them, and a few other oddities. Details in the help file in the app, and on the description page on my website. The other thing is that I expect to be processing numerous such files in the future, so being able to simply drag files onto the icon bar and drag the product wherever I want it is important. I’ve no idea whether this would be easy with other languages. You’re welcome to download the app and take a look inside. I’m quite interested to know what a complete drag and drop app like this would look like in Lua or other language – but I’ll probably continue using BASIC anyway, because I know it, and I can re-use large chunks of wimp handling code. |
Clive Semmens (2335) 3276 posts |
Also, for what it’s worth, BBC BASIC could be a lot more concise than what I wrote – probably not a lot different from Basalt – but I like the clarity of doing it the way I wrote it. No idea whether other people find it clearer, but that way of writing it makes it much easier for me to reread it in the future, for editing this app or re-using the code. |
GavinWraith (26) 1563 posts |
This script will do quite a lot of cleaning:
|
Steve Pampling (1551) 8155 posts |
So much to play with. Tired eyes though. Just done my first full day in the phased return. |
Clive Semmens (2335) 3276 posts |
Gavin: a script isn’t a drag-and-drop app; and you’ve only dealt with a handful of the cases, without any attention to whether any of the detail of the cases is worth preserving… Lua looks more concise than BASIC when it comes to a job like this, although how clear it would be for the more complex processing I do on LibreOffice files, I don’t know. I’ve no idea whether it’s quicker or slower, but since my app takes a few seconds to do multimegabyte files it’s not an issue I’m going to worry about.
But the main thing is, while I could doubtless learn to write scripts like that, it’d be a whole different ball game writing a drag-and-drop app to do it. BASIC isn’t beautiful to write them in, but I already know how to do it – for RISCOS. This is my main attachment to RISCOS. If I was going to learn to write drag-and-drop in a new language, it’d be for MacOS, not for RISCOS.
If your handling of You’ve done nothing to clean out empty pairs of tags, common in LibreOffice HTML if the work has been edited much, or tags that give useless information, like (for the nineteenth time) “this is in GB English” or “this is black text” or “this (body text) is Times New Roman” – which LibreOffice thinks is part of the content, but which you don’t want for a web page (or you shouldn’t, if you understand what HTML is supposed to be for) – and you certainly don’t want it repeated ad nauseam when there’s not even been anything else in between. Sure, I could probably write it all in a Lua script – but can you then drag and drop files on it? And anyway, I know BASIC… |
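To make the "useless information" case concrete, here is a rough Python sketch, not what XP1LO2web actually does. It assumes the font and colour details arrive as font tags and the language as a lang="..." attribute, which is what LibreOffice's HTML export appears to produce; the function name is invented for the example. It throws all of that away without judging whether any of it is worth preserving, which is exactly the shortcut objected to above.

```python
import re

# Assumption: presentational details appear as <font ...> ... </font> wrappers.
FONT_TAG = re.compile(r'</?font\b[^>]*>', re.IGNORECASE)
# Assumption: language markers appear as lang="..." attributes on other tags.
LANG_ATTR = re.compile(r'\s+lang="[^"]*"', re.IGNORECASE)


def drop_presentational_markup(html: str) -> str:
    """Unwrap every font tag and delete every lang attribute."""
    html = FONT_TAG.sub("", html)   # keep the text, lose the wrapper
    return LANG_ATTR.sub("", html)  # keep the tag, lose the attribute
```

A real tidy-up would want to keep the exceptions, for example a passage that genuinely is in another language.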
Clive Semmens (2335) 3276 posts |
Just for laughs, I went and looked what it was that went wrong when I dropped the Firefox bookmarks file on XP1LO2web – it wasn’t the tag – and it did at least fail gracefully, reporting an error rather than crashing.
I suppose I could make XP1LO2web accept “HTML” files without a header, but I’m not sure why I would. What it ought to accept, perhaps, but currently doesn’t, is a tag – in upper case. But since LibreOffice seems to reliably do them lower case, perhaps I won’t bother…
|
GavinWraith (26) 1563 posts |
I fully take your point Clive. By drag and drop you mean dropping the document onto the application, whereas what I am doing is the reverse: dragging the application (i.e. script) onto the document, or at least a StrongED window displaying it. I am sorry if my relentless pushing of Lua seems irrelevant. I am doing it mostly for my own pleasure. So here is a script for removing empty tags – it is not recursive, alas.
Yes, LibreOffice is very useful, it would be nice if it had an option not to include all the guff. Maybe it has. I have done man libreoffice but it is not immediately clear that any such option exists.
|
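On the "not recursive" point: one way round it is simply to repeat the removal until nothing changes, so that a pair which only becomes empty after its contents have gone is caught on the next pass. A minimal Python sketch, assuming only a handful of inline tags are safe to delete when empty (the list below is a guess, not taken from any of the apps or scripts in this thread):

```python
import re

# Assumption: only these inline tags are safe to delete when they are empty.
EMPTY_PAIR = re.compile(
    r'<(b|i|u|em|strong|span|font|sub|sup)\b[^>]*>\s*</\1\s*>',
    re.IGNORECASE,
)


def remove_empty_pairs(html: str) -> str:
    """Delete empty tag pairs, repeating until a pass changes nothing."""
    while True:
        cleaned = EMPTY_PAIR.sub("", html)
        if cleaned == html:
            return cleaned
        html = cleaned
```

Each pass strips the innermost empty pairs, so the number of passes is bounded by the nesting depth.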