"Computing in the 2020s is (still) a user-hostile shifting sand land. We are drowning in churn and noise. I am fighting back by switching this website from HTML to PDF. … PDF has many shortcomings. But … it stands in opposition to the mercenary, dynamic web … PDFs are self-contained and offlineable – you can archive them and be confident they will remain stable and readable in the future, with no external dependencies to manage."
"A file is a sequence of bytes, of a known length. It is completely under the control of the user. A vendor cannot change it sneakily. You can checksum it and manage its integrity. You can sign it and manage its authenticity. You can back it up and distribute it easily. You can sneakernet it and samizdat it. You can parse it and convert it to another format. You can work with it offline. We have 60 years of tooling available to manage files."
"Lab6 is … also a static, immutable doc-chain with each issue referring back to the published hashes of previous issues …"
Now here's a meme to ride on the block-chain high! 😆
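A minimal sketch of the "checksum it" and "doc-chain" ideas quoted above, in Python with only the standard library. The helper names `sha256_of` and `verify_chain`, the choice of SHA-256, and the idea of passing each issue's recorded hashes as a list are all assumptions for illustration; the quoted article does not say how Lab6 actually stores or checks its hashes.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so a multi-megabyte PDF need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_chain(issues):
    """Check that every issue records the hashes of all issues before it.

    `issues` is a list of (path, recorded_previous_hashes) pairs, oldest
    first. This layout is an assumption, not the real Lab6 format.
    """
    seen = []
    for path, recorded in issues:
        if list(recorded) != seen:
            return False
        seen.append(sha256_of(path))
    return True
```

If any earlier issue is edited after publication, its digest changes and every later issue that recorded the old digest no longer verifies, which is the property the "immutable doc-chain" wording seems to point at.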
@kensanata But why add the PDF at the end in a pre-formatted block? "This whole preformatted block is PDF content and cannot be viewed in a Gemini client."
@kensanata this is pretty cool!
The reason I don't like it, however, is that it's illegible on mobile. Zooming in PDFs on a smartphone is terrible UX. Sensibly scaling PDFs to mobile screens is not possible - they are by their very nature not made for the task. Extending this line of thinking gets us to styling the same content different ways, essentially reinventing HTML+CSS.
@bfiedler they claim that on their phone accessibility and reflow and all that work just fine; personally, I have no idea
@kensanata huh, never heard of PDF reflow. I can report that my reader sadly does not implement this (iOS Safari whatever is the current version). I remember my Android phone also not implementing this, but that's a year ago and stuff might have changed since then.
I'm still sceptical, also because each issue is tens of MB in size, which is unusable with slow speeds or data limits, compared to the tens of kB for sensible blogs.
But then I shouldn't be only negative: I think it's a cool idea to explore, especially with polyglots, as mentioned in issue 2 or also in PoC||GTFO 😄
@bfiedler It's possible to specify media size when creating a PDF. For services which do this on-the-fly, this could be custom to any given user's screen specs (CSS and/or JS queries can typically determine this), or can be set at reasonable breakpoints.
What size are most smartphone displays now? 5--6"?
PDF starts getting ridiculous below that point, but might be useful at that scale.
I've got 9" (OLED) and 13.3" (e-ink) devices, both of which handle most full-sized reading materials (books, magazines) fine w/o zoom.
I'm coming to the PoV that it's not PDF that's bad, it's our device displays.
@dredmorbius @kensanata It is rather the fact that our displays are too good. The fundamental problem is the high resolution of small screens. That’s why every smartphone browser out there renders at half its resolution. With HTML and CSS reflowing is not really a problem, but PDFs are paged, which is a great thing on large displays and for printing, but terrible for dynamic content. Most PDFs I want to read on mobile are scientific literature, which is typeset to have its illustrations and tables at very specific points - and there simply isn’t a way to reflow semantically.
My display size is 4.7”, but that’s splitting hairs tbh.
@bfiedler You cannot divorce screen resolution from screen size.
Smartphones are sized for pockets, and occasionally purses or handbags. They're primarily focused on portability and one-handed use. They're optimised for neither reading nor writing.
Books are an artefact engineered through centuries to fit human ergonomics of weight, font size, line length, lines per page, pages per volume, and overall dimensions. Most will land in the 9--12" diagonal measure. Larger and smaller books exist, but those are themselves typically obvious compromises for either portability (pocketbooks) or display size ("coffee table" books, atlases, etc.)
Ink-on-paper tends to achieve somewhere between 300--900 dpi effective resolution. My e-ink display at just over 200 dpi is very nearly paper-quality --- a magnifying glass shows pixel elements, but the naked eye does not.
Smartphones today tend to ship with 5--6" diagonal displays. That's the size of a small pocketbook, it's smaller than a 4x6 index card, most are smaller than a 3x5 card. Index cards are not a form-factor optimised for long-form content presentation or consumption.
The problem with smartphones for reading is that they're too damned small, and you have to scale up devices to fix that.
PDFs tend to work well starting at about 8" displays and up. My 13.3" eink ebook reader is wonderful as a reading experience, it really is.
@bfiedler Note also with mobile phones: after accounting for bezels, any hardware buttons, and on-screen elements (application and/or device menus, website menus, headers, and footers for any browser-based content), you're cutting even further into the available viewable area.
I've often seen viewable area cut by half or more. No joke.
The Internet Archive's BookReader has lost its full-screen immersive option, and that shaves about 30% off the displayable area. Given many scans are only just at the edge of legible quality at maximum size, this is ... frustrating.
For downloadable materials, I can grab the PDF and view locally, immersively. For online-only content (many of the books in the Online Library) this makes reading / viewing far less viable. Given that the onscreen contrast and visibility controls are also poorer than my offline ones (many scans have dark or damaged page backgrounds), an already pretty marginal situation rapidly descends to unusable.
@kensanata Indeed, reading the rant against the Web, I thought I was reading Gemini propaganda. (But the conclusion "use PDF" surprised me.)
@kensanata The part where this kinda breaks down is browsers are, still despite the 'market currents', more extensible than PDF readers, and probably better at accessibility.
That said, I do like the idea of publishing a static website where you have both an HTML target and a PDF target. Had that in my todo lists for certain stretches of time, never got around to actually making that happen. Shouldn't be too hard to do with some pandoc and a wee bit of a LaTeX template.
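A rough sketch of that HTML-plus-PDF idea, assuming `pandoc` is on the PATH and a LaTeX engine is installed for the PDF step. The function names, file names, and the optional template path are hypothetical; only the pandoc options `--standalone`, `--template`, and `-o` are real.

```python
import subprocess
from pathlib import Path


def pandoc_commands(source, latex_template=None):
    """Build one pandoc command per target (HTML and PDF) for `source`."""
    src = Path(source)
    html_cmd = ["pandoc", str(src), "--standalone",
                "-o", str(src.with_suffix(".html"))]
    pdf_cmd = ["pandoc", str(src), "-o", str(src.with_suffix(".pdf"))]
    if latex_template is not None:
        # Optional custom LaTeX template for the PDF target.
        pdf_cmd.append(f"--template={latex_template}")
    return [html_cmd, pdf_cmd]


def build(source, latex_template=None):
    """Run both conversions; raises CalledProcessError if pandoc fails."""
    for cmd in pandoc_commands(source, latex_template):
        subprocess.run(cmd, check=True)
```

Calling `build("post.md", "page.tex")` would write `post.html` and `post.pdf` next to the source, so a static site generator could loop this over every page.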
As for pages, I really wish the idea of paragraph-addressed text caught up. Pages are arbitrary, paragraphs are logical. Scrolls aren't new, didn't we use to have scrolled long pages back in the day as the main thing? (Imagine reading this on the metro: https://en.wikipedia.org/wiki/Handscroll#/media/File:Anonymous-Ten_Thousand_Miles_of_the_Yangtze_River.jpg 😂 )
@cadadr My wiki has a PDF button at the bottom of every page that uses wkhtmltopdf to dynamically create a badly laid out and ugly PDF. Not sure if anybody ever uses it.
@kensanata I use that one to convert my HTML CV https://gkayaalp.com/vc.html to PDF https://gkayaalp.com/vc.pdf and it does look fine IMO (and definitely very hire-able with that beautiful professional career record 🤭), tho not necessarily as good as a nice LaTeX PDF. It does take a wee bit of media-querying tho, you can view-source on that HTML page to see a small example.
@Mayana I do often find myself having to copy excerpts of text out of PDFs and there you encounter a usability problem that is also a great accessibility problem. I need to have a complex bit of Emacs Lisp to reflow the copied text so it doesn't contain those hyphens at the ends of lines (which is further complicated by the fact that dashed compounds can be broken at EOL like normal words and that requires extensive coding or just manually editing...), and it's not infrequent that PDFs represent on-screen text with utterly different characters. It goes from infrequent stuff like e.g. recently I had to deal with one that had parentheses represented like ~text!, to the extreme end where the underlying text is just a random string of Unicode symbols and has nothing to do whatsoever with the actual text.
Not inherently PDF problems maybe, but frequent nevertheless, especially in OCR-ed stuff. I'd really love at least scientific publishing moved from PDFs to something that is plain text with extensions.
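For illustration, the reflow step described above might look roughly like this. It is a Python sketch, not the poster's Emacs Lisp, and the function name `reflow` is made up. It deliberately shows the limitation mentioned above: a compound that really is hyphenated gets glued together when it happens to break at the end of a line.

```python
import re


def reflow(text):
    """Undo hard line breaks and end-of-line hyphenation in copied PDF text."""
    # Join a word split across a line break: "acces-\nsibility" -> "accessibility".
    # This cannot tell a soft hyphen from a real one, so "well-\nknown"
    # wrongly becomes "wellknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn remaining single line breaks into spaces; blank lines
    # (paragraph breaks) are left alone.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text
```

Telling the two kinds of hyphen apart needs a dictionary lookup or manual review, which is the "extensive coding or just manually editing" part.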
@emacsomancer ePub has promise (it's mostly a more-constrained HTML), but seems to be subject to the same abuses HTML is: excessive fiddling by publishers without awareness/concern of how end-user software / devices will present it.
Also much of the end-user software is absolute shit. Linux's fbreader (I think) really sucks at presenting ePubs.
@dredmorbius @Mayana @kensanata I think PDFs are fairly like PostScript under the hood. Problem is, even in nicer PDFs, things like hyphenation and stuff are hard-coded, so to speak. Some PDFs are nice enough to use the appropriate Unicode hyphen (I don't recall the particular one), some are not. Another problem is, PDFs have many sources. A PDF can come from XML et al, from various light markup through org-export or pandoc, from troff et al, from raw PostScript, LaTeX, Scribble, Sile, etc. HTML (or probably epub as well) can come from many sources as well, but the final format, HTML, still does divorce layout and text well enough to be accessible and more usable. A non-interactive subset of XHTML + SVG + MathML + some little CSS2 on top would make a nice replacement for even the best PDFs IMO.
IMO the appeal of PDFs at least in scientific context is that the digital format can be an equivalent alternative of the print version of a document so that if we're talking about Cadadr 1984 p. 420, everybody knows what is referred to, it doesn't matter if it's a PDF or the print volume. At least in a scenario where it's the original PDF and not a preprint, the PDF has pages marked and counted properly, and everyone is similarly abled or the PDF is really good. Which is seldom the case, most PDFs I have to deal with are near-worst-case and/or preprints that are flowed and laid out completely differently so page numbers don't match. More often than not tho, we are talking about paragraphs / figures / subsections etc., so paragraph addressing rather than page addressing makes more sense imo.
@cadadr PDF is a typographic format, not a document format.
Again, a PDF which includes the raw text and/or markup source would be more useful. Also slightly larger than current formats, though given storage capabilities, that's a rapidly diminishing concern.
The other option would be to distribute the markup and have all render on the client end, much as with HTML.
I suspect that this would result in much the same set of problems as now afflicts HTML.
@cadadr With a library of thousands of documents in various formats:
PDF renders consistently no matter what. Pagination, spatial memory, obscure glyphs, equations, graphics, etc., All Just Work. (DJVU is equivalent to PDF in this regard.)
Nothing else does.
The various idiosyncrasies among ePub documents especially are increasingly annoying. There should be a stock set of defined formats with publishers required to support at least those. (They can supply their own broken formats if they want, But They Will Be Wrong.)
There are very few highly creative / unique variations in book typography. Conventions should support the text, not call attention to themselves. (Brochures and special reports are notable exceptions, they're hell to read, especially on e-Ink devices where assumptions of colour support and contrast/shading fail.)
HTML is increasingly a shitfest. Hell, even my own Motherfucking Web Page fails me on e-Ink (the slight off-black-and-white contrast becomes annoying), though it remains better than most. (Some colour-depth media queries may help with that.)
@dredmorbius they Just Work as long as font embedding/subsetting has been done right. Some viewers won't render JBIG2 or JPEG-2000 embedded images quickly, properly, or even at all.
And then there's PDF forms, the most disheartening subject on the planet ...
@scruss Also: I've not run across a PDF with broken fonts in years.
Broken HTML? Daily. (As in broken enough to be entirely unreadable / inaccessible.)
@scruss PDF forms are interactive, which is not PDF's strong suit.
(I've never had luck with them either. That said, I also don't have them in my collection, which consists of documents --- books, articles, reports, brochures, etc.)
@dredmorbius it's more that there are two completely incompatible technologies that Adobe pushes for form generation. One (LiveCycle, affectionately known as DeathSpiral) only works/displays on Windows. Many governments use LiveCycle forms.
I'm still with plinth's assessment: “PDF is a file format that is made to be able to represent marks on a page. It dictates neither how those marks are made nor whether or not [they] carry any meaning”
@Mayana One of my goals is to have a publishing system which utilises a minimum sufficient markup (might be plain text, HTML, Markdown / Asciidoc, LaTeX), and generates on-the-fly endpoints / targets (caching for reuse) as requested by the client.
That could be HTML, it could be PDF or ePub. It could be direct TTS or some native format addressing the needs of screen-readers, if such exists.
> publishing a static website where you have both an HTML target and a PDF target
There are sites that do that for at least some content. It's not generalised, but does exist.
Wikipedia/MediaWiki has a "download as PDF" option (this may rely on the browser's print engine, something FennecFox pointedly lacks).
Wikisource has an ePub export, which may be better suited to handheld mobile devices. Project Gutenberg similarly. (Both sites are aimed at longer text documents.)
Internet Archive has numerous download formats for its texts, usually including PDFs (generally of scanned / OCRd documents), but also plain text and such.
There are archival formats such as WARC which are more HTML-native.
On my 13" eink tablet, I'm finding I increasingly dislike HTML for longer-form reading, though with a suitable browser (EInkBro does well), they're not too painful.
PDFs don't have the flow of the Web, and organising them is a very painful weak point. But with a sufficiently large screen (8" and up seems to be the range), they're much better as a reading experience.
@kensanata sorry to be that nitpicking nerd, but this has to be some form of elaborate satire given how terrible the proposed solution is.
Issue 1 for example: seventeen (17) megabytes for 12 A4 pages of badly laid out text
> New Frontiers in PDF Accessibility: This file is both a valid PDF/A-3b document and a valid MP3 file containing a dramatic reading of the content. It is also readable as plain text in any text editor.
I don't think this gives them any right to complain about "We are drowning in churn and noise" when they are stuffing all of that bloat in an all or nothing opaque blob.
@codeforchaos You did realize that the PDF was a polyglot file that includes a voice reading? It's performance art, as far as I am concerned.
@kensanata >You did realize that the PDF was a polyglot file that includes a voice reading?
yes, i mentioned this specifically
>It's performance art, as far as I am concerned.
maybe, or satire, or just plain trolling...
@kensanata Most of the good things he mentions about PDF ("PDFs are self-contained and offlineable") apply to EPUB as well. Why not use EPUB?
@kensanata This is strange. HTML has an open specification, a lot of implementations so I would guess it has a better future than PDF.
@bortzmeyer The author is unhappy with the inherent instability of the standard, I think. If you go back into the archives, twenty years from now, will the sites work, if they load frameworks from here and there, relying on APIs that may or may not exist… in a way "HTML5" is worse than a build environment in a fast-moving world because you usually cannot get well-defined versions of everything, easily. PDFs, the author argues, are not apps. That's my take.
(But JS for PDF is coming.)
@renatoram I'm not planning on converting but it seems to me that PDF is a standard, albeit a proprietary one. The author does mention the limitation of a single column. PDF offers more benefits than multiple columns, though. One feature of the lab6 PDFs is the attached disk image, for example; and the images of course, all in one file.
@kensanata But, you can still ship HTML 1.0 documents today and they look and work perfectly fine, and are trivial to edit. PDF hasn't changed *much*, but it has had spec changes, and it's a pain to generate or edit.
@mdhughes True. I think the author is not concerned too much with early HTML, when the web was document based. It's all the other APIs that trouble them.