Conversation

What if #iocaine supported generating images too?
(inspired by @pengfold's FakeJPEG tool)


There are a number of Rust crates that seem to make it easy to create valid JPEGs (or PNGs). The question is, what should they contain, and is the generation fast enough?

Or, perhaps another approach: what if SVG, but partially built by a markov generator? Can I make that valid? Is it something the scrapers would even care about? Or should I stick to playing with jpeg & png?


@algernon for sure the alt text provided and the garbage text around them are the most important pieces.


@alex You might be onto something.

Oh dear, this is going to be glorious.

@algernon As for generating SVG's, a fuzzer may be useful: https://komar.in/en/code/xmlfuzzer (haven't used this one, but the description fits)

The image crate looks like a simple way to create PNGs and JPEGs.

Question remains: what shall be the content? I suppose, the easiest is to make it user-controlled, somewhat. Provide a few template helpers that can insert various types of images at certain points.

Like, there'd be purely random, slightly randomized mandelbrot or julia fractals, and so on.

I have a ton of other ideas, but... this'll do for starters. Still need to do some benchmarking, to make sure this is even viable.


Ideally, the images would be trainable too... but that's beyond my expertise. It would also require a lot of images to train on, and that's much more expensive than training on text.

So sticking with untrained, but procedurally generated or random images is the way to go for now, assuming the performance hit is acceptable.


@algernon I think the key is keeping the resource use down, I'm most interested in iocaine to save resources, and if generating images uses too much resources, we're better off sticking with text.


@skyfaller Yep, agreed. If I add image generation, it will be entirely optional, and off by default.


@algernon @skyfaller This is why I went with "not quite jpeg" generation.

On my laptop, with FakeJPEG, I can generate around 8,500 1280x1024 "fake" jpegs per second. That's in pure Python.

Using the PIL library (where the compressor is compiled C), I can only generate about 400 per second.

Creating a JPEG from raw pixel data is a fairly CPU-heavy operation.


@pengfold @skyfaller Ooof.

Just did a quick test with the image crate, to create 1280x1024 jpeg of pure randomness, and it was going at 10 / sec. Generating a PNG is much faster (~100 / sec), and I suppose I could make it faster if I disabled compression.

Though, the bottleneck in this case is not just the png/jpeg compression, but the generation is costly too. Would need considerably smaller images, or faster (less naive) image generation in the first place for this to be viable.

I guess I'm not generating images just yet!


Did some benchmarking, and nope, this is not going to happen anytime soon. Generating a valid JPEG out of pure random data is slow. Generating a PNG is considerably faster, but still orders of magnitude slower than generating the text.

This doesn't scale well enough, and requires too much computing to be viable.

I still see potential in it, but will need to be smarter about how it is done. Generating an image - or even multiple images - for every request is likely not sustainable. But if we generated it for some pages only, in smaller sizes, that might work.

Still would need a way to generate them fast, and for the output to be plausible.

This is not something I have experience with, nor something I can easily borrow from someplace else. So I'm going to postpone this idea for now.


@algernon feel free to port FakeJPEG over to your favourite language. It's not a particularly complex bit of code and I'm right here if you have questions.


@chfkch Or URLs! QR codes should be fast to generate, and small enough to include as data: URLs too. That's a neat idea!


Well, that didn't take too long, and @chfkch came up with a splendid idea: What if QR codes?

They're images, they're small, and they can be generated fast. With the qrcode, image and base64 crates, I can render 5k codes / sec into a base64-encoded data: URI on a single core.

That sounds like an acceptable speed, and I can provide a qr <STRING> template helper for people who want to opt-in.


The qrcode crate can render into SVG directly, which would likely be faster - I haven't checked yet - but my suspicion is that the crawlers would be happier to ingest a PNG.

I'll have a look at the SVG parts too, and might offer both: qr <format> <STRING> or something...


@algernon what about blurhash? Just a 20-ish character string to generate there.


@mike 20-ish to generate, but for crawlers to ingest it as an image, I'd still have to encode it into some kind of image (can't rely on JS to do it for me on the client side), and then it is suddenly considerably larger, isn't it?


@algernon I really don't know, it was just a passing thought when I saw the thread. I'm not sure how efficient the image generation side is, just that it was supposed to be simple and fast so thought it might be a fit.

Haven't thought it through more than that! 😁


@mike Oh, it was a very intriguing idea! It might still end up being useful in the long run.

Like... if I taught iocaine to train on not only text, but images too, then I could generate blurhashes of those, and then generate a random blurhash from the learned ones, turn that into an image, and use that.

It would be slower than qr code into svg (or even png), but it would generate a different kind of image with more colors, and perhaps more plausible ones, too.

I'll definitely keep blurhash in mind, for the next time I'm playing with new garbage generation methods :)


Hmm... what should the template helper look like... I kinda want to be able to tell it the size, too, but make most things optional.

{{ qr "<STRING>" [width height] [format]}}
{{ qr "<STRING>" [format] [width height]}}

The string is required, but if there are more params: if there's only one, treat it as the format, with default width & height. If there's at least two more, then the next two are width & height, and format is an optional third.

I guess that can work. Let's see if it does in practice!
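
The parameter rules described above can be sketched in plain Rust. This is only an illustration of the dispatch logic, not iocaine's actual helper: the names QrArgs, QrFormat, parse_qr_args, and the default size and format are all assumptions made up for the example, and the parameters are taken as plain strings rather than handlebars values.

```rust
/// Output formats the hypothetical `qr` helper could support.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QrFormat {
    Png,
    Svg,
}

/// Parsed helper arguments (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
struct QrArgs {
    text: String,
    width: u32,
    height: u32,
    format: QrFormat,
}

// Assumed defaults; the thread does not say what the real ones are.
const DEFAULT_SIZE: u32 = 128;
const DEFAULT_FORMAT: QrFormat = QrFormat::Png;

fn parse_format(s: &str) -> Result<QrFormat, String> {
    match s {
        "png" => Ok(QrFormat::Png),
        "svg" => Ok(QrFormat::Svg),
        other => Err(format!("unknown format: {other}")),
    }
}

fn parse_dim(s: &str) -> Result<u32, String> {
    s.parse::<u32>()
        .map_err(|e| format!("invalid dimension {s:?}: {e}"))
}

/// The first parameter is the required string. One extra parameter is the
/// format; two or more extras are width and height, with the format as an
/// optional third. Anything after that is ignored in this sketch.
fn parse_qr_args(params: &[&str]) -> Result<QrArgs, String> {
    let (text, rest) = params.split_first().ok_or("the string is required")?;
    let mut args = QrArgs {
        text: (*text).to_owned(),
        width: DEFAULT_SIZE,
        height: DEFAULT_SIZE,
        format: DEFAULT_FORMAT,
    };
    match rest {
        [] => {}
        [format] => args.format = parse_format(format)?,
        [width, height, tail @ ..] => {
            args.width = parse_dim(width)?;
            args.height = parse_dim(height)?;
            if let Some(format) = tail.first() {
                args.format = parse_format(format)?;
            }
        }
    }
    Ok(args)
}
```

With this shape, parse_qr_args(&["hello", "svg"]) keeps the default size and switches the format, while parse_qr_args(&["hello", "64", "32"]) sets the size and keeps the default format.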


@algernon do the images have to be unique, or could it just download a bunch of CC0 images, maybe apply some ai poisoning thing (if that exists), and serve a random one per url with maybe markov alt text?


@Ember The images must be generated by iocaine, or be completely external.


And with all the new dependencies, the static binary is only ~100k bigger. And it can generate fancy QR codes.

I just need to come up with a sensible template where the QR code fits in.

And then build a template garden, because I've seen some fancy ones!


I swear the QR code's text is entirely accidental, I did not tell any crawler to fuck themselves with a pencil.

(The QR code decodes to "! The pencil felt thick and hoarse.")

@algernon Damn you make it harder to resist installing this thing every day!

@UnePorte It's pushed to the main branch already! No docs yet, though.


The downside of the new template is that it's ~8.5k, up from the ~2.2k with the default template iocaine ships with.

Let's see if I can make it smaller, without sacrificing much...


@algernon I might give it a try this week, it makes me want to build maybe an e-commerce template, with products and images, something like that

Or add an image section to the search engine one!


@UnePorte It can only generate QR codes, though, not "real" images.


Down to 5.2k:

  • removed OpenGraph & Twitter cards
  • Now using minify_html to further minify the output (trading some CPU time to gain size reduction)
  • Manually condensed the CSS (because minify_html leaves that alone)
  • Adjusted the config to generate two paragraphs less, since the template hardcodes two extra ones.

This looks acceptable, because my currently running instance averages around 4.5-5k pages, so 5.2k is a marginal increase.


OTOH, on my live deployment, there's a bit of javascript to read the page contents, that JS is not part of my current test template, and it adds about 1k...

I might remove that, because while it is funny, the bots don't click it. And then we're at an OK size.


@algernon yeah but since it's SVG, it can be styled via CSS, so there's probably options here to alter colors, shapes, etc.


@UnePorte Possibly... though the SVG is currently generated as a base64 encoded image, too: <img src="data:image/svg+xml;base64,<blah>"> - not sure how styleable that is.
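
For reference, a minimal sketch of how such a base64 data: URI can be assembled. The thread says iocaine uses the base64 crate for this; the hand-rolled encoder below only stands in for it so the example has no dependencies, and the function names (base64_encode, svg_data_uri, img_tag) are made up for illustration.

```rust
const ALPHABET: &[u8] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with `=` padding. A stand-in for the base64 crate.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into 24 bits, zero-filling missing bytes.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        // Emit four 6-bit groups, replacing the unused ones with padding.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 {
            ALPHABET[(n >> 6) as usize & 63] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[n as usize & 63] as char
        } else {
            '='
        });
    }
    out
}

/// Wrap an SVG document in a base64-encoded data: URI.
fn svg_data_uri(svg: &str) -> String {
    format!("data:image/svg+xml;base64,{}", base64_encode(svg.as_bytes()))
}

/// Build the <img> tag that embeds the SVG.
fn img_tag(svg: &str) -> String {
    format!("<img src=\"{}\">", svg_data_uri(svg))
}
```

On the styling question: as far as I know, an SVG loaded through an <img> tag is rendered as an isolated image, so the page's CSS does not reach inside it. Styling it from the page would need the SVG markup inlined in the HTML, or the styles embedded in the SVG itself.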


Looks like minify saves me around 500 bytes on this template, that's pretty big.


@orva Sadly, a smaller QR code does not necessarily translate to a smaller PNG :(


Oh, and minify_html can minify CSS! And JS too! I just have to enable the options. Neat. I can keep my templates readable then.

Let's see what happens if I add the TTS JS stuffs...


Without JS minification, that's ~6.4k page size. If I enable JS minification, minify-js blows up:

thread 'tokio-runtime-worker' panicked at [...]/minify-js-0.6.0/src/minify/pass1.rs:288:81:
called `Option::unwrap()` on a `None` value

I guess I'm not minimizing JS for now!


@algernon Oh yeah, PNGs. I was still somehow thinking about SVGs, which would probably be smaller as there would be fewer paths.


@orva Yeah, SVGs would likely be smaller, indeed. I'll check that too, eventually. Though, right now, SVGs are bigger than PNGs (no compression).

It's a delicate balance =)


Oh. It's an upstream issue. I can work around that, I guess... but that means I can't enable JS minification by default, and have to make it opt-in.


With the workaround applied to my JS, and JS minification enabled, it's 6.1k. Still too big, so the TTS parts are gonna go.

I could save a bunch if I didn't inline the CSS & JS, and would host them separately. But that would be too much work.


HAH!

With some further tweaking, down to 5.8k with the TTS JS! Looks like it will be able to stay.

Timer precision: 40 ns
generate                           fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ builtin_template_with_defaults  72.11 µs      │ 181.3 µs      │ 74.58 µs      │ 75.9 µs       │ 65545   │ 65545
├─ builtin_template_with_minify    71.17 µs      │ 262.3 µs      │ 73.53 µs      │ 75.16 µs      │ 66305   │ 66305
├─ qr_png                          617.7 µs      │ 1.371 ms      │ 644 µs        │ 651 µs        │ 7676    │ 7676
╰─ qr_svg_raw                      623.7 µs      │ 1.36 ms       │ 648.1 µs      │ 653 µs        │ 7653    │ 7653

Ooof. That's a big drop in performance. Let's see if I can tweak it...


Hrm. Not likely I can do much here. The PNG encoder is already at fast setting, and I can't go "just disable compression kthxbye" on it.

The next best thing is to not generate a QR code for every page. That's a bitch to benchmark, though.


I should be able to speed up the svg:raw case, though, because that's doing a bunch of unnecessary back-and-forth conversions. At least I think so.


aaand nope, the back & forth conversion is inconsequential, and/or rust is smart enough to figure out it's not needed in the first place.

I guess I could compare the generated assembly, but cba. Benchmark says that my optimization attempt did jack shit.

The surprising thing is that when I did some naive benchmarking earlier, using image directly, the svg generation was 10x faster than png. Now it's in the same ballpark.


I should profile it, maybe. Question is: do I care that much?

At the moment - no. But I'll need to figure out a way to only generate QR codes on some pages, a method that's reasonably efficient, and stable (as in: every request for the same page should end up in the same situation: either always with an image, or always without).
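
One cheap way to get that stability is to derive the decision from a hash of the request path, so the same page always lands on the same side. A minimal sketch under that assumption follows. FNV-1a is my pick here because it is tiny and gives the same result on every build; it is not something the thread says iocaine uses, and fnv1a, page_has_qr and the percent parameter are hypothetical names.

```rust
/// 64-bit FNV-1a: a tiny, stable, non-cryptographic hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Decide whether a page gets a QR code. The answer depends only on the
/// path, so repeated requests for the same page always agree.
/// `percent` is the rough share of pages (0..=100) that should get one.
fn page_has_qr(path: &str, percent: u64) -> bool {
    fnv1a(path.as_bytes()) % 100 < percent
}
```

The check costs a single pass over the path and needs no stored state, so skipping the QR code on most pages also skips nearly all of the extra work.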

Starting to think handlebars might not have been the best choice for templating.


The problem with handlebars is that the helper functions are heavy and awkward, and probably slow as heck.

Maybe I'll just go 2.0, and replace it with something like minijinja. It has filters, a better custom filter & function story, too.

There's also tera, which I have experience with (as a user, because Zola uses it), but I'm not a big fan of its syntax. If minijinja ends up being slower than handlebars, or gets disqualified for some other reason, tera is an option.

Then there's askama, which I have used before, and it was okay, too. I think I'd prefer it over tera, too.


One of the hard things with templates is that my helper functions need the random number generator, which is initialized for each request, for consistency. The generator is used by the functions, but it should not be exposed to the templates themselves.

With handlebars, I re-create the whole handlebars instance for every request. That's very wasteful.

If I switch away, I'd like to avoid that: create the template once, including helpers, and control them from that point on with state & context or something along those lines.
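
A rough sketch of that shape, independent of any templating crate: the helpers are registered once in a long-lived Engine, and everything request-specific (here just the RNG) lives in a separate RequestState that is passed in at call time and never handed to the template. All names are hypothetical. Seeding from a hash of the path and using SplitMix64 are assumptions made for the example, not a description of how iocaine does it.

```rust
use std::collections::HashMap;

/// SplitMix64, a tiny PRNG used as a stand-in for the real generator.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9e3779b97f4a7c15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94d049bb133111eb);
        z ^ (z >> 31)
    }
}

/// 64-bit FNV-1a, used here only to turn a path into a seed.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Per-request state: created for each request, owned by the server side.
struct RequestState {
    rng: SplitMix64,
}

impl RequestState {
    /// Assumption: the seed is derived from the request path.
    fn for_path(path: &str) -> Self {
        RequestState { rng: SplitMix64(fnv1a(path.as_bytes())) }
    }
}

/// A helper gets the request state plus its template arguments.
type Helper = fn(&mut RequestState, &[&str]) -> String;

/// Built once at startup and shared by all requests.
struct Engine {
    helpers: HashMap<&'static str, Helper>,
}

impl Engine {
    fn new() -> Self {
        let mut helpers: HashMap<&'static str, Helper> = HashMap::new();
        helpers.insert("pick", pick as Helper);
        helpers.insert("number", number as Helper);
        Engine { helpers }
    }

    /// Look up a helper by name and run it against this request's state.
    fn call(&self, state: &mut RequestState, name: &str, args: &[&str]) -> Option<String> {
        self.helpers.get(name).map(|helper| helper(state, args))
    }
}

/// Pick one of the arguments at random.
fn pick(state: &mut RequestState, args: &[&str]) -> String {
    if args.is_empty() {
        return String::new();
    }
    let i = (state.rng.next_u64() % args.len() as u64) as usize;
    args[i].to_string()
}

/// Produce a random number below 1000.
fn number(state: &mut RequestState, _args: &[&str]) -> String {
    (state.rng.next_u64() % 1000).to_string()
}
```

Two RequestState values built for the same path produce the same sequence of helper results, which gives the per-request consistency, while the Engine itself is never rebuilt.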

...but my brain keeps falling asleep, so I guess this will be a tomorrow thing.


After a quick glance through the docs, I prefer minijinja > tera, and ruled out askama for now. My gut feeling is that minijinja can do everything I want, better than handlebars, and provide a richer templating language at the same time.

But! Sleep. This can wait until tomorrow.


Well, the problem with minijinja is that if I want to have an app-wide Environment, which I do not rebuild for every request, I'm dealing with lifetimes suddenly.

The current architecture of iocaine does not like lifetimes.

This means that I either do the same thing I did with handlebars, and build an Environment for each request, or rewrite large parts of iocaine. The former... isn't worth it at this point, and the latter especially not.

So this project is getting postponed for now.


@algernon how do you find time for this on top of family and a full time job!!


@algernon I'd be interested in setting the generator seed myself sometimes, or a "lifetime" seed: it would allow me to have some consistency across pages. For instance, I could have a menu with items or a footer that stay the same across requests, while still being random, making the site shell more stable and plausible.

But that might not be possible?


@UnePorte That isn't possible right now. I agree, it would be useful, it would give you more power and flexibility, but it's not something that I can sanely retrofit into the current iocaine architecture.

If (or rather, when) I rework how templating works, I'll make it easier to have more control over the RNG too. That may be a while yet, though.


Mngh. Keep coming back to this, because I am genuinely unhappy with handlebars. I might go ahead and replace it with minijinja, and keep building an Environment for every request.

Not a performance win, but more flexible, more powerful templating would be nice. It helps that it is nicer on the Rust side, too.
