boost for
(cat|girl|gay|boob|shark|emoj(i|o)|gargr?(on|amel))s?
@CobaltVelvet they're nifty little problems that are always easy enough to solve in a reasonable time
@wxcafe @CobaltVelvet i take it you never hung out on perlmonks.
@brennen @CobaltVelvet nah it needs to have a goal for me
@CobaltVelvet @wxcafe (technically i s'pose most of that predated "shitpost" as a meme, but still.)
@CobaltVelvet
I enjoy writing regex engines, does that count?
@kellerfuchs @CobaltVelvet I enjoy complaining about how much better regular expressions could be if they were composable, ie http://synthcode.com/scheme/irregex/ (see also http://www.more-magic.net/posts/lispy-dsl-sre.html and for why regexes aren't composable (read "stringly typed systems") see http://groups.csail.mit.edu/mac/users/gjs/6.945/psets/ps01/ )
@cwebber @CobaltVelvet
Yes, that's why embedding regexps in your language (i.e. having regexps as values and operators on them, as in https://github.com/ocaml/ocaml-re/blob/master/lib/re.mli#L208-L237) is so much nicer :)
@kellerfuchs @CobaltVelvet Those look nice! Yes, non-stringly-typed actually-composable regexps is what the world wants but what most of us day to day don't get to have!
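(A rough illustration of the "regexps as values" idea above, in Python rather than Scheme or OCaml. This is only a sketch: the names `Lit`, `Seq`, `Alt`, `Opt`, `seq`, `alt` and `compile_re` are made up for this example and are not the irregex or ocaml-re API. The point is that pieces combine without anyone hand-escaping or hand-parenthesizing strings.)

```python
import re
from dataclasses import dataclass


# Hypothetical combinator types: each node knows how to render itself
# into a pattern for Python's standard `re` module.
@dataclass(frozen=True)
class Lit:
    text: str

    def render(self) -> str:
        # Literal text is escaped here, once, so callers never quote by hand.
        return re.escape(self.text)


@dataclass(frozen=True)
class Seq:
    parts: tuple

    def render(self) -> str:
        # Each part is wrapped in a non-capturing group so precedence is safe.
        return "".join(f"(?:{p.render()})" for p in self.parts)


@dataclass(frozen=True)
class Alt:
    options: tuple

    def render(self) -> str:
        return "|".join(f"(?:{o.render()})" for o in self.options)


@dataclass(frozen=True)
class Opt:
    inner: object

    def render(self) -> str:
        return f"(?:{self.inner.render()})?"


def seq(*parts):
    return Seq(parts)


def alt(*options):
    return Alt(options)


def compile_re(node):
    return re.compile(node.render())


# Build small pieces, then reuse them inside bigger ones.
emoj = seq(Lit("emoj"), alt(Lit("i"), Lit("o")))
noun = alt(Lit("cat"), Lit("shark"), emoj)
plural = compile_re(seq(noun, Opt(Lit("s"))))

# Escaping is handled by Lit, so "." here is a literal dot.
dotted = compile_re(Lit("a.b"))
```

With string-built patterns, reusing `noun` inside a larger expression means remembering to parenthesize it; here the structure carries that information.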
@CobaltVelvet @kellerfuchs fun fact I wrote the parser for Mudsync using irregex and it was a delight (but not super fast)
lambda, the ultimate regex engine!
@cwebber
That's a pity (that irregex isn't fast).
FWIW, ocaml-re uses pretty simple automata techniques; IIRC, it lazily constructs a DFA, using Brzozowski's derivative to lazily build an NFA and “then” lazily determinizing.
There is no obvious reason this can't be done in Scheme too. :)
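(For anyone who hasn't seen Brzozowski derivatives, a minimal sketch in Python. It is not how ocaml-re is actually written: it only shows the derivative step and uses a memoized `deriv` as a crude stand-in for lazily filling in transitions. All class and function names are invented for the example, and there is no term simplification, so the expressions grow as input is consumed.)

```python
from dataclasses import dataclass
from functools import lru_cache


class Re:
    """Base class for the toy regex AST (hypothetical, for illustration)."""


@dataclass(frozen=True)
class Empty(Re):   # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Re):     # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Re):     # matches one specific character
    c: str


@dataclass(frozen=True)
class Alt(Re):     # a | b
    a: Re
    b: Re


@dataclass(frozen=True)
class Seq(Re):     # a followed by b
    a: Re
    b: Re


@dataclass(frozen=True)
class Star(Re):    # zero or more repetitions of a
    a: Re


def nullable(r: Re) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    if isinstance(r, Seq):
        return nullable(r.a) and nullable(r.b)
    raise TypeError(r)


@lru_cache(maxsize=None)  # each (regex, char) transition is computed once
def deriv(r: Re, c: str) -> Re:
    """The regex matching what is left of r after consuming character c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, Seq):
        left = Seq(deriv(r.a, c), r.b)
        # If a can match nothing, c may also be consumed by b.
        return Alt(left, deriv(r.b, c)) if nullable(r.a) else left
    if isinstance(r, Star):
        return Seq(deriv(r.a, c), r)
    raise TypeError(r)


def matches(r: Re, s: str) -> bool:
    """Whole-string match: take one derivative per character, then check nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)


ab_star = Star(Seq(Chr("a"), Chr("b")))  # (ab)*
```

Each distinct derivative plays the role of an automaton state, which is why normalizing terms matters in a real engine: without it, equivalent states are not recognized as the same and the cache does not stay small.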
@kellerfuchs tbh I think it *could* be faster, without rewriting irregex at all already, if Guile gets a JIT or AoT compilation (which it looks like it will)
@kellerfuchs but the method you suggested sounds appealing :)
@cwebber “make the runtime faster so my library isn't as slow” sounds like something I would do.
(In that case, the two gains are likely complementary)
@cwebber @kellerfuchs @CobaltVelvet Shout-out to probably my favorite published paper ever, "A Play on Regular Expressions": https://sebfisch.github.io/haskell-regexp/
I used that library once to crack the cipher state of badly-encrypted files where the plaintext had a nice regular structure. Each observed byte constrained the possible cipher states a little more, in ways that fit the semiring construction nicely.
@jamey @kellerfuchs @CobaltVelvet holy shit this post has everything
@cwebber @jamey @kellerfuchs @CobaltVelvet kinda sorta related:
thoughts on this?
http://doc.cat-v.org/bell_labs/structural_regexps/
@grainloom @jamey @kellerfuchs @CobaltVelvet Don't know anything about that one but skimming the first page I think I'd say "yeah there should be no reason regexps are so line focused"
and then
"there should be no reason all our unix tooling should be line-focused"
and then
"imagine an alternate future where git merge operated on the AST level rather than on the line level"
@cwebber @grainloom @jamey @CobaltVelvet
This, very much.
There was a FreeBSD GSoC project that attempted to do that (making all tools in the base system use a generic output lib that could spit out our current human-readable text, or JSON/XML/whatever), but AFAIK nothing came out of it and it's sad. ;_;
@cwebber @grainloom @kellerfuchs @CobaltVelvet I'm not sure I can stand to read a Rob Pike paper, but I'm a huge fan of applying language theory to things we don't currently have very good tools for. Not just non-line-oriented text, but also binary files, network protocols, etc. Somebody in the fediverse linked to a paper a couple weeks ago about context-free protocol serializer/deserializer generators which I gotta dig up again, so cool!
@jamey @grainloom @kellerfuchs @CobaltVelvet notably if the language used s-expressions it should be super straightforward :)
@CobaltVelvet @kellerfuchs @grainloom @jamey Likewise editing a lisp from emacs is just *the* *best* if you use a tool like paredit (or the slightly looser smartparens) because you aren't editing text, you're operating on the language AST http://emacsrocks.com/e14.html
@jamey @CobaltVelvet
Yeah, it's a very nice paper.
IIRC, though, it's pretty hard to implement match groups with that approach (and it seems the Haskell library doesn't?), so that's likely a bust for @cwebber
@kellerfuchs @CobaltVelvet @cwebber Yes, you're correct, that library doesn't make it easy to extract parts of the match. I seem to recall that I thought through how to do it, but didn't actually try it because it's hard, and now I don't even remember how it would work.
But there's a paper from the next year's ICFP that extends regex derivatives to context-free parser combinators, so cleverer people than me have that covered!
@jamey @CobaltVelvet @cwebber
IIRC, they already had context-free stuff there, using coinductive (i.e., potentially infinite) structures for regexps.
@kellerfuchs @CobaltVelvet @cwebber Yes, that's correct, but the later paper makes it relatively efficient and, of immediate relevance to this discussion, extends it to parser combinators which can extract a parse tree rather than just recognizing whether an input matches or not.
@jamey @CobaltVelvet @cwebber
*nods.*
If you happen to have a ref. to the paper, that would be great; otherwise, I'll try and remember to hunt it down.
@kellerfuchs @CobaltVelvet @cwebber Right, that would help of course! http://matt.might.net/papers/might2011derivatives.pdf
@jamey @kellerfuchs @CobaltVelvet @cwebber I think you actually want https://arxiv.org/pdf/1604.04695.pdf
@cwebber @CobaltVelvet @kellerfuchs @jamey at least, that's the one that gives the good runtime
@CobaltVelvet hate em
@CobaltVelvet Shouldn't it be garg?(ron|amel) at the end? Otherwise it would be triggered on "gargramel", and I'm pretty sure it's "gargamel", right?
@dolfsquare gargramels and gargon are what i expected to be the funniest part of the toot
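(To make the difference between the two spellings concrete, here is a quick check with Python's standard `re` module. The word list and the helper name `matched_by` are just for illustration; the first pattern is the one from the toot, the second swaps in the suggested ending.)

```python
import re

# The pattern as posted, and the same pattern with the suggested ending.
ORIGINAL = re.compile(r"(cat|girl|gay|boob|shark|emoj(i|o)|gargr?(on|amel))s?")
SUGGESTED = re.compile(r"(cat|girl|gay|boob|shark|emoj(i|o)|garg?(ron|amel))s?")

WORDS = ["gargron", "gargamel", "gargramel", "gargon", "garron", "garamel"]


def matched_by(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if pattern.fullmatch(w)]


original_hits = matched_by(ORIGINAL, WORDS)
suggested_hits = matched_by(SUGGESTED, WORDS)

print("original: ", original_hits)
print("suggested:", suggested_hits)
```

The posted pattern accepts "gargramel" and "gargon" on top of the two intended names. The suggested one drops those but, because the second "g" becomes optional, picks up "garron" and "garamel" instead.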
convoluted maybe
hello i'm one of the seven (7) people in the world who apparently enjoy writing regular expressions enough to shitpost about it