Contest and Guild
I need to know what is wrong with the parser. Lanzer wants to know, and I really don't know how to explain it that well. Can someone tell me so I can pass it on to Lanzer in this Ask the Admin?
Before people say this can't go here: my question is what is wrong with the parser, so I can explain it to the admin for us. After that, it's the last of the matters I wanted to bring up to the admin.
You could just refer to the post I made in Spring Cleaning two goddamn years ago. It lists the majority of the known issues and makes some suggestions.
None of the issues I listed then have been fixed. There has been at least one issue introduced recently, but I haven't spent the time to identify it.
Everything on the list stems from three things: the BBCode parser scans with poorly-written regexes instead of actually parsing, the Smileys parser runs as a separate pass with no regard for surrounding content, and the "anti-XSS" filter does not do what it's supposed to do at all.
In fact, the anti-XSS crap is completely wrong:
It does an unnecessary HTMLEntities pass that mangles actual HTML entities (for example, you can't use the less-than or greater-than symbol in the vicinity of text without it being mangled into an ampersand by overzealous HTML removal)
It treats % followed by any two alphanumeric characters as a URL-encoded sequence, and then decodes it wrong
It strips any mention of Javascript, Window, Expression, and Script (among many other things) if they're followed by a period (and even as XSS protection that wouldn't work anyway, since if they're wrapped in quotes, the damn filter adds a backslash before the closing quote)
Backslashes, semicolons, and a few other things disappear when posts go into the server's page cache (so four backslashes look like four until someone else views the page, then they become two, until someone quotes the post, where they become one, which then disappears as it enters the cache again)
... And a dozen other problems I can't be arsed to list here.
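To make the disappearing backslashes concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each trip through the cache or the quote feature applies an unescape step that behaves like PHP's stripslashes; that is a guess at the mechanism, not something confirmed about the site's code, and `strip_slashes` is a hypothetical helper.

```python
import re

def strip_slashes(text):
    """Hypothetical unescape step: drop each backslash and keep the
    character after it (so a doubled backslash becomes a single one,
    and a lone trailing backslash vanishes)."""
    return re.sub(r"\\(.?)", lambda m: m.group(1), text, flags=re.DOTALL)

post = "\\" * 4                   # four backslashes, as typed by the poster
after_cache = strip_slashes(post)         # two left
after_quote = strip_slashes(after_cache)  # one left
after_recache = strip_slashes(after_quote)  # none left

print(len(post), len(after_cache), len(after_quote), len(after_recache))
```

Under that assumption the count goes 4, 2, 1, 0, which is the same pattern as the symptom. The usual fix is to store the raw post untouched and escape exactly once, at output time.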
It's not simply broken.
It needs to be completely rewritten by someone competent enough to write an actual honest-to-god lexer and parser. It's not that hard, and it can be written to run very, very fast.
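For a rough idea of what "an actual lexer and parser" means here, below is a minimal Python sketch. It is not the site's code or a proposed replacement; the tag whitelist, `tokenize`, and `render` are all made up for illustration. The regex only recognizes individual tokens; a stack tracks nesting, and text is HTML-escaped exactly once on output.

```python
import html
import re

# Hypothetical whitelist: BBCode tag name -> HTML element name.
TAGS = {"b": "strong", "i": "em", "u": "u", "quote": "blockquote"}

# Matches one candidate tag token such as [b] or [/b].
TOKEN_RE = re.compile(r"\[(/?)([a-zA-Z]+)\]")

def tokenize(source):
    """Yield ("open", name), ("close", name) or ("text", value) tuples."""
    pos = 0
    for match in TOKEN_RE.finditer(source):
        name = match.group(2).lower()
        if name not in TAGS:
            continue  # unknown tag: it stays part of the surrounding text
        if match.start() > pos:
            yield ("text", source[pos:match.start()])
        yield ("close" if match.group(1) else "open", name)
        pos = match.end()
    if pos < len(source):
        yield ("text", source[pos:])

def render(source):
    """Turn BBCode into HTML, keeping the output well nested."""
    out = []    # output fragments
    stack = []  # names of the tags that are currently open
    for kind, value in tokenize(source):
        if kind == "text":
            # Escape once, here, and nowhere else.
            out.append(html.escape(value, quote=False))
        elif kind == "open":
            stack.append(value)
            out.append("<%s>" % TAGS[value])
        elif value in stack:
            # Close everything opened after the matching tag.
            while True:
                name = stack.pop()
                out.append("</%s>" % TAGS[name])
                if name == value:
                    break
        else:
            # Stray closing tag: show it as literal text.
            out.append("[/%s]" % value)
    while stack:  # close whatever the poster left open
        out.append("</%s>" % TAGS[stack.pop()])
    return "".join(out)

print(render("[b]1 < 2[/b] and [i]unclosed"))
```

Because the text is only ever escaped at the point it is written out, a less-than sign survives as `&lt;` and raw HTML typed into a post cannot become markup, with no need for a separate word-stripping filter. A single pass over the input like this is also cheap, which is the point about speed.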