
User talk:Jan Hidders/HTML-free mark-up

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Toby Bartels (talk | contribs) at 23:04, 30 July 2002 (Re: Parsing). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

That would certainly make the work of the parser (and mine;) a lot easier. But it would also mean automatically replacing HTML markup (which people will use no matter what) with wiki format upon saving, which will be

  1. very tough, especially with tables
  2. a reason for people to cry out loud (I am thinking especially of The Cunctator;)

Also, some HTML things are nice, font tags, for example. Labelling an image is quite neat if the label is the same color as the object in the image.

Magnus Manske

You only have to translate the complete contents of Wikipedia to the new mark-up once. After that you always replace the tag delimiters < and > with the entities &lt; and &gt;. (That's what PhpWiki does, for example.) People can then type all the HTML they like; it won't work. I agree about the font color, but you can probably invent some mark-up for that too. Jan Hidders
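A minimal sketch of the delimiter replacement described above, in Python. The function name `neutralise_tags` is hypothetical, and leaving `&` untouched is an assumption made so that typed entities keep working; this is not PhpWiki's actual code.

```python
def neutralise_tags(text: str) -> str:
    """Replace the tag delimiters with entities so typed HTML shows up literally."""
    # Only "<" and ">" are rewritten. "&" is deliberately left alone
    # (an assumption) so that entities typed by the author still work.
    return text.replace("<", "&lt;").replace(">", "&gt;")


if __name__ == "__main__":
    print(neutralise_tags('<font color="red">label</font> &beta;'))
```

With this in place, any tag an editor types reaches the browser as visible text rather than as mark-up.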

Jan, why do you think it desirable to banish all HTML markup? Wouldn't it be better to keep the threshold of contributing as low as possible for new users? AxelBoldt

I believe firmly that using HTML actually heightens that threshold. (FWIW, I actually teach XML but still find that it doesn't make sense as a human-readable format.) Remember that the complexity of HTML was exactly the reason that WikiWiki was invented (see "The Wiki Way" by Ward Cunningham, the originator of the concept). The HTML table syntax, for example, is much more involved and harder to read in ASCII form than the PhpWiki/MoinMoin table syntax. Having two ways to do the same thing (e.g., '' and <i>) also doesn't make things simpler. Also remember that accessible does not just mean that it should be easy for people to write something new, but also that it should be easy to adapt something old. The latter becomes more difficult if a previous writer used some nifty HTML stuff. ... I guess I could go on about this but I have to get back to work now. Jan Hidders

FWIW, I agree, especially about the table syntax - take a look at my (still incomplete) list of food additives and think about why my first run is generated by a Python script from a space-separated file on my local machine. It'd be nice not to have to carefully filter HTML, too, so that things like clicking here aren't possible.

I do have some notes on your proposal, though:

  • We'll still need to be able to enter entities like β as "&beta;". It'd be nice to be able to enter hexadecimal entities like &#x2019; and have them converted to &#8217; on output for older browsers too.
  • Recognising a "_" or "/", etc., that's supposed to be rendered as itself might be tricky. Maybe a double-underscore?
  • I'd like --- to do em-dash, "—", myself. I wonder how many people use strike-out?
  • I will never remember which is superscript or subscript. How about something more mnemonic like {^superscript^} and {_subscript_}?

Carey Evans
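The hexadecimal-to-decimal conversion suggested in the first note could look roughly like the following Python sketch. The function name is hypothetical and the regular expression only covers numeric references of the form `&#x...;`; named entities pass through unchanged.

```python
import re

# Matches hexadecimal character references such as &#x2019;
_HEX_ENTITY = re.compile(r"&#[xX]([0-9a-fA-F]+);")


def hex_entities_to_decimal(text: str) -> str:
    """Rewrite &#xHHHH; references as their decimal &#NNNN; equivalents."""
    return _HEX_ENTITY.sub(lambda m: f"&#{int(m.group(1), 16)};", text)


if __name__ == "__main__":
    print(hex_entities_to_decimal("it&#x2019;s a &beta; test"))
```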

I agree that the entities et cetera should stay, it's only the tags that I don't like. The problem of escaping special mark-up symbols is usually solved by a special escape symbol like "\". I would advocate that here too. I also agree about the em-dash and, yes, I don't think strike is used very much. I also agree that my symbols for sub and superscript are not very intuitive, but {_sub_} looks a bit much like _sub_. -- Jan Hidders
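As an illustration of the escape-symbol idea, here is a small Python sketch that marks which characters were escaped, so a later parsing stage can treat them as plain text. The function name `scan` and the pair-based output are assumptions made for the example, not part of the proposal.

```python
def scan(text: str, escape: str = "\\") -> list[tuple[str, bool]]:
    """Return (character, is_literal) pairs; escaped characters are literal."""
    pairs = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            # The character after the escape symbol is taken literally.
            pairs.append((text[i + 1], True))
            i += 2
        else:
            # Ordinary character; a trailing escape symbol with nothing
            # after it also ends up here and is passed through as-is.
            pairs.append((ch, False))
            i += 1
    return pairs


if __name__ == "__main__":
    print(scan("a\\_b"))
```

A mark-up parser built on top of this would only interpret "_" or "/" as mark-up when `is_literal` is false.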

I have to say I like your proposal. Although I'm generally very comfortable editing HTML by hand in Vim, wiki editing with wiki tags seems very appropriate. I like having different level headings indicated by the number of = signs before and after, for instance. However, that particular convention leads to lots of typos: people forget to leave a space between the section heading's text and the equals signs on either side, or they don't balance the number of equals signs on either side so we see a dangling = on the page. Regarding tables, could there be a way to specify/enforce the number of columns in a table at the beginning? I think pages like List of saints would be much easier to edit using the syntax you suggest. Wesley
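The two heading typos mentioned above (unbalanced equals signs and missing spaces) are easy to detect mechanically. The following Python sketch shows one way; the function name and the wording of the problem labels are hypothetical.

```python
import re

# A run of "=" at the start, a title, and a run of "=" at the end of the line.
_HEADING = re.compile(r"^(=+)(.*?)(=+)\s*$")


def check_heading(line: str):
    """Return a list of problems for a heading line, or None if not a heading."""
    match = _HEADING.match(line)
    if match is None:
        return None
    opening, title, closing = match.groups()
    problems = []
    if len(opening) != len(closing):
        problems.append("unbalanced")
    if not (title.startswith(" ") and title.endswith(" ")):
        problems.append("missing space")
    return problems


if __name__ == "__main__":
    for sample in ["== Title ==", "==Title==", "== Title ="]:
        print(repr(sample), check_heading(sample))
```

A check like this could run when a page is saved and warn the editor instead of leaving a dangling = on the page.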

Thanks for agreeing with me. Enforcing the number of columns given the first line of the table is possible but not easy to implement; the parser then has to remember the number of columns. -- Jan Hidders
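To illustrate what "remembering the number of columns" involves, here is a Python sketch of such a check. It assumes one table row per line with cells separated by "||", which is a stand-in convention for the example and not necessarily the syntax of the proposal; the function name is hypothetical as well.

```python
def check_columns(rows: list[str], delimiter: str = "||") -> list[tuple[int, int, int]]:
    """Return (row_number, found, expected) for rows that differ from the first row."""
    expected = None  # the state the parser has to carry from row to row
    errors = []
    for number, row in enumerate(rows, start=1):
        found = len(row.split(delimiter))
        if expected is None:
            expected = found  # the first row fixes the column count
        elif found != expected:
            errors.append((number, found, expected))
    return errors


if __name__ == "__main__":
    table = ["Name || Feast day || Born", "Agnes || 21 January", "Basil || 2 January || 330"]
    print(check_columns(table))
```

The extra state is a single integer per table, so the cost is mostly in threading it through a parser that otherwise handles each line independently.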

One problem with the proposal: How will the new table syntax represent border/borderless cells and rowspan/colspan (necessary for the depiction of the roulette board)? -- Damian Yerrick

Good question. I only gave a notation for colspan. It is inevitable that if you are going to forbid the liberal use of HTML some things will no longer work. On the other hand, if you do want to allow HTML (or a safe subset of it) then you should write a small parser for that if you always want to guarantee correct HTML output and make sure that Magnus's table layout isn't messed up. -- Jan Hidders

Just wondering: why would I care? --The Cunctator


From the mailing list, with replies:

I do disagree with you there, thinking that ''' is more difficult, although only because more newbies will know <b> to begin with — they are inherently of pretty much the same complexity (I see two minor arguments each for relative simplicity). This is minor, and the difference will probably only lessen with time.

I think the relevant arguments have already been mentioned:
  • both are equally easy to learn
  • many newcomers already know <b>
  • other newcomers are a bit intimidated by HTML tags
  • raw text with ''' is slightly easier to read than with <b>
The first isn't an argument, just a denial of the existence of an argument for the other. The second is the argument that I think wins the case for <b>; I'm claiming that they look about even without that (relevant since that will lose its strength over time). So you missed the two minor arguments in favour of <b>:
  • Beginning and ending are clearer; a misplaced tag is easier to spot when it's rendered as a literal <b> in the text.
  • The letter "b" reminds people of the meaning of the markup; many people already associate this letter with boldface thanks to Microsoft Word and similar programs.
But we're getting down to niggliness here. You already know that I prefer ''' in most situations anyway, and prefer it to <strong>, which is how it actually renders.

I argue that the HTML tag itself is the best wiki markup for most of these. It's just a few situations where we have something better, or where the HTML is so complicated that we *need* something better. Then I'm with you; I just wish that this weren't an antiHTML crusade.

That crusade is just me. Please don't let my extreme point of view stop you from agreeing with more reasonable points of view. :-) Even I could probably be convinced to use HTML tags for certain mark-up if we cannot find good Wiki alternatives. However, if there is a good Wiki alternative then we should use that and that alone. But you probably agree with me there.
Except for "alone". Although I am mulling thoughts in that direction. I also suspect that we'll disagree on what's a "good" alternative, but at least that doesn't affect the principle of the thing.

Well, Lee has just informed me that <strong> and <em> are taken care of; it was only Phase II that rendered ''' and '' suboptimally as <b> and <i>.
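For concreteness, the mapping of ''' and '' to <strong> and <em> can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name; it is not the code of either software phase, and it ignores harder cases such as nested or unclosed quotes.

```python
import re


def render_emphasis(text: str) -> str:
    """Render '''bold''' as <strong> and ''italic'' as <em>."""
    # Handle the triple quote first so it is not consumed as a double quote.
    text = re.sub(r"'''(.+?)'''", r"<strong>\1</strong>", text)
    text = re.sub(r"''(.+?)''", r"<em>\1</em>", text)
    return text


if __name__ == "__main__":
    print(render_emphasis("'''bold''' and ''italic''"))
```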

This means that the mark-up has even now again become more complex because a writer now has to decide between ''' and <b> and know the difference. If there had been only one notation we wouldn't even have had this discussion and/or the developers would have had to consult Wikipedia-l for adding new mark-up. We are failing in keeping the mark-up simple. That is bad.
There is a difference between <strong> and <b>, and what we write here generally should be <strong>. What we need to do now is to deprecate <b> in ordinary Wikipedia usage, and I'll join you on that. But I will resist getting rid of it entirely, for several reasons.

Toby 13:14 Jul 30, 2002 (PDT)

Jan Hidders
Toby 23:04 Jul 30, 2002 (PDT)