This repository was archived by the owner on Aug 3, 2024. It is now read-only.

Very WIP: markdown support #729

Closed
wants to merge 4 commits into from

Conversation

harpocrates
Collaborator

@harpocrates harpocrates commented Jan 9, 2018

This is pretty much the biggest feature request I've heard. It comes up over and over again (this was one of the biggest issues my co-workers came up with in a recent conversation). I've read through this and I generally agree with it: markdown is a format for the web and not for documentation. Nonetheless, Rust's success using markdown shows that the format can be effectively used for documentation. We could borrow their extensions.

So far, I've only heavily stripped down and refactored the cheapskate markdown library and done the bare minimum needed to pipe a barely working PoC through. My plan going forward would be to

  • fix/refactor the markdown parsing
  • integrate better with DocH (sometimes by extending DocH - for instance to support hyperlink names different from the url)
  • add a module level option for switching to markdown parsing
  • tests!

Before I dedicate any more time to this, I'd like to ascertain that this is something that could eventually be merged into Haddock (and that my approach is not inherently limited).

Cheapskate is easier to adapt in a nice way. Less wrestling around
string types (the parser is more abstract), and a saner conversion
(I re-adapted all the parser combinator functions to work over strings
and that was pretty much it).
The goal is to have markdown map nicely into Haddock's internal 'DocH' structure.
To that end, I've added to the Markdown parser support for:

  * inline equations

I've added to the internal 'DocH' structure (and in the backends) support for

  * block quotes
  * code blocks also have some language information (sometimes filled by fences)
  * allow more docs inside of links

Also added a module-level Markdown option, which currently does nothing.
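
To make the three DocH additions listed above concrete, here is a minimal sketch. It assumes a heavily simplified stand-in for Haddock's 'DocH' type: the `Doc` type, its constructor names, and the `render` helper are all hypothetical and chosen for illustration, not taken from Haddock's source. It only shows the shape of the change: a block-quote constructor, a code block carrying an optional language tag, and a hyperlink whose label is itself a piece of documentation.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical, simplified stand-in for Haddock's DocH structure.
-- Constructor names are illustrative and not Haddock's actual ones.
data Doc
  = DocString String
  | DocAppend Doc Doc
  | DocEmphasis Doc
  | DocBlockQuote Doc                   -- block quotes
  | DocCodeBlock (Maybe String) String  -- optional language tag (e.g. from a fence), then the source
  | DocHyperlink String (Maybe Doc)     -- URL, plus an optional label that is itself a Doc
  deriving (Eq, Show)

-- Toy backend: renders to an HTML-like string. No escaping is done,
-- so this is only suitable for illustrating the data flow.
render :: Doc -> String
render (DocString s)      = s
render (DocAppend a b)    = render a ++ render b
render (DocEmphasis d)    = "<em>" ++ render d ++ "</em>"
render (DocBlockQuote d)  = "<blockquote>" ++ render d ++ "</blockquote>"
render (DocCodeBlock lang src) =
  "<pre class=\"" ++ fromMaybe "plain" lang ++ "\">" ++ src ++ "</pre>"
render (DocHyperlink url label) =
  -- Fall back to showing the URL when no label was given.
  "<a href=\"" ++ url ++ "\">" ++ maybe url render label ++ "</a>"
```

With this shape, a markdown link such as the one in the sample below would map to a `DocHyperlink` whose label contains nested formatting, which a plain string label could not represent.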
@harpocrates
Collaborator Author

Sample:

module Main where


-- | Something _emphasisized_. [This is actually a link to **www.google.com**][0].
-- Although inline $x = 4 + 2$ equations work, display ones don't (yet)!
--
-- A bulleted list:
--
--   * an elem
--   * another and `main` code
--   * nested!
--      - Fenced code blocks work:
--
--        ```haskell
--        main = pure "this is fenced"
--        ```
--
--      - Ditto for indented code blocks
--
--            main = pure "this is indented"
-- 
-- # A HEADER
-- ## A smaller header
-- ### And a smaller one
--
-- > Some block quoted stuff
-- >
-- > > # A header
-- > >
-- > > _Formatting_ **works** as usual
--
--  [0]: www.google.com
main :: IO ()
main = pure ()

Output:

[Screenshot of the rendered Haddock output, taken 2018-01-10]

@alexbiehl
Member

Alec, this is amazing!

@harpocrates
Collaborator Author

@alexbiehl Should I take that to mean there is a reasonable chance of a more fleshed out version of this getting merged? If so, I may be able to justify to my employer spending time on this.

@gbaz
Contributor

gbaz commented Feb 5, 2018

I'd be supportive of the general approach. A module-level flag seems fine to me. I think markdown is innately too limited for what we want with Haddock... but if people are willing to live with those limitations (and they keep asking to! so why not believe them?) then yes, let's provide it.

Why inline-and-strip-down cheapskate rather than directly depend on it? I would prefer the latter, especially if it lets us swap-out engines as things settle down and evolve (see haskell/hackage-server#565 for example on a discussion on the state of things) instead of getting a snapshot of some-version-of-markdown that's just unique to haddock?

@harpocrates
Collaborator Author

> Why inline-and-strip-down cheapskate rather than directly depend on it? I would prefer the latter, especially if it lets us swap-out engines as things settle down and evolve (see haskell/hackage-server#565 for example on a discussion on the state of things) instead of getting a snapshot of some-version-of-markdown that's just unique to haddock?

Three issues:

  • We can't add extra dependencies (beyond those GHC already has + attoparsec already bundled with Haddock) since Haddock is part of the GHC distribution.
  • We will want to support a markdown flavor of our own, for instance to support Haddock's hyperlinking of identifiers. This is the main blocker to easily swapping out engines.
  • There is no actively maintained, low-dependency markdown library out there that we can just plug in (see "Adapting cheapskate for use in Haddock", jgm/cheapskate#23, for instance).

By the way, I am still interested in completing this work, but I want to first know that something of this form would be merged (if so, my employer can justify me working on this for a couple of days to bring it to completion).

@gbaz
Contributor

gbaz commented Feb 5, 2018

So I have an anxiety with regards to this. When you have to vendor code, then it's much harder to replace. And @jgm, the author of the code, thinks it is not good code to base things on (as per that discussion). Meanwhile, on the linked issue from hackage server, he says "I'm working on a pure Haskell commonmark parser which you could switch to later." So I know perfect is the enemy of good, and I know that this much-desired thing has been delayed again and again on the basis of things that people feel are very partial objections. But I would hate to be in a situation where four months down the line people say "wait, we really should have based our fork on this new pure-Haskell commonmark code" and then it's way harder...

Also, I believe the rule is not "under no means can we add dependencies to Haddock" but rather that adding such dependencies should be done sparingly and with care. So my temptation is to wait a bit to see how things shake out, and failing that, even if there needs to be a special purpose markdown lib for this, to try to make it an external lib with low footprint (no new transitive deps to be introduced :-P) so that it is not entirely coupled with/baked into the haddock codebase. I think that will lead to better long-term architecture.

@jgm
Contributor

jgm commented Mar 28, 2018

I have been working on my pure Haskell commonmark parser, and I have a draft here https://github.com/jgm/commonmark-hs. I'd be curious if you have comments about the API from the point of view of haddock integration.

It is designed to be extensible. By defining instances for IsBlock and IsInline, you can have it parse directly to a haddock Doc structure. And you can define new syntax elements and extensions; there are a few examples in the package.
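
The idea of parsing straight into a target structure through type-class instances can be sketched as follows. This is a sketch under stated assumptions, not the commonmark-hs API: `InlineLike` and `BlockLike` are hypothetical stand-ins for IsInline and IsBlock with invented, much smaller method sets, and `Inlines`/`Blocks` are a toy target rather than Haddock's Doc.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

-- Hypothetical stand-ins for the idea behind IsInline / IsBlock.
-- These are NOT the commonmark-hs classes; names and signatures are assumed.
class Monoid il => InlineLike il where
  str  :: String -> il
  emph :: il -> il
  link :: String -> il -> il   -- target URL, then label

class (Monoid b, InlineLike il) => BlockLike il b | b -> il where
  paragraph  :: il -> b
  blockQuote :: b -> b

-- Toy target structure (an assumption, standing in for a Doc-like type).
data Inline = Str String | Emph [Inline] | Link String [Inline]
  deriving (Eq, Show)
data Block = Para [Inline] | Quote [Block]
  deriving (Eq, Show)

newtype Inlines = Inlines [Inline] deriving (Eq, Show)
newtype Blocks  = Blocks [Block]   deriving (Eq, Show)

instance Semigroup Inlines where
  Inlines a <> Inlines b = Inlines (a ++ b)
instance Monoid Inlines where
  mempty = Inlines []

instance Semigroup Blocks where
  Blocks a <> Blocks b = Blocks (a ++ b)
instance Monoid Blocks where
  mempty = Blocks []

instance InlineLike Inlines where
  str s                 = Inlines [Str s]
  emph (Inlines xs)     = Inlines [Emph xs]
  link url (Inlines xs) = Inlines [Link url xs]

instance BlockLike Inlines Blocks where
  paragraph (Inlines xs) = Blocks [Para xs]
  blockQuote (Blocks bs) = Blocks [Quote bs]

-- Stands in for parser output: written only against the classes,
-- so it never mentions the concrete Inlines/Blocks types.
sample :: BlockLike il b => b
sample = blockQuote (paragraph (str "Some " <> emph (str "quoted") <> str " text"))
```

Choosing the result type at the use site, for example `sample :: Blocks`, selects the instances and yields the target structure directly, with no intermediate markdown AST to convert afterwards.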

It is slower than cheapskate, but still considerably faster than pandoc, and it behaves better on pathological input than either of these libraries.

@harpocrates
Collaborator Author

@jgm That project looks great! Here's what I'm going to attempt to do:

  • implement instances of IsBlock and IsInline for Haddock's Doc exactly as you suggested
  • define a new custom syntax element for describing linked identifiers (similar to Haddock's current 'identifier' syntax)
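
As a rough illustration of the second item, here is a minimal sketch of recognising single-quoted identifiers in a run of inline text. Everything in it is an assumption made for illustration: the `Piece` type, `isIdentChar`, and `splitIdents` are hypothetical names, the set of allowed identifier characters is deliberately narrow, and a real extension would hook into the markdown parser rather than post-process strings.

```haskell
import Data.Char (isAlphaNum)

-- A run of inline text is split into plain text and linked identifiers.
data Piece = Plain String | Ident String
  deriving (Eq, Show)

-- Characters allowed inside an identifier in this sketch (an assumption;
-- real Haskell identifiers and operators are richer than this).
isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_' || c == '.'

-- Treat 'name' as an identifier only when both quotes are present and
-- everything between them is an identifier character. Anything else,
-- such as the apostrophe in "don't", stays plain text.
splitIdents :: String -> [Piece]
splitIdents = merge . go
  where
    go [] = []
    go ('\'' : rest)
      | (name, '\'' : rest') <- span isIdentChar rest
      , not (null name) = Ident name : go rest'
    go (c : rest) = Plain [c] : go rest

    -- Fuse adjacent Plain pieces back into a single run.
    merge (Plain a : Plain b : ps) = merge (Plain (a ++ b) : ps)
    merge (p : ps)                 = p : merge ps
    merge []                       = []
```

For example, `splitIdents "Use 'main' here"` separates `main` out as an `Ident` piece, which a backend could then turn into a hyperlink to the definition.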

Once this is done, I expect I'll be better placed to provide useful feedback. Thanks again for developing such a library.

@ntc2
Contributor

ntc2 commented May 9, 2018

This PR was closed accidentally, see this comment for how to reopen.

@harpocrates
Collaborator Author

Thanks @ntc2, but this PR should have been closed anyway. The markdown effort is being documented in #794.

5 participants