Skip to content

Find missing IO specializations #697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -268,6 +268,19 @@ jobs:
- name: 🎗️ Lint with stylish-haskell
run: ./scripts/format-stylish-haskell.sh && git diff --exit-code

################################################################################
# Lint for missing IO specialisations
################################################################################
lint-io-specialisations:
name: Lint for missing IO specialisations
runs-on: ubuntu-latest
steps:
- name: 📥 Checkout repository
uses: actions/checkout@v4

- name: 🎗️ Lint for missing IO specialisations
run: ./scripts/lint-io-specialisations.sh

################################################################################
# Lint with generate-readme
################################################################################
Expand Down
42 changes: 42 additions & 0 deletions scripts/lint-io-specialisations.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
#!/bin/sh

SCRIPTS_DIR="$(CDPATH= cd -- "$(dirname -- "$0")" && pwd -P)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? The script is expected to be run with the root of the repository as the current directory; after all, other linting scripts work based on the same expectation. Given this, it should be enough to just use scripts as the scripting directory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For robustness. All other linting scripts use git search, which is robust in that it searches from the repository base.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, git ls-files does not search from the repository base but from the current directory. For example, running the command git ls-files --exclude-standard --no-deleted --deduplicate '*.hs' (taken from scripts/lint-hlint.sh) when being in the scripts directory yields only generate-haddock-prologue.hs and generate-readme.hs.

So currently several, maybe most, of our scripts have to be run with the working directory being the repository base in order to function correctly. It would be good, of course, to change this, perhaps by adding a code snippet that just changes the working directory to the repository base right at the beginning, but, since this would be a change to be made to several scripts, it should be done in a different pull request.

absence_allowed_file="${SCRIPTS_DIR}/lint-io-specialisations/absence-allowed"
absence_finder="${SCRIPTS_DIR}/lint-io-specialisations/find-absent.sh"

IFS='
'

export LC_COLLATE=C LC_TYPE=C

printf 'Linting the main library for missing `IO` specialisations\n'

if ! [ -f "$absence_allowed_file" ]
then
printf 'There is no regular file `%s`.\n' "$absence_allowed_file"
exit 2
fi >&2
if ! sort -C "$absence_allowed_file"
then
printf 'The entries in `%s` are not sorted.\n' "$absence_allowed_file"
exit 2
fi >&2

hs_files=$(
git ls-files \
--exclude-standard --no-deleted --deduplicate \
'src/*.hs' 'src/**/*.hs'
) || exit 3
absent=$(
"$absence_finder" $hs_files
) || exit 3
missing=$(
printf '%s\n' "$absent" | sort | comm -23 - "$absence_allowed_file"
) || exit 3
if [ -n "$missing" ]
then
printf '`IO` specialisations for the following operations are missing:\n'
printf '%s\n' "$missing" | sed -e 's/.*/ * `&`/'
exit 1
fi
printf 'All required `IO` specialisations are present.\n'
Empty file.
133 changes: 133 additions & 0 deletions scripts/lint-io-specialisations/find-absent.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
#!/bin/sh

# Usage notes:
#
# * The arguments to this utility specify the files to check. If no
# arguments are given, standard input is checked. A typical usage of
# this utility is with the `**` glob wildcard, supported in
# particular by the Z Shell and by Bash with the `extglob` option
# set. For example, the following command will check all Haskell
# source files of the main library:
#
# scripts/io-specialisations/find-absent.sh src/**/*.hs
#
# * The results of this utility are not reliable, but should generally
# be correct for “reasonably styled” code. One important restriction
# is that, in order to be considered in need of having an `IO`
# specialisation, an operation must have an application of a type
# variable named `m` as its result type.
#
# Implementation notes:
#
# * The `sed` script that essentially performs all the work uses the
# hold space to hold the name of the current module and the name of
# the operation to which the most recently found `IO` specialisation
# or inlining directive refers. These two names are stored with a
# space between them. The strings before and after the space can
# also be empty:
#
# - The string before the space is empty when the module name is
# not given on the same line as the `module` keyword. This
# causes the module name to not appear in the output but
# otherwise does not have any drawback.
#
# - The string after the space is empty when no `IO`
# specialisation or inlining directive has been found yet in the
# current module or the most recently found such directive is
# considered to not be relevant for the remainder of the module.

# Find sed:
case "$(uname)" in
Darwin)
sed="$(which gsed)"
if [ "${sed}" = "" ]; then
printf 'This script requires GNU sed, which can be installed with Homebrew:\n\n' >&2
printf ' brew install gsed\n\n' >&2
exit 1
fi
;;
*)
sed="$(which sed)"
;;
esac

sed=$(which gsed)
if [ "${sed}" = "" ]; then
sed=$(which sed)
fi

Comment on lines +39 to +58
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implements two, slightly different, ways of deciding on a sed executable:

  1. gsed on Darwin, sed everywhere else
  2. gsed when available, sed otherwise

I find solution 1 unnecessarily restrictive, since there may be other systems where GNU sed is available under the name gsed (I remember there being at least gmake on the Sun servers we had at university in the 1990ies 😄). Option 2 lifts that restriction, but it unconditionally uses sed whenever there is no gsed, even on Darwin, and thus risks picking a non-GNU sed. Best is probably to use gsed whenever available, use sed if gsed is unavailable but the system is GNU/Linux, and fail otherwise.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a decent improvement, yes. As long as not having gsed on Darwin remains an error with an appropriate help message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version in #683 certainly flags absence of gsed on macOS as an error. It doesn’t provide help on how to install gsed on macOS, since I believe that a Unix script shouldn’t provide assistance in installing certain software on particular operating systems and Homebrew should be so well known among developers using macOS that virtually all of them should know how to install gsed (and the remaining few ones could find out from the first hit of a corresponding web search 🙂).

specialise='SPECIALI[SZ]E'
pragma_types="($specialise|INLINE)"
hic='[[:alnum:]_#]' # Haskell identifier character

LC_COLLATE=C LC_CTYPE=C $sed -En -e '
:start
# Process the first line of a module header
/^module / {
s/module +([^ ]*).*/\1 /
h
}
# Process a `SPECIALISE` or `INLINE` pragma
/^\{-# *'"$pragma_types"'( |$)/ {
# Remove any pragma operation name from the hold space
x
s/ .*//
x
# Add the pragma to the hold space
:prag-add
H
/#-\}/ !{
n
b prag-add
}
# Get the contents of the hold space
g
# Skip a `SPECIALISE` pragma with a non-`IO` result type
/\{-# *'"$specialise"'( |\n)/ {
s/.*(::|=>|->)( |\n)*//
/^IO / !{
g
s/\n.*/ /
h
d
}
g
}
# Store the operation name along with the module name
s/\{-# *'"$pragma_types"'( |\n)+//
s/\n('"$hic"'*).*/ \1/
h
}
# Process a potential type signature
/^[[:lower:]_]/ {
# Add the potential type signature to the hold space
:tsig-add
s/ -- .*//
H
n
/^ / b tsig-add
# Get the persistent data and save the next line
x
# Process a type signature with a context
/^[^ ]* '"$hic"'*\n'"$hic"'+( |\n)*::.+=>/ {
# Place the result type next to the operation name
s/([^ ]* '"$hic"'*\n'"$hic"'+).*(=>|->)( |\n)*/\1 /
# Handle the case of a monadic result type
/^[^ ]* '"$hic"'*\n[^ ]+ m / {
# Handle the case of pragma absence
/^[^ ]* ('"$hic"'*)\n\1 / !{
s/([^ ]*) '"$hic"'*\n([^ ]+).*/\1.\2/p
s/\.[^.]+$/ /
b tsig-fin
}
}
}
# Clean up and forget about the pragma operation name if any
s/ .*/ /
# Get the saved next line and store the persistent data
:tsig-fin
x
# Continue
b start
}
' "$@"
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
{-
Pronunciation note:

The identifiers in this module are somehow considered to be German. They
used to contain the German ä and ö, but since the script only treats English
letters as letters eligible to be part of identifiers, ä and ö were replaced
by their standard alternatives ae and oe. This all should give some
indication regarding how to pronounce the identifiers. The author of this
module thought this note to be necessary, not least to justify the choice of
module name. 😉
-}
module Animals.Sheep where

{-# SPECIALISE
boerk
::
Show a => a -> m ()
#-}
boerk ::
(Monad m, Show a) -- ^ The general way of constraining
=> a -- ^ A value
-> m a -- ^ An effectful computation
{-# SPECIALISE
schnoerk
::
Show a => m a
#-}
schnoerk
:: (Monad, m, Show a) -- ^ The general way of constraining
=> m a -- ^ An effectful computation

{-# SPECIALISE
bloek
::
IO a
#-}
bloek ::
IO a

lamb :: m a -> m a
lamb = id

{-# INLINE baeh
#-}
baeh :: Monad m => m a -> m a
baeh = id

{-# INLINE
boo #-} -- maybe too large for inlining
boo :: MonadSheep m => Scissors -> m Wool
boo scissors = withScissors scissors $ \ capability -> cut capability (fur Boo)

maeh :: a -> (b -> IO (a, b))
maeh = curry return

moeh :: Monad m => a -> (b -> m (a, b))
moeh = curry return
26 changes: 26 additions & 0 deletions scripts/lint-io-specialisations/find-absent.tests/Misc.fake-hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
module Misc
(
conv,
first
)
where

yield :: Monad m => a -> m a
yield = return

{-# SPECIALISE first :: [a] -> IO (WeakPtr a) #-}
-- | Get a weak pointer to the first element of a list.
first :: MonadWeak m => [a] -> m (WeakPtr a)
first = _

{-# SPECIALISE last :: [a] -> IO (WeakPtr a) #-}
last :: [a] -> IO (WeakPtr a)
last _ = _

{-# SPECIALISE conv :: MonadIO m => [a] -> m a #-}
conv :: (Functor f, Monad m) => f a -> m a
conv = id

{-# SPECIALISE mis :: MonadIO m => [a] -> IO a #-}
match :: (Functor f, Monad m) => f a -> m a
match = id
6 changes: 6 additions & 0 deletions scripts/lint-io-specialisations/find-absent.tests/output
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
Animals.Sheep.boerk
Animals.Sheep.schnoerk
Animals.Sheep.moeh
Misc.yield
Misc.conv
Misc.match