J for Haskell Programmers

The J programming language is like Forth. Both are old-school and obscure. Both have quirky syntax thanks to a dead simple parser and unorthodox choices. Both encourage point-free style. Both insist on terminology that evokes comparisons with natural languages, e.g. Forth calls functions words, while J calls them verbs.

However, several features make a J interpreter more challenging to write than a Forth interpreter.

Here’s an inexact decoding of J’s cute terminology (J monads are no relation of Haskell monads):

Monad       = Noun -> Noun
Dyad        = Noun -> Noun -> Noun
Verb        = (Monad, Dyad)
Adverb      = Verb -> Verb
Conjunction = Verb -> Verb -> Verb

For now, we’ll say a Noun is a multidimensional array of numbers. The arrays usually have 0, 1, or 2 dimensions (and are called atoms, lists, and tables).

J monads and dyads are just unary and binary operators, but they are always packaged together: every symbol representing a verb is overloaded. Depending on context, a verb is interpreted as a unary or a binary operator, which may be completely unrelated.

For example:

   * 42     NB. Unary * is signum.
1
   42 * 42  NB. Binary * is multiplication.
1764

Adverbs and conjunctions are higher-order unary and binary operators. They operate on verbs, though some accept nouns (not shown above), and output verbs, though some produce nouns (also not shown). Adverbs are written postfix, and conjunctions are written infix.

Thankfully the designers refrained from naming them, say, monadic and dyadic adverbs [monadverbs and dyadverbs?], and decided against overloading for adverbs and conjunctions. Since verbs possess 2 meanings, a single symbol for adverb/conjunction would have at least 4 meanings! Was this how it was with APL?

Haskell has only one J-style verb, that is, only one overloaded operator: the unary minus. Code is easier to read for a reasonable price.

J pays a higher price, as every operator is overloaded. Moreover, not only does J possess the unary minus verb, but J also employs the underscore as the unary minus in numeric literals:

   -5
_5
   _5
_5
   --5
5
   -_5
5

Crash Course

Let’s take a few simple J examples:

q=. 2 3 5 7 9                define q
1 + 2 * q                    1 plus 2 times q
+/ q                         sum q
*/\ q                        partial products of q

and rewrite them in Haskell:

q = [2,3,5,7,9]    -- define q
((1+).(2*)) <$> q  -- 1 plus 2 times q
sum q              -- sum q
scanl1 (*) q       -- partial products of q

Thus:

  • Space-separated numbers are lists.

  • Operators on numbers automatically map over list of numbers.

  • The unary version of the adverb / is like foldr1; observe sum q is equivalent to

    foldr1 (+) q  -- sum q
  • The unary version of the adverb \ is like tail . inits; observe scanl1 (*) q is equivalent to:

    foldr1 (*) <$> tail (inits q)  -- partial products of q

The following are Haskell equivalents for other examples from the above link. We assume Data.List and Data.Ratio have been imported.

p = [1,2,3,2,2]                                  -- define p
map (q!!) . (`elemIndices` p) <$> nub p          -- classify q by p
sum <$> map (q!!) . (`elemIndices` p) <$> nub p  -- sum q classified by p
snd <$> sortOn fst (zip p q)          -- sort q in order of p
zipWith ((fromRational .) . (%)) p q  -- p divided by q
zipWith (%) p q                       -- p divided by rational q
scanl1 (+) $ zipWith (%) p q          -- partial sums

We might think J is straightforward, apart from automatically extending functions so they work on lists instead of just numbers. We would be wrong.

Obscurity Through Peculiarity

The lexer and parser are full of surprises. Parentheses often do what we might think, but not always:

   1 + 3
4
   (1) + (3)
4
   1 2 3
1 2 3
   (1) 2 (3)
|syntax error
|       (1)2(3)
   a=:3
   (a=:3)
3

Same goes for operator precedence:

   4 + 2 * 3
10
   2 * 3 + 4
14

More surprises await. For example, verbs are right associative: NVNVN = NV(NVN), VVVVV = VV(VVV), while conjunctions are left associative: VCVCV = (VCV)CV.

To solve these mysteries, we must scrutinize the J lexer and parser. The verb ;: tokenizes a string according to J rules:

   ;: '1 2 3: 4 (5) _a1b _2c3 d_4e:.:. .::##::@%.. NB. comment'
   ┌───┬──┬─┬─┬─┬─┬─────────┬────────┬───┬─┬───┬─┬───┬───────────┐
   │1 2│3:│4│(│5│)│_a1b _2c3│d_4e:.:.│.::│#│#::│@│%..│NB. comment│
   └───┴──┴─┴─┴─┴─┴─────────┴────────┴───┴─┴───┴─┴───┴───────────┘

Rather than reconstruct its underlying finite state machine, we observe:

  • a number is a string of alphanumeric characters or underscore that starts with a digit or underscore

  • lists of numbers are a single token

  • apart from lists, each token begins with a string of alphanumeric characters or the underscore, or otherwise a single character.

  • apart from lists, each token may be suffixed by any number of . or : characters.

then encode these rules with parser combinators:

import Data.Char
import Data.List
import Text.ParserCombinators.Parsec

-- Gratuitously point-free and terse.
jLine :: Parser [String]
jLine = (map unwords . groupBy ((. isJNum) . (&&) . isJNum)) -- Join numbers.
  <$> (spaces >> many jToken)  -- Eat leading spaces.

isJNum s@(c:_) = (isDigit c || c == '_') && last s `notElem` ".:"

jToken = (string "NB." >>= (<$> many anyChar) . (++)) <|> ((++) <$>  -- NB.
  (many1 (char '_' <|> alphaNum) <|> count 1 anyChar)  -- e.g. "ab_12" or "#".
  <*> many (oneOf ".:")                                -- e.g. "..:.:.::.".
  <* spaces)                                           -- Eat trailing spaces.

main = print $ parse jLine ""
  "1 2 3: 4 (5) _a1b _2c3 d_4e:.:. .::##::@%.. NB. comment"

Seems we’re about right:

Right ["1 2","3:","4","(","5",")","_a1b _2c3","d_4e:.:.",".::","#","#::","@","%..","NB. comment"]

There are differences: J’s lexer preserves the whitespace between list elements exactly, while we replace such whitespace with a single space. But this is unimportant, and in any case we could tweak our code to do this.

A Parsimonious Parser

J’s parser is well-documented. Each token becomes an executable fragment, which we manipulate on a stack. We examine the top few stack elements to decide what to execute next. We may need to shift more tokens onto the stack first. During execution, a sequence of fragments gets reduced to a single fragment.

J’s bare-bones parser and executor means we can easily build a crude J-to-Haskell translator. Thanks to pattern matching, our code resembles the parsing table in the documentation.

We define an algebraic data type to hold the "parts of speech" according to J. The Cash value represents a sentinel value for the edge of the input line, and also erroneous tokens. This name was chosen because the parser documentation uses the dollar sign for similar purposes.

Adverbs require an Int field to indicate whether they depend on the monadic or dyadic variant of a verb.

The famous fork of J translates from u v w to v <$> u <*> w, a delightful use of the Reader Monad.

import Data.Char
import Data.List
import qualified Data.Map.Strict as Map
import Text.ParserCombinators.Parsec

jLine :: Parser [String]
jLine = (map unwords . groupBy ((. isJNum) . (&&) . isJNum)) -- Join numbers.
  <$> (spaces >> many jToken)  -- Eat leading spaces.

isJNum s@(c:_) = (isDigit c || c == '_') && last s `notElem` ".:"

jToken = (string "NB." >>= (<$> many anyChar) . (++)) <|> ((++) <$>  -- NB.
  (many1 (char '_' <|> alphaNum) <|> count 1 anyChar)  -- e.g. "ab_12" or "#".
  <*> many (oneOf ".:")                                -- e.g. "..:.:.::.".
  <* spaces)                                           -- Eat trailing spaces.

data Fragment = Noun String
              | Verb (String, String)
              | Adverb ((String, Int), (String, Int))
              | Conjunction
              | Copula
              | LParen
              | RParen
              | Cash
              deriving Show

jFind xs = let ws = words xs in case length ws of
  1 | all isDigit xs                  -> Noun (show (read xs :: Int))
    | xs `Map.member` dict            -> dict Map.! xs
    | otherwise                       -> Cash
  _ | all (all isDigit) ws            -> Noun (show (read <$> ws :: [Int]))
    | otherwise                       -> Cash
  where
    dict = Map.fromList
      [ ("#", Verb ("length", "replicate"))
      , ("+", Verb ("TODO", "(+)"))
      , ("-", Verb ("negate", "(-)"))
      , ("*", Verb ("signum", "(*)"))
      , ("%", Verb ("(1/) . fromIntegral",
                    "(\\x y -> fromIntegral x / fromIntegral y)"))
      , (">:", Verb ("(1+)", "(fromEnum .) . (>)"))
      , ("i.", Verb ("flip take [0..]",
                     "(\\x y -> case elemIndex y x of " ++
                     "{Just i -> i; n -> length x})"))
      , ("/", Adverb (("foldr1", 2), ("flip . (map .) . flip . (map .)", 2)))
      , ("\\", Adverb (("(. tail. inits) . map", 1), ("TODO", 2)))
      , ("(", LParen)
      , (")", RParen)
      ]

main = interact $ \input -> unlines $ (<$> lines input) $ \s -> let
  Right ws = parse jLine "" s
  xs = Cash : reverse (Cash:(jFind <$> filter (not . isPrefixOf "NB.") ws))
  in case run xs [] of
    [Cash, Noun s, Cash] -> s
    _ -> "syntax error: " ++ show xs

run xs st
  | length st < 4 = shift
  -- 0 Monad
  | ccl,    (Verb (v, _), Noun n)         <- (x1, x2) =
    run xs (x0:Noun (concat [v, " $ ", n]):x3:rest)
  -- 1 Monad
  | cclavn, (Verb _, Verb (v, _), Noun n) <- (x1, x2, x3) =
    run xs (x0:x1:Noun (concat [v, " $ ", n]):rest)
  -- 2 Dyad
  | cclavn, (Noun m, Verb (_, v), Noun n) <- (x1, x2, x3) =
    run xs (x0:Noun (concat ["(", v, " $ ", m, ") $ ", n]):rest)
  -- 3 Adverb
  | cclavn, (Verb vv, Adverb (a1, a2))    <- (x1, x2) = let
    adverb (a, 1) (v, _) = concat ["(", a, " $ ", v, ")"]
    adverb (a, 2) (_, v) = concat ["(", a, " $ ", v, ")"]
    in run xs (x0:Verb (adverb a1 vv, adverb a2 vv):x3:rest)
  -- 5 Fork
  | cclavn, (Verb (u1, u2), Verb (v1, v2), Verb (w1, w2)) <- (x1, x2, x3) =
    run xs (x0:Verb (concat ["(", v2, " <$> ", u1, " <*> ", w1, ")"],
      "TODO"):rest)
  -- 8 Paren
  | LParen <- x0, isCAVN x1, RParen <- x2 = run xs (x1:x3:rest)
  | otherwise = shift
  where
    shift | (h:t) <- xs = run t (h:st)
          | otherwise   = st

    (x0:x1:x2:x3:rest) = st
    ccl = isCCL x0
    cclavn = ccl || isAVN x0

isCCL Cash   = True
isCCL Copula = True
isCCL LParen = True
isCCL _      = False

isAVN (Adverb _) = True
isAVN (Verb _)   = True
isAVN (Noun _)   = True
isAVN _          = False

isCAVN Conjunction = True
isCAVN x           = isAVN x

Let’s run this on a few J lines:

2+3*4
4*3+2
#i.3
*/\10#2
(+/%#)i.10

We get:

((+) $ 2) $ ((*) $ 3) $ 4
((*) $ 4) $ ((+) $ 3) $ 2
length $ flip take [0..] $ 3
((. tail. inits) . map $ (foldr1 $ (*))) $ (replicate $ 10) $ 2
((\x y -> fromIntegral x / fromIntegral y) <$> (foldr1 $ (+)) <*> length) $ flip take [0..] $ 10

In an interactive GHC session (where we’ve imported Data.List), we find these evaluate to:

14
20
3
[2,4,8,16,32,64,128,256,512,1024]
4.5

So far so good. But our translator fails to automatically map operators over lists. For example, 4 + 1 2 3 becomes:

((+) $ 4) $ [1,2,3]

which is invalid. It works if we prepend map to the expression, but there’s no easy way to make our simplistic translator do this for lists, let alone arrays of higher dimensions.

We’ll need a more sophisticated approach that automatically modifies functions to work on multi-dimensional arrays of all shapes and sizes.


Ben Lynn blynn@cs.stanford.edu 💡