From: peterson-john@CS.Yale.Edu (John C. Peterson) Newsgroups: comp.lang.functional Subject: Changes in Haskell 1.3 Date: 09 Sep 1995 03:59:34 MET Organization: Yale Haskell Project Distribution: world Introducing Haskell 1.3 This new version of the Haskell Report adds many new features to the Haskell language. In the five years since Haskell has been available to the functional programming community, Haskell programmers have requested a number of new language features. Most of these features have been implemented and tested in the various Haskell systems and we are confident that all of these additions to Haskell address a real need on the part of the community. This revision to the Haskell report is much more substantial than previous ones: many significant additions are being made. We have also streamlined some aspects of Haskell, eliminating features which have been little used and complicate the language. The final version of the Haskell 1.3 is expected to be complete in October, 1995. A preliminary version of the report will be available soon. All significant changes to the Haskell language, as well as their motivation, are described here. We are still open to comments and suggestions; please send mail to peterson-john@cs.yale.edu regarding Haskell 1.3. I will be happy to answer any questions or forward mail to either the Haskell mailing list or the 1.3 committee, as appropriate. Information about the design of Haskell 1.3 and other proposed extensions to Haskell is available on the web at http://www.cs.yale.edu/HTML/YALE/CS/haskell/haskell13.html There will be some minor incompatibilities with Haskell 1.2. These should not be serious and implementors are encouraged to provide a Haskell 1.2 compatibility mode. Overview Haskell 1.3 introduces the following major features: * Standardized libraries and a reduced prelude * Constructor classes (as in Gofer) * Monadic I/O * Strictness annotations in type definitions * Simple records * A new type mechanism * Special monad syntax (`do') * Qualified names * All names are now redefinable * The character set has been expanded to ISO-8559-1 Many other smaller changes to Haskell 1.2 have also been made. A complete description of new, changed, and eliminated features follows. Prelude Changes Haskell 1.3 will make a number of minor changes to the standard prelude. Many prelude functions will be moved to libraries, reducing the size of the Haskell core language. These changes will be described separately. Standard Libraries As Haskell has grown, many informal libraries of useful functions have been created. In Haskell 1.3, we have decided to standardize a set of libraries to accompany the core language. Some of the functions formerly in the prelude are now in libraries, decreasing the size of the core language and giving the user more names in the default namespace. We are dividing the Haskell report into two separate documents: a language report and a library report. The prelude, now a little smaller, will be described in the language report. The library report will continue to evolve after the 1.3 language report is complete. We have moved much of the I/O, complex and rational arithmetic, many lesser used list functions, and arrays to the libraries and also developed a number of completely new libraries. An initial Haskell library report will be available at the same time as the 1.3 language report. Constructor Classes We have observed that many programmers use Gofer instead of Haskell to use Gofer's constructor classes. Since constructor classes are well understood, widely used, and easily implemented we have added these to Haskell. Briefly, constructor classes remove the restriction that types be `first order'. That is, `T a' is a valid Haskell type, but `t a' is not since `t' is a type variable. Constructor classes increase the power of the class system. For example, this class definition uses constructor classes: class Monad m where (>>=) :: m a -> (a -> m b) -> m b return :: a -> m a Here, the type variable `m' must be instantiated to a polymorphic data type, as in instance Monad [] where f >>= g = concat (map g f) return x = [x] No changes to the expression language are necessary; constructor classes are an extension of the type language only. Constructor classes require an extra level of type information called `kinds'. Before type inference, the compiler must perform kind inference to compute a kinding for each type constructor. Kinds are much simpler than types and are not ordinarily noticed by the programmer. The changes to Haskell required to support constructor classes are: * The syntax of types includes type application. * Built-in types have names: [] for lists, (->) for arrow, and (,) for tuples. Using type application, the type `(,) a b' is identical to `(a,b)'. * Type constructors (but not type synonyms) can be partially applied. * Type variables in interface files may be annotated with a kind. This will not affect any type variables used by Haskell 1.2 programs. * The basic monadic operations: >>, >>=, and return, will be placed in the Monad class. All monads will share the same basic operations. Monadic I/O Monadic I/O has already become the de-facto standard in the various Haskell systems. We have chosen a fairly conservative, but extensible basic design (an IO monad with error handling), the details of which are documented at http://www.dcs.gla.ac.uk/~kh/Haskell1.3/IO.html This hasn't changed recently. The old I/O systems (continuation & stream) have been totally scrapped. Most of the I/O system has been moved out to libraries. The addition of monad syntax should improve the improve the comprehensibility of programs that make heavy use of I/O. Strict Constructors To achieve reasonable efficiency, Haskell programmers often use implementation-specific annotations to mark fields of data structures as strict. These strict components are evaluated when the structure is created instead of delayed until demanded. This avoids the extra overhead involved with delays and can result in much more compact structures. To help improve efficiency, we now supply a simple strictness annotation on the types in a data declaration (or record declaration - see below). Using `!' in front of a type in a data declaration marks a structure component as strict. (The `!' symbol is not a keyword; it has no special meaning outside data declarations). Only top-level types may be marked as strict; using `!' elsewhere in a type is not allowed. Any type in a structure component may be marked as strict. data Foo a = F Int !Bool a a This declares F as a constructor containing four fields: a (non-strict) Int, a strict Bool and two (non-strict) polymorphic fields. Fully polymorphic strictness implies that values of any type may be evaluated when placed into a strict data structure. There are compelling arguments for preventing functions from being evaluated in this manner. Without going into detail about the problems associated with forcing functions, we have chosen to prevent function forcing by using a special class, Data, to explicitly mark type variables which must not be instantiated as functions. Every data type, except ->, is a member of the Data class. If we want to make a polymorphic component strict, we must add a "Data" context to the declaration: data Data a => Foo a = F Int !Bool !a a The prelude will use strictness annotations in the definition of Ratio and Complex, making these numeric types much more efficient. To avoid unnecessary propagation of the Data class, it is a superclass of Num. The functions `strict' and `seq' have been added to the prelude to give the user additional control over evaluation. While no strictness annotations have been proposed for functions, these primitives can be used to implement strict evaluation when needed. seq :: Data a => a -> b -> b strict :: Data a => (a -> b) -> (a -> b) strict f = \x -> seq x (f x) The seq function evaluates its first argument before returning the second one. strict turns an ordinary function into a strict one. Strict data constructors accumulate all arguments before evaluating them: data R = R !Int !Int R x y = seq x (seq y (makeR x y)) -- just to show the semantics of R y = R undefined -- Not an error Haskell 1.3 will not provide strictness annotations for functions. These can be constructed explicitly using seq if needed. We are not sure how strictness should interact with partial application. Records The individual components of data types in Haskell 1.2 are completely anonymous: the type definition only names the type and the constructors, not the fields of a constructor. It is inconvenient to select from, construct, or modify values associated with a constructor which has many components. A number of more general record structures have been proposed and implemented. After much discussion, we have decided to avoid the issues of inheritance or object oriented programming for the moment and provide a simple syntax which will allow the fields of a single-constructor data type to have names. Selection, construction, and update operations which use field names are provided. The layout rule applies to all uses of {;} in the following syntax. Records are declared using a new type declaration: topdecl -> data [context =>] simple = con with {sig1;sig2; ...} [deriving classes] sig -> var1, ..., varn :: [!]type n >= 1 Example: date Point = Point with pointX, pointY :: Float deriving Text This names a type, Point, a constructor, also Point, and two fields: pointX and pointY. Field names are in the same namespace as other Haskell values; a field may not share a name with other field names or variables in scope. At present, records are restricted to single constructor data types. The field labels in the record can be used as selection functions. In this example, the functions pointX, pointY :: Point -> Float can be used to access the fields of a point. Construction of a new record uses the constructor and a `with' clause to initialize fields. Any fields which have been marked as strict must be mentioned during construction. Non-strict fields are initialized to bottom if not mentioned. exp -> con with {fdef1,...,fdefn} n >= 0 fdef -> var := exp Example: p = Point with pointX := 1 pointY := 2 The with construct can also be used to (non-destructively) update record values: exp <- exp with {fdef1,...,fdefn} n >= 1 Example: q = p with pointX := pointX p + 1 The `:=' operation can also be used by itself as an anonymous update function: exp -> var := exp Example: f :: Point -> Point f = pointX := 1 Record types cannot be referenced in the pattern language. Aside from the special syntax needed to access record fields, records are ordinary algebraic data types. The design of a record system for Haskell has been difficult. There are many obvious and useful extensions to the basic notion of records which have been considered. In the end, instead of endorsing any more advanced record features (inheritance, object oriented programming, defaulting field values on construction, enhanced polymorphism, pattern matching) we have decided to introduce the bare minimum of new functionality to the language. New Types Haskell users have often needed to declare a new name for an existing type. There were two ways of doing this: declaring a synonym, as in type Foo = Int or creating a type wrapper, as in data Foo = Foo Int This first approach has no runtime overhead, but does not distinguish the new type from the old. More seriously, instances cannot be attached to the new type, only the old one. The other approach incurs an additional overhead in the representation. An additional type declaration will provide a more efficient way of creating a new type from an existing one: newtype context => simple = con type [deriving classes] Examples: newtype Foo = Foo Int deriving Text newtype Bar a = Bar [a] deriving (Text,Eq) newtype LongName = L Int This new type is used in the same manner as a wrapper type and, similarly, the type name need not match the constructor name. The constructor is used for explicit coercion between the new and existing types. Derived instances simply re-use instances already attached to the existing type (except for Text, which prints the constructor as it would for a wrapper type). At first glance, it looks as though the newtype "Foo" could have been defined using normal data types and strictness annotations. data Foo = Foo !Int deriving Text The difference between the two is rather subtle (but important): * Using strictness annotations, evaluation of the field occurs when a Foo is created. That is: case Foo undefined of Foo _ -> True = undefined * Using newtype, evaluation of the field occurs when the field is used. That is case Foo undefined of Foo _ -> True = True *It is perfectly legal to write: newtype ST s a = ST (s -> (s,a)) but not to write: data ST s a = ST !(s -> (s,a)) since that would require a Data context for function spaces. Monad Syntax To make monadic programming more readable, `do' syntax has been added to Haskell: Syntax: Expansion: exp -> do {stmt} stmt -> exp exp stmt -> pat <- exp ; stmt exp >>= \pat -> do stmt stmt -> exp ; stmt exp >> do stmt stmt -> let decls ; stmt let decls in do stmt The Haskell layout rules already allow you to write: let x = 1 y = 2 in x + y as an abbreviation for: let { x = 1; y = 2 } in x + y By the same rules, it is possible to write: do x <- getInt y <- getInt return (x + y) as an abbreviation for: do { x <- getInt; y <- getInt; return (x + y) } C programmers might prefer the latter. As in a list comprehension, variables bound by patterns are scoped over the following statements in the do. Unlike list comprehensions, pattern match failure results in a runtime error. The do has the same type as the last statement. A let in a do is scoped over all remaining statements. Note there is no in; this distinguishes this from an ordinary let. Haskell 1.3 omits several features from earlier proposals and the Gofer implementation. In particular, there is no support for "guards". do x <- foo if even x bar The problem is that in some monads, guard failure should yield a zero while in others it should cause an exception or even an error (with an appropriate error message). Fortunately, there's no need to wrestle with these thorny semantic issues since it's trivial for programmers to define and use their own "guard functions". -- raise an error if condition fails. assert :: Monad m => Bool -> String -> m () assert p msg = if p then return () else error ("Assertion failed: " ++ msg) -- return a zero if condition fails. guard :: MonadZero m => Bool -> m () guard p msg = if p then return () else zero do x <- foo assert (even x) "foo didn't return an even result at line 32 of Bar.lhs" y <- bar x guard (x `elem` y) return y Qualified Names Current Haskell practice is for all outside names to be brought into scope using `import'. This has some disadvantages: renaming is required to resolve name clashes between different modules and the names themselves provide no hint of where they come from. In the worst case (infrequently used names which have been imported without explicitly listing what is being imported), this forces programmers to use grep (or similar technology) to locate the definition of the name. Qualified names are built from a module name and a local name, separated by a `.'. Thus, `Prelude.foldr' refers to the foldr exported from Prelude. (Note that qualified names are not the same as `original names': an original name uses the defining module; here, the exporting module is named). Qualified names have two advantages: * Using qualified names prevents or resolves name clashes between different modules. * Qualified names result in more readable code since import declarations need not be consulted to find the defining module for a name. A qualified name can be used for any entity. Short names, omitting the module and separating `.' can be used only for locally defined names, or for entities which are imported unqualified (i.e. ones mentioned in an import declaration which does not use the "qualified" keyword). For example, the following import declaration brings a module name into scope, but only for use with qualified names. The module may be locally renamed, allowing a shorter or more descriptive module name to be used in the importing module, or allowing the imported module to change with minimal effect on the program. topdecl -> import qualified con [as con] Only modules brought into scope using a qualified import can appear in qualified names. Qualified names are defined in the lexical syntax. Thus, `Foo.a' and `Foo . a' are quite different. No whitespace is permitted in a qualified name. Symbols may also be qualified: `Prelude.+' is an operator which can be used in exactly the same manner as `+'. Note that the `.' operator presents a syntactic problem. It causes `Foo..', which is widely used in export lists and occasionally in arithmetic sequences, to parse as a qualified name instead of a usage of the `..' token. Inserting a space before the `..' will resolve this problem. Redefinable Names A minor complaint about Haskell has been that names defined in PreludeCore cannot be redefined in any way. Since many operators, like +, -, >, and ==, are defined in PreludeCore this has made some users deeply unhappy -- this restriction prevents any sort of alternative numeric class structure which uses the standard symbols, for example. There is no real reason for stealing all these names from the user; in Haskell 1.3 PreludeCore is no longer special: all ordinary names are redefinable. Special syntax will remain attached to Prelude names; thus `[x]' would always refer to lists as defined in the Prelude. As before, the Prelude module is implicitly imported (in unqualified form) unless an explicit import is found. There is also an implicit (and unavoidable) qualified import of the Prelude which is used to define the meaning of various pieces of syntactic sugar. This eliminates the need to make PreludeCore symbols immutable. Since qualified names can always be used for imported entities, any Prelude entities that the programmer has chosen to shadow can still be referred to using `Prelude.'. Expanded Character Set This change in character set supports both an expanded Char type at execution time and a greatly increased vocabulary for constructing program names. Many new characters have been added to `small', `large', and `symbol' in the lexical syntax. The revised report will have full details. Other Changes to Haskell C-T Rule relaxation The C-T rule restricts the placement of instance declarations to the module containing either the class or data type definition associated with an instance. This is unnecessarily restrictive and will be relaxed in Haskell 1.3. The new restriction will be that no more than one instance for a given class - datatype pair is allowed in a program. (Sadly, there's little hope of detecting violations of this rule before link time.) The visibility of instance declarations presents a problem. Unlike classes or types, instances cannot be mentioned explicitly in export lists. Instead of changing the syntax of the export list, we have adopted a simple rule: when a module exports a class or a type, any associated instances in scope are also exported. Thus, the module module I where import M(C,T) instance C T where ... would not export the instance C T. On the other hand, adding either C or T to the export list of I would cause the instance to be exported from I. Any module importing C or T from I would have this instance declaration in scope. This will not break any existing applications since instances must appear in the module defining C or T and will thus be exported with them if they are used elsewhere. Polymorphic Recursion This is a simple extension to the type system which allows the user to attach a more general type to a recursive function than would be normally be inferred. For example, in f x y = if f True False then x else y the type of f would normally be inferred as `Bool -> Bool -> Bool'. However, polymorphic recursion allows the user to add a type signature to f such as `f :: a -> a -> a' to give it a more general type. While polymorphic recursion is not likely to be needed by the average Haskell user, it occasionally will prevent the type checker from complaining about a quite sensible looking program. Interface Files The current report requires that interface files be transitively closed and that the original names of all entities be supplied. Such interfaces are appropriate for compilers to generate, but are not necessarily useful to the programmer. Although individual compilers can still support the old style of interface, the report will be changed to allow interfaces which use exactly the same imports found in the implementation. Only locally defined entities need appear in these interfaces. List Comprehension Let Bindings The syntax of list comprehensions is being expanded to include qual -> let {decls} (The absence of `in' distinguishes this from a "let"-expression in a guard). These decls are scoped over the remaining qualifiers and the generated expression. Binding is irrefutable: pattern match failure is a program error. The expansion of this is trivial: it expands directly into a conventional let. Standard Annotations A number of pragmas, already widely used, will be standardized. Currently, the proposal calls for {-# Inline f #-} {-# NotInline f #-} {-# Specialize f :: type #-} Exact syntax has not yet been worked out. Minor Syntax Minor syntax changes include: * There is a new syntax for hex and octal constants * The interaction of layout and explicit braces has been clarified * The types listed in a default declaration must always be in parentheses * Empty export lists are now allowed * Presymbols are gone; this allows "~" and "-" to be embedded in a symbol * Modules with an omitted module name export only `main' * Extra commas are allowed in import / export / hiding lists * Interfaces cannot use `deriving' * \begin{code} - \end{code} is now allowed in literate files as an alternative to ">" ("Bird tracks") Features Removed from Haskell A number of features have been removed from Haskell. Import declarations no longer have a renaming clause. Qualified names should be used to handle name clashes. This also removes the need for the `single name' rule (see Section 5.1.2 of the report). n+k patterns are being removed from the language. While this will no doubt break some existing code, it removes a highly irregular case in both the syntax and semantics. (The convoluted decl syntax came as a result something like `x+1 = 2', which could either be an infix definition of + or a pattern binding of x to 1. The parse for `x+1 = 2' and `(x+1) = 2' is completely different at the moment!) Unfinished Business These issues are still being debated. * The syntax for `newtype' is not certain. It would be possible to fold this into an ordinary data declaration: current: newtype N a = N (T a) proposed: data N a as N (T a) --? not sure of this syntax Separate keywords would perhaps be less confusing. On the other hand, there is no distinction between these sorts of declarations in the type language. * Records could be allowed for any constructor of a multi-constructor type. If multi-constructor types are used, pattern matching must be able to (at least) discriminate on the constructor: data R = R1 with f1 :: Int | R2 with f2 :: Bool f :: R -> Int f x = case R1 -> = f1 x -- No fields allowed in pattern match _ -> 0 f :: R -> Int f (R1 with f1 <- f) = f -- Allows pattern match against fields f _ = 0 * Record updates could be either strict or lazy: record Point = Point with pointX, pointY :: Int a = pointX (undefined with pointX <- 1) With a lazy implementation, a is 1, while with a strict implementation a is undefined. The lazy semantics allow records to be created by adding slots to a totally uninitialized value (undefined) while, to accomodate strict slots, the strict semantics require an explicit record creation construct. Mark Jones sums up the debate thusly: Lazy semantics Strict semantics -------------------- ---------------- Point is a first class value, Point with ... is a language the uninitialized record construct for building new points expr with fields is a construct expr with fields is a construct for updating a record, non-strict for updating a record, strict in in the expr part. the expr part. Arguably more in-line with Arguably, not an issue. non-strict semantics of Haskell. Performance hit in naive Easier to obtain a relatively implementations. efficient implementation. At the naive implementation level, lazy semantics require fresh thunks for all slots not being updated. With the strict semantics, thunks in the old record can be recycled. Implementors wonder whether the naive implementation can be easily optimized. * The field name namespace issue is (perhaps) unresolved. The other candidate is ML style localized namespaces for field names. This would allow for simpler field names (x instead of pointX) but would make type inference more difficult. Type hints may be required to disambiguate field names. * A qualified name could be made available either through an explicit qualified import or via any mention of a module in an import. Consider: module Foo where import M(x) y = M.z + x If explicit qualified imports are required, this would be an error since there is no import qualified M(z) or just import qualified M. * Instead of a separate qualified import, module could be imported with an empty list of entities: import M() This would allow M to be used as a qualifier. * There is a proposal in the works to make export lists more explicit, as in module E ( module M(..), class C, type T(..), instance M.C E.T, N.v, x ) where ... This would avoid the syntax problem associated with M.. at the moment. It would also allow the class and type namespaces to overlap. * Pattern match failure inside do could be a "zero" in a monad with zero instead of an error. Unfortunately, the IO monad does not have a true zero ( failWith "error" doesn't quite work). It would be possible to allow do syntax to be used with both Monad and Monad0 by eliminating or disallowing refutable patterns when the monad is not in Monad0. Thus, do (x:xs) <- readLine return xs would result in a type error since the pattern (x:xs) is refutable. Using ~(x:xs) would work.