淘先锋技术网

首页 1 2 3 4 5 6 7

类型系统

摘自《Types and Programming Languages 》by Benjamin C. Pierce

刘建文略译(http://blog.csdn.net/keminlau

1.1 Types in Computer Science

Modern software engineering recognizes a broad range of formal methods for helping ensure that a system behaves correctly with respect to some specification, implicit or explicit, of its desired behavior. On one end of the spectrum are powerful frameworks such as Hoare logic, algebraic specification languages, modal logics, and denotational semantics. These can be used to express very general correctness properties but are often cumbersome to use and demand a good deal of sophistication on the part of programmers. At the other end are techniques of much more modest power — modest enough that automatic checkers can be built into compilers, linkers, or program analyzers and thus be applied even by programmers unfamiliar with the underlying theories. One well-known instance of this sort of lightweight formal methods is model checkers , tools that search for errors in finite-state systems such as chip designs or communication protocols. Another that is growing in popularity is run-time monitoring , a collection of techniques that allow a system to detect, dynamically, when one of its components is not behaving according to specification. But by far the most popular and best established lightweight formal methods are type systems , the central focus of this book.

现代软件工程识别出一系列涉及面很广(broad range)的形式方法( formal methods )用以帮助保证系统按照规范(或详细规格)能按照预期的目的正确地运行。在涉及面的一端是强大的框架,像Hoare 逻辑、代数规范语言、模型逻辑和形式语义学。这些形式方法可用来表达很通用的正确属性(general correctness properties),但不适合部分程序员使用,因为它们过于复杂了。在涉及面的另一端则是特殊的功能也很小的自动机,小到可以做进编译器、链接器或程 序分析器。这种轻量的形式方法的一个有名例子就是模型检查器( model checkers ),一 种搜查有限状态机(如芯片设计或通讯协议)错误的工具。运行时检测(监控)则是另一种越来越流行的模型检查器。它是由一组功能的结合,能够动态地发现系统 的组件不按规范作业。不过,到目前为止最流行最常用的轻量的形式方法却是(编程语言的)类型系统,类型系统也是本书的中心。

As with many terms shared by large communities, it is difficult to define "type system" in a way that covers its informal usage by programming language designers and implementors but is still specific enough to have any bite. One plausible definition is this:

A type system is a tractable (易驾驭的) syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute .

A number of points deserve comment. First, this definition identifies type systems as tools for reasoning about programs . This wording reflects the orientation of this book toward the type systems found in programming languages. More generally, the term type systems (or type theory ) refers to a much broader field of study in logic, mathematics, and philosophy. Type systems in this sense were first formalized in the early 1900s as ways of avoiding the logical paradoxes, such as Russell's ( Russell, 1902 ), that threatened the foundations of mathematics. During the twentieth century, types have become standard tools in logic, especially in proof theory (see Gandy, 1976 and Hindley, 1997 ), and have permeated the language of philosophy and science. Major landmarks in this area include Russell's original ramified theory of types ( Whitehead and Russell, 1910 ), Ramsey's simple theory of types (1925) — the basis of Church's simply typed lambda-calculus (1940) — Martin-Löf's constructive type theory (1973, 1984), and Berardi, Terlouw, and Barendregt's pure type systems ( Berardi, 1988 ; Terlouw, 1989 ; Barendregt, 1992 ).

就像其它被广泛使用的计算机术语一样,很难给“类型系统”一个准确的定义,因为它被编程语言设计者和实现者非形式使用好多年了。不过,我们还是试着定义一下:

类型系统是一种易控制的句法方法,用以根据计算值的类型划分短语来证明程序行为的存在性。

这 里需补充几点。首先,这个定义说明了类型系统是用于对程序进行推理的工具。这也反映了本书所面向的编程语言里的类型系统(而不是别的)。更一般的地说,类 型系统(类型理论)这个术语涉及逻辑、数学和哲学等研究领域。在这些领域,类型系统首次被提出用于解决罗素的逻辑悖论……FIXME

Even within computer science, there are two major branches to the study of type systems. The more practical, which concerns applications to programming languages, is the main focus of this book. The more abstract focuses on connections between various "pure typed lambda-calculi" and varieties of logic, via the Curry-Howard correspondence ( §9.4 ). Similar concepts, notations, and techniques are used by both communities, but with some important differences in orientation. For example, research on typed lambda-calculi is usually concerned with systems in which every well-typed computation is guaranteed to terminate, whereas most programming languages sacrifice this property for the sake of features like recursive function definitions.

即便在计算机科学,类型系统也存在两个研究方向。面向实践的关注类型系统在编程语言的应用;面向理论的则关注各种“纯类型化的lambda计算”与逻辑变种的链接。两个方向都使用了相似的概念、符号和技术,但某些方面存在着重大的区别……FIXME

Another important element in the above definition is its emphasis on classification of terms — syntactic phrases — according to the properties of the values that they will compute when executed. A type system can be regarded as calculating a kind of static approximation to the run-time behaviors of the terms in a program. (Moreover, the types assigned to terms are generally calculated compositionally , with the type of an expression depending only on the types of its subexpressions.)

定义补充http://www.pphsg.org/cdsmith/types.html

  • syntactic method .. by classifying phrases : A type system is necessarily tied to the syntax of the language. It is a set of rules for working bottom up from small to large phrases of the language until you reach the result.
    划分短语的句法方法 :类型系统必须捆绑语言的语法,语法也就是自下向上的从单词到句子的规则集;
  • proving the absence of certain program behaviors : This is the goal. There is no list of "certain" behaviors, though. The word just means that for any specific type system, there will be a list of things that it proves. What it proves is left wide open. (Later on in the text: "... each type system comes with a definition of the behaviors it aims to prevent.")
    证明程序行为的不存性 :这是目标。程序的行为没有一个确定的列表。证明程序行为的存在性指的是任何一个特定类型系统都有需要它证明的类型列表……FIXME
  • tractable : This just means that the type system finishes in a reasonable period of time. Without wanting to put words in anyone's mouth, I think it's safe to say most people would agree that it's a mistake to include this in the definition of a type system. Some languages even have undecidable type systems. Nevertheless, it is certainly a common goal; one doesn't expect the compiler to take two years to type-check a program, even if the program will run for two years.
    易控制的 :意思是说类型系统能在一个合理的时间内完成工作。

定义中需要重点补充的第二点是,定义强调句法短语的划分,划分是根据句法短语在运行时计算值的性质。类型系统可以看成是对程序运行时(句法短语的)行为的静态近似的计算。另外,一个句法短语的类型常常是合成的,由句法短语的子表达式的类型决定它的类型。

The word "static" is sometimes added explicitly — we speak of a "statically typed programming language," for example — to distinguish the sorts of compile-time analyses we are considering here from the dynamic or latent潜伏性 typing found in languages such as Scheme ( Sussman and Steele, 1975 ; Kelsey, Clinger, and Rees, 1998 ; Dybvig, 1996 ), where run-time type tags are used to distinguish different kinds of structures in the heap. Terms like "dynamically typed" are arguably雄辩地 misnomers误称 and should probably be replaced by "dynamically checked," but the usage is standard.

“ 静态”有时会显式的加上,比如“静态类型编程语言”,用以与一些编译时分析的动态类型系统的区分,比如像Scheme 使用的动态或潜在类型系统。Scheme运行时类型标签用来区分在堆内存的数据结构。“动态类型“这个词常常被误用,其实按照标准的说法应该是”动态检查 “。

Being static, type systems are necessarily also conservative保守的: they can categorically prove the absence of some bad program behaviors, but they cannot prove their presence, and hence they must also sometimes reject programs that actually behave well at run time. For example, a program like

if <complex test> then 5 else <type error>

will be rejected as ill-typed, even if it happens that the <complex test> will always evaluate to true , because a static analysis cannot determine that this is the case. The tension between conservativity and expressiveness is a fundamental fact of life in the design of type systems. The desire to allow more programs to be typed — by assigning more accurate types to their parts — is the main force driving research in the field.

静态意味保守:尽管类型系统能够很明确不存在有坏的程序行为,但是由于它不能保证所有情况,所以一些在运行良好的程序行为也会被类型系统拒绝。比如,像程序:
if <complex test> then 5 else <type error>
会被认为是坏类型而被拒绝,虽然 <complex test>永远都是“真”。因为静态分析不能判断 <complex test>永远是“真”这一事实。保守性与表达能力之间的张力是类型系统设计所面临的基本问题。为了尽量的使用程序类型化(通过给程序的各个部分一个精确的类型)是这一研究领域的主要动力。

A related point is that the relatively straightforward analyses embodied in most type systems are not capable of proscribing(禁止; 放逐; 排斥) arbitrary undesired program behaviors; they can only guarantee that well-typed programs are free from certain kinds of misbehavior. For example, most type systems can check statically that the arguments to primitive arithmetic operations are always numbers, that the receiver object in a method invocation always provides the requested method, etc., but not that the second argument to the division operation is non-zero, or that array accesses are always within bounds.

但问题是,大多数类型系统里内置的相对简单的分析没能力禁止那些恣意的程序行为;它们只保证类型良好的程序不会有某些特定性质的非法行为。比如,大部分类型系统可以检测出算术运算操作的操作数是否都是数值……但不能检测诸如被除数是否为零、数组访问是否越界等。

The bad behaviors that can be eliminated by the type system in a given language are often called run-time type errors . It is important to keep in mind that this set of behaviors is a per-language choice: although there is substantial overlap between the behaviors considered to be run-time type errors in different languages, in principle each type system comes with a definition of the behaviors it aims to prevent. The safety (or soundness ) of each type system must be judged with respect to its own set of run-time errors.

能被语言的类型系统去除的坏程序行为常被称为运行时类型错误(run-time type errors)。这里必须注意的是这些运行时类型错误是由语言本身性质决定的(当然不同的语言的类型系统所面向的运行时错误会有重叠的部分),从某种意思上看,类型系统的设计就是为避免某些运行时错误的发生。一种语言的安全性体现在其类型系统对运行时错误的可控制性上。

The sorts of bad behaviors detected by type analysis are not restricted to low-level faults like invoking non-existent methods: type systems are also used to enforce higher-level modularity properties and to protect the integrity of user-defined abstractions . Violations of information hiding, such as directly accessing the fields of a data value whose representation is supposed to be abstract, are run-time type errors in exactly the same way as, for example, treating an integer as a pointer and using it to crash the machine.

类型系统能发现的坏程序行为不仅仅于一些低级错误,像调用一个不存在的函数,类型系统也可以检查高级的(代码)模块性来保证抽象机制的实施。一些违反信息隐藏的行为,像直接访问被抽象过的数据字段,会像其它运行时类型错误那样当掉运行的程序,比如把一个整数当成指针用。

Typecheckers are typically built into compilers or linkers. This implies that they must be able to do their job automatically , with no manual intervention or interaction with the programmer — i.e., they must embody computationally tractable analyses. However, there is still plenty of room for requiring guidance from the programmer, in the form of explicit type annotations in programs. Usually, these annotations are kept fairly light, to make programs easier to write and read. But, in principle, a full proof that the program meets some arbitrary specification could be encoded in type annotations; in this case, the typechecker would effectively become a proof checker. Technologies like Extended Static Checking ( Detlefs, Leino, Nelson, and Saxe, 1998 ) are working to settle this territory between type systems and full-scale program verification methods, implementing fully automatic checks for some broad classes of correctness properties that rely only on "reasonably light" program annotations to guide their work.

(类型系统)的类型检测自动机常常被内建入编译器或链接器,这也意味着类型检测会自动进行(并且是易控制的在可接受的时间内完成的),不需要人工干预。不过,类型检测在别的地方还存在很多需要程序员参与的空间,比如显式的给程序标注(annotations)类型。这种标注通常都很少量的,为了让程序更易写易读。从原理上讲,通过给类型标注可以实现给程序作全面的形式证明,证明其符合规范。这种情况下,类型检测自动机完全变成了证明检查器……FIXME

By the same token, we are most interested in methods that are not just automatable in principle, but that actually come with efficient algorithms for checking types. However, exactly what counts as efficient is a matter of debate. Even widely used type systems like that of ML ( Damas and Milner, 1982 ) may exhibit huge typechecking times in pathological cases ( Henglein and Mairson, 1991 ). There are even languages with typechecking or type reconstruction problems that are undecidable, but for which algorithms are available that halt quickly "in most cases of practical interest" (e.g. Pierce and Turner, 2000 ; Nadathur and Miller, 1988 ; Pfenning, 1994 ).

参考