Compiling

But in our enthusiasm, we could not resist a radical overhaul of the system, in which all of its major weaknesses have been exposed, analyzed, and replaced with new weaknesses.
Bruce Leverett - "Register Allocation in Optimizing Compilers"
Compiling, linking, assembling, program analyes...
The Value Dependence Graph (VDG) seems to be the purest form, where imperative programs are transformed into a functional, graphical form (including reading and writing explicit stores), and is being used by Microsoft (gasp! :-) Research, under the direction of Daniel Weise, for compiler analyses.
http://www.research.microsoft.com/research/analysts/
Yep. We have a pure functional form for reasoning about real C programs. Our take is that a functional form is great for reasoning about programs, but less than convenient for writing programs. (Though recent work on state in functional languages is encouraging.) A program generally has "gratuitous state" and "real state." Gratuitous state is the result of non-shared, non-aliased variables used as scratchpads or loop arguments. We easily eliminate the gratuitous state to end up with a global state variable only where needed due to aliasing. Most of the examples people point to when claiming imperative programs are harder to reason about than functional programs involve gratuitous state, which is a little silly. The stuff that's really hard to reason about are the linked datastructures sitting in the heap. We're working on ways to break the residual state into multiple pieces to speed analysis and make it more precise. (As a result of our approach, we do a better job of finding common subexpressions than other systems, which is what this thread sort of started about. But in truth, there's no real gain to be had from doing this in real programs.)
The idea is that a functional representation makes analyses and optimizations easier. From what I have read, it sounds like they are using partial evaluation techniques (Fuse-like ones) to some extent.
Nope. We use beta-substitution, which many people confuse with PE, and we'll perform algebraic simplification, as any good compiler will, but there's no PE. (Although the graphs we use are a direct descendent of Fuse's graphs.)

Daniel


(25.08.95) Compiler-Related Internet Resource List noch nicht naeher angesehen
You can find information on GANDF (ANDF installers built from gcc back end) via the OSF RI Web server:

http://riwww.osf.org:8001/index.html

The ANDF page is:

http://riwww.osf.org:8001/andf/index.html

At the end of the page there is a link to "List of ANDF Papers", which includes the ANDF/TDF specification.

At the Grenoble OSF RI we have participated in the development of an ANDF interpreter. This was described at the ACM IR'95 workshop on intermediate languages - the paper "Validation of ANDF Components" is available at URL:

http://www.gr.osf.org:8001/general/gren_ri_andf_papers.html

Please contact me if you are interested in using this tool.

James (loveluck@gr.osf.org (James Loveluck))


Newsgroups: comp.compilers
From: collberg@cs.aukuni.ac.nz (Christian Collberg)
Subject: Linking References Available
Keywords: linker, bibliography
Organization: University of Auckland
Date: 30 Oct 1994 23:49:44 MET

I'm maintaining a list of references to linking:

http://cs.aukuni.ac.nz/~collberg/Research/References/linking.html

Currently there are 54 BibTeX entries.
Additions/corrections are welcome.

Enjoy,

Christian


--
___________________________________________________________________________
Christian Collberg                |   Email: c_collberg@cs.auckland.ac.nz
Computer Science Department       |   Fax:   +64-9-373-7453
University of Auckland            |   Phone: +64-9-373-7599 x 6137

Stanford SUIF home page
Subject: Need WWW or FTP addr. for information on alias/pointer-analysis

Check out http://acaps.cs.mcgill.ca/ftp/ftp.html


Newsgroups: comp.compilers
From: norman@flaubert.bellcore.com (Norman Ramsey)
Subject: Help available for direct manipulation of machine code
Keywords: code, tools, available
Organization: Bellcore
Date: 20 Dec 1994 19:36:10 MET

There has been some discussion recently about getting compilers to
emit object code instead of assembly language. The ``New Jersey
Machine-Code Toolkit'' helps with this and related problems, making it
easy to emit and recognize machine instructions in binary form. (It
doesn't help much with the rest of the object file.)

Applications that use the toolkit are written at an assembly-language
level of abstraction, but they recognize and emit binary. Guided by a
short instruction-set specification, the toolkit generates all the
bit-manipulating code. We've written specifications for the MIPS
R3000, SPARC, and Intel 486.

We have written two applications: a retargetable debugger and a
retargetable, optimizing linker. The toolkit generates efficient
code; for example, the linker emits binary up to 15% faster than it
emits assembly language, making it 1.7--2 times faster to produce an
a.out directly than by using the assembler.

More information, the software, and annotated machine descriptions,
can be found on the Web at URL
http://www.cs.princeton.edu/software/toolkit
Those without http access can ftp it all from
ftp://ftp.cs.princeton.edu/pub/packages/toolkit/v0.1a.tar.gz
A paper about the toolkit will appear in the 1995 Winter USENIX Conference.

Norman Ramsey Bell Communications Research norman@bellcore.com
Mary Fernandez Princeton University mff@cs.princeton.edu
Newsgroups: comp.compilers
From: dmclaugh@usa.acsys.com (David Paul McLaughlin)
Subject: my personal bug lists for compilers
Keywords: errors, WWW
Organization: Applied Computing Systems, Inc.
Date: 14 Dec 1994 19:20:13 MET

Here's is the URL:

/http://www.acsys.com/~dmclaugh/

have fun!
--
David P. McLaughlin
Applied Computing Systems, Inc.
David.McLaughlin@USA.ACSys.COM
(26.03.95) BTYACC announcement:
[..] an enhanced version of yacc, which supports backtracking in the event of parse conflicts and inherited attributes. These are actually closely connected as inherited attributes in a bottom-up parse tend to introduce conflicts. The program is public domain and is available for ftp.

jimad@microsoft.com (Jim Adcock) writes:
>What we see emerging are "standard object models", calling conventions,
>etc, for each particular operating system, such that compilers, libraries,
>apps, etc on that particular operating system can communicate with each
>other.

Unfortunately this can't happen on Microsoft platforms, it appears, since
Microsoft has patented their object layout. (Jim, I certainly know you
don't set policy for Microsoft, but since you bring up the issue...)

To view this patent, entitled "Method and system for implementing virtual
functions and virtual base classes and setting a this pointer for an
object-oriented programming language", number 5,297,284, point your
Web browser at

http://www.town.hall.org/Archives/patent/data1/05297/05297284

Read it and weep. Committee members, don't you wish *you* had this much
gall?

In essence, Microsoft patented an improvement on the "thunks" method
described in the ARM -- an improvement that has been independently
re-invented by many folks, including myself and Per Bothner of Cygnus,
before the patent was published, but may be OK anyway if there wasn't
a publication describing it. It would appear that any other compiler
vendor implementing objects with multiple inheritance or with virtual
base classes in a compatible way would infringe the patent. They could
license the patent, but this would rule out a compatible g++ port.

It would be legal for Microsoft to refuse to license this patent
altogether, meaning no legal compiler could be compatible. Aren't
software patents fun? Any dominant player in any market can make it
illegal for competitors to be compatible.
--
-- Joe Buck <jbuck@synopsys.com> (not speaking for Synopsys, Inc)
Phone: +1 415 694 1729

(21.03.95)
Newsgroups: comp.compilers
From: "Ronald F. Guilmette" <rfg@rahul.net>
Subject: Reliability (was: Re: Optimizing Across && And ||)
Keywords: optimize, design, testing
Organization: a2i network
Date: 16 Mar 1995 19:46:33 MET

David Keppel <pardo@cs.washington.edu> wrote:
>
>Opinion alert:
>... [Stuff about why implementors spend so much time worring about their
>     SPECmark numbers deleted]...
>
>I think we're starting to see changes here already, but it's an
>uphill battle to sell reliability...
Tell me about it! I sell compiler test suites for C and C++. It's a hard sell, even at the best of times, even though I price these things dirt cheap (relative to what it would cost for the compiler vendors to reproduce even a small fraction of my test cases).

I even had a guy at one local semiconductor firm say to me once "If we buy your test suite, then we will find some more bugs in our compiler. So what?"

I kid you not! This is really what he said.

>... which is unpleasant to talk about...

Compiler quality is only unplesant to talk about for those folks that don't have it. (I could name names here, but I won't.)

>... and hard to measure.

Rubbish! I've developed sophisticated automated techniques for producing randomized compiler test cases that root out even the most subtle compile- time bugs. Free samples are available for anonymous ftp in the files:

ftp.rahul.net:pub/rfg/roadtest/sample.1
ftp.rahul.net:pub/rfg/roadtest/sample.2

If you want to see your favorite C compiler screw up and give errors on some perfectly valid C code (and maybe even give up the ghost) just try compiling these examples. They found bugs in every one of the 20+ C compilers I've tried then on so far (although some verdors have since fixed the bugs which are triggered by my test cases).

Suggestion: If you try these test cases on some C compiler(s), I suggest disabling all warnings when doing the compiles. Any actual _errors_ that you get will be due to compiler bugs.

>Performance is less hard (but not easy) to measure and obviously desirable.

I can't remember who said it, but I am often reminded of a wonderful quote: "It is easy to get incorrect answers infinitely fast."

I got myself the two samples and submitted them to two ansi C compilers available here, gcc -ansi and Sun's acc. Only one case (sample.2 and gcc) resulted in a compiler crash. Here are the results:


A little while ago somebody on comp.compilers asked if the source for Pascal-S, a subset Pascal compiler, assembler and interpreter described in a paper by Wirth some years ago, was available on the net. I offered a version that I subsequently found had been vapourized in a VAX crash some years ago. However, by sniffing around the net I found another copy that I have modified to run under Turbo Pascal version 5. You can get this from my ftp server:

cscx.cs.rhbnc.ac.uk::/pub/compilers/pascals.pas

Pascal-S is written in Pascal, and forms an excellent introduction to the art of designing small compilers. The best source for the paper is in an edited collection by Baron entitled `Pascal - the language and its implementation' I should think it is out of print now, but most university libraries should have a copy. A modified version was used by Ben-Ari in the first edition of his book on concurrency.

Adrian

PS If you are interested in this kind of thing, you might like to join the mailing list for my RDP compiler-compiler which is a good way to start learning about recursive descent parsers. Send me a note if you want to find out more.

--
  Dr Adrian Johnstone, Dean of the Science Faculty, Dept of Computer Science,
    Royal Holloway, University of London, Egham, Surrey, TW20 0EX, England.
Email: adrian@dcs.rhbnc.ac.uk  Tel: +44 (0)1784 443425  Fax: +44 (0)1784 443420

Newsgroups: comp.compilers
From: quong@merlot.ecn.purdue.edu (Russell W Quong)
Subject: Another backtracking parser generator:  ANTLR.
Keywords: parse, PCCTS, LL(1)
Organization: Purdue University Engineering Computer Network
Date: 30 Mar 1995 19:49:24 MET

Along the subject of parser generators that produce backtracking
parsers, please be aware of the current release (1.31) of ANLTR,
which was developed by Terence Parr and myself.

ANTLR
 - produces recursive-descent LL(k) parsers.  In practice,
   k is limited to 3 or 4 depending on the grammar size.
     * allows the use of "predicates" to direct the parse.
     * Backtracking is goes under the name "syntactic predicate"

 - produces recursive-descent LL(k) parsers, which are easy to use/debug.
     * generates C++ code in the form of C++ parser classes,
        from which the user can subtype
     * it can also generate C code, though this option is showing its age.

 - has been a public-domain tool since the early 1990's,
   and is perhaps the most widely parser-generator after yacc/bison.

 - lexical analyzer support built in (no need for separate lexer file)

 - is part of the Purdue Compiler Construction Toolset (PCCTS)

 - has a newsgroup: 	comp.compilers.tools.pccts

 - has been ported to the PC/other platforms (see the newsgroup)

 - has an FTP-SITE = ftp://ftp.parr-research.com/pub/pccts

 - has technical papers
     * a user's manual
         [ in FTP-SITE/documentation/manual.ps ]
         (this manual is showing its age, but a book is being written).
     * a conference (1994 Intl Conf on Comp Constr) paper on predicates
  	 "Adding Semantic and Syntactic Predicates To LL(k): pred-LL(k)"
         [ in FTP-SITE/papers/predicates.ps.Z ]
     * a soon-to-appear article in Software Practice and Experience,
  	 "ANTLR: A Predicated-LL(k) Parser Generator"
         ( which we cannot give out until the article appears )

In both the conf. paper and SPE article, we give examples of how to
use predicates and backtracking.  In particular, we specifically show
how get around the "lexer feedback hack" and how to parse C++
expressions and declaration via backtracking, as Chris Dodd mentioned
in his btyacc posting (excerpted below).

(03.04.95)

Ulrich Neumerkel hat seine Diplomarbeit über Speicherbereinigung für Prologsysteme in HTML geschrieben (oder zumindest umgewandelt). Sehr löblich.


EEL: Machine-Independent Executable Editing
James R. Larus and Eric Schnarr
(To appear: PLDI '95, June 1995)

Abstract:
EEL (Executable Editing Library) is a library for building tools to analyze and modify an executable (compiled) program. The systems and languages communities have built many tools for error detection, fault isolation, architecture translation, performance measurement, simulation, and optimization using this approach of modifying executables. Currently, however, tools of this sort are difficult and time-consuming to write and are usually closely tied to a particu lar machine and operating system. EEL supports a machine- and system-independent editing model that enables tool builders to modify an executable without being aware of the details of the underlying architecture or operating system or being concerned with the consequences of deleting instructions or adding foreign code.

ftp: ftp.cs.wisc.edu:wwt/pldi95_eel.ps.Z


(03.04.95) Amongst the stranger kind of compilers are binary translators, i.e. systems that port executables. One of these is AT&T's FlashPort (Technology Overview). This page is part of the Trash Heap.