<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic source code reduction⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jiří Diviš</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ondřej Bojar</string-name>
          <email>bojar@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University, Faculty of Mathematics and Physics</institution>
          <addr-line>Malostranské nám. 25, Praha 1, CZ-118 00</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>9</fpage>
      <lpage>16</lpage>
      <abstract>
<p>The aim of this paper is to introduce Reductor, a program that automatically removes unused parts of the source code of valid programs written in the Mercury language. Reductor implements two main kinds of reductions: statical reduction and dynamical reduction. In the statical reduction, Reductor exploits the semantic analysis of the Melbourne Mercury Compiler to find routines which can be removed from the program. Dynamical reduction additionally uses the Mercury Deep Profiler and some sample input data for the program to remove unused contents of the program routines. Reductor modifies the sources of the program in a way which keeps the formatting of the original program source, so that the reduced code remains editable.</p>
        <p>The user chooses between the statical and the dynamical reduction, not both at the same time. There is also a trivial module reduction, which is done implicitly: it takes all modules from the current directory that are (transitively) imported by the main module of the program and copies them, without any change in their contents, into a specified destination directory. The statical reduction takes as input a Mercury program and removes some inaccessible routines from its sources; inaccessible routines are routines that the program would never call and that can be deleted because they do not appear in any definition of any of the remaining routines. The dynamical reduction removes parts of routines that were never used on several inputs to the program; this reduction is based on profiling data from runs of the program on some specified input. The deep profile is generated by the Mercury Deep Profiler.</p>
        <p>The dynamically reduced program can be compiled, but Reductor guarantees that the reduced program will generate the same output only for the inputs the deep profile was obtained on. On different inputs, the reduced program may terminate with an exception. Reductor tries to preserve the original formatting of the program as much as possible. Code segments corresponding to goals or routines that are to be removed are commented out using /*red: ... :red*/; any comments of the kind /* ... */ in such a segment are transformed into /nested:* ... *:nested/. If Reductor needs to add some code, the inserted term is delimited by two newlines and a comment indicating that the term was inserted by Reductor.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Mercury [1] is a fast logic and functional programming
language with advanced error detection features,
developed for writing large real-world programs. Its
syntax builds upon Prolog syntax (Prolog predicate
clauses), adding some new declarations. These declarations
are used for error checking and to speed up the
execution of a compiled program. Another great feature
of Mercury is that it can compile to C or Java and
thus it easily interfaces with foreign code. Motivations
behind the design of Mercury are very well summed
up in [2].</p>
      <p>Reductor [3] is mainly intended for programmers
who wish to understand and/or reuse code of a big
program or start an independent project based on just
a few features of an existing one. This is why Reductor
modifies the sources of the program in a way which
preserves the original formatting, so that the reduced
code is further editable. But there are also other uses,
like reduction of the size of executables, decreasing
the compilation time of reduced programs or releasing
only a subsection of a huge project.</p>
      <p>We suggest the reader gets acquainted with the basics
of the Mercury language by reading one of the following
summaries: Ralph Becket’s tutorial [4] or Seriál Mercury
on ROOT.CZ [5] (in Czech).</p>
    </sec>
    <sec id="sec-2">
      <title>Overview of Reductor</title>
      <p>⋆ This work has been supported by the grants EuroMatrixPlus (FP7-ICT-2007-3-231720 of the EU and 7E09003 of the Czech Republic) and MSM 0021620838.</p>
      <sec id="sec-2-1">
        <title>Components of Reductor</title>
        <sec id="sec-2-1-1">
          <title>Parts of Reductor</title>
          <p>Reductor consists of the following parts. Throughout, we assume that the output of the original program is determined solely by its input data.</p>
          <p>1. Intermodule Representation of the Program (IMR). This structure represents the entire Mercury program on various levels of representation (see Section 2.1). It is used by all three remaining parts of Reductor, and it is based on the results of the semantic analysis of the Melbourne Mercury Compiler (MMC), which we reuse and modify to suit our needs.</p>
          <p>2. I/O Interface. (Section 3) This part handles the representation of the changes in the program on the term level (see below) and offers an interface for making the changes proposed by the statical and dynamical reduction modules. It also outputs the reduced sources. This representation is designed for storing the source changes while preserving the original source and its formatting as much as possible.</p>
          <p>3. Dynamical Reduction. (Section 4) Using the data from IMR, this module dynamically reduces the program and submits the changes to the I/O interface. This module also extracts the data from the deep profile.</p>
          <p>4. Statical Reduction. (Section 5) Using the data from IMR, this module statically reduces the program and submits the changes to the I/O interface.</p>
          <p>2.1 Levels of representation of a module. Reductor utilizes the results of the semantic analysis of the Melbourne Mercury Compiler (MMC) [1]. We thus extracted the part of the MMC code responsible for the compilation up to the end of the semantic analysis and based Reductor on the extracted code.</p>
          <p>The mentioned part of the compilation of a module consists of the following four main stages. At these stages, the program is represented in certain data structures; we call these data structures the levels of representation of the program. A Mercury program is composed of declarations and clauses; we will use the term item for either of those.</p>
          <p>The first stage is lexical analysis, after which a module is represented as a list of lists of tokens. Each such list of tokens corresponds to an item. We call this representation the Token-Level Representation (ToLR).</p>
          <p>Each item in a program corresponds to what can be considered a standard ISO Prolog term for the purposes of this paper. The second stage consists of parsing the tokens into terms. After this stage a module is represented as a list of terms; we will refer to this as the Term-Level Representation (TeLR). (Some terms may be combined to form a single item.) The parts of a term that are themselves terms may be called subterms, and the term that corresponds to an item may sometimes be called the base term to avoid confusion.</p>
          <p>After the third stage, each term is converted into a syntax tree of the item; we will call this the Item-Level Representation (ILR) of a program. After this stage, each item is categorized based on the information extracted solely from its corresponding term(s), not from other terms of the module. The parse tree still contains most of the syntactic details of the item in the program.</p>
          <p>Finally, after the fourth stage, the data structure called by the designers of MMC the High Level Data Structure (HLDS) is constructed and filled with the results of the semantic analysis of a module. This includes the following: declarations from imported modules are added; predicates and their goals and subgoals are annotated with inferred determinisms and modes; variables in the scopes of each subgoal of each predicate are annotated with their modes and types; and goals are reduced to a certain equivalent subset of the original goals from the code, so a lot of syntactic detail irrelevant to the semantic analysis of the module source is lost. After this stage, still most of the items from the source have their corresponding structures in HLDS, but not vice versa (e.g. there are structures for predicates that were automatically generated). We call the described state of HLDS the High-level Representation (HLR).</p>
          <p>As a part of the fourth stage, based on the mode analysis of the program, a procedure is constructed for each mode of the predicate/function, because clauses have to be reordered for each mode independently. Also, in HLR, all clauses of a single procedure are combined into one goal. Similarly as for terms, we define the subgoal and the top-level goal (base goal). From now on, we will usually ignore the difference between functions and predicates, calling them simply predicates. Finally, we define a call site as a goal which can call a procedure (this includes goals that call lambda expressions).</p>
          <p>2.2 Inter-module representation. Considering the program representation, there is a big difference between the needs of Reductor and the needs of the Compiler in the way modules are handled. The Compiler compiles each module of the program independently, but Reductor's statical reduction needs to gather the semantic analysis for all modules at once and store it for later uses. This fact also has some implications on the memory requirements of Reductor.</p>
          <p>Our approach to combining MMC's data structures of each module into IMR is quite simple. We use map data types to map fully qualified module names to these structures. There are also a few structures which are combined in a more sophisticated way to improve efficiency, but these details are unnecessary for the purposes of this paper.</p>
          <p>From the beginning, it was clear that we would have to reuse some of the MMC source. The question was whether to use only the ILR or also the HLR of the program. We decided to use MMC's semantic analysis in order to achieve powerful enough reductions, especially in the case of the dynamical reduction. On the other hand, the use of HLR makes Reductor more dependent on MMC and makes it harder to understand, which is among other things caused by the fact that MMC is designed for other purposes than simplicity and reusability of the semantic analysis alone.</p>
          <p>As a part of IMR, there is also a structure that stores information about the module dependencies (imports) and the source files that they came from. This structure is extracted from the make-like part of MMC that directs compilation.</p>
          <p>3 I/O interface. The I/O interface allows both the statical and dynamical reductions to specify changes on the term level by adding, deleting or changing a particular term that corresponds to an item of a module. Each change is specified in a data type called term_change, see below. The I/O interface performs the following steps:</p>
          <p>1. For each module in a program, the TeLR and a corresponding modified ToLR (MToLR) are constructed. MToLR extends ToLR by storing the exact position of the first character of each token in the corresponding source file.</p>
          <p>2. A blank structure to hold term changes is set up for each module.</p>
          <p>3. At this point, changes are submitted by the reductions. The I/O interface provides the association of items in TeLR with items in ILR, but the term_change itself has to be provided by the reduction.</p>
          <p>4. From the collected changes, we construct a change list for each module of the program, which represents the changes to be made in the sources in terms of a list of string insertions and deletions. The algorithm for this is explained in Section 3.2. Each element of the change list represents either a range of positions in the sources which is to be removed, or a string and the position where to insert it.</p>
          <p>5. We copy the original sources and apply the changes specified by the change lists.</p>
          <p>3.1 Specifying the changes in a term. The changes in a term are represented by a term_change. This structure mirrors the original term structure and stores, for each subterm of the term, how the subterm is changed. The change can be (1) replacing the term with a string, (2) enclosing the subterm by two strings, or (3) replacing the term with one of its subterms. Formally, the term_change can have one of the following values:</p>
          <p>no_change: The corresponding term and all of its subterms are not modified.</p>
          <p>no_lvl_change(list(term_change)): The term is not modified at the level of its functor; the term_change list stores information about the subterms.</p>
          <p>subst_ins(string, list(term_change), string): The term is to be preceded by the string given as the first parameter and followed by the string in the third parameter. The second argument represents the changes made on the term's arguments.</p>
          <p>replace(string): The corresponding term is to be replaced by the given string.</p>
          <p>subst_del(list(term_change)): The substructures of this term_change, i.e. the list of term changes, have a restricted set of allowed constructors here: there can be only no_change constructors, except that one orig constructor has to be present. The substructures of the orig constructor have no restrictions. The term that corresponds to this term_change is substituted by the term that corresponds to the term_change with the orig constructor (which may be changed further).</p>
          <p>orig(list(term_change)): This is applicable only if contained in a substructure with the subst_del constructor, as described.</p>
          <p>3.2 Construction of change lists. For each source file we need to construct a change list which specifies what will be modified in the file. This is done by constructing the change list for each term in the same order as the terms appear in the module, and concatenating the change lists. We briefly describe our approach to the problem of constructing the change list for a term.</p>
          <p>We observed that each subterm consists of a contiguous block of tokens in the MToLR. The block of tokens is not affected by any subterms other than its own. Our algorithm is based on a synchronous traversal of a base term and its term_change, identifying the lists of consecutive tokens that correspond to the subterms. This gives us for each term its beginning and end positions in the list of tokens. From this we compute the subterm's beginning and end positions in the source file.</p>
          <p>We determine the list of tokens that correspond to each subterm by finding how a given base term can be written using tokens, while matching it against the actual list of tokens in the MToLR. This gives us the beginning and ending token of each subterm.</p>
          <p>While we traverse the subterm and the corresponding term_change in the mentioned way, we additionally mark (according to the term_change) which segments of the token lists are to be removed and at which positions to insert the strings given in the term_change.</p>
          <p>The change list of the term is then constructed from the position information associated with the tokens.</p>
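          <p>To make the term_change constructors concrete, the following is a small, hypothetical Python sketch of the structure and of how such a change could be rendered against a term. The class and field names are our own illustration, not Reductor's actual Mercury code, and the subst_del/orig pair is omitted for brevity.</p>

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Term:
    """A term: a functor with (possibly zero) subterms."""
    functor: str
    args: List["Term"] = field(default_factory=list)

    def show(self) -> str:
        if not self.args:
            return self.functor
        return f"{self.functor}({', '.join(a.show() for a in self.args)})"

@dataclass
class NoChange:            # no_change: term and all subterms untouched
    pass

@dataclass
class NoLvlChange:         # no_lvl_change: functor kept, changes in subterms
    args: List["Change"]

@dataclass
class SubstIns:            # subst_ins: enclose the term between two strings
    before: str
    args: List["Change"]
    after: str

@dataclass
class Replace:             # replace: substitute the term by a string
    text: str

Change = Union[NoChange, NoLvlChange, SubstIns, Replace]

def render(term: Term, change: Change) -> str:
    """Render a term with its term_change applied (illustration only)."""
    if isinstance(change, NoChange):
        return term.show()
    if isinstance(change, Replace):
        return change.text
    if isinstance(change, (NoLvlChange, SubstIns)):
        if term.args:
            inner = f"{term.functor}(" + ", ".join(
                render(a, c) for a, c in zip(term.args, change.args)) + ")"
        else:
            inner = term.functor
        if isinstance(change, SubstIns):
            return change.before + inner + change.after
        return inner
    raise TypeError(change)
```

          <p>For example, commenting out the second argument of f(a, b) with Reductor's red markers would be expressed as a no_lvl_change whose second element is a subst_ins enclosing that subterm in /*red: and :red*/.</p>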
          <p>4 Dynamical reduction</p>
          <p>2. Determine which goals were called in HLR: (Section 4.4) We collect the data from MDP for some of the call sites in the program. The data we collect tells us which ports of the standard box model were used on those call sites. We then use the information from the MMC semantic analysis to improve this information, as the data transfer from MDP to HLR is inaccurate.</p>
          <p>3. Determine which predicates will be reduced: (Sections 4.2 and 4.3) Any procedures may be excluded from the reduction, if necessary, because our dynamical reduction has a local character: the reductions made in a predicate do not depend on reductions made on any other predicates. Currently, predicates that have multiple modes and typeclass methods are unsupported and thus we do not attempt to reduce them. As will be seen later, the reduction of multi-moded predicates can be achieved by trivially combining the data of their individual procedures in HLR.</p>
          <p>4. Transfer the data about called atomic goals into the corresponding clauses in ILR.</p>
          <p>5. Reduce the goals of an item: Identify the subgoals of the processed item clause that are to be substituted with an exception.</p>
          <p>6. Transfer the changes from ILR to TeLR: (Section 4.5) Construct the term_change for the term that corresponds to the item being processed and submit the information to the I/O interface described in Section 3.</p>
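          <p>The change lists of Section 3.2 boil down to string insertions and deletions at source positions. A hypothetical sketch of applying such a list, processing edits from the highest position downwards so that earlier positions stay valid (the tuple encoding is our own, not Reductor's real format):</p>

```python
def apply_change_list(source, ops):
    """Apply a change list to a source string.

    ops is a list of ("delete", start, end) ranges to remove (half-open,
    positions in the ORIGINAL source) and ("insert", pos, text) entries.
    Applying from the highest position down keeps earlier offsets valid.
    """
    for op in sorted(ops, key=lambda o: o[1], reverse=True):
        if op[0] == "delete":
            _, start, end = op
            source = source[:start] + source[end:]
        else:
            _, pos, text = op
            source = source[:pos] + text + source[pos:]
    return source
```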
        </sec>
        <sec id="sec-2-1-2">
          <title>The idea of dynamical reduction</title>
          <p>The idea of the dynamical reduction is to take the
data collected by the Mercury Deep Profiler (MDP) [1]
on multiple runs of the program with various input
data (we will call them test data in the following)
and, based on the profile, remove most of the unused code.</p>
          <p>This is achieved by cleverly substituting goals of
unused branches of code with calls to the throw predicate
(which throws an exception) so that we preserve the
syntactic correctness of the program.</p>
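          <p>As a Python analogue (Python instead of Mercury purely for illustration): a branch that the profile shows was never exercised is replaced by a raise, which preserves the structure of the code and its behaviour on the profiled inputs.</p>

```python
def classify(n):
    """Original routine."""
    if n >= 0:
        return "non-negative"
    return "negative"

def classify_reduced(n):
    """Reduced routine: the profile showed only n >= 0 was ever used."""
    if n >= 0:
        return "non-negative"
    # stands in for Mercury's throw predicate in the unused branch
    raise RuntimeError("branch removed by reduction")
```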
          <p>We will illustrate the basic idea on a simplified model. Imagine a simplified version of the execution model, which executes goals in top-down, left-to-right order. In this model we collect the information from the profile, which indicates for each atomic goal whether it was called during the execution of the runs on the test data. We then reduce each procedure by substituting the topmost subgoals that do not contain any called subgoals with a call to the throw predicate. The resulting program will have the same outputs for the runs on the input data used to construct the profiling data. On different data the program may exit with an exception.</p>
          <p>The problem with this description is that MMC does not create programs that execute goals in this simple top-down, left-to-right sequence. For example, MMC reorders conjuncts to satisfy the mode declarations of the calls in the conjunction. However, these issues do not prohibit an approach similar to the one described.</p>
          <p>4.1 Overview of dynamical reduction. We present the main steps Reductor goes through to dynamically reduce a program. These steps and additional details are then discussed further in the sections below:</p>
          <p>1. Load the profile created on the test data: We integrated MDP into Reductor and we use it to collect and read the profiling data.</p>
          <p>4.2 Finding goals substitutable with exceptions. This algorithm for computing removable goals is based on the observation that in Mercury, the programmer can substitute any goal with a call to the throw predicate without compromising the syntactic correctness of the program, i.e. the resulting program compiles, but it may give a lot of warnings (e.g. about the presence of singleton variables, or various determinism warnings).</p>
          <p>The algorithm processes goals of clauses in the ILR to identify removable goals in program clauses. By removable goals, we mean some subset of goals that can, but does not have to, be substituted with an exception while preserving the program semantics on the test data. By non-removable goals, we mean the complement of this subset. We describe the algorithm here and discuss its correctness in Section 4.3.</p>
          <p>As an input to this algorithm, we assume that we are given the information about which atomic goals were called on the test data. More precisely, we assume that we get a superset of the called atomic goals. We define atomic goals as the goals that do not have any subgoals, i.e. calls and unifications.</p>
          <p>If we consider the ordering of goals that is given by traversing the goal tree in top-down, left-to-right order, we mark all goals that are preceded by any called atomic goal (called goals included) as non-removable. This means that some goals might not contain any called goals and still be non-removable. The reason for this is that if we substituted those goals with a call to throw, we would change the way the goal is reordered, and this could cause premature termination of the reduced program by an exception. Code reordering was the major challenge in the design of the dynamical reduction and we discuss it in Section 4.3.</p>
          <p>The algorithm then traverses the goal in the opposite order to the one described earlier. Each goal gets annotated with information saying whether it is non-removable; we call this information the removal status.</p>
          <p>Strict sequential operational semantics. For the dynamical reduction we need the following three assumptions. We believe that using strict sequential operational semantics in MMC (explained in [6]) ensures them:</p>
          <p>1. No calls are optimized away. Our implementation of dynamical reduction assumes that the data from the deep profile are accurate in the "semantic" sense. If some calls were optimized away, the collected profile would indicate that the goal was not called, which might cause the goal to be considered removable, and as such it could potentially be substituted with an exception. This could then make the reduced program incorrectly throw an exception on the test data.</p>
          <p>2. Conjunctions are reordered minimally, and every time the ordering of conjuncts is the same. This restriction will be clarified later in this section; for now we just note that if we created two profiles on the same test data with different conjunction orderings, we might get different information about which atomic goals are called, which could again potentially lead to different outputs of the algorithm from Section 4.2, which can be problematic.</p>
          <p>3. Disjunctions are not reordered. The reason is the same as in (2).</p>
          <p>The removal status of each kind of goal is determined as follows:</p>
          <p>All goals that contain a single subgoal: The removal status is the same as the removal status of the subgoal.</p>
          <p>`and', `implication', `equivalence': If the second-argument goal is non-removable, then the first-argument goal and all its subgoals are considered non-removable.</p>
          <p>`or': Removable if both of its subgoals are removable, otherwise non-removable.</p>
          <p>`if then else': The `if then else' goal inherits its removal status from its `if' subgoal.</p>
          <p>Atomic goals: They are non-removable if they are called, otherwise they are removable. This can be overridden within `and', `implication', `equivalence', `or'.</p>
          <p>Correctness of dynamical reduction. It is guaranteed that a program dynamically reduced for a given input will give the same results as the original one, if both are compiled with strict sequential operational semantics and the conditions listed in Section 4.3 hold.</p>
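          <p>The removal-status rules can be read as a bottom-up computation over the goal tree. A minimal Python sketch (the goal encoding and names are hypothetical; "and" here also stands for implication and equivalence):</p>

```python
def removable(goal):
    """Return True when the goal may be substituted with an exception.

    Goals are encoded as tuples:
      ("atomic", called)            -- call or unification
      ("neg", sub)                  -- any goal with a single subgoal
      ("and", g1, g2) / ("or", g1, g2)
      ("ite", cond, then_g, else_g)
    """
    kind = goal[0]
    if kind == "atomic":
        return not goal[1]          # removable iff never called
    if kind == "neg":
        return removable(goal[1])   # same status as the single subgoal
    if kind == "and":
        # a non-removable second argument forces the first argument
        # (and its subgoals) to be treated as non-removable as well
        return removable(goal[1]) and removable(goal[2])
    if kind == "or":
        return removable(goal[1]) and removable(goal[2])
    if kind == "ite":
        return removable(goal[1])   # inherits from the 'if' subgoal
    raise ValueError(kind)
```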
        </sec>
        <sec id="sec-2-1-3">
          <title>Issues with code reordering</title>
          <p>In this section, we discuss the problems associated
with the fact that MMC may reorder goals. We discuss
our assumptions about the compilation model of
MMC and the conditions that need to be satisfied for the
algorithm for computing removable goals, introduced
in Section 4.2, to be correct.</p>
          <p>We note that by default, MMC reorders conjunctions to satisfy the mode declarations of the calls in the conjunctions. Additionally, MMC may reorder disjunctions, optimize away some calls and perform other optimizations, as discussed in [6].</p>
          <p>In a nutshell, in order for the algorithm from Section 4.2 to work correctly, we need to assure that in both the original program, compiled to produce the deep profile, and the reduced program, the executed goals are identical and that they are executed in the same order (all with respect to particular input data):</p>
          <p>1. The conjuncts that were called in the original program have the same ordering in the reduced program with respect to each other as in the original program, and the call goals call the same modes of the called predicates.</p>
          <p>2. The originally uncalled conjuncts are not reordered in front of any originally called conjuncts.</p>
          <p>3. In the reduced program, only originally uncalled conjuncts may be substituted with a call to throw.</p>
          <p>The first two conditions ensure that each procedure is called with the same inputs as in the original program. We did not determine whether condition (1) is necessary, but we suspect that if we did not require it, there might be problems with goals with the determinism multi. A failure to comply with the 2nd or 3rd condition causes the reduced program to terminate by an exception on the test data.</p>
          <p>We assume that these restrictions are satisfied by the default mode reordering algorithm of MMC as reused in Reductor, if strict sequential operational semantics are used. We also believe that the use of any algorithm that performs minimal reordering as specified in the Mercury Reference Manual [6] should also satisfy our restrictions. If not, the use of other mode reordering algorithms in MMC might cause problems.</p>
          <p>Also, it should be noted that we reasoned about the correctness of the algorithm based on the ILR, but the actual mode reordering algorithm of MMC operates on the HLR. Thus we should make sure that the transition between ILR and HLR does not cause any trouble. Unfortunately this is very technical and we omit it for brevity.</p>
          <p>4.4 Collecting data from the deep profiler. Data from the profiler. The MDP consists of two parts: (1) the code inserted by MMC for collecting the profiling information into the Deep.data file, and (2) a web interface that presents the collected information to the user. We call the information stored in Deep.data the deep profile and use it in Reductor.</p>
          <p>The deep profile contains, for each call site in each procedure, the list of ports of the standard box model that were used by the procedures called from the call site. More precisely, in addition to the standard box model (call, exit, fail, redo), Mercury has one additional port for a procedure throwing an exception.</p>
          <p>Transferring data into the IMHLDS. We need to transfer the information about the used ports from the deep profile to the corresponding call sites in the IMHLDS. Unfortunately, the data about call sites from the deep profile do not correspond 1:1 with the HLDS, because the deep profiling code is generated after the construction of HLR (i.e. after the semantic analysis, the 4th stage of the compilation). Also, the data do not generally contain enough information for an unambiguous pairing of the two call-site structures. We thus had to design a mechanism that accounts for this.</p>
          <p>Transfer of the call status from atomic goals in HLR to atomic goals in ILR. The algorithm for identifying non-removable goals (Section 4.2) operates on the ILR, but we have the information about called goals in the HLR. Therefore, we need to transfer the data to the item-level representation.</p>
          <p>One atomic goal in ILR can have multiple corresponding atomic goals in HLR. We designed an algorithm based on an approximation, where the lost information about the call status is partially reconstructed: an ILR atomic goal is marked as called if there is a called goal in HLR which corresponds to the ILR goal.</p>
          <p>The matching of the atomic goals of the two representations is based on the fact that, throughout all four stages of compilation we reuse, MMC associates with each goal the information about what source file and line number the goal is located on (for the purposes of error messages). We use this as an (ambiguous) tag of each goal. The matching of the atomic goals of the two representations thus consists of the matching of these tags.</p>
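          <p>The tag matching can be sketched as an over-approximation keyed by (file, line) pairs; the names below are hypothetical. Marking an ILR goal as called whenever any HLR goal with the same tag was called errs on the safe side, since treating a goal as called merely means reducing less.</p>

```python
def transfer_call_status(ilr_goals, hlr_goals):
    """Over-approximate call-status transfer from HLR to ILR.

    ilr_goals: dict goal_id -> (file, line) tag
    hlr_goals: list of ((file, line) tag, called) pairs
    Returns dict goal_id -> bool (True if considered called).
    """
    called_tags = {tag for tag, called in hlr_goals if called}
    return {gid: tag in called_tags for gid, tag in ilr_goals.items()}
```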
          <p>Improving the information on called goals. Because the information from the deep profile that we collected into the HLR is not accurate and does not give us much information about unifications, we improve our information about which atomic goals are called using the information obtained from the MMC semantic analysis.</p>
          <p>4.5 Changing the individual items of the program. At this point it has been decided which predicate goals will be changed, and everything from now on is done locally on individual clauses of the program. We describe the design of the process of transforming each clause of a predicate by constructing a term_change for a term, given the original term and item that correspond to the clause and the HLR with the information about the call status of atomic goals.</p>
          <p>Constructing the term_change for the clause.</p>
          <p>As an input to this, for each subgoal of the clause goal
(in ILR), we know whether the subgoal is removable.</p>
          <p>The ILR of a clause corresponds reasonably well to
the TeLR of a clause. This fact allows us to construct
the term_change by simultaneously and recursively
traversing the clause goal in ILR and its corresponding term,
while building the term_change for the I/O interface.</p>
          <p>If at any point we are not able to pair the ILR goal
with its corresponding term, we just drop the change,
leaving the problematic term unreduced. This is based
on the observation that changing the removal status
of a goal from removable to non-removable is never
harmful; we only reduce less.</p>
          <p>4.6 Final remarks on dynamical reduction. There remains some unexploited potential for dynamical reduction if we allowed code reordering: reordering for mode constraints and maximum dynamical reduction. On the other hand, the implementation of this extension would be rather difficult and we believe that we would not gain much by it, because (1) usually, the programmers write predicates in the "correct" order, and (2) if we did major reordering, the reordering might confuse the programmer. (Interestingly, the matching of items to terms in the I/O interface is based on a principle similar to the tag matching of Section 4.4.)</p>
          <p>Recently, we learned about the new Profiler Feedback Framework in MMC, which might perhaps have saved us some work. The framework feeds back the information from the deep profile into the MMC. Unfortunately, this new and still hidden feature is not available in the version of Mercury that Reductor is based on.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Statical reduction</title>
      <sec id="sec-3-1">
        <title>Identifying inaccessible procedures</title>
        <p>This section describes how Reductor identifies inaccessible procedures to decide which clauses and declarations to remove. As indicated by the name, this analysis is performed statically, without the need to run the program in question.</p>
        <p>First, we precisely define the notion of accessible procedures. Procedure B is accessible from procedure A if there is a call site that can call procedure B in a goal of some procedure that is accessible from A. We call a procedure in a program an accessible procedure if it is accessible from the predicate main. It is not generally possible to say which procedures can be called from a call site; Reductor thus determines only a superset of the accessible procedures (elements of this superset may also be called accessible procedures, for brevity).</p>
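        <p>The notion of accessibility above amounts to reachability in the call graph; a minimal Python sketch with a hypothetical adjacency-map encoding:</p>

```python
def accessible(call_graph, roots=("main",)):
    """Depth-first reachability over the call graph.

    call_graph: dict procedure -> list of procedures callable from it.
    roots: starting procedures ("main", or all non-removable procedures
    when the superset is recalculated in step 3).
    """
    seen = set()
    stack = list(roots)
    while stack:
        proc = stack.pop()
        if proc in seen:
            continue
        seen.add(proc)
        stack.extend(call_graph.get(proc, ()))
    return seen
```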
<p>There are four steps that are taken in the process
of statically reducing the procedures of a program:</p>
        <p>1. Calculate the superset of accessible
procedures of the program. Consider the call graph,
which we define as an oriented graph whose set of
vertices is the set of all procedures in a program and
whose set of oriented edges is defined by a relation
R: (A, B) ∈ R if and only if B is called from a call
site of A.</p>
        <p>We use a depth-first search on the call graph
starting from the predicate main. Each procedure that
is processed is added to the set of accessible
procedures. Processing one node of the search graph
consists of finding its call sites (i.e. constructing
the edges for the depth-first search). This is done
by another depth-first search on the goal tree of
the procedure, where subgoals are the vertices of
the tree and atomic goals (i.e. call goals and
unifications) are the leaves of the tree.</p>
        <p>There are two types of call sites: static and
dynamic. For static call sites, the procedure to be
called is known (in HLR). For dynamic call sites, the
procedure is determined at runtime. There are two
kinds of dynamic call sites: (a) call sites for
lambda expressions, and (b) call sites for instances
of a class method.</p>
        <p>Call sites (b) can be considered static, because the
class method called is known (the instance is not).</p>
        <p>Procedures that can be called from (a) are
determined by treating the unification that declares
the lambda expression as a call site of its lambda
expression.</p>
        <p>2. Identify further non-removable procedures.
All accessible procedures are naturally
non-removable. Unfortunately, there are cases when
even an inaccessible procedure cannot be removed. One
example are procedures whose removal would require us
to remove additional declarations (beyond the obvious
pred, func and mode declarations and the procedure's
clauses). Another example are predicates exported
from Mercury to the C interface; for these we cannot
be sure whether they are ever called from some C
code.</p>
        <p>3. Recalculate the superset of accessible
procedures. Marking some procedures as non-removable
may imply non-removability for further procedures. We
use the same depth-first search as above, except that
we start in all non-removable procedures instead of
main.</p>
        <p>4. Remove the items that correspond to the
removable procedures. The removal is done separately
for each module of the program by finding their
corresponding items and submitting them for deletion
to Reductor's I/O interface.</p>
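<p>The reachability analysis of steps 1 to 3 above can be
sketched as follows. This is a minimal illustration, not
Reductor's implementation: the dictionary-based call
graph, the procedure names, and the helper functions are
all hypothetical, and Reductor itself performs this
analysis on the HLR inside the Mercury compiler.</p>

```python
def reachable(call_graph, roots):
    """Depth-first search collecting every procedure reachable from roots."""
    seen = set()
    stack = list(roots)
    while stack:
        proc = stack.pop()
        if proc in seen:
            continue
        seen.add(proc)
        # The callees stand in for the call sites found in the goal tree.
        stack.extend(call_graph.get(proc, ()))
    return seen

def removable_procedures(call_graph, extra_non_removable):
    # Step 1: superset of accessible procedures, starting from main.
    accessible = reachable(call_graph, ["main"])
    # Step 2: accessible procedures plus further non-removable ones,
    # e.g. predicates exported to the C interface.
    non_removable = accessible | set(extra_non_removable)
    # Step 3: recompute reachability from all non-removable procedures.
    kept = reachable(call_graph, non_removable)
    # Step 4 would then delete the items of the remaining procedures.
    return set(call_graph) - kept

# Hypothetical call graph: procedure -> procedures called from its call sites.
graph = {
    "main": ["parse"],
    "parse": ["lex"],
    "lex": [],
    "exported_to_c": ["helper"],  # inaccessible but exported to C
    "helper": [],
    "dead": ["lex"],
}
print(sorted(removable_procedures(graph, ["exported_to_c"])))  # prints ['dead']
```

<p>Note how helper survives only because exported_to_c
was marked non-removable in step 2; this is why step 3
must re-run the depth-first search from all non-removable
procedures rather than from main alone.</p>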
</sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Remarks on the interfacing compilers</title>
      <p>We think that in the future, Reductor can be
extended to remove more declarations and to lift most of
the limitations it places on the input program. There
are, however, several areas where we are bounded by the
current design of MMC. Most notably, it is very
problematic to make changes in the contents of clauses
and declarations in the general case, unless we give up
the aim of preserving the original formatting of the
code, i.e. the major goal of Reductor.</p>
      <p>One possible solution would be to extend
Reductor's I/O interface so that the reductions could
submit changes directly to the HLR instead of just the
TeLR. Although we did not design such an interface, we
believe that it would be reasonable to try to build it if
the MMC did not have such an unfavorable design with
regard to this proposal. Certainly, in the presence of
such an interface, the reductions would be quite easy
transformations of HLR. This would, at the very minimum,
save us a lot of work on dynamical reduction.</p>
      <p>We also believe that this proposed enhanced I/O
interface could be easily reused, e.g. for automatic code
restructuring or refactoring (see e.g. [7]) or for
intelligent syntax highlighting. These two kinds of
software can make meaningful use of the semantic analysis
provided by the compiler. The need for such an interface
can be illustrated by the fact that a task as simple as
function renaming requires type analysis (which is a part
of the semantic analysis) to distinguish data
constructors of discriminated unions from function calls
of the same name and arity.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We described two main automatic methods for
source code reduction of Mercury programs aimed at
understanding and reusing parts of Mercury projects. The
statical reduction removes predicates that are never
called (based on the statical analysis of the source
code). The dynamical reduction removes parts of
predicates that were never used on some sample input
data. The formal correctness of the code is preserved by
inserting exceptions at these places. The dynamically
reduced code is guaranteed to return the same outputs on
the inputs it was reduced for. On different inputs, the
reduced program may exit with an exception.</p>
      <p>The implemented tool called Reductor performs
both types of the reduction while preserving the original
formatting of the source code as much as possible to
enable further development. This is a unique feature that
complicated the design of Reductor by far the most.</p>
      <p>Reductor has been tested on a few medium-sized
programs and a large set of small programs (mostly from
the MMC test suite). The reductions are sufficiently
powerful. There are a few constraints imposed on the
Mercury language for statical reductions, which concern
rather rare features. Moreover, the presence of such
unsupported constructs usually does not prevent Reductor
from reducing the program. While in these cases the
reduced program may become uncompilable, it is still
useful for manual inspection. In the case of dynamical
reduction, the tests did not reveal any substantial
quantitative degradation in the amount of code reduced by
the implemented method as compared to what could
potentially be reduced by a similar method without the
mentioned approximations.</p>
      <p>The utility of Reductor for large-scale projects
has yet to be confirmed. We believe that the use of
dynamical reduction (which can also be viewed as a
coverage testing tool that “visualizes” its results by
commenting out the unused parts of code) on a well-chosen
subset of all possible input data sets, combined with
statical reduction and some subtle changes in the
original code, can be useful in understanding and/or
reusing the code of big programs. This was the main
motivation behind building this tool.</p>
      <p>We have also commented on some limitations of the
design of MMC, briefly explored the possibilities for
extending Reductor, and gave some thoughts on
commonalities of Reductor code with some other
development tools like automatic code refactoring and
intelligent syntax highlighting.</p>
      <sec id="sec-5-1">
        <title>References</title>
        <p>1. Mercury project WWW page.
http://www.mercury.csse.unimelb.edu.au/</p>
        <p>2. Z. Somogyi, F. Henderson and T. Conway:
Mercury: an efficient purely declarative logic
programming language. ACSC'95, Glenelg, Australia, 1995,
499–512.</p>
        <p>3. J. Diviš: Automatické odstranění
nadbytečných částí programu (this is the formal title;
the contents are written in English). Bachelor thesis at
Charles University in Prague, Faculty of Mathematics and
Physics, 2009.</p>
        <p>4. Ralph Becket's Mercury tutorial.
http://www.mercury.csse.unimelb.edu.au/tutorial/book/book.pdf</p>
        <p>5. Seriál Mercury on ROOT.CZ.
http://www.root.cz/serialy/mercury/</p>
        <p>6. Mercury Reference Manual, version 0.13.1.
http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/</p>
        <p>7. T. Mens and T. Tourwé: A Survey of Software
Refactoring, IEEE Transactions on Software Engineering,
30 (2), February 2004.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>