--- title: "Introduction to goalp" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to goalp} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## About goalp `goalp` is an R package for linear goal programming. It allows solving basic, weighted, and lexicographic linear goal programming problems, as well as a mixture of the weighted and lexicographic approaches. The package accepts a human-readable representation of the problem, and it automatically sets up the corresponding linear programming problem, which it later solves using the [lp_solve](https://lpsolve.sourceforge.net) linear solver, through its R interface [lpSolve](https://cran.r-project.org/package=lpSolve). The implementation of goal programming is based on the notation and approach described in Ignizio (1983). ## Basic goal programming in `goalp` Imagine you want to grow some vegetables and herbs in your garden. You can grow aubergine (A), broccoli (B), carrots (C), dill (D) or endive (E). Each vegetable requires a certain amount of land to grow per seedling you plant, as well as a certain amount of water and fertilizer. After a bit of research, you find out that each seedling of each vegetable needs the following amount of resources to grow. | |Aubergine|Broccoli|Carrot|Dill|Endive| |:--|:-------:|:------:|:----:|:--:|:----:| |Land|60|30|20|10|20| |Water|0.8|0.2|0.3|0.1|0.3| |Fertiliser|30|20|10|5|20| Where land is expressed in cm of a planting line, water in litres per week, and fertiliser in grams per season. These values are only meant as an example, and do not reflect real needs of these plants. Now imagine you have prepared the land already, and you have 3,000 cm of planting line available, you have 50 litres of water available during a week, and you have bought 2,000 grams of fertiliser. You would like to decide how much to plant of each vegetable in such a way that your land, water, and fertiliser consumption is as close as possible to your available resources. You can go a bit above or below each amount, as you can always extend the planting lines, get additional water from other sources, or buy extra fertiliser, or leave any amount of them unused. But you would like to minimise additional work or expenditure, as well as avoid any waste. How can you decide, then, what to plant while consuming as close as possible to the available resources? Goal programming can help us when we have multiple restrictions (land, water and fertiliser in our example) from which we want to deviate as little as possible. To formulate our vegetable patch problem as a goal programming problem, we must first write our constraints, which we will call *goals*, as the following set of equations. $$60A + 30B + 20C + 10D + 20E = 3000$$ $$0.8A + 0.2B + 0.3C + 0.1D + 0.3E = 50$$ $$30A + 20B + 10C + 5D + 20E = 2000$$ Additional to the goals listed above, we assume all *decision variables* (A, B, C, D, E) to be non-negative. This is an implicit constraints that will be included in all goal programming problems in goalp. We are unlikely to find a solution that satisfies all goals perfectly, so we allow for *negative and positive deviations* in each goal. For a given possible solution (i.e. a set of values for the decision variable), the negative deviation measures by how much the left-hand side of the goal is lacking to reach the right-hand side. The positive deviation, on the other hand, measures by how much the left-hand side of the equation is above or in excess of the right-hand side. The deviations are always expressed as non-negative values. For example, if we were to plant two dozens of each vegetable, i.e. $A=B=C=D=E=24$, then the left hand side of the "land" goal would have a value of $60*24 + 30*24 + 20*24 + 10*24 +20*24 = 3360$, which means our positive deviation would be equal to $d_{land+}=3360-3000=360$, because we have an excess of 360 units on our left hand side, above our goal. On the other hand, if we evaluate the left-hand side of our water goal, we find it has a value of $0.8*24 + 0.2*24 + 0.3*24 + 0.1*24 + 0.3*24 = 40.8$, meaning we are lacking 9.2 units to reach our goal, in other words $d_{water-}=50 - 40.8 = 9.2$. Finding values for A, B, C, D, and E that keep us as close as possible to our goals is equivalent to minimising the sum of all deviations. Therefore, we just need to solve the following linear programming problem: $$Min_{A, B, C, D, E, d_{i-}, d_{i+}} \sum_{i\in \{ land, water, fertiliser\} } d_{i-} + d_{i+}$$ Subject to the following set of constraints: $$60A + 30B + 20C + 10D + 20E + d_{land-} - d_{land+}= 3000$$ $$0.8A + 0.2B + 0.3C + 0.1D + 0.3E + d_{water-} - d_{water+} = 50$$ $$30A + 20B + 10C + 5D + 20E + d_{fertiliser-} - d_{fertiliser+} = 2000$$ We can solve this problem using `goalp`. To do it, we simply need to follow four simply steps. 1. Load the `goalp` package if it is not loaded already. 2. Write our goals in a text variable, one goal per line. 3. Call the `goalp` function with our goals as an argument. 4. Use `summary` on our results to see them on the screen. The process is exemplified in the following code snippet. ```{r basic equal only} # Load library library(goalp) # Write goals to a text variable goals <- "Land : 60*A + 30*B + 20*C + 10*D + 20*E = 3000 Water: .8*A + .2*B + .3*C + .1*D + .3*E = 50 Fertiliser: 30*A + 20*B + 10*C + 5*D + 20*E = 2000" # Solve problem gp <- goalp(goals) # Print results to screen summary(gp) ``` We observe that the solution is $A=0, B=0, C=100, D=0, E=50$, i.e. planting 100 carrots and 50 endives. The summary also reports the value of the deviations, with $d_{water-}=5$, meaning our solution wastes 5 litres of water (i.e. it only uses 45 litres). A few things to note when using `goalp`: - Names can be assigned to each goal, they should be written at the start of the line and separated from the goal by a colon. - Deviations must not be included in the goals, these are added automatically by `goalp`. - Goals must be written as regular R expressions, for example `"2*x + 3*y = 10"` is a valid goal, but `"2x + 3y = 10"` is not. Now imagine the fertiliser has a very long shelf life, so we do not care about using less of it, as we can always use whatever is left during the next season. In formal terms, this means that we are not interested in the minimising $d_{fertiliser-}$, or alternatively, that our fertiliser goal can be rewritten as follows. $$30A + 20B + 10C + 5D + 20E <= 2000$$ At the same time, imagine we really enjoy working in our plot of land, so we do not care about having to prepare more land for planting. In formal terms, this means that we are not interested in minimising $d_{land+}$, or alternatively, that our land goal can be rewritten as follows. $$60A + 30B + 20C + 10D + 20E >= 3000$$ We can solve the new problem using `goalp` using the following code. ```{r basic inequal} # Load library library(goalp) # Write goals to a text variable goals <- "Land : 60*A + 30*B + 20*C + 10*D + 20*E >= 3000 Water: .8*A + .2*B + .3*C + .1*D + .3*E = 50 Fertiliser: 30*A + 20*B + 10*C + 5*D + 20*E <= 2000" # Solve problem gp <- goalp(goals) # Print results to screen summary(gp) ``` The solution now changes, with the optimal now being planting 25 broccoli and 150 carrots. This new solution requires preparing 750 additional cm of land (7.5 metres) for planting, but consumes all available water and fertiliser. Now imagine we enjoy variety in our diet, so we would like to to produce at least 10 aubergines and 10 broccoli. While we like carrots, we do not want too many of them, so we would like to plant only between 10 and 20. Dill, on the other hand, we definitely do not need more than 5. We can add these ranges to the formulation of our problem. To add lower, upper and both kinds of bounds to the decision variables in our problem, we use the key words `lBound`, `uBound`, and `range` respectively. Note that these are hard constraints, as no deviations are allowed for them. ```{r basic bounds} # Load library library(goalp) # Write goals to a text variable goals <- "Land : 60*A + 30*B + 20*C + 10*D + 20*E >= 3000 Water: .8*A + .2*B + .3*C + .1*D + .3*E = 50 Fertiliser: 30*A + 20*B + 10*C + 5*D + 20*E <= 2000 A lBound 10 B lBound 10 C range [10,20] D uBound 5" # Solve problem gp <- goalp(goals) # Print results to screen summary(gp) ``` The optimal solution is now planting 52 aubergines, 12 broccoli and 20 carrots. Note that this solution satisfy all bounds imposed on the problem. It also implies preparing 8.80 additional metres of land for planting. Lower boundaries have the negative deviation $d_{i-}$ set to `NA`, while upper boundaries have the positive deviation $d_{i+}$ set to `NA`. This is because lower boundaries do not allow negative deviations, as the variable must be equal or bigger than the lower boundary. Similarly, upper bounds do not allow for positive deviations, therefore their values are shown as `NA`. Note that, unlike goals, names cannot be assigned to bounds. ## Weighted goal programming So far, we have given the same weight to all deviations. For example, lacking 1 cm of land was just as bad as using one litre more or less than 50 litres. Very often, this is not the case. Continuing with the example, deviating by one litre of water is probably more important than deviating by 1 cm of land. To address this disparity, we can weight the deviations in the objective function differently for each goal. For example, we could say that what matters are the deviations in terms of percentage, so deviating by 30 cm (1% of 3000 cm) is just as important as deviating by 0.5 litres (1% of 50 litres), or 20 grams of fertiliser (1% of 2000 grams). To do this, we need to assign weights to each deviation. We can do this easily in `goalp`, as follows. ```{r weights per goal} # Load library library(goalp) # Write goals to a text variable goals <- "Land : 60*A + 30*B + 20*C + 10*D + 20*E >= 3000 | 1/3000 Water: .8*A + .2*B + .3*C + .1*D + .3*E = 50 | 1/50 Fertiliser: 30*A + 20*B + 10*C + 5*D + 20*E <= 2000 | 1/2000 A lBound 10 B lBound 10 C range [10,20] D uBound 5" # Solve problem gp <- goalp(goals) # Print results to screen summary(gp) ``` Changing the weights of each deviation leads to a different optimal solution, this time the solution being 52 aubergines, 10 broccoli, 20 carrots, and 4 dill. The printed solution shows the weight of each deviation next to the value of each deviation, under the columns `w-` and `w+` for the weights of the negative and positive deviations, respectively. Note that goals expressed as equalities (`=`) have weights different from zero for both the negative ($d_{i-}$) and positive deviations ($d_{i+}$). Goals expressed as "bigger than" inequalities (`>=`) have the weight of $d_{i+}$ automatically set to zero, as going over the goal is not penalised. Goals expressed as "smaller than" inequalities (`<=`) have the weight of $d_{i-}$ automatically set to zero, as being under the goal is not penalised. The same effect could be achieved by setting the `normW` options to `TRUE` when calling `goalp`, i.e. `goalp(goal, normW=TRUE)`. The option `normW` stands for "normalised weights". It automatically defines weights as the inverse of the right-hand side of each goal. Note that bounds do not accept the use of weights. Now imagine that not only we want to reduce deviations from our ideal water usage of 50 litres, but also it is more onerous for us to exceed 50 litres, than it is to use less than that. For example, it could be that additional water has to be fetched from a distant tap, so it requires a significant effort, while using less water is of little annoyance. We can take this into consideration by defining different weights for the negative and positive deviations for the "water" goal. We do this by defining two weights for the goal, **first the negative deviation weight, followed by the weight of the positive deviation**, as shown in the example below. ```{r weights per deviation} # Load library library(goalp) # Write goals to a text variable goals <- "Land : 60*A + 30*B + 20*C + 10*D + 20*E >= 3000 | 1/3000 Water: .8*A + .2*B + .3*C + .1*D + .3*E = 50 | 1/50 10/50 Fertiliser: 30*A + 20*B + 10*C + 5*D + 20*E <= 2000 | 1/2000 A lBound 10 B lBound 10 C range [10,20] D uBound 5" # Solve problem gp <- goalp(goals) # Print results to screen summary(gp) ``` In this case the result does not change, as the previous solution had negative and positive deviations equal to zero for the water goal. Nevertheless, we can see that the weights reported in the `w-` and `w+` columns are different for the water goal. A more concise way to define the previous problem would have been the following. As we use the `normW=TRUE` option, the scaling of weights by the inverse of the righ-hand side value of each goal is automatic. ```{r eval=FALSE} # Load library library(goalp) # Write goals to a text variable goals <- "Land : 60*A + 30*B + 20*C + 10*D + 20*E >= 3000 Water: .8*A + .2*B + .3*C + .1*D + .3*E = 50 | 1 10 Fertiliser: 30*A + 20*B + 10*C + 5*D + 20*E <= 2000 A lBound 10 B lBound 10 C range [10,20] D uBound 5" # Solve problem gp <- goalp(goals, normW=TRUE) # Print results to screen summary(gp) ``` ## Lexicographic goal programming Now imagine that we are more interested in using all of our available land, than in fulfilling our water and fertiliser goals. For example, it could be that all the land we do not use is assigned to other people in the next harvest season. When one or more goals are more important than others, we use lexicographic goal programming. Lexicographic goal programming requires solving the problem multiple times, but considering only some deviation in each occasion. The idea idea is that we first optimise for the most important goal (or deviation), we fix those deviations to the best we found, and then we optimise again considering only the next most important deviation, and so on and so forth. For example, let's say our first priority is using at least 3000 cm of land, in other words, we want to minimise $d_{land-}$, so we assign it a priority of 1. Our next priority is avoiding to use more than 50 litres of water, so we assign $d_{water+}$ a priority of 2. Then we care the same about all remaining deviations, so we assign priority 3 to each of them. We communicate this priority to `goalp` using the following syntax. ```{r lexicographic} # Load library library(goalp) # Write goals to a text variable goals <- "Land : 60*A + 30*B + 20*C + 10*D + 20*E >= 3000 | 1# Water: .8*A + .2*B + .3*C + .1*D + .3*E = 50 | 3# 2# Fertiliser: 30*A + 20*B + 10*C + 5*D + 20*E <= 2000 | 3# A lBound 10 B lBound 10 C range [10,20] D uBound 5" # Solve problem gp <- goalp(goals) # Print results to screen summary(gp) ``` The solution has now changed slightly to 52 aubergines, 12 broccoli, and 20 carrots. As the previous solution already did not use less than 2000 cm of land the solution was not expected to change drastically. When we run `goalp`, we see that three sub-problems are solved, one for each level of priority. Note that, unlike weights, the syntax for priority includes a hashtag (`#`). It is also possible to combine lexicographic priorities and weights. But then the weights will be relevant only within the same level of priority. We could, for example, obtain sligtly different results if we instead run `goalp(goals, normW=TRUE)`. When printing the results, the lexicographic priority of each deviation is shown next to its weights, in columns `p-` and `p+` in the table at the bottom of the summary. ## Extensions `goalp` by default assumes that all decision variables are integers. But they can also be defined as *continuous* (non-negative real numbers) or *binary* (variables that can only take values 0 or 1). The typre of each variable can be specified through the `varType` option when calling `goalp`, for example `goalp(goals, varType=c(x="cont", y="int", z="bin"))`, where `x`, `y` and `z` are the names of the decision variables. `goalp` allows for enforced equality constraints using the `==` symbol. This will turn a goal into an effective constraint. Note that this increases the chances of the solver not finding a solution for the underlying linear optimisation problem. goalp` also allows for enforcing of constraints by setting goal's weights equal to `NA`. If a weight is set to `NA`, then the corresponding deviation is removed from the underlying linear optimisation problem. This is specially useful in the case of enforcing inequalities. For example, to enforce $x + y >= 5$: ```{r eval=FALSE} goals <- "g1: 3*x + 2*y + z = 20 g2: x + y >= 5 | NA" gp <- goalp(goals) summary(gp) ``` Besides the text syntax describing a goal programming problem, `goalp` also allows defining a problem using matrices. This is specially useful in the case of large problems. The problem is expressed in the form `Ax = b`, with the user having to provide the coefficient matrix `A`, with as many rows as goals and columns as decision variables, the vector `b`, with the right-hand side values of the goals, and an additional character vector `m` indicating if each goal relates the left- and right-hand sides using `"="`, `">="`, `"<="`, `"=="`, `"lBound"`, or `"uBound"`. Type `?goalp` in the console for more details. ## References Ignizio, J.P (1983) Generalized goal programming: An overview. *Computers and Operations Research* **10**, 277-289. [doi](https://doi.org/10.1016/0305-0548(83)90003-5)