

I am working with a panel dataset containing the life satisfaction of interviewees in a survey. There are 26 variables for the respondent's life satisfaction (one for each of 26 years). The variables are originally named in this format: ap6801 bp9301 cp9601 dp9801, and so on all the way to zp15701. ap6801 contains the respondents' life satisfaction for the year 1985, bp9301 contains it for 1986, and so on.

A number of egen functions may be combined with the by prefix to compute statistics within groups, e.g., mean, median, std, min, max, or pctile, p(#) (for percentiles other than the median).

Generating a new variable from the system variable _n under the by prefix will create a variable (here called "nhh") that indicates the position of each case within "ID_hh"; that is, the first case will have the value "1", the second "2", etc. Using _N instead will compute the overall number of cases within "ID_hh"; assuming that "ID_hh" refers to households, the new variable will indicate the household size.

Sometimes you may wish to repeat an analysis for subgroups in your data. For instance, to repeat a regression analysis by country, supposing that the respective variable is indeed named "country", you simply have to write:

by country, sort: regress income education tenure

If the data are already sorted by country, it is not necessary to use the sort option that was used in this example. However, I've noted that Stata sometimes did not recognize data that were sorted (perhaps the problem was that there were "gaps" in the variable). As long as your data set is not too large, it is no problem to use the sort option here, since Stata is very fast. Of course, there is also the option of sorting the data prior to doing your analyses.

Note that by does not work with all commands. Particularly in the case of graphs, the option over often has to be used for comparing groups. Some procedures simply do not allow for comparison of groups; here, you typically have to resort to the if clause. Whenever I am aware of a procedure not permitting the use of by, I will indicate this in the heading of the respective entry.
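The within-group counters and the subgroup analyses discussed here can be sketched as follows. This is only an illustration under stated assumptions: the household data are made up, and the subgroup analysis uses the auto dataset that ships with Stata (with foreign as the grouping variable) in place of the country, income, education and tenure variables, which are not available here.

```stata
* Hypothetical toy data: six persons in three households
clear
input ID_hh income
1 1000
1 1500
2 2000
3 500
3 700
3 300
end

* Position of each case within its household (1, 2, ...).
* Listing income in parentheses fixes the order within households;
* without it, the order of cases within a household is not well defined.
bysort ID_hh (income): generate nhh = _n

* Number of cases per household, i.e., the household size
bysort ID_hh: generate hhsize = _N

list, sepby(ID_hh)

* Subgroup analysis, using Stata's shipped example data as a stand-in
sysuse auto, clear
bysort foreign: regress price mpg weight

* Where by is not available, restrict the sample with an if clause ...
regress price mpg weight if foreign == 1

* ... or, for graphs, compare groups with the over() option
graph bar (mean) price, over(foreign)
```

In the toy data, hhsize should come out as 2, 1 and 3 for households 1, 2 and 3, and nhh runs from 1 up to that number within each household.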
However, a common situation is that you have data which were collected on households, and in each household all adult persons were interviewed concerning, e.g., their individual income. In such a situation, you may wish to compute the household income, i.e., the sum of all individual incomes in each household. This is easily accomplished, provided that a variable indicates to which household each person belongs.
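A minimal sketch of the household-income computation, assuming the household identifier is called ID_hh and individual income is stored in income. The data and the name hhinc for the new variable are hypothetical and chosen for illustration only.

```stata
* Hypothetical toy data: six persons in three households
clear
input ID_hh income
1 1000
1 1500
2 2000
3 500
3 700
3 300
end

* Household income: sum of individual incomes within each household
bysort ID_hh: egen hhinc = total(income)

list, sepby(ID_hh)
```

With these numbers, every member of household 1 gets hhinc = 2500, household 2 gets 2000, and household 3 gets 1500. The same result could be written as egen hhinc = total(income), by(ID_hh); which form to use is largely a matter of taste.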

Some Stata commands that may be useful for data transformation do not relate to a single row of the data, but rather to the dataset in its entirety. For example, an egen command using the total() function will compute the sum of income over the entire dataset and will store the result in a new variable called tinc. Note that each case (= row) in the dataset will have the same value in this variable, to wit, the total of all incomes. Of course, this is something you will not normally wish to do.
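As a rough sketch of such a command, assuming the dataset contains a variable named income (the data below are invented for illustration):

```stata
* Hypothetical toy data
clear
input income
1000
1500
2000
500
700
300
end

* Sum of income over all cases, stored as a constant in tinc
egen tinc = total(income)

* Every row now carries the same value (here 6000)
list
```

In older versions of Stata the same egen function was called sum() rather than total().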

To do something not on the entire dataset, but rather on subgroups, the keyword by is used. It is most useful for data transformations, but of course it may also be used to do analyses by subgroups. The general form is to use by as a prefix:

by country: some Stata command(s)

Whatever is achieved by "some Stata command(s)" is accomplished separately for all groups defined by the variable "country". Note, however, that this presupposes that the data are sorted by "country". If this is not the case, you may use the sort command prior to executing the command beginning with by. But you may also build the sorting into the by prefix, as in:

bysort country: some Stata command(s)

Data transformation
