- Laura Whiting

# Working with Dates

Stata stores dates in numeric form using elapsed-time format. A date variable is interpreted by Stata as the number of days since 1 January 1960. For example - 0 = 1 January 1960; 1 = 2 January 1960; -1 = 31 December 1959. This is similar to how dates are stored in Microsoft Excel, which interprets dates as the number of days since 1 January 1900. However, Stata dates are more versatile than Excel and can interpret negative dates as well as positive dates. Excel cannot do this, so any date prior to 1900 is uninterpretable in Excel.

Time and date-time variables are also stored numerically, however Stata interprets these as the number of milliseconds since 1 January 1960. For example – 0 = 1 January 1960 at 00:00:00.000; 1 = 1 January 1960 at 00:00:00.001; -1 = 31 December 1959 at 23:59:59.999. A time variable is formatted such that 1 January 1960 is not shown, however technically this date is assigned to time-only variables. Since you are only interested in the time for time-only variables, this fact is irrelevant.

By storing dates in this manner Stata makes it easy for you to perform calculations with your date and time variables. Some examples of helpful calculations you can perform with Stata dates is shown below.

**Worked Example 1 - Determining the Length of Stay in a Hotel**:

I have a small hotel dataset that contains the date and time (together) of check-in and check-out. The dataset is linked below.

I would like to know how many days customers spent at the hotel. In the command pane I type the following:

In the above code, the local macro *halfday* contains the number of milliseconds in 12 hours (or half a day). When performing a simple calculation of check-out minus check-in, the number of days stayed will be different because the days and times are recorded in milliseconds. To correct for this I use the macro *halfday* to calculate the length of the stay to the nearest half a day. I then format my new *stay* variable with *%tc* to tell Stata this is a date-time variable, and *jjj* to ask Stata to display the number of days in the whole year. As each stay is calculated as starting on 1 January 1960, this will show the number of days stayed up to 365.

**Worked Example 2 - Use Dates to Subset your Data**:

Using the same hotel dataset, I now want to summarize the cost variable to find out what the average cost was for bookings in the month of September in 2017. I can do this with the following qualifiers on my **summarize** command:

This gives the following information from **summarize**:

For the month of September the average cost to stay at this hotel was $2,785.67. The **tc()** function I used with my qualifiers is necessary for Stata to interpret the command correctly. Stata will convert the date and time information given within this function to the correct number of milliseconds. This allows Stata to qualify the command in numeric terms. For the **tc()** function to work you must provide the time, however the date is optional as this is also used for purely time variables. If no date is given then 1 January 1960 is assumed (as with all purely time variables). There is an equivalent **td()** function for purely date variables. For the **td()** function you must provide a date without a time.