Analysis of Meteorological Data of Pantnagar Weather Station
About
This post summarizes a research project I undertook under the INSPIRE-SHE Scholarship Program of the Dept. of Science and Technology, Govt. of India. My plan was to publish the content openly on the web so that any faults could be corrected over time. The language is kept simple, and the post is aimed at A-level (10+2) students and beyond.
Abstract
This project is based on 20 years of meteorological data provided by the IMD-approved weather station at Govind Ballabh Pant University of Agriculture and Technology, Pantnagar. Observations are taken twice a day, and therefore fourteen times a week. My analysis is based on the weekly reports provided by the agro-meteorological department. The main aim of the project is to develop my analytical skills in meteorology and to contribute some ideas to meteorological studies. The meteorological observatory at Pantnagar is located at 29° N latitude, 79.3° E longitude and 243.84 m altitude, under the N. E. B. Crop Research Center. The Pantnagar region lies in the Tarai belt of the Uttarakhand state of India. It is hot and wet compared to nearby places such as the famous hill station Nainital. The annual rainfall is about 145 cm, with large variations through the year. Most of the rain is received from the south-west monsoon during the rainy season, which runs from June (typically the end of the month) to September. The soil of this region is good for agriculture and holds enough moisture to produce good crops. The average pH value of the soil is 7.2 – 7.4. The temperature variation is very large: summer maxima reach around 42-45 degrees Celsius, while in winter the temperature falls to 2-4 degrees Celsius.
This report aims to study and analyze the collected weekly data within the limited period of two months, as proposed by DST, Govt. of India under the INSPIRE-SHE summer project.
Introduction
Meteorology is a word formed from two Greek words: 'meteoros', which means 'atmospheric' or 'lofty', and 'logos', which means discourse or science. We can therefore define meteorology as the science of the atmosphere. It studies atmospheric processes and makes extensive use of applied physics. The science of the atmosphere covers both the static and the dynamic components of the atmosphere, which together are called weather.
Constitution of Atmosphere
The atmosphere of the earth is a relatively thin layer of gas held to the surface by the earth's gravitational force. It consists mainly of an invisible and odorless mixture called air, the most important component of the earth's atmosphere. When we say 'atmosphere', in practice we mean 'air'. The air itself has two kinds of constituents:
- Distinct
- Variable
The distinct gaseous constituents are the major ones and make up almost 100% of the atmosphere. The main gases are nitrogen (78%) and oxygen (21%); the minor constituents (~1%) are argon and other inert gases.
The main variable constituents are water vapor, ozone, carbon dioxide and dust. Their abundance depends on the location on the earth.
The standard state of the atmosphere has been well explored with the help of meteorological and satellite observations. The atmosphere is divided into layers, and the temperature and other physical properties differ from layer to layer, so it is worth reviewing them before going ahead. The lowest layer, called the troposphere, contains about three fourths of the mass and almost all the moisture and dust of the atmosphere. All the meteorological phenomena that make up weather, and are thus closely related to this project, are confined to the troposphere. The top of the troposphere is called the tropopause. The height of the tropopause varies from 8 km (at the poles) to 16 km (at the equator).
In the troposphere, the temperature decreases with elevation at an average rate of 6.5 degrees Celsius per km. This is known as the lapse rate. Above the tropopause is the stratosphere, which is free of the daily and annual heating of the earth's surface and contains very little dust and moisture. The stratosphere contains the major portion of the life-saving ozone. The stratopause, the upper limit of the stratosphere, is located 30 to 50 km above the earth's surface. Above the stratosphere lies the mesosphere, bounded by the mesopause at about 80 km above the earth's surface. The fourth major layer, above the mesopause, is the ionosphere, where the pressure is very low, about 0.01 millibars at 90 km. Solar ultraviolet radiation is absorbed in this layer, and many satellites orbit within it. The ionosphere merges gradually into the outermost shell, called the exosphere. In the exosphere the mean free path is very large and the atmosphere can no longer be treated as a continuum.
As already mentioned, the troposphere is the domain of the study of meteorology with an extension to the stratosphere involving ozone.
Meteorological observations
Using applied physics, engineers make periodic measurements of meteorological parameters to understand long-term or short-term atmospheric phenomena. The meteorological parameters act like coordinates for describing atmospheric observations, and they depend on conditions (time and location). There are many such parameters, depending on the observations one makes. The most important ones, which are used in this project, are:
- Atmospheric pressure
- Wind Velocity
- Wind direction
- Temperature
- Humidity
- Radiation
- Sunshine
- Precipitation (rain, snowfall, hail)
- Evaporation
Meteorological instruments are used to collect data on these parameters at a meteorological observatory, or weather station. In India there are a number of weather stations, which are regulated by the India Meteorological Department (IMD). All weather stations work to specific standards formulated by IMD for the location. Meteorological data can be obtained from these weather stations. I collected the data from the weather station at Pantnagar, Uttarakhand. Generally, observations are taken twice a day, i.e., at 07:12 and at 14:12 hours.
A minimal understanding of the physics of the meteorological parameters is required before we can analyze them statistically.
Atmospheric Pressure
This is the force exerted by the atmosphere per unit area of the earth's surface at a location. Atmospheric pressure is usually measured by Fortin's barometer, which consists of an inverted tube filled with mercury standing in a cistern, or by an aneroid barometer. The instrument which records the variation of atmospheric pressure with time is called a barograph.
The standard atmospheric pressure is $ 1.013 \times 10^5 \ \mathrm{N/m^2}$ .
Wind Direction and Wind Velocity
Air in motion is called wind. The horizontal component of the air's movement, parallel to the earth's surface, is generally referred to as wind, while the vertical components are referred to as air currents. Wind direction is easily measured with a wind vane, while wind velocity is a little trickier to measure and is obtained using a cup anemometer.
Temperature
The most studied parameter of meteorology is temperature, and it can be measured simply with a mercury thermometer in degrees Celsius. A continuous record of temperature with time can be obtained with an automatic recording instrument called the thermograph, which works on the principle that a bimetallic strip changes its shape as the temperature changes. The maximum and minimum temperatures at a station are measured by a maximum thermometer and a minimum thermometer respectively. The mean daily temperature is computed as the arithmetic average of the maximum and minimum temperatures recorded on that day. The daily range is the difference between the maximum and minimum temperatures for a particular day. The mean monthly temperature is the arithmetic average of the mean daily temperatures of all days in a month. The mean annual temperature is the arithmetic average of the mean daily temperatures of all days in a year. The normal daily temperature is the average of the daily mean temperatures for a period of 30 years. Similarly, the normal monthly temperature is the average of the mean monthly temperatures for 30 years. The normal annual temperature is the average of the mean annual temperatures for 30 years.
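The temperature definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the five days of readings are hypothetical and are not taken from the Pantnagar data.

```python
from statistics import mean


def daily_mean(t_max, t_min):
    """Mean daily temperature: average of the day's maximum and minimum."""
    return (t_max + t_min) / 2


def daily_range(t_max, t_min):
    """Daily range: difference between the day's maximum and minimum."""
    return t_max - t_min


# Hypothetical (maximum, minimum) readings in degrees Celsius for five days.
readings = [(30.0, 18.0), (32.0, 20.0), (31.0, 19.0), (29.0, 17.0), (33.0, 21.0)]

daily_means = [daily_mean(hi, lo) for hi, lo in readings]
daily_ranges = [daily_range(hi, lo) for hi, lo in readings]

# Mean over the period: arithmetic average of the mean daily temperatures.
period_mean = mean(daily_means)

print(daily_means, daily_ranges, period_mean)
```

The same averaging step, applied to all days of a month or a year, gives the mean monthly and mean annual temperatures.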
Humidity
Humidity here means relative humidity, which is measured with a psychrometer. A continuous record of humidity with time is made by an automatic recording instrument called a hygrograph. Relative humidity is expressed in percent and is defined as the ratio of the actual vapor pressure to the saturation vapor pressure at a given temperature, multiplied by 100.
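As a small illustration of this definition, the sketch below computes relative humidity from two vapor pressures. The function name and the sample values are assumptions made for the example; in practice the two pressures would come from psychrometer readings and standard tables.

```python
def relative_humidity(actual_vp, saturation_vp):
    """Relative humidity in percent from two vapor pressures in the same unit."""
    if saturation_vp <= 0:
        raise ValueError("saturation vapor pressure must be positive")
    return 100.0 * actual_vp / saturation_vp


# Hypothetical vapor pressures in hPa at the same temperature.
rh = relative_humidity(15.0, 30.0)
print(rh)
```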
Radiation
Radiation is a process in which energetic particles or waves travel through a medium or space. In meteorological observations, the thermal radiation that raises temperature and the solar radiation that affects the ozone layer, among others, are studied.
Sunshine
Sunshine is the number of hours per day for which the Sun shines at a weather station. It is measured by an instrument called a sunshine recorder.
Precipitation
In meteorology, precipitation means the deposition of water from the atmosphere onto the earth in any physical form. Rainfall, snowfall, hail, fog etc. are the ways water reaches the earth. Rainfall is measured using a rain gauge. The average rainfall over a significant area is approximated by three methods:
- Arithmetic Mean Method: The result is obtained by dividing the sum of the rainfall amounts recorded at all the rain gauge stations located within the area under consideration by the number of stations, i.e.,
$$ P=\frac{P_1+P_2+\ldots+P_n}{n}=\frac{1}{n} \sum_{k=1}^n P_k$$
This method is also known as the unweighted mean method.
- Thiessen Polygon Method: Suggested by Thiessen in 1911, this method allows for irregularities in gauge locations by weighting each gauge in proportion to the area which is closer to that gauge than to any other gauge. The average depth of rainfall by this method is given by
$$ P=\frac{A_1 P_1+A_2 P_2+\ldots+A_n P_n}{A_1+A_2+\ldots+A_n}$$
where $ P_1, P_2, \ldots, P_n$ are the rainfalls recorded at the rain gauge stations and $ A_1, A_2, \ldots, A_n$ are the polygonal areas around them.
- Isohyetal Method: The isohyetal method is perhaps the most accurate method for estimating rainfall over an area, although its accuracy depends upon the skill of the analyst. An isohyet is a line joining points of equal rainfall.
For ease of calculations, I have used the first method in the report.
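The first two methods can be compared with a short Python sketch. The gauge readings and polygon areas below are hypothetical, and the function names are chosen only for this example.

```python
def arithmetic_mean_rainfall(rainfalls):
    """Unweighted mean of the gauge readings."""
    return sum(rainfalls) / len(rainfalls)


def thiessen_mean_rainfall(rainfalls, areas):
    """Area-weighted mean: each gauge is weighted by its polygon area."""
    if len(rainfalls) != len(areas):
        raise ValueError("need one polygon area per gauge")
    weighted_sum = sum(a * p for a, p in zip(areas, rainfalls))
    return weighted_sum / sum(areas)


# Hypothetical readings (mm) at three gauges and their polygon areas (sq. km).
gauge_rainfall_mm = [100.0, 120.0, 80.0]
polygon_area_km2 = [10.0, 30.0, 10.0]

p_arithmetic = arithmetic_mean_rainfall(gauge_rainfall_mm)
p_thiessen = thiessen_mean_rainfall(gauge_rainfall_mm, polygon_area_km2)
print(p_arithmetic, p_thiessen)
```

With these made-up numbers the Thiessen estimate is pulled towards the second gauge, because that gauge represents the largest area.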
Evaporation
Water evaporates into vapor at every temperature, and evaporation becomes more rapid as the temperature increases. Because water bodies lose water to vapor in this way, it is essential to know how much water has evaporated in order to predict future precipitation. Evaporation over a day is determined using a tank of water called a pan evaporimeter.
Meteorological Analysis
Many meteorological phenomena are physical processes, and in every physical process there should be a relation between the cause and the effect. Once the relation is known, the output of the process can be predicted easily and precisely. But in some cases there is an element of uncertainty regarding the outcome of the process. Processes of the first type are called deterministic, while those of the second type are non-deterministic. Predictions of sunset and sunrise are deterministic, while those of rainfall and wind velocity are non-deterministic. Most of the meteorological parameters are connected to processes which are random, and where randomness arises, statistics and probability come in. So before we proceed, let us have a quick review of some statistical topics which are closely related to the calculations I shall use in this project.
- Random Process / Probabilistic Process / Random Phenomenon is a process/phenomenon in which no certain relation between the cause and the effect is observed. A random process always has an element of unpredictability.
- If an experiment or event is conducted N times, or if the outcome of a process is observed N times, and if a particular attribute A occurs n times, then the limit of n/N as N becomes large is defined as the probability of A.
Therefore, probability $ P=\displaystyle{\lim_{N \to \infty}} \frac{n}{N}$ .
- The maximum value of probability is 1 and the minimum value is 0. In the first case the event is certain to occur and in the latter case it is certain not to occur.
- Probability is a numerical quantity, and hence it is applied to meteorological processes by defining some variables and functions.
- A variable associated with a random process which can't be predicted uniquely is called a random variable.
- Random variables are classified into two types: discrete random variables and continuous random variables.
- A discrete random variable is a variable that can take only a countable number of values. For example, the number of rainy days in a month is a discrete random variable.
- A continuous random variable is used when we are dealing with measured data rather than counted data. For example, the annual rainfall at a place is a continuous random variable.
- A random variable is usually denoted by an upper case letter, X, while any value which it takes is denoted by the corresponding lower case letter, x.
- The behavior of the random variable is completely described by either the probability density function (p.d.f.) or cumulative distribution function (C.D.F).
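The frequency definition of probability given in the list above can be illustrated with a finite record. The sketch below estimates the probability of a rainy day as the ratio n/N; the ten-day record and the function name are hypothetical, and with a finite N the result is only an estimate of the limiting value.

```python
def relative_frequency(observations, attribute):
    """Fraction n/N of the observations that show the given attribute."""
    count = sum(1 for obs in observations if obs == attribute)
    return count / len(observations)


# Hypothetical record of ten days, each marked "rain" or "dry".
days = ["rain", "dry", "dry", "rain", "dry", "dry", "dry", "rain", "dry", "dry"]

p_rain = relative_frequency(days, "rain")
rainy_days = days.count("rain")  # a discrete random variable: a count of days
print(p_rain, rainy_days)
```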
Using statistical techniques, meteorologists focus on prediction and on extracting useful information from large amounts of data containing observational errors. In the present project, the main aim is to discuss the meteorological parameters on the basis of regression analysis, time series and predictability.
Regression Analysis: The most useful mathematical modeling technique in meteorological calculations is regression analysis. Regression equations have been used heavily for over a hundred years to estimate summer monsoon rainfall in India. The approach was first introduced by Blanford in 1884 and later improved by Sir G Walker in 1921-1927. Regression analysis tries to find the average relationship between variables. It refers to the method by which estimates of the values of one variable are made from the knowledge of one or more other variables.
On the basis of complexity, there are the following types of regression equations, as described by Walker:
- Simple Linear Regression
- Multiple Linear Regression
- Curvilinear Regression
Simple Linear Regression: Let $ x$ be the independent variable and $ y$ be the dependent variable; then a relationship between $ x$ and $ y$ of the form
$ y=a+bx$
where $ a$ and $ b$ are constants, which are obtained from regression analysis, is called simple linear regression.
Multiple Linear Regression: Let $ y$ be the dependent variable and $ x_1, x_2, \ldots, x_n$ be independent variables; then the multiple linear regression can be written as the relation
$ y=a_0+a_1x_1+a_2x_2+ \ldots+a_n x_n$
where $ a_0, a_1, \ldots, a_n$ are the regression constants to be determined.
Curvilinear Regression: Let $ x$ be an independent variable and $ y$ be a dependent variable; then a relationship of the form
$ y=ax^b$ is called curvilinear regression, where $ a$ and $ b$ are constants to be determined.
Analysis of Regression Equations:
The regression coefficients are estimated by the least squares method and step-wise regression. Let $ \bar{x}$ and $ \bar{y}$ be the mean values of the variables $ x $ and $ y$ , and let $ X$ , $ Y$ be the deviations from the mean, i.e.,
$ X=x-\bar{x}$ and $ Y=y-\bar{y}$ .
Solving Regression Equations: Let $ (x_1,y_1)$ , $ (x_2, y_2)$ , .... $ (x_n, y_n)$ be the $ n$ pairs of observations on $ x$ and $ y$ . Assuming that the best values of the constants $ a$ and $ b$ are known, we can predict the value of $ y_k$ for some $ x_k$ . Let the predicted value of $ y_k$ be $ w_k$ ; then the deviation of the observation from the prediction is $ Y_k=y_k-w_k$ .
But since $ y=a+bx$ ,
we have $ w_k=a+bx_k$ .
Hence the deviation is $ Y_k=y_k-a-bx_k$ .
The best regression is one that makes these deviations as small as possible. For the best regression, the sum of the squared deviations must be a minimum. Therefore, the criterion is
Minimize $ S=\sum_{k=1}^{n}{(y_k-a-bx_k)}^2$ .
Now, since $ S$ is to be a minimum with respect to $ a$ and $ b$ , we must have
$ \frac{\partial{S}}{\partial{a}}=0$
$ \frac{\partial{S}}{\partial{b}}=0$ .
These two are called the normal equations; they can be solved for $ a$ and $ b$ .
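As a worked step, carrying out the two differentiations on $ S=\sum{(y_k-a-bx_k)}^2$ and dividing by $ -2$ gives the normal equations in explicit form (all sums run over the $ n$ observations):
$$ \sum y_k = na + b\sum x_k, \qquad \sum x_k y_k = a\sum x_k + b\sum x_k^2$$
Solving this pair of linear equations for the two unknowns yields
$$ b=\frac{\sum (x_k-\bar{x})(y_k-\bar{y})}{\sum (x_k-\bar{x})^2}, \qquad a=\bar{y}-b\bar{x}$$
so the slope is expressed through the deviations of $ x$ and $ y$ from their means, and the fitted line always passes through the point $ (\bar{x}, \bar{y})$ .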
A similar method can be used for multiple linear regression, by taking
$$ S=\sum_{k=1}^{m}{(y_k-a_0-a_1x_{1k}-a_2x_{2k}-\ldots-a_nx_{nk})}^2$$
where $ m$ is the number of observations and $ x_{jk}$ is the $ k$ -th observed value of $ x_j$ . Setting the partial derivatives of $ S$ w.r.t. $ a_0, a_1, a_2, \ldots, a_n$ to zero, we get $ n+1$ normal equations. These can be solved to find the coefficients.
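A minimal sketch of this procedure in plain Python is given below. It builds the normal equations for the coefficients and solves them by Gaussian elimination. The function names and the small data set are assumptions made for illustration only; the project itself used SPSS for its regression output.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix, copied so that the inputs are left unchanged.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    # Back substitution, starting from the last equation.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_multiple_linear(xs, ys):
    """Return [a_0, a_1, ..., a_n] minimizing the sum of squared deviations.

    xs is a list of observations, each a list of the n predictor values.
    """
    rows = [[1.0] + list(x) for x in xs]  # the leading 1 multiplies a_0
    size = len(rows[0])
    # Normal equations: sum over observations of r_i * r_j on the left,
    # and of r_i * y on the right.
    lhs = [[sum(r[i] * r[j] for r in rows) for j in range(size)]
           for i in range(size)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve_linear_system(lhs, rhs)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1, 3, 4, 6, 8]
coefficients = fit_multiple_linear(xs, ys)
print(coefficients)
```

Because the sample data lie exactly on a plane, the fitted coefficients recover the generating values up to rounding error; real observations would leave non-zero deviations.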
Solving curvilinear equations is very similar to solving simple linear equations.
Since $ y=ax^b$ ,
taking the logarithm of both sides gives
$ \log y=\log a + b \log x$ .
Now we can apply the least squares method to $ \log x$ and $ \log y$ to get the best values of $ \log a$ and $ b$ , and thus $ a$ and $ b$ .
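The log transformation can be sketched as follows. The helper `fit_line` implements the usual least squares estimates for a straight line, and `fit_power_law` applies it to the logarithms. Both names and the sample data are hypothetical. Note that this approach needs positive values of $ x$ and $ y$ , and that it minimizes the squared deviations of $ \log y$ rather than of $ y$ itself.

```python
import math


def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def fit_power_law(xs, ys):
    """Estimates (a, b) for y = a * x**b via a straight-line fit in log space."""
    log_xs = [math.log(x) for x in xs]
    log_ys = [math.log(y) for y in ys]
    log_a, b = fit_line(log_xs, log_ys)
    return math.exp(log_a), b


# Hypothetical data generated exactly from y = 2 * x**1.5.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]
a, b = fit_power_law(xs, ys)
print(a, b)
```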
Correlation:
If, in a random process, one random variable is associated with another variable, these variables are said to be correlated, and the relation between them is called a correlation. The measure of correlation, which is called the correlation coefficient, summarizes the direction and magnitude of the correlation.
Let $ (x_1, y_1), (x_2, y_2),(x_3, y_3), \ldots (x_n, y_n)$ be the $ n$ pairs of observations of the random variables $ X$ and $ Y$ . Then the sample correlation coefficient of these two variables is given by
$ r= \dfrac{\frac{1}{n} \sum_{k=1}^n {(x_k-\bar{x})(y_k - \bar{y})}}{s_x s_y}$ .
where $ \bar{x}$ and $ s_x$ are the mean and standard deviation of the variable $ X$ , and $ \bar{y}$ and $ s_y$ are the mean and standard deviation of the variable $ Y$ . For this formula the standard deviations are computed with the same divisor $ n$ as the numerator. The value of $ r$ lies between $ -1$ and $ +1$ . When $ r=-1$ , the variables are said to be perfectly negatively correlated, and when $ r=+1$ , the variables are said to be perfectly positively correlated. When $ r=0$ , the variables are said to be uncorrelated.
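A direct translation of the correlation formula into Python might look like the sketch below. The function name and the sample values are hypothetical; the standard deviations are computed with divisor $ n$ , as assumed here, so that $ r$ stays between $ -1$ and $ +1$ .

```python
import math


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r of paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    # Standard deviations with divisor n, matching the covariance above.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return covariance / (s_x * s_y)


# Hypothetical paired values: sunshine hours and evaporation in mm.
sunshine = [6.0, 7.0, 8.0, 9.0]
evaporation = [4.0, 4.5, 5.0, 5.5]

r_positive = correlation_coefficient(sunshine, evaporation)
r_negative = correlation_coefficient(sunshine, list(reversed(evaporation)))
print(r_positive, r_negative)
```

The made-up evaporation values rise in exact proportion to sunshine, so the first coefficient is at the positive extreme and the reversed series is at the negative extreme.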
Software used
The software used to complete the project was LaTeX (for mathematical writing), SPSS (for graphical and descriptive analysis), Mathematica (for complicated calculations) and PSPP (as an alternative to SPSS in some cases). Microsoft Office was used to analyze the collected data of the project. An online copy of the project will be available at https://gauravtiwari.org/summerproject.
Aim and Conclusions of the Project
The major aim of this project is to motivate myself towards a multidimensional research career. I am a mathematics major, but highly interested in earth and planetary sciences. This project might shed some light on the subject. It is also intended to enhance my analytical skills. I hope this will help me to achieve my goal.
References
- Hydrometeorology, Weisner C.J., Chapman & Hall Ltd., London, 1983
- Atmosphere, Wikipedia, The Online Encyclopedia, http://en.wikipedia.org/wiki/Atmosphere
- A Textbook of Hydrology, Dr. P. Jaya Rami Reddy, Laxmi Publications India, 2006
- Use of Probability Distribution in Rainfall Analysis, M A Sharma & J B Singh
Attached Documents
The following attachments should be read in order to understand the analytic report. All the attachments are Microsoft Excel files which need MS Excel 2007 or higher in order to be rendered properly.
- Year 1989 to 2008 : 20 Years weekly meteorological data
- Yearly Meteorological Data: With Analytic and Moving Average Graphs
- Rainfall Data for Moving Average
- Temperature Data for Moving Averages
- Data.sav file for SPSS analysis.
- Images from the weather station.
Analysis and Interpretation of Rainfall Data
The rainfall process is essentially random in nature. We cannot predict with certainty what the rainfall will be for any given period in the future. Rainfall magnitudes can be estimated only with some probability attached to them.
The data of annual rainfall can be graphically presented in many ways such as the chronological chart, bar diagram and the ordinate graph.
Mean and Standard Deviation of Annual Rainfall from the Data
The mean of the annual rainfall is defined by $ \bar{x}=\frac{1}{n}\sum_{i=1}^{n}{x_i}$ .
From the data sheet, $ \bar{x}=1610.835$ mm per year.
The standard deviation of the annual rainfall is
$ s_x=\sqrt{\frac{\sum_{i=1}^{n}{(x_i-\bar{x})}^2}{n-1}}=2768.38$ mm per year.
A similar method applies to the analysis of the number of rainy days.
The mean number of rainy days is 78 days per year (rounded to the nearest whole number)
and the standard deviation is 11.8 days per year.
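For readers who want to reproduce this kind of calculation outside a spreadsheet, the sketch below computes the mean and the sample standard deviation (divisor $ n-1$ ) with Python's standard `statistics` module. The five annual totals are hypothetical placeholders and are not the Pantnagar figures.

```python
from statistics import mean, stdev

# Hypothetical annual rainfall totals in mm (not the Pantnagar data).
annual_rainfall_mm = [1200.0, 1500.0, 1800.0, 1400.0, 1600.0]

x_bar = mean(annual_rainfall_mm)
s_x = stdev(annual_rainfall_mm)  # sample standard deviation, divisor n - 1

print(x_bar, s_x)
```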
Moving Averages Curve
A moving averages curve can be plotted on the basis of the file MovingAverageRF and overlaid on the original rainfall data. See Charts M1 and M2 for the moving averages curves of rainfall and rainy days respectively.
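A simple moving average can be sketched as below. The window length of three years and the sample values are assumptions for illustration; the window used in the attached files is not stated here.

```python
def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 averages."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical annual rainfall totals in mm (not the Pantnagar data).
annual_rainfall_mm = [1200.0, 1500.0, 1800.0, 1400.0, 1600.0]

smoothed = moving_average(annual_rainfall_mm, 3)
print(smoothed)
```

Each smoothed value averages three consecutive years, which damps year-to-year swings and makes any longer trend easier to see when overlaid on the raw series.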
Analysis and Representation of Temperature Data
Temperature is less random in nature than rainfall. The statistical analysis of the temperature data reveals only minor uncertainty in the annual maximum and minimum temperatures at the weather station.
Mean and Standard Deviations of Temperatures (Max. and Min. Temps respectively)
The mean of the average maximum (or minimum) temperature is defined by $ \bar{T}=\frac{1}{n}\sum_{i=1}^{n}{T_i}$
And, standard deviation of the Annual Temperature Maxima (or Minima):
$ s_T=\sqrt{\frac{\sum{(T_i-\bar{T})}^2}{n-1}}$
From the data,
Mean of Annual Maximum Temperature = 29.6°C
Standard Deviation of Annual Temperature Maxima = 0.575°C
Mean of Annual Minimum Temperature = 16.9°C
Standard Deviation of Annual Temperature Minima = 0.344°C
Moving Averages Curve
Moving averages curves can be plotted on the basis of the file MovingAverageTEMP and overlaid on the original temperature data.
Correlation and Regression Analysis of Variables
Correlation analysis (with two-tailed significance) and regression analysis of some important variables were done using SPSS, and the output file is attached along with the project.
Analytic Report
After working through the weekly meteorological data of Weather Station Pantnagar, some important aspects emerged. Disregarding some missing values in the data.sav file, here are some important extremes on a yearly basis:
1. Range of Temperature
The mean yearly maximum temperature of Pantnagar varies from 28.3 deg C to 30.4 deg C, and the mean yearly minimum varies from 16.3 deg C to 17.5 deg C. On the other hand, the highest weekly average maximum was 42 deg C and the lowest weekly average minimum was 2.4 deg C. This shows how much the actual temperatures vary through the days and weeks, even though the yearly means are almost constant.
2. Range of Humidity
From the meteorological data it is clear that humidity at Pantnagar varies a lot through the day. If the psychrometer reads around 90% in the morning (at 07:12), the reading falls rapidly to almost half by the afternoon. The mean yearly humidity varies from 81.7 to 86.9% in the morning, while in the afternoon it averages from 44.5 to 56.0%.
3. Rainfall
The rainfall data show a lot of variation. In the year 1992, the total rainfall was 831.2 mm, while in 2000 it was 3218.6 mm. Analytically, one does not see any regular pattern of change from year to year; the behavior is very uncertain. If we consider the number of rainy days in a meteorological year, we find that the rainy season is around three and a half months long, with the number of rainy days varying from 63 to 100.
4. Sunshine
In accordance with the geographic location, the Sun shines brightly for almost 7-8 hours a day. The lowest average sunshine was 6.40 hours (i.e., 6 hours and 24 minutes) per day in the year 2008, and the sunniest year was the first, 1989, with 7.97 hours per day.
5. Air
The yearly arithmetic means of wind velocity are not high, but on closer inspection there is a large range of wind velocities on a weekly basis. The daily wind velocities vary from 0.8 kmph to 30 kmph, owing to the location of the town. The average yearly wind velocity ranges from 2.94 kmph to 7.03 kmph.
6. Evaporation
As the average sunshine hours are almost constant, the average evaporation per day is nearly constant as well. The average evaporation is about 5 mm per day, with averages ranging from 3.92 mm to 5.41 mm.
Summary
The results obtained during the project reveal some important aspects of the climate at Pantnagar Weather Station. As usual, the daily temperatures vary heavily, but on aggregation the mean is almost constant. The annual mean of the weekly maximum temperature is about 29.6°C with a standard deviation of just 0.575°C, while the annual mean of the weekly minimum temperature is about 16.9°C with a standard deviation of 0.344°C. Being not far from the hills, Pantnagar feels very cold in the winter season, while due to the Tarai effect the temperature in summer rises into the 40s. On the other hand, the rainfall varies enormously through the years. The average rainfall per year for Pantnagar is 1610.8 mm with a standard deviation of 2768.4 mm from one year to the next. This suggests that rainfall in this area is very uncertain from year to year. The number of rainy days per year averages about 78, with a standard deviation of 11.8 days per year. Rainfall does not have much of a correlation with the number of rainy days, which indicates how uncertain the rainfall per day is: it is highly unpredictable how much precipitation there will be on a rainy day. In the morning the average relative humidity is 84.7%, which falls significantly to 50.8% in the afternoon. This is the kind of variation the Tarai belt is known for. The Sun shines 7.37 hours per day on average, and this results in 4.85 mm of evaporation per day on average. The correlation and regression analysis of these two random variables is done in the attached SPSS document. The air is almost steady here as far as aggregates are concerned: the average wind velocity is about 4.75 kmph. $\Box$