Partial Derivatives

A partial derivative measures how a function of several variables changes as one variable is varied while the others are held constant. If \( f(x, y) \) describes a surface in three dimensions, \( \partial f / \partial x \) gives the slope of the surface in the \( x \)-direction at a fixed \( y \), and \( \partial f / \partial y \) gives the slope in the \( y \)-direction. Partial derivatives are the starting point for multivariable calculus and the language of physics, optimization, and machine learning.

Figure: a saddle-shaped surface z = f(x, y) with two tangent lines, one in the x-direction (slope ∂z/∂x) and one in the y-direction (slope ∂z/∂y).

Definition

For a function \( f(x, y) \) of two variables, the partial derivative with respect to \( x \) is:

$$ \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h} $$

This is just the ordinary derivative of \( f \) with respect to \( x \), treating \( y \) as a constant. The symbol \( \partial \) (a curly d, sometimes called ‘del’ or ‘partial’) distinguishes partial derivatives from total derivatives.
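
As a quick check of the definition, take the illustrative function \( f(x, y) = x^2 y \) (chosen here only as an example). Evaluating the limit directly gives the same answer as differentiating in \( x \) with \( y \) treated as a constant:

$$ \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{(x+h)^2 y - x^2 y}{h} = \lim_{h \to 0} \frac{(2xh + h^2)\,y}{h} = \lim_{h \to 0} (2xy + hy) = 2xy $$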

How to Compute

To find \( \partial f / \partial x \), treat every variable other than \( x \) as a constant and differentiate as usual.

Example 1. \( f(x, y) = 3x^2 y + 4y^3 \). Then \( \partial f / \partial x = 6xy \) (the second term is constant in \( x \)). And \( \partial f / \partial y = 3x^2 + 12y^2 \).

Example 2. \( f(x, y) = e^{xy} \). \( \partial f / \partial x = y e^{xy} \), \( \partial f / \partial y = x e^{xy} \).

Example 3. \( f(x, y, z) = x \sin(y) + z^2 \). \( \partial f / \partial x = \sin(y) \), \( \partial f / \partial y = x \cos(y) \), \( \partial f / \partial z = 2z \).
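
Hand-computed partials like these can be sanity-checked numerically. The sketch below is one possible way to do it, assuming a small central-difference helper; the name `partial` and the step size `h` are illustrative choices, not part of any library.

```python
import math


def partial(f, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f
    with respect to its i-th argument, evaluated at `point`."""
    up = list(point)
    down = list(point)
    up[i] += h      # nudge only the i-th variable ...
    down[i] -= h    # ... and keep every other variable fixed
    return (f(*up) - f(*down)) / (2 * h)


# Example 1: f(x, y) = 3x^2 y + 4y^3
def f1(x, y):
    return 3 * x**2 * y + 4 * y**3


# Example 3: f(x, y, z) = x sin(y) + z^2
def f3(x, y, z):
    return x * math.sin(y) + z**2


x, y, z = 1.5, -2.0, 0.5
# Each line prints the numeric estimate next to the hand-computed formula.
print(partial(f1, (x, y), 0), 6 * x * y)               # vs. 6xy
print(partial(f1, (x, y), 1), 3 * x**2 + 12 * y**2)    # vs. 3x^2 + 12y^2
print(partial(f3, (x, y, z), 1), x * math.cos(y))      # vs. x cos(y)
```

Each pair of printed values should agree to several decimal places; a large gap usually signals an algebra slip in the hand computation.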

Notation

Several notations are common:

  • \( \dfrac{\partial f}{\partial x} \) — Leibniz notation, the most explicit.
  • \( f_x \) or \( f_x(x, y) \) — subscript notation, compact.
  • \( D_x f \) or \( \partial_x f \) — used in advanced texts.

Higher-Order Partials

You can differentiate again. The second-order partials of \( f(x, y) \) are:

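$$ \frac{\partial^2 f}{\partial x^2} = f_{xx}, \quad \frac{\partial^2 f}{\partial y^2} = f_{yy}, \quad \frac{\partial^2 f}{\partial y \, \partial x} = f_{xy} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right), \quad \frac{\partial^2 f}{\partial x \, \partial y} = f_{yx} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) $$

In subscript notation the derivatives are taken in left-to-right order; in Leibniz notation they are taken right to left.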

Clairaut’s theorem: if \( f_{xy} \) and \( f_{yx} \) are both continuous near a point, they are equal there: \( f_{xy} = f_{yx} \). The order of differentiation doesn’t matter for functions this well behaved.
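
For instance, taking \( f(x, y) = 3x^2 y + 4y^3 \) from Example 1, the two mixed partials can be computed in either order and compared:

$$ \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial}{\partial y}(6xy) = 6x, \qquad \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial x}(3x^2 + 12y^2) = 6x $$

Both orders give \( 6x \), as the theorem predicts for a polynomial.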

Geometric Interpretation

The graph of \( z = f(x, y) \) is a surface in 3D. At a point \( (x_0, y_0) \), \( \partial f / \partial x \) is the slope of the tangent line you’d see if you stood on the surface and walked in the \( x \)-direction (with \( y \) fixed). Similarly, \( \partial f / \partial y \) is the slope in the \( y \)-direction. When \( f \) is differentiable at the point, these two tangent lines span the tangent plane to the surface there.
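
The tangent plane at \( (x_0, y_0) \) can be written as \( z = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \). The sketch below builds that plane for the function from Example 1 and measures how closely it tracks the surface near the point; the helper name `tangent_plane` and the base point \( (1, 1) \) are illustrative assumptions.

```python
def f(x, y):
    return 3 * x**2 * y + 4 * y**3


def f_x(x, y):
    return 6 * x * y                 # partial with respect to x, from Example 1


def f_y(x, y):
    return 3 * x**2 + 12 * y**2      # partial with respect to y, from Example 1


def tangent_plane(x0, y0):
    """Return L(x, y), the function whose graph is the tangent plane at (x0, y0)."""
    z0, a, b = f(x0, y0), f_x(x0, y0), f_y(x0, y0)
    return lambda x, y: z0 + a * (x - x0) + b * (y - y0)


plane = tangent_plane(1.0, 1.0)
for step in (0.1, 0.01, 0.001):
    x, y = 1.0 + step, 1.0 + step
    # The gap between surface and plane shrinks roughly like step**2.
    print(step, abs(f(x, y) - plane(x, y)))
```

The error falling by about a factor of 100 each time the step shrinks by 10 is the numerical signature of a first-order (linear) approximation.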

Applications

  • Physics. The wave equation, heat equation, and Schrödinger equation are partial differential equations — equations involving partial derivatives of unknown functions of space and time.
  • Thermodynamics. Quantities like temperature, pressure, and entropy depend on multiple state variables; partial derivatives like (∂U/∂V) at constant T appear throughout.
  • Economics. Marginal cost and marginal revenue are partial derivatives: they measure how an outcome changes when one input changes with all others fixed. Elasticities are partial derivatives rescaled into percentage terms.
  • Optimization. Finding maxima or minima of a function of several variables means finding points where all partials are zero (critical points), then testing with second-order partials.
  • Machine learning. The gradient — vector of partial derivatives — is what gradient descent uses to update model parameters. Backpropagation in neural networks is the chain rule for partial derivatives applied recursively.
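
The optimization and machine-learning items above can be illustrated with a minimal gradient descent sketch. The loss function, learning rate, and step count below are all hypothetical choices made for the example; a real model would have many more parameters and would usually compute its partials automatically.

```python
def loss(x, y):
    # Illustrative bowl-shaped function with its minimum at (1, -3).
    return (x - 1) ** 2 + 2 * (y + 3) ** 2


def grad(x, y):
    # Partial derivatives of `loss`, computed by hand:
    # d(loss)/dx = 2(x - 1),  d(loss)/dy = 4(y + 3)
    return 2 * (x - 1), 4 * (y + 3)


def gradient_descent(x, y, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient to reduce the loss."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y


x_min, y_min = gradient_descent(0.0, 0.0)
print(x_min, y_min, loss(x_min, y_min))
```

The iterates approach \( (1, -3) \), the critical point where both partials are zero. With a learning rate that is too large for the function, the same loop can overshoot and diverge, which is why the step size is treated as a tunable parameter.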


Frequently Asked Questions

What is a partial derivative?

The derivative of a multivariable function with respect to one of its variables, with the others held constant. For f(x, y), the partial derivative ∂f/∂x measures how f changes as x varies while y stays fixed. Notation uses ∂ (a curly d) to distinguish from ordinary derivatives.

How do you compute a partial derivative?

Treat every variable except the one you’re differentiating with respect to as a constant, then apply the usual rules of differentiation. Example: for f(x, y) = x²y + y³, take ∂f/∂x by treating y as a constant: ∂f/∂x = 2xy. Take ∂f/∂y by treating x as a constant: ∂f/∂y = x² + 3y².

What does the symbol ∂ mean?

It’s the partial derivative symbol, a curly ‘d’ (sometimes called ‘del’ or ‘partial’). It’s used to distinguish partial derivatives (∂f/∂x) from ordinary derivatives (df/dx). The plain ‘d’ is used when the function depends on a single variable, so there is no ambiguity about what is held fixed, and for total derivatives such as df/dt.

Are mixed partial derivatives always equal?

Not always, but they are equal whenever the second partials are continuous. This is Clairaut’s theorem: f_xy = f_yx for such functions, so you can differentiate in either order and get the same result. The equality can fail for pathological functions whose mixed partials exist but are discontinuous, though you rarely encounter them in physics or engineering.
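
A standard counterexample is f(x, y) = xy(x² − y²)/(x² + y²) with f(0, 0) = 0: at the origin, differentiating in x and then y gives −1, while differentiating in y and then x gives +1. The sketch below estimates both values with nested central differences; the step sizes `h` and `k` are illustrative assumptions, chosen so that the inner step is much smaller than the outer one.

```python
def f(x, y):
    if x == 0 and y == 0:
        return 0.0                      # value assigned at the origin
    return x * y * (x**2 - y**2) / (x**2 + y**2)


def f_x(x, y, h=1e-7):
    """Central-difference estimate of the partial of f with respect to x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def f_y(x, y, h=1e-7):
    """Central-difference estimate of the partial of f with respect to y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


k = 1e-3  # outer step, deliberately much larger than the inner step h

# Differentiate in x first, then in y, at the origin.
x_then_y = (f_x(0.0, k) - f_x(0.0, -k)) / (2 * k)

# Differentiate in y first, then in x, at the origin.
y_then_x = (f_y(k, 0.0) - f_y(-k, 0.0)) / (2 * k)

print(x_then_y, y_then_x)   # approximately -1 and +1
```

The two estimates land near −1 and +1 rather than agreeing, which is consistent with the mixed partials of this function being discontinuous at the origin.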

How are partial derivatives used in machine learning?

The gradient, the vector of all partial derivatives, points in the direction of steepest increase, so gradient-based optimization steps in the opposite direction. To train a neural network, you compute ∂L/∂w for each weight w of the model, then update w by a small step against that derivative to decrease the loss L. Backpropagation is the chain rule for partial derivatives applied layer by layer through the network.

What’s the difference between a partial derivative and a total derivative?

A partial derivative holds other variables fixed. A total derivative accounts for how all variables change. If x and y both depend on t and you compute df/dt for f(x, y), the total derivative uses the chain rule: df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt). Each partial, such as ∂f/∂x, appears as just one factor in one of those terms.
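
As a worked example with illustrative choices \( f(x, y) = x^2 y \), \( x = t \), and \( y = t^2 \):

$$ \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = (2xy)(1) + (x^2)(2t) = 2t^3 + 2t^3 = 4t^3 $$

Substituting first gives \( f = t^2 \cdot t^2 = t^4 \), whose ordinary derivative is also \( 4t^3 \), so the two routes agree.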