C. The general theory of relativity and galaxies

Throughout this book, we largely use Newtonian (classical) mechanics and the Newtonian theory of gravity. But for more than a century now, it has been known that Newtonian theory provides only a limited description of reality. Newtonian theory fails both in the limit of small distances and in the limit of high velocities or strong gravitational fields. The presently-prevailing theory at small distances is that of quantum mechanics and has culminated in the standard model of particle physics; quantum theory is an essential ingredient in many processes relevant to galaxies or their observation, but because the dynamics of galaxies and the Universe is dominated by gravity, we can largely get away with ignoring quantum mechanics in this book. However, we cannot so easily ignore the fact that the Newtonian theory that we use throughout this book has been superseded by the general theory of relativity, which describes gravitation and the mechanics of particles on all but the smallest scales (Einstein 1916). In all cases where the predictions from the general theory of relativity (which we will also refer to as “GR”) differ measurably from those of Newtonian gravity, GR has so far always prevailed (e.g., Dyson et al. 1920; Adams 1925; Shapiro et al. 1971; Williams et al. 2004; Gravity Collaboration et al. 2018; Gravity Collaboration et al. 2020; see Adelberger et al. 2003 and Will 2014 for reviews). In this appendix, we therefore explicitly demonstrate how the Newtonian theory that we use throughout this book can be derived from the general theory of relativity. We will also discuss two applications of GR that we use and that cannot be adequately addressed in the Newtonian limit: the gravitational bending of light that gives rise to gravitational lensing (Chapter 16) and the dynamics of the Universe as a whole that is the basis for our discussion of how galaxies form in the expanding Universe (Chapter 18).

The general theory of relativity is rightly considered one of the most satisfying theories in fundamental physics. Rather than describing the motion of particles as having their acceleration set by a mysterious force of gravity without explaining why it determines the acceleration rather than, e.g., the velocity or the jerk, GR explains the motion of both matter and light in a gravitational field as that along the shortest path in the four-dimensional spacetime curved by the presence of matter and energy. Einstein arrived at this intuitive picture through the equivalence principle that was inspired by and is a stronger version of the weak equivalence principle that we discuss in Chapter 3.1. The weak equivalence principle states that all objects fall the same way in an external gravitational field, that is, that the inertial mass that appears in Newton’s second law of motion and the gravitational mass that appears in Newton’s law of gravity are one and the same mass. Einstein generalized this principle to also require that in a region of spacetime that is small enough that tidal forces (i.e., gradients in the gravitational field) are negligible, one cannot distinguish between a uniform acceleration and an external gravitational field; said another way, one cannot locally detect the existence of a gravitational field. In a local volume of spacetime and in the absence of non-gravitational forces, bodies can therefore always be considered to be falling freely, that is, with zero acceleration. And it is then only a small leap to suggest that bodies that are not subject to any non-gravitational forces are always falling freely as they traverse spacetime and that what we think of as the force of gravity is nothing more than the freely-falling path through a curved spacetime.

Taking these simple principles and the intuitive picture of gravity as free-falling motion in a curved spacetime and turning them into a quantitative, mathematical theory of gravitation requires a large amount of mathematics not typically taught to science students. The reward in learning this mathematics is that one understands how Einstein’s field equations, which describe how matter and energy curve spacetime, are essentially the simplest equations that one can write down relating the curvature of spacetime and a generalized notion of energy and momentum. That these equations can describe gravitation to one part in a billion in the solar system (Adelberger et al. 2003) is nothing short of remarkable. Here we simply want to show how Newtonian gravity emerges in the limit of weak gravitational fields and velocities small compared to the speed of light, how light is bent by weak gravitational fields, and we want to be able to build simple cosmological models to understand galaxy formation in the expanding Universe. We will therefore take a more pragmatic approach that is focused on these applications without elucidating the full structure of the theory, but we refer readers to the many excellent textbooks on GR to learn more about the underpinnings and other applications of the theory (e.g., Carroll 2004).

C.1. Einstein’s field equations and geodesic motion

C.1.1. Mathematical background

In his special theory of relativity, Einstein (1905) introduced the notion that space and time are really part of a single four-dimensional spacetime where distances between points are measured using the four-dimensional line element in rectangular coordinates

\begin{equation}\label{eq-gr-minkowski-metric} \mathrm{d} s^2 = -c^2\mathrm{d}t^2 + \mathrm{d} x^2 + \mathrm{d} y^2 + \mathrm{d} z^2\,. \end{equation}

The line element, or metric, is used to compute distances between two points \((t_1,\vec{x}_1)\) and \((t_2,\vec{x}_2)\). In general we will abbreviate these coordinates as a four-dimensional vector \(x^\mu = (ct,\vec{x})\) and write the line element as

\begin{equation}\label{eq-gr-minkowski-metric-asmetric} \mathrm{d} s^2 = \eta_{\mu\nu}\mathrm{d}x^\mu\mathrm{d} x^\nu\,, \end{equation}

where in the special theory of relativity, \(\eta_{\mu\nu}\) is a diagonal matrix with \(-1\) as the first element and \(+1\) as the other diagonal elements. Depending on the velocity of a body, the line element can be either positive—known as a spacelike path—or negative—known as a timelike path—or zero—the path of light or any particles moving at the speed of light. Massive particles move at velocities \(v < c\) and therefore on timelike paths. Because \(\mathrm{d} s^2\) is negative in this case, it makes sense to introduce a coordinate \(\tau\), which satisfies

\begin{equation}\label{eq-gr-minkowski-propertime} c^2\mathrm{d} \tau^2 = - \mathrm{d} s^2 = c^2\mathrm{d}t^2 - \mathrm{d} x^2 - \mathrm{d} y^2 - \mathrm{d} z^2=-\eta_{\mu\nu}\mathrm{d}x^\mu\mathrm{d} x^\nu\,. \end{equation}

The coordinate \(\tau\) is the proper time and it corresponds to the time measured by a clock carried along a body’s path. The metric is used to compute distances in spacetime, e.g., for spacelike paths \(x^\mu(\lambda)\) parameterized by a parameter \(\lambda\)

\begin{equation} \Delta s = \int_{x_1^\mu=(ct_1,\vec{x}_1)}^{x_2^\mu=(ct_2,\vec{x}_2)}\mathrm{d}\lambda\sqrt{\eta_{\mu\nu}{\mathrm{d}x^\mu \over \mathrm{d} \lambda}{\mathrm{d}x^\nu\over \mathrm{d}\lambda}}\,, \end{equation}

while for timelike paths we have that the distance is the difference in proper time

\begin{equation} \Delta \tau = {1\over c}\int_{x_1^\mu=(ct_1,\vec{x}_1)}^{x_2^\mu=(ct_2,\vec{x}_2)}\mathrm{d}\lambda\sqrt{-\eta_{\mu\nu}{\mathrm{d}x^\mu \over \mathrm{d} \lambda}{\mathrm{d}x^\nu\over \mathrm{d}\lambda}}\,. \end{equation}
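
As a quick numerical illustration of the proper-time integral, the following sketch sums \(c\,\mathrm{d}\tau = \sqrt{-\eta_{\mu\nu}\mathrm{d}x^\mu\mathrm{d}x^\nu}\) from Equation \eqref{eq-gr-minkowski-propertime} over many short segments of a path. The function names `proper_time` and `uniform_motion` and the test velocity \(v = 0.6\,c\) are our own choices for illustration. For uniform motion over one second of coordinate time, the result should agree with the familiar time-dilation factor \(\sqrt{1-v^2/c^2} = 0.8\).

```python
import math

C = 299_792_458.0  # speed of light in m/s


def proper_time(path, lam_start, lam_end, n_steps=10_000):
    """Approximate the proper time along a timelike path x^mu(lambda).

    `path(lam)` must return (ct, x, y, z). The path is split into short
    straight segments and c*dtau = sqrt(-eta_{mu nu} dx^mu dx^nu) is summed.
    """
    total = 0.0
    previous = path(lam_start)
    for i in range(1, n_steps + 1):
        lam = lam_start + (lam_end - lam_start) * i / n_steps
        current = path(lam)
        d = [a - b for a, b in zip(current, previous)]
        ds2 = -d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + d[3] ** 2
        total += math.sqrt(-ds2)  # ds2 < 0 on a timelike segment
        previous = current
    return total / C  # convert c*tau to tau


def uniform_motion(lam, beta=0.6):
    """Illustrative test path: constant velocity beta*c along x, with lambda = t."""
    return (C * lam, beta * C * lam, 0.0, 0.0)


tau = proper_time(uniform_motion, 0.0, 1.0)
gamma = 1.0 / math.sqrt(1.0 - 0.6 ** 2)
print(tau, 1.0 / gamma)  # both should be close to 0.8 seconds
```

Because the integrand only involves differences \(\mathrm{d}x^\mu\), the result does not depend on how the path is parameterized by \(\lambda\).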

The minus sign in front of the time dimension in the line element means that this metric is not of the usual Euclidean form, but is instead Lorentzian; the specific metric in Equation \eqref{eq-gr-minkowski-metric} is the Minkowski metric. The special theory of relativity is built upon the assumption that the laws of nature are the same in all inertial frames, that is, frames that are moving at constant velocity with respect to each other. In particular, the speed of light in vacuum is assumed to be a universal constant. The constancy of the speed of light in different inertial frames leads to the requirement that Lorentz transformations describe the coordinate transformation between two frames \((t,x,y,z)\) and \((t',x',y',z')\) moving with respect to each other; if the primed coordinate frame moves with respect to the unprimed one with a velocity \(v\) in the \(x\) direction, the Lorentz transformation is given by

\begin{equation}\label{eq-gr-lorentz} \begin{pmatrix} ct'\\x'\\y'\\z'\end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0& 0& 0& 1\end{pmatrix} \begin{pmatrix} ct\\x\\y\\z\end{pmatrix}\,, \end{equation}

where \(\beta = v/c\) and \(\gamma = 1/\sqrt{1-\beta^2}\). The transformation for a general velocity vector \(\vec{v}\) can be obtained by first rotating space such that the velocity is along \(x\), applying the transformation above, and rotating space back to the original direction. Denoting the position four vector as \(X'^{\mu} = (ct',x',y',z')\) and \(X^{\mu} = (ct,x,y,z)\) where \(\mu\) indexes the vector, we can write the general transformation equation as

\begin{equation}\label{eq-gr-sr-vectransform} X'^{\mu} = \Lambda^{\mu}_{\phantom{\mu}\nu}\,X^{\nu}\,, \end{equation}

where \(\Lambda^{\mu}_{\phantom{\mu}\nu}\) is a Lorentz transformation matrix (e.g., that from Equation \eqref{eq-gr-lorentz} for a velocity along the \(x\) direction). As is standard in discussions of the special and general theories of relativity, we have used the Einstein summation convention here, which states that repeated indices are summed over; we refer to this process as a contraction. Another example of a vector is the four velocity \(U^\mu = \gamma\,(c,v_x,v_y,v_z)\). Equation \eqref{eq-gr-sr-vectransform} states how vectors transform under Lorentz transformations.
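
To make the transformation law concrete, the sketch below builds the boost matrix of Equation \eqref{eq-gr-lorentz}, applies it to an arbitrary four vector with the index sum written out explicitly, and checks that \(\eta_{\mu\nu}X^\mu X^\nu\) is unchanged. The helper names (`boost_x`, `apply`, `interval`) and the numbers are illustrative assumptions; only the Python standard library is used.

```python
import math


def boost_x(beta):
    """4x4 Lorentz boost matrix Lambda^mu_nu for velocity beta = v/c along x."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [
        [gamma, -beta * gamma, 0.0, 0.0],
        [-beta * gamma, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply(matrix, vector):
    """X'^mu = Lambda^mu_nu X^nu, with the sum over nu written out."""
    return [sum(matrix[mu][nu] * vector[nu] for nu in range(4)) for mu in range(4)]


def interval(vector):
    """eta_{mu nu} X^mu X^nu for eta = diag(-1, 1, 1, 1)."""
    return -vector[0] ** 2 + vector[1] ** 2 + vector[2] ** 2 + vector[3] ** 2


X = [2.0, 1.0, -0.5, 3.0]  # (ct, x, y, z) in arbitrary length units
X_prime = apply(boost_x(0.6), X)
X_back = apply(boost_x(-0.6), X_prime)  # boosting with -v undoes the boost

print(interval(X), interval(X_prime))  # the two values should agree
print(X_back)  # should reproduce X up to rounding
```

The last line illustrates that the boost with velocity \(-v\) is the matrix inverse of the boost with velocity \(v\).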

A related concept to vectors is that of one forms \(X_{\mu}\), which have their index on the bottom rather than the top (index placement is important in GR). An example of a one-form is the gradient of a scalar function \(f\), which transforms as

\begin{equation} \begin{pmatrix} {1\over c} {\partial f \over \partial t'} & {\partial f \over \partial x'} & {\partial f \over \partial y'} & {\partial f \over \partial z'}\end{pmatrix} = \begin{pmatrix} {1\over c} {\partial f \over \partial t} & {\partial f \over \partial x} & {\partial f \over \partial y} & {\partial f \over \partial z}\end{pmatrix}\begin{pmatrix} \gamma & +\beta\gamma & 0 & 0 \\ +\beta\gamma & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0& 0& 0& 1\end{pmatrix}\,, \end{equation}

which we can write as

\begin{equation}\label{eq-gr-sr-gradtransform} \partial'_\mu = \Lambda_{\mu}^{\phantom{\mu}\nu}\,\partial_\nu\,, \end{equation}

where \(\partial_\mu = (\partial / c\partial t,\partial/\partial x,\partial/\partial y,\partial/\partial z)\) and similar for \(\partial'_\mu\). The matrix \(\Lambda_{\mu}^{\phantom{\mu}\nu}\) here is the inverse of the matrix \(\Lambda^{\mu}_{\phantom{\mu}\nu}\) in Equation \eqref{eq-gr-sr-vectransform} and one forms are therefore mathematical objects that transform according to the inverse transformation. Generalizing both vectors and one-forms, tensors are objects \(T^{\mu_1 \mu_2\ldots\mu_n}_{\phantom{\mu_1 \mu_2\ldots\mu_n}\nu_1\nu_2\ldots\nu_m}\) that transform as

\begin{equation}\label{eq-gr-sr-tensortransform} T'^{\mu_1 \mu_2\ldots\mu_n}_{\phantom{\mu_1 \mu_2\ldots\mu_n}\nu_1\nu_2\ldots\nu_m} = \Lambda^{\mu_1}_{\phantom{\mu_1}\alpha_1}\Lambda^{\mu_2}_{\phantom{\mu_2}\alpha_2}\ldots\Lambda^{\mu_n}_{\phantom{\mu_n}\alpha_n}\Lambda_{\nu_1}^{\phantom{\nu_1}\beta_1}\Lambda_{\nu_2}^{\phantom{\nu_2}\beta_2}\ldots\Lambda_{\nu_m}^{\phantom{\nu_m}\beta_m}\,T^{\alpha_1 \alpha_2\ldots\alpha_n}_{\phantom{\alpha_1 \alpha_2\ldots\alpha_n}\beta_1\beta_2\ldots\beta_m}\,. \end{equation}

One important property of a tensor is the number of upper and lower indices that it has, because this determines the transformation equation; the tensor above is said to be an \((n,m)\) tensor, with \(n\) upper indices and \(m\) lower indices. Scalars are quantities that are invariant under coordinate transformations; thus, they are \((0,0)\) tensors.

The first important tensor in special relativity is the metric tensor \(\eta_{\mu\nu}\), which gives the line element in Equation \eqref{eq-gr-minkowski-metric-asmetric}. Because \(\mathrm{d} s^2\) is a scalar and \(\mathrm{d}x^\mu\) and \(\mathrm{d} x^\nu\) are both vectors, it is clear that \(\eta_{\mu\nu}\) is a \((0,2)\) tensor. The Minkowski metric \(\eta_{\mu\nu}\) is simply

\begin{equation} \eta_{\mu\nu} = \mathrm{diag}(-1,1,1,1)\,, \end{equation}

for \(\mathrm{d}x^\nu = (c\mathrm{d}t,\mathrm{d}x,\mathrm{d}y,\mathrm{d}z)\), where \(\mathrm{diag}(\cdot,\cdot,\cdot,\cdot)\) denotes a diagonal matrix. The inverse metric \(\eta^{\mu\nu}\) is defined such that \(\eta_{\mu\lambda}\,\eta^{\lambda\nu} = \delta_{\mu}^{\nu}\), where \(\delta_{\mu}^{\nu}\) is the Kronecker delta. Because the metric is a matrix in the language of linear algebra, the elements of the inverse metric are those of the inverse matrix. Because the Minkowski metric is diagonal with entries \(\pm 1\), the elements of the inverse Minkowski metric are equal to those of the Minkowski metric itself. In the general theory of relativity, the metric is generalized, but it is always the case that both the metric and the inverse metric are symmetric. One useful property of the metric is that it allows one to raise or lower indices on tensors to create different tensors. For example, to turn the four velocity \(U^\mu\) into a one form \(U_\mu\), do

\begin{equation} U_\mu = \eta_{\mu\nu}\,U^\nu\,. \end{equation}
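
A small numerical sketch of index lowering: we build the four velocity \(U^\mu = \gamma\,(c,v_x,v_y,v_z)\) for an arbitrary test velocity, lower its index with \(\eta_{\mu\nu}\), and evaluate the contraction \(U_\mu U^\mu\), which should equal \(-c^2\) for any velocity \(v < c\). The names `four_velocity` and `lower_index`, the test velocity, and the choice of units with \(c = 1\) are assumptions made for illustration.

```python
import math

C = 1.0  # units in which c = 1, chosen here only for readability

ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def four_velocity(vx, vy, vz):
    """U^mu = gamma * (c, vx, vy, vz) for a three-velocity with |v| < c."""
    gamma = 1.0 / math.sqrt(1.0 - (vx ** 2 + vy ** 2 + vz ** 2) / C ** 2)
    return [gamma * C, gamma * vx, gamma * vy, gamma * vz]


def lower_index(vector):
    """U_mu = eta_{mu nu} U^nu, summing over the repeated index nu."""
    return [sum(ETA[mu][nu] * vector[nu] for nu in range(4)) for mu in range(4)]


U_up = four_velocity(0.3, 0.4, 0.0)
U_down = lower_index(U_up)
U_squared = sum(lo * up for lo, up in zip(U_down, U_up))  # U_mu U^mu

print(U_up)
print(U_down)     # same as U_up, but with the sign of the time component flipped
print(U_squared)  # should be close to -C**2
```

Lowering the index with the Minkowski metric only flips the sign of the time component; with a general metric \(g_{\mu\nu}\) all components can mix.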

With this mathematical infrastructure of tensors, we can then re-state special relativity’s postulate that the laws of motion are the same in every inertial reference frame as: any law of physics needs to be written in tensorial form where tensors transform under Lorentz transformations according to Equation \eqref{eq-gr-sr-tensortransform}. For example, a law of physics

\begin{equation} {\partial F^{\alpha\beta} \over \partial x^\alpha} = \mu_0\,J^\beta\,, \end{equation}

where \(\mu_0\) is some constant, \(F^{\alpha\beta}\) is a \((2,0)\) tensor, and \(J^\beta\) is a vector, would transform to a primed frame as

\begin{align} {\partial F'^{\alpha\beta} \over \partial x'^\alpha} & = \Lambda^{\alpha}_{\phantom{\alpha}\delta} \Lambda^{\beta}_{\phantom{\beta}\epsilon} \Lambda_{\alpha}^{\phantom{\alpha}\zeta} {\partial F^{\delta\epsilon} \over \partial x^\zeta}\\ & = \Lambda^{\beta}_{\phantom{\beta}\epsilon} \delta^{\zeta}_{\delta} {\partial F^{\delta\epsilon} \over \partial x^\zeta}\\ & = \Lambda^{\beta}_{\phantom{\beta}\epsilon} {\partial F^{\delta\epsilon} \over \partial x^\delta}\\ & = \Lambda^{\beta}_{\phantom{\beta}\epsilon}\,\mu_0\,J^\epsilon\\ & = \mu_0\,J'^\beta\,, \end{align}

and therefore has the same form. This only serves as an example, but it is in fact one of Maxwell’s equations describing electromagnetism.

In the general theory of relativity, invariance of the laws of physics under Lorentz transformations is generalized to require invariance under any coordinate transformation of spacetime. Spacetime itself is also generalized: rather than computing distances with the Minkowski line element of Equation \eqref{eq-gr-minkowski-metric}, we compute them using a general metric \(g_{\mu\nu}\) as

\begin{equation}\label{eq-gr-lineelement} \mathrm{d} s^2 = g_{\mu\nu}\mathrm{d}x^\mu\mathrm{d} x^\nu\,, \end{equation}

where \(g_{\mu\nu}\) is a symmetric \((0,2)\) tensor and the metric is always dimensionless. Such more general spaces are mathematically known as manifolds and one manifold that is likely familiar is that of the surface of a sphere, which is a two-dimensional analog of the types of manifolds that appear in GR; on the surface of a unit sphere, the line element is given by

\begin{equation} \mathrm{d} s^2 = \mathrm{d}\theta^2 + \sin^2 \theta\,\mathrm{d}\phi^2\,, \end{equation}

where \((\theta,\phi)\) are the polar and azimuthal angle in the spherical coordinates of Appendix A.1. Much like the surface of a sphere is curved, spacetime itself can be curved. While we think of a sphere as being curved because we can see it as curved embedded in the three-dimensional space of regular life, an important property of manifolds is that the curvature is in fact an intrinsic property of the manifold that can be determined without any reference to an embedding space. A standard way to make this palatable is to point out that one can figure out the curvature of a sphere from determining the sum of the angles of a triangle drawn on its surface, which is larger than \(180^\circ\) by an amount that depends on the curvature; this measurement can manifestly be performed without reference to the embedding space.
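
The triangle argument can be checked numerically. The sketch below (with illustrative helper names) computes the interior angles of a triangle on the unit sphere whose sides are great-circle arcs, using only the directions of those arcs at each vertex. For the triangle covering one octant of the sphere, each angle is \(90^\circ\), so the sum is \(270^\circ\); the excess over \(180^\circ\), expressed in radians, equals the area of the triangle on the unit sphere (a standard result of spherical geometry known as Girard's theorem).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tangent_towards(a, b):
    """Unit tangent at vertex a of the great-circle arc from a to b (unit vectors)."""
    t = [bi - dot(a, b) * ai for ai, bi in zip(a, b)]  # remove the part along a
    length = math.sqrt(dot(t, t))
    return [ti / length for ti in t]


def interior_angle(a, b, c):
    """Angle at vertex a between the arcs a->b and a->c, in radians."""
    cos_angle = dot(tangent_towards(a, b), tangent_towards(a, c))
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp against rounding


def angle_sum(a, b, c):
    return interior_angle(a, b, c) + interior_angle(b, c, a) + interior_angle(c, a, b)


# Vertices of the triangle covering one octant of the unit sphere
a, b, c = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
total = angle_sum(a, b, c)
excess = total - math.pi
octant_area = 4.0 * math.pi / 8.0  # one eighth of the unit sphere's area

print(math.degrees(total))  # close to 270 degrees
print(excess, octant_area)  # the two values should agree
```

Shrinking the triangle makes the excess go to zero, which is the sense in which a small enough patch of a curved surface looks flat.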

Properly defining vectors, one-forms, and general tensors on manifolds is mathematically difficult and requires the introduction of tangent spaces and cotangent spaces that correspond to the space of vectors and one forms, respectively, at a given point. We’ll try to get by in the following without introducing this level of mathematical rigor. Assuming that vectors, one forms, and tensors can be defined on the spacetime manifold, they transform under a coordinate transformation \(x^\mu \rightarrow x'^\mu\) as, for example, for a vector \(X^\mu\)

\begin{equation}\label{eq-gr-gr-vectransform} X'^{\mu} = {\partial x'^\mu \over \partial x^\nu}\,X^{\nu}\,, \end{equation}

that is, simply using the Jacobian of the coordinate transformation. One forms \(X_\mu\) transform as

\begin{equation}\label{eq-gr-gr-oneformtransform} X'_{\mu} = {\partial x^\nu \over \partial x'^\mu}\,X_{\nu}\,, \end{equation}

that is, using the inverse Jacobian. General tensors transform as

\begin{equation}\label{eq-gr-gr-tensortransform} T'^{\mu_1 \mu_2\ldots\mu_n}_{\phantom{\mu_1 \mu_2\ldots\mu_n}\nu_1\nu_2\ldots\nu_m} = {\partial x'^{\mu_1} \over \partial x^{\alpha_1}}{\partial x'^{\mu_2} \over \partial x^{\alpha_2}}\ldots{\partial x'^{\mu_n} \over \partial x^{\alpha_n}} {\partial x^{\beta_1} \over \partial x'^{\nu_1}}{\partial x^{\beta_2} \over \partial x'^{\nu_2}}\ldots{\partial x^{\beta_m} \over \partial x'^{\nu_m}}\,T^{\alpha_1 \alpha_2\ldots\alpha_n}_{\phantom{\alpha_1 \alpha_2\ldots\alpha_n}\beta_1\beta_2\ldots\beta_m}\,. \end{equation}

It is clear that the tensor-transformation equation from the special theory of relativity—Equation \eqref{eq-gr-sr-tensortransform}—is a special case of this for coordinate transformations given by the Lorentz transformation.
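
As a simple example of Equation \eqref{eq-gr-gr-tensortransform} for a \((0,2)\) tensor, the sketch below transforms the flat two-dimensional Euclidean metric from Cartesian coordinates \((x,y)\) to polar coordinates \((r,\phi)\) using \(g'_{\mu\nu} = {\partial x^\alpha \over \partial x'^\mu}{\partial x^\beta \over \partial x'^\nu}\,g_{\alpha\beta}\). The two-dimensional setting, the function name `polar_metric`, and the evaluation point are our own choices for illustration; the expected result is the familiar \(\mathrm{d}s^2 = \mathrm{d}r^2 + r^2\mathrm{d}\phi^2\).

```python
import math


def polar_metric(r, phi):
    """Transform g = diag(1, 1) in (x, y) to polar coordinates (r, phi).

    Uses x = r cos(phi), y = r sin(phi) and the (0,2) tensor transformation
    g'_{mu nu} = (dx^a / dx'^mu) (dx^b / dx'^nu) g_{ab}.
    """
    # jac[a][mu] = partial x^a / partial x'^mu, with x^a = (x, y), x'^mu = (r, phi)
    jac = [
        [math.cos(phi), -r * math.sin(phi)],
        [math.sin(phi), r * math.cos(phi)],
    ]
    g = [[1.0, 0.0], [0.0, 1.0]]
    return [
        [
            sum(jac[a][mu] * jac[b][nu] * g[a][b] for a in range(2) for b in range(2))
            for nu in range(2)
        ]
        for mu in range(2)
    ]


g_polar = polar_metric(2.0, 0.7)
print(g_polar)  # close to [[1, 0], [0, r**2]] with r = 2
```

The metric components change under the coordinate transformation, but the distance \(\mathrm{d}s^2\) between two nearby points does not.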

Laws of physics are generally written as differential equations, relating the rate of change with respect to time or space of different quantities. For example, Newton’s second law relates the acceleration, or the rate of change of the velocity with respect to time, to the force. The force itself is (minus) the gradient of a potential for a conservative force, or the rate of change of the potential with respect to the different spatial coordinates. To write down tensorial laws of physics on general manifolds, we therefore need a notion of a derivative on the manifold. Because the partial derivative operator \(\partial_\mu\) looks like it might be a one form, one might be tempted to think that the partial derivative can just act as a general derivative on any manifold, but in fact, the partial derivative operator acting on a tensor does not transform as a tensor, that is, according to Equation \eqref{eq-gr-gr-tensortransform}. However, it turns out that we can fix the partial derivative by adding a linear correction to it in a way that preserves desirable properties of the derivative, such as linearity and satisfying the product rule; this leads to the covariant derivative. For example, for vectors the covariant derivative \(\nabla_\mu\) acting on a vector \(X^\nu\) gives the \((1,1)\) tensor \(\nabla_\mu X^\nu\) through

\begin{equation} \nabla_\mu X^\nu = \partial_\mu X^\nu + \Gamma^{\nu}_{\mu\lambda}\,X^\lambda\,. \end{equation}

We want this covariant derivative of a vector to transform as a \((1,1)\) tensor, so we need

\begin{equation} \nabla'_\mu X'^\nu = {\partial x^\delta \over \partial x'^\mu}{\partial x'^\nu \over \partial x^\epsilon}\nabla_\delta X^\epsilon\,. \end{equation}

By requiring that this is the case, we can determine the transformation properties of \(\Gamma^{\nu}_{\mu\lambda}\). Working out \(\nabla'_\mu X'^\nu\) with the transformation properties we know, we have that

\begin{align} \nabla'_\mu X'^\nu & = \partial'_\mu X'^\nu + \Gamma'^{\nu}_{\mu\lambda}\,X'^\lambda\\ & = {\partial \over \partial x'^\mu} X'^\nu + \Gamma'^{\nu}_{\mu\lambda}\,X'^\lambda\\ & = {\partial x^\delta \over \partial x'^\mu}{\partial \over \partial x^\delta}\left( {\partial x'^\nu \over \partial x^\epsilon}X^\epsilon\right) + \Gamma'^{\nu}_{\mu\lambda}\,{\partial x'^\lambda \over \partial x^\epsilon}X^\epsilon\\ & = {\partial x^\delta \over \partial x'^\mu}{\partial x'^\nu \over \partial x^\epsilon}{\partial \over \partial x^\delta}X^\epsilon+{\partial x^\delta \over \partial x'^\mu}X^\epsilon{\partial \over \partial x^\delta}\left( {\partial x'^\nu \over \partial x^\epsilon}\right) + \Gamma'^{\nu}_{\mu\lambda}\,{\partial x'^\lambda \over \partial x^\epsilon}X^\epsilon\,, \end{align}

and we want this to equal

\begin{equation} {\partial x^\delta \over \partial x'^\mu}{\partial x'^\nu \over \partial x^\epsilon}\nabla_\delta X^\epsilon = {\partial x^\delta \over \partial x'^\mu}{\partial x'^\nu \over \partial x^\epsilon}{\partial \over \partial x^\delta} X^\epsilon+ {\partial x^\delta \over \partial x'^\mu}{\partial x'^\nu \over \partial x^\epsilon} \Gamma^{\epsilon}_{\delta\lambda} X^\lambda\,. \end{equation}

For this equality to hold for any vector \(X^\mu\), it has to be the case that

\begin{equation}\label{eq-gr-connection-coeffs-transform} \Gamma'^{\nu}_{\mu\lambda} = {\partial x^\delta \over \partial x'^\mu}{\partial x^\epsilon \over \partial x'^\lambda}{\partial x'^\nu\over \partial x^\zeta}\Gamma^\zeta_{\delta\epsilon} - {\partial x^\delta \over \partial x'^\mu}{\partial x^\epsilon \over \partial x'^\lambda}{\partial^2 x'^\nu \over \partial x^\delta \partial x^\epsilon}\,. \end{equation}

The \(\Gamma^{\nu}_{\mu\lambda}\) coefficients are known as connection coefficients. Note that \(\Gamma^{\nu}_{\mu\lambda}\) is not a tensor itself; indeed, it cannot be, because we constructed it to correct the non-tensorial nature of the partial derivative in such a way that the covariant derivative is tensorial itself. Furthermore requiring that the covariant derivative reduces to partial derivatives when applied to scalars and that it commutes with contractions of indices—that is, e.g., \(\nabla_\mu(T^\lambda_{\phantom{\lambda}\lambda\nu}) = (\nabla T)^{\phantom{\mu}\lambda}_{\mu\phantom{\lambda}\lambda\nu}\)—allows one to show that the covariant derivative of a one form \(X_\mu\) is given by

\begin{equation} \nabla_\mu X_\nu = \partial_\mu X_\nu - \Gamma^{\lambda}_{\mu\nu}\,X_\lambda\,, \end{equation}

where \(\Gamma^{\lambda}_{\mu\nu}\) is the same set of connection coefficients that appears in the covariant derivative of a vector. Because tensors transform as combinations of \(n\) vectors and \(m\) one-forms, the covariant derivative of a general tensor is

\begin{align} \nabla_{\sigma} T^{\mu_1 \mu_2\ldots\mu_n}_{\phantom{\mu_1 \mu_2\ldots\mu_n}\nu_1\nu_2\ldots\nu_m} = & \ \partial_{\sigma} T^{\mu_1 \mu_2\ldots\mu_n}_{\phantom{\mu_1 \mu_2\ldots\mu_n}\nu_1\nu_2\ldots\nu_m} \\ & + \Gamma^{\mu_1}_{\sigma\lambda}T^{\lambda \mu_2\ldots\mu_n}_{\phantom{\lambda \mu_2\ldots\mu_n}\nu_1\nu_2\ldots\nu_m} + \Gamma^{\mu_2}_{\sigma\lambda}T^{\mu_1\lambda\ldots\mu_n}_{\phantom{\mu_1 \lambda\ldots\mu_n}\nu_1\nu_2\ldots\nu_m} + \ldots + \Gamma^{\mu_n}_{\sigma\lambda}T^{\mu_1 \mu_2\ldots\lambda}_{\phantom{\mu_1 \mu_2\ldots\lambda}\nu_1\nu_2\ldots\nu_m}\nonumber \\ & - \Gamma^{\lambda}_{\sigma\nu_1}T^{\mu_1 \mu_2\ldots\mu_n}_{\phantom{\mu_1 \mu_2\ldots\mu_n}\lambda\nu_2\ldots\nu_m} - \Gamma^{\lambda}_{\sigma\nu_2}T^{\mu_1 \mu_2\ldots\mu_n}_{\phantom{\mu_1 \mu_2\ldots\mu_n}\nu_1\lambda\ldots\nu_m} - \ldots - \Gamma^{\lambda}_{\sigma\nu_m}T^{\mu_1 \mu_2\ldots\mu_n}_{\phantom{\mu_1 \mu_2\ldots\mu_n}\nu_1\nu_2\ldots\lambda} \,.\nonumber \end{align}

One can define many different covariant derivatives by choosing different connection coefficients \(\Gamma^{\lambda}_{\mu\nu}\), because these all define a good covariant derivative as long as they satisfy Equation \eqref{eq-gr-connection-coeffs-transform}. But if we additionally require that the connection coefficients are symmetric under the swapping of their lower indices, \(\Gamma^{\lambda}_{\mu\nu} = \Gamma^{\lambda}_{\nu\mu}\), and that the metric is covariantly constant, \(\nabla_\sigma g_{\mu\nu} = 0\), then there is a single covariant derivative that has these properties and it is this covariant derivative that is used in the general theory of relativity for reasons that will become clear in the next paragraph. From the two defining properties of these connection coefficients, one can derive an explicit expression for them

\begin{equation}\label{eq-gr-christoffel-metric} \Gamma^{\lambda}_{\mu\nu} = {1\over 2}g^{\lambda \epsilon}\,\left(\partial_\mu g_{\nu\epsilon} + \partial_\nu g_{\epsilon\mu}-\partial_\epsilon g_{\mu\nu}\right)\,. \end{equation}

This connection is known as the Christoffel connection.
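
To see Equation \eqref{eq-gr-christoffel-metric} in action, the following sketch evaluates it numerically for the unit sphere with metric \(\mathrm{diag}(1,\sin^2\theta)\) in coordinates \((\theta,\phi)\). The derivatives of the metric are approximated with central finite differences, which is a convenience of this example rather than part of the formalism; all function names are illustrative. The only non-zero coefficients should be \(\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta\) and \(\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cos\theta/\sin\theta\).

```python
import math


def metric(x):
    """Metric g_{mu nu} of the unit sphere in coordinates x = (theta, phi)."""
    theta, _ = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]


def d_metric(x, h=1e-5):
    """dg[s][m][n] = partial_s g_{mn}, by central finite differences."""
    dg = []
    for s in range(2):
        x_plus, x_minus = list(x), list(x)
        x_plus[s] += h
        x_minus[s] -= h
        g_plus, g_minus = metric(x_plus), metric(x_minus)
        dg.append(
            [[(g_plus[m][n] - g_minus[m][n]) / (2 * h) for n in range(2)] for m in range(2)]
        )
    return dg


def christoffel(x):
    """Gamma[l][m][n] = Gamma^l_{mn} = g^{le} (d_m g_{ne} + d_n g_{em} - d_e g_{mn}) / 2."""
    g_inv = inverse_2x2(metric(x))
    dg = d_metric(x)
    return [
        [
            [
                0.5 * sum(
                    g_inv[l][e] * (dg[m][n][e] + dg[n][e][m] - dg[e][m][n])
                    for e in range(2)
                )
                for n in range(2)
            ]
            for m in range(2)
        ]
        for l in range(2)
    ]


theta = 1.0
Gamma = christoffel((theta, 0.3))
print(Gamma[0][1][1], -math.sin(theta) * math.cos(theta))  # Gamma^theta_{phi phi}
print(Gamma[1][0][1], math.cos(theta) / math.sin(theta))   # Gamma^phi_{theta phi}
```

The same functions work for any two-dimensional metric if `metric` is replaced, as long as the metric is invertible at the point of evaluation.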

C.1.2. Generalizing Newton’s second law: the geodesic equation

We will require a bit more mathematical formalism to understand how the presence of matter and energy curves spacetime, but assuming for now that spacetime is curved by the presence of matter and energy, we can derive the GR generalization of Newton’s second law of motion, Equation (4.2). Because in GR, gravity is not a force, but rather the motion under the influence of gravity is that of motion in the spacetime curved by matter and energy, the relevant form that we need to generalize is the force-free form of Newton’s second law (assuming that no other forces are relevant). For a path \(x^\mu(\lambda)\) parameterized by a so-called affine parameter \(\lambda\), we can write Newton’s second law without force as

\begin{equation}\label{eq-gr-newton2} {\mathrm{d}^2 x^\mu \over \mathrm{d} \lambda^2} = 0\,, \end{equation}

because in the Newtonian framework, the zeroth component of this is

\begin{equation} {\mathrm{d}^2 x^0 \over \mathrm{d} \lambda^2} = c{\mathrm{d}^2 t \over \mathrm{d} \lambda^2}= 0\,, \end{equation}

with solution

\begin{equation} t = A\,\lambda+B\,, \end{equation}

and \(\lambda\) is therefore nothing more than an arbitrarily re-scaled time; choosing \(A=1\), \(B=0\), we have that \(\lambda = t\) and the spatial part of Equation \eqref{eq-gr-newton2} is

\begin{equation} {\mathrm{d}^2 \vec{x} \over \mathrm{d} t^2} = 0\,, \end{equation}

which is equivalent to Newton’s second law in the absence of a force for constant mass. To generalize Equation \eqref{eq-gr-newton2} to curved spacetime, we therefore have to write it in tensorial form. To do this, we use the principle of minimal coupling, which we can informally state as saying that the generalization of any law of physics from the Minkowski spacetime of the special theory of relativity to the general curved spacetimes of GR has to be as simple as possible. In particular, we should not introduce the tensor that we will use to describe curvature below or its contractions explicitly. Because we have not introduced the curvature tensor yet, we are in no danger of doing that here! Thus, we simply write Equation \eqref{eq-gr-newton2} in a covariant way under Lorentz transformations and generalize this form to curved spacetimes. A Lorentz-invariant way of writing Equation \eqref{eq-gr-newton2} is as

\begin{equation}\label{eq-gr-newton3} {\mathrm{d}^2 x^\mu \over \mathrm{d} \lambda^2} = {\mathrm{d} x^\nu \over \mathrm{d} \lambda}\partial_\nu {\mathrm{d} x^\mu \over \mathrm{d} \lambda} = 0\,, \end{equation}

because \({\mathrm{d} x^\mu \over \mathrm{d} \lambda}\) is a vector and in the absence of curvature, partial differentiation of a vector gives a tensor. In curved spacetime, we know from the discussion above that partial differentiation of a vector does not produce a tensor, but we also know that we can correct this by switching to the covariant derivative. The obvious curved-spacetime generalization of Equation \eqref{eq-gr-newton3} is therefore

\begin{equation} {\mathrm{d} x^\nu \over \mathrm{d} \lambda}\nabla_\nu {\mathrm{d} x^\mu \over \mathrm{d} \lambda} = 0\,. \end{equation}

Writing out the covariant derivative explicitly and slightly simplifying the resulting expression, we have

\begin{equation}\label{eq-gr-geodesic-1} {\mathrm{d}^2 x^\mu \over \mathrm{d} \lambda^2} +\Gamma^{\mu}_{\nu\epsilon}{\mathrm{d} x^\nu \over \mathrm{d} \lambda} {\mathrm{d} x^\epsilon \over \mathrm{d} \lambda} = 0\,. \end{equation}

This is the geodesic equation and it is the GR generalization of Newton’s second law. Any connection \(\Gamma^{\mu}_{\nu\epsilon}\) on a manifold defines a different geodesic equation, but only the Christoffel connection from Equation \eqref{eq-gr-christoffel-metric} has the property that the resulting path is the shortest spacetime path between two points \(x_1^\mu\) and \(x_2^\mu\). Proving this explicitly is a somewhat tedious and uninsightful exercise using standard calculus of variations and we will skip this proof here. But the important thing to remember is that the geodesic equation using the Christoffel connection implies that bodies move along paths that are the shortest path in spacetime in the absence of non-gravitational forces.
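
To get a feel for the geodesic equation, we can integrate Equation \eqref{eq-gr-geodesic-1} numerically on the unit sphere, where the geodesics are known to be great circles. The sketch below assumes the standard Christoffel symbols of the unit sphere, \(\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta\) and \(\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cos\theta/\sin\theta\), and uses a hand-written fourth-order Runge-Kutta integrator; the function names, the initial conditions, and the step count are illustrative choices. Starting on the equator with unit speed, the path should follow the great circle \((\cos\lambda,\,0.8\sin\lambda,\,-0.6\sin\lambda)\) in the embedding space.

```python
import math


def geodesic_rhs(state):
    """Right-hand side of the geodesic equation on the unit sphere.

    state = (theta, phi, dtheta/dlambda, dphi/dlambda).
    """
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2           # -Gamma^theta_{phi phi} dphi^2
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi  # -2 Gamma^phi_{theta phi} dtheta dphi
    return [dtheta, dphi, ddtheta, ddphi]


def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h in lambda."""
    def shifted(k, factor):
        return [s + factor * ki for s, ki in zip(state, k)]

    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, h / 2))
    k3 = geodesic_rhs(shifted(k2, h / 2))
    k4 = geodesic_rhs(shifted(k3, h))
    return [
        s + h / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def integrate(state, lam_end, n_steps=1000):
    h = lam_end / n_steps
    for _ in range(n_steps):
        state = rk4_step(state, h)
    return state


def embed(theta, phi):
    """Position on the unit sphere in the three-dimensional embedding space."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


# Start on the equator at phi = 0 with unit speed: dtheta = 0.6, dphi = 0.8
final = integrate([math.pi / 2, 0.0, 0.6, 0.8], 1.0)
position = embed(final[0], final[1])
speed_squared = final[2] ** 2 + math.sin(final[0]) ** 2 * final[3] ** 2

print(position)
print((math.cos(1.0), 0.8 * math.sin(1.0), -0.6 * math.sin(1.0)))  # great circle
print(speed_squared)  # g_{mu nu} dx^mu/dlambda dx^nu/dlambda stays close to 1
```

The conserved value of \(g_{\mu\nu}{\mathrm{d}x^\mu \over \mathrm{d}\lambda}{\mathrm{d}x^\nu \over \mathrm{d}\lambda}\) along the path is a useful check on any numerical geodesic integration. The chosen initial conditions keep the path away from the poles, where the \((\theta,\phi)\) coordinates are singular.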

C.1.3. Curvature

Our remaining task for constructing a fully tensorial theory of gravity in general spacetimes is to generalize Newton’s law of gravity, or equivalently, the Poisson equation (3.2). Because in GR, the force of gravity is replaced by the curvature of spacetime, which is curved by matter and energy, we need to describe the curvature and the matter and energy content and relate them in a tensorial way. Let’s start with curvature. As we saw above, partial differentiation in flat spacetimes is replaced by covariant derivatives in curved spacetimes. In flat space, the order of partial differentiation does not matter, that is, for example \(\partial_\mu \partial_\nu X^\epsilon = \partial_\nu \partial_\mu X^\epsilon\), but this does not hold for covariant derivatives. For the specific case of the Christoffel connection, we can write

\begin{equation} \nabla_\mu \nabla_\nu X^\delta - \nabla_\nu \nabla_\mu X^\delta = R^\delta_{\phantom{\delta}\epsilon\mu\nu}X^\epsilon\,, \end{equation}

(for connections that are not symmetric under the swapping of their lower indices, there is an additional term proportional to the antisymmetric part of the connection). Because the right-hand side is zero and, thus, \(R^\delta_{\phantom{\delta}\epsilon\mu\nu}=0\) in flat spacetime where the covariant derivative reduces to a partial derivative, it is clear that the tensor \(R^\delta_{\phantom{\delta}\epsilon\mu\nu}\) is a measure of the curvature. This tensor is known as the Riemann tensor and from its definition above, we can obtain an explicit expression for it in terms of the connection

\begin{equation}\label{eq-gr-riemann-tensor} R^\delta_{\phantom{\delta}\epsilon\mu\nu} = \partial_\mu\Gamma^\delta_{\nu\epsilon}-\partial_\nu\Gamma^\delta_{\mu\epsilon}+\Gamma^\delta_{\mu\lambda}\Gamma^\lambda_{\nu\epsilon}-\Gamma^\delta_{\nu\lambda}\Gamma^\lambda_{\mu\epsilon}\,. \end{equation}

Combined with Equation \eqref{eq-gr-christoffel-metric} for the Christoffel connection, it is clear that the curvature is solely determined by the metric \(g_{\mu\nu}\). We need two further tensors that are computed from the Riemann tensor. The Ricci tensor \(R_{\mu\nu}\) is the following contraction of the Riemann tensor

\begin{equation}\label{eq-gr-ricci-tensor} R_{\mu\nu} = R^\delta_{\phantom{\delta}\mu\delta\nu}\,, \end{equation}

and we can define the Ricci scalar \(R\) from it

\begin{equation}\label{eq-gr-ricci-scalar} R = g^{\mu\nu}R_{\mu\nu}\,. \end{equation}

The Ricci tensor and Ricci scalar satisfy the following useful relation

\begin{equation}\label{eq-gr-deinsteintensor} g^{\mu\delta}\nabla_\delta\left(R_{\mu\nu}-{1\over 2}R\,g_{\mu\nu}\right)=0\,. \end{equation}

The tensor in the parentheses here is known as the Einstein tensor for reasons that will soon become clear. As we discussed above, the metric is dimensionless and the Riemann tensor, the Ricci tensor, and the Ricci scalar have dimensions of 1/length\(^2\), because they are obtained from second derivatives with respect to \(x^\mu\) (which has units of length). The inverse of the square root of these tensors therefore gives a typical length on which the space is curved, with a large length corresponding to a small curvature. As an example, the Ricci scalar for a sphere with radius \(r\) is \(2/r^2\), which makes sense: a sphere with a large radius (say the Earth) has a smaller curvature than one with a small radius (say a tennis ball you are holding here on Earth).
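As a sketch of how the definitions above lead to this value, consider the standard round metric on a sphere of fixed radius \(r\) in angular coordinates \((\theta,\phi)\). These coordinates and the intermediate results are supplied here for illustration and are not part of the derivation above. Because the coordinates are dimensionless angles in this case, the factor \(r^2\) sits in the metric rather than in the coordinates:

\begin{align} \mathrm{d}s^2 & = r^2\,\mathrm{d}\theta^2 + r^2\sin^2\theta\,\mathrm{d}\phi^2\,,\nonumber\\ \Gamma^\theta_{\phi\phi} & = -\sin\theta\cos\theta\,,\quad \Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta\,,\nonumber\\ R^\theta_{\phantom{\theta}\phi\theta\phi} & = \partial_\theta\Gamma^\theta_{\phi\phi} - \Gamma^\theta_{\phi\phi}\Gamma^\phi_{\theta\phi} = \sin^2\theta\,,\nonumber\\ R_{\theta\theta} & = 1\,,\quad R_{\phi\phi} = \sin^2\theta\,,\nonumber\\ R & = g^{\theta\theta}R_{\theta\theta} + g^{\phi\phi}R_{\phi\phi} = {1\over r^2}+{1\over r^2} = {2\over r^2}\,.\nonumber \end{align}

All other connection coefficients vanish, and the Riemann component in the third line follows from Equation \eqref{eq-gr-riemann-tensor} with only the non-zero terms kept.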

C.1.4. Matter and energy¶

The final ingredient that we need is to express the matter and energy that curve spacetime. So far we haven’t said much about quantities like momentum, energy, etc., instead mainly focusing on the geometric structure of spacetime. But now we will discuss how we can deal with quantities like momentum and energy in a tensorial fashion. As we already discussed above, the position four vector is given by \(X^{\mu} = (ct,x,y,z)\) and the four velocity is \(U^\mu = \gamma\,(c,v_x,v_y,v_z)\); the four velocity is in fact the derivative of the position four vector with respect to the proper time

\begin{equation} U^\mu = {\mathrm{d} X^\mu \over \mathrm{d}\tau}\,. \end{equation}

(to show that this is equivalent to \(U^\mu = \gamma\,(c,v_x,v_y,v_z)\), use the fact that \(\mathrm{d}t = \gamma \mathrm{d}\tau\)). The four momentum \(p^\mu\) is then given by

\begin{equation} p^\mu = m\,U^\mu\,. \end{equation}

In the limit of small velocities, \(p^0 = \gamma\,mc \approx mc+ mv^2/[2c] = E/c\), where \(E = mc^2 + mv^2/2\) is the rest energy \(mc^2\) plus the kinetic energy. Similarly, \(p^i\) (where the latin index \(i\) indicates that this only indexes the spatial directions) is \(p^i = \gamma\,mv^i \approx mv^i\), the momentum in classical mechanics. The four momentum therefore generalizes the concepts of energy and momentum.
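The size of the corrections to these low-velocity limits can be checked numerically. The following is a minimal Python sketch; the function names and the example velocity are illustrative choices and not part of the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_factor(v):
    """gamma = 1 / sqrt(1 - v^2/c^2) for a speed v in m/s."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def energy_from_p0(m, v):
    """Energy E = c p^0 = gamma m c^2 from the time component of p^mu."""
    return lorentz_factor(v) * m * C**2


def low_velocity_energy(m, v):
    """Rest energy plus classical kinetic energy, m c^2 + m v^2 / 2."""
    return m * C**2 + 0.5 * m * v**2


def fractional_energy_error(m, v):
    """Relative difference between the exact and low-velocity energies."""
    exact = energy_from_p0(m, v)
    return abs(exact - low_velocity_energy(m, v)) / exact


def fractional_momentum_error(v):
    """Relative difference between gamma m v and m v (independent of m)."""
    gamma = lorentz_factor(v)
    return (gamma - 1.0) / gamma


if __name__ == "__main__":
    v_galaxy = 3.0e5  # 300 km/s, a typical galactic velocity
    print(fractional_energy_error(1.0, v_galaxy))
    print(fractional_momentum_error(v_galaxy))
```

For galactic velocities, the energy approximation is accurate to order \((v/c)^4\) and the momentum approximation to order \((v/c)^2\), which is why the classical expressions are adequate there.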

While \(X^\mu\) and \(p^\mu\) describe a single particle, to describe an extended system with macroscopic properties such as density and pressure, we need a more general concept, much like we describe a gas with density, pressure, etc. or the gravitational effect of a massive body in Newtonian gravity with its density. In this context, such a system is always described as a fluid, of which the important characteristics for how it couples to gravity are its energy and momentum densities, pressure, and anisotropic stress (but it can also have entropy etc.). Counting the number of properties, you see that there are 10 and they can therefore be described by a symmetric \(4\times 4\) matrix (four diagonal elements and six off-diagonal elements). By constructing this matrix such that it is a tensor \(T^{\mu\nu}\) we’ll have what we need to write down Einstein’s equations for coupling matter and energy to gravity.

The tensor \(T^{\mu\nu}\) is the stress-energy tensor or the energy-momentum tensor. It is technically defined as the flux of four-momentum \(p^\mu\) across a surface at constant \(x^\nu\). In a fluid’s rest frame, the first component, \(T^{00}\), is therefore the flux of energy in time, which is the density multiplied by \(c^2\), and the \(T^{i0} = T^{0i}\) components are similarly the momentum density. The spatial components \(T^{ij}\) describe the stresses in the fluid, with the diagonal terms representing pressure in different directions and the off-diagonal terms describing shears. In all of the applications that we consider here, we can approximate the relevant fluids as a perfect fluid, which is a fluid that can be fully characterized by the rest-frame mass density \(\rho\) (equivalently, the rest-frame energy density \(\rho c^2\)) and the rest-frame isotropic pressure \(p\) (\(p\) without an index here represents the pressure rather than the momentum). Thus, in the rest frame of a perfect fluid, the stress-energy tensor is given by

\begin{equation} T^{\mu\nu} = \begin{pmatrix} \rho c^2 & 0 & 0 & 0 \\ 0 & p & 0 & 0\\ 0 & 0 & p & 0\\ 0& 0& 0& p\end{pmatrix}\,. \end{equation}

To figure out how we can write this in a tensorial manner, we can for example transform it to a moving reference frame by applying a Lorentz transformation. For the frame moving at speed \(v\) in the \(x\) direction from Equation \eqref{eq-gr-lorentz}, we get

\begin{align} T^{\mu\nu} & = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0& 0& 0& 1\end{pmatrix}\begin{pmatrix} \rho c^2 & 0 & 0 & 0 \\ 0 & p & 0 & 0\\ 0 & 0 & p & 0\\ 0& 0& 0& p\end{pmatrix}\begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0& 0& 0& 1\end{pmatrix}\\ & = \begin{pmatrix} \gamma^2 \rho c^2 + \beta^2\gamma^2 p & -\beta\gamma^2 \rho c^2-\beta\gamma^2 p & 0 & 0 \\ -\beta\gamma^2 \rho c^2-\beta\gamma^2 p& \gamma^2\beta^2 \rho c^2 + \gamma^2 p & 0 & 0\\ 0 & 0 & p & 0\\ 0& 0& 0& p\end{pmatrix}\\ & = \gamma^2 \begin{pmatrix} \rho c^2 + p & -\beta [\rho c^2+ p] & 0 & 0 \\ -\beta [\rho c^2+ p]& \beta^2 [\rho c^2 + p] & 0 & 0\\ 0 & 0 & 0 & 0\\ 0& 0& 0& 0\end{pmatrix} +\begin{pmatrix} -p & 0 & 0 & 0 \\ 0& p & 0 & 0\\ 0 & 0 & p & 0\\ 0& 0& 0& p\end{pmatrix} \end{align}

We can compare this to the tensor product of the four-velocity \(U^\mu\), which in this frame is \(U^\mu = \gamma(c,-v,0,0) = \gamma c(1,-\beta,0,0)\) (because the fluid was stationary in the original frame, its velocity in the moving frame is the opposite of the frame’s velocity)

\begin{equation} U^\mu U^\nu = \gamma^2 c^2 \begin{pmatrix} 1 \\ -\beta \\ 0 \\ 0\end{pmatrix} \begin{pmatrix} 1 & -\beta & 0 & 0\end{pmatrix} = \gamma^2 c^2 \begin{pmatrix} 1 & -\beta & 0 & 0 \\ -\beta & \beta^2 & 0 & 0 \\ 0&0&0&0\\0&0&0&0\end{pmatrix}\,, \end{equation}

it is clear that we can write

\begin{equation} T^{\mu\nu} = \left(\rho + {p\over c^2}\right)\,U^\mu U^\nu + p\,\eta^{\mu\nu}\,, \end{equation}

where we use the Minkowski metric because we are considering only the Lorentz transformations from the special theory of relativity. This is now manifestly a tensor and it turns out to be the correct one, as one can check by applying different Lorentz transformations. The obvious generalization to curved spacetime of this is

\begin{equation}\label{eq-gr-stressenergy-perfectfluid} T^{\mu\nu} = \left(\rho + {p\over c^2}\right)\,U^\mu U^\nu + p\,g^{\mu\nu}\,, \end{equation}

that is, we simply replace the Minkowski metric with the general metric. Non-relativistic matter in galaxies and the Universe has a pressure that is much smaller than its energy density, and for such pressureless dust, the stress-energy tensor is simply

\begin{equation}\label{eq-gr-stressenergy-dust} T^{\mu\nu} = \rho\,U^\mu U^\nu\,. \end{equation}
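As a numerical cross-check of the special-relativistic perfect-fluid expression, the following minimal Python sketch boosts the rest-frame tensor \(\mathrm{diag}(\rho c^2,p,p,p)\) along \(x\) and compares the result with \((\rho + p/c^2)\,U^\mu U^\nu + p\,\eta^{\mu\nu}\). The function names, units, and test values are illustrative assumptions and are not taken from the text.

```python
import math

C = 299_792.458  # speed of light in km/s (the unit choice is illustrative)


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def boost_x(beta):
    """Lorentz transformation to a frame moving with speed beta*c along x."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -beta * gamma, 0.0, 0.0],
            [-beta * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def boosted_tensor(rho, p, beta):
    """Transform the rest-frame tensor diag(rho c^2, p, p, p) to the moving frame."""
    rest = [[rho * C**2, 0.0, 0.0, 0.0],
            [0.0, p, 0.0, 0.0],
            [0.0, 0.0, p, 0.0],
            [0.0, 0.0, 0.0, p]]
    lam = boost_x(beta)
    # The boost matrix is symmetric, so its transpose equals itself.
    return matmul(matmul(lam, rest), lam)


def covariant_tensor(rho, p, beta):
    """Evaluate (rho + p/c^2) U^mu U^nu + p eta^{mu nu} in the moving frame."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    # The fluid moves in the opposite direction to the frame's velocity.
    u = [gamma * C, -gamma * beta * C, 0.0, 0.0]
    eta = [[-1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
    return [[(rho + p / C**2) * u[m] * u[n] + p * eta[m][n]
             for n in range(4)] for m in range(4)]


def max_relative_difference(rho, p, beta):
    """Largest component-wise difference, in units of rho c^2."""
    a = boosted_tensor(rho, p, beta)
    b = covariant_tensor(rho, p, beta)
    scale = rho * C**2
    return max(abs(a[m][n] - b[m][n])
               for m in range(4) for n in range(4)) / scale


if __name__ == "__main__":
    # A fluid with p = 0.3 rho c^2 seen from a frame moving at 0.6 c.
    print(max_relative_difference(1.0, 0.3 * C**2, 0.6))
```

The two constructions agree to floating-point precision for any \(|\beta| < 1\). Setting `p = 0` gives the same check for the dust tensor.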

Conservation of energy and momentum when a system is invariant under time or space translations is an important aspect of classical mechanics. In the special theory of relativity, this can be expressed as

\begin{equation} \partial_\mu T^{\mu\nu} = 0\,. \end{equation}

In curved spacetime, this is generalized to

\begin{equation} \nabla_\mu T^{\mu\nu} = 0\,, \end{equation}

which from the discussion above should come as no surprise. Generally, the stress-energy tensor is derived from whatever theory describes the matter or energy that one is considering. For example, the stress-energy tensor of electromagnetic fields is determined from electromagnetism. We won’t say more about that here, because for our purposes we can get by with the informal discussion of the stress-energy tensor presented here.

C.1.5. Einstein’s field equations¶

We are now finally at the point where we can state how macroscopic systems of matter and energy curve spacetime. As discussed in the previous few paragraphs, macroscopic systems are described by their stress-energy tensor \(T^{\mu\nu}\), while the curvature of spacetime is expressed using the Riemann tensor \(R^\delta_{\phantom{\delta}\epsilon\mu\nu}\) and its contractions in the form of the Ricci tensor \(R_{\mu\nu}\) and the Ricci scalar \(R\). Because we have to relate curvature to a \((2,0)\) tensor, the obvious choice is the Ricci tensor, but stating that \(R_{\mu\nu} \propto T_{\mu\nu}\) would violate the conservation of the stress-energy tensor (because \(g^{\mu\delta}\nabla_\delta R_{\mu\nu} = g^{\mu\delta}\nabla_\delta R\,g_{\mu\nu}/2\); see Equation \ref{eq-gr-deinsteintensor}). Because of Equation \eqref{eq-gr-deinsteintensor}, however, if the Einstein tensor is proportional to the stress-energy tensor, then the stress-energy tensor is conserved. The resulting equations are the Einstein field equations

\begin{equation}\label{eq-gr-fieldeq} R_{\mu\nu}-{1\over 2}R\,g_{\mu\nu} = {8\pi G \over c^4}\,T_{\mu\nu}\,. \end{equation}

Because each side is a symmetric \((0,2)\) tensor, this represents ten equations, which is why the plural ‘field equations’ is used. Because the metric is covariantly constant, the following more general version still leads to \(\nabla_\mu T^{\mu\nu} = 0\)

\begin{equation}\label{eq-gr-fieldeq-general} R_{\mu\nu}-{1\over 2}R\,g_{\mu\nu} + \Lambda g_{\mu\nu}= {8\pi G \over c^4}\,T_{\mu\nu}\,. \end{equation}

The constant \(\Lambda\) here is the cosmological constant. Einstein first introduced this term and then removed it, but one interpretation of the discovery that the expansion of the present-day Universe is accelerating (Riess et al. 1998; Perlmutter et al. 1999) is that \(\Lambda \neq 0\) and that Equation \eqref{eq-gr-fieldeq-general} therefore describes our Universe; all present observations are consistent with this interpretation. Because in the following we work with relatively simple stress-energy tensors, a useful alternative formulation of the field equations can be obtained by contracting Equation \eqref{eq-gr-fieldeq-general} with the metric, solving for the Ricci scalar, and plugging this into Equation \eqref{eq-gr-fieldeq-general}; this gives

\begin{equation}\label{eq-gr-fieldeq-general-alt} R_{\mu\nu}- \Lambda g_{\mu\nu}= {8\pi G \over c^4}\,\left(T_{\mu\nu} -{1 \over 2}T\,g_{\mu\nu}\right)\,, \end{equation}

where \(T = g_{\mu\nu} T^{\mu\nu} = g^{\mu\nu} T_{\mu\nu}\).
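The intermediate steps can be spelled out as follows, using \(g^{\mu\nu}g_{\mu\nu} = 4\) in four spacetime dimensions. Contracting Equation \eqref{eq-gr-fieldeq-general} with \(g^{\mu\nu}\) and solving for \(R\) gives

\begin{align} R - 2R + 4\Lambda & = {8\pi G \over c^4}\,T\,,\nonumber\\ R & = 4\Lambda - {8\pi G \over c^4}\,T\,,\nonumber \end{align}

and substituting this back, the terms proportional to the metric combine as

\begin{equation} -{1\over 2}R\,g_{\mu\nu} + \Lambda g_{\mu\nu} = -\Lambda g_{\mu\nu} + {4\pi G \over c^4}\,T\,g_{\mu\nu}\,,\nonumber \end{equation}

which, after moving the \(T\,g_{\mu\nu}\) term to the right-hand side, is the alternative form given above.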

Our description of how the general theory of relativity changes Newtonian mechanics and gravitation is now complete. Newton’s second law is replaced by the geodesic Equation \eqref{eq-gr-geodesic-1} and we repeat it here to have it handy for what follows

\begin{equation}\label{eq-gr-geodesic} {\mathrm{d}^2 x^\mu \over \mathrm{d} \lambda^2} +\Gamma^{\mu}_{\nu\epsilon}{\mathrm{d} x^\nu \over \mathrm{d} \lambda} {\mathrm{d} x^\epsilon \over \mathrm{d} \lambda} = 0\,, \end{equation}

and Newton’s law of gravitation is replaced by the Einstein field equations of Equation \eqref{eq-gr-fieldeq-general}. In the next section, we demonstrate that these equations reduce to Newton’s laws in the limit of small velocities and weak gravitational fields.

C.2. The Newtonian limit¶

In this and the next section, we consider the general theory of relativity in the limit of weak gravitational fields. In this section we focus on non-relativistic matter and demonstrate that Einstein’s field equations and the geodesic equation reduce to the Poisson equation and Newton’s second law, respectively. In the next section, we consider the trajectories of light in weak gravitational fields, which will allow us to discuss the way gravitational fields bend light and give rise to the phenomenon of gravitational lensing. To allow for these two applications, we start with a general discussion of GR in the limit of weak gravitational fields. For completeness, we’ll carry around the cosmological constant \(\Lambda\) for a while below, but the measured value of \(\Lambda\) is much smaller than the curvature induced by galactic gravitational fields and \(\Lambda\) is only relevant on cosmological scales (\(\Lambda \approx 1/\mathrm{Gpc}^2\)).

C.2.1. Einstein’s field equations for weak gravitational fields¶

What we mean by a weak gravitational field in GR is that the metric \(g_{\mu\nu}\) is approximately the Minkowskian metric \(\eta_{\mu\nu}\) of flat space, with a first order correction \(h_{\mu\nu}\) whose elements satisfy \(|h_{\mu\nu}| \ll 1\)

\begin{equation} g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}\,, \end{equation}

and because \(|h_{\mu\nu}| \ll 1\), we can raise and lower indices simply by using the Minkowski metric, for example, \(h^{\mu\nu} = \eta^{\mu\delta}\eta^{\nu\epsilon} h_{\delta\epsilon}\) to first order in the small quantities \(h_{\mu\nu}\). We then need to work out Einstein’s field equations to first order in \(h_{\mu\nu}\) and therefore need to compute the Einstein tensor from Equation \eqref{eq-gr-deinsteintensor} to this order. The first thing we need to do is to compute the Christoffel connection from Equation \eqref{eq-gr-christoffel-metric}. Because \(\partial_\lambda \eta_{\mu\nu} = 0\), the Christoffel connection in this case is simply

\begin{equation}\label{eq-gr-christoffel-metric-weak} \Gamma^{\lambda}_{\mu\nu} = {1\over 2}\eta^{\lambda \epsilon}\,\left(\partial_\mu h_{\nu\epsilon} + \partial_\nu h_{\epsilon\mu}-\partial_\epsilon h_{\mu\nu}\right)+\mathcal{O}(h_{\mu\nu}^2)\,. \end{equation}

As expected, the Christoffel connection is a first-order quantity, because it is zero for flat space. We then use the Christoffel connection to compute the Riemann tensor from Equation \eqref{eq-gr-riemann-tensor}, which is similarly a first-order quantity. Because the Christoffel connection is a first-order quantity, the last two terms in the Riemann tensor of Equation \eqref{eq-gr-riemann-tensor}, which involve products of two connections, vanish to first order and we have

\begin{align} R^\delta_{\phantom{\delta}\epsilon\mu\nu} & = \partial_\mu\Gamma^\delta_{\nu\epsilon}-\partial_\nu\Gamma^\delta_{\mu\epsilon}+\mathcal{O}(h_{\mu\nu}^2)\\ & = {1\over 2}\eta^{\delta \lambda}\,\left(\partial_\mu\partial_\epsilon h_{\lambda\nu}-\partial_\mu\partial_\lambda h_{\nu\epsilon} - \partial_\nu\partial_\epsilon h_{\lambda\mu}+\partial_\nu\partial_\lambda h_{\mu\epsilon}\right)+\mathcal{O}(h_{\mu\nu}^2)\,.\label{eq-gr-riemann-tensor-weak} \end{align}

From this we find the Ricci tensor using Equation \eqref{eq-gr-ricci-tensor}

\begin{equation}\label{eq-gr-ricci-tensor-weak} R_{\mu\nu} = {1\over 2}\,\left(\partial_\mu\partial_\lambda h^{\lambda}_{\phantom{\lambda}\nu}+\partial_\nu\partial_\lambda h^{\lambda}_{\phantom{\lambda}\mu}- \partial_\mu\partial_\nu h_{\lambda}^{\phantom{\lambda}\lambda}-\eta^{\delta \lambda}\partial_\delta\partial_\lambda h_{\mu\nu} \right)+\mathcal{O}(h_{\mu\nu}^2)\,. \end{equation}

Finally, the Ricci scalar from Equation \eqref{eq-gr-ricci-scalar}

\begin{equation}\label{eq-gr-ricci-scalar-weak} R =\partial_\mu\partial_\nu h^{\mu\nu}- \eta^{\mu\nu}\partial_\mu\partial_\nu h_{\lambda}^{\phantom{\lambda}\lambda}+\mathcal{O}(h_{\mu\nu}^2)\,. \end{equation}

In deriving these, we have occasionally relabeled repeated indices to use \((\mu,\nu)\) as much as possible and we’ve made ample use of the symmetry of \(\eta_{\mu\nu}\) and \(h_{\mu\nu}\) and of the exchangeability of partial derivatives. Combining the Ricci tensor and scalar to form the Einstein tensor \(R_{\mu\nu}-{1\over 2}R\,g_{\mu\nu}\) and using the fact that the cosmological constant in our Universe itself is small such that \(\Lambda h_{\mu\nu} \ll \Lambda \eta_{\mu\nu}\), we finally find the linearized field equations

\begin{align}\label{eq-gr-fieldeqs-linear} {1\over 2}\,& \left(\partial_\mu\partial_\lambda h^{\lambda}_{\phantom{\lambda}\nu} +\partial_\nu\partial_\lambda h^{\lambda}_{\phantom{\lambda}\mu}- \partial_\mu\partial_\nu h_{\lambda}^{\phantom{\lambda}\lambda}-\eta^{\delta \lambda}\partial_\delta\partial_\lambda h_{\mu\nu} \right. \\ &\qquad \qquad \left. -\eta_{\mu\nu}\partial_\delta\partial_\epsilon h^{\delta\epsilon}+\eta_{\mu\nu}\eta^{\delta\epsilon}\partial_\delta\partial_\epsilon h_{\lambda}^{\phantom{\lambda}\lambda}\right)+ \Lambda \eta_{\mu\nu}+\mathcal{O}(h_{\mu\nu}^2) = {8\pi G \over c^4}\,T_{\mu\nu}\,.\nonumber \end{align}

Before continuing, we will note without proof that even though there are ten equations relating \(h_{\mu\nu}\) to the stress-energy tensor, four of these equations are redundant, because the equations are invariant under the transformation \(h_{\mu\nu} \rightarrow h_{\mu\nu} + \varepsilon\partial_\mu\xi_\nu + \varepsilon\partial_\nu\xi_\mu\) for any four vector \(\xi^\mu\), where \(\varepsilon \ll 1\). This is a form of gauge invariance that is similar to the fact that we can add a constant to the gravitational potential in Newtonian gravity and obtain the same forces and densities or to the gauge invariance in electromagnetism. We can use this gauge invariance to give \(h_{\mu\nu}\) convenient properties, by fixing the vector \(\xi^\mu\) (so-called “choosing a gauge”). Einstein’s general field equations \eqref{eq-gr-fieldeq-general} in fact have a similar gauge invariance that follows from \(g^{\mu\delta}\nabla_\delta\left(R_{\mu\nu}-{1\over 2}R\,g_{\mu\nu}+ \Lambda g_{\mu\nu}\right)=0\).
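One way to see where this invariance comes from, sketched here for the curvature terms only, is to insert the shift \(\varepsilon\,(\partial_\mu\xi_\nu + \partial_\nu\xi_\mu)\) into the linearized Riemann tensor of Equation \eqref{eq-gr-riemann-tensor-weak}. The expression in parentheses in that equation then changes by \(\varepsilon\) times

\begin{align} & \partial_\mu\partial_\epsilon\left(\partial_\lambda\xi_\nu+\partial_\nu\xi_\lambda\right) -\partial_\mu\partial_\lambda\left(\partial_\nu\xi_\epsilon+\partial_\epsilon\xi_\nu\right)\nonumber\\ & \quad -\partial_\nu\partial_\epsilon\left(\partial_\lambda\xi_\mu+\partial_\mu\xi_\lambda\right) +\partial_\nu\partial_\lambda\left(\partial_\mu\xi_\epsilon+\partial_\epsilon\xi_\mu\right) = 0\,,\nonumber \end{align}

because partial derivatives commute and the eight terms cancel in pairs. The Ricci tensor and Ricci scalar are contractions of the Riemann tensor and are therefore unchanged as well, while the remaining change in the \(\Lambda\) term is of the same order as the \(\Lambda h_{\mu\nu}\) contribution that was already neglected.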

At this stage, it is standard to work in the rest frame in which the bulk velocity of the gravitating matter and energy is zero and write the components of \(h_{\mu\nu}\) suggestively as

\begin{align}\label{eq-gr-weak-metric-1} h_{00} & = -2{\Phi \over c^2}\,,\\ h_{0i} & = w_i\,,\\ h_{ij} & = 2s_{ij} -2{\Psi \over c^2}\,\delta_{ij}\,,\label{eq-gr-weak-metric-2} \end{align}

where \(\delta_{ij}\) is the Kronecker delta. These correspond to the scalar, vector, and tensor parts of the transformation of the metric under spatial rotations of the rest frame. The tensor part is further split up into a trace-free part \(s_{ij}\) and half the trace of the spatial tensor \(\Psi/c^2 = -\sum_{ij} \delta^{ij}h_{ij}/6\) (as usual, latin indexing indicates that the indices only run over the spatial dimensions; we add the explicit summation to be very clear that we intend to sum over latin indices here). The line element is then given by

\begin{equation} \mathrm{d} s^2 = -\left(1+2{\Phi \over c^2}\right)\,c^2\mathrm{d}t^2 + 2c\sum_i w_i\,\mathrm{d}x^i\mathrm{d}t +\sum_{ij} \left[\left(1-2{\Psi \over c^2}\right)\delta_{ij} + 2s_{ij}\right]\mathrm{d}x^i\mathrm{d}x^j\,. \end{equation}

It turns out that we can use the gauge invariance from the previous paragraph to fix

\begin{align}\label{eq-gauge-poisson-1} \sum_i \partial_i w_i & = 0\,,\\ \sum_i \partial_i s_{ij} & =0\,,\label{eq-gauge-poisson-2} \end{align}

by fixing the gauge by requiring the four vector \(\xi^\mu\) to be the vector that leads to these constraints (this can be straightforwardly shown by demonstrating that these constraints lead to a well-defined \(\xi^\mu\) using the gauge condition). This gauge is known as the Poisson gauge (Bertschinger 1995). Plugging Equations \eqref{eq-gr-weak-metric-1}–\eqref{eq-gr-weak-metric-2} into the linearized field equations \eqref{eq-gr-fieldeqs-linear}, using the gauge conditions from Equations \eqref{eq-gauge-poisson-1}–\eqref{eq-gauge-poisson-2}, and dropping the explicit \(\mathcal{O}(h_{\mu\nu}^2)\) term that expresses the weakness of the gravitational field, we find that the field equations become

\begin{align}\label{eq-gr-fieldeqs-linear-components-1} {2\over c^2}\nabla^2\Psi- \Lambda &= {8\pi G \over c^4}\,T_{00}\,,\\ -{1\over 2} \nabla^2 w_j + {2\over c^2}\partial_0 \partial_j \Psi &= {8\pi G \over c^4}\,T_{0j}\,,\label{eq-gr-fieldeqs-linear-components-2}\\ {1\over c^2}\left(\delta_{ij}\nabla^2 - \partial_i\partial_j\right)\left(\Phi-\Psi\right) -\partial_0\partial_i w_j \qquad \quad &\nonumber\\- \partial_0 \partial_j w_i +{2\over c^2}\delta_{ij}\partial^2_0 \Psi - \eta^{\delta \lambda}\partial_\delta\partial_\lambda s_{ij} + \Lambda\,\delta_{ij} &= {8\pi G \over c^4}\,T_{ij}\,.\label{eq-gr-fieldeqs-linear-components-3} \end{align}

These are the general linearized field equations and they, for example, describe the behavior of gravitational waves. But we will use them here only to derive the Newtonian limit.

In the Newtonian limit, the source of gravity is non-relativistic matter (e.g., planets, gas, stars, galaxies, clusters of galaxies). In the limit of velocities that are small compared to the speed of light, the pressure and stress in a fluid are always much less than the energy density and the relevant stress-energy tensor is therefore that of pressureless dust from Equation \eqref{eq-gr-stressenergy-dust}. Because we are working in the frame where the bulk velocity of the fluid is zero, the stress-energy tensor is simply

\begin{equation} T^{\mu\nu} = \rho\,U^\mu U^\nu = \begin{pmatrix} \rho c^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0& 0& 0& 0\end{pmatrix}\,, \end{equation}

and \(T_{\mu\nu}\) is the same matrix (up to corrections of \(\mathcal{O}[h_{\mu\nu}]\)). Furthermore, for non-relativistic matter, derivatives \(\partial_0\) of the gravitational field with respect to \(ct\) are a fraction of order \(v/c \ll 1\) of the spatial derivatives, and we can therefore set all time derivatives to zero in the linearized field equations. The \(0j\) Equation \eqref{eq-gr-fieldeqs-linear-components-2} then becomes

\begin{equation} \nabla^2 w_j = 0\,, \end{equation}

which combined with the Poisson gauge condition from Equation \eqref{eq-gauge-poisson-1} implies that

\begin{equation} w_i = 0\,, \end{equation}

for a well-defined solution at infinity. Similarly, the trace of the \(ij\) Equation \eqref{eq-gr-fieldeqs-linear-components-3} gives

\begin{equation}\label{eq-gr-lensing-poisson-lambda} 2\nabla^2\left(\Phi-\Psi\right) = -3\Lambda c^2\,, \end{equation}

and the \(00\) Equation \eqref{eq-gr-fieldeqs-linear-components-1} becomes

\begin{equation} 2\nabla^2\Psi- \Lambda c^2 = 8\pi G\rho\,. \end{equation}

Adding them together, these last two equations imply

\begin{equation}\label{eq-gr-poisson-notquite} \nabla^2\Phi = 4\pi G\rho-\Lambda c^2\,, \end{equation}

which looks a lot like the Poisson equation (3.2), except for the extra term involving \(\Lambda\)! We will soon see that this is indeed the equivalent of the Poisson equation, that is, that the potential \(\Phi\) that satisfies this equation is also the potential whose gradient sets the acceleration of a non-relativistic particle.

As discussed above, the measured value of the cosmological constant \(\Lambda \approx 1/\mathrm{Gpc}^2\) is such that it is only relevant on cosmological scales. Expressed as a fraction of the critical density, the cosmological constant is \(\Omega_\Lambda \approx 0.7\) today and smaller in the past. As we discuss in Chapter 18.2.2, virialized structures such as galaxy halos and clusters of galaxies form when the mean density in a region of the Universe exceeds about 100 times the critical density at \(z \approx 0\). Thus, at a galaxy or cluster’s virial radius, the equivalent density associated with \(\Lambda c^2\) is \(\lesssim 1\,\%\) of the dark-matter halo’s mean density. If the halo forms earlier, the equivalent density is even smaller, because halos form at higher overdensity with respect to the critical density (\(\approx 200\)) and the critical density itself is higher in the past. Within galaxies, the density of matter is orders of magnitude larger near their centers than the mean density within their virial radius (e.g., the density profile of the Milky Way in Chapter 2.2.3). Therefore, on the scale of virialized halos and everything inside of them, \(\Lambda c^2/(4\pi G) \ll \rho\) and we can actually ignore the effect of the cosmological constant and solve for the remaining unknowns in the metric. For \(\Lambda =0\), we immediately find from Equation \eqref{eq-gr-lensing-poisson-lambda} that

\begin{equation} \nabla^2\left(\Phi-\Psi\right) = 0 \end{equation}

or

\begin{equation}\label{eq-gr-lensing-equals-newtonian} \Phi = \Psi\,, \end{equation}

for a well-defined solution at infinity (that is, they both approach zero at infinity). We then also have from the \(ij\) Equation \eqref{eq-gr-fieldeqs-linear-components-3} that \(\nabla^2 s_{ij} = 0\), which together with the Poisson gauge condition implies \(s_{ij}=0\) for a well-defined solution at infinity. Finally, Equation \eqref{eq-gr-poisson-notquite} in the limit \(\Lambda =0\) becomes the actual equivalent of the Poisson equation

\begin{equation}\label{eq-gr-poisson} \nabla^2\Phi = 4\pi G\rho\,. \end{equation}
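To make this equation concrete, the following minimal Python sketch checks it by finite differences for a Plummer sphere, for which \(\Phi = -GM/\sqrt{r^2+b^2}\) and \(\rho = 3Mb^2/[4\pi\,(r^2+b^2)^{5/2}]\). The choice of the Plummer model, the mass and scale length, the step size, and all function names are illustrative assumptions that do not come from the text.

```python
import math

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun


def plummer_potential(x, y, z, mass, b):
    """Plummer potential Phi = -G M / sqrt(r^2 + b^2)."""
    return -G * mass / math.sqrt(x * x + y * y + z * z + b * b)


def plummer_density(x, y, z, mass, b):
    """Plummer density rho = 3 M b^2 / [4 pi (r^2 + b^2)^(5/2)]."""
    r2 = x * x + y * y + z * z
    return 3.0 * mass * b * b / (4.0 * math.pi * (r2 + b * b) ** 2.5)


def laplacian(f, x, y, z, h):
    """Second-order central-difference Laplacian of f at (x, y, z)."""
    return (f(x + h, y, z) + f(x - h, y, z)
            + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h)
            - 6.0 * f(x, y, z)) / h**2


def poisson_residual(x, y, z, mass=1e11, b=3.0, h=1e-2):
    """Fractional mismatch between del^2 Phi and 4 pi G rho at one point."""
    def phi(px, py, pz):
        return plummer_potential(px, py, pz, mass, b)

    lhs = laplacian(phi, x, y, z, h)
    rhs = 4.0 * math.pi * G * plummer_density(x, y, z, mass, b)
    return (lhs - rhs) / rhs


if __name__ == "__main__":
    # Positions in kpc; the residual should be small compared to one.
    print(poisson_residual(5.0, 1.0, 0.5))
```

The residual is limited by the truncation error of the finite-difference stencil, which scales as \((h/b)^2\), so it shrinks as the step `h` is reduced until round-off takes over.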

The metric for weak gravitational fields produced by non-relativistic matter in its rest frame is therefore

\begin{equation}\label{eq-gr-metric-weak-solution} \mathrm{d} s^2 = -\left(1+2{\Phi \over c^2}\right)\,c^2\mathrm{d}t^2 +\left(1-2{\Phi \over c^2}\right)\,\left(\mathrm{d}x^2+\mathrm{d}y^2+\mathrm{d}z^2\right)\,. \end{equation}

Note that below, to be able to clearly distinguish between the dynamics of non-relativistic and relativistic particles, we will use the metric

\begin{equation}\label{eq-gr-metric-weak-solution-lensing-explicit} \mathrm{d} s^2 = -\left(1+2{\Phi \over c^2}\right)\,c^2\mathrm{d}t^2 +\left(1-2{\Psi \over c^2}\right)\,\left(\mathrm{d}x^2+\mathrm{d}y^2+\mathrm{d}z^2\right)\,, \end{equation}

that is, we do not use the constraint from Equation \eqref{eq-gr-lensing-equals-newtonian}. The potential \(\Phi\) is known as the Newtonian potential, while \(\Psi\) is known as the lensing potential or curvature potential (we’ll use curvature potential to avoid confusion with the two-dimensional lensing potential from Chapter 16). In GR, we have that \(\Phi = \Psi\) as we have just shown, but this may not hold in other theories of gravity and testing this equality is a strong test of GR, especially on cosmological scales (e.g., Reyes et al. 2010; Bertschinger 2011).

C.2.2. Geodesics of non-relativistic matter in weak gravitational fields¶

We’ve seen how GR in the limit of weak gravitational fields and non-relativistic matter sources gives rise to Equation \eqref{eq-gr-poisson} that has the same form as the Poisson equation. To establish the equivalence of GR to Newtonian gravity and classical mechanics, we further need to demonstrate that the GR equation of motion reduces to Newton’s second law in the limit of \(v \ll c\), with the same gravitational potential appearing through its gradient. The equation of motion in GR is the geodesic Equation \eqref{eq-gr-geodesic}. Because we are interested in non-relativistic matter, which follows timelike paths, we can use the proper time \(\tau\) as the affine parameter and write this as

\begin{equation}\label{eq-gr-geodesic-nonrelativistic} {\mathrm{d}^2 x^\mu \over \mathrm{d} \tau^2} +\Gamma^{\mu}_{\nu\epsilon}{\mathrm{d} x^\nu \over \mathrm{d} \tau} {\mathrm{d} x^\epsilon \over \mathrm{d} \tau} = 0\,. \end{equation}

That the matter is non-relativistic means that \(|\mathrm{d} x^i/\mathrm{d} \tau|\ll |\mathrm{d} ct/\mathrm{d} \tau|\). Keeping in mind that the Christoffel connection itself is a first-order quantity in the weak gravitational field, to first order the geodesic equation becomes

\begin{equation}\label{eq-gr-geodesic-nonrelativistic-2} {\mathrm{d}^2 x^\mu \over \mathrm{d} \tau^2} +c^2\Gamma^{\mu}_{00}\left({\mathrm{d} t \over \mathrm{d} \tau}\right)^2 = 0\,. \end{equation}

To compute the Christoffel connection, we use Equation \eqref{eq-gr-christoffel-metric-weak} for the metric in Equation \eqref{eq-gr-metric-weak-solution}, again dropping time derivatives of the metric, because they are much smaller than spatial derivatives. Then we find

\begin{equation}\label{eq-gr-christoffel-metric-weak-newton} c^2\Gamma^{\mu}_{00} = \eta^{\mu \epsilon}\partial_\epsilon \Phi\,. \end{equation}

The \(i\)-th component of Equation \eqref{eq-gr-geodesic-nonrelativistic-2} is then

\begin{equation}\label{eq-gr-newtoniangeodesic-int} {\mathrm{d}^2 x^i \over \mathrm{d} \tau^2}+ \partial_i \Phi\left({\mathrm{d} t \over \mathrm{d} \tau}\right)^2 = {\mathrm{d}^2 x^i \over \mathrm{d} t^2}\left({\mathrm{d} t \over \mathrm{d} \tau}\right)^2 + {\mathrm{d} x^i \over \mathrm{d} t}{\mathrm{d}^2 t \over \mathrm{d} \tau^2} + \partial_i \Phi\left({\mathrm{d} t \over \mathrm{d} \tau}\right)^2= 0 \,, \end{equation}

while the 0-th component gives

\begin{equation} {\mathrm{d}^2 t \over \mathrm{d} \tau^2} =\left({\mathrm{d} t \over \mathrm{d} \tau}\right)^2\,{\partial \Phi/c^2 \over \partial t}\,, \end{equation}

and after dividing by \((\mathrm{d}t/\mathrm{d}\tau)^2\), Equation \eqref{eq-gr-newtoniangeodesic-int} therefore becomes

\begin{equation} {\mathrm{d}^2 x^i \over \mathrm{d} t^2} = - {1 \over c}{\mathrm{d} x^i \over \mathrm{d} t}{\partial \Phi\over \partial [ct]} - \partial_i \Phi \,. \end{equation}

For non-relativistic matter, the first term is much smaller than the second term and in vector form we therefore get

\begin{equation}\label{eq-gr-newtons2nd} {\mathrm{d}^2 \vec{x} \over \mathrm{d} t^2} = -\nabla \Phi \,, \end{equation}

which is exactly Newton’s second law if we identify \(-\nabla \Phi\) with the gravitational force per unit mass. Because \(\Phi\) obeys Equation \eqref{eq-gr-poisson}, we therefore see that \(\Phi\) is identical to the Newtonian gravitational potential: \(\Phi\) obeys the Poisson equation \eqref{eq-gr-poisson} and its gradient acts like a force in Newton’s second law in Equation \eqref{eq-gr-newtons2nd}. We further see that Newton’s laws hold in any frame that moves slowly with respect to the rest frame of the matter: From the decomposition of the metric in Equation \eqref{eq-gr-weak-metric-1}, \(\Phi\) acts like a scalar under spatial rotations and Lorentz transformations with \(v \ll c\) do not change \(\Phi\) or the manipulations of the geodesic equations either. Thus, we see that Newtonian gravity and classical mechanics are recovered in the limit of low velocities and weak gravitational fields. The gravitational potential in galaxies is \(|\Phi/c^2| \lesssim (300\,\mathrm{km\,s}^{-1} / 300,000\,\mathrm{km\,s}^{-1})^2 = 10^{-6}\) and even in large clusters, we have that \(|\Phi/c^2| \lesssim (3,000\,\mathrm{km\,s}^{-1} / 300,000\,\mathrm{km\,s}^{-1})^2 \approx 10^{-4}\), so the Newtonian limit applies to high accuracy within galaxy clusters and galaxies.
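The estimates in the previous paragraph can be reproduced with a few lines of Python. The sketch below also evaluates how much \(c^2\Gamma^i_{00}\) would differ from \(\partial_i \Phi\) if the static weak-field metric of Equation \eqref{eq-gr-metric-weak-solution} were treated as exact; inserting that metric into the Christoffel formula then gives \(c^2\Gamma^i_{00} = \partial_i\Phi/(1-2\Phi/c^2)\). Treating the metric as exact is an assumption made purely for illustration, and the function names are hypothetical.

```python
C_KMS = 299_792.458  # speed of light in km/s


def potential_depth(v_kms):
    """Rough |Phi|/c^2 for a system with characteristic velocity v (km/s)."""
    return (v_kms / C_KMS) ** 2


def christoffel_fractional_correction(phi_over_c2):
    """Fractional difference between c^2 Gamma^i_00 and d_i Phi.

    Assumes the static weak-field metric is exact, so that
    c^2 Gamma^i_00 = d_i Phi / (1 - 2 Phi / c^2).
    """
    return 1.0 / (1.0 - 2.0 * phi_over_c2) - 1.0


if __name__ == "__main__":
    for name, v in [("galaxy", 300.0), ("cluster", 3000.0)]:
        depth = potential_depth(v)
        # The potential is negative, so Phi/c^2 = -depth.
        print(name, depth, christoffel_fractional_correction(-depth))
```

The fractional correction is of order \(2|\Phi|/c^2\), which is a few parts in a million for galaxies and a few parts in ten thousand for clusters.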

C.3. Gravitational light bending and the Shapiro delay¶

We discuss gravitational lensing by stars, compact objects, galaxies, and clusters of galaxies in Chapter 16. The basic ingredients from the general theory of relativity that we need to study gravitational lensing are the deflection and the time delay experienced by light traveling through a gravitational field. Because the gravitational fields involved are always weak, we can use the Newtonian limit from the previous section to derive these.

In the limit of a weak gravitational field, the metric is that of Equation \eqref{eq-gr-metric-weak-solution}, but we’ll use the more general form of Equation \eqref{eq-gr-metric-weak-solution-lensing-explicit} to clearly distinguish the role of the Newtonian and curvature potential. Trajectories of light are again solutions of the geodesic Equation \eqref{eq-gr-geodesic}, but unlike in the previous section, we now have to solve this for relativistic matter. Light travels on null geodesics that have \(\Delta s = 0\). Because of this, we cannot use the proper time to parameterize the light’s path, but we can use a different parameter \(\lambda\) such that the light’s trajectory is \(x^\mu(\lambda)\). The null-geodesic condition is then

\begin{equation}\label{eq-gr-null-geodesic} g_{\mu\nu}{\mathrm{d} x^\mu \over \mathrm{d} \lambda}{\mathrm{d} x^\nu \over \mathrm{d} \lambda} = 0\,. \end{equation}

For the perturbed metric \(g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}\), this needs to hold to zero-th and first order. We decompose the trajectory as \(x^\mu(\lambda) = x_0^\mu(\lambda)+x_1^\mu(\lambda)\), where \(|x_1^\mu| \ll |x_0^\mu|\) and writing \(\mathrm{d}x_0^\mu / \mathrm{d}\lambda \equiv k^\mu\) and \(\mathrm{d}x_1^\mu / \mathrm{d}\lambda \equiv l^\mu\), we have that the null-geodesic condition to zero-th order implies that

\begin{equation} (k^0)^2 = (\vec{k})^2 = k^2\,, \end{equation}

where \(k\) is the length of \(\vec{k}\), the spatial part of \(k^\mu= (k^0,\vec{k})\). The zero-th order geodesic equation is simply

\begin{equation} {\mathrm{d} k^\mu \over \mathrm{d} \lambda} = 0\,, \end{equation}

and \(k^\mu\) is therefore constant; this is the straight trajectory of light in flat space. The first-order null-geodesic equation \eqref{eq-gr-null-geodesic} is

\begin{equation} \eta_{\mu\nu} k^\mu l^\nu + \eta_{\mu\nu} k^\nu l^\mu + h_{\mu\nu} k^\mu k^\nu = 0\,, \end{equation}

or substituting in the metric from Equation \eqref{eq-gr-metric-weak-solution-lensing-explicit}

\begin{equation}\label{eq-gr-veclk} \vec{l}\cdot \vec{k} - k\, l^0 = {k^2 \over c^2}\left(\Phi+\Psi\right)\,. \end{equation}

To work out the geodesic equation to first order, we compute the Christoffel connection using Equation \eqref{eq-gr-christoffel-metric-weak} for the metric in Equation \eqref{eq-gr-metric-weak-solution-lensing-explicit} and we can still drop time derivatives of the metric, because they are much smaller than spatial derivatives. However, we need more than just \(\Gamma^{\mu}_{00}\) in this case and now have to first order that

\begin{align}\label{eq-gr-christoffel-metric-weak-newton-full} c^2\Gamma^{0}_{00} & = c^2\Gamma^{0}_{ij} = c^2\Gamma^{i}_{0j} = c^2\Gamma^{i}_{j0} = 0\,,\\ c^2\Gamma^{i}_{00} & = c^2\Gamma^{0}_{0i} = c^2\Gamma^{0}_{i 0} = \partial_i \Phi\,,\\ c^2\Gamma^{i}_{jk} & = \delta_{jk}\partial_i \Psi - \delta_{ik}\partial_j \Psi - \delta_{ij}\partial_k \Psi\,. \end{align}

Because the Christoffel connection is a first-order quantity, the first-order equations of motion for \(l^\mu\) are therefore

\begin{align}\label{eq-gr-lensing-trajectory-basic-1} c^2{\mathrm{d} l^0 \over \mathrm{d} \lambda} & = -2 k\,\sum_{i}{(\partial_i \Phi)k^i}\,,\\ c^2{\mathrm{d} l^i \over \mathrm{d} \lambda} & = -k^2(\partial_i [\Phi + \Psi]) +2k^i \sum_{j} (\partial_j \Psi) k^j\,.\label{eq-gr-lensing-trajectory-basic-2} \end{align}

The first of these equations has the solution

\begin{equation}\label{eq-gr-lensing-trajectory-l0} l^0 = -{2k \over c^2} \Phi\,, \end{equation}

through direct integration and using the boundary condition that \(l^0 = 0\) for \(\Phi = 0\). The projection of the deflection \(\vec{l}\) onto the zero-th order direction of the light \(\vec{k}\) is then given by plugging this into Equation \eqref{eq-gr-veclk}

\begin{equation}\label{eq-gr-photon-trajectory-ldotk} \vec{l}\cdot \vec{k} = {k^2 \over c^2}\left(\Psi - \Phi\right)\,. \end{equation}

This is zero when the curvature and Newtonian potential are equal, as is the case in GR. But we’ll work out the general case here. The observable deflection angle is the component of \(\vec{l}\) that is perpendicular to \(\vec{k}\): \(\vec{l}_\perp = \vec{l} - k^{-2}(\vec{l}\cdot \vec{k})\vec{k}\). To obtain \(\mathrm{d} \vec{l}_\perp / \mathrm{d} \lambda\), we can write Equation \eqref{eq-gr-lensing-trajectory-basic-2} as

\begin{equation} c^2{\mathrm{d} l^i \over \mathrm{d} \lambda} = -k^2(\partial_i [\Phi + \Psi]) +k^i \sum_{j} (\partial_j [\Phi + \Psi]) k^j +k^i \sum_{j} (\partial_j [\Psi - \Phi]) k^j \,. \end{equation}

The last term on the right-hand side is parallel to \((\mathrm{d}\vec{l}/\mathrm{d}\lambda\cdot \vec{k})\vec{k}\) and thus does not enter into \(\mathrm{d} \vec{l}_\perp / \mathrm{d} \lambda\), while the first two terms are proportional to the projection of the gradient \(\nabla (\Phi + \Psi)\) onto the plane perpendicular to \(\vec{k}\) and these two terms thus give \(\mathrm{d} \vec{l}_\perp / \mathrm{d} \lambda\) as

\begin{equation} c^2{\mathrm{d} \vec{l}_\perp \over \mathrm{d} \lambda} = -k^2\,\nabla_\perp \left(\Phi + \Psi\right)\,, \end{equation}

where \(\nabla_\perp f= \nabla f - k^{-2}\,(\vec{k}\cdot \nabla f)\,\vec{k}\).

The observed deflection angle \(\hat{\boldsymbol{\alpha}}\) is a two-dimensional vector given by

\begin{equation} \hat{\boldsymbol{\alpha}} = -{\Delta \vec{l}_\perp \over k}\,, \end{equation}

where the minus sign comes from the fact that we look backwards along the light path. We can work out this deflection angle as

\begin{align} \hat{\boldsymbol{\alpha}} & = -{1 \over k}\int \mathrm{d} \lambda\, {\mathrm{d} \vec{l}_\perp \over \mathrm{d} \lambda}\\ & = k\,\int \mathrm{d} \lambda\, \nabla_\perp \left({\Phi + \Psi \over c^2}\right)\\ & = \int \mathrm{d} s \,\nabla_\perp \left({\Phi + \Psi \over c^2}\right)\,, \end{align}

where in the last step we integrate over the unperturbed path \(\mathrm{d}s = k\mathrm{d}\lambda\), because the deflection angle is small. Because \(\Phi = \Psi\) in the general theory of relativity, the GR-specific result is

\begin{align}\label{eq-gr-light-bend-alpha} \hat{\boldsymbol{\alpha}} & = 2\int \mathrm{d} s\, \nabla_\perp \left({\Phi \over c^2}\right)\,. \end{align}

Thus, we see that the reason that Einstein’s prediction for the gravitational bending of light is twice the Newtonian prediction is that light travels according to two potentials, the Newtonian and the curvature potential, which are equal. The motion of non-relativistic matter, as we saw in the previous section, is solely determined by the Newtonian potential.
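As a concrete check of Equation \eqref{eq-gr-light-bend-alpha}, the following sketch integrates the perpendicular gradient of a point-mass potential \(\Phi = -GM/\sqrt{b^2+s^2}\) along the unperturbed straight path with impact parameter \(b\). The point-mass potential, the solar constants, and the function name are assumptions made for illustration; the closed form \(4GM/(c^2 b)\) used for comparison is the standard result for this case.

```python
import math

GM_SUN = 1.32712440018e20  # m^3 s^-2, solar gravitational parameter
R_SUN = 6.957e8            # m, nominal solar radius (assumed impact parameter)
C = 299_792_458.0          # m s^-1


def deflection_point_mass(gm, b, n=20_000):
    """Magnitude of alpha = 2 * int ds grad_perp(Phi) / c^2 for a point mass.

    The substitution s = b * tan(u) maps the infinite straight path onto
    u in (-pi/2, pi/2); the integral is done with the midpoint rule.
    """
    du = math.pi / n
    total = 0.0
    for i in range(n):
        u = -0.5 * math.pi + (i + 0.5) * du
        s = b * math.tan(u)
        grad_perp = gm * b / (b * b + s * s) ** 1.5  # dPhi/db
        ds = b / math.cos(u) ** 2 * du
        total += grad_perp * ds
    return 2.0 * total / C**2


alpha = deflection_point_mass(GM_SUN, R_SUN)
alpha_exact = 4.0 * GM_SUN / (C**2 * R_SUN)
arcsec = math.degrees(alpha) * 3600.0
print(f"numerical: {alpha:.4e} rad, closed form: {alpha_exact:.4e} rad")
print(f"deflection at the solar limb: {arcsec:.2f} arcsec")
```

Dropping the factor of two in front of the integral gives the purely Newtonian value, half of the GR prediction.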

Another consequence of the GR equations of motion for light is that light moving in a gravitational field takes longer to reach us than light moving in a flat background. There are two contributions to this: (i) the curved trajectory is geometrically longer and therefore takes longer to traverse, a delay that we’ll call \(\Delta t_\mathrm{geom}\), and (ii) time appears to slow down along the light’s trajectory in the presence of a gravitational field, which we’ll call the gravitational time delay \(\Delta t_\mathrm{Shapiro}\). Both are small compared to the time it takes light to traverse the unperturbed trajectory on galactic or cosmological scales. Thus, we can compute \(\Delta t_\mathrm{geom}\) as the difference in the (spatial) lengths of the curved and straight trajectories divided by the speed of light in vacuum \(c\), \(\Delta t_\mathrm{Shapiro}\) from the slowed-down unperturbed trajectory, and obtain the total time delay as \(\Delta t = \Delta t_\mathrm{geom} + \Delta t_\mathrm{Shapiro}\). The geometric time delay then does not require any further ingredients from GR and is discussed further in Chapter 16.

In the formalism that we are using here, the gravitational time delay has two components: (i) the fact that coordinate time \(\Delta t_\mathrm{coord} = (1/c)\int \mathrm{d} \lambda \,\mathrm{d} x^0 / \mathrm{d} \lambda\) elapses more slowly in a gravitational field and (ii) the fact that part of the perturbation \(\vec{l}\) is along the unperturbed trajectory (see Equation \ref{eq-gr-photon-trajectory-ldotk}), which we therefore need to subtract. In GR, the latter contribution is zero because \(\Phi = \Psi\), but as before we’ll work out the more general case where we may have that \(\Phi \neq \Psi\). The total gravitational time delay is then

\begin{equation} c\Delta t_\mathrm{Shapiro} = \int \mathrm{d} \lambda \,\left( l^0 - l_\parallel\right)\,, \end{equation}

where \(l_\parallel = (\vec{l} \cdot \vec{k})/k\). From the discussion above, it is clear that

\begin{equation} c^2\,{\mathrm{d} l_\parallel \over \mathrm{d} \lambda} = k (\vec{k}\cdot\nabla [\Psi - \Phi])\,, \end{equation}

with solution

\begin{equation} l_\parallel = {k \over c^2} \left(\Psi-\Phi\right)\,, \end{equation}

through direct integration and using the boundary condition that \(l_\parallel = 0\) for \(\Phi-\Psi = 0\). Combining this with Equation \eqref{eq-gr-lensing-trajectory-l0}, we then get

\begin{align} c\Delta t_\mathrm{Shapiro} & = - \int \mathrm{d} s \left({\Phi + \Psi \over c^2}\right)\,, \end{align}

where as above \(\mathrm{d}s = k\mathrm{d}\lambda\), because we integrate over the unperturbed trajectory. When \(\Phi = \Psi\) as the general theory of relativity predicts, this becomes

\begin{align}\label{eq-gr-gravtimedelay} c\Delta t_\mathrm{Shapiro} & = - 2\int \mathrm{d} s \left({\Phi \over c^2}\right)\,. \end{align}

This gravitational time delay was first derived by Shapiro (1964) as a test of the general theory of relativity, a test that GR passed with flying colors (e.g., Shapiro et al. 1971; Reasenberg et al. 1979). Writing \(\Psi = \gamma \Phi\), measuring the value of \(\gamma\) becomes a strong test of GR, with the GR prediction being \(\gamma = 1\) (\(\gamma\) is a parameter in the parameterized post-Newtonian formalism). In the solar system, measurements of the Shapiro delay of radio signals from the Cassini spacecraft as they pass near the Sun give \(\gamma - 1 = (2.1 \pm 2.3)\times 10^{-5}\) (Bertotti et al. 2003). However, \(\gamma\) could depend on scale in alternatives to GR and constraints on the scales of galaxies are significantly weaker, e.g., \(\gamma = 0.97 \pm 0.09\) from comparing the \(\Phi\)-dependent kinematics of (non-relativistic) stars in a galaxy with the \((\Phi+\Psi)\)-dependent gravitational lensing by that same galaxy (Collett et al. 2018).
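To get a feeling for the size of the effect, the sketch below evaluates \(c\Delta t_\mathrm{Shapiro} = -\int \mathrm{d}s\,(\Phi+\Psi)/c^2\) with \(\Psi = \gamma\Phi\) for a point mass, for which the path integral \(\int \mathrm{d}s/\sqrt{b^2+s^2}\) along the straight line has a closed form in terms of \(\sinh^{-1}\). The geometry (a signal grazing the Sun between Earth at 1 AU and a reflector at 0.723 AU, roughly Venus at superior conjunction) and all names are illustrative assumptions, not the setup of any particular experiment.

```python
import math

GM_SUN = 1.32712440018e20  # m^3 s^-2
C = 299_792_458.0          # m s^-1
AU = 1.495978707e11        # m
R_SUN = 6.957e8            # m


def shapiro_delay(gm, b, x_emit, x_obs, gamma=1.0):
    """One-way gravitational time delay (s) for a point mass.

    The unperturbed path runs from s = -x_emit to s = +x_obs with impact
    parameter b, and Psi = gamma * Phi with Phi = -GM / sqrt(b^2 + s^2).
    """
    # int ds / sqrt(b^2 + s^2) between -x_emit and +x_obs
    path_integral = math.asinh(x_emit / b) + math.asinh(x_obs / b)
    return (1.0 + gamma) * gm / C**3 * path_integral


one_way = shapiro_delay(GM_SUN, R_SUN, 1.0 * AU, 0.723 * AU)
round_trip = 2.0 * one_way
print(f"round-trip delay: {round_trip * 1e6:.0f} microseconds")
```

Because the delay scales as \((1+\gamma)/2\) relative to the GR value, a timing measurement at fixed geometry translates directly into a constraint on \(\gamma\).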

C.4. Homogeneous and isotropic cosmological models¶

Another important application of the general theory of relativity in the context of galaxies is using it to describe the structure and evolution of the Universe on the largest scales. This is the realm of cosmology. In Chapter 18, which describes the growth of initial density fluctuations in the Universe and how they gravitationally collapse to form galaxies, we depend on basic results for the evolution of the Universe. We will describe those here, but refer the reader to more advanced cosmology texts such as Dodelson & Schmidt (2020) for further details.

C.4.1. The Friedmann–Lemaître–Robertson–Walker metric¶

The basic assumption behind the cosmological models that we describe here, and that are those generally used to describe our \(\Lambda\)CDM Universe, is that on large scales, the 3D spatial structure of the Universe is both isotropic and homogeneous. Isotropy means that the Universe has the same properties in all directions when viewed from a point, say, the Earth. Homogeneity means that there is no special location in the Universe, that is, each location is statistically equivalent to all other positions. Establishing that isotropy and homogeneity on large scales are good assumptions is not easy and they were for a long time taken as axioms. By observing the Universe in different directions, it is relatively easy to determine that the Universe appears isotropic as observed from Earth (e.g., the temperature of the cosmic microwave background [CMB] is the same to about one part in \(10^5\) after we account for the motion of the Earth; Smoot et al. 1992), but to determine homogeneity we need to establish the isotropy of the Universe around two separate points. Clever use of distortions in the spectrum of the CMB can be used to do just that (Goodman 1995). But stronger constraints come from large surveys of the redshifts of galaxies, which can also directly determine the homogeneity of the Universe by determining the density distribution around many different galaxies and showing that it is statistically the same. Recent surveys of galaxies and quasars have demonstrated that the Universe is statistically-homogeneous on scales \(\gtrsim 100\,\mathrm{Mpc}\) (Scrimgeour et al. 2012; Laurent et al. 2016). Thus, observational evidence is strongly in favor of the simplifying assumptions of isotropy and homogeneity.

To derive a cosmological model, we must then solve Einstein’s field equations \eqref{eq-gr-fieldeq} for metrics that are consistent with the assumptions of isotropy and homogeneity. Starting with the assumption of isotropy, we require that the metric does not change under spatial rotations. We therefore have to build the line element \(\mathrm{d}s^2\) out of components that are invariant under spatial rotations. Because the metric only involves the spatial coordinates as \(x^i\), \(\mathrm{d} x^i\), we can only use the following rotationally-invariant combinations: \(\tilde{r}^2 = \sum_i (x^i)^2\), \(\sum_i x^i\,\mathrm{d} x^i\), and \(\sum_i \mathrm{d} x^i \mathrm{d} x^i\) (we again explicitly sum over the latin spatial indexes). These we combine with the time coordinate that can appear as \(\tilde{t}\) or \(\mathrm{d}\tilde{t}\) to form the most general metric that is invariant under spatial rotations

\begin{equation} \mathrm{d}s^2 = -a(\tilde{r},\tilde{t})\,c^2\mathrm{d}\tilde{t}^2 + b(\tilde{r},\tilde{t})\sum_i{x^i \,c\mathrm{d}\tilde{t}\mathrm{d}x^i} + c(\tilde{r},\tilde{t})\sum_{ij} x^i x^j \mathrm{d}x^i \mathrm{d}x^j + d(\tilde{r},\tilde{t})\sum_i{\mathrm{d}x^i \mathrm{d} x^i}\,, \end{equation}

where the \(a(\cdot)\), \(b(\cdot)\), \(c(\cdot)\), and \(d(\cdot)\) are general functions for now. Because rotational invariance is more obvious in spherical coordinates, we can re-write this in spherical coordinates as (see Chapter A.1)

\begin{equation} \mathrm{d}s^2 = -a(\tilde{r},\tilde{t})\,c^2\mathrm{d}\tilde{t}^2 + \tilde{r}\,b(\tilde{r},\tilde{t})\,c\mathrm{d}\tilde{t}\mathrm{d}\tilde{r} + \left[\tilde{r}^2\,c(\tilde{r},\tilde{t})+d(\tilde{r},\tilde{t})\right] \mathrm{d}\tilde{r}^2 + \tilde{r}^2\,d(\tilde{r},\tilde{t})\,\left[\mathrm{d}\theta^2 +\sin^2 \theta \mathrm{d}\phi^2\right]\,. \end{equation}

This can be simplified by re-defining the coordinates, which we have the freedom to do. We can remove the \(d(\tilde{r},\tilde{t})\) function by re-defining \(\hat{r} = \tilde{r}\sqrt{d(\tilde{r},\tilde{t})}\); then the metric becomes

\begin{equation} \mathrm{d}s^2 = -\hat{a}(\hat{r},\tilde{t})\,c^2\mathrm{d}\tilde{t}^2 + \hat{r}\,\hat{b}(\hat{r},\tilde{t})\,c\mathrm{d}\tilde{t}\mathrm{d}\hat{r} + \hat{c}(\hat{r},\tilde{t}) \mathrm{d}\hat{r}^2 + \hat{r}^2\,\left[\mathrm{d}\theta^2 +\sin^2 \theta \mathrm{d}\phi^2\right]\,, \end{equation}

where the functions \(\hat{a}(\cdot)\), \(\hat{b}(\cdot)\), and \(\hat{c}(\cdot)\) are related to the un-hatted versions by the radial-coordinate transformation, but we do not have to worry about how. We can furthermore get rid of \(\hat{b}(\cdot)\) by redefining the time coordinate to \(\hat{t}\) that is related to \(\tilde{t}\) through \(\tilde{t} = f(\hat{r},\hat{t})\) where \(\hat{b}(\hat{r},\hat{t}) = 2\hat{a}(\hat{r},\hat{t})\partial_{\hat{r}} f(\hat{r},\hat{t})\). The resulting metric can be written in the following form

\begin{equation} \mathrm{d}s^2 = -e^{2\alpha(\hat{r},\tilde{t})}\,c^2\mathrm{d}\tilde{t}^2 + e^{2\beta(\hat{r},\tilde{t})}\,\mathrm{d}\hat{r}^2 + \hat{r}^2\,\left[\mathrm{d}\theta^2 +\sin^2 \theta \mathrm{d}\phi^2\right]\,. \end{equation}

So far, we have only used isotropy of the metric around a single point, but next we want to add the additional constraints from spatial homogeneity (as written so far, this metric is the one that leads to the Schwarzschild solution). That is, the metric needs to be the same no matter which spatial point is used as the origin of the spherical coordinate system. This immediately demands that \(\alpha(\hat{r},\tilde{t}) \equiv \alpha(\tilde{t})\) and we can absorb the remaining \(\alpha(\tilde{t})\) through a re-definition of the time coordinate that satisfies \(\mathrm{d}t = e^{\alpha(\tilde{t})} \mathrm{d}\tilde{t}\). Thus, we arrive at

\begin{equation}\label{eq-gr-frw-line-element-almostthere} \mathrm{d}s^2 = -c^2\mathrm{d}t^2 + \left\{e^{2\beta(\hat{r},t)}\,\mathrm{d}\hat{r}^2 + \hat{r}^2\,\left[\mathrm{d}\theta^2 +\sin^2 \theta \mathrm{d}\phi^2\right]\right\}\,. \end{equation}

The requirement that the spatial part of the metric (the part between the curly braces, which is conventionally denoted as \(\gamma_{ij}\)) is independent of the origin is equivalent to the statement that the Riemann tensor only involves tensors that are invariant under spatial translations, spatial rotations, or Lorentz transformations. The only such tensor that can be part of the Riemann tensor turns out to be the metric itself (the Kronecker tensor and the so-called Levi-Civita tensor are other invariant tensors, but they cannot be included while keeping the symmetries of the Riemann tensor). Thus, the Riemann tensor for the three-dimensional spatial space has to be

\begin{equation} \tilde{R}_{ijkl} = \tilde{k}(t)\,\left(\gamma_{ik}\gamma_{jl}-\gamma_{il}\gamma_{jk}\right)\,, \end{equation}

which is the unique combination that has the symmetries of the Riemann tensor, where \(\tilde{k}\) is independent of position and we use \(\tilde{R}_{ijkl}\) to distinguish this from the Riemann tensor of four-dimensional spacetime. The Ricci tensor is

\begin{equation}\label{eq-gr-frw-Riccitensor-maxsym} \tilde{R}_{ij} = 2\tilde{k}(t)\,\gamma_{ij}\,, \end{equation}

and the Ricci scalar is

\begin{equation} \tilde{R} = 6\tilde{k}(t)\,, \end{equation}

where we have kept the time-dependence of the curvature scalar \(\tilde{k}\) explicit. Thus, for the metric of Equation \eqref{eq-gr-frw-line-element-almostthere} to be spatially homogeneous, its Ricci tensor has to satisfy Equation \eqref{eq-gr-frw-Riccitensor-maxsym}. Computing the Ricci tensor for the metric of Equation \eqref{eq-gr-frw-line-element-almostthere} and equating it to twice \(\tilde{k}(t)\) times the metric gives

\begin{align} \tilde{R}_{11}& = {2\over \hat{r}}\partial_{\hat{r}} \beta(\hat{r},t) = 2\tilde{k}(t)\,e^{2\beta(\hat{r},t)} \,,\\ \tilde{R}_{22}= \tilde{R}_{33}/\sin^2 \theta& = e^{-2\beta(\hat{r},t)}\left[\hat{r}\partial_{\hat{r}} \beta(\hat{r},t)-1\right]+1 = 2\tilde{k}(t)\hat{r}^2 \,. \end{align}

The first of these equations has the solution \(e^{-2\beta(\hat{r},t)} = C-\tilde{k}(t)\hat{r}^2\) and the other equation fixes \(C = 1\) such that the resulting metric is

\begin{equation}\label{eq-gr-frw-line-element-justaboutthere} \mathrm{d}s^2 = -c^2\mathrm{d}t^2 + {1\over 1-\tilde{k}(t)\hat{r}^2}\,\mathrm{d}\hat{r}^2 + \hat{r}^2\,\left[\mathrm{d}\theta^2 +\sin^2 \theta \mathrm{d}\phi^2\right]\,. \end{equation}

Finally, we re-define the radial coordinate one last time to absorb all but the sign \(k= \{-1,0,+1\}\) of \(\tilde{k}\) and an overall normalization into the radial coordinate with the transformation \(r\,a(t) = r\, R(t)/R_0 = \hat{r}\) with \(R(t) = 1/\sqrt{|\tilde{k}(t)|}\) (the spatial Ricci scalar is then \(\tilde{R} = 6k/R(t)^2\)). We then arrive at the final form of the isotropic and homogeneous metric

\begin{equation}\label{eq-gr-frw-line-element} \mathrm{d}s^2 = -c^2\mathrm{d}t^2 + a^2(t)\left\{{1\over 1-kr^2/R_0^2}\,\mathrm{d}r^2 + r^2\,\left[\mathrm{d}\theta^2 +\sin^2 \theta \mathrm{d}\phi^2\right]\right\}\,. \end{equation}

This is the famous Friedmann–Lemaître–Robertson–Walker metric (often just Robertson–Walker metric or FLRW metric).

For \(k=0\), it is clear that the FLRW metric is that of flat space, because the spatial part is then simply \(\mathrm{d}x^2+\mathrm{d}y^2+\mathrm{d}z^2\) written in spherical coordinates

\begin{equation}\label{eq-gr-frw-line-element-zerok} \mathrm{d}s^2 = -c^2\mathrm{d}t^2 + a^2(t)\left\{\mathrm{d}\chi^2 + \chi^2\,\left[\mathrm{d}\theta^2 +\sin^2 \theta \mathrm{d}\phi^2\right]\right\}\,, \end{equation}

where we have written \(r = \chi\) for consistency with the \(k\neq 0\) spaces below. Note that this doesn’t fix the global structure of space to be \(\mathbb{R}^3\), but could also for example be the product of three one-dimensional tori, which is also a flat space. The Ricci scalar \(\tilde{R} = 0\) at all times and space is therefore flat at all times. For \(k=+1\), the FLRW metric can be written as

\begin{equation}\label{eq-gr-frw-line-element-posk} \mathrm{d}s^2 = -c^2\mathrm{d}t^2 + a^2(t)\left\{\mathrm{d}\chi^2 + R_0^2\,\sin^2(\chi/R_0)\,\left[\mathrm{d}\theta^2 +\sin^2 \theta \mathrm{d}\phi^2\right]\right\}\,, \end{equation}

where

\begin{equation}\label{eq-gr-comoving-sphere} \chi = R_0\,\sin^{-1} \left(r/R_0\right)\,. \end{equation}

The spatial metric within the curly braces is that of a sphere in three-dimensions, which has positive curvature. The curvature of the entire space evolves as \(\tilde{R} = 6/R(t)^2\), thus, as space expands, the curvature decreases. The three-dimensional sphere is the only such physically-realistic space; because it is finite, \(k=+1\) spaces are called closed. For \(k=-1\), the FLRW metric can be written as

\begin{equation}\label{eq-gr-frw-line-element-negk} \mathrm{d}s^2 = -c^2\mathrm{d}t^2 + a^2(t)\left\{\mathrm{d}\chi^2 + R_0^2\,\sinh^2(\chi/R_0)\,\left[\mathrm{d}\theta^2 +\sin^2 \theta \mathrm{d}\phi^2\right]\right\}\,, \end{equation}

where

\begin{equation}\label{eq-gr-comoving-hyper} \chi = R_0\,\sinh^{-1} \left(r/R_0\right)\,. \end{equation}

Such a space is a three-dimensional generalization of hyperboloids, which can be infinite in size and such spaces are therefore referred to as open. These three versions of the metric are what’s used to compute distances in FLRW Universe models. It is conventional to choose \(R_0\) such that \(a(t)=1\) at the present time. In that case, \(\chi\) is the comoving distance, the distance between objects in the Universe that factors out the overall change \(R(t)\) to the distances of objects because of the expansion or contraction of the Universe. The parameter \(a\) is then the scale factor and it gives the size of the Universe relative to what it is today. The actual distance \(D\) between objects in the Universe at a given time is the proper distance, which is obtained by multiplying the comoving distance with the scale factor: \(D = a\,\chi\).
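The relations between the coordinate distance \(r\), the comoving distance \(\chi\), and the proper distance \(D\) for the three curvature cases can be collected in a few lines. This is a sketch only: the function names and the numerical values of \(\chi\) and \(R_0\) in the example are arbitrary choices for illustration.

```python
import math


def coordinate_distance(chi, k, R0):
    """r(chi) for an FLRW metric with curvature sign k in {-1, 0, +1}.

    Inverts chi = R0 * asin(r / R0) for k = +1, chi = r for k = 0, and
    chi = R0 * asinh(r / R0) for k = -1.
    """
    if k == 0:
        return chi
    if k == 1:
        return R0 * math.sin(chi / R0)
    if k == -1:
        return R0 * math.sinh(chi / R0)
    raise ValueError("k must be -1, 0, or +1")


def proper_distance(chi, a):
    """Proper distance D = a * chi at scale factor a."""
    return a * chi


# Illustrative numbers: chi = 3000 and R0 = 10000 in the same length unit.
for k in (+1, 0, -1):
    print(k, coordinate_distance(3000.0, k, 10000.0))
```

For \(\chi \ll R_0\) the three cases agree to leading order, reflecting the fact that curvature is only noticeable on scales comparable to \(R_0\).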

C.4.2. The Friedmann equations¶

The evolution of isotropic and homogeneous Universes is entirely determined by the time evolution of the scale factor \(a\). To solve for this evolution for a given model for the matter and energy content of the Universe, we need to solve Einstein’s field equations \eqref{eq-gr-fieldeq} and for this we need to determine the Einstein tensor. As usual, to do this, we need to compute the Christoffel connection, the Riemann tensor, and then the Ricci tensor and scalar, starting from Equation \eqref{eq-gr-christoffel-metric}. This is straightforward for the FLRW metric in Equation \eqref{eq-gr-frw-line-element} and we simply state the result here. The non-zero components of the Ricci tensor are the diagonal elements

\begin{align} c^2R_{00} & = -3{\ddot{a} \over a}\,,\\ c^2R_{11} & = {a\,\ddot{a}+2\dot{a}^2+2c^2k/R_0^2 \over 1-kr^2/R_0^2}\,,\\ c^2R_{22} = c^2R_{33}/\sin^2 \theta & = r^2\,\left(a\ddot{a}+2\dot{a}^2+2c^2k/R_0^2\right)\,, \end{align}

and the Ricci scalar is

\begin{equation} c^2R = 6\left[{\ddot{a} \over a} + \left({\dot{a} \over a}\right)^2 +{c^2k\over a^2 R_0^2}\right]\,. \end{equation}

Next, we need to specify the stress-energy tensor of the matter and energy components that make up the Universe. The assumption of isotropy and homogeneity also applies to \(T^{\mu\nu}\) and similar to how we proceeded for the metric, we can constrain the form of \(T^{\mu\nu}\) by requiring that it is isotropic and spatially homogeneous. Working in the frame in which the matter is at rest, the \(00\) component of \(T^{\mu\nu}\) is the energy density \(\rho c^2\) and because of the requirement of homogeneity, this can only depend on time: \(\rho c^2 \equiv \rho(t) c^2\). The \(0i\) components are the momentum density and for the same reason they can only depend on time, but we saw in Section C.1 that these components are also equal to the energy flux in space and if they are non-zero, they would induce spatial inhomogeneity. We therefore must have that \(T^{0i} = T^{i0} = 0\). Finally, the purely-spatial components \(T^{ij}\) must be built from spatial tensors that are invariant under rotations and translations and again the only option is the spatial metric \(\gamma^{ij}\) itself, multiplied by a function of time. Therefore, the stress energy tensor is given by

\begin{align} T^{00} & = \rho(t)c^2\,,\\ T^{ij} & = p(t)\,\gamma^{ij}\,, \end{align}

where the spatial part of the metric, \(\gamma_{ij}\) is the part of the FLRW metric in Equation \eqref{eq-gr-frw-line-element} between the curly braces, \(\rho(t)c^2\) is the energy density, and \(p(t)\) is the pressure. Comparing this to Equation \eqref{eq-gr-stressenergy-perfectfluid}, we see that the stress-energy tensor has to be that of a perfect fluid. The trace of \(T^{\mu\nu}\) is

\begin{equation} T = -\rho(t)c^2 + 3p(t)\,. \end{equation}

We can use this stress-energy tensor to write down the field equations in the form of Equation \eqref{eq-gr-fieldeq-general-alt}; the only non-trivial components are equivalent to the following equations (from now on, we drop the explicit time dependence of \(\rho\) and \(p\) to reduce notational clutter)

\begin{align} -3{\ddot{a} \over a} + \Lambda c^2 & = 4\pi G\left(\rho + 3{p\over c^2}\right)\,,\\ {\ddot{a}\over a}+2\left({\dot{a} \over a}\right)^2+2{c^2k \over a^2R_0^2} - \Lambda c^2 & = 4\pi G\left(\rho - {p\over c^2}\right)\,, \end{align}

because the other equations are either trivially satisfied (\(0=0\)) or equivalent to the second one. We can use the first equation to remove the \(\ddot{a}\) term in the second equation to arrive at

\begin{equation}\label{eq-gr-friedmann-1} H^2 = \left({\dot{a} \over a}\right)^2= {8\pi G\rho\over 3}+{\Lambda c^2\over 3}-{c^2k \over a^2R_0^2}\,, \end{equation}

where \(H = \dot{a}/a\) is the Hubble parameter. Together with the first equation written as

\begin{equation}\label{eq-gr-friedmann-2} \dot{H} + H^2 = {\ddot{a} \over a} = -{4\pi G\over 3}\left(\rho + 3{p\over c^2}\right) + {\Lambda c^2\over 3}\,, \end{equation}

these two equations make up the Friedmann equations.

To solve the Friedmann equations, we need to specify the functions \(\rho(t)\) and \(p(t)\). The standard approach is to use an equation of state to relate the energy density to the pressure and the typically-used equations of state can all be written as (see the discussion in Chapter 18.1.2)

\begin{equation}\label{eq-gr-eq-state-constant-w} p = w \rho c^2\,, \end{equation}

where \(w\) is a constant. For example, for pressureless dust, which describes the behavior of non-relativistic matter (ordinary and dark) on large scales well, we have that \(w = 0\) (see Equation \ref{eq-gr-stressenergy-dust}). Light has \(w = 1/3\). It is also common to convert the cosmological constant term \(\propto \Lambda\) in the Friedmann equations into an energy component by using \(\rho = \Lambda c^2 / [8\pi G]\) and \(w=-1\); in this case the cosmological constant is referred to as dark energy. To determine how the density and pressure of a perfect fluid with equation of state \eqref{eq-gr-eq-state-constant-w} depends on the scale factor \(a\), we can use the conservation of the stress-energy tensor, \(\nabla_\mu T^{\mu\nu} = 0\). This gives

\begin{align} 0 = c \nabla_\mu T^{\mu}{}_{0} & = -\dot{\rho}c^2 -3{\dot{a} \over a}\left(\rho c^2 + p\right)\\ & = -\dot{\rho}c^2 -3{\dot{a} \over a}\,\rho c^2\left(1+w\right)\,, \end{align}

or

\begin{equation} {\dot{\rho}\over \rho} =-3{\dot{a} \over a}\left(1+w\right)\,, \end{equation}

with solution

\begin{equation}\label{eq-gr-rho-perfectfluid-constantw-intime} \rho \propto a^{-3\left(1+w\right)}\,. \end{equation}

This has the expected behavior that the energy density of non-relativistic matter, which is largely rest mass, dilutes as \(1/a^3\) and that the energy density of light dilutes as \(1/a^4\), because of the additional redshifting effect of expanding space. Dark energy with \(w=-1\) has a constant density \(\rho(a) = \mathrm{constant}\). Because light’s wavelength is \(\propto 1/a\), light emitted at earlier times and observed today shows a redshift \(z\) that is given by \(1+z = 1/a\). Because of this equation and because observations of the earlier Universe generally involve redshifted light, the redshift \(z\) is often used instead of \(a\) when discussing cosmology. Any function below that is a function of the scale factor can be equivalently considered a function of the redshift \(z = 1/a-1\).
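The scalings in Equation \eqref{eq-gr-rho-perfectfluid-constantw-intime} and the relation \(1+z = 1/a\) are easy to tabulate. The following sketch uses hypothetical helper names (`density_ratio`, `scale_factor`) and simply evaluates the power law for the three equations of state discussed above.

```python
def scale_factor(z):
    """Scale factor a corresponding to redshift z, from 1 + z = 1 / a."""
    return 1.0 / (1.0 + z)


def density_ratio(a, w):
    """rho(a) / rho(a=1) = a^(-3 (1 + w)) for a constant equation of state w."""
    return a ** (-3.0 * (1.0 + w))


a = scale_factor(1.0)  # z = 1, i.e. a = 0.5
for label, w in [("matter", 0.0), ("radiation", 1.0 / 3.0), ("dark energy", -1.0)]:
    print(f"{label}: density at z=1 relative to today = {density_ratio(a, w):.3g}")
```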

Writing the cosmological constant as a density component and assuming that there are \(N\) perfect-fluid energy components following Equation \eqref{eq-gr-eq-state-constant-w} indexed by \(i\), we can write the first Friedmann equation \eqref{eq-gr-friedmann-1} as

\begin{align}\label{eq-gr-friedmann-density-parameter} {H^2 \over H_0^2} & = \Omega_{0,k}\,a^{-2} + \sum_i \Omega_{0,i} a^{-3(1+w_i)}\,, \end{align}

where \(H_0\) is the value of the Hubble parameter today (the Hubble constant), the density parameters \(\Omega_{0,i}\) are the energy density of component \(i\) expressed in terms of today’s critical density

\begin{equation} \rho_{0,c} = {3H_0^2\over 8\pi G} \end{equation}

as

\begin{equation} \Omega_{0,i} = {\rho_{0,i} \over \rho_{0,c}}\,, \end{equation}

and

\begin{equation} \Omega_{0,k} = 1-\sum_i \Omega_{0,i} = -{c^2k\over H^2_0 R_0^2} \end{equation}

represents the curvature component as a density parameter as well. We have defined the critical density and curvature parameter here at the present time, but these definitions can be generalized to any time; all of these quantities in general change with time.

The energy density of our Universe is dominated by matter (both ordinary and dark with combined density parameter \(\Omega_{0,m}\)), dark energy (or, equivalently, a non-zero cosmological constant; density parameter \(\Omega_{0,\Lambda}\)), and radiation (with density parameter \(\Omega_{0,r}\)). Thus, for our Universe, we can write Equation \eqref{eq-gr-friedmann-density-parameter} as

\begin{align}\label{eq-gr-friedmann-matter-de-radiation-curvature} E^2(a) = {H(a)^2 \over H_0^2} & = \Omega_{0,m}\,a^{-3} + \Omega_{0,\Lambda}+\Omega_{0,r}\,a^{-4}+\Omega_{0,k}\,a^{-2}\,, \end{align}

where we have introduced the function \(E(a)\) that we will use further below. The density parameters of the various energy components can be constrained using observations of the CMB and of the large-scale structure of galaxies. These constrain the curvature to be very small (Planck Collaboration et al. 2020)

\begin{equation} \Omega_{0,k} = 0.001\pm0.002\,. \end{equation}

Assuming then that the Universe is flat, the values of the other density parameters are

\begin{align} \Omega_{0,m} & = 0.3111 \pm 0.0056\,,\\ \Omega_{0,\Lambda} & = 0.6889 \pm 0.0056\,,\\ \Omega_{0,r} & = (9\pm0.4)\times 10^{-5}\,. \end{align}

Thus, at the present time, the energy density of the Universe is dominated by the dark energy component, but with a substantial contribution from matter.

Because of the different behavior of the energy density with scale factor for different \(w\) (Equation \ref{eq-gr-rho-perfectfluid-constantw-intime}), different components dominate the energy density at different times. While the energy density in radiation is negligible today, because it behaves as \(a^{-4}\), at early times, radiation dominates the energy budget. This only lasts until a redshift of \(z_{\mathrm{eq}} \approx 3,400\), the redshift at which the energy density of radiation and matter are equal (found from \(\Omega_{0,m} (1+z)^{3} = \Omega_{0,r}(1+z)^{4}\)). This is before \(z\approx 1,000\) when the CMB forms and baryons decouple from light. At \(1 \lesssim z \lesssim 3,400\), matter (ordinary and dark) dominates the energy budget and the cosmological constant (or dark energy) term in the Friedmann equations is negligible. At \(z \lesssim 1\), the cosmological constant becomes important.
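The transition redshifts quoted above follow from equating \(\Omega_{0,a}(1+z)^{3(1+w_a)}\) and \(\Omega_{0,b}(1+z)^{3(1+w_b)}\) for two components. A minimal sketch using the central values of the density parameters given earlier (the function name `z_equal` is ours):

```python
OMEGA_M = 0.3111  # matter
OMEGA_L = 0.6889  # dark energy / cosmological constant
OMEGA_R = 9e-5    # radiation


def z_equal(omega_a, w_a, omega_b, w_b):
    """Redshift at which components a and b have equal energy density.

    Solves omega_a (1+z)^(3(1+w_a)) = omega_b (1+z)^(3(1+w_b)) for z.
    """
    return (omega_a / omega_b) ** (1.0 / (3.0 * (w_b - w_a))) - 1.0


z_matter_radiation = z_equal(OMEGA_M, 0.0, OMEGA_R, 1.0 / 3.0)
z_matter_lambda = z_equal(OMEGA_L, -1.0, OMEGA_M, 0.0)
print(f"matter-radiation equality: z ~ {z_matter_radiation:.0f}")
print(f"matter-dark energy equality: z ~ {z_matter_lambda:.2f}")
```

The second number shows that the dark-energy density only overtakes the matter density at a redshift of a few tenths, consistent with the statement that the cosmological constant matters at \(z \lesssim 1\).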

Because the formation of galaxies largely happens at \(z \gtrsim 1\), a Universe consisting of matter only is a good and simple approximation for the background cosmology that galaxies form in. Because our Universe is consistent with being flat, such a Universe has a matter density parameter \(\Omega_{0,m} = 1\). In that case, we can solve the Friedmann equations to give

\begin{equation} a \propto t^{2/3}\,, \end{equation}

and

\begin{equation} H = {2 \over 3} t^{-1}\,. \end{equation}

This model of the Universe is known as the Einstein–de Sitter model. We use this in Chapter 18 to investigate the growth of initial density perturbations into the galaxies that we see today.
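For completeness, the steps behind this solution can be sketched as follows, assuming the first Friedmann equation for a flat, matter-only Universe in the form \((\dot{a}/a)^2 = H_0^2\,a^{-3}\) (that is, \(E(a) = a^{-3/2}\)) and the boundary condition \(a = 0\) at \(t = 0\):

\begin{align} \dot{a} & = H_0\,a^{-1/2} \quad\Rightarrow\quad a^{1/2}\,\mathrm{d}a = H_0\,\mathrm{d}t\,,\\ {2\over 3}\,a^{3/2} & = H_0\,t \quad\Rightarrow\quad a = \left({3\over 2}\,H_0\,t\right)^{2/3}\,,\\ H & = {\dot{a}\over a} = {2\over 3}\,t^{-1}\,. \end{align}

Setting \(a = 1\) in the second line gives the present age of such a Universe as \(t_0 = 2/(3H_0)\).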

C.4.3. The angular diameter distance¶

In an expanding (or contracting) Universe, the distance in static, flat space is replaced by a whole bevy of different distances depending on the context. For example, the distance \(D_A\) that relates an object’s small angular size \(\delta\theta\) to its physical size \(\delta s\) through

\begin{equation} \delta\theta = {\delta s \over D_A}\,, \end{equation}

is known as the angular diameter distance and we need this in our discussion of gravitational lensing in Chapter 16. Placing the origin at the observer’s position at \(a=1\) and assuming that we are observing an object at scale factor \(a\), we can determine the angular diameter distance by first converting the object’s size \(\delta s\) to its size at the observer’s time by dividing by \(a\) to account for the change in the Universe’s size. Then we have that \(\delta \theta = (\delta s / a) /r\), where \(r\) is the coordinate distance. Therefore

\begin{equation} D_A = a\,r\,. \end{equation}

If we are observing the object using light (which we generally are!), then we can compute \(r\) on a null geodesic where \(\mathrm{d} s = 0\) and thus \(c\mathrm{d}t = a\,\sqrt{1/(1-kr^2/R_0^2)}\mathrm{d} r = a\,\mathrm{d} \chi\) from Equation \eqref{eq-gr-frw-line-element}. Then we can obtain \(r = r(\chi)\) by inverting the relevant equation (\(r=\chi\), Equation \ref{eq-gr-comoving-sphere}, or Equation \ref{eq-gr-comoving-hyper} for \(k=0\), \(+1\), and \(-1\), respectively) where the comoving distance is

\begin{equation}\label{eq-gr-comoving-distance} \chi = c\int{\mathrm{d}t \over a} = c\int{\mathrm{d}a \over \dot{a} a}\,. \end{equation}

To solve this integral, we use the expression for \(\dot{a}/a\) for our Universe from Equation \eqref{eq-gr-friedmann-matter-de-radiation-curvature}. The comoving distance from Equation \eqref{eq-gr-comoving-distance} is given by

\begin{align} \chi & = {c\over H_0} \int_{a}^{1}{\mathrm{d}a' \over a'^2\,E(a')}\\ & = {c\over H_0} \int_{a}^{1}{\mathrm{d}a' \over a'^2}{1 \over \sqrt{\Omega_{0,m}\,a'^{-3} + \Omega_{0,\Lambda}+\Omega_{0,r}\,a'^{-4}+\Omega_{0,k}\,a'^{-2}}}\\ & = {c\over H_0} \int_{0}^z\mathrm{d}z'\,{1 \over \sqrt{\Omega_{0,m}\,(1+z')^{3} + \Omega_{0,\Lambda}+\Omega_{0,r}\,(1+z')^{4}+\Omega_{0,k}\,(1+z')^{2}}}\,,\label{eq-gr-comovingdist-explicit} \end{align}

where we have written the integral as an integral over either scale factor or redshift, using \(a = 1/(1+z)\) so that \(\mathrm{d}a/a^2 = -\mathrm{d}z\). The limits run from the emitting object at scale factor \(a\) to the observer at \(a=1\) (\(z=0\)), so that \(\chi\) is positive for \(a < 1\). The angular diameter distance is then

\begin{equation}\label{eq-gr-angulardist-explicit} D_A = a \times \begin{cases} {c \over H_0\sqrt{|\Omega_{0,k}|}}\,\sin\left(\chi\,{H_0\sqrt{|\Omega_{0,k}|}\over c}\right)\,, & k =+1\\ \chi\,, & k = 0\\ {c \over H_0\sqrt{|\Omega_{0,k}|}}\,\sinh\left(\chi\,{H_0\sqrt{|\Omega_{0,k}|}\over c}\right)\,, & k = -1\end{cases}\,. \end{equation}
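Equations \eqref{eq-gr-comovingdist-explicit} and \eqref{eq-gr-angulardist-explicit} are straightforward to evaluate numerically. The following is a minimal Python sketch under several stated assumptions: the Hubble constant \(H_0 = 67.66\,\mathrm{km\,s}^{-1}\,\mathrm{Mpc}^{-1}\) is an assumed value that is not given in this section, the function names are hypothetical, the integral is done with a simple composite Simpson rule rather than a library integrator, and the sign convention for curvature is the one implied by the \(+\Omega_{0,k}\,a^{-2}\) term in \(E(a)\), in which \(\Omega_{0,k} < 0\) is the closed (\(k=+1\)) case.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def E(a, om=0.3111, ol=0.6889, orad=9e-5, ok=0.0):
    """Dimensionless Hubble rate H(a)/H_0 for the given density parameters."""
    return math.sqrt(om * a**-3 + ol + orad * a**-4 + ok * a**-2)


def simpson(f, lo, hi, n=1000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def comoving_distance(z, h0=67.66, **omegas):
    """Comoving distance chi in Mpc; h0 (km/s/Mpc) is an assumed value."""
    integral = simpson(lambda zp: 1.0 / E(1.0 / (1.0 + zp), **omegas), 0.0, z)
    return C_KM_S / h0 * integral


def angular_diameter_distance(z, h0=67.66, **omegas):
    """D_A in Mpc for an observer at a = 1 and an object at redshift z."""
    chi = comoving_distance(z, h0=h0, **omegas)
    ok = omegas.get("ok", 0.0)
    a = 1.0 / (1.0 + z)
    if ok == 0.0:
        r = chi
    else:
        # Curvature radius R_0 = c / (H_0 sqrt(|Omega_k|)).
        r0 = C_KM_S / (h0 * math.sqrt(abs(ok)))
        # Assumed convention: ok < 0 is closed (k = +1), ok > 0 is open (k = -1).
        r = r0 * math.sin(chi / r0) if ok < 0 else r0 * math.sinh(chi / r0)
    return a * r


# Usage: angle subtended by a 10 kpc (0.01 Mpc) object at z = 1.
d_a = angular_diameter_distance(1.0)
theta_rad = 0.01 / d_a
print(f"D_A(z=1) = {d_a:.0f} Mpc, theta = {math.degrees(theta_rad) * 3600:.2f} arcsec")
```

A useful sanity check is the Einstein–de Sitter case (`om=1.0, ol=0.0, orad=0.0`), for which the integral has the closed form \(\chi = (2c/H_0)\,[1-(1+z)^{-1/2}]\).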

To compute the angular diameter distance between two scale factors \(a_1\) and \(a_2\) (or redshifts \(z_1\) and \(z_2\)), observing an object at \(a_2\) from \(a_1\), compute the integral in Equation \eqref{eq-gr-comovingdist-explicit} between these two scale factors and replace the prefactor \(a\) in Equation \eqref{eq-gr-angulardist-explicit} by \(a_2\), the scale factor of the object, because the comoving distance from Equation \eqref{eq-gr-comovingdist-explicit} remains expressed in present-day (\(a=1\)) units.

Dynamics and Astrophysics of Galaxies

By Jo Bovy
