# 16. Gravitational lensing¶

So far in this book, we have only used the motions of stars, gas, and entire stellar clusters or galaxies to infer the mass distributions of galactic systems. But there are a variety of other methods in use to determine stellar and total masses of galaxies and one of the most useful, assumption-free, and frankly fascinating ones is *gravitational lensing*. The field of gravitational lensing is built on the fact that light in a gravitational field gets deflected from its otherwise straight
trajectory and that mass distributions therefore act like an optical lens on light. Because the amount of deflection light experiences as well as the more complex lensing phenomenology depends only on the instantaneous distribution of total mass, gravitational lensing is a valuable tool to determine the mass distribution within galaxies, clusters of galaxies, and of the Universe as a whole. A taste of the wondrous phenomena that gravitational lensing gives rise to is given by the following two
images: the left image shows the first discovered gravitational lens, the “twin quasar” QSO 0957+561 (Walsh et al. 1979) and the right image is the quadruply-imaged quasar RXJ1131-1231, whose four quasar images are embedded in an Einstein ring resulting from the lensing of the quasar’s host galaxy (Sluse et al. 2003).

(Credit: left: QSO 0957+561, the “Twin quasar” [the two bright points straddling the galaxy in the bottom-left quadrant of the image]: NASA/ESA Hubble Space telescope; right: RXJ1131-1231: NASA/ESA Hubble Space telescope, and Suyu et al. 2017)

Soldner (1804) first discussed the deflection of light by massive objects. He considered light to be a particle on a hyperbolic orbit around a central mass (e.g., the Sun); because the mass \(m\) of a particle does not affect its orbit in a potential with mass \(M\) as long as \(m \ll M\), the mass of the particle drops out and the deflection angle for light can be estimated. This Newtonian deflection can be easily computed using the method that we used in Chapter 6.1
to calculate the velocity change \(\delta v_y\) of a particle moving along the \(x\) axis that passes by a massive body with mass \(M\) with impact parameter \(b\): the deflection is \(\hat{\alpha} = \delta v_y / v = 2GM/(bc^2)\) for a particle moving at the speed of light \(v=c\) (Equation 6.5). This calculation assumes that gravity bends light and there was no reason to believe that
this was the case in the Newtonian theory of light. Einstein (1911) demonstrated from the equivalence principle, which would form the foundation of the general theory of relativity (GR), that gravitational fields have to bend light; working without the full GR field and geodesic equations, he obtained the same result \(\hat{\alpha} = 2GM/(bc^2)\). Soldner (1804) ends his paper by stating “Hopefully no one finds it problematic, that I treat a light ray almost as a ponderable body”
(translation by Wikisource). And indeed, with the full set of GR equations, Einstein showed that this *was* problematic: GR in fact predicts a deflection that is twice as big: \(\hat{\alpha} = 4GM/(bc^2)\) in case of a central mass (Einstein 1916; see below).

While Einstein’s general theory of relativity had already successfully postdicted the anomalous precession of Mercury (which amounts to \(43''\); Le Verrier 1859; Newcomb 1882) as resulting from a first-order correction to the gravitational force coming from the Sun, Einstein’s prediction that light deflection in GR is twice the Newtonian prediction allowed for the first real test of GR. For a light ray grazing the surface of the Sun, the Newtonian and GR predictions are respectively \(0''.875\) and \(1''.75\), predictions that could be distinguished by observations at the time. The eclipse of 1919 presented the first opportunity to test GR in this way (Eddington 1919) and, as is well known now, the observations confirmed the GR prediction, marking the triumph of GR over Newtonian gravity (Dyson et al. 1920; modern measurements confirm the GR prediction and rule out the Newtonian deflection by \(\gtrsim 600\) times the error, e.g., Lebach et al. 1995).

With the fact that gravity bends light rays now established, astrophysicists started considering additional effects that could result from this action. Eddington (1920) and Chwolson (1924) pointed out that gravitational lensing could result in the light from distant objects having multiple light paths towards us, which would give rise to multiple images separated by small angles, so-called image splittings. Einstein (1936) considered the gravitational lensing effect of a star on a background star, but concluded that the micro-arcsecond image splittings are too small to be observed. But the first to propose that gravitational lensing could become the essential tool in extragalactic astronomy that it has become in the last decades was Zwicky (1937a). Based on his determination that the dynamical mass of the Coma cluster estimated using the virial theorem was 100 times larger than previously assumed based on the light coming from Coma-cluster galaxies (Zwicky 1933), he found that galaxies are massive enough to produce image splittings of background sources with separations \(\approx 1''\). Zwicky (1937b) further calculated that the probability of this lensing effect is \(\approx 1\%\), a prediction that has held up remarkably well over the years: gravitational lensing by galaxies is therefore a rare, but not impossible phenomenon to observe.

Nevertheless, it took another few decades to observe the first gravitational lens, largely due to the lack of bright objects that can be seen over the cosmological scales necessary to have a good chance of experiencing gravitational lensing. The discovery of quasars as bright extragalactic sources in the 1960s, however, cleared the path towards discovery, because quasars are both bright enough that they could be seen to cosmological distances, and rare enough that seeing identical quasars within
a few arcseconds is too much of a coincidence to explain without gravitational lensing. The first gravitationally-lensed quasar was discovered by Walsh et al. (1979) and is known as the “twin quasar” (QSO 0957+561). It was soon followed by other multiply-imaged quasars, including some with more than two images (e.g., Weymann et al. 1980; Young et al. 1981; Huchra et al. 1985), the discovery of background
galaxies lensed into arcs by clusters of galaxies (Lynds & Petrosian 1989; Soucail et al. 1987), and the discovery of Einstein rings formed by the gravitational lensing of compact radio galaxies (Hewitt et al. 1988). Today, hundreds of lensed quasars and galaxies are known, with large numbers added recently from large photometric and spectroscopic surveys (e.g., Canameras et al. 2020; Huang et al. 2021). The LSST survey on the Vera Rubin Observatory should soon increase this to \(\approx 10,000\)
lensed quasars and other upcoming surveys should lead to similar increases in the near future. Multiple or strongly-deformed images are the hallmark of *strong lensing*, but it was also quickly realised that the smaller effect from lensing further from the centers of massive galaxies and clusters could be detected statistically through correlated changes in the shapes of background galaxies. This regime is known as
*weak lensing* and it was first discovered in massive clusters acting as lenses of a population of faint, blue background galaxies (Tyson et al. 1990). Because weak lensing can probe the mass distribution far from galaxies’ centers, it is especially useful for measuring the total dark matter masses of galaxies (e.g., Brainerd et al. 1996). Similarly, the combined effect of *all* intervening mass in the Universe causes a general distortion of observed galaxies that can be used to determine the
energy content of the Universe (Kristian & Sachs 1966). This *cosmic shear* effect has become one of the major cosmological probes since its first measurement (Wittman et al. 2000; Abbott et al. 2018). It is clear that gravitational lensing has flourished into a large area of extragalactic astronomy with many exciting discoveries and applications yet to come.

Gravitational lensing has many applications, some of which we will discuss in this and the next chapter. It can be used to study the mass distribution of the lens in a manner that is in many ways orthogonal to the usual dynamical methods (which typically assume equilibrium, which gravitational lensing does not require); both the large-scale (dark matter) and small-scale (stellar) distribution can be studied using different lensing observables. That the lensing effect also includes an overall magnification of the lensed source means that we can use gravitational lensing to study otherwise too faint sources, both in our own Galaxy (e.g., Bensby et al. 2013) and at cosmological distances (Zheng et al. 2012). But gravitational lensing is not just a static phenomenon and the time dependence of lensing also has important applications. For example, the gravitational time delay that accompanies lensing (see below) allows lensing to measure the Hubble constant (Refsdal 1964) and the micro-arcsecond lensing (“microlensing”) produced by moving star lenses can be used to study the distribution of stars and compact objects in our own Galaxy (Paczynski 1986b) and in external lens galaxies (Chang & Refsdal 1979).

In this chapter, we give an introduction to the mathematical framework that describes gravitational lensing in its various forms. We discuss in some detail how gravitational lensing is used to constrain the mass distributions of galaxies and clusters of galaxies, but discuss results derived from gravitational lensing on the mass distribution and structure of galaxies in more detail in the next chapter. While the cosmic-shear lensing effect of the entire large-scale mass distribution of the Universe has become an important cosmological probe, we focus on lensing by individidual stars, galaxies, and clusters of galaxies here and do not discuss the somewhat different formalism necessary to study cosmic shear.

## 16.1. The lensing equation¶

### 16.1.1. The deflection angle¶

As discussed above, the study of the deflection of light in gravitational fields is a phenomenon that requires the general theory of relativity. In the limit of weak gravitational fields—\(|\Phi/c^2| \ll 1\) for \(\Phi \rightarrow 0\) as \(r\rightarrow \infty\), always the case on galactic scales—luckily for us, the GR calculation of the deflection experienced by a light ray traversing a gravitational potential \(\Phi\) is simply given by

where \(\hat{\boldsymbol{\alpha}}\) is the two-dimensional deflection angle perpendicular to the propagation direction, \(\nabla_\perp f= \nabla f - k^{-2}\,(\vec{k}\cdot \nabla f)\,\vec{k}\) is the gradient perpendicular to the unperturbed photon direction \(\vec{k}\), and the integral is over the unperturbed trajectory. This equation is derived in detail in Appendix C.3 and it is Equation (C.115) there.

The simplest application of Equation \eqref{eq-gravlens-light-bend-alpha} is to the case of light deflection by a point mass potential \(\Phi = -GM/r\). This describes, for example, the deflection of light grazing the surface of the Sun as measured in the 1919 eclipse. In the formalism below, we define the coordinate system such that the unperturbed light ray travels along the \(z\) direction and the vector \(\vec{x} = (x,y)\) describes the plane perpendicular to this direction. Assuming that the light ray originates far from the point mass at \(z=-\infty\) and is also observed far from the point mass at \(z=+\infty\), we then have that

where \(b = \sqrt{x^2+y^2}\) and \(\hat{\vec{x}}\) is the unit vector pointing towards \((x,y)\). Thus, the deflection points towards \(\vec{x}\): we observe a background object further from the point-mass at \((x,y) = (0,0)\) than it would be without the gravitational light deflection. The magnitude of \(\hat{\boldsymbol{\alpha}}\) for \(M=M_\odot\) and \(b = R_\odot\) is the value predicted by Einstein (1916) and confirmed by Dyson et al. (1920):

```
[3]:
```

```
from astropy.constants import G, c
print((4.*G*u.Msun/u.Rsun/c**2.)\
.to(u.arcsec,equivalencies=u.dimensionless_angles()))
```

```
1.751190325559984 arcsec
```

From the calculation above, it is immediately clear that if \(\Phi(x,y,z)\) does not depend on the angle \(\phi\) in the \((x,y)\) plane [that is, if \(\Phi(x,y,z) \equiv \Phi(R,z)\)], then \(\hat{\boldsymbol{\alpha}} \parallel \vec{x}\): *for axially-symmetric lenses, the deflection is in the direction pointing from the lens’ center to the object* and is given by

where \(\partial \Phi / \partial R\) is the derivative of the gravitational potential with respect to the cylindrical radius in the \((x,y)\) plane. If the potential is spherically symmetric, we can also compute this as \(\hat{\boldsymbol{\alpha}} = {2\over c^2}\,\vec{x}\int_{-\infty}^{+\infty} \mathrm{d} z\, {1\over r}{\partial \Phi \over \partial r}\).

Galaxies are extended objects and for the purpose of lensing are not well-modeled as point masses. As discussed in Chapter 3.4.5, because galaxies are observed to have flat rotation curves (Chapter 9), a simple one-parameter model for a galaxy is that of a logarithmic potential \(\Phi = v_c^2 \ln r\) or, equivalently, the potential of the singular isothermal sphere \(\Phi = 2\sigma^2 \ln r\) (see Chapter 6.6.2) ; in the context of lensing, the logarithmic potential is typically referred to as the singular isothermal sphere. For this potential, Equation \eqref{eq-gravlens-light-bend-alpha-axi} gives a deflection of

where in the last line we have also written the result in terms of the velocity dispersion of the singular isothermal sphere. Massive galaxies like the Milky Way or elliptical galaxies have \(\sigma \approx 200\, \mathrm{km\,s}^{-1}\) and therefore give rise to deflections with typical sizes of \(\approx 1''\):

```
[5]:
```

```
print((4.*numpy.pi*(200.*u.km/u.s/c)**2.)\
.to(u.arcsec,equivalencies=u.dimensionless_angles()))
```

```
1.1535955781163916 arcsec
```

Such angular deflections are at the edge of the range that large ground-based telescopes can resolve; space telescopes such as the Hubble Space Telescope can resolve these with relative ease. Note that the deflection angle for the singular isothermal sphere does not depend on the position of the source!

Calculating deflection angles using Equation \eqref{eq-gravlens-light-bend-alpha} is simple for any gravitational potential and we implement the general expression in the following function

```
[6]:
```

```
from scipy import integrate
from astropy.constants import c
from galpy.potential import evaluateRforces, evaluatephiforces
def deflection(Pot,R,phi=0.):
# Some gymnastics to deal with units
if isinstance(R,u.Quantity):
R= (R/(Pot[0]._ro if isinstance(Pot,list) else Pot._ro)/u.kpc)\
.to_value(u.dimensionless_unscaled)
cp,sp= numpy.cos(phi), numpy.sin(phi)
int_Rforce= integrate.quad(lambda z: evaluateRforces(Pot,R,z,phi=phi,
use_physical=False),
-numpy.inf,numpy.inf)[0]
int_phiforce= integrate.quad(lambda z: evaluatephiforces(Pot,R,z,phi=phi,
use_physical=False),
-numpy.inf,numpy.inf)[0]
# Result includes some gymnastics to deal with units
return (-2./c**2.*numpy.array([int_Rforce*cp-int_phiforce*sp/R,
int_Rforce*sp+int_phiforce*cp/R])\
*(Pot[0]._vo**2. if isinstance(Pot,list) else Pot._vo**2.)*u.km**2/u.s**2)\
.to(u.arcsec,equivalencies=u.dimensionless_angles())
```

For example, for an NFW halo with virial mass \(M_\mathrm{vir} = 10^{13}\,M_\odot\)—a typical mass for an elliptical lens—and a concentration of 6.37 following the Dutton & Macciò (2014) relation as in Chapter 3.4.6, we find the following deflection angle as a function of separation:

```
[68]:
```

```
from galpy.potential import NFWPotential
np= NFWPotential(mvir=10,conc=6.37,
H=67.1,overdens=200.,wrtcrit=True,
ro=8.,vo=220.)
figsize(7,4)
Rs= numpy.linspace(0.01*u.kpc,np.rvir(H=67.1,overdens=200.,wrtcrit=True),101)
plot(Rs,u.Quantity([deflection(np,R)[0].to(u.arcsec) for R in Rs]))
xlabel(r'$R\,(\mathrm{kpc})$')
ylabel(r'$\widehat{\boldsymbol{\alpha}}\,(\mathrm{arcsec})$');
```

The value of the deflection angle is similar to that for the singular-isothermal-sphere example above, because the typical velocity dispersion of a \(M_\mathrm{vir} = 10^{13}\,M_\odot\) is \(\approx 200\,\mathrm{km\,s}^{-1}\).

### 16.1.2. The lensing equation¶

Knowledge of the deflection angle \(\hat{\boldsymbol{\alpha}}\) allows us to relate the unperturbed position on the sky of a source to the observed, lensed position. Assuming that light only travels through a single gravitational field (e.g., that of one star or one galaxy) strong enough to observably deflect it, the geometry looks as in the following figure:

```
[434]:
```

```
def angle_plot(line1,line2,offset=1,color='k',origin=(0, 0),
len_x_axis=1,len_y_axis=1):
# Draw angle arc between two lines
# Edited from https://stackoverflow.com/a/25228427 and
# https://gist.github.com/battlecook/0c0bdb7097ec7c8fa160e342b1bf51ef
from matplotlib.patches import Arc
# Angle between line1 and x-axis
l1xy= line1.get_xydata()
slope1= (l1xy[1][1]-l1xy[0][1])/float(l1xy[1][0]-l1xy[0][0])
angle1= numpy.degrees(numpy.arctan(slope1))
# Angle between line2 and x-axis
l2xy= line2.get_xydata()
slope2= (l2xy[1][1]-l2xy[0][1])/float(l2xy[1][0]-l2xy[0][0])
angle2= numpy.degrees(numpy.arctan(slope2))
# Angle between them
theta1= numpy.amin([angle1,angle2])
theta2= numpy.amax([angle1,angle2])
angle= theta2-theta1
return Arc(origin,len_x_axis*offset,len_y_axis*offset,0,theta1,theta2,
color=color,label=str(angle)+u"\u00b0")
def line_between_points(o1,o2,*args,**kwargs):
return plot([o1[0],o2[0]],[o1[1],o2[1]],*args,**kwargs)
def extend_line(o1,o2,xnew,*args,**kwargs):
"""Extend the line between o1 and o2 to xnew"""
slope= (o2[1]-o1[1])/(o2[0]-o1[0])
ynew= slope*(xnew-o1[0])+o1[1]
return line_between_points(o1,[xnew,ynew],*args,**kwargs)
figsize(11,5.5)
# Source, lens, observer position
source= [0.,.5]
lens= [1.1,.75]
observer= [2.5,0.]
line_os= line_between_points(source,observer,'k--')[0]
line_sl= line_between_points(source,lens,'k-')[0]
line_ol= line_between_points(observer,lens,'k-')[0]
gca().add_patch(angle_plot(line_ol,line_os,-1.5,origin=observer))
line_beta= line_between_points([0,0],observer,'k:',lw=2.)[0]
gca().add_patch(angle_plot(line_beta,line_os,-2.,origin=observer))
gca().add_patch(angle_plot(line_beta,line_ol,-.75,origin=observer))
line_ex= extend_line(lens,observer,0.65,'k:',lw=2.)[0]
gca().add_patch(angle_plot(line_ex,line_sl,-0.5,origin=lens))
# Angle labels
galpy_plot.text(0.8,0.8,r'$\hat{\alpha}$',ha='center',va='center',fontsize=18.)
galpy_plot.text(1.74,0.3,r'$\alpha$',ha='center',va='center',fontsize=18.)
galpy_plot.text(1.45,0.1,r'$\beta$',ha='center',va='center',fontsize=18.)
galpy_plot.text(2.1,0.13,r'$\theta$',ha='center',va='center',fontsize=18.)
# Some dots
plot([source[0],lens[0],observer[0],lens[0]],[source[1],lens[1],observer[1],0.],
'ko',ms=10.)
# Impact parameter
plt.annotate(s='',xy=(lens[0],0.02),xytext=(lens[0],lens[1]-0.02),
arrowprops=dict(arrowstyle='<|-|>'))
galpy_plot.text(lens[0],lens[1]/2.,r'$\xi$',
ha='center',va='center',fontsize=18.,backgroundcolor='w')
# Distance arrows
plt.annotate(s='',xy=(source[0],-0.1),xytext=(lens[0],-0.1),
arrowprops=dict(arrowstyle='<|-|>'))
galpy_plot.text(0.5*(source[0]+lens[0]),-0.1,r'$D_{\mathrm{LS}}$',
ha='center',va='center',fontsize=18.,backgroundcolor='w')
plt.annotate(s='',xy=(observer[0],-0.1),xytext=(lens[0],-0.1),
arrowprops=dict(arrowstyle='<|-|>'))
galpy_plot.text(0.5*(observer[0]+lens[0]),-0.1,r'$D_{\mathrm{L}}$',
ha='center',va='center',fontsize=18.,backgroundcolor='w')
plt.annotate(s='',xy=(source[0],-0.175),xytext=(observer[0],-0.175),
arrowprops=dict(arrowstyle='<|-|>'))
galpy_plot.text(0.5*(source[0]+observer[0]),-0.175,r'$D_{\mathrm{S}}$',
ha='center',va='center',fontsize=18.,backgroundcolor='w')
# Observer, source, lens annotations
galpy_plot.text(source[0],source[1]+0.05,r'$\mathrm{Source}$',
ha='center',va='bottom',fontsize=18.)
galpy_plot.text(lens[0]-0.1,0.05,r'$\mathrm{Lens}$',
ha='center',va='bottom',fontsize=18.)
galpy_plot.text(observer[0],observer[1]+0.1,r'$\mathrm{Observer}$',
ha='center',va='bottom',fontsize=18.)
xlim(-0.28,2.6)
ylim(-.3,1.1)
gca().axis('off');
```

In this figure, the source is on the left, the lens in the middle, and the observer is on the right. In all galactic applications of gravitational lensing, the extent of the lens is much smaller than the distances between the source, lens, and observer. We can therefore approximate the lens as being infinitely thin in the direction perpendicular to the light ray. This is the **thin screen approximation**. We can then approximate the perturbed
light path in the manner of the above figure, with light propagating in a straight line until it reaches the lens at a closest approach \(\boldsymbol{\xi}\) where its direction is changed by the deflection angle; after this, the ray again propagates on a straight line until it is observed. In the absence of the lens, we would then observe the source at an angular position \(\boldsymbol{\beta}\) on the **source plane**, but because of the deflection,
we actually observe it at \(\boldsymbol{\theta}\). The figure above demonstrates how these angles are related in a two dimensional slice of the full three-dimensional picture. Because the angles involved are small, the relation between the angles is

(The figure above shows that this equation holds for both the \(x\) and \(y\) components of the angles and therefore the vector form follows). This is the **lensing equation** (Bourassa & Kantowski 1975). Beause of the way the relative distances appear in this equation, it is common to define the **reduced deflection angle** \(\boldsymbol{\alpha}\)
as

where \(\hat{\boldsymbol{\alpha}}\) is a function of \(\boldsymbol{\theta}\) through \(\boldsymbol{\xi} = D_\mathrm{L}\boldsymbol{\theta}\). Then, the lensing equation can be written as

The distances that appear in the lensing equation are the regular Euclidean distance in the case of lensing within our own Galaxy (or within the local Universe), but for cosmological lenses (such as a high-redshift background quasar lensed by an intermediate-redshift galaxy), the distances that appear are *angular diameter distances*. The angular diameter distance depends on the energy and matter content of the Universe and on its spatial
curvature. The angular diameter distance is discussed in more detail in Appendix C.4.3. Note that on cosmological scales, it is *not* the case that \(D_{\mathrm{S}} = D_{\mathrm{LS}} + D_{\mathrm{L}}\).

For strong-lensing systems, it is generally a good assumption that there is only a single deflector along the line of sight (*the* lens), but in the weak-lensing regime (see below) or for certain more detailed applications in strong lenses, it is necessary to consider the effect of additional deflectors in front of or behind the main lens. In that case, we have to deal with **multi-plane lensing**. The lensing equation in this case can be
straightforwardly derived by looking at the single-plane lensing equation \eqref{eq-gravlens-lensing-eq} as mapping an image onto its source location; in multi-plane lensing the source obtained by a single application of Equation \eqref{eq-gravlens-lensing-eq} applied to the closest lens to the observer, becomes the image from the second-closest lens and so on. Thus, the lensing equation is replaced by a set of equations
(e.g., Blandford & Narayan 1986; Kovner 1987)

where \(D_k\) is the (angular diameter) distance to lens \(k\) and \(D_{\mathrm{kS}}\) that between lens \(k\) and the source; the index \(i\) runs from \(i=1\) to \(i=N\) for \(N\) lenses and we set \(\boldsymbol{\beta}_1 = \boldsymbol{\theta}\) (thus, the index runs backwards and the source is at \(\boldsymbol{\beta}_{N+1}\)). While this equation looks simple, solving it is much more difficult than solving the single-plane lensing equation. In this chapter, we will only consider the single-plane lensing equation, but a large literature on solving the multi-plane lensing equation in various regimes exists (e.g., Keeton et al. 1997; Schneider 1997; Birrer et al. 2017; Schneider 2019).

### 16.1.3. Examples: point mass and singular isothermal sphere¶

In the case of a point-mass lens, we have from Equation \eqref{eq-gravlens-deflect-point} that

where we have substituted \(b = D_\mathrm{L}|\boldsymbol{\theta}|\) and \(\hat{\vec{x}} = \boldsymbol{\theta}/|\boldsymbol{\theta}|\). From the discussion above, we know that for an axially-symmetric lens, the deflection is along the line connecting \(\boldsymbol{\beta}\) and the center of the lens, and the lensing equation for axially-symmetric lenses therefore reduces to a one-dimensional equation for the position \(\theta\) on this line (at least when \(\boldsymbol{\beta} \neq 0\)).

To solve this equation, we first define the **Einstein angle** \(\theta_\mathrm{E}\) as

because then we have that

This has the solutions

When \(\boldsymbol{\beta} = 0\), this gives \(\theta = \theta_\mathrm{E}\). However, in this case, our reduction of the problem to a one-dimensional equation does not hold, and in fact the entire ring \(|\boldsymbol{\theta}| = \theta_\mathrm{E}\) solves the lensing equation. This is an **Einstein ring** with \(\theta_\mathrm{E}\) in this context known as the **Einstein radius**: a source directly
behind a point-like object gets lensed into a ring.

We can similarly solve the lensing equation for lensing by a singular isothermal sphere. From Equation \eqref{eq-gravlens-deflect-sis}, we have that

Again, as long as \(\boldsymbol{\beta} \neq 0\), the lensing equation reduces to one-dimensional form and now

In this case, we can define the Einstein angle as

Equation \eqref{eq-gravlens-lensing-eq-reduced-sis} has a single solution \(\theta = \beta + \mathrm{sign}(\beta)\,\theta_\mathrm{E}\) if \(|\beta| \geq \theta_\mathrm{E}\), but when \(|\beta| < \theta_\mathrm{E}\) it has two solutions \(\theta = \beta \pm\theta_\mathrm{E}\). When \(\boldsymbol{\beta} = 0\), we again find that the entire ring \(|\boldsymbol{\theta}| = \theta_\mathrm{E}\) satisfies the lensing equation and we again get an Einstein ring.

The point-mass and singular-isothermal-sphere solutions illustrate several important features of gravitational lensing. The first is that gravitational lensing can lead to the appearance of multiple images: the point-mass lens always has two solutions corresponding to two apparent positions for every source, while the singular isothermal sphere multiply-images sources that fall within its Einstein ring. The second is that gravitational lensing can lens a compact source into a one-dimensional Einstein ring if the source and the lens are aligned. More generally, the fact that the lensing equation for axially-symmetric lenses reduces to a one-dimensional equation (except when an Einstein ring forms) means that quadrupolar, four-image configurations (“quads”) such as the “Einstein cross” (Huchra et al. 1985) cannot result from axially-symmetric lenses. We also see that every source in the sky is technically lensed by a single point-like lens or by a single singular isothermal sphere: No matter how distant, the gravitational field of a singular isothermal sphere moves the image position by \(\theta_\mathrm{E}\). However, at large distances from the centers of galaxies or clusters of galaxies, the density distribution drops off faster and a point-mass potential is a better model. At large separations \(|\beta| \gg |\theta_\mathrm{E}|\), Equation \eqref{eq-gravlens-lensing-eq-sol-pm} shows that the two images approach \(\theta_1 \rightarrow \beta\) and \(\theta_2 \rightarrow 0\) and one of the images therefore overlaps with the lens. Moreover, as we will see in an exercise, this image is also strongly demagnified and therefore almost impossible to observe.

## 16.2. The lensing and Fermat potentials¶

### 16.2.1. The lensing potential¶

While the formalism introduced in the previous section in principle suffices to fully describe lensing of non-variable sources (variable sources have additional lensing effects due to the time delay that accompanies lensing), much understanding is gained by taking a step back and presenting the alternative formalism of lensing contained in the lensing and Fermat potentials. This formalism starts from the observation that we can swap the order of the integral and the gradient in the calculation of the reduced deflection angle obtained from combining Equations \eqref{eq-gravlens-light-bend-alpha} and \eqref{eq-gravlens-reduced-deflect}. As usual defining our coordinates such that the unperturbed light ray propagates along the \(z\) axis, we can write the reduced deflection angle as

where \(D_{\mathrm{L}}\boldsymbol{\theta} = (x,y)\) and we have replaced the spatial gradient with the equivalent angular gradient. The (dimensionless) function \(\psi(\boldsymbol{\theta})\) that is defined by this equation is

The function \(\psi(\boldsymbol{\theta})\) is therefore a scalar function that contains all of the information necessary to compute the two-dimensional deflection on the sky. From now on, we’ll typically drop the argument \(\boldsymbol{\theta}\), but remember that \(\psi(\boldsymbol{\theta})\) is a function of angle. It turns out to be related to the lens mass distribution in a simple manner, as can be seen by taking the two-dimensional Laplacian of \(\psi(\boldsymbol{\theta})\)

where we converted the angular Laplacian \(\nabla^2_{\boldsymbol{\theta}}\) to the spatial one \(\nabla^2_{\boldsymbol{\xi}}\), we used the Laplacian in polar coordinates from Equation (A.25) to work out \(\nabla^2_{\boldsymbol{\xi}}\) in polar coordinates, and we used the Poisson equation in cylindrical coordinates (Equation 8.25) to replace the two-dimensional Laplacian with the density. The integral of
the second term in the last step is zero because we assume that \(\partial \Phi / \partial z \rightarrow 0\) as \(|z| \rightarrow 0\) in all galactic applications of lensing (this assumption is also used to derive the deflection angle in GR). We thus see that the relation between the Laplacian of \(\psi\) and the surface density looks like a two-dimensional version of the Poisson equation and \(\psi\) is therefore known
as the **lensing potential** or the **deflection potential**. For axially-symmetric lens mass distributions, the lensing potential is axisymmetric, that is, it does not depend on the polar angle. Equation \eqref{eq-gravlens-lensingpot-laplacian-1} is generally written as

where

is the **convergence** or the dimensionless surface density and

is the **critical surface density**. The critical surface density plays a significant role in gravitational lensing that we will discuss below.

For any given gravitational potential, we can compute the lensing potential using Equation \eqref{eq-gravlens-lensingpot} and we implement this in the following function:

```
[7]:
```

```
from scipy import integrate
from astropy.cosmology import Planck18 as cosmo
from astropy.constants import c
from galpy.potential import evaluatePotentials
def lensing_potential(Pot,theta,zlens,zsource,zmax=numpy.inf):
DL= cosmo.angular_diameter_distance(zlens)
cosmo_fac= cosmo.angular_diameter_distance_z1z2(zlens,zsource)\
/cosmo.angular_diameter_distance(zsource)/DL
# Some gymnastics to deal with units
R= (numpy.sqrt(theta[0]**2.+theta[1]**2.)*DL)\
.to(u.kpc,equivalencies=u.dimensionless_angles())
R= (R/(Pot[0]._ro if isinstance(Pot,list) else Pot._ro)/u.kpc)\
.to_value(u.dimensionless_unscaled)
phi= numpy.arctan2(theta[1],theta[0])
# Result includes more gymnastics to deal with units
return (cosmo_fac*2./c**2.
*integrate.quad(lambda z: evaluatePotentials(Pot,R,z,phi=phi,
use_physical=False),
-zmax,zmax)[0]
*(Pot[0]._ro*Pot[0]._vo**2. if isinstance(Pot,list)
else Pot._ro*Pot._vo**2.)*u.kpc*u.km**2/u.s**2)\
.to(u.arcsec**2,equivalencies=u.dimensionless_angles())
```

For the \(M_\mathrm{vir} = 10^{13}\,M_\odot\) NFW halo that we considered in the previous section, the lensing potential only depends on the distance \(|\boldsymbol{\theta}|\) from the center and is given by:

```
[142]:
```

```
from galpy.potential import NFWPotential
np= NFWPotential(mvir=10,conc=6.37,
H=67.1,overdens=200.,wrtcrit=True,
ro=8.,vo=220.)
zlens, zsource= 0.5, 1.
DL= cosmo.angular_diameter_distance(zlens)
thetas= numpy.linspace(-np.rvir(H=67.1,overdens=200.,wrtcrit=True)/DL,
np.rvir(H=67.1,overdens=200.,wrtcrit=True)/DL,101)\
.to(u.arcsec,equivalencies=u.dimensionless_angles())
plot(thetas,
u.Quantity([lensing_potential(np,u.Quantity([theta,0.*u.arcsec]),
zlens,zsource).to(u.arcsec**2)
for theta in thetas]))
xlabel(r'$|\boldsymbol{\theta}|\,(\mathrm{arcsec})$')
ylabel(r'$\psi\,(\mathrm{arcsec}^2)$');
```

Note that we express the lensing potential in units of arcseconds squared here for reasons that will soon become clear.

Because the lensing potential satisfies the two-dimensional Poisson equation, we can derive an expression for the lensing potential in terms of the surface density of the lens. Similar to how the solution of the Poisson equation for a point-mass density in three dimensions is \(-GM/r\) (see Chapter 3.1), the solution of the two-dimensional Poisson equation for a two-dimensional point (a wire in three dimensions) is \(\propto \ln R\). By decomposing the surface density into points, we can therefore obtain the lensing potential as

Because the reduced deflection angle is the gradient of \(\psi\), we can obtain it as

We can also derive an alternative expression to Equation \eqref{eq-gravlens-light-bend-alpha} for the deflection angle \(\hat{\boldsymbol{\alpha}}\)

where \(\boldsymbol{\xi} = D_L\,\boldsymbol{\theta}\) (see the diagram above). Thus, this discussion of the lensing potential makes it clear that the gravitational-lensing phenomenon is *entirely determined by the density integrated along the line of sight, the surface density* \(\Sigma\), *or equivalently, the convergence* \(\kappa\). For axially-symmetric lenses with \(\Sigma(R,\phi) \equiv \Sigma(R)\), these expressions simplify to

that is, the deflection angles are along the vector connecting the lens’ center to the point of closest approach with unit vectors \(\hat{\boldsymbol{\theta}}\) or \(\hat{\boldsymbol{\xi}}\) depending on whether we use angular or spatial coordinates. Similar to how the gravitational force of a spherically-symmetric mass distribution is fully determined by the enclosed mass, the deflection angle of an axially-symmetric lens is fully determined by the mass enclosed within the impact parameter.

While evaluating the lensing potential for a given mass distribution is straightforward using Equation \eqref{eq-gravlens-lenspot-from-kappa} or Equation \eqref{eq-gravlens-lensingpot-from-kappa-axially-symm}, for non-axially-symmetric lenses computing this double integral is generally computationally expensive, especially because lensing analyses of observational data often demand running through large numbers of lens mass distributions. This is a similar problem as we faced in Chapters 8 and 13 when calculating the gravitational potential for flattened or triaxial mass distributions. There are similar ways of dealing with this situation as we considered in those chapters. Like in Chapter 8.2, we can circumvent Equation \eqref{eq-gravlens-lenspot-from-kappa} altogether by simply positing a form for the lensing potential and computing the resulting convergence from Equation \eqref{eq-gravlens-2dpoisson} (e.g., Blandford & Kochanek 1987). This makes all operations on the lensing potential, such as computing the reduced deflection angle, or the magnification matrix defined below, simple. But it has the disadvantage that one cannot directly specify the convergence and that the resulting convergence may be unphysical. When introducing flattening in the lensing potential, for large values of the flattening, the surface density typically develops a dumbbell shape and it may become negative (similar to what happens for the flattened logarithmic potential; see Chapter 8.2.3). Another option is to work with flattened mass models for which the lensing potential, deflection angle, and other lensing quantities can be computed quickly when care is taken with the numerical implementation. This is, for example, the case for a convergence of the form

where \(q\) is the flattening. For \(t = 1\), this corresponds to a flattened singular-isothermal sphere. For this convergence, all lensing quantities can be computed using fast series approximations of the relevant integrals (e.g., Barkana 1998; Tessore & Metcalf 2015). It is also possible to expand the convergence into multipoles as we did in Chapter 13.3.1 for gravitational potentials, but the procedure for lensing is much simpler, because the potential and convergence are only two-dimensional.

### 16.2.2. The Fermat potential and the time delay¶

The lensing potential allows for an alternative statement of the lensing equation \eqref{eq-gravlens-reduced-deflect} by writing it as

because \(\nabla_{\boldsymbol{\theta}} \psi = \boldsymbol{\alpha}\). Thus images located at \(\boldsymbol{\theta}\) for a source located at \(\boldsymbol{\beta}\) form where the scalar function in the square brackets is extremized. This scalar function is known as the **Fermat potential** defined by

As an example, we compute the Fermat potential for an elliptical galaxy represented as a spherical perfect ellipsoid from Chapter 13.2.3 (\(b=c=1\)) with a mass of \(5\times 10^{11}\,M_\odot\) and a scale radius of \(a=4\,\mathrm{kpc}\). For three different source positions \(\boldsymbol{\beta}\), indicated by the dashed line, we can compute \(\tau(\boldsymbol{\theta};\boldsymbol{\beta})\) along the line connecting \(\boldsymbol{\beta}\) and \(\boldsymbol{\theta}\) where we know the images to lie for an axially-symmetric mass distribution:

```
[228]:
```

```
def fermat_potential(Pot,beta,theta,zlens,zsource,zmax=numpy.inf):
return 0.5*numpy.sum((beta-theta)**2.)\
-lensing_potential(Pot,theta,zlens,zsource,zmax=zmax)
def extrema(x,y):
# Find position of extrema in y
indx= ((y > numpy.roll(y,1))*(y > numpy.roll(y,-1)))\
+((y < numpy.roll(y,1))*(y < numpy.roll(y,-1)))
# Remove edges
ii= numpy.arange(len(x))
indx*= (ii != 0)*(ii != (len(x)-1))
return (x[indx],y[indx])
from galpy.potential import PerfectEllipsoidPotential
pep= PerfectEllipsoidPotential(amp=5*10.**11*u.Msun,a=4*u.kpc,b=1.,c=1.)
betas= [0.1,0.7,1.9]
figsize(11,4)
for ii,beta in enumerate(betas):
subplot(1,len(betas),ii+1)
if numpy.fabs(beta) < 0.5:
thetas= numpy.linspace(-1.5,1.5,30)*u.arcsec
elif numpy.fabs(beta) < 1.5:
thetas= numpy.linspace(-4.01,4.01,30)*u.arcsec
else:
thetas= numpy.linspace(-5.01,5.01,30)*u.arcsec
ploty= u.Quantity([fermat_potential(pep,u.Quantity([beta,0.])*u.arcsec,
u.Quantity([theta,0.*u.arcsec]),
zlens,zsource).to(u.arcsec**2)
for theta in thetas])
line= plot(thetas,ploty)
plot(*extrema(thetas,ploty),'o',color=line[0].get_color())
axvline(beta,color='k',ls='--')
xlabel(r'$|\boldsymbol{\theta}|\,(\mathrm{arcsec})$')
if ii == 0:
ylabel(r'$\tau\,(\mathrm{arcsec}^2)$');
```

We see that the Fermat potential has three extrema labeled with dots for a source position close to the center of the lensing potential, corresponding to three images. For source positions further from the center, only a single extremum (a minimum) is present, but as long as the source remains close to the center, the image position is significantly offset from the source position. As the source position gets further from the center, the \(|\boldsymbol{\theta}-\boldsymbol{\beta}|^2/2\) term in the Fermat potential starts to dominate and the single image is close to the source position.

The reason that the Fermat potential is called the *Fermat potential* is that it is related to the time delay experienced by light along its deflected trajectory. As discussed in Appendix C.3, in gravitational fields light is delayed compared to its unperturbed path because of two reasons: (i) the deflected path is longer than the unperturbed path, leading to a geometrical time delay \(\Delta t_\mathrm{geom}\); and (ii) time appears to slow down along
light’s trajectory in the presence of a gravitational field, which gives the gravitational time delay \(\Delta t_\mathrm{Shapiro}\) (first discussed by Shapiro 1964). In the weak gravitational fields of galactic gravitational lensing, we can approximate these as (i) the geometrical time delay being equal to the length of the deflected path divided by the speed of light \(c\) and (ii) the gravitational time delay along the unperturbed path. To compute the geometric time delay, we
can use the following extension of the basic lensing diagram:

```
[435]:
```

```
def angle_plot(line1,line2,offset=1,color='k',origin=(0, 0),
len_x_axis=1,len_y_axis=1,
dir1=1,dir2=1):
# Draw angle arc between two lines
# Edited from https://stackoverflow.com/a/25228427 and
# https://gist.github.com/battlecook/0c0bdb7097ec7c8fa160e342b1bf51ef
from matplotlib.patches import Arc
# Angle between line1 and x-axis
l1xy= line1.get_xydata()
slope1= (l1xy[1][1]-l1xy[0][1])/float(l1xy[1][0]-l1xy[0][0])
angle1= numpy.degrees(numpy.arctan(slope1))+90.*(dir1-1)
# Angle between line2 and x-axis
l2xy= line2.get_xydata()
slope2= (l2xy[1][1]-l2xy[0][1])/float(l2xy[1][0]-l2xy[0][0])
angle2= numpy.degrees(numpy.arctan(slope2))+90.*(dir2-1)
# Angle between them
theta1= numpy.amin([angle1,angle2])
theta2= numpy.amax([angle1,angle2])
angle= theta2-theta1
return Arc(origin,len_x_axis*offset,len_y_axis*offset,0,theta1,theta2,
color=color,label=str(angle)+u"\u00b0")
def line_between_points(o1,o2,*args,**kwargs):
return plot([o1[0],o2[0]],[o1[1],o2[1]],*args,**kwargs)
def extend_line(o1,o2,xnew,*args,**kwargs):
"""Extend the line between o1 and o2 to xnew"""
slope= (o2[1]-o1[1])/(o2[0]-o1[0])
ynew= slope*(xnew-o1[0])+o1[1]
return line_between_points(o1,[xnew,ynew],*args,**kwargs)
figsize(11,5.5)
# Source, lens, observer position
source= [0.,.5]
lens= [1.1,.75]
observer= [2.5,0.]
line_os= line_between_points(source,observer,'k--')[0]
line_sl= line_between_points(source,lens,'k-')[0]
line_ol= line_between_points(observer,lens,'k-')[0]
gca().add_patch(angle_plot(line_ol,line_os,-1.5,origin=observer))
line_beta= line_between_points([0,0],observer,'k:',lw=2.)[0]
gca().add_patch(angle_plot(line_beta,line_os,-2.,origin=observer))
gca().add_patch(angle_plot(line_beta,line_ol,-.75,origin=observer))
line_ex= extend_line(lens,observer,source[0],'k:',lw=2.)[0]
# Also compute the apparent position of the source, re-use 'extend-line' code
# We put the apparent source at the position on the sky and
# at the correct geometric time-delay
slope= (lens[1]-observer[1])/(lens[0]-observer[0])
apparent_x= 0.105
apparent= [apparent_x,slope*(apparent_x-observer[0])+observer[1]]
gca().add_patch(angle_plot(line_ex,line_sl,-0.5,origin=lens))
line_sa= line_between_points(source,apparent,'k--')[0]
# Now draw the line between the source and the position along the
# observer-apparent line that is at the same distance from the
# observer as the unperturbed source
# This position is set to make this the case
x_eql= 0.235
eql= [x_eql,slope*(x_eql-observer[0])+observer[1]]
line_se= line_between_points(source,eql,'k-.')[0]
gca().add_patch(angle_plot(line_os,line_se,0.5,origin=source))
gca().add_patch(angle_plot(line_os,line_se,0.55,origin=source))
from matplotlib.lines import Line2D
line_eo= Line2D([observer[0],eql[0]],[observer[1],eql[1]])
gca().add_patch(angle_plot(line_eo,line_se,0.5,origin=eql,dir2=-1))
gca().add_patch(angle_plot(line_eo,line_se,0.55,origin=eql,dir2=-1))
# Also add angles at source plane
gca().add_patch(angle_plot(line_sl,line_sa,0.2,origin=source))
gca().add_patch(angle_plot(line_sl,line_sa,0.22,origin=source))
gca().add_patch(angle_plot(line_sl,line_sa,0.24,origin=source))
gca().add_patch(angle_plot(line_ex,line_sa,0.2,origin=apparent,dir2=-1))
gca().add_patch(angle_plot(line_ex,line_sa,0.22,origin=apparent,dir2=-1))
gca().add_patch(angle_plot(line_ex,line_sa,0.24,origin=apparent,dir2=-1))
gca().add_patch(angle_plot(line_se,line_sa,0.4,origin=source))
# Time delay
plt.annotate(s='',xy=(apparent[0],apparent[1]+0.075),
xytext=(eql[0]+0.04,eql[1]+0.0475),
arrowprops=dict(arrowstyle='<|-|>'))
# Angle labels
galpy_plot.text(0.8,0.8,r'$\hat{\alpha}$',ha='center',va='center',fontsize=18.)
galpy_plot.text(1.74,0.3,r'$\alpha$',ha='center',va='center',fontsize=18.)
galpy_plot.text(1.45,0.1,r'$\beta$',ha='center',va='center',fontsize=18.)
galpy_plot.text(2.1,0.13,r'$\theta$',ha='center',va='center',fontsize=18.)
galpy_plot.text(-0.1,0.85,r'$\delta$',ha='center',va='center',fontsize=18.)
plot([-0.05,0.05],[0.825,0.725],'k-',lw=0.8)
# Some dots
plot([source[0],lens[0],observer[0],lens[0],apparent[0],eql[0]],
[source[1],lens[1],observer[1],0.,apparent[1],eql[1]],
'ko',ms=10.)
# Impact parameter
plt.annotate(s='',xy=(lens[0],0.02),xytext=(lens[0],lens[1]-0.02),
arrowprops=dict(arrowstyle='<|-|>'))
galpy_plot.text(lens[0],lens[1]/2.,r'$\xi$',
ha='center',va='center',fontsize=18.,backgroundcolor='w')
# Distance arrows
plt.annotate(s='',xy=(source[0],-0.1),xytext=(lens[0],-0.1),
arrowprops=dict(arrowstyle='<|-|>'))
galpy_plot.text(0.5*(source[0]+lens[0]),-0.1,r'$D_{\mathrm{LS}}$',
ha='center',va='center',fontsize=18.,backgroundcolor='w')
plt.annotate(s='',xy=(observer[0],-0.1),xytext=(lens[0],-0.1),
arrowprops=dict(arrowstyle='<|-|>'))
galpy_plot.text(0.5*(observer[0]+lens[0]),-0.1,r'$D_{\mathrm{L}}$',
ha='center',va='center',fontsize=18.,backgroundcolor='w')
plt.annotate(s='',xy=(source[0],-0.175),xytext=(observer[0],-0.175),
arrowprops=dict(arrowstyle='<|-|>'))
galpy_plot.text(0.5*(source[0]+observer[0]),-0.175,r'$D_{\mathrm{S}}$',
ha='center',va='center',fontsize=18.,backgroundcolor='w')
# Observer, source, lens annotations
galpy_plot.text(source[0],source[1]-0.15,r'$\mathrm{Source}$',
ha='center',va='bottom',fontsize=18.)
galpy_plot.text(lens[0]-0.1,0.05,r'$\mathrm{Lens}$',
ha='center',va='bottom',fontsize=18.)
galpy_plot.text(observer[0],observer[1]+0.1,r'$\mathrm{Observer}$',
ha='center',va='bottom',fontsize=18.)
galpy_plot.text(apparent[0]-0.1,apparent[1]-0.1,r'$\mathrm{A}$',
ha='center',va='bottom',fontsize=18.)
galpy_plot.text(eql[0]+0.15,eql[1]-0.05,r'$\mathrm{EQ}$',
ha='center',va='bottom',fontsize=18.)
xlim(-0.28,2.6)
ylim(-.3,1.5)
gca().axis('off');
```

Compared to the previous version, we have extended the line towards the apparent position of the source \(\boldsymbol{\theta}\) and added the following two points along that line: point ‘EQ’ that is at the same distance from the observer as the source along the *unperturbed* trajectory and point ‘A’ that is at the same distance from the observer as the source along the *perturbed* trajectory. The geometric difference in the path length is therefore the distance between ‘EQ’ and ‘A’. This
definition implies that the angles indicated by the double and triple arcs are pairwise equal. Working in a flat geometry and in the limit of small angles, the difference in path length is then equal to the distance between the source and ‘A’ multiplied by the small angle \(\delta\). Consideration of all the angles involved shows that \(\delta = (\hat{\alpha}-\alpha)/2 = \hat{\alpha}/2\times D_\mathrm{L}/D_{\mathrm{S}}\) and that the distance between the source and ‘A’ equals
\(\hat{\alpha}\,D_{\mathrm{LS}}\). The geometric time delay is therefore

We have derived this equation assuming a spatially-flat background, but galaxy or cluster lensing happens at cosmological distances where the Universe is expanding and the Universe may be spatially curved on those scales. Accounting for this is simple, because physically the time delay happens when the light passes through the lens and gravitational lenses are small compared to the scale on which the Universe expands or may be curved. For cosmological applications of lensing, the above equation therefore holds in the frame of the lens and to convert the time delay in the lens frame to that of the observer, we need to account for the expansion of the Universe with its accompanying \((1+z)\) lengthening of time intervals

where \(z_\mathrm{L}\) is the redshift of the lens. As shown in Appendix C.3, the gravitational time delay is given by Equation (C.120), which in our coordinate system and accounting for the cosmological time delay becomes

Comparing this to the definition of \(\psi\) in Equation \eqref{eq-gravlens-lensingpot}, we can therefore write this as

The total time delay is therefore given by

where the **time-delay distance** \(D_{\Delta t}\) is defined by

Equation \eqref{eq-gravlens-fermat-vs-timedelay} demonstrates that the total time delay is proportional to the Fermat potential. Comparing this to Equation \eqref{eq-gravlens-lensing-eq-asgradient}, we can therefore state the lensing equation as (Schneider 1985; Blandford & Narayan 1986)

This means that images form at extrema of the time delay surface, in the same way as they do in geometrical optics according to Fermat’s principle (hence, *Fermat potential*). In fact, Fermat’s principle holds in the general theory of relativity and we could have derived the lensing equation from it in the first place (Kovner 1990; Nityananda & Samuel 1992).

To get a sense of the magnitude of the time delays involved in gravitational lensing, we can estimate both the geometric and gravitational time delays for a typical lens. Assuming an elliptical galaxy modeled as a singular isothermal sphere with \(\sigma \approx 200\,\mathrm{km\,s}^{-1}\), we estimated above that \(|\hat{\boldsymbol{\alpha}}| \approx 1''\) and because for typical lens systems \(D_{\mathrm{LS}}/D_{\mathrm{S}}\approx 0.5\), similarly \(|\boldsymbol{\alpha}| = |\boldsymbol{\theta}-\boldsymbol{\beta}| \approx 0.5''\). For typical lens systems, \((1+z_{\mathrm{L}})\,D_\mathrm{L}\,D_{\mathrm{S}}/ D_{\mathrm{LS}} \approx 4\,\mathrm{Gpc}\) and we thus have that

corresponding to a distance of \(c\Delta t_\mathrm{geom} \approx 0.014\,\mathrm{pc}\). We can estimate the gravitational time delay from combining \(|\Phi| \approx 2\sigma^2\) with an estimate of the thickness of the lens, which is approximately twice the virial radius of \(\sim 300\,\mathrm{kpc}\), so for a lens at \(z_\mathrm{L}\approx 0.5\)

which corresponds to a distance of \(c\Delta t_\mathrm{Shapiro} \approx 1.6\,\mathrm{pc}\). The gravitational time delay therefore dominates the total time delay. However, the absolute time delay is unobservable, because we can’t know when the images would have arrived in the absence of the lens. What *is* observable is the relative time delay between multiple images of the same source, because this can be determined if the source is variable by cross-correlating time series observations of
the different images (e.g., if the lensed image is a quasar, its stochastic variability can be used to determine the relative time delays). The typical image separation is \(\approx 2\,\boldsymbol{\alpha}\), so the relative geometric time delay is

but for the gravitational time delay, we need the gradient of the integrated potential over the image, multipying the absolute gravitational time delay by \(\approx b/R_{\mathrm{vir}} \approx 1/60\), because typical impact parameters are \(b \sim \mathcal{O}(5\,\mathrm{kpc})\); the relative gravitational time delay is therefore

Thus, the relative gravitational time delay between images of the same source is similar to the relative geometric time delay. The fact that this has to be the case follows from the lensing equation, which in the Fermat form from Equation \eqref{eq-gravlens-lensing-eq-fermat} states that \(\nabla_{\boldsymbol{\theta}}\left(\Delta t_\mathrm{geom}\right)= \nabla_{\boldsymbol{\theta}}\left(\Delta t_\mathrm{shapiro}\right)\). We have used \(\sim\) signs instead of \(\approx\) to indicate that these are very rough estimates. In addition to the time delays being dependent on the lens mass and the lens and source redshifts, even for a given lens the time delays can vary by an order of magnitude depending on the image configuration. The reason for this is that gravitationally-lensed images are found in very particular regions surrounding the lens to be observable (e.g., Turner et al. 1984).

### 16.2.3. The critical surface density¶

Before discussing how we can use the Fermat potential to understand where lensed images form and what their properties are, we want to understand the physical meaning of the critical surface density from Equation \eqref{eq-gravlens-criticalsurfdens}. Let’s consider a gravitational lens with constant surface density \(\Sigma(R,\phi) = \Sigma_{\mathrm{crit}}\) and therefore \(\kappa(\boldsymbol{\theta}) \equiv 1\). Then we can compute \(\boldsymbol{\alpha}\) from Equation \eqref{eq-gravlens-reduceddeflect-from-kappa-axially-symm} and find

that is, the reduced deflection angle is equal to the image position. The lensing equation then implies that \(\boldsymbol{\beta} = 0\) for every \(\boldsymbol{\theta}\). Such a lens focuses perfectly: a source at \(\boldsymbol{\beta} = 0\) appears at all \(\boldsymbol{\theta}\) or, equivalently, every ray originating from the observer perfectly focuses at \(\boldsymbol{\beta} = 0\) (all within the small-angle approximation of course, so in reality this can’t quite happen).
Thus, lenses with \(\Sigma > \Sigma_{\mathrm{crit}}\) are therefore able to focus well and the importance of the critical surface density in gravitational lensing derives from the fact (which we will not prove) that if a lens has \(\Sigma > \Sigma_{\mathrm{crit}}\) *anywhere*, then it is capable of producing multiple images of a source. That is, for a lens with \(\Sigma > \Sigma_{\mathrm{crit}}\) anywhere, there are source positions \(\boldsymbol{\beta}\) with multiple solutions
\(\boldsymbol{\theta}\) of the lensing equation (Subramanian & Cowling 1986). In general, this is only a sufficient condition, not a necessary condition. However, for an axially-symmetric lens whose mass peaks at its center (the latter is generally the case for galactic lenses), \(\Sigma > \Sigma_{\mathrm{crit}}\) *is* a necessary and sufficient condition for multiple images to be possible. For example, for the point-mass lens in Section 16.1.3
above, we found that it always produces multiple images and a point-mass always exceeds the critical surface density, because it has infinite density at the center. For the singular-isothermal-sphere lens, similarly, \(\Sigma = \Sigma_{\mathrm{crit}}\) at \(\theta = \theta_\mathrm{E}/2\) and we saw that multiple images are possible when \(|\boldsymbol{\beta}| < \theta_\mathrm{E}\).

To understand the typical value of the critical surface density, we can write Equation \eqref{eq-gravlens-criticalsurfdens} as

for galaxy lenses. Massive elliptical galaxies have masses of \(\sim 10^{11}\,M_\odot\) within their central few kpc and therefore typically exceed the critical surface density when observed at Gpc-scale distances. For lensing by stars, the relative surface density is that within their radius and the typical critical surface density is therefore

The second equation shows that lensing by the Sun does not produce multiple images (\(\Sigma > \Sigma_{\mathrm{crit}}\) is necessary, because the Sun is approximately spherically symmetric), but the first equation demonstrates that lensing by stars in our own Galaxy or at cosmological distances can produce multiple images. Indeed, the critical surface density for lensing by stars is so low that at 1 kpc, the critical surface density is \(\approx 1\,M_\odot/(30\,\mathrm{AU})^2)\) and
multiple images can therefore form for background stars within 30 projected AU; this allows lensing by stars in our Galaxy to constrain the population of exoplanets at tens of AU (Gould & Loeb 1992). Because the critical surface density is \(\propto D_{\mathrm{S}} /(D_{\mathrm{LS}}\,D_{\mathrm{L}})\), lensing by stars in external galaxies can also produce multiple images out to even \(\sim 1/10\,\mathrm{pc}\), but it requires a source that is smaller than this projected size, because
otherwise the multiple images are blurred together. Only the innermost regions of a quasar are small enough for this. Note that the multiple images formed by stars are separated by such a small amount that they cannot be resolved, but the accompanying magnification *can* be observed. We’ll discuss magnification in the next section.

Background sources with images near \(\kappa \approx 1\) display large distortions and magnifications and may be multiply imaged. For this reason, lensing in the regime \(\kappa \approx 1\) is called **strong lensing**. Lensing when \(\kappa \ll 1\) only gives rise to subtle differences in the shape and magnitude of background sources and multiple images do not occur; this regime is therefore known as **weak lensing**. There is no
precise definition of the boundary between strong and weak lensing, but generally it is the case that the effects of strong lensing can be determined for a single source, while weak lensing signatures are only detectable when combining data on multiple background sources.

### 16.2.4. Galaxy masses from strong gravitational lensing¶

Now that we understand the basic ingredients in gravitational lensing, we can start considering the application of gravitational lensing to observational data. The basic observational data in gravitational lensing are:

The positions of the images for multiply-imaged sources;

the relative arrival times of different images for variable sources;

the shape of the images for extended sources;

and the relative fluxes of different images (or the relative sizes of extended sources).

To describe shape, flux, and size distortions, we need the derivative of the lensing transformation, which we discuss in the next section, so here we will start by considering the first two observables.

Starting with the image positions \(\boldsymbol{\theta}_i\) for a multiply-imaged source, we know that each position satisfies the lensing equation \eqref{eq-gravlens-lensing-eq-reduced}. Because each image results from the same source position \(\boldsymbol{\beta}\), we have that

Fully modeling an observed lensing system requires one to infer the unknown source position \(\boldsymbol{\beta}\), but simply from the knowledge that different images come from the same source, we have that

Thus, the observed image locations directly constrain the difference in reduced deflection angle between the locations. For axially-symmetric lenses, Equation \eqref{eq-gravlens-deflect-from-kappa-axially-symm} demonstrates that this difference is simply related to the difference in enclosed masses contained in the cylinder traced out by \(|\boldsymbol{\theta}| = |\boldsymbol{\theta}_i|\). For non-axially-symmetric lenses, \(\Delta \boldsymbol{\alpha}\) corresponds to a difference in the integrated convergence, but as long as the lens isn’t too far from axial symmetry, the latter is close to the enclosed mass. This is similar to how measurements of the circular velocity relate to the enclosed mass: for spherical potentials, the circular velocity is directly related to the enclosed mass, for non-spherical potentials \(V_c\) directly relates to the gravitational field.

In the examples of the point-mass and singular-isothermal-sphere lenses in Section 16.1.3, we saw that when \(\boldsymbol{\beta} = 0\), an entire ring \(|\boldsymbol{\theta}| = \theta_\mathrm{E}\) solves the lensing equation, with \(\theta_\mathrm{E}\) the Einstein radius. This in fact happens for any axially-symmetric lens, because \(\boldsymbol{\alpha} \parallel \boldsymbol{\theta}\). Using Equation \eqref{eq-gravlens-deflect-from-kappa-axially-symm}, the Einstein radius is

or

where \(M\) is the mass enclosed within \(\theta_\mathrm{E}\) (this is, of course, the same expression as we derived for a point mass in Equation \ref{eq-gravlens-einstein-angle-point-mass}). Thus, when we observe an Einstein ring, the radius of the ring combined with the \(D_{\mathrm{LS}}/(D_{\mathrm{L}} D_{\mathrm{S}})\) factor that can be determined using redshifts, directly determines the enclosed mass within the Einstein radius. Because gravitational lensing does not need to make any assumptions about the dynamical state of the lens, this is a highly robust mass measurement. The physical size of the Einstein radius is

Observed Einstein radii are typically \(\sim 1''\), so Equations \eqref{eq-gravlens-einstein-radius-units} and \eqref{eq-gravlens-einstein-radius-kpc} demonstrate that this corresponds to a mass of \(\sim 10^{11}\,M_\odot\) within \(\sim 5\,\mathrm{kpc}\). Thus, strong lensing constrains the mass of galaxies in the region where stars (and gas for spirals) contribute significantly to the mass, *not* at large distances where dark matter dominates.

Image positions, or the Einstein ring, in gravitational lens systems quite directly measure the enclosed mass and they are therefore analogous to rotation curve observations, except that they only cover a narrow range of radii. For variable sources, it is also possible to determine the relative arrival times of the images (by looking for when the time series observed in one image repeats in the other images). Equation \eqref{eq-gravlens-fermat-vs-timedelay} shows that the difference between arrival times then directly determines differences in the Fermat potential at the image locations, multiplied by the time-delay distance that can again be determined from observed redshifts for the source and lens. Modeling of the image locations allows one to determine the source location \(\boldsymbol{\beta}\) and the difference in Fermat potential can then be converted to a difference in the lensing potential. Thus, observed time delays determine differences in the lensing potential. This is analogous to how a determination of the escape velocity at a position in a galaxy directly constrains the difference between potential at that position and infinity (see Chapter 4.2). Similar to the gravitational potential, the lensing potential is sensitive to the mass distribution on very large scales, and observed relative time delays can therefore determine the potential on a more global scale than can image positions or the location of the Einstein ring.

As we have already stressed, unlike the situtation in dynamical modeling, lensing does not need any additional assumptions and only relies on the general theory of relativity (in dynamical modeling, unless we can directly determine accelerations, we always need additional assumptions about the dynamical state of the sytem). This makes lensing the most robust method for determining masses. However, lensing does have some disadvantages compared to dynamical modeling:

An observed Einstein ring essentially only determines the total mass contained within the Einstein ring. Because the appearance of an Einstein ring indicates that the system is very close to having axial symmetry, the Einstein ring is not sensitive to the details of the mass distribution either inside or outside of the ring. Multiple images (coming from an off-center source or from a non-axially-symmetric source) similarly appear at \(\kappa \approx 1\) and thus also only constrain the mass distribution over a narrow range in radii. Strong lensing therefore only essentially provides a measurement of the mass within the Einstein radius. Weak lensing allows one to study the mass distribution outside the Einstein radius, but neither strong nor weak lensing can determine the mass distribution at scales \(\ll \theta_\mathrm{E}\), in particular the cores of galaxies (although the general absence of the central, odd image does provide a qualitative constraint; see Section 16.4.3 below).

Because an individual galaxy or cluster is so thin compared to the cosmological distances between the observer, lens, and source, gravitational lensing by an individual system is only sensitive to the integrated surface density, and has no way of determining the mass distribution along the line of sight. Thus, with lensing alone, we cannot determine the three-dimensional mass distribution of a galaxy or cluster.

All lensing observables with the exception of relative arrival times are dimensionless (the angular positions of images, and the relative magnification and shear that we will consider below). Converting image positions or the Einstein angle to constraints on the mass distribution therefore depends on our knowledge of cosmological distances (through the critical surface density). Furthermore, lensing is very susceptible to exact degeneracies that seriously limit how much information we can extract from a lensing system without additional information.

The degeneracies pose a serious problem. Degeneracies result from transformations of the total time delay in Equation \eqref{eq-gravlens-fermat-vs-timedelay} from which all lensing observables can be derived. There are two types of degeneracies: (i) general degeneracies of the time delay that occur for any image positions and (ii) degeneracies for the observed image positions. The latter consists of transformations that would affect images formed for other sources, but not the actual observed images. The main types of degeneracies are similarity degeneracies and the mass-sheet degeneracy (Falco et al. 1985; Gorenstein et al. 1988; Saha 2000). The similarity degeneracies rescale the geometric and Shapiro contributions to the time delay separately. For example, if we multiply both the time-delay distance \(D_{\Delta t}\) and the surface density \(\Sigma\) by the same factor \(s\), then \(\Delta t_\mathrm{lensing} \rightarrow s\,\Delta t_\mathrm{lensing}\). Because the lensing equation is equivalent to \(\nabla_{\boldsymbol{\theta}}\left(\Delta t_\mathrm{lensing}\right)= 0\), this transformation leaves the image positions unchanged, but scales the relative time delays by \(s\). Of course, we can only scale the time-delay distance if we do not know it! Because we can typically determine the lens and source redshift, the main uncertainty in \(D_{\Delta t}\) comes from the assumed cosmological parameters, with the main uncertainty being the inverse dependence of \(D_{\Delta t}\) on the Hubble constant \(H_0\) (varying the density parameters of dark energy and matter has a much more subtle effect). Thus, any uncertainty in \(H_0\) directly translates into an equivalent uncertainty in the inferred \(\Sigma\) from lensing. Observations of relative time delays break this degeneracy and allow both the Hubble constant and \(\Sigma\) to be determined (Refsdal 1964). Another version of the similarity degeneracy is to rescale the source and image positions by \(\sqrt{s}\) instead of rescaling \(D_{\Delta t}\); this can only be done without affecting observables for unresolved lensing, as in the case of microlensing in the Milky Way.

A more serious degeneracy is the **mass-sheet degeneracy**. This degeneracy only acts on the Fermat potential part of the time delay and so we can work with \(\tau(\boldsymbol{\theta};\boldsymbol{\beta})\). Under the following transformation of the lensing potential

the Fermat potential transforms as

Thus, the Fermat potential gets multiplied by \((1-\lambda)\) and the source is shifted by \(\boldsymbol{\beta}'\) and scaled by \(1/(1-\lambda)\). Because the source position is not directly observable, the source shift/scale is unobservable. Relative time delays scale as \((1-\lambda)\). Because the entire Fermat potential is scaled by \((1-\lambda)\), the lensing equation remains the same, except for the shift and scale in the source position; observed image positions are
therefore conserved by the transformation. The mass-sheet degeneracy is known as the *mass-sheet degeneracy*, because the transformation of the lensing potential according to Equation \eqref{eq-gravlens-mass-sheet-transform} modifies the convergence as

(note that \(\nabla^2 |\boldsymbol{\theta}|^2 = 4\)), and therefore corresponds to the addition of a sheet with constant surface density \(\lambda\,\Sigma_{\mathrm{crit}}\) (and, thus, constant convergence \(\lambda\)).

The mass-sheet degeneracy is a serious issue for measurements of the Hubble constant using relative time delays in strong gravitational lenses. Relative time delays scale as \(H_0^{-1}\) and varying \(\lambda\) directly affects the inferred Hubble constant as \(H_0(1-\lambda)\). By letting \(\lambda\) go to one, the inferred Hubble constant can be arbitrarily small!

The mass-sheet degeneracy may seem like an unrealistic degeneracy, because it requires an infinite sheet of constant surface density to be added to the system and we may think that a simple boundary condition that \(\kappa \rightarrow 0\) at large distances from the lens would force \(\lambda \rightarrow 0\). But this is a case where a degeneracy that leaves the observed properties unaffected while otherwise changing the model is important. Because the deflection angle for an axially-symmetric lens only depends on the enclosed mass, we can re-distribute the constant-density sheet’s mass into a finite disk as long as we do this in an axially-symmetric way and keep the enclosed mass within and outside of all observed images the same (Saha 2000). This is then a perfectly-plausible mass distribution that is hard to exclude a priori. This degeneracy can only be broken by including non-lensing data, such as measurements of the internal velocities of the lens (Grogin & Narayan 1996).

## 16.3. Magnification and shear¶

### 16.3.1. Introduction¶

Nearby light rays that pass by a gravitational lens are deflected in a similar, but not identical manner and this causes lensed objects to be deformed and magnified. This deformation is described by the Jacobian of the transformation from the source plane position \(\boldsymbol{\beta}\) to the observed position \(\boldsymbol{\theta}\). This Jacobian is known as the *magnification matrix* with elements

The overall increase \(\mu\) in the size of a lensed source is given by the determinant of \(\mathcal{M}\) and the parameter \(\mu = |\mathcal{M}|\) is called the **magnification**. The magnification depends on the position of the image relative to the lens.

As an example, we can compute the magnification for lensing by a singular isothermal sphere. We can write the solution of the lensing equation \eqref{eq-gravlens-lensing-eq-reduced-sis} for the singular isothermal sphere as

where the negative-sign solution is only a solution when \(|\boldsymbol{\beta}| < \theta_\mathrm{E}\) (\(\theta_\mathrm{E}\) is defined in Equation \ref{eq-gravlens-sis-einstein-angle}). The magnification matrix is then given by

where \((\beta_1,\beta_2)\) are the components of the source vector. The magnification is

This example demonstrates a few important general aspects of the magnification produced by gravitational lensing. Firstly, the *total* magnification of all of the images—the sum of \(\mu\) for all produced images—is larger than 1: in this example, the total magnification is equal to \(\mu_{1} = 1 +\theta_\mathrm{E} / |\boldsymbol{\beta}|\) in the case of a single image or \(\mu_{1+2}=2\) when there are two images (excepting \(\boldsymbol{\beta} = 0\)). Thus, the total size of the
source on the sky is increased. Secondly, there is a source location where the magnification diverges: at \(\boldsymbol{\beta} = 0\) for the singular isothermal sphere. Thirdly, at large separations between the source and the lens, the magnification tends to one, that is, the unlensed size.

### 16.3.2. The conservation of surface brightness and the magnification¶

The magnification is called the *magnification*, because it corresponds to the increase in the observed flux density \(F_\nu\) at a given frequency \(\nu\). The reason for this is as follows: for unresolved objects, modern CCD detectors measure the flux density—the amount of energy passing through a unit surface element per unit time and per unit wavelength—by essentially counting the number of photons of a given (range of) frequency incident on the
detector per unit time. The flux density is related to the specific intensity or surface brightness \(I_\nu\) through

where the integral is over the entire angular extent of the observed object (\(\Omega\) is solid angle and the factor \(\cos\theta\) accounts for the angle between the line of sight to the object and the detector). For all but a small number of astronomical sources, the object is so small that this is well approximated by

where \(\Delta \Omega\) is the angular size of the source and \(\cos \theta\) is tyically one. If the wavelength of light remains the same during propagation (an important caveat that we will return to below), then surface brightness is conserved along the light path. This is because the surface brightness is related to the phase-space density \(f\) of photons through

where \(h\) is the Planck constant. The phase-space density is conserved for light in the absence of absorption, emission, or scattering, because of the optics equivalent of Liouville’s theorem in the Hamiltonian form of geometrical optics (derivation similar to that in Chapter 6.3; this conservation law is also known as the *conservation of etendue*). The conservation of phase-space density itself is a consequence of the conservation
of phase-space volume and of the number of photons (as no photons are assumed to be created or destroyed); this continues to hold in the general theory of relativity because the phase-space volume is Lorentz invariant (e.g., Misner et al. 1973). Because surface brightness is conserved, the ratio of the lensed vs. unlensed flux density of an unresolved source is therefore the magnification \(\mu\): lensing increases the flux density (and thus the apparent magnitude) of background sources,
because it increases their angular size at constant surface brightness. As the singular-isothermal-sphere example above demonstrates, lensing thus typically increases the magnitude of unresolved sources *regardless of their relative orientation with respect to the lens* (\(\mu \geq 1\ \forall \boldsymbol{\beta}\)). Thus, lensing typically increases the magnitude of a source *for any observer in the Universe*. At first glance, this seems counterintuitive!

Physically, what happens during lensing is that the momentum density of photons at the detector decreases (the momentum volume is \(\propto \nu^2\mathrm{d}\nu\mathrm{d}\Omega\)), while the spatial density increases to make up for this to satisfy Liouville’s theorem. Therefore, lensing effectively moves the observer closer to the source, where it appears larger on the sky and where the flux is higher. This explains the apparent paradox that lensing magnifies *all* background sources. One
might be tempted to think that the increased brightness of a background object due to lensing needs to be counteracted by a decreased brightness as observed by another observer somewhere else in the Universe, but this is not borne out by the equations. That this is possible is because lensing effectively moves *all* observers closer to the source.

For resolved objects, detectors record the surface brightness directly rather than the flux density and because surface brightness is conserved, resolved objects therefore do not appear brighter than their unlensed counterparts (the same thing happens with any telescope, which does not make resolved objects brighter, just larger). However, the increased size of resolved objects increases both their total flux and their footprint on a detector and this therefore makes it easier to determine their internal structure, for example, with detailed photometric or spectroscopic observations.

Finally, as discussed above, the conservation of surface brightness requires that light does not change wavelength as it propagates. This is the case for observations in our Galaxy or in the nearby Universe, but once objects are located at cosmological distances, the expansion of the Universe causes the frequency of light to decrease as \(\nu \propto (1+z)^{-3}\). Surface brightness is then no longer conserved, but instead surface brightness is \(\propto (1+z)^{-4}\), because there is an
additional factor of \((1+z)\) due to time dilation. Thus, objects at cosmological distances do appear less bright than their local counterparts. Gravitational lensing does not change this: because lensing by galaxies happens in such a small region of space with \(\delta z \lll 1\), surface brightness *is* conserved as light passes through the lens and for cosmological sources, lensing therefore simply maintains the \((1+z)^{-4}\) dilution of surface brightness.

### 16.3.3. Convergence and shear¶

Much can be learned about the magnification matrix from considering its inverse \(\mathcal{A} = \mathcal{M}^{-1}\), because this matrix can be expressed in terms of the second derivative of the lensing potential \(\psi(\boldsymbol{\theta})\). Using the lensing equation \eqref{eq-gravlens-lensing-eq-reduced} and the relation between the reduced deflection angle and the lensing potential from Equation \eqref{eq-gravlens-reduced-deflect-asgradient}, we have that

This form makes it immediately clear that \(\mathcal{A}\), and thus its inverse \(\mathcal{M}\) as well, is a symmetric matrix. It therefore only has 3 independent components. It is convenient to decompose \(\mathcal{A}\) into a diagonal matrix and a trace-free part as

where

and through Equation \eqref{eq-gravlens-2dpoisson} this \(\kappa\) is therefore the convergence. The parameters of the trace-free part \(\begin{pmatrix} \gamma_1 & \gamma_2 \\ \gamma_2 & -\gamma_1\end{pmatrix}\) are

and the eigenvalues of the trace-free part are

The trace-free part can also be written as

The full inverse magnification matrix can then be written as

The eigenvectors of this matrix (and of the trace-free part alone) are \((\sin \phi,-\cos\phi)\) and \((\cos\phi,\sin\phi)\) and the corresponding eigenvalues are

where the \(\lambda_t = 1-\kappa-\gamma\) eigenvalue has eigenvector \((\sin\phi,-\cos \phi)\), while the \(\lambda_r =1-\kappa+\gamma\) eigenvalue has eigenvector \((\cos \phi,\sin\phi)\). It is left as an exercise to show that for an axially-symmetric lens, the angle \(\phi\) is the angle between the reduced deflection angle and the \(x\) axis. For an axially-symmetric lens the deflection angle points along \((\cos\phi,\sin \phi)\), and \(\lambda_t\), therefore, gives the distortion tangential to the deflection angle and tangential to a circle around the lens’ center, while \(\lambda_r\) gives that radial to it. This is the origin of the tangential and radial descriptors.

The image distortion by lensing is therefore the combination of an isotropic scaling of the source and a shear. Notably absent is a rotation component; this is absent because the magnification matrix is symmetric. Lensing thus scales and shears images, but it does not rotate them. The shear can be thought of in two ways: either by the amount the image is squashed or stretched along the \(x=0\) (with \(\gamma_1\)) and \(x=y\) (with \(\gamma_2\)) directions or by the amount the image is squashed or stretched along the line that makes an angle \(\phi\) with the \(x\) axis. Adding the isotropic scaling effect of the convergence, all of these image distortions are illustrated in the following figure (where non-labeled parameters are zero):

```
[418]:
```

```
def transform_circle(kappa,gamma1,gamma2,radius=1.,sgn=1.):
A= numpy.array([[1.-kappa-gamma1,-gamma2],[-gamma2,1.-kappa+gamma1]])
M= numpy.linalg.inv(A)
circle_x= numpy.linspace(-radius,radius,301)
circle_y= sgn*numpy.sqrt(radius**2.-circle_x**2.)
return numpy.dot(M,numpy.vstack((circle_x,circle_y)))
fill_between(*transform_circle(0.,0.,0.,sgn=-1),
transform_circle(0.,0.,0.,sgn=1)[1],
facecolor='0.8')
kappas= [0.2,0.,0.,-0.2,0.,0.]
gamma1s= [0.,0.2,0.,0.,-0.2,0.]
gamma2s= [0.,0.,0.2,0.,0.,-0.2]
phis= [0,0,45,0,90,0-45]
figsize(12,6)
for ii,(kappa,gamma1,gamma2,phi) in enumerate(zip(kappas,gamma1s,gamma2s,phis)):
subplot(2,3,ii+1)
fill_between(*transform_circle(0.,0.,0.,sgn=-1),
transform_circle(0.,0.,0.,sgn=1)[1],
facecolor='0.8')
plot(*transform_circle(kappa,gamma1,gamma2,sgn=1),'k-')
plot(*transform_circle(kappa,gamma1,gamma2,sgn=-1),'k-')
xlim(-1.5,1.5)
ylim(-1.5,1.5)
gca().set_aspect('equal')
gca().axis('off')
if kappa != 0.:
label= r'$\kappa = {:+.1f}$'.format(kappa)
elif gamma1 != 0.:
label= r'$\gamma_1 = {:+.1f};\ \phi= {:.0f}^\circ$'.format(gamma1,phi)
else:
label= r'$\gamma_2 = {:+.1f};\ \phi= {:+.0f}^\circ$'.format(gamma2,phi)
galpy_plot.text(label,title=True,fontsize=18.)
tight_layout();
```

The effect of the convergence and shear on a circular image with radius \(r\) is to transform it to an ellipse with semi-major and semi-minor axes

that is, the radius of the circle divided by the two eigenvalues.

Note that the shear in gravitational lensing does not correspond to the typical definition of shear in mathematics, where shear is a mapping that moves each point along a fixed direction by an amount proportional to the distance between the point and the line that is parallel to the shear direction and goes through the origin. Nevertheless, qualitatively, the shear in gravitational lensing is similar to the mathematical definition.

The magnification is \(\mu = |\mathcal{M}| = |\mathcal{A}|^{-1}\) and therefore

To illustrate the inverse-magnification matrix formalism, let’s revisit the example of the singular isothermal sphere that we considered at the beginning of this section. The potential of the singular isothermal sphere is \(\Phi(R,z) = \sigma^2\,\ln\left(R^2+z^2\right)+\mathrm{constant}\) from Equation (6.87) and we can compute the lensing potential \(\psi(\boldsymbol{\theta})\) from Equation \eqref{eq-gravlens-lensingpot} by fixing the constant such that \(\psi(0) = 0\). Using \(R \equiv D_{\mathrm{L}}|\boldsymbol{\theta}|\), we find that

using the definition from Equation \eqref{eq-gravlens-sis-einstein-angle} for the Einstein angle in the case of the singular isothermal sphere. Because \(\boldsymbol{\alpha} = \nabla_{\boldsymbol{\theta}} \psi\), this leads to the same reduced deflection angle as in Equation \eqref{eq-gravlens-reduceddeflect-sis}. We can then compute the convergence \(\kappa\) and shear components \(\gamma_1\) and \(\gamma_2\) using Equations \eqref{eq-gravlens-convergence-and-lensing-pot}, \eqref{eq-gravlens-shear1-and-lensing-pot}, and \eqref{eq-gravlens-shear2-and-lensing-pot} and we find that

and therefore

The eigenvalues of the inverse magnification matrix are therefore \(\lambda_r = 1\) and \(\lambda_t = 1-2\kappa\): images are only distorted in the tangential direction, leading to circular arcs surrounding the lens. This gives a first indication of why extended lensed images are generally arcs around the center of the lens. The magnification is

Using Equation \eqref{eq-gravlens-kappa-gammas-sis-1} it is easy to show that this is equivalent to Equation \eqref{eq-gravlens-magnification-sis-direct} that we computed directly at the beginning of this section. This expression once more shows why the convergence \(\kappa\) is an important quantity in gravitational lensing. If \(\kappa \ll 1\)—or, equivalently, \(\Sigma \ll \Sigma_{\mathrm{crit}}\)—the magnification is \(\mu \approx 1\); however, for \(\kappa \approx 1\), \(\mu\) becomes large.

Lensing potentials with constant convergence or shear are of some interest. Such potentials may seem unphysical, because galactic lens mass distributions should have \(\kappa, \gamma \rightarrow 0\) at large separations, but as we saw in Section 16.2.4 above, a constant-convergence mass sheet is an important degeneracy in lens modeling and can be re-arranged into a more physical disk distribution. Similarly, a lensing potential with constant shear can be used to approximately model the contribution to the shear from the environment of a lens. Because the convergence and shear are second derivatives of the lensing potential, constant convergence or shear is obtained from a lensing potential that is quadratic in \((\theta_1,\theta_2)\)

and from Equations \eqref{eq-gravlens-convergence-and-lensing-pot}, \eqref{eq-gravlens-shear1-and-lensing-pot}, and \eqref{eq-gravlens-shear2-and-lensing-pot}, we then have that

A lensing potential with zero shear has \(a=c=\kappa/2\) and \(b=0\) and is thus

This is the lensing potential of an infinite mass-sheet. It is no surprise that an infinite, homogeneous mass-sheet has zero shear, because shear distorts images in specific directions and there is no preferred direction anywhere on a homogeneous mass-sheet. An **external convergence** \(\kappa_\mathrm{ext}\) is often included in strong-lensing analyses to account for mass along the line of sight that is not part of the main lens. A pure-shear
lensing potential (with \(\kappa = 0\)) has \(a = -c = \gamma_1/2\) and \(b = \gamma_2\) and therefore

Such an **external shear** contribution is often included in lens models to account for the shear coming from the larger-scale environment of the lens.

### 16.3.4. Galaxy masses from weak gravitational lensing¶

We discussed in Section 16.2.4 above how strong-lensing observations can determine the mass within the Einstein radius of a lens well, with the main complication being the mass-sheet degeneracy. Equations \eqref{eq-gravlens-einstein-radius-units} and \eqref{eq-gravlens-einstein-radius-kpc} show that for the typically observed \(\theta_\mathrm{E}\sim 1''\), strong lensing constrains the mass of massive galaxies close to their centers at \(\mathcal{O}(1-10\,\mathrm{kpc})\). To determine the mass profile at larger distances, for example, to map the dark matter distribution out to the virial radius, we therefore have to venture beyond the strong-lensing regime where \(\kappa \approx 1\) to the weak lensing regime where \(\kappa, \gamma \ll 1\).

In the weak lensing regime, background sources are not multiply imaged, so the observables used in strong lensing analyses that we discussed in Section 16.2.4—relative image positions and relative arrival times—are no longer available. When only a single image of a background source is produced, its position does not contain any information about the lens, because all of the information is degenerate with the unobservable source position. Similarly, even if the source is variable, there is no information in the arrival time, because we cannot know when the signal would have arrived without the intervening lens. However, the single image is affected by the second derivatives of the lensing potential in the form of magnification and shear. In the weak lensing regime \(\kappa, \gamma \ll 1\), the magnification and shear are small, but by combining measurements from many background sources, they can be constrained.

While magnification by gravitational lensing is highly useful for making background objects easier to study, its use for determining the lens’ mass distribution is limited. The reason is that it is difficult to know the intrinsic size (for a resolved source) or flux (for an unresolved source) that provides the baseline from which magnification is measured. For resolved galaxies, there is essentially no standard ruler on galactic scales that is robust enough to be used as a baseline (that is, we cannot, for example, assume that a certain type of galaxy has a size of \(X\,\mathrm{kpc}\) between isophotes of \(Y\,\mathrm{mag\,arcsec}^{-2}\), which would then appear as \(\approx\mu X\,\mathrm{kpc}\)). Similarly, standard candles among unresolved extragalactic sources are rare. The best standard candles on cosmological scales are the type Ia supernovae discussed in Chapter 12.3.1, but they are so rare that only a single multiply-imaged type Ia supernovae is known at the time of writing (Goobar et al. 2017) and few weakly-lensed type Ia supernovae have been observed (e.g., Nordin et al. 2014). However, magnification can be detected statistically by its effect on the background population of unresolved sources. The most obvious effect is that the lensing magnification lowers the effective magnitude limit, such that a survey that is sensitive to unlensed sources down to a given flux limit is sensitive to a factor \(\mu\) fainter behind the lens. However, a competing effect is that the effective volume that is observed is smaller, because the lens magnifies the entire background area as well; therefore, the sources behind the lens come from an area of the sky that is effectively \(\mu\) smaller than the unlensed population. Whether or not more or less background sources are observed behind a lens is therefore set by how steeply the number of sources increases as a function of magnitude, the luminosity function of the sources. Generally, the luminosity function increases fast enough that the overall effect is that of more background sources. Such magnification measurements have been done using quasars (e.g., Ménard et al. 2010) and high-redshfit Lyman-break galaxies (e.g., Hildebrandt et al. 2009). By combining such measurements for many lenses, the typical mass profile of massive galaxies out to multiple Mpc can be determined.

Because magnification is so hard to determine from a set of background sources, shear is typically the quantity constrained by weak gravitational lensing through its effect on the shape of background galaxies. In Equation \eqref{eq-gravlens-lensed-axis-ratio}, we already saw that the effect of weak gravitational lensing is to deform a circular background source into an ellipse, but to focus on the effect on the *shape* of a background source, it helps to re-write the inverse
magnification matrix \(\mathcal{A}\) from Equation \eqref{eq-gravlens-inverse-mag} as

where

is the **reduced shear** (similarly, \(g = \gamma/[1-\kappa]\)). Equation \eqref{eq-gravlens-inversemag-asonemkappa-g} shows that \((1-\kappa)\) simply performs an overall scaling of the background source and that it is only the reduced shear components that change the shape. Thus, only the reduced shear can be determined from galaxy shapes. In the weak-lensing regime, \(\kappa \ll 1\) and, therefore,
\(g_i \approx \gamma_i\). For this reason, it is often said that weak-lensing can directly constrain the shear, but technically it is always the case that it is really the reduced shear that is constrained by shape measurements. In terms of the reduced shear, we can re-write Equation \eqref{eq-gravlens-lensed-axis-ratio} for the effect of weak lensing on a circular background source as

and the axis ratio \(a/b = (1+g)/(1-g)\) is a function of \(g\) alone.

Background sources are not circular, but to a good approximation they can typically be represented as ellipses. We can characterize these using their eccentricity (or ellipticity in this context), defined in the usual way as \(e = (a-b)/(a+b)\) with \(a\) and \(b\) the semi-major and semi-minor axis. But the effect of the shear on elliptical background sources is most easily expressed as a change to the *two-dimensional*
eccentricity defined as

where \(\theta\) is the position angle of the ellipse (the angle between the ellipse’s major axis and the \(x\) axis) and where in the second equality we have equivalently expressed this as a complex number (the angle appears as \(2\theta\) here, because an ellipse is symmetric under a rotation by \(180^\circ\); we avoid calling this two-dimensional quantity a vector, because it does not transform as a vector under rotations). We can similarly express the reduced shear as

The shear \(\boldsymbol{\gamma}\) can be represented similarly. The effect of the magnification matrix on the two-dimensional eccentricity is most easily expressed using the complex-number representation and in the limit \(g < 1\) is given by (Kochanek 1990; Miralda-Escude 1991; Seitz & Schneider 1997)

where \(^*\) denotes complex conjugation. In the weak-lensing regime, this simplifies to

where the final expression approximates \(\vec{g} \approx \boldsymbol{\gamma}\). Thus, the effect of weak-lensing shear is to change the two-dimensional eccentricity of background sources by the two-dimensional (reduced) shear. Background sources may have an unknown distribution of eccentricities \(e\), but their orientations are random and therefore the average \(\langle \vec{e}_\mathrm{int}\rangle\) of the intrinsic two-dimensional eccentricities is zero. Thus, we find that the average \(\langle \vec{e}_\mathrm{obs}\rangle\) of the observed two-dimensional eccentricities at a given location behind a lens is

The average of the observed eccentricities is therefore a direct measurement of the (reduced) shear!

Because it is easier to determine observationally, weak lensing analyses typically use a different definition of the ellipticity, \(\chi = (a^2-b^2)/(a^2+b^2)\), with a similar definition of the two-dimensional ellipticity as in Equation \eqref{eq-gravlens-twoecc}. In terms of this ellipticity, Equation \eqref{eq-gravlens-twode-weaklensing} becomes

and thus similarly

In practice, the width of the unknown eccentricity distribution means that we have to average over quite a few background sources for the mean intrinsic eccentricity to be much less than the shear (especially, because the shear is small itself). Thus, we require a large population of background sources that is furthermore dense on the sky. For individual galaxies, the shear itself and the number of background sources is too small to directly perform this analysis. However, in galaxy clusters both the shear itself is larger (because of the more extended mass distribution compared to a galaxy) and the number of background sources is larger because of the larger sky footprint of galaxy clusters, which are \(\gtrsim 100\) times larger than massive galaxies. Thus, in galaxy clusters, one can determine the two-dimensional shear \(\boldsymbol{\gamma}\) over the entire area of the cluster. This shear field can then be inverted to obtain the convergence. That this is possible is clear from the fact that the shear is given by second derivatives of the lensing potential, while the lensing potential itself can be computed as a two-dimensional integral over the convergence (Equation \ref{eq-gravlens-lenspot-from-kappa}). This can be done in various ways, e.g., directly using the shear in a two-dimensional integral that relates the convergence to the shear (Kaiser & Squires 1993) or relating the convergence to an integral over the gradient of the shear (Kaiser 1995). However, as we saw in Equation \eqref{eq-gravlens-masssheet-zero-shear}, a constant-density mass-sheet does not give rise to shear and shear-based weak lensing is therefore subject to the mass-sheet degeneracy: shear-based weak lensing can only determine the convergence up to an unknown constant. This degeneracy can be broken, e.g., by observing a single magnification, because a mass-sheet with convergence \(\lambda\) changes the magnification by \(1/(1-\lambda)^2\), or using non-lensing information such as velocity dispersions or X-ray constraints on the cluster mass.

For galaxies, even massive ones, the induced shear is too small and the available populations of background sources are generally too small to detect the lensing signature (e.g., Bartelmann & Schneider 2001). However, it is still possible to detect the subtle effect of gravitational lensing and constrain the mass profile by combining measurements from many different galaxies of a certain type and determining their typical mass profile. Because it is difficult to combine the direct shear
\(\boldsymbol{\gamma}\) measurements directly for many different galaxies, this is often done by looking for the effect of lensing on the *orientation* of background galaxies rather than on their shape (Brainerd et al. 1996). This is known as *galaxy-galaxy lensing*. We can calculate the weak-lensing effect on background-source orientations by considering how the background distribution of two-dimensional eccentricities \(\vec{e}\) is
modified by lensing. From Equation \eqref{eq-gravlens-twode-weaklensing}, we know that the intrinsic distribution \(p_{\mathrm{int}}(\vec{e})\) is transformed to the observed distribution

where in the second approximation we have assumed the weak-lensing regime \(\vec{g} \approx \boldsymbol{\gamma}\) and that \(|\boldsymbol{\gamma}| \ll |\vec{e}|\) (when using the other ellipticity definition \(\boldsymbol{\chi}\), \(\vec{g}\) and \(\gamma_i\) simply require an additional factor of two, see Equation \ref{eq-gravlens-twodchi-weaklensing}). For the two-dimensional eccentricity, we again use the complex-number representation \(\vec{e} = e\,e^{2i\theta}\). Assuming that the intrinsic orientations are fully random and that the intrinsic distribution of \(\theta\) is therefore uniform, we have that \(p_{\mathrm{int}}(\vec{e}) \propto p_{\mathrm{int}}(e)\). The background galaxy is located at an angle \(\eta\) in the coordinate frame used for the lens, but we can rotate the coordinate system such that this angle is zero. In this coordinate system, the shear is given by

because the shear matrix \(\vec{G} = \begin{pmatrix} \gamma_1 & \gamma_2 \\ \gamma_2 & -\gamma_1 \end{pmatrix}\) transforms under rotation matrices \(\vec{R}\) as \(\vec{R}\vec{G}\vec{R}^T\). In Equation \eqref{eq-gravlens-pobse-pinte} we then have that

where we have defined

The observed distribution of orientations \(\theta\) is then given by

using that \(\mathrm{d}\vec{e} = e\,\mathrm{d}e\,\mathrm{d}\theta\) and where in going from the second to the third line we have used integration by parts assuming that the surface terms are zero (they are for typical eccentricity distributions). The factor of \(2/\pi\) comes from the fact that the orientation can be constrained to \(0 \leq \theta \leq \pi/2\) for this purpose, because of the \(\cos 2\theta\) dependence. For typical values of the shear \(\gamma_T \approx 0.01\) and the average inverse eccentricity \(\langle 1/e\rangle \approx 8\) (Brainerd et al. 1996), the intrinsic and lensed orientation distributions are shown in the following figure:

```
[433]:
```

```
oneovere= 8.
gammat= 0.01
thetas= numpy.linspace(0.,numpy.pi/2.,101)
figsize(7,4)
plot(thetas*180./numpy.pi,1./180.*thetas**0,label=r'$\mathrm{intrinsic}$')
plot(thetas*180./numpy.pi,1./180.*(1.-gammat*numpy.cos(2.*thetas)*oneovere),
label=r'$\mathrm{lensed},\, \gamma_T = 0.01$')
ylim(0.,0.008)
xlabel(r'$\theta\,(\mathrm{deg})$')
ylabel(r'$p(\theta)$')
legend(frameon=False,fontsize=18.);
```

Thus, we see that the effect of lensing is to subtly skew the distribution of orientation angles towards \(90^\circ\). Because we defined the coordinate system such that the background galaxy lies along \(y=0\), this means that the background source’s major axis is skewed towards being perpendicular to the line connecting the source to the lens; that is, the preference is towards *tangential alignment*.

By examining the distribution of orientation angles behind lenses of a certain type, one can therefore infer the radial dependence of the shear \(\gamma_T\) defined in Equation \eqref{eq-gravlens-tangential-shear-galaxy-galaxy}. For general lens mass distribution, this provides a constraint on the two-dimensional shear. But generally, observational analyses measure the azimuthally-averaged tangential shear \(\langle\gamma_T\rangle\) and this is related to the azimuthally-averaged convergence \(\langle \kappa \rangle\) as (Kaiser 1995)

where \(\langle \kappa (<r)\rangle\) is the average convergence inside of \(r\) (equal to the enclosed mass divided by \(2\pi\Sigma_{\mathrm{crit}}\)) and \(\langle \kappa (r)\rangle\) that at \(r\). Thus, a measurement of \(\langle\gamma_T\rangle\) directly constrains the mass profile, specifically, the difference between the average convergence at smaller radii and the local convergence (because of the mass-sheet degeneracy, it is again only possible to constrain the convergence up to a constant). Because the preferential tangential alignment of background galaxies can be measured out to the virial radius and beyond, these measurements allow the dark matter halos of massive galaxies to be measured (e.g., Hoekstra et al. 2004). Alternatively, it is possible to perform an analysis of the tangential modification of the eccentricities themselves along similar lines (e.g., Sheldon et al. 2004).

Because weak-lensing analyses must combine shape measurements from many background sources, weak lensing is a field in which all of the observational and interpretational details matter greatly. Much of the weak-lensing literature is therefore concerned with the proper determination of galaxy shapes from images of galaxies, with an especially large role played by the point-spread function that blurs all observations of galaxies. Another important issue is that of intrinsic alignments between
background sources, which is important at large distances from the lenses where the intrinsic shear is \(\approx 10^{-4}\). We will not discuss these issues here, but instead refer to reader to excellent reviews on this topic (e.g., Mandelbaum 2018). In the context of our discussion, however, we point out one complication: we have phrased the entire weak-lensing formalism in this section in terms of the dimensionless quantities \(\kappa\) and \(\gamma_i\), but these are
related to the physical mass distribution using the critical surface density, which depends on the distances involved as \(\Sigma_\mathrm{crit} \propto D_{\mathrm{S}}/[D_{\mathrm{LS}}\,D_{\mathrm{L}}]\). Thus, when using a single (cluster) lens with many background sources, or multiple (galaxy) lenses with many background sources, we have to accounted for the fact that each lens-source pair has a *different* value of \(D_{\mathrm{S}}/[D_{\mathrm{LS}}\,D_{\mathrm{L}}]\). While
spectroscopic redshifts are often available for the lenses, the large number of background sources means that it is not typically feasible to obtain spectroscopic redshifts for the background sources, but instead that one has to rely on photometric redshifts. However, for two reasons this isn’t a very large problem: (i) for low-redshift lenses and high-redshift background sources, \(D_{\mathrm{S}}/[D_{\mathrm{LS}}\,D_{\mathrm{L}}] \approx 1/D_{\mathrm{L}}\) and the lens redshift is therefore
more valuable than the source redshifts, and (ii) the intrinsic noise in the measurement coming from the eccentricity distribution is much larger than the noise coming from the photometric redshifts. Of course, for the most sensitive lensing measurements, it is important to properly account for the redshifts of the lenses and sources.

## 16.4. The occurrence of images and the critical and caustic curves¶

### 16.4.1. The topology of the Fermat potential¶

We have already solved the lensing equation \eqref{eq-gravlens-lensing-eq-reduced} for the special cases of a point mass or singular-isothermal sphere and we saw that under certain conditions, multiple images of a single source can form. We have also shown that for axially-symmetric lenses, the images lie on a single line, the line that connects the source to the center of the lens. Now, we turn to a more general discussion of where images form, how many images form, and the properties of these images.

To broadly understand when and where images form, we can use the fact that images form at extrema of the Fermat potential, as we demonstrated in Section 16.2.2 above. The Fermat potential, given in Equation \eqref{eq-gravlens-fermat}, is a simple sum of (i) a quadratic function \(\left|\boldsymbol{\theta}-\boldsymbol{\beta}\right|^2/2\) and (ii) minus the lensing potential \(-\psi(\boldsymbol{\theta})\). The quadratic function has a single extremum (a minimum) at \(\boldsymbol{\theta} = \boldsymbol{\beta}\), while \(-\psi(\boldsymbol{\theta})\) generally has a single extremum (a maximum) at the center of the lens for realistic galactic mass distributions. Where and how many images form is then a function of to what extent the lens \(-\psi(\boldsymbol{\theta})\) can push up the \(\left|\boldsymbol{\theta}-\boldsymbol{\beta}\right|^2/2\) surface to create additional extrema in addition to the \(\boldsymbol{\theta} \approx \boldsymbol{\beta}\) one.

For axially-symmetric lenses, we discussed an example Fermat potential just after Equation \eqref{eq-gravlens-fermat} that illustrates the different possibilities. If the source position \(\boldsymbol{\beta}\) is far from the lens’ center, \(|\psi(\boldsymbol{\theta})| \ll \left|\boldsymbol{\theta}-\boldsymbol{\beta}\right|^2/2\) and there is a single image with \(\boldsymbol{\theta} \approx \boldsymbol{\beta}\). As the source gets close to the lens’ center, \(-\psi(\boldsymbol{\theta})\) increases and, if the surface density of the lens is high enough, at some point becomes large enough to create a maximum in the Fermat potential, where an additional image forms. Because for galaxy lenses, we can assume that \(\psi(\boldsymbol{\theta}) \rightarrow 0\) as \(\boldsymbol{\theta} \rightarrow \infty\), at large \(|\boldsymbol{\theta}|\) the Fermat potential approaches \(\left|\boldsymbol{\theta}-\boldsymbol{\beta}\right|^2/2\), which increases as \(\boldsymbol{\theta} \rightarrow \infty\). Therefore, aside from the minimum at \(\boldsymbol{\theta} \approx \boldsymbol{\beta}\), another minimum has to be present on the other side of the maximum and a third image has to be present. An axially-symmetric galaxy lens therefore produces either one or three images. However, as we discuss below, we may not be able to observe all three images and for a lens with a central singularity, one of the images may be absent altogether.

For lenses that do not possess axial symmetry, more complex image configurations are possible. One possible scenario is shown in the following figure, where we move a source along the line \(\beta_1 = 2\beta_2\) from an initial wide separation between source and lens center to a close alignment. We show contours of the Fermat potential for a massive elliptical galaxy centered at the origin (again modeled as a perfect ellipsoid), with the unlensed position of the source indicated by a plus sign:

```
[476]:
```

```
from galpy.potential import PerfectEllipsoidPotential
# Twice as massive as axially-symmetric example, could
# also reduce zlens to 0.3-ish
pep= PerfectEllipsoidPotential(amp=10.**12*u.Msun,
a=4*u.kpc,b=0.5,c=1.)
zlens, zsource= 0.5, 1.
betas= [[2,1],[0.8,0.4],[0.4,0.2],[0.2,0.1]]
levelss= [[21.8,22.,22.2,22.6,23.,23.4,23.8,24.2,24.6,25.,
25.5,26.5,27.5,28.5,30.,32.,34.,36.,38.,40.,43.,46.,49.],
[23.,23.2,23.4,23.6,24.,24.4,24.8,25.2,25.6,26.,
27.,28.,29.,30.,32.,34.,36.,38.,40.],
[23.4,23.6,23.8,23.9,24.,24.2,24.4,24.6,24.669,
24.8,25.,25.4,25.8,26.2,26.6,27.,27.5,28.,28.5,29.,
29.5,30.,30.5,31.,31.5],
[23.6,23.65,23.7,23.75,23.8,23.8225,23.83,23.85,
23.9,24.,24.1,24.2,24.3,24.354,24.4,24.5,24.75,25.,
25.25,25.5,26.,26.5,27.,27.5,28.,28.5,29.,29.5,30.,30.5]]
saddle_contourss= [[],[],
[24.669],[23.83,24.354]]
figsize(10,10)
for ii,(beta,levels,saddle_contours) \
in enumerate(zip(betas,levelss,saddle_contourss)):
subplot(2,2,ii+1)
if numpy.linalg.norm(beta) < 0.4:
thetas= numpy.linspace(-2.51,2.51,50)*u.arcsec
elif numpy.linalg.norm(beta) < 0.5:
thetas= numpy.linspace(-3.01,3.01,50)*u.arcsec
else:
thetas= numpy.linspace(-4.01,4.01,50)*u.arcsec
thetaxs= thetas
thetays= thetas
plotz= u.Quantity([u.Quantity(\
[fermat_potential(pep,u.Quantity(beta)*u.arcsec,
u.Quantity([thetax,thetay]),
zlens,zsource).to(u.arcsec**2)
for thetay in thetays]) for thetax in thetaxs])
levels= numpy.array(levels)
cntrlw= numpy.array([1. for l in levels])
for saddle in saddle_contours:
cntrlw[levels==saddle]= 2.
galpy_plot.dens2d(plotz.T,origin='lower',
xrange=[thetaxs[0].to_value(u.arcsec),
thetaxs[-1].to_value(u.arcsec)],
yrange=[thetays[0].to_value(u.arcsec),
thetays[-1].to_value(u.arcsec)],
xlabel=r'$\theta_1\,(\mathrm{arcsec})$',
ylabel=r'$\theta_2\,(\mathrm{arcsec})$',
justcontours=True,
levels=levels,
cntrlw=cntrlw,
gcf=True)
plot([beta[0]],[beta[1]],'+',color='#1f77b4',ms=15,mew=3)
tight_layout();
```

Again, when the source is far from the center of the lens, there is a single image with \(\boldsymbol{\theta} \approx \boldsymbol{\beta}\) (top row, where the right panel has a smaller source-lens separation and, consequently, larger distortions of the \(\left|\boldsymbol{\theta}-\boldsymbol{\beta}\right|^2/2\) contribution by the lensing potential). For a large-enough lens surface density, as the source gets closer to the lens center an additional extremum (a maximum) appears near the center of the lens (bottom left panel). Additionally, there is again a third extremum introduced by this, a saddle point that occurs where the thick contour intersects itself. In this case, there are therefore three images. As the source gets even closer to the lens center (bottom right panel), a further two extrema are introduced when the maximum is distorted into two maxima separated by another saddle point (at the intersection of the second thick contour). Thus, five images are produced.

While the exact locations and properties of the lensed images of a source depend on the details of the lensing potential, the general properties of images can be studied using methods from the mathematical field of topology. This field is concerned with properties of spaces that are invariant under, e.g., continuous deformations of the space. For example, the *genus* of a surface is the number of holes it has and is a topological quantity. A sphere has genus zero, as has any continuous
deformation of the sphere, while a donut has genus one. The number of images that form in gravitational lensing is similarly a topological quantity. Using topology, one can prove a theorem (a theorem!) about the number of images produced by a gravitational lens:

**The odd-number theorem** (Burke 1981; McKenzie 1985): *The number of images produced by a transparent, non-singular lens is odd.*

This is a very general theorem that does not, for example, require that the lensing potential increases monotonically as the distance to the lens center increases (as we assumed above for realistic galactic mass distributions). All that the theorem requires is that the lens is transparent, that is, that it does not obscure any of the images through, e.g., dust attenuation. It further requires that the lens is non-singular, meaning that the deflection angle is finite everywhere.

The odd-number theorem is the best kind of theorem in astrophysics: a mathematically-proven statement that does not seem to hold in nature! Recall the examples given at the start of this chapter, the “twin quasar” 0957+561 and the RXJ1131-1231 system, or other famous gravitational lenses like the Einstein cross. These all have an *even* number of images, two or four typically. While some lenses with three images are known (e.g., Lawrence et al. 1984; Hewitt et al. 1987), they are rare and some
lenses that are originally thought to have three images turn out to have four, with one image pair hard to resolve (e.g., the second lens system to be discovered: PG1115+08; Weymann et al. 1980; Hege et al. 1981). Thus, the assumptions behind the odd-number theorem have to be broken by most real gravitational lenses. We will come back to this later.

Other general statements about the lensed images that we can make relate to their parity—whether or not the image is flipped along two perpendicular directions—and the image magnification. These are statements about extended images and therefore require the magnification matrix from Section 16.3 above. From Equation \eqref{eq-gravlens-inverse-mag-as-d2lensing}, we can also write the inverse magnification matrix \(\mathcal{A}\) as the Hessian of the Fermat potential

and the parity and magnification of images is therefore directly determined by the curvature of the Fermat potential or, equivalently, the time-delay surface \(\Delta t_\mathrm{lensing} = D_{\Delta t}\tau(\boldsymbol{\theta};\boldsymbol{\beta})/c\) (Equation \ref{eq-gravlens-fermat-vs-timedelay}). The parity and magnification are both determined by the eigenvalues \(\lambda_r\) and \(\lambda_t\) of the magnification matrix. We say that an image has positive partial parity in one of the eigenvectors if the corresponding eigenvalue is positive and negative partial parity otherwise. If the product of the eigenvalues is positive, we say that the images has positive total parity and similar for negative total parity. Minima and maxima have positive total parity, while saddle points have negative total parity, but only a minimum has two positive partial parities. These different parity combinations are also known as Type I (for two positive partial parities, a minimum), Type II (for opposite sign partial parities, a saddle point), and Type III (for two negative partial parities, a maximum). The following then hold (Blandford & Narayan 1986):

For a non-singular, transparent lens, \(n+1\) of the \(2n+1\) images have positive total parity and \(n\) have negative total parity. For example, in the example above, we have either (i) a single minimum, (ii) a minimum, a maximum, and a saddle point, or (iii) two minima, a maximum, and two saddle points. The addition of an image pair when the surface density increases always leads to the introduction of a saddle point and either a minimum or a maximum. The sum of the total parities of all of the images is always one.

The global minimum has positive total and partial parities. For a variable source, the global minimum can be determined as that which is observed first (that is, whose variable time series is not repeated in the other images), because it is the minimum of the time delay surface.

Images that are minima have magnifications \(\mu > 1\) (Schneider 1984). This follows from the fact that they have two positive partial parities and thus \(\lambda_{_{t}^{r}}= 1-\kappa\pm\gamma > 1\) and \(|\gamma| < 1-\kappa < 1\) (because we assume that \(\Sigma > 0\)). The magnification from Equation \eqref{eq-gravlens-magnification-from-kappa-gamma} is then \(\mu = 1/|(1-\kappa)^2-\gamma^2| > 1\), because \((1-\kappa)^2-\gamma^2 < 1-\gamma^2 < 1\). Because the global minimum has to exist, there is

*always*an image with more flux than the unlensed source. This again demonstrates that gravitational lensing always increases the total flux received from background sources.Images that are closer together tend to be more highly magnified. Qualitatively, this is because images that are closer together cannot have too much curvature in the Fermat potential to go between two nearby extrema. As images get so close together that they merge, the Fermat potential has to become flat between them and the magnification goes to infinity. We will come back to this when we discuss the importance of the critical curve below.

Because we cannot see the unlensed source for lensing by galaxies, we can of course not directly observe the parity of each image, but we can determine the relative parities of all observed images. With time-delay observations, we can determine the global minimum and because we know that it has two positive partial parities, we can use it to determine the absolute parity of all images.

From these considerations, we can make the following general statements about possible image configurations:

If there is a single image, then it has to be a minimum, with positive total and partial parities and increased size/flux. This is the case for lenses that have too low surface density to allow multiple images to form, or far away from the center of a massive lens that can produce multiple images. Thus, lensing keeps the orientation of the single image intact, but it generically magnifies it.

If there are three images, one has to be a minimum and one has to be a saddle point. The remaining image can either be a minimum or a maximum. The contour through the saddle point can then have different topologies, with the first (two minima, one saddle) having the topology of the lemniscate and the second one that of the limaçon. The limaçon is the topology of the three-image saddle-contour in the example above. The lemniscate is a horizontal figure eight shape. From the relative parities, we can determine which topology a given observed lens has.

If there are five images, there are six combinations possible (Blandford & Narayan 1986). These are built from the lemniscate/limaçon building blocks, with the additional images forming from the splitting of one of the minima or maxima into three images with lemniscate or limaçon topology. There are six rather than eight topological possibilities, because the splitting of the minima in the two-minima (lemniscate) three-image configuration are equivalent to each other. In the example above, the topology is that of the minimum in the three-image limaçon splitting into a lemniscate topology with two minima and one saddle point.

More than five images is possible as well, but requires increasingly non-typical mass distributions.

### 16.4.2. Critical curves and caustics¶

For a massive galaxy that is capable of producing multiple images, an interesting question is in which regions of the source and image planes multiple images are possible. For sources at large distances from the lens center, a single image is formed at the minimum of the Fermat potential. For a non-singular lens, the lensing potential smoothly distorts the \(\left|\boldsymbol{\theta}-\boldsymbol{\beta}\right|^2/2\) contribution to the Fermat potential and as a source gets closer to the lens
center, the Fermat potential (or the time-delay surface) distorts smoothly. At some point, the source will enter the region where three images form, with either lemniscate or limaçon topology. Typically, this involves a non-extremal part of the Fermat potential surface transitioning into a pair of extrema; looked at in the other direction of going from three to one image, this process is known as the *merging* of two images, after which they disappear (as we discuss below, it is also possible
for all three images to merge simultaneously, after which a single image remains). Because the transformation of the surface into two additional extrema is smooth, it is clear that at the transition, the Fermat potential surface has to become flat in at least one direction, the direction along which the split between the two new images happens. Because the curvature matrix (the Hessian) of the Fermat potential surface is the inverse magnification matrix, these loci of zero curvature correspond
to infinite magnification. The same happens when going from three to five images and so on: the part of the Fermat potential surface where two additional images form, always has to go through a configuration where the Fermat potential surface is flat along the splitting direction and the magnification diverges there (note that infinite magnification does not actually occur in reality, because of the finite size of the source and the breakdown of the geometrical-optics approximation near
caustics; Ohanian 1983). Thus, curves of (formally) infinite magnification separate regions with different numbers of images. In the image plane, these curves are called **critical curves**, while their locations in the source plane are known as **caustics** (caustics is also more generally used for both critical curves and caustics). Image pairs that appear when a source crosses a caustic are always on
opposite sides of the corresponding critical curve, because they always have to have opposite total parity from each other.

As we saw in Section 16.3.3 above, the magnification is \(\mu = 1/|\lambda_t\lambda_r|\), where \(\lambda_{_{t}^{r}}= 1-\kappa\pm\gamma\) are the radial and tangential eigenvalues. The critical curves are therefore the loci in the image plane where either the radial or tangential eigenvalue of the inverse magnification matrix is zero. When \(\lambda_r = 0\), we speak of the *radial critical curve*, while when \(\lambda_t = 0\), we
call this the *tangential critical curve*. When images form near the radial critical curve, they are strongly deformed in the radial direction, while images that form near the tangential critical curve are strongly deformed in the tangential direction (see Equation \ref{eq-gravlens-lensed-axis-ratio}). Because gravitational-lens systems are easier to spot when the magnification is high, observed lens systems typically have images near the critical curve and if the lensed source
is an extended object, they are therefore highly distorted (see, for example, the example of RXJ1131-1231 at the beginning of this chapter).

A general method for computing the critical curves and caustics of a given lens is to compute the inverse of the magnification in the source plane and solve for the locus where it is zero. Separately determining the radial and tangential critical curves is possible by solving for where the radial or tangential eigenvalue is zero, respectively. The corresponding caustic can then be obtained using the lensing equation \eqref{eq-gravlens-lensing-eq-reduced}, which directly allows the position \(\boldsymbol{\beta}\) of the caustic to be computed from the \(\boldsymbol{\theta}\) location of the critical curve. For some lens mass distributions, we can compute the critical curve and the caustic analytically and this is useful to gain intuition about what to expect for the critical curves and caustics of realistic galaxy lenses. For example, as we demonstrated in Section 16.3.3, we can compute the eigenvalues of the inverse magnification matrix analytically for the singular-isothermal sphere and we have that \(\lambda_r = 1\) and \(\lambda_t = 1-2\kappa\). Only the tangential critical curve therefore exists, because \(\lambda_r\) never equals zero. In general, the radial critical curve is inside of the tangential critical curve. This is easy to see from the eigenvalues: \(\lambda_t = 0\) requires \(1-\kappa = \gamma > 0\) and thus, \(\kappa < 1\), while \(\lambda_r = 0\) needs \(1-\kappa = -\gamma < 0\), and therefore \(\kappa > 1\). Assuming that the surface density, and thus the convergence, increases towards the center of the lens, \(\lambda_r = 0\) therefore occurs closer to the lens center than \(\lambda_t=0\). However, for caustics the order is generally opposite, with the radial caustic occuring outside of the tangential caustic.

To further illustrate the critical curves and caustics, we can compute them for the lens mass distribution for which we displayed the Fermat potential earlier in this section. For this mass distribution, we need to calculate the critical curves and caustics numerically, and to do this we need to compute \(\kappa\) and \(\gamma\) from the second derivatives of the lensing potential. The second derivatives can be calculated by numerically integrating derivatives of the gravitational potential itself over \(z\) similar to how we computed the deflection angle in Section 16.2.1 above. With the numerically calculated values of \(\kappa(\boldsymbol{\theta})\) and \(\gamma(\boldsymbol{\theta})\), we can plot contours of where \(\lambda_{_{t}^{r}} = 0\) to show the critical curves and transform these curves to the source plane using the lensing equation to display the caustics. This is all done in the following code:

```
[578]:
```

```
from scipy import integrate
from astropy.cosmology import Planck18 as cosmo
from astropy.constants import c
from galpy.potential import evaluateRforces, evaluatephiforces, \
evaluateR2derivs, evaluatephi2derivs, evaluateRphiderivs
def convergence_and_shear(Pot,theta,zlens,zsource,zmax=numpy.inf):
DL= cosmo.angular_diameter_distance(zlens)
cosmo_fac= DL*cosmo.angular_diameter_distance_z1z2(zlens,zsource)\
/cosmo.angular_diameter_distance(zsource)
R= (numpy.sqrt(theta[0]**2.+theta[1]**2.)*DL)\
.to(u.kpc,equivalencies=u.dimensionless_angles())
R= (R/(Pot[0]._ro if isinstance(Pot,list) else Pot._ro)/u.kpc)\
.to_value(u.dimensionless_unscaled)
phi= numpy.arctan2(theta[1],theta[0])
cp, sp= numpy.cos(phi), numpy.sin(phi)
cp2, sp2, cpsp= cp**2., sp**2., cp*sp
# Need to integrate Rforce, phiforce, R2deriv, phi2deriv, and Rphideriv
int_Rforce= integrate.quad(lambda z: evaluateRforces(Pot,R,z,phi=phi,
use_physical=False),
-zmax,zmax)[0]
int_phiforce= integrate.quad(lambda z: evaluatephiforces(Pot,R,z,phi=phi,
use_physical=False),
-zmax,zmax)[0]
int_R2deriv= integrate.quad(lambda z: evaluateR2derivs(Pot,R,z,phi=phi,
use_physical=False),
-zmax,zmax)[0]
int_phi2deriv= integrate.quad(lambda z: evaluatephi2derivs(Pot,R,z,phi=phi,
use_physical=False),
-zmax,zmax)[0]
int_Rphideriv= integrate.quad(lambda z: evaluateRphiderivs(Pot,R,z,phi=phi,
use_physical=False),
-zmax,zmax)[0]
# Now compte x2deriv, y2deriv, and xyderiv
int_x2deriv= int_R2deriv*cp2-2.*int_Rphideriv*cpsp/R+int_phi2deriv*sp2/R**2.\
-int_Rforce*sp2/R-2.*int_phiforce*cpsp/R**2.
int_y2deriv= int_R2deriv*sp2+2.*int_Rphideriv*cpsp/R+int_phi2deriv*cp2/R**2.\
-int_Rforce*cp2/R+2.*int_phiforce*cpsp/R**2.
int_xyderiv= int_R2deriv*cpsp+int_Rphideriv*(cp2-sp2)/R-int_phi2deriv*cpsp/R**2.\
+int_Rforce*cpsp/R+int_phiforce*(cp2-sp2)/R**2.
# and finally the convergence and shear
kappa= 0.5*(int_x2deriv+int_y2deriv)
gamma1= 0.5*(int_x2deriv-int_y2deriv)
gamma2= int_xyderiv
gamma= numpy.sqrt(gamma1**2.+gamma2**2.)
return ((cosmo_fac*2./c**2.*kappa
*(Pot[0]._vo**2./Pot[0]._ro if isinstance(Pot,list)
else Pot._vo**2./Pot._ro)/u.kpc*u.km**2/u.s**2)\
.to_value(u.dimensionless_unscaled),
(cosmo_fac*2./c**2.*gamma
*(Pot[0]._vo**2./Pot[0]._ro if isinstance(Pot,list)
else Pot._vo**2./Pot._ro)/u.kpc*u.km**2/u.s**2)\
.to_value(u.dimensionless_unscaled))
def source_from_image(Pot,theta,zlens,zsource):
"""Compute source position from image position
using the lensing equation"""
DL= cosmo.angular_diameter_distance(zlens)
cosmo_fac= cosmo.angular_diameter_distance_z1z2(zlens,zsource)\
/cosmo.angular_diameter_distance(zsource)
R= (numpy.sqrt(theta[0]**2.+theta[1]**2.)*DL)\
.to(u.kpc,equivalencies=u.dimensionless_angles())
R= (R/(Pot[0]._ro if isinstance(Pot,list) else Pot._ro)/u.kpc)\
.to_value(u.dimensionless_unscaled)
phi= numpy.arctan2(theta[1],theta[0])
alpha= cosmo_fac*deflection(Pot,R,phi=phi)
return theta-alpha
from galpy.potential import PerfectEllipsoidPotential
# Twice as massive as axially-symmetric example, could
# also reduce zlens to 0.3-ish
pep= PerfectEllipsoidPotential(amp=10.**12*u.Msun,
a=4*u.kpc,b=0.5,c=1.)
zlens, zsource= 0.5, 1.
# Compute convergence and shear on a grid in theta1 and theta2
thetas= numpy.linspace(-3.01,3.01,44)*u.arcsec
thetaxs= thetas
thetays= thetas
kappagamma= numpy.array([[convergence_and_shear(pep,
u.Quantity([thetax,thetay]),
zlens,zsource)
for thetay in thetays] for thetax in thetaxs])
kappa= kappagamma[:,:,0]
gamma= kappagamma[:,:,1]
lambdar= 1.-kappa+gamma
lambdat= 1.-kappa-gamma
figsize(8,4)
# Critical curves
subplot(1,2,1)
ltcont= galpy_plot.dens2d(lambdat.T,origin='lower',
xrange=[thetaxs[0].to_value(u.arcsec),
thetaxs[-1].to_value(u.arcsec)],
yrange=[thetays[0].to_value(u.arcsec),
thetays[-1].to_value(u.arcsec)],
xlabel=r'$\theta_1\,(\mathrm{arcsec})$',
ylabel=r'$\theta_2\,(\mathrm{arcsec})$',
justcontours=True,cntrlw=2.,
levels=[0.],gcf=True,
retCont=True)
lrcont= galpy_plot.dens2d(lambdar.T,origin='lower',
xrange=[thetaxs[0].to_value(u.arcsec),
thetaxs[-1].to_value(u.arcsec)],
yrange=[thetays[0].to_value(u.arcsec),
thetays[-1].to_value(u.arcsec)],
xlabel=r'$\theta_1\,(\mathrm{arcsec})$',
ylabel=r'$\theta_2\,(\mathrm{arcsec})$',
justcontours=True,cntrlw=2.,cntrls=':',
levels=[0.],overplot=True,
retCont=True)
galpy_plot.text(-2.3,2.,r'$\mathrm{Tangential\ critical\ curve}$',
fontsize=15.)
plot([-1.8,-1.25],[1.8,1.2],'k-',lw=0.8)
galpy_plot.text(-2.3,-2.4,r'$\mathrm{Radial\ critical\ curve}$',
fontsize=15.)
plot([-1.6,-.55],[-1.9,-.45],'k-',lw=0.8)
annotate(r'$\mathrm{Image\ plane}$',(0.5,1.01),xycoords='axes fraction',
ha='center',va='bottom',fontsize=18.)
# Corresponding caustics
subplot(1,2,2)
for ii,cont in enumerate([ltcont,lrcont]):
# Extract critical curve contour
critx= cont.collections[0].get_paths()[0].vertices[:,0]
crity= cont.collections[0].get_paths()[0].vertices[:,1]
# Contour may consist of multiple paths
for further in cont.collections[0].get_paths()[1:]:
critx= numpy.concatenate((critx,further.vertices[:,0]))
crity= numpy.concatenate((crity,further.vertices[:,1]))
# Map to source plane using the lensing equation
beta= numpy.array([source_from_image(pep,u.Quantity([tcritx,tcrity])*u.arcsec,zlens,zsource)
for tcritx,tcrity in zip(critx,crity)])
plot(beta[:,0],beta[:,1],'k-' if ii == 0 else 'k:',lw=2.)
galpy_plot.text(-2.3,2.5,r'$\mathrm{Tangential\ caustic}$',
fontsize=15.)
plot([-1.8,-.35],[2.35,0.35],'k-',lw=0.8)
galpy_plot.text(-2.3,-2.7,r'$\mathrm{Radial\ caustic}$',
fontsize=15.)
plot([-1.6,-1.],[-2.25,-1.7],'k-',lw=0.8)
annotate(r'$\mathrm{Source\ plane}$',(0.5,1.01),xycoords='axes fraction',
ha='center',va='bottom',fontsize=18.)
xlabel(r'$\theta_1\,(\mathrm{arcsec})$')
ylabel(r'$\theta_2\,(\mathrm{arcsec})$')
xlim(thetaxs[0].to_value(u.arcsec),thetaxs[-1].to_value(u.arcsec))
ylim(thetays[0].to_value(u.arcsec),thetays[-1].to_value(u.arcsec))
tight_layout();
```

On the left we see the expected behavior in the image plane: the tangential critical curve occurs outside of the radial critical curve. However, the caustics in the source plane on the right demonstrate that the curves are reversed in the source plane: the tangential caustic is inside the radial caustic. Thus, generically what happens when we move a source closer to the lens center is the following: a source outside the radial caustic produces a single image; as the source crosses the radial
caustic, two additional images are produced and these additional images appear close to the center of the lens near the radial critical curve; as the source moves closer and crosses the tangential caustic, two more images are created and these images appear further from the lens center, near the tangential critical curve. This is the behavior seen in the Fermat potential surfaces shown at the beginning of this section when we move the source closer to the center. All critical curves and caustics
are roughly elliptically shaped, except for the tangential caustic, which instead looks like an *astroid*; this is typically the case and the tangential caustic is therefore also known as the *astroid caustic*.

The critical curves and caustics depend strongly on the shape of the lens mass distribution. Taking the same mass distribution as above, but compressing it to an axis ratio of \(0.2\) or rounding it so it is almost axially-symmetric with an axis ratio of \(0.99\), we get the following critical curves and caustics:

```
[29]:
```

```
pep2= PerfectEllipsoidPotential(amp=10.**12*u.Msun,
a=4*u.kpc,b=0.2,c=1.)
pep99= PerfectEllipsoidPotential(amp=10.**12*u.Msun,
a=4*u.kpc,b=0.99,c=1.)
zlens, zsource= 0.5, 1.
# Compute convergence and shear on a grid in theta1 and theta2
thetas= numpy.linspace(-3.01,3.01,44)*u.arcsec
thetaxs= thetas
thetays= thetas
gcf= True
for pep,color in zip([pep2,pep99],['#1f77b4','#ff7f0e']):
kappagamma= numpy.array([[convergence_and_shear(pep,
u.Quantity([thetax,thetay]),
zlens,zsource)
for thetay in thetays] for thetax in thetaxs])
kappa= kappagamma[:,:,0]
gamma= kappagamma[:,:,1]
lambdar= 1.-kappa+gamma
lambdat= 1.-kappa-gamma
figsize(8,4)
# Critical curves
subplot(1,2,1)
ltcont= galpy_plot.dens2d(lambdat.T,origin='lower',
xrange=[thetaxs[0].to_value(u.arcsec),
thetaxs[-1].to_value(u.arcsec)],
yrange=[thetays[0].to_value(u.arcsec),
thetays[-1].to_value(u.arcsec)],
xlabel=r'$\theta_1\,(\mathrm{arcsec})$',
ylabel=r'$\theta_2\,(\mathrm{arcsec})$',
justcontours=True,cntrlw=2.,cntrcolors=color,
levels=[0.],gcf=gcf,overplot=not gcf,
retCont=True)
gcf= False
lrcont= galpy_plot.dens2d(lambdar.T,origin='lower',
xrange=[thetaxs[0].to_value(u.arcsec),
thetaxs[-1].to_value(u.arcsec)],
yrange=[thetays[0].to_value(u.arcsec),
thetays[-1].to_value(u.arcsec)],
xlabel=r'$\theta_1\,(\mathrm{arcsec})$',
ylabel=r'$\theta_2\,(\mathrm{arcsec})$',
justcontours=True,cntrlw=2.,cntrls=':',
levels=[0.],overplot=True,cntrcolors=color,
retCont=True)
annotate(r'$\mathrm{Image\ plane}$',(0.5,1.01),xycoords='axes fraction',
ha='center',va='bottom',fontsize=18.)
# Corresponding caustics
subplot(1,2,2)
for ii,cont in enumerate([ltcont,lrcont]):
# Extract critical curve contour
critx= cont.collections[0].get_paths()[0].vertices[:,0]
crity= cont.collections[0].get_paths()[0].vertices[:,1]
# Contour may consist of multiple paths
for further in cont.collections[0].get_paths()[1:]:
critx= numpy.concatenate((critx,further.vertices[:,0]))
crity= numpy.concatenate((crity,further.vertices[:,1]))
# Map to source plane using the lensing equation
beta= numpy.array([source_from_image(pep,u.Quantity([tcritx,tcrity])*u.arcsec,zlens,zsource)
for tcritx,tcrity in zip(critx,crity)])
plot(beta[:,0],beta[:,1],ls='-' if ii == 0 else ':',lw=2.,color=color)
subplot(1,2,2)
annotate(r'$\mathrm{Source\ plane}$',(0.5,1.01),xycoords='axes fraction',
ha='center',va='bottom',fontsize=18.)
xlabel(r'$\theta_1\,(\mathrm{arcsec})$')
ylabel(r'$\theta_2\,(\mathrm{arcsec})$')
xlim(thetaxs[0].to_value(u.arcsec),thetaxs[-1].to_value(u.arcsec))
ylim(thetays[0].to_value(u.arcsec),thetays[-1].to_value(u.arcsec))
tight_layout();
```

For the near-axially-symmetric case, we see that all curves become near-circles; this makes intuitive sense, because there is no preferred direction in an axially-symmetric lens. It remains the case that the radial critical curve is inside of the tangential critical curve no matter the shape of the lens, but in the axially-symmetric case, the tangential caustic becomes a point. This is a general property of axially-symmetric lenses. Thus, sources never cross the tangential caustic for
axially-symmetric lenses and only three images can be produced (as we already argued above). However, the relevance of the tangential critical curve is that it is the Einstein ring for a source that is right behind the lens. Such a source is lensed into an entire ring consisting of the tangential critical curve. For the strongly-flattened lens, the critical curves are also highly flattened, in the same direction as the mass distribution (we flattened it along \(\theta_2\)). The tangential
caustic also stretches in the same direction as the mass distribution, but the radial caustic elongates in the perpendicular direction. The tangential caustic also starts to encroach on the radial one and they meet along \(\theta_2\). This means that along the \(\theta_2\) direction, there is a transition from a single image to five images! For highly-flattened lenses, the tangential caustic can even extend beyond the radial caustic, a configuration that is known as a
*naked cusp* (Keeton & Kochanek 1998). The significance of naked-cusp lenses is that these can produce three images of a source on the same side of the lens; when a source crosses the radial caustic first, the two additional images generally lie on the opposite side of the lens as the original image (as in the example at the beginning of this section).

From the discussion so far, it is clear that what happens to sources near the caustics is highly important in studies of gravitational lensing, because observational biases cause high-magnification lens systems to be preferentially found in observational data. Again, it is possible to make very general statements about the behavior near caustics, because the universal aspects of this behavior are described by the mathematical theory of catastrophes. We will only briefly describe important
aspects of this for the study of gravitational lensing. Catastrophe theory is the general theory of dynamical systems that display qualitative changes in their behavior under small changes to the input parameters, called “control parameters”. For a gravitational lens of a fixed mass distribution and with the source and lens at fixed redshifts, there are two control parameters, the two-dimensional location \(\boldsymbol{\beta}\) of the source in the source plane. Fundamentally, there are only
two basic catastrophes for a system with two control parameters: the *fold* and the *cusp*. The fold is the form of both the radial and astroid caustic, except in the corners of the astroid, where the catastrophe has the form of a cusp. The difference between a fold and a cusp is that when we move across a fold caustic going from the region of \(n\) to
\(n-2\) images (e.g., \(n=3\) or \(n=5\) in the examples here), two images approach the corresponding critical curve and merge and disappear as the source crosses the fold caustic. The remaining images are far removed from this location in the image plane. On the other hand, when a source crosses the cusp caustic going from the region of \(n\) to \(n-2\) images, three images approach each other along the critical curve. When they merge as the source crosses the cusp caustic,
they combine into a single image (but note that there may be other pairs of images in the system, if \(n>3\)). Because this single image lies close to the critical curve, it is highly magnified. A source near a cusp caustic is therefore a way to create highly magnified isolated images (highly magnified images near a fold come in pairs).

The example at the beginning of this section showed a source crossing fold caustics. It is clear from that figure that pairs of images are created where there was no image before as the source moves across the radial and astroid caustics closer to the center. As an example of a source crossing a cusp, we can take the same massive elliptical galaxy centered at the origin, but now moving the source closer and closer to the lens center along the \(\theta_2 = 0\) direction, where the astroid caustic has a cusp:

```
[711]:
```

```
from galpy.potential import PerfectEllipsoidPotential
# Twice as massive as axially-symmetric example, could
# also reduce zlens to 0.3-ish
pep= PerfectEllipsoidPotential(amp=10.**12*u.Msun,
a=4*u.kpc,b=0.5,c=1.)
zlens, zsource= 0.5, 1.
betas= [[2,0],[0.8,0.],[0.4,0.],[0.1,0.]]
levelss= [numpy.arange(38,60,1),
[39.,39.25,39.5,40.,40.5,41.,41.5,41.895,42.,42.5,
43.,44.,45.,46.,47.,48.,49.,50.,51.,52.,53.],
[39.63,39.65,39.6875,39.8,40.,40.25,40.5,40.75,41.,
41.104,41.25,41.5,41.85,42.,42.5,43.,43.5,
44.,44.5,45.,45.5],
[39.85,39.875,40.,40.1,40.1932,40.25,40.37,
40.5,40.545,40.75,41.,41.5,42.,42.5]]
saddle_contourss= [[],[41.895],
[39.6875,41.104],[40.1932,40.545]]
figsize(10,10)
for ii,(beta,levels,saddle_contours) \
in enumerate(zip(betas,levelss,saddle_contourss)):
subplot(2,2,ii+1)
if numpy.linalg.norm(beta) < 0.4:
thetas= numpy.linspace(-2.51,2.51,50)*u.arcsec
elif numpy.linalg.norm(beta) < 0.5:
thetas= numpy.linspace(-3.01,3.01,50)*u.arcsec
else:
thetas= numpy.linspace(-4.01,4.01,50)*u.arcsec
thetaxs= thetas
thetays= thetas
plotz= u.Quantity([u.Quantity(\
[fermat_potential(pep,u.Quantity(beta)*u.arcsec,
u.Quantity([thetax,thetay]),
zlens,zsource).to(u.arcsec**2)
for thetay in thetays]) for thetax in thetaxs])
levels= numpy.array(levels)
cntrlw= numpy.array([1. for l in levels])
for saddle in saddle_contours:
cntrlw[levels==saddle]= 2.
galpy_plot.dens2d(plotz.T,origin='lower',
xrange=[thetaxs[0].to_value(u.arcsec),
thetaxs[-1].to_value(u.arcsec)],
yrange=[thetays[0].to_value(u.arcsec),
thetays[-1].to_value(u.arcsec)],
xlabel=r'$\theta_1\,(\mathrm{arcsec})$',
ylabel=r'$\theta_2\,(\mathrm{arcsec})$',
justcontours=True,
levels=levels,
cntrlw=cntrlw,
gcf=True)
plot([beta[0]],[beta[1]],'+',color='#1f77b4',ms=15,mew=3)
tight_layout();
```

Again, when the source is far from the lens center, a single image is produced (top left panel) and as the source crosses the fold radial caustic, a pair of images appears on the opposite side of the lens (top right panel). However, when the source crosses the cusp astroid caustic, the single image on the left splits into three images (with limaçon topology, two minima and a saddle point; bottom left panel). Thus, we have an image configuration with three images on one side of the lens, an image
on the other side of the lens, and a final image near the center of the lens. This is similar to the image morphology of the RXJ1131-1231 lens system displayed at the beginning of this chapter (at least if the central image is ignored; see below). As the source gets closer to the center, the two minima move towards larger \(|\theta_2|\) and smaller \(|\theta_1|\) and as the source gets very close to center, the images appear on four sides of the lens, a configuration similar to that of
the famous “Einstein cross” (again, ignoring the central image; Huchra et al. 1985). Lensing close to the center of the astroid caustic is sometimes referred to as the *cross* configuration, with the four basic strong lensing configurations then being fold, cusp, cross, and Einstein ring.

Because lensing near a fold or a cusp caustic produces the highly-magnified images that are preferentially detected in flux-limited surveys of galaxies (Turner et al. 1984), it is useful to understand lensing near these catastrophes. One general set of results that can be derived for lensing near folds or cusps has to do with the relative magnifications of the three images involved in the crossing of the caustic near a fold or cusp. For smooth lens mass distributions (that is, such that the lens does not significantly change around the caustic), the following relations should hold (Blandford & Narayan 1986; Schneider & Weiss 1992):

*Lensing near a fold*: the magnifications of the two images \(A\) and \(B\) that appear as a source crosses a fold caustic are equal: \(\mu_A = \mu_B\);*Lensing near a cusp*: the magnification of the central image \(A\) is equal to the sum of the magnifications of the flanking images \(B\) and \(C\): \(\mu_A = \mu_B + \mu_C\).

These two relations can be succinctly expressed if we give the magnification the sign of the total parity (such that the magnification can be positive or negative). In that case, both relations become \(\sum_i \mu_i = 0\): the sum of the signed magnifications equals zero near a fold or a cusp. Even though it is difficult to obtain absolute magnifications, we can determine relative magnifications by comparing the fluxes of different images. As with the odd-number theorem, these relations are
*also* not typically observed to hold for observed lenses. This could be due to intrinsic variability of the source, because then the relative time delays mean that we are not observing the same intrinsic flux for all images. Or the assumption that the lens is smooth could be broken, for example, because of microlensing by stars in the lens (if the source is so small that it is susceptible to microlensing, see Section 16.2.3) or if the mass
distribution is not smooth due to substructure in the dark matter distribution. If all other possibilities can be excluded, so-called **flux-ratio anomalies** become a sensitive and robust method for determining the clustering of dark matter on small scales and constraining the microphysics of dark matter (Mao & Schneider 1998; Metcalf & Zhao 2002; Dalal & Kochanek 2002; Gilman et al. 2020).

### 16.4.3. Lensing by cuspy mass distributions¶

We now understand many of the general characteristics of where lensed images appear and what their basic properties are. But we haven’t yet addressed why the most basic general result in lensing does not seem to hold: the odd-number theorem appears to be in conflict with almost all observed lens systems! The basic reason for this (and what it helps demonstrate) is that the cores of galaxies are very dense and indeed, the density distribution in the cores of galaxies may be singular. For example, as we discussed in Chapter 13.1, the cores of elliptical galaxies range from near-isothermal, with stellar densities \(\rho(r) \propto r^{-\alpha}\) with \(0 \lesssim \alpha \lesssim 1\) to strong cusps with \(1 \lesssim \alpha \lesssim 2\). The dark-matter halo is similarly believed to have \(\rho(r) \propto 1/r\). As we discussed above, for singular lens mass distributions, the odd-number theorem does not hold, because for singular lenses, the image that would appear near the lens center is instead a defect in the Fermat potential. The relevant singularity is that of the deflection angle rather than the density: the odd-number theorem holds for singular densities that produce finite deflection angles everywhere, but not for singular deflection angles. Thus, for \(\rho(r) \propto r^{-\alpha}\), shallow cusps have \(0 \lesssim \alpha \lesssim 1\) and the odd-number theorem holds, while for strong cusps with \(1 \lesssim \alpha \lesssim 2\) the odd-number theorem fails and we expect an even number of images (this is the same distinction between shallow and non-shallow cusps that was relevant for the orbital structure and self-consistent shape of triaxial models in Chapter 14.4.2).

If lenses aren’t actually singular, but just very dense (as must be the case for real systems), the odd-number theorem should hold, but it will appear to be broken. This is because for a very dense lens, the image that forms near the center of the lens is highly demagnified. For a dense lens, the lensing potential is deep and highly curved and one of the images that forms beyond the first is a maximum near the center of the lens with high curvature of the Fermat potential surface. Because the
magnification is the inverse of the curvature, this image is highly demagnified and therefore difficult to observe. Moreover, because it near-coincides with the lens, the lensing galaxy itself can obscure the image further (thus breaking the other assumption in the odd-number theorem: the transparency of the lens). Thus, whether because of a true singularity or simply a dense galaxy core, we expect to not see the central image and thus to observe an *even* number of images.

Beyond the central image, lensing in singular or high-density lenses is similar to that in non-singular, lower-density lenses such as the examples that we considered above. For example, we can move a source towards the center of a lens modeled as a flattened, cuspy Hernquist density along a fold astroid caustic and this produces a similar sequence of images as in the cored perfect-ellipsoid example above. The Hernquist’s central behavior is such that the deflection angle is finite everywhere and there are therefore an odd number of images:

```
[410]:
```

```
from galpy.potential import TriaxialHernquistPotential
# Twice as massive as axially-symmetric example, could
# also reduce zlens to 0.3-ish
pep= TriaxialHernquistPotential(amp=3*10.**12*u.Msun,
a=4*u.kpc,b=0.5,c=1.)
zlens, zsource= 0.5, 1.
betas= [[2,0],[0.6,0.],[0.2,0.],[0.05,0.]]
levelss= [numpy.arange(16.5,41.,1.),
sorted(numpy.concatenate(([18.6566,18.75],
numpy.arange(17.5,18.,.3),
numpy.arange(18.,33.,1.)))),
sorted(numpy.concatenate(([17.735,17.7657,18.18675],
numpy.arange(17.7,18.,.05),
numpy.arange(18.,22.4,.3)))),
sorted(numpy.concatenate(([17.9172,18.02335],
numpy.arange(17.8,17.9,.05),
numpy.arange(18.1,22.,.3))))]
saddle_contourss= [[],[18.6566],
[17.7657,18.18675],[17.9172,18.02335]]
figsize(10,10)
for ii,(beta,levels,saddle_contours) \
in enumerate(zip(betas,levelss,saddle_contourss)):
subplot(2,2,ii+1)
if numpy.linalg.norm(beta) < 0.4:
thetas= numpy.linspace(-2.51,2.51,50)*u.arcsec
elif numpy.linalg.norm(beta) < 0.5:
thetas= numpy.linspace(-3.01,3.01,50)*u.arcsec
else:
thetas= numpy.linspace(-4.01,4.01,50)*u.arcsec
thetaxs= thetas
thetays= thetas
plotz= u.Quantity([u.Quantity(\
[fermat_potential(pep,u.Quantity(beta)*u.arcsec,
u.Quantity([thetax,thetay]),
zlens,zsource).to(u.arcsec**2)
for thetay in thetays]) for thetax in thetaxs])
levels= numpy.array(levels)
cntrlw= numpy.array([1. for l in levels])
for saddle in saddle_contours:
cntrlw[levels==saddle]= 2.
galpy_plot.dens2d(plotz.T,origin='lower',
xrange=[thetaxs[0].to_value(u.arcsec),
thetaxs[-1].to_value(u.arcsec)],
yrange=[thetays[0].to_value(u.arcsec),
thetays[-1].to_value(u.arcsec)],
xlabel=r'$\theta_1\,(\mathrm{arcsec})$',
ylabel=r'$\theta_2\,(\mathrm{arcsec})$',
justcontours=True,
levels=levels,
cntrlw=cntrlw,
gcf=True)
plot([beta[0]],[beta[1]],'+',color='#1f77b4',ms=15,mew=3)
tight_layout();
```

Considering a flattened version of the singular-isothermal sphere instead, which has \(\rho(r) \propto 1/r^2\), we still get a similar sequence, but the image that appears in the Hernquist and perfect-ellipsoid lenses near the center is absent. The following set of source positions that cross the fold caustic as in the first example in this section illustrates this:

```
[186]:
```

```
from galpy.potential import PowerTriaxialPotential
pep= PowerTriaxialPotential(amp=5*10.**10*u.Msun,alpha=2.01,b=0.5,c=1.)
zlens, zsource= 0.5, 1.
betas= [[1,0.5],[0.5,0.25],[0.3,0.15],[0.08,0.04]]
levelss= [numpy.arange(4982,5000,0.5),
sorted(numpy.concatenate(([4985.26],
numpy.arange(4983.7,4986.5,0.25),
numpy.arange(4986.5,4996.5,0.5)))),
sorted(numpy.concatenate(([4984.26,4984.265,4984.983],
numpy.arange(4983.9,4984.5,0.1),
numpy.arange(4984.6,4987.5,0.2)))),
sorted(numpy.concatenate(([4984.4908,4984.69],
numpy.arange(4984.1,4984.5,0.075),
numpy.arange(4984.6,4987.,0.2))))]
saddle_contourss= [[],[4985.26],
[4984.265,4984.983],[4984.4908,4984.69]]
figsize(10,10)
for ii,(beta,levels,saddle_contours) \
in enumerate(zip(betas,levelss,saddle_contourss)):
subplot(2,2,ii+1)
if numpy.linalg.norm(beta) < 0.4:
thetas= numpy.linspace(-2.51,2.51,50)*u.arcsec
elif numpy.linalg.norm(beta) < 0.5:
thetas= numpy.linspace(-3.01,3.01,50)*u.arcsec
else:
thetas= numpy.linspace(-4.01,4.01,50)*u.arcsec
thetaxs= thetas
thetays= thetas
plotz= u.Quantity([u.Quantity(\
[fermat_potential(pep,u.Quantity(beta)*u.arcsec,
u.Quantity([thetax,thetay]),
zlens,zsource,zmax=50.).to(u.arcsec**2)
for thetay in thetays]) for thetax in thetaxs])
levels= numpy.array(levels)
cntrlw= numpy.array([1. for l in levels])
for saddle in saddle_contours:
cntrlw[levels==saddle]= 2.
galpy_plot.dens2d(plotz.T,origin='lower',
xrange=[thetaxs[0].to_value(u.arcsec),
thetaxs[-1].to_value(u.arcsec)],
yrange=[thetays[0].to_value(u.arcsec),
thetays[-1].to_value(u.arcsec)],
xlabel=r'$\theta_1\,(\mathrm{arcsec})$',
ylabel=r'$\theta_2\,(\mathrm{arcsec})$',
justcontours=True,
levels=levels,
cntrlw=cntrlw,
gcf=True)
plot([beta[0]],[beta[1]],'+',color='#1f77b4',ms=15,mew=3)
tight_layout();
```

Note that what appears to be an extremum near the center in the top right and bottom panels is in fact a defect without an image. The kink in the contour near the center in the top left panel shows the effect of the defect.

While for the flattened Hernquist lens, the critical curves and caustics are qualitatively similar to that of the perfect ellipsoid, for the flattened singular-isothermal sphere the situation is more complex. As we saw in Equation \eqref{eq-gravlens-sis-gamma-kappa}, for the axially-symmetric singular isothermal sphere, \(\lambda_r = 1\) and it therefore does not have a radial critical curve or caustic. This continues to be the case for the flattened singular-isothermal sphere (Kormann et al. 1994). The tangential critical curve and caustic, however, are similar as before:

```
[240]:
```

```
from galpy.potential import PowerTriaxialPotential
# Similar surface density as previous models
pep= PowerTriaxialPotential(amp=5*10.**10*u.Msun,alpha=2.01,b=0.5,c=1.)
zlens, zsource= 0.5, 1.
# Compute convergence and shear on a grid in theta1 and theta2
thetas= numpy.linspace(-3.01,3.01,44)*u.arcsec
thetaxs= thetas
thetays= thetas
kappagamma= numpy.array([[convergence_and_shear(pep,
u.Quantity([thetax,thetay]),
zlens,zsource)
for thetay in thetays] for thetax in thetaxs])
kappa= kappagamma[:,:,0]
gamma= kappagamma[:,:,1]
lambdar= 1.-kappa+gamma
lambdat= 1.-kappa-gamma
figsize(8,4)
# Critical curves
subplot(1,2,1)
ltcont= galpy_plot.dens2d(lambdat.T,origin='lower',
xrange=[thetaxs[0].to_value(u.arcsec),
thetaxs[-1].to_value(u.arcsec)],
yrange=[thetays[0].to_value(u.arcsec),
thetays[-1].to_value(u.arcsec)],
xlabel=r'$\theta_1\,(\mathrm{arcsec})$',
ylabel=r'$\theta_2\,(\mathrm{arcsec})$',
justcontours=True,cntrlw=2.,
levels=[0.],gcf=True,
retCont=True)
galpy_plot.text(-2.3,2.,r'$\mathrm{Tangential\ critical\ curve}$',
fontsize=15.)
plot([-1.8,-1.05],[1.8,1.05],'k-',lw=0.8)
annotate(r'$\mathrm{Image\ plane}$',(0.5,1.01),xycoords='axes fraction',
ha='center',va='bottom',fontsize=18.)
# Corresponding caustics
subplot(1,2,2)
for ii,cont in enumerate([ltcont]):
# Extract critical curve contour
critx= cont.collections[0].get_paths()[0].vertices[:,0]
crity= cont.collections[0].get_paths()[0].vertices[:,1]
# Contour may consist of multiple paths
for further in cont.collections[0].get_paths()[1:]:
critx= numpy.concatenate((critx,further.vertices[:,0]))
crity= numpy.concatenate((crity,further.vertices[:,1]))
# Map to source plane using the lensing equation
beta= numpy.array([source_from_image(pep,u.Quantity([tcritx,tcrity])*u.arcsec,zlens,zsource)
for tcritx,tcrity in zip(critx,crity)])
plot(beta[:,0],beta[:,1],'k-' if ii == 0 else 'k:',lw=2.)
galpy_plot.text(-2.3,2.,r'$\mathrm{Tangential\ caustic}$',
fontsize=15.)
plot([-1.8,-.35],[1.85,0.35],'k-',lw=0.8)
annotate(r'$\mathrm{Source\ plane}$',(0.5,1.01),xycoords='axes fraction',
ha='center',va='bottom',fontsize=18.)
xlabel(r'$\theta_1\,(\mathrm{arcsec})$')
ylabel(r'$\theta_2\,(\mathrm{arcsec})$')
xlim(thetaxs[0].to_value(u.arcsec),thetaxs[-1].to_value(u.arcsec))
ylim(thetays[0].to_value(u.arcsec),thetays[-1].to_value(u.arcsec))
tight_layout();
```

Because there is only a single caustic, you might think that there is only a transition between one and two images for the flattened singular-isothermal sphere (two, because the odd-number theorem does not hold). However, in singular lenses, there is in fact a region similar to that encompassed by the radial caustic in the previous examples where there are two images, and passing the tangential caustic then creates four images (this is what happens in the Fermat-potential example above). The
curve that separates the single-image region from the two-image region is known as the *cut* (Kovner 1987). Softening the singularity, the cut becomes the radial caustic. The cut can therefore be computed by softening the singularity (e.g., by replacing \(x^2 + y^2/b^2\) by \(x^2 + y^2/b^2 + R_c^2\)) and obtaining the cut as the radial caustic in the limit of zero softening (\(R_c \rightarrow 0\)). It is clear from this discussion that
aside from the absence of the central image, lensing by singular mass distributions is essentially the same as that in non-singular lenses.