14.3. Schwarzschild modeling

\label{sec-ellipequil-schwarzschild}

Schwarzschild modeling is a powerful way to model the equilibrium structure of elliptical galaxies and triaxial mass distributions. While it is certainly possible to write down simple distribution function models for oblate elliptical galaxies much like we did for disk galaxies in Chapter 10, the complex orbital structure of many elliptical galaxies makes it difficult to create realistic models in this way. Moreover, coming up with simple triaxial models is difficult and only possible in very idealized cases. For this reason, it was unclear for a while after the first observational indications that many elliptical galaxies may be triaxial (which we discussed at the start of Chapter 12), whether stable, triaxial mass configurations are even possible in the first place. To resolve this question, Martin Schwarzschild developed a general, highly-flexible method for obtaining equilibrium models for very general mass distributions. In the 40 years since it appeared, this method has become widely used to study galactic equilibria and to interpret observational data.

The steady-state models for spherical and disk mass distributions that we have discussed in Chapters 5 and 10 have all been based on the Jeans theorem (Chapter 5.5): that any solution of the equilibrium collisionless Boltzmann equation only depends on the phase-space coordinates through the integrals of motion. Because of this theorem, we have built equilibrium models by positing that the distribution function is a function of one or more of the integrals of the motion (such as specific energy and specific angular momentum). We then either solved for such a function that corresponds to a given density profile (e.g., exactly for the case of the Eddington inversion for ergodic spherical systems of Chapter 5.6.1 or approximately for the disk distribution functions of Chapter 10.2.2) . Or we simply chose a form that seems reasonable and computed the resulting density distribution (like in the case of the isothermal sphere of Chapter 5.6.2). Another way of phrasing the Jeans theorem, however, is that the distribution function cannot depend on the phase along the orbit of a body, it can only change from orbit to orbit. Therefore, we can phrase the Jeans theorem in terms of orbits as

Orbit-based Jeans theorem: Any function that is constant along orbits is a solution of the equilibrium collisionless Boltzmann equation. Furthermore, any solution of the equilibrium collisionless Boltzmann equation is constant along orbits.

The first part of this theorem results from Liouville’s theorem (Equation 5.34): the distribution function does not change as a phase-space position orbits; therefore, if the distribution function is constant along the orbit at \(t=0\), it will be constant along the orbit at all times. The second part follows from the regular Jeans theorem: the regular Jeans theorem demonstrates that the distribution function is itself an integral of the motion; therefore, it is constant along orbits.

The orbit-based Jeans theorem therefore suggest another way of constructing equilibrium mass models: rather than positing a function of the integrals of motion (which are difficult to compute for general mass models), we can posit a function of the orbits themselves. This is a much more flexible way to create equilibrium mass models, because even when integrals of the motion are difficult to obtain (e.g., for a triaxial distribution), it is straightforward to numerically obtain orbits in any gravitational potential. This fundamental insight was first proposed by Schwarzschild (1979), who used it to obtain the first model for a triaxial stellar system in dynamical equilibrium.

In this second part of the chapter, we discuss the general approach of Schwarzschild modeling, illustrate its use in a simple one-dimensional problem, and then discuss more detailed aspects of its application to full three-dimensional equilibrium models of real galaxies. Applications of Schwarzschild modeling are discussed further in Chapter 16.

14.3.1. General approach

Schwarzschild modeling was first used to create a dynamical model for a given general (e.g., triaxial) density distribution and we will describe this type of modeling first before describing how to incorporate constraints on the solution beyond the density. Given an equilibrium mass distribution \(\rho(\vec{x})\), we can obtain the gravitational potential \(\Phi(\vec{x})\) by solving the Poisson equation (e.g., for a triaxial system using the approach described in Chapter 12) . We then integrate a large number \(N\) of orbits \(i\) in this gravitational potential; each orbit is a time series of phase-space coordinates \(\vec{w}_i(t_k) = [\vec{x}_i(t_k),\vec{v}_i(t_k)]\) sampled at \(K\) regularly-spaced times \(t_k\). Given the time series, we can then determine each orbit’s contribution to the density at \(L\) points \(\vec{x}_j\) indexed by \(j\), for example, by making a histogram \(n_{ij}\) of the set of \(K\) \(\vec{x}_i(t_k)\) (approximating the density as the number of points \(k\) where \(\vec{x}_i(t_k)\) falls within a small volume \(\delta V_j\) around \(\vec{x}_j\)).

We built a distribution function out of the \(N\) orbits by giving each orbit a weight \(w_i\); the total mass associated with a given orbit is then \(m_i = w_i\,M\), where \(M\) is the total mass of the galaxy. The total density at point \(\vec{x}_j\) corresponding to a set of weights \(w_i\) is then given by \(\tilde{\rho}_j = \sum_i w_i\,{n_{ij} \over K}\,{M \over\delta V_j}\). With \(p_{ij} \equiv n_{ij} / K\), the fraction of time an orbit spends in the volume near \(\vec{x}_j\), and \(\delta \rho_j \equiv M / \delta V_j\), the basic density discretization, this becomes \begin{equation} \tilde{\rho}_j = \delta \rho_j \sum_i w_i\,p_{ij}\,. \end{equation} This model density can then be compared to the true density \(\rho(\vec{x}_j)\) and a perfect match is obtained when \begin{equation}\label{eq-basic-schwarzschild-constraint} \rho(\vec{x}_j) = \tilde{\rho}_j = \delta \rho_j \sum_i w_i\,p_{ij}\,. \end{equation} For a given target density profile and a given set of orbits, this is a set of \(L\) linear equations in the unknown weights \(w_i\) that could be solved through standard techniques for solving linear sets of equations. In principle, it is therefore possible to use \(N = L\) orbits and solve this equation. However, a further constraint is necessary to obtain a physical distribution function, namely, that \(\forall i: w_i \geq 0\); all weights have to be positive or zero. This constraint will typically not be satisfied when \(N = L\): to exactly match the density at \(L\) points with \(L\) orbits, it will be necessary to include some negatively-weighted orbits and these are not physical configurations. It is therefore necessary to use \(N \gg L\) orbits, in which case the system of equations (14.51) becomes underdetermined and an infinite \(N-L\) dimensional subspace of solutions is obtained. As long as this subspace overlaps with the part of space where \(\forall i: w_i \geq 0\), a physical solution is possible, but now we have infinitely many such solutions! To pick a specific solution requires specifying another constraint on the set of weights that corresponds to a desirable property of the solution: we specify some additional objective function \(S(\vec{w})\) that we want to maximize. Therefore, solving for the distribution function requires us to find the set of weights \(\vec{w} = w_i\) that solves \begin{align} \mathrm{Maximize}\ S(\vec{w}) \ \mathrm{subject\ to} & \begin{cases} \forall j: \tilde{\rho}_j = \delta \rho_j \sum_i w_i\,p_{ij}\,.\\ \forall i: w_i \geq 0\end{cases}\label{eq-schwarzschild-basicprogramming} \end{align} When the mass of the galaxy is known, an additional constraint is that \(\sum_i w_i = 1\).

While mass modeling of complex galactic distributions is a very esoteric subject, optimization problems of the sort (14.52) are very commonly encountered throughout science and engineering and, in particular, in many business settings (for example, Fedex and other courier services need to solve instances of this problem daily to figure out the optimal delivery schedule for packages). Because of this, highly efficient open-source and commercial software packages exist to solve this class of problem (some quite pricey!) and even very large systems can be solved quickly. In general, the class of optimization problems (14.52) is known as non-linear programming, with linear and quadratic programming corresponding to objective functions \(S(\vec{w})\) that are linear or quadratic, respectively, in the weights and for which special solution methods exist.

When Schwarzschild (1979) first proposed this method, he applied it to find a steady-state dynamical model for a triaxial version of a modified Hubble profile, which in the spherical case has \(\rho(r) \propto (1+r^2)^{-3/2}\). Because Schwarzschild only wanted to demonstrate the existence of any equilibrium distribution function for this triaxial system, he did not care greatly about the properties of the weights and therefore chose a simple linear objective function \(S(\vec{w})\), for which the solution of the problem (14.52) is simple. However, using a linear objective function tends to lead to solutions with very spiky behavior, that is, where many weights are zero while only a small subset is non-zero, and the resulting distribution function is therefore not very smooth (and, thus, likely not very realistic for a real galaxy). The reason for this is that in many cases, minimizing a linear objective function over the weights is equivalent to minimizing the number of non-zero weights (Candes et al. 2006), an insight that has turned out to be highly productive in signal processing and many other modeling contexts (including in astrophysics). However, for Schwarzschild modeling of galaxies, this is not a desirable property (an example is shown in the next section).

Non-linear objective functions are therefore preferred to create smoother distribution functions. Because quadratic objective functions lead to their own class of efficient solutions of the optimization problem (14.52), a quadratic objective function can be used. For example, \(S(\vec{w}) = -\sum_i w_i^2/W_i\) (optimizing this under the constraint \(\sum_i w_i = 1\) for a galaxy with known mass leads to \(w_i \propto W_i\) and the \(W_i\) are therefore a sort of prior assumption about what the weights should be). A popular, but fully non-linear objective function is the entropy, \(S(\vec{w}) = -\sum_i w_i\,\ln w_i\), which will tend to create smoother distribution functions. However, fully non-linear programming problems are harder to solve. An example of this is also shown below.

So far, we have only required that the density resulting from the weighted combination of orbits reproduces a target density. However, we can also incorporate constraints on the velocity structure of the model, because we have access to the velocities along each orbit. We can use the velocities, for example, to compute the mean velocity or the velocity dispersion in a volume around \(\vec{x}_j\), using moments computed using the orbit weights. For instance, we could attempt to create a model that is as close to isotropy as possible by including an objective function of the form \begin{align} S_\mathrm{iso}(\vec{w}) = -\sum_l \left[\left(1-{\sigma_\theta^2(\vec{x}_l) + \sigma_\phi^2(\vec{x}_l) \over 2\,\sigma_r^2(\vec{x}_l)}\right)^2 + \left(1-{\sigma_\theta^2(\vec{x}_l) \over \sigma_\phi^2(\vec{x}_l)}\right)^2\right]\,, \end{align} with the velocity dispersions computed at a set of points \(\vec{x}_l\), because minimizing this function would minimize differences between the velocity dispersions (the first term is the anisotropy for spherical systems that we defined in Equation 5.46). Similar objective functions could be used to obtain a certain anisotropy profile, to add overall rotation to the model, or to include constraints on higher-order moments of the velocity distribution. Because these are highly non-linear objective functions, solving the optimization problem becomes harder.

A second class of velocity constraints are observational ones. When Schwarzschild modeling is used to interpret observational data on the surface density and velocity structure of galaxies, the model’s velocity structure is matched to the observed structure using a chi-squared or likelihood-based objective function, \(S(\vec{w}) = -\chi^2\) or \(S(\vec{w}) = \ln \mathcal{L}\), where \(\chi^2\) or \(\ln \mathcal{L}\) is an expression that depends on the exact data obtained and its uncertainties. We discuss this in more detail in Section 14.3.3 below.

14.3.2. A worked example in one dimension

\label{sec-ellipequil-1dschwarzschild}

To illustrate the general concept of Schwarzschild modeling and its main ingredients, and to demonstrate how well it works, we apply it in this section to a simple one-dimensional problem. We consider the problem of determining the equilibrium distribution function of a constant density slab of matter, for which the gravitational potential is that of a harmonic oscillator. This makes for a great example, because orbit integration in this potential is analytic (see, e.g., Chapter 4.2.1) . As for any one-dimensional system, the correct distribution function for this density profile is easy to compute analytically (see Chapter 10.4.3 and below), and we can therefore straightforwardly check whether Schwarzschild modeling recovers the correct distribution function. Because the system is only one-dimensional, in practice running the Schwarzschild modeling procedure is very fast and can easily be done on any modern computer. However, keep in mind that one-dimensional modeling differs qualitatively from three-dimensional modeling, in that many of the practical issues with Schwarzschild modeling in three dimensional systems that we will discuss below, do not occur in one dimension or are easy to solve (such as, finding all the orbits, deciding how long to integrate, chaotic orbits, etc.).

We consider orbits in a one-dimensional harmonic-oscillator potential \begin{equation} \Phi(x) = \frac{\omega^2 x^2}{2}\, \end{equation} where \(\omega\) is the frequency of the oscillator. All orbits in this potential are analytic and given by \begin{align} x(t) & = A_0\,\cos(\omega t + \phi_0)\,;\quad v(t) = -A_0\,\omega\,\sin(\omega t + \phi_0)\,, \end{align} where \(A_0\) and \(\phi_0\) are the amplitude and initial phase of the orbit that are given by the initial conditions. The amplitude \(A_0\) is related to the specific energy as \(A_0 = \sqrt{2E}/\omega\). Thus, an orbit with specific energy \(E=0\) remains motionless at the bottom of the potential well and all non-trivial orbits have positive, non-zero specific energy and a period of \(T = 2\pi/\omega\). Orbits are labeled by \(A_0\) or, equivalently, \(E\) and form a one-dimensional sequence, as expected. By applying the Poisson equation, it is easy to see that a harmonic-oscillator potential corresponds to a constant density, with the relation between the constant density \(\rho_0\) and the oscillation frequency given by \begin{equation}\label{eq-slab-poisson} \omega^2 = 4\pi G\rho_0\,. \end{equation}

Our goal is to find a self-consistent phase-space distribution function that produces a finite-width, constant density slab. That is, we want to find the distribution function that creates the density \begin{equation}\label{eq-slab-density} \rho(x) = \left\{\begin{array}{lr} \rho_0, & \text{for } -T/2 \leq x \leq T/2\\ 0, & \text{otherwise}\,, \end{array}\right. \end{equation} where \(T\) is the thickness of the slab centered around \(x=0\). This problem can be solved analytically using the tools described in Chapter 10.4, and we will do that below, but let us determine this distribution using Schwarzschild modeling.

We first implement a function that returns an entire orbit for a given specific energy \(E\), frequency \(\omega\), and number of times along the orbit:

[13]:
def harmOsc_orbit(E,omega,nt=101):
    """Return the orbit with specific energy E covering a single
    period of the harmonic oscillator potential with frequency omega;
    nt= number times along the orbit to return"""
    A0= numpy.sqrt(2.*E)/omega
    # Make sure not to double count t=0,period
    ts= numpy.linspace(0.,2*numpy.pi/omega,nt+1)[:-1]
    return (A0*numpy.cos(omega*ts),-A0*omega*numpy.sin(omega*ts))

To test this function, we plot the orbit with \(E=0.1\) in a potential with \(\omega=1.1\):

[14]:
figure(figsize=(4,4))
plot(*harmOsc_orbit(0.1,1.1))
xlabel(r'$x$')
ylabel(r'$y$');
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_60_0.svg

Figure 14.6: An orbit in a one-dimensional harmonic oscillator.

This orbit is the expected ellipse. Note that the plotted orbit doesn’t close, because in anticipation of using this function in Schwarzschild modeling, we have made sure to not have points along the orbit at the same orbital phase. Such points would be double-counted in using this orbit as a building block for the distribution function.

In Schwarzschild modeling, we build the distribution function by weighting orbits such as the one in Figure 14.6 as building blocks of the galaxy. Uniformly sampling along the orbit gives its contribution to the total density. The more points along the orbit that we evaluate, the better we approximate the orbit’s contribution to the total density. Here we compute an orbit’s contribution to the density simply by building a histogram of the orbit’s \(x\) values; an example using the orbit from Figure 14.6 is shown as the \(E=0.1\) histogram in Figure 14.7.

We can write a general function that computes the total density for a set of orbits with energies \(\vec{E}\) and weights \(\vec{w}\):

[15]:
def harmOsc_dens(E,w,omega,nt=101,
                range=[-2.5,2.5],bins=51):
    """Return the combined density from orbits with
    specific energies E and weights w, with individual
    orbits sampled at nt times; compute the density as
    a histogram over 'range' in 'bins' bins"""
    return numpy.histogram(numpy.array([harmOsc_orbit(Ei,omega,nt=nt)[0]
                                        for Ei in E]).flatten(),
                           weights=numpy.tile(w,(nt,1)).T.flatten()
                                       *bins/(range[1]-range[0])/nt,
                           range=range,bins=bins)

Combining the contributions of different orbits, we can build up the total density. As an example, we plot the cumulative contribution to the density from orbits with \(E=0.1, 0.2\) and \(0.3\) weighted equally in the left panel of Figure 14.7 as well. Each additional curve above the previous one shows the addition of one orbit, starting with the lowest specific energy one.

[16]:
figure(figsize=(11,4.5))
subplot(1,2,1)
hist_E0p1 = hist(numpy.array([harmOsc_orbit(0.1,1.1,nt=1001)[0]]).flatten(),
                 bins=21,histtype='step',
                 label=r'$E=0.1$')
hist_E0p2 = hist(numpy.array([harmOsc_orbit(0.1,1.1,nt=1001)[0],
                              harmOsc_orbit(0.2,1.1,nt=1001)[0]]).flatten(),
                 bins=21,histtype='step',
                 label=r'$E=0.1+0.2$')
hist_E0p3 = hist(numpy.array([harmOsc_orbit(0.1,1.1,nt=1001)[0],
                              harmOsc_orbit(0.2,1.1,nt=1001)[0],
                              harmOsc_orbit(0.3,1.1,nt=1001)[0]]).flatten(),
                  bins=21,histtype='step',lw=2.,
                  label=r'$E=0.1+0.2+0.3$')
ylim(0.,299.)
xlabel(r'$x$')
# Flip the order in the histogram
handles, labels = gca().get_legend_handles_labels()
legend(handles=[h for h in handles[::-1]],
       frameon=False,fontsize=18.)
subplot(1,2,2)
hy,hx= harmOsc_dens(numpy.linspace(1e-10,1.,101),
                    numpy.ones(101)/101,1.1)
plot((numpy.roll(hx,-1)+hx)[:-1]/2.,hy,ds='steps-mid')
xlabel(r'$x$')
tight_layout();
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_68_0.svg

Figure 14.7: Contribution to the density from three orbits (left) and uniformly sampled orbits (right) in a one-dimensional harmonic oscillator.

When we weight all orbits with energies \(0 \leq E \leq 1\) equally, sampling the energy distribution with 101 orbits, we get the density shown in the right panel of Figure 14.7.

We now want to set the weights \(\vec{w}\) such that the resulting density distribution is that which sources the gravitational potential, that is, a constant density over the slab as in Equation (14.57). For this, we will evaluate the density corresponding to a given orbit sampling represented by a set of energies \(\vec{E}\) and a set of orbit weights \(\vec{w}\) in bins like in Figure 14.7, and require that the density in each bin equals that of Equation (14.57); that is, the resulting density should equal \(\rho_0\) within the slab and be zero outside. The only constraint on the weights is that they are positive, \(w_i \geq 0\,\forall i\).

To get a first sense of how this works, consider the interactive example below. On the left, you see a histogram of orbit weights for 10 energies spanning 0 to the maximum specific energy orbit. On the right, you see the resulting density histogram from combining all of the 10 orbits’ \(x\) values using their orbit weights. You can drag the top of the weight bins to change the orbit weights and watch how the resulting density changes. Try finding the combination of orbit weights that gives the correct constant density, given as the orange line!

To obtain a first quantitative solution, we follow the original procedure used by Schwarzschild (1979) and use the linear programming version of the optimization problem in Equation (14.52). However, the basic Schwarzschild modeling problem, of matching the density (Equation 14.51) with positive weights, does not have a linear-programming loss function directly associated with it, so following Schwarzschild, we can make up one at random. To obtain a first solution, we will simply use the objective function \(\phi(\vec{w}) = \sum_i w_i\). We can solve the linear-programming problem using scipy.optimize’s linprog function, which allows one to solve very general linear-programming problems. The following function sets up the constraints in a way that linprog understands and then returns the result. We define the weights such that each orbit’s weight is the total mass contributed by the orbit. Note that we do not enforce that the total mass of all the orbits is the known total mass of the system, as is commonly done in Schwarzschild modeling.

[17]:
from scipy import optimize
def optimize_weights_linprog(E,omega,nt=101,T=3.,
                            range=[-2.5,2.5],bins=51):
    """Determine the Schwarzschild modeling solution using
    linear programming using orbits with specific energies E,
    in a harmonic oscillator with frequency omega for a target
    slab with thickness T. Compute the model density as a
    histogram over 'range' in 'bins' bins.
    Returns the optimal weights of the orbits"""
    rho0= omega**2./(4.*numpy.pi)
    # Define constraints, integrate all orbits
    orbs= numpy.array([harmOsc_orbit(Ei,omega,nt=nt)[0] for Ei in E])
    # Compute density for each orbit
    orbhists= numpy.apply_along_axis(\
                lambda x: numpy.histogram(x,range=range,bins=bins,
                                          density=True)[0],1,orbs).T
    # orbhists is now the matrix A in the constraint equation
    # Density to match:
    dens_match= rho0*numpy.ones(bins)
    xbins= (numpy.roll(numpy.linspace(range[0],range[1],bins+1),-1)
            +numpy.linspace(range[0],range[1],bins+1))[:-1]/2.
    dens_match[(xbins < -T/2.)+(xbins > T/2)]= 0.
    # dens_match is now the vector b in the constraint equation
    # Random objective function: minimize sum of weights
    c= numpy.ones(len(E))
    # default in linprog is for all parameters to be positive,
    # so don't need to specify that
    return optimize.linprog(c,A_eq=orbhists,b_eq=dens_match)

We run the code for an example where \(\omega = 1.1\) and \(T = 3\). We attempt to match the density over the range \([-1.75,1.75]\) (the density should only be non-zero over \([-1.5,1.5]\) in this case, for \(T=3\)) using orbits with energies up to 120% of what we know the maximum specific energy to be, because the maximum specific energy is for an orbit that turns around at \(x = T/2\): \(E_{\mathrm{max}} = \omega^2 T^2/8\). We use 1,001 orbits, evaluating 3001 points along each orbit to get an accurate sampling of the density contributed by each orbit.

[18]:
omega= 1.1
T= 3.
Emax= omega**2.*T**2./8.
xrange= [-1.75,1.75]
nE= 1001
nt= 3001
bins= 101
E= numpy.linspace(0.,Emax*1.2,nE)
dE= E[1]-E[0]
opt= optimize_weights_linprog(E,omega,T=T,
                              bins=bins,range=xrange,nt=nt)
# A second version with 2001 orbits discussed below
omega= 1.1
T= 3.
Emax= omega**2.*T**2./8.
xrange= [-1.75,1.75]
nE= 2001
nt= 3001
bins= 101
E2= numpy.linspace(0,Emax*1.1,nE)
dE2= E2[1]-E2[0]
opt2= optimize_weights_linprog(E2,omega,T=T,
                               bins=bins,range=xrange,nt=nt)

The weights specify how much mass is contributed to the system by orbits with different energies. In Figure 14.8, we plot the differential specific energy distribution \(N(E)\), which is the amount of mass per unit specific energy.

[19]:
figure(figsize=(12,4))
subplot(1,2,1)
plot(E,opt['x']/dE,label=r'$N=1,001$')
plot(E2,opt2['x']/dE2,label=r'$N=2,001$')
xlabel(r'$E$')
ylabel(r'$N(E)$')
legend(frameon=False,fontsize=18.)
subplot(1,2,2)
plot(E,opt['x']/dE)
plot(E2,opt2['x']/dE2)
xlim(0.3,.5)
xlabel(r'$E$')
ylabel(r'$N(E)$')
tight_layout();
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_79_0.svg

Figure 14.8: Optimal differential specific energy distribution for representing the harmonic-oscillator density using a superposition of a finite number of orbits, for two different orbit samplings.

We see that the distribution of energies is very spiky, with only a few orbits dominating the distribution. This is because we chose a loss function that does not penalize non-smooth distributions and that in fact maximizes spikiness (see the discussion in the previous section). Nevertheless, the density distribution resulting from this set of weights matches the target density very well, as demonstrated in Figure 14.10 below. The target density is shown over the range where it is non-zero and as we can see, the density distribution corresponding to the optimized weights matches this constant density and its range very well. The non-smoothness of the weight distribution is such that if we slightly change the set of orbits used, we can get quite different weights for orbits that are otherwise similar, yet still match the density distribution as well. For example, when using 2,001 orbits using a slightly different specific energy sampling, we get the differential specific energy distribution that is also shown in Figure 14.8 and for which the implied density is also shown in Figure 14.10. Overall, the two distributions agree well, but zooming in as we do in the right panel of Figure 14.8, we see that the spikes in the weight distribution can be quite different heights.

To obtain a smoother, and thus more realistic distribution, we need to use a loss function that penalizes non-smoothness. For the sake of this demonstration, we will use the entropy loss function and make use of scipy’s constrained optimization routines. The following function implements the entropy loss function and collects the constraints on the weights in a way that they can be given to the scipy.optimize.minimize function:

[20]:
from scipy import optimize
def optimize_weights_entropy(E,omega,nt=101,T=3.,
                             range=[-2.5,2.5],bins=51):
    rho0= omega**2./(4.*numpy.pi)
    # Initial value
    init_w= numpy.ones_like(E)/len(E)
    # Define loss function and its Jacobian: minimize minus the entropy
    def loss(w):
        return numpy.sum(w*numpy.log(w))
    def jac(w):
        return numpy.log(w)+1.
    # Define the constraint equation:
    # Integrate all orbits
    orbs= numpy.array([harmOsc_orbit(Ei,omega,nt=nt)[0] for Ei in E])
    # Compute density for each orbit
    orbhists= numpy.apply_along_axis(\
                    lambda x: numpy.histogram(x,range=range,bins=bins,
                                              density=True)[0],1,orbs).T
    # Density to match:
    dens_match= rho0*numpy.ones(bins)
    xbins= (numpy.roll(numpy.linspace(range[0],range[1],bins+1),-1)
            +numpy.linspace(range[0],range[1],bins+1))[:-1]/2.
    dens_match[(xbins < -T/2.)+(xbins > T/2)]= 0.
    constraints= [{'type':'eq',
                   'fun':lambda w: numpy.dot(orbhists,w)-dens_match,
                   'jac':lambda w: orbhists},
                  {'type':'ineq',
                   'fun': lambda w: w}]
    return optimize.minimize(loss,
                             numpy.copy(init_w)\
                                 *(1.+numpy.random.normal(size=len(init_w))/10.),
                             jac=jac,constraints=constraints,
                             method='SLSQP',options={'maxiter':1001})

We apply this method to the same problem as above, now using only 201 orbits for 31 density bins. Constrained, non-linear optimization is a difficult problem and going to larger numbers of bins fails to produce a solution in this case, but the setup below should work:

[21]:
omega= 1.1
T= 3.
Emax= omega**2.*T**2./8.
xrange= [-1.75,1.75]
nE= 201
nt= 3001
bins= 31
E_ent= numpy.linspace(1e-10,Emax*1.2,nE)
dE_ent= E_ent[1]-E_ent[0]
opt_ent= optimize_weights_entropy(E_ent,omega,T=T,
                                  bins=bins,range=xrange,nt=nt)

We again plot the differential specific energy distribution and Figure 14.9 shows that it looks much smoother than before (we only include the \(N=1,001\) case for comparison).

[22]:
figure(figsize=(6,4))
plot(E_ent,opt_ent['x']/dE_ent,
     label=r'$\mathrm{entropy}$',zorder=2,lw=2.)
plot(E,opt['x']/dE,label=r'$\mathrm{linear},\ N=1,001$',zorder=1)
xlabel(r'$E$')
ylabel(r'$f(E)$')
legend(fontsize=18,frameon=False);
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_88_0.svg

Figure 14.9: Like Figure 14.8, but using entropy penalization of the differential specific energy distribution.

We see that the entropy-based solution is much less spiky than the solution using the linear loss function that we obtained above. The match of the density corresponding to the orbital weights with the target density is equally good as above:

[23]:
hy,hx= harmOsc_dens(E_ent,opt_ent['x'],omega,
                    nt=nt,range=xrange)
figure(figsize=(6,4))
plot((numpy.roll(hx,-1)+hx)[:-1]/2.,hy,
     ds='steps-mid',label=r'$\mathrm{Entropy}$',
     zorder=2,lw=2.)
hy,hx= harmOsc_dens(E,opt['x'],omega,nt=nt,range=xrange)
plot((numpy.roll(hx,-1)+hx)[:-1]/2.,hy,ds='steps-mid',label=r'$N=1,001$')
hy,hx= harmOsc_dens(E2,opt2['x'],omega,nt=nt,range=xrange)
plot((numpy.roll(hx,-1)+hx)[:-1]/2.,hy,ds='steps-mid',label=r'$N=2,001$')
plot(numpy.linspace(-T/2,T/2,101),
     omega**2./(4.*numpy.pi)*numpy.ones(101),label=r'$\mathrm{target}$')
ylim(0.,omega**2./(4.*numpy.pi)*1.5)
xlabel(r'$x$')
legend(fontsize=18,frameon=False);
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_90_0.svg

Figure 14.10: Target and predicted density for the differential specific energy distributions from Figures 14.8 and 14.9.

Thus, both are good solutions, but the entropy one is obviously to be preferred as it is smoother.

Because we can compute the steady-state distribution function for a one-dimensional system directly, we can compare the numerical solution above to the true solution. To compute the homogeneous slab’s equilibrium distribution function directly, we can use the approach from Chapter 10.4.3. It is left as an exercise to demonstrate that the distribution function is \begin{equation} f(E) = \left\{\begin{array}{lr} \frac{\rho_0}{\sqrt{2}\pi\,\sigma}\,\frac{1}{\sqrt{1-E/\sigma^2}}\, & \text{for } E < \sigma^2\\ 0, & \text{otherwise } \end{array}\right.\,, \end{equation} where \(\sigma^2 \equiv \omega^2 T^2/8 = \pi G \rho_0\,T^2/2\), using the Poisson equation for the slab (14.56). However, we cannot directly compare this \(f(E)\) to the orbital weights, because \(f(E)\) gives the value of the distribution function at a given \((x,v)\), which in a steady state is only a function of integrals of the motion like the specific energy. That is, \(f(E)\) is the value of the probability distribution \(f(x,v)\) rather than a probability distribution in specific-energy space itself, which is what the distribution of orbital weights is (see the discussion in Chapter 5.5). To get the theoretical differential specific-energy distribution \(N(E)\), we need to multiply \(f(E)\) with the volume of phase-space \(g(E)\) occupied by orbits with specific energy \(E\). This function \(g(E)\) is called the density of states and it can be computed as \begin{equation} g(E) = \int \mathrm{d}x\mathrm{d}v\,\delta\left(E-H[x,v]\right)\,, \end{equation} where \(H[x,v]\) is the Hamiltonian. It is again left as an exercise to show that for the harmonic oscillator with non-zero density over the slab with width \(T\), this becomes \(g(E) = 2\pi/\omega\). Thus, we see that for the one-dimensional harmonic oscillator potential, the density of states is actually a constant equal to the period of the orbit.

In Figure 14.11, we compare the theoretically expected differential specific-energy distribution for the one-dimensional slab to the one obtained by optimizing the orbital weights using Schwarzschild modeling.

[24]:
def harmOsc_df_analytic(E,omega,T=3.):
    """Equilibrium distribution function for a harmonic
    oscillator with frequency omega and a constant-density
    slab of thickness T as a function of specific energy E."""
    sigma2= omega**2.*T**2./8.
    rho0= omega**2./(4.*numpy.pi)
    out= rho0/numpy.sqrt(2*sigma2*(1.-E/sigma2))/numpy.pi
    out[E >= sigma2]= 0.
    return out
def harmOsc_density_states(E,omega,T=3.):
    """The density of states for a harmonic
    oscillator with frequency omega over a
    constant-density slab with thickness T
    as a function of the specific energy E"""
    return 2.*numpy.pi/omega*numpy.ones_like(E)
Es= numpy.linspace(1e-10,1.5,101)
figure(figsize=(6,4))
plot(E_ent,opt_ent['x']/dE_ent)
plot(Es,harmOsc_df_analytic(Es,omega,T=T)
         *harmOsc_density_states(Es,omega))
xlabel(r'$E$')
ylabel(r'$N(E)$');
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_93_0.svg

Figure 14.11: Theoretical differential specific energy distribution of the one-dimensional harmonic oscillator compared to the entropy solution from Figure 14.9.

We see that the agreement is excellent: the shape and amplitude of the differential specific-energy distributions almost exactly matches. The Schwarzschild-modeled weights extend to somewhat larger energies, which is because of the coarse density grid that we employed when matching the density, but otherwise the two functions are very similar. Schwarzschild modeling with a loss function that penalizes spiky distribution functions therefore returns the correct distribution function for this one-dimensional system.

14.3.3. Studying observed galaxies using Schwarzschild modeling

\label{sec-ellipequil-observedschwarzschild}

The one-dimensional example above is useful for showing how Schwarzschild modeling works in principle and in demonstrating that it recovers the correct distribution function. But being in one dimension, it is a highly idealized example that does not suffer from the complexities that plague three-dimensional Schwarzschild modeling (including such seemingly simple questions as “How long should I integrate orbits for?”). It also does not illustrate how Schwarzschild modeling is used for building dynamical models of observed galaxy photometry and kinematics. This section provides an introduction into the practice of Schwarzschild modeling of observed, external galaxies.

Observational Schwarzschild modeling generally proceeds as follows. First of all, typical observables available for observational Schwarzschild modeling are: photometric measurements of the surface brightness of the galaxy and information on the velocity distribution (velocity profile; VP) in the galaxy. This velocity information typically exists in the form of constraints on the moments of the line-of-sight velocity distribution obtained from long-slit spectroscopy along the major and minor axis of the galaxy (e.g., Franx et al. 1989; Bender et al. 1994) or, nowadays, from integral-field spectroscopy (IFS) over a two-dimensional area with various ground-based instruments (e.g., SAURON: Bacon et al. 2001; MUSE: Bacon et al. 2010). The extracted moments typically go up to the fourth moment: mean velocity, velocity dispersion, and the third and fourth moments expressed as Gauss-Hermite moments \(h_3\) and \(h_4\) (van der Marel & Franx 1993); however, the details of this do not concern us here.

Rather than blindly proposing mass models, using them to integrate orbits, and match the observed surface brightness and kinematic information (which would be extremely expensive to do!), the observed surface brightness is used to build a mass model for the luminous component(s) of the galaxy. Because the surface brightness corresponds to the two-dimensional projection of the underlying, three-dimensional luminosity density, assumptions about the underlying density have to be made in order to deproject it, and values for the orientation of the galaxy with respect to the line of sight need to be chosen (the inclination, but for triaxial models there is an additional angle as well). We discuss this deprojection further below. Next, one has to convert luminosity to mass, which is generally done by assuming a constant mass-to-light ratio \(M/L\), although this could depend on position in the galaxy or vary between different components; this conversion then gives a mass model for the luminous component. To this one can add dark components—in elliptical galaxies either a dark-matter halo or a central, supermassive black hole—to obtain the full gravitational potential.

Once one has a model for the full gravitational potential, this potential is then used to create an orbit library that contains the building blocks (the orbits) of a Schwarzschild model. For a given set of orbit weights, one can then predict both the surface brightness and whatever kinematic information is available (e.g., the mean velocity and velocity dispersion profile along the major axis, or over the face of the galaxy for IFS data), and the weights are optimized using non-linear programming (Equation 14.52), with the equality constraint corresponding to the requirement that the dynamical model should match the input luminous mass profile and the kinematic information included as a minus-chi-squared (or log likelihood) contribution to the objective function to be maximized. By running many instances of this basic Schwarzschild modeling procedure for different settings of the assumed parameters (inclination, mass-to-light ratio[s], mass of a supermassive black hole, properties of the dark-matter halo, etc.) and comparing the objective function for the best-fit orbital-weight distribution for each model, one can determine the overall best model.

This is a complicated procedure! And what’s worse, each individual part has its own set of complexities that make them highly non-trivial. So let’s walk through some of the steps in more detail in the context of a real, observed galaxy (but we won’t attempt to do the full Schwarzschild modeling, that would take us too long!). As the example, we consider NGC 4342, a low-luminosity lenticular galaxy in the Virgo cluster that has a supermassive black hole at its center (Cretton & van den Bosch 1999). Figure 14.12 has images from SDSS and HST observations that show what NGC 4342 looks like.

Image of NGC 4342, a lenticular galaxy (from SDSS)

Image of NGC 4342, a lenticular galaxy (from HST)

Image of NGC 4342, a lenticular galaxy (from HST, zoomed)

Figure 14.12: NGC 4342. SDSS \(i\) image (left) from Baillard et al. (2011); HST WFPC2/F814W images (middle, zoomed on the right) obtained using the MgeFit code (Cappellari 2002).

It is clear that NGC 4342 has a complex structure: as can be seen in the HST image, the isophotes are strongly flattened in the inner arcsecond, indicating the presence of a nuclear disk. A popular semi-parametric—meaning a model that has so many parameters that it is highly flexible—method for fitting the surface brightness profile and deprojecting it to a three-dimensional profile is that of a multi-Gaussian expansion (MGE). Such a model is easy to deal with: it can be de-projected straightforwardly under various assumptions about the underlying three-dimensional density (axisymmetric, or triaxial) and the gravitational potential can be easily computed using the methods of Chapter 12.2 (Emsellem et al. 1994). Efficient algorithms also exist to fit the MGE to observed surface photometry (Cappellari 2002; Shajib 2019).

Using similar notation as in Chapter 8.1.2, in the MGE, the observed surface brightness (in linear units of, e.g., \(L_\odot\,\mathrm{pc}^{-2}\)) \(\Sigma(r,\phi)\) is decomposed into \(N\) Gaussians as \begin{equation} \Sigma(r,\phi) = \sum_{j=1}^N {L_j \over 2\pi\,\sigma'^2_j\,q'_j}\,\exp\left[-{1 \over 2\,\sigma'^2_j}\,\left(x'^2_j + {y'^2_j \over q'^2_j}\right)\right]\,. \end{equation} where \((x'_j,y'_j)\) is the rectangular coordinate system on the sky in which the major axis falls along the \(x'_j\) axis \begin{align} x'_j & = r\,\sin\left(\phi-\psi_j\right)\,;\quad y'_j = r\,\cos\left(\phi-\psi_j\right)\,. \end{align} The parameters \((L_j,\sigma'_j,q'_j,\psi'_j)\) characterize each Gaussian: they are the total luminosity, the dispersion (in angular units) along the major axis, the axis ratio \(q'_j \leq 1\), and the position angle \(\psi'_j\); we use primed variables to indicate those that are specific to the sky projection. When not all \(\psi'_j\) are equal, then the galaxy has to be triaxial (this indicates the isophote twists discussed in Chapter 12) . The MGE surface brightness can be deprojected to a three-dimensional Gaussian expansion of the luminosity density (again working on coordinates aligned with the major axes) \begin{equation}\label{eq-mge-3dlumdens} j(x,y,z) = \sum_{j=1}^N {L_j \over (\sqrt{2\pi\,\sigma})^3\,p_j\,q_j}\,\exp\left[-{1 \over 2\,\sigma^2_j}\,\left(x^2_j + {y^2_j \over p^2_j}+ {z^2_j \over q^2_j}\right)\right]\,. \end{equation} The \(p_j\) parameters are necessary when the deprojection is triaxial (necessary when not all \(\psi'_j\) are equal), but to keep things simple, we will only consider axisymmetric deprojections, with \(p_j = 1\). The only angle necessary for the deprojection is then the inclination \(i\) (defined such that an inclination of \(i = 90^\circ\) indicates an edge-on galaxy) and we have that \(q_j = \sqrt{(q'^2_j-\cos^2 i )/ \sin^2 i}\) for an oblate deprojection and \(q_j = \sqrt{\sin^2 i / (1/q'^2_j - \cos^2 i)}\) for a prolate deprojection. See Cappellari (2002) for expressions for triaxial deprojections. The mass density \(\rho(x,y,z)\) can then be obtained by multiplying the luminosity density \(j(x,y,z)\) from Equation (14.62) by the mass-to-light ratio \(M/L\) (which could be taken to be different for different Gaussians without complicating the calculation of the gravitational potential).

We will use the decomposition obtained using the code of Cappellari (2002) to fit the HST images shown in Figure 14.12, to create a three-dimensional model of NGC 4342. Note that NGC 4342 is conveniently oriented almost edge-on and an axisymmetric deprojection is therefore almost unambiguous. The output from the Cappellari (2002) code is the total number of counts (as the Gaussians are fit to the raw photon count map), standard deviation in pixel units, and the flattening of the profile for each Gaussian. We load the output from the code in the following cell:

[25]:
ngc4342_counts= numpy.array([8656.99,29881.7,134271,48555.8,266390,
                             258099,138127,346805,103864,894409,
                             742665,266803])
ngc4342_sigpix= numpy.array([0.605197,2.12643,6.43766,7.38899,13.3541,
                             24.5162,44.5134,57.7564,86.9258,144.783,
                             260.711,260.711])
ngc4342_qprojs= numpy.array([0.805036,0.896834,0.713238,0.178244,
                             0.758412,0.878234,1,0.366929,1,0.26115,
                             0.294149,0.677718])
ngc4342_pixscale= 0.0455 # arcsec/pixel
ngc4342_exptime= 200 # s exposure time
ngc4342_distance= 16.*u.Mpc
ngc4342_inclination= 83.*u.degree
ngc4342_sky_pa= 35.8*u.degree
ngc4342_extinction= 0.031 # from NED

where we have also defined the pixel scale for the HST WFPC2/F814W observations used in the fit and the exposure time of the image (200 seconds), and we have assumed a value for the distance of 16 Mpc, an inclination of \(83^\circ\) (almost edge-on, but not exactly to keep things more interesting), a position angle on the sky of \(35^\circ.8\) (the angle between the galaxy’s major axis and the direction pointing north), and extinction. We can then convert these counts to the total surface brightness in \(L_\odot\,\mathrm{pc}^{-2}\), luminosity in \(L_\odot\), and, after assuming a mass-to-light ratio, mass in \(M_\odot\). The details of this transformation are somewhat complex and specific to the HST WFPC2/F814W observations, so we simply use them here without providing further details and obtain the total mass associated with each Gaussian component:

[26]:
def counts_to_surfdens(counts,sigpix,qproj,
                       pixscale,exptime,distance,
                       extinction=0.):
    C0= counts/(2.*numpy.pi*sigpix**2.*qproj)
    surfbright= 20.84+0.1+5.*numpy.log10(pixscale)\
        +2.5*numpy.log10(exptime)-2.5*numpy.log10(C0)-extinction
    return (64800./numpy.pi)**2.*10.**(0.4*(4.08-surfbright))*u.Lsun/u.pc**2.
def counts_to_luminosity(counts,sigpix,qproj,
                         pixscale,exptime,distance,
                         extinction=0.):
    surfdens= counts_to_surfdens(counts,sigpix,qproj,
                                 pixscale,exptime,distance,
                                 extinction=extinction)
    return 2.*numpy.pi*surfdens\
        *(sigpix*pixscale*(u.arcsec/u.rad).to(u.dimensionless_unscaled)\
          *distance)**2.*qproj
ngc4342_ml= 4.*u.Msun/u.Lsun
ngc4342_masses= ngc4342_ml*counts_to_luminosity(ngc4342_counts,ngc4342_sigpix,
                                                ngc4342_qprojs,ngc4342_pixscale,
                                                ngc4342_exptime,ngc4342_distance,
                                                extinction=ngc4342_extinction)

We also implement and apply the deprojection of the axis ratio:

[27]:
def qproj_to_q(qproj,inclination):
    return numpy.sqrt(qproj**2.-numpy.cos(inclination)**2.)/numpy.sin(inclination)
ngc4342_qs= qproj_to_q(ngc4342_qprojs,ngc4342_inclination)

and finally we compute the dispersion \(\sigma_j\) in physical units (of distance):

[28]:
ngc4342_sigmas= (ngc4342_sigpix*ngc4342_pixscale*u.arcsec*ngc4342_distance)\
    .to(u.kpc,equivalencies=u.dimensionless_angles())

We can then define a function that sets up a three-dimensional MGE model for a given set of parameters (we also allow it to rotate the model such that the \(x-y\) plane is parallel to the sky):

[29]:
from galpy.potential import (TriaxialGaussianPotential,
                             RotateAndTiltWrapperPotential)
def multi_gaussian_potential(masses,sigmas,qs,
                             inclination=None,sky_pa=None):
    """Set up an MGE mass model in galpy given a
    set of deprojected (masses,dispersions,axis ratios)
    Set the inclination and sky-position angle if you want
    to be able to show the model on the sky"""
    out= []
    for mass,sigma,q in zip(masses,sigmas,qs):
        galpot= TriaxialGaussianPotential(amp=mass,sigma=sigma,
                                          c=q.to_value(u.dimensionless_unscaled))
        if not inclination is None:
            galpot= RotateAndTiltWrapperPotential(pot=galpot,sky_pa=sky_pa,
                                                  inclination=inclination,galaxy_pa=0.)
        out.append(galpot)
    return out

We first initialize a model rotated such that the \(z\) axis is the line-of-sight and such that the galaxy in the \(x-y\) plane should lie where the observed galaxy is:

[30]:
ngc4342_mge_pot_sky= multi_gaussian_potential(ngc4342_masses,
                                              ngc4342_sigmas,
                                              ngc4342_qs,
                                              inclination=ngc4342_inclination,
                                              sky_pa=ngc4342_sky_pa)

The surface density of the MGE model over \(35^{\prime\prime}\) (similar to the middle panel in Figure 14.12) and zoomed in on the central \(\approx 7^{\prime\prime}\) is shown in Figure 14.13.

[31]:
from galpy.potential import plotSurfaceDensities
figure(figsize=(8,4))
subplot(1,2,1)
xmax= ((35./2.*u.arcsec)*ngc4342_distance).to(u.kpc,equivalencies=u.dimensionless_angles())
plotSurfaceDensities(ngc4342_mge_pot_sky,log=True,
                     xmin=-xmax,xmax=xmax,ymin=-xmax,ymax=xmax,
                     nxs=40,nys=40,
                     justcontours=True,ncontours=13,gcf=True)
gca().set_aspect('equal')
subplot(1,2,2)
xmax= ((7.5/2.*u.arcsec)*ngc4342_distance).to(u.kpc,equivalencies=u.dimensionless_angles())
plotSurfaceDensities(ngc4342_mge_pot_sky,log=True,
                     xmin=-xmax,xmax=xmax,ymin=-xmax,ymax=xmax,
                     nxs=40,nys=40,
                     justcontours=True,ncontours=13,gcf=True)
gca().set_aspect('equal')
tight_layout();
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_122_0.svg

Figure 14.13: Multi-Gaussian expansion of NGC 4342.

In the left panel, we see the narrowing of the isophotes near the center of the galaxy. Comparing these surface-density contours to the observations in Figure 14.12, we can see that the model is a good match (the smooth contours in the observed pictures is approximately the same MGE). Now that we are happy with the model, we set up a version that has the major and minor axes aligned with the coordinate axes:

[32]:
ngc4342_mge_pot= multi_gaussian_potential(ngc4342_masses,
                                          ngc4342_sigmas,
                                          ngc4342_qs)

In building a Schwarzschild model of NGC4342, the next step is to integrate orbits in this potential (assuming that we do not need to add any additional dark component) and build an orbit library. To build an orbit library for an axisymmetric model, we can use the specific energy and the \(z\)-component of the specific angular momentum as two grid axes, because these are integrals of the motion. But as we saw in Chapter 13.1, most orbits in an axisymmetric potential have a third integral for which, however, no explicit expression is known. We therefore need a third dimension to build the full orbit grid, but there is no a priori obvious way to build this third dimension from an integral of motion.

One way to tackle this problem is to use the potential’s zero-velocity curve (see Chapter 9.1): the locus of points for a given \(E\) and \(L_z\) for which \(\Phi(R,z) + L_z^2/(2\,R^2) = E\) (equivalent to \(v_R = v_z = 0\)). This is a useful place to start, because almost all orbits intersect the zero-velocity curve in two points at \(z \geq 0\) (the thin tube orbit only has a single intersection point and certain periodic orbits do not intersect at all; Ollongren 1962). For example, let’s consider orbits with specific energy equal to the specific energy of a circular orbit at \(500\) pc and \(z\)-component of the specific angular momentum half of that of the circular orbit; we start orbits along the zero-velocity curve on a grid in \(R\) and plot their resulting surface of section in the right panel of Figure 14.14 (see Chapter 13.1).

[33]:
from galpy.potential import (evaluatePotentials, vcirc,
                             zvc, zvc_range)
from galpy.orbit import Orbit
def orbit_zvcELz(R,E,Lz,pot=None):
    """Returns Orbit initialized at zero-velocity curve at R with given (E,Lz)"""
    return Orbit([R,0.*u.km/u.s,Lz/R,zvc(pot,R,E,Lz),0.*u.km/u.s,0.*u.rad],ro=8.,vo=220.)
Rc= 500.*u.pc
E= evaluatePotentials(ngc4342_mge_pot,Rc,0.)+vcirc(ngc4342_mge_pot,Rc)**2./2.
Lz= Rc*vcirc(ngc4342_mge_pot,Rc)/2.
zvc_Rmin, zvc_Rmax= zvc_range(ngc4342_mge_pot,E,Lz)
ts= numpy.linspace(0.,1.,1001)
figure(figsize=(9,4.5))
orbs= []
for Rstart in numpy.linspace(zvc_Rmin*1.01,zvc_Rmax*0.99,5):
    o= orbit_zvcELz(Rstart,E,Lz,pot=ngc4342_mge_pot)
    o.integrate(ts,ngc4342_mge_pot)
    subplot(1,2,1)
    o.plot(gcf=True)
    subplot(1,2,2)
    o.plotSOS(ngc4342_mge_pot,ncross=1_000,marker='o',ms=4.,gcf=True)
    orbs.append(o)
subplot(1,2,1)
Rs= numpy.linspace(*zvc_range(ngc4342_mge_pot,E,Lz),101)
plot(Rs,numpy.array([zvc(ngc4342_mge_pot,R,E,Lz).to_value(u.kpc) for R in Rs])*u.kpc,
     color='k',lw=2.,zorder=0)
plot(Rs,-numpy.array([zvc(ngc4342_mge_pot,R,E,Lz).to_value(u.kpc) for R in Rs])*u.kpc,
     color='k',lw=2.,zorder=0)
subplot(1,2,2)
xlabel(r'$R\,(\mathrm{kpc})$')
ylabel(r'$v_R\,(\mathrm{km\,s}^{-1})$')
tight_layout();
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_130_0.svg

Figure 14.14: Example orbits in the multi-Gaussian expansion model of NGC 4342 in the meridional plane (left) and as surfaces of section (right).

The left panel shows a small part of the orbit (so the figure does not become too much of a mess), while the right shows the surface of section resulting from a long orbit integration. It’s clear that by simply scanning along the zero-velocity curve, we can efficiently explore the different types of orbits for a given specific energy and specific angular momentum (indeed, we immediately stumbled upon a resonant orbit and got close to the thin tube orbit).

We can then combine the orbits with weights and see what density distribution they lead to. For example, the orbits from above integrated for a long time and plotted in the \(x-z\) plane are shown in Figure 14.15.

[34]:
figure(figsize=(10,4))
# Long and dense
long_ts= numpy.linspace(0.,100.,100001)
for ii,o in enumerate(orbs):
    o.integrate(long_ts,ngc4342_mge_pot)
    subplot(1,len(orbs),ii+1)
    plot(o.x(long_ts[:len(long_ts)//20:10]),o.z(long_ts[:len(long_ts)//20:10]),
         '-',color=rcParams['axes.prop_cycle'].by_key()['color'][ii])
    xlim(-0.83,0.83)
    ylim(-0.49,0.49)
    xlabel(r'$x\,(\mathrm{kpc})$')
    if ii == 0: ylabel(r'$z\,(\mathrm{kpc})$')
    else: tick_params(axis='y',which='both',labelleft=False)
tight_layout();
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_132_0.svg

Figure 14.15: Orbits from Figure 14.14 in the \(x\)-\(z\) plane.

We can then, for example, predict the surface density by binning, e.g., using equal weighting of the orbits; this is shown in the top panel of Figure 14.16.

[35]:
figure(figsize=(10,6))
subplot(2,1,1)
orbital_weights= [1.,1.,1.,1.,1.]
hexbin(numpy.array([o.x(long_ts) for o in orbs]).flatten(),
       numpy.array([o.z(long_ts) for o in orbs]).flatten(),
       C=numpy.array([w*numpy.ones_like(o.x(long_ts))
                      for (w,o) in zip(orbital_weights,orbs)]).flatten(),
       gridsize=31,reduce_C_function=numpy.sum)
colorbar()
ylabel(r'$z\,(\mathrm{kpc})$')
subplot(2,1,2)
orbital_weights= [1.,0.3,0.1,0.3,1.]
hexbin(numpy.array([o.x(long_ts) for o in orbs]).flatten(),
       numpy.array([o.z(long_ts) for o in orbs]).flatten(),
       C=numpy.array([w*numpy.ones_like(o.x(long_ts))
                      for (w,o) in zip(orbital_weights,orbs)]).flatten(),
       gridsize=31,reduce_C_function=numpy.sum)
colorbar()
xlabel(r'$x\,(\mathrm{kpc})$')
ylabel(r'$z\,(\mathrm{kpc})$')
tight_layout();
../_images/chapters_III-03.-Equilibria-of-Elliptical-Galaxies-and-Dark-Matter-Halos_134_0.svg

Figure 14.16: Density distribution resulting from combining the orbits from Figure 14.15 with equal weights (top) or with weights that result in a flatter profile (bottom).

Like in the one-dimensional example from the previous section, we very much see the turning points of the individual orbits here. Attempting to get a flatter distribution by giving more weight to those orbits that remain closer to the \(x-y\) plane, we obtain the density in the lower panel. This is starting to look more like a real galaxy, but to fully match the input density profile, we have to build the full orbit library, consisting of such grids as above for each combination of specific energy and specific angular momentum. One simple way to create the specific energy–angular-momentum grid is to first choose a grid of radii and use the energies of circular orbits at those radii as the specific energy grid; at each specific energy, the specific angular momentum can range between minus and positive the specific angular momentum of that circular orbit and the specific angular-momentum grid can thus be made between these two values (note that negative specific angular momentum orbits can be obtain simply by reversing the positive specific angular momentum orbits). We won’t attempt this here, because it would take far too much computational power, but it is a straightforward continuation of what we have implemented so far.

Going through this example, some of the practical issues when running Schwarzschild modeling should start becoming clear. For example, we integrated for a fixed amount of time, but because the radial, vertical, and rotational frequencies of most orbits are incommensurate, it would take infinitely long to actually fully sample the orbit. We therefore have to make a choice and demonstrate that the resulting phase-space distribution function does not change when integrating the orbits for longer. Sampling the orbits is also difficult because of the lack of a full set of integrals of the motion for realistic models of galaxies. Starting from the zero-velocity curve works well for axisymmetric potentials, but for triaxial mass models, the issue is more complicated (e.g., van den Bosch et al. 2008). In more complex cases, such as when modeling a barred galaxy, our actual understanding of the types of orbits that exist is still incomplete and properly sampling the orbits is therefore essentially impossible using current methods. Even when we have a way of finding all of the orbits, deciding how to build the grid in this space comes with its own set of difficulties, because we want to build the grid in as natural of a way as possible and avoid sampling the same orbit more than once to reduce the degeneracies in the problem (e.g., Thomas et al. 2004). Because we cannot realistically build a very fine grid, the resulting density distribution is often far from smooth and various techniques exist to alleviate this problem (like “dithering”, running a bunch of orbits near each grid point and averaging their predictions rather than just using a single one). Chaotic orbits can present difficulties as well, e.g., because they can take a very long time to mix if their Lyapunov time scale is large. Because all of the orbits need to be stored, Schwarzschild modeling also exacts heavy memory requirements on the computer system used in the modeling. Made-to-measure modeling (Syer & Tremaine 1996) is an alternative to Schwarzschild modeling where the orbital weights are optimized during orbit integration using a gradient-descent algorithm that overcomes some of these issues (especially that of the memory requirements and of sampling the orbits in complex systems, because it can be initialized using an \(N\)-body simulation). Made-to-measure modeling is equivalent to Schwarzschild modeling in the limit of a large orbit library and short time steps in the orbit integration (Malvido & Sellwood 2015).