Open Exoplanet Catalogue

The NASA open catalogue of exoplanets is a dataset of almost 4000 planets, created in 2012 and maintained as a free and decentralized database by a community of volunteers. Over the last 5 years, technological advancements and data collection efforts have spurred the discovery of more planets than in the previous 100 years combined. Sky survey projects all over the world are collecting terabytes of information every night, leaving data scientists with ample opportunity to explore the universe.[2]

Introduction

I downloaded the dataset from https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=planets. At the time of writing, the dataset contains 3972 confirmed exoplanets with 144 attributes each. Because of the large number of attributes, I'm going to post omitted/shortened outputs of the dataframes and descriptions. If you want to print out all the information for a particular dataframe, wrap it in the function showall(df) or show(df, allcols=true, allrows=true).[3]

Included in the downloaded dataset is a header detailing the column definitions.

# COLUMN pl_hostname:    Host Name
# COLUMN pl_letter:      Planet Letter
# COLUMN pl_name:        Planet Name
# COLUMN pl_discmethod:  Discovery Method
# COLUMN pl_controvflag: Controversial Flag
# COLUMN pl_pnum:        Number of Planets in System
# ⋮
# COLUMN pl_orbper:      Orbital Period [days]
# COLUMN pl_orbsmax:     Orbit Semi-Major Axis [AU]
# COLUMN pl_orbeccen:    Eccentricity
# COLUMN pl_orbincl:     Inclination [deg]
# COLUMN st_m1:          m1 (Stromgren) [mag]

using ColorSchemes, CSV, DataFrames, Gadfly

# Exoplanets downloaded from https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=planets
exoplanets = CSV.read("planets_2019.06.07_18.33.16.csv", comment="#")
3972×144 DataFrame. Omitted printing of 134 columns
│ Row  │ rowid │ pl_hostname │ pl_letter │ pl_name   │ pl_discmethod   │ pl_controvflag │ pl_pnum │ pl_orbper │ pl_orbsmax │ pl_orbeccen │
│      │ Int64 │ String      │ String    │ String    │ String          │ Int64          │ Int64   │ Float64   │ Float64    │ Float64     │
├──────┼───────┼─────────────┼───────────┼───────────┼─────────────────┼────────────────┼─────────┼───────────┼────────────┼─────────────┤
│ 1    │ 1     │ 11 Com      │ b         │ 11 Com b  │ Radial Velocity │ 0              │ 1       │ 326.03    │ 1.29       │ 0.231       │
│ 2    │ 2     │ 11 UMi      │ b         │ 11 UMi b  │ Radial Velocity │ 0              │ 1       │ 516.22    │ 1.53       │ 0.08        │
⋮
│ 3970 │ 3970  │ ups And     │ c         │ ups And c │ Radial Velocity │ 0              │ 3       │ 241.258   │ 0.827774   │ 0.2596      │
│ 3971 │ 3971  │ ups And     │ d         │ ups And d │ Radial Velocity │ 0              │ 3       │ 1276.46   │ 2.51329    │ 0.2987      │
│ 3972 │ 3972  │ xi Aql      │ b         │ xi Aql b  │ Radial Velocity │ 0              │ 1       │ 136.75    │ 0.68       │ 0.0         │

Overview

Now that we've loaded the dataset into a dataframe, we can start to dig in to the exoplanets and their attributes. By calling describe(df) we can calculate basic statistics about every attribute for every exoplanet.

# Statistical details of the entire dataset
@show describe(exoplanets)
144×8 DataFrame
│ Row │ variable    │ mean     │ min    │ median │ max    │ nunique │ nmissing │ eltype   │
│     │ Symbol      │ Union…   │ Any    │ Union… │ Any    │ Union…  │ Union…   │ DataType │
├─────┼─────────────┼──────────┼────────┼────────┼────────┼─────────┼──────────┼──────────┤
│ 1   │ rowid       │ 1986.5   │ 1      │ 1986.5 │ 3972   │         │          │ Int64    │
│ 2   │ pl_hostname │          │ 11 Com │        │ xi Aql │ 2963    │          │ String   │
⋮
│ 142 │ st_m1       │ 0.285146 │ 0.129  │ 0.253  │ 0.774  │         │ 3615     │ Float64  │
│ 143 │ st_c1       │ 0.359557 │ -0.013 │ 0.368  │ 0.686  │         │ 3615     │ Float64  │
│ 144 │ st_colorn   │ 5.47432  │ 0      │ 5.0    │ 83     │         │          │ Int64    │

Here we can see that each of the columns belongs to one of three categories: meta information about the discoveries, planet characteristics and measurements, or host star characteristics and measurements. Most of the columns show missing data, so we'll need to account for that when making comparisons between exoplanets or attributes.

How were they discovered?

Exoplanet discovery has exploded in the last 10 years thanks to powerful telescopes, sensitive photometry technology and precise data analysis. The dataset shows 10 different discovery methods used to find exoplanets. Transit and radial velocity appear to be the most common, and both involve analyzing the light of a system's host star to spot characteristics consistent with an orbiting planet.

The radial velocity method relies on the fact that a star moves in a small ellipse when a planet orbits it. As the planet's gravity tugs the star toward and away from us, the star's light shifts slightly in color (a Doppler shift). We can measure this shift over time, and if it repeats at a regular interval, we can assume that a planet is probably orbiting the star.[4]
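
To get a feel for how small this shift is, here is a rough sketch using the non-relativistic Doppler relation Δλ/λ = v/c. The helper name and the example values (a hydrogen-alpha line and a wobble of about 12.5 m/s, roughly what Jupiter induces on the Sun) are my own illustrative assumptions, not values from the dataset.

```julia
# Non-relativistic Doppler shift: Δλ = λ * v / c
const C_M_PER_S = 299_792_458.0  # speed of light in m/s

# Hypothetical helper: wavelength shift (same units as λ) for a radial velocity v in m/s
doppler_shift(λ, v) = λ * v / C_M_PER_S

# A 656.28 nm hydrogen-alpha line on a star wobbling at about 12.5 m/s
Δλ = doppler_shift(656.28, 12.5)
println("Wavelength shift: ", Δλ, " nm")
```

The shift comes out at a few hundred-thousandths of a nanometer, which is why this method needs such precise spectrographs.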

The transit method works by measuring the brightness of a star. When an orbiting planet passes between Earth and the star, the brightness of that star dims slightly. When this dimming happens at regular intervals, and for a fixed length of time, then it's probable that a planet is orbiting it.[5]
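
As a rough illustration of how much dimming we're talking about, the fraction of starlight blocked during a transit is approximately the ratio of the planet's disk area to the star's, (R_planet / R_star)^2. This sketch ignores limb darkening and grazing transits, and the function name and radius constants are my own assumptions rather than anything from the dataset.

```julia
# Transit depth: fraction of starlight blocked ≈ (R_planet / R_star)^2
const R_SUN_KM = 695_700.0     # nominal solar radius
const R_JUPITER_KM = 71_492.0  # Jupiter's equatorial radius
const R_EARTH_KM = 6_378.1     # Earth's equatorial radius

# Hypothetical helper; both radii must be in the same units
transit_depth(r_planet, r_star) = (r_planet / r_star)^2

jupiter_depth = transit_depth(R_JUPITER_KM, R_SUN_KM)
earth_depth = transit_depth(R_EARTH_KM, R_SUN_KM)

println("Jupiter-size planet blocks ", round(100 * jupiter_depth, digits=3), " % of the light")
println("Earth-size planet blocks ", round(100 * earth_depth, digits=4), " % of the light")
```

A Jupiter-size planet dims a Sun-like star by about 1%, while an Earth-size planet dims it by less than 0.01%.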

plot(
  dropmissing(exoplanets, [:pl_disc, :pl_discmethod]),
  x = :pl_disc,
  color = :pl_discmethod,
  Geom.line,
  Stat.histogram,
  Scale.y_sqrt,
  Guide.xlabel("Discovery Year"),
  Guide.colorkey(title = "Discovery Method")
)

Where are they?

Most discovered exoplanets live within 500 parsecs (1630.78 light years) of Earth. This is a limitation of our current technology more than anything. As our observation and data collection capabilities improve, I expect the number of discovered exoplanets to grow exponentially.

The closest and farthest planets we've found so far are Proxima Centauri b at 1.29 parsecs (4.21 light years) and SWEEPS-4 b/SWEEPS-11 b at 8500 parsecs (27,723.29 light years) respectively.
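
The light year figures above are plain unit conversions from the parsec distances in the dataset. A minimal sketch, assuming the usual factor of about 3.2615638 light years per parsec (the helper name is mine):

```julia
# Convert the distances quoted above from parsecs to light years
const LY_PER_PARSEC = 3.2615638  # approximate conversion factor

# Hypothetical helper
parsecs_to_ly(pc) = pc * LY_PER_PARSEC

for pc in (1.29, 500.0, 8500.0)
    println(pc, " pc ≈ ", round(parsecs_to_ly(pc), digits=2), " ly")
end
```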

We're mapping the exoplanet locations using the galactic coordinate system. This is a spherical coordinate system that uses the Sun (effectively Earth, at these scales) as the origin and the direction of the center of the Milky Way galaxy as the 0 degree bearing.[6] By converting each star's galactic longitude and distance from polar to Cartesian coordinates, we can plot the relative positions of the stars.

# Exoplanet locations
coordinates = unique(dropmissing(exoplanets, [:st_glon, :st_dist]), [:st_glon, :st_dist])

# Distance stats
sorted_distance = sort(dropmissing(exoplanets, [:st_dist]), :st_dist)
describe(sorted_distance[:st_dist])
closest = first(sorted_distance)
farthest = last(sorted_distance)

# Convert polar galactic coordinates to cartesian
# (st_glon is given in degrees, so use the degree-based cosd/sind)
x_pos = coordinates[:st_dist] .* cosd.(coordinates[:st_glon])
y_pos = coordinates[:st_dist] .* sind.(coordinates[:st_glon])

plot(
  layer(
    x = [0, 8121.9961554],
    y = [0, -7.90263480146],
    label = ["Earth", "Galactic Center"],
    Geom.point,
    Geom.label,
    style(default_color=colorant"#d4d4d4", point_label_color=colorant"#d4d4d4")
  ),
  layer(
    x = x_pos,
    y = y_pos
  ),
  Guide.xlabel("Distance (Parsecs)"),
  Guide.ylabel("Distance (Parsecs)")
)

Planet Characteristics

Our solar system has 8 planets, each with varying characteristics. We have small terrestrial planets, large gas giants, and cold ice giants. Do the exoplanets show as much variety? Do our discovery methods predispose us to finding certain types of planets?

How big are the planets?

When plotting the exoplanets by their mass and radius, we see a host of different sizes. The majority appear to be terrestrial around Earth's size, but we also have a smattering of gas giants bigger than Jupiter, the largest planet in our solar system.

planet_sizes = DataFrame(
  name = ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"],
  mass = [0.0553, 0.815, 1, 0.107, 317.8, 95.2, 14.5, 17.1],
  radius = [0.383, 0.949, 1, 0.532, 11.21, 9.45, 4.01, 3.88]
)

plot(
  layer(
    planet_sizes,
    x = :radius,
    y = :mass,
    label = :name,
    Geom.point,
    Geom.label,
    style(default_color=colorant"#d4d4d4", point_label_color=colorant"#d4d4d4")
  ),
  layer(
    dropmissing(exoplanets, [:pl_rade, :pl_bmasse]),
    x = :pl_rade,
    y = :pl_bmasse
  ),
  Scale.y_sqrt,
  Guide.xlabel("Radius (Earth Radii)"),
  Guide.ylabel("Mass (Earth Mass)")
)

Plotting the sizes as a 2D density contour makes the patterns in the scatter plot above easier to see. It's clear in this plot that most exoplanets cluster around sizes between Mercury/Earth/Mars and Uranus/Neptune.

plot(
  layer(
    planet_sizes,
    x = :radius,
    y = :mass,
    label = :name,
    Geom.point,
    Geom.label,
    style(default_color=colorant"#d4d4d4", point_label_color=colorant"#d4d4d4")
  ),
  layer(
    dropmissing(exoplanets, [:pl_rade, :pl_bmasse]),
    x = :pl_rade,
    y = :pl_bmasse,
    Geom.density2d
  ),
  style(key_position = :none),
  Scale.color_continuous(colormap=(x->colorant"#fe4365")),
  Guide.xlabel("Radius (Earth Radii)"),
  Guide.ylabel("Mass (Earth Mass)")
)

The giants in our solar system (Jupiter/Saturn/Uranus/Neptune) pale in comparison to the larger exoplanets. The plot below shows the relative size of the largest and smallest exoplanets discovered along with Jupiter and Earth as references.

sorted_size = sort(dropmissing(exoplanets, :pl_rade), :pl_rade)
smallest = first(sorted_size)
largest = last(sorted_size)

plot(
  layer(
    x = [3.5],
    y = [0],
    label = ["Kepler-37 b"],
    Geom.point,
    Geom.label,
    style(point_size = 0.336pt, point_label_color=colorant"#d4d4d4")
  ),
  layer(
    x = [3],
    y = [0],
    label = ["Earth"],
    Geom.point,
    Geom.label,
    style(point_size = 1pt, point_label_color=colorant"#d4d4d4")
  ),
  layer(
    x = [2.5],
    y = [0],
    label = ["Jupiter"],
    Geom.point,
    Geom.label,
    style(point_size = 11.21pt, point_label_color=colorant"#d4d4d4")
  ),
  layer(
    x = [1],
    y = [0],
    label = ["HD 100546 b"],
    Geom.point,
    Geom.label,
    style(point_size=77.342pt, point_label_color=colorant"#d4d4d4")
  ),
  Scale.y_continuous(minvalue=-200, maxvalue=200)
)

How hot are they?

A key characteristic for planet habitability is the surface temperature. We can't measure this directly on planets so far away, and atmospheric properties can raise or lower the temperature at the surface. Instead, we use equilibrium temperature, which estimates a planet's theoretical temperature by treating it as a black body heated only by its host star.[7]
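
For reference, a common textbook form of the equilibrium temperature is T_eq = T_star * sqrt(R_star / (2a)) * (1 - A)^(1/4), where a is the orbital distance and A is the planet's albedo. The sketch below applies it to Earth. The function name, the constants and the albedo of 0.3 are my own assumptions, and the archive's pl_eqt values may have been derived differently.

```julia
# Equilibrium temperature of a planet treated as a black body:
#   T_eq = T_star * sqrt(R_star / (2a)) * (1 - A)^(1/4)
const R_SUN_M = 6.957e8        # nominal solar radius in meters
const AU_M = 1.495978707e11    # one astronomical unit in meters

# Hypothetical helper; albedo defaults to 0 (a perfect absorber)
function equilibrium_temperature(t_star, r_star_solar, a_au; albedo=0.0)
    r_star = r_star_solar * R_SUN_M  # stellar radius in meters
    a = a_au * AU_M                  # orbital distance in meters
    return t_star * sqrt(r_star / (2a)) * (1 - albedo)^0.25
end

# Earth: Sun at 5778 K, 1 solar radius, 1 AU, assumed albedo of 0.3
earth_teq = equilibrium_temperature(5778.0, 1.0, 1.0; albedo=0.3)
println("Earth equilibrium temperature ≈ ", round(earth_teq, digits=1), " K")
```

This gives roughly 255 K for Earth, noticeably colder than its actual average surface temperature of about 288 K. That gap is the atmospheric effect the paragraph above warns about.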

plot(
  layer(
    x = [1],
    y = [5778],
    color = [255],
    shape = [Shape.xcross],
    size = [3pt],
    label = ["Earth"],
    Geom.point,
    Geom.label,
    style(point_label_color=colorant"#d4d4d4")
  ),
  layer(
    dropmissing(exoplanets, [:pl_eqt, :st_teff, :pl_orbsmax]),
    x = :pl_orbsmax,
    y = :st_teff,
    color = :pl_eqt
  ),
  Scale.x_log10,
  Scale.color_continuous(colormap=(x->get(ColorSchemes.blackbody, x))),
  Guide.xlabel("Orbital Semi Major Axis (AU)"),
  Guide.ylabel("Star Effective Temperature (K)"),
  Guide.colorkey(title="Planet Equilibrium   \nTemperature (K)  "),
  Guide.shapekey(pos=[10000,10000])
)

What do their orbits look like?

The orbits of the discovered exoplanets don't actually vary that much. Most orbits are small, circular and close to their host star.

I think the reason for these small, regular orbits has to do with our discovery methods. Since planets don't emit light, we can't measure them directly. We find them by measuring perturbations in movement or luminosity of their host star. Since a planet's effect on a star (both occlusion and gravity) grows weaker with distance, it's natural that we find exoplanets that are close to their star.
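
One way to put a number on this bias for the transit method: for a randomly oriented circular orbit, the geometric chance that the planet crosses the star's disk from our viewpoint is roughly R_star / a. The sketch below is a back-of-the-envelope estimate under that assumption (it ignores the planet's own radius and eccentricity), and the helper name is mine.

```julia
# Geometric transit probability for a circular orbit: P ≈ R_star / a
const R_SUN_AU = 0.00465047  # nominal solar radius expressed in AU

# Hypothetical helper: a_au is the semi-major axis in AU
transit_probability(a_au; r_star_solar=1.0) = r_star_solar * R_SUN_AU / a_au

println("Planet at 0.05 AU: ", round(100 * transit_probability(0.05), digits=2), " %")
println("Planet at 1 AU:    ", round(100 * transit_probability(1.0), digits=2), " %")
println("Planet at 5.2 AU:  ", round(100 * transit_probability(5.2), digits=2), " %")
```

A close-in planet at 0.05 AU has roughly a 9% chance of being aligned for a transit, while an Earth-like orbit has under 0.5%, so close orbits are heavily over-represented.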

# Orbit characteristics
semi_major_axis = plot(
  dropmissing(exoplanets, [:pl_orbsmax]),
  x = :pl_orbsmax,
  Geom.histogram(bincount=50),
  Scale.x_log10,
  Guide.xlabel("Orbital Semi Major Axis (AU)")
)

period = plot(
  dropmissing(exoplanets, [:pl_orbper]),
  x = :pl_orbper,
  Geom.histogram(bincount=50),
  Scale.x_log10,
  Guide.xlabel("Orbital Period (Days)")
)

eccentricity = plot(
  dropmissing(exoplanets, [:pl_orbeccen]),
  x = :pl_orbeccen,
  Geom.histogram(bincount=50),
  Guide.xlabel("Eccentricity")
)

inclination = plot(
  dropmissing(exoplanets, [:pl_orbincl]),
  x = :pl_orbincl,
  Geom.histogram(bincount=50),
  Guide.xlabel("Inclination (Deg)")
)

orbits = gridstack([semi_major_axis period; eccentricity inclination])

Do they have moons?

Not a single exoplanet in this dataset has a moon! This goes hand in hand with the discovery method problems I mentioned in the orbits section. Current techniques can't pick up objects so small, dark, and far away. The exoplanets we find are close to their host star where it's unlikely for a moon to develop a stable orbit. It's probable that we'll find a lot of exomoons in the future. Our solar system suggests that they are common around larger planets, with Jupiter and Saturn hosting 67 and 62 moons respectively.

julia> exoplanets[exoplanets[:pl_mnum] .> 0, :pl_mnum] |> length
0
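
If a column like this contains missing values, comparing it directly can produce missing entries in the mask. A slightly more defensive way to count is to skip the missing values first. The vector below is a toy stand-in I made up for the pl_mnum column, not real data.

```julia
# Toy stand-in for the pl_mnum column (made-up values, not from the dataset)
pl_mnum = [0, missing, 0, 0, missing]

# Count planets with at least one recorded moon, ignoring missing entries
n_with_moons = count(x -> x > 0, skipmissing(pl_mnum))

# Count how many entries have no moon information at all
n_unknown = count(ismissing, pl_mnum)

println("Planets with moons: ", n_with_moons)
println("Planets with no moon data: ", n_unknown)
```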

Stellar Characteristics

Stars are a key factor in the life and discovery of exoplanets. Below we'll go through some of the characteristics of the stars that are hosting exoplanets and we'll see how they compare to our star, the sun.

How big are the stars?

Our sun is pretty close to the perfect average of star sizes. Of the discovered stars with exoplanets, the median mass and radius are 0.975 and 0.970 times the mass and radius of our sun. The mean mass and radius are 1.551 and 1.009 times the values of our sun.
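
The gap between the median mass (0.975) and the mean mass (1.551) suggests that a small number of very massive stars pull the mean upward. Here is a small sketch of that effect with made-up numbers (not values from the dataset):

```julia
using Statistics

# Made-up stellar masses in solar units: mostly Sun-like, plus one heavy outlier
masses = [0.8, 0.9, 0.95, 1.0, 1.05, 1.1, 10.0]

mass_mean = mean(masses)
mass_median = median(masses)

println("mean = ", round(mass_mean, digits=3), ", median = ", mass_median)
```

The single outlier more than doubles the mean while leaving the median at 1.0, which is why the median is the safer summary here.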

plot(
  layer(
    x = [1],
    y = [1],
    label = ["Sun"],
    Geom.point,
    Geom.label,
    style(default_color=colorant"#d4d4d4", point_label_color=colorant"#d4d4d4")
  ),
  layer(
    dropmissing(exoplanets, [:st_rad, :st_mass]),
    x = :st_rad,
    y = :st_mass
  ),
  Guide.xlabel("Radius (Solar Radii)"),
  Guide.ylabel("Mass (Solar Masses)"),
  Scale.y_log10,
  Scale.x_log10
)

How hot and bright are they?

Most stars are actually less bright and hot than our own sun. The majority we've found are within the main sequence star classification.[7]
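
Temperature and brightness are linked through the Stefan-Boltzmann law, which in solar units reads L / L_sun = (R / R_sun)^2 * (T / T_sun)^4. The sketch below computes the base-10 log of that ratio, matching the logarithmic luminosity axis used in the plot. The helper name and the example star are my own assumptions.

```julia
# Stefan-Boltzmann law in solar units: L / L_sun = (R / R_sun)^2 * (T / T_sun)^4
const T_SUN_K = 5777.0  # solar effective temperature used in this article

# Hypothetical helper: log10 of luminosity in solar units
log_luminosity(r_solar, teff) = log10(r_solar^2 * (teff / T_SUN_K)^4)

sun_log_lum = log_luminosity(1.0, 5777.0)
dwarf_log_lum = log_luminosity(0.5, 3800.0)  # a cool star half the Sun's radius

println("Sun:       ", sun_log_lum)
println("Cool star: ", round(dwarf_log_lum, digits=2))
```

Because luminosity scales with the fourth power of temperature, a star only about a third cooler than the Sun and half its size ends up more than twenty times fainter.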

plot(
  layer(
    x = [5777],
    y = [1],
    label = ["Sun"],
    color = [5777],
    size = [3pt],
    shape = [Shape.xcross],
    Geom.point,
    Geom.label(position=:above),
    style(point_label_color=colorant"white")
  ),
  layer(
    dropmissing(exoplanets, [:st_lum, :st_teff]),
    y = :st_lum,
    x = :st_teff,
    color = :st_teff
  ),
  Scale.x_log10,
  Scale.color_continuous(colormap=(x->get(ColorSchemes.blackbody, x))),
  Guide.xlabel("Effective Temperature (K)"),
  Guide.ylabel("Luminosity (log(Solar))"),
  style(key_position=:none),
  Coord.cartesian(xflip=true)
)

What are they composed of?

All active stars give off energy through nuclear fusion reactions in their cores. Extreme pressure and temperature convert hydrogen into helium and sometimes heavier elements, which astronomers call metals.[8] Metallicity measures this composition as a ratio of elements in the star compared with the same ratio in our sun. Metal-rich stars tend to be younger and have a higher chance of hosting terrestrial planets in their orbits.

The plot below shows the composition ratios of the host stars we've measured. The iron ratio is by far the most commonly reported, and we can see that a value around 0 (our sun's composition) is the most common.
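
Metallicity in dex is a base-10 logarithm of the star's abundance ratio relative to the Sun's, so a value of 0 means "same as the Sun". A minimal sketch for converting dex back to a linear ratio (the helper name is mine):

```julia
# Metallicity in dex is log10(star's ratio / Sun's ratio),
# so the linear ratio relative to the Sun is 10^dex
dex_to_ratio(dex) = 10.0^dex

for dex in (-1.0, 0.0, 0.3)
    println(dex, " dex → ", round(dex_to_ratio(dex), digits=2), "× solar")
end
```

So a star at 0.3 dex has about twice the Sun's iron-to-hydrogen ratio, and one at -1.0 dex has a tenth of it.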

met_fe = plot(
  dropmissing(exoplanets, [:st_metfe]),
  x = :st_metfe,
  Geom.histogram(bincount=50),
  Guide.xlabel("Metallicity (Dex)")
)

met_ratio = plot(
  dropmissing(exoplanets, [:st_metratio]),
  x = :st_metratio,
  Geom.histogram,
  Guide.xlabel("Metallicity Ratio")
)

metallicity = hstack([met_fe, met_ratio])

Conclusion

Thanks for reading and exploring the exoplanets with me! We looked at how we discover exoplanets, what the exoplanets are like, and how their host stars compare to our own. I hope this inspires someone to dig into this dataset a bit more and hopefully find some cool insights. In the future it could be fun to build a model to process spectral data to search for exoplanet candidates of our own.

I think the future of astronomy is so exciting! It seems like every year NASA releases a cool breakthrough. Astronomy is so open and friendly and I can't wait to dig in to more universal datasets going forward.