Atmospheric Retrievals


The atmospheric_retrievals subpackage within the exoctk package currently contains a module for performing retrievals via the PLATON package. This Jupyter notebook (also shown below) contains a demo of how to use the platon_wrapper module.


Users who wish to use the atmospheric_retrievals tools may do so by installing the exoctk package. Please see the installation instructions for further details.


In [1]:
from IPython.display import Image

Atmospheric Retrievals

This notebook demonstrates the use of ExoCTK's platon_wrapper module. As suggested by its name, this module is a wrapper around the platon.retriever.run_multinest and platon.retriever.run_emcee methods, which use nested sampling (MultiNest) and MCMC algorithms, respectively, to retrieve atmospheric parameters. For further information about platon, see the project documentation, the API docs for the retriever module, or the GitHub repository.

Note that some of the examples provided below are minimal, bare-bones examples that are meant to show how users may use the software without taking much computation time to complete. The parameters used and the corresponding results are not indicative of a true scientific use case. For more comprehensive and robust examples, see the examples.py module in the exoctk.atmospheric_retrievals subpackage.

The first section of this notebook provides a simple example of how to run the software on a local machine. The later sections describe how to perform retrievals using Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instances.

This notebook assumes that the user has installed the exoctk package and its required libraries. For more information about the installation of exoctk, see the installation instructions.

A Simple Example

Provided below is a simple example of how to use the PlatonWrapper object to perform atmospheric retrievals. First, a few necessary imports:

In [ ]:
import numpy as np
from platon.constants import R_sun, R_jup, M_jup
from exoctk.atmospheric_retrievals.platon_wrapper import PlatonWrapper

The PlatonWrapper object requires the user to supply a dictionary containing initial guesses of parameters that they wish to fit. Note that Rs, Mp, Rp, and T must be supplied, while the others are optional.

Also note that Rs is in units of solar radii, Mp is in units of Jupiter masses, and Rp is in units of Jupiter radii.

In [ ]:
params = {
    'Rs': 1.19,  # Required
    'Mp': 0.73,  # Required
    'Rp': 1.4,  # Required
    'T': 1200.0,  # Required
    'logZ': 0,  # Optional
    'CO_ratio': 0.53,  # Optional
    'log_cloudtop_P': 4,  # Optional
    'log_scatt_factor': 0,  # Optional
    'scatt_slope': 4,  # Optional
    'error_multiple': 1,  # Optional
    'T_star': 6091}  # Optional
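Since four of the keys are required, it can help to check a dictionary before handing it to the wrapper. The following is a minimal sketch of such a check; missing_required is a hypothetical helper written for this example and is not part of exoctk or platon.

```python
# Hypothetical helper for illustration; not part of exoctk or platon.
REQUIRED_KEYS = ('Rs', 'Mp', 'Rp', 'T')


def missing_required(params):
    """Return the required keys that are absent from ``params``."""
    return [key for key in REQUIRED_KEYS if key not in params]


# A reduced version of the dictionary above, with one optional key
params = {'Rs': 1.19, 'Mp': 0.73, 'Rp': 1.4, 'T': 1200.0, 'logZ': 0}

missing = missing_required(params)
if missing:
    raise ValueError('Missing required parameters: {}'.format(missing))
```

With all four required keys present, the check passes silently; removing one of them raises a ValueError that names the missing key.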

In order to perform the retrieval, users must instantiate a PlatonWrapper object and set the parameters:

In [ ]:
pw = PlatonWrapper()
pw.set_parameters(params)

Users may define fitting priors via the fit_info attribute.

In [ ]:
pw.fit_info.add_gaussian_fit_param('Mp', 0.04*M_jup)
pw.fit_info.add_uniform_fit_param('Rp', 0.9*(1.4 * R_jup), 1.1*(1.4 * R_jup))
pw.fit_info.add_uniform_fit_param('T', 300, 3000)
pw.fit_info.add_uniform_fit_param("logZ", -1, 3)
pw.fit_info.add_uniform_fit_param("log_cloudtop_P", -0.99, 5)
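The priors above are multiplied by M_jup and R_jup, which suggests that the fitting priors are expressed in SI units rather than in Jupiter units. The sketch below works through that arithmetic for the Rp and Mp priors. The constants are approximate values hard-coded as an assumption so that the example runs without platon; in practice they would come from platon.constants.

```python
# Approximate values, hard-coded only so this sketch runs without platon.
# In practice, import R_jup and M_jup from platon.constants instead.
R_JUP_M = 7.1492e7    # Jupiter radius in metres (approximate)
M_JUP_KG = 1.898e27   # Jupiter mass in kilograms (approximate)

rp_guess = 1.4 * R_JUP_M     # initial guess for Rp, converted to metres
rp_lower = 0.9 * rp_guess    # lower bound passed for the uniform Rp prior
rp_upper = 1.1 * rp_guess    # upper bound passed for the uniform Rp prior
mp_value = 0.04 * M_JUP_KG   # value passed for the Gaussian Mp prior, in kg

print('Rp prior: {:.3e} m to {:.3e} m'.format(rp_lower, rp_upper))
print('Mp prior value: {:.3e} kg'.format(mp_value))
```

The uniform prior on Rp therefore spans plus or minus 10% around the initial guess of 1.4 Jupiter radii.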

Prior to performing the retrieval, users must define the bins, depths, and errors attributes. The bins attribute must be a list of lists, with each element containing the lower and upper bounds of a wavelength bin. The depths and errors attributes are both 1-dimensional numpy arrays.

In [ ]:
wavelengths = 1e-6*np.array([1.119, 1.1387])
pw.bins = [[w-0.0095e-6, w+0.0095e-6] for w in wavelengths]
pw.depths = 1e-6 * np.array([14512.7, 14546.5])
pw.errors = 1e-6 * np.array([50.6, 35.5])
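The three attributes need to describe the same set of wavelength bins, so their lengths should match. Below is a minimal sketch of how the bins could be built and checked before being assigned to the wrapper. It uses plain Python lists so that it runs without numpy, and make_bins and check_inputs are hypothetical helpers, not part of exoctk. The factors of 1e-6 in the cell above appear to convert wavelengths from microns to metres and depths from parts per million to fractions; the sketch assumes the same.

```python
def make_bins(centers, half_width):
    """Return [lower, upper] bounds for each bin center."""
    return [[c - half_width, c + half_width] for c in centers]


def check_inputs(bins, depths, errors):
    """Raise ValueError if the three inputs are inconsistent."""
    if not (len(bins) == len(depths) == len(errors)):
        raise ValueError('bins, depths, and errors must have equal lengths')
    for lower, upper in bins:
        if lower >= upper:
            raise ValueError('each bin must satisfy lower < upper')


centers = [1.119e-6, 1.1387e-6]       # bin centers (assumed to be in metres)
bins = make_bins(centers, 0.0095e-6)  # same half-width as the cell above
depths = [14512.7e-6, 14546.5e-6]     # transit depths (assumed fractional)
errors = [50.6e-6, 35.5e-6]           # uncertainties on the depths

check_inputs(bins, depths, errors)    # passes silently when consistent
```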

With everything defined, users can now perform the retrieval. Users may choose to use the MCMC method (emcee) or the Multinested Sampling method (multinest).

MCMC Method

In [ ]:
pw.retrieve('emcee')
pw.save_results()
pw.make_plot()

Multinested Sampling Method

In [ ]:
pw.retrieve('multinest')
pw.save_results()
pw.make_plot()

Note that results are saved in a text file named <method>_results.dat, a corner plot is saved to <method>_corner.png, and a log file describing the execution of the software is saved to YYYY-MM-DD-HH-MM.log, where the file name is a timestamp reflecting the creation time of the log file.
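To make the naming pattern concrete, the sketch below builds the file names described in the note for a given method and creation time. The expected_outputs function is a hypothetical helper, and the zero-padded strftime format is an assumption inferred from the YYYY-MM-DD-HH-MM pattern rather than taken from the exoctk source.

```python
from datetime import datetime


def expected_outputs(method, when):
    """Return the output file names suggested by the note above."""
    return {
        'results': '{}_results.dat'.format(method),
        'corner_plot': '{}_corner.png'.format(method),
        # Assumed format: zero-padded year-month-day-hour-minute
        'log': when.strftime('%Y-%m-%d-%H-%M') + '.log',
    }


outputs = expected_outputs('multinest', datetime(2020, 1, 31, 9, 5))
for kind, name in outputs.items():
    print('{}: {}'.format(kind, name))
```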

Using Amazon Web Services to Perform Atmospheric Retrievals

The following sections guide users on how to perform atmospheric retrievals using Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instances.

Table of Contents:

  1. What is Amazon Web Services?
  2. What is an EC2 instance?
  3. Why use AWS?
  4. AWS-specific software in the atmospheric_retrievals subpackage
  5. Setting up an AWS account
  6. Create a SSH key pair
  7. Launch an EC2 instance
  8. Build the exoctk software environment on the EC2 instance
  9. Fill out the aws_config.json file
  10. Run some code!
  11. Output products
  12. Using a GPU-enabled EC2 instance

What is Amazon Web Services?

Amazon Web Services provides on-demand cloud-based computing platforms, with a variety of services such as Elastic Compute Cloud (EC2), Simple Storage Service (S3), Relational Database Service (RDS), and more. Learn more at https://aws.amazon.com/what-is-aws/

What is an EC2 instance?

The Elastic Compute Cloud (EC2) service enables users to spin up virtual servers with a variety of operating systems, storage space, memory, and processors. Learn more at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html

Why use AWS?

Atmospheric retrievals are often computationally expensive, both in the amount of time it takes to complete a retrieval and in the cost of purchasing and/or maintaining a suitable machine. In particular, if users do not have access to a dedicated science machine or cluster, and instead must rely on personal laptops or desktops, atmospheric retrievals can become quite burdensome in day-to-day research work.

AWS provides a means to outsource this computational effort to machines that live in the cloud, at low cost. With AWS, users can create a virtual machine (VM), perform atmospheric retrievals, and have the machine automatically shut down upon completion. Depending on the type of VM, typical costs range from ~$0.02/hour (e.g. a small, 1 CPU Linux machine) to ~$3.00/hour (e.g. a heftier, multiple CPU, GPU-enabled Linux machine).

For example, a small trial run of an atmospheric retrieval for HD 209458b using PLATON took roughly 35 minutes at a total cost of $0.01 using a small CPU EC2 instance, and roughly 24 minutes at a total cost of $1.22 using a GPU-enabled EC2 instance.
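The quoted totals follow from multiplying the hourly rate by the run time. The sketch below reproduces that arithmetic using the approximate rates given above; run_cost is a hypothetical helper, and actual AWS billing may differ in how it rounds usage or charges for storage.

```python
def run_cost(hourly_rate, minutes):
    """Cost in dollars of running an instance for ``minutes``."""
    return hourly_rate * minutes / 60.0


small_cpu = run_cost(0.02, 35)  # small CPU instance, 35 minute run
gpu = run_cost(3.00, 24)        # GPU-enabled instance, 24 minute run

print('Small CPU instance: ${:.2f}'.format(small_cpu))
print('GPU-enabled instance: ${:.2f}'.format(gpu))
```

The small CPU run comes to about one cent, matching the quoted total. The GPU estimate of about $1.20 is slightly below the quoted $1.22, which is consistent with the hourly rate being only approximate.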

AWS-specific software in the atmospheric_retrievals subpackage

The atmospheric_retrievals subpackage provides software that enables users to use AWS EC2 instances to perform atmospheric retrievals. The relevant software modules/tools are:

  • aws_config.json - A configuration file that contains a path to a private SSH key and a pointer to a particular EC2 instance
  • aws_tools.py - Various functions to support AWS EC2 interactivity, such as starting/stopping an EC2 instance, transferring files to/from EC2 instances, and logging standard output from EC2 instances
  • build-exoctk-env-cpu.sh - A bash script for creating an exoctk software environment on an EC2 instance
  • build-exoctk-env-gpu.sh - A bash script for creating an exoctk software environment on a GPU-enabled EC2 instance
  • exoctk-env-init.sh - A bash script that initializes an existing exoctk software environment on an EC2 instance

Setting up an AWS account

Users must first set up an AWS account and configure a ssh key pair in order to connect to the services.

  1. Visit https://aws.amazon.com to create an account. Unfortunately, a credit card is required for sign up. There is no immediate fee for signing up; users will only incur costs when a service is used.
  2. Once an account has been created, sign into the AWS console.
  3. At the top of the page, under "Services", select "IAM" to access the Identity and Access Management console.
  4. On the left side of the page, select "Users"
  5. Click the "Add user" button to create a new user. In the "User name" field, enter the username used for the AWS account. Select "Programmatic access" for the "Access type" option. Click on "Next: Permissions".
  6. Select "Add user to group", and click "Create group". A "Create group" pane will open. In the "Group name" field, enter "admin". Check the box next to the first option, "AdministratorAccess", and click "Create group".
  7. Click "Next: Tags". This step is optional, so users may then click "Next: Review", then "Create user".
  8. When the user is created, users will be presented with an "Access Key ID" and a "Secret Access Key". Take note of these, or download them to a csv file, as they will be used in the next step.
  9. In a terminal, type aws configure. Users will be prompted to enter their Access Key ID and the Secret Access Key from the previous step. Also provide a Default region name (e.g. us-east-1, us-west-1, etc.) and for "output format" use json. For a list of available region names, see https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html
  10. Executing this command should result in the creation of a ~/.aws/ directory, containing config and credentials files populated with the information that was provided.
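After the final step, the presence of the two files can be confirmed from Python. The following is a minimal sketch of such a check; missing_aws_files is a hypothetical helper, not part of exoctk, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path


def missing_aws_files(aws_dir=None):
    """Return the expected file names that are absent from ``aws_dir``.

    If ``aws_dir`` is not given, the default ~/.aws/ directory is used.
    """
    aws_dir = Path(aws_dir) if aws_dir is not None else Path.home() / '.aws'
    return [name for name in ('config', 'credentials')
            if not (aws_dir / name).is_file()]
```

Calling missing_aws_files() with no arguments returns an empty list when both ~/.aws/config and ~/.aws/credentials are in place.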

Create a SSH key pair

In order to connect to an EC2 instance, users must next configure an SSH key pair:

  1. In a terminal, type ssh-keygen -t rsa -f <rsa_key_name>, where <rsa_key_name> is the name of the resulting SSH key files (users can name this whatever they would like). When prompted to enter a passphrase, leave it empty by hitting enter, and then enter again. Running this command should result in the creation of two files: (1) <rsa_key_name>, which is the private SSH key, and (2) <rsa_key_name>.pub, which is the public SSH key.
  2. In the browser, navigate to the AWS EC2 console (https://console.aws.amazon.com/ec2) and select Key Pairs under Network & Security on the left hand side of the page.
  3. Select Import key pair
  4. In the Name field, enter a name you wish to use.
  5. In the large field on the bottom, paste the contents of the <rsa_key_name>.pub file.
  6. Select Import key pair to complete the process.
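For users who prefer to script the first step, the sketch below assembles the same ssh-keygen command as an argument list and derives the names of the two files it creates. The helper names and the key name exoctk_key are hypothetical and chosen only for this example.

```python
import shlex


def keygen_command(key_name):
    """Build the ssh-keygen command from step 1 as an argument list."""
    return ['ssh-keygen', '-t', 'rsa', '-f', key_name]


def key_pair_paths(key_name):
    """Return the (private, public) file names that ssh-keygen writes."""
    return key_name, key_name + '.pub'


command = keygen_command('exoctk_key')  # 'exoctk_key' is a placeholder name
private_key, public_key = key_pair_paths('exoctk_key')

print(shlex.join(command))
print('Private key: {}, public key: {}'.format(private_key, public_key))
```

The list returned by keygen_command could be passed to subprocess.run(command, check=True) to actually generate the keys; ssh-keygen would then prompt for the passphrase as described in step 1.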

Launch an EC2 instance

To create and launch an EC2 instance:

  1. Select "Instances" from the left-hand side of the AWS EC2 console
  2. Select the "Launch Instance" button
  3. Select an Amazon Machine Image (AMI) of your choosing. Note that there is a box on the left that allows users to show only free-tier-eligible AMIs. For the purposes of the examples in this notebook, it is suggested to use ami-0c322300a1dd5dc79 (Red Hat Enterprise Linux 8 (HVM), SSD Volume Type, 64-bit (x86)).
  4. Select the Instance Type with the configuration of your choosing. For the purposes of the examples in this notebook, it is suggested to use t2.small. When satisfied, choose "Review and Launch"
  5. On the "Review Instance Launch" page, users may review and/or change any settings prior to launching the EC2 instance. For the purposes of the examples in this notebook, it is suggested to "Edit storage" and increase the "Size" to 20 GiB to allow enough storage space to build the exoctk software environment.
  6. When satisfied, click "Launch". The user will be prompted to select or create a key pair. Select the existing key pair that was created in the Create a SSH key pair section. Check the acknowledgement box, and select "Launch Instances"
  7. If the EC2 instance was launched successfully, there will be a success message with a link to the newly-created EC2 instance.

Note: For users interested in using GPU-enabled EC2 instances, see the Using a GPU-enabled EC2 instance section at the end of this notebook. This warrants its own section because it requires a rather complex installation process.

Build the exoctk software environment on the EC2 instance

Once the newly-created EC2 instance has been in its "Running" state for a minute or two, users can log into the machine through the command line and install the necessary software dependencies needed for running the atmospheric_retrievals code.

To log into the EC2 instance from the command line, type:

ssh -i <path_to_private_key> ec2-user@<ec2_public_dns>

where <path_to_private_key> is the path to the private SSH key file (i.e. the <rsa_key_name> that was created in the Create a SSH key pair section), and <ec2_public_dns> is the Public DNS of the EC2 instance, which is provided in the "Description" of the EC2 instance under the "Instances" panel in the AWS EC2 console. This public DNS should look something like ec2-NN-NN-NNN-NN.compute-N.amazonaws.com.

Users may be asked (yes/no) if they want to connect to the machine. Enter "yes".

Once logged in, users can build the exoctk software environment either by copying and pasting the commands from the atmospheric_retrievals/build-exoctk-env-cpu.sh file straight into the EC2 terminal, or by copying the build-exoctk-env-cpu.sh file directly to the EC2 instance and running it. To do the latter, from your local machine, type:

scp -i <path_to_private_key> build-exoctk-env-cpu.sh ec2-user@<ec2_public_dns>:/home/ec2-user/build-exoctk-env-cpu.sh
ssh -i <path_to_private_key> ec2-user@<ec2_public_dns>
./build-exoctk-env-cpu.sh
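The ssh and scp commands above share the same key path and public DNS, so they can be generated from those two values. The following is a minimal sketch that assembles both command lines in Python. The helper names and the key path my_key are hypothetical, the DNS string is the placeholder pattern from the text, and the ec2-user login and /home/ec2-user destination are taken from the commands above.

```python
import os
import shlex


def ssh_command(private_key, public_dns, user='ec2-user'):
    """Build the ssh login command as an argument list."""
    return ['ssh', '-i', private_key, '{}@{}'.format(user, public_dns)]


def scp_command(private_key, local_file, public_dns, user='ec2-user'):
    """Build the scp command that copies a file to the user's home."""
    remote = '{}@{}:/home/{}/{}'.format(
        user, public_dns, user, os.path.basename(local_file))
    return ['scp', '-i', private_key, local_file, remote]


dns = 'ec2-NN-NN-NNN-NN.compute-N.amazonaws.com'  # placeholder pattern
print(shlex.join(scp_command('my_key', 'build-exoctk-env-cpu.sh', dns)))
print(shlex.join(ssh_command('my_key', dns)))
```

Each list could be passed to subprocess.run to execute the command, although the build script itself still has to be run on the EC2 instance after logging in.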

Once completed, users may log out of the EC2 instance, as there will no longer be any command-line interaction needed.

Fill out the aws_config.json file

Within the atmospheric_retrievals subpackage, there exists an aws_config.json file. Fill in the values for its two fields, ec2_id and ssh_file. The ec2_id field should contain the ID of the EC2 instance (which can be found under "Instance ID" in the description of the EC2 instance in the AWS EC2 console), and ssh_file should point to the location of the private SSH key file described in the Create a SSH key pair section:

{
    "ec2_id" : "<ec2_instance_ID>",
    "ssh_file" : "<path_to_private_key>"
}
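Before running a retrieval, it may be worth confirming that both fields have been filled in and that the angle-bracket placeholders have been replaced. The sketch below shows one way to do that with the standard json module. The load_aws_config function is a hypothetical helper written for this example; it is separate from the get_config function that the package provides in aws_tools.

```python
import json


def load_aws_config(path):
    """Read a config file and check that both fields are filled in."""
    with open(path) as f:
        config = json.load(f)
    for field in ('ec2_id', 'ssh_file'):
        value = str(config.get(field, ''))
        # Reject empty values and unreplaced "<...>" placeholders
        if not value or value.startswith('<'):
            raise ValueError(
                'Please fill in "{}" in {}'.format(field, path))
    return config
```

Calling load_aws_config('aws_config.json') returns the parsed dictionary, or raises a ValueError naming the first field that still needs a value.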

Run some code!

Now that we have configured everything to run on AWS, the next step is to simply perform a retrieval! Open a Python session or Jupyter notebook. To invoke the use of the AWS EC2 instance, simply use the use_aws() method before performing the retrieval. A short example is provided below.

import numpy as np
from platon.constants import R_sun, R_jup, M_jup
from exoctk.atmospheric_retrievals.aws_tools import get_config
from exoctk.atmospheric_retrievals.platon_wrapper import PlatonWrapper

params = {
    'Rs': 1.19,  # Required
    'Mp': 0.73,  # Required
    'Rp': 1.4,  # Required
    'T': 1200.0,  # Required
    'logZ': 0,  # Optional
    'CO_ratio': 0.53,  # Optional
    'log_cloudtop_P': 4,  # Optional
    'log_scatt_factor': 0,  # Optional
    'scatt_slope': 4,  # Optional
    'error_multiple': 1,  # Optional
    'T_star': 6091}  # Optional

pw = PlatonWrapper()
pw.set_parameters(params)

pw.fit_info.add_gaussian_fit_param('Mp', 0.04*M_jup)
pw.fit_info.add_uniform_fit_param('Rp', 0.9*(1.4 * R_jup), 1.1*(1.4 * R_jup))
pw.fit_info.add_uniform_fit_param('T', 300, 3000)
pw.fit_info.add_uniform_fit_param("logZ", -1, 3)
pw.fit_info.add_uniform_fit_param("log_cloudtop_P", -0.99, 5)

wavelengths = 1e-6*np.array([1.119, 1.1387])
pw.bins = [[w-0.0095e-6, w+0.0095e-6] for w in wavelengths]
pw.depths = 1e-6 * np.array([14512.7, 14546.5])
pw.errors = 1e-6 * np.array([50.6, 35.5])

ssh_file = get_config()['ssh_file']
ec2_id = get_config()['ec2_id']
pw.use_aws(ssh_file, ec2_id)

pw.retrieve('multinest')
pw.save_results()
pw.make_plot()

Output products

Executing the above code will result in a few output files:

  • YYYY-MM-DD-HH-MM.log - A log file that captures information about the execution of the code, including software environment information, EC2 start/stop information, retrieval information and results, and total computation time.
  • multinest_results.dat/emcee_results.obj - A data file containing the best fit results of the retrieval. Note that emcee results are stored as a Python object and saved to an object file.
  • <method>_corner.png - A corner plot describing the quality of the best fit results of the retrieval, where <method> is the method used (i.e. multinest or emcee)

Here is an example of what these output products may look like:

In [2]:
Image(filename='figures/corner_plot.png')
Out[2]:
In [3]:
Image(filename='figures/results.png')
Out[3]: