DRAFT: Research Object Storage (S3) - Python / S3 API Guide (coming soon)
- Introduction to the Boto3 Python library
- Prerequisites
- Setting up Jupyter Notebook
- Setting up S3 Client connection
- S3 Actions (List/Create/Upload/Download/Delete/Versioning/Access Permission/Restoring)
Introduction to the Boto3 Python library
Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python. It allows Python developers to write software that makes use of services like Amazon S3, EC2, DynamoDB, and more. Boto3 makes it easy to integrate your Python application, library, or script with AWS services. It provides an object-oriented API as well as low-level access to AWS services.
For more information, visit the Boto3 documentation page.
Prerequisites
- Request S3 access, along with an Access Key ID and Secret Key, from the UW Madison Research Drive team
- Connection to WiscVPN (Virtual Private Network)
- Anaconda Navigator installed on your local machine
- Watch Video Tutorial: For Mac users
- Watch Video Tutorial: For Windows users
- Boto3 library installed on your local machine
- Follow the instructions below to install Boto3 from a Jupyter notebook
- After creating a new notebook on your machine, use the pip command to install the library. Type the following command into a code cell and press Shift+Enter:
- !pip install boto3
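After running the install command, it can help to confirm from a notebook cell that the library is actually visible to the notebook's Python interpreter. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of Boto3.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by this Python interpreter."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("boto3"):
    print("boto3 is available in this environment.")
else:
    print("boto3 was not found; run '!pip install boto3' in a notebook cell.")
```

If the check reports that boto3 is missing right after an install, restarting the notebook kernel and running the cell again is usually worth trying first.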
Step 1: Setting Up a Jupyter Notebook
A Jupyter Notebook allows you to write and execute Python code in cells, making it easy to experiment with and run scripts step by step.
- Open Anaconda Navigator, then click Launch on Jupyter Notebook
- Create a new Python notebook from the Jupyter interface
Step 2: Setting Up S3 Client
To connect to an S3 bucket, you need to use your credentials and set up a connection with the boto3 library. Here’s how to do it in your Jupyter notebook:
- First, make sure you are connected to campus WiFi and the VPN
- In your Jupyter notebook, add the following code to initialize an S3 client
- Replace your_aws_access_key_id and your_aws_secret_access_key in the code with the Access Key ID and Secret Key you received from the UW Madison Research Drive team
import boto3
import botocore.exceptions
aws_access_key_id = "your_aws_access_key_id"
aws_secret_access_key = "your_aws_secret_access_key"
# Create the S3 client. No request is sent yet; the first call (for example, list_buckets) contacts the server.
try:
    s3_client = boto3.client(
        's3',
        endpoint_url='https://campus.s3.wisc.edu',
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key
    )
    print("S3 client created successfully!")
except Exception as e:
    print("Failed to create the S3 client.")
    print("Error:", str(e))
Press Shift + Enter or click the Run button in Jupyter Notebook to run the code cell.
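Two things are easy to miss in the setup above: the keys are typed directly into the notebook, and creating a client does not by itself send a request to the server. The sketch below shows one possible way to handle both. It assumes the keys have been exported as the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; the helper names `load_credentials` and `can_reach_s3` are hypothetical.

```python
import os


def load_credentials(env=None):
    """Read the access key pair from environment variables instead of hard-coding it."""
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        raise RuntimeError("Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first.")
    return access_key, secret_key


def can_reach_s3(s3_client):
    """Return True if a real request (list_buckets) succeeds with this client."""
    try:
        s3_client.list_buckets()
        return True
    except Exception as error:  # network, VPN, or credential problems end up here
        print("Request failed:", error)
        return False


# Example (assumes the s3_client created in Step 2):
# if can_reach_s3(s3_client):
#     print("The endpoint answered; the keys and the VPN connection look fine.")
```

Keeping the keys out of the notebook also makes it safer to share the notebook file with collaborators.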
Once you have set up Jupyter Notebook and the S3 client connection, you can perform the following actions, depending on your requirements:
- List All S3 Buckets: See a list of all the buckets and files you have access to.
- Create New S3 bucket: Create a new bucket to upload and store your files
- Upload Files to S3: Upload one or more files from your local machine to a specific S3 bucket.
- Download Files from S3: Download files from S3 to your local machine.
- Delete Files from S3: Remove files from your S3 bucket when they are no longer needed.
- Manage Access Permissions: Set or revoke public access permissions for your S3 bucket.
- Enable/Disable Versioning: Set or suspend versioning for your S3 bucket.
- Restore Objects: Restore a deleted file or folder.
Action 1: List all your buckets
If you have multiple buckets in S3, you can list all of them to identify which one you want to interact with:
def list_buckets_and_files():
    response = s3_client.list_buckets()
    print("Listing all buckets and their contents:\n")
    for bucket in response['Buckets']:
        print(f"Bucket Name: {bucket['Name']}")
        print(f"Creation Date: {bucket['CreationDate']}")
        print("Files in the bucket:")
        # List objects in the bucket
        objects = s3_client.list_objects_v2(Bucket=bucket['Name'])
        if 'Contents' in objects:
            for obj in objects['Contents']:
                print(f"  File: {obj['Key']}")
                print(f"  Last Modified: {obj['LastModified']}")
                print(f"  Size (bytes): {obj['Size']}")
        else:
            print("  No files found in this bucket.")
        print("\n" + "-" * 40 + "\n")
list_buckets_and_files()  # Prints each bucket with its files and their details
Sample Output:

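A single `list_objects_v2` call returns at most 1,000 objects, so the listing above can silently stop short on large buckets. The sketch below uses Boto3's paginator to walk every page; the helper name `iter_object_keys` is hypothetical, and the client is passed in as a parameter so the function does not depend on a global variable.

```python
def iter_object_keys(s3_client, bucket_name, prefix=""):
    """Yield every object key in a bucket, following pagination page by page."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # Pages for an empty bucket or an unmatched prefix have no 'Contents' entry
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Example (assumes the s3_client from Step 2 and an existing bucket):
# for key in iter_object_keys(s3_client, "airline-bucket"):
#     print(key)
```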
Action 2: Create New S3 bucket
To create a new bucket in your S3 account:
def create_bucket(bucket_name):
    try:
        print(f"Attempting to create bucket: {bucket_name}")
        s3_client.create_bucket(Bucket=bucket_name)
        # Confirm the new bucket appears in the bucket list
        buckets = [bucket['Name'] for bucket in s3_client.list_buckets()['Buckets']]
        if bucket_name in buckets:
            print(f"Bucket '{bucket_name}' created successfully.")
        else:
            print(f"Failed to create bucket '{bucket_name}'. Please try again.")
    except botocore.exceptions.ClientError as e:
        print(f"An error occurred: {e.response['Error']['Message']}")
create_bucket('airline-bucket')  # Replace 'airline-bucket' with your desired bucket name
Sample Output:

Note: When providing the bucket name, use only letters, numbers, and hyphens. Other special characters, including underscores, cause an error and the bucket is not created.
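The naming rule in the note can be checked locally before any request is sent. The sketch below is one possible check, not the service's own validation: the 3 to 63 character length, the lowercase-only letters, and the requirement to start and end with a letter or digit are assumptions borrowed from Amazon S3's general bucket naming rules, and the campus service may differ. The helper name `is_valid_bucket_name` is hypothetical.

```python
import re

# Assumed rule: 3-63 characters; lowercase letters, digits, and hyphens only;
# first and last character must be a letter or digit.
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if the name satisfies the assumed naming rule above."""
    return _BUCKET_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_bucket_name("airline-bucket"))  # hyphen is allowed
print(is_valid_bucket_name("airline_bucket"))  # underscore is rejected
```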
Action 3: Upload files to S3
You can upload files from your local machine to S3. Below are two methods, one for uploading a single file and one for uploading multiple files.
- Upload a Single File:
def upload_file(local_filename, bucket_name, object_key):
    try:
        print(f"Attempting to upload '{local_filename}' to bucket '{bucket_name}' as '{object_key}'...")
        s3_client.upload_file(local_filename, bucket_name, object_key)  # Upload file to specified S3 bucket
        response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=object_key)  # Check if the file was uploaded by listing objects
        if 'Contents' in response:  # Verify if the upload exists in the bucket
            print(f"File '{local_filename}' uploaded successfully as '{object_key}'.")
        else:
            print(f"Upload failed. File '{object_key}' not found in bucket '{bucket_name}'.")
    except botocore.exceptions.ClientError as e:  # Error returned by the S3 service
        print(f"An error occurred: {e.response['Error']['Message']}")
    except Exception as e:  # Other failures, such as a missing local file or a failed transfer
        print(f"An error occurred: {e}")
upload_file('/Users/rushabh/Documents/AirlineDelay/Airline-Delay.ipynb', 'airline-bucket', 'Airline-Delay.ipynb')  # Replace ('path/to/local/file.txt', 'your-bucket-name', 'file.txt') with your local machine path, your S3 bucket name, and file name
Sample Output:

- Upload Multiple Files:
import os  # Needed for os.path.basename
def upload_files(local_files, bucket_name):
    for local_file in local_files:
        try:
            file_key = os.path.basename(local_file)  # Use the file name (without folders) as the object key
            print(f"Attempting to upload '{local_file}' to bucket '{bucket_name}' as '{file_key}'...")
            s3_client.upload_file(local_file, bucket_name, file_key)  # Upload file to specified S3 bucket
            response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=file_key)  # Check if the file was uploaded by listing objects
            if 'Contents' in response:  # Verify if the upload exists in the bucket
                print(f"File '{local_file}' uploaded successfully as '{file_key}'.")
            else:
                print(f"Upload failed. File '{file_key}' not found in bucket '{bucket_name}'.")
        except botocore.exceptions.ClientError as e:  # Error returned by the S3 service
            print(f"An error occurred while uploading '{local_file}': {e.response['Error']['Message']}")
        except Exception as e:  # Other failures, such as a missing local file or a failed transfer
            print(f"An error occurred while uploading '{local_file}': {e}")
upload_files(['/Users/rushabh/Documents/AirlineDelay/January.csv', '/Users/rushabh/Documents/AirlineDelay/February.csv'], 'airline-bucket')  # Replace ('path/to/local/file.txt', 'your-bucket-name') with your local machine path, your S3 bucket name
Sample Output:

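Because the multi-file upload above uses only the file name as the object key, two files with the same name in different folders would overwrite each other in the bucket. One way around this is to build the object keys from each file's path relative to a chosen folder. The sketch below only plans the uploads and sends nothing; the helper name `plan_uploads` and its `key_prefix` parameter are hypothetical.

```python
import os


def plan_uploads(local_dir, key_prefix=""):
    """Return (local_path, object_key) pairs for every file under local_dir."""
    plan = []
    for folder, _subfolders, filenames in os.walk(local_dir):
        for filename in filenames:
            local_path = os.path.join(folder, filename)
            relative = os.path.relpath(local_path, local_dir)
            # Object keys use forward slashes regardless of the local operating system
            object_key = key_prefix + relative.replace(os.sep, "/")
            plan.append((local_path, object_key))
    return sorted(plan, key=lambda pair: pair[1])


# Example (assumes the s3_client from Step 2 and a local folder of CSV files):
# for local_path, object_key in plan_uploads("/path/to/AirlineDelay", "airline-data/"):
#     s3_client.upload_file(local_path, "airline-bucket", object_key)
```

Printing the plan before uploading gives a chance to review the keys that will be created.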
Action 4: Download Files from S3
To retrieve a file from your S3 bucket to your local machine, use this function:
import os  # Needed for os.path.exists
def download_file(bucket_name, object_key, local_filename):
    try:
        print(f"Attempting to download '{object_key}' from bucket '{bucket_name}'...")
        s3_client.download_file(bucket_name, object_key, local_filename)  # Download the file from the S3 bucket
        if os.path.exists(local_filename):  # Verify if the file was downloaded by checking local existence
            print(f"File '{object_key}' found on server and download function completed; please verify '{local_filename}' for changes.")
        else:
            print(f"Download failed. File '{local_filename}' not found.")
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred: {e.response['Error']['Message']}")
download_file('airline-bucket', 'Airline-Delay.ipynb', '/Users/rushabh/Desktop/Airline-Delay.ipynb')  # Replace ('bucket-name', 'file.txt', 'path/to/local/file.txt') with your bucket name, file name to download, and path of local machine
Sample Output:

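The existence check in the download function cannot tell a freshly downloaded file from one that was already at that path, and the download fails if the target folder does not exist. A small pre-flight helper can cover both cases. The sketch below is one possible approach; the helper name `prepare_download_path` and its `overwrite` flag are hypothetical.

```python
import os


def prepare_download_path(local_filename, overwrite=False):
    """Create the parent folder and refuse to replace an existing file unless asked."""
    if os.path.exists(local_filename) and not overwrite:
        raise FileExistsError(f"'{local_filename}' already exists.")
    parent = os.path.dirname(os.path.abspath(local_filename))
    os.makedirs(parent, exist_ok=True)
    return local_filename


# Example (assumes the s3_client from Step 2):
# target = prepare_download_path("/path/to/downloads/Airline-Delay.ipynb")
# s3_client.download_file("airline-bucket", "Airline-Delay.ipynb", target)
```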
Action 5: Delete Files from S3
You can delete files in your S3 bucket to free up space or remove outdated data. There are two methods for this: one for deleting a single file and one for deleting multiple files.
- Delete a Single File
def delete_file(bucket_name, object_key):
    try:
        response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=object_key)  # Check if the specified file exists in the bucket
        if 'Contents' not in response:  # If 'Contents' is not in the response, the file does not exist
            print(f"File '{object_key}' does not exist in bucket '{bucket_name}'.")
            return
        else:
            s3_client.delete_object(Bucket=bucket_name, Key=object_key)  # File exists; proceed to delete the file
            print(f"File '{object_key}' deleted successfully from bucket '{bucket_name}'.")
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred: {e.response['Error']['Message']}")
delete_file('airline-bucket', 'January.csv')  # Replace ('bucket-name', 'file.txt') with your bucket name, and file name to delete
Sample Output:

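Listing with `Prefix` matches every key that starts with the given text, so a check for `January.csv` also succeeds when only `January.csv.bak` exists. The sketch below compares the returned keys against the exact key instead; the helper name `key_exists` is hypothetical, and the client is passed in as a parameter.

```python
def key_exists(s3_client, bucket_name, object_key):
    """Return True only if an object with exactly this key is listed."""
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=object_key)
    return any(obj["Key"] == object_key for obj in response.get("Contents", []))


# Example (assumes the s3_client from Step 2):
# if key_exists(s3_client, "airline-bucket", "January.csv"):
#     s3_client.delete_object(Bucket="airline-bucket", Key="January.csv")
```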
- Delete Multiple Files
def delete_files(bucket_name, object_keys):
    try:
        existing_files = []  # Initialize a list to store existing files
        for key in object_keys:  # Check if each specified file exists in the bucket
            response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=key)
            if 'Contents' in response:
                existing_files.append(key)
            else:
                print(f"File '{key}' does not exist in bucket '{bucket_name}'.")
        if not existing_files:  # If no files exist for deletion, print a message and return
            print("No valid files found for deletion.")
            return
        delete_objects = [{'Key': key} for key in existing_files]  # Prepare files for deletion
        s3_client.delete_objects(Bucket=bucket_name, Delete={'Objects': delete_objects})  # Delete all existing files in a single request
        print(f"Files {existing_files} deleted successfully from bucket '{bucket_name}'.")
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred: {e.response['Error']['Message']}")
delete_files('airline-bucket', ['February.csv', 'Airline-Delay.ipynb'])  # Replace ('bucket-name', ['file1.txt', 'file2.txt']) with your bucket name, and file names to delete
Sample Output:

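The S3 API accepts at most 1,000 keys in one `delete_objects` request, and the response can report per-key failures that the function above does not inspect. The sketch below splits a long key list into batches and collects both outcomes. The helper name `delete_in_batches` is hypothetical, and reading the `Deleted` and `Errors` entries assumes the campus service returns them the way Amazon S3 does.

```python
def delete_in_batches(s3_client, bucket_name, object_keys, batch_size=1000):
    """Delete keys in chunks and return (deleted_keys, errors)."""
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    deleted, errors = [], []
    for start in range(0, len(object_keys), batch_size):
        batch = object_keys[start:start + batch_size]
        response = s3_client.delete_objects(
            Bucket=bucket_name,
            Delete={"Objects": [{"Key": key} for key in batch]},
        )
        # Collect the keys the service confirmed and any per-key failures
        deleted.extend(item["Key"] for item in response.get("Deleted", []))
        errors.extend(response.get("Errors", []))
    return deleted, errors


# Example (assumes the s3_client from Step 2):
# deleted, errors = delete_in_batches(s3_client, "airline-bucket", ["February.csv"])
# print(deleted, errors)
```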
Action 6: Managing Access Permissions
You can set or remove public access permissions (Read/Write/Full Control) for your bucket or individual files.
- First check what kind of public access (Read/Write/Full Control) your bucket currently has, then grant access accordingly. To check, use the code below:
def check_public_acl(bucket_name):
    try:
        response = s3_client.get_bucket_acl(Bucket=bucket_name)  # Retrieve the ACL for the specified bucket
        permissions = []  # Permissions granted to the public, if any
        for grant in response['Grants']:  # Loop through each grant in the ACL response
            grantee = grant['Grantee']
            # A grantee of the "All Users" group indicates public access
            if grantee.get('Type') == 'Group' and grantee.get('URI') == 'http://acs.amazonaws.com/groups/global/AllUsers':
                permissions.append(grant['Permission'])
        if permissions:
            print(f"Public access detected with the following permissions: {permissions}")
            return permissions
        else:
            print("No public access found in the bucket ACL.")
            return None
    except botocore.exceptions.ClientError as e:
        print(f"An error occurred: {e.response['Error']['Message']}")
        return None
check_public_acl('airline-bucket')  # Replace 'bucket-name' with your desired bucket name
Sample Output:

- Grant Public read access to the bucket
def grant_public_read_access(bucket_name):
    try:
        s3_client.put_bucket_acl(Bucket=bucket_name, ACL='public-read')  # Set the bucket ACL to public read
        current_permissions = check_public_acl(bucket_name) or []  # Verify if read access is granted (None means no public access)
        if 'READ' in current_permissions:
            print(f"Public read access granted to bucket '{bucket_name}'.")
        else:
            print(f"Failed to grant public read access to bucket '{bucket_name}'.")
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred: {e.response['Error']['Message']}")
grant_public_read_access('airline-bucket')  # Replace 'bucket-name' with your desired bucket name
Sample Output:

- Grant Public write access to the bucket
def grant_public_write_access(bucket_name):
    try:
        s3_client.put_bucket_acl(Bucket=bucket_name, ACL='public-read-write')  # Set the bucket ACL to public read-write
        current_permissions = check_public_acl(bucket_name) or []  # Verify if write access is granted (None means no public access)
        if 'WRITE' in current_permissions:
            print(f"Public write access granted to bucket '{bucket_name}'.")
        else:
            print(f"Failed to grant public write access to bucket '{bucket_name}'.")
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred: {e.response['Error']['Message']}")
grant_public_write_access('airline-bucket')  # Replace 'bucket-name' with your desired bucket name
Sample Output:

Note: Granting public write access or full control access is not recommended.
- Grant full control access to everyone
def grant_full_control(bucket_name):
    try:
        # Set full control access for the All Users group
        acl = {
            'Grants': [
                {
                    'Grantee': {'Type': 'Group', 'URI': 'http://acs.amazonaws.com/groups/global/AllUsers'},
                    'Permission': 'FULL_CONTROL'
                }
            ],
            'Owner': s3_client.get_bucket_acl(Bucket=bucket_name)['Owner']
        }
        s3_client.put_bucket_acl(Bucket=bucket_name, AccessControlPolicy=acl)  # Apply the ACL policy with full control
        current_permissions = check_public_acl(bucket_name) or []  # Verify if full control access is granted (None means no public access)
        if 'FULL_CONTROL' in current_permissions:
            print(f"Full control access granted to bucket '{bucket_name}'.")
        else:
            print(f"Failed to grant full control access to bucket '{bucket_name}'.")
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred: {e.response['Error']['Message']}")
grant_full_control('airline-bucket')  # Replace 'bucket-name' with your desired bucket name
Sample Output:

- Remove Public Access (Read/Write/Full Control) from the bucket
def remove_public_access(bucket_name):
    try:
        s3_client.put_bucket_acl(Bucket=bucket_name, ACL='private')  # Set the bucket ACL to private
        current_permissions = check_public_acl(bucket_name)  # Verify if public access is removed
        if not current_permissions:
            print(f"Public access removed from bucket '{bucket_name}'.")
        else:
            print(f"Failed to remove public access from bucket '{bucket_name}'.")
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred: {e.response['Error']['Message']}")
remove_public_access('airline-bucket')  # Replace 'bucket-name' with your desired bucket name
Sample Output:

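The ACL check in this section looks only at the "All Users" group. Amazon S3 also defines an "Authenticated Users" group, which covers anyone with an S3 account and is often just as broad in practice. The sketch below summarizes grants for both groups from a `get_bucket_acl` response. Whether the campus service honors the second group is an assumption, and the helper name `summarize_group_grants` is hypothetical.

```python
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS_URI = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def summarize_group_grants(acl_response):
    """Map each broad group URI found in the ACL to the permissions granted to it."""
    summary = {}
    for grant in acl_response.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        if grantee.get("Type") == "Group" and uri in (ALL_USERS_URI, AUTHENTICATED_USERS_URI):
            summary.setdefault(uri, []).append(grant["Permission"])
    return summary


# Example (assumes the s3_client from Step 2):
# print(summarize_group_grants(s3_client.get_bucket_acl(Bucket="airline-bucket")))
```

An empty dictionary means neither group appears in the bucket ACL.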
Action 7: Enable & Disable Versioning
You can enable or suspend (i.e., disable) versioning on a bucket depending on your requirements.
Learn more about versioning and backup
- First check the bucket's versioning status before enabling or disabling versioning
def check_versioning(bucket_name):
    try:
        response = s3_client.get_bucket_versioning(Bucket=bucket_name)  # Get the versioning configuration for the bucket
        if "Status" in response:  # Check if 'Status' is present in the response
            return f"Versioning status for bucket '{bucket_name}': {response['Status']}"
        else:
            print(f"Bucket '{bucket_name}' does not have versioning configured.")
            return "Unconfigured"
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred: {e.response['Error']['Message']}")
        return None
check_versioning('airline-bucket')  # Replace 'bucket-name' with your required bucket name
Sample Output:

- Enable Versioning
def enable_versioning(bucket_name):
    try:
        s3_client.put_bucket_versioning(Bucket=bucket_name, VersioningConfiguration={'Status': 'Enabled'})  # Attempt to enable versioning
        response = s3_client.get_bucket_versioning(Bucket=bucket_name)  # Verify that versioning is enabled
        if response.get("Status") == "Enabled":  # Check if versioning is enabled by checking the "Status" field
            print(f"Versioning has been enabled for bucket '{bucket_name}'.")
        else:
            print(f"Failed to enable versioning for bucket '{bucket_name}'. Versioning status is '{response.get('Status')}'.")
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred while enabling versioning: {e.response['Error']['Message']}")
enable_versioning('airline-bucket')  # Replace 'bucket-name' with your required bucket name
Sample Output:

- Disable Versioning
def disable_versioning(bucket_name):
    try:
        s3_client.put_bucket_versioning(Bucket=bucket_name, VersioningConfiguration={'Status': 'Suspended'})  # Attempt to suspend versioning
        response = s3_client.get_bucket_versioning(Bucket=bucket_name)  # Verify that versioning is suspended
        if response.get("Status") == "Suspended":  # Check if versioning is suspended by checking the "Status" field
            print(f"Versioning has been disabled for bucket '{bucket_name}'.")
        else:
            print(f"Failed to suspend versioning for bucket '{bucket_name}'. Versioning status is '{response.get('Status')}'.")
    except botocore.exceptions.ClientError as e:  # Error handling
        print(f"An error occurred while suspending versioning: {e.response['Error']['Message']}")
disable_versioning('airline-bucket')  # Replace 'bucket-name' with your desired bucket name
Sample Output:

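Once versioning is enabled, each upload to the same key creates a new version rather than replacing the old one. To see what has accumulated for a particular file, the versions can be listed and filtered by exact key. The sketch below is one possible way to do that; the helper name `versions_for_key` is hypothetical, and it reads only the first page of results.

```python
def versions_for_key(s3_client, bucket_name, object_key):
    """Return (version_id, is_latest, last_modified) tuples for one exact key."""
    response = s3_client.list_object_versions(Bucket=bucket_name, Prefix=object_key)
    return [
        (version["VersionId"], version["IsLatest"], version["LastModified"])
        for version in response.get("Versions", [])
        if version["Key"] == object_key  # skip other keys that merely share the prefix
    ]


# Example (assumes the s3_client from Step 2 and a bucket with versioning enabled):
# for version_id, is_latest, modified in versions_for_key(s3_client, "airline-bucket", "January.csv"):
#     print(version_id, is_latest, modified)
```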
Action 8: Restoring Deleted File/Folder
You can restore deleted files or folders. To restore a deleted object, versioning must have been enabled on the bucket before the object was deleted; if versioning was not enabled, the object cannot be restored.
- Restore the single file
def get_restore_deleted_object(bucket_name, object_key):
    try:
        response = s3_client.list_object_versions(Bucket=bucket_name, Prefix=object_key)  # List all object versions for the given object key
        if 'DeleteMarkers' in response:  # Check if there are delete markers in the response
            for version in response['DeleteMarkers']:
                if version['Key'] == object_key and version['IsLatest']:  # Identify the latest delete marker for this exact key
                    delete_marker_version_id = version['VersionId']
                    print(f"Delete marker found: VersionId = {delete_marker_version_id}")
                    try:
                        s3_client.delete_object(Bucket=bucket_name, Key=object_key, VersionId=delete_marker_version_id)  # Restore the object by deleting the delete marker
                        print(f"Restored: Key = {object_key}")
                    except Exception as restore_error:
                        print(f"Error restoring object {object_key}: {str(restore_error)}")
                    break  # Exit the loop after restoring the latest delete marker
            else:  # The loop finished without finding a latest delete marker
                print(f"No latest delete marker found for {object_key}")
        else:
            print("No delete markers found")
    except Exception as e:
        print(f"Error processing object {object_key}: {str(e)}")
get_restore_deleted_object('airline-bucket', 'January.csv')  # Restore a single deleted object
Sample Output:

- Restore multiple files
def get_restore_deleted_objects(bucket_name, object_keys):
    for object_key in object_keys:
        try:
            response = s3_client.list_object_versions(Bucket=bucket_name, Prefix=object_key)  # List all object versions for the given object key
            if 'DeleteMarkers' in response:
                print("----------------------------------------------------")
                print(f"Checking versions for object: {object_key}")
                for version in response['DeleteMarkers']:
                    if version['Key'] == object_key and version['IsLatest']:  # Identify the latest delete marker for this exact key
                        delete_marker_version_id = version['VersionId']
                        print(f"Found delete marker for {object_key}: VersionId={delete_marker_version_id}")
                        s3_client.delete_object(Bucket=bucket_name, Key=object_key, VersionId=delete_marker_version_id)  # Restore the object by deleting the delete marker
                        print(f"Restored: Key={object_key}, VersionId={delete_marker_version_id}")
                        break  # Exit the loop after restoring the latest delete marker
                else:  # The loop finished without finding a latest delete marker
                    print(f"No latest delete marker found for {object_key}")
            else:
                print(f"No delete markers found for object: {object_key}")
        except Exception as e:
            print(f"Error processing object {object_key}: {str(e)}")
object_keys_to_restore = ['January.csv', 'February.csv']
get_restore_deleted_objects('airline-bucket', object_keys_to_restore)  # Restore multiple deleted objects
Sample Output:

- Restore the whole folder
def get_restore_deleted_folder(bucket_name, folder_prefix):
    try:
        response = s3_client.list_object_versions(Bucket=bucket_name, Prefix=folder_prefix)  # List all object versions for objects within the specified folder
        objects_to_restore = []
        if 'DeleteMarkers' in response:  # Check if there are delete markers in the response
            for version in response['DeleteMarkers']:
                if version['IsLatest']:  # Identify the latest delete marker for each object
                    objects_to_restore.append({
                        'Key': version['Key'],
                        'VersionId': version['VersionId']
                    })
        if objects_to_restore:  # If there are objects to restore, delete their delete markers
            for obj in objects_to_restore:
                try:
                    s3_client.delete_object(Bucket=bucket_name, Key=obj['Key'], VersionId=obj['VersionId'])
                    print(f"Restored: {obj['Key']}, VersionId={obj['VersionId']}")
                except Exception as restore_error:
                    print(f"Error restoring object {obj['Key']}: {str(restore_error)}")
        else:
            print("No delete markers found for the specified folder.")
    except Exception as e:
        print(f"Error processing folder {folder_prefix}: {str(e)}")
get_restore_deleted_folder('airline-bucket', 'airline-data')  # Restore all deleted objects within a folder; the prefix matches every key that starts with it, so add a trailing slash (e.g., 'airline-data/') to match only that folder
Sample Output:

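A single `list_object_versions` call returns a limited number of entries, so a folder with many deleted files may need several pages before every delete marker is seen. The sketch below collects the latest delete markers under a prefix using Boto3's paginator. The helper name `find_latest_delete_markers` is hypothetical, and the client is passed in as a parameter.

```python
def find_latest_delete_markers(s3_client, bucket_name, prefix):
    """Return [{'Key': ..., 'VersionId': ...}] for every latest delete marker under a prefix."""
    paginator = s3_client.get_paginator("list_object_versions")
    markers = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for marker in page.get("DeleteMarkers", []):
            if marker["IsLatest"]:  # only the newest marker hides the object
                markers.append({"Key": marker["Key"], "VersionId": marker["VersionId"]})
    return markers


# Example (assumes the s3_client from Step 2):
# for marker in find_latest_delete_markers(s3_client, "airline-bucket", "airline-data/"):
#     s3_client.delete_object(Bucket="airline-bucket", Key=marker["Key"], VersionId=marker["VersionId"])
```

Printing the returned list before deleting any markers gives a chance to confirm which objects will reappear.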