Dataframe manipulation and Sharpe ratio calculation in Python using pandas

Loading data from Talking to Chatbots Dataset reddgr/talking-to-chatbots-chats

😎 I have a dataframe with 3000 rows indexed and sorted by date. I want a new dataframe (without altering the original) with the last 255 rows
🤖
python
import pandas as pd

# Assuming 'df' is your original dataframe
# Create a new dataframe with the last 255 rows
new_df = df.tail(255).copy()

# new_df is the new dataframe with the last 255 rows

😎 now I want to filter for a specific calendar year. For example, 2023
🤖
python
# Assuming the date is the index of the dataframe and is in datetime format
filtered_df = new_df[new_df.index.year == 2023]

# filtered_df is the dataframe filtered for the calendar year 2023

😎 I want to add a condition that removes all columns where there are less than 250 valid numeric non-NA values
🤖
python
# Filter columns with at least 250 valid numeric non-NA values
final_df = filtered_df.dropna(axis=1, how='any', thresh=250)

# final_df is the dataframe after applying all the conditions

😎 got this error: ” “You cannot set both the how and thresh arguments at the same time.””
🤖
python
# Correcting the approach to filter columns based on valid numeric non-NA values
final_df = filtered_df.loc[:, filtered_df.apply(lambda x: pd.to_numeric(x, errors='coerce').notnull().sum() >= 250)]

# final_df now correctly filters out columns not meeting the criteria

😎 are you sure we need that complexity?

pandas.DataFrame.dropna
DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None, inplace=False, ignore_index=False)[source]
Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters:
axis{0 or ‘index’, 1 or ‘columns’}, default 0
Determine if rows or columns which contain missing values are removed.

0, or ‘index’ : Drop rows which contain missing values.

1, or ‘columns’ : Drop columns which contain missing value.

Only a single axis is allowed.

how{‘any’, ‘all’}, default ‘any’
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

‘any’ : If any NA values are present, drop that row or column.

‘all’ : If all values are NA, drop that row or column.

threshint, optional
Require that many non-NA values. Cannot be combined with how.

subsetcolumn label or sequence of labels, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplacebool, default False
Whether to modify the DataFrame rather than creating a new one.

ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n – 1.
🤖
python
# Simplifying the approach based on your correction
final_df = filtered_df.dropna(axis=1, thresh=250)

# final_df now filters out columns with less than 250 valid numeric non-NA values without the previously mentioned complexity

😎 I have an annual risk-free rate. In order to include it in my shape ratio calculation, which includes average daily returns, I need to convert the risk-free rate from annual to daily:

sharpe_ratio = (rent_diaria_promedio – rfr_diario ) / std_dev
🤖
python
# Assuming 'rent_diaria_promedio' is the average daily return, 'std_dev' is the standard deviation of daily returns
# and 'annual_rfr' is the annual risk-free rate

# Convert annual risk-free rate to daily risk-free rate
# Assuming 252 trading days in a year
rfr_diario = (1 + annual_rfr) ** (1/252) - 1

# Calculate Sharpe Ratio
sharpe_ratio = (rent_diaria_promedio - rfr_diario) / std_dev

# sharpe_ratio is the calculated Sharpe Ratio


Hugging Face Dataset Metrics

All the conversation prompts, responses, and metrics are available to download and explore on Hugging Face dataset reddgr/talking-to-chatbots-chats: