Useful tool for inserting DataFrames into the Excel sheet.

Overview

PyCellFrame

Insert Pandas DataFrames into the Excel sheet with a bunch of conditions

Install

pip install pycellframe

Usage

Examples

Let's suppose that we have an Excel file named "numbers.xlsx" with the sheet named "Dictionary" in which we would like to insert the pandas.DataFrame.

Import pandas and create an example DataFrame (which will be inserted into the Excel sheet):

import pandas as pd


ex = {
    'Num': [1, 2, 3, 4],
    'AfterFirstBlankCol': 'AfterFirstBlank',
    'Descr': ['One', 'Two', 'Three', 'Four'],
    'AfterSecondBlankCol': 'AfterSecondBlank.',
    'Squared': [1, 4, 9, 16],
    'Binary:': ['1', '10', '11', '100']
}

df = pd.DataFrame(ex)
  • Import openpyxl.load_workbook and open numbers.xlsx - Our Excel workbook;
  • Get - Dictionary our desired sheet:
from openpyxl import load_workbook


workbook = load_workbook('numbers.xlsx')
worksheet = workbook['Dictionary']

Functions

1. incell_style(cell_src, cell_dst)
  • Let's say, we have a cell in Excel Dictionary sheet that we would like to copy the style from, and it is O3;
  • Let O4 be our destination cell:

NOTE: If we wanted to copy that style to more than one cell, we would simply use the loop depending on the locations of the destination cells.

from pycellframe import incell_style


incell_style(cell_src=worksheet['O3'], cell_dst=worksheet['O4'])
2. sheet_to_sheet(filename_sheetname_src, filename_sheetname_dst, calculated)
  • Let's say that we have two Excel files, and we need specific sheet from one file to be completely copied to another file's specific sheet;
  • filename_sheetname_src is the parameter for one file -> sheet the data to be copied from (tuple(['FILENAME_SRC', 'SHEETNAME_SRC']));
  • worksheet_dst is the parameter for the destination Worksheet the data to be copied to (openpyxl.worksheet.worksheet.Worksheet);
  • Let's assume that we have file_src.xlsx as src file and for worksheet_src we can use its CopyThisSheet sheet.
  • We can use output.xlsx -> CopyToThisSheet sheet as the destination worksheet, for which we already declared the Workbook object above.

NOTE: We are assuming that we need all the formulas (where available) from the source sheet, not calculated data, so we set calculated parameter to False

from pycellframe import sheet_to_sheet


worksheet_to = workbook['CopyToThisSheet']

sheet_to_sheet(filename_sheetname_src=('file_src.xlsx', 'CopyThisSheet'),
               worksheet_dst=worksheet_to,
               calculated=False)
3. incell_frame(worksheet, dataframe, col_range, row_range, num_str_cols, skip_cols, headers)
  • From our package pycellframe import function incell_frame;
  • Insert ex - DataFrame into our sheet twice - with and without conditions:
from pycellframe import incell_frame


# 1 - Simple insertion
incell_frame(worksheet=worksheet, dataframe=df)

# 2 - Insertion with some conditions
incell_frame(worksheet=worksheet,
             dataframe=df,
             col_range=(3, 0),
             row_range=(6, 8),
             num_str_cols=['I'],
             skip_cols=['D', 'F'],
             headers=True)

In the first insertion, we did not give our function any arguments, which means the DataFrame ex will be inserted into the Dictionary sheet in the area A1:F4 (without the headers).

However, with the second insertion we define some conditions:

  • col_range=(3, 0) - This means that insertion will be started at the Excel column with the index 3 (column C) and will not be stopped until the very end, since we gave 0 as the second element of the tuple

  • row_range=(6, 8) - Only in between these rows (in Excel) will the DataFrame data be inserted, which means that only the first row (since the headers is set to True) from ex will be inserted into the sheet

  • num_str_cols=['F'] - Another condition here is to not convert Binary column values to int. If we count, this column will be inserted in the Excel column F, so we tell the function to leave the values in it as string

  • skip_cols=['D', 'F'] - D and F columns in Excel will be skipped and since our worksheet was blank in the beginning, these columns will be blank (that is why I named the columns in the DataFrame related names)

  • headers=True - This time, the DataFrame columns will be inserted, too, so the overall insertion area would be C6:J8

For really detailed description of the parameters, please see:
  1. incell_frame.__docs__
  2. sheet_to_sheet.__docs__
  3. incell_style.__docs__
  • Finally, let's save our changes to a new Excel file:
workbook.save('output.xlsx')

Full Code

import pandas as pd
from openpyxl import load_workbook
from pycellframe import incell_style, \
                        incell_frame, \
                        sheet_to_sheet


ex = {
    'Num': [1, 2, 3, 4],
    'AfterFirstBlankCol': 'AfterFirstBlank',
    'Descr': ['One', 'Two', 'Three', 'Four'],
    'AfterSecondBlankCol': 'AfterSecondBlank.',
    'Squared': [1, 4, 9, 16],
    'Binary:': ['1', '10', '11', '100']
}

df = pd.DataFrame(ex)

workbook = load_workbook('numbers.xlsx')
worksheet = workbook['Dictionary']


# Copy the cell style
incell_style(cell_src=worksheet['O3'], cell_dst=worksheet['O4'])


# Copy the entire sheet
worksheet_to = workbook['CopyToThisSheet']

sheet_to_sheet(filename_sheetname_src=('file_src.xlsx', 'CopyThisSheet'),
               worksheet_dst=worksheet_to,
               calculated=False)


# Insert DataFrame into the sheet

## 1 - Simple insertion
incell_frame(worksheet=worksheet, dataframe=df)

## 2 - Insertion with some conditions
incell_frame(worksheet=worksheet,
             dataframe=df,
             col_range=(3, 0),
             row_range=(6, 8),
             num_str_cols=['I'],
             skip_cols=['D', 'F'],
             headers=True)

workbook.save('output.xlsx')
Owner
Luka Sosiashvili
Luka Sosiashvili
PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2

Emmanuel Boateng Sifah 1 Jan 19, 2022
A meta plugin for processing timelapse data timepoint by timepoint in napari

napari-time-slicer A meta plugin for processing timelapse data timepoint by timepoint. It enables a list of napari plugins to process 2D+t or 3D+t dat

Robert Haase 2 Oct 13, 2022
Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences

Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences. Copula and functional Principle Component Analysis (fPCA) are st

32 Dec 20, 2022
The Master's in Data Science Program run by the Faculty of Mathematics and Information Science

The Master's in Data Science Program run by the Faculty of Mathematics and Information Science is among the first European programs in Data Science and is fully focused on data engineering and data a

Amir Ali 2 Jun 17, 2022
LynxKite: a complete graph data science platform for very large graphs and other datasets.

LynxKite is a complete graph data science platform for very large graphs and other datasets. It seamlessly combines the benefits of a friendly graphical interface and a powerful Python API.

124 Dec 14, 2022
Udacity-api-reporting-pipeline - Udacity api reporting pipeline

udacity-api-reporting-pipeline In this exercise, you'll use portions of each of

Fabio Barbazza 1 Feb 15, 2022
Catalogue data - A Python Scripts to prepare catalogue data

catalogue_data Scripts to prepare catalogue data. Setup Clone this repo. Install

BigScience Workshop 3 Mar 03, 2022
A Python package for the mathematical modeling of infectious diseases via compartmental models

A Python package for the mathematical modeling of infectious diseases via compartmental models. Originally designed for epidemiologists, epispot can be adapted for almost any type of modeling scenari

epispot 12 Dec 28, 2022
Open source platform for Data Science Management automation

Hydrosphere examples This repo contains demo scenarios and pre-trained models to show Hydrosphere capabilities. Data and artifacts management Some mod

hydrosphere.io 6 Aug 10, 2021
Working Time Statistics of working hours and working conditions by industry and company

Working Time Statistics of working hours and working conditions by industry and company

Feng Ruohang 88 Nov 04, 2022
Pipetools enables function composition similar to using Unix pipes.

Pipetools Complete documentation pipetools enables function composition similar to using Unix pipes. It allows forward-composition and piping of arbit

186 Dec 29, 2022
Statistical package in Python based on Pandas

Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. F

Raphael Vallat 1.2k Dec 31, 2022
Stock Analysis dashboard Using Streamlit and Python

StDashApp Stock Analysis Dashboard Using Streamlit and Python If you found the content useful and want to support my work, you can buy me a coffee! Th

StreamAlpha 27 Dec 09, 2022
Python utility to extract differences between two pandas dataframes.

Python utility to extract differences between two pandas dataframes.

Jaime Valero 8 Jan 07, 2023
Pipeline to convert a haploid assembly into diploid

HapDup (haplotype duplicator) is a pipeline to convert a haploid long read assembly into a dual diploid assembly. The reconstructed haplotypes

Mikhail Kolmogorov 50 Jan 05, 2023
BasstatPL is a package for performing different tabulations and calculations for descriptive statistics.

BasstatPL is a package for performing different tabulations and calculations for descriptive statistics. It provides: Frequency table constr

Angel Chavez 1 Oct 31, 2021
Automated Exploration Data Analysis on a financial dataset

Automated EDA on financial dataset Just a simple way to get automated Exploration Data Analysis from financial dataset (OHLCV) using Streamlit and ta.

Darío López Padial 28 Nov 27, 2022
t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology.

tree-SNE t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology. Building on recent advances in s

Isaac Robinson 61 Nov 21, 2022
Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods

Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods Introduction Graph Neural Networks (GNNs) have demonstrated

37 Dec 15, 2022
Collections of pydantic models

pydantic-collections The pydantic-collections package provides BaseCollectionModel class that allows you to manipulate collections of pydantic models

Roman Snegirev 20 Dec 26, 2022