Useful tool for inserting DataFrames into the Excel sheet.

Overview

PyCellFrame

Insert Pandas DataFrames into the Excel sheet with a bunch of conditions

Install

pip install pycellframe

Usage

Examples

Let's suppose that we have an Excel file named "numbers.xlsx" with the sheet named "Dictionary" in which we would like to insert the pandas.DataFrame.

Import pandas and create an example DataFrame (which will be inserted into the Excel sheet):

import pandas as pd


ex = {
    'Num': [1, 2, 3, 4],
    'AfterFirstBlankCol': 'AfterFirstBlank',
    'Descr': ['One', 'Two', 'Three', 'Four'],
    'AfterSecondBlankCol': 'AfterSecondBlank.',
    'Squared': [1, 4, 9, 16],
    'Binary:': ['1', '10', '11', '100']
}

df = pd.DataFrame(ex)
  • Import openpyxl.load_workbook and open numbers.xlsx - Our Excel workbook;
  • Get - Dictionary our desired sheet:
from openpyxl import load_workbook


workbook = load_workbook('numbers.xlsx')
worksheet = workbook['Dictionary']

Functions

1. incell_style(cell_src, cell_dst)
  • Let's say, we have a cell in Excel Dictionary sheet that we would like to copy the style from, and it is O3;
  • Let O4 be our destination cell:

NOTE: If we wanted to copy that style to more than one cell, we would simply use the loop depending on the locations of the destination cells.

from pycellframe import incell_style


incell_style(cell_src=worksheet['O3'], cell_dst=worksheet['O4'])
2. sheet_to_sheet(filename_sheetname_src, filename_sheetname_dst, calculated)
  • Let's say that we have two Excel files, and we need specific sheet from one file to be completely copied to another file's specific sheet;
  • filename_sheetname_src is the parameter for one file -> sheet the data to be copied from (tuple(['FILENAME_SRC', 'SHEETNAME_SRC']));
  • worksheet_dst is the parameter for the destination Worksheet the data to be copied to (openpyxl.worksheet.worksheet.Worksheet);
  • Let's assume that we have file_src.xlsx as src file and for worksheet_src we can use its CopyThisSheet sheet.
  • We can use output.xlsx -> CopyToThisSheet sheet as the destination worksheet, for which we already declared the Workbook object above.

NOTE: We are assuming that we need all the formulas (where available) from the source sheet, not calculated data, so we set calculated parameter to False

from pycellframe import sheet_to_sheet


worksheet_to = workbook['CopyToThisSheet']

sheet_to_sheet(filename_sheetname_src=('file_src.xlsx', 'CopyThisSheet'),
               worksheet_dst=worksheet_to,
               calculated=False)
3. incell_frame(worksheet, dataframe, col_range, row_range, num_str_cols, skip_cols, headers)
  • From our package pycellframe import function incell_frame;
  • Insert ex - DataFrame into our sheet twice - with and without conditions:
from pycellframe import incell_frame


# 1 - Simple insertion
incell_frame(worksheet=worksheet, dataframe=df)

# 2 - Insertion with some conditions
incell_frame(worksheet=worksheet,
             dataframe=df,
             col_range=(3, 0),
             row_range=(6, 8),
             num_str_cols=['I'],
             skip_cols=['D', 'F'],
             headers=True)

In the first insertion, we did not give our function any arguments, which means the DataFrame ex will be inserted into the Dictionary sheet in the area A1:F4 (without the headers).

However, with the second insertion we define some conditions:

  • col_range=(3, 0) - This means that insertion will be started at the Excel column with the index 3 (column C) and will not be stopped until the very end, since we gave 0 as the second element of the tuple

  • row_range=(6, 8) - Only in between these rows (in Excel) will the DataFrame data be inserted, which means that only the first row (since the headers is set to True) from ex will be inserted into the sheet

  • num_str_cols=['F'] - Another condition here is to not convert Binary column values to int. If we count, this column will be inserted in the Excel column F, so we tell the function to leave the values in it as string

  • skip_cols=['D', 'F'] - D and F columns in Excel will be skipped and since our worksheet was blank in the beginning, these columns will be blank (that is why I named the columns in the DataFrame related names)

  • headers=True - This time, the DataFrame columns will be inserted, too, so the overall insertion area would be C6:J8

For really detailed description of the parameters, please see:
  1. incell_frame.__docs__
  2. sheet_to_sheet.__docs__
  3. incell_style.__docs__
  • Finally, let's save our changes to a new Excel file:
workbook.save('output.xlsx')

Full Code

import pandas as pd
from openpyxl import load_workbook
from pycellframe import incell_style, \
                        incell_frame, \
                        sheet_to_sheet


ex = {
    'Num': [1, 2, 3, 4],
    'AfterFirstBlankCol': 'AfterFirstBlank',
    'Descr': ['One', 'Two', 'Three', 'Four'],
    'AfterSecondBlankCol': 'AfterSecondBlank.',
    'Squared': [1, 4, 9, 16],
    'Binary:': ['1', '10', '11', '100']
}

df = pd.DataFrame(ex)

workbook = load_workbook('numbers.xlsx')
worksheet = workbook['Dictionary']


# Copy the cell style
incell_style(cell_src=worksheet['O3'], cell_dst=worksheet['O4'])


# Copy the entire sheet
worksheet_to = workbook['CopyToThisSheet']

sheet_to_sheet(filename_sheetname_src=('file_src.xlsx', 'CopyThisSheet'),
               worksheet_dst=worksheet_to,
               calculated=False)


# Insert DataFrame into the sheet

## 1 - Simple insertion
incell_frame(worksheet=worksheet, dataframe=df)

## 2 - Insertion with some conditions
incell_frame(worksheet=worksheet,
             dataframe=df,
             col_range=(3, 0),
             row_range=(6, 8),
             num_str_cols=['I'],
             skip_cols=['D', 'F'],
             headers=True)

workbook.save('output.xlsx')
Owner
Luka Sosiashvili
Luka Sosiashvili
A computer algebra system written in pure Python

SymPy See the AUTHORS file for the list of authors. And many more people helped on the SymPy mailing list, reported bugs, helped organize SymPy's part

SymPy 9.9k Dec 31, 2022
The Spark Challenge Student Check-In/Out Tracking Script

The Spark Challenge Student Check-In/Out Tracking Script This Python Script uses the Student ID Database to match the entries with the ID Card Swipe a

1 Dec 09, 2021
Python reader for Linked Data in HDF5 files

Linked Data are becoming more popular for user-created metadata in HDF5 files.

The HDF Group 8 May 17, 2022
A CLI tool to reduce the friction between data scientists by reducing git conflicts removing notebook metadata and gracefully resolving git conflicts.

databooks is a package for reducing the friction data scientists while using Jupyter notebooks, by reducing the number of git conflicts between different notebooks and assisting in the resolution of

dataroots 86 Dec 25, 2022
PipeChain is a utility library for creating functional pipelines.

PipeChain Motivation PipeChain is a utility library for creating functional pipelines. Let's start with a motivating example. We have a list of Austra

Michael Milton 2 Aug 07, 2022
Numerical Analysis toolkit centred around PDEs, for demonstration and understanding purposes not production

Numerics Numerical Analysis toolkit centred around PDEs, for demonstration and understanding purposes not production Use procedure: Initialise a new i

George Whittle 1 Nov 13, 2021
Synthetic Data Generation for tabular, relational and time series data.

An Open Source Project from the Data to AI Lab, at MIT Website: https://sdv.dev Documentation: https://sdv.dev/SDV User Guides Developer Guides Github

The Synthetic Data Vault Project 1.2k Jan 07, 2023
A simple and efficient tool to parallelize Pandas operations on all available CPUs

Pandaral·lel Without parallelization With parallelization Installation $ pip install pandarallel [--upgrade] [--user] Requirements On Windows, Pandara

Manu NALEPA 2.8k Dec 31, 2022
Data Intelligence Applications - Online Product Advertising and Pricing with Context Generation

Data Intelligence Applications - Online Product Advertising and Pricing with Context Generation Overview Consider the scenario in which advertisement

Manuel Bressan 2 Nov 18, 2021
Exploring the Top ML and DL GitHub Repositories

This repository contains my work related to my project where I scraped data on the most popular machine learning and deep learning GitHub repositories in order to further visualize and analyze it.

Nico Van den Hooff 17 Aug 21, 2022
Improving your data science workflows with

Make Better Defaults Author: Kjell Wooding [email protected] This is the git re

Kjell Wooding 18 Dec 23, 2022
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana

Amazon Web Services - Labs 3.3k Jan 04, 2023
A Big Data ETL project in PySpark on the historical NYC Taxi Rides data

Processing NYC Taxi Data using PySpark ETL pipeline Description This is an project to extract, transform, and load large amount of data from NYC Taxi

Unnikrishnan 2 Dec 12, 2021
Bigdata Simulation Library Of Dream By Sandman Books

BIGDATA SIMULATION LIBRARY OF DREAM BY SANDMAN BOOKS ================= Solution Architecture Description In the realm of Dreaming, its ruler SANDMAN,

Maycon Cypriano 3 Jun 30, 2022
CPSPEC is an astrophysical data reduction software for timing

CPSPEC manual Introduction CPSPEC is an astrophysical data reduction software for timing. Various timing properties, such as power spectra and cross s

Tenyo Kawamura 1 Oct 20, 2021
songplays datamart provide details about the musical taste of our customers and can help us to improve our recomendation system

Songplays User activity datamart The following document describes the model used to build the songplays datamart table and the respective ETL process.

Leandro Kellermann de Oliveira 1 Jul 13, 2021
A fast, flexible, and performant feature selection package for python.

linselect A fast, flexible, and performant feature selection package for python. Package in a nutshell It's built on stepwise linear regression When p

88 Dec 06, 2022
Conduits - A Declarative Pipelining Tool For Pandas

Conduits - A Declarative Pipelining Tool For Pandas Traditional tools for declaring pipelines in Python suck. They are mostly imperative, and can some

Kale Miller 7 Nov 21, 2021
MapReader: A computer vision pipeline for the semantic exploration of maps at scale

MapReader A computer vision pipeline for the semantic exploration of maps at scale MapReader is an end-to-end computer vision (CV) pipeline designed b

Living with Machines 25 Dec 26, 2022
DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages

aCe - Data-Centric Parallel Programming Decoupling domain science from performance optimization. DaCe is a parallel programming framework that takes c

SPCL 330 Dec 30, 2022