Solution

The document contains Python code for importing and analyzing clinical trial data from two Excel files, EDC and CCC. It includes functions to check for discrepancies between the two datasets, count records, and identify missing records. The results indicate discrepancies in adverse event reporting and differences in record counts between the two files.

Uploaded by

patilrushikes714


Importing files and libraries

import pandas as pd
EDC = 'EDC_TC.xls.xlsx'
CCC = 'CCC_TC.xlsx'

edc_df = pd.read_excel(EDC)
ccc_df = pd.read_excel(CCC)
print(edc_df.head(10))
print(ccc_df.head(10))

  Batch No  Batch ID  Dispense Unit Number ID Dispense Unit Number ID2  \
0  ABC/123       NaN                   987654                 DUN_ID_2
1  ABC/123       NaN                   987670                 DUN_ID_2
2  ABC/123       NaN                   987677                 DUN_ID_2
3  ABC/123       NaN                   981655                 DUN_ID_2
4  ABC/123       NaN                   978234                 DUN_ID_2
5  ABC/123       NaN                   978235                 DUN_ID_2
6  ABC/123       NaN                   978236                 DUN_ID_2

  Onset date Onset date 1 AE related
0 2024-01-01      01jan24         No
1 2024-01-01      01jan24        YES
2 2024-01-10      01jan24        YES
3 2024-07-15      07jan24         No
4 2024-01-01      01jan24         No
5 2024-02-01      01feb24         No
6 2024-03-01      01mar24         No

  Trial/Study Type Trial/Study Number  Center/Site  Subject/Patient ID  \
0   Clinical Trial             AB1234          123               12345
1   Clinical Trial             AB1234          120               12047
2   Clinical Trial             AB1234          120               12047
3   Clinical Trial             AB1234          110               11054
4   Clinical Trial             AB1234          110               11051
5   Clinical Trial             AB1234          110               11051
6   Clinical Trial             AB1234          110               11051
7   Clinical Trial             AB1234          119               11958
8   Clinical Trial             AB1234          119               11958

   Technical Complaint No. Complaint No. AE related  DUN Number  \
0                        1      CC-12345         No      987654
1                        1      CC-12041         No      987670
2                        2      CC-12042         No      987677
3                        1      CC-12043         No      978233
4                        1      CC-12145         No      978234
5                        2      CC-12565         No      978235
6                        3      CC-21345         No      978236
7                        2      CC-14567        Yes      985648
8                        1      CC-14568         No      985649

  TC led to an SAE
0               No
1               No
2              YES
3               No
4               No
5               No
6               No
7              Yes
8               No

edc_df = pd.read_excel(EDC, usecols=['Subject', 'Seq No', 'Dispense Unit Number ID', 'AE related'])
ccc_df = pd.read_excel(CCC, usecols=['Subject/Patient ID', 'Technical Complaint No.', 'DUN Number', 'AE related'])
print(edc_df.head(10))
print(ccc_df.head(10))
print(edc_df.shape)
print(ccc_df.shape)

   Subject  Seq No  Dispense Unit Number ID AE related
0    12345       1                   987654         No
1    12047       1                   987670        YES
2    12047       3                   987677        YES
3    12871       1                   981655         No
4    11051       1                   978234         No
5    11051       2                   978235         No
6    11051       3                   978236         No
   Subject/Patient ID  Technical Complaint No. AE related  DUN Number
0               12345                        1         No      987654
1               12047                        1         No      987670
2               12047                        2         No      987677
3               11054                        1         No      978233
4               11051                        1         No      978234
5               11051                        2         No      978235
6               11051                        3         No      978236
7               11958                        2        Yes      985648
8               11958                        1         No      985649
(7, 4)
(9, 4)

Start the indexing from 1

edc_df.index += 1
ccc_df.index += 1

print(edc_df)
print(ccc_df)

   Subject  Seq No  Dispense Unit Number ID AE related
1    12345       1                   987654         No
2    12047       1                   987670        YES
3    12047       3                   987677        YES
4    12871       1                   981655         No
5    11051       1                   978234         No
6    11051       2                   978235         No
7    11051       3                   978236         No
   Subject/Patient ID  Technical Complaint No. AE related  DUN Number
1               12345                        1         No      987654
2               12047                        1         No      987670
3               12047                        2         No      987677
4               11054                        1         No      978233
5               11051                        1         No      978234
6               11051                        2         No      978235
7               11051                        3         No      978236
8               11958                        2        Yes      985648
9               11958                        1         No      985649

Problem No. 1:

Check whether all the records in the two files match each other; if they do not, report the discrepancies.

def check_records_match(edc, ccc):
    """Compare EDC and CCC row by row on (subject, dispense unit number)."""
    discrepancies = []

    # Pass 1: for each EDC row, look for a CCC row with the same key.
    for i, edc_row in edc.iterrows():
        matched = False
        for j, ccc_row in ccc.iterrows():
            if edc_row['Subject'] == ccc_row['Subject/Patient ID'] and \
                    edc_row['Dispense Unit Number ID'] == ccc_row['DUN Number']:
                matched = True
                # Compare the AE flag case-insensitively ('YES' vs 'Yes').
                if edc_row['AE related'].lower() != ccc_row['AE related'].lower():
                    discrepancies.append((i, 'AE related', edc_row['AE related'], ccc_row['AE related']))
        if not matched:
            discrepancies.append((i, 'Record not found', 'EDC record not found in CCC'))

    # Pass 2: for each CCC row, check that some EDC row has the same key.
    for j, ccc_row in ccc.iterrows():
        matched = False
        for i, edc_row in edc.iterrows():
            if edc_row['Subject'] == ccc_row['Subject/Patient ID'] and \
                    edc_row['Dispense Unit Number ID'] == ccc_row['DUN Number']:
                matched = True
        if not matched:
            discrepancies.append((j, 'Record not found', 'CCC record not found in EDC'))

    return discrepancies

# execution
discrepancies = check_records_match(edc_df, ccc_df)
print("Discrepancies:")
for discrepancy in discrepancies:
    print(discrepancy)

Discrepancies:
(2, 'AE related', 'YES', 'No')
(3, 'AE related', 'YES', 'No')
(4, 'Record not found', 'EDC record not found in CCC')
(4, 'Record not found', 'CCC record not found in EDC')
(8, 'Record not found', 'CCC record not found in EDC')
(9, 'Record not found', 'CCC record not found in EDC')
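
The nested loops compare every EDC row against every CCC row. As a hedged alternative, and assuming the same two key columns identify a record, the AE mismatch check could be sketched with an inner pandas merge. The frames below are typed in from the printed output rather than read from the Excel files, and the names `edc`, `ccc`, `merged` and `mismatch` are illustrative only.

```python
import pandas as pd

# Sample data copied from the printed output (not read from the Excel files).
edc = pd.DataFrame({
    "Subject": [12345, 12047, 12047, 12871, 11051, 11051, 11051],
    "Dispense Unit Number ID": [987654, 987670, 987677, 981655, 978234, 978235, 978236],
    "AE related": ["No", "YES", "YES", "No", "No", "No", "No"],
})
ccc = pd.DataFrame({
    "Subject/Patient ID": [12345, 12047, 12047, 11054, 11051, 11051, 11051, 11958, 11958],
    "DUN Number": [987654, 987670, 987677, 978233, 978234, 978235, 978236, 985648, 985649],
    "AE related": ["No", "No", "No", "No", "No", "No", "No", "Yes", "No"],
})

# Inner merge keeps only the records present in both tables.
merged = edc.merge(
    ccc,
    left_on=["Subject", "Dispense Unit Number ID"],
    right_on=["Subject/Patient ID", "DUN Number"],
    how="inner",
    suffixes=("_edc", "_ccc"),
)

# Case-insensitive comparison of the AE flag, as in the loop version.
mismatch = merged[
    merged["AE related_edc"].str.lower() != merged["AE related_ccc"].str.lower()
]
print(mismatch[["Subject", "Dispense Unit Number ID", "AE related_edc", "AE related_ccc"]])
```

This sketch covers only the 'AE related' comparison; records that exist in one table but not the other are dropped by the inner merge and still need a separate check.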

Problem No. 2:

Check whether the total number of records in the EDC file is equal to the number in the CCC file.

def check_record_counts(edc, ccc):
    # Compare the number of rows in the two tables
    len_edc = len(edc)
    len_ccc = len(ccc)
    if len_edc != len_ccc:
        return "Not Equal\n" f"Number of records in EDC: {len_edc}, Number of records in CCC: {len_ccc}"
    return "The number of records in both tables is equal."

# execution
record_count_check = check_record_counts(edc_df, ccc_df)
print("\nRecord Count Check:")
print(record_count_check)

Record Count Check:

Not Equal
Number of records in EDC: 7, Number of records in CCC: 9
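
The totals show that the files differ but not where. As a rough sketch, assuming it is useful to break the count down per subject, the subject IDs from the printed output can be tallied with `value_counts`. The names `edc_subjects`, `ccc_subjects`, `counts` and `differing` are illustrative and not part of the original solution.

```python
import pandas as pd

# Subject IDs copied from the printed output (not read from the Excel files).
edc_subjects = pd.Series([12345, 12047, 12047, 12871, 11051, 11051, 11051])
ccc_subjects = pd.Series([12345, 12047, 12047, 11054, 11051, 11051, 11051, 11958, 11958])

# Count records per subject in each source and line the counts up side by side.
counts = pd.concat(
    [edc_subjects.value_counts().rename("EDC"),
     ccc_subjects.value_counts().rename("CCC")],
    axis=1,
).fillna(0).astype(int)

# Positive difference: more CCC records; negative: more EDC records.
counts["difference"] = counts["CCC"] - counts["EDC"]
differing = counts[counts["difference"] != 0].sort_index()
print(differing)
```

The per-subject differences sum to the overall gap between the two record counts.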
Problem No. 3:

Check whether any record is missing from either table, that is, present in the EDC file but not in the CCC file, or vice versa.

def check_missing_records(edc, ccc):
    # Create dictionaries to map (Subject, Dispense Unit Number ID)
    # and (Subject/Patient ID, DUN Number) to their row indices
    edc_index = {(row['Subject'], row['Dispense Unit Number ID']): idx
                 for idx, row in edc.iterrows()}
    ccc_index = {(row['Subject/Patient ID'], row['DUN Number']): idx
                 for idx, row in ccc.iterrows()}

    # Convert the keys of these dictionaries to sets for comparison
    edc_ids = set(edc_index.keys())
    ccc_ids = set(ccc_index.keys())

    # Calculate missing records
    missing_from_ccc = edc_ids - ccc_ids
    missing_from_edc = ccc_ids - edc_ids

    # Prepare results with row numbers; sort by row number because
    # set iteration order is arbitrary
    missing_from_ccc_with_indices = sorted(
        ((record, edc_index[record]) for record in missing_from_ccc),
        key=lambda item: item[1])
    missing_from_edc_with_indices = sorted(
        ((record, ccc_index[record]) for record in missing_from_edc),
        key=lambda item: item[1])

    return missing_from_edc_with_indices, missing_from_ccc_with_indices

# Execution
missing_from_edc, missing_from_ccc = check_missing_records(edc_df, ccc_df)

# A record missing from EDC exists only in CCC, so its row number refers to CCC
print("\nMissing Records from EDC (Row Number in CCC):")
for missing in missing_from_edc:
    record, row_num = missing
    print(f"Record: {record}, Row Number: {row_num}")

# A record missing from CCC exists only in EDC, so its row number refers to EDC
print("\nMissing Records from CCC (Row Number in EDC):")
for missing in missing_from_ccc:
    record, row_num = missing
    print(f"Record: {record}, Row Number: {row_num}")

Missing Records from EDC (Row Number in CCC):
Record: (11054, 978233), Row Number: 4
Record: (11958, 985648), Row Number: 8
Record: (11958, 985649), Row Number: 9

Missing Records from CCC (Row Number in EDC):
Record: (12871, 981655), Row Number: 4
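
The same set difference could also be sketched with an outer pandas merge and its `indicator` option, which labels each key as present in the left table only, the right table only, or both. This is an illustration under the assumption that the key columns are first renamed to the common names `subject` and `dun` (names chosen here for brevity, not taken from the files); the key values are copied from the printed output.

```python
import pandas as pd

# Key columns copied from the printed output, renamed to shared column names.
edc_keys = pd.DataFrame({
    "subject": [12345, 12047, 12047, 12871, 11051, 11051, 11051],
    "dun": [987654, 987670, 987677, 981655, 978234, 978235, 978236],
})
ccc_keys = pd.DataFrame({
    "subject": [12345, 12047, 12047, 11054, 11051, 11051, 11051, 11958, 11958],
    "dun": [987654, 987670, 987677, 978233, 978234, 978235, 978236, 985648, 985649],
})

# indicator=True adds a "_merge" column: left_only, right_only or both.
both = edc_keys.merge(ccc_keys, on=["subject", "dun"], how="outer", indicator=True)

only_in_edc = both[both["_merge"] == "left_only"]    # missing from CCC
only_in_ccc = both[both["_merge"] == "right_only"]   # missing from EDC

print("Missing from CCC:")
print(only_in_edc[["subject", "dun"]])
print("Missing from EDC:")
print(only_in_ccc[["subject", "dun"]])
```

Unlike the dictionary version, this sketch does not report the original row numbers; those would need to be carried along as an extra column before merging.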
