# Working with External Libraries
External libraries are like power tools in a workshop. Python comes with basic tools (built-in modules), but for specialized tasks, you bring in power tools like a drill (Pandas for data), a saw (Matplotlib for charts), or a sander (NumPy for math).
One of Python's greatest strengths is its vast ecosystem of **external libraries** – pre‑written code that you can install and use to solve complex problems without reinventing the wheel. While Python's standard library provides many useful modules (like `math`, `random`, `datetime`), external libraries extend Python into domains like data science, web development, machine learning, image processing, and more.
## Installing External Libraries with `pip`
`pip` is Python's package installer. It downloads and installs libraries from the Python Package Index (PyPI).
**Basic commands:**
```bash
pip install package_name # install latest version
pip install package_name==1.0.0 # install specific version
pip install -r requirements.txt # install from a requirements file
pip uninstall package_name # remove a package
pip list # list installed packages
pip show package_name # show package details
```
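An installed package's version can also be checked from inside Python, which mirrors what `pip show` reports. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is an assumption made for illustration, and `requests` is only an example package name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "requests" is only an example; any distribution name from PyPI works here.
for name in ["pip", "requests"]:
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```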
## Virtual Environments
A **virtual environment** is an isolated Python environment that allows you to manage dependencies for different projects separately. This prevents version conflicts (e.g., Project A needs Django 2.0, Project B needs Django 3.0).
```bash
python -m venv myenv # create virtual environment
source myenv/bin/activate # activate on Mac/Linux
myenv\Scripts\activate # activate on Windows
deactivate # exit the environment
```
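If you are unsure whether an environment is active, Python itself can tell you. The sketch below assumes an environment created with `venv`; the function name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

Running this before `pip install` is a quick way to confirm that packages will land in the project's environment rather than the global one.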
## Popular External Libraries by Category
### Data Science & Analysis
- **NumPy** – fundamental package for numerical computing (arrays, linear algebra, random numbers).
- **Pandas** – data manipulation and analysis (DataFrame, reading CSV/Excel, grouping, merging).
- **SciPy** – scientific computing (optimisation, integration, statistics).
- **StatsModels** – statistical models (regression, time series).
### Data Visualization
- **Matplotlib** – basic plotting (line, bar, scatter, histogram).
- **Seaborn** – statistical visualisations built on Matplotlib (heatmaps, pair plots).
- **Plotly** – interactive web‑based charts.
- **Bokeh** – interactive visualisations for browsers.
### Machine Learning & AI
- **Scikit‑learn** – classic ML (classification, regression, clustering, preprocessing).
- **TensorFlow / Keras** – deep learning.
- **PyTorch** – deep learning with dynamic computation graphs.
### Web Development
- **Django** – full‑stack web framework (batteries‑included).
- **Flask** – lightweight micro‑framework.
- **FastAPI** – modern, fast API framework.
- **Requests** – HTTP client (already covered in Lesson 22).
### Web Scraping
- **BeautifulSoup4** – HTML/XML parsing (Lesson 21).
- **Scrapy** – full‑featured scraping framework.
- **Selenium** – browser automation for dynamic content.
### Image & Video Processing
- **Pillow** (the maintained fork of PIL) – image manipulation (resize, crop, filter, convert).
- **OpenCV** – computer vision (face detection, object tracking).
### Database
- **SQLAlchemy** – ORM (Object‑Relational Mapping) for working with databases.
- **Psycopg2** – PostgreSQL adapter.
- **PyMongo** – MongoDB driver.
### Testing
- **pytest** – simple, powerful testing framework.
- **unittest** – built‑in testing module (standard library).
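To give a feel for what a pytest test looks like: pytest collects functions whose names start with `test_` (in files named `test_*.py` or `*_test.py`) and treats a failing plain `assert` as a test failure. The sketch below assumes a hypothetical helper `slugify` and would live in a file such as `test_slugify.py`; both names are made up for illustration.

```python
def slugify(title):
    """Turn a title into a lowercase, hyphen-separated slug."""
    words = title.lower().split()
    return "-".join(words)


def test_slugify_joins_words_with_hyphens():
    assert slugify("Working with External Libraries") == "working-with-external-libraries"


def test_slugify_collapses_extra_spaces():
    assert slugify("  Hello   World ") == "hello-world"
```

Running `pytest` in the directory containing the file would discover and run both test functions.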
### Automation
- **Selenium** – browser automation.
- **PyAutoGUI** – control mouse and keyboard.
- **Schedule** – job scheduling.
## How to Choose the Right Library
1. **Read the documentation** – official docs usually have examples.
2. **Check activity** – look at GitHub stars, last commit date, number of contributors.
3. **Look for tutorials** – a library with many tutorials and answered Stack Overflow questions is usually easier to learn and troubleshoot.
4. **Consider performance** – for large datasets, prefer libraries backed by compiled code, such as NumPy, over pure‑Python loops.
5. **Check dependencies** – avoid libraries that pull in many heavy dependencies.
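For the dependency check in step 5, an already installed package can be inspected from Python. This is a rough sketch built on `importlib.metadata.requires` from the standard library; the helper name `declared_dependencies` is an assumption, and `requests` is just an example that may not be installed on your machine.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(package_name):
    """Return the requirement strings a package declares, or None if it is not installed."""
    try:
        # requires() returns None when the package declares no dependencies
        return requires(package_name) or []
    except PackageNotFoundError:
        return None


deps = declared_dependencies("requests")  # example name; may not be installed
if deps is None:
    print("requests is not installed in this environment")
else:
    print(f"requests declares {len(deps)} dependencies:")
    for requirement in deps:
        print("  ", requirement)
```

The list includes optional extras (marked with `extra ==` in the requirement string), so the count is an upper bound on what a plain install pulls in.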
## Creating Your Own Library
You can package your own Python code into a library and share it. The basic structure:
```
my_library/
    README.md
    setup.py
    my_library/
        __init__.py
        module1.py
    tests/
```
A simple `setup.py`:
```python
from setuptools import setup, find_packages

setup(
    name='my_library',
    version='0.1.0',
    packages=find_packages(),
    install_requires=['requests>=2.25.0'],
    python_requires='>=3.7',
)
```
Then build the distribution files (for example with the `build` package: `python -m build`) and upload them to PyPI using `twine`.
## Best Practices
- Always use virtual environments for project‑specific dependencies.
- Pin versions in `requirements.txt` (e.g., `pandas==1.5.3`) to ensure reproducibility.
- Read the license of any library you use – some are not free for commercial use.
- Keep your libraries updated, but test compatibility before upgrading.
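As a rough illustration of the pinning advice, the sketch below scans requirements text for lines that do not use `==`. It is a simplified parser written for this example (the name `find_unpinned` is hypothetical), and it ignores edge cases such as URLs containing `#`.

```python
def find_unpinned(requirements_text):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):      # skip blanks and pip options like -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


sample = """
# Project dependencies
pandas==1.5.3
requests>=2.25.0
matplotlib
"""
print(find_unpinned(sample))  # ['requests>=2.25.0', 'matplotlib']
```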
## Common Mistakes
- Installing packages globally without a virtual environment – leads to version conflicts.
- Forgetting to activate the virtual environment before installing or running code.
- Hard‑coding file paths that point inside a specific installation (for example, the `site-packages` directory of one Python or library version).
- Using `pip freeze > requirements.txt` in a global environment – includes unrelated packages.
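Several of these mistakes surface as an `ImportError` at run time. A script can check up front whether its libraries are importable in the current environment. This is a minimal sketch using `importlib.util.find_spec` from the standard library; the helper name `missing_modules` and the list of required names are assumptions for illustration.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    return [name for name in module_names if find_spec(name) is None]


# Import names, not PyPI names: BeautifulSoup4 is imported as "bs4", Pillow as "PIL".
required = ["pandas", "numpy", "matplotlib", "requests", "bs4", "PIL"]
missing = missing_modules(required)
if missing:
    print("Not importable in this environment:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

If the list is unexpectedly long, the usual cause is that the virtual environment was not activated.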
## Practice Exercises
1. Install `requests` in a new virtual environment and use it to fetch data from `https://api.github.com/users/octocat`.
2. Use `pandas` to read a CSV file (create one), calculate the mean of a column, and save the result to a new CSV.
3. Use `matplotlib` to plot a simple line graph of `x = [1,2,3,4,5]` and `y = [2,4,6,8,10]`.
4. Install `BeautifulSoup4` and parse an HTML string to extract all `<a>` links.
5. Write a small `setup.py` for a custom module and install it in editable mode (`pip install -e .`).
This lesson provides **10 complete examples** covering Pandas, NumPy, Matplotlib, Requests, BeautifulSoup, Pillow, a data analysis pipeline, pip commands, library categories, and creating your own library.
# Working with External Libraries
print("WORKING WITH EXTERNAL LIBRARIES")
print("=" * 60)
# Note: For actual use, install libraries first:
# pip install pandas numpy matplotlib seaborn
# Example 1: Introduction to Pandas (Data Analysis)
print("\n1. PANDAS - DATA ANALYSIS LIBRARY")
print("-" * 30)
print("Pandas is like Excel for Python - it handles tables of data.")
print("Main structures: Series (1D) and DataFrame (2D table)")
# Simulating Pandas functionality for demonstration
class MockDataFrame:
    """Mock DataFrame to demonstrate concepts without actual pandas"""

    def __init__(self, data, columns=None):
        self.data = data
        self.columns = columns or [f"col{i}" for i in range(len(data[0]))]

    def head(self, n=5):
        """Show first n rows"""
        return self.data[:n]

    def describe(self):
        """Show simple statistics based on the first column"""
        stats = {
            "count": len(self.data),
            "unique": len(set(row[0] for row in self.data)),
            "example": self.data[0][0],
        }
        return stats

    def filter_by(self, column, value):
        """Return the rows whose value in `column` equals `value`"""
        col_index = self.columns.index(column)
        return [row for row in self.data if row[col_index] == value]
# Sample data
student_data = [
    ["Alice", "Math", 85, "A"],
    ["Bob", "Science", 92, "A"],
    ["Charlie", "Math", 78, "B"],
    ["Diana", "Science", 88, "B"],
    ["Eve", "Math", 95, "A"],
    ["Frank", "Science", 82, "B"],
]
print("\nSample student data:")
df = MockDataFrame(student_data, ["Name", "Subject", "Score", "Grade"])
print("First 3 rows:")
for row in df.head(3):
    print(f" {row}")
print(f"\nDataFrame stats: {df.describe()}")
print("\nFilter Math students:")
math_students = df.filter_by("Subject", "Math")
for student in math_students:
    print(f" {student}")
# Example 2: NumPy - Numerical Computing
print("\n\n2. NUMPY - NUMERICAL COMPUTING")
print("-" * 30)
print("NumPy handles arrays and mathematical operations efficiently.")
print("Key features: Multi-dimensional arrays, mathematical functions, linear algebra")
# Simulating NumPy concepts
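class MockArray:
    """Mock NumPy array"""

    def __init__(self, data):
        self.data = data

    def _flatten(self):
        """Return the elements as one flat list (handles 1D and 2D data)"""
        flat_data = []
        for item in self.data:
            if isinstance(item, list):
                flat_data.extend(item)
            else:
                flat_data.append(item)
        return flat_data

    def shape(self):
        # Note: in real NumPy, `shape` is an attribute, not a method
        if self.data and isinstance(self.data[0], list):
            return (len(self.data), len(self.data[0]))
        return (len(self.data),)

    def mean(self):
        # Average over all elements, so this also works after reshape()
        flat_data = self._flatten()
        return sum(flat_data) / len(flat_data) if flat_data else 0

    def reshape(self, new_shape):
        """Reshape array (simple reshape to 2D)"""
        flat_data = self._flatten()
        rows, cols = new_shape
        if rows * cols != len(flat_data):
            raise ValueError(f"cannot reshape {len(flat_data)} elements into {new_shape}")
        result = [flat_data[i * cols:(i + 1) * cols] for i in range(rows)]
        return MockArray(result)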
# Demonstrate array operations
print("\nArray Operations:")
arr = MockArray([1, 2, 3, 4, 5, 6])
print(f"Array: {arr.data}")
print(f"Shape: {arr.shape()}")
print(f"Mean: {arr.mean():.2f}")
reshaped = arr.reshape((2, 3))
print(f"\nReshaped to 2x3: {reshaped.data}")
print(f"New shape: {reshaped.shape()}")
# Example 3: Matplotlib - Data Visualization
print("\n\n3. MATPLOTLIB - DATA VISUALIZATION")
print("-" * 30)
print("Matplotlib creates charts and graphs from data.")
print("Common plot types: Line, Bar, Scatter, Histogram, Pie")
# Simulating plot creation
class MockPlot:
    """Mock matplotlib plot"""

    @staticmethod
    def line_plot(x_data, y_data, title):
        print(f"\nLine Plot: {title}")
        print("X-axis:", x_data)
        print("Y-axis:", y_data)
        print("Creating line chart...")
        # In real matplotlib: plt.plot(x_data, y_data); plt.title(title); plt.show()

    @staticmethod
    def bar_chart(labels, values, title):
        print(f"\nBar Chart: {title}")
        for label, value in zip(labels, values):
            bar = "█" * (value // 10)  # One block per 10 units keeps bars on one line
            print(f"{label:10} {value:3} {bar}")

    @staticmethod
    def scatter_plot(x_data, y_data, title):
        print(f"\nScatter Plot: {title}")
        print("  X   Y Plot")
        for x, y in zip(x_data, y_data):
            position = int((x + y) / 2)
            plot_line = " " * position + "•"
            print(f"{x:3} {y:3} {plot_line}")
# Create sample plots
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [150, 200, 175, 225, 300]
temperatures = [5, 7, 12, 15, 18, 22, 25, 23, 19, 14, 9, 6]
MockPlot.line_plot(months, sales, "Monthly Sales")
MockPlot.bar_chart(months, sales, "Monthly Sales Comparison")
# Scatter plot example
study_hours = [2, 3, 4, 5, 6, 7, 8]
exam_scores = [65, 70, 75, 80, 85, 90, 95]
MockPlot.scatter_plot(study_hours, exam_scores, "Study Hours vs Exam Scores")
# Example 4: Requests - HTTP Library (revisited with more depth)
print("\n\n4. REQUESTS - HTTP LIBRARY")
print("-" * 30)
print("Requests simplifies making HTTP requests.")
print("Common uses: API calls, web scraping, downloading files")
# Simulating requests functionality
class MockResponse:
    """Mock response object (stands in for requests.Response)"""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text


class MockRequests:
    """Mock requests module"""

    @staticmethod
    def get(url, params=None, headers=None):
        print(f"GET request to: {url}")
        if params:
            print(f"Parameters: {params}")
        if headers:
            print(f"Headers: {headers}")
        # Return mock response
        return MockResponse(200, '{"data": "example response"}')

    @staticmethod
    def post(url, data=None, json=None, headers=None):
        print(f"POST request to: {url}")
        if json:
            print(f"JSON data: {json}")
        # Return mock response
        return MockResponse(201, '{"id": 123, "status": "created"}')
print("\nMaking API requests:")
# Mock API calls
response = MockRequests.get(
"https://api.example.com/data",
params={"page": 1, "limit": 10},
headers={"Authorization": "Bearer token123"}
)
print(f"Response status: {response.status_code}")
response = MockRequests.post(
"https://api.example.com/users",
json={"name": "John", "email": "john@example.com"}
)
print(f"Response status: {response.status_code}")
# Example 5: BeautifulSoup - HTML Parsing (revisited)
print("\n\n5. BEAUTIFULSOUP - HTML PARSING")
print("-" * 30)
print("BeautifulSoup parses HTML and XML documents.")
print("Common uses: Web scraping, data extraction from web pages")
# Simulating BeautifulSoup
class MockBeautifulSoup:
    """Mock BeautifulSoup"""

    def __init__(self, html, parser):
        self.html = html
        self.parser = parser

    def find(self, tag, attrs=None):
        print(f"Finding <{tag}> with attributes: {attrs}")
        return MockTag(f"<{tag}>Sample content</{tag}>")

    def find_all(self, tag, attrs=None):
        print(f"Finding all <{tag}> elements")
        return [
            MockTag(f"<{tag}>Item 1</{tag}>"),
            MockTag(f"<{tag}>Item 2</{tag}>"),
            MockTag(f"<{tag}>Item 3</{tag}>"),
        ]


class MockTag:
    """Mock of a parsed HTML tag"""

    def __init__(self, content):
        self.content = content

    def text(self):
        # Note: in real BeautifulSoup, `text` is a property (tag.text), not a method
        return self.content.split(">")[1].split("<")[0]

    def get(self, attr):
        return f"value_of_{attr}"
print("\nParsing HTML:")
html_content = "<html><body><h1>Title</h1><p>Content</p></body></html>"
soup = MockBeautifulSoup(html_content, "html.parser")
# Find elements
title = soup.find("h1")
print(f"Found title: {title.text()}")
all_paragraphs = soup.find_all("p")
print(f"Found {len(all_paragraphs)} paragraphs")
# Example 6: Pillow - Image Processing
print("\n\n6. PILLOW - IMAGE PROCESSING")
print("-" * 30)
print("Pillow (PIL) handles image manipulation.")
print("Common uses: Resizing, cropping, filtering, converting formats")
# Simulating Pillow functionality
class MockImage:
    """Mock PIL Image"""

    def __init__(self, filename, size=(800, 600)):
        self.filename = filename
        self.size = size  # Default size is 800x600
        self.format = "JPEG"

    def resize(self, new_size):
        # Like Pillow's Image.resize, return a new image instead of modifying this one
        print(f"Resizing image from {self.size} to {new_size}")
        return MockImage(self.filename, new_size)

    def crop(self, box):
        # box is (left, upper, right, lower); the result is a new image
        print(f"Cropping image with box: {box}")
        left, upper, right, lower = box
        return MockImage(self.filename, (right - left, lower - upper))

    def save(self, new_filename, format=None):
        format = format or self.format
        print(f"Saving image as {new_filename} ({format})")

    @staticmethod
    def open(filename):
        print(f"Opening image: {filename}")
        return MockImage(filename)


print("\nImage processing example:")
img = MockImage.open("photo.jpg")
print(f"Original size: {img.size}")
# Resize image (keep the returned image)
img = img.resize((400, 300))
print(f"New size: {img.size}")
# Crop image
img = img.crop((100, 100, 300, 200))
# Save image
img.save("photo_resized.jpg", "JPEG")
# Example 7: Real-world project using multiple libraries
print("\n\n7. REAL-WORLD PROJECT: DATA ANALYSIS PIPELINE")
print("-" * 30)
print("Simulating a data analysis project with multiple libraries:")
print("1. Pandas for data manipulation")
print("2. NumPy for numerical operations")
print("3. Matplotlib for visualization")
print("4. Requests for data fetching")
from datetime import datetime  # used for the report timestamp


class DataAnalysisPipeline:
    """Mock data analysis pipeline"""

    def __init__(self):
        self.data = None

    def fetch_data(self, source):
        """Fetch data from source"""
        print(f"Fetching data from {source}...")
        # Simulate fetched data
        self.data = [
            {"Date": "2024-01-01", "Sales": 150, "Visitors": 200},
            {"Date": "2024-01-02", "Sales": 180, "Visitors": 220},
            {"Date": "2024-01-03", "Sales": 200, "Visitors": 250},
            {"Date": "2024-01-04", "Sales": 175, "Visitors": 210},
            {"Date": "2024-01-05", "Sales": 220, "Visitors": 280},
        ]
        print(f"Fetched {len(self.data)} records")

    def analyze_data(self):
        """Analyze the data"""
        if not self.data:
            print("No data to analyze")
            return
        print("\nData Analysis:")
        print("=" * 40)
        # Calculate statistics
        total_sales = sum(item["Sales"] for item in self.data)
        total_visitors = sum(item["Visitors"] for item in self.data)
        avg_sales = total_sales / len(self.data)
        conversion_rate = (total_sales / total_visitors) * 100
        print(f"Total Sales: ${total_sales}")
        print(f"Total Visitors: {total_visitors}")
        print(f"Average Daily Sales: ${avg_sales:.2f}")
        print(f"Conversion Rate: {conversion_rate:.1f}%")
        # Find best day
        best_day = max(self.data, key=lambda x: x["Sales"])
        print(f"\nBest Day: {best_day['Date']}")
        print(f"Sales: ${best_day['Sales']}, Visitors: {best_day['Visitors']}")

    def visualize_data(self):
        """Create visualizations"""
        if not self.data:
            print("No data to visualize")
            return
        print("\nData Visualization:")
        print("=" * 40)
        # Extract data for plotting
        dates = [item["Date"][-2:] for item in self.data]  # Just day numbers
        sales = [item["Sales"] for item in self.data]
        visitors = [item["Visitors"] for item in self.data]
        # Create simple text-based charts
        print("\nSales Trend:")
        max_sales = max(sales)
        for date, sale in zip(dates, sales):
            bar_length = int((sale / max_sales) * 20)
            bar = "█" * bar_length
            print(f"Day {date}: ${sale:3} {bar}")
        print("\nVisitors Trend:")
        max_visitors = max(visitors)
        for date, visitor in zip(dates, visitors):
            bar_length = int((visitor / max_visitors) * 20)
            bar = "█" * bar_length
            print(f"Day {date}: {visitor:3} visitors {bar}")

    def generate_report(self):
        """Generate analysis report"""
        if not self.data:
            print("No data to report on")
            return
        print("\nGenerating Report:")
        print("=" * 40)
        # Compute the totals once instead of repeating the sums in the template
        total_sales = sum(item["Sales"] for item in self.data)
        total_visitors = sum(item["Visitors"] for item in self.data)
        conversion_rate = (total_sales / total_visitors) * 100
        report = f"""
DATA ANALYSIS REPORT
=====================
Period: January 1-5, 2024
Summary:
- Total Sales: ${total_sales}
- Total Visitors: {total_visitors}
- Average Conversion Rate: {conversion_rate:.1f}%
Recommendations:
1. Focus marketing on high-conversion days
2. Analyze visitor demographics for better targeting
3. Consider promotions during low-traffic periods
Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
"""
        print(report)
        # Save report to file
        with open("analysis_report.txt", "w") as f:
            f.write(report)
        print("Report saved to analysis_report.txt")
# Run the pipeline
print("\nRunning Data Analysis Pipeline:")
pipeline = DataAnalysisPipeline()
pipeline.fetch_data("sales_database")
pipeline.analyze_data()
pipeline.visualize_data()
pipeline.generate_report()
# Example 8: Installing and managing libraries
print("\n\n8. INSTALLING AND MANAGING LIBRARIES")
print("-" * 30)
print("Using pip (Python package installer):")
print("\nBasic commands:")
print("pip install package_name # Install a package")
print("pip install package_name==1.0.0 # Install specific version")
print("pip install -r requirements.txt # Install from requirements file")
print("pip uninstall package_name # Remove a package")
print("pip list # Show installed packages")
print("pip show package_name # Show package info")
print("\nExample requirements.txt file:")
requirements = """
# Project dependencies
pandas==1.5.3
numpy==1.24.3
matplotlib==3.7.1
requests==2.28.2
beautifulsoup4==4.12.2
pillow==9.5.0
"""
print(requirements)
print("\nVirtual environments (venv):")
print("python -m venv myenv # Create virtual environment")
print("source myenv/bin/activate # Activate (Linux/Mac)")
print("myenv\\Scripts\\activate # Activate (Windows)")
print("deactivate # Deactivate")
# Example 9: Popular Python libraries by category
print("\n\n9. POPULAR PYTHON LIBRARIES")
print("-" * 30)
libraries_by_category = {
    "Data Science & Analysis": [
        "pandas", "numpy", "scipy", "statsmodels"
    ],
    "Data Visualization": [
        "matplotlib", "seaborn", "plotly", "bokeh"
    ],
    "Machine Learning": [
        "scikit-learn", "tensorflow", "pytorch", "keras"
    ],
    "Web Development": [
        "django", "flask", "fastapi", "requests"
    ],
    "Web Scraping": [
        "beautifulsoup4", "scrapy", "selenium"
    ],
    "Database": [
        "sqlalchemy", "psycopg2", "pymongo"
    ],
    "Testing": [
        "pytest", "unittest"  # unittest ships with the standard library
    ],
    "Automation": [
        "selenium", "pyautogui", "schedule"
    ],
}
for category, libs in libraries_by_category.items():
    print(f"\n{category}:")
    for lib in libs:
        print(f" • {lib}")
# Example 10: Creating your own library
print("\n\n10. CREATING YOUR OWN LIBRARY")
print("-" * 30)
print("Structure of a Python library:")
print("""
my_library/
    README.md
    setup.py
    my_library/
        __init__.py
        module1.py
        module2.py
    tests/
        test_module1.py
    examples/
        example_usage.py
""")
print("\nExample setup.py:")
setup_py = """
from setuptools import setup, find_packages

setup(
    name="my_library",
    version="0.1.0",
    author="Your Name",
    author_email="your.email@example.com",
    description="A short description of your library",
    packages=find_packages(),
    install_requires=[
        'requests>=2.25.0',
        'pandas>=1.3.0',
    ],
    python_requires='>=3.7',
)
"""
print(setup_py)
print("\nPublishing to PyPI:")
print("1. Create accounts on test.pypi.org and pypi.org")
print("2. Build package: python -m build  (requires: pip install build)")
print("3. Upload: twine upload dist/*")
print("\nUsing your library:")
print("pip install my_library")
print("# Then in Python:")
print("import my_library")
print("from my_library import module1")