Working with External Libraries

External libraries are like power tools in a workshop. Python comes with basic tools (built-in modules), but for specialized tasks, you bring in power tools like a drill (Pandas for data), a saw (Matplotlib for charts), or a sander (NumPy for math).
One of Python's greatest strengths is its vast ecosystem of **external libraries** – pre‑written code that you can install and use to solve complex problems without reinventing the wheel. While Python's standard library provides many useful modules (like `math`, `random`, `datetime`), external libraries extend Python into domains like data science, web development, machine learning, image processing, and more.

## Installing External Libraries with `pip`

`pip` is Python's package installer. It downloads and installs libraries from the Python Package Index (PyPI).

**Basic commands:**
```bash
pip install package_name # install latest version
pip install package_name==1.0.0 # install specific version
pip install -r requirements.txt # install from a requirements file
pip uninstall package_name # remove a package
pip list # list installed packages
pip show package_name # show package details
```
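After installing, it can help to confirm from Python which version actually landed in the active environment. The sketch below uses the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report on a couple of packages; the output depends on your environment.
for name in ("pip", "requests"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

This reports the same version that `pip show package_name` prints, but in a form a script can act on.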

## Virtual Environments

A **virtual environment** is an isolated Python environment that allows you to manage dependencies for different projects separately. This prevents version conflicts (e.g., Project A needs Django 2.0, Project B needs Django 3.0).

```bash
python -m venv myenv # create virtual environment
source myenv/bin/activate # activate on Mac/Linux
myenv\Scripts\activate # activate on Windows
deactivate # exit the environment
```
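A quick way to check whether a script is running inside an environment created with `venv` is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; the function name `in_virtual_environment` is an illustrative choice, and tools other than `venv` may behave differently.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

Running this before `pip install` is a cheap guard against installing into the global interpreter by accident.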

## Popular External Libraries by Category

### Data Science & Analysis
- **NumPy** – fundamental package for numerical computing (arrays, linear algebra, random numbers).
- **Pandas** – data manipulation and analysis (DataFrame, reading CSV/Excel, grouping, merging).
- **SciPy** – scientific computing (optimisation, integration, statistics).
- **StatsModels** – statistical models (regression, time series).
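As a small illustration of pulling in one of these libraries, the sketch below averages a list of scores with NumPy when it is installed and falls back to plain Python otherwise. The function name `mean_score` and the sample scores are assumptions made for this example.

```python
try:
    import numpy as np  # external library: pip install numpy
except ImportError:
    np = None  # NumPy is missing, so the pure-Python branch is used


def mean_score(scores):
    """Average a list of numbers, using NumPy when it is available."""
    if np is not None:
        return float(np.mean(scores))
    return sum(scores) / len(scores)


scores = [85, 92, 78, 88, 95, 82]
print(f"Mean score: {mean_score(scores):.2f}")
```

The guarded import keeps the script usable on machines where the library has not been installed yet; in a real project you would normally list NumPy as a dependency instead.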

### Data Visualization
- **Matplotlib** – basic plotting (line, bar, scatter, histogram).
- **Seaborn** – statistical visualisations built on Matplotlib (heatmaps, pair plots).
- **Plotly** – interactive web‑based charts.
- **Bokeh** – interactive visualisations for browsers.

### Machine Learning & AI
- **Scikit‑learn** – classic ML (classification, regression, clustering, preprocessing).
- **TensorFlow / Keras** – deep learning.
- **PyTorch** – deep learning with dynamic computation graphs.

### Web Development
- **Django** – full‑stack web framework (batteries‑included).
- **Flask** – lightweight micro‑framework.
- **FastAPI** – modern, fast API framework.
- **Requests** – HTTP client (already covered in Lesson 22).

### Web Scraping
- **BeautifulSoup4** – HTML/XML parsing (Lesson 21).
- **Scrapy** – full‑featured scraping framework.
- **Selenium** – browser automation for dynamic content.

### Image & Video Processing
- **Pillow** – image manipulation (resize, crop, filter, convert); the maintained fork of the original PIL (Python Imaging Library).
- **OpenCV** – computer vision (face detection, object tracking).

### Database
- **SQLAlchemy** – ORM (Object‑Relational Mapping) for working with databases.
- **Psycopg2** – PostgreSQL adapter.
- **PyMongo** – MongoDB driver.

### Testing
- **pytest** – simple, powerful testing framework.
- **unittest** – built‑in testing module (standard library).
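Because `unittest` ships with Python, a first test needs no installation at all. The sketch below tests a hypothetical `average` function (not part of any library) and runs the test case programmatically so that the result can be inspected.

```python
import unittest


def average(values):
    """Return the arithmetic mean; reject empty input explicitly."""
    if not values:
        raise ValueError("average() needs at least one value")
    return sum(values) / len(values)


class AverageTests(unittest.TestCase):
    def test_typical_values(self):
        self.assertEqual(average([2, 4, 6]), 4)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            average([])


# Load and run the test case without calling unittest.main(),
# so the script keeps running after the tests finish.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("All tests passed:", result.wasSuccessful())
```

With pytest installed, the same checks could be written as plain functions named `test_*` using bare `assert` statements, which is much of its appeal.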

### Automation
- **Selenium** – browser automation.
- **PyAutoGUI** – control mouse and keyboard.
- **Schedule** – job scheduling.

## How to Choose the Right Library

1. **Read the documentation** – official docs usually have examples.
2. **Check activity** – look at GitHub stars, last commit date, number of contributors.
3. **Look for tutorials** – a library with many Stack Overflow questions is easier to learn.
4. **Consider performance** – for large datasets, prefer compiled libraries like NumPy.
5. **Check dependencies** – avoid libraries that pull in many heavy dependencies.
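For the dependency check in step 5, the requirements that an installed package declares can be read with `importlib.metadata.requires` (Python 3.8+). This is a rough sketch: it only works for packages already installed in the current environment, and the helper name `declared_dependencies` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(package_name):
    """Return the requirement strings a package declares, or [] if none or unknown."""
    try:
        # requires() returns None when a package declares no dependencies.
        return requires(package_name) or []
    except PackageNotFoundError:
        return []


# Output depends on what is installed in your environment.
for name in ("pip", "requests"):
    deps = declared_dependencies(name)
    print(f"{name}: {len(deps)} declared dependencies")
    for dep in deps:
        print("   ", dep)
```

A long list here is a hint to look more closely before adopting the library, not a verdict on its quality.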

## Creating Your Own Library

You can package your own Python code into a library and share it. The basic structure:
```
my_library/
    README.md
    setup.py
    my_library/
        __init__.py
        module1.py
    tests/
```

A simple `setup.py`:
```python
from setuptools import setup, find_packages

setup(
    name='my_library',
    version='0.1.0',
    packages=find_packages(),
    install_requires=['requests>=2.25.0'],
    python_requires='>=3.7',
)
```
Then build the distribution files and upload them to PyPI using `twine`.
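To see how the layout fits together, the sketch below creates it in a temporary directory and imports the package from there. The `greet` function, the file contents, and the helper name `scaffold_library` are all invented for this illustration; only the directory structure comes from the lesson.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def scaffold_library(root, name="my_library"):
    """Create the minimal project layout under `root` and return its path."""
    project = root / name
    package = project / name
    package.mkdir(parents=True)
    (project / "tests").mkdir()
    (project / "README.md").write_text(f"# {name}\n")
    (project / "setup.py").write_text(
        "from setuptools import setup, find_packages\n"
        f"setup(name='{name}', version='0.1.0', packages=find_packages())\n"
    )
    # __init__.py re-exports greet so users can write: from my_library import greet
    (package / "__init__.py").write_text("from .module1 import greet\n")
    (package / "module1.py").write_text(
        "def greet(who):\n    return f'Hello, {who}!'\n"
    )
    return project


with tempfile.TemporaryDirectory() as tmp:
    project_dir = scaffold_library(Path(tmp))
    sys.path.insert(0, str(project_dir))  # make the inner package importable
    try:
        my_library = importlib.import_module("my_library")
        greeting = my_library.greet("world")
    finally:
        # Undo the path change and forget the temporary modules.
        sys.path.remove(str(project_dir))
        sys.modules.pop("my_library", None)
        sys.modules.pop("my_library.module1", None)

print(greeting)
```

In a real project you would run `pip install -e .` in the project directory instead of editing `sys.path`; the path manipulation here only keeps the demonstration self-contained.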

## Best Practices

- Always use virtual environments for project‑specific dependencies.
- Pin versions in `requirements.txt` (e.g., `pandas==1.5.3`) to ensure reproducibility.
- Read the license of any library you use – some are not free for commercial use.
- Keep your libraries updated, but test compatibility before upgrading.
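The pinning advice can be checked mechanically. The following is a deliberately simplified sketch that flags lines in a requirements file lacking an exact `==` pin; it ignores URLs, extras, and environment markers, and the function name `unpinned_requirements` is an assumption for this example.

```python
def unpinned_requirements(text):
    """Return requirement lines that do not pin an exact version with '=='."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        if "==" not in line:
            loose.append(line)
    return loose


sample = """
# Project dependencies
pandas==1.5.3
numpy>=1.24
requests
"""
print(unpinned_requirements(sample))  # ['numpy>=1.24', 'requests']
```

To check a real file, pass in its contents, for example `unpinned_requirements(open("requirements.txt").read())`.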

## Common Mistakes

- Installing packages globally without a virtual environment – leads to version conflicts.
- Forgetting to activate the virtual environment before installing or running code.
- Hard‑coding paths that rely on specific library versions.
- Using `pip freeze > requirements.txt` in a global environment – includes unrelated packages.

## Practice Exercises

1. Install `requests` in a new virtual environment and use it to fetch data from `https://api.github.com/users/octocat`.
2. Use `pandas` to read a CSV file (create one), calculate the mean of a column, and save the result to a new CSV.
3. Use `matplotlib` to plot a simple line graph of `x = [1,2,3,4,5]` and `y = [2,4,6,8,10]`.
4. Install `BeautifulSoup4` and parse an HTML string to extract all `<a>` links.
5. Write a small `setup.py` for a custom module and install it in editable mode (`pip install -e .`).

This lesson provides **10 complete examples** covering Pandas, NumPy, Matplotlib, Requests, BeautifulSoup, Pillow, a data analysis pipeline, pip commands, library categories, and creating your own library.
# Working with External Libraries
from datetime import datetime  # needed for the report timestamp in Example 7

print("WORKING WITH EXTERNAL LIBRARIES")
print("=" * 60)

# Note: For actual use, install libraries first:
# pip install pandas numpy matplotlib seaborn

# Example 1: Introduction to Pandas (Data Analysis)
print("\n1. PANDAS - DATA ANALYSIS LIBRARY")
print("-" * 30)

print("Pandas is like Excel for Python - it handles tables of data.")
print("Main structures: Series (1D) and DataFrame (2D table)")

# Simulating Pandas functionality for demonstration
class MockDataFrame:
    """Mock DataFrame to demonstrate concepts without actual pandas"""
    def __init__(self, data, columns=None):
        self.data = data
        self.columns = columns or [f"col{i}" for i in range(len(data[0]))]
    
    def head(self, n=5):
        """Show first n rows"""
        return self.data[:n]
    
    def describe(self):
        """Show statistics"""
        stats = {
            "count": len(self.data),
            "unique": len(set(row[0] for row in self.data)),
            "example": self.data[0][0]
        }
        return stats
    
    def filter_by(self, column, value):
        """Filter rows"""
        col_index = self.columns.index(column)
        return [row for row in self.data if row[col_index] == value]

# Sample data
student_data = [
    ["Alice", "Math", 85, "A"],
    ["Bob", "Science", 92, "A"],
    ["Charlie", "Math", 78, "B"],
    ["Diana", "Science", 88, "B"],
    ["Eve", "Math", 95, "A"],
    ["Frank", "Science", 82, "B"],
]

print("\nSample student data:")
df = MockDataFrame(student_data, ["Name", "Subject", "Score", "Grade"])

print("First 3 rows:")
for row in df.head(3):
    print(f"  {row}")

print(f"\nDataFrame stats: {df.describe()}")

print("\nFilter Math students:")
math_students = df.filter_by("Subject", "Math")
for student in math_students:
    print(f"  {student}")

# Example 2: NumPy - Numerical Computing
print("\n\n2. NUMPY - NUMERICAL COMPUTING")
print("-" * 30)

print("NumPy handles arrays and mathematical operations efficiently.")
print("Key features: Multi-dimensional arrays, mathematical functions, linear algebra")

# Simulating NumPy concepts
class MockArray:
    """Mock NumPy array"""
    def __init__(self, data):
        self.data = data
    
    def shape(self):
        if isinstance(self.data[0], list):
            return (len(self.data), len(self.data[0]))
        return (len(self.data),)
    
    def mean(self):
        return sum(self.data) / len(self.data) if self.data else 0
    
    def reshape(self, new_shape):
        """Reshape array"""
        flat_data = []
        for item in self.data:
            if isinstance(item, list):
                flat_data.extend(item)
            else:
                flat_data.append(item)
        
        # Simple reshape for 1D to 2D
        rows, cols = new_shape
        result = []
        for i in range(rows):
            row = flat_data[i*cols:(i+1)*cols]
            result.append(row)
        return MockArray(result)

# Demonstrate array operations
print("\nArray Operations:")
arr = MockArray([1, 2, 3, 4, 5, 6])
print(f"Array: {arr.data}")
print(f"Shape: {arr.shape()}")
print(f"Mean: {arr.mean():.2f}")

reshaped = arr.reshape((2, 3))
print(f"\nReshaped to 2x3: {reshaped.data}")
print(f"New shape: {reshaped.shape()}")

# Example 3: Matplotlib - Data Visualization
print("\n\n3. MATPLOTLIB - DATA VISUALIZATION")
print("-" * 30)

print("Matplotlib creates charts and graphs from data.")
print("Common plot types: Line, Bar, Scatter, Histogram, Pie")

# Simulating plot creation
class MockPlot:
    """Mock matplotlib plot"""
    @staticmethod
    def line_plot(x_data, y_data, title):
        print(f"\nLine Plot: {title}")
        print("X-axis:", x_data)
        print("Y-axis:", y_data)
        print("Creating line chart...")
        # In real matplotlib: plt.plot(x_data, y_data); plt.title(title); plt.show()
    
    @staticmethod
    def bar_chart(labels, values, title):
        print(f"\nBar Chart: {title}")
        for label, value in zip(labels, values):
            bar = "█" * (value // 10)  # One block per 10 units keeps bars on one line
            print(f"{label:10} {value:3} {bar}")
    
    @staticmethod
    def scatter_plot(x_data, y_data, title):
        print(f"\nScatter Plot: {title}")
        print("  X   Y   Plot")
        for x, y in zip(x_data, y_data):
            position = int((x + y) / 2)
            plot_line = " " * position + "•"
            print(f"{x:3} {y:3} {plot_line}")

# Create sample plots
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [150, 200, 175, 225, 300]
temperatures = [5, 7, 12, 15, 18, 22, 25, 23, 19, 14, 9, 6]

MockPlot.line_plot(months, sales, "Monthly Sales")
MockPlot.bar_chart(months, sales, "Monthly Sales Comparison")

# Scatter plot example
study_hours = [2, 3, 4, 5, 6, 7, 8]
exam_scores = [65, 70, 75, 80, 85, 90, 95]
MockPlot.scatter_plot(study_hours, exam_scores, "Study Hours vs Exam Scores")

# Example 4: Requests - HTTP Library (revisited with more depth)
print("\n\n4. REQUESTS - HTTP LIBRARY")
print("-" * 30)

print("Requests simplifies making HTTP requests.")
print("Common uses: API calls, web scraping, downloading files")

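# Simulating requests functionality
class MockResponse:
    """Mock response object holding a status code and a text body"""
    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text
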
class MockRequests:
    """Mock requests module"""
    @staticmethod
    def get(url, params=None, headers=None):
        print(f"GET request to: {url}")
        if params:
            print(f"Parameters: {params}")
        if headers:
            print(f"Headers: {headers}")
        
        # Return mock response
        return MockResponse(200, "{\"data\": \"example response\"}")
    
    @staticmethod
    def post(url, data=None, json=None, headers=None):
        print(f"POST request to: {url}")
        if json:
            print(f"JSON data: {json}")
        
        # Return mock response
        return MockResponse(201, "{\"id\": 123, \"status\": \"created\"}")

print("\nMaking API requests:")
# Mock API calls
response = MockRequests.get(
    "https://api.example.com/data",
    params={"page": 1, "limit": 10},
    headers={"Authorization": "Bearer token123"}
)
print(f"Response status: {response.status_code}")

response = MockRequests.post(
    "https://api.example.com/users",
    json={"name": "John", "email": "john@example.com"}
)
print(f"Response status: {response.status_code}")

# Example 5: BeautifulSoup - HTML Parsing (revisited)
print("\n\n5. BEAUTIFULSOUP - HTML PARSING")
print("-" * 30)

print("BeautifulSoup parses HTML and XML documents.")
print("Common uses: Web scraping, data extraction from web pages")

# Simulating BeautifulSoup
class MockBeautifulSoup:
    """Mock BeautifulSoup"""
    def __init__(self, html, parser):
        self.html = html
        
    def find(self, tag, attrs=None):
        print(f"Finding <{tag}> with attributes: {attrs}")
        return MockTag(f"<{tag}>Sample content</{tag}>")
    
    def find_all(self, tag, attrs=None):
        print(f"Finding all <{tag}> elements")
        return [
            MockTag(f"<{tag}>Item 1</{tag}>"),
            MockTag(f"<{tag}>Item 2</{tag}>"),
            MockTag(f"<{tag}>Item 3</{tag}>")
        ]

class MockTag:
    def __init__(self, content):
        self.content = content
    
    def text(self):
        return self.content.split(">")[1].split("<")[0]
    
    def get(self, attr):
        return f"value_of_{attr}"

print("\nParsing HTML:")
html_content = "<html><body><h1>Title</h1><p>Content</p></body></html>"
soup = MockBeautifulSoup(html_content, "html.parser")

# Find elements
title = soup.find("h1")
print(f"Found title: {title.text()}")

all_paragraphs = soup.find_all("p")
print(f"Found {len(all_paragraphs)} paragraphs")

# Example 6: Pillow - Image Processing
print("\n\n6. PILLOW - IMAGE PROCESSING")
print("-" * 30)

print("Pillow (PIL) handles image manipulation.")
print("Common uses: Resizing, cropping, filtering, converting formats")

# Simulating Pillow functionality
class MockImage:
    """Mock PIL Image"""
    def __init__(self, filename):
        self.filename = filename
        self.size = (800, 600)  # Default size
        self.format = "JPEG"
    
    def resize(self, new_size):
        print(f"Resizing image from {self.size} to {new_size}")
        self.size = new_size
        return self
    
    def crop(self, box):
        print(f"Cropping image with box: {box}")
        return self
    
    def save(self, new_filename, format=None):
        format = format or self.format
        print(f"Saving image as {new_filename} ({format})")
    
    @staticmethod
    def open(filename):
        print(f"Opening image: {filename}")
        return MockImage(filename)

print("\nImage processing example:")
img = MockImage.open("photo.jpg")
print(f"Original size: {img.size}")

# Resize image
img.resize((400, 300))
print(f"New size: {img.size}")

# Crop image
img.crop((100, 100, 300, 200))

# Save image
img.save("photo_resized.jpg", "JPEG")

# Example 7: Real-world project using multiple libraries
print("\n\n7. REAL-WORLD PROJECT: DATA ANALYSIS PIPELINE")
print("-" * 30)

print("Simulating a data analysis project with multiple libraries:")
print("1. Pandas for data manipulation")
print("2. NumPy for numerical operations")
print("3. Matplotlib for visualization")
print("4. Requests for data fetching")

class DataAnalysisPipeline:
    """Mock data analysis pipeline"""
    
    def __init__(self):
        self.data = None
        
    def fetch_data(self, source):
        """Fetch data from source"""
        print(f"Fetching data from {source}...")
        # Simulate fetched data
        self.data = [
            {"Date": "2024-01-01", "Sales": 150, "Visitors": 200},
            {"Date": "2024-01-02", "Sales": 180, "Visitors": 220},
            {"Date": "2024-01-03", "Sales": 200, "Visitors": 250},
            {"Date": "2024-01-04", "Sales": 175, "Visitors": 210},
            {"Date": "2024-01-05", "Sales": 220, "Visitors": 280},
        ]
        print(f"Fetched {len(self.data)} records")
    
    def analyze_data(self):
        """Analyze the data"""
        if not self.data:
            print("No data to analyze")
            return
        
        print("\nData Analysis:")
        print("=" * 40)
        
        # Calculate statistics
        total_sales = sum(item["Sales"] for item in self.data)
        total_visitors = sum(item["Visitors"] for item in self.data)
        avg_sales = total_sales / len(self.data)
        conversion_rate = (total_sales / total_visitors) * 100
        
        print(f"Total Sales: ${total_sales}")
        print(f"Total Visitors: {total_visitors}")
        print(f"Average Daily Sales: ${avg_sales:.2f}")
        print(f"Conversion Rate: {conversion_rate:.1f}%")
        
        # Find best day
        best_day = max(self.data, key=lambda x: x["Sales"])
        print(f"\nBest Day: {best_day['Date']}")
        print(f"Sales: ${best_day['Sales']}, Visitors: {best_day['Visitors']}")
    
    def visualize_data(self):
        """Create visualizations"""
        if not self.data:
            print("No data to visualize")
            return
        
        print("\nData Visualization:")
        print("=" * 40)
        
        # Extract data for plotting
        dates = [item["Date"][-2:] for item in self.data]  # Just day numbers
        sales = [item["Sales"] for item in self.data]
        visitors = [item["Visitors"] for item in self.data]
        
        # Create simple text-based charts
        print("\nSales Trend:")
        max_sales = max(sales)
        for date, sale in zip(dates, sales):
            bar_length = int((sale / max_sales) * 20)
            bar = "█" * bar_length
            print(f"Day {date}: ${sale:3} {bar}")
        
        print("\nVisitors Trend:")
        max_visitors = max(visitors)
        for date, visitor in zip(dates, visitors):
            bar_length = int((visitor / max_visitors) * 20)
            bar = "█" * bar_length
            print(f"Day {date}: {visitor:3} visitors {bar}")
    
    def generate_report(self):
        """Generate analysis report"""
        print("\nGenerating Report:")
        print("=" * 40)
        
        report = f"""
DATA ANALYSIS REPORT
=====================
Period: January 1-5, 2024

Summary:
- Total Sales: ${sum(item['Sales'] for item in self.data)}
- Total Visitors: {sum(item['Visitors'] for item in self.data)}
- Average Conversion Rate: {(sum(item['Sales'] for item in self.data) / sum(item['Visitors'] for item in self.data)) * 100:.1f}%

Recommendations:
1. Focus marketing on high-conversion days
2. Analyze visitor demographics for better targeting
3. Consider promotions during low-traffic periods

Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
"""
        
        print(report)
        
        # Save report to file
        with open("analysis_report.txt", "w") as f:
            f.write(report)
        print("Report saved to analysis_report.txt")

# Run the pipeline
print("\nRunning Data Analysis Pipeline:")
pipeline = DataAnalysisPipeline()
pipeline.fetch_data("sales_database")
pipeline.analyze_data()
pipeline.visualize_data()
pipeline.generate_report()

# Example 8: Installing and managing libraries
print("\n\n8. INSTALLING AND MANAGING LIBRARIES")
print("-" * 30)

print("Using pip (Python package installer):")
print("\nBasic commands:")
print("pip install package_name          # Install a package")
print("pip install package_name==1.0.0   # Install specific version")
print("pip install -r requirements.txt   # Install from requirements file")
print("pip uninstall package_name        # Remove a package")
print("pip list                         # Show installed packages")
print("pip show package_name            # Show package info")

print("\nExample requirements.txt file:")
requirements = """
# Project dependencies
pandas==1.5.3
numpy==1.24.3
matplotlib==3.7.1
requests==2.28.2
beautifulsoup4==4.12.2
pillow==9.5.0
"""
print(requirements)

print("\nVirtual environments (venv):")
print("python -m venv myenv          # Create virtual environment")
print("source myenv/bin/activate    # Activate (Linux/Mac)")
print("myenv\\Scripts\\activate     # Activate (Windows)")
print("deactivate                   # Deactivate")

# Example 9: Popular Python libraries by category
print("\n\n9. POPULAR PYTHON LIBRARIES")
print("-" * 30)

libraries_by_category = {
    "Data Science & Analysis": [
        "pandas", "numpy", "scipy", "statsmodels"
    ],
    "Data Visualization": [
        "matplotlib", "seaborn", "plotly", "bokeh"
    ],
    "Machine Learning": [
        "scikit-learn", "tensorflow", "pytorch", "keras"
    ],
    "Web Development": [
        "django", "flask", "fastapi", "requests"
    ],
    "Web Scraping": [
        "beautifulsoup4", "scrapy", "selenium"
    ],
    "Database": [
        "sqlalchemy", "psycopg2", "pymongo"
    ],
    "Testing": [
        "pytest", "unittest", "nose"  # unittest ships with Python; nose is no longer maintained
    ],
    "Automation": [
        "selenium", "pyautogui", "schedule"
    ]
}

for category, libs in libraries_by_category.items():
    print(f"\n{category}:")
    for lib in libs:
        print(f"  • {lib}")

# Example 10: Creating your own library
print("\n\n10. CREATING YOUR OWN LIBRARY")
print("-" * 30)

print("Structure of a Python library:")
print("""
my_library/
    README.md
    setup.py
    my_library/
        __init__.py
        module1.py
        module2.py
    tests/
        test_module1.py
    examples/
        example_usage.py
""")

print("\nExample setup.py:")
setup_py = """
from setuptools import setup, find_packages

setup(
    name="my_library",
    version="0.1.0",
    author="Your Name",
    author_email="your.email@example.com",
    description="A short description of your library",
    packages=find_packages(),
    install_requires=[
        'requests>=2.25.0',
        'pandas>=1.3.0',
    ],
    python_requires='>=3.7',
)
"""
print(setup_py)

print("\nPublishing to PyPI:")
print("1. Create accounts on test.pypi.org and pypi.org")
print("2. Build package: python -m build   (install the tool first: pip install build)")
print("3. Upload: twine upload dist/*")

print("\nUsing your library:")
print("pip install my_library")
print("# Then in Python:")
print("import my_library")
print("from my_library import module1")
