# Working with External Libraries

External libraries are like power tools in a workshop. Python comes with basic tools (built-in modules), but for specialized tasks, you bring in power tools like a drill (Pandas for data), a saw (Matplotlib for charts), or a sander (NumPy for math).

One of Python's greatest strengths is its vast ecosystem of **external libraries** – pre‑written code that you can install and use to solve complex problems without reinventing the wheel. While Python's standard library provides many useful modules (like `math`, `random`, `datetime`), external libraries extend Python into domains like data science, web development, machine learning, image processing, and more.

## Installing External Libraries with `pip`

`pip` is Python's package installer. It downloads and installs libraries from the Python Package Index (PyPI).

**Basic commands:**
```bash
pip install package_name # install latest version
pip install package_name==1.0.0 # install specific version
pip install -r requirements.txt # install from a requirements file
pip uninstall package_name # remove a package
pip list # list installed packages
pip show package_name # show package details
```
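After installing a package, you can also confirm its version from inside Python. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` and the package names in the loop are illustrative assumptions, not part of pip.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Hypothetical package names, chosen only for illustration
for name in ["requests", "pandas", "surely-not-a-real-package-xyz"]:
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

The result depends on which environment is active, so the same script may report different versions in different projects.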

## Virtual Environments

A **virtual environment** is an isolated Python environment that allows you to manage dependencies for different projects separately. This prevents version conflicts (e.g., Project A needs Django 2.0, Project B needs Django 3.0).

```bash
python -m venv myenv # create virtual environment
source myenv/bin/activate # activate on Mac/Linux
myenv\Scripts\activate # activate on Windows
deactivate # exit the environment
```
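To check whether a script is running inside a virtual environment, one common approach is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch for environments created with `venv`; the function name `in_virtual_environment` is a hypothetical helper.

```python
import sys


def in_virtual_environment():
    """True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

Running this before and after `activate` should show the interpreter path switching to the one inside `myenv`.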

## Popular External Libraries by Category

### Data Science & Analysis
- **NumPy** – fundamental package for numerical computing (arrays, linear algebra, random numbers).
- **Pandas** – data manipulation and analysis (DataFrame, reading CSV/Excel, grouping, merging).
- **SciPy** – scientific computing (optimisation, integration, statistics).
- **StatsModels** – statistical models (regression, time series).

### Data Visualization
- **Matplotlib** – basic plotting (line, bar, scatter, histogram).
- **Seaborn** – statistical visualisations built on Matplotlib (heatmaps, pair plots).
- **Plotly** – interactive web‑based charts.
- **Bokeh** – interactive visualisations for browsers.

### Machine Learning & AI
- **Scikit‑learn** – classic ML (classification, regression, clustering, preprocessing).
- **TensorFlow / Keras** – deep learning.
- **PyTorch** – deep learning with dynamic computation graphs.

### Web Development
- **Django** – full‑stack web framework (batteries‑included).
- **Flask** – lightweight micro‑framework.
- **FastAPI** – modern, fast API framework.
- **Requests** – HTTP client (already covered in Lesson 22).

### Web Scraping
- **BeautifulSoup4** – HTML/XML parsing (Lesson 21).
- **Scrapy** – full‑featured scraping framework.
- **Selenium** – browser automation for dynamic content.

### Image & Video Processing
- **Pillow** (the maintained fork of PIL) – image manipulation (resize, crop, filter, convert).
- **OpenCV** – computer vision (face detection, object tracking).

### Database
- **SQLAlchemy** – ORM (Object‑Relational Mapping) for working with databases.
- **Psycopg2** – PostgreSQL adapter.
- **PyMongo** – MongoDB driver.

### Testing
- **pytest** – simple, powerful testing framework.
- **unittest** – built‑in testing module (standard library).

### Automation
- **Selenium** – browser automation.
- **PyAutoGUI** – control mouse and keyboard.
- **Schedule** – job scheduling.

## How to Choose the Right Library

1. **Read the documentation** – official docs usually have examples.
2. **Check activity** – look at GitHub stars, last commit date, number of contributors.
3. **Look for tutorials** – a library with many tutorials and answered Stack Overflow questions is usually easier to learn.
4. **Consider performance** – for large datasets, prefer libraries backed by compiled code, such as NumPy, over pure‑Python loops.
5. **Check dependencies** – avoid libraries that pull in many heavy dependencies.
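The dependency check in step 5 can be sketched with `importlib.metadata.requires`, which reads the requirements a package declares. It only works for packages already installed in the current environment. The helper name `declared_dependencies` and the package names are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(package_name):
    """Return the requirement strings a package declares, or None if not installed."""
    try:
        # requires() returns None when the package declares no dependencies
        return requires(package_name) or []
    except PackageNotFoundError:
        return None


# Hypothetical package names, chosen only for illustration
for name in ["requests", "pandas"]:
    deps = declared_dependencies(name)
    if deps is None:
        print(f"{name}: not installed in this environment")
    else:
        print(f"{name}: {len(deps)} declared requirement(s)")
        for dep in deps:
            print("   ", dep)
```

Entries that carry an `extra == ...` marker are optional extras, so the count is an upper bound on what a plain `pip install` pulls in.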

## Creating Your Own Library

You can package your own Python code into a library and share it. The basic structure:
```
my_library/
    README.md
    setup.py
    my_library/
        __init__.py
        module1.py
    tests/
```

A simple `setup.py`:
```python
from setuptools import setup, find_packages

setup(
    name='my_library',
    version='0.1.0',
    packages=find_packages(),
    install_requires=['requests>=2.25.0'],
    python_requires='>=3.7',
)
```
Then build the distribution files and upload them to PyPI using `twine`.

## Best Practices

- Always use virtual environments for project‑specific dependencies.
- Pin versions in `requirements.txt` (e.g., `pandas==1.5.3`) to ensure reproducibility.
- Read the license of any library you use – some are not free for commercial use.
- Keep your libraries updated, but test compatibility before upgrading.
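As a rough illustration of the pinning advice, the sketch below scans requirements text and reports entries that are not pinned with `==`. It is a simplified check under stated assumptions: it ignores comments and pip option lines, and it does not implement the full requirements file format. The function name `unpinned_requirements` and the sample entries are hypothetical.

```python
def unpinned_requirements(requirements_text):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):      # skip blanks and pip options like -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Sample requirements text, invented for this example
sample = """
# Project dependencies
pandas==1.5.3
numpy>=1.24
requests
matplotlib==3.7.1  # plotting
"""

print(unpinned_requirements(sample))  # → ['numpy>=1.24', 'requests']
```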

## Common Mistakes

- Installing packages globally without a virtual environment – leads to version conflicts.
- Forgetting to activate the virtual environment before installing or running code.
- Hard‑coding paths that rely on specific library versions.
- Using `pip freeze > requirements.txt` in a global environment – includes unrelated packages.
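A forgotten `activate` usually shows up as an import error. One possible guard, sketched here with the standard library's `importlib.util.find_spec`, is to check that the modules a script needs are importable before doing any work. The helper name `missing_modules` and the module list are assumptions for illustration.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Hypothetical list of top-level modules a script depends on
required = ["requests", "pandas", "json"]  # json is in the standard library
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
    print("Is the right virtual environment activated?")
else:
    print("All required modules are importable.")
```

Note that the import name can differ from the name used with `pip install`: `beautifulsoup4` is imported as `bs4`, and Pillow as `PIL`.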

## Practice Exercises

1. Install `requests` in a new virtual environment and use it to fetch data from `https://api.github.com/users/octocat`.
2. Use `pandas` to read a CSV file (create one), calculate the mean of a column, and save the result to a new CSV.
3. Use `matplotlib` to plot a simple line graph of `x = [1,2,3,4,5]` and `y = [2,4,6,8,10]`.
4. Install `BeautifulSoup4` and parse an HTML string to extract all `<a>` links.
5. Write a small `setup.py` for a custom module and install it in editable mode (`pip install -e .`).

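The script below walks through **10 examples** covering Pandas, NumPy, Matplotlib, Requests, BeautifulSoup, Pillow, a data analysis pipeline, pip commands, library categories, and creating your own library. So that it runs without installing anything, the library examples use small mock classes that imitate each library's interface rather than the real packages.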

```python
# Working with External Libraries
from datetime import datetime  # used for the report timestamp in Example 7

print("WORKING WITH EXTERNAL LIBRARIES")
print("=" * 60)

# Note: For actual use, install libraries first:
# pip install pandas numpy matplotlib seaborn

# Example 1: Introduction to Pandas (Data Analysis)
print("\n1. PANDAS - DATA ANALYSIS LIBRARY")
print("-" * 30)
print("Pandas is like Excel for Python - it handles tables of data.")
print("Main structures: Series (1D) and DataFrame (2D table)")


# Simulating Pandas functionality for demonstration
class MockDataFrame:
    """Mock DataFrame to demonstrate concepts without actual pandas"""

    def __init__(self, data, columns=None):
        self.data = data
        self.columns = columns or [f"col{i}" for i in range(len(data[0]))]

    def head(self, n=5):
        """Show first n rows"""
        return self.data[:n]

    def describe(self):
        """Show statistics"""
        stats = {
            "count": len(self.data),
            "unique": len(set(row[0] for row in self.data)),
            "example": self.data[0][0],
        }
        return stats

    def filter_by(self, column, value):
        """Filter rows"""
        col_index = self.columns.index(column)
        return [row for row in self.data if row[col_index] == value]


# Sample data
student_data = [
    ["Alice", "Math", 85, "A"],
    ["Bob", "Science", 92, "A"],
    ["Charlie", "Math", 78, "B"],
    ["Diana", "Science", 88, "B"],
    ["Eve", "Math", 95, "A"],
    ["Frank", "Science", 82, "B"],
]

print("\nSample student data:")
df = MockDataFrame(student_data, ["Name", "Subject", "Score", "Grade"])
print("First 3 rows:")
for row in df.head(3):
    print(f"  {row}")

print(f"\nDataFrame stats: {df.describe()}")

print("\nFilter Math students:")
math_students = df.filter_by("Subject", "Math")
for student in math_students:
    print(f"  {student}")

# Example 2: NumPy - Numerical Computing
print("\n\n2. NUMPY - NUMERICAL COMPUTING")
print("-" * 30)
print("NumPy handles arrays and mathematical operations efficiently.")
print("Key features: Multi-dimensional arrays, mathematical functions, linear algebra")


# Simulating NumPy concepts
class MockArray:
    """Mock NumPy array"""

    def __init__(self, data):
        self.data = data

    def shape(self):
        # In real NumPy, shape is an attribute (arr.shape), not a method
        if isinstance(self.data[0], list):
            return (len(self.data), len(self.data[0]))
        return (len(self.data),)

    def mean(self):
        # Only meaningful for a flat (1D) list of numbers
        return sum(self.data) / len(self.data) if self.data else 0

    def reshape(self, new_shape):
        """Reshape array"""
        flat_data = []
        for item in self.data:
            if isinstance(item, list):
                flat_data.extend(item)
            else:
                flat_data.append(item)
        # Simple reshape for 1D to 2D
        rows, cols = new_shape
        result = []
        for i in range(rows):
            row = flat_data[i * cols:(i + 1) * cols]
            result.append(row)
        return MockArray(result)


# Demonstrate array operations
print("\nArray Operations:")
arr = MockArray([1, 2, 3, 4, 5, 6])
print(f"Array: {arr.data}")
print(f"Shape: {arr.shape()}")
print(f"Mean: {arr.mean():.2f}")

reshaped = arr.reshape((2, 3))
print(f"\nReshaped to 2x3: {reshaped.data}")
print(f"New shape: {reshaped.shape()}")

# Example 3: Matplotlib - Data Visualization
print("\n\n3. MATPLOTLIB - DATA VISUALIZATION")
print("-" * 30)
print("Matplotlib creates charts and graphs from data.")
print("Common plot types: Line, Bar, Scatter, Histogram, Pie")


# Simulating plot creation
class MockPlot:
    """Mock matplotlib plot"""

    @staticmethod
    def line_plot(x_data, y_data, title):
        print(f"\nLine Plot: {title}")
        print("X-axis:", x_data)
        print("Y-axis:", y_data)
        print("Creating line chart...")
        # In real matplotlib: plt.plot(x_data, y_data); plt.title(title); plt.show()

    @staticmethod
    def bar_chart(labels, values, title):
        print(f"\nBar Chart: {title}")
        for label, value in zip(labels, values):
            bar = "█" * (value // 2)  # Simple bar representation
            print(f"{label:10} {value:3} {bar}")

    @staticmethod
    def scatter_plot(x_data, y_data, title):
        print(f"\nScatter Plot: {title}")
        print("  X   Y Plot")
        for x, y in zip(x_data, y_data):
            position = int((x + y) / 2)
            plot_line = " " * position + "•"
            print(f"{x:3} {y:3} {plot_line}")


# Create sample plots
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [150, 200, 175, 225, 300]
temperatures = [5, 7, 12, 15, 18, 22, 25, 23, 19, 14, 9, 6]

MockPlot.line_plot(months, sales, "Monthly Sales")
MockPlot.bar_chart(months, sales, "Monthly Sales Comparison")

# Scatter plot example
study_hours = [2, 3, 4, 5, 6, 7, 8]
exam_scores = [65, 70, 75, 80, 85, 90, 95]
MockPlot.scatter_plot(study_hours, exam_scores, "Study Hours vs Exam Scores")

# Example 4: Requests - HTTP Library (revisited with more depth)
print("\n\n4. REQUESTS - HTTP LIBRARY")
print("-" * 30)
print("Requests simplifies making HTTP requests.")
print("Common uses: API calls, web scraping, downloading files")


# Simulating requests functionality
class MockResponse:
    """Mock response holding a status code and body text"""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text


class MockRequests:
    """Mock requests module"""

    @staticmethod
    def get(url, params=None, headers=None):
        print(f"GET request to: {url}")
        if params:
            print(f"Parameters: {params}")
        if headers:
            print(f"Headers: {headers}")
        # Return mock response
        return MockResponse(200, '{"data": "example response"}')

    @staticmethod
    def post(url, data=None, json=None, headers=None):
        print(f"POST request to: {url}")
        if json:
            print(f"JSON data: {json}")
        # Return mock response
        return MockResponse(201, '{"id": 123, "status": "created"}')


print("\nMaking API requests:")

# Mock API calls
response = MockRequests.get(
    "https://api.example.com/data",
    params={"page": 1, "limit": 10},
    headers={"Authorization": "Bearer token123"},
)
print(f"Response status: {response.status_code}")

response = MockRequests.post(
    "https://api.example.com/users",
    json={"name": "John", "email": "john@example.com"},
)
print(f"Response status: {response.status_code}")

# Example 5: BeautifulSoup - HTML Parsing (revisited)
print("\n\n5. BEAUTIFULSOUP - HTML PARSING")
print("-" * 30)
print("BeautifulSoup parses HTML and XML documents.")
print("Common uses: Web scraping, data extraction from web pages")


# Simulating BeautifulSoup
class MockBeautifulSoup:
    """Mock BeautifulSoup"""

    def __init__(self, html, parser):
        self.html = html

    def find(self, tag, attrs=None):
        print(f"Finding <{tag}> with attributes: {attrs}")
        return MockTag(f"<{tag}>Sample content</{tag}>")

    def find_all(self, tag, attrs=None):
        print(f"Finding all <{tag}> elements")
        return [
            MockTag(f"<{tag}>Item 1</{tag}>"),
            MockTag(f"<{tag}>Item 2</{tag}>"),
            MockTag(f"<{tag}>Item 3</{tag}>"),
        ]


class MockTag:
    def __init__(self, content):
        self.content = content

    def text(self):
        # In real BeautifulSoup, text is a property (tag.text), not a method
        return self.content.split(">")[1].split("<")[0]

    def get(self, attr):
        return f"value_of_{attr}"


print("\nParsing HTML:")
html_content = "<html><body><h1>Title</h1><p>Content</p></body></html>"
soup = MockBeautifulSoup(html_content, "html.parser")

# Find elements
title = soup.find("h1")
print(f"Found title: {title.text()}")

all_paragraphs = soup.find_all("p")
print(f"Found {len(all_paragraphs)} paragraphs")

# Example 6: Pillow - Image Processing
print("\n\n6. PILLOW - IMAGE PROCESSING")
print("-" * 30)
print("Pillow (PIL) handles image manipulation.")
print("Common uses: Resizing, cropping, filtering, converting formats")


# Simulating Pillow functionality
class MockImage:
    """Mock PIL Image"""

    def __init__(self, filename):
        self.filename = filename
        self.size = (800, 600)  # Default size
        self.format = "JPEG"

    def resize(self, new_size):
        print(f"Resizing image from {self.size} to {new_size}")
        self.size = new_size
        return self

    def crop(self, box):
        print(f"Cropping image with box: {box}")
        return self

    def save(self, new_filename, format=None):
        format = format or self.format
        print(f"Saving image as {new_filename} ({format})")

    @staticmethod
    def open(filename):
        print(f"Opening image: {filename}")
        return MockImage(filename)


print("\nImage processing example:")
img = MockImage.open("photo.jpg")
print(f"Original size: {img.size}")

# Resize image
img.resize((400, 300))
print(f"New size: {img.size}")

# Crop image
img.crop((100, 100, 300, 200))

# Save image
img.save("photo_resized.jpg", "JPEG")

# Example 7: Real-world project using multiple libraries
print("\n\n7. REAL-WORLD PROJECT: DATA ANALYSIS PIPELINE")
print("-" * 30)
print("Simulating a data analysis project with multiple libraries:")
print("1. Pandas for data manipulation")
print("2. NumPy for numerical operations")
print("3. Matplotlib for visualization")
print("4. Requests for data fetching")


class DataAnalysisPipeline:
    """Mock data analysis pipeline"""

    def __init__(self):
        self.data = None

    def fetch_data(self, source):
        """Fetch data from source"""
        print(f"Fetching data from {source}...")
        # Simulate fetched data
        self.data = [
            {"Date": "2024-01-01", "Sales": 150, "Visitors": 200},
            {"Date": "2024-01-02", "Sales": 180, "Visitors": 220},
            {"Date": "2024-01-03", "Sales": 200, "Visitors": 250},
            {"Date": "2024-01-04", "Sales": 175, "Visitors": 210},
            {"Date": "2024-01-05", "Sales": 220, "Visitors": 280},
        ]
        print(f"Fetched {len(self.data)} records")

    def analyze_data(self):
        """Analyze the data"""
        if not self.data:
            print("No data to analyze")
            return

        print("\nData Analysis:")
        print("=" * 40)

        # Calculate statistics
        total_sales = sum(item["Sales"] for item in self.data)
        total_visitors = sum(item["Visitors"] for item in self.data)
        avg_sales = total_sales / len(self.data)
        conversion_rate = (total_sales / total_visitors) * 100

        print(f"Total Sales: ${total_sales}")
        print(f"Total Visitors: {total_visitors}")
        print(f"Average Daily Sales: ${avg_sales:.2f}")
        print(f"Conversion Rate: {conversion_rate:.1f}%")

        # Find best day
        best_day = max(self.data, key=lambda x: x["Sales"])
        print(f"\nBest Day: {best_day['Date']}")
        print(f"Sales: ${best_day['Sales']}, Visitors: {best_day['Visitors']}")

    def visualize_data(self):
        """Create visualizations"""
        if not self.data:
            print("No data to visualize")
            return

        print("\nData Visualization:")
        print("=" * 40)

        # Extract data for plotting
        dates = [item["Date"][-2:] for item in self.data]  # Just day numbers
        sales = [item["Sales"] for item in self.data]
        visitors = [item["Visitors"] for item in self.data]

        # Create simple text-based charts
        print("\nSales Trend:")
        max_sales = max(sales)
        for date, sale in zip(dates, sales):
            bar_length = int((sale / max_sales) * 20)
            bar = "█" * bar_length
            print(f"Day {date}: ${sale:3} {bar}")

        print("\nVisitors Trend:")
        max_visitors = max(visitors)
        for date, visitor in zip(dates, visitors):
            bar_length = int((visitor / max_visitors) * 20)
            bar = "█" * bar_length
            print(f"Day {date}: {visitor:3} visitors {bar}")

    def generate_report(self):
        """Generate analysis report"""
        print("\nGenerating Report:")
        print("=" * 40)

        report = f"""
DATA ANALYSIS REPORT
=====================
Period: January 1-5, 2024

Summary:
- Total Sales: ${sum(item['Sales'] for item in self.data)}
- Total Visitors: {sum(item['Visitors'] for item in self.data)}
- Average Conversion Rate: {(sum(item['Sales'] for item in self.data) / sum(item['Visitors'] for item in self.data)) * 100:.1f}%

Recommendations:
1. Focus marketing on high-conversion days
2. Analyze visitor demographics for better targeting
3. Consider promotions during low-traffic periods

Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
"""
        print(report)

        # Save report to file
        with open("analysis_report.txt", "w") as f:
            f.write(report)
        print("Report saved to analysis_report.txt")


# Run the pipeline
print("\nRunning Data Analysis Pipeline:")
pipeline = DataAnalysisPipeline()
pipeline.fetch_data("sales_database")
pipeline.analyze_data()
pipeline.visualize_data()
pipeline.generate_report()

# Example 8: Installing and managing libraries
print("\n\n8. INSTALLING AND MANAGING LIBRARIES")
print("-" * 30)
print("Using pip (Python package installer):")

print("\nBasic commands:")
print("pip install package_name         # Install a package")
print("pip install package_name==1.0.0  # Install specific version")
print("pip install -r requirements.txt  # Install from requirements file")
print("pip uninstall package_name       # Remove a package")
print("pip list                         # Show installed packages")
print("pip show package_name            # Show package info")

print("\nExample requirements.txt file:")
requirements = """
# Project dependencies
pandas==1.5.3
numpy==1.24.3
matplotlib==3.7.1
requests==2.28.2
beautifulsoup4==4.12.2
pillow==9.5.0
"""
print(requirements)

print("\nVirtual environments (venv):")
print("python -m venv myenv       # Create virtual environment")
print("source myenv/bin/activate  # Activate (Linux/Mac)")
print("myenv\\Scripts\\activate     # Activate (Windows)")
print("deactivate                 # Deactivate")

# Example 9: Popular Python libraries by category
print("\n\n9. POPULAR PYTHON LIBRARIES")
print("-" * 30)

libraries_by_category = {
    "Data Science & Analysis": ["pandas", "numpy", "scipy", "statsmodels"],
    "Data Visualization": ["matplotlib", "seaborn", "plotly", "bokeh"],
    "Machine Learning": ["scikit-learn", "tensorflow", "pytorch", "keras"],
    "Web Development": ["django", "flask", "fastapi", "requests"],
    "Web Scraping": ["beautifulsoup4", "scrapy", "selenium"],
    "Database": ["sqlalchemy", "psycopg2", "pymongo"],
    "Testing": ["pytest", "unittest", "nose"],
    "Automation": ["selenium", "pyautogui", "schedule"],
}

for category, libs in libraries_by_category.items():
    print(f"\n{category}:")
    for lib in libs:
        print(f"  • {lib}")

# Example 10: Creating your own library
print("\n\n10. CREATING YOUR OWN LIBRARY")
print("-" * 30)
print("Structure of a Python library:")
print("""
my_library/
    README.md
    setup.py
    my_library/
        __init__.py
        module1.py
        module2.py
    tests/
        test_module1.py
    examples/
        example_usage.py
""")

print("\nExample setup.py:")
setup_py = """
from setuptools import setup, find_packages

setup(
    name="my_library",
    version="0.1.0",
    author="Your Name",
    author_email="your.email@example.com",
    description="A short description of your library",
    packages=find_packages(),
    install_requires=[
        'requests>=2.25.0',
        'pandas>=1.3.0',
    ],
    python_requires='>=3.7',
)
"""
print(setup_py)

print("\nPublishing to PyPI:")
print("1. Create accounts on test.pypi.org and pypi.org")
print("2. Build package: python -m build  (requires: pip install build)")
print("3. Upload: twine upload dist/*")

print("\nUsing your library:")
print("pip install my_library")
print("# Then in Python:")
print("import my_library")
print("from my_library import module1")
```