Programming Discussions

Tuesday, February 18, 2025

Python - list vs pandas.series

pandas.Series and Python's list both allow you to store a collection of data, but they have significant differences in terms of functionality, performance, and ease of use. Here's a comparison of the two:

1. Indexing and Labels

pandas.Series: Each element in a Series has an associated index (which can be labeled), making it easier to access data using meaningful labels rather than just integer-based indices.
- Example:
```
import pandas as pd
series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(series['a'])  # Output: 10
```
list: A list is indexed by integers, starting from 0. There are no custom labels, only numerical indices.
- Example:
```
my_list = [10, 20, 30]
print(my_list[0])  # Output: 10
```

2. Data Types

pandas.Series: A Series can hold data of any type (integers, floats, strings, etc.), but it is optimized for homogeneous data: each Series has a single dtype, and mixing types falls back to the slower, generic object dtype. A Series can also represent missing data (e.g., NaN).
- Series can be numeric, boolean, string, and more.
list: A Python list can also hold any mix of types, but it has no built-in handling of missing values and none of the data operations a Series provides.
- Lists do not naturally handle NaN or missing values.
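A quick way to see the dtype behaviour is to inspect `Series.dtype`. The sample values below are arbitrary, and the exact integer width (e.g. int64) can vary by platform, so no dtype names are shown as expected output.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])              # homogeneous integers -> an integer dtype
with_missing = pd.Series([1, None, 3])   # None is stored as NaN, so the dtype becomes float
mixed = pd.Series([1, "a", 2.5])         # mixed types fall back to the generic object dtype

print(ints.dtype, with_missing.dtype, mixed.dtype)
print(with_missing.isna().tolist())      # [False, True, False]

# A list keeps each element's own Python type and stores None as-is
my_list = [1, None, 3]
print([type(x).__name__ for x in my_list])  # ['int', 'NoneType', 'int']
```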

3. Vectorized Operations

pandas.Series: Supports vectorized operations (i.e., operations applied to every element without the need for explicit loops), which allows you to perform arithmetic operations and transformations on the entire series at once.
- Example:
```
import pandas as pd
series = pd.Series([1, 2, 3])
print((series * 2).tolist())  # Output: [2, 4, 6]
```
list: Does not support vectorized operations. You would have to use a loop or list comprehension to perform element-wise operations.
- Example:
```
my_list = [1, 2, 3]
result = [x * 2 for x in my_list]  # Output: [2, 4, 6]
```

4. Performance

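pandas.Series: Series are implemented using NumPy arrays under the hood, so vectorized operations run in compiled code instead of a Python-level loop. This makes them well suited to large-scale data manipulation. For very small collections or single-element access, however, a Series carries more overhead than a list.
list: Slower for element-wise work on large datasets, because each operation runs as Python bytecode, one element at a time.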
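To put rough numbers on this, here is a small timing sketch using the standard-library timeit module. The data size and repeat count are arbitrary choices, and absolute timings depend on your machine and pandas/NumPy versions, so no output is shown. At this size the Series version is typically much faster, while for a handful of elements the list can come out ahead because of pandas' per-call overhead.

```python
import timeit
import pandas as pd

n = 1_000_000
data = list(range(n))
series = pd.Series(data)

# Element-wise doubling: Python-level loop vs. vectorized NumPy-backed operation
list_time = timeit.timeit(lambda: [x * 2 for x in data], number=5)
series_time = timeit.timeit(lambda: series * 2, number=5)

print(f"list comprehension: {list_time:.3f}s")
print(f"Series * 2:         {series_time:.3f}s")
```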

5. Missing Data Handling

pandas.Series: Supports missing data using NaN (Not a Number), and provides methods to handle missing data (e.g., isnull(), fillna()).
- Example:
```
series = pd.Series([1, None, 3])
print(series.isnull().tolist())  # Output: [False, True, False]
```
list: Does not have built-in support for missing values. You would have to use None or other custom indicators and manually handle missing data.
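The difference shows up as soon as you try to fill or drop missing values. A minimal sketch with arbitrary sample values, comparing the Series methods with the manual list equivalents:

```python
import pandas as pd

values = [1, None, 3]
s = pd.Series(values)  # None is stored as NaN; the dtype becomes float64

print(s.fillna(0).tolist())   # [1.0, 0.0, 3.0]
print(s.dropna().tolist())    # [1.0, 3.0]
print(s.sum())                # 4.0 (NaN is skipped by default)

# The list equivalents must check for None explicitly
print([0 if v is None else v for v in values])   # [1, 0, 3]
print([v for v in values if v is not None])      # [1, 3]
print(sum(v for v in values if v is not None))   # 4
# sum(values) would raise TypeError because int + None is not defined
```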

6. Aggregations and Functions

pandas.Series: Provides built-in methods for aggregation and statistical functions like sum(), mean(), std(), min(), max(), etc.
- Example:
```
series = pd.Series([1, 2, 3])
print(series.mean())  # Output: 2.0
```
list: A list has no aggregation methods of its own. Instead, you use Python's built-in functions (sum(), min(), max()), the standard-library statistics module (e.g., statistics.mean()), or your own functions.
- Example:
```
my_list = [1, 2, 3]
print(sum(my_list))  # Output: 6
```
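For statistics beyond sum/min/max, a list can lean on the standard-library statistics module. A short side-by-side sketch (sample numbers are arbitrary; statistics.fmean needs Python 3.8+). Both statistics.stdev and Series.std use the sample formula (ddof=1), so they agree:

```python
import statistics
import pandas as pd

numbers = [1, 2, 3, 4]
series = pd.Series(numbers)

# List: built-ins plus the statistics module
print(sum(numbers), min(numbers), max(numbers))  # 10 1 4
print(statistics.fmean(numbers))                 # 2.5
print(statistics.stdev(numbers))                 # sample standard deviation

# Series: the same statistics as methods
print(series.sum(), series.min(), series.max())  # 10 1 4
print(series.mean())                             # 2.5
print(series.std())                              # sample standard deviation (ddof=1)

print(series.describe())  # count, mean, std, min, quartiles and max in one call
```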

7. Alignment and Handling of Different Lengths

pandas.Series: Arithmetic between two Series aligns values by index label, not by position, even when the indices differ. Labels that appear in only one of the two Series get NaN in the result.
- Example:
```
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5], index=['b', 'c'])
print(s1 + s2)  # Output: a NaN, b 6.0, c 8.0
```
list: Does not have automatic alignment; you would need to manually ensure that lists are of equal length when performing element-wise operations.

8. Integration with DataFrames

pandas.Series: Commonly used as a column in a pandas.DataFrame. A DataFrame is essentially a collection of Series.
- Example:
```
df = pd.DataFrame({'col1': [1, 2, 3]})
print(df['col1'])  # Output: Series with index 0, 1, 2
```
list: A list has no index or DataFrame-specific behavior of its own, but pandas converts lists readily: you can pass them to pd.DataFrame() or assign one as a new column, provided its length matches the number of rows.
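A small sketch of how lists and Series behave when used as DataFrame columns. The column names are illustrative; note that a list must match the row count, while a Series is aligned on the index:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3]})

# Assigning a plain list creates a new column; pandas converts it to a Series
df['col2'] = [10, 20, 30]
print(type(df['col2']).__name__)  # Series

# A list whose length differs from the number of rows is rejected
try:
    df['col3'] = [1, 2]
except ValueError as exc:
    print("ValueError:", exc)

# A Series is aligned on the index instead; labels it lacks become NaN
df['col3'] = pd.Series([7, 8], index=[0, 2])
print(df['col3'].tolist())  # [7.0, nan, 8.0]

# Going back to a list
print(df['col1'].tolist())  # [1, 2, 3]
```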

9. Operations with NumPy

pandas.Series: Because it is built on top of NumPy, Series can directly interact with NumPy functions and arrays. You can use NumPy operations on Series for efficient computations.
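list: Most NumPy functions accept lists directly and convert them to arrays internally (e.g., np.sqrt([1, 4, 9])), but the result is a NumPy array rather than a list, and the conversion is repeated on every call. List operators are also not element-wise (my_list * 2 repeats the list), so for repeated numeric work it is better to convert once with np.array().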
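Here is what that looks like with a NumPy ufunc (np.sqrt) and the * operator; the sample values are arbitrary:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9], index=['a', 'b', 'c'])
lst = [1, 4, 9]

# A NumPy ufunc on a Series returns a Series and keeps the labels
root_s = np.sqrt(s)
print(type(root_s).__name__, root_s.index.tolist())  # Series ['a', 'b', 'c']

# The same ufunc on a list converts it to an ndarray first
root_l = np.sqrt(lst)
print(type(root_l).__name__)  # ndarray

# List operators are not element-wise: * repeats the list
print(lst * 2)            # [1, 4, 9, 1, 4, 9]
print(np.array(lst) * 2)  # [ 2  8 18]
```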

Summary Table

| Feature | `pandas.Series` | `list` |
|---|---|---|
| Indexing | Labeled indices (customizable) | Integer-based indices |
| Data Types | Any type; optimized for a single dtype, supports `NaN` | Any mix of types; no built-in handling for missing data |
| Vectorized Operations | Yes (fast element-wise operations) | No (requires loops or list comprehension) |
| Performance | Optimized for large data, fast | Slower for large datasets |
| Missing Data | Built-in support for `NaN` | No built-in support (use `None`) |
| Aggregation Functions | Built-in methods (sum, mean, std, etc.) | Built-in functions (`sum`, `min`, `max`), `statistics` module, or manual code |
| Data Alignment | Automatic alignment by index label | No automatic alignment |
| Integration with DataFrame | Core component of `DataFrame` | Can be converted to a DataFrame or column |

Conclusion:

pandas.Series is ideal for structured, labeled data and is a more powerful and efficient tool for data analysis, manipulation, and aggregation.
list is a general-purpose Python container, useful for simple collections of data, but lacks the advanced capabilities of pandas.Series.

Tuesday, February 11, 2025

PyTest Introduction

Pytest is a popular Python testing framework known for its simplicity and powerful features. With it you can:

Write and run unit, integration, or functional tests.
Use fixtures to manage setup and teardown.
Parametrize tests to run the same test with multiple inputs.
Filter, skip, or mark tests for finer control.
Generate test reports with plugins (such as pytest-html).
Work alongside other Python libraries and tools (for example, it can also run unittest-style tests).
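Fixtures, parametrization, and skip markers can be sketched in one small test module. The file name test_features.py and the test contents are hypothetical; save the module under a name that matches the discovery rules and run pytest -v.

```python
# test_features.py (hypothetical file name); run with: pytest -v test_features.py
import pytest


@pytest.fixture
def sample_numbers():
    # Setup: runs before each test that requests this fixture
    numbers = [1, 2, 3]
    yield numbers
    # Teardown: code after the yield runs once the test finishes
    numbers.clear()


def test_total(sample_numbers):
    # pytest passes the fixture's yielded value in by matching the argument name
    assert sum(sample_numbers) == 6


@pytest.mark.parametrize("a, b, expected", [(1, 2, 3), (0, 0, 0), (-1, 1, 0)])
def test_add(a, b, expected):
    # Collected as three separate tests, one per tuple
    assert a + b == expected


@pytest.mark.skip(reason="example of skipping a test")
def test_not_ready():
    assert False
```

With -v, pytest lists test_add once per parameter set and reports test_not_ready as skipped.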

How Does Pytest Know Which Functions to Test?

Pytest automatically discovers test functions based on naming conventions. When you run pytest, it looks for files, classes, and functions that match specific patterns.

Rules for Test Discovery in Pytest

File Names:
Pytest looks for files named test_*.py or *_test.py.
Example:
- test_example.py ✅
- example_test.py ✅
- example.py ❌ (Not discovered unless explicitly specified)

Function Names:
Inside those files, pytest collects functions whose names start with test (by convention, test_).
Example:

def test_addition():  # ✅ Will be discovered
    assert 1 + 1 == 2

def addition_test():  # ❌ Won't be discovered
    assert 1 + 1 == 2

Class Names:
If you want to organize tests into classes, pytest looks for classes whose names start with Test. These classes should not have an __init__ method.
Example:

class TestMath:  # ✅ Class will be discovered
    def test_addition(self):  # ✅ Function will be discovered
        assert 1 + 1 == 2

class MathTests:  # ❌ Class won't be discovered
    def test_subtraction(self):  # ❌ Function won't be discovered
        assert 2 - 1 == 1

Directories:
Pytest scans directories recursively for test files. By default, it looks in:
- The current working directory.
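- Any subdirectories, except those matching the norecursedirs setting (by default hidden directories starting with ., plus names such as build, dist, node_modules, and venv).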
Example directory structure:
```
project/
├── tests/
│   ├── test_math.py
│   ├── test_string.py
└── app/
    └── main.py
```
Running pytest from the project/ directory will automatically find tests/test_math.py and tests/test_string.py.

Examples

Discovered Test

File: test_sample.py

def test_sum():
    assert 2 + 2 == 4

Command:

$ pytest

Output:

========================================= test session starts =========================================
collected 1 item

test_sample.py .                                                                          [100%]
========================================== 1 passed in 0.01s ==========================================

Undiscovered Function

File: test_sample.py

def sum_test():  # Does not follow pytest naming convention
    assert 2 + 2 == 4

Command:

$ pytest

Output:

========================================= test session starts =========================================
collected 0 items

How to Override the Default Discovery Rules?

Run Specific Files/Functions Manually:
You can explicitly specify the file or function to test:
```
$ pytest test_example.py::test_specific_function
```

Change Naming Patterns:
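Set python_files, python_classes, and python_functions in pytest.ini (or in the [tool.pytest.ini_options] table of pyproject.toml). These settings replace the defaults, so keep test_*.py in the list if you still want those files collected. The --collect-only and --pyargs options do not change these patterns: the first only lists what would be collected, and the second treats arguments as Python package names.

# pytest.ini
[pytest]
# Match files ending with '_tests.py'
python_files = *_tests.py
# Match classes ending with 'TestCase'
python_classes = *TestCase
# Match functions starting with 'test_'
python_functions = test_*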

List Collected Tests Without Running Them:
Use pytest --collect-only to list every test that matches the discovery rules, without executing any of them:
```
$ pytest --collect-only
```

Monday, February 10, 2025

Python - Lambda, generator, closures and decorators vs functions

While lambdas, generators, closures, and decorators are technically functions, they differ from regular user-defined functions in terms of purpose, scope, and functionality. Here's how they compare to user-defined functions:

1. Lambda Functions vs. User-Defined Functions

Key Differences:

Lambda:
- A lambda is a one-liner, anonymous function.
- Used for short, simple operations.
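- Its body must be a single expression, so it cannot contain statements (such as if or for blocks, return, or assignments). Conditional expressions (a if cond else b) and comprehensions are allowed.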
- Typically used as a temporary or inline function.
User-Defined Function:
- A named function defined using the def keyword.
- Can include multiple lines of code and more complex logic.
- Supports reusability across the program.

Example:

# Lambda
add_lambda = lambda x, y: x + y
print(add_lambda(3, 5))  # Output: 8

# User-defined function
def add_function(x, y):
    return x + y

print(add_function(3, 5))  # Output: 8

When to Use?

Use lambda for quick, simple tasks (e.g., sorting, filtering).
Use user-defined functions for more complex logic or when readability is important.
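Typical "quick, simple" uses are sort keys and filters. A minimal sketch with made-up sample data:

```python
people = [("Alice", 30), ("Bob", 25), ("Carol", 35)]

# Sorting: the lambda extracts the sort key (the age) from each tuple
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Bob', 25), ('Alice', 30), ('Carol', 35)]

# Filtering: keep only the people older than 28
older = list(filter(lambda person: person[1] > 28, people))
print(older)  # [('Alice', 30), ('Carol', 35)]

# A conditional *expression* is allowed inside a lambda; an if *statement* is not
label = lambda age: "adult" if age >= 18 else "minor"
print(label(30))  # adult
```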

2. Generators vs. User-Defined Functions

Key Differences:

Generator:
- Defined like a function but uses the yield keyword instead of return.
- Produces a sequence of values lazily (one at a time), saving memory.
- Keeps track of the state between iterations.
User-Defined Function:
- Returns a single value or object and exits after execution.
- Its local variables are discarded when it returns, so it does not keep state between calls.

Example:

# Generator
def count_up_to(n):
    for i in range(1, n + 1):
        yield i

# User-defined function
def count_up_to_list(n):
    return list(range(1, n + 1))

# Using the generator
for num in count_up_to(5):
    print(num, end=" ")  # Output: 1 2 3 4 5
print()  # end the line before the next print

# Using the user-defined function
print(count_up_to_list(5))  # Output: [1, 2, 3, 4, 5]

When to Use?

Use generators for iterating over large datasets efficiently.
Use user-defined functions when you need the entire result at once.
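To see the memory difference, compare the generator object with the equivalent list. This sketch reuses count_up_to from above; n is an arbitrary size, and the exact byte counts vary by Python version, so they are not shown.

```python
import sys

def count_up_to(n):
    for i in range(1, n + 1):
        yield i

n = 1_000_000
gen = count_up_to(n)
lst = list(range(1, n + 1))

# The generator object stays small regardless of n; the list object holds a
# pointer to every element (getsizeof does not even count the int objects)
print(sys.getsizeof(gen), sys.getsizeof(lst))

# Values are produced on demand with next()
print(next(gen), next(gen))  # 1 2

# Aggregating a generator never materializes the full sequence
print(sum(count_up_to(n)))  # 500000500000
```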

3. Closures vs. User-Defined Functions

Key Differences:

Closure:
- A nested function that retains access to the variables in its enclosing scope even after the outer function has finished executing.
- Used for creating function factories or maintaining state.
User-Defined Function:
- Does not inherently retain any state beyond its local variables.
- Requires explicit state passing (e.g., arguments).

Example:

# Closure
def make_multiplier(factor):
    def multiplier(number):
        return number * factor
    return multiplier

times_two = make_multiplier(2)
print(times_two(5))  # Output: 10

# User-defined function
def multiplier(number, factor):
    return number * factor

print(multiplier(5, 2))  # Output: 10

When to Use?

Use closures for creating functions with predefined configurations or state.
Use user-defined functions when state is passed explicitly.
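Closures can also hold state that changes between calls. A minimal counter sketch (make_counter is an illustrative name); nonlocal lets the inner function rebind the enclosing variable:

```python
def make_counter():
    count = 0  # lives in the enclosing scope

    def increment():
        nonlocal count  # rebind the enclosing variable instead of creating a local
        count += 1
        return count

    return increment

counter_a = make_counter()
counter_b = make_counter()
print(counter_a(), counter_a(), counter_a())  # 1 2 3
print(counter_b())  # 1 (each closure has its own independent state)

# The captured variable is visible through the function's __closure__ cells
print(counter_a.__closure__[0].cell_contents)  # 3
```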

4. Decorators vs. User-Defined Functions

Key Differences:

Decorator:
- A higher-order function that takes another function as input and modifies or extends its behavior.
- Applied using the @decorator syntax.
- Adds functionality without modifying the original function.
User-Defined Function:
- Typically performs a specific operation.
- Does not modify other functions unless explicitly designed to do so.

Example:

# Decorator
import functools

def logger(func):
    @functools.wraps(func)  # keep func's __name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        print(f"Function {func.__name__} is called with {args}")
        return func(*args, **kwargs)
    return wrapper

@logger
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))
# Output:
# Function greet is called with ('Alice',)
# Hello, Alice!

# User-defined function
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))  # Output: Hello, Alice!

When to Use?

Use decorators to extend or modify the behavior of existing functions (e.g., logging, authentication).
Use user-defined functions for standalone operations.
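Decorators can also take their own arguments, which adds one more level of nesting. A sketch with a hypothetical repeat decorator that calls the wrapped function several times and collects the results:

```python
import functools

def repeat(times):
    # Outer level receives the decorator's own argument
    def decorator(func):
        @functools.wraps(func)  # keep func's __name__ and __doc__ on the wrapper
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator

@repeat(times=3)
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}!"

print(greet("Alice"))  # ['Hello, Alice!', 'Hello, Alice!', 'Hello, Alice!']
print(greet.__name__, greet.__doc__)  # greet Return a greeting.
```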

Summary Table:

| Feature | Purpose | When to Use? |
|---|---|---|
| Lambda | Simple, inline, anonymous functions | Short operations like sorting or filtering |
| Generator | Lazy iteration over large data | Handling large datasets or infinite sequences efficiently |
| Closure | Retain state in a nested function | Function factories or functions with pre-configured parameters |
| Decorator | Modify/extend another function's behavior | Adding functionality (e.g., logging, authentication) to existing functions |
| User-Defined Function | General-purpose reusable functions | Any operation requiring more complex logic or structure |

These features provide specialized ways to make your code more concise, efficient, or reusable in specific scenarios.

List and Tuple - Difference

The key difference between lists and tuples in Python is their mutability:

List: Ordered and mutable, meaning you can change its elements (add, remove, or modify them).
Tuple: Ordered but immutable, meaning once it's created, its elements cannot be changed.

Example: List

# Creating a list
my_list = [1, 2, 3]

# Modifying the list
my_list[0] = 10  # Update an element
my_list.append(4)  # Add a new element
my_list.remove(2)  # Remove an element

print(my_list)  # Output: [10, 3, 4]

Example: Tuple

# Creating a tuple
my_tuple = (1, 2, 3)

# Trying to modify the tuple
# my_tuple[0] = 10  # This will raise an error: TypeError: 'tuple' object does not support item assignment

# Accessing elements (this is allowed)
print(my_tuple[0])  # Output: 1

Key Characteristics:

Mutability:
- Lists allow you to modify their elements.
- Tuples do not allow modification after creation.
Performance:
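- Tuples use slightly less memory than lists, and in CPython a tuple of constants is built once at compile time, so creating one can be faster. Indexing and iteration speed are about the same.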
Usage:
- Use lists when the data needs to change (e.g., dynamically adding/removing elements).
- Use tuples for fixed collections of data (e.g., coordinates (x, y) or configuration data).
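Two practical consequences of immutability, sketched with arbitrary values: a tuple is a little smaller than the equivalent list (exact sizes depend on the CPython version, so they are not shown), and a tuple of hashable items can be a dictionary key while a list cannot.

```python
import sys

point_list = [1, 2, 3]
point_tuple = (1, 2, 3)

# A list carries extra bookkeeping (and may over-allocate room to grow)
print(sys.getsizeof(point_list), sys.getsizeof(point_tuple))

# Tuples of hashable items can be dict keys or set members
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[(3, 4)])  # 5.0

# Lists are unhashable, so using one as a key raises TypeError
try:
    {[0, 0]: 0.0}
except TypeError as exc:
    print("TypeError:", exc)
```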

Practical Difference Example:

List Use Case:

# Shopping list: It changes over time
shopping_list = ["milk", "bread"]
shopping_list.append("eggs")
print(shopping_list)  # Output: ['milk', 'bread', 'eggs']

Tuple Use Case:

# Days of the week: Fixed and unchanging
days_of_week = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
print(days_of_week)  # Output: ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')

Tuesday, January 24, 2023

AttributeError: 'DataFrame' object has no attribute 'unique'

The error message "'DataFrame' object has no attribute 'unique'" occurs when you are trying to use the unique() method on a pandas DataFrame object, but this method is only available for pandas Series objects.

#1. Applying unique to dataframe

import pandas as pd
df = pd.DataFrame([{"A":"aaa","B":"bbb"}])

print(df.unique()) # Raises AttributeError: 'DataFrame' object has no attribute 'unique'

#2. The error can also appear when the DataFrame has multiple columns with the same name, for example after a column is renamed by mistake. In that case df['A'] returns a DataFrame (all columns named 'A') instead of a Series.

import pandas as pd
df = pd.DataFrame([{"A":"aaa","B":"bbb"}])
print(df)

print(df['A'].dropna().unique()) #works fine
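df = df.rename(columns={ df.columns[1]:df.columns[0] }) # renames 'B' to 'A', so both columns are now named 'A'

print(df)

print(df['A'].dropna().unique()) # Raises AttributeError: df['A'] now returns a DataFrame, not a Series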
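A few ways to get what df.unique() was meant to do, sketched on a small made-up DataFrame: select one column as a Series, flatten all values to 1-D for pd.unique, or drop duplicate column names first (case #2).

```python
import pandas as pd

df = pd.DataFrame([{"A": "aaa", "B": "bbb"}, {"A": "aaa", "B": "ccc"}])

# Unique values of one column: select it as a Series first
print(list(df["A"].unique()))  # ['aaa']

# Unique values across every column: flatten to 1-D, then call pd.unique
print(list(pd.unique(df.to_numpy().ravel())))  # ['aaa', 'bbb', 'ccc']

# Unique rows instead of unique values
print(df.drop_duplicates())

# Case #2: find and drop duplicate column names before calling unique()
dup = df.rename(columns={"B": "A"})
print(dup.columns.duplicated().tolist())  # [False, True]
dedup = dup.loc[:, ~dup.columns.duplicated()]  # keeps the first 'A' column
print(list(dedup["A"].unique()))  # ['aaa']
```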

Hope it helps!!

Monday, September 11, 2017

working with report model

To create a shared data source

In Solution Explorer, right-click Shared Data Sources, and then select Add New Data Source.
In the Name box, type: RMS.
In the Type drop-down list, select Report Server Model.
In the Connection string area, type: Server=<>; datasource=<>.
Select Credentials.
Select Use Windows Authentication (Integrated Security) and then click OK.

The data source appears in the Shared Data Sources folder in Solution Explorer.
On the File menu, click Save All.

XML parsing: line 1, character 75, illegal name character

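When saving strings to XML, or when extracting text from within tags, it is important to escape special characters. The following table shows these characters and their escaped equivalents (< and & must always be escaped in element text; the quote entities are mainly needed inside attribute values).

| Character | Replaced With |
|---|---|
| `<` | `&lt;` |
| `>` | `&gt;` |
| `"` | `&quot;` |
| `'` | `&apos;` |
| `&` | `&amp;` |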
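Outside SQL Server, Python's standard library can apply this table for you. A sketch using xml.sax.saxutils.escape (which handles &, < and > by default; the extra mapping adds the quote entities) and xml.etree.ElementTree to show that the escaped text parses while the raw text does not. The sample string is arbitrary.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = 'a & b < "c"'

# Escape &, <, > plus both quote characters
escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
print(escaped)  # a &amp; b &lt; &quot;c&quot;

# Escaped text parses cleanly and round-trips to the original string
print(ET.fromstring(f"<p>{escaped}</p>").text)  # a & b < "c"

# The unescaped version is rejected by the parser
try:
    ET.fromstring(f"<p>{text}</p>")
except ET.ParseError as exc:
    print("ParseError:", exc)
```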

If we run the following code in a SQL query window to extract the text from within HTML tags:

declare @v varchar(40)
Set @v='<p>a & b</p>'
Select cast(@v as XML).value('.','varchar(max)')

we receive an error like "XML parsing: line 1, character xx, illegal name character", because the parser reads & as the start of an entity reference and finds a space instead of a valid name character.

Solution:

declare @v varchar(40)
Set @v='<p>a & b</p>'
Select cast(replace(replace(replace(@v,'>','><![CDATA['),'</',']]></')+']]>','<![CDATA[]]>','') as XML).value('.','varchar(max)')

A CDATA section is "a section of element content that is marked for the parser to interpret as only character data, not markup." The replace calls wrap the text between the tags in a CDATA section (and remove the empty CDATA section left after the closing tag), so the & is no longer parsed as markup and the cast succeeds.
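The same replace chain can be tried outside SQL Server. A minimal Python sketch, assuming input with a single level of tags such as <p>a & b</p> and text that never contains the sequence ]]> (which would end the CDATA section early); wrap_text_in_cdata is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def wrap_text_in_cdata(markup: str) -> str:
    # Mirrors the T-SQL replace chain: open CDATA after every '>',
    # close it before every '</', then drop the empty section at the end
    s = markup.replace(">", "><![CDATA[").replace("</", "]]></") + "]]>"
    return s.replace("<![CDATA[]]>", "")

raw = "<p>a & b</p>"
fixed = wrap_text_in_cdata(raw)
print(fixed)                      # <p><![CDATA[a & b]]></p>
print(ET.fromstring(fixed).text)  # a & b
```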

Hope it helps
