Code Project

Tuesday, February 18, 2025

Python - list vs pandas.series

 pandas.Series and Python's list both allow you to store a collection of data, but they have significant differences in terms of functionality, performance, and ease of use. Here's a comparison of the two:

1. Indexing and Labels

  • pandas.Series: Each element in a Series has an associated index (which can be labeled), making it easier to access data using meaningful labels rather than just integer-based indices.
    • Example:
      import pandas as pd
      series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
      print(series['a']) # Output: 10
  • list: A list is indexed by integers, starting from 0. There are no custom labels, only numerical indices.
    • Example:
      my_list = [10, 20, 30]
      print(my_list[0]) # Output: 10

2. Data Types

  • pandas.Series: A Series can hold data of any type (integers, floats, strings, etc.), but it is most efficient when all values share one type, because the values are stored in a single typed array. It can also represent missing data (e.g., NaN).
    • Series can be numeric, boolean, string, and more.
  • list: A list in Python can also hold any type of data, but lists are not optimized for handling missing values or complex data operations like Series.
    • Lists do not naturally handle NaN or missing values.
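A small sketch of how pandas chooses a dtype, assuming pandas is installed (the variable names are illustrative). Whole numbers get an integer dtype, mixed values fall back to the generic object dtype, and a None among numbers is stored as NaN in a float Series.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed = pd.Series([1, "two", 3.0])
with_gap = pd.Series([1, None, 3])

print(ints.dtype)             # int64
print(mixed.dtype)            # object
print(with_gap.dtype)         # float64
print(with_gap.isna().sum())  # 1 (the None became NaN)
```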

3. Vectorized Operations

  • pandas.Series: Supports vectorized operations (i.e., operations applied to every element without the need for explicit loops), which allows you to perform arithmetic operations and transformations on the entire series at once.
    • Example:
      series = pd.Series([1, 2, 3])
      print(series * 2) # Output: a Series with values 2, 4, 6
  • list: Does not support vectorized operations. You would have to use a loop or list comprehension to perform element-wise operations.
    • Example:
      my_list = [1, 2, 3]
      result = [x * 2 for x in my_list] # Output: [2, 4, 6]

4. Performance

  • pandas.Series: Optimized for large-scale data manipulation and performance. Series are implemented using NumPy arrays under the hood, allowing for efficient operations.
  • list: Slower when working with large datasets, especially for operations that require iteration or element-wise manipulation.

5. Missing Data Handling

  • pandas.Series: Supports missing data using NaN (Not a Number), and provides methods to handle missing data (e.g., isnull(), fillna()).
    • Example:
      series = pd.Series([1, None, 3])
      print(series.isnull()) # Output: a boolean Series with values False, True, False
  • list: Does not have built-in support for missing values. You would have to use None or other custom indicators and manually handle missing data.
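For comparison, here is a minimal sketch of handling a missing value in a plain list by hand. The list name and the choice of 0 as the fill value are illustrative.

```python
readings = [1, None, 3]

# sum() fails because None cannot be added to an int.
try:
    sum(readings)
except TypeError as exc:
    print("sum failed:", exc)

# Manual equivalents of isnull() and fillna(0).
is_missing = [value is None for value in readings]
filled = [0 if value is None else value for value in readings]

print(is_missing)  # [False, True, False]
print(filled)      # [1, 0, 3]
```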

6. Aggregations and Functions

  • pandas.Series: Provides built-in methods for aggregation and statistical functions like sum(), mean(), std(), min(), max(), etc.
    • Example:
      series = pd.Series([1, 2, 3])
      print(series.mean()) # Output: 2.0
  • list: Has no aggregation methods of its own. You can use Python's built-in functions (sum(), min(), max()), the standard statistics module (for mean or standard deviation), or implement your own functions.
    • Example:
      my_list = [1, 2, 3]
      print(sum(my_list)) # Output: 6

7. Alignment and Handling of Different Lengths

  • pandas.Series: Supports automatic alignment when performing operations on two Series, even if they have different indices. Missing values are filled with NaN.
    • Example:
      s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
      s2 = pd.Series([4, 5], index=['b', 'c'])
      print(s1 + s2) # Output: NaN for 'a' and sum for 'b' and 'c'
  • list: Does not have automatic alignment; you would need to manually ensure that lists are of equal length when performing element-wise operations.
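A short sketch of what happens with two lists of different lengths, using only the standard library. Note that + joins lists instead of adding their elements, and zip() silently drops the unmatched tail. Padding with 0 is just one possible choice.

```python
from itertools import zip_longest

a = [1, 2, 3]
b = [4, 5]

# + concatenates lists; it does not add element-wise.
print(a + b)  # [1, 2, 3, 4, 5]

# zip() stops at the shorter list, so the last element of a is ignored.
pairwise = [x + y for x, y in zip(a, b)]
print(pairwise)  # [5, 7]

# zip_longest() pads the shorter list with a fill value you choose.
padded = [x + y for x, y in zip_longest(a, b, fillvalue=0)]
print(padded)  # [5, 7, 3]
```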

8. Integration with DataFrames

  • pandas.Series: Commonly used as a column in a pandas.DataFrame. A DataFrame is essentially a collection of Series.
    • Example:
      df = pd.DataFrame({'col1': [1, 2, 3]})
      print(df['col1']) # Output: Series with index 0, 1, 2
  • list: Lists are more generic and do not integrate with DataFrames, though they can be converted to a DataFrame if needed.

9. Operations with Numpy

  • pandas.Series: Because it is built on top of NumPy, Series can directly interact with NumPy functions and arrays. You can use NumPy operations on Series for efficient computations.
  • list: NumPy functions accept a list, but they convert it to a NumPy array on every call, and the result is an array rather than a list. For repeated NumPy operations, convert the list to an array once up front.

Summary Table

Feature | pandas.Series | list
Indexing | Labeled indices (customizable) | Integer-based indices
Data Types | Handles mixed data types, supports NaN | No built-in handling for missing data
Vectorized Operations | Yes (fast element-wise operations) | No (requires loops or list comprehension)
Performance | Optimized for large data, fast | Slower for large datasets
Missing Data | Built-in support for NaN | No built-in support (use None)
Aggregation Functions | Built-in (sum, mean, etc.) | Needs external functions or manual implementation
Data Alignment | Automatic alignment (different indices) | No automatic alignment
Integration with DataFrame | Core component of DataFrame | Can be converted to DataFrame

Conclusion:

  • pandas.Series is ideal for structured, labeled data and is a more powerful and efficient tool for data analysis, manipulation, and aggregation.
  • list is a general-purpose Python container, useful for simple collections of data, but lacks the advanced capabilities of pandas.Series.

Tuesday, February 11, 2025

PyTest Introduction

 Pytest is a popular Python testing framework known for its simplicity and powerful features. It lets you:

  • Write and execute unit tests, integration tests, or functional tests.
  • Use test fixtures to manage setup and teardown.
  • Parametrize tests to run the same test with multiple inputs.
  • Filter, skip, or mark tests for better control.
  • Generate test reports with plugins (like pytest-html).
  • Work alongside other Python libraries and tools.
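As a rough illustration of the features listed above, the sketch below shows a parametrized test, a fixture with setup and teardown, and a skipped test. It assumes pytest is installed and that the code is saved in a file such as test_features.py. The add function and all test names are hypothetical.

```python
import pytest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


# Parametrize: the same test runs once per tuple of inputs.
@pytest.mark.parametrize(
    "a, b, expected",
    [(1, 1, 2), (2, 3, 5), (-1, 1, 0)],
)
def test_add(a, b, expected):
    assert add(a, b) == expected


# Fixture: code before yield is setup, code after yield is teardown.
@pytest.fixture
def sample_numbers():
    numbers = [1, 2, 3]
    yield numbers
    numbers.clear()


# pytest injects the fixture value by matching the argument name.
def test_sum_of_sample(sample_numbers):
    assert sum(sample_numbers) == 6


# Skip: the test is reported as skipped instead of being run.
@pytest.mark.skip(reason="illustrates skipping a test")
def test_not_ready():
    assert False
```

Running pytest -v in the same directory should list three test_add cases, one passing fixture test, and one skipped test.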

How Does Pytest Know Which Functions to Test?

Pytest automatically discovers test functions based on naming conventions. When you run pytest, it looks for files, classes, and functions that match specific patterns.

Rules for Test Discovery in Pytest

  1. File Names:
    Pytest looks for Python files named test_*.py or *_test.py.
    Example:

    • test_example.py
    • example_test.py
    • example.py ❌ (Not discovered unless explicitly specified)
  2. Function Names:
    Pytest looks for functions inside those files with names starting with test_.
    Example:

    def test_addition(): # ✅ Will be discovered
        assert 1 + 1 == 2

    def addition_test(): # ❌ Won't be discovered
        assert 1 + 1 == 2
  3. Class Names:
    If you want to organize tests into classes, pytest looks for classes whose names start with Test. These classes should not have an __init__ method.
    Example:

    class TestMath: # ✅ Class will be discovered
        def test_addition(self): # ✅ Function will be discovered
            assert 1 + 1 == 2

    class MathTests: # ❌ Class won't be discovered
        def test_subtraction(self): # ❌ Function won't be discovered
            assert 2 - 1 == 1
  4. Directories:
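    Pytest scans directories recursively for test files. By default, it looks in:

    • The current working directory (or the directories passed on the command line).
    • Any subdirectories, except those matching the norecursedirs setting. By default this skips hidden directories (names starting with .) and common build or environment directories such as build, dist, node_modules, and venv.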

    Example directory structure:

    project/
    ├── tests/
    │   ├── test_math.py
    │   └── test_string.py
    └── app/
        └── main.py

    Running pytest from the project/ directory will automatically find tests/test_math.py and tests/test_string.py.


Examples

Discovered Test

File: test_sample.py

def test_sum():
    assert 2 + 2 == 4

Command:

$ pytest

Output:

========================================= test session starts =========================================
collected 1 item

test_sample.py .                                                                                 [100%]

========================================== 1 passed in 0.01s ==========================================

Undiscovered Function

File: test_sample.py

def sum_test(): # Does not follow pytest naming convention
    assert 2 + 2 == 4

Command:

$ pytest

Output:

========================================= test session starts =========================================
collected 0 items

How to Override the Default Discovery Rules?

  1. Run Specific Files/Functions Manually:
    You can explicitly specify the file or function to test:

    $ pytest test_example.py::test_specific_function
  2. Change Naming Patterns:
    Set the python_files, python_classes, and python_functions options in pytest.ini to adjust discovery:


    # pytest.ini
    [pytest]
    # Match files ending with '_tests.py'
    python_files = *_tests.py
    # Match classes ending with 'TestCase'
    python_classes = *TestCase
    # Match functions starting with 'test_'
    python_functions = test_*
  3. List Collected Tests:
    Use the pytest --collect-only command to list every test that matches the discovery rules without running it:

    $ pytest --collect-only

Monday, February 10, 2025

Python - Lambda, generator, closures and decorators vs functions

While lambdas, generators, closures, and decorators are technically functions, they differ from regular user-defined functions in terms of purpose, scope, and functionality. Here's how they compare to user-defined functions:


1. Lambda Functions vs. User-Defined Functions

Key Differences:

  • Lambda:

    • A lambda is a one-liner, anonymous function.
    • Used for short, simple operations.
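    • Limited to a single expression; cannot contain statements (such as assignments, for loops, or if blocks), although a conditional expression (x if condition else y) is allowed.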
    • Typically used as a temporary or inline function.
  • User-Defined Function:

    • A named function defined using the def keyword.
    • Can include multiple lines of code and more complex logic.
    • Supports reusability across the program.

Example:

# Lambda
add_lambda = lambda x, y: x + y
print(add_lambda(3, 5)) # Output: 8

# User-defined function
def add_function(x, y):
    return x + y

print(add_function(3, 5)) # Output: 8

When to Use?

  • Use lambda for quick, simple tasks (e.g., sorting, filtering).
  • Use user-defined functions for more complex logic or when readability is important.
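The sorting and filtering cases mentioned above might look like the sketch below. The people list is made-up sample data.

```python
people = [("Alice", 30), ("Bob", 25), ("Carol", 35)]

# Sort by the second item of each tuple (the age).
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Bob', 25), ('Alice', 30), ('Carol', 35)]

# Keep only the entries whose age is above 28.
over_28 = list(filter(lambda person: person[1] > 28, people))
print(over_28)  # [('Alice', 30), ('Carol', 35)]
```

Here the lambda exists only for the duration of the call, which is the typical case where naming a separate function adds little.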

2. Generators vs. User-Defined Functions

Key Differences:

  • Generator:

    • Defined like a function but uses the yield keyword instead of return.
    • Produces a sequence of values lazily (one at a time), saving memory.
    • Keeps track of the state between iterations.
  • User-Defined Function:

    • Returns a single value or object and exits after execution.
    • Cannot maintain state between function calls.

Example:

# Generator
def count_up_to(n):
    for i in range(1, n + 1):
        yield i

# User-defined function
def count_up_to_list(n):
    return list(range(1, n + 1))

# Using the generator
for num in count_up_to(5):
    print(num, end=" ") # Output: 1 2 3 4 5

# Using the user-defined function
print(count_up_to_list(5)) # Output: [1, 2, 3, 4, 5]

When to Use?

  • Use generators for iterating over large datasets efficiently.
  • Use user-defined functions when you need the entire result at once.
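To make the laziness and the saved state more concrete, here is a small sketch. The naturals generator is a hypothetical example, and the size comparison relies on how CPython reports object sizes.

```python
import sys
from itertools import islice


def naturals():
    """Yield 1, 2, 3, ... without ever building a list."""
    n = 1
    while True:
        yield n
        n += 1


# Take only the first five values from an infinite sequence.
first_five = list(islice(naturals(), 5))
print(first_five)  # [1, 2, 3, 4, 5]

# A generator object stays small; the equivalent list holds every value.
lazy = (i * i for i in range(100_000))
eager = [i * i for i in range(100_000)]
print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True

# The generator remembers where it stopped between next() calls.
gen = naturals()
print(next(gen), next(gen))  # 1 2
```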

3. Closures vs. User-Defined Functions

Key Differences:

  • Closure:

    • A nested function that retains access to the variables in its enclosing scope even after the outer function has finished executing.
    • Used for creating function factories or maintaining state.
  • User-Defined Function:

    • Does not inherently retain any state beyond its local variables.
    • Requires explicit state passing (e.g., arguments).

Example:

# Closure
def make_multiplier(factor):
    def multiplier(number):
        return number * factor
    return multiplier

times_two = make_multiplier(2)
print(times_two(5)) # Output: 10

# User-defined function
def multiplier(number, factor):
    return number * factor

print(multiplier(5, 2)) # Output: 10

When to Use?

  • Use closures for creating functions with predefined configurations or state.
  • Use user-defined functions when state is passed explicitly.
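Closures can also keep state that changes between calls. The sketch below is a hypothetical counter; the nonlocal keyword lets the inner function rebind the variable of the enclosing scope.

```python
def make_counter(start=0):
    count = start

    def increment():
        nonlocal count  # rebind the variable from make_counter
        count += 1
        return count

    return increment


counter_a = make_counter()
counter_b = make_counter(10)

print(counter_a(), counter_a())  # 1 2
print(counter_b())               # 11
```

Each call to make_counter creates an independent count, so the two counters do not affect each other.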

4. Decorators vs. User-Defined Functions

Key Differences:

  • Decorator:

    • A higher-order function that takes another function as input and modifies or extends its behavior.
    • Applied using the @decorator syntax.
    • Adds functionality without modifying the original function.
  • User-Defined Function:

    • Typically performs a specific operation.
    • Does not modify other functions unless explicitly designed to do so.

Example:

# Decorator
def logger(func):
    def wrapper(*args, **kwargs):
        print(f"Function {func.__name__} is called with {args}")
        return func(*args, **kwargs)
    return wrapper

@logger
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))
# Output:
# Function greet is called with ('Alice',)
# Hello, Alice!

# User-defined function
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice")) # Output: Hello, Alice!

When to Use?

  • Use decorators to extend or modify the behavior of existing functions (e.g., logging, authentication).
  • Use user-defined functions for standalone operations.
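One detail worth knowing when writing decorators: the wrapper replaces the original function, so its name and docstring are lost unless you copy them over. A common way to do that is functools.wraps, sketched here with an illustrative variant of the logger decorator.

```python
import functools


def logger(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with {args} {kwargs}")
        return func(*args, **kwargs)
    return wrapper


@logger
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}!"


print(greet("Alice"))
print(greet.__name__)  # greet
print(greet.__doc__)   # Return a greeting.
```

Without functools.wraps, greet.__name__ would report wrapper instead.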

Summary Table:

Feature | Purpose | When to Use?
Lambda | Simple, inline, anonymous functions | Short operations like sorting or filtering
Generator | Lazy iteration over large data | Handling large datasets or infinite sequences efficiently
Closure | Retain state in a nested function | Function factories or functions with pre-configured parameters
Decorator | Modify/extend another function's behavior | Adding functionality (e.g., logging, authentication) to existing functions
User-Defined Function | General-purpose reusable functions | Any operation requiring more complex logic or structure

These features provide specialized ways to make your code more concise, efficient, or reusable in specific scenarios.

List and Tuple - Difference

 The key difference between lists and tuples in Python is their mutability:

  • List: Ordered and mutable, meaning you can change its elements (add, remove, or modify them).
  • Tuple: Ordered but immutable, meaning once it's created, its elements cannot be changed.

Example: List

# Creating a list
my_list = [1, 2, 3]

# Modifying the list
my_list[0] = 10    # Update an element
my_list.append(4)  # Add a new element
my_list.remove(2)  # Remove an element

print(my_list) # Output: [10, 3, 4]

Example: Tuple

# Creating a tuple
my_tuple = (1, 2, 3)

# Trying to modify the tuple
# my_tuple[0] = 10 # This will raise an error: TypeError: 'tuple' object does not support item assignment

# Accessing elements (this is allowed)
print(my_tuple[0]) # Output: 1

Key Characteristics:

  1. Mutability:

    • Lists allow you to modify their elements.
    • Tuples do not allow modification after creation.
  2. Performance:

    • Tuples are slightly faster to create and use a little less memory than lists, because their size is fixed at creation.
  3. Usage:

    • Use lists when the data needs to change (e.g., dynamically adding/removing elements).
    • Use tuples for fixed collections of data (e.g., coordinates (x, y) or configuration data).
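A practical consequence of immutability is that tuples can be used as dictionary keys while lists cannot. The sketch below uses made-up (x, y) coordinates; the size comparison reflects CPython's behavior.

```python
import sys

# Tuples of hashable items can be dictionary keys.
label_by_point = {(0, 0): "origin", (2, 3): "target"}
print(label_by_point[(2, 3)])  # target

# Lists are mutable and therefore unhashable.
try:
    {[2, 3]: "target"}
except TypeError as exc:
    print("list as key failed:", exc)

# A tuple also takes a little less memory than a list with the same items.
print(sys.getsizeof((1, 2, 3)) < sys.getsizeof([1, 2, 3]))  # True
```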

Practical Difference Example:

List Use Case:

# Shopping list: It changes over time
shopping_list = ["milk", "bread"]
shopping_list.append("eggs")
print(shopping_list) # Output: ['milk', 'bread', 'eggs']

Tuple Use Case:

# Days of the week: Fixed and unchanging
days_of_week = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
print(days_of_week) # Output: ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')

Tuesday, January 24, 2023

AttributeError: 'DataFrame' object has no attribute 'unique'

The error message "'DataFrame' object has no attribute 'unique'" occurs when you are trying to use the unique() method on a pandas DataFrame object, but this method is only available for pandas Series objects.

#1. Applying unique() to a DataFrame

import pandas as pd
df = pd.DataFrame([{"A":"aaa","B":"bbb"}])
print(df.unique()) # Raises AttributeError

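#2. The error also appears when a DataFrame has several columns with the same name, for example after a column is renamed by mistake. In that case df['A'] returns a DataFrame (one column per match) instead of a Series, so unique() is no longer available.

import pandas as pd
df = pd.DataFrame([{"A":"aaa","B":"bbb"}])
print(df)

print(df['A'].dropna().unique()) # Works: df['A'] is a Series
df = df.rename(columns={ df.columns[1]: df.columns[0] }) # Column "B" is renamed to "A", so "A" now appears twice

print(df)

print(df['A'].dropna().unique()) # Raises AttributeError: df['A'] is now a DataFrame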
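A few ways around the error, sketched under the assumption that pandas is installed. The sample data and the dup name are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y", "x"], "B": ["p", "p", "q"]})

# unique() works on a single column, which is a Series.
print(list(df["A"].unique()))  # ['x', 'y']

# DataFrame-level alternatives.
print(df.nunique())          # number of distinct values per column
print(df.drop_duplicates())  # distinct rows

# Detecting and working around duplicate column names.
dup = df.rename(columns={"B": "A"})
print(type(dup["A"]).__name__)         # DataFrame
print(dup.columns.duplicated().any())  # True
print(list(dup.iloc[:, 0].unique()))   # ['x', 'y'] (select by position)
```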

Hope it helps!!

Monday, September 11, 2017

working with report model

To create a shared data source

  1. In Solution Explorer, right-click Shared Data Sources, and then select Add New Data Source.
  2. In the Name box, type: RMS.
  3. In the Type drop-down list, select Report Server Model.
  4. In the Connection string area, type: Server=<>; datasource=<>.
  5. Select Credentials.
  6. Select Use Windows Authentication (Integrated Security) and then click OK.
    The data source appears in the Shared Data Sources folder in Solution Explorer.
  7. On the File menu, click Save All.

XML parsing: line 1, character 75, illegal name character

When saving strings to XML, or when extracting text from within tags, it is important to escape invalid characters. The following table shows the invalid XML characters and their escaped equivalents.

Invalid XML Character      Replaced With
<                          &lt;
>                          &gt;
"                          &quot;
'                          &apos;
&                          &amp;
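Outside SQL, the same replacements can be sketched in Python with the standard library's xml.sax.saxutils module. By default escape() handles only &, < and >, so the two quote characters are passed in explicitly. The sample string is made up.

```python
from xml.sax.saxutils import escape, unescape

raw = 'a & b < "c"'

# Escape &, <, > plus the two quote characters from the table.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)  # a &amp; b &lt; &quot;c&quot;

# unescape() reverses the mapping.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})
print(restored == raw)  # True
```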

If we run the following code in a SQL window to extract the text from within HTML tags:

declare @v varchar(40)
-- sample value; the <p> tags are illustrative
Set @v='<p>a & b</p>'
Select cast(@v as XML).value('.','varchar(max)')

we receive an error like "XML parsing: line 1, character xx, illegal name character", because the unescaped & is not valid XML.

Solution:

declare @v varchar(40)
-- sample value; the <p> tags are illustrative
Set @v='<p>a & b</p>'
Select cast(replace(replace(replace(@v,'>','><![CDATA['),'</',']]></')+']]>','<![CDATA[]]>','') as XML).value('.','varchar(max)')

A CDATA section is "a section of element content that is marked for the parser to interpret as only character data, not markup." The nested replace calls wrap the text between the opening and closing tags in a CDATA section, so the parser treats the & as plain text and no longer raises the error.

Hope it helps