Python¶
In this section we introduce intermediate Python programming and some packages, e.g. NumPy and pandas.
- references
  - https://github.com/yasoob/intermediatePython
  - https://www.liaoxuefeng.com/wiki/1016959663602400 (Chinese)
  - https://wiki.python.org/moin/Powerful%20Python%20One-Liners
 
Programmer Tools¶
Debugging¶
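- methods
  - `pdb.set_trace()` pauses execution and opens the debugger; since Python 3.7 the built-in `breakpoint()` can be used instead
  - `assert x == 2, 'msg'`
  - `logging` to output messages of a specific level (debug, info, warning, error)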
 
- see details 
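A minimal sketch of combining `assert` and `logging`, assuming a hypothetical helper `safe_divide` and logger name `demo` (neither comes from the notes). It only illustrates how the log level decides which messages are shown.

```python
import logging

# Show INFO and above; DEBUG messages are filtered out at this level.
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger("demo")  # hypothetical logger name


def safe_divide(a, b):
    """Divide a by b, logging each step at an appropriate level."""
    assert isinstance(a, (int, float)), "a must be a number"
    logger.debug("dividing %s by %s", a, b)  # hidden unless level is DEBUG
    if b == 0:
        logger.warning("division by zero, returning None")
        return None
    return a / b


result = safe_divide(6, 3)
missing = safe_divide(1, 0)
```

Note that `assert` statements are skipped when Python runs with the `-O` flag, so they suit debugging checks rather than input validation.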
Object Introspection¶
- `dir()` returns a list of the attributes and methods belonging to an object
 
- type()
- id()
- inspect.getmembers()
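The introspection helpers above can be tried on a small object. This is a sketch only; the `Point` class is a hypothetical example, not something from the notes.

```python
import inspect


class Point:
    """Hypothetical class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = Point(1, -2)

# dir() lists every attribute name; filter out the underscore-prefixed ones.
public_names = [name for name in dir(p) if not name.startswith("_")]

# inspect.getmembers() returns (name, value) pairs, optionally filtered.
methods = [name for name, _ in inspect.getmembers(p, inspect.ismethod)]

kind = type(p)        # the class the object was created from
identity = id(p)      # an integer that is unique while p is alive
```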
Syntax¶
exceptions¶
- try, except, else, finally
- `try`, `except E1 as e`, `except E2 as e` to catch different errors one by one
- `try`, `except Exception as e` to catch most errors at once
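A small sketch of how the four clauses interact, assuming a hypothetical function `parse_ratio` that records which branches ran. `else` runs only when no exception was raised, and `finally` always runs.

```python
def parse_ratio(text):
    """Parse 'a/b' and return (a / b or None, list of clauses that ran)."""
    events = []
    try:
        a, b = text.split("/")
        value = int(a) / int(b)
    except ValueError as e:          # wrong format or non-integer parts
        events.append(type(e).__name__)
        value = None
    except ZeroDivisionError as e:   # handled separately from ValueError
        events.append(type(e).__name__)
        value = None
    else:                            # only when the try block succeeded
        events.append("ok")
    finally:                         # always executed
        events.append("done")
    return value, events
```

For example, `parse_ratio("3/4")` goes through `else`, while `parse_ratio("3/0")` and `parse_ratio("abc")` each hit a different `except` clause.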
for/else¶
- the `else` clause is executed after the loop completes normally (without `break`)
- e.g. loop to search: if found then `break`; if not found, control goes to `else`
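The search pattern described above can be sketched as follows; `first_negative` is a hypothetical helper chosen for illustration.

```python
def first_negative(numbers):
    """Return the index of the first negative number, or -1 if there is none."""
    for i, n in enumerate(numbers):
        if n < 0:
            result = i
            break          # found: the else clause is skipped
    else:
        result = -1        # loop finished without break: nothing found
    return result
```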
ternary operator¶
- `x = 1 if a > 1 else a`
- `name = 'a' or 'b'` returns `'a'`, because `or` evaluates to its first truthy operand
- dynamic default name 
def my_function(real_name, optional_display_name=None):
    optional_display_name = optional_display_name or real_name
*args and **kwargs¶
- when defining `def fun(*args, **kwargs)`
  - `args` collects an unspecified number of positional (non-keyword) arguments in a tuple
  - `kwargs` collects an unspecified number of keyword arguments in a dictionary
  - e.g. pass plot arguments to `plt.plot()` in self-defined plot functions.

- when calling `fun(*args, **kwargs)`
  - `args` can be a pre-defined tuple
  - `kwargs` can be a pre-defined dictionary, with argument names and values as the key-value pairs
  - `*` and `**` are used to unpack
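Both directions, collecting in a definition and unpacking in a call, can be sketched as below. The function names `describe` and `scaled_sum` are hypothetical.

```python
def describe(*args, **kwargs):
    """Return the collected positional and keyword arguments unchanged."""
    return args, kwargs


def scaled_sum(a, b, scale=1):
    return (a + b) * scale


# Definition side: extra arguments are collected into a tuple and a dict.
collected = describe(1, 2, colour="red")

# Call side: * unpacks a tuple into positional arguments,
# ** unpacks a dict into keyword arguments.
point = (3, 4)
options = {"scale": 10}
total = scaled_sum(*point, **options)   # same as scaled_sum(3, 4, scale=10)
```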
 
open() and context managers¶
- see
  - https://github.com/yasoob/intermediatePython/blob/master/open_function.rst
  - https://github.com/yasoob/intermediatePython/blob/master/context_managers.rst
 
Data Structures¶
Mutable vs Immutable¶
- identity, type, and value
  - an object’s identity never changes once it has been created; you may think of it as the object’s address in memory.
    - The `is` operator compares the identity of two objects
    - The `id()` function returns an integer representing its identity
    - names that share the same identity refer to the same object in memory

  - an object’s type defines its possible values and methods
    - the `type()` function returns the type of an object. Like its identity, an object's type cannot be changed.

  - whether an object’s value can be changed depends on its type
    - objects whose value can change while their identity stays the same are called mutable, e.g. list, dictionary, set and (by default) instances of user-defined classes.
    - objects whose value cannot be changed are called immutable, e.g. int, float, decimal, bool, string, tuple, and range. "Changing" the value of such a variable really creates a new object with a new identity, so the `id()` seen through that name changes as well
    - the `==` operator compares the values of two objects
 
 
- the difference between mutable and immutable objects shows up when you assign one variable to another

      a = 1                  ## int is immutable
      b = a
      print(b is a)          ## True
      idb_before = id(b)
      b = b + 1
      id(b) == idb_before    ## False, a new object is created
      print(a is b)          ## False
      print(a)               ## 1

      a = [1]                ## list is mutable
      b = a
      print(b is a)          ## True
      idb_before = id(b)
      b.append(2)
      id(b) == idb_before    ## True, only the value is changed
      print(a is b)          ## True
      print(a)               ## [1, 2], changed
- the difference also matters when a function's default argument is a mutable object

      def add_to(num, target=[]):
          target.append(num)
          return target

      add_to(1)                ## [1]
      add_to(2)                ## [1, 2]
      add_to(3, target=[4])    ## [4, 3]
      add_to(5)                ## [1, 2, 5]

  - in Python, default arguments are evaluated (and the objects created in memory) once, when the function is defined, not each time the function is called
  - `add_to(1)` used the default `target`, the `[]` created when the function was defined, and changed it to `[1]`
  - `add_to(2)` used the same `target` in memory, which was `[1]`
  - `add_to(3, target=[4])` used the passed value `[4]`, not the default object
  - `add_to(5)` used the default `target` again, which was `[1, 2]` after `add_to(2)`
  - to be safe, use `None` as the default and create the mutable object inside the function

        def add_to(num, target=None):
            if target is None:
                target = []
            target.append(num)
            return target
 
- mutability of containers
  - some objects contain references to other objects; these objects are called containers, e.g. tuple, list, and dictionary. The value of an immutable container that holds a reference to a mutable member changes when that member is changed. However, the container is still considered immutable, because the mutability of a container refers only to the identities of the objects it contains, and those stay fixed.
  - if an immutable container holds only immutable members, its value cannot change at all
 
- ref: https://towardsdatascience.com/https-towardsdatascience-com-python-basics-mutable-vs-immutable-objects-829a0cb1530a
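The point about container mutability can be checked with a tuple that holds a list. This is a small sketch with hypothetical variable names.

```python
inner = [1, 2]
container = (inner, "fixed")     # immutable tuple holding a mutable list
id_before = id(container)

inner.append(3)                  # change the mutable member
same_identity = id(container) == id_before

# The tuple itself still refuses item assignment.
try:
    container[1] = "changed"
    item_assignment_allowed = True
except TypeError:
    item_assignment_allowed = False
```

The tuple's value is now `([1, 2, 3], 'fixed')`, yet the tuple is the same object and still holds references to the same two members.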
 
Classes and Magic Methods¶
- functions vs methods
  - functions may be associated with packages, e.g. `np.sqrt()`
  - methods are always associated with objects, e.g. `df.head()`
 
- class variables vs instance variables
  - instance variables are unique to every object
  - class variables hold data shared between all instances of a class
  - using mutable class variables is dangerous
  - e.g. in the example below, `name` is an instance variable, `pi` is an immutable class variable, and `superpowers` is a mutable class variable

        class SuperClass():
            superpowers = []
            pi = 3.14

            def __init__(self, name):
                self.name = name

            def add_superpower(self, power):
                self.superpowers.append(power)

        foo = SuperClass('foo')
        bar = SuperClass('bar')
        foo.name                     ## 'foo'
        bar.name                     ## 'bar'
        foo.pi = 10                  ## creates an instance attribute that shadows the class variable
        print(foo.pi)                ## 10
        print(bar.pi)                ## 3.14
        foo.add_superpower('fly')
        bar.superpowers              ## ['fly']
        foo.superpowers              ## ['fly']
 
- magic methods
  - magic methods are also called dunder (double underscore) methods
  - e.g. `__init__`, `__getitem__`, `__iter__`, `__next__`, `__call__`, etc.
 
- `__slots__`
  - by default Python uses a dict to store an object’s instance attributes
    - pros: allows setting arbitrary new attributes at runtime
    - cons: wastes a lot of RAM if you create many objects with a known, fixed set of attributes
    - solution: declare the fixed set of attributes in `__slots__`, which can reduce RAM usage substantially (up to roughly 50% for small objects, depending on the class)

  - just list the attribute names and assign them to `__slots__`

        class MyClass(object):
            __slots__ = ['name', 'identifier']

            def __init__(self, name, identifier):
                self.name = name
                self.identifier = identifier
- ref: https://stackoverflow.com/questions/472000/usage-of-slots 
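To make the magic methods listed earlier in this section concrete, here is a minimal sketch. The `Deck` class is a hypothetical example; it shows how defining dunder methods lets an object work with `len()`, indexing, iteration, and call syntax.

```python
class Deck:
    """Hypothetical container used only to illustrate dunder methods."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):                 # enables len(deck)
        return len(self._cards)

    def __getitem__(self, index):      # enables deck[i] and iteration
        return self._cards[index]

    def __call__(self, prefix):        # enables deck("...")
        return [prefix + card for card in self._cards]


deck = Deck(["A", "K", "Q"])
size = len(deck)
top = deck[0]
as_list = list(deck)          # iteration falls back to __getitem__
labelled = deck("card-")
```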
 
 
Iterables, Iterators, Generators and Coroutines¶
- An `iterable` is any object in Python which has an `__iter__` or a `__getitem__` method defined, i.e. which returns an iterator or can be indexed
- An `iterator` is any object in Python which has a `__next__` method defined
  - e.g. `str` is an iterable but not an iterator. `iter(iterable)` returns an iterator object.
  - `next(iterator)` returns the next element

        my_string = "Yasoob"
        my_iter = iter(my_string)
        print(next(my_iter))   ## 'Y'

- A `generator` is an `iterator`, but you can only iterate over it once. Generators do not store all the values in memory; they generate the values on the fly.
  - can be defined by generator comprehensions, e.g. `(i for i in range(10))`
  - can be defined by a function using `yield`

        def generator_function():
            for i in range(3):
                yield i

        gen = generator_function()
        print(next(gen))   ## 0
        for i in gen:
            print(i)       ## 1, 2
- coroutines are similar to generators, but they consume values sent to them instead of only producing values
  - `next()` to start it, i.e. run it up to the first `yield`
  - `.send()` to pass the next value in to `yield`
  - `.close()` to close it
- see https://github.com/yasoob/intermediatePython/blob/master/coroutines.rst 
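A minimal coroutine sketch, assuming a hypothetical `running_average` generator function. It illustrates the `next()` / `.send()` / `.close()` sequence described above.

```python
def running_average():
    """Coroutine: receives numbers via send() and yields their running mean."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average     # pause here until a value is sent in
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)                    # prime the coroutine: run to the first yield
first = averager.send(10)         # mean of [10]
second = averager.send(20)        # mean of [10, 20]
averager.close()                  # no further values can be sent
```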
 
Operations¶
- Complexity of operations: [link](https://www.cnblogs.com/luozx207/p/12793168.html).
- `set.remove()` vs `set.discard()`: `remove()` raises a `KeyError` when the specified element is not in the set, whereas `discard()` raises no error in that case and leaves the set unchanged.
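A short sketch of the difference, using a hypothetical set of colour names:

```python
colours = {"red", "green"}

colours.discard("blue")          # element absent: silently does nothing

try:
    colours.remove("blue")       # element absent: raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True

colours.remove("red")            # element present: both methods remove it
```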
collections module¶
- the `collections` Python module contains a number of useful container data types
- `defaultdict`
  - `defaultdict` is a subclass of `dict`. It behaves almost exactly like a dictionary, except that looking up a missing key with `d[key]` does not raise a `KeyError`; instead it inserts and returns a default value for that key.
  - for details see https://www.geeksforgeeks.org/defaultdict-in-python/
  - e.g.

        from collections import defaultdict

        def default_value():
            return "Not Present"

        d = defaultdict(default_value)   ## or defaultdict(lambda: "Not Present")
        d["a"] = 1
        print(d["a"])   ## 1
        print(d["b"])   ## "Not Present"

  - one can also specify the default 'null' value through `defaultdict(factory_function)`, where `factory_function` can be `int`, `str`, `set`, `list`, etc.
 
- `OrderedDict`
  - `OrderedDict` keeps its entries in the order in which they were first inserted (regular `dict` also preserves insertion order since Python 3.7). Overwriting the value of an existing key doesn’t change the position of that key. However, deleting and reinserting an entry moves the key to the end of the dictionary.
  - e.g.

        from collections import OrderedDict

        colours = OrderedDict([("Red", 198), ("Green", 170), ("Blue", 160)])
        for k, v in colours.items():
            print(k, v)
 
- `Counter`
  - `Counter` counts the number of occurrences of each item in an iterable and returns a dictionary-like `Counter` object.
  - e.g.

        from collections import Counter

        l = ['a', 'b', 'a', 'c']
        freq = Counter(l)
        for k, v in freq.items():
            print(k, v)
 
- `deque`
  - `deque` is preferred over `list` when we need fast appends and pops at both ends of the container: `deque` provides O(1) time complexity for these operations at either end, whereas `list` is O(1) only at the right end and O(n) at the left end.
  - methods include `appendleft()`, `extendleft()` and `popleft()`
  - We can also limit the number of items a deque can hold, e.g. `deque([0, 1, 2, 3, 5], maxlen=5)`. Once the deque is full, adding an item at one end discards an item from the opposite end.
 
- `namedtuple`
  - A tuple is basically an immutable list. Likewise, a `namedtuple` can be seen as an immutable dictionary.
  - namedtuples are backwards compatible with normal tuples (e.g. indexed by integer), and require no more memory than regular tuples.
  - namedtuples are more lightweight and faster than dictionaries, and can be converted to dictionaries
  - A named tuple has two required arguments: the tuple name and the `field_names`. e.g.

        from collections import namedtuple

        Animal = namedtuple('Animal', 'name age type')   ## tuple name and field names
        perry = Animal(name='perry', age=31, type='cat')
        print(perry[0])          ## 'perry', index by integer like a regular tuple
        print(perry.name)        ## 'perry', access by field name
        print(perry._asdict())   ## convert to a dict (an OrderedDict before Python 3.8)
        perry.age = 42           ## AttributeError, since it is immutable
 
- `Enum`
  - `Enum` is a data container that is preferred when we require immutable and unique keys or values, for instance weekday names and weekday numbers.

        from enum import Enum

        class Weekday(Enum):
            Mon = 1
            Tue = 2
            ## Mon = 4       ## error, duplicate keys are not allowed
            Monday = 1       ## alias keys are allowed. can use @unique to disable them

        ## Weekday.Mon = 1   ## error, immutable
        Weekday.Monday       ## <Weekday.Mon: 1>, the alias resolves to the original member

  - To get enumeration members, use `Weekday(1)`, `Weekday['Mon']` or `Weekday.Mon`
  - To get member names and values, use `member.name` and `member.value`
  - A one-liner to define an `Enum` class (values start from 1 by default)

        Weekday = Enum('Weekday', ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'))
        print(Weekday.Mon.value)   ## 1
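A short sketch of three behaviours described in the list above: a `defaultdict` with `list` as the factory, a bounded `deque`, and the ordering rules of `OrderedDict`. The data and variable names are hypothetical.

```python
from collections import OrderedDict, defaultdict, deque

# defaultdict(list): missing keys start as an empty list, handy for grouping.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# deque with maxlen: a full deque discards items from the opposite end.
recent = deque([0, 1, 2], maxlen=3)
recent.append(3)         # 0 falls off the left end
recent.appendleft(9)     # 3 falls off the right end

# OrderedDict: overwriting keeps the position, delete + reinsert moves to the end.
order = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
order["a"] = 10
del order["b"]
order["b"] = 2
```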
 
Functional Programming¶
enumerate()¶
- can take an optional argument to specify the starting index, e.g. `enumerate(my_list, 1)`
- can also be used to create a list of tuples, e.g. `list(enumerate(my_list, 1))`
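Both uses can be sketched with a hypothetical list of fruit names:

```python
fruits = ["apple", "banana", "cherry"]

# A list of (index, item) tuples, counting from 1 instead of 0.
numbered = list(enumerate(fruits, 1))

# The same start index used while looping.
lines = [f"{i}. {name}" for i, name in enumerate(fruits, start=1)]
```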
lambda¶
- used to define an anonymous function
- e.g. sort a list of tuples by the second element of each tuple

      a = [(1, 2), (4, 1), (9, 10)]
      a.sort(key=lambda x: x[1])   ## [(4, 1), (1, 2), (9, 10)]
sorted()¶
- the `list.sort()` method is only defined for lists.
- in contrast, the `sorted()` function accepts any iterable.
- e.g. sort the words in a sentence in alphabetical order, ignoring case

      sorted("This is a test string from Andrew".split(), key=str.lower)

- the key function can be `itemgetter()` or `attrgetter()` from the `operator` module.
- see https://docs.python.org/3/howto/sorting.html 
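A minimal sketch of `itemgetter()` and `attrgetter()` as key functions. The `Student` namedtuple and the sample data are hypothetical.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter

pairs = [("pear", 3), ("apple", 7), ("fig", 1)]
by_count = sorted(pairs, key=itemgetter(1))       # sort by the second item

Student = namedtuple("Student", "name grade")     # hypothetical record type
students = [Student("Ann", 82), Student("Bob", 95), Student("Cy", 78)]
by_grade_desc = sorted(students, key=attrgetter("grade"), reverse=True)
top_names = [s.name for s in by_grade_desc]
```

Unlike `list.sort()`, `sorted()` returns a new list and leaves `pairs` and `students` in their original order.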
map(), filter() and reduce()¶
- `map(fun, iterable)` may be faster than a list comprehension if `fun` is pre-defined (not created through `lambda`)
- `filter(fun, iterable)` is used for masking, where `fun` should return `True`/`False`
- `reduce(fun, iterable, initializer=None)` applies a two-argument function cumulatively to the items of the iterable, from left to right, reducing them to a single value.

      def reduce(function, iterable, initializer=None):   ## roughly equivalent to functools.reduce
          it = iter(iterable)
          if initializer is None:
              value = next(it)
          else:
              value = initializer
          for element in it:
              value = function(value, element)
          return value

      from functools import reduce
      reduce(lambda a, b: a + b, l)                            ## sum(l)
      reduce(lambda a, b: a if a > b else b, l)                ## max(l)
      reduce(lambda z, x: z + [y + [x] for y in z], l, [[]])   ## all subsets of l
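A small sketch of `map()` and `filter()` with pre-defined functions; `is_even` and the sample list are hypothetical. In Python 3 both return lazy iterators, so `list()` is needed to materialise the results.

```python
numbers = [1, 2, 3, 4, 5, 6]


def is_even(n):
    return n % 2 == 0


as_text = list(map(str, numbers))         # apply a built-in to each element
evens = list(filter(is_even, numbers))    # keep elements where the test is True

lazy = map(abs, [-1, -2])                 # nothing is computed yet
first = next(lazy)                        # values are produced on demand
```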
Comprehensions¶
- `list` comprehensions: `squared = [x**2 for x in range(10)]`
- `set` comprehensions: `{x**2 for x in [1, 1, 2]}`
- `dict` comprehensions: `{key: value for ... }`
  - e.g. swap keys and values `{v: k for k, v in some_dict.items()}`

- `generator` comprehensions
  - don’t allocate memory for the whole list but generate one item at a time, and are thus more memory efficient.

        my_gen = (i for i in range(30) if i % 3 == 0)
        for x in my_gen:
            ...
Numpy¶
- `isin`
- `repeat` vs `tile`
- `concatenate`, `vstack` and `hstack`
- `strides`
- `empty`
- `array` vs `asarray`
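The notes only list these NumPy functions by name, so the following is a hedged sketch of what each one does on a small hypothetical array. It assumes NumPy is installed.

```python
import numpy as np

a = np.array([1, 2, 3])

mask = np.isin(a, [2, 3])          # element-wise membership test

repeated = np.repeat(a, 2)         # repeats each element: 1 1 2 2 3 3
tiled = np.tile(a, 2)              # repeats the whole array: 1 2 3 1 2 3

joined = np.concatenate([a, a])    # join along the existing axis
rows = np.vstack([a, a])           # stack as rows, shape (2, 3)
flat = np.hstack([a, a])           # for 1-D input, same result as concatenate

step = a.strides                   # bytes to move to reach the next element
scratch = np.empty((2, 2))         # allocated but uninitialised values

copied = np.array(a)               # copies by default
same = np.asarray(a)               # no copy when input is already an ndarray
```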
Pandas¶
ref
Series¶
“Series = Vector + labels”
attributes
- .index
- .values
methods
- .describe()
- .head(), .tail()
- .plot()
DataFrame¶
Indexing¶
| Select | Syntax | Result |
|---|---|---|
| a column | | Series |
| columns by labels | | DataFrame |
| columns by labels | | DataFrame |
| a row by its label | | Series |
| a row by its integer location | | Series |
| rows by integers | | DataFrame |
| rows by labels | | DataFrame |
| rows by boolean | | DataFrame |
| entries by integers | | DataFrame |
| entries by labels | | DataFrame |
Note that
- if you only select rows or only select columns, then `[]` is enough.
- `.loc` is primarily label based, but may also be used with a boolean array. The following are valid inputs
  - a single label, a list of labels, a slice of labels, a boolean array

- `.iloc` is primarily integer based. The following are valid inputs
  - an integer, a list of integers, a slice of integers, a boolean array
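The selections named in the table can be sketched with standard pandas indexing. This is one common way to write each selection, not necessarily the exact syntax the table originally showed; the DataFrame `df` and its labels are hypothetical, and pandas is assumed to be installed.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0], "c": ["x", "y", "z"]},
    index=["r1", "r2", "r3"],
)

col = df["a"]                                   # a column -> Series
cols = df[["a", "b"]]                           # columns by labels -> DataFrame
cols_loc = df.loc[:, ["a", "b"]]                # same selection through .loc
row_label = df.loc["r2"]                        # a row by its label -> Series
row_pos = df.iloc[1]                            # a row by integer location -> Series
rows_int = df.iloc[0:2]                         # rows by integers (end excluded)
rows_lab = df.loc["r1":"r2"]                    # rows by labels (end included)
rows_bool = df[df["a"] > 1]                     # rows by boolean mask
block_int = df.iloc[0:2, 0:2]                   # entries by integers
block_lab = df.loc[["r1", "r3"], ["a", "c"]]    # entries by labels
```

Note that label slices in `.loc` include the end label, while integer slices in `.iloc` exclude the end position.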
 
Methods¶
- .max()
- df['col1'].corr(df['col2'])
- `df.dtypes` and `df['col'].dtype`
- `df['col'].astype()`. To convert a column of numbers stored as strings to floats, either `pd.to_numeric(df['col'])` or `df['col'].astype(float)` can be used. If some entries are missing or malformed, the former is preferable, since `errors='coerce'` turns anything that cannot be parsed into NaN.
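- `.rename(columns=names)` where `names` is a dictionary of `'old_name': 'new_name'`
- `.isnull()` to check for missing values
- `.dropna()` to drop all rows containing any missing entries
  - add `subset=['col1', 'col2']` to restrict the check to specific columns

- `.fillna(value='')` fills missing values with a specific value
  - `.fillna(method='bfill')` fills backward from the next valid value (deprecated since pandas 2.1; use `.bfill()`)
  - `.fillna(method='ffill')` fills forward from the previous valid value (deprecated since pandas 2.1; use `.ffill()`)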
 
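- `df.agg(func, axis=0)` applies built-in aggregations or self-defined functions to each column (`axis=0`) or each row (`axis=1`).
- `df.apply(fun)` applies a self-defined function to columns or rows
  - `axis`: {0 or 'index', 1 or 'columns'}, default 0

- `df.applymap(fun)` applies a self-defined function to each entry (available as `df.map(fun)` since pandas 2.1, where `applymap` is deprecated)
- `df._get_numeric_data()` keeps only the numeric columns; it is a private method, and `df.select_dtypes(include='number')` is the public equivalent
- `df.sort_values(by, axis=0, ascending=True)` sorts by the values of the given column(s)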
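A brief sketch of the string-to-number conversion and missing-value handling described in the list above, assuming pandas is installed. The column names and data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"price": ["1.5", "2", "missing", None]})

# errors="coerce" turns unparseable entries into NaN instead of raising.
df["price_num"] = pd.to_numeric(df["price"], errors="coerce")

n_missing = int(df["price_num"].isnull().sum())   # count the NaN entries
filled = df["price_num"].fillna(0.0)              # replace NaN with a value
dropped = df.dropna(subset=["price_num"])         # drop rows where it is NaN
```

By contrast, `df["price"].astype(float)` would raise on the `"missing"` entry, which is why `to_numeric` is the safer choice for messy columns.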
Multiple methods can be applied sequentially and arranged in an easy-to-read format by wrapping the chain in `()`
(df
  .groupby('type')
  .mean()
  .sort_values()
  .plot
  .bar(
    figsize = (4,3),
    layout = (4,5),
  )
)
Plot¶
- Equal axis aspect ratio using ax or plt
  - `axs[0, 1].axis('equal')`
  - `axs.set_aspect('equal', 'box')`
  - `plt.gca().set_aspect('equal', adjustable='box')`
  - `plt.axis('square')`

- Spine placement docu

      ax.spines.left.set_position('center')
      ax.spines.bottom.set_position('center')
      ax.spines.right.set_color('none')
      ax.spines.top.set_color('none')

- Shaded area

      plt.fill_between(x, yhigh, ylow,
                       facecolor="orange",  # The fill color
                       color='blue',        # The outline color
                       alpha=0.2)           # Transparency of the fill
Miscellaneous¶
- `%who` (IPython/Jupyter magic) will give you a list of all current user-defined variables
- `%whos` (IPython/Jupyter magic) will give you more details on all current user-defined variables
- `dir()` will give you the list of names in the current scope