API

Package sparkdatachallenge

Arrays A and B consisting of N non-negative integers are given. Together, they represent N real numbers, denoted as C[0], …, C[N−1]. Elements of A represent the integer parts and the corresponding elements of B (divided by 1,000,000) represent the fractional parts of the elements of C.

A[I] and B[I] represent C[I] = A[I] + B[I] / 1,000,000.

A pair of indices (P, Q) is multiplicative if 0 ≤ P < Q < N and C[P] * C[Q] ≥ C[P] + C[Q].
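As a minimal illustration of the definition, the sketch below builds C exactly with `fractions.Fraction` and counts the multiplicative pairs of a small input. The sample arrays and the helper name `count_pairs_exact` are illustrative assumptions, not part of the package.

```python
from fractions import Fraction
from itertools import combinations

SCALE = 1_000_000  # divisor for the fractional parts, as in the text


def count_pairs_exact(A, B, scale=SCALE):
    """Count pairs (P, Q), P < Q, with C[P] * C[Q] >= C[P] + C[Q]."""
    # Exact rational arithmetic avoids floating-point rounding at the boundary.
    C = [Fraction(a) + Fraction(b, scale) for a, b in zip(A, B)]
    # combinations() yields each unordered pair once, in index order P < Q.
    return sum(1 for x, y in combinations(C, 2) if x * y >= x + y)


# Hypothetical sample input: C = [0.5, 1.5, 2.0, 2.0, 3.0, 5.02]
A = [0, 1, 2, 2, 3, 5]
B = [500_000, 500_000, 0, 0, 0, 20_000]
print(count_pairs_exact(A, B))  # 8
```

The eight pairs are (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5) and (4, 5); the element 0.5 forms no pair.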

The package contains several methods to find the number of multiplicative pairs in C.

sparkdatachallenge.check_input(inA: numpy.array, inB: numpy.array, scale: int = 1000000) -> bool

Check that the input arrays are valid.

Parameters
  • inA (np.array) – array containing the integer part

  • inB (np.array) – array containing the decimal part

  • scale (int, optional) – scale factor for the decimal parts, by default 1_000_000

Returns

True if the input is valid, False otherwise.

Return type

bool
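The documentation does not list the individual checks. The sketch below is one plausible reading based on the problem statement (equal-length arrays of non-negative integers, fractional parts below `scale`); the function name `check_input_sketch` and the exact rules are assumptions, not the package's implementation.

```python
import numpy as np


def check_input_sketch(inA, inB, scale=1_000_000):
    """Hypothetical validity check; the package's own rules may differ."""
    inA = np.asarray(inA)
    inB = np.asarray(inB)
    # Both arrays must be one-dimensional and describe the same N numbers.
    if inA.ndim != 1 or inA.shape != inB.shape:
        return False
    # Integer and fractional parts must be stored as integers.
    if not np.issubdtype(inA.dtype, np.integer):
        return False
    if not np.issubdtype(inB.dtype, np.integer):
        return False
    # Neither part may be negative.
    if (inA < 0).any() or (inB < 0).any():
        return False
    # A fractional part must stay below the scale, or it spills into the integer part.
    return bool((inB < scale).all())


print(check_input_sketch([0, 1], [500_000, 0]))
```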

sparkdatachallenge.compare(A: numpy.array, B: numpy.array, P: int, Q: int, scale: int = 1000000) -> bool

Compare two composed numbers using their original integer and decimal parts as integers.

Parameters
  • A (np.array) – integer parts

  • B (np.array) – decimal parts

  • P (int) – index

  • Q (int) – index

  • scale (int, optional) – scale for decimals, by default 1_000_000

Returns

True if the pair (P, Q) is multiplicative.

Return type

bool
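One way to make this comparison exact is to stay in integers throughout. Writing X = A[P] * scale + B[P] and Y = A[Q] * scale + B[Q], the inequality C[P] * C[Q] ≥ C[P] + C[Q] becomes X * Y ≥ scale * (X + Y) after multiplying both sides by scale². The sketch below follows that idea; `compare_sketch` is a hypothetical name and may not match how the package performs the comparison.

```python
def compare_sketch(A, B, P, Q, scale=1_000_000):
    """Integer-only test of C[P] * C[Q] >= C[P] + C[Q]."""
    # Convert to Python ints so the product cannot overflow a fixed-width dtype.
    x = int(A[P]) * scale + int(B[P])
    y = int(A[Q]) * scale + int(B[Q])
    # Both sides of the original inequality multiplied by scale ** 2.
    return x * y >= scale * (x + y)


# 1.5 * 3.0 = 4.5 equals 1.5 + 3.0, so the boundary case counts.
print(compare_sketch([1, 3], [500_000, 0], 0, 1))
```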

sparkdatachallenge.generate_add_triu(C)

Return an upper triangular array containing the element-by-element sums of a given input array C. Only the upper triangular part is kept because we only want pairs where col_idx > row_idx (hence k=-1): C is assumed to be a non-decreasing array of decimal numbers, and we are looking for multiplicative pairs.

Parameters

C (np.array) – non-decreasing array of decimal numbers

Returns

upper triangular array of element by element sums

Return type

np.array
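A minimal NumPy sketch of such a sum matrix is shown below. The name `add_triu_sketch` is hypothetical. Note that `np.triu` keeps the entries with col_idx > row_idx when called with `k=1`, which differs from the `k=-1` mentioned in the description, so the package may build the triangle differently.

```python
import numpy as np


def add_triu_sketch(C):
    """Strictly upper triangular matrix of pairwise sums C[i] + C[j]."""
    C = np.asarray(C, dtype=float)
    # sums[i, j] = C[i] + C[j] for every index pair.
    sums = np.add.outer(C, C)
    # k=1 zeroes the diagonal and everything below it.
    return np.triu(sums, k=1)


print(add_triu_sketch([0.5, 1.5, 2.0]))
```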

sparkdatachallenge.generate_mul_triu(C: numpy.array) -> numpy.array

Return an upper triangular array containing the element-by-element products of a given input array C. Only the upper triangular part is kept because we only want products where col_idx > row_idx (hence k=-1): C is assumed to be a non-decreasing array of decimal numbers, and we are looking for multiplicative pairs.

Parameters

C (np.array) – non-decreasing array of decimal numbers

Returns

upper triangular array of element by element products

Return type

np.array
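The product matrix can be sketched with an outer product. As before, `mul_triu_sketch` is a hypothetical name and the use of `k=1` is an assumption about which triangle is wanted.

```python
import numpy as np


def mul_triu_sketch(C):
    """Strictly upper triangular matrix of pairwise products C[i] * C[j]."""
    C = np.asarray(C, dtype=float)
    # products[i, j] = C[i] * C[j]
    products = np.outer(C, C)
    # Keep only entries with col_idx > row_idx.
    return np.triu(products, k=1)


print(mul_triu_sketch([0.5, 1.5, 2.0]))
```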

sparkdatachallenge.pairs(M: numpy.array) -> List[tuple]

Method to generate the multiplicative pairs.

Parameters

M (np.array) – Array containing inequality values.

Returns

List of pairs as tuples.

Return type

List[tuple]
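The layout of M is not spelled out here. The sketch below assumes M is a strictly upper triangular mask in which M[p, q] is truthy exactly when (p, q) is a multiplicative pair; both that assumption and the name `pairs_sketch` are illustrative.

```python
import numpy as np


def pairs_sketch(M):
    """List the (row, col) positions of the truthy entries of M."""
    # Assumption: M[p, q] is truthy exactly when (p, q) is a multiplicative pair.
    rows, cols = np.nonzero(M)
    return [(int(p), int(q)) for p, q in zip(rows, cols)]


C = np.array([0.5, 1.5, 2.0, 3.0])
# Strictly upper triangular mask of the inequality C[p] * C[q] >= C[p] + C[q].
M = np.triu(np.outer(C, C) >= np.add.outer(C, C), k=1)
print(pairs_sketch(M))  # [(1, 3), (2, 3)]
```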

sparkdatachallenge.solution_brute1(A: numpy.array, B: numpy.array, verbose: bool = True) -> int

Brute force method one, using upper triangular matrices. Fails for large arrays because of memory issues.

Parameters
  • A (np.array) – Integer part array

  • B (np.array) – Decimal part array

  • verbose (bool, optional) – whether to print the pairs, by default True

Returns

number of multiplicative pairs

Return type

int
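To see why a matrix-based approach runs out of memory, the rough estimate below computes the size of a single dense N × N array. It assumes the matrices are stored densely as `float64`, which is an assumption about the implementation; `dense_matrix_bytes` is a hypothetical helper.

```python
import numpy as np


def dense_matrix_bytes(n, dtype=np.float64):
    """Bytes needed for one dense n x n matrix of the given dtype."""
    return n * n * np.dtype(dtype).itemsize


for n in (1_000, 10_000, 100_000):
    gib = dense_matrix_bytes(n) / 2**30
    print(f"N={n:>7}: {gib:10.3f} GiB per matrix")
```

For N = 100,000 a single `float64` matrix already needs 8 × 10¹⁰ bytes (80 GB), and the method needs both a sum and a product matrix.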

sparkdatachallenge.solution_brute2(A: numpy.array, B: numpy.array, verbose: bool = True, threshold: int = 1000000000, scale: int = 1000000) -> int

Brute force method based on a double for-loop.

Parameters
  • A (np.array) – integer part of the decimal numbers

  • B (np.array) – decimal part of the decimal numbers

  • verbose (bool, optional) – whether to print the multiplicative pairs, by default True

  • threshold (int, optional) – threshold for breaking out of the for loop, by default 1_000_000_000

  • scale (int, optional) – scale factor for the decimals, by default 1_000_000

Returns

number of multiplicative pairs if it is lower than the threshold, otherwise the threshold value

Return type

int
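A double loop with an early exit can be sketched as follows. It uses scaled integers so the comparison is exact, and returns the threshold as soon as the count reaches it. The name `brute_loop_sketch` is hypothetical, the `verbose` printing is omitted, and the package's own loop may differ in detail.

```python
def brute_loop_sketch(A, B, threshold=1_000_000_000, scale=1_000_000):
    """O(N^2) count of multiplicative pairs, capped at threshold."""
    # Work with scaled Python ints: X[i] = C[i] * scale, so the test is exact.
    X = [int(a) * scale + int(b) for a, b in zip(A, B)]
    count = 0
    for p in range(len(X)):
        for q in range(p + 1, len(X)):
            # Integer form of C[p] * C[q] >= C[p] + C[q].
            if X[p] * X[q] >= scale * (X[p] + X[q]):
                count += 1
                if count >= threshold:
                    # Stop early once the cap is reached.
                    return threshold
    return count


A = [0, 1, 2, 2, 3, 5]
B = [500_000, 500_000, 0, 0, 0, 20_000]
print(brute_loop_sketch(A, B))  # 8
```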

sparkdatachallenge.solution_math(A: numpy.array, B: numpy.array, threshold: int = 1000000000, scale: int = 1000000) -> int

Math based method. See tutorial/examples in docs for more details.

Parameters
  • A (np.array) – integer part of the decimal numbers

  • B (np.array) – decimal part of the decimal numbers

  • threshold (int, optional) – threshold value for the number of pairs, by default 1_000_000_000

  • scale (int, optional) – scale factor for the decimals, by default 1_000_000

Returns

number of multiplicative pairs, or the threshold value

Return type

int
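The tutorial holds the actual derivation. As one possible illustration of a math-based count, the sketch below uses the identity C[P] * C[Q] ≥ C[P] + C[Q] ⇔ (C[P] − 1)(C[Q] − 1) ≥ 1. It follows that the only pairs involving a value of at most 1 are pairs of zeros, and all remaining pairs can be counted with two pointers over the sorted values greater than 1. This is an assumed technique, not necessarily the algorithm the package implements, and `count_pairs_two_pointer` is a hypothetical name.

```python
def count_pairs_two_pointer(A, B, threshold=1_000_000_000, scale=1_000_000):
    """Count multiplicative pairs in O(N log N), capped at threshold."""
    # Scaled integers X[i] = C[i] * scale, sorted so two pointers can be used.
    X = sorted(int(a) * scale + int(b) for a, b in zip(A, B))
    # Two zeros always form a pair: 0 * 0 >= 0 + 0.
    zeros = sum(1 for x in X if x == 0)
    count = zeros * (zeros - 1) // 2
    # D[i] = (C[i] - 1) * scale, kept only for C[i] > 1.
    D = [x - scale for x in X if x > scale]
    lo, hi = 0, len(D) - 1
    while lo < hi:
        if D[lo] * D[hi] >= scale * scale:
            # Every element from lo to hi - 1 also pairs with hi.
            count += hi - lo
            hi -= 1
        else:
            # D[lo] is too small to pair with anything up to hi.
            lo += 1
    return min(count, threshold)


A = [0, 1, 2, 2, 3, 5]
B = [500_000, 500_000, 0, 0, 0, 20_000]
print(count_pairs_two_pointer(A, B))  # 8
```

Sorting inside the function means the sketch does not rely on the input already being non-decreasing; the number of unordered pairs is unaffected by the order.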

sparkdatachallenge.solution_math2(A: numpy.array, B: numpy.array, threshold: int = 1000000000, scale: int = 1000000) -> int

Math based method. See tutorial/examples in docs for more details.

Parameters
  • A (np.array) – integer part of the decimal numbers

  • B (np.array) – decimal part of the decimal numbers

  • threshold (int, optional) – threshold value for the number of pairs, by default 1_000_000_000

  • scale (int, optional) – scale factor for the decimals, by default 1_000_000

Returns

number of multiplicative pairs, or the threshold value

Return type

int